Initial release This is a GPU-accelerated AV1 decoder for the Xbox One

commit: eb966e45d9fd4561743909aeef007fc368d2e228 [log] [tgz]
author: James Zern <jzern@google.com> Fri Nov 13 16:44:17 2020 -0800
committer: James Zern <jzern@google.com> Fri Nov 13 17:00:01 2020 -0800
tree: 3225afd842febdd0d01cfd37d824ab24a33a7d33
parent: f1bcb610c1c34b10a9b7844dd5924b067bf2d151 [diff]
diff --git a/LICENSE.txt b/LICENSE.txt
new file mode 100644
index 0000000..0c51157
--- /dev/null
+++ b/LICENSE.txt

@@ -0,0 +1,57 @@
+Alliance for Open Media Patent License 1.0
+License Terms.
+1.1. Patent License. Subject to the terms and conditions of this License, each Licensor, on behalf of itself and successors in interest and assigns, grants Licensee a non-sublicensable, perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as expressly stated in this License) patent license to its Necessary Claims to make, use, sell, offer for sale, import or distribute any Implementation.
+
+1.2. Conditions.
+
+1.2.1. Availability. As a condition to the grant of rights to Licensee to make, sell, offer for sale, import or distribute an Implementation under Section 1.1, Licensee must make its Necessary Claims available under this License, and must reproduce this License with any Implementation as follows:
+
+a. For distribution in source code, by including this License in the root directory of the source code with its Implementation.
+
+b. For distribution in any other form (including binary, object form, and/or hardware description code (e.g., HDL, RTL, Gate Level Netlist, GDSII, etc.)), by including this License in the documentation, legal notices, and/or other written materials provided with the Implementation.
+
+1.2.2. Additional Conditions. This license is directly from Licensor to Licensee. Licensee acknowledges as a condition of benefiting from it that no rights from Licensor are received from suppliers, distributors, or otherwise in connection with this License.
+
+1.3. Defensive Termination. If any Licensee, its Affiliates, or its agents initiates patent litigation or files, maintains, or voluntarily participates in a lawsuit against another entity or any person asserting that any Implementation infringes Necessary Claims, any patent licenses granted under this License directly to the Licensee are immediately terminated as of the date of the initiation of action unless 1) that suit was in response to a corresponding suit regarding an Implementation first brought against an initiating entity, or 2) that suit was brought to enforce the terms of this License (including intervention in a third-party action by a Licensee).
+
+1.4. Disclaimers. The Reference Implementation and Specification are provided “AS IS” and without warranty. The entire risk as to implementing or otherwise using the Reference Implementation or Specification is assumed by the implementer and user. Licensor expressly disclaims any warranties (express, implied, or otherwise), including implied warranties of merchantability, non-infringement, fitness for a particular purpose, or title, related to the material. IN NO EVENT WILL LICENSOR BE LIABLE TO ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF ACTION OF ANY KIND WITH RESPECT TO THIS LICENSE, WHETHER BASED ON BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR NOT THE OTHER PARTRY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Definitions.
+2.1. Affiliate. “Affiliate” means an entity that directly or indirectly Controls, is Controlled by, or is under common Control of that party.
+
+2.2. Control. “Control” means direct or indirect control of more than 50% of the voting power to elect directors of that corporation, or for any other entity, the power to direct management of such entity.
+
+2.3. Decoder. “Decoder” means any decoder that conforms fully with all non-optional portions of the Specification.
+
+2.4. Encoder. “Encoder” means any encoder that produces a bitstream that can be decoded by a Decoder only to the extent it produces such a bitstream.
+
+2.5. Final Deliverable. “Final Deliverable” means the final version of a deliverable approved by the Alliance for Open Media as a Final Deliverable.
+
+2.6. Implementation. “Implementation” means any implementation, including the Reference Implementation, that is an Encoder and/or a Decoder. An Implementation also includes components of an Implementation only to the extent they are used as part of an Implementation.
+
+2.7. License. “License” means this license.
+
+2.8. Licensee. “Licensee” means any person or entity who exercises patent rights granted under this License.
+
+2.9. Licensor. “Licensor” means (i) any Licensee that makes, sells, offers for sale, imports or distributes any Implementation, or (ii) a person or entity that has a licensing obligation to the Implementation as a result of its membership and/or participation in the Alliance for Open Media working group that developed the Specification.
+
+2.10. Necessary Claims. “Necessary Claims” means all claims of patents or patent applications, (a) that currently or at any time in the future, are owned or controlled by the Licensor, and (b) (i) would be an Essential Claim as defined by the W3C Policy as of February 5, 2004 (https://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential) as if the Specification was a W3C Recommendation; or (ii) are infringed by the Reference Implementation.
+
+2.11. Reference Implementation. “Reference Implementation” means an Encoder and/or Decoder released by the Alliance for Open Media as a Final Deliverable.
+
+2.12. Specification. “Specification” means the specification designated by the Alliance for Open Media as a Final Deliverable for which this License was issued.
+
+Software License
+Copyright (c) 2016, Alliance for Open Media
+
+All rights reserved.
+
+Copyright 2020 Google LLC
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
\ No newline at end of file

diff --git a/README.pdf b/README.pdf
new file mode 100644
index 0000000..9688cf9
--- /dev/null
+++ b/README.pdf
Binary files differ

diff --git a/av1xbox.sln b/av1xbox.sln
new file mode 100644
index 0000000..9bb1835
--- /dev/null
+++ b/av1xbox.sln

@@ -0,0 +1,45 @@
+

+Microsoft Visual Studio Solution File, Format Version 12.00

+# Visual Studio 15

+VisualStudioVersion = 15.0.26430.13

+MinimumVisualStudioVersion = 10.0.40219.1

+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "tests", "tests", "{71E99F71-FAAA-4C20-9DC7-742F09A4F78E}"

+EndProject

+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "libav1", "libav1\build\libav1.vcxproj", "{E68F3562-9876-4971-8D90-27035E27A825}"

+EndProject

+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "uwp_dx12_player", "testing\uwp_dx12_player\uwp_dx12_player.vcxproj", "{634F5B60-2642-427E-902E-FF6A90965205}"

+EndProject

+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "uwp_common", "testing\uwp_common\uwp_common.vcxproj", "{C7B73550-4CA4-45C1-8436-936AFD7F0CA5}"

+EndProject

+Global

+	GlobalSection(SolutionConfigurationPlatforms) = preSolution

+		Debug|x64 = Debug|x64

+		Release|x64 = Release|x64

+	EndGlobalSection

+	GlobalSection(ProjectConfigurationPlatforms) = postSolution

+		{E68F3562-9876-4971-8D90-27035E27A825}.Debug|x64.ActiveCfg = Debug|x64

+		{E68F3562-9876-4971-8D90-27035E27A825}.Debug|x64.Build.0 = Debug|x64

+		{E68F3562-9876-4971-8D90-27035E27A825}.Release|x64.ActiveCfg = Release|x64

+		{E68F3562-9876-4971-8D90-27035E27A825}.Release|x64.Build.0 = Release|x64

+		{634F5B60-2642-427E-902E-FF6A90965205}.Debug|x64.ActiveCfg = Debug|x64

+		{634F5B60-2642-427E-902E-FF6A90965205}.Debug|x64.Build.0 = Debug|x64

+		{634F5B60-2642-427E-902E-FF6A90965205}.Debug|x64.Deploy.0 = Debug|x64

+		{634F5B60-2642-427E-902E-FF6A90965205}.Release|x64.ActiveCfg = Release|x64

+		{634F5B60-2642-427E-902E-FF6A90965205}.Release|x64.Build.0 = Release|x64

+		{634F5B60-2642-427E-902E-FF6A90965205}.Release|x64.Deploy.0 = Release|x64

+		{C7B73550-4CA4-45C1-8436-936AFD7F0CA5}.Debug|x64.ActiveCfg = Debug|x64

+		{C7B73550-4CA4-45C1-8436-936AFD7F0CA5}.Debug|x64.Build.0 = Debug|x64

+		{C7B73550-4CA4-45C1-8436-936AFD7F0CA5}.Release|x64.ActiveCfg = Release|x64

+		{C7B73550-4CA4-45C1-8436-936AFD7F0CA5}.Release|x64.Build.0 = Release|x64

+	EndGlobalSection

+	GlobalSection(SolutionProperties) = preSolution

+		HideSolutionNode = FALSE

+	EndGlobalSection

+	GlobalSection(NestedProjects) = preSolution

+		{634F5B60-2642-427E-902E-FF6A90965205} = {71E99F71-FAAA-4C20-9DC7-742F09A4F78E}

+		{C7B73550-4CA4-45C1-8436-936AFD7F0CA5} = {71E99F71-FAAA-4C20-9DC7-742F09A4F78E}

+	EndGlobalSection

+	GlobalSection(ExtensibilityGlobals) = postSolution

+		SolutionGuid = {67FED277-603A-4146-859C-9F01479D6253}

+	EndGlobalSection

+EndGlobal


diff --git a/libav1/LICENSE b/libav1/LICENSE
new file mode 100644
index 0000000..fc340c3
--- /dev/null
+++ b/libav1/LICENSE

@@ -0,0 +1,27 @@
+Copyright (c) 2016, Alliance for Open Media. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in
+   the documentation and/or other materials provided with the
+   distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+

diff --git a/libav1/PATENTS b/libav1/PATENTS
new file mode 100644
index 0000000..493f616
--- /dev/null
+++ b/libav1/PATENTS

@@ -0,0 +1,108 @@
+Alliance for Open Media Patent License 1.0
+
+1. License Terms.
+
+1.1. Patent License. Subject to the terms and conditions of this License, each
+     Licensor, on behalf of itself and successors in interest and assigns,
+     grants Licensee a non-sublicensable, perpetual, worldwide, non-exclusive,
+     no-charge, royalty-free, irrevocable (except as expressly stated in this
+     License) patent license to its Necessary Claims to make, use, sell, offer
+     for sale, import or distribute any Implementation.
+
+1.2. Conditions.
+
+1.2.1. Availability. As a condition to the grant of rights to Licensee to make,
+       sell, offer for sale, import or distribute an Implementation under
+       Section 1.1, Licensee must make its Necessary Claims available under
+       this License, and must reproduce this License with any Implementation
+       as follows:
+
+       a. For distribution in source code, by including this License in the
+          root directory of the source code with its Implementation.
+
+       b. For distribution in any other form (including binary, object form,
+          and/or hardware description code (e.g., HDL, RTL, Gate Level Netlist,
+          GDSII, etc.)), by including this License in the documentation, legal
+          notices, and/or other written materials provided with the
+          Implementation.
+
+1.2.2. Additional Conditions. This license is directly from Licensor to
+       Licensee.  Licensee acknowledges as a condition of benefiting from it
+       that no rights from Licensor are received from suppliers, distributors,
+       or otherwise in connection with this License.
+
+1.3. Defensive Termination. If any Licensee, its Affiliates, or its agents
+     initiates patent litigation or files, maintains, or voluntarily
+     participates in a lawsuit against another entity or any person asserting
+     that any Implementation infringes Necessary Claims, any patent licenses
+     granted under this License directly to the Licensee are immediately
+     terminated as of the date of the initiation of action unless 1) that suit
+     was in response to a corresponding suit regarding an Implementation first
+     brought against an initiating entity, or 2) that suit was brought to
+     enforce the terms of this License (including intervention in a third-party
+     action by a Licensee).
+
+1.4. Disclaimers. The Reference Implementation and Specification are provided
+     "AS IS" and without warranty. The entire risk as to implementing or
+     otherwise using the Reference Implementation or Specification is assumed
+     by the implementer and user. Licensor expressly disclaims any warranties
+     (express, implied, or otherwise), including implied warranties of
+     merchantability, non-infringement, fitness for a particular purpose, or
+     title, related to the material. IN NO EVENT WILL LICENSOR BE LIABLE TO
+     ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL,
+     INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF
+     ACTION OF ANY KIND WITH RESPECT TO THIS LICENSE, WHETHER BASED ON BREACH
+     OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR
+     NOT THE OTHER PARTRY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+2. Definitions.
+
+2.1. Affiliate.  "Affiliate" means an entity that directly or indirectly
+     Controls, is Controlled by, or is under common Control of that party.
+
+2.2. Control. "Control" means direct or indirect control of more than 50% of
+     the voting power to elect directors of that corporation, or for any other
+     entity, the power to direct management of such entity.
+
+2.3. Decoder.  "Decoder" means any decoder that conforms fully with all
+     non-optional portions of the Specification.
+
+2.4. Encoder.  "Encoder" means any encoder that produces a bitstream that can
+     be decoded by a Decoder only to the extent it produces such a bitstream.
+
+2.5. Final Deliverable.  "Final Deliverable" means the final version of a
+     deliverable approved by the Alliance for Open Media as a Final
+     Deliverable.
+
+2.6. Implementation.  "Implementation" means any implementation, including the
+     Reference Implementation, that is an Encoder and/or a Decoder. An
+     Implementation also includes components of an Implementation only to the
+     extent they are used as part of an Implementation.
+
+2.7. License. "License" means this license.
+
+2.8. Licensee. "Licensee" means any person or entity who exercises patent
+     rights granted under this License.
+
+2.9. Licensor.  "Licensor" means (i) any Licensee that makes, sells, offers
+     for sale, imports or distributes any Implementation, or (ii) a person
+     or entity that has a licensing obligation to the Implementation as a
+     result of its membership and/or participation in the Alliance for Open
+     Media working group that developed the Specification.
+
+2.10. Necessary Claims.  "Necessary Claims" means all claims of patents or
+      patent applications, (a) that currently or at any time in the future,
+      are owned or controlled by the Licensor, and (b) (i) would be an
+      Essential Claim as defined by the W3C Policy as of February 5, 2004
+      (https://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential)
+      as if the Specification was a W3C Recommendation; or (ii) are infringed
+      by the Reference Implementation.
+
+2.11. Reference Implementation. "Reference Implementation" means an Encoder
+      and/or Decoder released by the Alliance for Open Media as a Final
+      Deliverable.
+
+2.12. Specification. "Specification" means the specification designated by
+      the Alliance for Open Media as a Final Deliverable for which this
+      License was issued.
+

diff --git a/libav1/aom/aom.h b/libav1/aom/aom.h
new file mode 100644
index 0000000..b1cc1ec
--- /dev/null
+++ b/libav1/aom/aom.h

@@ -0,0 +1,147 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\defgroup aom AOM
+ * \ingroup codecs
+ * AOM is aom's newest video compression algorithm that uses motion
+ * compensated prediction, Discrete Cosine Transform (DCT) coding of the
+ * prediction error signal and context dependent entropy coding techniques
+ * based on arithmetic principles. It features:
+ *  - YUV 4:2:0 image format
+ *  - Macro-block based coding (16x16 luma plus two 8x8 chroma)
+ *  - 1/4 (1/8) pixel accuracy motion compensated prediction
+ *  - 4x4 DCT transform
+ *  - 128 level linear quantizer
+ *  - In loop deblocking filter
+ *  - Context-based entropy coding
+ *
+ * @{
+ */
+/*!\file
+ * \brief Provides controls common to both the AOM encoder and decoder.
+ */
+#ifndef AOM_AOM_AOM_H_
+#define AOM_AOM_AOM_H_
+
+#include "aom/aom_codec.h"
+#include "aom/aom_image.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*!\brief Control functions
+ *
+ * The set of macros define the control functions of AOM interface
+ */
+enum aom_com_control_id {
+  /*!\brief pass in an external frame into decoder to be used as reference frame
+   */
+  AOM_SET_POSTPROC = 3, /**< set the decoder's post processing settings  */
+  AOM_SET_DBG_COLOR_REF_FRAME =
+      4, /**< set the reference frames to color for each macroblock */
+  AOM_SET_DBG_COLOR_MB_MODES = 5, /**< set which macro block modes to color */
+  AOM_SET_DBG_COLOR_B_MODES = 6,  /**< set which blocks modes to color */
+  AOM_SET_DBG_DISPLAY_MV = 7,     /**< set which motion vector modes to draw */
+
+  /* TODO(jkoleszar): The encoder incorrectly reuses some of these values (5+)
+   * for its control ids. These should be migrated to something like the
+   * AOM_DECODER_CTRL_ID_START range next time we're ready to break the ABI.
+   */
+  AV1_GET_REFERENCE = 128, /**< get a pointer to a reference frame */
+  AV1_SET_REFERENCE = 129, /**< write a frame into a reference buffer */
+  AV1_COPY_REFERENCE =
+      130, /**< get a copy of reference frame from the decoder */
+  AOM_COMMON_CTRL_ID_MAX,
+
+  AV1_GET_NEW_FRAME_IMAGE = 192, /**< get a pointer to the new frame */
+  AV1_COPY_NEW_FRAME_IMAGE =
+      193, /**< copy the new frame to an external buffer */
+
+  AOM_DECODER_CTRL_ID_START = 256
+};
+
+/*!\brief post process flags
+ *
+ * The set of macros define AOM decoder post processing flags
+ */
+enum aom_postproc_level {
+  AOM_NOFILTERING = 0,
+  AOM_DEBLOCK = 1 << 0,
+  AOM_DEMACROBLOCK = 1 << 1,
+  AOM_ADDNOISE = 1 << 2,
+  AOM_DEBUG_TXT_FRAME_INFO = 1 << 3, /**< print frame information */
+  AOM_DEBUG_TXT_MBLK_MODES =
+      1 << 4, /**< print macro block modes over each macro block */
+  AOM_DEBUG_TXT_DC_DIFF = 1 << 5,   /**< print dc diff for each macro block */
+  AOM_DEBUG_TXT_RATE_INFO = 1 << 6, /**< print video rate info (encoder only) */
+  AOM_MFQE = 1 << 10
+};
+
+/*!\brief post process flags
+ *
+ * This define a structure that describe the post processing settings. For
+ * the best objective measure (using the PSNR metric) set post_proc_flag
+ * to AOM_DEBLOCK and deblocking_level to 1.
+ */
+
+typedef struct aom_postproc_cfg {
+  /*!\brief the types of post processing to be done, should be combination of
+   * "aom_postproc_level" */
+  int post_proc_flag;
+  int deblocking_level; /**< the strength of deblocking, valid range [0, 16] */
+  int noise_level; /**< the strength of additive noise, valid range [0, 16] */
+} aom_postproc_cfg_t;
+
+/*!\brief AV1 specific reference frame data struct
+ *
+ * Define the data struct to access av1 reference frames.
+ */
+typedef struct av1_ref_frame {
+  int idx;              /**< frame index to get (input) */
+  int use_external_ref; /**< Directly use external ref buffer(decoder only) */
+  aom_image_t img;      /**< img structure to populate (output) */
+} av1_ref_frame_t;
+
+/*!\cond */
+/*!\brief aom decoder control function parameter type
+ *
+ * defines the data type for each of AOM decoder control function requires
+ */
+AOM_CTRL_USE_TYPE(AOM_SET_POSTPROC, aom_postproc_cfg_t *)
+#define AOM_CTRL_AOM_SET_POSTPROC
+AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_REF_FRAME, int)
+#define AOM_CTRL_AOM_SET_DBG_COLOR_REF_FRAME
+AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_MB_MODES, int)
+#define AOM_CTRL_AOM_SET_DBG_COLOR_MB_MODES
+AOM_CTRL_USE_TYPE(AOM_SET_DBG_COLOR_B_MODES, int)
+#define AOM_CTRL_AOM_SET_DBG_COLOR_B_MODES
+AOM_CTRL_USE_TYPE(AOM_SET_DBG_DISPLAY_MV, int)
+#define AOM_CTRL_AOM_SET_DBG_DISPLAY_MV
+AOM_CTRL_USE_TYPE(AV1_GET_REFERENCE, av1_ref_frame_t *)
+#define AOM_CTRL_AV1_GET_REFERENCE
+AOM_CTRL_USE_TYPE(AV1_SET_REFERENCE, av1_ref_frame_t *)
+#define AOM_CTRL_AV1_SET_REFERENCE
+AOM_CTRL_USE_TYPE(AV1_COPY_REFERENCE, av1_ref_frame_t *)
+#define AOM_CTRL_AV1_COPY_REFERENCE
+AOM_CTRL_USE_TYPE(AV1_GET_NEW_FRAME_IMAGE, aom_image_t *)
+#define AOM_CTRL_AV1_GET_NEW_FRAME_IMAGE
+AOM_CTRL_USE_TYPE(AV1_COPY_NEW_FRAME_IMAGE, aom_image_t *)
+#define AOM_CTRL_AV1_COPY_NEW_FRAME_IMAGE
+
+/*!\endcond */
+/*! @} - end defgroup aom */
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_AOM_H_

diff --git a/libav1/aom/aom_codec.h b/libav1/aom/aom_codec.h
new file mode 100644
index 0000000..9dc2669
--- /dev/null
+++ b/libav1/aom/aom_codec.h

@@ -0,0 +1,517 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\defgroup codec Common Algorithm Interface
+ * This abstraction allows applications to easily support multiple video
+ * formats with minimal code duplication. This section describes the interface
+ * common to all codecs (both encoders and decoders).
+ * @{
+ */
+
+/*!\file
+ * \brief Describes the codec algorithm interface to applications.
+ *
+ * This file describes the interface between an application and a
+ * video codec algorithm.
+ *
+ * An application instantiates a specific codec instance by using
+ * aom_codec_init() and a pointer to the algorithm's interface structure:
+ *     <pre>
+ *     my_app.c:
+ *       extern aom_codec_iface_t my_codec;
+ *       {
+ *           aom_codec_ctx_t algo;
+ *           res = aom_codec_init(&algo, &my_codec);
+ *       }
+ *     </pre>
+ *
+ * Once initialized, the instance is managed using other functions from
+ * the aom_codec_* family.
+ */
+#ifndef AOM_AOM_AOM_CODEC_H_
+#define AOM_AOM_AOM_CODEC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "aom/aom_image.h"
+#include "aom/aom_integer.h"
+
+/*!\brief Decorator indicating a function is deprecated */
+#ifndef AOM_DEPRECATED
+#if defined(__GNUC__) && __GNUC__
+#define AOM_DEPRECATED __attribute__((deprecated))
+#elif defined(_MSC_VER)
+#define AOM_DEPRECATED
+#else
+#define AOM_DEPRECATED
+#endif
+#endif /* AOM_DEPRECATED */
+
+#ifndef AOM_DECLSPEC_DEPRECATED
+#if defined(__GNUC__) && __GNUC__
+#define AOM_DECLSPEC_DEPRECATED /**< \copydoc #AOM_DEPRECATED */
+#elif defined(_MSC_VER)
+/*!\brief \copydoc #AOM_DEPRECATED */
+#define AOM_DECLSPEC_DEPRECATED __declspec(deprecated)
+#else
+#define AOM_DECLSPEC_DEPRECATED /**< \copydoc #AOM_DEPRECATED */
+#endif
+#endif /* AOM_DECLSPEC_DEPRECATED */
+
+/*!\brief Decorator indicating a function is potentially unused */
+#ifdef AOM_UNUSED
+#elif defined(__GNUC__) || defined(__clang__)
+#define AOM_UNUSED __attribute__((unused))
+#else
+#define AOM_UNUSED
+#endif
+
+/*!\brief Decorator indicating that given struct/union/enum is packed */
+#ifndef ATTRIBUTE_PACKED
+#if defined(__GNUC__) && __GNUC__
+#define ATTRIBUTE_PACKED __attribute__((packed))
+#elif defined(_MSC_VER)
+#define ATTRIBUTE_PACKED
+#else
+#define ATTRIBUTE_PACKED
+#endif
+#endif /* ATTRIBUTE_PACKED */
+
+/*!\brief Current ABI version number
+ *
+ * \internal
+ * If this file is altered in any way that changes the ABI, this value
+ * must be bumped.  Examples include, but are not limited to, changing
+ * types, removing or reassigning enums, adding/removing/rearranging
+ * fields to structures
+ */
+#define AOM_CODEC_ABI_VERSION (3 + AOM_IMAGE_ABI_VERSION) /**<\hideinitializer*/
+
+/*!\brief Algorithm return codes */
+typedef enum {
+  /*!\brief Operation completed without error */
+  AOM_CODEC_OK,
+
+  /*!\brief Unspecified error */
+  AOM_CODEC_ERROR,
+
+  /*!\brief Memory operation failed */
+  AOM_CODEC_MEM_ERROR,
+
+  /*!\brief ABI version mismatch */
+  AOM_CODEC_ABI_MISMATCH,
+
+  /*!\brief Algorithm does not have required capability */
+  AOM_CODEC_INCAPABLE,
+
+  /*!\brief The given bitstream is not supported.
+   *
+   * The bitstream was unable to be parsed at the highest level. The decoder
+   * is unable to proceed. This error \ref SHOULD be treated as fatal to the
+   * stream. */
+  AOM_CODEC_UNSUP_BITSTREAM,
+
+  /*!\brief Encoded bitstream uses an unsupported feature
+   *
+   * The decoder does not implement a feature required by the encoder. This
+   * return code should only be used for features that prevent future
+   * pictures from being properly decoded. This error \ref MAY be treated as
+   * fatal to the stream or \ref MAY be treated as fatal to the current GOP.
+   */
+  AOM_CODEC_UNSUP_FEATURE,
+
+  /*!\brief The coded data for this stream is corrupt or incomplete
+   *
+   * There was a problem decoding the current frame.  This return code
+   * should only be used for failures that prevent future pictures from
+   * being properly decoded. This error \ref MAY be treated as fatal to the
+   * stream or \ref MAY be treated as fatal to the current GOP. If decoding
+   * is continued for the current GOP, artifacts may be present.
+   */
+  AOM_CODEC_CORRUPT_FRAME,
+
+  /*!\brief An application-supplied parameter is not valid.
+   *
+   */
+  AOM_CODEC_INVALID_PARAM,
+
+  /*!\brief An iterator reached the end of list.
+   *
+   */
+  AOM_CODEC_LIST_END
+
+} aom_codec_err_t;
+
+/*! \brief Codec capabilities bitfield
+ *
+ *  Each codec advertises the capabilities it supports as part of its
+ *  ::aom_codec_iface_t interface structure. Capabilities are extra interfaces
+ *  or functionality, and are not required to be supported.
+ *
+ *  The available flags are specified by AOM_CODEC_CAP_* defines.
+ */
+typedef long aom_codec_caps_t;
+#define AOM_CODEC_CAP_DECODER 0x1 /**< Is a decoder */
+#define AOM_CODEC_CAP_ENCODER 0x2 /**< Is an encoder */
+
+/*! \brief Initialization-time Feature Enabling
+ *
+ *  Certain codec features must be known at initialization time, to allow for
+ *  proper memory allocation.
+ *
+ *  The available flags are specified by AOM_CODEC_USE_* defines.
+ */
+typedef long aom_codec_flags_t;
+
+/*!\brief Codec interface structure.
+ *
+ * Contains function pointers and other data private to the codec
+ * implementation. This structure is opaque to the application.
+ */
+typedef const struct aom_codec_iface aom_codec_iface_t;
+
+/*!\brief Codec private data structure.
+ *
+ * Contains data private to the codec implementation. This structure is opaque
+ * to the application.
+ */
+typedef struct aom_codec_priv aom_codec_priv_t;
+
+/*!\brief Iterator
+ *
+ * Opaque storage used for iterating over lists.
+ */
+typedef const void *aom_codec_iter_t;
+
+/*!\brief Codec context structure
+ *
+ * All codecs \ref MUST support this context structure fully. In general,
+ * this data should be considered private to the codec algorithm, and
+ * not be manipulated or examined by the calling application. Applications
+ * may reference the 'name' member to get a printable description of the
+ * algorithm.
+ */
+typedef struct aom_codec_ctx {
+  const char *name;             /**< Printable interface name */
+  aom_codec_iface_t *iface;     /**< Interface pointers */
+  aom_codec_err_t err;          /**< Last returned error */
+  const char *err_detail;       /**< Detailed info, if available */
+  aom_codec_flags_t init_flags; /**< Flags passed at init time */
+  const struct aom_codec_dec_cfg *cfg;
+  aom_codec_priv_t *priv; /**< Algorithm private storage */
+} aom_codec_ctx_t;
+
+/*!\brief Bit depth for codec
+ * *
+ * This enumeration determines the bit depth of the codec.
+ */
+typedef enum aom_bit_depth {
+  AOM_BITS_8 = 8,   /**<  8 bits */
+  AOM_BITS_10 = 10, /**< 10 bits */
+  AOM_BITS_12 = 12, /**< 12 bits */
+} aom_bit_depth_t;
+
+/*!\brief Superblock size selection.
+ *
+ * Defines the superblock size used for encoding. The superblock size can
+ * either be fixed at 64x64 or 128x128 pixels, or it can be dynamically
+ * selected by the encoder for each frame.
+ */
+typedef enum aom_superblock_size {
+  AOM_SUPERBLOCK_SIZE_64X64,   /**< Always use 64x64 superblocks. */
+  AOM_SUPERBLOCK_SIZE_128X128, /**< Always use 128x128 superblocks. */
+  AOM_SUPERBLOCK_SIZE_DYNAMIC  /**< Select superblock size dynamically. */
+} aom_superblock_size_t;
+
+/*
+ * Library Version Number Interface
+ *
+ * For example, see the following sample return values:
+ *     aom_codec_version()           (1<<16 | 2<<8 | 3)
+ *     aom_codec_version_str()       "v1.2.3-rc1-16-gec6a1ba"
+ *     aom_codec_version_extra_str() "rc1-16-gec6a1ba"
+ */
+
+/*!\brief Return the version information (as an integer)
+ *
+ * Returns a packed encoding of the library version number. This will only
+ * include
+ * the major.minor.patch component of the version number. Note that this encoded
+ * value should be accessed through the macros provided, as the encoding may
+ * change
+ * in the future.
+ *
+ */
+int aom_codec_version(void);
+
+/*!\brief Return the version major number */
+#define aom_codec_version_major() ((aom_codec_version() >> 16) & 0xff)
+
+/*!\brief Return the version minor number */
+#define aom_codec_version_minor() ((aom_codec_version() >> 8) & 0xff)
+
+/*!\brief Return the version patch number */
+#define aom_codec_version_patch() ((aom_codec_version() >> 0) & 0xff)
+
+/*!\brief Return the version information (as a string)
+ *
+ * Returns a printable string containing the full library version number. This
+ * may
+ * contain additional text following the three digit version number, as to
+ * indicate
+ * release candidates, prerelease versions, etc.
+ *
+ */
+const char *aom_codec_version_str(void);
+
+/*!\brief Return the version information (as a string)
+ *
+ * Returns a printable "extra string". This is the component of the string
+ * returned
+ * by aom_codec_version_str() following the three digit version number.
+ *
+ */
+const char *aom_codec_version_extra_str(void);
+
+/*!\brief Return the build configuration
+ *
+ * Returns a printable string containing an encoded version of the build
+ * configuration. This may be useful to aom support.
+ *
+ */
+const char *aom_codec_build_config(void);
+
+/*!\brief Return the name for a given interface
+ *
+ * Returns a human readable string for name of the given codec interface.
+ *
+ * \param[in]    iface     Interface pointer
+ *
+ */
+const char *aom_codec_iface_name(aom_codec_iface_t *iface);
+
+/*!\brief Convert error number to printable string
+ *
+ * Returns a human readable string for the last error returned by the
+ * algorithm. The returned error will be one line and will not contain
+ * any newline characters.
+ *
+ *
+ * \param[in]    err     Error number.
+ *
+ */
+const char *aom_codec_err_to_string(aom_codec_err_t err);
+
+/*!\brief Retrieve error synopsis for codec context
+ *
+ * Returns a human readable string for the last error returned by the
+ * algorithm. The returned error will be one line and will not contain
+ * any newline characters.
+ *
+ *
+ * \param[in]    ctx     Pointer to this instance's context.
+ *
+ */
+const char *aom_codec_error(aom_codec_ctx_t *ctx);
+
+/*!\brief Retrieve detailed error information for codec context
+ *
+ * Returns a human readable string providing detailed information about
+ * the last error.
+ *
+ * \param[in]    ctx     Pointer to this instance's context.
+ *
+ * \retval NULL
+ *     No detailed information is available.
+ */
+const char *aom_codec_error_detail(aom_codec_ctx_t *ctx);
+
+/* REQUIRED FUNCTIONS
+ *
+ * The following functions are required to be implemented for all codecs.
+ * They represent the base case functionality expected of all codecs.
+ */
+
+/*!\brief Destroy a codec instance
+ *
+ * Destroys a codec context, freeing any associated memory buffers.
+ *
+ * \param[in] ctx   Pointer to this instance's context
+ *
+ * \retval #AOM_CODEC_OK
+ *     The codec algorithm initialized.
+ * \retval #AOM_CODEC_MEM_ERROR
+ *     Memory allocation failed.
+ */
+aom_codec_err_t aom_codec_destroy(aom_codec_ctx_t *ctx);
+
+/*!\brief Get the capabilities of an algorithm.
+ *
+ * Retrieves the capabilities bitfield from the algorithm's interface.
+ *
+ * \param[in] iface   Pointer to the algorithm interface
+ *
+ */
+aom_codec_caps_t aom_codec_get_caps(aom_codec_iface_t *iface);
+
+/*!\brief Control algorithm
+ *
+ * This function is used to exchange algorithm specific data with the codec
+ * instance. This can be used to implement features specific to a particular
+ * algorithm.
+ *
+ * This wrapper function dispatches the request to the helper function
+ * associated with the given ctrl_id. It tries to call this function
+ * transparently, but will return #AOM_CODEC_ERROR if the request could not
+ * be dispatched.
+ *
+ * Note that this function should not be used directly. Call the
+ * #aom_codec_control wrapper macro instead.
+ *
+ * \param[in]     ctx              Pointer to this instance's context
+ * \param[in]     ctrl_id          Algorithm specific control identifier
+ *
+ * \retval #AOM_CODEC_OK
+ *     The control request was processed.
+ * \retval #AOM_CODEC_ERROR
+ *     The control request was not processed.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ *     The data was not valid.
+ */
+aom_codec_err_t aom_codec_control_(aom_codec_ctx_t *ctx, int ctrl_id, ...);
+#if defined(AOM_DISABLE_CTRL_TYPECHECKS) && AOM_DISABLE_CTRL_TYPECHECKS
+#define aom_codec_control(ctx, id, data) aom_codec_control_(ctx, id, data)
+#define AOM_CTRL_USE_TYPE(id, typ)
+#define AOM_CTRL_USE_TYPE_DEPRECATED(id, typ)
+#define AOM_CTRL_VOID(id, typ)
+
+#else
+/*!\brief aom_codec_control wrapper macro
+ *
+ * This macro allows for type safe conversions across the variadic parameter
+ * to aom_codec_control_().
+ *
+ * \internal
+ * It works by dispatching the call to the control function through a wrapper
+ * function named with the id parameter.
+ */
+#define aom_codec_control(ctx, id, data) \
+  aom_codec_control_##id(ctx, id, data) /**<\hideinitializer*/
+
+/*!\brief aom_codec_control type definition macro
+ *
+ * This macro allows for type safe conversions across the variadic parameter
+ * to aom_codec_control_(). It defines the type of the argument for a given
+ * control identifier.
+ *
+ * \internal
+ * It defines a static function with
+ * the correctly typed arguments as a wrapper to the type-unsafe internal
+ * function.
+ */
+#define AOM_CTRL_USE_TYPE(id, typ)                                           \
+  static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *, int, typ) \
+      AOM_UNUSED;                                                            \
+                                                                             \
+  static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *ctx,        \
+                                                int ctrl_id, typ data) {     \
+    return aom_codec_control_(ctx, ctrl_id, data);                           \
+  } /**<\hideinitializer*/
+
+/*!\brief aom_codec_control deprecated type definition macro
+ *
+ * Like #AOM_CTRL_USE_TYPE, but indicates that the specified control is
+ * deprecated and should not be used. Consult the documentation for your
+ * codec for more information.
+ *
+ * \internal
+ * It defines a static function with the correctly typed arguments as a
+ * wrapper to the type-unsafe internal function.
+ */
+#define AOM_CTRL_USE_TYPE_DEPRECATED(id, typ)                            \
+  AOM_DECLSPEC_DEPRECATED static aom_codec_err_t aom_codec_control_##id( \
+      aom_codec_ctx_t *, int, typ) AOM_DEPRECATED AOM_UNUSED;            \
+                                                                         \
+  AOM_DECLSPEC_DEPRECATED static aom_codec_err_t aom_codec_control_##id( \
+      aom_codec_ctx_t *ctx, int ctrl_id, typ data) {                     \
+    return aom_codec_control_(ctx, ctrl_id, data);                       \
+  } /**<\hideinitializer*/
+
+/*!\brief aom_codec_control void type definition macro
+ *
+ * This macro allows for type safe conversions across the variadic parameter
+ * to aom_codec_control_(). It indicates that a given control identifier takes
+ * no argument.
+ *
+ * \internal
+ * It defines a static function without a data argument as a wrapper to the
+ * type-unsafe internal function.
+ */
+#define AOM_CTRL_VOID(id)                                               \
+  static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *, int) \
+      AOM_UNUSED;                                                       \
+                                                                        \
+  static aom_codec_err_t aom_codec_control_##id(aom_codec_ctx_t *ctx,   \
+                                                int ctrl_id) {          \
+    return aom_codec_control_(ctx, ctrl_id);                            \
+  } /**<\hideinitializer*/
+
+#endif
+
+/*!\brief OBU types. */
+typedef enum ATTRIBUTE_PACKED {
+  OBU_SEQUENCE_HEADER = 1,
+  OBU_TEMPORAL_DELIMITER = 2,
+  OBU_FRAME_HEADER = 3,
+  OBU_TILE_GROUP = 4,
+  OBU_METADATA = 5,
+  OBU_FRAME = 6,
+  OBU_REDUNDANT_FRAME_HEADER = 7,
+  OBU_TILE_LIST = 8,
+  OBU_PADDING = 15,
+} OBU_TYPE;
+
+/*!\brief OBU metadata types. */
+typedef enum {
+  OBU_METADATA_TYPE_AOM_RESERVED_0 = 0,
+  OBU_METADATA_TYPE_HDR_CLL = 1,
+  OBU_METADATA_TYPE_HDR_MDCV = 2,
+  OBU_METADATA_TYPE_SCALABILITY = 3,
+  OBU_METADATA_TYPE_ITUT_T35 = 4,
+  OBU_METADATA_TYPE_TIMECODE = 5,
+} OBU_METADATA_TYPE;
+
+/*!\brief Returns string representation of OBU_TYPE.
+ *
+ * \param[in]     type            The OBU_TYPE to convert to string.
+ */
+const char *aom_obu_type_to_string(OBU_TYPE type);
+
+/*!\brief Config Options
+ *
+ * This type allows to enumerate and control options defined for control
+ * via config file at runtime.
+ */
+typedef struct cfg_options {
+  /*!\brief Reflects if ext_partition should be enabled
+   *
+   * If this value is non-zero it enabled the feature
+   */
+  unsigned int ext_partition;
+} cfg_options_t;
+
+/*!@} - end defgroup codec*/
+#ifdef __cplusplus
+}
+#endif
+#endif  // AOM_AOM_AOM_CODEC_H_

diff --git a/libav1/aom/aom_decoder.h b/libav1/aom/aom_decoder.h
new file mode 100644
index 0000000..bc6b94d
--- /dev/null
+++ b/libav1/aom/aom_decoder.h

@@ -0,0 +1,371 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AOM_AOM_DECODER_H_
+#define AOM_AOM_AOM_DECODER_H_
+
+/*!\defgroup decoder Decoder Algorithm Interface
+ * \ingroup codec
+ * This abstraction allows applications using this decoder to easily support
+ * multiple video formats with minimal code duplication. This section describes
+ * the interface common to all decoders.
+ * @{
+ */
+
+/*!\file
+ * \brief Describes the decoder algorithm interface to applications.
+ *
+ * This file describes the interface between an application and a
+ * video decoder algorithm.
+ *
+ */
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "aom/aom_codec.h"
+#include "aom/aom_frame_buffer.h"
+
+/*!\brief Current ABI version number
+ *
+ * \internal
+ * If this file is altered in any way that changes the ABI, this value
+ * must be bumped.  Examples include, but are not limited to, changing
+ * types, removing or reassigning enums, adding/removing/rearranging
+ * fields to structures
+ */
+#define AOM_DECODER_ABI_VERSION \
+  (3 + AOM_CODEC_ABI_VERSION) /**<\hideinitializer*/
+
+/*! \brief Decoder capabilities bitfield
+ *
+ *  Each decoder advertises the capabilities it supports as part of its
+ *  ::aom_codec_iface_t interface structure. Capabilities are extra interfaces
+ *  or functionality, and are not required to be supported by a decoder.
+ *
+ *  The available flags are specified by AOM_CODEC_CAP_* defines.
+ */
+#define AOM_CODEC_CAP_PUT_SLICE 0x10000 /**< Will issue put_slice callbacks */
+#define AOM_CODEC_CAP_PUT_FRAME 0x20000 /**< Will issue put_frame callbacks */
+#define AOM_CODEC_CAP_POSTPROC 0x40000  /**< Can postprocess decoded frame */
+
+/*! \brief Initialization-time Feature Enabling
+ *
+ *  Certain codec features must be known at initialization time, to allow for
+ *  proper memory allocation.
+ *
+ *  The available flags are specified by AOM_CODEC_USE_* defines.
+ */
+/*!brief Can support external frame buffers */
+#define AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER 0x200000
+
+#define AOM_CODEC_USE_POSTPROC 0x10000 /**< Postprocess decoded frame */
+
+/*!\brief Stream properties
+ *
+ * This structure is used to query or set properties of the decoded
+ * stream.
+ */
+typedef struct aom_codec_stream_info {
+  unsigned int w;                      /**< Width (or 0 for unknown/default) */
+  unsigned int h;                      /**< Height (or 0 for unknown/default) */
+  unsigned int is_kf;                  /**< Current frame is a keyframe */
+  unsigned int number_spatial_layers;  /**< Number of spatial layers */
+  unsigned int number_temporal_layers; /**< Number of temporal layers */
+  unsigned int is_annexb;              /**< Is Bitstream in Annex-B format */
+} aom_codec_stream_info_t;
+
+/* REQUIRED FUNCTIONS
+ *
+ * The following functions are required to be implemented for all decoders.
+ * They represent the base case functionality expected of all decoders.
+ */
+
+typedef struct av1_out_buffers_cb
+{
+    av1_get_decoded_buffer_cb_fn_t get_out_buffer_cb; /**< Get external output buffer */
+    av1_release_decoded_buffer_cb_fn_t release_out_buffer_cb; /**< Release external output buffer */
+    aom_notify_frame_ready_cb_fn_t notify_frame_ready_cb;
+    void* out_buffers_priv; /**< private pointer for functions above */
+} av1_out_buffers_cb_t;
+
+/*!\brief Initialization Configurations
+ *
+ * This structure is used to pass init time configuration options to the
+ * decoder.
+ */
+typedef struct aom_codec_dec_cfg {
+  unsigned int threads; /**< Maximum number of threads to use, default 1 */
+  unsigned int width;       /**< Width */
+  unsigned int height;       /**< Height */
+  unsigned int bitdepth;
+  unsigned int allow_lowbitdepth; /**< Allow use of low-bitdepth coding path */
+  cfg_options_t cfg;              /**< Options defined per config attributes */
+  void * host_memory;
+  size_t host_size;
+  av1_out_buffers_cb_t out_buffers_cb; /**< External output buffers functions*/
+  void* dx12device;
+  void* dx12command_queue;
+  void* dxPsos;
+  int tryHDR10x3;
+} aom_codec_dec_cfg_t;            /**< alias for struct aom_codec_dec_cfg */
+
+/*!\brief Initialize a decoder instance
+ *
+ * Initializes a decoder context using the given interface. Applications
+ * should call the aom_codec_dec_init convenience macro instead of this
+ * function directly, to ensure that the ABI version number parameter
+ * is properly initialized.
+ *
+ * If the library was configured with --disable-multithread, this call
+ * is not thread safe and should be guarded with a lock if being used
+ * in a multithreaded context.
+ *
+ * \param[in]    ctx     Pointer to this instance's context.
+ * \param[in]    iface   Pointer to the algorithm interface to use.
+ * \param[in]    cfg     Configuration to use, if known. May be NULL.
+ * \param[in]    flags   Bitfield of AOM_CODEC_USE_* flags
+ * \param[in]    ver     ABI version number. Must be set to
+ *                       AOM_DECODER_ABI_VERSION
+ * \retval #AOM_CODEC_OK
+ *     The decoder algorithm initialized.
+ * \retval #AOM_CODEC_MEM_ERROR
+ *     Memory allocation failed.
+ */
+aom_codec_err_t aom_codec_dec_init_ver(aom_codec_ctx_t *ctx,
+                                       aom_codec_iface_t *iface,
+                                       const aom_codec_dec_cfg_t *cfg,
+                                       aom_codec_flags_t flags, int ver);
+
+/*!\brief Convenience macro for aom_codec_dec_init_ver()
+ *
+ * Ensures the ABI version parameter is properly set.
+ */
+#define aom_codec_dec_init(ctx, iface, cfg, flags) \
+  aom_codec_dec_init_ver(ctx, iface, cfg, flags, AOM_DECODER_ABI_VERSION)
+
+/*!\brief Parse stream info from a buffer
+ *
+ * Performs high level parsing of the bitstream. Construction of a decoder
+ * context is not necessary. Can be used to determine if the bitstream is
+ * of the proper format, and to extract information from the stream.
+ *
+ * \param[in]      iface   Pointer to the algorithm interface
+ * \param[in]      data    Pointer to a block of data to parse
+ * \param[in]      data_sz Size of the data buffer
+ * \param[in,out]  si      Pointer to stream info to update. The is_annexb
+ *                         member \ref MUST be properly initialized. This
+ *                         function sets the rest of the members.
+ *
+ * \retval #AOM_CODEC_OK
+ *     Bitstream is parsable and stream information updated.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ *     One of the arguments is invalid, for example a NULL pointer.
+ * \retval #AOM_CODEC_UNSUP_BITSTREAM
+ *     The decoder didn't recognize the coded data, or the
+ *     buffer was too short.
+ */
+aom_codec_err_t aom_codec_peek_stream_info(aom_codec_iface_t *iface,
+                                           const uint8_t *data, size_t data_sz,
+                                           aom_codec_stream_info_t *si);
+
+/*!\brief Return information about the current stream.
+ *
+ * Returns information about the stream that has been parsed during decoding.
+ *
+ * \param[in]      ctx     Pointer to this instance's context
+ * \param[in,out]  si      Pointer to stream info to update.
+ *
+ * \retval #AOM_CODEC_OK
+ *     Bitstream is parsable and stream information updated.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ *     One of the arguments is invalid, for example a NULL pointer.
+ * \retval #AOM_CODEC_UNSUP_BITSTREAM
+ *     The decoder couldn't parse the submitted data.
+ */
+aom_codec_err_t aom_codec_get_stream_info(aom_codec_ctx_t *ctx,
+                                          aom_codec_stream_info_t *si);
+
+/*!\brief Decode data
+ *
+ * Processes a buffer of coded data. If the processing results in a new
+ * decoded frame becoming available, PUT_SLICE and PUT_FRAME events may be
+ * generated, as appropriate. Encoded data \ref MUST be passed in DTS (decode
+ * time stamp) order. Frames produced will always be in PTS (presentation
+ * time stamp) order.
+ *
+ * \param[in] ctx          Pointer to this instance's context
+ * \param[in] data         Pointer to this block of new coded data. If
+ *                         NULL, a AOM_CODEC_CB_PUT_FRAME event is posted
+ *                         for the previously decoded frame.
+ * \param[in] data_sz      Size of the coded data, in bytes.
+ * \param[in] user_priv    Application specific data to associate with
+ *                         this frame.
+ *
+ * \return Returns #AOM_CODEC_OK if the coded data was processed completely
+ *         and future pictures can be decoded without error. Otherwise,
+ *         see the descriptions of the other error codes in ::aom_codec_err_t
+ *         for recoverability capabilities.
+ */
+aom_codec_err_t aom_codec_decode(aom_codec_ctx_t *ctx, const uint8_t *data,
+                                 size_t data_sz, void *user_priv);
+
+/*!\brief Decoded frames iterator
+ *
+ * Iterates over a list of the frames available for display. The iterator
+ * storage should be initialized to NULL to start the iteration. Iteration is
+ * complete when this function returns NULL.
+ *
+ * The list of available frames becomes valid upon completion of the
+ * aom_codec_decode call, and remains valid until the next call to
+ * aom_codec_decode.
+ *
+ * \param[in]     ctx      Pointer to this instance's context
+ * \param[in,out] iter     Iterator storage, initialized to NULL
+ *
+ * \return Returns a pointer to an image, if one is ready for display. Frames
+ *         produced will always be in PTS (presentation time stamp) order.
+ */
+aom_image_t *aom_codec_get_frame(aom_codec_ctx_t *ctx, aom_codec_iter_t *iter);
+
+/*!\defgroup cap_put_frame Frame-Based Decoding Functions
+ *
+ * The following functions are required to be implemented for all decoders
+ * that advertise the AOM_CODEC_CAP_PUT_FRAME capability. Calling these
+ * functions
+ * for codecs that don't advertise this capability will result in an error
+ * code being returned, usually AOM_CODEC_ERROR
+ * @{
+ */
+
+/*!\brief put frame callback prototype
+ *
+ * This callback is invoked by the decoder to notify the application of
+ * the availability of decoded image data.
+ */
+typedef void (*aom_codec_put_frame_cb_fn_t)(void *user_priv,
+                                            const aom_image_t *img);
+
+/*!\brief Register for notification of frame completion.
+ *
+ * Registers a given function to be called when a decoded frame is
+ * available.
+ *
+ * \param[in] ctx          Pointer to this instance's context
+ * \param[in] cb           Pointer to the callback function
+ * \param[in] user_priv    User's private data
+ *
+ * \retval #AOM_CODEC_OK
+ *     Callback successfully registered.
+ * \retval #AOM_CODEC_ERROR
+ *     Decoder context not initialized, or algorithm not capable of
+ *     posting slice completion.
+ */
+aom_codec_err_t aom_codec_register_put_frame_cb(aom_codec_ctx_t *ctx,
+                                                aom_codec_put_frame_cb_fn_t cb,
+                                                void *user_priv);
+
+/*!@} - end defgroup cap_put_frame */
+
+/*!\defgroup cap_put_slice Slice-Based Decoding Functions
+ *
+ * The following functions are required to be implemented for all decoders
+ * that advertise the AOM_CODEC_CAP_PUT_SLICE capability. Calling these
+ * functions
+ * for codecs that don't advertise this capability will result in an error
+ * code being returned, usually AOM_CODEC_ERROR
+ * @{
+ */
+
+/*!\brief put slice callback prototype
+ *
+ * This callback is invoked by the decoder to notify the application of
+ * the availability of partially decoded image data. The
+ */
+typedef void (*aom_codec_put_slice_cb_fn_t)(void *user_priv,
+                                            const aom_image_t *img,
+                                            const aom_image_rect_t *valid,
+                                            const aom_image_rect_t *update);
+
+/*!\brief Register for notification of slice completion.
+ *
+ * Registers a given function to be called when a decoded slice is
+ * available.
+ *
+ * \param[in] ctx          Pointer to this instance's context
+ * \param[in] cb           Pointer to the callback function
+ * \param[in] user_priv    User's private data
+ *
+ * \retval #AOM_CODEC_OK
+ *     Callback successfully registered.
+ * \retval #AOM_CODEC_ERROR
+ *     Decoder context not initialized, or algorithm not capable of
+ *     posting slice completion.
+ */
+aom_codec_err_t aom_codec_register_put_slice_cb(aom_codec_ctx_t *ctx,
+                                                aom_codec_put_slice_cb_fn_t cb,
+                                                void *user_priv);
+
+/*!@} - end defgroup cap_put_slice*/
+
+/*!\defgroup cap_external_frame_buffer External Frame Buffer Functions
+ *
+ * The following section is required to be implemented for all decoders
+ * that advertise the AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER capability.
+ * Calling this function for codecs that don't advertise this capability
+ * will result in an error code being returned, usually AOM_CODEC_ERROR.
+ *
+ * \note
+ * Currently this only works with AV1.
+ * @{
+ */
+
+/*!\brief Pass in external frame buffers for the decoder to use.
+ *
+ * Registers functions to be called when libaom needs a frame buffer
+ * to decode the current frame and a function to be called when libaom does
+ * not internally reference the frame buffer. This set function must
+ * be called before the first call to decode or libaom will assume the
+ * default behavior of allocating frame buffers internally.
+ *
+ * \param[in] ctx          Pointer to this instance's context
+ * \param[in] cb_get       Pointer to the get callback function
+ * \param[in] cb_release   Pointer to the release callback function
+ * \param[in] cb_priv      Callback's private data
+ *
+ * \retval #AOM_CODEC_OK
+ *     External frame buffers will be used by libaom.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ *     One or more of the callbacks were NULL.
+ * \retval #AOM_CODEC_ERROR
+ *     Decoder context not initialized, or algorithm not capable of
+ *     using external frame buffers.
+ *
+ * \note
+ * When decoding AV1, the application may be required to pass in at least
+ * #AOM_MAXIMUM_WORK_BUFFERS external frame
+ * buffers.
+ */
+aom_codec_err_t aom_codec_set_frame_buffer_functions(
+    aom_codec_ctx_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
+    aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv);
+
+/*!@} - end defgroup cap_external_frame_buffer */
+aom_codec_err_t av1_codec_query_memory_requirements(aom_codec_dec_cfg_t *cfg);
+/*!@} - end defgroup decoder*/
+/*!\brief Precreate compute shaders.
+ *
+ */
+void* av1_create_pipeline_cache_handle(void* d3d12device, int threads);
+void av1_destroy_pipeline_cache_handle(void* handle);
+#ifdef __cplusplus
+}
+#endif
+#endif  // AOM_AOM_AOM_DECODER_H_

diff --git a/libav1/aom/aom_encoder.h b/libav1/aom/aom_encoder.h
new file mode 100644
index 0000000..f8a7cec
--- /dev/null
+++ b/libav1/aom/aom_encoder.h

@@ -0,0 +1,989 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AOM_AOM_ENCODER_H_
+#define AOM_AOM_AOM_ENCODER_H_
+
+/*!\defgroup encoder Encoder Algorithm Interface
+ * \ingroup codec
+ * This abstraction allows applications using this encoder to easily support
+ * multiple video formats with minimal code duplication. This section describes
+ * the interface common to all encoders.
+ * @{
+ */
+
+/*!\file
+ * \brief Describes the encoder algorithm interface to applications.
+ *
+ * This file describes the interface between an application and a
+ * video encoder algorithm.
+ *
+ */
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "aom/aom_codec.h"
+
+/*!\brief Current ABI version number
+ *
+ * \internal
+ * If this file is altered in any way that changes the ABI, this value
+ * must be bumped.  Examples include, but are not limited to, changing
+ * types, removing or reassigning enums, adding/removing/rearranging
+ * fields to structures
+ */
+#define AOM_ENCODER_ABI_VERSION \
+  (5 + AOM_CODEC_ABI_VERSION) /**<\hideinitializer*/
+
+/*! \brief Encoder capabilities bitfield
+ *
+ *  Each encoder advertises the capabilities it supports as part of its
+ *  ::aom_codec_iface_t interface structure. Capabilities are extra
+ *  interfaces or functionality, and are not required to be supported
+ *  by an encoder.
+ *
+ *  The available flags are specified by AOM_CODEC_CAP_* defines.
+ */
+#define AOM_CODEC_CAP_PSNR 0x10000 /**< Can issue PSNR packets */
+
+/*! Can support input images at greater than 8 bitdepth.
+ */
+#define AOM_CODEC_CAP_HIGHBITDEPTH 0x40000
+
+/*! \brief Initialization-time Feature Enabling
+ *
+ *  Certain codec features must be known at initialization time, to allow
+ *  for proper memory allocation.
+ *
+ *  The available flags are specified by AOM_CODEC_USE_* defines.
+ */
+#define AOM_CODEC_USE_PSNR 0x10000 /**< Calculate PSNR on each frame */
+/*!\brief Make the encoder output one  partition at a time. */
+#define AOM_CODEC_USE_HIGHBITDEPTH 0x40000 /**< Use high bitdepth */
+
+/*!\brief Generic fixed size buffer structure
+ *
+ * This structure is able to hold a reference to any fixed size buffer.
+ */
+typedef struct aom_fixed_buf {
+  void *buf;       /**< Pointer to the data */
+  size_t sz;       /**< Length of the buffer, in chars */
+} aom_fixed_buf_t; /**< alias for struct aom_fixed_buf */
+
+/*!\brief Time Stamp Type
+ *
+ * An integer, which when multiplied by the stream's time base, provides
+ * the absolute time of a sample.
+ */
+typedef int64_t aom_codec_pts_t;
+
+/*!\brief Compressed Frame Flags
+ *
+ * This type represents a bitfield containing information about a compressed
+ * frame that may be useful to an application. The most significant 16 bits
+ * can be used by an algorithm to provide additional detail, for example to
+ * support frame types that are codec specific (MPEG-1 D-frames for example)
+ */
+typedef uint32_t aom_codec_frame_flags_t;
+#define AOM_FRAME_IS_KEY 0x1 /**< frame is the start of a GOP */
+/*!\brief frame can be dropped without affecting the stream (no future frame
+ * depends on this one) */
+#define AOM_FRAME_IS_DROPPABLE 0x2
+/*!\brief this is an INTRA_ONLY frame */
+#define AOM_FRAME_IS_INTRAONLY 0x10
+/*!\brief this is an S-frame */
+#define AOM_FRAME_IS_SWITCH 0x20
+/*!\brief this is an error-resilient frame */
+#define AOM_FRAME_IS_ERROR_RESILIENT 0x40
+/*!\brief this is a key-frame dependent recovery-point frame */
+#define AOM_FRAME_IS_DELAYED_RANDOM_ACCESS_POINT 0x80
+
+/*!\brief Error Resilient flags
+ *
+ * These flags define which error resilient features to enable in the
+ * encoder. The flags are specified through the
+ * aom_codec_enc_cfg::g_error_resilient variable.
+ */
+typedef uint32_t aom_codec_er_flags_t;
+/*!\brief Improve resiliency against losses of whole frames */
+#define AOM_ERROR_RESILIENT_DEFAULT 0x1
+
+/*!\brief Encoder output packet variants
+ *
+ * This enumeration lists the different kinds of data packets that can be
+ * returned by calls to aom_codec_get_cx_data(). Algorithms \ref MAY
+ * extend this list to provide additional functionality.
+ */
+enum aom_codec_cx_pkt_kind {
+  AOM_CODEC_CX_FRAME_PKT,    /**< Compressed video frame */
+  AOM_CODEC_STATS_PKT,       /**< Two-pass statistics for this frame */
+  AOM_CODEC_FPMB_STATS_PKT,  /**< first pass mb statistics for this frame */
+  AOM_CODEC_PSNR_PKT,        /**< PSNR statistics for this frame */
+  AOM_CODEC_CUSTOM_PKT = 256 /**< Algorithm extensions  */
+};
+
+/*!\brief Encoder output packet
+ *
+ * This structure contains the different kinds of output data the encoder
+ * may produce while compressing a frame.
+ */
+typedef struct aom_codec_cx_pkt {
+  enum aom_codec_cx_pkt_kind kind; /**< packet variant */
+  union {
+    struct {
+      void *buf; /**< compressed data buffer */
+      size_t sz; /**< length of compressed data */
+      /*!\brief time stamp to show frame (in timebase units) */
+      aom_codec_pts_t pts;
+      /*!\brief duration to show frame (in timebase units) */
+      unsigned long duration;
+      aom_codec_frame_flags_t flags; /**< flags for this frame */
+      /*!\brief the partition id defines the decoding order of the partitions.
+       * Only applicable when "output partition" mode is enabled. First
+       * partition has id 0.*/
+      int partition_id;
+      /*!\brief size of the visible frame in this packet */
+      size_t vis_frame_size;
+    } frame;                            /**< data for compressed frame packet */
+    aom_fixed_buf_t twopass_stats;      /**< data for two-pass packet */
+    aom_fixed_buf_t firstpass_mb_stats; /**< first pass mb packet */
+    struct aom_psnr_pkt {
+      unsigned int samples[4]; /**< Number of samples, total/y/u/v */
+      uint64_t sse[4];         /**< sum squared error, total/y/u/v */
+      double psnr[4];          /**< PSNR, total/y/u/v */
+    } psnr;                    /**< data for PSNR packet */
+    aom_fixed_buf_t raw;       /**< data for arbitrary packets */
+
+    /* This packet size is fixed to allow codecs to extend this
+     * interface without having to manage storage for raw packets,
+     * i.e., if it's smaller than 128 bytes, you can store in the
+     * packet list directly.
+     */
+    char pad[128 - sizeof(enum aom_codec_cx_pkt_kind)]; /**< fixed sz */
+  } data;                                               /**< packet data */
+} aom_codec_cx_pkt_t; /**< alias for struct aom_codec_cx_pkt */
+
+/*!\brief Rational Number
+ *
+ * This structure holds a fractional value.
+ */
+typedef struct aom_rational {
+  int num;        /**< fraction numerator */
+  int den;        /**< fraction denominator */
+} aom_rational_t; /**< alias for struct aom_rational */
+
+/*!\brief Multi-pass Encoding Pass */
+enum aom_enc_pass {
+  AOM_RC_ONE_PASS,   /**< Single pass mode */
+  AOM_RC_FIRST_PASS, /**< First pass of multi-pass mode */
+  AOM_RC_LAST_PASS   /**< Final pass of multi-pass mode */
+};
+
+/*!\brief Rate control mode */
+enum aom_rc_mode {
+  AOM_VBR, /**< Variable Bit Rate (VBR) mode */
+  AOM_CBR, /**< Constant Bit Rate (CBR) mode */
+  AOM_CQ,  /**< Constrained Quality (CQ)  mode */
+  AOM_Q,   /**< Constant Quality (Q) mode */
+};
+
+/*!\brief Keyframe placement mode.
+ *
+ * This enumeration determines whether keyframes are placed automatically by
+ * the encoder or whether this behavior is disabled. Older releases of this
+ * SDK were implemented such that AOM_KF_FIXED meant keyframes were disabled.
+ * This name is confusing for this behavior, so the new symbols to be used
+ * are AOM_KF_AUTO and AOM_KF_DISABLED.
+ */
+enum aom_kf_mode {
+  AOM_KF_FIXED,       /**< deprecated, implies AOM_KF_DISABLED */
+  AOM_KF_AUTO,        /**< Encoder determines optimal placement automatically */
+  AOM_KF_DISABLED = 0 /**< Encoder does not place keyframes. */
+};
+
+/*!\brief Encoded Frame Flags
+ *
+ * This type indicates a bitfield to be passed to aom_codec_encode(), defining
+ * per-frame boolean values. By convention, bits common to all codecs will be
+ * named AOM_EFLAG_*, and bits specific to an algorithm will be named
+ * /algo/_eflag_*. The lower order 16 bits are reserved for common use.
+ */
+typedef long aom_enc_frame_flags_t;
+#define AOM_EFLAG_FORCE_KF (1 << 0) /**< Force this frame to be a keyframe */
+
+/*!\brief Encoder configuration structure
+ *
+ * This structure contains the encoder settings that have common representations
+ * across all codecs. This doesn't imply that all codecs support all features,
+ * however.
+ */
+typedef struct aom_codec_enc_cfg {
+  /*
+   * generic settings (g)
+   */
+
+  /*!\brief Algorithm specific "usage" value
+   *
+   * Algorithms may define multiple values for usage, which may convey the
+   * intent of how the application intends to use the stream. If this value
+   * is non-zero, consult the documentation for the codec to determine its
+   * meaning.
+   */
+  unsigned int g_usage;
+
+  /*!\brief Maximum number of threads to use
+   *
+   * For multi-threaded implementations, use no more than this number of
+   * threads. The codec may use fewer threads than allowed. The value
+   * 0 is equivalent to the value 1.
+   */
+  unsigned int g_threads;
+
+  /*!\brief Bitstream profile to use
+   *
+   * Some codecs support a notion of multiple bitstream profiles. Typically
+   * this maps to a set of features that are turned on or off. Often the
+   * profile to use is determined by the features of the intended decoder.
+   * Consult the documentation for the codec to determine the valid values
+   * for this parameter, or set to zero for a sane default.
+   */
+  unsigned int g_profile; /**< profile of bitstream to use */
+
+  /*!\brief Width of the frame
+   *
+   * This value identifies the presentation resolution of the frame,
+   * in pixels. Note that the frames passed as input to the encoder must
+   * have this resolution. Frames will be presented by the decoder in this
+   * resolution, independent of any spatial resampling the encoder may do.
+   */
+  unsigned int g_w;
+
+  /*!\brief Height of the frame
+   *
+   * This value identifies the presentation resolution of the frame,
+   * in pixels. Note that the frames passed as input to the encoder must
+   * have this resolution. Frames will be presented by the decoder in this
+   * resolution, independent of any spatial resampling the encoder may do.
+   */
+  unsigned int g_h;
+
+  /*!\brief Max number of frames to encode
+   *
+   */
+  unsigned int g_limit;
+
+  /*!\brief Forced maximum width of the frame
+   *
+   * If this value is non-zero then it is used to force the maximum frame
+   * width written in write_sequence_header().
+   */
+  unsigned int g_forced_max_frame_width;
+
+  /*!\brief Forced maximum height of the frame
+   *
+   * If this value is non-zero then it is used to force the maximum frame
+   * height written in write_sequence_header().
+   */
+  unsigned int g_forced_max_frame_height;
+
+  /*!\brief Bit-depth of the codec
+   *
+   * This value identifies the bit_depth of the codec,
+   * Only certain bit-depths are supported as identified in the
+   * aom_bit_depth_t enum.
+   */
+  aom_bit_depth_t g_bit_depth;
+
+  /*!\brief Bit-depth of the input frames
+   *
+   * This value identifies the bit_depth of the input frames in bits.
+   * Note that the frames passed as input to the encoder must have
+   * this bit-depth.
+   */
+  unsigned int g_input_bit_depth;
+
+  /*!\brief Stream timebase units
+   *
+   * Indicates the smallest interval of time, in seconds, used by the stream.
+   * For fixed frame rate material, or variable frame rate material where
+   * frames are timed at a multiple of a given clock (ex: video capture),
+   * the \ref RECOMMENDED method is to set the timebase to the reciprocal
+   * of the frame rate (ex: 1001/30000 for 29.970 Hz NTSC). This allows the
+   * pts to correspond to the frame number, which can be handy. For
+   * re-encoding video from containers with absolute time timestamps, the
+   * \ref RECOMMENDED method is to set the timebase to that of the parent
+   * container or multimedia framework (ex: 1/1000 for ms, as in FLV).
+   */
+  struct aom_rational g_timebase;
+
+  /*!\brief Enable error resilient modes.
+   *
+   * The error resilient bitfield indicates to the encoder which features
+   * it should enable to take measures for streaming over lossy or noisy
+   * links.
+   */
+  aom_codec_er_flags_t g_error_resilient;
+
+  /*!\brief Multi-pass Encoding Mode
+   *
+   * This value should be set to the current phase for multi-pass encoding.
+   * For single pass, set to #AOM_RC_ONE_PASS.
+   */
+  enum aom_enc_pass g_pass;
+
+  /*!\brief Allow lagged encoding
+   *
+   * If set, this value allows the encoder to consume a number of input
+   * frames before producing output frames. This allows the encoder to
+   * base decisions for the current frame on future frames. This does
+   * increase the latency of the encoding pipeline, so it is not appropriate
+   * in all situations (ex: realtime encoding).
+   *
+   * Note that this is a maximum value -- the encoder may produce frames
+   * sooner than the given limit. Set this value to 0 to disable this
+   * feature.
+   */
+  unsigned int g_lag_in_frames;
+
+  /*
+   * rate control settings (rc)
+   */
+
+  /*!\brief Temporal resampling configuration, if supported by the codec.
+   *
+   * Temporal resampling allows the codec to "drop" frames as a strategy to
+   * meet its target data rate. This can cause temporal discontinuities in
+   * the encoded video, which may appear as stuttering during playback. This
+   * trade-off is often acceptable, but for many applications is not. It can
+   * be disabled in these cases.
+   *
+   * Note that not all codecs support this feature. All aom AVx codecs do.
+   * For other codecs, consult the documentation for that algorithm.
+   *
+   * This threshold is described as a percentage of the target data buffer.
+   * When the data buffer falls below this percentage of fullness, a
+   * dropped frame is indicated. Set the threshold to zero (0) to disable
+   * this feature.
+   */
+  unsigned int rc_dropframe_thresh;
+
+  /*!\brief Mode for spatial resampling, if supported by the codec.
+   *
+   * Spatial resampling allows the codec to compress a lower resolution
+   * version of the frame, which is then upscaled by the decoder to the
+   * correct presentation resolution. This increases visual quality at
+   * low data rates, at the expense of CPU time on the encoder/decoder.
+   */
+  unsigned int rc_resize_mode;
+
+  /*!\brief Frame resize denominator.
+   *
+   * The denominator for resize to use, assuming 8 as the numerator.
+   *
+   * Valid denominators are  8 - 16 for now.
+   */
+  unsigned int rc_resize_denominator;
+
+  /*!\brief Keyframe resize denominator.
+   *
+   * The denominator for resize to use, assuming 8 as the numerator.
+   *
+   * Valid denominators are  8 - 16 for now.
+   */
+  unsigned int rc_resize_kf_denominator;
+
+  /*!\brief Frame super-resolution scaling mode.
+   *
+   * Similar to spatial resampling, frame super-resolution integrates
+   * upscaling after the encode/decode process. Taking control of upscaling and
+   * using restoration filters should allow it to outperform normal resizing.
+   *
+   * Valid values are 0 to 4 as defined in enum SUPERRES_MODE.
+   */
+  unsigned int rc_superres_mode;
+
+  /*!\brief Frame super-resolution denominator.
+   *
+   * The denominator for superres to use. If fixed it will only change if the
+   * cumulative scale change over resizing and superres is greater than 1/2;
+   * this forces superres to reduce scaling.
+   *
+   * Valid denominators are 8 to 16.
+   *
+   * Used only by SUPERRES_FIXED.
+   */
+  unsigned int rc_superres_denominator;
+
+  /*!\brief Keyframe super-resolution denominator.
+   *
+   * The denominator for superres to use. If fixed it will only change if the
+   * cumulative scale change over resizing and superres is greater than 1/2;
+   * this forces superres to reduce scaling.
+   *
+   * Valid denominators are 8 - 16 for now.
+   */
+  unsigned int rc_superres_kf_denominator;
+
+  /*!\brief Frame super-resolution q threshold.
+   *
+   * The q level threshold after which superres is used.
+   * Valid values are 1 to 63.
+   *
+   * Used only by SUPERRES_QTHRESH
+   */
+  unsigned int rc_superres_qthresh;
+
+  /*!\brief Keyframe super-resolution q threshold.
+   *
+   * The q level threshold after which superres is used for key frames.
+   * Valid values are 1 to 63.
+   *
+   * Used only by SUPERRES_QTHRESH
+   */
+  unsigned int rc_superres_kf_qthresh;
+
+  /*!\brief Rate control algorithm to use.
+   *
+   * Indicates whether the end usage of this stream is to be streamed over
+   * a bandwidth constrained link, indicating that Constant Bit Rate (CBR)
+   * mode should be used, or whether it will be played back on a high
+   * bandwidth link, as from a local disk, where higher variations in
+   * bitrate are acceptable.
+   */
+  enum aom_rc_mode rc_end_usage;
+
+  /*!\brief Two-pass stats buffer.
+   *
+   * A buffer containing all of the stats packets produced in the first
+   * pass, concatenated.
+   */
+  aom_fixed_buf_t rc_twopass_stats_in;
+
+  /*!\brief first pass mb stats buffer.
+   *
+   * A buffer containing all of the first pass mb stats packets produced
+   * in the first pass, concatenated.
+   */
+  aom_fixed_buf_t rc_firstpass_mb_stats_in;
+
+  /*!\brief Target data rate
+   *
+   * Target bandwidth to use for this stream, in kilobits per second.
+   */
+  unsigned int rc_target_bitrate;
+
+  /*
+   * quantizer settings
+   */
+
+  /*!\brief Minimum (Best Quality) Quantizer
+   *
+   * The quantizer is the most direct control over the quality of the
+   * encoded image. The range of valid values for the quantizer is codec
+   * specific. Consult the documentation for the codec to determine the
+   * values to use. To determine the range programmatically, call
+   * aom_codec_enc_config_default() with a usage value of 0.
+   */
+  unsigned int rc_min_quantizer;
+
+  /*!\brief Maximum (Worst Quality) Quantizer
+   *
+   * The quantizer is the most direct control over the quality of the
+   * encoded image. The range of valid values for the quantizer is codec
+   * specific. Consult the documentation for the codec to determine the
+   * values to use. To determine the range programmatically, call
+   * aom_codec_enc_config_default() with a usage value of 0.
+   */
+  unsigned int rc_max_quantizer;
+
+  /*
+   * bitrate tolerance
+   */
+
+  /*!\brief Rate control adaptation undershoot control
+   *
+   * This value, expressed as a percentage of the target bitrate,
+   * controls the maximum allowed adaptation speed of the codec.
+   * This factor controls the maximum amount of bits that can
+   * be subtracted from the target bitrate in order to compensate
+   * for prior overshoot.
+   *
+   * Valid values in the range 0-1000.
+   */
+  unsigned int rc_undershoot_pct;
+
+  /*!\brief Rate control adaptation overshoot control
+   *
+   * This value, expressed as a percentage of the target bitrate,
+   * controls the maximum allowed adaptation speed of the codec.
+   * This factor controls the maximum amount of bits that can
+   * be added to the target bitrate in order to compensate for
+   * prior undershoot.
+   *
+   * Valid values in the range 0-1000.
+   */
+  unsigned int rc_overshoot_pct;
+
+  /*
+   * decoder buffer model parameters
+   */
+
+  /*!\brief Decoder Buffer Size
+   *
+   * This value indicates the amount of data that may be buffered by the
+   * decoding application. Note that this value is expressed in units of
+   * time (milliseconds). For example, a value of 5000 indicates that the
+   * client will buffer (at least) 5000ms worth of encoded data. Use the
+   * target bitrate (#rc_target_bitrate) to convert to bits/bytes, if
+   * necessary.
+   */
+  unsigned int rc_buf_sz;
+
+  /*!\brief Decoder Buffer Initial Size
+   *
+   * This value indicates the amount of data that will be buffered by the
+   * decoding application prior to beginning playback. This value is
+   * expressed in units of time (milliseconds). Use the target bitrate
+   * (#rc_target_bitrate) to convert to bits/bytes, if necessary.
+   */
+  unsigned int rc_buf_initial_sz;
+
+  /*!\brief Decoder Buffer Optimal Size
+   *
+   * This value indicates the amount of data that the encoder should try
+   * to maintain in the decoder's buffer. This value is expressed in units
+   * of time (milliseconds). Use the target bitrate (#rc_target_bitrate)
+   * to convert to bits/bytes, if necessary.
+   */
+  unsigned int rc_buf_optimal_sz;
+
+  /*
+   * 2 pass rate control parameters
+   */
+
+  /*!\brief Two-pass mode CBR/VBR bias
+   *
+   * Bias, expressed on a scale of 0 to 100, for determining target size
+   * for the current frame. The value 0 indicates the optimal CBR mode
+   * value should be used. The value 100 indicates the optimal VBR mode
+   * value should be used. Values in between indicate which way the
+   * encoder should "lean."
+   */
+  unsigned int rc_2pass_vbr_bias_pct;
+
+  /*!\brief Two-pass mode per-GOP minimum bitrate
+   *
+   * This value, expressed as a percentage of the target bitrate, indicates
+   * the minimum bitrate to be used for a single GOP (aka "section")
+   */
+  unsigned int rc_2pass_vbr_minsection_pct;
+
+  /*!\brief Two-pass mode per-GOP maximum bitrate
+   *
+   * This value, expressed as a percentage of the target bitrate, indicates
+   * the maximum bitrate to be used for a single GOP (aka "section")
+   */
+  unsigned int rc_2pass_vbr_maxsection_pct;
+
+  /*
+   * keyframing settings (kf)
+   */
+
+  /*!\brief Option to enable forward reference key frame
+   *
+   */
+  int fwd_kf_enabled;
+
+  /*!\brief Keyframe placement mode
+   *
+   * This value indicates whether the encoder should place keyframes at a
+   * fixed interval, or determine the optimal placement automatically
+   * (as governed by the #kf_min_dist and #kf_max_dist parameters)
+   */
+  enum aom_kf_mode kf_mode;
+
+  /*!\brief Keyframe minimum interval
+   *
+   * This value, expressed as a number of frames, prevents the encoder from
+   * placing a keyframe nearer than kf_min_dist to the previous keyframe. At
+   * least kf_min_dist frames non-keyframes will be coded before the next
+   * keyframe. Set kf_min_dist equal to kf_max_dist for a fixed interval.
+   */
+  unsigned int kf_min_dist;
+
+  /*!\brief Keyframe maximum interval
+   *
+   * This value, expressed as a number of frames, forces the encoder to code
+   * a keyframe if one has not been coded in the last kf_max_dist frames.
+   * A value of 0 implies all frames will be keyframes. Set kf_min_dist
+   * equal to kf_max_dist for a fixed interval.
+   */
+  unsigned int kf_max_dist;
+
+  /*!\brief sframe interval
+   *
+   * This value, expressed as a number of frames, forces the encoder to code
+   * an S-Frame every sframe_dist frames.
+   */
+  unsigned int sframe_dist;
+
+  /*!\brief sframe insertion mode
+   *
+   * This value must be set to 1 or 2, and tells the encoder how to insert
+   * S-Frames. It will only have an effect if sframe_dist != 0.
+   *
+   * If altref is enabled:
+   *   - if sframe_mode == 1, the considered frame will be made into an
+   *     S-Frame only if it is an altref frame
+   *   - if sframe_mode == 2, the next altref frame will be made into an
+   *     S-Frame.
+   *
+   * Otherwise: the considered frame will be made into an S-Frame.
+   */
+  unsigned int sframe_mode;
+
+  /*!\brief Tile coding mode
+   *
+   * This value indicates the tile coding mode.
+   * A value of 0 implies a normal non-large-scale tile coding. A value of 1
+   * implies a large-scale tile coding.
+   */
+  unsigned int large_scale_tile;
+
+  /*!\brief Monochrome mode
+   *
+   * If this is nonzero, the encoder will generate a monochrome stream
+   * with no chroma planes.
+   */
+  unsigned int monochrome;
+
+  /*!\brief full_still_picture_hdr
+   *
+   * If this is nonzero, the encoder will generate a full header even for
+   * still picture encoding. if zero, a reduced header is used for still
+   * picture. This flag has no effect when a regular video with more than
+   * a single frame is encoded.
+   */
+  unsigned int full_still_picture_hdr;
+
+  /*!\brief Bitstream syntax mode
+   *
+   * This value indicates the bitstream syntax mode.
+   * A value of 0 indicates bitstream is saved as Section 5 bitstream. A value
+   * of 1 indicates the bitstream is saved in Annex-B format
+   */
+  unsigned int save_as_annexb;
+
+  /*!\brief Number of explicit tile widths specified
+   *
+   * This value indicates the number of tile widths specified
+   * A value of 0 implies no tile widths are specified.
+   * Tile widths are given in the array tile_widths[]
+   */
+  int tile_width_count;
+
+  /*!\brief Number of explicit tile heights specified
+   *
+   * This value indicates the number of tile heights specified
+   * A value of 0 implies no tile heights are specified.
+   * Tile heights are given in the array tile_heights[]
+   */
+  int tile_height_count;
+
+/*!\brief Maximum number of tile widths in tile widths array
+ *
+ * This define gives the maximum number of elements in the tile_widths array.
+ */
+#define MAX_TILE_WIDTHS 64  // maximum tile width array length
+
+  /*!\brief Array of specified tile widths
+   *
+   * This array specifies tile widths (and may be empty)
+   * The number of widths specified is given by tile_width_count
+   */
+  int tile_widths[MAX_TILE_WIDTHS];
+
+/*!\brief Maximum number of tile heights in tile heights array.
+ *
+ * This define gives the maximum number of elements in the tile_heights array.
+ */
+#define MAX_TILE_HEIGHTS 64  // maximum tile height array length
+
+  /*!\brief Array of specified tile heights
+   *
+   * This array specifies tile heights (and may be empty)
+   * The number of heights specified is given by tile_height_count
+   */
+  int tile_heights[MAX_TILE_HEIGHTS];
+
+  /*!\brief Options defined per config file
+   *
+   */
+  cfg_options_t cfg;
+} aom_codec_enc_cfg_t; /**< alias for struct aom_codec_enc_cfg */
+
+/*!\brief Initialize an encoder instance
+ *
+ * Initializes a encoder context using the given interface. Applications
+ * should call the aom_codec_enc_init convenience macro instead of this
+ * function directly, to ensure that the ABI version number parameter
+ * is properly initialized.
+ *
+ * If the library was configured with --disable-multithread, this call
+ * is not thread safe and should be guarded with a lock if being used
+ * in a multithreaded context.
+ *
+ * \param[in]    ctx     Pointer to this instance's context.
+ * \param[in]    iface   Pointer to the algorithm interface to use.
+ * \param[in]    cfg     Configuration to use, if known.
+ * \param[in]    flags   Bitfield of AOM_CODEC_USE_* flags
+ * \param[in]    ver     ABI version number. Must be set to
+ *                       AOM_ENCODER_ABI_VERSION
+ * \retval #AOM_CODEC_OK
+ *     The decoder algorithm initialized.
+ * \retval #AOM_CODEC_MEM_ERROR
+ *     Memory allocation failed.
+ */
+aom_codec_err_t aom_codec_enc_init_ver(aom_codec_ctx_t *ctx,
+                                       aom_codec_iface_t *iface,
+                                       const aom_codec_enc_cfg_t *cfg,
+                                       aom_codec_flags_t flags, int ver);
+
+/*!\brief Convenience macro for aom_codec_enc_init_ver()
+ *
+ * Ensures the ABI version parameter is properly set.
+ */
+#define aom_codec_enc_init(ctx, iface, cfg, flags) \
+  aom_codec_enc_init_ver(ctx, iface, cfg, flags, AOM_ENCODER_ABI_VERSION)
+
+/*!\brief Initialize multi-encoder instance
+ *
+ * Initializes multi-encoder context using the given interface.
+ * Applications should call the aom_codec_enc_init_multi convenience macro
+ * instead of this function directly, to ensure that the ABI version number
+ * parameter is properly initialized.
+ *
+ * \param[in]    ctx     Pointer to this instance's context.
+ * \param[in]    iface   Pointer to the algorithm interface to use.
+ * \param[in]    cfg     Configuration to use, if known.
+ * \param[in]    num_enc Total number of encoders.
+ * \param[in]    flags   Bitfield of AOM_CODEC_USE_* flags
+ * \param[in]    dsf     Pointer to down-sampling factors.
+ * \param[in]    ver     ABI version number. Must be set to
+ *                       AOM_ENCODER_ABI_VERSION
+ * \retval #AOM_CODEC_OK
+ *     The decoder algorithm initialized.
+ * \retval #AOM_CODEC_MEM_ERROR
+ *     Memory allocation failed.
+ */
+aom_codec_err_t aom_codec_enc_init_multi_ver(
+    aom_codec_ctx_t *ctx, aom_codec_iface_t *iface, aom_codec_enc_cfg_t *cfg,
+    int num_enc, aom_codec_flags_t flags, aom_rational_t *dsf, int ver);
+
+/*!\brief Convenience macro for aom_codec_enc_init_multi_ver()
+ *
+ * Ensures the ABI version parameter is properly set.
+ */
+#define aom_codec_enc_init_multi(ctx, iface, cfg, num_enc, flags, dsf) \
+  aom_codec_enc_init_multi_ver(ctx, iface, cfg, num_enc, flags, dsf,   \
+                               AOM_ENCODER_ABI_VERSION)
+
+/*!\brief Get a default configuration
+ *
+ * Initializes a encoder configuration structure with default values. Supports
+ * the notion of "usages" so that an algorithm may offer different default
+ * settings depending on the user's intended goal. This function \ref SHOULD
+ * be called by all applications to initialize the configuration structure
+ * before specializing the configuration with application specific values.
+ *
+ * \param[in]    iface     Pointer to the algorithm interface to use.
+ * \param[out]   cfg       Configuration buffer to populate.
+ * \param[in]    reserved  Must set to 0.
+ *
+ * \retval #AOM_CODEC_OK
+ *     The configuration was populated.
+ * \retval #AOM_CODEC_INCAPABLE
+ *     Interface is not an encoder interface.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ *     A parameter was NULL, or the usage value was not recognized.
+ */
+aom_codec_err_t aom_codec_enc_config_default(aom_codec_iface_t *iface,
+                                             aom_codec_enc_cfg_t *cfg,
+                                             unsigned int reserved);
+
+/*!\brief Set or change configuration
+ *
+ * Reconfigures an encoder instance according to the given configuration.
+ *
+ * \param[in]    ctx     Pointer to this instance's context
+ * \param[in]    cfg     Configuration buffer to use
+ *
+ * \retval #AOM_CODEC_OK
+ *     The configuration was populated.
+ * \retval #AOM_CODEC_INCAPABLE
+ *     Interface is not an encoder interface.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ *     A parameter was NULL, or the usage value was not recognized.
+ */
+aom_codec_err_t aom_codec_enc_config_set(aom_codec_ctx_t *ctx,
+                                         const aom_codec_enc_cfg_t *cfg);
+
+/*!\brief Get global stream headers
+ *
+ * Retrieves a stream level global header packet, if supported by the codec.
+ * Calls to this function should be deferred until all configuration information
+ * has been passed to libaom. Otherwise the global header data may be
+ * invalidated by additional configuration changes.
+ *
+ * The AV1 implementation of this function returns an OBU. The OBU returned is
+ * in Low Overhead Bitstream Format. Specifically, the obu_has_size_field bit is
+ * set, and the buffer contains the obu_size field for the returned OBU.
+ *
+ * \param[in]    ctx     Pointer to this instance's context
+ *
+ * \retval NULL
+ *     Encoder does not support global header, or an error occurred while
+ *     generating the global header.
+ *
+ * \retval Non-NULL
+ *     Pointer to buffer containing global header packet. The caller owns the
+ *     memory associated with this buffer, and must free the 'buf' member of the
+ *     aom_fixed_buf_t as well as the aom_fixed_buf_t pointer. Memory returned
+ *     must be freed via call to free().
+ */
+aom_fixed_buf_t *aom_codec_get_global_headers(aom_codec_ctx_t *ctx);
+
+/*!\brief usage parameter analogous to AV1 GOOD QUALITY mode. */
+#define AOM_USAGE_GOOD_QUALITY (0)
+/*!\brief usage parameter analogous to AV1 REALTIME mode. */
+#define AOM_USAGE_REALTIME (1)
+
+/*!\brief Encode a frame
+ *
+ * Encodes a video frame at the given "presentation time." The presentation
+ * time stamp (PTS) \ref MUST be strictly increasing.
+ *
+ * When the last frame has been passed to the encoder, this function should
+ * continue to be called, with the img parameter set to NULL. This will
+ * signal the end-of-stream condition to the encoder and allow it to encode
+ * any held buffers. Encoding is complete when aom_codec_encode() is called
+ * and aom_codec_get_cx_data() returns no data.
+ *
+ * \param[in]    ctx       Pointer to this instance's context
+ * \param[in]    img       Image data to encode, NULL to flush.
+ * \param[in]    pts       Presentation time stamp, in timebase units.
+ * \param[in]    duration  Duration to show frame, in timebase units.
+ * \param[in]    flags     Flags to use for encoding this frame.
+ *
+ * \retval #AOM_CODEC_OK
+ *     The configuration was populated.
+ * \retval #AOM_CODEC_INCAPABLE
+ *     Interface is not an encoder interface.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ *     A parameter was NULL, the image format is unsupported, etc.
+ */
+aom_codec_err_t aom_codec_encode(aom_codec_ctx_t *ctx, const aom_image_t *img,
+                                 aom_codec_pts_t pts, unsigned long duration,
+                                 aom_enc_frame_flags_t flags);
+
+/*!\brief Set compressed data output buffer
+ *
+ * Sets the buffer that the codec should output the compressed data
+ * into. This call effectively sets the buffer pointer returned in the
+ * next AOM_CODEC_CX_FRAME_PKT packet. Subsequent packets will be
+ * appended into this buffer. The buffer is preserved across frames,
+ * so applications must periodically call this function after flushing
+ * the accumulated compressed data to disk or to the network to reset
+ * the pointer to the buffer's head.
+ *
+ * `pad_before` bytes will be skipped before writing the compressed
+ * data, and `pad_after` bytes will be appended to the packet. The size
+ * of the packet will be the sum of the size of the actual compressed
+ * data, pad_before, and pad_after. The padding bytes will be preserved
+ * (not overwritten).
+ *
+ * Note that calling this function does not guarantee that the returned
+ * compressed data will be placed into the specified buffer. In the
+ * event that the encoded data will not fit into the buffer provided,
+ * the returned packet \ref MAY point to an internal buffer, as it would
+ * if this call were never used. In this event, the output packet will
+ * NOT have any padding, and the application must free space and copy it
+ * to the proper place. This is of particular note in configurations
+ * that may output multiple packets for a single encoded frame (e.g., lagged
+ * encoding) or if the application does not reset the buffer periodically.
+ *
+ * Applications may restore the default behavior of the codec providing
+ * the compressed data buffer by calling this function with a NULL
+ * buffer.
+ *
+ * Applications \ref MUSTNOT call this function during iteration of
+ * aom_codec_get_cx_data().
+ *
+ * \param[in]    ctx         Pointer to this instance's context
+ * \param[in]    buf         Buffer to store compressed data into
+ * \param[in]    pad_before  Bytes to skip before writing compressed data
+ * \param[in]    pad_after   Bytes to skip after writing compressed data
+ *
+ * \retval #AOM_CODEC_OK
+ *     The buffer was set successfully.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ *     A parameter was NULL, the image format is unsupported, etc.
+ */
+aom_codec_err_t aom_codec_set_cx_data_buf(aom_codec_ctx_t *ctx,
+                                          const aom_fixed_buf_t *buf,
+                                          unsigned int pad_before,
+                                          unsigned int pad_after);
+
+/*!\brief Encoded data iterator
+ *
+ * Iterates over a list of data packets to be passed from the encoder to the
+ * application. The different kinds of packets available are enumerated in
+ * #aom_codec_cx_pkt_kind.
+ *
+ * #AOM_CODEC_CX_FRAME_PKT packets should be passed to the application's
+ * muxer. Multiple compressed frames may be in the list.
+ * #AOM_CODEC_STATS_PKT packets should be appended to a global buffer.
+ *
+ * The application \ref MUST silently ignore any packet kinds that it does
+ * not recognize or support.
+ *
+ * The data buffers returned from this function are only guaranteed to be
+ * valid until the application makes another call to any aom_codec_* function.
+ *
+ * \param[in]     ctx      Pointer to this instance's context
+ * \param[in,out] iter     Iterator storage, initialized to NULL
+ *
+ * \return Returns a pointer to an output data packet (compressed frame data,
+ *         two-pass statistics, etc.) or NULL to signal end-of-list.
+ *
+ */
+const aom_codec_cx_pkt_t *aom_codec_get_cx_data(aom_codec_ctx_t *ctx,
+                                                aom_codec_iter_t *iter);
+
+/*!\brief Get Preview Frame
+ *
+ * Returns an image that can be used as a preview. Shows the image as it would
+ * exist at the decompressor. The application \ref MUST NOT write into this
+ * image buffer.
+ *
+ * \param[in]     ctx      Pointer to this instance's context
+ *
+ * \return Returns a pointer to a preview image, or NULL if no image is
+ *         available.
+ *
+ */
+const aom_image_t *aom_codec_get_preview_frame(aom_codec_ctx_t *ctx);
+
+/*!@} - end defgroup encoder*/
+#ifdef __cplusplus
+}
+#endif
+#endif  // AOM_AOM_AOM_ENCODER_H_

diff --git a/libav1/aom/aom_frame_buffer.h b/libav1/aom/aom_frame_buffer.h
new file mode 100644
index 0000000..95af47f
--- /dev/null
+++ b/libav1/aom/aom_frame_buffer.h

@@ -0,0 +1,117 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_AOM_FRAME_BUFFER_H_
+#define AOM_AOM_AOM_FRAME_BUFFER_H_
+
+/*!\file
+ * \brief Describes the decoder external frame buffer interface.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "aom/aom_integer.h"
+
+/*!\brief The maximum number of work buffers used by libaom.
+ *  Support maximum 4 threads to decode video in parallel.
+ *  Each thread will use one work buffer.
+ * TODO(hkuang): Add support to set number of worker threads dynamically.
+ */
+#define AOM_MAXIMUM_WORK_BUFFERS 8
+
+/*!\brief The maximum number of reference buffers that a AV1 encoder may use.
+ */
+#define AOM_MAXIMUM_REF_BUFFERS 8
+
+/*!\brief External frame buffer
+ *
+ * This structure holds allocated frame buffers used by the decoder.
+ */
+typedef struct aom_codec_frame_buffer {
+  uint8_t *data; /**< Pointer to the data buffer */
+  size_t size;   /**< Size of data in bytes */
+  void *priv;    /**< Frame's private data */
+} aom_codec_frame_buffer_t;
+
+typedef struct av1_decoded_frame_buffer {
+    void *dx12_buffer; /**< Pointer to the data buffer */
+    void *buffer_host_ptr;
+    size_t buffer_size;
+    void *dx12_texture[3];   /**< Size of data in bytes */
+    void *priv;    /**< Frame's private data */
+} av1_decoded_frame_buffer_t;
+
+typedef enum
+{
+    fbtNone = -1,
+    fbt8bit = 0,
+    fbt10bit,
+    fbt10x3
+} frame_buffer_type;
+
+/*!\brief get frame buffer callback prototype
+ *
+ * This callback is invoked by the decoder to retrieve data for the frame
+ * buffer in order for the decode call to complete. The callback must
+ * allocate at least min_size in bytes and assign it to fb->data. The callback
+ * must zero out all the data allocated. Then the callback must set fb->size
+ * to the allocated size. The application does not need to align the allocated
+ * data. The callback is triggered when the decoder needs a frame buffer to
+ * decode a compressed image into. This function may be called more than once
+ * for every call to aom_codec_decode. The application may set fb->priv to
+ * some data which will be passed back in the aom_image_t and the release
+ * function call. |fb| is guaranteed to not be NULL. On success the callback
+ * must return 0. Any failure the callback must return a value less than 0.
+ *
+ * \param[in] priv         Callback's private data
+ * \param[in] new_size     Size in bytes needed by the buffer
+ * \param[in,out] fb       Pointer to aom_codec_frame_buffer_t
+ */
+typedef int (*aom_get_frame_buffer_cb_fn_t)(void *priv, size_t min_size,
+                                            av1_decoded_frame_buffer_t *fb);
+
+/*!\brief release frame buffer callback prototype
+ *
+ * This callback is invoked by the decoder when the frame buffer is not
+ * referenced by any other buffers. |fb| is guaranteed to not be NULL. On
+ * success the callback must return 0. Any failure the callback must return
+ * a value less than 0.
+ *
+ * \param[in] priv         Callback's private data
+ * \param[in] fb           Pointer to aom_codec_frame_buffer_t
+ */
+typedef int (*aom_release_frame_buffer_cb_fn_t)(void *priv,
+                                                aom_codec_frame_buffer_t *fb);
+
+typedef int(*av1_get_decoded_buffer_cb_fn_t)(void *priv, size_t buf_size, uint16_t w, uint16_t h, 
+    frame_buffer_type fb_type, av1_decoded_frame_buffer_t *fb);
+typedef int(*av1_release_decoded_buffer_cb_fn_t)(void *priv, void* buffer_priv);
+
+/*!\brief notify out frame ready callback prototype
+ *
+ * This callback is invoked by the decoder when next frame is decoded.
+ If
+ *
+ * \param[in] priv         Callback's private data
+ * \param[in] img          decoded frame. Zero means eof
+*/
+typedef int(*aom_notify_frame_ready_cb_fn_t)(void *priv,
+    void *img);
+
+
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_AOM_FRAME_BUFFER_H_

diff --git a/libav1/aom/aom_image.h b/libav1/aom/aom_image.h
new file mode 100644
index 0000000..2c72527
--- /dev/null
+++ b/libav1/aom/aom_image.h

@@ -0,0 +1,332 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\file
+ * \brief Describes the aom image descriptor and associated operations
+ *
+ */
+#ifndef AOM_AOM_AOM_IMAGE_H_
+#define AOM_AOM_AOM_IMAGE_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "aom/aom_integer.h"
+
+/*!\brief Current ABI version number
+ *
+ * \internal
+ * If this file is altered in any way that changes the ABI, this value
+ * must be bumped.  Examples include, but are not limited to, changing
+ * types, removing or reassigning enums, adding/removing/rearranging
+ * fields to structures
+ */
+#define AOM_IMAGE_ABI_VERSION (5) /**<\hideinitializer*/
+
+#define AOM_IMG_FMT_PLANAR 0x100  /**< Image is a planar format. */
+#define AOM_IMG_FMT_UV_FLIP 0x200 /**< V plane precedes U in memory. */
+/** 0x400 used to signal alpha channel, skipping for backwards compatibility. */
+#define AOM_IMG_FMT_HIGHBITDEPTH 0x800 /**< Image uses 16bit framebuffer. */
+
+/*!\brief List of supported image formats */
+typedef enum aom_img_fmt {
+  AOM_IMG_FMT_NONE,
+  AOM_IMG_FMT_YV12 =
+      AOM_IMG_FMT_PLANAR | AOM_IMG_FMT_UV_FLIP | 1, /**< planar YVU */
+  AOM_IMG_FMT_I420 = AOM_IMG_FMT_PLANAR | 2,
+  AOM_IMG_FMT_AOMYV12 = AOM_IMG_FMT_PLANAR | AOM_IMG_FMT_UV_FLIP |
+                        3, /** < planar 4:2:0 format with aom color space */
+  AOM_IMG_FMT_AOMI420 = AOM_IMG_FMT_PLANAR | 4,
+  AOM_IMG_FMT_I422 = AOM_IMG_FMT_PLANAR | 5,
+  AOM_IMG_FMT_I444 = AOM_IMG_FMT_PLANAR | 6,
+  AOM_IMG_FMT_I42016 = AOM_IMG_FMT_I420 | AOM_IMG_FMT_HIGHBITDEPTH,
+  AOM_IMG_FMT_YV1216 = AOM_IMG_FMT_YV12 | AOM_IMG_FMT_HIGHBITDEPTH,
+  AOM_IMG_FMT_I42216 = AOM_IMG_FMT_I422 | AOM_IMG_FMT_HIGHBITDEPTH,
+  AOM_IMG_FMT_I44416 = AOM_IMG_FMT_I444 | AOM_IMG_FMT_HIGHBITDEPTH,
+} aom_img_fmt_t; /**< alias for enum aom_img_fmt */
+
+/*!\brief List of supported color primaries */
+typedef enum aom_color_primaries {
+  AOM_CICP_CP_RESERVED_0 = 0,  /**< For future use */
+  AOM_CICP_CP_BT_709 = 1,      /**< BT.709 */
+  AOM_CICP_CP_UNSPECIFIED = 2, /**< Unspecified */
+  AOM_CICP_CP_RESERVED_3 = 3,  /**< For future use */
+  AOM_CICP_CP_BT_470_M = 4,    /**< BT.470 System M (historical) */
+  AOM_CICP_CP_BT_470_B_G = 5,  /**< BT.470 System B, G (historical) */
+  AOM_CICP_CP_BT_601 = 6,      /**< BT.601 */
+  AOM_CICP_CP_SMPTE_240 = 7,   /**< SMPTE 240 */
+  AOM_CICP_CP_GENERIC_FILM =
+      8, /**< Generic film (color filters using illuminant C) */
+  AOM_CICP_CP_BT_2020 = 9,      /**< BT.2020, BT.2100 */
+  AOM_CICP_CP_XYZ = 10,         /**< SMPTE 428 (CIE 1921 XYZ) */
+  AOM_CICP_CP_SMPTE_431 = 11,   /**< SMPTE RP 431-2 */
+  AOM_CICP_CP_SMPTE_432 = 12,   /**< SMPTE EG 432-1  */
+  AOM_CICP_CP_RESERVED_13 = 13, /**< For future use (values 13 - 21)  */
+  AOM_CICP_CP_EBU_3213 = 22,    /**< EBU Tech. 3213-E  */
+  AOM_CICP_CP_RESERVED_23 = 23  /**< For future use (values 23 - 255)  */
+} aom_color_primaries_t;        /**< alias for enum aom_color_primaries */
+
+/*!\brief List of supported transfer functions */
+typedef enum aom_transfer_characteristics {
+  AOM_CICP_TC_RESERVED_0 = 0,  /**< For future use */
+  AOM_CICP_TC_BT_709 = 1,      /**< BT.709 */
+  AOM_CICP_TC_UNSPECIFIED = 2, /**< Unspecified */
+  AOM_CICP_TC_RESERVED_3 = 3,  /**< For future use */
+  AOM_CICP_TC_BT_470_M = 4,    /**< BT.470 System M (historical)  */
+  AOM_CICP_TC_BT_470_B_G = 5,  /**< BT.470 System B, G (historical) */
+  AOM_CICP_TC_BT_601 = 6,      /**< BT.601 */
+  AOM_CICP_TC_SMPTE_240 = 7,   /**< SMPTE 240 M */
+  AOM_CICP_TC_LINEAR = 8,      /**< Linear */
+  AOM_CICP_TC_LOG_100 = 9,     /**< Logarithmic (100 : 1 range) */
+  AOM_CICP_TC_LOG_100_SQRT10 =
+      10,                     /**< Logarithmic (100 * Sqrt(10) : 1 range) */
+  AOM_CICP_TC_IEC_61966 = 11, /**< IEC 61966-2-4 */
+  AOM_CICP_TC_BT_1361 = 12,   /**< BT.1361 */
+  AOM_CICP_TC_SRGB = 13,      /**< sRGB or sYCC*/
+  AOM_CICP_TC_BT_2020_10_BIT = 14, /**< BT.2020 10-bit systems */
+  AOM_CICP_TC_BT_2020_12_BIT = 15, /**< BT.2020 12-bit systems */
+  AOM_CICP_TC_SMPTE_2084 = 16,     /**< SMPTE ST 2084, ITU BT.2100 PQ */
+  AOM_CICP_TC_SMPTE_428 = 17,      /**< SMPTE ST 428 */
+  AOM_CICP_TC_HLG = 18,            /**< BT.2100 HLG, ARIB STD-B67 */
+  AOM_CICP_TC_RESERVED_19 = 19     /**< For future use (values 19-255) */
+} aom_transfer_characteristics_t;  /**< alias for enum aom_transfer_function */
+
+/*!\brief List of supported matrix coefficients */
+typedef enum aom_matrix_coefficients {
+  AOM_CICP_MC_IDENTITY = 0,    /**< Identity matrix */
+  AOM_CICP_MC_BT_709 = 1,      /**< BT.709 */
+  AOM_CICP_MC_UNSPECIFIED = 2, /**< Unspecified */
+  AOM_CICP_MC_RESERVED_3 = 3,  /**< For future use */
+  AOM_CICP_MC_FCC = 4,         /**< US FCC 73.628 */
+  AOM_CICP_MC_BT_470_B_G = 5,  /**< BT.470 System B, G (historical) */
+  AOM_CICP_MC_BT_601 = 6,      /**< BT.601 */
+  AOM_CICP_MC_SMPTE_240 = 7,   /**< SMPTE 240 M */
+  AOM_CICP_MC_SMPTE_YCGCO = 8, /**< YCgCo */
+  AOM_CICP_MC_BT_2020_NCL =
+      9, /**< BT.2020 non-constant luminance, BT.2100 YCbCr  */
+  AOM_CICP_MC_BT_2020_CL = 10, /**< BT.2020 constant luminance */
+  AOM_CICP_MC_SMPTE_2085 = 11, /**< SMPTE ST 2085 YDzDx */
+  AOM_CICP_MC_CHROMAT_NCL =
+      12, /**< Chromaticity-derived non-constant luminance */
+  AOM_CICP_MC_CHROMAT_CL = 13, /**< Chromaticity-derived constant luminance */
+  AOM_CICP_MC_ICTCP = 14,      /**< BT.2100 ICtCp */
+  AOM_CICP_MC_RESERVED_15 = 15 /**< For future use (values 15-255)  */
+} aom_matrix_coefficients_t;
+
+/*!\brief List of supported color range */
+typedef enum aom_color_range {
+  AOM_CR_STUDIO_RANGE = 0, /**< Y [16..235], UV [16..240] */
+  AOM_CR_FULL_RANGE = 1    /**< YUV/RGB [0..255] */
+} aom_color_range_t;       /**< alias for enum aom_color_range */
+
+/*!\brief List of chroma sample positions */
+typedef enum aom_chroma_sample_position {
+  AOM_CSP_UNKNOWN = 0,          /**< Unknown */
+  AOM_CSP_VERTICAL = 1,         /**< Horizontally co-located with luma(0, 0)*/
+                                /**< sample, between two vertical samples */
+  AOM_CSP_COLOCATED = 2,        /**< Co-located with luma(0, 0) sample */
+  AOM_CSP_RESERVED = 3          /**< Reserved value */
+} aom_chroma_sample_position_t; /**< alias for enum aom_transfer_function */
+
+/**\brief Image Descriptor */
+typedef struct aom_image {
+  aom_img_fmt_t fmt;                 /**< Image Format */
+  aom_color_primaries_t cp;          /**< CICP Color Primaries */
+  aom_transfer_characteristics_t tc; /**< CICP Transfer Characteristics */
+  aom_matrix_coefficients_t mc;      /**< CICP Matrix Coefficients */
+  int monochrome;                    /**< Whether image is monochrome */
+  aom_chroma_sample_position_t csp;  /**< chroma sample position */
+  aom_color_range_t range;           /**< Color Range */
+
+  /* Image storage dimensions */
+  unsigned int w;         /**< Stored image width */
+  unsigned int h;         /**< Stored image height */
+  unsigned int bit_depth; /**< Stored image bit-depth */
+
+  /* Image display dimensions */
+  unsigned int d_w; /**< Displayed image width */
+  unsigned int d_h; /**< Displayed image height */
+
+  /* Image intended rendering dimensions */
+  unsigned int r_w; /**< Intended rendering image width */
+  unsigned int r_h; /**< Intended rendering image height */
+
+  /* Chroma subsampling info */
+  unsigned int x_chroma_shift; /**< subsampling order, X */
+  unsigned int y_chroma_shift; /**< subsampling order, Y */
+
+/* Image data pointers. */
+#define AOM_PLANE_PACKED 0  /**< To be used for all packed formats */
+#define AOM_PLANE_Y 0       /**< Y (Luminance) plane */
+#define AOM_PLANE_U 1       /**< U (Chroma) plane */
+#define AOM_PLANE_V 2       /**< V (Chroma) plane */
+  unsigned char *planes[3]; /**< pointer to the top left pixel for each plane */
+  int stride[3];            /**< stride between rows for each plane */
+  size_t sz;                /**< data size */
+
+  int bps; /**< bits per sample (for packed formats) */
+
+  int temporal_id; /**< Temporal layer Id of image */
+  int spatial_id;  /**< Spatial layer Id of image */
+
+  /*!\brief The following member may be set by the application to associate
+   * data with this image.
+   */
+  void *user_priv;
+
+  /* The following members should be treated as private. */
+  unsigned char *img_data; /**< private */
+  int img_data_owner;      /**< private */
+  int self_allocd;         /**< private */
+
+  void *fb_priv; /**< Frame buffer data associated with the image. */
+  void *fb2_priv; /**< external frame buffer data associated with the image. */
+  int is_hdr10x3; /**< 1010102 packed format (for hdr only). */
+} aom_image_t;   /**< alias for struct aom_image */
+
+/**\brief Representation of a rectangle on a surface */
+typedef struct aom_image_rect {
+  unsigned int x;   /**< leftmost column */
+  unsigned int y;   /**< topmost row */
+  unsigned int w;   /**< width */
+  unsigned int h;   /**< height */
+} aom_image_rect_t; /**< alias for struct aom_image_rect */
+
+/*!\brief Open a descriptor, allocating storage for the underlying image
+ *
+ * Returns a descriptor for storing an image of the given format. The
+ * storage for the descriptor is allocated on the heap.
+ *
+ * \param[in]    img       Pointer to storage for descriptor. If this parameter
+ *                         is NULL, the storage for the descriptor will be
+ *                         allocated on the heap.
+ * \param[in]    fmt       Format for the image
+ * \param[in]    d_w       Width of the image
+ * \param[in]    d_h       Height of the image
+ * \param[in]    align     Alignment, in bytes, of the image buffer and
+ *                         each row in the image(stride).
+ *
+ * \return Returns a pointer to the initialized image descriptor. If the img
+ *         parameter is non-null, the value of the img parameter will be
+ *         returned.
+ */
+aom_image_t *aom_img_alloc(aom_image_t *img, aom_img_fmt_t fmt,
+                           unsigned int d_w, unsigned int d_h,
+                           unsigned int align);
+
+/*!\brief Open a descriptor, using existing storage for the underlying image
+ *
+ * Returns a descriptor for storing an image of the given format. The
+ * storage for descriptor has been allocated elsewhere, and a descriptor is
+ * desired to "wrap" that storage.
+ *
+ * \param[in]    img       Pointer to storage for descriptor. If this parameter
+ *                         is NULL, the storage for the descriptor will be
+ *                         allocated on the heap.
+ * \param[in]    fmt       Format for the image
+ * \param[in]    d_w       Width of the image
+ * \param[in]    d_h       Height of the image
+ * \param[in]    align     Alignment, in bytes, of each row in the image.
+ * \param[in]    img_data  Storage to use for the image
+ *
+ * \return Returns a pointer to the initialized image descriptor. If the img
+ *         parameter is non-null, the value of the img parameter will be
+ *         returned.
+ */
+aom_image_t *aom_img_wrap(aom_image_t *img, aom_img_fmt_t fmt, unsigned int d_w,
+                          unsigned int d_h, unsigned int align,
+                          unsigned char *img_data);
+
+/*!\brief Open a descriptor, allocating storage for the underlying image with a
+ * border
+ *
+ * Returns a descriptor for storing an image of the given format and its
+ * borders. The storage for the descriptor is allocated on the heap.
+ *
+ * \param[in]    img        Pointer to storage for descriptor. If this parameter
+ *                          is NULL, the storage for the descriptor will be
+ *                          allocated on the heap.
+ * \param[in]    fmt        Format for the image
+ * \param[in]    d_w        Width of the image
+ * \param[in]    d_h        Height of the image
+ * \param[in]    align      Alignment, in bytes, of the image buffer and
+ *                          each row in the image(stride).
+ * \param[in]    size_align Alignment, in bytes, of the image width and height.
+ * \param[in]    border     A border that is padded on four sides of the image.
+ *
+ * \return Returns a pointer to the initialized image descriptor. If the img
+ *         parameter is non-null, the value of the img parameter will be
+ *         returned.
+ */
+aom_image_t *aom_img_alloc_with_border(aom_image_t *img, aom_img_fmt_t fmt,
+                                       unsigned int d_w, unsigned int d_h,
+                                       unsigned int align,
+                                       unsigned int size_align,
+                                       unsigned int border);
+
+/*!\brief Set the rectangle identifying the displayed portion of the image
+ *
+ * Updates the displayed rectangle (aka viewport) on the image surface to
+ * match the specified coordinates and size.
+ *
+ * \param[in]    img       Image descriptor
+ * \param[in]    x         leftmost column
+ * \param[in]    y         topmost row
+ * \param[in]    w         width
+ * \param[in]    h         height
+ * \param[in]    border    A border that is padded on four sides of the image.
+ *
+ * \return 0 if the requested rectangle is valid, nonzero otherwise.
+ */
+int aom_img_set_rect(aom_image_t *img, unsigned int x, unsigned int y,
+                     unsigned int w, unsigned int h, unsigned int border);
+
+/*!\brief Flip the image vertically (top for bottom)
+ *
+ * Adjusts the image descriptor's pointers and strides to make the image
+ * be referenced upside-down.
+ *
+ * \param[in]    img       Image descriptor
+ */
+void aom_img_flip(aom_image_t *img);
+
+/*!\brief Close an image descriptor
+ *
+ * Frees all allocated storage associated with an image descriptor.
+ *
+ * \param[in]    img       Image descriptor
+ */
+void aom_img_free(aom_image_t *img);
+
+/*!\brief Get the width of a plane
+ *
+ * Get the width of a plane of an image
+ *
+ * \param[in]    img       Image descriptor
+ * \param[in]    plane     Plane index
+ */
+int aom_img_plane_width(const aom_image_t *img, int plane);
+
+/*!\brief Get the height of a plane
+ *
+ * Get the height of a plane of an image
+ *
+ * \param[in]    img       Image descriptor
+ * \param[in]    plane     Plane index
+ */
+int aom_img_plane_height(const aom_image_t *img, int plane);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_AOM_IMAGE_H_

diff --git a/libav1/aom/aom_integer.h b/libav1/aom/aom_integer.h
new file mode 100644
index 0000000..90263bd
--- /dev/null
+++ b/libav1/aom/aom_integer.h

@@ -0,0 +1,106 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AOM_AOM_INTEGER_H_
+#define AOM_AOM_AOM_INTEGER_H_
+
+/* get ptrdiff_t, size_t, wchar_t, NULL */
+#include <stddef.h>
+
+#if defined(_MSC_VER)
+#define AOM_FORCE_INLINE __forceinline
+#define AOM_INLINE __inline
+#else
+#define AOM_FORCE_INLINE __inline__ __attribute__((always_inline))
+// TODO(jbb): Allow a way to force inline off for older compilers.
+#define AOM_INLINE inline
+#endif
+
+#if defined(AOM_EMULATE_INTTYPES)
+typedef signed char int8_t;
+typedef signed short int16_t;
+typedef signed int int32_t;
+
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef unsigned int uint32_t;
+
+#ifndef _UINTPTR_T_DEFINED
+typedef size_t uintptr_t;
+#endif
+
+#else
+
+/* Most platforms have the C99 standard integer types. */
+
+#if defined(__cplusplus)
+#if !defined(__STDC_FORMAT_MACROS)
+#define __STDC_FORMAT_MACROS
+#endif
+#if !defined(__STDC_LIMIT_MACROS)
+#define __STDC_LIMIT_MACROS
+#endif
+#endif  // __cplusplus
+
+#include <stdint.h>
+
+#endif
+
+/* VS2010 defines stdint.h, but not inttypes.h */
+#if defined(_MSC_VER) && _MSC_VER < 1800
+#define PRId64 "I64d"
+#else
+#include <inttypes.h>
+#endif
+
+#if !defined(INT8_MAX)
+#define INT8_MAX 127
+#endif
+
+#if !defined(INT32_MAX)
+#define INT32_MAX 2147483647
+#endif
+
+#if !defined(INT32_MIN)
+#define INT32_MIN (-2147483647 - 1)
+#endif
+
+#define NELEMENTS(x) (int)(sizeof(x) / sizeof(x[0]))
+
+#if defined(__cplusplus)
+extern "C" {
+#endif  // __cplusplus
+
+// Returns size of uint64_t when encoded using LEB128.
+size_t aom_uleb_size_in_bytes(uint64_t value);
+
+// Returns 0 on success, -1 on decode failure.
+// On success, 'value' stores the decoded LEB128 value and 'length' stores
+// the number of bytes decoded.
+int aom_uleb_decode(const uint8_t *buffer, size_t available, uint64_t *value,
+                    size_t *length);
+
+// Encodes LEB128 integer. Returns 0 when successful, and -1 upon failure.
+int aom_uleb_encode(uint64_t value, size_t available, uint8_t *coded_value,
+                    size_t *coded_size);
+
+// Encodes LEB128 integer to size specified. Returns 0 when successful, and -1
+// upon failure.
+// Note: This will write exactly pad_to_size bytes; if the value cannot be
+// encoded in this many bytes, then this will fail.
+int aom_uleb_encode_fixed_size(uint64_t value, size_t available,
+                               size_t pad_to_size, uint8_t *coded_value,
+                               size_t *coded_size);
+
+#if defined(__cplusplus)
+}  // extern "C"
+#endif  // __cplusplus
+
+#endif  // AOM_AOM_AOM_INTEGER_H_

diff --git a/libav1/aom/aomcx.h b/libav1/aom/aomcx.h
new file mode 100644
index 0000000..cb49c48
--- /dev/null
+++ b/libav1/aom/aomcx.h

@@ -0,0 +1,1422 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AOM_AOMCX_H_
+#define AOM_AOM_AOMCX_H_
+
+/*!\defgroup aom_encoder AOMedia AOM/AV1 Encoder
+ * \ingroup aom
+ *
+ * @{
+ */
+#include "aom/aom.h"
+#include "aom/aom_encoder.h"
+
+/*!\file
+ * \brief Provides definitions for using AOM or AV1 encoder algorithm within the
+ *        aom Codec Interface.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*!\name Algorithm interface for AV1
+ *
+ * This interface provides the capability to encode raw AV1 streams.
+ * @{
+ */
+extern aom_codec_iface_t aom_codec_av1_cx_algo;
+extern aom_codec_iface_t *aom_codec_av1_cx(void);
+/*!@} - end algorithm interface member group*/
+
+/*
+ * Algorithm Flags
+ */
+
+/*!\brief Don't reference the last frame
+ *
+ * When this flag is set, the encoder will not use the last frame as a
+ * predictor. When not set, the encoder will choose whether to use the
+ * last frame or not automatically.
+ */
+#define AOM_EFLAG_NO_REF_LAST (1 << 16)
+/*!\brief Don't reference the last2 frame
+ *
+ * When this flag is set, the encoder will not use the last2 frame as a
+ * predictor. When not set, the encoder will choose whether to use the
+ * last2 frame or not automatically.
+ */
+#define AOM_EFLAG_NO_REF_LAST2 (1 << 17)
+/*!\brief Don't reference the last3 frame
+ *
+ * When this flag is set, the encoder will not use the last3 frame as a
+ * predictor. When not set, the encoder will choose whether to use the
+ * last3 frame or not automatically.
+ */
+#define AOM_EFLAG_NO_REF_LAST3 (1 << 18)
+/*!\brief Don't reference the golden frame
+ *
+ * When this flag is set, the encoder will not use the golden frame as a
+ * predictor. When not set, the encoder will choose whether to use the
+ * golden frame or not automatically.
+ */
+#define AOM_EFLAG_NO_REF_GF (1 << 19)
+
+/*!\brief Don't reference the alternate reference frame
+ *
+ * When this flag is set, the encoder will not use the alt ref frame as a
+ * predictor. When not set, the encoder will choose whether to use the
+ * alt ref frame or not automatically.
+ */
+#define AOM_EFLAG_NO_REF_ARF (1 << 20)
+/*!\brief Don't reference the bwd reference frame
+ *
+ * When this flag is set, the encoder will not use the bwd ref frame as a
+ * predictor. When not set, the encoder will choose whether to use the
+ * bwd ref frame or not automatically.
+ */
+#define AOM_EFLAG_NO_REF_BWD (1 << 21)
+/*!\brief Don't reference the alt2 reference frame
+ *
+ * When this flag is set, the encoder will not use the alt2 ref frame as a
+ * predictor. When not set, the encoder will choose whether to use the
+ * alt2 ref frame or not automatically.
+ */
+#define AOM_EFLAG_NO_REF_ARF2 (1 << 22)
+
+/*!\brief Don't update the last frame
+ *
+ * When this flag is set, the encoder will not update the last frame with
+ * the contents of the current frame.
+ */
+#define AOM_EFLAG_NO_UPD_LAST (1 << 23)
+
+/*!\brief Don't update the golden frame
+ *
+ * When this flag is set, the encoder will not update the golden frame with
+ * the contents of the current frame.
+ */
+#define AOM_EFLAG_NO_UPD_GF (1 << 24)
+
+/*!\brief Don't update the alternate reference frame
+ *
+ * When this flag is set, the encoder will not update the alt ref frame with
+ * the contents of the current frame.
+ */
+#define AOM_EFLAG_NO_UPD_ARF (1 << 25)
+/*!\brief Disable entropy update
+ *
+ * When this flag is set, the encoder will not update its internal entropy
+ * model based on the entropy of this frame.
+ */
+#define AOM_EFLAG_NO_UPD_ENTROPY (1 << 26)
+/*!\brief Disable ref frame mvs
+ *
+ * When this flag is set, the encoder will not allow frames to
+ * be encoded using mfmv.
+ */
+#define AOM_EFLAG_NO_REF_FRAME_MVS (1 << 27)
+/*!\brief Enable error resilient frame
+ *
+ * When this flag is set, the encoder will code frames as error
+ * resilient.
+ */
+#define AOM_EFLAG_ERROR_RESILIENT (1 << 28)
+/*!\brief Enable s frame mode
+ *
+ * When this flag is set, the encoder will code frames as an
+ * s frame.
+ */
+#define AOM_EFLAG_SET_S_FRAME (1 << 29)
+/*!\brief Force primary_ref_frame to PRIMARY_REF_NONE
+ *
+ * When this flag is set, the encoder will set a frame's primary_ref_frame
+ * to PRIMARY_REF_NONE
+ */
+#define AOM_EFLAG_SET_PRIMARY_REF_NONE (1 << 30)
+
+/*!\brief AVx encoder control functions
+ *
+ * This set of macros define the control functions available for AVx
+ * encoder interface.
+ *
+ * \sa #aom_codec_control
+ */
+enum aome_enc_control_id {
+  /*!\brief Codec control function to set which reference frame encoder can use.
+   */
+  AOME_USE_REFERENCE = 7,
+
+  /*!\brief Codec control function to pass an ROI map to encoder.
+   */
+  AOME_SET_ROI_MAP = 8,
+
+  /*!\brief Codec control function to pass an Active map to encoder.
+   */
+  AOME_SET_ACTIVEMAP,
+
+  /*!\brief Codec control function to set encoder scaling mode.
+   */
+  AOME_SET_SCALEMODE = 11,
+
+  /*!\brief Codec control function to set encoder spatial layer id.
+   */
+  AOME_SET_SPATIAL_LAYER_ID = 12,
+
+  /*!\brief Codec control function to set encoder internal speed settings.
+   *
+   * Changes in this value influences, among others, the encoder's selection
+   * of motion estimation methods. Values greater than 0 will increase encoder
+   * speed at the expense of quality.
+   *
+   * \note Valid range: 0..8
+   */
+  AOME_SET_CPUUSED = 13,
+
+  /*!\brief Codec control function to enable automatic set and use alf frames.
+   */
+  AOME_SET_ENABLEAUTOALTREF,
+
+  /*!\brief Codec control function to set sharpness.
+   */
+  AOME_SET_SHARPNESS = AOME_SET_ENABLEAUTOALTREF + 2,
+
+  /*!\brief Codec control function to set the threshold for MBs treated static.
+   */
+  AOME_SET_STATIC_THRESHOLD,
+
+  /*!\brief Codec control function to get last quantizer chosen by the encoder.
+   *
+   * Return value uses internal quantizer scale defined by the codec.
+   */
+  AOME_GET_LAST_QUANTIZER = AOME_SET_STATIC_THRESHOLD + 2,
+
+  /*!\brief Codec control function to get last quantizer chosen by the encoder.
+   *
+   * Return value uses the 0..63 scale as used by the rc_*_quantizer config
+   * parameters.
+   */
+  AOME_GET_LAST_QUANTIZER_64,
+
+  /*!\brief Codec control function to set the max no of frames to create arf.
+   */
+  AOME_SET_ARNR_MAXFRAMES,
+
+  /*!\brief Codec control function to set the filter strength for the arf.
+   */
+  AOME_SET_ARNR_STRENGTH,
+
+  /*!\brief Codec control function to set visual tuning.
+   */
+  AOME_SET_TUNING = AOME_SET_ARNR_STRENGTH + 2,
+
+  /*!\brief Codec control function to set constrained quality level.
+   *
+   * \attention For this value to be used aom_codec_enc_cfg_t::g_usage must be
+   *            set to #AOM_CQ.
+   * \note Valid range: 0..63
+   */
+  AOME_SET_CQ_LEVEL,
+
+  /*!\brief Codec control function to set Max data rate for Intra frames.
+   *
+   * This value controls additional clamping on the maximum size of a
+   * keyframe. It is expressed as a percentage of the average
+   * per-frame bitrate, with the special (and default) value 0 meaning
+   * unlimited, or no additional clamping beyond the codec's built-in
+   * algorithm.
+   *
+   * For example, to allocate no more than 4.5 frames worth of bitrate
+   * to a keyframe, set this to 450.
+   */
+  AOME_SET_MAX_INTRA_BITRATE_PCT,
+
+  /*!\brief Codec control function to set number of spatial layers.
+   */
+  AOME_SET_NUMBER_SPATIAL_LAYERS,
+
+  /*!\brief Codec control function to set max data rate for Inter frames.
+   *
+   * This value controls additional clamping on the maximum size of an
+   * inter frame. It is expressed as a percentage of the average
+   * per-frame bitrate, with the special (and default) value 0 meaning
+   * unlimited, or no additional clamping beyond the codec's built-in
+   * algorithm.
+   *
+   * For example, to allow no more than 4.5 frames worth of bitrate
+   * to an inter frame, set this to 450.
+   */
+  AV1E_SET_MAX_INTER_BITRATE_PCT = AOME_SET_MAX_INTRA_BITRATE_PCT + 2,
+
+  /*!\brief Boost percentage for Golden Frame in CBR mode.
+   *
+   * This value controls the amount of boost given to Golden Frame in
+   * CBR mode. It is expressed as a percentage of the average
+   * per-frame bitrate, with the special (and default) value 0 meaning
+   * the feature is off, i.e., no golden frame boost in CBR mode and
+   * average bitrate target is used.
+   *
+   * For example, to allow 100% more bits, i.e, 2X, in a golden frame
+   * than average frame, set this to 100.
+   */
+  AV1E_SET_GF_CBR_BOOST_PCT,
+
+  /*!\brief Codec control function to set lossless encoding mode.
+   *
+   * AV1 can operate in lossless encoding mode, in which the bitstream
+   * produced will be able to decode and reconstruct a perfect copy of
+   * input source. This control function provides a mean to switch encoder
+   * into lossless coding mode(1) or normal coding mode(0) that may be lossy.
+   *                          0 = lossy coding mode
+   *                          1 = lossless coding mode
+   *
+   *  By default, encoder operates in normal coding mode (maybe lossy).
+   */
+  AV1E_SET_LOSSLESS = AV1E_SET_GF_CBR_BOOST_PCT + 2,
+
+  /** control function to enable the row based multi-threading of encoder. A
+   * value that is equal to 1 indicates that row based multi-threading is
+   * enabled.
+   */
+  AV1E_SET_ROW_MT,
+
+  /*!\brief Codec control function to set number of tile columns.
+   *
+   * In encoding and decoding, AV1 allows an input image frame be partitioned
+   * into separate vertical tile columns, which can be encoded or decoded
+   * independently. This enables easy implementation of parallel encoding and
+   * decoding. The parameter for this control describes the number of tile
+   * columns (in log2 units), which has a valid range of [0, 6]:
+   *             0 = 1 tile column
+   *             1 = 2 tile columns
+   *             2 = 4 tile columns
+   *             .....
+   *             n = 2**n tile columns
+   *
+   * By default, the value is 0, i.e. one single column tile for entire image.
+   */
+  AV1E_SET_TILE_COLUMNS,
+
+  /*!\brief Codec control function to set number of tile rows.
+   *
+   * In encoding and decoding, AV1 allows an input image frame be partitioned
+   * into separate horizontal tile rows, which can be encoded or decoded
+   * independently. The parameter for this control describes the number of tile
+   * rows (in log2 units), which has a valid range of [0, 6]:
+   *            0 = 1 tile row
+   *            1 = 2 tile rows
+   *            2 = 4 tile rows
+   *            .....
+   *            n = 2**n tile rows
+   *
+   * By default, the value is 0, i.e. one single row tile for entire image.
+   */
+  AV1E_SET_TILE_ROWS,
+
+  /*!\brief Codec control function to enable RDO modulated by frame temporal
+   * dependency.
+   *
+   * By default, this feature is off.
+   */
+  AV1E_SET_ENABLE_TPL_MODEL,
+
+  /*!\brief Codec control function to enable frame parallel decoding feature.
+   *
+   * AV1 has a bitstream feature to reduce decoding dependency between frames
+   * by turning off backward update of probability context used in encoding
+   * and decoding. This allows staged parallel processing of more than one
+   * video frames in the decoder. This control function provides a mean to
+   * turn this feature on or off for bitstreams produced by encoder.
+   *
+   * By default, this feature is off.
+   */
+  AV1E_SET_FRAME_PARALLEL_DECODING,
+
+  /*!\brief Codec control function to enable error_resilient_mode
+   *
+   * AV1 has a bitstream feature to guarantee parseability of a frame
+   * by turning on the error_resilient_decoding mode, even though the
+   * reference buffers are unreliable or not received.
+   *
+   * By default, this feature is off.
+   */
+  AV1E_SET_ERROR_RESILIENT_MODE,
+
+  /*!\brief Codec control function to enable s_frame_mode
+   *
+   * AV1 has a bitstream feature to designate certain frames as S-frames,
+   * from where we can switch to a different stream,
+   * even though the reference buffers may not be exactly identical.
+   *
+   * By default, this feature is off.
+   */
+  AV1E_SET_S_FRAME_MODE,
+
+  /*!\brief Codec control function to set adaptive quantization mode.
+   *
+   * AV1 has a segment based feature that allows encoder to adaptively change
+   * quantization parameter for each segment within a frame to improve the
+   * subjective quality. This control makes encoder operate in one of the
+   * several AQ_modes supported.
+   *
+   * By default, encoder operates with AQ_Mode 0(adaptive quantization off).
+   */
+  AV1E_SET_AQ_MODE,
+
+  /*!\brief Codec control function to enable/disable periodic Q boost.
+   *
+   * One AV1 encoder speed feature is to enable quality boost by lowering
+   * frame level Q periodically. This control function provides a mean to
+   * turn on/off this feature.
+   *               0 = off
+   *               1 = on
+   *
+   * By default, the encoder is allowed to use this feature for appropriate
+   * encoding modes.
+   */
+  AV1E_SET_FRAME_PERIODIC_BOOST,
+
+  /*!\brief Codec control function to set noise sensitivity.
+   *
+   *  0: off, 1: On(YOnly)
+   */
+  AV1E_SET_NOISE_SENSITIVITY,
+
+  /*!\brief Codec control function to set content type.
+   * \note Valid parameter range:
+   *              AOM_CONTENT_DEFAULT = Regular video content (Default)
+   *              AOM_CONTENT_SCREEN  = Screen capture content
+   */
+  AV1E_SET_TUNE_CONTENT,
+
+  /*!\brief Codec control function to set CDF update mode.
+   *
+   *  0: no update          1: update on every frame
+   *  2: selectively update
+   */
+  AV1E_SET_CDF_UPDATE_MODE,
+
+  /*!\brief Codec control function to set color space info.
+   * \note Valid ranges: 0..23, default is "Unspecified".
+   *                     0 = For future use
+   *                     1 = BT.709
+   *                     2 = Unspecified
+   *                     3 = For future use
+   *                     4 = BT.470 System M (historical)
+   *                     5 = BT.470 System B, G (historical)
+   *                     6 = BT.601
+   *                     7 = SMPTE 240
+   *                     8 = Generic film (color filters using illuminant C)
+   *                     9 = BT.2020, BT.2100
+   *                     10 = SMPTE 428 (CIE 1921 XYZ)
+   *                     11 = SMPTE RP 431-2
+   *                     12 = SMPTE EG 432-1
+   *                     13 = For future use (values 13 - 21)
+   *                     22 = EBU Tech. 3213-E
+   *                     23 = For future use
+   *
+   */
+  AV1E_SET_COLOR_PRIMARIES,
+
+  /*!\brief Codec control function to set transfer function info.
+   * \note Valid ranges: 0..19, default is "Unspecified".
+   *                     0 = For future use
+   *                     1 = BT.709
+   *                     2 = Unspecified
+   *                     3 = For future use
+   *                     4 = BT.470 System M (historical)
+   *                     5 = BT.470 System B, G (historical)
+   *                     6 = BT.601
+   *                     7 = SMPTE 240 M
+   *                     8 = Linear
+   *                     9 = Logarithmic (100 : 1 range)
+   *                     10 = Logarithmic (100 * Sqrt(10) : 1 range)
+   *                     11 = IEC 61966-2-4
+   *                     12 = BT.1361
+   *                     13 = sRGB or sYCC
+   *                     14 = BT.2020 10-bit systems
+   *                     15 = BT.2020 12-bit systems
+   *                     16 = SMPTE ST 2084, ITU BT.2100 PQ
+   *                     17 = SMPTE ST 428
+   *                     18 = BT.2100 HLG, ARIB STD-B67
+   *                     19 = For future use
+   *
+   */
+  AV1E_SET_TRANSFER_CHARACTERISTICS,
+
+  /*!\brief Codec control function to set transfer function info.
+   * \note Valid ranges: 0..15, default is "Unspecified".
+   *                     0 = Identity matrix
+   *                     1 = BT.709
+   *                     2 = Unspecified
+   *                     3 = For future use
+   *                     4 = US FCC 73.628
+   *                     5 = BT.470 System B, G (historical)
+   *                     6 = BT.601
+   *                     7 = SMPTE 240 M
+   *                     8 = YCgCo
+   *                     9 = BT.2020 non-constant luminance, BT.2100 YCbCr
+   *                     10 = BT.2020 constant luminance
+   *                     11 = SMPTE ST 2085 YDzDx
+   *                     12 = Chromaticity-derived non-constant luminance
+   *                     13 = Chromaticity-derived constant luminance
+   *                     14 = BT.2100 ICtCp
+   *                     15 = For future use
+   *
+   */
+  AV1E_SET_MATRIX_COEFFICIENTS,
+
+  /*!\brief Codec control function to set chroma 4:2:0 sample position info.
+   * \note Valid ranges: 0..3, default is "UNKNOWN".
+   *                     0 = UNKNOWN,
+   *                     1 = VERTICAL
+   *                     2 = COLOCATED
+   *                     3 = RESERVED
+   */
+  AV1E_SET_CHROMA_SAMPLE_POSITION,
+
+  /*!\brief Codec control function to set minimum interval between GF/ARF frames
+   *
+   * By default the value is set as 4.
+   */
+  AV1E_SET_MIN_GF_INTERVAL,
+
+  /*!\brief Codec control function to set minimum interval between GF/ARF frames
+   *
+   * By default the value is set as 16.
+   */
+  AV1E_SET_MAX_GF_INTERVAL,
+
+  /*!\brief Codec control function to get an Active map back from the encoder.
+   */
+  AV1E_GET_ACTIVEMAP,
+
+  /*!\brief Codec control function to set color range bit.
+   * \note Valid ranges: 0..1, default is 0
+   *                     0 = Limited range (16..235 or HBD equivalent)
+   *                     1 = Full range (0..255 or HBD equivalent)
+   */
+  AV1E_SET_COLOR_RANGE,
+
+  /*!\brief Codec control function to set intended rendering image size.
+   *
+   * By default, this is identical to the image size in pixels.
+   */
+  AV1E_SET_RENDER_SIZE,
+
+  /*!\brief Codec control function to set target level.
+   *
+   * 255: off (default); 0: only keep level stats; 10: target for level 1.0;
+   * 11: target for level 1.1; ... 62: target for level 6.2
+   */
+  AV1E_SET_TARGET_LEVEL,
+
+  /*!\brief Codec control function to get bitstream level.
+   */
+  AV1E_GET_LEVEL,
+
+  /*!\brief Codec control function to set intended superblock size.
+   *
+   * By default, the superblock size is determined separately for each
+   * frame by the encoder.
+   *
+   * Experiment: EXT_PARTITION
+   */
+  AV1E_SET_SUPERBLOCK_SIZE,
+
+  /*!\brief Codec control function to enable automatic set and use
+   * bwd-pred frames.
+   *
+   */
+  AOME_SET_ENABLEAUTOBWDREF,
+
+  /*!\brief Codec control function to encode with CDEF.
+   *
+   * CDEF is the constrained directional enhancement filter which is an
+   * in-loop filter aiming to remove coding artifacts
+   *                          0 = do not apply CDEF
+   *                          1 = apply CDEF
+   *
+   *  By default, the encoder applies CDEF.
+   *
+   * Experiment: AOM_CDEF
+   */
+  AV1E_SET_ENABLE_CDEF,
+
+  /*!\brief Codec control function to encode with Loop Restoration Filter.
+   *
+   *                          0 = do not apply Restoration Filter
+   *                          1 = apply Restoration Filter
+   *
+   *  By default, the encoder applies Restoration Filter.
+   *
+   */
+  AV1E_SET_ENABLE_RESTORATION,
+
+  /*!\brief Codec control function to predict with OBMC mode.
+   *
+   *                          0 = do not allow OBMC mode
+   *                          1 = allow OBMC mode
+   *
+   *  By default, the encoder allows OBMC prediction mode.
+   *
+   */
+  AV1E_SET_ENABLE_OBMC,
+
+  /*!\brief Codec control function to encode without trellis quantization.
+   *
+   *                          0 = apply trellis quantization
+   *                          1 = do not apply trellis quantization
+   *                          2 = disable trellis quantization partially
+   *
+   *  By default, the encoder applies optimization on quantized
+   *  coefficients.
+   *
+   */
+  AV1E_SET_DISABLE_TRELLIS_QUANT,
+
+  /*!\brief Codec control function to encode with quantisation matrices.
+   *
+   * AOM can operate with default quantisation matrices dependent on
+   * quantisation level and block type.
+   *                          0 = do not use quantisation matrices
+   *                          1 = use quantisation matrices
+   *
+   *  By default, the encoder operates without quantisation matrices.
+   *
+   * Experiment: AOM_QM
+   */
+
+  AV1E_SET_ENABLE_QM,
+
+  /*!\brief Codec control function to set the min quant matrix flatness.
+   *
+   * AOM can operate with different ranges of quantisation matrices.
+   * As quantisation levels increase, the matrices get flatter. This
+   * control sets the minimum level of flatness from which the matrices
+   * are determined.
+   *
+   *  By default, the encoder sets this minimum at half the available
+   *  range.
+   *
+   * Experiment: AOM_QM
+   */
+  AV1E_SET_QM_MIN,
+
+  /*!\brief Codec control function to set the max quant matrix flatness.
+   *
+   * AOM can operate with different ranges of quantisation matrices.
+   * As quantisation levels increase, the matrices get flatter. This
+   * control sets the maximum level of flatness possible.
+   *
+   * By default, the encoder sets this maximum at the top of the
+   * available range.
+   *
+   * Experiment: AOM_QM
+   */
+  AV1E_SET_QM_MAX,
+
+  /*!\brief Codec control function to set the min quant matrix flatness.
+   *
+   * AOM can operate with different ranges of quantisation matrices.
+   * As quantisation levels increase, the matrices get flatter. This
+   * control sets the flatness for luma (Y).
+   *
+   *  By default, the encoder sets this minimum at half the available
+   *  range.
+   *
+   * Experiment: AOM_QM
+   */
+  AV1E_SET_QM_Y,
+
+  /*!\brief Codec control function to set the min quant matrix flatness.
+   *
+   * AOM can operate with different ranges of quantisation matrices.
+   * As quantisation levels increase, the matrices get flatter. This
+   * control sets the flatness for chroma (U).
+   *
+   *  By default, the encoder sets this minimum at half the available
+   *  range.
+   *
+   * Experiment: AOM_QM
+   */
+  AV1E_SET_QM_U,
+
+  /*!\brief Codec control function to set the min quant matrix flatness.
+   *
+   * AOM can operate with different ranges of quantisation matrices.
+   * As quantisation levels increase, the matrices get flatter. This
+   * control sets the flatness for chrome (V).
+   *
+   *  By default, the encoder sets this minimum at half the available
+   *  range.
+   *
+   * Experiment: AOM_QM
+   */
+  AV1E_SET_QM_V,
+
+  /*!\brief Codec control function to encode with dist_8x8.
+   *
+   *  The dist_8x8 is enabled automatically for model tuning parameters that
+   *  require measuring distortion at the 8x8 level. This control also allows
+   *  measuring distortion at the 8x8 level for other tuning options
+   *  (e.g., PSNR), for testing purposes.
+   *                          0 = do not use dist_8x8
+   *                          1 = use dist_8x8
+   *
+   *  By default, the encoder does not use dist_8x8
+   *
+   * Experiment: DIST_8X8
+   */
+  AV1E_SET_ENABLE_DIST_8X8,
+
+  /*!\brief Codec control function to set a maximum number of tile groups.
+   *
+   * This will set the maximum number of tile groups. This will be
+   * overridden if an MTU size is set. The default value is 1.
+   *
+   * Experiment: TILE_GROUPS
+   */
+  AV1E_SET_NUM_TG,
+
+  /*!\brief Codec control function to set an MTU size for a tile group.
+   *
+   * This will set the maximum number of bytes in a tile group. This can be
+   * exceeded only if a single tile is larger than this amount.
+   *
+   * By default, the value is 0, in which case a fixed number of tile groups
+   * is used.
+   *
+   * Experiment: TILE_GROUPS
+   */
+  AV1E_SET_MTU,
+
+  /*!\brief Codec control function to set the number of symbols in an ANS data
+   * window.
+   *
+   * The number of ANS symbols (both boolean and non-booleans alphabets) in an
+   * ANS data window is set to 1 << value.
+   *
+   * \note Valid range: [8, 23]
+   *
+   * Experiment: ANS
+   */
+  AV1E_SET_ANS_WINDOW_SIZE_LOG2,
+
+  /*!\brief Codec control function to enable/disable rectangular partitions.
+   *
+   * This will enable or disable usage of rectangular partitions. The default
+   * value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_RECT_PARTITIONS,
+
+  /*!\brief Codec control function to turn on / off intra edge filter
+   * at sequence level.
+   *
+   * This will enable or disable usage of intra-edge filtering. The default
+   * value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_INTRA_EDGE_FILTER,
+
+  /*!\brief Codec control function to turn on / off frame order hint for a
+   * few tools:
+   *
+   * joint compound mode
+   * motion field motion vector
+   * ref frame sign bias
+   *
+   * The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_ORDER_HINT,
+
+  /*!\brief Codec control function to turn on / off 64-length transforms.
+   *
+   * This will enable or disable usage of length 64 transforms in any
+   * direction. The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_TX64,
+
+  /*!\brief Codec control function to turn on / off dist-wtd compound mode
+   * at sequence level.
+   *
+   * This will enable or disable distance-weighted compound mode. The default
+   * value is 1. If AV1E_SET_ENABLE_ORDER_HINT is 0, then this flag is forced
+   * to 0.
+   *
+   */
+  AV1E_SET_ENABLE_DIST_WTD_COMP,
+
+  /*!\brief Codec control function to turn on / off ref frame mvs (mfmv) usage
+   * at sequence level.
+   *
+   * This will enable or disable usage of MFMV. The default value is 1.
+   * If AV1E_SET_ENABLE_ORDER_HINT is 0, then this flag is forced to 0.
+   *
+   */
+  AV1E_SET_ENABLE_REF_FRAME_MVS,
+
+  /*!\brief Codec control function to set temporal mv prediction
+   * enabling/disabling at frame level.
+   *
+   * This will enable or disable temporal mv predicton. The default value is 1.
+   * If AV1E_SET_ENABLE_REF_FRAME_MVS is 0, then this flag is forced to 0.
+   *
+   */
+  AV1E_SET_ALLOW_REF_FRAME_MVS,
+
+  /*!\brief Codec control function to turn on / off dual filter usage
+   * for a sequence.
+   *
+   * This will enable or disable use of dual interpolation filter.
+   * The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_DUAL_FILTER,
+
+  /*!\brief Codec control function to turn on / off masked compound usage
+   * for a sequence.
+   *
+   * This will enable or disable usage of wedge and diff-wtd compound
+   * modes. The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_MASKED_COMP,
+
+  /*!\brief Codec control function to turn on / off interintra compound
+   * for a sequence.
+   *
+   * This will enable or disable usage of inter-intra compound modes.
+   * The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_INTERINTRA_COMP,
+
+  /*!\brief Codec control function to turn on / off smooth inter-intra
+   * mode for a sequence.
+   *
+   * This will enable or disable usage of smooth inter-intra mode.
+   * The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_SMOOTH_INTERINTRA,
+
+  /*!\brief Codec control function to turn on / off difference weighted
+   * compound.
+   *
+   * This will enable or disable usage of difference weighted compound.
+   * The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_DIFF_WTD_COMP,
+
+  /*!\brief Codec control function to turn on / off interinter wedge
+   * compound.
+   *
+   * This will enable or disable usage of interinter wedge compound.
+   * The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_INTERINTER_WEDGE,
+
+  /*!\brief Codec control function to turn on / off interintra wedge
+   * compound.
+   *
+   * This will enable or disable usage of interintra wedge compound.
+   * The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_INTERINTRA_WEDGE,
+
+  /*!\brief Codec control function to turn on / off global motion usage
+   * for a sequence.
+   *
+   * This will enable or disable usage of global motion. The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_GLOBAL_MOTION,
+
+  /*!\brief Codec control function to turn on / off warped motion usage
+   * at sequence level.
+   *
+   * This will enable or disable usage of warped motion. The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_WARPED_MOTION,
+
+  /*!\brief Codec control function to turn on / off warped motion usage
+   * at frame level.
+   *
+   * This will enable or disable usage of warped motion. The default value is 1.
+   * If AV1E_SET_ENABLE_WARPED_MOTION is 0, then this flag is forced to 0.
+   *
+   */
+  AV1E_SET_ALLOW_WARPED_MOTION,
+
+  /*!\brief Codec control function to turn on / off filter intra usage at
+   * sequence level.
+   *
+   * This will enable or disable usage of filter intra. The default value is 1.
+   * If AV1E_SET_ENABLE_FILTER_INTRA is 0, then this flag is forced to 0.
+   *
+   */
+  AV1E_SET_ENABLE_FILTER_INTRA,
+
+  /*!\brief Codec control function to turn on / off smooth intra modes usage.
+   *
+   * This will enable or disable usage of smooth, smooth_h and smooth_v intra
+   * modes. The default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_SMOOTH_INTRA,
+
+  /*!\brief Codec control function to turn on / off Paeth intra mode usage.
+   *
+   * This will enable or disable usage of Paeth intra mode. The default value
+   * is 1.
+   *
+   */
+  AV1E_SET_ENABLE_PAETH_INTRA,
+
+  /*!\brief Codec control function to turn on / off CFL uv intra mode usage.
+   *
+   * This will enable or disable usage of chroma-from-luma intra mode. The
+   * default value is 1.
+   *
+   */
+  AV1E_SET_ENABLE_CFL_INTRA,
+
+  /*!\brief Codec control function to turn on / off frame superresolution.
+   *
+   * This will enable or disable frame superresolution. The default value is 1
+   * If AV1E_SET_ENABLE_SUPERRES is 0, then this flag is forced to 0.
+   */
+  AV1E_SET_ENABLE_SUPERRES,
+
+  /*!\brief Codec control function to turn on/off palette mode */
+  AV1E_SET_ENABLE_PALETTE,
+
+  /*!\brief Codec control function to turn on/off intra block copy mode */
+  AV1E_SET_ENABLE_INTRABC,
+
+  /*!\brief Codec control function to turn on/off intra angle delta */
+  AV1E_SET_ENABLE_ANGLE_DELTA,
+
+  /*!\brief Codec control function to set the delta q mode
+   *
+   * AV1 has a segment based feature that allows encoder to adaptively change
+   * quantization parameter for each segment within a frame to improve the
+   * subjective quality. the delta q mode is added on top of segment based
+   * feature, and allows control per 64x64 q and lf delta.This control makes
+   * encoder operate in one of the several DELTA_Q_modes supported.
+   *
+   * By default, encoder operates with DELTAQ_Mode 0(deltaq signaling off).
+   */
+  AV1E_SET_DELTAQ_MODE,
+
+  /*!\brief Codec control function to set the single tile decoding mode to 0 or
+   * 1.
+   *
+   * 0 means that the single tile decoding is off, and 1 means that the single
+   * tile decoding is on.
+   *
+   * Experiment: EXT_TILE
+   */
+  AV1E_SET_SINGLE_TILE_DECODING,
+
+  /*!\brief Codec control function to enable the extreme motion vector unit test
+   * in AV1. Please note that this is only used in motion vector unit test.
+   *
+   * 0 : off, 1 : MAX_EXTREME_MV, 2 : MIN_EXTREME_MV
+   */
+  AV1E_ENABLE_MOTION_VECTOR_UNIT_TEST,
+
+  /*!\brief Codec control function to signal picture timing info in the
+   * bitstream. \note Valid ranges: 0..1, default is "UNKNOWN". 0 = UNKNOWN, 1 =
+   * EQUAL
+   */
+  AV1E_SET_TIMING_INFO_TYPE,
+
+  /*!\brief Codec control function to add film grain parameters (one of several
+   * preset types) info in the bitstream.
+   * \note Valid ranges: 0..11, default is "0". 0 = UNKNOWN,
+   * 1..16 = different test vectors for grain
+   */
+  AV1E_SET_FILM_GRAIN_TEST_VECTOR,
+
+  /*!\brief Codec control function to set the path to the film grain parameters
+   */
+  AV1E_SET_FILM_GRAIN_TABLE,
+
+  /*!\brief Sets the noise level */
+  AV1E_SET_DENOISE_NOISE_LEVEL,
+
+  /*!\brief Sets the denoisers block size */
+  AV1E_SET_DENOISE_BLOCK_SIZE,
+
+  /*!\brief Sets the chroma subsampling x value */
+  AV1E_SET_CHROMA_SUBSAMPLING_X,
+
+  /*!\brief Sets the chroma subsampling y value */
+  AV1E_SET_CHROMA_SUBSAMPLING_Y,
+
+  /*!\brief Control to use a reduced tx type set */
+  AV1E_SET_REDUCED_TX_TYPE_SET,
+
+  /*!\brief Control to use dct only for intra modes */
+  AV1E_SET_INTRA_DCT_ONLY,
+
+  /*!\brief Control to use dct only for inter modes */
+  AV1E_SET_INTER_DCT_ONLY,
+
+  /*!\brief Control to use default tx type only for intra modes */
+  AV1E_SET_INTRA_DEFAULT_TX_ONLY,
+
+  /*!\brief Control to use adaptive quantize_b */
+  AV1E_SET_QUANT_B_ADAPT,
+
+  /*!\brief Control to select maximum height for the GF group pyramid structure
+   * (valid values: 1 - 4) */
+  AV1E_SET_GF_MAX_PYRAMID_HEIGHT,
+
+  /*!\brief Control to select maximum reference frames allowed per frame
+   * (valid values: 3 - 7) */
+  AV1E_SET_MAX_REFERENCE_FRAMES,
+
+  /*!\brief Control to use reduced set of single and compound references. */
+  AV1E_SET_REDUCED_REFERENCE_SET,
+
+  /*!\brief Control to set frequency of the cost updates for coefficients
+   * Possible values are:
+   * 0: Update at SB level (default)
+   * 1: Update at SB row level in tile
+   * 2: Update at tile level
+   */
+  AV1E_SET_COEFF_COST_UPD_FREQ,
+
+  /*!\brief Control to set frequency of the cost updates for mode
+   * Possible values are:
+   * 0: Update at SB level (default)
+   * 1: Update at SB row level in tile
+   * 2: Update at tile level
+   */
+  AV1E_SET_MODE_COST_UPD_FREQ,
+};
+
+/*!\brief aom 1-D scaling mode
+ *
+ * This set of constants define 1-D aom scaling modes
+ */
+typedef enum aom_scaling_mode_1d {
+  AOME_NORMAL = 0,
+  AOME_FOURFIVE = 1,
+  AOME_THREEFIVE = 2,
+  AOME_ONETWO = 3
+} AOM_SCALING_MODE;
+
+/*!\brief Max number of segments
+ *
+ * This is the limit of number of segments allowed within a frame.
+ *
+ * Currently same as "MAX_SEGMENTS" in AV1, the maximum that AV1 supports.
+ *
+ */
+#define AOM_MAX_SEGMENTS 8
+
+/*!\brief  aom region of interest map
+ *
+ * These defines the data structures for the region of interest map
+ *
+ * TODO(yaowu): create a unit test for ROI map related APIs
+ *
+ */
+typedef struct aom_roi_map {
+  /*! An id between 0 and 7 for each 8x8 region within a frame. */
+  unsigned char *roi_map;
+  unsigned int rows;              /**< Number of rows. */
+  unsigned int cols;              /**< Number of columns. */
+  int delta_q[AOM_MAX_SEGMENTS];  /**< Quantizer deltas. */
+  int delta_lf[AOM_MAX_SEGMENTS]; /**< Loop filter deltas. */
+  /*! Static breakout threshold for each segment. */
+  unsigned int static_threshold[AOM_MAX_SEGMENTS];
+} aom_roi_map_t;
+
+/*!\brief  aom active region map
+ *
+ * These defines the data structures for active region map
+ *
+ */
+
+typedef struct aom_active_map {
+  /*!\brief specify an on (1) or off (0) each 16x16 region within a frame */
+  unsigned char *active_map;
+  unsigned int rows; /**< number of rows */
+  unsigned int cols; /**< number of cols */
+} aom_active_map_t;
+
+/*!\brief  aom image scaling mode
+ *
+ * This defines the data structure for image scaling mode
+ *
+ */
+typedef struct aom_scaling_mode {
+  AOM_SCALING_MODE h_scaling_mode; /**< horizontal scaling mode */
+  AOM_SCALING_MODE v_scaling_mode; /**< vertical scaling mode   */
+} aom_scaling_mode_t;
+
+/*!brief AV1 encoder content type */
+typedef enum {
+  AOM_CONTENT_DEFAULT,
+  AOM_CONTENT_SCREEN,
+  AOM_CONTENT_INVALID
+} aom_tune_content;
+
+/*!brief AV1 encoder timing info type signaling */
+typedef enum {
+  AOM_TIMING_UNSPECIFIED,
+  AOM_TIMING_EQUAL,
+  AOM_TIMING_DEC_MODEL
+} aom_timing_info_type_t;
+
+/*!\brief Model tuning parameters
+ *
+ * Changes the encoder to tune for certain types of input material.
+ *
+ */
+typedef enum {
+  AOM_TUNE_PSNR,
+  AOM_TUNE_SSIM,
+  AOM_TUNE_CDEF_DIST,
+  AOM_TUNE_DAALA_DIST
+} aom_tune_metric;
+
+/*!\cond */
+/*!\brief Encoder control function parameter type
+ *
+ * Defines the data types that AOME/AV1E control functions take. Note that
+ * additional common controls are defined in aom.h
+ *
+ */
+
+AOM_CTRL_USE_TYPE(AOME_USE_REFERENCE, int)
+#define AOM_CTRL_AOME_USE_REFERENCE
+AOM_CTRL_USE_TYPE(AOME_SET_ROI_MAP, aom_roi_map_t *)
+#define AOM_CTRL_AOME_SET_ROI_MAP
+AOM_CTRL_USE_TYPE(AOME_SET_ACTIVEMAP, aom_active_map_t *)
+#define AOM_CTRL_AOME_SET_ACTIVEMAP
+AOM_CTRL_USE_TYPE(AOME_SET_SCALEMODE, aom_scaling_mode_t *)
+#define AOM_CTRL_AOME_SET_SCALEMODE
+
+AOM_CTRL_USE_TYPE(AOME_SET_SPATIAL_LAYER_ID, int)
+#define AOM_CTRL_AOME_SET_SPATIAL_LAYER_ID
+
+AOM_CTRL_USE_TYPE(AOME_SET_CPUUSED, int)
+#define AOM_CTRL_AOME_SET_CPUUSED
+AOM_CTRL_USE_TYPE(AOME_SET_DEVSF, int)
+#define AOM_CTRL_AOME_SET_DEVSF
+AOM_CTRL_USE_TYPE(AOME_SET_ENABLEAUTOALTREF, unsigned int)
+#define AOM_CTRL_AOME_SET_ENABLEAUTOALTREF
+
+AOM_CTRL_USE_TYPE(AOME_SET_ENABLEAUTOBWDREF, unsigned int)
+#define AOM_CTRL_AOME_SET_ENABLEAUTOBWDREF
+
+AOM_CTRL_USE_TYPE(AOME_SET_SHARPNESS, unsigned int)
+#define AOM_CTRL_AOME_SET_SHARPNESS
+AOM_CTRL_USE_TYPE(AOME_SET_STATIC_THRESHOLD, unsigned int)
+#define AOM_CTRL_AOME_SET_STATIC_THRESHOLD
+
+AOM_CTRL_USE_TYPE(AOME_SET_ARNR_MAXFRAMES, unsigned int)
+#define AOM_CTRL_AOME_SET_ARNR_MAXFRAMES
+AOM_CTRL_USE_TYPE(AOME_SET_ARNR_STRENGTH, unsigned int)
+#define AOM_CTRL_AOME_SET_ARNR_STRENGTH
+AOM_CTRL_USE_TYPE(AOME_SET_TUNING, int) /* aom_tune_metric */
+#define AOM_CTRL_AOME_SET_TUNING
+AOM_CTRL_USE_TYPE(AOME_SET_CQ_LEVEL, unsigned int)
+#define AOM_CTRL_AOME_SET_CQ_LEVEL
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ROW_MT, int)
+#define AOM_CTRL_AV1E_SET_ROW_MT
+
+AOM_CTRL_USE_TYPE(AV1E_SET_TILE_COLUMNS, int)
+#define AOM_CTRL_AV1E_SET_TILE_COLUMNS
+AOM_CTRL_USE_TYPE(AV1E_SET_TILE_ROWS, int)
+#define AOM_CTRL_AV1E_SET_TILE_ROWS
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_TPL_MODEL, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_TPL_MODEL
+
+AOM_CTRL_USE_TYPE(AOME_GET_LAST_QUANTIZER, int *)
+#define AOM_CTRL_AOME_GET_LAST_QUANTIZER
+AOM_CTRL_USE_TYPE(AOME_GET_LAST_QUANTIZER_64, int *)
+#define AOM_CTRL_AOME_GET_LAST_QUANTIZER_64
+
+AOM_CTRL_USE_TYPE(AOME_SET_MAX_INTRA_BITRATE_PCT, unsigned int)
+#define AOM_CTRL_AOME_SET_MAX_INTRA_BITRATE_PCT
+AOM_CTRL_USE_TYPE(AOME_SET_MAX_INTER_BITRATE_PCT, unsigned int)
+#define AOM_CTRL_AOME_SET_MAX_INTER_BITRATE_PCT
+
+AOM_CTRL_USE_TYPE(AOME_SET_NUMBER_SPATIAL_LAYERS, int)
+#define AOME_CTRL_AOME_SET_NUMBER_SPATIAL_LAYERS
+
+AOM_CTRL_USE_TYPE(AV1E_SET_GF_CBR_BOOST_PCT, unsigned int)
+#define AOM_CTRL_AV1E_SET_GF_CBR_BOOST_PCT
+
+AOM_CTRL_USE_TYPE(AV1E_SET_LOSSLESS, unsigned int)
+#define AOM_CTRL_AV1E_SET_LOSSLESS
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_CDEF, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_CDEF
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_RESTORATION, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_RESTORATION
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_OBMC, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_OBMC
+
+AOM_CTRL_USE_TYPE(AV1E_SET_DISABLE_TRELLIS_QUANT, unsigned int)
+#define AOM_CTRL_AV1E_SET_DISABLE_TRELLIS_QUANT
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_QM, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_QM
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_DIST_8X8, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_DIST_8X8
+
+AOM_CTRL_USE_TYPE(AV1E_SET_QM_MIN, unsigned int)
+#define AOM_CTRL_AV1E_SET_QM_MIN
+
+AOM_CTRL_USE_TYPE(AV1E_SET_QM_MAX, unsigned int)
+#define AOM_CTRL_AV1E_SET_QM_MAX
+
+AOM_CTRL_USE_TYPE(AV1E_SET_QM_Y, unsigned int)
+#define AOM_CTRL_AV1E_SET_QM_Y
+
+AOM_CTRL_USE_TYPE(AV1E_SET_QM_U, unsigned int)
+#define AOM_CTRL_AV1E_SET_QM_U
+
+AOM_CTRL_USE_TYPE(AV1E_SET_QM_V, unsigned int)
+#define AOM_CTRL_AV1E_SET_QM_V
+
+AOM_CTRL_USE_TYPE(AV1E_SET_NUM_TG, unsigned int)
+#define AOM_CTRL_AV1E_SET_NUM_TG
+AOM_CTRL_USE_TYPE(AV1E_SET_MTU, unsigned int)
+#define AOM_CTRL_AV1E_SET_MTU
+
+AOM_CTRL_USE_TYPE(AV1E_SET_TIMING_INFO_TYPE, int) /* aom_timing_info_type_t */
+#define AOM_CTRL_AV1E_SET_TIMING_INFO_TYPE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_RECT_PARTITIONS, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_RECT_PARTITIONS
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_INTRA_EDGE_FILTER, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_INTRA_EDGE_FILTER
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_ORDER_HINT, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_ORDER_HINT
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_TX64, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_TX64
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_DIST_WTD_COMP, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_DIST_WTD_COMP
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_REF_FRAME_MVS, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_REF_FRAME_MVS
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ALLOW_REF_FRAME_MVS, unsigned int)
+#define AOM_CTRL_AV1E_SET_ALLOW_REF_FRAME_MVS
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_DUAL_FILTER, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_DUAL_FILTER
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_MASKED_COMP, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_MASKED_COMP
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_INTERINTRA_COMP, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_INTERINTRA_COMP
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_SMOOTH_INTERINTRA, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_SMOOTH_INTERINTRA
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_DIFF_WTD_COMP, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_DIFF_WTD_COMP
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_INTERINTER_WEDGE, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_INTERINTER_WEDGE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_INTERINTRA_WEDGE, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_INTERINTRA_WEDGE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_GLOBAL_MOTION, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_GLOBAL_MOTION
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_WARPED_MOTION, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_WARPED_MOTION
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ALLOW_WARPED_MOTION, unsigned int)
+#define AOM_CTRL_AV1E_SET_ALLOW_WARPED_MOTION
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_FILTER_INTRA, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_FILTER_INTRA
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_SMOOTH_INTRA, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_SMOOTH_INTRA
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_PAETH_INTRA, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_PAETH_INTRA
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_CFL_INTRA, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_CFL_INTRA
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_SUPERRES, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_SUPERRES
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_PALETTE, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_PALETTE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_INTRABC, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_INTRABC
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ENABLE_ANGLE_DELTA, unsigned int)
+#define AOM_CTRL_AV1E_SET_ENABLE_ANGLE_DELTA
+
+AOM_CTRL_USE_TYPE(AV1E_SET_FRAME_PARALLEL_DECODING, unsigned int)
+#define AOM_CTRL_AV1E_SET_FRAME_PARALLEL_DECODING
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ERROR_RESILIENT_MODE, unsigned int)
+#define AOM_CTRL_AV1E_SET_ERROR_RESILIENT_MODE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_S_FRAME_MODE, unsigned int)
+#define AOM_CTRL_AV1E_SET_S_FRAME_MODE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_AQ_MODE, unsigned int)
+#define AOM_CTRL_AV1E_SET_AQ_MODE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_DELTAQ_MODE, unsigned int)
+#define AOM_CTRL_AV1E_SET_DELTAQ_MODE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_FRAME_PERIODIC_BOOST, unsigned int)
+#define AOM_CTRL_AV1E_SET_FRAME_PERIODIC_BOOST
+
+AOM_CTRL_USE_TYPE(AV1E_SET_NOISE_SENSITIVITY, unsigned int)
+#define AOM_CTRL_AV1E_SET_NOISE_SENSITIVITY
+
+AOM_CTRL_USE_TYPE(AV1E_SET_TUNE_CONTENT, int) /* aom_tune_content */
+#define AOM_CTRL_AV1E_SET_TUNE_CONTENT
+
+AOM_CTRL_USE_TYPE(AV1E_SET_COLOR_PRIMARIES, int)
+#define AOM_CTRL_AV1E_SET_COLOR_PRIMARIES
+
+AOM_CTRL_USE_TYPE(AV1E_SET_TRANSFER_CHARACTERISTICS, int)
+#define AOM_CTRL_AV1E_SET_TRANSFER_CHARACTERISTICS
+
+AOM_CTRL_USE_TYPE(AV1E_SET_MATRIX_COEFFICIENTS, int)
+#define AOM_CTRL_AV1E_SET_MATRIX_COEFFICIENTS
+
+AOM_CTRL_USE_TYPE(AV1E_SET_CHROMA_SAMPLE_POSITION, int)
+#define AOM_CTRL_AV1E_SET_CHROMA_SAMPLE_POSITION
+
+AOM_CTRL_USE_TYPE(AV1E_SET_MIN_GF_INTERVAL, unsigned int)
+#define AOM_CTRL_AV1E_SET_MIN_GF_INTERVAL
+
+AOM_CTRL_USE_TYPE(AV1E_SET_MAX_GF_INTERVAL, unsigned int)
+#define AOM_CTRL_AV1E_SET_MAX_GF_INTERVAL
+
+AOM_CTRL_USE_TYPE(AV1E_GET_ACTIVEMAP, aom_active_map_t *)
+#define AOM_CTRL_AV1E_GET_ACTIVEMAP
+
+AOM_CTRL_USE_TYPE(AV1E_SET_COLOR_RANGE, int)
+#define AOM_CTRL_AV1E_SET_COLOR_RANGE
+
+#define AOM_CTRL_AV1E_SET_RENDER_SIZE
+AOM_CTRL_USE_TYPE(AV1E_SET_RENDER_SIZE, int *)
+
+AOM_CTRL_USE_TYPE(AV1E_SET_SUPERBLOCK_SIZE, unsigned int)
+#define AOM_CTRL_AV1E_SET_SUPERBLOCK_SIZE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_TARGET_LEVEL, unsigned int)
+#define AOM_CTRL_AV1E_SET_TARGET_LEVEL
+
+AOM_CTRL_USE_TYPE(AV1E_GET_LEVEL, int *)
+#define AOM_CTRL_AV1E_GET_LEVEL
+
+AOM_CTRL_USE_TYPE(AV1E_SET_ANS_WINDOW_SIZE_LOG2, unsigned int)
+#define AOM_CTRL_AV1E_SET_ANS_WINDOW_SIZE_LOG2
+
+AOM_CTRL_USE_TYPE(AV1E_SET_SINGLE_TILE_DECODING, unsigned int)
+#define AOM_CTRL_AV1E_SET_SINGLE_TILE_DECODING
+
+AOM_CTRL_USE_TYPE(AV1E_ENABLE_MOTION_VECTOR_UNIT_TEST, unsigned int)
+#define AOM_CTRL_AV1E_ENABLE_MOTION_VECTOR_UNIT_TEST
+
+AOM_CTRL_USE_TYPE(AV1E_SET_FILM_GRAIN_TEST_VECTOR, unsigned int)
+#define AOM_CTRL_AV1E_SET_FILM_GRAIN_TEST_VECTOR
+
+AOM_CTRL_USE_TYPE(AV1E_SET_FILM_GRAIN_TABLE, const char *)
+#define AOM_CTRL_AV1E_SET_FILM_GRAIN_TABLE
+
+AOM_CTRL_USE_TYPE(AV1E_SET_CDF_UPDATE_MODE, int)
+#define AOM_CTRL_AV1E_SET_CDF_UPDATE_MODE
+
+#ifdef CONFIG_DENOISE
+AOM_CTRL_USE_TYPE(AV1E_SET_DENOISE_NOISE_LEVEL, int);
+#define AOM_CTRL_AV1E_SET_DENOISE_NOISE_LEVEL
+
+AOM_CTRL_USE_TYPE(AV1E_SET_DENOISE_BLOCK_SIZE, unsigned int);
+#define AOM_CTRL_AV1E_SET_DENOISE_BLOCK_SIZE
+#endif
+
+AOM_CTRL_USE_TYPE(AV1E_SET_CHROMA_SUBSAMPLING_X, unsigned int)
+#define AOM_CTRL_AV1E_SET_CHROMA_SUBSAMPLING_X
+
+AOM_CTRL_USE_TYPE(AV1E_SET_CHROMA_SUBSAMPLING_Y, unsigned int)
+#define AOM_CTRL_AV1E_SET_CHROMA_SUBSAMPLING_Y
+
+AOM_CTRL_USE_TYPE(AV1E_SET_REDUCED_TX_TYPE_SET, unsigned int)
+#define AOM_CTRL_AV1E_SET_REDUCED_TX_TYPE_SET
+
+AOM_CTRL_USE_TYPE(AV1E_SET_INTRA_DCT_ONLY, unsigned int)
+#define AOM_CTRL_AV1E_SET_INTRA_DCT_ONLY
+
+AOM_CTRL_USE_TYPE(AV1E_SET_INTER_DCT_ONLY, unsigned int)
+#define AOM_CTRL_AV1E_SET_INTER_DCT_ONLY
+
+AOM_CTRL_USE_TYPE(AV1E_SET_INTRA_DEFAULT_TX_ONLY, unsigned int)
+#define AOM_CTRL_AV1E_SET_INTRA_DEFAULT_TX_ONLY
+
+AOM_CTRL_USE_TYPE(AV1E_SET_QUANT_B_ADAPT, unsigned int)
+#define AOM_CTRL_AV1E_SET_QUANT_B_ADAPT
+
+AOM_CTRL_USE_TYPE(AV1E_SET_GF_MAX_PYRAMID_HEIGHT, unsigned int)
+#define AOM_CTRL_AV1E_SET_GF_MAX_PYRAMID_HEIGHT
+
+AOM_CTRL_USE_TYPE(AV1E_SET_MAX_REFERENCE_FRAMES, unsigned int)
+#define AOM_CTRL_AV1E_SET_MAX_REFERENCE_FRAMES
+
+AOM_CTRL_USE_TYPE(AV1E_SET_REDUCED_REFERENCE_SET, unsigned int)
+#define AOM_CTRL_AV1E_SET_REDUCED_REFERENCE_SET
+
+AOM_CTRL_USE_TYPE(AV1E_SET_COEFF_COST_UPD_FREQ, unsigned int)
+#define AOM_CTRL_AV1E_SET_COEFF_COST_UPD_FREQ
+
+AOM_CTRL_USE_TYPE(AV1E_SET_MODE_COST_UPD_FREQ, unsigned int)
+#define AOM_CTRL_AV1E_SET_MODE_COST_UPD_FREQ
+
+/*!\endcond */
+/*! @} - end defgroup aom_encoder */
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_AOMCX_H_

diff --git a/libav1/aom/aomdx.h b/libav1/aom/aomdx.h
new file mode 100644
index 0000000..c71eaf9
--- /dev/null
+++ b/libav1/aom/aomdx.h

@@ -0,0 +1,323 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\defgroup aom_decoder AOMedia AOM/AV1 Decoder
+ * \ingroup aom
+ *
+ * @{
+ */
+/*!\file
+ * \brief Provides definitions for using AOM or AV1 within the aom Decoder
+ *        interface.
+ */
+#ifndef AOM_AOM_AOMDX_H_
+#define AOM_AOM_AOMDX_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Include controls common to both the encoder and decoder */
+#include "aom/aom.h"
+
+/*!\name Algorithm interface for AV1
+ *
+ * This interface provides the capability to decode AV1 streams.
+ * @{
+ */
+extern aom_codec_iface_t aom_codec_av1_dx_algo;
+extern aom_codec_iface_t *aom_codec_av1_dx(void);
+/*!@} - end algorithm interface member group*/
+
+/** Data structure that stores bit accounting for debug
+ */
+typedef struct Accounting Accounting;
+
+#ifndef AOM_INSPECTION_H_
+/** Callback that inspects decoder frame data.
+ */
+typedef void (*aom_inspect_cb)(void *decoder, void *ctx);
+
+#endif
+
+/*!\brief Structure to hold inspection callback and context.
+ *
+ * Defines a structure to hold the inspection callback function and calling
+ * context.
+ */
+typedef struct aom_inspect_init {
+  /*! Inspection callback. */
+  aom_inspect_cb inspect_cb;
+
+  /*! Inspection context. */
+  void *inspect_ctx;
+} aom_inspect_init;
+
+/*!\brief Structure to collect a buffer index when inspecting.
+ *
+ * Defines a structure to hold the buffer and return an index
+ * when calling decode from inspect. This enables us to decode
+ * non showable sub frames.
+ */
+typedef struct {
+  /*! Pointer for new position in compressed buffer after decoding 1 OBU. */
+  const unsigned char *buf;
+  /*! Index into reference buffer array to see result of decoding 1 OBU. */
+  int idx;
+  /*! Is a show existing frame. */
+  int show_existing;
+} Av1DecodeReturn;
+
+/*!\brief Structure to hold a tile's start address and size in the bitstream.
+ *
+ * Defines a structure to hold a tile's start address and size in the bitstream.
+ */
+typedef struct aom_tile_data {
+  /*! Tile data size. */
+  size_t coded_tile_data_size;
+  /*! Tile's start address. */
+  const void *coded_tile_data;
+  /*! Extra size information. */
+  size_t extra_size;
+} aom_tile_data;
+
+/*!\brief Structure to hold the external reference frame pointer.
+ *
+ * Define a structure to hold the external reference frame pointer.
+ */
+typedef struct av1_ext_ref_frame {
+  /*! Start pointer of external references. */
+  aom_image_t *img;
+  /*! Number of available external references. */
+  int num;
+} av1_ext_ref_frame_t;
+
+/*!\enum aom_dec_control_id
+ * \brief AOM decoder control functions
+ *
+ * This set of macros define the control functions available for the AOM
+ * decoder interface.
+ *
+ * \sa #aom_codec_control
+ */
+enum aom_dec_control_id {
+  /** control function to get info on which reference frames were updated
+   *  by the last decode
+   */
+  AOMD_GET_LAST_REF_UPDATES = AOM_DECODER_CTRL_ID_START,
+
+  /** check if the indicated frame is corrupted */
+  AOMD_GET_FRAME_CORRUPTED,
+
+  /** control function to get info on which reference frames were used
+   *  by the last decode
+   */
+  AOMD_GET_LAST_REF_USED,
+
+  /** control function to get the dimensions that the current frame is decoded
+   * at. This may be different to the intended display size for the frame as
+   * specified in the wrapper or frame header (see AV1D_GET_DISPLAY_SIZE). */
+  AV1D_GET_FRAME_SIZE,
+
+  /** control function to get the current frame's intended display dimensions
+   * (as specified in the wrapper or frame header). This may be different to
+   * the decoded dimensions of this frame (see AV1D_GET_FRAME_SIZE). */
+  AV1D_GET_DISPLAY_SIZE,
+
+  /** control function to get the bit depth of the stream. */
+  AV1D_GET_BIT_DEPTH,
+
+  /** control function to get the image format of the stream. */
+  AV1D_GET_IMG_FORMAT,
+
+  /** control function to get the size of the tile. */
+  AV1D_GET_TILE_SIZE,
+
+  /** control function to get the tile count in a tile list. */
+  AV1D_GET_TILE_COUNT,
+
+  /** control function to set the byte alignment of the planes in the reference
+   * buffers. Valid values are power of 2, from 32 to 1024. A value of 0 sets
+   * legacy alignment. I.e. Y plane is aligned to 32 bytes, U plane directly
+   * follows Y plane, and V plane directly follows U plane. Default value is 0.
+   */
+  AV1_SET_BYTE_ALIGNMENT,
+
+  /** control function to invert the decoding order to from right to left. The
+   * function is used in a test to confirm the decoding independence of tile
+   * columns. The function may be used in application where this order
+   * of decoding is desired.
+   *
+   * TODO(yaowu): Rework the unit test that uses this control, and in a future
+   *              release, this test-only control shall be removed.
+   */
+  AV1_INVERT_TILE_DECODE_ORDER,
+
+  /** control function to set the skip loop filter flag. Valid values are
+   * integers. The decoder will skip the loop filter when its value is set to
+   * nonzero. If the loop filter is skipped the decoder may accumulate decode
+   * artifacts. The default value is 0.
+   */
+  AV1_SET_SKIP_LOOP_FILTER,
+
+  /** control function to retrieve a pointer to the Accounting struct.  When
+   * compiled without --enable-accounting, this returns AOM_CODEC_INCAPABLE.
+   * If called before a frame has been decoded, this returns AOM_CODEC_ERROR.
+   * The caller should ensure that AOM_CODEC_OK is returned before attempting
+   * to dereference the Accounting pointer.
+   */
+  AV1_GET_ACCOUNTING,
+
+  /** control function to get last decoded frame quantizer. Returned value uses
+   * internal quantizer scale defined by the codec.
+   */
+  AOMD_GET_LAST_QUANTIZER,
+
+  /** control function to set the range of tile decoding. A value that is
+   * greater and equal to zero indicates only the specific row/column is
+   * decoded. A value that is -1 indicates the whole row/column is decoded.
+   * A special case is both values are -1 that means the whole frame is
+   * decoded.
+   */
+  AV1_SET_DECODE_TILE_ROW,
+  AV1_SET_DECODE_TILE_COL,
+  /** control function to set the tile coding mode. A value that is equal to
+   *  zero indicates the tiles are coded in normal tile mode. A value that is
+   *  1 indicates the tiles are coded in large-scale tile mode.
+   */
+  AV1_SET_TILE_MODE,
+  /** control function to get the frame header information of an encoded frame
+   * in the bitstream. This provides a way to access a frame's header data.
+   */
+  AV1D_GET_FRAME_HEADER_INFO,
+  /** control function to get the start address and size of a tile in the coded
+   * bitstream. This provides a way to access a specific tile's bitstream data.
+   */
+  AV1D_GET_TILE_DATA,
+  /** control function to set the external references' pointers in the decoder.
+   *  This is used while decoding the tile list OBU in large-scale tile coding
+   *  mode.
+   */
+  AV1D_SET_EXT_REF_PTR,
+  /** control function to enable the ext-tile software debug and testing code in
+   * the decoder.
+   */
+  AV1D_EXT_TILE_DEBUG,
+
+  /** control function to enable the row based multi-threading of decoding. A
+   * value that is equal to 1 indicates that row based multi-threading is
+   * enabled.
+   */
+  AV1D_SET_ROW_MT,
+
+  /** control function to indicate whether bitstream is in Annex-B format. */
+  AV1D_SET_IS_ANNEXB,
+
+  /** control function to indicate which operating point to use. A scalable
+   *  stream may define multiple operating points, each of which defines a
+   *  set of temporal and spatial layers to be processed. The operating point
+   *  index may take a value between 0 and operating_points_cnt_minus_1 (which
+   *  is at most 31).
+   */
+  AV1D_SET_OPERATING_POINT,
+
+  /** control function to indicate whether to output one frame per temporal
+   *  unit (the default), or one frame per spatial layer.
+   *  In a scalable stream, each temporal unit corresponds to a single "frame"
+   *  of video, and within a temporal unit there may be multiple spatial layers
+   *  with different versions of that frame.
+   *  For video playback, only the highest-quality version (within the
+   *  selected operating point) is needed, but for some use cases it is useful
+   *  to have access to multiple versions of a frame when they are available.
+   */
+  AV1D_SET_OUTPUT_ALL_LAYERS,
+
+  /** control function to set an aom_inspect_cb callback that is invoked each
+   * time a frame is decoded.  When compiled without --enable-inspection, this
+   * returns AOM_CODEC_INCAPABLE.
+   */
+  AV1_SET_INSPECTION_CALLBACK,
+
+  /** control function to set the skip film grain flag. Valid values are
+   * integers. The decoder will skip the film grain when its value is set to
+   * nonzero. The default value is 0.
+   */
+  AV1D_SET_SKIP_FILM_GRAIN,
+
+  AOM_DECODER_CTRL_ID_MAX,
+};
+
+/*!\cond */
+/*!\brief AOM decoder control function parameter type
+ *
+ * Defines the data types that AOMD control functions take. Note that
+ * additional common controls are defined in aom.h
+ *
+ */
+
+AOM_CTRL_USE_TYPE(AOMD_GET_LAST_REF_UPDATES, int *)
+#define AOM_CTRL_AOMD_GET_LAST_REF_UPDATES
+AOM_CTRL_USE_TYPE(AOMD_GET_FRAME_CORRUPTED, int *)
+#define AOM_CTRL_AOMD_GET_FRAME_CORRUPTED
+AOM_CTRL_USE_TYPE(AOMD_GET_LAST_REF_USED, int *)
+#define AOM_CTRL_AOMD_GET_LAST_REF_USED
+AOM_CTRL_USE_TYPE(AOMD_GET_LAST_QUANTIZER, int *)
+#define AOM_CTRL_AOMD_GET_LAST_QUANTIZER
+AOM_CTRL_USE_TYPE(AV1D_GET_DISPLAY_SIZE, int *)
+#define AOM_CTRL_AV1D_GET_DISPLAY_SIZE
+AOM_CTRL_USE_TYPE(AV1D_GET_BIT_DEPTH, unsigned int *)
+#define AOM_CTRL_AV1D_GET_BIT_DEPTH
+AOM_CTRL_USE_TYPE(AV1D_GET_IMG_FORMAT, aom_img_fmt_t *)
+#define AOM_CTRL_AV1D_GET_IMG_FORMAT
+AOM_CTRL_USE_TYPE(AV1D_GET_TILE_SIZE, unsigned int *)
+#define AOM_CTRL_AV1D_GET_TILE_SIZE
+AOM_CTRL_USE_TYPE(AV1D_GET_TILE_COUNT, unsigned int *)
+#define AOM_CTRL_AV1D_GET_TILE_COUNT
+AOM_CTRL_USE_TYPE(AV1D_GET_FRAME_SIZE, int *)
+#define AOM_CTRL_AV1D_GET_FRAME_SIZE
+AOM_CTRL_USE_TYPE(AV1_INVERT_TILE_DECODE_ORDER, int)
+#define AOM_CTRL_AV1_INVERT_TILE_DECODE_ORDER
+AOM_CTRL_USE_TYPE(AV1_GET_ACCOUNTING, Accounting **)
+#define AOM_CTRL_AV1_GET_ACCOUNTING
+AOM_CTRL_USE_TYPE(AV1_SET_DECODE_TILE_ROW, int)
+#define AOM_CTRL_AV1_SET_DECODE_TILE_ROW
+AOM_CTRL_USE_TYPE(AV1_SET_DECODE_TILE_COL, int)
+#define AOM_CTRL_AV1_SET_DECODE_TILE_COL
+AOM_CTRL_USE_TYPE(AV1_SET_TILE_MODE, unsigned int)
+#define AOM_CTRL_AV1_SET_TILE_MODE
+AOM_CTRL_USE_TYPE(AV1D_GET_FRAME_HEADER_INFO, aom_tile_data *)
+#define AOM_CTRL_AV1D_GET_FRAME_HEADER_INFO
+AOM_CTRL_USE_TYPE(AV1D_GET_TILE_DATA, aom_tile_data *)
+#define AOM_CTRL_AV1D_GET_TILE_DATA
+AOM_CTRL_USE_TYPE(AV1D_SET_EXT_REF_PTR, av1_ext_ref_frame_t *)
+#define AOM_CTRL_AV1D_SET_EXT_REF_PTR
+AOM_CTRL_USE_TYPE(AV1D_EXT_TILE_DEBUG, unsigned int)
+#define AOM_CTRL_AV1D_EXT_TILE_DEBUG
+AOM_CTRL_USE_TYPE(AV1D_SET_ROW_MT, unsigned int)
+#define AOM_CTRL_AV1D_SET_ROW_MT
+AOM_CTRL_USE_TYPE(AV1D_SET_SKIP_FILM_GRAIN, int)
+#define AOM_CTRL_AV1D_SET_SKIP_FILM_GRAIN
+AOM_CTRL_USE_TYPE(AV1D_SET_IS_ANNEXB, unsigned int)
+#define AOM_CTRL_AV1D_SET_IS_ANNEXB
+AOM_CTRL_USE_TYPE(AV1D_SET_OPERATING_POINT, int)
+#define AOM_CTRL_AV1D_SET_OPERATING_POINT
+AOM_CTRL_USE_TYPE(AV1D_SET_OUTPUT_ALL_LAYERS, int)
+#define AOM_CTRL_AV1D_SET_OUTPUT_ALL_LAYERS
+AOM_CTRL_USE_TYPE(AV1_SET_INSPECTION_CALLBACK, aom_inspect_init *)
+#define AOM_CTRL_AV1_SET_INSPECTION_CALLBACK
+/*!\endcond */
+/*! @} - end defgroup aom_decoder */
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_AOMDX_H_

diff --git a/libav1/aom/internal/aom_codec_internal.h b/libav1/aom/internal/aom_codec_internal.h
new file mode 100644
index 0000000..185bcad
--- /dev/null
+++ b/libav1/aom/internal/aom_codec_internal.h

@@ -0,0 +1,442 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\file
+ * \brief Describes the decoder algorithm interface for algorithm
+ *        implementations.
+ *
+ * This file defines the private structures and data types that are only
+ * relevant to implementing an algorithm, as opposed to using it.
+ *
+ * To create a decoder algorithm class, an interface structure is put
+ * into the global namespace:
+ *     <pre>
+ *     my_codec.c:
+ *       aom_codec_iface_t my_codec = {
+ *           "My Codec v1.0",
+ *           AOM_CODEC_ALG_ABI_VERSION,
+ *           ...
+ *       };
+ *     </pre>
+ *
+ * An application instantiates a specific decoder instance by using
+ * aom_codec_init() and a pointer to the algorithm's interface structure:
+ *     <pre>
+ *     my_app.c:
+ *       extern aom_codec_iface_t my_codec;
+ *       {
+ *           aom_codec_ctx_t algo;
+ *           res = aom_codec_init(&algo, &my_codec);
+ *       }
+ *     </pre>
+ *
+ * Once initialized, the instance is managed using other functions from
+ * the aom_codec_* family.
+ */
+#ifndef AOM_AOM_INTERNAL_AOM_CODEC_INTERNAL_H_
+#define AOM_AOM_INTERNAL_AOM_CODEC_INTERNAL_H_
+#include "../aom_decoder.h"
+#include "../aom_encoder.h"
+#include <stdarg.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*!\brief Current ABI version number
+ *
+ * \internal
+ * If this file is altered in any way that changes the ABI, this value
+ * must be bumped.  Examples include, but are not limited to, changing
+ * types, removing or reassigning enums, adding/removing/rearranging
+ * fields to structures
+ */
+#define AOM_CODEC_INTERNAL_ABI_VERSION (5) /**<\hideinitializer*/
+
+typedef struct aom_codec_alg_priv aom_codec_alg_priv_t;
+typedef struct aom_codec_priv_enc_mr_cfg aom_codec_priv_enc_mr_cfg_t;
+
+/*!\brief init function pointer prototype
+ *
+ * Performs algorithm-specific initialization of the decoder context. This
+ * function is called by the generic aom_codec_init() wrapper function, so
+ * plugins implementing this interface may trust the input parameters to be
+ * properly initialized.
+ *
+ * \param[in] ctx   Pointer to this instance's context
+ * \retval #AOM_CODEC_OK
+ *     The input stream was recognized and decoder initialized.
+ * \retval #AOM_CODEC_MEM_ERROR
+ *     Memory operation failed.
+ */
+typedef aom_codec_err_t (*aom_codec_init_fn_t)(
+    aom_codec_ctx_t *ctx, aom_codec_priv_enc_mr_cfg_t *data);
+
+/*!\brief destroy function pointer prototype
+ *
+ * Performs algorithm-specific destruction of the decoder context. This
+ * function is called by the generic aom_codec_destroy() wrapper function,
+ * so plugins implementing this interface may trust the input parameters
+ * to be properly initialized.
+ *
+ * \param[in] ctx   Pointer to this instance's context
+ * \retval #AOM_CODEC_OK
+ *     The input stream was recognized and decoder initialized.
+ * \retval #AOM_CODEC_MEM_ERROR
+ *     Memory operation failed.
+ */
+typedef aom_codec_err_t (*aom_codec_destroy_fn_t)(aom_codec_alg_priv_t *ctx);
+
+/*!\brief parse stream info function pointer prototype
+ *
+ * Performs high level parsing of the bitstream. This function is called by the
+ * generic aom_codec_peek_stream_info() wrapper function, so plugins
+ * implementing this interface may trust the input parameters to be properly
+ * initialized.
+ *
+ * \param[in]      data    Pointer to a block of data to parse
+ * \param[in]      data_sz Size of the data buffer
+ * \param[in,out]  si      Pointer to stream info to update. The is_annexb
+ *                         member \ref MUST be properly initialized. This
+ *                         function sets the rest of the members.
+ *
+ * \retval #AOM_CODEC_OK
+ *     Bitstream is parsable and stream information updated
+ */
+typedef aom_codec_err_t (*aom_codec_peek_si_fn_t)(const uint8_t *data,
+                                                  size_t data_sz,
+                                                  aom_codec_stream_info_t *si);
+
+/*!\brief Return information about the current stream.
+ *
+ * Returns information about the stream that has been parsed during decoding.
+ *
+ * \param[in]      ctx     Pointer to this instance's context
+ * \param[in,out]  si      Pointer to stream info to update
+ *
+ * \retval #AOM_CODEC_OK
+ *     Bitstream is parsable and stream information updated
+ */
+typedef aom_codec_err_t (*aom_codec_get_si_fn_t)(aom_codec_alg_priv_t *ctx,
+                                                 aom_codec_stream_info_t *si);
+
+/*!\brief control function pointer prototype
+ *
+ * This function is used to exchange algorithm specific data with the decoder
+ * instance. This can be used to implement features specific to a particular
+ * algorithm.
+ *
+ * This function is called by the generic aom_codec_control() wrapper
+ * function, so plugins implementing this interface may trust the input
+ * parameters to be properly initialized. However,  this interface does not
+ * provide type safety for the exchanged data or assign meanings to the
+ * control codes. Those details should be specified in the algorithm's
+ * header file. In particular, the ctrl_id parameter is guaranteed to exist
+ * in the algorithm's control mapping table, and the data parameter may be NULL.
+ *
+ *
+ * \param[in]     ctx              Pointer to this instance's context
+ * \param[in]     ctrl_id          Algorithm specific control identifier
+ * \param[in,out] data             Data to exchange with algorithm instance.
+ *
+ * \retval #AOM_CODEC_OK
+ *     The internal state data was deserialized.
+ */
+typedef aom_codec_err_t (*aom_codec_control_fn_t)(aom_codec_alg_priv_t *ctx,
+                                                  va_list ap);
+
+/*!\brief control function pointer mapping
+ *
+ * This structure stores the mapping between control identifiers and
+ * implementing functions. Each algorithm provides a list of these
+ * mappings. This list is searched by the aom_codec_control() wrapper
+ * function to determine which function to invoke. The special
+ * value {0, NULL} is used to indicate end-of-list, and must be
+ * present. The special value {0, <non-null>} can be used as a catch-all
+ * mapping. This implies that ctrl_id values chosen by the algorithm
+ * \ref MUST be non-zero.
+ */
+typedef const struct aom_codec_ctrl_fn_map {
+  int ctrl_id;
+  aom_codec_control_fn_t fn;
+} aom_codec_ctrl_fn_map_t;
+
+/*!\brief decode data function pointer prototype
+ *
+ * Processes a buffer of coded data. If the processing results in a new
+ * decoded frame becoming available, #AOM_CODEC_CB_PUT_SLICE and
+ * #AOM_CODEC_CB_PUT_FRAME events are generated as appropriate. This
+ * function is called by the generic aom_codec_decode() wrapper function,
+ * so plugins implementing this interface may trust the input parameters
+ * to be properly initialized.
+ *
+ * \param[in] ctx          Pointer to this instance's context
+ * \param[in] data         Pointer to this block of new coded data. If
+ *                         NULL, a #AOM_CODEC_CB_PUT_FRAME event is posted
+ *                         for the previously decoded frame.
+ * \param[in] data_sz      Size of the coded data, in bytes.
+ *
+ * \return Returns #AOM_CODEC_OK if the coded data was processed completely
+ *         and future pictures can be decoded without error. Otherwise,
+ *         see the descriptions of the other error codes in ::aom_codec_err_t
+ *         for recoverability capabilities.
+ */
+typedef aom_codec_err_t (*aom_codec_decode_fn_t)(aom_codec_alg_priv_t *ctx,
+                                                 const uint8_t *data,
+                                                 size_t data_sz,
+                                                 void *user_priv);
+
+/*!\brief Decoded frames iterator
+ *
+ * Iterates over a list of the frames available for display. The iterator
+ * storage should be initialized to NULL to start the iteration. Iteration is
+ * complete when this function returns NULL.
+ *
+ * The list of available frames becomes valid upon completion of the
+ * aom_codec_decode call, and remains valid until the next call to
+ * aom_codec_decode.
+ *
+ * \param[in]     ctx      Pointer to this instance's context
+ * \param[in out] iter     Iterator storage, initialized to NULL
+ *
+ * \return Returns a pointer to an image, if one is ready for display. Frames
+ *         produced will always be in PTS (presentation time stamp) order.
+ */
+typedef aom_image_t *(*aom_codec_get_frame_fn_t)(aom_codec_alg_priv_t *ctx,
+                                                 aom_codec_iter_t *iter);
+
+/*!\brief Pass in external frame buffers for the decoder to use.
+ *
+ * Registers functions to be called when libaom needs a frame buffer
+ * to decode the current frame and a function to be called when libaom does
+ * not internally reference the frame buffer. This set function must
+ * be called before the first call to decode or libaom will assume the
+ * default behavior of allocating frame buffers internally.
+ *
+ * \param[in] ctx          Pointer to this instance's context
+ * \param[in] cb_get       Pointer to the get callback function
+ * \param[in] cb_release   Pointer to the release callback function
+ * \param[in] cb_priv      Callback's private data
+ *
+ * \retval #AOM_CODEC_OK
+ *     External frame buffers will be used by libaom.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ *     One or more of the callbacks were NULL.
+ * \retval #AOM_CODEC_ERROR
+ *     Decoder context not initialized, or algorithm not capable of
+ *     using external frame buffers.
+ *
+ * \note
+ * When decoding AV1, the application may be required to pass in at least
+ * #AOM_MAXIMUM_WORK_BUFFERS external frame
+ * buffers.
+ */
+typedef aom_codec_err_t (*aom_codec_set_fb_fn_t)(
+    aom_codec_alg_priv_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
+    aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv);
+
+typedef aom_codec_err_t (*aom_codec_encode_fn_t)(aom_codec_alg_priv_t *ctx,
+                                                 const aom_image_t *img,
+                                                 aom_codec_pts_t pts,
+                                                 unsigned long duration,
+                                                 aom_enc_frame_flags_t flags);
+typedef const aom_codec_cx_pkt_t *(*aom_codec_get_cx_data_fn_t)(
+    aom_codec_alg_priv_t *ctx, aom_codec_iter_t *iter);
+
+typedef aom_codec_err_t (*aom_codec_enc_config_set_fn_t)(
+    aom_codec_alg_priv_t *ctx, const aom_codec_enc_cfg_t *cfg);
+typedef aom_fixed_buf_t *(*aom_codec_get_global_headers_fn_t)(
+    aom_codec_alg_priv_t *ctx);
+
+typedef aom_image_t *(*aom_codec_get_preview_frame_fn_t)(
+    aom_codec_alg_priv_t *ctx);
+
+typedef aom_codec_err_t (*aom_codec_enc_mr_get_mem_loc_fn_t)(
+    const aom_codec_enc_cfg_t *cfg, void **mem_loc);
+
+/*!\brief usage configuration mapping
+ *
+ * This structure stores the mapping between usage identifiers and
+ * configuration structures. Each algorithm provides a list of these
+ * mappings. This list is searched by the aom_codec_enc_config_default()
+ * wrapper function to determine which config to return. The special value
+ * {-1, {0}} is used to indicate end-of-list, and must be present. At least
+ * one mapping must be present, in addition to the end-of-list.
+ *
+ */
+typedef const struct aom_codec_enc_cfg_map {
+  int usage;
+  aom_codec_enc_cfg_t cfg;
+} aom_codec_enc_cfg_map_t;
+
+/*!\brief Decoder algorithm interface interface
+ *
+ * All decoders \ref MUST expose a variable of this type.
+ */
+struct aom_codec_iface {
+  const char *name;                   /**< Identification String  */
+  int abi_version;                    /**< Implemented ABI version */
+  aom_codec_caps_t caps;              /**< Decoder capabilities */
+  aom_codec_init_fn_t init;           /**< \copydoc ::aom_codec_init_fn_t */
+  aom_codec_destroy_fn_t destroy;     /**< \copydoc ::aom_codec_destroy_fn_t */
+  aom_codec_ctrl_fn_map_t *ctrl_maps; /**< \copydoc ::aom_codec_ctrl_fn_map_t */
+  struct aom_codec_dec_iface {
+    aom_codec_peek_si_fn_t peek_si; /**< \copydoc ::aom_codec_peek_si_fn_t */
+    aom_codec_get_si_fn_t get_si;   /**< \copydoc ::aom_codec_get_si_fn_t */
+    aom_codec_decode_fn_t decode;   /**< \copydoc ::aom_codec_decode_fn_t */
+    aom_codec_get_frame_fn_t
+        get_frame;                   /**< \copydoc ::aom_codec_get_frame_fn_t */
+    aom_codec_set_fb_fn_t set_fb_fn; /**< \copydoc ::aom_codec_set_fb_fn_t */
+    aom_codec_set_fb_fn_t set_outb_fn; /**< \copydoc ::aom_codec_set_outb_fn_t */
+  } dec;
+  struct aom_codec_enc_iface {
+    int cfg_map_count;
+    aom_codec_enc_cfg_map_t
+        *cfg_maps;                /**< \copydoc ::aom_codec_enc_cfg_map_t */
+    aom_codec_encode_fn_t encode; /**< \copydoc ::aom_codec_encode_fn_t */
+    aom_codec_get_cx_data_fn_t
+        get_cx_data; /**< \copydoc ::aom_codec_get_cx_data_fn_t */
+    aom_codec_enc_config_set_fn_t
+        cfg_set; /**< \copydoc ::aom_codec_enc_config_set_fn_t */
+    aom_codec_get_global_headers_fn_t
+        get_glob_hdrs; /**< \copydoc ::aom_codec_get_global_headers_fn_t */
+    aom_codec_get_preview_frame_fn_t
+        get_preview; /**< \copydoc ::aom_codec_get_preview_frame_fn_t */
+    aom_codec_enc_mr_get_mem_loc_fn_t
+        mr_get_mem_loc; /**< \copydoc ::aom_codec_enc_mr_get_mem_loc_fn_t */
+  } enc;
+};
+
+/*!\brief Callback function pointer / user data pair storage */
+typedef struct aom_codec_priv_cb_pair {
+  union {
+    aom_codec_put_frame_cb_fn_t put_frame;
+    aom_codec_put_slice_cb_fn_t put_slice;
+  } u;
+  void *user_priv;
+} aom_codec_priv_cb_pair_t;
+
+/*!\brief Instance private storage
+ *
+ * This structure is allocated by the algorithm's init function. It can be
+ * extended in one of two ways. First, a second, algorithm specific structure
+ * can be allocated and the priv member pointed to it. Alternatively, this
+ * structure can be made the first member of the algorithm specific structure,
+ * and the pointer cast to the proper type.
+ */
+struct aom_codec_priv {
+  const char *err_detail;
+  aom_codec_flags_t init_flags;
+  struct {
+    aom_codec_priv_cb_pair_t put_frame_cb;
+    aom_codec_priv_cb_pair_t put_slice_cb;
+  } dec;
+  struct {
+    aom_fixed_buf_t cx_data_dst_buf;
+    unsigned int cx_data_pad_before;
+    unsigned int cx_data_pad_after;
+    aom_codec_cx_pkt_t cx_data_pkt;
+    unsigned int total_encoders;
+  } enc;
+};
+
+/*
+ * Multi-resolution encoding internal configuration
+ */
+struct aom_codec_priv_enc_mr_cfg {
+  unsigned int mr_total_resolutions;
+  unsigned int mr_encoder_id;
+  struct aom_rational mr_down_sampling_factor;
+  void *mr_low_res_mode_info;
+};
+
+#undef AOM_CTRL_USE_TYPE
+#define AOM_CTRL_USE_TYPE(id, typ) \
+  static AOM_INLINE typ id##__value(va_list args) { return va_arg(args, typ); }
+
+#undef AOM_CTRL_USE_TYPE_DEPRECATED
+#define AOM_CTRL_USE_TYPE_DEPRECATED(id, typ) \
+  static AOM_INLINE typ id##__value(va_list args) { return va_arg(args, typ); }
+
+#define CAST(id, arg) id##__value(arg)
+
+/* CODEC_INTERFACE convenience macro
+ *
+ * By convention, each codec interface is a struct with extern linkage, where
+ * the symbol is suffixed with _algo. A getter function is also defined to
+ * return a pointer to the struct, since in some cases it's easier to work
+ * with text symbols than data symbols (see issue #169). This function has
+ * the same name as the struct, less the _algo suffix. The CODEC_INTERFACE
+ * macro is provided to define this getter function automatically.
+ */
+#define CODEC_INTERFACE(id)                          \
+  aom_codec_iface_t *id(void) { return &id##_algo; } \
+  aom_codec_iface_t id##_algo
+
+/* Internal Utility Functions
+ *
+ * The following functions are intended to be used inside algorithms as
+ * utilities for manipulating aom_codec_* data structures.
+ */
+struct aom_codec_pkt_list {
+  unsigned int cnt;
+  unsigned int max;
+  struct aom_codec_cx_pkt pkts[1];
+};
+
+#define aom_codec_pkt_list_decl(n)     \
+  union {                              \
+    struct aom_codec_pkt_list head;    \
+    struct {                           \
+      struct aom_codec_pkt_list head;  \
+      struct aom_codec_cx_pkt pkts[n]; \
+    } alloc;                           \
+  }
+
+#define aom_codec_pkt_list_init(m) \
+  (m)->alloc.head.cnt = 0,         \
+  (m)->alloc.head.max = sizeof((m)->alloc.pkts) / sizeof((m)->alloc.pkts[0])
+
+int aom_codec_pkt_list_add(struct aom_codec_pkt_list *,
+                           const struct aom_codec_cx_pkt *);
+
+const aom_codec_cx_pkt_t *aom_codec_pkt_list_get(
+    struct aom_codec_pkt_list *list, aom_codec_iter_t *iter);
+
+#include <stdio.h>
+#include <setjmp.h>
+
+struct aom_internal_error_info {
+  aom_codec_err_t error_code;
+  int has_detail;
+  char detail[80];
+  int setjmp;  // Boolean: whether 'jmp' is valid.
+  jmp_buf jmp;
+};
+
+#define CLANG_ANALYZER_NORETURN
+#if defined(__has_feature)
+#if __has_feature(attribute_analyzer_noreturn)
+#undef CLANG_ANALYZER_NORETURN
+#define CLANG_ANALYZER_NORETURN __attribute__((analyzer_noreturn))
+#endif
+#endif
+
+void aom_internal_error(struct aom_internal_error_info *info,
+                        aom_codec_err_t error, const char *fmt,
+                        ...) CLANG_ANALYZER_NORETURN;
+
+void aom_merge_corrupted_flag(int *corrupted, int value);
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_INTERNAL_AOM_CODEC_INTERNAL_H_

diff --git a/libav1/aom/src/aom_codec.c b/libav1/aom/src/aom_codec.c
new file mode 100644
index 0000000..733bffb
--- /dev/null
+++ b/libav1/aom/src/aom_codec.c

@@ -0,0 +1,157 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\file
+ * \brief Provides the high level interface to wrap decoder algorithms.
+ *
+ */
+#include <stdarg.h>
+#include <stdlib.h>
+
+#include "config/aom_config.h"
+#include "config/aom_version.h"
+
+#include "aom/aom_integer.h"
+#include "aom/internal/aom_codec_internal.h"
+
+#define SAVE_STATUS(ctx, var) (ctx ? (ctx->err = var) : var)
+
+int aom_codec_version(void) { return VERSION_PACKED; }
+
+const char *aom_codec_version_str(void) { return VERSION_STRING_NOSP; }
+
+const char *aom_codec_version_extra_str(void) { return VERSION_EXTRA; }
+
+const char *aom_codec_iface_name(aom_codec_iface_t *iface) {
+  return iface ? iface->name : "<invalid interface>";
+}
+
+const char *aom_codec_err_to_string(aom_codec_err_t err) {
+  switch (err) {
+    case AOM_CODEC_OK: return "Success";
+    case AOM_CODEC_ERROR: return "Unspecified internal error";
+    case AOM_CODEC_MEM_ERROR: return "Memory allocation error";
+    case AOM_CODEC_ABI_MISMATCH: return "ABI version mismatch";
+    case AOM_CODEC_INCAPABLE:
+      return "Codec does not implement requested capability";
+    case AOM_CODEC_UNSUP_BITSTREAM:
+      return "Bitstream not supported by this decoder";
+    case AOM_CODEC_UNSUP_FEATURE:
+      return "Bitstream required feature not supported by this decoder";
+    case AOM_CODEC_CORRUPT_FRAME: return "Corrupt frame detected";
+    case AOM_CODEC_INVALID_PARAM: return "Invalid parameter";
+    case AOM_CODEC_LIST_END: return "End of iterated list";
+  }
+
+  return "Unrecognized error code";
+}
+
+const char *aom_codec_error(aom_codec_ctx_t *ctx) {
+  return (ctx) ? aom_codec_err_to_string(ctx->err)
+               : aom_codec_err_to_string(AOM_CODEC_INVALID_PARAM);
+}
+
+const char *aom_codec_error_detail(aom_codec_ctx_t *ctx) {
+  if (ctx && ctx->err)
+    return ctx->priv ? ctx->priv->err_detail : ctx->err_detail;
+
+  return NULL;
+}
+
+aom_codec_err_t aom_codec_destroy(aom_codec_ctx_t *ctx) {
+  aom_codec_err_t res;
+
+  if (!ctx)
+    res = AOM_CODEC_INVALID_PARAM;
+  else if (!ctx->iface || !ctx->priv)
+    res = AOM_CODEC_ERROR;
+  else {
+    ctx->iface->destroy((aom_codec_alg_priv_t *)ctx->priv);
+
+    ctx->iface = NULL;
+    ctx->name = NULL;
+    ctx->priv = NULL;
+    res = AOM_CODEC_OK;
+  }
+
+  return SAVE_STATUS(ctx, res);
+}
+
+aom_codec_caps_t aom_codec_get_caps(aom_codec_iface_t *iface) {
+  return (iface) ? iface->caps : 0;
+}
+
+aom_codec_err_t aom_codec_control_(aom_codec_ctx_t *ctx, int ctrl_id, ...) {
+  aom_codec_err_t res;
+
+  if (!ctx || !ctrl_id)
+    res = AOM_CODEC_INVALID_PARAM;
+  else if (!ctx->iface || !ctx->priv || !ctx->iface->ctrl_maps)
+    res = AOM_CODEC_ERROR;
+  else {
+    aom_codec_ctrl_fn_map_t *entry;
+
+    res = AOM_CODEC_ERROR;
+
+    for (entry = ctx->iface->ctrl_maps; entry && entry->fn; entry++) {
+      if (!entry->ctrl_id || entry->ctrl_id == ctrl_id) {
+        va_list ap;
+
+        va_start(ap, ctrl_id);
+        res = entry->fn((aom_codec_alg_priv_t *)ctx->priv, ap);
+        va_end(ap);
+        break;
+      }
+    }
+  }
+
+  return SAVE_STATUS(ctx, res);
+}
+
+void aom_internal_error(struct aom_internal_error_info *info,
+                        aom_codec_err_t error, const char *fmt, ...) {
+  va_list ap;
+
+  info->error_code = error;
+  info->has_detail = 0;
+
+  if (fmt) {
+    size_t sz = sizeof(info->detail);
+
+    info->has_detail = 1;
+    va_start(ap, fmt);
+    vsnprintf(info->detail, sz - 1, fmt, ap);
+    va_end(ap);
+    info->detail[sz - 1] = '\0';
+  }
+
+  if (info->setjmp) longjmp(info->jmp, info->error_code);
+}
+
+void aom_merge_corrupted_flag(int *corrupted, int value) {
+  *corrupted |= value;
+}
+
+const char *aom_obu_type_to_string(OBU_TYPE type) {
+  switch (type) {
+    case OBU_SEQUENCE_HEADER: return "OBU_SEQUENCE_HEADER";
+    case OBU_TEMPORAL_DELIMITER: return "OBU_TEMPORAL_DELIMITER";
+    case OBU_FRAME_HEADER: return "OBU_FRAME_HEADER";
+    case OBU_REDUNDANT_FRAME_HEADER: return "OBU_REDUNDANT_FRAME_HEADER";
+    case OBU_FRAME: return "OBU_FRAME";
+    case OBU_TILE_GROUP: return "OBU_TILE_GROUP";
+    case OBU_METADATA: return "OBU_METADATA";
+    case OBU_TILE_LIST: return "OBU_TILE_LIST";
+    case OBU_PADDING: return "OBU_PADDING";
+    default: break;
+  }
+  return "<Invalid OBU Type>";
+}

diff --git a/libav1/aom/src/aom_decoder.c b/libav1/aom/src/aom_decoder.c
new file mode 100644
index 0000000..0558f53
--- /dev/null
+++ b/libav1/aom/src/aom_decoder.c

@@ -0,0 +1,177 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\file
+ * \brief Provides the high level interface to wrap decoder algorithms.
+ *
+ */
+#include <string.h>
+#include "aom/internal/aom_codec_internal.h"
+
+#define SAVE_STATUS(ctx, var) (ctx ? (ctx->err = var) : var)
+
+static aom_codec_alg_priv_t *get_alg_priv(aom_codec_ctx_t *ctx) {
+  return (aom_codec_alg_priv_t *)ctx->priv;
+}
+
+aom_codec_err_t aom_codec_dec_init_ver(aom_codec_ctx_t *ctx,
+                                       aom_codec_iface_t *iface,
+                                       const aom_codec_dec_cfg_t *cfg,
+                                       aom_codec_flags_t flags, int ver) {
+  aom_codec_err_t res;
+
+  if (ver != AOM_DECODER_ABI_VERSION)
+    res = AOM_CODEC_ABI_MISMATCH;
+  else if (!ctx || !iface)
+    res = AOM_CODEC_INVALID_PARAM;
+  else if (iface->abi_version != AOM_CODEC_INTERNAL_ABI_VERSION)
+    res = AOM_CODEC_ABI_MISMATCH;
+  else if ((flags & AOM_CODEC_USE_POSTPROC) &&
+           !(iface->caps & AOM_CODEC_CAP_POSTPROC))
+    res = AOM_CODEC_INCAPABLE;
+  else if (!(iface->caps & AOM_CODEC_CAP_DECODER))
+    res = AOM_CODEC_INCAPABLE;
+  else {
+    memset(ctx, 0, sizeof(*ctx));
+    ctx->iface = iface;
+    ctx->name = iface->name;
+    ctx->priv = NULL;
+    ctx->init_flags = flags;
+    ctx->cfg = cfg;
+
+    res = ctx->iface->init(ctx, NULL);
+    if (res) {
+      ctx->err_detail = ctx->priv ? ctx->priv->err_detail : NULL;
+      aom_codec_destroy(ctx);
+    }
+  }
+
+  return SAVE_STATUS(ctx, res);
+}
+
+aom_codec_err_t aom_codec_peek_stream_info(aom_codec_iface_t *iface,
+                                           const uint8_t *data, size_t data_sz,
+                                           aom_codec_stream_info_t *si) {
+  aom_codec_err_t res;
+
+  if (!iface || !data || !data_sz || !si) {
+    res = AOM_CODEC_INVALID_PARAM;
+  } else {
+    /* Set default/unknown values */
+    si->w = 0;
+    si->h = 0;
+
+    res = iface->dec.peek_si(data, data_sz, si);
+  }
+
+  return res;
+}
+
+aom_codec_err_t aom_codec_get_stream_info(aom_codec_ctx_t *ctx,
+                                          aom_codec_stream_info_t *si) {
+  aom_codec_err_t res;
+
+  if (!ctx || !si) {
+    res = AOM_CODEC_INVALID_PARAM;
+  } else if (!ctx->iface || !ctx->priv) {
+    res = AOM_CODEC_ERROR;
+  } else {
+    /* Set default/unknown values */
+    si->w = 0;
+    si->h = 0;
+
+    res = ctx->iface->dec.get_si(get_alg_priv(ctx), si);
+  }
+
+  return SAVE_STATUS(ctx, res);
+}
+
+aom_codec_err_t aom_codec_decode(aom_codec_ctx_t *ctx, const uint8_t *data,
+                                 size_t data_sz, void *user_priv) {
+  aom_codec_err_t res;
+
+  if (!ctx)
+    res = AOM_CODEC_INVALID_PARAM;
+  else if (!ctx->iface || !ctx->priv)
+    res = AOM_CODEC_ERROR;
+  else {
+    res = ctx->iface->dec.decode(get_alg_priv(ctx), data, data_sz, user_priv);
+  }
+
+  return SAVE_STATUS(ctx, res);
+}
+
+aom_image_t *aom_codec_get_frame(aom_codec_ctx_t *ctx, aom_codec_iter_t *iter) {
+  aom_image_t *img;
+
+  if (!ctx || !iter || !ctx->iface || !ctx->priv)
+    img = NULL;
+  else
+    img = ctx->iface->dec.get_frame(get_alg_priv(ctx), iter);
+
+  return img;
+}
+
+aom_codec_err_t aom_codec_register_put_frame_cb(aom_codec_ctx_t *ctx,
+                                                aom_codec_put_frame_cb_fn_t cb,
+                                                void *user_priv) {
+  aom_codec_err_t res;
+
+  if (!ctx || !cb)
+    res = AOM_CODEC_INVALID_PARAM;
+  else if (!ctx->iface || !ctx->priv ||
+           !(ctx->iface->caps & AOM_CODEC_CAP_PUT_FRAME))
+    res = AOM_CODEC_ERROR;
+  else {
+    ctx->priv->dec.put_frame_cb.u.put_frame = cb;
+    ctx->priv->dec.put_frame_cb.user_priv = user_priv;
+    res = AOM_CODEC_OK;
+  }
+
+  return SAVE_STATUS(ctx, res);
+}
+
+aom_codec_err_t aom_codec_register_put_slice_cb(aom_codec_ctx_t *ctx,
+                                                aom_codec_put_slice_cb_fn_t cb,
+                                                void *user_priv) {
+  aom_codec_err_t res;
+
+  if (!ctx || !cb)
+    res = AOM_CODEC_INVALID_PARAM;
+  else if (!ctx->iface || !ctx->priv ||
+           !(ctx->iface->caps & AOM_CODEC_CAP_PUT_SLICE))
+    res = AOM_CODEC_ERROR;
+  else {
+    ctx->priv->dec.put_slice_cb.u.put_slice = cb;
+    ctx->priv->dec.put_slice_cb.user_priv = user_priv;
+    res = AOM_CODEC_OK;
+  }
+
+  return SAVE_STATUS(ctx, res);
+}
+
+aom_codec_err_t aom_codec_set_frame_buffer_functions(
+    aom_codec_ctx_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
+    aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv) {
+  aom_codec_err_t res;
+
+  if (!ctx || !cb_get || !cb_release) {
+    res = AOM_CODEC_INVALID_PARAM;
+  } else if (!ctx->iface || !ctx->priv ||
+             !(ctx->iface->caps & AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER)) {
+    res = AOM_CODEC_ERROR;
+  } else {
+    res = ctx->iface->dec.set_fb_fn(get_alg_priv(ctx), cb_get, cb_release,
+                                    cb_priv);
+  }
+
+  return SAVE_STATUS(ctx, res);
+}

diff --git a/libav1/aom/src/aom_image.c b/libav1/aom/src/aom_image.c
new file mode 100644
index 0000000..6504cdd
--- /dev/null
+++ b/libav1/aom/src/aom_image.c

@@ -0,0 +1,259 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+
+#include "aom/aom_image.h"
+#include "aom/aom_integer.h"
+#include "aom_mem/aom_mem.h"
+
+static INLINE unsigned int align_image_dimension(unsigned int d,
+                                                 unsigned int subsampling,
+                                                 unsigned int size_align) {
+  unsigned int align;
+
+  align = (1 << subsampling) - 1;
+  align = (size_align - 1 > align) ? (size_align - 1) : align;
+  return ((d + align) & ~align);
+}
+
+static aom_image_t *img_alloc_helper(
+    aom_image_t *img, aom_img_fmt_t fmt, unsigned int d_w, unsigned int d_h,
+    unsigned int buf_align, unsigned int stride_align, unsigned int size_align,
+    unsigned char *img_data, unsigned int border) {
+  unsigned int h, w, s, xcs, ycs, bps;
+  unsigned int stride_in_bytes;
+
+  /* Treat align==0 like align==1 */
+  if (!buf_align) buf_align = 1;
+
+  /* Validate alignment (must be power of 2) */
+  if (buf_align & (buf_align - 1)) goto fail;
+
+  /* Treat align==0 like align==1 */
+  if (!stride_align) stride_align = 1;
+
+  /* Validate alignment (must be power of 2) */
+  if (stride_align & (stride_align - 1)) goto fail;
+
+  /* Treat align==0 like align==1 */
+  if (!size_align) size_align = 1;
+
+  /* Validate alignment (must be power of 2) */
+  if (size_align & (size_align - 1)) goto fail;
+
+  /* Get sample size for this format */
+  switch (fmt) {
+    case AOM_IMG_FMT_I420:
+    case AOM_IMG_FMT_YV12:
+    case AOM_IMG_FMT_AOMI420:
+    case AOM_IMG_FMT_AOMYV12: bps = 12; break;
+    case AOM_IMG_FMT_I422:
+    case AOM_IMG_FMT_I444: bps = 24; break;
+    case AOM_IMG_FMT_YV1216:
+    case AOM_IMG_FMT_I42016: bps = 24; break;
+    case AOM_IMG_FMT_I42216:
+    case AOM_IMG_FMT_I44416: bps = 48; break;
+    default: bps = 16; break;
+  }
+
+  /* Get chroma shift values for this format */
+  switch (fmt) {
+    case AOM_IMG_FMT_I420:
+    case AOM_IMG_FMT_YV12:
+    case AOM_IMG_FMT_AOMI420:
+    case AOM_IMG_FMT_AOMYV12:
+    case AOM_IMG_FMT_I422:
+    case AOM_IMG_FMT_I42016:
+    case AOM_IMG_FMT_YV1216:
+    case AOM_IMG_FMT_I42216: xcs = 1; break;
+    default: xcs = 0; break;
+  }
+
+  switch (fmt) {
+    case AOM_IMG_FMT_I420:
+    case AOM_IMG_FMT_YV12:
+    case AOM_IMG_FMT_AOMI420:
+    case AOM_IMG_FMT_AOMYV12:
+    case AOM_IMG_FMT_YV1216:
+    case AOM_IMG_FMT_I42016: ycs = 1; break;
+    default: ycs = 0; break;
+  }
+
+  /* Calculate storage sizes given the chroma subsampling */
+  w = align_image_dimension(d_w, xcs, size_align);
+  h = align_image_dimension(d_h, ycs, size_align);
+
+  s = (fmt & AOM_IMG_FMT_PLANAR) ? w : bps * w / 8;
+  s = (s + 2 * border + stride_align - 1) & ~(stride_align - 1);
+  stride_in_bytes = (fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
+
+  /* Allocate the new image */
+  if (!img) {
+    img = (aom_image_t *)calloc(1, sizeof(aom_image_t));
+
+    if (!img) goto fail;
+
+    img->self_allocd = 1;
+  } else {
+    memset(img, 0, sizeof(aom_image_t));
+  }
+
+  img->img_data = img_data;
+
+  if (!img_data) {
+    const uint64_t alloc_size =
+        (fmt & AOM_IMG_FMT_PLANAR)
+            ? (uint64_t)(h + 2 * border) * stride_in_bytes * bps / 8
+            : (uint64_t)(h + 2 * border) * stride_in_bytes;
+
+    if (alloc_size != (size_t)alloc_size) goto fail;
+
+    img->img_data = (uint8_t *)aom_memalign(buf_align, (size_t)alloc_size);
+    img->img_data_owner = 1;
+    img->sz = (size_t)alloc_size;
+  }
+
+  if (!img->img_data) goto fail;
+
+  img->fmt = fmt;
+  img->bit_depth = (fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 16 : 8;
+  // aligned width and aligned height
+  img->w = w;
+  img->h = h;
+  img->x_chroma_shift = xcs;
+  img->y_chroma_shift = ycs;
+  img->bps = bps;
+
+  /* Calculate strides */
+  img->stride[AOM_PLANE_Y] = stride_in_bytes;
+  img->stride[AOM_PLANE_U] = img->stride[AOM_PLANE_V] = stride_in_bytes >> xcs;
+
+  /* Default viewport to entire image */
+  if (!aom_img_set_rect(img, 0, 0, d_w, d_h, border)) return img;
+
+fail:
+  aom_img_free(img);
+  return NULL;
+}
+
+aom_image_t *aom_img_alloc(aom_image_t *img, aom_img_fmt_t fmt,
+                           unsigned int d_w, unsigned int d_h,
+                           unsigned int align) {
+  return img_alloc_helper(img, fmt, d_w, d_h, align, align, 1, NULL, 0);
+}
+
+aom_image_t *aom_img_wrap(aom_image_t *img, aom_img_fmt_t fmt, unsigned int d_w,
+                          unsigned int d_h, unsigned int stride_align,
+                          unsigned char *img_data) {
+  /* By setting buf_align = 1, we don't change buffer alignment in this
+   * function. */
+  return img_alloc_helper(img, fmt, d_w, d_h, 1, stride_align, 1, img_data, 0);
+}
+
+aom_image_t *aom_img_alloc_with_border(aom_image_t *img, aom_img_fmt_t fmt,
+                                       unsigned int d_w, unsigned int d_h,
+                                       unsigned int align,
+                                       unsigned int size_align,
+                                       unsigned int border) {
+  return img_alloc_helper(img, fmt, d_w, d_h, align, align, size_align, NULL,
+                          border);
+}
+
+int aom_img_set_rect(aom_image_t *img, unsigned int x, unsigned int y,
+                     unsigned int w, unsigned int h, unsigned int border) {
+  unsigned char *data;
+
+  if (x + w <= img->w && y + h <= img->h) {
+    img->d_w = w;
+    img->d_h = h;
+
+    x += border;
+    y += border;
+
+    /* Calculate plane pointers */
+    if (!(img->fmt & AOM_IMG_FMT_PLANAR)) {
+      img->planes[AOM_PLANE_PACKED] =
+          img->img_data + x * img->bps / 8 + y * img->stride[AOM_PLANE_PACKED];
+    } else {
+      const int bytes_per_sample =
+          (img->fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 2 : 1;
+      data = img->img_data;
+
+      img->planes[AOM_PLANE_Y] =
+          data + x * bytes_per_sample + y * img->stride[AOM_PLANE_Y];
+      data += (img->h + 2 * border) * img->stride[AOM_PLANE_Y];
+
+      unsigned int uv_border_h = border >> img->y_chroma_shift;
+      unsigned int uv_x = x >> img->x_chroma_shift;
+      unsigned int uv_y = y >> img->y_chroma_shift;
+      if (!(img->fmt & AOM_IMG_FMT_UV_FLIP)) {
+        img->planes[AOM_PLANE_U] =
+            data + uv_x * bytes_per_sample + uv_y * img->stride[AOM_PLANE_U];
+        data += ((img->h >> img->y_chroma_shift) + 2 * uv_border_h) *
+                img->stride[AOM_PLANE_U];
+        img->planes[AOM_PLANE_V] =
+            data + uv_x * bytes_per_sample + uv_y * img->stride[AOM_PLANE_V];
+      } else {
+        img->planes[AOM_PLANE_V] =
+            data + uv_x * bytes_per_sample + uv_y * img->stride[AOM_PLANE_V];
+        data += ((img->h >> img->y_chroma_shift) + 2 * uv_border_h) *
+                img->stride[AOM_PLANE_V];
+        img->planes[AOM_PLANE_U] =
+            data + uv_x * bytes_per_sample + uv_y * img->stride[AOM_PLANE_U];
+      }
+    }
+    return 0;
+  }
+  return -1;
+}
+
+void aom_img_flip(aom_image_t *img) {
+  /* Note: In the calculation pointer adjustment calculation, we want the
+   * rhs to be promoted to a signed type. Section 6.3.1.8 of the ISO C99
+   * standard indicates that if the adjustment parameter is unsigned, the
+   * stride parameter will be promoted to unsigned, causing errors when
+   * the lhs is a larger type than the rhs.
+   */
+  img->planes[AOM_PLANE_Y] += (signed)(img->d_h - 1) * img->stride[AOM_PLANE_Y];
+  img->stride[AOM_PLANE_Y] = -img->stride[AOM_PLANE_Y];
+
+  img->planes[AOM_PLANE_U] += (signed)((img->d_h >> img->y_chroma_shift) - 1) *
+                              img->stride[AOM_PLANE_U];
+  img->stride[AOM_PLANE_U] = -img->stride[AOM_PLANE_U];
+
+  img->planes[AOM_PLANE_V] += (signed)((img->d_h >> img->y_chroma_shift) - 1) *
+                              img->stride[AOM_PLANE_V];
+  img->stride[AOM_PLANE_V] = -img->stride[AOM_PLANE_V];
+}
+
+void aom_img_free(aom_image_t *img) {
+  if (img) {
+    if (img->img_data && img->img_data_owner) aom_free(img->img_data);
+
+    if (img->self_allocd) free(img);
+  }
+}
+
+int aom_img_plane_width(const aom_image_t *img, int plane) {
+  if (plane > 0 && img->x_chroma_shift > 0)
+    return (img->d_w + 1) >> img->x_chroma_shift;
+  else
+    return img->d_w;
+}
+
+int aom_img_plane_height(const aom_image_t *img, int plane) {
+  if (plane > 0 && img->y_chroma_shift > 0)
+    return (img->d_h + 1) >> img->y_chroma_shift;
+  else
+    return img->d_h;
+}

diff --git a/libav1/aom/src/aom_integer.c b/libav1/aom/src/aom_integer.c
new file mode 100644
index 0000000..7edfd0d
--- /dev/null
+++ b/libav1/aom/src/aom_integer.c

@@ -0,0 +1,105 @@
+/*
+ * Copyright (c) 2018, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#include <assert.h>
+
+#include "aom/aom_integer.h"
+
+static const size_t kMaximumLeb128Size = 8;
+static const uint8_t kLeb128ByteMask = 0x7f;  // Binary: 01111111
+
+// Disallow values larger than 32-bits to ensure consistent behavior on 32 and
+// 64 bit targets: value is typically used to determine buffer allocation size
+// when decoded.
+static const uint64_t kMaximumLeb128Value = UINT32_MAX;
+
+size_t aom_uleb_size_in_bytes(uint64_t value) {
+  size_t size = 0;
+  do {
+    ++size;
+  } while ((value >>= 7) != 0);
+  return size;
+}
+
+int aom_uleb_decode(const uint8_t *buffer, size_t available, uint64_t *value,
+                    size_t *length) {
+  if (buffer && value) {
+    *value = 0;
+    for (size_t i = 0; i < kMaximumLeb128Size && i < available; ++i) {
+      const uint8_t decoded_byte = *(buffer + i) & kLeb128ByteMask;
+      *value |= ((uint64_t)decoded_byte) << (i * 7);
+      if ((*(buffer + i) >> 7) == 0) {
+        if (length) {
+          *length = i + 1;
+        }
+
+        // Fail on values larger than 32-bits to ensure consistent behavior on
+        // 32 and 64 bit targets: value is typically used to determine buffer
+        // allocation size.
+        if (*value > UINT32_MAX) return -1;
+
+        return 0;
+      }
+    }
+  }
+
+  // If we get here, either the buffer/value pointers were invalid,
+  // or we ran over the available space
+  return -1;
+}
+
+int aom_uleb_encode(uint64_t value, size_t available, uint8_t *coded_value,
+                    size_t *coded_size) {
+  const size_t leb_size = aom_uleb_size_in_bytes(value);
+  if (value > kMaximumLeb128Value || leb_size > kMaximumLeb128Size ||
+      leb_size > available || !coded_value || !coded_size) {
+    return -1;
+  }
+
+  for (size_t i = 0; i < leb_size; ++i) {
+    uint8_t byte = value & 0x7f;
+    value >>= 7;
+
+    if (value != 0) byte |= 0x80;  // Signal that more bytes follow.
+
+    *(coded_value + i) = byte;
+  }
+
+  *coded_size = leb_size;
+  return 0;
+}
+
+int aom_uleb_encode_fixed_size(uint64_t value, size_t available,
+                               size_t pad_to_size, uint8_t *coded_value,
+                               size_t *coded_size) {
+  if (value > kMaximumLeb128Value || !coded_value || !coded_size ||
+      available < pad_to_size || pad_to_size > kMaximumLeb128Size) {
+    return -1;
+  }
+  const uint64_t limit = 1ULL << (7 * pad_to_size);
+  if (value >= limit) {
+    // Can't encode 'value' within 'pad_to_size' bytes
+    return -1;
+  }
+
+  for (size_t i = 0; i < pad_to_size; ++i) {
+    uint8_t byte = value & 0x7f;
+    value >>= 7;
+
+    if (i < pad_to_size - 1) byte |= 0x80;  // Signal that more bytes follow.
+
+    *(coded_value + i) = byte;
+  }
+
+  assert(value == 0);
+
+  *coded_size = pad_to_size;
+  return 0;
+}

diff --git a/libav1/aom_dsp/aom_convolve.c b/libav1/aom_dsp/aom_convolve.c
new file mode 100644
index 0000000..4791826
--- /dev/null
+++ b/libav1/aom_dsp/aom_convolve.c

@@ -0,0 +1,238 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <string.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/aom_filter.h"
+#include "aom_ports/mem.h"
+
+static INLINE int horz_scalar_product(const uint8_t *a, const int16_t *b) {
+  int sum = 0;
+  for (int k = 0; k < SUBPEL_TAPS; ++k) sum += a[k] * b[k];
+  return sum;
+}
+
+static INLINE int vert_scalar_product(const uint8_t *a, ptrdiff_t a_stride,
+                                      const int16_t *b) {
+  int sum = 0;
+  for (int k = 0; k < SUBPEL_TAPS; ++k) sum += a[k * a_stride] * b[k];
+  return sum;
+}
+
+static void convolve_horiz(const uint8_t *src, ptrdiff_t src_stride,
+                           uint8_t *dst, ptrdiff_t dst_stride,
+                           const InterpKernel *x_filters, int x0_q4,
+                           int x_step_q4, int w, int h) {
+  src -= SUBPEL_TAPS / 2 - 1;
+  for (int y = 0; y < h; ++y) {
+    int x_q4 = x0_q4;
+    for (int x = 0; x < w; ++x) {
+      const uint8_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
+      const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
+      const int sum = horz_scalar_product(src_x, x_filter);
+      dst[x] = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
+      x_q4 += x_step_q4;
+    }
+    src += src_stride;
+    dst += dst_stride;
+  }
+}
+
+static void convolve_vert(const uint8_t *src, ptrdiff_t src_stride,
+                          uint8_t *dst, ptrdiff_t dst_stride,
+                          const InterpKernel *y_filters, int y0_q4,
+                          int y_step_q4, int w, int h) {
+  src -= src_stride * (SUBPEL_TAPS / 2 - 1);
+
+  for (int x = 0; x < w; ++x) {
+    int y_q4 = y0_q4;
+    for (int y = 0; y < h; ++y) {
+      const unsigned char *src_y = &src[(y_q4 >> SUBPEL_BITS) * src_stride];
+      const int16_t *const y_filter = y_filters[y_q4 & SUBPEL_MASK];
+      const int sum = vert_scalar_product(src_y, src_stride, y_filter);
+      dst[y * dst_stride] = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
+      y_q4 += y_step_q4;
+    }
+    ++src;
+    ++dst;
+  }
+}
+
+static const InterpKernel *get_filter_base(const int16_t *filter) {
+  // NOTE: This assumes that the filter table is 256-byte aligned.
+  // TODO(agrange) Modify to make independent of table alignment.
+  return (const InterpKernel *)(((intptr_t)filter) & ~((intptr_t)0xFF));
+}
+
+static int get_filter_offset(const int16_t *f, const InterpKernel *base) {
+  return (int)((const InterpKernel *)(intptr_t)f - base);
+}
+
+void aom_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
+                           uint8_t *dst, ptrdiff_t dst_stride,
+                           const int16_t *filter_x, int x_step_q4,
+                           const int16_t *filter_y, int y_step_q4, int w,
+                           int h) {
+  const InterpKernel *const filters_x = get_filter_base(filter_x);
+  const int x0_q4 = get_filter_offset(filter_x, filters_x);
+
+  (void)filter_y;
+  (void)y_step_q4;
+
+  convolve_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4, x_step_q4,
+                 w, h);
+}
+
+void aom_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride,
+                          uint8_t *dst, ptrdiff_t dst_stride,
+                          const int16_t *filter_x, int x_step_q4,
+                          const int16_t *filter_y, int y_step_q4, int w,
+                          int h) {
+  const InterpKernel *const filters_y = get_filter_base(filter_y);
+  const int y0_q4 = get_filter_offset(filter_y, filters_y);
+
+  (void)filter_x;
+  (void)x_step_q4;
+
+  convolve_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4, y_step_q4,
+                w, h);
+}
+
+void aom_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
+                         ptrdiff_t dst_stride, const int16_t *filter_x,
+                         int filter_x_stride, const int16_t *filter_y,
+                         int filter_y_stride, int w, int h) {
+  int r;
+
+  (void)filter_x;
+  (void)filter_x_stride;
+  (void)filter_y;
+  (void)filter_y_stride;
+
+  for (r = h; r > 0; --r) {
+    memcpy(dst, src, w);
+    src += src_stride;
+    dst += dst_stride;
+  }
+}
+
+static INLINE int highbd_vert_scalar_product(const uint16_t *a,
+                                             ptrdiff_t a_stride,
+                                             const int16_t *b) {
+  int sum = 0;
+  for (int k = 0; k < SUBPEL_TAPS; ++k) sum += a[k * a_stride] * b[k];
+  return sum;
+}
+
+static INLINE int highbd_horz_scalar_product(const uint16_t *a,
+                                             const int16_t *b) {
+  int sum = 0;
+  for (int k = 0; k < SUBPEL_TAPS; ++k) sum += a[k] * b[k];
+  return sum;
+}
+
+static void highbd_convolve_horiz(const uint8_t *src8, ptrdiff_t src_stride,
+                                  uint8_t *dst8, ptrdiff_t dst_stride,
+                                  const InterpKernel *x_filters, int x0_q4,
+                                  int x_step_q4, int w, int h, int bd) {
+  uint16_t *src = CONVERT_TO_SHORTPTR(src8);
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
+  src -= SUBPEL_TAPS / 2 - 1;
+  for (int y = 0; y < h; ++y) {
+    int x_q4 = x0_q4;
+    for (int x = 0; x < w; ++x) {
+      const uint16_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
+      const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
+      const int sum = highbd_horz_scalar_product(src_x, x_filter);
+      dst[x] = clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
+      x_q4 += x_step_q4;
+    }
+    src += src_stride;
+    dst += dst_stride;
+  }
+}
+
+static void highbd_convolve_vert(const uint8_t *src8, ptrdiff_t src_stride,
+                                 uint8_t *dst8, ptrdiff_t dst_stride,
+                                 const InterpKernel *y_filters, int y0_q4,
+                                 int y_step_q4, int w, int h, int bd) {
+  uint16_t *src = CONVERT_TO_SHORTPTR(src8);
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
+  src -= src_stride * (SUBPEL_TAPS / 2 - 1);
+  for (int x = 0; x < w; ++x) {
+    int y_q4 = y0_q4;
+    for (int y = 0; y < h; ++y) {
+      const uint16_t *src_y = &src[(y_q4 >> SUBPEL_BITS) * src_stride];
+      const int16_t *const y_filter = y_filters[y_q4 & SUBPEL_MASK];
+      const int sum = highbd_vert_scalar_product(src_y, src_stride, y_filter);
+      dst[y * dst_stride] =
+          clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
+      y_q4 += y_step_q4;
+    }
+    ++src;
+    ++dst;
+  }
+}
+
+void aom_highbd_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride,
+                                  uint8_t *dst, ptrdiff_t dst_stride,
+                                  const int16_t *filter_x, int x_step_q4,
+                                  const int16_t *filter_y, int y_step_q4, int w,
+                                  int h, int bd) {
+  const InterpKernel *const filters_x = get_filter_base(filter_x);
+  const int x0_q4 = get_filter_offset(filter_x, filters_x);
+  (void)filter_y;
+  (void)y_step_q4;
+
+  highbd_convolve_horiz(src, src_stride, dst, dst_stride, filters_x, x0_q4,
+                        x_step_q4, w, h, bd);
+}
+
+void aom_highbd_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride,
+                                 uint8_t *dst, ptrdiff_t dst_stride,
+                                 const int16_t *filter_x, int x_step_q4,
+                                 const int16_t *filter_y, int y_step_q4, int w,
+                                 int h, int bd) {
+  const InterpKernel *const filters_y = get_filter_base(filter_y);
+  const int y0_q4 = get_filter_offset(filter_y, filters_y);
+  (void)filter_x;
+  (void)x_step_q4;
+
+  highbd_convolve_vert(src, src_stride, dst, dst_stride, filters_y, y0_q4,
+                       y_step_q4, w, h, bd);
+}
+
+void aom_highbd_convolve_copy_c(const uint8_t *src8, ptrdiff_t src_stride,
+                                uint8_t *dst8, ptrdiff_t dst_stride,
+                                const int16_t *filter_x, int filter_x_stride,
+                                const int16_t *filter_y, int filter_y_stride,
+                                int w, int h, int bd) {
+  int r;
+  uint16_t *src = CONVERT_TO_SHORTPTR(src8);
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
+  (void)filter_x;
+  (void)filter_y;
+  (void)filter_x_stride;
+  (void)filter_y_stride;
+  (void)bd;
+
+  for (r = h; r > 0; --r) {
+    memcpy(dst, src, w * sizeof(uint16_t));
+    src += src_stride;
+    dst += dst_stride;
+  }
+}

diff --git a/libav1/aom_dsp/aom_dsp_common.h b/libav1/aom_dsp/aom_dsp_common.h
new file mode 100644
index 0000000..a185b23
--- /dev/null
+++ b/libav1/aom_dsp/aom_dsp_common.h

@@ -0,0 +1,98 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_AOM_DSP_COMMON_H_
+#define AOM_AOM_DSP_AOM_DSP_COMMON_H_
+
+#include "config/aom_config.h"
+
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifndef MAX_SB_SIZE
+#define MAX_SB_SIZE 128
+#endif  // ndef MAX_SB_SIZE
+
+#define AOMMIN(x, y) (((x) < (y)) ? (x) : (y))
+#define AOMMAX(x, y) (((x) > (y)) ? (x) : (y))
+
+#define IMPLIES(a, b) (!(a) || (b))  //  Logical 'a implies b' (or 'a -> b')
+
+#define IS_POWER_OF_TWO(x) (((x) & ((x)-1)) == 0)
+
+/* Left shifting a negative value became undefined behavior in C99 (downgraded
+   from merely implementation-defined in C89). This should still compile to the
+   correct thing on any two's-complement machine, but avoid ubsan warnings.*/
+#define AOM_SIGNED_SHL(x, shift) ((x) * (((x)*0 + 1) << (shift)))
+
+// These can be used to give a hint about branch outcomes.
+// This can have an effect, even if your target processor has a
+// good branch predictor, as these hints can affect basic block
+// ordering by the compiler.
+#ifdef __GNUC__
+#define LIKELY(v) __builtin_expect(v, 1)
+#define UNLIKELY(v) __builtin_expect(v, 0)
+#else
+#define LIKELY(v) (v)
+#define UNLIKELY(v) (v)
+#endif
+
+typedef uint8_t qm_val_t;
+#define AOM_QM_BITS 5
+
+// Note:
+// tran_low_t  is the datatype used for final transform coefficients.
+// tran_high_t is the datatype used for intermediate transform stages.
+typedef int64_t tran_high_t;
+typedef int32_t tran_low_t;
+
+static INLINE uint8_t clip_pixel(int val) {
+  return (val > 255) ? 255 : (val < 0) ? 0 : val;
+}
+
+static INLINE int clamp(int value, int low, int high) {
+  return value < low ? low : (value > high ? high : value);
+}
+
+static INLINE int64_t clamp64(int64_t value, int64_t low, int64_t high) {
+  return value < low ? low : (value > high ? high : value);
+}
+
+static INLINE double fclamp(double value, double low, double high) {
+  return value < low ? low : (value > high ? high : value);
+}
+
+static INLINE uint16_t clip_pixel_highbd(int val, int bd) {
+  switch (bd) {
+    case 8:
+    default: return (uint16_t)clamp(val, 0, 255);
+    case 10: return (uint16_t)clamp(val, 0, 1023);
+    case 12: return (uint16_t)clamp(val, 0, 4095);
+  }
+}
+
+// The result of this branchless code is equivalent to (value < 0 ? 0 : value)
+// or max(0, value) and might be faster in some cases.
+// Care should be taken since the behavior of right shifting signed type
+// negative value is undefined by C standards and implementation defined,
+static INLINE unsigned int negative_to_zero(int value) {
+  return value & ~(value >> (sizeof(value) * 8 - 1));
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_DSP_AOM_DSP_COMMON_H_

diff --git a/libav1/aom_dsp/aom_dsp_rtcd.c b/libav1/aom_dsp/aom_dsp_rtcd.c
new file mode 100644
index 0000000..1514bd6
--- /dev/null
+++ b/libav1/aom_dsp/aom_dsp_rtcd.c

@@ -0,0 +1,18 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#include "config/aom_config.h"
+
+#define RTCD_C
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom_ports/aom_once.h"
+
+void aom_dsp_rtcd() { aom_once(setup_rtcd_internal); }

diff --git a/libav1/aom_dsp/aom_filter.h b/libav1/aom_dsp/aom_filter.h
new file mode 100644
index 0000000..00686ac
--- /dev/null
+++ b/libav1/aom_dsp/aom_filter.h

@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_AOM_FILTER_H_
+#define AOM_AOM_DSP_AOM_FILTER_H_
+
+#include "aom/aom_integer.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define FILTER_BITS 7
+
+#define SUBPEL_BITS 4
+#define SUBPEL_MASK ((1 << SUBPEL_BITS) - 1)
+#define SUBPEL_SHIFTS (1 << SUBPEL_BITS)
+#define SUBPEL_TAPS 8
+
+#define SCALE_SUBPEL_BITS 10
+#define SCALE_SUBPEL_SHIFTS (1 << SCALE_SUBPEL_BITS)
+#define SCALE_SUBPEL_MASK (SCALE_SUBPEL_SHIFTS - 1)
+#define SCALE_EXTRA_BITS (SCALE_SUBPEL_BITS - SUBPEL_BITS)
+#define SCALE_EXTRA_OFF ((1 << SCALE_EXTRA_BITS) / 2)
+
+#define RS_SUBPEL_BITS 6
+#define RS_SUBPEL_MASK ((1 << RS_SUBPEL_BITS) - 1)
+#define RS_SCALE_SUBPEL_BITS 14
+#define RS_SCALE_SUBPEL_MASK ((1 << RS_SCALE_SUBPEL_BITS) - 1)
+#define RS_SCALE_EXTRA_BITS (RS_SCALE_SUBPEL_BITS - RS_SUBPEL_BITS)
+#define RS_SCALE_EXTRA_OFF (1 << (RS_SCALE_EXTRA_BITS - 1))
+
+typedef int16_t InterpKernel[SUBPEL_TAPS];
+
+#define BIL_SUBPEL_BITS 3
+#define BIL_SUBPEL_SHIFTS (1 << BIL_SUBPEL_BITS)
+
+// 2 tap bilinear filters
+static const uint8_t bilinear_filters_2t[BIL_SUBPEL_SHIFTS][2] = {
+  { 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
+  { 64, 64 }, { 48, 80 },  { 32, 96 }, { 16, 112 },
+};
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_DSP_AOM_FILTER_H_

diff --git a/libav1/aom_dsp/aom_simd.h b/libav1/aom_dsp/aom_simd.h
new file mode 100644
index 0000000..ab950ca
--- /dev/null
+++ b/libav1/aom_dsp/aom_simd.h

@@ -0,0 +1,38 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_AOM_SIMD_H_
+#define AOM_AOM_DSP_AOM_SIMD_H_
+
+#include <stdint.h>
+
+#if defined(_WIN32)
+#include <intrin.h>
+#endif
+
+#include "config/aom_config.h"
+
+#include "aom_dsp/aom_simd_inline.h"
+
+#define SIMD_CHECK 1  // Sanity checks in C equivalents
+
+#if HAVE_NEON
+#include "simd/v256_intrinsics_arm.h"
+// VS compiling for 32 bit targets does not support vector types in
+// structs as arguments, which makes the v256 type of the intrinsics
+// hard to support, so optimizations for this target are disabled.
+#elif HAVE_SSE2 && (defined(_WIN64) || !defined(_MSC_VER) || defined(__clang__))
+#include "simd/v256_intrinsics_x86.h"
+#else
+#include "simd/v256_intrinsics.h"
+#endif
+
+#endif  // AOM_AOM_DSP_AOM_SIMD_H_

diff --git a/libav1/aom_dsp/aom_simd_inline.h b/libav1/aom_dsp/aom_simd_inline.h
new file mode 100644
index 0000000..eb333f6
--- /dev/null
+++ b/libav1/aom_dsp/aom_simd_inline.h

@@ -0,0 +1,21 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_AOM_SIMD_INLINE_H_
+#define AOM_AOM_DSP_AOM_SIMD_INLINE_H_
+
+#include "aom/aom_integer.h"
+
+#ifndef SIMD_INLINE
+#define SIMD_INLINE static AOM_FORCE_INLINE
+#endif
+
+#endif  // AOM_AOM_DSP_AOM_SIMD_INLINE_H_

diff --git a/libav1/aom_dsp/avg.c b/libav1/aom_dsp/avg.c
new file mode 100644
index 0000000..a97cd0f
--- /dev/null
+++ b/libav1/aom_dsp/avg.c

@@ -0,0 +1,182 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdlib.h>
+
+#include "config/aom_dsp_rtcd.h"
+#include "aom_ports/mem.h"
+
+void aom_minmax_8x8_c(const uint8_t *s, int p, const uint8_t *d, int dp,
+                      int *min, int *max) {
+  int i, j;
+  *min = 255;
+  *max = 0;
+  for (i = 0; i < 8; ++i, s += p, d += dp) {
+    for (j = 0; j < 8; ++j) {
+      int diff = abs(s[j] - d[j]);
+      *min = diff < *min ? diff : *min;
+      *max = diff > *max ? diff : *max;
+    }
+  }
+}
+
+unsigned int aom_avg_4x4_c(const uint8_t *s, int p) {
+  int i, j;
+  int sum = 0;
+  for (i = 0; i < 4; ++i, s += p)
+    for (j = 0; j < 4; sum += s[j], ++j) {
+    }
+
+  return (sum + 8) >> 4;
+}
+
+unsigned int aom_avg_8x8_c(const uint8_t *s, int p) {
+  int i, j;
+  int sum = 0;
+  for (i = 0; i < 8; ++i, s += p)
+    for (j = 0; j < 8; sum += s[j], ++j) {
+    }
+
+  return (sum + 32) >> 6;
+}
+
+// src_diff: first pass, 9 bit, dynamic range [-255, 255]
+//           second pass, 12 bit, dynamic range [-2040, 2040]
+static void hadamard_col8(const int16_t *src_diff, ptrdiff_t src_stride,
+                          int16_t *coeff) {
+  int16_t b0 = src_diff[0 * src_stride] + src_diff[1 * src_stride];
+  int16_t b1 = src_diff[0 * src_stride] - src_diff[1 * src_stride];
+  int16_t b2 = src_diff[2 * src_stride] + src_diff[3 * src_stride];
+  int16_t b3 = src_diff[2 * src_stride] - src_diff[3 * src_stride];
+  int16_t b4 = src_diff[4 * src_stride] + src_diff[5 * src_stride];
+  int16_t b5 = src_diff[4 * src_stride] - src_diff[5 * src_stride];
+  int16_t b6 = src_diff[6 * src_stride] + src_diff[7 * src_stride];
+  int16_t b7 = src_diff[6 * src_stride] - src_diff[7 * src_stride];
+
+  int16_t c0 = b0 + b2;
+  int16_t c1 = b1 + b3;
+  int16_t c2 = b0 - b2;
+  int16_t c3 = b1 - b3;
+  int16_t c4 = b4 + b6;
+  int16_t c5 = b5 + b7;
+  int16_t c6 = b4 - b6;
+  int16_t c7 = b5 - b7;
+
+  coeff[0] = c0 + c4;
+  coeff[7] = c1 + c5;
+  coeff[3] = c2 + c6;
+  coeff[4] = c3 + c7;
+  coeff[2] = c0 - c4;
+  coeff[6] = c1 - c5;
+  coeff[1] = c2 - c6;
+  coeff[5] = c3 - c7;
+}
+
+// The order of the output coeff of the hadamard is not important. For
+// optimization purposes the final transpose may be skipped.
+void aom_hadamard_8x8_c(const int16_t *src_diff, ptrdiff_t src_stride,
+                        tran_low_t *coeff) {
+  int idx;
+  int16_t buffer[64];
+  int16_t buffer2[64];
+  int16_t *tmp_buf = &buffer[0];
+  for (idx = 0; idx < 8; ++idx) {
+    hadamard_col8(src_diff, src_stride, tmp_buf);  // src_diff: 9 bit
+                                                   // dynamic range [-255, 255]
+    tmp_buf += 8;
+    ++src_diff;
+  }
+
+  tmp_buf = &buffer[0];
+  for (idx = 0; idx < 8; ++idx) {
+    hadamard_col8(tmp_buf, 8, buffer2 + 8 * idx);  // tmp_buf: 12 bit
+    // dynamic range [-2040, 2040]
+    // buffer2: 15 bit
+    // dynamic range [-16320, 16320]
+    ++tmp_buf;
+  }
+
+  for (idx = 0; idx < 64; ++idx) coeff[idx] = (tran_low_t)buffer2[idx];
+}
+
+// In place 16x16 2D Hadamard transform
+void aom_hadamard_16x16_c(const int16_t *src_diff, ptrdiff_t src_stride,
+                          tran_low_t *coeff) {
+  int idx;
+  for (idx = 0; idx < 4; ++idx) {
+    // src_diff: 9 bit, dynamic range [-255, 255]
+    const int16_t *src_ptr =
+        src_diff + (idx >> 1) * 8 * src_stride + (idx & 0x01) * 8;
+    aom_hadamard_8x8_c(src_ptr, src_stride, coeff + idx * 64);
+  }
+
+  // coeff: 15 bit, dynamic range [-16320, 16320]
+  for (idx = 0; idx < 64; ++idx) {
+    tran_low_t a0 = coeff[0];
+    tran_low_t a1 = coeff[64];
+    tran_low_t a2 = coeff[128];
+    tran_low_t a3 = coeff[192];
+
+    tran_low_t b0 = (a0 + a1) >> 1;  // (a0 + a1): 16 bit, [-32640, 32640]
+    tran_low_t b1 = (a0 - a1) >> 1;  // b0-b3: 15 bit, dynamic range
+    tran_low_t b2 = (a2 + a3) >> 1;  // [-16320, 16320]
+    tran_low_t b3 = (a2 - a3) >> 1;
+
+    coeff[0] = b0 + b2;  // 16 bit, [-32640, 32640]
+    coeff[64] = b1 + b3;
+    coeff[128] = b0 - b2;
+    coeff[192] = b1 - b3;
+
+    ++coeff;
+  }
+}
+
+void aom_hadamard_32x32_c(const int16_t *src_diff, ptrdiff_t src_stride,
+                          tran_low_t *coeff) {
+  int idx;
+  for (idx = 0; idx < 4; ++idx) {
+    // src_diff: 9 bit, dynamic range [-255, 255]
+    const int16_t *src_ptr =
+        src_diff + (idx >> 1) * 16 * src_stride + (idx & 0x01) * 16;
+    aom_hadamard_16x16_c(src_ptr, src_stride, coeff + idx * 256);
+  }
+
+  // coeff: 15 bit, dynamic range [-16320, 16320]
+  for (idx = 0; idx < 256; ++idx) {
+    tran_low_t a0 = coeff[0];
+    tran_low_t a1 = coeff[256];
+    tran_low_t a2 = coeff[512];
+    tran_low_t a3 = coeff[768];
+
+    tran_low_t b0 = (a0 + a1) >> 2;  // (a0 + a1): 16 bit, [-32640, 32640]
+    tran_low_t b1 = (a0 - a1) >> 2;  // b0-b3: 15 bit, dynamic range
+    tran_low_t b2 = (a2 + a3) >> 2;  // [-16320, 16320]
+    tran_low_t b3 = (a2 - a3) >> 2;
+
+    coeff[0] = b0 + b2;  // 16 bit, [-32640, 32640]
+    coeff[256] = b1 + b3;
+    coeff[512] = b0 - b2;
+    coeff[768] = b1 - b3;
+
+    ++coeff;
+  }
+}
+
+// coeff: 16 bits, dynamic range [-32640, 32640].
+// length: value range {16, 64, 256, 1024}.
+int aom_satd_c(const tran_low_t *coeff, int length) {
+  int i;
+  int satd = 0;
+  for (i = 0; i < length; ++i) satd += abs(coeff[i]);
+
+  // satd: 26 bits, dynamic range [-32640 * 1024, 32640 * 1024]
+  return satd;
+}

diff --git a/libav1/aom_dsp/binary_codes_reader.c b/libav1/aom_dsp/binary_codes_reader.c
new file mode 100644
index 0000000..7cd903d
--- /dev/null
+++ b/libav1/aom_dsp/binary_codes_reader.c

@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "aom_dsp/binary_codes_reader.h"
+#include "aom_dsp/recenter.h"
+#include "av1/common/common.h"
+
+uint16_t aom_read_primitive_quniform_(aom_reader *r,
+                                      uint16_t n ACCT_STR_PARAM) {
+  if (n <= 1) return 0;
+  const int l = get_msb(n) + 1;
+  const int m = (1 << l) - n;
+  const int v = aom_read_literal(r, l - 1, ACCT_STR_NAME);
+  return v < m ? v : (v << 1) - m + aom_read_bit(r, ACCT_STR_NAME);
+}
+
+// Decode finite subexponential code that for a symbol v in [0, n-1] with
+// parameter k
+uint16_t aom_read_primitive_subexpfin_(aom_reader *r, uint16_t n,
+                                       uint16_t k ACCT_STR_PARAM) {
+  int i = 0;
+  int mk = 0;
+
+  while (1) {
+    int b = (i ? k + i - 1 : k);
+    int a = (1 << b);
+
+    if (n <= mk + 3 * a) {
+      return aom_read_primitive_quniform(r, n - mk, ACCT_STR_NAME) + mk;
+    }
+
+    if (!aom_read_bit(r, ACCT_STR_NAME)) {
+      return aom_read_literal(r, b, ACCT_STR_NAME) + mk;
+    }
+
+    i = i + 1;
+    mk += a;
+  }
+
+  assert(0);
+  return 0;
+}
+
+uint16_t aom_read_primitive_refsubexpfin_(aom_reader *r, uint16_t n, uint16_t k,
+                                          uint16_t ref ACCT_STR_PARAM) {
+  return inv_recenter_finite_nonneg(
+      n, ref, aom_read_primitive_subexpfin(r, n, k, ACCT_STR_NAME));
+}

diff --git a/libav1/aom_dsp/binary_codes_reader.h b/libav1/aom_dsp/binary_codes_reader.h
new file mode 100644
index 0000000..d218f06
--- /dev/null
+++ b/libav1/aom_dsp/binary_codes_reader.h

@@ -0,0 +1,44 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_BINARY_CODES_READER_H_
+#define AOM_AOM_DSP_BINARY_CODES_READER_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <assert.h>
+
+#include "config/aom_config.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/bitreader.h"
+#include "aom_dsp/bitreader_buffer.h"
+
+#define aom_read_primitive_quniform(r, n, ACCT_STR_NAME) \
+  aom_read_primitive_quniform_(r, n ACCT_STR_ARG(ACCT_STR_NAME))
+#define aom_read_primitive_subexpfin(r, n, k, ACCT_STR_NAME) \
+  aom_read_primitive_subexpfin_(r, n, k ACCT_STR_ARG(ACCT_STR_NAME))
+#define aom_read_primitive_refsubexpfin(r, n, k, ref, ACCT_STR_NAME) \
+  aom_read_primitive_refsubexpfin_(r, n, k, ref ACCT_STR_ARG(ACCT_STR_NAME))
+
+uint16_t aom_read_primitive_quniform_(aom_reader *r, uint16_t n ACCT_STR_PARAM);
+uint16_t aom_read_primitive_subexpfin_(aom_reader *r, uint16_t n,
+                                       uint16_t k ACCT_STR_PARAM);
+uint16_t aom_read_primitive_refsubexpfin_(aom_reader *r, uint16_t n, uint16_t k,
+                                          uint16_t ref ACCT_STR_PARAM);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_DSP_BINARY_CODES_READER_H_

diff --git a/libav1/aom_dsp/bitreader.h b/libav1/aom_dsp/bitreader.h
new file mode 100644
index 0000000..45a4608
--- /dev/null
+++ b/libav1/aom_dsp/bitreader.h

@@ -0,0 +1,157 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_BITREADER_H_
+#define AOM_AOM_DSP_BITREADER_H_
+
+#include <assert.h>
+#include <limits.h>
+//#include <xmmintrin.h>
+
+#include "config/aom_config.h"
+
+#include "aom/aomdx.h"
+#include "aom/aom_integer.h"
+#include "aom_dsp/daalaboolreader.h"
+#include "aom_dsp/prob.h"
+#include "av1/common/odintrin.h"
+
+#if CONFIG_ACCOUNTING
+#include "av1/decoder/accounting.h"
+#define ACCT_STR_NAME acct_str
+#define ACCT_STR_PARAM , const char *ACCT_STR_NAME
+#define ACCT_STR_ARG(s) , s
+#else
+#define ACCT_STR_PARAM
+#define ACCT_STR_ARG(s)
+#endif
+
+#define aom_read(r, prob, ACCT_STR_NAME) \
+  aom_read_(r, prob ACCT_STR_ARG(ACCT_STR_NAME))
+#define aom_read_bit(r, ACCT_STR_NAME) \
+  aom_read_bit_(r ACCT_STR_ARG(ACCT_STR_NAME))
+#define aom_read_tree(r, tree, probs, ACCT_STR_NAME) \
+  aom_read_tree_(r, tree, probs ACCT_STR_ARG(ACCT_STR_NAME))
+#define aom_read_literal(r, bits, ACCT_STR_NAME) \
+  aom_read_literal_(r, bits ACCT_STR_ARG(ACCT_STR_NAME))
+#define aom_read_cdf(r, cdf, nsymbs, ACCT_STR_NAME) \
+  aom_read_cdf_(r, cdf, nsymbs ACCT_STR_ARG(ACCT_STR_NAME))
+#define aom_read_symbol(r, cdf, nsymbs, ACCT_STR_NAME) \
+  aom_read_symbol_(r, cdf, nsymbs ACCT_STR_ARG(ACCT_STR_NAME))
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct daala_reader aom_reader;
+
+static INLINE int aom_reader_init(aom_reader *r, const uint8_t *buffer,
+                                  size_t size) {
+  return aom_daala_reader_init(r, buffer, (int)size);
+}
+
+static INLINE const uint8_t *aom_reader_find_begin(aom_reader *r) {
+  return aom_daala_reader_find_begin(r);
+}
+
+static INLINE const uint8_t *aom_reader_find_end(aom_reader *r) {
+  return aom_daala_reader_find_end(r);
+}
+
+// Returns true if the bit reader has tried to decode more data from the buffer
+// than was actually provided.
+static INLINE int aom_reader_has_overflowed(const aom_reader *r) {
+  return aom_daala_reader_has_overflowed(r);
+}
+
+// Returns the position in the bit reader in bits.
+static INLINE uint32_t aom_reader_tell(const aom_reader *r) {
+  return aom_daala_reader_tell(r);
+}
+
+// Returns the position in the bit reader in 1/8th bits.
+static INLINE uint32_t aom_reader_tell_frac(const aom_reader *r) {
+  return aom_daala_reader_tell_frac(r);
+}
+
+#if CONFIG_ACCOUNTING
+static INLINE void aom_process_accounting(const aom_reader *r ACCT_STR_PARAM) {
+  if (r->accounting != NULL) {
+    uint32_t tell_frac;
+    tell_frac = aom_reader_tell_frac(r);
+    aom_accounting_record(r->accounting, ACCT_STR_NAME,
+                          tell_frac - r->accounting->last_tell_frac);
+    r->accounting->last_tell_frac = tell_frac;
+  }
+}
+
+static INLINE void aom_update_symb_counts(const aom_reader *r, int is_binary) {
+  if (r->accounting != NULL) {
+    r->accounting->syms.num_multi_syms += !is_binary;
+    r->accounting->syms.num_binary_syms += !!is_binary;
+  }
+}
+#endif
+
+static INLINE int aom_read_(aom_reader *r, int prob ACCT_STR_PARAM) {
+  int ret;
+  ret = aom_daala_read(r, prob);
+#if CONFIG_ACCOUNTING
+  if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
+  aom_update_symb_counts(r, 1);
+#endif
+  return ret;
+}
+
+static INLINE int aom_read_bit_(aom_reader *r ACCT_STR_PARAM) {
+  int ret;
+  ret = aom_read(r, 128, NULL);  // aom_prob_half
+#if CONFIG_ACCOUNTING
+  if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
+#endif
+  return ret;
+}
+
+static INLINE int aom_read_literal_(aom_reader *r, int bits ACCT_STR_PARAM) {
+  int literal = 0, bit;
+
+  for (bit = bits - 1; bit >= 0; bit--) literal |= aom_read_bit(r, NULL) << bit;
+#if CONFIG_ACCOUNTING
+  if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
+#endif
+  return literal;
+}
+
+static INLINE int aom_read_cdf_(aom_reader *r, aom_cdf_prob *cdf,
+                                int nsymbs ACCT_STR_PARAM) {
+  int ret;
+  ret = od_ec_decode_cdf_q15_standard(&r->ec, cdf, nsymbs);
+
+#if CONFIG_ACCOUNTING
+  if (ACCT_STR_NAME) aom_process_accounting(r, ACCT_STR_NAME);
+  aom_update_symb_counts(r, (nsymbs == 2));
+#endif
+  return ret;
+}
+
+static INLINE int aom_read_symbol_(aom_reader *r, aom_cdf_prob *cdf,
+                                   int nsymbs ACCT_STR_PARAM) {
+//_mm_prefetch((const char*)(cdf + (nsymbs&&0xfffffff0)), _MM_HINT_T0);
+if (r->allow_update_cdf)
+    return od_ec_decode_cdf_q15(&r->ec, cdf, nsymbs);
+  return od_ec_decode_cdf_q15_standard(&r->ec, cdf, nsymbs);    
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_DSP_BITREADER_H_

diff --git a/libav1/aom_dsp/bitreader_buffer.c b/libav1/aom_dsp/bitreader_buffer.c
new file mode 100644
index 0000000..984b217
--- /dev/null
+++ b/libav1/aom_dsp/bitreader_buffer.c

@@ -0,0 +1,116 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "config/aom_config.h"
+
+#include "aom_dsp/bitreader_buffer.h"
+#include "aom_dsp/recenter.h"
+#include "aom_ports/bitops.h"
+
+size_t aom_rb_bytes_read(const struct aom_read_bit_buffer *rb) {
+  return (rb->bit_offset + 7) >> 3;
+}
+
+int aom_rb_read_bit(struct aom_read_bit_buffer *rb) {
+  const uint32_t off = rb->bit_offset;
+  const uint32_t p = off >> 3;
+  const int q = 7 - (int)(off & 0x7);
+  if (rb->bit_buffer + p < rb->bit_buffer_end) {
+    const int bit = (rb->bit_buffer[p] >> q) & 1;
+    rb->bit_offset = off + 1;
+    return bit;
+  } else {
+    if (rb->error_handler) rb->error_handler(rb->error_handler_data);
+    return 0;
+  }
+}
+
+int aom_rb_read_literal(struct aom_read_bit_buffer *rb, int bits) {
+  assert(bits <= 31);
+  int value = 0, bit;
+  for (bit = bits - 1; bit >= 0; bit--) value |= aom_rb_read_bit(rb) << bit;
+  return value;
+}
+
+uint32_t aom_rb_read_unsigned_literal(struct aom_read_bit_buffer *rb,
+                                      int bits) {
+  assert(bits <= 32);
+  uint32_t value = 0;
+  int bit;
+  for (bit = bits - 1; bit >= 0; bit--)
+    value |= (uint32_t)aom_rb_read_bit(rb) << bit;
+  return value;
+}
+
+int aom_rb_read_inv_signed_literal(struct aom_read_bit_buffer *rb, int bits) {
+  const int nbits = sizeof(unsigned) * 8 - bits - 1;
+  const unsigned value = (unsigned)aom_rb_read_literal(rb, bits + 1) << nbits;
+  return ((int)value) >> nbits;
+}
+
+uint32_t aom_rb_read_uvlc(struct aom_read_bit_buffer *rb) {
+  int leading_zeros = 0;
+  while (!aom_rb_read_bit(rb)) ++leading_zeros;
+  // Maximum 32 bits.
+  if (leading_zeros >= 32) return UINT32_MAX;
+  const uint32_t base = (1u << leading_zeros) - 1;
+  const uint32_t value = aom_rb_read_literal(rb, leading_zeros);
+  return base + value;
+}
+
+static uint16_t aom_rb_read_primitive_quniform(struct aom_read_bit_buffer *rb,
+                                               uint16_t n) {
+  if (n <= 1) return 0;
+  const int l = get_msb(n) + 1;
+  const int m = (1 << l) - n;
+  const int v = aom_rb_read_literal(rb, l - 1);
+  return v < m ? v : (v << 1) - m + aom_rb_read_bit(rb);
+}
+
+static uint16_t aom_rb_read_primitive_subexpfin(struct aom_read_bit_buffer *rb,
+                                                uint16_t n, uint16_t k) {
+  int i = 0;
+  int mk = 0;
+
+  while (1) {
+    int b = (i ? k + i - 1 : k);
+    int a = (1 << b);
+
+    if (n <= mk + 3 * a) {
+      return aom_rb_read_primitive_quniform(rb, n - mk) + mk;
+    }
+
+    if (!aom_rb_read_bit(rb)) {
+      return aom_rb_read_literal(rb, b) + mk;
+    }
+
+    i = i + 1;
+    mk += a;
+  }
+
+  assert(0);
+  return 0;
+}
+
+static uint16_t aom_rb_read_primitive_refsubexpfin(
+    struct aom_read_bit_buffer *rb, uint16_t n, uint16_t k, uint16_t ref) {
+  return inv_recenter_finite_nonneg(n, ref,
+                                    aom_rb_read_primitive_subexpfin(rb, n, k));
+}
+
+int16_t aom_rb_read_signed_primitive_refsubexpfin(
+    struct aom_read_bit_buffer *rb, uint16_t n, uint16_t k, int16_t ref) {
+  ref += n - 1;
+  const uint16_t scaled_n = (n << 1) - 1;
+  return aom_rb_read_primitive_refsubexpfin(rb, scaled_n, k, ref) - n + 1;
+}

diff --git a/libav1/aom_dsp/bitreader_buffer.h b/libav1/aom_dsp/bitreader_buffer.h
new file mode 100644
index 0000000..359fbe5
--- /dev/null
+++ b/libav1/aom_dsp/bitreader_buffer.h

@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_BITREADER_BUFFER_H_
+#define AOM_AOM_DSP_BITREADER_BUFFER_H_
+
+#include <limits.h>
+
+#include "aom/aom_integer.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef void (*aom_rb_error_handler)(void *data);
+
+struct aom_read_bit_buffer {
+  const uint8_t *bit_buffer;
+  const uint8_t *bit_buffer_end;
+  uint32_t bit_offset;
+
+  void *error_handler_data;
+  aom_rb_error_handler error_handler;
+};
+
+size_t aom_rb_bytes_read(const struct aom_read_bit_buffer *rb);
+
+int aom_rb_read_bit(struct aom_read_bit_buffer *rb);
+
+int aom_rb_read_literal(struct aom_read_bit_buffer *rb, int bits);
+
+uint32_t aom_rb_read_unsigned_literal(struct aom_read_bit_buffer *rb, int bits);
+
+int aom_rb_read_inv_signed_literal(struct aom_read_bit_buffer *rb, int bits);
+
+uint32_t aom_rb_read_uvlc(struct aom_read_bit_buffer *rb);
+
+int16_t aom_rb_read_signed_primitive_refsubexpfin(
+    struct aom_read_bit_buffer *rb, uint16_t n, uint16_t k, int16_t ref);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_DSP_BITREADER_BUFFER_H_

diff --git a/libav1/aom_dsp/blend.h b/libav1/aom_dsp/blend.h
new file mode 100644
index 0000000..fd87dc1
--- /dev/null
+++ b/libav1/aom_dsp/blend.h

@@ -0,0 +1,45 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_BLEND_H_
+#define AOM_AOM_DSP_BLEND_H_
+
+#include "aom_ports/mem.h"
+
+// Various blending functions and macros.
+// See also the aom_blend_* functions in aom_dsp_rtcd.h
+
+// Alpha blending with alpha values from the range [0, 64], where 64
+// means use the first input and 0 means use the second input.
+
+#define AOM_BLEND_A64_ROUND_BITS 6
+#define AOM_BLEND_A64_MAX_ALPHA (1 << AOM_BLEND_A64_ROUND_BITS)  // 64
+
+#define AOM_BLEND_A64(a, v0, v1)                                          \
+  ROUND_POWER_OF_TWO((a) * (v0) + (AOM_BLEND_A64_MAX_ALPHA - (a)) * (v1), \
+                     AOM_BLEND_A64_ROUND_BITS)
+
+// Alpha blending with alpha values from the range [0, 256], where 256
+// means use the first input and 0 means use the second input.
+#define AOM_BLEND_A256_ROUND_BITS 8
+#define AOM_BLEND_A256_MAX_ALPHA (1 << AOM_BLEND_A256_ROUND_BITS)  // 256
+
+#define AOM_BLEND_A256(a, v0, v1)                                          \
+  ROUND_POWER_OF_TWO((a) * (v0) + (AOM_BLEND_A256_MAX_ALPHA - (a)) * (v1), \
+                     AOM_BLEND_A256_ROUND_BITS)
+
+// Blending by averaging.
+#define AOM_BLEND_AVG(v0, v1) ROUND_POWER_OF_TWO((v0) + (v1), 1)
+
+#define DIFF_FACTOR_LOG2 4
+#define DIFF_FACTOR (1 << DIFF_FACTOR_LOG2)
+
+#endif  // AOM_AOM_DSP_BLEND_H_

diff --git a/libav1/aom_dsp/blend_a64_hmask.c b/libav1/aom_dsp/blend_a64_hmask.c
new file mode 100644
index 0000000..0554b43
--- /dev/null
+++ b/libav1/aom_dsp/blend_a64_hmask.c

@@ -0,0 +1,69 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/blend.h"
+
+#include "config/aom_dsp_rtcd.h"
+
+void aom_blend_a64_hmask_c(uint8_t *dst, uint32_t dst_stride,
+                           const uint8_t *src0, uint32_t src0_stride,
+                           const uint8_t *src1, uint32_t src1_stride,
+                           const uint8_t *mask, int w, int h) {
+  int i, j;
+
+  assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
+  assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
+
+  assert(h >= 1);
+  assert(w >= 1);
+  assert(IS_POWER_OF_TWO(h));
+  assert(IS_POWER_OF_TWO(w));
+
+  for (i = 0; i < h; ++i) {
+    for (j = 0; j < w; ++j) {
+      dst[i * dst_stride + j] = AOM_BLEND_A64(
+          mask[j], src0[i * src0_stride + j], src1[i * src1_stride + j]);
+    }
+  }
+}
+
+void aom_highbd_blend_a64_hmask_c(uint8_t *dst_8, uint32_t dst_stride,
+                                  const uint8_t *src0_8, uint32_t src0_stride,
+                                  const uint8_t *src1_8, uint32_t src1_stride,
+                                  const uint8_t *mask, int w, int h, int bd) {
+  int i, j;
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst_8);
+  const uint16_t *src0 = CONVERT_TO_SHORTPTR(src0_8);
+  const uint16_t *src1 = CONVERT_TO_SHORTPTR(src1_8);
+  (void)bd;
+
+  assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
+  assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
+
+  assert(h >= 1);
+  assert(w >= 1);
+  assert(IS_POWER_OF_TWO(h));
+  assert(IS_POWER_OF_TWO(w));
+
+  assert(bd == 8 || bd == 10 || bd == 12);
+
+  for (i = 0; i < h; ++i) {
+    for (j = 0; j < w; ++j) {
+      dst[i * dst_stride + j] = AOM_BLEND_A64(
+          mask[j], src0[i * src0_stride + j], src1[i * src1_stride + j]);
+    }
+  }
+}

diff --git a/libav1/aom_dsp/blend_a64_mask.c b/libav1/aom_dsp/blend_a64_mask.c
new file mode 100644
index 0000000..79956c3
--- /dev/null
+++ b/libav1/aom_dsp/blend_a64_mask.c

@@ -0,0 +1,345 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
+#include "aom_dsp/blend.h"
+#include "aom_dsp/aom_dsp_common.h"
+
+#include "config/aom_dsp_rtcd.h"
+
+// Blending with alpha mask. Mask values come from the range [0, 64],
+// as described for AOM_BLEND_A64 in aom_dsp/blend.h. src0 or src1 can
+// be the same as dst, or dst can be different from both sources.
+
+// NOTE(david.barker): The input and output of aom_blend_a64_d16_mask_c() are
+// in a higher intermediate precision, and will later be rounded down to pixel
+// precision.
+// Thus, in order to avoid double-rounding, we want to use normal right shifts
+// within this function, not ROUND_POWER_OF_TWO.
+// This works because of the identity:
+// ROUND_POWER_OF_TWO(x >> y, z) == ROUND_POWER_OF_TWO(x, y+z)
+//
+// In contrast, the output of the non-d16 functions will not be further rounded,
+// so we *should* use ROUND_POWER_OF_TWO there.
+
+void aom_lowbd_blend_a64_d16_mask_c(
+    uint8_t *dst, uint32_t dst_stride, const CONV_BUF_TYPE *src0,
+    uint32_t src0_stride, const CONV_BUF_TYPE *src1, uint32_t src1_stride,
+    const uint8_t *mask, uint32_t mask_stride, int w, int h, int subw, int subh,
+    ConvolveParams *conv_params) {
+  int i, j;
+  const int bd = 8;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+                           (1 << (offset_bits - conv_params->round_1 - 1));
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+
+  assert(IMPLIES((void *)src0 == dst, src0_stride == dst_stride));
+  assert(IMPLIES((void *)src1 == dst, src1_stride == dst_stride));
+
+  assert(h >= 4);
+  assert(w >= 4);
+  assert(IS_POWER_OF_TWO(h));
+  assert(IS_POWER_OF_TWO(w));
+
+  if (subw == 0 && subh == 0) {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        int32_t res;
+        const int m = mask[i * mask_stride + j];
+        res = ((m * (int32_t)src0[i * src0_stride + j] +
+                (AOM_BLEND_A64_MAX_ALPHA - m) *
+                    (int32_t)src1[i * src1_stride + j]) >>
+               AOM_BLEND_A64_ROUND_BITS);
+        res -= round_offset;
+        dst[i * dst_stride + j] =
+            clip_pixel(ROUND_POWER_OF_TWO(res, round_bits));
+      }
+    }
+  } else if (subw == 1 && subh == 1) {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        int32_t res;
+        const int m = ROUND_POWER_OF_TWO(
+            mask[(2 * i) * mask_stride + (2 * j)] +
+                mask[(2 * i + 1) * mask_stride + (2 * j)] +
+                mask[(2 * i) * mask_stride + (2 * j + 1)] +
+                mask[(2 * i + 1) * mask_stride + (2 * j + 1)],
+            2);
+        res = ((m * (int32_t)src0[i * src0_stride + j] +
+                (AOM_BLEND_A64_MAX_ALPHA - m) *
+                    (int32_t)src1[i * src1_stride + j]) >>
+               AOM_BLEND_A64_ROUND_BITS);
+        res -= round_offset;
+        dst[i * dst_stride + j] =
+            clip_pixel(ROUND_POWER_OF_TWO(res, round_bits));
+      }
+    }
+  } else if (subw == 1 && subh == 0) {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        int32_t res;
+        const int m = AOM_BLEND_AVG(mask[i * mask_stride + (2 * j)],
+                                    mask[i * mask_stride + (2 * j + 1)]);
+        res = ((m * (int32_t)src0[i * src0_stride + j] +
+                (AOM_BLEND_A64_MAX_ALPHA - m) *
+                    (int32_t)src1[i * src1_stride + j]) >>
+               AOM_BLEND_A64_ROUND_BITS);
+        res -= round_offset;
+        dst[i * dst_stride + j] =
+            clip_pixel(ROUND_POWER_OF_TWO(res, round_bits));
+      }
+    }
+  } else {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        int32_t res;
+        const int m = AOM_BLEND_AVG(mask[(2 * i) * mask_stride + j],
+                                    mask[(2 * i + 1) * mask_stride + j]);
+        res = ((int32_t)(m * (int32_t)src0[i * src0_stride + j] +
+                         (AOM_BLEND_A64_MAX_ALPHA - m) *
+                             (int32_t)src1[i * src1_stride + j]) >>
+               AOM_BLEND_A64_ROUND_BITS);
+        res -= round_offset;
+        dst[i * dst_stride + j] =
+            clip_pixel(ROUND_POWER_OF_TWO(res, round_bits));
+      }
+    }
+  }
+}
+
+void aom_highbd_blend_a64_d16_mask_c(
+    uint8_t *dst_8, uint32_t dst_stride, const CONV_BUF_TYPE *src0,
+    uint32_t src0_stride, const CONV_BUF_TYPE *src1, uint32_t src1_stride,
+    const uint8_t *mask, uint32_t mask_stride, int w, int h, int subw, int subh,
+    ConvolveParams *conv_params, const int bd) {
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+                           (1 << (offset_bits - conv_params->round_1 - 1));
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst_8);
+
+  assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
+  assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
+
+  assert(h >= 1);
+  assert(w >= 1);
+  assert(IS_POWER_OF_TWO(h));
+  assert(IS_POWER_OF_TWO(w));
+
+  // excerpt from clip_pixel_highbd()
+  // set saturation_value to (1 << bd) - 1
+  unsigned int saturation_value;
+  switch (bd) {
+    case 8:
+    default: saturation_value = 255; break;
+    case 10: saturation_value = 1023; break;
+    case 12: saturation_value = 4095; break;
+  }
+
+  if (subw == 0 && subh == 0) {
+    for (int i = 0; i < h; ++i) {
+      for (int j = 0; j < w; ++j) {
+        int32_t res;
+        const int m = mask[j];
+        res = ((m * src0[j] + (AOM_BLEND_A64_MAX_ALPHA - m) * src1[j]) >>
+               AOM_BLEND_A64_ROUND_BITS);
+        res -= round_offset;
+        unsigned int v = negative_to_zero(ROUND_POWER_OF_TWO(res, round_bits));
+        dst[j] = AOMMIN(v, saturation_value);
+      }
+      mask += mask_stride;
+      src0 += src0_stride;
+      src1 += src1_stride;
+      dst += dst_stride;
+    }
+  } else if (subw == 1 && subh == 1) {
+    for (int i = 0; i < h; ++i) {
+      for (int j = 0; j < w; ++j) {
+        int32_t res;
+        const int m = ROUND_POWER_OF_TWO(
+            mask[2 * j] + mask[mask_stride + 2 * j] + mask[2 * j + 1] +
+                mask[mask_stride + 2 * j + 1],
+            2);
+        res = (m * src0[j] + (AOM_BLEND_A64_MAX_ALPHA - m) * src1[j]) >>
+              AOM_BLEND_A64_ROUND_BITS;
+        res -= round_offset;
+        unsigned int v = negative_to_zero(ROUND_POWER_OF_TWO(res, round_bits));
+        dst[j] = AOMMIN(v, saturation_value);
+      }
+      mask += 2 * mask_stride;
+      src0 += src0_stride;
+      src1 += src1_stride;
+      dst += dst_stride;
+    }
+  } else if (subw == 1 && subh == 0) {
+    for (int i = 0; i < h; ++i) {
+      for (int j = 0; j < w; ++j) {
+        int32_t res;
+        const int m = AOM_BLEND_AVG(mask[2 * j], mask[2 * j + 1]);
+        res = (m * src0[j] + (AOM_BLEND_A64_MAX_ALPHA - m) * src1[j]) >>
+              AOM_BLEND_A64_ROUND_BITS;
+        res -= round_offset;
+        unsigned int v = negative_to_zero(ROUND_POWER_OF_TWO(res, round_bits));
+        dst[j] = AOMMIN(v, saturation_value);
+      }
+      mask += mask_stride;
+      src0 += src0_stride;
+      src1 += src1_stride;
+      dst += dst_stride;
+    }
+  } else {
+    for (int i = 0; i < h; ++i) {
+      for (int j = 0; j < w; ++j) {
+        int32_t res;
+        const int m = AOM_BLEND_AVG(mask[j], mask[mask_stride + j]);
+        res = (m * src0[j] + (AOM_BLEND_A64_MAX_ALPHA - m) * src1[j]) >>
+              AOM_BLEND_A64_ROUND_BITS;
+        res -= round_offset;
+        unsigned int v = negative_to_zero(ROUND_POWER_OF_TWO(res, round_bits));
+        dst[j] = AOMMIN(v, saturation_value);
+      }
+      mask += 2 * mask_stride;
+      src0 += src0_stride;
+      src1 += src1_stride;
+      dst += dst_stride;
+    }
+  }
+}
+
+// Blending with alpha mask. Mask values come from the range [0, 64],
+// as described for AOM_BLEND_A64 in aom_dsp/blend.h. src0 or src1 can
+// be the same as dst, or dst can be different from both sources.
+
+void aom_blend_a64_mask_c(uint8_t *dst, uint32_t dst_stride,
+                          const uint8_t *src0, uint32_t src0_stride,
+                          const uint8_t *src1, uint32_t src1_stride,
+                          const uint8_t *mask, uint32_t mask_stride, int w,
+                          int h, int subw, int subh) {
+  int i, j;
+
+  assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
+  assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
+
+  assert(h >= 1);
+  assert(w >= 1);
+  assert(IS_POWER_OF_TWO(h));
+  assert(IS_POWER_OF_TWO(w));
+
+  if (subw == 0 && subh == 0) {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        const int m = mask[i * mask_stride + j];
+        dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                                src1[i * src1_stride + j]);
+      }
+    }
+  } else if (subw == 1 && subh == 1) {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        const int m = ROUND_POWER_OF_TWO(
+            mask[(2 * i) * mask_stride + (2 * j)] +
+                mask[(2 * i + 1) * mask_stride + (2 * j)] +
+                mask[(2 * i) * mask_stride + (2 * j + 1)] +
+                mask[(2 * i + 1) * mask_stride + (2 * j + 1)],
+            2);
+        dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                                src1[i * src1_stride + j]);
+      }
+    }
+  } else if (subw == 1 && subh == 0) {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        const int m = AOM_BLEND_AVG(mask[i * mask_stride + (2 * j)],
+                                    mask[i * mask_stride + (2 * j + 1)]);
+        dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                                src1[i * src1_stride + j]);
+      }
+    }
+  } else {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        const int m = AOM_BLEND_AVG(mask[(2 * i) * mask_stride + j],
+                                    mask[(2 * i + 1) * mask_stride + j]);
+        dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                                src1[i * src1_stride + j]);
+      }
+    }
+  }
+}
+
+void aom_highbd_blend_a64_mask_c(uint8_t *dst_8, uint32_t dst_stride,
+                                 const uint8_t *src0_8, uint32_t src0_stride,
+                                 const uint8_t *src1_8, uint32_t src1_stride,
+                                 const uint8_t *mask, uint32_t mask_stride,
+                                 int w, int h, int subw, int subh, int bd) {
+  int i, j;
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst_8);
+  const uint16_t *src0 = CONVERT_TO_SHORTPTR(src0_8);
+  const uint16_t *src1 = CONVERT_TO_SHORTPTR(src1_8);
+  (void)bd;
+
+  assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
+  assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
+
+  assert(h >= 1);
+  assert(w >= 1);
+  assert(IS_POWER_OF_TWO(h));
+  assert(IS_POWER_OF_TWO(w));
+
+  assert(bd == 8 || bd == 10 || bd == 12);
+
+  if (subw == 0 && subh == 0) {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        const int m = mask[i * mask_stride + j];
+        dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                                src1[i * src1_stride + j]);
+      }
+    }
+  } else if (subw == 1 && subh == 1) {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        const int m = ROUND_POWER_OF_TWO(
+            mask[(2 * i) * mask_stride + (2 * j)] +
+                mask[(2 * i + 1) * mask_stride + (2 * j)] +
+                mask[(2 * i) * mask_stride + (2 * j + 1)] +
+                mask[(2 * i + 1) * mask_stride + (2 * j + 1)],
+            2);
+        dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                                src1[i * src1_stride + j]);
+      }
+    }
+  } else if (subw == 1 && subh == 0) {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        const int m = AOM_BLEND_AVG(mask[i * mask_stride + (2 * j)],
+                                    mask[i * mask_stride + (2 * j + 1)]);
+        dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                                src1[i * src1_stride + j]);
+      }
+    }
+  } else {
+    for (i = 0; i < h; ++i) {
+      for (j = 0; j < w; ++j) {
+        const int m = AOM_BLEND_AVG(mask[(2 * i) * mask_stride + j],
+                                    mask[(2 * i + 1) * mask_stride + j]);
+        dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                                src1[i * src1_stride + j]);
+      }
+    }
+  }
+}

diff --git a/libav1/aom_dsp/blend_a64_vmask.c b/libav1/aom_dsp/blend_a64_vmask.c
new file mode 100644
index 0000000..4f222e1
--- /dev/null
+++ b/libav1/aom_dsp/blend_a64_vmask.c

@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/blend.h"
+
+#include "config/aom_dsp_rtcd.h"
+
+void aom_blend_a64_vmask_c(uint8_t *dst, uint32_t dst_stride,
+                           const uint8_t *src0, uint32_t src0_stride,
+                           const uint8_t *src1, uint32_t src1_stride,
+                           const uint8_t *mask, int w, int h) {
+  int i, j;
+
+  assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
+  assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
+
+  assert(h >= 1);
+  assert(w >= 1);
+  assert(IS_POWER_OF_TWO(h));
+  assert(IS_POWER_OF_TWO(w));
+
+  for (i = 0; i < h; ++i) {
+    const int m = mask[i];
+    for (j = 0; j < w; ++j) {
+      dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                              src1[i * src1_stride + j]);
+    }
+  }
+}
+
+void aom_highbd_blend_a64_vmask_c(uint8_t *dst_8, uint32_t dst_stride,
+                                  const uint8_t *src0_8, uint32_t src0_stride,
+                                  const uint8_t *src1_8, uint32_t src1_stride,
+                                  const uint8_t *mask, int w, int h, int bd) {
+  int i, j;
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst_8);
+  const uint16_t *src0 = CONVERT_TO_SHORTPTR(src0_8);
+  const uint16_t *src1 = CONVERT_TO_SHORTPTR(src1_8);
+  (void)bd;
+
+  assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
+  assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
+
+  assert(h >= 1);
+  assert(w >= 1);
+  assert(IS_POWER_OF_TWO(h));
+  assert(IS_POWER_OF_TWO(w));
+
+  assert(bd == 8 || bd == 10 || bd == 12);
+
+  for (i = 0; i < h; ++i) {
+    const int m = mask[i];
+    for (j = 0; j < w; ++j) {
+      dst[i * dst_stride + j] = AOM_BLEND_A64(m, src0[i * src0_stride + j],
+                                              src1[i * src1_stride + j]);
+    }
+  }
+}

diff --git a/libav1/aom_dsp/daalaboolreader.c b/libav1/aom_dsp/daalaboolreader.c
new file mode 100644
index 0000000..6c2259f
--- /dev/null
+++ b/libav1/aom_dsp/daalaboolreader.c

@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "aom_dsp/daalaboolreader.h"
+
+int aom_daala_reader_init(daala_reader *r, const uint8_t *buffer, int size) {
+  if (size && !buffer) {
+    return 1;
+  }
+  r->buffer_end = buffer + size;
+  r->buffer = buffer;
+  od_ec_dec_init(&r->ec, buffer, size);
+#if CONFIG_ACCOUNTING
+  r->accounting = NULL;
+#endif
+  return 0;
+}
+
+const uint8_t *aom_daala_reader_find_begin(daala_reader *r) {
+  return r->buffer;
+}
+
+const uint8_t *aom_daala_reader_find_end(daala_reader *r) {
+  return r->buffer_end;
+}
+
+uint32_t aom_daala_reader_tell(const daala_reader *r) {
+  return od_ec_dec_tell(&r->ec);
+}
+
+uint32_t aom_daala_reader_tell_frac(const daala_reader *r) {
+  return od_ec_dec_tell_frac(&r->ec);
+}
+
+int aom_daala_reader_has_overflowed(const daala_reader *r) {
+  const uint32_t tell_bits = aom_daala_reader_tell(r);
+  const uint32_t tell_bytes = (tell_bits + 7) >> 3;
+  return ((ptrdiff_t)tell_bytes > r->buffer_end - r->buffer);
+}

diff --git a/libav1/aom_dsp/daalaboolreader.h b/libav1/aom_dsp/daalaboolreader.h
new file mode 100644
index 0000000..5f6ffde
--- /dev/null
+++ b/libav1/aom_dsp/daalaboolreader.h

@@ -0,0 +1,157 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_DAALABOOLREADER_H_
+#define AOM_AOM_DSP_DAALABOOLREADER_H_
+
+#include <xmmintrin.h>
+#include "aom/aom_integer.h"
+#include "aom_dsp/entdec.h"
+#include "aom_dsp/prob.h"
+#if CONFIG_ACCOUNTING
+#include "av1/decoder/accounting.h"
+#endif
+#if CONFIG_BITSTREAM_DEBUG
+#include <stdio.h>
+#include "aom_util/debug_util.h"
+#endif  // CONFIG_BITSTREAM_DEBUG
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct daala_reader {
+  const uint8_t *buffer;
+  const uint8_t *buffer_end;
+  od_ec_dec ec;
+#if CONFIG_ACCOUNTING
+  Accounting *accounting;
+#endif
+  uint8_t allow_update_cdf;
+};
+
+typedef struct daala_reader daala_reader;
+
+int aom_daala_reader_init(daala_reader *r, const uint8_t *buffer, int size);
+const uint8_t *aom_daala_reader_find_begin(daala_reader *r);
+const uint8_t *aom_daala_reader_find_end(daala_reader *r);
+uint32_t aom_daala_reader_tell(const daala_reader *r);
+uint32_t aom_daala_reader_tell_frac(const daala_reader *r);
+// Returns true if the reader has tried to decode more data from the buffer
+// than was actually provided.
+int aom_daala_reader_has_overflowed(const daala_reader *r);
+
+static INLINE int aom_daala_read(daala_reader *r, int prob) {
+  int bit, p;
+  //_mm_prefetch((const char*)(&r->ec), _MM_HINT_T0);
+  p = (0x7FFFFF - (prob << 15) + prob) >> 8;
+#if CONFIG_BITSTREAM_DEBUG
+/*{
+  const int queue_r = bitstream_queue_get_read();
+  const int frame_idx = bitstream_queue_get_frame_read();
+  if (frame_idx == 0 && queue_r == 0) {
+    fprintf(stderr, "\n *** bitstream queue at frame_idx_r %d queue_r %d\n",
+            frame_idx, queue_r);
+  }
+}*/
+#endif
+  bit = od_ec_decode_bool_q15(&r->ec, p);
+
+#if CONFIG_BITSTREAM_DEBUG
+  {
+    int i;
+    int ref_bit, ref_nsymbs;
+    aom_cdf_prob ref_cdf[16];
+    const int queue_r = bitstream_queue_get_read();
+    const int frame_idx = bitstream_queue_get_frame_read();
+    bitstream_queue_pop(&ref_bit, ref_cdf, &ref_nsymbs);
+    if (ref_nsymbs != 2) {
+      fprintf(stderr,
+              "\n *** [bit] nsymbs error, frame_idx_r %d nsymbs %d ref_nsymbs "
+              "%d queue_r %d\n",
+              frame_idx, 2, ref_nsymbs, queue_r);
+      assert(0);
+    }
+    if ((ref_nsymbs != 2) || (ref_cdf[0] != (aom_cdf_prob)p) ||
+        (ref_cdf[1] != 32767)) {
+      fprintf(stderr,
+              "\n *** [bit] cdf error, frame_idx_r %d cdf {%d, %d} ref_cdf {%d",
+              frame_idx, p, 32767, ref_cdf[0]);
+      for (i = 1; i < ref_nsymbs; ++i) fprintf(stderr, ", %d", ref_cdf[i]);
+      fprintf(stderr, "} queue_r %d\n", queue_r);
+      assert(0);
+    }
+    if (bit != ref_bit) {
+      fprintf(stderr,
+              "\n *** [bit] symb error, frame_idx_r %d symb %d ref_symb %d "
+              "queue_r %d\n",
+              frame_idx, bit, ref_bit, queue_r);
+      assert(0);
+    }
+  }
+#endif
+
+  return bit;
+}
+
+static INLINE int daala_read_symbol(daala_reader *r, aom_cdf_prob *cdf,
+                                    int nsymbs) {
+  int symb;
+  assert(cdf != NULL);
+  symb = od_ec_decode_cdf_q15_standard(&r->ec, cdf, nsymbs);
+
+#if CONFIG_BITSTREAM_DEBUG
+  {
+    int i;
+    int cdf_error = 0;
+    int ref_symb, ref_nsymbs;
+    aom_cdf_prob ref_cdf[16];
+    const int queue_r = bitstream_queue_get_read();
+    const int frame_idx = bitstream_queue_get_frame_read();
+    bitstream_queue_pop(&ref_symb, ref_cdf, &ref_nsymbs);
+    if (nsymbs != ref_nsymbs) {
+      fprintf(stderr,
+              "\n *** nsymbs error, frame_idx_r %d nsymbs %d ref_nsymbs %d "
+              "queue_r %d\n",
+              frame_idx, nsymbs, ref_nsymbs, queue_r);
+      cdf_error = 0;
+      assert(0);
+    } else {
+      for (i = 0; i < nsymbs; ++i)
+        if (cdf[i] != ref_cdf[i]) cdf_error = 1;
+    }
+    if (cdf_error) {
+      fprintf(stderr, "\n *** cdf error, frame_idx_r %d cdf {%d", frame_idx,
+              cdf[0]);
+      for (i = 1; i < nsymbs; ++i) fprintf(stderr, ", %d", cdf[i]);
+      fprintf(stderr, "} ref_cdf {%d", ref_cdf[0]);
+      for (i = 1; i < ref_nsymbs; ++i) fprintf(stderr, ", %d", ref_cdf[i]);
+      fprintf(stderr, "} queue_r %d\n", queue_r);
+      assert(0);
+    }
+    if (symb != ref_symb) {
+      fprintf(
+          stderr,
+          "\n *** symb error, frame_idx_r %d symb %d ref_symb %d queue_r %d\n",
+          frame_idx, symb, ref_symb, queue_r);
+      assert(0);
+    }
+  }
+#endif
+
+  return symb;
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_DSP_DAALABOOLREADER_H_

diff --git a/libav1/aom_dsp/entcode.c b/libav1/aom_dsp/entcode.c
new file mode 100644
index 0000000..aad96c6
--- /dev/null
+++ b/libav1/aom_dsp/entcode.c

@@ -0,0 +1,49 @@
+/*
+ * Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "aom_dsp/entcode.h"
+
+/*Given the current total integer number of bits used and the current value of
+   rng, computes the fraction number of bits used to OD_BITRES precision.
+  This is used by od_ec_enc_tell_frac() and od_ec_dec_tell_frac().
+  nbits_total: The number of whole bits currently used, i.e., the value
+                returned by od_ec_enc_tell() or od_ec_dec_tell().
+  rng: The current value of rng from either the encoder or decoder state.
+  Return: The number of bits scaled by 2**OD_BITRES.
+          This will always be slightly larger than the exact value (e.g., all
+           rounding error is in the positive direction).*/
+uint32_t od_ec_tell_frac(uint32_t nbits_total, uint32_t rng) {
+  uint32_t nbits;
+  int l;
+  int i;
+  /*To handle the non-integral number of bits still left in the encoder/decoder
+     state, we compute the worst-case number of bits of val that must be
+     encoded to ensure that the value is inside the range for any possible
+     subsequent bits.
+    The computation here is independent of val itself (the decoder does not
+     even track that value), even though the real number of bits used after
+     od_ec_enc_done() may be 1 smaller if rng is a power of two and the
+     corresponding trailing bits of val are all zeros.
+    If we did try to track that special case, then coding a value with a
+     probability of 1/(1 << n) might sometimes appear to use more than n bits.
+    This may help explain the surprising result that a newly initialized
+     encoder or decoder claims to have used 1 bit.*/
+  nbits = nbits_total << OD_BITRES;
+  l = 0;
+  for (i = OD_BITRES; i-- > 0;) {
+    int b;
+    rng = rng * rng >> 15;
+    b = (int)(rng >> 16);
+    l = l << 1 | b;
+    rng >>= b;
+  }
+  return nbits - l;
+}

diff --git a/libav1/aom_dsp/entcode.h b/libav1/aom_dsp/entcode.h
new file mode 100644
index 0000000..7518879
--- /dev/null
+++ b/libav1/aom_dsp/entcode.h

@@ -0,0 +1,41 @@
+/*
+ * Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_ENTCODE_H_
+#define AOM_AOM_DSP_ENTCODE_H_
+
+#include <limits.h>
+#include <stddef.h>
+#include "av1/common/odintrin.h"
+#include "aom_dsp/prob.h"
+
+#define EC_PROB_SHIFT 6
+#define EC_MIN_PROB 4  // must be <= (1<<EC_PROB_SHIFT)/16
+
+/*OPT: od_ec_window must be at least 32 bits, but if you have fast arithmetic
+   on a larger type, you can speed up the decoder by using it here.*/
+typedef uint32_t od_ec_window;
+
+/*The size in bits of od_ec_window.*/
+#define OD_EC_WINDOW_SIZE ((int)sizeof(od_ec_window) * CHAR_BIT)
+
+/*The resolution of fractional-precision bit usage measurements, i.e.,
+   3 => 1/8th bits.*/
+#define OD_BITRES (3)
+
+#define OD_ICDF AOM_ICDF
+
+/*See entcode.c for further documentation.*/
+
+OD_WARN_UNUSED_RESULT uint32_t od_ec_tell_frac(uint32_t nbits_total,
+                                               uint32_t rng);
+
+#endif  // AOM_AOM_DSP_ENTCODE_H_

diff --git a/libav1/aom_dsp/entdec.c b/libav1/aom_dsp/entdec.c
new file mode 100644
index 0000000..0043ac7
--- /dev/null
+++ b/libav1/aom_dsp/entdec.c

@@ -0,0 +1,287 @@
+/*
+ * Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include "aom_dsp/entdec.h"
+#include "aom_dsp/prob.h"
+
+/*A range decoder.
+  This is an entropy decoder based upon \cite{Mar79}, which is itself a
+   rediscovery of the FIFO arithmetic code introduced by \cite{Pas76}.
+  It is very similar to arithmetic encoding, except that encoding is done with
+   digits in any base, instead of with bits, and so it is faster when using
+   larger bases (i.e.: a byte).
+  The author claims an average waste of $\frac{1}{2}\log_b(2b)$ bits, where $b$
+   is the base, longer than the theoretical optimum, but to my knowledge there
+   is no published justification for this claim.
+  This only seems true when using near-infinite precision arithmetic so that
+   the process is carried out with no rounding errors.
+
+  An excellent description of implementation details is available at
+   http://www.arturocampos.com/ac_range.html
+  A recent work \cite{MNW98} which proposes several changes to arithmetic
+   encoding for efficiency actually re-discovers many of the principles
+   behind range encoding, and presents a good theoretical analysis of them.
+
+  End of stream is handled by writing out the smallest number of bits that
+   ensures that the stream will be correctly decoded regardless of the value of
+   any subsequent bits.
+  od_ec_dec_tell() can be used to determine how many bits were needed to decode
+   all the symbols thus far; other data can be packed in the remaining bits of
+   the input buffer.
+  @PHDTHESIS{Pas76,
+    author="Richard Clark Pasco",
+    title="Source coding algorithms for fast data compression",
+    school="Dept. of Electrical Engineering, Stanford University",
+    address="Stanford, CA",
+    month=May,
+    year=1976,
+    URL="http://www.richpasco.org/scaffdc.pdf"
+  }
+  @INPROCEEDINGS{Mar79,
+   author="Martin, G.N.N.",
+   title="Range encoding: an algorithm for removing redundancy from a digitised
+    message",
+   booktitle="Video & Data Recording Conference",
+   year=1979,
+   address="Southampton",
+   month=Jul,
+   URL="http://www.compressconsult.com/rangecoder/rngcod.pdf.gz"
+  }
+  @ARTICLE{MNW98,
+   author="Alistair Moffat and Radford Neal and Ian H. Witten",
+   title="Arithmetic Coding Revisited",
+   journal="{ACM} Transactions on Information Systems",
+   year=1998,
+   volume=16,
+   number=3,
+   pages="256--294",
+   month=Jul,
+   URL="http://researchcommons.waikato.ac.nz/bitstream/handle/10289/78/content.pdf"
+  }*/
+
+/*This is meant to be a large, positive constant that can still be efficiently
+   loaded as an immediate (on platforms like ARM, for example).
+  Even relatively modest values like 100 would work fine.*/
+#define OD_EC_LOTS_OF_BITS (0x4000)
+
+/*The return value of od_ec_dec_tell does not change across an od_ec_dec_refill
+   call.*/
+static void od_ec_dec_refill(od_ec_dec *dec) {
+  int s;
+  od_ec_window dif;
+  int16_t cnt;
+  const unsigned char *bptr;
+  const unsigned char *end;
+  dif = dec->dif;
+  cnt = dec->cnt;
+  bptr = dec->bptr;
+  end = dec->end;
+  s = OD_EC_WINDOW_SIZE - 9 - (cnt + 15);
+  for (; s >= 0 && bptr < end; s -= 8, bptr++) {
+    /*Each time a byte is inserted into the window (dif), bptr advances and cnt
+       is incremented by 8, so the total number of consumed bits (the return
+       value of od_ec_dec_tell) does not change.*/
+    assert(s <= OD_EC_WINDOW_SIZE - 8);
+    dif ^= (od_ec_window)bptr[0] << s;
+    cnt += 8;
+  }
+  if (bptr >= end) {
+    /*We've reached the end of the buffer. It is perfectly valid for us to need
+       to fill the window with additional bits past the end of the buffer (and
+       this happens in normal operation). These bits should all just be taken
+       as zero. But we cannot increment bptr past 'end' (this is undefined
+       behavior), so we start to increment dec->tell_offs. We also don't want
+       to keep testing bptr against 'end', so we set cnt to OD_EC_LOTS_OF_BITS
+       and adjust dec->tell_offs so that the total number of unconsumed bits in
+       the window (dec->cnt - dec->tell_offs) does not change. This effectively
+       puts lots of zero bits into the window, and means we won't try to refill
+       it from the buffer for a very long time (at which point we'll put lots
+       of zero bits into the window again).*/
+    dec->tell_offs += OD_EC_LOTS_OF_BITS - cnt;
+    cnt = OD_EC_LOTS_OF_BITS;
+  }
+  dec->dif = dif;
+  dec->cnt = cnt;
+  dec->bptr = bptr;
+}
+
+/*Takes updated dif and range values, renormalizes them so that
+   32768 <= rng < 65536 (reading more bytes from the stream into dif if
+   necessary), and stores them back in the decoder context.
+  dif: The new value of dif.
+  rng: The new value of the range.
+  ret: The value to return.
+  Return: ret.
+          This allows the compiler to jump to this function via a tail-call.*/
+static int od_ec_dec_normalize(od_ec_dec *dec, od_ec_window dif, unsigned rng,
+                               int ret) {
+  int d;
+  assert(rng <= 65535U);
+  /*The number of leading zeros in the 16-bit binary representation of rng.*/
+  d = 16 - OD_ILOG_NZ(rng);
+  /*d bits in dec->dif are consumed.*/
+  dec->cnt -= d;
+  /*This is equivalent to shifting in 1's instead of 0's.*/
+  dec->dif = ((dif + 1) << d) - 1;
+  dec->rng = rng << d;
+  if (dec->cnt < 0) od_ec_dec_refill(dec);
+  return ret;
+}
+
+/*Initializes the decoder.
+  buf: The input buffer to use.
+  storage: The size in bytes of the input buffer.*/
+void od_ec_dec_init(od_ec_dec *dec, const unsigned char *buf,
+                    uint32_t storage) {
+  dec->buf = buf;
+  dec->tell_offs = 10 - (OD_EC_WINDOW_SIZE - 8);
+  dec->end = buf + storage;
+  dec->bptr = buf;
+  dec->dif = ((od_ec_window)1 << (OD_EC_WINDOW_SIZE - 1)) - 1;
+  dec->rng = 0x8000;
+  dec->cnt = -15;
+  od_ec_dec_refill(dec);
+}
+
+/*Decode a single binary value.
+  f: The probability that the bit is one, scaled by 32768.
+  Return: The value decoded (0 or 1).*/
+int od_ec_decode_bool_q15(od_ec_dec *dec, unsigned f) {
+  od_ec_window dif;
+  od_ec_window vw;
+  unsigned r;
+  unsigned r_new;
+  unsigned v;
+  int ret;
+  assert(0 < f);
+  assert(f < 32768U);
+  dif = dec->dif;
+  r = dec->rng;
+  assert(dif >> (OD_EC_WINDOW_SIZE - 16) < r);
+  assert(32768U <= r);
+  v = ((r >> 8) * (uint32_t)(f >> EC_PROB_SHIFT) >> (7 - EC_PROB_SHIFT));
+  v += EC_MIN_PROB;
+  vw = (od_ec_window)v << (OD_EC_WINDOW_SIZE - 16);
+  ret = 1;
+  r_new = v;
+  if (dif >= vw) {
+    r_new = r - v;
+    dif -= vw;
+    ret = 0;
+  }
+  return od_ec_dec_normalize(dec, dif, r_new, ret);
+}
+
+/*Decodes a symbol given an inverse cumulative distribution function (CDF)
+   table in Q15.
+  icdf: CDF_PROB_TOP minus the CDF, such that symbol s falls in the range
+         [s > 0 ? (CDF_PROB_TOP - icdf[s - 1]) : 0, CDF_PROB_TOP - icdf[s]).
+        The values must be monotonically non-increasing, and icdf[nsyms - 1]
+         must be 0.
+  nsyms: The number of symbols in the alphabet.
+         This should be at most 16.
+  Return: The decoded symbol s.*/
+
+int od_ec_decode_cdf_q15_standard(od_ec_dec *dec, uint16_t *icdf, int nsyms) {
+  od_ec_window dif;
+  unsigned r;
+  unsigned c;
+  unsigned u;
+  unsigned v;
+  int ret;
+  (void)nsyms;
+  dif = dec->dif;
+  r = dec->rng;
+  const int N = nsyms - 1;
+
+  assert(dif >> (OD_EC_WINDOW_SIZE - 16) < r);
+  assert(icdf[nsyms - 1] == OD_ICDF(CDF_PROB_TOP));
+  assert(32768U <= r);
+  assert(7 - EC_PROB_SHIFT - CDF_SHIFT >= 0);
+  c = (unsigned)(dif >> (OD_EC_WINDOW_SIZE - 16));
+  v = r;
+  ret = -1;
+  do {
+    u = v;
+    v = ((r >> 8) * (uint32_t)(icdf[++ret] >> EC_PROB_SHIFT) >>
+         (7 - EC_PROB_SHIFT - CDF_SHIFT));
+    v += EC_MIN_PROB * (N - ret);
+  } while (c < v);
+  assert(v < u);
+  assert(u <= r);
+  r = u - v;
+  dif -= (od_ec_window)v << (OD_EC_WINDOW_SIZE - 16);
+  return od_ec_dec_normalize(dec, dif, r, ret);
+}
+
+int od_ec_decode_cdf_q15(od_ec_dec *s, aom_cdf_prob *cdf, int n_symbols) {
+    od_ec_window u, v = s->rng, r = s->rng >> 8;
+    const od_ec_window c = s->dif >> (OD_EC_WINDOW_SIZE - 16);
+    unsigned ret = 0;
+
+    int rate = cdf[n_symbols];
+    //rate = 4 + (rate > 15) + (rate > 31) + (n_symbols > 3);
+    if (rate > 31)
+        rate = 4 + 1 + 1 + (n_symbols > 3);
+    else {
+        cdf[n_symbols]++;
+        if (rate > 15)
+            rate = 4 + 1 + (n_symbols > 3);
+        else
+            rate = 4 + (n_symbols > 3);
+    }
+    aom_cdf_prob add;
+    do {
+        u = v;
+        v = r * (cdf[ret] >> EC_PROB_SHIFT); 
+        v >>= 7 - EC_PROB_SHIFT;
+        v += EC_MIN_PROB * (n_symbols - ret - 1);
+        add = ((0x8000 - cdf[ret]) >> rate);
+        cdf[ret++] += add;
+    } while (c < v);
+
+    int i = ret - 1;
+    cdf[i] -= add;
+    for (; i < n_symbols-1; ++i) {
+        cdf[i] -= cdf[i] >> rate;
+    }
+
+    //cdf[n_symbols] += (cdf[n_symbols] < 32);
+
+    assert(u <= s->rng);
+
+    return od_ec_dec_normalize(s, s->dif - (v << (OD_EC_WINDOW_SIZE - 16)), (unsigned)(u - v), ret) - 1;
+}
+
+/*Returns the number of bits "used" by the decoded symbols so far.
+  This same number can be computed in either the encoder or the decoder, and is
+   suitable for making coding decisions.
+  Return: The number of bits.
+          This will always be slightly larger than the exact value (e.g., all
+           rounding error is in the positive direction).*/
+int od_ec_dec_tell(const od_ec_dec *dec) {
+  /*There is a window of bits stored in dec->dif. The difference
+     (dec->bptr - dec->buf) tells us how many bytes have been read into this
+     window. The difference (dec->cnt - dec->tell_offs) tells us how many of
+     the bits in that window remain unconsumed.*/
+  return (int)((dec->bptr - dec->buf) * 8 - dec->cnt + dec->tell_offs);
+}
+
+/*Returns the number of bits "used" by the decoded symbols so far.
+  This same number can be computed in either the encoder or the decoder, and is
+   suitable for making coding decisions.
+  Return: The number of bits scaled by 2**OD_BITRES.
+          This will always be slightly larger than the exact value (e.g., all
+           rounding error is in the positive direction).*/
+uint32_t od_ec_dec_tell_frac(const od_ec_dec *dec) {
+  return od_ec_tell_frac(od_ec_dec_tell(dec), dec->rng);
+}

diff --git a/libav1/aom_dsp/entdec.h b/libav1/aom_dsp/entdec.h
new file mode 100644
index 0000000..7e56d12
--- /dev/null
+++ b/libav1/aom_dsp/entdec.h

@@ -0,0 +1,85 @@
+/*
+ * Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_ENTDEC_H_
+#define AOM_AOM_DSP_ENTDEC_H_
+#include <limits.h>
+#include "aom_dsp/entcode.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct od_ec_dec od_ec_dec;
+
+#if defined(OD_ACCOUNTING) && OD_ACCOUNTING
+#define OD_ACC_STR , char *acc_str
+#define od_ec_dec_bits(dec, ftb, str) od_ec_dec_bits_(dec, ftb, str)
+#else
+#define OD_ACC_STR
+#define od_ec_dec_bits(dec, ftb, str) od_ec_dec_bits_(dec, ftb)
+#endif
+
+/*The entropy decoder context.*/
+struct od_ec_dec {
+  /*The start of the current input buffer.*/
+  const unsigned char *buf;
+  /*An offset used to keep track of tell after reaching the end of the stream.
+    This is constant throughout most of the decoding process, but becomes
+     important once we hit the end of the buffer and stop incrementing bptr
+     (and instead pretend cnt has lots of bits).*/
+  int32_t tell_offs;
+  /*The end of the current input buffer.*/
+  const unsigned char *end;
+  /*The read pointer for the entropy-coded bits.*/
+  const unsigned char *bptr;
+  /*The difference between the high end of the current range, (low + rng), and
+     the coded value, minus 1.
+    This stores up to OD_EC_WINDOW_SIZE bits of that difference, but the
+     decoder only uses the top 16 bits of the window to decode the next symbol.
+    As we shift up during renormalization, if we don't have enough bits left in
+     the window to fill the top 16, we'll read in more bits of the coded
+     value.*/
+  od_ec_window dif;
+  /*The number of values in the current range.*/
+  uint16_t rng;
+  /*The number of bits of data in the current value.*/
+  int16_t cnt;
+};
+
+/*See entdec.c for further documentation.*/
+
+void od_ec_dec_init(od_ec_dec *dec, const unsigned char *buf, uint32_t storage)
+    OD_ARG_NONNULL(1) OD_ARG_NONNULL(2);
+
+OD_WARN_UNUSED_RESULT int od_ec_decode_bool_q15(od_ec_dec *dec, unsigned f)
+    OD_ARG_NONNULL(1);
+OD_WARN_UNUSED_RESULT int od_ec_decode_cdf_q15(od_ec_dec *dec,
+                                               uint16_t *cdf, int nsyms)
+    OD_ARG_NONNULL(1) OD_ARG_NONNULL(2);
+
+OD_WARN_UNUSED_RESULT int od_ec_decode_cdf_q15_standard(od_ec_dec *dec,
+    uint16_t *cdf, int nsyms)
+    OD_ARG_NONNULL(1) OD_ARG_NONNULL(2);
+
+OD_WARN_UNUSED_RESULT uint32_t od_ec_dec_bits_(od_ec_dec *dec, unsigned ftb)
+    OD_ARG_NONNULL(1);
+
+OD_WARN_UNUSED_RESULT int od_ec_dec_tell(const od_ec_dec *dec)
+    OD_ARG_NONNULL(1);
+OD_WARN_UNUSED_RESULT uint32_t od_ec_dec_tell_frac(const od_ec_dec *dec)
+    OD_ARG_NONNULL(1);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_DSP_ENTDEC_H_

diff --git a/libav1/aom_dsp/fft.c b/libav1/aom_dsp/fft.c
new file mode 100644
index 0000000..0ba71cf
--- /dev/null
+++ b/libav1/aom_dsp/fft.c

@@ -0,0 +1,219 @@
+/*
+ * Copyright (c) 2018, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/fft_common.h"
+
+static INLINE void simple_transpose(const float *A, float *B, int n) {
+  for (int y = 0; y < n; y++) {
+    for (int x = 0; x < n; x++) {
+      B[y * n + x] = A[x * n + y];
+    }
+  }
+}
+
+// The 1d transform is real to complex and packs the complex results in
+// a way to take advantage of conjugate symmetry (e.g., the n/2 + 1 real
+// components, followed by the n/2 - 1 imaginary components). After the
+// transform is done on the rows, the first n/2 + 1 columns are real, and
+// the remaining are the imaginary components. After the transform on the
+// columns, the region of [0, n/2]x[0, n/2] contains the real part of
+// fft of the real columns. The real part of the 2d fft also includes the
+// imaginary part of transformed imaginary columns. This function assembles
+// the correct outputs while putting the real and imaginary components
+// next to each other.
+static INLINE void unpack_2d_output(const float *col_fft, float *output,
+                                    int n) {
+  for (int y = 0; y <= n / 2; ++y) {
+    const int y2 = y + n / 2;
+    const int y_extra = y2 > n / 2 && y2 < n;
+
+    for (int x = 0; x <= n / 2; ++x) {
+      const int x2 = x + n / 2;
+      const int x_extra = x2 > n / 2 && x2 < n;
+      output[2 * (y * n + x)] =
+          col_fft[y * n + x] - (x_extra && y_extra ? col_fft[y2 * n + x2] : 0);
+      output[2 * (y * n + x) + 1] = (y_extra ? col_fft[y2 * n + x] : 0) +
+                                    (x_extra ? col_fft[y * n + x2] : 0);
+      if (y_extra) {
+        output[2 * ((n - y) * n + x)] =
+            col_fft[y * n + x] +
+            (x_extra && y_extra ? col_fft[y2 * n + x2] : 0);
+        output[2 * ((n - y) * n + x) + 1] =
+            -(y_extra ? col_fft[y2 * n + x] : 0) +
+            (x_extra ? col_fft[y * n + x2] : 0);
+      }
+    }
+  }
+}
+
+void aom_fft_2d_gen(const float *input, float *temp, float *output, int n,
+                    aom_fft_1d_func_t tform, aom_fft_transpose_func_t transpose,
+                    aom_fft_unpack_func_t unpack, int vec_size) {
+  for (int x = 0; x < n; x += vec_size) {
+    tform(input + x, output + x, n);
+  }
+  transpose(output, temp, n);
+
+  for (int x = 0; x < n; x += vec_size) {
+    tform(temp + x, output + x, n);
+  }
+  transpose(output, temp, n);
+
+  unpack(temp, output, n);
+}
+
+static INLINE void store_float(float *output, float input) { *output = input; }
+static INLINE float add_float(float a, float b) { return a + b; }
+static INLINE float sub_float(float a, float b) { return a - b; }
+static INLINE float mul_float(float a, float b) { return a * b; }
+
+GEN_FFT_2(void, float, float, float, *, store_float);
+GEN_FFT_4(void, float, float, float, *, store_float, (float), add_float,
+          sub_float);
+GEN_FFT_8(void, float, float, float, *, store_float, (float), add_float,
+          sub_float, mul_float);
+GEN_FFT_16(void, float, float, float, *, store_float, (float), add_float,
+           sub_float, mul_float);
+GEN_FFT_32(void, float, float, float, *, store_float, (float), add_float,
+           sub_float, mul_float);
+
+void aom_fft2x2_float_c(const float *input, float *temp, float *output) {
+  aom_fft_2d_gen(input, temp, output, 2, aom_fft1d_2_float, simple_transpose,
+                 unpack_2d_output, 1);
+}
+
+void aom_fft4x4_float_c(const float *input, float *temp, float *output) {
+  aom_fft_2d_gen(input, temp, output, 4, aom_fft1d_4_float, simple_transpose,
+                 unpack_2d_output, 1);
+}
+
+void aom_fft8x8_float_c(const float *input, float *temp, float *output) {
+  aom_fft_2d_gen(input, temp, output, 8, aom_fft1d_8_float, simple_transpose,
+                 unpack_2d_output, 1);
+}
+
+void aom_fft16x16_float_c(const float *input, float *temp, float *output) {
+  aom_fft_2d_gen(input, temp, output, 16, aom_fft1d_16_float, simple_transpose,
+                 unpack_2d_output, 1);
+}
+
+void aom_fft32x32_float_c(const float *input, float *temp, float *output) {
+  aom_fft_2d_gen(input, temp, output, 32, aom_fft1d_32_float, simple_transpose,
+                 unpack_2d_output, 1);
+}
+
+void aom_ifft_2d_gen(const float *input, float *temp, float *output, int n,
+                     aom_fft_1d_func_t fft_single, aom_fft_1d_func_t fft_multi,
+                     aom_fft_1d_func_t ifft_multi,
+                     aom_fft_transpose_func_t transpose, int vec_size) {
+  // Column 0 and n/2 have conjugate symmetry, so we can directly do the ifft
+  // and get real outputs.
+  for (int y = 0; y <= n / 2; ++y) {
+    output[y * n] = input[2 * y * n];
+    output[y * n + 1] = input[2 * (y * n + n / 2)];
+  }
+  for (int y = n / 2 + 1; y < n; ++y) {
+    output[y * n] = input[2 * (y - n / 2) * n + 1];
+    output[y * n + 1] = input[2 * ((y - n / 2) * n + n / 2) + 1];
+  }
+
+  for (int i = 0; i < 2; i += vec_size) {
+    ifft_multi(output + i, temp + i, n);
+  }
+
+  // For the other columns, since we don't have a full ifft for complex inputs
+  // we have to split them into the real and imaginary counterparts.
+  // Pack the real component, then the imaginary components.
+  for (int y = 0; y < n; ++y) {
+    for (int x = 1; x < n / 2; ++x) {
+      output[y * n + (x + 1)] = input[2 * (y * n + x)];
+    }
+    for (int x = 1; x < n / 2; ++x) {
+      output[y * n + (x + n / 2)] = input[2 * (y * n + x) + 1];
+    }
+  }
+  for (int y = 2; y < vec_size; y++) {
+    fft_single(output + y, temp + y, n);
+  }
+  // This is the part that can be sped up with SIMD
+  for (int y = AOMMAX(2, vec_size); y < n; y += vec_size) {
+    fft_multi(output + y, temp + y, n);
+  }
+
+  // Put the 0 and n/2 th results in the correct place.
+  for (int x = 0; x < n; ++x) {
+    output[x] = temp[x * n];
+    output[(n / 2) * n + x] = temp[x * n + 1];
+  }
+  // This rearranges and transposes.
+  for (int y = 1; y < n / 2; ++y) {
+    // Fill in the real columns
+    for (int x = 0; x <= n / 2; ++x) {
+      output[x + y * n] =
+          temp[(y + 1) + x * n] +
+          ((x > 0 && x < n / 2) ? temp[(y + n / 2) + (x + n / 2) * n] : 0);
+    }
+    for (int x = n / 2 + 1; x < n; ++x) {
+      output[x + y * n] = temp[(y + 1) + (n - x) * n] -
+                          temp[(y + n / 2) + ((n - x) + n / 2) * n];
+    }
+    // Fill in the imag columns
+    for (int x = 0; x <= n / 2; ++x) {
+      output[x + (y + n / 2) * n] =
+          temp[(y + n / 2) + x * n] -
+          ((x > 0 && x < n / 2) ? temp[(y + 1) + (x + n / 2) * n] : 0);
+    }
+    for (int x = n / 2 + 1; x < n; ++x) {
+      output[x + (y + n / 2) * n] = temp[(y + 1) + ((n - x) + n / 2) * n] +
+                                    temp[(y + n / 2) + (n - x) * n];
+    }
+  }
+  for (int y = 0; y < n; y += vec_size) {
+    ifft_multi(output + y, temp + y, n);
+  }
+  transpose(temp, output, n);
+}
+
+GEN_IFFT_2(void, float, float, float, *, store_float);
+GEN_IFFT_4(void, float, float, float, *, store_float, (float), add_float,
+           sub_float);
+GEN_IFFT_8(void, float, float, float, *, store_float, (float), add_float,
+           sub_float, mul_float);
+GEN_IFFT_16(void, float, float, float, *, store_float, (float), add_float,
+            sub_float, mul_float);
+GEN_IFFT_32(void, float, float, float, *, store_float, (float), add_float,
+            sub_float, mul_float);
+
+void aom_ifft2x2_float_c(const float *input, float *temp, float *output) {
+  aom_ifft_2d_gen(input, temp, output, 2, aom_fft1d_2_float, aom_fft1d_2_float,
+                  aom_ifft1d_2_float, simple_transpose, 1);
+}
+
+void aom_ifft4x4_float_c(const float *input, float *temp, float *output) {
+  aom_ifft_2d_gen(input, temp, output, 4, aom_fft1d_4_float, aom_fft1d_4_float,
+                  aom_ifft1d_4_float, simple_transpose, 1);
+}
+
+void aom_ifft8x8_float_c(const float *input, float *temp, float *output) {
+  aom_ifft_2d_gen(input, temp, output, 8, aom_fft1d_8_float, aom_fft1d_8_float,
+                  aom_ifft1d_8_float, simple_transpose, 1);
+}
+
+void aom_ifft16x16_float_c(const float *input, float *temp, float *output) {
+  aom_ifft_2d_gen(input, temp, output, 16, aom_fft1d_16_float,
+                  aom_fft1d_16_float, aom_ifft1d_16_float, simple_transpose, 1);
+}
+
+void aom_ifft32x32_float_c(const float *input, float *temp, float *output) {
+  aom_ifft_2d_gen(input, temp, output, 32, aom_fft1d_32_float,
+                  aom_fft1d_32_float, aom_ifft1d_32_float, simple_transpose, 1);
+}

diff --git a/libav1/aom_dsp/fft_common.h b/libav1/aom_dsp/fft_common.h
new file mode 100644
index 0000000..5137331
--- /dev/null
+++ b/libav1/aom_dsp/fft_common.h

@@ -0,0 +1,1050 @@
+/*
+ * Copyright (c) 2018, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_FFT_COMMON_H_
+#define AOM_AOM_DSP_FFT_COMMON_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*!\brief A function pointer for computing 1d fft and ifft.
+ *
+ * The function will point to an implementation for a specific transform size,
+ * and may perform the transforms using vectorized instructions.
+ *
+ * For a non-vectorized forward transforms of size n, the input and output
+ * buffers will be size n. The output takes advantage of conjugate symmetry and
+ * packs the results as: [r_0, r_1, ..., r_{n/2}, i_1, ..., i_{n/2-1}], where
+ * (r_{j}, i_{j}) is the complex output for index j.
+ *
+ * An inverse transform will assume that the complex "input" is packed
+ * similarly. Its output will be real.
+ *
+ * Non-vectorized transforms (e.g., on a single row) would use a stride = 1.
+ *
+ * Vectorized implementations are parallelized along the columns so that the fft
+ * can be performed on multiple columns at a time. In such cases the data block
+ * for input and output is typically square (n x n) and the stride will
+ * correspond to the spacing between rows. At minimum, the input size must be
+ * n x simd_vector_length.
+ *
+ * \param[in]  input   Input buffer. See above for size restrictions.
+ * \param[out] output  Output buffer. See above for size restrictions.
+ * \param[in]  stride  The spacing in number of elements between rows
+ *                     (or elements)
+ */
+typedef void (*aom_fft_1d_func_t)(const float *input, float *output,
+                                  int stride);
+
+// Declare some of the forward non-vectorized transforms which are used in some
+// of the vectorized implementations
+void aom_fft1d_4_float(const float *input, float *output, int stride);
+void aom_fft1d_8_float(const float *input, float *output, int stride);
+void aom_fft1d_16_float(const float *input, float *output, int stride);
+void aom_fft1d_32_float(const float *input, float *output, int stride);
+
+/**\!brief Function pointer for transposing a matrix of floats.
+ *
+ * \param[in]  input  Input buffer (size n x n)
+ * \param[out] output Output buffer (size n x n)
+ * \param[in]  n      Extent of one dimension of the square matrix.
+ */
+typedef void (*aom_fft_transpose_func_t)(const float *input, float *output,
+                                         int n);
+
+/**\!brief Function pointer for re-arranging intermediate 2d transform results.
+ *
+ * After re-arrangement, the real and imaginary components will be packed
+ * tightly next to each other.
+ *
+ * \param[in]  input  Input buffer (size n x n)
+ * \param[out] output Output buffer (size 2 x n x n)
+ * \param[in]  n      Extent of one dimension of the square matrix.
+ */
+typedef void (*aom_fft_unpack_func_t)(const float *input, float *output, int n);
+
+/*!\brief Performs a 2d fft with the given functions.
+ *
+ * This generator function allows for multiple different implementations of 2d
+ * fft with different vector operations, without having to redefine the main
+ * body multiple times.
+ *
+ * \param[in]  input     Input buffer to run the transform on (size n x n)
+ * \param[out] temp      Working buffer for computing the transform (size n x n)
+ * \param[out] output    Output buffer (size 2 x n x n)
+ * \param[in]  tform     Forward transform function
+ * \param[in]  transpose Transpose function (for n x n matrix)
+ * \param[in]  unpack    Unpack function used to massage outputs to correct form
+ * \param[in]  vec_size  Vector size (the transform is done vec_size units at
+ *                       a time)
+ */
+void aom_fft_2d_gen(const float *input, float *temp, float *output, int n,
+                    aom_fft_1d_func_t tform, aom_fft_transpose_func_t transpose,
+                    aom_fft_unpack_func_t unpack, int vec_size);
+
+/*!\brief Perform a 2d inverse fft with the given helper functions
+ *
+ * \param[in]  input      Input buffer to run the transform on (size 2 x n x n)
+ * \param[out] temp       Working buffer for computations (size 2 x n x n)
+ * \param[out] output     Output buffer (size n x n)
+ * \param[in]  fft_single Forward transform function (non vectorized)
+ * \param[in]  fft_multi  Forward transform function (vectorized)
+ * \param[in]  ifft_multi Inverse transform function (vectorized)
+ * \param[in]  transpose  Transpose function (for n x n matrix)
+ * \param[in]  vec_size   Vector size (the transform is done vec_size
+ *                        units at a time)
+ */
+void aom_ifft_2d_gen(const float *input, float *temp, float *output, int n,
+                     aom_fft_1d_func_t fft_single, aom_fft_1d_func_t fft_multi,
+                     aom_fft_1d_func_t ifft_multi,
+                     aom_fft_transpose_func_t transpose, int vec_size);
+#ifdef __cplusplus
+}
+#endif
+
+// The macros below define 1D fft/ifft for different data types and for
+// different simd vector intrinsic types.
+
+#define GEN_FFT_2(ret, suffix, T, T_VEC, load, store)               \
+  ret aom_fft1d_2_##suffix(const T *input, T *output, int stride) { \
+    const T_VEC i0 = load(input + 0 * stride);                      \
+    const T_VEC i1 = load(input + 1 * stride);                      \
+    store(output + 0 * stride, i0 + i1);                            \
+    store(output + 1 * stride, i0 - i1);                            \
+  }
+
+#define GEN_FFT_4(ret, suffix, T, T_VEC, load, store, constant, add, sub) \
+  ret aom_fft1d_4_##suffix(const T *input, T *output, int stride) {       \
+    const T_VEC kWeight0 = constant(0.0f);                                \
+    const T_VEC i0 = load(input + 0 * stride);                            \
+    const T_VEC i1 = load(input + 1 * stride);                            \
+    const T_VEC i2 = load(input + 2 * stride);                            \
+    const T_VEC i3 = load(input + 3 * stride);                            \
+    const T_VEC w0 = add(i0, i2);                                         \
+    const T_VEC w1 = sub(i0, i2);                                         \
+    const T_VEC w2 = add(i1, i3);                                         \
+    const T_VEC w3 = sub(i1, i3);                                         \
+    store(output + 0 * stride, add(w0, w2));                              \
+    store(output + 1 * stride, w1);                                       \
+    store(output + 2 * stride, sub(w0, w2));                              \
+    store(output + 3 * stride, sub(kWeight0, w3));                        \
+  }
+
+#define GEN_FFT_8(ret, suffix, T, T_VEC, load, store, constant, add, sub, mul) \
+  ret aom_fft1d_8_##suffix(const T *input, T *output, int stride) {            \
+    const T_VEC kWeight0 = constant(0.0f);                                     \
+    const T_VEC kWeight2 = constant(0.707107f);                                \
+    const T_VEC i0 = load(input + 0 * stride);                                 \
+    const T_VEC i1 = load(input + 1 * stride);                                 \
+    const T_VEC i2 = load(input + 2 * stride);                                 \
+    const T_VEC i3 = load(input + 3 * stride);                                 \
+    const T_VEC i4 = load(input + 4 * stride);                                 \
+    const T_VEC i5 = load(input + 5 * stride);                                 \
+    const T_VEC i6 = load(input + 6 * stride);                                 \
+    const T_VEC i7 = load(input + 7 * stride);                                 \
+    const T_VEC w0 = add(i0, i4);                                              \
+    const T_VEC w1 = sub(i0, i4);                                              \
+    const T_VEC w2 = add(i2, i6);                                              \
+    const T_VEC w3 = sub(i2, i6);                                              \
+    const T_VEC w4 = add(w0, w2);                                              \
+    const T_VEC w5 = sub(w0, w2);                                              \
+    const T_VEC w7 = add(i1, i5);                                              \
+    const T_VEC w8 = sub(i1, i5);                                              \
+    const T_VEC w9 = add(i3, i7);                                              \
+    const T_VEC w10 = sub(i3, i7);                                             \
+    const T_VEC w11 = add(w7, w9);                                             \
+    const T_VEC w12 = sub(w7, w9);                                             \
+    store(output + 0 * stride, add(w4, w11));                                  \
+    store(output + 1 * stride, add(w1, mul(kWeight2, sub(w8, w10))));          \
+    store(output + 2 * stride, w5);                                            \
+    store(output + 3 * stride, sub(w1, mul(kWeight2, sub(w8, w10))));          \
+    store(output + 4 * stride, sub(w4, w11));                                  \
+    store(output + 5 * stride,                                                 \
+          sub(sub(kWeight0, w3), mul(kWeight2, add(w10, w8))));                \
+    store(output + 6 * stride, sub(kWeight0, w12));                            \
+    store(output + 7 * stride, sub(w3, mul(kWeight2, add(w10, w8))));          \
+  }
+
+#define GEN_FFT_16(ret, suffix, T, T_VEC, load, store, constant, add, sub, \
+                   mul)                                                    \
+  ret aom_fft1d_16_##suffix(const T *input, T *output, int stride) {       \
+    const T_VEC kWeight0 = constant(0.0f);                                 \
+    const T_VEC kWeight2 = constant(0.707107f);                            \
+    const T_VEC kWeight3 = constant(0.92388f);                             \
+    const T_VEC kWeight4 = constant(0.382683f);                            \
+    const T_VEC i0 = load(input + 0 * stride);                             \
+    const T_VEC i1 = load(input + 1 * stride);                             \
+    const T_VEC i2 = load(input + 2 * stride);                             \
+    const T_VEC i3 = load(input + 3 * stride);                             \
+    const T_VEC i4 = load(input + 4 * stride);                             \
+    const T_VEC i5 = load(input + 5 * stride);                             \
+    const T_VEC i6 = load(input + 6 * stride);                             \
+    const T_VEC i7 = load(input + 7 * stride);                             \
+    const T_VEC i8 = load(input + 8 * stride);                             \
+    const T_VEC i9 = load(input + 9 * stride);                             \
+    const T_VEC i10 = load(input + 10 * stride);                           \
+    const T_VEC i11 = load(input + 11 * stride);                           \
+    const T_VEC i12 = load(input + 12 * stride);                           \
+    const T_VEC i13 = load(input + 13 * stride);                           \
+    const T_VEC i14 = load(input + 14 * stride);                           \
+    const T_VEC i15 = load(input + 15 * stride);                           \
+    const T_VEC w0 = add(i0, i8);                                          \
+    const T_VEC w1 = sub(i0, i8);                                          \
+    const T_VEC w2 = add(i4, i12);                                         \
+    const T_VEC w3 = sub(i4, i12);                                         \
+    const T_VEC w4 = add(w0, w2);                                          \
+    const T_VEC w5 = sub(w0, w2);                                          \
+    const T_VEC w7 = add(i2, i10);                                         \
+    const T_VEC w8 = sub(i2, i10);                                         \
+    const T_VEC w9 = add(i6, i14);                                         \
+    const T_VEC w10 = sub(i6, i14);                                        \
+    const T_VEC w11 = add(w7, w9);                                         \
+    const T_VEC w12 = sub(w7, w9);                                         \
+    const T_VEC w14 = add(w4, w11);                                        \
+    const T_VEC w15 = sub(w4, w11);                                        \
+    const T_VEC w16[2] = { add(w1, mul(kWeight2, sub(w8, w10))),           \
+                           sub(sub(kWeight0, w3),                          \
+                               mul(kWeight2, add(w10, w8))) };             \
+    const T_VEC w18[2] = { sub(w1, mul(kWeight2, sub(w8, w10))),           \
+                           sub(w3, mul(kWeight2, add(w10, w8))) };         \
+    const T_VEC w19 = add(i1, i9);                                         \
+    const T_VEC w20 = sub(i1, i9);                                         \
+    const T_VEC w21 = add(i5, i13);                                        \
+    const T_VEC w22 = sub(i5, i13);                                        \
+    const T_VEC w23 = add(w19, w21);                                       \
+    const T_VEC w24 = sub(w19, w21);                                       \
+    const T_VEC w26 = add(i3, i11);                                        \
+    const T_VEC w27 = sub(i3, i11);                                        \
+    const T_VEC w28 = add(i7, i15);                                        \
+    const T_VEC w29 = sub(i7, i15);                                        \
+    const T_VEC w30 = add(w26, w28);                                       \
+    const T_VEC w31 = sub(w26, w28);                                       \
+    const T_VEC w33 = add(w23, w30);                                       \
+    const T_VEC w34 = sub(w23, w30);                                       \
+    const T_VEC w35[2] = { add(w20, mul(kWeight2, sub(w27, w29))),         \
+                           sub(sub(kWeight0, w22),                         \
+                               mul(kWeight2, add(w29, w27))) };            \
+    const T_VEC w37[2] = { sub(w20, mul(kWeight2, sub(w27, w29))),         \
+                           sub(w22, mul(kWeight2, add(w29, w27))) };       \
+    store(output + 0 * stride, add(w14, w33));                             \
+    store(output + 1 * stride,                                             \
+          add(w16[0], add(mul(kWeight3, w35[0]), mul(kWeight4, w35[1])))); \
+    store(output + 2 * stride, add(w5, mul(kWeight2, sub(w24, w31))));     \
+    store(output + 3 * stride,                                             \
+          add(w18[0], add(mul(kWeight4, w37[0]), mul(kWeight3, w37[1])))); \
+    store(output + 4 * stride, w15);                                       \
+    store(output + 5 * stride,                                             \
+          add(w18[0], sub(sub(kWeight0, mul(kWeight4, w37[0])),            \
+                          mul(kWeight3, w37[1]))));                        \
+    store(output + 6 * stride, sub(w5, mul(kWeight2, sub(w24, w31))));     \
+    store(output + 7 * stride,                                             \
+          add(w16[0], sub(sub(kWeight0, mul(kWeight3, w35[0])),            \
+                          mul(kWeight4, w35[1]))));                        \
+    store(output + 8 * stride, sub(w14, w33));                             \
+    store(output + 9 * stride,                                             \
+          add(w16[1], sub(mul(kWeight3, w35[1]), mul(kWeight4, w35[0])))); \
+    store(output + 10 * stride,                                            \
+          sub(sub(kWeight0, w12), mul(kWeight2, add(w31, w24))));          \
+    store(output + 11 * stride,                                            \
+          add(w18[1], sub(mul(kWeight4, w37[1]), mul(kWeight3, w37[0])))); \
+    store(output + 12 * stride, sub(kWeight0, w34));                       \
+    store(output + 13 * stride,                                            \
+          sub(sub(kWeight0, w18[1]),                                       \
+              sub(mul(kWeight3, w37[0]), mul(kWeight4, w37[1]))));         \
+    store(output + 14 * stride, sub(w12, mul(kWeight2, add(w31, w24))));   \
+    store(output + 15 * stride,                                            \
+          sub(sub(kWeight0, w16[1]),                                       \
+              sub(mul(kWeight4, w35[0]), mul(kWeight3, w35[1]))));         \
+  }
+
+#define GEN_FFT_32(ret, suffix, T, T_VEC, load, store, constant, add, sub,   \
+                   mul)                                                      \
+  ret aom_fft1d_32_##suffix(const T *input, T *output, int stride) {         \
+    const T_VEC kWeight0 = constant(0.0f);                                   \
+    const T_VEC kWeight2 = constant(0.707107f);                              \
+    const T_VEC kWeight3 = constant(0.92388f);                               \
+    const T_VEC kWeight4 = constant(0.382683f);                              \
+    const T_VEC kWeight5 = constant(0.980785f);                              \
+    const T_VEC kWeight6 = constant(0.19509f);                               \
+    const T_VEC kWeight7 = constant(0.83147f);                               \
+    const T_VEC kWeight8 = constant(0.55557f);                               \
+    const T_VEC i0 = load(input + 0 * stride);                               \
+    const T_VEC i1 = load(input + 1 * stride);                               \
+    const T_VEC i2 = load(input + 2 * stride);                               \
+    const T_VEC i3 = load(input + 3 * stride);                               \
+    const T_VEC i4 = load(input + 4 * stride);                               \
+    const T_VEC i5 = load(input + 5 * stride);                               \
+    const T_VEC i6 = load(input + 6 * stride);                               \
+    const T_VEC i7 = load(input + 7 * stride);                               \
+    const T_VEC i8 = load(input + 8 * stride);                               \
+    const T_VEC i9 = load(input + 9 * stride);                               \
+    const T_VEC i10 = load(input + 10 * stride);                             \
+    const T_VEC i11 = load(input + 11 * stride);                             \
+    const T_VEC i12 = load(input + 12 * stride);                             \
+    const T_VEC i13 = load(input + 13 * stride);                             \
+    const T_VEC i14 = load(input + 14 * stride);                             \
+    const T_VEC i15 = load(input + 15 * stride);                             \
+    const T_VEC i16 = load(input + 16 * stride);                             \
+    const T_VEC i17 = load(input + 17 * stride);                             \
+    const T_VEC i18 = load(input + 18 * stride);                             \
+    const T_VEC i19 = load(input + 19 * stride);                             \
+    const T_VEC i20 = load(input + 20 * stride);                             \
+    const T_VEC i21 = load(input + 21 * stride);                             \
+    const T_VEC i22 = load(input + 22 * stride);                             \
+    const T_VEC i23 = load(input + 23 * stride);                             \
+    const T_VEC i24 = load(input + 24 * stride);                             \
+    const T_VEC i25 = load(input + 25 * stride);                             \
+    const T_VEC i26 = load(input + 26 * stride);                             \
+    const T_VEC i27 = load(input + 27 * stride);                             \
+    const T_VEC i28 = load(input + 28 * stride);                             \
+    const T_VEC i29 = load(input + 29 * stride);                             \
+    const T_VEC i30 = load(input + 30 * stride);                             \
+    const T_VEC i31 = load(input + 31 * stride);                             \
+    const T_VEC w0 = add(i0, i16);                                           \
+    const T_VEC w1 = sub(i0, i16);                                           \
+    const T_VEC w2 = add(i8, i24);                                           \
+    const T_VEC w3 = sub(i8, i24);                                           \
+    const T_VEC w4 = add(w0, w2);                                            \
+    const T_VEC w5 = sub(w0, w2);                                            \
+    const T_VEC w7 = add(i4, i20);                                           \
+    const T_VEC w8 = sub(i4, i20);                                           \
+    const T_VEC w9 = add(i12, i28);                                          \
+    const T_VEC w10 = sub(i12, i28);                                         \
+    const T_VEC w11 = add(w7, w9);                                           \
+    const T_VEC w12 = sub(w7, w9);                                           \
+    const T_VEC w14 = add(w4, w11);                                          \
+    const T_VEC w15 = sub(w4, w11);                                          \
+    const T_VEC w16[2] = { add(w1, mul(kWeight2, sub(w8, w10))),             \
+                           sub(sub(kWeight0, w3),                            \
+                               mul(kWeight2, add(w10, w8))) };               \
+    const T_VEC w18[2] = { sub(w1, mul(kWeight2, sub(w8, w10))),             \
+                           sub(w3, mul(kWeight2, add(w10, w8))) };           \
+    const T_VEC w19 = add(i2, i18);                                          \
+    const T_VEC w20 = sub(i2, i18);                                          \
+    const T_VEC w21 = add(i10, i26);                                         \
+    const T_VEC w22 = sub(i10, i26);                                         \
+    const T_VEC w23 = add(w19, w21);                                         \
+    const T_VEC w24 = sub(w19, w21);                                         \
+    const T_VEC w26 = add(i6, i22);                                          \
+    const T_VEC w27 = sub(i6, i22);                                          \
+    const T_VEC w28 = add(i14, i30);                                         \
+    const T_VEC w29 = sub(i14, i30);                                         \
+    const T_VEC w30 = add(w26, w28);                                         \
+    const T_VEC w31 = sub(w26, w28);                                         \
+    const T_VEC w33 = add(w23, w30);                                         \
+    const T_VEC w34 = sub(w23, w30);                                         \
+    const T_VEC w35[2] = { add(w20, mul(kWeight2, sub(w27, w29))),           \
+                           sub(sub(kWeight0, w22),                           \
+                               mul(kWeight2, add(w29, w27))) };              \
+    const T_VEC w37[2] = { sub(w20, mul(kWeight2, sub(w27, w29))),           \
+                           sub(w22, mul(kWeight2, add(w29, w27))) };         \
+    const T_VEC w38 = add(w14, w33);                                         \
+    const T_VEC w39 = sub(w14, w33);                                         \
+    const T_VEC w40[2] = {                                                   \
+      add(w16[0], add(mul(kWeight3, w35[0]), mul(kWeight4, w35[1]))),        \
+      add(w16[1], sub(mul(kWeight3, w35[1]), mul(kWeight4, w35[0])))         \
+    };                                                                       \
+    const T_VEC w41[2] = { add(w5, mul(kWeight2, sub(w24, w31))),            \
+                           sub(sub(kWeight0, w12),                           \
+                               mul(kWeight2, add(w31, w24))) };              \
+    const T_VEC w42[2] = {                                                   \
+      add(w18[0], add(mul(kWeight4, w37[0]), mul(kWeight3, w37[1]))),        \
+      add(w18[1], sub(mul(kWeight4, w37[1]), mul(kWeight3, w37[0])))         \
+    };                                                                       \
+    const T_VEC w44[2] = {                                                   \
+      add(w18[0],                                                            \
+          sub(sub(kWeight0, mul(kWeight4, w37[0])), mul(kWeight3, w37[1]))), \
+      sub(sub(kWeight0, w18[1]),                                             \
+          sub(mul(kWeight3, w37[0]), mul(kWeight4, w37[1])))                 \
+    };                                                                       \
+    const T_VEC w45[2] = { sub(w5, mul(kWeight2, sub(w24, w31))),            \
+                           sub(w12, mul(kWeight2, add(w31, w24))) };         \
+    const T_VEC w46[2] = {                                                   \
+      add(w16[0],                                                            \
+          sub(sub(kWeight0, mul(kWeight3, w35[0])), mul(kWeight4, w35[1]))), \
+      sub(sub(kWeight0, w16[1]),                                             \
+          sub(mul(kWeight4, w35[0]), mul(kWeight3, w35[1])))                 \
+    };                                                                       \
+    const T_VEC w47 = add(i1, i17);                                          \
+    const T_VEC w48 = sub(i1, i17);                                          \
+    const T_VEC w49 = add(i9, i25);                                          \
+    const T_VEC w50 = sub(i9, i25);                                          \
+    const T_VEC w51 = add(w47, w49);                                         \
+    const T_VEC w52 = sub(w47, w49);                                         \
+    const T_VEC w54 = add(i5, i21);                                          \
+    const T_VEC w55 = sub(i5, i21);                                          \
+    const T_VEC w56 = add(i13, i29);                                         \
+    const T_VEC w57 = sub(i13, i29);                                         \
+    const T_VEC w58 = add(w54, w56);                                         \
+    const T_VEC w59 = sub(w54, w56);                                         \
+    const T_VEC w61 = add(w51, w58);                                         \
+    const T_VEC w62 = sub(w51, w58);                                         \
+    const T_VEC w63[2] = { add(w48, mul(kWeight2, sub(w55, w57))),           \
+                           sub(sub(kWeight0, w50),                           \
+                               mul(kWeight2, add(w57, w55))) };              \
+    const T_VEC w65[2] = { sub(w48, mul(kWeight2, sub(w55, w57))),           \
+                           sub(w50, mul(kWeight2, add(w57, w55))) };         \
+    const T_VEC w66 = add(i3, i19);                                          \
+    const T_VEC w67 = sub(i3, i19);                                          \
+    const T_VEC w68 = add(i11, i27);                                         \
+    const T_VEC w69 = sub(i11, i27);                                         \
+    const T_VEC w70 = add(w66, w68);                                         \
+    const T_VEC w71 = sub(w66, w68);                                         \
+    const T_VEC w73 = add(i7, i23);                                          \
+    const T_VEC w74 = sub(i7, i23);                                          \
+    const T_VEC w75 = add(i15, i31);                                         \
+    const T_VEC w76 = sub(i15, i31);                                         \
+    const T_VEC w77 = add(w73, w75);                                         \
+    const T_VEC w78 = sub(w73, w75);                                         \
+    const T_VEC w80 = add(w70, w77);                                         \
+    const T_VEC w81 = sub(w70, w77);                                         \
+    const T_VEC w82[2] = { add(w67, mul(kWeight2, sub(w74, w76))),           \
+                           sub(sub(kWeight0, w69),                           \
+                               mul(kWeight2, add(w76, w74))) };              \
+    const T_VEC w84[2] = { sub(w67, mul(kWeight2, sub(w74, w76))),           \
+                           sub(w69, mul(kWeight2, add(w76, w74))) };         \
+    const T_VEC w85 = add(w61, w80);                                         \
+    const T_VEC w86 = sub(w61, w80);                                         \
+    const T_VEC w87[2] = {                                                   \
+      add(w63[0], add(mul(kWeight3, w82[0]), mul(kWeight4, w82[1]))),        \
+      add(w63[1], sub(mul(kWeight3, w82[1]), mul(kWeight4, w82[0])))         \
+    };                                                                       \
+    const T_VEC w88[2] = { add(w52, mul(kWeight2, sub(w71, w78))),           \
+                           sub(sub(kWeight0, w59),                           \
+                               mul(kWeight2, add(w78, w71))) };              \
+    const T_VEC w89[2] = {                                                   \
+      add(w65[0], add(mul(kWeight4, w84[0]), mul(kWeight3, w84[1]))),        \
+      add(w65[1], sub(mul(kWeight4, w84[1]), mul(kWeight3, w84[0])))         \
+    };                                                                       \
+    const T_VEC w91[2] = {                                                   \
+      add(w65[0],                                                            \
+          sub(sub(kWeight0, mul(kWeight4, w84[0])), mul(kWeight3, w84[1]))), \
+      sub(sub(kWeight0, w65[1]),                                             \
+          sub(mul(kWeight3, w84[0]), mul(kWeight4, w84[1])))                 \
+    };                                                                       \
+    const T_VEC w92[2] = { sub(w52, mul(kWeight2, sub(w71, w78))),           \
+                           sub(w59, mul(kWeight2, add(w78, w71))) };         \
+    const T_VEC w93[2] = {                                                   \
+      add(w63[0],                                                            \
+          sub(sub(kWeight0, mul(kWeight3, w82[0])), mul(kWeight4, w82[1]))), \
+      sub(sub(kWeight0, w63[1]),                                             \
+          sub(mul(kWeight4, w82[0]), mul(kWeight3, w82[1])))                 \
+    };                                                                       \
+    store(output + 0 * stride, add(w38, w85));                               \
+    store(output + 1 * stride,                                               \
+          add(w40[0], add(mul(kWeight5, w87[0]), mul(kWeight6, w87[1]))));   \
+    store(output + 2 * stride,                                               \
+          add(w41[0], add(mul(kWeight3, w88[0]), mul(kWeight4, w88[1]))));   \
+    store(output + 3 * stride,                                               \
+          add(w42[0], add(mul(kWeight7, w89[0]), mul(kWeight8, w89[1]))));   \
+    store(output + 4 * stride, add(w15, mul(kWeight2, sub(w62, w81))));      \
+    store(output + 5 * stride,                                               \
+          add(w44[0], add(mul(kWeight8, w91[0]), mul(kWeight7, w91[1]))));   \
+    store(output + 6 * stride,                                               \
+          add(w45[0], add(mul(kWeight4, w92[0]), mul(kWeight3, w92[1]))));   \
+    store(output + 7 * stride,                                               \
+          add(w46[0], add(mul(kWeight6, w93[0]), mul(kWeight5, w93[1]))));   \
+    store(output + 8 * stride, w39);                                         \
+    store(output + 9 * stride,                                               \
+          add(w46[0], sub(sub(kWeight0, mul(kWeight6, w93[0])),              \
+                          mul(kWeight5, w93[1]))));                          \
+    store(output + 10 * stride,                                              \
+          add(w45[0], sub(sub(kWeight0, mul(kWeight4, w92[0])),              \
+                          mul(kWeight3, w92[1]))));                          \
+    store(output + 11 * stride,                                              \
+          add(w44[0], sub(sub(kWeight0, mul(kWeight8, w91[0])),              \
+                          mul(kWeight7, w91[1]))));                          \
+    store(output + 12 * stride, sub(w15, mul(kWeight2, sub(w62, w81))));     \
+    store(output + 13 * stride,                                              \
+          add(w42[0], sub(sub(kWeight0, mul(kWeight7, w89[0])),              \
+                          mul(kWeight8, w89[1]))));                          \
+    store(output + 14 * stride,                                              \
+          add(w41[0], sub(sub(kWeight0, mul(kWeight3, w88[0])),              \
+                          mul(kWeight4, w88[1]))));                          \
+    store(output + 15 * stride,                                              \
+          add(w40[0], sub(sub(kWeight0, mul(kWeight5, w87[0])),              \
+                          mul(kWeight6, w87[1]))));                          \
+    store(output + 16 * stride, sub(w38, w85));                              \
+    store(output + 17 * stride,                                              \
+          add(w40[1], sub(mul(kWeight5, w87[1]), mul(kWeight6, w87[0]))));   \
+    store(output + 18 * stride,                                              \
+          add(w41[1], sub(mul(kWeight3, w88[1]), mul(kWeight4, w88[0]))));   \
+    store(output + 19 * stride,                                              \
+          add(w42[1], sub(mul(kWeight7, w89[1]), mul(kWeight8, w89[0]))));   \
+    store(output + 20 * stride,                                              \
+          sub(sub(kWeight0, w34), mul(kWeight2, add(w81, w62))));            \
+    store(output + 21 * stride,                                              \
+          add(w44[1], sub(mul(kWeight8, w91[1]), mul(kWeight7, w91[0]))));   \
+    store(output + 22 * stride,                                              \
+          add(w45[1], sub(mul(kWeight4, w92[1]), mul(kWeight3, w92[0]))));   \
+    store(output + 23 * stride,                                              \
+          add(w46[1], sub(mul(kWeight6, w93[1]), mul(kWeight5, w93[0]))));   \
+    store(output + 24 * stride, sub(kWeight0, w86));                         \
+    store(output + 25 * stride,                                              \
+          sub(sub(kWeight0, w46[1]),                                         \
+              sub(mul(kWeight5, w93[0]), mul(kWeight6, w93[1]))));           \
+    store(output + 26 * stride,                                              \
+          sub(sub(kWeight0, w45[1]),                                         \
+              sub(mul(kWeight3, w92[0]), mul(kWeight4, w92[1]))));           \
+    store(output + 27 * stride,                                              \
+          sub(sub(kWeight0, w44[1]),                                         \
+              sub(mul(kWeight7, w91[0]), mul(kWeight8, w91[1]))));           \
+    store(output + 28 * stride, sub(w34, mul(kWeight2, add(w81, w62))));     \
+    store(output + 29 * stride,                                              \
+          sub(sub(kWeight0, w42[1]),                                         \
+              sub(mul(kWeight8, w89[0]), mul(kWeight7, w89[1]))));           \
+    store(output + 30 * stride,                                              \
+          sub(sub(kWeight0, w41[1]),                                         \
+              sub(mul(kWeight4, w88[0]), mul(kWeight3, w88[1]))));           \
+    store(output + 31 * stride,                                              \
+          sub(sub(kWeight0, w40[1]),                                         \
+              sub(mul(kWeight6, w87[0]), mul(kWeight5, w87[1]))));           \
+  }
+
+#define GEN_IFFT_2(ret, suffix, T, T_VEC, load, store)               \
+  ret aom_ifft1d_2_##suffix(const T *input, T *output, int stride) { \
+    const T_VEC i0 = load(input + 0 * stride);                       \
+    const T_VEC i1 = load(input + 1 * stride);                       \
+    store(output + 0 * stride, i0 + i1);                             \
+    store(output + 1 * stride, i0 - i1);                             \
+  }
+
+#define GEN_IFFT_4(ret, suffix, T, T_VEC, load, store, constant, add, sub) \
+  ret aom_ifft1d_4_##suffix(const T *input, T *output, int stride) {       \
+    const T_VEC kWeight0 = constant(0.0f);                                 \
+    const T_VEC i0 = load(input + 0 * stride);                             \
+    const T_VEC i1 = load(input + 1 * stride);                             \
+    const T_VEC i2 = load(input + 2 * stride);                             \
+    const T_VEC i3 = load(input + 3 * stride);                             \
+    const T_VEC w2 = add(i0, i2);                                          \
+    const T_VEC w3 = sub(i0, i2);                                          \
+    const T_VEC w4[2] = { add(i1, i1), sub(i3, i3) };                      \
+    const T_VEC w5[2] = { sub(i1, i1), sub(sub(kWeight0, i3), i3) };       \
+    store(output + 0 * stride, add(w2, w4[0]));                            \
+    store(output + 1 * stride, add(w3, w5[1]));                            \
+    store(output + 2 * stride, sub(w2, w4[0]));                            \
+    store(output + 3 * stride, sub(w3, w5[1]));                            \
+  }
+
+#define GEN_IFFT_8(ret, suffix, T, T_VEC, load, store, constant, add, sub, \
+                   mul)                                                    \
+  ret aom_ifft1d_8_##suffix(const T *input, T *output, int stride) {       \
+    const T_VEC kWeight0 = constant(0.0f);                                 \
+    const T_VEC kWeight2 = constant(0.707107f);                            \
+    const T_VEC i0 = load(input + 0 * stride);                             \
+    const T_VEC i1 = load(input + 1 * stride);                             \
+    const T_VEC i2 = load(input + 2 * stride);                             \
+    const T_VEC i3 = load(input + 3 * stride);                             \
+    const T_VEC i4 = load(input + 4 * stride);                             \
+    const T_VEC i5 = load(input + 5 * stride);                             \
+    const T_VEC i6 = load(input + 6 * stride);                             \
+    const T_VEC i7 = load(input + 7 * stride);                             \
+    const T_VEC w6 = add(i0, i4);                                          \
+    const T_VEC w7 = sub(i0, i4);                                          \
+    const T_VEC w8[2] = { add(i2, i2), sub(i6, i6) };                      \
+    const T_VEC w9[2] = { sub(i2, i2), sub(sub(kWeight0, i6), i6) };       \
+    const T_VEC w10[2] = { add(w6, w8[0]), w8[1] };                        \
+    const T_VEC w11[2] = { sub(w6, w8[0]), sub(kWeight0, w8[1]) };         \
+    const T_VEC w12[2] = { add(w7, w9[1]), sub(kWeight0, w9[0]) };         \
+    const T_VEC w13[2] = { sub(w7, w9[1]), w9[0] };                        \
+    const T_VEC w14[2] = { add(i1, i3), sub(i7, i5) };                     \
+    const T_VEC w15[2] = { sub(i1, i3), sub(sub(kWeight0, i5), i7) };      \
+    const T_VEC w16[2] = { add(i3, i1), sub(i5, i7) };                     \
+    const T_VEC w17[2] = { sub(i3, i1), sub(sub(kWeight0, i7), i5) };      \
+    const T_VEC w18[2] = { add(w14[0], w16[0]), add(w14[1], w16[1]) };     \
+    const T_VEC w19[2] = { sub(w14[0], w16[0]), sub(w14[1], w16[1]) };     \
+    const T_VEC w20[2] = { add(w15[0], w17[1]), sub(w15[1], w17[0]) };     \
+    const T_VEC w21[2] = { sub(w15[0], w17[1]), add(w15[1], w17[0]) };     \
+    store(output + 0 * stride, add(w10[0], w18[0]));                       \
+    store(output + 1 * stride,                                             \
+          add(w12[0], mul(kWeight2, add(w20[0], w20[1]))));                \
+    store(output + 2 * stride, add(w11[0], w19[1]));                       \
+    store(output + 3 * stride,                                             \
+          sub(w13[0], mul(kWeight2, sub(w21[0], w21[1]))));                \
+    store(output + 4 * stride, sub(w10[0], w18[0]));                       \
+    store(output + 5 * stride,                                             \
+          add(w12[0], sub(sub(kWeight0, mul(kWeight2, w20[0])),            \
+                          mul(kWeight2, w20[1]))));                        \
+    store(output + 6 * stride, sub(w11[0], w19[1]));                       \
+    store(output + 7 * stride,                                             \
+          add(w13[0], mul(kWeight2, sub(w21[0], w21[1]))));                \
+  }
+
+#define GEN_IFFT_16(ret, suffix, T, T_VEC, load, store, constant, add, sub,   \
+                    mul)                                                      \
+  ret aom_ifft1d_16_##suffix(const T *input, T *output, int stride) {         \
+    const T_VEC kWeight0 = constant(0.0f);                                    \
+    const T_VEC kWeight2 = constant(0.707107f);                               \
+    const T_VEC kWeight3 = constant(0.92388f);                                \
+    const T_VEC kWeight4 = constant(0.382683f);                               \
+    const T_VEC i0 = load(input + 0 * stride);                                \
+    const T_VEC i1 = load(input + 1 * stride);                                \
+    const T_VEC i2 = load(input + 2 * stride);                                \
+    const T_VEC i3 = load(input + 3 * stride);                                \
+    const T_VEC i4 = load(input + 4 * stride);                                \
+    const T_VEC i5 = load(input + 5 * stride);                                \
+    const T_VEC i6 = load(input + 6 * stride);                                \
+    const T_VEC i7 = load(input + 7 * stride);                                \
+    const T_VEC i8 = load(input + 8 * stride);                                \
+    const T_VEC i9 = load(input + 9 * stride);                                \
+    const T_VEC i10 = load(input + 10 * stride);                              \
+    const T_VEC i11 = load(input + 11 * stride);                              \
+    const T_VEC i12 = load(input + 12 * stride);                              \
+    const T_VEC i13 = load(input + 13 * stride);                              \
+    const T_VEC i14 = load(input + 14 * stride);                              \
+    const T_VEC i15 = load(input + 15 * stride);                              \
+    const T_VEC w14 = add(i0, i8);                                            \
+    const T_VEC w15 = sub(i0, i8);                                            \
+    const T_VEC w16[2] = { add(i4, i4), sub(i12, i12) };                      \
+    const T_VEC w17[2] = { sub(i4, i4), sub(sub(kWeight0, i12), i12) };       \
+    const T_VEC w18[2] = { add(w14, w16[0]), w16[1] };                        \
+    const T_VEC w19[2] = { sub(w14, w16[0]), sub(kWeight0, w16[1]) };         \
+    const T_VEC w20[2] = { add(w15, w17[1]), sub(kWeight0, w17[0]) };         \
+    const T_VEC w21[2] = { sub(w15, w17[1]), w17[0] };                        \
+    const T_VEC w22[2] = { add(i2, i6), sub(i14, i10) };                      \
+    const T_VEC w23[2] = { sub(i2, i6), sub(sub(kWeight0, i10), i14) };       \
+    const T_VEC w24[2] = { add(i6, i2), sub(i10, i14) };                      \
+    const T_VEC w25[2] = { sub(i6, i2), sub(sub(kWeight0, i14), i10) };       \
+    const T_VEC w26[2] = { add(w22[0], w24[0]), add(w22[1], w24[1]) };        \
+    const T_VEC w27[2] = { sub(w22[0], w24[0]), sub(w22[1], w24[1]) };        \
+    const T_VEC w28[2] = { add(w23[0], w25[1]), sub(w23[1], w25[0]) };        \
+    const T_VEC w29[2] = { sub(w23[0], w25[1]), add(w23[1], w25[0]) };        \
+    const T_VEC w30[2] = { add(w18[0], w26[0]), add(w18[1], w26[1]) };        \
+    const T_VEC w31[2] = { sub(w18[0], w26[0]), sub(w18[1], w26[1]) };        \
+    const T_VEC w32[2] = { add(w20[0], mul(kWeight2, add(w28[0], w28[1]))),   \
+                           add(w20[1], mul(kWeight2, sub(w28[1], w28[0]))) }; \
+    const T_VEC w33[2] = { add(w20[0],                                        \
+                               sub(sub(kWeight0, mul(kWeight2, w28[0])),      \
+                                   mul(kWeight2, w28[1]))),                   \
+                           add(w20[1], mul(kWeight2, sub(w28[0], w28[1]))) }; \
+    const T_VEC w34[2] = { add(w19[0], w27[1]), sub(w19[1], w27[0]) };        \
+    const T_VEC w35[2] = { sub(w19[0], w27[1]), add(w19[1], w27[0]) };        \
+    const T_VEC w36[2] = { sub(w21[0], mul(kWeight2, sub(w29[0], w29[1]))),   \
+                           sub(w21[1], mul(kWeight2, add(w29[1], w29[0]))) }; \
+    const T_VEC w37[2] = { add(w21[0], mul(kWeight2, sub(w29[0], w29[1]))),   \
+                           add(w21[1], mul(kWeight2, add(w29[1], w29[0]))) }; \
+    const T_VEC w38[2] = { add(i1, i7), sub(i15, i9) };                       \
+    const T_VEC w39[2] = { sub(i1, i7), sub(sub(kWeight0, i9), i15) };        \
+    const T_VEC w40[2] = { add(i5, i3), sub(i11, i13) };                      \
+    const T_VEC w41[2] = { sub(i5, i3), sub(sub(kWeight0, i13), i11) };       \
+    const T_VEC w42[2] = { add(w38[0], w40[0]), add(w38[1], w40[1]) };        \
+    const T_VEC w43[2] = { sub(w38[0], w40[0]), sub(w38[1], w40[1]) };        \
+    const T_VEC w44[2] = { add(w39[0], w41[1]), sub(w39[1], w41[0]) };        \
+    const T_VEC w45[2] = { sub(w39[0], w41[1]), add(w39[1], w41[0]) };        \
+    const T_VEC w46[2] = { add(i3, i5), sub(i13, i11) };                      \
+    const T_VEC w47[2] = { sub(i3, i5), sub(sub(kWeight0, i11), i13) };       \
+    const T_VEC w48[2] = { add(i7, i1), sub(i9, i15) };                       \
+    const T_VEC w49[2] = { sub(i7, i1), sub(sub(kWeight0, i15), i9) };        \
+    const T_VEC w50[2] = { add(w46[0], w48[0]), add(w46[1], w48[1]) };        \
+    const T_VEC w51[2] = { sub(w46[0], w48[0]), sub(w46[1], w48[1]) };        \
+    const T_VEC w52[2] = { add(w47[0], w49[1]), sub(w47[1], w49[0]) };        \
+    const T_VEC w53[2] = { sub(w47[0], w49[1]), add(w47[1], w49[0]) };        \
+    const T_VEC w54[2] = { add(w42[0], w50[0]), add(w42[1], w50[1]) };        \
+    const T_VEC w55[2] = { sub(w42[0], w50[0]), sub(w42[1], w50[1]) };        \
+    const T_VEC w56[2] = { add(w44[0], mul(kWeight2, add(w52[0], w52[1]))),   \
+                           add(w44[1], mul(kWeight2, sub(w52[1], w52[0]))) }; \
+    const T_VEC w57[2] = { add(w44[0],                                        \
+                               sub(sub(kWeight0, mul(kWeight2, w52[0])),      \
+                                   mul(kWeight2, w52[1]))),                   \
+                           add(w44[1], mul(kWeight2, sub(w52[0], w52[1]))) }; \
+    const T_VEC w58[2] = { add(w43[0], w51[1]), sub(w43[1], w51[0]) };        \
+    const T_VEC w59[2] = { sub(w43[0], w51[1]), add(w43[1], w51[0]) };        \
+    const T_VEC w60[2] = { sub(w45[0], mul(kWeight2, sub(w53[0], w53[1]))),   \
+                           sub(w45[1], mul(kWeight2, add(w53[1], w53[0]))) }; \
+    const T_VEC w61[2] = { add(w45[0], mul(kWeight2, sub(w53[0], w53[1]))),   \
+                           add(w45[1], mul(kWeight2, add(w53[1], w53[0]))) }; \
+    store(output + 0 * stride, add(w30[0], w54[0]));                          \
+    store(output + 1 * stride,                                                \
+          add(w32[0], add(mul(kWeight3, w56[0]), mul(kWeight4, w56[1]))));    \
+    store(output + 2 * stride,                                                \
+          add(w34[0], mul(kWeight2, add(w58[0], w58[1]))));                   \
+    store(output + 3 * stride,                                                \
+          add(w36[0], add(mul(kWeight4, w60[0]), mul(kWeight3, w60[1]))));    \
+    store(output + 4 * stride, add(w31[0], w55[1]));                          \
+    store(output + 5 * stride,                                                \
+          sub(w33[0], sub(mul(kWeight4, w57[0]), mul(kWeight3, w57[1]))));    \
+    store(output + 6 * stride,                                                \
+          sub(w35[0], mul(kWeight2, sub(w59[0], w59[1]))));                   \
+    store(output + 7 * stride,                                                \
+          sub(w37[0], sub(mul(kWeight3, w61[0]), mul(kWeight4, w61[1]))));    \
+    store(output + 8 * stride, sub(w30[0], w54[0]));                          \
+    store(output + 9 * stride,                                                \
+          add(w32[0], sub(sub(kWeight0, mul(kWeight3, w56[0])),               \
+                          mul(kWeight4, w56[1]))));                           \
+    store(output + 10 * stride,                                               \
+          add(w34[0], sub(sub(kWeight0, mul(kWeight2, w58[0])),               \
+                          mul(kWeight2, w58[1]))));                           \
+    store(output + 11 * stride,                                               \
+          add(w36[0], sub(sub(kWeight0, mul(kWeight4, w60[0])),               \
+                          mul(kWeight3, w60[1]))));                           \
+    store(output + 12 * stride, sub(w31[0], w55[1]));                         \
+    store(output + 13 * stride,                                               \
+          add(w33[0], sub(mul(kWeight4, w57[0]), mul(kWeight3, w57[1]))));    \
+    store(output + 14 * stride,                                               \
+          add(w35[0], mul(kWeight2, sub(w59[0], w59[1]))));                   \
+    store(output + 15 * stride,                                               \
+          add(w37[0], sub(mul(kWeight3, w61[0]), mul(kWeight4, w61[1]))));    \
+  }
+#define GEN_IFFT_32(ret, suffix, T, T_VEC, load, store, constant, add, sub,    \
+                    mul)                                                       \
+  ret aom_ifft1d_32_##suffix(const T *input, T *output, int stride) {          \
+    const T_VEC kWeight0 = constant(0.0f);                                     \
+    const T_VEC kWeight2 = constant(0.707107f);                                \
+    const T_VEC kWeight3 = constant(0.92388f);                                 \
+    const T_VEC kWeight4 = constant(0.382683f);                                \
+    const T_VEC kWeight5 = constant(0.980785f);                                \
+    const T_VEC kWeight6 = constant(0.19509f);                                 \
+    const T_VEC kWeight7 = constant(0.83147f);                                 \
+    const T_VEC kWeight8 = constant(0.55557f);                                 \
+    const T_VEC i0 = load(input + 0 * stride);                                 \
+    const T_VEC i1 = load(input + 1 * stride);                                 \
+    const T_VEC i2 = load(input + 2 * stride);                                 \
+    const T_VEC i3 = load(input + 3 * stride);                                 \
+    const T_VEC i4 = load(input + 4 * stride);                                 \
+    const T_VEC i5 = load(input + 5 * stride);                                 \
+    const T_VEC i6 = load(input + 6 * stride);                                 \
+    const T_VEC i7 = load(input + 7 * stride);                                 \
+    const T_VEC i8 = load(input + 8 * stride);                                 \
+    const T_VEC i9 = load(input + 9 * stride);                                 \
+    const T_VEC i10 = load(input + 10 * stride);                               \
+    const T_VEC i11 = load(input + 11 * stride);                               \
+    const T_VEC i12 = load(input + 12 * stride);                               \
+    const T_VEC i13 = load(input + 13 * stride);                               \
+    const T_VEC i14 = load(input + 14 * stride);                               \
+    const T_VEC i15 = load(input + 15 * stride);                               \
+    const T_VEC i16 = load(input + 16 * stride);                               \
+    const T_VEC i17 = load(input + 17 * stride);                               \
+    const T_VEC i18 = load(input + 18 * stride);                               \
+    const T_VEC i19 = load(input + 19 * stride);                               \
+    const T_VEC i20 = load(input + 20 * stride);                               \
+    const T_VEC i21 = load(input + 21 * stride);                               \
+    const T_VEC i22 = load(input + 22 * stride);                               \
+    const T_VEC i23 = load(input + 23 * stride);                               \
+    const T_VEC i24 = load(input + 24 * stride);                               \
+    const T_VEC i25 = load(input + 25 * stride);                               \
+    const T_VEC i26 = load(input + 26 * stride);                               \
+    const T_VEC i27 = load(input + 27 * stride);                               \
+    const T_VEC i28 = load(input + 28 * stride);                               \
+    const T_VEC i29 = load(input + 29 * stride);                               \
+    const T_VEC i30 = load(input + 30 * stride);                               \
+    const T_VEC i31 = load(input + 31 * stride);                               \
+    const T_VEC w30 = add(i0, i16);                                            \
+    const T_VEC w31 = sub(i0, i16);                                            \
+    const T_VEC w32[2] = { add(i8, i8), sub(i24, i24) };                       \
+    const T_VEC w33[2] = { sub(i8, i8), sub(sub(kWeight0, i24), i24) };        \
+    const T_VEC w34[2] = { add(w30, w32[0]), w32[1] };                         \
+    const T_VEC w35[2] = { sub(w30, w32[0]), sub(kWeight0, w32[1]) };          \
+    const T_VEC w36[2] = { add(w31, w33[1]), sub(kWeight0, w33[0]) };          \
+    const T_VEC w37[2] = { sub(w31, w33[1]), w33[0] };                         \
+    const T_VEC w38[2] = { add(i4, i12), sub(i28, i20) };                      \
+    const T_VEC w39[2] = { sub(i4, i12), sub(sub(kWeight0, i20), i28) };       \
+    const T_VEC w40[2] = { add(i12, i4), sub(i20, i28) };                      \
+    const T_VEC w41[2] = { sub(i12, i4), sub(sub(kWeight0, i28), i20) };       \
+    const T_VEC w42[2] = { add(w38[0], w40[0]), add(w38[1], w40[1]) };         \
+    const T_VEC w43[2] = { sub(w38[0], w40[0]), sub(w38[1], w40[1]) };         \
+    const T_VEC w44[2] = { add(w39[0], w41[1]), sub(w39[1], w41[0]) };         \
+    const T_VEC w45[2] = { sub(w39[0], w41[1]), add(w39[1], w41[0]) };         \
+    const T_VEC w46[2] = { add(w34[0], w42[0]), add(w34[1], w42[1]) };         \
+    const T_VEC w47[2] = { sub(w34[0], w42[0]), sub(w34[1], w42[1]) };         \
+    const T_VEC w48[2] = { add(w36[0], mul(kWeight2, add(w44[0], w44[1]))),    \
+                           add(w36[1], mul(kWeight2, sub(w44[1], w44[0]))) };  \
+    const T_VEC w49[2] = { add(w36[0],                                         \
+                               sub(sub(kWeight0, mul(kWeight2, w44[0])),       \
+                                   mul(kWeight2, w44[1]))),                    \
+                           add(w36[1], mul(kWeight2, sub(w44[0], w44[1]))) };  \
+    const T_VEC w50[2] = { add(w35[0], w43[1]), sub(w35[1], w43[0]) };         \
+    const T_VEC w51[2] = { sub(w35[0], w43[1]), add(w35[1], w43[0]) };         \
+    const T_VEC w52[2] = { sub(w37[0], mul(kWeight2, sub(w45[0], w45[1]))),    \
+                           sub(w37[1], mul(kWeight2, add(w45[1], w45[0]))) };  \
+    const T_VEC w53[2] = { add(w37[0], mul(kWeight2, sub(w45[0], w45[1]))),    \
+                           add(w37[1], mul(kWeight2, add(w45[1], w45[0]))) };  \
+    const T_VEC w54[2] = { add(i2, i14), sub(i30, i18) };                      \
+    const T_VEC w55[2] = { sub(i2, i14), sub(sub(kWeight0, i18), i30) };       \
+    const T_VEC w56[2] = { add(i10, i6), sub(i22, i26) };                      \
+    const T_VEC w57[2] = { sub(i10, i6), sub(sub(kWeight0, i26), i22) };       \
+    const T_VEC w58[2] = { add(w54[0], w56[0]), add(w54[1], w56[1]) };         \
+    const T_VEC w59[2] = { sub(w54[0], w56[0]), sub(w54[1], w56[1]) };         \
+    const T_VEC w60[2] = { add(w55[0], w57[1]), sub(w55[1], w57[0]) };         \
+    const T_VEC w61[2] = { sub(w55[0], w57[1]), add(w55[1], w57[0]) };         \
+    const T_VEC w62[2] = { add(i6, i10), sub(i26, i22) };                      \
+    const T_VEC w63[2] = { sub(i6, i10), sub(sub(kWeight0, i22), i26) };       \
+    const T_VEC w64[2] = { add(i14, i2), sub(i18, i30) };                      \
+    const T_VEC w65[2] = { sub(i14, i2), sub(sub(kWeight0, i30), i18) };       \
+    const T_VEC w66[2] = { add(w62[0], w64[0]), add(w62[1], w64[1]) };         \
+    const T_VEC w67[2] = { sub(w62[0], w64[0]), sub(w62[1], w64[1]) };         \
+    const T_VEC w68[2] = { add(w63[0], w65[1]), sub(w63[1], w65[0]) };         \
+    const T_VEC w69[2] = { sub(w63[0], w65[1]), add(w63[1], w65[0]) };         \
+    const T_VEC w70[2] = { add(w58[0], w66[0]), add(w58[1], w66[1]) };         \
+    const T_VEC w71[2] = { sub(w58[0], w66[0]), sub(w58[1], w66[1]) };         \
+    const T_VEC w72[2] = { add(w60[0], mul(kWeight2, add(w68[0], w68[1]))),    \
+                           add(w60[1], mul(kWeight2, sub(w68[1], w68[0]))) };  \
+    const T_VEC w73[2] = { add(w60[0],                                         \
+                               sub(sub(kWeight0, mul(kWeight2, w68[0])),       \
+                                   mul(kWeight2, w68[1]))),                    \
+                           add(w60[1], mul(kWeight2, sub(w68[0], w68[1]))) };  \
+    const T_VEC w74[2] = { add(w59[0], w67[1]), sub(w59[1], w67[0]) };         \
+    const T_VEC w75[2] = { sub(w59[0], w67[1]), add(w59[1], w67[0]) };         \
+    const T_VEC w76[2] = { sub(w61[0], mul(kWeight2, sub(w69[0], w69[1]))),    \
+                           sub(w61[1], mul(kWeight2, add(w69[1], w69[0]))) };  \
+    const T_VEC w77[2] = { add(w61[0], mul(kWeight2, sub(w69[0], w69[1]))),    \
+                           add(w61[1], mul(kWeight2, add(w69[1], w69[0]))) };  \
+    const T_VEC w78[2] = { add(w46[0], w70[0]), add(w46[1], w70[1]) };         \
+    const T_VEC w79[2] = { sub(w46[0], w70[0]), sub(w46[1], w70[1]) };         \
+    const T_VEC w80[2] = {                                                     \
+      add(w48[0], add(mul(kWeight3, w72[0]), mul(kWeight4, w72[1]))),          \
+      add(w48[1], sub(mul(kWeight3, w72[1]), mul(kWeight4, w72[0])))           \
+    };                                                                         \
+    const T_VEC w81[2] = {                                                     \
+      add(w48[0],                                                              \
+          sub(sub(kWeight0, mul(kWeight3, w72[0])), mul(kWeight4, w72[1]))),   \
+      add(w48[1], sub(mul(kWeight4, w72[0]), mul(kWeight3, w72[1])))           \
+    };                                                                         \
+    const T_VEC w82[2] = { add(w50[0], mul(kWeight2, add(w74[0], w74[1]))),    \
+                           add(w50[1], mul(kWeight2, sub(w74[1], w74[0]))) };  \
+    const T_VEC w83[2] = { add(w50[0],                                         \
+                               sub(sub(kWeight0, mul(kWeight2, w74[0])),       \
+                                   mul(kWeight2, w74[1]))),                    \
+                           add(w50[1], mul(kWeight2, sub(w74[0], w74[1]))) };  \
+    const T_VEC w84[2] = {                                                     \
+      add(w52[0], add(mul(kWeight4, w76[0]), mul(kWeight3, w76[1]))),          \
+      add(w52[1], sub(mul(kWeight4, w76[1]), mul(kWeight3, w76[0])))           \
+    };                                                                         \
+    const T_VEC w85[2] = {                                                     \
+      add(w52[0],                                                              \
+          sub(sub(kWeight0, mul(kWeight4, w76[0])), mul(kWeight3, w76[1]))),   \
+      add(w52[1], sub(mul(kWeight3, w76[0]), mul(kWeight4, w76[1])))           \
+    };                                                                         \
+    const T_VEC w86[2] = { add(w47[0], w71[1]), sub(w47[1], w71[0]) };         \
+    const T_VEC w87[2] = { sub(w47[0], w71[1]), add(w47[1], w71[0]) };         \
+    const T_VEC w88[2] = {                                                     \
+      sub(w49[0], sub(mul(kWeight4, w73[0]), mul(kWeight3, w73[1]))),          \
+      add(w49[1],                                                              \
+          sub(sub(kWeight0, mul(kWeight4, w73[1])), mul(kWeight3, w73[0])))    \
+    };                                                                         \
+    const T_VEC w89[2] = {                                                     \
+      add(w49[0], sub(mul(kWeight4, w73[0]), mul(kWeight3, w73[1]))),          \
+      add(w49[1], add(mul(kWeight4, w73[1]), mul(kWeight3, w73[0])))           \
+    };                                                                         \
+    const T_VEC w90[2] = { sub(w51[0], mul(kWeight2, sub(w75[0], w75[1]))),    \
+                           sub(w51[1], mul(kWeight2, add(w75[1], w75[0]))) };  \
+    const T_VEC w91[2] = { add(w51[0], mul(kWeight2, sub(w75[0], w75[1]))),    \
+                           add(w51[1], mul(kWeight2, add(w75[1], w75[0]))) };  \
+    const T_VEC w92[2] = {                                                     \
+      sub(w53[0], sub(mul(kWeight3, w77[0]), mul(kWeight4, w77[1]))),          \
+      add(w53[1],                                                              \
+          sub(sub(kWeight0, mul(kWeight3, w77[1])), mul(kWeight4, w77[0])))    \
+    };                                                                         \
+    const T_VEC w93[2] = {                                                     \
+      add(w53[0], sub(mul(kWeight3, w77[0]), mul(kWeight4, w77[1]))),          \
+      add(w53[1], add(mul(kWeight3, w77[1]), mul(kWeight4, w77[0])))           \
+    };                                                                         \
+    const T_VEC w94[2] = { add(i1, i15), sub(i31, i17) };                      \
+    const T_VEC w95[2] = { sub(i1, i15), sub(sub(kWeight0, i17), i31) };       \
+    const T_VEC w96[2] = { add(i9, i7), sub(i23, i25) };                       \
+    const T_VEC w97[2] = { sub(i9, i7), sub(sub(kWeight0, i25), i23) };        \
+    const T_VEC w98[2] = { add(w94[0], w96[0]), add(w94[1], w96[1]) };         \
+    const T_VEC w99[2] = { sub(w94[0], w96[0]), sub(w94[1], w96[1]) };         \
+    const T_VEC w100[2] = { add(w95[0], w97[1]), sub(w95[1], w97[0]) };        \
+    const T_VEC w101[2] = { sub(w95[0], w97[1]), add(w95[1], w97[0]) };        \
+    const T_VEC w102[2] = { add(i5, i11), sub(i27, i21) };                     \
+    const T_VEC w103[2] = { sub(i5, i11), sub(sub(kWeight0, i21), i27) };      \
+    const T_VEC w104[2] = { add(i13, i3), sub(i19, i29) };                     \
+    const T_VEC w105[2] = { sub(i13, i3), sub(sub(kWeight0, i29), i19) };      \
+    const T_VEC w106[2] = { add(w102[0], w104[0]), add(w102[1], w104[1]) };    \
+    const T_VEC w107[2] = { sub(w102[0], w104[0]), sub(w102[1], w104[1]) };    \
+    const T_VEC w108[2] = { add(w103[0], w105[1]), sub(w103[1], w105[0]) };    \
+    const T_VEC w109[2] = { sub(w103[0], w105[1]), add(w103[1], w105[0]) };    \
+    const T_VEC w110[2] = { add(w98[0], w106[0]), add(w98[1], w106[1]) };      \
+    const T_VEC w111[2] = { sub(w98[0], w106[0]), sub(w98[1], w106[1]) };      \
+    const T_VEC w112[2] = {                                                    \
+      add(w100[0], mul(kWeight2, add(w108[0], w108[1]))),                      \
+      add(w100[1], mul(kWeight2, sub(w108[1], w108[0])))                       \
+    };                                                                         \
+    const T_VEC w113[2] = {                                                    \
+      add(w100[0],                                                             \
+          sub(sub(kWeight0, mul(kWeight2, w108[0])), mul(kWeight2, w108[1]))), \
+      add(w100[1], mul(kWeight2, sub(w108[0], w108[1])))                       \
+    };                                                                         \
+    const T_VEC w114[2] = { add(w99[0], w107[1]), sub(w99[1], w107[0]) };      \
+    const T_VEC w115[2] = { sub(w99[0], w107[1]), add(w99[1], w107[0]) };      \
+    const T_VEC w116[2] = {                                                    \
+      sub(w101[0], mul(kWeight2, sub(w109[0], w109[1]))),                      \
+      sub(w101[1], mul(kWeight2, add(w109[1], w109[0])))                       \
+    };                                                                         \
+    const T_VEC w117[2] = {                                                    \
+      add(w101[0], mul(kWeight2, sub(w109[0], w109[1]))),                      \
+      add(w101[1], mul(kWeight2, add(w109[1], w109[0])))                       \
+    };                                                                         \
+    const T_VEC w118[2] = { add(i3, i13), sub(i29, i19) };                     \
+    const T_VEC w119[2] = { sub(i3, i13), sub(sub(kWeight0, i19), i29) };      \
+    const T_VEC w120[2] = { add(i11, i5), sub(i21, i27) };                     \
+    const T_VEC w121[2] = { sub(i11, i5), sub(sub(kWeight0, i27), i21) };      \
+    const T_VEC w122[2] = { add(w118[0], w120[0]), add(w118[1], w120[1]) };    \
+    const T_VEC w123[2] = { sub(w118[0], w120[0]), sub(w118[1], w120[1]) };    \
+    const T_VEC w124[2] = { add(w119[0], w121[1]), sub(w119[1], w121[0]) };    \
+    const T_VEC w125[2] = { sub(w119[0], w121[1]), add(w119[1], w121[0]) };    \
+    const T_VEC w126[2] = { add(i7, i9), sub(i25, i23) };                      \
+    const T_VEC w127[2] = { sub(i7, i9), sub(sub(kWeight0, i23), i25) };       \
+    const T_VEC w128[2] = { add(i15, i1), sub(i17, i31) };                     \
+    const T_VEC w129[2] = { sub(i15, i1), sub(sub(kWeight0, i31), i17) };      \
+    const T_VEC w130[2] = { add(w126[0], w128[0]), add(w126[1], w128[1]) };    \
+    const T_VEC w131[2] = { sub(w126[0], w128[0]), sub(w126[1], w128[1]) };    \
+    const T_VEC w132[2] = { add(w127[0], w129[1]), sub(w127[1], w129[0]) };    \
+    const T_VEC w133[2] = { sub(w127[0], w129[1]), add(w127[1], w129[0]) };    \
+    const T_VEC w134[2] = { add(w122[0], w130[0]), add(w122[1], w130[1]) };    \
+    const T_VEC w135[2] = { sub(w122[0], w130[0]), sub(w122[1], w130[1]) };    \
+    const T_VEC w136[2] = {                                                    \
+      add(w124[0], mul(kWeight2, add(w132[0], w132[1]))),                      \
+      add(w124[1], mul(kWeight2, sub(w132[1], w132[0])))                       \
+    };                                                                         \
+    const T_VEC w137[2] = {                                                    \
+      add(w124[0],                                                             \
+          sub(sub(kWeight0, mul(kWeight2, w132[0])), mul(kWeight2, w132[1]))), \
+      add(w124[1], mul(kWeight2, sub(w132[0], w132[1])))                       \
+    };                                                                         \
+    const T_VEC w138[2] = { add(w123[0], w131[1]), sub(w123[1], w131[0]) };    \
+    const T_VEC w139[2] = { sub(w123[0], w131[1]), add(w123[1], w131[0]) };    \
+    const T_VEC w140[2] = {                                                    \
+      sub(w125[0], mul(kWeight2, sub(w133[0], w133[1]))),                      \
+      sub(w125[1], mul(kWeight2, add(w133[1], w133[0])))                       \
+    };                                                                         \
+    const T_VEC w141[2] = {                                                    \
+      add(w125[0], mul(kWeight2, sub(w133[0], w133[1]))),                      \
+      add(w125[1], mul(kWeight2, add(w133[1], w133[0])))                       \
+    };                                                                         \
+    const T_VEC w142[2] = { add(w110[0], w134[0]), add(w110[1], w134[1]) };    \
+    const T_VEC w143[2] = { sub(w110[0], w134[0]), sub(w110[1], w134[1]) };    \
+    const T_VEC w144[2] = {                                                    \
+      add(w112[0], add(mul(kWeight3, w136[0]), mul(kWeight4, w136[1]))),       \
+      add(w112[1], sub(mul(kWeight3, w136[1]), mul(kWeight4, w136[0])))        \
+    };                                                                         \
+    const T_VEC w145[2] = {                                                    \
+      add(w112[0],                                                             \
+          sub(sub(kWeight0, mul(kWeight3, w136[0])), mul(kWeight4, w136[1]))), \
+      add(w112[1], sub(mul(kWeight4, w136[0]), mul(kWeight3, w136[1])))        \
+    };                                                                         \
+    const T_VEC w146[2] = {                                                    \
+      add(w114[0], mul(kWeight2, add(w138[0], w138[1]))),                      \
+      add(w114[1], mul(kWeight2, sub(w138[1], w138[0])))                       \
+    };                                                                         \
+    const T_VEC w147[2] = {                                                    \
+      add(w114[0],                                                             \
+          sub(sub(kWeight0, mul(kWeight2, w138[0])), mul(kWeight2, w138[1]))), \
+      add(w114[1], mul(kWeight2, sub(w138[0], w138[1])))                       \
+    };                                                                         \
+    const T_VEC w148[2] = {                                                    \
+      add(w116[0], add(mul(kWeight4, w140[0]), mul(kWeight3, w140[1]))),       \
+      add(w116[1], sub(mul(kWeight4, w140[1]), mul(kWeight3, w140[0])))        \
+    };                                                                         \
+    const T_VEC w149[2] = {                                                    \
+      add(w116[0],                                                             \
+          sub(sub(kWeight0, mul(kWeight4, w140[0])), mul(kWeight3, w140[1]))), \
+      add(w116[1], sub(mul(kWeight3, w140[0]), mul(kWeight4, w140[1])))        \
+    };                                                                         \
+    const T_VEC w150[2] = { add(w111[0], w135[1]), sub(w111[1], w135[0]) };    \
+    const T_VEC w151[2] = { sub(w111[0], w135[1]), add(w111[1], w135[0]) };    \
+    const T_VEC w152[2] = {                                                    \
+      sub(w113[0], sub(mul(kWeight4, w137[0]), mul(kWeight3, w137[1]))),       \
+      add(w113[1],                                                             \
+          sub(sub(kWeight0, mul(kWeight4, w137[1])), mul(kWeight3, w137[0])))  \
+    };                                                                         \
+    const T_VEC w153[2] = {                                                    \
+      add(w113[0], sub(mul(kWeight4, w137[0]), mul(kWeight3, w137[1]))),       \
+      add(w113[1], add(mul(kWeight4, w137[1]), mul(kWeight3, w137[0])))        \
+    };                                                                         \
+    const T_VEC w154[2] = {                                                    \
+      sub(w115[0], mul(kWeight2, sub(w139[0], w139[1]))),                      \
+      sub(w115[1], mul(kWeight2, add(w139[1], w139[0])))                       \
+    };                                                                         \
+    const T_VEC w155[2] = {                                                    \
+      add(w115[0], mul(kWeight2, sub(w139[0], w139[1]))),                      \
+      add(w115[1], mul(kWeight2, add(w139[1], w139[0])))                       \
+    };                                                                         \
+    const T_VEC w156[2] = {                                                    \
+      sub(w117[0], sub(mul(kWeight3, w141[0]), mul(kWeight4, w141[1]))),       \
+      add(w117[1],                                                             \
+          sub(sub(kWeight0, mul(kWeight3, w141[1])), mul(kWeight4, w141[0])))  \
+    };                                                                         \
+    const T_VEC w157[2] = {                                                    \
+      add(w117[0], sub(mul(kWeight3, w141[0]), mul(kWeight4, w141[1]))),       \
+      add(w117[1], add(mul(kWeight3, w141[1]), mul(kWeight4, w141[0])))        \
+    };                                                                         \
+    store(output + 0 * stride, add(w78[0], w142[0]));                          \
+    store(output + 1 * stride,                                                 \
+          add(w80[0], add(mul(kWeight5, w144[0]), mul(kWeight6, w144[1]))));   \
+    store(output + 2 * stride,                                                 \
+          add(w82[0], add(mul(kWeight3, w146[0]), mul(kWeight4, w146[1]))));   \
+    store(output + 3 * stride,                                                 \
+          add(w84[0], add(mul(kWeight7, w148[0]), mul(kWeight8, w148[1]))));   \
+    store(output + 4 * stride,                                                 \
+          add(w86[0], mul(kWeight2, add(w150[0], w150[1]))));                  \
+    store(output + 5 * stride,                                                 \
+          add(w88[0], add(mul(kWeight8, w152[0]), mul(kWeight7, w152[1]))));   \
+    store(output + 6 * stride,                                                 \
+          add(w90[0], add(mul(kWeight4, w154[0]), mul(kWeight3, w154[1]))));   \
+    store(output + 7 * stride,                                                 \
+          add(w92[0], add(mul(kWeight6, w156[0]), mul(kWeight5, w156[1]))));   \
+    store(output + 8 * stride, add(w79[0], w143[1]));                          \
+    store(output + 9 * stride,                                                 \
+          sub(w81[0], sub(mul(kWeight6, w145[0]), mul(kWeight5, w145[1]))));   \
+    store(output + 10 * stride,                                                \
+          sub(w83[0], sub(mul(kWeight4, w147[0]), mul(kWeight3, w147[1]))));   \
+    store(output + 11 * stride,                                                \
+          sub(w85[0], sub(mul(kWeight8, w149[0]), mul(kWeight7, w149[1]))));   \
+    store(output + 12 * stride,                                                \
+          sub(w87[0], mul(kWeight2, sub(w151[0], w151[1]))));                  \
+    store(output + 13 * stride,                                                \
+          sub(w89[0], sub(mul(kWeight7, w153[0]), mul(kWeight8, w153[1]))));   \
+    store(output + 14 * stride,                                                \
+          sub(w91[0], sub(mul(kWeight3, w155[0]), mul(kWeight4, w155[1]))));   \
+    store(output + 15 * stride,                                                \
+          sub(w93[0], sub(mul(kWeight5, w157[0]), mul(kWeight6, w157[1]))));   \
+    store(output + 16 * stride, sub(w78[0], w142[0]));                         \
+    store(output + 17 * stride,                                                \
+          add(w80[0], sub(sub(kWeight0, mul(kWeight5, w144[0])),               \
+                          mul(kWeight6, w144[1]))));                           \
+    store(output + 18 * stride,                                                \
+          add(w82[0], sub(sub(kWeight0, mul(kWeight3, w146[0])),               \
+                          mul(kWeight4, w146[1]))));                           \
+    store(output + 19 * stride,                                                \
+          add(w84[0], sub(sub(kWeight0, mul(kWeight7, w148[0])),               \
+                          mul(kWeight8, w148[1]))));                           \
+    store(output + 20 * stride,                                                \
+          add(w86[0], sub(sub(kWeight0, mul(kWeight2, w150[0])),               \
+                          mul(kWeight2, w150[1]))));                           \
+    store(output + 21 * stride,                                                \
+          add(w88[0], sub(sub(kWeight0, mul(kWeight8, w152[0])),               \
+                          mul(kWeight7, w152[1]))));                           \
+    store(output + 22 * stride,                                                \
+          add(w90[0], sub(sub(kWeight0, mul(kWeight4, w154[0])),               \
+                          mul(kWeight3, w154[1]))));                           \
+    store(output + 23 * stride,                                                \
+          add(w92[0], sub(sub(kWeight0, mul(kWeight6, w156[0])),               \
+                          mul(kWeight5, w156[1]))));                           \
+    store(output + 24 * stride, sub(w79[0], w143[1]));                         \
+    store(output + 25 * stride,                                                \
+          add(w81[0], sub(mul(kWeight6, w145[0]), mul(kWeight5, w145[1]))));   \
+    store(output + 26 * stride,                                                \
+          add(w83[0], sub(mul(kWeight4, w147[0]), mul(kWeight3, w147[1]))));   \
+    store(output + 27 * stride,                                                \
+          add(w85[0], sub(mul(kWeight8, w149[0]), mul(kWeight7, w149[1]))));   \
+    store(output + 28 * stride,                                                \
+          add(w87[0], mul(kWeight2, sub(w151[0], w151[1]))));                  \
+    store(output + 29 * stride,                                                \
+          add(w89[0], sub(mul(kWeight7, w153[0]), mul(kWeight8, w153[1]))));   \
+    store(output + 30 * stride,                                                \
+          add(w91[0], sub(mul(kWeight3, w155[0]), mul(kWeight4, w155[1]))));   \
+    store(output + 31 * stride,                                                \
+          add(w93[0], sub(mul(kWeight5, w157[0]), mul(kWeight6, w157[1]))));   \
+  }
+
+#endif  // AOM_AOM_DSP_FFT_COMMON_H_

diff --git a/libav1/aom_dsp/fwd_txfm.c b/libav1/aom_dsp/fwd_txfm.c
new file mode 100644
index 0000000..e50f951
--- /dev/null
+++ b/libav1/aom_dsp/fwd_txfm.c

@@ -0,0 +1,103 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include "aom_dsp/txfm_common.h"
+#include "config/aom_dsp_rtcd.h"
+
+void aom_fdct8x8_c(const int16_t *input, tran_low_t *final_output, int stride) {
+  int i, j;
+  tran_low_t intermediate[64];
+  int pass;
+  tran_low_t *output = intermediate;
+  const tran_low_t *in = NULL;
+
+  // Transform columns
+  for (pass = 0; pass < 2; ++pass) {
+    tran_high_t s0, s1, s2, s3, s4, s5, s6, s7;  // canbe16
+    tran_high_t t0, t1, t2, t3;                  // needs32
+    tran_high_t x0, x1, x2, x3;                  // canbe16
+
+    for (i = 0; i < 8; i++) {
+      // stage 1
+      if (pass == 0) {
+        s0 = (input[0 * stride] + input[7 * stride]) * 4;
+        s1 = (input[1 * stride] + input[6 * stride]) * 4;
+        s2 = (input[2 * stride] + input[5 * stride]) * 4;
+        s3 = (input[3 * stride] + input[4 * stride]) * 4;
+        s4 = (input[3 * stride] - input[4 * stride]) * 4;
+        s5 = (input[2 * stride] - input[5 * stride]) * 4;
+        s6 = (input[1 * stride] - input[6 * stride]) * 4;
+        s7 = (input[0 * stride] - input[7 * stride]) * 4;
+        ++input;
+      } else {
+        s0 = in[0 * 8] + in[7 * 8];
+        s1 = in[1 * 8] + in[6 * 8];
+        s2 = in[2 * 8] + in[5 * 8];
+        s3 = in[3 * 8] + in[4 * 8];
+        s4 = in[3 * 8] - in[4 * 8];
+        s5 = in[2 * 8] - in[5 * 8];
+        s6 = in[1 * 8] - in[6 * 8];
+        s7 = in[0 * 8] - in[7 * 8];
+        ++in;
+      }
+
+      // fdct4(step, step);
+      x0 = s0 + s3;
+      x1 = s1 + s2;
+      x2 = s1 - s2;
+      x3 = s0 - s3;
+      t0 = (x0 + x1) * cospi_16_64;
+      t1 = (x0 - x1) * cospi_16_64;
+      t2 = x2 * cospi_24_64 + x3 * cospi_8_64;
+      t3 = -x2 * cospi_8_64 + x3 * cospi_24_64;
+      output[0] = (tran_low_t)fdct_round_shift(t0);
+      output[2] = (tran_low_t)fdct_round_shift(t2);
+      output[4] = (tran_low_t)fdct_round_shift(t1);
+      output[6] = (tran_low_t)fdct_round_shift(t3);
+
+      // Stage 2
+      t0 = (s6 - s5) * cospi_16_64;
+      t1 = (s6 + s5) * cospi_16_64;
+      t2 = fdct_round_shift(t0);
+      t3 = fdct_round_shift(t1);
+
+      // Stage 3
+      x0 = s4 + t2;
+      x1 = s4 - t2;
+      x2 = s7 - t3;
+      x3 = s7 + t3;
+
+      // Stage 4
+      t0 = x0 * cospi_28_64 + x3 * cospi_4_64;
+      t1 = x1 * cospi_12_64 + x2 * cospi_20_64;
+      t2 = x2 * cospi_12_64 + x1 * -cospi_20_64;
+      t3 = x3 * cospi_28_64 + x0 * -cospi_4_64;
+      output[1] = (tran_low_t)fdct_round_shift(t0);
+      output[3] = (tran_low_t)fdct_round_shift(t2);
+      output[5] = (tran_low_t)fdct_round_shift(t1);
+      output[7] = (tran_low_t)fdct_round_shift(t3);
+      output += 8;
+    }
+    in = intermediate;
+    output = final_output;
+  }
+
+  // Rows
+  for (i = 0; i < 8; ++i) {
+    for (j = 0; j < 8; ++j) final_output[j + i * 8] /= 2;
+  }
+}
+
+void aom_highbd_fdct8x8_c(const int16_t *input, tran_low_t *final_output,
+                          int stride) {
+  aom_fdct8x8_c(input, final_output, stride);
+}

diff --git a/libav1/aom_dsp/grain_synthesis.c b/libav1/aom_dsp/grain_synthesis.c
new file mode 100644
index 0000000..b96e1c3
--- /dev/null
+++ b/libav1/aom_dsp/grain_synthesis.c

@@ -0,0 +1,1409 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\file
+ * \brief Describes film grain parameters and film grain synthesis
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <assert.h>
+#include "aom_dsp/grain_synthesis.h"
+#include "aom_mem/aom_mem.h"
+
+// Samples with Gaussian distribution in the range of [-2048, 2047] (12 bits)
+// with zero mean and standard deviation of about 512.
+// should be divided by 4 for 10-bit range and 16 for 8-bit range.
+static const int gaussian_sequence[2048] = {
+  56,    568,   -180,  172,   124,   -84,   172,   -64,   -900,  24,   820,
+  224,   1248,  996,   272,   -8,    -916,  -388,  -732,  -104,  -188, 800,
+  112,   -652,  -320,  -376,  140,   -252,  492,   -168,  44,    -788, 588,
+  -584,  500,   -228,  12,    680,   272,   -476,  972,   -100,  652,  368,
+  432,   -196,  -720,  -192,  1000,  -332,  652,   -136,  -552,  -604, -4,
+  192,   -220,  -136,  1000,  -52,   372,   -96,   -624,  124,   -24,  396,
+  540,   -12,   -104,  640,   464,   244,   -208,  -84,   368,   -528, -740,
+  248,   -968,  -848,  608,   376,   -60,   -292,  -40,   -156,  252,  -292,
+  248,   224,   -280,  400,   -244,  244,   -60,   76,    -80,   212,  532,
+  340,   128,   -36,   824,   -352,  -60,   -264,  -96,   -612,  416,  -704,
+  220,   -204,  640,   -160,  1220,  -408,  900,   336,   20,    -336, -96,
+  -792,  304,   48,    -28,   -1232, -1172, -448,  104,   -292,  -520, 244,
+  60,    -948,  0,     -708,  268,   108,   356,   -548,  488,   -344, -136,
+  488,   -196,  -224,  656,   -236,  -1128, 60,    4,     140,   276,  -676,
+  -376,  168,   -108,  464,   8,     564,   64,    240,   308,   -300, -400,
+  -456,  -136,  56,    120,   -408,  -116,  436,   504,   -232,  328,  844,
+  -164,  -84,   784,   -168,  232,   -224,  348,   -376,  128,   568,  96,
+  -1244, -288,  276,   848,   832,   -360,  656,   464,   -384,  -332, -356,
+  728,   -388,  160,   -192,  468,   296,   224,   140,   -776,  -100, 280,
+  4,     196,   44,    -36,   -648,  932,   16,    1428,  28,    528,  808,
+  772,   20,    268,   88,    -332,  -284,  124,   -384,  -448,  208,  -228,
+  -1044, -328,  660,   380,   -148,  -300,  588,   240,   540,   28,   136,
+  -88,   -436,  256,   296,   -1000, 1400,  0,     -48,   1056,  -136, 264,
+  -528,  -1108, 632,   -484,  -592,  -344,  796,   124,   -668,  -768, 388,
+  1296,  -232,  -188,  -200,  -288,  -4,    308,   100,   -168,  256,  -500,
+  204,   -508,  648,   -136,  372,   -272,  -120,  -1004, -552,  -548, -384,
+  548,   -296,  428,   -108,  -8,    -912,  -324,  -224,  -88,   -112, -220,
+  -100,  996,   -796,  548,   360,   -216,  180,   428,   -200,  -212, 148,
+  96,    148,   284,   216,   -412,  -320,  120,   -300,  -384,  -604, -572,
+  -332,  -8,    -180,  -176,  696,   116,   -88,   628,   76,    44,   -516,
+  240,   -208,  -40,   100,   -592,  344,   -308,  -452,  -228,  20,   916,
+  -1752, -136,  -340,  -804,  140,   40,    512,   340,   248,   184,  -492,
+  896,   -156,  932,   -628,  328,   -688,  -448,  -616,  -752,  -100, 560,
+  -1020, 180,   -800,  -64,   76,    576,   1068,  396,   660,   552,  -108,
+  -28,   320,   -628,  312,   -92,   -92,   -472,  268,   16,    560,  516,
+  -672,  -52,   492,   -100,  260,   384,   284,   292,   304,   -148, 88,
+  -152,  1012,  1064,  -228,  164,   -376,  -684,  592,   -392,  156,  196,
+  -524,  -64,   -884,  160,   -176,  636,   648,   404,   -396,  -436, 864,
+  424,   -728,  988,   -604,  904,   -592,  296,   -224,  536,   -176, -920,
+  436,   -48,   1176,  -884,  416,   -776,  -824,  -884,  524,   -548, -564,
+  -68,   -164,  -96,   692,   364,   -692,  -1012, -68,   260,   -480, 876,
+  -1116, 452,   -332,  -352,  892,   -1088, 1220,  -676,  12,    -292, 244,
+  496,   372,   -32,   280,   200,   112,   -440,  -96,   24,    -644, -184,
+  56,    -432,  224,   -980,  272,   -260,  144,   -436,  420,   356,  364,
+  -528,  76,    172,   -744,  -368,  404,   -752,  -416,  684,   -688, 72,
+  540,   416,   92,    444,   480,   -72,   -1416, 164,   -1172, -68,  24,
+  424,   264,   1040,  128,   -912,  -524,  -356,  64,    876,   -12,  4,
+  -88,   532,   272,   -524,  320,   276,   -508,  940,   24,    -400, -120,
+  756,   60,    236,   -412,  100,   376,   -484,  400,   -100,  -740, -108,
+  -260,  328,   -268,  224,   -200,  -416,  184,   -604,  -564,  -20,  296,
+  60,    892,   -888,  60,    164,   68,    -760,  216,   -296,  904,  -336,
+  -28,   404,   -356,  -568,  -208,  -1480, -512,  296,   328,   -360, -164,
+  -1560, -776,  1156,  -428,  164,   -504,  -112,  120,   -216,  -148, -264,
+  308,   32,    64,    -72,   72,    116,   176,   -64,   -272,  460,  -536,
+  -784,  -280,  348,   108,   -752,  -132,  524,   -540,  -776,  116,  -296,
+  -1196, -288,  -560,  1040,  -472,  116,   -848,  -1116, 116,   636,  696,
+  284,   -176,  1016,  204,   -864,  -648,  -248,  356,   972,   -584, -204,
+  264,   880,   528,   -24,   -184,  116,   448,   -144,  828,   524,  212,
+  -212,  52,    12,    200,   268,   -488,  -404,  -880,  824,   -672, -40,
+  908,   -248,  500,   716,   -576,  492,   -576,  16,    720,   -108, 384,
+  124,   344,   280,   576,   -500,  252,   104,   -308,  196,   -188, -8,
+  1268,  296,   1032,  -1196, 436,   316,   372,   -432,  -200,  -660, 704,
+  -224,  596,   -132,  268,   32,    -452,  884,   104,   -1008, 424,  -1348,
+  -280,  4,     -1168, 368,   476,   696,   300,   -8,    24,    180,  -592,
+  -196,  388,   304,   500,   724,   -160,  244,   -84,   272,   -256, -420,
+  320,   208,   -144,  -156,  156,   364,   452,   28,    540,   316,  220,
+  -644,  -248,  464,   72,    360,   32,    -388,  496,   -680,  -48,  208,
+  -116,  -408,  60,    -604,  -392,  548,   -840,  784,   -460,  656,  -544,
+  -388,  -264,  908,   -800,  -628,  -612,  -568,  572,   -220,  164,  288,
+  -16,   -308,  308,   -112,  -636,  -760,  280,   -668,  432,   364,  240,
+  -196,  604,   340,   384,   196,   592,   -44,   -500,  432,   -580, -132,
+  636,   -76,   392,   4,     -412,  540,   508,   328,   -356,  -36,  16,
+  -220,  -64,   -248,  -60,   24,    -192,  368,   1040,  92,    -24,  -1044,
+  -32,   40,    104,   148,   192,   -136,  -520,  56,    -816,  -224, 732,
+  392,   356,   212,   -80,   -424,  -1008, -324,  588,   -1496, 576,  460,
+  -816,  -848,  56,    -580,  -92,   -1372, -112,  -496,  200,   364,  52,
+  -140,  48,    -48,   -60,   84,    72,    40,    132,   -356,  -268, -104,
+  -284,  -404,  732,   -520,  164,   -304,  -540,  120,   328,   -76,  -460,
+  756,   388,   588,   236,   -436,  -72,   -176,  -404,  -316,  -148, 716,
+  -604,  404,   -72,   -88,   -888,  -68,   944,   88,    -220,  -344, 960,
+  472,   460,   -232,  704,   120,   832,   -228,  692,   -508,  132,  -476,
+  844,   -748,  -364,  -44,   1116,  -1104, -1056, 76,    428,   552,  -692,
+  60,    356,   96,    -384,  -188,  -612,  -576,  736,   508,   892,  352,
+  -1132, 504,   -24,   -352,  324,   332,   -600,  -312,  292,   508,  -144,
+  -8,    484,   48,    284,   -260,  -240,  256,   -100,  -292,  -204, -44,
+  472,   -204,  908,   -188,  -1000, -256,  92,    1164,  -392,  564,  356,
+  652,   -28,   -884,  256,   484,   -192,  760,   -176,  376,   -524, -452,
+  -436,  860,   -736,  212,   124,   504,   -476,  468,   76,    -472, 552,
+  -692,  -944,  -620,  740,   -240,  400,   132,   20,    192,   -196, 264,
+  -668,  -1012, -60,   296,   -316,  -828,  76,    -156,  284,   -768, -448,
+  -832,  148,   248,   652,   616,   1236,  288,   -328,  -400,  -124, 588,
+  220,   520,   -696,  1032,  768,   -740,  -92,   -272,  296,   448,  -464,
+  412,   -200,  392,   440,   -200,  264,   -152,  -260,  320,   1032, 216,
+  320,   -8,    -64,   156,   -1016, 1084,  1172,  536,   484,   -432, 132,
+  372,   -52,   -256,  84,    116,   -352,  48,    116,   304,   -384, 412,
+  924,   -300,  528,   628,   180,   648,   44,    -980,  -220,  1320, 48,
+  332,   748,   524,   -268,  -720,  540,   -276,  564,   -344,  -208, -196,
+  436,   896,   88,    -392,  132,   80,    -964,  -288,  568,   56,   -48,
+  -456,  888,   8,     552,   -156,  -292,  948,   288,   128,   -716, -292,
+  1192,  -152,  876,   352,   -600,  -260,  -812,  -468,  -28,   -120, -32,
+  -44,   1284,  496,   192,   464,   312,   -76,   -516,  -380,  -456, -1012,
+  -48,   308,   -156,  36,    492,   -156,  -808,  188,   1652,  68,   -120,
+  -116,  316,   160,   -140,  352,   808,   -416,  592,   316,   -480, 56,
+  528,   -204,  -568,  372,   -232,  752,   -344,  744,   -4,    324,  -416,
+  -600,  768,   268,   -248,  -88,   -132,  -420,  -432,  80,    -288, 404,
+  -316,  -1216, -588,  520,   -108,  92,    -320,  368,   -480,  -216, -92,
+  1688,  -300,  180,   1020,  -176,  820,   -68,   -228,  -260,  436,  -904,
+  20,    40,    -508,  440,   -736,  312,   332,   204,   760,   -372, 728,
+  96,    -20,   -632,  -520,  -560,  336,   1076,  -64,   -532,  776,  584,
+  192,   396,   -728,  -520,  276,   -188,  80,    -52,   -612,  -252, -48,
+  648,   212,   -688,  228,   -52,   -260,  428,   -412,  -272,  -404, 180,
+  816,   -796,  48,    152,   484,   -88,   -216,  988,   696,   188,  -528,
+  648,   -116,  -180,  316,   476,   12,    -564,  96,    476,   -252, -364,
+  -376,  -392,  556,   -256,  -576,  260,   -352,  120,   -16,   -136, -260,
+  -492,  72,    556,   660,   580,   616,   772,   436,   424,   -32,  -324,
+  -1268, 416,   -324,  -80,   920,   160,   228,   724,   32,    -516, 64,
+  384,   68,    -128,  136,   240,   248,   -204,  -68,   252,   -932, -120,
+  -480,  -628,  -84,   192,   852,   -404,  -288,  -132,  204,   100,  168,
+  -68,   -196,  -868,  460,   1080,  380,   -80,   244,   0,     484,  -888,
+  64,    184,   352,   600,   460,   164,   604,   -196,  320,   -64,  588,
+  -184,  228,   12,    372,   48,    -848,  -344,  224,   208,   -200, 484,
+  128,   -20,   272,   -468,  -840,  384,   256,   -720,  -520,  -464, -580,
+  112,   -120,  644,   -356,  -208,  -608,  -528,  704,   560,   -424, 392,
+  828,   40,    84,    200,   -152,  0,     -144,  584,   280,   -120, 80,
+  -556,  -972,  -196,  -472,  724,   80,    168,   -32,   88,    160,  -688,
+  0,     160,   356,   372,   -776,  740,   -128,  676,   -248,  -480, 4,
+  -364,  96,    544,   232,   -1032, 956,   236,   356,   20,    -40,  300,
+  24,    -676,  -596,  132,   1120,  -104,  532,   -1096, 568,   648,  444,
+  508,   380,   188,   -376,  -604,  1488,  424,   24,    756,   -220, -192,
+  716,   120,   920,   688,   168,   44,    -460,  568,   284,   1144, 1160,
+  600,   424,   888,   656,   -356,  -320,  220,   316,   -176,  -724, -188,
+  -816,  -628,  -348,  -228,  -380,  1012,  -452,  -660,  736,   928,  404,
+  -696,  -72,   -268,  -892,  128,   184,   -344,  -780,  360,   336,  400,
+  344,   428,   548,   -112,  136,   -228,  -216,  -820,  -516,  340,  92,
+  -136,  116,   -300,  376,   -244,  100,   -316,  -520,  -284,  -12,  824,
+  164,   -548,  -180,  -128,  116,   -924,  -828,  268,   -368,  -580, 620,
+  192,   160,   0,     -1676, 1068,  424,   -56,   -360,  468,   -156, 720,
+  288,   -528,  556,   -364,  548,   -148,  504,   316,   152,   -648, -620,
+  -684,  -24,   -376,  -384,  -108,  -920,  -1032, 768,   180,   -264, -508,
+  -1268, -260,  -60,   300,   -240,  988,   724,   -376,  -576,  -212, -736,
+  556,   192,   1092,  -620,  -880,  376,   -56,   -4,    -216,  -32,  836,
+  268,   396,   1332,  864,   -600,  100,   56,    -412,  -92,   356,  180,
+  884,   -468,  -436,  292,   -388,  -804,  -704,  -840,  368,   -348, 140,
+  -724,  1536,  940,   372,   112,   -372,  436,   -480,  1136,  296,  -32,
+  -228,  132,   -48,   -220,  868,   -1016, -60,   -1044, -464,  328,  916,
+  244,   12,    -736,  -296,  360,   468,   -376,  -108,  -92,   788,  368,
+  -56,   544,   400,   -672,  -420,  728,   16,    320,   44,    -284, -380,
+  -796,  488,   132,   204,   -596,  -372,  88,    -152,  -908,  -636, -572,
+  -624,  -116,  -692,  -200,  -56,   276,   -88,   484,   -324,  948,  864,
+  1000,  -456,  -184,  -276,  292,   -296,  156,   676,   320,   160,  908,
+  -84,   -1236, -288,  -116,  260,   -372,  -644,  732,   -756,  -96,  84,
+  344,   -520,  348,   -688,  240,   -84,   216,   -1044, -136,  -676, -396,
+  -1500, 960,   -40,   176,   168,   1516,  420,   -504,  -344,  -364, -360,
+  1216,  -940,  -380,  -212,  252,   -660,  -708,  484,   -444,  -152, 928,
+  -120,  1112,  476,   -260,  560,   -148,  -344,  108,   -196,  228,  -288,
+  504,   560,   -328,  -88,   288,   -1008, 460,   -228,  468,   -836, -196,
+  76,    388,   232,   412,   -1168, -716,  -644,  756,   -172,  -356, -504,
+  116,   432,   528,   48,    476,   -168,  -608,  448,   160,   -532, -272,
+  28,    -676,  -12,   828,   980,   456,   520,   104,   -104,  256,  -344,
+  -4,    -28,   -368,  -52,   -524,  -572,  -556,  -200,  768,   1124, -208,
+  -512,  176,   232,   248,   -148,  -888,  604,   -600,  -304,  804,  -156,
+  -212,  488,   -192,  -804,  -256,  368,   -360,  -916,  -328,  228,  -240,
+  -448,  -472,  856,   -556,  -364,  572,   -12,   -156,  -368,  -340, 432,
+  252,   -752,  -152,  288,   268,   -580,  -848,  -592,  108,   -76,  244,
+  312,   -716,  592,   -80,   436,   360,   4,     -248,  160,   516,  584,
+  732,   44,    -468,  -280,  -292,  -156,  -588,  28,    308,   912,  24,
+  124,   156,   180,   -252,  944,   -924,  -772,  -520,  -428,  -624, 300,
+  -212,  -1144, 32,    -724,  800,   -1128, -212,  -1288, -848,  180,  -416,
+  440,   192,   -576,  -792,  -76,   -1080, 80,    -532,  -352,  -132, 380,
+  -820,  148,   1112,  128,   164,   456,   700,   -924,  144,   -668, -384,
+  648,   -832,  508,   552,   -52,   -100,  -656,  208,   -568,  748,  -88,
+  680,   232,   300,   192,   -408,  -1012, -152,  -252,  -268,  272,  -876,
+  -664,  -648,  -332,  -136,  16,    12,    1152,  -28,   332,   -536, 320,
+  -672,  -460,  -316,  532,   -260,  228,   -40,   1052,  -816,  180,  88,
+  -496,  -556,  -672,  -368,  428,   92,    356,   404,   -408,  252,  196,
+  -176,  -556,  792,   268,   32,    372,   40,    96,    -332,  328,  120,
+  372,   -900,  -40,   472,   -264,  -592,  952,   128,   656,   112,  664,
+  -232,  420,   4,     -344,  -464,  556,   244,   -416,  -32,   252,  0,
+  -412,  188,   -696,  508,   -476,  324,   -1096, 656,   -312,  560,  264,
+  -136,  304,   160,   -64,   -580,  248,   336,   -720,  560,   -348, -288,
+  -276,  -196,  -500,  852,   -544,  -236,  -1128, -992,  -776,  116,  56,
+  52,    860,   884,   212,   -12,   168,   1020,  512,   -552,  924,  -148,
+  716,   188,   164,   -340,  -520,  -184,  880,   -152,  -680,  -208, -1156,
+  -300,  -528,  -472,  364,   100,   -744,  -1056, -32,   540,   280,  144,
+  -676,  -32,   -232,  -280,  -224,  96,    568,   -76,   172,   148,  148,
+  104,   32,    -296,  -32,   788,   -80,   32,    -16,   280,   288,  944,
+  428,   -484
+};
+
+static const int gauss_bits = 11;
+
+static int luma_subblock_size_y = 32;
+static int luma_subblock_size_x = 32;
+
+static int chroma_subblock_size_y = 16;
+static int chroma_subblock_size_x = 16;
+
+static const int min_luma_legal_range = 16;
+static const int max_luma_legal_range = 235;
+
+static const int min_chroma_legal_range = 16;
+static const int max_chroma_legal_range = 240;
+
+static int scaling_lut_y[256];
+static int scaling_lut_cb[256];
+static int scaling_lut_cr[256];
+
+static int grain_center;
+static int grain_min;
+static int grain_max;
+
+static uint16_t random_register = 0;  // random number generator register
+
+static void init_arrays(const aom_film_grain_t *params, int luma_stride,
+                        int chroma_stride, int ***pred_pos_luma_p,
+                        int ***pred_pos_chroma_p, int **luma_grain_block,
+                        int **cb_grain_block, int **cr_grain_block,
+                        int **y_line_buf, int **cb_line_buf, int **cr_line_buf,
+                        int **y_col_buf, int **cb_col_buf, int **cr_col_buf,
+                        int luma_grain_samples, int chroma_grain_samples,
+                        int chroma_subsamp_y, int chroma_subsamp_x) {
+  memset(scaling_lut_y, 0, sizeof(*scaling_lut_y) * 256);
+  memset(scaling_lut_cb, 0, sizeof(*scaling_lut_cb) * 256);
+  memset(scaling_lut_cr, 0, sizeof(*scaling_lut_cr) * 256);
+
+  int num_pos_luma = 2 * params->ar_coeff_lag * (params->ar_coeff_lag + 1);
+  int num_pos_chroma = num_pos_luma;
+  if (params->num_y_points > 0) ++num_pos_chroma;
+
+  int **pred_pos_luma;
+  int **pred_pos_chroma;
+
+  pred_pos_luma = (int **)aom_malloc(sizeof(*pred_pos_luma) * num_pos_luma);
+
+  for (int row = 0; row < num_pos_luma; row++) {
+    pred_pos_luma[row] = (int *)aom_malloc(sizeof(**pred_pos_luma) * 3);
+  }
+
+  pred_pos_chroma =
+      (int **)aom_malloc(sizeof(*pred_pos_chroma) * num_pos_chroma);
+
+  for (int row = 0; row < num_pos_chroma; row++) {
+    pred_pos_chroma[row] = (int *)aom_malloc(sizeof(**pred_pos_chroma) * 3);
+  }
+
+  int pos_ar_index = 0;
+
+  for (int row = -params->ar_coeff_lag; row < 0; row++) {
+    for (int col = -params->ar_coeff_lag; col < params->ar_coeff_lag + 1;
+         col++) {
+      pred_pos_luma[pos_ar_index][0] = row;
+      pred_pos_luma[pos_ar_index][1] = col;
+      pred_pos_luma[pos_ar_index][2] = 0;
+
+      pred_pos_chroma[pos_ar_index][0] = row;
+      pred_pos_chroma[pos_ar_index][1] = col;
+      pred_pos_chroma[pos_ar_index][2] = 0;
+      ++pos_ar_index;
+    }
+  }
+
+  for (int col = -params->ar_coeff_lag; col < 0; col++) {
+    pred_pos_luma[pos_ar_index][0] = 0;
+    pred_pos_luma[pos_ar_index][1] = col;
+    pred_pos_luma[pos_ar_index][2] = 0;
+
+    pred_pos_chroma[pos_ar_index][0] = 0;
+    pred_pos_chroma[pos_ar_index][1] = col;
+    pred_pos_chroma[pos_ar_index][2] = 0;
+
+    ++pos_ar_index;
+  }
+
+  if (params->num_y_points > 0) {
+    pred_pos_chroma[pos_ar_index][0] = 0;
+    pred_pos_chroma[pos_ar_index][1] = 0;
+    pred_pos_chroma[pos_ar_index][2] = 1;
+  }
+
+  *pred_pos_luma_p = pred_pos_luma;
+  *pred_pos_chroma_p = pred_pos_chroma;
+
+  *y_line_buf = (int *)aom_malloc(sizeof(**y_line_buf) * luma_stride * 2);
+  *cb_line_buf = (int *)aom_malloc(sizeof(**cb_line_buf) * chroma_stride *
+                                   (2 >> chroma_subsamp_y));
+  *cr_line_buf = (int *)aom_malloc(sizeof(**cr_line_buf) * chroma_stride *
+                                   (2 >> chroma_subsamp_y));
+
+  *y_col_buf =
+      (int *)aom_malloc(sizeof(**y_col_buf) * (luma_subblock_size_y + 2) * 2);
+  *cb_col_buf =
+      (int *)aom_malloc(sizeof(**cb_col_buf) *
+                        (chroma_subblock_size_y + (2 >> chroma_subsamp_y)) *
+                        (2 >> chroma_subsamp_x));
+  *cr_col_buf =
+      (int *)aom_malloc(sizeof(**cr_col_buf) *
+                        (chroma_subblock_size_y + (2 >> chroma_subsamp_y)) *
+                        (2 >> chroma_subsamp_x));
+
+  *luma_grain_block =
+      (int *)aom_malloc(sizeof(**luma_grain_block) * luma_grain_samples);
+  *cb_grain_block =
+      (int *)aom_malloc(sizeof(**cb_grain_block) * chroma_grain_samples);
+  *cr_grain_block =
+      (int *)aom_malloc(sizeof(**cr_grain_block) * chroma_grain_samples);
+}
+
+static void dealloc_arrays(const aom_film_grain_t *params, int ***pred_pos_luma,
+                           int ***pred_pos_chroma, int **luma_grain_block,
+                           int **cb_grain_block, int **cr_grain_block,
+                           int **y_line_buf, int **cb_line_buf,
+                           int **cr_line_buf, int **y_col_buf, int **cb_col_buf,
+                           int **cr_col_buf) {
+  int num_pos_luma = 2 * params->ar_coeff_lag * (params->ar_coeff_lag + 1);
+  int num_pos_chroma = num_pos_luma;
+  if (params->num_y_points > 0) ++num_pos_chroma;
+
+  for (int row = 0; row < num_pos_luma; row++) {
+    aom_free((*pred_pos_luma)[row]);
+  }
+  aom_free(*pred_pos_luma);
+
+  for (int row = 0; row < num_pos_chroma; row++) {
+    aom_free((*pred_pos_chroma)[row]);
+  }
+  aom_free((*pred_pos_chroma));
+
+  aom_free(*y_line_buf);
+
+  aom_free(*cb_line_buf);
+
+  aom_free(*cr_line_buf);
+
+  aom_free(*y_col_buf);
+
+  aom_free(*cb_col_buf);
+
+  aom_free(*cr_col_buf);
+
+  aom_free(*luma_grain_block);
+
+  aom_free(*cb_grain_block);
+
+  aom_free(*cr_grain_block);
+}
+
+// get a number between 0 and 2^bits - 1
+static INLINE int get_random_number(int bits) {
+  uint16_t bit;
+  bit = ((random_register >> 0) ^ (random_register >> 1) ^
+         (random_register >> 3) ^ (random_register >> 12)) &
+        1;
+  random_register = (random_register >> 1) | (bit << 15);
+  return (random_register >> (16 - bits)) & ((1 << bits) - 1);
+}
+
+static void init_random_generator(int luma_line, uint16_t seed) {
+  // same for the picture
+
+  uint16_t msb = (seed >> 8) & 255;
+  uint16_t lsb = seed & 255;
+
+  random_register = (msb << 8) + lsb;
+
+  //  changes for each row
+  int luma_num = luma_line >> 5;
+
+  random_register ^= ((luma_num * 37 + 178) & 255) << 8;
+  random_register ^= ((luma_num * 173 + 105) & 255);
+}
+
+// Return 0 for success, -1 for failure
+static int generate_luma_grain_block(
+    const aom_film_grain_t *params, int **pred_pos_luma, int *luma_grain_block,
+    int luma_block_size_y, int luma_block_size_x, int luma_grain_stride,
+    int left_pad, int top_pad, int right_pad, int bottom_pad) {
+  if (params->num_y_points == 0) {
+    memset(luma_grain_block, 0,
+           sizeof(*luma_grain_block) * luma_block_size_y * luma_grain_stride);
+    return 0;
+  }
+
+  int bit_depth = params->bit_depth;
+  int gauss_sec_shift = 12 - bit_depth + params->grain_scale_shift;
+
+  int num_pos_luma = 2 * params->ar_coeff_lag * (params->ar_coeff_lag + 1);
+  int rounding_offset = (1 << (params->ar_coeff_shift - 1));
+
+  for (int i = 0; i < luma_block_size_y; i++)
+    for (int j = 0; j < luma_block_size_x; j++)
+      luma_grain_block[i * luma_grain_stride + j] =
+          (gaussian_sequence[get_random_number(gauss_bits)] +
+           ((1 << gauss_sec_shift) >> 1)) >>
+          gauss_sec_shift;
+
+  for (int i = top_pad; i < luma_block_size_y - bottom_pad; i++)
+    for (int j = left_pad; j < luma_block_size_x - right_pad; j++) {
+      int wsum = 0;
+      for (int pos = 0; pos < num_pos_luma; pos++) {
+        wsum = wsum + params->ar_coeffs_y[pos] *
+                          luma_grain_block[(i + pred_pos_luma[pos][0]) *
+                                               luma_grain_stride +
+                                           j + pred_pos_luma[pos][1]];
+      }
+      luma_grain_block[i * luma_grain_stride + j] =
+          clamp(luma_grain_block[i * luma_grain_stride + j] +
+                    ((wsum + rounding_offset) >> params->ar_coeff_shift),
+                grain_min, grain_max);
+    }
+  return 0;
+}
+
+// Return 0 for success, -1 for failure
+static int generate_chroma_grain_blocks(
+    const aom_film_grain_t *params,
+    //                                  int** pred_pos_luma,
+    int **pred_pos_chroma, int *luma_grain_block, int *cb_grain_block,
+    int *cr_grain_block, int luma_grain_stride, int chroma_block_size_y,
+    int chroma_block_size_x, int chroma_grain_stride, int left_pad, int top_pad,
+    int right_pad, int bottom_pad, int chroma_subsamp_y, int chroma_subsamp_x) {
+  int bit_depth = params->bit_depth;
+  int gauss_sec_shift = 12 - bit_depth + params->grain_scale_shift;
+
+  int num_pos_chroma = 2 * params->ar_coeff_lag * (params->ar_coeff_lag + 1);
+  if (params->num_y_points > 0) ++num_pos_chroma;
+  int rounding_offset = (1 << (params->ar_coeff_shift - 1));
+  int chroma_grain_block_size = chroma_block_size_y * chroma_grain_stride;
+
+  if (params->num_cb_points || params->chroma_scaling_from_luma) {
+    init_random_generator(7 << 5, params->random_seed);
+
+    for (int i = 0; i < chroma_block_size_y; i++)
+      for (int j = 0; j < chroma_block_size_x; j++)
+        cb_grain_block[i * chroma_grain_stride + j] =
+            (gaussian_sequence[get_random_number(gauss_bits)] +
+             ((1 << gauss_sec_shift) >> 1)) >>
+            gauss_sec_shift;
+  } else {
+    memset(cb_grain_block, 0,
+           sizeof(*cb_grain_block) * chroma_grain_block_size);
+  }
+
+  if (params->num_cr_points || params->chroma_scaling_from_luma) {
+    init_random_generator(11 << 5, params->random_seed);
+
+    for (int i = 0; i < chroma_block_size_y; i++)
+      for (int j = 0; j < chroma_block_size_x; j++)
+        cr_grain_block[i * chroma_grain_stride + j] =
+            (gaussian_sequence[get_random_number(gauss_bits)] +
+             ((1 << gauss_sec_shift) >> 1)) >>
+            gauss_sec_shift;
+  } else {
+    memset(cr_grain_block, 0,
+           sizeof(*cr_grain_block) * chroma_grain_block_size);
+  }
+
+  for (int i = top_pad; i < chroma_block_size_y - bottom_pad; i++)
+    for (int j = left_pad; j < chroma_block_size_x - right_pad; j++) {
+      int wsum_cb = 0;
+      int wsum_cr = 0;
+      for (int pos = 0; pos < num_pos_chroma; pos++) {
+        if (pred_pos_chroma[pos][2] == 0) {
+          wsum_cb = wsum_cb + params->ar_coeffs_cb[pos] *
+                                  cb_grain_block[(i + pred_pos_chroma[pos][0]) *
+                                                     chroma_grain_stride +
+                                                 j + pred_pos_chroma[pos][1]];
+          wsum_cr = wsum_cr + params->ar_coeffs_cr[pos] *
+                                  cr_grain_block[(i + pred_pos_chroma[pos][0]) *
+                                                     chroma_grain_stride +
+                                                 j + pred_pos_chroma[pos][1]];
+        } else if (pred_pos_chroma[pos][2] == 1) {
+          int av_luma = 0;
+          int luma_coord_y = ((i - top_pad) << chroma_subsamp_y) + top_pad;
+          int luma_coord_x = ((j - left_pad) << chroma_subsamp_x) + left_pad;
+
+          for (int k = luma_coord_y; k < luma_coord_y + chroma_subsamp_y + 1;
+               k++)
+            for (int l = luma_coord_x; l < luma_coord_x + chroma_subsamp_x + 1;
+                 l++)
+              av_luma += luma_grain_block[k * luma_grain_stride + l];
+
+          av_luma =
+              (av_luma + ((1 << (chroma_subsamp_y + chroma_subsamp_x)) >> 1)) >>
+              (chroma_subsamp_y + chroma_subsamp_x);
+
+          wsum_cb = wsum_cb + params->ar_coeffs_cb[pos] * av_luma;
+          wsum_cr = wsum_cr + params->ar_coeffs_cr[pos] * av_luma;
+        } else {
+          fprintf(
+              stderr,
+              "Grain synthesis: prediction between two chroma components is "
+              "not supported!");
+          return -1;
+        }
+      }
+      if (params->num_cb_points || params->chroma_scaling_from_luma)
+        cb_grain_block[i * chroma_grain_stride + j] =
+            clamp(cb_grain_block[i * chroma_grain_stride + j] +
+                      ((wsum_cb + rounding_offset) >> params->ar_coeff_shift),
+                  grain_min, grain_max);
+      if (params->num_cr_points || params->chroma_scaling_from_luma)
+        cr_grain_block[i * chroma_grain_stride + j] =
+            clamp(cr_grain_block[i * chroma_grain_stride + j] +
+                      ((wsum_cr + rounding_offset) >> params->ar_coeff_shift),
+                  grain_min, grain_max);
+    }
+  return 0;
+}
+
+static void init_scaling_function(const int scaling_points[][2], int num_points,
+                                  int scaling_lut[]) {
+  if (num_points == 0) return;
+
+  for (int i = 0; i < scaling_points[0][0]; i++)
+    scaling_lut[i] = scaling_points[0][1];
+
+  for (int point = 0; point < num_points - 1; point++) {
+    int delta_y = scaling_points[point + 1][1] - scaling_points[point][1];
+    int delta_x = scaling_points[point + 1][0] - scaling_points[point][0];
+
+    int64_t delta = delta_y * ((65536 + (delta_x >> 1)) / delta_x);
+
+    for (int x = 0; x < delta_x; x++) {
+      scaling_lut[scaling_points[point][0] + x] =
+          scaling_points[point][1] + (int)((x * delta + 32768) >> 16);
+    }
+  }
+
+  for (int i = scaling_points[num_points - 1][0]; i < 256; i++)
+    scaling_lut[i] = scaling_points[num_points - 1][1];
+}
+
+// function that extracts samples from a LUT (and interpolates intemediate
+// frames for 10- and 12-bit video)
+static int scale_LUT(int *scaling_lut, int index, int bit_depth) {
+  int x = index >> (bit_depth - 8);
+
+  if (!(bit_depth - 8) || x == 255)
+    return scaling_lut[x];
+  else
+    return scaling_lut[x] + (((scaling_lut[x + 1] - scaling_lut[x]) *
+                                  (index & ((1 << (bit_depth - 8)) - 1)) +
+                              (1 << (bit_depth - 9))) >>
+                             (bit_depth - 8));
+}
+
+static void add_noise_to_block(const aom_film_grain_t *params, uint8_t *luma,
+                               uint8_t *cb, uint8_t *cr, int luma_stride,
+                               int chroma_stride, int *luma_grain,
+                               int *cb_grain, int *cr_grain,
+                               int luma_grain_stride, int chroma_grain_stride,
+                               int half_luma_height, int half_luma_width,
+                               int bit_depth, int chroma_subsamp_y,
+                               int chroma_subsamp_x, int mc_identity) {
+  int cb_mult = params->cb_mult - 128;            // fixed scale
+  int cb_luma_mult = params->cb_luma_mult - 128;  // fixed scale
+  int cb_offset = params->cb_offset - 256;
+
+  int cr_mult = params->cr_mult - 128;            // fixed scale
+  int cr_luma_mult = params->cr_luma_mult - 128;  // fixed scale
+  int cr_offset = params->cr_offset - 256;
+
+  int rounding_offset = (1 << (params->scaling_shift - 1));
+
+  int apply_y = params->num_y_points > 0 ? 1 : 0;
+  int apply_cb =
+      (params->num_cb_points > 0 || params->chroma_scaling_from_luma) ? 1 : 0;
+  int apply_cr =
+      (params->num_cr_points > 0 || params->chroma_scaling_from_luma) ? 1 : 0;
+
+  if (params->chroma_scaling_from_luma) {
+    cb_mult = 0;        // fixed scale
+    cb_luma_mult = 64;  // fixed scale
+    cb_offset = 0;
+
+    cr_mult = 0;        // fixed scale
+    cr_luma_mult = 64;  // fixed scale
+    cr_offset = 0;
+  }
+
+  int min_luma, max_luma, min_chroma, max_chroma;
+
+  if (params->clip_to_restricted_range) {
+    min_luma = min_luma_legal_range;
+    max_luma = max_luma_legal_range;
+
+    if (mc_identity) {
+      min_chroma = min_luma_legal_range;
+      max_chroma = max_luma_legal_range;
+    } else {
+      min_chroma = min_chroma_legal_range;
+      max_chroma = max_chroma_legal_range;
+    }
+  } else {
+    min_luma = min_chroma = 0;
+    max_luma = max_chroma = 255;
+  }
+
+  for (int i = 0; i < (half_luma_height << (1 - chroma_subsamp_y)); i++) {
+    for (int j = 0; j < (half_luma_width << (1 - chroma_subsamp_x)); j++) {
+      int average_luma = 0;
+      if (chroma_subsamp_x) {
+        average_luma = (luma[(i << chroma_subsamp_y) * luma_stride +
+                             (j << chroma_subsamp_x)] +
+                        luma[(i << chroma_subsamp_y) * luma_stride +
+                             (j << chroma_subsamp_x) + 1] +
+                        1) >>
+                       1;
+      } else {
+        average_luma = luma[(i << chroma_subsamp_y) * luma_stride + j];
+      }
+
+      if (apply_cb) {
+        cb[i * chroma_stride + j] = clamp(
+            cb[i * chroma_stride + j] +
+                ((scale_LUT(scaling_lut_cb,
+                            clamp(((average_luma * cb_luma_mult +
+                                    cb_mult * cb[i * chroma_stride + j]) >>
+                                   6) +
+                                      cb_offset,
+                                  0, (256 << (bit_depth - 8)) - 1),
+                            8) *
+                      cb_grain[i * chroma_grain_stride + j] +
+                  rounding_offset) >>
+                 params->scaling_shift),
+            min_chroma, max_chroma);
+      }
+
+      if (apply_cr) {
+        cr[i * chroma_stride + j] = clamp(
+            cr[i * chroma_stride + j] +
+                ((scale_LUT(scaling_lut_cr,
+                            clamp(((average_luma * cr_luma_mult +
+                                    cr_mult * cr[i * chroma_stride + j]) >>
+                                   6) +
+                                      cr_offset,
+                                  0, (256 << (bit_depth - 8)) - 1),
+                            8) *
+                      cr_grain[i * chroma_grain_stride + j] +
+                  rounding_offset) >>
+                 params->scaling_shift),
+            min_chroma, max_chroma);
+      }
+    }
+  }
+
+  if (apply_y) {
+    for (int i = 0; i < (half_luma_height << 1); i++) {
+      for (int j = 0; j < (half_luma_width << 1); j++) {
+        luma[i * luma_stride + j] =
+            clamp(luma[i * luma_stride + j] +
+                      ((scale_LUT(scaling_lut_y, luma[i * luma_stride + j], 8) *
+                            luma_grain[i * luma_grain_stride + j] +
+                        rounding_offset) >>
+                       params->scaling_shift),
+                  min_luma, max_luma);
+      }
+    }
+  }
+}
+
+static void add_noise_to_block_hbd(
+    const aom_film_grain_t *params, uint16_t *luma, uint16_t *cb, uint16_t *cr,
+    int luma_stride, int chroma_stride, int *luma_grain, int *cb_grain,
+    int *cr_grain, int luma_grain_stride, int chroma_grain_stride,
+    int half_luma_height, int half_luma_width, int bit_depth,
+    int chroma_subsamp_y, int chroma_subsamp_x, int mc_identity) {
+  int cb_mult = params->cb_mult - 128;            // fixed scale
+  int cb_luma_mult = params->cb_luma_mult - 128;  // fixed scale
+  // offset value depends on the bit depth
+  int cb_offset = (params->cb_offset << (bit_depth - 8)) - (1 << bit_depth);
+
+  int cr_mult = params->cr_mult - 128;            // fixed scale
+  int cr_luma_mult = params->cr_luma_mult - 128;  // fixed scale
+  // offset value depends on the bit depth
+  int cr_offset = (params->cr_offset << (bit_depth - 8)) - (1 << bit_depth);
+
+  int rounding_offset = (1 << (params->scaling_shift - 1));
+
+  int apply_y = params->num_y_points > 0 ? 1 : 0;
+  int apply_cb =
+      (params->num_cb_points > 0 || params->chroma_scaling_from_luma) > 0 ? 1
+                                                                          : 0;
+  int apply_cr =
+      (params->num_cr_points > 0 || params->chroma_scaling_from_luma) > 0 ? 1
+                                                                          : 0;
+
+  if (params->chroma_scaling_from_luma) {
+    cb_mult = 0;        // fixed scale
+    cb_luma_mult = 64;  // fixed scale
+    cb_offset = 0;
+
+    cr_mult = 0;        // fixed scale
+    cr_luma_mult = 64;  // fixed scale
+    cr_offset = 0;
+  }
+
+  int min_luma, max_luma, min_chroma, max_chroma;
+
+  if (params->clip_to_restricted_range) {
+    min_luma = min_luma_legal_range << (bit_depth - 8);
+    max_luma = max_luma_legal_range << (bit_depth - 8);
+
+    if (mc_identity) {
+      min_chroma = min_luma_legal_range << (bit_depth - 8);
+      max_chroma = max_luma_legal_range << (bit_depth - 8);
+    } else {
+      min_chroma = min_chroma_legal_range << (bit_depth - 8);
+      max_chroma = max_chroma_legal_range << (bit_depth - 8);
+    }
+  } else {
+    min_luma = min_chroma = 0;
+    max_luma = max_chroma = (256 << (bit_depth - 8)) - 1;
+  }
+
+  for (int i = 0; i < (half_luma_height << (1 - chroma_subsamp_y)); i++) {
+    for (int j = 0; j < (half_luma_width << (1 - chroma_subsamp_x)); j++) {
+      int average_luma = 0;
+      if (chroma_subsamp_x) {
+        average_luma = (luma[(i << chroma_subsamp_y) * luma_stride +
+                             (j << chroma_subsamp_x)] +
+                        luma[(i << chroma_subsamp_y) * luma_stride +
+                             (j << chroma_subsamp_x) + 1] +
+                        1) >>
+                       1;
+      } else {
+        average_luma = luma[(i << chroma_subsamp_y) * luma_stride + j];
+      }
+
+      if (apply_cb) {
+        cb[i * chroma_stride + j] = clamp(
+            cb[i * chroma_stride + j] +
+                ((scale_LUT(scaling_lut_cb,
+                            clamp(((average_luma * cb_luma_mult +
+                                    cb_mult * cb[i * chroma_stride + j]) >>
+                                   6) +
+                                      cb_offset,
+                                  0, (256 << (bit_depth - 8)) - 1),
+                            bit_depth) *
+                      cb_grain[i * chroma_grain_stride + j] +
+                  rounding_offset) >>
+                 params->scaling_shift),
+            min_chroma, max_chroma);
+      }
+      if (apply_cr) {
+        cr[i * chroma_stride + j] = clamp(
+            cr[i * chroma_stride + j] +
+                ((scale_LUT(scaling_lut_cr,
+                            clamp(((average_luma * cr_luma_mult +
+                                    cr_mult * cr[i * chroma_stride + j]) >>
+                                   6) +
+                                      cr_offset,
+                                  0, (256 << (bit_depth - 8)) - 1),
+                            bit_depth) *
+                      cr_grain[i * chroma_grain_stride + j] +
+                  rounding_offset) >>
+                 params->scaling_shift),
+            min_chroma, max_chroma);
+      }
+    }
+  }
+
+  if (apply_y) {
+    for (int i = 0; i < (half_luma_height << 1); i++) {
+      for (int j = 0; j < (half_luma_width << 1); j++) {
+        luma[i * luma_stride + j] =
+            clamp(luma[i * luma_stride + j] +
+                      ((scale_LUT(scaling_lut_y, luma[i * luma_stride + j],
+                                  bit_depth) *
+                            luma_grain[i * luma_grain_stride + j] +
+                        rounding_offset) >>
+                       params->scaling_shift),
+                  min_luma, max_luma);
+      }
+    }
+  }
+}
+
+static void copy_rect(uint8_t *src, int src_stride, uint8_t *dst,
+                      int dst_stride, int width, int height,
+                      int use_high_bit_depth) {
+  int hbd_coeff = use_high_bit_depth ? 2 : 1;
+  while (height) {
+    memcpy(dst, src, width * sizeof(uint8_t) * hbd_coeff);
+    src += src_stride;
+    dst += dst_stride;
+    --height;
+  }
+  return;
+}
+
+static void copy_area(int *src, int src_stride, int *dst, int dst_stride,
+                      int width, int height) {
+  while (height) {
+    memcpy(dst, src, width * sizeof(*src));
+    src += src_stride;
+    dst += dst_stride;
+    --height;
+  }
+  return;
+}
+
+static void extend_even(uint8_t *dst, int dst_stride, int width, int height,
+                        int use_high_bit_depth) {
+  if ((width & 1) == 0 && (height & 1) == 0) return;
+  if (use_high_bit_depth) {
+    uint16_t *dst16 = (uint16_t *)dst;
+    int dst16_stride = dst_stride / 2;
+    if (width & 1) {
+      for (int i = 0; i < height; ++i)
+        dst16[i * dst16_stride + width] = dst16[i * dst16_stride + width - 1];
+    }
+    width = (width + 1) & (~1);
+    if (height & 1) {
+      memcpy(&dst16[height * dst16_stride], &dst16[(height - 1) * dst16_stride],
+             sizeof(*dst16) * width);
+    }
+  } else {
+    if (width & 1) {
+      for (int i = 0; i < height; ++i)
+        dst[i * dst_stride + width] = dst[i * dst_stride + width - 1];
+    }
+    width = (width + 1) & (~1);
+    if (height & 1) {
+      memcpy(&dst[height * dst_stride], &dst[(height - 1) * dst_stride],
+             sizeof(*dst) * width);
+    }
+  }
+}
+
+static void ver_boundary_overlap(int *left_block, int left_stride,
+                                 int *right_block, int right_stride,
+                                 int *dst_block, int dst_stride, int width,
+                                 int height) {
+  if (width == 1) {
+    while (height) {
+      *dst_block = clamp((*left_block * 23 + *right_block * 22 + 16) >> 5,
+                         grain_min, grain_max);
+      left_block += left_stride;
+      right_block += right_stride;
+      dst_block += dst_stride;
+      --height;
+    }
+    return;
+  } else if (width == 2) {
+    while (height) {
+      dst_block[0] = clamp((27 * left_block[0] + 17 * right_block[0] + 16) >> 5,
+                           grain_min, grain_max);
+      dst_block[1] = clamp((17 * left_block[1] + 27 * right_block[1] + 16) >> 5,
+                           grain_min, grain_max);
+      left_block += left_stride;
+      right_block += right_stride;
+      dst_block += dst_stride;
+      --height;
+    }
+    return;
+  }
+}
+
+static void hor_boundary_overlap(int *top_block, int top_stride,
+                                 int *bottom_block, int bottom_stride,
+                                 int *dst_block, int dst_stride, int width,
+                                 int height) {
+  if (height == 1) {
+    while (width) {
+      *dst_block = clamp((*top_block * 23 + *bottom_block * 22 + 16) >> 5,
+                         grain_min, grain_max);
+      ++top_block;
+      ++bottom_block;
+      ++dst_block;
+      --width;
+    }
+    return;
+  } else if (height == 2) {
+    while (width) {
+      dst_block[0] = clamp((27 * top_block[0] + 17 * bottom_block[0] + 16) >> 5,
+                           grain_min, grain_max);
+      dst_block[dst_stride] = clamp((17 * top_block[top_stride] +
+                                     27 * bottom_block[bottom_stride] + 16) >>
+                                        5,
+                                    grain_min, grain_max);
+      ++top_block;
+      ++bottom_block;
+      ++dst_block;
+      --width;
+    }
+    return;
+  }
+}
+
+int av1_add_film_grain(const aom_film_grain_t *params, const aom_image_t *src,
+                       aom_image_t *dst) {
+  uint8_t *luma, *cb, *cr;
+  int height, width, luma_stride, chroma_stride;
+  int use_high_bit_depth = 0;
+  int chroma_subsamp_x = 0;
+  int chroma_subsamp_y = 0;
+  int mc_identity = src->mc == AOM_CICP_MC_IDENTITY ? 1 : 0;
+
+  switch (src->fmt) {
+    case AOM_IMG_FMT_AOMI420:
+    case AOM_IMG_FMT_I420:
+      use_high_bit_depth = 0;
+      chroma_subsamp_x = 1;
+      chroma_subsamp_y = 1;
+      break;
+    case AOM_IMG_FMT_I42016:
+      use_high_bit_depth = 1;
+      chroma_subsamp_x = 1;
+      chroma_subsamp_y = 1;
+      break;
+      //    case AOM_IMG_FMT_444A:
+    case AOM_IMG_FMT_I444:
+      use_high_bit_depth = 0;
+      chroma_subsamp_x = 0;
+      chroma_subsamp_y = 0;
+      break;
+    case AOM_IMG_FMT_I44416:
+      use_high_bit_depth = 1;
+      chroma_subsamp_x = 0;
+      chroma_subsamp_y = 0;
+      break;
+    case AOM_IMG_FMT_I422:
+      use_high_bit_depth = 0;
+      chroma_subsamp_x = 1;
+      chroma_subsamp_y = 0;
+      break;
+    case AOM_IMG_FMT_I42216:
+      use_high_bit_depth = 1;
+      chroma_subsamp_x = 1;
+      chroma_subsamp_y = 0;
+      break;
+    default:  // unknown input format
+      fprintf(stderr, "Film grain error: input format is not supported!");
+      return -1;
+  }
+
+  assert(params->bit_depth == src->bit_depth);
+
+  dst->fmt = src->fmt;
+  dst->bit_depth = src->bit_depth;
+
+  dst->r_w = src->r_w;
+  dst->r_h = src->r_h;
+  dst->d_w = src->d_w;
+  dst->d_h = src->d_h;
+
+  dst->cp = src->cp;
+  dst->tc = src->tc;
+  dst->mc = src->mc;
+
+  dst->monochrome = src->monochrome;
+  dst->csp = src->csp;
+  dst->range = src->range;
+
+  dst->x_chroma_shift = src->x_chroma_shift;
+  dst->y_chroma_shift = src->y_chroma_shift;
+
+  dst->temporal_id = src->temporal_id;
+  dst->spatial_id = src->spatial_id;
+
+  width = src->d_w % 2 ? src->d_w + 1 : src->d_w;
+  height = src->d_h % 2 ? src->d_h + 1 : src->d_h;
+
+  copy_rect(src->planes[AOM_PLANE_Y], src->stride[AOM_PLANE_Y],
+            dst->planes[AOM_PLANE_Y], dst->stride[AOM_PLANE_Y], src->d_w,
+            src->d_h, use_high_bit_depth);
+  // Note that dst is already assumed to be aligned to even.
+  extend_even(dst->planes[AOM_PLANE_Y], dst->stride[AOM_PLANE_Y], src->d_w,
+              src->d_h, use_high_bit_depth);
+
+  if (!src->monochrome) {
+    copy_rect(src->planes[AOM_PLANE_U], src->stride[AOM_PLANE_U],
+              dst->planes[AOM_PLANE_U], dst->stride[AOM_PLANE_U],
+              width >> chroma_subsamp_x, height >> chroma_subsamp_y,
+              use_high_bit_depth);
+
+    copy_rect(src->planes[AOM_PLANE_V], src->stride[AOM_PLANE_V],
+              dst->planes[AOM_PLANE_V], dst->stride[AOM_PLANE_V],
+              width >> chroma_subsamp_x, height >> chroma_subsamp_y,
+              use_high_bit_depth);
+  }
+
+  luma = dst->planes[AOM_PLANE_Y];
+  cb = dst->planes[AOM_PLANE_U];
+  cr = dst->planes[AOM_PLANE_V];
+
+  // luma and chroma strides in samples
+  luma_stride = dst->stride[AOM_PLANE_Y] >> use_high_bit_depth;
+  chroma_stride = dst->stride[AOM_PLANE_U] >> use_high_bit_depth;
+
+  return av1_add_film_grain_run(
+      params, luma, cb, cr, height, width, luma_stride, chroma_stride,
+      use_high_bit_depth, chroma_subsamp_y, chroma_subsamp_x, mc_identity);
+}
+
+int av1_add_film_grain_run(const aom_film_grain_t *params, uint8_t *luma,
+                           uint8_t *cb, uint8_t *cr, int height, int width,
+                           int luma_stride, int chroma_stride,
+                           int use_high_bit_depth, int chroma_subsamp_y,
+                           int chroma_subsamp_x, int mc_identity) {
+  int **pred_pos_luma;
+  int **pred_pos_chroma;
+  int *luma_grain_block;
+  int *cb_grain_block;
+  int *cr_grain_block;
+
+  int *y_line_buf;
+  int *cb_line_buf;
+  int *cr_line_buf;
+
+  int *y_col_buf;
+  int *cb_col_buf;
+  int *cr_col_buf;
+
+  random_register = params->random_seed;
+
+  int left_pad = 3;
+  int right_pad = 3;  // padding to offset for AR coefficients
+  int top_pad = 3;
+  int bottom_pad = 0;
+
+  int ar_padding = 3;  // maximum lag used for stabilization of AR coefficients
+
+  luma_subblock_size_y = 32;
+  luma_subblock_size_x = 32;
+
+  chroma_subblock_size_y = luma_subblock_size_y >> chroma_subsamp_y;
+  chroma_subblock_size_x = luma_subblock_size_x >> chroma_subsamp_x;
+
+  // Initial padding is only needed for generation of
+  // film grain templates (to stabilize the AR process)
+  // Only a 64x64 luma and 32x32 chroma part of a template
+  // is used later for adding grain, padding can be discarded
+
+  int luma_block_size_y =
+      top_pad + 2 * ar_padding + luma_subblock_size_y * 2 + bottom_pad;
+  int luma_block_size_x = left_pad + 2 * ar_padding + luma_subblock_size_x * 2 +
+                          2 * ar_padding + right_pad;
+
+  int chroma_block_size_y = top_pad + (2 >> chroma_subsamp_y) * ar_padding +
+                            chroma_subblock_size_y * 2 + bottom_pad;
+  int chroma_block_size_x = left_pad + (2 >> chroma_subsamp_x) * ar_padding +
+                            chroma_subblock_size_x * 2 +
+                            (2 >> chroma_subsamp_x) * ar_padding + right_pad;
+
+  int luma_grain_stride = luma_block_size_x;
+  int chroma_grain_stride = chroma_block_size_x;
+
+  int overlap = params->overlap_flag;
+  int bit_depth = params->bit_depth;
+
+  grain_center = 128 << (bit_depth - 8);
+  grain_min = 0 - grain_center;
+  grain_max = (256 << (bit_depth - 8)) - 1 - grain_center;
+
+  init_arrays(params, luma_stride, chroma_stride, &pred_pos_luma,
+              &pred_pos_chroma, &luma_grain_block, &cb_grain_block,
+              &cr_grain_block, &y_line_buf, &cb_line_buf, &cr_line_buf,
+              &y_col_buf, &cb_col_buf, &cr_col_buf,
+              luma_block_size_y * luma_block_size_x,
+              chroma_block_size_y * chroma_block_size_x, chroma_subsamp_y,
+              chroma_subsamp_x);
+
+  if (generate_luma_grain_block(params, pred_pos_luma, luma_grain_block,
+                                luma_block_size_y, luma_block_size_x,
+                                luma_grain_stride, left_pad, top_pad, right_pad,
+                                bottom_pad))
+    return -1;
+
+  if (generate_chroma_grain_blocks(
+          params,
+          //                               pred_pos_luma,
+          pred_pos_chroma, luma_grain_block, cb_grain_block, cr_grain_block,
+          luma_grain_stride, chroma_block_size_y, chroma_block_size_x,
+          chroma_grain_stride, left_pad, top_pad, right_pad, bottom_pad,
+          chroma_subsamp_y, chroma_subsamp_x))
+    return -1;
+
+  init_scaling_function(params->scaling_points_y, params->num_y_points,
+                        scaling_lut_y);
+
+  if (params->chroma_scaling_from_luma) {
+    memcpy(scaling_lut_cb, scaling_lut_y, sizeof(*scaling_lut_y) * 256);
+    memcpy(scaling_lut_cr, scaling_lut_y, sizeof(*scaling_lut_y) * 256);
+  } else {
+    init_scaling_function(params->scaling_points_cb, params->num_cb_points,
+                          scaling_lut_cb);
+    init_scaling_function(params->scaling_points_cr, params->num_cr_points,
+                          scaling_lut_cr);
+  }
+  for (int y = 0; y < height / 2; y += (luma_subblock_size_y >> 1)) {
+    init_random_generator(y * 2, params->random_seed);
+
+    for (int x = 0; x < width / 2; x += (luma_subblock_size_x >> 1)) {
+      int offset_y = get_random_number(8);
+      int offset_x = (offset_y >> 4) & 15;
+      offset_y &= 15;
+
+      int luma_offset_y = left_pad + 2 * ar_padding + (offset_y << 1);
+      int luma_offset_x = top_pad + 2 * ar_padding + (offset_x << 1);
+
+      int chroma_offset_y = top_pad + (2 >> chroma_subsamp_y) * ar_padding +
+                            offset_y * (2 >> chroma_subsamp_y);
+      int chroma_offset_x = left_pad + (2 >> chroma_subsamp_x) * ar_padding +
+                            offset_x * (2 >> chroma_subsamp_x);
+
+      if (overlap && x) {
+        ver_boundary_overlap(
+            y_col_buf, 2,
+            luma_grain_block + luma_offset_y * luma_grain_stride +
+                luma_offset_x,
+            luma_grain_stride, y_col_buf, 2, 2,
+            AOMMIN(luma_subblock_size_y + 2, height - (y << 1)));
+
+        ver_boundary_overlap(
+            cb_col_buf, 2 >> chroma_subsamp_x,
+            cb_grain_block + chroma_offset_y * chroma_grain_stride +
+                chroma_offset_x,
+            chroma_grain_stride, cb_col_buf, 2 >> chroma_subsamp_x,
+            2 >> chroma_subsamp_x,
+            AOMMIN(chroma_subblock_size_y + (2 >> chroma_subsamp_y),
+                   (height - (y << 1)) >> chroma_subsamp_y));
+
+        ver_boundary_overlap(
+            cr_col_buf, 2 >> chroma_subsamp_x,
+            cr_grain_block + chroma_offset_y * chroma_grain_stride +
+                chroma_offset_x,
+            chroma_grain_stride, cr_col_buf, 2 >> chroma_subsamp_x,
+            2 >> chroma_subsamp_x,
+            AOMMIN(chroma_subblock_size_y + (2 >> chroma_subsamp_y),
+                   (height - (y << 1)) >> chroma_subsamp_y));
+
+        int i = y ? 1 : 0;
+
+        if (use_high_bit_depth) {
+          add_noise_to_block_hbd(
+              params,
+              (uint16_t *)luma + ((y + i) << 1) * luma_stride + (x << 1),
+              (uint16_t *)cb +
+                  ((y + i) << (1 - chroma_subsamp_y)) * chroma_stride +
+                  (x << (1 - chroma_subsamp_x)),
+              (uint16_t *)cr +
+                  ((y + i) << (1 - chroma_subsamp_y)) * chroma_stride +
+                  (x << (1 - chroma_subsamp_x)),
+              luma_stride, chroma_stride, y_col_buf + i * 4,
+              cb_col_buf + i * (2 - chroma_subsamp_y) * (2 - chroma_subsamp_x),
+              cr_col_buf + i * (2 - chroma_subsamp_y) * (2 - chroma_subsamp_x),
+              2, (2 - chroma_subsamp_x),
+              AOMMIN(luma_subblock_size_y >> 1, height / 2 - y) - i, 1,
+              bit_depth, chroma_subsamp_y, chroma_subsamp_x, mc_identity);
+        } else {
+          add_noise_to_block(
+              params, luma + ((y + i) << 1) * luma_stride + (x << 1),
+              cb + ((y + i) << (1 - chroma_subsamp_y)) * chroma_stride +
+                  (x << (1 - chroma_subsamp_x)),
+              cr + ((y + i) << (1 - chroma_subsamp_y)) * chroma_stride +
+                  (x << (1 - chroma_subsamp_x)),
+              luma_stride, chroma_stride, y_col_buf + i * 4,
+              cb_col_buf + i * (2 - chroma_subsamp_y) * (2 - chroma_subsamp_x),
+              cr_col_buf + i * (2 - chroma_subsamp_y) * (2 - chroma_subsamp_x),
+              2, (2 - chroma_subsamp_x),
+              AOMMIN(luma_subblock_size_y >> 1, height / 2 - y) - i, 1,
+              bit_depth, chroma_subsamp_y, chroma_subsamp_x, mc_identity);
+        }
+      }
+
+      if (overlap && y) {
+        if (x) {
+          hor_boundary_overlap(y_line_buf + (x << 1), luma_stride, y_col_buf, 2,
+                               y_line_buf + (x << 1), luma_stride, 2, 2);
+
+          hor_boundary_overlap(cb_line_buf + x * (2 >> chroma_subsamp_x),
+                               chroma_stride, cb_col_buf, 2 >> chroma_subsamp_x,
+                               cb_line_buf + x * (2 >> chroma_subsamp_x),
+                               chroma_stride, 2 >> chroma_subsamp_x,
+                               2 >> chroma_subsamp_y);
+
+          hor_boundary_overlap(cr_line_buf + x * (2 >> chroma_subsamp_x),
+                               chroma_stride, cr_col_buf, 2 >> chroma_subsamp_x,
+                               cr_line_buf + x * (2 >> chroma_subsamp_x),
+                               chroma_stride, 2 >> chroma_subsamp_x,
+                               2 >> chroma_subsamp_y);
+        }
+
+        hor_boundary_overlap(
+            y_line_buf + ((x ? x + 1 : 0) << 1), luma_stride,
+            luma_grain_block + luma_offset_y * luma_grain_stride +
+                luma_offset_x + (x ? 2 : 0),
+            luma_grain_stride, y_line_buf + ((x ? x + 1 : 0) << 1), luma_stride,
+            AOMMIN(luma_subblock_size_x - ((x ? 1 : 0) << 1),
+                   width - ((x ? x + 1 : 0) << 1)),
+            2);
+
+        hor_boundary_overlap(
+            cb_line_buf + ((x ? x + 1 : 0) << (1 - chroma_subsamp_x)),
+            chroma_stride,
+            cb_grain_block + chroma_offset_y * chroma_grain_stride +
+                chroma_offset_x + ((x ? 1 : 0) << (1 - chroma_subsamp_x)),
+            chroma_grain_stride,
+            cb_line_buf + ((x ? x + 1 : 0) << (1 - chroma_subsamp_x)),
+            chroma_stride,
+            AOMMIN(chroma_subblock_size_x -
+                       ((x ? 1 : 0) << (1 - chroma_subsamp_x)),
+                   (width - ((x ? x + 1 : 0) << 1)) >> chroma_subsamp_x),
+            2 >> chroma_subsamp_y);
+
+        hor_boundary_overlap(
+            cr_line_buf + ((x ? x + 1 : 0) << (1 - chroma_subsamp_x)),
+            chroma_stride,
+            cr_grain_block + chroma_offset_y * chroma_grain_stride +
+                chroma_offset_x + ((x ? 1 : 0) << (1 - chroma_subsamp_x)),
+            chroma_grain_stride,
+            cr_line_buf + ((x ? x + 1 : 0) << (1 - chroma_subsamp_x)),
+            chroma_stride,
+            AOMMIN(chroma_subblock_size_x -
+                       ((x ? 1 : 0) << (1 - chroma_subsamp_x)),
+                   (width - ((x ? x + 1 : 0) << 1)) >> chroma_subsamp_x),
+            2 >> chroma_subsamp_y);
+
+        if (use_high_bit_depth) {
+          add_noise_to_block_hbd(
+              params, (uint16_t *)luma + (y << 1) * luma_stride + (x << 1),
+              (uint16_t *)cb + (y << (1 - chroma_subsamp_y)) * chroma_stride +
+                  (x << ((1 - chroma_subsamp_x))),
+              (uint16_t *)cr + (y << (1 - chroma_subsamp_y)) * chroma_stride +
+                  (x << ((1 - chroma_subsamp_x))),
+              luma_stride, chroma_stride, y_line_buf + (x << 1),
+              cb_line_buf + (x << (1 - chroma_subsamp_x)),
+              cr_line_buf + (x << (1 - chroma_subsamp_x)), luma_stride,
+              chroma_stride, 1,
+              AOMMIN(luma_subblock_size_x >> 1, width / 2 - x), bit_depth,
+              chroma_subsamp_y, chroma_subsamp_x, mc_identity);
+        } else {
+          add_noise_to_block(
+              params, luma + (y << 1) * luma_stride + (x << 1),
+              cb + (y << (1 - chroma_subsamp_y)) * chroma_stride +
+                  (x << ((1 - chroma_subsamp_x))),
+              cr + (y << (1 - chroma_subsamp_y)) * chroma_stride +
+                  (x << ((1 - chroma_subsamp_x))),
+              luma_stride, chroma_stride, y_line_buf + (x << 1),
+              cb_line_buf + (x << (1 - chroma_subsamp_x)),
+              cr_line_buf + (x << (1 - chroma_subsamp_x)), luma_stride,
+              chroma_stride, 1,
+              AOMMIN(luma_subblock_size_x >> 1, width / 2 - x), bit_depth,
+              chroma_subsamp_y, chroma_subsamp_x, mc_identity);
+        }
+      }
+
+      int i = overlap && y ? 1 : 0;
+      int j = overlap && x ? 1 : 0;
+
+      if (use_high_bit_depth) {
+        add_noise_to_block_hbd(
+            params,
+            (uint16_t *)luma + ((y + i) << 1) * luma_stride + ((x + j) << 1),
+            (uint16_t *)cb +
+                ((y + i) << (1 - chroma_subsamp_y)) * chroma_stride +
+                ((x + j) << (1 - chroma_subsamp_x)),
+            (uint16_t *)cr +
+                ((y + i) << (1 - chroma_subsamp_y)) * chroma_stride +
+                ((x + j) << (1 - chroma_subsamp_x)),
+            luma_stride, chroma_stride,
+            luma_grain_block + (luma_offset_y + (i << 1)) * luma_grain_stride +
+                luma_offset_x + (j << 1),
+            cb_grain_block +
+                (chroma_offset_y + (i << (1 - chroma_subsamp_y))) *
+                    chroma_grain_stride +
+                chroma_offset_x + (j << (1 - chroma_subsamp_x)),
+            cr_grain_block +
+                (chroma_offset_y + (i << (1 - chroma_subsamp_y))) *
+                    chroma_grain_stride +
+                chroma_offset_x + (j << (1 - chroma_subsamp_x)),
+            luma_grain_stride, chroma_grain_stride,
+            AOMMIN(luma_subblock_size_y >> 1, height / 2 - y) - i,
+            AOMMIN(luma_subblock_size_x >> 1, width / 2 - x) - j, bit_depth,
+            chroma_subsamp_y, chroma_subsamp_x, mc_identity);
+      } else {
+        add_noise_to_block(
+            params, luma + ((y + i) << 1) * luma_stride + ((x + j) << 1),
+            cb + ((y + i) << (1 - chroma_subsamp_y)) * chroma_stride +
+                ((x + j) << (1 - chroma_subsamp_x)),
+            cr + ((y + i) << (1 - chroma_subsamp_y)) * chroma_stride +
+                ((x + j) << (1 - chroma_subsamp_x)),
+            luma_stride, chroma_stride,
+            luma_grain_block + (luma_offset_y + (i << 1)) * luma_grain_stride +
+                luma_offset_x + (j << 1),
+            cb_grain_block +
+                (chroma_offset_y + (i << (1 - chroma_subsamp_y))) *
+                    chroma_grain_stride +
+                chroma_offset_x + (j << (1 - chroma_subsamp_x)),
+            cr_grain_block +
+                (chroma_offset_y + (i << (1 - chroma_subsamp_y))) *
+                    chroma_grain_stride +
+                chroma_offset_x + (j << (1 - chroma_subsamp_x)),
+            luma_grain_stride, chroma_grain_stride,
+            AOMMIN(luma_subblock_size_y >> 1, height / 2 - y) - i,
+            AOMMIN(luma_subblock_size_x >> 1, width / 2 - x) - j, bit_depth,
+            chroma_subsamp_y, chroma_subsamp_x, mc_identity);
+      }
+
+      if (overlap) {
+        if (x) {
+          // Copy overlapped column bufer to line buffer
+          copy_area(y_col_buf + (luma_subblock_size_y << 1), 2,
+                    y_line_buf + (x << 1), luma_stride, 2, 2);
+
+          copy_area(
+              cb_col_buf + (chroma_subblock_size_y << (1 - chroma_subsamp_x)),
+              2 >> chroma_subsamp_x,
+              cb_line_buf + (x << (1 - chroma_subsamp_x)), chroma_stride,
+              2 >> chroma_subsamp_x, 2 >> chroma_subsamp_y);
+
+          copy_area(
+              cr_col_buf + (chroma_subblock_size_y << (1 - chroma_subsamp_x)),
+              2 >> chroma_subsamp_x,
+              cr_line_buf + (x << (1 - chroma_subsamp_x)), chroma_stride,
+              2 >> chroma_subsamp_x, 2 >> chroma_subsamp_y);
+        }
+
+        // Copy grain to the line buffer for overlap with a bottom block
+        copy_area(
+            luma_grain_block +
+                (luma_offset_y + luma_subblock_size_y) * luma_grain_stride +
+                luma_offset_x + ((x ? 2 : 0)),
+            luma_grain_stride, y_line_buf + ((x ? x + 1 : 0) << 1), luma_stride,
+            AOMMIN(luma_subblock_size_x, width - (x << 1)) - (x ? 2 : 0), 2);
+
+        copy_area(cb_grain_block +
+                      (chroma_offset_y + chroma_subblock_size_y) *
+                          chroma_grain_stride +
+                      chroma_offset_x + (x ? 2 >> chroma_subsamp_x : 0),
+                  chroma_grain_stride,
+                  cb_line_buf + ((x ? x + 1 : 0) << (1 - chroma_subsamp_x)),
+                  chroma_stride,
+                  AOMMIN(chroma_subblock_size_x,
+                         ((width - (x << 1)) >> chroma_subsamp_x)) -
+                      (x ? 2 >> chroma_subsamp_x : 0),
+                  2 >> chroma_subsamp_y);
+
+        copy_area(cr_grain_block +
+                      (chroma_offset_y + chroma_subblock_size_y) *
+                          chroma_grain_stride +
+                      chroma_offset_x + (x ? 2 >> chroma_subsamp_x : 0),
+                  chroma_grain_stride,
+                  cr_line_buf + ((x ? x + 1 : 0) << (1 - chroma_subsamp_x)),
+                  chroma_stride,
+                  AOMMIN(chroma_subblock_size_x,
+                         ((width - (x << 1)) >> chroma_subsamp_x)) -
+                      (x ? 2 >> chroma_subsamp_x : 0),
+                  2 >> chroma_subsamp_y);
+
+        // Copy grain to the column buffer for overlap with the next block to
+        // the right
+
+        copy_area(luma_grain_block + luma_offset_y * luma_grain_stride +
+                      luma_offset_x + luma_subblock_size_x,
+                  luma_grain_stride, y_col_buf, 2, 2,
+                  AOMMIN(luma_subblock_size_y + 2, height - (y << 1)));
+
+        copy_area(cb_grain_block + chroma_offset_y * chroma_grain_stride +
+                      chroma_offset_x + chroma_subblock_size_x,
+                  chroma_grain_stride, cb_col_buf, 2 >> chroma_subsamp_x,
+                  2 >> chroma_subsamp_x,
+                  AOMMIN(chroma_subblock_size_y + (2 >> chroma_subsamp_y),
+                         (height - (y << 1)) >> chroma_subsamp_y));
+
+        copy_area(cr_grain_block + chroma_offset_y * chroma_grain_stride +
+                      chroma_offset_x + chroma_subblock_size_x,
+                  chroma_grain_stride, cr_col_buf, 2 >> chroma_subsamp_x,
+                  2 >> chroma_subsamp_x,
+                  AOMMIN(chroma_subblock_size_y + (2 >> chroma_subsamp_y),
+                         (height - (y << 1)) >> chroma_subsamp_y));
+      }
+    }
+  }
+
+  dealloc_arrays(params, &pred_pos_luma, &pred_pos_chroma, &luma_grain_block,
+                 &cb_grain_block, &cr_grain_block, &y_line_buf, &cb_line_buf,
+                 &cr_line_buf, &y_col_buf, &cb_col_buf, &cr_col_buf);
+  return 0;
+}

diff --git a/libav1/aom_dsp/grain_synthesis.h b/libav1/aom_dsp/grain_synthesis.h
new file mode 100644
index 0000000..9155b39
--- /dev/null
+++ b/libav1/aom_dsp/grain_synthesis.h

@@ -0,0 +1,192 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\file
+ * \brief Describes film grain parameters and film grain synthesis
+ *
+ */
+#ifndef AOM_AOM_DSP_GRAIN_SYNTHESIS_H_
+#define AOM_AOM_DSP_GRAIN_SYNTHESIS_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <string.h>
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom/aom_image.h"
+
+/*!\brief Structure containing film grain synthesis parameters for a frame
+ *
+ * This structure contains input parameters for film grain synthesis
+ */
+typedef struct {
+  // This structure is compared element-by-element in the function
+  // av1_check_grain_params_equiv: this function must be updated if any changes
+  // are made to this structure.
+  int apply_grain;
+
+  int update_parameters;
+
+  // 8 bit values
+  int scaling_points_y[14][2];
+  int num_y_points;  // value: 0..14
+
+  // 8 bit values
+  int scaling_points_cb[10][2];
+  int num_cb_points;  // value: 0..10
+
+  // 8 bit values
+  int scaling_points_cr[10][2];
+  int num_cr_points;  // value: 0..10
+
+  int scaling_shift;  // values : 8..11
+
+  int ar_coeff_lag;  // values:  0..3
+
+  // 8 bit values
+  int ar_coeffs_y[24];
+  int ar_coeffs_cb[25];
+  int ar_coeffs_cr[25];
+
+  // Shift value: AR coeffs range
+  // 6: [-2, 2)
+  // 7: [-1, 1)
+  // 8: [-0.5, 0.5)
+  // 9: [-0.25, 0.25)
+  int ar_coeff_shift;  // values : 6..9
+
+  int cb_mult;       // 8 bits
+  int cb_luma_mult;  // 8 bits
+  int cb_offset;     // 9 bits
+
+  int cr_mult;       // 8 bits
+  int cr_luma_mult;  // 8 bits
+  int cr_offset;     // 9 bits
+
+  int overlap_flag;
+
+  int clip_to_restricted_range;
+
+  unsigned int bit_depth;  // video bit depth
+
+  int chroma_scaling_from_luma;
+
+  int grain_scale_shift;
+
+  uint16_t random_seed;
+  // This structure is compared element-by-element in the function
+  // av1_check_grain_params_equiv: this function must be updated if any changes
+  // are made to this structure.
+} aom_film_grain_t;
+
+/*!\brief Check if two film grain parameters structs are equivalent
+ *
+ * Check if two film grain parameters are equal, except for the
+ * update_parameters and random_seed elements which are ignored.
+ *
+ * \param[in]    pa               The first set of parameters to compare
+ * \param[in]    pb               The second set of parameters to compare
+ * \return       Returns 1 if the params are equivalent, 0 otherwise
+ */
+static INLINE int av1_check_grain_params_equiv(
+    const aom_film_grain_t *const pa, const aom_film_grain_t *const pb) {
+  if (pa->apply_grain != pb->apply_grain) return 0;
+  // Don't compare update_parameters
+
+  if (pa->num_y_points != pb->num_y_points) return 0;
+  if (memcmp(pa->scaling_points_y, pb->scaling_points_y,
+             pa->num_y_points * 2 * sizeof(*pa->scaling_points_y)) != 0)
+    return 0;
+
+  if (pa->num_cb_points != pb->num_cb_points) return 0;
+  if (memcmp(pa->scaling_points_cb, pb->scaling_points_cb,
+             pa->num_cb_points * 2 * sizeof(*pa->scaling_points_cb)) != 0)
+    return 0;
+
+  if (pa->num_cr_points != pb->num_cr_points) return 0;
+  if (memcmp(pa->scaling_points_cr, pb->scaling_points_cr,
+             pa->num_cr_points * 2 * sizeof(*pa->scaling_points_cr)) != 0)
+    return 0;
+
+  if (pa->scaling_shift != pb->scaling_shift) return 0;
+  if (pa->ar_coeff_lag != pb->ar_coeff_lag) return 0;
+
+  const int num_pos = 2 * pa->ar_coeff_lag * (pa->ar_coeff_lag + 1);
+  if (memcmp(pa->ar_coeffs_y, pb->ar_coeffs_y,
+             num_pos * sizeof(*pa->ar_coeffs_y)) != 0)
+    return 0;
+  if (memcmp(pa->ar_coeffs_cb, pb->ar_coeffs_cb,
+             num_pos * sizeof(*pa->ar_coeffs_cb)) != 0)
+    return 0;
+  if (memcmp(pa->ar_coeffs_cr, pb->ar_coeffs_cr,
+             num_pos * sizeof(*pa->ar_coeffs_cr)) != 0)
+    return 0;
+
+  if (pa->ar_coeff_shift != pb->ar_coeff_shift) return 0;
+
+  if (pa->cb_mult != pb->cb_mult) return 0;
+  if (pa->cb_luma_mult != pb->cb_luma_mult) return 0;
+  if (pa->cb_offset != pb->cb_offset) return 0;
+
+  if (pa->cr_mult != pb->cr_mult) return 0;
+  if (pa->cr_luma_mult != pb->cr_luma_mult) return 0;
+  if (pa->cr_offset != pb->cr_offset) return 0;
+
+  if (pa->overlap_flag != pb->overlap_flag) return 0;
+  if (pa->clip_to_restricted_range != pb->clip_to_restricted_range) return 0;
+  if (pa->bit_depth != pb->bit_depth) return 0;
+  if (pa->chroma_scaling_from_luma != pb->chroma_scaling_from_luma) return 0;
+  if (pa->grain_scale_shift != pb->grain_scale_shift) return 0;
+
+  return 1;
+}
+
+/*!\brief Add film grain
+ *
+ * Add film grain to an image
+ *
+ * Returns 0 for success, -1 for failure
+ *
+ * \param[in]    grain_params     Grain parameters
+ * \param[in]    luma             luma plane
+ * \param[in]    cb               cb plane
+ * \param[in]    cr               cr plane
+ * \param[in]    height           luma plane height
+ * \param[in]    width            luma plane width
+ * \param[in]    luma_stride      luma plane stride
+ * \param[in]    chroma_stride    chroma plane stride
+ */
+int av1_add_film_grain_run(const aom_film_grain_t *grain_params, uint8_t *luma,
+                           uint8_t *cb, uint8_t *cr, int height, int width,
+                           int luma_stride, int chroma_stride,
+                           int use_high_bit_depth, int chroma_subsamp_y,
+                           int chroma_subsamp_x, int mc_identity);
+
+/*!\brief Add film grain
+ *
+ * Add film grain to an image
+ *
+ * Returns 0 for success, -1 for failure
+ *
+ * \param[in]    grain_params     Grain parameters
+ * \param[in]    src              Source image
+ * \param[out]   dst              Resulting image with grain
+ */
+int av1_add_film_grain(const aom_film_grain_t *grain_params,
+                       const aom_image_t *src, aom_image_t *dst);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_DSP_GRAIN_SYNTHESIS_H_

diff --git a/libav1/aom_dsp/grain_table.c b/libav1/aom_dsp/grain_table.c
new file mode 100644
index 0000000..5eb5b68
--- /dev/null
+++ b/libav1/aom_dsp/grain_table.c

@@ -0,0 +1,334 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\file
+ * \brief This file has the implementation details of the grain table.
+ *
+ * The file format is an ascii representation for readability and
+ * editability. Array parameters are separated from the non-array
+ * parameters and prefixed with a few characters to make for easy
+ * localization with a parameter set. Each entry is prefixed with "E"
+ * and the other parameters are only specified if "update-parms" is
+ * non-zero.
+ *
+ * filmgrn1
+ * E <start-time> <end-time> <apply-grain> <random-seed> <update-parms>
+ *  p <ar_coeff_lag> <ar_coeff_shift> <grain_scale_shift> ...
+ *  sY <num_y_points> <point_0_x> <point_0_y> ...
+ *  sCb <num_cb_points> <point_0_x> <point_0_y> ...
+ *  sCr <num_cr_points> <point_0_x> <point_0_y> ...
+ *  cY <ar_coeff_y_0> ....
+ *  cCb <ar_coeff_cb_0> ....
+ *  cCr <ar_coeff_cr_0> ....
+ * E <start-time> ...
+ */
+#include <string.h>
+#include <stdio.h>
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/grain_table.h"
+#include "aom_mem/aom_mem.h"
+
+static const char kFileMagic[8] = "filmgrn1";
+
+static void grain_table_entry_read(FILE *file,
+                                   struct aom_internal_error_info *error_info,
+                                   aom_film_grain_table_entry_t *entry) {
+  aom_film_grain_t *pars = &entry->params;
+  int num_read =
+      fscanf(file, "E %" PRId64 " %" PRId64 " %d %hd %d\n", &entry->start_time,
+             &entry->end_time, &pars->apply_grain, &pars->random_seed,
+             &pars->update_parameters);
+  if (num_read == 0 && feof(file)) return;
+  if (num_read != 5) {
+    aom_internal_error(error_info, AOM_CODEC_ERROR,
+                       "Unable to read entry header. Read %d != 5", num_read);
+    return;
+  }
+  if (pars->update_parameters) {
+    num_read = fscanf(file, "p %d %d %d %d %d %d %d %d %d %d %d %d\n",
+                      &pars->ar_coeff_lag, &pars->ar_coeff_shift,
+                      &pars->grain_scale_shift, &pars->scaling_shift,
+                      &pars->chroma_scaling_from_luma, &pars->overlap_flag,
+                      &pars->cb_mult, &pars->cb_luma_mult, &pars->cb_offset,
+                      &pars->cr_mult, &pars->cr_luma_mult, &pars->cr_offset);
+    if (num_read != 12) {
+      aom_internal_error(error_info, AOM_CODEC_ERROR,
+                         "Unable to read entry params. Read %d != 12",
+                         num_read);
+      return;
+    }
+    if (!fscanf(file, "\tsY %d ", &pars->num_y_points)) {
+      aom_internal_error(error_info, AOM_CODEC_ERROR,
+                         "Unable to read num y points");
+      return;
+    }
+    for (int i = 0; i < pars->num_y_points; ++i) {
+      if (2 != fscanf(file, "%d %d", &pars->scaling_points_y[i][0],
+                      &pars->scaling_points_y[i][1])) {
+        aom_internal_error(error_info, AOM_CODEC_ERROR,
+                           "Unable to read y scaling points");
+        return;
+      }
+    }
+    if (!fscanf(file, "\n\tsCb %d", &pars->num_cb_points)) {
+      aom_internal_error(error_info, AOM_CODEC_ERROR,
+                         "Unable to read num cb points");
+      return;
+    }
+    for (int i = 0; i < pars->num_cb_points; ++i) {
+      if (2 != fscanf(file, "%d %d", &pars->scaling_points_cb[i][0],
+                      &pars->scaling_points_cb[i][1])) {
+        aom_internal_error(error_info, AOM_CODEC_ERROR,
+                           "Unable to read cb scaling points");
+        return;
+      }
+    }
+    if (!fscanf(file, "\n\tsCr %d", &pars->num_cr_points)) {
+      aom_internal_error(error_info, AOM_CODEC_ERROR,
+                         "Unable to read num cr points");
+      return;
+    }
+    for (int i = 0; i < pars->num_cr_points; ++i) {
+      if (2 != fscanf(file, "%d %d", &pars->scaling_points_cr[i][0],
+                      &pars->scaling_points_cr[i][1])) {
+        aom_internal_error(error_info, AOM_CODEC_ERROR,
+                           "Unable to read cr scaling points");
+        return;
+      }
+    }
+
+    fscanf(file, "\n\tcY");
+    const int n = 2 * pars->ar_coeff_lag * (pars->ar_coeff_lag + 1);
+    for (int i = 0; i < n; ++i) {
+      if (1 != fscanf(file, "%d", &pars->ar_coeffs_y[i])) {
+        aom_internal_error(error_info, AOM_CODEC_ERROR,
+                           "Unable to read Y coeffs");
+        return;
+      }
+    }
+    fscanf(file, "\n\tcCb");
+    for (int i = 0; i <= n; ++i) {
+      if (1 != fscanf(file, "%d", &pars->ar_coeffs_cb[i])) {
+        aom_internal_error(error_info, AOM_CODEC_ERROR,
+                           "Unable to read Cb coeffs");
+        return;
+      }
+    }
+    fscanf(file, "\n\tcCr");
+    for (int i = 0; i <= n; ++i) {
+      if (1 != fscanf(file, "%d", &pars->ar_coeffs_cr[i])) {
+        aom_internal_error(error_info, AOM_CODEC_ERROR,
+                           "Unable to read Cr coeffs");
+        return;
+      }
+    }
+    fscanf(file, "\n");
+  }
+}
+
+static void grain_table_entry_write(FILE *file,
+                                    aom_film_grain_table_entry_t *entry) {
+  const aom_film_grain_t *pars = &entry->params;
+  fprintf(file, "E %" PRId64 " %" PRId64 " %d %d %d\n", entry->start_time,
+          entry->end_time, pars->apply_grain, pars->random_seed,
+          pars->update_parameters);
+  if (pars->update_parameters) {
+    fprintf(file, "\tp %d %d %d %d %d %d %d %d %d %d %d %d\n",
+            pars->ar_coeff_lag, pars->ar_coeff_shift, pars->grain_scale_shift,
+            pars->scaling_shift, pars->chroma_scaling_from_luma,
+            pars->overlap_flag, pars->cb_mult, pars->cb_luma_mult,
+            pars->cb_offset, pars->cr_mult, pars->cr_luma_mult,
+            pars->cr_offset);
+    fprintf(file, "\tsY %d ", pars->num_y_points);
+    for (int i = 0; i < pars->num_y_points; ++i) {
+      fprintf(file, " %d %d", pars->scaling_points_y[i][0],
+              pars->scaling_points_y[i][1]);
+    }
+    fprintf(file, "\n\tsCb %d", pars->num_cb_points);
+    for (int i = 0; i < pars->num_cb_points; ++i) {
+      fprintf(file, " %d %d", pars->scaling_points_cb[i][0],
+              pars->scaling_points_cb[i][1]);
+    }
+    fprintf(file, "\n\tsCr %d", pars->num_cr_points);
+    for (int i = 0; i < pars->num_cr_points; ++i) {
+      fprintf(file, " %d %d", pars->scaling_points_cr[i][0],
+              pars->scaling_points_cr[i][1]);
+    }
+    fprintf(file, "\n\tcY");
+    const int n = 2 * pars->ar_coeff_lag * (pars->ar_coeff_lag + 1);
+    for (int i = 0; i < n; ++i) {
+      fprintf(file, " %d", pars->ar_coeffs_y[i]);
+    }
+    fprintf(file, "\n\tcCb");
+    for (int i = 0; i <= n; ++i) {
+      fprintf(file, " %d", pars->ar_coeffs_cb[i]);
+    }
+    fprintf(file, "\n\tcCr");
+    for (int i = 0; i <= n; ++i) {
+      fprintf(file, " %d", pars->ar_coeffs_cr[i]);
+    }
+    fprintf(file, "\n");
+  }
+}
+
+void aom_film_grain_table_append(aom_film_grain_table_t *t, int64_t time_stamp,
+                                 int64_t end_time,
+                                 const aom_film_grain_t *grain) {
+  if (!t->tail || memcmp(grain, &t->tail->params, sizeof(*grain))) {
+    aom_film_grain_table_entry_t *new_tail = aom_malloc(sizeof(*new_tail));
+    memset(new_tail, 0, sizeof(*new_tail));
+    if (t->tail) t->tail->next = new_tail;
+    if (!t->head) t->head = new_tail;
+    t->tail = new_tail;
+
+    new_tail->start_time = time_stamp;
+    new_tail->end_time = end_time;
+    new_tail->params = *grain;
+  } else {
+    t->tail->end_time = AOMMAX(t->tail->end_time, end_time);
+    t->tail->start_time = AOMMIN(t->tail->start_time, time_stamp);
+  }
+}
+
+int aom_film_grain_table_lookup(aom_film_grain_table_t *t, int64_t time_stamp,
+                                int64_t end_time, int erase,
+                                aom_film_grain_t *grain) {
+  aom_film_grain_table_entry_t *entry = t->head;
+  aom_film_grain_table_entry_t *prev_entry = 0;
+  int16_t random_seed = grain ? grain->random_seed : 0;
+  if (grain) memset(grain, 0, sizeof(*grain));
+
+  while (entry) {
+    aom_film_grain_table_entry_t *next = entry->next;
+    if (time_stamp >= entry->start_time && time_stamp < entry->end_time) {
+      if (grain) {
+        *grain = entry->params;
+        if (time_stamp != 0) grain->random_seed = random_seed;
+      }
+      if (!erase) return 1;
+
+      const int64_t entry_end_time = entry->end_time;
+      if (time_stamp <= entry->start_time && end_time >= entry->end_time) {
+        if (t->tail == entry) t->tail = prev_entry;
+        if (prev_entry) {
+          prev_entry->next = entry->next;
+        } else {
+          t->head = entry->next;
+        }
+        aom_free(entry);
+      } else if (time_stamp <= entry->start_time &&
+                 end_time < entry->end_time) {
+        entry->start_time = end_time;
+      } else if (time_stamp > entry->start_time &&
+                 end_time >= entry->end_time) {
+        entry->end_time = time_stamp;
+      } else {
+        aom_film_grain_table_entry_t *new_entry =
+            aom_malloc(sizeof(*new_entry));
+        new_entry->next = entry->next;
+        new_entry->start_time = end_time;
+        new_entry->end_time = entry->end_time;
+        new_entry->params = entry->params;
+        entry->next = new_entry;
+        entry->end_time = time_stamp;
+        if (t->tail == entry) t->tail = new_entry;
+      }
+      // If segments aren't aligned, delete from the beggining of subsequent
+      // segments
+      if (end_time > entry_end_time) {
+        aom_film_grain_table_lookup(t, entry->end_time, end_time, 1, 0);
+      }
+      return 1;
+    }
+    prev_entry = entry;
+    entry = next;
+  }
+  return 0;
+}
+
+aom_codec_err_t aom_film_grain_table_read(
+    aom_film_grain_table_t *t, const char *filename,
+    struct aom_internal_error_info *error_info) {
+  FILE *file = fopen(filename, "rb");
+  if (!file) {
+    aom_internal_error(error_info, AOM_CODEC_ERROR, "Unable to open %s",
+                       filename);
+    return error_info->error_code;
+  }
+  error_info->error_code = AOM_CODEC_OK;
+
+  // Read in one extra character as there should be white space after
+  // the header.
+  char magic[9];
+  if (!fread(magic, 9, 1, file) || memcmp(magic, kFileMagic, 8)) {
+    aom_internal_error(error_info, AOM_CODEC_ERROR,
+                       "Unable to read (or invalid) file magic");
+    fclose(file);
+    return error_info->error_code;
+  }
+
+  aom_film_grain_table_entry_t *prev_entry = 0;
+  while (!feof(file)) {
+    aom_film_grain_table_entry_t *entry = aom_malloc(sizeof(*entry));
+    memset(entry, 0, sizeof(*entry));
+    grain_table_entry_read(file, error_info, entry);
+    entry->next = 0;
+
+    if (prev_entry) prev_entry->next = entry;
+    if (!t->head) t->head = entry;
+    t->tail = entry;
+    prev_entry = entry;
+
+    if (error_info->error_code != AOM_CODEC_OK) break;
+  }
+
+  fclose(file);
+  return error_info->error_code;
+}
+
+aom_codec_err_t aom_film_grain_table_write(
+    const aom_film_grain_table_t *t, const char *filename,
+    struct aom_internal_error_info *error_info) {
+  error_info->error_code = AOM_CODEC_OK;
+
+  FILE *file = fopen(filename, "wb");
+  if (!file) {
+    aom_internal_error(error_info, AOM_CODEC_ERROR, "Unable to open file %s",
+                       filename);
+    return error_info->error_code;
+  }
+
+  if (!fwrite(kFileMagic, 8, 1, file)) {
+    aom_internal_error(error_info, AOM_CODEC_ERROR,
+                       "Unable to write file magic");
+    fclose(file);
+    return error_info->error_code;
+  }
+
+  fprintf(file, "\n");
+  aom_film_grain_table_entry_t *entry = t->head;
+  while (entry) {
+    grain_table_entry_write(file, entry);
+    entry = entry->next;
+  }
+  fclose(file);
+  return error_info->error_code;
+}
+
+void aom_film_grain_table_free(aom_film_grain_table_t *t) {
+  aom_film_grain_table_entry_t *entry = t->head;
+  while (entry) {
+    aom_film_grain_table_entry_t *next = entry->next;
+    aom_free(entry);
+    entry = next;
+  }
+  memset(t, 0, sizeof(*t));
+}

diff --git a/libav1/aom_dsp/grain_table.h b/libav1/aom_dsp/grain_table.h
new file mode 100644
index 0000000..a8ac507
--- /dev/null
+++ b/libav1/aom_dsp/grain_table.h

@@ -0,0 +1,102 @@
+/*
+ * Copyright (c) 2018, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/*!\file
+ * \brief A table mapping from time to corresponding film grain parameters.
+ *
+ * In order to apply grain synthesis in the decoder, the film grain parameters
+ * need to be signalled in the encoder. The film grain parameters are time
+ * varying, and for two-pass encoding (and denoiser implementation flexibility)
+ * it is common to denoise the video and do parameter estimation before encoding
+ * the denoised video.
+ *
+ * The film grain table is used to provide this flexibility and is used as a
+ * parameter that is passed to the encoder.
+ *
+ * Further, if regraining is to be done in say a single pass mode, or in two
+ * pass within the encoder (before frames are added to the lookahead buffer),
+ * this data structure can be used to keep track of on-the-fly estimated grain
+ * parameters, that are then extracted from the table before the encoded frame
+ * is written.
+ */
+#ifndef AOM_AOM_DSP_GRAIN_TABLE_H_
+#define AOM_AOM_DSP_GRAIN_TABLE_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "aom_dsp/grain_synthesis.h"
+#include "aom/internal/aom_codec_internal.h"
+
+typedef struct aom_film_grain_table_entry_t {
+  aom_film_grain_t params;
+  int64_t start_time;
+  int64_t end_time;
+  struct aom_film_grain_table_entry_t *next;
+} aom_film_grain_table_entry_t;
+
+typedef struct {
+  aom_film_grain_table_entry_t *head;
+  aom_film_grain_table_entry_t *tail;
+} aom_film_grain_table_t;
+
+/*!\brief Add a mapping from [time_stamp, end_time) to the given grain
+ * parameters
+ *
+ * \param[in/out] table      The grain table
+ * \param[in]     time_stamp The start time stamp
+ * \param[in]     end_stamp  The end time_stamp
+ * \param[in]     grain      The grain parameters
+ */
+void aom_film_grain_table_append(aom_film_grain_table_t *table,
+                                 int64_t time_stamp, int64_t end_time,
+                                 const aom_film_grain_t *grain);
+
+/*!\brief Look-up (and optionally erase) the grain parameters for the given time
+ *
+ * \param[in]  table      The grain table
+ * \param[in]  time_stamp The start time stamp
+ * \param[in]  end_stamp  The end time_stamp
+ * \param[in]  erase      Whether the time segment can be deleted
+ * \param[out] grain      The output grain parameters
+ */
+int aom_film_grain_table_lookup(aom_film_grain_table_t *t, int64_t time_stamp,
+                                int64_t end_time, int erase,
+                                aom_film_grain_t *grain);
+
+/*!\brief Reads the grain table from a file.
+ *
+ * \param[out]  table       The grain table
+ * \param[in]   filename    The file to read from
+ * \param[in]   error_info  Error info for tracking errors
+ */
+aom_codec_err_t aom_film_grain_table_read(
+    aom_film_grain_table_t *table, const char *filename,
+    struct aom_internal_error_info *error_info);
+
+/*!\brief Writes the grain table from a file.
+ *
+ * \param[out]  table       The grain table
+ * \param[in]   filename    The file to read from
+ * \param[in]   error_info  Error info for tracking errors
+ */
+aom_codec_err_t aom_film_grain_table_write(
+    const aom_film_grain_table_t *t, const char *filename,
+    struct aom_internal_error_info *error_info);
+
+void aom_film_grain_table_free(aom_film_grain_table_t *t);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  // AOM_AOM_DSP_GRAIN_TABLE_H_

diff --git a/libav1/aom_dsp/intrapred.c b/libav1/aom_dsp/intrapred.c
new file mode 100644
index 0000000..72ccfd8
--- /dev/null
+++ b/libav1/aom_dsp/intrapred.c

@@ -0,0 +1,792 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <math.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/intrapred_common.h"
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/bitops.h"
+
+static INLINE void v_predictor(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
+                               const uint8_t *above, const uint8_t *left) {
+  int r;
+  (void)left;
+
+  for (r = 0; r < bh; r++) {
+    memcpy(dst, above, bw);
+    dst += stride;
+  }
+}
+
+static INLINE void h_predictor(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
+                               const uint8_t *above, const uint8_t *left) {
+  int r;
+  (void)above;
+
+  for (r = 0; r < bh; r++) {
+    memset(dst, left[r], bw);
+    dst += stride;
+  }
+}
+
+static INLINE int abs_diff(int a, int b) { return (a > b) ? a - b : b - a; }
+
+static INLINE uint16_t paeth_predictor_single(uint16_t left, uint16_t top,
+                                              uint16_t top_left) {
+  const int base = top + left - top_left;
+  const int p_left = abs_diff(base, left);
+  const int p_top = abs_diff(base, top);
+  const int p_top_left = abs_diff(base, top_left);
+
+  // Return nearest to base of left, top and top_left.
+  return (p_left <= p_top && p_left <= p_top_left)
+             ? left
+             : (p_top <= p_top_left) ? top : top_left;
+}
+
+static INLINE void paeth_predictor(uint8_t *dst, ptrdiff_t stride, int bw,
+                                   int bh, const uint8_t *above,
+                                   const uint8_t *left) {
+  int r, c;
+  const uint8_t ytop_left = above[-1];
+
+  for (r = 0; r < bh; r++) {
+    for (c = 0; c < bw; c++)
+      dst[c] = (uint8_t)paeth_predictor_single(left[r], above[c], ytop_left);
+    dst += stride;
+  }
+}
+
+// Some basic checks on weights for smooth predictor.
+#define sm_weights_sanity_checks(weights_w, weights_h, weights_scale, \
+                                 pred_scale)                          \
+  assert(weights_w[0] < weights_scale);                               \
+  assert(weights_h[0] < weights_scale);                               \
+  assert(weights_scale - weights_w[bw - 1] < weights_scale);          \
+  assert(weights_scale - weights_h[bh - 1] < weights_scale);          \
+  assert(pred_scale < 31)  // ensures no overflow when calculating predictor.
+
+#define divide_round(value, bits) (((value) + (1 << ((bits)-1))) >> (bits))
+
+static INLINE void smooth_predictor(uint8_t *dst, ptrdiff_t stride, int bw,
+                                    int bh, const uint8_t *above,
+                                    const uint8_t *left) {
+  const uint8_t below_pred = left[bh - 1];   // estimated by bottom-left pixel
+  const uint8_t right_pred = above[bw - 1];  // estimated by top-right pixel
+  const uint8_t *const sm_weights_w = sm_weight_arrays + bw;
+  const uint8_t *const sm_weights_h = sm_weight_arrays + bh;
+  // scale = 2 * 2^sm_weight_log2_scale
+  const int log2_scale = 1 + sm_weight_log2_scale;
+  const uint16_t scale = (1 << sm_weight_log2_scale);
+  sm_weights_sanity_checks(sm_weights_w, sm_weights_h, scale,
+                           log2_scale + sizeof(*dst));
+  int r;
+  for (r = 0; r < bh; ++r) {
+    int c;
+    for (c = 0; c < bw; ++c) {
+      const uint8_t pixels[] = { above[c], below_pred, left[r], right_pred };
+      const uint8_t weights[] = { sm_weights_h[r], scale - sm_weights_h[r],
+                                  sm_weights_w[c], scale - sm_weights_w[c] };
+      uint32_t this_pred = 0;
+      int i;
+      assert(scale >= sm_weights_h[r] && scale >= sm_weights_w[c]);
+      for (i = 0; i < 4; ++i) {
+        this_pred += weights[i] * pixels[i];
+      }
+      dst[c] = divide_round(this_pred, log2_scale);
+    }
+    dst += stride;
+  }
+}
+
+static INLINE void smooth_v_predictor(uint8_t *dst, ptrdiff_t stride, int bw,
+                                      int bh, const uint8_t *above,
+                                      const uint8_t *left) {
+  const uint8_t below_pred = left[bh - 1];  // estimated by bottom-left pixel
+  const uint8_t *const sm_weights = sm_weight_arrays + bh;
+  // scale = 2^sm_weight_log2_scale
+  const int log2_scale = sm_weight_log2_scale;
+  const uint16_t scale = (1 << sm_weight_log2_scale);
+  sm_weights_sanity_checks(sm_weights, sm_weights, scale,
+                           log2_scale + sizeof(*dst));
+
+  int r;
+  for (r = 0; r < bh; r++) {
+    int c;
+    for (c = 0; c < bw; ++c) {
+      const uint8_t pixels[] = { above[c], below_pred };
+      const uint8_t weights[] = { sm_weights[r], scale - sm_weights[r] };
+      uint32_t this_pred = 0;
+      assert(scale >= sm_weights[r]);
+      int i;
+      for (i = 0; i < 2; ++i) {
+        this_pred += weights[i] * pixels[i];
+      }
+      dst[c] = divide_round(this_pred, log2_scale);
+    }
+    dst += stride;
+  }
+}
+
+static INLINE void smooth_h_predictor(uint8_t *dst, ptrdiff_t stride, int bw,
+                                      int bh, const uint8_t *above,
+                                      const uint8_t *left) {
+  const uint8_t right_pred = above[bw - 1];  // estimated by top-right pixel
+  const uint8_t *const sm_weights = sm_weight_arrays + bw;
+  // scale = 2^sm_weight_log2_scale
+  const int log2_scale = sm_weight_log2_scale;
+  const uint16_t scale = (1 << sm_weight_log2_scale);
+  sm_weights_sanity_checks(sm_weights, sm_weights, scale,
+                           log2_scale + sizeof(*dst));
+
+  int r;
+  for (r = 0; r < bh; r++) {
+    int c;
+    for (c = 0; c < bw; ++c) {
+      const uint8_t pixels[] = { left[r], right_pred };
+      const uint8_t weights[] = { sm_weights[c], scale - sm_weights[c] };
+      uint32_t this_pred = 0;
+      assert(scale >= sm_weights[c]);
+      int i;
+      for (i = 0; i < 2; ++i) {
+        this_pred += weights[i] * pixels[i];
+      }
+      dst[c] = divide_round(this_pred, log2_scale);
+    }
+    dst += stride;
+  }
+}
+
+static INLINE void dc_128_predictor(uint8_t *dst, ptrdiff_t stride, int bw,
+                                    int bh, const uint8_t *above,
+                                    const uint8_t *left) {
+  int r;
+  (void)above;
+  (void)left;
+
+  for (r = 0; r < bh; r++) {
+    memset(dst, 128, bw);
+    dst += stride;
+  }
+}
+
+static INLINE void dc_left_predictor(uint8_t *dst, ptrdiff_t stride, int bw,
+                                     int bh, const uint8_t *above,
+                                     const uint8_t *left) {
+  int i, r, expected_dc, sum = 0;
+  (void)above;
+
+  for (i = 0; i < bh; i++) sum += left[i];
+  expected_dc = (sum + (bh >> 1)) / bh;
+
+  for (r = 0; r < bh; r++) {
+    memset(dst, expected_dc, bw);
+    dst += stride;
+  }
+}
+
+static INLINE void dc_top_predictor(uint8_t *dst, ptrdiff_t stride, int bw,
+                                    int bh, const uint8_t *above,
+                                    const uint8_t *left) {
+  int i, r, expected_dc, sum = 0;
+  (void)left;
+
+  for (i = 0; i < bw; i++) sum += above[i];
+  expected_dc = (sum + (bw >> 1)) / bw;
+
+  for (r = 0; r < bh; r++) {
+    memset(dst, expected_dc, bw);
+    dst += stride;
+  }
+}
+
+static INLINE void dc_predictor(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
+                                const uint8_t *above, const uint8_t *left) {
+  int i, r, expected_dc, sum = 0;
+  const int count = bw + bh;
+
+  for (i = 0; i < bw; i++) {
+    sum += above[i];
+  }
+  for (i = 0; i < bh; i++) {
+    sum += left[i];
+  }
+
+  expected_dc = (sum + (count >> 1)) / count;
+
+  for (r = 0; r < bh; r++) {
+    memset(dst, expected_dc, bw);
+    dst += stride;
+  }
+}
+
+static INLINE int divide_using_multiply_shift(int num, int shift1,
+                                              int multiplier, int shift2) {
+  const int interm = num >> shift1;
+  return interm * multiplier >> shift2;
+}
+
+// The constants (multiplier and shifts) for a given block size are obtained
+// as follows:
+// - Let sum_w_h =  block width + block height.
+// - Shift 'sum_w_h' right until we reach an odd number. Let the number of
+// shifts for that block size be called 'shift1' (see the parameter in
+// dc_predictor_rect() function), and let the odd number be 'd'. [d has only 2
+// possible values: d = 3 for a 1:2 rect block and d = 5 for a 1:4 rect
+// block].
+// - Find multipliers for (i) dividing by 3, and (ii) dividing by 5,
+// using the "Algorithm 1" in:
+// http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1467632
+// by ensuring that m + n = 16 (in that algorithm). This ensures that our 2nd
+// shift will be 16, regardless of the block size.
+
+// Note: For low bitdepth, assembly code may be optimized by using smaller
+// constants for smaller block sizes, where the range of the 'sum' is
+// restricted to fewer bits.
+
+#define DC_MULTIPLIER_1X2 0x5556
+#define DC_MULTIPLIER_1X4 0x3334
+
+#define DC_SHIFT2 16
+
+static INLINE void dc_predictor_rect(uint8_t *dst, ptrdiff_t stride, int bw,
+                                     int bh, const uint8_t *above,
+                                     const uint8_t *left, int shift1,
+                                     int multiplier) {
+  int sum = 0;
+
+  for (int i = 0; i < bw; i++) {
+    sum += above[i];
+  }
+  for (int i = 0; i < bh; i++) {
+    sum += left[i];
+  }
+
+  const int expected_dc = divide_using_multiply_shift(
+      sum + ((bw + bh) >> 1), shift1, multiplier, DC_SHIFT2);
+  assert(expected_dc < (1 << 8));
+
+  for (int r = 0; r < bh; r++) {
+    memset(dst, expected_dc, bw);
+    dst += stride;
+  }
+}
+
+#undef DC_SHIFT2
+
+void aom_dc_predictor_4x8_c(uint8_t *dst, ptrdiff_t stride,
+                            const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 4, 8, above, left, 2, DC_MULTIPLIER_1X2);
+}
+
+void aom_dc_predictor_8x4_c(uint8_t *dst, ptrdiff_t stride,
+                            const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 8, 4, above, left, 2, DC_MULTIPLIER_1X2);
+}
+
+void aom_dc_predictor_4x16_c(uint8_t *dst, ptrdiff_t stride,
+                             const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 4, 16, above, left, 2, DC_MULTIPLIER_1X4);
+}
+
+void aom_dc_predictor_16x4_c(uint8_t *dst, ptrdiff_t stride,
+                             const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 16, 4, above, left, 2, DC_MULTIPLIER_1X4);
+}
+
+void aom_dc_predictor_8x16_c(uint8_t *dst, ptrdiff_t stride,
+                             const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 8, 16, above, left, 3, DC_MULTIPLIER_1X2);
+}
+
+void aom_dc_predictor_16x8_c(uint8_t *dst, ptrdiff_t stride,
+                             const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 16, 8, above, left, 3, DC_MULTIPLIER_1X2);
+}
+
+void aom_dc_predictor_8x32_c(uint8_t *dst, ptrdiff_t stride,
+                             const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 8, 32, above, left, 3, DC_MULTIPLIER_1X4);
+}
+
+void aom_dc_predictor_32x8_c(uint8_t *dst, ptrdiff_t stride,
+                             const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 32, 8, above, left, 3, DC_MULTIPLIER_1X4);
+}
+
+void aom_dc_predictor_16x32_c(uint8_t *dst, ptrdiff_t stride,
+                              const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 16, 32, above, left, 4, DC_MULTIPLIER_1X2);
+}
+
+void aom_dc_predictor_32x16_c(uint8_t *dst, ptrdiff_t stride,
+                              const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 32, 16, above, left, 4, DC_MULTIPLIER_1X2);
+}
+
+void aom_dc_predictor_16x64_c(uint8_t *dst, ptrdiff_t stride,
+                              const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 16, 64, above, left, 4, DC_MULTIPLIER_1X4);
+}
+
+void aom_dc_predictor_64x16_c(uint8_t *dst, ptrdiff_t stride,
+                              const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 64, 16, above, left, 4, DC_MULTIPLIER_1X4);
+}
+
+void aom_dc_predictor_32x64_c(uint8_t *dst, ptrdiff_t stride,
+                              const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 32, 64, above, left, 5, DC_MULTIPLIER_1X2);
+}
+
+void aom_dc_predictor_64x32_c(uint8_t *dst, ptrdiff_t stride,
+                              const uint8_t *above, const uint8_t *left) {
+  dc_predictor_rect(dst, stride, 64, 32, above, left, 5, DC_MULTIPLIER_1X2);
+}
+
+#undef DC_MULTIPLIER_1X2
+#undef DC_MULTIPLIER_1X4
+
+static INLINE void highbd_v_predictor(uint16_t *dst, ptrdiff_t stride, int bw,
+                                      int bh, const uint16_t *above,
+                                      const uint16_t *left, int bd) {
+  int r;
+  (void)left;
+  (void)bd;
+  for (r = 0; r < bh; r++) {
+    memcpy(dst, above, bw * sizeof(uint16_t));
+    dst += stride;
+  }
+}
+
+static INLINE void highbd_h_predictor(uint16_t *dst, ptrdiff_t stride, int bw,
+                                      int bh, const uint16_t *above,
+                                      const uint16_t *left, int bd) {
+  int r;
+  (void)above;
+  (void)bd;
+  for (r = 0; r < bh; r++) {
+    aom_memset16(dst, left[r], bw);
+    dst += stride;
+  }
+}
+
+static INLINE void highbd_paeth_predictor(uint16_t *dst, ptrdiff_t stride,
+                                          int bw, int bh, const uint16_t *above,
+                                          const uint16_t *left, int bd) {
+  int r, c;
+  const uint16_t ytop_left = above[-1];
+  (void)bd;
+
+  for (r = 0; r < bh; r++) {
+    for (c = 0; c < bw; c++)
+      dst[c] = paeth_predictor_single(left[r], above[c], ytop_left);
+    dst += stride;
+  }
+}
+
+static INLINE void highbd_smooth_predictor(uint16_t *dst, ptrdiff_t stride,
+                                           int bw, int bh,
+                                           const uint16_t *above,
+                                           const uint16_t *left, int bd) {
+  (void)bd;
+  const uint16_t below_pred = left[bh - 1];   // estimated by bottom-left pixel
+  const uint16_t right_pred = above[bw - 1];  // estimated by top-right pixel
+  const uint8_t *const sm_weights_w = sm_weight_arrays + bw;
+  const uint8_t *const sm_weights_h = sm_weight_arrays + bh;
+  // scale = 2 * 2^sm_weight_log2_scale
+  const int log2_scale = 1 + sm_weight_log2_scale;
+  const uint16_t scale = (1 << sm_weight_log2_scale);
+  sm_weights_sanity_checks(sm_weights_w, sm_weights_h, scale,
+                           log2_scale + sizeof(*dst));
+  int r;
+  for (r = 0; r < bh; ++r) {
+    int c;
+    for (c = 0; c < bw; ++c) {
+      const uint16_t pixels[] = { above[c], below_pred, left[r], right_pred };
+      const uint8_t weights[] = { sm_weights_h[r], scale - sm_weights_h[r],
+                                  sm_weights_w[c], scale - sm_weights_w[c] };
+      uint32_t this_pred = 0;
+      int i;
+      assert(scale >= sm_weights_h[r] && scale >= sm_weights_w[c]);
+      for (i = 0; i < 4; ++i) {
+        this_pred += weights[i] * pixels[i];
+      }
+      dst[c] = divide_round(this_pred, log2_scale);
+    }
+    dst += stride;
+  }
+}
+
+static INLINE void highbd_smooth_v_predictor(uint16_t *dst, ptrdiff_t stride,
+                                             int bw, int bh,
+                                             const uint16_t *above,
+                                             const uint16_t *left, int bd) {
+  (void)bd;
+  const uint16_t below_pred = left[bh - 1];  // estimated by bottom-left pixel
+  const uint8_t *const sm_weights = sm_weight_arrays + bh;
+  // scale = 2^sm_weight_log2_scale
+  const int log2_scale = sm_weight_log2_scale;
+  const uint16_t scale = (1 << sm_weight_log2_scale);
+  sm_weights_sanity_checks(sm_weights, sm_weights, scale,
+                           log2_scale + sizeof(*dst));
+
+  int r;
+  for (r = 0; r < bh; r++) {
+    int c;
+    for (c = 0; c < bw; ++c) {
+      const uint16_t pixels[] = { above[c], below_pred };
+      const uint8_t weights[] = { sm_weights[r], scale - sm_weights[r] };
+      uint32_t this_pred = 0;
+      assert(scale >= sm_weights[r]);
+      int i;
+      for (i = 0; i < 2; ++i) {
+        this_pred += weights[i] * pixels[i];
+      }
+      dst[c] = divide_round(this_pred, log2_scale);
+    }
+    dst += stride;
+  }
+}
+
+static INLINE void highbd_smooth_h_predictor(uint16_t *dst, ptrdiff_t stride,
+                                             int bw, int bh,
+                                             const uint16_t *above,
+                                             const uint16_t *left, int bd) {
+  (void)bd;
+  const uint16_t right_pred = above[bw - 1];  // estimated by top-right pixel
+  const uint8_t *const sm_weights = sm_weight_arrays + bw;
+  // scale = 2^sm_weight_log2_scale
+  const int log2_scale = sm_weight_log2_scale;
+  const uint16_t scale = (1 << sm_weight_log2_scale);
+  sm_weights_sanity_checks(sm_weights, sm_weights, scale,
+                           log2_scale + sizeof(*dst));
+
+  int r;
+  for (r = 0; r < bh; r++) {
+    int c;
+    for (c = 0; c < bw; ++c) {
+      const uint16_t pixels[] = { left[r], right_pred };
+      const uint8_t weights[] = { sm_weights[c], scale - sm_weights[c] };
+      uint32_t this_pred = 0;
+      assert(scale >= sm_weights[c]);
+      int i;
+      for (i = 0; i < 2; ++i) {
+        this_pred += weights[i] * pixels[i];
+      }
+      dst[c] = divide_round(this_pred, log2_scale);
+    }
+    dst += stride;
+  }
+}
+
+static INLINE void highbd_dc_128_predictor(uint16_t *dst, ptrdiff_t stride,
+                                           int bw, int bh,
+                                           const uint16_t *above,
+                                           const uint16_t *left, int bd) {
+  int r;
+  (void)above;
+  (void)left;
+
+  for (r = 0; r < bh; r++) {
+    aom_memset16(dst, 128 << (bd - 8), bw);
+    dst += stride;
+  }
+}
+
+static INLINE void highbd_dc_left_predictor(uint16_t *dst, ptrdiff_t stride,
+                                            int bw, int bh,
+                                            const uint16_t *above,
+                                            const uint16_t *left, int bd) {
+  int i, r, expected_dc, sum = 0;
+  (void)above;
+  (void)bd;
+
+  for (i = 0; i < bh; i++) sum += left[i];
+  expected_dc = (sum + (bh >> 1)) / bh;
+
+  for (r = 0; r < bh; r++) {
+    aom_memset16(dst, expected_dc, bw);
+    dst += stride;
+  }
+}
+
+static INLINE void highbd_dc_top_predictor(uint16_t *dst, ptrdiff_t stride,
+                                           int bw, int bh,
+                                           const uint16_t *above,
+                                           const uint16_t *left, int bd) {
+  int i, r, expected_dc, sum = 0;
+  (void)left;
+  (void)bd;
+
+  for (i = 0; i < bw; i++) sum += above[i];
+  expected_dc = (sum + (bw >> 1)) / bw;
+
+  for (r = 0; r < bh; r++) {
+    aom_memset16(dst, expected_dc, bw);
+    dst += stride;
+  }
+}
+
+static INLINE void highbd_dc_predictor(uint16_t *dst, ptrdiff_t stride, int bw,
+                                       int bh, const uint16_t *above,
+                                       const uint16_t *left, int bd) {
+  int i, r, expected_dc, sum = 0;
+  const int count = bw + bh;
+  (void)bd;
+
+  for (i = 0; i < bw; i++) {
+    sum += above[i];
+  }
+  for (i = 0; i < bh; i++) {
+    sum += left[i];
+  }
+
+  expected_dc = (sum + (count >> 1)) / count;
+
+  for (r = 0; r < bh; r++) {
+    aom_memset16(dst, expected_dc, bw);
+    dst += stride;
+  }
+}
+
+// Obtained similarly as DC_MULTIPLIER_1X2 and DC_MULTIPLIER_1X4 above, but
+// assume 2nd shift of 17 bits instead of 16.
+// Note: Strictly speaking, 2nd shift needs to be 17 only when:
+// - bit depth == 12, and
+// - bw + bh is divisible by 5 (as opposed to divisible by 3).
+// All other cases can use half the multipliers with a shift of 16 instead.
+// This special optimization can be used when writing assembly code.
+#define HIGHBD_DC_MULTIPLIER_1X2 0xAAAB
+// Note: This constant is odd, but a smaller even constant (0x199a) with the
+// appropriate shift should work for neon in 8/10-bit.
+#define HIGHBD_DC_MULTIPLIER_1X4 0x6667
+
+#define HIGHBD_DC_SHIFT2 17
+
+static INLINE void highbd_dc_predictor_rect(uint16_t *dst, ptrdiff_t stride,
+                                            int bw, int bh,
+                                            const uint16_t *above,
+                                            const uint16_t *left, int bd,
+                                            int shift1, uint32_t multiplier) {
+  int sum = 0;
+  (void)bd;
+
+  for (int i = 0; i < bw; i++) {
+    sum += above[i];
+  }
+  for (int i = 0; i < bh; i++) {
+    sum += left[i];
+  }
+
+  const int expected_dc = divide_using_multiply_shift(
+      sum + ((bw + bh) >> 1), shift1, multiplier, HIGHBD_DC_SHIFT2);
+  assert(expected_dc < (1 << bd));
+
+  for (int r = 0; r < bh; r++) {
+    aom_memset16(dst, expected_dc, bw);
+    dst += stride;
+  }
+}
+
+#undef HIGHBD_DC_SHIFT2
+
+void aom_highbd_dc_predictor_4x8_c(uint16_t *dst, ptrdiff_t stride,
+                                   const uint16_t *above, const uint16_t *left,
+                                   int bd) {
+  highbd_dc_predictor_rect(dst, stride, 4, 8, above, left, bd, 2,
+                           HIGHBD_DC_MULTIPLIER_1X2);
+}
+
+void aom_highbd_dc_predictor_8x4_c(uint16_t *dst, ptrdiff_t stride,
+                                   const uint16_t *above, const uint16_t *left,
+                                   int bd) {
+  highbd_dc_predictor_rect(dst, stride, 8, 4, above, left, bd, 2,
+                           HIGHBD_DC_MULTIPLIER_1X2);
+}
+
+void aom_highbd_dc_predictor_4x16_c(uint16_t *dst, ptrdiff_t stride,
+                                    const uint16_t *above, const uint16_t *left,
+                                    int bd) {
+  highbd_dc_predictor_rect(dst, stride, 4, 16, above, left, bd, 2,
+                           HIGHBD_DC_MULTIPLIER_1X4);
+}
+
+void aom_highbd_dc_predictor_16x4_c(uint16_t *dst, ptrdiff_t stride,
+                                    const uint16_t *above, const uint16_t *left,
+                                    int bd) {
+  highbd_dc_predictor_rect(dst, stride, 16, 4, above, left, bd, 2,
+                           HIGHBD_DC_MULTIPLIER_1X4);
+}
+
+void aom_highbd_dc_predictor_8x16_c(uint16_t *dst, ptrdiff_t stride,
+                                    const uint16_t *above, const uint16_t *left,
+                                    int bd) {
+  highbd_dc_predictor_rect(dst, stride, 8, 16, above, left, bd, 3,
+                           HIGHBD_DC_MULTIPLIER_1X2);
+}
+
+void aom_highbd_dc_predictor_16x8_c(uint16_t *dst, ptrdiff_t stride,
+                                    const uint16_t *above, const uint16_t *left,
+                                    int bd) {
+  highbd_dc_predictor_rect(dst, stride, 16, 8, above, left, bd, 3,
+                           HIGHBD_DC_MULTIPLIER_1X2);
+}
+
+void aom_highbd_dc_predictor_8x32_c(uint16_t *dst, ptrdiff_t stride,
+                                    const uint16_t *above, const uint16_t *left,
+                                    int bd) {
+  highbd_dc_predictor_rect(dst, stride, 8, 32, above, left, bd, 3,
+                           HIGHBD_DC_MULTIPLIER_1X4);
+}
+
+void aom_highbd_dc_predictor_32x8_c(uint16_t *dst, ptrdiff_t stride,
+                                    const uint16_t *above, const uint16_t *left,
+                                    int bd) {
+  highbd_dc_predictor_rect(dst, stride, 32, 8, above, left, bd, 3,
+                           HIGHBD_DC_MULTIPLIER_1X4);
+}
+
+void aom_highbd_dc_predictor_16x32_c(uint16_t *dst, ptrdiff_t stride,
+                                     const uint16_t *above,
+                                     const uint16_t *left, int bd) {
+  highbd_dc_predictor_rect(dst, stride, 16, 32, above, left, bd, 4,
+                           HIGHBD_DC_MULTIPLIER_1X2);
+}
+
+void aom_highbd_dc_predictor_32x16_c(uint16_t *dst, ptrdiff_t stride,
+                                     const uint16_t *above,
+                                     const uint16_t *left, int bd) {
+  highbd_dc_predictor_rect(dst, stride, 32, 16, above, left, bd, 4,
+                           HIGHBD_DC_MULTIPLIER_1X2);
+}
+
+void aom_highbd_dc_predictor_16x64_c(uint16_t *dst, ptrdiff_t stride,
+                                     const uint16_t *above,
+                                     const uint16_t *left, int bd) {
+  highbd_dc_predictor_rect(dst, stride, 16, 64, above, left, bd, 4,
+                           HIGHBD_DC_MULTIPLIER_1X4);
+}
+
+void aom_highbd_dc_predictor_64x16_c(uint16_t *dst, ptrdiff_t stride,
+                                     const uint16_t *above,
+                                     const uint16_t *left, int bd) {
+  highbd_dc_predictor_rect(dst, stride, 64, 16, above, left, bd, 4,
+                           HIGHBD_DC_MULTIPLIER_1X4);
+}
+
+void aom_highbd_dc_predictor_32x64_c(uint16_t *dst, ptrdiff_t stride,
+                                     const uint16_t *above,
+                                     const uint16_t *left, int bd) {
+  highbd_dc_predictor_rect(dst, stride, 32, 64, above, left, bd, 5,
+                           HIGHBD_DC_MULTIPLIER_1X2);
+}
+
+void aom_highbd_dc_predictor_64x32_c(uint16_t *dst, ptrdiff_t stride,
+                                     const uint16_t *above,
+                                     const uint16_t *left, int bd) {
+  highbd_dc_predictor_rect(dst, stride, 64, 32, above, left, bd, 5,
+                           HIGHBD_DC_MULTIPLIER_1X2);
+}
+
+#undef HIGHBD_DC_MULTIPLIER_1X2
+#undef HIGHBD_DC_MULTIPLIER_1X4
+
+// This serves as a wrapper function, so that all the prediction functions
+// can be unified and accessed as a pointer array. Note that the boundary
+// above and left are not necessarily used all the time.
+#define intra_pred_sized(type, width, height)                  \
+  void aom_##type##_predictor_##width##x##height##_c(          \
+      uint8_t *dst, ptrdiff_t stride, const uint8_t *above,    \
+      const uint8_t *left) {                                   \
+    type##_predictor(dst, stride, width, height, above, left); \
+  }
+
+#define intra_pred_highbd_sized(type, width, height)                        \
+  void aom_highbd_##type##_predictor_##width##x##height##_c(                \
+      uint16_t *dst, ptrdiff_t stride, const uint16_t *above,               \
+      const uint16_t *left, int bd) {                                       \
+    highbd_##type##_predictor(dst, stride, width, height, above, left, bd); \
+  }
+
+/* clang-format off */
+#define intra_pred_rectangular(type) \
+  intra_pred_sized(type, 4, 8) \
+  intra_pred_sized(type, 8, 4) \
+  intra_pred_sized(type, 8, 16) \
+  intra_pred_sized(type, 16, 8) \
+  intra_pred_sized(type, 16, 32) \
+  intra_pred_sized(type, 32, 16) \
+  intra_pred_sized(type, 32, 64) \
+  intra_pred_sized(type, 64, 32) \
+  intra_pred_sized(type, 4, 16) \
+  intra_pred_sized(type, 16, 4) \
+  intra_pred_sized(type, 8, 32) \
+  intra_pred_sized(type, 32, 8) \
+  intra_pred_sized(type, 16, 64) \
+  intra_pred_sized(type, 64, 16) \
+  intra_pred_highbd_sized(type, 4, 8) \
+  intra_pred_highbd_sized(type, 8, 4) \
+  intra_pred_highbd_sized(type, 8, 16) \
+  intra_pred_highbd_sized(type, 16, 8) \
+  intra_pred_highbd_sized(type, 16, 32) \
+  intra_pred_highbd_sized(type, 32, 16) \
+  intra_pred_highbd_sized(type, 32, 64) \
+  intra_pred_highbd_sized(type, 64, 32) \
+  intra_pred_highbd_sized(type, 4, 16) \
+  intra_pred_highbd_sized(type, 16, 4) \
+  intra_pred_highbd_sized(type, 8, 32) \
+  intra_pred_highbd_sized(type, 32, 8) \
+  intra_pred_highbd_sized(type, 16, 64) \
+  intra_pred_highbd_sized(type, 64, 16)
+#define intra_pred_above_4x4(type) \
+  intra_pred_sized(type, 8, 8) \
+  intra_pred_sized(type, 16, 16) \
+  intra_pred_sized(type, 32, 32) \
+  intra_pred_sized(type, 64, 64) \
+  intra_pred_highbd_sized(type, 4, 4) \
+  intra_pred_highbd_sized(type, 8, 8) \
+  intra_pred_highbd_sized(type, 16, 16) \
+  intra_pred_highbd_sized(type, 32, 32) \
+  intra_pred_highbd_sized(type, 64, 64) \
+  intra_pred_rectangular(type)
+#define intra_pred_allsizes(type) \
+  intra_pred_sized(type, 4, 4) \
+  intra_pred_above_4x4(type)
+#define intra_pred_square(type) \
+  intra_pred_sized(type, 4, 4) \
+  intra_pred_sized(type, 8, 8) \
+  intra_pred_sized(type, 16, 16) \
+  intra_pred_sized(type, 32, 32) \
+  intra_pred_sized(type, 64, 64) \
+  intra_pred_highbd_sized(type, 4, 4) \
+  intra_pred_highbd_sized(type, 8, 8) \
+  intra_pred_highbd_sized(type, 16, 16) \
+  intra_pred_highbd_sized(type, 32, 32) \
+  intra_pred_highbd_sized(type, 64, 64)
+
+intra_pred_allsizes(v)
+intra_pred_allsizes(h)
+intra_pred_allsizes(smooth)
+intra_pred_allsizes(smooth_v)
+intra_pred_allsizes(smooth_h)
+intra_pred_allsizes(paeth)
+intra_pred_allsizes(dc_128)
+intra_pred_allsizes(dc_left)
+intra_pred_allsizes(dc_top)
+intra_pred_square(dc)
+/* clang-format on */
+#undef intra_pred_allsizes

diff --git a/libav1/aom_dsp/intrapred_common.h b/libav1/aom_dsp/intrapred_common.h
new file mode 100644
index 0000000..3ec62a8
--- /dev/null
+++ b/libav1/aom_dsp/intrapred_common.h

@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_INTRAPRED_COMMON_H_
+#define AOM_AOM_DSP_INTRAPRED_COMMON_H_
+
+#include "config/aom_config.h"
+
+// Weights are quadratic from '1' to '1 / block_size', scaled by
+// 2^sm_weight_log2_scale.
+static const int sm_weight_log2_scale = 8;
+
+// max(block_size_wide[BLOCK_LARGEST], block_size_high[BLOCK_LARGEST])
+#define MAX_BLOCK_DIM 64
+
+/* clang-format off */
+static const uint8_t sm_weight_arrays[2 * MAX_BLOCK_DIM] = {
+  // Unused, because we always offset by bs, which is at least 2.
+  0, 0,
+  // bs = 2
+  255, 128,
+  // bs = 4
+  255, 149, 85, 64,
+  // bs = 8
+  255, 197, 146, 105, 73, 50, 37, 32,
+  // bs = 16
+  255, 225, 196, 170, 145, 123, 102, 84, 68, 54, 43, 33, 26, 20, 17, 16,
+  // bs = 32
+  255, 240, 225, 210, 196, 182, 169, 157, 145, 133, 122, 111, 101, 92, 83, 74,
+  66, 59, 52, 45, 39, 34, 29, 25, 21, 17, 14, 12, 10, 9, 8, 8,
+  // bs = 64
+  255, 248, 240, 233, 225, 218, 210, 203, 196, 189, 182, 176, 169, 163, 156,
+  150, 144, 138, 133, 127, 121, 116, 111, 106, 101, 96, 91, 86, 82, 77, 73, 69,
+  65, 61, 57, 54, 50, 47, 44, 41, 38, 35, 32, 29, 27, 25, 22, 20, 18, 16, 15,
+  13, 12, 10, 9, 8, 7, 6, 6, 5, 5, 4, 4, 4,
+};
+/* clang-format on */
+
+#endif  // AOM_AOM_DSP_INTRAPRED_COMMON_H_

diff --git a/libav1/aom_dsp/loopfilter.c b/libav1/aom_dsp/loopfilter.c
new file mode 100644
index 0000000..a3f2618
--- /dev/null
+++ b/libav1/aom_dsp/loopfilter.c

@@ -0,0 +1,925 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdlib.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_ports/mem.h"
+
+static INLINE int8_t signed_char_clamp(int t) {
+  return (int8_t)clamp(t, -128, 127);
+}
+
+static INLINE int16_t signed_char_clamp_high(int t, int bd) {
+  switch (bd) {
+    case 10: return (int16_t)clamp(t, -128 * 4, 128 * 4 - 1);
+    case 12: return (int16_t)clamp(t, -128 * 16, 128 * 16 - 1);
+    case 8:
+    default: return (int16_t)clamp(t, -128, 128 - 1);
+  }
+}
+
+// should we apply any filter at all: 11111111 yes, 00000000 no
+static INLINE int8_t filter_mask2(uint8_t limit, uint8_t blimit, uint8_t p1,
+                                  uint8_t p0, uint8_t q0, uint8_t q1) {
+  int8_t mask = 0;
+  mask |= (abs(p1 - p0) > limit) * -1;
+  mask |= (abs(q1 - q0) > limit) * -1;
+  mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > blimit) * -1;
+  return ~mask;
+}
+
+static INLINE int8_t filter_mask(uint8_t limit, uint8_t blimit, uint8_t p3,
+                                 uint8_t p2, uint8_t p1, uint8_t p0, uint8_t q0,
+                                 uint8_t q1, uint8_t q2, uint8_t q3) {
+  int8_t mask = 0;
+  mask |= (abs(p3 - p2) > limit) * -1;
+  mask |= (abs(p2 - p1) > limit) * -1;
+  mask |= (abs(p1 - p0) > limit) * -1;
+  mask |= (abs(q1 - q0) > limit) * -1;
+  mask |= (abs(q2 - q1) > limit) * -1;
+  mask |= (abs(q3 - q2) > limit) * -1;
+  mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > blimit) * -1;
+  return ~mask;
+}
+
+static INLINE int8_t filter_mask3_chroma(uint8_t limit, uint8_t blimit,
+                                         uint8_t p2, uint8_t p1, uint8_t p0,
+                                         uint8_t q0, uint8_t q1, uint8_t q2) {
+  int8_t mask = 0;
+  mask |= (abs(p2 - p1) > limit) * -1;
+  mask |= (abs(p1 - p0) > limit) * -1;
+  mask |= (abs(q1 - q0) > limit) * -1;
+  mask |= (abs(q2 - q1) > limit) * -1;
+  mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > blimit) * -1;
+  return ~mask;
+}
+
+static INLINE int8_t flat_mask3_chroma(uint8_t thresh, uint8_t p2, uint8_t p1,
+                                       uint8_t p0, uint8_t q0, uint8_t q1,
+                                       uint8_t q2) {
+  int8_t mask = 0;
+  mask |= (abs(p1 - p0) > thresh) * -1;
+  mask |= (abs(q1 - q0) > thresh) * -1;
+  mask |= (abs(p2 - p0) > thresh) * -1;
+  mask |= (abs(q2 - q0) > thresh) * -1;
+  return ~mask;
+}
+
+static INLINE int8_t flat_mask4(uint8_t thresh, uint8_t p3, uint8_t p2,
+                                uint8_t p1, uint8_t p0, uint8_t q0, uint8_t q1,
+                                uint8_t q2, uint8_t q3) {
+  int8_t mask = 0;
+  mask |= (abs(p1 - p0) > thresh) * -1;
+  mask |= (abs(q1 - q0) > thresh) * -1;
+  mask |= (abs(p2 - p0) > thresh) * -1;
+  mask |= (abs(q2 - q0) > thresh) * -1;
+  mask |= (abs(p3 - p0) > thresh) * -1;
+  mask |= (abs(q3 - q0) > thresh) * -1;
+  return ~mask;
+}
+
+// is there high edge variance internal edge: 11111111 yes, 00000000 no
+static INLINE int8_t hev_mask(uint8_t thresh, uint8_t p1, uint8_t p0,
+                              uint8_t q0, uint8_t q1) {
+  int8_t hev = 0;
+  hev |= (abs(p1 - p0) > thresh) * -1;
+  hev |= (abs(q1 - q0) > thresh) * -1;
+  return hev;
+}
+
+static INLINE void filter4(int8_t mask, uint8_t thresh, uint8_t *op1,
+                           uint8_t *op0, uint8_t *oq0, uint8_t *oq1) {
+  int8_t filter1, filter2;
+
+  const int8_t ps1 = (int8_t)*op1 ^ 0x80;
+  const int8_t ps0 = (int8_t)*op0 ^ 0x80;
+  const int8_t qs0 = (int8_t)*oq0 ^ 0x80;
+  const int8_t qs1 = (int8_t)*oq1 ^ 0x80;
+  const uint8_t hev = hev_mask(thresh, *op1, *op0, *oq0, *oq1);
+
+  // add outer taps if we have high edge variance
+  int8_t filter = signed_char_clamp(ps1 - qs1) & hev;
+
+  // inner taps
+  filter = signed_char_clamp(filter + 3 * (qs0 - ps0)) & mask;
+
+  // save bottom 3 bits so that we round one side +4 and the other +3
+  // if it equals 4 we'll set to adjust by -1 to account for the fact
+  // we'd round 3 the other way
+  filter1 = signed_char_clamp(filter + 4) >> 3;
+  filter2 = signed_char_clamp(filter + 3) >> 3;
+
+  *oq0 = signed_char_clamp(qs0 - filter1) ^ 0x80;
+  *op0 = signed_char_clamp(ps0 + filter2) ^ 0x80;
+
+  // outer tap adjustments
+  filter = ROUND_POWER_OF_TWO(filter1, 1) & ~hev;
+
+  *oq1 = signed_char_clamp(qs1 - filter) ^ 0x80;
+  *op1 = signed_char_clamp(ps1 + filter) ^ 0x80;
+}
+
+void aom_lpf_horizontal_4_c(uint8_t *s, int p /* pitch */,
+                            const uint8_t *blimit, const uint8_t *limit,
+                            const uint8_t *thresh) {
+  int i;
+  int count = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < count; ++i) {
+    const uint8_t p1 = s[-2 * p], p0 = s[-p];
+    const uint8_t q0 = s[0 * p], q1 = s[1 * p];
+    const int8_t mask = filter_mask2(*limit, *blimit, p1, p0, q0, q1);
+    filter4(mask, *thresh, s - 2 * p, s - 1 * p, s, s + 1 * p);
+    ++s;
+  }
+}
+
+void aom_lpf_horizontal_4_dual_c(uint8_t *s, int p, const uint8_t *blimit0,
+                                 const uint8_t *limit0, const uint8_t *thresh0,
+                                 const uint8_t *blimit1, const uint8_t *limit1,
+                                 const uint8_t *thresh1) {
+  aom_lpf_horizontal_4_c(s, p, blimit0, limit0, thresh0);
+  aom_lpf_horizontal_4_c(s + 4, p, blimit1, limit1, thresh1);
+}
+
+void aom_lpf_vertical_4_c(uint8_t *s, int pitch, const uint8_t *blimit,
+                          const uint8_t *limit, const uint8_t *thresh) {
+  int i;
+  int count = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < count; ++i) {
+    const uint8_t p1 = s[-2], p0 = s[-1];
+    const uint8_t q0 = s[0], q1 = s[1];
+    const int8_t mask = filter_mask2(*limit, *blimit, p1, p0, q0, q1);
+    filter4(mask, *thresh, s - 2, s - 1, s, s + 1);
+    s += pitch;
+  }
+}
+
+void aom_lpf_vertical_4_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0,
+                               const uint8_t *limit0, const uint8_t *thresh0,
+                               const uint8_t *blimit1, const uint8_t *limit1,
+                               const uint8_t *thresh1) {
+  aom_lpf_vertical_4_c(s, pitch, blimit0, limit0, thresh0);
+  aom_lpf_vertical_4_c(s + 4 * pitch, pitch, blimit1, limit1, thresh1);
+}
+
+static INLINE void filter6(int8_t mask, uint8_t thresh, int8_t flat,
+                           uint8_t *op2, uint8_t *op1, uint8_t *op0,
+                           uint8_t *oq0, uint8_t *oq1, uint8_t *oq2) {
+  if (flat && mask) {
+    const uint8_t p2 = *op2, p1 = *op1, p0 = *op0;
+    const uint8_t q0 = *oq0, q1 = *oq1, q2 = *oq2;
+
+    // 5-tap filter [1, 2, 2, 2, 1]
+    *op1 = ROUND_POWER_OF_TWO(p2 * 3 + p1 * 2 + p0 * 2 + q0, 3);
+    *op0 = ROUND_POWER_OF_TWO(p2 + p1 * 2 + p0 * 2 + q0 * 2 + q1, 3);
+    *oq0 = ROUND_POWER_OF_TWO(p1 + p0 * 2 + q0 * 2 + q1 * 2 + q2, 3);
+    *oq1 = ROUND_POWER_OF_TWO(p0 + q0 * 2 + q1 * 2 + q2 * 3, 3);
+  } else {
+    filter4(mask, thresh, op1, op0, oq0, oq1);
+  }
+}
+
+static INLINE void filter8(int8_t mask, uint8_t thresh, int8_t flat,
+                           uint8_t *op3, uint8_t *op2, uint8_t *op1,
+                           uint8_t *op0, uint8_t *oq0, uint8_t *oq1,
+                           uint8_t *oq2, uint8_t *oq3) {
+  if (flat && mask) {
+    const uint8_t p3 = *op3, p2 = *op2, p1 = *op1, p0 = *op0;
+    const uint8_t q0 = *oq0, q1 = *oq1, q2 = *oq2, q3 = *oq3;
+
+    // 7-tap filter [1, 1, 1, 2, 1, 1, 1]
+    *op2 = ROUND_POWER_OF_TWO(p3 + p3 + p3 + 2 * p2 + p1 + p0 + q0, 3);
+    *op1 = ROUND_POWER_OF_TWO(p3 + p3 + p2 + 2 * p1 + p0 + q0 + q1, 3);
+    *op0 = ROUND_POWER_OF_TWO(p3 + p2 + p1 + 2 * p0 + q0 + q1 + q2, 3);
+    *oq0 = ROUND_POWER_OF_TWO(p2 + p1 + p0 + 2 * q0 + q1 + q2 + q3, 3);
+    *oq1 = ROUND_POWER_OF_TWO(p1 + p0 + q0 + 2 * q1 + q2 + q3 + q3, 3);
+    *oq2 = ROUND_POWER_OF_TWO(p0 + q0 + q1 + 2 * q2 + q3 + q3 + q3, 3);
+  } else {
+    filter4(mask, thresh, op1, op0, oq0, oq1);
+  }
+}
+
+void aom_lpf_horizontal_6_c(uint8_t *s, int p, const uint8_t *blimit,
+                            const uint8_t *limit, const uint8_t *thresh) {
+  int i;
+  int count = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < count; ++i) {
+    const uint8_t p2 = s[-3 * p], p1 = s[-2 * p], p0 = s[-p];
+    const uint8_t q0 = s[0 * p], q1 = s[1 * p], q2 = s[2 * p];
+
+    const int8_t mask =
+        filter_mask3_chroma(*limit, *blimit, p2, p1, p0, q0, q1, q2);
+    const int8_t flat = flat_mask3_chroma(1, p2, p1, p0, q0, q1, q2);
+    filter6(mask, *thresh, flat, s - 3 * p, s - 2 * p, s - 1 * p, s, s + 1 * p,
+            s + 2 * p);
+    ++s;
+  }
+}
+
+void aom_lpf_horizontal_6_dual_c(uint8_t *s, int p, const uint8_t *blimit0,
+                                 const uint8_t *limit0, const uint8_t *thresh0,
+                                 const uint8_t *blimit1, const uint8_t *limit1,
+                                 const uint8_t *thresh1) {
+  aom_lpf_horizontal_6_c(s, p, blimit0, limit0, thresh0);
+  aom_lpf_horizontal_6_c(s + 4, p, blimit1, limit1, thresh1);
+}
+
+void aom_lpf_horizontal_8_c(uint8_t *s, int p, const uint8_t *blimit,
+                            const uint8_t *limit, const uint8_t *thresh) {
+  int i;
+  int count = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < count; ++i) {
+    const uint8_t p3 = s[-4 * p], p2 = s[-3 * p], p1 = s[-2 * p], p0 = s[-p];
+    const uint8_t q0 = s[0 * p], q1 = s[1 * p], q2 = s[2 * p], q3 = s[3 * p];
+
+    const int8_t mask =
+        filter_mask(*limit, *blimit, p3, p2, p1, p0, q0, q1, q2, q3);
+    const int8_t flat = flat_mask4(1, p3, p2, p1, p0, q0, q1, q2, q3);
+    filter8(mask, *thresh, flat, s - 4 * p, s - 3 * p, s - 2 * p, s - 1 * p, s,
+            s + 1 * p, s + 2 * p, s + 3 * p);
+    ++s;
+  }
+}
+
+void aom_lpf_horizontal_8_dual_c(uint8_t *s, int p, const uint8_t *blimit0,
+                                 const uint8_t *limit0, const uint8_t *thresh0,
+                                 const uint8_t *blimit1, const uint8_t *limit1,
+                                 const uint8_t *thresh1) {
+  aom_lpf_horizontal_8_c(s, p, blimit0, limit0, thresh0);
+  aom_lpf_horizontal_8_c(s + 4, p, blimit1, limit1, thresh1);
+}
+
+void aom_lpf_vertical_6_c(uint8_t *s, int pitch, const uint8_t *blimit,
+                          const uint8_t *limit, const uint8_t *thresh) {
+  int i;
+  int count = 4;
+
+  for (i = 0; i < count; ++i) {
+    const uint8_t p2 = s[-3], p1 = s[-2], p0 = s[-1];
+    const uint8_t q0 = s[0], q1 = s[1], q2 = s[2];
+    const int8_t mask =
+        filter_mask3_chroma(*limit, *blimit, p2, p1, p0, q0, q1, q2);
+    const int8_t flat = flat_mask3_chroma(1, p2, p1, p0, q0, q1, q2);
+    filter6(mask, *thresh, flat, s - 3, s - 2, s - 1, s, s + 1, s + 2);
+    s += pitch;
+  }
+}
+
+void aom_lpf_vertical_6_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0,
+                               const uint8_t *limit0, const uint8_t *thresh0,
+                               const uint8_t *blimit1, const uint8_t *limit1,
+                               const uint8_t *thresh1) {
+  aom_lpf_vertical_6_c(s, pitch, blimit0, limit0, thresh0);
+  aom_lpf_vertical_6_c(s + 4 * pitch, pitch, blimit1, limit1, thresh1);
+}
+
+void aom_lpf_vertical_8_c(uint8_t *s, int pitch, const uint8_t *blimit,
+                          const uint8_t *limit, const uint8_t *thresh) {
+  int i;
+  int count = 4;
+
+  for (i = 0; i < count; ++i) {
+    const uint8_t p3 = s[-4], p2 = s[-3], p1 = s[-2], p0 = s[-1];
+    const uint8_t q0 = s[0], q1 = s[1], q2 = s[2], q3 = s[3];
+    const int8_t mask =
+        filter_mask(*limit, *blimit, p3, p2, p1, p0, q0, q1, q2, q3);
+    const int8_t flat = flat_mask4(1, p3, p2, p1, p0, q0, q1, q2, q3);
+    filter8(mask, *thresh, flat, s - 4, s - 3, s - 2, s - 1, s, s + 1, s + 2,
+            s + 3);
+    s += pitch;
+  }
+}
+
+void aom_lpf_vertical_8_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0,
+                               const uint8_t *limit0, const uint8_t *thresh0,
+                               const uint8_t *blimit1, const uint8_t *limit1,
+                               const uint8_t *thresh1) {
+  aom_lpf_vertical_8_c(s, pitch, blimit0, limit0, thresh0);
+  aom_lpf_vertical_8_c(s + 4 * pitch, pitch, blimit1, limit1, thresh1);
+}
+
+static INLINE void filter14(int8_t mask, uint8_t thresh, int8_t flat,
+                            int8_t flat2, uint8_t *op6, uint8_t *op5,
+                            uint8_t *op4, uint8_t *op3, uint8_t *op2,
+                            uint8_t *op1, uint8_t *op0, uint8_t *oq0,
+                            uint8_t *oq1, uint8_t *oq2, uint8_t *oq3,
+                            uint8_t *oq4, uint8_t *oq5, uint8_t *oq6) {
+  if (flat2 && flat && mask) {
+    const uint8_t p6 = *op6, p5 = *op5, p4 = *op4, p3 = *op3, p2 = *op2,
+                  p1 = *op1, p0 = *op0;
+    const uint8_t q0 = *oq0, q1 = *oq1, q2 = *oq2, q3 = *oq3, q4 = *oq4,
+                  q5 = *oq5, q6 = *oq6;
+
+    // 13-tap filter [1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1]
+    *op5 = ROUND_POWER_OF_TWO(p6 * 7 + p5 * 2 + p4 * 2 + p3 + p2 + p1 + p0 + q0,
+                              4);
+    *op4 = ROUND_POWER_OF_TWO(
+        p6 * 5 + p5 * 2 + p4 * 2 + p3 * 2 + p2 + p1 + p0 + q0 + q1, 4);
+    *op3 = ROUND_POWER_OF_TWO(
+        p6 * 4 + p5 + p4 * 2 + p3 * 2 + p2 * 2 + p1 + p0 + q0 + q1 + q2, 4);
+    *op2 = ROUND_POWER_OF_TWO(
+        p6 * 3 + p5 + p4 + p3 * 2 + p2 * 2 + p1 * 2 + p0 + q0 + q1 + q2 + q3,
+        4);
+    *op1 = ROUND_POWER_OF_TWO(p6 * 2 + p5 + p4 + p3 + p2 * 2 + p1 * 2 + p0 * 2 +
+                                  q0 + q1 + q2 + q3 + q4,
+                              4);
+    *op0 = ROUND_POWER_OF_TWO(p6 + p5 + p4 + p3 + p2 + p1 * 2 + p0 * 2 +
+                                  q0 * 2 + q1 + q2 + q3 + q4 + q5,
+                              4);
+    *oq0 = ROUND_POWER_OF_TWO(p5 + p4 + p3 + p2 + p1 + p0 * 2 + q0 * 2 +
+                                  q1 * 2 + q2 + q3 + q4 + q5 + q6,
+                              4);
+    *oq1 = ROUND_POWER_OF_TWO(p4 + p3 + p2 + p1 + p0 + q0 * 2 + q1 * 2 +
+                                  q2 * 2 + q3 + q4 + q5 + q6 * 2,
+                              4);
+    *oq2 = ROUND_POWER_OF_TWO(
+        p3 + p2 + p1 + p0 + q0 + q1 * 2 + q2 * 2 + q3 * 2 + q4 + q5 + q6 * 3,
+        4);
+    *oq3 = ROUND_POWER_OF_TWO(
+        p2 + p1 + p0 + q0 + q1 + q2 * 2 + q3 * 2 + q4 * 2 + q5 + q6 * 4, 4);
+    *oq4 = ROUND_POWER_OF_TWO(
+        p1 + p0 + q0 + q1 + q2 + q3 * 2 + q4 * 2 + q5 * 2 + q6 * 5, 4);
+    *oq5 = ROUND_POWER_OF_TWO(p0 + q0 + q1 + q2 + q3 + q4 * 2 + q5 * 2 + q6 * 7,
+                              4);
+  } else {
+    filter8(mask, thresh, flat, op3, op2, op1, op0, oq0, oq1, oq2, oq3);
+  }
+}
+
+static void mb_lpf_horizontal_edge_w(uint8_t *s, int p, const uint8_t *blimit,
+                                     const uint8_t *limit,
+                                     const uint8_t *thresh, int count) {
+  int i;
+  int step = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < step * count; ++i) {
+    const uint8_t p6 = s[-7 * p], p5 = s[-6 * p], p4 = s[-5 * p],
+                  p3 = s[-4 * p], p2 = s[-3 * p], p1 = s[-2 * p], p0 = s[-p];
+    const uint8_t q0 = s[0 * p], q1 = s[1 * p], q2 = s[2 * p], q3 = s[3 * p],
+                  q4 = s[4 * p], q5 = s[5 * p], q6 = s[6 * p];
+    const int8_t mask =
+        filter_mask(*limit, *blimit, p3, p2, p1, p0, q0, q1, q2, q3);
+    const int8_t flat = flat_mask4(1, p3, p2, p1, p0, q0, q1, q2, q3);
+    const int8_t flat2 = flat_mask4(1, p6, p5, p4, p0, q0, q4, q5, q6);
+
+    filter14(mask, *thresh, flat, flat2, s - 7 * p, s - 6 * p, s - 5 * p,
+             s - 4 * p, s - 3 * p, s - 2 * p, s - 1 * p, s, s + 1 * p,
+             s + 2 * p, s + 3 * p, s + 4 * p, s + 5 * p, s + 6 * p);
+    ++s;
+  }
+}
+
+void aom_lpf_horizontal_14_c(uint8_t *s, int p, const uint8_t *blimit,
+                             const uint8_t *limit, const uint8_t *thresh) {
+  mb_lpf_horizontal_edge_w(s, p, blimit, limit, thresh, 1);
+}
+
+void aom_lpf_horizontal_14_dual_c(uint8_t *s, int p, const uint8_t *blimit0,
+                                  const uint8_t *limit0, const uint8_t *thresh0,
+                                  const uint8_t *blimit1, const uint8_t *limit1,
+                                  const uint8_t *thresh1) {
+  mb_lpf_horizontal_edge_w(s, p, blimit0, limit0, thresh0, 1);
+  mb_lpf_horizontal_edge_w(s + 4, p, blimit1, limit1, thresh1, 1);
+}
+
+static void mb_lpf_vertical_edge_w(uint8_t *s, int p, const uint8_t *blimit,
+                                   const uint8_t *limit, const uint8_t *thresh,
+                                   int count) {
+  int i;
+
+  for (i = 0; i < count; ++i) {
+    const uint8_t p6 = s[-7], p5 = s[-6], p4 = s[-5], p3 = s[-4], p2 = s[-3],
+                  p1 = s[-2], p0 = s[-1];
+    const uint8_t q0 = s[0], q1 = s[1], q2 = s[2], q3 = s[3], q4 = s[4],
+                  q5 = s[5], q6 = s[6];
+    const int8_t mask =
+        filter_mask(*limit, *blimit, p3, p2, p1, p0, q0, q1, q2, q3);
+    const int8_t flat = flat_mask4(1, p3, p2, p1, p0, q0, q1, q2, q3);
+    const int8_t flat2 = flat_mask4(1, p6, p5, p4, p0, q0, q4, q5, q6);
+
+    filter14(mask, *thresh, flat, flat2, s - 7, s - 6, s - 5, s - 4, s - 3,
+             s - 2, s - 1, s, s + 1, s + 2, s + 3, s + 4, s + 5, s + 6);
+    s += p;
+  }
+}
+
+void aom_lpf_vertical_14_c(uint8_t *s, int p, const uint8_t *blimit,
+                           const uint8_t *limit, const uint8_t *thresh) {
+  mb_lpf_vertical_edge_w(s, p, blimit, limit, thresh, 4);
+}
+
+void aom_lpf_vertical_14_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0,
+                                const uint8_t *limit0, const uint8_t *thresh0,
+                                const uint8_t *blimit1, const uint8_t *limit1,
+                                const uint8_t *thresh1) {
+  mb_lpf_vertical_edge_w(s, pitch, blimit0, limit0, thresh0, 4);
+  mb_lpf_vertical_edge_w(s + 4 * pitch, pitch, blimit1, limit1, thresh1, 4);
+}
+
+// Should we apply any filter at all: 11111111 yes, 00000000 no ?
+static INLINE int8_t highbd_filter_mask2(uint8_t limit, uint8_t blimit,
+                                         uint16_t p1, uint16_t p0, uint16_t q0,
+                                         uint16_t q1, int bd) {
+  int8_t mask = 0;
+  int16_t limit16 = (uint16_t)limit << (bd - 8);
+  int16_t blimit16 = (uint16_t)blimit << (bd - 8);
+  mask |= (abs(p1 - p0) > limit16) * -1;
+  mask |= (abs(q1 - q0) > limit16) * -1;
+  mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > blimit16) * -1;
+  return ~mask;
+}
+
+// Should we apply any filter at all: 11111111 yes, 00000000 no ?
+static INLINE int8_t highbd_filter_mask(uint8_t limit, uint8_t blimit,
+                                        uint16_t p3, uint16_t p2, uint16_t p1,
+                                        uint16_t p0, uint16_t q0, uint16_t q1,
+                                        uint16_t q2, uint16_t q3, int bd) {
+  int8_t mask = 0;
+  int16_t limit16 = (uint16_t)limit << (bd - 8);
+  int16_t blimit16 = (uint16_t)blimit << (bd - 8);
+  mask |= (abs(p3 - p2) > limit16) * -1;
+  mask |= (abs(p2 - p1) > limit16) * -1;
+  mask |= (abs(p1 - p0) > limit16) * -1;
+  mask |= (abs(q1 - q0) > limit16) * -1;
+  mask |= (abs(q2 - q1) > limit16) * -1;
+  mask |= (abs(q3 - q2) > limit16) * -1;
+  mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > blimit16) * -1;
+  return ~mask;
+}
+
+static INLINE int8_t highbd_filter_mask3_chroma(uint8_t limit, uint8_t blimit,
+                                                uint16_t p2, uint16_t p1,
+                                                uint16_t p0, uint16_t q0,
+                                                uint16_t q1, uint16_t q2,
+                                                int bd) {
+  int8_t mask = 0;
+  int16_t limit16 = (uint16_t)limit << (bd - 8);
+  int16_t blimit16 = (uint16_t)blimit << (bd - 8);
+  mask |= (abs(p2 - p1) > limit16) * -1;
+  mask |= (abs(p1 - p0) > limit16) * -1;
+  mask |= (abs(q1 - q0) > limit16) * -1;
+  mask |= (abs(q2 - q1) > limit16) * -1;
+  mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > blimit16) * -1;
+  return ~mask;
+}
+
+static INLINE int8_t highbd_flat_mask3_chroma(uint8_t thresh, uint16_t p2,
+                                              uint16_t p1, uint16_t p0,
+                                              uint16_t q0, uint16_t q1,
+                                              uint16_t q2, int bd) {
+  int8_t mask = 0;
+  int16_t thresh16 = (uint16_t)thresh << (bd - 8);
+  mask |= (abs(p1 - p0) > thresh16) * -1;
+  mask |= (abs(q1 - q0) > thresh16) * -1;
+  mask |= (abs(p2 - p0) > thresh16) * -1;
+  mask |= (abs(q2 - q0) > thresh16) * -1;
+  return ~mask;
+}
+
+static INLINE int8_t highbd_flat_mask4(uint8_t thresh, uint16_t p3, uint16_t p2,
+                                       uint16_t p1, uint16_t p0, uint16_t q0,
+                                       uint16_t q1, uint16_t q2, uint16_t q3,
+                                       int bd) {
+  int8_t mask = 0;
+  int16_t thresh16 = (uint16_t)thresh << (bd - 8);
+  mask |= (abs(p1 - p0) > thresh16) * -1;
+  mask |= (abs(q1 - q0) > thresh16) * -1;
+  mask |= (abs(p2 - p0) > thresh16) * -1;
+  mask |= (abs(q2 - q0) > thresh16) * -1;
+  mask |= (abs(p3 - p0) > thresh16) * -1;
+  mask |= (abs(q3 - q0) > thresh16) * -1;
+  return ~mask;
+}
+
+// Is there high edge variance internal edge:
+// 11111111_11111111 yes, 00000000_00000000 no ?
+static INLINE int16_t highbd_hev_mask(uint8_t thresh, uint16_t p1, uint16_t p0,
+                                      uint16_t q0, uint16_t q1, int bd) {
+  int16_t hev = 0;
+  int16_t thresh16 = (uint16_t)thresh << (bd - 8);
+  hev |= (abs(p1 - p0) > thresh16) * -1;
+  hev |= (abs(q1 - q0) > thresh16) * -1;
+  return hev;
+}
+
+static INLINE void highbd_filter4(int8_t mask, uint8_t thresh, uint16_t *op1,
+                                  uint16_t *op0, uint16_t *oq0, uint16_t *oq1,
+                                  int bd) {
+  int16_t filter1, filter2;
+  // ^0x80 equivalent to subtracting 0x80 from the values to turn them
+  // into -128 to +127 instead of 0 to 255.
+  int shift = bd - 8;
+  const int16_t ps1 = (int16_t)*op1 - (0x80 << shift);
+  const int16_t ps0 = (int16_t)*op0 - (0x80 << shift);
+  const int16_t qs0 = (int16_t)*oq0 - (0x80 << shift);
+  const int16_t qs1 = (int16_t)*oq1 - (0x80 << shift);
+  const uint16_t hev = highbd_hev_mask(thresh, *op1, *op0, *oq0, *oq1, bd);
+
+  // Add outer taps if we have high edge variance.
+  int16_t filter = signed_char_clamp_high(ps1 - qs1, bd) & hev;
+
+  // Inner taps.
+  filter = signed_char_clamp_high(filter + 3 * (qs0 - ps0), bd) & mask;
+
+  // Save bottom 3 bits so that we round one side +4 and the other +3
+  // if it equals 4 we'll set to adjust by -1 to account for the fact
+  // we'd round 3 the other way.
+  filter1 = signed_char_clamp_high(filter + 4, bd) >> 3;
+  filter2 = signed_char_clamp_high(filter + 3, bd) >> 3;
+
+  *oq0 = signed_char_clamp_high(qs0 - filter1, bd) + (0x80 << shift);
+  *op0 = signed_char_clamp_high(ps0 + filter2, bd) + (0x80 << shift);
+
+  // Outer tap adjustments.
+  filter = ROUND_POWER_OF_TWO(filter1, 1) & ~hev;
+
+  *oq1 = signed_char_clamp_high(qs1 - filter, bd) + (0x80 << shift);
+  *op1 = signed_char_clamp_high(ps1 + filter, bd) + (0x80 << shift);
+}
+
+void aom_highbd_lpf_horizontal_4_c(uint16_t *s, int p /* pitch */,
+                                   const uint8_t *blimit, const uint8_t *limit,
+                                   const uint8_t *thresh, int bd) {
+  int i;
+  int count = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < count; ++i) {
+    const uint16_t p1 = s[-2 * p];
+    const uint16_t p0 = s[-p];
+    const uint16_t q0 = s[0 * p];
+    const uint16_t q1 = s[1 * p];
+    const int8_t mask =
+        highbd_filter_mask2(*limit, *blimit, p1, p0, q0, q1, bd);
+    highbd_filter4(mask, *thresh, s - 2 * p, s - 1 * p, s, s + 1 * p, bd);
+    ++s;
+  }
+}
+
+void aom_highbd_lpf_horizontal_4_dual_c(
+    uint16_t *s, int p, const uint8_t *blimit0, const uint8_t *limit0,
+    const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1,
+    const uint8_t *thresh1, int bd) {
+  aom_highbd_lpf_horizontal_4_c(s, p, blimit0, limit0, thresh0, bd);
+  aom_highbd_lpf_horizontal_4_c(s + 4, p, blimit1, limit1, thresh1, bd);
+}
+
+void aom_highbd_lpf_vertical_4_c(uint16_t *s, int pitch, const uint8_t *blimit,
+                                 const uint8_t *limit, const uint8_t *thresh,
+                                 int bd) {
+  int i;
+  int count = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < count; ++i) {
+    const uint16_t p1 = s[-2], p0 = s[-1];
+    const uint16_t q0 = s[0], q1 = s[1];
+    const int8_t mask =
+        highbd_filter_mask2(*limit, *blimit, p1, p0, q0, q1, bd);
+    highbd_filter4(mask, *thresh, s - 2, s - 1, s, s + 1, bd);
+    s += pitch;
+  }
+}
+
+void aom_highbd_lpf_vertical_4_dual_c(
+    uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0,
+    const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1,
+    const uint8_t *thresh1, int bd) {
+  aom_highbd_lpf_vertical_4_c(s, pitch, blimit0, limit0, thresh0, bd);
+  aom_highbd_lpf_vertical_4_c(s + 4 * pitch, pitch, blimit1, limit1, thresh1,
+                              bd);
+}
+
+static INLINE void highbd_filter6(int8_t mask, uint8_t thresh, int8_t flat,
+                                  uint16_t *op2, uint16_t *op1, uint16_t *op0,
+                                  uint16_t *oq0, uint16_t *oq1, uint16_t *oq2,
+                                  int bd) {
+  if (flat && mask) {
+    const uint16_t p2 = *op2, p1 = *op1, p0 = *op0;
+    const uint16_t q0 = *oq0, q1 = *oq1, q2 = *oq2;
+
+    // 5-tap filter [1, 2, 2, 2, 1]
+    *op1 = ROUND_POWER_OF_TWO(p2 * 3 + p1 * 2 + p0 * 2 + q0, 3);
+    *op0 = ROUND_POWER_OF_TWO(p2 + p1 * 2 + p0 * 2 + q0 * 2 + q1, 3);
+    *oq0 = ROUND_POWER_OF_TWO(p1 + p0 * 2 + q0 * 2 + q1 * 2 + q2, 3);
+    *oq1 = ROUND_POWER_OF_TWO(p0 + q0 * 2 + q1 * 2 + q2 * 3, 3);
+  } else {
+    highbd_filter4(mask, thresh, op1, op0, oq0, oq1, bd);
+  }
+}
+
+static INLINE void highbd_filter8(int8_t mask, uint8_t thresh, int8_t flat,
+                                  uint16_t *op3, uint16_t *op2, uint16_t *op1,
+                                  uint16_t *op0, uint16_t *oq0, uint16_t *oq1,
+                                  uint16_t *oq2, uint16_t *oq3, int bd) {
+  if (flat && mask) {
+    const uint16_t p3 = *op3, p2 = *op2, p1 = *op1, p0 = *op0;
+    const uint16_t q0 = *oq0, q1 = *oq1, q2 = *oq2, q3 = *oq3;
+
+    // 7-tap filter [1, 1, 1, 2, 1, 1, 1]
+    *op2 = ROUND_POWER_OF_TWO(p3 + p3 + p3 + 2 * p2 + p1 + p0 + q0, 3);
+    *op1 = ROUND_POWER_OF_TWO(p3 + p3 + p2 + 2 * p1 + p0 + q0 + q1, 3);
+    *op0 = ROUND_POWER_OF_TWO(p3 + p2 + p1 + 2 * p0 + q0 + q1 + q2, 3);
+    *oq0 = ROUND_POWER_OF_TWO(p2 + p1 + p0 + 2 * q0 + q1 + q2 + q3, 3);
+    *oq1 = ROUND_POWER_OF_TWO(p1 + p0 + q0 + 2 * q1 + q2 + q3 + q3, 3);
+    *oq2 = ROUND_POWER_OF_TWO(p0 + q0 + q1 + 2 * q2 + q3 + q3 + q3, 3);
+  } else {
+    highbd_filter4(mask, thresh, op1, op0, oq0, oq1, bd);
+  }
+}
+
+void aom_highbd_lpf_horizontal_8_c(uint16_t *s, int p, const uint8_t *blimit,
+                                   const uint8_t *limit, const uint8_t *thresh,
+                                   int bd) {
+  int i;
+  int count = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < count; ++i) {
+    const uint16_t p3 = s[-4 * p], p2 = s[-3 * p], p1 = s[-2 * p], p0 = s[-p];
+    const uint16_t q0 = s[0 * p], q1 = s[1 * p], q2 = s[2 * p], q3 = s[3 * p];
+
+    const int8_t mask =
+        highbd_filter_mask(*limit, *blimit, p3, p2, p1, p0, q0, q1, q2, q3, bd);
+    const int8_t flat =
+        highbd_flat_mask4(1, p3, p2, p1, p0, q0, q1, q2, q3, bd);
+    highbd_filter8(mask, *thresh, flat, s - 4 * p, s - 3 * p, s - 2 * p,
+                   s - 1 * p, s, s + 1 * p, s + 2 * p, s + 3 * p, bd);
+    ++s;
+  }
+}
+
+void aom_highbd_lpf_horizontal_6_c(uint16_t *s, int p, const uint8_t *blimit,
+                                   const uint8_t *limit, const uint8_t *thresh,
+                                   int bd) {
+  int i;
+  int count = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < count; ++i) {
+    const uint16_t p2 = s[-3 * p], p1 = s[-2 * p], p0 = s[-p];
+    const uint16_t q0 = s[0 * p], q1 = s[1 * p], q2 = s[2 * p];
+
+    const int8_t mask =
+        highbd_filter_mask3_chroma(*limit, *blimit, p2, p1, p0, q0, q1, q2, bd);
+    const int8_t flat = highbd_flat_mask3_chroma(1, p2, p1, p0, q0, q1, q2, bd);
+    highbd_filter6(mask, *thresh, flat, s - 3 * p, s - 2 * p, s - 1 * p, s,
+                   s + 1 * p, s + 2 * p, bd);
+    ++s;
+  }
+}
+
+void aom_highbd_lpf_horizontal_6_dual_c(
+    uint16_t *s, int p, const uint8_t *blimit0, const uint8_t *limit0,
+    const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1,
+    const uint8_t *thresh1, int bd) {
+  aom_highbd_lpf_horizontal_6_c(s, p, blimit0, limit0, thresh0, bd);
+  aom_highbd_lpf_horizontal_6_c(s + 4, p, blimit1, limit1, thresh1, bd);
+}
+
+void aom_highbd_lpf_horizontal_8_dual_c(
+    uint16_t *s, int p, const uint8_t *blimit0, const uint8_t *limit0,
+    const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1,
+    const uint8_t *thresh1, int bd) {
+  aom_highbd_lpf_horizontal_8_c(s, p, blimit0, limit0, thresh0, bd);
+  aom_highbd_lpf_horizontal_8_c(s + 4, p, blimit1, limit1, thresh1, bd);
+}
+
+void aom_highbd_lpf_vertical_6_c(uint16_t *s, int pitch, const uint8_t *blimit,
+                                 const uint8_t *limit, const uint8_t *thresh,
+                                 int bd) {
+  int i;
+  int count = 4;
+
+  for (i = 0; i < count; ++i) {
+    const uint16_t p2 = s[-3], p1 = s[-2], p0 = s[-1];
+    const uint16_t q0 = s[0], q1 = s[1], q2 = s[2];
+    const int8_t mask =
+        highbd_filter_mask3_chroma(*limit, *blimit, p2, p1, p0, q0, q1, q2, bd);
+    const int8_t flat = highbd_flat_mask3_chroma(1, p2, p1, p0, q0, q1, q2, bd);
+    highbd_filter6(mask, *thresh, flat, s - 3, s - 2, s - 1, s, s + 1, s + 2,
+                   bd);
+    s += pitch;
+  }
+}
+
+void aom_highbd_lpf_vertical_6_dual_c(
+    uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0,
+    const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1,
+    const uint8_t *thresh1, int bd) {
+  aom_highbd_lpf_vertical_6_c(s, pitch, blimit0, limit0, thresh0, bd);
+  aom_highbd_lpf_vertical_6_c(s + 4 * pitch, pitch, blimit1, limit1, thresh1,
+                              bd);
+}
+
+void aom_highbd_lpf_vertical_8_c(uint16_t *s, int pitch, const uint8_t *blimit,
+                                 const uint8_t *limit, const uint8_t *thresh,
+                                 int bd) {
+  int i;
+  int count = 4;
+
+  for (i = 0; i < count; ++i) {
+    const uint16_t p3 = s[-4], p2 = s[-3], p1 = s[-2], p0 = s[-1];
+    const uint16_t q0 = s[0], q1 = s[1], q2 = s[2], q3 = s[3];
+    const int8_t mask =
+        highbd_filter_mask(*limit, *blimit, p3, p2, p1, p0, q0, q1, q2, q3, bd);
+    const int8_t flat =
+        highbd_flat_mask4(1, p3, p2, p1, p0, q0, q1, q2, q3, bd);
+    highbd_filter8(mask, *thresh, flat, s - 4, s - 3, s - 2, s - 1, s, s + 1,
+                   s + 2, s + 3, bd);
+    s += pitch;
+  }
+}
+
+void aom_highbd_lpf_vertical_8_dual_c(
+    uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0,
+    const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1,
+    const uint8_t *thresh1, int bd) {
+  aom_highbd_lpf_vertical_8_c(s, pitch, blimit0, limit0, thresh0, bd);
+  aom_highbd_lpf_vertical_8_c(s + 4 * pitch, pitch, blimit1, limit1, thresh1,
+                              bd);
+}
+
+static INLINE void highbd_filter14(int8_t mask, uint8_t thresh, int8_t flat,
+                                   int8_t flat2, uint16_t *op6, uint16_t *op5,
+                                   uint16_t *op4, uint16_t *op3, uint16_t *op2,
+                                   uint16_t *op1, uint16_t *op0, uint16_t *oq0,
+                                   uint16_t *oq1, uint16_t *oq2, uint16_t *oq3,
+                                   uint16_t *oq4, uint16_t *oq5, uint16_t *oq6,
+                                   int bd) {
+  if (flat2 && flat && mask) {
+    const uint16_t p6 = *op6;
+    const uint16_t p5 = *op5;
+    const uint16_t p4 = *op4;
+    const uint16_t p3 = *op3;
+    const uint16_t p2 = *op2;
+    const uint16_t p1 = *op1;
+    const uint16_t p0 = *op0;
+    const uint16_t q0 = *oq0;
+    const uint16_t q1 = *oq1;
+    const uint16_t q2 = *oq2;
+    const uint16_t q3 = *oq3;
+    const uint16_t q4 = *oq4;
+    const uint16_t q5 = *oq5;
+    const uint16_t q6 = *oq6;
+
+    // 13-tap filter [1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1]
+    *op5 = ROUND_POWER_OF_TWO(p6 * 7 + p5 * 2 + p4 * 2 + p3 + p2 + p1 + p0 + q0,
+                              4);
+    *op4 = ROUND_POWER_OF_TWO(
+        p6 * 5 + p5 * 2 + p4 * 2 + p3 * 2 + p2 + p1 + p0 + q0 + q1, 4);
+    *op3 = ROUND_POWER_OF_TWO(
+        p6 * 4 + p5 + p4 * 2 + p3 * 2 + p2 * 2 + p1 + p0 + q0 + q1 + q2, 4);
+    *op2 = ROUND_POWER_OF_TWO(
+        p6 * 3 + p5 + p4 + p3 * 2 + p2 * 2 + p1 * 2 + p0 + q0 + q1 + q2 + q3,
+        4);
+    *op1 = ROUND_POWER_OF_TWO(p6 * 2 + p5 + p4 + p3 + p2 * 2 + p1 * 2 + p0 * 2 +
+                                  q0 + q1 + q2 + q3 + q4,
+                              4);
+    *op0 = ROUND_POWER_OF_TWO(p6 + p5 + p4 + p3 + p2 + p1 * 2 + p0 * 2 +
+                                  q0 * 2 + q1 + q2 + q3 + q4 + q5,
+                              4);
+    *oq0 = ROUND_POWER_OF_TWO(p5 + p4 + p3 + p2 + p1 + p0 * 2 + q0 * 2 +
+                                  q1 * 2 + q2 + q3 + q4 + q5 + q6,
+                              4);
+    *oq1 = ROUND_POWER_OF_TWO(p4 + p3 + p2 + p1 + p0 + q0 * 2 + q1 * 2 +
+                                  q2 * 2 + q3 + q4 + q5 + q6 * 2,
+                              4);
+    *oq2 = ROUND_POWER_OF_TWO(
+        p3 + p2 + p1 + p0 + q0 + q1 * 2 + q2 * 2 + q3 * 2 + q4 + q5 + q6 * 3,
+        4);
+    *oq3 = ROUND_POWER_OF_TWO(
+        p2 + p1 + p0 + q0 + q1 + q2 * 2 + q3 * 2 + q4 * 2 + q5 + q6 * 4, 4);
+    *oq4 = ROUND_POWER_OF_TWO(
+        p1 + p0 + q0 + q1 + q2 + q3 * 2 + q4 * 2 + q5 * 2 + q6 * 5, 4);
+    *oq5 = ROUND_POWER_OF_TWO(p0 + q0 + q1 + q2 + q3 + q4 * 2 + q5 * 2 + q6 * 7,
+                              4);
+  } else {
+    highbd_filter8(mask, thresh, flat, op3, op2, op1, op0, oq0, oq1, oq2, oq3,
+                   bd);
+  }
+}
+
+static void highbd_mb_lpf_horizontal_edge_w(uint16_t *s, int p,
+                                            const uint8_t *blimit,
+                                            const uint8_t *limit,
+                                            const uint8_t *thresh, int count,
+                                            int bd) {
+  int i;
+  int step = 4;
+
+  // loop filter designed to work using chars so that we can make maximum use
+  // of 8 bit simd instructions.
+  for (i = 0; i < step * count; ++i) {
+    const uint16_t p3 = s[-4 * p];
+    const uint16_t p2 = s[-3 * p];
+    const uint16_t p1 = s[-2 * p];
+    const uint16_t p0 = s[-p];
+    const uint16_t q0 = s[0 * p];
+    const uint16_t q1 = s[1 * p];
+    const uint16_t q2 = s[2 * p];
+    const uint16_t q3 = s[3 * p];
+    const int8_t mask =
+        highbd_filter_mask(*limit, *blimit, p3, p2, p1, p0, q0, q1, q2, q3, bd);
+    const int8_t flat =
+        highbd_flat_mask4(1, p3, p2, p1, p0, q0, q1, q2, q3, bd);
+
+    const int8_t flat2 =
+        highbd_flat_mask4(1, s[-7 * p], s[-6 * p], s[-5 * p], p0, q0, s[4 * p],
+                          s[5 * p], s[6 * p], bd);
+
+    highbd_filter14(mask, *thresh, flat, flat2, s - 7 * p, s - 6 * p, s - 5 * p,
+                    s - 4 * p, s - 3 * p, s - 2 * p, s - 1 * p, s, s + 1 * p,
+                    s + 2 * p, s + 3 * p, s + 4 * p, s + 5 * p, s + 6 * p, bd);
+    ++s;
+  }
+}
+
+void aom_highbd_lpf_horizontal_14_c(uint16_t *s, int p, const uint8_t *blimit,
+                                    const uint8_t *limit, const uint8_t *thresh,
+                                    int bd) {
+  highbd_mb_lpf_horizontal_edge_w(s, p, blimit, limit, thresh, 1, bd);
+}
+
+void aom_highbd_lpf_horizontal_14_dual_c(
+    uint16_t *s, int p, const uint8_t *blimit0, const uint8_t *limit0,
+    const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1,
+    const uint8_t *thresh1, int bd) {
+  highbd_mb_lpf_horizontal_edge_w(s, p, blimit0, limit0, thresh0, 1, bd);
+  highbd_mb_lpf_horizontal_edge_w(s + 4, p, blimit1, limit1, thresh1, 1, bd);
+}
+
+static void highbd_mb_lpf_vertical_edge_w(uint16_t *s, int p,
+                                          const uint8_t *blimit,
+                                          const uint8_t *limit,
+                                          const uint8_t *thresh, int count,
+                                          int bd) {
+  int i;
+
+  for (i = 0; i < count; ++i) {
+    const uint16_t p3 = s[-4];
+    const uint16_t p2 = s[-3];
+    const uint16_t p1 = s[-2];
+    const uint16_t p0 = s[-1];
+    const uint16_t q0 = s[0];
+    const uint16_t q1 = s[1];
+    const uint16_t q2 = s[2];
+    const uint16_t q3 = s[3];
+    const int8_t mask =
+        highbd_filter_mask(*limit, *blimit, p3, p2, p1, p0, q0, q1, q2, q3, bd);
+    const int8_t flat =
+        highbd_flat_mask4(1, p3, p2, p1, p0, q0, q1, q2, q3, bd);
+    const int8_t flat2 =
+        highbd_flat_mask4(1, s[-7], s[-6], s[-5], p0, q0, s[4], s[5], s[6], bd);
+
+    highbd_filter14(mask, *thresh, flat, flat2, s - 7, s - 6, s - 5, s - 4,
+                    s - 3, s - 2, s - 1, s, s + 1, s + 2, s + 3, s + 4, s + 5,
+                    s + 6, bd);
+    s += p;
+  }
+}
+
+void aom_highbd_lpf_vertical_14_c(uint16_t *s, int p, const uint8_t *blimit,
+                                  const uint8_t *limit, const uint8_t *thresh,
+                                  int bd) {
+  highbd_mb_lpf_vertical_edge_w(s, p, blimit, limit, thresh, 4, bd);
+}
+
+void aom_highbd_lpf_vertical_14_dual_c(
+    uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0,
+    const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1,
+    const uint8_t *thresh1, int bd) {
+  highbd_mb_lpf_vertical_edge_w(s, pitch, blimit0, limit0, thresh0, 4, bd);
+  highbd_mb_lpf_vertical_edge_w(s + 4 * pitch, pitch, blimit1, limit1, thresh1,
+                                4, bd);
+}

diff --git a/libav1/aom_dsp/postproc.h b/libav1/aom_dsp/postproc.h
new file mode 100644
index 0000000..f3d87f2
--- /dev/null
+++ b/libav1/aom_dsp/postproc.h

@@ -0,0 +1,26 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_POSTPROC_H_
+#define AOM_AOM_DSP_POSTPROC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Fills a noise buffer with gaussian noise strength determined by sigma.
+int aom_setup_noise(double sigma, int size, char *noise);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  // AOM_AOM_DSP_POSTPROC_H_

diff --git a/libav1/aom_dsp/prob.h b/libav1/aom_dsp/prob.h
new file mode 100644
index 0000000..20ffdea
--- /dev/null
+++ b/libav1/aom_dsp/prob.h

@@ -0,0 +1,671 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_PROB_H_
+#define AOM_AOM_DSP_PROB_H_
+
+#include <assert.h>
+#include <stdio.h>
+
+#include "config/aom_config.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/entcode.h"
+#include "aom_ports/bitops.h"
+#include "aom_ports/mem.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// TODO(negge): Rename this aom_prob once we remove vpxbool.
+typedef uint16_t aom_cdf_prob;
+
+#define CDF_SIZE(x) ((x) + 1)
+#define CDF_PROB_BITS 15
+#define CDF_PROB_TOP (1 << CDF_PROB_BITS)
+#define CDF_INIT_TOP 32768
+#define CDF_SHIFT (15 - CDF_PROB_BITS)
+/*The value stored in an iCDF is CDF_PROB_TOP minus the actual cumulative
+  probability (an "inverse" CDF).
+  This function converts from one representation to the other (and is its own
+  inverse).*/
+#define AOM_ICDF(x) (CDF_PROB_TOP - (x))
+
+#if CDF_SHIFT == 0
+
+#define AOM_CDF2(a0) AOM_ICDF(a0), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF3(a0, a1) AOM_ICDF(a0), AOM_ICDF(a1), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF4(a0, a1, a2) \
+  AOM_ICDF(a0), AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF5(a0, a1, a2, a3) \
+  AOM_ICDF(a0)                   \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF6(a0, a1, a2, a3, a4)                        \
+  AOM_ICDF(a0)                                              \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF7(a0, a1, a2, a3, a4, a5)                                  \
+  AOM_ICDF(a0)                                                            \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5), \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF8(a0, a1, a2, a3, a4, a5, a6)                              \
+  AOM_ICDF(a0)                                                            \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5), \
+      AOM_ICDF(a6), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF9(a0, a1, a2, a3, a4, a5, a6, a7)                          \
+  AOM_ICDF(a0)                                                            \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5), \
+      AOM_ICDF(a6), AOM_ICDF(a7), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF10(a0, a1, a2, a3, a4, a5, a6, a7, a8)                     \
+  AOM_ICDF(a0)                                                            \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5), \
+      AOM_ICDF(a6), AOM_ICDF(a7), AOM_ICDF(a8), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF11(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9)                 \
+  AOM_ICDF(a0)                                                            \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5), \
+      AOM_ICDF(a6), AOM_ICDF(a7), AOM_ICDF(a8), AOM_ICDF(a9),             \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF12(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10)               \
+  AOM_ICDF(a0)                                                               \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5),    \
+      AOM_ICDF(a6), AOM_ICDF(a7), AOM_ICDF(a8), AOM_ICDF(a9), AOM_ICDF(a10), \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF13(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11)          \
+  AOM_ICDF(a0)                                                               \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5),    \
+      AOM_ICDF(a6), AOM_ICDF(a7), AOM_ICDF(a8), AOM_ICDF(a9), AOM_ICDF(a10), \
+      AOM_ICDF(a11), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF14(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12)     \
+  AOM_ICDF(a0)                                                               \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5),    \
+      AOM_ICDF(a6), AOM_ICDF(a7), AOM_ICDF(a8), AOM_ICDF(a9), AOM_ICDF(a10), \
+      AOM_ICDF(a11), AOM_ICDF(a12), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF15(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13) \
+  AOM_ICDF(a0)                                                                \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5),     \
+      AOM_ICDF(a6), AOM_ICDF(a7), AOM_ICDF(a8), AOM_ICDF(a9), AOM_ICDF(a10),  \
+      AOM_ICDF(a11), AOM_ICDF(a12), AOM_ICDF(a13), AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF16(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, \
+                  a14)                                                        \
+  AOM_ICDF(a0)                                                                \
+  , AOM_ICDF(a1), AOM_ICDF(a2), AOM_ICDF(a3), AOM_ICDF(a4), AOM_ICDF(a5),     \
+      AOM_ICDF(a6), AOM_ICDF(a7), AOM_ICDF(a8), AOM_ICDF(a9), AOM_ICDF(a10),  \
+      AOM_ICDF(a11), AOM_ICDF(a12), AOM_ICDF(a13), AOM_ICDF(a14),             \
+      AOM_ICDF(CDF_PROB_TOP), 0
+
+#else
+#define AOM_CDF2(a0)                                       \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 2) + \
+            ((CDF_INIT_TOP - 2) >> 1)) /                   \
+               ((CDF_INIT_TOP - 2)) +                      \
+           1)                                              \
+  , AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF3(a0, a1)                                       \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 3) +     \
+            ((CDF_INIT_TOP - 3) >> 1)) /                       \
+               ((CDF_INIT_TOP - 3)) +                          \
+           1)                                                  \
+  ,                                                            \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 3) + \
+                ((CDF_INIT_TOP - 3) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 3)) +                      \
+               2),                                             \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF4(a0, a1, a2)                                   \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 4) +     \
+            ((CDF_INIT_TOP - 4) >> 1)) /                       \
+               ((CDF_INIT_TOP - 4)) +                          \
+           1)                                                  \
+  ,                                                            \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 4) + \
+                ((CDF_INIT_TOP - 4) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 4)) +                      \
+               2),                                             \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 4) + \
+                ((CDF_INIT_TOP - 4) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 4)) +                      \
+               3),                                             \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF5(a0, a1, a2, a3)                               \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 5) +     \
+            ((CDF_INIT_TOP - 5) >> 1)) /                       \
+               ((CDF_INIT_TOP - 5)) +                          \
+           1)                                                  \
+  ,                                                            \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 5) + \
+                ((CDF_INIT_TOP - 5) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 5)) +                      \
+               2),                                             \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 5) + \
+                ((CDF_INIT_TOP - 5) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 5)) +                      \
+               3),                                             \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 5) + \
+                ((CDF_INIT_TOP - 5) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 5)) +                      \
+               4),                                             \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF6(a0, a1, a2, a3, a4)                           \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) +     \
+            ((CDF_INIT_TOP - 6) >> 1)) /                       \
+               ((CDF_INIT_TOP - 6)) +                          \
+           1)                                                  \
+  ,                                                            \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) + \
+                ((CDF_INIT_TOP - 6) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 6)) +                      \
+               2),                                             \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) + \
+                ((CDF_INIT_TOP - 6) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 6)) +                      \
+               3),                                             \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) + \
+                ((CDF_INIT_TOP - 6) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 6)) +                      \
+               4),                                             \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) + \
+                ((CDF_INIT_TOP - 6) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 6)) +                      \
+               5),                                             \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF7(a0, a1, a2, a3, a4, a5)                       \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) +     \
+            ((CDF_INIT_TOP - 7) >> 1)) /                       \
+               ((CDF_INIT_TOP - 7)) +                          \
+           1)                                                  \
+  ,                                                            \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
+                ((CDF_INIT_TOP - 7) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 7)) +                      \
+               2),                                             \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
+                ((CDF_INIT_TOP - 7) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 7)) +                      \
+               3),                                             \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
+                ((CDF_INIT_TOP - 7) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 7)) +                      \
+               4),                                             \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
+                ((CDF_INIT_TOP - 7) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 7)) +                      \
+               5),                                             \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
+                ((CDF_INIT_TOP - 7) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 7)) +                      \
+               6),                                             \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF8(a0, a1, a2, a3, a4, a5, a6)                   \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) +     \
+            ((CDF_INIT_TOP - 8) >> 1)) /                       \
+               ((CDF_INIT_TOP - 8)) +                          \
+           1)                                                  \
+  ,                                                            \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
+                ((CDF_INIT_TOP - 8) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 8)) +                      \
+               2),                                             \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
+                ((CDF_INIT_TOP - 8) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 8)) +                      \
+               3),                                             \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
+                ((CDF_INIT_TOP - 8) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 8)) +                      \
+               4),                                             \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
+                ((CDF_INIT_TOP - 8) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 8)) +                      \
+               5),                                             \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
+                ((CDF_INIT_TOP - 8) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 8)) +                      \
+               6),                                             \
+      AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
+                ((CDF_INIT_TOP - 8) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 8)) +                      \
+               7),                                             \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF9(a0, a1, a2, a3, a4, a5, a6, a7)               \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) +     \
+            ((CDF_INIT_TOP - 9) >> 1)) /                       \
+               ((CDF_INIT_TOP - 9)) +                          \
+           1)                                                  \
+  ,                                                            \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
+                ((CDF_INIT_TOP - 9) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 9)) +                      \
+               2),                                             \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
+                ((CDF_INIT_TOP - 9) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 9)) +                      \
+               3),                                             \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
+                ((CDF_INIT_TOP - 9) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 9)) +                      \
+               4),                                             \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
+                ((CDF_INIT_TOP - 9) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 9)) +                      \
+               5),                                             \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
+                ((CDF_INIT_TOP - 9) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 9)) +                      \
+               6),                                             \
+      AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
+                ((CDF_INIT_TOP - 9) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 9)) +                      \
+               7),                                             \
+      AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
+                ((CDF_INIT_TOP - 9) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 9)) +                      \
+               8),                                             \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF10(a0, a1, a2, a3, a4, a5, a6, a7, a8)           \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) +     \
+            ((CDF_INIT_TOP - 10) >> 1)) /                       \
+               ((CDF_INIT_TOP - 10)) +                          \
+           1)                                                   \
+  ,                                                             \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
+                ((CDF_INIT_TOP - 10) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 10)) +                      \
+               2),                                              \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
+                ((CDF_INIT_TOP - 10) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 10)) +                      \
+               3),                                              \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
+                ((CDF_INIT_TOP - 10) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 10)) +                      \
+               4),                                              \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
+                ((CDF_INIT_TOP - 10) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 10)) +                      \
+               5),                                              \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
+                ((CDF_INIT_TOP - 10) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 10)) +                      \
+               6),                                              \
+      AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
+                ((CDF_INIT_TOP - 10) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 10)) +                      \
+               7),                                              \
+      AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
+                ((CDF_INIT_TOP - 10) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 10)) +                      \
+               8),                                              \
+      AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
+                ((CDF_INIT_TOP - 10) >> 1)) /                   \
+                   ((CDF_INIT_TOP - 10)) +                      \
+               9),                                              \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF11(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9)        \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) +      \
+            ((CDF_INIT_TOP - 11) >> 1)) /                        \
+               ((CDF_INIT_TOP - 11)) +                           \
+           1)                                                    \
+  ,                                                              \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) +  \
+                ((CDF_INIT_TOP - 11) >> 1)) /                    \
+                   ((CDF_INIT_TOP - 11)) +                       \
+               2),                                               \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) +  \
+                ((CDF_INIT_TOP - 11) >> 1)) /                    \
+                   ((CDF_INIT_TOP - 11)) +                       \
+               3),                                               \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) +  \
+                ((CDF_INIT_TOP - 11) >> 1)) /                    \
+                   ((CDF_INIT_TOP - 11)) +                       \
+               4),                                               \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) +  \
+                ((CDF_INIT_TOP - 11) >> 1)) /                    \
+                   ((CDF_INIT_TOP - 11)) +                       \
+               5),                                               \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) +  \
+                ((CDF_INIT_TOP - 11) >> 1)) /                    \
+                   ((CDF_INIT_TOP - 11)) +                       \
+               6),                                               \
+      AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) +  \
+                ((CDF_INIT_TOP - 11) >> 1)) /                    \
+                   ((CDF_INIT_TOP - 11)) +                       \
+               7),                                               \
+      AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) +  \
+                ((CDF_INIT_TOP - 11) >> 1)) /                    \
+                   ((CDF_INIT_TOP - 11)) +                       \
+               8),                                               \
+      AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) +  \
+                ((CDF_INIT_TOP - 11) >> 1)) /                    \
+                   ((CDF_INIT_TOP - 11)) +                       \
+               9),                                               \
+      AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
+                ((CDF_INIT_TOP - 11) >> 1)) /                    \
+                   ((CDF_INIT_TOP - 11)) +                       \
+               10),                                              \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF12(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10)    \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +       \
+            ((CDF_INIT_TOP - 12) >> 1)) /                         \
+               ((CDF_INIT_TOP - 12)) +                            \
+           1)                                                     \
+  ,                                                               \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +   \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               2),                                                \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +   \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               3),                                                \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +   \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               4),                                                \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +   \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               5),                                                \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +   \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               6),                                                \
+      AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +   \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               7),                                                \
+      AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +   \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               8),                                                \
+      AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +   \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               9),                                                \
+      AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) +  \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               10),                                               \
+      AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
+                ((CDF_INIT_TOP - 12) >> 1)) /                     \
+                   ((CDF_INIT_TOP - 12)) +                        \
+               11),                                               \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF13(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11) \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +         \
+            ((CDF_INIT_TOP - 13) >> 1)) /                           \
+               ((CDF_INIT_TOP - 13)) +                              \
+           1)                                                       \
+  ,                                                                 \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +     \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               2),                                                  \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +     \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               3),                                                  \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +     \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               4),                                                  \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +     \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               5),                                                  \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +     \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               6),                                                  \
+      AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +     \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               7),                                                  \
+      AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +     \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               8),                                                  \
+      AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +     \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               9),                                                  \
+      AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +    \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               10),                                                 \
+      AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +   \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               11),                                                 \
+      AOM_ICDF((((a11)-12) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) +   \
+                ((CDF_INIT_TOP - 13) >> 1)) /                       \
+                   ((CDF_INIT_TOP - 13)) +                          \
+               12),                                                 \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF14(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12) \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +              \
+            ((CDF_INIT_TOP - 14) >> 1)) /                                \
+               ((CDF_INIT_TOP - 14)) +                                   \
+           1)                                                            \
+  ,                                                                      \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +          \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               2),                                                       \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +          \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               3),                                                       \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +          \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               4),                                                       \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +          \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               5),                                                       \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +          \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               6),                                                       \
+      AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +          \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               7),                                                       \
+      AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +          \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               8),                                                       \
+      AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +          \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               9),                                                       \
+      AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +         \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               10),                                                      \
+      AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +        \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               11),                                                      \
+      AOM_ICDF((((a11)-12) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +        \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               12),                                                      \
+      AOM_ICDF((((a12)-13) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) +        \
+                ((CDF_INIT_TOP - 14) >> 1)) /                            \
+                   ((CDF_INIT_TOP - 14)) +                               \
+               13),                                                      \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF15(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13) \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +                   \
+            ((CDF_INIT_TOP - 15) >> 1)) /                                     \
+               ((CDF_INIT_TOP - 15)) +                                        \
+           1)                                                                 \
+  ,                                                                           \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +               \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               2),                                                            \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +               \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               3),                                                            \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +               \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               4),                                                            \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +               \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               5),                                                            \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +               \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               6),                                                            \
+      AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +               \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               7),                                                            \
+      AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +               \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               8),                                                            \
+      AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +               \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               9),                                                            \
+      AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +              \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               10),                                                           \
+      AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +             \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               11),                                                           \
+      AOM_ICDF((((a11)-12) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +             \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               12),                                                           \
+      AOM_ICDF((((a12)-13) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +             \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               13),                                                           \
+      AOM_ICDF((((a13)-14) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) +             \
+                ((CDF_INIT_TOP - 15) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 15)) +                                    \
+               14),                                                           \
+      AOM_ICDF(CDF_PROB_TOP), 0
+#define AOM_CDF16(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, \
+                  a14)                                                        \
+  AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +                   \
+            ((CDF_INIT_TOP - 16) >> 1)) /                                     \
+               ((CDF_INIT_TOP - 16)) +                                        \
+           1)                                                                 \
+  ,                                                                           \
+      AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +               \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               2),                                                            \
+      AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +               \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               3),                                                            \
+      AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +               \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               4),                                                            \
+      AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +               \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               5),                                                            \
+      AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +               \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               6),                                                            \
+      AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +               \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               7),                                                            \
+      AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +               \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               8),                                                            \
+      AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +               \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               9),                                                            \
+      AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +              \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               10),                                                           \
+      AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +             \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               11),                                                           \
+      AOM_ICDF((((a11)-12) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +             \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               12),                                                           \
+      AOM_ICDF((((a12)-13) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +             \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               13),                                                           \
+      AOM_ICDF((((a13)-14) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +             \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               14),                                                           \
+      AOM_ICDF((((a14)-15) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) +             \
+                ((CDF_INIT_TOP - 16) >> 1)) /                                 \
+                   ((CDF_INIT_TOP - 16)) +                                    \
+               15),                                                           \
+      AOM_ICDF(CDF_PROB_TOP), 0
+
+#endif
+
+static INLINE uint8_t get_prob(unsigned int num, unsigned int den) {
+  assert(den != 0);
+  {
+    const int p = (int)(((uint64_t)num * 256 + (den >> 1)) / den);
+    // (p > 255) ? 255 : (p < 1) ? 1 : p;
+    const int clipped_prob = p | ((255 - p) >> 23) | (p == 0);
+    return (uint8_t)clipped_prob;
+  }
+}
+
+static INLINE void update_cdf(aom_cdf_prob *cdf, int8_t val, int nsymbs) {
+  int rate;
+  int i, tmp;
+
+  static const int nsymbs2speed[17] = { 0, 0, 1, 1, 2, 2, 2, 2, 2,
+                                        2, 2, 2, 2, 2, 2, 2, 2 };
+  assert(nsymbs < 17);
+  rate = 3 + (cdf[nsymbs] > 15) + (cdf[nsymbs] > 31) +
+         nsymbs2speed[nsymbs];  // + get_msb(nsymbs);
+  tmp = AOM_ICDF(0);
+
+  // Single loop (faster)
+  for (i = 0; i < nsymbs - 1; ++i) {
+    tmp = (i == val) ? 0 : tmp;
+    if (tmp < cdf[i]) {
+      cdf[i] -= ((cdf[i] - tmp) >> rate);
+    } else {
+      cdf[i] += ((tmp - cdf[i]) >> rate);
+    }
+  }
+  cdf[nsymbs] += (cdf[nsymbs] < 32);
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_DSP_PROB_H_

diff --git a/libav1/aom_dsp/recenter.h b/libav1/aom_dsp/recenter.h
new file mode 100644
index 0000000..b3fd412
--- /dev/null
+++ b/libav1/aom_dsp/recenter.h

@@ -0,0 +1,61 @@
+/*
+ * Copyright (c) 2018, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_RECENTER_H_
+#define AOM_AOM_DSP_RECENTER_H_
+
+#include "config/aom_config.h"
+
+#include "aom/aom_integer.h"
+
+// Inverse recenters a non-negative literal v around a reference r
+static INLINE uint16_t inv_recenter_nonneg(uint16_t r, uint16_t v) {
+  if (v > (r << 1))
+    return v;
+  else if ((v & 1) == 0)
+    return (v >> 1) + r;
+  else
+    return r - ((v + 1) >> 1);
+}
+
+// Inverse recenters a non-negative literal v in [0, n-1] around a
+// reference r also in [0, n-1]
+static INLINE uint16_t inv_recenter_finite_nonneg(uint16_t n, uint16_t r,
+                                                  uint16_t v) {
+  if ((r << 1) <= n) {
+    return inv_recenter_nonneg(r, v);
+  } else {
+    return n - 1 - inv_recenter_nonneg(n - 1 - r, v);
+  }
+}
+
+// Recenters a non-negative literal v around a reference r
+static INLINE uint16_t recenter_nonneg(uint16_t r, uint16_t v) {
+  if (v > (r << 1))
+    return v;
+  else if (v >= r)
+    return ((v - r) << 1);
+  else
+    return ((r - v) << 1) - 1;
+}
+
+// Recenters a non-negative literal v in [0, n-1] around a
+// reference r also in [0, n-1]
+static INLINE uint16_t recenter_finite_nonneg(uint16_t n, uint16_t r,
+                                              uint16_t v) {
+  if ((r << 1) <= n) {
+    return recenter_nonneg(r, v);
+  } else {
+    return recenter_nonneg(n - 1 - r, n - 1 - v);
+  }
+}
+
+#endif  // AOM_AOM_DSP_RECENTER_H_

diff --git a/libav1/aom_dsp/subtract.c b/libav1/aom_dsp/subtract.c
new file mode 100644
index 0000000..2f6da96
--- /dev/null
+++ b/libav1/aom_dsp/subtract.c

@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdlib.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
+
+void aom_subtract_block_c(int rows, int cols, int16_t *diff,
+                          ptrdiff_t diff_stride, const uint8_t *src,
+                          ptrdiff_t src_stride, const uint8_t *pred,
+                          ptrdiff_t pred_stride) {
+  int r, c;
+
+  for (r = 0; r < rows; r++) {
+    for (c = 0; c < cols; c++) diff[c] = src[c] - pred[c];
+
+    diff += diff_stride;
+    pred += pred_stride;
+    src += src_stride;
+  }
+}
+
+void aom_highbd_subtract_block_c(int rows, int cols, int16_t *diff,
+                                 ptrdiff_t diff_stride, const uint8_t *src8,
+                                 ptrdiff_t src_stride, const uint8_t *pred8,
+                                 ptrdiff_t pred_stride, int bd) {
+  int r, c;
+  uint16_t *src = CONVERT_TO_SHORTPTR(src8);
+  uint16_t *pred = CONVERT_TO_SHORTPTR(pred8);
+  (void)bd;
+
+  for (r = 0; r < rows; r++) {
+    for (c = 0; c < cols; c++) {
+      diff[c] = src[c] - pred[c];
+    }
+
+    diff += diff_stride;
+    pred += pred_stride;
+    src += src_stride;
+  }
+}

diff --git a/libav1/aom_dsp/sum_squares.c b/libav1/aom_dsp/sum_squares.c
new file mode 100644
index 0000000..44ec41f
--- /dev/null
+++ b/libav1/aom_dsp/sum_squares.c

@@ -0,0 +1,40 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "config/aom_dsp_rtcd.h"
+
+uint64_t aom_sum_squares_2d_i16_c(const int16_t *src, int src_stride, int width,
+                                  int height) {
+  int r, c;
+  uint64_t ss = 0;
+
+  for (r = 0; r < height; r++) {
+    for (c = 0; c < width; c++) {
+      const int16_t v = src[c];
+      ss += v * v;
+    }
+    src += src_stride;
+  }
+
+  return ss;
+}
+
+uint64_t aom_sum_squares_i16_c(const int16_t *src, uint32_t n) {
+  uint64_t ss = 0;
+  do {
+    const int16_t v = *src++;
+    ss += v * v;
+  } while (--n);
+
+  return ss;
+}

diff --git a/libav1/aom_dsp/txfm_common.h b/libav1/aom_dsp/txfm_common.h
new file mode 100644
index 0000000..f13d690
--- /dev/null
+++ b/libav1/aom_dsp/txfm_common.h

@@ -0,0 +1,91 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_TXFM_COMMON_H_
+#define AOM_AOM_DSP_TXFM_COMMON_H_
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "av1/common/enums.h"
+
+// Constants and Macros used by all idct/dct functions
+#define DCT_CONST_BITS 14
+#define DCT_CONST_ROUNDING (1 << (DCT_CONST_BITS - 1))
+
+#define UNIT_QUANT_SHIFT 2
+#define UNIT_QUANT_FACTOR (1 << UNIT_QUANT_SHIFT)
+
+typedef struct txfm_param {
+  // for both forward and inverse transforms
+  TX_TYPE tx_type;
+  TX_SIZE tx_size;
+  int lossless;
+  int bd;
+  // are the pixel buffers octets or shorts?  This should collapse to
+  // bd==8 implies !is_hbd, but that's not certain right now.
+  int is_hbd;
+  TxSetType tx_set_type;
+  // for inverse transforms only
+  int eob;
+} TxfmParam;
+
+// Constants:
+//  for (int i = 1; i< 32; ++i)
+//    printf("static const int cospi_%d_64 = %.0f;\n", i,
+//           round(16384 * cos(i*PI/64)));
+// Note: sin(k*Pi/64) = cos((32-k)*Pi/64)
+static const tran_high_t cospi_1_64 = 16364;
+static const tran_high_t cospi_2_64 = 16305;
+static const tran_high_t cospi_3_64 = 16207;
+static const tran_high_t cospi_4_64 = 16069;
+static const tran_high_t cospi_5_64 = 15893;
+static const tran_high_t cospi_6_64 = 15679;
+static const tran_high_t cospi_7_64 = 15426;
+static const tran_high_t cospi_8_64 = 15137;
+static const tran_high_t cospi_9_64 = 14811;
+static const tran_high_t cospi_10_64 = 14449;
+static const tran_high_t cospi_11_64 = 14053;
+static const tran_high_t cospi_12_64 = 13623;
+static const tran_high_t cospi_13_64 = 13160;
+static const tran_high_t cospi_14_64 = 12665;
+static const tran_high_t cospi_15_64 = 12140;
+static const tran_high_t cospi_16_64 = 11585;
+static const tran_high_t cospi_17_64 = 11003;
+static const tran_high_t cospi_18_64 = 10394;
+static const tran_high_t cospi_19_64 = 9760;
+static const tran_high_t cospi_20_64 = 9102;
+static const tran_high_t cospi_21_64 = 8423;
+static const tran_high_t cospi_22_64 = 7723;
+static const tran_high_t cospi_23_64 = 7005;
+static const tran_high_t cospi_24_64 = 6270;
+static const tran_high_t cospi_25_64 = 5520;
+static const tran_high_t cospi_26_64 = 4756;
+static const tran_high_t cospi_27_64 = 3981;
+static const tran_high_t cospi_28_64 = 3196;
+static const tran_high_t cospi_29_64 = 2404;
+static const tran_high_t cospi_30_64 = 1606;
+static const tran_high_t cospi_31_64 = 804;
+
+//  16384 * sqrt(2) * sin(kPi/9) * 2 / 3
+static const tran_high_t sinpi_1_9 = 5283;
+static const tran_high_t sinpi_2_9 = 9929;
+static const tran_high_t sinpi_3_9 = 13377;
+static const tran_high_t sinpi_4_9 = 15212;
+
+// 16384 * sqrt(2)
+static const tran_high_t Sqrt2 = 23170;
+static const tran_high_t InvSqrt2 = 11585;
+
+static INLINE tran_high_t fdct_round_shift(tran_high_t input) {
+  tran_high_t rv = ROUND_POWER_OF_TWO(input, DCT_CONST_BITS);
+  return rv;
+}
+
+#endif  // AOM_AOM_DSP_TXFM_COMMON_H_

diff --git a/libav1/aom_mem/aom_mem.c b/libav1/aom_mem/aom_mem.c
new file mode 100644
index 0000000..e603fc5
--- /dev/null
+++ b/libav1/aom_mem/aom_mem.c

@@ -0,0 +1,84 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "aom_mem.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include "include/aom_mem_intrnl.h"
+#include "aom/aom_integer.h"
+
+#if defined(AOM_MAX_ALLOCABLE_MEMORY)
+// Returns 0 in case of overflow of nmemb * size.
+static int check_size_argument_overflow(uint64_t nmemb, uint64_t size) {
+  const uint64_t total_size = nmemb * size;
+  if (nmemb == 0) return 1;
+  if (size > AOM_MAX_ALLOCABLE_MEMORY / nmemb) return 0;
+  if (total_size != (size_t)total_size) return 0;
+  return 1;
+}
+#endif
+
+static size_t GetAlignedMallocSize(size_t size, size_t align) {
+  return size + align - 1 + ADDRESS_STORAGE_SIZE;
+}
+
+static size_t *GetMallocAddressLocation(void *const mem) {
+  return ((size_t *)mem) - 1;
+}
+
+static void SetActualMallocAddress(void *const mem,
+                                   const void *const malloc_addr) {
+  size_t *const malloc_addr_location = GetMallocAddressLocation(mem);
+  *malloc_addr_location = (size_t)malloc_addr;
+}
+
+static void *GetActualMallocAddress(void *const mem) {
+  const size_t *const malloc_addr_location = GetMallocAddressLocation(mem);
+  return (void *)(*malloc_addr_location);
+}
+
+void *aom_memalign(size_t align, size_t size) {
+  void *x = NULL;
+  const size_t aligned_size = GetAlignedMallocSize(size, align);
+#if defined(AOM_MAX_ALLOCABLE_MEMORY)
+  if (!check_size_argument_overflow(1, aligned_size)) return NULL;
+#endif
+  void *const addr = malloc(aligned_size);
+  if (addr) {
+    x = align_addr((unsigned char *)addr + ADDRESS_STORAGE_SIZE, align);
+    SetActualMallocAddress(x, addr);
+  }
+  return x;
+}
+
+void *aom_malloc(size_t size) { return aom_memalign(DEFAULT_ALIGNMENT, size); }
+
+void *aom_calloc(size_t num, size_t size) {
+  const size_t total_size = num * size;
+  void *const x = aom_malloc(total_size);
+  if (x) memset(x, 0, total_size);
+  return x;
+}
+
+void aom_free(void *memblk) {
+  if (memblk) {
+    void *addr = GetActualMallocAddress(memblk);
+    free(addr);
+  }
+}
+
+void *aom_memset16(void *dest, int val, size_t length) {
+  size_t i;
+  uint16_t *dest16 = (uint16_t *)dest;
+  for (i = 0; i < length; i++) *dest16++ = val;
+  return dest;
+}

diff --git a/libav1/aom_mem/aom_mem.h b/libav1/aom_mem/aom_mem.h
new file mode 100644
index 0000000..4b1fa45
--- /dev/null
+++ b/libav1/aom_mem/aom_mem.h

@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_MEM_AOM_MEM_H_
+#define AOM_AOM_MEM_AOM_MEM_H_
+
+#include "aom/aom_integer.h"
+#include "config/aom_config.h"
+
+#if defined(__uClinux__)
+#include <lddk.h>
+#endif
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#ifndef AOM_MAX_ALLOCABLE_MEMORY
+#if SIZE_MAX > (1ULL << 32)
+#define AOM_MAX_ALLOCABLE_MEMORY 8589934592  // 8 GB
+#else
+// For 32-bit targets keep this below INT_MAX to avoid valgrind warnings.
+#define AOM_MAX_ALLOCABLE_MEMORY ((1ULL << 31) - (1 << 16))
+#endif
+#endif
+
+void *aom_memalign(size_t align, size_t size);
+void *aom_malloc(size_t size);
+void *aom_calloc(size_t num, size_t size);
+void aom_free(void *memblk);
+void *aom_memset16(void *dest, int val, size_t length);
+
+#include <string.h>
+
+#ifdef AOM_MEM_PLTFRM
+#include AOM_MEM_PLTFRM
+#endif
+
+#if CONFIG_DEBUG
+#define AOM_CHECK_MEM_ERROR(error_info, lval, expr)                         \
+  do {                                                                      \
+    lval = (expr);                                                          \
+    if (!lval)                                                              \
+      aom_internal_error(error_info, AOM_CODEC_MEM_ERROR,                   \
+                         "Failed to allocate " #lval " at %s:%d", __FILE__, \
+                         __LINE__);                                         \
+  } while (0)
+#else
+#define AOM_CHECK_MEM_ERROR(error_info, lval, expr)       \
+  do {                                                    \
+    lval = (expr);                                        \
+    if (!lval)                                            \
+      aom_internal_error(error_info, AOM_CODEC_MEM_ERROR, \
+                         "Failed to allocate " #lval);    \
+  } while (0)
+#endif
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif  // AOM_AOM_MEM_AOM_MEM_H_

diff --git a/libav1/aom_mem/include/aom_mem_intrnl.h b/libav1/aom_mem/include/aom_mem_intrnl.h
new file mode 100644
index 0000000..cbc30a9
--- /dev/null
+++ b/libav1/aom_mem/include/aom_mem_intrnl.h

@@ -0,0 +1,33 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_MEM_INCLUDE_AOM_MEM_INTRNL_H_
+#define AOM_AOM_MEM_INCLUDE_AOM_MEM_INTRNL_H_
+
+#include "config/aom_config.h"
+
+#define ADDRESS_STORAGE_SIZE sizeof(size_t)
+
+#ifndef DEFAULT_ALIGNMENT
+#if defined(VXWORKS)
+/*default addr alignment to use in calls to aom_* functions other than
+  aom_memalign*/
+#define DEFAULT_ALIGNMENT 32
+#else
+#define DEFAULT_ALIGNMENT (2 * sizeof(void *)) /* NOLINT */
+#endif
+#endif
+
+/*returns an addr aligned to the byte boundary specified by align*/
+#define align_addr(addr, align) \
+  (void *)(((size_t)(addr) + ((align)-1)) & ~(size_t)((align)-1))
+
+#endif  // AOM_AOM_MEM_INCLUDE_AOM_MEM_INTRNL_H_

diff --git a/libav1/aom_ports/aom_once.h b/libav1/aom_ports/aom_once.h
new file mode 100644
index 0000000..d1a031b
--- /dev/null
+++ b/libav1/aom_ports/aom_once.h

@@ -0,0 +1,104 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_AOM_ONCE_H_
+#define AOM_AOM_PORTS_AOM_ONCE_H_
+
+#include "config/aom_config.h"
+
+/* Implement a function wrapper to guarantee initialization
+ * thread-safety for library singletons.
+ *
+ * NOTE: This function uses static locks, and can only be
+ * used with one common argument per compilation unit. So
+ *
+ * file1.c:
+ *   aom_once(foo);
+ *   ...
+ *   aom_once(foo);
+ *
+ * file2.c:
+ *   aom_once(bar);
+ *
+ * will ensure foo() and bar() are each called only once, but in
+ *
+ * file1.c:
+ *   aom_once(foo);
+ *   aom_once(bar):
+ *
+ * bar() will never be called because the lock is used up
+ * by the call to foo().
+ */
+
+#if CONFIG_MULTITHREAD && defined(_WIN32)
+#include <windows.h>
+/* Declare a per-compilation-unit state variable to track the progress
+ * of calling func() only once. This must be at global scope because
+ * local initializers are not thread-safe in MSVC prior to Visual
+ * Studio 2015.
+ */
+static INIT_ONCE aom_init_once = INIT_ONCE_STATIC_INIT;
+
+static void aom_once(void (*func)(void)) {
+  BOOL pending;
+  InitOnceBeginInitialize(&aom_init_once, 0, &pending, NULL);
+  if (!pending) {
+    // Initialization has already completed.
+    return;
+  }
+  func();
+  InitOnceComplete(&aom_init_once, 0, NULL);
+}
+
+#elif CONFIG_MULTITHREAD && defined(__OS2__)
+#define INCL_DOS
+#include <os2.h>
+static void aom_once(void (*func)(void)) {
+  static int done;
+
+  /* If the initialization is complete, return early. */
+  if (done) return;
+
+  /* Causes all other threads in the process to block themselves
+   * and give up their time slice.
+   */
+  DosEnterCritSec();
+
+  if (!done) {
+    func();
+    done = 1;
+  }
+
+  /* Restores normal thread dispatching for the current process. */
+  DosExitCritSec();
+}
+
+#elif CONFIG_MULTITHREAD && HAVE_PTHREAD_H
+#include <pthread.h>
+static void aom_once(void (*func)(void)) {
+  static pthread_once_t lock = PTHREAD_ONCE_INIT;
+  pthread_once(&lock, func);
+}
+
+#else
+/* Default version that performs no synchronization. */
+
+static void aom_once(void (*func)(void)) {
+  static int done;
+
+  if (!done) {
+    func();
+    done = 1;
+  }
+}
+#endif
+
+#endif  // AOM_AOM_PORTS_AOM_ONCE_H_

diff --git a/libav1/aom_ports/aom_timer.h b/libav1/aom_ports/aom_timer.h
new file mode 100644
index 0000000..9b17b89
--- /dev/null
+++ b/libav1/aom_ports/aom_timer.h

@@ -0,0 +1,111 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_AOM_TIMER_H_
+#define AOM_AOM_PORTS_AOM_TIMER_H_
+
+#include "config/aom_config.h"
+
+#include "aom/aom_integer.h"
+
+#if CONFIG_OS_SUPPORT
+
+#if defined(_WIN32)
+/*
+ * Win32 specific includes
+ */
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN
+#endif
+#include <windows.h>
+#else
+/*
+ * POSIX specific includes
+ */
+#include <sys/time.h>
+
+/* timersub is not provided by msys at this time. */
+#ifndef timersub
+#define timersub(a, b, result)                       \
+  do {                                               \
+    (result)->tv_sec = (a)->tv_sec - (b)->tv_sec;    \
+    (result)->tv_usec = (a)->tv_usec - (b)->tv_usec; \
+    if ((result)->tv_usec < 0) {                     \
+      --(result)->tv_sec;                            \
+      (result)->tv_usec += 1000000;                  \
+    }                                                \
+  } while (0)
+#endif
+#endif
+
+struct aom_usec_timer {
+#if defined(_WIN32)
+  LARGE_INTEGER begin, end;
+#else
+  struct timeval begin, end;
+#endif
+};
+
+static INLINE void aom_usec_timer_start(struct aom_usec_timer *t) {
+#if defined(_WIN32)
+  QueryPerformanceCounter(&t->begin);
+#else
+  gettimeofday(&t->begin, NULL);
+#endif
+}
+
+static INLINE void aom_usec_timer_mark(struct aom_usec_timer *t) {
+#if defined(_WIN32)
+  QueryPerformanceCounter(&t->end);
+#else
+  gettimeofday(&t->end, NULL);
+#endif
+}
+
+static INLINE int64_t aom_usec_timer_elapsed(struct aom_usec_timer *t) {
+#if defined(_WIN32)
+  LARGE_INTEGER freq, diff;
+
+  diff.QuadPart = t->end.QuadPart - t->begin.QuadPart;
+
+  QueryPerformanceFrequency(&freq);
+  return diff.QuadPart * 1000000 / freq.QuadPart;
+#else
+  struct timeval diff;
+
+  timersub(&t->end, &t->begin, &diff);
+  return ((int64_t)diff.tv_sec) * 1000000 + diff.tv_usec;
+#endif
+}
+
+#else /* CONFIG_OS_SUPPORT = 0*/
+
+/* Empty timer functions if CONFIG_OS_SUPPORT = 0 */
+#ifndef timersub
+#define timersub(a, b, result)
+#endif
+
+struct aom_usec_timer {
+  void *dummy;
+};
+
+static INLINE void aom_usec_timer_start(struct aom_usec_timer *t) { (void)t; }
+
+static INLINE void aom_usec_timer_mark(struct aom_usec_timer *t) { (void)t; }
+
+static INLINE int aom_usec_timer_elapsed(struct aom_usec_timer *t) {
+  (void)t;
+  return 0;
+}
+
+#endif /* CONFIG_OS_SUPPORT */
+
+#endif  // AOM_AOM_PORTS_AOM_TIMER_H_

diff --git a/libav1/aom_ports/bitops.h b/libav1/aom_ports/bitops.h
new file mode 100644
index 0000000..44df173
--- /dev/null
+++ b/libav1/aom_ports/bitops.h

@@ -0,0 +1,78 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_BITOPS_H_
+#define AOM_AOM_PORTS_BITOPS_H_
+
+#include <assert.h>
+
+#include "aom_ports/msvc.h"
+#include "config/aom_config.h"
+
+#ifdef _MSC_VER
+#if defined(_M_X64) || defined(_M_IX86)
+#include <intrin.h>
+#define USE_MSC_INTRINSICS
+#endif
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// get_msb:
+// Returns (int)floor(log2(n)). n must be > 0.
+// These versions of get_msb() are only valid when n != 0 because all
+// of the optimized versions are undefined when n == 0:
+// https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
+
+// use GNU builtins where available.
+#if defined(__GNUC__) && \
+    ((__GNUC__ == 3 && __GNUC_MINOR__ >= 4) || __GNUC__ >= 4)
+static INLINE int get_msb(unsigned int n) {
+  assert(n != 0);
+  return 31 ^ __builtin_clz(n);
+}
+#elif defined(USE_MSC_INTRINSICS)
+#pragma intrinsic(_BitScanReverse)
+
+static INLINE int get_msb(unsigned int n) {
+  unsigned long first_set_bit;
+  assert(n != 0);
+  _BitScanReverse(&first_set_bit, n);
+  return first_set_bit;
+}
+#undef USE_MSC_INTRINSICS
+#else
+static INLINE int get_msb(unsigned int n) {
+  int log = 0;
+  unsigned int value = n;
+  int i;
+
+  assert(n != 0);
+
+  for (i = 4; i >= 0; --i) {
+    const int shift = (1 << i);
+    const unsigned int x = value >> shift;
+    if (x != 0) {
+      value = x;
+      log += shift;
+    }
+  }
+  return log;
+}
+#endif
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_PORTS_BITOPS_H_

diff --git a/libav1/aom_ports/emmintrin_compat.h b/libav1/aom_ports/emmintrin_compat.h
new file mode 100644
index 0000000..85d218a
--- /dev/null
+++ b/libav1/aom_ports/emmintrin_compat.h

@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_EMMINTRIN_COMPAT_H_
+#define AOM_AOM_PORTS_EMMINTRIN_COMPAT_H_
+
+#if defined(__GNUC__) && __GNUC__ < 4
+/* From emmintrin.h (gcc 4.5.3) */
+/* Casts between various SP, DP, INT vector types.  Note that these do no
+   conversion of values, they just change the type.  */
+extern __inline __m128
+    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+    _mm_castpd_ps(__m128d __A) {
+  return (__m128)__A;
+}
+
+extern __inline __m128i
+    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+    _mm_castpd_si128(__m128d __A) {
+  return (__m128i)__A;
+}
+
+extern __inline __m128d
+    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+    _mm_castps_pd(__m128 __A) {
+  return (__m128d)__A;
+}
+
+extern __inline __m128i
+    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+    _mm_castps_si128(__m128 __A) {
+  return (__m128i)__A;
+}
+
+extern __inline __m128
+    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+    _mm_castsi128_ps(__m128i __A) {
+  return (__m128)__A;
+}
+
+extern __inline __m128d
+    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+    _mm_castsi128_pd(__m128i __A) {
+  return (__m128d)__A;
+}
+#endif
+
+#endif  // AOM_AOM_PORTS_EMMINTRIN_COMPAT_H_

diff --git a/libav1/aom_ports/mem.h b/libav1/aom_ports/mem.h
new file mode 100644
index 0000000..9e3d424
--- /dev/null
+++ b/libav1/aom_ports/mem.h

@@ -0,0 +1,99 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_MEM_H_
+#define AOM_AOM_PORTS_MEM_H_
+
+#include "aom/aom_integer.h"
+#include "config/aom_config.h"
+
+#if (defined(__GNUC__) && __GNUC__) || defined(__SUNPRO_C)
+#define DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
+#elif defined(_MSC_VER)
+#define DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
+#else
+#warning No alignment directives known for this compiler.
+#define DECLARE_ALIGNED(n, typ, val) typ val
+#endif
+
+/* Indicates that the usage of the specified variable has been audited to assure
+ * that it's safe to use uninitialized. Silences 'may be used uninitialized'
+ * warnings on gcc.
+ */
+#if defined(__GNUC__) && __GNUC__
+#define UNINITIALIZED_IS_SAFE(x) x = x
+#else
+#define UNINITIALIZED_IS_SAFE(x) x
+#endif
+
+#if HAVE_NEON && defined(_MSC_VER)
+#define __builtin_prefetch(x)
+#endif
+
+/* Shift down with rounding for use when n >= 0, value >= 0 */
+#define ROUND_POWER_OF_TWO(value, n) (((value) + (((1 << (n)) >> 1))) >> (n))
+
+/* Shift down with rounding for signed integers, for use when n >= 0 */
+#define ROUND_POWER_OF_TWO_SIGNED(value, n)           \
+  (((value) < 0) ? -ROUND_POWER_OF_TWO(-(value), (n)) \
+                 : ROUND_POWER_OF_TWO((value), (n)))
+
+/* Shift down with rounding for use when n >= 0, value >= 0 for (64 bit) */
+#define ROUND_POWER_OF_TWO_64(value, n) \
+  (((value) + ((((int64_t)1 << (n)) >> 1))) >> (n))
+/* Shift down with rounding for signed integers, for use when n >= 0 (64 bit) */
+#define ROUND_POWER_OF_TWO_SIGNED_64(value, n)           \
+  (((value) < 0) ? -ROUND_POWER_OF_TWO_64(-(value), (n)) \
+                 : ROUND_POWER_OF_TWO_64((value), (n)))
+
+/* shift right or left depending on sign of n */
+#define RIGHT_SIGNED_SHIFT(value, n) \
+  ((n) < 0 ? ((value) << (-(n))) : ((value) >> (n)))
+
+#define ALIGN_POWER_OF_TWO(value, n) \
+  (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))
+
+#define DIVIDE_AND_ROUND(x, y) (((x) + ((y) >> 1)) / (y))
+
+#define CONVERT_TO_SHORTPTR(x) ((uint16_t *)(((uintptr_t)(x)) << 1))
+#define CONVERT_TO_BYTEPTR(x) ((uint8_t *)(((uintptr_t)(x)) >> 1))
+
+/*!\brief force enum to be unsigned 1 byte*/
+#define UENUM1BYTE(enumvar) \
+  ;                         \
+  typedef uint8_t enumvar
+
+/*!\brief force enum to be signed 1 byte*/
+#define SENUM1BYTE(enumvar) \
+  ;                         \
+  typedef int8_t enumvar
+
+/*!\brief force enum to be unsigned 2 byte*/
+#define UENUM2BYTE(enumvar) \
+  ;                         \
+  typedef uint16_t enumvar
+
+/*!\brief force enum to be signed 2 byte*/
+#define SENUM2BYTE(enumvar) \
+  ;                         \
+  typedef int16_t enumvar
+
+/*!\brief force enum to be unsigned 4 byte*/
+#define UENUM4BYTE(enumvar) \
+  ;                         \
+  typedef uint32_t enumvar
+
+/*!\brief force enum to be unsigned 4 byte*/
+#define SENUM4BYTE(enumvar) \
+  ;                         \
+  typedef int32_t enumvar
+
+#endif  // AOM_AOM_PORTS_MEM_H_

diff --git a/libav1/aom_ports/mem_ops.h b/libav1/aom_ports/mem_ops.h
new file mode 100644
index 0000000..2b5bc0f
--- /dev/null
+++ b/libav1/aom_ports/mem_ops.h

@@ -0,0 +1,228 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_MEM_OPS_H_
+#define AOM_AOM_PORTS_MEM_OPS_H_
+
+/* \file
+ * \brief Provides portable memory access primitives
+ *
+ * This function provides portable primitives for getting and setting of
+ * signed and unsigned integers in 16, 24, and 32 bit sizes. The operations
+ * can be performed on unaligned data regardless of hardware support for
+ * unaligned accesses.
+ *
+ * The type used to pass the integral values may be changed by defining
+ * MEM_VALUE_T with the appropriate type. The type given must be an integral
+ * numeric type.
+ *
+ * The actual functions instantiated have the MEM_VALUE_T type name pasted
+ * on to the symbol name. This allows the developer to instantiate these
+ * operations for multiple types within the same translation unit. This is
+ * of somewhat questionable utility, but the capability exists nonetheless.
+ * Users not making use of this functionality should call the functions
+ * without the type name appended, and the preprocessor will take care of
+ * it.
+ *
+ * NOTE: This code is not supported on platforms where char > 1 octet ATM.
+ */
+
+#ifndef MAU_T
+/* Minimum Access Unit for this target */
+#define MAU_T unsigned char
+#endif
+
+#ifndef MEM_VALUE_T
+#define MEM_VALUE_T int
+#endif
+
+#undef MEM_VALUE_T_SZ_BITS
+#define MEM_VALUE_T_SZ_BITS (sizeof(MEM_VALUE_T) << 3)
+
+#undef mem_ops_wrap_symbol
+#define mem_ops_wrap_symbol(fn) mem_ops_wrap_symbol2(fn, MEM_VALUE_T)
+#undef mem_ops_wrap_symbol2
+#define mem_ops_wrap_symbol2(fn, typ) mem_ops_wrap_symbol3(fn, typ)
+#undef mem_ops_wrap_symbol3
+#define mem_ops_wrap_symbol3(fn, typ) fn##_as_##typ
+
+/*
+ * Include aligned access routines
+ */
+#define INCLUDED_BY_MEM_OPS_H
+#include "mem_ops_aligned.h"
+#undef INCLUDED_BY_MEM_OPS_H
+
+#undef mem_get_be16
+#define mem_get_be16 mem_ops_wrap_symbol(mem_get_be16)
+static unsigned MEM_VALUE_T mem_get_be16(const void *vmem) {
+  unsigned MEM_VALUE_T val;
+  const MAU_T *mem = (const MAU_T *)vmem;
+
+  val = mem[0] << 8;
+  val |= mem[1];
+  return val;
+}
+
+#undef mem_get_be24
+#define mem_get_be24 mem_ops_wrap_symbol(mem_get_be24)
+static unsigned MEM_VALUE_T mem_get_be24(const void *vmem) {
+  unsigned MEM_VALUE_T val;
+  const MAU_T *mem = (const MAU_T *)vmem;
+
+  val = mem[0] << 16;
+  val |= mem[1] << 8;
+  val |= mem[2];
+  return val;
+}
+
+#undef mem_get_be32
+#define mem_get_be32 mem_ops_wrap_symbol(mem_get_be32)
+static unsigned MEM_VALUE_T mem_get_be32(const void *vmem) {
+  unsigned MEM_VALUE_T val;
+  const MAU_T *mem = (const MAU_T *)vmem;
+
+  val = ((unsigned MEM_VALUE_T)mem[0]) << 24;
+  val |= mem[1] << 16;
+  val |= mem[2] << 8;
+  val |= mem[3];
+  return val;
+}
+
+#undef mem_get_le16
+#define mem_get_le16 mem_ops_wrap_symbol(mem_get_le16)
+static unsigned MEM_VALUE_T mem_get_le16(const void *vmem) {
+  unsigned MEM_VALUE_T val;
+  const MAU_T *mem = (const MAU_T *)vmem;
+
+  val = mem[1] << 8;
+  val |= mem[0];
+  return val;
+}
+
+#undef mem_get_le24
+#define mem_get_le24 mem_ops_wrap_symbol(mem_get_le24)
+static unsigned MEM_VALUE_T mem_get_le24(const void *vmem) {
+  unsigned MEM_VALUE_T val;
+  const MAU_T *mem = (const MAU_T *)vmem;
+
+  val = mem[2] << 16;
+  val |= mem[1] << 8;
+  val |= mem[0];
+  return val;
+}
+
+#undef mem_get_le32
+#define mem_get_le32 mem_ops_wrap_symbol(mem_get_le32)
+static unsigned MEM_VALUE_T mem_get_le32(const void *vmem) {
+  unsigned MEM_VALUE_T val;
+  const MAU_T *mem = (const MAU_T *)vmem;
+
+  val = ((unsigned MEM_VALUE_T)mem[3]) << 24;
+  val |= mem[2] << 16;
+  val |= mem[1] << 8;
+  val |= mem[0];
+  return val;
+}
+
+#define mem_get_s_generic(end, sz)                                            \
+  static AOM_INLINE signed MEM_VALUE_T mem_get_s##end##sz(const void *vmem) { \
+    const MAU_T *mem = (const MAU_T *)vmem;                                   \
+    signed MEM_VALUE_T val = mem_get_##end##sz(mem);                          \
+    return (val << (MEM_VALUE_T_SZ_BITS - sz)) >> (MEM_VALUE_T_SZ_BITS - sz); \
+  }
+
+/* clang-format off */
+#undef  mem_get_sbe16
+#define mem_get_sbe16 mem_ops_wrap_symbol(mem_get_sbe16)
+mem_get_s_generic(be, 16)
+
+#undef  mem_get_sbe24
+#define mem_get_sbe24 mem_ops_wrap_symbol(mem_get_sbe24)
+mem_get_s_generic(be, 24)
+
+#undef  mem_get_sbe32
+#define mem_get_sbe32 mem_ops_wrap_symbol(mem_get_sbe32)
+mem_get_s_generic(be, 32)
+
+#undef  mem_get_sle16
+#define mem_get_sle16 mem_ops_wrap_symbol(mem_get_sle16)
+mem_get_s_generic(le, 16)
+
+#undef  mem_get_sle24
+#define mem_get_sle24 mem_ops_wrap_symbol(mem_get_sle24)
+mem_get_s_generic(le, 24)
+
+#undef  mem_get_sle32
+#define mem_get_sle32 mem_ops_wrap_symbol(mem_get_sle32)
+mem_get_s_generic(le, 32)
+
+#undef  mem_put_be16
+#define mem_put_be16 mem_ops_wrap_symbol(mem_put_be16)
+static AOM_INLINE void mem_put_be16(void *vmem, MEM_VALUE_T val) {
+  MAU_T *mem = (MAU_T *)vmem;
+
+  mem[0] = (MAU_T)((val >> 8) & 0xff);
+  mem[1] = (MAU_T)((val >> 0) & 0xff);
+}
+
+#undef  mem_put_be24
+#define mem_put_be24 mem_ops_wrap_symbol(mem_put_be24)
+static AOM_INLINE void mem_put_be24(void *vmem, MEM_VALUE_T val) {
+  MAU_T *mem = (MAU_T *)vmem;
+
+  mem[0] = (MAU_T)((val >> 16) & 0xff);
+  mem[1] = (MAU_T)((val >>  8) & 0xff);
+  mem[2] = (MAU_T)((val >>  0) & 0xff);
+}
+
+#undef  mem_put_be32
+#define mem_put_be32 mem_ops_wrap_symbol(mem_put_be32)
+static AOM_INLINE void mem_put_be32(void *vmem, MEM_VALUE_T val) {
+  MAU_T *mem = (MAU_T *)vmem;
+
+  mem[0] = (MAU_T)((val >> 24) & 0xff);
+  mem[1] = (MAU_T)((val >> 16) & 0xff);
+  mem[2] = (MAU_T)((val >>  8) & 0xff);
+  mem[3] = (MAU_T)((val >>  0) & 0xff);
+}
+
+#undef  mem_put_le16
+#define mem_put_le16 mem_ops_wrap_symbol(mem_put_le16)
+static AOM_INLINE void mem_put_le16(void *vmem, MEM_VALUE_T val) {
+  MAU_T *mem = (MAU_T *)vmem;
+
+  mem[0] = (MAU_T)((val >> 0) & 0xff);
+  mem[1] = (MAU_T)((val >> 8) & 0xff);
+}
+
+#undef  mem_put_le24
+#define mem_put_le24 mem_ops_wrap_symbol(mem_put_le24)
+static AOM_INLINE void mem_put_le24(void *vmem, MEM_VALUE_T val) {
+  MAU_T *mem = (MAU_T *)vmem;
+
+  mem[0] = (MAU_T)((val >>  0) & 0xff);
+  mem[1] = (MAU_T)((val >>  8) & 0xff);
+  mem[2] = (MAU_T)((val >> 16) & 0xff);
+}
+
+#undef  mem_put_le32
+#define mem_put_le32 mem_ops_wrap_symbol(mem_put_le32)
+static AOM_INLINE void mem_put_le32(void *vmem, MEM_VALUE_T val) {
+  MAU_T *mem = (MAU_T *)vmem;
+
+  mem[0] = (MAU_T)((val >>  0) & 0xff);
+  mem[1] = (MAU_T)((val >>  8) & 0xff);
+  mem[2] = (MAU_T)((val >> 16) & 0xff);
+  mem[3] = (MAU_T)((val >> 24) & 0xff);
+}
+/* clang-format on */
+#endif  // AOM_AOM_PORTS_MEM_OPS_H_

diff --git a/libav1/aom_ports/mem_ops_aligned.h b/libav1/aom_ports/mem_ops_aligned.h
new file mode 100644
index 0000000..37c3675
--- /dev/null
+++ b/libav1/aom_ports/mem_ops_aligned.h

@@ -0,0 +1,173 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_MEM_OPS_ALIGNED_H_
+#define AOM_AOM_PORTS_MEM_OPS_ALIGNED_H_
+
+#include "aom/aom_integer.h"
+
+/* \file
+ * \brief Provides portable memory access primitives for operating on aligned
+ *        data
+ *
+ * This file is split from mem_ops.h for easier maintenance. See mem_ops.h
+ * for a more detailed description of these primitives.
+ */
+#ifndef INCLUDED_BY_MEM_OPS_H
+#error Include mem_ops.h, not mem_ops_aligned.h directly.
+#endif
+
+/* Architectures that provide instructions for doing this byte swapping
+ * could redefine these macros.
+ */
+#define swap_endian_16(val, raw)                                     \
+  do {                                                               \
+    val = (uint16_t)(((raw >> 8) & 0x00ff) | ((raw << 8) & 0xff00)); \
+  } while (0)
+#define swap_endian_32(val, raw)                                   \
+  do {                                                             \
+    val = ((raw >> 24) & 0x000000ff) | ((raw >> 8) & 0x0000ff00) | \
+          ((raw << 8) & 0x00ff0000) | ((raw << 24) & 0xff000000);  \
+  } while (0)
+#define swap_endian_16_se(val, raw) \
+  do {                              \
+    swap_endian_16(val, raw);       \
+    val = ((val << 16) >> 16);      \
+  } while (0)
+#define swap_endian_32_se(val, raw) swap_endian_32(val, raw)
+
+#define mem_get_ne_aligned_generic(end, sz)                           \
+  static AOM_INLINE unsigned MEM_VALUE_T mem_get_##end##sz##_aligned( \
+      const void *vmem) {                                             \
+    const uint##sz##_t *mem = (const uint##sz##_t *)vmem;             \
+    return *mem;                                                      \
+  }
+
+#define mem_get_sne_aligned_generic(end, sz)                         \
+  static AOM_INLINE signed MEM_VALUE_T mem_get_s##end##sz##_aligned( \
+      const void *vmem) {                                            \
+    const int##sz##_t *mem = (const int##sz##_t *)vmem;              \
+    return *mem;                                                     \
+  }
+
+#define mem_get_se_aligned_generic(end, sz)                           \
+  static AOM_INLINE unsigned MEM_VALUE_T mem_get_##end##sz##_aligned( \
+      const void *vmem) {                                             \
+    const uint##sz##_t *mem = (const uint##sz##_t *)vmem;             \
+    unsigned MEM_VALUE_T val, raw = *mem;                             \
+    swap_endian_##sz(val, raw);                                       \
+    return val;                                                       \
+  }
+
+#define mem_get_sse_aligned_generic(end, sz)                         \
+  static AOM_INLINE signed MEM_VALUE_T mem_get_s##end##sz##_aligned( \
+      const void *vmem) {                                            \
+    const int##sz##_t *mem = (const int##sz##_t *)vmem;              \
+    unsigned MEM_VALUE_T val, raw = *mem;                            \
+    swap_endian_##sz##_se(val, raw);                                 \
+    return val;                                                      \
+  }
+
+#define mem_put_ne_aligned_generic(end, sz)                             \
+  static AOM_INLINE void mem_put_##end##sz##_aligned(void *vmem,        \
+                                                     MEM_VALUE_T val) { \
+    uint##sz##_t *mem = (uint##sz##_t *)vmem;                           \
+    *mem = (uint##sz##_t)val;                                           \
+  }
+
+#define mem_put_se_aligned_generic(end, sz)                             \
+  static AOM_INLINE void mem_put_##end##sz##_aligned(void *vmem,        \
+                                                     MEM_VALUE_T val) { \
+    uint##sz##_t *mem = (uint##sz##_t *)vmem, raw;                      \
+    swap_endian_##sz(raw, val);                                         \
+    *mem = (uint##sz##_t)raw;                                           \
+  }
+
+#include "config/aom_config.h"
+
+#if CONFIG_BIG_ENDIAN
+#define mem_get_be_aligned_generic(sz) mem_get_ne_aligned_generic(be, sz)
+#define mem_get_sbe_aligned_generic(sz) mem_get_sne_aligned_generic(be, sz)
+#define mem_get_le_aligned_generic(sz) mem_get_se_aligned_generic(le, sz)
+#define mem_get_sle_aligned_generic(sz) mem_get_sse_aligned_generic(le, sz)
+#define mem_put_be_aligned_generic(sz) mem_put_ne_aligned_generic(be, sz)
+#define mem_put_le_aligned_generic(sz) mem_put_se_aligned_generic(le, sz)
+#else
+#define mem_get_be_aligned_generic(sz) mem_get_se_aligned_generic(be, sz)
+#define mem_get_sbe_aligned_generic(sz) mem_get_sse_aligned_generic(be, sz)
+#define mem_get_le_aligned_generic(sz) mem_get_ne_aligned_generic(le, sz)
+#define mem_get_sle_aligned_generic(sz) mem_get_sne_aligned_generic(le, sz)
+#define mem_put_be_aligned_generic(sz) mem_put_se_aligned_generic(be, sz)
+#define mem_put_le_aligned_generic(sz) mem_put_ne_aligned_generic(le, sz)
+#endif
+
+/* clang-format off */
+#undef  mem_get_be16_aligned
+#define mem_get_be16_aligned mem_ops_wrap_symbol(mem_get_be16_aligned)
+mem_get_be_aligned_generic(16)
+
+#undef  mem_get_be32_aligned
+#define mem_get_be32_aligned mem_ops_wrap_symbol(mem_get_be32_aligned)
+mem_get_be_aligned_generic(32)
+
+#undef  mem_get_le16_aligned
+#define mem_get_le16_aligned mem_ops_wrap_symbol(mem_get_le16_aligned)
+mem_get_le_aligned_generic(16)
+
+#undef  mem_get_le32_aligned
+#define mem_get_le32_aligned mem_ops_wrap_symbol(mem_get_le32_aligned)
+mem_get_le_aligned_generic(32)
+
+#undef  mem_get_sbe16_aligned
+#define mem_get_sbe16_aligned mem_ops_wrap_symbol(mem_get_sbe16_aligned)
+mem_get_sbe_aligned_generic(16)
+
+#undef  mem_get_sbe32_aligned
+#define mem_get_sbe32_aligned mem_ops_wrap_symbol(mem_get_sbe32_aligned)
+mem_get_sbe_aligned_generic(32)
+
+#undef  mem_get_sle16_aligned
+#define mem_get_sle16_aligned mem_ops_wrap_symbol(mem_get_sle16_aligned)
+mem_get_sle_aligned_generic(16)
+
+#undef  mem_get_sle32_aligned
+#define mem_get_sle32_aligned mem_ops_wrap_symbol(mem_get_sle32_aligned)
+mem_get_sle_aligned_generic(32)
+
+#undef  mem_put_be16_aligned
+#define mem_put_be16_aligned mem_ops_wrap_symbol(mem_put_be16_aligned)
+mem_put_be_aligned_generic(16)
+
+#undef  mem_put_be32_aligned
+#define mem_put_be32_aligned mem_ops_wrap_symbol(mem_put_be32_aligned)
+mem_put_be_aligned_generic(32)
+
+#undef  mem_put_le16_aligned
+#define mem_put_le16_aligned mem_ops_wrap_symbol(mem_put_le16_aligned)
+mem_put_le_aligned_generic(16)
+
+#undef  mem_put_le32_aligned
+#define mem_put_le32_aligned mem_ops_wrap_symbol(mem_put_le32_aligned)
+mem_put_le_aligned_generic(32)
+
+#undef mem_get_ne_aligned_generic
+#undef mem_get_se_aligned_generic
+#undef mem_get_sne_aligned_generic
+#undef mem_get_sse_aligned_generic
+#undef mem_put_ne_aligned_generic
+#undef mem_put_se_aligned_generic
+#undef swap_endian_16
+#undef swap_endian_32
+#undef swap_endian_16_se
+#undef swap_endian_32_se
+/* clang-format on */
+
+#endif  // AOM_AOM_PORTS_MEM_OPS_ALIGNED_H_

diff --git a/libav1/aom_ports/msvc.h b/libav1/aom_ports/msvc.h
new file mode 100644
index 0000000..e78e605
--- /dev/null
+++ b/libav1/aom_ports/msvc.h

@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_MSVC_H_
+#define AOM_AOM_PORTS_MSVC_H_
+#ifdef _MSC_VER
+
+#include "config/aom_config.h"
+
+#if _MSC_VER < 1900  // VS2015 provides snprintf
+#define snprintf _snprintf
+#endif  // _MSC_VER < 1900
+
+#if _MSC_VER < 1800  // VS2013 provides round
+#include <math.h>
+static INLINE double round(double x) {
+  if (x < 0)
+    return ceil(x - 0.5);
+  else
+    return floor(x + 0.5);
+}
+
+static INLINE float roundf(float x) {
+  if (x < 0)
+    return (float)ceil(x - 0.5f);
+  else
+    return (float)floor(x + 0.5f);
+}
+
+static INLINE long lroundf(float x) {
+  if (x < 0)
+    return (long)(x - 0.5f);
+  else
+    return (long)(x + 0.5f);
+}
+#endif  // _MSC_VER < 1800
+
+#if HAVE_AVX
+#include <immintrin.h>
+// Note:
+// _mm256_insert_epi16 intrinsics is available from vs2017.
+// We define this macro for vs2015 and earlier. The
+// intrinsics used here are in vs2015 document:
+// https://msdn.microsoft.com/en-us/library/hh977022.aspx
+// Input parameters:
+// a: __m256i,
+// d: int16_t,
+// indx: imm8 (0 - 15)
+#if _MSC_VER <= 1900
+#define _mm256_insert_epi16(a, d, indx)                                      \
+  _mm256_insertf128_si256(                                                   \
+      a,                                                                     \
+      _mm_insert_epi16(_mm256_extractf128_si256(a, indx >> 3), d, indx % 8), \
+      indx >> 3)
+
+static INLINE int _mm256_extract_epi32(__m256i a, const int i) {
+  return a.m256i_i32[i & 7];
+}
+static INLINE __m256i _mm256_insert_epi32(__m256i a, int b, const int i) {
+  __m256i c = a;
+  c.m256i_i32[i & 7] = b;
+  return c;
+}
+#endif  // _MSC_VER <= 1900
+#endif  // HAVE_AVX
+#endif  // _MSC_VER
+#endif  // AOM_AOM_PORTS_MSVC_H_

diff --git a/libav1/aom_ports/sanitizer.h b/libav1/aom_ports/sanitizer.h
new file mode 100644
index 0000000..1dd8eb4
--- /dev/null
+++ b/libav1/aom_ports/sanitizer.h

@@ -0,0 +1,38 @@
+/*
+ * Copyright (c) 2018, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_SANITIZER_H_
+#define AOM_AOM_PORTS_SANITIZER_H_
+
+// AddressSanitizer support.
+
+// Define AOM_ADDRESS_SANITIZER if AddressSanitizer is used.
+// Clang.
+#if defined(__has_feature)
+#if __has_feature(address_sanitizer)
+#define AOM_ADDRESS_SANITIZER 1
+#endif
+#endif  // defined(__has_feature)
+// GCC.
+#if defined(__SANITIZE_ADDRESS__)
+#define AOM_ADDRESS_SANITIZER 1
+#endif  // defined(__SANITIZE_ADDRESS__)
+
+// Define the macros for AddressSanitizer manual memory poisoning. See
+// https://github.com/google/sanitizers/wiki/AddressSanitizerManualPoisoning.
+#if defined(AOM_ADDRESS_SANITIZER)
+#include <sanitizer/asan_interface.h>
+#else
+#define ASAN_POISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
+#define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
+#endif
+
+#endif  // AOM_AOM_PORTS_SANITIZER_H_

diff --git a/libav1/aom_ports/system_state.h b/libav1/aom_ports/system_state.h
new file mode 100644
index 0000000..6640839
--- /dev/null
+++ b/libav1/aom_ports/system_state.h

@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_SYSTEM_STATE_H_
+#define AOM_AOM_PORTS_SYSTEM_STATE_H_
+
+#include "config/aom_config.h"
+
+#if ARCH_X86 || ARCH_X86_64
+void aom_reset_mmx_state(void);
+#define aom_clear_system_state() aom_reset_mmx_state()
+#else
+#define aom_clear_system_state()
+#endif  // ARCH_X86 || ARCH_X86_64
+#endif  // AOM_AOM_PORTS_SYSTEM_STATE_H_

diff --git a/libav1/aom_ports/x86.h b/libav1/aom_ports/x86.h
new file mode 100644
index 0000000..a2d1b16
--- /dev/null
+++ b/libav1/aom_ports/x86.h

@@ -0,0 +1,376 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_PORTS_X86_H_
+#define AOM_AOM_PORTS_X86_H_
+#include <stdlib.h>
+
+#if defined(_MSC_VER)
+#include <intrin.h> /* For __cpuidex, __rdtsc */
+#endif
+
+#include "aom/aom_integer.h"
+#include "config/aom_config.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef enum {
+  AOM_CPU_UNKNOWN = -1,
+  AOM_CPU_AMD,
+  AOM_CPU_AMD_OLD,
+  AOM_CPU_CENTAUR,
+  AOM_CPU_CYRIX,
+  AOM_CPU_INTEL,
+  AOM_CPU_NEXGEN,
+  AOM_CPU_NSC,
+  AOM_CPU_RISE,
+  AOM_CPU_SIS,
+  AOM_CPU_TRANSMETA,
+  AOM_CPU_TRANSMETA_OLD,
+  AOM_CPU_UMC,
+  AOM_CPU_VIA,
+
+  AOM_CPU_LAST
+} aom_cpu_t;
+
+#if defined(__GNUC__) && __GNUC__ || defined(__ANDROID__)
+#if ARCH_X86_64
+#define cpuid(func, func2, ax, bx, cx, dx)                      \
+  __asm__ __volatile__("cpuid           \n\t"                   \
+                       : "=a"(ax), "=b"(bx), "=c"(cx), "=d"(dx) \
+                       : "a"(func), "c"(func2));
+#else
+#define cpuid(func, func2, ax, bx, cx, dx)     \
+  __asm__ __volatile__(                        \
+      "mov %%ebx, %%edi   \n\t"                \
+      "cpuid              \n\t"                \
+      "xchg %%edi, %%ebx  \n\t"                \
+      : "=a"(ax), "=D"(bx), "=c"(cx), "=d"(dx) \
+      : "a"(func), "c"(func2));
+#endif
+#elif defined(__SUNPRO_C) || \
+    defined(__SUNPRO_CC) /* end __GNUC__ or __ANDROID__*/
+#if ARCH_X86_64
+#define cpuid(func, func2, ax, bx, cx, dx)     \
+  asm volatile(                                \
+      "xchg %rsi, %rbx \n\t"                   \
+      "cpuid           \n\t"                   \
+      "movl %ebx, %edi \n\t"                   \
+      "xchg %rsi, %rbx \n\t"                   \
+      : "=a"(ax), "=D"(bx), "=c"(cx), "=d"(dx) \
+      : "a"(func), "c"(func2));
+#else
+#define cpuid(func, func2, ax, bx, cx, dx)     \
+  asm volatile(                                \
+      "pushl %ebx       \n\t"                  \
+      "cpuid            \n\t"                  \
+      "movl %ebx, %edi  \n\t"                  \
+      "popl %ebx        \n\t"                  \
+      : "=a"(ax), "=D"(bx), "=c"(cx), "=d"(dx) \
+      : "a"(func), "c"(func2));
+#endif
+#else /* end __SUNPRO__ */
+#if ARCH_X86_64
+#if defined(_MSC_VER) && _MSC_VER > 1500
+#define cpuid(func, func2, a, b, c, d) \
+  do {                                 \
+    int regs[4];                       \
+    __cpuidex(regs, func, func2);      \
+    a = regs[0];                       \
+    b = regs[1];                       \
+    c = regs[2];                       \
+    d = regs[3];                       \
+  } while (0)
+#else
+#define cpuid(func, func2, a, b, c, d) \
+  do {                                 \
+    int regs[4];                       \
+    __cpuid(regs, func);               \
+    a = regs[0];                       \
+    b = regs[1];                       \
+    c = regs[2];                       \
+    d = regs[3];                       \
+  } while (0)
+#endif
+#else
+/* clang-format off */
+#define cpuid(func, func2, a, b, c, d) \
+  __asm mov eax, func                  \
+  __asm mov ecx, func2                 \
+  __asm cpuid                          \
+  __asm mov a, eax                     \
+  __asm mov b, ebx                     \
+  __asm mov c, ecx                     \
+  __asm mov d, edx
+#endif
+/* clang-format on */
+#endif /* end others */
+
+// NaCl has no support for xgetbv or the raw opcode.
+#if !defined(__native_client__) && (defined(__i386__) || defined(__x86_64__))
+static INLINE uint64_t xgetbv(void) {
+  const uint32_t ecx = 0;
+  uint32_t eax, edx;
+  // Use the raw opcode for xgetbv for compatibility with older toolchains.
+  __asm__ volatile(".byte 0x0f, 0x01, 0xd0\n"
+                   : "=a"(eax), "=d"(edx)
+                   : "c"(ecx));
+  return ((uint64_t)edx << 32) | eax;
+}
+#elif (defined(_M_X64) || defined(_M_IX86)) && defined(_MSC_FULL_VER) && \
+    _MSC_FULL_VER >= 160040219  // >= VS2010 SP1
+#include <immintrin.h>
+#define xgetbv() _xgetbv(0)
+#elif defined(_MSC_VER) && defined(_M_IX86)
+static INLINE uint64_t xgetbv(void) {
+  uint32_t eax_, edx_;
+  __asm {
+    xor ecx, ecx  // ecx = 0
+    // Use the raw opcode for xgetbv for compatibility with older toolchains.
+    __asm _emit 0x0f __asm _emit 0x01 __asm _emit 0xd0
+    mov eax_, eax
+    mov edx_, edx
+  }
+  return ((uint64_t)edx_ << 32) | eax_;
+}
+#else
+#define xgetbv() 0U  // no AVX for older x64 or unrecognized toolchains.
+#endif
+
+#if defined(_MSC_VER) && _MSC_VER >= 1700
+#include <windows.h>
+#if WINAPI_FAMILY_PARTITION(WINAPI_FAMILY_APP)
+#define getenv(x) NULL
+#endif
+#endif
+
+#define HAS_MMX 0x01
+#define HAS_SSE 0x02
+#define HAS_SSE2 0x04
+#define HAS_SSE3 0x08
+#define HAS_SSSE3 0x10
+#define HAS_SSE4_1 0x20
+#define HAS_AVX 0x40
+#define HAS_AVX2 0x80
+#define HAS_SSE4_2 0x100
+#ifndef BIT
+#define BIT(n) (1 << n)
+#endif
+
+static INLINE int x86_simd_caps(void) {
+  unsigned int flags = 0;
+  unsigned int mask = ~0;
+  unsigned int max_cpuid_val, reg_eax, reg_ebx, reg_ecx, reg_edx;
+  char *env;
+  (void)reg_ebx;
+
+  /* See if the CPU capabilities are being overridden by the environment */
+  env = getenv("AOM_SIMD_CAPS");
+
+  if (env && *env) return (int)strtol(env, NULL, 0);
+
+  env = getenv("AOM_SIMD_CAPS_MASK");
+
+  if (env && *env) mask = (unsigned int)strtoul(env, NULL, 0);
+
+  /* Ensure that the CPUID instruction supports extended features */
+  cpuid(0, 0, max_cpuid_val, reg_ebx, reg_ecx, reg_edx);
+
+  if (max_cpuid_val < 1) return 0;
+
+  /* Get the standard feature flags */
+  cpuid(1, 0, reg_eax, reg_ebx, reg_ecx, reg_edx);
+
+  if (reg_edx & BIT(23)) flags |= HAS_MMX;
+
+  if (reg_edx & BIT(25)) flags |= HAS_SSE; /* aka xmm */
+
+  if (reg_edx & BIT(26)) flags |= HAS_SSE2; /* aka wmt */
+
+  if (reg_ecx & BIT(0)) flags |= HAS_SSE3;
+
+  if (reg_ecx & BIT(9)) flags |= HAS_SSSE3;
+
+  if (reg_ecx & BIT(19)) flags |= HAS_SSE4_1;
+
+  if (reg_ecx & BIT(20)) flags |= HAS_SSE4_2;
+
+  // bits 27 (OSXSAVE) & 28 (256-bit AVX)
+  if ((reg_ecx & (BIT(27) | BIT(28))) == (BIT(27) | BIT(28))) {
+    if ((xgetbv() & 0x6) == 0x6) {
+      flags |= HAS_AVX;
+
+      if (max_cpuid_val >= 7) {
+        /* Get the leaf 7 feature flags. Needed to check for AVX2 support */
+        cpuid(7, 0, reg_eax, reg_ebx, reg_ecx, reg_edx);
+
+        if (reg_ebx & BIT(5)) flags |= HAS_AVX2;
+      }
+    }
+  }
+
+  return 0;
+  //flags &mask;
+}
+
+// Fine-Grain Measurement Functions
+//
+// If you are a timing a small region of code, access the timestamp counter
+// (TSC) via:
+//
+// unsigned int start = x86_tsc_start();
+//   ...
+// unsigned int end = x86_tsc_end();
+// unsigned int diff = end - start;
+//
+// The start/end functions introduce a few more instructions than using
+// x86_readtsc directly, but prevent the CPU's out-of-order execution from
+// affecting the measurement (by having earlier/later instructions be evaluated
+// in the time interval). See the white paper, "How to Benchmark Code
+// Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures" by
+// Gabriele Paoloni for more information.
+//
+// If you are timing a large function (CPU time > a couple of seconds), use
+// x86_readtsc64 to read the timestamp counter in a 64-bit integer. The
+// out-of-order leakage that can occur is minimal compared to total runtime.
+static INLINE unsigned int x86_readtsc(void) {
+#if defined(__GNUC__) && __GNUC__
+  unsigned int tsc;
+  __asm__ __volatile__("rdtsc\n\t" : "=a"(tsc) :);
+  return tsc;
+#elif defined(__SUNPRO_C) || defined(__SUNPRO_CC)
+  unsigned int tsc;
+  asm volatile("rdtsc\n\t" : "=a"(tsc) :);
+  return tsc;
+#else
+#if ARCH_X86_64
+  return (unsigned int)__rdtsc();
+#else
+  __asm rdtsc;
+#endif
+#endif
+}
+// 64-bit CPU cycle counter
+static INLINE uint64_t x86_readtsc64(void) {
+#if defined(__GNUC__) && __GNUC__
+  uint32_t hi, lo;
+  __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
+  return ((uint64_t)hi << 32) | lo;
+#elif defined(__SUNPRO_C) || defined(__SUNPRO_CC)
+  uint_t hi, lo;
+  asm volatile("rdtsc\n\t" : "=a"(lo), "=d"(hi));
+  return ((uint64_t)hi << 32) | lo;
+#else
+#if ARCH_X86_64
+  return (uint64_t)__rdtsc();
+#else
+  __asm rdtsc;
+#endif
+#endif
+}
+
+// 32-bit CPU cycle counter with a partial fence against out-of-order execution.
+static INLINE unsigned int x86_readtscp(void) {
+#if defined(__GNUC__) && __GNUC__
+  unsigned int tscp;
+  __asm__ __volatile__("rdtscp\n\t" : "=a"(tscp) :);
+  return tscp;
+#elif defined(__SUNPRO_C) || defined(__SUNPRO_CC)
+  unsigned int tscp;
+  asm volatile("rdtscp\n\t" : "=a"(tscp) :);
+  return tscp;
+#elif defined(_MSC_VER)
+  unsigned int ui;
+  return (unsigned int)__rdtscp(&ui);
+#else
+#if ARCH_X86_64
+  return (unsigned int)__rdtscp();
+#else
+  __asm rdtscp;
+#endif
+#endif
+}
+
+static INLINE unsigned int x86_tsc_start(void) {
+  unsigned int reg_eax, reg_ebx, reg_ecx, reg_edx;
+  cpuid(0, 0, reg_eax, reg_ebx, reg_ecx, reg_edx);
+  return x86_readtsc();
+}
+
+static INLINE unsigned int x86_tsc_end(void) {
+  uint32_t v = x86_readtscp();
+  unsigned int reg_eax, reg_ebx, reg_ecx, reg_edx;
+  cpuid(0, 0, reg_eax, reg_ebx, reg_ecx, reg_edx);
+  return v;
+}
+
+#if defined(__GNUC__) && __GNUC__
+#define x86_pause_hint() __asm__ __volatile__("pause \n\t")
+#elif defined(__SUNPRO_C) || defined(__SUNPRO_CC)
+#define x86_pause_hint() asm volatile("pause \n\t")
+#else
+#if ARCH_X86_64
+#define x86_pause_hint() _mm_pause();
+#else
+#define x86_pause_hint() __asm pause
+#endif
+#endif
+
+#if defined(__GNUC__) && __GNUC__
+static void x87_set_control_word(unsigned short mode) {
+  __asm__ __volatile__("fldcw %0" : : "m"(*&mode));
+}
+static unsigned short x87_get_control_word(void) {
+  unsigned short mode;
+  __asm__ __volatile__("fstcw %0\n\t" : "=m"(*&mode) :);
+  return mode;
+}
+#elif defined(__SUNPRO_C) || defined(__SUNPRO_CC)
+static void x87_set_control_word(unsigned short mode) {
+  asm volatile("fldcw %0" : : "m"(*&mode));
+}
+static unsigned short x87_get_control_word(void) {
+  unsigned short mode;
+  asm volatile("fstcw %0\n\t" : "=m"(*&mode) :);
+  return mode;
+}
+#elif ARCH_X86_64
+/* No fldcw intrinsics on Windows x64, punt to external asm */
+extern void aom_winx64_fldcw(unsigned short mode);
+extern unsigned short aom_winx64_fstcw(void);
+#define x87_set_control_word aom_winx64_fldcw
+#define x87_get_control_word aom_winx64_fstcw
+#else
+static void x87_set_control_word(unsigned short mode) {
+  __asm { fldcw mode }
+}
+static unsigned short x87_get_control_word(void) {
+  unsigned short mode;
+  __asm { fstcw mode }
+  return mode;
+}
+#endif
+
+static INLINE unsigned int x87_set_double_precision(void) {
+  unsigned int mode = x87_get_control_word();
+  x87_set_control_word((mode & ~0x300) | 0x200);
+  return mode;
+}
+
+extern void aom_reset_mmx_state(void);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_PORTS_X86_H_

diff --git a/libav1/aom_scale/aom_scale.h b/libav1/aom_scale/aom_scale.h
new file mode 100644
index 0000000..11812a1
--- /dev/null
+++ b/libav1/aom_scale/aom_scale.h

@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_SCALE_AOM_SCALE_H_
+#define AOM_AOM_SCALE_AOM_SCALE_H_
+
+#include "aom_scale/yv12config.h"
+
+extern void aom_scale_frame(YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst,
+                            unsigned char *temp_area, unsigned char temp_height,
+                            unsigned int hscale, unsigned int hratio,
+                            unsigned int vscale, unsigned int vratio,
+                            unsigned int interlaced, const int num_planes);
+
+#endif  // AOM_AOM_SCALE_AOM_SCALE_H_

diff --git a/libav1/aom_scale/aom_scale_rtcd.c b/libav1/aom_scale/aom_scale_rtcd.c
new file mode 100644
index 0000000..a04e053
--- /dev/null
+++ b/libav1/aom_scale/aom_scale_rtcd.c

@@ -0,0 +1,18 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#include "config/aom_config.h"
+
+#define RTCD_C
+#include "config/aom_scale_rtcd.h"
+
+#include "aom_ports/aom_once.h"
+
+void aom_scale_rtcd() { aom_once(setup_rtcd_internal); }

diff --git a/libav1/aom_scale/generic/aom_scale.c b/libav1/aom_scale/generic/aom_scale.c
new file mode 100644
index 0000000..206c42c
--- /dev/null
+++ b/libav1/aom_scale/generic/aom_scale.c

@@ -0,0 +1,506 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/****************************************************************************
+ *
+ *   Module Title :     scale.c
+ *
+ *   Description  :     Image scaling functions.
+ *
+ ***************************************************************************/
+
+/****************************************************************************
+ *  Header Files
+ ****************************************************************************/
+#include "config/aom_scale_rtcd.h"
+
+#include "aom_mem/aom_mem.h"
+#include "aom_scale/aom_scale.h"
+#include "aom_scale/yv12config.h"
+
+typedef struct {
+  int expanded_frame_width;
+  int expanded_frame_height;
+
+  int HScale;
+  int HRatio;
+  int VScale;
+  int VRatio;
+
+  YV12_BUFFER_CONFIG *src_yuv_config;
+  YV12_BUFFER_CONFIG *dst_yuv_config;
+
+} SCALE_VARS;
+
+/****************************************************************************
+ *
+ *  ROUTINE       : scale1d_2t1_i
+ *
+ *  INPUTS        : const unsigned char *source : Pointer to data to be scaled.
+ *                  int source_step             : Number of pixels to step on
+ *                                                in source.
+ *                  unsigned int source_scale   : Scale for source (UNUSED).
+ *                  unsigned int source_length  : Length of source (UNUSED).
+ *                  unsigned char *dest         : Pointer to output data array.
+ *                  int dest_step               : Number of pixels to step on
+ *                                                in destination.
+ *                  unsigned int dest_scale     : Scale for destination
+ *                                                (UNUSED).
+ *                  unsigned int dest_length    : Length of destination.
+ *
+ *  OUTPUTS       : None.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Performs 2-to-1 interpolated scaling.
+ *
+ *  SPECIAL NOTES : None.
+ *
+ ****************************************************************************/
+static void scale1d_2t1_i(const unsigned char *source, int source_step,
+                          unsigned int source_scale, unsigned int source_length,
+                          unsigned char *dest, int dest_step,
+                          unsigned int dest_scale, unsigned int dest_length) {
+  const unsigned char *const dest_end = dest + dest_length * dest_step;
+  (void)source_length;
+  (void)source_scale;
+  (void)dest_scale;
+
+  source_step *= 2;  // Every other row.
+
+  dest[0] = source[0];  // Special case: 1st pixel.
+  source += source_step;
+  dest += dest_step;
+
+  while (dest < dest_end) {
+    const unsigned int a = 3 * source[-source_step];
+    const unsigned int b = 10 * source[0];
+    const unsigned int c = 3 * source[source_step];
+    *dest = (unsigned char)((8 + a + b + c) >> 4);
+    source += source_step;
+    dest += dest_step;
+  }
+}
+
+/****************************************************************************
+ *
+ *  ROUTINE       : scale1d_2t1_ps
+ *
+ *  INPUTS        : const unsigned char *source : Pointer to data to be scaled.
+ *                  int source_step             : Number of pixels to step on
+ *                                                in source.
+ *                  unsigned int source_scale   : Scale for source (UNUSED).
+ *                  unsigned int source_length  : Length of source (UNUSED).
+ *                  unsigned char *dest         : Pointer to output data array.
+ *                  int dest_step               : Number of pixels to step on
+ *                                                in destination.
+ *                  unsigned int dest_scale     : Scale for destination
+ *                                                (UNUSED).
+ *                  unsigned int dest_length    : Length of destination.
+ *
+ *  OUTPUTS       : None.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Performs 2-to-1 point subsampled scaling.
+ *
+ *  SPECIAL NOTES : None.
+ *
+ ****************************************************************************/
+static void scale1d_2t1_ps(const unsigned char *source, int source_step,
+                           unsigned int source_scale,
+                           unsigned int source_length, unsigned char *dest,
+                           int dest_step, unsigned int dest_scale,
+                           unsigned int dest_length) {
+  const unsigned char *const dest_end = dest + dest_length * dest_step;
+  (void)source_length;
+  (void)source_scale;
+  (void)dest_scale;
+
+  source_step *= 2;  // Every other row.
+
+  while (dest < dest_end) {
+    *dest = *source;
+    source += source_step;
+    dest += dest_step;
+  }
+}
+/****************************************************************************
+ *
+ *  ROUTINE       : scale1d_c
+ *
+ *  INPUTS        : const unsigned char *source : Pointer to data to be scaled.
+ *                  int source_step             : Number of pixels to step on
+ *                                                in source.
+ *                  unsigned int source_scale   : Scale for source.
+ *                  unsigned int source_length  : Length of source (UNUSED).
+ *                  unsigned char *dest         : Pointer to output data array.
+ *                  int dest_step               : Number of pixels to step on
+ *                                                in destination.
+ *                  unsigned int dest_scale     : Scale for destination.
+ *                  unsigned int dest_length    : Length of destination.
+ *
+ *  OUTPUTS       : None.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Performs linear interpolation in one dimension.
+ *
+ *  SPECIAL NOTES : None.
+ *
+ ****************************************************************************/
+static void scale1d_c(const unsigned char *source, int source_step,
+                      unsigned int source_scale, unsigned int source_length,
+                      unsigned char *dest, int dest_step,
+                      unsigned int dest_scale, unsigned int dest_length) {
+  const unsigned char *const dest_end = dest + dest_length * dest_step;
+  const unsigned int round_value = dest_scale / 2;
+  unsigned int left_modifier = dest_scale;
+  unsigned int right_modifier = 0;
+  unsigned char left_pixel = source[0];
+  unsigned char right_pixel = source[source_step];
+
+  (void)source_length;
+
+  /* These asserts are needed if there are boundary issues... */
+  /* assert ( dest_scale > source_scale );*/
+  /* assert ( (source_length - 1) * dest_scale >= (dest_length - 1) *
+   * source_scale);*/
+
+  while (dest < dest_end) {
+    *dest = (unsigned char)((left_modifier * left_pixel +
+                             right_modifier * right_pixel + round_value) /
+                            dest_scale);
+
+    right_modifier += source_scale;
+
+    while (right_modifier > dest_scale) {
+      right_modifier -= dest_scale;
+      source += source_step;
+      left_pixel = source[0];
+      right_pixel = source[source_step];
+    }
+
+    left_modifier = dest_scale - right_modifier;
+  }
+}
+
+/****************************************************************************
+ *
+ *  ROUTINE       : Scale2D
+ *
+ *  INPUTS        : const unsigned char *source    : Pointer to data to be
+ *                                                   scaled.
+ *                  int source_pitch               : Stride of source image.
+ *                  unsigned int source_width      : Width of input image.
+ *                  unsigned int source_height     : Height of input image.
+ *                  unsigned char *dest            : Pointer to output data
+ *                                                   array.
+ *                  int dest_pitch                 : Stride of destination
+ *                                                   image.
+ *                  unsigned int dest_width        : Width of destination image.
+ *                  unsigned int dest_height       : Height of destination
+ *                                                   image.
+ *                  unsigned char *temp_area       : Pointer to temp work area.
+ *                  unsigned char temp_area_height : Height of temp work area.
+ *                  unsigned int hscale            : Horizontal scale factor
+ *                                                   numerator.
+ *                  unsigned int hratio            : Horizontal scale factor
+ *                                                   denominator.
+ *                  unsigned int vscale            : Vertical scale factor
+ *                                                   numerator.
+ *                  unsigned int vratio            : Vertical scale factor
+ *                                                   denominator.
+ *                  unsigned int interlaced        : Interlace flag.
+ *
+ *  OUTPUTS       : None.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Performs 2-tap linear interpolation in two dimensions.
+ *
+ *  SPECIAL NOTES : Expansion is performed one band at a time to help with
+ *                  caching.
+ *
+ ****************************************************************************/
+static void Scale2D(
+    /*const*/
+    unsigned char *source, int source_pitch, unsigned int source_width,
+    unsigned int source_height, unsigned char *dest, int dest_pitch,
+    unsigned int dest_width, unsigned int dest_height, unsigned char *temp_area,
+    unsigned char temp_area_height, unsigned int hscale, unsigned int hratio,
+    unsigned int vscale, unsigned int vratio, unsigned int interlaced) {
+  unsigned int i, j, k;
+  unsigned int bands;
+  unsigned int dest_band_height;
+  unsigned int source_band_height;
+
+  typedef void (*Scale1D)(const unsigned char *source, int source_step,
+                          unsigned int source_scale, unsigned int source_length,
+                          unsigned char *dest, int dest_step,
+                          unsigned int dest_scale, unsigned int dest_length);
+
+  Scale1D Scale1Dv = scale1d_c;
+  Scale1D Scale1Dh = scale1d_c;
+
+  void (*horiz_line_scale)(const unsigned char *, unsigned int, unsigned char *,
+                           unsigned int) = NULL;
+  void (*vert_band_scale)(unsigned char *, int, unsigned char *, int,
+                          unsigned int) = NULL;
+
+  int ratio_scalable = 1;
+  int interpolation = 0;
+
+  unsigned char *source_base;
+  unsigned char *line_src;
+
+  source_base = (unsigned char *)source;
+
+  if (source_pitch < 0) {
+    int offset;
+
+    offset = (source_height - 1);
+    offset *= source_pitch;
+
+    source_base += offset;
+  }
+
+  /* find out the ratio for each direction */
+  switch (hratio * 10 / hscale) {
+    case 8:
+      /* 4-5 Scale in Width direction */
+      horiz_line_scale = aom_horizontal_line_5_4_scale;
+      break;
+    case 6:
+      /* 3-5 Scale in Width direction */
+      horiz_line_scale = aom_horizontal_line_5_3_scale;
+      break;
+    case 5:
+      /* 1-2 Scale in Width direction */
+      horiz_line_scale = aom_horizontal_line_2_1_scale;
+      break;
+    default:
+      /* The ratio is not acceptable now */
+      /* throw("The ratio is not acceptable for now!"); */
+      ratio_scalable = 0;
+      break;
+  }
+
+  switch (vratio * 10 / vscale) {
+    case 8:
+      /* 4-5 Scale in vertical direction */
+      vert_band_scale = aom_vertical_band_5_4_scale;
+      source_band_height = 5;
+      dest_band_height = 4;
+      break;
+    case 6:
+      /* 3-5 Scale in vertical direction */
+      vert_band_scale = aom_vertical_band_5_3_scale;
+      source_band_height = 5;
+      dest_band_height = 3;
+      break;
+    case 5:
+      /* 1-2 Scale in vertical direction */
+
+      if (interlaced) {
+        /* if the content is interlaced, point sampling is used */
+        vert_band_scale = aom_vertical_band_2_1_scale;
+      } else {
+        interpolation = 1;
+        /* if the content is progressive, interplo */
+        vert_band_scale = aom_vertical_band_2_1_scale_i;
+      }
+
+      source_band_height = 2;
+      dest_band_height = 1;
+      break;
+    default:
+      /* The ratio is not acceptable now */
+      /* throw("The ratio is not acceptable for now!"); */
+      ratio_scalable = 0;
+      break;
+  }
+
+  if (ratio_scalable) {
+    if (source_height == dest_height) {
+      /* for each band of the image */
+      for (k = 0; k < dest_height; ++k) {
+        horiz_line_scale(source, source_width, dest, dest_width);
+        source += source_pitch;
+        dest += dest_pitch;
+      }
+
+      return;
+    }
+
+    if (interpolation) {
+      if (source < source_base) source = source_base;
+
+      horiz_line_scale(source, source_width, temp_area, dest_width);
+    }
+
+    for (k = 0; k < (dest_height + dest_band_height - 1) / dest_band_height;
+         ++k) {
+      /* scale one band horizontally */
+      for (i = 0; i < source_band_height; ++i) {
+        /* Trap case where we could read off the base of the source buffer */
+
+        line_src = source + i * source_pitch;
+
+        if (line_src < source_base) line_src = source_base;
+
+        horiz_line_scale(line_src, source_width,
+                         temp_area + (i + 1) * dest_pitch, dest_width);
+      }
+
+      /* Vertical scaling is in place */
+      vert_band_scale(temp_area + dest_pitch, dest_pitch, dest, dest_pitch,
+                      dest_width);
+
+      if (interpolation)
+        memcpy(temp_area, temp_area + source_band_height * dest_pitch,
+               dest_width);
+
+      /* Next band... */
+      source += (unsigned long)source_band_height * source_pitch;
+      dest += (unsigned long)dest_band_height * dest_pitch;
+    }
+
+    return;
+  }
+
+  if (hscale == 2 && hratio == 1) Scale1Dh = scale1d_2t1_ps;
+
+  if (vscale == 2 && vratio == 1) {
+    if (interlaced)
+      Scale1Dv = scale1d_2t1_ps;
+    else
+      Scale1Dv = scale1d_2t1_i;
+  }
+
+  if (source_height == dest_height) {
+    /* for each band of the image */
+    for (k = 0; k < dest_height; ++k) {
+      Scale1Dh(source, 1, hscale, source_width + 1, dest, 1, hratio,
+               dest_width);
+      source += source_pitch;
+      dest += dest_pitch;
+    }
+
+    return;
+  }
+
+  if (dest_height > source_height) {
+    dest_band_height = temp_area_height - 1;
+    source_band_height = dest_band_height * source_height / dest_height;
+  } else {
+    source_band_height = temp_area_height - 1;
+    dest_band_height = source_band_height * vratio / vscale;
+  }
+
+  /* first row needs to be done so that we can stay one row ahead for vertical
+   * zoom */
+  Scale1Dh(source, 1, hscale, source_width + 1, temp_area, 1, hratio,
+           dest_width);
+
+  /* for each band of the image */
+  bands = (dest_height + dest_band_height - 1) / dest_band_height;
+
+  for (k = 0; k < bands; ++k) {
+    /* scale one band horizontally */
+    for (i = 1; i < source_band_height + 1; ++i) {
+      if (k * source_band_height + i < source_height) {
+        Scale1Dh(source + i * source_pitch, 1, hscale, source_width + 1,
+                 temp_area + i * dest_pitch, 1, hratio, dest_width);
+      } else { /*  Duplicate the last row */
+        /* copy temp_area row 0 over from last row in the past */
+        memcpy(temp_area + i * dest_pitch, temp_area + (i - 1) * dest_pitch,
+               dest_pitch);
+      }
+    }
+
+    /* scale one band vertically */
+    for (j = 0; j < dest_width; ++j) {
+      Scale1Dv(&temp_area[j], dest_pitch, vscale, source_band_height + 1,
+               &dest[j], dest_pitch, vratio, dest_band_height);
+    }
+
+    /* copy temp_area row 0 over from last row in the past */
+    memcpy(temp_area, temp_area + source_band_height * dest_pitch, dest_pitch);
+
+    /* move to the next band */
+    source += source_band_height * source_pitch;
+    dest += dest_band_height * dest_pitch;
+  }
+}
+
+/****************************************************************************
+ *
+ *  ROUTINE       : aom_scale_frame
+ *
+ *  INPUTS        : YV12_BUFFER_CONFIG *src        : Pointer to frame to be
+ *                                                   scaled.
+ *                  YV12_BUFFER_CONFIG *dst        : Pointer to buffer to hold
+ *                                                   scaled frame.
+ *                  unsigned char *temp_area       : Pointer to temp work area.
+ *                  unsigned char temp_area_height : Height of temp work area.
+ *                  unsigned int hscale            : Horizontal scale factor
+ *                                                   numerator.
+ *                  unsigned int hratio            : Horizontal scale factor
+ *                                                   denominator.
+ *                  unsigned int vscale            : Vertical scale factor
+ *                                                   numerator.
+ *                  unsigned int vratio            : Vertical scale factor
+ *                                                   denominator.
+ *                  unsigned int interlaced        : Interlace flag.
+ *
+ *  OUTPUTS       : None.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Performs 2-tap linear interpolation in two dimensions.
+ *
+ *  SPECIAL NOTES : Expansion is performed one band at a time to help with
+ *                  caching.
+ *
+ ****************************************************************************/
+void aom_scale_frame(YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst,
+                     unsigned char *temp_area, unsigned char temp_height,
+                     unsigned int hscale, unsigned int hratio,
+                     unsigned int vscale, unsigned int vratio,
+                     unsigned int interlaced, const int num_planes) {
+  const int dw = (hscale - 1 + src->y_width * hratio) / hscale;
+  const int dh = (vscale - 1 + src->y_height * vratio) / vscale;
+
+  for (int plane = 0; plane < num_planes; ++plane) {
+    const int is_uv = plane > 0;
+    const int plane_dw = dw >> is_uv;
+    const int plane_dh = dh >> is_uv;
+
+    Scale2D((unsigned char *)src->buffers[plane], src->strides[is_uv],
+            src->widths[is_uv], src->heights[is_uv],
+            (unsigned char *)dst->buffers[plane], dst->strides[is_uv], plane_dw,
+            plane_dh, temp_area, temp_height, hscale, hratio, vscale, vratio,
+            interlaced);
+
+    if (plane_dw < dst->widths[is_uv])
+      for (int i = 0; i < plane_dh; ++i)
+        memset(dst->buffers[plane] + i * dst->strides[is_uv] + plane_dw - 1,
+               dst->buffers[plane][i * dst->strides[is_uv] + plane_dw - 2],
+               dst->widths[is_uv] - plane_dw + 1);
+
+    if (plane_dh < dst->heights[is_uv])
+      for (int i = plane_dh - 1; i < dst->heights[is_uv]; ++i)
+        memcpy(dst->buffers[plane] + i * dst->strides[is_uv],
+               dst->buffers[plane] + (plane_dh - 2) * dst->strides[is_uv],
+               dst->widths[is_uv] + 1);
+  }
+}

diff --git a/libav1/aom_scale/generic/gen_scalers.c b/libav1/aom_scale/generic/gen_scalers.c
new file mode 100644
index 0000000..549e2aa
--- /dev/null
+++ b/libav1/aom_scale/generic/gen_scalers.c

@@ -0,0 +1,201 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "config/aom_scale_rtcd.h"
+
+#include "aom_scale/aom_scale.h"
+#include "aom_mem/aom_mem.h"
+/****************************************************************************
+ *  Imports
+ ****************************************************************************/
+
+/****************************************************************************
+ *
+ *
+ *  INPUTS        : const unsigned char *source : Pointer to source data.
+ *                  unsigned int source_width   : Stride of source.
+ *                  unsigned char *dest         : Pointer to destination data.
+ *                  unsigned int dest_width     : Stride of destination
+ *                                                (NOT USED).
+ *
+ *  OUTPUTS       : None.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Copies horizontal line of pixels from source to
+ *                  destination scaling up by 4 to 5.
+ *
+ *  SPECIAL NOTES : None.
+ *
+ ****************************************************************************/
+void aom_horizontal_line_5_4_scale_c(const unsigned char *source,
+                                     unsigned int source_width,
+                                     unsigned char *dest,
+                                     unsigned int dest_width) {
+  const unsigned char *const source_end = source + source_width;
+  (void)dest_width;
+
+  while (source < source_end) {
+    const unsigned int a = source[0];
+    const unsigned int b = source[1];
+    const unsigned int c = source[2];
+    const unsigned int d = source[3];
+    const unsigned int e = source[4];
+
+    dest[0] = (unsigned char)a;
+    dest[1] = (unsigned char)((b * 192 + c * 64 + 128) >> 8);
+    dest[2] = (unsigned char)((c * 128 + d * 128 + 128) >> 8);
+    dest[3] = (unsigned char)((d * 64 + e * 192 + 128) >> 8);
+
+    source += 5;
+    dest += 4;
+  }
+}
+
+void aom_vertical_band_5_4_scale_c(unsigned char *source, int src_pitch,
+                                   unsigned char *dest, int dest_pitch,
+                                   unsigned int dest_width) {
+  const unsigned char *const dest_end = dest + dest_width;
+  while (dest < dest_end) {
+    const unsigned int a = source[0 * src_pitch];
+    const unsigned int b = source[1 * src_pitch];
+    const unsigned int c = source[2 * src_pitch];
+    const unsigned int d = source[3 * src_pitch];
+    const unsigned int e = source[4 * src_pitch];
+
+    dest[0 * dest_pitch] = (unsigned char)a;
+    dest[1 * dest_pitch] = (unsigned char)((b * 192 + c * 64 + 128) >> 8);
+    dest[2 * dest_pitch] = (unsigned char)((c * 128 + d * 128 + 128) >> 8);
+    dest[3 * dest_pitch] = (unsigned char)((d * 64 + e * 192 + 128) >> 8);
+
+    ++source;
+    ++dest;
+  }
+}
+
+/*7***************************************************************************
+ *
+ *  ROUTINE       : aom_horizontal_line_3_5_scale_c
+ *
+ *  INPUTS        : const unsigned char *source : Pointer to source data.
+ *                  unsigned int source_width   : Stride of source.
+ *                  unsigned char *dest         : Pointer to destination data.
+ *                  unsigned int dest_width     : Stride of destination
+ *                                                (NOT USED).
+ *
+ *  OUTPUTS       : None.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Copies horizontal line of pixels from source to
+ *                  destination scaling up by 3 to 5.
+ *
+ *  SPECIAL NOTES : None.
+ *
+ *
+ ****************************************************************************/
+void aom_horizontal_line_5_3_scale_c(const unsigned char *source,
+                                     unsigned int source_width,
+                                     unsigned char *dest,
+                                     unsigned int dest_width) {
+  const unsigned char *const source_end = source + source_width;
+  (void)dest_width;
+  while (source < source_end) {
+    const unsigned int a = source[0];
+    const unsigned int b = source[1];
+    const unsigned int c = source[2];
+    const unsigned int d = source[3];
+    const unsigned int e = source[4];
+
+    dest[0] = (unsigned char)a;
+    dest[1] = (unsigned char)((b * 85 + c * 171 + 128) >> 8);
+    dest[2] = (unsigned char)((d * 171 + e * 85 + 128) >> 8);
+
+    source += 5;
+    dest += 3;
+  }
+}
+
+void aom_vertical_band_5_3_scale_c(unsigned char *source, int src_pitch,
+                                   unsigned char *dest, int dest_pitch,
+                                   unsigned int dest_width) {
+  const unsigned char *const dest_end = dest + dest_width;
+  while (dest < dest_end) {
+    const unsigned int a = source[0 * src_pitch];
+    const unsigned int b = source[1 * src_pitch];
+    const unsigned int c = source[2 * src_pitch];
+    const unsigned int d = source[3 * src_pitch];
+    const unsigned int e = source[4 * src_pitch];
+
+    dest[0 * dest_pitch] = (unsigned char)a;
+    dest[1 * dest_pitch] = (unsigned char)((b * 85 + c * 171 + 128) >> 8);
+    dest[2 * dest_pitch] = (unsigned char)((d * 171 + e * 85 + 128) >> 8);
+
+    ++source;
+    ++dest;
+  }
+}
+
+/****************************************************************************
+ *
+ *  ROUTINE       : aom_horizontal_line_1_2_scale_c
+ *
+ *  INPUTS        : const unsigned char *source : Pointer to source data.
+ *                  unsigned int source_width   : Stride of source.
+ *                  unsigned char *dest         : Pointer to destination data.
+ *                  unsigned int dest_width     : Stride of destination
+ *                                                (NOT USED).
+ *
+ *  OUTPUTS       : None.
+ *
+ *  RETURNS       : void
+ *
+ *  FUNCTION      : Copies horizontal line of pixels from source to
+ *                  destination scaling up by 1 to 2.
+ *
+ *  SPECIAL NOTES : None.
+ *
+ ****************************************************************************/
+void aom_horizontal_line_2_1_scale_c(const unsigned char *source,
+                                     unsigned int source_width,
+                                     unsigned char *dest,
+                                     unsigned int dest_width) {
+  const unsigned char *const source_end = source + source_width;
+  (void)dest_width;
+  while (source < source_end) {
+    dest[0] = source[0];
+    source += 2;
+    ++dest;
+  }
+}
+
+void aom_vertical_band_2_1_scale_c(unsigned char *source, int src_pitch,
+                                   unsigned char *dest, int dest_pitch,
+                                   unsigned int dest_width) {
+  (void)dest_pitch;
+  (void)src_pitch;
+  memcpy(dest, source, dest_width);
+}
+
+void aom_vertical_band_2_1_scale_i_c(unsigned char *source, int src_pitch,
+                                     unsigned char *dest, int dest_pitch,
+                                     unsigned int dest_width) {
+  const unsigned char *const dest_end = dest + dest_width;
+  (void)dest_pitch;
+  while (dest < dest_end) {
+    const unsigned int a = source[-src_pitch] * 3;
+    const unsigned int b = source[0] * 10;
+    const unsigned int c = source[src_pitch] * 3;
+    dest[0] = (unsigned char)((8 + a + b + c) >> 4);
+    ++source;
+    ++dest;
+  }
+}

diff --git a/libav1/aom_scale/generic/yv12config.c b/libav1/aom_scale/generic/yv12config.c
new file mode 100644
index 0000000..71a5c2f
--- /dev/null
+++ b/libav1/aom_scale/generic/yv12config.c

@@ -0,0 +1,204 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/mem.h"
+#include "aom_scale/yv12config.h"
+#include "av1/common/enums.h"
+
+/****************************************************************************
+ *  Exports
+ ****************************************************************************/
+
+/****************************************************************************
+ *
+ ****************************************************************************/
+#define yv12_align_addr(addr, align) \
+  (void *)(((size_t)(addr) + ((align)-1)) & (size_t) - (align))
+
+// TODO(jkoleszar): Maybe replace this with struct aom_image
+
+int aom_free_frame_buffer(YV12_BUFFER_CONFIG *ybf) {
+  if (ybf) {
+    if (ybf->buffer_alloc_sz > 0) {
+      aom_free(ybf->buffer_alloc);
+    }
+    if (ybf->y_buffer_8bit) aom_free(ybf->y_buffer_8bit);
+
+    /* buffer_alloc isn't accessed by most functions.  Rather y_buffer,
+      u_buffer and v_buffer point to buffer_alloc and are used.  Clear out
+      all of this so that a freed pointer isn't inadvertently used */
+    memset(ybf, 0, sizeof(YV12_BUFFER_CONFIG));
+  } else {
+    return -1;
+  }
+
+  return 0;
+}
+// Todo remove it. It isn't used
+int aom_realloc_frame_buffer(YV12_BUFFER_CONFIG *ybf, int width, int height,
+                             int ss_x, int ss_y, int use_highbitdepth,
+                             int border, int byte_alignment,
+                             aom_codec_frame_buffer_t *fb,
+                             aom_get_frame_buffer_cb_fn_t cb, void *cb_priv) {
+//#if CONFIG_SIZE_LIMIT
+//  if (width > DECODE_WIDTH_LIMIT || height > DECODE_HEIGHT_LIMIT) return -1;
+//#endif
+//  return -1;
+//  /* Only support allocating buffers that have a border that's a multiple
+//   * of 32. The border restriction is required to get 16-byte alignment of
+//   * the start of the chroma rows without introducing an arbitrary gap
+//   * between planes, which would break the semantics of things like
+//   * aom_img_set_rect(). */
+//  if (border & 0x1f) return -3;
+//
+//  if (ybf) {
+//    const int aom_byte_align = (byte_alignment == 0) ? 1 : byte_alignment;
+//    const int aligned_width = (width + 7) & ~7;
+//    const int aligned_height = (height + 7) & ~7;
+//    const int y_stride = ((aligned_width + 2 * border) + 31) & ~31;
+//    const uint64_t yplane_size =
+//        (aligned_height + 2 * border) * (uint64_t)y_stride + byte_alignment;
+//    const int uv_width = aligned_width >> ss_x;
+//    const int uv_height = aligned_height >> ss_y;
+//    const int uv_stride = y_stride >> ss_x;
+//    const int uv_border_w = border >> ss_x;
+//    const int uv_border_h = border >> ss_y;
+//    const uint64_t uvplane_size =
+//        (uv_height + 2 * uv_border_h) * (uint64_t)uv_stride + byte_alignment;
+//
+//    const uint64_t frame_size =
+//        (1 + use_highbitdepth) * (yplane_size + 2 * uvplane_size);
+//
+//    uint8_t *buf = NULL;
+//
+//#if defined AOM_MAX_ALLOCABLE_MEMORY
+//    // The size of ybf->buffer_alloc.
+//    uint64_t alloc_size = frame_size;
+//    // The size of ybf->y_buffer_8bit.
+//    if (use_highbitdepth) alloc_size += yplane_size;
+//    // The decoder may allocate REF_FRAMES frame buffers in the frame buffer
+//    // pool. Bound the total amount of allocated memory as if these REF_FRAMES
+//    // frame buffers were allocated in a single allocation.
+//    if (alloc_size > AOM_MAX_ALLOCABLE_MEMORY / REF_FRAMES) return -1;
+//#endif
+//
+//    if (cb != NULL) {
+//      const int align_addr_extra_size = 31;
+//      const uint64_t external_frame_size = frame_size + align_addr_extra_size;
+//
+//      assert(fb != NULL);
+//
+//      if (external_frame_size != (size_t)external_frame_size) return -1;
+//
+//      // Allocation to hold larger frame, or first allocation.
+//      if (cb(cb_priv, (size_t)external_frame_size, fb) < 0) return -1;
+//
+//      if (fb->data == NULL || fb->size < external_frame_size) return -1;
+//
+//      ybf->buffer_alloc = (uint8_t *)yv12_align_addr(fb->data, 32);
+//
+//#if defined(__has_feature)
+//#if __has_feature(memory_sanitizer)
+//      // This memset is needed for fixing the issue of using uninitialized
+//      // value in msan test. It will cause a perf loss, so only do this for
+//      // msan test.
+//      memset(ybf->buffer_alloc, 0, (size_t)frame_size);
+//#endif
+//#endif
+//    } else if (frame_size > ybf->buffer_alloc_sz) {
+//      // Allocation to hold larger frame, or first allocation.
+//      aom_free(ybf->buffer_alloc);
+//      ybf->buffer_alloc = NULL;
+//      ybf->buffer_alloc_sz = 0;
+//
+//      if (frame_size != (size_t)frame_size) return -1;
+//
+//      ybf->buffer_alloc = (uint8_t *)aom_memalign(32, (size_t)frame_size);
+//      if (!ybf->buffer_alloc) return -1;
+//
+//      ybf->buffer_alloc_sz = (size_t)frame_size;
+//
+//      // This memset is needed for fixing valgrind error from C loop filter
+//      // due to access uninitialized memory in frame border. It could be
+//      // removed if border is totally removed.
+//      memset(ybf->buffer_alloc, 0, ybf->buffer_alloc_sz);
+//    }
+//
+//    ybf->y_crop_width = width;
+//    ybf->y_crop_height = height;
+//    ybf->y_width = aligned_width;
+//    ybf->y_height = aligned_height;
+//    ybf->y_stride = y_stride;
+//
+//    ybf->uv_crop_width = (width + ss_x) >> ss_x;
+//    ybf->uv_crop_height = (height + ss_y) >> ss_y;
+//    ybf->uv_width = uv_width;
+//    ybf->uv_height = uv_height;
+//    ybf->uv_stride = uv_stride;
+//
+//    ybf->border = border;
+//    ybf->frame_size = (size_t)frame_size;
+//    ybf->subsampling_x = ss_x;
+//    ybf->subsampling_y = ss_y;
+//
+//    buf = ybf->buffer_alloc;
+//    if (use_highbitdepth) {
+//      // Store uint16 addresses when using 16bit framebuffers
+//      buf = CONVERT_TO_BYTEPTR(ybf->buffer_alloc);
+//      ybf->flags = YV12_FLAG_HIGHBITDEPTH;
+//    } else {
+//      ybf->flags = 0;
+//    }
+//
+//    ybf->y_buffer = (uint8_t *)yv12_align_addr(
+//        buf + (border * y_stride) + border, aom_byte_align);
+//    ybf->u_buffer = (uint8_t *)yv12_align_addr(
+//        buf + yplane_size + (uv_border_h * uv_stride) + uv_border_w,
+//        aom_byte_align);
+//    ybf->v_buffer =
+//        (uint8_t *)yv12_align_addr(buf + yplane_size + uvplane_size +
+//                                       (uv_border_h * uv_stride) + uv_border_w,
+//                                   aom_byte_align);
+//
+//    ybf->use_external_reference_buffers = 0;
+//
+//    if (use_highbitdepth) {
+//      if (ybf->y_buffer_8bit) aom_free(ybf->y_buffer_8bit);
+//      ybf->y_buffer_8bit = (uint8_t *)aom_memalign(32, (size_t)yplane_size);
+//      if (!ybf->y_buffer_8bit) return -1;
+//    } else {
+//      if (ybf->y_buffer_8bit) {
+//        aom_free(ybf->y_buffer_8bit);
+//        ybf->y_buffer_8bit = NULL;
+//        ybf->buf_8bit_valid = 0;
+//      }
+//    }
+//
+//    ybf->corrupted = 0; /* assume not corrupted by errors */
+//    return 0;
+//  }
+  return -2;
+}
+
+int aom_alloc_frame_buffer(YV12_BUFFER_CONFIG *ybf, int width, int height,
+                           int ss_x, int ss_y, int use_highbitdepth, int border,
+                           int byte_alignment) {
+  if (ybf) {
+    aom_free_frame_buffer(ybf);
+    return aom_realloc_frame_buffer(ybf, width, height, ss_x, ss_y,
+                                    use_highbitdepth, border, byte_alignment,
+                                    NULL, NULL, NULL);
+  }
+  return -2;
+}

diff --git a/libav1/aom_scale/generic/yv12extend.c b/libav1/aom_scale/generic/yv12extend.c
new file mode 100644
index 0000000..127ca23
--- /dev/null
+++ b/libav1/aom_scale/generic/yv12extend.c

@@ -0,0 +1,436 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "config/aom_config.h"
+#include "config/aom_scale_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/mem.h"
+#include "aom_scale/yv12config.h"
+
+static void extend_plane(uint8_t *const src, int src_stride, int width,
+                         int height, int extend_top, int extend_left,
+                         int extend_bottom, int extend_right) {
+  int i;
+  const int linesize = extend_left + extend_right + width;
+
+  /* copy the left and right most columns out */
+  uint8_t *src_ptr1 = src;
+  uint8_t *src_ptr2 = src + width - 1;
+  uint8_t *dst_ptr1 = src - extend_left;
+  uint8_t *dst_ptr2 = src + width;
+
+  for (i = 0; i < height; ++i) {
+    memset(dst_ptr1, src_ptr1[0], extend_left);
+    memset(dst_ptr2, src_ptr2[0], extend_right);
+    src_ptr1 += src_stride;
+    src_ptr2 += src_stride;
+    dst_ptr1 += src_stride;
+    dst_ptr2 += src_stride;
+  }
+
+  /* Now copy the top and bottom lines into each line of the respective
+   * borders
+   */
+  src_ptr1 = src - extend_left;
+  src_ptr2 = src + src_stride * (height - 1) - extend_left;
+  dst_ptr1 = src + src_stride * -extend_top - extend_left;
+  dst_ptr2 = src + src_stride * height - extend_left;
+
+  for (i = 0; i < extend_top; ++i) {
+    memcpy(dst_ptr1, src_ptr1, linesize);
+    dst_ptr1 += src_stride;
+  }
+
+  for (i = 0; i < extend_bottom; ++i) {
+    memcpy(dst_ptr2, src_ptr2, linesize);
+    dst_ptr2 += src_stride;
+  }
+}
+
+static void extend_plane_high(uint8_t *const src8, int src_stride, int width,
+                              int height, int extend_top, int extend_left,
+                              int extend_bottom, int extend_right) {
+  int i;
+  const int linesize = extend_left + extend_right + width;
+  uint16_t *src = CONVERT_TO_SHORTPTR(src8);
+
+  /* copy the left and right most columns out */
+  uint16_t *src_ptr1 = src;
+  uint16_t *src_ptr2 = src + width - 1;
+  uint16_t *dst_ptr1 = src - extend_left;
+  uint16_t *dst_ptr2 = src + width;
+
+  for (i = 0; i < height; ++i) {
+    aom_memset16(dst_ptr1, src_ptr1[0], extend_left);
+    aom_memset16(dst_ptr2, src_ptr2[0], extend_right);
+    src_ptr1 += src_stride;
+    src_ptr2 += src_stride;
+    dst_ptr1 += src_stride;
+    dst_ptr2 += src_stride;
+  }
+
+  /* Now copy the top and bottom lines into each line of the respective
+   * borders
+   */
+  src_ptr1 = src - extend_left;
+  src_ptr2 = src + src_stride * (height - 1) - extend_left;
+  dst_ptr1 = src + src_stride * -extend_top - extend_left;
+  dst_ptr2 = src + src_stride * height - extend_left;
+
+  for (i = 0; i < extend_top; ++i) {
+    memcpy(dst_ptr1, src_ptr1, linesize * sizeof(uint16_t));
+    dst_ptr1 += src_stride;
+  }
+
+  for (i = 0; i < extend_bottom; ++i) {
+    memcpy(dst_ptr2, src_ptr2, linesize * sizeof(uint16_t));
+    dst_ptr2 += src_stride;
+  }
+}
+
+void aom_yv12_extend_frame_borders_c(YV12_BUFFER_CONFIG *ybf,
+                                     const int num_planes) {
+  assert(ybf->border % 2 == 0);
+  assert(ybf->y_height - ybf->y_crop_height < 16);
+  assert(ybf->y_width - ybf->y_crop_width < 16);
+  assert(ybf->y_height - ybf->y_crop_height >= 0);
+  assert(ybf->y_width - ybf->y_crop_width >= 0);
+
+  if (ybf->flags & YV12_FLAG_HIGHBITDEPTH) {
+    for (int plane = 0; plane < num_planes; ++plane) {
+      const int is_uv = plane > 0;
+      const int plane_border = ybf->border >> is_uv;
+      extend_plane_high(
+          ybf->buffers[plane], ybf->strides[is_uv], ybf->crop_widths[is_uv],
+          ybf->crop_heights[is_uv], plane_border, plane_border,
+          plane_border + ybf->heights[is_uv] - ybf->crop_heights[is_uv],
+          plane_border + ybf->widths[is_uv] - ybf->crop_widths[is_uv]);
+    }
+    return;
+  }
+  for (int plane = 0; plane < num_planes; ++plane) {
+    const int is_uv = plane > 0;
+    const int plane_border = ybf->border >> is_uv;
+    extend_plane(ybf->buffers[plane], ybf->strides[is_uv],
+                 ybf->crop_widths[is_uv], ybf->crop_heights[is_uv],
+                 plane_border, plane_border,
+                 plane_border + ybf->heights[is_uv] - ybf->crop_heights[is_uv],
+                 plane_border + ybf->widths[is_uv] - ybf->crop_widths[is_uv]);
+  }
+}
+
+static void extend_frame(YV12_BUFFER_CONFIG *const ybf, int ext_size,
+                         const int num_planes) {
+  const int ss_x = ybf->uv_width < ybf->y_width;
+  const int ss_y = ybf->uv_height < ybf->y_height;
+
+  assert(ybf->y_height - ybf->y_crop_height < 16);
+  assert(ybf->y_width - ybf->y_crop_width < 16);
+  assert(ybf->y_height - ybf->y_crop_height >= 0);
+  assert(ybf->y_width - ybf->y_crop_width >= 0);
+
+  if (ybf->flags & YV12_FLAG_HIGHBITDEPTH) {
+    for (int plane = 0; plane < num_planes; ++plane) {
+      const int is_uv = plane > 0;
+      const int top = ext_size >> (is_uv ? ss_y : 0);
+      const int left = ext_size >> (is_uv ? ss_x : 0);
+      const int bottom = top + ybf->heights[is_uv] - ybf->crop_heights[is_uv];
+      const int right = left + ybf->widths[is_uv] - ybf->crop_widths[is_uv];
+      extend_plane_high(ybf->buffers[plane], ybf->strides[is_uv],
+                        ybf->crop_widths[is_uv], ybf->crop_heights[is_uv], top,
+                        left, bottom, right);
+    }
+    return;
+  }
+  for (int plane = 0; plane < num_planes; ++plane) {
+    const int is_uv = plane > 0;
+    const int top = ext_size >> (is_uv ? ss_y : 0);
+    const int left = ext_size >> (is_uv ? ss_x : 0);
+    const int bottom = top + ybf->heights[is_uv] - ybf->crop_heights[is_uv];
+    const int right = left + ybf->widths[is_uv] - ybf->crop_widths[is_uv];
+    extend_plane(ybf->buffers[plane], ybf->strides[is_uv],
+                 ybf->crop_widths[is_uv], ybf->crop_heights[is_uv], top, left,
+                 bottom, right);
+  }
+}
+
+void aom_extend_frame_borders_c(YV12_BUFFER_CONFIG *ybf, const int num_planes) {
+  extend_frame(ybf, ybf->border, num_planes);
+}
+
+void aom_extend_frame_inner_borders_c(YV12_BUFFER_CONFIG *ybf,
+                                      const int num_planes) {
+  const int inner_bw = (ybf->border > AOMINNERBORDERINPIXELS)
+                           ? AOMINNERBORDERINPIXELS
+                           : ybf->border;
+  extend_frame(ybf, inner_bw, num_planes);
+}
+
+void aom_extend_frame_borders_y_c(YV12_BUFFER_CONFIG *ybf) {
+  int ext_size = ybf->border;
+  assert(ybf->y_height - ybf->y_crop_height < 16);
+  assert(ybf->y_width - ybf->y_crop_width < 16);
+  assert(ybf->y_height - ybf->y_crop_height >= 0);
+  assert(ybf->y_width - ybf->y_crop_width >= 0);
+
+  if (ybf->flags & YV12_FLAG_HIGHBITDEPTH) {
+    extend_plane_high(ybf->y_buffer, ybf->y_stride, ybf->y_crop_width,
+                      ybf->y_crop_height, ext_size, ext_size,
+                      ext_size + ybf->y_height - ybf->y_crop_height,
+                      ext_size + ybf->y_width - ybf->y_crop_width);
+    return;
+  }
+  extend_plane(ybf->y_buffer, ybf->y_stride, ybf->y_crop_width,
+               ybf->y_crop_height, ext_size, ext_size,
+               ext_size + ybf->y_height - ybf->y_crop_height,
+               ext_size + ybf->y_width - ybf->y_crop_width);
+}
+
+static void memcpy_short_addr(uint8_t *dst8, const uint8_t *src8, int num) {
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
+  uint16_t *src = CONVERT_TO_SHORTPTR(src8);
+  memcpy(dst, src, num * sizeof(uint16_t));
+}
+
+// Copies the source image into the destination image and updates the
+// destination's UMV borders.
+// Note: The frames are assumed to be identical in size.
+void aom_yv12_copy_frame_c(const YV12_BUFFER_CONFIG *src_bc,
+                           YV12_BUFFER_CONFIG *dst_bc, const int num_planes) {
+#if 0
+  /* These assertions are valid in the codec, but the libaom-tester uses
+   * this code slightly differently.
+   */
+  assert(src_bc->y_width == dst_bc->y_width);
+  assert(src_bc->y_height == dst_bc->y_height);
+#endif
+
+  assert((src_bc->flags & YV12_FLAG_HIGHBITDEPTH) ==
+         (dst_bc->flags & YV12_FLAG_HIGHBITDEPTH));
+
+  if (src_bc->flags & YV12_FLAG_HIGHBITDEPTH) {
+    for (int plane = 0; plane < num_planes; ++plane) {
+      const uint8_t *plane_src = src_bc->buffers[plane];
+      uint8_t *plane_dst = dst_bc->buffers[plane];
+      const int is_uv = plane > 0;
+
+      for (int row = 0; row < src_bc->heights[is_uv]; ++row) {
+        memcpy_short_addr(plane_dst, plane_src, src_bc->widths[is_uv]);
+        plane_src += src_bc->strides[is_uv];
+        plane_dst += dst_bc->strides[is_uv];
+      }
+    }
+    aom_yv12_extend_frame_borders_c(dst_bc, num_planes);
+    return;
+  }
+  for (int plane = 0; plane < num_planes; ++plane) {
+    const uint8_t *plane_src = src_bc->buffers[plane];
+    uint8_t *plane_dst = dst_bc->buffers[plane];
+    const int is_uv = plane > 0;
+
+    for (int row = 0; row < src_bc->heights[is_uv]; ++row) {
+      memcpy(plane_dst, plane_src, src_bc->widths[is_uv]);
+      plane_src += src_bc->strides[is_uv];
+      plane_dst += dst_bc->strides[is_uv];
+    }
+  }
+  aom_yv12_extend_frame_borders_c(dst_bc, num_planes);
+}
+
+void aom_yv12_copy_y_c(const YV12_BUFFER_CONFIG *src_ybc,
+                       YV12_BUFFER_CONFIG *dst_ybc) {
+  int row;
+  const uint8_t *src = src_ybc->y_buffer;
+  uint8_t *dst = dst_ybc->y_buffer;
+
+  if (src_ybc->flags & YV12_FLAG_HIGHBITDEPTH) {
+    const uint16_t *src16 = CONVERT_TO_SHORTPTR(src);
+    uint16_t *dst16 = CONVERT_TO_SHORTPTR(dst);
+    for (row = 0; row < src_ybc->y_height; ++row) {
+      memcpy(dst16, src16, src_ybc->y_width * sizeof(uint16_t));
+      src16 += src_ybc->y_stride;
+      dst16 += dst_ybc->y_stride;
+    }
+    return;
+  }
+
+  for (row = 0; row < src_ybc->y_height; ++row) {
+    memcpy(dst, src, src_ybc->y_width);
+    src += src_ybc->y_stride;
+    dst += dst_ybc->y_stride;
+  }
+}
+
+void aom_yv12_copy_u_c(const YV12_BUFFER_CONFIG *src_bc,
+                       YV12_BUFFER_CONFIG *dst_bc) {
+  int row;
+  const uint8_t *src = src_bc->u_buffer;
+  uint8_t *dst = dst_bc->u_buffer;
+
+  if (src_bc->flags & YV12_FLAG_HIGHBITDEPTH) {
+    const uint16_t *src16 = CONVERT_TO_SHORTPTR(src);
+    uint16_t *dst16 = CONVERT_TO_SHORTPTR(dst);
+    for (row = 0; row < src_bc->uv_height; ++row) {
+      memcpy(dst16, src16, src_bc->uv_width * sizeof(uint16_t));
+      src16 += src_bc->uv_stride;
+      dst16 += dst_bc->uv_stride;
+    }
+    return;
+  }
+
+  for (row = 0; row < src_bc->uv_height; ++row) {
+    memcpy(dst, src, src_bc->uv_width);
+    src += src_bc->uv_stride;
+    dst += dst_bc->uv_stride;
+  }
+}
+
+void aom_yv12_copy_v_c(const YV12_BUFFER_CONFIG *src_bc,
+                       YV12_BUFFER_CONFIG *dst_bc) {
+  int row;
+  const uint8_t *src = src_bc->v_buffer;
+  uint8_t *dst = dst_bc->v_buffer;
+
+  if (src_bc->flags & YV12_FLAG_HIGHBITDEPTH) {
+    const uint16_t *src16 = CONVERT_TO_SHORTPTR(src);
+    uint16_t *dst16 = CONVERT_TO_SHORTPTR(dst);
+    for (row = 0; row < src_bc->uv_height; ++row) {
+      memcpy(dst16, src16, src_bc->uv_width * sizeof(uint16_t));
+      src16 += src_bc->uv_stride;
+      dst16 += dst_bc->uv_stride;
+    }
+    return;
+  }
+
+  for (row = 0; row < src_bc->uv_height; ++row) {
+    memcpy(dst, src, src_bc->uv_width);
+    src += src_bc->uv_stride;
+    dst += dst_bc->uv_stride;
+  }
+}
+
+void aom_yv12_partial_copy_y_c(const YV12_BUFFER_CONFIG *src_ybc, int hstart1,
+                               int hend1, int vstart1, int vend1,
+                               YV12_BUFFER_CONFIG *dst_ybc, int hstart2,
+                               int vstart2) {
+  int row;
+  const uint8_t *src = src_ybc->y_buffer;
+  uint8_t *dst = dst_ybc->y_buffer;
+
+  if (src_ybc->flags & YV12_FLAG_HIGHBITDEPTH) {
+    const uint16_t *src16 =
+        CONVERT_TO_SHORTPTR(src + vstart1 * src_ybc->y_stride + hstart1);
+    uint16_t *dst16 =
+        CONVERT_TO_SHORTPTR(dst + vstart2 * dst_ybc->y_stride + hstart2);
+
+    for (row = vstart1; row < vend1; ++row) {
+      memcpy(dst16, src16, (hend1 - hstart1) * sizeof(uint16_t));
+      src16 += src_ybc->y_stride;
+      dst16 += dst_ybc->y_stride;
+    }
+    return;
+  }
+  src = (src + vstart1 * src_ybc->y_stride + hstart1);
+  dst = (dst + vstart2 * dst_ybc->y_stride + hstart2);
+
+  for (row = vstart1; row < vend1; ++row) {
+    memcpy(dst, src, (hend1 - hstart1));
+    src += src_ybc->y_stride;
+    dst += dst_ybc->y_stride;
+  }
+}
+
+void aom_yv12_partial_coloc_copy_y_c(const YV12_BUFFER_CONFIG *src_ybc,
+                                     YV12_BUFFER_CONFIG *dst_ybc, int hstart,
+                                     int hend, int vstart, int vend) {
+  aom_yv12_partial_copy_y_c(src_ybc, hstart, hend, vstart, vend, dst_ybc,
+                            hstart, vstart);
+}
+
+void aom_yv12_partial_copy_u_c(const YV12_BUFFER_CONFIG *src_bc, int hstart1,
+                               int hend1, int vstart1, int vend1,
+                               YV12_BUFFER_CONFIG *dst_bc, int hstart2,
+                               int vstart2) {
+  int row;
+  const uint8_t *src = src_bc->u_buffer;
+  uint8_t *dst = dst_bc->u_buffer;
+
+  if (src_bc->flags & YV12_FLAG_HIGHBITDEPTH) {
+    const uint16_t *src16 =
+        CONVERT_TO_SHORTPTR(src + vstart1 * src_bc->uv_stride + hstart1);
+    uint16_t *dst16 =
+        CONVERT_TO_SHORTPTR(dst + vstart2 * dst_bc->uv_stride + hstart2);
+    for (row = vstart1; row < vend1; ++row) {
+      memcpy(dst16, src16, (hend1 - hstart1) * sizeof(uint16_t));
+      src16 += src_bc->uv_stride;
+      dst16 += dst_bc->uv_stride;
+    }
+    return;
+  }
+
+  src = (src + vstart1 * src_bc->uv_stride + hstart1);
+  dst = (dst + vstart2 * dst_bc->uv_stride + hstart2);
+
+  for (row = vstart1; row < vend1; ++row) {
+    memcpy(dst, src, (hend1 - hstart1));
+    src += src_bc->uv_stride;
+    dst += dst_bc->uv_stride;
+  }
+}
+
+void aom_yv12_partial_coloc_copy_u_c(const YV12_BUFFER_CONFIG *src_bc,
+                                     YV12_BUFFER_CONFIG *dst_bc, int hstart,
+                                     int hend, int vstart, int vend) {
+  aom_yv12_partial_copy_u_c(src_bc, hstart, hend, vstart, vend, dst_bc, hstart,
+                            vstart);
+}
+
+void aom_yv12_partial_copy_v_c(const YV12_BUFFER_CONFIG *src_bc, int hstart1,
+                               int hend1, int vstart1, int vend1,
+                               YV12_BUFFER_CONFIG *dst_bc, int hstart2,
+                               int vstart2) {
+  int row;
+  const uint8_t *src = src_bc->v_buffer;
+  uint8_t *dst = dst_bc->v_buffer;
+
+  if (src_bc->flags & YV12_FLAG_HIGHBITDEPTH) {
+    const uint16_t *src16 =
+        CONVERT_TO_SHORTPTR(src + vstart1 * src_bc->uv_stride + hstart1);
+    uint16_t *dst16 =
+        CONVERT_TO_SHORTPTR(dst + vstart2 * dst_bc->uv_stride + hstart2);
+    for (row = vstart1; row < vend1; ++row) {
+      memcpy(dst16, src16, (hend1 - hstart1) * sizeof(uint16_t));
+      src16 += src_bc->uv_stride;
+      dst16 += dst_bc->uv_stride;
+    }
+    return;
+  }
+
+  src = (src + vstart1 * src_bc->uv_stride + hstart1);
+  dst = (dst + vstart2 * dst_bc->uv_stride + hstart2);
+
+  for (row = vstart1; row < vend1; ++row) {
+    memcpy(dst, src, (hend1 - hstart1));
+    src += src_bc->uv_stride;
+    dst += dst_bc->uv_stride;
+  }
+}
+
+void aom_yv12_partial_coloc_copy_v_c(const YV12_BUFFER_CONFIG *src_bc,
+                                     YV12_BUFFER_CONFIG *dst_bc, int hstart,
+                                     int hend, int vstart, int vend) {
+  aom_yv12_partial_copy_v_c(src_bc, hstart, hend, vstart, vend, dst_bc, hstart,
+                            vstart);
+}

diff --git a/libav1/aom_scale/yv12config.h b/libav1/aom_scale/yv12config.h
new file mode 100644
index 0000000..48f824b
--- /dev/null
+++ b/libav1/aom_scale/yv12config.h

@@ -0,0 +1,193 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_SCALE_YV12CONFIG_H_
+#define AOM_AOM_SCALE_YV12CONFIG_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "config/aom_config.h"
+
+#include "aom/aom_codec.h"
+#include "aom/aom_frame_buffer.h"
+#include "aom/aom_integer.h"
+
+#define AOMINNERBORDERINPIXELS 160
+#define AOM_INTERP_EXTEND 4
+#define AOM_BORDER_IN_PIXELS 288
+#define AOM_ENC_NO_SCALE_BORDER 160
+#define AOM_DEC_BORDER_IN_PIXELS 64
+
+typedef struct 
+{
+    int stride;
+    int offset;
+    int res_stride;
+    int res_offset;
+} PlaneInfo;
+
+typedef struct
+{
+    int stride;
+    int offset;
+    int width;
+    int height;
+} RefPlaneInfo;
+
+typedef struct
+{
+    //int8_t * pool_ptr;
+    //int8_t * fb_ptr;
+    void * mvs_alloc;
+    uint8_t * seg_alloc;
+    size_t base_offset;
+    size_t size;
+    PlaneInfo planes[3];
+    int width;
+    int height;
+    int hbd;
+    int y_crop_width;
+    int y_crop_height;
+    int uv_crop_width;
+    int uv_crop_height;
+
+    int frame_number;
+    int ref_cnt;
+} HwFrameBuffer;
+
+typedef struct
+{
+    uint8_t * fb_ptr;
+    size_t size;
+
+    PlaneInfo planes[3];
+    int frame_number;
+    int y_crop_width;
+    int y_crop_height;
+    int uv_crop_width;
+    int uv_crop_height;
+    int hbd;
+    int monochrome;
+    void * user_priv;
+    void * alloc_priv;
+
+    void * hw_buf;
+    int is_valid;
+} HwOutputImage;
+
+typedef struct yv12_buffer_config {
+  union {
+    struct {
+      int y_width;
+      int uv_width;
+    };
+    int widths[2];
+  };
+  union {
+    struct {
+      int y_height;
+      int uv_height;
+    };
+    int heights[2];
+  };
+  union {
+    struct {
+      int y_crop_width;
+      int uv_crop_width;
+    };
+    int crop_widths[2];
+  };
+  union {
+    struct {
+      int y_crop_height;
+      int uv_crop_height;
+    };
+    int crop_heights[2];
+  };
+  union {
+    struct {
+      int y_stride;
+      int uv_stride;
+    };
+    int strides[2];
+  };
+  union {
+    struct {
+      uint8_t *y_buffer;
+      uint8_t *u_buffer;
+      uint8_t *v_buffer;
+    };
+    uint8_t *buffers[3];
+  };
+
+  // Indicate whether y_buffer, u_buffer, and v_buffer points to the internally
+  // allocated memory or external buffers.
+  int use_external_reference_buffers;
+  // This is needed to store y_buffer, u_buffer, and v_buffer when set reference
+  // uses an external refernece, and restore those buffer pointers after the
+  // external reference frame is no longer used.
+  uint8_t *store_buf_adr[3];
+
+  // If the frame is stored in a 16-bit buffer, this stores an 8-bit version
+  // for use in global motion detection. It is allocated on-demand.
+  uint8_t *y_buffer_8bit;
+  int buf_8bit_valid;
+
+  uint8_t *buffer_alloc;
+  size_t buffer_alloc_sz;
+  int border;
+  size_t frame_size;
+  int subsampling_x;
+  int subsampling_y;
+  unsigned int bit_depth;
+  aom_color_primaries_t color_primaries;
+  aom_transfer_characteristics_t transfer_characteristics;
+  aom_matrix_coefficients_t matrix_coefficients;
+  uint8_t monochrome;
+  aom_chroma_sample_position_t chroma_sample_position;
+  aom_color_range_t color_range;
+  int render_width;
+  int render_height;
+
+  int corrupted;
+  int flags;
+
+  HwFrameBuffer * hw_buffer;
+  HwOutputImage * hw_show_image;
+} YV12_BUFFER_CONFIG;
+
+#define YV12_FLAG_HIGHBITDEPTH 8
+
+int aom_alloc_frame_buffer(YV12_BUFFER_CONFIG *ybf, int width, int height,
+                           int ss_x, int ss_y, int use_highbitdepth, int border,
+                           int byte_alignment);
+
+// Updates the yv12 buffer config with the frame buffer. |byte_alignment| must
+// be a power of 2, from 32 to 1024. 0 sets legacy alignment. If cb is not
+// NULL, then libaom is using the frame buffer callbacks to handle memory.
+// If cb is not NULL, libaom will call cb with minimum size in bytes needed
+// to decode the current frame. If cb is NULL, libaom will allocate memory
+// internally to decode the current frame. Returns 0 on success. Returns < 0
+// on failure.
+int aom_realloc_frame_buffer(YV12_BUFFER_CONFIG *ybf, int width, int height,
+                             int ss_x, int ss_y, int use_highbitdepth,
+                             int border, int byte_alignment,
+                             aom_codec_frame_buffer_t *fb,
+                             aom_get_frame_buffer_cb_fn_t cb, void *cb_priv);
+int aom_free_frame_buffer(YV12_BUFFER_CONFIG *ybf);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  // AOM_AOM_SCALE_YV12CONFIG_H_

diff --git a/libav1/aom_util/aom_thread.c b/libav1/aom_util/aom_thread.c
new file mode 100644
index 0000000..9547b1e
--- /dev/null
+++ b/libav1/aom_util/aom_thread.c

@@ -0,0 +1,194 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+//
+// Multi-threaded worker
+//
+// Original source:
+//  https://chromium.googlesource.com/webm/libwebp
+
+// Enable GNU extensions in glibc so that we can call pthread_setname_np().
+// This must be before any #include statements.
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+
+#include <assert.h>
+#include <string.h>  // for memset()
+
+#include "aom_mem/aom_mem.h"
+#include "aom_util/aom_thread.h"
+
+#if CONFIG_MULTITHREAD
+
+//------------------------------------------------------------------------------
+
+static void execute(AVxWorker *const worker);  // Forward declaration.
+
+static THREADFN thread_loop(void *ptr) {
+  AVxWorker *const worker = (AVxWorker *)ptr;
+#ifdef __APPLE__
+  if (worker->thread_name != NULL) {
+    // Apple's version of pthread_setname_np takes one argument and operates on
+    // the current thread only. The maximum size of the thread_name buffer was
+    // noted in the Chromium source code and was confirmed by experiments. If
+    // thread_name is too long, pthread_setname_np returns -1 with errno
+    // ENAMETOOLONG (63).
+    char thread_name[64];
+    strncpy(thread_name, worker->thread_name, sizeof(thread_name));
+    thread_name[sizeof(thread_name) - 1] = '\0';
+    pthread_setname_np(thread_name);
+  }
+#elif defined(__GLIBC__) || defined(__BIONIC__)
+  if (worker->thread_name != NULL) {
+    // Linux and Android require names (with nul) fit in 16 chars, otherwise
+    // pthread_setname_np() returns ERANGE (34).
+    char thread_name[16];
+    strncpy(thread_name, worker->thread_name, sizeof(thread_name));
+    thread_name[sizeof(thread_name) - 1] = '\0';
+    pthread_setname_np(pthread_self(), thread_name);
+  }
+#endif
+  int done = 0;
+  while (!done) {
+    pthread_mutex_lock(&worker->impl_.mutex_);
+    while (worker->status_ == OK) {  // wait in idling mode
+      pthread_cond_wait(&worker->impl_.condition_, &worker->impl_.mutex_);
+    }
+    if (worker->status_ == WORK) {
+      execute(worker);
+      worker->status_ = OK;
+    } else if (worker->status_ == NOT_OK) {  // finish the worker
+      done = 1;
+    }
+    // signal to the main thread that we're done (for sync())
+    pthread_cond_signal(&worker->impl_.condition_);
+    pthread_mutex_unlock(&worker->impl_.mutex_);
+  }
+  return THREAD_RETURN(NULL);  // Thread is finished
+}
+
+// main thread state control
+static void change_state(AVxWorker *const worker, AVxWorkerStatus new_status) {
+  // No-op when attempting to change state on a thread that didn't come up.
+  // Checking status_ without acquiring the lock first would result in a data
+  // race.
+  pthread_mutex_lock(&worker->impl_.mutex_);
+  if (worker->status_ >= OK) {
+    // wait for the worker to finish
+    while (worker->status_ != OK) {
+      pthread_cond_wait(&worker->impl_.condition_, &worker->impl_.mutex_);
+    }
+    // assign new status and release the working thread if needed
+    if (new_status != OK) {
+      worker->status_ = new_status;
+      pthread_cond_signal(&worker->impl_.condition_);
+    }
+  }
+  pthread_mutex_unlock(&worker->impl_.mutex_);
+}
+
+#endif  // CONFIG_MULTITHREAD
+
+//------------------------------------------------------------------------------
+
+static void init(AVxWorker *const worker) {
+  memset(worker, 0, sizeof(*worker));
+  worker->status_ = NOT_OK;
+}
+
+static int sync(AVxWorker *const worker) {
+  if (worker->status_ != NOT_OK)
+    change_state(worker, OK);
+  assert(worker->status_ <= OK);
+  return !worker->had_error;
+}
+
+static int reset(AVxWorker *const worker) {
+  int ok = 1;
+  worker->had_error = 0;
+  if (worker->status_ < OK) {
+#if CONFIG_MULTITHREAD
+    if (pthread_mutex_init(&worker->impl_.mutex_, NULL)) {
+      goto Error;
+    }
+    if (pthread_cond_init(&worker->impl_.condition_, NULL)) {
+      pthread_mutex_destroy(&worker->impl_.mutex_);
+      goto Error;
+    }
+    pthread_mutex_lock(&worker->impl_.mutex_);
+    ok = !pthread_create(&worker->impl_.thread_, NULL, thread_loop, worker);
+    if (ok) worker->status_ = OK;
+    pthread_mutex_unlock(&worker->impl_.mutex_);
+    if (!ok) {
+      pthread_mutex_destroy(&worker->impl_.mutex_);
+      pthread_cond_destroy(&worker->impl_.condition_);
+    Error:
+      return 0;
+    }
+#else
+    worker->status_ = OK;
+#endif
+  } else if (worker->status_ > OK) {
+    ok = sync(worker);
+  }
+  assert(!ok || (worker->status_ == OK));
+  return ok;
+}
+
+static void execute(AVxWorker *const worker) {
+  if (worker->hook != NULL) {
+    worker->had_error |= !worker->hook(worker->data1, worker->data2);
+  }
+}
+
+static void launch(AVxWorker *const worker) {
+#if CONFIG_MULTITHREAD
+  change_state(worker, WORK);
+#else
+  execute(worker);
+#endif
+}
+
+static void end(AVxWorker *const worker) {
+#if CONFIG_MULTITHREAD
+  if (worker->status_ != NOT_OK)
+    change_state(worker, NOT_OK);
+  pthread_join(worker->impl_.thread_, NULL);
+  pthread_mutex_destroy(&worker->impl_.mutex_);
+  pthread_cond_destroy(&worker->impl_.condition_);
+#else
+  worker->status_ = NOT_OK;
+  assert(worker->impl_ == NULL);
+#endif
+  assert(worker->status_ == NOT_OK);
+}
+
+//------------------------------------------------------------------------------
+
+static AVxWorkerInterface g_worker_interface = { init,   reset,   sync,
+                                                 launch, execute, end };
+
+int aom_set_worker_interface(const AVxWorkerInterface *const winterface) {
+  if (winterface == NULL || winterface->init == NULL ||
+      winterface->reset == NULL || winterface->sync == NULL ||
+      winterface->launch == NULL || winterface->execute == NULL ||
+      winterface->end == NULL) {
+    return 0;
+  }
+  g_worker_interface = *winterface;
+  return 1;
+}
+
+const AVxWorkerInterface *aom_get_worker_interface(void) {
+  return &g_worker_interface;
+}
+
+//------------------------------------------------------------------------------

diff --git a/libav1/aom_util/aom_thread.h b/libav1/aom_util/aom_thread.h
new file mode 100644
index 0000000..3d8d63c
--- /dev/null
+++ b/libav1/aom_util/aom_thread.h

@@ -0,0 +1,439 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+//
+// Multi-threaded worker
+//
+// Original source:
+//  https://chromium.googlesource.com/webm/libwebp
+
+#ifndef AOM_AOM_UTIL_AOM_THREAD_H_
+#define AOM_AOM_UTIL_AOM_THREAD_H_
+
+#include "config/aom_config.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Set maximum decode threads to be 8 due to the limit of frame buffers
+// and not enough semaphores in the emulation layer on windows.
+#define MAX_DECODE_THREADS 8
+#define MAX_NUM_THREADS 64
+
+#if CONFIG_MULTITHREAD
+
+#if defined(_WIN32) && !HAVE_PTHREAD_H
+#include <errno.h>    // NOLINT
+#include <process.h>  // NOLINT
+#include <windows.h>  // NOLINT
+typedef HANDLE pthread_t;
+typedef CRITICAL_SECTION pthread_mutex_t;
+
+#if _WIN32_WINNT >= 0x0600  // Windows Vista / Server 2008 or greater
+#define USE_WINDOWS_CONDITION_VARIABLE
+typedef CONDITION_VARIABLE pthread_cond_t;
+#else
+typedef struct {
+  HANDLE waiting_sem_;
+  HANDLE received_sem_;
+  HANDLE signal_event_;
+} pthread_cond_t;
+#endif  // _WIN32_WINNT >= 0x600
+
+#ifndef WINAPI_FAMILY_PARTITION
+#define WINAPI_PARTITION_DESKTOP 1
+#define WINAPI_FAMILY_PARTITION(x) x
+#endif
+
+#if 1//!WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
+#define USE_CREATE_THREAD
+#endif
+
+//------------------------------------------------------------------------------
+// simplistic pthread emulation layer
+
+// _beginthreadex requires __stdcall
+#define THREADFN unsigned int __stdcall
+#define THREAD_RETURN(val) (unsigned int)((DWORD_PTR)val)
+
+#if _WIN32_WINNT >= 0x0501  // Windows XP or greater
+#define WaitForSingleObject(obj, timeout) \
+  WaitForSingleObjectEx(obj, timeout, FALSE /*bAlertable*/)
+#endif
+
+static INLINE int pthread_create(pthread_t *const thread, const void *attr,
+    LPTHREAD_START_ROUTINE start,
+                                 void *arg) {
+  (void)attr;
+#ifdef USE_CREATE_THREAD
+  *thread = CreateThread(NULL,          /* lpThreadAttributes */
+                         0,             /* dwStackSize */
+                         start, arg, 0, /* dwStackSize */
+                         NULL);         /* lpThreadId */
+#else
+  *thread = (pthread_t)_beginthreadex(NULL,          /* void *security */
+                                      0,             /* unsigned stack_size */
+                                      start, arg, 0, /* unsigned initflag */
+                                      NULL);         /* unsigned *thrdaddr */
+#endif
+  if (*thread == NULL) return 1;
+  SetThreadPriority(*thread, THREAD_PRIORITY_ABOVE_NORMAL);
+  return 0;
+}
+
+static INLINE int pthread_join(pthread_t thread, void **value_ptr) {
+  (void)value_ptr;
+  return (WaitForSingleObject(thread, INFINITE) != WAIT_OBJECT_0 ||
+          CloseHandle(thread) == 0);
+}
+
+// Mutex
+static INLINE int pthread_mutex_init(pthread_mutex_t *const mutex,
+                                     void *mutexattr) {
+  (void)mutexattr;
+#if _WIN32_WINNT >= 0x0600  // Windows Vista / Server 2008 or greater
+  InitializeCriticalSectionEx(mutex, 0 /*dwSpinCount*/, 0 /*Flags*/);
+#else
+  InitializeCriticalSection(mutex);
+#endif
+  return 0;
+}
+
+static INLINE int pthread_mutex_trylock(pthread_mutex_t *const mutex) {
+  return TryEnterCriticalSection(mutex) ? 0 : EBUSY;
+}
+
+static INLINE int pthread_mutex_lock(pthread_mutex_t *const mutex) {
+  EnterCriticalSection(mutex);
+  return 0;
+}
+
+static INLINE int pthread_mutex_unlock(pthread_mutex_t *const mutex) {
+  LeaveCriticalSection(mutex);
+  return 0;
+}
+
+static INLINE int pthread_mutex_destroy(pthread_mutex_t *const mutex) {
+  DeleteCriticalSection(mutex);
+  return 0;
+}
+
+// Condition
+static INLINE int pthread_cond_destroy(pthread_cond_t *const condition) {
+  int ok = 1;
+#ifdef USE_WINDOWS_CONDITION_VARIABLE
+  (void)condition;
+#else
+  ok &= (CloseHandle(condition->waiting_sem_) != 0);
+  ok &= (CloseHandle(condition->received_sem_) != 0);
+  ok &= (CloseHandle(condition->signal_event_) != 0);
+#endif
+  return !ok;
+}
+
+static INLINE int pthread_cond_init(pthread_cond_t *const condition,
+                                    void *cond_attr) {
+  (void)cond_attr;
+#ifdef USE_WINDOWS_CONDITION_VARIABLE
+  InitializeConditionVariable(condition);
+#else
+  condition->waiting_sem_ = CreateSemaphore(NULL, 0, MAX_DECODE_THREADS, NULL);
+  condition->received_sem_ = CreateSemaphore(NULL, 0, MAX_DECODE_THREADS, NULL);
+  condition->signal_event_ = CreateEvent(NULL, FALSE, FALSE, NULL);
+  if (condition->waiting_sem_ == NULL || condition->received_sem_ == NULL ||
+      condition->signal_event_ == NULL) {
+    pthread_cond_destroy(condition);
+    return 1;
+  }
+#endif
+  return 0;
+}
+
+static INLINE int pthread_cond_signal(pthread_cond_t *const condition) {
+  int ok = 1;
+#ifdef USE_WINDOWS_CONDITION_VARIABLE
+  WakeConditionVariable(condition);
+#else
+  if (WaitForSingleObject(condition->waiting_sem_, 0) == WAIT_OBJECT_0) {
+    // a thread is waiting in pthread_cond_wait: allow it to be notified
+    ok = SetEvent(condition->signal_event_);
+    // wait until the event is consumed so the signaler cannot consume
+    // the event via its own pthread_cond_wait.
+    ok &= (WaitForSingleObject(condition->received_sem_, INFINITE) !=
+           WAIT_OBJECT_0);
+  }
+#endif
+  return !ok;
+}
+
+static INLINE int pthread_cond_broadcast(pthread_cond_t *const condition) {
+  int ok = 1;
+#ifdef USE_WINDOWS_CONDITION_VARIABLE
+  WakeAllConditionVariable(condition);
+#else
+  while (WaitForSingleObject(condition->waiting_sem_, 0) == WAIT_OBJECT_0) {
+    // a thread is waiting in pthread_cond_wait: allow it to be notified
+    ok &= SetEvent(condition->signal_event_);
+    // wait until the event is consumed so the signaler cannot consume
+    // the event via its own pthread_cond_wait.
+    ok &= (WaitForSingleObject(condition->received_sem_, INFINITE) !=
+           WAIT_OBJECT_0);
+  }
+#endif
+  return !ok;
+}
+
+static INLINE int pthread_cond_wait(pthread_cond_t *const condition,
+                                    pthread_mutex_t *const mutex) {
+  int ok;
+#ifdef USE_WINDOWS_CONDITION_VARIABLE
+  ok = SleepConditionVariableCS(condition, mutex, INFINITE);
+#else
+  // note that there is a consumer available so the signal isn't dropped in
+  // pthread_cond_signal
+  if (!ReleaseSemaphore(condition->waiting_sem_, 1, NULL)) return 1;
+  // now unlock the mutex so pthread_cond_signal may be issued
+  pthread_mutex_unlock(mutex);
+  ok = (WaitForSingleObject(condition->signal_event_, INFINITE) ==
+        WAIT_OBJECT_0);
+  ok &= ReleaseSemaphore(condition->received_sem_, 1, NULL);
+  pthread_mutex_lock(mutex);
+#endif
+  return !ok;
+}
+#elif defined(__OS2__)
+#define INCL_DOS
+#include <os2.h>  // NOLINT
+
+#include <errno.h>        // NOLINT
+#include <stdlib.h>       // NOLINT
+#include <sys/builtin.h>  // NOLINT
+
+#define pthread_t TID
+#define pthread_mutex_t HMTX
+
+typedef struct {
+  HEV event_sem_;
+  HEV ack_sem_;
+  volatile unsigned wait_count_;
+} pthread_cond_t;
+
+//------------------------------------------------------------------------------
+// simplistic pthread emulation layer
+
+#define THREADFN void *
+#define THREAD_RETURN(val) (val)
+
+typedef struct {
+  void *(*start_)(void *);
+  void *arg_;
+} thread_arg;
+
+static void thread_start(void *arg) {
+  thread_arg targ = *(thread_arg *)arg;
+  free(arg);
+
+  targ.start_(targ.arg_);
+}
+
+static INLINE int pthread_create(pthread_t *const thread, const void *attr,
+                                 void *(*start)(void *), void *arg) {
+  int tid;
+  thread_arg *targ = (thread_arg *)malloc(sizeof(*targ));
+  if (targ == NULL) return 1;
+
+  (void)attr;
+
+  targ->start_ = start;
+  targ->arg_ = arg;
+  tid = (pthread_t)_beginthread(thread_start, NULL, 1024 * 1024, targ);
+  if (tid == -1) {
+    free(targ);
+    return 1;
+  }
+
+  *thread = tid;
+  return 0;
+}
+
+static INLINE int pthread_join(pthread_t thread, void **value_ptr) {
+  (void)value_ptr;
+  return DosWaitThread(&thread, DCWW_WAIT) != 0;
+}
+
+// Mutex
+static INLINE int pthread_mutex_init(pthread_mutex_t *const mutex,
+                                     void *mutexattr) {
+  (void)mutexattr;
+  return DosCreateMutexSem(NULL, mutex, 0, FALSE) != 0;
+}
+
+static INLINE int pthread_mutex_trylock(pthread_mutex_t *const mutex) {
+  return DosRequestMutexSem(*mutex, SEM_IMMEDIATE_RETURN) == 0 ? 0 : EBUSY;
+}
+
+static INLINE int pthread_mutex_lock(pthread_mutex_t *const mutex) {
+  return DosRequestMutexSem(*mutex, SEM_INDEFINITE_WAIT) != 0;
+}
+
+static INLINE int pthread_mutex_unlock(pthread_mutex_t *const mutex) {
+  return DosReleaseMutexSem(*mutex) != 0;
+}
+
+static INLINE int pthread_mutex_destroy(pthread_mutex_t *const mutex) {
+  return DosCloseMutexSem(*mutex) != 0;
+}
+
+// Condition
+static INLINE int pthread_cond_destroy(pthread_cond_t *const condition) {
+  int ok = 1;
+  ok &= DosCloseEventSem(condition->event_sem_) == 0;
+  ok &= DosCloseEventSem(condition->ack_sem_) == 0;
+  return !ok;
+}
+
+static INLINE int pthread_cond_init(pthread_cond_t *const condition,
+                                    void *cond_attr) {
+  int ok = 1;
+  (void)cond_attr;
+
+  ok &=
+      DosCreateEventSem(NULL, &condition->event_sem_, DCE_POSTONE, FALSE) == 0;
+  ok &= DosCreateEventSem(NULL, &condition->ack_sem_, DCE_POSTONE, FALSE) == 0;
+  if (!ok) {
+    pthread_cond_destroy(condition);
+    return 1;
+  }
+  condition->wait_count_ = 0;
+  return 0;
+}
+
+static INLINE int pthread_cond_signal(pthread_cond_t *const condition) {
+  int ok = 1;
+
+  if (!__atomic_cmpxchg32(&condition->wait_count_, 0, 0)) {
+    ok &= DosPostEventSem(condition->event_sem_) == 0;
+    ok &= DosWaitEventSem(condition->ack_sem_, SEM_INDEFINITE_WAIT) == 0;
+  }
+
+  return !ok;
+}
+
+static INLINE int pthread_cond_broadcast(pthread_cond_t *const condition) {
+  int ok = 1;
+
+  while (!__atomic_cmpxchg32(&condition->wait_count_, 0, 0))
+    ok &= pthread_cond_signal(condition) == 0;
+
+  return !ok;
+}
+
+static INLINE int pthread_cond_wait(pthread_cond_t *const condition,
+                                    pthread_mutex_t *const mutex) {
+  int ok = 1;
+
+  __atomic_increment(&condition->wait_count_);
+
+  ok &= pthread_mutex_unlock(mutex) == 0;
+
+  ok &= DosWaitEventSem(condition->event_sem_, SEM_INDEFINITE_WAIT) == 0;
+
+  __atomic_decrement(&condition->wait_count_);
+
+  ok &= DosPostEventSem(condition->ack_sem_) == 0;
+
+  pthread_mutex_lock(mutex);
+
+  return !ok;
+}
+#else                 // _WIN32
+#include <pthread.h>  // NOLINT
+#define THREADFN void *
+#define THREAD_RETURN(val) val
+#endif
+
+#endif  // CONFIG_MULTITHREAD
+
+// State of the worker thread object
+typedef enum {
+  NOT_OK = 0,  // object is unusable
+  OK,          // ready to work
+  WORK         // busy finishing the current task
+} AVxWorkerStatus;
+
+// Function to be called by the worker thread. Takes two opaque pointers as
+// arguments (data1 and data2). Should return true on success and return false
+// in case of error.
+typedef int (*AVxWorkerHook)(void *, void *);
+
+// Platform-dependent implementation details for the worker.
+typedef struct {
+    pthread_mutex_t mutex_;
+    pthread_cond_t condition_;
+    pthread_t thread_;
+} AVxWorkerImpl;
+
+// Synchronization object used to launch job in the worker thread
+typedef struct {
+  AVxWorkerImpl impl_;
+  AVxWorkerStatus status_;
+  // Thread name for the debugger. If not NULL, must point to a string that
+  // outlives the worker thread. For portability, use a name <= 15 characters
+  // long (not including the terminating NUL character).
+  const char *thread_name;
+  AVxWorkerHook hook;  // hook to call
+  void *data1;         // first argument passed to 'hook'
+  void *data2;         // second argument passed to 'hook'
+  int had_error;       // true if a call to 'hook' returned false
+} AVxWorker;
+
+// The interface for all thread-worker related functions. All these functions
+// must be implemented.
+typedef struct {
+  // Must be called first, before any other method.
+  void (*init)(AVxWorker *const worker);
+  // Must be called to initialize the object and spawn the thread. Re-entrant.
+  // Will potentially launch the thread. Returns false in case of error.
+  int (*reset)(AVxWorker *const worker);
+  // Makes sure the previous work is finished. Returns true if worker->had_error
+  // was not set and no error condition was triggered by the working thread.
+  int (*sync)(AVxWorker *const worker);
+  // Triggers the thread to call hook() with data1 and data2 arguments. These
+  // hook/data1/data2 values can be changed at any time before calling this
+  // function, but not be changed afterward until the next call to Sync().
+  void (*launch)(AVxWorker *const worker);
+  // This function is similar to launch() except that it calls the
+  // hook directly instead of using a thread. Convenient to bypass the thread
+  // mechanism while still using the AVxWorker structs. sync() must
+  // still be called afterward (for error reporting).
+  void (*execute)(AVxWorker *const worker);
+  // Kill the thread and terminate the object. To use the object again, one
+  // must call reset() again.
+  void (*end)(AVxWorker *const worker);
+} AVxWorkerInterface;
+
+// Install a new set of threading functions, overriding the defaults. This
+// should be done before any workers are started, i.e., before any encoding or
+// decoding takes place. The contents of the interface struct are copied, it
+// is safe to free the corresponding memory after this call. This function is
+// not thread-safe. Return false in case of invalid pointer or methods.
+int aom_set_worker_interface(const AVxWorkerInterface *const winterface);
+
+// Retrieve the currently set thread worker interface.
+const AVxWorkerInterface *aom_get_worker_interface(void);
+
+//------------------------------------------------------------------------------
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_UTIL_AOM_THREAD_H_

diff --git a/libav1/aom_util/debug_util.c b/libav1/aom_util/debug_util.c
new file mode 100644
index 0000000..468c47e
--- /dev/null
+++ b/libav1/aom_util/debug_util.c

@@ -0,0 +1,275 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+#include "aom_util/debug_util.h"
+
+static int frame_idx_w = 0;
+
+static int frame_idx_r = 0;
+
+void bitstream_queue_set_frame_write(int frame_idx) { frame_idx_w = frame_idx; }
+
+int bitstream_queue_get_frame_write(void) { return frame_idx_w; }
+
+void bitstream_queue_set_frame_read(int frame_idx) { frame_idx_r = frame_idx; }
+
+int bitstream_queue_get_frame_read(void) { return frame_idx_r; }
+
+#if CONFIG_BITSTREAM_DEBUG
+#define QUEUE_MAX_SIZE 2000000
+static int result_queue[QUEUE_MAX_SIZE];
+static int nsymbs_queue[QUEUE_MAX_SIZE];
+static aom_cdf_prob cdf_queue[QUEUE_MAX_SIZE][16];
+
+static int queue_r = 0;
+static int queue_w = 0;
+static int queue_prev_w = -1;
+static int skip_r = 0;
+static int skip_w = 0;
+
+void bitstream_queue_set_skip_write(int skip) { skip_w = skip; }
+
+void bitstream_queue_set_skip_read(int skip) { skip_r = skip; }
+
+void bitstream_queue_record_write(void) { queue_prev_w = queue_w; }
+
+void bitstream_queue_reset_write(void) { queue_w = queue_prev_w; }
+
+int bitstream_queue_get_write(void) { return queue_w; }
+
+int bitstream_queue_get_read(void) { return queue_r; }
+
+void bitstream_queue_pop(int *result, aom_cdf_prob *cdf, int *nsymbs) {
+  if (!skip_r) {
+    if (queue_w == queue_r) {
+      printf("buffer underflow queue_w %d queue_r %d\n", queue_w, queue_r);
+      assert(0);
+    }
+    *result = result_queue[queue_r];
+    *nsymbs = nsymbs_queue[queue_r];
+    memcpy(cdf, cdf_queue[queue_r], *nsymbs * sizeof(*cdf));
+    queue_r = (queue_r + 1) % QUEUE_MAX_SIZE;
+  }
+}
+
+void bitstream_queue_push(int result, const aom_cdf_prob *cdf, int nsymbs) {
+  if (!skip_w) {
+    result_queue[queue_w] = result;
+    nsymbs_queue[queue_w] = nsymbs;
+    memcpy(cdf_queue[queue_w], cdf, nsymbs * sizeof(*cdf));
+    queue_w = (queue_w + 1) % QUEUE_MAX_SIZE;
+    if (queue_w == queue_r) {
+      printf("buffer overflow queue_w %d queue_r %d\n", queue_w, queue_r);
+      assert(0);
+    }
+  }
+}
+#endif  // CONFIG_BITSTREAM_DEBUG
+
+#if CONFIG_MISMATCH_DEBUG
+static int frame_buf_idx_r = 0;
+static int frame_buf_idx_w = 0;
+static int max_frame_buf_num = 5;
+#define MAX_FRAME_STRIDE 1280
+#define MAX_FRAME_HEIGHT 720
+static uint16_t
+    frame_pre[5][3][MAX_FRAME_STRIDE * MAX_FRAME_HEIGHT];  // prediction only
+static uint16_t
+    frame_tx[5][3][MAX_FRAME_STRIDE * MAX_FRAME_HEIGHT];  // prediction + txfm
+static int frame_stride = MAX_FRAME_STRIDE;
+static int frame_height = MAX_FRAME_HEIGHT;
+static int frame_size = MAX_FRAME_STRIDE * MAX_FRAME_HEIGHT;
+void mismatch_move_frame_idx_w() {
+  frame_buf_idx_w = (frame_buf_idx_w + 1) % max_frame_buf_num;
+  if (frame_buf_idx_w == frame_buf_idx_r) {
+    printf("frame_buf overflow\n");
+    assert(0);
+  }
+}
+
+void mismatch_reset_frame(int num_planes) {
+  for (int plane = 0; plane < num_planes; ++plane) {
+    memset(frame_pre[frame_buf_idx_w][plane], 0,
+           sizeof(frame_pre[frame_buf_idx_w][plane][0]) * frame_size);
+    memset(frame_tx[frame_buf_idx_w][plane], 0,
+           sizeof(frame_tx[frame_buf_idx_w][plane][0]) * frame_size);
+  }
+}
+
+void mismatch_move_frame_idx_r() {
+  if (frame_buf_idx_w == frame_buf_idx_r) {
+    printf("frame_buf underflow\n");
+    assert(0);
+  }
+  frame_buf_idx_r = (frame_buf_idx_r + 1) % max_frame_buf_num;
+}
+
+void mismatch_record_block_pre(const uint8_t *src, int src_stride,
+                               int frame_offset, int plane, int pixel_c,
+                               int pixel_r, int blk_w, int blk_h, int highbd) {
+  if (pixel_c + blk_w >= frame_stride || pixel_r + blk_h >= frame_height) {
+    printf("frame_buf undersized\n");
+    assert(0);
+  }
+
+  const uint16_t *src16 = highbd ? CONVERT_TO_SHORTPTR(src) : NULL;
+  for (int r = 0; r < blk_h; ++r) {
+    for (int c = 0; c < blk_w; ++c) {
+      frame_pre[frame_buf_idx_w][plane]
+               [(r + pixel_r) * frame_stride + c + pixel_c] =
+                   src16 ? src16[r * src_stride + c] : src[r * src_stride + c];
+    }
+  }
+#if 0
+  int ref_frame_idx = 3;
+  int ref_frame_offset = 4;
+  int ref_plane = 1;
+  int ref_pixel_c = 162;
+  int ref_pixel_r = 16;
+  if (frame_idx_w == ref_frame_idx && plane == ref_plane &&
+      frame_offset == ref_frame_offset && ref_pixel_c >= pixel_c &&
+      ref_pixel_c < pixel_c + blk_w && ref_pixel_r >= pixel_r &&
+      ref_pixel_r < pixel_r + blk_h) {
+    printf(
+        "\nrecord_block_pre frame_idx %d frame_offset %d plane %d pixel_c %d pixel_r %d blk_w "
+        "%d blk_h %d\n",
+        frame_idx_w, frame_offset, plane, pixel_c, pixel_r, blk_w, blk_h);
+  }
+#endif
+}
+void mismatch_record_block_tx(const uint8_t *src, int src_stride,
+                              int frame_offset, int plane, int pixel_c,
+                              int pixel_r, int blk_w, int blk_h, int highbd) {
+  if (pixel_c + blk_w >= frame_stride || pixel_r + blk_h >= frame_height) {
+    printf("frame_buf undersized\n");
+    assert(0);
+  }
+
+  const uint16_t *src16 = highbd ? CONVERT_TO_SHORTPTR(src) : NULL;
+  for (int r = 0; r < blk_h; ++r) {
+    for (int c = 0; c < blk_w; ++c) {
+      frame_tx[frame_buf_idx_w][plane]
+              [(r + pixel_r) * frame_stride + c + pixel_c] =
+                  src16 ? src16[r * src_stride + c] : src[r * src_stride + c];
+    }
+  }
+#if 0
+  int ref_frame_idx = 3;
+  int ref_frame_offset = 4;
+  int ref_plane = 1;
+  int ref_pixel_c = 162;
+  int ref_pixel_r = 16;
+  if (frame_idx_w == ref_frame_idx && plane == ref_plane && frame_offset == ref_frame_offset &&
+      ref_pixel_c >= pixel_c && ref_pixel_c < pixel_c + blk_w &&
+      ref_pixel_r >= pixel_r && ref_pixel_r < pixel_r + blk_h) {
+    printf(
+        "\nrecord_block_tx frame_idx %d frame_offset %d plane %d pixel_c %d pixel_r %d blk_w "
+        "%d blk_h %d\n",
+        frame_idx_w, frame_offset, plane, pixel_c, pixel_r, blk_w, blk_h);
+  }
+#endif
+}
+void mismatch_check_block_pre(const uint8_t *src, int src_stride,
+                              int frame_offset, int plane, int pixel_c,
+                              int pixel_r, int blk_w, int blk_h, int highbd) {
+  if (pixel_c + blk_w >= frame_stride || pixel_r + blk_h >= frame_height) {
+    printf("frame_buf undersized\n");
+    assert(0);
+  }
+
+  const uint16_t *src16 = highbd ? CONVERT_TO_SHORTPTR(src) : NULL;
+  int mismatch = 0;
+  for (int r = 0; r < blk_h; ++r) {
+    for (int c = 0; c < blk_w; ++c) {
+      if (frame_pre[frame_buf_idx_r][plane]
+                   [(r + pixel_r) * frame_stride + c + pixel_c] !=
+          (uint16_t)(src16 ? src16[r * src_stride + c]
+                           : src[r * src_stride + c])) {
+        mismatch = 1;
+      }
+    }
+  }
+  if (mismatch) {
+    printf(
+        "\ncheck_block_pre failed frame_idx %d frame_offset %d plane %d "
+        "pixel_c %d pixel_r "
+        "%d blk_w %d blk_h %d\n",
+        frame_idx_r, frame_offset, plane, pixel_c, pixel_r, blk_w, blk_h);
+    printf("enc\n");
+    for (int rr = 0; rr < blk_h; ++rr) {
+      for (int cc = 0; cc < blk_w; ++cc) {
+        printf("%d ", frame_pre[frame_buf_idx_r][plane]
+                               [(rr + pixel_r) * frame_stride + cc + pixel_c]);
+      }
+      printf("\n");
+    }
+
+    printf("dec\n");
+    for (int rr = 0; rr < blk_h; ++rr) {
+      for (int cc = 0; cc < blk_w; ++cc) {
+        printf("%d ",
+               src16 ? src16[rr * src_stride + cc] : src[rr * src_stride + cc]);
+      }
+      printf("\n");
+    }
+    assert(0);
+  }
+}
+void mismatch_check_block_tx(const uint8_t *src, int src_stride,
+                             int frame_offset, int plane, int pixel_c,
+                             int pixel_r, int blk_w, int blk_h, int highbd) {
+  if (pixel_c + blk_w >= frame_stride || pixel_r + blk_h >= frame_height) {
+    printf("frame_buf undersized\n");
+    assert(0);
+  }
+
+  const uint16_t *src16 = highbd ? CONVERT_TO_SHORTPTR(src) : NULL;
+  int mismatch = 0;
+  for (int r = 0; r < blk_h; ++r) {
+    for (int c = 0; c < blk_w; ++c) {
+      if (frame_tx[frame_buf_idx_r][plane]
+                  [(r + pixel_r) * frame_stride + c + pixel_c] !=
+          (uint16_t)(src16 ? src16[r * src_stride + c]
+                           : src[r * src_stride + c])) {
+        mismatch = 1;
+      }
+    }
+  }
+  if (mismatch) {
+    printf(
+        "\ncheck_block_tx failed frame_idx %d frame_offset %d plane %d pixel_c "
+        "%d pixel_r "
+        "%d blk_w %d blk_h %d\n",
+        frame_idx_r, frame_offset, plane, pixel_c, pixel_r, blk_w, blk_h);
+    printf("enc\n");
+    for (int rr = 0; rr < blk_h; ++rr) {
+      for (int cc = 0; cc < blk_w; ++cc) {
+        printf("%d ", frame_tx[frame_buf_idx_r][plane]
+                              [(rr + pixel_r) * frame_stride + cc + pixel_c]);
+      }
+      printf("\n");
+    }
+
+    printf("dec\n");
+    for (int rr = 0; rr < blk_h; ++rr) {
+      for (int cc = 0; cc < blk_w; ++cc) {
+        printf("%d ",
+               src16 ? src16[rr * src_stride + cc] : src[rr * src_stride + cc]);
+      }
+      printf("\n");
+    }
+    assert(0);
+  }
+}
+#endif  // CONFIG_MISMATCH_DEBUG

diff --git a/libav1/aom_util/debug_util.h b/libav1/aom_util/debug_util.h
new file mode 100644
index 0000000..127a8b4
--- /dev/null
+++ b/libav1/aom_util/debug_util.h

@@ -0,0 +1,69 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_UTIL_DEBUG_UTIL_H_
+#define AOM_AOM_UTIL_DEBUG_UTIL_H_
+
+#include "config/aom_config.h"
+
+#include "aom_dsp/prob.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void bitstream_queue_set_frame_write(int frame_idx);
+int bitstream_queue_get_frame_write(void);
+void bitstream_queue_set_frame_read(int frame_idx);
+int bitstream_queue_get_frame_read(void);
+
+#if CONFIG_BITSTREAM_DEBUG
+/* This is a debug tool used to detect bitstream error. On encoder side, it
+ * pushes each bit and probability into a queue before the bit is written into
+ * the Arithmetic coder. On decoder side, whenever a bit is read out from the
+ * Arithmetic coder, it pops out the reference bit and probability from the
+ * queue as well. If the two results do not match, this debug tool will report
+ * an error.  This tool can be used to pin down the bitstream error precisely.
+ * By combining gdb's backtrace method, we can detect which module causes the
+ * bitstream error. */
+int bitstream_queue_get_write(void);
+int bitstream_queue_get_read(void);
+void bitstream_queue_record_write(void);
+void bitstream_queue_reset_write(void);
+void bitstream_queue_pop(int *result, aom_cdf_prob *cdf, int *nsymbs);
+void bitstream_queue_push(int result, const aom_cdf_prob *cdf, int nsymbs);
+void bitstream_queue_set_skip_write(int skip);
+void bitstream_queue_set_skip_read(int skip);
+#endif  // CONFIG_BITSTREAM_DEBUG
+
+#if CONFIG_MISMATCH_DEBUG
+void mismatch_move_frame_idx_w();
+void mismatch_move_frame_idx_r();
+void mismatch_reset_frame(int num_planes);
+void mismatch_record_block_pre(const uint8_t *src, int src_stride,
+                               int frame_offset, int plane, int pixel_c,
+                               int pixel_r, int blk_w, int blk_h, int highbd);
+void mismatch_record_block_tx(const uint8_t *src, int src_stride,
+                              int frame_offset, int plane, int pixel_c,
+                              int pixel_r, int blk_w, int blk_h, int highbd);
+void mismatch_check_block_pre(const uint8_t *src, int src_stride,
+                              int frame_offset, int plane, int pixel_c,
+                              int pixel_r, int blk_w, int blk_h, int highbd);
+void mismatch_check_block_tx(const uint8_t *src, int src_stride,
+                             int frame_offset, int plane, int pixel_c,
+                             int pixel_r, int blk_w, int blk_h, int highbd);
+#endif  // CONFIG_MISMATCH_DEBUG
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AOM_UTIL_DEBUG_UTIL_H_

diff --git a/libav1/aom_util/endian_inl.h b/libav1/aom_util/endian_inl.h
new file mode 100644
index 0000000..f536ec5
--- /dev/null
+++ b/libav1/aom_util/endian_inl.h

@@ -0,0 +1,122 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+//
+// Endian related functions.
+
+#ifndef AOM_AOM_UTIL_ENDIAN_INL_H_
+#define AOM_AOM_UTIL_ENDIAN_INL_H_
+
+#include <stdlib.h>
+
+#include "config/aom_config.h"
+
+#include "aom/aom_integer.h"
+
+#if defined(__GNUC__)
+#define LOCAL_GCC_VERSION ((__GNUC__ << 8) | __GNUC_MINOR__)
+#define LOCAL_GCC_PREREQ(maj, min) (LOCAL_GCC_VERSION >= (((maj) << 8) | (min)))
+#else
+#define LOCAL_GCC_VERSION 0
+#define LOCAL_GCC_PREREQ(maj, min) 0
+#endif
+
+// handle clang compatibility
+#ifndef __has_builtin
+#define __has_builtin(x) 0
+#endif
+
+// some endian fix (e.g.: mips-gcc doesn't define __BIG_ENDIAN__)
+#if !defined(WORDS_BIGENDIAN) &&                   \
+    (defined(__BIG_ENDIAN__) || defined(_M_PPC) || \
+     (defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)))
+#define WORDS_BIGENDIAN
+#endif
+
+#if defined(WORDS_BIGENDIAN)
+#define HToLE32 BSwap32
+#define HToLE16 BSwap16
+#define HToBE64(x) (x)
+#define HToBE32(x) (x)
+#else
+#define HToLE32(x) (x)
+#define HToLE16(x) (x)
+#define HToBE64(X) BSwap64(X)
+#define HToBE32(X) BSwap32(X)
+#endif
+
+#if LOCAL_GCC_PREREQ(4, 8) || __has_builtin(__builtin_bswap16)
+#define HAVE_BUILTIN_BSWAP16
+#endif
+
+#if LOCAL_GCC_PREREQ(4, 3) || __has_builtin(__builtin_bswap32)
+#define HAVE_BUILTIN_BSWAP32
+#endif
+
+#if LOCAL_GCC_PREREQ(4, 3) || __has_builtin(__builtin_bswap64)
+#define HAVE_BUILTIN_BSWAP64
+#endif
+
+#if HAVE_MIPS32 && defined(__mips__) && !defined(__mips64) && \
+    defined(__mips_isa_rev) && (__mips_isa_rev >= 2) && (__mips_isa_rev < 6)
+#define AOM_USE_MIPS32_R2
+#endif
+
+static INLINE uint16_t BSwap16(uint16_t x) {
+#if defined(HAVE_BUILTIN_BSWAP16)
+  return __builtin_bswap16(x);
+#elif defined(_MSC_VER)
+  return _byteswap_ushort(x);
+#else
+  // gcc will recognize a 'rorw $8, ...' here:
+  return (x >> 8) | ((x & 0xff) << 8);
+#endif  // HAVE_BUILTIN_BSWAP16
+}
+
+static INLINE uint32_t BSwap32(uint32_t x) {
+#if defined(AOM_USE_MIPS32_R2)
+  uint32_t ret;
+  __asm__ volatile(
+      "wsbh   %[ret], %[x]          \n\t"
+      "rotr   %[ret], %[ret],  16   \n\t"
+      : [ret] "=r"(ret)
+      : [x] "r"(x));
+  return ret;
+#elif defined(HAVE_BUILTIN_BSWAP32)
+  return __builtin_bswap32(x);
+#elif defined(__i386__) || defined(__x86_64__)
+  uint32_t swapped_bytes;
+  __asm__ volatile("bswap %0" : "=r"(swapped_bytes) : "0"(x));
+  return swapped_bytes;
+#elif defined(_MSC_VER)
+  return (uint32_t)_byteswap_ulong(x);
+#else
+  return (x >> 24) | ((x >> 8) & 0xff00) | ((x << 8) & 0xff0000) | (x << 24);
+#endif  // HAVE_BUILTIN_BSWAP32
+}
+
+static INLINE uint64_t BSwap64(uint64_t x) {
+#if defined(HAVE_BUILTIN_BSWAP64)
+  return __builtin_bswap64(x);
+#elif defined(__x86_64__)
+  uint64_t swapped_bytes;
+  __asm__ volatile("bswapq %0" : "=r"(swapped_bytes) : "0"(x));
+  return swapped_bytes;
+#elif defined(_MSC_VER)
+  return (uint64_t)_byteswap_uint64(x);
+#else   // generic code for swapping 64-bit values (suggested by bdb@)
+  x = ((x & 0xffffffff00000000ull) >> 32) | ((x & 0x00000000ffffffffull) << 32);
+  x = ((x & 0xffff0000ffff0000ull) >> 16) | ((x & 0x0000ffff0000ffffull) << 16);
+  x = ((x & 0xff00ff00ff00ff00ull) >> 8) | ((x & 0x00ff00ff00ff00ffull) << 8);
+  return x;
+#endif  // HAVE_BUILTIN_BSWAP64
+}
+
+#endif  // AOM_AOM_UTIL_ENDIAN_INL_H_

diff --git a/libav1/av1/av1_dx_iface.c b/libav1/av1/av1_dx_iface.c
new file mode 100644
index 0000000..a5d5e0a
--- /dev/null
+++ b/libav1/av1/av1_dx_iface.c

@@ -0,0 +1,781 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+ /*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+
+#include <stdlib.h>
+#include <string.h>
+
+#include "config/aom_config.h"
+#include "config/aom_version.h"
+
+#include "aom/internal/aom_codec_internal.h"
+#include "aom/aomdx.h"
+#include "aom/aom_decoder.h"
+#include "aom_dsp/bitreader_buffer.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_ports/mem_ops.h"
+#include "aom_util/aom_thread.h"
+
+#include "av1/common/alloccommon.h"
+#include "av1/common/frame_buffers.h"
+#include "av1/common/enums.h"
+#include "av1/common/obu_util.h"
+
+#include "av1/decoder/decoder.h"
+#include "av1/decoder/decodeframe.h"
+#include "av1/decoder/obu.h"
+
+#include "av1/av1_iface_common.h"
+#include "dx/av1_core.h"
+
+struct aom_codec_alg_priv {
+  aom_codec_priv_t base;
+  aom_codec_dec_cfg_t cfg;
+  aom_codec_stream_info_t si;
+  int postproc_cfg_set;
+  aom_postproc_cfg_t postproc_cfg;
+  aom_image_t img;
+  int img_avail;
+  int flushed;
+  int invert_tile_order;
+  RefCntBuffer *last_show_frame;  // Last output frame buffer
+  int byte_alignment;
+  int skip_loop_filter;
+  int skip_film_grain;
+  int decode_tile_row;
+  int decode_tile_col;
+  unsigned int tile_mode;
+  unsigned int ext_tile_debug;
+  unsigned int row_mt;
+  EXTERNAL_REFERENCES ext_refs;
+  unsigned int is_annexb;
+  int operating_point;
+  int output_all_layers;
+
+  struct Av1Core * gpu_decoder;
+
+  AV1Decoder * pbi;
+
+  int need_resync;  // wait for key/intra-only frame
+  // BufferPool that holds all reference frames. Shared by all the FrameWorkers.
+  BufferPool *buffer_pool;
+
+  // External frame buffer info to save for AV1 common.
+  void *ext_priv;  // Private data associated with the external frame buffers.
+  aom_get_frame_buffer_cb_fn_t get_ext_fb_cb;
+  aom_release_frame_buffer_cb_fn_t release_ext_fb_cb;
+
+#if CONFIG_INSPECTION
+  aom_inspect_cb inspect_cb;
+  void *inspect_ctx;
+#endif
+};
+
+
+static void set_error_detail(aom_codec_alg_priv_t *ctx,
+    const char *const error) {
+    ctx->base.err_detail = error;
+}
+
+
+static void init_buffer_callbacks(aom_codec_alg_priv_t *ctx) {
+    AV1_COMMON *const cm = &ctx->pbi->common;
+    BufferPool *const pool = cm->buffer_pool;
+
+    cm->cur_frame = NULL;
+    cm->byte_alignment = ctx->byte_alignment;
+    cm->skip_loop_filter = ctx->skip_loop_filter;
+    cm->skip_film_grain = ctx->skip_film_grain;
+
+    pool->get_fb_cb = NULL;
+    pool->release_fb_cb = NULL;
+    pool->release_fb_cb_int = av1_release_fb_callback;
+    pool->cb_priv = ctx->pbi->gpu_decoder;
+}
+
+static void set_default_ppflags(aom_postproc_cfg_t *cfg) {
+    cfg->post_proc_flag = AOM_DEBLOCK | AOM_DEMACROBLOCK;
+    cfg->deblocking_level = 4;
+    cfg->noise_level = 0;
+}
+
+aom_codec_err_t av1_codec_query_memory_requirements(aom_codec_dec_cfg_t *cfg)
+{
+    if (!cfg)
+        return AOM_CODEC_INVALID_PARAM;
+    if (av1_query_memory_requirements(cfg))
+        return AOM_CODEC_UNSUP_FEATURE;
+    return AOM_CODEC_OK;
+}
+
+static aom_codec_err_t decoder_init(aom_codec_ctx_t *ctx,
+                                    aom_codec_priv_enc_mr_cfg_t *data) {
+  // This function only allocates space for the aom_codec_alg_priv_t
+  // structure. More memory may be required at the time the stream
+  // information becomes known.
+  (void)data;
+
+  if (!ctx->priv) {
+    const int alg_priv_size = (sizeof(aom_codec_alg_priv_t) + 31) & ~31;
+    const aom_codec_dec_cfg_t * cfg = ctx->cfg;
+    if (!cfg || !cfg->host_memory || cfg->host_size < alg_priv_size)
+        return AOM_CODEC_MEM_ERROR;
+    aom_codec_alg_priv_t *const priv = (aom_codec_alg_priv_t *)cfg->host_memory;
+    priv->cfg = *cfg;
+    priv->cfg.allow_lowbitdepth = CONFIG_LOWBITDEPTH;
+    ctx->priv = (aom_codec_priv_t *)priv;
+    ctx->priv->init_flags = ctx->init_flags;
+    priv->flushed = 0;
+    ctx->cfg = &priv->cfg;
+    priv->cfg.cfg.ext_partition = 1;
+    priv->row_mt = 0;
+    priv->tile_mode = 0;
+    priv->decode_tile_row = -1;
+    priv->decode_tile_col = -1;
+    priv->last_show_frame = NULL;
+    priv->need_resync = 1;
+    priv->flushed = 0;
+    priv->cfg.host_size -= alg_priv_size;
+    priv->cfg.host_memory = (uint8_t *)priv->cfg.host_memory + alg_priv_size;
+    if (av1_create_gpu_decoder(&priv->gpu_decoder, &priv->cfg))
+        return AOM_CODEC_MEM_ERROR;
+    av1_allocate_pbi(priv->gpu_decoder, &priv->pbi, &priv->buffer_pool);
+#if CONFIG_MULTITHREAD
+    if (pthread_mutex_init(&priv->buffer_pool->pool_mutex, NULL)) {
+        set_error_detail(priv, "Failed to allocate buffer pool mutex");
+        return AOM_CODEC_MEM_ERROR;
+    }
+#endif
+
+
+    if (av1_decoder_create(priv->pbi, priv->buffer_pool))
+    {
+        set_error_detail(priv, "Failed to create decoder");
+        return AOM_CODEC_ERROR;
+    }
+
+    priv->pbi->common.options = &priv->cfg.cfg;
+    priv->pbi->allow_lowbitdepth = priv->cfg.allow_lowbitdepth;
+    priv->pbi->max_threads = AOMMIN(8, priv->cfg.threads);
+    priv->pbi->inv_tile_order = priv->invert_tile_order;
+    priv->pbi->common.large_scale_tile = priv->tile_mode;
+    priv->pbi->common.is_annexb = priv->is_annexb;
+    priv->pbi->dec_tile_row = priv->decode_tile_row;
+    priv->pbi->dec_tile_col = priv->decode_tile_col;
+    priv->pbi->operating_point = priv->operating_point;
+    priv->pbi->ext_tile_debug = priv->ext_tile_debug;
+    priv->pbi->row_mt = priv->row_mt;
+    priv->pbi->gpu_decoder = priv->gpu_decoder;
+
+    // If postprocessing was enabled by the application and a
+    // configuration has not been provided, default it.
+    set_default_ppflags(&priv->postproc_cfg);
+
+    init_buffer_callbacks(priv);
+  }
+
+  return AOM_CODEC_OK;
+}
+
+static aom_codec_err_t decoder_destroy(aom_codec_alg_priv_t *ctx) {
+    if (ctx->pbi)
+    {
+    //    aom_free(ctx->pbi->common.tpl_mvs);
+        ctx->pbi->common.tpl_mvs = NULL;
+        av1_remove_common(&ctx->pbi->common);
+    //    av1_free_restoration_buffers(&ctx->pbi->common);
+        av1_decoder_remove(ctx->pbi);
+    }
+#if CONFIG_MULTITHREAD
+    if (ctx->buffer_pool)
+        pthread_mutex_destroy(&ctx->buffer_pool->pool_mutex);
+#endif
+
+  av1_destroy_gpu_decoder(ctx->gpu_decoder);
+  return AOM_CODEC_OK;
+}
+
+// Parses the operating points (including operating_point_idc, seq_level_idx,
+// and seq_tier) and then sets si->number_spatial_layers and
+// si->number_temporal_layers based on operating_point_idc[0].
+static aom_codec_err_t parse_operating_points(struct aom_read_bit_buffer *rb,
+                                              int is_reduced_header,
+                                              aom_codec_stream_info_t *si) {
+  int operating_point_idc0 = 0;
+
+  if (is_reduced_header) {
+    aom_rb_read_literal(rb, LEVEL_BITS);  // level
+  } else {
+    const uint8_t operating_points_cnt_minus_1 =
+        aom_rb_read_literal(rb, OP_POINTS_CNT_MINUS_1_BITS);
+    for (int i = 0; i < operating_points_cnt_minus_1 + 1; i++) {
+      int operating_point_idc;
+      operating_point_idc = aom_rb_read_literal(rb, OP_POINTS_IDC_BITS);
+      if (i == 0) operating_point_idc0 = operating_point_idc;
+      int seq_level_idx = aom_rb_read_literal(rb, LEVEL_BITS);  // level
+      if (seq_level_idx > 7) aom_rb_read_bit(rb);               // tier
+    }
+  }
+
+  if (aom_get_num_layers_from_operating_point_idc(
+          operating_point_idc0, &si->number_spatial_layers,
+          &si->number_temporal_layers) != AOM_CODEC_OK) {
+    return AOM_CODEC_ERROR;
+  }
+
+  return AOM_CODEC_OK;
+}
+
+static aom_codec_err_t decoder_peek_si_internal(const uint8_t *data,
+                                                size_t data_sz,
+                                                aom_codec_stream_info_t *si,
+                                                int *is_intra_only) {
+  int intra_only_flag = 0;
+  int got_sequence_header = 0;
+  int found_keyframe = 0;
+
+  if (data + data_sz <= data || data_sz < 1) return AOM_CODEC_INVALID_PARAM;
+
+  si->w = 0;
+  si->h = 0;
+  si->is_kf = 0;  // is_kf indicates whether the current packet contains a RAP
+
+  ObuHeader obu_header;
+  memset(&obu_header, 0, sizeof(obu_header));
+  size_t payload_size = 0;
+  size_t bytes_read = 0;
+  uint8_t reduced_still_picture_hdr = 0;
+  aom_codec_err_t status = aom_read_obu_header_and_size(
+      data, data_sz, si->is_annexb, &obu_header, &payload_size, &bytes_read);
+  if (status != AOM_CODEC_OK) return status;
+
+  // If the first OBU is a temporal delimiter, skip over it and look at the next
+  // OBU in the bitstream
+  if (obu_header.type == OBU_TEMPORAL_DELIMITER) {
+    // Skip any associated payload (there shouldn't be one, but just in case)
+    if (data_sz < bytes_read + payload_size) return AOM_CODEC_CORRUPT_FRAME;
+    data += bytes_read + payload_size;
+    data_sz -= bytes_read + payload_size;
+
+    status = aom_read_obu_header_and_size(
+        data, data_sz, si->is_annexb, &obu_header, &payload_size, &bytes_read);
+    if (status != AOM_CODEC_OK) return status;
+  }
+  while (1) {
+    data += bytes_read;
+    data_sz -= bytes_read;
+    if (data_sz < payload_size) return AOM_CODEC_CORRUPT_FRAME;
+    // Check that the selected OBU is a sequence header
+    if (obu_header.type == OBU_SEQUENCE_HEADER) {
+      // Sanity check on sequence header size
+      if (data_sz < 2) return AOM_CODEC_CORRUPT_FRAME;
+      // Read a few values from the sequence header payload
+      struct aom_read_bit_buffer rb = { data, data + data_sz, 0, NULL, NULL };
+
+      av1_read_profile(&rb);  // profile
+      const uint8_t still_picture = aom_rb_read_bit(&rb);
+      reduced_still_picture_hdr = aom_rb_read_bit(&rb);
+
+      if (!still_picture && reduced_still_picture_hdr) {
+        return AOM_CODEC_UNSUP_BITSTREAM;
+      }
+
+      if (parse_operating_points(&rb, reduced_still_picture_hdr, si) !=
+          AOM_CODEC_OK) {
+        return AOM_CODEC_ERROR;
+      }
+
+      int num_bits_width = aom_rb_read_literal(&rb, 4) + 1;
+      int num_bits_height = aom_rb_read_literal(&rb, 4) + 1;
+      int max_frame_width = aom_rb_read_literal(&rb, num_bits_width) + 1;
+      int max_frame_height = aom_rb_read_literal(&rb, num_bits_height) + 1;
+      si->w = max_frame_width;
+      si->h = max_frame_height;
+      got_sequence_header = 1;
+    } else if (obu_header.type == OBU_FRAME_HEADER ||
+               obu_header.type == OBU_FRAME) {
+      if (got_sequence_header && reduced_still_picture_hdr) {
+        found_keyframe = 1;
+        break;
+      } else {
+        // make sure we have enough bits to get the frame type out
+        if (data_sz < 1) return AOM_CODEC_CORRUPT_FRAME;
+        struct aom_read_bit_buffer rb = { data, data + data_sz, 0, NULL, NULL };
+        const int show_existing_frame = aom_rb_read_bit(&rb);
+        if (!show_existing_frame) {
+          const FRAME_TYPE frame_type = (FRAME_TYPE)aom_rb_read_literal(&rb, 2);
+          if (frame_type == KEY_FRAME) {
+            found_keyframe = 1;
+            break;  // Stop here as no further OBUs will change the outcome.
+          }
+        }
+      }
+    }
+    // skip past any unread OBU header data
+    data += payload_size;
+    data_sz -= payload_size;
+    if (data_sz == 0) break;  // exit if we're out of OBUs
+    status = aom_read_obu_header_and_size(
+        data, data_sz, si->is_annexb, &obu_header, &payload_size, &bytes_read);
+    if (status != AOM_CODEC_OK) return status;
+  }
+  if (got_sequence_header && found_keyframe) si->is_kf = 1;
+  if (is_intra_only != NULL) *is_intra_only = intra_only_flag;
+  return AOM_CODEC_OK;
+}
+
+static aom_codec_err_t decoder_peek_si(const uint8_t *data, size_t data_sz,
+                                       aom_codec_stream_info_t *si) {
+  return decoder_peek_si_internal(data, data_sz, si, NULL);
+}
+
+static aom_codec_err_t decoder_get_si(aom_codec_alg_priv_t *ctx,
+                                      aom_codec_stream_info_t *si) {
+  memcpy(si, &ctx->si, sizeof(*si));
+
+  return AOM_CODEC_OK;
+}
+
+
+static aom_codec_err_t update_error_state(
+    aom_codec_alg_priv_t *ctx, const struct aom_internal_error_info *error) {
+  if (error->error_code)
+    set_error_detail(ctx, error->has_detail ? error->detail : NULL);
+
+  return error->error_code;
+}
+
+static INLINE void check_resync(aom_codec_alg_priv_t *const ctx,
+                                const AV1Decoder *const pbi) {
+  // Clear resync flag if worker got a key frame or intra only frame.
+  if (ctx->need_resync == 1 && pbi->need_resync == 0 &&
+      frame_is_intra_only(&pbi->common))
+    ctx->need_resync = 0;
+}
+
+static aom_codec_err_t decode_one(aom_codec_alg_priv_t *ctx,
+                                  const uint8_t **data, size_t data_sz,
+                                  void *user_priv) {
+  // Determine the stream parameters. Note that we rely on peek_si to
+  // validate that we have a buffer that does not wrap around the top
+  // of the heap.
+  if (!ctx->si.h) {
+    int is_intra_only = 0;
+    ctx->si.is_annexb = ctx->is_annexb;
+    const aom_codec_err_t res =
+        decoder_peek_si_internal(*data, data_sz, &ctx->si, &is_intra_only);
+    if (res != AOM_CODEC_OK) return res;
+
+    if (!ctx->si.is_kf && !is_intra_only) return AOM_CODEC_ERROR;
+  }
+
+  AV1Decoder * pbi = ctx->pbi;
+  pbi->data = *data;
+  pbi->data_size = data_sz;
+  pbi->user_priv = user_priv;
+  pbi->common.large_scale_tile = ctx->tile_mode;
+  pbi->dec_tile_row = ctx->decode_tile_row;
+  pbi->dec_tile_col = ctx->decode_tile_col;
+  pbi->ext_tile_debug = ctx->ext_tile_debug;
+  pbi->row_mt = ctx->row_mt;
+  pbi->ext_refs = ctx->ext_refs;
+  pbi->common.is_annexb = ctx->is_annexb;
+
+//  winterface->execute(worker);
+  //const uint8_t *data = frame_worker_data->data;
+  pbi->data_end = pbi->data;
+  int result = av1_receive_compressed_data(pbi, pbi->data_size, &pbi->data_end);
+
+  // Update data pointer after decode.
+  *data = pbi->data_end;
+
+  if (result)
+    return update_error_state(ctx, &pbi->common.error);
+  check_resync(ctx, pbi);
+  return AOM_CODEC_OK;
+}
+
+static aom_codec_err_t decoder_decode(aom_codec_alg_priv_t *ctx,
+                                      const uint8_t *data, size_t data_sz,
+                                      void *user_priv) {
+  aom_codec_err_t res = AOM_CODEC_OK;
+
+  /* Sanity checks */
+  /* NULL data ptr allowed if data_sz is 0 too */
+  if (data == NULL && data_sz == 0) {
+    ctx->flushed = 1;
+    if (ctx->gpu_decoder) {
+        av1_drain_gpu_decoder(ctx->gpu_decoder);
+    }
+    return AOM_CODEC_OK;
+  }
+  if (data == NULL || data_sz == 0) return AOM_CODEC_INVALID_PARAM;
+
+  // Reset flushed when receiving a valid frame.
+  ctx->flushed = 0;
+
+  const uint8_t *data_start = data;
+  const uint8_t *data_end = data + data_sz;
+
+  if (ctx->is_annexb) {
+    // read the size of this temporal unit
+    size_t length_of_size;
+    uint64_t temporal_unit_size;
+    if (aom_uleb_decode(data_start, data_sz, &temporal_unit_size,
+                        &length_of_size) != 0) {
+      return AOM_CODEC_CORRUPT_FRAME;
+    }
+    data_start += length_of_size;
+    if (temporal_unit_size > (size_t)(data_end - data_start))
+      return AOM_CODEC_CORRUPT_FRAME;
+    data_end = data_start + temporal_unit_size;
+  }
+
+  // Decode in serial mode.
+  while (data_start < data_end) {
+    uint64_t frame_size;
+    if (ctx->is_annexb) {
+      // read the size of this frame unit
+      size_t length_of_size;
+      if (aom_uleb_decode(data_start, (size_t)(data_end - data_start),
+                          &frame_size, &length_of_size) != 0) {
+        return AOM_CODEC_CORRUPT_FRAME;
+      }
+      data_start += length_of_size;
+      if (frame_size > (size_t)(data_end - data_start))
+        return AOM_CODEC_CORRUPT_FRAME;
+    } else {
+      frame_size = (uint64_t)(data_end - data_start);
+    }
+
+    res = decode_one(ctx, &data_start, (size_t)frame_size, user_priv);
+    if (res != AOM_CODEC_OK) return res;
+
+    // Allow extra zero bytes after the frame end
+    while (data_start < data_end) {
+      const uint8_t marker = data_start[0];
+      if (marker) break;
+      ++data_start;
+    }
+  }
+
+  return res;
+}
+
+static aom_image_t *decoder_get_frame(aom_codec_alg_priv_t *ctx, aom_codec_iter_t *iter) 
+{
+    aom_image_t *img = NULL;
+    // To avoid having to allocate any extra storage, treat 'iter' as
+    // simply a pointer to an integer index
+    if (ctx->pbi == NULL)
+        return NULL;
+    // NOTE(david.barker): This code does not support multiple worker threads
+    // yet. We should probably move the iteration over threads into *iter
+    // instead of using ctx->next_output_worker_id.
+    AV1Decoder *const pbi = ctx->pbi;
+
+    img = &ctx->img;
+    if (get_output_frame(pbi, img))
+        return NULL;
+    return img;
+}
+
+static aom_codec_err_t decoder_set_fb_fn(
+    aom_codec_alg_priv_t *ctx, aom_get_frame_buffer_cb_fn_t cb_get,
+    aom_release_frame_buffer_cb_fn_t cb_release, void *cb_priv) {
+  return AOM_CODEC_ERROR;
+}
+
+static aom_codec_err_t ctrl_set_reference(aom_codec_alg_priv_t *ctx,
+                                          va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_copy_reference(aom_codec_alg_priv_t *ctx,
+                                           va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_reference(aom_codec_alg_priv_t *ctx,
+                                          va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_new_frame_image(aom_codec_alg_priv_t *ctx,
+                                                va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_copy_new_frame_image(aom_codec_alg_priv_t *ctx,
+                                                 va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_postproc(aom_codec_alg_priv_t *ctx,
+                                         va_list args) {
+  (void)ctx;
+  (void)args;
+  return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_dbg_options(aom_codec_alg_priv_t *ctx,
+                                            va_list args) {
+  (void)ctx;
+  (void)args;
+  return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_last_ref_updates(aom_codec_alg_priv_t *ctx,
+                                                 va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_last_quantizer(aom_codec_alg_priv_t *ctx,
+                                               va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_frame_corrupted(aom_codec_alg_priv_t *ctx,
+                                                va_list args) {
+   return AOM_CODEC_INVALID_PARAM;
+}
+
+static aom_codec_err_t ctrl_get_frame_size(aom_codec_alg_priv_t *ctx,
+                                           va_list args) {
+    return AOM_CODEC_INCAPABLE;
+
+}
+
+static aom_codec_err_t ctrl_get_frame_header_info(aom_codec_alg_priv_t *ctx,
+                                                  va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_tile_data(aom_codec_alg_priv_t *ctx,
+                                          va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_ext_ref_ptr(aom_codec_alg_priv_t *ctx,
+                                            va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_render_size(aom_codec_alg_priv_t *ctx,
+                                            va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_bit_depth(aom_codec_alg_priv_t *ctx,
+                                          va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_img_format(aom_codec_alg_priv_t *ctx,
+                                           va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_tile_size(aom_codec_alg_priv_t *ctx,
+                                          va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_get_tile_count(aom_codec_alg_priv_t *ctx,
+                                           va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_invert_tile_order(aom_codec_alg_priv_t *ctx,
+                                                  va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_byte_alignment(aom_codec_alg_priv_t *ctx,
+                                               va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_skip_loop_filter(aom_codec_alg_priv_t *ctx,
+                                                 va_list args) {
+  ctx->skip_loop_filter = va_arg(args, int);
+  
+  if (ctx->pbi) {
+    ctx->pbi->common.skip_loop_filter = ctx->skip_loop_filter;
+  }
+
+  return AOM_CODEC_OK;
+}
+
+static aom_codec_err_t ctrl_set_skip_film_grain(aom_codec_alg_priv_t *ctx,
+                                                va_list args) {
+  ctx->skip_film_grain = va_arg(args, int);
+
+  if (ctx->pbi) {
+    ctx->pbi->common.skip_film_grain = ctx->skip_film_grain;
+  }
+
+  return AOM_CODEC_OK;
+}
+
+static aom_codec_err_t ctrl_get_accounting(aom_codec_alg_priv_t *ctx,
+                                           va_list args) {
+#if !CONFIG_ACCOUNTING
+  (void)ctx;
+  (void)args;
+  return AOM_CODEC_INCAPABLE;
+#else
+  if (ctx->frame_workers) {
+    AVxWorker *const worker = ctx->frame_workers;
+    FrameWorkerData *const frame_worker_data = (FrameWorkerData *)worker->data1;
+    AV1Decoder *pbi = frame_worker_data->pbi;
+    Accounting **acct = va_arg(args, Accounting **);
+    *acct = &pbi->accounting;
+    return AOM_CODEC_OK;
+  }
+  return AOM_CODEC_ERROR;
+#endif
+}
+static aom_codec_err_t ctrl_set_decode_tile_row(aom_codec_alg_priv_t *ctx,
+                                                va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_decode_tile_col(aom_codec_alg_priv_t *ctx,
+                                                va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_tile_mode(aom_codec_alg_priv_t *ctx,
+                                          va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_is_annexb(aom_codec_alg_priv_t *ctx,
+                                          va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_operating_point(aom_codec_alg_priv_t *ctx,
+                                                va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_output_all_layers(aom_codec_alg_priv_t *ctx,
+                                                  va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_inspection_callback(aom_codec_alg_priv_t *ctx,
+                                                    va_list args) {
+#if !CONFIG_INSPECTION
+  (void)ctx;
+  (void)args;
+  return AOM_CODEC_INCAPABLE;
+#else
+  aom_inspect_init *init = va_arg(args, aom_inspect_init *);
+  ctx->inspect_cb = init->inspect_cb;
+  ctx->inspect_ctx = init->inspect_ctx;
+  return AOM_CODEC_OK;
+#endif
+}
+
+static aom_codec_err_t ctrl_ext_tile_debug(aom_codec_alg_priv_t *ctx,
+                                           va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_err_t ctrl_set_row_mt(aom_codec_alg_priv_t *ctx,
+                                       va_list args) {
+    return AOM_CODEC_INCAPABLE;
+}
+
+static aom_codec_ctrl_fn_map_t decoder_ctrl_maps[] = {
+  { AV1_COPY_REFERENCE, ctrl_copy_reference },
+
+  // Setters
+  { AV1_SET_REFERENCE, ctrl_set_reference },
+  { AOM_SET_POSTPROC, ctrl_set_postproc },
+  { AOM_SET_DBG_COLOR_REF_FRAME, ctrl_set_dbg_options },
+  { AOM_SET_DBG_COLOR_MB_MODES, ctrl_set_dbg_options },
+  { AOM_SET_DBG_COLOR_B_MODES, ctrl_set_dbg_options },
+  { AOM_SET_DBG_DISPLAY_MV, ctrl_set_dbg_options },
+  { AV1_INVERT_TILE_DECODE_ORDER, ctrl_set_invert_tile_order },
+  { AV1_SET_BYTE_ALIGNMENT, ctrl_set_byte_alignment },
+  { AV1_SET_SKIP_LOOP_FILTER, ctrl_set_skip_loop_filter },
+  { AV1_SET_DECODE_TILE_ROW, ctrl_set_decode_tile_row },
+  { AV1_SET_DECODE_TILE_COL, ctrl_set_decode_tile_col },
+  { AV1_SET_TILE_MODE, ctrl_set_tile_mode },
+  { AV1D_SET_IS_ANNEXB, ctrl_set_is_annexb },
+  { AV1D_SET_OPERATING_POINT, ctrl_set_operating_point },
+  { AV1D_SET_OUTPUT_ALL_LAYERS, ctrl_set_output_all_layers },
+  { AV1_SET_INSPECTION_CALLBACK, ctrl_set_inspection_callback },
+  { AV1D_EXT_TILE_DEBUG, ctrl_ext_tile_debug },
+  { AV1D_SET_ROW_MT, ctrl_set_row_mt },
+  { AV1D_SET_EXT_REF_PTR, ctrl_set_ext_ref_ptr },
+  { AV1D_SET_SKIP_FILM_GRAIN, ctrl_set_skip_film_grain },
+
+  // Getters
+  { AOMD_GET_FRAME_CORRUPTED, ctrl_get_frame_corrupted },
+  { AOMD_GET_LAST_QUANTIZER, ctrl_get_last_quantizer },
+  { AOMD_GET_LAST_REF_UPDATES, ctrl_get_last_ref_updates },
+  { AV1D_GET_BIT_DEPTH, ctrl_get_bit_depth },
+  { AV1D_GET_IMG_FORMAT, ctrl_get_img_format },
+  { AV1D_GET_TILE_SIZE, ctrl_get_tile_size },
+  { AV1D_GET_TILE_COUNT, ctrl_get_tile_count },
+  { AV1D_GET_DISPLAY_SIZE, ctrl_get_render_size },
+  { AV1D_GET_FRAME_SIZE, ctrl_get_frame_size },
+  { AV1_GET_ACCOUNTING, ctrl_get_accounting },
+  { AV1_GET_NEW_FRAME_IMAGE, ctrl_get_new_frame_image },
+  { AV1_COPY_NEW_FRAME_IMAGE, ctrl_copy_new_frame_image },
+  { AV1_GET_REFERENCE, ctrl_get_reference },
+  { AV1D_GET_FRAME_HEADER_INFO, ctrl_get_frame_header_info },
+  { AV1D_GET_TILE_DATA, ctrl_get_tile_data },
+
+  { -1, NULL },
+};
+
+#ifndef VERSION_STRING
+#define VERSION_STRING
+#endif
+CODEC_INTERFACE(aom_codec_av1_dx) = {
+  "AOMedia Project AV1 Decoder" VERSION_STRING,
+  AOM_CODEC_INTERNAL_ABI_VERSION,
+  AOM_CODEC_CAP_DECODER |
+      AOM_CODEC_CAP_EXTERNAL_FRAME_BUFFER,  // aom_codec_caps_t
+  decoder_init,                             // aom_codec_init_fn_t
+  decoder_destroy,                          // aom_codec_destroy_fn_t
+  decoder_ctrl_maps,                        // aom_codec_ctrl_fn_map_t
+  {
+      // NOLINT
+      decoder_peek_si,    // aom_codec_peek_si_fn_t
+      decoder_get_si,     // aom_codec_get_si_fn_t
+      decoder_decode,     // aom_codec_decode_fn_t
+      decoder_get_frame,  // aom_codec_get_frame_fn_t
+      decoder_set_fb_fn,  // aom_codec_set_fb_fn_t
+  },
+  {
+      // NOLINT
+      0,
+      NULL,  // aom_codec_enc_cfg_map_t
+      NULL,  // aom_codec_encode_fn_t
+      NULL,  // aom_codec_get_cx_data_fn_t
+      NULL,  // aom_codec_enc_config_set_fn_t
+      NULL,  // aom_codec_get_global_headers_fn_t
+      NULL,  // aom_codec_get_preview_frame_fn_t
+      NULL   // aom_codec_enc_mr_get_mem_loc_fn_t
+  }
+};

diff --git a/libav1/av1/av1_iface_common.h b/libav1/av1/av1_iface_common.h
new file mode 100644
index 0000000..e590060
--- /dev/null
+++ b/libav1/av1/av1_iface_common.h

@@ -0,0 +1,18 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AV1_AV1_IFACE_COMMON_H_
+#define AOM_AV1_AV1_IFACE_COMMON_H_
+
+#include "aom_ports/mem.h"
+#include "aom_scale/yv12config.h"
+
+
+#endif  // AOM_AV1_AV1_IFACE_COMMON_H_

diff --git a/libav1/av1/common/alloccommon.c b/libav1/av1/common/alloccommon.c
new file mode 100644
index 0000000..97b8b0f
--- /dev/null
+++ b/libav1/av1/common/alloccommon.c

@@ -0,0 +1,228 @@
+/*
+ *
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "config/aom_config.h"
+
+#include "aom_mem/aom_mem.h"
+
+#include "av1/common/alloccommon.h"
+#include "av1/common/blockd.h"
+#include "av1/common/entropymode.h"
+#include "av1/common/entropymv.h"
+#include "av1/common/onyxc_int.h"
+
+int av1_get_MBs(int width, int height) {
+  const int aligned_width = ALIGN_POWER_OF_TWO(width, 3);
+  const int aligned_height = ALIGN_POWER_OF_TWO(height, 3);
+  const int mi_cols = aligned_width >> MI_SIZE_LOG2;
+  const int mi_rows = aligned_height >> MI_SIZE_LOG2;
+
+  const int mb_cols = (mi_cols + 2) >> 2;
+  const int mb_rows = (mi_rows + 2) >> 2;
+  return mb_rows * mb_cols;
+}
+
+#if LOOP_FILTER_BITMASK
+static int alloc_loop_filter_mask(AV1_COMMON *cm) {
+  aom_free(cm->lf.lfm);
+  cm->lf.lfm = NULL;
+
+  // Each lfm holds bit masks for all the 4x4 blocks in a max
+  // 64x64 (128x128 for ext_partitions) region.  The stride
+  // and rows are rounded up / truncated to a multiple of 16
+  // (32 for ext_partition).
+  cm->lf.lfm_stride = (cm->mi_cols + (MI_SIZE_64X64 - 1)) >> MIN_MIB_SIZE_LOG2;
+  cm->lf.lfm_num = ((cm->mi_rows + (MI_SIZE_64X64 - 1)) >> MIN_MIB_SIZE_LOG2) *
+                   cm->lf.lfm_stride;
+  cm->lf.lfm =
+      (LoopFilterMask *)aom_calloc(cm->lf.lfm_num, sizeof(*cm->lf.lfm));
+  if (!cm->lf.lfm) return 1;
+
+  unsigned int i;
+  for (i = 0; i < cm->lf.lfm_num; ++i) av1_zero(cm->lf.lfm[i]);
+
+  return 0;
+}
+
+static void free_loop_filter_mask(AV1_COMMON *cm) {
+  if (cm->lf.lfm == NULL) return;
+
+  aom_free(cm->lf.lfm);
+  cm->lf.lfm = NULL;
+  cm->lf.lfm_num = 0;
+  cm->lf.lfm_stride = 0;
+}
+#endif
+
+void av1_set_mb_mi(AV1_COMMON *cm, int width, int height) {
+  // Ensure that the decoded width and height are both multiples of
+  // 8 luma pixels (note: this may only be a multiple of 4 chroma pixels if
+  // subsampling is used).
+  // This simplifies the implementation of various experiments,
+  // eg. cdef, which operates on units of 8x8 luma pixels.
+  const int aligned_width = ALIGN_POWER_OF_TWO(width, 3);
+  const int aligned_height = ALIGN_POWER_OF_TWO(height, 3);
+
+  cm->mi_cols = aligned_width >> MI_SIZE_LOG2;
+  cm->mi_rows = aligned_height >> MI_SIZE_LOG2;
+  cm->mi_stride = calc_mi_size(cm->mi_cols);
+
+  cm->mb_cols = (cm->mi_cols + 2) >> 2;
+  cm->mb_rows = (cm->mi_rows + 2) >> 2;
+  cm->MBs = cm->mb_rows * cm->mb_cols;
+
+#if LOOP_FILTER_BITMASK
+  alloc_loop_filter_mask(cm);
+#endif
+}
+
+void av1_free_ref_frame_buffers(BufferPool *pool) {
+  int i;
+
+  for (i = 0; i < FRAME_BUFFERS; ++i) {
+    if (pool->frame_bufs[i].ref_count > 0 &&
+        pool->frame_bufs[i].raw_frame_buffer.data != NULL) {
+      pool->release_fb_cb(pool->cb_priv, &pool->frame_bufs[i].raw_frame_buffer);
+      pool->frame_bufs[i].raw_frame_buffer.data = NULL;
+      pool->frame_bufs[i].raw_frame_buffer.size = 0;
+      pool->frame_bufs[i].raw_frame_buffer.priv = NULL;
+      pool->frame_bufs[i].ref_count = 0;
+    }
+    aom_free(pool->frame_bufs[i].mvs);
+    pool->frame_bufs[i].mvs = NULL;
+    aom_free(pool->frame_bufs[i].seg_map);
+    pool->frame_bufs[i].seg_map = NULL;
+    aom_free_frame_buffer(&pool->frame_bufs[i].buf);
+  }
+}
+
+// Assumes cm->rst_info[p].restoration_unit_size is already initialized
+void av1_alloc_restoration_buffers(AV1_COMMON *cm) {
+  const int num_planes = av1_num_planes(cm);
+  for (int p = 0; p < num_planes; ++p)
+    av1_alloc_restoration_struct(cm, &cm->rst_info[p], p > 0);
+
+  cm->rst_tmpbuf = NULL;
+  cm->rlbs = NULL;
+  for (int p = 0; p < num_planes; ++p) {
+      cm->rst_info[p].boundaries.stripe_boundary_above = NULL;
+      cm->rst_info[p].boundaries.stripe_boundary_below = NULL;
+  }
+ }
+
+void av1_free_restoration_buffers(AV1_COMMON *cm) {
+  int p;
+  for (p = 0; p < MAX_MB_PLANE; ++p)
+    av1_free_restoration_struct(&cm->rst_info[p]);
+  aom_free(cm->rst_tmpbuf);
+  cm->rst_tmpbuf = NULL;
+  aom_free(cm->rlbs);
+  cm->rlbs = NULL;
+  for (p = 0; p < MAX_MB_PLANE; ++p) {
+    RestorationStripeBoundaries *boundaries = &cm->rst_info[p].boundaries;
+    aom_free(boundaries->stripe_boundary_above);
+    aom_free(boundaries->stripe_boundary_below);
+    boundaries->stripe_boundary_above = NULL;
+    boundaries->stripe_boundary_below = NULL;
+  }
+
+  aom_free_frame_buffer(&cm->rst_frame);
+}
+
+void av1_free_above_context_buffers(AV1_COMMON *cm,
+                                    int num_free_above_contexts) {
+  int i;
+  const int num_planes = cm->num_allocated_above_context_planes;
+
+  for (int tile_row = 0; tile_row < num_free_above_contexts; tile_row++) {
+    for (i = 0; i < num_planes; i++) {
+        if (cm->above_context[i])
+        {
+            aom_free(cm->above_context[i][tile_row]);
+            cm->above_context[i][tile_row] = NULL;
+        }
+    }
+    if (cm->above_seg_context)
+    {
+        aom_free(cm->above_seg_context[tile_row]);
+        cm->above_seg_context[tile_row] = NULL;
+    }
+
+    if (cm->above_txfm_context)
+    {
+        aom_free(cm->above_txfm_context[tile_row]);
+        cm->above_txfm_context[tile_row] = NULL;
+    }
+  }
+  for (i = 0; i < num_planes; i++) {
+    aom_free(cm->above_context[i]);
+    cm->above_context[i] = NULL;
+  }
+  aom_free(cm->above_seg_context);
+  cm->above_seg_context = NULL;
+
+  aom_free(cm->above_txfm_context);
+  cm->above_txfm_context = NULL;
+
+  cm->num_allocated_above_contexts = 0;
+  cm->num_allocated_above_context_mi_col = 0;
+  cm->num_allocated_above_context_planes = 0;
+}
+
+int av1_alloc_above_context_buffers(AV1_COMMON *cm,
+                                    int num_alloc_above_contexts) {
+  const int num_planes = av1_num_planes(cm);
+  int plane_idx;
+  const int aligned_mi_cols =
+      ALIGN_POWER_OF_TWO(cm->mi_cols, MAX_MIB_SIZE_LOG2);
+
+  // Allocate above context buffers
+  cm->num_allocated_above_contexts = num_alloc_above_contexts;
+  cm->num_allocated_above_context_mi_col = aligned_mi_cols;
+  cm->num_allocated_above_context_planes = num_planes;
+  for (plane_idx = 0; plane_idx < num_planes; plane_idx++) {
+    cm->above_context[plane_idx] = (ENTROPY_CONTEXT **)aom_calloc(
+        num_alloc_above_contexts, sizeof(cm->above_context[0]));
+    if (!cm->above_context[plane_idx]) return 1;
+  }
+
+  cm->above_seg_context = (PARTITION_CONTEXT **)aom_calloc(
+      num_alloc_above_contexts, sizeof(cm->above_seg_context));
+  if (!cm->above_seg_context) return 1;
+
+  cm->above_txfm_context = (TXFM_CONTEXT **)aom_calloc(
+      num_alloc_above_contexts, sizeof(cm->above_txfm_context));
+  if (!cm->above_txfm_context) return 1;
+
+  for (int tile_row = 0; tile_row < num_alloc_above_contexts; tile_row++) {
+    for (plane_idx = 0; plane_idx < num_planes; plane_idx++) {
+      cm->above_context[plane_idx][tile_row] = (ENTROPY_CONTEXT *)aom_calloc(
+          aligned_mi_cols, sizeof(*cm->above_context[0][tile_row]));
+      if (!cm->above_context[plane_idx][tile_row]) return 1;
+    }
+
+    cm->above_seg_context[tile_row] = (PARTITION_CONTEXT *)aom_calloc(
+        aligned_mi_cols, sizeof(*cm->above_seg_context[tile_row]));
+    if (!cm->above_seg_context[tile_row]) return 1;
+
+    cm->above_txfm_context[tile_row] = (TXFM_CONTEXT *)aom_calloc(
+        aligned_mi_cols, sizeof(*cm->above_txfm_context[tile_row]));
+    if (!cm->above_txfm_context[tile_row]) return 1;
+  }
+
+  return 0;
+}
+
+void av1_remove_common(AV1_COMMON *cm) {
+  cm->fc = NULL;
+  cm->default_frame_context = NULL;
+}

diff --git a/libav1/av1/common/alloccommon.h b/libav1/av1/common/alloccommon.h
new file mode 100644
index 0000000..9608340
--- /dev/null
+++ b/libav1/av1/common/alloccommon.h

@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_ALLOCCOMMON_H_
+#define AOM_AV1_COMMON_ALLOCCOMMON_H_
+
+#define INVALID_IDX -1  // Invalid buffer index.
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct AV1Common;
+struct BufferPool;
+
+void av1_remove_common(struct AV1Common *cm);
+
+int av1_alloc_above_context_buffers(struct AV1Common *cm,
+                                    int num_alloc_above_contexts);
+void av1_free_above_context_buffers(struct AV1Common *cm,
+                                    int num_free_above_contexts);
+int av1_alloc_context_buffers(struct AV1Common *cm, int width, int height);
+void av1_init_context_buffers(struct AV1Common *cm);
+
+void av1_free_ref_frame_buffers(struct BufferPool *pool);
+void av1_alloc_restoration_buffers(struct AV1Common *cm);
+void av1_free_restoration_buffers(struct AV1Common *cm);
+
+int av1_alloc_state_buffers(struct AV1Common *cm, int width, int height);
+void av1_free_state_buffers(struct AV1Common *cm);
+
+void av1_set_mb_mi(struct AV1Common *cm, int width, int height);
+int av1_get_MBs(int width, int height);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_ALLOCCOMMON_H_

diff --git a/libav1/av1/common/av1_inv_txfm1d.c b/libav1/av1/common/av1_inv_txfm1d.c
new file mode 100644
index 0000000..7ef2d6d
--- /dev/null
+++ b/libav1/av1/common/av1_inv_txfm1d.c

@@ -0,0 +1,1846 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdlib.h>
+#include "av1/common/av1_inv_txfm1d.h"
+#include "av1/common/av1_txfm.h"
+
+// TODO(angiebird): Make 1-d txfm functions static
+//
+
+void av1_idct4_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                   const int8_t *stage_range) {
+  assert(output != input);
+  const int32_t size = 4;
+  const int32_t *cospi = cospi_arr(cos_bit);
+
+  int32_t stage = 0;
+  int32_t *bf0, *bf1;
+  int32_t step[4];
+
+  // stage 0;
+
+  // stage 1;
+  stage++;
+  bf1 = output;
+  bf1[0] = input[0];
+  bf1[1] = input[2];
+  bf1[2] = input[1];
+  bf1[3] = input[3];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 2
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = half_btf(cospi[32], bf0[0], cospi[32], bf0[1], cos_bit);
+  bf1[1] = half_btf(cospi[32], bf0[0], -cospi[32], bf0[1], cos_bit);
+  bf1[2] = half_btf(cospi[48], bf0[2], -cospi[16], bf0[3], cos_bit);
+  bf1[3] = half_btf(cospi[16], bf0[2], cospi[48], bf0[3], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 3
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[3], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[2], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[1] - bf0[2], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[0] - bf0[3], stage_range[stage]);
+}
+
+void av1_idct8_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                   const int8_t *stage_range) {
+  assert(output != input);
+  const int32_t size = 8;
+  const int32_t *cospi = cospi_arr(cos_bit);
+
+  int32_t stage = 0;
+  int32_t *bf0, *bf1;
+  int32_t step[8];
+
+  // stage 0;
+
+  // stage 1;
+  stage++;
+  bf1 = output;
+  bf1[0] = input[0];
+  bf1[1] = input[4];
+  bf1[2] = input[2];
+  bf1[3] = input[6];
+  bf1[4] = input[1];
+  bf1[5] = input[5];
+  bf1[6] = input[3];
+  bf1[7] = input[7];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 2
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = half_btf(cospi[56], bf0[4], -cospi[8], bf0[7], cos_bit);
+  bf1[5] = half_btf(cospi[24], bf0[5], -cospi[40], bf0[6], cos_bit);
+  bf1[6] = half_btf(cospi[40], bf0[5], cospi[24], bf0[6], cos_bit);
+  bf1[7] = half_btf(cospi[8], bf0[4], cospi[56], bf0[7], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 3
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = half_btf(cospi[32], bf0[0], cospi[32], bf0[1], cos_bit);
+  bf1[1] = half_btf(cospi[32], bf0[0], -cospi[32], bf0[1], cos_bit);
+  bf1[2] = half_btf(cospi[48], bf0[2], -cospi[16], bf0[3], cos_bit);
+  bf1[3] = half_btf(cospi[16], bf0[2], cospi[48], bf0[3], cos_bit);
+  bf1[4] = clamp_value(bf0[4] + bf0[5], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[4] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(-bf0[6] + bf0[7], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[6] + bf0[7], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 4
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = clamp_value(bf0[0] + bf0[3], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[2], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[1] - bf0[2], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[0] - bf0[3], stage_range[stage]);
+  bf1[4] = bf0[4];
+  bf1[5] = half_btf(-cospi[32], bf0[5], cospi[32], bf0[6], cos_bit);
+  bf1[6] = half_btf(cospi[32], bf0[5], cospi[32], bf0[6], cos_bit);
+  bf1[7] = bf0[7];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 5
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[7], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[6], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[5], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[4], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[3] - bf0[4], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[2] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[1] - bf0[6], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[0] - bf0[7], stage_range[stage]);
+}
+
+void av1_idct16_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range) {
+  assert(output != input);
+  const int32_t size = 16;
+  const int32_t *cospi = cospi_arr(cos_bit);
+
+  int32_t stage = 0;
+  int32_t *bf0, *bf1;
+  int32_t step[16];
+
+  // stage 0;
+
+  // stage 1;
+  stage++;
+  bf1 = output;
+  bf1[0] = input[0];
+  bf1[1] = input[8];
+  bf1[2] = input[4];
+  bf1[3] = input[12];
+  bf1[4] = input[2];
+  bf1[5] = input[10];
+  bf1[6] = input[6];
+  bf1[7] = input[14];
+  bf1[8] = input[1];
+  bf1[9] = input[9];
+  bf1[10] = input[5];
+  bf1[11] = input[13];
+  bf1[12] = input[3];
+  bf1[13] = input[11];
+  bf1[14] = input[7];
+  bf1[15] = input[15];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 2
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = bf0[4];
+  bf1[5] = bf0[5];
+  bf1[6] = bf0[6];
+  bf1[7] = bf0[7];
+  bf1[8] = half_btf(cospi[60], bf0[8], -cospi[4], bf0[15], cos_bit);
+  bf1[9] = half_btf(cospi[28], bf0[9], -cospi[36], bf0[14], cos_bit);
+  bf1[10] = half_btf(cospi[44], bf0[10], -cospi[20], bf0[13], cos_bit);
+  bf1[11] = half_btf(cospi[12], bf0[11], -cospi[52], bf0[12], cos_bit);
+  bf1[12] = half_btf(cospi[52], bf0[11], cospi[12], bf0[12], cos_bit);
+  bf1[13] = half_btf(cospi[20], bf0[10], cospi[44], bf0[13], cos_bit);
+  bf1[14] = half_btf(cospi[36], bf0[9], cospi[28], bf0[14], cos_bit);
+  bf1[15] = half_btf(cospi[4], bf0[8], cospi[60], bf0[15], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 3
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = half_btf(cospi[56], bf0[4], -cospi[8], bf0[7], cos_bit);
+  bf1[5] = half_btf(cospi[24], bf0[5], -cospi[40], bf0[6], cos_bit);
+  bf1[6] = half_btf(cospi[40], bf0[5], cospi[24], bf0[6], cos_bit);
+  bf1[7] = half_btf(cospi[8], bf0[4], cospi[56], bf0[7], cos_bit);
+  bf1[8] = clamp_value(bf0[8] + bf0[9], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[8] - bf0[9], stage_range[stage]);
+  bf1[10] = clamp_value(-bf0[10] + bf0[11], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[10] + bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[12] + bf0[13], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[12] - bf0[13], stage_range[stage]);
+  bf1[14] = clamp_value(-bf0[14] + bf0[15], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[14] + bf0[15], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 4
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = half_btf(cospi[32], bf0[0], cospi[32], bf0[1], cos_bit);
+  bf1[1] = half_btf(cospi[32], bf0[0], -cospi[32], bf0[1], cos_bit);
+  bf1[2] = half_btf(cospi[48], bf0[2], -cospi[16], bf0[3], cos_bit);
+  bf1[3] = half_btf(cospi[16], bf0[2], cospi[48], bf0[3], cos_bit);
+  bf1[4] = clamp_value(bf0[4] + bf0[5], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[4] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(-bf0[6] + bf0[7], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[6] + bf0[7], stage_range[stage]);
+  bf1[8] = bf0[8];
+  bf1[9] = half_btf(-cospi[16], bf0[9], cospi[48], bf0[14], cos_bit);
+  bf1[10] = half_btf(-cospi[48], bf0[10], -cospi[16], bf0[13], cos_bit);
+  bf1[11] = bf0[11];
+  bf1[12] = bf0[12];
+  bf1[13] = half_btf(-cospi[16], bf0[10], cospi[48], bf0[13], cos_bit);
+  bf1[14] = half_btf(cospi[48], bf0[9], cospi[16], bf0[14], cos_bit);
+  bf1[15] = bf0[15];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 5
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[3], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[2], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[1] - bf0[2], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[0] - bf0[3], stage_range[stage]);
+  bf1[4] = bf0[4];
+  bf1[5] = half_btf(-cospi[32], bf0[5], cospi[32], bf0[6], cos_bit);
+  bf1[6] = half_btf(cospi[32], bf0[5], cospi[32], bf0[6], cos_bit);
+  bf1[7] = bf0[7];
+  bf1[8] = clamp_value(bf0[8] + bf0[11], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[9] + bf0[10], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[9] - bf0[10], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[8] - bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(-bf0[12] + bf0[15], stage_range[stage]);
+  bf1[13] = clamp_value(-bf0[13] + bf0[14], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[13] + bf0[14], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[12] + bf0[15], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 6
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = clamp_value(bf0[0] + bf0[7], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[6], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[5], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[4], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[3] - bf0[4], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[2] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[1] - bf0[6], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[0] - bf0[7], stage_range[stage]);
+  bf1[8] = bf0[8];
+  bf1[9] = bf0[9];
+  bf1[10] = half_btf(-cospi[32], bf0[10], cospi[32], bf0[13], cos_bit);
+  bf1[11] = half_btf(-cospi[32], bf0[11], cospi[32], bf0[12], cos_bit);
+  bf1[12] = half_btf(cospi[32], bf0[11], cospi[32], bf0[12], cos_bit);
+  bf1[13] = half_btf(cospi[32], bf0[10], cospi[32], bf0[13], cos_bit);
+  bf1[14] = bf0[14];
+  bf1[15] = bf0[15];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 7
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[15], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[14], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[13], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[12], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[4] + bf0[11], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[5] + bf0[10], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[6] + bf0[9], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[7] + bf0[8], stage_range[stage]);
+  bf1[8] = clamp_value(bf0[7] - bf0[8], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[6] - bf0[9], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[5] - bf0[10], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[4] - bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[3] - bf0[12], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[2] - bf0[13], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[1] - bf0[14], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[0] - bf0[15], stage_range[stage]);
+}
+
+void av1_idct32_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range) {
+  assert(output != input);
+  const int32_t size = 32;
+  const int32_t *cospi = cospi_arr(cos_bit);
+
+  int32_t stage = 0;
+  int32_t *bf0, *bf1;
+  int32_t step[32];
+
+  // stage 0;
+
+  // stage 1;
+  stage++;
+  bf1 = output;
+  bf1[0] = input[0];
+  bf1[1] = input[16];
+  bf1[2] = input[8];
+  bf1[3] = input[24];
+  bf1[4] = input[4];
+  bf1[5] = input[20];
+  bf1[6] = input[12];
+  bf1[7] = input[28];
+  bf1[8] = input[2];
+  bf1[9] = input[18];
+  bf1[10] = input[10];
+  bf1[11] = input[26];
+  bf1[12] = input[6];
+  bf1[13] = input[22];
+  bf1[14] = input[14];
+  bf1[15] = input[30];
+  bf1[16] = input[1];
+  bf1[17] = input[17];
+  bf1[18] = input[9];
+  bf1[19] = input[25];
+  bf1[20] = input[5];
+  bf1[21] = input[21];
+  bf1[22] = input[13];
+  bf1[23] = input[29];
+  bf1[24] = input[3];
+  bf1[25] = input[19];
+  bf1[26] = input[11];
+  bf1[27] = input[27];
+  bf1[28] = input[7];
+  bf1[29] = input[23];
+  bf1[30] = input[15];
+  bf1[31] = input[31];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 2
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = bf0[4];
+  bf1[5] = bf0[5];
+  bf1[6] = bf0[6];
+  bf1[7] = bf0[7];
+  bf1[8] = bf0[8];
+  bf1[9] = bf0[9];
+  bf1[10] = bf0[10];
+  bf1[11] = bf0[11];
+  bf1[12] = bf0[12];
+  bf1[13] = bf0[13];
+  bf1[14] = bf0[14];
+  bf1[15] = bf0[15];
+  bf1[16] = half_btf(cospi[62], bf0[16], -cospi[2], bf0[31], cos_bit);
+  bf1[17] = half_btf(cospi[30], bf0[17], -cospi[34], bf0[30], cos_bit);
+  bf1[18] = half_btf(cospi[46], bf0[18], -cospi[18], bf0[29], cos_bit);
+  bf1[19] = half_btf(cospi[14], bf0[19], -cospi[50], bf0[28], cos_bit);
+  bf1[20] = half_btf(cospi[54], bf0[20], -cospi[10], bf0[27], cos_bit);
+  bf1[21] = half_btf(cospi[22], bf0[21], -cospi[42], bf0[26], cos_bit);
+  bf1[22] = half_btf(cospi[38], bf0[22], -cospi[26], bf0[25], cos_bit);
+  bf1[23] = half_btf(cospi[6], bf0[23], -cospi[58], bf0[24], cos_bit);
+  bf1[24] = half_btf(cospi[58], bf0[23], cospi[6], bf0[24], cos_bit);
+  bf1[25] = half_btf(cospi[26], bf0[22], cospi[38], bf0[25], cos_bit);
+  bf1[26] = half_btf(cospi[42], bf0[21], cospi[22], bf0[26], cos_bit);
+  bf1[27] = half_btf(cospi[10], bf0[20], cospi[54], bf0[27], cos_bit);
+  bf1[28] = half_btf(cospi[50], bf0[19], cospi[14], bf0[28], cos_bit);
+  bf1[29] = half_btf(cospi[18], bf0[18], cospi[46], bf0[29], cos_bit);
+  bf1[30] = half_btf(cospi[34], bf0[17], cospi[30], bf0[30], cos_bit);
+  bf1[31] = half_btf(cospi[2], bf0[16], cospi[62], bf0[31], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 3
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = bf0[4];
+  bf1[5] = bf0[5];
+  bf1[6] = bf0[6];
+  bf1[7] = bf0[7];
+  bf1[8] = half_btf(cospi[60], bf0[8], -cospi[4], bf0[15], cos_bit);
+  bf1[9] = half_btf(cospi[28], bf0[9], -cospi[36], bf0[14], cos_bit);
+  bf1[10] = half_btf(cospi[44], bf0[10], -cospi[20], bf0[13], cos_bit);
+  bf1[11] = half_btf(cospi[12], bf0[11], -cospi[52], bf0[12], cos_bit);
+  bf1[12] = half_btf(cospi[52], bf0[11], cospi[12], bf0[12], cos_bit);
+  bf1[13] = half_btf(cospi[20], bf0[10], cospi[44], bf0[13], cos_bit);
+  bf1[14] = half_btf(cospi[36], bf0[9], cospi[28], bf0[14], cos_bit);
+  bf1[15] = half_btf(cospi[4], bf0[8], cospi[60], bf0[15], cos_bit);
+  bf1[16] = clamp_value(bf0[16] + bf0[17], stage_range[stage]);
+  bf1[17] = clamp_value(bf0[16] - bf0[17], stage_range[stage]);
+  bf1[18] = clamp_value(-bf0[18] + bf0[19], stage_range[stage]);
+  bf1[19] = clamp_value(bf0[18] + bf0[19], stage_range[stage]);
+  bf1[20] = clamp_value(bf0[20] + bf0[21], stage_range[stage]);
+  bf1[21] = clamp_value(bf0[20] - bf0[21], stage_range[stage]);
+  bf1[22] = clamp_value(-bf0[22] + bf0[23], stage_range[stage]);
+  bf1[23] = clamp_value(bf0[22] + bf0[23], stage_range[stage]);
+  bf1[24] = clamp_value(bf0[24] + bf0[25], stage_range[stage]);
+  bf1[25] = clamp_value(bf0[24] - bf0[25], stage_range[stage]);
+  bf1[26] = clamp_value(-bf0[26] + bf0[27], stage_range[stage]);
+  bf1[27] = clamp_value(bf0[26] + bf0[27], stage_range[stage]);
+  bf1[28] = clamp_value(bf0[28] + bf0[29], stage_range[stage]);
+  bf1[29] = clamp_value(bf0[28] - bf0[29], stage_range[stage]);
+  bf1[30] = clamp_value(-bf0[30] + bf0[31], stage_range[stage]);
+  bf1[31] = clamp_value(bf0[30] + bf0[31], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 4
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = half_btf(cospi[56], bf0[4], -cospi[8], bf0[7], cos_bit);
+  bf1[5] = half_btf(cospi[24], bf0[5], -cospi[40], bf0[6], cos_bit);
+  bf1[6] = half_btf(cospi[40], bf0[5], cospi[24], bf0[6], cos_bit);
+  bf1[7] = half_btf(cospi[8], bf0[4], cospi[56], bf0[7], cos_bit);
+  bf1[8] = clamp_value(bf0[8] + bf0[9], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[8] - bf0[9], stage_range[stage]);
+  bf1[10] = clamp_value(-bf0[10] + bf0[11], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[10] + bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[12] + bf0[13], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[12] - bf0[13], stage_range[stage]);
+  bf1[14] = clamp_value(-bf0[14] + bf0[15], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[14] + bf0[15], stage_range[stage]);
+  bf1[16] = bf0[16];
+  bf1[17] = half_btf(-cospi[8], bf0[17], cospi[56], bf0[30], cos_bit);
+  bf1[18] = half_btf(-cospi[56], bf0[18], -cospi[8], bf0[29], cos_bit);
+  bf1[19] = bf0[19];
+  bf1[20] = bf0[20];
+  bf1[21] = half_btf(-cospi[40], bf0[21], cospi[24], bf0[26], cos_bit);
+  bf1[22] = half_btf(-cospi[24], bf0[22], -cospi[40], bf0[25], cos_bit);
+  bf1[23] = bf0[23];
+  bf1[24] = bf0[24];
+  bf1[25] = half_btf(-cospi[40], bf0[22], cospi[24], bf0[25], cos_bit);
+  bf1[26] = half_btf(cospi[24], bf0[21], cospi[40], bf0[26], cos_bit);
+  bf1[27] = bf0[27];
+  bf1[28] = bf0[28];
+  bf1[29] = half_btf(-cospi[8], bf0[18], cospi[56], bf0[29], cos_bit);
+  bf1[30] = half_btf(cospi[56], bf0[17], cospi[8], bf0[30], cos_bit);
+  bf1[31] = bf0[31];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 5
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = half_btf(cospi[32], bf0[0], cospi[32], bf0[1], cos_bit);
+  bf1[1] = half_btf(cospi[32], bf0[0], -cospi[32], bf0[1], cos_bit);
+  bf1[2] = half_btf(cospi[48], bf0[2], -cospi[16], bf0[3], cos_bit);
+  bf1[3] = half_btf(cospi[16], bf0[2], cospi[48], bf0[3], cos_bit);
+  bf1[4] = clamp_value(bf0[4] + bf0[5], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[4] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(-bf0[6] + bf0[7], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[6] + bf0[7], stage_range[stage]);
+  bf1[8] = bf0[8];
+  bf1[9] = half_btf(-cospi[16], bf0[9], cospi[48], bf0[14], cos_bit);
+  bf1[10] = half_btf(-cospi[48], bf0[10], -cospi[16], bf0[13], cos_bit);
+  bf1[11] = bf0[11];
+  bf1[12] = bf0[12];
+  bf1[13] = half_btf(-cospi[16], bf0[10], cospi[48], bf0[13], cos_bit);
+  bf1[14] = half_btf(cospi[48], bf0[9], cospi[16], bf0[14], cos_bit);
+  bf1[15] = bf0[15];
+  bf1[16] = clamp_value(bf0[16] + bf0[19], stage_range[stage]);
+  bf1[17] = clamp_value(bf0[17] + bf0[18], stage_range[stage]);
+  bf1[18] = clamp_value(bf0[17] - bf0[18], stage_range[stage]);
+  bf1[19] = clamp_value(bf0[16] - bf0[19], stage_range[stage]);
+  bf1[20] = clamp_value(-bf0[20] + bf0[23], stage_range[stage]);
+  bf1[21] = clamp_value(-bf0[21] + bf0[22], stage_range[stage]);
+  bf1[22] = clamp_value(bf0[21] + bf0[22], stage_range[stage]);
+  bf1[23] = clamp_value(bf0[20] + bf0[23], stage_range[stage]);
+  bf1[24] = clamp_value(bf0[24] + bf0[27], stage_range[stage]);
+  bf1[25] = clamp_value(bf0[25] + bf0[26], stage_range[stage]);
+  bf1[26] = clamp_value(bf0[25] - bf0[26], stage_range[stage]);
+  bf1[27] = clamp_value(bf0[24] - bf0[27], stage_range[stage]);
+  bf1[28] = clamp_value(-bf0[28] + bf0[31], stage_range[stage]);
+  bf1[29] = clamp_value(-bf0[29] + bf0[30], stage_range[stage]);
+  bf1[30] = clamp_value(bf0[29] + bf0[30], stage_range[stage]);
+  bf1[31] = clamp_value(bf0[28] + bf0[31], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 6
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = clamp_value(bf0[0] + bf0[3], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[2], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[1] - bf0[2], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[0] - bf0[3], stage_range[stage]);
+  bf1[4] = bf0[4];
+  bf1[5] = half_btf(-cospi[32], bf0[5], cospi[32], bf0[6], cos_bit);
+  bf1[6] = half_btf(cospi[32], bf0[5], cospi[32], bf0[6], cos_bit);
+  bf1[7] = bf0[7];
+  bf1[8] = clamp_value(bf0[8] + bf0[11], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[9] + bf0[10], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[9] - bf0[10], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[8] - bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(-bf0[12] + bf0[15], stage_range[stage]);
+  bf1[13] = clamp_value(-bf0[13] + bf0[14], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[13] + bf0[14], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[12] + bf0[15], stage_range[stage]);
+  bf1[16] = bf0[16];
+  bf1[17] = bf0[17];
+  bf1[18] = half_btf(-cospi[16], bf0[18], cospi[48], bf0[29], cos_bit);
+  bf1[19] = half_btf(-cospi[16], bf0[19], cospi[48], bf0[28], cos_bit);
+  bf1[20] = half_btf(-cospi[48], bf0[20], -cospi[16], bf0[27], cos_bit);
+  bf1[21] = half_btf(-cospi[48], bf0[21], -cospi[16], bf0[26], cos_bit);
+  bf1[22] = bf0[22];
+  bf1[23] = bf0[23];
+  bf1[24] = bf0[24];
+  bf1[25] = bf0[25];
+  bf1[26] = half_btf(-cospi[16], bf0[21], cospi[48], bf0[26], cos_bit);
+  bf1[27] = half_btf(-cospi[16], bf0[20], cospi[48], bf0[27], cos_bit);
+  bf1[28] = half_btf(cospi[48], bf0[19], cospi[16], bf0[28], cos_bit);
+  bf1[29] = half_btf(cospi[48], bf0[18], cospi[16], bf0[29], cos_bit);
+  bf1[30] = bf0[30];
+  bf1[31] = bf0[31];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 7
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[7], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[6], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[5], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[4], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[3] - bf0[4], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[2] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[1] - bf0[6], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[0] - bf0[7], stage_range[stage]);
+  bf1[8] = bf0[8];
+  bf1[9] = bf0[9];
+  bf1[10] = half_btf(-cospi[32], bf0[10], cospi[32], bf0[13], cos_bit);
+  bf1[11] = half_btf(-cospi[32], bf0[11], cospi[32], bf0[12], cos_bit);
+  bf1[12] = half_btf(cospi[32], bf0[11], cospi[32], bf0[12], cos_bit);
+  bf1[13] = half_btf(cospi[32], bf0[10], cospi[32], bf0[13], cos_bit);
+  bf1[14] = bf0[14];
+  bf1[15] = bf0[15];
+  bf1[16] = clamp_value(bf0[16] + bf0[23], stage_range[stage]);
+  bf1[17] = clamp_value(bf0[17] + bf0[22], stage_range[stage]);
+  bf1[18] = clamp_value(bf0[18] + bf0[21], stage_range[stage]);
+  bf1[19] = clamp_value(bf0[19] + bf0[20], stage_range[stage]);
+  bf1[20] = clamp_value(bf0[19] - bf0[20], stage_range[stage]);
+  bf1[21] = clamp_value(bf0[18] - bf0[21], stage_range[stage]);
+  bf1[22] = clamp_value(bf0[17] - bf0[22], stage_range[stage]);
+  bf1[23] = clamp_value(bf0[16] - bf0[23], stage_range[stage]);
+  bf1[24] = clamp_value(-bf0[24] + bf0[31], stage_range[stage]);
+  bf1[25] = clamp_value(-bf0[25] + bf0[30], stage_range[stage]);
+  bf1[26] = clamp_value(-bf0[26] + bf0[29], stage_range[stage]);
+  bf1[27] = clamp_value(-bf0[27] + bf0[28], stage_range[stage]);
+  bf1[28] = clamp_value(bf0[27] + bf0[28], stage_range[stage]);
+  bf1[29] = clamp_value(bf0[26] + bf0[29], stage_range[stage]);
+  bf1[30] = clamp_value(bf0[25] + bf0[30], stage_range[stage]);
+  bf1[31] = clamp_value(bf0[24] + bf0[31], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 8
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = clamp_value(bf0[0] + bf0[15], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[14], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[13], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[12], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[4] + bf0[11], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[5] + bf0[10], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[6] + bf0[9], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[7] + bf0[8], stage_range[stage]);
+  bf1[8] = clamp_value(bf0[7] - bf0[8], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[6] - bf0[9], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[5] - bf0[10], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[4] - bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[3] - bf0[12], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[2] - bf0[13], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[1] - bf0[14], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[0] - bf0[15], stage_range[stage]);
+  bf1[16] = bf0[16];
+  bf1[17] = bf0[17];
+  bf1[18] = bf0[18];
+  bf1[19] = bf0[19];
+  bf1[20] = half_btf(-cospi[32], bf0[20], cospi[32], bf0[27], cos_bit);
+  bf1[21] = half_btf(-cospi[32], bf0[21], cospi[32], bf0[26], cos_bit);
+  bf1[22] = half_btf(-cospi[32], bf0[22], cospi[32], bf0[25], cos_bit);
+  bf1[23] = half_btf(-cospi[32], bf0[23], cospi[32], bf0[24], cos_bit);
+  bf1[24] = half_btf(cospi[32], bf0[23], cospi[32], bf0[24], cos_bit);
+  bf1[25] = half_btf(cospi[32], bf0[22], cospi[32], bf0[25], cos_bit);
+  bf1[26] = half_btf(cospi[32], bf0[21], cospi[32], bf0[26], cos_bit);
+  bf1[27] = half_btf(cospi[32], bf0[20], cospi[32], bf0[27], cos_bit);
+  bf1[28] = bf0[28];
+  bf1[29] = bf0[29];
+  bf1[30] = bf0[30];
+  bf1[31] = bf0[31];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 9
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[31], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[30], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[29], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[28], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[4] + bf0[27], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[5] + bf0[26], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[6] + bf0[25], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[7] + bf0[24], stage_range[stage]);
+  bf1[8] = clamp_value(bf0[8] + bf0[23], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[9] + bf0[22], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[10] + bf0[21], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[11] + bf0[20], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[12] + bf0[19], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[13] + bf0[18], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[14] + bf0[17], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[15] + bf0[16], stage_range[stage]);
+  bf1[16] = clamp_value(bf0[15] - bf0[16], stage_range[stage]);
+  bf1[17] = clamp_value(bf0[14] - bf0[17], stage_range[stage]);
+  bf1[18] = clamp_value(bf0[13] - bf0[18], stage_range[stage]);
+  bf1[19] = clamp_value(bf0[12] - bf0[19], stage_range[stage]);
+  bf1[20] = clamp_value(bf0[11] - bf0[20], stage_range[stage]);
+  bf1[21] = clamp_value(bf0[10] - bf0[21], stage_range[stage]);
+  bf1[22] = clamp_value(bf0[9] - bf0[22], stage_range[stage]);
+  bf1[23] = clamp_value(bf0[8] - bf0[23], stage_range[stage]);
+  bf1[24] = clamp_value(bf0[7] - bf0[24], stage_range[stage]);
+  bf1[25] = clamp_value(bf0[6] - bf0[25], stage_range[stage]);
+  bf1[26] = clamp_value(bf0[5] - bf0[26], stage_range[stage]);
+  bf1[27] = clamp_value(bf0[4] - bf0[27], stage_range[stage]);
+  bf1[28] = clamp_value(bf0[3] - bf0[28], stage_range[stage]);
+  bf1[29] = clamp_value(bf0[2] - bf0[29], stage_range[stage]);
+  bf1[30] = clamp_value(bf0[1] - bf0[30], stage_range[stage]);
+  bf1[31] = clamp_value(bf0[0] - bf0[31], stage_range[stage]);
+}
+
+void av1_iadst4_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range) {
+  int bit = cos_bit;
+  const int32_t *sinpi = sinpi_arr(bit);
+  int32_t s0, s1, s2, s3, s4, s5, s6, s7;
+
+  int32_t x0 = input[0];
+  int32_t x1 = input[1];
+  int32_t x2 = input[2];
+  int32_t x3 = input[3];
+
+  if (!(x0 | x1 | x2 | x3)) {
+    output[0] = output[1] = output[2] = output[3] = 0;
+    return;
+  }
+
+  assert(sinpi[1] + sinpi[2] == sinpi[4]);
+
+  // stage 1
+  s0 = range_check_value(sinpi[1] * x0, stage_range[1] + bit);
+  s1 = range_check_value(sinpi[2] * x0, stage_range[1] + bit);
+  s2 = range_check_value(sinpi[3] * x1, stage_range[1] + bit);
+  s3 = range_check_value(sinpi[4] * x2, stage_range[1] + bit);
+  s4 = range_check_value(sinpi[1] * x2, stage_range[1] + bit);
+  s5 = range_check_value(sinpi[2] * x3, stage_range[1] + bit);
+  s6 = range_check_value(sinpi[4] * x3, stage_range[1] + bit);
+
+  // stage 2
+  // NOTICE: (x0 - x2) here may use one extra bit compared to the
+  // opt_range_row/col specified in av1_gen_inv_stage_range()
+  s7 = range_check_value((x0 - x2) + x3, stage_range[2]);
+
+  // stage 3
+  s0 = range_check_value(s0 + s3, stage_range[3] + bit);
+  s1 = range_check_value(s1 - s4, stage_range[3] + bit);
+  s3 = range_check_value(s2, stage_range[3] + bit);
+  s2 = range_check_value(sinpi[3] * s7, stage_range[3] + bit);
+
+  // stage 4
+  s0 = range_check_value(s0 + s5, stage_range[4] + bit);
+  s1 = range_check_value(s1 - s6, stage_range[4] + bit);
+
+  // stage 5
+  x0 = range_check_value(s0 + s3, stage_range[5] + bit);
+  x1 = range_check_value(s1 + s3, stage_range[5] + bit);
+  x2 = range_check_value(s2, stage_range[5] + bit);
+  x3 = range_check_value(s0 + s1, stage_range[5] + bit);
+
+  // stage 6
+  x3 = range_check_value(x3 - s3, stage_range[6] + bit);
+
+  output[0] = round_shift(x0, bit);
+  output[1] = round_shift(x1, bit);
+  output[2] = round_shift(x2, bit);
+  output[3] = round_shift(x3, bit);
+}
+
+void av1_iadst8_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range) {
+  assert(output != input);
+  const int32_t size = 8;
+  const int32_t *cospi = cospi_arr(cos_bit);
+
+  int32_t stage = 0;
+  int32_t *bf0, *bf1;
+  int32_t step[8];
+
+  // stage 0;
+
+  // stage 1;
+  stage++;
+  bf1 = output;
+  bf1[0] = input[7];
+  bf1[1] = input[0];
+  bf1[2] = input[5];
+  bf1[3] = input[2];
+  bf1[4] = input[3];
+  bf1[5] = input[4];
+  bf1[6] = input[1];
+  bf1[7] = input[6];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 2
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = half_btf(cospi[4], bf0[0], cospi[60], bf0[1], cos_bit);
+  bf1[1] = half_btf(cospi[60], bf0[0], -cospi[4], bf0[1], cos_bit);
+  bf1[2] = half_btf(cospi[20], bf0[2], cospi[44], bf0[3], cos_bit);
+  bf1[3] = half_btf(cospi[44], bf0[2], -cospi[20], bf0[3], cos_bit);
+  bf1[4] = half_btf(cospi[36], bf0[4], cospi[28], bf0[5], cos_bit);
+  bf1[5] = half_btf(cospi[28], bf0[4], -cospi[36], bf0[5], cos_bit);
+  bf1[6] = half_btf(cospi[52], bf0[6], cospi[12], bf0[7], cos_bit);
+  bf1[7] = half_btf(cospi[12], bf0[6], -cospi[52], bf0[7], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 3
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[4], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[5], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[6], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[7], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[0] - bf0[4], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[1] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[2] - bf0[6], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[3] - bf0[7], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 4
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = half_btf(cospi[16], bf0[4], cospi[48], bf0[5], cos_bit);
+  bf1[5] = half_btf(cospi[48], bf0[4], -cospi[16], bf0[5], cos_bit);
+  bf1[6] = half_btf(-cospi[48], bf0[6], cospi[16], bf0[7], cos_bit);
+  bf1[7] = half_btf(cospi[16], bf0[6], cospi[48], bf0[7], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 5
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[2], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[3], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[0] - bf0[2], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[1] - bf0[3], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[4] + bf0[6], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[5] + bf0[7], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[4] - bf0[6], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[5] - bf0[7], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 6
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = half_btf(cospi[32], bf0[2], cospi[32], bf0[3], cos_bit);
+  bf1[3] = half_btf(cospi[32], bf0[2], -cospi[32], bf0[3], cos_bit);
+  bf1[4] = bf0[4];
+  bf1[5] = bf0[5];
+  bf1[6] = half_btf(cospi[32], bf0[6], cospi[32], bf0[7], cos_bit);
+  bf1[7] = half_btf(cospi[32], bf0[6], -cospi[32], bf0[7], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 7
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = bf0[0];
+  bf1[1] = -bf0[4];
+  bf1[2] = bf0[6];
+  bf1[3] = -bf0[2];
+  bf1[4] = bf0[3];
+  bf1[5] = -bf0[7];
+  bf1[6] = bf0[5];
+  bf1[7] = -bf0[1];
+}
+
+void av1_iadst16_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                     const int8_t *stage_range) {
+  assert(output != input);
+  const int32_t size = 16;
+  const int32_t *cospi = cospi_arr(cos_bit);
+
+  int32_t stage = 0;
+  int32_t *bf0, *bf1;
+  int32_t step[16];
+
+  // stage 0;
+
+  // stage 1;
+  stage++;
+  bf1 = output;
+  bf1[0] = input[15];
+  bf1[1] = input[0];
+  bf1[2] = input[13];
+  bf1[3] = input[2];
+  bf1[4] = input[11];
+  bf1[5] = input[4];
+  bf1[6] = input[9];
+  bf1[7] = input[6];
+  bf1[8] = input[7];
+  bf1[9] = input[8];
+  bf1[10] = input[5];
+  bf1[11] = input[10];
+  bf1[12] = input[3];
+  bf1[13] = input[12];
+  bf1[14] = input[1];
+  bf1[15] = input[14];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 2
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = half_btf(cospi[2], bf0[0], cospi[62], bf0[1], cos_bit);
+  bf1[1] = half_btf(cospi[62], bf0[0], -cospi[2], bf0[1], cos_bit);
+  bf1[2] = half_btf(cospi[10], bf0[2], cospi[54], bf0[3], cos_bit);
+  bf1[3] = half_btf(cospi[54], bf0[2], -cospi[10], bf0[3], cos_bit);
+  bf1[4] = half_btf(cospi[18], bf0[4], cospi[46], bf0[5], cos_bit);
+  bf1[5] = half_btf(cospi[46], bf0[4], -cospi[18], bf0[5], cos_bit);
+  bf1[6] = half_btf(cospi[26], bf0[6], cospi[38], bf0[7], cos_bit);
+  bf1[7] = half_btf(cospi[38], bf0[6], -cospi[26], bf0[7], cos_bit);
+  bf1[8] = half_btf(cospi[34], bf0[8], cospi[30], bf0[9], cos_bit);
+  bf1[9] = half_btf(cospi[30], bf0[8], -cospi[34], bf0[9], cos_bit);
+  bf1[10] = half_btf(cospi[42], bf0[10], cospi[22], bf0[11], cos_bit);
+  bf1[11] = half_btf(cospi[22], bf0[10], -cospi[42], bf0[11], cos_bit);
+  bf1[12] = half_btf(cospi[50], bf0[12], cospi[14], bf0[13], cos_bit);
+  bf1[13] = half_btf(cospi[14], bf0[12], -cospi[50], bf0[13], cos_bit);
+  bf1[14] = half_btf(cospi[58], bf0[14], cospi[6], bf0[15], cos_bit);
+  bf1[15] = half_btf(cospi[6], bf0[14], -cospi[58], bf0[15], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 3
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[8], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[9], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[10], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[11], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[4] + bf0[12], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[5] + bf0[13], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[6] + bf0[14], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[7] + bf0[15], stage_range[stage]);
+  bf1[8] = clamp_value(bf0[0] - bf0[8], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[1] - bf0[9], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[2] - bf0[10], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[3] - bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[4] - bf0[12], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[5] - bf0[13], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[6] - bf0[14], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[7] - bf0[15], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 4
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = bf0[4];
+  bf1[5] = bf0[5];
+  bf1[6] = bf0[6];
+  bf1[7] = bf0[7];
+  bf1[8] = half_btf(cospi[8], bf0[8], cospi[56], bf0[9], cos_bit);
+  bf1[9] = half_btf(cospi[56], bf0[8], -cospi[8], bf0[9], cos_bit);
+  bf1[10] = half_btf(cospi[40], bf0[10], cospi[24], bf0[11], cos_bit);
+  bf1[11] = half_btf(cospi[24], bf0[10], -cospi[40], bf0[11], cos_bit);
+  bf1[12] = half_btf(-cospi[56], bf0[12], cospi[8], bf0[13], cos_bit);
+  bf1[13] = half_btf(cospi[8], bf0[12], cospi[56], bf0[13], cos_bit);
+  bf1[14] = half_btf(-cospi[24], bf0[14], cospi[40], bf0[15], cos_bit);
+  bf1[15] = half_btf(cospi[40], bf0[14], cospi[24], bf0[15], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 5
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[4], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[5], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[6], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[7], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[0] - bf0[4], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[1] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[2] - bf0[6], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[3] - bf0[7], stage_range[stage]);
+  bf1[8] = clamp_value(bf0[8] + bf0[12], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[9] + bf0[13], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[10] + bf0[14], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[11] + bf0[15], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[8] - bf0[12], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[9] - bf0[13], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[10] - bf0[14], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[11] - bf0[15], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 6
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = half_btf(cospi[16], bf0[4], cospi[48], bf0[5], cos_bit);
+  bf1[5] = half_btf(cospi[48], bf0[4], -cospi[16], bf0[5], cos_bit);
+  bf1[6] = half_btf(-cospi[48], bf0[6], cospi[16], bf0[7], cos_bit);
+  bf1[7] = half_btf(cospi[16], bf0[6], cospi[48], bf0[7], cos_bit);
+  bf1[8] = bf0[8];
+  bf1[9] = bf0[9];
+  bf1[10] = bf0[10];
+  bf1[11] = bf0[11];
+  bf1[12] = half_btf(cospi[16], bf0[12], cospi[48], bf0[13], cos_bit);
+  bf1[13] = half_btf(cospi[48], bf0[12], -cospi[16], bf0[13], cos_bit);
+  bf1[14] = half_btf(-cospi[48], bf0[14], cospi[16], bf0[15], cos_bit);
+  bf1[15] = half_btf(cospi[16], bf0[14], cospi[48], bf0[15], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 7
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[2], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[3], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[0] - bf0[2], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[1] - bf0[3], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[4] + bf0[6], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[5] + bf0[7], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[4] - bf0[6], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[5] - bf0[7], stage_range[stage]);
+  bf1[8] = clamp_value(bf0[8] + bf0[10], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[9] + bf0[11], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[8] - bf0[10], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[9] - bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[12] + bf0[14], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[13] + bf0[15], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[12] - bf0[14], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[13] - bf0[15], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 8
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = half_btf(cospi[32], bf0[2], cospi[32], bf0[3], cos_bit);
+  bf1[3] = half_btf(cospi[32], bf0[2], -cospi[32], bf0[3], cos_bit);
+  bf1[4] = bf0[4];
+  bf1[5] = bf0[5];
+  bf1[6] = half_btf(cospi[32], bf0[6], cospi[32], bf0[7], cos_bit);
+  bf1[7] = half_btf(cospi[32], bf0[6], -cospi[32], bf0[7], cos_bit);
+  bf1[8] = bf0[8];
+  bf1[9] = bf0[9];
+  bf1[10] = half_btf(cospi[32], bf0[10], cospi[32], bf0[11], cos_bit);
+  bf1[11] = half_btf(cospi[32], bf0[10], -cospi[32], bf0[11], cos_bit);
+  bf1[12] = bf0[12];
+  bf1[13] = bf0[13];
+  bf1[14] = half_btf(cospi[32], bf0[14], cospi[32], bf0[15], cos_bit);
+  bf1[15] = half_btf(cospi[32], bf0[14], -cospi[32], bf0[15], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 9
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = bf0[0];
+  bf1[1] = -bf0[8];
+  bf1[2] = bf0[12];
+  bf1[3] = -bf0[4];
+  bf1[4] = bf0[6];
+  bf1[5] = -bf0[14];
+  bf1[6] = bf0[10];
+  bf1[7] = -bf0[2];
+  bf1[8] = bf0[3];
+  bf1[9] = -bf0[11];
+  bf1[10] = bf0[15];
+  bf1[11] = -bf0[7];
+  bf1[12] = bf0[5];
+  bf1[13] = -bf0[13];
+  bf1[14] = bf0[9];
+  bf1[15] = -bf0[1];
+}
+
+void av1_iidentity4_c(const int32_t *input, int32_t *output, int8_t cos_bit,
+                      const int8_t *stage_range) {
+  (void)cos_bit;
+  (void)stage_range;
+  for (int i = 0; i < 4; ++i) {
+    output[i] = round_shift((int64_t)NewSqrt2 * input[i], NewSqrt2Bits);
+  }
+  assert(stage_range[0] + NewSqrt2Bits <= 32);
+}
+
+void av1_iidentity8_c(const int32_t *input, int32_t *output, int8_t cos_bit,
+                      const int8_t *stage_range) {
+  (void)cos_bit;
+  (void)stage_range;
+  for (int i = 0; i < 8; ++i) output[i] = (int32_t)((int64_t)input[i] * 2);
+}
+
+void av1_iidentity16_c(const int32_t *input, int32_t *output, int8_t cos_bit,
+                       const int8_t *stage_range) {
+  (void)cos_bit;
+  (void)stage_range;
+  for (int i = 0; i < 16; ++i)
+    output[i] = round_shift((int64_t)NewSqrt2 * 2 * input[i], NewSqrt2Bits);
+  assert(stage_range[0] + NewSqrt2Bits <= 32);
+}
+
+void av1_iidentity32_c(const int32_t *input, int32_t *output, int8_t cos_bit,
+                       const int8_t *stage_range) {
+  (void)cos_bit;
+  (void)stage_range;
+  for (int i = 0; i < 32; ++i) output[i] = (int32_t)((int64_t)input[i] * 4);
+}
+
+void av1_idct64_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range) {
+  assert(output != input);
+  const int32_t size = 64;
+  const int32_t *cospi = cospi_arr(cos_bit);
+
+  int32_t stage = 0;
+  int32_t *bf0, *bf1;
+  int32_t step[64];
+
+  // stage 0;
+
+  // stage 1;
+  stage++;
+  bf1 = output;
+  bf1[0] = input[0];
+  bf1[1] = input[32];
+  bf1[2] = input[16];
+  bf1[3] = input[48];
+  bf1[4] = input[8];
+  bf1[5] = input[40];
+  bf1[6] = input[24];
+  bf1[7] = input[56];
+  bf1[8] = input[4];
+  bf1[9] = input[36];
+  bf1[10] = input[20];
+  bf1[11] = input[52];
+  bf1[12] = input[12];
+  bf1[13] = input[44];
+  bf1[14] = input[28];
+  bf1[15] = input[60];
+  bf1[16] = input[2];
+  bf1[17] = input[34];
+  bf1[18] = input[18];
+  bf1[19] = input[50];
+  bf1[20] = input[10];
+  bf1[21] = input[42];
+  bf1[22] = input[26];
+  bf1[23] = input[58];
+  bf1[24] = input[6];
+  bf1[25] = input[38];
+  bf1[26] = input[22];
+  bf1[27] = input[54];
+  bf1[28] = input[14];
+  bf1[29] = input[46];
+  bf1[30] = input[30];
+  bf1[31] = input[62];
+  bf1[32] = input[1];
+  bf1[33] = input[33];
+  bf1[34] = input[17];
+  bf1[35] = input[49];
+  bf1[36] = input[9];
+  bf1[37] = input[41];
+  bf1[38] = input[25];
+  bf1[39] = input[57];
+  bf1[40] = input[5];
+  bf1[41] = input[37];
+  bf1[42] = input[21];
+  bf1[43] = input[53];
+  bf1[44] = input[13];
+  bf1[45] = input[45];
+  bf1[46] = input[29];
+  bf1[47] = input[61];
+  bf1[48] = input[3];
+  bf1[49] = input[35];
+  bf1[50] = input[19];
+  bf1[51] = input[51];
+  bf1[52] = input[11];
+  bf1[53] = input[43];
+  bf1[54] = input[27];
+  bf1[55] = input[59];
+  bf1[56] = input[7];
+  bf1[57] = input[39];
+  bf1[58] = input[23];
+  bf1[59] = input[55];
+  bf1[60] = input[15];
+  bf1[61] = input[47];
+  bf1[62] = input[31];
+  bf1[63] = input[63];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 2
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = bf0[4];
+  bf1[5] = bf0[5];
+  bf1[6] = bf0[6];
+  bf1[7] = bf0[7];
+  bf1[8] = bf0[8];
+  bf1[9] = bf0[9];
+  bf1[10] = bf0[10];
+  bf1[11] = bf0[11];
+  bf1[12] = bf0[12];
+  bf1[13] = bf0[13];
+  bf1[14] = bf0[14];
+  bf1[15] = bf0[15];
+  bf1[16] = bf0[16];
+  bf1[17] = bf0[17];
+  bf1[18] = bf0[18];
+  bf1[19] = bf0[19];
+  bf1[20] = bf0[20];
+  bf1[21] = bf0[21];
+  bf1[22] = bf0[22];
+  bf1[23] = bf0[23];
+  bf1[24] = bf0[24];
+  bf1[25] = bf0[25];
+  bf1[26] = bf0[26];
+  bf1[27] = bf0[27];
+  bf1[28] = bf0[28];
+  bf1[29] = bf0[29];
+  bf1[30] = bf0[30];
+  bf1[31] = bf0[31];
+  bf1[32] = half_btf(cospi[63], bf0[32], -cospi[1], bf0[63], cos_bit);
+  bf1[33] = half_btf(cospi[31], bf0[33], -cospi[33], bf0[62], cos_bit);
+  bf1[34] = half_btf(cospi[47], bf0[34], -cospi[17], bf0[61], cos_bit);
+  bf1[35] = half_btf(cospi[15], bf0[35], -cospi[49], bf0[60], cos_bit);
+  bf1[36] = half_btf(cospi[55], bf0[36], -cospi[9], bf0[59], cos_bit);
+  bf1[37] = half_btf(cospi[23], bf0[37], -cospi[41], bf0[58], cos_bit);
+  bf1[38] = half_btf(cospi[39], bf0[38], -cospi[25], bf0[57], cos_bit);
+  bf1[39] = half_btf(cospi[7], bf0[39], -cospi[57], bf0[56], cos_bit);
+  bf1[40] = half_btf(cospi[59], bf0[40], -cospi[5], bf0[55], cos_bit);
+  bf1[41] = half_btf(cospi[27], bf0[41], -cospi[37], bf0[54], cos_bit);
+  bf1[42] = half_btf(cospi[43], bf0[42], -cospi[21], bf0[53], cos_bit);
+  bf1[43] = half_btf(cospi[11], bf0[43], -cospi[53], bf0[52], cos_bit);
+  bf1[44] = half_btf(cospi[51], bf0[44], -cospi[13], bf0[51], cos_bit);
+  bf1[45] = half_btf(cospi[19], bf0[45], -cospi[45], bf0[50], cos_bit);
+  bf1[46] = half_btf(cospi[35], bf0[46], -cospi[29], bf0[49], cos_bit);
+  bf1[47] = half_btf(cospi[3], bf0[47], -cospi[61], bf0[48], cos_bit);
+  bf1[48] = half_btf(cospi[61], bf0[47], cospi[3], bf0[48], cos_bit);
+  bf1[49] = half_btf(cospi[29], bf0[46], cospi[35], bf0[49], cos_bit);
+  bf1[50] = half_btf(cospi[45], bf0[45], cospi[19], bf0[50], cos_bit);
+  bf1[51] = half_btf(cospi[13], bf0[44], cospi[51], bf0[51], cos_bit);
+  bf1[52] = half_btf(cospi[53], bf0[43], cospi[11], bf0[52], cos_bit);
+  bf1[53] = half_btf(cospi[21], bf0[42], cospi[43], bf0[53], cos_bit);
+  bf1[54] = half_btf(cospi[37], bf0[41], cospi[27], bf0[54], cos_bit);
+  bf1[55] = half_btf(cospi[5], bf0[40], cospi[59], bf0[55], cos_bit);
+  bf1[56] = half_btf(cospi[57], bf0[39], cospi[7], bf0[56], cos_bit);
+  bf1[57] = half_btf(cospi[25], bf0[38], cospi[39], bf0[57], cos_bit);
+  bf1[58] = half_btf(cospi[41], bf0[37], cospi[23], bf0[58], cos_bit);
+  bf1[59] = half_btf(cospi[9], bf0[36], cospi[55], bf0[59], cos_bit);
+  bf1[60] = half_btf(cospi[49], bf0[35], cospi[15], bf0[60], cos_bit);
+  bf1[61] = half_btf(cospi[17], bf0[34], cospi[47], bf0[61], cos_bit);
+  bf1[62] = half_btf(cospi[33], bf0[33], cospi[31], bf0[62], cos_bit);
+  bf1[63] = half_btf(cospi[1], bf0[32], cospi[63], bf0[63], cos_bit);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 3
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = bf0[4];
+  bf1[5] = bf0[5];
+  bf1[6] = bf0[6];
+  bf1[7] = bf0[7];
+  bf1[8] = bf0[8];
+  bf1[9] = bf0[9];
+  bf1[10] = bf0[10];
+  bf1[11] = bf0[11];
+  bf1[12] = bf0[12];
+  bf1[13] = bf0[13];
+  bf1[14] = bf0[14];
+  bf1[15] = bf0[15];
+  bf1[16] = half_btf(cospi[62], bf0[16], -cospi[2], bf0[31], cos_bit);
+  bf1[17] = half_btf(cospi[30], bf0[17], -cospi[34], bf0[30], cos_bit);
+  bf1[18] = half_btf(cospi[46], bf0[18], -cospi[18], bf0[29], cos_bit);
+  bf1[19] = half_btf(cospi[14], bf0[19], -cospi[50], bf0[28], cos_bit);
+  bf1[20] = half_btf(cospi[54], bf0[20], -cospi[10], bf0[27], cos_bit);
+  bf1[21] = half_btf(cospi[22], bf0[21], -cospi[42], bf0[26], cos_bit);
+  bf1[22] = half_btf(cospi[38], bf0[22], -cospi[26], bf0[25], cos_bit);
+  bf1[23] = half_btf(cospi[6], bf0[23], -cospi[58], bf0[24], cos_bit);
+  bf1[24] = half_btf(cospi[58], bf0[23], cospi[6], bf0[24], cos_bit);
+  bf1[25] = half_btf(cospi[26], bf0[22], cospi[38], bf0[25], cos_bit);
+  bf1[26] = half_btf(cospi[42], bf0[21], cospi[22], bf0[26], cos_bit);
+  bf1[27] = half_btf(cospi[10], bf0[20], cospi[54], bf0[27], cos_bit);
+  bf1[28] = half_btf(cospi[50], bf0[19], cospi[14], bf0[28], cos_bit);
+  bf1[29] = half_btf(cospi[18], bf0[18], cospi[46], bf0[29], cos_bit);
+  bf1[30] = half_btf(cospi[34], bf0[17], cospi[30], bf0[30], cos_bit);
+  bf1[31] = half_btf(cospi[2], bf0[16], cospi[62], bf0[31], cos_bit);
+  bf1[32] = clamp_value(bf0[32] + bf0[33], stage_range[stage]);
+  bf1[33] = clamp_value(bf0[32] - bf0[33], stage_range[stage]);
+  bf1[34] = clamp_value(-bf0[34] + bf0[35], stage_range[stage]);
+  bf1[35] = clamp_value(bf0[34] + bf0[35], stage_range[stage]);
+  bf1[36] = clamp_value(bf0[36] + bf0[37], stage_range[stage]);
+  bf1[37] = clamp_value(bf0[36] - bf0[37], stage_range[stage]);
+  bf1[38] = clamp_value(-bf0[38] + bf0[39], stage_range[stage]);
+  bf1[39] = clamp_value(bf0[38] + bf0[39], stage_range[stage]);
+  bf1[40] = clamp_value(bf0[40] + bf0[41], stage_range[stage]);
+  bf1[41] = clamp_value(bf0[40] - bf0[41], stage_range[stage]);
+  bf1[42] = clamp_value(-bf0[42] + bf0[43], stage_range[stage]);
+  bf1[43] = clamp_value(bf0[42] + bf0[43], stage_range[stage]);
+  bf1[44] = clamp_value(bf0[44] + bf0[45], stage_range[stage]);
+  bf1[45] = clamp_value(bf0[44] - bf0[45], stage_range[stage]);
+  bf1[46] = clamp_value(-bf0[46] + bf0[47], stage_range[stage]);
+  bf1[47] = clamp_value(bf0[46] + bf0[47], stage_range[stage]);
+  bf1[48] = clamp_value(bf0[48] + bf0[49], stage_range[stage]);
+  bf1[49] = clamp_value(bf0[48] - bf0[49], stage_range[stage]);
+  bf1[50] = clamp_value(-bf0[50] + bf0[51], stage_range[stage]);
+  bf1[51] = clamp_value(bf0[50] + bf0[51], stage_range[stage]);
+  bf1[52] = clamp_value(bf0[52] + bf0[53], stage_range[stage]);
+  bf1[53] = clamp_value(bf0[52] - bf0[53], stage_range[stage]);
+  bf1[54] = clamp_value(-bf0[54] + bf0[55], stage_range[stage]);
+  bf1[55] = clamp_value(bf0[54] + bf0[55], stage_range[stage]);
+  bf1[56] = clamp_value(bf0[56] + bf0[57], stage_range[stage]);
+  bf1[57] = clamp_value(bf0[56] - bf0[57], stage_range[stage]);
+  bf1[58] = clamp_value(-bf0[58] + bf0[59], stage_range[stage]);
+  bf1[59] = clamp_value(bf0[58] + bf0[59], stage_range[stage]);
+  bf1[60] = clamp_value(bf0[60] + bf0[61], stage_range[stage]);
+  bf1[61] = clamp_value(bf0[60] - bf0[61], stage_range[stage]);
+  bf1[62] = clamp_value(-bf0[62] + bf0[63], stage_range[stage]);
+  bf1[63] = clamp_value(bf0[62] + bf0[63], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 4
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = bf0[4];
+  bf1[5] = bf0[5];
+  bf1[6] = bf0[6];
+  bf1[7] = bf0[7];
+  bf1[8] = half_btf(cospi[60], bf0[8], -cospi[4], bf0[15], cos_bit);
+  bf1[9] = half_btf(cospi[28], bf0[9], -cospi[36], bf0[14], cos_bit);
+  bf1[10] = half_btf(cospi[44], bf0[10], -cospi[20], bf0[13], cos_bit);
+  bf1[11] = half_btf(cospi[12], bf0[11], -cospi[52], bf0[12], cos_bit);
+  bf1[12] = half_btf(cospi[52], bf0[11], cospi[12], bf0[12], cos_bit);
+  bf1[13] = half_btf(cospi[20], bf0[10], cospi[44], bf0[13], cos_bit);
+  bf1[14] = half_btf(cospi[36], bf0[9], cospi[28], bf0[14], cos_bit);
+  bf1[15] = half_btf(cospi[4], bf0[8], cospi[60], bf0[15], cos_bit);
+  bf1[16] = clamp_value(bf0[16] + bf0[17], stage_range[stage]);
+  bf1[17] = clamp_value(bf0[16] - bf0[17], stage_range[stage]);
+  bf1[18] = clamp_value(-bf0[18] + bf0[19], stage_range[stage]);
+  bf1[19] = clamp_value(bf0[18] + bf0[19], stage_range[stage]);
+  bf1[20] = clamp_value(bf0[20] + bf0[21], stage_range[stage]);
+  bf1[21] = clamp_value(bf0[20] - bf0[21], stage_range[stage]);
+  bf1[22] = clamp_value(-bf0[22] + bf0[23], stage_range[stage]);
+  bf1[23] = clamp_value(bf0[22] + bf0[23], stage_range[stage]);
+  bf1[24] = clamp_value(bf0[24] + bf0[25], stage_range[stage]);
+  bf1[25] = clamp_value(bf0[24] - bf0[25], stage_range[stage]);
+  bf1[26] = clamp_value(-bf0[26] + bf0[27], stage_range[stage]);
+  bf1[27] = clamp_value(bf0[26] + bf0[27], stage_range[stage]);
+  bf1[28] = clamp_value(bf0[28] + bf0[29], stage_range[stage]);
+  bf1[29] = clamp_value(bf0[28] - bf0[29], stage_range[stage]);
+  bf1[30] = clamp_value(-bf0[30] + bf0[31], stage_range[stage]);
+  bf1[31] = clamp_value(bf0[30] + bf0[31], stage_range[stage]);
+  bf1[32] = bf0[32];
+  bf1[33] = half_btf(-cospi[4], bf0[33], cospi[60], bf0[62], cos_bit);
+  bf1[34] = half_btf(-cospi[60], bf0[34], -cospi[4], bf0[61], cos_bit);
+  bf1[35] = bf0[35];
+  bf1[36] = bf0[36];
+  bf1[37] = half_btf(-cospi[36], bf0[37], cospi[28], bf0[58], cos_bit);
+  bf1[38] = half_btf(-cospi[28], bf0[38], -cospi[36], bf0[57], cos_bit);
+  bf1[39] = bf0[39];
+  bf1[40] = bf0[40];
+  bf1[41] = half_btf(-cospi[20], bf0[41], cospi[44], bf0[54], cos_bit);
+  bf1[42] = half_btf(-cospi[44], bf0[42], -cospi[20], bf0[53], cos_bit);
+  bf1[43] = bf0[43];
+  bf1[44] = bf0[44];
+  bf1[45] = half_btf(-cospi[52], bf0[45], cospi[12], bf0[50], cos_bit);
+  bf1[46] = half_btf(-cospi[12], bf0[46], -cospi[52], bf0[49], cos_bit);
+  bf1[47] = bf0[47];
+  bf1[48] = bf0[48];
+  bf1[49] = half_btf(-cospi[52], bf0[46], cospi[12], bf0[49], cos_bit);
+  bf1[50] = half_btf(cospi[12], bf0[45], cospi[52], bf0[50], cos_bit);
+  bf1[51] = bf0[51];
+  bf1[52] = bf0[52];
+  bf1[53] = half_btf(-cospi[20], bf0[42], cospi[44], bf0[53], cos_bit);
+  bf1[54] = half_btf(cospi[44], bf0[41], cospi[20], bf0[54], cos_bit);
+  bf1[55] = bf0[55];
+  bf1[56] = bf0[56];
+  bf1[57] = half_btf(-cospi[36], bf0[38], cospi[28], bf0[57], cos_bit);
+  bf1[58] = half_btf(cospi[28], bf0[37], cospi[36], bf0[58], cos_bit);
+  bf1[59] = bf0[59];
+  bf1[60] = bf0[60];
+  bf1[61] = half_btf(-cospi[4], bf0[34], cospi[60], bf0[61], cos_bit);
+  bf1[62] = half_btf(cospi[60], bf0[33], cospi[4], bf0[62], cos_bit);
+  bf1[63] = bf0[63];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 5
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = bf0[0];
+  bf1[1] = bf0[1];
+  bf1[2] = bf0[2];
+  bf1[3] = bf0[3];
+  bf1[4] = half_btf(cospi[56], bf0[4], -cospi[8], bf0[7], cos_bit);
+  bf1[5] = half_btf(cospi[24], bf0[5], -cospi[40], bf0[6], cos_bit);
+  bf1[6] = half_btf(cospi[40], bf0[5], cospi[24], bf0[6], cos_bit);
+  bf1[7] = half_btf(cospi[8], bf0[4], cospi[56], bf0[7], cos_bit);
+  bf1[8] = clamp_value(bf0[8] + bf0[9], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[8] - bf0[9], stage_range[stage]);
+  bf1[10] = clamp_value(-bf0[10] + bf0[11], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[10] + bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[12] + bf0[13], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[12] - bf0[13], stage_range[stage]);
+  bf1[14] = clamp_value(-bf0[14] + bf0[15], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[14] + bf0[15], stage_range[stage]);
+  bf1[16] = bf0[16];
+  bf1[17] = half_btf(-cospi[8], bf0[17], cospi[56], bf0[30], cos_bit);
+  bf1[18] = half_btf(-cospi[56], bf0[18], -cospi[8], bf0[29], cos_bit);
+  bf1[19] = bf0[19];
+  bf1[20] = bf0[20];
+  bf1[21] = half_btf(-cospi[40], bf0[21], cospi[24], bf0[26], cos_bit);
+  bf1[22] = half_btf(-cospi[24], bf0[22], -cospi[40], bf0[25], cos_bit);
+  bf1[23] = bf0[23];
+  bf1[24] = bf0[24];
+  bf1[25] = half_btf(-cospi[40], bf0[22], cospi[24], bf0[25], cos_bit);
+  bf1[26] = half_btf(cospi[24], bf0[21], cospi[40], bf0[26], cos_bit);
+  bf1[27] = bf0[27];
+  bf1[28] = bf0[28];
+  bf1[29] = half_btf(-cospi[8], bf0[18], cospi[56], bf0[29], cos_bit);
+  bf1[30] = half_btf(cospi[56], bf0[17], cospi[8], bf0[30], cos_bit);
+  bf1[31] = bf0[31];
+  bf1[32] = clamp_value(bf0[32] + bf0[35], stage_range[stage]);
+  bf1[33] = clamp_value(bf0[33] + bf0[34], stage_range[stage]);
+  bf1[34] = clamp_value(bf0[33] - bf0[34], stage_range[stage]);
+  bf1[35] = clamp_value(bf0[32] - bf0[35], stage_range[stage]);
+  bf1[36] = clamp_value(-bf0[36] + bf0[39], stage_range[stage]);
+  bf1[37] = clamp_value(-bf0[37] + bf0[38], stage_range[stage]);
+  bf1[38] = clamp_value(bf0[37] + bf0[38], stage_range[stage]);
+  bf1[39] = clamp_value(bf0[36] + bf0[39], stage_range[stage]);
+  bf1[40] = clamp_value(bf0[40] + bf0[43], stage_range[stage]);
+  bf1[41] = clamp_value(bf0[41] + bf0[42], stage_range[stage]);
+  bf1[42] = clamp_value(bf0[41] - bf0[42], stage_range[stage]);
+  bf1[43] = clamp_value(bf0[40] - bf0[43], stage_range[stage]);
+  bf1[44] = clamp_value(-bf0[44] + bf0[47], stage_range[stage]);
+  bf1[45] = clamp_value(-bf0[45] + bf0[46], stage_range[stage]);
+  bf1[46] = clamp_value(bf0[45] + bf0[46], stage_range[stage]);
+  bf1[47] = clamp_value(bf0[44] + bf0[47], stage_range[stage]);
+  bf1[48] = clamp_value(bf0[48] + bf0[51], stage_range[stage]);
+  bf1[49] = clamp_value(bf0[49] + bf0[50], stage_range[stage]);
+  bf1[50] = clamp_value(bf0[49] - bf0[50], stage_range[stage]);
+  bf1[51] = clamp_value(bf0[48] - bf0[51], stage_range[stage]);
+  bf1[52] = clamp_value(-bf0[52] + bf0[55], stage_range[stage]);
+  bf1[53] = clamp_value(-bf0[53] + bf0[54], stage_range[stage]);
+  bf1[54] = clamp_value(bf0[53] + bf0[54], stage_range[stage]);
+  bf1[55] = clamp_value(bf0[52] + bf0[55], stage_range[stage]);
+  bf1[56] = clamp_value(bf0[56] + bf0[59], stage_range[stage]);
+  bf1[57] = clamp_value(bf0[57] + bf0[58], stage_range[stage]);
+  bf1[58] = clamp_value(bf0[57] - bf0[58], stage_range[stage]);
+  bf1[59] = clamp_value(bf0[56] - bf0[59], stage_range[stage]);
+  bf1[60] = clamp_value(-bf0[60] + bf0[63], stage_range[stage]);
+  bf1[61] = clamp_value(-bf0[61] + bf0[62], stage_range[stage]);
+  bf1[62] = clamp_value(bf0[61] + bf0[62], stage_range[stage]);
+  bf1[63] = clamp_value(bf0[60] + bf0[63], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 6
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = half_btf(cospi[32], bf0[0], cospi[32], bf0[1], cos_bit);
+  bf1[1] = half_btf(cospi[32], bf0[0], -cospi[32], bf0[1], cos_bit);
+  bf1[2] = half_btf(cospi[48], bf0[2], -cospi[16], bf0[3], cos_bit);
+  bf1[3] = half_btf(cospi[16], bf0[2], cospi[48], bf0[3], cos_bit);
+  bf1[4] = clamp_value(bf0[4] + bf0[5], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[4] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(-bf0[6] + bf0[7], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[6] + bf0[7], stage_range[stage]);
+  bf1[8] = bf0[8];
+  bf1[9] = half_btf(-cospi[16], bf0[9], cospi[48], bf0[14], cos_bit);
+  bf1[10] = half_btf(-cospi[48], bf0[10], -cospi[16], bf0[13], cos_bit);
+  bf1[11] = bf0[11];
+  bf1[12] = bf0[12];
+  bf1[13] = half_btf(-cospi[16], bf0[10], cospi[48], bf0[13], cos_bit);
+  bf1[14] = half_btf(cospi[48], bf0[9], cospi[16], bf0[14], cos_bit);
+  bf1[15] = bf0[15];
+  bf1[16] = clamp_value(bf0[16] + bf0[19], stage_range[stage]);
+  bf1[17] = clamp_value(bf0[17] + bf0[18], stage_range[stage]);
+  bf1[18] = clamp_value(bf0[17] - bf0[18], stage_range[stage]);
+  bf1[19] = clamp_value(bf0[16] - bf0[19], stage_range[stage]);
+  bf1[20] = clamp_value(-bf0[20] + bf0[23], stage_range[stage]);
+  bf1[21] = clamp_value(-bf0[21] + bf0[22], stage_range[stage]);
+  bf1[22] = clamp_value(bf0[21] + bf0[22], stage_range[stage]);
+  bf1[23] = clamp_value(bf0[20] + bf0[23], stage_range[stage]);
+  bf1[24] = clamp_value(bf0[24] + bf0[27], stage_range[stage]);
+  bf1[25] = clamp_value(bf0[25] + bf0[26], stage_range[stage]);
+  bf1[26] = clamp_value(bf0[25] - bf0[26], stage_range[stage]);
+  bf1[27] = clamp_value(bf0[24] - bf0[27], stage_range[stage]);
+  bf1[28] = clamp_value(-bf0[28] + bf0[31], stage_range[stage]);
+  bf1[29] = clamp_value(-bf0[29] + bf0[30], stage_range[stage]);
+  bf1[30] = clamp_value(bf0[29] + bf0[30], stage_range[stage]);
+  bf1[31] = clamp_value(bf0[28] + bf0[31], stage_range[stage]);
+  bf1[32] = bf0[32];
+  bf1[33] = bf0[33];
+  bf1[34] = half_btf(-cospi[8], bf0[34], cospi[56], bf0[61], cos_bit);
+  bf1[35] = half_btf(-cospi[8], bf0[35], cospi[56], bf0[60], cos_bit);
+  bf1[36] = half_btf(-cospi[56], bf0[36], -cospi[8], bf0[59], cos_bit);
+  bf1[37] = half_btf(-cospi[56], bf0[37], -cospi[8], bf0[58], cos_bit);
+  bf1[38] = bf0[38];
+  bf1[39] = bf0[39];
+  bf1[40] = bf0[40];
+  bf1[41] = bf0[41];
+  bf1[42] = half_btf(-cospi[40], bf0[42], cospi[24], bf0[53], cos_bit);
+  bf1[43] = half_btf(-cospi[40], bf0[43], cospi[24], bf0[52], cos_bit);
+  bf1[44] = half_btf(-cospi[24], bf0[44], -cospi[40], bf0[51], cos_bit);
+  bf1[45] = half_btf(-cospi[24], bf0[45], -cospi[40], bf0[50], cos_bit);
+  bf1[46] = bf0[46];
+  bf1[47] = bf0[47];
+  bf1[48] = bf0[48];
+  bf1[49] = bf0[49];
+  bf1[50] = half_btf(-cospi[40], bf0[45], cospi[24], bf0[50], cos_bit);
+  bf1[51] = half_btf(-cospi[40], bf0[44], cospi[24], bf0[51], cos_bit);
+  bf1[52] = half_btf(cospi[24], bf0[43], cospi[40], bf0[52], cos_bit);
+  bf1[53] = half_btf(cospi[24], bf0[42], cospi[40], bf0[53], cos_bit);
+  bf1[54] = bf0[54];
+  bf1[55] = bf0[55];
+  bf1[56] = bf0[56];
+  bf1[57] = bf0[57];
+  bf1[58] = half_btf(-cospi[8], bf0[37], cospi[56], bf0[58], cos_bit);
+  bf1[59] = half_btf(-cospi[8], bf0[36], cospi[56], bf0[59], cos_bit);
+  bf1[60] = half_btf(cospi[56], bf0[35], cospi[8], bf0[60], cos_bit);
+  bf1[61] = half_btf(cospi[56], bf0[34], cospi[8], bf0[61], cos_bit);
+  bf1[62] = bf0[62];
+  bf1[63] = bf0[63];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 7
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[3], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[2], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[1] - bf0[2], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[0] - bf0[3], stage_range[stage]);
+  bf1[4] = bf0[4];
+  bf1[5] = half_btf(-cospi[32], bf0[5], cospi[32], bf0[6], cos_bit);
+  bf1[6] = half_btf(cospi[32], bf0[5], cospi[32], bf0[6], cos_bit);
+  bf1[7] = bf0[7];
+  bf1[8] = clamp_value(bf0[8] + bf0[11], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[9] + bf0[10], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[9] - bf0[10], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[8] - bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(-bf0[12] + bf0[15], stage_range[stage]);
+  bf1[13] = clamp_value(-bf0[13] + bf0[14], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[13] + bf0[14], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[12] + bf0[15], stage_range[stage]);
+  bf1[16] = bf0[16];
+  bf1[17] = bf0[17];
+  bf1[18] = half_btf(-cospi[16], bf0[18], cospi[48], bf0[29], cos_bit);
+  bf1[19] = half_btf(-cospi[16], bf0[19], cospi[48], bf0[28], cos_bit);
+  bf1[20] = half_btf(-cospi[48], bf0[20], -cospi[16], bf0[27], cos_bit);
+  bf1[21] = half_btf(-cospi[48], bf0[21], -cospi[16], bf0[26], cos_bit);
+  bf1[22] = bf0[22];
+  bf1[23] = bf0[23];
+  bf1[24] = bf0[24];
+  bf1[25] = bf0[25];
+  bf1[26] = half_btf(-cospi[16], bf0[21], cospi[48], bf0[26], cos_bit);
+  bf1[27] = half_btf(-cospi[16], bf0[20], cospi[48], bf0[27], cos_bit);
+  bf1[28] = half_btf(cospi[48], bf0[19], cospi[16], bf0[28], cos_bit);
+  bf1[29] = half_btf(cospi[48], bf0[18], cospi[16], bf0[29], cos_bit);
+  bf1[30] = bf0[30];
+  bf1[31] = bf0[31];
+  bf1[32] = clamp_value(bf0[32] + bf0[39], stage_range[stage]);
+  bf1[33] = clamp_value(bf0[33] + bf0[38], stage_range[stage]);
+  bf1[34] = clamp_value(bf0[34] + bf0[37], stage_range[stage]);
+  bf1[35] = clamp_value(bf0[35] + bf0[36], stage_range[stage]);
+  bf1[36] = clamp_value(bf0[35] - bf0[36], stage_range[stage]);
+  bf1[37] = clamp_value(bf0[34] - bf0[37], stage_range[stage]);
+  bf1[38] = clamp_value(bf0[33] - bf0[38], stage_range[stage]);
+  bf1[39] = clamp_value(bf0[32] - bf0[39], stage_range[stage]);
+  bf1[40] = clamp_value(-bf0[40] + bf0[47], stage_range[stage]);
+  bf1[41] = clamp_value(-bf0[41] + bf0[46], stage_range[stage]);
+  bf1[42] = clamp_value(-bf0[42] + bf0[45], stage_range[stage]);
+  bf1[43] = clamp_value(-bf0[43] + bf0[44], stage_range[stage]);
+  bf1[44] = clamp_value(bf0[43] + bf0[44], stage_range[stage]);
+  bf1[45] = clamp_value(bf0[42] + bf0[45], stage_range[stage]);
+  bf1[46] = clamp_value(bf0[41] + bf0[46], stage_range[stage]);
+  bf1[47] = clamp_value(bf0[40] + bf0[47], stage_range[stage]);
+  bf1[48] = clamp_value(bf0[48] + bf0[55], stage_range[stage]);
+  bf1[49] = clamp_value(bf0[49] + bf0[54], stage_range[stage]);
+  bf1[50] = clamp_value(bf0[50] + bf0[53], stage_range[stage]);
+  bf1[51] = clamp_value(bf0[51] + bf0[52], stage_range[stage]);
+  bf1[52] = clamp_value(bf0[51] - bf0[52], stage_range[stage]);
+  bf1[53] = clamp_value(bf0[50] - bf0[53], stage_range[stage]);
+  bf1[54] = clamp_value(bf0[49] - bf0[54], stage_range[stage]);
+  bf1[55] = clamp_value(bf0[48] - bf0[55], stage_range[stage]);
+  bf1[56] = clamp_value(-bf0[56] + bf0[63], stage_range[stage]);
+  bf1[57] = clamp_value(-bf0[57] + bf0[62], stage_range[stage]);
+  bf1[58] = clamp_value(-bf0[58] + bf0[61], stage_range[stage]);
+  bf1[59] = clamp_value(-bf0[59] + bf0[60], stage_range[stage]);
+  bf1[60] = clamp_value(bf0[59] + bf0[60], stage_range[stage]);
+  bf1[61] = clamp_value(bf0[58] + bf0[61], stage_range[stage]);
+  bf1[62] = clamp_value(bf0[57] + bf0[62], stage_range[stage]);
+  bf1[63] = clamp_value(bf0[56] + bf0[63], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 8
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = clamp_value(bf0[0] + bf0[7], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[6], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[5], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[4], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[3] - bf0[4], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[2] - bf0[5], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[1] - bf0[6], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[0] - bf0[7], stage_range[stage]);
+  bf1[8] = bf0[8];
+  bf1[9] = bf0[9];
+  bf1[10] = half_btf(-cospi[32], bf0[10], cospi[32], bf0[13], cos_bit);
+  bf1[11] = half_btf(-cospi[32], bf0[11], cospi[32], bf0[12], cos_bit);
+  bf1[12] = half_btf(cospi[32], bf0[11], cospi[32], bf0[12], cos_bit);
+  bf1[13] = half_btf(cospi[32], bf0[10], cospi[32], bf0[13], cos_bit);
+  bf1[14] = bf0[14];
+  bf1[15] = bf0[15];
+  bf1[16] = clamp_value(bf0[16] + bf0[23], stage_range[stage]);
+  bf1[17] = clamp_value(bf0[17] + bf0[22], stage_range[stage]);
+  bf1[18] = clamp_value(bf0[18] + bf0[21], stage_range[stage]);
+  bf1[19] = clamp_value(bf0[19] + bf0[20], stage_range[stage]);
+  bf1[20] = clamp_value(bf0[19] - bf0[20], stage_range[stage]);
+  bf1[21] = clamp_value(bf0[18] - bf0[21], stage_range[stage]);
+  bf1[22] = clamp_value(bf0[17] - bf0[22], stage_range[stage]);
+  bf1[23] = clamp_value(bf0[16] - bf0[23], stage_range[stage]);
+  bf1[24] = clamp_value(-bf0[24] + bf0[31], stage_range[stage]);
+  bf1[25] = clamp_value(-bf0[25] + bf0[30], stage_range[stage]);
+  bf1[26] = clamp_value(-bf0[26] + bf0[29], stage_range[stage]);
+  bf1[27] = clamp_value(-bf0[27] + bf0[28], stage_range[stage]);
+  bf1[28] = clamp_value(bf0[27] + bf0[28], stage_range[stage]);
+  bf1[29] = clamp_value(bf0[26] + bf0[29], stage_range[stage]);
+  bf1[30] = clamp_value(bf0[25] + bf0[30], stage_range[stage]);
+  bf1[31] = clamp_value(bf0[24] + bf0[31], stage_range[stage]);
+  bf1[32] = bf0[32];
+  bf1[33] = bf0[33];
+  bf1[34] = bf0[34];
+  bf1[35] = bf0[35];
+  bf1[36] = half_btf(-cospi[16], bf0[36], cospi[48], bf0[59], cos_bit);
+  bf1[37] = half_btf(-cospi[16], bf0[37], cospi[48], bf0[58], cos_bit);
+  bf1[38] = half_btf(-cospi[16], bf0[38], cospi[48], bf0[57], cos_bit);
+  bf1[39] = half_btf(-cospi[16], bf0[39], cospi[48], bf0[56], cos_bit);
+  bf1[40] = half_btf(-cospi[48], bf0[40], -cospi[16], bf0[55], cos_bit);
+  bf1[41] = half_btf(-cospi[48], bf0[41], -cospi[16], bf0[54], cos_bit);
+  bf1[42] = half_btf(-cospi[48], bf0[42], -cospi[16], bf0[53], cos_bit);
+  bf1[43] = half_btf(-cospi[48], bf0[43], -cospi[16], bf0[52], cos_bit);
+  bf1[44] = bf0[44];
+  bf1[45] = bf0[45];
+  bf1[46] = bf0[46];
+  bf1[47] = bf0[47];
+  bf1[48] = bf0[48];
+  bf1[49] = bf0[49];
+  bf1[50] = bf0[50];
+  bf1[51] = bf0[51];
+  bf1[52] = half_btf(-cospi[16], bf0[43], cospi[48], bf0[52], cos_bit);
+  bf1[53] = half_btf(-cospi[16], bf0[42], cospi[48], bf0[53], cos_bit);
+  bf1[54] = half_btf(-cospi[16], bf0[41], cospi[48], bf0[54], cos_bit);
+  bf1[55] = half_btf(-cospi[16], bf0[40], cospi[48], bf0[55], cos_bit);
+  bf1[56] = half_btf(cospi[48], bf0[39], cospi[16], bf0[56], cos_bit);
+  bf1[57] = half_btf(cospi[48], bf0[38], cospi[16], bf0[57], cos_bit);
+  bf1[58] = half_btf(cospi[48], bf0[37], cospi[16], bf0[58], cos_bit);
+  bf1[59] = half_btf(cospi[48], bf0[36], cospi[16], bf0[59], cos_bit);
+  bf1[60] = bf0[60];
+  bf1[61] = bf0[61];
+  bf1[62] = bf0[62];
+  bf1[63] = bf0[63];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 9
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[15], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[14], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[13], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[12], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[4] + bf0[11], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[5] + bf0[10], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[6] + bf0[9], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[7] + bf0[8], stage_range[stage]);
+  bf1[8] = clamp_value(bf0[7] - bf0[8], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[6] - bf0[9], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[5] - bf0[10], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[4] - bf0[11], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[3] - bf0[12], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[2] - bf0[13], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[1] - bf0[14], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[0] - bf0[15], stage_range[stage]);
+  bf1[16] = bf0[16];
+  bf1[17] = bf0[17];
+  bf1[18] = bf0[18];
+  bf1[19] = bf0[19];
+  bf1[20] = half_btf(-cospi[32], bf0[20], cospi[32], bf0[27], cos_bit);
+  bf1[21] = half_btf(-cospi[32], bf0[21], cospi[32], bf0[26], cos_bit);
+  bf1[22] = half_btf(-cospi[32], bf0[22], cospi[32], bf0[25], cos_bit);
+  bf1[23] = half_btf(-cospi[32], bf0[23], cospi[32], bf0[24], cos_bit);
+  bf1[24] = half_btf(cospi[32], bf0[23], cospi[32], bf0[24], cos_bit);
+  bf1[25] = half_btf(cospi[32], bf0[22], cospi[32], bf0[25], cos_bit);
+  bf1[26] = half_btf(cospi[32], bf0[21], cospi[32], bf0[26], cos_bit);
+  bf1[27] = half_btf(cospi[32], bf0[20], cospi[32], bf0[27], cos_bit);
+  bf1[28] = bf0[28];
+  bf1[29] = bf0[29];
+  bf1[30] = bf0[30];
+  bf1[31] = bf0[31];
+  bf1[32] = clamp_value(bf0[32] + bf0[47], stage_range[stage]);
+  bf1[33] = clamp_value(bf0[33] + bf0[46], stage_range[stage]);
+  bf1[34] = clamp_value(bf0[34] + bf0[45], stage_range[stage]);
+  bf1[35] = clamp_value(bf0[35] + bf0[44], stage_range[stage]);
+  bf1[36] = clamp_value(bf0[36] + bf0[43], stage_range[stage]);
+  bf1[37] = clamp_value(bf0[37] + bf0[42], stage_range[stage]);
+  bf1[38] = clamp_value(bf0[38] + bf0[41], stage_range[stage]);
+  bf1[39] = clamp_value(bf0[39] + bf0[40], stage_range[stage]);
+  bf1[40] = clamp_value(bf0[39] - bf0[40], stage_range[stage]);
+  bf1[41] = clamp_value(bf0[38] - bf0[41], stage_range[stage]);
+  bf1[42] = clamp_value(bf0[37] - bf0[42], stage_range[stage]);
+  bf1[43] = clamp_value(bf0[36] - bf0[43], stage_range[stage]);
+  bf1[44] = clamp_value(bf0[35] - bf0[44], stage_range[stage]);
+  bf1[45] = clamp_value(bf0[34] - bf0[45], stage_range[stage]);
+  bf1[46] = clamp_value(bf0[33] - bf0[46], stage_range[stage]);
+  bf1[47] = clamp_value(bf0[32] - bf0[47], stage_range[stage]);
+  bf1[48] = clamp_value(-bf0[48] + bf0[63], stage_range[stage]);
+  bf1[49] = clamp_value(-bf0[49] + bf0[62], stage_range[stage]);
+  bf1[50] = clamp_value(-bf0[50] + bf0[61], stage_range[stage]);
+  bf1[51] = clamp_value(-bf0[51] + bf0[60], stage_range[stage]);
+  bf1[52] = clamp_value(-bf0[52] + bf0[59], stage_range[stage]);
+  bf1[53] = clamp_value(-bf0[53] + bf0[58], stage_range[stage]);
+  bf1[54] = clamp_value(-bf0[54] + bf0[57], stage_range[stage]);
+  bf1[55] = clamp_value(-bf0[55] + bf0[56], stage_range[stage]);
+  bf1[56] = clamp_value(bf0[55] + bf0[56], stage_range[stage]);
+  bf1[57] = clamp_value(bf0[54] + bf0[57], stage_range[stage]);
+  bf1[58] = clamp_value(bf0[53] + bf0[58], stage_range[stage]);
+  bf1[59] = clamp_value(bf0[52] + bf0[59], stage_range[stage]);
+  bf1[60] = clamp_value(bf0[51] + bf0[60], stage_range[stage]);
+  bf1[61] = clamp_value(bf0[50] + bf0[61], stage_range[stage]);
+  bf1[62] = clamp_value(bf0[49] + bf0[62], stage_range[stage]);
+  bf1[63] = clamp_value(bf0[48] + bf0[63], stage_range[stage]);
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 10
+  stage++;
+  bf0 = output;
+  bf1 = step;
+  bf1[0] = clamp_value(bf0[0] + bf0[31], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[30], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[29], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[28], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[4] + bf0[27], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[5] + bf0[26], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[6] + bf0[25], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[7] + bf0[24], stage_range[stage]);
+  bf1[8] = clamp_value(bf0[8] + bf0[23], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[9] + bf0[22], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[10] + bf0[21], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[11] + bf0[20], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[12] + bf0[19], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[13] + bf0[18], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[14] + bf0[17], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[15] + bf0[16], stage_range[stage]);
+  bf1[16] = clamp_value(bf0[15] - bf0[16], stage_range[stage]);
+  bf1[17] = clamp_value(bf0[14] - bf0[17], stage_range[stage]);
+  bf1[18] = clamp_value(bf0[13] - bf0[18], stage_range[stage]);
+  bf1[19] = clamp_value(bf0[12] - bf0[19], stage_range[stage]);
+  bf1[20] = clamp_value(bf0[11] - bf0[20], stage_range[stage]);
+  bf1[21] = clamp_value(bf0[10] - bf0[21], stage_range[stage]);
+  bf1[22] = clamp_value(bf0[9] - bf0[22], stage_range[stage]);
+  bf1[23] = clamp_value(bf0[8] - bf0[23], stage_range[stage]);
+  bf1[24] = clamp_value(bf0[7] - bf0[24], stage_range[stage]);
+  bf1[25] = clamp_value(bf0[6] - bf0[25], stage_range[stage]);
+  bf1[26] = clamp_value(bf0[5] - bf0[26], stage_range[stage]);
+  bf1[27] = clamp_value(bf0[4] - bf0[27], stage_range[stage]);
+  bf1[28] = clamp_value(bf0[3] - bf0[28], stage_range[stage]);
+  bf1[29] = clamp_value(bf0[2] - bf0[29], stage_range[stage]);
+  bf1[30] = clamp_value(bf0[1] - bf0[30], stage_range[stage]);
+  bf1[31] = clamp_value(bf0[0] - bf0[31], stage_range[stage]);
+  bf1[32] = bf0[32];
+  bf1[33] = bf0[33];
+  bf1[34] = bf0[34];
+  bf1[35] = bf0[35];
+  bf1[36] = bf0[36];
+  bf1[37] = bf0[37];
+  bf1[38] = bf0[38];
+  bf1[39] = bf0[39];
+  bf1[40] = half_btf(-cospi[32], bf0[40], cospi[32], bf0[55], cos_bit);
+  bf1[41] = half_btf(-cospi[32], bf0[41], cospi[32], bf0[54], cos_bit);
+  bf1[42] = half_btf(-cospi[32], bf0[42], cospi[32], bf0[53], cos_bit);
+  bf1[43] = half_btf(-cospi[32], bf0[43], cospi[32], bf0[52], cos_bit);
+  bf1[44] = half_btf(-cospi[32], bf0[44], cospi[32], bf0[51], cos_bit);
+  bf1[45] = half_btf(-cospi[32], bf0[45], cospi[32], bf0[50], cos_bit);
+  bf1[46] = half_btf(-cospi[32], bf0[46], cospi[32], bf0[49], cos_bit);
+  bf1[47] = half_btf(-cospi[32], bf0[47], cospi[32], bf0[48], cos_bit);
+  bf1[48] = half_btf(cospi[32], bf0[47], cospi[32], bf0[48], cos_bit);
+  bf1[49] = half_btf(cospi[32], bf0[46], cospi[32], bf0[49], cos_bit);
+  bf1[50] = half_btf(cospi[32], bf0[45], cospi[32], bf0[50], cos_bit);
+  bf1[51] = half_btf(cospi[32], bf0[44], cospi[32], bf0[51], cos_bit);
+  bf1[52] = half_btf(cospi[32], bf0[43], cospi[32], bf0[52], cos_bit);
+  bf1[53] = half_btf(cospi[32], bf0[42], cospi[32], bf0[53], cos_bit);
+  bf1[54] = half_btf(cospi[32], bf0[41], cospi[32], bf0[54], cos_bit);
+  bf1[55] = half_btf(cospi[32], bf0[40], cospi[32], bf0[55], cos_bit);
+  bf1[56] = bf0[56];
+  bf1[57] = bf0[57];
+  bf1[58] = bf0[58];
+  bf1[59] = bf0[59];
+  bf1[60] = bf0[60];
+  bf1[61] = bf0[61];
+  bf1[62] = bf0[62];
+  bf1[63] = bf0[63];
+  av1_range_check_buf(stage, input, bf1, size, stage_range[stage]);
+
+  // stage 11
+  stage++;
+  bf0 = step;
+  bf1 = output;
+  bf1[0] = clamp_value(bf0[0] + bf0[63], stage_range[stage]);
+  bf1[1] = clamp_value(bf0[1] + bf0[62], stage_range[stage]);
+  bf1[2] = clamp_value(bf0[2] + bf0[61], stage_range[stage]);
+  bf1[3] = clamp_value(bf0[3] + bf0[60], stage_range[stage]);
+  bf1[4] = clamp_value(bf0[4] + bf0[59], stage_range[stage]);
+  bf1[5] = clamp_value(bf0[5] + bf0[58], stage_range[stage]);
+  bf1[6] = clamp_value(bf0[6] + bf0[57], stage_range[stage]);
+  bf1[7] = clamp_value(bf0[7] + bf0[56], stage_range[stage]);
+  bf1[8] = clamp_value(bf0[8] + bf0[55], stage_range[stage]);
+  bf1[9] = clamp_value(bf0[9] + bf0[54], stage_range[stage]);
+  bf1[10] = clamp_value(bf0[10] + bf0[53], stage_range[stage]);
+  bf1[11] = clamp_value(bf0[11] + bf0[52], stage_range[stage]);
+  bf1[12] = clamp_value(bf0[12] + bf0[51], stage_range[stage]);
+  bf1[13] = clamp_value(bf0[13] + bf0[50], stage_range[stage]);
+  bf1[14] = clamp_value(bf0[14] + bf0[49], stage_range[stage]);
+  bf1[15] = clamp_value(bf0[15] + bf0[48], stage_range[stage]);
+  bf1[16] = clamp_value(bf0[16] + bf0[47], stage_range[stage]);
+  bf1[17] = clamp_value(bf0[17] + bf0[46], stage_range[stage]);
+  bf1[18] = clamp_value(bf0[18] + bf0[45], stage_range[stage]);
+  bf1[19] = clamp_value(bf0[19] + bf0[44], stage_range[stage]);
+  bf1[20] = clamp_value(bf0[20] + bf0[43], stage_range[stage]);
+  bf1[21] = clamp_value(bf0[21] + bf0[42], stage_range[stage]);
+  bf1[22] = clamp_value(bf0[22] + bf0[41], stage_range[stage]);
+  bf1[23] = clamp_value(bf0[23] + bf0[40], stage_range[stage]);
+  bf1[24] = clamp_value(bf0[24] + bf0[39], stage_range[stage]);
+  bf1[25] = clamp_value(bf0[25] + bf0[38], stage_range[stage]);
+  bf1[26] = clamp_value(bf0[26] + bf0[37], stage_range[stage]);
+  bf1[27] = clamp_value(bf0[27] + bf0[36], stage_range[stage]);
+  bf1[28] = clamp_value(bf0[28] + bf0[35], stage_range[stage]);
+  bf1[29] = clamp_value(bf0[29] + bf0[34], stage_range[stage]);
+  bf1[30] = clamp_value(bf0[30] + bf0[33], stage_range[stage]);
+  bf1[31] = clamp_value(bf0[31] + bf0[32], stage_range[stage]);
+  bf1[32] = clamp_value(bf0[31] - bf0[32], stage_range[stage]);
+  bf1[33] = clamp_value(bf0[30] - bf0[33], stage_range[stage]);
+  bf1[34] = clamp_value(bf0[29] - bf0[34], stage_range[stage]);
+  bf1[35] = clamp_value(bf0[28] - bf0[35], stage_range[stage]);
+  bf1[36] = clamp_value(bf0[27] - bf0[36], stage_range[stage]);
+  bf1[37] = clamp_value(bf0[26] - bf0[37], stage_range[stage]);
+  bf1[38] = clamp_value(bf0[25] - bf0[38], stage_range[stage]);
+  bf1[39] = clamp_value(bf0[24] - bf0[39], stage_range[stage]);
+  bf1[40] = clamp_value(bf0[23] - bf0[40], stage_range[stage]);
+  bf1[41] = clamp_value(bf0[22] - bf0[41], stage_range[stage]);
+  bf1[42] = clamp_value(bf0[21] - bf0[42], stage_range[stage]);
+  bf1[43] = clamp_value(bf0[20] - bf0[43], stage_range[stage]);
+  bf1[44] = clamp_value(bf0[19] - bf0[44], stage_range[stage]);
+  bf1[45] = clamp_value(bf0[18] - bf0[45], stage_range[stage]);
+  bf1[46] = clamp_value(bf0[17] - bf0[46], stage_range[stage]);
+  bf1[47] = clamp_value(bf0[16] - bf0[47], stage_range[stage]);
+  bf1[48] = clamp_value(bf0[15] - bf0[48], stage_range[stage]);
+  bf1[49] = clamp_value(bf0[14] - bf0[49], stage_range[stage]);
+  bf1[50] = clamp_value(bf0[13] - bf0[50], stage_range[stage]);
+  bf1[51] = clamp_value(bf0[12] - bf0[51], stage_range[stage]);
+  bf1[52] = clamp_value(bf0[11] - bf0[52], stage_range[stage]);
+  bf1[53] = clamp_value(bf0[10] - bf0[53], stage_range[stage]);
+  bf1[54] = clamp_value(bf0[9] - bf0[54], stage_range[stage]);
+  bf1[55] = clamp_value(bf0[8] - bf0[55], stage_range[stage]);
+  bf1[56] = clamp_value(bf0[7] - bf0[56], stage_range[stage]);
+  bf1[57] = clamp_value(bf0[6] - bf0[57], stage_range[stage]);
+  bf1[58] = clamp_value(bf0[5] - bf0[58], stage_range[stage]);
+  bf1[59] = clamp_value(bf0[4] - bf0[59], stage_range[stage]);
+  bf1[60] = clamp_value(bf0[3] - bf0[60], stage_range[stage]);
+  bf1[61] = clamp_value(bf0[2] - bf0[61], stage_range[stage]);
+  bf1[62] = clamp_value(bf0[1] - bf0[62], stage_range[stage]);
+  bf1[63] = clamp_value(bf0[0] - bf0[63], stage_range[stage]);
+}

diff --git a/libav1/av1/common/av1_inv_txfm1d.h b/libav1/av1/common/av1_inv_txfm1d.h
new file mode 100644
index 0000000..c31c019
--- /dev/null
+++ b/libav1/av1/common/av1_inv_txfm1d.h

@@ -0,0 +1,61 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_AV1_INV_TXFM1D_H_
+#define AOM_AV1_COMMON_AV1_INV_TXFM1D_H_
+
+#include "av1/common/av1_txfm.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+static INLINE int32_t clamp_value(int32_t value, int8_t bit) {
+  if (bit <= 0) return value;  // Do nothing for invalid clamp bit.
+  const int64_t max_value = (1LL << (bit - 1)) - 1;
+  const int64_t min_value = -(1LL << (bit - 1));
+  return (int32_t)clamp64(value, min_value, max_value);
+}
+
+static INLINE void clamp_buf(int32_t *buf, int32_t size, int8_t bit) {
+  for (int i = 0; i < size; ++i) buf[i] = clamp_value(buf[i], bit);
+}
+
+void av1_idct4_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                   const int8_t *stage_range);
+void av1_idct8_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                   const int8_t *stage_range);
+void av1_idct16_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range);
+void av1_idct32_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range);
+void av1_idct64_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range);
+void av1_iadst4_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range);
+void av1_iadst8_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                    const int8_t *stage_range);
+void av1_iadst16_new(const int32_t *input, int32_t *output, int8_t cos_bit,
+                     const int8_t *stage_range);
+void av1_iidentity4_c(const int32_t *input, int32_t *output, int8_t cos_bit,
+                      const int8_t *stage_range);
+void av1_iidentity8_c(const int32_t *input, int32_t *output, int8_t cos_bit,
+                      const int8_t *stage_range);
+void av1_iidentity16_c(const int32_t *input, int32_t *output, int8_t cos_bit,
+                       const int8_t *stage_range);
+void av1_iidentity32_c(const int32_t *input, int32_t *output, int8_t cos_bit,
+                       const int8_t *stage_range);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  // AOM_AV1_COMMON_AV1_INV_TXFM1D_H_

diff --git a/libav1/av1/common/av1_inv_txfm1d_cfg.h b/libav1/av1/common/av1_inv_txfm1d_cfg.h
new file mode 100644
index 0000000..7d80a00
--- /dev/null
+++ b/libav1/av1/common/av1_inv_txfm1d_cfg.h

@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_AV1_INV_TXFM1D_CFG_H_
+#define AOM_AV1_COMMON_AV1_INV_TXFM1D_CFG_H_
+#include "av1/common/av1_inv_txfm1d.h"
+
+// sum of fwd_shift_##
+static const int8_t inv_start_range[TX_SIZES_ALL] = {
+  5,  // 4x4 transform
+  6,  // 8x8 transform
+  7,  // 16x16 transform
+  7,  // 32x32 transform
+  7,  // 64x64 transform
+  5,  // 4x8 transform
+  5,  // 8x4 transform
+  6,  // 8x16 transform
+  6,  // 16x8 transform
+  6,  // 16x32 transform
+  6,  // 32x16 transform
+  6,  // 32x64 transform
+  6,  // 64x32 transform
+  6,  // 4x16 transform
+  6,  // 16x4 transform
+  7,  // 8x32 transform
+  7,  // 32x8 transform
+  7,  // 16x64 transform
+  7,  // 64x16 transform
+};
+
+extern const int8_t *inv_txfm_shift_ls[TX_SIZES_ALL];
+
+// Values in both inv_cos_bit_col and inv_cos_bit_row are always 12
+// for each valid row and col combination
+#define INV_COS_BIT 12
+extern const int8_t inv_cos_bit_col[5 /*row*/][5 /*col*/];
+extern const int8_t inv_cos_bit_row[5 /*row*/][5 /*col*/];
+
+#endif  // AOM_AV1_COMMON_AV1_INV_TXFM1D_CFG_H_

diff --git a/libav1/av1/common/av1_inv_txfm2d.c b/libav1/av1/common/av1_inv_txfm2d.c
new file mode 100644
index 0000000..d20bd02
--- /dev/null
+++ b/libav1/av1/common/av1_inv_txfm2d.c

@@ -0,0 +1,582 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "config/aom_dsp_rtcd.h"
+#include "config/av1_rtcd.h"
+
+#include "av1/common/enums.h"
+#include "av1/common/av1_txfm.h"
+#include "av1/common/av1_inv_txfm1d.h"
+#include "av1/common/av1_inv_txfm1d_cfg.h"
+
+void av1_highbd_iwht4x4_16_add_c(const tran_low_t *input, uint8_t *dest8,
+                                 int stride, int bd) {
+  /* 4-point reversible, orthonormal inverse Walsh-Hadamard in 3.5 adds,
+     0.5 shifts per pixel. */
+  int i;
+  tran_low_t output[16];
+  tran_low_t a1, b1, c1, d1, e1;
+  const tran_low_t *ip = input;
+  tran_low_t *op = output;
+  uint16_t *dest = CONVERT_TO_SHORTPTR(dest8);
+
+  for (i = 0; i < 4; i++) {
+    a1 = ip[0] >> UNIT_QUANT_SHIFT;
+    c1 = ip[1] >> UNIT_QUANT_SHIFT;
+    d1 = ip[2] >> UNIT_QUANT_SHIFT;
+    b1 = ip[3] >> UNIT_QUANT_SHIFT;
+    a1 += c1;
+    d1 -= b1;
+    e1 = (a1 - d1) >> 1;
+    b1 = e1 - b1;
+    c1 = e1 - c1;
+    a1 -= b1;
+    d1 += c1;
+
+    op[0] = a1;
+    op[1] = b1;
+    op[2] = c1;
+    op[3] = d1;
+    ip += 4;
+    op += 4;
+  }
+
+  ip = output;
+  for (i = 0; i < 4; i++) {
+    a1 = ip[4 * 0];
+    c1 = ip[4 * 1];
+    d1 = ip[4 * 2];
+    b1 = ip[4 * 3];
+    a1 += c1;
+    d1 -= b1;
+    e1 = (a1 - d1) >> 1;
+    b1 = e1 - b1;
+    c1 = e1 - c1;
+    a1 -= b1;
+    d1 += c1;
+
+    range_check_value(a1, bd + 1);
+    range_check_value(b1, bd + 1);
+    range_check_value(c1, bd + 1);
+    range_check_value(d1, bd + 1);
+
+    dest[stride * 0] = highbd_clip_pixel_add(dest[stride * 0], a1, bd);
+    dest[stride * 1] = highbd_clip_pixel_add(dest[stride * 1], b1, bd);
+    dest[stride * 2] = highbd_clip_pixel_add(dest[stride * 2], c1, bd);
+    dest[stride * 3] = highbd_clip_pixel_add(dest[stride * 3], d1, bd);
+
+    ip++;
+    dest++;
+  }
+}
+
+void av1_highbd_iwht4x4_1_add_c(const tran_low_t *in, uint8_t *dest8,
+                                int dest_stride, int bd) {
+  int i;
+  tran_low_t a1, e1;
+  tran_low_t tmp[4];
+  const tran_low_t *ip = in;
+  tran_low_t *op = tmp;
+  uint16_t *dest = CONVERT_TO_SHORTPTR(dest8);
+  (void)bd;
+
+  a1 = ip[0] >> UNIT_QUANT_SHIFT;
+  e1 = a1 >> 1;
+  a1 -= e1;
+  op[0] = a1;
+  op[1] = op[2] = op[3] = e1;
+
+  ip = tmp;
+  for (i = 0; i < 4; i++) {
+    e1 = ip[0] >> 1;
+    a1 = ip[0] - e1;
+    dest[dest_stride * 0] =
+        highbd_clip_pixel_add(dest[dest_stride * 0], a1, bd);
+    dest[dest_stride * 1] =
+        highbd_clip_pixel_add(dest[dest_stride * 1], e1, bd);
+    dest[dest_stride * 2] =
+        highbd_clip_pixel_add(dest[dest_stride * 2], e1, bd);
+    dest[dest_stride * 3] =
+        highbd_clip_pixel_add(dest[dest_stride * 3], e1, bd);
+    ip++;
+    dest++;
+  }
+}
+
+static INLINE TxfmFunc inv_txfm_type_to_func(TXFM_TYPE txfm_type) {
+  switch (txfm_type) {
+    case TXFM_TYPE_DCT4: return av1_idct4_new;
+    case TXFM_TYPE_DCT8: return av1_idct8_new;
+    case TXFM_TYPE_DCT16: return av1_idct16_new;
+    case TXFM_TYPE_DCT32: return av1_idct32_new;
+    case TXFM_TYPE_DCT64: return av1_idct64_new;
+    case TXFM_TYPE_ADST4: return av1_iadst4_new;
+    case TXFM_TYPE_ADST8: return av1_iadst8_new;
+    case TXFM_TYPE_ADST16: return av1_iadst16_new;
+    case TXFM_TYPE_IDENTITY4: return av1_iidentity4_c;
+    case TXFM_TYPE_IDENTITY8: return av1_iidentity8_c;
+    case TXFM_TYPE_IDENTITY16: return av1_iidentity16_c;
+    case TXFM_TYPE_IDENTITY32: return av1_iidentity32_c;
+    default: assert(0); return NULL;
+  }
+}
+
+static const int8_t inv_shift_4x4[2] = { 0, -4 };
+static const int8_t inv_shift_8x8[2] = { -1, -4 };
+static const int8_t inv_shift_16x16[2] = { -2, -4 };
+static const int8_t inv_shift_32x32[2] = { -2, -4 };
+static const int8_t inv_shift_64x64[2] = { -2, -4 };
+static const int8_t inv_shift_4x8[2] = { 0, -4 };
+static const int8_t inv_shift_8x4[2] = { 0, -4 };
+static const int8_t inv_shift_8x16[2] = { -1, -4 };
+static const int8_t inv_shift_16x8[2] = { -1, -4 };
+static const int8_t inv_shift_16x32[2] = { -1, -4 };
+static const int8_t inv_shift_32x16[2] = { -1, -4 };
+static const int8_t inv_shift_32x64[2] = { -1, -4 };
+static const int8_t inv_shift_64x32[2] = { -1, -4 };
+static const int8_t inv_shift_4x16[2] = { -1, -4 };
+static const int8_t inv_shift_16x4[2] = { -1, -4 };
+static const int8_t inv_shift_8x32[2] = { -2, -4 };
+static const int8_t inv_shift_32x8[2] = { -2, -4 };
+static const int8_t inv_shift_16x64[2] = { -2, -4 };
+static const int8_t inv_shift_64x16[2] = { -2, -4 };
+
+const int8_t *inv_txfm_shift_ls[TX_SIZES_ALL] = {
+  inv_shift_4x4,   inv_shift_8x8,   inv_shift_16x16, inv_shift_32x32,
+  inv_shift_64x64, inv_shift_4x8,   inv_shift_8x4,   inv_shift_8x16,
+  inv_shift_16x8,  inv_shift_16x32, inv_shift_32x16, inv_shift_32x64,
+  inv_shift_64x32, inv_shift_4x16,  inv_shift_16x4,  inv_shift_8x32,
+  inv_shift_32x8,  inv_shift_16x64, inv_shift_64x16,
+};
+
+/* clang-format off */
+const int8_t inv_cos_bit_col[MAX_TXWH_IDX]      // txw_idx
+                            [MAX_TXWH_IDX] = {  // txh_idx
+    { INV_COS_BIT, INV_COS_BIT, INV_COS_BIT,           0,           0 },
+    { INV_COS_BIT, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT,           0 },
+    { INV_COS_BIT, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT },
+    {           0, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT },
+    {           0,           0, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT }
+  };
+
+const int8_t inv_cos_bit_row[MAX_TXWH_IDX]      // txw_idx
+                            [MAX_TXWH_IDX] = {  // txh_idx
+    { INV_COS_BIT, INV_COS_BIT, INV_COS_BIT,           0,           0 },
+    { INV_COS_BIT, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT,           0 },
+    { INV_COS_BIT, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT },
+    {           0, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT },
+    {           0,           0, INV_COS_BIT, INV_COS_BIT, INV_COS_BIT }
+  };
+/* clang-format on */
+
+const int8_t iadst4_range[7] = { 0, 1, 0, 0, 0, 0, 0 };
+
+void av1_get_inv_txfm_cfg(TX_TYPE tx_type, TX_SIZE tx_size,
+                          TXFM_2D_FLIP_CFG *cfg) {
+  assert(cfg != NULL);
+  cfg->tx_size = tx_size;
+  av1_zero(cfg->stage_range_col);
+  av1_zero(cfg->stage_range_row);
+  set_flip_cfg(tx_type, cfg);
+  const TX_TYPE_1D tx_type_1d_col = vtx_tab[tx_type];
+  const TX_TYPE_1D tx_type_1d_row = htx_tab[tx_type];
+  cfg->shift = inv_txfm_shift_ls[tx_size];
+  const int txw_idx = get_txw_idx(tx_size);
+  const int txh_idx = get_txh_idx(tx_size);
+  cfg->cos_bit_col = inv_cos_bit_col[txw_idx][txh_idx];
+  cfg->cos_bit_row = inv_cos_bit_row[txw_idx][txh_idx];
+  cfg->txfm_type_col = av1_txfm_type_ls[txh_idx][tx_type_1d_col];
+  if (cfg->txfm_type_col == TXFM_TYPE_ADST4) {
+    memcpy(cfg->stage_range_col, iadst4_range, sizeof(iadst4_range));
+  }
+  cfg->txfm_type_row = av1_txfm_type_ls[txw_idx][tx_type_1d_row];
+  if (cfg->txfm_type_row == TXFM_TYPE_ADST4) {
+    memcpy(cfg->stage_range_row, iadst4_range, sizeof(iadst4_range));
+  }
+  cfg->stage_num_col = av1_txfm_stage_num_list[cfg->txfm_type_col];
+  cfg->stage_num_row = av1_txfm_stage_num_list[cfg->txfm_type_row];
+}
+
+void av1_gen_inv_stage_range(int8_t *stage_range_col, int8_t *stage_range_row,
+                             const TXFM_2D_FLIP_CFG *cfg, TX_SIZE tx_size,
+                             int bd) {
+  const int fwd_shift = inv_start_range[tx_size];
+  const int8_t *shift = cfg->shift;
+  int8_t opt_range_row, opt_range_col;
+  if (bd == 8) {
+    opt_range_row = 16;
+    opt_range_col = 16;
+  } else if (bd == 10) {
+    opt_range_row = 18;
+    opt_range_col = 16;
+  } else {
+    assert(bd == 12);
+    opt_range_row = 20;
+    opt_range_col = 18;
+  }
+  // i < MAX_TXFM_STAGE_NUM will mute above array bounds warning
+  for (int i = 0; i < cfg->stage_num_row && i < MAX_TXFM_STAGE_NUM; ++i) {
+    int real_range_row = cfg->stage_range_row[i] + fwd_shift + bd + 1;
+    (void)real_range_row;
+    if (cfg->txfm_type_row == TXFM_TYPE_ADST4 && i == 1) {
+      // the adst4 may use 1 extra bit on top of opt_range_row at stage 1
+      // so opt_range_col >= real_range_col will not hold
+      stage_range_row[i] = opt_range_row;
+    } else {
+      assert(opt_range_row >= real_range_row);
+      stage_range_row[i] = opt_range_row;
+    }
+  }
+  // i < MAX_TXFM_STAGE_NUM will mute above array bounds warning
+  for (int i = 0; i < cfg->stage_num_col && i < MAX_TXFM_STAGE_NUM; ++i) {
+    int real_range_col =
+        cfg->stage_range_col[i] + fwd_shift + shift[0] + bd + 1;
+    (void)real_range_col;
+    if (cfg->txfm_type_col == TXFM_TYPE_ADST4 && i == 1) {
+      // the adst4 may use 1 extra bit on top of opt_range_row at stage 1
+      // so opt_range_col >= real_range_col will not hold
+      stage_range_col[i] = opt_range_col;
+    } else {
+      assert(opt_range_col >= real_range_col);
+      stage_range_col[i] = opt_range_col;
+    }
+  }
+}
+
+static INLINE void inv_txfm2d_add_c(const int32_t *input, uint16_t *output,
+                                    int stride, TXFM_2D_FLIP_CFG *cfg,
+                                    int32_t *txfm_buf, TX_SIZE tx_size,
+                                    int bd) {
+  // Note when assigning txfm_size_col, we use the txfm_size from the
+  // row configuration and vice versa. This is intentionally done to
+  // accurately perform rectangular transforms. When the transform is
+  // rectangular, the number of columns will be the same as the
+  // txfm_size stored in the row cfg struct. It will make no difference
+  // for square transforms.
+  const int txfm_size_col = tx_size_wide[cfg->tx_size];
+  const int txfm_size_row = tx_size_high[cfg->tx_size];
+  // Take the shift from the larger dimension in the rectangular case.
+  const int8_t *shift = cfg->shift;
+  const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
+  int8_t stage_range_row[MAX_TXFM_STAGE_NUM];
+  int8_t stage_range_col[MAX_TXFM_STAGE_NUM];
+  assert(cfg->stage_num_row <= MAX_TXFM_STAGE_NUM);
+  assert(cfg->stage_num_col <= MAX_TXFM_STAGE_NUM);
+  av1_gen_inv_stage_range(stage_range_col, stage_range_row, cfg, tx_size, bd);
+
+  const int8_t cos_bit_col = cfg->cos_bit_col;
+  const int8_t cos_bit_row = cfg->cos_bit_row;
+  const TxfmFunc txfm_func_col = inv_txfm_type_to_func(cfg->txfm_type_col);
+  const TxfmFunc txfm_func_row = inv_txfm_type_to_func(cfg->txfm_type_row);
+
+  // txfm_buf's length is  txfm_size_row * txfm_size_col + 2 *
+  // AOMMAX(txfm_size_row, txfm_size_col)
+  // it is used for intermediate data buffering
+  const int buf_offset = AOMMAX(txfm_size_row, txfm_size_col);
+  int32_t *temp_in = txfm_buf;
+  int32_t *temp_out = temp_in + buf_offset;
+  int32_t *buf = temp_out + buf_offset;
+  int32_t *buf_ptr = buf;
+  int c, r;
+
+  // Rows
+  for (r = 0; r < txfm_size_row; ++r) {
+    if (abs(rect_type) == 1) {
+      for (c = 0; c < txfm_size_col; ++c) {
+        temp_in[c] = round_shift((int64_t)input[c] * NewInvSqrt2, NewSqrt2Bits);
+      }
+      clamp_buf(temp_in, txfm_size_col, bd + 8);
+      txfm_func_row(temp_in, buf_ptr, cos_bit_row, stage_range_row);
+    } else {
+      for (c = 0; c < txfm_size_col; ++c) {
+        temp_in[c] = input[c];
+      }
+      clamp_buf(temp_in, txfm_size_col, bd + 8);
+      txfm_func_row(temp_in, buf_ptr, cos_bit_row, stage_range_row);
+    }
+    av1_round_shift_array(buf_ptr, txfm_size_col, -shift[0]);
+    input += txfm_size_col;
+    buf_ptr += txfm_size_col;
+  }
+
+  // Columns
+  for (c = 0; c < txfm_size_col; ++c) {
+    if (cfg->lr_flip == 0) {
+      for (r = 0; r < txfm_size_row; ++r)
+        temp_in[r] = buf[r * txfm_size_col + c];
+    } else {
+      // flip left right
+      for (r = 0; r < txfm_size_row; ++r)
+        temp_in[r] = buf[r * txfm_size_col + (txfm_size_col - c - 1)];
+    }
+    clamp_buf(temp_in, txfm_size_row, AOMMAX(bd + 6, 16));
+    txfm_func_col(temp_in, temp_out, cos_bit_col, stage_range_col);
+    av1_round_shift_array(temp_out, txfm_size_row, -shift[1]);
+    if (cfg->ud_flip == 0) {
+      for (r = 0; r < txfm_size_row; ++r) {
+        output[r * stride + c] =
+            highbd_clip_pixel_add(output[r * stride + c], temp_out[r], bd);
+      }
+    } else {
+      // flip upside down
+      for (r = 0; r < txfm_size_row; ++r) {
+        output[r * stride + c] = highbd_clip_pixel_add(
+            output[r * stride + c], temp_out[txfm_size_row - r - 1], bd);
+      }
+    }
+  }
+}
+
+void inv_txfm2d_c(const int32_t *input, uint16_t *output, int stride, TXFM_2D_FLIP_CFG *cfg,
+                                    int32_t *txfm_buf, TX_SIZE tx_size, int bd) {
+  // Note when assigning txfm_size_col, we use the txfm_size from the
+  // row configuration and vice versa. This is intentionally done to
+  // accurately perform rectangular transforms. When the transform is
+  // rectangular, the number of columns will be the same as the
+  // txfm_size stored in the row cfg struct. It will make no difference
+  // for square transforms.
+  const int txfm_size_col = tx_size_wide[cfg->tx_size];
+  const int txfm_size_row = tx_size_high[cfg->tx_size];
+  // Take the shift from the larger dimension in the rectangular case.
+  const int8_t *shift = cfg->shift;
+  const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
+  int8_t stage_range_row[MAX_TXFM_STAGE_NUM];
+  int8_t stage_range_col[MAX_TXFM_STAGE_NUM];
+  assert(cfg->stage_num_row <= MAX_TXFM_STAGE_NUM);
+  assert(cfg->stage_num_col <= MAX_TXFM_STAGE_NUM);
+  av1_gen_inv_stage_range(stage_range_col, stage_range_row, cfg, tx_size, bd);
+
+  const int8_t cos_bit_col = cfg->cos_bit_col;
+  const int8_t cos_bit_row = cfg->cos_bit_row;
+  const TxfmFunc txfm_func_col = inv_txfm_type_to_func(cfg->txfm_type_col);
+  const TxfmFunc txfm_func_row = inv_txfm_type_to_func(cfg->txfm_type_row);
+
+  // txfm_buf's length is  txfm_size_row * txfm_size_col + 2 *
+  // AOMMAX(txfm_size_row, txfm_size_col)
+  // it is used for intermediate data buffering
+  const int buf_offset = AOMMAX(txfm_size_row, txfm_size_col);
+  int32_t *temp_in = txfm_buf;
+  int32_t *temp_out = temp_in + buf_offset;
+  int32_t *buf = temp_out + buf_offset;
+  int32_t *buf_ptr = buf;
+  int c, r;
+
+  // Rows
+  for (r = 0; r < txfm_size_row; ++r) {
+    if (abs(rect_type) == 1) {
+      for (c = 0; c < txfm_size_col; ++c) {
+        temp_in[c] = round_shift((int64_t)input[c] * NewInvSqrt2, NewSqrt2Bits);
+      }
+      clamp_buf(temp_in, txfm_size_col, bd + 8);
+      txfm_func_row(temp_in, buf_ptr, cos_bit_row, stage_range_row);
+    } else {
+      for (c = 0; c < txfm_size_col; ++c) {
+        temp_in[c] = input[c];
+      }
+      clamp_buf(temp_in, txfm_size_col, bd + 8);
+      txfm_func_row(temp_in, buf_ptr, cos_bit_row, stage_range_row);
+    }
+    av1_round_shift_array(buf_ptr, txfm_size_col, -shift[0]);
+    input += txfm_size_col;
+    buf_ptr += txfm_size_col;
+  }
+
+  // Columns
+  for (c = 0; c < txfm_size_col; ++c) {
+    if (cfg->lr_flip == 0) {
+      for (r = 0; r < txfm_size_row; ++r) temp_in[r] = buf[r * txfm_size_col + c];
+    } else {
+      // flip left right
+      for (r = 0; r < txfm_size_row; ++r) temp_in[r] = buf[r * txfm_size_col + (txfm_size_col - c - 1)];
+    }
+    clamp_buf(temp_in, txfm_size_row, AOMMAX(bd + 6, 16));
+    txfm_func_col(temp_in, temp_out, cos_bit_col, stage_range_col);
+    av1_round_shift_array(temp_out, txfm_size_row, -shift[1]);
+    if (cfg->ud_flip == 0) {
+      for (r = 0; r < txfm_size_row; ++r) {
+        output[r * stride + c] = temp_out[r];
+      }
+    } else {
+      // flip upside down
+      for (r = 0; r < txfm_size_row; ++r) {
+        output[r * stride + c] = temp_out[txfm_size_row - r - 1];
+      }
+    }
+  }
+}
+
+static INLINE void inv_txfm2d_add_facade(const int32_t *input, uint16_t *output,
+                                         int stride, int32_t *txfm_buf,
+                                         TX_TYPE tx_type, TX_SIZE tx_size,
+                                         int bd) {
+  TXFM_2D_FLIP_CFG cfg;
+  av1_get_inv_txfm_cfg(tx_type, tx_size, &cfg);
+  // Forward shift sum uses larger square size, to be consistent with what
+  // av1_gen_inv_stage_range() does for inverse shifts.
+  inv_txfm2d_add_c(input, output, stride, &cfg, txfm_buf, tx_size, bd);
+}
+
+void av1_inv_txfm2d_add_4x8_c(const int32_t *input, uint16_t *output,
+                              int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[4 * 8 + 8 + 8]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_4X8, bd);
+}
+
+void av1_inv_txfm2d_add_8x4_c(const int32_t *input, uint16_t *output,
+                              int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[8 * 4 + 8 + 8]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_8X4, bd);
+}
+
+void av1_inv_txfm2d_add_8x16_c(const int32_t *input, uint16_t *output,
+                               int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[8 * 16 + 16 + 16]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_8X16, bd);
+}
+
+void av1_inv_txfm2d_add_16x8_c(const int32_t *input, uint16_t *output,
+                               int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[16 * 8 + 16 + 16]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_16X8, bd);
+}
+
+void av1_inv_txfm2d_add_16x32_c(const int32_t *input, uint16_t *output,
+                                int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[16 * 32 + 32 + 32]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_16X32, bd);
+}
+
+void av1_inv_txfm2d_add_32x16_c(const int32_t *input, uint16_t *output,
+                                int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[32 * 16 + 32 + 32]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_32X16, bd);
+}
+
+void av1_inv_txfm2d_add_4x4_c(const int32_t *input, uint16_t *output,
+                              int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[4 * 4 + 4 + 4]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_4X4, bd);
+}
+
+void av1_inv_txfm2d_add_8x8_c(const int32_t *input, uint16_t *output,
+                              int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[8 * 8 + 8 + 8]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_8X8, bd);
+}
+
+void av1_inv_txfm2d_add_16x16_c(const int32_t *input, uint16_t *output,
+                                int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[16 * 16 + 16 + 16]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_16X16, bd);
+}
+
+void av1_inv_txfm2d_add_32x32_c(const int32_t *input, uint16_t *output,
+                                int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[32 * 32 + 32 + 32]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_32X32, bd);
+}
+
+void av1_inv_txfm2d_add_64x64_c(const int32_t *input, uint16_t *output,
+                                int stride, TX_TYPE tx_type, int bd) {
+  // TODO(urvang): Can the same array be reused, instead of using a new array?
+  // Remap 32x32 input into a modified 64x64 by:
+  // - Copying over these values in top-left 32x32 locations.
+  // - Setting the rest of the locations to 0.
+  int32_t mod_input[64 * 64];
+  for (int row = 0; row < 32; ++row) {
+    memcpy(mod_input + row * 64, input + row * 32, 32 * sizeof(*mod_input));
+    memset(mod_input + row * 64 + 32, 0, 32 * sizeof(*mod_input));
+  }
+  memset(mod_input + 32 * 64, 0, 32 * 64 * sizeof(*mod_input));
+  DECLARE_ALIGNED(32, int, txfm_buf[64 * 64 + 64 + 64]);
+  inv_txfm2d_add_facade(mod_input, output, stride, txfm_buf, tx_type, TX_64X64,
+                        bd);
+}
+
+void av1_inv_txfm2d_add_64x32_c(const int32_t *input, uint16_t *output,
+                                int stride, TX_TYPE tx_type, int bd) {
+  // Remap 32x32 input into a modified 64x32 by:
+  // - Copying over these values in top-left 32x32 locations.
+  // - Setting the rest of the locations to 0.
+  int32_t mod_input[64 * 32];
+  for (int row = 0; row < 32; ++row) {
+    memcpy(mod_input + row * 64, input + row * 32, 32 * sizeof(*mod_input));
+    memset(mod_input + row * 64 + 32, 0, 32 * sizeof(*mod_input));
+  }
+  DECLARE_ALIGNED(32, int, txfm_buf[64 * 32 + 64 + 64]);
+  inv_txfm2d_add_facade(mod_input, output, stride, txfm_buf, tx_type, TX_64X32,
+                        bd);
+}
+
+void av1_inv_txfm2d_add_32x64_c(const int32_t *input, uint16_t *output,
+                                int stride, TX_TYPE tx_type, int bd) {
+  // Remap 32x32 input into a modified 32x64 input by:
+  // - Copying over these values in top-left 32x32 locations.
+  // - Setting the rest of the locations to 0.
+  int32_t mod_input[32 * 64];
+  memcpy(mod_input, input, 32 * 32 * sizeof(*mod_input));
+  memset(mod_input + 32 * 32, 0, 32 * 32 * sizeof(*mod_input));
+  DECLARE_ALIGNED(32, int, txfm_buf[64 * 32 + 64 + 64]);
+  inv_txfm2d_add_facade(mod_input, output, stride, txfm_buf, tx_type, TX_32X64,
+                        bd);
+}
+
+void av1_inv_txfm2d_add_16x64_c(const int32_t *input, uint16_t *output,
+                                int stride, TX_TYPE tx_type, int bd) {
+  // Remap 16x32 input into a modified 16x64 input by:
+  // - Copying over these values in top-left 16x32 locations.
+  // - Setting the rest of the locations to 0.
+  int32_t mod_input[16 * 64];
+  memcpy(mod_input, input, 16 * 32 * sizeof(*mod_input));
+  memset(mod_input + 16 * 32, 0, 16 * 32 * sizeof(*mod_input));
+  DECLARE_ALIGNED(32, int, txfm_buf[16 * 64 + 64 + 64]);
+  inv_txfm2d_add_facade(mod_input, output, stride, txfm_buf, tx_type, TX_16X64,
+                        bd);
+}
+
+void av1_inv_txfm2d_add_64x16_c(const int32_t *input, uint16_t *output,
+                                int stride, TX_TYPE tx_type, int bd) {
+  // Remap 32x16 input into a modified 64x16 by:
+  // - Copying over these values in top-left 32x16 locations.
+  // - Setting the rest of the locations to 0.
+  int32_t mod_input[64 * 16];
+  for (int row = 0; row < 16; ++row) {
+    memcpy(mod_input + row * 64, input + row * 32, 32 * sizeof(*mod_input));
+    memset(mod_input + row * 64 + 32, 0, 32 * sizeof(*mod_input));
+  }
+  DECLARE_ALIGNED(32, int, txfm_buf[16 * 64 + 64 + 64]);
+  inv_txfm2d_add_facade(mod_input, output, stride, txfm_buf, tx_type, TX_64X16,
+                        bd);
+}
+
+void av1_inv_txfm2d_add_4x16_c(const int32_t *input, uint16_t *output,
+                               int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[4 * 16 + 16 + 16]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_4X16, bd);
+}
+
+void av1_inv_txfm2d_add_16x4_c(const int32_t *input, uint16_t *output,
+                               int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[4 * 16 + 16 + 16]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_16X4, bd);
+}
+
+void av1_inv_txfm2d_add_8x32_c(const int32_t *input, uint16_t *output,
+                               int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[8 * 32 + 32 + 32]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_8X32, bd);
+}
+
+void av1_inv_txfm2d_add_32x8_c(const int32_t *input, uint16_t *output,
+                               int stride, TX_TYPE tx_type, int bd) {
+  DECLARE_ALIGNED(32, int, txfm_buf[8 * 32 + 32 + 32]);
+  inv_txfm2d_add_facade(input, output, stride, txfm_buf, tx_type, TX_32X8, bd);
+}

diff --git a/libav1/av1/common/av1_loopfilter.c b/libav1/av1/common/av1_loopfilter.c
new file mode 100644
index 0000000..0aa1f9b
--- /dev/null
+++ b/libav1/av1/common/av1_loopfilter.c

@@ -0,0 +1,2483 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <math.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/mem.h"
+#include "av1/common/av1_loopfilter.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/reconinter.h"
+#include "av1/common/seg_common.h"
+
+static const SEG_LVL_FEATURES seg_lvl_lf_lut[MAX_MB_PLANE][2] = {
+  { SEG_LVL_ALT_LF_Y_V, SEG_LVL_ALT_LF_Y_H },
+  { SEG_LVL_ALT_LF_U, SEG_LVL_ALT_LF_U },
+  { SEG_LVL_ALT_LF_V, SEG_LVL_ALT_LF_V }
+};
+
+static const int delta_lf_id_lut[MAX_MB_PLANE][2] = {
+  { 0, 1 }, { 2, 2 }, { 3, 3 }
+};
+
+enum { VERT_EDGE = 0, HORZ_EDGE = 1, NUM_EDGE_DIRS } UENUM1BYTE(EDGE_DIR);
+
+static const int mode_lf_lut[] = {
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  // INTRA_MODES
+  1, 1, 0, 1,                             // INTER_MODES (GLOBALMV == 0)
+  1, 1, 1, 1, 1, 1, 0, 1  // INTER_COMPOUND_MODES (GLOBAL_GLOBALMV == 0)
+};
+
+// 256 bit masks (64x64 / 4x4) for left transform size for Y plane.
+// We use 4 uint64_t to represent the 256 bit.
+// Each 1 represents a position where we should apply a loop filter
+// across the left border of an 4x4 block boundary.
+//
+// In the case of TX_8x8->  ( in low order byte first we end up with
+// a mask that looks like this (-- and | are used for better view)
+//
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    -----------------
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//    10101010|10101010
+//
+// A loopfilter should be applied to every other 4x4 horizontally.
+
+// 256 bit masks (64x64 / 4x4) for above transform size for Y plane.
+// We use 4 uint64_t to represent the 256 bit.
+// Each 1 represents a position where we should apply a loop filter
+// across the top border of an 4x4 block boundary.
+//
+// In the case of TX_8x8->  ( in low order byte first we end up with
+// a mask that looks like this
+//
+//    11111111|11111111
+//    00000000|00000000
+//    11111111|11111111
+//    00000000|00000000
+//    11111111|11111111
+//    00000000|00000000
+//    11111111|11111111
+//    00000000|00000000
+//    -----------------
+//    11111111|11111111
+//    00000000|00000000
+//    11111111|11111111
+//    00000000|00000000
+//    11111111|11111111
+//    00000000|00000000
+//    11111111|11111111
+//    00000000|00000000
+//
+// A loopfilter should be applied to every other 4x4 horizontally.
+
+const int mask_id_table_tx_4x4[BLOCK_SIZES_ALL] = {
+  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, -1, -1, -1, 13, 14, 15, 16, 17, 18
+};
+
+const int mask_id_table_tx_8x8[BLOCK_SIZES_ALL] = {
+  -1, -1, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, 10, 11, 12, 13
+};
+
+const int mask_id_table_tx_16x16[BLOCK_SIZES_ALL] = {
+  -1, -1, -1, -1, -1, -1, 0, 1, 2, 3, 4, 5, 6, -1, -1, -1, -1, -1, -1, -1, 7, 8
+};
+
+const int mask_id_table_tx_32x32[BLOCK_SIZES_ALL] = { -1, -1, -1, -1, -1, -1,
+                                                      -1, -1, -1, 0,  1,  2,
+                                                      3,  -1, -1, -1, -1, -1,
+                                                      -1, -1, -1, -1 };
+const int mask_id_table_vert_border[BLOCK_SIZES_ALL] = { 0,  47, 49, 19, 51, 53,
+                                                         33, 55, 57, 42, 59, 60,
+                                                         46, -1, -1, -1, 61, 62,
+                                                         63, 64, 65, 66 };
+
+const FilterMask left_mask_univariant_reordered[67] = {
+  // TX_4X4
+  { { 0x0000000000000001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X4, TX_4X4
+  { { 0x0000000000010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X8, TX_4X4
+  { { 0x0000000000000003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X4, TX_4X4
+  { { 0x0000000000030003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X8, TX_4X4
+  { { 0x0003000300030003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X16, TX_4X4
+  { { 0x00000000000f000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X8, TX_4X4
+  { { 0x000f000f000f000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X16, TX_4X4
+  { { 0x000f000f000f000fULL, 0x000f000f000f000fULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X32, TX_4X4
+  { { 0x00ff00ff00ff00ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X16, TX_4X4
+  { { 0x00ff00ff00ff00ffULL, 0x00ff00ff00ff00ffULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X32, TX_4X4
+  { { 0x00ff00ff00ff00ffULL, 0x00ff00ff00ff00ffULL, 0x00ff00ff00ff00ffULL,
+      0x00ff00ff00ff00ffULL } },  // block size 32X64, TX_4X4
+  { { 0xffffffffffffffffULL, 0xffffffffffffffffULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_4X4
+  { { 0xffffffffffffffffULL, 0xffffffffffffffffULL, 0xffffffffffffffffULL,
+      0xffffffffffffffffULL } },  // block size 64X64, TX_4X4
+  { { 0x0001000100010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X16, TX_4X4
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X4, TX_4X4
+  { { 0x0003000300030003ULL, 0x0003000300030003ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X32, TX_4X4
+  { { 0x0000000000ff00ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X8, TX_4X4
+  { { 0x000f000f000f000fULL, 0x000f000f000f000fULL, 0x000f000f000f000fULL,
+      0x000f000f000f000fULL } },  // block size 16X64, TX_4X4
+  { { 0xffffffffffffffffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_4X4
+  // TX_8X8
+  { { 0x0000000000010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X8, TX_8X8
+  { { 0x0001000100010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X16, TX_8X8
+  { { 0x0000000000050005ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X8, TX_8X8
+  { { 0x0005000500050005ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X16, TX_8X8
+  { { 0x0005000500050005ULL, 0x0005000500050005ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X32, TX_8X8
+  { { 0x0055005500550055ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X16, TX_8X8
+  { { 0x0055005500550055ULL, 0x0055005500550055ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X32, TX_8X8
+  { { 0x0055005500550055ULL, 0x0055005500550055ULL, 0x0055005500550055ULL,
+      0x0055005500550055ULL } },  // block size 32X64, TX_8X8
+  { { 0x5555555555555555ULL, 0x5555555555555555ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_8X8
+  { { 0x5555555555555555ULL, 0x5555555555555555ULL, 0x5555555555555555ULL,
+      0x5555555555555555ULL } },  // block size 64X64, TX_8X8
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X32, TX_8X8
+  { { 0x0000000000550055ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X8, TX_8X8
+  { { 0x0005000500050005ULL, 0x0005000500050005ULL, 0x0005000500050005ULL,
+      0x0005000500050005ULL } },  // block size 16X64, TX_8X8
+  { { 0x5555555555555555ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_8X8
+  // TX_16X16
+  { { 0x0001000100010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X16, TX_16X16
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X32, TX_16X16
+  { { 0x0011001100110011ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X16, TX_16X16
+  { { 0x0011001100110011ULL, 0x0011001100110011ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X32, TX_16X16
+  { { 0x0011001100110011ULL, 0x0011001100110011ULL, 0x0011001100110011ULL,
+      0x0011001100110011ULL } },  // block size 32X64, TX_16X16
+  { { 0x1111111111111111ULL, 0x1111111111111111ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_16X16
+  { { 0x1111111111111111ULL, 0x1111111111111111ULL, 0x1111111111111111ULL,
+      0x1111111111111111ULL } },  // block size 64X64, TX_16X16
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0001000100010001ULL,
+      0x0001000100010001ULL } },  // block size 16X64, TX_16X16
+  { { 0x1111111111111111ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_16X16
+  // TX_32X32
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X32, TX_32X32
+  { { 0x0101010101010101ULL, 0x0101010101010101ULL, 0x0101010101010101ULL,
+      0x0101010101010101ULL } },  // block size 32X64, TX_32X32
+  { { 0x0101010101010101ULL, 0x0101010101010101ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_32X32
+  { { 0x0101010101010101ULL, 0x0101010101010101ULL, 0x0101010101010101ULL,
+      0x0101010101010101ULL } },  // block size 64X64, TX_32X32
+  // TX_64X64
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0001000100010001ULL,
+      0x0001000100010001ULL } },  // block size 64X64, TX_64X64
+  // 2:1, 1:2 transform sizes.
+  { { 0x0000000000010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X8, TX_4X8
+  { { 0x0001000100010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X16, TX_4X8
+  { { 0x0000000000000001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X4, TX_8X4
+  { { 0x0000000000000005ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X4, TX_8X4
+  { { 0x0001000100010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X16, TX_8X16
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X32, TX_8X16
+  { { 0x0000000000010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X8, TX_16X8
+  { { 0x0000000000110011ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X8, TX_16X8
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X32, TX_16X32
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0001000100010001ULL,
+      0x0001000100010001ULL } },  // block size 16X64, TX_16X32
+  { { 0x0001000100010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X16, TX_32X16
+  { { 0x0101010101010101ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_32X16
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0001000100010001ULL,
+      0x0001000100010001ULL } },  // block size 32X64, TX_32X64
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_64X32
+  // 4:1, 1:4 transform sizes.
+  { { 0x0001000100010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X16, TX_4X16
+  { { 0x0000000000000001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X4, TX_16X4
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X32, TX_8X32
+  { { 0x0000000000010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X8, TX_32X8
+  { { 0x0001000100010001ULL, 0x0001000100010001ULL, 0x0001000100010001ULL,
+      0x0001000100010001ULL } },  // block size 16X64, TX_16X64
+  { { 0x0001000100010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_64X16
+};
+
+const FilterMask above_mask_univariant_reordered[67] = {
+  // TX_4X4
+  { { 0x0000000000000001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X4, TX_4X4
+  { { 0x0000000000010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X8, TX_4X4
+  { { 0x0000000000000003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X4, TX_4X4
+  { { 0x0000000000030003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X8, TX_4X4
+  { { 0x0003000300030003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X16, TX_4X4
+  { { 0x00000000000f000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X8, TX_4X4
+  { { 0x000f000f000f000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X16, TX_4X4
+  { { 0x000f000f000f000fULL, 0x000f000f000f000fULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X32, TX_4X4
+  { { 0x00ff00ff00ff00ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X16, TX_4X4
+  { { 0x00ff00ff00ff00ffULL, 0x00ff00ff00ff00ffULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X32, TX_4X4
+  { { 0x00ff00ff00ff00ffULL, 0x00ff00ff00ff00ffULL, 0x00ff00ff00ff00ffULL,
+      0x00ff00ff00ff00ffULL } },  // block size 32X64, TX_4X4
+  { { 0xffffffffffffffffULL, 0xffffffffffffffffULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_4X4
+  { { 0xffffffffffffffffULL, 0xffffffffffffffffULL, 0xffffffffffffffffULL,
+      0xffffffffffffffffULL } },  // block size 64X64, TX_4x4
+  { { 0x0001000100010001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X16, TX_4X4
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X4, TX_4X4
+  { { 0x0003000300030003ULL, 0x0003000300030003ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X32, TX_4X4
+  { { 0x0000000000ff00ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X8, TX_4X4
+  { { 0x000f000f000f000fULL, 0x000f000f000f000fULL, 0x000f000f000f000fULL,
+      0x000f000f000f000fULL } },  // block size 16X64, TX_4X4
+  { { 0xffffffffffffffffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_4X4
+  // TX_8X8
+  { { 0x0000000000000003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X8, TX_8X8
+  { { 0x0000000300000003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X16, TX_8X8
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X8, TX_8X8
+  { { 0x0000000f0000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X16, TX_8X8
+  { { 0x0000000f0000000fULL, 0x0000000f0000000fULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X32, TX_8X8
+  { { 0x000000ff000000ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X16, TX_8X8
+  { { 0x000000ff000000ffULL, 0x000000ff000000ffULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X32, TX_8X8
+  { { 0x000000ff000000ffULL, 0x000000ff000000ffULL, 0x000000ff000000ffULL,
+      0x000000ff000000ffULL } },  // block size 32X64, TX_8X8
+  { { 0x0000ffff0000ffffULL, 0x0000ffff0000ffffULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_8X8
+  { { 0x0000ffff0000ffffULL, 0x0000ffff0000ffffULL, 0x0000ffff0000ffffULL,
+      0x0000ffff0000ffffULL } },  // block size 64X64, TX_8X8
+  { { 0x0000000300000003ULL, 0x0000000300000003ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X32, TX_8X8
+  { { 0x00000000000000ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X8, TX_8X8
+  { { 0x0000000f0000000fULL, 0x0000000f0000000fULL, 0x0000000f0000000fULL,
+      0x0000000f0000000fULL } },  // block size 16X64, TX_8X8
+  { { 0x0000ffff0000ffffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_8X8
+  // TX_16X16
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X16, TX_16X16
+  { { 0x000000000000000fULL, 0x000000000000000fULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X32, TX_16X16
+  { { 0x00000000000000ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X16, TX_16X16
+  { { 0x00000000000000ffULL, 0x00000000000000ffULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X32, TX_16X16
+  { { 0x00000000000000ffULL, 0x00000000000000ffULL, 0x00000000000000ffULL,
+      0x00000000000000ffULL } },  // block size 32X64, TX_16X16
+  { { 0x000000000000ffffULL, 0x000000000000ffffULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_16X16
+  { { 0x000000000000ffffULL, 0x000000000000ffffULL, 0x000000000000ffffULL,
+      0x000000000000ffffULL } },  // block size 64X64, TX_16X16
+  { { 0x000000000000000fULL, 0x000000000000000fULL, 0x000000000000000fULL,
+      0x000000000000000fULL } },  // block size 16X64, TX_16X16
+  { { 0x000000000000ffffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_16X16
+  // TX_32X32
+  { { 0x00000000000000ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X32, TX_32X32
+  { { 0x00000000000000ffULL, 0x0000000000000000ULL, 0x00000000000000ffULL,
+      0x0000000000000000ULL } },  // block size 32X64, TX_32X32
+  { { 0x000000000000ffffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_32X32
+  { { 0x000000000000ffffULL, 0x0000000000000000ULL, 0x000000000000ffffULL,
+      0x0000000000000000ULL } },  // block size 64X64, TX_32X32
+  // TX_64X64
+  { { 0x000000000000ffffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X64, TX_64X64
+  // 2:1, 1:2 transform sizes.
+  { { 0x0000000000000001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X8, TX_4X8
+  { { 0x0000000100000001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X16, TX_4X8
+  { { 0x0000000000000003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X4, TX_8X4
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X4, TX_8X4
+  { { 0x0000000000000003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X16, TX_8X16
+  { { 0x0000000000000003ULL, 0x0000000000000003ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X32, TX_8X16
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X8, TX_16X8
+  { { 0x00000000000000ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X8, TX_16X8
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X32, TX_16X32
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x000000000000000fULL,
+      0x0000000000000000ULL } },  // block size 16X64, TX_16X32
+  { { 0x00000000000000ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X16, TX_32X16
+  { { 0x000000000000ffffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_32X16
+  { { 0x00000000000000ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X64, TX_32X64
+  { { 0x000000000000ffffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X32, TX_64X32
+  // 4:1, 1:4 transform sizes.
+  { { 0x0000000000000001ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 4X16, TX_4X16
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X4, TX_16X4
+  { { 0x0000000000000003ULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 8X32, TX_8X32
+  { { 0x00000000000000ffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 32X8, TX_32X8
+  { { 0x000000000000000fULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 16X64, TX_16X64
+  { { 0x000000000000ffffULL, 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL } },  // block size 64X16, TX_64X16
+};
+
+#if LOOP_FILTER_BITMASK
+LoopFilterMask *get_loop_filter_mask(const AV1_COMMON *const cm, int mi_row,
+                                     int mi_col) {
+  assert(cm->lf.lfm != NULL);
+  const int row = mi_row >> MIN_MIB_SIZE_LOG2;  // 64x64
+  const int col = mi_col >> MIN_MIB_SIZE_LOG2;
+  return &cm->lf.lfm[row * cm->lf.lfm_stride + col];
+}
+
+typedef void (*LpfFunc)(uint8_t *s, int p, const uint8_t *blimit,
+                        const uint8_t *limit, const uint8_t *thresh);
+
+typedef void (*LpfDualFunc)(uint8_t *s, int p, const uint8_t *blimit0,
+                            const uint8_t *limit0, const uint8_t *thresh0,
+                            const uint8_t *blimit1, const uint8_t *limit1,
+                            const uint8_t *thresh1);
+
+typedef void (*HbdLpfFunc)(uint16_t *s, int p, const uint8_t *blimit,
+                           const uint8_t *limit, const uint8_t *thresh, int bd);
+
+typedef void (*HbdLpfDualFunc)(uint16_t *s, int p, const uint8_t *blimit0,
+                               const uint8_t *limit0, const uint8_t *thresh0,
+                               const uint8_t *blimit1, const uint8_t *limit1,
+                               const uint8_t *thresh1, int bd);
+#endif  // LOOP_FILTER_BITMASK
+
+static void update_sharpness(loop_filter_info_n *lfi, int sharpness_lvl) {
+  int lvl;
+
+  // For each possible value for the loop filter fill out limits
+  for (lvl = 0; lvl <= MAX_LOOP_FILTER; lvl++) {
+    // Set loop filter parameters that control sharpness.
+    int block_inside_limit = lvl >> ((sharpness_lvl > 0) + (sharpness_lvl > 4));
+
+    if (sharpness_lvl > 0) {
+      if (block_inside_limit > (9 - sharpness_lvl))
+        block_inside_limit = (9 - sharpness_lvl);
+    }
+
+    if (block_inside_limit < 1) block_inside_limit = 1;
+
+    memset(lfi->lfthr[lvl].lim, block_inside_limit, SIMD_WIDTH);
+    memset(lfi->lfthr[lvl].mblim, (2 * (lvl + 2) + block_inside_limit),
+           SIMD_WIDTH);
+  }
+}
+
+uint8_t get_filter_level(const AV1_COMMON *cm, const loop_filter_info_n *lfi_n,
+                         const int dir_idx, int plane,
+                         const MB_MODE_INFO *mbmi) {
+  const int segment_id = mbmi->segment_id;
+  if (cm->delta_q_info.delta_lf_present_flag) {
+    int delta_lf;
+    if (cm->delta_q_info.delta_lf_multi) {
+      const int delta_lf_idx = delta_lf_id_lut[plane][dir_idx];
+      delta_lf = mbmi->delta_lf[delta_lf_idx];
+    } else {
+      delta_lf = mbmi->delta_lf_from_base;
+    }
+    int base_level;
+    if (plane == 0)
+      base_level = cm->lf.filter_level[dir_idx];
+    else if (plane == 1)
+      base_level = cm->lf.filter_level_u;
+    else
+      base_level = cm->lf.filter_level_v;
+    int lvl_seg = clamp(delta_lf + base_level, 0, MAX_LOOP_FILTER);
+    assert(plane >= 0 && plane <= 2);
+    const int seg_lf_feature_id = seg_lvl_lf_lut[plane][dir_idx];
+    if (segfeature_active(&cm->seg, segment_id, seg_lf_feature_id)) {
+      const int data = get_segdata(&cm->seg, segment_id, seg_lf_feature_id);
+      lvl_seg = clamp(lvl_seg + data, 0, MAX_LOOP_FILTER);
+    }
+
+    if (cm->lf.mode_ref_delta_enabled) {
+      const int scale = 1 << (lvl_seg >> 5);
+      lvl_seg += cm->lf.ref_deltas[mbmi->ref_frame[0]] * scale;
+      if (mbmi->ref_frame[0] > INTRA_FRAME)
+        lvl_seg += cm->lf.mode_deltas[mode_lf_lut[mbmi->mode]] * scale;
+      lvl_seg = clamp(lvl_seg, 0, MAX_LOOP_FILTER);
+    }
+    return lvl_seg;
+  } else {
+    return lfi_n->lvl[plane][segment_id][dir_idx][mbmi->ref_frame[0]]
+                     [mode_lf_lut[mbmi->mode]];
+  }
+}
+
+void av1_loop_filter_init(AV1_COMMON *cm) {
+  assert(MB_MODE_COUNT == NELEMENTS(mode_lf_lut));
+  loop_filter_info_n *lfi = &cm->lf_info;
+  struct loopfilter *lf = &cm->lf;
+  int lvl;
+
+  lf->combine_vert_horz_lf = 1;
+
+  // init limits for given sharpness
+  update_sharpness(lfi, lf->sharpness_level);
+
+  // init hev threshold const vectors
+  for (lvl = 0; lvl <= MAX_LOOP_FILTER; lvl++)
+    memset(lfi->lfthr[lvl].hev_thr, (lvl >> 4), SIMD_WIDTH);
+}
+
+// Update the loop filter for the current frame.
+// This should be called before loop_filter_rows(),
+// av1_loop_filter_frame() calls this function directly.
+void av1_loop_filter_frame_init(AV1_COMMON *cm, int plane_start,
+                                int plane_end) {
+  int filt_lvl[MAX_MB_PLANE], filt_lvl_r[MAX_MB_PLANE];
+  int plane;
+  int seg_id;
+  // n_shift is the multiplier for lf_deltas
+  // the multiplier is 1 for when filter_lvl is between 0 and 31;
+  // 2 when filter_lvl is between 32 and 63
+  loop_filter_info_n *const lfi = &cm->lf_info;
+  struct loopfilter *const lf = &cm->lf;
+  const struct segmentation *const seg = &cm->seg;
+
+  // update sharpness limits
+  update_sharpness(lfi, lf->sharpness_level);
+
+  filt_lvl[0] = cm->lf.filter_level[0];
+  filt_lvl[1] = cm->lf.filter_level_u;
+  filt_lvl[2] = cm->lf.filter_level_v;
+
+  filt_lvl_r[0] = cm->lf.filter_level[1];
+  filt_lvl_r[1] = cm->lf.filter_level_u;
+  filt_lvl_r[2] = cm->lf.filter_level_v;
+
+  assert(plane_start >= AOM_PLANE_Y);
+  assert(plane_end <= MAX_MB_PLANE);
+
+  for (plane = plane_start; plane < plane_end; plane++) {
+    if (plane == 0 && !filt_lvl[0] && !filt_lvl_r[0])
+      break;
+    else if (plane == 1 && !filt_lvl[1])
+      continue;
+    else if (plane == 2 && !filt_lvl[2])
+      continue;
+
+    for (seg_id = 0; seg_id < MAX_SEGMENTS; seg_id++) {
+      for (int dir = 0; dir < 2; ++dir) {
+        int lvl_seg = (dir == 0) ? filt_lvl[plane] : filt_lvl_r[plane];
+        const int seg_lf_feature_id = seg_lvl_lf_lut[plane][dir];
+        if (segfeature_active(seg, seg_id, seg_lf_feature_id)) {
+          const int data = get_segdata(&cm->seg, seg_id, seg_lf_feature_id);
+          lvl_seg = clamp(lvl_seg + data, 0, MAX_LOOP_FILTER);
+        }
+
+        if (!lf->mode_ref_delta_enabled) {
+          // we could get rid of this if we assume that deltas are set to
+          // zero when not in use; encoder always uses deltas
+          memset(lfi->lvl[plane][seg_id][dir], lvl_seg,
+                 sizeof(lfi->lvl[plane][seg_id][dir]));
+        } else {
+          int ref, mode;
+          const int scale = 1 << (lvl_seg >> 5);
+          const int intra_lvl = lvl_seg + lf->ref_deltas[INTRA_FRAME] * scale;
+          lfi->lvl[plane][seg_id][dir][INTRA_FRAME][0] =
+              clamp(intra_lvl, 0, MAX_LOOP_FILTER);
+
+          for (ref = LAST_FRAME; ref < REF_FRAMES; ++ref) {
+            for (mode = 0; mode < MAX_MODE_LF_DELTAS; ++mode) {
+              const int inter_lvl = lvl_seg + lf->ref_deltas[ref] * scale +
+                                    lf->mode_deltas[mode] * scale;
+              lfi->lvl[plane][seg_id][dir][ref][mode] =
+                  clamp(inter_lvl, 0, MAX_LOOP_FILTER);
+            }
+          }
+        }
+      }
+    }
+  }
+}
+
+#if LOOP_FILTER_BITMASK
+// A 64x64 tx block requires 256 bits to represent each 4x4 tx block.
+// Every 4 rows is represented by one uint64_t mask. Hence,
+// there are 4 uint64_t bitmask[4] to represent the 64x64 block.
+//
+// Given a location by (mi_col, mi_row), This function returns the index
+// 0, 1, 2, 3 to select which bitmask[] to use, and the shift value.
+//
+// For example, mi_row is the offset of pixels in mi size (4),
+// (mi_row / 4) returns which uint64_t.
+// After locating which uint64_t, mi_row % 4 is the
+// row offset, and each row has 16 = 1 << stride_log2 4x4 units.
+// Therefore, shift = (row << stride_log2) + mi_col;
+int get_index_shift(int mi_col, int mi_row, int *index) {
+  // *index = mi_row >> 2;
+  // rows = mi_row % 4;
+  // stride_log2 = 4;
+  // shift = (rows << stride_log2) + mi_col;
+  *index = mi_row >> 2;
+  return ((mi_row & 3) << 4) | mi_col;
+}
+
+static void check_mask(const FilterMask *lfm) {
+#ifndef NDEBUG
+  for (int i = 0; i < 4; ++i) {
+    assert(!(lfm[TX_4X4].bits[i] & lfm[TX_8X8].bits[i]));
+    assert(!(lfm[TX_4X4].bits[i] & lfm[TX_16X16].bits[i]));
+    assert(!(lfm[TX_4X4].bits[i] & lfm[TX_32X32].bits[i]));
+    assert(!(lfm[TX_4X4].bits[i] & lfm[TX_64X64].bits[i]));
+    assert(!(lfm[TX_8X8].bits[i] & lfm[TX_16X16].bits[i]));
+    assert(!(lfm[TX_8X8].bits[i] & lfm[TX_32X32].bits[i]));
+    assert(!(lfm[TX_8X8].bits[i] & lfm[TX_64X64].bits[i]));
+    assert(!(lfm[TX_16X16].bits[i] & lfm[TX_32X32].bits[i]));
+    assert(!(lfm[TX_16X16].bits[i] & lfm[TX_64X64].bits[i]));
+    assert(!(lfm[TX_32X32].bits[i] & lfm[TX_64X64].bits[i]));
+  }
+#else
+  (void)lfm;
+#endif
+}
+
+static void check_loop_filter_masks(const LoopFilterMask *lfm, int plane) {
+  if (plane == 0) {
+    // Assert if we try to apply 2 different loop filters at the same
+    // position.
+    check_mask(lfm->left_y);
+    check_mask(lfm->above_y);
+  } else if (plane == 1) {
+    check_mask(lfm->left_u);
+    check_mask(lfm->above_u);
+  } else {
+    check_mask(lfm->left_v);
+    check_mask(lfm->above_v);
+  }
+}
+
+static void update_masks(EDGE_DIR dir, int plane, uint64_t *mask,
+                         TX_SIZE sqr_tx_size, LoopFilterMask *lfm) {
+  if (dir == VERT_EDGE) {
+    switch (plane) {
+      case 0:
+        for (int i = 0; i < 4; ++i) lfm->left_y[sqr_tx_size].bits[i] |= mask[i];
+        break;
+      case 1:
+        for (int i = 0; i < 4; ++i) lfm->left_u[sqr_tx_size].bits[i] |= mask[i];
+        break;
+      case 2:
+        for (int i = 0; i < 4; ++i) lfm->left_v[sqr_tx_size].bits[i] |= mask[i];
+        break;
+      default: assert(plane <= 2);
+    }
+  } else {
+    switch (plane) {
+      case 0:
+        for (int i = 0; i < 4; ++i)
+          lfm->above_y[sqr_tx_size].bits[i] |= mask[i];
+        break;
+      case 1:
+        for (int i = 0; i < 4; ++i)
+          lfm->above_u[sqr_tx_size].bits[i] |= mask[i];
+        break;
+      case 2:
+        for (int i = 0; i < 4; ++i)
+          lfm->above_v[sqr_tx_size].bits[i] |= mask[i];
+        break;
+      default: assert(plane <= 2);
+    }
+  }
+}
+
+static int is_frame_boundary(AV1_COMMON *const cm, int plane, int mi_row,
+                             int mi_col, int ssx, int ssy, EDGE_DIR dir) {
+  if (plane && (ssx || ssy)) {
+    if (ssx && ssy) {  // format 420
+      if ((mi_row << MI_SIZE_LOG2) > cm->height ||
+          (mi_col << MI_SIZE_LOG2) > cm->width)
+        return 1;
+    } else if (ssx) {  // format 422
+      if ((mi_row << MI_SIZE_LOG2) >= cm->height ||
+          (mi_col << MI_SIZE_LOG2) > cm->width)
+        return 1;
+    }
+  } else {
+    if ((mi_row << MI_SIZE_LOG2) >= cm->height ||
+        (mi_col << MI_SIZE_LOG2) >= cm->width)
+      return 1;
+  }
+
+  int row_or_col;
+  if (plane == 0) {
+    row_or_col = dir == VERT_EDGE ? mi_col : mi_row;
+  } else {
+    // chroma sub8x8 block uses bottom/right mi of co-located 8x8 luma block.
+    // So if mi_col == 1, it is actually the frame boundary.
+    if (dir == VERT_EDGE) {
+      row_or_col = ssx ? (mi_col & 0x0FFFFFFE) : mi_col;
+    } else {
+      row_or_col = ssy ? (mi_row & 0x0FFFFFFE) : mi_row;
+    }
+  }
+  return row_or_col == 0;
+}
+
+static void setup_masks(AV1_COMMON *const cm, int mi_row, int mi_col, int plane,
+                        int ssx, int ssy, TX_SIZE tx_size) {
+  LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row, mi_col);
+  const int x = (mi_col << (MI_SIZE_LOG2 - ssx));
+  const int y = (mi_row << (MI_SIZE_LOG2 - ssy));
+  // decide whether current vertical/horizontal edge needs loop filtering
+  for (EDGE_DIR dir = VERT_EDGE; dir <= HORZ_EDGE; ++dir) {
+    // chroma sub8x8 block uses bottom/right mi of co-located 8x8 luma block.
+    mi_row |= ssy;
+    mi_col |= ssx;
+
+    MB_MODE_INFO **mi = cm->mi_grid_visible + mi_row * cm->mi_stride + mi_col;
+    const MB_MODE_INFO *const mbmi = mi[0];
+    const int curr_skip = mbmi->skip && is_inter_block(mbmi);
+    const BLOCK_SIZE bsize = mbmi->sb_type;
+    const BLOCK_SIZE bsizec = scale_chroma_bsize(bsize, ssx, ssy);
+    const BLOCK_SIZE plane_bsize = ss_size_lookup[bsizec][ssx][ssy];
+    const uint8_t level = get_filter_level(cm, &cm->lf_info, dir, plane, mbmi);
+    const int prediction_masks = dir == VERT_EDGE
+                                     ? block_size_wide[plane_bsize] - 1
+                                     : block_size_high[plane_bsize] - 1;
+    const int is_coding_block_border =
+        dir == VERT_EDGE ? !(x & prediction_masks) : !(y & prediction_masks);
+
+    // TODO(chengchen): step can be optimized.
+    const int row_step = mi_size_high[TX_4X4] << ssy;
+    const int col_step = mi_size_wide[TX_4X4] << ssx;
+    const int mi_height =
+        dir == VERT_EDGE ? tx_size_high_unit[tx_size] << ssy : row_step;
+    const int mi_width =
+        dir == VERT_EDGE ? col_step : tx_size_wide_unit[tx_size] << ssx;
+
+    // assign filter levels
+    for (int r = mi_row; r < mi_row + mi_height; r += row_step) {
+      for (int c = mi_col; c < mi_col + mi_width; c += col_step) {
+        // do not filter frame boundary
+        // Note: when chroma planes' size are half of luma plane,
+        // chroma plane mi corresponds to even position.
+        // If frame size is not even, we still need to filter this chroma
+        // position. Therefore the boundary condition check needs to be
+        // separated to two cases.
+        if (plane && (ssx || ssy)) {
+          if (ssx && ssy) {  // format 420
+            if ((r << MI_SIZE_LOG2) > cm->height ||
+                (c << MI_SIZE_LOG2) > cm->width)
+              continue;
+          } else if (ssx) {  // format 422
+            if ((r << MI_SIZE_LOG2) >= cm->height ||
+                (c << MI_SIZE_LOG2) > cm->width)
+              continue;
+          }
+        } else {
+          if ((r << MI_SIZE_LOG2) >= cm->height ||
+              (c << MI_SIZE_LOG2) >= cm->width)
+            continue;
+        }
+
+        const int row = r % MI_SIZE_64X64;
+        const int col = c % MI_SIZE_64X64;
+        if (plane == 0) {
+          if (dir == VERT_EDGE)
+            lfm->lfl_y_ver[row][col] = level;
+          else
+            lfm->lfl_y_hor[row][col] = level;
+        } else if (plane == 1) {
+          lfm->lfl_u_ver[row][col] = level;
+          lfm->lfl_u_hor[row][col] = level;
+        } else {
+          lfm->lfl_v_ver[row][col] = level;
+          lfm->lfl_v_hor[row][col] = level;
+        }
+      }
+    }
+
+    for (int r = mi_row; r < mi_row + mi_height; r += row_step) {
+      for (int c = mi_col; c < mi_col + mi_width; c += col_step) {
+        // do not filter frame boundary
+        if (is_frame_boundary(cm, plane, r, c, ssx, ssy, dir)) continue;
+
+        uint64_t mask[4] = { 0 };
+        const int prev_row = dir == VERT_EDGE ? r : r - (1 << ssy);
+        const int prev_col = dir == VERT_EDGE ? c - (1 << ssx) : c;
+        MB_MODE_INFO **mi_prev =
+            cm->mi_grid_visible + prev_row * cm->mi_stride + prev_col;
+        const MB_MODE_INFO *const mbmi_prev = mi_prev[0];
+        const int prev_skip = mbmi_prev->skip && is_inter_block(mbmi_prev);
+        const uint8_t level_prev =
+            get_filter_level(cm, &cm->lf_info, dir, plane, mbmi_prev);
+        const int is_edge =
+            (level || level_prev) &&
+            (!curr_skip || !prev_skip || is_coding_block_border);
+
+        if (is_edge) {
+          const TX_SIZE prev_tx_size =
+              plane ? av1_get_max_uv_txsize(mbmi_prev->sb_type, ssx, ssy)
+                    : mbmi_prev->tx_size;
+          TX_SIZE min_tx_size = (dir == VERT_EDGE)
+                                    ? AOMMIN(txsize_horz_map[tx_size],
+                                             txsize_horz_map[prev_tx_size])
+                                    : AOMMIN(txsize_vert_map[tx_size],
+                                             txsize_vert_map[prev_tx_size]);
+          min_tx_size = AOMMIN(min_tx_size, TX_16X16);
+          assert(min_tx_size < TX_SIZES);
+          const int row = r % MI_SIZE_64X64;
+          const int col = c % MI_SIZE_64X64;
+          int index = 0;
+          const int shift = get_index_shift(col, row, &index);
+          assert(index < 4 && index >= 0);
+          mask[index] |= ((uint64_t)1 << shift);
+          // set mask on corresponding bit
+          update_masks(dir, plane, mask, min_tx_size, lfm);
+        }
+      }
+    }
+  }
+}
+
+static void setup_tx_block_mask(AV1_COMMON *const cm, int mi_row, int mi_col,
+                                int blk_row, int blk_col,
+                                BLOCK_SIZE plane_bsize, TX_SIZE tx_size,
+                                int plane, int ssx, int ssy) {
+  blk_row <<= ssy;
+  blk_col <<= ssx;
+  if (((mi_row + blk_row) << MI_SIZE_LOG2) >= cm->height ||
+      ((mi_col + blk_col) << MI_SIZE_LOG2) >= cm->width)
+    return;
+
+  // U/V plane, tx_size is always the largest size
+  if (plane) {
+    assert(tx_size_wide[tx_size] <= 32 && tx_size_high[tx_size] <= 32);
+    setup_masks(cm, mi_row + blk_row, mi_col + blk_col, plane, ssx, ssy,
+                tx_size);
+    return;
+  }
+
+  MB_MODE_INFO **mi = cm->mi_grid_visible + mi_row * cm->mi_stride + mi_col;
+  const MB_MODE_INFO *const mbmi = mi[0];
+  // For Y plane:
+  // If intra block, tx size is univariant.
+  // If inter block, tx size follows inter_tx_size.
+  TX_SIZE plane_tx_size = tx_size;
+  const int is_inter = is_inter_block(mbmi);
+
+  if (plane == 0) {
+    if (is_inter) {
+      if (mbmi->skip) {
+        // TODO(chengchen): change av1_get_transform_size() to be consistant.
+        // plane_tx_size = get_max_rect_tx_size(plane_bsize);
+        plane_tx_size = mbmi->tx_size;
+      } else {
+        plane_tx_size = mbmi->inter_tx_size[av1_get_txb_size_index(
+            plane_bsize, blk_row, blk_col)];
+      }
+    } else {
+      MB_MODE_INFO **mi_this = cm->mi_grid_visible +
+                               (mi_row + blk_row) * cm->mi_stride + mi_col +
+                               blk_col;
+      const MB_MODE_INFO *const mbmi_this = mi_this[0];
+      plane_tx_size = mbmi_this->tx_size;
+    }
+  }
+
+  assert(txsize_to_bsize[plane_tx_size] <= plane_bsize);
+
+  if (plane || plane_tx_size == tx_size) {
+    setup_masks(cm, mi_row + blk_row, mi_col + blk_col, plane, ssx, ssy,
+                tx_size);
+  } else {
+    const TX_SIZE sub_txs = sub_tx_size_map[tx_size];
+    const int bsw = tx_size_wide_unit[sub_txs];
+    const int bsh = tx_size_high_unit[sub_txs];
+    for (int row = 0; row < tx_size_high_unit[tx_size]; row += bsh) {
+      for (int col = 0; col < tx_size_wide_unit[tx_size]; col += bsw) {
+        const int offsetr = blk_row + row;
+        const int offsetc = blk_col + col;
+        setup_tx_block_mask(cm, mi_row, mi_col, offsetr, offsetc, plane_bsize,
+                            sub_txs, plane, ssx, ssy);
+      }
+    }
+  }
+}
+
+static void setup_fix_block_mask(AV1_COMMON *const cm, int mi_row, int mi_col,
+                                 int plane, int ssx, int ssy) {
+  MB_MODE_INFO **mi =
+      cm->mi_grid_visible + (mi_row | ssy) * cm->mi_stride + (mi_col | ssx);
+  const MB_MODE_INFO *const mbmi = mi[0];
+
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  const BLOCK_SIZE bsizec = scale_chroma_bsize(bsize, ssx, ssy);
+  const BLOCK_SIZE plane_bsize = ss_size_lookup[bsizec][ssx][ssy];
+
+  const int block_width = mi_size_wide[plane_bsize];
+  const int block_height = mi_size_high[plane_bsize];
+
+  TX_SIZE max_txsize = max_txsize_rect_lookup[plane_bsize];
+  // The decoder is designed so that it can process 64x64 luma pixels at a
+  // time. If this is a chroma plane with subsampling and bsize corresponds to
+  // a subsampled BLOCK_128X128 then the lookup above will give TX_64X64. That
+  // mustn't be used for the subsampled plane (because it would be bigger than
+  // a 64x64 luma block) so we round down to TX_32X32.
+  if (plane && txsize_sqr_up_map[max_txsize] == TX_64X64) {
+    if (max_txsize == TX_16X64)
+      max_txsize = TX_16X32;
+    else if (max_txsize == TX_64X16)
+      max_txsize = TX_32X16;
+    else
+      max_txsize = TX_32X32;
+  }
+
+  const BLOCK_SIZE txb_size = txsize_to_bsize[max_txsize];
+  const int bw = block_size_wide[txb_size] >> tx_size_wide_log2[0];
+  const int bh = block_size_high[txb_size] >> tx_size_wide_log2[0];
+  const BLOCK_SIZE max_unit_bsize = ss_size_lookup[BLOCK_64X64][ssx][ssy];
+  int mu_blocks_wide = block_size_wide[max_unit_bsize] >> tx_size_wide_log2[0];
+  int mu_blocks_high = block_size_high[max_unit_bsize] >> tx_size_high_log2[0];
+
+  mu_blocks_wide = AOMMIN(block_width, mu_blocks_wide);
+  mu_blocks_high = AOMMIN(block_height, mu_blocks_high);
+
+  // Y: Largest tx_size is 64x64, while superblock size can be 128x128.
+  // Here we ensure that setup_tx_block_mask process at most a 64x64 block.
+  // U/V: largest tx size is 32x32.
+  for (int idy = 0; idy < block_height; idy += mu_blocks_high) {
+    for (int idx = 0; idx < block_width; idx += mu_blocks_wide) {
+      const int unit_height = AOMMIN(mu_blocks_high + idy, block_height);
+      const int unit_width = AOMMIN(mu_blocks_wide + idx, block_width);
+      for (int blk_row = idy; blk_row < unit_height; blk_row += bh) {
+        for (int blk_col = idx; blk_col < unit_width; blk_col += bw) {
+          setup_tx_block_mask(cm, mi_row, mi_col, blk_row, blk_col, plane_bsize,
+                              max_txsize, plane, ssx, ssy);
+        }
+      }
+    }
+  }
+}
+
+static void setup_block_mask(AV1_COMMON *const cm, int mi_row, int mi_col,
+                             BLOCK_SIZE bsize, int plane, int ssx, int ssy) {
+  if ((mi_row << MI_SIZE_LOG2) >= cm->height ||
+      (mi_col << MI_SIZE_LOG2) >= cm->width)
+    return;
+
+  const PARTITION_TYPE partition = get_partition(cm, mi_row, mi_col, bsize);
+  const BLOCK_SIZE subsize = get_partition_subsize(bsize, partition);
+  const int hbs = mi_size_wide[bsize] / 2;
+  const int quarter_step = mi_size_wide[bsize] / 4;
+  const int allow_sub8x8 = (ssx || ssy) ? bsize > BLOCK_8X8 : 1;
+  const int has_next_row =
+      (((mi_row + hbs) << MI_SIZE_LOG2) < cm->height) & allow_sub8x8;
+  const int has_next_col =
+      (((mi_col + hbs) << MI_SIZE_LOG2) < cm->width) & allow_sub8x8;
+  int i;
+
+  switch (partition) {
+    case PARTITION_NONE:
+      setup_fix_block_mask(cm, mi_row, mi_col, plane, ssx, ssy);
+      break;
+    case PARTITION_HORZ:
+      setup_fix_block_mask(cm, mi_row, mi_col, plane, ssx, ssy);
+      if (has_next_row)
+        setup_fix_block_mask(cm, mi_row + hbs, mi_col, plane, ssx, ssy);
+      break;
+    case PARTITION_VERT:
+      setup_fix_block_mask(cm, mi_row, mi_col, plane, ssx, ssy);
+      if (has_next_col)
+        setup_fix_block_mask(cm, mi_row, mi_col + hbs, plane, ssx, ssy);
+      break;
+    case PARTITION_SPLIT:
+      setup_block_mask(cm, mi_row, mi_col, subsize, plane, ssx, ssy);
+      if (has_next_col)
+        setup_block_mask(cm, mi_row, mi_col + hbs, subsize, plane, ssx, ssy);
+      if (has_next_row)
+        setup_block_mask(cm, mi_row + hbs, mi_col, subsize, plane, ssx, ssy);
+      if (has_next_col & has_next_row)
+        setup_block_mask(cm, mi_row + hbs, mi_col + hbs, subsize, plane, ssx,
+                         ssy);
+      break;
+    case PARTITION_HORZ_A:
+      setup_fix_block_mask(cm, mi_row, mi_col, plane, ssx, ssy);
+      if (has_next_col)
+        setup_fix_block_mask(cm, mi_row, mi_col + hbs, plane, ssx, ssy);
+      if (has_next_row)
+        setup_fix_block_mask(cm, mi_row + hbs, mi_col, plane, ssx, ssy);
+      break;
+    case PARTITION_HORZ_B:
+      setup_fix_block_mask(cm, mi_row, mi_col, plane, ssx, ssy);
+      if (has_next_row)
+        setup_fix_block_mask(cm, mi_row + hbs, mi_col, plane, ssx, ssy);
+      if (has_next_col & has_next_row)
+        setup_fix_block_mask(cm, mi_row + hbs, mi_col + hbs, plane, ssx, ssy);
+      break;
+    case PARTITION_VERT_A:
+      setup_fix_block_mask(cm, mi_row, mi_col, plane, ssx, ssy);
+      if (has_next_row)
+        setup_fix_block_mask(cm, mi_row + hbs, mi_col, plane, ssx, ssy);
+      if (has_next_col)
+        setup_fix_block_mask(cm, mi_row, mi_col + hbs, plane, ssx, ssy);
+      break;
+    case PARTITION_VERT_B:
+      setup_fix_block_mask(cm, mi_row, mi_col, plane, ssx, ssy);
+      if (has_next_col)
+        setup_fix_block_mask(cm, mi_row, mi_col + hbs, plane, ssx, ssy);
+      if (has_next_row)
+        setup_fix_block_mask(cm, mi_row + hbs, mi_col + hbs, plane, ssx, ssy);
+      break;
+    case PARTITION_HORZ_4:
+      for (i = 0; i < 4; ++i) {
+        int this_mi_row = mi_row + i * quarter_step;
+        if (i > 0 && (this_mi_row << MI_SIZE_LOG2) >= cm->height) break;
+        // chroma plane filter the odd location
+        if (plane && bsize == BLOCK_16X16 && (i & 0x01)) continue;
+
+        setup_fix_block_mask(cm, this_mi_row, mi_col, plane, ssx, ssy);
+      }
+      break;
+    case PARTITION_VERT_4:
+      for (i = 0; i < 4; ++i) {
+        int this_mi_col = mi_col + i * quarter_step;
+        if (i > 0 && this_mi_col >= cm->mi_cols) break;
+        // chroma plane filter the odd location
+        if (plane && bsize == BLOCK_16X16 && (i & 0x01)) continue;
+
+        setup_fix_block_mask(cm, mi_row, this_mi_col, plane, ssx, ssy);
+      }
+      break;
+    default: assert(0);
+  }
+}
+
+// TODO(chengchen): if lossless, do not need to setup mask. But when
+// segments enabled, each segment has different lossless settings.
+void av1_setup_bitmask(AV1_COMMON *const cm, int mi_row, int mi_col, int plane,
+                       int subsampling_x, int subsampling_y, int row_end,
+                       int col_end) {
+  const int num_64x64 = cm->seq_params.mib_size >> MIN_MIB_SIZE_LOG2;
+  for (int y = 0; y < num_64x64; ++y) {
+    for (int x = 0; x < num_64x64; ++x) {
+      const int row = mi_row + y * MI_SIZE_64X64;
+      const int col = mi_col + x * MI_SIZE_64X64;
+      if (row >= row_end || col >= col_end) continue;
+      if ((row << MI_SIZE_LOG2) >= cm->height ||
+          (col << MI_SIZE_LOG2) >= cm->width)
+        continue;
+
+      LoopFilterMask *lfm = get_loop_filter_mask(cm, row, col);
+      if (lfm == NULL) return;
+
+      // init mask to zero
+      if (plane == 0) {
+        av1_zero(lfm->left_y);
+        av1_zero(lfm->above_y);
+        av1_zero(lfm->lfl_y_ver);
+        av1_zero(lfm->lfl_y_hor);
+      } else if (plane == 1) {
+        av1_zero(lfm->left_u);
+        av1_zero(lfm->above_u);
+        av1_zero(lfm->lfl_u_ver);
+        av1_zero(lfm->lfl_u_hor);
+      } else {
+        av1_zero(lfm->left_v);
+        av1_zero(lfm->above_v);
+        av1_zero(lfm->lfl_v_ver);
+        av1_zero(lfm->lfl_v_hor);
+      }
+    }
+  }
+
+  // set up bitmask for each superblock
+  setup_block_mask(cm, mi_row, mi_col, cm->seq_params.sb_size, plane,
+                   subsampling_x, subsampling_y);
+
+  for (int y = 0; y < num_64x64; ++y) {
+    for (int x = 0; x < num_64x64; ++x) {
+      const int row = mi_row + y * MI_SIZE_64X64;
+      const int col = mi_col + x * MI_SIZE_64X64;
+      if (row >= row_end || col >= col_end) continue;
+      if ((row << MI_SIZE_LOG2) >= cm->height ||
+          (col << MI_SIZE_LOG2) >= cm->width)
+        continue;
+
+      LoopFilterMask *lfm = get_loop_filter_mask(cm, row, col);
+      if (lfm == NULL) return;
+
+      // check if the mask is valid
+      check_loop_filter_masks(lfm, plane);
+
+      {
+        // Let 16x16 hold 32x32 (Y/U/V) and 64x64(Y only).
+        // Even tx size is greater, we only apply max length filter, which
+        // is 16.
+        if (plane == 0) {
+          for (int j = 0; j < 4; ++j) {
+            lfm->left_y[TX_16X16].bits[j] |= lfm->left_y[TX_32X32].bits[j];
+            lfm->left_y[TX_16X16].bits[j] |= lfm->left_y[TX_64X64].bits[j];
+            lfm->above_y[TX_16X16].bits[j] |= lfm->above_y[TX_32X32].bits[j];
+            lfm->above_y[TX_16X16].bits[j] |= lfm->above_y[TX_64X64].bits[j];
+
+            // set 32x32 and 64x64 to 0
+            lfm->left_y[TX_32X32].bits[j] = 0;
+            lfm->left_y[TX_64X64].bits[j] = 0;
+            lfm->above_y[TX_32X32].bits[j] = 0;
+            lfm->above_y[TX_64X64].bits[j] = 0;
+          }
+        } else if (plane == 1) {
+          for (int j = 0; j < 4; ++j) {
+            lfm->left_u[TX_16X16].bits[j] |= lfm->left_u[TX_32X32].bits[j];
+            lfm->above_u[TX_16X16].bits[j] |= lfm->above_u[TX_32X32].bits[j];
+
+            // set 32x32 to 0
+            lfm->left_u[TX_32X32].bits[j] = 0;
+            lfm->above_u[TX_32X32].bits[j] = 0;
+          }
+        } else {
+          for (int j = 0; j < 4; ++j) {
+            lfm->left_v[TX_16X16].bits[j] |= lfm->left_v[TX_32X32].bits[j];
+            lfm->above_v[TX_16X16].bits[j] |= lfm->above_v[TX_32X32].bits[j];
+
+            // set 32x32 to 0
+            lfm->left_v[TX_32X32].bits[j] = 0;
+            lfm->above_v[TX_32X32].bits[j] = 0;
+          }
+        }
+      }
+
+      // check if the mask is valid
+      check_loop_filter_masks(lfm, plane);
+    }
+  }
+}
+
+static void filter_selectively_vert_row2(
+    int subsampling_factor, uint8_t *s, int pitch, int plane,
+    uint64_t mask_16x16_0, uint64_t mask_8x8_0, uint64_t mask_4x4_0,
+    uint64_t mask_16x16_1, uint64_t mask_8x8_1, uint64_t mask_4x4_1,
+    const loop_filter_info_n *lfi_n, uint8_t *lfl, uint8_t *lfl2) {
+  uint64_t mask;
+  const int step = 1 << subsampling_factor;
+
+  for (mask = mask_16x16_0 | mask_8x8_0 | mask_4x4_0 | mask_16x16_1 |
+              mask_8x8_1 | mask_4x4_1;
+       mask; mask >>= step) {
+    const loop_filter_thresh *lfi0 = lfi_n->lfthr + *lfl;
+    const loop_filter_thresh *lfi1 = lfi_n->lfthr + *lfl2;
+
+    if (mask & 1) {
+      if ((mask_16x16_0 | mask_16x16_1) & 1) {
+        // chroma plane filters less pixels introduced in deblock_13tap
+        // experiment
+        LpfFunc lpf_vertical = plane ? aom_lpf_vertical_6 : aom_lpf_vertical_14;
+
+        if ((mask_16x16_0 & mask_16x16_1) & 1) {
+          if (plane) {
+            aom_lpf_vertical_6_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                    lfi0->hev_thr, lfi1->mblim, lfi1->lim,
+                                    lfi1->hev_thr);
+          } else {
+            aom_lpf_vertical_14_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                     lfi0->hev_thr, lfi1->mblim, lfi1->lim,
+                                     lfi1->hev_thr);
+          }
+        } else if (mask_16x16_0 & 1) {
+          lpf_vertical(s, pitch, lfi0->mblim, lfi0->lim, lfi0->hev_thr);
+        } else {
+          lpf_vertical(s + 4 * pitch, pitch, lfi1->mblim, lfi1->lim,
+                       lfi1->hev_thr);
+        }
+      }
+
+      if ((mask_8x8_0 | mask_8x8_1) & 1) {
+        // chroma plane filters less pixels introduced in deblock_13tap
+        // experiment
+        LpfFunc lpf_vertical = plane ? aom_lpf_vertical_6 : aom_lpf_vertical_8;
+
+        if ((mask_8x8_0 & mask_8x8_1) & 1) {
+          if (plane) {
+            aom_lpf_vertical_6_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                    lfi0->hev_thr, lfi1->mblim, lfi1->lim,
+                                    lfi1->hev_thr);
+          } else {
+            aom_lpf_vertical_8_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                    lfi0->hev_thr, lfi1->mblim, lfi1->lim,
+                                    lfi1->hev_thr);
+          }
+        } else if (mask_8x8_0 & 1) {
+          lpf_vertical(s, pitch, lfi0->mblim, lfi0->lim, lfi0->hev_thr);
+        } else {
+          lpf_vertical(s + 4 * pitch, pitch, lfi1->mblim, lfi1->lim,
+                       lfi1->hev_thr);
+        }
+      }
+
+      if ((mask_4x4_0 | mask_4x4_1) & 1) {
+        if ((mask_4x4_0 & mask_4x4_1) & 1) {
+          aom_lpf_vertical_4_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                  lfi0->hev_thr, lfi1->mblim, lfi1->lim,
+                                  lfi1->hev_thr);
+        } else if (mask_4x4_0 & 1) {
+          aom_lpf_vertical_4(s, pitch, lfi0->mblim, lfi0->lim, lfi0->hev_thr);
+        } else {
+          aom_lpf_vertical_4(s + 4 * pitch, pitch, lfi1->mblim, lfi1->lim,
+                             lfi1->hev_thr);
+        }
+      }
+    }
+
+    s += 4;
+    lfl += step;
+    lfl2 += step;
+    mask_16x16_0 >>= step;
+    mask_8x8_0 >>= step;
+    mask_4x4_0 >>= step;
+    mask_16x16_1 >>= step;
+    mask_8x8_1 >>= step;
+    mask_4x4_1 >>= step;
+  }
+}
+
+static void highbd_filter_selectively_vert_row2(
+    int subsampling_factor, uint16_t *s, int pitch, int plane,
+    uint64_t mask_16x16_0, uint64_t mask_8x8_0, uint64_t mask_4x4_0,
+    uint64_t mask_16x16_1, uint64_t mask_8x8_1, uint64_t mask_4x4_1,
+    const loop_filter_info_n *lfi_n, uint8_t *lfl, uint8_t *lfl2, int bd) {
+  uint64_t mask;
+  const int step = 1 << subsampling_factor;
+
+  for (mask = mask_16x16_0 | mask_8x8_0 | mask_4x4_0 | mask_16x16_1 |
+              mask_8x8_1 | mask_4x4_1;
+       mask; mask >>= step) {
+    const loop_filter_thresh *lfi0 = lfi_n->lfthr + *lfl;
+    const loop_filter_thresh *lfi1 = lfi_n->lfthr + *lfl2;
+
+    if (mask & 1) {
+      if ((mask_16x16_0 | mask_16x16_1) & 1) {
+        // chroma plane filters less pixels introduced in deblock_13tap
+        // experiment
+        HbdLpfFunc highbd_lpf_vertical =
+            plane ? aom_highbd_lpf_vertical_6 : aom_highbd_lpf_vertical_14;
+
+        if ((mask_16x16_0 & mask_16x16_1) & 1) {
+          if (plane) {
+            aom_highbd_lpf_vertical_6_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                           lfi0->hev_thr, lfi1->mblim,
+                                           lfi1->lim, lfi1->hev_thr, bd);
+          } else {
+            aom_highbd_lpf_vertical_14_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                            lfi0->hev_thr, lfi1->mblim,
+                                            lfi1->lim, lfi1->hev_thr, bd);
+          }
+        } else if (mask_16x16_0 & 1) {
+          highbd_lpf_vertical(s, pitch, lfi0->mblim, lfi0->lim, lfi0->hev_thr,
+                              bd);
+        } else {
+          highbd_lpf_vertical(s + 4 * pitch, pitch, lfi1->mblim, lfi1->lim,
+                              lfi1->hev_thr, bd);
+        }
+      }
+
+      if ((mask_8x8_0 | mask_8x8_1) & 1) {
+        HbdLpfFunc highbd_lpf_vertical =
+            plane ? aom_highbd_lpf_vertical_6 : aom_highbd_lpf_vertical_8;
+
+        if ((mask_8x8_0 & mask_8x8_1) & 1) {
+          if (plane) {
+            aom_highbd_lpf_vertical_6_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                           lfi0->hev_thr, lfi1->mblim,
+                                           lfi1->lim, lfi1->hev_thr, bd);
+          } else {
+            aom_highbd_lpf_vertical_8_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                           lfi0->hev_thr, lfi1->mblim,
+                                           lfi1->lim, lfi1->hev_thr, bd);
+          }
+        } else if (mask_8x8_0 & 1) {
+          highbd_lpf_vertical(s, pitch, lfi0->mblim, lfi0->lim, lfi0->hev_thr,
+                              bd);
+        } else {
+          highbd_lpf_vertical(s + 4 * pitch, pitch, lfi1->mblim, lfi1->lim,
+                              lfi1->hev_thr, bd);
+        }
+      }
+
+      if ((mask_4x4_0 | mask_4x4_1) & 1) {
+        if ((mask_4x4_0 & mask_4x4_1) & 1) {
+          aom_highbd_lpf_vertical_4_dual(s, pitch, lfi0->mblim, lfi0->lim,
+                                         lfi0->hev_thr, lfi1->mblim, lfi1->lim,
+                                         lfi1->hev_thr, bd);
+        } else if (mask_4x4_0 & 1) {
+          aom_highbd_lpf_vertical_4(s, pitch, lfi0->mblim, lfi0->lim,
+                                    lfi0->hev_thr, bd);
+        } else {
+          aom_highbd_lpf_vertical_4(s + 4 * pitch, pitch, lfi1->mblim,
+                                    lfi1->lim, lfi1->hev_thr, bd);
+        }
+      }
+    }
+
+    s += 4;
+    lfl += step;
+    lfl2 += step;
+    mask_16x16_0 >>= step;
+    mask_8x8_0 >>= step;
+    mask_4x4_0 >>= step;
+    mask_16x16_1 >>= step;
+    mask_8x8_1 >>= step;
+    mask_4x4_1 >>= step;
+  }
+}
+
+static void filter_selectively_horiz(uint8_t *s, int pitch, int plane,
+                                     int subsampling, uint64_t mask_16x16,
+                                     uint64_t mask_8x8, uint64_t mask_4x4,
+                                     const loop_filter_info_n *lfi_n,
+                                     const uint8_t *lfl) {
+  uint64_t mask;
+  int count;
+  const int step = 1 << subsampling;
+  const unsigned int two_block_mask = subsampling ? 5 : 3;
+  int offset = 0;
+
+  for (mask = mask_16x16 | mask_8x8 | mask_4x4; mask; mask >>= step * count) {
+    const loop_filter_thresh *lfi = lfi_n->lfthr + *lfl;
+    // Next block's thresholds, when it is within current 64x64 block.
+    // If it is out of bound, its mask is zero, and it points to current edge's
+    // filter parameters, instead of next edge's.
+    int next_edge = step;
+    if (offset + next_edge >= MI_SIZE_64X64) next_edge = 0;
+    const loop_filter_thresh *lfin = lfi_n->lfthr + *(lfl + next_edge);
+
+    count = 1;
+    if (mask & 1) {
+      if (mask_16x16 & 1) {
+        // chroma plane filters less pixels introduced in deblock_13tap
+        // experiment
+        LpfFunc lpf_horizontal =
+            plane ? aom_lpf_horizontal_6 : aom_lpf_horizontal_14;
+
+        if ((mask_16x16 & two_block_mask) == two_block_mask) {
+          if (plane) {
+            aom_lpf_horizontal_6_dual(s, pitch, lfi->mblim, lfi->lim,
+                                      lfi->hev_thr, lfin->mblim, lfin->lim,
+                                      lfin->hev_thr);
+          } else {
+            aom_lpf_horizontal_14_dual(s, pitch, lfi->mblim, lfi->lim,
+                                       lfi->hev_thr, lfin->mblim, lfin->lim,
+                                       lfin->hev_thr);
+          }
+          count = 2;
+        } else {
+          lpf_horizontal(s, pitch, lfi->mblim, lfi->lim, lfi->hev_thr);
+        }
+      } else if (mask_8x8 & 1) {
+        // chroma plane filters less pixels introduced in deblock_13tap
+        // experiment
+        LpfFunc lpf_horizontal =
+            plane ? aom_lpf_horizontal_6 : aom_lpf_horizontal_8;
+
+        if ((mask_8x8 & two_block_mask) == two_block_mask) {
+          if (plane) {
+            aom_lpf_horizontal_6_dual(s, pitch, lfi->mblim, lfi->lim,
+                                      lfi->hev_thr, lfin->mblim, lfin->lim,
+                                      lfin->hev_thr);
+          } else {
+            aom_lpf_horizontal_8_dual(s, pitch, lfi->mblim, lfi->lim,
+                                      lfi->hev_thr, lfin->mblim, lfin->lim,
+                                      lfin->hev_thr);
+          }
+          count = 2;
+        } else {
+          lpf_horizontal(s, pitch, lfi->mblim, lfi->lim, lfi->hev_thr);
+        }
+      } else if (mask_4x4 & 1) {
+        if ((mask_4x4 & two_block_mask) == two_block_mask) {
+          aom_lpf_horizontal_4_dual(s, pitch, lfi->mblim, lfi->lim,
+                                    lfi->hev_thr, lfin->mblim, lfin->lim,
+                                    lfin->hev_thr);
+          count = 2;
+        } else {
+          aom_lpf_horizontal_4(s, pitch, lfi->mblim, lfi->lim, lfi->hev_thr);
+        }
+      }
+    }
+
+    s += 4 * count;
+    lfl += step * count;
+    mask_16x16 >>= step * count;
+    mask_8x8 >>= step * count;
+    mask_4x4 >>= step * count;
+    offset += step * count;
+  }
+}
+
+static void highbd_filter_selectively_horiz(
+    uint16_t *s, int pitch, int plane, int subsampling, uint64_t mask_16x16,
+    uint64_t mask_8x8, uint64_t mask_4x4, const loop_filter_info_n *lfi_n,
+    uint8_t *lfl, int bd) {
+  uint64_t mask;
+  int count;
+  const int step = 1 << subsampling;
+  const unsigned int two_block_mask = subsampling ? 5 : 3;
+  int offset = 0;
+
+  for (mask = mask_16x16 | mask_8x8 | mask_4x4; mask; mask >>= step * count) {
+    const loop_filter_thresh *lfi = lfi_n->lfthr + *lfl;
+    // Next block's thresholds, when it is within current 64x64 block.
+    // If it is out of bound, its mask is zero, and it points to current edge's
+    // filter parameters, instead of next edge's.
+    int next_edge = step;
+    if (offset + next_edge >= MI_SIZE_64X64) next_edge = 0;
+    const loop_filter_thresh *lfin = lfi_n->lfthr + *(lfl + next_edge);
+
+    count = 1;
+    if (mask & 1) {
+      if (mask_16x16 & 1) {
+        HbdLpfFunc highbd_lpf_horizontal =
+            plane ? aom_highbd_lpf_horizontal_6 : aom_highbd_lpf_horizontal_14;
+
+        if ((mask_16x16 & two_block_mask) == two_block_mask) {
+          if (plane) {
+            aom_highbd_lpf_horizontal_6_dual_c(s, pitch, lfi->mblim, lfi->lim,
+                                               lfi->hev_thr, lfin->mblim,
+                                               lfin->lim, lfin->hev_thr, bd);
+          } else {
+            aom_highbd_lpf_horizontal_14_dual(s, pitch, lfi->mblim, lfi->lim,
+                                              lfi->hev_thr, lfin->mblim,
+                                              lfin->lim, lfin->hev_thr, bd);
+          }
+          count = 2;
+        } else {
+          highbd_lpf_horizontal(s, pitch, lfi->mblim, lfi->lim, lfi->hev_thr,
+                                bd);
+        }
+      } else if (mask_8x8 & 1) {
+        HbdLpfFunc highbd_lpf_horizontal =
+            plane ? aom_highbd_lpf_horizontal_6 : aom_highbd_lpf_horizontal_8;
+
+        if ((mask_8x8 & two_block_mask) == two_block_mask) {
+          if (plane) {
+            aom_highbd_lpf_horizontal_6_dual_c(s, pitch, lfi->mblim, lfi->lim,
+                                               lfi->hev_thr, lfin->mblim,
+                                               lfin->lim, lfin->hev_thr, bd);
+          } else {
+            aom_highbd_lpf_horizontal_8_dual_c(s, pitch, lfi->mblim, lfi->lim,
+                                               lfi->hev_thr, lfin->mblim,
+                                               lfin->lim, lfin->hev_thr, bd);
+          }
+          count = 2;
+        } else {
+          highbd_lpf_horizontal(s, pitch, lfi->mblim, lfi->lim, lfi->hev_thr,
+                                bd);
+        }
+      } else if (mask_4x4 & 1) {
+        if ((mask_4x4 & two_block_mask) == two_block_mask) {
+          aom_highbd_lpf_horizontal_4_dual_c(s, pitch, lfi->mblim, lfi->lim,
+                                             lfi->hev_thr, lfin->mblim,
+                                             lfin->lim, lfin->hev_thr, bd);
+          count = 2;
+        } else {
+          aom_highbd_lpf_horizontal_4(s, pitch, lfi->mblim, lfi->lim,
+                                      lfi->hev_thr, bd);
+        }
+      }
+    }
+
+    s += 4 * count;
+    lfl += step * count;
+    mask_16x16 >>= step * count;
+    mask_8x8 >>= step * count;
+    mask_4x4 >>= step * count;
+    offset += step * count;
+  }
+}
+
+void av1_build_bitmask_vert_info(
+    AV1_COMMON *const cm, const struct macroblockd_plane *const plane_ptr,
+    int plane) {
+  const int subsampling_x = plane_ptr->subsampling_x;
+  const int subsampling_y = plane_ptr->subsampling_y;
+  const int row_step = (MI_SIZE >> MI_SIZE_LOG2);
+  const int is_uv = plane > 0;
+  TX_SIZE tx_size = TX_16X16, prev_tx_size = TX_16X16;
+  uint8_t level, prev_level = 1;
+  uint64_t skip, prev_skip = 0;
+  uint64_t is_coding_block_border;
+
+  for (int r = 0; (r << MI_SIZE_LOG2) < plane_ptr->dst.height; r += row_step) {
+    const int mi_row = r << subsampling_y;
+    const int row = mi_row % MI_SIZE_64X64;
+    const int row_uv = row | subsampling_y;
+    int index = 0;
+    const int shift = get_index_shift(0, row, &index);
+
+    for (int c = 0; (c << MI_SIZE_LOG2) < plane_ptr->dst.width;
+         c += (tx_size_wide_unit[TX_64X64] >> subsampling_x)) {
+      const int mi_col = c << subsampling_x;
+      LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row, mi_col);
+
+      for (int col_in_unit = 0;
+           col_in_unit < (tx_size_wide_unit[TX_64X64] >> subsampling_x);) {
+        const int x = (c + col_in_unit) << MI_SIZE_LOG2;
+        if (x >= plane_ptr->dst.width) break;
+        const int col = col_in_unit << subsampling_x;
+        const int col_uv = col | subsampling_x;
+        const uint64_t mask = ((uint64_t)1 << (shift | col));
+        skip = lfm->skip.bits[index] & mask;
+        is_coding_block_border = lfm->is_vert_border.bits[index] & mask;
+        switch (plane) {
+          case 0: level = lfm->lfl_y_ver[row_uv][col_uv]; break;
+          case 1: level = lfm->lfl_u_ver[row_uv][col_uv]; break;
+          case 2: level = lfm->lfl_v_ver[row_uv][col_uv]; break;
+          default: assert(plane >= 0 && plane <= 2); return;
+        }
+        for (TX_SIZE ts = TX_4X4; ts <= TX_64X64; ++ts) {
+          if (is_uv && ts == TX_64X64) continue;
+          if (lfm->tx_size_ver[is_uv][ts].bits[index] & mask) {
+            tx_size = ts;
+            break;
+          }
+        }
+        if ((c + col_in_unit > 0) && (level || prev_level) &&
+            (!prev_skip || !skip || is_coding_block_border)) {
+          const TX_SIZE min_tx_size =
+              AOMMIN(TX_16X16, AOMMIN(tx_size, prev_tx_size));
+          const int shift_1 = get_index_shift(col_uv, row_uv, &index);
+          const uint64_t mask_1 = ((uint64_t)1 << shift_1);
+          switch (plane) {
+            case 0: lfm->left_y[min_tx_size].bits[index] |= mask_1; break;
+            case 1: lfm->left_u[min_tx_size].bits[index] |= mask_1; break;
+            case 2: lfm->left_v[min_tx_size].bits[index] |= mask_1; break;
+            default: assert(plane >= 0 && plane <= 2); return;
+          }
+          if (level == 0 && prev_level != 0) {
+            switch (plane) {
+              case 0: lfm->lfl_y_ver[row_uv][col_uv] = prev_level; break;
+              case 1: lfm->lfl_u_ver[row_uv][col_uv] = prev_level; break;
+              case 2: lfm->lfl_v_ver[row_uv][col_uv] = prev_level; break;
+              default: assert(plane >= 0 && plane <= 2); return;
+            }
+          }
+        }
+
+        // update prev info
+        prev_level = level;
+        prev_skip = skip;
+        prev_tx_size = tx_size;
+        // advance
+        col_in_unit += tx_size_wide_unit[tx_size];
+      }
+    }
+  }
+}
+
+void av1_build_bitmask_horz_info(
+    AV1_COMMON *const cm, const struct macroblockd_plane *const plane_ptr,
+    int plane) {
+  const int subsampling_x = plane_ptr->subsampling_x;
+  const int subsampling_y = plane_ptr->subsampling_y;
+  const int col_step = (MI_SIZE >> MI_SIZE_LOG2);
+  const int is_uv = plane > 0;
+  TX_SIZE tx_size = TX_16X16, prev_tx_size = TX_16X16;
+  uint8_t level, prev_level = 1;
+  uint64_t skip, prev_skip = 0;
+  uint64_t is_coding_block_border;
+
+  for (int c = 0; (c << MI_SIZE_LOG2) < plane_ptr->dst.width; c += col_step) {
+    const int mi_col = c << subsampling_x;
+    const int col = mi_col % MI_SIZE_64X64;
+    const int col_uv = col | subsampling_x;
+
+    for (int r = 0; (r << MI_SIZE_LOG2) < plane_ptr->dst.height;
+         r += (tx_size_high_unit[TX_64X64] >> subsampling_y)) {
+      const int mi_row = r << subsampling_y;
+      LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row, mi_col);
+
+      for (int r_in_unit = 0;
+           r_in_unit < (tx_size_high_unit[TX_64X64] >> subsampling_y);) {
+        const int y = (r + r_in_unit) << MI_SIZE_LOG2;
+        if (y >= plane_ptr->dst.height) break;
+        const int row = r_in_unit << subsampling_y;
+        const int row_uv = row | subsampling_y;
+        int index = 0;
+        const int shift = get_index_shift(col, row, &index);
+        const uint64_t mask = ((uint64_t)1 << shift);
+        skip = lfm->skip.bits[index] & mask;
+        is_coding_block_border = lfm->is_horz_border.bits[index] & mask;
+        switch (plane) {
+          case 0: level = lfm->lfl_y_hor[row_uv][col_uv]; break;
+          case 1: level = lfm->lfl_u_hor[row_uv][col_uv]; break;
+          case 2: level = lfm->lfl_v_hor[row_uv][col_uv]; break;
+          default: assert(plane >= 0 && plane <= 2); return;
+        }
+        for (TX_SIZE ts = TX_4X4; ts <= TX_64X64; ++ts) {
+          if (is_uv && ts == TX_64X64) continue;
+          if (lfm->tx_size_hor[is_uv][ts].bits[index] & mask) {
+            tx_size = ts;
+            break;
+          }
+        }
+        if ((r + r_in_unit > 0) && (level || prev_level) &&
+            (!prev_skip || !skip || is_coding_block_border)) {
+          const TX_SIZE min_tx_size =
+              AOMMIN(TX_16X16, AOMMIN(tx_size, prev_tx_size));
+          const int shift_1 = get_index_shift(col_uv, row_uv, &index);
+          const uint64_t mask_1 = ((uint64_t)1 << shift_1);
+
+          switch (plane) {
+            case 0: lfm->above_y[min_tx_size].bits[index] |= mask_1; break;
+            case 1: lfm->above_u[min_tx_size].bits[index] |= mask_1; break;
+            case 2: lfm->above_v[min_tx_size].bits[index] |= mask_1; break;
+            default: assert(plane >= 0 && plane <= 2); return;
+          }
+          if (level == 0 && prev_level != 0) {
+            switch (plane) {
+              case 0: lfm->lfl_y_hor[row_uv][col_uv] = prev_level; break;
+              case 1: lfm->lfl_u_hor[row_uv][col_uv] = prev_level; break;
+              case 2: lfm->lfl_v_hor[row_uv][col_uv] = prev_level; break;
+              default: assert(plane >= 0 && plane <= 2); return;
+            }
+          }
+        }
+
+        // update prev info
+        prev_level = level;
+        prev_skip = skip;
+        prev_tx_size = tx_size;
+        // advance
+        r_in_unit += tx_size_high_unit[tx_size];
+      }
+    }
+  }
+}
+
+void av1_filter_block_plane_bitmask_vert(
+    AV1_COMMON *const cm, struct macroblockd_plane *const plane_ptr, int pl,
+    int mi_row, int mi_col) {
+  struct buf_2d *const dst = &plane_ptr->dst;
+  uint8_t *const buf0 = dst->buf;
+  const int ssx = plane_ptr->subsampling_x;
+  const int ssy = plane_ptr->subsampling_y;
+  const int mask_cutoff = 0xffff;
+  const int row_step = 1 << ssy;
+  const int two_row_step = 2 << ssy;
+  const int row_stride = dst->stride << MI_SIZE_LOG2;
+  const int two_row_stride = row_stride << 1;
+  uint64_t mask_16x16 = 0;
+  uint64_t mask_8x8 = 0;
+  uint64_t mask_4x4 = 0;
+  uint8_t *lfl;
+  uint8_t *lfl2;
+  LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row, mi_col);
+  assert(lfm);
+
+  // 1. vertical filtering. filter two rows at a time
+  for (int r = 0;
+       ((mi_row + r) << MI_SIZE_LOG2) < cm->height && r < MI_SIZE_64X64;
+       r += two_row_step) {
+    const int row = r | ssy;
+    const int row_next = row + row_step;
+    const int col = ssx;
+    int index = 0;
+    const int shift = get_index_shift(col, row, &index);
+    int index_next = 0;
+    const int shift_next = get_index_shift(col, row_next, &index_next);
+    const int has_next_row = row_next < cm->mi_rows;
+    switch (pl) {
+      case 0:
+        mask_16x16 = lfm->left_y[TX_16X16].bits[index];
+        mask_8x8 = lfm->left_y[TX_8X8].bits[index];
+        mask_4x4 = lfm->left_y[TX_4X4].bits[index];
+        lfl = &lfm->lfl_y_ver[row][col];
+        lfl2 = &lfm->lfl_y_ver[row_next][col];
+        break;
+      case 1:
+        mask_16x16 = lfm->left_u[TX_16X16].bits[index];
+        mask_8x8 = lfm->left_u[TX_8X8].bits[index];
+        mask_4x4 = lfm->left_u[TX_4X4].bits[index];
+        lfl = &lfm->lfl_u_ver[row][col];
+        lfl2 = &lfm->lfl_u_ver[row_next][col];
+        break;
+      case 2:
+        mask_16x16 = lfm->left_v[TX_16X16].bits[index];
+        mask_8x8 = lfm->left_v[TX_8X8].bits[index];
+        mask_4x4 = lfm->left_v[TX_4X4].bits[index];
+        lfl = &lfm->lfl_v_ver[row][col];
+        lfl2 = &lfm->lfl_v_ver[row_next][col];
+        break;
+      default: assert(pl >= 0 && pl <= 2); return;
+    }
+    uint64_t mask_16x16_0 = (mask_16x16 >> shift) & mask_cutoff;
+    uint64_t mask_8x8_0 = (mask_8x8 >> shift) & mask_cutoff;
+    uint64_t mask_4x4_0 = (mask_4x4 >> shift) & mask_cutoff;
+    uint64_t mask_16x16_1 = (mask_16x16 >> shift_next) & mask_cutoff;
+    uint64_t mask_8x8_1 = (mask_8x8 >> shift_next) & mask_cutoff;
+    uint64_t mask_4x4_1 = (mask_4x4 >> shift_next) & mask_cutoff;
+    if (!has_next_row) {
+      mask_16x16_1 = 0;
+      mask_8x8_1 = 0;
+      mask_4x4_1 = 0;
+    }
+
+    if (cm->seq_params.use_highbitdepth)
+      highbd_filter_selectively_vert_row2(
+          ssx, CONVERT_TO_SHORTPTR(dst->buf), dst->stride, pl, mask_16x16_0,
+          mask_8x8_0, mask_4x4_0, mask_16x16_1, mask_8x8_1, mask_4x4_1,
+          &cm->lf_info, lfl, lfl2, (int)cm->seq_params.bit_depth);
+    else
+      filter_selectively_vert_row2(
+          ssx, dst->buf, dst->stride, pl, mask_16x16_0, mask_8x8_0, mask_4x4_0,
+          mask_16x16_1, mask_8x8_1, mask_4x4_1, &cm->lf_info, lfl, lfl2);
+    dst->buf += two_row_stride;
+  }
+  // reset buf pointer for horizontal filtering
+  dst->buf = buf0;
+}
+
+void av1_filter_block_plane_bitmask_horz(
+    AV1_COMMON *const cm, struct macroblockd_plane *const plane_ptr, int pl,
+    int mi_row, int mi_col) {
+  struct buf_2d *const dst = &plane_ptr->dst;
+  uint8_t *const buf0 = dst->buf;
+  const int ssx = plane_ptr->subsampling_x;
+  const int ssy = plane_ptr->subsampling_y;
+  const int mask_cutoff = 0xffff;
+  const int row_step = 1 << ssy;
+  const int row_stride = dst->stride << MI_SIZE_LOG2;
+  uint64_t mask_16x16 = 0;
+  uint64_t mask_8x8 = 0;
+  uint64_t mask_4x4 = 0;
+  uint8_t *lfl;
+  LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row, mi_col);
+  assert(lfm);
+  for (int r = 0;
+       ((mi_row + r) << MI_SIZE_LOG2) < cm->height && r < MI_SIZE_64X64;
+       r += row_step) {
+    if (mi_row + r == 0) {
+      dst->buf += row_stride;
+      continue;
+    }
+    const int row = r | ssy;
+    const int col = ssx;
+    int index = 0;
+    const int shift = get_index_shift(col, row, &index);
+    switch (pl) {
+      case 0:
+        mask_16x16 = lfm->above_y[TX_16X16].bits[index];
+        mask_8x8 = lfm->above_y[TX_8X8].bits[index];
+        mask_4x4 = lfm->above_y[TX_4X4].bits[index];
+        lfl = &lfm->lfl_y_hor[row][col];
+        break;
+      case 1:
+        mask_16x16 = lfm->above_u[TX_16X16].bits[index];
+        mask_8x8 = lfm->above_u[TX_8X8].bits[index];
+        mask_4x4 = lfm->above_u[TX_4X4].bits[index];
+        lfl = &lfm->lfl_u_hor[row][col];
+        break;
+      case 2:
+        mask_16x16 = lfm->above_v[TX_16X16].bits[index];
+        mask_8x8 = lfm->above_v[TX_8X8].bits[index];
+        mask_4x4 = lfm->above_v[TX_4X4].bits[index];
+        lfl = &lfm->lfl_v_hor[row][col];
+        break;
+      default: assert(pl >= 0 && pl <= 2); return;
+    }
+    mask_16x16 = (mask_16x16 >> shift) & mask_cutoff;
+    mask_8x8 = (mask_8x8 >> shift) & mask_cutoff;
+    mask_4x4 = (mask_4x4 >> shift) & mask_cutoff;
+
+    if (cm->seq_params.use_highbitdepth)
+      highbd_filter_selectively_horiz(
+          CONVERT_TO_SHORTPTR(dst->buf), dst->stride, pl, ssx, mask_16x16,
+          mask_8x8, mask_4x4, &cm->lf_info, lfl, (int)cm->seq_params.bit_depth);
+    else
+      filter_selectively_horiz(dst->buf, dst->stride, pl, ssx, mask_16x16,
+                               mask_8x8, mask_4x4, &cm->lf_info, lfl);
+    dst->buf += row_stride;
+  }
+  // reset buf pointer for next block
+  dst->buf = buf0;
+}
+
+void av1_filter_block_plane_ver(AV1_COMMON *const cm,
+                                struct macroblockd_plane *const plane_ptr,
+                                int pl, int mi_row, int mi_col) {
+  struct buf_2d *const dst = &plane_ptr->dst;
+  int r, c;
+  const int ssx = plane_ptr->subsampling_x;
+  const int ssy = plane_ptr->subsampling_y;
+  const int mask_cutoff = 0xffff;
+  const int single_step = 1 << ssy;
+  const int r_step = 2 << ssy;
+  uint64_t mask_16x16 = 0;
+  uint64_t mask_8x8 = 0;
+  uint64_t mask_4x4 = 0;
+  uint8_t *lfl;
+  uint8_t *lfl2;
+
+  // filter two rows at a time
+  for (r = 0; r < cm->seq_params.mib_size &&
+              ((mi_row + r) << MI_SIZE_LOG2 < cm->height);
+       r += r_step) {
+    for (c = 0; c < cm->seq_params.mib_size &&
+                ((mi_col + c) << MI_SIZE_LOG2 < cm->width);
+         c += MI_SIZE_64X64) {
+      dst->buf += ((c << MI_SIZE_LOG2) >> ssx);
+      LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row + r, mi_col + c);
+      assert(lfm);
+      const int row = ((mi_row + r) | ssy) % MI_SIZE_64X64;
+      const int col = ((mi_col + c) | ssx) % MI_SIZE_64X64;
+      int index = 0;
+      const int shift = get_index_shift(col, row, &index);
+      // current and next row should belong to the same mask_idx and index
+      // next row's shift
+      const int row_next = row + single_step;
+      int index_next = 0;
+      const int shift_next = get_index_shift(col, row_next, &index_next);
+      switch (pl) {
+        case 0:
+          mask_16x16 = lfm->left_y[TX_16X16].bits[index];
+          mask_8x8 = lfm->left_y[TX_8X8].bits[index];
+          mask_4x4 = lfm->left_y[TX_4X4].bits[index];
+          lfl = &lfm->lfl_y_ver[row][col];
+          lfl2 = &lfm->lfl_y_ver[row_next][col];
+          break;
+        case 1:
+          mask_16x16 = lfm->left_u[TX_16X16].bits[index];
+          mask_8x8 = lfm->left_u[TX_8X8].bits[index];
+          mask_4x4 = lfm->left_u[TX_4X4].bits[index];
+          lfl = &lfm->lfl_u_ver[row][col];
+          lfl2 = &lfm->lfl_u_ver[row_next][col];
+          break;
+        case 2:
+          mask_16x16 = lfm->left_v[TX_16X16].bits[index];
+          mask_8x8 = lfm->left_v[TX_8X8].bits[index];
+          mask_4x4 = lfm->left_v[TX_4X4].bits[index];
+          lfl = &lfm->lfl_v_ver[row][col];
+          lfl2 = &lfm->lfl_v_ver[row_next][col];
+          break;
+        default: assert(pl >= 0 && pl <= 2); return;
+      }
+      uint64_t mask_16x16_0 = (mask_16x16 >> shift) & mask_cutoff;
+      uint64_t mask_8x8_0 = (mask_8x8 >> shift) & mask_cutoff;
+      uint64_t mask_4x4_0 = (mask_4x4 >> shift) & mask_cutoff;
+      uint64_t mask_16x16_1 = (mask_16x16 >> shift_next) & mask_cutoff;
+      uint64_t mask_8x8_1 = (mask_8x8 >> shift_next) & mask_cutoff;
+      uint64_t mask_4x4_1 = (mask_4x4 >> shift_next) & mask_cutoff;
+
+      if (cm->seq_params.use_highbitdepth)
+        highbd_filter_selectively_vert_row2(
+            ssx, CONVERT_TO_SHORTPTR(dst->buf), dst->stride, pl, mask_16x16_0,
+            mask_8x8_0, mask_4x4_0, mask_16x16_1, mask_8x8_1, mask_4x4_1,
+            &cm->lf_info, lfl, lfl2, (int)cm->seq_params.bit_depth);
+      else
+        filter_selectively_vert_row2(ssx, dst->buf, dst->stride, pl,
+                                     mask_16x16_0, mask_8x8_0, mask_4x4_0,
+                                     mask_16x16_1, mask_8x8_1, mask_4x4_1,
+                                     &cm->lf_info, lfl, lfl2);
+      dst->buf -= ((c << MI_SIZE_LOG2) >> ssx);
+    }
+    dst->buf += 2 * MI_SIZE * dst->stride;
+  }
+}
+
+void av1_filter_block_plane_hor(AV1_COMMON *const cm,
+                                struct macroblockd_plane *const plane_ptr,
+                                int pl, int mi_row, int mi_col) {
+  struct buf_2d *const dst = &plane_ptr->dst;
+  int r, c;
+  const int ssx = plane_ptr->subsampling_x;
+  const int ssy = plane_ptr->subsampling_y;
+  const int mask_cutoff = 0xffff;
+  const int r_step = 1 << ssy;
+  uint64_t mask_16x16 = 0;
+  uint64_t mask_8x8 = 0;
+  uint64_t mask_4x4 = 0;
+  uint8_t *lfl;
+
+  for (r = 0; r < cm->seq_params.mib_size &&
+              ((mi_row + r) << MI_SIZE_LOG2 < cm->height);
+       r += r_step) {
+    for (c = 0; c < cm->seq_params.mib_size &&
+                ((mi_col + c) << MI_SIZE_LOG2 < cm->width);
+         c += MI_SIZE_64X64) {
+      if (mi_row + r == 0) continue;
+
+      dst->buf += ((c << MI_SIZE_LOG2) >> ssx);
+      LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row + r, mi_col + c);
+      assert(lfm);
+      const int row = ((mi_row + r) | ssy) % MI_SIZE_64X64;
+      const int col = ((mi_col + c) | ssx) % MI_SIZE_64X64;
+      int index = 0;
+      const int shift = get_index_shift(col, row, &index);
+      switch (pl) {
+        case 0:
+          mask_16x16 = lfm->above_y[TX_16X16].bits[index];
+          mask_8x8 = lfm->above_y[TX_8X8].bits[index];
+          mask_4x4 = lfm->above_y[TX_4X4].bits[index];
+          lfl = &lfm->lfl_y_hor[row][col];
+          break;
+        case 1:
+          mask_16x16 = lfm->above_u[TX_16X16].bits[index];
+          mask_8x8 = lfm->above_u[TX_8X8].bits[index];
+          mask_4x4 = lfm->above_u[TX_4X4].bits[index];
+          lfl = &lfm->lfl_u_hor[row][col];
+          break;
+        case 2:
+          mask_16x16 = lfm->above_v[TX_16X16].bits[index];
+          mask_8x8 = lfm->above_v[TX_8X8].bits[index];
+          mask_4x4 = lfm->above_v[TX_4X4].bits[index];
+          lfl = &lfm->lfl_v_hor[row][col];
+          break;
+        default: assert(pl >= 0 && pl <= 2); return;
+      }
+      mask_16x16 = (mask_16x16 >> shift) & mask_cutoff;
+      mask_8x8 = (mask_8x8 >> shift) & mask_cutoff;
+      mask_4x4 = (mask_4x4 >> shift) & mask_cutoff;
+
+      if (cm->seq_params.use_highbitdepth)
+        highbd_filter_selectively_horiz(CONVERT_TO_SHORTPTR(dst->buf),
+                                        dst->stride, pl, ssx, mask_16x16,
+                                        mask_8x8, mask_4x4, &cm->lf_info, lfl,
+                                        (int)cm->seq_params.bit_depth);
+      else
+        filter_selectively_horiz(dst->buf, dst->stride, pl, ssx, mask_16x16,
+                                 mask_8x8, mask_4x4, &cm->lf_info, lfl);
+      dst->buf -= ((c << MI_SIZE_LOG2) >> ssx);
+    }
+    dst->buf += MI_SIZE * dst->stride;
+  }
+}
+#endif  // LOOP_FILTER_BITMASK
+
+static TX_SIZE get_transform_size(const MACROBLOCKD *const xd,
+                                  const MB_MODE_INFO *const mbmi,
+                                  const EDGE_DIR edge_dir, const int mi_row,
+                                  const int mi_col, const int plane,
+                                  const struct macroblockd_plane *plane_ptr) {
+  assert(mbmi != NULL);
+  if (xd && xd->lossless[mbmi->segment_id]) return TX_4X4;
+
+  TX_SIZE tx_size =
+      (plane == AOM_PLANE_Y)
+          ? mbmi->tx_size
+          : av1_get_max_uv_txsize(mbmi->sb_type, plane_ptr->subsampling_x,
+                                  plane_ptr->subsampling_y);
+  assert(tx_size < TX_SIZES_ALL);
+  if ((plane == AOM_PLANE_Y) && is_inter_block(mbmi) && !mbmi->skip) {
+    const BLOCK_SIZE sb_type = mbmi->sb_type;
+    const int blk_row = mi_row & (mi_size_high[sb_type] - 1);
+    const int blk_col = mi_col & (mi_size_wide[sb_type] - 1);
+    const TX_SIZE mb_tx_size =
+        mbmi->inter_tx_size[av1_get_txb_size_index(sb_type, blk_row, blk_col)];
+    assert(mb_tx_size < TX_SIZES_ALL);
+    tx_size = mb_tx_size;
+  }
+
+  // since in case of chrominance or non-square transorm need to convert
+  // transform size into transform size in particular direction.
+  // for vertical edge, filter direction is horizontal, for horizontal
+  // edge, filter direction is vertical.
+  tx_size = (VERT_EDGE == edge_dir) ? txsize_horz_map[tx_size]
+                                    : txsize_vert_map[tx_size];
+  return tx_size;
+}
+
+typedef struct AV1_DEBLOCKING_PARAMETERS {
+  // length of the filter applied to the outer edge
+  uint32_t filter_length;
+  // deblocking limits
+  const uint8_t *lim;
+  const uint8_t *mblim;
+  const uint8_t *hev_thr;
+} AV1_DEBLOCKING_PARAMETERS;
+
+// Return TX_SIZE from get_transform_size(), so it is plane and direction
+// awared
+static TX_SIZE set_lpf_parameters(
+    AV1_DEBLOCKING_PARAMETERS *const params, const ptrdiff_t mode_step,
+    const AV1_COMMON *const cm, const MACROBLOCKD *const xd,
+    const EDGE_DIR edge_dir, const uint32_t x, const uint32_t y,
+    const int plane, const struct macroblockd_plane *const plane_ptr) {
+  // reset to initial values
+  params->filter_length = 0;
+
+  // no deblocking is required
+  const uint32_t width = plane_ptr->dst.width;
+  const uint32_t height = plane_ptr->dst.height;
+  if ((width <= x) || (height <= y)) {
+    // just return the smallest transform unit size
+    return TX_4X4;
+  }
+
+  const uint32_t scale_horz = plane_ptr->subsampling_x;
+  const uint32_t scale_vert = plane_ptr->subsampling_y;
+  // for sub8x8 block, chroma prediction mode is obtained from the bottom/right
+  // mi structure of the co-located 8x8 luma block. so for chroma plane, mi_row
+  // and mi_col should map to the bottom/right mi structure, i.e, both mi_row
+  // and mi_col should be odd number for chroma plane.
+  const int mi_row = scale_vert | ((y << scale_vert) >> MI_SIZE_LOG2);
+  const int mi_col = scale_horz | ((x << scale_horz) >> MI_SIZE_LOG2);
+  MB_MODE_INFO **mi = cm->mi_grid_visible + mi_row * cm->mi_stride + mi_col;
+  const MB_MODE_INFO *mbmi = mi[0];
+  // If current mbmi is not correctly setup, return an invalid value to stop
+  // filtering. One example is that if this tile is not coded, then its mbmi
+  // it not set up.
+  if (mbmi == NULL) return TX_INVALID;
+
+  const TX_SIZE ts =
+      get_transform_size(xd, mi[0], edge_dir, mi_row, mi_col, plane, plane_ptr);
+
+  {
+    const uint32_t coord = (VERT_EDGE == edge_dir) ? (x) : (y);
+    const uint32_t transform_masks =
+        edge_dir == VERT_EDGE ? tx_size_wide[ts] - 1 : tx_size_high[ts] - 1;
+    const int32_t tu_edge = (coord & transform_masks) ? (0) : (1);
+
+    if (!tu_edge) return ts;
+
+    // prepare outer edge parameters. deblock the edge if it's an edge of a TU
+    {
+      const uint32_t curr_level =
+          get_filter_level(cm, &cm->lf_info, edge_dir, plane, mbmi);
+      const int curr_skipped = mbmi->skip && is_inter_block(mbmi);
+      uint32_t level = curr_level;
+      if (coord) {
+        {
+          const MB_MODE_INFO *const mi_prev = *(mi - mode_step);
+          if (mi_prev == NULL) return TX_INVALID;
+          const int pv_row =
+              (VERT_EDGE == edge_dir) ? (mi_row) : (mi_row - (1 << scale_vert));
+          const int pv_col =
+              (VERT_EDGE == edge_dir) ? (mi_col - (1 << scale_horz)) : (mi_col);
+          const TX_SIZE pv_ts = get_transform_size(
+              xd, mi_prev, edge_dir, pv_row, pv_col, plane, plane_ptr);
+
+          const uint32_t pv_lvl =
+              get_filter_level(cm, &cm->lf_info, edge_dir, plane, mi_prev);
+
+          const int pv_skip = mi_prev->skip && is_inter_block(mi_prev);
+          const BLOCK_SIZE bsize =
+              get_plane_block_size(mbmi->sb_type, plane_ptr->subsampling_x,
+                                   plane_ptr->subsampling_y);
+          const int prediction_masks = edge_dir == VERT_EDGE
+                                           ? block_size_wide[bsize] - 1
+                                           : block_size_high[bsize] - 1;
+          const int32_t pu_edge = !(coord & prediction_masks);
+          // if the current and the previous blocks are skipped,
+          // deblock the edge if the edge belongs to a PU's edge only.
+          if ((curr_level || pv_lvl) &&
+              (!pv_skip || !curr_skipped || pu_edge)) {
+            const TX_SIZE min_ts = AOMMIN(ts, pv_ts);
+            if (TX_4X4 >= min_ts) {
+              params->filter_length = 4;
+            } else if (TX_8X8 == min_ts) {
+              if (plane != 0)
+                params->filter_length = 6;
+              else
+                params->filter_length = 8;
+            } else {
+              params->filter_length = 14;
+              // No wide filtering for chroma plane
+              if (plane != 0) {
+                params->filter_length = 6;
+              }
+            }
+
+            // update the level if the current block is skipped,
+            // but the previous one is not
+            level = (curr_level) ? (curr_level) : (pv_lvl);
+          }
+        }
+      }
+      // prepare common parameters
+      if (params->filter_length) {
+        const loop_filter_thresh *const limits = cm->lf_info.lfthr + level;
+        params->lim = limits->lim;
+        params->mblim = limits->mblim;
+        params->hev_thr = limits->hev_thr;
+      }
+    }
+  }
+
+  return ts;
+}
+
+void av1_filter_block_plane_vert(const AV1_COMMON *const cm,
+                                 const MACROBLOCKD *const xd, const int plane,
+                                 const MACROBLOCKD_PLANE *const plane_ptr,
+                                 const uint32_t mi_row, const uint32_t mi_col) {
+  const int row_step = MI_SIZE >> MI_SIZE_LOG2;
+  const uint32_t scale_horz = plane_ptr->subsampling_x;
+  const uint32_t scale_vert = plane_ptr->subsampling_y;
+  uint8_t *const dst_ptr = plane_ptr->dst.buf;
+  const int dst_stride = plane_ptr->dst.stride;
+  const int y_range = (MAX_MIB_SIZE >> scale_vert);
+  const int x_range = (MAX_MIB_SIZE >> scale_horz);
+  const int use_highbitdepth = cm->seq_params.use_highbitdepth;
+  const aom_bit_depth_t bit_depth = cm->seq_params.bit_depth;
+  for (int y = 0; y < y_range; y += row_step) {
+    uint8_t *p = dst_ptr + y * MI_SIZE * dst_stride;
+    for (int x = 0; x < x_range;) {
+      // inner loop always filter vertical edges in a MI block. If MI size
+      // is 8x8, it will filter the vertical edge aligned with a 8x8 block.
+      // If 4x4 trasnform is used, it will then filter the internal edge
+      //  aligned with a 4x4 block
+      const uint32_t curr_x = ((mi_col * MI_SIZE) >> scale_horz) + x * MI_SIZE;
+      const uint32_t curr_y = ((mi_row * MI_SIZE) >> scale_vert) + y * MI_SIZE;
+      uint32_t advance_units;
+      TX_SIZE tx_size;
+      AV1_DEBLOCKING_PARAMETERS params;
+      memset(&params, 0, sizeof(params));
+
+      tx_size =
+          set_lpf_parameters(&params, ((ptrdiff_t)1 << scale_horz), cm, xd,
+                             VERT_EDGE, curr_x, curr_y, plane, plane_ptr);
+      if (tx_size == TX_INVALID) {
+        params.filter_length = 0;
+        tx_size = TX_4X4;
+      }
+
+      switch (params.filter_length) {
+        // apply 4-tap filtering
+        case 4:
+          if (use_highbitdepth)
+            aom_highbd_lpf_vertical_4(CONVERT_TO_SHORTPTR(p), dst_stride,
+                                      params.mblim, params.lim, params.hev_thr,
+                                      bit_depth);
+          else
+            aom_lpf_vertical_4(p, dst_stride, params.mblim, params.lim,
+                               params.hev_thr);
+          break;
+        case 6:  // apply 6-tap filter for chroma plane only
+          assert(plane != 0);
+          if (use_highbitdepth)
+            aom_highbd_lpf_vertical_6(CONVERT_TO_SHORTPTR(p), dst_stride,
+                                      params.mblim, params.lim, params.hev_thr,
+                                      bit_depth);
+          else
+            aom_lpf_vertical_6(p, dst_stride, params.mblim, params.lim,
+                               params.hev_thr);
+          break;
+        // apply 8-tap filtering
+        case 8:
+          if (use_highbitdepth)
+            aom_highbd_lpf_vertical_8(CONVERT_TO_SHORTPTR(p), dst_stride,
+                                      params.mblim, params.lim, params.hev_thr,
+                                      bit_depth);
+          else
+            aom_lpf_vertical_8(p, dst_stride, params.mblim, params.lim,
+                               params.hev_thr);
+          break;
+        // apply 14-tap filtering
+        case 14:
+          if (use_highbitdepth)
+            aom_highbd_lpf_vertical_14(CONVERT_TO_SHORTPTR(p), dst_stride,
+                                       params.mblim, params.lim, params.hev_thr,
+                                       bit_depth);
+          else
+            aom_lpf_vertical_14(p, dst_stride, params.mblim, params.lim,
+                                params.hev_thr);
+          break;
+        // no filtering
+        default: break;
+      }
+      // advance the destination pointer
+      advance_units = tx_size_wide_unit[tx_size];
+      x += advance_units;
+      p += advance_units * MI_SIZE;
+    }
+  }
+}
+
+void av1_filter_block_plane_horz(const AV1_COMMON *const cm,
+                                 const MACROBLOCKD *const xd, const int plane,
+                                 const MACROBLOCKD_PLANE *const plane_ptr,
+                                 const uint32_t mi_row, const uint32_t mi_col) {
+  const int col_step = MI_SIZE >> MI_SIZE_LOG2;
+  const uint32_t scale_horz = plane_ptr->subsampling_x;
+  const uint32_t scale_vert = plane_ptr->subsampling_y;
+  uint8_t *const dst_ptr = plane_ptr->dst.buf;
+  const int dst_stride = plane_ptr->dst.stride;
+  const int y_range = (MAX_MIB_SIZE >> scale_vert);
+  const int x_range = (MAX_MIB_SIZE >> scale_horz);
+  const int use_highbitdepth = cm->seq_params.use_highbitdepth;
+  const aom_bit_depth_t bit_depth = cm->seq_params.bit_depth;
+  for (int x = 0; x < x_range; x += col_step) {
+    uint8_t *p = dst_ptr + x * MI_SIZE;
+    for (int y = 0; y < y_range;) {
+      // inner loop always filter vertical edges in a MI block. If MI size
+      // is 8x8, it will first filter the vertical edge aligned with a 8x8
+      // block. If 4x4 trasnform is used, it will then filter the internal
+      // edge aligned with a 4x4 block
+      const uint32_t curr_x = ((mi_col * MI_SIZE) >> scale_horz) + x * MI_SIZE;
+      const uint32_t curr_y = ((mi_row * MI_SIZE) >> scale_vert) + y * MI_SIZE;
+      uint32_t advance_units;
+      TX_SIZE tx_size;
+      AV1_DEBLOCKING_PARAMETERS params;
+      memset(&params, 0, sizeof(params));
+
+      tx_size =
+          set_lpf_parameters(&params, (cm->mi_stride << scale_vert), cm, xd,
+                             HORZ_EDGE, curr_x, curr_y, plane, plane_ptr);
+      if (tx_size == TX_INVALID) {
+        params.filter_length = 0;
+        tx_size = TX_4X4;
+      }
+
+      switch (params.filter_length) {
+        // apply 4-tap filtering
+        case 4:
+          if (use_highbitdepth)
+            aom_highbd_lpf_horizontal_4(CONVERT_TO_SHORTPTR(p), dst_stride,
+                                        params.mblim, params.lim,
+                                        params.hev_thr, bit_depth);
+          else
+            aom_lpf_horizontal_4(p, dst_stride, params.mblim, params.lim,
+                                 params.hev_thr);
+          break;
+        // apply 6-tap filtering
+        case 6:
+          assert(plane != 0);
+          if (use_highbitdepth)
+            aom_highbd_lpf_horizontal_6(CONVERT_TO_SHORTPTR(p), dst_stride,
+                                        params.mblim, params.lim,
+                                        params.hev_thr, bit_depth);
+          else
+            aom_lpf_horizontal_6(p, dst_stride, params.mblim, params.lim,
+                                 params.hev_thr);
+          break;
+        // apply 8-tap filtering
+        case 8:
+          if (use_highbitdepth)
+            aom_highbd_lpf_horizontal_8(CONVERT_TO_SHORTPTR(p), dst_stride,
+                                        params.mblim, params.lim,
+                                        params.hev_thr, bit_depth);
+          else
+            aom_lpf_horizontal_8(p, dst_stride, params.mblim, params.lim,
+                                 params.hev_thr);
+          break;
+        // apply 14-tap filtering
+        case 14:
+          if (use_highbitdepth)
+            aom_highbd_lpf_horizontal_14(CONVERT_TO_SHORTPTR(p), dst_stride,
+                                         params.mblim, params.lim,
+                                         params.hev_thr, bit_depth);
+          else
+            aom_lpf_horizontal_14(p, dst_stride, params.mblim, params.lim,
+                                  params.hev_thr);
+          break;
+        // no filtering
+        default: break;
+      }
+
+      // advance the destination pointer
+      advance_units = tx_size_high_unit[tx_size];
+      y += advance_units;
+      p += advance_units * dst_stride * MI_SIZE;
+    }
+  }
+}
+
+void av1_filter_block_plane_vert_test(const AV1_COMMON *const cm,
+                                      const MACROBLOCKD *const xd,
+                                      const int plane,
+                                      const MACROBLOCKD_PLANE *const plane_ptr,
+                                      const uint32_t mi_row,
+                                      const uint32_t mi_col) {
+  const int row_step = MI_SIZE >> MI_SIZE_LOG2;
+  const uint32_t scale_horz = plane_ptr->subsampling_x;
+  const uint32_t scale_vert = plane_ptr->subsampling_y;
+  uint8_t *const dst_ptr = plane_ptr->dst.buf;
+  const int dst_stride = plane_ptr->dst.stride;
+  const int y_range = cm->mi_rows >> scale_vert;
+  const int x_range = cm->mi_cols >> scale_horz;
+  for (int y = 0; y < y_range; y += row_step) {
+    uint8_t *p = dst_ptr + y * MI_SIZE * dst_stride;
+    for (int x = 0; x < x_range;) {
+      // inner loop always filter vertical edges in a MI block. If MI size
+      // is 8x8, it will filter the vertical edge aligned with a 8x8 block.
+      // If 4x4 trasnform is used, it will then filter the internal edge
+      //  aligned with a 4x4 block
+      const uint32_t curr_x = ((mi_col * MI_SIZE) >> scale_horz) + x * MI_SIZE;
+      const uint32_t curr_y = ((mi_row * MI_SIZE) >> scale_vert) + y * MI_SIZE;
+      uint32_t advance_units;
+      TX_SIZE tx_size;
+      AV1_DEBLOCKING_PARAMETERS params;
+      memset(&params, 0, sizeof(params));
+
+      tx_size =
+          set_lpf_parameters(&params, ((ptrdiff_t)1 << scale_horz), cm, xd,
+                             VERT_EDGE, curr_x, curr_y, plane, plane_ptr);
+      if (tx_size == TX_INVALID) {
+        params.filter_length = 0;
+        tx_size = TX_4X4;
+      }
+
+      // advance the destination pointer
+      advance_units = tx_size_wide_unit[tx_size];
+      x += advance_units;
+      p += advance_units * MI_SIZE;
+    }
+  }
+}
+
+void av1_filter_block_plane_horz_test(const AV1_COMMON *const cm,
+                                      const MACROBLOCKD *const xd,
+                                      const int plane,
+                                      const MACROBLOCKD_PLANE *const plane_ptr,
+                                      const uint32_t mi_row,
+                                      const uint32_t mi_col) {
+  const int col_step = MI_SIZE >> MI_SIZE_LOG2;
+  const uint32_t scale_horz = plane_ptr->subsampling_x;
+  const uint32_t scale_vert = plane_ptr->subsampling_y;
+  uint8_t *const dst_ptr = plane_ptr->dst.buf;
+  const int dst_stride = plane_ptr->dst.stride;
+  const int y_range = cm->mi_rows >> scale_vert;
+  const int x_range = cm->mi_cols >> scale_horz;
+  for (int x = 0; x < x_range; x += col_step) {
+    uint8_t *p = dst_ptr + x * MI_SIZE;
+    for (int y = 0; y < y_range;) {
+      // inner loop always filter vertical edges in a MI block. If MI size
+      // is 8x8, it will first filter the vertical edge aligned with a 8x8
+      // block. If 4x4 trasnform is used, it will then filter the internal
+      // edge aligned with a 4x4 block
+      const uint32_t curr_x = ((mi_col * MI_SIZE) >> scale_horz) + x * MI_SIZE;
+      const uint32_t curr_y = ((mi_row * MI_SIZE) >> scale_vert) + y * MI_SIZE;
+      uint32_t advance_units;
+      TX_SIZE tx_size;
+      AV1_DEBLOCKING_PARAMETERS params;
+      memset(&params, 0, sizeof(params));
+
+      tx_size =
+          set_lpf_parameters(&params, (cm->mi_stride << scale_vert), cm, xd,
+                             HORZ_EDGE, curr_x, curr_y, plane, plane_ptr);
+      if (tx_size == TX_INVALID) {
+        params.filter_length = 0;
+        tx_size = TX_4X4;
+      }
+
+      // advance the destination pointer
+      advance_units = tx_size_high_unit[tx_size];
+      y += advance_units;
+      p += advance_units * dst_stride * MI_SIZE;
+    }
+  }
+}
+
+static void loop_filter_rows(YV12_BUFFER_CONFIG *frame_buffer, AV1_COMMON *cm,
+                             MACROBLOCKD *xd, int start, int stop,
+#if LOOP_FILTER_BITMASK
+                             int is_decoding,
+#endif
+                             int plane_start, int plane_end) {
+  struct macroblockd_plane *pd = xd->plane;
+  const int col_start = 0;
+  const int col_end = cm->mi_cols;
+  int mi_row, mi_col;
+  int plane;
+
+#if LOOP_FILTER_BITMASK
+  if (is_decoding) {
+    cm->is_decoding = is_decoding;
+    for (plane = plane_start; plane < plane_end; plane++) {
+      if (plane == 0 && !(cm->lf.filter_level[0]) && !(cm->lf.filter_level[1]))
+        break;
+      else if (plane == 1 && !(cm->lf.filter_level_u))
+        continue;
+      else if (plane == 2 && !(cm->lf.filter_level_v))
+        continue;
+
+      av1_setup_dst_planes(pd, cm->seq_params.sb_size, frame_buffer, 0, 0,
+                           plane, plane + 1);
+
+      av1_build_bitmask_vert_info(cm, &pd[plane], plane);
+      av1_build_bitmask_horz_info(cm, &pd[plane], plane);
+
+      // apply loop filtering which only goes through buffer once
+      for (mi_row = start; mi_row < stop; mi_row += MI_SIZE_64X64) {
+        for (mi_col = col_start; mi_col < col_end; mi_col += MI_SIZE_64X64) {
+          av1_setup_dst_planes(pd, BLOCK_64X64, frame_buffer, mi_row, mi_col,
+                               plane, plane + 1);
+          av1_filter_block_plane_bitmask_vert(cm, &pd[plane], plane, mi_row,
+                                              mi_col);
+          if (mi_col - MI_SIZE_64X64 >= 0) {
+            av1_setup_dst_planes(pd, BLOCK_64X64, frame_buffer, mi_row,
+                                 mi_col - MI_SIZE_64X64, plane, plane + 1);
+            av1_filter_block_plane_bitmask_horz(cm, &pd[plane], plane, mi_row,
+                                                mi_col - MI_SIZE_64X64);
+          }
+        }
+        av1_setup_dst_planes(pd, BLOCK_64X64, frame_buffer, mi_row,
+                             mi_col - MI_SIZE_64X64, plane, plane + 1);
+        av1_filter_block_plane_bitmask_horz(cm, &pd[plane], plane, mi_row,
+                                            mi_col - MI_SIZE_64X64);
+      }
+    }
+    return;
+  }
+#endif
+
+  for (plane = plane_start; plane < plane_end; plane++) {
+    if (plane == 0 && !(cm->lf.filter_level[0]) && !(cm->lf.filter_level[1]))
+      break;
+    else if (plane == 1 && !(cm->lf.filter_level_u))
+      continue;
+    else if (plane == 2 && !(cm->lf.filter_level_v))
+      continue;
+
+    if (cm->lf.combine_vert_horz_lf) {
+      // filter all vertical and horizontal edges in every 128x128 super block
+      for (mi_row = start; mi_row < stop; mi_row += MAX_MIB_SIZE) {
+        for (mi_col = col_start; mi_col < col_end; mi_col += MAX_MIB_SIZE) {
+          // filter vertical edges
+          av1_setup_dst_planes(pd, cm->seq_params.sb_size, frame_buffer, mi_row,
+                               mi_col, plane, plane + 1);
+          av1_filter_block_plane_vert(cm, xd, plane, &pd[plane], mi_row,
+                                      mi_col);
+          // filter horizontal edges
+          if (mi_col - MAX_MIB_SIZE >= 0) {
+            av1_setup_dst_planes(pd, cm->seq_params.sb_size, frame_buffer,
+                                 mi_row, mi_col - MAX_MIB_SIZE, plane,
+                                 plane + 1);
+            av1_filter_block_plane_horz(cm, xd, plane, &pd[plane], mi_row,
+                                        mi_col - MAX_MIB_SIZE);
+          }
+        }
+        // filter horizontal edges
+        av1_setup_dst_planes(pd, cm->seq_params.sb_size, frame_buffer, mi_row,
+                             mi_col - MAX_MIB_SIZE, plane, plane + 1);
+        av1_filter_block_plane_horz(cm, xd, plane, &pd[plane], mi_row,
+                                    mi_col - MAX_MIB_SIZE);
+      }
+    } else {
+      // filter all vertical edges in every 128x128 super block
+      for (mi_row = start; mi_row < stop; mi_row += MAX_MIB_SIZE) {
+        for (mi_col = col_start; mi_col < col_end; mi_col += MAX_MIB_SIZE) {
+          av1_setup_dst_planes(pd, cm->seq_params.sb_size, frame_buffer, mi_row,
+                               mi_col, plane, plane + 1);
+          av1_filter_block_plane_vert(cm, xd, plane, &pd[plane], mi_row,
+                                      mi_col);
+        }
+      }
+
+      // filter all horizontal edges in every 128x128 super block
+      for (mi_row = start; mi_row < stop; mi_row += MAX_MIB_SIZE) {
+        for (mi_col = col_start; mi_col < col_end; mi_col += MAX_MIB_SIZE) {
+          av1_setup_dst_planes(pd, cm->seq_params.sb_size, frame_buffer, mi_row,
+                               mi_col, plane, plane + 1);
+          av1_filter_block_plane_horz(cm, xd, plane, &pd[plane], mi_row,
+                                      mi_col);
+        }
+      }
+    }
+  }
+}
+
+void av1_loop_filter_frame(YV12_BUFFER_CONFIG *frame, AV1_COMMON *cm,
+                           MACROBLOCKD *xd,
+#if LOOP_FILTER_BITMASK
+                           int is_decoding,
+#endif
+                           int plane_start, int plane_end, int partial_frame) {
+  int start_mi_row, end_mi_row, mi_rows_to_filter;
+
+  start_mi_row = 0;
+  mi_rows_to_filter = cm->mi_rows;
+  if (partial_frame && cm->mi_rows > 8) {
+    start_mi_row = cm->mi_rows >> 1;
+    start_mi_row &= 0xfffffff8;
+    mi_rows_to_filter = AOMMAX(cm->mi_rows / 8, 8);
+  }
+  end_mi_row = start_mi_row + mi_rows_to_filter;
+  av1_loop_filter_frame_init(cm, plane_start, plane_end);
+  loop_filter_rows(frame, cm, xd, start_mi_row, end_mi_row,
+#if LOOP_FILTER_BITMASK
+                   is_decoding,
+#endif
+                   plane_start, plane_end);
+}

diff --git a/libav1/av1/common/av1_loopfilter.h b/libav1/av1/common/av1_loopfilter.h
new file mode 100644
index 0000000..ae4d372
--- /dev/null
+++ b/libav1/av1/common/av1_loopfilter.h

@@ -0,0 +1,216 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_AV1_LOOPFILTER_H_
+#define AOM_AV1_COMMON_AV1_LOOPFILTER_H_
+
+#include "config/aom_config.h"
+
+#include "aom_ports/mem.h"
+#include "av1/common/blockd.h"
+#include "av1/common/seg_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_LOOP_FILTER 63
+#define MAX_SHARPNESS 7
+
+#define SIMD_WIDTH 16
+
+enum lf_path {
+  LF_PATH_420,
+  LF_PATH_444,
+  LF_PATH_SLOW,
+};
+
+typedef struct {
+  uint64_t bits[4];
+} FilterMask;
+
+#if LOOP_FILTER_BITMASK
+// This structure holds bit masks for all 4x4 blocks in a 64x64 region.
+// Each 1 bit represents a position in which we want to apply the loop filter.
+// For Y plane, 4x4 in 64x64 requires 16x16 = 256 bit, therefore we use 4
+// uint64_t; For U, V plane, for 420 format, plane size is 32x32, thus we use
+// a uint64_t to represent bitmask.
+// Left_ entries refer to whether we apply a filter on the border to the
+// left of the block.   Above_ entries refer to whether or not to apply a
+// filter on the above border.
+// Since each transform is accompanied by a potentially different type of
+// loop filter there is a different entry in the array for each transform size.
+typedef struct {
+  FilterMask left_y[TX_SIZES];
+  FilterMask above_y[TX_SIZES];
+  FilterMask left_u[TX_SIZES];
+  FilterMask above_u[TX_SIZES];
+  FilterMask left_v[TX_SIZES];
+  FilterMask above_v[TX_SIZES];
+
+  // Y plane vertical edge and horizontal edge filter level
+  uint8_t lfl_y_hor[MI_SIZE_64X64][MI_SIZE_64X64];
+  uint8_t lfl_y_ver[MI_SIZE_64X64][MI_SIZE_64X64];
+
+  // U plane filter level
+  uint8_t lfl_u_ver[MI_SIZE_64X64][MI_SIZE_64X64];
+  uint8_t lfl_u_hor[MI_SIZE_64X64][MI_SIZE_64X64];
+
+  // V plane filter level
+  uint8_t lfl_v_ver[MI_SIZE_64X64][MI_SIZE_64X64];
+  uint8_t lfl_v_hor[MI_SIZE_64X64][MI_SIZE_64X64];
+
+  // other info
+  FilterMask skip;
+  FilterMask is_vert_border;
+  FilterMask is_horz_border;
+  // Y or UV planes, 5 tx sizes: 4x4, 8x8, 16x16, 32x32, 64x64
+  FilterMask tx_size_ver[2][5];
+  FilterMask tx_size_hor[2][5];
+} LoopFilterMask;
+#endif  // LOOP_FILTER_BITMASK
+
+struct loopfilter {
+  int filter_level[2];
+  int filter_level_u;
+  int filter_level_v;
+
+  int sharpness_level;
+
+  uint8_t mode_ref_delta_enabled;
+  uint8_t mode_ref_delta_update;
+
+  // 0 = Intra, Last, Last2+Last3,
+  // GF, BRF, ARF2, ARF
+  int8_t ref_deltas[REF_FRAMES];
+
+  // 0 = ZERO_MV, MV
+  int8_t mode_deltas[MAX_MODE_LF_DELTAS];
+
+  int combine_vert_horz_lf;
+
+#if LOOP_FILTER_BITMASK
+  LoopFilterMask *lfm;
+  size_t lfm_num;
+  int lfm_stride;
+#endif  // LOOP_FILTER_BITMASK
+};
+
+// Need to align this structure so when it is declared and
+// passed it can be loaded into vector registers.
+typedef struct {
+  DECLARE_ALIGNED(SIMD_WIDTH, uint8_t, mblim[SIMD_WIDTH]);
+  DECLARE_ALIGNED(SIMD_WIDTH, uint8_t, lim[SIMD_WIDTH]);
+  DECLARE_ALIGNED(SIMD_WIDTH, uint8_t, hev_thr[SIMD_WIDTH]);
+} loop_filter_thresh;
+
+typedef struct {
+  loop_filter_thresh lfthr[MAX_LOOP_FILTER + 1];
+  uint8_t lvl[MAX_MB_PLANE][MAX_SEGMENTS][2][REF_FRAMES][MAX_MODE_LF_DELTAS];
+} loop_filter_info_n;
+
+/* assorted loopfilter functions which get used elsewhere */
+struct AV1Common;
+struct macroblockd;
+struct AV1LfSyncData;
+
+void av1_loop_filter_init(struct AV1Common *cm);
+
+void av1_loop_filter_frame_init(struct AV1Common *cm, int plane_start,
+                                int plane_end);
+
+#if LOOP_FILTER_BITMASK
+void av1_loop_filter_frame(YV12_BUFFER_CONFIG *frame, struct AV1Common *cm,
+                           struct macroblockd *mbd, int is_decoding,
+                           int plane_start, int plane_end, int partial_frame);
+#else
+void av1_loop_filter_frame(YV12_BUFFER_CONFIG *frame, struct AV1Common *cm,
+                           struct macroblockd *mbd, int plane_start,
+                           int plane_end, int partial_frame);
+#endif
+
+void av1_filter_block_plane_vert(const struct AV1Common *const cm,
+                                 const MACROBLOCKD *const xd, const int plane,
+                                 const MACROBLOCKD_PLANE *const plane_ptr,
+                                 const uint32_t mi_row, const uint32_t mi_col);
+
+void av1_filter_block_plane_horz(const struct AV1Common *const cm,
+                                 const MACROBLOCKD *const xd, const int plane,
+                                 const MACROBLOCKD_PLANE *const plane_ptr,
+                                 const uint32_t mi_row, const uint32_t mi_col);
+
+typedef struct LoopFilterWorkerData {
+  YV12_BUFFER_CONFIG *frame_buffer;
+  struct AV1Common *cm;
+  struct macroblockd_plane planes[MAX_MB_PLANE];
+  // TODO(Ranjit): When the filter functions are modified to use xd->lossless
+  // add lossless as a member here.
+  MACROBLOCKD *xd;
+} LFWorkerData;
+
+uint8_t get_filter_level(const struct AV1Common *cm,
+                         const loop_filter_info_n *lfi_n, const int dir_idx,
+                         int plane, const MB_MODE_INFO *mbmi);
+#if LOOP_FILTER_BITMASK
+void av1_setup_bitmask(struct AV1Common *const cm, int mi_row, int mi_col,
+                       int plane, int subsampling_x, int subsampling_y,
+                       int row_end, int col_end);
+
+void av1_filter_block_plane_ver(struct AV1Common *const cm,
+                                struct macroblockd_plane *const plane_ptr,
+                                int pl, int mi_row, int mi_col);
+
+void av1_filter_block_plane_hor(struct AV1Common *const cm,
+                                struct macroblockd_plane *const plane, int pl,
+                                int mi_row, int mi_col);
+LoopFilterMask *get_loop_filter_mask(const struct AV1Common *const cm,
+                                     int mi_row, int mi_col);
+int get_index_shift(int mi_col, int mi_row, int *index);
+
+void av1_build_bitmask_vert_info(
+    struct AV1Common *const cm, const struct macroblockd_plane *const plane_ptr,
+    int plane);
+
+void av1_build_bitmask_horz_info(
+    struct AV1Common *const cm, const struct macroblockd_plane *const plane_ptr,
+    int plane);
+
+void av1_filter_block_plane_bitmask_vert(
+    struct AV1Common *const cm, struct macroblockd_plane *const plane_ptr,
+    int pl, int mi_row, int mi_col);
+
+void av1_filter_block_plane_bitmask_horz(
+    struct AV1Common *const cm, struct macroblockd_plane *const plane_ptr,
+    int pl, int mi_row, int mi_col);
+
+#endif  // LOOP_FILTER_BITMASK
+
+extern const int mask_id_table_tx_4x4[BLOCK_SIZES_ALL];
+
+extern const int mask_id_table_tx_8x8[BLOCK_SIZES_ALL];
+
+extern const int mask_id_table_tx_16x16[BLOCK_SIZES_ALL];
+
+extern const int mask_id_table_tx_32x32[BLOCK_SIZES_ALL];
+
+// corresponds to entry id in table left_mask_univariant_reordered,
+// of block size mxn and TX_mxn.
+extern const int mask_id_table_vert_border[BLOCK_SIZES_ALL];
+
+extern const FilterMask left_mask_univariant_reordered[67];
+
+extern const FilterMask above_mask_univariant_reordered[67];
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_AV1_LOOPFILTER_H_

diff --git a/libav1/av1/common/av1_rtcd.c b/libav1/av1/common/av1_rtcd.c
new file mode 100644
index 0000000..a77a4d2
--- /dev/null
+++ b/libav1/av1/common/av1_rtcd.c

@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#include "config/aom_config.h"
+
+#define RTCD_C
+#include "config/av1_rtcd.h"
+
+#include "aom_ports/aom_once.h"
+
+void av1_rtcd() {
+  // TODO(JBB): Remove this aom_once, by insuring that both the encoder and
+  // decoder setup functions are protected by aom_once();
+  aom_once(setup_rtcd_internal);
+}

diff --git a/libav1/av1/common/av1_txfm.c b/libav1/av1/common/av1_txfm.c
new file mode 100644
index 0000000..ac43402
--- /dev/null
+++ b/libav1/av1/common/av1_txfm.c

@@ -0,0 +1,161 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "config/aom_dsp_rtcd.h"
+#include "config/av1_rtcd.h"
+
+#include "av1/common/av1_txfm.h"
+
+// av1_cospi_arr[i][j] = (int)round(cos(PI*j/128) * (1<<(cos_bit_min+i)));
+const int32_t av1_cospi_arr_data[7][64] = {
+  { 1024, 1024, 1023, 1021, 1019, 1016, 1013, 1009, 1004, 999, 993, 987, 980,
+    972,  964,  955,  946,  936,  926,  915,  903,  891,  878, 865, 851, 837,
+    822,  807,  792,  775,  759,  742,  724,  706,  688,  669, 650, 630, 610,
+    590,  569,  548,  526,  505,  483,  460,  438,  415,  392, 369, 345, 321,
+    297,  273,  249,  224,  200,  175,  150,  125,  100,  75,  50,  25 },
+  { 2048, 2047, 2046, 2042, 2038, 2033, 2026, 2018, 2009, 1998, 1987,
+    1974, 1960, 1945, 1928, 1911, 1892, 1872, 1851, 1829, 1806, 1782,
+    1757, 1730, 1703, 1674, 1645, 1615, 1583, 1551, 1517, 1483, 1448,
+    1412, 1375, 1338, 1299, 1260, 1220, 1179, 1138, 1096, 1053, 1009,
+    965,  921,  876,  830,  784,  737,  690,  642,  595,  546,  498,
+    449,  400,  350,  301,  251,  201,  151,  100,  50 },
+  { 4096, 4095, 4091, 4085, 4076, 4065, 4052, 4036, 4017, 3996, 3973,
+    3948, 3920, 3889, 3857, 3822, 3784, 3745, 3703, 3659, 3612, 3564,
+    3513, 3461, 3406, 3349, 3290, 3229, 3166, 3102, 3035, 2967, 2896,
+    2824, 2751, 2675, 2598, 2520, 2440, 2359, 2276, 2191, 2106, 2019,
+    1931, 1842, 1751, 1660, 1567, 1474, 1380, 1285, 1189, 1092, 995,
+    897,  799,  700,  601,  501,  401,  301,  201,  101 },
+  { 8192, 8190, 8182, 8170, 8153, 8130, 8103, 8071, 8035, 7993, 7946,
+    7895, 7839, 7779, 7713, 7643, 7568, 7489, 7405, 7317, 7225, 7128,
+    7027, 6921, 6811, 6698, 6580, 6458, 6333, 6203, 6070, 5933, 5793,
+    5649, 5501, 5351, 5197, 5040, 4880, 4717, 4551, 4383, 4212, 4038,
+    3862, 3683, 3503, 3320, 3135, 2948, 2760, 2570, 2378, 2185, 1990,
+    1795, 1598, 1401, 1202, 1003, 803,  603,  402,  201 },
+  { 16384, 16379, 16364, 16340, 16305, 16261, 16207, 16143, 16069, 15986, 15893,
+    15791, 15679, 15557, 15426, 15286, 15137, 14978, 14811, 14635, 14449, 14256,
+    14053, 13842, 13623, 13395, 13160, 12916, 12665, 12406, 12140, 11866, 11585,
+    11297, 11003, 10702, 10394, 10080, 9760,  9434,  9102,  8765,  8423,  8076,
+    7723,  7366,  7005,  6639,  6270,  5897,  5520,  5139,  4756,  4370,  3981,
+    3590,  3196,  2801,  2404,  2006,  1606,  1205,  804,   402 },
+  { 32768, 32758, 32729, 32679, 32610, 32522, 32413, 32286, 32138, 31972, 31786,
+    31581, 31357, 31114, 30853, 30572, 30274, 29957, 29622, 29269, 28899, 28511,
+    28106, 27684, 27246, 26791, 26320, 25833, 25330, 24812, 24279, 23732, 23170,
+    22595, 22006, 21403, 20788, 20160, 19520, 18868, 18205, 17531, 16846, 16151,
+    15447, 14733, 14010, 13279, 12540, 11793, 11039, 10279, 9512,  8740,  7962,
+    7180,  6393,  5602,  4808,  4011,  3212,  2411,  1608,  804 },
+  { 65536, 65516, 65457, 65358, 65220, 65043, 64827, 64571, 64277, 63944, 63572,
+    63162, 62714, 62228, 61705, 61145, 60547, 59914, 59244, 58538, 57798, 57022,
+    56212, 55368, 54491, 53581, 52639, 51665, 50660, 49624, 48559, 47464, 46341,
+    45190, 44011, 42806, 41576, 40320, 39040, 37736, 36410, 35062, 33692, 32303,
+    30893, 29466, 28020, 26558, 25080, 23586, 22078, 20557, 19024, 17479, 15924,
+    14359, 12785, 11204, 9616,  8022,  6424,  4821,  3216,  1608 }
+};
+
+// av1_sinpi_arr_data[i][j] = (int)round((sqrt(2) * sin(j*Pi/9) * 2 / 3) * (1
+// << (cos_bit_min + i))) modified so that elements j=1,2 sum to element j=4.
+const int32_t av1_sinpi_arr_data[7][5] = {
+  { 0, 330, 621, 836, 951 },        { 0, 660, 1241, 1672, 1901 },
+  { 0, 1321, 2482, 3344, 3803 },    { 0, 2642, 4964, 6689, 7606 },
+  { 0, 5283, 9929, 13377, 15212 },  { 0, 10566, 19858, 26755, 30424 },
+  { 0, 21133, 39716, 53510, 60849 }
+};
+
+void av1_round_shift_array_c(int32_t *arr, int size, int bit) {
+  int i;
+  if (bit == 0) {
+    return;
+  } else {
+    if (bit > 0) {
+      for (i = 0; i < size; i++) {
+        arr[i] = round_shift(arr[i], bit);
+      }
+    } else {
+      for (i = 0; i < size; i++) {
+        arr[i] = (int32_t)clamp64(((int64_t)1 << (-bit)) * arr[i], INT32_MIN,
+                                  INT32_MAX);
+      }
+    }
+  }
+}
+
+const TXFM_TYPE av1_txfm_type_ls[5][TX_TYPES_1D] = {
+  { TXFM_TYPE_DCT4, TXFM_TYPE_ADST4, TXFM_TYPE_ADST4, TXFM_TYPE_IDENTITY4 },
+  { TXFM_TYPE_DCT8, TXFM_TYPE_ADST8, TXFM_TYPE_ADST8, TXFM_TYPE_IDENTITY8 },
+  { TXFM_TYPE_DCT16, TXFM_TYPE_ADST16, TXFM_TYPE_ADST16, TXFM_TYPE_IDENTITY16 },
+  { TXFM_TYPE_DCT32, TXFM_TYPE_INVALID, TXFM_TYPE_INVALID,
+    TXFM_TYPE_IDENTITY32 },
+  { TXFM_TYPE_DCT64, TXFM_TYPE_INVALID, TXFM_TYPE_INVALID, TXFM_TYPE_INVALID }
+};
+
+const int8_t av1_txfm_stage_num_list[TXFM_TYPES] = {
+  4,   // TXFM_TYPE_DCT4
+  6,   // TXFM_TYPE_DCT8
+  8,   // TXFM_TYPE_DCT16
+  10,  // TXFM_TYPE_DCT32
+  12,  // TXFM_TYPE_DCT64
+  7,   // TXFM_TYPE_ADST4
+  8,   // TXFM_TYPE_ADST8
+  10,  // TXFM_TYPE_ADST16
+  1,   // TXFM_TYPE_IDENTITY4
+  1,   // TXFM_TYPE_IDENTITY8
+  1,   // TXFM_TYPE_IDENTITY16
+  1,   // TXFM_TYPE_IDENTITY32
+};
+
+void av1_range_check_buf(int32_t stage, const int32_t *input,
+                         const int32_t *buf, int32_t size, int8_t bit) {
+#if CONFIG_COEFFICIENT_RANGE_CHECKING
+  const int64_t max_value = (1LL << (bit - 1)) - 1;
+  const int64_t min_value = -(1LL << (bit - 1));
+
+  int in_range = 1;
+
+  for (int i = 0; i < size; ++i) {
+    if (buf[i] < min_value || buf[i] > max_value) {
+      in_range = 0;
+    }
+  }
+
+  if (!in_range) {
+    fprintf(stderr, "Error: coeffs contain out-of-range values\n");
+    fprintf(stderr, "size: %d\n", size);
+    fprintf(stderr, "stage: %d\n", stage);
+    fprintf(stderr, "allowed range: [%" PRId64 ";%" PRId64 "]\n", min_value,
+            max_value);
+
+    fprintf(stderr, "coeffs: ");
+
+    fprintf(stderr, "[");
+    for (int j = 0; j < size; j++) {
+      if (j > 0) fprintf(stderr, ", ");
+      fprintf(stderr, "%d", input[j]);
+    }
+    fprintf(stderr, "]\n");
+
+    fprintf(stderr, "   buf: ");
+
+    fprintf(stderr, "[");
+    for (int j = 0; j < size; j++) {
+      if (j > 0) fprintf(stderr, ", ");
+      fprintf(stderr, "%d", buf[j]);
+    }
+    fprintf(stderr, "]\n\n");
+  }
+
+  assert(in_range);
+#else
+  (void)stage;
+  (void)input;
+  (void)buf;
+  (void)size;
+  (void)bit;
+#endif
+}

diff --git a/libav1/av1/common/av1_txfm.h b/libav1/av1/common/av1_txfm.h
new file mode 100644
index 0000000..14e2c0e
--- /dev/null
+++ b/libav1/av1/common/av1_txfm.h

@@ -0,0 +1,232 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_AV1_TXFM_H_
+#define AOM_AV1_COMMON_AV1_TXFM_H_
+
+#include <assert.h>
+#include <math.h>
+#include <stdio.h>
+
+#include "config/aom_config.h"
+
+#include "av1/common/enums.h"
+#include "av1/common/blockd.h"
+#include "aom/aom_integer.h"
+#include "aom_dsp/aom_dsp_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if !defined(DO_RANGE_CHECK_CLAMP)
+#define DO_RANGE_CHECK_CLAMP 0
+#endif
+
+extern const int32_t av1_cospi_arr_data[7][64];
+extern const int32_t av1_sinpi_arr_data[7][5];
+
+#define MAX_TXFM_STAGE_NUM 12
+
+static const int cos_bit_min = 10;
+static const int cos_bit_max = 16;
+
+#define NewSqrt2Bits ((int32_t)12)
+// 2^12 * sqrt(2)
+static const int32_t NewSqrt2 = 5793;
+// 2^12 / sqrt(2)
+static const int32_t NewInvSqrt2 = 2896;
+
+static INLINE const int32_t *cospi_arr(int n) {
+  return av1_cospi_arr_data[n - cos_bit_min];
+}
+
+static INLINE const int32_t *sinpi_arr(int n) {
+  return av1_sinpi_arr_data[n - cos_bit_min];
+}
+
+static INLINE int32_t range_check_value(int32_t value, int8_t bit) {
+#if CONFIG_COEFFICIENT_RANGE_CHECKING
+  const int64_t max_value = (1LL << (bit - 1)) - 1;
+  const int64_t min_value = -(1LL << (bit - 1));
+  if (value < min_value || value > max_value) {
+    fprintf(stderr, "coeff out of bit range, value: %d bit %d\n", value, bit);
+    assert(0);
+  }
+#endif  // CONFIG_COEFFICIENT_RANGE_CHECKING
+#if DO_RANGE_CHECK_CLAMP
+  bit = AOMMIN(bit, 31);
+  return clamp(value, -(1 << (bit - 1)), (1 << (bit - 1)) - 1);
+#endif  // DO_RANGE_CHECK_CLAMP
+  (void)bit;
+  return value;
+}
+
+static INLINE int32_t round_shift(int64_t value, int bit) {
+  assert(bit >= 1);
+  return (int32_t)((value + (1ll << (bit - 1))) >> bit);
+}
+
+static INLINE int32_t half_btf(int32_t w0, int32_t in0, int32_t w1, int32_t in1,
+                               int bit) {
+  int64_t result_64 = (int64_t)(w0 * in0) + (int64_t)(w1 * in1);
+  int64_t intermediate = result_64 + (1LL << (bit - 1));
+  // NOTE(david.barker): The value 'result_64' may not necessarily fit
+  // into 32 bits. However, the result of this function is nominally
+  // ROUND_POWER_OF_TWO_64(result_64, bit)
+  // and that is required to fit into stage_range[stage] many bits
+  // (checked by range_check_buf()).
+  //
+  // Here we've unpacked that rounding operation, and it can be shown
+  // that the value of 'intermediate' here *does* fit into 32 bits
+  // for any conformant bitstream.
+  // The upshot is that, if you do all this calculation using
+  // wrapping 32-bit arithmetic instead of (non-wrapping) 64-bit arithmetic,
+  // then you'll still get the correct result.
+  // To provide a check on this logic, we assert that 'intermediate'
+  // would fit into an int32 if range checking is enabled.
+#if CONFIG_COEFFICIENT_RANGE_CHECKING
+  assert(intermediate >= INT32_MIN && intermediate <= INT32_MAX);
+#endif
+  return (int32_t)(intermediate >> bit);
+}
+
+static INLINE uint16_t highbd_clip_pixel_add(uint16_t dest, tran_high_t trans,
+                                             int bd) {
+  return clip_pixel_highbd(dest + (int)trans, bd);
+}
+
+typedef void (*TxfmFunc)(const int32_t *input, int32_t *output, int8_t cos_bit,
+                         const int8_t *stage_range);
+
+typedef void (*FwdTxfm2dFunc)(const int16_t *input, int32_t *output, int stride,
+                              TX_TYPE tx_type, int bd);
+
+enum {
+  TXFM_TYPE_DCT4,
+  TXFM_TYPE_DCT8,
+  TXFM_TYPE_DCT16,
+  TXFM_TYPE_DCT32,
+  TXFM_TYPE_DCT64,
+  TXFM_TYPE_ADST4,
+  TXFM_TYPE_ADST8,
+  TXFM_TYPE_ADST16,
+  TXFM_TYPE_IDENTITY4,
+  TXFM_TYPE_IDENTITY8,
+  TXFM_TYPE_IDENTITY16,
+  TXFM_TYPE_IDENTITY32,
+  TXFM_TYPES,
+  TXFM_TYPE_INVALID,
+} UENUM1BYTE(TXFM_TYPE);
+
+typedef struct TXFM_2D_FLIP_CFG {
+  TX_SIZE tx_size;
+  int ud_flip;  // flip upside down
+  int lr_flip;  // flip left to right
+  const int8_t *shift;
+  int8_t cos_bit_col;
+  int8_t cos_bit_row;
+  int8_t stage_range_col[MAX_TXFM_STAGE_NUM];
+  int8_t stage_range_row[MAX_TXFM_STAGE_NUM];
+  TXFM_TYPE txfm_type_col;
+  TXFM_TYPE txfm_type_row;
+  int stage_num_col;
+  int stage_num_row;
+} TXFM_2D_FLIP_CFG;
+
+static INLINE void get_flip_cfg(TX_TYPE tx_type, int *ud_flip, int *lr_flip) {
+  switch (tx_type) {
+    case DCT_DCT:
+    case ADST_DCT:
+    case DCT_ADST:
+    case ADST_ADST:
+      *ud_flip = 0;
+      *lr_flip = 0;
+      break;
+    case IDTX:
+    case V_DCT:
+    case H_DCT:
+    case V_ADST:
+    case H_ADST:
+      *ud_flip = 0;
+      *lr_flip = 0;
+      break;
+    case FLIPADST_DCT:
+    case FLIPADST_ADST:
+    case V_FLIPADST:
+      *ud_flip = 1;
+      *lr_flip = 0;
+      break;
+    case DCT_FLIPADST:
+    case ADST_FLIPADST:
+    case H_FLIPADST:
+      *ud_flip = 0;
+      *lr_flip = 1;
+      break;
+    case FLIPADST_FLIPADST:
+      *ud_flip = 1;
+      *lr_flip = 1;
+      break;
+    default:
+      *ud_flip = 0;
+      *lr_flip = 0;
+      assert(0);
+  }
+}
+
+static INLINE void set_flip_cfg(TX_TYPE tx_type, TXFM_2D_FLIP_CFG *cfg) {
+  get_flip_cfg(tx_type, &cfg->ud_flip, &cfg->lr_flip);
+}
+
+// Utility function that returns the log of the ratio of the col and row
+// sizes.
+static INLINE int get_rect_tx_log_ratio(int col, int row) {
+  if (col == row) return 0;
+  if (col > row) {
+    if (col == row * 2) return 1;
+    if (col == row * 4) return 2;
+    assert(0 && "Unsupported transform size");
+  } else {
+    if (row == col * 2) return -1;
+    if (row == col * 4) return -2;
+    assert(0 && "Unsupported transform size");
+  }
+  return 0;  // Invalid
+}
+
+void av1_gen_fwd_stage_range(int8_t *stage_range_col, int8_t *stage_range_row,
+                             const TXFM_2D_FLIP_CFG *cfg, int bd);
+
+void av1_gen_inv_stage_range(int8_t *stage_range_col, int8_t *stage_range_row,
+                             const TXFM_2D_FLIP_CFG *cfg, TX_SIZE tx_size,
+                             int bd);
+
+void av1_get_fwd_txfm_cfg(TX_TYPE tx_type, TX_SIZE tx_size,
+                          TXFM_2D_FLIP_CFG *cfg);
+void av1_get_inv_txfm_cfg(TX_TYPE tx_type, TX_SIZE tx_size,
+                          TXFM_2D_FLIP_CFG *cfg);
+extern const TXFM_TYPE av1_txfm_type_ls[5][TX_TYPES_1D];
+extern const int8_t av1_txfm_stage_num_list[TXFM_TYPES];
+static INLINE int get_txw_idx(TX_SIZE tx_size) {
+  return tx_size_wide_log2[tx_size] - tx_size_wide_log2[0];
+}
+static INLINE int get_txh_idx(TX_SIZE tx_size) {
+  return tx_size_high_log2[tx_size] - tx_size_high_log2[0];
+}
+
+void av1_range_check_buf(int32_t stage, const int32_t *input,
+                         const int32_t *buf, int32_t size, int8_t bit);
+#define MAX_TXWH_IDX 5
+#ifdef __cplusplus
+}
+#endif  // __cplusplus
+
+#endif  // AOM_AV1_COMMON_AV1_TXFM_H_

diff --git a/libav1/av1/common/blockd.c b/libav1/av1/common/blockd.c
new file mode 100644
index 0000000..2e796b6
--- /dev/null
+++ b/libav1/av1/common/blockd.c

@@ -0,0 +1,140 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <math.h>
+
+#include "aom_ports/system_state.h"
+
+#include "av1/common/blockd.h"
+#include "av1/common/onyxc_int.h"
+
+PREDICTION_MODE av1_left_block_mode(const MB_MODE_INFO *left_mi) {
+  if (!left_mi) return DC_PRED;
+  assert(!is_inter_block(left_mi) || is_intrabc_block(left_mi));
+  return left_mi->mode;
+}
+
+PREDICTION_MODE av1_above_block_mode(const MB_MODE_INFO *above_mi) {
+  if (!above_mi) return DC_PRED;
+  assert(!is_inter_block(above_mi) || is_intrabc_block(above_mi));
+  return above_mi->mode;
+}
+
+void av1_set_contexts(const MACROBLOCKD *xd, struct macroblockd_plane *pd,
+                      int plane, BLOCK_SIZE plane_bsize, TX_SIZE tx_size,
+                      int has_eob, int aoff, int loff) {
+  ENTROPY_CONTEXT *const a = pd->above_context + aoff;
+  ENTROPY_CONTEXT *const l = pd->left_context + loff;
+  const int txs_wide = tx_size_wide_unit[tx_size];
+  const int txs_high = tx_size_high_unit[tx_size];
+
+  // above
+  if (has_eob && xd->mb_to_right_edge < 0) {
+    const int blocks_wide = max_block_wide(xd, plane_bsize, plane);
+    const int above_contexts = AOMMIN(txs_wide, blocks_wide - aoff);
+    memset(a, has_eob, sizeof(*a) * above_contexts);
+    memset(a + above_contexts, 0, sizeof(*a) * (txs_wide - above_contexts));
+  } else {
+    memset(a, has_eob, sizeof(*a) * txs_wide);
+  }
+
+  // left
+  if (has_eob && xd->mb_to_bottom_edge < 0) {
+    const int blocks_high = max_block_high(xd, plane_bsize, plane);
+    const int left_contexts = AOMMIN(txs_high, blocks_high - loff);
+    memset(l, has_eob, sizeof(*l) * left_contexts);
+    memset(l + left_contexts, 0, sizeof(*l) * (txs_high - left_contexts));
+  } else {
+    memset(l, has_eob, sizeof(*l) * txs_high);
+  }
+}
+void av1_reset_skip_context(MACROBLOCKD *xd, int mi_row, int mi_col,
+                            BLOCK_SIZE bsize, const int num_planes) {
+  int i;
+  int nplanes;
+  int chroma_ref;
+  chroma_ref =
+      is_chroma_reference(mi_row, mi_col, bsize, xd->plane[1].subsampling_x,
+                          xd->plane[1].subsampling_y);
+  nplanes = 1 + (num_planes - 1) * chroma_ref;
+  for (i = 0; i < nplanes; i++) {
+    struct macroblockd_plane *const pd = &xd->plane[i];
+    const BLOCK_SIZE plane_bsize =
+        get_plane_block_size(bsize, pd->subsampling_x, pd->subsampling_y);
+    const int txs_wide = block_size_wide[plane_bsize] >> tx_size_wide_log2[0];
+    const int txs_high = block_size_high[plane_bsize] >> tx_size_high_log2[0];
+    memset(pd->above_context, 0, sizeof(ENTROPY_CONTEXT) * txs_wide);
+    memset(pd->left_context, 0, sizeof(ENTROPY_CONTEXT) * txs_high);
+  }
+}
+
+void av1_reset_loop_filter_delta(MACROBLOCKD *xd, int num_planes) {
+  xd->delta_lf_from_base = 0;
+  const int frame_lf_count =
+      num_planes > 1 ? FRAME_LF_COUNT : FRAME_LF_COUNT - 2;
+  for (int lf_id = 0; lf_id < frame_lf_count; ++lf_id) xd->delta_lf[lf_id] = 0;
+}
+
+void av1_reset_loop_restoration(MACROBLOCKD *xd, const int num_planes) {
+  for (int p = 0; p < num_planes; ++p) {
+    set_default_wiener(xd->wiener_info + p);
+    set_default_sgrproj(xd->sgrproj_info + p);
+  }
+}
+
+void av1_setup_block_planes(MACROBLOCKD *xd, int ss_x, int ss_y,
+                            const int num_planes) {
+  int i;
+
+  for (i = 0; i < num_planes; i++) {
+    xd->plane[i].plane_type = get_plane_type(i);
+    xd->plane[i].subsampling_x = i ? ss_x : 0;
+    xd->plane[i].subsampling_y = i ? ss_y : 0;
+  }
+  for (i = num_planes; i < MAX_MB_PLANE; i++) {
+    xd->plane[i].subsampling_x = 1;
+    xd->plane[i].subsampling_y = 1;
+  }
+}
+
+const int16_t dr_intra_derivative[90] = {
+  // More evenly spread out angles and limited to 10-bit
+  // Values that are 0 will never be used
+  //                    Approx angle
+  0,    0, 0,        //
+  1023, 0, 0,        // 3, ...
+  547,  0, 0,        // 6, ...
+  372,  0, 0, 0, 0,  // 9, ...
+  273,  0, 0,        // 14, ...
+  215,  0, 0,        // 17, ...
+  178,  0, 0,        // 20, ...
+  151,  0, 0,        // 23, ... (113 & 203 are base angles)
+  132,  0, 0,        // 26, ...
+  116,  0, 0,        // 29, ...
+  102,  0, 0, 0,     // 32, ...
+  90,   0, 0,        // 36, ...
+  80,   0, 0,        // 39, ...
+  71,   0, 0,        // 42, ...
+  64,   0, 0,        // 45, ... (45 & 135 are base angles)
+  57,   0, 0,        // 48, ...
+  51,   0, 0,        // 51, ...
+  45,   0, 0, 0,     // 54, ...
+  40,   0, 0,        // 58, ...
+  35,   0, 0,        // 61, ...
+  31,   0, 0,        // 64, ...
+  27,   0, 0,        // 67, ... (67 & 157 are base angles)
+  23,   0, 0,        // 70, ...
+  19,   0, 0,        // 73, ...
+  15,   0, 0, 0, 0,  // 76, ...
+  11,   0, 0,        // 81, ...
+  7,    0, 0,        // 84, ...
+  3,    0, 0,        // 87, ...
+};

diff --git a/libav1/av1/common/blockd.h b/libav1/av1/common/blockd.h
new file mode 100644
index 0000000..39efbd8
--- /dev/null
+++ b/libav1/av1/common/blockd.h

@@ -0,0 +1,1265 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_BLOCKD_H_
+#define AOM_AV1_COMMON_BLOCKD_H_
+
+#include "config/aom_config.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_ports/mem.h"
+#include "aom_scale/yv12config.h"
+
+#include "av1/common/common_data.h"
+#include "av1/common/quant_common.h"
+#include "av1/common/entropy.h"
+#include "av1/common/entropymode.h"
+#include "av1/common/mv.h"
+#include "av1/common/scale.h"
+#include "av1/common/seg_common.h"
+#include "av1/common/tile_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define USE_B_QUANT_NO_TRELLIS 1
+
+#define MAX_MB_PLANE 3
+
+#define MAX_DIFFWTD_MASK_BITS 1
+
+// DIFFWTD_MASK_TYPES should not surpass 1 << MAX_DIFFWTD_MASK_BITS
+enum {
+  DIFFWTD_38 = 0,
+  DIFFWTD_38_INV,
+  DIFFWTD_MASK_TYPES,
+} UENUM4BYTE(DIFFWTD_MASK_TYPE);
+
+enum {
+  KEY_FRAME = 0,
+  INTER_FRAME = 1,
+  INTRA_ONLY_FRAME = 2,  // replaces intra-only
+  S_FRAME = 3,
+  FRAME_TYPES,
+} UENUM1BYTE(FRAME_TYPE);
+
+static INLINE int is_comp_ref_allowed(BLOCK_SIZE bsize) {
+  return AOMMIN(block_size_wide[bsize], block_size_high[bsize]) >= 8;
+}
+
+static INLINE int is_inter_mode(PREDICTION_MODE mode) {
+  return mode >= INTER_MODE_START && mode < INTER_MODE_END;
+}
+
+typedef struct {
+  uint8_t *plane[MAX_MB_PLANE];
+  int stride[MAX_MB_PLANE];
+} BUFFER_SET;
+
+static INLINE int is_inter_singleref_mode(PREDICTION_MODE mode) {
+  return mode >= SINGLE_INTER_MODE_START && mode < SINGLE_INTER_MODE_END;
+}
+static INLINE int is_inter_compound_mode(PREDICTION_MODE mode) {
+  return mode >= COMP_INTER_MODE_START && mode < COMP_INTER_MODE_END;
+}
+
+static INLINE PREDICTION_MODE compound_ref0_mode(PREDICTION_MODE mode) {
+  static PREDICTION_MODE lut[] = {
+    MB_MODE_COUNT,  // DC_PRED
+    MB_MODE_COUNT,  // V_PRED
+    MB_MODE_COUNT,  // H_PRED
+    MB_MODE_COUNT,  // D45_PRED
+    MB_MODE_COUNT,  // D135_PRED
+    MB_MODE_COUNT,  // D113_PRED
+    MB_MODE_COUNT,  // D157_PRED
+    MB_MODE_COUNT,  // D203_PRED
+    MB_MODE_COUNT,  // D67_PRED
+    MB_MODE_COUNT,  // SMOOTH_PRED
+    MB_MODE_COUNT,  // SMOOTH_V_PRED
+    MB_MODE_COUNT,  // SMOOTH_H_PRED
+    MB_MODE_COUNT,  // PAETH_PRED
+    MB_MODE_COUNT,  // NEARESTMV
+    MB_MODE_COUNT,  // NEARMV
+    MB_MODE_COUNT,  // GLOBALMV
+    MB_MODE_COUNT,  // NEWMV
+    NEARESTMV,      // NEAREST_NEARESTMV
+    NEARMV,         // NEAR_NEARMV
+    NEARESTMV,      // NEAREST_NEWMV
+    NEWMV,          // NEW_NEARESTMV
+    NEARMV,         // NEAR_NEWMV
+    NEWMV,          // NEW_NEARMV
+    GLOBALMV,       // GLOBAL_GLOBALMV
+    NEWMV,          // NEW_NEWMV
+  };
+  assert(NELEMENTS(lut) == MB_MODE_COUNT);
+  assert(is_inter_compound_mode(mode));
+  return lut[mode];
+}
+
+static INLINE PREDICTION_MODE compound_ref1_mode(PREDICTION_MODE mode) {
+  static PREDICTION_MODE lut[] = {
+    MB_MODE_COUNT,  // DC_PRED
+    MB_MODE_COUNT,  // V_PRED
+    MB_MODE_COUNT,  // H_PRED
+    MB_MODE_COUNT,  // D45_PRED
+    MB_MODE_COUNT,  // D135_PRED
+    MB_MODE_COUNT,  // D113_PRED
+    MB_MODE_COUNT,  // D157_PRED
+    MB_MODE_COUNT,  // D203_PRED
+    MB_MODE_COUNT,  // D67_PRED
+    MB_MODE_COUNT,  // SMOOTH_PRED
+    MB_MODE_COUNT,  // SMOOTH_V_PRED
+    MB_MODE_COUNT,  // SMOOTH_H_PRED
+    MB_MODE_COUNT,  // PAETH_PRED
+    MB_MODE_COUNT,  // NEARESTMV
+    MB_MODE_COUNT,  // NEARMV
+    MB_MODE_COUNT,  // GLOBALMV
+    MB_MODE_COUNT,  // NEWMV
+    NEARESTMV,      // NEAREST_NEARESTMV
+    NEARMV,         // NEAR_NEARMV
+    NEWMV,          // NEAREST_NEWMV
+    NEARESTMV,      // NEW_NEARESTMV
+    NEWMV,          // NEAR_NEWMV
+    NEARMV,         // NEW_NEARMV
+    GLOBALMV,       // GLOBAL_GLOBALMV
+    NEWMV,          // NEW_NEWMV
+  };
+  assert(NELEMENTS(lut) == MB_MODE_COUNT);
+  assert(is_inter_compound_mode(mode));
+  return lut[mode];
+}
+
+static INLINE int have_nearmv_in_inter_mode(PREDICTION_MODE mode) {
+  return (mode == NEARMV || mode == NEAR_NEARMV || mode == NEAR_NEWMV ||
+          mode == NEW_NEARMV);
+}
+
+static INLINE int have_newmv_in_inter_mode(PREDICTION_MODE mode) {
+  return (mode == NEWMV || mode == NEW_NEWMV || mode == NEAREST_NEWMV ||
+          mode == NEW_NEARESTMV || mode == NEAR_NEWMV || mode == NEW_NEARMV);
+}
+
+static INLINE int is_masked_compound_type(COMPOUND_TYPE type) {
+  return (type == COMPOUND_WEDGE || type == COMPOUND_DIFFWTD);
+}
+
+/* For keyframes, intra block modes are predicted by the (already decoded)
+   modes for the Y blocks to the left and above us; for interframes, there
+   is a single probability table. */
+
+typedef struct {
+  // Value of base colors for Y, U, and V
+  uint16_t palette_colors[3 * PALETTE_MAX_SIZE];
+  // Number of base colors for Y (0) and UV (1)
+  int16_t palette_size[2];
+} PALETTE_MODE_INFO;
+
+typedef struct {
+  FILTER_INTRA_MODE filter_intra_mode;
+  uint8_t use_filter_intra;
+} FILTER_INTRA_MODE_INFO;
+
+static const PREDICTION_MODE fimode_to_intradir[FILTER_INTRA_MODES] = {
+  DC_PRED, V_PRED, H_PRED, D157_PRED, DC_PRED
+};
+
+#if CONFIG_RD_DEBUG
+#define TXB_COEFF_COST_MAP_SIZE (MAX_MIB_SIZE)
+#endif
+
+typedef struct RD_STATS {
+  int rate;
+  int64_t dist;
+  // Please be careful of using rdcost, it's not guaranteed to be set all the
+  // time.
+  // TODO(angiebird): Create a set of functions to manipulate the RD_STATS. In
+  // these functions, make sure rdcost is always up-to-date according to
+  // rate/dist.
+  int64_t rdcost;
+  int64_t sse;
+  int skip;  // sse should equal to dist when skip == 1
+  int64_t ref_rdcost;
+  int zero_rate;
+  uint8_t invalid_rate;
+#if CONFIG_ONE_PASS_SVM
+  int eob, eob_0, eob_1, eob_2, eob_3;
+  int64_t rd, rd_0, rd_1, rd_2, rd_3;
+  int64_t y_sse, sse_0, sse_1, sse_2, sse_3;
+#endif
+#if CONFIG_RD_DEBUG
+  int txb_coeff_cost[MAX_MB_PLANE];
+  int txb_coeff_cost_map[MAX_MB_PLANE][TXB_COEFF_COST_MAP_SIZE]
+                        [TXB_COEFF_COST_MAP_SIZE];
+#endif  // CONFIG_RD_DEBUG
+} RD_STATS;
+
+// This struct is used to group function args that are commonly
+// sent together in functions related to interinter compound modes
+typedef struct {
+  COMPOUND_TYPE type;
+  DIFFWTD_MASK_TYPE mask_type;
+  int wedge_index;
+  int wedge_sign;
+} INTERINTER_COMPOUND_DATA;
+
+#define INTER_TX_SIZE_BUF_LEN 16
+#define TXK_TYPE_BUF_LEN 64
+// This structure now relates to 4x4 block regions.
+typedef struct MB_MODE_INFO {
+  PALETTE_MODE_INFO palette_mode_info;
+  WarpedMotionParams wm_params;
+  // interinter members
+  INTERINTER_COMPOUND_DATA interinter_comp;
+  FILTER_INTRA_MODE_INFO filter_intra_mode_info;
+  int_mv mv[2];
+  // Only for INTER blocks
+  InterpFilters interp_filters;
+  // TODO(debargha): Consolidate these flags
+  int interintra_wedge_index;
+  int interintra_wedge_sign;
+  int current_qindex;
+  int delta_lf_from_base;
+  int delta_lf[FRAME_LF_COUNT];
+#if CONFIG_RD_DEBUG
+  RD_STATS rd_stats;
+#endif
+  int mi_row;
+  int mi_col;
+  int index_base;
+  int num_proj_ref;
+
+  // Index of the alpha Cb and alpha Cr combination
+  int cfl_alpha_idx;
+  // Joint sign of alpha Cb and alpha Cr
+  int cfl_alpha_signs;
+
+  // Indicate if masked compound is used(1) or not(0).
+  int16_t comp_group_idx;
+  // If comp_group_idx=0, indicate if dist_wtd_comp(0) or avg_comp(1) is used.
+  int16_t compound_idx;
+#if CONFIG_INSPECTION
+  int16_t tx_skip[TXK_TYPE_BUF_LEN];
+#endif
+  // Common for both INTER and INTRA blocks
+  BLOCK_SIZE sb_type;
+  PARTITION_TYPE partition;
+  MV_REFERENCE_FRAME ref_frame[2];
+
+  PREDICTION_MODE mode;
+  // Only for INTRA blocks
+  UV_PREDICTION_MODE uv_mode;
+  // interintra members
+  INTERINTRA_MODE interintra_mode;
+  MOTION_MODE motion_mode;
+
+//  TX_TYPE txk_type[TXK_TYPE_BUF_LEN];
+
+  int8_t use_wedge_interintra;
+  uint8_t use_intrabc;
+  // The actual prediction angle is the base angle + (angle_delta * step).
+  int8_t angle_delta[PLANE_TYPES];
+
+  TX_SIZE tx_size;
+  int8_t skip;
+  int8_t skip_mode;
+  int8_t segment_id;
+
+  uint8_t inter_tx_size[INTER_TX_SIZE_BUF_LEN];
+  int8_t seg_id_predicted;  // valid only when temporal_update is enabled
+  /* deringing gain *per-superblock* */
+  int8_t cdef_strength;
+  uint8_t ref_mv_idx;
+  int8_t reserved;
+} MB_MODE_INFO;
+
+static INLINE int is_intrabc_block(const MB_MODE_INFO *mbmi) {
+  return mbmi->use_intrabc;
+}
+
+static INLINE PREDICTION_MODE get_uv_mode(UV_PREDICTION_MODE mode) {
+  assert(mode < UV_INTRA_MODES);
+  static const PREDICTION_MODE uv2y[] = {
+    DC_PRED,        // UV_DC_PRED
+    V_PRED,         // UV_V_PRED
+    H_PRED,         // UV_H_PRED
+    D45_PRED,       // UV_D45_PRED
+    D135_PRED,      // UV_D135_PRED
+    D113_PRED,      // UV_D113_PRED
+    D157_PRED,      // UV_D157_PRED
+    D203_PRED,      // UV_D203_PRED
+    D67_PRED,       // UV_D67_PRED
+    SMOOTH_PRED,    // UV_SMOOTH_PRED
+    SMOOTH_V_PRED,  // UV_SMOOTH_V_PRED
+    SMOOTH_H_PRED,  // UV_SMOOTH_H_PRED
+    PAETH_PRED,     // UV_PAETH_PRED
+    DC_PRED,        // UV_CFL_PRED
+    INTRA_INVALID,  // UV_INTRA_MODES
+    INTRA_INVALID,  // UV_MODE_INVALID
+  };
+  return uv2y[mode];
+}
+
+static INLINE int is_inter_block(const MB_MODE_INFO *mbmi) {
+  return is_intrabc_block(mbmi) || mbmi->ref_frame[0] > INTRA_FRAME;
+}
+
+static INLINE int has_second_ref(const MB_MODE_INFO *mbmi) {
+  return mbmi->ref_frame[1] > INTRA_FRAME;
+}
+
+static INLINE int has_uni_comp_refs(const MB_MODE_INFO *mbmi) {
+  return has_second_ref(mbmi) && (!((mbmi->ref_frame[0] >= BWDREF_FRAME) ^
+                                    (mbmi->ref_frame[1] >= BWDREF_FRAME)));
+}
+
+static INLINE MV_REFERENCE_FRAME comp_ref0(int ref_idx) {
+  static const MV_REFERENCE_FRAME lut[] = {
+    LAST_FRAME,     // LAST_LAST2_FRAMES,
+    LAST_FRAME,     // LAST_LAST3_FRAMES,
+    LAST_FRAME,     // LAST_GOLDEN_FRAMES,
+    BWDREF_FRAME,   // BWDREF_ALTREF_FRAMES,
+    LAST2_FRAME,    // LAST2_LAST3_FRAMES
+    LAST2_FRAME,    // LAST2_GOLDEN_FRAMES,
+    LAST3_FRAME,    // LAST3_GOLDEN_FRAMES,
+    BWDREF_FRAME,   // BWDREF_ALTREF2_FRAMES,
+    ALTREF2_FRAME,  // ALTREF2_ALTREF_FRAMES,
+  };
+  assert(NELEMENTS(lut) == TOTAL_UNIDIR_COMP_REFS);
+  return lut[ref_idx];
+}
+
+static INLINE MV_REFERENCE_FRAME comp_ref1(int ref_idx) {
+  static const MV_REFERENCE_FRAME lut[] = {
+    LAST2_FRAME,    // LAST_LAST2_FRAMES,
+    LAST3_FRAME,    // LAST_LAST3_FRAMES,
+    GOLDEN_FRAME,   // LAST_GOLDEN_FRAMES,
+    ALTREF_FRAME,   // BWDREF_ALTREF_FRAMES,
+    LAST3_FRAME,    // LAST2_LAST3_FRAMES
+    GOLDEN_FRAME,   // LAST2_GOLDEN_FRAMES,
+    GOLDEN_FRAME,   // LAST3_GOLDEN_FRAMES,
+    ALTREF2_FRAME,  // BWDREF_ALTREF2_FRAMES,
+    ALTREF_FRAME,   // ALTREF2_ALTREF_FRAMES,
+  };
+  assert(NELEMENTS(lut) == TOTAL_UNIDIR_COMP_REFS);
+  return lut[ref_idx];
+}
+
+PREDICTION_MODE av1_left_block_mode(const MB_MODE_INFO *left_mi);
+
+PREDICTION_MODE av1_above_block_mode(const MB_MODE_INFO *above_mi);
+
+static INLINE int is_global_mv_block(const MB_MODE_INFO *const mbmi,
+                                     TransformationType type) {
+  const PREDICTION_MODE mode = mbmi->mode;
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  const int block_size_allowed =
+      AOMMIN(block_size_wide[bsize], block_size_high[bsize]) >= 8;
+  return (mode == GLOBALMV || mode == GLOBAL_GLOBALMV) && type > TRANSLATION &&
+         block_size_allowed;
+}
+
+#if CONFIG_MISMATCH_DEBUG
+static INLINE void mi_to_pixel_loc(int *pixel_c, int *pixel_r, int mi_col,
+                                   int mi_row, int tx_blk_col, int tx_blk_row,
+                                   int subsampling_x, int subsampling_y) {
+  *pixel_c = ((mi_col >> subsampling_x) << MI_SIZE_LOG2) +
+             (tx_blk_col << tx_size_wide_log2[0]);
+  *pixel_r = ((mi_row >> subsampling_y) << MI_SIZE_LOG2) +
+             (tx_blk_row << tx_size_high_log2[0]);
+}
+#endif
+
+enum { MV_PRECISION_Q3, MV_PRECISION_Q4 } UENUM1BYTE(mv_precision);
+
+struct buf_2d {
+  uint8_t *buf;
+  uint8_t *buf0;
+  int width;
+  int height;
+  int stride;
+};
+
+typedef struct  
+{
+    uint16_t x, y;
+    uint32_t mode_flags;
+    uint32_t mode_info0;
+    uint32_t mode_info1;
+} PredictionBlock;
+
+
+typedef struct  
+{
+    uint16_t x, y;
+    uint32_t flags;
+    int mat[6];
+    int alpha;
+    int beta;
+    int delta;
+    int gamma;
+} InterWarpBlock;
+
+
+typedef struct  
+{
+    uint32_t type;
+    uint32_t index;
+} PredictionBlockSortingInfo;
+
+
+typedef struct tx_block_info_gpu {
+    uint32_t flags;
+    uint32_t input_offset;
+    uint32_t output_pos;
+    uint32_t sorting_idx;
+} tx_block_info_gpu;
+
+typedef struct
+{
+    tx_block_info_gpu * idct_blocks_host;
+    tran_low_t * dq_buffer_base;
+    int dq_buffer_offset;
+    int dq_buffer_ptr;
+    int dq_buffer_max;
+    int blocks_offset;
+    int idct_blocks_ptr;
+    int idct_blocks_sizes[TX_SIZES_ALL + 2];
+    int block_count;
+    int block_count_wrp;
+    int intra_iter_max;
+    int intra_iter_max_uv;
+    int have_inter;
+    unsigned int * gen_indexes;
+    int gen_index_ptr;
+    int gen_index_base;
+    int * gen_block_map;
+    int * gen_block_map_wrp;
+    int gen_intra_iter_y;
+    int gen_pred_map_max;
+    int gen_intra_iter_uv;
+    int gen_intra_max_iter;
+    int gen_intra_iter_set;
+    int gen_iter_clear_offset;
+    int gen_iter_clear_size;
+    int gen_block_map_offset;
+    int gen_block_warp_offset;
+    int mi_offset;
+    int mi_count;
+    int mi_col_start;
+    int mi_row_start;
+} av1_tile_data;
+
+typedef struct tx_block_info {
+    uint16_t eob;
+    uint16_t max_scan_line;
+    uint16_t x;
+    uint16_t y;
+    uint8_t plane;
+    uint8_t tx_size;
+    uint8_t tx_type;
+    uint8_t flags;
+    uint32_t dq_offset;
+    uint32_t dq_offset_raw;
+    uint32_t gpu_index;
+} tx_block_info;
+
+typedef struct eob_info {
+  uint16_t eob;
+  uint16_t max_scan_line;
+} eob_info;
+
+typedef struct {
+  eob_info eob_data[MAX_MB_PLANE]
+                   [MAX_SB_SQUARE / (TX_SIZE_W_MIN * TX_SIZE_H_MIN)];
+  DECLARE_ALIGNED(16, uint8_t, color_index_map[2][MAX_SB_SQUARE]);
+} CB_BUFFER;
+
+typedef struct macroblockd_plane {
+  tran_low_t *dqcoeff;
+  eob_info *eob_data;
+  PLANE_TYPE plane_type;
+  int subsampling_x;
+  int subsampling_y;
+  struct buf_2d dst;
+  struct buf_2d pre[2];
+  ENTROPY_CONTEXT *above_context;
+  ENTROPY_CONTEXT *left_context;
+
+  // The dequantizers below are true dequantizers used only in the
+  // dequantization process.  They have the same coefficient
+  // shift/scale as TX.
+  int16_t seg_dequant_QTX[MAX_SEGMENTS][2];
+  uint8_t *color_index_map;
+
+  // block size in pixels
+  uint8_t width, height;
+
+  qm_val_t *seg_iqmatrix[MAX_SEGMENTS][TX_SIZES_ALL];
+  qm_val_t *seg_qmatrix[MAX_SEGMENTS][TX_SIZES_ALL];
+
+  // the 'dequantizers' below are not literal dequantizer values.
+  // They're used by encoder RDO to generate ad-hoc lambda values.
+  // They use a hardwired Q3 coeff shift and do not necessarily match
+  // the TX scale in use.
+  const int16_t *dequant_Q3;
+} MACROBLOCKD_PLANE;
+
+#define BLOCK_OFFSET(x, i) \
+  ((x) + (i) * (1 << (tx_size_wide_log2[0] + tx_size_high_log2[0])))
+
+typedef struct {
+  DECLARE_ALIGNED(16, InterpKernel, vfilter);
+  DECLARE_ALIGNED(16, InterpKernel, hfilter);
+} WienerInfo;
+
+typedef struct {
+  int ep;
+  int xqd[2];
+} SgrprojInfo;
+
+#if CONFIG_DEBUG
+#define CFL_SUB8X8_VAL_MI_SIZE (4)
+#define CFL_SUB8X8_VAL_MI_SQUARE \
+  (CFL_SUB8X8_VAL_MI_SIZE * CFL_SUB8X8_VAL_MI_SIZE)
+#endif  // CONFIG_DEBUG
+#define CFL_MAX_BLOCK_SIZE (BLOCK_32X32)
+#define CFL_BUF_LINE (32)
+#define CFL_BUF_LINE_I128 (CFL_BUF_LINE >> 3)
+#define CFL_BUF_LINE_I256 (CFL_BUF_LINE >> 4)
+#define CFL_BUF_SQUARE (CFL_BUF_LINE * CFL_BUF_LINE)
+typedef struct cfl_ctx {
+  // Q3 reconstructed luma pixels (only Q2 is required, but Q3 is used to avoid
+  // shifts)
+  uint16_t recon_buf_q3[CFL_BUF_SQUARE];
+  // Q3 AC contributions (reconstructed luma pixels - tx block avg)
+  int16_t ac_buf_q3[CFL_BUF_SQUARE];
+
+  // Cache the DC_PRED when performing RDO, so it does not have to be recomputed
+  // for every scaling parameter
+  int dc_pred_is_cached[CFL_PRED_PLANES];
+  // The DC_PRED cache is disable when decoding
+  int use_dc_pred_cache;
+  // Only cache the first row of the DC_PRED
+  int16_t dc_pred_cache[CFL_PRED_PLANES][CFL_BUF_LINE];
+
+  // Height and width currently used in the CfL prediction buffer.
+  int buf_height, buf_width;
+
+  int are_parameters_computed;
+
+  // Chroma subsampling
+  int subsampling_x, subsampling_y;
+
+  int mi_row, mi_col;
+
+  // Whether the reconstructed luma pixels need to be stored
+  int store_y;
+
+#if CONFIG_DEBUG
+  int rate;
+#endif  // CONFIG_DEBUG
+
+  int is_chroma_reference;
+} CFL_CTX;
+
+typedef struct dist_wtd_comp_params {
+  int use_dist_wtd_comp_avg;
+  int fwd_offset;
+  int bck_offset;
+} DIST_WTD_COMP_PARAMS;
+
+struct scale_factors;
+
+// Most/all of the pointers are mere pointers to actual arrays are allocated
+// elsewhere. This is mostly for coding convenience.
+typedef struct macroblockd {
+  struct macroblockd_plane plane[MAX_MB_PLANE];
+  CB_BUFFER * cb_buffer;
+
+  TileInfo tile;
+
+  int mi_stride;
+
+  MB_MODE_INFO **mi;
+  MB_MODE_INFO *left_mbmi;
+  MB_MODE_INFO *above_mbmi;
+  MB_MODE_INFO *chroma_left_mbmi;
+  MB_MODE_INFO *chroma_above_mbmi;
+
+  int up_available;
+  int left_available;
+  int chroma_up_available;
+  int chroma_left_available;
+
+  /* Distance of MB away from frame edges in subpixels (1/8th pixel)  */
+  int mb_to_left_edge;
+  int mb_to_right_edge;
+  int mb_to_top_edge;
+  int mb_to_bottom_edge;
+
+  /* pointers to reference frame scale factors */
+  const struct scale_factors *block_ref_scale_factors[2];
+
+  /* pointer to current frame */
+  const YV12_BUFFER_CONFIG *cur_buf;
+
+  ENTROPY_CONTEXT *above_context[MAX_MB_PLANE];
+  ENTROPY_CONTEXT left_context[MAX_MB_PLANE][MAX_MIB_SIZE];
+
+  PARTITION_CONTEXT *above_seg_context;
+  PARTITION_CONTEXT left_seg_context[MAX_MIB_SIZE];
+
+  TXFM_CONTEXT *above_txfm_context;
+  TXFM_CONTEXT *left_txfm_context;
+  TXFM_CONTEXT left_txfm_context_buffer[MAX_MIB_SIZE];
+
+  WienerInfo wiener_info[MAX_MB_PLANE];
+  SgrprojInfo sgrproj_info[MAX_MB_PLANE];
+
+  // block dimension in the unit of mode_info.
+  uint8_t n4_w, n4_h;
+
+  uint8_t ref_mv_count[MODE_CTX_REF_FRAMES];
+  CANDIDATE_MV ref_mv_stack[MODE_CTX_REF_FRAMES][MAX_REF_MV_STACK_SIZE];
+  uint8_t is_sec_rect;
+
+  // Counts of each reference frame in the above and left neighboring blocks.
+  // NOTE: Take into account both single and comp references.
+  uint8_t neighbors_ref_counts[REF_FRAMES];
+
+  FRAME_CONTEXT *tile_ctx;
+  /* Bit depth: 8, 10, 12 */
+  int bd;
+  int overlappable_neighbors[2];
+  TX_TYPE txk_type[TXK_TYPE_BUF_LEN];
+
+  int qindex[MAX_SEGMENTS];
+  int lossless[MAX_SEGMENTS];
+  int corrupted;
+  int cur_frame_force_integer_mv;
+  // same with that in AV1_COMMON
+  struct aom_internal_error_info *error_info;
+  const WarpedMotionParams *global_motion;
+  int delta_qindex;
+  int current_qindex;
+  // Since actual frame level loop filtering level value is not available
+  // at the beginning of the tile (only available during actual filtering)
+  // at encoder side.we record the delta_lf (against the frame level loop
+  // filtering level) and code the delta between previous superblock's delta
+  // lf and current delta lf. It is equivalent to the delta between previous
+  // superblock's actual lf and current lf.
+  int delta_lf_from_base;
+  // For this experiment, we have four frame filter levels for different plane
+  // and direction. So, to support the per superblock update, we need to add
+  // a few more params as below.
+  // 0: delta loop filter level for y plane vertical
+  // 1: delta loop filter level for y plane horizontal
+  // 2: delta loop filter level for u plane
+  // 3: delta loop filter level for v plane
+  // To make it consistent with the reference to each filter level in segment,
+  // we need to -1, since
+  // SEG_LVL_ALT_LF_Y_V = 1;
+  // SEG_LVL_ALT_LF_Y_H = 2;
+  // SEG_LVL_ALT_LF_U   = 3;
+  // SEG_LVL_ALT_LF_V   = 4;
+  int delta_lf[FRAME_LF_COUNT];
+  int cdef_preset[4];
+
+  uint8_t *mc_buf[2];
+  CFL_CTX cfl;
+
+  DIST_WTD_COMP_PARAMS jcp_param;
+
+  uint16_t cb_offset[MAX_MB_PLANE];
+  uint16_t txb_offset[MAX_MB_PLANE];
+  uint16_t color_index_map_offset[2];
+
+  CONV_BUF_TYPE *tmp_conv_dst;
+  uint8_t *tmp_obmc_bufs[2];
+  av1_tile_data * tile_data;
+} MACROBLOCKD;
+
+static INLINE int is_cur_buf_hbd(const MACROBLOCKD *xd) {
+  return xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH ? 1 : 0;
+}
+
+static INLINE uint8_t *get_buf_by_bd(const MACROBLOCKD *xd, uint8_t *buf16) {
+  return (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH)
+             ? CONVERT_TO_BYTEPTR(buf16)
+             : buf16;
+}
+
+static INLINE int get_sqr_bsize_idx(BLOCK_SIZE bsize) {
+  switch (bsize) {
+    case BLOCK_4X4: return 0;
+    case BLOCK_8X8: return 1;
+    case BLOCK_16X16: return 2;
+    case BLOCK_32X32: return 3;
+    case BLOCK_64X64: return 4;
+    case BLOCK_128X128: return 5;
+    default: return SQR_BLOCK_SIZES;
+  }
+}
+
+// For a square block size 'bsize', returns the size of the sub-blocks used by
+// the given partition type. If the partition produces sub-blocks of different
+// sizes, then the function returns the largest sub-block size.
+// Implements the Partition_Subsize lookup table in the spec (Section 9.3.
+// Conversion tables).
+// Note: the input block size should be square.
+// Otherwise it's considered invalid.
+static INLINE BLOCK_SIZE get_partition_subsize(BLOCK_SIZE bsize,
+                                               PARTITION_TYPE partition) {
+  if (partition == PARTITION_INVALID) {
+    return BLOCK_INVALID;
+  } else {
+    const int sqr_bsize_idx = get_sqr_bsize_idx(bsize);
+    return sqr_bsize_idx >= SQR_BLOCK_SIZES
+               ? BLOCK_INVALID
+               : subsize_lookup[partition][sqr_bsize_idx];
+  }
+}
+
+static TX_TYPE intra_mode_to_tx_type(const MB_MODE_INFO *mbmi,
+                                     PLANE_TYPE plane_type) {
+  static const TX_TYPE _intra_mode_to_tx_type[INTRA_MODES] = {
+    DCT_DCT,    // DC_PRED
+    ADST_DCT,   // V_PRED
+    DCT_ADST,   // H_PRED
+    DCT_DCT,    // D45_PRED
+    ADST_ADST,  // D135_PRED
+    ADST_DCT,   // D113_PRED
+    DCT_ADST,   // D157_PRED
+    DCT_ADST,   // D203_PRED
+    ADST_DCT,   // D67_PRED
+    ADST_ADST,  // SMOOTH_PRED
+    ADST_DCT,   // SMOOTH_V_PRED
+    DCT_ADST,   // SMOOTH_H_PRED
+    ADST_ADST,  // PAETH_PRED
+  };
+  const PREDICTION_MODE mode =
+      (plane_type == PLANE_TYPE_Y) ? mbmi->mode : get_uv_mode(mbmi->uv_mode);
+  assert(mode < INTRA_MODES);
+  return _intra_mode_to_tx_type[mode];
+}
+
+static INLINE int is_rect_tx(TX_SIZE tx_size) { return tx_size >= TX_SIZES; }
+
+static INLINE int block_signals_txsize(BLOCK_SIZE bsize) {
+  return bsize > BLOCK_4X4;
+}
+
+// Number of transform types in each set type
+static const int av1_num_ext_tx_set[EXT_TX_SET_TYPES] = {
+  1, 2, 5, 7, 12, 16,
+};
+
+static const int av1_ext_tx_used[EXT_TX_SET_TYPES][TX_TYPES] = {
+  { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  { 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 },
+  { 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 },
+  { 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0 },
+  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0 },
+  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 },
+};
+
+static const uint16_t av1_ext_tx_used_flag[EXT_TX_SET_TYPES] = {
+  0x0001,  // 0000 0000 0000 0001
+  0x0201,  // 0000 0010 0000 0001
+  0x020F,  // 0000 0010 0000 1111
+  0x0E0F,  // 0000 1110 0000 1111
+  0x0FFF,  // 0000 1111 1111 1111
+  0xFFFF,  // 1111 1111 1111 1111
+};
+
+static INLINE TxSetType av1_get_ext_tx_set_type(TX_SIZE tx_size, int is_inter,
+                                                int use_reduced_set) {
+  const TX_SIZE tx_size_sqr_up = txsize_sqr_up_map[tx_size];
+  if (tx_size_sqr_up > TX_32X32) return EXT_TX_SET_DCTONLY;
+  if (tx_size_sqr_up == TX_32X32)
+    return is_inter ? EXT_TX_SET_DCT_IDTX : EXT_TX_SET_DCTONLY;
+  if (use_reduced_set)
+    return is_inter ? EXT_TX_SET_DCT_IDTX : EXT_TX_SET_DTT4_IDTX;
+  const TX_SIZE tx_size_sqr = txsize_sqr_map[tx_size];
+  if (is_inter) {
+    return (tx_size_sqr == TX_16X16 ? EXT_TX_SET_DTT9_IDTX_1DDCT
+                                    : EXT_TX_SET_ALL16);
+  } else {
+    return (tx_size_sqr == TX_16X16 ? EXT_TX_SET_DTT4_IDTX
+                                    : EXT_TX_SET_DTT4_IDTX_1DDCT);
+  }
+}
+
+// Maps tx set types to the indices.
+static const int ext_tx_set_index[2][EXT_TX_SET_TYPES] = {
+  { // Intra
+    0, -1, 2, 1, -1, -1 },
+  { // Inter
+    0, 3, -1, -1, 2, 1 },
+};
+
+static INLINE int get_ext_tx_set(TX_SIZE tx_size, int is_inter,
+                                 int use_reduced_set) {
+  const TxSetType set_type =
+      av1_get_ext_tx_set_type(tx_size, is_inter, use_reduced_set);
+  return ext_tx_set_index[is_inter][set_type];
+}
+
+static INLINE int get_ext_tx_types(TX_SIZE tx_size, int is_inter,
+                                   int use_reduced_set) {
+  const int set_type =
+      av1_get_ext_tx_set_type(tx_size, is_inter, use_reduced_set);
+  return av1_num_ext_tx_set[set_type];
+}
+
+#define TXSIZEMAX(t1, t2) (tx_size_2d[(t1)] >= tx_size_2d[(t2)] ? (t1) : (t2))
+#define TXSIZEMIN(t1, t2) (tx_size_2d[(t1)] <= tx_size_2d[(t2)] ? (t1) : (t2))
+
+static INLINE TX_SIZE tx_size_from_tx_mode(BLOCK_SIZE bsize, TX_MODE tx_mode) {
+  const TX_SIZE largest_tx_size = tx_mode_to_biggest_tx_size[tx_mode];
+  const TX_SIZE max_rect_tx_size = max_txsize_rect_lookup[bsize];
+  if (bsize == BLOCK_4X4)
+    return AOMMIN(max_txsize_lookup[bsize], largest_tx_size);
+  if (txsize_sqr_map[max_rect_tx_size] <= largest_tx_size)
+    return max_rect_tx_size;
+  else
+    return largest_tx_size;
+}
+
+extern const int16_t dr_intra_derivative[90];
+static const uint8_t mode_to_angle_map[] = {
+  0, 90, 180, 45, 135, 113, 157, 203, 67, 0, 0, 0, 0,
+};
+
+// Converts block_index for given transform size to index of the block in raster
+// order.
+static INLINE int av1_block_index_to_raster_order(TX_SIZE tx_size,
+                                                  int block_idx) {
+  // For transform size 4x8, the possible block_idx values are 0 & 2, because
+  // block_idx values are incremented in steps of size 'tx_width_unit x
+  // tx_height_unit'. But, for this transform size, block_idx = 2 corresponds to
+  // block number 1 in raster order, inside an 8x8 MI block.
+  // For any other transform size, the two indices are equivalent.
+  return (tx_size == TX_4X8 && block_idx == 2) ? 1 : block_idx;
+}
+
+// Inverse of above function.
+// Note: only implemented for transform sizes 4x4, 4x8 and 8x4 right now.
+static INLINE int av1_raster_order_to_block_index(TX_SIZE tx_size,
+                                                  int raster_order) {
+  assert(tx_size == TX_4X4 || tx_size == TX_4X8 || tx_size == TX_8X4);
+  // We ensure that block indices are 0 & 2 if tx size is 4x8 or 8x4.
+  return (tx_size == TX_4X4) ? raster_order : (raster_order > 0) ? 2 : 0;
+}
+
+static INLINE TX_TYPE get_default_tx_type(PLANE_TYPE plane_type,
+                                          const MACROBLOCKD *xd,
+                                          TX_SIZE tx_size,
+                                          int is_screen_content_type) {
+  const MB_MODE_INFO *const mbmi = xd->mi[0];
+
+  if (is_inter_block(mbmi) || plane_type != PLANE_TYPE_Y ||
+      xd->lossless[mbmi->segment_id] || tx_size >= TX_32X32 ||
+      is_screen_content_type)
+    return DCT_DCT;
+
+  return intra_mode_to_tx_type(mbmi, plane_type);
+}
+
+// Implements the get_plane_residual_size() function in the spec (Section
+// 5.11.38. Get plane residual size function).
+static INLINE BLOCK_SIZE get_plane_block_size(BLOCK_SIZE bsize,
+                                              int subsampling_x,
+                                              int subsampling_y) {
+  if (bsize == BLOCK_INVALID) return BLOCK_INVALID;
+  return ss_size_lookup[bsize][subsampling_x][subsampling_y];
+}
+
+static INLINE int av1_get_txb_size_index(BLOCK_SIZE bsize, int blk_row,
+                                         int blk_col) {
+  TX_SIZE txs = max_txsize_rect_lookup[bsize];
+  for (int level = 0; level < MAX_VARTX_DEPTH - 1; ++level)
+    txs = sub_tx_size_map[txs];
+  const int tx_w_log2 = tx_size_wide_log2[txs] - MI_SIZE_LOG2;
+  const int tx_h_log2 = tx_size_high_log2[txs] - MI_SIZE_LOG2;
+  const int bw_log2 = mi_size_wide_log2[bsize];
+  const int stride_log2 = bw_log2 - tx_w_log2;
+  const int index =
+      ((blk_row >> tx_h_log2) << stride_log2) + (blk_col >> tx_w_log2);
+  assert(index < INTER_TX_SIZE_BUF_LEN);
+  return index;
+}
+
+static INLINE int av1_get_txk_type_index(BLOCK_SIZE bsize, int blk_row,
+                                         int blk_col) {
+  TX_SIZE txs = max_txsize_rect_lookup[bsize];
+  for (int level = 0; level < MAX_VARTX_DEPTH; ++level)
+    txs = sub_tx_size_map[txs];
+  const int tx_w_log2 = tx_size_wide_log2[txs] - MI_SIZE_LOG2;
+  const int tx_h_log2 = tx_size_high_log2[txs] - MI_SIZE_LOG2;
+  const int bw_uint_log2 = mi_size_wide_log2[bsize];
+  const int stride_log2 = bw_uint_log2 - tx_w_log2;
+  const int index =
+      ((blk_row >> tx_h_log2) << stride_log2) + (blk_col >> tx_w_log2);
+  assert(index < TXK_TYPE_BUF_LEN);
+  return index;
+}
+
+static INLINE void update_txk_array(TX_TYPE *txk_type, BLOCK_SIZE bsize,
+                                    int blk_row, int blk_col, TX_SIZE tx_size,
+                                    TX_TYPE tx_type) {
+  const int txk_type_idx = av1_get_txk_type_index(bsize, blk_row, blk_col);
+  txk_type[txk_type_idx] = tx_type;
+
+  const int txw = tx_size_wide_unit[tx_size];
+  const int txh = tx_size_high_unit[tx_size];
+  // The 16x16 unit is due to the constraint from tx_64x64 which sets the
+  // maximum tx size for chroma as 32x32. Coupled with 4x1 transform block
+  // size, the constraint takes effect in 32x16 / 16x32 size too. To solve
+  // the intricacy, cover all the 16x16 units inside a 64 level transform.
+  if (txw == tx_size_wide_unit[TX_64X64] ||
+      txh == tx_size_high_unit[TX_64X64]) {
+    const int tx_unit = tx_size_wide_unit[TX_16X16];
+    for (int idy = 0; idy < txh; idy += tx_unit) {
+      for (int idx = 0; idx < txw; idx += tx_unit) {
+        const int this_index =
+            av1_get_txk_type_index(bsize, blk_row + idy, blk_col + idx);
+        txk_type[this_index] = tx_type;
+      }
+    }
+  }
+}
+
+static INLINE TX_TYPE av1_get_tx_type(PLANE_TYPE plane_type,
+                                      const MACROBLOCKD *xd, int blk_row,
+                                      int blk_col, TX_SIZE tx_size,
+                                      int reduced_tx_set) {
+  const MB_MODE_INFO *const mbmi = xd->mi[0];
+  const struct macroblockd_plane *const pd = &xd->plane[plane_type];
+  const TxSetType tx_set_type =
+      av1_get_ext_tx_set_type(tx_size, is_inter_block(mbmi), reduced_tx_set);
+
+  TX_TYPE tx_type;
+  if (xd->lossless[mbmi->segment_id] || txsize_sqr_up_map[tx_size] > TX_32X32) {
+    tx_type = DCT_DCT;
+  } else {
+    if (plane_type == PLANE_TYPE_Y) {
+      const int txk_type_idx =
+          av1_get_txk_type_index(mbmi->sb_type, blk_row, blk_col);
+      tx_type = xd->txk_type[txk_type_idx];
+    } else if (is_inter_block(mbmi)) {
+      // scale back to y plane's coordinate
+      blk_row <<= pd->subsampling_y;
+      blk_col <<= pd->subsampling_x;
+      const int txk_type_idx =
+          av1_get_txk_type_index(mbmi->sb_type, blk_row, blk_col);
+      tx_type = xd->txk_type[txk_type_idx];
+    } else {
+      // In intra mode, uv planes don't share the same prediction mode as y
+      // plane, so the tx_type should not be shared
+      tx_type = intra_mode_to_tx_type(mbmi, PLANE_TYPE_UV);
+    }
+  }
+  assert(tx_type < TX_TYPES);
+  if (!av1_ext_tx_used[tx_set_type][tx_type]) return DCT_DCT;
+  return tx_type;
+}
+
+void av1_setup_block_planes(MACROBLOCKD *xd, int ss_x, int ss_y,
+                            const int num_planes);
+
+static INLINE int bsize_to_max_depth(BLOCK_SIZE bsize) {
+  TX_SIZE tx_size = max_txsize_rect_lookup[bsize];
+  int depth = 0;
+  while (depth < MAX_TX_DEPTH && tx_size != TX_4X4) {
+    depth++;
+    tx_size = sub_tx_size_map[tx_size];
+  }
+  return depth;
+}
+
+static INLINE int bsize_to_tx_size_cat(BLOCK_SIZE bsize) {
+  TX_SIZE tx_size = max_txsize_rect_lookup[bsize];
+  assert(tx_size != TX_4X4);
+  int depth = 0;
+  while (tx_size != TX_4X4) {
+    depth++;
+    tx_size = sub_tx_size_map[tx_size];
+    assert(depth < 10);
+  }
+  assert(depth <= MAX_TX_CATS);
+  return depth - 1;
+}
+
+static INLINE TX_SIZE depth_to_tx_size(int depth, BLOCK_SIZE bsize) {
+  TX_SIZE max_tx_size = max_txsize_rect_lookup[bsize];
+  TX_SIZE tx_size = max_tx_size;
+  for (int d = 0; d < depth; ++d) tx_size = sub_tx_size_map[tx_size];
+  return tx_size;
+}
+
+static INLINE TX_SIZE av1_get_adjusted_tx_size(TX_SIZE tx_size) {
+  switch (tx_size) {
+    case TX_64X64:
+    case TX_64X32:
+    case TX_32X64: return TX_32X32;
+    case TX_64X16: return TX_32X16;
+    case TX_16X64: return TX_16X32;
+    default: return tx_size;
+  }
+}
+
+static INLINE TX_SIZE av1_get_max_uv_txsize(BLOCK_SIZE bsize, int subsampling_x,
+                                            int subsampling_y) {
+  const BLOCK_SIZE plane_bsize =
+      get_plane_block_size(bsize, subsampling_x, subsampling_y);
+  assert(plane_bsize < BLOCK_SIZES_ALL);
+  const TX_SIZE uv_tx = max_txsize_rect_lookup[plane_bsize];
+  return av1_get_adjusted_tx_size(uv_tx);
+}
+
+static INLINE TX_SIZE av1_get_tx_size(int plane, const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *mbmi = xd->mi[0];
+  if (xd->lossless[mbmi->segment_id]) return TX_4X4;
+  if (plane == 0) return mbmi->tx_size;
+  const MACROBLOCKD_PLANE *pd = &xd->plane[plane];
+  return av1_get_max_uv_txsize(mbmi->sb_type, pd->subsampling_x,
+                               pd->subsampling_y);
+}
+
+void av1_reset_skip_context(MACROBLOCKD *xd, int mi_row, int mi_col,
+                            BLOCK_SIZE bsize, const int num_planes);
+
+void av1_reset_loop_filter_delta(MACROBLOCKD *xd, int num_planes);
+
+void av1_reset_loop_restoration(MACROBLOCKD *xd, const int num_planes);
+
+typedef void (*foreach_transformed_block_visitor)(int plane, int block,
+                                                  int blk_row, int blk_col,
+                                                  BLOCK_SIZE plane_bsize,
+                                                  TX_SIZE tx_size, void *arg);
+
+void av1_set_contexts(const MACROBLOCKD *xd, struct macroblockd_plane *pd,
+                      int plane, BLOCK_SIZE plane_bsize, TX_SIZE tx_size,
+                      int has_eob, int aoff, int loff);
+
+#define MAX_INTERINTRA_SB_SQUARE 32 * 32
+static INLINE int is_interintra_mode(const MB_MODE_INFO *mbmi) {
+  return (mbmi->ref_frame[0] > INTRA_FRAME &&
+          mbmi->ref_frame[1] == INTRA_FRAME);
+}
+
+static INLINE int is_interintra_allowed_bsize(const BLOCK_SIZE bsize) {
+  return (bsize >= BLOCK_8X8) && (bsize <= BLOCK_32X32);
+}
+
+static INLINE int is_interintra_allowed_mode(const PREDICTION_MODE mode) {
+  return (mode >= SINGLE_INTER_MODE_START) && (mode < SINGLE_INTER_MODE_END);
+}
+
+static INLINE int is_interintra_allowed_ref(const MV_REFERENCE_FRAME rf[2]) {
+  return (rf[0] > INTRA_FRAME) && (rf[1] <= INTRA_FRAME);
+}
+
+static INLINE int is_interintra_allowed(const MB_MODE_INFO *mbmi) {
+  return is_interintra_allowed_bsize(mbmi->sb_type) &&
+         is_interintra_allowed_mode(mbmi->mode) &&
+         is_interintra_allowed_ref(mbmi->ref_frame);
+}
+
+static INLINE int is_interintra_allowed_bsize_group(int group) {
+  int i;
+  for (i = 0; i < BLOCK_SIZES_ALL; i++) {
+    if (size_group_lookup[i] == group &&
+        is_interintra_allowed_bsize((BLOCK_SIZE)i)) {
+      return 1;
+    }
+  }
+  return 0;
+}
+
+static INLINE int is_interintra_pred(const MB_MODE_INFO *mbmi) {
+  return mbmi->ref_frame[0] > INTRA_FRAME &&
+         mbmi->ref_frame[1] == INTRA_FRAME && is_interintra_allowed(mbmi);
+}
+
+static INLINE int get_vartx_max_txsize(const MACROBLOCKD *xd, BLOCK_SIZE bsize,
+                                       int plane) {
+  if (xd->lossless[xd->mi[0]->segment_id]) return TX_4X4;
+  const TX_SIZE max_txsize = max_txsize_rect_lookup[bsize];
+  if (plane == 0) return max_txsize;            // luma
+  return av1_get_adjusted_tx_size(max_txsize);  // chroma
+}
+
+static INLINE int is_motion_variation_allowed_bsize(BLOCK_SIZE bsize) {
+  return AOMMIN(block_size_wide[bsize], block_size_high[bsize]) >= 8;
+}
+
+static INLINE int is_motion_variation_allowed_compound(
+    const MB_MODE_INFO *mbmi) {
+  if (!has_second_ref(mbmi))
+    return 1;
+  else
+    return 0;
+}
+
+// input: log2 of length, 0(4), 1(8), ...
+static const int max_neighbor_obmc[6] = { 0, 1, 2, 3, 4, 4 };
+
+static INLINE int check_num_overlappable_neighbors(const MACROBLOCKD *xd) {
+  return !(xd->overlappable_neighbors[0] == 0 &&
+           xd->overlappable_neighbors[1] == 0);
+}
+
+static INLINE MOTION_MODE
+motion_mode_allowed(const WarpedMotionParams *gm_params, const MACROBLOCKD *xd,
+                    const MB_MODE_INFO *mbmi, int allow_warped_motion) {
+  if (xd->cur_frame_force_integer_mv == 0) {
+    const TransformationType gm_type = gm_params[mbmi->ref_frame[0]].wmtype;
+    if (is_global_mv_block(mbmi, gm_type)) return SIMPLE_TRANSLATION;
+  }
+  if (is_motion_variation_allowed_bsize(mbmi->sb_type) &&
+      is_inter_mode(mbmi->mode) && mbmi->ref_frame[1] != INTRA_FRAME &&
+      is_motion_variation_allowed_compound(mbmi)) {
+    if (!check_num_overlappable_neighbors(xd)) return SIMPLE_TRANSLATION;
+    assert(!has_second_ref(mbmi));
+    if (mbmi->num_proj_ref >= 1 &&
+        (allow_warped_motion &&
+         !av1_is_scaled(xd->block_ref_scale_factors[0]))) {
+      if (xd->cur_frame_force_integer_mv) {
+        return OBMC_CAUSAL;
+      }
+      return WARPED_CAUSAL;
+    }
+    return OBMC_CAUSAL;
+  } else {
+    return SIMPLE_TRANSLATION;
+  }
+}
+
+static INLINE void assert_motion_mode_valid(MOTION_MODE mode,
+                                            const WarpedMotionParams *gm_params,
+                                            const MACROBLOCKD *xd,
+                                            const MB_MODE_INFO *mbmi,
+                                            int allow_warped_motion) {
+  const MOTION_MODE last_motion_mode_allowed =
+      motion_mode_allowed(gm_params, xd, mbmi, allow_warped_motion);
+
+  // Check that the input mode is not illegal
+  if (last_motion_mode_allowed < mode)
+    assert(0 && "Illegal motion mode selected");
+}
+
+static INLINE int is_neighbor_overlappable(const MB_MODE_INFO *mbmi) {
+  return (is_inter_block(mbmi));
+}
+
+static INLINE int av1_allow_palette(int allow_screen_content_tools,
+                                    BLOCK_SIZE sb_type) {
+  return allow_screen_content_tools && block_size_wide[sb_type] <= 64 &&
+         block_size_high[sb_type] <= 64 && sb_type >= BLOCK_8X8;
+}
+
+// Returns sub-sampled dimensions of the given block.
+// The output values for 'rows_within_bounds' and 'cols_within_bounds' will
+// differ from 'height' and 'width' when part of the block is outside the
+// right
+// and/or bottom image boundary.
+static INLINE void av1_get_block_dimensions(BLOCK_SIZE bsize, int plane,
+                                            const MACROBLOCKD *xd, int *width,
+                                            int *height,
+                                            int *rows_within_bounds,
+                                            int *cols_within_bounds) {
+  const int block_height = block_size_high[bsize];
+  const int block_width = block_size_wide[bsize];
+  const int block_rows = (xd->mb_to_bottom_edge >= 0)
+                             ? block_height
+                             : (xd->mb_to_bottom_edge >> 3) + block_height;
+  const int block_cols = (xd->mb_to_right_edge >= 0)
+                             ? block_width
+                             : (xd->mb_to_right_edge >> 3) + block_width;
+  const struct macroblockd_plane *const pd = &xd->plane[plane];
+  assert(IMPLIES(plane == PLANE_TYPE_Y, pd->subsampling_x == 0));
+  assert(IMPLIES(plane == PLANE_TYPE_Y, pd->subsampling_y == 0));
+  assert(block_width >= block_cols);
+  assert(block_height >= block_rows);
+  const int plane_block_width = block_width >> pd->subsampling_x;
+  const int plane_block_height = block_height >> pd->subsampling_y;
+  // Special handling for chroma sub8x8.
+  const int is_chroma_sub8_x = plane > 0 && plane_block_width < 4;
+  const int is_chroma_sub8_y = plane > 0 && plane_block_height < 4;
+  if (width) *width = plane_block_width + 2 * is_chroma_sub8_x;
+  if (height) *height = plane_block_height + 2 * is_chroma_sub8_y;
+  if (rows_within_bounds) {
+    *rows_within_bounds =
+        (block_rows >> pd->subsampling_y) + 2 * is_chroma_sub8_y;
+  }
+  if (cols_within_bounds) {
+    *cols_within_bounds =
+        (block_cols >> pd->subsampling_x) + 2 * is_chroma_sub8_x;
+  }
+}
+
+/* clang-format off */
+typedef aom_cdf_prob (*MapCdf)[PALETTE_COLOR_INDEX_CONTEXTS]
+                              [CDF_SIZE(PALETTE_COLORS)];
+typedef const int (*ColorCost)[PALETTE_SIZES][PALETTE_COLOR_INDEX_CONTEXTS]
+                              [PALETTE_COLORS];
+/* clang-format on */
+
+typedef struct {
+  int rows;
+  int cols;
+  int n_colors;
+  int plane_width;
+  int plane_height;
+  uint8_t *color_map;
+  MapCdf map_cdf;
+  ColorCost color_cost;
+} Av1ColorMapParam;
+
+static INLINE int is_nontrans_global_motion(const MACROBLOCKD *xd,
+                                            const MB_MODE_INFO *mbmi) {
+  int ref;
+
+  // First check if all modes are GLOBALMV
+  if (mbmi->mode != GLOBALMV && mbmi->mode != GLOBAL_GLOBALMV) return 0;
+
+  if (AOMMIN(mi_size_wide[mbmi->sb_type], mi_size_high[mbmi->sb_type]) < 2)
+    return 0;
+
+  // Now check if all global motion is non translational
+  for (ref = 0; ref < 1 + has_second_ref(mbmi); ++ref) {
+    if (xd->global_motion[mbmi->ref_frame[ref]].wmtype == TRANSLATION) return 0;
+  }
+  return 1;
+}
+
+static INLINE PLANE_TYPE get_plane_type(int plane) {
+  return (plane == 0) ? PLANE_TYPE_Y : PLANE_TYPE_UV;
+}
+
+static INLINE int av1_get_max_eob(TX_SIZE tx_size) {
+  if (tx_size == TX_64X64 || tx_size == TX_64X32 || tx_size == TX_32X64) {
+    return 1024;
+  }
+  if (tx_size == TX_16X64 || tx_size == TX_64X16) {
+    return 512;
+  }
+  return tx_size_2d[tx_size];
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_BLOCKD_H_

diff --git a/libav1/av1/common/cdef.c b/libav1/av1/common/cdef.c
new file mode 100644
index 0000000..556dede
--- /dev/null
+++ b/libav1/av1/common/cdef.c

@@ -0,0 +1,406 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <math.h>
+#include <string.h>
+
+#include "config/aom_scale_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "av1/common/cdef.h"
+#include "av1/common/cdef_block.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/reconinter.h"
+
+int sb_all_skip(const AV1_COMMON *const cm, int mi_row, int mi_col) {
+  int maxc, maxr;
+  int skip = 1;
+  maxc = cm->mi_cols - mi_col;
+  maxr = cm->mi_rows - mi_row;
+
+  maxr = AOMMIN(maxr, MI_SIZE_64X64);
+  maxc = AOMMIN(maxc, MI_SIZE_64X64);
+
+  for (int r = 0; r < maxr; r++) {
+    for (int c = 0; c < maxc; c++) {
+      skip =
+          skip &&
+          cm->mi_grid_visible[(mi_row + r) * cm->mi_stride + mi_col + c]->skip;
+    }
+  }
+  return skip;
+}
+
+static int is_8x8_block_skip(MB_MODE_INFO **grid, int mi_row, int mi_col,
+                             int mi_stride) {
+  int is_skip = 1;
+  for (int r = 0; r < mi_size_high[BLOCK_8X8]; ++r)
+    for (int c = 0; c < mi_size_wide[BLOCK_8X8]; ++c)
+      is_skip &= grid[(mi_row + r) * mi_stride + (mi_col + c)]->skip;
+
+  return is_skip;
+}
+
+int sb_compute_cdef_list(const AV1_COMMON *const cm, int mi_row, int mi_col,
+                         cdef_list *dlist, BLOCK_SIZE bs) {
+  MB_MODE_INFO **grid = cm->mi_grid_visible;
+  int maxc = cm->mi_cols - mi_col;
+  int maxr = cm->mi_rows - mi_row;
+
+  if (bs == BLOCK_128X128 || bs == BLOCK_128X64)
+    maxc = AOMMIN(maxc, MI_SIZE_128X128);
+  else
+    maxc = AOMMIN(maxc, MI_SIZE_64X64);
+  if (bs == BLOCK_128X128 || bs == BLOCK_64X128)
+    maxr = AOMMIN(maxr, MI_SIZE_128X128);
+  else
+    maxr = AOMMIN(maxr, MI_SIZE_64X64);
+
+  const int r_step = mi_size_high[BLOCK_8X8];
+  const int c_step = mi_size_wide[BLOCK_8X8];
+  const int r_shift = (r_step == 2);
+  const int c_shift = (c_step == 2);
+
+  assert(r_step == 1 || r_step == 2);
+  assert(c_step == 1 || c_step == 2);
+
+  int count = 0;
+
+  for (int r = 0; r < maxr; r += r_step) {
+    for (int c = 0; c < maxc; c += c_step) {
+      if (!is_8x8_block_skip(grid, mi_row + r, mi_col + c, cm->mi_stride)) {
+        dlist[count].by = r >> r_shift;
+        dlist[count].bx = c >> c_shift;
+        dlist[count].skip = 0;
+        count++;
+      }
+    }
+  }
+  return count;
+}
+
+void copy_rect8_8bit_to_16bit_c(uint16_t *dst, int dstride, const uint8_t *src,
+                                int sstride, int v, int h) {
+  for (int i = 0; i < v; i++) {
+    for (int j = 0; j < h; j++) {
+      dst[i * dstride + j] = src[i * sstride + j];
+    }
+  }
+}
+
+void copy_rect8_16bit_to_16bit_c(uint16_t *dst, int dstride,
+                                 const uint16_t *src, int sstride, int v,
+                                 int h) {
+  for (int i = 0; i < v; i++) {
+    for (int j = 0; j < h; j++) {
+      dst[i * dstride + j] = src[i * sstride + j];
+    }
+  }
+}
+
+static void copy_sb8_16(AV1_COMMON *cm, uint16_t *dst, int dstride,
+                        const uint8_t *src, int src_voffset, int src_hoffset,
+                        int sstride, int vsize, int hsize) {
+  if (cm->seq_params.use_highbitdepth) {
+    const uint16_t *base =
+        &CONVERT_TO_SHORTPTR(src)[src_voffset * sstride + src_hoffset];
+    copy_rect8_16bit_to_16bit(dst, dstride, base, sstride, vsize, hsize);
+  } else {
+    const uint8_t *base = &src[src_voffset * sstride + src_hoffset];
+    copy_rect8_8bit_to_16bit(dst, dstride, base, sstride, vsize, hsize);
+  }
+}
+
+static INLINE void fill_rect(uint16_t *dst, int dstride, int v, int h,
+                             uint16_t x) {
+  for (int i = 0; i < v; i++) {
+    for (int j = 0; j < h; j++) {
+      dst[i * dstride + j] = x;
+    }
+  }
+}
+
+static INLINE void copy_rect(uint16_t *dst, int dstride, const uint16_t *src,
+                             int sstride, int v, int h) {
+  for (int i = 0; i < v; i++) {
+    for (int j = 0; j < h; j++) {
+      dst[i * dstride + j] = src[i * sstride + j];
+    }
+  }
+}
+
+void av1_cdef_frame(YV12_BUFFER_CONFIG *frame, AV1_COMMON *cm,
+                    MACROBLOCKD *xd) {
+  const CdefInfo *const cdef_info = &cm->cdef_info;
+  const int num_planes = av1_num_planes(cm);
+  DECLARE_ALIGNED(16, uint16_t, src[CDEF_INBUF_SIZE]);
+  uint16_t *linebuf[3];
+  uint16_t *colbuf[3];
+  cdef_list dlist[MI_SIZE_64X64 * MI_SIZE_64X64];
+  unsigned char *row_cdef, *prev_row_cdef, *curr_row_cdef;
+  int cdef_count;
+  int dir[CDEF_NBLOCKS][CDEF_NBLOCKS] = { { 0 } };
+  int var[CDEF_NBLOCKS][CDEF_NBLOCKS] = { { 0 } };
+  int mi_wide_l2[3];
+  int mi_high_l2[3];
+  int xdec[3];
+  int ydec[3];
+  int coeff_shift = AOMMAX(cm->seq_params.bit_depth - 8, 0);
+  const int nvfb = (cm->mi_rows + MI_SIZE_64X64 - 1) / MI_SIZE_64X64;
+  const int nhfb = (cm->mi_cols + MI_SIZE_64X64 - 1) / MI_SIZE_64X64;
+  av1_setup_dst_planes(xd->plane, cm->seq_params.sb_size, frame, 0, 0, 0,
+                       num_planes);
+  row_cdef = aom_malloc(sizeof(*row_cdef) * (nhfb + 2) * 2);
+  memset(row_cdef, 1, sizeof(*row_cdef) * (nhfb + 2) * 2);
+  prev_row_cdef = row_cdef + 1;
+  curr_row_cdef = prev_row_cdef + nhfb + 2;
+  for (int pli = 0; pli < num_planes; pli++) {
+    xdec[pli] = xd->plane[pli].subsampling_x;
+    ydec[pli] = xd->plane[pli].subsampling_y;
+    mi_wide_l2[pli] = MI_SIZE_LOG2 - xd->plane[pli].subsampling_x;
+    mi_high_l2[pli] = MI_SIZE_LOG2 - xd->plane[pli].subsampling_y;
+  }
+  const int stride = (cm->mi_cols << MI_SIZE_LOG2) + 2 * CDEF_HBORDER;
+  for (int pli = 0; pli < num_planes; pli++) {
+    linebuf[pli] = aom_malloc(sizeof(*linebuf) * CDEF_VBORDER * stride);
+    colbuf[pli] =
+        aom_malloc(sizeof(*colbuf) *
+                   ((CDEF_BLOCKSIZE << mi_high_l2[pli]) + 2 * CDEF_VBORDER) *
+                   CDEF_HBORDER);
+  }
+  for (int fbr = 0; fbr < nvfb; fbr++) {
+    for (int pli = 0; pli < num_planes; pli++) {
+      const int block_height =
+          (MI_SIZE_64X64 << mi_high_l2[pli]) + 2 * CDEF_VBORDER;
+      fill_rect(colbuf[pli], CDEF_HBORDER, block_height, CDEF_HBORDER,
+                CDEF_VERY_LARGE);
+    }
+    int cdef_left = 1;
+    for (int fbc = 0; fbc < nhfb; fbc++) {
+      int level, sec_strength;
+      int uv_level, uv_sec_strength;
+      int nhb, nvb;
+      int cstart = 0;
+      curr_row_cdef[fbc] = 0;
+      if (cm->mi_grid_visible[MI_SIZE_64X64 * fbr * cm->mi_stride +
+                              MI_SIZE_64X64 * fbc] == NULL ||
+          cm->mi_grid_visible[MI_SIZE_64X64 * fbr * cm->mi_stride +
+                              MI_SIZE_64X64 * fbc]
+                  ->cdef_strength == -1) {
+        cdef_left = 0;
+        continue;
+      }
+      if (!cdef_left) cstart = -CDEF_HBORDER;
+      nhb = AOMMIN(MI_SIZE_64X64, cm->mi_cols - MI_SIZE_64X64 * fbc);
+      nvb = AOMMIN(MI_SIZE_64X64, cm->mi_rows - MI_SIZE_64X64 * fbr);
+      int frame_top, frame_left, frame_bottom, frame_right;
+
+      int mi_row = MI_SIZE_64X64 * fbr;
+      int mi_col = MI_SIZE_64X64 * fbc;
+      // for the current filter block, it's top left corner mi structure (mi_tl)
+      // is first accessed to check whether the top and left boundaries are
+      // frame boundaries. Then bottom-left and top-right mi structures are
+      // accessed to check whether the bottom and right boundaries
+      // (respectively) are frame boundaries.
+      //
+      // Note that we can't just check the bottom-right mi structure - eg. if
+      // we're at the right-hand edge of the frame but not the bottom, then
+      // the bottom-right mi is NULL but the bottom-left is not.
+      frame_top = (mi_row == 0) ? 1 : 0;
+      frame_left = (mi_col == 0) ? 1 : 0;
+
+      if (fbr != nvfb - 1)
+        frame_bottom = (mi_row + MI_SIZE_64X64 == cm->mi_rows) ? 1 : 0;
+      else
+        frame_bottom = 1;
+
+      if (fbc != nhfb - 1)
+        frame_right = (mi_col + MI_SIZE_64X64 == cm->mi_cols) ? 1 : 0;
+      else
+        frame_right = 1;
+
+      const int mbmi_cdef_strength =
+          cm->mi_grid_visible[MI_SIZE_64X64 * fbr * cm->mi_stride +
+                              MI_SIZE_64X64 * fbc]
+              ->cdef_strength;
+      level =
+          cdef_info->cdef_strengths[mbmi_cdef_strength] / CDEF_SEC_STRENGTHS;
+      sec_strength =
+          cdef_info->cdef_strengths[mbmi_cdef_strength] % CDEF_SEC_STRENGTHS;
+      sec_strength += sec_strength == 3;
+      uv_level =
+          cdef_info->cdef_uv_strengths[mbmi_cdef_strength] / CDEF_SEC_STRENGTHS;
+      uv_sec_strength =
+          cdef_info->cdef_uv_strengths[mbmi_cdef_strength] % CDEF_SEC_STRENGTHS;
+      uv_sec_strength += uv_sec_strength == 3;
+      if ((level == 0 && sec_strength == 0 && uv_level == 0 &&
+           uv_sec_strength == 0) ||
+          (cdef_count = sb_compute_cdef_list(cm, fbr * MI_SIZE_64X64,
+                                             fbc * MI_SIZE_64X64, dlist,
+                                             BLOCK_64X64)) == 0) {
+        cdef_left = 0;
+        continue;
+      }
+
+      curr_row_cdef[fbc] = 1;
+      for (int pli = 0; pli < num_planes; pli++) {
+        int coffset;
+        int rend, cend;
+        int pri_damping = cdef_info->cdef_pri_damping;
+        int sec_damping = cdef_info->cdef_sec_damping;
+        int hsize = nhb << mi_wide_l2[pli];
+        int vsize = nvb << mi_high_l2[pli];
+
+        if (pli) {
+          level = uv_level;
+          sec_strength = uv_sec_strength;
+        }
+
+        if (fbc == nhfb - 1)
+          cend = hsize;
+        else
+          cend = hsize + CDEF_HBORDER;
+
+        if (fbr == nvfb - 1)
+          rend = vsize;
+        else
+          rend = vsize + CDEF_VBORDER;
+
+        coffset = fbc * MI_SIZE_64X64 << mi_wide_l2[pli];
+        if (fbc == nhfb - 1) {
+          /* On the last superblock column, fill in the right border with
+             CDEF_VERY_LARGE to avoid filtering with the outside. */
+          fill_rect(&src[cend + CDEF_HBORDER], CDEF_BSTRIDE,
+                    rend + CDEF_VBORDER, hsize + CDEF_HBORDER - cend,
+                    CDEF_VERY_LARGE);
+        }
+        if (fbr == nvfb - 1) {
+          /* On the last superblock row, fill in the bottom border with
+             CDEF_VERY_LARGE to avoid filtering with the outside. */
+          fill_rect(&src[(rend + CDEF_VBORDER) * CDEF_BSTRIDE], CDEF_BSTRIDE,
+                    CDEF_VBORDER, hsize + 2 * CDEF_HBORDER, CDEF_VERY_LARGE);
+        }
+        /* Copy in the pixels we need from the current superblock for
+           deringing.*/
+        copy_sb8_16(cm,
+                    &src[CDEF_VBORDER * CDEF_BSTRIDE + CDEF_HBORDER + cstart],
+                    CDEF_BSTRIDE, xd->plane[pli].dst.buf,
+                    (MI_SIZE_64X64 << mi_high_l2[pli]) * fbr, coffset + cstart,
+                    xd->plane[pli].dst.stride, rend, cend - cstart);
+        if (!prev_row_cdef[fbc]) {
+          copy_sb8_16(cm, &src[CDEF_HBORDER], CDEF_BSTRIDE,
+                      xd->plane[pli].dst.buf,
+                      (MI_SIZE_64X64 << mi_high_l2[pli]) * fbr - CDEF_VBORDER,
+                      coffset, xd->plane[pli].dst.stride, CDEF_VBORDER, hsize);
+        } else if (fbr > 0) {
+          copy_rect(&src[CDEF_HBORDER], CDEF_BSTRIDE, &linebuf[pli][coffset],
+                    stride, CDEF_VBORDER, hsize);
+        } else {
+          fill_rect(&src[CDEF_HBORDER], CDEF_BSTRIDE, CDEF_VBORDER, hsize,
+                    CDEF_VERY_LARGE);
+        }
+        if (!prev_row_cdef[fbc - 1]) {
+          copy_sb8_16(cm, src, CDEF_BSTRIDE, xd->plane[pli].dst.buf,
+                      (MI_SIZE_64X64 << mi_high_l2[pli]) * fbr - CDEF_VBORDER,
+                      coffset - CDEF_HBORDER, xd->plane[pli].dst.stride,
+                      CDEF_VBORDER, CDEF_HBORDER);
+        } else if (fbr > 0 && fbc > 0) {
+          copy_rect(src, CDEF_BSTRIDE, &linebuf[pli][coffset - CDEF_HBORDER],
+                    stride, CDEF_VBORDER, CDEF_HBORDER);
+        } else {
+          fill_rect(src, CDEF_BSTRIDE, CDEF_VBORDER, CDEF_HBORDER,
+                    CDEF_VERY_LARGE);
+        }
+        if (!prev_row_cdef[fbc + 1]) {
+          copy_sb8_16(cm, &src[CDEF_HBORDER + (nhb << mi_wide_l2[pli])],
+                      CDEF_BSTRIDE, xd->plane[pli].dst.buf,
+                      (MI_SIZE_64X64 << mi_high_l2[pli]) * fbr - CDEF_VBORDER,
+                      coffset + hsize, xd->plane[pli].dst.stride, CDEF_VBORDER,
+                      CDEF_HBORDER);
+        } else if (fbr > 0 && fbc < nhfb - 1) {
+          copy_rect(&src[hsize + CDEF_HBORDER], CDEF_BSTRIDE,
+                    &linebuf[pli][coffset + hsize], stride, CDEF_VBORDER,
+                    CDEF_HBORDER);
+        } else {
+          fill_rect(&src[hsize + CDEF_HBORDER], CDEF_BSTRIDE, CDEF_VBORDER,
+                    CDEF_HBORDER, CDEF_VERY_LARGE);
+        }
+        if (cdef_left) {
+          /* If we deringed the superblock on the left then we need to copy in
+             saved pixels. */
+          copy_rect(src, CDEF_BSTRIDE, colbuf[pli], CDEF_HBORDER,
+                    rend + CDEF_VBORDER, CDEF_HBORDER);
+        }
+        /* Saving pixels in case we need to dering the superblock on the
+            right. */
+        copy_rect(colbuf[pli], CDEF_HBORDER, src + hsize, CDEF_BSTRIDE,
+                  rend + CDEF_VBORDER, CDEF_HBORDER);
+        copy_sb8_16(
+            cm, &linebuf[pli][coffset], stride, xd->plane[pli].dst.buf,
+            (MI_SIZE_64X64 << mi_high_l2[pli]) * (fbr + 1) - CDEF_VBORDER,
+            coffset, xd->plane[pli].dst.stride, CDEF_VBORDER, hsize);
+
+        if (frame_top) {
+          fill_rect(src, CDEF_BSTRIDE, CDEF_VBORDER, hsize + 2 * CDEF_HBORDER,
+                    CDEF_VERY_LARGE);
+        }
+        if (frame_left) {
+          fill_rect(src, CDEF_BSTRIDE, vsize + 2 * CDEF_VBORDER, CDEF_HBORDER,
+                    CDEF_VERY_LARGE);
+        }
+        if (frame_bottom) {
+          fill_rect(&src[(vsize + CDEF_VBORDER) * CDEF_BSTRIDE], CDEF_BSTRIDE,
+                    CDEF_VBORDER, hsize + 2 * CDEF_HBORDER, CDEF_VERY_LARGE);
+        }
+        if (frame_right) {
+          fill_rect(&src[hsize + CDEF_HBORDER], CDEF_BSTRIDE,
+                    vsize + 2 * CDEF_VBORDER, CDEF_HBORDER, CDEF_VERY_LARGE);
+        }
+
+        if (cm->seq_params.use_highbitdepth) {
+          cdef_filter_fb(
+              NULL,
+              &CONVERT_TO_SHORTPTR(
+                  xd->plane[pli]
+                      .dst.buf)[xd->plane[pli].dst.stride *
+                                    (MI_SIZE_64X64 * fbr << mi_high_l2[pli]) +
+                                (fbc * MI_SIZE_64X64 << mi_wide_l2[pli])],
+              xd->plane[pli].dst.stride,
+              &src[CDEF_VBORDER * CDEF_BSTRIDE + CDEF_HBORDER], xdec[pli],
+              ydec[pli], dir, NULL, var, pli, dlist, cdef_count, level,
+              sec_strength, pri_damping, sec_damping, coeff_shift);
+        } else {
+          cdef_filter_fb(
+              &xd->plane[pli]
+                   .dst.buf[xd->plane[pli].dst.stride *
+                                (MI_SIZE_64X64 * fbr << mi_high_l2[pli]) +
+                            (fbc * MI_SIZE_64X64 << mi_wide_l2[pli])],
+              NULL, xd->plane[pli].dst.stride,
+              &src[CDEF_VBORDER * CDEF_BSTRIDE + CDEF_HBORDER], xdec[pli],
+              ydec[pli], dir, NULL, var, pli, dlist, cdef_count, level,
+              sec_strength, pri_damping, sec_damping, coeff_shift);
+        }
+      }
+      cdef_left = 1;
+    }
+    {
+      unsigned char *tmp = prev_row_cdef;
+      prev_row_cdef = curr_row_cdef;
+      curr_row_cdef = tmp;
+    }
+  }
+  aom_free(row_cdef);
+  for (int pli = 0; pli < num_planes; pli++) {
+    aom_free(linebuf[pli]);
+    aom_free(colbuf[pli]);
+  }
+}

diff --git a/libav1/av1/common/cdef.h b/libav1/av1/common/cdef.h
new file mode 100644
index 0000000..3b2eac8
--- /dev/null
+++ b/libav1/av1/common/cdef.h

@@ -0,0 +1,51 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AV1_COMMON_CDEF_H_
+#define AOM_AV1_COMMON_CDEF_H_
+
+#define CDEF_STRENGTH_BITS 6
+
+#define CDEF_PRI_STRENGTHS 16
+#define CDEF_SEC_STRENGTHS 4
+
+#include "config/aom_config.h"
+
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
+#include "av1/common/cdef_block.h"
+#include "av1/common/onyxc_int.h"
+
+static INLINE int sign(int i) { return i < 0 ? -1 : 1; }
+
+static INLINE int constrain(int diff, int threshold, int damping) {
+  if (!threshold) return 0;
+
+  const int shift = AOMMAX(0, damping - get_msb(threshold));
+  return sign(diff) *
+         AOMMIN(abs(diff), AOMMAX(0, threshold - (abs(diff) >> shift)));
+}
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+int sb_all_skip(const AV1_COMMON *const cm, int mi_row, int mi_col);
+int sb_compute_cdef_list(const AV1_COMMON *const cm, int mi_row, int mi_col,
+                         cdef_list *dlist, BLOCK_SIZE bsize);
+void av1_cdef_frame(YV12_BUFFER_CONFIG *frame, AV1_COMMON *cm, MACROBLOCKD *xd);
+
+void av1_cdef_search(YV12_BUFFER_CONFIG *frame, const YV12_BUFFER_CONFIG *ref,
+                     AV1_COMMON *cm, MACROBLOCKD *xd, int fast);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+#endif  // AOM_AV1_COMMON_CDEF_H_

diff --git a/libav1/av1/common/cdef_block.c b/libav1/av1/common/cdef_block.c
new file mode 100644
index 0000000..845df37
--- /dev/null
+++ b/libav1/av1/common/cdef_block.c

@@ -0,0 +1,255 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <math.h>
+#include <stdlib.h>
+
+#include "config/aom_dsp_rtcd.h"
+#include "config/av1_rtcd.h"
+
+#include "av1/common/cdef.h"
+
+/* Generated from gen_filter_tables.c. */
+DECLARE_ALIGNED(16, const int, cdef_directions[8][2]) = {
+  { -1 * CDEF_BSTRIDE + 1, -2 * CDEF_BSTRIDE + 2 },
+  { 0 * CDEF_BSTRIDE + 1, -1 * CDEF_BSTRIDE + 2 },
+  { 0 * CDEF_BSTRIDE + 1, 0 * CDEF_BSTRIDE + 2 },
+  { 0 * CDEF_BSTRIDE + 1, 1 * CDEF_BSTRIDE + 2 },
+  { 1 * CDEF_BSTRIDE + 1, 2 * CDEF_BSTRIDE + 2 },
+  { 1 * CDEF_BSTRIDE + 0, 2 * CDEF_BSTRIDE + 1 },
+  { 1 * CDEF_BSTRIDE + 0, 2 * CDEF_BSTRIDE + 0 },
+  { 1 * CDEF_BSTRIDE + 0, 2 * CDEF_BSTRIDE - 1 }
+};
+
+/* Detect direction. 0 means 45-degree up-right, 2 is horizontal, and so on.
+   The search minimizes the weighted variance along all the lines in a
+   particular direction, i.e. the squared error between the input and a
+   "predicted" block where each pixel is replaced by the average along a line
+   in a particular direction. Since each direction have the same sum(x^2) term,
+   that term is never computed. See Section 2, step 2, of:
+   http://jmvalin.ca/notes/intra_paint.pdf */
+int cdef_find_dir_c(const uint16_t *img, int stride, int32_t *var,
+                    int coeff_shift) {
+  int i;
+  int32_t cost[8] = { 0 };
+  int partial[8][15] = { { 0 } };
+  int32_t best_cost = 0;
+  int best_dir = 0;
+  /* Instead of dividing by n between 2 and 8, we multiply by 3*5*7*8/n.
+     The output is then 840 times larger, but we don't care for finding
+     the max. */
+  static const int div_table[] = { 0, 840, 420, 280, 210, 168, 140, 120, 105 };
+  for (i = 0; i < 8; i++) {
+    int j;
+    for (j = 0; j < 8; j++) {
+      int x;
+      /* We subtract 128 here to reduce the maximum range of the squared
+         partial sums. */
+      x = (img[i * stride + j] >> coeff_shift) - 128;
+      partial[0][i + j] += x;
+      partial[1][i + j / 2] += x;
+      partial[2][i] += x;
+      partial[3][3 + i - j / 2] += x;
+      partial[4][7 + i - j] += x;
+      partial[5][3 - i / 2 + j] += x;
+      partial[6][j] += x;
+      partial[7][i / 2 + j] += x;
+    }
+  }
+  for (i = 0; i < 8; i++) {
+    cost[2] += partial[2][i] * partial[2][i];
+    cost[6] += partial[6][i] * partial[6][i];
+  }
+  cost[2] *= div_table[8];
+  cost[6] *= div_table[8];
+  for (i = 0; i < 7; i++) {
+    cost[0] += (partial[0][i] * partial[0][i] +
+                partial[0][14 - i] * partial[0][14 - i]) *
+               div_table[i + 1];
+    cost[4] += (partial[4][i] * partial[4][i] +
+                partial[4][14 - i] * partial[4][14 - i]) *
+               div_table[i + 1];
+  }
+  cost[0] += partial[0][7] * partial[0][7] * div_table[8];
+  cost[4] += partial[4][7] * partial[4][7] * div_table[8];
+  for (i = 1; i < 8; i += 2) {
+    int j;
+    for (j = 0; j < 4 + 1; j++) {
+      cost[i] += partial[i][3 + j] * partial[i][3 + j];
+    }
+    cost[i] *= div_table[8];
+    for (j = 0; j < 4 - 1; j++) {
+      cost[i] += (partial[i][j] * partial[i][j] +
+                  partial[i][10 - j] * partial[i][10 - j]) *
+                 div_table[2 * j + 2];
+    }
+  }
+  for (i = 0; i < 8; i++) {
+    if (cost[i] > best_cost) {
+      best_cost = cost[i];
+      best_dir = i;
+    }
+  }
+  /* Difference between the optimal variance and the variance along the
+     orthogonal direction. Again, the sum(x^2) terms cancel out. */
+  *var = best_cost - cost[(best_dir + 4) & 7];
+  /* We'd normally divide by 840, but dividing by 1024 is close enough
+     for what we're going to do with this. */
+  *var >>= 10;
+  return best_dir;
+}
+
+const int cdef_pri_taps[2][2] = { { 4, 2 }, { 3, 3 } };
+const int cdef_sec_taps[2][2] = { { 2, 1 }, { 2, 1 } };
+
+/* Smooth in the direction detected. */
+void cdef_filter_block_c(uint8_t *dst8, uint16_t *dst16, int dstride,
+                         const uint16_t *in, int pri_strength, int sec_strength,
+                         int dir, int pri_damping, int sec_damping, int bsize,
+                         int coeff_shift) {
+  int i, j, k;
+  const int s = CDEF_BSTRIDE;
+  const int *pri_taps = cdef_pri_taps[(pri_strength >> coeff_shift) & 1];
+  const int *sec_taps = cdef_sec_taps[(pri_strength >> coeff_shift) & 1];
+  for (i = 0; i < 4 << (bsize == BLOCK_8X8 || bsize == BLOCK_4X8); i++) {
+    for (j = 0; j < 4 << (bsize == BLOCK_8X8 || bsize == BLOCK_8X4); j++) {
+      int16_t sum = 0;
+      int16_t y;
+      int16_t x = in[i * s + j];
+      int max = x;
+      int min = x;
+      for (k = 0; k < 2; k++) {
+        int16_t p0 = in[i * s + j + cdef_directions[dir][k]];
+        int16_t p1 = in[i * s + j - cdef_directions[dir][k]];
+        sum += pri_taps[k] * constrain(p0 - x, pri_strength, pri_damping);
+        sum += pri_taps[k] * constrain(p1 - x, pri_strength, pri_damping);
+        if (p0 != CDEF_VERY_LARGE) max = AOMMAX(p0, max);
+        if (p1 != CDEF_VERY_LARGE) max = AOMMAX(p1, max);
+        min = AOMMIN(p0, min);
+        min = AOMMIN(p1, min);
+        int16_t s0 = in[i * s + j + cdef_directions[(dir + 2) & 7][k]];
+        int16_t s1 = in[i * s + j - cdef_directions[(dir + 2) & 7][k]];
+        int16_t s2 = in[i * s + j + cdef_directions[(dir + 6) & 7][k]];
+        int16_t s3 = in[i * s + j - cdef_directions[(dir + 6) & 7][k]];
+        if (s0 != CDEF_VERY_LARGE) max = AOMMAX(s0, max);
+        if (s1 != CDEF_VERY_LARGE) max = AOMMAX(s1, max);
+        if (s2 != CDEF_VERY_LARGE) max = AOMMAX(s2, max);
+        if (s3 != CDEF_VERY_LARGE) max = AOMMAX(s3, max);
+        min = AOMMIN(s0, min);
+        min = AOMMIN(s1, min);
+        min = AOMMIN(s2, min);
+        min = AOMMIN(s3, min);
+        sum += sec_taps[k] * constrain(s0 - x, sec_strength, sec_damping);
+        sum += sec_taps[k] * constrain(s1 - x, sec_strength, sec_damping);
+        sum += sec_taps[k] * constrain(s2 - x, sec_strength, sec_damping);
+        sum += sec_taps[k] * constrain(s3 - x, sec_strength, sec_damping);
+      }
+      y = clamp((int16_t)x + ((8 + sum - (sum < 0)) >> 4), min, max);
+      if (dst8)
+        dst8[i * dstride + j] = (uint8_t)y;
+      else
+        dst16[i * dstride + j] = (uint16_t)y;
+    }
+  }
+}
+
+/* Compute the primary filter strength for an 8x8 block based on the
+   directional variance difference. A high variance difference means
+   that we have a highly directional pattern (e.g. a high contrast
+   edge), so we can apply more deringing. A low variance means that we
+   either have a low contrast edge, or a non-directional texture, so
+   we want to be careful not to blur. */
+static INLINE int adjust_strength(int strength, int32_t var) {
+  const int i = var >> 6 ? AOMMIN(get_msb(var >> 6), 12) : 0;
+  /* We use the variance of 8x8 blocks to adjust the strength. */
+  return var ? (strength * (4 + i) + 8) >> 4 : 0;
+}
+
+void cdef_filter_fb(uint8_t *dst8, uint16_t *dst16, int dstride, uint16_t *in,
+                    int xdec, int ydec, int dir[CDEF_NBLOCKS][CDEF_NBLOCKS],
+                    int *dirinit, int var[CDEF_NBLOCKS][CDEF_NBLOCKS], int pli,
+                    cdef_list *dlist, int cdef_count, int level,
+                    int sec_strength, int pri_damping, int sec_damping,
+                    int coeff_shift) {
+  int bi;
+  int bx;
+  int by;
+  int bsize, bsizex, bsizey;
+
+  int pri_strength = level << coeff_shift;
+  sec_strength <<= coeff_shift;
+  sec_damping += coeff_shift - (pli != AOM_PLANE_Y);
+  pri_damping += coeff_shift - (pli != AOM_PLANE_Y);
+  bsize =
+      ydec ? (xdec ? BLOCK_4X4 : BLOCK_8X4) : (xdec ? BLOCK_4X8 : BLOCK_8X8);
+  bsizex = 3 - xdec;
+  bsizey = 3 - ydec;
+  if (dirinit && pri_strength == 0 && sec_strength == 0) {
+    // If we're here, both primary and secondary strengths are 0, and
+    // we still haven't written anything to y[] yet, so we just copy
+    // the input to y[]. This is necessary only for av1_cdef_search()
+    // and only av1_cdef_search() sets dirinit.
+    for (bi = 0; bi < cdef_count; bi++) {
+      by = dlist[bi].by;
+      bx = dlist[bi].bx;
+      int iy, ix;
+      // TODO(stemidts/jmvalin): SIMD optimisations
+      for (iy = 0; iy < 1 << bsizey; iy++)
+        for (ix = 0; ix < 1 << bsizex; ix++)
+          dst16[(bi << (bsizex + bsizey)) + (iy << bsizex) + ix] =
+              in[((by << bsizey) + iy) * CDEF_BSTRIDE + (bx << bsizex) + ix];
+    }
+    return;
+  }
+
+  if (pli == 0) {
+    if (!dirinit || !*dirinit) {
+      for (bi = 0; bi < cdef_count; bi++) {
+        by = dlist[bi].by;
+        bx = dlist[bi].bx;
+        dir[by][bx] = cdef_find_dir(&in[8 * by * CDEF_BSTRIDE + 8 * bx],
+                                    CDEF_BSTRIDE, &var[by][bx], coeff_shift);
+      }
+      if (dirinit) *dirinit = 1;
+    }
+  }
+  if (pli == 1 && xdec != ydec) {
+    for (bi = 0; bi < cdef_count; bi++) {
+      static const int conv422[8] = { 7, 0, 2, 4, 5, 6, 6, 6 };
+      static const int conv440[8] = { 1, 2, 2, 2, 3, 4, 6, 0 };
+      by = dlist[bi].by;
+      bx = dlist[bi].bx;
+      dir[by][bx] = (xdec ? conv422 : conv440)[dir[by][bx]];
+    }
+  }
+
+  for (bi = 0; bi < cdef_count; bi++) {
+    int t = dlist[bi].skip ? 0 : pri_strength;
+    int s = dlist[bi].skip ? 0 : sec_strength;
+    by = dlist[bi].by;
+    bx = dlist[bi].bx;
+    if (dst8)
+      cdef_filter_block(
+          &dst8[(by << bsizey) * dstride + (bx << bsizex)], NULL, dstride,
+          &in[(by * CDEF_BSTRIDE << bsizey) + (bx << bsizex)],
+          (pli ? t : adjust_strength(t, var[by][bx])), s, t ? dir[by][bx] : 0,
+          pri_damping, sec_damping, bsize, coeff_shift);
+    else
+      cdef_filter_block(
+          NULL,
+          &dst16[dirinit ? bi << (bsizex + bsizey)
+                         : (by << bsizey) * dstride + (bx << bsizex)],
+          dirinit ? 1 << bsizex : dstride,
+          &in[(by * CDEF_BSTRIDE << bsizey) + (bx << bsizex)],
+          (pli ? t : adjust_strength(t, var[by][bx])), s, t ? dir[by][bx] : 0,
+          pri_damping, sec_damping, bsize, coeff_shift);
+  }
+}

diff --git a/libav1/av1/common/cdef_block.h b/libav1/av1/common/cdef_block.h
new file mode 100644
index 0000000..0e921e0
--- /dev/null
+++ b/libav1/av1/common/cdef_block.h

@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_CDEF_BLOCK_H_
+#define AOM_AV1_COMMON_CDEF_BLOCK_H_
+
+#include "av1/common/odintrin.h"
+
+#define CDEF_BLOCKSIZE 64
+#define CDEF_BLOCKSIZE_LOG2 6
+#define CDEF_NBLOCKS ((1 << MAX_SB_SIZE_LOG2) / 8)
+#define CDEF_SB_SHIFT (MAX_SB_SIZE_LOG2 - CDEF_BLOCKSIZE_LOG2)
+
+/* We need to buffer three vertical lines. */
+#define CDEF_VBORDER (3)
+/* We only need to buffer three horizontal pixels too, but let's align to
+   16 bytes (8 x 16 bits) to make vectorization easier. */
+#define CDEF_HBORDER (8)
+#define CDEF_BSTRIDE \
+  ALIGN_POWER_OF_TWO((1 << MAX_SB_SIZE_LOG2) + 2 * CDEF_HBORDER, 3)
+
+#define CDEF_VERY_LARGE (30000)
+#define CDEF_INBUF_SIZE \
+  (CDEF_BSTRIDE * ((1 << MAX_SB_SIZE_LOG2) + 2 * CDEF_VBORDER))
+
+extern const int cdef_pri_taps[2][2];
+extern const int cdef_sec_taps[2][2];
+DECLARE_ALIGNED(16, extern const int, cdef_directions[8][2]);
+
+typedef struct {
+  uint8_t by;
+  uint8_t bx;
+  uint8_t skip;
+} cdef_list;
+
+typedef void (*cdef_filter_block_func)(uint8_t *dst8, uint16_t *dst16,
+                                       int dstride, const uint16_t *in,
+                                       int pri_strength, int sec_strength,
+                                       int dir, int pri_damping,
+                                       int sec_damping, int bsize,
+                                       int coeff_shift);
+void copy_cdef_16bit_to_16bit(uint16_t *dst, int dstride, uint16_t *src,
+                              cdef_list *dlist, int cdef_count, int bsize);
+
+void cdef_filter_fb(uint8_t *dst8, uint16_t *dst16, int dstride, uint16_t *in,
+                    int xdec, int ydec, int dir[CDEF_NBLOCKS][CDEF_NBLOCKS],
+                    int *dirinit, int var[CDEF_NBLOCKS][CDEF_NBLOCKS], int pli,
+                    cdef_list *dlist, int cdef_count, int level,
+                    int sec_strength, int pri_damping, int sec_damping,
+                    int coeff_shift);
+#endif  // AOM_AV1_COMMON_CDEF_BLOCK_H_

diff --git a/libav1/av1/common/cfl.c b/libav1/av1/common/cfl.c
new file mode 100644
index 0000000..65e18e8
--- /dev/null
+++ b/libav1/av1/common/cfl.c

@@ -0,0 +1,408 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "av1/common/cfl.h"
+#include "av1/common/common_data.h"
+#include "av1/common/onyxc_int.h"
+
+#include "config/av1_rtcd.h"
+
+void cfl_init(CFL_CTX *cfl, const SequenceHeader *seq_params) {
+  assert(block_size_wide[CFL_MAX_BLOCK_SIZE] == CFL_BUF_LINE);
+  assert(block_size_high[CFL_MAX_BLOCK_SIZE] == CFL_BUF_LINE);
+
+  memset(&cfl->recon_buf_q3, 0, sizeof(cfl->recon_buf_q3));
+  memset(&cfl->ac_buf_q3, 0, sizeof(cfl->ac_buf_q3));
+  cfl->subsampling_x = seq_params->subsampling_x;
+  cfl->subsampling_y = seq_params->subsampling_y;
+  cfl->are_parameters_computed = 0;
+  cfl->store_y = 0;
+  // The DC_PRED cache is disabled by default and is only enabled in
+  // cfl_rd_pick_alpha
+  cfl->use_dc_pred_cache = 0;
+  cfl->dc_pred_is_cached[CFL_PRED_U] = 0;
+  cfl->dc_pred_is_cached[CFL_PRED_V] = 0;
+}
+
+void cfl_store_dc_pred(MACROBLOCKD *const xd, const uint8_t *input,
+                       CFL_PRED_TYPE pred_plane, int width) {
+  assert(pred_plane < CFL_PRED_PLANES);
+  assert(width <= CFL_BUF_LINE);
+
+  if (is_cur_buf_hbd(xd)) {
+    uint16_t *const input_16 = CONVERT_TO_SHORTPTR(input);
+    memcpy(xd->cfl.dc_pred_cache[pred_plane], input_16, width << 1);
+    return;
+  }
+
+  memcpy(xd->cfl.dc_pred_cache[pred_plane], input, width);
+}
+
+static void cfl_load_dc_pred_lbd(const int16_t *dc_pred_cache, uint8_t *dst,
+                                 int dst_stride, int width, int height) {
+  for (int j = 0; j < height; j++) {
+    memcpy(dst, dc_pred_cache, width);
+    dst += dst_stride;
+  }
+}
+
+static void cfl_load_dc_pred_hbd(const int16_t *dc_pred_cache, uint16_t *dst,
+                                 int dst_stride, int width, int height) {
+  const size_t num_bytes = width << 1;
+  for (int j = 0; j < height; j++) {
+    memcpy(dst, dc_pred_cache, num_bytes);
+    dst += dst_stride;
+  }
+}
+void cfl_load_dc_pred(MACROBLOCKD *const xd, uint8_t *dst, int dst_stride,
+                      TX_SIZE tx_size, CFL_PRED_TYPE pred_plane) {
+  const int width = tx_size_wide[tx_size];
+  const int height = tx_size_high[tx_size];
+  assert(pred_plane < CFL_PRED_PLANES);
+  assert(width <= CFL_BUF_LINE);
+  assert(height <= CFL_BUF_LINE);
+  if (is_cur_buf_hbd(xd)) {
+    uint16_t *dst_16 = CONVERT_TO_SHORTPTR(dst);
+    cfl_load_dc_pred_hbd(xd->cfl.dc_pred_cache[pred_plane], dst_16, dst_stride,
+                         width, height);
+    return;
+  }
+  cfl_load_dc_pred_lbd(xd->cfl.dc_pred_cache[pred_plane], dst, dst_stride,
+                       width, height);
+}
+
+// Due to frame boundary issues, it is possible that the total area covered by
+// chroma exceeds that of luma. When this happens, we fill the missing pixels by
+// repeating the last columns and/or rows.
+static INLINE void cfl_pad(CFL_CTX *cfl, int width, int height) {
+  const int diff_width = width - cfl->buf_width;
+  const int diff_height = height - cfl->buf_height;
+
+  if (diff_width > 0) {
+    const int min_height = height - diff_height;
+    uint16_t *recon_buf_q3 = cfl->recon_buf_q3 + (width - diff_width);
+    for (int j = 0; j < min_height; j++) {
+      const uint16_t last_pixel = recon_buf_q3[-1];
+      assert(recon_buf_q3 + diff_width <= cfl->recon_buf_q3 + CFL_BUF_SQUARE);
+      for (int i = 0; i < diff_width; i++) {
+        recon_buf_q3[i] = last_pixel;
+      }
+      recon_buf_q3 += CFL_BUF_LINE;
+    }
+    cfl->buf_width = width;
+  }
+  if (diff_height > 0) {
+    uint16_t *recon_buf_q3 =
+        cfl->recon_buf_q3 + ((height - diff_height) * CFL_BUF_LINE);
+    for (int j = 0; j < diff_height; j++) {
+      const uint16_t *last_row_q3 = recon_buf_q3 - CFL_BUF_LINE;
+      assert(recon_buf_q3 + width <= cfl->recon_buf_q3 + CFL_BUF_SQUARE);
+      for (int i = 0; i < width; i++) {
+        recon_buf_q3[i] = last_row_q3[i];
+      }
+      recon_buf_q3 += CFL_BUF_LINE;
+    }
+    cfl->buf_height = height;
+  }
+}
+
+static void subtract_average_c(const uint16_t *src, int16_t *dst, int width,
+                               int height, int round_offset, int num_pel_log2) {
+  int sum = round_offset;
+  const uint16_t *recon = src;
+  for (int j = 0; j < height; j++) {
+    for (int i = 0; i < width; i++) {
+      sum += recon[i];
+    }
+    recon += CFL_BUF_LINE;
+  }
+  const int avg = sum >> num_pel_log2;
+  for (int j = 0; j < height; j++) {
+    for (int i = 0; i < width; i++) {
+      dst[i] = src[i] - avg;
+    }
+    src += CFL_BUF_LINE;
+    dst += CFL_BUF_LINE;
+  }
+}
+
+CFL_SUB_AVG_FN(c)
+
+static INLINE int cfl_idx_to_alpha(int alpha_idx, int joint_sign,
+                                   CFL_PRED_TYPE pred_type) {
+  const int alpha_sign = (pred_type == CFL_PRED_U) ? CFL_SIGN_U(joint_sign)
+                                                   : CFL_SIGN_V(joint_sign);
+  if (alpha_sign == CFL_SIGN_ZERO) return 0;
+  const int abs_alpha_q3 =
+      (pred_type == CFL_PRED_U) ? CFL_IDX_U(alpha_idx) : CFL_IDX_V(alpha_idx);
+  return (alpha_sign == CFL_SIGN_POS) ? abs_alpha_q3 + 1 : -abs_alpha_q3 - 1;
+}
+
+static INLINE void cfl_predict_lbd_c(const int16_t *ac_buf_q3, uint8_t *dst,
+                                     int dst_stride, int alpha_q3, int width,
+                                     int height) {
+  for (int j = 0; j < height; j++) {
+    for (int i = 0; i < width; i++) {
+      dst[i] = clip_pixel(get_scaled_luma_q0(alpha_q3, ac_buf_q3[i]) + dst[i]);
+    }
+    dst += dst_stride;
+    ac_buf_q3 += CFL_BUF_LINE;
+  }
+}
+
+CFL_PREDICT_FN(c, lbd)
+
+void cfl_predict_hbd_c(const int16_t *ac_buf_q3, uint16_t *dst, int dst_stride,
+                       int alpha_q3, int bit_depth, int width, int height) {
+  for (int j = 0; j < height; j++) {
+    for (int i = 0; i < width; i++) {
+      dst[i] = clip_pixel_highbd(
+          get_scaled_luma_q0(alpha_q3, ac_buf_q3[i]) + dst[i], bit_depth);
+    }
+    dst += dst_stride;
+    ac_buf_q3 += CFL_BUF_LINE;
+  }
+}
+
+CFL_PREDICT_FN(c, hbd)
+
+static void cfl_compute_parameters(MACROBLOCKD *const xd, TX_SIZE tx_size) {
+  CFL_CTX *const cfl = &xd->cfl;
+  // Do not call cfl_compute_parameters multiple time on the same values.
+  assert(cfl->are_parameters_computed == 0);
+
+  cfl_pad(cfl, tx_size_wide[tx_size], tx_size_high[tx_size]);
+  get_subtract_average_fn(tx_size)(cfl->recon_buf_q3, cfl->ac_buf_q3);
+  cfl->are_parameters_computed = 1;
+}
+
+void cfl_predict_block(MACROBLOCKD *const xd, uint8_t *dst, int dst_stride,
+                       TX_SIZE tx_size, int plane) {
+  CFL_CTX *const cfl = &xd->cfl;
+  MB_MODE_INFO *mbmi = xd->mi[0];
+  assert(is_cfl_allowed(xd));
+
+  if (!cfl->are_parameters_computed) cfl_compute_parameters(xd, tx_size);
+
+  const int alpha_q3 =
+      cfl_idx_to_alpha(mbmi->cfl_alpha_idx, mbmi->cfl_alpha_signs, plane - 1);
+  assert((tx_size_high[tx_size] - 1) * CFL_BUF_LINE + tx_size_wide[tx_size] <=
+         CFL_BUF_SQUARE);
+  if (is_cur_buf_hbd(xd)) {
+    uint16_t *dst_16 = CONVERT_TO_SHORTPTR(dst);
+    get_predict_hbd_fn(tx_size)(cfl->ac_buf_q3, dst_16, dst_stride, alpha_q3,
+                                xd->bd);
+    return;
+  }
+  get_predict_lbd_fn(tx_size)(cfl->ac_buf_q3, dst, dst_stride, alpha_q3);
+}
+
+static void cfl_luma_subsampling_420_lbd_c(const uint8_t *input,
+                                           int input_stride,
+                                           uint16_t *output_q3, int width,
+                                           int height) {
+  for (int j = 0; j < height; j += 2) {
+    for (int i = 0; i < width; i += 2) {
+      const int bot = i + input_stride;
+      output_q3[i >> 1] =
+          (input[i] + input[i + 1] + input[bot] + input[bot + 1]) << 1;
+    }
+    input += input_stride << 1;
+    output_q3 += CFL_BUF_LINE;
+  }
+}
+
+static void cfl_luma_subsampling_422_lbd_c(const uint8_t *input,
+                                           int input_stride,
+                                           uint16_t *output_q3, int width,
+                                           int height) {
+  assert((height - 1) * CFL_BUF_LINE + width <= CFL_BUF_SQUARE);
+  for (int j = 0; j < height; j++) {
+    for (int i = 0; i < width; i += 2) {
+      output_q3[i >> 1] = (input[i] + input[i + 1]) << 2;
+    }
+    input += input_stride;
+    output_q3 += CFL_BUF_LINE;
+  }
+}
+
+static void cfl_luma_subsampling_444_lbd_c(const uint8_t *input,
+                                           int input_stride,
+                                           uint16_t *output_q3, int width,
+                                           int height) {
+  assert((height - 1) * CFL_BUF_LINE + width <= CFL_BUF_SQUARE);
+  for (int j = 0; j < height; j++) {
+    for (int i = 0; i < width; i++) {
+      output_q3[i] = input[i] << 3;
+    }
+    input += input_stride;
+    output_q3 += CFL_BUF_LINE;
+  }
+}
+
+static void cfl_luma_subsampling_420_hbd_c(const uint16_t *input,
+                                           int input_stride,
+                                           uint16_t *output_q3, int width,
+                                           int height) {
+  for (int j = 0; j < height; j += 2) {
+    for (int i = 0; i < width; i += 2) {
+      const int bot = i + input_stride;
+      output_q3[i >> 1] =
+          (input[i] + input[i + 1] + input[bot] + input[bot + 1]) << 1;
+    }
+    input += input_stride << 1;
+    output_q3 += CFL_BUF_LINE;
+  }
+}
+
+static void cfl_luma_subsampling_422_hbd_c(const uint16_t *input,
+                                           int input_stride,
+                                           uint16_t *output_q3, int width,
+                                           int height) {
+  assert((height - 1) * CFL_BUF_LINE + width <= CFL_BUF_SQUARE);
+  for (int j = 0; j < height; j++) {
+    for (int i = 0; i < width; i += 2) {
+      output_q3[i >> 1] = (input[i] + input[i + 1]) << 2;
+    }
+    input += input_stride;
+    output_q3 += CFL_BUF_LINE;
+  }
+}
+
+static void cfl_luma_subsampling_444_hbd_c(const uint16_t *input,
+                                           int input_stride,
+                                           uint16_t *output_q3, int width,
+                                           int height) {
+  assert((height - 1) * CFL_BUF_LINE + width <= CFL_BUF_SQUARE);
+  for (int j = 0; j < height; j++) {
+    for (int i = 0; i < width; i++) {
+      output_q3[i] = input[i] << 3;
+    }
+    input += input_stride;
+    output_q3 += CFL_BUF_LINE;
+  }
+}
+
+CFL_GET_SUBSAMPLE_FUNCTION(c)
+
+static INLINE cfl_subsample_hbd_fn cfl_subsampling_hbd(TX_SIZE tx_size,
+                                                       int sub_x, int sub_y) {
+  if (sub_x == 1) {
+    if (sub_y == 1) {
+      return cfl_get_luma_subsampling_420_hbd(tx_size);
+    }
+    return cfl_get_luma_subsampling_422_hbd(tx_size);
+  }
+  return cfl_get_luma_subsampling_444_hbd(tx_size);
+}
+
+static INLINE cfl_subsample_lbd_fn cfl_subsampling_lbd(TX_SIZE tx_size,
+                                                       int sub_x, int sub_y) {
+  if (sub_x == 1) {
+    if (sub_y == 1) {
+      return cfl_get_luma_subsampling_420_lbd(tx_size);
+    }
+    return cfl_get_luma_subsampling_422_lbd(tx_size);
+  }
+  return cfl_get_luma_subsampling_444_lbd(tx_size);
+}
+
+static void cfl_store(CFL_CTX *cfl, const uint8_t *input, int input_stride,
+                      int row, int col, TX_SIZE tx_size, int use_hbd) {
+  const int width = tx_size_wide[tx_size];
+  const int height = tx_size_high[tx_size];
+  const int tx_off_log2 = tx_size_wide_log2[0];
+  const int sub_x = cfl->subsampling_x;
+  const int sub_y = cfl->subsampling_y;
+  const int store_row = row << (tx_off_log2 - sub_y);
+  const int store_col = col << (tx_off_log2 - sub_x);
+  const int store_height = height >> sub_y;
+  const int store_width = width >> sub_x;
+
+  // Invalidate current parameters
+  cfl->are_parameters_computed = 0;
+
+  // Store the surface of the pixel buffer that was written to, this way we
+  // can manage chroma overrun (e.g. when the chroma surfaces goes beyond the
+  // frame boundary)
+  if (col == 0 && row == 0) {
+    cfl->buf_width = store_width;
+    cfl->buf_height = store_height;
+  } else {
+    cfl->buf_width = OD_MAXI(store_col + store_width, cfl->buf_width);
+    cfl->buf_height = OD_MAXI(store_row + store_height, cfl->buf_height);
+  }
+
+  // Check that we will remain inside the pixel buffer.
+  assert(store_row + store_height <= CFL_BUF_LINE);
+  assert(store_col + store_width <= CFL_BUF_LINE);
+
+  // Store the input into the CfL pixel buffer
+  uint16_t *recon_buf_q3 =
+      cfl->recon_buf_q3 + (store_row * CFL_BUF_LINE + store_col);
+
+  if (use_hbd) {
+    cfl_subsampling_hbd(tx_size, sub_x, sub_y)(CONVERT_TO_SHORTPTR(input),
+                                               input_stride, recon_buf_q3);
+  } else {
+    cfl_subsampling_lbd(tx_size, sub_x, sub_y)(input, input_stride,
+                                               recon_buf_q3);
+  }
+}
+
+// Adjust the row and column of blocks smaller than 8X8, as chroma-referenced
+// and non-chroma-referenced blocks are stored together in the CfL buffer.
+static INLINE void sub8x8_adjust_offset(const CFL_CTX *cfl, int *row_out,
+                                        int *col_out) {
+  // Increment row index for bottom: 8x4, 16x4 or both bottom 4x4s.
+  if ((cfl->mi_row & 0x01) && cfl->subsampling_y) {
+    assert(*row_out == 0);
+    (*row_out)++;
+  }
+
+  // Increment col index for right: 4x8, 4x16 or both right 4x4s.
+  if ((cfl->mi_col & 0x01) && cfl->subsampling_x) {
+    assert(*col_out == 0);
+    (*col_out)++;
+  }
+}
+
+void cfl_store_tx(MACROBLOCKD *const xd, int row, int col, TX_SIZE tx_size,
+                  BLOCK_SIZE bsize) {
+  CFL_CTX *const cfl = &xd->cfl;
+  struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
+  uint8_t *dst =
+      &pd->dst.buf[(row * pd->dst.stride + col) << tx_size_wide_log2[0]];
+
+  if (block_size_high[bsize] == 4 || block_size_wide[bsize] == 4) {
+    // Only dimensions of size 4 can have an odd offset.
+    assert(!((col & 1) && tx_size_wide[tx_size] != 4));
+    assert(!((row & 1) && tx_size_high[tx_size] != 4));
+    sub8x8_adjust_offset(cfl, &row, &col);
+  }
+  cfl_store(cfl, dst, pd->dst.stride, row, col, tx_size, is_cur_buf_hbd(xd));
+}
+
+void cfl_store_block(MACROBLOCKD *const xd, BLOCK_SIZE bsize, TX_SIZE tx_size) {
+  CFL_CTX *const cfl = &xd->cfl;
+  struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
+  int row = 0;
+  int col = 0;
+
+  if (block_size_high[bsize] == 4 || block_size_wide[bsize] == 4) {
+    sub8x8_adjust_offset(cfl, &row, &col);
+  }
+  const int width = max_intra_block_width(xd, bsize, AOM_PLANE_Y, tx_size);
+  const int height = max_intra_block_height(xd, bsize, AOM_PLANE_Y, tx_size);
+  tx_size = get_tx_size(width, height);
+  cfl_store(cfl, pd->dst.buf, pd->dst.stride, row, col, tx_size,
+            is_cur_buf_hbd(xd));
+}

diff --git a/libav1/av1/common/cfl.h b/libav1/av1/common/cfl.h
new file mode 100644
index 0000000..3b91d85
--- /dev/null
+++ b/libav1/av1/common/cfl.h

@@ -0,0 +1,278 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_CFL_H_
+#define AOM_AV1_COMMON_CFL_H_
+
+#include "av1/common/blockd.h"
+#include "av1/common/onyxc_int.h"
+
+// Can we use CfL for the current block?
+static INLINE CFL_ALLOWED_TYPE is_cfl_allowed(const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *mbmi = xd->mi[0];
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  assert(bsize < BLOCK_SIZES_ALL);
+  if (xd->lossless[mbmi->segment_id]) {
+    // In lossless, CfL is available when the partition size is equal to the
+    // transform size.
+    const int ssx = xd->plane[AOM_PLANE_U].subsampling_x;
+    const int ssy = xd->plane[AOM_PLANE_U].subsampling_y;
+    const int plane_bsize = get_plane_block_size(bsize, ssx, ssy);
+    return (CFL_ALLOWED_TYPE)(plane_bsize == BLOCK_4X4);
+  }
+  // Spec: CfL is available to luma partitions lesser than or equal to 32x32
+  return (CFL_ALLOWED_TYPE)(block_size_wide[bsize] <= 32 &&
+                            block_size_high[bsize] <= 32);
+}
+
+// Do we need to save the luma pixels from the current block,
+// for a possible future CfL prediction?
+static INLINE CFL_ALLOWED_TYPE store_cfl_required(const AV1_COMMON *cm,
+                                                  const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *mbmi = xd->mi[0];
+
+  if (cm->seq_params.monochrome) return CFL_DISALLOWED;
+
+  if (!xd->cfl.is_chroma_reference) {
+    // For non-chroma-reference blocks, we should always store the luma pixels,
+    // in case the corresponding chroma-reference block uses CfL.
+    // Note that this can only happen for block sizes which are <8 on
+    // their shortest side, as otherwise they would be chroma reference
+    // blocks.
+    return CFL_ALLOWED;
+  }
+
+  // If this block has chroma information, we know whether we're
+  // actually going to perform a CfL prediction
+  return (CFL_ALLOWED_TYPE)(!is_inter_block(mbmi) &&
+                            mbmi->uv_mode == UV_CFL_PRED);
+}
+
+static INLINE int get_scaled_luma_q0(int alpha_q3, int16_t pred_buf_q3) {
+  int scaled_luma_q6 = alpha_q3 * pred_buf_q3;
+  return ROUND_POWER_OF_TWO_SIGNED(scaled_luma_q6, 6);
+}
+
+static INLINE CFL_PRED_TYPE get_cfl_pred_type(PLANE_TYPE plane) {
+  assert(plane > 0);
+  return (CFL_PRED_TYPE)(plane - 1);
+}
+
+void cfl_predict_block(MACROBLOCKD *const xd, uint8_t *dst, int dst_stride,
+                       TX_SIZE tx_size, int plane);
+
+void cfl_store_block(MACROBLOCKD *const xd, BLOCK_SIZE bsize, TX_SIZE tx_size);
+
+void cfl_store_tx(MACROBLOCKD *const xd, int row, int col, TX_SIZE tx_size,
+                  BLOCK_SIZE bsize);
+
+void cfl_store_dc_pred(MACROBLOCKD *const xd, const uint8_t *input,
+                       CFL_PRED_TYPE pred_plane, int width);
+
+void cfl_load_dc_pred(MACROBLOCKD *const xd, uint8_t *dst, int dst_stride,
+                      TX_SIZE tx_size, CFL_PRED_TYPE pred_plane);
+
+// Allows the CFL_SUBSAMPLE function to switch types depending on the bitdepth.
+#define CFL_lbd_TYPE uint8_t *cfl_type
+#define CFL_hbd_TYPE uint16_t *cfl_type
+
+// Declare a size-specific wrapper for the size-generic function. The compiler
+// will inline the size generic function in here, the advantage is that the size
+// will be constant allowing for loop unrolling and other constant propagated
+// goodness.
+#define CFL_SUBSAMPLE(arch, sub, bd, width, height)                       \
+  void subsample_##bd##_##sub##_##width##x##height##_##arch(              \
+      const CFL_##bd##_TYPE, int input_stride, uint16_t *output_q3) {     \
+    cfl_luma_subsampling_##sub##_##bd##_##arch(cfl_type, input_stride,    \
+                                               output_q3, width, height); \
+  }
+
+// Declare size-specific wrappers for all valid CfL sizes.
+#define CFL_SUBSAMPLE_FUNCTIONS(arch, sub, bd)                            \
+  CFL_SUBSAMPLE(arch, sub, bd, 4, 4)                                      \
+  CFL_SUBSAMPLE(arch, sub, bd, 8, 8)                                      \
+  CFL_SUBSAMPLE(arch, sub, bd, 16, 16)                                    \
+  CFL_SUBSAMPLE(arch, sub, bd, 32, 32)                                    \
+  CFL_SUBSAMPLE(arch, sub, bd, 4, 8)                                      \
+  CFL_SUBSAMPLE(arch, sub, bd, 8, 4)                                      \
+  CFL_SUBSAMPLE(arch, sub, bd, 8, 16)                                     \
+  CFL_SUBSAMPLE(arch, sub, bd, 16, 8)                                     \
+  CFL_SUBSAMPLE(arch, sub, bd, 16, 32)                                    \
+  CFL_SUBSAMPLE(arch, sub, bd, 32, 16)                                    \
+  CFL_SUBSAMPLE(arch, sub, bd, 4, 16)                                     \
+  CFL_SUBSAMPLE(arch, sub, bd, 16, 4)                                     \
+  CFL_SUBSAMPLE(arch, sub, bd, 8, 32)                                     \
+  CFL_SUBSAMPLE(arch, sub, bd, 32, 8)                                     \
+  cfl_subsample_##bd##_fn cfl_get_luma_subsampling_##sub##_##bd##_##arch( \
+      TX_SIZE tx_size) {                                                  \
+    CFL_SUBSAMPLE_FUNCTION_ARRAY(arch, sub, bd)                           \
+    return subfn_##sub[tx_size];                                          \
+  }
+
+// Declare an architecture-specific array of function pointers for size-specific
+// wrappers.
+#define CFL_SUBSAMPLE_FUNCTION_ARRAY(arch, sub, bd)                       \
+  static const cfl_subsample_##bd##_fn subfn_##sub[TX_SIZES_ALL] = {      \
+    subsample_##bd##_##sub##_4x4_##arch,   /* 4x4 */                      \
+    subsample_##bd##_##sub##_8x8_##arch,   /* 8x8 */                      \
+    subsample_##bd##_##sub##_16x16_##arch, /* 16x16 */                    \
+    subsample_##bd##_##sub##_32x32_##arch, /* 32x32 */                    \
+    NULL,                                  /* 64x64 (invalid CFL size) */ \
+    subsample_##bd##_##sub##_4x8_##arch,   /* 4x8 */                      \
+    subsample_##bd##_##sub##_8x4_##arch,   /* 8x4 */                      \
+    subsample_##bd##_##sub##_8x16_##arch,  /* 8x16 */                     \
+    subsample_##bd##_##sub##_16x8_##arch,  /* 16x8 */                     \
+    subsample_##bd##_##sub##_16x32_##arch, /* 16x32 */                    \
+    subsample_##bd##_##sub##_32x16_##arch, /* 32x16 */                    \
+    NULL,                                  /* 32x64 (invalid CFL size) */ \
+    NULL,                                  /* 64x32 (invalid CFL size) */ \
+    subsample_##bd##_##sub##_4x16_##arch,  /* 4x16  */                    \
+    subsample_##bd##_##sub##_16x4_##arch,  /* 16x4  */                    \
+    subsample_##bd##_##sub##_8x32_##arch,  /* 8x32  */                    \
+    subsample_##bd##_##sub##_32x8_##arch,  /* 32x8  */                    \
+    NULL,                                  /* 16x64 (invalid CFL size) */ \
+    NULL,                                  /* 64x16 (invalid CFL size) */ \
+  };
+
+// The RTCD script does not support passing in an array, so we wrap it in this
+// function.
+#define CFL_GET_SUBSAMPLE_FUNCTION(arch)  \
+  CFL_SUBSAMPLE_FUNCTIONS(arch, 420, lbd) \
+  CFL_SUBSAMPLE_FUNCTIONS(arch, 422, lbd) \
+  CFL_SUBSAMPLE_FUNCTIONS(arch, 444, lbd) \
+  CFL_SUBSAMPLE_FUNCTIONS(arch, 420, hbd) \
+  CFL_SUBSAMPLE_FUNCTIONS(arch, 422, hbd) \
+  CFL_SUBSAMPLE_FUNCTIONS(arch, 444, hbd)
+
+// Declare a size-specific wrapper for the size-generic function. The compiler
+// will inline the size generic function in here, the advantage is that the size
+// will be constant allowing for loop unrolling and other constant propagated
+// goodness.
+#define CFL_SUB_AVG_X(arch, width, height, round_offset, num_pel_log2)   \
+  void subtract_average_##width##x##height##_##arch(const uint16_t *src, \
+                                                    int16_t *dst) {      \
+    subtract_average_##arch(src, dst, width, height, round_offset,       \
+                            num_pel_log2);                               \
+  }
+
+// Declare size-specific wrappers for all valid CfL sizes.
+#define CFL_SUB_AVG_FN(arch)                                                \
+  CFL_SUB_AVG_X(arch, 4, 4, 8, 4)                                           \
+  CFL_SUB_AVG_X(arch, 4, 8, 16, 5)                                          \
+  CFL_SUB_AVG_X(arch, 4, 16, 32, 6)                                         \
+  CFL_SUB_AVG_X(arch, 8, 4, 16, 5)                                          \
+  CFL_SUB_AVG_X(arch, 8, 8, 32, 6)                                          \
+  CFL_SUB_AVG_X(arch, 8, 16, 64, 7)                                         \
+  CFL_SUB_AVG_X(arch, 8, 32, 128, 8)                                        \
+  CFL_SUB_AVG_X(arch, 16, 4, 32, 6)                                         \
+  CFL_SUB_AVG_X(arch, 16, 8, 64, 7)                                         \
+  CFL_SUB_AVG_X(arch, 16, 16, 128, 8)                                       \
+  CFL_SUB_AVG_X(arch, 16, 32, 256, 9)                                       \
+  CFL_SUB_AVG_X(arch, 32, 8, 128, 8)                                        \
+  CFL_SUB_AVG_X(arch, 32, 16, 256, 9)                                       \
+  CFL_SUB_AVG_X(arch, 32, 32, 512, 10)                                      \
+  cfl_subtract_average_fn get_subtract_average_fn_##arch(TX_SIZE tx_size) { \
+    static const cfl_subtract_average_fn sub_avg[TX_SIZES_ALL] = {          \
+      subtract_average_4x4_##arch,   /* 4x4 */                              \
+      subtract_average_8x8_##arch,   /* 8x8 */                              \
+      subtract_average_16x16_##arch, /* 16x16 */                            \
+      subtract_average_32x32_##arch, /* 32x32 */                            \
+      NULL,                          /* 64x64 (invalid CFL size) */         \
+      subtract_average_4x8_##arch,   /* 4x8 */                              \
+      subtract_average_8x4_##arch,   /* 8x4 */                              \
+      subtract_average_8x16_##arch,  /* 8x16 */                             \
+      subtract_average_16x8_##arch,  /* 16x8 */                             \
+      subtract_average_16x32_##arch, /* 16x32 */                            \
+      subtract_average_32x16_##arch, /* 32x16 */                            \
+      NULL,                          /* 32x64 (invalid CFL size) */         \
+      NULL,                          /* 64x32 (invalid CFL size) */         \
+      subtract_average_4x16_##arch,  /* 4x16 (invalid CFL size) */          \
+      subtract_average_16x4_##arch,  /* 16x4 (invalid CFL size) */          \
+      subtract_average_8x32_##arch,  /* 8x32 (invalid CFL size) */          \
+      subtract_average_32x8_##arch,  /* 32x8 (invalid CFL size) */          \
+      NULL,                          /* 16x64 (invalid CFL size) */         \
+      NULL,                          /* 64x16 (invalid CFL size) */         \
+    };                                                                      \
+    /* Modulo TX_SIZES_ALL to ensure that an attacker won't be able to */   \
+    /* index the function pointer array out of bounds. */                   \
+    return sub_avg[tx_size % TX_SIZES_ALL];                                 \
+  }
+
+// For VSX SIMD optimization, the C versions of width == 4 subtract are
+// faster than the VSX. As such, the VSX code calls the C versions.
+void subtract_average_4x4_c(const uint16_t *src, int16_t *dst);
+void subtract_average_4x8_c(const uint16_t *src, int16_t *dst);
+void subtract_average_4x16_c(const uint16_t *src, int16_t *dst);
+
+#define CFL_PREDICT_lbd(arch, width, height)                                 \
+  void predict_lbd_##width##x##height##_##arch(const int16_t *pred_buf_q3,   \
+                                               uint8_t *dst, int dst_stride, \
+                                               int alpha_q3) {               \
+    cfl_predict_lbd_##arch(pred_buf_q3, dst, dst_stride, alpha_q3, width,    \
+                           height);                                          \
+  }
+
+#define CFL_PREDICT_hbd(arch, width, height)                                  \
+  void predict_hbd_##width##x##height##_##arch(const int16_t *pred_buf_q3,    \
+                                               uint16_t *dst, int dst_stride, \
+                                               int alpha_q3, int bd) {        \
+    cfl_predict_hbd_##arch(pred_buf_q3, dst, dst_stride, alpha_q3, bd, width, \
+                           height);                                           \
+  }
+
+// This wrapper exists because clang format does not like calling macros with
+// lowercase letters.
+#define CFL_PREDICT_X(arch, width, height, bd) \
+  CFL_PREDICT_##bd(arch, width, height)
+
+#define CFL_PREDICT_FN(arch, bd)                                          \
+  CFL_PREDICT_X(arch, 4, 4, bd)                                           \
+  CFL_PREDICT_X(arch, 4, 8, bd)                                           \
+  CFL_PREDICT_X(arch, 4, 16, bd)                                          \
+  CFL_PREDICT_X(arch, 8, 4, bd)                                           \
+  CFL_PREDICT_X(arch, 8, 8, bd)                                           \
+  CFL_PREDICT_X(arch, 8, 16, bd)                                          \
+  CFL_PREDICT_X(arch, 8, 32, bd)                                          \
+  CFL_PREDICT_X(arch, 16, 4, bd)                                          \
+  CFL_PREDICT_X(arch, 16, 8, bd)                                          \
+  CFL_PREDICT_X(arch, 16, 16, bd)                                         \
+  CFL_PREDICT_X(arch, 16, 32, bd)                                         \
+  CFL_PREDICT_X(arch, 32, 8, bd)                                          \
+  CFL_PREDICT_X(arch, 32, 16, bd)                                         \
+  CFL_PREDICT_X(arch, 32, 32, bd)                                         \
+  cfl_predict_##bd##_fn get_predict_##bd##_fn_##arch(TX_SIZE tx_size) {   \
+    static const cfl_predict_##bd##_fn pred[TX_SIZES_ALL] = {             \
+      predict_##bd##_4x4_##arch,   /* 4x4 */                              \
+      predict_##bd##_8x8_##arch,   /* 8x8 */                              \
+      predict_##bd##_16x16_##arch, /* 16x16 */                            \
+      predict_##bd##_32x32_##arch, /* 32x32 */                            \
+      NULL,                        /* 64x64 (invalid CFL size) */         \
+      predict_##bd##_4x8_##arch,   /* 4x8 */                              \
+      predict_##bd##_8x4_##arch,   /* 8x4 */                              \
+      predict_##bd##_8x16_##arch,  /* 8x16 */                             \
+      predict_##bd##_16x8_##arch,  /* 16x8 */                             \
+      predict_##bd##_16x32_##arch, /* 16x32 */                            \
+      predict_##bd##_32x16_##arch, /* 32x16 */                            \
+      NULL,                        /* 32x64 (invalid CFL size) */         \
+      NULL,                        /* 64x32 (invalid CFL size) */         \
+      predict_##bd##_4x16_##arch,  /* 4x16  */                            \
+      predict_##bd##_16x4_##arch,  /* 16x4  */                            \
+      predict_##bd##_8x32_##arch,  /* 8x32  */                            \
+      predict_##bd##_32x8_##arch,  /* 32x8  */                            \
+      NULL,                        /* 16x64 (invalid CFL size) */         \
+      NULL,                        /* 64x16 (invalid CFL size) */         \
+    };                                                                    \
+    /* Modulo TX_SIZES_ALL to ensure that an attacker won't be able to */ \
+    /* index the function pointer array out of bounds. */                 \
+    return pred[tx_size % TX_SIZES_ALL];                                  \
+  }
+
+#endif  // AOM_AV1_COMMON_CFL_H_

diff --git a/libav1/av1/common/common.h b/libav1/av1/common/common.h
new file mode 100644
index 0000000..bed6083
--- /dev/null
+++ b/libav1/av1/common/common.h

@@ -0,0 +1,63 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_COMMON_H_
+#define AOM_AV1_COMMON_COMMON_H_
+
+/* Interface header for common constant data structures and lookup tables */
+
+#include <assert.h>
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_mem/aom_mem.h"
+#include "aom/aom_integer.h"
+#include "aom_ports/bitops.h"
+#include "config/aom_config.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define PI 3.141592653589793238462643383279502884
+
+// Only need this for fixed-size arrays, for structs just assign.
+#define av1_copy(dest, src)              \
+  {                                      \
+    assert(sizeof(dest) == sizeof(src)); \
+    memcpy(dest, src, sizeof(src));      \
+  }
+
+// Use this for variably-sized arrays.
+#define av1_copy_array(dest, src, n)           \
+  {                                            \
+    assert(sizeof(*(dest)) == sizeof(*(src))); \
+    memcpy(dest, src, n * sizeof(*(src)));     \
+  }
+
+#define av1_zero(dest) memset(&(dest), 0, sizeof(dest))
+#define av1_zero_array(dest, n) memset(dest, 0, n * sizeof(*(dest)))
+
+static INLINE int get_unsigned_bits(unsigned int num_values) {
+  return num_values > 0 ? get_msb(num_values) + 1 : 0;
+}
+
+#define CHECK_MEM_ERROR(cm, lval, expr) \
+  AOM_CHECK_MEM_ERROR(&cm->error, lval, expr)
+
+#define AOM_FRAME_MARKER 0x2
+
+#define AV1_MIN_TILE_SIZE_BYTES 1
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_COMMON_H_

diff --git a/libav1/av1/common/common_data.h b/libav1/av1/common/common_data.h
new file mode 100644
index 0000000..46e455f
--- /dev/null
+++ b/libav1/av1/common/common_data.h

@@ -0,0 +1,446 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_COMMON_DATA_H_
+#define AOM_AV1_COMMON_COMMON_DATA_H_
+
+#include "av1/common/enums.h"
+#include "aom/aom_integer.h"
+#include "aom_dsp/aom_dsp_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Log 2 conversion lookup tables in units of mode info (4x4).
+// The Mi_Width_Log2 table in the spec (Section 9.3. Conversion tables).
+static const uint8_t mi_size_wide_log2[BLOCK_SIZES_ALL] = {
+  0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 0, 2, 1, 3, 2, 4
+};
+// The Mi_Height_Log2 table in the spec (Section 9.3. Conversion tables).
+static const uint8_t mi_size_high_log2[BLOCK_SIZES_ALL] = {
+  0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 2, 0, 3, 1, 4, 2
+};
+
+// Width/height lookup tables in units of mode info (4x4).
+// The Num_4x4_Blocks_Wide table in the spec (Section 9.3. Conversion tables).
+static const uint8_t mi_size_wide[BLOCK_SIZES_ALL] = {
+  1, 1, 2, 2, 2, 4, 4, 4, 8, 8, 8, 16, 16, 16, 32, 32, 1, 4, 2, 8, 4, 16
+};
+
+// The Num_4x4_Blocks_High table in the spec (Section 9.3. Conversion tables).
+static const uint8_t mi_size_high[BLOCK_SIZES_ALL] = {
+  1, 2, 1, 2, 4, 2, 4, 8, 4, 8, 16, 8, 16, 32, 16, 32, 4, 1, 8, 2, 16, 4
+};
+
+// Width/height lookup tables in units of samples.
+// The Block_Width table in the spec (Section 9.3. Conversion tables).
+static const uint8_t block_size_wide[BLOCK_SIZES_ALL] = {
+  4,  4,  8,  8,   8,   16, 16, 16, 32, 32, 32,
+  64, 64, 64, 128, 128, 4,  16, 8,  32, 16, 64
+};
+
+// The Block_Height table in the spec (Section 9.3. Conversion tables).
+static const uint8_t block_size_high[BLOCK_SIZES_ALL] = {
+  4,  8,  4,   8,  16,  8,  16, 32, 16, 32, 64,
+  32, 64, 128, 64, 128, 16, 4,  32, 8,  64, 16
+};
+
+// Maps a block size to a context.
+// The Size_Group table in the spec (Section 9.3. Conversion tables).
+// AOMMIN(3, AOMMIN(mi_size_wide_log2(bsize), mi_size_high_log2(bsize)))
+static const uint8_t size_group_lookup[BLOCK_SIZES_ALL] = {
+  0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 0, 0, 1, 1, 2, 2
+};
+
+static const uint8_t num_pels_log2_lookup[BLOCK_SIZES_ALL] = {
+  4, 5, 5, 6, 7, 7, 8, 9, 9, 10, 11, 11, 12, 13, 13, 14, 6, 6, 8, 8, 10, 10
+};
+
+// A compressed version of the Partition_Subsize table in the spec (9.3.
+// Conversion tables), for square block sizes only.
+/* clang-format off */
+static const BLOCK_SIZE subsize_lookup[EXT_PARTITION_TYPES][SQR_BLOCK_SIZES] = {
+  {     // PARTITION_NONE
+    BLOCK_4X4, BLOCK_8X8, BLOCK_16X16,
+    BLOCK_32X32, BLOCK_64X64, BLOCK_128X128
+  }, {  // PARTITION_HORZ
+    BLOCK_INVALID, BLOCK_8X4, BLOCK_16X8,
+    BLOCK_32X16, BLOCK_64X32, BLOCK_128X64
+  }, {  // PARTITION_VERT
+    BLOCK_INVALID, BLOCK_4X8, BLOCK_8X16,
+    BLOCK_16X32, BLOCK_32X64, BLOCK_64X128
+  }, {  // PARTITION_SPLIT
+    BLOCK_INVALID, BLOCK_4X4, BLOCK_8X8,
+    BLOCK_16X16, BLOCK_32X32, BLOCK_64X64
+  }, {  // PARTITION_HORZ_A
+    BLOCK_INVALID, BLOCK_8X4, BLOCK_16X8,
+    BLOCK_32X16, BLOCK_64X32, BLOCK_128X64
+  }, {  // PARTITION_HORZ_B
+    BLOCK_INVALID, BLOCK_8X4, BLOCK_16X8,
+    BLOCK_32X16, BLOCK_64X32, BLOCK_128X64
+  }, {  // PARTITION_VERT_A
+    BLOCK_INVALID, BLOCK_4X8, BLOCK_8X16,
+    BLOCK_16X32, BLOCK_32X64, BLOCK_64X128
+  }, {  // PARTITION_VERT_B
+    BLOCK_INVALID, BLOCK_4X8, BLOCK_8X16,
+    BLOCK_16X32, BLOCK_32X64, BLOCK_64X128
+  }, {  // PARTITION_HORZ_4
+    BLOCK_INVALID, BLOCK_INVALID, BLOCK_16X4,
+    BLOCK_32X8, BLOCK_64X16, BLOCK_INVALID
+  }, {  // PARTITION_VERT_4
+    BLOCK_INVALID, BLOCK_INVALID, BLOCK_4X16,
+    BLOCK_8X32, BLOCK_16X64, BLOCK_INVALID
+  }
+};
+
+static const TX_SIZE max_txsize_lookup[BLOCK_SIZES_ALL] = {
+  //                   4X4
+                       TX_4X4,
+  // 4X8,    8X4,      8X8
+  TX_4X4,    TX_4X4,   TX_8X8,
+  // 8X16,   16X8,     16X16
+  TX_8X8,    TX_8X8,   TX_16X16,
+  // 16X32,  32X16,    32X32
+  TX_16X16,  TX_16X16, TX_32X32,
+  // 32X64,  64X32,
+  TX_32X32,  TX_32X32,
+  // 64X64
+  TX_64X64,
+  // 64x128, 128x64,   128x128
+  TX_64X64,  TX_64X64, TX_64X64,
+  // 4x16,   16x4,     8x32
+  TX_4X4,    TX_4X4,   TX_8X8,
+  // 32x8,   16x64     64x16
+  TX_8X8,    TX_16X16, TX_16X16
+};
+
+static const TX_SIZE max_txsize_rect_lookup[BLOCK_SIZES_ALL] = {
+      // 4X4
+      TX_4X4,
+      // 4X8,    8X4,      8X8
+      TX_4X8,    TX_8X4,   TX_8X8,
+      // 8X16,   16X8,     16X16
+      TX_8X16,   TX_16X8,  TX_16X16,
+      // 16X32,  32X16,    32X32
+      TX_16X32,  TX_32X16, TX_32X32,
+      // 32X64,  64X32,
+      TX_32X64,  TX_64X32,
+      // 64X64
+      TX_64X64,
+      // 64x128, 128x64,   128x128
+      TX_64X64,  TX_64X64, TX_64X64,
+      // 4x16,   16x4,
+      TX_4X16,   TX_16X4,
+      // 8x32,   32x8
+      TX_8X32,   TX_32X8,
+      // 16x64,  64x16
+      TX_16X64,  TX_64X16
+};
+
+static const TX_TYPE_1D vtx_tab[TX_TYPES] = {
+  DCT_1D,      ADST_1D, DCT_1D,      ADST_1D,
+  FLIPADST_1D, DCT_1D,  FLIPADST_1D, ADST_1D, FLIPADST_1D, IDTX_1D,
+  DCT_1D,      IDTX_1D, ADST_1D,     IDTX_1D, FLIPADST_1D, IDTX_1D,
+};
+
+static const TX_TYPE_1D htx_tab[TX_TYPES] = {
+  DCT_1D,  DCT_1D,      ADST_1D,     ADST_1D,
+  DCT_1D,  FLIPADST_1D, FLIPADST_1D, FLIPADST_1D, ADST_1D, IDTX_1D,
+  IDTX_1D, DCT_1D,      IDTX_1D,     ADST_1D,     IDTX_1D, FLIPADST_1D,
+};
+
+#define TXSIZE_CAT_INVALID (-1)
+
+/* clang-format on */
+
+static const TX_SIZE sub_tx_size_map[TX_SIZES_ALL] = {
+  TX_4X4,    // TX_4X4
+  TX_4X4,    // TX_8X8
+  TX_8X8,    // TX_16X16
+  TX_16X16,  // TX_32X32
+  TX_32X32,  // TX_64X64
+  TX_4X4,    // TX_4X8
+  TX_4X4,    // TX_8X4
+  TX_8X8,    // TX_8X16
+  TX_8X8,    // TX_16X8
+  TX_16X16,  // TX_16X32
+  TX_16X16,  // TX_32X16
+  TX_32X32,  // TX_32X64
+  TX_32X32,  // TX_64X32
+  TX_4X8,    // TX_4X16
+  TX_8X4,    // TX_16X4
+  TX_8X16,   // TX_8X32
+  TX_16X8,   // TX_32X8
+  TX_16X32,  // TX_16X64
+  TX_32X16,  // TX_64X16
+};
+
+static const TX_SIZE txsize_horz_map[TX_SIZES_ALL] = {
+  TX_4X4,    // TX_4X4
+  TX_8X8,    // TX_8X8
+  TX_16X16,  // TX_16X16
+  TX_32X32,  // TX_32X32
+  TX_64X64,  // TX_64X64
+  TX_4X4,    // TX_4X8
+  TX_8X8,    // TX_8X4
+  TX_8X8,    // TX_8X16
+  TX_16X16,  // TX_16X8
+  TX_16X16,  // TX_16X32
+  TX_32X32,  // TX_32X16
+  TX_32X32,  // TX_32X64
+  TX_64X64,  // TX_64X32
+  TX_4X4,    // TX_4X16
+  TX_16X16,  // TX_16X4
+  TX_8X8,    // TX_8X32
+  TX_32X32,  // TX_32X8
+  TX_16X16,  // TX_16X64
+  TX_64X64,  // TX_64X16
+};
+
+static const TX_SIZE txsize_vert_map[TX_SIZES_ALL] = {
+  TX_4X4,    // TX_4X4
+  TX_8X8,    // TX_8X8
+  TX_16X16,  // TX_16X16
+  TX_32X32,  // TX_32X32
+  TX_64X64,  // TX_64X64
+  TX_8X8,    // TX_4X8
+  TX_4X4,    // TX_8X4
+  TX_16X16,  // TX_8X16
+  TX_8X8,    // TX_16X8
+  TX_32X32,  // TX_16X32
+  TX_16X16,  // TX_32X16
+  TX_64X64,  // TX_32X64
+  TX_32X32,  // TX_64X32
+  TX_16X16,  // TX_4X16
+  TX_4X4,    // TX_16X4
+  TX_32X32,  // TX_8X32
+  TX_8X8,    // TX_32X8
+  TX_64X64,  // TX_16X64
+  TX_16X16,  // TX_64X16
+};
+
+#define TX_SIZE_W_MIN 4
+
+// Transform block width in pixels
+static const int tx_size_wide[TX_SIZES_ALL] = {
+  4, 8, 16, 32, 64, 4, 8, 8, 16, 16, 32, 32, 64, 4, 16, 8, 32, 16, 64,
+};
+
+#define TX_SIZE_H_MIN 4
+
+// Transform block height in pixels
+static const int tx_size_high[TX_SIZES_ALL] = {
+  4, 8, 16, 32, 64, 8, 4, 16, 8, 32, 16, 64, 32, 16, 4, 32, 8, 64, 16,
+};
+
+// Transform block width in unit
+static const int tx_size_wide_unit[TX_SIZES_ALL] = {
+  1, 2, 4, 8, 16, 1, 2, 2, 4, 4, 8, 8, 16, 1, 4, 2, 8, 4, 16,
+};
+
+// Transform block height in unit
+static const int tx_size_high_unit[TX_SIZES_ALL] = {
+  1, 2, 4, 8, 16, 2, 1, 4, 2, 8, 4, 16, 8, 4, 1, 8, 2, 16, 4,
+};
+
+// Transform block width in log2
+static const int tx_size_wide_log2[TX_SIZES_ALL] = {
+  2, 3, 4, 5, 6, 2, 3, 3, 4, 4, 5, 5, 6, 2, 4, 3, 5, 4, 6,
+};
+
+// Transform block height in log2
+static const int tx_size_high_log2[TX_SIZES_ALL] = {
+  2, 3, 4, 5, 6, 3, 2, 4, 3, 5, 4, 6, 5, 4, 2, 5, 3, 6, 4,
+};
+
+static const int tx_size_2d[TX_SIZES_ALL + 1] = {
+  16,  64,   256,  1024, 4096, 32,  32,  128,  128,  512,
+  512, 2048, 2048, 64,   64,   256, 256, 1024, 1024,
+};
+
+static const BLOCK_SIZE txsize_to_bsize[TX_SIZES_ALL] = {
+  BLOCK_4X4,    // TX_4X4
+  BLOCK_8X8,    // TX_8X8
+  BLOCK_16X16,  // TX_16X16
+  BLOCK_32X32,  // TX_32X32
+  BLOCK_64X64,  // TX_64X64
+  BLOCK_4X8,    // TX_4X8
+  BLOCK_8X4,    // TX_8X4
+  BLOCK_8X16,   // TX_8X16
+  BLOCK_16X8,   // TX_16X8
+  BLOCK_16X32,  // TX_16X32
+  BLOCK_32X16,  // TX_32X16
+  BLOCK_32X64,  // TX_32X64
+  BLOCK_64X32,  // TX_64X32
+  BLOCK_4X16,   // TX_4X16
+  BLOCK_16X4,   // TX_16X4
+  BLOCK_8X32,   // TX_8X32
+  BLOCK_32X8,   // TX_32X8
+  BLOCK_16X64,  // TX_16X64
+  BLOCK_64X16,  // TX_64X16
+};
+
+static const TX_SIZE txsize_sqr_map[TX_SIZES_ALL] = {
+  TX_4X4,    // TX_4X4
+  TX_8X8,    // TX_8X8
+  TX_16X16,  // TX_16X16
+  TX_32X32,  // TX_32X32
+  TX_64X64,  // TX_64X64
+  TX_4X4,    // TX_4X8
+  TX_4X4,    // TX_8X4
+  TX_8X8,    // TX_8X16
+  TX_8X8,    // TX_16X8
+  TX_16X16,  // TX_16X32
+  TX_16X16,  // TX_32X16
+  TX_32X32,  // TX_32X64
+  TX_32X32,  // TX_64X32
+  TX_4X4,    // TX_4X16
+  TX_4X4,    // TX_16X4
+  TX_8X8,    // TX_8X32
+  TX_8X8,    // TX_32X8
+  TX_16X16,  // TX_16X64
+  TX_16X16,  // TX_64X16
+};
+
+static const TX_SIZE txsize_sqr_up_map[TX_SIZES_ALL] = {
+  TX_4X4,    // TX_4X4
+  TX_8X8,    // TX_8X8
+  TX_16X16,  // TX_16X16
+  TX_32X32,  // TX_32X32
+  TX_64X64,  // TX_64X64
+  TX_8X8,    // TX_4X8
+  TX_8X8,    // TX_8X4
+  TX_16X16,  // TX_8X16
+  TX_16X16,  // TX_16X8
+  TX_32X32,  // TX_16X32
+  TX_32X32,  // TX_32X16
+  TX_64X64,  // TX_32X64
+  TX_64X64,  // TX_64X32
+  TX_16X16,  // TX_4X16
+  TX_16X16,  // TX_16X4
+  TX_32X32,  // TX_8X32
+  TX_32X32,  // TX_32X8
+  TX_64X64,  // TX_16X64
+  TX_64X64,  // TX_64X16
+};
+
+static const int8_t txsize_log2_minus4[TX_SIZES_ALL] = {
+  0,  // TX_4X4
+  2,  // TX_8X8
+  4,  // TX_16X16
+  6,  // TX_32X32
+  6,  // TX_64X64
+  1,  // TX_4X8
+  1,  // TX_8X4
+  3,  // TX_8X16
+  3,  // TX_16X8
+  5,  // TX_16X32
+  5,  // TX_32X16
+  6,  // TX_32X64
+  6,  // TX_64X32
+  2,  // TX_4X16
+  2,  // TX_16X4
+  4,  // TX_8X32
+  4,  // TX_32X8
+  5,  // TX_16X64
+  5,  // TX_64X16
+};
+
+/* clang-format off */
+static const TX_SIZE tx_mode_to_biggest_tx_size[TX_MODES] = {
+  TX_4X4,    // ONLY_4X4
+  TX_64X64,  // TX_MODE_LARGEST
+  TX_64X64,  // TX_MODE_SELECT
+};
+
+// The Subsampled_Size table in the spec (Section 5.11.38. Get plane residual
+// size function).
+static const BLOCK_SIZE ss_size_lookup[BLOCK_SIZES_ALL][2][2] = {
+  //  ss_x == 0      ss_x == 0          ss_x == 1      ss_x == 1
+  //  ss_y == 0      ss_y == 1          ss_y == 0      ss_y == 1
+  { { BLOCK_4X4,     BLOCK_4X4 },     { BLOCK_4X4,     BLOCK_4X4 } },
+  { { BLOCK_4X8,     BLOCK_4X4 },     { BLOCK_INVALID, BLOCK_4X4 } },
+  { { BLOCK_8X4,     BLOCK_INVALID }, { BLOCK_4X4,     BLOCK_4X4 } },
+  { { BLOCK_8X8,     BLOCK_8X4 },     { BLOCK_4X8,     BLOCK_4X4 } },
+  { { BLOCK_8X16,    BLOCK_8X8 },     { BLOCK_INVALID, BLOCK_4X8 } },
+  { { BLOCK_16X8,    BLOCK_INVALID }, { BLOCK_8X8,     BLOCK_8X4 } },
+  { { BLOCK_16X16,   BLOCK_16X8 },    { BLOCK_8X16,    BLOCK_8X8 } },
+  { { BLOCK_16X32,   BLOCK_16X16 },   { BLOCK_INVALID, BLOCK_8X16 } },
+  { { BLOCK_32X16,   BLOCK_INVALID }, { BLOCK_16X16,   BLOCK_16X8 } },
+  { { BLOCK_32X32,   BLOCK_32X16 },   { BLOCK_16X32,   BLOCK_16X16 } },
+  { { BLOCK_32X64,   BLOCK_32X32 },   { BLOCK_INVALID, BLOCK_16X32 } },
+  { { BLOCK_64X32,   BLOCK_INVALID }, { BLOCK_32X32,   BLOCK_32X16 } },
+  { { BLOCK_64X64,   BLOCK_64X32 },   { BLOCK_32X64,   BLOCK_32X32 } },
+  { { BLOCK_64X128,  BLOCK_64X64 },   { BLOCK_INVALID, BLOCK_32X64 } },
+  { { BLOCK_128X64,  BLOCK_INVALID }, { BLOCK_64X64,   BLOCK_64X32 } },
+  { { BLOCK_128X128, BLOCK_128X64 },  { BLOCK_64X128,  BLOCK_64X64 } },
+  { { BLOCK_4X16,    BLOCK_4X8 },     { BLOCK_INVALID, BLOCK_4X8 } },
+  { { BLOCK_16X4,    BLOCK_INVALID }, { BLOCK_8X4,     BLOCK_8X4 } },
+  { { BLOCK_8X32,    BLOCK_8X16 },    { BLOCK_INVALID, BLOCK_4X16 } },
+  { { BLOCK_32X8,    BLOCK_INVALID }, { BLOCK_16X8,    BLOCK_16X4 } },
+  { { BLOCK_16X64,   BLOCK_16X32 },   { BLOCK_INVALID, BLOCK_8X32 } },
+  { { BLOCK_64X16,   BLOCK_INVALID }, { BLOCK_32X16,   BLOCK_32X8 } }
+};
+/* clang-format on */
+
+// Generates 5 bit field in which each bit set to 1 represents
+// a blocksize partition  11111 means we split 128x128, 64x64, 32x32, 16x16
+// and 8x8.  10000 means we just split the 128x128 to 64x64
+/* clang-format off */
+static const struct {
+  PARTITION_CONTEXT above;
+  PARTITION_CONTEXT left;
+} partition_context_lookup[BLOCK_SIZES_ALL] = {
+  { 31, 31 },  // 4X4   - {0b11111, 0b11111}
+  { 31, 30 },  // 4X8   - {0b11111, 0b11110}
+  { 30, 31 },  // 8X4   - {0b11110, 0b11111}
+  { 30, 30 },  // 8X8   - {0b11110, 0b11110}
+  { 30, 28 },  // 8X16  - {0b11110, 0b11100}
+  { 28, 30 },  // 16X8  - {0b11100, 0b11110}
+  { 28, 28 },  // 16X16 - {0b11100, 0b11100}
+  { 28, 24 },  // 16X32 - {0b11100, 0b11000}
+  { 24, 28 },  // 32X16 - {0b11000, 0b11100}
+  { 24, 24 },  // 32X32 - {0b11000, 0b11000}
+  { 24, 16 },  // 32X64 - {0b11000, 0b10000}
+  { 16, 24 },  // 64X32 - {0b10000, 0b11000}
+  { 16, 16 },  // 64X64 - {0b10000, 0b10000}
+  { 16, 0 },   // 64X128- {0b10000, 0b00000}
+  { 0, 16 },   // 128X64- {0b00000, 0b10000}
+  { 0, 0 },    // 128X128-{0b00000, 0b00000}
+  { 31, 28 },  // 4X16  - {0b11111, 0b11100}
+  { 28, 31 },  // 16X4  - {0b11100, 0b11111}
+  { 30, 24 },  // 8X32  - {0b11110, 0b11000}
+  { 24, 30 },  // 32X8  - {0b11000, 0b11110}
+  { 28, 16 },  // 16X64 - {0b11100, 0b10000}
+  { 16, 28 },  // 64X16 - {0b10000, 0b11100}
+};
+/* clang-format on */
+
+static const int intra_mode_context[INTRA_MODES] = {
+  0, 1, 2, 3, 4, 4, 4, 4, 3, 0, 1, 2, 0,
+};
+
+// Note: this is also used in unit tests. So whenever one changes the table,
+// the unit tests need to be changed accordingly.
+static const int quant_dist_weight[4][2] = {
+  { 2, 3 }, { 2, 5 }, { 2, 7 }, { 1, MAX_FRAME_DISTANCE }
+};
+static const int quant_dist_lookup_table[2][4][2] = {
+  { { 9, 7 }, { 11, 5 }, { 12, 4 }, { 13, 3 } },
+  { { 7, 9 }, { 5, 11 }, { 4, 12 }, { 3, 13 } },
+};
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_COMMON_DATA_H_

diff --git a/libav1/av1/common/convolve.c b/libav1/av1/common/convolve.c
new file mode 100644
index 0000000..5a55ece
--- /dev/null
+++ b/libav1/av1/common/convolve.c

@@ -0,0 +1,1327 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <string.h>
+
+#include "config/aom_dsp_rtcd.h"
+#include "config/av1_rtcd.h"
+
+#include "av1/common/blockd.h"
+#include "av1/common/convolve.h"
+#include "av1/common/filter.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/resize.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_ports/mem.h"
+
+void av1_convolve_horiz_rs_c(const uint8_t *src, int src_stride, uint8_t *dst,
+                             int dst_stride, int w, int h,
+                             const int16_t *x_filters, int x0_qn,
+                             int x_step_qn) {
+  src -= UPSCALE_NORMATIVE_TAPS / 2 - 1;
+  for (int y = 0; y < h; ++y) {
+    int x_qn = x0_qn;
+    for (int x = 0; x < w; ++x) {
+      const uint8_t *const src_x = &src[x_qn >> RS_SCALE_SUBPEL_BITS];
+      const int x_filter_idx =
+          (x_qn & RS_SCALE_SUBPEL_MASK) >> RS_SCALE_EXTRA_BITS;
+      assert(x_filter_idx <= RS_SUBPEL_MASK);
+      const int16_t *const x_filter =
+          &x_filters[x_filter_idx * UPSCALE_NORMATIVE_TAPS];
+      int sum = 0;
+      for (int k = 0; k < UPSCALE_NORMATIVE_TAPS; ++k)
+        sum += src_x[k] * x_filter[k];
+      dst[x] = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
+      x_qn += x_step_qn;
+    }
+    src += src_stride;
+    dst += dst_stride;
+  }
+}
+
+void av1_highbd_convolve_horiz_rs_c(const uint16_t *src, int src_stride,
+                                    uint16_t *dst, int dst_stride, int w, int h,
+                                    const int16_t *x_filters, int x0_qn,
+                                    int x_step_qn, int bd) {
+  src -= UPSCALE_NORMATIVE_TAPS / 2 - 1;
+  for (int y = 0; y < h; ++y) {
+    int x_qn = x0_qn;
+    for (int x = 0; x < w; ++x) {
+      const uint16_t *const src_x = &src[x_qn >> RS_SCALE_SUBPEL_BITS];
+      const int x_filter_idx =
+          (x_qn & RS_SCALE_SUBPEL_MASK) >> RS_SCALE_EXTRA_BITS;
+      assert(x_filter_idx <= RS_SUBPEL_MASK);
+      const int16_t *const x_filter =
+          &x_filters[x_filter_idx * UPSCALE_NORMATIVE_TAPS];
+      int sum = 0;
+      for (int k = 0; k < UPSCALE_NORMATIVE_TAPS; ++k)
+        sum += src_x[k] * x_filter[k];
+      dst[x] = clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
+      x_qn += x_step_qn;
+    }
+    src += src_stride;
+    dst += dst_stride;
+  }
+}
+
+void av1_convolve_2d_sobel_y_c(const uint8_t *src, int src_stride, double *dst,
+                               int dst_stride, int w, int h, int dir,
+                               double norm) {
+  int16_t im_block[(MAX_SB_SIZE + MAX_FILTER_TAP - 1) * MAX_SB_SIZE];
+  DECLARE_ALIGNED(256, static const int16_t, sobel_a[3]) = { 1, 0, -1 };
+  DECLARE_ALIGNED(256, static const int16_t, sobel_b[3]) = { 1, 2, 1 };
+  const int taps = 3;
+  int im_h = h + taps - 1;
+  int im_stride = w;
+  const int fo_vert = 1;
+  const int fo_horiz = 1;
+
+  // horizontal filter
+  const uint8_t *src_horiz = src - fo_vert * src_stride;
+  const int16_t *x_filter = dir ? sobel_a : sobel_b;
+  for (int y = 0; y < im_h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int16_t sum = 0;
+      for (int k = 0; k < taps; ++k) {
+        sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
+      }
+      im_block[y * im_stride + x] = sum;
+    }
+  }
+
+  // vertical filter
+  int16_t *src_vert = im_block + fo_vert * im_stride;
+  const int16_t *y_filter = dir ? sobel_b : sobel_a;
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int16_t sum = 0;
+      for (int k = 0; k < taps; ++k) {
+        sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
+      }
+      dst[y * dst_stride + x] = sum * norm;
+    }
+  }
+}
+
+void av1_convolve_2d_sr_c(const uint8_t *src, int src_stride, uint8_t *dst,
+                          int dst_stride, int w, int h,
+                          const InterpFilterParams *filter_params_x,
+                          const InterpFilterParams *filter_params_y,
+                          const int subpel_x_q4, const int subpel_y_q4,
+                          ConvolveParams *conv_params) {
+  int16_t im_block[(MAX_SB_SIZE + MAX_FILTER_TAP - 1) * MAX_SB_SIZE];
+  int im_h = h + filter_params_y->taps - 1;
+  int im_stride = w;
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  const int bd = 8;
+  const int bits =
+      FILTER_BITS * 2 - conv_params->round_0 - conv_params->round_1;
+
+  // horizontal filter
+  const uint8_t *src_horiz = src - fo_vert * src_stride;
+  const int16_t *x_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_x, subpel_x_q4 & SUBPEL_MASK);
+  for (int y = 0; y < im_h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t sum = (1 << (bd + FILTER_BITS - 1));
+      for (int k = 0; k < filter_params_x->taps; ++k) {
+        sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
+      }
+      assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+      im_block[y * im_stride + x] =
+          (int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
+    }
+  }
+
+  // vertical filter
+  int16_t *src_vert = im_block + fo_vert * im_stride;
+  const int16_t *y_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_y, subpel_y_q4 & SUBPEL_MASK);
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t sum = 1 << offset_bits;
+      for (int k = 0; k < filter_params_y->taps; ++k) {
+        sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
+      }
+      assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+      int16_t res = ROUND_POWER_OF_TWO(sum, conv_params->round_1) -
+                    ((1 << (offset_bits - conv_params->round_1)) +
+                     (1 << (offset_bits - conv_params->round_1 - 1)));
+      dst[y * dst_stride + x] = clip_pixel(ROUND_POWER_OF_TWO(res, bits));
+    }
+  }
+}
+
+void av1_convolve_y_sr_c(const uint8_t *src, int src_stride, uint8_t *dst,
+                         int dst_stride, int w, int h,
+                         const InterpFilterParams *filter_params_x,
+                         const InterpFilterParams *filter_params_y,
+                         const int subpel_x_q4, const int subpel_y_q4,
+                         ConvolveParams *conv_params) {
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  (void)filter_params_x;
+  (void)subpel_x_q4;
+  (void)conv_params;
+
+  assert(conv_params->round_0 <= FILTER_BITS);
+  assert(((conv_params->round_0 + conv_params->round_1) <= (FILTER_BITS + 1)) ||
+         ((conv_params->round_0 + conv_params->round_1) == (2 * FILTER_BITS)));
+
+  // vertical filter
+  const int16_t *y_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_y, subpel_y_q4 & SUBPEL_MASK);
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t res = 0;
+      for (int k = 0; k < filter_params_y->taps; ++k) {
+        res += y_filter[k] * src[(y - fo_vert + k) * src_stride + x];
+      }
+      dst[y * dst_stride + x] =
+          clip_pixel(ROUND_POWER_OF_TWO(res, FILTER_BITS));
+    }
+  }
+}
+
+void av1_convolve_x_sr_c(const uint8_t *src, int src_stride, uint8_t *dst,
+                         int dst_stride, int w, int h,
+                         const InterpFilterParams *filter_params_x,
+                         const InterpFilterParams *filter_params_y,
+                         const int subpel_x_q4, const int subpel_y_q4,
+                         ConvolveParams *conv_params) {
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  const int bits = FILTER_BITS - conv_params->round_0;
+  (void)filter_params_y;
+  (void)subpel_y_q4;
+  (void)conv_params;
+
+  assert(bits >= 0);
+  assert((FILTER_BITS - conv_params->round_1) >= 0 ||
+         ((conv_params->round_0 + conv_params->round_1) == 2 * FILTER_BITS));
+
+  // horizontal filter
+  const int16_t *x_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_x, subpel_x_q4 & SUBPEL_MASK);
+
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t res = 0;
+      for (int k = 0; k < filter_params_x->taps; ++k) {
+        res += x_filter[k] * src[y * src_stride + x - fo_horiz + k];
+      }
+      res = ROUND_POWER_OF_TWO(res, conv_params->round_0);
+      dst[y * dst_stride + x] = clip_pixel(ROUND_POWER_OF_TWO(res, bits));
+    }
+  }
+}
+
+void av1_convolve_2d_copy_sr_c(const uint8_t *src, int src_stride, uint8_t *dst,
+                               int dst_stride, int w, int h,
+                               const InterpFilterParams *filter_params_x,
+                               const InterpFilterParams *filter_params_y,
+                               const int subpel_x_q4, const int subpel_y_q4,
+                               ConvolveParams *conv_params) {
+  (void)filter_params_x;
+  (void)filter_params_y;
+  (void)subpel_x_q4;
+  (void)subpel_y_q4;
+  (void)conv_params;
+
+  for (int y = 0; y < h; ++y) {
+    memmove(dst + y * dst_stride, src + y * src_stride, w * sizeof(src[0]));
+  }
+}
+
+void av1_dist_wtd_convolve_2d_c(const uint8_t *src, int src_stride,
+                                uint8_t *dst8, int dst8_stride, int w, int h,
+                                const InterpFilterParams *filter_params_x,
+                                const InterpFilterParams *filter_params_y,
+                                const int subpel_x_q4, const int subpel_y_q4,
+                                ConvolveParams *conv_params) {
+  CONV_BUF_TYPE *dst = conv_params->dst;
+  int dst_stride = conv_params->dst_stride;
+  int16_t im_block[(MAX_SB_SIZE + MAX_FILTER_TAP - 1) * MAX_SB_SIZE];
+  int im_h = h + filter_params_y->taps - 1;
+  int im_stride = w;
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  const int bd = 8;
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+
+  // horizontal filter
+  const uint8_t *src_horiz = src - fo_vert * src_stride;
+  const int16_t *x_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_x, subpel_x_q4 & SUBPEL_MASK);
+  for (int y = 0; y < im_h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t sum = (1 << (bd + FILTER_BITS - 1));
+      for (int k = 0; k < filter_params_x->taps; ++k) {
+        sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
+      }
+      assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+      im_block[y * im_stride + x] =
+          (int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
+    }
+  }
+
+  // vertical filter
+  int16_t *src_vert = im_block + fo_vert * im_stride;
+  const int16_t *y_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_y, subpel_y_q4 & SUBPEL_MASK);
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t sum = 1 << offset_bits;
+      for (int k = 0; k < filter_params_y->taps; ++k) {
+        sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
+      }
+      assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+      CONV_BUF_TYPE res = ROUND_POWER_OF_TWO(sum, conv_params->round_1);
+      if (conv_params->do_average) {
+        int32_t tmp = dst[y * dst_stride + x];
+        if (conv_params->use_dist_wtd_comp_avg) {
+          tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+          tmp = tmp >> DIST_PRECISION_BITS;
+        } else {
+          tmp += res;
+          tmp = tmp >> 1;
+        }
+        tmp -= (1 << (offset_bits - conv_params->round_1)) +
+               (1 << (offset_bits - conv_params->round_1 - 1));
+        dst8[y * dst8_stride + x] =
+            clip_pixel(ROUND_POWER_OF_TWO(tmp, round_bits));
+      } else {
+        dst[y * dst_stride + x] = res;
+      }
+    }
+  }
+}
+
+void av1_dist_wtd_convolve_y_c(const uint8_t *src, int src_stride,
+                               uint8_t *dst8, int dst8_stride, int w, int h,
+                               const InterpFilterParams *filter_params_x,
+                               const InterpFilterParams *filter_params_y,
+                               const int subpel_x_q4, const int subpel_y_q4,
+                               ConvolveParams *conv_params) {
+  CONV_BUF_TYPE *dst = conv_params->dst;
+  int dst_stride = conv_params->dst_stride;
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  const int bits = FILTER_BITS - conv_params->round_0;
+  const int bd = 8;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+                           (1 << (offset_bits - conv_params->round_1 - 1));
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+  (void)filter_params_x;
+  (void)subpel_x_q4;
+
+  // vertical filter
+  const int16_t *y_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_y, subpel_y_q4 & SUBPEL_MASK);
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t res = 0;
+      for (int k = 0; k < filter_params_y->taps; ++k) {
+        res += y_filter[k] * src[(y - fo_vert + k) * src_stride + x];
+      }
+      res *= (1 << bits);
+      res = ROUND_POWER_OF_TWO(res, conv_params->round_1) + round_offset;
+
+      if (conv_params->do_average) {
+        int32_t tmp = dst[y * dst_stride + x];
+        if (conv_params->use_dist_wtd_comp_avg) {
+          tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+          tmp = tmp >> DIST_PRECISION_BITS;
+        } else {
+          tmp += res;
+          tmp = tmp >> 1;
+        }
+        tmp -= round_offset;
+        dst8[y * dst8_stride + x] =
+            clip_pixel(ROUND_POWER_OF_TWO(tmp, round_bits));
+      } else {
+        dst[y * dst_stride + x] = res;
+      }
+    }
+  }
+}
+
+void av1_dist_wtd_convolve_x_c(const uint8_t *src, int src_stride,
+                               uint8_t *dst8, int dst8_stride, int w, int h,
+                               const InterpFilterParams *filter_params_x,
+                               const InterpFilterParams *filter_params_y,
+                               const int subpel_x_q4, const int subpel_y_q4,
+                               ConvolveParams *conv_params) {
+  CONV_BUF_TYPE *dst = conv_params->dst;
+  int dst_stride = conv_params->dst_stride;
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  const int bits = FILTER_BITS - conv_params->round_1;
+  const int bd = 8;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+                           (1 << (offset_bits - conv_params->round_1 - 1));
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+  (void)filter_params_y;
+  (void)subpel_y_q4;
+
+  // horizontal filter
+  const int16_t *x_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_x, subpel_x_q4 & SUBPEL_MASK);
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t res = 0;
+      for (int k = 0; k < filter_params_x->taps; ++k) {
+        res += x_filter[k] * src[y * src_stride + x - fo_horiz + k];
+      }
+      res = (1 << bits) * ROUND_POWER_OF_TWO(res, conv_params->round_0);
+      res += round_offset;
+
+      if (conv_params->do_average) {
+        int32_t tmp = dst[y * dst_stride + x];
+        if (conv_params->use_dist_wtd_comp_avg) {
+          tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+          tmp = tmp >> DIST_PRECISION_BITS;
+        } else {
+          tmp += res;
+          tmp = tmp >> 1;
+        }
+        tmp -= round_offset;
+        dst8[y * dst8_stride + x] =
+            clip_pixel(ROUND_POWER_OF_TWO(tmp, round_bits));
+      } else {
+        dst[y * dst_stride + x] = res;
+      }
+    }
+  }
+}
+
+void av1_dist_wtd_convolve_2d_copy_c(
+    const uint8_t *src, int src_stride, uint8_t *dst8, int dst8_stride, int w,
+    int h, const InterpFilterParams *filter_params_x,
+    const InterpFilterParams *filter_params_y, const int subpel_x_q4,
+    const int subpel_y_q4, ConvolveParams *conv_params) {
+  CONV_BUF_TYPE *dst = conv_params->dst;
+  int dst_stride = conv_params->dst_stride;
+  const int bits =
+      FILTER_BITS * 2 - conv_params->round_1 - conv_params->round_0;
+  const int bd = 8;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+                           (1 << (offset_bits - conv_params->round_1 - 1));
+  (void)filter_params_x;
+  (void)filter_params_y;
+  (void)subpel_x_q4;
+  (void)subpel_y_q4;
+
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      CONV_BUF_TYPE res = src[y * src_stride + x] << bits;
+      res += round_offset;
+
+      if (conv_params->do_average) {
+        int32_t tmp = dst[y * dst_stride + x];
+        if (conv_params->use_dist_wtd_comp_avg) {
+          tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+          tmp = tmp >> DIST_PRECISION_BITS;
+        } else {
+          tmp += res;
+          tmp = tmp >> 1;
+        }
+        tmp -= round_offset;
+        dst8[y * dst8_stride + x] = clip_pixel(ROUND_POWER_OF_TWO(tmp, bits));
+      } else {
+        dst[y * dst_stride + x] = res;
+      }
+    }
+  }
+}
+
+void av1_convolve_2d_scale_c(const uint8_t *src, int src_stride, uint8_t *dst8,
+                             int dst8_stride, int w, int h,
+                             const InterpFilterParams *filter_params_x,
+                             const InterpFilterParams *filter_params_y,
+                             const int subpel_x_qn, const int x_step_qn,
+                             const int subpel_y_qn, const int y_step_qn,
+                             ConvolveParams *conv_params) {
+  int16_t im_block[(2 * MAX_SB_SIZE + MAX_FILTER_TAP) * MAX_SB_SIZE];
+  int im_h = (((h - 1) * y_step_qn + subpel_y_qn) >> SCALE_SUBPEL_BITS) +
+             filter_params_y->taps;
+  CONV_BUF_TYPE *dst16 = conv_params->dst;
+  const int dst16_stride = conv_params->dst_stride;
+  const int bits =
+      FILTER_BITS * 2 - conv_params->round_0 - conv_params->round_1;
+  assert(bits >= 0);
+  int im_stride = w;
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  const int bd = 8;
+
+  // horizontal filter
+  const uint8_t *src_horiz = src - fo_vert * src_stride;
+  for (int y = 0; y < im_h; ++y) {
+    int x_qn = subpel_x_qn;
+    for (int x = 0; x < w; ++x, x_qn += x_step_qn) {
+      const uint8_t *const src_x = &src_horiz[(x_qn >> SCALE_SUBPEL_BITS)];
+      const int x_filter_idx = (x_qn & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS;
+      assert(x_filter_idx < SUBPEL_SHIFTS);
+      const int16_t *x_filter =
+          av1_get_interp_filter_subpel_kernel(filter_params_x, x_filter_idx);
+      int32_t sum = (1 << (bd + FILTER_BITS - 1));
+      for (int k = 0; k < filter_params_x->taps; ++k) {
+        sum += x_filter[k] * src_x[k - fo_horiz];
+      }
+      assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+      im_block[y * im_stride + x] =
+          (int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
+    }
+    src_horiz += src_stride;
+  }
+
+  // vertical filter
+  int16_t *src_vert = im_block + fo_vert * im_stride;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  for (int x = 0; x < w; ++x) {
+    int y_qn = subpel_y_qn;
+    for (int y = 0; y < h; ++y, y_qn += y_step_qn) {
+      const int16_t *src_y = &src_vert[(y_qn >> SCALE_SUBPEL_BITS) * im_stride];
+      const int y_filter_idx = (y_qn & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS;
+      assert(y_filter_idx < SUBPEL_SHIFTS);
+      const int16_t *y_filter =
+          av1_get_interp_filter_subpel_kernel(filter_params_y, y_filter_idx);
+      int32_t sum = 1 << offset_bits;
+      for (int k = 0; k < filter_params_y->taps; ++k) {
+        sum += y_filter[k] * src_y[(k - fo_vert) * im_stride];
+      }
+      assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+      CONV_BUF_TYPE res = ROUND_POWER_OF_TWO(sum, conv_params->round_1);
+      if (conv_params->is_compound) {
+        if (conv_params->do_average) {
+          int32_t tmp = dst16[y * dst16_stride + x];
+          if (conv_params->use_dist_wtd_comp_avg) {
+            tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+            tmp = tmp >> DIST_PRECISION_BITS;
+          } else {
+            tmp += res;
+            tmp = tmp >> 1;
+          }
+          /* Subtract round offset and convolve round */
+          tmp = tmp - ((1 << (offset_bits - conv_params->round_1)) +
+                       (1 << (offset_bits - conv_params->round_1 - 1)));
+          dst8[y * dst8_stride + x] = clip_pixel(ROUND_POWER_OF_TWO(tmp, bits));
+        } else {
+          dst16[y * dst16_stride + x] = res;
+        }
+      } else {
+        /* Subtract round offset and convolve round */
+        int32_t tmp = res - ((1 << (offset_bits - conv_params->round_1)) +
+                             (1 << (offset_bits - conv_params->round_1 - 1)));
+        dst8[y * dst8_stride + x] = clip_pixel(ROUND_POWER_OF_TWO(tmp, bits));
+      }
+    }
+    src_vert++;
+  }
+}
+
+static void convolve_2d_scale_wrapper(
+    const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w,
+    int h, const InterpFilterParams *filter_params_x,
+    const InterpFilterParams *filter_params_y, const int subpel_x_qn,
+    const int x_step_qn, const int subpel_y_qn, const int y_step_qn,
+    ConvolveParams *conv_params) {
+  if (conv_params->is_compound) {
+    assert(conv_params->dst != NULL);
+  }
+  av1_convolve_2d_scale(src, src_stride, dst, dst_stride, w, h, filter_params_x,
+                        filter_params_y, subpel_x_qn, x_step_qn, subpel_y_qn,
+                        y_step_qn, conv_params);
+}
+
+// TODO(huisu@google.com): bilinear filtering only needs 2 taps in general. So
+// we may create optimized code to do 2-tap filtering for all bilinear filtering
+// usages, not just IntraBC.
+static void convolve_2d_for_intrabc(const uint8_t *src, int src_stride,
+                                    uint8_t *dst, int dst_stride, int w, int h,
+                                    int subpel_x_q4, int subpel_y_q4,
+                                    ConvolveParams *conv_params) {
+  const InterpFilterParams *filter_params_x =
+      subpel_x_q4 ? &av1_intrabc_filter_params : NULL;
+  const InterpFilterParams *filter_params_y =
+      subpel_y_q4 ? &av1_intrabc_filter_params : NULL;
+  if (subpel_x_q4 != 0 && subpel_y_q4 != 0) {
+    av1_convolve_2d_sr_c(src, src_stride, dst, dst_stride, w, h,
+                         filter_params_x, filter_params_y, 0, 0, conv_params);
+  } else if (subpel_x_q4 != 0) {
+    av1_convolve_x_sr_c(src, src_stride, dst, dst_stride, w, h, filter_params_x,
+                        filter_params_y, 0, 0, conv_params);
+  } else {
+    av1_convolve_y_sr_c(src, src_stride, dst, dst_stride, w, h, filter_params_x,
+                        filter_params_y, 0, 0, conv_params);
+  }
+}
+
+void av1_convolve_2d_facade(const uint8_t *src, int src_stride, uint8_t *dst,
+                            int dst_stride, int w, int h,
+                            InterpFilters interp_filters, const int subpel_x_q4,
+                            int x_step_q4, const int subpel_y_q4, int y_step_q4,
+                            int scaled, ConvolveParams *conv_params,
+                            const struct scale_factors *sf, int is_intrabc) {
+  assert(IMPLIES(is_intrabc, !scaled));
+  (void)x_step_q4;
+  (void)y_step_q4;
+  (void)dst;
+  (void)dst_stride;
+
+  if (is_intrabc && (subpel_x_q4 != 0 || subpel_y_q4 != 0)) {
+    convolve_2d_for_intrabc(src, src_stride, dst, dst_stride, w, h, subpel_x_q4,
+                            subpel_y_q4, conv_params);
+    return;
+  }
+
+  InterpFilter filter_x = 0;
+  InterpFilter filter_y = 0;
+  const int need_filter_params_x = (subpel_x_q4 != 0) | scaled;
+  const int need_filter_params_y = (subpel_y_q4 != 0) | scaled;
+  if (need_filter_params_x)
+    filter_x = av1_extract_interp_filter(interp_filters, 1);
+  if (need_filter_params_y)
+    filter_y = av1_extract_interp_filter(interp_filters, 0);
+  const InterpFilterParams *filter_params_x =
+      need_filter_params_x
+          ? av1_get_interp_filter_params_with_block_size(filter_x, w)
+          : NULL;
+  const InterpFilterParams *filter_params_y =
+      need_filter_params_y
+          ? av1_get_interp_filter_params_with_block_size(filter_y, h)
+          : NULL;
+
+  if (scaled) {
+    convolve_2d_scale_wrapper(src, src_stride, dst, dst_stride, w, h,
+                              filter_params_x, filter_params_y, subpel_x_q4,
+                              x_step_q4, subpel_y_q4, y_step_q4, conv_params);
+  } else {
+    sf->convolve[subpel_x_q4 != 0][subpel_y_q4 != 0][conv_params->is_compound](
+        src, src_stride, dst, dst_stride, w, h, filter_params_x,
+        filter_params_y, subpel_x_q4, subpel_y_q4, conv_params);
+  }
+}
+
+void av1_highbd_convolve_2d_copy_sr_c(
+    const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w,
+    int h, const InterpFilterParams *filter_params_x,
+    const InterpFilterParams *filter_params_y, const int subpel_x_q4,
+    const int subpel_y_q4, ConvolveParams *conv_params, int bd) {
+  (void)filter_params_x;
+  (void)filter_params_y;
+  (void)subpel_x_q4;
+  (void)subpel_y_q4;
+  (void)conv_params;
+  (void)bd;
+
+  for (int y = 0; y < h; ++y) {
+    memmove(dst + y * dst_stride, src + y * src_stride, w * sizeof(src[0]));
+  }
+}
+
+void av1_highbd_convolve_x_sr_c(const uint16_t *src, int src_stride,
+                                uint16_t *dst, int dst_stride, int w, int h,
+                                const InterpFilterParams *filter_params_x,
+                                const InterpFilterParams *filter_params_y,
+                                const int subpel_x_q4, const int subpel_y_q4,
+                                ConvolveParams *conv_params, int bd) {
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  const int bits = FILTER_BITS - conv_params->round_0;
+  (void)filter_params_y;
+  (void)subpel_y_q4;
+
+  assert(bits >= 0);
+  assert((FILTER_BITS - conv_params->round_1) >= 0 ||
+         ((conv_params->round_0 + conv_params->round_1) == 2 * FILTER_BITS));
+
+  // horizontal filter
+  const int16_t *x_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_x, subpel_x_q4 & SUBPEL_MASK);
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t res = 0;
+      for (int k = 0; k < filter_params_x->taps; ++k) {
+        res += x_filter[k] * src[y * src_stride + x - fo_horiz + k];
+      }
+      res = ROUND_POWER_OF_TWO(res, conv_params->round_0);
+      dst[y * dst_stride + x] =
+          clip_pixel_highbd(ROUND_POWER_OF_TWO(res, bits), bd);
+    }
+  }
+}
+
+void av1_highbd_convolve_y_sr_c(const uint16_t *src, int src_stride,
+                                uint16_t *dst, int dst_stride, int w, int h,
+                                const InterpFilterParams *filter_params_x,
+                                const InterpFilterParams *filter_params_y,
+                                const int subpel_x_q4, const int subpel_y_q4,
+                                ConvolveParams *conv_params, int bd) {
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  (void)filter_params_x;
+  (void)subpel_x_q4;
+  (void)conv_params;
+
+  assert(conv_params->round_0 <= FILTER_BITS);
+  assert(((conv_params->round_0 + conv_params->round_1) <= (FILTER_BITS + 1)) ||
+         ((conv_params->round_0 + conv_params->round_1) == (2 * FILTER_BITS)));
+  // vertical filter
+  const int16_t *y_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_y, subpel_y_q4 & SUBPEL_MASK);
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t res = 0;
+      for (int k = 0; k < filter_params_y->taps; ++k) {
+        res += y_filter[k] * src[(y - fo_vert + k) * src_stride + x];
+      }
+      dst[y * dst_stride + x] =
+          clip_pixel_highbd(ROUND_POWER_OF_TWO(res, FILTER_BITS), bd);
+    }
+  }
+}
+
+void av1_highbd_convolve_2d_sr_c(const uint16_t *src, int src_stride,
+                                 uint16_t *dst, int dst_stride, int w, int h,
+                                 const InterpFilterParams *filter_params_x,
+                                 const InterpFilterParams *filter_params_y,
+                                 const int subpel_x_q4, const int subpel_y_q4,
+                                 ConvolveParams *conv_params, int bd) {
+  int16_t im_block[(MAX_SB_SIZE + MAX_FILTER_TAP - 1) * MAX_SB_SIZE];
+  int im_h = h + filter_params_y->taps - 1;
+  int im_stride = w;
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  const int bits =
+      FILTER_BITS * 2 - conv_params->round_0 - conv_params->round_1;
+  assert(bits >= 0);
+
+  // horizontal filter
+  const uint16_t *src_horiz = src - fo_vert * src_stride;
+  const int16_t *x_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_x, subpel_x_q4 & SUBPEL_MASK);
+  for (int y = 0; y < im_h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t sum = (1 << (bd + FILTER_BITS - 1));
+      for (int k = 0; k < filter_params_x->taps; ++k) {
+        sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
+      }
+      assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+      im_block[y * im_stride + x] =
+          ROUND_POWER_OF_TWO(sum, conv_params->round_0);
+    }
+  }
+
+  // vertical filter
+  int16_t *src_vert = im_block + fo_vert * im_stride;
+  const int16_t *y_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_y, subpel_y_q4 & SUBPEL_MASK);
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t sum = 1 << offset_bits;
+      for (int k = 0; k < filter_params_y->taps; ++k) {
+        sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
+      }
+      assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+      int32_t res = ROUND_POWER_OF_TWO(sum, conv_params->round_1) -
+                    ((1 << (offset_bits - conv_params->round_1)) +
+                     (1 << (offset_bits - conv_params->round_1 - 1)));
+      dst[y * dst_stride + x] =
+          clip_pixel_highbd(ROUND_POWER_OF_TWO(res, bits), bd);
+    }
+  }
+}
+
+void av1_highbd_dist_wtd_convolve_2d_c(
+    const uint16_t *src, int src_stride, uint16_t *dst16, int dst16_stride,
+    int w, int h, const InterpFilterParams *filter_params_x,
+    const InterpFilterParams *filter_params_y, const int subpel_x_q4,
+    const int subpel_y_q4, ConvolveParams *conv_params, int bd) {
+  int x, y, k;
+  int16_t im_block[(MAX_SB_SIZE + MAX_FILTER_TAP - 1) * MAX_SB_SIZE];
+  CONV_BUF_TYPE *dst = conv_params->dst;
+  int dst_stride = conv_params->dst_stride;
+  int im_h = h + filter_params_y->taps - 1;
+  int im_stride = w;
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+  assert(round_bits >= 0);
+
+  // horizontal filter
+  const uint16_t *src_horiz = src - fo_vert * src_stride;
+  const int16_t *x_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_x, subpel_x_q4 & SUBPEL_MASK);
+  for (y = 0; y < im_h; ++y) {
+    for (x = 0; x < w; ++x) {
+      int32_t sum = (1 << (bd + FILTER_BITS - 1));
+      for (k = 0; k < filter_params_x->taps; ++k) {
+        sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
+      }
+      assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+      (void)bd;
+      im_block[y * im_stride + x] =
+          (int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
+    }
+  }
+
+  // vertical filter
+  int16_t *src_vert = im_block + fo_vert * im_stride;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  const int16_t *y_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_y, subpel_y_q4 & SUBPEL_MASK);
+  for (y = 0; y < h; ++y) {
+    for (x = 0; x < w; ++x) {
+      int32_t sum = 1 << offset_bits;
+      for (k = 0; k < filter_params_y->taps; ++k) {
+        sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
+      }
+      assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+      CONV_BUF_TYPE res = ROUND_POWER_OF_TWO(sum, conv_params->round_1);
+      if (conv_params->do_average) {
+        int32_t tmp = dst[y * dst_stride + x];
+        if (conv_params->use_dist_wtd_comp_avg) {
+          tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+          tmp = tmp >> DIST_PRECISION_BITS;
+        } else {
+          tmp += res;
+          tmp = tmp >> 1;
+        }
+        tmp -= (1 << (offset_bits - conv_params->round_1)) +
+               (1 << (offset_bits - conv_params->round_1 - 1));
+        dst16[y * dst16_stride + x] =
+            clip_pixel_highbd(ROUND_POWER_OF_TWO(tmp, round_bits), bd);
+      } else {
+        dst[y * dst_stride + x] = res;
+      }
+    }
+  }
+}
+
+void av1_highbd_dist_wtd_convolve_x_c(
+    const uint16_t *src, int src_stride, uint16_t *dst16, int dst16_stride,
+    int w, int h, const InterpFilterParams *filter_params_x,
+    const InterpFilterParams *filter_params_y, const int subpel_x_q4,
+    const int subpel_y_q4, ConvolveParams *conv_params, int bd) {
+  CONV_BUF_TYPE *dst = conv_params->dst;
+  int dst_stride = conv_params->dst_stride;
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  const int bits = FILTER_BITS - conv_params->round_1;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+                           (1 << (offset_bits - conv_params->round_1 - 1));
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+  assert(round_bits >= 0);
+  (void)filter_params_y;
+  (void)subpel_y_q4;
+  assert(bits >= 0);
+  // horizontal filter
+  const int16_t *x_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_x, subpel_x_q4 & SUBPEL_MASK);
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t res = 0;
+      for (int k = 0; k < filter_params_x->taps; ++k) {
+        res += x_filter[k] * src[y * src_stride + x - fo_horiz + k];
+      }
+      res = (1 << bits) * ROUND_POWER_OF_TWO(res, conv_params->round_0);
+      res += round_offset;
+
+      if (conv_params->do_average) {
+        int32_t tmp = dst[y * dst_stride + x];
+        if (conv_params->use_dist_wtd_comp_avg) {
+          tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+          tmp = tmp >> DIST_PRECISION_BITS;
+        } else {
+          tmp += res;
+          tmp = tmp >> 1;
+        }
+        tmp -= round_offset;
+        dst16[y * dst16_stride + x] =
+            clip_pixel_highbd(ROUND_POWER_OF_TWO(tmp, round_bits), bd);
+      } else {
+        dst[y * dst_stride + x] = res;
+      }
+    }
+  }
+}
+
+void av1_highbd_dist_wtd_convolve_y_c(
+    const uint16_t *src, int src_stride, uint16_t *dst16, int dst16_stride,
+    int w, int h, const InterpFilterParams *filter_params_x,
+    const InterpFilterParams *filter_params_y, const int subpel_x_q4,
+    const int subpel_y_q4, ConvolveParams *conv_params, int bd) {
+  CONV_BUF_TYPE *dst = conv_params->dst;
+  int dst_stride = conv_params->dst_stride;
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  const int bits = FILTER_BITS - conv_params->round_0;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+                           (1 << (offset_bits - conv_params->round_1 - 1));
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+  assert(round_bits >= 0);
+  (void)filter_params_x;
+  (void)subpel_x_q4;
+  assert(bits >= 0);
+  // vertical filter
+  const int16_t *y_filter = av1_get_interp_filter_subpel_kernel(
+      filter_params_y, subpel_y_q4 & SUBPEL_MASK);
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      int32_t res = 0;
+      for (int k = 0; k < filter_params_y->taps; ++k) {
+        res += y_filter[k] * src[(y - fo_vert + k) * src_stride + x];
+      }
+      res *= (1 << bits);
+      res = ROUND_POWER_OF_TWO(res, conv_params->round_1) + round_offset;
+
+      if (conv_params->do_average) {
+        int32_t tmp = dst[y * dst_stride + x];
+        if (conv_params->use_dist_wtd_comp_avg) {
+          tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+          tmp = tmp >> DIST_PRECISION_BITS;
+        } else {
+          tmp += res;
+          tmp = tmp >> 1;
+        }
+        tmp -= round_offset;
+        dst16[y * dst16_stride + x] =
+            clip_pixel_highbd(ROUND_POWER_OF_TWO(tmp, round_bits), bd);
+      } else {
+        dst[y * dst_stride + x] = res;
+      }
+    }
+  }
+}
+
+void av1_highbd_dist_wtd_convolve_2d_copy_c(
+    const uint16_t *src, int src_stride, uint16_t *dst16, int dst16_stride,
+    int w, int h, const InterpFilterParams *filter_params_x,
+    const InterpFilterParams *filter_params_y, const int subpel_x_q4,
+    const int subpel_y_q4, ConvolveParams *conv_params, int bd) {
+  CONV_BUF_TYPE *dst = conv_params->dst;
+  int dst_stride = conv_params->dst_stride;
+  const int bits =
+      FILTER_BITS * 2 - conv_params->round_1 - conv_params->round_0;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+                           (1 << (offset_bits - conv_params->round_1 - 1));
+  assert(bits >= 0);
+  (void)filter_params_x;
+  (void)filter_params_y;
+  (void)subpel_x_q4;
+  (void)subpel_y_q4;
+
+  for (int y = 0; y < h; ++y) {
+    for (int x = 0; x < w; ++x) {
+      CONV_BUF_TYPE res = src[y * src_stride + x] << bits;
+      res += round_offset;
+      if (conv_params->do_average) {
+        int32_t tmp = dst[y * dst_stride + x];
+        if (conv_params->use_dist_wtd_comp_avg) {
+          tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+          tmp = tmp >> DIST_PRECISION_BITS;
+        } else {
+          tmp += res;
+          tmp = tmp >> 1;
+        }
+        tmp -= round_offset;
+        dst16[y * dst16_stride + x] =
+            clip_pixel_highbd(ROUND_POWER_OF_TWO(tmp, bits), bd);
+      } else {
+        dst[y * dst_stride + x] = res;
+      }
+    }
+  }
+}
+
+void av1_highbd_convolve_2d_scale_c(const uint16_t *src, int src_stride,
+                                    uint16_t *dst, int dst_stride, int w, int h,
+                                    const InterpFilterParams *filter_params_x,
+                                    const InterpFilterParams *filter_params_y,
+                                    const int subpel_x_qn, const int x_step_qn,
+                                    const int subpel_y_qn, const int y_step_qn,
+                                    ConvolveParams *conv_params, int bd) {
+  int16_t im_block[(2 * MAX_SB_SIZE + MAX_FILTER_TAP) * MAX_SB_SIZE];
+  int im_h = (((h - 1) * y_step_qn + subpel_y_qn) >> SCALE_SUBPEL_BITS) +
+             filter_params_y->taps;
+  int im_stride = w;
+  const int fo_vert = filter_params_y->taps / 2 - 1;
+  const int fo_horiz = filter_params_x->taps / 2 - 1;
+  CONV_BUF_TYPE *dst16 = conv_params->dst;
+  const int dst16_stride = conv_params->dst_stride;
+  const int bits =
+      FILTER_BITS * 2 - conv_params->round_0 - conv_params->round_1;
+  assert(bits >= 0);
+  // horizontal filter
+  const uint16_t *src_horiz = src - fo_vert * src_stride;
+  for (int y = 0; y < im_h; ++y) {
+    int x_qn = subpel_x_qn;
+    for (int x = 0; x < w; ++x, x_qn += x_step_qn) {
+      const uint16_t *const src_x = &src_horiz[(x_qn >> SCALE_SUBPEL_BITS)];
+      const int x_filter_idx = (x_qn & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS;
+      assert(x_filter_idx < SUBPEL_SHIFTS);
+      const int16_t *x_filter =
+          av1_get_interp_filter_subpel_kernel(filter_params_x, x_filter_idx);
+      int32_t sum = (1 << (bd + FILTER_BITS - 1));
+      for (int k = 0; k < filter_params_x->taps; ++k) {
+        sum += x_filter[k] * src_x[k - fo_horiz];
+      }
+      assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+      im_block[y * im_stride + x] =
+          (int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
+    }
+    src_horiz += src_stride;
+  }
+
+  // vertical filter
+  int16_t *src_vert = im_block + fo_vert * im_stride;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  for (int x = 0; x < w; ++x) {
+    int y_qn = subpel_y_qn;
+    for (int y = 0; y < h; ++y, y_qn += y_step_qn) {
+      const int16_t *src_y = &src_vert[(y_qn >> SCALE_SUBPEL_BITS) * im_stride];
+      const int y_filter_idx = (y_qn & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS;
+      assert(y_filter_idx < SUBPEL_SHIFTS);
+      const int16_t *y_filter =
+          av1_get_interp_filter_subpel_kernel(filter_params_y, y_filter_idx);
+      int32_t sum = 1 << offset_bits;
+      for (int k = 0; k < filter_params_y->taps; ++k) {
+        sum += y_filter[k] * src_y[(k - fo_vert) * im_stride];
+      }
+      assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+      CONV_BUF_TYPE res = ROUND_POWER_OF_TWO(sum, conv_params->round_1);
+      if (conv_params->is_compound) {
+        if (conv_params->do_average) {
+          int32_t tmp = dst16[y * dst16_stride + x];
+          if (conv_params->use_dist_wtd_comp_avg) {
+            tmp = tmp * conv_params->fwd_offset + res * conv_params->bck_offset;
+            tmp = tmp >> DIST_PRECISION_BITS;
+          } else {
+            tmp += res;
+            tmp = tmp >> 1;
+          }
+          /* Subtract round offset and convolve round */
+          tmp = tmp - ((1 << (offset_bits - conv_params->round_1)) +
+                       (1 << (offset_bits - conv_params->round_1 - 1)));
+          dst[y * dst_stride + x] =
+              clip_pixel_highbd(ROUND_POWER_OF_TWO(tmp, bits), bd);
+        } else {
+          dst16[y * dst16_stride + x] = res;
+        }
+      } else {
+        /* Subtract round offset and convolve round */
+        int32_t tmp = res - ((1 << (offset_bits - conv_params->round_1)) +
+                             (1 << (offset_bits - conv_params->round_1 - 1)));
+        dst[y * dst_stride + x] =
+            clip_pixel_highbd(ROUND_POWER_OF_TWO(tmp, bits), bd);
+      }
+    }
+    src_vert++;
+  }
+}
+
+static void highbd_convolve_2d_for_intrabc(const uint16_t *src, int src_stride,
+                                           uint16_t *dst, int dst_stride, int w,
+                                           int h, int subpel_x_q4,
+                                           int subpel_y_q4,
+                                           ConvolveParams *conv_params,
+                                           int bd) {
+  const InterpFilterParams *filter_params_x =
+      subpel_x_q4 ? &av1_intrabc_filter_params : NULL;
+  const InterpFilterParams *filter_params_y =
+      subpel_y_q4 ? &av1_intrabc_filter_params : NULL;
+  if (subpel_x_q4 != 0 && subpel_y_q4 != 0) {
+    av1_highbd_convolve_2d_sr_c(src, src_stride, dst, dst_stride, w, h,
+                                filter_params_x, filter_params_y, 0, 0,
+                                conv_params, bd);
+  } else if (subpel_x_q4 != 0) {
+    av1_highbd_convolve_x_sr_c(src, src_stride, dst, dst_stride, w, h,
+                               filter_params_x, filter_params_y, 0, 0,
+                               conv_params, bd);
+  } else {
+    av1_highbd_convolve_y_sr_c(src, src_stride, dst, dst_stride, w, h,
+                               filter_params_x, filter_params_y, 0, 0,
+                               conv_params, bd);
+  }
+}
+
+void av1_highbd_convolve_2d_facade(const uint8_t *src8, int src_stride,
+                                   uint8_t *dst8, int dst_stride, int w, int h,
+                                   InterpFilters interp_filters,
+                                   const int subpel_x_q4, int x_step_q4,
+                                   const int subpel_y_q4, int y_step_q4,
+                                   int scaled, ConvolveParams *conv_params,
+                                   const struct scale_factors *sf,
+                                   int is_intrabc, int bd) {
+  assert(IMPLIES(is_intrabc, !scaled));
+  (void)x_step_q4;
+  (void)y_step_q4;
+  (void)dst_stride;
+  const uint16_t *src = CONVERT_TO_SHORTPTR(src8);
+
+  if (is_intrabc && (subpel_x_q4 != 0 || subpel_y_q4 != 0)) {
+    uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
+    highbd_convolve_2d_for_intrabc(src, src_stride, dst, dst_stride, w, h,
+                                   subpel_x_q4, subpel_y_q4, conv_params, bd);
+    return;
+  }
+
+  InterpFilter filter_x = 0;
+  InterpFilter filter_y = 0;
+  const int need_filter_params_x = (subpel_x_q4 != 0) | scaled;
+  const int need_filter_params_y = (subpel_y_q4 != 0) | scaled;
+  if (need_filter_params_x)
+    filter_x = av1_extract_interp_filter(interp_filters, 1);
+  if (need_filter_params_y)
+    filter_y = av1_extract_interp_filter(interp_filters, 0);
+  const InterpFilterParams *filter_params_x =
+      need_filter_params_x
+          ? av1_get_interp_filter_params_with_block_size(filter_x, w)
+          : NULL;
+  const InterpFilterParams *filter_params_y =
+      need_filter_params_y
+          ? av1_get_interp_filter_params_with_block_size(filter_y, h)
+          : NULL;
+
+  if (scaled) {
+    uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
+    if (conv_params->is_compound) {
+      assert(conv_params->dst != NULL);
+    }
+    av1_highbd_convolve_2d_scale(src, src_stride, dst, dst_stride, w, h,
+                                 filter_params_x, filter_params_y, subpel_x_q4,
+                                 x_step_q4, subpel_y_q4, y_step_q4, conv_params,
+                                 bd);
+  } else {
+    uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
+
+    sf->highbd_convolve[subpel_x_q4 != 0][subpel_y_q4 !=
+                                          0][conv_params->is_compound](
+        src, src_stride, dst, dst_stride, w, h, filter_params_x,
+        filter_params_y, subpel_x_q4, subpel_y_q4, conv_params, bd);
+  }
+}
+
+// Note: Fixed size intermediate buffers, place limits on parameters
+// of some functions. 2d filtering proceeds in 2 steps:
+//   (1) Interpolate horizontally into an intermediate buffer, temp.
+//   (2) Interpolate temp vertically to derive the sub-pixel result.
+// Deriving the maximum number of rows in the temp buffer (135):
+// --Smallest scaling factor is x1/2 ==> y_step_q4 = 32 (Normative).
+// --Largest block size is 128x128 pixels.
+// --128 rows in the downscaled frame span a distance of (128 - 1) * 32 in the
+//   original frame (in 1/16th pixel units).
+// --Must round-up because block may be located at sub-pixel position.
+// --Require an additional SUBPEL_TAPS rows for the 8-tap filter tails.
+// --((128 - 1) * 32 + 15) >> 4 + 8 = 263.
+#define WIENER_MAX_EXT_SIZE 263
+
+static INLINE int horz_scalar_product(const uint8_t *a, const int16_t *b) {
+  int sum = 0;
+  for (int k = 0; k < SUBPEL_TAPS; ++k) sum += a[k] * b[k];
+  return sum;
+}
+
+static INLINE int highbd_horz_scalar_product(const uint16_t *a,
+                                             const int16_t *b) {
+  int sum = 0;
+  for (int k = 0; k < SUBPEL_TAPS; ++k) sum += a[k] * b[k];
+  return sum;
+}
+
+static INLINE int highbd_vert_scalar_product(const uint16_t *a,
+                                             ptrdiff_t a_stride,
+                                             const int16_t *b) {
+  int sum = 0;
+  for (int k = 0; k < SUBPEL_TAPS; ++k) sum += a[k * a_stride] * b[k];
+  return sum;
+}
+
+static const InterpKernel *get_filter_base(const int16_t *filter) {
+  // NOTE: This assumes that the filter table is 256-byte aligned.
+  // TODO(agrange) Modify to make independent of table alignment.
+  return (const InterpKernel *)(((intptr_t)filter) & ~((intptr_t)0xFF));
+}
+
+static int get_filter_offset(const int16_t *f, const InterpKernel *base) {
+  return (int)((const InterpKernel *)(intptr_t)f - base);
+}
+
+static void convolve_add_src_horiz_hip(const uint8_t *src, ptrdiff_t src_stride,
+                                       uint16_t *dst, ptrdiff_t dst_stride,
+                                       const InterpKernel *x_filters, int x0_q4,
+                                       int x_step_q4, int w, int h,
+                                       int round0_bits) {
+  const int bd = 8;
+  src -= SUBPEL_TAPS / 2 - 1;
+  for (int y = 0; y < h; ++y) {
+    int x_q4 = x0_q4;
+    for (int x = 0; x < w; ++x) {
+      const uint8_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
+      const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
+      const int rounding = ((int)src_x[SUBPEL_TAPS / 2 - 1] << FILTER_BITS) +
+                           (1 << (bd + FILTER_BITS - 1));
+      const int sum = horz_scalar_product(src_x, x_filter) + rounding;
+      dst[x] = (uint16_t)clamp(ROUND_POWER_OF_TWO(sum, round0_bits), 0,
+                               WIENER_CLAMP_LIMIT(round0_bits, bd) - 1);
+      x_q4 += x_step_q4;
+    }
+    src += src_stride;
+    dst += dst_stride;
+  }
+}
+
+static void convolve_add_src_vert_hip(const uint16_t *src, ptrdiff_t src_stride,
+                                      uint8_t *dst, ptrdiff_t dst_stride,
+                                      const InterpKernel *y_filters, int y0_q4,
+                                      int y_step_q4, int w, int h,
+                                      int round1_bits) {
+  const int bd = 8;
+  src -= src_stride * (SUBPEL_TAPS / 2 - 1);
+
+  for (int x = 0; x < w; ++x) {
+    int y_q4 = y0_q4;
+    for (int y = 0; y < h; ++y) {
+      const uint16_t *src_y = &src[(y_q4 >> SUBPEL_BITS) * src_stride];
+      const int16_t *const y_filter = y_filters[y_q4 & SUBPEL_MASK];
+      const int rounding =
+          ((int)src_y[(SUBPEL_TAPS / 2 - 1) * src_stride] << FILTER_BITS) -
+          (1 << (bd + round1_bits - 1));
+      const int sum =
+          highbd_vert_scalar_product(src_y, src_stride, y_filter) + rounding;
+      dst[y * dst_stride] = clip_pixel(ROUND_POWER_OF_TWO(sum, round1_bits));
+      y_q4 += y_step_q4;
+    }
+    ++src;
+    ++dst;
+  }
+}
+
+void av1_wiener_convolve_add_src_c(const uint8_t *src, ptrdiff_t src_stride,
+                                   uint8_t *dst, ptrdiff_t dst_stride,
+                                   const int16_t *filter_x, int x_step_q4,
+                                   const int16_t *filter_y, int y_step_q4,
+                                   int w, int h,
+                                   const ConvolveParams *conv_params) {
+  const InterpKernel *const filters_x = get_filter_base(filter_x);
+  const int x0_q4 = get_filter_offset(filter_x, filters_x);
+
+  const InterpKernel *const filters_y = get_filter_base(filter_y);
+  const int y0_q4 = get_filter_offset(filter_y, filters_y);
+
+  uint16_t temp[WIENER_MAX_EXT_SIZE * MAX_SB_SIZE];
+  const int intermediate_height =
+      (((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS - 1;
+  memset(temp + (intermediate_height * MAX_SB_SIZE), 0, MAX_SB_SIZE);
+
+  assert(w <= MAX_SB_SIZE);
+  assert(h <= MAX_SB_SIZE);
+  assert(y_step_q4 <= 32);
+  assert(x_step_q4 <= 32);
+
+  convolve_add_src_horiz_hip(src - src_stride * (SUBPEL_TAPS / 2 - 1),
+                             src_stride, temp, MAX_SB_SIZE, filters_x, x0_q4,
+                             x_step_q4, w, intermediate_height,
+                             conv_params->round_0);
+  convolve_add_src_vert_hip(temp + MAX_SB_SIZE * (SUBPEL_TAPS / 2 - 1),
+                            MAX_SB_SIZE, dst, dst_stride, filters_y, y0_q4,
+                            y_step_q4, w, h, conv_params->round_1);
+}
+
+static void highbd_convolve_add_src_horiz_hip(
+    const uint8_t *src8, ptrdiff_t src_stride, uint16_t *dst,
+    ptrdiff_t dst_stride, const InterpKernel *x_filters, int x0_q4,
+    int x_step_q4, int w, int h, int round0_bits, int bd) {
+  const int extraprec_clamp_limit = WIENER_CLAMP_LIMIT(round0_bits, bd);
+  uint16_t *src = CONVERT_TO_SHORTPTR(src8);
+  src -= SUBPEL_TAPS / 2 - 1;
+  for (int y = 0; y < h; ++y) {
+    int x_q4 = x0_q4;
+    for (int x = 0; x < w; ++x) {
+      const uint16_t *const src_x = &src[x_q4 >> SUBPEL_BITS];
+      const int16_t *const x_filter = x_filters[x_q4 & SUBPEL_MASK];
+      const int rounding = ((int)src_x[SUBPEL_TAPS / 2 - 1] << FILTER_BITS) +
+                           (1 << (bd + FILTER_BITS - 1));
+      const int sum = highbd_horz_scalar_product(src_x, x_filter) + rounding;
+      dst[x] = (uint16_t)clamp(ROUND_POWER_OF_TWO(sum, round0_bits), 0,
+                               extraprec_clamp_limit - 1);
+      x_q4 += x_step_q4;
+    }
+    src += src_stride;
+    dst += dst_stride;
+  }
+}
+
+static void highbd_convolve_add_src_vert_hip(
+    const uint16_t *src, ptrdiff_t src_stride, uint8_t *dst8,
+    ptrdiff_t dst_stride, const InterpKernel *y_filters, int y0_q4,
+    int y_step_q4, int w, int h, int round1_bits, int bd) {
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
+  src -= src_stride * (SUBPEL_TAPS / 2 - 1);
+  for (int x = 0; x < w; ++x) {
+    int y_q4 = y0_q4;
+    for (int y = 0; y < h; ++y) {
+      const uint16_t *src_y = &src[(y_q4 >> SUBPEL_BITS) * src_stride];
+      const int16_t *const y_filter = y_filters[y_q4 & SUBPEL_MASK];
+      const int rounding =
+          ((int)src_y[(SUBPEL_TAPS / 2 - 1) * src_stride] << FILTER_BITS) -
+          (1 << (bd + round1_bits - 1));
+      const int sum =
+          highbd_vert_scalar_product(src_y, src_stride, y_filter) + rounding;
+      dst[y * dst_stride] =
+          clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, round1_bits), bd);
+      y_q4 += y_step_q4;
+    }
+    ++src;
+    ++dst;
+  }
+}
+
+void av1_highbd_wiener_convolve_add_src_c(
+    const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
+    ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4,
+    const int16_t *filter_y, int y_step_q4, int w, int h,
+    const ConvolveParams *conv_params, int bd) {
+  const InterpKernel *const filters_x = get_filter_base(filter_x);
+  const int x0_q4 = get_filter_offset(filter_x, filters_x);
+
+  const InterpKernel *const filters_y = get_filter_base(filter_y);
+  const int y0_q4 = get_filter_offset(filter_y, filters_y);
+
+  uint16_t temp[WIENER_MAX_EXT_SIZE * MAX_SB_SIZE];
+  const int intermediate_height =
+      (((h - 1) * y_step_q4 + y0_q4) >> SUBPEL_BITS) + SUBPEL_TAPS;
+
+  assert(w <= MAX_SB_SIZE);
+  assert(h <= MAX_SB_SIZE);
+  assert(y_step_q4 <= 32);
+  assert(x_step_q4 <= 32);
+  assert(bd + FILTER_BITS - conv_params->round_0 + 2 <= 16);
+
+  highbd_convolve_add_src_horiz_hip(src - src_stride * (SUBPEL_TAPS / 2 - 1),
+                                    src_stride, temp, MAX_SB_SIZE, filters_x,
+                                    x0_q4, x_step_q4, w, intermediate_height,
+                                    conv_params->round_0, bd);
+  highbd_convolve_add_src_vert_hip(
+      temp + MAX_SB_SIZE * (SUBPEL_TAPS / 2 - 1), MAX_SB_SIZE, dst, dst_stride,
+      filters_y, y0_q4, y_step_q4, w, h, conv_params->round_1, bd);
+}

diff --git a/libav1/av1/common/convolve.h b/libav1/av1/common/convolve.h
new file mode 100644
index 0000000..e5479e6
--- /dev/null
+++ b/libav1/av1/common/convolve.h

@@ -0,0 +1,130 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_CONVOLVE_H_
+#define AOM_AV1_COMMON_CONVOLVE_H_
+#include "av1/common/filter.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef uint16_t CONV_BUF_TYPE;
+typedef struct ConvolveParams {
+  int do_average;
+  CONV_BUF_TYPE *dst;
+  int dst_stride;
+  int round_0;
+  int round_1;
+  int plane;
+  int is_compound;
+  int use_dist_wtd_comp_avg;
+  int fwd_offset;
+  int bck_offset;
+} ConvolveParams;
+
+#define ROUND0_BITS 3
+#define COMPOUND_ROUND1_BITS 7
+#define WIENER_ROUND0_BITS 3
+
+#define WIENER_CLAMP_LIMIT(r0, bd) (1 << ((bd) + 1 + FILTER_BITS - r0))
+
+typedef void (*aom_convolve_fn_t)(const uint8_t *src, int src_stride,
+                                  uint8_t *dst, int dst_stride, int w, int h,
+                                  const InterpFilterParams *filter_params_x,
+                                  const InterpFilterParams *filter_params_y,
+                                  const int subpel_x_q4, const int subpel_y_q4,
+                                  ConvolveParams *conv_params);
+
+typedef void (*aom_highbd_convolve_fn_t)(
+    const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w,
+    int h, const InterpFilterParams *filter_params_x,
+    const InterpFilterParams *filter_params_y, const int subpel_x_q4,
+    const int subpel_y_q4, ConvolveParams *conv_params, int bd);
+
+struct AV1Common;
+struct scale_factors;
+
+void av1_convolve_2d_facade(const uint8_t *src, int src_stride, uint8_t *dst,
+                            int dst_stride, int w, int h,
+                            InterpFilters interp_filters, const int subpel_x_q4,
+                            int x_step_q4, const int subpel_y_q4, int y_step_q4,
+                            int scaled, ConvolveParams *conv_params,
+                            const struct scale_factors *sf, int is_intrabc);
+
+static INLINE ConvolveParams get_conv_params_no_round(int do_average, int plane,
+                                                      CONV_BUF_TYPE *dst,
+                                                      int dst_stride,
+                                                      int is_compound, int bd) {
+  ConvolveParams conv_params;
+  conv_params.do_average = do_average;
+  assert(IMPLIES(do_average, is_compound));
+  conv_params.is_compound = is_compound;
+  conv_params.round_0 = ROUND0_BITS;
+  conv_params.round_1 = is_compound ? COMPOUND_ROUND1_BITS
+                                    : 2 * FILTER_BITS - conv_params.round_0;
+  const int intbufrange = bd + FILTER_BITS - conv_params.round_0 + 2;
+  assert(IMPLIES(bd < 12, intbufrange <= 16));
+  if (intbufrange > 16) {
+    conv_params.round_0 += intbufrange - 16;
+    if (!is_compound) conv_params.round_1 -= intbufrange - 16;
+  }
+  // TODO(yunqing): The following dst should only be valid while
+  // is_compound = 1;
+  conv_params.dst = dst;
+  conv_params.dst_stride = dst_stride;
+  conv_params.plane = plane;
+  return conv_params;
+}
+
+static INLINE ConvolveParams get_conv_params(int do_average, int plane,
+                                             int bd) {
+  return get_conv_params_no_round(do_average, plane, NULL, 0, 0, bd);
+}
+
+static INLINE ConvolveParams get_conv_params_wiener(int bd) {
+  ConvolveParams conv_params;
+  (void)bd;
+  conv_params.do_average = 0;
+  conv_params.is_compound = 0;
+  conv_params.round_0 = WIENER_ROUND0_BITS;
+  conv_params.round_1 = 2 * FILTER_BITS - conv_params.round_0;
+  const int intbufrange = bd + FILTER_BITS - conv_params.round_0 + 2;
+  assert(IMPLIES(bd < 12, intbufrange <= 16));
+  if (intbufrange > 16) {
+    conv_params.round_0 += intbufrange - 16;
+    conv_params.round_1 -= intbufrange - 16;
+  }
+  conv_params.dst = NULL;
+  conv_params.dst_stride = 0;
+  conv_params.plane = 0;
+  return conv_params;
+}
+
+void av1_highbd_convolve_2d_facade(const uint8_t *src8, int src_stride,
+                                   uint8_t *dst, int dst_stride, int w, int h,
+                                   InterpFilters interp_filters,
+                                   const int subpel_x_q4, int x_step_q4,
+                                   const int subpel_y_q4, int y_step_q4,
+                                   int scaled, ConvolveParams *conv_params,
+                                   const struct scale_factors *sf,
+                                   int is_intrabc, int bd);
+
+// TODO(sarahparker) This will need to be integerized and optimized
+void av1_convolve_2d_sobel_y_c(const uint8_t *src, int src_stride, double *dst,
+                               int dst_stride, int w, int h, int dir,
+                               double norm);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_CONVOLVE_H_

diff --git a/libav1/av1/common/debugmodes.c b/libav1/av1/common/debugmodes.c
new file mode 100644
index 0000000..b26c7dd
--- /dev/null
+++ b/libav1/av1/common/debugmodes.c

@@ -0,0 +1,107 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdio.h>
+
+#include "av1/common/blockd.h"
+#include "av1/common/enums.h"
+#include "av1/common/onyxc_int.h"
+
+static void log_frame_info(AV1_COMMON *cm, const char *str, FILE *f) {
+  fprintf(f, "%s", str);
+  fprintf(f, "(Frame %d, Show:%d, Q:%d): \n", cm->current_frame.frame_number,
+          cm->show_frame, cm->base_qindex);
+}
+/* This function dereferences a pointer to the mbmi structure
+ * and uses the passed in member offset to print out the value of an integer
+ * for each mbmi member value in the mi structure.
+ */
+static void print_mi_data(AV1_COMMON *cm, FILE *file, const char *descriptor,
+                          size_t member_offset) {
+  int mi_row, mi_col;
+  MB_MODE_INFO **mi = cm->mi_grid_visible;
+  int rows = cm->mi_rows;
+  int cols = cm->mi_cols;
+  char prefix = descriptor[0];
+
+  log_frame_info(cm, descriptor, file);
+  for (mi_row = 0; mi_row < rows; mi_row++) {
+    fprintf(file, "%c ", prefix);
+    for (mi_col = 0; mi_col < cols; mi_col++) {
+      fprintf(file, "%2d ", *((char *)((char *)(mi[0]) + member_offset)));
+      mi++;
+    }
+    fprintf(file, "\n");
+    mi += cm->mi_stride - cols;
+  }
+  fprintf(file, "\n");
+}
+
+void av1_print_modes_and_motion_vectors(AV1_COMMON *cm, const char *file) {
+  int mi_row;
+  int mi_col;
+  FILE *mvs = fopen(file, "a");
+  MB_MODE_INFO **mi = cm->mi_grid_visible;
+  int rows = cm->mi_rows;
+  int cols = cm->mi_cols;
+
+  print_mi_data(cm, mvs, "Partitions:", offsetof(MB_MODE_INFO, sb_type));
+  print_mi_data(cm, mvs, "Modes:", offsetof(MB_MODE_INFO, mode));
+  print_mi_data(cm, mvs, "Ref frame:", offsetof(MB_MODE_INFO, ref_frame[0]));
+  print_mi_data(cm, mvs, "Transform:", offsetof(MB_MODE_INFO, tx_size));
+  print_mi_data(cm, mvs, "UV Modes:", offsetof(MB_MODE_INFO, uv_mode));
+
+  // output skip infomation.
+  log_frame_info(cm, "Skips:", mvs);
+  for (mi_row = 0; mi_row < rows; mi_row++) {
+    fprintf(mvs, "S ");
+    for (mi_col = 0; mi_col < cols; mi_col++) {
+      fprintf(mvs, "%2d ", mi[0]->skip);
+      mi++;
+    }
+    fprintf(mvs, "\n");
+    mi += cm->mi_stride - cols;
+  }
+  fprintf(mvs, "\n");
+
+  // output motion vectors.
+  log_frame_info(cm, "Vectors ", mvs);
+  mi = cm->mi_grid_visible;
+  for (mi_row = 0; mi_row < rows; mi_row++) {
+    fprintf(mvs, "V ");
+    for (mi_col = 0; mi_col < cols; mi_col++) {
+      fprintf(mvs, "%4d:%4d ", mi[0]->mv[0].as_mv.row, mi[0]->mv[0].as_mv.col);
+      mi++;
+    }
+    fprintf(mvs, "\n");
+    mi += cm->mi_stride - cols;
+  }
+  fprintf(mvs, "\n");
+
+  fclose(mvs);
+}
+
+void av1_print_uncompressed_frame_header(const uint8_t *data, int size,
+                                         const char *filename) {
+  FILE *hdrFile = fopen(filename, "w");
+  fwrite(data, size, sizeof(uint8_t), hdrFile);
+  fclose(hdrFile);
+}
+
+void av1_print_frame_contexts(const FRAME_CONTEXT *fc, const char *filename) {
+  FILE *fcFile = fopen(filename, "w");
+  const uint16_t *fcp = (uint16_t *)fc;
+  const unsigned int n_contexts = sizeof(FRAME_CONTEXT) / sizeof(uint16_t);
+  unsigned int i;
+
+  for (i = 0; i < n_contexts; ++i) fprintf(fcFile, "%d ", *fcp++);
+  fclose(fcFile);
+}

diff --git a/libav1/av1/common/entropy.c b/libav1/av1/common/entropy.c
new file mode 100644
index 0000000..f63ac98
--- /dev/null
+++ b/libav1/av1/common/entropy.c

@@ -0,0 +1,178 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "config/aom_config.h"
+
+#include "aom/aom_integer.h"
+#include "aom_mem/aom_mem.h"
+#include "av1/common/blockd.h"
+#include "av1/common/entropy.h"
+#include "av1/common/entropymode.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/scan.h"
+#include "av1/common/token_cdfs.h"
+#include "av1/common/txb_common.h"
+
+static int get_q_ctx(int q) {
+  if (q <= 20) return 0;
+  if (q <= 60) return 1;
+  if (q <= 120) return 2;
+  return 3;
+}
+
+void av1_default_coef_probs(AV1_COMMON *cm) {
+  const int index = get_q_ctx(cm->base_qindex);
+#if CONFIG_ENTROPY_STATS
+  cm->coef_cdf_category = index;
+#endif
+
+  av1_copy(cm->fc->txb_skip_cdf, av1_default_txb_skip_cdfs[index]);
+  av1_copy(cm->fc->eob_extra_cdf, av1_default_eob_extra_cdfs[index]);
+  av1_copy(cm->fc->dc_sign_cdf, av1_default_dc_sign_cdfs[index]);
+  av1_copy(cm->fc->coeff_br_cdf, av1_default_coeff_lps_multi_cdfs[index]);
+  av1_copy(cm->fc->coeff_base_cdf, av1_default_coeff_base_multi_cdfs[index]);
+  av1_copy(cm->fc->coeff_base_eob_cdf,
+           av1_default_coeff_base_eob_multi_cdfs[index]);
+  av1_copy(cm->fc->eob_flag_cdf16, av1_default_eob_multi16_cdfs[index]);
+  av1_copy(cm->fc->eob_flag_cdf32, av1_default_eob_multi32_cdfs[index]);
+  av1_copy(cm->fc->eob_flag_cdf64, av1_default_eob_multi64_cdfs[index]);
+  av1_copy(cm->fc->eob_flag_cdf128, av1_default_eob_multi128_cdfs[index]);
+  av1_copy(cm->fc->eob_flag_cdf256, av1_default_eob_multi256_cdfs[index]);
+  av1_copy(cm->fc->eob_flag_cdf512, av1_default_eob_multi512_cdfs[index]);
+  av1_copy(cm->fc->eob_flag_cdf1024, av1_default_eob_multi1024_cdfs[index]);
+}
+
+static void reset_cdf_symbol_counter(aom_cdf_prob *cdf_ptr, int num_cdfs,
+                                     int cdf_stride, int nsymbs) {
+  for (int i = 0; i < num_cdfs; i++) {
+    cdf_ptr[i * cdf_stride + nsymbs] = 0;
+  }
+}
+
+#define RESET_CDF_COUNTER(cname, nsymbs) \
+  RESET_CDF_COUNTER_STRIDE(cname, nsymbs, CDF_SIZE(nsymbs))
+
+#define RESET_CDF_COUNTER_STRIDE(cname, nsymbs, cdf_stride)          \
+  do {                                                               \
+    aom_cdf_prob *cdf_ptr = (aom_cdf_prob *)cname;                   \
+    int array_size = (int)sizeof(cname) / sizeof(aom_cdf_prob);      \
+    int num_cdfs = array_size / cdf_stride;                          \
+    reset_cdf_symbol_counter(cdf_ptr, num_cdfs, cdf_stride, nsymbs); \
+  } while (0)
+
+static void reset_nmv_counter(nmv_context *nmv) {
+  RESET_CDF_COUNTER(nmv->joints_cdf, 4);
+  for (int i = 0; i < 2; i++) {
+    RESET_CDF_COUNTER(nmv->comps[i].classes_cdf, MV_CLASSES);
+    RESET_CDF_COUNTER(nmv->comps[i].class0_fp_cdf, MV_FP_SIZE);
+    RESET_CDF_COUNTER(nmv->comps[i].fp_cdf, MV_FP_SIZE);
+    RESET_CDF_COUNTER(nmv->comps[i].sign_cdf, 2);
+    RESET_CDF_COUNTER(nmv->comps[i].class0_hp_cdf, 2);
+    RESET_CDF_COUNTER(nmv->comps[i].hp_cdf, 2);
+    RESET_CDF_COUNTER(nmv->comps[i].class0_cdf, CLASS0_SIZE);
+    RESET_CDF_COUNTER(nmv->comps[i].bits_cdf, 2);
+  }
+}
+
+void av1_reset_cdf_symbol_counters(FRAME_CONTEXT *fc) {
+  RESET_CDF_COUNTER(fc->txb_skip_cdf, 2);
+  RESET_CDF_COUNTER(fc->eob_extra_cdf, 2);
+  RESET_CDF_COUNTER(fc->dc_sign_cdf, 2);
+  RESET_CDF_COUNTER(fc->eob_flag_cdf16, 5);
+  RESET_CDF_COUNTER(fc->eob_flag_cdf32, 6);
+  RESET_CDF_COUNTER(fc->eob_flag_cdf64, 7);
+  RESET_CDF_COUNTER(fc->eob_flag_cdf128, 8);
+  RESET_CDF_COUNTER(fc->eob_flag_cdf256, 9);
+  RESET_CDF_COUNTER(fc->eob_flag_cdf512, 10);
+  RESET_CDF_COUNTER(fc->eob_flag_cdf1024, 11);
+  RESET_CDF_COUNTER(fc->coeff_base_eob_cdf, 3);
+  RESET_CDF_COUNTER(fc->coeff_base_cdf, 4);
+  RESET_CDF_COUNTER(fc->coeff_br_cdf, BR_CDF_SIZE);
+  RESET_CDF_COUNTER(fc->newmv_cdf, 2);
+  RESET_CDF_COUNTER(fc->zeromv_cdf, 2);
+  RESET_CDF_COUNTER(fc->refmv_cdf, 2);
+  RESET_CDF_COUNTER(fc->drl_cdf, 2);
+  RESET_CDF_COUNTER(fc->inter_compound_mode_cdf, INTER_COMPOUND_MODES);
+  RESET_CDF_COUNTER(fc->compound_type_cdf, MASKED_COMPOUND_TYPES);
+  RESET_CDF_COUNTER(fc->wedge_idx_cdf, 16);
+  RESET_CDF_COUNTER(fc->interintra_cdf, 2);
+  RESET_CDF_COUNTER(fc->wedge_interintra_cdf, 2);
+  RESET_CDF_COUNTER(fc->interintra_mode_cdf, INTERINTRA_MODES);
+  RESET_CDF_COUNTER(fc->motion_mode_cdf, MOTION_MODES);
+  RESET_CDF_COUNTER(fc->obmc_cdf, 2);
+  RESET_CDF_COUNTER(fc->palette_y_size_cdf, PALETTE_SIZES);
+  RESET_CDF_COUNTER(fc->palette_uv_size_cdf, PALETTE_SIZES);
+  for (int j = 0; j < PALETTE_SIZES; j++) {
+    int nsymbs = j + PALETTE_MIN_SIZE;
+    RESET_CDF_COUNTER_STRIDE(fc->palette_y_color_index_cdf[j], nsymbs,
+                             CDF_SIZE(PALETTE_COLORS));
+    RESET_CDF_COUNTER_STRIDE(fc->palette_uv_color_index_cdf[j], nsymbs,
+                             CDF_SIZE(PALETTE_COLORS));
+  }
+  RESET_CDF_COUNTER(fc->palette_y_mode_cdf, 2);
+  RESET_CDF_COUNTER(fc->palette_uv_mode_cdf, 2);
+  RESET_CDF_COUNTER(fc->comp_inter_cdf, 2);
+  RESET_CDF_COUNTER(fc->single_ref_cdf, 2);
+  RESET_CDF_COUNTER(fc->comp_ref_type_cdf, 2);
+  RESET_CDF_COUNTER(fc->uni_comp_ref_cdf, 2);
+  RESET_CDF_COUNTER(fc->comp_ref_cdf, 2);
+  RESET_CDF_COUNTER(fc->comp_bwdref_cdf, 2);
+  RESET_CDF_COUNTER(fc->txfm_partition_cdf, 2);
+  RESET_CDF_COUNTER(fc->compound_index_cdf, 2);
+  RESET_CDF_COUNTER(fc->comp_group_idx_cdf, 2);
+  RESET_CDF_COUNTER(fc->skip_mode_cdfs, 2);
+  RESET_CDF_COUNTER(fc->skip_cdfs, 2);
+  RESET_CDF_COUNTER(fc->intra_inter_cdf, 2);
+  reset_nmv_counter(&fc->nmvc);
+  reset_nmv_counter(&fc->ndvc);
+  RESET_CDF_COUNTER(fc->intrabc_cdf, 2);
+  RESET_CDF_COUNTER(fc->seg.tree_cdf, MAX_SEGMENTS);
+  RESET_CDF_COUNTER(fc->seg.pred_cdf, 2);
+  RESET_CDF_COUNTER(fc->seg.spatial_pred_seg_cdf, MAX_SEGMENTS);
+  RESET_CDF_COUNTER(fc->filter_intra_cdfs, 2);
+  RESET_CDF_COUNTER(fc->filter_intra_mode_cdf, FILTER_INTRA_MODES);
+  RESET_CDF_COUNTER(fc->switchable_restore_cdf, RESTORE_SWITCHABLE_TYPES);
+  RESET_CDF_COUNTER(fc->wiener_restore_cdf, 2);
+  RESET_CDF_COUNTER(fc->sgrproj_restore_cdf, 2);
+  RESET_CDF_COUNTER(fc->y_mode_cdf, INTRA_MODES);
+  RESET_CDF_COUNTER_STRIDE(fc->uv_mode_cdf[0], UV_INTRA_MODES - 1,
+                           CDF_SIZE(UV_INTRA_MODES));
+  RESET_CDF_COUNTER(fc->uv_mode_cdf[1], UV_INTRA_MODES);
+  for (int i = 0; i < PARTITION_CONTEXTS; i++) {
+    if (i < 4) {
+      RESET_CDF_COUNTER_STRIDE(fc->partition_cdf[i], 4, CDF_SIZE(10));
+    } else if (i < 16) {
+      RESET_CDF_COUNTER(fc->partition_cdf[i], 10);
+    } else {
+      RESET_CDF_COUNTER_STRIDE(fc->partition_cdf[i], 8, CDF_SIZE(10));
+    }
+  }
+  RESET_CDF_COUNTER(fc->switchable_interp_cdf, SWITCHABLE_FILTERS);
+  RESET_CDF_COUNTER(fc->kf_y_cdf, INTRA_MODES);
+  RESET_CDF_COUNTER(fc->angle_delta_cdf, 2 * MAX_ANGLE_DELTA + 1);
+  RESET_CDF_COUNTER_STRIDE(fc->tx_size_cdf[0], MAX_TX_DEPTH,
+                           CDF_SIZE(MAX_TX_DEPTH + 1));
+  RESET_CDF_COUNTER(fc->tx_size_cdf[1], MAX_TX_DEPTH + 1);
+  RESET_CDF_COUNTER(fc->tx_size_cdf[2], MAX_TX_DEPTH + 1);
+  RESET_CDF_COUNTER(fc->tx_size_cdf[3], MAX_TX_DEPTH + 1);
+  RESET_CDF_COUNTER(fc->delta_q_cdf, DELTA_Q_PROBS + 1);
+  RESET_CDF_COUNTER(fc->delta_lf_cdf, DELTA_LF_PROBS + 1);
+  for (int i = 0; i < FRAME_LF_COUNT; i++) {
+    RESET_CDF_COUNTER(fc->delta_lf_multi_cdf[i], DELTA_LF_PROBS + 1);
+  }
+  RESET_CDF_COUNTER_STRIDE(fc->intra_ext_tx_cdf[1], 7, CDF_SIZE(TX_TYPES));
+  RESET_CDF_COUNTER_STRIDE(fc->intra_ext_tx_cdf[2], 5, CDF_SIZE(TX_TYPES));
+  RESET_CDF_COUNTER_STRIDE(fc->inter_ext_tx_cdf[1], 16, CDF_SIZE(TX_TYPES));
+  RESET_CDF_COUNTER_STRIDE(fc->inter_ext_tx_cdf[2], 12, CDF_SIZE(TX_TYPES));
+  RESET_CDF_COUNTER_STRIDE(fc->inter_ext_tx_cdf[3], 2, CDF_SIZE(TX_TYPES));
+  RESET_CDF_COUNTER(fc->cfl_sign_cdf, CFL_JOINT_SIGNS);
+  RESET_CDF_COUNTER(fc->cfl_alpha_cdf, CFL_ALPHABET_SIZE);
+}

diff --git a/libav1/av1/common/entropy.h b/libav1/av1/common/entropy.h
new file mode 100644
index 0000000..41218d3
--- /dev/null
+++ b/libav1/av1/common/entropy.h

@@ -0,0 +1,181 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_ENTROPY_H_
+#define AOM_AV1_COMMON_ENTROPY_H_
+
+#include "config/aom_config.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/prob.h"
+
+#include "av1/common/common.h"
+#include "av1/common/common_data.h"
+#include "av1/common/enums.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define TOKEN_CDF_Q_CTXS 4
+
+#define TXB_SKIP_CONTEXTS 13
+
+#define EOB_COEF_CONTEXTS 9
+
+#define SIG_COEF_CONTEXTS_2D 26
+#define SIG_COEF_CONTEXTS_1D 16
+#define SIG_COEF_CONTEXTS_EOB 4
+#define SIG_COEF_CONTEXTS (SIG_COEF_CONTEXTS_2D + SIG_COEF_CONTEXTS_1D)
+
+#define COEFF_BASE_CONTEXTS (SIG_COEF_CONTEXTS)
+#define DC_SIGN_CONTEXTS 3
+
+#define BR_TMP_OFFSET 12
+#define BR_REF_CAT 4
+#define LEVEL_CONTEXTS 21
+
+#define NUM_BASE_LEVELS 2
+
+#define BR_CDF_SIZE (4)
+#define COEFF_BASE_RANGE (4 * (BR_CDF_SIZE - 1))
+
+#define COEFF_CONTEXT_BITS 6
+#define COEFF_CONTEXT_MASK ((1 << COEFF_CONTEXT_BITS) - 1)
+#define MAX_BASE_BR_RANGE (COEFF_BASE_RANGE + NUM_BASE_LEVELS + 1)
+
+#define BASE_CONTEXT_POSITION_NUM 12
+
+enum {
+  TX_CLASS_2D = 0,
+  TX_CLASS_HORIZ = 1,
+  TX_CLASS_VERT = 2,
+  TX_CLASSES = 3,
+} UENUM1BYTE(TX_CLASS);
+
+#define DCT_MAX_VALUE 16384
+#define DCT_MAX_VALUE_HIGH10 65536
+#define DCT_MAX_VALUE_HIGH12 262144
+
+/* Coefficients are predicted via a 3-dimensional probability table indexed on
+ * REF_TYPES, COEF_BANDS and COEF_CONTEXTS. */
+#define REF_TYPES 2  // intra=0, inter=1
+
+struct AV1Common;
+struct frame_contexts;
+void av1_reset_cdf_symbol_counters(struct frame_contexts *fc);
+void av1_default_coef_probs(struct AV1Common *cm);
+
+struct frame_contexts;
+
+typedef char ENTROPY_CONTEXT;
+
+static INLINE int combine_entropy_contexts(ENTROPY_CONTEXT a,
+                                           ENTROPY_CONTEXT b) {
+  return (a != 0) + (b != 0);
+}
+
+static INLINE int get_entropy_context(TX_SIZE tx_size, const ENTROPY_CONTEXT *a,
+                                      const ENTROPY_CONTEXT *l) {
+  ENTROPY_CONTEXT above_ec = 0, left_ec = 0;
+
+  switch (tx_size) {
+    case TX_4X4:
+      above_ec = a[0] != 0;
+      left_ec = l[0] != 0;
+      break;
+    case TX_4X8:
+      above_ec = a[0] != 0;
+      left_ec = !!*(const uint16_t *)l;
+      break;
+    case TX_8X4:
+      above_ec = !!*(const uint16_t *)a;
+      left_ec = l[0] != 0;
+      break;
+    case TX_8X16:
+      above_ec = !!*(const uint16_t *)a;
+      left_ec = !!*(const uint32_t *)l;
+      break;
+    case TX_16X8:
+      above_ec = !!*(const uint32_t *)a;
+      left_ec = !!*(const uint16_t *)l;
+      break;
+    case TX_16X32:
+      above_ec = !!*(const uint32_t *)a;
+      left_ec = !!*(const uint64_t *)l;
+      break;
+    case TX_32X16:
+      above_ec = !!*(const uint64_t *)a;
+      left_ec = !!*(const uint32_t *)l;
+      break;
+    case TX_8X8:
+      above_ec = !!*(const uint16_t *)a;
+      left_ec = !!*(const uint16_t *)l;
+      break;
+    case TX_16X16:
+      above_ec = !!*(const uint32_t *)a;
+      left_ec = !!*(const uint32_t *)l;
+      break;
+    case TX_32X32:
+      above_ec = !!*(const uint64_t *)a;
+      left_ec = !!*(const uint64_t *)l;
+      break;
+    case TX_64X64:
+      above_ec = !!(*(const uint64_t *)a | *(const uint64_t *)(a + 8));
+      left_ec = !!(*(const uint64_t *)l | *(const uint64_t *)(l + 8));
+      break;
+    case TX_32X64:
+      above_ec = !!*(const uint64_t *)a;
+      left_ec = !!(*(const uint64_t *)l | *(const uint64_t *)(l + 8));
+      break;
+    case TX_64X32:
+      above_ec = !!(*(const uint64_t *)a | *(const uint64_t *)(a + 8));
+      left_ec = !!*(const uint64_t *)l;
+      break;
+    case TX_4X16:
+      above_ec = a[0] != 0;
+      left_ec = !!*(const uint32_t *)l;
+      break;
+    case TX_16X4:
+      above_ec = !!*(const uint32_t *)a;
+      left_ec = l[0] != 0;
+      break;
+    case TX_8X32:
+      above_ec = !!*(const uint16_t *)a;
+      left_ec = !!*(const uint64_t *)l;
+      break;
+    case TX_32X8:
+      above_ec = !!*(const uint64_t *)a;
+      left_ec = !!*(const uint16_t *)l;
+      break;
+    case TX_16X64:
+      above_ec = !!*(const uint32_t *)a;
+      left_ec = !!(*(const uint64_t *)l | *(const uint64_t *)(l + 8));
+      break;
+    case TX_64X16:
+      above_ec = !!(*(const uint64_t *)a | *(const uint64_t *)(a + 8));
+      left_ec = !!*(const uint32_t *)l;
+      break;
+    default: assert(0 && "Invalid transform size."); break;
+  }
+  return combine_entropy_contexts(above_ec, left_ec);
+}
+
+static INLINE TX_SIZE get_txsize_entropy_ctx(TX_SIZE txsize) {
+  return (TX_SIZE)((txsize_sqr_map[txsize] + txsize_sqr_up_map[txsize] + 1) >>
+                   1);
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_ENTROPY_H_

diff --git a/libav1/av1/common/entropymode.c b/libav1/av1/common/entropymode.c
new file mode 100644
index 0000000..21e46e3
--- /dev/null
+++ b/libav1/av1/common/entropymode.c

@@ -0,0 +1,1102 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "aom_mem/aom_mem.h"
+
+#include "av1/common/reconinter.h"
+#include "av1/common/scan.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/seg_common.h"
+#include "av1/common/txb_common.h"
+
+static const aom_cdf_prob
+    default_kf_y_mode_cdf[KF_MODE_CONTEXTS][KF_MODE_CONTEXTS][CDF_SIZE(
+        INTRA_MODES)] = {
+      { { AOM_CDF13(15588, 17027, 19338, 20218, 20682, 21110, 21825, 23244,
+                    24189, 28165, 29093, 30466) },
+        { AOM_CDF13(12016, 18066, 19516, 20303, 20719, 21444, 21888, 23032,
+                    24434, 28658, 30172, 31409) },
+        { AOM_CDF13(10052, 10771, 22296, 22788, 23055, 23239, 24133, 25620,
+                    26160, 29336, 29929, 31567) },
+        { AOM_CDF13(14091, 15406, 16442, 18808, 19136, 19546, 19998, 22096,
+                    24746, 29585, 30958, 32462) },
+        { AOM_CDF13(12122, 13265, 15603, 16501, 18609, 20033, 22391, 25583,
+                    26437, 30261, 31073, 32475) } },
+      { { AOM_CDF13(10023, 19585, 20848, 21440, 21832, 22760, 23089, 24023,
+                    25381, 29014, 30482, 31436) },
+        { AOM_CDF13(5983, 24099, 24560, 24886, 25066, 25795, 25913, 26423,
+                    27610, 29905, 31276, 31794) },
+        { AOM_CDF13(7444, 12781, 20177, 20728, 21077, 21607, 22170, 23405,
+                    24469, 27915, 29090, 30492) },
+        { AOM_CDF13(8537, 14689, 15432, 17087, 17408, 18172, 18408, 19825,
+                    24649, 29153, 31096, 32210) },
+        { AOM_CDF13(7543, 14231, 15496, 16195, 17905, 20717, 21984, 24516,
+                    26001, 29675, 30981, 31994) } },
+      { { AOM_CDF13(12613, 13591, 21383, 22004, 22312, 22577, 23401, 25055,
+                    25729, 29538, 30305, 32077) },
+        { AOM_CDF13(9687, 13470, 18506, 19230, 19604, 20147, 20695, 22062,
+                    23219, 27743, 29211, 30907) },
+        { AOM_CDF13(6183, 6505, 26024, 26252, 26366, 26434, 27082, 28354, 28555,
+                    30467, 30794, 32086) },
+        { AOM_CDF13(10718, 11734, 14954, 17224, 17565, 17924, 18561, 21523,
+                    23878, 28975, 30287, 32252) },
+        { AOM_CDF13(9194, 9858, 16501, 17263, 18424, 19171, 21563, 25961, 26561,
+                    30072, 30737, 32463) } },
+      { { AOM_CDF13(12602, 14399, 15488, 18381, 18778, 19315, 19724, 21419,
+                    25060, 29696, 30917, 32409) },
+        { AOM_CDF13(8203, 13821, 14524, 17105, 17439, 18131, 18404, 19468,
+                    25225, 29485, 31158, 32342) },
+        { AOM_CDF13(8451, 9731, 15004, 17643, 18012, 18425, 19070, 21538, 24605,
+                    29118, 30078, 32018) },
+        { AOM_CDF13(7714, 9048, 9516, 16667, 16817, 16994, 17153, 18767, 26743,
+                    30389, 31536, 32528) },
+        { AOM_CDF13(8843, 10280, 11496, 15317, 16652, 17943, 19108, 22718,
+                    25769, 29953, 30983, 32485) } },
+      { { AOM_CDF13(12578, 13671, 15979, 16834, 19075, 20913, 22989, 25449,
+                    26219, 30214, 31150, 32477) },
+        { AOM_CDF13(9563, 13626, 15080, 15892, 17756, 20863, 22207, 24236,
+                    25380, 29653, 31143, 32277) },
+        { AOM_CDF13(8356, 8901, 17616, 18256, 19350, 20106, 22598, 25947, 26466,
+                    29900, 30523, 32261) },
+        { AOM_CDF13(10835, 11815, 13124, 16042, 17018, 18039, 18947, 22753,
+                    24615, 29489, 30883, 32482) },
+        { AOM_CDF13(7618, 8288, 9859, 10509, 15386, 18657, 22903, 28776, 29180,
+                    31355, 31802, 32593) } }
+    };
+
+static const aom_cdf_prob default_angle_delta_cdf[DIRECTIONAL_MODES][CDF_SIZE(
+    2 * MAX_ANGLE_DELTA + 1)] = {
+  { AOM_CDF7(2180, 5032, 7567, 22776, 26989, 30217) },
+  { AOM_CDF7(2301, 5608, 8801, 23487, 26974, 30330) },
+  { AOM_CDF7(3780, 11018, 13699, 19354, 23083, 31286) },
+  { AOM_CDF7(4581, 11226, 15147, 17138, 21834, 28397) },
+  { AOM_CDF7(1737, 10927, 14509, 19588, 22745, 28823) },
+  { AOM_CDF7(2664, 10176, 12485, 17650, 21600, 30495) },
+  { AOM_CDF7(2240, 11096, 15453, 20341, 22561, 28917) },
+  { AOM_CDF7(3605, 10428, 12459, 17676, 21244, 30655) }
+};
+
+static const aom_cdf_prob default_if_y_mode_cdf[BLOCK_SIZE_GROUPS][CDF_SIZE(
+    INTRA_MODES)] = { { AOM_CDF13(22801, 23489, 24293, 24756, 25601, 26123,
+                                  26606, 27418, 27945, 29228, 29685, 30349) },
+                      { AOM_CDF13(18673, 19845, 22631, 23318, 23950, 24649,
+                                  25527, 27364, 28152, 29701, 29984, 30852) },
+                      { AOM_CDF13(19770, 20979, 23396, 23939, 24241, 24654,
+                                  25136, 27073, 27830, 29360, 29730, 30659) },
+                      { AOM_CDF13(20155, 21301, 22838, 23178, 23261, 23533,
+                                  23703, 24804, 25352, 26575, 27016, 28049) } };
+
+static const aom_cdf_prob
+    default_uv_mode_cdf[CFL_ALLOWED_TYPES][INTRA_MODES][CDF_SIZE(
+        UV_INTRA_MODES)] = {
+      { { AOM_CDF13(22631, 24152, 25378, 25661, 25986, 26520, 27055, 27923,
+                    28244, 30059, 30941, 31961) },
+        { AOM_CDF13(9513, 26881, 26973, 27046, 27118, 27664, 27739, 27824,
+                    28359, 29505, 29800, 31796) },
+        { AOM_CDF13(9845, 9915, 28663, 28704, 28757, 28780, 29198, 29822, 29854,
+                    30764, 31777, 32029) },
+        { AOM_CDF13(13639, 13897, 14171, 25331, 25606, 25727, 25953, 27148,
+                    28577, 30612, 31355, 32493) },
+        { AOM_CDF13(9764, 9835, 9930, 9954, 25386, 27053, 27958, 28148, 28243,
+                    31101, 31744, 32363) },
+        { AOM_CDF13(11825, 13589, 13677, 13720, 15048, 29213, 29301, 29458,
+                    29711, 31161, 31441, 32550) },
+        { AOM_CDF13(14175, 14399, 16608, 16821, 17718, 17775, 28551, 30200,
+                    30245, 31837, 32342, 32667) },
+        { AOM_CDF13(12885, 13038, 14978, 15590, 15673, 15748, 16176, 29128,
+                    29267, 30643, 31961, 32461) },
+        { AOM_CDF13(12026, 13661, 13874, 15305, 15490, 15726, 15995, 16273,
+                    28443, 30388, 30767, 32416) },
+        { AOM_CDF13(19052, 19840, 20579, 20916, 21150, 21467, 21885, 22719,
+                    23174, 28861, 30379, 32175) },
+        { AOM_CDF13(18627, 19649, 20974, 21219, 21492, 21816, 22199, 23119,
+                    23527, 27053, 31397, 32148) },
+        { AOM_CDF13(17026, 19004, 19997, 20339, 20586, 21103, 21349, 21907,
+                    22482, 25896, 26541, 31819) },
+        { AOM_CDF13(12124, 13759, 14959, 14992, 15007, 15051, 15078, 15166,
+                    15255, 15753, 16039, 16606) } },
+      { { AOM_CDF14(10407, 11208, 12900, 13181, 13823, 14175, 14899, 15656,
+                    15986, 20086, 20995, 22455, 24212) },
+        { AOM_CDF14(4532, 19780, 20057, 20215, 20428, 21071, 21199, 21451,
+                    22099, 24228, 24693, 27032, 29472) },
+        { AOM_CDF14(5273, 5379, 20177, 20270, 20385, 20439, 20949, 21695, 21774,
+                    23138, 24256, 24703, 26679) },
+        { AOM_CDF14(6740, 7167, 7662, 14152, 14536, 14785, 15034, 16741, 18371,
+                    21520, 22206, 23389, 24182) },
+        { AOM_CDF14(4987, 5368, 5928, 6068, 19114, 20315, 21857, 22253, 22411,
+                    24911, 25380, 26027, 26376) },
+        { AOM_CDF14(5370, 6889, 7247, 7393, 9498, 21114, 21402, 21753, 21981,
+                    24780, 25386, 26517, 27176) },
+        { AOM_CDF14(4816, 4961, 7204, 7326, 8765, 8930, 20169, 20682, 20803,
+                    23188, 23763, 24455, 24940) },
+        { AOM_CDF14(6608, 6740, 8529, 9049, 9257, 9356, 9735, 18827, 19059,
+                    22336, 23204, 23964, 24793) },
+        { AOM_CDF14(5998, 7419, 7781, 8933, 9255, 9549, 9753, 10417, 18898,
+                    22494, 23139, 24764, 25989) },
+        { AOM_CDF14(10660, 11298, 12550, 12957, 13322, 13624, 14040, 15004,
+                    15534, 20714, 21789, 23443, 24861) },
+        { AOM_CDF14(10522, 11530, 12552, 12963, 13378, 13779, 14245, 15235,
+                    15902, 20102, 22696, 23774, 25838) },
+        { AOM_CDF14(10099, 10691, 12639, 13049, 13386, 13665, 14125, 15163,
+                    15636, 19676, 20474, 23519, 25208) },
+        { AOM_CDF14(3144, 5087, 7382, 7504, 7593, 7690, 7801, 8064, 8232, 9248,
+                    9875, 10521, 29048) } }
+    };
+
+static const aom_cdf_prob default_partition_cdf[PARTITION_CONTEXTS][CDF_SIZE(
+    EXT_PARTITION_TYPES)] = {
+  { AOM_CDF4(19132, 25510, 30392) },
+  { AOM_CDF4(13928, 19855, 28540) },
+  { AOM_CDF4(12522, 23679, 28629) },
+  { AOM_CDF4(9896, 18783, 25853) },
+  { AOM_CDF10(15597, 20929, 24571, 26706, 27664, 28821, 29601, 30571, 31902) },
+  { AOM_CDF10(7925, 11043, 16785, 22470, 23971, 25043, 26651, 28701, 29834) },
+  { AOM_CDF10(5414, 13269, 15111, 20488, 22360, 24500, 25537, 26336, 32117) },
+  { AOM_CDF10(2662, 6362, 8614, 20860, 23053, 24778, 26436, 27829, 31171) },
+  { AOM_CDF10(18462, 20920, 23124, 27647, 28227, 29049, 29519, 30178, 31544) },
+  { AOM_CDF10(7689, 9060, 12056, 24992, 25660, 26182, 26951, 28041, 29052) },
+  { AOM_CDF10(6015, 9009, 10062, 24544, 25409, 26545, 27071, 27526, 32047) },
+  { AOM_CDF10(1394, 2208, 2796, 28614, 29061, 29466, 29840, 30185, 31899) },
+  { AOM_CDF10(20137, 21547, 23078, 29566, 29837, 30261, 30524, 30892, 31724) },
+  { AOM_CDF10(6732, 7490, 9497, 27944, 28250, 28515, 28969, 29630, 30104) },
+  { AOM_CDF10(5945, 7663, 8348, 28683, 29117, 29749, 30064, 30298, 32238) },
+  { AOM_CDF10(870, 1212, 1487, 31198, 31394, 31574, 31743, 31881, 32332) },
+  { AOM_CDF8(27899, 28219, 28529, 32484, 32539, 32619, 32639) },
+  { AOM_CDF8(6607, 6990, 8268, 32060, 32219, 32338, 32371) },
+  { AOM_CDF8(5429, 6676, 7122, 32027, 32227, 32531, 32582) },
+  { AOM_CDF8(711, 966, 1172, 32448, 32538, 32617, 32664) },
+};
+
+static const aom_cdf_prob default_intra_ext_tx_cdf
+    [EXT_TX_SETS_INTRA][EXT_TX_SIZES][INTRA_MODES][CDF_SIZE(TX_TYPES)] = {
+      {
+          {
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+          },
+          {
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+          },
+          {
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+          },
+          {
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+              { 0 },
+          },
+      },
+      {
+          {
+              { AOM_CDF7(1535, 8035, 9461, 12751, 23467, 27825) },
+              { AOM_CDF7(564, 3335, 9709, 10870, 18143, 28094) },
+              { AOM_CDF7(672, 3247, 3676, 11982, 19415, 23127) },
+              { AOM_CDF7(5279, 13885, 15487, 18044, 23527, 30252) },
+              { AOM_CDF7(4423, 6074, 7985, 10416, 25693, 29298) },
+              { AOM_CDF7(1486, 4241, 9460, 10662, 16456, 27694) },
+              { AOM_CDF7(439, 2838, 3522, 6737, 18058, 23754) },
+              { AOM_CDF7(1190, 4233, 4855, 11670, 20281, 24377) },
+              { AOM_CDF7(1045, 4312, 8647, 10159, 18644, 29335) },
+              { AOM_CDF7(202, 3734, 4747, 7298, 17127, 24016) },
+              { AOM_CDF7(447, 4312, 6819, 8884, 16010, 23858) },
+              { AOM_CDF7(277, 4369, 5255, 8905, 16465, 22271) },
+              { AOM_CDF7(3409, 5436, 10599, 15599, 19687, 24040) },
+          },
+          {
+              { AOM_CDF7(1870, 13742, 14530, 16498, 23770, 27698) },
+              { AOM_CDF7(326, 8796, 14632, 15079, 19272, 27486) },
+              { AOM_CDF7(484, 7576, 7712, 14443, 19159, 22591) },
+              { AOM_CDF7(1126, 15340, 15895, 17023, 20896, 30279) },
+              { AOM_CDF7(655, 4854, 5249, 5913, 22099, 27138) },
+              { AOM_CDF7(1299, 6458, 8885, 9290, 14851, 25497) },
+              { AOM_CDF7(311, 5295, 5552, 6885, 16107, 22672) },
+              { AOM_CDF7(883, 8059, 8270, 11258, 17289, 21549) },
+              { AOM_CDF7(741, 7580, 9318, 10345, 16688, 29046) },
+              { AOM_CDF7(110, 7406, 7915, 9195, 16041, 23329) },
+              { AOM_CDF7(363, 7974, 9357, 10673, 15629, 24474) },
+              { AOM_CDF7(153, 7647, 8112, 9936, 15307, 19996) },
+              { AOM_CDF7(3511, 6332, 11165, 15335, 19323, 23594) },
+          },
+          {
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+          },
+          {
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+              { AOM_CDF7(4681, 9362, 14043, 18725, 23406, 28087) },
+          },
+      },
+      {
+          {
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+          },
+          {
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+          },
+          {
+              { AOM_CDF5(1127, 12814, 22772, 27483) },
+              { AOM_CDF5(145, 6761, 11980, 26667) },
+              { AOM_CDF5(362, 5887, 11678, 16725) },
+              { AOM_CDF5(385, 15213, 18587, 30693) },
+              { AOM_CDF5(25, 2914, 23134, 27903) },
+              { AOM_CDF5(60, 4470, 11749, 23991) },
+              { AOM_CDF5(37, 3332, 14511, 21448) },
+              { AOM_CDF5(157, 6320, 13036, 17439) },
+              { AOM_CDF5(119, 6719, 12906, 29396) },
+              { AOM_CDF5(47, 5537, 12576, 21499) },
+              { AOM_CDF5(269, 6076, 11258, 23115) },
+              { AOM_CDF5(83, 5615, 12001, 17228) },
+              { AOM_CDF5(1968, 5556, 12023, 18547) },
+          },
+          {
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+              { AOM_CDF5(6554, 13107, 19661, 26214) },
+          },
+      },
+    };
+
+static const aom_cdf_prob
+    default_inter_ext_tx_cdf[EXT_TX_SETS_INTER][EXT_TX_SIZES][CDF_SIZE(
+        TX_TYPES)] = {
+      {
+          { 0 },
+          { 0 },
+          { 0 },
+          { 0 },
+      },
+      {
+          { AOM_CDF16(4458, 5560, 7695, 9709, 13330, 14789, 17537, 20266, 21504,
+                      22848, 23934, 25474, 27727, 28915, 30631) },
+          { AOM_CDF16(1645, 2573, 4778, 5711, 7807, 8622, 10522, 15357, 17674,
+                      20408, 22517, 25010, 27116, 28856, 30749) },
+          { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                      20480, 22528, 24576, 26624, 28672, 30720) },
+          { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                      20480, 22528, 24576, 26624, 28672, 30720) },
+      },
+      {
+          { AOM_CDF12(2731, 5461, 8192, 10923, 13653, 16384, 19115, 21845,
+                      24576, 27307, 30037) },
+          { AOM_CDF12(2731, 5461, 8192, 10923, 13653, 16384, 19115, 21845,
+                      24576, 27307, 30037) },
+          { AOM_CDF12(770, 2421, 5225, 12907, 15819, 18927, 21561, 24089, 26595,
+                      28526, 30529) },
+          { AOM_CDF12(2731, 5461, 8192, 10923, 13653, 16384, 19115, 21845,
+                      24576, 27307, 30037) },
+      },
+      {
+          { AOM_CDF2(16384) },
+          { AOM_CDF2(4167) },
+          { AOM_CDF2(1998) },
+          { AOM_CDF2(748) },
+      },
+    };
+
+static const aom_cdf_prob default_cfl_sign_cdf[CDF_SIZE(CFL_JOINT_SIGNS)] = {
+  AOM_CDF8(1418, 2123, 13340, 18405, 26972, 28343, 32294)
+};
+
+static const aom_cdf_prob
+    default_cfl_alpha_cdf[CFL_ALPHA_CONTEXTS][CDF_SIZE(CFL_ALPHABET_SIZE)] = {
+      { AOM_CDF16(7637, 20719, 31401, 32481, 32657, 32688, 32692, 32696, 32700,
+                  32704, 32708, 32712, 32716, 32720, 32724) },
+      { AOM_CDF16(14365, 23603, 28135, 31168, 32167, 32395, 32487, 32573, 32620,
+                  32647, 32668, 32672, 32676, 32680, 32684) },
+      { AOM_CDF16(11532, 22380, 28445, 31360, 32349, 32523, 32584, 32649, 32673,
+                  32677, 32681, 32685, 32689, 32693, 32697) },
+      { AOM_CDF16(26990, 31402, 32282, 32571, 32692, 32696, 32700, 32704, 32708,
+                  32712, 32716, 32720, 32724, 32728, 32732) },
+      { AOM_CDF16(17248, 26058, 28904, 30608, 31305, 31877, 32126, 32321, 32394,
+                  32464, 32516, 32560, 32576, 32593, 32622) },
+      { AOM_CDF16(14738, 21678, 25779, 27901, 29024, 30302, 30980, 31843, 32144,
+                  32413, 32520, 32594, 32622, 32656, 32660) }
+    };
+
+static const aom_cdf_prob
+    default_switchable_interp_cdf[SWITCHABLE_FILTER_CONTEXTS][CDF_SIZE(
+        SWITCHABLE_FILTERS)] = {
+      { AOM_CDF3(31935, 32720) }, { AOM_CDF3(5568, 32719) },
+      { AOM_CDF3(422, 2938) },    { AOM_CDF3(28244, 32608) },
+      { AOM_CDF3(31206, 31953) }, { AOM_CDF3(4862, 32121) },
+      { AOM_CDF3(770, 1152) },    { AOM_CDF3(20889, 25637) },
+      { AOM_CDF3(31910, 32724) }, { AOM_CDF3(4120, 32712) },
+      { AOM_CDF3(305, 2247) },    { AOM_CDF3(27403, 32636) },
+      { AOM_CDF3(31022, 32009) }, { AOM_CDF3(2963, 32093) },
+      { AOM_CDF3(601, 943) },     { AOM_CDF3(14969, 21398) }
+    };
+
+static const aom_cdf_prob default_newmv_cdf[NEWMV_MODE_CONTEXTS][CDF_SIZE(2)] =
+    { { AOM_CDF2(24035) }, { AOM_CDF2(16630) }, { AOM_CDF2(15339) },
+      { AOM_CDF2(8386) },  { AOM_CDF2(12222) }, { AOM_CDF2(4676) } };
+
+static const aom_cdf_prob default_zeromv_cdf[GLOBALMV_MODE_CONTEXTS][CDF_SIZE(
+    2)] = { { AOM_CDF2(2175) }, { AOM_CDF2(1054) } };
+
+static const aom_cdf_prob default_refmv_cdf[REFMV_MODE_CONTEXTS][CDF_SIZE(2)] =
+    { { AOM_CDF2(23974) }, { AOM_CDF2(24188) }, { AOM_CDF2(17848) },
+      { AOM_CDF2(28622) }, { AOM_CDF2(24312) }, { AOM_CDF2(19923) } };
+
+static const aom_cdf_prob default_drl_cdf[DRL_MODE_CONTEXTS][CDF_SIZE(2)] = {
+  { AOM_CDF2(13104) }, { AOM_CDF2(24560) }, { AOM_CDF2(18945) }
+};
+
+static const aom_cdf_prob
+    default_inter_compound_mode_cdf[INTER_MODE_CONTEXTS][CDF_SIZE(
+        INTER_COMPOUND_MODES)] = {
+      { AOM_CDF8(7760, 13823, 15808, 17641, 19156, 20666, 26891) },
+      { AOM_CDF8(10730, 19452, 21145, 22749, 24039, 25131, 28724) },
+      { AOM_CDF8(10664, 20221, 21588, 22906, 24295, 25387, 28436) },
+      { AOM_CDF8(13298, 16984, 20471, 24182, 25067, 25736, 26422) },
+      { AOM_CDF8(18904, 23325, 25242, 27432, 27898, 28258, 30758) },
+      { AOM_CDF8(10725, 17454, 20124, 22820, 24195, 25168, 26046) },
+      { AOM_CDF8(17125, 24273, 25814, 27492, 28214, 28704, 30592) },
+      { AOM_CDF8(13046, 23214, 24505, 25942, 27435, 28442, 29330) }
+    };
+
+static const aom_cdf_prob default_interintra_cdf[BLOCK_SIZE_GROUPS][CDF_SIZE(
+    2)] = { { AOM_CDF2(16384) },
+            { AOM_CDF2(26887) },
+            { AOM_CDF2(27597) },
+            { AOM_CDF2(30237) } };
+
+static const aom_cdf_prob
+    default_interintra_mode_cdf[BLOCK_SIZE_GROUPS][CDF_SIZE(INTERINTRA_MODES)] =
+        { { AOM_CDF4(8192, 16384, 24576) },
+          { AOM_CDF4(1875, 11082, 27332) },
+          { AOM_CDF4(2473, 9996, 26388) },
+          { AOM_CDF4(4238, 11537, 25926) } };
+
+static const aom_cdf_prob
+    default_wedge_interintra_cdf[BLOCK_SIZES_ALL][CDF_SIZE(2)] = {
+      { AOM_CDF2(16384) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+      { AOM_CDF2(20036) }, { AOM_CDF2(24957) }, { AOM_CDF2(26704) },
+      { AOM_CDF2(27530) }, { AOM_CDF2(29564) }, { AOM_CDF2(29444) },
+      { AOM_CDF2(26872) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+      { AOM_CDF2(16384) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+      { AOM_CDF2(16384) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+      { AOM_CDF2(16384) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+      { AOM_CDF2(16384) }
+    };
+
+static const aom_cdf_prob default_compound_type_cdf[BLOCK_SIZES_ALL][CDF_SIZE(
+    MASKED_COMPOUND_TYPES)] = {
+  { AOM_CDF2(16384) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+  { AOM_CDF2(23431) }, { AOM_CDF2(13171) }, { AOM_CDF2(11470) },
+  { AOM_CDF2(9770) },  { AOM_CDF2(9100) },  { AOM_CDF2(8233) },
+  { AOM_CDF2(6172) },  { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+  { AOM_CDF2(16384) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+  { AOM_CDF2(16384) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+  { AOM_CDF2(11820) }, { AOM_CDF2(7701) },  { AOM_CDF2(16384) },
+  { AOM_CDF2(16384) }
+};
+
+static const aom_cdf_prob default_wedge_idx_cdf[BLOCK_SIZES_ALL][CDF_SIZE(16)] =
+    { { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2438, 4440, 6599, 8663, 11005, 12874, 15751, 18094, 20359,
+                  22362, 24127, 25702, 27752, 29450, 31171) },
+      { AOM_CDF16(806, 3266, 6005, 6738, 7218, 7367, 7771, 14588, 16323, 17367,
+                  18452, 19422, 22839, 26127, 29629) },
+      { AOM_CDF16(2779, 3738, 4683, 7213, 7775, 8017, 8655, 14357, 17939, 21332,
+                  24520, 27470, 29456, 30529, 31656) },
+      { AOM_CDF16(1684, 3625, 5675, 7108, 9302, 11274, 14429, 17144, 19163,
+                  20961, 22884, 24471, 26719, 28714, 30877) },
+      { AOM_CDF16(1142, 3491, 6277, 7314, 8089, 8355, 9023, 13624, 15369, 16730,
+                  18114, 19313, 22521, 26012, 29550) },
+      { AOM_CDF16(2742, 4195, 5727, 8035, 8980, 9336, 10146, 14124, 17270,
+                  20533, 23434, 25972, 27944, 29570, 31416) },
+      { AOM_CDF16(1727, 3948, 6101, 7796, 9841, 12344, 15766, 18944, 20638,
+                  22038, 23963, 25311, 26988, 28766, 31012) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(154, 987, 1925, 2051, 2088, 2111, 2151, 23033, 23703, 24284,
+                  24985, 25684, 27259, 28883, 30911) },
+      { AOM_CDF16(1135, 1322, 1493, 2635, 2696, 2737, 2770, 21016, 22935, 25057,
+                  27251, 29173, 30089, 30960, 31933) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) },
+      { AOM_CDF16(2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384, 18432,
+                  20480, 22528, 24576, 26624, 28672, 30720) } };
+
+static const aom_cdf_prob default_motion_mode_cdf[BLOCK_SIZES_ALL][CDF_SIZE(
+    MOTION_MODES)] = { { AOM_CDF3(10923, 21845) }, { AOM_CDF3(10923, 21845) },
+                       { AOM_CDF3(10923, 21845) }, { AOM_CDF3(7651, 24760) },
+                       { AOM_CDF3(4738, 24765) },  { AOM_CDF3(5391, 25528) },
+                       { AOM_CDF3(19419, 26810) }, { AOM_CDF3(5123, 23606) },
+                       { AOM_CDF3(11606, 24308) }, { AOM_CDF3(26260, 29116) },
+                       { AOM_CDF3(20360, 28062) }, { AOM_CDF3(21679, 26830) },
+                       { AOM_CDF3(29516, 30701) }, { AOM_CDF3(28898, 30397) },
+                       { AOM_CDF3(30878, 31335) }, { AOM_CDF3(32507, 32558) },
+                       { AOM_CDF3(10923, 21845) }, { AOM_CDF3(10923, 21845) },
+                       { AOM_CDF3(28799, 31390) }, { AOM_CDF3(26431, 30774) },
+                       { AOM_CDF3(28973, 31594) }, { AOM_CDF3(29742, 31203) } };
+
+static const aom_cdf_prob default_obmc_cdf[BLOCK_SIZES_ALL][CDF_SIZE(2)] = {
+  { AOM_CDF2(16384) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+  { AOM_CDF2(10437) }, { AOM_CDF2(9371) },  { AOM_CDF2(9301) },
+  { AOM_CDF2(17432) }, { AOM_CDF2(14423) }, { AOM_CDF2(15142) },
+  { AOM_CDF2(25817) }, { AOM_CDF2(22823) }, { AOM_CDF2(22083) },
+  { AOM_CDF2(30128) }, { AOM_CDF2(31014) }, { AOM_CDF2(31560) },
+  { AOM_CDF2(32638) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+  { AOM_CDF2(23664) }, { AOM_CDF2(20901) }, { AOM_CDF2(24008) },
+  { AOM_CDF2(26879) }
+};
+
+static const aom_cdf_prob default_intra_inter_cdf[INTRA_INTER_CONTEXTS]
+                                                 [CDF_SIZE(2)] = {
+                                                   { AOM_CDF2(806) },
+                                                   { AOM_CDF2(16662) },
+                                                   { AOM_CDF2(20186) },
+                                                   { AOM_CDF2(26538) }
+                                                 };
+
+static const aom_cdf_prob default_comp_inter_cdf[COMP_INTER_CONTEXTS][CDF_SIZE(
+    2)] = { { AOM_CDF2(26828) },
+            { AOM_CDF2(24035) },
+            { AOM_CDF2(12031) },
+            { AOM_CDF2(10640) },
+            { AOM_CDF2(2901) } };
+
+static const aom_cdf_prob default_comp_ref_type_cdf[COMP_REF_TYPE_CONTEXTS]
+                                                   [CDF_SIZE(2)] = {
+                                                     { AOM_CDF2(1198) },
+                                                     { AOM_CDF2(2070) },
+                                                     { AOM_CDF2(9166) },
+                                                     { AOM_CDF2(7499) },
+                                                     { AOM_CDF2(22475) }
+                                                   };
+
+static const aom_cdf_prob
+    default_uni_comp_ref_cdf[UNI_COMP_REF_CONTEXTS][UNIDIR_COMP_REFS -
+                                                    1][CDF_SIZE(2)] = {
+      { { AOM_CDF2(5284) }, { AOM_CDF2(3865) }, { AOM_CDF2(3128) } },
+      { { AOM_CDF2(23152) }, { AOM_CDF2(14173) }, { AOM_CDF2(15270) } },
+      { { AOM_CDF2(31774) }, { AOM_CDF2(25120) }, { AOM_CDF2(26710) } }
+    };
+
+static const aom_cdf_prob default_single_ref_cdf[REF_CONTEXTS][SINGLE_REFS - 1]
+                                                [CDF_SIZE(2)] = {
+                                                  { { AOM_CDF2(4897) },
+                                                    { AOM_CDF2(1555) },
+                                                    { AOM_CDF2(4236) },
+                                                    { AOM_CDF2(8650) },
+                                                    { AOM_CDF2(904) },
+                                                    { AOM_CDF2(1444) } },
+                                                  { { AOM_CDF2(16973) },
+                                                    { AOM_CDF2(16751) },
+                                                    { AOM_CDF2(19647) },
+                                                    { AOM_CDF2(24773) },
+                                                    { AOM_CDF2(11014) },
+                                                    { AOM_CDF2(15087) } },
+                                                  { { AOM_CDF2(29744) },
+                                                    { AOM_CDF2(30279) },
+                                                    { AOM_CDF2(31194) },
+                                                    { AOM_CDF2(31895) },
+                                                    { AOM_CDF2(26875) },
+                                                    { AOM_CDF2(30304) } }
+                                                };
+
+static const aom_cdf_prob
+    default_comp_ref_cdf[REF_CONTEXTS][FWD_REFS - 1][CDF_SIZE(2)] = {
+      { { AOM_CDF2(4946) }, { AOM_CDF2(9468) }, { AOM_CDF2(1503) } },
+      { { AOM_CDF2(19891) }, { AOM_CDF2(22441) }, { AOM_CDF2(15160) } },
+      { { AOM_CDF2(30731) }, { AOM_CDF2(31059) }, { AOM_CDF2(27544) } }
+    };
+
+static const aom_cdf_prob
+    default_comp_bwdref_cdf[REF_CONTEXTS][BWD_REFS - 1][CDF_SIZE(2)] = {
+      { { AOM_CDF2(2235) }, { AOM_CDF2(1423) } },
+      { { AOM_CDF2(17182) }, { AOM_CDF2(15175) } },
+      { { AOM_CDF2(30606) }, { AOM_CDF2(30489) } }
+    };
+
+static const aom_cdf_prob
+    default_palette_y_size_cdf[PALATTE_BSIZE_CTXS][CDF_SIZE(PALETTE_SIZES)] = {
+      { AOM_CDF7(7952, 13000, 18149, 21478, 25527, 29241) },
+      { AOM_CDF7(7139, 11421, 16195, 19544, 23666, 28073) },
+      { AOM_CDF7(7788, 12741, 17325, 20500, 24315, 28530) },
+      { AOM_CDF7(8271, 14064, 18246, 21564, 25071, 28533) },
+      { AOM_CDF7(12725, 19180, 21863, 24839, 27535, 30120) },
+      { AOM_CDF7(9711, 14888, 16923, 21052, 25661, 27875) },
+      { AOM_CDF7(14940, 20797, 21678, 24186, 27033, 28999) }
+    };
+
+static const aom_cdf_prob
+    default_palette_uv_size_cdf[PALATTE_BSIZE_CTXS][CDF_SIZE(PALETTE_SIZES)] = {
+      { AOM_CDF7(8713, 19979, 27128, 29609, 31331, 32272) },
+      { AOM_CDF7(5839, 15573, 23581, 26947, 29848, 31700) },
+      { AOM_CDF7(4426, 11260, 17999, 21483, 25863, 29430) },
+      { AOM_CDF7(3228, 9464, 14993, 18089, 22523, 27420) },
+      { AOM_CDF7(3768, 8886, 13091, 17852, 22495, 27207) },
+      { AOM_CDF7(2464, 8451, 12861, 21632, 25525, 28555) },
+      { AOM_CDF7(1269, 5435, 10433, 18963, 21700, 25865) }
+    };
+
+static const aom_cdf_prob default_palette_y_mode_cdf
+    [PALATTE_BSIZE_CTXS][PALETTE_Y_MODE_CONTEXTS][CDF_SIZE(2)] = {
+      { { AOM_CDF2(31676) }, { AOM_CDF2(3419) }, { AOM_CDF2(1261) } },
+      { { AOM_CDF2(31912) }, { AOM_CDF2(2859) }, { AOM_CDF2(980) } },
+      { { AOM_CDF2(31823) }, { AOM_CDF2(3400) }, { AOM_CDF2(781) } },
+      { { AOM_CDF2(32030) }, { AOM_CDF2(3561) }, { AOM_CDF2(904) } },
+      { { AOM_CDF2(32309) }, { AOM_CDF2(7337) }, { AOM_CDF2(1462) } },
+      { { AOM_CDF2(32265) }, { AOM_CDF2(4015) }, { AOM_CDF2(1521) } },
+      { { AOM_CDF2(32450) }, { AOM_CDF2(7946) }, { AOM_CDF2(129) } }
+    };
+
+static const aom_cdf_prob
+    default_palette_uv_mode_cdf[PALETTE_UV_MODE_CONTEXTS][CDF_SIZE(2)] = {
+      { AOM_CDF2(32461) }, { AOM_CDF2(21488) }
+    };
+
+static const aom_cdf_prob default_palette_y_color_index_cdf
+    [PALETTE_SIZES][PALETTE_COLOR_INDEX_CONTEXTS][CDF_SIZE(PALETTE_COLORS)] = {
+      {
+          { AOM_CDF2(28710) },
+          { AOM_CDF2(16384) },
+          { AOM_CDF2(10553) },
+          { AOM_CDF2(27036) },
+          { AOM_CDF2(31603) },
+      },
+      {
+          { AOM_CDF3(27877, 30490) },
+          { AOM_CDF3(11532, 25697) },
+          { AOM_CDF3(6544, 30234) },
+          { AOM_CDF3(23018, 28072) },
+          { AOM_CDF3(31915, 32385) },
+      },
+      {
+          { AOM_CDF4(25572, 28046, 30045) },
+          { AOM_CDF4(9478, 21590, 27256) },
+          { AOM_CDF4(7248, 26837, 29824) },
+          { AOM_CDF4(19167, 24486, 28349) },
+          { AOM_CDF4(31400, 31825, 32250) },
+      },
+      {
+          { AOM_CDF5(24779, 26955, 28576, 30282) },
+          { AOM_CDF5(8669, 20364, 24073, 28093) },
+          { AOM_CDF5(4255, 27565, 29377, 31067) },
+          { AOM_CDF5(19864, 23674, 26716, 29530) },
+          { AOM_CDF5(31646, 31893, 32147, 32426) },
+      },
+      {
+          { AOM_CDF6(23132, 25407, 26970, 28435, 30073) },
+          { AOM_CDF6(7443, 17242, 20717, 24762, 27982) },
+          { AOM_CDF6(6300, 24862, 26944, 28784, 30671) },
+          { AOM_CDF6(18916, 22895, 25267, 27435, 29652) },
+          { AOM_CDF6(31270, 31550, 31808, 32059, 32353) },
+      },
+      {
+          { AOM_CDF7(23105, 25199, 26464, 27684, 28931, 30318) },
+          { AOM_CDF7(6950, 15447, 18952, 22681, 25567, 28563) },
+          { AOM_CDF7(7560, 23474, 25490, 27203, 28921, 30708) },
+          { AOM_CDF7(18544, 22373, 24457, 26195, 28119, 30045) },
+          { AOM_CDF7(31198, 31451, 31670, 31882, 32123, 32391) },
+      },
+      {
+          { AOM_CDF8(21689, 23883, 25163, 26352, 27506, 28827, 30195) },
+          { AOM_CDF8(6892, 15385, 17840, 21606, 24287, 26753, 29204) },
+          { AOM_CDF8(5651, 23182, 25042, 26518, 27982, 29392, 30900) },
+          { AOM_CDF8(19349, 22578, 24418, 25994, 27524, 29031, 30448) },
+          { AOM_CDF8(31028, 31270, 31504, 31705, 31927, 32153, 32392) },
+      },
+    };
+
+static const aom_cdf_prob default_palette_uv_color_index_cdf
+    [PALETTE_SIZES][PALETTE_COLOR_INDEX_CONTEXTS][CDF_SIZE(PALETTE_COLORS)] = {
+      {
+          { AOM_CDF2(29089) },
+          { AOM_CDF2(16384) },
+          { AOM_CDF2(8713) },
+          { AOM_CDF2(29257) },
+          { AOM_CDF2(31610) },
+      },
+      {
+          { AOM_CDF3(25257, 29145) },
+          { AOM_CDF3(12287, 27293) },
+          { AOM_CDF3(7033, 27960) },
+          { AOM_CDF3(20145, 25405) },
+          { AOM_CDF3(30608, 31639) },
+      },
+      {
+          { AOM_CDF4(24210, 27175, 29903) },
+          { AOM_CDF4(9888, 22386, 27214) },
+          { AOM_CDF4(5901, 26053, 29293) },
+          { AOM_CDF4(18318, 22152, 28333) },
+          { AOM_CDF4(30459, 31136, 31926) },
+      },
+      {
+          { AOM_CDF5(22980, 25479, 27781, 29986) },
+          { AOM_CDF5(8413, 21408, 24859, 28874) },
+          { AOM_CDF5(2257, 29449, 30594, 31598) },
+          { AOM_CDF5(19189, 21202, 25915, 28620) },
+          { AOM_CDF5(31844, 32044, 32281, 32518) },
+      },
+      {
+          { AOM_CDF6(22217, 24567, 26637, 28683, 30548) },
+          { AOM_CDF6(7307, 16406, 19636, 24632, 28424) },
+          { AOM_CDF6(4441, 25064, 26879, 28942, 30919) },
+          { AOM_CDF6(17210, 20528, 23319, 26750, 29582) },
+          { AOM_CDF6(30674, 30953, 31396, 31735, 32207) },
+      },
+      {
+          { AOM_CDF7(21239, 23168, 25044, 26962, 28705, 30506) },
+          { AOM_CDF7(6545, 15012, 18004, 21817, 25503, 28701) },
+          { AOM_CDF7(3448, 26295, 27437, 28704, 30126, 31442) },
+          { AOM_CDF7(15889, 18323, 21704, 24698, 26976, 29690) },
+          { AOM_CDF7(30988, 31204, 31479, 31734, 31983, 32325) },
+      },
+      {
+          { AOM_CDF8(21442, 23288, 24758, 26246, 27649, 28980, 30563) },
+          { AOM_CDF8(5863, 14933, 17552, 20668, 23683, 26411, 29273) },
+          { AOM_CDF8(3415, 25810, 26877, 27990, 29223, 30394, 31618) },
+          { AOM_CDF8(17965, 20084, 22232, 23974, 26274, 28402, 30390) },
+          { AOM_CDF8(31190, 31329, 31516, 31679, 31825, 32026, 32322) },
+      },
+    };
+
+static const aom_cdf_prob
+    default_txfm_partition_cdf[TXFM_PARTITION_CONTEXTS][CDF_SIZE(2)] = {
+      { AOM_CDF2(28581) }, { AOM_CDF2(23846) }, { AOM_CDF2(20847) },
+      { AOM_CDF2(24315) }, { AOM_CDF2(18196) }, { AOM_CDF2(12133) },
+      { AOM_CDF2(18791) }, { AOM_CDF2(10887) }, { AOM_CDF2(11005) },
+      { AOM_CDF2(27179) }, { AOM_CDF2(20004) }, { AOM_CDF2(11281) },
+      { AOM_CDF2(26549) }, { AOM_CDF2(19308) }, { AOM_CDF2(14224) },
+      { AOM_CDF2(28015) }, { AOM_CDF2(21546) }, { AOM_CDF2(14400) },
+      { AOM_CDF2(28165) }, { AOM_CDF2(22401) }, { AOM_CDF2(16088) }
+    };
+
+static const aom_cdf_prob default_skip_cdfs[SKIP_CONTEXTS][CDF_SIZE(2)] = {
+  { AOM_CDF2(31671) }, { AOM_CDF2(16515) }, { AOM_CDF2(4576) }
+};
+
+static const aom_cdf_prob default_skip_mode_cdfs[SKIP_MODE_CONTEXTS][CDF_SIZE(
+    2)] = { { AOM_CDF2(32621) }, { AOM_CDF2(20708) }, { AOM_CDF2(8127) } };
+
+static const aom_cdf_prob
+    default_compound_idx_cdfs[COMP_INDEX_CONTEXTS][CDF_SIZE(2)] = {
+      { AOM_CDF2(18244) }, { AOM_CDF2(12865) }, { AOM_CDF2(7053) },
+      { AOM_CDF2(13259) }, { AOM_CDF2(9334) },  { AOM_CDF2(4644) }
+    };
+
+static const aom_cdf_prob
+    default_comp_group_idx_cdfs[COMP_GROUP_IDX_CONTEXTS][CDF_SIZE(2)] = {
+      { AOM_CDF2(26607) }, { AOM_CDF2(22891) }, { AOM_CDF2(18840) },
+      { AOM_CDF2(24594) }, { AOM_CDF2(19934) }, { AOM_CDF2(22674) }
+    };
+
+static const aom_cdf_prob default_intrabc_cdf[CDF_SIZE(2)] = { AOM_CDF2(
+    30531) };
+
+static const aom_cdf_prob default_filter_intra_mode_cdf[CDF_SIZE(
+    FILTER_INTRA_MODES)] = { AOM_CDF5(8949, 12776, 17211, 29558) };
+
+static const aom_cdf_prob default_filter_intra_cdfs[BLOCK_SIZES_ALL][CDF_SIZE(
+    2)] = { { AOM_CDF2(4621) },  { AOM_CDF2(6743) },  { AOM_CDF2(5893) },
+            { AOM_CDF2(7866) },  { AOM_CDF2(12551) }, { AOM_CDF2(9394) },
+            { AOM_CDF2(12408) }, { AOM_CDF2(14301) }, { AOM_CDF2(12756) },
+            { AOM_CDF2(22343) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+            { AOM_CDF2(16384) }, { AOM_CDF2(16384) }, { AOM_CDF2(16384) },
+            { AOM_CDF2(16384) }, { AOM_CDF2(12770) }, { AOM_CDF2(10368) },
+            { AOM_CDF2(20229) }, { AOM_CDF2(18101) }, { AOM_CDF2(16384) },
+            { AOM_CDF2(16384) } };
+
+static const aom_cdf_prob default_switchable_restore_cdf[CDF_SIZE(
+    RESTORE_SWITCHABLE_TYPES)] = { AOM_CDF3(9413, 22581) };
+
+static const aom_cdf_prob default_wiener_restore_cdf[CDF_SIZE(2)] = { AOM_CDF2(
+    11570) };
+
+static const aom_cdf_prob default_sgrproj_restore_cdf[CDF_SIZE(2)] = { AOM_CDF2(
+    16855) };
+
+static const aom_cdf_prob default_delta_q_cdf[CDF_SIZE(DELTA_Q_PROBS + 1)] = {
+  AOM_CDF4(28160, 32120, 32677)
+};
+
+static const aom_cdf_prob default_delta_lf_multi_cdf[FRAME_LF_COUNT][CDF_SIZE(
+    DELTA_LF_PROBS + 1)] = { { AOM_CDF4(28160, 32120, 32677) },
+                             { AOM_CDF4(28160, 32120, 32677) },
+                             { AOM_CDF4(28160, 32120, 32677) },
+                             { AOM_CDF4(28160, 32120, 32677) } };
+static const aom_cdf_prob default_delta_lf_cdf[CDF_SIZE(DELTA_LF_PROBS + 1)] = {
+  AOM_CDF4(28160, 32120, 32677)
+};
+
+// FIXME(someone) need real defaults here
+static const aom_cdf_prob default_seg_tree_cdf[CDF_SIZE(MAX_SEGMENTS)] = {
+  AOM_CDF8(4096, 8192, 12288, 16384, 20480, 24576, 28672)
+};
+
+static const aom_cdf_prob
+    default_segment_pred_cdf[SEG_TEMPORAL_PRED_CTXS][CDF_SIZE(2)] = {
+      { AOM_CDF2(128 * 128) }, { AOM_CDF2(128 * 128) }, { AOM_CDF2(128 * 128) }
+    };
+
+static const aom_cdf_prob
+    default_spatial_pred_seg_tree_cdf[SPATIAL_PREDICTION_PROBS][CDF_SIZE(
+        MAX_SEGMENTS)] = {
+      {
+          AOM_CDF8(5622, 7893, 16093, 18233, 27809, 28373, 32533),
+      },
+      {
+          AOM_CDF8(14274, 18230, 22557, 24935, 29980, 30851, 32344),
+      },
+      {
+          AOM_CDF8(27527, 28487, 28723, 28890, 32397, 32647, 32679),
+      },
+    };
+
+static const aom_cdf_prob default_tx_size_cdf[MAX_TX_CATS][TX_SIZE_CONTEXTS]
+                                             [CDF_SIZE(MAX_TX_DEPTH + 1)] = {
+                                               { { AOM_CDF2(19968) },
+                                                 { AOM_CDF2(19968) },
+                                                 { AOM_CDF2(24320) } },
+                                               { { AOM_CDF3(12272, 30172) },
+                                                 { AOM_CDF3(12272, 30172) },
+                                                 { AOM_CDF3(18677, 30848) } },
+                                               { { AOM_CDF3(12986, 15180) },
+                                                 { AOM_CDF3(12986, 15180) },
+                                                 { AOM_CDF3(24302, 25602) } },
+                                               { { AOM_CDF3(5782, 11475) },
+                                                 { AOM_CDF3(5782, 11475) },
+                                                 { AOM_CDF3(16803, 22759) } },
+                                             };
+
+#define MAX_COLOR_CONTEXT_HASH 8
+// Negative values are invalid
+static const int palette_color_index_context_lookup[MAX_COLOR_CONTEXT_HASH +
+                                                    1] = { -1, -1, 0, -1, -1,
+                                                           4,  3,  2, 1 };
+
+#define NUM_PALETTE_NEIGHBORS 3  // left, top-left and top.
+int av1_get_palette_color_index_context(const uint8_t *color_map, int stride,
+                                        int r, int c, int palette_size,
+                                        uint8_t *color_order, int *color_idx) {
+  assert(palette_size <= PALETTE_MAX_SIZE);
+  assert(r > 0 || c > 0);
+
+  // Get color indices of neighbors.
+  int color_neighbors[NUM_PALETTE_NEIGHBORS];
+  color_neighbors[0] = (c - 1 >= 0) ? color_map[r * stride + c - 1] : -1;
+  color_neighbors[1] =
+      (c - 1 >= 0 && r - 1 >= 0) ? color_map[(r - 1) * stride + c - 1] : -1;
+  color_neighbors[2] = (r - 1 >= 0) ? color_map[(r - 1) * stride + c] : -1;
+
+  // The +10 below should not be needed. But we get a warning "array subscript
+  // is above array bounds [-Werror=array-bounds]" without it, possibly due to
+  // this (or similar) bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59124
+  int scores[PALETTE_MAX_SIZE + 10] = { 0 };
+  int i;
+  static const int weights[NUM_PALETTE_NEIGHBORS] = { 2, 1, 2 };
+  for (i = 0; i < NUM_PALETTE_NEIGHBORS; ++i) {
+    if (color_neighbors[i] >= 0) {
+      scores[color_neighbors[i]] += weights[i];
+    }
+  }
+
+  int inverse_color_order[PALETTE_MAX_SIZE];
+  for (i = 0; i < PALETTE_MAX_SIZE; ++i) {
+    color_order[i] = i;
+    inverse_color_order[i] = i;
+  }
+
+  // Get the top NUM_PALETTE_NEIGHBORS scores (sorted from large to small).
+  for (i = 0; i < NUM_PALETTE_NEIGHBORS; ++i) {
+    int max = scores[i];
+    int max_idx = i;
+    for (int j = i + 1; j < palette_size; ++j) {
+      if (scores[j] > max) {
+        max = scores[j];
+        max_idx = j;
+      }
+    }
+    if (max_idx != i) {
+      // Move the score at index 'max_idx' to index 'i', and shift the scores
+      // from 'i' to 'max_idx - 1' by 1.
+      const int max_score = scores[max_idx];
+      const uint8_t max_color_order = color_order[max_idx];
+      for (int k = max_idx; k > i; --k) {
+        scores[k] = scores[k - 1];
+        color_order[k] = color_order[k - 1];
+        inverse_color_order[color_order[k]] = k;
+      }
+      scores[i] = max_score;
+      color_order[i] = max_color_order;
+      inverse_color_order[color_order[i]] = i;
+    }
+  }
+
+  if (color_idx != NULL)
+    *color_idx = inverse_color_order[color_map[r * stride + c]];
+
+  // Get hash value of context.
+  int color_index_ctx_hash = 0;
+  static const int hash_multipliers[NUM_PALETTE_NEIGHBORS] = { 1, 2, 2 };
+  for (i = 0; i < NUM_PALETTE_NEIGHBORS; ++i) {
+    color_index_ctx_hash += scores[i] * hash_multipliers[i];
+  }
+  assert(color_index_ctx_hash > 0);
+  assert(color_index_ctx_hash <= MAX_COLOR_CONTEXT_HASH);
+
+  // Lookup context from hash.
+  const int color_index_ctx =
+      palette_color_index_context_lookup[color_index_ctx_hash];
+  assert(color_index_ctx >= 0);
+  assert(color_index_ctx < PALETTE_COLOR_INDEX_CONTEXTS);
+  return color_index_ctx;
+}
+#undef NUM_PALETTE_NEIGHBORS
+#undef MAX_COLOR_CONTEXT_HASH
+
+static void init_mode_probs(FRAME_CONTEXT *fc) {
+  av1_copy(fc->palette_y_size_cdf, default_palette_y_size_cdf);
+  av1_copy(fc->palette_uv_size_cdf, default_palette_uv_size_cdf);
+  av1_copy(fc->palette_y_color_index_cdf, default_palette_y_color_index_cdf);
+  av1_copy(fc->palette_uv_color_index_cdf, default_palette_uv_color_index_cdf);
+  av1_copy(fc->kf_y_cdf, default_kf_y_mode_cdf);
+  av1_copy(fc->angle_delta_cdf, default_angle_delta_cdf);
+  av1_copy(fc->comp_inter_cdf, default_comp_inter_cdf);
+  av1_copy(fc->comp_ref_type_cdf, default_comp_ref_type_cdf);
+  av1_copy(fc->uni_comp_ref_cdf, default_uni_comp_ref_cdf);
+  av1_copy(fc->palette_y_mode_cdf, default_palette_y_mode_cdf);
+  av1_copy(fc->palette_uv_mode_cdf, default_palette_uv_mode_cdf);
+  av1_copy(fc->comp_ref_cdf, default_comp_ref_cdf);
+  av1_copy(fc->comp_bwdref_cdf, default_comp_bwdref_cdf);
+  av1_copy(fc->single_ref_cdf, default_single_ref_cdf);
+  av1_copy(fc->txfm_partition_cdf, default_txfm_partition_cdf);
+  av1_copy(fc->compound_index_cdf, default_compound_idx_cdfs);
+  av1_copy(fc->comp_group_idx_cdf, default_comp_group_idx_cdfs);
+  av1_copy(fc->newmv_cdf, default_newmv_cdf);
+  av1_copy(fc->zeromv_cdf, default_zeromv_cdf);
+  av1_copy(fc->refmv_cdf, default_refmv_cdf);
+  av1_copy(fc->drl_cdf, default_drl_cdf);
+  av1_copy(fc->motion_mode_cdf, default_motion_mode_cdf);
+  av1_copy(fc->obmc_cdf, default_obmc_cdf);
+  av1_copy(fc->inter_compound_mode_cdf, default_inter_compound_mode_cdf);
+  av1_copy(fc->compound_type_cdf, default_compound_type_cdf);
+  av1_copy(fc->wedge_idx_cdf, default_wedge_idx_cdf);
+  av1_copy(fc->interintra_cdf, default_interintra_cdf);
+  av1_copy(fc->wedge_interintra_cdf, default_wedge_interintra_cdf);
+  av1_copy(fc->interintra_mode_cdf, default_interintra_mode_cdf);
+  av1_copy(fc->seg.pred_cdf, default_segment_pred_cdf);
+  av1_copy(fc->seg.tree_cdf, default_seg_tree_cdf);
+  av1_copy(fc->filter_intra_cdfs, default_filter_intra_cdfs);
+  av1_copy(fc->filter_intra_mode_cdf, default_filter_intra_mode_cdf);
+  av1_copy(fc->switchable_restore_cdf, default_switchable_restore_cdf);
+  av1_copy(fc->wiener_restore_cdf, default_wiener_restore_cdf);
+  av1_copy(fc->sgrproj_restore_cdf, default_sgrproj_restore_cdf);
+  av1_copy(fc->y_mode_cdf, default_if_y_mode_cdf);
+  av1_copy(fc->uv_mode_cdf, default_uv_mode_cdf);
+  av1_copy(fc->switchable_interp_cdf, default_switchable_interp_cdf);
+  av1_copy(fc->partition_cdf, default_partition_cdf);
+  av1_copy(fc->intra_ext_tx_cdf, default_intra_ext_tx_cdf);
+  av1_copy(fc->inter_ext_tx_cdf, default_inter_ext_tx_cdf);
+  av1_copy(fc->skip_mode_cdfs, default_skip_mode_cdfs);
+  av1_copy(fc->skip_cdfs, default_skip_cdfs);
+  av1_copy(fc->intra_inter_cdf, default_intra_inter_cdf);
+  for (int i = 0; i < SPATIAL_PREDICTION_PROBS; i++)
+    av1_copy(fc->seg.spatial_pred_seg_cdf[i],
+             default_spatial_pred_seg_tree_cdf[i]);
+  av1_copy(fc->tx_size_cdf, default_tx_size_cdf);
+  av1_copy(fc->delta_q_cdf, default_delta_q_cdf);
+  av1_copy(fc->delta_lf_cdf, default_delta_lf_cdf);
+  av1_copy(fc->delta_lf_multi_cdf, default_delta_lf_multi_cdf);
+  av1_copy(fc->cfl_sign_cdf, default_cfl_sign_cdf);
+  av1_copy(fc->cfl_alpha_cdf, default_cfl_alpha_cdf);
+  av1_copy(fc->intrabc_cdf, default_intrabc_cdf);
+}
+
+void av1_set_default_ref_deltas(int8_t *ref_deltas) {
+  assert(ref_deltas != NULL);
+
+  ref_deltas[INTRA_FRAME] = 1;
+  ref_deltas[LAST_FRAME] = 0;
+  ref_deltas[LAST2_FRAME] = ref_deltas[LAST_FRAME];
+  ref_deltas[LAST3_FRAME] = ref_deltas[LAST_FRAME];
+  ref_deltas[BWDREF_FRAME] = ref_deltas[LAST_FRAME];
+  ref_deltas[GOLDEN_FRAME] = -1;
+  ref_deltas[ALTREF2_FRAME] = -1;
+  ref_deltas[ALTREF_FRAME] = -1;
+}
+
+void av1_set_default_mode_deltas(int8_t *mode_deltas) {
+  assert(mode_deltas != NULL);
+
+  mode_deltas[0] = 0;
+  mode_deltas[1] = 0;
+}
+
+static void set_default_lf_deltas(struct loopfilter *lf) {
+  lf->mode_ref_delta_enabled = 1;
+  lf->mode_ref_delta_update = 1;
+
+  av1_set_default_ref_deltas(lf->ref_deltas);
+  av1_set_default_mode_deltas(lf->mode_deltas);
+}
+
+void av1_setup_frame_contexts(AV1_COMMON *cm) {
+  // Store the frame context into a special slot (not associated with any
+  // reference buffer), so that we can set up cm->pre_fc correctly later
+  // This function must ONLY be called when cm->fc has been initialized with
+  // default probs, either by av1_setup_past_independence or after manually
+  // initializing them
+  *cm->default_frame_context = *cm->fc;
+  // TODO(jack.haughton@argondesign.com): don't think this should be necessary,
+  // but could do with fuller testing
+  if (cm->large_scale_tile) {
+    for (int i = LAST_FRAME; i <= ALTREF_FRAME; ++i) {
+      RefCntBuffer *const buf = get_ref_frame_buf(cm, i);
+      if (buf != NULL) buf->frame_context = *cm->fc;
+    }
+    for (int i = 0; i < FRAME_BUFFERS; ++i)
+      cm->buffer_pool->frame_bufs[i].frame_context = *cm->fc;
+  }
+}
+
+void av1_setup_past_independence(AV1_COMMON *cm) {
+  // Reset the segment feature data to the default stats:
+  // Features disabled, 0, with delta coding (Default state).
+  av1_clearall_segfeatures(&cm->seg);
+
+  if (cm->cur_frame->seg_map)
+    memset(cm->cur_frame->seg_map, 0, (cm->mi_rows * cm->mi_cols));
+
+  // reset mode ref deltas
+  av1_set_default_ref_deltas(cm->cur_frame->ref_deltas);
+  av1_set_default_mode_deltas(cm->cur_frame->mode_deltas);
+  set_default_lf_deltas(&cm->lf);
+
+  av1_default_coef_probs(cm);
+  init_mode_probs(cm->fc);
+  av1_init_mv_probs(cm);
+  cm->fc->initialized = 1;
+  av1_setup_frame_contexts(cm);
+}

diff --git a/libav1/av1/common/entropymode.h b/libav1/av1/common/entropymode.h
new file mode 100644
index 0000000..69b5218
--- /dev/null
+++ b/libav1/av1/common/entropymode.h

@@ -0,0 +1,213 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_ENTROPYMODE_H_
+#define AOM_AV1_COMMON_ENTROPYMODE_H_
+
+#include "av1/common/entropy.h"
+#include "av1/common/entropymv.h"
+#include "av1/common/filter.h"
+#include "av1/common/seg_common.h"
+#include "aom_dsp/aom_filter.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define BLOCK_SIZE_GROUPS 4
+
+#define TX_SIZE_CONTEXTS 3
+
+#define INTER_OFFSET(mode) ((mode)-NEARESTMV)
+#define INTER_COMPOUND_OFFSET(mode) (uint8_t)((mode)-NEAREST_NEARESTMV)
+
+// Number of possible contexts for a color index.
+// As can be seen from av1_get_palette_color_index_context(), the possible
+// contexts are (2,0,0), (2,2,1), (3,2,0), (4,1,0), (5,0,0). These are mapped to
+// a value from 0 to 4 using 'palette_color_index_context_lookup' table.
+#define PALETTE_COLOR_INDEX_CONTEXTS 5
+
+// Palette Y mode context for a block is determined by number of neighboring
+// blocks (top and/or left) using a palette for Y plane. So, possible Y mode'
+// context values are:
+// 0 if neither left nor top block uses palette for Y plane,
+// 1 if exactly one of left or top block uses palette for Y plane, and
+// 2 if both left and top blocks use palette for Y plane.
+#define PALETTE_Y_MODE_CONTEXTS 3
+
+// Palette UV mode context for a block is determined by whether this block uses
+// palette for the Y plane. So, possible values are:
+// 0 if this block doesn't use palette for Y plane.
+// 1 if this block uses palette for Y plane (i.e. Y palette size > 0).
+#define PALETTE_UV_MODE_CONTEXTS 2
+
+// Map the number of pixels in a block size to a context
+//   64(BLOCK_8X8, BLOCK_4x16, BLOCK_16X4)  -> 0
+//  128(BLOCK_8X16, BLOCK_16x8)             -> 1
+//   ...
+// 4096(BLOCK_64X64)                        -> 6
+#define PALATTE_BSIZE_CTXS 7
+
+#define KF_MODE_CONTEXTS 5
+
+struct AV1Common;
+
+typedef struct {
+  const int16_t *scan;
+  const int16_t *iscan;
+  const int16_t *neighbors;
+} SCAN_ORDER;
+
+typedef struct frame_contexts {
+  aom_cdf_prob txb_skip_cdf[TX_SIZES][TXB_SKIP_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob eob_extra_cdf[TX_SIZES][PLANE_TYPES][EOB_COEF_CONTEXTS]
+                            [CDF_SIZE(2)];
+  aom_cdf_prob dc_sign_cdf[PLANE_TYPES][DC_SIGN_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob eob_flag_cdf16[PLANE_TYPES][2][CDF_SIZE(5)];
+  aom_cdf_prob eob_flag_cdf32[PLANE_TYPES][2][CDF_SIZE(6)];
+  aom_cdf_prob eob_flag_cdf64[PLANE_TYPES][2][CDF_SIZE(7)];
+  aom_cdf_prob eob_flag_cdf128[PLANE_TYPES][2][CDF_SIZE(8)];
+  aom_cdf_prob eob_flag_cdf256[PLANE_TYPES][2][CDF_SIZE(9)];
+  aom_cdf_prob eob_flag_cdf512[PLANE_TYPES][2][CDF_SIZE(10)];
+  aom_cdf_prob eob_flag_cdf1024[PLANE_TYPES][2][CDF_SIZE(11)];
+  aom_cdf_prob coeff_base_eob_cdf[TX_SIZES][PLANE_TYPES][SIG_COEF_CONTEXTS_EOB]
+                                 [CDF_SIZE(3)];
+  aom_cdf_prob coeff_base_cdf[TX_SIZES][PLANE_TYPES][SIG_COEF_CONTEXTS]
+                             [CDF_SIZE(4)];
+  aom_cdf_prob coeff_br_cdf[TX_SIZES][PLANE_TYPES][LEVEL_CONTEXTS]
+                           [CDF_SIZE(BR_CDF_SIZE)];
+
+  aom_cdf_prob newmv_cdf[NEWMV_MODE_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob zeromv_cdf[GLOBALMV_MODE_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob refmv_cdf[REFMV_MODE_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob drl_cdf[DRL_MODE_CONTEXTS][CDF_SIZE(2)];
+
+  aom_cdf_prob inter_compound_mode_cdf[INTER_MODE_CONTEXTS]
+                                      [CDF_SIZE(INTER_COMPOUND_MODES)];
+  aom_cdf_prob compound_type_cdf[BLOCK_SIZES_ALL]
+                                [CDF_SIZE(MASKED_COMPOUND_TYPES)];
+  aom_cdf_prob wedge_idx_cdf[BLOCK_SIZES_ALL][CDF_SIZE(16)];
+  aom_cdf_prob interintra_cdf[BLOCK_SIZE_GROUPS][CDF_SIZE(2)];
+  aom_cdf_prob wedge_interintra_cdf[BLOCK_SIZES_ALL][CDF_SIZE(2)];
+  aom_cdf_prob interintra_mode_cdf[BLOCK_SIZE_GROUPS]
+                                  [CDF_SIZE(INTERINTRA_MODES)];
+  aom_cdf_prob motion_mode_cdf[BLOCK_SIZES_ALL][CDF_SIZE(MOTION_MODES)];
+  aom_cdf_prob obmc_cdf[BLOCK_SIZES_ALL][CDF_SIZE(2)];
+  aom_cdf_prob palette_y_size_cdf[PALATTE_BSIZE_CTXS][CDF_SIZE(PALETTE_SIZES)];
+  aom_cdf_prob palette_uv_size_cdf[PALATTE_BSIZE_CTXS][CDF_SIZE(PALETTE_SIZES)];
+  aom_cdf_prob palette_y_color_index_cdf[PALETTE_SIZES]
+                                        [PALETTE_COLOR_INDEX_CONTEXTS]
+                                        [CDF_SIZE(PALETTE_COLORS)];
+  aom_cdf_prob palette_uv_color_index_cdf[PALETTE_SIZES]
+                                         [PALETTE_COLOR_INDEX_CONTEXTS]
+                                         [CDF_SIZE(PALETTE_COLORS)];
+  aom_cdf_prob palette_y_mode_cdf[PALATTE_BSIZE_CTXS][PALETTE_Y_MODE_CONTEXTS]
+                                 [CDF_SIZE(2)];
+  aom_cdf_prob palette_uv_mode_cdf[PALETTE_UV_MODE_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob comp_inter_cdf[COMP_INTER_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob single_ref_cdf[REF_CONTEXTS][SINGLE_REFS - 1][CDF_SIZE(2)];
+  aom_cdf_prob comp_ref_type_cdf[COMP_REF_TYPE_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob uni_comp_ref_cdf[UNI_COMP_REF_CONTEXTS][UNIDIR_COMP_REFS - 1]
+                               [CDF_SIZE(2)];
+  aom_cdf_prob comp_ref_cdf[REF_CONTEXTS][FWD_REFS - 1][CDF_SIZE(2)];
+  aom_cdf_prob comp_bwdref_cdf[REF_CONTEXTS][BWD_REFS - 1][CDF_SIZE(2)];
+  aom_cdf_prob txfm_partition_cdf[TXFM_PARTITION_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob compound_index_cdf[COMP_INDEX_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob comp_group_idx_cdf[COMP_GROUP_IDX_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob skip_mode_cdfs[SKIP_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob skip_cdfs[SKIP_CONTEXTS][CDF_SIZE(2)];
+  aom_cdf_prob intra_inter_cdf[INTRA_INTER_CONTEXTS][CDF_SIZE(2)];
+  nmv_context nmvc;
+  nmv_context ndvc;
+  aom_cdf_prob intrabc_cdf[CDF_SIZE(2)];
+  struct segmentation_probs seg;
+  aom_cdf_prob filter_intra_cdfs[BLOCK_SIZES_ALL][CDF_SIZE(2)];
+  aom_cdf_prob filter_intra_mode_cdf[CDF_SIZE(FILTER_INTRA_MODES)];
+  aom_cdf_prob switchable_restore_cdf[CDF_SIZE(RESTORE_SWITCHABLE_TYPES)];
+  aom_cdf_prob wiener_restore_cdf[CDF_SIZE(2)];
+  aom_cdf_prob sgrproj_restore_cdf[CDF_SIZE(2)];
+  aom_cdf_prob y_mode_cdf[BLOCK_SIZE_GROUPS][CDF_SIZE(INTRA_MODES)];
+  aom_cdf_prob uv_mode_cdf[CFL_ALLOWED_TYPES][INTRA_MODES]
+                          [CDF_SIZE(UV_INTRA_MODES)];
+  aom_cdf_prob partition_cdf[PARTITION_CONTEXTS][CDF_SIZE(EXT_PARTITION_TYPES)];
+  aom_cdf_prob switchable_interp_cdf[SWITCHABLE_FILTER_CONTEXTS]
+                                    [CDF_SIZE(SWITCHABLE_FILTERS)];
+  /* kf_y_cdf is discarded after use, so does not require persistent storage.
+     However, we keep it with the other CDFs in this struct since it needs to
+     be copied to each tile to support parallelism just like the others.
+  */
+  aom_cdf_prob kf_y_cdf[KF_MODE_CONTEXTS][KF_MODE_CONTEXTS]
+                       [CDF_SIZE(INTRA_MODES)];
+
+  aom_cdf_prob angle_delta_cdf[DIRECTIONAL_MODES]
+                              [CDF_SIZE(2 * MAX_ANGLE_DELTA + 1)];
+
+  aom_cdf_prob tx_size_cdf[MAX_TX_CATS][TX_SIZE_CONTEXTS]
+                          [CDF_SIZE(MAX_TX_DEPTH + 1)];
+  aom_cdf_prob delta_q_cdf[CDF_SIZE(DELTA_Q_PROBS + 1)];
+  aom_cdf_prob delta_lf_multi_cdf[FRAME_LF_COUNT][CDF_SIZE(DELTA_LF_PROBS + 1)];
+  aom_cdf_prob delta_lf_cdf[CDF_SIZE(DELTA_LF_PROBS + 1)];
+  aom_cdf_prob intra_ext_tx_cdf[EXT_TX_SETS_INTRA][EXT_TX_SIZES][INTRA_MODES]
+                               [CDF_SIZE(TX_TYPES)];
+  aom_cdf_prob inter_ext_tx_cdf[EXT_TX_SETS_INTER][EXT_TX_SIZES]
+                               [CDF_SIZE(TX_TYPES)];
+  aom_cdf_prob cfl_sign_cdf[CDF_SIZE(CFL_JOINT_SIGNS)];
+  aom_cdf_prob cfl_alpha_cdf[CFL_ALPHA_CONTEXTS][CDF_SIZE(CFL_ALPHABET_SIZE)];
+  int initialized;
+} FRAME_CONTEXT;
+
+static const int av1_ext_tx_ind[EXT_TX_SET_TYPES][TX_TYPES] = {
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  { 1, 3, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  { 1, 5, 6, 4, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 0, 0 },
+  { 3, 4, 5, 8, 6, 7, 9, 10, 11, 0, 1, 2, 0, 0, 0, 0 },
+  { 7, 8, 9, 12, 10, 11, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6 },
+};
+
+static const int av1_ext_tx_inv[EXT_TX_SET_TYPES][TX_TYPES] = {
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  { 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  { 9, 0, 3, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  { 9, 0, 10, 11, 3, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+  { 9, 10, 11, 0, 1, 2, 4, 5, 3, 6, 7, 8, 0, 0, 0, 0 },
+  { 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 4, 5, 3, 6, 7, 8 },
+};
+
+void av1_set_default_ref_deltas(int8_t *ref_deltas);
+void av1_set_default_mode_deltas(int8_t *mode_deltas);
+void av1_setup_frame_contexts(struct AV1Common *cm);
+void av1_setup_past_independence(struct AV1Common *cm);
+
+// Returns (int)ceil(log2(n)).
+// NOTE: This implementation only works for n <= 2^30.
+static INLINE int av1_ceil_log2(int n) {
+  if (n < 2) return 0;
+  int i = 1, p = 2;
+  while (p < n) {
+    i++;
+    p = p << 1;
+  }
+  return i;
+}
+
+// Returns the context for palette color index at row 'r' and column 'c',
+// along with the 'color_order' of neighbors and the 'color_idx'.
+// The 'color_map' is a 2D array with the given 'stride'.
+int av1_get_palette_color_index_context(const uint8_t *color_map, int stride,
+                                        int r, int c, int palette_size,
+                                        uint8_t *color_order, int *color_idx);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_ENTROPYMODE_H_

diff --git a/libav1/av1/common/entropymv.c b/libav1/av1/common/entropymv.c
new file mode 100644
index 0000000..4913373
--- /dev/null
+++ b/libav1/av1/common/entropymv.c

@@ -0,0 +1,67 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "av1/common/onyxc_int.h"
+#include "av1/common/entropymv.h"
+
+static const nmv_context default_nmv_context = {
+  { AOM_CDF4(4096, 11264, 19328) },  // joints_cdf
+  { {
+        // Vertical component
+        { AOM_CDF11(28672, 30976, 31858, 32320, 32551, 32656, 32740, 32757,
+                    32762, 32767) },  // class_cdf // fp
+        { { AOM_CDF4(16384, 24576, 26624) },
+          { AOM_CDF4(12288, 21248, 24128) } },  // class0_fp_cdf
+        { AOM_CDF4(8192, 17408, 21248) },       // fp_cdf
+        { AOM_CDF2(128 * 128) },                // sign_cdf
+        { AOM_CDF2(160 * 128) },                // class0_hp_cdf
+        { AOM_CDF2(128 * 128) },                // hp_cdf
+        { AOM_CDF2(216 * 128) },                // class0_cdf
+        { { AOM_CDF2(128 * 136) },
+          { AOM_CDF2(128 * 140) },
+          { AOM_CDF2(128 * 148) },
+          { AOM_CDF2(128 * 160) },
+          { AOM_CDF2(128 * 176) },
+          { AOM_CDF2(128 * 192) },
+          { AOM_CDF2(128 * 224) },
+          { AOM_CDF2(128 * 234) },
+          { AOM_CDF2(128 * 234) },
+          { AOM_CDF2(128 * 240) } },  // bits_cdf
+    },
+    {
+        // Horizontal component
+        { AOM_CDF11(28672, 30976, 31858, 32320, 32551, 32656, 32740, 32757,
+                    32762, 32767) },  // class_cdf // fp
+        { { AOM_CDF4(16384, 24576, 26624) },
+          { AOM_CDF4(12288, 21248, 24128) } },  // class0_fp_cdf
+        { AOM_CDF4(8192, 17408, 21248) },       // fp_cdf
+        { AOM_CDF2(128 * 128) },                // sign_cdf
+        { AOM_CDF2(160 * 128) },                // class0_hp_cdf
+        { AOM_CDF2(128 * 128) },                // hp_cdf
+        { AOM_CDF2(216 * 128) },                // class0_cdf
+        { { AOM_CDF2(128 * 136) },
+          { AOM_CDF2(128 * 140) },
+          { AOM_CDF2(128 * 148) },
+          { AOM_CDF2(128 * 160) },
+          { AOM_CDF2(128 * 176) },
+          { AOM_CDF2(128 * 192) },
+          { AOM_CDF2(128 * 224) },
+          { AOM_CDF2(128 * 234) },
+          { AOM_CDF2(128 * 234) },
+          { AOM_CDF2(128 * 240) } },  // bits_cdf
+    } },
+};
+
+void av1_init_mv_probs(AV1_COMMON *cm) {
+  // NB: this sets CDFs too
+  cm->fc->nmvc = default_nmv_context;
+  cm->fc->ndvc = default_nmv_context;
+}

diff --git a/libav1/av1/common/entropymv.h b/libav1/av1/common/entropymv.h
new file mode 100644
index 0000000..cddc807
--- /dev/null
+++ b/libav1/av1/common/entropymv.h

@@ -0,0 +1,104 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_ENTROPYMV_H_
+#define AOM_AV1_COMMON_ENTROPYMV_H_
+
+#include "config/aom_config.h"
+
+#include "aom_dsp/prob.h"
+
+#include "av1/common/mv.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct AV1Common;
+
+void av1_init_mv_probs(struct AV1Common *cm);
+
+#define MV_UPDATE_PROB 252
+
+/* Symbols for coding which components are zero jointly */
+#define MV_JOINTS 4
+enum {
+  MV_JOINT_ZERO = 0,   /* Zero vector */
+  MV_JOINT_HNZVZ = 1,  /* Vert zero, hor nonzero */
+  MV_JOINT_HZVNZ = 2,  /* Hor zero, vert nonzero */
+  MV_JOINT_HNZVNZ = 3, /* Both components nonzero */
+} UENUM1BYTE(MV_JOINT_TYPE);
+
+static INLINE int mv_joint_vertical(MV_JOINT_TYPE type) {
+  return type == MV_JOINT_HZVNZ || type == MV_JOINT_HNZVNZ;
+}
+
+static INLINE int mv_joint_horizontal(MV_JOINT_TYPE type) {
+  return type == MV_JOINT_HNZVZ || type == MV_JOINT_HNZVNZ;
+}
+
+/* Symbols for coding magnitude class of nonzero components */
+#define MV_CLASSES 11
+enum {
+  MV_CLASS_0 = 0,   /* (0, 2]     integer pel */
+  MV_CLASS_1 = 1,   /* (2, 4]     integer pel */
+  MV_CLASS_2 = 2,   /* (4, 8]     integer pel */
+  MV_CLASS_3 = 3,   /* (8, 16]    integer pel */
+  MV_CLASS_4 = 4,   /* (16, 32]   integer pel */
+  MV_CLASS_5 = 5,   /* (32, 64]   integer pel */
+  MV_CLASS_6 = 6,   /* (64, 128]  integer pel */
+  MV_CLASS_7 = 7,   /* (128, 256] integer pel */
+  MV_CLASS_8 = 8,   /* (256, 512] integer pel */
+  MV_CLASS_9 = 9,   /* (512, 1024] integer pel */
+  MV_CLASS_10 = 10, /* (1024,2048] integer pel */
+} UENUM1BYTE(MV_CLASS_TYPE);
+
+#define CLASS0_BITS 1 /* bits at integer precision for class 0 */
+#define CLASS0_SIZE (1 << CLASS0_BITS)
+#define MV_OFFSET_BITS (MV_CLASSES + CLASS0_BITS - 2)
+#define MV_BITS_CONTEXTS 6
+#define MV_FP_SIZE 4
+
+#define MV_MAX_BITS (MV_CLASSES + CLASS0_BITS + 2)
+#define MV_MAX ((1 << MV_MAX_BITS) - 1)
+#define MV_VALS ((MV_MAX << 1) + 1)
+
+#define MV_IN_USE_BITS 14
+#define MV_UPP (1 << MV_IN_USE_BITS)
+#define MV_LOW (-(1 << MV_IN_USE_BITS))
+
+typedef struct {
+  aom_cdf_prob classes_cdf[CDF_SIZE(MV_CLASSES)];
+  aom_cdf_prob class0_fp_cdf[CLASS0_SIZE][CDF_SIZE(MV_FP_SIZE)];
+  aom_cdf_prob fp_cdf[CDF_SIZE(MV_FP_SIZE)];
+  aom_cdf_prob sign_cdf[CDF_SIZE(2)];
+  aom_cdf_prob class0_hp_cdf[CDF_SIZE(2)];
+  aom_cdf_prob hp_cdf[CDF_SIZE(2)];
+  aom_cdf_prob class0_cdf[CDF_SIZE(CLASS0_SIZE)];
+  aom_cdf_prob bits_cdf[MV_OFFSET_BITS][CDF_SIZE(2)];
+} nmv_component;
+
+typedef struct {
+  aom_cdf_prob joints_cdf[CDF_SIZE(MV_JOINTS)];
+  nmv_component comps[2];
+} nmv_context;
+
+enum {
+  MV_SUBPEL_NONE = -1,
+  MV_SUBPEL_LOW_PRECISION = 0,
+  MV_SUBPEL_HIGH_PRECISION,
+} SENUM1BYTE(MvSubpelPrecision);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_ENTROPYMV_H_

diff --git a/libav1/av1/common/enums.h b/libav1/av1/common/enums.h
new file mode 100644
index 0000000..8d003d6
--- /dev/null
+++ b/libav1/av1/common/enums.h

@@ -0,0 +1,617 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_ENUMS_H_
+#define AOM_AV1_COMMON_ENUMS_H_
+
+#include "config/aom_config.h"
+
+#include "aom/aom_codec.h"
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#undef MAX_SB_SIZE
+
+// Max superblock size
+#define MAX_SB_SIZE_LOG2 7
+#define MAX_SB_SIZE (1 << MAX_SB_SIZE_LOG2)
+#define MAX_SB_SQUARE (MAX_SB_SIZE * MAX_SB_SIZE)
+
+// Min superblock size
+#define MIN_SB_SIZE_LOG2 6
+
+// Pixels per Mode Info (MI) unit
+#define MI_SIZE_LOG2 2
+#define MI_SIZE (1 << MI_SIZE_LOG2)
+
+// MI-units per max superblock (MI Block - MIB)
+#define MAX_MIB_SIZE_LOG2 (MAX_SB_SIZE_LOG2 - MI_SIZE_LOG2)
+#define MAX_MIB_SIZE (1 << MAX_MIB_SIZE_LOG2)
+
+// MI-units per min superblock
+#define MIN_MIB_SIZE_LOG2 (MIN_SB_SIZE_LOG2 - MI_SIZE_LOG2)
+
+// Mask to extract MI offset within max MIB
+#define MAX_MIB_MASK (MAX_MIB_SIZE - 1)
+
+// Maximum number of tile rows and tile columns
+#define MAX_TILE_ROWS 64
+#define MAX_TILE_COLS 64
+
+#define MAX_VARTX_DEPTH 2
+
+#define MI_SIZE_64X64 (64 >> MI_SIZE_LOG2)
+#define MI_SIZE_128X128 (128 >> MI_SIZE_LOG2)
+
+#define MAX_PALETTE_SQUARE (64 * 64)
+// Maximum number of colors in a palette.
+#define PALETTE_MAX_SIZE 8
+// Minimum number of colors in a palette.
+#define PALETTE_MIN_SIZE 2
+
+#define FRAME_OFFSET_BITS 5
+#define MAX_FRAME_DISTANCE ((1 << FRAME_OFFSET_BITS) - 1)
+
+// 4 frame filter levels: y plane vertical, y plane horizontal,
+// u plane, and v plane
+#define FRAME_LF_COUNT 4
+#define DEFAULT_DELTA_LF_MULTI 0
+#define MAX_MODE_LF_DELTAS 2
+
+#define DIST_PRECISION_BITS 4
+#define DIST_PRECISION (1 << DIST_PRECISION_BITS)  // 16
+
+// TODO(chengchen): Temporal flag serve as experimental flag for WIP
+// bitmask construction.
+// Shall be removed when bitmask code is completely checkedin
+#define LOOP_FILTER_BITMASK 0
+
+#define PROFILE_BITS 3
+// The following three profiles are currently defined.
+// Profile 0.  8-bit and 10-bit 4:2:0 and 4:0:0 only.
+// Profile 1.  8-bit and 10-bit 4:4:4
+// Profile 2.  8-bit and 10-bit 4:2:2
+//            12-bit  4:0:0, 4:2:2 and 4:4:4
+// Since we have three bits for the profiles, it can be extended later.
+enum {
+  PROFILE_0,
+  PROFILE_1,
+  PROFILE_2,
+  MAX_PROFILES,
+} SENUM1BYTE(BITSTREAM_PROFILE);
+
+#define LEVEL_MAJOR_BITS 3
+#define LEVEL_MINOR_BITS 2
+#define LEVEL_BITS (LEVEL_MAJOR_BITS + LEVEL_MINOR_BITS)
+
+#define LEVEL_MAJOR_MIN 2
+#define LEVEL_MAJOR_MAX ((1 << LEVEL_MAJOR_BITS) - 1 + LEVEL_MAJOR_MIN)
+#define LEVEL_MINOR_MIN 0
+#define LEVEL_MINOR_MAX ((1 << LEVEL_MINOR_BITS) - 1)
+
+#define OP_POINTS_CNT_MINUS_1_BITS 5
+#define OP_POINTS_IDC_BITS 12
+
+// Note: Some enums use the attribute 'packed' to use smallest possible integer
+// type, so that we can save memory when they are used in structs/arrays.
+
+enum {
+  BLOCK_4X4,
+  BLOCK_4X8,
+  BLOCK_8X4,
+  BLOCK_8X8,
+  BLOCK_8X16,
+  BLOCK_16X8,
+  BLOCK_16X16,
+  BLOCK_16X32,
+  BLOCK_32X16,
+  BLOCK_32X32,
+  BLOCK_32X64,
+  BLOCK_64X32,
+  BLOCK_64X64,
+  BLOCK_64X128,
+  BLOCK_128X64,
+  BLOCK_128X128,
+  BLOCK_4X16,
+  BLOCK_16X4,
+  BLOCK_8X32,
+  BLOCK_32X8,
+  BLOCK_16X64,
+  BLOCK_64X16,
+  BLOCK_SIZES_ALL,
+  BLOCK_SIZES = BLOCK_4X16,
+  BLOCK_INVALID = 255,
+  BLOCK_LARGEST = (BLOCK_SIZES - 1)
+} UENUM1BYTE(BLOCK_SIZE);
+
+// 4X4, 8X8, 16X16, 32X32, 64X64, 128X128
+#define SQR_BLOCK_SIZES 6
+
+enum {
+  PARTITION_NONE,
+  PARTITION_HORZ,
+  PARTITION_VERT,
+  PARTITION_SPLIT,
+  PARTITION_HORZ_A,  // HORZ split and the top partition is split again
+  PARTITION_HORZ_B,  // HORZ split and the bottom partition is split again
+  PARTITION_VERT_A,  // VERT split and the left partition is split again
+  PARTITION_VERT_B,  // VERT split and the right partition is split again
+  PARTITION_HORZ_4,  // 4:1 horizontal partition
+  PARTITION_VERT_4,  // 4:1 vertical partition
+  EXT_PARTITION_TYPES,
+  PARTITION_TYPES = PARTITION_SPLIT + 1,
+  PARTITION_INVALID = 255
+} UENUM1BYTE(PARTITION_TYPE);
+
+typedef char PARTITION_CONTEXT;
+#define PARTITION_PLOFFSET 4  // number of probability models per block size
+#define PARTITION_BLOCK_SIZES 5
+#define PARTITION_CONTEXTS (PARTITION_BLOCK_SIZES * PARTITION_PLOFFSET)
+
+// block transform size
+enum {
+  TX_4X4,             // 4x4 transform
+  TX_8X8,             // 8x8 transform
+  TX_16X16,           // 16x16 transform
+  TX_32X32,           // 32x32 transform
+  TX_64X64,           // 64x64 transform
+  TX_4X8,             // 4x8 transform
+  TX_8X4,             // 8x4 transform
+  TX_8X16,            // 8x16 transform
+  TX_16X8,            // 16x8 transform
+  TX_16X32,           // 16x32 transform
+  TX_32X16,           // 32x16 transform
+  TX_32X64,           // 32x64 transform
+  TX_64X32,           // 64x32 transform
+  TX_4X16,            // 4x16 transform
+  TX_16X4,            // 16x4 transform
+  TX_8X32,            // 8x32 transform
+  TX_32X8,            // 32x8 transform
+  TX_16X64,           // 16x64 transform
+  TX_64X16,           // 64x16 transform
+  TX_SIZES_ALL,       // Includes rectangular transforms
+  TX_SIZES = TX_4X8,  // Does NOT include rectangular transforms
+  TX_SIZES_LARGEST = TX_64X64,
+  TX_INVALID = 255  // Invalid transform size
+} UENUM1BYTE(TX_SIZE);
+
+#define TX_SIZE_LUMA_MIN (TX_4X4)
+/* We don't need to code a transform size unless the allowed size is at least
+   one more than the minimum. */
+#define TX_SIZE_CTX_MIN (TX_SIZE_LUMA_MIN + 1)
+
+// Maximum tx_size categories
+#define MAX_TX_CATS (TX_SIZES - TX_SIZE_CTX_MIN)
+#define MAX_TX_DEPTH 2
+
+#define MAX_TX_SIZE_LOG2 (6)
+#define MAX_TX_SIZE (1 << MAX_TX_SIZE_LOG2)
+#define MIN_TX_SIZE_LOG2 2
+#define MIN_TX_SIZE (1 << MIN_TX_SIZE_LOG2)
+#define MAX_TX_SQUARE (MAX_TX_SIZE * MAX_TX_SIZE)
+
+// Pad 4 extra columns to remove horizontal availability check.
+#define TX_PAD_HOR_LOG2 2
+#define TX_PAD_HOR 4
+// Pad 6 extra rows (2 on top and 4 on bottom) to remove vertical availability
+// check.
+#define TX_PAD_TOP 0
+#define TX_PAD_BOTTOM 4
+#define TX_PAD_VER (TX_PAD_TOP + TX_PAD_BOTTOM)
+// Pad 16 extra bytes to avoid reading overflow in SIMD optimization.
+#define TX_PAD_END 16
+#define TX_PAD_2D ((32 + TX_PAD_HOR) * (32 + TX_PAD_VER) + TX_PAD_END)
+
+// Number of maxium size transform blocks in the maximum size superblock
+#define MAX_TX_BLOCKS_IN_MAX_SB_LOG2 ((MAX_SB_SIZE_LOG2 - MAX_TX_SIZE_LOG2) * 2)
+#define MAX_TX_BLOCKS_IN_MAX_SB (1 << MAX_TX_BLOCKS_IN_MAX_SB_LOG2)
+
+// frame transform mode
+enum {
+  ONLY_4X4,         // use only 4x4 transform
+  TX_MODE_LARGEST,  // transform size is the largest possible for pu size
+  TX_MODE_SELECT,   // transform specified for each block
+  TX_MODES,
+} UENUM1BYTE(TX_MODE);
+
+// 1D tx types
+enum {
+  DCT_1D,
+  ADST_1D,
+  FLIPADST_1D,
+  IDTX_1D,
+  TX_TYPES_1D,
+} UENUM1BYTE(TX_TYPE_1D);
+
+enum {
+  DCT_DCT,            // DCT in both horizontal and vertical
+  ADST_DCT,           // ADST in vertical, DCT in horizontal
+  DCT_ADST,           // DCT in vertical, ADST in horizontal
+  ADST_ADST,          // ADST in both directions
+  FLIPADST_DCT,       // FLIPADST in vertical, DCT in horizontal
+  DCT_FLIPADST,       // DCT in vertical, FLIPADST in horizontal
+  FLIPADST_FLIPADST,  // FLIPADST in both directions
+  ADST_FLIPADST,      // ADST in vertical, FLIPADST in horizontal
+  FLIPADST_ADST,      // FLIPADST in vertical, ADST in horizontal
+  IDTX,               // Identity in both directions
+  V_DCT,              // DCT in vertical, identity in horizontal
+  H_DCT,              // Identity in vertical, DCT in horizontal
+  V_ADST,             // ADST in vertical, identity in horizontal
+  H_ADST,             // Identity in vertical, ADST in horizontal
+  V_FLIPADST,         // FLIPADST in vertical, identity in horizontal
+  H_FLIPADST,         // Identity in vertical, FLIPADST in horizontal
+  TX_TYPES,
+} UENUM1BYTE(TX_TYPE);
+
+enum {
+  REG_REG,
+  REG_SMOOTH,
+  REG_SHARP,
+  SMOOTH_REG,
+  SMOOTH_SMOOTH,
+  SMOOTH_SHARP,
+  SHARP_REG,
+  SHARP_SMOOTH,
+  SHARP_SHARP,
+} UENUM1BYTE(DUAL_FILTER_TYPE);
+
+enum {
+  // DCT only
+  EXT_TX_SET_DCTONLY,
+  // DCT + Identity only
+  EXT_TX_SET_DCT_IDTX,
+  // Discrete Trig transforms w/o flip (4) + Identity (1)
+  EXT_TX_SET_DTT4_IDTX,
+  // Discrete Trig transforms w/o flip (4) + Identity (1) + 1D Hor/vert DCT (2)
+  EXT_TX_SET_DTT4_IDTX_1DDCT,
+  // Discrete Trig transforms w/ flip (9) + Identity (1) + 1D Hor/Ver DCT (2)
+  EXT_TX_SET_DTT9_IDTX_1DDCT,
+  // Discrete Trig transforms w/ flip (9) + Identity (1) + 1D Hor/Ver (6)
+  EXT_TX_SET_ALL16,
+  EXT_TX_SET_TYPES
+} UENUM1BYTE(TxSetType);
+
+#define IS_2D_TRANSFORM(tx_type) (tx_type < IDTX)
+
+#define EXT_TX_SIZES 4       // number of sizes that use extended transforms
+#define EXT_TX_SETS_INTER 4  // Sets of transform selections for INTER
+#define EXT_TX_SETS_INTRA 3  // Sets of transform selections for INTRA
+
+enum {
+  AOM_LAST_FLAG = 1 << 0,
+  AOM_LAST2_FLAG = 1 << 1,
+  AOM_LAST3_FLAG = 1 << 2,
+  AOM_GOLD_FLAG = 1 << 3,
+  AOM_BWD_FLAG = 1 << 4,
+  AOM_ALT2_FLAG = 1 << 5,
+  AOM_ALT_FLAG = 1 << 6,
+  AOM_REFFRAME_ALL = (1 << 7) - 1
+} UENUM1BYTE(AOM_REFFRAME);
+
+enum {
+  UNIDIR_COMP_REFERENCE,
+  BIDIR_COMP_REFERENCE,
+  COMP_REFERENCE_TYPES,
+} UENUM1BYTE(COMP_REFERENCE_TYPE);
+
+enum { PLANE_TYPE_Y, PLANE_TYPE_UV, PLANE_TYPES } UENUM1BYTE(PLANE_TYPE);
+
+#define CFL_ALPHABET_SIZE_LOG2 4
+#define CFL_ALPHABET_SIZE (1 << CFL_ALPHABET_SIZE_LOG2)
+#define CFL_MAGS_SIZE ((2 << CFL_ALPHABET_SIZE_LOG2) + 1)
+#define CFL_IDX_U(idx) (idx >> CFL_ALPHABET_SIZE_LOG2)
+#define CFL_IDX_V(idx) (idx & (CFL_ALPHABET_SIZE - 1))
+
+enum { CFL_PRED_U, CFL_PRED_V, CFL_PRED_PLANES } UENUM1BYTE(CFL_PRED_TYPE);
+
+enum {
+  CFL_SIGN_ZERO,
+  CFL_SIGN_NEG,
+  CFL_SIGN_POS,
+  CFL_SIGNS
+} UENUM1BYTE(CFL_SIGN_TYPE);
+
+enum {
+  CFL_DISALLOWED,
+  CFL_ALLOWED,
+  CFL_ALLOWED_TYPES
+} UENUM1BYTE(CFL_ALLOWED_TYPE);
+
+// CFL_SIGN_ZERO,CFL_SIGN_ZERO is invalid
+#define CFL_JOINT_SIGNS (CFL_SIGNS * CFL_SIGNS - 1)
+// CFL_SIGN_U is equivalent to (js + 1) / 3 for js in 0 to 8
+#define CFL_SIGN_U(js) (((js + 1) * 11) >> 5)
+// CFL_SIGN_V is equivalent to (js + 1) % 3 for js in 0 to 8
+#define CFL_SIGN_V(js) ((js + 1) - CFL_SIGNS * CFL_SIGN_U(js))
+
+// There is no context when the alpha for a given plane is zero.
+// So there are 2 fewer contexts than joint signs.
+#define CFL_ALPHA_CONTEXTS (CFL_JOINT_SIGNS + 1 - CFL_SIGNS)
+#define CFL_CONTEXT_U(js) (js + 1 - CFL_SIGNS)
+// Also, the contexts are symmetric under swapping the planes.
+#define CFL_CONTEXT_V(js) \
+  (CFL_SIGN_V(js) * CFL_SIGNS + CFL_SIGN_U(js) - CFL_SIGNS)
+
+enum {
+  PALETTE_MAP,
+  COLOR_MAP_TYPES,
+} UENUM1BYTE(COLOR_MAP_TYPE);
+
+enum {
+  TWO_COLORS,
+  THREE_COLORS,
+  FOUR_COLORS,
+  FIVE_COLORS,
+  SIX_COLORS,
+  SEVEN_COLORS,
+  EIGHT_COLORS,
+  PALETTE_SIZES
+} UENUM1BYTE(PALETTE_SIZE);
+
+enum {
+  PALETTE_COLOR_ONE,
+  PALETTE_COLOR_TWO,
+  PALETTE_COLOR_THREE,
+  PALETTE_COLOR_FOUR,
+  PALETTE_COLOR_FIVE,
+  PALETTE_COLOR_SIX,
+  PALETTE_COLOR_SEVEN,
+  PALETTE_COLOR_EIGHT,
+  PALETTE_COLORS
+} UENUM1BYTE(PALETTE_COLOR);
+
+// Note: All directional predictors must be between V_PRED and D67_PRED (both
+// inclusive).
+enum {
+  DC_PRED,        // Average of above and left pixels
+  V_PRED,         // Vertical
+  H_PRED,         // Horizontal
+  D45_PRED,       // Directional 45  degree
+  D135_PRED,      // Directional 135 degree
+  D113_PRED,      // Directional 113 degree
+  D157_PRED,      // Directional 157 degree
+  D203_PRED,      // Directional 203 degree
+  D67_PRED,       // Directional 67  degree
+  SMOOTH_PRED,    // Combination of horizontal and vertical interpolation
+  SMOOTH_V_PRED,  // Vertical interpolation
+  SMOOTH_H_PRED,  // Horizontal interpolation
+  PAETH_PRED,     // Predict from the direction of smallest gradient
+  NEARESTMV,
+  NEARMV,
+  GLOBALMV,
+  NEWMV,
+  // Compound ref compound modes
+  NEAREST_NEARESTMV,
+  NEAR_NEARMV,
+  NEAREST_NEWMV,
+  NEW_NEARESTMV,
+  NEAR_NEWMV,
+  NEW_NEARMV,
+  GLOBAL_GLOBALMV,
+  NEW_NEWMV,
+  MB_MODE_COUNT,
+  INTRA_MODE_START = DC_PRED,
+  INTRA_MODE_END = NEARESTMV,
+  INTRA_MODE_NUM = INTRA_MODE_END - INTRA_MODE_START,
+  SINGLE_INTER_MODE_START = NEARESTMV,
+  SINGLE_INTER_MODE_END = NEAREST_NEARESTMV,
+  SINGLE_INTER_MODE_NUM = SINGLE_INTER_MODE_END - SINGLE_INTER_MODE_START,
+  COMP_INTER_MODE_START = NEAREST_NEARESTMV,
+  COMP_INTER_MODE_END = MB_MODE_COUNT,
+  COMP_INTER_MODE_NUM = COMP_INTER_MODE_END - COMP_INTER_MODE_START,
+  INTER_MODE_START = NEARESTMV,
+  INTER_MODE_END = MB_MODE_COUNT,
+  INTRA_MODES = PAETH_PRED + 1,  // PAETH_PRED has to be the last intra mode.
+  INTRA_INVALID = MB_MODE_COUNT  // For uv_mode in inter blocks
+} UENUM1BYTE(PREDICTION_MODE);
+
+// TODO(ltrudeau) Do we really want to pack this?
+// TODO(ltrudeau) Do we match with PREDICTION_MODE?
+enum {
+  UV_DC_PRED,        // Average of above and left pixels
+  UV_V_PRED,         // Vertical
+  UV_H_PRED,         // Horizontal
+  UV_D45_PRED,       // Directional 45  degree
+  UV_D135_PRED,      // Directional 135 degree
+  UV_D113_PRED,      // Directional 113 degree
+  UV_D157_PRED,      // Directional 157 degree
+  UV_D203_PRED,      // Directional 203 degree
+  UV_D67_PRED,       // Directional 67  degree
+  UV_SMOOTH_PRED,    // Combination of horizontal and vertical interpolation
+  UV_SMOOTH_V_PRED,  // Vertical interpolation
+  UV_SMOOTH_H_PRED,  // Horizontal interpolation
+  UV_PAETH_PRED,     // Predict from the direction of smallest gradient
+  UV_CFL_PRED,       // Chroma-from-Luma
+  UV_INTRA_MODES,
+  UV_MODE_INVALID,  // For uv_mode in inter blocks
+} UENUM1BYTE(UV_PREDICTION_MODE);
+
+enum {
+  SIMPLE_TRANSLATION,
+  OBMC_CAUSAL,    // 2-sided OBMC
+  WARPED_CAUSAL,  // 2-sided WARPED
+  MOTION_MODES
+} UENUM1BYTE(MOTION_MODE);
+
+enum {
+  II_DC_PRED,
+  II_V_PRED,
+  II_H_PRED,
+  II_SMOOTH_PRED,
+  INTERINTRA_MODES
+} UENUM1BYTE(INTERINTRA_MODE);
+
+enum {
+  COMPOUND_AVERAGE,
+  COMPOUND_DISTWTD,
+  COMPOUND_WEDGE,
+  COMPOUND_DIFFWTD,
+  COMPOUND_TYPES,
+  MASKED_COMPOUND_TYPES = 2,
+} UENUM4BYTE(COMPOUND_TYPE);
+
+enum {
+  FILTER_DC_PRED,
+  FILTER_V_PRED,
+  FILTER_H_PRED,
+  FILTER_D157_PRED,
+  FILTER_PAETH_PRED,
+  FILTER_INTRA_MODES,
+} UENUM1BYTE(FILTER_INTRA_MODE);
+
+#define DIRECTIONAL_MODES 8
+#define MAX_ANGLE_DELTA 3
+#define ANGLE_STEP 3
+
+#define INTER_MODES (1 + NEWMV - NEARESTMV)
+
+#define INTER_COMPOUND_MODES (1 + NEW_NEWMV - NEAREST_NEARESTMV)
+
+#define SKIP_CONTEXTS 3
+#define SKIP_MODE_CONTEXTS 3
+
+#define COMP_INDEX_CONTEXTS 6
+#define COMP_GROUP_IDX_CONTEXTS 6
+
+#define NMV_CONTEXTS 3
+
+#define NEWMV_MODE_CONTEXTS 6
+#define GLOBALMV_MODE_CONTEXTS 2
+#define REFMV_MODE_CONTEXTS 6
+#define DRL_MODE_CONTEXTS 3
+
+#define GLOBALMV_OFFSET 3
+#define REFMV_OFFSET 4
+
+#define NEWMV_CTX_MASK ((1 << GLOBALMV_OFFSET) - 1)
+#define GLOBALMV_CTX_MASK ((1 << (REFMV_OFFSET - GLOBALMV_OFFSET)) - 1)
+#define REFMV_CTX_MASK ((1 << (8 - REFMV_OFFSET)) - 1)
+
+#define COMP_NEWMV_CTXS 5
+#define INTER_MODE_CONTEXTS 8
+
+#define DELTA_Q_SMALL 3
+#define DELTA_Q_PROBS (DELTA_Q_SMALL)
+#define DEFAULT_DELTA_Q_RES 4
+#define DELTA_LF_SMALL 3
+#define DELTA_LF_PROBS (DELTA_LF_SMALL)
+#define DEFAULT_DELTA_LF_RES 2
+
+/* Segment Feature Masks */
+#define MAX_MV_REF_CANDIDATES 2
+
+#define MAX_REF_MV_STACK_SIZE 8
+#define REF_CAT_LEVEL 640
+
+#define INTRA_INTER_CONTEXTS 4
+#define COMP_INTER_CONTEXTS 5
+#define REF_CONTEXTS 3
+
+#define COMP_REF_TYPE_CONTEXTS 5
+#define UNI_COMP_REF_CONTEXTS 3
+
+#define TXFM_PARTITION_CONTEXTS ((TX_SIZES - TX_8X8) * 6 - 3)
+typedef uint8_t TXFM_CONTEXT;
+
+// An enum for single reference types (and some derived values).
+enum {
+  NONE_FRAME = -1,
+  INTRA_FRAME,
+  LAST_FRAME,
+  LAST2_FRAME,
+  LAST3_FRAME,
+  GOLDEN_FRAME,
+  BWDREF_FRAME,
+  ALTREF2_FRAME,
+  ALTREF_FRAME,
+  REF_FRAMES,
+
+  // Extra/scratch reference frame. It may be:
+  // - used to update the ALTREF2_FRAME ref (see lshift_bwd_ref_frames()), or
+  // - updated from ALTREF2_FRAME ref (see rshift_bwd_ref_frames()).
+  EXTREF_FRAME = REF_FRAMES,
+
+  // Number of inter (non-intra) reference types.
+  INTER_REFS_PER_FRAME = ALTREF_FRAME - LAST_FRAME + 1,
+
+  // Number of forward (aka past) reference types.
+  FWD_REFS = GOLDEN_FRAME - LAST_FRAME + 1,
+
+  // Number of backward (aka future) reference types.
+  BWD_REFS = ALTREF_FRAME - BWDREF_FRAME + 1,
+
+  SINGLE_REFS = FWD_REFS + BWD_REFS,
+};
+
+#define REF_FRAMES_LOG2 3
+
+// REF_FRAMES for the cm->ref_frame_map array, 1 scratch frame for the new
+// frame in cm->cur_frame, INTER_REFS_PER_FRAME for scaled references on the
+// encoder in the cpi->scaled_ref_buf array.
+#define FRAME_BUFFERS (REF_FRAMES + 1 + INTER_REFS_PER_FRAME)
+
+#define FWD_RF_OFFSET(ref) (ref - LAST_FRAME)
+#define BWD_RF_OFFSET(ref) (ref - BWDREF_FRAME)
+
+enum {
+  LAST_LAST2_FRAMES,      // { LAST_FRAME, LAST2_FRAME }
+  LAST_LAST3_FRAMES,      // { LAST_FRAME, LAST3_FRAME }
+  LAST_GOLDEN_FRAMES,     // { LAST_FRAME, GOLDEN_FRAME }
+  BWDREF_ALTREF_FRAMES,   // { BWDREF_FRAME, ALTREF_FRAME }
+  LAST2_LAST3_FRAMES,     // { LAST2_FRAME, LAST3_FRAME }
+  LAST2_GOLDEN_FRAMES,    // { LAST2_FRAME, GOLDEN_FRAME }
+  LAST3_GOLDEN_FRAMES,    // { LAST3_FRAME, GOLDEN_FRAME }
+  BWDREF_ALTREF2_FRAMES,  // { BWDREF_FRAME, ALTREF2_FRAME }
+  ALTREF2_ALTREF_FRAMES,  // { ALTREF2_FRAME, ALTREF_FRAME }
+  TOTAL_UNIDIR_COMP_REFS,
+  // NOTE: UNIDIR_COMP_REFS is the number of uni-directional reference pairs
+  //       that are explicitly signaled.
+  UNIDIR_COMP_REFS = BWDREF_ALTREF_FRAMES + 1,
+} UENUM1BYTE(UNIDIR_COMP_REF);
+
+#define TOTAL_COMP_REFS (FWD_REFS * BWD_REFS + TOTAL_UNIDIR_COMP_REFS)
+
+#define COMP_REFS (FWD_REFS * BWD_REFS + UNIDIR_COMP_REFS)
+
+// NOTE: A limited number of unidirectional reference pairs can be signalled for
+//       compound prediction. The use of skip mode, on the other hand, makes it
+//       possible to have a reference pair not listed for explicit signaling.
+#define MODE_CTX_REF_FRAMES (REF_FRAMES + TOTAL_COMP_REFS)
+
+// Note: It includes single and compound references. So, it can take values from
+// NONE_FRAME to (MODE_CTX_REF_FRAMES - 1). Hence, it is not defined as an enum.
+typedef int8_t MV_REFERENCE_FRAME;
+
+enum {
+  RESTORE_NONE,
+  RESTORE_WIENER,
+  RESTORE_SGRPROJ,
+  RESTORE_SWITCHABLE,
+  RESTORE_SWITCHABLE_TYPES = RESTORE_SWITCHABLE,
+  RESTORE_TYPES = 4,
+} UENUM1BYTE(RestorationType);
+
+#define SUPERRES_SCALE_BITS 3
+#define SUPERRES_SCALE_DENOMINATOR_MIN (SCALE_NUMERATOR + 1)
+
+// In large_scale_tile coding, external references are used.
+#define MAX_EXTERNAL_REFERENCES 128
+#define MAX_TILES 512
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_ENUMS_H_

diff --git a/libav1/av1/common/filter.h b/libav1/av1/common/filter.h
new file mode 100644
index 0000000..184f5b2
--- /dev/null
+++ b/libav1/av1/common/filter.h

@@ -0,0 +1,234 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_FILTER_H_
+#define AOM_AV1_COMMON_FILTER_H_
+
+#include <assert.h>
+
+#include "config/aom_config.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/aom_filter.h"
+#include "aom_ports/mem.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_FILTER_TAP 8
+
+typedef enum ATTRIBUTE_PACKED {
+  EIGHTTAP_REGULAR,
+  EIGHTTAP_SMOOTH,
+  MULTITAP_SHARP,
+  BILINEAR,
+  INTERP_FILTERS_ALL,
+  SWITCHABLE_FILTERS = BILINEAR,
+  SWITCHABLE = SWITCHABLE_FILTERS + 1, /* the last switchable one */
+  EXTRA_FILTERS = INTERP_FILTERS_ALL - SWITCHABLE_FILTERS,
+} InterpFilter;
+
+enum {
+  USE_2_TAPS_ORIG = 0,  // This is used in temporal filtering.
+  USE_2_TAPS,
+  USE_4_TAPS,
+  USE_8_TAPS,
+} UENUM1BYTE(SUBPEL_SEARCH_TYPE);
+
+// Pack two InterpFilter's into a uint32_t: since there are at most 10 filters,
+// we can use 16 bits for each and have more than enough space. This reduces
+// argument passing and unifies the operation of setting a (pair of) filters.
+typedef uint32_t InterpFilters;
+static INLINE InterpFilter av1_extract_interp_filter(InterpFilters filters,
+                                                     int x_filter) {
+  return (InterpFilter)((filters >> (x_filter ? 16 : 0)) & 0xf);
+}
+
+static INLINE InterpFilters av1_make_interp_filters(InterpFilter y_filter,
+                                                    InterpFilter x_filter) {
+  uint16_t y16 = y_filter & 0xf;
+  uint16_t x16 = x_filter & 0xf;
+  return y16 | ((uint32_t)x16 << 16);
+}
+
+static INLINE InterpFilters av1_broadcast_interp_filter(InterpFilter filter) {
+  return av1_make_interp_filters(filter, filter);
+}
+
+static INLINE InterpFilter av1_unswitchable_filter(InterpFilter filter) {
+  return filter == SWITCHABLE ? EIGHTTAP_REGULAR : filter;
+}
+
+/* (1 << LOG_SWITCHABLE_FILTERS) > SWITCHABLE_FILTERS */
+#define LOG_SWITCHABLE_FILTERS 2
+
+#define MAX_SUBPEL_TAPS 12
+#define SWITCHABLE_FILTER_CONTEXTS ((SWITCHABLE_FILTERS + 1) * 4)
+#define INTER_FILTER_COMP_OFFSET (SWITCHABLE_FILTERS + 1)
+#define INTER_FILTER_DIR_OFFSET ((SWITCHABLE_FILTERS + 1) * 2)
+
+typedef struct InterpFilterParams {
+  const int16_t *filter_ptr;
+  uint16_t taps;
+  uint16_t subpel_shifts;
+  InterpFilter interp_filter;
+} InterpFilterParams;
+
+DECLARE_ALIGNED(256, static const InterpKernel,
+                av1_bilinear_filters[SUBPEL_SHIFTS]) = {
+  { 0, 0, 0, 128, 0, 0, 0, 0 },  { 0, 0, 0, 120, 8, 0, 0, 0 },
+  { 0, 0, 0, 112, 16, 0, 0, 0 }, { 0, 0, 0, 104, 24, 0, 0, 0 },
+  { 0, 0, 0, 96, 32, 0, 0, 0 },  { 0, 0, 0, 88, 40, 0, 0, 0 },
+  { 0, 0, 0, 80, 48, 0, 0, 0 },  { 0, 0, 0, 72, 56, 0, 0, 0 },
+  { 0, 0, 0, 64, 64, 0, 0, 0 },  { 0, 0, 0, 56, 72, 0, 0, 0 },
+  { 0, 0, 0, 48, 80, 0, 0, 0 },  { 0, 0, 0, 40, 88, 0, 0, 0 },
+  { 0, 0, 0, 32, 96, 0, 0, 0 },  { 0, 0, 0, 24, 104, 0, 0, 0 },
+  { 0, 0, 0, 16, 112, 0, 0, 0 }, { 0, 0, 0, 8, 120, 0, 0, 0 }
+};
+
+DECLARE_ALIGNED(256, static const InterpKernel,
+                av1_sub_pel_filters_8[SUBPEL_SHIFTS]) = {
+  { 0, 0, 0, 128, 0, 0, 0, 0 },      { 0, 2, -6, 126, 8, -2, 0, 0 },
+  { 0, 2, -10, 122, 18, -4, 0, 0 },  { 0, 2, -12, 116, 28, -8, 2, 0 },
+  { 0, 2, -14, 110, 38, -10, 2, 0 }, { 0, 2, -14, 102, 48, -12, 2, 0 },
+  { 0, 2, -16, 94, 58, -12, 2, 0 },  { 0, 2, -14, 84, 66, -12, 2, 0 },
+  { 0, 2, -14, 76, 76, -14, 2, 0 },  { 0, 2, -12, 66, 84, -14, 2, 0 },
+  { 0, 2, -12, 58, 94, -16, 2, 0 },  { 0, 2, -12, 48, 102, -14, 2, 0 },
+  { 0, 2, -10, 38, 110, -14, 2, 0 }, { 0, 2, -8, 28, 116, -12, 2, 0 },
+  { 0, 0, -4, 18, 122, -10, 2, 0 },  { 0, 0, -2, 8, 126, -6, 2, 0 }
+};
+
+DECLARE_ALIGNED(256, static const InterpKernel,
+                av1_sub_pel_filters_8sharp[SUBPEL_SHIFTS]) = {
+  { 0, 0, 0, 128, 0, 0, 0, 0 },         { -2, 2, -6, 126, 8, -2, 2, 0 },
+  { -2, 6, -12, 124, 16, -6, 4, -2 },   { -2, 8, -18, 120, 26, -10, 6, -2 },
+  { -4, 10, -22, 116, 38, -14, 6, -2 }, { -4, 10, -22, 108, 48, -18, 8, -2 },
+  { -4, 10, -24, 100, 60, -20, 8, -2 }, { -4, 10, -24, 90, 70, -22, 10, -2 },
+  { -4, 12, -24, 80, 80, -24, 12, -4 }, { -2, 10, -22, 70, 90, -24, 10, -4 },
+  { -2, 8, -20, 60, 100, -24, 10, -4 }, { -2, 8, -18, 48, 108, -22, 10, -4 },
+  { -2, 6, -14, 38, 116, -22, 10, -4 }, { -2, 6, -10, 26, 120, -18, 8, -2 },
+  { -2, 4, -6, 16, 124, -12, 6, -2 },   { 0, 2, -2, 8, 126, -6, 2, -2 }
+};
+
+DECLARE_ALIGNED(256, static const InterpKernel,
+                av1_sub_pel_filters_8smooth[SUBPEL_SHIFTS]) = {
+  { 0, 0, 0, 128, 0, 0, 0, 0 },     { 0, 2, 28, 62, 34, 2, 0, 0 },
+  { 0, 0, 26, 62, 36, 4, 0, 0 },    { 0, 0, 22, 62, 40, 4, 0, 0 },
+  { 0, 0, 20, 60, 42, 6, 0, 0 },    { 0, 0, 18, 58, 44, 8, 0, 0 },
+  { 0, 0, 16, 56, 46, 10, 0, 0 },   { 0, -2, 16, 54, 48, 12, 0, 0 },
+  { 0, -2, 14, 52, 52, 14, -2, 0 }, { 0, 0, 12, 48, 54, 16, -2, 0 },
+  { 0, 0, 10, 46, 56, 16, 0, 0 },   { 0, 0, 8, 44, 58, 18, 0, 0 },
+  { 0, 0, 6, 42, 60, 20, 0, 0 },    { 0, 0, 4, 40, 62, 22, 0, 0 },
+  { 0, 0, 4, 36, 62, 26, 0, 0 },    { 0, 0, 2, 34, 62, 28, 2, 0 }
+};
+
+static const InterpFilterParams
+    av1_interp_filter_params_list[SWITCHABLE_FILTERS + 1] = {
+      { (const int16_t *)av1_sub_pel_filters_8, SUBPEL_TAPS, SUBPEL_SHIFTS,
+        EIGHTTAP_REGULAR },
+      { (const int16_t *)av1_sub_pel_filters_8smooth, SUBPEL_TAPS,
+        SUBPEL_SHIFTS, EIGHTTAP_SMOOTH },
+      { (const int16_t *)av1_sub_pel_filters_8sharp, SUBPEL_TAPS, SUBPEL_SHIFTS,
+        MULTITAP_SHARP },
+      { (const int16_t *)av1_bilinear_filters, SUBPEL_TAPS, SUBPEL_SHIFTS,
+        BILINEAR }
+    };
+
+// A special 2-tap bilinear filter for IntraBC chroma. IntraBC uses full pixel
+// MV for luma. If sub-sampling exists, chroma may possibly use half-pel MV.
+DECLARE_ALIGNED(256, static const int16_t, av1_intrabc_bilinear_filter[2]) = {
+  64,
+  64,
+};
+
+static const InterpFilterParams av1_intrabc_filter_params = {
+  av1_intrabc_bilinear_filter, 2, 0, BILINEAR
+};
+
+DECLARE_ALIGNED(256, static const InterpKernel,
+                av1_sub_pel_filters_4[SUBPEL_SHIFTS]) = {
+  { 0, 0, 0, 128, 0, 0, 0, 0 },     { 0, 0, -4, 126, 8, -2, 0, 0 },
+  { 0, 0, -8, 122, 18, -4, 0, 0 },  { 0, 0, -10, 116, 28, -6, 0, 0 },
+  { 0, 0, -12, 110, 38, -8, 0, 0 }, { 0, 0, -12, 102, 48, -10, 0, 0 },
+  { 0, 0, -14, 94, 58, -10, 0, 0 }, { 0, 0, -12, 84, 66, -10, 0, 0 },
+  { 0, 0, -12, 76, 76, -12, 0, 0 }, { 0, 0, -10, 66, 84, -12, 0, 0 },
+  { 0, 0, -10, 58, 94, -14, 0, 0 }, { 0, 0, -10, 48, 102, -12, 0, 0 },
+  { 0, 0, -8, 38, 110, -12, 0, 0 }, { 0, 0, -6, 28, 116, -10, 0, 0 },
+  { 0, 0, -4, 18, 122, -8, 0, 0 },  { 0, 0, -2, 8, 126, -4, 0, 0 }
+};
+DECLARE_ALIGNED(256, static const InterpKernel,
+                av1_sub_pel_filters_4smooth[SUBPEL_SHIFTS]) = {
+  { 0, 0, 0, 128, 0, 0, 0, 0 },   { 0, 0, 30, 62, 34, 2, 0, 0 },
+  { 0, 0, 26, 62, 36, 4, 0, 0 },  { 0, 0, 22, 62, 40, 4, 0, 0 },
+  { 0, 0, 20, 60, 42, 6, 0, 0 },  { 0, 0, 18, 58, 44, 8, 0, 0 },
+  { 0, 0, 16, 56, 46, 10, 0, 0 }, { 0, 0, 14, 54, 48, 12, 0, 0 },
+  { 0, 0, 12, 52, 52, 12, 0, 0 }, { 0, 0, 12, 48, 54, 14, 0, 0 },
+  { 0, 0, 10, 46, 56, 16, 0, 0 }, { 0, 0, 8, 44, 58, 18, 0, 0 },
+  { 0, 0, 6, 42, 60, 20, 0, 0 },  { 0, 0, 4, 40, 62, 22, 0, 0 },
+  { 0, 0, 4, 36, 62, 26, 0, 0 },  { 0, 0, 2, 34, 62, 30, 0, 0 }
+};
+
+// For w<=4, MULTITAP_SHARP is the same as EIGHTTAP_REGULAR
+static const InterpFilterParams av1_interp_4tap[SWITCHABLE_FILTERS + 1] = {
+  { (const int16_t *)av1_sub_pel_filters_4, SUBPEL_TAPS, SUBPEL_SHIFTS,
+    EIGHTTAP_REGULAR },
+  { (const int16_t *)av1_sub_pel_filters_4smooth, SUBPEL_TAPS, SUBPEL_SHIFTS,
+    EIGHTTAP_SMOOTH },
+  { (const int16_t *)av1_sub_pel_filters_4, SUBPEL_TAPS, SUBPEL_SHIFTS,
+    EIGHTTAP_REGULAR },
+  { (const int16_t *)av1_bilinear_filters, SUBPEL_TAPS, SUBPEL_SHIFTS,
+    BILINEAR },
+};
+
+static INLINE const InterpFilterParams *
+av1_get_interp_filter_params_with_block_size(const InterpFilter interp_filter,
+                                             const int w) {
+  if (w <= 4) return &av1_interp_4tap[interp_filter];
+  return &av1_interp_filter_params_list[interp_filter];
+}
+
+static INLINE const InterpFilterParams *get_4tap_interp_filter_params(
+    const InterpFilter interp_filter) {
+  return &av1_interp_4tap[interp_filter];
+}
+
+static INLINE const int16_t *av1_get_interp_filter_kernel(
+    const InterpFilter interp_filter, int subpel_search) {
+  assert(subpel_search >= USE_2_TAPS);
+  return (subpel_search == USE_2_TAPS)
+             ? av1_interp_4tap[BILINEAR].filter_ptr
+             : ((subpel_search == USE_4_TAPS)
+                    ? av1_interp_4tap[interp_filter].filter_ptr
+                    : av1_interp_filter_params_list[interp_filter].filter_ptr);
+}
+
+static INLINE const int16_t *av1_get_interp_filter_subpel_kernel(
+    const InterpFilterParams *const filter_params, const int subpel) {
+  return filter_params->filter_ptr + filter_params->taps * subpel;
+}
+
+static INLINE const InterpFilterParams *av1_get_filter(int subpel_search) {
+  assert(subpel_search >= USE_2_TAPS);
+
+  switch (subpel_search) {
+    case USE_2_TAPS: return get_4tap_interp_filter_params(BILINEAR);
+    case USE_4_TAPS: return get_4tap_interp_filter_params(EIGHTTAP_REGULAR);
+    case USE_8_TAPS: return &av1_interp_filter_params_list[EIGHTTAP_REGULAR];
+    default: assert(0); return NULL;
+  }
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_FILTER_H_

diff --git a/libav1/av1/common/frame_buffers.c b/libav1/av1/common/frame_buffers.c
new file mode 100644
index 0000000..f10ccd5
--- /dev/null
+++ b/libav1/av1/common/frame_buffers.c

@@ -0,0 +1,98 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "av1/common/frame_buffers.h"
+#include "aom_mem/aom_mem.h"
+
+int av1_alloc_internal_frame_buffers(InternalFrameBufferList *list) {
+  assert(list != NULL);
+  av1_free_internal_frame_buffers(list);
+
+  list->num_internal_frame_buffers =
+      AOM_MAXIMUM_REF_BUFFERS + AOM_MAXIMUM_WORK_BUFFERS;
+  list->int_fb = (InternalFrameBuffer *)aom_calloc(
+      list->num_internal_frame_buffers, sizeof(*list->int_fb));
+  if (list->int_fb == NULL) {
+    list->num_internal_frame_buffers = 0;
+    return 1;
+  }
+  return 0;
+}
+
+void av1_free_internal_frame_buffers(InternalFrameBufferList *list) {
+  int i;
+
+  assert(list != NULL);
+
+  for (i = 0; i < list->num_internal_frame_buffers; ++i) {
+    aom_free(list->int_fb[i].data);
+    list->int_fb[i].data = NULL;
+  }
+  aom_free(list->int_fb);
+  list->int_fb = NULL;
+  list->num_internal_frame_buffers = 0;
+}
+
+void av1_zero_unused_internal_frame_buffers(InternalFrameBufferList *list) {
+  int i;
+
+  assert(list != NULL);
+
+  for (i = 0; i < list->num_internal_frame_buffers; ++i) {
+    if (list->int_fb[i].data && !list->int_fb[i].in_use)
+      memset(list->int_fb[i].data, 0, list->int_fb[i].size);
+  }
+}
+
+int av1_get_frame_buffer(void *cb_priv, size_t min_size,
+                         aom_codec_frame_buffer_t *fb) {
+  int i;
+  InternalFrameBufferList *const int_fb_list =
+      (InternalFrameBufferList *)cb_priv;
+  if (int_fb_list == NULL) return -1;
+
+  // Find a free frame buffer.
+  for (i = 0; i < int_fb_list->num_internal_frame_buffers; ++i) {
+    if (!int_fb_list->int_fb[i].in_use) break;
+  }
+
+  if (i == int_fb_list->num_internal_frame_buffers) return -1;
+
+  if (int_fb_list->int_fb[i].size < min_size) {
+    aom_free(int_fb_list->int_fb[i].data);
+    // The data must be zeroed to fix a valgrind error from the C loop filter
+    // due to access uninitialized memory in frame border. It could be
+    // skipped if border were totally removed.
+    int_fb_list->int_fb[i].data = (uint8_t *)aom_calloc(1, min_size);
+    if (!int_fb_list->int_fb[i].data) {
+      int_fb_list->int_fb[i].size = 0;
+      return -1;
+    }
+    int_fb_list->int_fb[i].size = min_size;
+  }
+
+  fb->data = int_fb_list->int_fb[i].data;
+  fb->size = int_fb_list->int_fb[i].size;
+  int_fb_list->int_fb[i].in_use = 1;
+
+  // Set the frame buffer's private data to point at the internal frame buffer.
+  fb->priv = &int_fb_list->int_fb[i];
+  return 0;
+}
+
+int av1_release_frame_buffer(void *cb_priv, aom_codec_frame_buffer_t *fb) {
+  InternalFrameBuffer *const int_fb = (InternalFrameBuffer *)fb->priv;
+  (void)cb_priv;
+  if (int_fb) int_fb->in_use = 0;
+  return 0;
+}

diff --git a/libav1/av1/common/frame_buffers.h b/libav1/av1/common/frame_buffers.h
new file mode 100644
index 0000000..16188e5
--- /dev/null
+++ b/libav1/av1/common/frame_buffers.h

@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_FRAME_BUFFERS_H_
+#define AOM_AV1_COMMON_FRAME_BUFFERS_H_
+
+#include "aom/aom_frame_buffer.h"
+#include "aom/aom_integer.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct InternalFrameBuffer {
+  uint8_t *data;
+  size_t size;
+  int in_use;
+} InternalFrameBuffer;
+
+typedef struct InternalFrameBufferList {
+  int num_internal_frame_buffers;
+  InternalFrameBuffer *int_fb;
+} InternalFrameBufferList;
+
+// Initializes |list|. Returns 0 on success.
+int av1_alloc_internal_frame_buffers(InternalFrameBufferList *list);
+
+// Free any data allocated to the frame buffers.
+void av1_free_internal_frame_buffers(InternalFrameBufferList *list);
+
+// Zeros all unused internal frame buffers. In particular, this zeros the
+// frame borders. Call this function after a sequence header change to
+// re-initialize the frame borders for the different width, height, or bit
+// depth.
+void av1_zero_unused_internal_frame_buffers(InternalFrameBufferList *list);
+
+// Callback used by libaom to request an external frame buffer. |cb_priv|
+// Callback private data, which points to an InternalFrameBufferList.
+// |min_size| is the minimum size in bytes needed to decode the next frame.
+// |fb| pointer to the frame buffer.
+int av1_get_frame_buffer(void *cb_priv, size_t min_size,
+                         aom_codec_frame_buffer_t *fb);
+
+// Callback used by libaom when there are no references to the frame buffer.
+// |cb_priv| is not used. |fb| pointer to the frame buffer.
+int av1_release_frame_buffer(void *cb_priv, aom_codec_frame_buffer_t *fb);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_FRAME_BUFFERS_H_

diff --git a/libav1/av1/common/idct.c b/libav1/av1/common/idct.c
new file mode 100644
index 0000000..cd2b7df
--- /dev/null
+++ b/libav1/av1/common/idct.c

@@ -0,0 +1,485 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <math.h>
+
+#include "config/aom_dsp_rtcd.h"
+#include "config/av1_rtcd.h"
+
+#include "aom_ports/mem.h"
+#include "av1/common/av1_inv_txfm1d_cfg.h"
+#include "av1/common/av1_txfm.h"
+#include "av1/common/blockd.h"
+#include "av1/common/enums.h"
+#include "av1/common/idct.h"
+
+int av1_get_tx_scale(const TX_SIZE tx_size) {
+  const int pels = tx_size_2d[tx_size];
+  // Largest possible pels is 4096 (64x64).
+  return (pels > 256) + (pels > 1024);
+}
+
+// NOTE: The implementation of all inverses need to be aware of the fact
+// that input and output could be the same buffer.
+
+// idct
+void av1_highbd_iwht4x4_add(const tran_low_t *input, uint8_t *dest, int stride,
+                            int eob, int bd) {
+  if (eob > 1)
+    av1_highbd_iwht4x4_16_add(input, dest, stride, bd);
+  else
+    av1_highbd_iwht4x4_1_add(input, dest, stride, bd);
+}
+
+void av1_highbd_inv_txfm_add_4x4_c(const tran_low_t *input, uint8_t *dest,
+                                   int stride, const TxfmParam *txfm_param) {
+  assert(av1_ext_tx_used[txfm_param->tx_set_type][txfm_param->tx_type]);
+  int eob = txfm_param->eob;
+  int bd = txfm_param->bd;
+  int lossless = txfm_param->lossless;
+  const int32_t *src = cast_to_int32(input);
+  const TX_TYPE tx_type = txfm_param->tx_type;
+  if (lossless) {
+    assert(tx_type == DCT_DCT);
+    av1_highbd_iwht4x4_add(input, dest, stride, eob, bd);
+    return;
+  }
+
+  av1_inv_txfm2d_add_4x4_c(src, CONVERT_TO_SHORTPTR(dest), stride, tx_type, bd);
+}
+
+void av1_highbd_inv_txfm_add_4x8_c(const tran_low_t *input, uint8_t *dest,
+                                   int stride, const TxfmParam *txfm_param) {
+  assert(av1_ext_tx_used[txfm_param->tx_set_type][txfm_param->tx_type]);
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_4x8_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                           txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_8x4_c(const tran_low_t *input, uint8_t *dest,
+                                   int stride, const TxfmParam *txfm_param) {
+  assert(av1_ext_tx_used[txfm_param->tx_set_type][txfm_param->tx_type]);
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_8x4_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                           txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_16x32_c(const tran_low_t *input, uint8_t *dest,
+                                     int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_16x32_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                             txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_32x16_c(const tran_low_t *input, uint8_t *dest,
+                                     int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_32x16_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                             txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_16x4_c(const tran_low_t *input, uint8_t *dest,
+                                    int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_16x4_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                            txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_4x16_c(const tran_low_t *input, uint8_t *dest,
+                                    int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_4x16_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                            txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_32x8_c(const tran_low_t *input, uint8_t *dest,
+                                    int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_32x8_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                            txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_8x32_c(const tran_low_t *input, uint8_t *dest,
+                                    int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_8x32_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                            txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_32x64_c(const tran_low_t *input, uint8_t *dest,
+                                     int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_32x64_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                             txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_64x32_c(const tran_low_t *input, uint8_t *dest,
+                                     int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_64x32_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                             txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_16x64_c(const tran_low_t *input, uint8_t *dest,
+                                     int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_16x64_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                             txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_64x16_c(const tran_low_t *input, uint8_t *dest,
+                                     int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_64x16_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                             txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_8x8_c(const tran_low_t *input, uint8_t *dest,
+                                   int stride, const TxfmParam *txfm_param) {
+  int bd = txfm_param->bd;
+  const TX_TYPE tx_type = txfm_param->tx_type;
+  const int32_t *src = cast_to_int32(input);
+
+  av1_inv_txfm2d_add_8x8_c(src, CONVERT_TO_SHORTPTR(dest), stride, tx_type, bd);
+}
+
+void av1_highbd_inv_txfm_add_16x16_c(const tran_low_t *input, uint8_t *dest,
+                                     int stride, const TxfmParam *txfm_param) {
+  int bd = txfm_param->bd;
+  const TX_TYPE tx_type = txfm_param->tx_type;
+  const int32_t *src = cast_to_int32(input);
+
+  av1_inv_txfm2d_add_16x16_c(src, CONVERT_TO_SHORTPTR(dest), stride, tx_type,
+                             bd);
+}
+
+void av1_highbd_inv_txfm_add_8x16_c(const tran_low_t *input, uint8_t *dest,
+                                    int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_8x16_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                            txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_16x8_c(const tran_low_t *input, uint8_t *dest,
+                                    int stride, const TxfmParam *txfm_param) {
+  const int32_t *src = cast_to_int32(input);
+  av1_inv_txfm2d_add_16x8_c(src, CONVERT_TO_SHORTPTR(dest), stride,
+                            txfm_param->tx_type, txfm_param->bd);
+}
+
+void av1_highbd_inv_txfm_add_32x32_c(const tran_low_t *input, uint8_t *dest,
+                                     int stride, const TxfmParam *txfm_param) {
+  const int bd = txfm_param->bd;
+  const TX_TYPE tx_type = txfm_param->tx_type;
+  const int32_t *src = cast_to_int32(input);
+
+  av1_inv_txfm2d_add_32x32_c(src, CONVERT_TO_SHORTPTR(dest), stride, tx_type,
+                             bd);
+}
+
+void av1_highbd_inv_txfm_add_64x64_c(const tran_low_t *input, uint8_t *dest,
+                                     int stride, const TxfmParam *txfm_param) {
+  const int bd = txfm_param->bd;
+  const TX_TYPE tx_type = txfm_param->tx_type;
+  const int32_t *src = cast_to_int32(input);
+  assert(tx_type == DCT_DCT);
+  av1_inv_txfm2d_add_64x64_c(src, CONVERT_TO_SHORTPTR(dest), stride, tx_type,
+                             bd);
+}
+
+static void init_txfm_param(const MACROBLOCKD *xd, int plane, TX_SIZE tx_size,
+                            TX_TYPE tx_type, int eob, int reduced_tx_set,
+                            TxfmParam *txfm_param) {
+  (void)plane;
+  txfm_param->tx_type = tx_type;
+  txfm_param->tx_size = tx_size;
+  txfm_param->eob = eob;
+  txfm_param->lossless = xd->lossless[xd->mi[0]->segment_id];
+  txfm_param->bd = xd->bd;
+  txfm_param->is_hbd = is_cur_buf_hbd(xd);
+  txfm_param->tx_set_type = av1_get_ext_tx_set_type(
+      txfm_param->tx_size, is_inter_block(xd->mi[0]), reduced_tx_set);
+}
+
+void av1_highbd_inv_txfm_add_c(const tran_low_t *input, uint8_t *dest,
+                               int stride, const TxfmParam *txfm_param) {
+  assert(av1_ext_tx_used[txfm_param->tx_set_type][txfm_param->tx_type]);
+  const TX_SIZE tx_size = txfm_param->tx_size;
+  switch (tx_size) {
+    case TX_32X32:
+      av1_highbd_inv_txfm_add_32x32_c(input, dest, stride, txfm_param);
+      break;
+    case TX_16X16:
+      av1_highbd_inv_txfm_add_16x16_c(input, dest, stride, txfm_param);
+      break;
+    case TX_8X8:
+      av1_highbd_inv_txfm_add_8x8_c(input, dest, stride, txfm_param);
+      break;
+    case TX_4X8:
+      av1_highbd_inv_txfm_add_4x8_c(input, dest, stride, txfm_param);
+      break;
+    case TX_8X4:
+      av1_highbd_inv_txfm_add_8x4_c(input, dest, stride, txfm_param);
+      break;
+    case TX_8X16:
+      av1_highbd_inv_txfm_add_8x16_c(input, dest, stride, txfm_param);
+      break;
+    case TX_16X8:
+      av1_highbd_inv_txfm_add_16x8_c(input, dest, stride, txfm_param);
+      break;
+    case TX_16X32:
+      av1_highbd_inv_txfm_add_16x32_c(input, dest, stride, txfm_param);
+      break;
+    case TX_32X16:
+      av1_highbd_inv_txfm_add_32x16_c(input, dest, stride, txfm_param);
+      break;
+    case TX_64X64:
+      av1_highbd_inv_txfm_add_64x64_c(input, dest, stride, txfm_param);
+      break;
+    case TX_32X64:
+      av1_highbd_inv_txfm_add_32x64_c(input, dest, stride, txfm_param);
+      break;
+    case TX_64X32:
+      av1_highbd_inv_txfm_add_64x32_c(input, dest, stride, txfm_param);
+      break;
+    case TX_16X64:
+      av1_highbd_inv_txfm_add_16x64_c(input, dest, stride, txfm_param);
+      break;
+    case TX_64X16:
+      av1_highbd_inv_txfm_add_64x16_c(input, dest, stride, txfm_param);
+      break;
+    case TX_4X4:
+      // this is like av1_short_idct4x4 but has a special case around eob<=1
+      // which is significant (not just an optimization) for the lossless
+      // case.
+      av1_highbd_inv_txfm_add_4x4_c(input, dest, stride, txfm_param);
+      break;
+    case TX_16X4:
+      av1_highbd_inv_txfm_add_16x4_c(input, dest, stride, txfm_param);
+      break;
+    case TX_4X16:
+      av1_highbd_inv_txfm_add_4x16_c(input, dest, stride, txfm_param);
+      break;
+    case TX_8X32:
+      av1_highbd_inv_txfm_add_8x32_c(input, dest, stride, txfm_param);
+      break;
+    case TX_32X8:
+      av1_highbd_inv_txfm_add_32x8_c(input, dest, stride, txfm_param);
+      break;
+    default: assert(0 && "Invalid transform size"); break;
+  }
+}
+
+void av1_inv_txfm_add_c(const tran_low_t *dqcoeff, uint8_t *dst, int stride,
+                        const TxfmParam *txfm_param) {
+  const TX_SIZE tx_size = txfm_param->tx_size;
+  DECLARE_ALIGNED(32, uint16_t, tmp[MAX_TX_SQUARE]);
+  int tmp_stride = MAX_TX_SIZE;
+  int w = tx_size_wide[tx_size];
+  int h = tx_size_high[tx_size];
+  for (int r = 0; r < h; ++r) {
+    for (int c = 0; c < w; ++c) {
+      tmp[r * tmp_stride + c] = dst[r * stride + c];
+    }
+  }
+
+  av1_highbd_inv_txfm_add(dqcoeff, CONVERT_TO_BYTEPTR(tmp), tmp_stride,
+                          txfm_param);
+
+  for (int r = 0; r < h; ++r) {
+    for (int c = 0; c < w; ++c) {
+        if (dst[r * stride + c] != 253)
+        {
+            printf("-");
+            dst[r * stride + c] = (uint8_t)tmp[r * tmp_stride + c];
+        }
+    }
+  }
+}
+
+void av1_inverse_transform_block(const MACROBLOCKD *xd,
+                                 const tran_low_t *dqcoeff, int plane,
+                                 TX_TYPE tx_type, TX_SIZE tx_size, uint8_t *dst,
+                                 int stride, int eob, int reduced_tx_set) {
+  if (!eob) return;
+
+  assert(eob <= av1_get_max_eob(tx_size));
+
+  TxfmParam txfm_param;
+  init_txfm_param(xd, plane, tx_size, tx_type, eob, reduced_tx_set,
+                  &txfm_param);
+  assert(av1_ext_tx_used[txfm_param.tx_set_type][txfm_param.tx_type]);
+
+  if (txfm_param.is_hbd) {
+    av1_highbd_inv_txfm_add(dqcoeff, dst, stride, &txfm_param);
+  } else {
+    av1_inv_txfm_add(dqcoeff, dst, stride, &txfm_param);
+  }
+}
+
+
+void av1_highbd_iwht4x4_16_c(const tran_low_t *input, int16_t *dst, int stride, int bd) {
+    /* 4-point reversible, orthonormal inverse Walsh-Hadamard in 3.5 adds,
+       0.5 shifts per pixel. */
+    int i;
+    tran_low_t output[16];
+    tran_low_t a1, b1, c1, d1, e1;
+    const tran_low_t *ip = input;
+    tran_low_t *op = output;
+
+    for (i = 0; i < 4; i++) {
+        a1 = ip[0] >> UNIT_QUANT_SHIFT;
+        c1 = ip[1] >> UNIT_QUANT_SHIFT;
+        d1 = ip[2] >> UNIT_QUANT_SHIFT;
+        b1 = ip[3] >> UNIT_QUANT_SHIFT;
+        a1 += c1;
+        d1 -= b1;
+        e1 = (a1 - d1) >> 1;
+        b1 = e1 - b1;
+        c1 = e1 - c1;
+        a1 -= b1;
+        d1 += c1;
+
+        op[0] = a1;
+        op[1] = b1;
+        op[2] = c1;
+        op[3] = d1;
+        ip += 4;
+        op += 4;
+    }
+
+    ip = output;
+    for (i = 0; i < 4; i++) {
+        a1 = ip[4 * 0];
+        c1 = ip[4 * 1];
+        d1 = ip[4 * 2];
+        b1 = ip[4 * 3];
+        a1 += c1;
+        d1 -= b1;
+        e1 = (a1 - d1) >> 1;
+        b1 = e1 - b1;
+        c1 = e1 - c1;
+        a1 -= b1;
+        d1 += c1;
+
+        range_check_value(a1, bd + 1);
+        range_check_value(b1, bd + 1);
+        range_check_value(c1, bd + 1);
+        range_check_value(d1, bd + 1);
+
+        dst[stride * 0] = a1;
+        dst[stride * 1] = b1;
+        dst[stride * 2] = c1;
+        dst[stride * 3] = d1;
+
+        ip++;
+        dst++;
+    }
+}
+
+
+
+void inv_txfm2d_c(const int32_t *input, uint16_t *output, int stride, TXFM_2D_FLIP_CFG *cfg, int32_t *txfm_buf,
+                  TX_SIZE tx_size, int bd);
+
+static INLINE void inv_txfm2d_facade(const int32_t *input, uint16_t *output, int stride, int32_t *txfm_buf,
+                                         TX_TYPE tx_type, TX_SIZE tx_size, int bd) {
+  TXFM_2D_FLIP_CFG cfg;
+  av1_get_inv_txfm_cfg(tx_type, tx_size, &cfg);
+  inv_txfm2d_c(input, output, stride, &cfg, txfm_buf, tx_size, bd);
+}
+
+
+void av1_highbd_inv_txfm_c(const tran_low_t *input, uint16_t *dest, int stride, const TxfmParam *txfm_param) {
+  assert(av1_ext_tx_used[txfm_param->tx_set_type][txfm_param->tx_type]);
+  const TX_SIZE tx_size = txfm_param->tx_size;
+
+  const int bd = txfm_param->bd;
+  const TX_TYPE tx_type = txfm_param->tx_type;
+  const int32_t *input32 = cast_to_int32(input);
+  DECLARE_ALIGNED(32, int, txfm_buf[64 * 64 + 64 + 64]);
+    
+  int32_t mod_input[64 * 64];
+  const int block_w = tx_size_wide[tx_size];
+  const int block_h = tx_size_high[tx_size];
+  if (block_w == 64 || block_h == 64)
+  {
+      memset(mod_input, 0, sizeof(mod_input));
+      int w = block_w > 32 ? 32 : block_w;
+      int h = block_h > 32 ? 32 : block_h;
+      for (int row = 0; row < h; ++row) {
+          memcpy(mod_input + row * block_w, input32 + row * w, w * sizeof(*mod_input));
+      }
+      input32 = mod_input;
+  }
+//  if (tx_size == TX_32X64 || tx_size == TX_16X64)
+//  {
+//    int w = tx_size == TX_32X64 ? 32 : 16;
+//    for (int row = 0; row < )
+//    memset(input32 + w * 32, 0, (64 - w) * sizeof(*input32));
+//  }
+//  if (tx_size == TX_64X64 || tx_size == TX_64X32 || tx_size == TX_64X16)
+//  {
+//    int h = tx_size == TX_64X16 ? 16 : 32;
+//    for (int row = 0; row < h; ++row) {
+//      memcpy(mod_input + row * 64, input32 + row * 32, 32 * sizeof(*mod_input));
+//      memset(mod_input + row * 64 + 32, 0, 32 * sizeof(*mod_input));
+//    }
+//    if (tx_size == TX_64X64)
+//        memset(mod_input + 32 * 64, 0, 32 * 64 * sizeof(*mod_input)); 
+//    input32 = mod_input;
+//  }
+
+  if (!txfm_param->lossless || tx_size != TX_4X4)
+  {
+    inv_txfm2d_facade(input32, dest, stride, txfm_buf, tx_type, tx_size, bd);
+  }
+  else
+  {
+      av1_highbd_iwht4x4_16_c(input32, (int16_t *)dest, stride, bd);
+  }
+}
+
+void av1_inverse_transform_block_frame(const MACROBLOCKD *xd, const tran_low_t *dqcoeff, int plane, TX_TYPE tx_type,
+                                 TX_SIZE tx_size, uint16_t *dst, int stride, int eob, int reduced_tx_set)
+{
+  if (!eob) return;
+  assert(eob <= av1_get_max_eob(tx_size));
+
+  TxfmParam txfm_param;
+  init_txfm_param(xd, plane, tx_size, tx_type, eob, reduced_tx_set, &txfm_param);
+  assert(av1_ext_tx_used[txfm_param.tx_set_type][txfm_param.tx_type]);
+
+  av1_highbd_inv_txfm_c(dqcoeff, dst, stride, &txfm_param);
+  //  if (txfm_param.is_hbd) {
+//    av1_highbd_inv_txfm_add(dqcoeff, dst, stride, &txfm_param);
+//  } else {
+//    //av1_inv_txfm_add(dqcoeff, dst, stride, &txfm_param);
+//    const TX_SIZE tx_size = txfm_param.tx_size;
+//    DECLARE_ALIGNED(32, uint16_t, tmp[MAX_TX_SQUARE]);
+//    int tmp_stride = MAX_TX_SIZE;
+//    int w = tx_size_wide[tx_size];
+//    int h = tx_size_high[tx_size];
+//    for (int r = 0; r < h; ++r) {
+//      for (int c = 0; c < w; ++c) {
+//        tmp[r * tmp_stride + c] = dst[r * stride + c];
+//      }
+//    }
+//
+//    av1_highbd_inv_txfm_add(dqcoeff, CONVERT_TO_BYTEPTR(tmp), tmp_stride, &txfm_param);
+//
+//    for (int r = 0; r < h; ++r) {
+//      for (int c = 0; c < w; ++c) {
+//        dst[r * stride + c] = (uint8_t)tmp[r * tmp_stride + c];
+//      }
+//    }
+//  }
+}

diff --git a/libav1/av1/common/idct.h b/libav1/av1/common/idct.h
new file mode 100644
index 0000000..8729621
--- /dev/null
+++ b/libav1/av1/common/idct.h

@@ -0,0 +1,55 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_IDCT_H_
+#define AOM_AV1_COMMON_IDCT_H_
+
+#include "config/aom_config.h"
+
+#include "av1/common/blockd.h"
+#include "av1/common/common.h"
+#include "av1/common/enums.h"
+#include "aom_dsp/txfm_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef void (*transform_1d)(const tran_low_t *, tran_low_t *);
+
+typedef struct {
+  transform_1d cols, rows;  // vertical and horizontal
+} transform_2d;
+
+#define MAX_TX_SCALE 1
+int av1_get_tx_scale(const TX_SIZE tx_size);
+
+void av1_inverse_transform_block(const MACROBLOCKD *xd,
+                                 const tran_low_t *dqcoeff, int plane,
+                                 TX_TYPE tx_type, TX_SIZE tx_size, uint8_t *dst,
+                                 int stride, int eob, int reduced_tx_set);
+void av1_highbd_iwht4x4_add(const tran_low_t *input, uint8_t *dest, int stride,
+                            int eob, int bd);
+
+void av1_inverse_transform_block_frame(const MACROBLOCKD *xd, const tran_low_t *dqcoeff, int plane, TX_TYPE tx_type,
+                                       TX_SIZE tx_size, uint16_t *dst, int stride, int eob, int reduced_tx_set);
+void av1_highbd_inv_txfm_c(const tran_low_t *input, uint16_t *dest, int stride, const TxfmParam *txfm_param);
+
+static INLINE const int32_t *cast_to_int32(const tran_low_t *input) {
+  assert(sizeof(int32_t) == sizeof(tran_low_t));
+  return (const int32_t *)input;
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_IDCT_H_

diff --git a/libav1/av1/common/mv.h b/libav1/av1/common/mv.h
new file mode 100644
index 0000000..cbee5a3
--- /dev/null
+++ b/libav1/av1/common/mv.h

@@ -0,0 +1,301 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_MV_H_
+#define AOM_AV1_COMMON_MV_H_
+
+#include "av1/common/common.h"
+#include "av1/common/common_data.h"
+#include "aom_dsp/aom_filter.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define INVALID_MV 0x80008000
+
+typedef struct mv {
+  int16_t row;
+  int16_t col;
+} MV;
+
+static const MV kZeroMv = { 0, 0 };
+
+typedef union int_mv {
+  uint32_t as_int;
+  MV as_mv;
+} int_mv; /* facilitates faster equality tests and copies */
+
+typedef struct mv32 {
+  int32_t row;
+  int32_t col;
+} MV32;
+
+// Bits of precision used for the model
+#define WARPEDMODEL_PREC_BITS 16
+#define WARPEDMODEL_ROW3HOMO_PREC_BITS 16
+
+#define WARPEDMODEL_TRANS_CLAMP (128 << WARPEDMODEL_PREC_BITS)
+#define WARPEDMODEL_NONDIAGAFFINE_CLAMP (1 << (WARPEDMODEL_PREC_BITS - 3))
+#define WARPEDMODEL_ROW3HOMO_CLAMP (1 << (WARPEDMODEL_PREC_BITS - 2))
+
+// Bits of subpel precision for warped interpolation
+#define WARPEDPIXEL_PREC_BITS 6
+#define WARPEDPIXEL_PREC_SHIFTS (1 << WARPEDPIXEL_PREC_BITS)
+
+#define WARP_PARAM_REDUCE_BITS 6
+
+#define WARPEDDIFF_PREC_BITS (WARPEDMODEL_PREC_BITS - WARPEDPIXEL_PREC_BITS)
+
+/* clang-format off */
+enum {
+  IDENTITY = 0,      // identity transformation, 0-parameter
+  TRANSLATION = 1,   // translational motion 2-parameter
+  ROTZOOM = 2,       // simplified affine with rotation + zoom only, 4-parameter
+  AFFINE = 3,        // affine, 6-parameter
+  TRANS_TYPES,
+} UENUM1BYTE(TransformationType);
+/* clang-format on */
+
+// Number of types used for global motion (must be >= 3 and <= TRANS_TYPES)
+// The following can be useful:
+// GLOBAL_TRANS_TYPES 3 - up to rotation-zoom
+// GLOBAL_TRANS_TYPES 4 - up to affine
+// GLOBAL_TRANS_TYPES 6 - up to hor/ver trapezoids
+// GLOBAL_TRANS_TYPES 7 - up to full homography
+#define GLOBAL_TRANS_TYPES 4
+
+typedef struct {
+  int global_warp_allowed;
+  int local_warp_allowed;
+} WarpTypesAllowed;
+
+// number of parameters used by each transformation in TransformationTypes
+static const int trans_model_params[TRANS_TYPES] = { 0, 2, 4, 6 };
+
+// The order of values in the wmmat matrix below is best described
+// by the homography:
+//      [x'     (m2 m3 m0   [x
+//  z .  y'  =   m4 m5 m1 *  y
+//       1]      m6 m7 1)    1]
+typedef struct {
+  int32_t wmmat[8];
+  int16_t alpha, beta, gamma, delta;
+  TransformationType wmtype;
+  int16_t invalid;
+} WarpedMotionParams;
+
+/* clang-format off */
+static const WarpedMotionParams default_warp_params = {
+  { 0, 0, (1 << WARPEDMODEL_PREC_BITS), 0, 0, (1 << WARPEDMODEL_PREC_BITS), 0,
+    0 },
+  0, 0, 0, 0,
+  IDENTITY,
+  0,
+};
+/* clang-format on */
+
+// The following constants describe the various precisions
+// of different parameters in the global motion experiment.
+//
+// Given the general homography:
+//      [x'     (a  b  c   [x
+//  z .  y'  =   d  e  f *  y
+//       1]      g  h  i)    1]
+//
+// Constants using the name ALPHA here are related to parameters
+// a, b, d, e. Constants using the name TRANS are related
+// to parameters c and f.
+//
+// Anything ending in PREC_BITS is the number of bits of precision
+// to maintain when converting from double to integer.
+//
+// The ABS parameters are used to create an upper and lower bound
+// for each parameter. In other words, after a parameter is integerized
+// it is clamped between -(1 << ABS_XXX_BITS) and (1 << ABS_XXX_BITS).
+//
+// XXX_PREC_DIFF and XXX_DECODE_FACTOR
+// are computed once here to prevent repetitive
+// computation on the decoder side. These are
+// to allow the global motion parameters to be encoded in a lower
+// precision than the warped model precision. This means that they
+// need to be changed to warped precision when they are decoded.
+//
+// XX_MIN, XX_MAX are also computed to avoid repeated computation
+
+#define SUBEXPFIN_K 3
+#define GM_TRANS_PREC_BITS 6
+#define GM_ABS_TRANS_BITS 12
+#define GM_ABS_TRANS_ONLY_BITS (GM_ABS_TRANS_BITS - GM_TRANS_PREC_BITS + 3)
+#define GM_TRANS_PREC_DIFF (WARPEDMODEL_PREC_BITS - GM_TRANS_PREC_BITS)
+#define GM_TRANS_ONLY_PREC_DIFF (WARPEDMODEL_PREC_BITS - 3)
+#define GM_TRANS_DECODE_FACTOR (1 << GM_TRANS_PREC_DIFF)
+#define GM_TRANS_ONLY_DECODE_FACTOR (1 << GM_TRANS_ONLY_PREC_DIFF)
+
+#define GM_ALPHA_PREC_BITS 15
+#define GM_ABS_ALPHA_BITS 12
+#define GM_ALPHA_PREC_DIFF (WARPEDMODEL_PREC_BITS - GM_ALPHA_PREC_BITS)
+#define GM_ALPHA_DECODE_FACTOR (1 << GM_ALPHA_PREC_DIFF)
+
+#define GM_ROW3HOMO_PREC_BITS 16
+#define GM_ABS_ROW3HOMO_BITS 11
+#define GM_ROW3HOMO_PREC_DIFF \
+  (WARPEDMODEL_ROW3HOMO_PREC_BITS - GM_ROW3HOMO_PREC_BITS)
+#define GM_ROW3HOMO_DECODE_FACTOR (1 << GM_ROW3HOMO_PREC_DIFF)
+
+#define GM_TRANS_MAX (1 << GM_ABS_TRANS_BITS)
+#define GM_ALPHA_MAX (1 << GM_ABS_ALPHA_BITS)
+#define GM_ROW3HOMO_MAX (1 << GM_ABS_ROW3HOMO_BITS)
+
+#define GM_TRANS_MIN -GM_TRANS_MAX
+#define GM_ALPHA_MIN -GM_ALPHA_MAX
+#define GM_ROW3HOMO_MIN -GM_ROW3HOMO_MAX
+
+static INLINE int block_center_x(int mi_col, BLOCK_SIZE bs) {
+  const int bw = block_size_wide[bs];
+  return mi_col * MI_SIZE + bw / 2 - 1;
+}
+
+static INLINE int block_center_y(int mi_row, BLOCK_SIZE bs) {
+  const int bh = block_size_high[bs];
+  return mi_row * MI_SIZE + bh / 2 - 1;
+}
+
+static INLINE int convert_to_trans_prec(int allow_hp, int coor) {
+  if (allow_hp)
+    return ROUND_POWER_OF_TWO_SIGNED(coor, WARPEDMODEL_PREC_BITS - 3);
+  else
+    return ROUND_POWER_OF_TWO_SIGNED(coor, WARPEDMODEL_PREC_BITS - 2) * 2;
+}
+static INLINE void integer_mv_precision(MV *mv) {
+  int mod = (mv->row % 8);
+  if (mod != 0) {
+    mv->row -= mod;
+    if (abs(mod) > 4) {
+      if (mod > 0) {
+        mv->row += 8;
+      } else {
+        mv->row -= 8;
+      }
+    }
+  }
+
+  mod = (mv->col % 8);
+  if (mod != 0) {
+    mv->col -= mod;
+    if (abs(mod) > 4) {
+      if (mod > 0) {
+        mv->col += 8;
+      } else {
+        mv->col -= 8;
+      }
+    }
+  }
+}
+// Convert a global motion vector into a motion vector at the centre of the
+// given block.
+//
+// The resulting motion vector will have three fractional bits of precision. If
+// allow_hp is zero, the bottom bit will always be zero. If CONFIG_AMVR and
+// is_integer is true, the bottom three bits will be zero (so the motion vector
+// represents an integer)
+static INLINE int_mv gm_get_motion_vector(const WarpedMotionParams *gm,
+                                          int allow_hp, BLOCK_SIZE bsize,
+                                          int mi_col, int mi_row,
+                                          int is_integer) {
+  int_mv res;
+
+  if (gm->wmtype == IDENTITY) {
+    res.as_int = 0;
+    return res;
+  }
+
+  const int32_t *mat = gm->wmmat;
+  int x, y, tx, ty;
+
+  if (gm->wmtype == TRANSLATION) {
+    // All global motion vectors are stored with WARPEDMODEL_PREC_BITS (16)
+    // bits of fractional precision. The offset for a translation is stored in
+    // entries 0 and 1. For translations, all but the top three (two if
+    // cm->allow_high_precision_mv is false) fractional bits are always zero.
+    //
+    // After the right shifts, there are 3 fractional bits of precision. If
+    // allow_hp is false, the bottom bit is always zero (so we don't need a
+    // call to convert_to_trans_prec here)
+    res.as_mv.row = gm->wmmat[0] >> GM_TRANS_ONLY_PREC_DIFF;
+    res.as_mv.col = gm->wmmat[1] >> GM_TRANS_ONLY_PREC_DIFF;
+    assert(IMPLIES(1 & (res.as_mv.row | res.as_mv.col), allow_hp));
+    if (is_integer) {
+      integer_mv_precision(&res.as_mv);
+    }
+    return res;
+  }
+
+  x = block_center_x(mi_col, bsize);
+  y = block_center_y(mi_row, bsize);
+
+  if (gm->wmtype == ROTZOOM) {
+    assert(gm->wmmat[5] == gm->wmmat[2]);
+    assert(gm->wmmat[4] == -gm->wmmat[3]);
+  }
+
+  const int xc =
+      (mat[2] - (1 << WARPEDMODEL_PREC_BITS)) * x + mat[3] * y + mat[0];
+  const int yc =
+      mat[4] * x + (mat[5] - (1 << WARPEDMODEL_PREC_BITS)) * y + mat[1];
+  tx = convert_to_trans_prec(allow_hp, xc);
+  ty = convert_to_trans_prec(allow_hp, yc);
+
+  res.as_mv.row = ty;
+  res.as_mv.col = tx;
+
+  if (is_integer) {
+    integer_mv_precision(&res.as_mv);
+  }
+  return res;
+}
+
+static INLINE TransformationType get_wmtype(const WarpedMotionParams *gm) {
+  if (gm->wmmat[5] == (1 << WARPEDMODEL_PREC_BITS) && !gm->wmmat[4] &&
+      gm->wmmat[2] == (1 << WARPEDMODEL_PREC_BITS) && !gm->wmmat[3]) {
+    return ((!gm->wmmat[1] && !gm->wmmat[0]) ? IDENTITY : TRANSLATION);
+  }
+  if (gm->wmmat[2] == gm->wmmat[5] && gm->wmmat[3] == -gm->wmmat[4])
+    return ROTZOOM;
+  else
+    return AFFINE;
+}
+
+typedef struct candidate_mv {
+  int_mv this_mv;
+  int_mv comp_mv;
+  int weight;
+} CANDIDATE_MV;
+
+static INLINE int is_zero_mv(const MV *mv) {
+  return *((const uint32_t *)mv) == 0;
+}
+
+static INLINE int is_equal_mv(const MV *a, const MV *b) {
+  return *((const uint32_t *)a) == *((const uint32_t *)b);
+}
+
+static INLINE void clamp_mv(MV *mv, int min_col, int max_col, int min_row,
+                            int max_row) {
+  mv->col = clamp(mv->col, min_col, max_col);
+  mv->row = clamp(mv->row, min_row, max_row);
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_MV_H_

diff --git a/libav1/av1/common/mvref_common.c b/libav1/av1/common/mvref_common.c
new file mode 100644
index 0000000..0bfad38
--- /dev/null
+++ b/libav1/av1/common/mvref_common.c

@@ -0,0 +1,1653 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdlib.h>
+
+#include "av1/common/mvref_common.h"
+#include "av1/common/warped_motion.h"
+
+// Although we assign 32 bit integers, all the values are strictly under 14
+// bits.
+static int div_mult[32] = { 0,    16384, 8192, 5461, 4096, 3276, 2730, 2340,
+                            2048, 1820,  1638, 1489, 1365, 1260, 1170, 1092,
+                            1024, 963,   910,  862,  819,  780,  744,  712,
+                            682,  655,   630,  606,  585,  564,  546,  528 };
+
+// TODO(jingning): Consider the use of lookup table for (num / den)
+// altogether.
+static void get_mv_projection(MV *output, MV ref, int num, int den) {
+  den = AOMMIN(den, MAX_FRAME_DISTANCE);
+  num = num > 0 ? AOMMIN(num, MAX_FRAME_DISTANCE)
+                : AOMMAX(num, -MAX_FRAME_DISTANCE);
+  const int mv_row =
+      ROUND_POWER_OF_TWO_SIGNED(ref.row * num * div_mult[den], 14);
+  const int mv_col =
+      ROUND_POWER_OF_TWO_SIGNED(ref.col * num * div_mult[den], 14);
+  const int clamp_max = MV_UPP - 1;
+  const int clamp_min = MV_LOW + 1;
+  output->row = (int16_t)clamp(mv_row, clamp_min, clamp_max);
+  output->col = (int16_t)clamp(mv_col, clamp_min, clamp_max);
+}
+
+void av1_copy_frame_mvs(const AV1_COMMON *const cm,
+                        const MB_MODE_INFO *const mi, int mi_row, int mi_col,
+                        int x_mis, int y_mis) {
+  const int frame_mvs_stride = ROUND_POWER_OF_TWO(cm->mi_cols, 1);
+  MV_REF *frame_mvs =
+      cm->cur_frame->mvs + (mi_row >> 1) * frame_mvs_stride + (mi_col >> 1);
+  x_mis = ROUND_POWER_OF_TWO(x_mis, 1);
+  y_mis = ROUND_POWER_OF_TWO(y_mis, 1);
+  int w, h;
+
+  for (h = 0; h < y_mis; h++) {
+    MV_REF *mv = frame_mvs;
+    for (w = 0; w < x_mis; w++) {
+      mv->ref_frame = NONE_FRAME;
+      mv->mv.as_int = 0;
+
+      for (int idx = 0; idx < 2; ++idx) {
+        MV_REFERENCE_FRAME ref_frame = mi->ref_frame[idx];
+        if (ref_frame > INTRA_FRAME) {
+          int8_t ref_idx = cm->ref_frame_side[ref_frame];
+          if (ref_idx) continue;
+          if ((abs(mi->mv[idx].as_mv.row) > REFMVS_LIMIT) ||
+              (abs(mi->mv[idx].as_mv.col) > REFMVS_LIMIT))
+            continue;
+          mv->ref_frame = ref_frame;
+          mv->mv.as_int = mi->mv[idx].as_int;
+        }
+      }
+      mv++;
+    }
+    frame_mvs += frame_mvs_stride;
+  }
+}
+
+static void add_ref_mv_candidate(
+    const MB_MODE_INFO *const candidate, const MV_REFERENCE_FRAME rf[2],
+    uint8_t *refmv_count, uint8_t *ref_match_count, uint8_t *newmv_count,
+    CANDIDATE_MV *ref_mv_stack, int_mv *gm_mv_candidates,
+    const WarpedMotionParams *gm_params, int col, int weight) {
+  if (!is_inter_block(candidate)) return;  // for intrabc
+  int index = 0, ref;
+  assert(weight % 2 == 0);
+
+  if (rf[1] == NONE_FRAME) {
+    // single reference frame
+    for (ref = 0; ref < 2; ++ref) {
+      if (candidate->ref_frame[ref] == rf[0]) {
+        int_mv this_refmv;
+        if (is_global_mv_block(candidate, gm_params[rf[0]].wmtype))
+          this_refmv = gm_mv_candidates[0];
+        else
+          this_refmv = get_sub_block_mv(candidate, ref, col);
+
+        for (index = 0; index < *refmv_count; ++index)
+          if (ref_mv_stack[index].this_mv.as_int == this_refmv.as_int) break;
+
+        if (index < *refmv_count) ref_mv_stack[index].weight += weight;
+
+        // Add a new item to the list.
+        if (index == *refmv_count && *refmv_count < MAX_REF_MV_STACK_SIZE) {
+          ref_mv_stack[index].this_mv = this_refmv;
+          ref_mv_stack[index].weight = weight;
+          ++(*refmv_count);
+        }
+        if (have_newmv_in_inter_mode(candidate->mode)) ++*newmv_count;
+        ++*ref_match_count;
+      }
+    }
+  } else {
+    // compound reference frame
+    if (candidate->ref_frame[0] == rf[0] && candidate->ref_frame[1] == rf[1]) {
+      int_mv this_refmv[2];
+
+      for (ref = 0; ref < 2; ++ref) {
+        if (is_global_mv_block(candidate, gm_params[rf[ref]].wmtype))
+          this_refmv[ref] = gm_mv_candidates[ref];
+        else
+          this_refmv[ref] = get_sub_block_mv(candidate, ref, col);
+      }
+
+      for (index = 0; index < *refmv_count; ++index)
+        if ((ref_mv_stack[index].this_mv.as_int == this_refmv[0].as_int) &&
+            (ref_mv_stack[index].comp_mv.as_int == this_refmv[1].as_int))
+          break;
+
+      if (index < *refmv_count) ref_mv_stack[index].weight += weight;
+
+      // Add a new item to the list.
+      if (index == *refmv_count && *refmv_count < MAX_REF_MV_STACK_SIZE) {
+        ref_mv_stack[index].this_mv = this_refmv[0];
+        ref_mv_stack[index].comp_mv = this_refmv[1];
+        ref_mv_stack[index].weight = weight;
+        ++(*refmv_count);
+      }
+      if (have_newmv_in_inter_mode(candidate->mode)) ++*newmv_count;
+      ++*ref_match_count;
+    }
+  }
+}
+
+static void scan_row_mbmi(const AV1_COMMON *cm, const MACROBLOCKD *xd,
+                          int mi_row, int mi_col,
+                          const MV_REFERENCE_FRAME rf[2], int row_offset,
+                          CANDIDATE_MV *ref_mv_stack, uint8_t *refmv_count,
+                          uint8_t *ref_match_count, uint8_t *newmv_count,
+                          int_mv *gm_mv_candidates, int max_row_offset,
+                          int *processed_rows) {
+  int end_mi = AOMMIN(xd->n4_w, cm->mi_cols - mi_col);
+  end_mi = AOMMIN(end_mi, mi_size_wide[BLOCK_64X64]);
+  const int n8_w_8 = mi_size_wide[BLOCK_8X8];
+  const int n8_w_16 = mi_size_wide[BLOCK_16X16];
+  int i;
+  int col_offset = 0;
+  // TODO(jingning): Revisit this part after cb4x4 is stable.
+  if (abs(row_offset) > 1) {
+    col_offset = 1;
+    if ((mi_col & 0x01) && xd->n4_w < n8_w_8) --col_offset;
+  }
+  const int use_step_16 = (xd->n4_w >= 16);
+  MB_MODE_INFO **const candidate_mi0 = xd->mi + row_offset * xd->mi_stride;
+  (void)mi_row;
+
+  for (i = 0; i < end_mi;) {
+    const MB_MODE_INFO *const candidate = candidate_mi0[col_offset + i];
+    const int candidate_bsize = candidate->sb_type;
+    const int n4_w = mi_size_wide[candidate_bsize];
+    int len = AOMMIN(xd->n4_w, n4_w);
+    if (use_step_16)
+      len = AOMMAX(n8_w_16, len);
+    else if (abs(row_offset) > 1)
+      len = AOMMAX(len, n8_w_8);
+
+    int weight = 2;
+    if (xd->n4_w >= n8_w_8 && xd->n4_w <= n4_w) {
+      int inc = AOMMIN(-max_row_offset + row_offset + 1,
+                       mi_size_high[candidate_bsize]);
+      // Obtain range used in weight calculation.
+      weight = AOMMAX(weight, inc);
+      // Update processed rows.
+      *processed_rows = inc - row_offset - 1;
+    }
+
+    add_ref_mv_candidate(candidate, rf, refmv_count, ref_match_count,
+                         newmv_count, ref_mv_stack, gm_mv_candidates,
+                         cm->global_motion, col_offset + i, len * weight);
+
+    i += len;
+  }
+}
+
+static void scan_col_mbmi(const AV1_COMMON *cm, const MACROBLOCKD *xd,
+                          int mi_row, int mi_col,
+                          const MV_REFERENCE_FRAME rf[2], int col_offset,
+                          CANDIDATE_MV *ref_mv_stack, uint8_t *refmv_count,
+                          uint8_t *ref_match_count, uint8_t *newmv_count,
+                          int_mv *gm_mv_candidates, int max_col_offset,
+                          int *processed_cols) {
+  int end_mi = AOMMIN(xd->n4_h, cm->mi_rows - mi_row);
+  end_mi = AOMMIN(end_mi, mi_size_high[BLOCK_64X64]);
+  const int n8_h_8 = mi_size_high[BLOCK_8X8];
+  const int n8_h_16 = mi_size_high[BLOCK_16X16];
+  int i;
+  int row_offset = 0;
+  if (abs(col_offset) > 1) {
+    row_offset = 1;
+    if ((mi_row & 0x01) && xd->n4_h < n8_h_8) --row_offset;
+  }
+  const int use_step_16 = (xd->n4_h >= 16);
+  (void)mi_col;
+
+  for (i = 0; i < end_mi;) {
+    const MB_MODE_INFO *const candidate =
+        xd->mi[(row_offset + i) * xd->mi_stride + col_offset];
+    const int candidate_bsize = candidate->sb_type;
+    const int n4_h = mi_size_high[candidate_bsize];
+    int len = AOMMIN(xd->n4_h, n4_h);
+    if (use_step_16)
+      len = AOMMAX(n8_h_16, len);
+    else if (abs(col_offset) > 1)
+      len = AOMMAX(len, n8_h_8);
+
+    int weight = 2;
+    if (xd->n4_h >= n8_h_8 && xd->n4_h <= n4_h) {
+      int inc = AOMMIN(-max_col_offset + col_offset + 1,
+                       mi_size_wide[candidate_bsize]);
+      // Obtain range used in weight calculation.
+      weight = AOMMAX(weight, inc);
+      // Update processed cols.
+      *processed_cols = inc - col_offset - 1;
+    }
+
+    add_ref_mv_candidate(candidate, rf, refmv_count, ref_match_count,
+                         newmv_count, ref_mv_stack, gm_mv_candidates,
+                         cm->global_motion, col_offset, len * weight);
+
+    i += len;
+  }
+}
+
+static void scan_blk_mbmi(const AV1_COMMON *cm, const MACROBLOCKD *xd,
+                          const int mi_row, const int mi_col,
+                          const MV_REFERENCE_FRAME rf[2], int row_offset,
+                          int col_offset, CANDIDATE_MV *ref_mv_stack,
+                          uint8_t *ref_match_count, uint8_t *newmv_count,
+                          int_mv *gm_mv_candidates,
+                          uint8_t refmv_count[MODE_CTX_REF_FRAMES]) {
+  const TileInfo *const tile = &xd->tile;
+  POSITION mi_pos;
+
+  mi_pos.row = row_offset;
+  mi_pos.col = col_offset;
+
+  if (is_inside(tile, mi_col, mi_row, &mi_pos)) {
+    const MB_MODE_INFO *const candidate =
+        xd->mi[mi_pos.row * xd->mi_stride + mi_pos.col];
+    const int len = mi_size_wide[BLOCK_8X8];
+
+    add_ref_mv_candidate(candidate, rf, refmv_count, ref_match_count,
+                         newmv_count, ref_mv_stack, gm_mv_candidates,
+                         cm->global_motion, mi_pos.col, 2 * len);
+  }  // Analyze a single 8x8 block motion information.
+}
+
+static int has_top_right(const AV1_COMMON *cm, const MACROBLOCKD *xd,
+                         int mi_row, int mi_col, int bs) {
+  const int sb_mi_size = mi_size_wide[cm->seq_params.sb_size];
+  const int mask_row = mi_row & (sb_mi_size - 1);
+  const int mask_col = mi_col & (sb_mi_size - 1);
+
+  if (bs > mi_size_wide[BLOCK_64X64]) return 0;
+
+  // In a split partition all apart from the bottom right has a top right
+  int has_tr = !((mask_row & bs) && (mask_col & bs));
+
+  // bs > 0 and bs is a power of 2
+  assert(bs > 0 && !(bs & (bs - 1)));
+
+  // For each 4x4 group of blocks, when the bottom right is decoded the blocks
+  // to the right have not been decoded therefore the bottom right does
+  // not have a top right
+  while (bs < sb_mi_size) {
+    if (mask_col & bs) {
+      if ((mask_col & (2 * bs)) && (mask_row & (2 * bs))) {
+        has_tr = 0;
+        break;
+      }
+    } else {
+      break;
+    }
+    bs <<= 1;
+  }
+
+  // The left hand of two vertical rectangles always has a top right (as the
+  // block above will have been decoded)
+  if (xd->n4_w < xd->n4_h)
+    if (!xd->is_sec_rect) has_tr = 1;
+
+  // The bottom of two horizontal rectangles never has a top right (as the block
+  // to the right won't have been decoded)
+  if (xd->n4_w > xd->n4_h)
+    if (xd->is_sec_rect) has_tr = 0;
+
+  // The bottom left square of a Vertical A (in the old format) does
+  // not have a top right as it is decoded before the right hand
+  // rectangle of the partition
+  if (xd->mi[0]->partition == PARTITION_VERT_A) {
+    if (xd->n4_w == xd->n4_h)
+      if (mask_row & bs) has_tr = 0;
+  }
+
+  return has_tr;
+}
+
+static int check_sb_border(const int mi_row, const int mi_col,
+                           const int row_offset, const int col_offset) {
+  const int sb_mi_size = mi_size_wide[BLOCK_64X64];
+  const int row = mi_row & (sb_mi_size - 1);
+  const int col = mi_col & (sb_mi_size - 1);
+
+  if (row + row_offset < 0 || row + row_offset >= sb_mi_size ||
+      col + col_offset < 0 || col + col_offset >= sb_mi_size)
+    return 0;
+
+  return 1;
+}
+
+static int add_tpl_ref_mv(const AV1_COMMON *cm, const MACROBLOCKD *xd,
+                          int mi_row, int mi_col, MV_REFERENCE_FRAME ref_frame,
+                          int blk_row, int blk_col, int_mv *gm_mv_candidates,
+                          uint8_t refmv_count[MODE_CTX_REF_FRAMES],
+                          CANDIDATE_MV ref_mv_stacks[][MAX_REF_MV_STACK_SIZE],
+                          int16_t *mode_context) {
+  POSITION mi_pos;
+  int idx;
+  const int weight_unit = 1;  // mi_size_wide[BLOCK_8X8];
+
+  mi_pos.row = (mi_row & 0x01) ? blk_row : blk_row + 1;
+  mi_pos.col = (mi_col & 0x01) ? blk_col : blk_col + 1;
+
+  if (!is_inside(&xd->tile, mi_col, mi_row, &mi_pos)) return 0;
+
+  const TPL_MV_REF *prev_frame_mvs =
+      cm->tpl_mvs + ((mi_row + mi_pos.row) >> 1) * (cm->mi_stride >> 1) +
+      ((mi_col + mi_pos.col) >> 1);
+
+  MV_REFERENCE_FRAME rf[2];
+  av1_set_ref_frame(rf, ref_frame);
+
+  if (rf[1] == NONE_FRAME) {
+    int cur_frame_index = cm->cur_frame->order_hint;
+    const RefCntBuffer *const buf_0 = get_ref_frame_buf(cm, rf[0]);
+    int frame0_index = buf_0->order_hint;
+    int cur_offset_0 = get_relative_dist(&cm->seq_params.order_hint_info,
+                                         cur_frame_index, frame0_index);
+    CANDIDATE_MV *ref_mv_stack = ref_mv_stacks[rf[0]];
+
+    if (prev_frame_mvs->mfmv0.as_int != INVALID_MV) {
+      int_mv this_refmv;
+
+      get_mv_projection(&this_refmv.as_mv, prev_frame_mvs->mfmv0.as_mv,
+                        cur_offset_0, prev_frame_mvs->ref_frame_offset);
+      lower_mv_precision(&this_refmv.as_mv, cm->allow_high_precision_mv,
+                         cm->cur_frame_force_integer_mv);
+
+      if (blk_row == 0 && blk_col == 0)
+        if (abs(this_refmv.as_mv.row - gm_mv_candidates[0].as_mv.row) >= 16 ||
+            abs(this_refmv.as_mv.col - gm_mv_candidates[0].as_mv.col) >= 16)
+          mode_context[ref_frame] |= (1 << GLOBALMV_OFFSET);
+
+      for (idx = 0; idx < refmv_count[rf[0]]; ++idx)
+        if (this_refmv.as_int == ref_mv_stack[idx].this_mv.as_int) break;
+
+      if (idx < refmv_count[rf[0]]) ref_mv_stack[idx].weight += 2 * weight_unit;
+
+      if (idx == refmv_count[rf[0]] &&
+          refmv_count[rf[0]] < MAX_REF_MV_STACK_SIZE) {
+        ref_mv_stack[idx].this_mv.as_int = this_refmv.as_int;
+        ref_mv_stack[idx].weight = 2 * weight_unit;
+        ++(refmv_count[rf[0]]);
+      }
+      return 1;
+    }
+  } else {
+    // Process compound inter mode
+    int cur_frame_index = cm->cur_frame->order_hint;
+    const RefCntBuffer *const buf_0 = get_ref_frame_buf(cm, rf[0]);
+    int frame0_index = buf_0->order_hint;
+
+    int cur_offset_0 = get_relative_dist(&cm->seq_params.order_hint_info,
+                                         cur_frame_index, frame0_index);
+    const RefCntBuffer *const buf_1 = get_ref_frame_buf(cm, rf[1]);
+    int frame1_index = buf_1->order_hint;
+    int cur_offset_1 = get_relative_dist(&cm->seq_params.order_hint_info,
+                                         cur_frame_index, frame1_index);
+    CANDIDATE_MV *ref_mv_stack = ref_mv_stacks[ref_frame];
+
+    if (prev_frame_mvs->mfmv0.as_int != INVALID_MV) {
+      int_mv this_refmv;
+      int_mv comp_refmv;
+      get_mv_projection(&this_refmv.as_mv, prev_frame_mvs->mfmv0.as_mv,
+                        cur_offset_0, prev_frame_mvs->ref_frame_offset);
+      get_mv_projection(&comp_refmv.as_mv, prev_frame_mvs->mfmv0.as_mv,
+                        cur_offset_1, prev_frame_mvs->ref_frame_offset);
+
+      lower_mv_precision(&this_refmv.as_mv, cm->allow_high_precision_mv,
+                         cm->cur_frame_force_integer_mv);
+      lower_mv_precision(&comp_refmv.as_mv, cm->allow_high_precision_mv,
+                         cm->cur_frame_force_integer_mv);
+
+      if (blk_row == 0 && blk_col == 0)
+        if (abs(this_refmv.as_mv.row - gm_mv_candidates[0].as_mv.row) >= 16 ||
+            abs(this_refmv.as_mv.col - gm_mv_candidates[0].as_mv.col) >= 16 ||
+            abs(comp_refmv.as_mv.row - gm_mv_candidates[1].as_mv.row) >= 16 ||
+            abs(comp_refmv.as_mv.col - gm_mv_candidates[1].as_mv.col) >= 16)
+          mode_context[ref_frame] |= (1 << GLOBALMV_OFFSET);
+
+      for (idx = 0; idx < refmv_count[ref_frame]; ++idx)
+        if (this_refmv.as_int == ref_mv_stack[idx].this_mv.as_int &&
+            comp_refmv.as_int == ref_mv_stack[idx].comp_mv.as_int)
+          break;
+
+      if (idx < refmv_count[ref_frame])
+        ref_mv_stack[idx].weight += 2 * weight_unit;
+
+      if (idx == refmv_count[ref_frame] &&
+          refmv_count[ref_frame] < MAX_REF_MV_STACK_SIZE) {
+        ref_mv_stack[idx].this_mv.as_int = this_refmv.as_int;
+        ref_mv_stack[idx].comp_mv.as_int = comp_refmv.as_int;
+        ref_mv_stack[idx].weight = 2 * weight_unit;
+        ++(refmv_count[ref_frame]);
+      }
+      return 1;
+    }
+  }
+  return 0;
+}
+
+static void process_compound_ref_mv_candidate(
+    const MB_MODE_INFO *const candidate, const AV1_COMMON *const cm,
+    const MV_REFERENCE_FRAME *const rf, int_mv ref_id[2][2],
+    int ref_id_count[2], int_mv ref_diff[2][2], int ref_diff_count[2]) {
+  for (int rf_idx = 0; rf_idx < 2; ++rf_idx) {
+    MV_REFERENCE_FRAME can_rf = candidate->ref_frame[rf_idx];
+
+    for (int cmp_idx = 0; cmp_idx < 2; ++cmp_idx) {
+      if (can_rf == rf[cmp_idx] && ref_id_count[cmp_idx] < 2) {
+        ref_id[cmp_idx][ref_id_count[cmp_idx]] = candidate->mv[rf_idx];
+        ++ref_id_count[cmp_idx];
+      } else if (can_rf > INTRA_FRAME && ref_diff_count[cmp_idx] < 2) {
+        int_mv this_mv = candidate->mv[rf_idx];
+        if (cm->ref_frame_sign_bias[can_rf] !=
+            cm->ref_frame_sign_bias[rf[cmp_idx]]) {
+          this_mv.as_mv.row = -this_mv.as_mv.row;
+          this_mv.as_mv.col = -this_mv.as_mv.col;
+        }
+        ref_diff[cmp_idx][ref_diff_count[cmp_idx]] = this_mv;
+        ++ref_diff_count[cmp_idx];
+      }
+    }
+  }
+}
+
+static void process_single_ref_mv_candidate(
+    const MB_MODE_INFO *const candidate, const AV1_COMMON *const cm,
+    MV_REFERENCE_FRAME ref_frame, uint8_t refmv_count[MODE_CTX_REF_FRAMES],
+    CANDIDATE_MV ref_mv_stack[][MAX_REF_MV_STACK_SIZE]) {
+  for (int rf_idx = 0; rf_idx < 2; ++rf_idx) {
+    if (candidate->ref_frame[rf_idx] > INTRA_FRAME) {
+      int_mv this_mv = candidate->mv[rf_idx];
+      if (cm->ref_frame_sign_bias[candidate->ref_frame[rf_idx]] !=
+          cm->ref_frame_sign_bias[ref_frame]) {
+        this_mv.as_mv.row = -this_mv.as_mv.row;
+        this_mv.as_mv.col = -this_mv.as_mv.col;
+      }
+      int stack_idx;
+      for (stack_idx = 0; stack_idx < refmv_count[ref_frame]; ++stack_idx) {
+        const int_mv stack_mv = ref_mv_stack[ref_frame][stack_idx].this_mv;
+        if (this_mv.as_int == stack_mv.as_int) break;
+      }
+
+      if (stack_idx == refmv_count[ref_frame]) {
+        ref_mv_stack[ref_frame][stack_idx].this_mv = this_mv;
+
+        // TODO(jingning): Set an arbitrary small number here. The weight
+        // doesn't matter as long as it is properly initialized.
+        ref_mv_stack[ref_frame][stack_idx].weight = 2;
+        ++refmv_count[ref_frame];
+      }
+    }
+  }
+}
+
+static void setup_ref_mv_list(
+    const AV1_COMMON *cm, const MACROBLOCKD *xd, MV_REFERENCE_FRAME ref_frame,
+    uint8_t refmv_count[MODE_CTX_REF_FRAMES],
+    CANDIDATE_MV ref_mv_stack[][MAX_REF_MV_STACK_SIZE],
+    int_mv mv_ref_list[][MAX_MV_REF_CANDIDATES], int_mv *gm_mv_candidates,
+    int mi_row, int mi_col, int16_t *mode_context) {
+  const int bs = AOMMAX(xd->n4_w, xd->n4_h);
+  const int has_tr = has_top_right(cm, xd, mi_row, mi_col, bs);
+  MV_REFERENCE_FRAME rf[2];
+
+  const TileInfo *const tile = &xd->tile;
+  int max_row_offset = 0, max_col_offset = 0;
+  const int row_adj = (xd->n4_h < mi_size_high[BLOCK_8X8]) && (mi_row & 0x01);
+  const int col_adj = (xd->n4_w < mi_size_wide[BLOCK_8X8]) && (mi_col & 0x01);
+  int processed_rows = 0;
+  int processed_cols = 0;
+
+  av1_set_ref_frame(rf, ref_frame);
+  mode_context[ref_frame] = 0;
+  refmv_count[ref_frame] = 0;
+
+  // Find valid maximum row/col offset.
+  if (xd->up_available) {
+    max_row_offset = -(MVREF_ROW_COLS << 1) + row_adj;
+
+    if (xd->n4_h < mi_size_high[BLOCK_8X8])
+      max_row_offset = -(2 << 1) + row_adj;
+
+    max_row_offset = find_valid_row_offset(tile, mi_row, max_row_offset);
+  }
+
+  if (xd->left_available) {
+    max_col_offset = -(MVREF_ROW_COLS << 1) + col_adj;
+
+    if (xd->n4_w < mi_size_wide[BLOCK_8X8])
+      max_col_offset = -(2 << 1) + col_adj;
+
+    max_col_offset = find_valid_col_offset(tile, mi_col, max_col_offset);
+  }
+
+  uint8_t col_match_count = 0;
+  uint8_t row_match_count = 0;
+  uint8_t newmv_count = 0;
+
+  // Scan the first above row mode info. row_offset = -1;
+  if (abs(max_row_offset) >= 1)
+    scan_row_mbmi(cm, xd, mi_row, mi_col, rf, -1, ref_mv_stack[ref_frame],
+                  &refmv_count[ref_frame], &row_match_count, &newmv_count,
+                  gm_mv_candidates, max_row_offset, &processed_rows);
+  // Scan the first left column mode info. col_offset = -1;
+  if (abs(max_col_offset) >= 1)
+    scan_col_mbmi(cm, xd, mi_row, mi_col, rf, -1, ref_mv_stack[ref_frame],
+                  &refmv_count[ref_frame], &col_match_count, &newmv_count,
+                  gm_mv_candidates, max_col_offset, &processed_cols);
+  // Check top-right boundary
+  if (has_tr)
+    scan_blk_mbmi(cm, xd, mi_row, mi_col, rf, -1, xd->n4_w,
+                  ref_mv_stack[ref_frame], &row_match_count, &newmv_count,
+                  gm_mv_candidates, &refmv_count[ref_frame]);
+
+  const uint8_t nearest_match = (row_match_count > 0) + (col_match_count > 0);
+  const uint8_t nearest_refmv_count = refmv_count[ref_frame];
+
+  // TODO(yunqing): for comp_search, do it for all 3 cases.
+  for (int idx = 0; idx < nearest_refmv_count; ++idx)
+    ref_mv_stack[ref_frame][idx].weight += REF_CAT_LEVEL;
+
+  if (cm->allow_ref_frame_mvs) {
+    int is_available = 0;
+    const int voffset = AOMMAX(mi_size_high[BLOCK_8X8], xd->n4_h);
+    const int hoffset = AOMMAX(mi_size_wide[BLOCK_8X8], xd->n4_w);
+    const int blk_row_end = AOMMIN(xd->n4_h, mi_size_high[BLOCK_64X64]);
+    const int blk_col_end = AOMMIN(xd->n4_w, mi_size_wide[BLOCK_64X64]);
+
+    const int tpl_sample_pos[3][2] = {
+      { voffset, -2 },
+      { voffset, hoffset },
+      { voffset - 2, hoffset },
+    };
+    const int allow_extension = (xd->n4_h >= mi_size_high[BLOCK_8X8]) &&
+                                (xd->n4_h < mi_size_high[BLOCK_64X64]) &&
+                                (xd->n4_w >= mi_size_wide[BLOCK_8X8]) &&
+                                (xd->n4_w < mi_size_wide[BLOCK_64X64]);
+
+    const int step_h = (xd->n4_h >= mi_size_high[BLOCK_64X64])
+                           ? mi_size_high[BLOCK_16X16]
+                           : mi_size_high[BLOCK_8X8];
+    const int step_w = (xd->n4_w >= mi_size_wide[BLOCK_64X64])
+                           ? mi_size_wide[BLOCK_16X16]
+                           : mi_size_wide[BLOCK_8X8];
+
+    for (int blk_row = 0; blk_row < blk_row_end; blk_row += step_h) {
+      for (int blk_col = 0; blk_col < blk_col_end; blk_col += step_w) {
+        int ret = add_tpl_ref_mv(cm, xd, mi_row, mi_col, ref_frame, blk_row,
+                                 blk_col, gm_mv_candidates, refmv_count,
+                                 ref_mv_stack, mode_context);
+        if (blk_row == 0 && blk_col == 0) is_available = ret;
+      }
+    }
+
+    if (is_available == 0) mode_context[ref_frame] |= (1 << GLOBALMV_OFFSET);
+
+    for (int i = 0; i < 3 && allow_extension; ++i) {
+      const int blk_row = tpl_sample_pos[i][0];
+      const int blk_col = tpl_sample_pos[i][1];
+
+      if (!check_sb_border(mi_row, mi_col, blk_row, blk_col)) continue;
+      add_tpl_ref_mv(cm, xd, mi_row, mi_col, ref_frame, blk_row, blk_col,
+                     gm_mv_candidates, refmv_count, ref_mv_stack, mode_context);
+    }
+  }
+
+  uint8_t dummy_newmv_count = 0;
+
+  // Scan the second outer area.
+  scan_blk_mbmi(cm, xd, mi_row, mi_col, rf, -1, -1, ref_mv_stack[ref_frame],
+                &row_match_count, &dummy_newmv_count, gm_mv_candidates,
+                &refmv_count[ref_frame]);
+
+  for (int idx = 2; idx <= MVREF_ROW_COLS; ++idx) {
+    const int row_offset = -(idx << 1) + 1 + row_adj;
+    const int col_offset = -(idx << 1) + 1 + col_adj;
+
+    if (abs(row_offset) <= abs(max_row_offset) &&
+        abs(row_offset) > processed_rows)
+      scan_row_mbmi(cm, xd, mi_row, mi_col, rf, row_offset,
+                    ref_mv_stack[ref_frame], &refmv_count[ref_frame],
+                    &row_match_count, &dummy_newmv_count, gm_mv_candidates,
+                    max_row_offset, &processed_rows);
+
+    if (abs(col_offset) <= abs(max_col_offset) &&
+        abs(col_offset) > processed_cols)
+      scan_col_mbmi(cm, xd, mi_row, mi_col, rf, col_offset,
+                    ref_mv_stack[ref_frame], &refmv_count[ref_frame],
+                    &col_match_count, &dummy_newmv_count, gm_mv_candidates,
+                    max_col_offset, &processed_cols);
+  }
+
+  const uint8_t ref_match_count = (row_match_count > 0) + (col_match_count > 0);
+
+  switch (nearest_match) {
+    case 0:
+      mode_context[ref_frame] |= 0;
+      if (ref_match_count >= 1) mode_context[ref_frame] |= 1;
+      if (ref_match_count == 1)
+        mode_context[ref_frame] |= (1 << REFMV_OFFSET);
+      else if (ref_match_count >= 2)
+        mode_context[ref_frame] |= (2 << REFMV_OFFSET);
+      break;
+    case 1:
+      mode_context[ref_frame] |= (newmv_count > 0) ? 2 : 3;
+      if (ref_match_count == 1)
+        mode_context[ref_frame] |= (3 << REFMV_OFFSET);
+      else if (ref_match_count >= 2)
+        mode_context[ref_frame] |= (4 << REFMV_OFFSET);
+      break;
+    case 2:
+    default:
+      if (newmv_count >= 1)
+        mode_context[ref_frame] |= 4;
+      else
+        mode_context[ref_frame] |= 5;
+
+      mode_context[ref_frame] |= (5 << REFMV_OFFSET);
+      break;
+  }
+
+  // Rank the likelihood and assign nearest and near mvs.
+  int len = nearest_refmv_count;
+  while (len > 0) {
+    int nr_len = 0;
+    for (int idx = 1; idx < len; ++idx) {
+      if (ref_mv_stack[ref_frame][idx - 1].weight <
+          ref_mv_stack[ref_frame][idx].weight) {
+        CANDIDATE_MV tmp_mv = ref_mv_stack[ref_frame][idx - 1];
+        ref_mv_stack[ref_frame][idx - 1] = ref_mv_stack[ref_frame][idx];
+        ref_mv_stack[ref_frame][idx] = tmp_mv;
+        nr_len = idx;
+      }
+    }
+    len = nr_len;
+  }
+
+  len = refmv_count[ref_frame];
+  while (len > nearest_refmv_count) {
+    int nr_len = nearest_refmv_count;
+    for (int idx = nearest_refmv_count + 1; idx < len; ++idx) {
+      if (ref_mv_stack[ref_frame][idx - 1].weight <
+          ref_mv_stack[ref_frame][idx].weight) {
+        CANDIDATE_MV tmp_mv = ref_mv_stack[ref_frame][idx - 1];
+        ref_mv_stack[ref_frame][idx - 1] = ref_mv_stack[ref_frame][idx];
+        ref_mv_stack[ref_frame][idx] = tmp_mv;
+        nr_len = idx;
+      }
+    }
+    len = nr_len;
+  }
+
+  if (rf[1] > NONE_FRAME) {
+    // TODO(jingning, yunqing): Refactor and consolidate the compound and
+    // single reference frame modes. Reduce unnecessary redundancy.
+    if (refmv_count[ref_frame] < MAX_MV_REF_CANDIDATES) {
+      int_mv ref_id[2][2], ref_diff[2][2];
+      int ref_id_count[2] = { 0 }, ref_diff_count[2] = { 0 };
+
+      int mi_width = AOMMIN(mi_size_wide[BLOCK_64X64], xd->n4_w);
+      mi_width = AOMMIN(mi_width, cm->mi_cols - mi_col);
+      int mi_height = AOMMIN(mi_size_high[BLOCK_64X64], xd->n4_h);
+      mi_height = AOMMIN(mi_height, cm->mi_rows - mi_row);
+      int mi_size = AOMMIN(mi_width, mi_height);
+
+      for (int idx = 0; abs(max_row_offset) >= 1 && idx < mi_size;) {
+        const MB_MODE_INFO *const candidate = xd->mi[-xd->mi_stride + idx];
+        process_compound_ref_mv_candidate(
+            candidate, cm, rf, ref_id, ref_id_count, ref_diff, ref_diff_count);
+        idx += mi_size_wide[candidate->sb_type];
+      }
+
+      for (int idx = 0; abs(max_col_offset) >= 1 && idx < mi_size;) {
+        const MB_MODE_INFO *const candidate = xd->mi[idx * xd->mi_stride - 1];
+        process_compound_ref_mv_candidate(
+            candidate, cm, rf, ref_id, ref_id_count, ref_diff, ref_diff_count);
+        idx += mi_size_high[candidate->sb_type];
+      }
+
+      // Build up the compound mv predictor
+      int_mv comp_list[3][2];
+
+      for (int idx = 0; idx < 2; ++idx) {
+        int comp_idx = 0;
+        for (int list_idx = 0; list_idx < ref_id_count[idx] && comp_idx < 2;
+             ++list_idx, ++comp_idx)
+          comp_list[comp_idx][idx] = ref_id[idx][list_idx];
+        for (int list_idx = 0; list_idx < ref_diff_count[idx] && comp_idx < 2;
+             ++list_idx, ++comp_idx)
+          comp_list[comp_idx][idx] = ref_diff[idx][list_idx];
+        for (; comp_idx < 3; ++comp_idx)
+          comp_list[comp_idx][idx] = gm_mv_candidates[idx];
+      }
+
+      if (refmv_count[ref_frame]) {
+        assert(refmv_count[ref_frame] == 1);
+        if (comp_list[0][0].as_int ==
+                ref_mv_stack[ref_frame][0].this_mv.as_int &&
+            comp_list[0][1].as_int ==
+                ref_mv_stack[ref_frame][0].comp_mv.as_int) {
+          ref_mv_stack[ref_frame][refmv_count[ref_frame]].this_mv =
+              comp_list[1][0];
+          ref_mv_stack[ref_frame][refmv_count[ref_frame]].comp_mv =
+              comp_list[1][1];
+        } else {
+          ref_mv_stack[ref_frame][refmv_count[ref_frame]].this_mv =
+              comp_list[0][0];
+          ref_mv_stack[ref_frame][refmv_count[ref_frame]].comp_mv =
+              comp_list[0][1];
+        }
+        ref_mv_stack[ref_frame][refmv_count[ref_frame]].weight = 2;
+        ++refmv_count[ref_frame];
+      } else {
+        for (int idx = 0; idx < MAX_MV_REF_CANDIDATES; ++idx) {
+          ref_mv_stack[ref_frame][refmv_count[ref_frame]].this_mv =
+              comp_list[idx][0];
+          ref_mv_stack[ref_frame][refmv_count[ref_frame]].comp_mv =
+              comp_list[idx][1];
+          ref_mv_stack[ref_frame][refmv_count[ref_frame]].weight = 2;
+          ++refmv_count[ref_frame];
+        }
+      }
+    }
+
+    assert(refmv_count[ref_frame] >= 2);
+
+    for (int idx = 0; idx < refmv_count[ref_frame]; ++idx) {
+      clamp_mv_ref(&ref_mv_stack[ref_frame][idx].this_mv.as_mv,
+                   xd->n4_w << MI_SIZE_LOG2, xd->n4_h << MI_SIZE_LOG2, xd);
+      clamp_mv_ref(&ref_mv_stack[ref_frame][idx].comp_mv.as_mv,
+                   xd->n4_w << MI_SIZE_LOG2, xd->n4_h << MI_SIZE_LOG2, xd);
+    }
+  } else {
+    // Handle single reference frame extension
+    int mi_width = AOMMIN(mi_size_wide[BLOCK_64X64], xd->n4_w);
+    mi_width = AOMMIN(mi_width, cm->mi_cols - mi_col);
+    int mi_height = AOMMIN(mi_size_high[BLOCK_64X64], xd->n4_h);
+    mi_height = AOMMIN(mi_height, cm->mi_rows - mi_row);
+    int mi_size = AOMMIN(mi_width, mi_height);
+
+    for (int idx = 0; abs(max_row_offset) >= 1 && idx < mi_size &&
+                      refmv_count[ref_frame] < MAX_MV_REF_CANDIDATES;) {
+      const MB_MODE_INFO *const candidate = xd->mi[-xd->mi_stride + idx];
+      process_single_ref_mv_candidate(candidate, cm, ref_frame, refmv_count,
+                                      ref_mv_stack);
+      idx += mi_size_wide[candidate->sb_type];
+    }
+
+    for (int idx = 0; abs(max_col_offset) >= 1 && idx < mi_size &&
+                      refmv_count[ref_frame] < MAX_MV_REF_CANDIDATES;) {
+      const MB_MODE_INFO *const candidate = xd->mi[idx * xd->mi_stride - 1];
+      process_single_ref_mv_candidate(candidate, cm, ref_frame, refmv_count,
+                                      ref_mv_stack);
+      idx += mi_size_high[candidate->sb_type];
+    }
+
+    for (int idx = 0; idx < refmv_count[ref_frame]; ++idx) {
+      clamp_mv_ref(&ref_mv_stack[ref_frame][idx].this_mv.as_mv,
+                   xd->n4_w << MI_SIZE_LOG2, xd->n4_h << MI_SIZE_LOG2, xd);
+    }
+
+    if (mv_ref_list != NULL) {
+      for (int idx = refmv_count[ref_frame]; idx < MAX_MV_REF_CANDIDATES; ++idx)
+        mv_ref_list[rf[0]][idx].as_int = gm_mv_candidates[0].as_int;
+
+      for (int idx = 0;
+           idx < AOMMIN(MAX_MV_REF_CANDIDATES, refmv_count[ref_frame]); ++idx) {
+        mv_ref_list[rf[0]][idx].as_int =
+            ref_mv_stack[ref_frame][idx].this_mv.as_int;
+      }
+    }
+  }
+}
+
+void av1_find_mv_refs(const AV1_COMMON *cm, const MACROBLOCKD *xd,
+                      MB_MODE_INFO *mi, MV_REFERENCE_FRAME ref_frame,
+                      uint8_t ref_mv_count[MODE_CTX_REF_FRAMES],
+                      CANDIDATE_MV ref_mv_stack[][MAX_REF_MV_STACK_SIZE],
+                      int_mv mv_ref_list[][MAX_MV_REF_CANDIDATES],
+                      int_mv *global_mvs, int mi_row, int mi_col,
+                      int16_t *mode_context) {
+  int_mv zeromv[2];
+  BLOCK_SIZE bsize = mi->sb_type;
+  MV_REFERENCE_FRAME rf[2];
+  av1_set_ref_frame(rf, ref_frame);
+
+  if (global_mvs != NULL && ref_frame < REF_FRAMES) {
+    if (ref_frame != INTRA_FRAME) {
+      global_mvs[ref_frame] = gm_get_motion_vector(
+          &cm->global_motion[ref_frame], cm->allow_high_precision_mv, bsize,
+          mi_col, mi_row, cm->cur_frame_force_integer_mv);
+    } else {
+      global_mvs[ref_frame].as_int = INVALID_MV;
+    }
+  }
+
+  if (ref_frame != INTRA_FRAME) {
+    zeromv[0].as_int =
+        gm_get_motion_vector(&cm->global_motion[rf[0]],
+                             cm->allow_high_precision_mv, bsize, mi_col, mi_row,
+                             cm->cur_frame_force_integer_mv)
+            .as_int;
+    zeromv[1].as_int =
+        (rf[1] != NONE_FRAME)
+            ? gm_get_motion_vector(&cm->global_motion[rf[1]],
+                                   cm->allow_high_precision_mv, bsize, mi_col,
+                                   mi_row, cm->cur_frame_force_integer_mv)
+                  .as_int
+            : 0;
+  } else {
+    zeromv[0].as_int = zeromv[1].as_int = 0;
+  }
+
+  setup_ref_mv_list(cm, xd, ref_frame, ref_mv_count, ref_mv_stack, mv_ref_list,
+                    zeromv, mi_row, mi_col, mode_context);
+}
+
+void av1_find_best_ref_mvs(int allow_hp, int_mv *mvlist, int_mv *nearest_mv,
+                           int_mv *near_mv, int is_integer) {
+  int i;
+  // Make sure all the candidates are properly clamped etc
+  for (i = 0; i < MAX_MV_REF_CANDIDATES; ++i) {
+    lower_mv_precision(&mvlist[i].as_mv, allow_hp, is_integer);
+  }
+  *nearest_mv = mvlist[0];
+  *near_mv = mvlist[1];
+}
+
+void av1_setup_frame_buf_refs(AV1_COMMON *cm) {
+  cm->cur_frame->order_hint = cm->current_frame.order_hint;
+
+  MV_REFERENCE_FRAME ref_frame;
+  for (ref_frame = LAST_FRAME; ref_frame <= ALTREF_FRAME; ++ref_frame) {
+    const RefCntBuffer *const buf = get_ref_frame_buf(cm, ref_frame);
+    if (buf != NULL)
+      cm->cur_frame->ref_order_hints[ref_frame - LAST_FRAME] = buf->order_hint;
+  }
+}
+
+void av1_setup_frame_sign_bias(AV1_COMMON *cm) {
+  MV_REFERENCE_FRAME ref_frame;
+  for (ref_frame = LAST_FRAME; ref_frame <= ALTREF_FRAME; ++ref_frame) {
+    const RefCntBuffer *const buf = get_ref_frame_buf(cm, ref_frame);
+    if (cm->seq_params.order_hint_info.enable_order_hint && buf != NULL) {
+      const int ref_order_hint = buf->order_hint;
+      cm->ref_frame_sign_bias[ref_frame] =
+          (get_relative_dist(&cm->seq_params.order_hint_info, ref_order_hint,
+                             (int)cm->current_frame.order_hint) <= 0)
+              ? 0
+              : 1;
+    } else {
+      cm->ref_frame_sign_bias[ref_frame] = 0;
+    }
+  }
+}
+
+#define MAX_OFFSET_WIDTH 64
+#define MAX_OFFSET_HEIGHT 0
+
+static int get_block_position(AV1_COMMON *cm, int *mi_r, int *mi_c, int blk_row,
+                              int blk_col, MV mv, int sign_bias) {
+  const int base_blk_row = (blk_row >> 3) << 3;
+  const int base_blk_col = (blk_col >> 3) << 3;
+
+  const int row_offset = (mv.row >= 0) ? (mv.row >> (4 + MI_SIZE_LOG2))
+                                       : -((-mv.row) >> (4 + MI_SIZE_LOG2));
+
+  const int col_offset = (mv.col >= 0) ? (mv.col >> (4 + MI_SIZE_LOG2))
+                                       : -((-mv.col) >> (4 + MI_SIZE_LOG2));
+
+  const int row =
+      (sign_bias == 1) ? blk_row - row_offset : blk_row + row_offset;
+  const int col =
+      (sign_bias == 1) ? blk_col - col_offset : blk_col + col_offset;
+
+  if (row < 0 || row >= (cm->mi_rows >> 1) || col < 0 ||
+      col >= (cm->mi_cols >> 1))
+    return 0;
+
+  if (row < base_blk_row - (MAX_OFFSET_HEIGHT >> 3) ||
+      row >= base_blk_row + 8 + (MAX_OFFSET_HEIGHT >> 3) ||
+      col < base_blk_col - (MAX_OFFSET_WIDTH >> 3) ||
+      col >= base_blk_col + 8 + (MAX_OFFSET_WIDTH >> 3))
+    return 0;
+
+  *mi_r = row;
+  *mi_c = col;
+
+  return 1;
+}
+
+// Note: motion_filed_projection finds motion vectors of current frame's
+// reference frame, and projects them to current frame. To make it clear,
+// let's call current frame's reference frame as start frame.
+// Call Start frame's reference frames as reference frames.
+// Call ref_offset as frame distances between start frame and its reference
+// frames.
+static int motion_field_projection(AV1_COMMON *cm,
+                                   MV_REFERENCE_FRAME start_frame, int dir, int row_start, int row_end) {
+  TPL_MV_REF *tpl_mvs_base = cm->tpl_mvs;
+  int ref_offset[REF_FRAMES] = { 0 };
+
+  const RefCntBuffer *const start_frame_buf =
+      get_ref_frame_buf(cm, start_frame);
+  if (start_frame_buf == NULL) return 0;
+
+  if (start_frame_buf->frame_type == KEY_FRAME ||
+      start_frame_buf->frame_type == INTRA_ONLY_FRAME)
+    return 0;
+
+  if (start_frame_buf->mi_rows != cm->mi_rows ||
+      start_frame_buf->mi_cols != cm->mi_cols)
+    return 0;
+
+  const int start_frame_order_hint = start_frame_buf->order_hint;
+  const unsigned int *const ref_order_hints =
+      &start_frame_buf->ref_order_hints[0];
+  const int cur_order_hint = cm->cur_frame->order_hint;
+  int start_to_current_frame_offset = get_relative_dist(
+      &cm->seq_params.order_hint_info, start_frame_order_hint, cur_order_hint);
+
+  for (MV_REFERENCE_FRAME rf = LAST_FRAME; rf <= INTER_REFS_PER_FRAME; ++rf) {
+    ref_offset[rf] = get_relative_dist(&cm->seq_params.order_hint_info,
+                                       start_frame_order_hint,
+                                       ref_order_hints[rf - LAST_FRAME]);
+  }
+
+  if (dir == 2) start_to_current_frame_offset = -start_to_current_frame_offset;
+
+  MV_REF *mv_ref_base = start_frame_buf->mvs;
+  const int mvs_rows = (cm->mi_rows + 1) >> 1;
+  const int mvs_cols = (cm->mi_cols + 1) >> 1;
+
+  for (int blk_row = row_start; blk_row < row_end; ++blk_row) {
+    for (int blk_col = 0; blk_col < mvs_cols; ++blk_col) {
+      MV_REF *mv_ref = &mv_ref_base[blk_row * mvs_cols + blk_col];
+      MV fwd_mv = mv_ref->mv.as_mv;
+
+      if (mv_ref->ref_frame > INTRA_FRAME) {
+        int_mv this_mv;
+        int mi_r, mi_c;
+        const int ref_frame_offset = ref_offset[mv_ref->ref_frame];
+
+        int pos_valid =
+            abs(ref_frame_offset) <= MAX_FRAME_DISTANCE &&
+            ref_frame_offset > 0 &&
+            abs(start_to_current_frame_offset) <= MAX_FRAME_DISTANCE;
+
+        if (pos_valid) {
+          get_mv_projection(&this_mv.as_mv, fwd_mv,
+                            start_to_current_frame_offset, ref_frame_offset);
+          pos_valid = get_block_position(cm, &mi_r, &mi_c, blk_row, blk_col,
+                                         this_mv.as_mv, dir >> 1);
+        }
+
+        if (pos_valid) {
+          const int mi_offset = mi_r * (cm->mi_stride >> 1) + mi_c;
+
+          tpl_mvs_base[mi_offset].mfmv0.as_mv.row = fwd_mv.row;
+          tpl_mvs_base[mi_offset].mfmv0.as_mv.col = fwd_mv.col;
+          tpl_mvs_base[mi_offset].ref_frame_offset = ref_frame_offset;
+        }
+      }
+    }
+  }
+
+  return 1;
+}
+
+static int motion_field_thread_hook(void * data1, void * data2)
+{
+    MotionFieldWorkerData * td = (MotionFieldWorkerData*)data1;
+    AV1_COMMON * cm = (AV1_COMMON*)data2;
+    const int row_start = td->row_start;
+    const int row_end = td->row_end;
+
+    const OrderHintInfo *const order_hint_info = &cm->seq_params.order_hint_info;
+    TPL_MV_REF *tpl_mvs_base = cm->tpl_mvs;
+
+    const int idx0 = row_start * (cm->mi_stride >> 1);
+    const int size = row_end * (cm->mi_stride >> 1);
+    for (int idx = idx0; idx < size; ++idx) {
+        tpl_mvs_base[idx].mfmv0.as_int = INVALID_MV;
+        tpl_mvs_base[idx].ref_frame_offset = 0;
+    }
+
+    const int cur_order_hint = cm->cur_frame->order_hint;
+
+    const RefCntBuffer *ref_buf[INTER_REFS_PER_FRAME];
+    int ref_order_hint[INTER_REFS_PER_FRAME];
+
+    for (int ref_frame = LAST_FRAME; ref_frame <= ALTREF_FRAME; ref_frame++) {
+        const int ref_idx = ref_frame - LAST_FRAME;
+        const RefCntBuffer *const buf = get_ref_frame_buf(cm, ref_frame);
+        int order_hint = 0;
+
+        if (buf != NULL) order_hint = buf->order_hint;
+
+        ref_buf[ref_idx] = buf;
+        ref_order_hint[ref_idx] = order_hint;
+
+        if (get_relative_dist(order_hint_info, order_hint, cur_order_hint) > 0)
+            cm->ref_frame_side[ref_frame] = 1;
+        else if (order_hint == cur_order_hint)
+            cm->ref_frame_side[ref_frame] = -1;
+    }
+
+    int ref_stamp = MFMV_STACK_SIZE - 1;
+
+    if (ref_buf[LAST_FRAME - LAST_FRAME] != NULL) {
+        const int alt_of_lst_order_hint =
+            ref_buf[LAST_FRAME - LAST_FRAME]
+            ->ref_order_hints[ALTREF_FRAME - LAST_FRAME];
+
+        const int is_lst_overlay =
+            (alt_of_lst_order_hint == ref_order_hint[GOLDEN_FRAME - LAST_FRAME]);
+        if (!is_lst_overlay) {
+            motion_field_projection(cm, LAST_FRAME, 2, row_start, row_end);
+        }
+        --ref_stamp;
+    }
+
+    if (get_relative_dist(order_hint_info,
+        ref_order_hint[BWDREF_FRAME - LAST_FRAME],
+        cur_order_hint) > 0) {
+        if (motion_field_projection(cm, BWDREF_FRAME, 0, row_start, row_end)) --ref_stamp;
+    }
+
+    if (get_relative_dist(order_hint_info,
+        ref_order_hint[ALTREF2_FRAME - LAST_FRAME],
+        cur_order_hint) > 0) {
+        if (motion_field_projection(cm, ALTREF2_FRAME, 0, row_start, row_end)) --ref_stamp;
+    }
+
+    if (get_relative_dist(order_hint_info,
+        ref_order_hint[ALTREF_FRAME - LAST_FRAME],
+        cur_order_hint) > 0 &&
+        ref_stamp >= 0)
+    {
+        if (motion_field_projection(cm, ALTREF_FRAME, 0, row_start, row_end)) --ref_stamp;
+    }
+    if (ref_stamp >= 0) {
+        motion_field_projection(cm, LAST2_FRAME, 2, row_start, row_end);
+    }
+    return 0;
+}
+
+int av1_setup_motion_field(AV1Decoder *pbi) 
+{
+    AV1_COMMON *const cm = &pbi->common;
+    const OrderHintInfo *const order_hint_info = &cm->seq_params.order_hint_info;
+    const AVxWorkerInterface *const winterface = aom_get_worker_interface();
+    memset(cm->ref_frame_side, 0, sizeof(cm->ref_frame_side));
+    if (!order_hint_info->enable_order_hint)
+        return 0;
+    const int mvs_rows = (cm->mi_rows + 1) >> 1;
+#if 1
+    const int num_workers = AOMMIN(pbi->num_workers, (mvs_rows + 7) >> 3);
+    const int stripe_h = ((mvs_rows + num_workers - 1) / num_workers + 7) & ~7;
+    for (int worker_idx = 0; worker_idx < num_workers; ++worker_idx) {
+        AVxWorker *const worker = &pbi->tile_workers[worker_idx];
+        MotionFieldWorkerData *const thread_data = pbi->motion_field_data + worker_idx;
+        thread_data->row_start = worker_idx * stripe_h;
+        thread_data->row_end = AOMMIN(thread_data->row_start + stripe_h, mvs_rows);
+        winterface->sync(worker);
+        worker->hook = motion_field_thread_hook;
+        worker->data1 = thread_data;
+        worker->data2 = cm;
+
+        if (worker_idx == num_workers - 1)
+            winterface->execute(worker);
+        else
+            winterface->launch(worker);
+    }
+    for (int worker_idx = 0; worker_idx < (num_workers - 1); ++worker_idx)
+    {
+        winterface->sync(&pbi->tile_workers[worker_idx]);
+    }
+
+#else
+
+  TPL_MV_REF *tpl_mvs_base = cm->tpl_mvs;
+  int size = ((cm->mi_rows + MAX_MIB_SIZE) >> 1) * (cm->mi_stride >> 1);
+  for (int idx = 0; idx < size; ++idx) {
+    tpl_mvs_base[idx].mfmv0.as_int = INVALID_MV;
+    tpl_mvs_base[idx].ref_frame_offset = 0;
+  }
+
+  const int cur_order_hint = cm->cur_frame->order_hint;
+
+  const RefCntBuffer *ref_buf[INTER_REFS_PER_FRAME];
+  int ref_order_hint[INTER_REFS_PER_FRAME];
+
+  for (int ref_frame = LAST_FRAME; ref_frame <= ALTREF_FRAME; ref_frame++) {
+    const int ref_idx = ref_frame - LAST_FRAME;
+    const RefCntBuffer *const buf = get_ref_frame_buf(cm, ref_frame);
+    int order_hint = 0;
+
+    if (buf != NULL) order_hint = buf->order_hint;
+
+    ref_buf[ref_idx] = buf;
+    ref_order_hint[ref_idx] = order_hint;
+
+    if (get_relative_dist(order_hint_info, order_hint, cur_order_hint) > 0)
+      cm->ref_frame_side[ref_frame] = 1;
+    else if (order_hint == cur_order_hint)
+      cm->ref_frame_side[ref_frame] = -1;
+  }
+
+  int ref_stamp = MFMV_STACK_SIZE - 1;
+
+  if (ref_buf[LAST_FRAME - LAST_FRAME] != NULL) {
+    const int alt_of_lst_order_hint =
+        ref_buf[LAST_FRAME - LAST_FRAME]
+            ->ref_order_hints[ALTREF_FRAME - LAST_FRAME];
+
+    const int is_lst_overlay =
+        (alt_of_lst_order_hint == ref_order_hint[GOLDEN_FRAME - LAST_FRAME]);
+    if (!is_lst_overlay) {
+        motion_field_projection(cm, LAST_FRAME, 2, 0, mvs_rows); ++count;
+    }
+    --ref_stamp;
+  }
+
+  if (get_relative_dist(order_hint_info,
+                        ref_order_hint[BWDREF_FRAME - LAST_FRAME],
+                        cur_order_hint) > 0) {
+    if (motion_field_projection(cm, BWDREF_FRAME, 0, 0, mvs_rows)) --ref_stamp;
+    ++count;
+  }
+
+  if (get_relative_dist(order_hint_info,
+                        ref_order_hint[ALTREF2_FRAME - LAST_FRAME],
+                        cur_order_hint) > 0) {
+    if (motion_field_projection(cm, ALTREF2_FRAME, 0, 0, mvs_rows)) --ref_stamp;
+    ++count;
+  }
+
+  if (get_relative_dist(order_hint_info,
+      ref_order_hint[ALTREF_FRAME - LAST_FRAME],
+      cur_order_hint) > 0 &&
+      ref_stamp >= 0)
+  {
+      if (motion_field_projection(cm, ALTREF_FRAME, 0, 0, mvs_rows)) --ref_stamp;
+      ++count;
+  }
+  if (ref_stamp >= 0) {
+      motion_field_projection(cm, LAST2_FRAME, 2, 0, mvs_rows);    ++count;
+  }
+#endif
+  return 0;
+}
+
+static INLINE void record_samples(MB_MODE_INFO *mbmi, int *pts, int *pts_inref,
+                                  int row_offset, int sign_r, int col_offset,
+                                  int sign_c) {
+  int bw = block_size_wide[mbmi->sb_type];
+  int bh = block_size_high[mbmi->sb_type];
+  int x = col_offset * MI_SIZE + sign_c * AOMMAX(bw, MI_SIZE) / 2 - 1;
+  int y = row_offset * MI_SIZE + sign_r * AOMMAX(bh, MI_SIZE) / 2 - 1;
+
+  pts[0] = (x * 8);
+  pts[1] = (y * 8);
+  pts_inref[0] = (x * 8) + mbmi->mv[0].as_mv.col;
+  pts_inref[1] = (y * 8) + mbmi->mv[0].as_mv.row;
+}
+
+// Select samples according to the motion vector difference.
+int selectSamples(MV *mv, int *pts, int *pts_inref, int len, BLOCK_SIZE bsize) {
+  const int bw = block_size_wide[bsize];
+  const int bh = block_size_high[bsize];
+  const int thresh = clamp(AOMMAX(bw, bh), 16, 112);
+  int pts_mvd[SAMPLES_ARRAY_SIZE] = { 0 };
+  int i, j, k, l = len;
+  int ret = 0;
+  assert(len <= LEAST_SQUARES_SAMPLES_MAX);
+
+  // Obtain the motion vector difference.
+  for (i = 0; i < len; ++i) {
+    pts_mvd[i] = abs(pts_inref[2 * i] - pts[2 * i] - mv->col) +
+                 abs(pts_inref[2 * i + 1] - pts[2 * i + 1] - mv->row);
+
+    if (pts_mvd[i] > thresh)
+      pts_mvd[i] = -1;
+    else
+      ret++;
+  }
+
+  // Keep at least 1 sample.
+  if (!ret) return 1;
+
+  i = 0;
+  j = l - 1;
+  for (k = 0; k < l - ret; k++) {
+    while (pts_mvd[i] != -1) i++;
+    while (pts_mvd[j] == -1) j--;
+    assert(i != j);
+    if (i > j) break;
+
+    // Replace the discarded samples;
+    pts_mvd[i] = pts_mvd[j];
+    pts[2 * i] = pts[2 * j];
+    pts[2 * i + 1] = pts[2 * j + 1];
+    pts_inref[2 * i] = pts_inref[2 * j];
+    pts_inref[2 * i + 1] = pts_inref[2 * j + 1];
+    i++;
+    j--;
+  }
+
+  return ret;
+}
+
+// Note: Samples returned are at 1/8-pel precision
+// Sample are the neighbor block center point's coordinates relative to the
+// left-top pixel of current block.
+int findSamples(const AV1_COMMON *cm, MACROBLOCKD *xd, int mi_row, int mi_col,
+                int *pts, int *pts_inref) {
+  MB_MODE_INFO *const mbmi0 = xd->mi[0];
+  int ref_frame = mbmi0->ref_frame[0];
+  int up_available = xd->up_available;
+  int left_available = xd->left_available;
+  int i, mi_step = 1, np = 0;
+
+  const TileInfo *const tile = &xd->tile;
+  int do_tl = 1;
+  int do_tr = 1;
+
+  // scan the nearest above rows
+  if (up_available) {
+    int mi_row_offset = -1;
+    MB_MODE_INFO *mbmi = xd->mi[mi_row_offset * xd->mi_stride];
+    uint8_t n4_w = mi_size_wide[mbmi->sb_type];
+
+    if (xd->n4_w <= n4_w) {
+      // Handle "current block width <= above block width" case.
+      int col_offset = -mi_col % n4_w;
+
+      if (col_offset < 0) do_tl = 0;
+      if (col_offset + n4_w > xd->n4_w) do_tr = 0;
+
+      if (mbmi->ref_frame[0] == ref_frame && mbmi->ref_frame[1] == NONE_FRAME) {
+        record_samples(mbmi, pts, pts_inref, 0, -1, col_offset, 1);
+        pts += 2;
+        pts_inref += 2;
+        np++;
+        if (np >= LEAST_SQUARES_SAMPLES_MAX) return LEAST_SQUARES_SAMPLES_MAX;
+      }
+    } else {
+      // Handle "current block width > above block width" case.
+      for (i = 0; i < AOMMIN(xd->n4_w, cm->mi_cols - mi_col); i += mi_step) {
+        int mi_col_offset = i;
+        mbmi = xd->mi[mi_col_offset + mi_row_offset * xd->mi_stride];
+        n4_w = mi_size_wide[mbmi->sb_type];
+        mi_step = AOMMIN(xd->n4_w, n4_w);
+
+        if (mbmi->ref_frame[0] == ref_frame &&
+            mbmi->ref_frame[1] == NONE_FRAME) {
+          record_samples(mbmi, pts, pts_inref, 0, -1, i, 1);
+          pts += 2;
+          pts_inref += 2;
+          np++;
+          if (np >= LEAST_SQUARES_SAMPLES_MAX) return LEAST_SQUARES_SAMPLES_MAX;
+        }
+      }
+    }
+  }
+  assert(np <= LEAST_SQUARES_SAMPLES_MAX);
+
+  // scan the nearest left columns
+  if (left_available) {
+    int mi_col_offset = -1;
+
+    MB_MODE_INFO *mbmi = xd->mi[mi_col_offset];
+    uint8_t n4_h = mi_size_high[mbmi->sb_type];
+
+    if (xd->n4_h <= n4_h) {
+      // Handle "current block height <= above block height" case.
+      int row_offset = -mi_row % n4_h;
+
+      if (row_offset < 0) do_tl = 0;
+
+      if (mbmi->ref_frame[0] == ref_frame && mbmi->ref_frame[1] == NONE_FRAME) {
+        record_samples(mbmi, pts, pts_inref, row_offset, 1, 0, -1);
+        pts += 2;
+        pts_inref += 2;
+        np++;
+        if (np >= LEAST_SQUARES_SAMPLES_MAX) return LEAST_SQUARES_SAMPLES_MAX;
+      }
+    } else {
+      // Handle "current block height > above block height" case.
+      for (i = 0; i < AOMMIN(xd->n4_h, cm->mi_rows - mi_row); i += mi_step) {
+        int mi_row_offset = i;
+        mbmi = xd->mi[mi_col_offset + mi_row_offset * xd->mi_stride];
+        n4_h = mi_size_high[mbmi->sb_type];
+        mi_step = AOMMIN(xd->n4_h, n4_h);
+
+        if (mbmi->ref_frame[0] == ref_frame &&
+            mbmi->ref_frame[1] == NONE_FRAME) {
+          record_samples(mbmi, pts, pts_inref, i, 1, 0, -1);
+          pts += 2;
+          pts_inref += 2;
+          np++;
+          if (np >= LEAST_SQUARES_SAMPLES_MAX) return LEAST_SQUARES_SAMPLES_MAX;
+        }
+      }
+    }
+  }
+  assert(np <= LEAST_SQUARES_SAMPLES_MAX);
+
+  // Top-left block
+  if (do_tl && left_available && up_available) {
+    int mi_row_offset = -1;
+    int mi_col_offset = -1;
+
+    MB_MODE_INFO *mbmi = xd->mi[mi_col_offset + mi_row_offset * xd->mi_stride];
+
+    if (mbmi->ref_frame[0] == ref_frame && mbmi->ref_frame[1] == NONE_FRAME) {
+      record_samples(mbmi, pts, pts_inref, 0, -1, 0, -1);
+      pts += 2;
+      pts_inref += 2;
+      np++;
+      if (np >= LEAST_SQUARES_SAMPLES_MAX) return LEAST_SQUARES_SAMPLES_MAX;
+    }
+  }
+  assert(np <= LEAST_SQUARES_SAMPLES_MAX);
+
+  // Top-right block
+  if (do_tr &&
+      has_top_right(cm, xd, mi_row, mi_col, AOMMAX(xd->n4_w, xd->n4_h))) {
+    POSITION trb_pos = { -1, xd->n4_w };
+
+    if (is_inside(tile, mi_col, mi_row, &trb_pos)) {
+      int mi_row_offset = -1;
+      int mi_col_offset = xd->n4_w;
+
+      MB_MODE_INFO *mbmi =
+          xd->mi[mi_col_offset + mi_row_offset * xd->mi_stride];
+
+      if (mbmi->ref_frame[0] == ref_frame && mbmi->ref_frame[1] == NONE_FRAME) {
+        record_samples(mbmi, pts, pts_inref, 0, -1, xd->n4_w, 1);
+        np++;
+        if (np >= LEAST_SQUARES_SAMPLES_MAX) return LEAST_SQUARES_SAMPLES_MAX;
+      }
+    }
+  }
+  assert(np <= LEAST_SQUARES_SAMPLES_MAX);
+
+  return np;
+}
+
+void av1_setup_skip_mode_allowed(AV1_COMMON *cm) {
+  const OrderHintInfo *const order_hint_info = &cm->seq_params.order_hint_info;
+  SkipModeInfo *const skip_mode_info = &cm->current_frame.skip_mode_info;
+
+  skip_mode_info->skip_mode_allowed = 0;
+  skip_mode_info->ref_frame_idx_0 = INVALID_IDX;
+  skip_mode_info->ref_frame_idx_1 = INVALID_IDX;
+
+  if (!order_hint_info->enable_order_hint || frame_is_intra_only(cm) ||
+      cm->current_frame.reference_mode == SINGLE_REFERENCE)
+    return;
+
+  const int cur_order_hint = cm->current_frame.order_hint;
+  int ref_order_hints[2] = { -1, INT_MAX };
+  int ref_idx[2] = { INVALID_IDX, INVALID_IDX };
+
+  // Identify the nearest forward and backward references.
+  for (int i = 0; i < INTER_REFS_PER_FRAME; ++i) {
+    const RefCntBuffer *const buf = get_ref_frame_buf(cm, LAST_FRAME + i);
+    if (buf == NULL) continue;
+
+    const int ref_order_hint = buf->order_hint;
+    if (get_relative_dist(order_hint_info, ref_order_hint, cur_order_hint) <
+        0) {
+      // Forward reference
+      if (ref_order_hints[0] == -1 ||
+          get_relative_dist(order_hint_info, ref_order_hint,
+                            ref_order_hints[0]) > 0) {
+        ref_order_hints[0] = ref_order_hint;
+        ref_idx[0] = i;
+      }
+    } else if (get_relative_dist(order_hint_info, ref_order_hint,
+                                 cur_order_hint) > 0) {
+      // Backward reference
+      if (ref_order_hints[1] == INT_MAX ||
+          get_relative_dist(order_hint_info, ref_order_hint,
+                            ref_order_hints[1]) < 0) {
+        ref_order_hints[1] = ref_order_hint;
+        ref_idx[1] = i;
+      }
+    }
+  }
+
+  if (ref_idx[0] != INVALID_IDX && ref_idx[1] != INVALID_IDX) {
+    // == Bi-directional prediction ==
+    skip_mode_info->skip_mode_allowed = 1;
+    skip_mode_info->ref_frame_idx_0 = AOMMIN(ref_idx[0], ref_idx[1]);
+    skip_mode_info->ref_frame_idx_1 = AOMMAX(ref_idx[0], ref_idx[1]);
+  } else if (ref_idx[0] != INVALID_IDX && ref_idx[1] == INVALID_IDX) {
+    // == Forward prediction only ==
+    // Identify the second nearest forward reference.
+    ref_order_hints[1] = -1;
+    for (int i = 0; i < INTER_REFS_PER_FRAME; ++i) {
+      const RefCntBuffer *const buf = get_ref_frame_buf(cm, LAST_FRAME + i);
+      if (buf == NULL) continue;
+
+      const int ref_order_hint = buf->order_hint;
+      if ((ref_order_hints[0] != -1 &&
+           get_relative_dist(order_hint_info, ref_order_hint,
+                             ref_order_hints[0]) < 0) &&
+          (ref_order_hints[1] == -1 ||
+           get_relative_dist(order_hint_info, ref_order_hint,
+                             ref_order_hints[1]) > 0)) {
+        // Second closest forward reference
+        ref_order_hints[1] = ref_order_hint;
+        ref_idx[1] = i;
+      }
+    }
+    if (ref_order_hints[1] != -1) {
+      skip_mode_info->skip_mode_allowed = 1;
+      skip_mode_info->ref_frame_idx_0 = AOMMIN(ref_idx[0], ref_idx[1]);
+      skip_mode_info->ref_frame_idx_1 = AOMMAX(ref_idx[0], ref_idx[1]);
+    }
+  }
+}
+
+typedef struct {
+  int map_idx;        // frame map index
+  RefCntBuffer *buf;  // frame buffer
+  int sort_idx;       // index based on the offset to be used for sorting
+} REF_FRAME_INFO;
+
+// Compares the sort_idx fields. If they are equal, then compares the map_idx
+// fields to break the tie. This ensures a stable sort.
+static int compare_ref_frame_info(const void *arg_a, const void *arg_b) {
+  const REF_FRAME_INFO *info_a = (REF_FRAME_INFO *)arg_a;
+  const REF_FRAME_INFO *info_b = (REF_FRAME_INFO *)arg_b;
+
+  const int sort_idx_diff = info_a->sort_idx - info_b->sort_idx;
+  if (sort_idx_diff != 0) return sort_idx_diff;
+  return info_a->map_idx - info_b->map_idx;
+}
+
+static void set_ref_frame_info(int *remapped_ref_idx, int frame_idx,
+                               REF_FRAME_INFO *ref_info) {
+  assert(frame_idx >= 0 && frame_idx < INTER_REFS_PER_FRAME);
+
+  remapped_ref_idx[frame_idx] = ref_info->map_idx;
+}
+
+void av1_set_frame_refs(AV1_COMMON *const cm, int *remapped_ref_idx,
+                        int lst_map_idx, int gld_map_idx) {
+  int lst_frame_sort_idx = -1;
+  int gld_frame_sort_idx = -1;
+
+  assert(cm->seq_params.order_hint_info.enable_order_hint);
+  assert(cm->seq_params.order_hint_info.order_hint_bits_minus_1 >= 0);
+  const int cur_order_hint = (int)cm->current_frame.order_hint;
+  const int cur_frame_sort_idx =
+      1 << cm->seq_params.order_hint_info.order_hint_bits_minus_1;
+
+  REF_FRAME_INFO ref_frame_info[REF_FRAMES];
+  int ref_flag_list[INTER_REFS_PER_FRAME] = { 0, 0, 0, 0, 0, 0, 0 };
+
+  for (int i = 0; i < REF_FRAMES; ++i) {
+    const int map_idx = i;
+
+    ref_frame_info[i].map_idx = map_idx;
+    ref_frame_info[i].sort_idx = -1;
+
+    RefCntBuffer *const buf = cm->ref_frame_map[map_idx];
+    ref_frame_info[i].buf = buf;
+
+    if (buf == NULL) continue;
+    // If this assertion fails, there is a reference leak.
+    assert(buf->ref_count > 0);
+    // TODO(wtc@google.com): Remove the checking on ref_count after 2019-03-01.
+    if (buf->ref_count <= 0) continue;
+
+    const int offset = (int)buf->order_hint;
+    ref_frame_info[i].sort_idx =
+        (offset == -1) ? -1
+                       : cur_frame_sort_idx +
+                             get_relative_dist(&cm->seq_params.order_hint_info,
+                                               offset, cur_order_hint);
+    assert(ref_frame_info[i].sort_idx >= -1);
+
+    if (map_idx == lst_map_idx) lst_frame_sort_idx = ref_frame_info[i].sort_idx;
+    if (map_idx == gld_map_idx) gld_frame_sort_idx = ref_frame_info[i].sort_idx;
+  }
+
+  // Confirm both LAST_FRAME and GOLDEN_FRAME are valid forward reference
+  // frames.
+  if (lst_frame_sort_idx == -1 || lst_frame_sort_idx >= cur_frame_sort_idx) {
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Inter frame requests a look-ahead frame as LAST");
+  }
+  if (gld_frame_sort_idx == -1 || gld_frame_sort_idx >= cur_frame_sort_idx) {
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Inter frame requests a look-ahead frame as GOLDEN");
+  }
+
+  // Sort ref frames based on their frame_offset values.
+  qsort(ref_frame_info, REF_FRAMES, sizeof(REF_FRAME_INFO),
+        compare_ref_frame_info);
+
+  // Identify forward and backward reference frames.
+  // Forward  reference: offset < order_hint
+  // Backward reference: offset >= order_hint
+  int fwd_start_idx = 0, fwd_end_idx = REF_FRAMES - 1;
+
+  for (int i = 0; i < REF_FRAMES; i++) {
+    if (ref_frame_info[i].sort_idx == -1) {
+      fwd_start_idx++;
+      continue;
+    }
+
+    if (ref_frame_info[i].sort_idx >= cur_frame_sort_idx) {
+      fwd_end_idx = i - 1;
+      break;
+    }
+  }
+
+  int bwd_start_idx = fwd_end_idx + 1;
+  int bwd_end_idx = REF_FRAMES - 1;
+
+  // === Backward Reference Frames ===
+
+  // == ALTREF_FRAME ==
+  if (bwd_start_idx <= bwd_end_idx) {
+    set_ref_frame_info(remapped_ref_idx, ALTREF_FRAME - LAST_FRAME,
+                       &ref_frame_info[bwd_end_idx]);
+    ref_flag_list[ALTREF_FRAME - LAST_FRAME] = 1;
+    bwd_end_idx--;
+  }
+
+  // == BWDREF_FRAME ==
+  if (bwd_start_idx <= bwd_end_idx) {
+    set_ref_frame_info(remapped_ref_idx, BWDREF_FRAME - LAST_FRAME,
+                       &ref_frame_info[bwd_start_idx]);
+    ref_flag_list[BWDREF_FRAME - LAST_FRAME] = 1;
+    bwd_start_idx++;
+  }
+
+  // == ALTREF2_FRAME ==
+  if (bwd_start_idx <= bwd_end_idx) {
+    set_ref_frame_info(remapped_ref_idx, ALTREF2_FRAME - LAST_FRAME,
+                       &ref_frame_info[bwd_start_idx]);
+    ref_flag_list[ALTREF2_FRAME - LAST_FRAME] = 1;
+  }
+
+  // === Forward Reference Frames ===
+
+  for (int i = fwd_start_idx; i <= fwd_end_idx; ++i) {
+    // == LAST_FRAME ==
+    if (ref_frame_info[i].map_idx == lst_map_idx) {
+      set_ref_frame_info(remapped_ref_idx, LAST_FRAME - LAST_FRAME,
+                         &ref_frame_info[i]);
+      ref_flag_list[LAST_FRAME - LAST_FRAME] = 1;
+    }
+
+    // == GOLDEN_FRAME ==
+    if (ref_frame_info[i].map_idx == gld_map_idx) {
+      set_ref_frame_info(remapped_ref_idx, GOLDEN_FRAME - LAST_FRAME,
+                         &ref_frame_info[i]);
+      ref_flag_list[GOLDEN_FRAME - LAST_FRAME] = 1;
+    }
+  }
+
+  assert(ref_flag_list[LAST_FRAME - LAST_FRAME] == 1 &&
+         ref_flag_list[GOLDEN_FRAME - LAST_FRAME] == 1);
+
+  // == LAST2_FRAME ==
+  // == LAST3_FRAME ==
+  // == BWDREF_FRAME ==
+  // == ALTREF2_FRAME ==
+  // == ALTREF_FRAME ==
+
+  // Set up the reference frames in the anti-chronological order.
+  static const MV_REFERENCE_FRAME ref_frame_list[INTER_REFS_PER_FRAME - 2] = {
+    LAST2_FRAME, LAST3_FRAME, BWDREF_FRAME, ALTREF2_FRAME, ALTREF_FRAME
+  };
+
+  int ref_idx;
+  for (ref_idx = 0; ref_idx < (INTER_REFS_PER_FRAME - 2); ref_idx++) {
+    const MV_REFERENCE_FRAME ref_frame = ref_frame_list[ref_idx];
+
+    if (ref_flag_list[ref_frame - LAST_FRAME] == 1) continue;
+
+    while (fwd_start_idx <= fwd_end_idx &&
+           (ref_frame_info[fwd_end_idx].map_idx == lst_map_idx ||
+            ref_frame_info[fwd_end_idx].map_idx == gld_map_idx)) {
+      fwd_end_idx--;
+    }
+    if (fwd_start_idx > fwd_end_idx) break;
+
+    set_ref_frame_info(remapped_ref_idx, ref_frame - LAST_FRAME,
+                       &ref_frame_info[fwd_end_idx]);
+    ref_flag_list[ref_frame - LAST_FRAME] = 1;
+
+    fwd_end_idx--;
+  }
+
+  // Assign all the remaining frame(s), if any, to the earliest reference frame.
+  for (; ref_idx < (INTER_REFS_PER_FRAME - 2); ref_idx++) {
+    const MV_REFERENCE_FRAME ref_frame = ref_frame_list[ref_idx];
+    if (ref_flag_list[ref_frame - LAST_FRAME] == 1) continue;
+    set_ref_frame_info(remapped_ref_idx, ref_frame - LAST_FRAME,
+                       &ref_frame_info[fwd_start_idx]);
+    ref_flag_list[ref_frame - LAST_FRAME] = 1;
+  }
+
+  for (int i = 0; i < INTER_REFS_PER_FRAME; i++) {
+    assert(ref_flag_list[i] == 1);
+  }
+}

diff --git a/libav1/av1/common/mvref_common.h b/libav1/av1/common/mvref_common.h
new file mode 100644
index 0000000..02f6db5
--- /dev/null
+++ b/libav1/av1/common/mvref_common.h

@@ -0,0 +1,354 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AV1_COMMON_MVREF_COMMON_H_
+#define AOM_AV1_COMMON_MVREF_COMMON_H_
+
+#include "av1/common/onyxc_int.h"
+#include "av1/common/blockd.h"
+#include "av1/decoder/decoder.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MVREF_ROW_COLS 3
+
+// Set the upper limit of the motion vector component magnitude.
+// This would make a motion vector fit in 26 bits. Plus 3 bits for the
+// reference frame index. A tuple of motion vector can hence be stored within
+// 32 bit range for efficient load/store operations.
+#define REFMVS_LIMIT ((1 << 12) - 1)
+
+typedef struct position {
+  int row;
+  int col;
+} POSITION;
+
+// clamp_mv_ref
+#define MV_BORDER (16 << 3)  // Allow 16 pels in 1/8th pel units
+
+static INLINE int get_relative_dist(const OrderHintInfo *oh, int a, int b) {
+  if (!oh->enable_order_hint) return 0;
+
+  const int bits = oh->order_hint_bits_minus_1 + 1;
+
+  assert(bits >= 1);
+  assert(a >= 0 && a < (1 << bits));
+  assert(b >= 0 && b < (1 << bits));
+
+  int diff = a - b;
+  const int m = 1 << (bits - 1);
+  diff = (diff & (m - 1)) - (diff & m);
+  return diff;
+}
+
+static INLINE void clamp_mv_ref(MV *mv, int bw, int bh, const MACROBLOCKD *xd) {
+  clamp_mv(mv, xd->mb_to_left_edge - bw * 8 - MV_BORDER,
+           xd->mb_to_right_edge + bw * 8 + MV_BORDER,
+           xd->mb_to_top_edge - bh * 8 - MV_BORDER,
+           xd->mb_to_bottom_edge + bh * 8 + MV_BORDER);
+}
+
+// This function returns either the appropriate sub block or block's mv
+// on whether the block_size < 8x8 and we have check_sub_blocks set.
+static INLINE int_mv get_sub_block_mv(const MB_MODE_INFO *candidate,
+                                      int which_mv, int search_col) {
+  (void)search_col;
+  return candidate->mv[which_mv];
+}
+
+static INLINE int_mv get_sub_block_pred_mv(const MB_MODE_INFO *candidate,
+                                           int which_mv, int search_col) {
+  (void)search_col;
+  return candidate->mv[which_mv];
+}
+
+// Checks that the given mi_row, mi_col and search point
+// are inside the borders of the tile.
+static INLINE int is_inside(const TileInfo *const tile, int mi_col, int mi_row,
+                            const POSITION *mi_pos) {
+  return !(mi_row + mi_pos->row < tile->mi_row_start ||
+           mi_col + mi_pos->col < tile->mi_col_start ||
+           mi_row + mi_pos->row >= tile->mi_row_end ||
+           mi_col + mi_pos->col >= tile->mi_col_end);
+}
+
+static INLINE int find_valid_row_offset(const TileInfo *const tile, int mi_row,
+                                        int row_offset) {
+  return clamp(row_offset, tile->mi_row_start - mi_row,
+               tile->mi_row_end - mi_row - 1);
+}
+
+static INLINE int find_valid_col_offset(const TileInfo *const tile, int mi_col,
+                                        int col_offset) {
+  return clamp(col_offset, tile->mi_col_start - mi_col,
+               tile->mi_col_end - mi_col - 1);
+}
+
+static INLINE void lower_mv_precision(MV *mv, int allow_hp, int is_integer) {
+  if (is_integer) {
+    integer_mv_precision(mv);
+  } else {
+    if (!allow_hp) {
+      if (mv->row & 1) mv->row += (mv->row > 0 ? -1 : 1);
+      if (mv->col & 1) mv->col += (mv->col > 0 ? -1 : 1);
+    }
+  }
+}
+
+static INLINE int8_t get_uni_comp_ref_idx(const MV_REFERENCE_FRAME *const rf) {
+  // Single ref pred
+  if (rf[1] <= INTRA_FRAME) return -1;
+
+  // Bi-directional comp ref pred
+  if ((rf[0] < BWDREF_FRAME) && (rf[1] >= BWDREF_FRAME)) return -1;
+
+  for (int8_t ref_idx = 0; ref_idx < TOTAL_UNIDIR_COMP_REFS; ++ref_idx) {
+    if (rf[0] == comp_ref0(ref_idx) && rf[1] == comp_ref1(ref_idx))
+      return ref_idx;
+  }
+  return -1;
+}
+
+static INLINE int8_t av1_ref_frame_type(const MV_REFERENCE_FRAME *const rf) {
+  if (rf[1] > INTRA_FRAME) {
+    const int8_t uni_comp_ref_idx = get_uni_comp_ref_idx(rf);
+    if (uni_comp_ref_idx >= 0) {
+      assert((REF_FRAMES + FWD_REFS * BWD_REFS + uni_comp_ref_idx) <
+             MODE_CTX_REF_FRAMES);
+      return REF_FRAMES + FWD_REFS * BWD_REFS + uni_comp_ref_idx;
+    } else {
+      return REF_FRAMES + FWD_RF_OFFSET(rf[0]) +
+             BWD_RF_OFFSET(rf[1]) * FWD_REFS;
+    }
+  }
+
+  return rf[0];
+}
+
+// clang-format off
+static MV_REFERENCE_FRAME ref_frame_map[TOTAL_COMP_REFS][2] = {
+  { LAST_FRAME, BWDREF_FRAME },  { LAST2_FRAME, BWDREF_FRAME },
+  { LAST3_FRAME, BWDREF_FRAME }, { GOLDEN_FRAME, BWDREF_FRAME },
+
+  { LAST_FRAME, ALTREF2_FRAME },  { LAST2_FRAME, ALTREF2_FRAME },
+  { LAST3_FRAME, ALTREF2_FRAME }, { GOLDEN_FRAME, ALTREF2_FRAME },
+
+  { LAST_FRAME, ALTREF_FRAME },  { LAST2_FRAME, ALTREF_FRAME },
+  { LAST3_FRAME, ALTREF_FRAME }, { GOLDEN_FRAME, ALTREF_FRAME },
+
+  { LAST_FRAME, LAST2_FRAME }, { LAST_FRAME, LAST3_FRAME },
+  { LAST_FRAME, GOLDEN_FRAME }, { BWDREF_FRAME, ALTREF_FRAME },
+
+  // NOTE: Following reference frame pairs are not supported to be explicitly
+  //       signalled, but they are possibly chosen by the use of skip_mode,
+  //       which may use the most recent one-sided reference frame pair.
+  { LAST2_FRAME, LAST3_FRAME }, { LAST2_FRAME, GOLDEN_FRAME },
+  { LAST3_FRAME, GOLDEN_FRAME }, {BWDREF_FRAME, ALTREF2_FRAME},
+  { ALTREF2_FRAME, ALTREF_FRAME }
+};
+// clang-format on
+
+static INLINE void av1_set_ref_frame(MV_REFERENCE_FRAME *rf,
+                                     MV_REFERENCE_FRAME ref_frame_type) {
+  if (ref_frame_type >= REF_FRAMES) {
+    rf[0] = ref_frame_map[ref_frame_type - REF_FRAMES][0];
+    rf[1] = ref_frame_map[ref_frame_type - REF_FRAMES][1];
+  } else {
+    assert(ref_frame_type > NONE_FRAME);
+    rf[0] = ref_frame_type;
+    rf[1] = NONE_FRAME;
+  }
+}
+
+static uint16_t compound_mode_ctx_map[3][COMP_NEWMV_CTXS] = {
+  { 0, 1, 1, 1, 1 },
+  { 1, 2, 3, 4, 4 },
+  { 4, 4, 5, 6, 7 },
+};
+
+static INLINE int16_t av1_mode_context_analyzer(
+    const int16_t *const mode_context, const MV_REFERENCE_FRAME *const rf) {
+  const int8_t ref_frame = av1_ref_frame_type(rf);
+
+  if (rf[1] <= INTRA_FRAME) return mode_context[ref_frame];
+
+  const int16_t newmv_ctx = mode_context[ref_frame] & NEWMV_CTX_MASK;
+  const int16_t refmv_ctx =
+      (mode_context[ref_frame] >> REFMV_OFFSET) & REFMV_CTX_MASK;
+
+  const int16_t comp_ctx = compound_mode_ctx_map[refmv_ctx >> 1][AOMMIN(
+      newmv_ctx, COMP_NEWMV_CTXS - 1)];
+  return comp_ctx;
+}
+
+static INLINE uint8_t av1_drl_ctx(const CANDIDATE_MV *ref_mv_stack,
+                                  int ref_idx) {
+  if (ref_mv_stack[ref_idx].weight >= REF_CAT_LEVEL &&
+      ref_mv_stack[ref_idx + 1].weight >= REF_CAT_LEVEL)
+    return 0;
+
+  if (ref_mv_stack[ref_idx].weight >= REF_CAT_LEVEL &&
+      ref_mv_stack[ref_idx + 1].weight < REF_CAT_LEVEL)
+    return 1;
+
+  if (ref_mv_stack[ref_idx].weight < REF_CAT_LEVEL &&
+      ref_mv_stack[ref_idx + 1].weight < REF_CAT_LEVEL)
+    return 2;
+
+  return 0;
+}
+
+void av1_setup_frame_buf_refs(AV1_COMMON *cm);
+void av1_setup_frame_sign_bias(AV1_COMMON *cm);
+void av1_setup_skip_mode_allowed(AV1_COMMON *cm);
+int av1_setup_motion_field(AV1Decoder *pbi);
+void av1_set_frame_refs(AV1_COMMON *const cm, int *remapped_ref_idx,
+                        int lst_map_idx, int gld_map_idx);
+
+static INLINE void av1_collect_neighbors_ref_counts(MACROBLOCKD *const xd) {
+  av1_zero(xd->neighbors_ref_counts);
+
+  uint8_t *const ref_counts = xd->neighbors_ref_counts;
+
+  const MB_MODE_INFO *const above_mbmi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mbmi = xd->left_mbmi;
+  const int above_in_image = xd->up_available;
+  const int left_in_image = xd->left_available;
+
+  // Above neighbor
+  if (above_in_image && is_inter_block(above_mbmi)) {
+    ref_counts[above_mbmi->ref_frame[0]]++;
+    if (has_second_ref(above_mbmi)) {
+      ref_counts[above_mbmi->ref_frame[1]]++;
+    }
+  }
+
+  // Left neighbor
+  if (left_in_image && is_inter_block(left_mbmi)) {
+    ref_counts[left_mbmi->ref_frame[0]]++;
+    if (has_second_ref(left_mbmi)) {
+      ref_counts[left_mbmi->ref_frame[1]]++;
+    }
+  }
+}
+
+void av1_copy_frame_mvs(const AV1_COMMON *const cm,
+                        const MB_MODE_INFO *const mi, int mi_row, int mi_col,
+                        int x_mis, int y_mis);
+
+// The global_mvs output parameter points to an array of REF_FRAMES elements.
+// The caller may pass a null global_mvs if it does not need the global_mvs
+// output.
+void av1_find_mv_refs(const AV1_COMMON *cm, const MACROBLOCKD *xd,
+                      MB_MODE_INFO *mi, MV_REFERENCE_FRAME ref_frame,
+                      uint8_t ref_mv_count[MODE_CTX_REF_FRAMES],
+                      CANDIDATE_MV ref_mv_stack[][MAX_REF_MV_STACK_SIZE],
+                      int_mv mv_ref_list[][MAX_MV_REF_CANDIDATES],
+                      int_mv *global_mvs, int mi_row, int mi_col,
+                      int16_t *mode_context);
+
+// check a list of motion vectors by sad score using a number rows of pixels
+// above and a number cols of pixels in the left to select the one with best
+// score to use as ref motion vector
+void av1_find_best_ref_mvs(int allow_hp, int_mv *mvlist, int_mv *nearest_mv,
+                           int_mv *near_mv, int is_integer);
+
+int selectSamples(MV *mv, int *pts, int *pts_inref, int len, BLOCK_SIZE bsize);
+int findSamples(const AV1_COMMON *cm, MACROBLOCKD *xd, int mi_row, int mi_col,
+                int *pts, int *pts_inref);
+
+#define INTRABC_DELAY_PIXELS 256  //  Delay of 256 pixels
+#define INTRABC_DELAY_SB64 (INTRABC_DELAY_PIXELS / 64)
+
+static INLINE void av1_find_ref_dv(int_mv *ref_dv, const TileInfo *const tile,
+                                   int mib_size, int mi_row, int mi_col) {
+  (void)mi_col;
+  if (mi_row - mib_size < tile->mi_row_start) {
+    ref_dv->as_mv.row = 0;
+    ref_dv->as_mv.col = -MI_SIZE * mib_size - INTRABC_DELAY_PIXELS;
+  } else {
+    ref_dv->as_mv.row = -MI_SIZE * mib_size;
+    ref_dv->as_mv.col = 0;
+  }
+  ref_dv->as_mv.row *= 8;
+  ref_dv->as_mv.col *= 8;
+}
+
+static INLINE int av1_is_dv_valid(const MV dv, const AV1_COMMON *cm,
+                                  const MACROBLOCKD *xd, int mi_row, int mi_col,
+                                  BLOCK_SIZE bsize, int mib_size_log2) {
+  const int bw = block_size_wide[bsize];
+  const int bh = block_size_high[bsize];
+  const int SCALE_PX_TO_MV = 8;
+  // Disallow subpixel for now
+  // SUBPEL_MASK is not the correct scale
+  if (((dv.row & (SCALE_PX_TO_MV - 1)) || (dv.col & (SCALE_PX_TO_MV - 1))))
+    return 0;
+
+  const TileInfo *const tile = &xd->tile;
+  // Is the source top-left inside the current tile?
+  const int src_top_edge = mi_row * MI_SIZE * SCALE_PX_TO_MV + dv.row;
+  const int tile_top_edge = tile->mi_row_start * MI_SIZE * SCALE_PX_TO_MV;
+  if (src_top_edge < tile_top_edge) return 0;
+  const int src_left_edge = mi_col * MI_SIZE * SCALE_PX_TO_MV + dv.col;
+  const int tile_left_edge = tile->mi_col_start * MI_SIZE * SCALE_PX_TO_MV;
+  if (src_left_edge < tile_left_edge) return 0;
+  // Is the bottom right inside the current tile?
+  const int src_bottom_edge = (mi_row * MI_SIZE + bh) * SCALE_PX_TO_MV + dv.row;
+  const int tile_bottom_edge = tile->mi_row_end * MI_SIZE * SCALE_PX_TO_MV;
+  if (src_bottom_edge > tile_bottom_edge) return 0;
+  const int src_right_edge = (mi_col * MI_SIZE + bw) * SCALE_PX_TO_MV + dv.col;
+  const int tile_right_edge = tile->mi_col_end * MI_SIZE * SCALE_PX_TO_MV;
+  if (src_right_edge > tile_right_edge) return 0;
+
+  // Special case for sub 8x8 chroma cases, to prevent referring to chroma
+  // pixels outside current tile.
+  for (int plane = 1; plane < av1_num_planes(cm); ++plane) {
+    const struct macroblockd_plane *const pd = &xd->plane[plane];
+    if (is_chroma_reference(mi_row, mi_col, bsize, pd->subsampling_x,
+                            pd->subsampling_y)) {
+      if (bw < 8 && pd->subsampling_x)
+        if (src_left_edge < tile_left_edge + 4 * SCALE_PX_TO_MV) return 0;
+      if (bh < 8 && pd->subsampling_y)
+        if (src_top_edge < tile_top_edge + 4 * SCALE_PX_TO_MV) return 0;
+    }
+  }
+
+  // Is the bottom right within an already coded SB? Also consider additional
+  // constraints to facilitate HW decoder.
+  const int max_mib_size = 1 << mib_size_log2;
+  const int active_sb_row = mi_row >> mib_size_log2;
+  const int active_sb64_col = (mi_col * MI_SIZE) >> 6;
+  const int sb_size = max_mib_size * MI_SIZE;
+  const int src_sb_row = ((src_bottom_edge >> 3) - 1) / sb_size;
+  const int src_sb64_col = ((src_right_edge >> 3) - 1) >> 6;
+  const int total_sb64_per_row =
+      ((tile->mi_col_end - tile->mi_col_start - 1) >> 4) + 1;
+  const int active_sb64 = active_sb_row * total_sb64_per_row + active_sb64_col;
+  const int src_sb64 = src_sb_row * total_sb64_per_row + src_sb64_col;
+  if (src_sb64 >= active_sb64 - INTRABC_DELAY_SB64) return 0;
+
+  // Wavefront constraint: use only top left area of frame for reference.
+  const int gradient = 1 + INTRABC_DELAY_SB64 + (sb_size > 64);
+  const int wf_offset = gradient * (active_sb_row - src_sb_row);
+  if (src_sb_row > active_sb_row ||
+      src_sb64_col >= active_sb64_col - INTRABC_DELAY_SB64 + wf_offset)
+    return 0;
+
+  return 1;
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_MVREF_COMMON_H_

diff --git a/libav1/av1/common/obmc.h b/libav1/av1/common/obmc.h
new file mode 100644
index 0000000..1c90cd9
--- /dev/null
+++ b/libav1/av1/common/obmc.h

@@ -0,0 +1,91 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_OBMC_H_
+#define AOM_AV1_COMMON_OBMC_H_
+
+typedef void (*overlappable_nb_visitor_t)(MACROBLOCKD *xd, int rel_mi_pos,
+                                          uint8_t nb_mi_size,
+                                          MB_MODE_INFO *nb_mi, void *fun_ctxt,
+                                          const int num_planes);
+
+static INLINE void foreach_overlappable_nb_above(const AV1_COMMON *cm,
+                                                 MACROBLOCKD *xd, int mi_col,
+                                                 int nb_max,
+                                                 overlappable_nb_visitor_t fun,
+                                                 void *fun_ctxt) {
+  const int num_planes = av1_num_planes(cm);
+  if (!xd->up_available) return;
+
+  int nb_count = 0;
+
+  // prev_row_mi points into the mi array, starting at the beginning of the
+  // previous row.
+  MB_MODE_INFO **prev_row_mi = xd->mi - mi_col - 1 * xd->mi_stride;
+  const int end_col = AOMMIN(mi_col + xd->n4_w, cm->mi_cols);
+  uint8_t mi_step;
+  for (int above_mi_col = mi_col; above_mi_col < end_col && nb_count < nb_max;
+       above_mi_col += mi_step) {
+    MB_MODE_INFO **above_mi = prev_row_mi + above_mi_col;
+    mi_step =
+        AOMMIN(mi_size_wide[above_mi[0]->sb_type], mi_size_wide[BLOCK_64X64]);
+    // If we're considering a block with width 4, it should be treated as
+    // half of a pair of blocks with chroma information in the second. Move
+    // above_mi_col back to the start of the pair if needed, set above_mbmi
+    // to point at the block with chroma information, and set mi_step to 2 to
+    // step over the entire pair at the end of the iteration.
+    if (mi_step == 1) {
+      above_mi_col &= ~1;
+      above_mi = prev_row_mi + above_mi_col + 1;
+      mi_step = 2;
+    }
+    if (is_neighbor_overlappable(*above_mi)) {
+      ++nb_count;
+      fun(xd, above_mi_col - mi_col, AOMMIN(xd->n4_w, mi_step), *above_mi,
+          fun_ctxt, num_planes);
+    }
+  }
+}
+
+static INLINE void foreach_overlappable_nb_left(const AV1_COMMON *cm,
+                                                MACROBLOCKD *xd, int mi_row,
+                                                int nb_max,
+                                                overlappable_nb_visitor_t fun,
+                                                void *fun_ctxt) {
+  const int num_planes = av1_num_planes(cm);
+  if (!xd->left_available) return;
+
+  int nb_count = 0;
+
+  // prev_col_mi points into the mi array, starting at the top of the
+  // previous column
+  MB_MODE_INFO **prev_col_mi = xd->mi - 1 - mi_row * xd->mi_stride;
+  const int end_row = AOMMIN(mi_row + xd->n4_h, cm->mi_rows);
+  uint8_t mi_step;
+  for (int left_mi_row = mi_row; left_mi_row < end_row && nb_count < nb_max;
+       left_mi_row += mi_step) {
+    MB_MODE_INFO **left_mi = prev_col_mi + left_mi_row * xd->mi_stride;
+    mi_step =
+        AOMMIN(mi_size_high[left_mi[0]->sb_type], mi_size_high[BLOCK_64X64]);
+    if (mi_step == 1) {
+      left_mi_row &= ~1;
+      left_mi = prev_col_mi + (left_mi_row + 1) * xd->mi_stride;
+      mi_step = 2;
+    }
+    if (is_neighbor_overlappable(*left_mi)) {
+      ++nb_count;
+      fun(xd, left_mi_row - mi_row, AOMMIN(xd->n4_h, mi_step), *left_mi,
+          fun_ctxt, num_planes);
+    }
+  }
+}
+
+#endif  // AOM_AV1_COMMON_OBMC_H_

diff --git a/libav1/av1/common/obu_util.c b/libav1/av1/common/obu_util.c
new file mode 100644
index 0000000..7d2694b
--- /dev/null
+++ b/libav1/av1/common/obu_util.c

@@ -0,0 +1,154 @@
+/*
+ * Copyright (c) 2018, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#include <assert.h>
+
+#include "av1/common/obu_util.h"
+
+#include "aom_dsp/bitreader_buffer.h"
+
+// Returns 1 when OBU type is valid, and 0 otherwise.
+static int valid_obu_type(int obu_type) {
+  int valid_type = 0;
+  switch (obu_type) {
+    case OBU_SEQUENCE_HEADER:
+    case OBU_TEMPORAL_DELIMITER:
+    case OBU_FRAME_HEADER:
+    case OBU_TILE_GROUP:
+    case OBU_METADATA:
+    case OBU_FRAME:
+    case OBU_REDUNDANT_FRAME_HEADER:
+    case OBU_TILE_LIST:
+    case OBU_PADDING: valid_type = 1; break;
+    default: break;
+  }
+  return valid_type;
+}
+
+static aom_codec_err_t read_obu_size(const uint8_t *data,
+                                     size_t bytes_available,
+                                     size_t *const obu_size,
+                                     size_t *const length_field_size) {
+  uint64_t u_obu_size = 0;
+  if (aom_uleb_decode(data, bytes_available, &u_obu_size, length_field_size) !=
+      0) {
+    return AOM_CODEC_CORRUPT_FRAME;
+  }
+
+  if (u_obu_size > UINT32_MAX) return AOM_CODEC_CORRUPT_FRAME;
+  *obu_size = (size_t)u_obu_size;
+  return AOM_CODEC_OK;
+}
+
+// Parses OBU header and stores values in 'header'.
+static aom_codec_err_t read_obu_header(struct aom_read_bit_buffer *rb,
+                                       int is_annexb, ObuHeader *header) {
+  if (!rb || !header) return AOM_CODEC_INVALID_PARAM;
+
+  const ptrdiff_t bit_buffer_byte_length = rb->bit_buffer_end - rb->bit_buffer;
+  if (bit_buffer_byte_length < 1) return AOM_CODEC_CORRUPT_FRAME;
+
+  header->size = 1;
+
+  if (aom_rb_read_bit(rb) != 0) {
+    // Forbidden bit. Must not be set.
+    return AOM_CODEC_CORRUPT_FRAME;
+  }
+
+  header->type = (OBU_TYPE)aom_rb_read_literal(rb, 4);
+
+  if (!valid_obu_type(header->type)) return AOM_CODEC_CORRUPT_FRAME;
+
+  header->has_extension = aom_rb_read_bit(rb);
+  header->has_size_field = aom_rb_read_bit(rb);
+
+  if (!header->has_size_field && !is_annexb) {
+    // section 5 obu streams must have obu_size field set.
+    return AOM_CODEC_UNSUP_BITSTREAM;
+  }
+
+  if (aom_rb_read_bit(rb) != 0) {
+    // obu_reserved_1bit must be set to 0.
+    return AOM_CODEC_CORRUPT_FRAME;
+  }
+
+  if (header->has_extension) {
+    if (bit_buffer_byte_length == 1) return AOM_CODEC_CORRUPT_FRAME;
+
+    header->size += 1;
+    header->temporal_layer_id = aom_rb_read_literal(rb, 3);
+    header->spatial_layer_id = aom_rb_read_literal(rb, 2);
+    if (aom_rb_read_literal(rb, 3) != 0) {
+      // extension_header_reserved_3bits must be set to 0.
+      return AOM_CODEC_CORRUPT_FRAME;
+    }
+  }
+
+  return AOM_CODEC_OK;
+}
+
+aom_codec_err_t aom_read_obu_header(uint8_t *buffer, size_t buffer_length,
+                                    size_t *consumed, ObuHeader *header,
+                                    int is_annexb) {
+  if (buffer_length < 1 || !consumed || !header) return AOM_CODEC_INVALID_PARAM;
+
+  // TODO(tomfinegan): Set the error handler here and throughout this file, and
+  // confirm parsing work done via aom_read_bit_buffer is successful.
+  struct aom_read_bit_buffer rb = { buffer, buffer + buffer_length, 0, NULL,
+                                    NULL };
+  aom_codec_err_t parse_result = read_obu_header(&rb, is_annexb, header);
+  if (parse_result == AOM_CODEC_OK) *consumed = header->size;
+  return parse_result;
+}
+
+aom_codec_err_t aom_read_obu_header_and_size(const uint8_t *data,
+                                             size_t bytes_available,
+                                             int is_annexb,
+                                             ObuHeader *obu_header,
+                                             size_t *const payload_size,
+                                             size_t *const bytes_read) {
+  size_t length_field_size_obu = 0;
+  size_t length_field_size_payload = 0;
+  size_t obu_size = 0;
+  aom_codec_err_t status;
+
+  if (is_annexb) {
+    // Size field comes before the OBU header, and includes the OBU header
+    status =
+        read_obu_size(data, bytes_available, &obu_size, &length_field_size_obu);
+
+    if (status != AOM_CODEC_OK) return status;
+  }
+
+  struct aom_read_bit_buffer rb = { data + length_field_size_obu,
+                                    data + bytes_available, 0, NULL, NULL };
+
+  status = read_obu_header(&rb, is_annexb, obu_header);
+  if (status != AOM_CODEC_OK) return status;
+
+  if (!obu_header->has_size_field) {
+    assert(is_annexb);
+    // Derive the payload size from the data we've already read
+    if (obu_size < obu_header->size) return AOM_CODEC_CORRUPT_FRAME;
+
+    *payload_size = obu_size - obu_header->size;
+  } else {
+    // Size field comes after the OBU header, and is just the payload size
+    status = read_obu_size(
+        data + length_field_size_obu + obu_header->size,
+        bytes_available - length_field_size_obu - obu_header->size,
+        payload_size, &length_field_size_payload);
+    if (status != AOM_CODEC_OK) return status;
+  }
+
+  *bytes_read =
+      length_field_size_obu + obu_header->size + length_field_size_payload;
+  return AOM_CODEC_OK;
+}

diff --git a/libav1/av1/common/obu_util.h b/libav1/av1/common/obu_util.h
new file mode 100644
index 0000000..7c56904
--- /dev/null
+++ b/libav1/av1/common/obu_util.h

@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2018, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AV1_COMMON_OBU_UTIL_H_
+#define AOM_AV1_COMMON_OBU_UTIL_H_
+
+#include "aom/aom_codec.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct {
+  size_t size;  // Size (1 or 2 bytes) of the OBU header (including the
+                // optional OBU extension header) in the bitstream.
+  OBU_TYPE type;
+  int has_size_field;
+  int has_extension;
+  // The following fields come from the OBU extension header and therefore are
+  // only used if has_extension is true.
+  int temporal_layer_id;
+  int spatial_layer_id;
+} ObuHeader;
+
+aom_codec_err_t aom_read_obu_header(uint8_t *buffer, size_t buffer_length,
+                                    size_t *consumed, ObuHeader *header,
+                                    int is_annexb);
+
+aom_codec_err_t aom_read_obu_header_and_size(const uint8_t *data,
+                                             size_t bytes_available,
+                                             int is_annexb,
+                                             ObuHeader *obu_header,
+                                             size_t *const payload_size,
+                                             size_t *const bytes_read);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_OBU_UTIL_H_

diff --git a/libav1/av1/common/odintrin.c b/libav1/av1/common/odintrin.c
new file mode 100644
index 0000000..7584b2e
--- /dev/null
+++ b/libav1/av1/common/odintrin.c

@@ -0,0 +1,541 @@
+/*
+ * Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/* clang-format off */
+
+#include "av1/common/odintrin.h"
+
+/*Constants for use with OD_DIVU_SMALL().
+  See \cite{Rob05} for details on computing these constants.
+  @INPROCEEDINGS{Rob05,
+    author="Arch D. Robison",
+    title="{N}-bit Unsigned Division via {N}-bit Multiply-Add",
+    booktitle="Proc. of the 17th IEEE Symposium on Computer Arithmetic
+     (ARITH'05)",
+    pages="131--139",
+    address="Cape Cod, MA",
+    month=Jun,
+    year=2005
+  }*/
+uint32_t OD_DIVU_SMALL_CONSTS[OD_DIVU_DMAX][2] = {
+  { 0xFFFFFFFF, 0xFFFFFFFF }, { 0xFFFFFFFF, 0xFFFFFFFF },
+  { 0xAAAAAAAB, 0 },          { 0xFFFFFFFF, 0xFFFFFFFF },
+  { 0xCCCCCCCD, 0 },          { 0xAAAAAAAB, 0 },
+  { 0x92492492, 0x92492492 }, { 0xFFFFFFFF, 0xFFFFFFFF },
+  { 0xE38E38E4, 0 },          { 0xCCCCCCCD, 0 },
+  { 0xBA2E8BA3, 0 },          { 0xAAAAAAAB, 0 },
+  { 0x9D89D89E, 0 },          { 0x92492492, 0x92492492 },
+  { 0x88888889, 0 },          { 0xFFFFFFFF, 0xFFFFFFFF },
+  { 0xF0F0F0F1, 0 },          { 0xE38E38E4, 0 },
+  { 0xD79435E5, 0xD79435E5 }, { 0xCCCCCCCD, 0 },
+  { 0xC30C30C3, 0xC30C30C3 }, { 0xBA2E8BA3, 0 },
+  { 0xB21642C9, 0 },          { 0xAAAAAAAB, 0 },
+  { 0xA3D70A3E, 0 },          { 0x9D89D89E, 0 },
+  { 0x97B425ED, 0x97B425ED }, { 0x92492492, 0x92492492 },
+  { 0x8D3DCB09, 0 },          { 0x88888889, 0 },
+  { 0x84210842, 0x84210842 }, { 0xFFFFFFFF, 0xFFFFFFFF },
+  { 0xF83E0F84, 0 },          { 0xF0F0F0F1, 0 },
+  { 0xEA0EA0EA, 0xEA0EA0EA }, { 0xE38E38E4, 0 },
+  { 0xDD67C8A6, 0xDD67C8A6 }, { 0xD79435E5, 0xD79435E5 },
+  { 0xD20D20D2, 0xD20D20D2 }, { 0xCCCCCCCD, 0 },
+  { 0xC7CE0C7D, 0 },          { 0xC30C30C3, 0xC30C30C3 },
+  { 0xBE82FA0C, 0 },          { 0xBA2E8BA3, 0 },
+  { 0xB60B60B6, 0xB60B60B6 }, { 0xB21642C9, 0 },
+  { 0xAE4C415D, 0 },          { 0xAAAAAAAB, 0 },
+  { 0xA72F053A, 0 },          { 0xA3D70A3E, 0 },
+  { 0xA0A0A0A1, 0 },          { 0x9D89D89E, 0 },
+  { 0x9A90E7D9, 0x9A90E7D9 }, { 0x97B425ED, 0x97B425ED },
+  { 0x94F2094F, 0x94F2094F }, { 0x92492492, 0x92492492 },
+  { 0x8FB823EE, 0x8FB823EE }, { 0x8D3DCB09, 0 },
+  { 0x8AD8F2FC, 0 },          { 0x88888889, 0 },
+  { 0x864B8A7E, 0 },          { 0x84210842, 0x84210842 },
+  { 0x82082082, 0x82082082 }, { 0xFFFFFFFF, 0xFFFFFFFF },
+  { 0xFC0FC0FD, 0 },          { 0xF83E0F84, 0 },
+  { 0xF4898D60, 0 },          { 0xF0F0F0F1, 0 },
+  { 0xED7303B6, 0 },          { 0xEA0EA0EA, 0xEA0EA0EA },
+  { 0xE6C2B449, 0 },          { 0xE38E38E4, 0 },
+  { 0xE070381C, 0xE070381C }, { 0xDD67C8A6, 0xDD67C8A6 },
+  { 0xDA740DA8, 0 },          { 0xD79435E5, 0xD79435E5 },
+  { 0xD4C77B04, 0 },          { 0xD20D20D2, 0xD20D20D2 },
+  { 0xCF6474A9, 0 },          { 0xCCCCCCCD, 0 },
+  { 0xCA4587E7, 0 },          { 0xC7CE0C7D, 0 },
+  { 0xC565C87C, 0 },          { 0xC30C30C3, 0xC30C30C3 },
+  { 0xC0C0C0C1, 0 },          { 0xBE82FA0C, 0 },
+  { 0xBC52640C, 0 },          { 0xBA2E8BA3, 0 },
+  { 0xB81702E1, 0 },          { 0xB60B60B6, 0xB60B60B6 },
+  { 0xB40B40B4, 0xB40B40B4 }, { 0xB21642C9, 0 },
+  { 0xB02C0B03, 0 },          { 0xAE4C415D, 0 },
+  { 0xAC769184, 0xAC769184 }, { 0xAAAAAAAB, 0 },
+  { 0xA8E83F57, 0xA8E83F57 }, { 0xA72F053A, 0 },
+  { 0xA57EB503, 0 },          { 0xA3D70A3E, 0 },
+  { 0xA237C32B, 0xA237C32B }, { 0xA0A0A0A1, 0 },
+  { 0x9F1165E7, 0x9F1165E7 }, { 0x9D89D89E, 0 },
+  { 0x9C09C09C, 0x9C09C09C }, { 0x9A90E7D9, 0x9A90E7D9 },
+  { 0x991F1A51, 0x991F1A51 }, { 0x97B425ED, 0x97B425ED },
+  { 0x964FDA6C, 0x964FDA6C }, { 0x94F2094F, 0x94F2094F },
+  { 0x939A85C4, 0x939A85C4 }, { 0x92492492, 0x92492492 },
+  { 0x90FDBC09, 0x90FDBC09 }, { 0x8FB823EE, 0x8FB823EE },
+  { 0x8E78356D, 0x8E78356D }, { 0x8D3DCB09, 0 },
+  { 0x8C08C08C, 0x8C08C08C }, { 0x8AD8F2FC, 0 },
+  { 0x89AE408A, 0 },          { 0x88888889, 0 },
+  { 0x8767AB5F, 0x8767AB5F }, { 0x864B8A7E, 0 },
+  { 0x85340853, 0x85340853 }, { 0x84210842, 0x84210842 },
+  { 0x83126E98, 0 },          { 0x82082082, 0x82082082 },
+  { 0x81020408, 0x81020408 }, { 0xFFFFFFFF, 0xFFFFFFFF },
+  { 0xFE03F810, 0 },          { 0xFC0FC0FD, 0 },
+  { 0xFA232CF3, 0 },          { 0xF83E0F84, 0 },
+  { 0xF6603D99, 0 },          { 0xF4898D60, 0 },
+  { 0xF2B9D649, 0 },          { 0xF0F0F0F1, 0 },
+  { 0xEF2EB720, 0 },          { 0xED7303B6, 0 },
+  { 0xEBBDB2A6, 0 },          { 0xEA0EA0EA, 0xEA0EA0EA },
+  { 0xE865AC7C, 0 },          { 0xE6C2B449, 0 },
+  { 0xE525982B, 0 },          { 0xE38E38E4, 0 },
+  { 0xE1FC780F, 0 },          { 0xE070381C, 0xE070381C },
+  { 0xDEE95C4D, 0 },          { 0xDD67C8A6, 0xDD67C8A6 },
+  { 0xDBEB61EF, 0 },          { 0xDA740DA8, 0 },
+  { 0xD901B204, 0 },          { 0xD79435E5, 0xD79435E5 },
+  { 0xD62B80D7, 0 },          { 0xD4C77B04, 0 },
+  { 0xD3680D37, 0 },          { 0xD20D20D2, 0xD20D20D2 },
+  { 0xD0B69FCC, 0 },          { 0xCF6474A9, 0 },
+  { 0xCE168A77, 0xCE168A77 }, { 0xCCCCCCCD, 0 },
+  { 0xCB8727C1, 0 },          { 0xCA4587E7, 0 },
+  { 0xC907DA4F, 0 },          { 0xC7CE0C7D, 0 },
+  { 0xC6980C6A, 0 },          { 0xC565C87C, 0 },
+  { 0xC4372F86, 0 },          { 0xC30C30C3, 0xC30C30C3 },
+  { 0xC1E4BBD6, 0 },          { 0xC0C0C0C1, 0 },
+  { 0xBFA02FE8, 0xBFA02FE8 }, { 0xBE82FA0C, 0 },
+  { 0xBD691047, 0xBD691047 }, { 0xBC52640C, 0 },
+  { 0xBB3EE722, 0 },          { 0xBA2E8BA3, 0 },
+  { 0xB92143FA, 0xB92143FA }, { 0xB81702E1, 0 },
+  { 0xB70FBB5A, 0xB70FBB5A }, { 0xB60B60B6, 0xB60B60B6 },
+  { 0xB509E68B, 0 },          { 0xB40B40B4, 0xB40B40B4 },
+  { 0xB30F6353, 0 },          { 0xB21642C9, 0 },
+  { 0xB11FD3B8, 0xB11FD3B8 }, { 0xB02C0B03, 0 },
+  { 0xAF3ADDC7, 0 },          { 0xAE4C415D, 0 },
+  { 0xAD602B58, 0xAD602B58 }, { 0xAC769184, 0xAC769184 },
+  { 0xAB8F69E3, 0 },          { 0xAAAAAAAB, 0 },
+  { 0xA9C84A48, 0 },          { 0xA8E83F57, 0xA8E83F57 },
+  { 0xA80A80A8, 0xA80A80A8 }, { 0xA72F053A, 0 },
+  { 0xA655C439, 0xA655C439 }, { 0xA57EB503, 0 },
+  { 0xA4A9CF1E, 0 },          { 0xA3D70A3E, 0 },
+  { 0xA3065E40, 0 },          { 0xA237C32B, 0xA237C32B },
+  { 0xA16B312F, 0 },          { 0xA0A0A0A1, 0 },
+  { 0x9FD809FE, 0 },          { 0x9F1165E7, 0x9F1165E7 },
+  { 0x9E4CAD24, 0 },          { 0x9D89D89E, 0 },
+  { 0x9CC8E161, 0 },          { 0x9C09C09C, 0x9C09C09C },
+  { 0x9B4C6F9F, 0 },          { 0x9A90E7D9, 0x9A90E7D9 },
+  { 0x99D722DB, 0 },          { 0x991F1A51, 0x991F1A51 },
+  { 0x9868C80A, 0 },          { 0x97B425ED, 0x97B425ED },
+  { 0x97012E02, 0x97012E02 }, { 0x964FDA6C, 0x964FDA6C },
+  { 0x95A02568, 0x95A02568 }, { 0x94F2094F, 0x94F2094F },
+  { 0x94458094, 0x94458094 }, { 0x939A85C4, 0x939A85C4 },
+  { 0x92F11384, 0x92F11384 }, { 0x92492492, 0x92492492 },
+  { 0x91A2B3C5, 0 },          { 0x90FDBC09, 0x90FDBC09 },
+  { 0x905A3863, 0x905A3863 }, { 0x8FB823EE, 0x8FB823EE },
+  { 0x8F1779DA, 0 },          { 0x8E78356D, 0x8E78356D },
+  { 0x8DDA5202, 0x8DDA5202 }, { 0x8D3DCB09, 0 },
+  { 0x8CA29C04, 0x8CA29C04 }, { 0x8C08C08C, 0x8C08C08C },
+  { 0x8B70344A, 0x8B70344A }, { 0x8AD8F2FC, 0 },
+  { 0x8A42F870, 0x8A42F870 }, { 0x89AE408A, 0 },
+  { 0x891AC73B, 0 },          { 0x88888889, 0 },
+  { 0x87F78088, 0 },          { 0x8767AB5F, 0x8767AB5F },
+  { 0x86D90545, 0 },          { 0x864B8A7E, 0 },
+  { 0x85BF3761, 0x85BF3761 }, { 0x85340853, 0x85340853 },
+  { 0x84A9F9C8, 0x84A9F9C8 }, { 0x84210842, 0x84210842 },
+  { 0x83993052, 0x83993052 }, { 0x83126E98, 0 },
+  { 0x828CBFBF, 0 },          { 0x82082082, 0x82082082 },
+  { 0x81848DA9, 0 },          { 0x81020408, 0x81020408 },
+  { 0x80808081, 0 },          { 0xFFFFFFFF, 0xFFFFFFFF },
+  { 0xFF00FF01, 0 },          { 0xFE03F810, 0 },
+  { 0xFD08E551, 0 },          { 0xFC0FC0FD, 0 },
+  { 0xFB188566, 0 },          { 0xFA232CF3, 0 },
+  { 0xF92FB222, 0 },          { 0xF83E0F84, 0 },
+  { 0xF74E3FC3, 0 },          { 0xF6603D99, 0 },
+  { 0xF57403D6, 0 },          { 0xF4898D60, 0 },
+  { 0xF3A0D52D, 0 },          { 0xF2B9D649, 0 },
+  { 0xF1D48BCF, 0 },          { 0xF0F0F0F1, 0 },
+  { 0xF00F00F0, 0xF00F00F0 }, { 0xEF2EB720, 0 },
+  { 0xEE500EE5, 0xEE500EE5 }, { 0xED7303B6, 0 },
+  { 0xEC979119, 0 },          { 0xEBBDB2A6, 0 },
+  { 0xEAE56404, 0 },          { 0xEA0EA0EA, 0xEA0EA0EA },
+  { 0xE9396520, 0 },          { 0xE865AC7C, 0 },
+  { 0xE79372E3, 0 },          { 0xE6C2B449, 0 },
+  { 0xE5F36CB0, 0xE5F36CB0 }, { 0xE525982B, 0 },
+  { 0xE45932D8, 0 },          { 0xE38E38E4, 0 },
+  { 0xE2C4A689, 0 },          { 0xE1FC780F, 0 },
+  { 0xE135A9CA, 0 },          { 0xE070381C, 0xE070381C },
+  { 0xDFAC1F75, 0 },          { 0xDEE95C4D, 0 },
+  { 0xDE27EB2D, 0 },          { 0xDD67C8A6, 0xDD67C8A6 },
+  { 0xDCA8F159, 0 },          { 0xDBEB61EF, 0 },
+  { 0xDB2F171E, 0 },          { 0xDA740DA8, 0 },
+  { 0xD9BA4257, 0 },          { 0xD901B204, 0 },
+  { 0xD84A598F, 0 },          { 0xD79435E5, 0xD79435E5 },
+  { 0xD6DF43FD, 0 },          { 0xD62B80D7, 0 },
+  { 0xD578E97D, 0 },          { 0xD4C77B04, 0 },
+  { 0xD417328A, 0 },          { 0xD3680D37, 0 },
+  { 0xD2BA083C, 0 },          { 0xD20D20D2, 0xD20D20D2 },
+  { 0xD161543E, 0xD161543E }, { 0xD0B69FCC, 0 },
+  { 0xD00D00D0, 0xD00D00D0 }, { 0xCF6474A9, 0 },
+  { 0xCEBCF8BC, 0 },          { 0xCE168A77, 0xCE168A77 },
+  { 0xCD712753, 0 },          { 0xCCCCCCCD, 0 },
+  { 0xCC29786D, 0 },          { 0xCB8727C1, 0 },
+  { 0xCAE5D85F, 0xCAE5D85F }, { 0xCA4587E7, 0 },
+  { 0xC9A633FD, 0 },          { 0xC907DA4F, 0 },
+  { 0xC86A7890, 0xC86A7890 }, { 0xC7CE0C7D, 0 },
+  { 0xC73293D8, 0 },          { 0xC6980C6A, 0 },
+  { 0xC5FE7403, 0xC5FE7403 }, { 0xC565C87C, 0 },
+  { 0xC4CE07B0, 0xC4CE07B0 }, { 0xC4372F86, 0 },
+  { 0xC3A13DE6, 0xC3A13DE6 }, { 0xC30C30C3, 0xC30C30C3 },
+  { 0xC2780614, 0 },          { 0xC1E4BBD6, 0 },
+  { 0xC152500C, 0xC152500C }, { 0xC0C0C0C1, 0 },
+  { 0xC0300C03, 0xC0300C03 }, { 0xBFA02FE8, 0xBFA02FE8 },
+  { 0xBF112A8B, 0 },          { 0xBE82FA0C, 0 },
+  { 0xBDF59C92, 0 },          { 0xBD691047, 0xBD691047 },
+  { 0xBCDD535E, 0 },          { 0xBC52640C, 0 },
+  { 0xBBC8408D, 0 },          { 0xBB3EE722, 0 },
+  { 0xBAB65610, 0xBAB65610 }, { 0xBA2E8BA3, 0 },
+  { 0xB9A7862A, 0xB9A7862A }, { 0xB92143FA, 0xB92143FA },
+  { 0xB89BC36D, 0 },          { 0xB81702E1, 0 },
+  { 0xB79300B8, 0 },          { 0xB70FBB5A, 0xB70FBB5A },
+  { 0xB68D3134, 0xB68D3134 }, { 0xB60B60B6, 0xB60B60B6 },
+  { 0xB58A4855, 0xB58A4855 }, { 0xB509E68B, 0 },
+  { 0xB48A39D4, 0xB48A39D4 }, { 0xB40B40B4, 0xB40B40B4 },
+  { 0xB38CF9B0, 0xB38CF9B0 }, { 0xB30F6353, 0 },
+  { 0xB2927C2A, 0 },          { 0xB21642C9, 0 },
+  { 0xB19AB5C5, 0 },          { 0xB11FD3B8, 0xB11FD3B8 },
+  { 0xB0A59B42, 0 },          { 0xB02C0B03, 0 },
+  { 0xAFB321A1, 0xAFB321A1 }, { 0xAF3ADDC7, 0 },
+  { 0xAEC33E20, 0 },          { 0xAE4C415D, 0 },
+  { 0xADD5E632, 0xADD5E632 }, { 0xAD602B58, 0xAD602B58 },
+  { 0xACEB0F89, 0xACEB0F89 }, { 0xAC769184, 0xAC769184 },
+  { 0xAC02B00B, 0 },          { 0xAB8F69E3, 0 },
+  { 0xAB1CBDD4, 0 },          { 0xAAAAAAAB, 0 },
+  { 0xAA392F36, 0 },          { 0xA9C84A48, 0 },
+  { 0xA957FAB5, 0xA957FAB5 }, { 0xA8E83F57, 0xA8E83F57 },
+  { 0xA8791709, 0 },          { 0xA80A80A8, 0xA80A80A8 },
+  { 0xA79C7B17, 0 },          { 0xA72F053A, 0 },
+  { 0xA6C21DF7, 0 },          { 0xA655C439, 0xA655C439 },
+  { 0xA5E9F6ED, 0xA5E9F6ED }, { 0xA57EB503, 0 },
+  { 0xA513FD6C, 0 },          { 0xA4A9CF1E, 0 },
+  { 0xA4402910, 0xA4402910 }, { 0xA3D70A3E, 0 },
+  { 0xA36E71A3, 0 },          { 0xA3065E40, 0 },
+  { 0xA29ECF16, 0xA29ECF16 }, { 0xA237C32B, 0xA237C32B },
+  { 0xA1D13986, 0 },          { 0xA16B312F, 0 },
+  { 0xA105A933, 0 },          { 0xA0A0A0A1, 0 },
+  { 0xA03C1689, 0 },          { 0x9FD809FE, 0 },
+  { 0x9F747A15, 0x9F747A15 }, { 0x9F1165E7, 0x9F1165E7 },
+  { 0x9EAECC8D, 0x9EAECC8D }, { 0x9E4CAD24, 0 },
+  { 0x9DEB06C9, 0x9DEB06C9 }, { 0x9D89D89E, 0 },
+  { 0x9D2921C4, 0 },          { 0x9CC8E161, 0 },
+  { 0x9C69169B, 0x9C69169B }, { 0x9C09C09C, 0x9C09C09C },
+  { 0x9BAADE8E, 0x9BAADE8E }, { 0x9B4C6F9F, 0 },
+  { 0x9AEE72FD, 0 },          { 0x9A90E7D9, 0x9A90E7D9 },
+  { 0x9A33CD67, 0x9A33CD67 }, { 0x99D722DB, 0 },
+  { 0x997AE76B, 0x997AE76B }, { 0x991F1A51, 0x991F1A51 },
+  { 0x98C3BAC7, 0x98C3BAC7 }, { 0x9868C80A, 0 },
+  { 0x980E4156, 0x980E4156 }, { 0x97B425ED, 0x97B425ED },
+  { 0x975A7510, 0 },          { 0x97012E02, 0x97012E02 },
+  { 0x96A8500A, 0 },          { 0x964FDA6C, 0x964FDA6C },
+  { 0x95F7CC73, 0 },          { 0x95A02568, 0x95A02568 },
+  { 0x9548E498, 0 },          { 0x94F2094F, 0x94F2094F },
+  { 0x949B92DE, 0 },          { 0x94458094, 0x94458094 },
+  { 0x93EFD1C5, 0x93EFD1C5 }, { 0x939A85C4, 0x939A85C4 },
+  { 0x93459BE7, 0 },          { 0x92F11384, 0x92F11384 },
+  { 0x929CEBF5, 0 },          { 0x92492492, 0x92492492 },
+  { 0x91F5BCB9, 0 },          { 0x91A2B3C5, 0 },
+  { 0x91500915, 0x91500915 }, { 0x90FDBC09, 0x90FDBC09 },
+  { 0x90ABCC02, 0x90ABCC02 }, { 0x905A3863, 0x905A3863 },
+  { 0x90090090, 0x90090090 }, { 0x8FB823EE, 0x8FB823EE },
+  { 0x8F67A1E4, 0 },          { 0x8F1779DA, 0 },
+  { 0x8EC7AB3A, 0 },          { 0x8E78356D, 0x8E78356D },
+  { 0x8E2917E1, 0 },          { 0x8DDA5202, 0x8DDA5202 },
+  { 0x8D8BE340, 0 },          { 0x8D3DCB09, 0 },
+  { 0x8CF008CF, 0x8CF008CF }, { 0x8CA29C04, 0x8CA29C04 },
+  { 0x8C55841D, 0 },          { 0x8C08C08C, 0x8C08C08C },
+  { 0x8BBC50C9, 0 },          { 0x8B70344A, 0x8B70344A },
+  { 0x8B246A88, 0 },          { 0x8AD8F2FC, 0 },
+  { 0x8A8DCD20, 0 },          { 0x8A42F870, 0x8A42F870 },
+  { 0x89F8746A, 0 },          { 0x89AE408A, 0 },
+  { 0x89645C4F, 0x89645C4F }, { 0x891AC73B, 0 },
+  { 0x88D180CD, 0x88D180CD }, { 0x88888889, 0 },
+  { 0x883FDDF0, 0x883FDDF0 }, { 0x87F78088, 0 },
+  { 0x87AF6FD6, 0 },          { 0x8767AB5F, 0x8767AB5F },
+  { 0x872032AC, 0x872032AC }, { 0x86D90545, 0 },
+  { 0x869222B2, 0 },          { 0x864B8A7E, 0 },
+  { 0x86053C34, 0x86053C34 }, { 0x85BF3761, 0x85BF3761 },
+  { 0x85797B91, 0x85797B91 }, { 0x85340853, 0x85340853 },
+  { 0x84EEDD36, 0 },          { 0x84A9F9C8, 0x84A9F9C8 },
+  { 0x84655D9C, 0 },          { 0x84210842, 0x84210842 },
+  { 0x83DCF94E, 0 },          { 0x83993052, 0x83993052 },
+  { 0x8355ACE4, 0 },          { 0x83126E98, 0 },
+  { 0x82CF7504, 0 },          { 0x828CBFBF, 0 },
+  { 0x824A4E61, 0 },          { 0x82082082, 0x82082082 },
+  { 0x81C635BC, 0x81C635BC }, { 0x81848DA9, 0 },
+  { 0x814327E4, 0 },          { 0x81020408, 0x81020408 },
+  { 0x80C121B3, 0 },          { 0x80808081, 0 },
+  { 0x80402010, 0x80402010 }, { 0xFFFFFFFF, 0xFFFFFFFF },
+  { 0xFF803FE1, 0 },          { 0xFF00FF01, 0 },
+  { 0xFE823CA6, 0 },          { 0xFE03F810, 0 },
+  { 0xFD863087, 0 },          { 0xFD08E551, 0 },
+  { 0xFC8C15B5, 0 },          { 0xFC0FC0FD, 0 },
+  { 0xFB93E673, 0 },          { 0xFB188566, 0 },
+  { 0xFA9D9D20, 0 },          { 0xFA232CF3, 0 },
+  { 0xF9A9342D, 0 },          { 0xF92FB222, 0 },
+  { 0xF8B6A622, 0xF8B6A622 }, { 0xF83E0F84, 0 },
+  { 0xF7C5ED9D, 0 },          { 0xF74E3FC3, 0 },
+  { 0xF6D7054E, 0 },          { 0xF6603D99, 0 },
+  { 0xF5E9E7FD, 0 },          { 0xF57403D6, 0 },
+  { 0xF4FE9083, 0 },          { 0xF4898D60, 0 },
+  { 0xF414F9CE, 0 },          { 0xF3A0D52D, 0 },
+  { 0xF32D1EE0, 0 },          { 0xF2B9D649, 0 },
+  { 0xF246FACC, 0 },          { 0xF1D48BCF, 0 },
+  { 0xF16288B9, 0 },          { 0xF0F0F0F1, 0 },
+  { 0xF07FC3E0, 0xF07FC3E0 }, { 0xF00F00F0, 0xF00F00F0 },
+  { 0xEF9EA78C, 0 },          { 0xEF2EB720, 0 },
+  { 0xEEBF2F19, 0 },          { 0xEE500EE5, 0xEE500EE5 },
+  { 0xEDE155F4, 0 },          { 0xED7303B6, 0 },
+  { 0xED05179C, 0xED05179C }, { 0xEC979119, 0 },
+  { 0xEC2A6FA0, 0xEC2A6FA0 }, { 0xEBBDB2A6, 0 },
+  { 0xEB5159A0, 0 },          { 0xEAE56404, 0 },
+  { 0xEA79D14A, 0 },          { 0xEA0EA0EA, 0xEA0EA0EA },
+  { 0xE9A3D25E, 0xE9A3D25E }, { 0xE9396520, 0 },
+  { 0xE8CF58AB, 0 },          { 0xE865AC7C, 0 },
+  { 0xE7FC600F, 0 },          { 0xE79372E3, 0 },
+  { 0xE72AE476, 0 },          { 0xE6C2B449, 0 },
+  { 0xE65AE1DC, 0 },          { 0xE5F36CB0, 0xE5F36CB0 },
+  { 0xE58C544A, 0 },          { 0xE525982B, 0 },
+  { 0xE4BF37D9, 0 },          { 0xE45932D8, 0 },
+  { 0xE3F388AF, 0 },          { 0xE38E38E4, 0 },
+  { 0xE32942FF, 0 },          { 0xE2C4A689, 0 },
+  { 0xE260630B, 0 },          { 0xE1FC780F, 0 },
+  { 0xE198E520, 0 },          { 0xE135A9CA, 0 },
+  { 0xE0D2C59A, 0 },          { 0xE070381C, 0xE070381C },
+  { 0xE00E00E0, 0xE00E00E0 }, { 0xDFAC1F75, 0 },
+  { 0xDF4A9369, 0 },          { 0xDEE95C4D, 0 },
+  { 0xDE8879B3, 0 },          { 0xDE27EB2D, 0 },
+  { 0xDDC7B04D, 0 },          { 0xDD67C8A6, 0xDD67C8A6 },
+  { 0xDD0833CE, 0 },          { 0xDCA8F159, 0 },
+  { 0xDC4A00DD, 0 },          { 0xDBEB61EF, 0 },
+  { 0xDB8D1428, 0 },          { 0xDB2F171E, 0 },
+  { 0xDAD16A6B, 0 },          { 0xDA740DA8, 0 },
+  { 0xDA17006D, 0xDA17006D }, { 0xD9BA4257, 0 },
+  { 0xD95DD300, 0 },          { 0xD901B204, 0 },
+  { 0xD8A5DEFF, 0 },          { 0xD84A598F, 0 },
+  { 0xD7EF2152, 0 },          { 0xD79435E5, 0xD79435E5 },
+  { 0xD73996E9, 0 },          { 0xD6DF43FD, 0 },
+  { 0xD6853CC1, 0 },          { 0xD62B80D7, 0 },
+  { 0xD5D20FDF, 0 },          { 0xD578E97D, 0 },
+  { 0xD5200D52, 0xD5200D52 }, { 0xD4C77B04, 0 },
+  { 0xD46F3235, 0 },          { 0xD417328A, 0 },
+  { 0xD3BF7BA9, 0 },          { 0xD3680D37, 0 },
+  { 0xD310E6DB, 0 },          { 0xD2BA083C, 0 },
+  { 0xD2637101, 0 },          { 0xD20D20D2, 0xD20D20D2 },
+  { 0xD1B71759, 0 },          { 0xD161543E, 0xD161543E },
+  { 0xD10BD72C, 0 },          { 0xD0B69FCC, 0 },
+  { 0xD061ADCA, 0 },          { 0xD00D00D0, 0xD00D00D0 },
+  { 0xCFB8988C, 0 },          { 0xCF6474A9, 0 },
+  { 0xCF1094D4, 0 },          { 0xCEBCF8BC, 0 },
+  { 0xCE69A00D, 0 },          { 0xCE168A77, 0xCE168A77 },
+  { 0xCDC3B7A9, 0xCDC3B7A9 }, { 0xCD712753, 0 },
+  { 0xCD1ED924, 0 },          { 0xCCCCCCCD, 0 },
+  { 0xCC7B0200, 0 },          { 0xCC29786D, 0 },
+  { 0xCBD82FC7, 0 },          { 0xCB8727C1, 0 },
+  { 0xCB36600D, 0 },          { 0xCAE5D85F, 0xCAE5D85F },
+  { 0xCA95906C, 0 },          { 0xCA4587E7, 0 },
+  { 0xC9F5BE86, 0 },          { 0xC9A633FD, 0 },
+  { 0xC956E803, 0xC956E803 }, { 0xC907DA4F, 0 },
+  { 0xC8B90A96, 0 },          { 0xC86A7890, 0xC86A7890 },
+  { 0xC81C23F5, 0xC81C23F5 }, { 0xC7CE0C7D, 0 },
+  { 0xC78031E0, 0xC78031E0 }, { 0xC73293D8, 0 },
+  { 0xC6E5321D, 0 },          { 0xC6980C6A, 0 },
+  { 0xC64B2278, 0xC64B2278 }, { 0xC5FE7403, 0xC5FE7403 },
+  { 0xC5B200C6, 0 },          { 0xC565C87C, 0 },
+  { 0xC519CAE0, 0xC519CAE0 }, { 0xC4CE07B0, 0xC4CE07B0 },
+  { 0xC4827EA8, 0xC4827EA8 }, { 0xC4372F86, 0 },
+  { 0xC3EC1A06, 0 },          { 0xC3A13DE6, 0xC3A13DE6 },
+  { 0xC3569AE6, 0 },          { 0xC30C30C3, 0xC30C30C3 },
+  { 0xC2C1FF3E, 0 },          { 0xC2780614, 0 },
+  { 0xC22E4507, 0 },          { 0xC1E4BBD6, 0 },
+  { 0xC19B6A42, 0 },          { 0xC152500C, 0xC152500C },
+  { 0xC1096CF6, 0 },          { 0xC0C0C0C1, 0 },
+  { 0xC0784B2F, 0 },          { 0xC0300C03, 0xC0300C03 },
+  { 0xBFE80300, 0 },          { 0xBFA02FE8, 0xBFA02FE8 },
+  { 0xBF589280, 0 },          { 0xBF112A8B, 0 },
+  { 0xBEC9F7CE, 0 },          { 0xBE82FA0C, 0 },
+  { 0xBE3C310C, 0 },          { 0xBDF59C92, 0 },
+  { 0xBDAF3C64, 0 },          { 0xBD691047, 0xBD691047 },
+  { 0xBD231803, 0 },          { 0xBCDD535E, 0 },
+  { 0xBC97C21E, 0xBC97C21E }, { 0xBC52640C, 0 },
+  { 0xBC0D38EE, 0xBC0D38EE }, { 0xBBC8408D, 0 },
+  { 0xBB837AB1, 0 },          { 0xBB3EE722, 0 },
+  { 0xBAFA85A9, 0xBAFA85A9 }, { 0xBAB65610, 0xBAB65610 },
+  { 0xBA725820, 0xBA725820 }, { 0xBA2E8BA3, 0 },
+  { 0xB9EAF063, 0 },          { 0xB9A7862A, 0xB9A7862A },
+  { 0xB9644CC4, 0 },          { 0xB92143FA, 0xB92143FA },
+  { 0xB8DE6B9A, 0 },          { 0xB89BC36D, 0 },
+  { 0xB8594B41, 0 },          { 0xB81702E1, 0 },
+  { 0xB7D4EA19, 0xB7D4EA19 }, { 0xB79300B8, 0 },
+  { 0xB7514689, 0 },          { 0xB70FBB5A, 0xB70FBB5A },
+  { 0xB6CE5EF9, 0xB6CE5EF9 }, { 0xB68D3134, 0xB68D3134 },
+  { 0xB64C31D9, 0 },          { 0xB60B60B6, 0xB60B60B6 },
+  { 0xB5CABD9B, 0 },          { 0xB58A4855, 0xB58A4855 },
+  { 0xB54A00B5, 0xB54A00B5 }, { 0xB509E68B, 0 },
+  { 0xB4C9F9A5, 0 },          { 0xB48A39D4, 0xB48A39D4 },
+  { 0xB44AA6E9, 0xB44AA6E9 }, { 0xB40B40B4, 0xB40B40B4 },
+  { 0xB3CC0706, 0 },          { 0xB38CF9B0, 0xB38CF9B0 },
+  { 0xB34E1884, 0 },          { 0xB30F6353, 0 },
+  { 0xB2D0D9EF, 0 },          { 0xB2927C2A, 0 },
+  { 0xB25449D7, 0 },          { 0xB21642C9, 0 },
+  { 0xB1D866D1, 0xB1D866D1 }, { 0xB19AB5C5, 0 },
+  { 0xB15D2F76, 0 },          { 0xB11FD3B8, 0xB11FD3B8 },
+  { 0xB0E2A260, 0xB0E2A260 }, { 0xB0A59B42, 0 },
+  { 0xB068BE31, 0 },          { 0xB02C0B03, 0 },
+  { 0xAFEF818C, 0 },          { 0xAFB321A1, 0xAFB321A1 },
+  { 0xAF76EB19, 0 },          { 0xAF3ADDC7, 0 },
+  { 0xAEFEF982, 0 },          { 0xAEC33E20, 0 },
+  { 0xAE87AB76, 0xAE87AB76 }, { 0xAE4C415D, 0 },
+  { 0xAE10FFA9, 0 },          { 0xADD5E632, 0xADD5E632 },
+  { 0xAD9AF4D0, 0 },          { 0xAD602B58, 0xAD602B58 },
+  { 0xAD2589A4, 0 },          { 0xACEB0F89, 0xACEB0F89 },
+  { 0xACB0BCE1, 0xACB0BCE1 }, { 0xAC769184, 0xAC769184 },
+  { 0xAC3C8D4A, 0 },          { 0xAC02B00B, 0 },
+  { 0xABC8F9A0, 0xABC8F9A0 }, { 0xAB8F69E3, 0 },
+  { 0xAB5600AC, 0 },          { 0xAB1CBDD4, 0 },
+  { 0xAAE3A136, 0 },          { 0xAAAAAAAB, 0 },
+  { 0xAA71DA0D, 0 },          { 0xAA392F36, 0 },
+  { 0xAA00AA01, 0 },          { 0xA9C84A48, 0 },
+  { 0xA9900FE6, 0 },          { 0xA957FAB5, 0xA957FAB5 },
+  { 0xA9200A92, 0xA9200A92 }, { 0xA8E83F57, 0xA8E83F57 },
+  { 0xA8B098E0, 0xA8B098E0 }, { 0xA8791709, 0 },
+  { 0xA841B9AD, 0 },          { 0xA80A80A8, 0xA80A80A8 },
+  { 0xA7D36BD8, 0 },          { 0xA79C7B17, 0 },
+  { 0xA765AE44, 0 },          { 0xA72F053A, 0 },
+  { 0xA6F87FD6, 0xA6F87FD6 }, { 0xA6C21DF7, 0 },
+  { 0xA68BDF79, 0 },          { 0xA655C439, 0xA655C439 },
+  { 0xA61FCC16, 0xA61FCC16 }, { 0xA5E9F6ED, 0xA5E9F6ED },
+  { 0xA5B4449D, 0 },          { 0xA57EB503, 0 },
+  { 0xA54947FE, 0 },          { 0xA513FD6C, 0 },
+  { 0xA4DED52C, 0xA4DED52C }, { 0xA4A9CF1E, 0 },
+  { 0xA474EB1F, 0xA474EB1F }, { 0xA4402910, 0xA4402910 },
+  { 0xA40B88D0, 0 },          { 0xA3D70A3E, 0 },
+  { 0xA3A2AD39, 0xA3A2AD39 }, { 0xA36E71A3, 0 },
+  { 0xA33A575A, 0xA33A575A }, { 0xA3065E40, 0 },
+  { 0xA2D28634, 0 },          { 0xA29ECF16, 0xA29ECF16 },
+  { 0xA26B38C9, 0 },          { 0xA237C32B, 0xA237C32B },
+  { 0xA2046E1F, 0xA2046E1F }, { 0xA1D13986, 0 },
+  { 0xA19E2540, 0 },          { 0xA16B312F, 0 },
+  { 0xA1385D35, 0 },          { 0xA105A933, 0 },
+  { 0xA0D3150C, 0 },          { 0xA0A0A0A1, 0 },
+  { 0xA06E4BD4, 0xA06E4BD4 }, { 0xA03C1689, 0 },
+  { 0xA00A00A0, 0xA00A00A0 }, { 0x9FD809FE, 0 },
+  { 0x9FA63284, 0 },          { 0x9F747A15, 0x9F747A15 },
+  { 0x9F42E095, 0x9F42E095 }, { 0x9F1165E7, 0x9F1165E7 },
+  { 0x9EE009EE, 0x9EE009EE }, { 0x9EAECC8D, 0x9EAECC8D },
+  { 0x9E7DADA9, 0 },          { 0x9E4CAD24, 0 },
+  { 0x9E1BCAE3, 0 },          { 0x9DEB06C9, 0x9DEB06C9 },
+  { 0x9DBA60BB, 0x9DBA60BB }, { 0x9D89D89E, 0 },
+  { 0x9D596E54, 0x9D596E54 }, { 0x9D2921C4, 0 },
+  { 0x9CF8F2D1, 0x9CF8F2D1 }, { 0x9CC8E161, 0 },
+  { 0x9C98ED58, 0 },          { 0x9C69169B, 0x9C69169B },
+  { 0x9C395D10, 0x9C395D10 }, { 0x9C09C09C, 0x9C09C09C },
+  { 0x9BDA4124, 0x9BDA4124 }, { 0x9BAADE8E, 0x9BAADE8E },
+  { 0x9B7B98C0, 0 },          { 0x9B4C6F9F, 0 },
+  { 0x9B1D6311, 0x9B1D6311 }, { 0x9AEE72FD, 0 },
+  { 0x9ABF9F48, 0x9ABF9F48 }, { 0x9A90E7D9, 0x9A90E7D9 },
+  { 0x9A624C97, 0 },          { 0x9A33CD67, 0x9A33CD67 },
+  { 0x9A056A31, 0 },          { 0x99D722DB, 0 },
+  { 0x99A8F74C, 0 },          { 0x997AE76B, 0x997AE76B },
+  { 0x994CF320, 0x994CF320 }, { 0x991F1A51, 0x991F1A51 },
+  { 0x98F15CE7, 0 },          { 0x98C3BAC7, 0x98C3BAC7 },
+  { 0x989633DB, 0x989633DB }, { 0x9868C80A, 0 },
+  { 0x983B773B, 0 },          { 0x980E4156, 0x980E4156 },
+  { 0x97E12644, 0x97E12644 }, { 0x97B425ED, 0x97B425ED },
+  { 0x97874039, 0 },          { 0x975A7510, 0 },
+  { 0x972DC45B, 0 },          { 0x97012E02, 0x97012E02 },
+  { 0x96D4B1EF, 0 },          { 0x96A8500A, 0 },
+  { 0x967C083B, 0 },          { 0x964FDA6C, 0x964FDA6C },
+  { 0x9623C686, 0x9623C686 }, { 0x95F7CC73, 0 },
+  { 0x95CBEC1B, 0 },          { 0x95A02568, 0x95A02568 },
+  { 0x95747844, 0 },          { 0x9548E498, 0 },
+  { 0x951D6A4E, 0 },          { 0x94F2094F, 0x94F2094F },
+  { 0x94C6C187, 0 },          { 0x949B92DE, 0 },
+  { 0x94707D3F, 0 },          { 0x94458094, 0x94458094 },
+  { 0x941A9CC8, 0x941A9CC8 }, { 0x93EFD1C5, 0x93EFD1C5 },
+  { 0x93C51F76, 0 },          { 0x939A85C4, 0x939A85C4 },
+  { 0x9370049C, 0 },          { 0x93459BE7, 0 },
+  { 0x931B4B91, 0 },          { 0x92F11384, 0x92F11384 },
+  { 0x92C6F3AC, 0x92C6F3AC }, { 0x929CEBF5, 0 },
+  { 0x9272FC48, 0x9272FC48 }, { 0x92492492, 0x92492492 },
+  { 0x921F64BF, 0 },          { 0x91F5BCB9, 0 },
+  { 0x91CC2C6C, 0x91CC2C6C }, { 0x91A2B3C5, 0 },
+  { 0x917952AF, 0 },          { 0x91500915, 0x91500915 },
+  { 0x9126D6E5, 0 },          { 0x90FDBC09, 0x90FDBC09 },
+  { 0x90D4B86F, 0 },          { 0x90ABCC02, 0x90ABCC02 },
+  { 0x9082F6B0, 0 },          { 0x905A3863, 0x905A3863 },
+  { 0x9031910A, 0 },          { 0x90090090, 0x90090090 },
+  { 0x8FE086E3, 0 },          { 0x8FB823EE, 0x8FB823EE },
+  { 0x8F8FD7A0, 0 },          { 0x8F67A1E4, 0 },
+  { 0x8F3F82A8, 0x8F3F82A8 }, { 0x8F1779DA, 0 },
+  { 0x8EEF8766, 0 },          { 0x8EC7AB3A, 0 },
+  { 0x8E9FE542, 0x8E9FE542 }, { 0x8E78356D, 0x8E78356D },
+  { 0x8E509BA8, 0x8E509BA8 }, { 0x8E2917E1, 0 },
+  { 0x8E01AA05, 0 },          { 0x8DDA5202, 0x8DDA5202 },
+  { 0x8DB30FC6, 0x8DB30FC6 }, { 0x8D8BE340, 0 },
+  { 0x8D64CC5C, 0 },          { 0x8D3DCB09, 0 },
+  { 0x8D16DF35, 0x8D16DF35 }, { 0x8CF008CF, 0x8CF008CF },
+  { 0x8CC947C5, 0 },          { 0x8CA29C04, 0x8CA29C04 },
+  { 0x8C7C057D, 0 },          { 0x8C55841D, 0 },
+  { 0x8C2F17D2, 0x8C2F17D2 }, { 0x8C08C08C, 0x8C08C08C },
+  { 0x8BE27E39, 0x8BE27E39 }, { 0x8BBC50C9, 0 },
+  { 0x8B963829, 0x8B963829 }, { 0x8B70344A, 0x8B70344A },
+  { 0x8B4A451A, 0 },          { 0x8B246A88, 0 },
+  { 0x8AFEA483, 0x8AFEA483 }, { 0x8AD8F2FC, 0 },
+  { 0x8AB355E0, 0x8AB355E0 }, { 0x8A8DCD20, 0 },
+  { 0x8A6858AB, 0 },          { 0x8A42F870, 0x8A42F870 },
+  { 0x8A1DAC60, 0x8A1DAC60 }, { 0x89F8746A, 0 },
+  { 0x89D3507D, 0 },          { 0x89AE408A, 0 },
+  { 0x89894480, 0 },          { 0x89645C4F, 0x89645C4F },
+  { 0x893F87E8, 0x893F87E8 }, { 0x891AC73B, 0 },
+  { 0x88F61A37, 0x88F61A37 }, { 0x88D180CD, 0x88D180CD },
+  { 0x88ACFAEE, 0 },          { 0x88888889, 0 },
+  { 0x8864298F, 0 },          { 0x883FDDF0, 0x883FDDF0 },
+  { 0x881BA59E, 0 },          { 0x87F78088, 0 },
+  { 0x87D36EA0, 0 },          { 0x87AF6FD6, 0 },
+  { 0x878B841B, 0 },          { 0x8767AB5F, 0x8767AB5F },
+  { 0x8743E595, 0 },          { 0x872032AC, 0x872032AC },
+  { 0x86FC9296, 0x86FC9296 }, { 0x86D90545, 0 },
+  { 0x86B58AA8, 0 },          { 0x869222B2, 0 },
+  { 0x866ECD53, 0x866ECD53 }, { 0x864B8A7E, 0 },
+  { 0x86285A23, 0x86285A23 }, { 0x86053C34, 0x86053C34 },
+  { 0x85E230A3, 0x85E230A3 }, { 0x85BF3761, 0x85BF3761 },
+  { 0x859C5060, 0x859C5060 }, { 0x85797B91, 0x85797B91 },
+  { 0x8556B8E7, 0x8556B8E7 }, { 0x85340853, 0x85340853 },
+  { 0x851169C7, 0x851169C7 }, { 0x84EEDD36, 0 },
+  { 0x84CC6290, 0 },          { 0x84A9F9C8, 0x84A9F9C8 },
+  { 0x8487A2D1, 0 },          { 0x84655D9C, 0 },
+  { 0x84432A1B, 0x84432A1B }, { 0x84210842, 0x84210842 },
+  { 0x83FEF802, 0x83FEF802 }, { 0x83DCF94E, 0 },
+  { 0x83BB0C18, 0 },          { 0x83993052, 0x83993052 },
+  { 0x837765F0, 0x837765F0 }, { 0x8355ACE4, 0 },
+  { 0x83340520, 0x83340520 }, { 0x83126E98, 0 },
+  { 0x82F0E93D, 0x82F0E93D }, { 0x82CF7504, 0 },
+  { 0x82AE11DE, 0 },          { 0x828CBFBF, 0 },
+  { 0x826B7E99, 0x826B7E99 }, { 0x824A4E61, 0 },
+  { 0x82292F08, 0 },          { 0x82082082, 0x82082082 },
+  { 0x81E722C2, 0x81E722C2 }, { 0x81C635BC, 0x81C635BC },
+  { 0x81A55963, 0 },          { 0x81848DA9, 0 },
+  { 0x8163D283, 0 },          { 0x814327E4, 0 },
+  { 0x81228DBF, 0 },          { 0x81020408, 0x81020408 },
+  { 0x80E18AB3, 0 },          { 0x80C121B3, 0 },
+  { 0x80A0C8FB, 0x80A0C8FB }, { 0x80808081, 0 },
+  { 0x80604836, 0x80604836 }, { 0x80402010, 0x80402010 },
+  { 0x80200802, 0x80200802 }, { 0xFFFFFFFF, 0xFFFFFFFF }
+};

diff --git a/libav1/av1/common/odintrin.h b/libav1/av1/common/odintrin.h
new file mode 100644
index 0000000..e1db0f4
--- /dev/null
+++ b/libav1/av1/common/odintrin.h

@@ -0,0 +1,96 @@
+/*
+ * Copyright (c) 2001-2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+/* clang-format off */
+
+#ifndef AOM_AV1_COMMON_ODINTRIN_H_
+#define AOM_AV1_COMMON_ODINTRIN_H_
+
+#include <stdlib.h>
+#include <string.h>
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_ports/bitops.h"
+#include "av1/common/enums.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef int od_coeff;
+
+#define OD_DIVU_DMAX (1024)
+
+extern uint32_t OD_DIVU_SMALL_CONSTS[OD_DIVU_DMAX][2];
+
+/*Translate unsigned division by small divisors into multiplications.*/
+#define OD_DIVU_SMALL(_x, _d)                                     \
+  ((uint32_t)((OD_DIVU_SMALL_CONSTS[(_d)-1][0] * (uint64_t)(_x) + \
+               OD_DIVU_SMALL_CONSTS[(_d)-1][1]) >>                \
+              32) >>                                              \
+   (OD_ILOG_NZ(_d) - 1))
+
+#define OD_DIVU(_x, _d) \
+  (((_d) < OD_DIVU_DMAX) ? (OD_DIVU_SMALL((_x), (_d))) : ((_x) / (_d)))
+
+#define OD_MINI AOMMIN
+#define OD_MAXI AOMMAX
+#define OD_CLAMPI(min, val, max) (OD_MAXI(min, OD_MINI(val, max)))
+
+/*Integer logarithm (base 2) of a nonzero unsigned 32-bit integer.
+  OD_ILOG_NZ(x) = (int)floor(log2(x)) + 1.*/
+#define OD_ILOG_NZ(x) (1 + get_msb(x))
+
+/*Enable special features for gcc and compatible compilers.*/
+#if defined(__GNUC__) && defined(__GNUC_MINOR__) && defined(__GNUC_PATCHLEVEL__)
+#define OD_GNUC_PREREQ(maj, min, pat)                                \
+  ((__GNUC__ << 16) + (__GNUC_MINOR__ << 8) + __GNUC_PATCHLEVEL__ >= \
+   ((maj) << 16) + ((min) << 8) + pat)  // NOLINT
+#else
+#define OD_GNUC_PREREQ(maj, min, pat) (0)
+#endif
+
+#if OD_GNUC_PREREQ(3, 4, 0)
+#define OD_WARN_UNUSED_RESULT __attribute__((__warn_unused_result__))
+#else
+#define OD_WARN_UNUSED_RESULT
+#endif
+
+#if OD_GNUC_PREREQ(3, 4, 0)
+#define OD_ARG_NONNULL(x) __attribute__((__nonnull__(x)))
+#else
+#define OD_ARG_NONNULL(x)
+#endif
+
+/** Copy n elements of memory from src to dst. The 0* term provides
+    compile-time type checking  */
+#if !defined(OVERRIDE_OD_COPY)
+#define OD_COPY(dst, src, n) \
+  (memcpy((dst), (src), sizeof(*(dst)) * (n) + 0 * ((dst) - (src))))
+#endif
+
+/** Copy n elements of memory from src to dst, allowing overlapping regions.
+    The 0* term provides compile-time type checking */
+#if !defined(OVERRIDE_OD_MOVE)
+# define OD_MOVE(dst, src, n) \
+ (memmove((dst), (src), sizeof(*(dst))*(n) + 0*((dst) - (src)) ))
+#endif
+
+/*All of these macros should expect floats as arguments.*/
+# define OD_SIGNMASK(a) (-((a) < 0))
+# define OD_FLIPSIGNI(a, b) (((a) + OD_SIGNMASK(b)) ^ OD_SIGNMASK(b))
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_ODINTRIN_H_

diff --git a/libav1/av1/common/onyxc_int.h b/libav1/av1/common/onyxc_int.h
new file mode 100644
index 0000000..f6a3f84
--- /dev/null
+++ b/libav1/av1/common/onyxc_int.h

@@ -0,0 +1,1394 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_ONYXC_INT_H_
+#define AOM_AV1_COMMON_ONYXC_INT_H_
+
+#include "config/aom_config.h"
+#include "config/av1_rtcd.h"
+
+#include "aom/internal/aom_codec_internal.h"
+#include "aom_util/aom_thread.h"
+#include "av1/common/alloccommon.h"
+#include "av1/common/av1_loopfilter.h"
+#include "av1/common/entropy.h"
+#include "av1/common/entropymode.h"
+#include "av1/common/entropymv.h"
+#include "av1/common/enums.h"
+#include "av1/common/frame_buffers.h"
+#include "av1/common/mv.h"
+#include "av1/common/quant_common.h"
+#include "av1/common/restoration.h"
+#include "av1/common/tile_common.h"
+#include "av1/common/timing.h"
+#include "av1/common/odintrin.h"
+//#include "av1/encoder/hash_motion.h"
+#include "aom_dsp/grain_synthesis.h"
+#include "aom_dsp/grain_table.h"
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if defined(__clang__) && defined(__has_warning)
+#if __has_feature(cxx_attributes) && __has_warning("-Wimplicit-fallthrough")
+#define AOM_FALLTHROUGH_INTENDED [[clang::fallthrough]]  // NOLINT
+#endif
+#elif defined(__GNUC__) && __GNUC__ >= 7
+#define AOM_FALLTHROUGH_INTENDED __attribute__((fallthrough))  // NOLINT
+#endif
+
+#ifndef AOM_FALLTHROUGH_INTENDED
+#define AOM_FALLTHROUGH_INTENDED \
+  do {                           \
+  } while (0)
+#endif
+
+#define CDEF_MAX_STRENGTHS 16
+
+/* Constant values while waiting for the sequence header */
+#define FRAME_ID_LENGTH 15
+#define DELTA_FRAME_ID_LENGTH 14
+
+#define FRAME_CONTEXTS (FRAME_BUFFERS + 1)
+// Extra frame context which is always kept at default values
+#define FRAME_CONTEXT_DEFAULTS (FRAME_CONTEXTS - 1)
+#define PRIMARY_REF_BITS 3
+#define PRIMARY_REF_NONE 7
+
+#define NUM_PING_PONG_BUFFERS 2
+
+#define MAX_NUM_TEMPORAL_LAYERS 8
+#define MAX_NUM_SPATIAL_LAYERS 4
+/* clang-format off */
+// clang-format seems to think this is a pointer dereference and not a
+// multiplication.
+#define MAX_NUM_OPERATING_POINTS \
+  MAX_NUM_TEMPORAL_LAYERS * MAX_NUM_SPATIAL_LAYERS
+/* clang-format on*/
+
+// TODO(jingning): Turning this on to set up transform coefficient
+// processing timer.
+#define TXCOEFF_TIMER 0
+#define TXCOEFF_COST_TIMER 0
+
+enum {
+  SINGLE_REFERENCE = 0,
+  COMPOUND_REFERENCE = 1,
+  REFERENCE_MODE_SELECT = 2,
+  REFERENCE_MODES = 3,
+} UENUM1BYTE(REFERENCE_MODE);
+
+enum {
+  /**
+   * Frame context updates are disabled
+   */
+  REFRESH_FRAME_CONTEXT_DISABLED,
+  /**
+   * Update frame context to values resulting from backward probability
+   * updates based on entropy/counts in the decoded frame
+   */
+  REFRESH_FRAME_CONTEXT_BACKWARD,
+} UENUM1BYTE(REFRESH_FRAME_CONTEXT_MODE);
+
+#define MFMV_STACK_SIZE 3
+typedef struct {
+  int_mv mfmv0;
+  uint8_t ref_frame_offset;
+} TPL_MV_REF;
+
+typedef struct {
+  int_mv mv;
+  MV_REFERENCE_FRAME ref_frame;
+} MV_REF;
+
+// FIXME(jack.haughton@argondesign.com): This enum was originally in
+// encoder/ratectrl.h, and is encoder specific. When we move to C++, this
+// should go back there and BufferPool should be templatized.
+enum {
+  INTER_NORMAL = 0,
+  INTER_LOW = 1,
+  INTER_HIGH = 2,
+  GF_ARF_LOW = 3,
+  GF_ARF_STD = 4,
+  KF_STD = 5,
+  RATE_FACTOR_LEVELS = 6
+} UENUM1BYTE(RATE_FACTOR_LEVEL);
+
+typedef struct RefCntBuffer {
+  // For a RefCntBuffer, the following are reference-holding variables:
+  // - cm->ref_frame_map[]
+  // - cm->cur_frame
+  // - cm->scaled_ref_buf[] (encoder only)
+  // - cm->next_ref_frame_map[] (decoder only)
+  // - pbi->output_frame_index[] (decoder only)
+  // With that definition, 'ref_count' is the number of reference-holding
+  // variables that are currently referencing this buffer.
+  // For example:
+  // - suppose this buffer is at index 'k' in the buffer pool, and
+  // - Total 'n' of the variables / array elements above have value 'k' (that
+  // is, they are pointing to buffer at index 'k').
+  // Then, pool->frame_bufs[k].ref_count = n.
+  int ref_count;
+
+  unsigned int order_hint;
+  unsigned int ref_order_hints[INTER_REFS_PER_FRAME];
+
+  MV_REF *mvs;
+  uint8_t *seg_map;
+  struct segmentation seg;
+  int mi_rows;
+  int mi_cols;
+  // Width and height give the size of the buffer (before any upscaling, unlike
+  // the sizes that can be derived from the buf structure)
+  int width;
+  int height;
+  WarpedMotionParams global_motion[REF_FRAMES];
+  int showable_frame;  // frame can be used as show existing frame in future
+  uint8_t film_grain_params_present;
+  aom_film_grain_t film_grain_params;
+  aom_codec_frame_buffer_t raw_frame_buffer;
+  YV12_BUFFER_CONFIG buf;
+  FRAME_TYPE frame_type;
+
+  // This is only used in the encoder but needs to be indexed per ref frame
+  // so it's extremely convenient to keep it here.
+  int interp_filter_selected[SWITCHABLE];
+
+  // Inter frame reference frame delta for loop filter
+  int8_t ref_deltas[REF_FRAMES];
+
+  // 0 = ZERO_MV, MV
+  int8_t mode_deltas[MAX_MODE_LF_DELTAS];
+
+  FRAME_CONTEXT frame_context;
+  RATE_FACTOR_LEVEL frame_rf_level;
+} RefCntBuffer;
+
+typedef void(*internal_release_frame_buffer_cb_fn_t)(void *priv, YV12_BUFFER_CONFIG *fb);
+
+typedef struct BufferPool {
+// Protect BufferPool from being accessed by several FrameWorkers at
+// the same time during frame parallel decode.
+// TODO(hkuang): Try to use atomic variable instead of locking the whole pool.
+// TODO(wtc): Remove this. See
+// https://chromium-review.googlesource.com/c/webm/libvpx/+/560630.
+#if CONFIG_MULTITHREAD
+  pthread_mutex_t pool_mutex;
+#endif
+
+  // Private data associated with the frame buffer callbacks.
+  void *cb_priv;
+
+  aom_get_frame_buffer_cb_fn_t get_fb_cb;
+  aom_release_frame_buffer_cb_fn_t release_fb_cb;
+  internal_release_frame_buffer_cb_fn_t release_fb_cb_int;
+
+  RefCntBuffer frame_bufs[FRAME_BUFFERS];
+
+  // Frame buffers allocated internally by the codec.
+  InternalFrameBufferList int_frame_buffers;
+} BufferPool;
+
+typedef struct BitstreamLevel {
+  uint8_t major;
+  uint8_t minor;
+} BitstreamLevel;
+
+typedef struct {
+  int cdef_pri_damping;
+  int cdef_sec_damping;
+  int nb_cdef_strengths;
+  int cdef_strengths[CDEF_MAX_STRENGTHS];
+  int cdef_uv_strengths[CDEF_MAX_STRENGTHS];
+  int cdef_bits;
+} CdefInfo;
+
+typedef struct {
+  int delta_q_present_flag;
+  // Resolution of delta quant
+  int delta_q_res;
+  int delta_lf_present_flag;
+  // Resolution of delta lf level
+  int delta_lf_res;
+  // This is a flag for number of deltas of loop filter level
+  // 0: use 1 delta, for y_vertical, y_horizontal, u, and v
+  // 1: use separate deltas for each filter level
+  int delta_lf_multi;
+} DeltaQInfo;
+
+typedef struct {
+  int enable_order_hint;           // 0 - disable order hint, and related tools
+  int order_hint_bits_minus_1;     // dist_wtd_comp, ref_frame_mvs,
+                                   // frame_sign_bias
+                                   // if 0, enable_dist_wtd_comp and
+                                   // enable_ref_frame_mvs must be set as 0.
+  int enable_dist_wtd_comp;        // 0 - disable dist-wtd compound modes
+                                   // 1 - enable it
+  int enable_ref_frame_mvs;        // 0 - disable ref frame mvs
+                                   // 1 - enable it
+} OrderHintInfo;
+
+// Sequence header structure.
+// Note: All syntax elements of sequence_header_obu that need to be
+// bit-identical across multiple sequence headers must be part of this struct,
+// so that consistency is checked by are_seq_headers_consistent() function.
+typedef struct SequenceHeader {
+  int num_bits_width;
+  int num_bits_height;
+  int max_frame_width;
+  int max_frame_height;
+  uint8_t frame_id_numbers_present_flag;
+  int frame_id_length;
+  int delta_frame_id_length;
+  BLOCK_SIZE sb_size;  // Size of the superblock used for this frame
+  int mib_size;        // Size of the superblock in units of MI blocks
+  int mib_size_log2;   // Log 2 of above.
+
+  OrderHintInfo order_hint_info;
+
+  uint8_t force_screen_content_tools;  // 0 - force off
+                                       // 1 - force on
+                                       // 2 - adaptive
+  uint8_t still_picture;               // Video is a single frame still picture
+  uint8_t reduced_still_picture_hdr;   // Use reduced header for still picture
+  uint8_t force_integer_mv;            // 0 - Don't force. MV can use subpel
+                                       // 1 - force to integer
+                                       // 2 - adaptive
+  uint8_t enable_filter_intra;         // enables/disables filterintra
+  uint8_t enable_intra_edge_filter;    // enables/disables edge upsampling
+  uint8_t enable_interintra_compound;  // enables/disables interintra_compound
+  uint8_t enable_masked_compound;      // enables/disables masked compound
+  uint8_t enable_dual_filter;          // 0 - disable dual interpolation filter
+                                       // 1 - enable vert/horz filter selection
+  uint8_t enable_warped_motion;        // 0 - disable warp for the sequence
+                                       // 1 - enable warp for the sequence
+  uint8_t enable_superres;             // 0 - Disable superres for the sequence
+                                       //     and no frame level superres flag
+                                       // 1 - Enable superres for the sequence
+                                       //     enable per-frame superres flag
+  uint8_t enable_cdef;                 // To turn on/off CDEF
+  uint8_t enable_restoration;          // To turn on/off loop restoration
+  BITSTREAM_PROFILE profile;
+
+  // Operating point info.
+  int operating_points_cnt_minus_1;
+  int operating_point_idc[MAX_NUM_OPERATING_POINTS];
+  uint8_t display_model_info_present_flag;
+  uint8_t decoder_model_info_present_flag;
+  BitstreamLevel level[MAX_NUM_OPERATING_POINTS];
+  uint8_t tier[MAX_NUM_OPERATING_POINTS];  // seq_tier in the spec. One bit: 0
+                                           // or 1.
+
+  // Color config.
+  aom_bit_depth_t bit_depth;  // AOM_BITS_8 in profile 0 or 1,
+                              // AOM_BITS_10 or AOM_BITS_12 in profile 2 or 3.
+  uint8_t use_highbitdepth;   // If true, we need to use 16bit frame buffers.
+  uint8_t monochrome;         // Monochorme video
+  aom_color_primaries_t color_primaries;
+  aom_transfer_characteristics_t transfer_characteristics;
+  aom_matrix_coefficients_t matrix_coefficients;
+  int color_range;
+  int subsampling_x;          // Chroma subsampling for x
+  int subsampling_y;          // Chroma subsampling for y
+  aom_chroma_sample_position_t chroma_sample_position;
+  uint8_t separate_uv_delta_q;
+  uint8_t film_grain_params_present;
+} SequenceHeader;
+
+typedef struct {
+    int skip_mode_allowed;
+    int skip_mode_flag;
+    int ref_frame_idx_0;
+    int ref_frame_idx_1;
+} SkipModeInfo;
+
+typedef struct {
+  FRAME_TYPE frame_type;
+  REFERENCE_MODE reference_mode;
+
+  unsigned int order_hint;
+  unsigned int frame_number;
+  SkipModeInfo skip_mode_info;
+  int refresh_frame_flags;  // Which ref frames are overwritten by this frame
+  int frame_refs_short_signaling;
+} CurrentFrame;
+
+typedef struct AV1Common {
+  CurrentFrame current_frame;
+  struct aom_internal_error_info error;
+  int width;
+  int height;
+  int render_width;
+  int render_height;
+  int timing_info_present;
+  aom_timing_info_t timing_info;
+  int buffer_removal_time_present;
+  aom_dec_model_info_t buffer_model;
+  aom_dec_model_op_parameters_t op_params[MAX_NUM_OPERATING_POINTS + 1];
+  aom_op_timing_info_t op_frame_timing[MAX_NUM_OPERATING_POINTS + 1];
+  uint32_t frame_presentation_time;
+
+  int context_update_tile_id;
+
+  // Scale of the current frame with respect to itself.
+  struct scale_factors sf_identity;
+
+  RefCntBuffer *prev_frame;
+
+  // TODO(hkuang): Combine this with cur_buf in macroblockd.
+  RefCntBuffer *cur_frame;
+
+  // For encoder, we have a two-level mapping from reference frame type to the
+  // corresponding buffer in the buffer pool:
+  // * 'remapped_ref_idx[i - 1]' maps reference type 'i' (range: LAST_FRAME ...
+  // EXTREF_FRAME) to a remapped index 'j' (in range: 0 ... REF_FRAMES - 1)
+  // * Later, 'cm->ref_frame_map[j]' maps the remapped index 'j' to a pointer to
+  // the reference counted buffer structure RefCntBuffer, taken from the buffer
+  // pool cm->buffer_pool->frame_bufs.
+  //
+  // LAST_FRAME,                        ...,      EXTREF_FRAME
+  //      |                                           |
+  //      v                                           v
+  // remapped_ref_idx[LAST_FRAME - 1],  ...,  remapped_ref_idx[EXTREF_FRAME - 1]
+  //      |                                           |
+  //      v                                           v
+  // ref_frame_map[],                   ...,     ref_frame_map[]
+  //
+  // Note: INTRA_FRAME always refers to the current frame, so there's no need to
+  // have a remapped index for the same.
+  int remapped_ref_idx[REF_FRAMES];
+
+  struct scale_factors ref_scale_factors[REF_FRAMES];
+
+  // For decoder, ref_frame_map[i] maps reference type 'i' to a pointer to
+  // the buffer in the buffer pool 'cm->buffer_pool.frame_bufs'.
+  // For encoder, ref_frame_map[j] (where j = remapped_ref_idx[i]) maps
+  // remapped reference index 'j' (that is, original reference type 'i') to
+  // a pointer to the buffer in the buffer pool 'cm->buffer_pool.frame_bufs'.
+  RefCntBuffer *ref_frame_map[REF_FRAMES];
+
+  // Prepare ref_frame_map for the next frame.
+  // Only used in frame parallel decode.
+  RefCntBuffer *next_ref_frame_map[REF_FRAMES];
+  FRAME_TYPE last_frame_type; /* last frame's frame type for motion search.*/
+
+  int show_frame;
+  int showable_frame;  // frame can be used as show existing frame in future
+  int show_existing_frame;
+
+  uint8_t disable_cdf_update;
+  int allow_high_precision_mv;
+  uint8_t cur_frame_force_integer_mv;  // 0 the default in AOM, 1 only integer
+
+  uint8_t allow_screen_content_tools;
+  int allow_intrabc;
+  int allow_warped_motion;
+
+  // MBs, mb_rows/cols is in 16-pixel units; mi_rows/cols is in
+  // MB_MODE_INFO (8-pixel) units.
+  int MBs;
+  int mb_rows, mi_rows;
+  int mb_cols, mi_cols;
+  int mi_stride;
+
+  /* profile settings */
+  TX_MODE tx_mode;
+
+#if CONFIG_ENTROPY_STATS
+  int coef_cdf_category;
+#endif
+
+  int base_qindex;
+  int y_dc_delta_q;
+  int u_dc_delta_q;
+  int v_dc_delta_q;
+  int u_ac_delta_q;
+  int v_ac_delta_q;
+
+  // The dequantizers below are true dequantizers used only in the
+  // dequantization process.  They have the same coefficient
+  // shift/scale as TX.
+  int16_t y_dequant_QTX[MAX_SEGMENTS][2];
+  int16_t u_dequant_QTX[MAX_SEGMENTS][2];
+  int16_t v_dequant_QTX[MAX_SEGMENTS][2];
+
+  // Global quant matrix tables
+  const qm_val_t *giqmatrix[NUM_QM_LEVELS][3][TX_SIZES_ALL];
+  const qm_val_t *gqmatrix[NUM_QM_LEVELS][3][TX_SIZES_ALL];
+
+  // Local quant matrix tables for each frame
+  const qm_val_t *y_iqmatrix[MAX_SEGMENTS][TX_SIZES_ALL];
+  const qm_val_t *u_iqmatrix[MAX_SEGMENTS][TX_SIZES_ALL];
+  const qm_val_t *v_iqmatrix[MAX_SEGMENTS][TX_SIZES_ALL];
+
+  // Encoder
+  int using_qmatrix;
+  int qm_y;
+  int qm_u;
+  int qm_v;
+  int min_qmlevel;
+  int max_qmlevel;
+  int use_quant_b_adapt;
+
+  /* We allocate a MB_MODE_INFO struct for each macroblock, together with
+     an extra row on top and column on the left to simplify prediction. */
+  int mi_alloc_size;
+  MB_MODE_INFO *mi;  /* Corresponds to upper left visible macroblock */
+
+  // Grid of pointers to 8x8 MB_MODE_INFO structs.  Any 8x8 not in the visible
+  // area will be NULL.
+  MB_MODE_INFO **mi_grid_base;
+  MB_MODE_INFO **mi_grid_visible;
+
+  // Whether to use previous frames' motion vectors for prediction.
+  int allow_ref_frame_mvs;
+
+  uint8_t *last_frame_seg_map;
+
+  InterpFilter interp_filter;
+
+  int switchable_motion_mode;
+
+  loop_filter_info_n lf_info;
+  // The denominator of the superres scale; the numerator is fixed.
+  uint8_t superres_scale_denominator;
+  int superres_upscaled_width;
+  int superres_upscaled_height;
+  RestorationInfo rst_info[MAX_MB_PLANE];
+
+  // Pointer to a scratch buffer used by self-guided restoration
+  int32_t *rst_tmpbuf;
+  RestorationLineBuffers *rlbs;
+
+  // Output of loop restoration
+  YV12_BUFFER_CONFIG rst_frame;
+
+  // Flag signaling how frame contexts should be updated at the end of
+  // a frame decode
+  REFRESH_FRAME_CONTEXT_MODE refresh_frame_context;
+
+  int ref_frame_sign_bias[REF_FRAMES]; /* Two state 0, 1 */
+
+  struct loopfilter lf;
+  struct segmentation seg;
+  int coded_lossless;  // frame is fully lossless at the coded resolution.
+  int all_lossless;    // frame is fully lossless at the upscaled resolution.
+
+  int reduced_tx_set_used;
+
+  // Context probabilities for reference frame prediction
+  MV_REFERENCE_FRAME comp_fwd_ref[FWD_REFS];
+  MV_REFERENCE_FRAME comp_bwd_ref[BWD_REFS];
+
+  FRAME_CONTEXT *fc;              /* this frame entropy */
+  FRAME_CONTEXT *default_frame_context;
+  int primary_ref_frame;
+
+  int error_resilient_mode;
+
+  int tile_cols, tile_rows;
+
+  int max_tile_width_sb;
+  int min_log2_tile_cols;
+  int max_log2_tile_cols;
+  int max_log2_tile_rows;
+  int min_log2_tile_rows;
+  int min_log2_tiles;
+  int max_tile_height_sb;
+  int uniform_tile_spacing_flag;
+  int log2_tile_cols;                        // only valid for uniform tiles
+  int log2_tile_rows;                        // only valid for uniform tiles
+  int tile_col_start_sb[MAX_TILE_COLS + 1];  // valid for 0 <= i <= tile_cols
+  int tile_row_start_sb[MAX_TILE_ROWS + 1];  // valid for 0 <= i <= tile_rows
+  int tile_width, tile_height;               // In MI units
+
+  unsigned int large_scale_tile;
+  unsigned int single_tile_decoding;
+
+  int byte_alignment;
+  int skip_loop_filter;
+  int skip_film_grain;
+
+  // External BufferPool passed from outside.
+  BufferPool *buffer_pool;
+
+  PARTITION_CONTEXT **above_seg_context;
+  ENTROPY_CONTEXT **above_context[MAX_MB_PLANE];
+  TXFM_CONTEXT **above_txfm_context;
+  WarpedMotionParams global_motion[REF_FRAMES];
+  aom_film_grain_t film_grain_params;
+
+  CdefInfo cdef_info;
+  DeltaQInfo delta_q_info;  // Delta Q and Delta LF parameters
+
+  int num_tg;
+  SequenceHeader seq_params;
+  int current_frame_id;
+  int ref_frame_id[REF_FRAMES];
+  int valid_for_referencing[REF_FRAMES];
+  TPL_MV_REF *tpl_mvs;
+  int tpl_mvs_mem_size;
+  // TODO(jingning): This can be combined with sign_bias later.
+  int8_t ref_frame_side[REF_FRAMES];
+
+  int is_annexb;
+
+  int temporal_layer_id;
+  int spatial_layer_id;
+  unsigned int number_temporal_layers;
+  unsigned int number_spatial_layers;
+  int num_allocated_above_context_mi_col;
+  int num_allocated_above_contexts;
+  int num_allocated_above_context_planes;
+
+#if TXCOEFF_TIMER
+  int64_t cum_txcoeff_timer;
+  int64_t txcoeff_timer;
+  int txb_count;
+#endif
+
+#if TXCOEFF_COST_TIMER
+  int64_t cum_txcoeff_cost_timer;
+  int64_t txcoeff_cost_timer;
+  int64_t txcoeff_cost_count;
+#endif
+  const cfg_options_t *options;
+  int is_decoding;
+  DECLARE_ALIGNED(16, FRAME_CONTEXT, fc_alloc[2]);
+} AV1_COMMON;
+
+// TODO(hkuang): Don't need to lock the whole pool after implementing atomic
+// frame reference count.
+static void lock_buffer_pool(BufferPool *const pool) {
+#if CONFIG_MULTITHREAD
+  pthread_mutex_lock(&pool->pool_mutex);
+#else
+  (void)pool;
+#endif
+}
+
+static void unlock_buffer_pool(BufferPool *const pool) {
+#if CONFIG_MULTITHREAD
+  pthread_mutex_unlock(&pool->pool_mutex);
+#else
+  (void)pool;
+#endif
+}
+
+static INLINE YV12_BUFFER_CONFIG *get_ref_frame(AV1_COMMON *cm, int index) {
+  if (index < 0 || index >= REF_FRAMES) return NULL;
+  if (cm->ref_frame_map[index] == NULL) return NULL;
+  return &cm->ref_frame_map[index]->buf;
+}
+
+static INLINE int get_free_fb(AV1_COMMON *cm) {
+  RefCntBuffer *const frame_bufs = cm->buffer_pool->frame_bufs;
+  int i;
+
+  lock_buffer_pool(cm->buffer_pool);
+  for (i = 0; i < FRAME_BUFFERS; ++i)
+    if (frame_bufs[i].ref_count == 0) break;
+
+  if (i != FRAME_BUFFERS) {
+    if (frame_bufs[i].buf.use_external_reference_buffers) {
+      // If this frame buffer's y_buffer, u_buffer, and v_buffer point to the
+      // external reference buffers. Restore the buffer pointers to point to the
+      // internally allocated memory.
+      YV12_BUFFER_CONFIG *ybf = &frame_bufs[i].buf;
+      ybf->y_buffer = ybf->store_buf_adr[0];
+      ybf->u_buffer = ybf->store_buf_adr[1];
+      ybf->v_buffer = ybf->store_buf_adr[2];
+      ybf->use_external_reference_buffers = 0;
+    }
+
+    frame_bufs[i].ref_count = 1;
+  } else {
+    // We should never run out of free buffers. If this assertion fails, there
+    // is a reference leak.
+    assert(0 && "Ran out of free frame buffers. Likely a reference leak.");
+    // Reset i to be INVALID_IDX to indicate no free buffer found.
+    i = INVALID_IDX;
+  }
+
+  unlock_buffer_pool(cm->buffer_pool);
+  return i;
+}
+
+static INLINE RefCntBuffer *assign_cur_frame_new_fb(AV1_COMMON *const cm) {
+  // Release the previously-used frame-buffer
+  if (cm->cur_frame != NULL) {
+    --cm->cur_frame->ref_count;
+    cm->cur_frame = NULL;
+  }
+
+  // Assign a new framebuffer
+  const int new_fb_idx = get_free_fb(cm);
+  if (new_fb_idx == INVALID_IDX) return NULL;
+
+  cm->cur_frame = &cm->buffer_pool->frame_bufs[new_fb_idx];
+  cm->cur_frame->buf.buf_8bit_valid = 0;
+  av1_zero(cm->cur_frame->interp_filter_selected);
+  return cm->cur_frame;
+}
+
+// Modify 'lhs_ptr' to reference the buffer at 'rhs_ptr', and update the ref
+// counts accordingly.
+static INLINE void assign_frame_buffer_p(RefCntBuffer **lhs_ptr,
+                                       RefCntBuffer *rhs_ptr) {
+  RefCntBuffer *const old_ptr = *lhs_ptr;
+  if (old_ptr != NULL) {
+    assert(old_ptr->ref_count > 0);
+    // One less reference to the buffer at 'old_ptr', so decrease ref count.
+    --old_ptr->ref_count;
+  }
+
+  *lhs_ptr = rhs_ptr;
+  // One more reference to the buffer at 'rhs_ptr', so increase ref count.
+  ++rhs_ptr->ref_count;
+}
+
+static INLINE int frame_is_intra_only(const AV1_COMMON *const cm) {
+  return cm->current_frame.frame_type == KEY_FRAME ||
+      cm->current_frame.frame_type == INTRA_ONLY_FRAME;
+}
+
+static INLINE int frame_is_sframe(const AV1_COMMON *cm) {
+  return cm->current_frame.frame_type == S_FRAME;
+}
+
+// These functions take a reference frame label between LAST_FRAME and
+// EXTREF_FRAME inclusive.  Note that this is different to the indexing
+// previously used by the frame_refs[] array.
+static INLINE int get_ref_frame_map_idx(const AV1_COMMON *const cm,
+                                        const MV_REFERENCE_FRAME ref_frame) {
+  return (ref_frame >= LAST_FRAME && ref_frame <= EXTREF_FRAME)
+             ? cm->remapped_ref_idx[ref_frame - LAST_FRAME]
+             : INVALID_IDX;
+}
+
+static INLINE RefCntBuffer *get_ref_frame_buf(
+    const AV1_COMMON *const cm, const MV_REFERENCE_FRAME ref_frame) {
+  const int map_idx = get_ref_frame_map_idx(cm, ref_frame);
+  return (map_idx != INVALID_IDX) ? cm->ref_frame_map[map_idx] : NULL;
+}
+
+// Both const and non-const versions of this function are provided so that it
+// can be used with a const AV1_COMMON if needed.
+static INLINE const struct scale_factors *get_ref_scale_factors_const(
+    const AV1_COMMON *const cm, const MV_REFERENCE_FRAME ref_frame) {
+  const int map_idx = get_ref_frame_map_idx(cm, ref_frame);
+  return (map_idx != INVALID_IDX) ? &cm->ref_scale_factors[map_idx] : NULL;
+}
+
+static INLINE struct scale_factors *get_ref_scale_factors(
+    AV1_COMMON *const cm, const MV_REFERENCE_FRAME ref_frame) {
+  const int map_idx = get_ref_frame_map_idx(cm, ref_frame);
+  return (map_idx != INVALID_IDX) ? &cm->ref_scale_factors[map_idx] : NULL;
+}
+
+static INLINE RefCntBuffer *get_primary_ref_frame_buf(
+    const AV1_COMMON *const cm) {
+  if (cm->primary_ref_frame == PRIMARY_REF_NONE) return NULL;
+  const int map_idx = get_ref_frame_map_idx(cm, cm->primary_ref_frame + 1);
+  return (map_idx != INVALID_IDX) ? cm->ref_frame_map[map_idx] : NULL;
+}
+
+// Returns 1 if this frame might allow mvs from some reference frame.
+static INLINE int frame_might_allow_ref_frame_mvs(const AV1_COMMON *cm) {
+  return !cm->error_resilient_mode &&
+    cm->seq_params.order_hint_info.enable_ref_frame_mvs &&
+    cm->seq_params.order_hint_info.enable_order_hint &&
+    !frame_is_intra_only(cm);
+}
+
+// Returns 1 if this frame might use warped_motion
+static INLINE int frame_might_allow_warped_motion(const AV1_COMMON *cm) {
+  return !cm->error_resilient_mode && !frame_is_intra_only(cm) &&
+         cm->seq_params.enable_warped_motion;
+}
+
+static INLINE void ensure_mv_buffer(RefCntBuffer *buf, AV1_COMMON *cm) {
+  const int buf_rows = buf->mi_rows;
+  const int buf_cols = buf->mi_cols;
+  assert(buf->buf.hw_buffer);
+  buf->mvs = (MV_REF *)buf->buf.hw_buffer->mvs_alloc;
+  buf->seg_map = buf->buf.hw_buffer->seg_alloc;
+  if (buf_rows != cm->mi_rows || buf_cols != cm->mi_cols) {
+      buf->mi_rows = cm->mi_rows;
+      buf->mi_cols = cm->mi_cols;
+      memset(buf->mvs, 0, sizeof(*buf->mvs) * ((cm->mi_rows + 1) >> 1) * ((cm->mi_cols + 1) >> 1));
+      memset(buf->seg_map, 0, cm->mi_rows * cm->mi_cols);
+
+    //  aom_free(buf->mvs);
+    //  CHECK_MEM_ERROR(cm, buf->mvs,
+    //      (MV_REF *)aom_calloc(
+    //      ((cm->mi_rows + 1) >> 1) * ((cm->mi_cols + 1) >> 1),
+    //          sizeof(*buf->mvs)));
+    //  aom_free(buf->seg_map);
+    //  CHECK_MEM_ERROR(cm, buf->seg_map,
+    //      (uint8_t *)aom_calloc(cm->mi_rows * cm->mi_cols,
+    //          sizeof(*buf->seg_map)));
+  }
+  assert(buf->mvs);
+  assert(buf->seg_map);
+}
+
+void cfl_init(CFL_CTX *cfl, const SequenceHeader *seq_params);
+
+static INLINE int av1_num_planes(const AV1_COMMON *cm) {
+  return cm->seq_params.monochrome ? 1 : MAX_MB_PLANE;
+}
+
+static INLINE void av1_init_above_context(AV1_COMMON *cm, MACROBLOCKD *xd,
+                                          const int tile_row) {
+  const int num_planes = av1_num_planes(cm);
+  for (int i = 0; i < num_planes; ++i) {
+    xd->above_context[i] = cm->above_context[i][tile_row];
+  }
+  xd->above_seg_context = cm->above_seg_context[tile_row];
+  xd->above_txfm_context = cm->above_txfm_context[tile_row];
+}
+
+static INLINE void av1_init_macroblockd(AV1_COMMON *cm, MACROBLOCKD *xd,
+                                        tran_low_t *dqcoeff) {
+  const int num_planes = av1_num_planes(cm);
+  for (int i = 0; i < num_planes; ++i) {
+    xd->plane[i].dqcoeff = dqcoeff;
+
+    if (xd->plane[i].plane_type == PLANE_TYPE_Y) {
+      memcpy(xd->plane[i].seg_dequant_QTX, cm->y_dequant_QTX,
+             sizeof(cm->y_dequant_QTX));
+      memcpy(xd->plane[i].seg_iqmatrix, cm->y_iqmatrix, sizeof(cm->y_iqmatrix));
+
+    } else {
+      if (i == AOM_PLANE_U) {
+        memcpy(xd->plane[i].seg_dequant_QTX, cm->u_dequant_QTX,
+               sizeof(cm->u_dequant_QTX));
+        memcpy(xd->plane[i].seg_iqmatrix, cm->u_iqmatrix,
+               sizeof(cm->u_iqmatrix));
+      } else {
+        memcpy(xd->plane[i].seg_dequant_QTX, cm->v_dequant_QTX,
+               sizeof(cm->v_dequant_QTX));
+        memcpy(xd->plane[i].seg_iqmatrix, cm->v_iqmatrix,
+               sizeof(cm->v_iqmatrix));
+      }
+    }
+  }
+  xd->mi_stride = cm->mi_stride;
+  xd->error_info = &cm->error;
+  cfl_init(&xd->cfl, &cm->seq_params);
+}
+
+static INLINE void set_skip_context(MACROBLOCKD *xd, int mi_row, int mi_col,
+                                    const int num_planes) {
+  int i;
+  int row_offset = mi_row;
+  int col_offset = mi_col;
+  for (i = 0; i < num_planes; ++i) {
+    struct macroblockd_plane *const pd = &xd->plane[i];
+    // Offset the buffer pointer
+    const BLOCK_SIZE bsize = xd->mi[0]->sb_type;
+    if (pd->subsampling_y && (mi_row & 0x01) && (mi_size_high[bsize] == 1))
+      row_offset = mi_row - 1;
+    if (pd->subsampling_x && (mi_col & 0x01) && (mi_size_wide[bsize] == 1))
+      col_offset = mi_col - 1;
+    int above_idx = col_offset;
+    int left_idx = row_offset & MAX_MIB_MASK;
+    pd->above_context = &xd->above_context[i][above_idx >> pd->subsampling_x];
+    pd->left_context = &xd->left_context[i][left_idx >> pd->subsampling_y];
+  }
+}
+
+static INLINE int calc_mi_size(int len) {
+  // len is in mi units. Align to a multiple of SBs.
+  return ALIGN_POWER_OF_TWO(len, MAX_MIB_SIZE_LOG2);
+}
+
+static INLINE void set_plane_n4(MACROBLOCKD *const xd, int bw, int bh,
+                                const int num_planes) {
+  int i;
+  for (i = 0; i < num_planes; i++) {
+    xd->plane[i].width = (bw * MI_SIZE) >> xd->plane[i].subsampling_x;
+    xd->plane[i].height = (bh * MI_SIZE) >> xd->plane[i].subsampling_y;
+
+    xd->plane[i].width = AOMMAX(xd->plane[i].width, 4);
+    xd->plane[i].height = AOMMAX(xd->plane[i].height, 4);
+  }
+}
+
+static INLINE void set_mi_row_col(MACROBLOCKD *xd, const TileInfo *const tile,
+                                  int mi_row, int bh, int mi_col, int bw,
+                                  int mi_rows, int mi_cols) {
+  xd->mb_to_top_edge = -((mi_row * MI_SIZE) * 8);
+  xd->mb_to_bottom_edge = ((mi_rows - bh - mi_row) * MI_SIZE) * 8;
+  xd->mb_to_left_edge = -((mi_col * MI_SIZE) * 8);
+  xd->mb_to_right_edge = ((mi_cols - bw - mi_col) * MI_SIZE) * 8;
+
+  // Are edges available for intra prediction?
+  xd->up_available = (mi_row > tile->mi_row_start);
+
+  const int ss_x = xd->plane[1].subsampling_x;
+  const int ss_y = xd->plane[1].subsampling_y;
+
+  xd->left_available = (mi_col > tile->mi_col_start);
+  xd->chroma_up_available = xd->up_available;
+  xd->chroma_left_available = xd->left_available;
+  if (ss_x && bw < mi_size_wide[BLOCK_8X8])
+    xd->chroma_left_available = (mi_col - 1) > tile->mi_col_start;
+  if (ss_y && bh < mi_size_high[BLOCK_8X8])
+    xd->chroma_up_available = (mi_row - 1) > tile->mi_row_start;
+  if (xd->up_available) {
+    xd->above_mbmi = xd->mi[-xd->mi_stride];
+  } else {
+    xd->above_mbmi = NULL;
+  }
+
+  if (xd->left_available) {
+    xd->left_mbmi = xd->mi[-1];
+  } else {
+    xd->left_mbmi = NULL;
+  }
+
+  const int chroma_ref = ((mi_row & 0x01) || !(bh & 0x01) || !ss_y) &&
+                         ((mi_col & 0x01) || !(bw & 0x01) || !ss_x);
+  if (chroma_ref) {
+    // To help calculate the "above" and "left" chroma blocks, note that the
+    // current block may cover multiple luma blocks (eg, if partitioned into
+    // 4x4 luma blocks).
+    // First, find the top-left-most luma block covered by this chroma block
+    MB_MODE_INFO **base_mi =
+        &xd->mi[-(mi_row & ss_y) * xd->mi_stride - (mi_col & ss_x)];
+
+    // Then, we consider the luma region covered by the left or above 4x4 chroma
+    // prediction. We want to point to the chroma reference block in that
+    // region, which is the bottom-right-most mi unit.
+    // This leads to the following offsets:
+    MB_MODE_INFO *chroma_above_mi =
+        xd->chroma_up_available ? base_mi[-xd->mi_stride + ss_x] : NULL;
+    xd->chroma_above_mbmi = chroma_above_mi;
+
+    MB_MODE_INFO *chroma_left_mi =
+        xd->chroma_left_available ? base_mi[ss_y * xd->mi_stride - 1] : NULL;
+    xd->chroma_left_mbmi = chroma_left_mi;
+  }
+
+  xd->n4_h = bh;
+  xd->n4_w = bw;
+  xd->is_sec_rect = 0;
+  if (xd->n4_w < xd->n4_h) {
+    // Only mark is_sec_rect as 1 for the last block.
+    // For PARTITION_VERT_4, it would be (0, 0, 0, 1);
+    // For other partitions, it would be (0, 1).
+    if (!((mi_col + xd->n4_w) & (xd->n4_h - 1))) xd->is_sec_rect = 1;
+  }
+
+  if (xd->n4_w > xd->n4_h)
+    if (mi_row & (xd->n4_w - 1)) xd->is_sec_rect = 1;
+}
+
+static INLINE aom_cdf_prob *get_y_mode_cdf(FRAME_CONTEXT *tile_ctx,
+                                           const MB_MODE_INFO *above_mi,
+                                           const MB_MODE_INFO *left_mi) {
+  const PREDICTION_MODE above = av1_above_block_mode(above_mi);
+  const PREDICTION_MODE left = av1_left_block_mode(left_mi);
+  const int above_ctx = intra_mode_context[above];
+  const int left_ctx = intra_mode_context[left];
+  return tile_ctx->kf_y_cdf[above_ctx][left_ctx];
+}
+
+static INLINE void update_partition_context(MACROBLOCKD *xd, int mi_row,
+                                            int mi_col, BLOCK_SIZE subsize,
+                                            BLOCK_SIZE bsize) {
+  PARTITION_CONTEXT *const above_ctx = xd->above_seg_context + mi_col;
+  PARTITION_CONTEXT *const left_ctx =
+      xd->left_seg_context + (mi_row & MAX_MIB_MASK);
+
+  const int bw = mi_size_wide[bsize];
+  const int bh = mi_size_high[bsize];
+  memset(above_ctx, partition_context_lookup[subsize].above, bw);
+  memset(left_ctx, partition_context_lookup[subsize].left, bh);
+}
+
+static INLINE int is_chroma_reference(int mi_row, int mi_col, BLOCK_SIZE bsize,
+                                      int subsampling_x, int subsampling_y) {
+  const int bw = mi_size_wide[bsize];
+  const int bh = mi_size_high[bsize];
+  int ref_pos = ((mi_row & 0x01) || !(bh & 0x01) || !subsampling_y) &&
+                ((mi_col & 0x01) || !(bw & 0x01) || !subsampling_x);
+  return ref_pos;
+}
+
+static INLINE BLOCK_SIZE scale_chroma_bsize(BLOCK_SIZE bsize, int subsampling_x,
+                                            int subsampling_y) {
+  BLOCK_SIZE bs = bsize;
+  switch (bsize) {
+    case BLOCK_4X4:
+      if (subsampling_x == 1 && subsampling_y == 1)
+        bs = BLOCK_8X8;
+      else if (subsampling_x == 1)
+        bs = BLOCK_8X4;
+      else if (subsampling_y == 1)
+        bs = BLOCK_4X8;
+      break;
+    case BLOCK_4X8:
+      if (subsampling_x == 1 && subsampling_y == 1)
+        bs = BLOCK_8X8;
+      else if (subsampling_x == 1)
+        bs = BLOCK_8X8;
+      else if (subsampling_y == 1)
+        bs = BLOCK_4X8;
+      break;
+    case BLOCK_8X4:
+      if (subsampling_x == 1 && subsampling_y == 1)
+        bs = BLOCK_8X8;
+      else if (subsampling_x == 1)
+        bs = BLOCK_8X4;
+      else if (subsampling_y == 1)
+        bs = BLOCK_8X8;
+      break;
+    case BLOCK_4X16:
+      if (subsampling_x == 1 && subsampling_y == 1)
+        bs = BLOCK_8X16;
+      else if (subsampling_x == 1)
+        bs = BLOCK_8X16;
+      else if (subsampling_y == 1)
+        bs = BLOCK_4X16;
+      break;
+    case BLOCK_16X4:
+      if (subsampling_x == 1 && subsampling_y == 1)
+        bs = BLOCK_16X8;
+      else if (subsampling_x == 1)
+        bs = BLOCK_16X4;
+      else if (subsampling_y == 1)
+        bs = BLOCK_16X8;
+      break;
+    default: break;
+  }
+  return bs;
+}
+
+static INLINE aom_cdf_prob cdf_element_prob(const aom_cdf_prob *cdf,
+                                            size_t element) {
+  assert(cdf != NULL);
+  return (element > 0 ? cdf[element - 1] : CDF_PROB_TOP) - cdf[element];
+}
+
+static INLINE void partition_gather_horz_alike(aom_cdf_prob *out,
+                                               const aom_cdf_prob *const in,
+                                               BLOCK_SIZE bsize) {
+  (void)bsize;
+  out[0] = CDF_PROB_TOP;
+  out[0] -= cdf_element_prob(in, PARTITION_HORZ);
+  out[0] -= cdf_element_prob(in, PARTITION_SPLIT);
+  out[0] -= cdf_element_prob(in, PARTITION_HORZ_A);
+  out[0] -= cdf_element_prob(in, PARTITION_HORZ_B);
+  out[0] -= cdf_element_prob(in, PARTITION_VERT_A);
+  if (bsize != BLOCK_128X128) out[0] -= cdf_element_prob(in, PARTITION_HORZ_4);
+  out[0] = AOM_ICDF(out[0]);
+  out[1] = AOM_ICDF(CDF_PROB_TOP);
+}
+
+static INLINE void partition_gather_vert_alike(aom_cdf_prob *out,
+                                               const aom_cdf_prob *const in,
+                                               BLOCK_SIZE bsize) {
+  (void)bsize;
+  out[0] = CDF_PROB_TOP;
+  out[0] -= cdf_element_prob(in, PARTITION_VERT);
+  out[0] -= cdf_element_prob(in, PARTITION_SPLIT);
+  out[0] -= cdf_element_prob(in, PARTITION_HORZ_A);
+  out[0] -= cdf_element_prob(in, PARTITION_VERT_A);
+  out[0] -= cdf_element_prob(in, PARTITION_VERT_B);
+  if (bsize != BLOCK_128X128) out[0] -= cdf_element_prob(in, PARTITION_VERT_4);
+  out[0] = AOM_ICDF(out[0]);
+  out[1] = AOM_ICDF(CDF_PROB_TOP);
+}
+
+static INLINE void update_ext_partition_context(MACROBLOCKD *xd, int mi_row,
+                                                int mi_col, BLOCK_SIZE subsize,
+                                                BLOCK_SIZE bsize,
+                                                PARTITION_TYPE partition) {
+  if (bsize >= BLOCK_8X8) {
+    const int hbs = mi_size_wide[bsize] / 2;
+    BLOCK_SIZE bsize2 = get_partition_subsize(bsize, PARTITION_SPLIT);
+    switch (partition) {
+      case PARTITION_SPLIT:
+        if (bsize != BLOCK_8X8) break;
+        AOM_FALLTHROUGH_INTENDED;
+      case PARTITION_NONE:
+      case PARTITION_HORZ:
+      case PARTITION_VERT:
+      case PARTITION_HORZ_4:
+      case PARTITION_VERT_4:
+        update_partition_context(xd, mi_row, mi_col, subsize, bsize);
+        break;
+      case PARTITION_HORZ_A:
+        update_partition_context(xd, mi_row, mi_col, bsize2, subsize);
+        update_partition_context(xd, mi_row + hbs, mi_col, subsize, subsize);
+        break;
+      case PARTITION_HORZ_B:
+        update_partition_context(xd, mi_row, mi_col, subsize, subsize);
+        update_partition_context(xd, mi_row + hbs, mi_col, bsize2, subsize);
+        break;
+      case PARTITION_VERT_A:
+        update_partition_context(xd, mi_row, mi_col, bsize2, subsize);
+        update_partition_context(xd, mi_row, mi_col + hbs, subsize, subsize);
+        break;
+      case PARTITION_VERT_B:
+        update_partition_context(xd, mi_row, mi_col, subsize, subsize);
+        update_partition_context(xd, mi_row, mi_col + hbs, bsize2, subsize);
+        break;
+      default: assert(0 && "Invalid partition type");
+    }
+  }
+}
+
+static INLINE int partition_plane_context(const MACROBLOCKD *xd, int mi_row,
+                                          int mi_col, BLOCK_SIZE bsize) {
+  const PARTITION_CONTEXT *above_ctx = xd->above_seg_context + mi_col;
+  const PARTITION_CONTEXT *left_ctx =
+      xd->left_seg_context + (mi_row & MAX_MIB_MASK);
+  // Minimum partition point is 8x8. Offset the bsl accordingly.
+  const int bsl = mi_size_wide_log2[bsize] - mi_size_wide_log2[BLOCK_8X8];
+  int above = (*above_ctx >> bsl) & 1, left = (*left_ctx >> bsl) & 1;
+
+  assert(mi_size_wide_log2[bsize] == mi_size_high_log2[bsize]);
+  assert(bsl >= 0);
+
+  return (left * 2 + above) + bsl * PARTITION_PLOFFSET;
+}
+
+// Return the number of elements in the partition CDF when
+// partitioning the (square) block with luma block size of bsize.
+static INLINE int partition_cdf_length(BLOCK_SIZE bsize) {
+  if (bsize <= BLOCK_8X8)
+    return PARTITION_TYPES;
+  else if (bsize == BLOCK_128X128)
+    return EXT_PARTITION_TYPES - 2;
+  else
+    return EXT_PARTITION_TYPES;
+}
+
+static INLINE int max_block_wide(const MACROBLOCKD *xd, BLOCK_SIZE bsize,
+                                 int plane) {
+  int max_blocks_wide = block_size_wide[bsize];
+  const struct macroblockd_plane *const pd = &xd->plane[plane];
+
+  if (xd->mb_to_right_edge < 0)
+    max_blocks_wide += xd->mb_to_right_edge >> (3 + pd->subsampling_x);
+
+  // Scale the width in the transform block unit.
+  return max_blocks_wide >> tx_size_wide_log2[0];
+}
+
+static INLINE int max_block_high(const MACROBLOCKD *xd, BLOCK_SIZE bsize,
+                                 int plane) {
+  int max_blocks_high = block_size_high[bsize];
+  const struct macroblockd_plane *const pd = &xd->plane[plane];
+
+  if (xd->mb_to_bottom_edge < 0)
+    max_blocks_high += xd->mb_to_bottom_edge >> (3 + pd->subsampling_y);
+
+  // Scale the height in the transform block unit.
+  return max_blocks_high >> tx_size_high_log2[0];
+}
+
+static INLINE int max_intra_block_width(const MACROBLOCKD *xd,
+                                        BLOCK_SIZE plane_bsize, int plane,
+                                        TX_SIZE tx_size) {
+  const int max_blocks_wide = max_block_wide(xd, plane_bsize, plane)
+                              << tx_size_wide_log2[0];
+  return ALIGN_POWER_OF_TWO(max_blocks_wide, tx_size_wide_log2[tx_size]);
+}
+
+static INLINE int max_intra_block_height(const MACROBLOCKD *xd,
+                                         BLOCK_SIZE plane_bsize, int plane,
+                                         TX_SIZE tx_size) {
+  const int max_blocks_high = max_block_high(xd, plane_bsize, plane)
+                              << tx_size_high_log2[0];
+  return ALIGN_POWER_OF_TWO(max_blocks_high, tx_size_high_log2[tx_size]);
+}
+
+static INLINE void av1_zero_above_context(AV1_COMMON *const cm, const MACROBLOCKD *xd,
+  int mi_col_start, int mi_col_end, const int tile_row) {
+  const SequenceHeader *const seq_params = &cm->seq_params;
+  const int num_planes = av1_num_planes(cm);
+  const int width = mi_col_end - mi_col_start;
+  const int aligned_width =
+    ALIGN_POWER_OF_TWO(width, seq_params->mib_size_log2);
+
+  const int offset_y = mi_col_start;
+  const int width_y = aligned_width;
+  const int offset_uv = offset_y >> seq_params->subsampling_x;
+  const int width_uv = width_y >> seq_params->subsampling_x;
+
+  av1_zero_array(cm->above_context[0][tile_row] + offset_y, width_y);
+  if (num_planes > 1) {
+    if (cm->above_context[1][tile_row] && cm->above_context[2][tile_row]) {
+      av1_zero_array(cm->above_context[1][tile_row] + offset_uv, width_uv);
+      av1_zero_array(cm->above_context[2][tile_row] + offset_uv, width_uv);
+    } else {
+      aom_internal_error(xd->error_info, AOM_CODEC_CORRUPT_FRAME,
+                         "Invalid value of planes");
+    }
+  }
+
+  av1_zero_array(cm->above_seg_context[tile_row] + mi_col_start, aligned_width);
+
+  memset(cm->above_txfm_context[tile_row] + mi_col_start,
+    tx_size_wide[TX_SIZES_LARGEST],
+    aligned_width * sizeof(TXFM_CONTEXT));
+}
+
+static INLINE void av1_zero_left_context(MACROBLOCKD *const xd) {
+  av1_zero(xd->left_context);
+  av1_zero(xd->left_seg_context);
+
+  memset(xd->left_txfm_context_buffer, tx_size_high[TX_SIZES_LARGEST],
+         sizeof(xd->left_txfm_context_buffer));
+}
+
+// Disable array-bounds checks as the TX_SIZE enum contains values larger than
+// TX_SIZES_ALL (TX_INVALID) which make extending the array as a workaround
+// infeasible. The assert is enough for static analysis and this or other tools
+// asan, valgrind would catch oob access at runtime.
+#if defined(__GNUC__) && __GNUC__ >= 4
+#pragma GCC diagnostic ignored "-Warray-bounds"
+#endif
+
+#if defined(__GNUC__) && __GNUC__ >= 4
+#pragma GCC diagnostic warning "-Warray-bounds"
+#endif
+
+static INLINE void set_txfm_ctx(TXFM_CONTEXT *txfm_ctx, uint8_t txs, int len) {
+  int i;
+  for (i = 0; i < len; ++i) txfm_ctx[i] = txs;
+}
+
+static INLINE void set_txfm_ctxs(TX_SIZE tx_size, int n4_w, int n4_h, int skip,
+                                 const MACROBLOCKD *xd) {
+  uint8_t bw = tx_size_wide[tx_size];
+  uint8_t bh = tx_size_high[tx_size];
+
+  if (skip) {
+    bw = n4_w * MI_SIZE;
+    bh = n4_h * MI_SIZE;
+  }
+
+  set_txfm_ctx(xd->above_txfm_context, bw, n4_w);
+  set_txfm_ctx(xd->left_txfm_context, bh, n4_h);
+}
+
+static INLINE void txfm_partition_update(TXFM_CONTEXT *above_ctx,
+                                         TXFM_CONTEXT *left_ctx,
+                                         TX_SIZE tx_size, TX_SIZE txb_size) {
+  BLOCK_SIZE bsize = txsize_to_bsize[txb_size];
+  int bh = mi_size_high[bsize];
+  int bw = mi_size_wide[bsize];
+  uint8_t txw = tx_size_wide[tx_size];
+  uint8_t txh = tx_size_high[tx_size];
+  int i;
+  for (i = 0; i < bh; ++i) left_ctx[i] = txh;
+  for (i = 0; i < bw; ++i) above_ctx[i] = txw;
+}
+
+static INLINE TX_SIZE get_sqr_tx_size(int tx_dim) {
+  switch (tx_dim) {
+    case 128:
+    case 64: return TX_64X64; break;
+    case 32: return TX_32X32; break;
+    case 16: return TX_16X16; break;
+    case 8: return TX_8X8; break;
+    default: return TX_4X4;
+  }
+}
+
+static INLINE TX_SIZE get_tx_size(int width, int height) {
+  if (width == height) {
+    return get_sqr_tx_size(width);
+  }
+  if (width < height) {
+    if (width + width == height) {
+      switch (width) {
+        case 4: return TX_4X8; break;
+        case 8: return TX_8X16; break;
+        case 16: return TX_16X32; break;
+        case 32: return TX_32X64; break;
+      }
+    } else {
+      switch (width) {
+        case 4: return TX_4X16; break;
+        case 8: return TX_8X32; break;
+        case 16: return TX_16X64; break;
+      }
+    }
+  } else {
+    if (height + height == width) {
+      switch (height) {
+        case 4: return TX_8X4; break;
+        case 8: return TX_16X8; break;
+        case 16: return TX_32X16; break;
+        case 32: return TX_64X32; break;
+      }
+    } else {
+      switch (height) {
+        case 4: return TX_16X4; break;
+        case 8: return TX_32X8; break;
+        case 16: return TX_64X16; break;
+      }
+    }
+  }
+  assert(0);
+  return TX_4X4;
+}
+
+static INLINE int txfm_partition_context(const TXFM_CONTEXT *const above_ctx,
+                                         const TXFM_CONTEXT *const left_ctx,
+                                         BLOCK_SIZE bsize, TX_SIZE tx_size) {
+  const uint8_t txw = tx_size_wide[tx_size];
+  const uint8_t txh = tx_size_high[tx_size];
+  const int above = *above_ctx < txw;
+  const int left = *left_ctx < txh;
+  int category = TXFM_PARTITION_CONTEXTS;
+
+  // dummy return, not used by others.
+  if (tx_size <= TX_4X4) return 0;
+
+  TX_SIZE max_tx_size =
+      get_sqr_tx_size(AOMMAX(block_size_wide[bsize], block_size_high[bsize]));
+
+  if (max_tx_size >= TX_8X8) {
+    category =
+        (txsize_sqr_up_map[tx_size] != max_tx_size && max_tx_size > TX_8X8) +
+        (TX_SIZES - 1 - max_tx_size) * 2;
+  }
+  assert(category != TXFM_PARTITION_CONTEXTS);
+  return category * 3 + above + left;
+}
+
+// Compute the next partition in the direction of the sb_type stored in the mi
+// array, starting with bsize.
+static INLINE PARTITION_TYPE get_partition(const AV1_COMMON *const cm,
+                                           int mi_row, int mi_col,
+                                           BLOCK_SIZE bsize) {
+  if (mi_row >= cm->mi_rows || mi_col >= cm->mi_cols) return PARTITION_INVALID;
+
+  const int offset = mi_row * cm->mi_stride + mi_col;
+  MB_MODE_INFO **mi = cm->mi_grid_visible + offset;
+  const BLOCK_SIZE subsize = mi[0]->sb_type;
+
+  if (subsize == bsize) return PARTITION_NONE;
+
+  const int bhigh = mi_size_high[bsize];
+  const int bwide = mi_size_wide[bsize];
+  const int sshigh = mi_size_high[subsize];
+  const int sswide = mi_size_wide[subsize];
+
+  if (bsize > BLOCK_8X8 && mi_row + bwide / 2 < cm->mi_rows &&
+      mi_col + bhigh / 2 < cm->mi_cols) {
+    // In this case, the block might be using an extended partition
+    // type.
+    const MB_MODE_INFO *const mbmi_right = mi[bwide / 2];
+    const MB_MODE_INFO *const mbmi_below = mi[bhigh / 2 * cm->mi_stride];
+
+    if (sswide == bwide) {
+      // Smaller height but same width. Is PARTITION_HORZ_4, PARTITION_HORZ or
+      // PARTITION_HORZ_B. To distinguish the latter two, check if the lower
+      // half was split.
+      if (sshigh * 4 == bhigh) return PARTITION_HORZ_4;
+      assert(sshigh * 2 == bhigh);
+
+      if (mbmi_below->sb_type == subsize)
+        return PARTITION_HORZ;
+      else
+        return PARTITION_HORZ_B;
+    } else if (sshigh == bhigh) {
+      // Smaller width but same height. Is PARTITION_VERT_4, PARTITION_VERT or
+      // PARTITION_VERT_B. To distinguish the latter two, check if the right
+      // half was split.
+      if (sswide * 4 == bwide) return PARTITION_VERT_4;
+      assert(sswide * 2 == bhigh);
+
+      if (mbmi_right->sb_type == subsize)
+        return PARTITION_VERT;
+      else
+        return PARTITION_VERT_B;
+    } else {
+      // Smaller width and smaller height. Might be PARTITION_SPLIT or could be
+      // PARTITION_HORZ_A or PARTITION_VERT_A. If subsize isn't halved in both
+      // dimensions, we immediately know this is a split (which will recurse to
+      // get to subsize). Otherwise look down and to the right. With
+      // PARTITION_VERT_A, the right block will have height bhigh; with
+      // PARTITION_HORZ_A, the lower block with have width bwide. Otherwise
+      // it's PARTITION_SPLIT.
+      if (sswide * 2 != bwide || sshigh * 2 != bhigh) return PARTITION_SPLIT;
+
+      if (mi_size_wide[mbmi_below->sb_type] == bwide) return PARTITION_HORZ_A;
+      if (mi_size_high[mbmi_right->sb_type] == bhigh) return PARTITION_VERT_A;
+
+      return PARTITION_SPLIT;
+    }
+  }
+  const int vert_split = sswide < bwide;
+  const int horz_split = sshigh < bhigh;
+  const int split_idx = (vert_split << 1) | horz_split;
+  assert(split_idx != 0);
+
+  static const PARTITION_TYPE base_partitions[4] = {
+    PARTITION_INVALID, PARTITION_HORZ, PARTITION_VERT, PARTITION_SPLIT
+  };
+
+  return base_partitions[split_idx];
+}
+
+static INLINE void set_sb_size(SequenceHeader *const seq_params,
+                               BLOCK_SIZE sb_size) {
+  seq_params->sb_size = sb_size;
+  seq_params->mib_size = mi_size_wide[seq_params->sb_size];
+  seq_params->mib_size_log2 = mi_size_wide_log2[seq_params->sb_size];
+}
+
+// Returns true if the frame is fully lossless at the coded resolution.
+// Note: If super-resolution is used, such a frame will still NOT be lossless at
+// the upscaled resolution.
+static INLINE int is_coded_lossless(const AV1_COMMON *cm,
+                                    const MACROBLOCKD *xd) {
+  int coded_lossless = 1;
+  if (cm->seg.enabled) {
+    for (int i = 0; i < MAX_SEGMENTS; ++i) {
+      if (!xd->lossless[i]) {
+        coded_lossless = 0;
+        break;
+      }
+    }
+  } else {
+    coded_lossless = xd->lossless[0];
+  }
+  return coded_lossless;
+}
+
+static INLINE int is_valid_seq_level_idx(uint8_t seq_level_idx) {
+  return seq_level_idx < 24 || seq_level_idx == 31;
+}
+
+static INLINE uint8_t major_minor_to_seq_level_idx(BitstreamLevel bl) {
+  assert(bl.major >= LEVEL_MAJOR_MIN && bl.major <= LEVEL_MAJOR_MAX);
+  // Since bl.minor is unsigned a comparison will return a warning:
+  // comparison is always true due to limited range of data type
+  assert(LEVEL_MINOR_MIN == 0);
+  assert(bl.minor <= LEVEL_MINOR_MAX);
+  return ((bl.major - LEVEL_MAJOR_MIN) << LEVEL_MINOR_BITS) + bl.minor;
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_ONYXC_INT_H_

diff --git a/libav1/av1/common/pred_common.c b/libav1/av1/common/pred_common.c
new file mode 100644
index 0000000..5952441
--- /dev/null
+++ b/libav1/av1/common/pred_common.c

@@ -0,0 +1,501 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "av1/common/common.h"
+#include "av1/common/pred_common.h"
+#include "av1/common/reconinter.h"
+#include "av1/common/reconintra.h"
+#include "av1/common/seg_common.h"
+
+// Returns a context number for the given MB prediction signal
+static InterpFilter get_ref_filter_type(const MB_MODE_INFO *ref_mbmi,
+                                        const MACROBLOCKD *xd, int dir,
+                                        MV_REFERENCE_FRAME ref_frame) {
+  (void)xd;
+
+  return ((ref_mbmi->ref_frame[0] == ref_frame ||
+           ref_mbmi->ref_frame[1] == ref_frame)
+              ? av1_extract_interp_filter(ref_mbmi->interp_filters, dir & 0x01)
+              : SWITCHABLE_FILTERS);
+}
+
+int av1_get_pred_context_switchable_interp(const MACROBLOCKD *xd, int dir) {
+  const MB_MODE_INFO *const mbmi = xd->mi[0];
+  const int ctx_offset =
+      (mbmi->ref_frame[1] > INTRA_FRAME) * INTER_FILTER_COMP_OFFSET;
+  assert(dir == 0 || dir == 1);
+  const MV_REFERENCE_FRAME ref_frame = mbmi->ref_frame[0];
+  // Note:
+  // The mode info data structure has a one element border above and to the
+  // left of the entries corresponding to real macroblocks.
+  // The prediction flags in these dummy entries are initialized to 0.
+  int filter_type_ctx = ctx_offset + (dir & 0x01) * INTER_FILTER_DIR_OFFSET;
+  int left_type = SWITCHABLE_FILTERS;
+  int above_type = SWITCHABLE_FILTERS;
+
+  if (xd->left_available)
+    left_type = get_ref_filter_type(xd->mi[-1], xd, dir, ref_frame);
+
+  if (xd->up_available)
+    above_type =
+        get_ref_filter_type(xd->mi[-xd->mi_stride], xd, dir, ref_frame);
+
+  if (left_type == above_type) {
+    filter_type_ctx += left_type;
+  } else if (left_type == SWITCHABLE_FILTERS) {
+    assert(above_type != SWITCHABLE_FILTERS);
+    filter_type_ctx += above_type;
+  } else if (above_type == SWITCHABLE_FILTERS) {
+    assert(left_type != SWITCHABLE_FILTERS);
+    filter_type_ctx += left_type;
+  } else {
+    filter_type_ctx += SWITCHABLE_FILTERS;
+  }
+
+  return filter_type_ctx;
+}
+
+static void palette_add_to_cache(uint16_t *cache, int *n, uint16_t val) {
+  // Do not add an already existing value
+  if (*n > 0 && val == cache[*n - 1]) return;
+
+  cache[(*n)++] = val;
+}
+
+int av1_get_palette_cache(const MACROBLOCKD *const xd, int plane,
+                          uint16_t *cache) {
+  const int row = -xd->mb_to_top_edge >> 3;
+  // Do not refer to above SB row when on SB boundary.
+  const MB_MODE_INFO *const above_mi =
+      (row % (1 << MIN_SB_SIZE_LOG2)) ? xd->above_mbmi : NULL;
+  const MB_MODE_INFO *const left_mi = xd->left_mbmi;
+  int above_n = 0, left_n = 0;
+  if (above_mi) above_n = above_mi->palette_mode_info.palette_size[plane != 0];
+  if (left_mi) left_n = left_mi->palette_mode_info.palette_size[plane != 0];
+  if (above_n == 0 && left_n == 0) return 0;
+  int above_idx = plane * PALETTE_MAX_SIZE;
+  int left_idx = plane * PALETTE_MAX_SIZE;
+  int n = 0;
+  const uint16_t *above_colors =
+      above_mi ? above_mi->palette_mode_info.palette_colors : NULL;
+  const uint16_t *left_colors =
+      left_mi ? left_mi->palette_mode_info.palette_colors : NULL;
+  // Merge the sorted lists of base colors from above and left to get
+  // combined sorted color cache.
+  while (above_n > 0 && left_n > 0) {
+    uint16_t v_above = above_colors[above_idx];
+    uint16_t v_left = left_colors[left_idx];
+    if (v_left < v_above) {
+      palette_add_to_cache(cache, &n, v_left);
+      ++left_idx, --left_n;
+    } else {
+      palette_add_to_cache(cache, &n, v_above);
+      ++above_idx, --above_n;
+      if (v_left == v_above) ++left_idx, --left_n;
+    }
+  }
+  while (above_n-- > 0) {
+    uint16_t val = above_colors[above_idx++];
+    palette_add_to_cache(cache, &n, val);
+  }
+  while (left_n-- > 0) {
+    uint16_t val = left_colors[left_idx++];
+    palette_add_to_cache(cache, &n, val);
+  }
+  assert(n <= 2 * PALETTE_MAX_SIZE);
+  return n;
+}
+
+// The mode info data structure has a one element border above and to the
+// left of the entries corresponding to real macroblocks.
+// The prediction flags in these dummy entries are initialized to 0.
+// 0 - inter/inter, inter/--, --/inter, --/--
+// 1 - intra/inter, inter/intra
+// 2 - intra/--, --/intra
+// 3 - intra/intra
+int av1_get_intra_inter_context(const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *const above_mbmi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mbmi = xd->left_mbmi;
+  const int has_above = xd->up_available;
+  const int has_left = xd->left_available;
+
+  if (has_above && has_left) {  // both edges available
+    const int above_intra = !is_inter_block(above_mbmi);
+    const int left_intra = !is_inter_block(left_mbmi);
+    return left_intra && above_intra ? 3 : left_intra || above_intra;
+  } else if (has_above || has_left) {  // one edge available
+    return 2 * !is_inter_block(has_above ? above_mbmi : left_mbmi);
+  } else {
+    return 0;
+  }
+}
+
+#define CHECK_BACKWARD_REFS(ref_frame) \
+  (((ref_frame) >= BWDREF_FRAME) && ((ref_frame) <= ALTREF_FRAME))
+#define IS_BACKWARD_REF_FRAME(ref_frame) CHECK_BACKWARD_REFS(ref_frame)
+
+int av1_get_reference_mode_context(const MACROBLOCKD *xd) {
+  int ctx;
+  const MB_MODE_INFO *const above_mbmi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mbmi = xd->left_mbmi;
+  const int has_above = xd->up_available;
+  const int has_left = xd->left_available;
+
+  // Note:
+  // The mode info data structure has a one element border above and to the
+  // left of the entries corresponding to real macroblocks.
+  // The prediction flags in these dummy entries are initialized to 0.
+  if (has_above && has_left) {  // both edges available
+    if (!has_second_ref(above_mbmi) && !has_second_ref(left_mbmi))
+      // neither edge uses comp pred (0/1)
+      ctx = IS_BACKWARD_REF_FRAME(above_mbmi->ref_frame[0]) ^
+            IS_BACKWARD_REF_FRAME(left_mbmi->ref_frame[0]);
+    else if (!has_second_ref(above_mbmi))
+      // one of two edges uses comp pred (2/3)
+      ctx = 2 + (IS_BACKWARD_REF_FRAME(above_mbmi->ref_frame[0]) ||
+                 !is_inter_block(above_mbmi));
+    else if (!has_second_ref(left_mbmi))
+      // one of two edges uses comp pred (2/3)
+      ctx = 2 + (IS_BACKWARD_REF_FRAME(left_mbmi->ref_frame[0]) ||
+                 !is_inter_block(left_mbmi));
+    else  // both edges use comp pred (4)
+      ctx = 4;
+  } else if (has_above || has_left) {  // one edge available
+    const MB_MODE_INFO *edge_mbmi = has_above ? above_mbmi : left_mbmi;
+
+    if (!has_second_ref(edge_mbmi))
+      // edge does not use comp pred (0/1)
+      ctx = IS_BACKWARD_REF_FRAME(edge_mbmi->ref_frame[0]);
+    else
+      // edge uses comp pred (3)
+      ctx = 3;
+  } else {  // no edges available (1)
+    ctx = 1;
+  }
+  assert(ctx >= 0 && ctx < COMP_INTER_CONTEXTS);
+  return ctx;
+}
+
+int av1_get_comp_reference_type_context(const MACROBLOCKD *xd) {
+  int pred_context;
+  const MB_MODE_INFO *const above_mbmi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mbmi = xd->left_mbmi;
+  const int above_in_image = xd->up_available;
+  const int left_in_image = xd->left_available;
+
+  if (above_in_image && left_in_image) {  // both edges available
+    const int above_intra = !is_inter_block(above_mbmi);
+    const int left_intra = !is_inter_block(left_mbmi);
+
+    if (above_intra && left_intra) {  // intra/intra
+      pred_context = 2;
+    } else if (above_intra || left_intra) {  // intra/inter
+      const MB_MODE_INFO *inter_mbmi = above_intra ? left_mbmi : above_mbmi;
+
+      if (!has_second_ref(inter_mbmi))  // single pred
+        pred_context = 2;
+      else  // comp pred
+        pred_context = 1 + 2 * has_uni_comp_refs(inter_mbmi);
+    } else {  // inter/inter
+      const int a_sg = !has_second_ref(above_mbmi);
+      const int l_sg = !has_second_ref(left_mbmi);
+      const MV_REFERENCE_FRAME frfa = above_mbmi->ref_frame[0];
+      const MV_REFERENCE_FRAME frfl = left_mbmi->ref_frame[0];
+
+      if (a_sg && l_sg) {  // single/single
+        pred_context = 1 + 2 * (!(IS_BACKWARD_REF_FRAME(frfa) ^
+                                  IS_BACKWARD_REF_FRAME(frfl)));
+      } else if (l_sg || a_sg) {  // single/comp
+        const int uni_rfc =
+            a_sg ? has_uni_comp_refs(left_mbmi) : has_uni_comp_refs(above_mbmi);
+
+        if (!uni_rfc)  // comp bidir
+          pred_context = 1;
+        else  // comp unidir
+          pred_context = 3 + (!(IS_BACKWARD_REF_FRAME(frfa) ^
+                                IS_BACKWARD_REF_FRAME(frfl)));
+      } else {  // comp/comp
+        const int a_uni_rfc = has_uni_comp_refs(above_mbmi);
+        const int l_uni_rfc = has_uni_comp_refs(left_mbmi);
+
+        if (!a_uni_rfc && !l_uni_rfc)  // bidir/bidir
+          pred_context = 0;
+        else if (!a_uni_rfc || !l_uni_rfc)  // unidir/bidir
+          pred_context = 2;
+        else  // unidir/unidir
+          pred_context =
+              3 + (!((frfa == BWDREF_FRAME) ^ (frfl == BWDREF_FRAME)));
+      }
+    }
+  } else if (above_in_image || left_in_image) {  // one edge available
+    const MB_MODE_INFO *edge_mbmi = above_in_image ? above_mbmi : left_mbmi;
+
+    if (!is_inter_block(edge_mbmi)) {  // intra
+      pred_context = 2;
+    } else {                           // inter
+      if (!has_second_ref(edge_mbmi))  // single pred
+        pred_context = 2;
+      else  // comp pred
+        pred_context = 4 * has_uni_comp_refs(edge_mbmi);
+    }
+  } else {  // no edges available
+    pred_context = 2;
+  }
+
+  assert(pred_context >= 0 && pred_context < COMP_REF_TYPE_CONTEXTS);
+  return pred_context;
+}
+
+// Returns a context number for the given MB prediction signal
+//
+// Signal the uni-directional compound reference frame pair as either
+// (BWDREF, ALTREF), or (LAST, LAST2) / (LAST, LAST3) / (LAST, GOLDEN),
+// conditioning on the pair is known as uni-directional.
+//
+// 3 contexts: Voting is used to compare the count of forward references with
+//             that of backward references from the spatial neighbors.
+int av1_get_pred_context_uni_comp_ref_p(const MACROBLOCKD *xd) {
+  const uint8_t *const ref_counts = &xd->neighbors_ref_counts[0];
+
+  // Count of forward references (L, L2, L3, or G)
+  const int frf_count = ref_counts[LAST_FRAME] + ref_counts[LAST2_FRAME] +
+                        ref_counts[LAST3_FRAME] + ref_counts[GOLDEN_FRAME];
+  // Count of backward references (B or A)
+  const int brf_count = ref_counts[BWDREF_FRAME] + ref_counts[ALTREF2_FRAME] +
+                        ref_counts[ALTREF_FRAME];
+
+  const int pred_context =
+      (frf_count == brf_count) ? 1 : ((frf_count < brf_count) ? 0 : 2);
+
+  assert(pred_context >= 0 && pred_context < UNI_COMP_REF_CONTEXTS);
+  return pred_context;
+}
+
+// Returns a context number for the given MB prediction signal
+//
+// Signal the uni-directional compound reference frame pair as
+// either (LAST, LAST2), or (LAST, LAST3) / (LAST, GOLDEN),
+// conditioning on the pair is known as one of the above three.
+//
+// 3 contexts: Voting is used to compare the count of LAST2_FRAME with the
+//             total count of LAST3/GOLDEN from the spatial neighbors.
+int av1_get_pred_context_uni_comp_ref_p1(const MACROBLOCKD *xd) {
+  const uint8_t *const ref_counts = &xd->neighbors_ref_counts[0];
+
+  // Count of LAST2
+  const int last2_count = ref_counts[LAST2_FRAME];
+  // Count of LAST3 or GOLDEN
+  const int last3_or_gld_count =
+      ref_counts[LAST3_FRAME] + ref_counts[GOLDEN_FRAME];
+
+  const int pred_context = (last2_count == last3_or_gld_count)
+                               ? 1
+                               : ((last2_count < last3_or_gld_count) ? 0 : 2);
+
+  assert(pred_context >= 0 && pred_context < UNI_COMP_REF_CONTEXTS);
+  return pred_context;
+}
+
+// Returns a context number for the given MB prediction signal
+//
+// Signal the uni-directional compound reference frame pair as
+// either (LAST, LAST3) or (LAST, GOLDEN),
+// conditioning on the pair is known as one of the above two.
+//
+// 3 contexts: Voting is used to compare the count of LAST3_FRAME with the
+//             total count of GOLDEN_FRAME from the spatial neighbors.
+int av1_get_pred_context_uni_comp_ref_p2(const MACROBLOCKD *xd) {
+  const uint8_t *const ref_counts = &xd->neighbors_ref_counts[0];
+
+  // Count of LAST3
+  const int last3_count = ref_counts[LAST3_FRAME];
+  // Count of GOLDEN
+  const int gld_count = ref_counts[GOLDEN_FRAME];
+
+  const int pred_context =
+      (last3_count == gld_count) ? 1 : ((last3_count < gld_count) ? 0 : 2);
+
+  assert(pred_context >= 0 && pred_context < UNI_COMP_REF_CONTEXTS);
+  return pred_context;
+}
+
+// == Common context functions for both comp and single ref ==
+//
+// Obtain contexts to signal a reference frame to be either LAST/LAST2 or
+// LAST3/GOLDEN.
+static int get_pred_context_ll2_or_l3gld(const MACROBLOCKD *xd) {
+  const uint8_t *const ref_counts = &xd->neighbors_ref_counts[0];
+
+  // Count of LAST + LAST2
+  const int last_last2_count = ref_counts[LAST_FRAME] + ref_counts[LAST2_FRAME];
+  // Count of LAST3 + GOLDEN
+  const int last3_gld_count =
+      ref_counts[LAST3_FRAME] + ref_counts[GOLDEN_FRAME];
+
+  const int pred_context = (last_last2_count == last3_gld_count)
+                               ? 1
+                               : ((last_last2_count < last3_gld_count) ? 0 : 2);
+
+  assert(pred_context >= 0 && pred_context < REF_CONTEXTS);
+  return pred_context;
+}
+
+// Obtain contexts to signal a reference frame to be either LAST or LAST2.
+static int get_pred_context_last_or_last2(const MACROBLOCKD *xd) {
+  const uint8_t *const ref_counts = &xd->neighbors_ref_counts[0];
+
+  // Count of LAST
+  const int last_count = ref_counts[LAST_FRAME];
+  // Count of LAST2
+  const int last2_count = ref_counts[LAST2_FRAME];
+
+  const int pred_context =
+      (last_count == last2_count) ? 1 : ((last_count < last2_count) ? 0 : 2);
+
+  assert(pred_context >= 0 && pred_context < REF_CONTEXTS);
+  return pred_context;
+}
+
+// Obtain contexts to signal a reference frame to be either LAST3 or GOLDEN.
+static int get_pred_context_last3_or_gld(const MACROBLOCKD *xd) {
+  const uint8_t *const ref_counts = &xd->neighbors_ref_counts[0];
+
+  // Count of LAST3
+  const int last3_count = ref_counts[LAST3_FRAME];
+  // Count of GOLDEN
+  const int gld_count = ref_counts[GOLDEN_FRAME];
+
+  const int pred_context =
+      (last3_count == gld_count) ? 1 : ((last3_count < gld_count) ? 0 : 2);
+
+  assert(pred_context >= 0 && pred_context < REF_CONTEXTS);
+  return pred_context;
+}
+
+// Obtain contexts to signal a reference frame be either BWDREF/ALTREF2, or
+// ALTREF.
+static int get_pred_context_brfarf2_or_arf(const MACROBLOCKD *xd) {
+  const uint8_t *const ref_counts = &xd->neighbors_ref_counts[0];
+
+  // Counts of BWDREF, ALTREF2, or ALTREF frames (B, A2, or A)
+  const int brfarf2_count =
+      ref_counts[BWDREF_FRAME] + ref_counts[ALTREF2_FRAME];
+  const int arf_count = ref_counts[ALTREF_FRAME];
+
+  const int pred_context =
+      (brfarf2_count == arf_count) ? 1 : ((brfarf2_count < arf_count) ? 0 : 2);
+
+  assert(pred_context >= 0 && pred_context < REF_CONTEXTS);
+  return pred_context;
+}
+
+// Obtain contexts to signal a reference frame be either BWDREF or ALTREF2.
+static int get_pred_context_brf_or_arf2(const MACROBLOCKD *xd) {
+  const uint8_t *const ref_counts = &xd->neighbors_ref_counts[0];
+
+  // Count of BWDREF frames (B)
+  const int brf_count = ref_counts[BWDREF_FRAME];
+  // Count of ALTREF2 frames (A2)
+  const int arf2_count = ref_counts[ALTREF2_FRAME];
+
+  const int pred_context =
+      (brf_count == arf2_count) ? 1 : ((brf_count < arf2_count) ? 0 : 2);
+
+  assert(pred_context >= 0 && pred_context < REF_CONTEXTS);
+  return pred_context;
+}
+
+// == Context functions for comp ref ==
+//
+// Returns a context number for the given MB prediction signal
+// Signal the first reference frame for a compound mode be either
+// GOLDEN/LAST3, or LAST/LAST2.
+int av1_get_pred_context_comp_ref_p(const MACROBLOCKD *xd) {
+  return get_pred_context_ll2_or_l3gld(xd);
+}
+
+// Returns a context number for the given MB prediction signal
+// Signal the first reference frame for a compound mode be LAST,
+// conditioning on that it is known either LAST/LAST2.
+int av1_get_pred_context_comp_ref_p1(const MACROBLOCKD *xd) {
+  return get_pred_context_last_or_last2(xd);
+}
+
+// Returns a context number for the given MB prediction signal
+// Signal the first reference frame for a compound mode be GOLDEN,
+// conditioning on that it is known either GOLDEN or LAST3.
+int av1_get_pred_context_comp_ref_p2(const MACROBLOCKD *xd) {
+  return get_pred_context_last3_or_gld(xd);
+}
+
+// Signal the 2nd reference frame for a compound mode be either
+// ALTREF, or ALTREF2/BWDREF.
+int av1_get_pred_context_comp_bwdref_p(const MACROBLOCKD *xd) {
+  return get_pred_context_brfarf2_or_arf(xd);
+}
+
+// Signal the 2nd reference frame for a compound mode be either
+// ALTREF2 or BWDREF.
+int av1_get_pred_context_comp_bwdref_p1(const MACROBLOCKD *xd) {
+  return get_pred_context_brf_or_arf2(xd);
+}
+
+// == Context functions for single ref ==
+//
+// For the bit to signal whether the single reference is a forward reference
+// frame or a backward reference frame.
+int av1_get_pred_context_single_ref_p1(const MACROBLOCKD *xd) {
+  const uint8_t *const ref_counts = &xd->neighbors_ref_counts[0];
+
+  // Count of forward reference frames
+  const int fwd_count = ref_counts[LAST_FRAME] + ref_counts[LAST2_FRAME] +
+                        ref_counts[LAST3_FRAME] + ref_counts[GOLDEN_FRAME];
+  // Count of backward reference frames
+  const int bwd_count = ref_counts[BWDREF_FRAME] + ref_counts[ALTREF2_FRAME] +
+                        ref_counts[ALTREF_FRAME];
+
+  const int pred_context =
+      (fwd_count == bwd_count) ? 1 : ((fwd_count < bwd_count) ? 0 : 2);
+
+  assert(pred_context >= 0 && pred_context < REF_CONTEXTS);
+  return pred_context;
+}
+
+// For the bit to signal whether the single reference is ALTREF_FRAME or
+// non-ALTREF backward reference frame, knowing that it shall be either of
+// these 2 choices.
+int av1_get_pred_context_single_ref_p2(const MACROBLOCKD *xd) {
+  return get_pred_context_brfarf2_or_arf(xd);
+}
+
+// For the bit to signal whether the single reference is LAST3/GOLDEN or
+// LAST2/LAST, knowing that it shall be either of these 2 choices.
+int av1_get_pred_context_single_ref_p3(const MACROBLOCKD *xd) {
+  return get_pred_context_ll2_or_l3gld(xd);
+}
+
+// For the bit to signal whether the single reference is LAST2_FRAME or
+// LAST_FRAME, knowing that it shall be either of these 2 choices.
+int av1_get_pred_context_single_ref_p4(const MACROBLOCKD *xd) {
+  return get_pred_context_last_or_last2(xd);
+}
+
+// For the bit to signal whether the single reference is GOLDEN_FRAME or
+// LAST3_FRAME, knowing that it shall be either of these 2 choices.
+int av1_get_pred_context_single_ref_p5(const MACROBLOCKD *xd) {
+  return get_pred_context_last3_or_gld(xd);
+}
+
+// For the bit to signal whether the single reference is ALTREF2_FRAME or
+// BWDREF_FRAME, knowing that it shall be either of these 2 choices.
+int av1_get_pred_context_single_ref_p6(const MACROBLOCKD *xd) {
+  return get_pred_context_brf_or_arf2(xd);
+}

diff --git a/libav1/av1/common/pred_common.h b/libav1/av1/common/pred_common.h
new file mode 100644
index 0000000..d9b30a9
--- /dev/null
+++ b/libav1/av1/common/pred_common.h

@@ -0,0 +1,364 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_PRED_COMMON_H_
+#define AOM_AV1_COMMON_PRED_COMMON_H_
+
+#include "av1/common/blockd.h"
+#include "av1/common/mvref_common.h"
+#include "av1/common/onyxc_int.h"
+#include "aom_dsp/aom_dsp_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+static INLINE int get_segment_id(const AV1_COMMON *const cm,
+                                 const uint8_t *segment_ids, BLOCK_SIZE bsize,
+                                 int mi_row, int mi_col) {
+  const int mi_offset = mi_row * cm->mi_cols + mi_col;
+  const int bw = mi_size_wide[bsize];
+  const int bh = mi_size_high[bsize];
+  const int xmis = AOMMIN(cm->mi_cols - mi_col, bw);
+  const int ymis = AOMMIN(cm->mi_rows - mi_row, bh);
+  int x, y, segment_id = MAX_SEGMENTS;
+
+  for (y = 0; y < ymis; ++y)
+    for (x = 0; x < xmis; ++x)
+      segment_id =
+          AOMMIN(segment_id, segment_ids[mi_offset + y * cm->mi_cols + x]);
+
+  assert(segment_id >= 0 && segment_id < MAX_SEGMENTS);
+  return segment_id;
+}
+
+static INLINE int av1_get_spatial_seg_pred(const AV1_COMMON *const cm,
+                                           const MACROBLOCKD *const xd,
+                                           int mi_row, int mi_col,
+                                           int *cdf_index) {
+  int prev_ul = -1;  // top left segment_id
+  int prev_l = -1;   // left segment_id
+  int prev_u = -1;   // top segment_id
+  if ((xd->up_available) && (xd->left_available)) {
+    prev_ul = get_segment_id(cm, cm->cur_frame->seg_map, BLOCK_4X4, mi_row - 1,
+                             mi_col - 1);
+  }
+  if (xd->up_available) {
+    prev_u = get_segment_id(cm, cm->cur_frame->seg_map, BLOCK_4X4, mi_row - 1,
+                            mi_col - 0);
+  }
+  if (xd->left_available) {
+    prev_l = get_segment_id(cm, cm->cur_frame->seg_map, BLOCK_4X4, mi_row - 0,
+                            mi_col - 1);
+  }
+  // This property follows from the fact that get_segment_id() returns a
+  // nonnegative value. This allows us to test for all edge cases with a simple
+  // prev_ul < 0 check.
+  assert(IMPLIES(prev_ul >= 0, prev_u >= 0 && prev_l >= 0));
+
+  // Pick CDF index based on number of matching/out-of-bounds segment IDs.
+  if (prev_ul < 0) /* Edge cases */
+    *cdf_index = 0;
+  else if ((prev_ul == prev_u) && (prev_ul == prev_l))
+    *cdf_index = 2;
+  else if ((prev_ul == prev_u) || (prev_ul == prev_l) || (prev_u == prev_l))
+    *cdf_index = 1;
+  else
+    *cdf_index = 0;
+
+  // If 2 or more are identical returns that as predictor, otherwise prev_l.
+  if (prev_u == -1)  // edge case
+    return prev_l == -1 ? 0 : prev_l;
+  if (prev_l == -1)  // edge case
+    return prev_u;
+  return (prev_ul == prev_u) ? prev_u : prev_l;
+}
+
+static INLINE int av1_get_pred_context_seg_id(const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *const above_mi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mi = xd->left_mbmi;
+  const int above_sip = (above_mi != NULL) ? above_mi->seg_id_predicted : 0;
+  const int left_sip = (left_mi != NULL) ? left_mi->seg_id_predicted : 0;
+
+  return above_sip + left_sip;
+}
+
+static INLINE int get_comp_index_context(const AV1_COMMON *cm,
+                                         const MACROBLOCKD *xd) {
+  MB_MODE_INFO *mbmi = xd->mi[0];
+  const RefCntBuffer *const bck_buf = get_ref_frame_buf(cm, mbmi->ref_frame[0]);
+  const RefCntBuffer *const fwd_buf = get_ref_frame_buf(cm, mbmi->ref_frame[1]);
+  int bck_frame_index = 0, fwd_frame_index = 0;
+  int cur_frame_index = cm->cur_frame->order_hint;
+
+  if (bck_buf != NULL) bck_frame_index = bck_buf->order_hint;
+  if (fwd_buf != NULL) fwd_frame_index = fwd_buf->order_hint;
+
+  int fwd = abs(get_relative_dist(&cm->seq_params.order_hint_info,
+                                  fwd_frame_index, cur_frame_index));
+  int bck = abs(get_relative_dist(&cm->seq_params.order_hint_info,
+                                  cur_frame_index, bck_frame_index));
+
+  const MB_MODE_INFO *const above_mi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mi = xd->left_mbmi;
+
+  int above_ctx = 0, left_ctx = 0;
+  const int offset = (fwd == bck);
+
+  if (above_mi != NULL) {
+    if (has_second_ref(above_mi))
+      above_ctx = above_mi->compound_idx;
+    else if (above_mi->ref_frame[0] == ALTREF_FRAME)
+      above_ctx = 1;
+  }
+
+  if (left_mi != NULL) {
+    if (has_second_ref(left_mi))
+      left_ctx = left_mi->compound_idx;
+    else if (left_mi->ref_frame[0] == ALTREF_FRAME)
+      left_ctx = 1;
+  }
+
+  return above_ctx + left_ctx + 3 * offset;
+}
+
+static INLINE int get_comp_group_idx_context(const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *const above_mi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mi = xd->left_mbmi;
+  int above_ctx = 0, left_ctx = 0;
+
+  if (above_mi) {
+    if (has_second_ref(above_mi))
+      above_ctx = above_mi->comp_group_idx;
+    else if (above_mi->ref_frame[0] == ALTREF_FRAME)
+      above_ctx = 3;
+  }
+  if (left_mi) {
+    if (has_second_ref(left_mi))
+      left_ctx = left_mi->comp_group_idx;
+    else if (left_mi->ref_frame[0] == ALTREF_FRAME)
+      left_ctx = 3;
+  }
+
+  return AOMMIN(5, above_ctx + left_ctx);
+}
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_seg_id(
+    struct segmentation_probs *segp, const MACROBLOCKD *xd) {
+  return segp->pred_cdf[av1_get_pred_context_seg_id(xd)];
+}
+
+static INLINE int av1_get_skip_mode_context(const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *const above_mi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mi = xd->left_mbmi;
+  const int above_skip_mode = above_mi ? above_mi->skip_mode : 0;
+  const int left_skip_mode = left_mi ? left_mi->skip_mode : 0;
+  return above_skip_mode + left_skip_mode;
+}
+
+static INLINE int av1_get_skip_context(const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *const above_mi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mi = xd->left_mbmi;
+  const int above_skip = above_mi ? above_mi->skip : 0;
+  const int left_skip = left_mi ? left_mi->skip : 0;
+  return above_skip + left_skip;
+}
+
+int av1_get_pred_context_switchable_interp(const MACROBLOCKD *xd, int dir);
+
+// Get a list of palette base colors that are used in the above and left blocks,
+// referred to as "color cache". The return value is the number of colors in the
+// cache (<= 2 * PALETTE_MAX_SIZE). The color values are stored in "cache"
+// in ascending order.
+int av1_get_palette_cache(const MACROBLOCKD *const xd, int plane,
+                          uint16_t *cache);
+
+static INLINE int av1_get_palette_bsize_ctx(BLOCK_SIZE bsize) {
+  return num_pels_log2_lookup[bsize] - num_pels_log2_lookup[BLOCK_8X8];
+}
+
+static INLINE int av1_get_palette_mode_ctx(const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *const above_mi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mi = xd->left_mbmi;
+  int ctx = 0;
+  if (above_mi) ctx += (above_mi->palette_mode_info.palette_size[0] > 0);
+  if (left_mi) ctx += (left_mi->palette_mode_info.palette_size[0] > 0);
+  return ctx;
+}
+
+int av1_get_intra_inter_context(const MACROBLOCKD *xd);
+
+int av1_get_reference_mode_context(const MACROBLOCKD *xd);
+
+static INLINE aom_cdf_prob *av1_get_reference_mode_cdf(const MACROBLOCKD *xd) {
+  return xd->tile_ctx->comp_inter_cdf[av1_get_reference_mode_context(xd)];
+}
+
+int av1_get_comp_reference_type_context(const MACROBLOCKD *xd);
+
+// == Uni-directional contexts ==
+
+int av1_get_pred_context_uni_comp_ref_p(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_uni_comp_ref_p1(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_uni_comp_ref_p2(const MACROBLOCKD *xd);
+
+static INLINE aom_cdf_prob *av1_get_comp_reference_type_cdf(
+    const MACROBLOCKD *xd) {
+  const int pred_context = av1_get_comp_reference_type_context(xd);
+  return xd->tile_ctx->comp_ref_type_cdf[pred_context];
+}
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_uni_comp_ref_p(
+    const MACROBLOCKD *xd) {
+  const int pred_context = av1_get_pred_context_uni_comp_ref_p(xd);
+  return xd->tile_ctx->uni_comp_ref_cdf[pred_context][0];
+}
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_uni_comp_ref_p1(
+    const MACROBLOCKD *xd) {
+  const int pred_context = av1_get_pred_context_uni_comp_ref_p1(xd);
+  return xd->tile_ctx->uni_comp_ref_cdf[pred_context][1];
+}
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_uni_comp_ref_p2(
+    const MACROBLOCKD *xd) {
+  const int pred_context = av1_get_pred_context_uni_comp_ref_p2(xd);
+  return xd->tile_ctx->uni_comp_ref_cdf[pred_context][2];
+}
+
+// == Bi-directional contexts ==
+
+int av1_get_pred_context_comp_ref_p(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_comp_ref_p1(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_comp_ref_p2(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_comp_bwdref_p(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_comp_bwdref_p1(const MACROBLOCKD *xd);
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_comp_ref_p(const MACROBLOCKD *xd) {
+  const int pred_context = av1_get_pred_context_comp_ref_p(xd);
+  return xd->tile_ctx->comp_ref_cdf[pred_context][0];
+}
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_comp_ref_p1(
+    const MACROBLOCKD *xd) {
+  const int pred_context = av1_get_pred_context_comp_ref_p1(xd);
+  return xd->tile_ctx->comp_ref_cdf[pred_context][1];
+}
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_comp_ref_p2(
+    const MACROBLOCKD *xd) {
+  const int pred_context = av1_get_pred_context_comp_ref_p2(xd);
+  return xd->tile_ctx->comp_ref_cdf[pred_context][2];
+}
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_comp_bwdref_p(
+    const MACROBLOCKD *xd) {
+  const int pred_context = av1_get_pred_context_comp_bwdref_p(xd);
+  return xd->tile_ctx->comp_bwdref_cdf[pred_context][0];
+}
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_comp_bwdref_p1(
+    const MACROBLOCKD *xd) {
+  const int pred_context = av1_get_pred_context_comp_bwdref_p1(xd);
+  return xd->tile_ctx->comp_bwdref_cdf[pred_context][1];
+}
+
+// == Single contexts ==
+
+int av1_get_pred_context_single_ref_p1(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_single_ref_p2(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_single_ref_p3(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_single_ref_p4(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_single_ref_p5(const MACROBLOCKD *xd);
+
+int av1_get_pred_context_single_ref_p6(const MACROBLOCKD *xd);
+
+static INLINE aom_cdf_prob *av1_get_pred_cdf_single_ref_p1(
+    const MACROBLOCKD *xd) {
+  return xd->tile_ctx
+      ->single_ref_cdf[av1_get_pred_context_single_ref_p1(xd)][0];
+}
+static INLINE aom_cdf_prob *av1_get_pred_cdf_single_ref_p2(
+    const MACROBLOCKD *xd) {
+  return xd->tile_ctx
+      ->single_ref_cdf[av1_get_pred_context_single_ref_p2(xd)][1];
+}
+static INLINE aom_cdf_prob *av1_get_pred_cdf_single_ref_p3(
+    const MACROBLOCKD *xd) {
+  return xd->tile_ctx
+      ->single_ref_cdf[av1_get_pred_context_single_ref_p3(xd)][2];
+}
+static INLINE aom_cdf_prob *av1_get_pred_cdf_single_ref_p4(
+    const MACROBLOCKD *xd) {
+  return xd->tile_ctx
+      ->single_ref_cdf[av1_get_pred_context_single_ref_p4(xd)][3];
+}
+static INLINE aom_cdf_prob *av1_get_pred_cdf_single_ref_p5(
+    const MACROBLOCKD *xd) {
+  return xd->tile_ctx
+      ->single_ref_cdf[av1_get_pred_context_single_ref_p5(xd)][4];
+}
+static INLINE aom_cdf_prob *av1_get_pred_cdf_single_ref_p6(
+    const MACROBLOCKD *xd) {
+  return xd->tile_ctx
+      ->single_ref_cdf[av1_get_pred_context_single_ref_p6(xd)][5];
+}
+
+// Returns a context number for the given MB prediction signal
+// The mode info data structure has a one element border above and to the
+// left of the entries corresponding to real blocks.
+// The prediction flags in these dummy entries are initialized to 0.
+static INLINE int get_tx_size_context(const MACROBLOCKD *xd) {
+  const MB_MODE_INFO *mbmi = xd->mi[0];
+  const MB_MODE_INFO *const above_mbmi = xd->above_mbmi;
+  const MB_MODE_INFO *const left_mbmi = xd->left_mbmi;
+  const TX_SIZE max_tx_size = max_txsize_rect_lookup[mbmi->sb_type];
+  const int max_tx_wide = tx_size_wide[max_tx_size];
+  const int max_tx_high = tx_size_high[max_tx_size];
+  const int has_above = xd->up_available;
+  const int has_left = xd->left_available;
+
+  int above = xd->above_txfm_context[0] >= max_tx_wide;
+  int left = xd->left_txfm_context[0] >= max_tx_high;
+
+  if (has_above)
+    if (is_inter_block(above_mbmi))
+      above = block_size_wide[above_mbmi->sb_type] >= max_tx_wide;
+
+  if (has_left)
+    if (is_inter_block(left_mbmi))
+      left = block_size_high[left_mbmi->sb_type] >= max_tx_high;
+
+  if (has_above && has_left)
+    return (above + left);
+  else if (has_above)
+    return above;
+  else if (has_left)
+    return left;
+  else
+    return 0;
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_PRED_COMMON_H_

diff --git a/libav1/av1/common/quant_common.c b/libav1/av1/common/quant_common.c
new file mode 100644
index 0000000..d4bdb98
--- /dev/null
+++ b/libav1/av1/common/quant_common.c

@@ -0,0 +1,12845 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "av1/common/common.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/entropy.h"
+#include "av1/common/quant_common.h"
+#include "av1/common/seg_common.h"
+#include "av1/common/blockd.h"
+
+static const int16_t dc_qlookup_Q3[QINDEX_RANGE] = {
+  4,    8,    8,    9,    10,  11,  12,  12,  13,  14,  15,   16,   17,   18,
+  19,   19,   20,   21,   22,  23,  24,  25,  26,  26,  27,   28,   29,   30,
+  31,   32,   32,   33,   34,  35,  36,  37,  38,  38,  39,   40,   41,   42,
+  43,   43,   44,   45,   46,  47,  48,  48,  49,  50,  51,   52,   53,   53,
+  54,   55,   56,   57,   57,  58,  59,  60,  61,  62,  62,   63,   64,   65,
+  66,   66,   67,   68,   69,  70,  70,  71,  72,  73,  74,   74,   75,   76,
+  77,   78,   78,   79,   80,  81,  81,  82,  83,  84,  85,   85,   87,   88,
+  90,   92,   93,   95,   96,  98,  99,  101, 102, 104, 105,  107,  108,  110,
+  111,  113,  114,  116,  117, 118, 120, 121, 123, 125, 127,  129,  131,  134,
+  136,  138,  140,  142,  144, 146, 148, 150, 152, 154, 156,  158,  161,  164,
+  166,  169,  172,  174,  177, 180, 182, 185, 187, 190, 192,  195,  199,  202,
+  205,  208,  211,  214,  217, 220, 223, 226, 230, 233, 237,  240,  243,  247,
+  250,  253,  257,  261,  265, 269, 272, 276, 280, 284, 288,  292,  296,  300,
+  304,  309,  313,  317,  322, 326, 330, 335, 340, 344, 349,  354,  359,  364,
+  369,  374,  379,  384,  389, 395, 400, 406, 411, 417, 423,  429,  435,  441,
+  447,  454,  461,  467,  475, 482, 489, 497, 505, 513, 522,  530,  539,  549,
+  559,  569,  579,  590,  602, 614, 626, 640, 654, 668, 684,  700,  717,  736,
+  755,  775,  796,  819,  843, 869, 896, 925, 955, 988, 1022, 1058, 1098, 1139,
+  1184, 1232, 1282, 1336,
+};
+
+static const int16_t dc_qlookup_10_Q3[QINDEX_RANGE] = {
+  4,    9,    10,   13,   15,   17,   20,   22,   25,   28,   31,   34,   37,
+  40,   43,   47,   50,   53,   57,   60,   64,   68,   71,   75,   78,   82,
+  86,   90,   93,   97,   101,  105,  109,  113,  116,  120,  124,  128,  132,
+  136,  140,  143,  147,  151,  155,  159,  163,  166,  170,  174,  178,  182,
+  185,  189,  193,  197,  200,  204,  208,  212,  215,  219,  223,  226,  230,
+  233,  237,  241,  244,  248,  251,  255,  259,  262,  266,  269,  273,  276,
+  280,  283,  287,  290,  293,  297,  300,  304,  307,  310,  314,  317,  321,
+  324,  327,  331,  334,  337,  343,  350,  356,  362,  369,  375,  381,  387,
+  394,  400,  406,  412,  418,  424,  430,  436,  442,  448,  454,  460,  466,
+  472,  478,  484,  490,  499,  507,  516,  525,  533,  542,  550,  559,  567,
+  576,  584,  592,  601,  609,  617,  625,  634,  644,  655,  666,  676,  687,
+  698,  708,  718,  729,  739,  749,  759,  770,  782,  795,  807,  819,  831,
+  844,  856,  868,  880,  891,  906,  920,  933,  947,  961,  975,  988,  1001,
+  1015, 1030, 1045, 1061, 1076, 1090, 1105, 1120, 1137, 1153, 1170, 1186, 1202,
+  1218, 1236, 1253, 1271, 1288, 1306, 1323, 1342, 1361, 1379, 1398, 1416, 1436,
+  1456, 1476, 1496, 1516, 1537, 1559, 1580, 1601, 1624, 1647, 1670, 1692, 1717,
+  1741, 1766, 1791, 1817, 1844, 1871, 1900, 1929, 1958, 1990, 2021, 2054, 2088,
+  2123, 2159, 2197, 2236, 2276, 2319, 2363, 2410, 2458, 2508, 2561, 2616, 2675,
+  2737, 2802, 2871, 2944, 3020, 3102, 3188, 3280, 3375, 3478, 3586, 3702, 3823,
+  3953, 4089, 4236, 4394, 4559, 4737, 4929, 5130, 5347,
+};
+
+static const int16_t dc_qlookup_12_Q3[QINDEX_RANGE] = {
+  4,     12,    18,    25,    33,    41,    50,    60,    70,    80,    91,
+  103,   115,   127,   140,   153,   166,   180,   194,   208,   222,   237,
+  251,   266,   281,   296,   312,   327,   343,   358,   374,   390,   405,
+  421,   437,   453,   469,   484,   500,   516,   532,   548,   564,   580,
+  596,   611,   627,   643,   659,   674,   690,   706,   721,   737,   752,
+  768,   783,   798,   814,   829,   844,   859,   874,   889,   904,   919,
+  934,   949,   964,   978,   993,   1008,  1022,  1037,  1051,  1065,  1080,
+  1094,  1108,  1122,  1136,  1151,  1165,  1179,  1192,  1206,  1220,  1234,
+  1248,  1261,  1275,  1288,  1302,  1315,  1329,  1342,  1368,  1393,  1419,
+  1444,  1469,  1494,  1519,  1544,  1569,  1594,  1618,  1643,  1668,  1692,
+  1717,  1741,  1765,  1789,  1814,  1838,  1862,  1885,  1909,  1933,  1957,
+  1992,  2027,  2061,  2096,  2130,  2165,  2199,  2233,  2267,  2300,  2334,
+  2367,  2400,  2434,  2467,  2499,  2532,  2575,  2618,  2661,  2704,  2746,
+  2788,  2830,  2872,  2913,  2954,  2995,  3036,  3076,  3127,  3177,  3226,
+  3275,  3324,  3373,  3421,  3469,  3517,  3565,  3621,  3677,  3733,  3788,
+  3843,  3897,  3951,  4005,  4058,  4119,  4181,  4241,  4301,  4361,  4420,
+  4479,  4546,  4612,  4677,  4742,  4807,  4871,  4942,  5013,  5083,  5153,
+  5222,  5291,  5367,  5442,  5517,  5591,  5665,  5745,  5825,  5905,  5984,
+  6063,  6149,  6234,  6319,  6404,  6495,  6587,  6678,  6769,  6867,  6966,
+  7064,  7163,  7269,  7376,  7483,  7599,  7715,  7832,  7958,  8085,  8214,
+  8352,  8492,  8635,  8788,  8945,  9104,  9275,  9450,  9639,  9832,  10031,
+  10245, 10465, 10702, 10946, 11210, 11482, 11776, 12081, 12409, 12750, 13118,
+  13501, 13913, 14343, 14807, 15290, 15812, 16356, 16943, 17575, 18237, 18949,
+  19718, 20521, 21387,
+};
+
+static const int16_t ac_qlookup_Q3[QINDEX_RANGE] = {
+  4,    8,    9,    10,   11,   12,   13,   14,   15,   16,   17,   18,   19,
+  20,   21,   22,   23,   24,   25,   26,   27,   28,   29,   30,   31,   32,
+  33,   34,   35,   36,   37,   38,   39,   40,   41,   42,   43,   44,   45,
+  46,   47,   48,   49,   50,   51,   52,   53,   54,   55,   56,   57,   58,
+  59,   60,   61,   62,   63,   64,   65,   66,   67,   68,   69,   70,   71,
+  72,   73,   74,   75,   76,   77,   78,   79,   80,   81,   82,   83,   84,
+  85,   86,   87,   88,   89,   90,   91,   92,   93,   94,   95,   96,   97,
+  98,   99,   100,  101,  102,  104,  106,  108,  110,  112,  114,  116,  118,
+  120,  122,  124,  126,  128,  130,  132,  134,  136,  138,  140,  142,  144,
+  146,  148,  150,  152,  155,  158,  161,  164,  167,  170,  173,  176,  179,
+  182,  185,  188,  191,  194,  197,  200,  203,  207,  211,  215,  219,  223,
+  227,  231,  235,  239,  243,  247,  251,  255,  260,  265,  270,  275,  280,
+  285,  290,  295,  300,  305,  311,  317,  323,  329,  335,  341,  347,  353,
+  359,  366,  373,  380,  387,  394,  401,  408,  416,  424,  432,  440,  448,
+  456,  465,  474,  483,  492,  501,  510,  520,  530,  540,  550,  560,  571,
+  582,  593,  604,  615,  627,  639,  651,  663,  676,  689,  702,  715,  729,
+  743,  757,  771,  786,  801,  816,  832,  848,  864,  881,  898,  915,  933,
+  951,  969,  988,  1007, 1026, 1046, 1066, 1087, 1108, 1129, 1151, 1173, 1196,
+  1219, 1243, 1267, 1292, 1317, 1343, 1369, 1396, 1423, 1451, 1479, 1508, 1537,
+  1567, 1597, 1628, 1660, 1692, 1725, 1759, 1793, 1828,
+};
+
+static const int16_t ac_qlookup_10_Q3[QINDEX_RANGE] = {
+  4,    9,    11,   13,   16,   18,   21,   24,   27,   30,   33,   37,   40,
+  44,   48,   51,   55,   59,   63,   67,   71,   75,   79,   83,   88,   92,
+  96,   100,  105,  109,  114,  118,  122,  127,  131,  136,  140,  145,  149,
+  154,  158,  163,  168,  172,  177,  181,  186,  190,  195,  199,  204,  208,
+  213,  217,  222,  226,  231,  235,  240,  244,  249,  253,  258,  262,  267,
+  271,  275,  280,  284,  289,  293,  297,  302,  306,  311,  315,  319,  324,
+  328,  332,  337,  341,  345,  349,  354,  358,  362,  367,  371,  375,  379,
+  384,  388,  392,  396,  401,  409,  417,  425,  433,  441,  449,  458,  466,
+  474,  482,  490,  498,  506,  514,  523,  531,  539,  547,  555,  563,  571,
+  579,  588,  596,  604,  616,  628,  640,  652,  664,  676,  688,  700,  713,
+  725,  737,  749,  761,  773,  785,  797,  809,  825,  841,  857,  873,  889,
+  905,  922,  938,  954,  970,  986,  1002, 1018, 1038, 1058, 1078, 1098, 1118,
+  1138, 1158, 1178, 1198, 1218, 1242, 1266, 1290, 1314, 1338, 1362, 1386, 1411,
+  1435, 1463, 1491, 1519, 1547, 1575, 1603, 1631, 1663, 1695, 1727, 1759, 1791,
+  1823, 1859, 1895, 1931, 1967, 2003, 2039, 2079, 2119, 2159, 2199, 2239, 2283,
+  2327, 2371, 2415, 2459, 2507, 2555, 2603, 2651, 2703, 2755, 2807, 2859, 2915,
+  2971, 3027, 3083, 3143, 3203, 3263, 3327, 3391, 3455, 3523, 3591, 3659, 3731,
+  3803, 3876, 3952, 4028, 4104, 4184, 4264, 4348, 4432, 4516, 4604, 4692, 4784,
+  4876, 4972, 5068, 5168, 5268, 5372, 5476, 5584, 5692, 5804, 5916, 6032, 6148,
+  6268, 6388, 6512, 6640, 6768, 6900, 7036, 7172, 7312,
+};
+
+static const int16_t ac_qlookup_12_Q3[QINDEX_RANGE] = {
+  4,     13,    19,    27,    35,    44,    54,    64,    75,    87,    99,
+  112,   126,   139,   154,   168,   183,   199,   214,   230,   247,   263,
+  280,   297,   314,   331,   349,   366,   384,   402,   420,   438,   456,
+  475,   493,   511,   530,   548,   567,   586,   604,   623,   642,   660,
+  679,   698,   716,   735,   753,   772,   791,   809,   828,   846,   865,
+  884,   902,   920,   939,   957,   976,   994,   1012,  1030,  1049,  1067,
+  1085,  1103,  1121,  1139,  1157,  1175,  1193,  1211,  1229,  1246,  1264,
+  1282,  1299,  1317,  1335,  1352,  1370,  1387,  1405,  1422,  1440,  1457,
+  1474,  1491,  1509,  1526,  1543,  1560,  1577,  1595,  1627,  1660,  1693,
+  1725,  1758,  1791,  1824,  1856,  1889,  1922,  1954,  1987,  2020,  2052,
+  2085,  2118,  2150,  2183,  2216,  2248,  2281,  2313,  2346,  2378,  2411,
+  2459,  2508,  2556,  2605,  2653,  2701,  2750,  2798,  2847,  2895,  2943,
+  2992,  3040,  3088,  3137,  3185,  3234,  3298,  3362,  3426,  3491,  3555,
+  3619,  3684,  3748,  3812,  3876,  3941,  4005,  4069,  4149,  4230,  4310,
+  4390,  4470,  4550,  4631,  4711,  4791,  4871,  4967,  5064,  5160,  5256,
+  5352,  5448,  5544,  5641,  5737,  5849,  5961,  6073,  6185,  6297,  6410,
+  6522,  6650,  6778,  6906,  7034,  7162,  7290,  7435,  7579,  7723,  7867,
+  8011,  8155,  8315,  8475,  8635,  8795,  8956,  9132,  9308,  9484,  9660,
+  9836,  10028, 10220, 10412, 10604, 10812, 11020, 11228, 11437, 11661, 11885,
+  12109, 12333, 12573, 12813, 13053, 13309, 13565, 13821, 14093, 14365, 14637,
+  14925, 15213, 15502, 15806, 16110, 16414, 16734, 17054, 17390, 17726, 18062,
+  18414, 18766, 19134, 19502, 19886, 20270, 20670, 21070, 21486, 21902, 22334,
+  22766, 23214, 23662, 24126, 24590, 25070, 25551, 26047, 26559, 27071, 27599,
+  28143, 28687, 29247,
+};
+
+// Coefficient scaling and quantization with AV1 TX are tailored to
+// the AV1 TX transforms.  Regardless of the bit-depth of the input,
+// the transform stages scale the coefficient values up by a factor of
+// 8 (3 bits) over the scale of the pixel values.  Thus, for 8-bit
+// input, the coefficients have effectively 11 bits of scale depth
+// (8+3), 10-bit input pixels result in 13-bit coefficient depth
+// (10+3) and 12-bit pixels yield 15-bit (12+3) coefficient depth.
+// All quantizers are built using this invariant of x8, 3-bit scaling,
+// thus the Q3 suffix.
+
+// A partial exception to this rule is large transforms; to avoid
+// overflow, TX blocks with > 256 pels (>16x16) are scaled only
+// 4-times unity (2 bits) over the pixel depth, and TX blocks with
+// over 1024 pixels (>32x32) are scaled up only 2x unity (1 bit).
+// This descaling is found via av1_tx_get_scale().  Thus, 16x32, 32x16
+// and 32x32 transforms actually return Q2 coefficients, and 32x64,
+// 64x32 and 64x64 transforms return Q1 coefficients.  However, the
+// quantizers are de-scaled down on-the-fly by the same amount
+// (av1_tx_get_scale()) during quantization, and as such the
+// dequantized/decoded coefficients, even for large TX blocks, are always
+// effectively Q3. Meanwhile, quantized/coded coefficients are Q0
+// because Qn quantizers are applied to Qn tx coefficients.
+
+// Note that encoder decision making (which uses the quantizer to
+// generate several bespoke lamdas for RDO and other heuristics)
+// expects quantizers to be larger for higher-bitdepth input.  In
+// addition, the minimum allowable quantizer is 4; smaller values will
+// underflow to 0 in the actual quantization routines.
+
+int16_t av1_dc_quant_Q3(int qindex, int delta, aom_bit_depth_t bit_depth) {
+  const int q_clamped = clamp(qindex + delta, 0, MAXQ);
+  switch (bit_depth) {
+    case AOM_BITS_8: return dc_qlookup_Q3[q_clamped];
+    case AOM_BITS_10: return dc_qlookup_10_Q3[q_clamped];
+    case AOM_BITS_12: return dc_qlookup_12_Q3[q_clamped];
+    default:
+      assert(0 && "bit_depth should be AOM_BITS_8, AOM_BITS_10 or AOM_BITS_12");
+      return -1;
+  }
+}
+
+int16_t av1_ac_quant_Q3(int qindex, int delta, aom_bit_depth_t bit_depth) {
+  const int q_clamped = clamp(qindex + delta, 0, MAXQ);
+  switch (bit_depth) {
+    case AOM_BITS_8: return ac_qlookup_Q3[q_clamped];
+    case AOM_BITS_10: return ac_qlookup_10_Q3[q_clamped];
+    case AOM_BITS_12: return ac_qlookup_12_Q3[q_clamped];
+    default:
+      assert(0 && "bit_depth should be AOM_BITS_8, AOM_BITS_10 or AOM_BITS_12");
+      return -1;
+  }
+}
+
+// In AV1 TX, the coefficients are always scaled up a factor of 8 (3
+// bits), so QTX == Q3.
+
+int16_t av1_dc_quant_QTX(int qindex, int delta, aom_bit_depth_t bit_depth) {
+  return av1_dc_quant_Q3(qindex, delta, bit_depth);
+}
+
+int16_t av1_ac_quant_QTX(int qindex, int delta, aom_bit_depth_t bit_depth) {
+  return av1_ac_quant_Q3(qindex, delta, bit_depth);
+}
+
+int av1_get_qindex(const struct segmentation *seg, int segment_id,
+                   int base_qindex) {
+  if (segfeature_active(seg, segment_id, SEG_LVL_ALT_Q)) {
+    const int data = get_segdata(seg, segment_id, SEG_LVL_ALT_Q);
+    const int seg_qindex = base_qindex + data;
+    return clamp(seg_qindex, 0, MAXQ);
+  } else {
+    return base_qindex;
+  }
+}
+
+const qm_val_t *av1_iqmatrix(AV1_COMMON *cm, int qmlevel, int plane,
+                             TX_SIZE tx_size) {
+  return &cm->giqmatrix[qmlevel][plane][tx_size][0];
+}
+const qm_val_t *av1_qmatrix(AV1_COMMON *cm, int qmlevel, int plane,
+                            TX_SIZE tx_size) {
+  return &cm->gqmatrix[qmlevel][plane][tx_size][0];
+}
+
+#define QM_TOTAL_SIZE 3344
+// We only use wt_matrix_ref[q] and iwt_matrix_ref[q]
+// for q = 0, ..., NUM_QM_LEVELS - 2.
+static const qm_val_t wt_matrix_ref[NUM_QM_LEVELS - 1][2][QM_TOTAL_SIZE];
+static const qm_val_t iwt_matrix_ref[NUM_QM_LEVELS - 1][2][QM_TOTAL_SIZE];
+
+void av1_qm_init(AV1_COMMON *cm) {
+  const int num_planes = av1_num_planes(cm);
+  int q, c, t;
+  int current;
+  for (q = 0; q < NUM_QM_LEVELS; ++q) {
+    for (c = 0; c < num_planes; ++c) {
+      current = 0;
+      for (t = 0; t < TX_SIZES_ALL; ++t) {
+        const int size = tx_size_2d[t];
+        const int qm_tx_size = av1_get_adjusted_tx_size(t);
+        if (q == NUM_QM_LEVELS - 1) {
+          cm->gqmatrix[q][c][t] = NULL;
+          cm->giqmatrix[q][c][t] = NULL;
+        } else if (t != qm_tx_size) {  // Reuse matrices for 'qm_tx_size'
+          cm->gqmatrix[q][c][t] = cm->gqmatrix[q][c][qm_tx_size];
+          cm->giqmatrix[q][c][t] = cm->giqmatrix[q][c][qm_tx_size];
+        } else {
+          assert(current + size <= QM_TOTAL_SIZE);
+          cm->gqmatrix[q][c][t] = &wt_matrix_ref[q][c >= 1][current];
+          cm->giqmatrix[q][c][t] = &iwt_matrix_ref[q][c >= 1][current];
+          current += size;
+        }
+      }
+    }
+  }
+}
+
+/* Provide 15 sets of quantization matrices for chroma and luma
+   and each TX size. Matrices for different TX sizes are in fact
+   sub-sampled from the 32x32 and 16x16 sizes, but explicitly
+   defined here for convenience. Intra and inter matrix sets are the
+   same but changing DEFAULT_QM_INTER_OFFSET from zero allows
+   for different matrices for inter and intra blocks in the same
+   frame.
+   Matrices for different QM levels have been rescaled in the
+   frequency domain according to different nominal viewing
+   distances. Matrices for QM level 15 are omitted because they are
+   not used.
+ */
+static const qm_val_t iwt_matrix_ref[NUM_QM_LEVELS - 1][2][QM_TOTAL_SIZE] = {
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 43, 73, 97, 43, 67, 94, 110, 73, 94, 137, 150, 97, 110, 150, 200,
+        /* Size 8x8 */
+        32, 32, 38, 51, 68, 84, 95, 109, 32, 35, 40, 49, 63, 76, 89, 102, 38,
+        40, 54, 65, 78, 91, 98, 106, 51, 49, 65, 82, 97, 111, 113, 121, 68, 63,
+        78, 97, 117, 134, 138, 142, 84, 76, 91, 111, 134, 152, 159, 168, 95, 89,
+        98, 113, 138, 159, 183, 199, 109, 102, 106, 121, 142, 168, 199, 220,
+        /* Size 16x16 */
+        32, 31, 31, 34, 36, 44, 48, 59, 65, 80, 83, 91, 97, 104, 111, 119, 31,
+        32, 32, 33, 34, 41, 44, 54, 59, 72, 75, 83, 90, 97, 104, 112, 31, 32,
+        33, 35, 36, 42, 45, 54, 59, 71, 74, 81, 86, 93, 100, 107, 34, 33, 35,
+        39, 42, 47, 51, 58, 63, 74, 76, 81, 84, 90, 97, 105, 36, 34, 36, 42, 48,
+        54, 57, 64, 68, 79, 81, 88, 91, 96, 102, 105, 44, 41, 42, 47, 54, 63,
+        67, 75, 79, 90, 92, 95, 100, 102, 109, 112, 48, 44, 45, 51, 57, 67, 71,
+        80, 85, 96, 99, 107, 108, 111, 117, 120, 59, 54, 54, 58, 64, 75, 80, 92,
+        98, 110, 113, 115, 116, 122, 125, 130, 65, 59, 59, 63, 68, 79, 85, 98,
+        105, 118, 121, 127, 130, 134, 135, 140, 80, 72, 71, 74, 79, 90, 96, 110,
+        118, 134, 137, 140, 143, 144, 146, 152, 83, 75, 74, 76, 81, 92, 99, 113,
+        121, 137, 140, 151, 152, 155, 158, 165, 91, 83, 81, 81, 88, 95, 107,
+        115, 127, 140, 151, 159, 166, 169, 173, 179, 97, 90, 86, 84, 91, 100,
+        108, 116, 130, 143, 152, 166, 174, 182, 189, 193, 104, 97, 93, 90, 96,
+        102, 111, 122, 134, 144, 155, 169, 182, 191, 200, 210, 111, 104, 100,
+        97, 102, 109, 117, 125, 135, 146, 158, 173, 189, 200, 210, 220, 119,
+        112, 107, 105, 105, 112, 120, 130, 140, 152, 165, 179, 193, 210, 220,
+        231,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 32, 34, 35, 36, 39, 44, 46, 48, 54, 59, 62, 65, 71,
+        80, 81, 83, 88, 91, 94, 97, 101, 104, 107, 111, 115, 119, 123, 31, 32,
+        32, 32, 32, 32, 34, 34, 35, 38, 42, 44, 46, 51, 56, 59, 62, 68, 76, 77,
+        78, 84, 86, 89, 92, 95, 99, 102, 105, 109, 113, 116, 31, 32, 32, 32, 32,
+        32, 33, 34, 34, 37, 41, 42, 44, 49, 54, 56, 59, 65, 72, 73, 75, 80, 83,
+        86, 90, 93, 97, 101, 104, 108, 112, 116, 31, 32, 32, 32, 33, 33, 34, 35,
+        35, 38, 41, 43, 45, 49, 54, 56, 59, 64, 72, 73, 74, 79, 82, 85, 88, 91,
+        94, 97, 101, 104, 107, 111, 31, 32, 32, 33, 33, 34, 35, 36, 36, 39, 42,
+        44, 45, 50, 54, 56, 59, 64, 71, 72, 74, 78, 81, 84, 86, 89, 93, 96, 100,
+        104, 107, 111, 32, 32, 32, 33, 34, 35, 37, 37, 38, 40, 42, 44, 46, 49,
+        53, 55, 58, 63, 69, 70, 72, 76, 79, 82, 85, 89, 93, 96, 99, 102, 106,
+        109, 34, 34, 33, 34, 35, 37, 39, 41, 42, 45, 47, 49, 51, 54, 58, 60, 63,
+        68, 74, 75, 76, 80, 81, 82, 84, 87, 90, 93, 97, 101, 105, 110, 35, 34,
+        34, 35, 36, 37, 41, 43, 45, 47, 50, 52, 53, 57, 61, 63, 65, 70, 76, 77,
+        79, 82, 84, 86, 89, 91, 92, 93, 96, 100, 103, 107, 36, 35, 34, 35, 36,
+        38, 42, 45, 48, 50, 54, 55, 57, 60, 64, 66, 68, 73, 79, 80, 81, 85, 88,
+        90, 91, 93, 96, 99, 102, 103, 105, 107, 39, 38, 37, 38, 39, 40, 45, 47,
+        50, 54, 58, 59, 61, 65, 69, 71, 73, 78, 84, 85, 86, 91, 92, 92, 95, 98,
+        100, 101, 103, 106, 110, 114, 44, 42, 41, 41, 42, 42, 47, 50, 54, 58,
+        63, 65, 67, 71, 75, 77, 79, 84, 90, 91, 92, 95, 95, 97, 100, 101, 102,
+        105, 109, 111, 112, 114, 46, 44, 42, 43, 44, 44, 49, 52, 55, 59, 65, 67,
+        69, 74, 78, 80, 82, 87, 93, 94, 95, 98, 100, 103, 102, 105, 108, 110,
+        111, 113, 117, 121, 48, 46, 44, 45, 45, 46, 51, 53, 57, 61, 67, 69, 71,
+        76, 80, 83, 85, 90, 96, 97, 99, 103, 107, 105, 108, 111, 111, 113, 117,
+        119, 120, 122, 54, 51, 49, 49, 50, 49, 54, 57, 60, 65, 71, 74, 76, 82,
+        87, 89, 92, 97, 104, 105, 106, 111, 110, 111, 114, 113, 116, 120, 120,
+        121, 125, 130, 59, 56, 54, 54, 54, 53, 58, 61, 64, 69, 75, 78, 80, 87,
+        92, 95, 98, 103, 110, 111, 113, 115, 115, 119, 116, 120, 122, 122, 125,
+        129, 130, 130, 62, 59, 56, 56, 56, 55, 60, 63, 66, 71, 77, 80, 83, 89,
+        95, 98, 101, 107, 114, 115, 117, 119, 123, 121, 125, 126, 125, 129, 131,
+        131, 135, 140, 65, 62, 59, 59, 59, 58, 63, 65, 68, 73, 79, 82, 85, 92,
+        98, 101, 105, 111, 118, 119, 121, 126, 127, 128, 130, 130, 134, 133,
+        135, 140, 140, 140, 71, 68, 65, 64, 64, 63, 68, 70, 73, 78, 84, 87, 90,
+        97, 103, 107, 111, 117, 125, 126, 128, 134, 132, 136, 133, 138, 137,
+        140, 143, 142, 145, 150, 80, 76, 72, 72, 71, 69, 74, 76, 79, 84, 90, 93,
+        96, 104, 110, 114, 118, 125, 134, 135, 137, 139, 140, 139, 143, 142,
+        144, 146, 146, 151, 152, 151, 81, 77, 73, 73, 72, 70, 75, 77, 80, 85,
+        91, 94, 97, 105, 111, 115, 119, 126, 135, 137, 138, 144, 147, 146, 148,
+        149, 151, 150, 156, 155, 157, 163, 83, 78, 75, 74, 74, 72, 76, 79, 81,
+        86, 92, 95, 99, 106, 113, 117, 121, 128, 137, 138, 140, 147, 151, 156,
+        152, 157, 155, 161, 158, 162, 165, 164, 88, 84, 80, 79, 78, 76, 80, 82,
+        85, 91, 95, 98, 103, 111, 115, 119, 126, 134, 139, 144, 147, 152, 154,
+        158, 163, 159, 165, 163, 168, 168, 169, 176, 91, 86, 83, 82, 81, 79, 81,
+        84, 88, 92, 95, 100, 107, 110, 115, 123, 127, 132, 140, 147, 151, 154,
+        159, 161, 166, 171, 169, 173, 173, 176, 179, 177, 94, 89, 86, 85, 84,
+        82, 82, 86, 90, 92, 97, 103, 105, 111, 119, 121, 128, 136, 139, 146,
+        156, 158, 161, 166, 168, 174, 179, 178, 180, 183, 183, 190, 97, 92, 90,
+        88, 86, 85, 84, 89, 91, 95, 100, 102, 108, 114, 116, 125, 130, 133, 143,
+        148, 152, 163, 166, 168, 174, 176, 182, 187, 189, 188, 193, 191, 101,
+        95, 93, 91, 89, 89, 87, 91, 93, 98, 101, 105, 111, 113, 120, 126, 130,
+        138, 142, 149, 157, 159, 171, 174, 176, 183, 184, 191, 195, 199, 197,
+        204, 104, 99, 97, 94, 93, 93, 90, 92, 96, 100, 102, 108, 111, 116, 122,
+        125, 134, 137, 144, 151, 155, 165, 169, 179, 182, 184, 191, 193, 200,
+        204, 210, 206, 107, 102, 101, 97, 96, 96, 93, 93, 99, 101, 105, 110,
+        113, 120, 122, 129, 133, 140, 146, 150, 161, 163, 173, 178, 187, 191,
+        193, 200, 202, 210, 214, 222, 111, 105, 104, 101, 100, 99, 97, 96, 102,
+        103, 109, 111, 117, 120, 125, 131, 135, 143, 146, 156, 158, 168, 173,
+        180, 189, 195, 200, 202, 210, 212, 220, 224, 115, 109, 108, 104, 104,
+        102, 101, 100, 103, 106, 111, 113, 119, 121, 129, 131, 140, 142, 151,
+        155, 162, 168, 176, 183, 188, 199, 204, 210, 212, 220, 222, 230, 119,
+        113, 112, 107, 107, 106, 105, 103, 105, 110, 112, 117, 120, 125, 130,
+        135, 140, 145, 152, 157, 165, 169, 179, 183, 193, 197, 210, 214, 220,
+        222, 231, 232, 123, 116, 116, 111, 111, 109, 110, 107, 107, 114, 114,
+        121, 122, 130, 130, 140, 140, 150, 151, 163, 164, 176, 177, 190, 191,
+        204, 206, 222, 224, 230, 232, 242,
+        /* Size 4x8 */
+        32, 42, 75, 91, 33, 42, 69, 86, 37, 58, 84, 91, 49, 71, 103, 110, 65,
+        84, 125, 128, 80, 97, 142, 152, 91, 100, 145, 178, 104, 112, 146, 190,
+        /* Size 8x4 */
+        32, 33, 37, 49, 65, 80, 91, 104, 42, 42, 58, 71, 84, 97, 100, 112, 75,
+        69, 84, 103, 125, 142, 145, 146, 91, 86, 91, 110, 128, 152, 178, 190,
+        /* Size 8x16 */
+        32, 32, 36, 53, 65, 87, 93, 99, 31, 33, 34, 49, 59, 78, 86, 93, 32, 34,
+        36, 50, 59, 77, 82, 89, 34, 37, 42, 54, 63, 79, 80, 88, 36, 38, 48, 60,
+        68, 84, 86, 90, 44, 43, 53, 71, 79, 95, 94, 97, 48, 46, 56, 76, 85, 102,
+        105, 105, 58, 54, 63, 87, 98, 116, 112, 115, 65, 58, 68, 92, 105, 124,
+        122, 124, 79, 70, 79, 104, 118, 141, 135, 135, 82, 72, 81, 106, 121,
+        144, 149, 146, 91, 80, 88, 106, 130, 148, 162, 159, 97, 86, 94, 107,
+        128, 157, 167, 171, 103, 93, 98, 114, 131, 150, 174, 186, 110, 100, 101,
+        117, 138, 161, 183, 193, 118, 107, 105, 118, 136, 157, 182, 203,
+        /* Size 16x8 */
+        32, 31, 32, 34, 36, 44, 48, 58, 65, 79, 82, 91, 97, 103, 110, 118, 32,
+        33, 34, 37, 38, 43, 46, 54, 58, 70, 72, 80, 86, 93, 100, 107, 36, 34,
+        36, 42, 48, 53, 56, 63, 68, 79, 81, 88, 94, 98, 101, 105, 53, 49, 50,
+        54, 60, 71, 76, 87, 92, 104, 106, 106, 107, 114, 117, 118, 65, 59, 59,
+        63, 68, 79, 85, 98, 105, 118, 121, 130, 128, 131, 138, 136, 87, 78, 77,
+        79, 84, 95, 102, 116, 124, 141, 144, 148, 157, 150, 161, 157, 93, 86,
+        82, 80, 86, 94, 105, 112, 122, 135, 149, 162, 167, 174, 183, 182, 99,
+        93, 89, 88, 90, 97, 105, 115, 124, 135, 146, 159, 171, 186, 193, 203,
+        /* Size 16x32 */
+        32, 31, 32, 34, 36, 44, 53, 59, 65, 79, 87, 90, 93, 96, 99, 102, 31, 32,
+        32, 34, 35, 42, 51, 56, 62, 75, 82, 85, 88, 91, 94, 97, 31, 32, 33, 33,
+        34, 41, 49, 54, 59, 72, 78, 82, 86, 90, 93, 97, 31, 32, 33, 34, 35, 41,
+        49, 54, 59, 71, 78, 81, 84, 87, 90, 93, 32, 32, 34, 35, 36, 42, 50, 54,
+        59, 71, 77, 80, 82, 86, 89, 93, 32, 33, 35, 37, 38, 42, 49, 53, 58, 69,
+        75, 78, 82, 86, 89, 92, 34, 34, 37, 39, 42, 48, 54, 58, 63, 73, 79, 78,
+        80, 83, 88, 92, 35, 34, 37, 41, 45, 50, 57, 61, 65, 76, 82, 83, 84, 84,
+        87, 90, 36, 34, 38, 43, 48, 54, 60, 64, 68, 78, 84, 87, 86, 89, 90, 90,
+        39, 37, 40, 45, 50, 58, 65, 69, 73, 84, 89, 89, 91, 91, 93, 96, 44, 41,
+        43, 48, 53, 63, 71, 75, 79, 90, 95, 93, 94, 95, 97, 97, 46, 43, 44, 49,
+        55, 65, 73, 78, 82, 93, 98, 100, 98, 100, 99, 103, 48, 45, 46, 51, 56,
+        67, 76, 80, 85, 96, 102, 102, 105, 102, 105, 104, 53, 49, 50, 54, 60,
+        71, 82, 87, 92, 103, 109, 107, 107, 110, 107, 111, 58, 54, 54, 58, 63,
+        75, 87, 92, 98, 110, 116, 115, 112, 111, 115, 112, 61, 57, 56, 60, 66,
+        77, 89, 95, 101, 114, 120, 118, 119, 118, 116, 120, 65, 60, 58, 63, 68,
+        79, 92, 98, 105, 118, 124, 123, 122, 123, 124, 121, 71, 65, 63, 68, 73,
+        84, 97, 103, 111, 125, 132, 132, 130, 128, 127, 130, 79, 72, 70, 74, 79,
+        90, 104, 110, 118, 133, 141, 136, 135, 135, 135, 131, 81, 74, 71, 75,
+        80, 91, 105, 112, 119, 135, 142, 140, 140, 138, 139, 142, 82, 75, 72,
+        76, 81, 92, 106, 113, 121, 136, 144, 151, 149, 149, 146, 143, 88, 80,
+        77, 80, 85, 97, 108, 115, 126, 142, 149, 153, 153, 152, 152, 154, 91,
+        83, 80, 81, 88, 100, 106, 114, 130, 142, 148, 155, 162, 160, 159, 155,
+        94, 85, 83, 82, 91, 100, 105, 118, 131, 137, 153, 160, 165, 167, 166,
+        168, 97, 88, 86, 85, 94, 100, 107, 123, 128, 140, 157, 161, 167, 173,
+        171, 169, 100, 91, 89, 87, 97, 100, 111, 121, 127, 145, 152, 164, 173,
+        178, 182, 181, 103, 94, 93, 90, 98, 101, 114, 120, 131, 144, 150, 170,
+        174, 180, 186, 183, 107, 97, 96, 93, 100, 104, 117, 119, 136, 142, 155,
+        168, 177, 187, 191, 198, 110, 101, 100, 97, 101, 108, 117, 123, 138,
+        141, 161, 165, 183, 188, 193, 200, 114, 104, 104, 100, 103, 112, 117,
+        127, 137, 146, 159, 167, 185, 190, 201, 206, 118, 108, 107, 103, 105,
+        115, 118, 131, 136, 151, 157, 172, 182, 197, 203, 208, 122, 111, 111,
+        107, 107, 119, 119, 136, 136, 156, 156, 178, 179, 203, 204, 217,
+        /* Size 32x16 */
+        32, 31, 31, 31, 32, 32, 34, 35, 36, 39, 44, 46, 48, 53, 58, 61, 65, 71,
+        79, 81, 82, 88, 91, 94, 97, 100, 103, 107, 110, 114, 118, 122, 31, 32,
+        32, 32, 32, 33, 34, 34, 34, 37, 41, 43, 45, 49, 54, 57, 60, 65, 72, 74,
+        75, 80, 83, 85, 88, 91, 94, 97, 101, 104, 108, 111, 32, 32, 33, 33, 34,
+        35, 37, 37, 38, 40, 43, 44, 46, 50, 54, 56, 58, 63, 70, 71, 72, 77, 80,
+        83, 86, 89, 93, 96, 100, 104, 107, 111, 34, 34, 33, 34, 35, 37, 39, 41,
+        43, 45, 48, 49, 51, 54, 58, 60, 63, 68, 74, 75, 76, 80, 81, 82, 85, 87,
+        90, 93, 97, 100, 103, 107, 36, 35, 34, 35, 36, 38, 42, 45, 48, 50, 53,
+        55, 56, 60, 63, 66, 68, 73, 79, 80, 81, 85, 88, 91, 94, 97, 98, 100,
+        101, 103, 105, 107, 44, 42, 41, 41, 42, 42, 48, 50, 54, 58, 63, 65, 67,
+        71, 75, 77, 79, 84, 90, 91, 92, 97, 100, 100, 100, 100, 101, 104, 108,
+        112, 115, 119, 53, 51, 49, 49, 50, 49, 54, 57, 60, 65, 71, 73, 76, 82,
+        87, 89, 92, 97, 104, 105, 106, 108, 106, 105, 107, 111, 114, 117, 117,
+        117, 118, 119, 59, 56, 54, 54, 54, 53, 58, 61, 64, 69, 75, 78, 80, 87,
+        92, 95, 98, 103, 110, 112, 113, 115, 114, 118, 123, 121, 120, 119, 123,
+        127, 131, 136, 65, 62, 59, 59, 59, 58, 63, 65, 68, 73, 79, 82, 85, 92,
+        98, 101, 105, 111, 118, 119, 121, 126, 130, 131, 128, 127, 131, 136,
+        138, 137, 136, 136, 79, 75, 72, 71, 71, 69, 73, 76, 78, 84, 90, 93, 96,
+        103, 110, 114, 118, 125, 133, 135, 136, 142, 142, 137, 140, 145, 144,
+        142, 141, 146, 151, 156, 87, 82, 78, 78, 77, 75, 79, 82, 84, 89, 95, 98,
+        102, 109, 116, 120, 124, 132, 141, 142, 144, 149, 148, 153, 157, 152,
+        150, 155, 161, 159, 157, 156, 90, 85, 82, 81, 80, 78, 78, 83, 87, 89,
+        93, 100, 102, 107, 115, 118, 123, 132, 136, 140, 151, 153, 155, 160,
+        161, 164, 170, 168, 165, 167, 172, 178, 93, 88, 86, 84, 82, 82, 80, 84,
+        86, 91, 94, 98, 105, 107, 112, 119, 122, 130, 135, 140, 149, 153, 162,
+        165, 167, 173, 174, 177, 183, 185, 182, 179, 96, 91, 90, 87, 86, 86, 83,
+        84, 89, 91, 95, 100, 102, 110, 111, 118, 123, 128, 135, 138, 149, 152,
+        160, 167, 173, 178, 180, 187, 188, 190, 197, 203, 99, 94, 93, 90, 89,
+        89, 88, 87, 90, 93, 97, 99, 105, 107, 115, 116, 124, 127, 135, 139, 146,
+        152, 159, 166, 171, 182, 186, 191, 193, 201, 203, 204, 102, 97, 97, 93,
+        93, 92, 92, 90, 90, 96, 97, 103, 104, 111, 112, 120, 121, 130, 131, 142,
+        143, 154, 155, 168, 169, 181, 183, 198, 200, 206, 208, 217,
+        /* Size 4x16 */
+        31, 44, 79, 96, 32, 41, 72, 90, 32, 42, 71, 86, 34, 48, 73, 83, 34, 54,
+        78, 89, 41, 63, 90, 95, 45, 67, 96, 102, 54, 75, 110, 111, 60, 79, 118,
+        123, 72, 90, 133, 135, 75, 92, 136, 149, 83, 100, 142, 160, 88, 100,
+        140, 173, 94, 101, 144, 180, 101, 108, 141, 188, 108, 115, 151, 197,
+        /* Size 16x4 */
+        31, 32, 32, 34, 34, 41, 45, 54, 60, 72, 75, 83, 88, 94, 101, 108, 44,
+        41, 42, 48, 54, 63, 67, 75, 79, 90, 92, 100, 100, 101, 108, 115, 79, 72,
+        71, 73, 78, 90, 96, 110, 118, 133, 136, 142, 140, 144, 141, 151, 96, 90,
+        86, 83, 89, 95, 102, 111, 123, 135, 149, 160, 173, 180, 188, 197,
+        /* Size 8x32 */
+        32, 32, 36, 53, 65, 87, 93, 99, 31, 32, 35, 51, 62, 82, 88, 94, 31, 33,
+        34, 49, 59, 78, 86, 93, 31, 33, 35, 49, 59, 78, 84, 90, 32, 34, 36, 50,
+        59, 77, 82, 89, 32, 35, 38, 49, 58, 75, 82, 89, 34, 37, 42, 54, 63, 79,
+        80, 88, 35, 37, 45, 57, 65, 82, 84, 87, 36, 38, 48, 60, 68, 84, 86, 90,
+        39, 40, 50, 65, 73, 89, 91, 93, 44, 43, 53, 71, 79, 95, 94, 97, 46, 44,
+        55, 73, 82, 98, 98, 99, 48, 46, 56, 76, 85, 102, 105, 105, 53, 50, 60,
+        82, 92, 109, 107, 107, 58, 54, 63, 87, 98, 116, 112, 115, 61, 56, 66,
+        89, 101, 120, 119, 116, 65, 58, 68, 92, 105, 124, 122, 124, 71, 63, 73,
+        97, 111, 132, 130, 127, 79, 70, 79, 104, 118, 141, 135, 135, 81, 71, 80,
+        105, 119, 142, 140, 139, 82, 72, 81, 106, 121, 144, 149, 146, 88, 77,
+        85, 108, 126, 149, 153, 152, 91, 80, 88, 106, 130, 148, 162, 159, 94,
+        83, 91, 105, 131, 153, 165, 166, 97, 86, 94, 107, 128, 157, 167, 171,
+        100, 89, 97, 111, 127, 152, 173, 182, 103, 93, 98, 114, 131, 150, 174,
+        186, 107, 96, 100, 117, 136, 155, 177, 191, 110, 100, 101, 117, 138,
+        161, 183, 193, 114, 104, 103, 117, 137, 159, 185, 201, 118, 107, 105,
+        118, 136, 157, 182, 203, 122, 111, 107, 119, 136, 156, 179, 204,
+        /* Size 32x8 */
+        32, 31, 31, 31, 32, 32, 34, 35, 36, 39, 44, 46, 48, 53, 58, 61, 65, 71,
+        79, 81, 82, 88, 91, 94, 97, 100, 103, 107, 110, 114, 118, 122, 32, 32,
+        33, 33, 34, 35, 37, 37, 38, 40, 43, 44, 46, 50, 54, 56, 58, 63, 70, 71,
+        72, 77, 80, 83, 86, 89, 93, 96, 100, 104, 107, 111, 36, 35, 34, 35, 36,
+        38, 42, 45, 48, 50, 53, 55, 56, 60, 63, 66, 68, 73, 79, 80, 81, 85, 88,
+        91, 94, 97, 98, 100, 101, 103, 105, 107, 53, 51, 49, 49, 50, 49, 54, 57,
+        60, 65, 71, 73, 76, 82, 87, 89, 92, 97, 104, 105, 106, 108, 106, 105,
+        107, 111, 114, 117, 117, 117, 118, 119, 65, 62, 59, 59, 59, 58, 63, 65,
+        68, 73, 79, 82, 85, 92, 98, 101, 105, 111, 118, 119, 121, 126, 130, 131,
+        128, 127, 131, 136, 138, 137, 136, 136, 87, 82, 78, 78, 77, 75, 79, 82,
+        84, 89, 95, 98, 102, 109, 116, 120, 124, 132, 141, 142, 144, 149, 148,
+        153, 157, 152, 150, 155, 161, 159, 157, 156, 93, 88, 86, 84, 82, 82, 80,
+        84, 86, 91, 94, 98, 105, 107, 112, 119, 122, 130, 135, 140, 149, 153,
+        162, 165, 167, 173, 174, 177, 183, 185, 182, 179, 99, 94, 93, 90, 89,
+        89, 88, 87, 90, 93, 97, 99, 105, 107, 115, 116, 124, 127, 135, 139, 146,
+        152, 159, 166, 171, 182, 186, 191, 193, 201, 203, 204 },
+      { /* Chroma */
+        /* Size 4x4 */
+        35, 46, 57, 66, 46, 60, 69, 71, 57, 69, 90, 90, 66, 71, 90, 109,
+        /* Size 8x8 */
+        31, 38, 47, 50, 57, 63, 67, 71, 38, 47, 46, 47, 52, 57, 62, 67, 47, 46,
+        54, 57, 61, 66, 67, 68, 50, 47, 57, 66, 72, 77, 75, 75, 57, 52, 61, 72,
+        82, 88, 86, 84, 63, 57, 66, 77, 88, 96, 95, 95, 67, 62, 67, 75, 86, 95,
+        104, 107, 71, 67, 68, 75, 84, 95, 107, 113,
+        /* Size 16x16 */
+        32, 30, 33, 41, 49, 49, 50, 54, 57, 63, 65, 68, 70, 72, 74, 76, 30, 32,
+        35, 42, 46, 45, 46, 49, 52, 57, 58, 62, 64, 67, 70, 72, 33, 35, 39, 45,
+        47, 45, 46, 49, 51, 56, 57, 60, 62, 64, 66, 69, 41, 42, 45, 48, 50, 49,
+        50, 52, 53, 57, 58, 59, 60, 61, 64, 67, 49, 46, 47, 50, 53, 53, 54, 55,
+        56, 60, 61, 64, 64, 65, 66, 66, 49, 45, 45, 49, 53, 58, 60, 62, 63, 67,
+        68, 67, 69, 68, 70, 70, 50, 46, 46, 50, 54, 60, 61, 65, 67, 71, 71, 74,
+        73, 73, 74, 74, 54, 49, 49, 52, 55, 62, 65, 71, 73, 78, 79, 78, 77, 78,
+        78, 78, 57, 52, 51, 53, 56, 63, 67, 73, 76, 82, 83, 84, 84, 84, 82, 83,
+        63, 57, 56, 57, 60, 67, 71, 78, 82, 89, 90, 90, 89, 88, 87, 88, 65, 58,
+        57, 58, 61, 68, 71, 79, 83, 90, 91, 94, 93, 93, 92, 93, 68, 62, 60, 59,
+        64, 67, 74, 78, 84, 90, 94, 98, 99, 98, 98, 98, 70, 64, 62, 60, 64, 69,
+        73, 77, 84, 89, 93, 99, 102, 103, 104, 104, 72, 67, 64, 61, 65, 68, 73,
+        78, 84, 88, 93, 98, 103, 106, 108, 109, 74, 70, 66, 64, 66, 70, 74, 78,
+        82, 87, 92, 98, 104, 108, 111, 112, 76, 72, 69, 67, 66, 70, 74, 78, 83,
+        88, 93, 98, 104, 109, 112, 116,
+        /* Size 32x32 */
+        32, 31, 30, 32, 33, 36, 41, 45, 49, 48, 49, 50, 50, 52, 54, 56, 57, 60,
+        63, 64, 65, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 31, 31, 31, 33,
+        34, 38, 42, 45, 47, 47, 47, 47, 48, 50, 52, 53, 54, 57, 60, 61, 61, 63,
+        64, 65, 66, 67, 68, 69, 70, 71, 72, 74, 30, 31, 32, 33, 35, 40, 42, 44,
+        46, 45, 45, 45, 46, 47, 49, 51, 52, 54, 57, 58, 58, 61, 62, 63, 64, 66,
+        67, 68, 70, 71, 72, 74, 32, 33, 33, 35, 37, 41, 43, 45, 47, 46, 45, 46,
+        46, 47, 49, 50, 51, 54, 57, 57, 58, 60, 61, 62, 63, 64, 65, 66, 67, 68,
+        69, 70, 33, 34, 35, 37, 39, 43, 45, 46, 47, 46, 45, 46, 46, 47, 49, 50,
+        51, 53, 56, 57, 57, 59, 60, 61, 62, 63, 64, 65, 66, 68, 69, 70, 36, 38,
+        40, 41, 43, 47, 47, 47, 48, 46, 45, 46, 46, 47, 48, 49, 50, 52, 54, 55,
+        55, 57, 58, 59, 61, 62, 64, 65, 66, 67, 68, 69, 41, 42, 42, 43, 45, 47,
+        48, 49, 50, 49, 49, 49, 50, 50, 52, 52, 53, 55, 57, 58, 58, 60, 59, 59,
+        60, 61, 61, 63, 64, 66, 67, 69, 45, 45, 44, 45, 46, 47, 49, 50, 51, 51,
+        51, 51, 52, 52, 53, 54, 55, 57, 59, 59, 60, 61, 61, 62, 63, 63, 63, 63,
+        63, 64, 65, 66, 49, 47, 46, 47, 47, 48, 50, 51, 53, 53, 53, 54, 54, 54,
+        55, 56, 56, 58, 60, 61, 61, 63, 64, 64, 64, 64, 65, 66, 66, 66, 66, 66,
+        48, 47, 45, 46, 46, 46, 49, 51, 53, 54, 55, 56, 56, 57, 58, 59, 60, 61,
+        63, 64, 64, 66, 66, 65, 66, 67, 67, 67, 67, 68, 69, 70, 49, 47, 45, 45,
+        45, 45, 49, 51, 53, 55, 58, 59, 60, 61, 62, 63, 63, 65, 67, 67, 68, 69,
+        67, 68, 69, 68, 68, 69, 70, 70, 70, 70, 50, 47, 45, 46, 46, 46, 49, 51,
+        54, 56, 59, 60, 60, 62, 64, 64, 65, 67, 69, 69, 70, 70, 71, 71, 70, 70,
+        71, 71, 71, 71, 72, 74, 50, 48, 46, 46, 46, 46, 50, 52, 54, 56, 60, 60,
+        61, 63, 65, 66, 67, 68, 71, 71, 71, 73, 74, 72, 73, 74, 73, 73, 74, 74,
+        74, 74, 52, 50, 47, 47, 47, 47, 50, 52, 54, 57, 61, 62, 63, 66, 68, 69,
+        70, 72, 75, 75, 75, 77, 75, 75, 76, 75, 75, 76, 75, 75, 76, 77, 54, 52,
+        49, 49, 49, 48, 52, 53, 55, 58, 62, 64, 65, 68, 71, 72, 73, 75, 78, 78,
+        79, 79, 78, 79, 77, 78, 78, 77, 78, 79, 78, 78, 56, 53, 51, 50, 50, 49,
+        52, 54, 56, 59, 63, 64, 66, 69, 72, 73, 75, 77, 80, 80, 81, 81, 82, 80,
+        81, 81, 79, 81, 80, 79, 81, 82, 57, 54, 52, 51, 51, 50, 53, 55, 56, 60,
+        63, 65, 67, 70, 73, 75, 76, 79, 82, 82, 83, 85, 84, 83, 84, 83, 84, 82,
+        82, 84, 83, 82, 60, 57, 54, 54, 53, 52, 55, 57, 58, 61, 65, 67, 68, 72,
+        75, 77, 79, 82, 85, 85, 86, 88, 86, 87, 85, 86, 85, 85, 86, 84, 85, 86,
+        63, 60, 57, 57, 56, 54, 57, 59, 60, 63, 67, 69, 71, 75, 78, 80, 82, 85,
+        89, 89, 90, 90, 90, 89, 89, 88, 88, 88, 87, 88, 88, 87, 64, 61, 58, 57,
+        57, 55, 58, 59, 61, 64, 67, 69, 71, 75, 78, 80, 82, 85, 89, 90, 91, 92,
+        93, 92, 92, 91, 91, 90, 91, 90, 90, 92, 65, 61, 58, 58, 57, 55, 58, 60,
+        61, 64, 68, 70, 71, 75, 79, 81, 83, 86, 90, 91, 91, 94, 94, 96, 93, 94,
+        93, 94, 92, 93, 93, 92, 67, 63, 61, 60, 59, 57, 60, 61, 63, 66, 69, 70,
+        73, 77, 79, 81, 85, 88, 90, 92, 94, 96, 96, 97, 98, 95, 97, 95, 96, 95,
+        95, 96, 68, 64, 62, 61, 60, 58, 59, 61, 64, 66, 67, 71, 74, 75, 78, 82,
+        84, 86, 90, 93, 94, 96, 98, 98, 99, 100, 98, 99, 98, 98, 98, 97, 69, 65,
+        63, 62, 61, 59, 59, 62, 64, 65, 68, 71, 72, 75, 79, 80, 83, 87, 89, 92,
+        96, 97, 98, 100, 100, 101, 102, 101, 101, 101, 100, 102, 70, 66, 64, 63,
+        62, 61, 60, 63, 64, 66, 69, 70, 73, 76, 77, 81, 84, 85, 89, 92, 93, 98,
+        99, 100, 102, 102, 103, 104, 104, 103, 104, 102, 71, 67, 66, 64, 63, 62,
+        61, 63, 64, 67, 68, 70, 74, 75, 78, 81, 83, 86, 88, 91, 94, 95, 100,
+        101, 102, 104, 104, 105, 106, 107, 105, 107, 72, 68, 67, 65, 64, 64, 61,
+        63, 65, 67, 68, 71, 73, 75, 78, 79, 84, 85, 88, 91, 93, 97, 98, 102,
+        103, 104, 106, 106, 108, 108, 109, 107, 73, 69, 68, 66, 65, 65, 63, 63,
+        66, 67, 69, 71, 73, 76, 77, 81, 82, 85, 88, 90, 94, 95, 99, 101, 104,
+        105, 106, 109, 108, 110, 111, 112, 74, 70, 70, 67, 66, 66, 64, 63, 66,
+        67, 70, 71, 74, 75, 78, 80, 82, 86, 87, 91, 92, 96, 98, 101, 104, 106,
+        108, 108, 111, 111, 112, 113, 75, 71, 71, 68, 68, 67, 66, 64, 66, 68,
+        70, 71, 74, 75, 79, 79, 84, 84, 88, 90, 93, 95, 98, 101, 103, 107, 108,
+        110, 111, 113, 113, 115, 76, 72, 72, 69, 69, 68, 67, 65, 66, 69, 70, 72,
+        74, 76, 78, 81, 83, 85, 88, 90, 93, 95, 98, 100, 104, 105, 109, 111,
+        112, 113, 116, 115, 78, 74, 74, 70, 70, 69, 69, 66, 66, 70, 70, 74, 74,
+        77, 78, 82, 82, 86, 87, 92, 92, 96, 97, 102, 102, 107, 107, 112, 113,
+        115, 115, 118,
+        /* Size 4x8 */
+        31, 47, 60, 66, 40, 45, 54, 61, 46, 56, 64, 64, 48, 61, 75, 73, 54, 65,
+        85, 82, 61, 69, 92, 92, 64, 68, 90, 102, 68, 71, 87, 105,
+        /* Size 8x4 */
+        31, 40, 46, 48, 54, 61, 64, 68, 47, 45, 56, 61, 65, 69, 68, 71, 60, 54,
+        64, 75, 85, 92, 90, 87, 66, 61, 64, 73, 82, 92, 102, 105,
+        /* Size 8x16 */
+        32, 37, 48, 52, 57, 66, 68, 71, 30, 40, 46, 48, 52, 60, 63, 66, 33, 43,
+        47, 47, 51, 59, 60, 63, 42, 47, 50, 50, 53, 60, 59, 62, 49, 48, 53, 54,
+        57, 62, 62, 62, 49, 46, 53, 61, 64, 69, 66, 66, 50, 46, 54, 64, 67, 73,
+        72, 70, 54, 49, 55, 68, 73, 80, 76, 75, 57, 50, 56, 70, 76, 84, 80, 79,
+        63, 55, 60, 75, 82, 92, 87, 84, 64, 56, 61, 75, 83, 93, 93, 89, 68, 59,
+        64, 74, 86, 94, 98, 94, 70, 62, 66, 73, 83, 96, 99, 98, 72, 64, 66, 75,
+        83, 92, 101, 104, 74, 67, 66, 74, 84, 94, 103, 106, 76, 69, 67, 73, 82,
+        91, 101, 109,
+        /* Size 16x8 */
+        32, 30, 33, 42, 49, 49, 50, 54, 57, 63, 64, 68, 70, 72, 74, 76, 37, 40,
+        43, 47, 48, 46, 46, 49, 50, 55, 56, 59, 62, 64, 67, 69, 48, 46, 47, 50,
+        53, 53, 54, 55, 56, 60, 61, 64, 66, 66, 66, 67, 52, 48, 47, 50, 54, 61,
+        64, 68, 70, 75, 75, 74, 73, 75, 74, 73, 57, 52, 51, 53, 57, 64, 67, 73,
+        76, 82, 83, 86, 83, 83, 84, 82, 66, 60, 59, 60, 62, 69, 73, 80, 84, 92,
+        93, 94, 96, 92, 94, 91, 68, 63, 60, 59, 62, 66, 72, 76, 80, 87, 93, 98,
+        99, 101, 103, 101, 71, 66, 63, 62, 62, 66, 70, 75, 79, 84, 89, 94, 98,
+        104, 106, 109,
+        /* Size 16x32 */
+        32, 31, 37, 42, 48, 49, 52, 54, 57, 63, 66, 67, 68, 69, 71, 72, 31, 31,
+        38, 42, 47, 47, 50, 52, 54, 60, 63, 64, 65, 66, 67, 68, 30, 32, 40, 42,
+        46, 45, 48, 50, 52, 57, 60, 62, 63, 65, 66, 68, 32, 34, 41, 44, 46, 45,
+        48, 49, 51, 57, 59, 61, 62, 63, 64, 65, 33, 36, 43, 45, 47, 46, 47, 49,
+        51, 56, 59, 60, 60, 62, 63, 65, 37, 40, 47, 47, 47, 45, 47, 48, 50, 54,
+        57, 58, 60, 61, 62, 63, 42, 43, 47, 48, 50, 49, 50, 52, 53, 57, 60, 58,
+        59, 60, 62, 63, 45, 44, 47, 49, 51, 51, 52, 54, 55, 59, 61, 61, 61, 60,
+        61, 61, 49, 46, 48, 50, 53, 53, 54, 55, 57, 60, 62, 63, 62, 63, 62, 62,
+        48, 46, 47, 50, 53, 56, 57, 59, 60, 64, 66, 65, 65, 64, 64, 65, 49, 45,
+        46, 49, 53, 58, 61, 62, 64, 67, 69, 67, 66, 66, 66, 65, 49, 46, 46, 49,
+        53, 59, 62, 64, 65, 69, 71, 70, 68, 68, 67, 68, 50, 46, 46, 50, 54, 59,
+        64, 65, 67, 71, 73, 72, 72, 70, 70, 69, 52, 48, 47, 50, 54, 61, 66, 68,
+        71, 75, 77, 74, 73, 73, 71, 72, 54, 50, 49, 52, 55, 62, 68, 71, 73, 78,
+        80, 78, 76, 74, 75, 73, 55, 51, 49, 52, 56, 63, 69, 72, 75, 80, 82, 80,
+        79, 78, 76, 77, 57, 52, 50, 53, 56, 64, 70, 73, 76, 82, 84, 82, 80, 80,
+        79, 77, 60, 54, 52, 55, 58, 65, 72, 75, 79, 85, 88, 86, 84, 82, 81, 81,
+        63, 57, 55, 58, 60, 67, 75, 78, 82, 89, 92, 88, 87, 85, 84, 81, 64, 58,
+        55, 58, 61, 68, 75, 78, 82, 89, 92, 90, 89, 87, 86, 86, 64, 59, 56, 58,
+        61, 68, 75, 79, 83, 90, 93, 95, 93, 91, 89, 87, 67, 61, 58, 60, 63, 69,
+        76, 79, 85, 92, 95, 96, 94, 92, 91, 91, 68, 62, 59, 60, 64, 71, 74, 78,
+        86, 91, 94, 96, 98, 96, 94, 91, 69, 62, 60, 60, 65, 70, 72, 79, 85, 88,
+        95, 98, 99, 98, 97, 96, 70, 63, 62, 60, 66, 69, 73, 81, 83, 89, 96, 97,
+        99, 101, 98, 97, 71, 64, 63, 61, 67, 68, 74, 79, 82, 90, 93, 98, 102,
+        102, 102, 101, 72, 65, 64, 62, 66, 68, 75, 78, 83, 89, 92, 100, 101,
+        103, 104, 102, 73, 66, 65, 63, 66, 69, 75, 76, 84, 87, 93, 98, 102, 105,
+        106, 107, 74, 67, 67, 64, 66, 70, 74, 77, 84, 86, 94, 96, 103, 105, 106,
+        107, 75, 68, 68, 65, 66, 71, 74, 78, 83, 87, 93, 96, 103, 105, 109, 109,
+        76, 69, 69, 66, 67, 72, 73, 80, 82, 88, 91, 97, 101, 107, 109, 110, 77,
+        70, 70, 67, 67, 73, 73, 81, 81, 90, 90, 99, 99, 108, 108, 113,
+        /* Size 32x16 */
+        32, 31, 30, 32, 33, 37, 42, 45, 49, 48, 49, 49, 50, 52, 54, 55, 57, 60,
+        63, 64, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 31, 31, 32, 34,
+        36, 40, 43, 44, 46, 46, 45, 46, 46, 48, 50, 51, 52, 54, 57, 58, 59, 61,
+        62, 62, 63, 64, 65, 66, 67, 68, 69, 70, 37, 38, 40, 41, 43, 47, 47, 47,
+        48, 47, 46, 46, 46, 47, 49, 49, 50, 52, 55, 55, 56, 58, 59, 60, 62, 63,
+        64, 65, 67, 68, 69, 70, 42, 42, 42, 44, 45, 47, 48, 49, 50, 50, 49, 49,
+        50, 50, 52, 52, 53, 55, 58, 58, 58, 60, 60, 60, 60, 61, 62, 63, 64, 65,
+        66, 67, 48, 47, 46, 46, 47, 47, 50, 51, 53, 53, 53, 53, 54, 54, 55, 56,
+        56, 58, 60, 61, 61, 63, 64, 65, 66, 67, 66, 66, 66, 66, 67, 67, 49, 47,
+        45, 45, 46, 45, 49, 51, 53, 56, 58, 59, 59, 61, 62, 63, 64, 65, 67, 68,
+        68, 69, 71, 70, 69, 68, 68, 69, 70, 71, 72, 73, 52, 50, 48, 48, 47, 47,
+        50, 52, 54, 57, 61, 62, 64, 66, 68, 69, 70, 72, 75, 75, 75, 76, 74, 72,
+        73, 74, 75, 75, 74, 74, 73, 73, 54, 52, 50, 49, 49, 48, 52, 54, 55, 59,
+        62, 64, 65, 68, 71, 72, 73, 75, 78, 78, 79, 79, 78, 79, 81, 79, 78, 76,
+        77, 78, 80, 81, 57, 54, 52, 51, 51, 50, 53, 55, 57, 60, 64, 65, 67, 71,
+        73, 75, 76, 79, 82, 82, 83, 85, 86, 85, 83, 82, 83, 84, 84, 83, 82, 81,
+        63, 60, 57, 57, 56, 54, 57, 59, 60, 64, 67, 69, 71, 75, 78, 80, 82, 85,
+        89, 89, 90, 92, 91, 88, 89, 90, 89, 87, 86, 87, 88, 90, 66, 63, 60, 59,
+        59, 57, 60, 61, 62, 66, 69, 71, 73, 77, 80, 82, 84, 88, 92, 92, 93, 95,
+        94, 95, 96, 93, 92, 93, 94, 93, 91, 90, 67, 64, 62, 61, 60, 58, 58, 61,
+        63, 65, 67, 70, 72, 74, 78, 80, 82, 86, 88, 90, 95, 96, 96, 98, 97, 98,
+        100, 98, 96, 96, 97, 99, 68, 65, 63, 62, 60, 60, 59, 61, 62, 65, 66, 68,
+        72, 73, 76, 79, 80, 84, 87, 89, 93, 94, 98, 99, 99, 102, 101, 102, 103,
+        103, 101, 99, 69, 66, 65, 63, 62, 61, 60, 60, 63, 64, 66, 68, 70, 73,
+        74, 78, 80, 82, 85, 87, 91, 92, 96, 98, 101, 102, 103, 105, 105, 105,
+        107, 108, 71, 67, 66, 64, 63, 62, 62, 61, 62, 64, 66, 67, 70, 71, 75,
+        76, 79, 81, 84, 86, 89, 91, 94, 97, 98, 102, 104, 106, 106, 109, 109,
+        108, 72, 68, 68, 65, 65, 63, 63, 61, 62, 65, 65, 68, 69, 72, 73, 77, 77,
+        81, 81, 86, 87, 91, 91, 96, 97, 101, 102, 107, 107, 109, 110, 113,
+        /* Size 4x16 */
+        31, 49, 63, 69, 32, 45, 57, 65, 36, 46, 56, 62, 43, 49, 57, 60, 46, 53,
+        60, 63, 45, 58, 67, 66, 46, 59, 71, 70, 50, 62, 78, 74, 52, 64, 82, 80,
+        57, 67, 89, 85, 59, 68, 90, 91, 62, 71, 91, 96, 63, 69, 89, 101, 65, 68,
+        89, 103, 67, 70, 86, 105, 69, 72, 88, 107,
+        /* Size 16x4 */
+        31, 32, 36, 43, 46, 45, 46, 50, 52, 57, 59, 62, 63, 65, 67, 69, 49, 45,
+        46, 49, 53, 58, 59, 62, 64, 67, 68, 71, 69, 68, 70, 72, 63, 57, 56, 57,
+        60, 67, 71, 78, 82, 89, 90, 91, 89, 89, 86, 88, 69, 65, 62, 60, 63, 66,
+        70, 74, 80, 85, 91, 96, 101, 103, 105, 107,
+        /* Size 8x32 */
+        32, 37, 48, 52, 57, 66, 68, 71, 31, 38, 47, 50, 54, 63, 65, 67, 30, 40,
+        46, 48, 52, 60, 63, 66, 32, 41, 46, 48, 51, 59, 62, 64, 33, 43, 47, 47,
+        51, 59, 60, 63, 37, 47, 47, 47, 50, 57, 60, 62, 42, 47, 50, 50, 53, 60,
+        59, 62, 45, 47, 51, 52, 55, 61, 61, 61, 49, 48, 53, 54, 57, 62, 62, 62,
+        48, 47, 53, 57, 60, 66, 65, 64, 49, 46, 53, 61, 64, 69, 66, 66, 49, 46,
+        53, 62, 65, 71, 68, 67, 50, 46, 54, 64, 67, 73, 72, 70, 52, 47, 54, 66,
+        71, 77, 73, 71, 54, 49, 55, 68, 73, 80, 76, 75, 55, 49, 56, 69, 75, 82,
+        79, 76, 57, 50, 56, 70, 76, 84, 80, 79, 60, 52, 58, 72, 79, 88, 84, 81,
+        63, 55, 60, 75, 82, 92, 87, 84, 64, 55, 61, 75, 82, 92, 89, 86, 64, 56,
+        61, 75, 83, 93, 93, 89, 67, 58, 63, 76, 85, 95, 94, 91, 68, 59, 64, 74,
+        86, 94, 98, 94, 69, 60, 65, 72, 85, 95, 99, 97, 70, 62, 66, 73, 83, 96,
+        99, 98, 71, 63, 67, 74, 82, 93, 102, 102, 72, 64, 66, 75, 83, 92, 101,
+        104, 73, 65, 66, 75, 84, 93, 102, 106, 74, 67, 66, 74, 84, 94, 103, 106,
+        75, 68, 66, 74, 83, 93, 103, 109, 76, 69, 67, 73, 82, 91, 101, 109, 77,
+        70, 67, 73, 81, 90, 99, 108,
+        /* Size 32x8 */
+        32, 31, 30, 32, 33, 37, 42, 45, 49, 48, 49, 49, 50, 52, 54, 55, 57, 60,
+        63, 64, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 37, 38, 40, 41,
+        43, 47, 47, 47, 48, 47, 46, 46, 46, 47, 49, 49, 50, 52, 55, 55, 56, 58,
+        59, 60, 62, 63, 64, 65, 67, 68, 69, 70, 48, 47, 46, 46, 47, 47, 50, 51,
+        53, 53, 53, 53, 54, 54, 55, 56, 56, 58, 60, 61, 61, 63, 64, 65, 66, 67,
+        66, 66, 66, 66, 67, 67, 52, 50, 48, 48, 47, 47, 50, 52, 54, 57, 61, 62,
+        64, 66, 68, 69, 70, 72, 75, 75, 75, 76, 74, 72, 73, 74, 75, 75, 74, 74,
+        73, 73, 57, 54, 52, 51, 51, 50, 53, 55, 57, 60, 64, 65, 67, 71, 73, 75,
+        76, 79, 82, 82, 83, 85, 86, 85, 83, 82, 83, 84, 84, 83, 82, 81, 66, 63,
+        60, 59, 59, 57, 60, 61, 62, 66, 69, 71, 73, 77, 80, 82, 84, 88, 92, 92,
+        93, 95, 94, 95, 96, 93, 92, 93, 94, 93, 91, 90, 68, 65, 63, 62, 60, 60,
+        59, 61, 62, 65, 66, 68, 72, 73, 76, 79, 80, 84, 87, 89, 93, 94, 98, 99,
+        99, 102, 101, 102, 103, 103, 101, 99, 71, 67, 66, 64, 63, 62, 62, 61,
+        62, 64, 66, 67, 70, 71, 75, 76, 79, 81, 84, 86, 89, 91, 94, 97, 98, 102,
+        104, 106, 106, 109, 109, 108 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 41, 69, 92, 41, 63, 88, 103, 69, 88, 127, 140, 92, 103, 140, 184,
+        /* Size 8x8 */
+        32, 32, 37, 47, 62, 78, 90, 102, 32, 35, 39, 46, 58, 72, 84, 96, 37, 39,
+        51, 60, 71, 84, 93, 100, 47, 46, 60, 73, 87, 100, 106, 113, 62, 58, 71,
+        87, 105, 121, 129, 132, 78, 72, 84, 100, 121, 140, 148, 155, 90, 84, 93,
+        106, 129, 148, 169, 183, 102, 96, 100, 113, 132, 155, 183, 201,
+        /* Size 16x16 */
+        32, 31, 31, 32, 36, 39, 47, 54, 61, 71, 80, 86, 92, 98, 104, 111, 31,
+        32, 32, 33, 34, 37, 44, 50, 56, 65, 73, 79, 85, 91, 98, 105, 31, 32, 33,
+        34, 36, 39, 45, 50, 56, 64, 71, 77, 82, 88, 94, 100, 32, 33, 34, 36, 40,
+        42, 47, 51, 57, 65, 71, 76, 80, 85, 91, 98, 36, 34, 36, 40, 48, 50, 56,
+        60, 65, 73, 79, 84, 86, 90, 95, 98, 39, 37, 39, 42, 50, 54, 60, 65, 70,
+        78, 84, 89, 95, 96, 102, 105, 47, 44, 45, 47, 56, 60, 69, 75, 81, 89,
+        95, 100, 102, 104, 109, 112, 54, 50, 50, 51, 60, 65, 75, 82, 89, 97,
+        104, 109, 110, 114, 117, 121, 61, 56, 56, 57, 65, 70, 81, 89, 97, 106,
+        113, 119, 122, 126, 125, 130, 71, 65, 64, 65, 73, 78, 89, 97, 106, 117,
+        125, 131, 134, 134, 136, 141, 80, 73, 71, 71, 79, 84, 95, 104, 113, 125,
+        134, 140, 142, 145, 146, 152, 86, 79, 77, 76, 84, 89, 100, 109, 119,
+        131, 140, 147, 154, 157, 160, 165, 92, 85, 82, 80, 86, 95, 102, 110,
+        122, 134, 142, 154, 162, 168, 174, 178, 98, 91, 88, 85, 90, 96, 104,
+        114, 126, 134, 145, 157, 168, 176, 184, 193, 104, 98, 94, 91, 95, 102,
+        109, 117, 125, 136, 146, 160, 174, 184, 193, 201, 111, 105, 100, 98, 98,
+        105, 112, 121, 130, 141, 152, 165, 178, 193, 201, 210,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 32, 32, 34, 36, 38, 39, 44, 47, 49, 54, 59, 61, 65,
+        71, 76, 80, 83, 86, 89, 92, 95, 98, 101, 104, 108, 111, 114, 31, 32, 32,
+        32, 32, 32, 33, 34, 35, 37, 38, 42, 45, 47, 51, 56, 58, 62, 68, 72, 76,
+        78, 82, 85, 88, 90, 93, 96, 99, 102, 105, 109, 31, 32, 32, 32, 32, 32,
+        33, 33, 34, 36, 37, 41, 44, 46, 50, 54, 56, 60, 65, 70, 73, 76, 79, 82,
+        85, 88, 91, 95, 98, 101, 105, 109, 31, 32, 32, 32, 32, 33, 33, 34, 35,
+        36, 38, 41, 44, 45, 49, 54, 56, 59, 65, 69, 72, 75, 78, 81, 84, 86, 89,
+        92, 95, 98, 101, 104, 31, 32, 32, 32, 33, 34, 34, 35, 36, 38, 39, 42,
+        45, 46, 50, 54, 56, 59, 64, 68, 71, 74, 77, 79, 82, 85, 88, 91, 94, 97,
+        100, 104, 32, 32, 32, 33, 34, 35, 36, 37, 38, 39, 40, 42, 45, 46, 49,
+        53, 55, 58, 63, 66, 69, 72, 74, 78, 81, 84, 87, 90, 93, 96, 99, 102, 32,
+        33, 33, 33, 34, 36, 36, 38, 40, 41, 42, 44, 47, 48, 51, 55, 57, 60, 65,
+        68, 71, 73, 76, 78, 80, 82, 85, 88, 91, 95, 98, 102, 34, 34, 33, 34, 35,
+        37, 38, 39, 42, 44, 45, 47, 50, 51, 54, 58, 60, 63, 68, 71, 74, 76, 79,
+        82, 85, 86, 87, 88, 90, 93, 96, 99, 36, 35, 34, 35, 36, 38, 40, 42, 48,
+        50, 50, 54, 56, 57, 60, 64, 65, 68, 73, 76, 79, 81, 84, 86, 86, 88, 90,
+        93, 95, 97, 98, 100, 38, 37, 36, 36, 38, 39, 41, 44, 50, 51, 52, 56, 58,
+        60, 63, 67, 68, 71, 76, 79, 82, 84, 87, 87, 90, 93, 94, 95, 96, 100,
+        103, 106, 39, 38, 37, 38, 39, 40, 42, 45, 50, 52, 54, 58, 60, 62, 65,
+        69, 70, 73, 78, 81, 84, 86, 89, 92, 95, 95, 96, 99, 102, 104, 105, 106,
+        44, 42, 41, 41, 42, 42, 44, 47, 54, 56, 58, 63, 66, 68, 71, 75, 77, 79,
+        84, 88, 90, 92, 95, 97, 97, 99, 102, 103, 103, 106, 109, 113, 47, 45,
+        44, 44, 45, 45, 47, 50, 56, 58, 60, 66, 69, 71, 75, 79, 81, 84, 89, 92,
+        95, 97, 100, 100, 102, 105, 104, 106, 109, 111, 112, 113, 49, 47, 46,
+        45, 46, 46, 48, 51, 57, 60, 62, 68, 71, 73, 77, 81, 83, 87, 92, 95, 98,
+        100, 103, 105, 107, 106, 109, 112, 112, 113, 117, 120, 54, 51, 50, 49,
+        50, 49, 51, 54, 60, 63, 65, 71, 75, 77, 82, 87, 89, 92, 97, 101, 104,
+        106, 109, 112, 110, 113, 114, 114, 117, 121, 121, 121, 59, 56, 54, 54,
+        54, 53, 55, 58, 64, 67, 69, 75, 79, 81, 87, 92, 94, 98, 103, 107, 110,
+        113, 116, 114, 117, 118, 117, 121, 122, 122, 125, 129, 61, 58, 56, 56,
+        56, 55, 57, 60, 65, 68, 70, 77, 81, 83, 89, 94, 97, 101, 106, 110, 113,
+        116, 119, 120, 122, 121, 126, 124, 125, 130, 130, 130, 65, 62, 60, 59,
+        59, 58, 60, 63, 68, 71, 73, 79, 84, 87, 92, 98, 101, 105, 111, 115, 118,
+        121, 124, 128, 125, 129, 128, 131, 133, 132, 135, 139, 71, 68, 65, 65,
+        64, 63, 65, 68, 73, 76, 78, 84, 89, 92, 97, 103, 106, 111, 117, 122,
+        125, 128, 131, 131, 134, 132, 134, 136, 136, 140, 141, 140, 76, 72, 70,
+        69, 68, 66, 68, 71, 76, 79, 81, 88, 92, 95, 101, 107, 110, 115, 122,
+        127, 130, 133, 136, 136, 138, 139, 141, 140, 145, 143, 146, 151, 80, 76,
+        73, 72, 71, 69, 71, 74, 79, 82, 84, 90, 95, 98, 104, 110, 113, 118, 125,
+        130, 134, 137, 140, 146, 142, 146, 145, 149, 146, 150, 152, 151, 83, 78,
+        76, 75, 74, 72, 73, 76, 81, 84, 86, 92, 97, 100, 106, 113, 116, 121,
+        128, 133, 137, 140, 144, 147, 152, 148, 154, 151, 156, 155, 156, 162,
+        86, 82, 79, 78, 77, 74, 76, 79, 84, 87, 89, 95, 100, 103, 109, 116, 119,
+        124, 131, 136, 140, 144, 147, 150, 154, 159, 157, 160, 160, 162, 165,
+        162, 89, 85, 82, 81, 79, 78, 78, 82, 86, 87, 92, 97, 100, 105, 112, 114,
+        120, 128, 131, 136, 146, 147, 150, 155, 156, 161, 166, 165, 167, 169,
+        169, 175, 92, 88, 85, 84, 82, 81, 80, 85, 86, 90, 95, 97, 102, 107, 110,
+        117, 122, 125, 134, 138, 142, 152, 154, 156, 162, 163, 168, 173, 174,
+        174, 178, 176, 95, 90, 88, 86, 85, 84, 82, 86, 88, 93, 95, 99, 105, 106,
+        113, 118, 121, 129, 132, 139, 146, 148, 159, 161, 163, 169, 170, 176,
+        180, 183, 181, 187, 98, 93, 91, 89, 88, 87, 85, 87, 90, 94, 96, 102,
+        104, 109, 114, 117, 126, 128, 134, 141, 145, 154, 157, 166, 168, 170,
+        176, 178, 184, 188, 193, 188, 101, 96, 95, 92, 91, 90, 88, 88, 93, 95,
+        99, 103, 106, 112, 114, 121, 124, 131, 136, 140, 149, 151, 160, 165,
+        173, 176, 178, 184, 186, 192, 196, 203, 104, 99, 98, 95, 94, 93, 91, 90,
+        95, 96, 102, 103, 109, 112, 117, 122, 125, 133, 136, 145, 146, 156, 160,
+        167, 174, 180, 184, 186, 193, 194, 201, 204, 108, 102, 101, 98, 97, 96,
+        95, 93, 97, 100, 104, 106, 111, 113, 121, 122, 130, 132, 140, 143, 150,
+        155, 162, 169, 174, 183, 188, 192, 194, 201, 202, 210, 111, 105, 105,
+        101, 100, 99, 98, 96, 98, 103, 105, 109, 112, 117, 121, 125, 130, 135,
+        141, 146, 152, 156, 165, 169, 178, 181, 193, 196, 201, 202, 210, 211,
+        114, 109, 109, 104, 104, 102, 102, 99, 100, 106, 106, 113, 113, 120,
+        121, 129, 130, 139, 140, 151, 151, 162, 162, 175, 176, 187, 188, 203,
+        204, 210, 211, 219,
+        /* Size 4x8 */
+        32, 42, 69, 88, 33, 42, 64, 83, 36, 56, 77, 88, 46, 67, 93, 105, 60, 79,
+        112, 122, 75, 92, 130, 144, 86, 95, 136, 167, 98, 105, 136, 177,
+        /* Size 8x4 */
+        32, 33, 36, 46, 60, 75, 86, 98, 42, 42, 56, 67, 79, 92, 95, 105, 69, 64,
+        77, 93, 112, 130, 136, 136, 88, 83, 88, 105, 122, 144, 167, 177,
+        /* Size 8x16 */
+        32, 32, 36, 47, 65, 79, 90, 96, 31, 32, 35, 44, 60, 72, 84, 90, 32, 34,
+        36, 45, 59, 71, 80, 87, 32, 35, 40, 47, 60, 71, 78, 85, 36, 37, 48, 56,
+        68, 78, 83, 87, 39, 40, 50, 60, 73, 84, 91, 94, 47, 45, 56, 69, 84, 95,
+        101, 101, 53, 50, 60, 75, 92, 103, 108, 110, 61, 56, 65, 81, 100, 113,
+        116, 118, 71, 64, 73, 89, 111, 125, 129, 129, 79, 70, 79, 95, 118, 133,
+        142, 138, 86, 76, 84, 100, 124, 140, 153, 150, 92, 82, 89, 101, 121,
+        148, 157, 161, 98, 88, 93, 108, 124, 141, 163, 174, 104, 94, 95, 110,
+        129, 151, 171, 181, 110, 100, 98, 111, 127, 147, 169, 188,
+        /* Size 16x8 */
+        32, 31, 32, 32, 36, 39, 47, 53, 61, 71, 79, 86, 92, 98, 104, 110, 32,
+        32, 34, 35, 37, 40, 45, 50, 56, 64, 70, 76, 82, 88, 94, 100, 36, 35, 36,
+        40, 48, 50, 56, 60, 65, 73, 79, 84, 89, 93, 95, 98, 47, 44, 45, 47, 56,
+        60, 69, 75, 81, 89, 95, 100, 101, 108, 110, 111, 65, 60, 59, 60, 68, 73,
+        84, 92, 100, 111, 118, 124, 121, 124, 129, 127, 79, 72, 71, 71, 78, 84,
+        95, 103, 113, 125, 133, 140, 148, 141, 151, 147, 90, 84, 80, 78, 83, 91,
+        101, 108, 116, 129, 142, 153, 157, 163, 171, 169, 96, 90, 87, 85, 87,
+        94, 101, 110, 118, 129, 138, 150, 161, 174, 181, 188,
+        /* Size 16x32 */
+        32, 31, 32, 32, 36, 44, 47, 53, 65, 73, 79, 87, 90, 93, 96, 99, 31, 32,
+        32, 33, 35, 42, 45, 51, 62, 69, 75, 83, 86, 88, 91, 94, 31, 32, 32, 33,
+        35, 41, 44, 49, 60, 67, 72, 80, 84, 87, 90, 94, 31, 32, 33, 33, 35, 41,
+        44, 49, 59, 66, 71, 79, 82, 84, 87, 90, 32, 32, 34, 34, 36, 42, 45, 50,
+        59, 65, 71, 78, 80, 83, 87, 90, 32, 33, 35, 36, 38, 42, 45, 49, 58, 64,
+        69, 76, 80, 83, 86, 88, 32, 33, 35, 36, 40, 44, 47, 51, 60, 66, 71, 76,
+        78, 81, 85, 89, 34, 34, 36, 38, 42, 48, 50, 54, 63, 69, 73, 80, 82, 81,
+        84, 86, 36, 34, 37, 40, 48, 54, 56, 60, 68, 74, 78, 84, 83, 86, 87, 87,
+        38, 36, 39, 41, 49, 56, 58, 63, 71, 77, 81, 86, 88, 88, 90, 93, 39, 37,
+        40, 42, 50, 58, 60, 65, 73, 79, 84, 90, 91, 92, 94, 93, 44, 41, 42, 45,
+        53, 63, 66, 71, 79, 85, 90, 96, 94, 96, 96, 99, 47, 44, 45, 47, 56, 66,
+        69, 75, 84, 90, 95, 99, 101, 98, 101, 99, 49, 46, 47, 48, 57, 67, 71,
+        77, 86, 93, 97, 103, 103, 105, 102, 106, 53, 49, 50, 51, 60, 71, 75, 82,
+        92, 99, 103, 111, 108, 107, 110, 107, 58, 54, 54, 55, 63, 75, 79, 87,
+        98, 105, 110, 114, 114, 113, 111, 115, 61, 56, 56, 57, 65, 77, 81, 89,
+        100, 107, 113, 118, 116, 117, 118, 116, 65, 60, 59, 60, 68, 79, 84, 92,
+        105, 112, 118, 126, 124, 122, 121, 124, 71, 65, 64, 65, 73, 84, 89, 97,
+        111, 119, 125, 130, 129, 129, 129, 125, 76, 69, 68, 69, 76, 88, 92, 101,
+        115, 123, 130, 134, 134, 131, 132, 135, 79, 72, 70, 71, 79, 90, 95, 104,
+        118, 127, 133, 143, 142, 141, 138, 136, 82, 75, 73, 74, 81, 92, 97, 106,
+        121, 130, 136, 146, 145, 144, 144, 145, 86, 78, 76, 77, 84, 95, 100,
+        109, 124, 133, 140, 147, 153, 151, 150, 146, 89, 81, 79, 78, 87, 95, 99,
+        112, 124, 130, 145, 152, 156, 157, 156, 158, 92, 84, 82, 80, 89, 95,
+        101, 116, 121, 132, 148, 151, 157, 163, 161, 159, 95, 86, 85, 83, 92,
+        95, 105, 114, 120, 136, 143, 155, 163, 167, 171, 170, 98, 89, 88, 85,
+        93, 95, 108, 113, 124, 136, 141, 160, 163, 169, 174, 171, 101, 92, 91,
+        88, 94, 98, 110, 112, 128, 133, 146, 158, 166, 175, 179, 185, 104, 95,
+        94, 91, 95, 101, 110, 115, 129, 132, 151, 154, 171, 175, 181, 186, 107,
+        98, 97, 94, 96, 105, 110, 119, 128, 136, 149, 156, 173, 177, 188, 192,
+        110, 101, 100, 97, 98, 108, 111, 123, 127, 141, 147, 161, 169, 183, 188,
+        193, 114, 104, 104, 100, 100, 111, 111, 126, 127, 145, 145, 166, 166,
+        189, 190, 201,
+        /* Size 32x16 */
+        32, 31, 31, 31, 32, 32, 32, 34, 36, 38, 39, 44, 47, 49, 53, 58, 61, 65,
+        71, 76, 79, 82, 86, 89, 92, 95, 98, 101, 104, 107, 110, 114, 31, 32, 32,
+        32, 32, 33, 33, 34, 34, 36, 37, 41, 44, 46, 49, 54, 56, 60, 65, 69, 72,
+        75, 78, 81, 84, 86, 89, 92, 95, 98, 101, 104, 32, 32, 32, 33, 34, 35,
+        35, 36, 37, 39, 40, 42, 45, 47, 50, 54, 56, 59, 64, 68, 70, 73, 76, 79,
+        82, 85, 88, 91, 94, 97, 100, 104, 32, 33, 33, 33, 34, 36, 36, 38, 40,
+        41, 42, 45, 47, 48, 51, 55, 57, 60, 65, 69, 71, 74, 77, 78, 80, 83, 85,
+        88, 91, 94, 97, 100, 36, 35, 35, 35, 36, 38, 40, 42, 48, 49, 50, 53, 56,
+        57, 60, 63, 65, 68, 73, 76, 79, 81, 84, 87, 89, 92, 93, 94, 95, 96, 98,
+        100, 44, 42, 41, 41, 42, 42, 44, 48, 54, 56, 58, 63, 66, 67, 71, 75, 77,
+        79, 84, 88, 90, 92, 95, 95, 95, 95, 95, 98, 101, 105, 108, 111, 47, 45,
+        44, 44, 45, 45, 47, 50, 56, 58, 60, 66, 69, 71, 75, 79, 81, 84, 89, 92,
+        95, 97, 100, 99, 101, 105, 108, 110, 110, 110, 111, 111, 53, 51, 49, 49,
+        50, 49, 51, 54, 60, 63, 65, 71, 75, 77, 82, 87, 89, 92, 97, 101, 104,
+        106, 109, 112, 116, 114, 113, 112, 115, 119, 123, 126, 65, 62, 60, 59,
+        59, 58, 60, 63, 68, 71, 73, 79, 84, 86, 92, 98, 100, 105, 111, 115, 118,
+        121, 124, 124, 121, 120, 124, 128, 129, 128, 127, 127, 73, 69, 67, 66,
+        65, 64, 66, 69, 74, 77, 79, 85, 90, 93, 99, 105, 107, 112, 119, 123,
+        127, 130, 133, 130, 132, 136, 136, 133, 132, 136, 141, 145, 79, 75, 72,
+        71, 71, 69, 71, 73, 78, 81, 84, 90, 95, 97, 103, 110, 113, 118, 125,
+        130, 133, 136, 140, 145, 148, 143, 141, 146, 151, 149, 147, 145, 87, 83,
+        80, 79, 78, 76, 76, 80, 84, 86, 90, 96, 99, 103, 111, 114, 118, 126,
+        130, 134, 143, 146, 147, 152, 151, 155, 160, 158, 154, 156, 161, 166,
+        90, 86, 84, 82, 80, 80, 78, 82, 83, 88, 91, 94, 101, 103, 108, 114, 116,
+        124, 129, 134, 142, 145, 153, 156, 157, 163, 163, 166, 171, 173, 169,
+        166, 93, 88, 87, 84, 83, 83, 81, 81, 86, 88, 92, 96, 98, 105, 107, 113,
+        117, 122, 129, 131, 141, 144, 151, 157, 163, 167, 169, 175, 175, 177,
+        183, 189, 96, 91, 90, 87, 87, 86, 85, 84, 87, 90, 94, 96, 101, 102, 110,
+        111, 118, 121, 129, 132, 138, 144, 150, 156, 161, 171, 174, 179, 181,
+        188, 188, 190, 99, 94, 94, 90, 90, 88, 89, 86, 87, 93, 93, 99, 99, 106,
+        107, 115, 116, 124, 125, 135, 136, 145, 146, 158, 159, 170, 171, 185,
+        186, 192, 193, 201,
+        /* Size 4x16 */
+        31, 44, 73, 93, 32, 41, 67, 87, 32, 42, 65, 83, 33, 44, 66, 81, 34, 54,
+        74, 86, 37, 58, 79, 92, 44, 66, 90, 98, 49, 71, 99, 107, 56, 77, 107,
+        117, 65, 84, 119, 129, 72, 90, 127, 141, 78, 95, 133, 151, 84, 95, 132,
+        163, 89, 95, 136, 169, 95, 101, 132, 175, 101, 108, 141, 183,
+        /* Size 16x4 */
+        31, 32, 32, 33, 34, 37, 44, 49, 56, 65, 72, 78, 84, 89, 95, 101, 44, 41,
+        42, 44, 54, 58, 66, 71, 77, 84, 90, 95, 95, 95, 101, 108, 73, 67, 65,
+        66, 74, 79, 90, 99, 107, 119, 127, 133, 132, 136, 132, 141, 93, 87, 83,
+        81, 86, 92, 98, 107, 117, 129, 141, 151, 163, 169, 175, 183,
+        /* Size 8x32 */
+        32, 32, 36, 47, 65, 79, 90, 96, 31, 32, 35, 45, 62, 75, 86, 91, 31, 32,
+        35, 44, 60, 72, 84, 90, 31, 33, 35, 44, 59, 71, 82, 87, 32, 34, 36, 45,
+        59, 71, 80, 87, 32, 35, 38, 45, 58, 69, 80, 86, 32, 35, 40, 47, 60, 71,
+        78, 85, 34, 36, 42, 50, 63, 73, 82, 84, 36, 37, 48, 56, 68, 78, 83, 87,
+        38, 39, 49, 58, 71, 81, 88, 90, 39, 40, 50, 60, 73, 84, 91, 94, 44, 42,
+        53, 66, 79, 90, 94, 96, 47, 45, 56, 69, 84, 95, 101, 101, 49, 47, 57,
+        71, 86, 97, 103, 102, 53, 50, 60, 75, 92, 103, 108, 110, 58, 54, 63, 79,
+        98, 110, 114, 111, 61, 56, 65, 81, 100, 113, 116, 118, 65, 59, 68, 84,
+        105, 118, 124, 121, 71, 64, 73, 89, 111, 125, 129, 129, 76, 68, 76, 92,
+        115, 130, 134, 132, 79, 70, 79, 95, 118, 133, 142, 138, 82, 73, 81, 97,
+        121, 136, 145, 144, 86, 76, 84, 100, 124, 140, 153, 150, 89, 79, 87, 99,
+        124, 145, 156, 156, 92, 82, 89, 101, 121, 148, 157, 161, 95, 85, 92,
+        105, 120, 143, 163, 171, 98, 88, 93, 108, 124, 141, 163, 174, 101, 91,
+        94, 110, 128, 146, 166, 179, 104, 94, 95, 110, 129, 151, 171, 181, 107,
+        97, 96, 110, 128, 149, 173, 188, 110, 100, 98, 111, 127, 147, 169, 188,
+        114, 104, 100, 111, 127, 145, 166, 190,
+        /* Size 32x8 */
+        32, 31, 31, 31, 32, 32, 32, 34, 36, 38, 39, 44, 47, 49, 53, 58, 61, 65,
+        71, 76, 79, 82, 86, 89, 92, 95, 98, 101, 104, 107, 110, 114, 32, 32, 32,
+        33, 34, 35, 35, 36, 37, 39, 40, 42, 45, 47, 50, 54, 56, 59, 64, 68, 70,
+        73, 76, 79, 82, 85, 88, 91, 94, 97, 100, 104, 36, 35, 35, 35, 36, 38,
+        40, 42, 48, 49, 50, 53, 56, 57, 60, 63, 65, 68, 73, 76, 79, 81, 84, 87,
+        89, 92, 93, 94, 95, 96, 98, 100, 47, 45, 44, 44, 45, 45, 47, 50, 56, 58,
+        60, 66, 69, 71, 75, 79, 81, 84, 89, 92, 95, 97, 100, 99, 101, 105, 108,
+        110, 110, 110, 111, 111, 65, 62, 60, 59, 59, 58, 60, 63, 68, 71, 73, 79,
+        84, 86, 92, 98, 100, 105, 111, 115, 118, 121, 124, 124, 121, 120, 124,
+        128, 129, 128, 127, 127, 79, 75, 72, 71, 71, 69, 71, 73, 78, 81, 84, 90,
+        95, 97, 103, 110, 113, 118, 125, 130, 133, 136, 140, 145, 148, 143, 141,
+        146, 151, 149, 147, 145, 90, 86, 84, 82, 80, 80, 78, 82, 83, 88, 91, 94,
+        101, 103, 108, 114, 116, 124, 129, 134, 142, 145, 153, 156, 157, 163,
+        163, 166, 171, 173, 169, 166, 96, 91, 90, 87, 87, 86, 85, 84, 87, 90,
+        94, 96, 101, 102, 110, 111, 118, 121, 129, 132, 138, 144, 150, 156, 161,
+        171, 174, 179, 181, 188, 188, 190 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 45, 56, 64, 45, 58, 66, 69, 56, 66, 86, 87, 64, 69, 87, 105,
+        /* Size 8x8 */
+        31, 38, 47, 48, 54, 61, 66, 69, 38, 47, 47, 46, 50, 55, 61, 65, 47, 47,
+        53, 55, 58, 63, 65, 66, 48, 46, 55, 62, 67, 72, 73, 73, 54, 50, 58, 67,
+        76, 83, 84, 82, 61, 55, 63, 72, 83, 91, 92, 92, 66, 61, 65, 73, 84, 92,
+        101, 103, 69, 65, 66, 73, 82, 92, 103, 109,
+        /* Size 16x16 */
+        32, 30, 33, 38, 49, 48, 50, 52, 55, 60, 63, 66, 68, 70, 72, 74, 30, 31,
+        35, 41, 46, 46, 46, 48, 51, 55, 58, 60, 63, 65, 68, 70, 33, 35, 39, 44,
+        47, 46, 46, 47, 50, 53, 56, 58, 60, 62, 65, 67, 38, 41, 44, 47, 49, 48,
+        47, 48, 50, 53, 55, 58, 58, 60, 62, 65, 49, 46, 47, 49, 53, 53, 54, 54,
+        56, 58, 60, 62, 62, 63, 64, 64, 48, 46, 46, 48, 53, 54, 56, 57, 59, 61,
+        63, 65, 67, 66, 68, 68, 50, 46, 46, 47, 54, 56, 61, 63, 65, 68, 70, 72,
+        71, 71, 72, 72, 52, 48, 47, 48, 54, 57, 63, 66, 69, 72, 75, 76, 75, 76,
+        76, 76, 55, 51, 50, 50, 56, 59, 65, 69, 73, 77, 79, 81, 81, 81, 80, 80,
+        60, 55, 53, 53, 58, 61, 68, 72, 77, 82, 85, 87, 87, 85, 84, 85, 63, 58,
+        56, 55, 60, 63, 70, 75, 79, 85, 89, 91, 91, 90, 89, 90, 66, 60, 58, 58,
+        62, 65, 72, 76, 81, 87, 91, 94, 96, 95, 95, 95, 68, 63, 60, 58, 62, 67,
+        71, 75, 81, 87, 91, 96, 99, 100, 100, 100, 70, 65, 62, 60, 63, 66, 71,
+        76, 81, 85, 90, 95, 100, 103, 104, 105, 72, 68, 65, 62, 64, 68, 72, 76,
+        80, 84, 89, 95, 100, 104, 107, 108, 74, 70, 67, 65, 64, 68, 72, 76, 80,
+        85, 90, 95, 100, 105, 108, 111,
+        /* Size 32x32 */
+        32, 31, 30, 31, 33, 36, 38, 41, 49, 49, 48, 49, 50, 51, 52, 54, 55, 57,
+        60, 62, 63, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 31, 31, 31, 32,
+        34, 38, 40, 42, 47, 47, 47, 47, 48, 48, 50, 52, 53, 54, 57, 59, 60, 61,
+        63, 64, 65, 66, 67, 67, 68, 69, 70, 71, 30, 31, 31, 32, 35, 39, 41, 42,
+        46, 46, 46, 45, 46, 47, 48, 50, 51, 52, 55, 57, 58, 59, 60, 62, 63, 64,
+        65, 67, 68, 69, 70, 71, 31, 32, 32, 33, 36, 40, 41, 43, 46, 46, 45, 45,
+        46, 46, 47, 49, 50, 51, 54, 56, 57, 58, 59, 61, 62, 63, 63, 64, 65, 66,
+        67, 68, 33, 34, 35, 36, 39, 43, 44, 45, 47, 46, 46, 45, 46, 47, 47, 49,
+        50, 51, 53, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 66, 67, 68, 36, 38,
+        39, 40, 43, 47, 47, 47, 48, 47, 46, 45, 46, 46, 47, 48, 49, 50, 52, 53,
+        54, 55, 56, 58, 59, 61, 62, 63, 64, 65, 66, 66, 38, 40, 41, 41, 44, 47,
+        47, 48, 49, 48, 48, 47, 47, 47, 48, 49, 50, 51, 53, 54, 55, 56, 58, 58,
+        58, 59, 60, 61, 62, 64, 65, 66, 41, 42, 42, 43, 45, 47, 48, 48, 50, 50,
+        49, 49, 50, 50, 50, 52, 52, 53, 55, 56, 57, 58, 59, 60, 61, 61, 61, 61,
+        62, 63, 63, 64, 49, 47, 46, 46, 47, 48, 49, 50, 53, 53, 53, 53, 54, 54,
+        54, 55, 56, 56, 58, 59, 60, 61, 62, 63, 62, 62, 63, 64, 64, 64, 64, 64,
+        49, 47, 46, 46, 46, 47, 48, 50, 53, 53, 54, 55, 55, 55, 56, 57, 58, 58,
+        60, 61, 62, 63, 64, 64, 64, 65, 65, 65, 65, 66, 67, 68, 48, 47, 46, 45,
+        46, 46, 48, 49, 53, 54, 54, 55, 56, 56, 57, 58, 59, 60, 61, 63, 63, 64,
+        65, 66, 67, 66, 66, 67, 68, 68, 68, 68, 49, 47, 45, 45, 45, 45, 47, 49,
+        53, 55, 55, 58, 59, 60, 61, 62, 63, 63, 65, 66, 67, 68, 69, 69, 68, 68,
+        69, 69, 69, 69, 70, 71, 50, 48, 46, 46, 46, 46, 47, 50, 54, 55, 56, 59,
+        61, 61, 63, 64, 65, 66, 68, 69, 70, 71, 72, 71, 71, 72, 71, 71, 72, 72,
+        72, 71, 51, 48, 47, 46, 47, 46, 47, 50, 54, 55, 56, 60, 61, 62, 64, 66,
+        66, 67, 69, 70, 71, 72, 73, 73, 74, 73, 73, 74, 73, 73, 74, 75, 52, 50,
+        48, 47, 47, 47, 48, 50, 54, 56, 57, 61, 63, 64, 66, 68, 69, 70, 72, 74,
+        75, 75, 76, 77, 75, 76, 76, 75, 76, 77, 76, 75, 54, 52, 50, 49, 49, 48,
+        49, 52, 55, 57, 58, 62, 64, 66, 68, 71, 72, 73, 75, 77, 78, 79, 80, 78,
+        79, 78, 77, 78, 78, 77, 78, 79, 55, 53, 51, 50, 50, 49, 50, 52, 56, 58,
+        59, 63, 65, 66, 69, 72, 73, 74, 77, 78, 79, 80, 81, 81, 81, 80, 81, 80,
+        80, 81, 80, 79, 57, 54, 52, 51, 51, 50, 51, 53, 56, 58, 60, 63, 66, 67,
+        70, 73, 74, 76, 79, 80, 82, 83, 84, 85, 83, 84, 83, 83, 83, 82, 82, 83,
+        60, 57, 55, 54, 53, 52, 53, 55, 58, 60, 61, 65, 68, 69, 72, 75, 77, 79,
+        82, 84, 85, 86, 87, 86, 87, 85, 85, 85, 84, 86, 85, 84, 62, 59, 57, 56,
+        55, 53, 54, 56, 59, 61, 63, 66, 69, 70, 74, 77, 78, 80, 84, 86, 87, 88,
+        90, 89, 89, 88, 88, 87, 88, 87, 87, 88, 63, 60, 58, 57, 56, 54, 55, 57,
+        60, 62, 63, 67, 70, 71, 75, 78, 79, 82, 85, 87, 89, 90, 91, 93, 91, 91,
+        90, 91, 89, 90, 90, 89, 65, 61, 59, 58, 57, 55, 56, 58, 61, 63, 64, 68,
+        71, 72, 75, 79, 80, 83, 86, 88, 90, 91, 93, 94, 95, 92, 94, 92, 93, 92,
+        91, 93, 66, 63, 60, 59, 58, 56, 58, 59, 62, 64, 65, 69, 72, 73, 76, 80,
+        81, 84, 87, 90, 91, 93, 94, 95, 96, 97, 95, 95, 95, 95, 95, 93, 67, 64,
+        62, 61, 59, 58, 58, 60, 63, 64, 66, 69, 71, 73, 77, 78, 81, 85, 86, 89,
+        93, 94, 95, 97, 97, 98, 99, 97, 97, 97, 96, 98, 68, 65, 63, 62, 60, 59,
+        58, 61, 62, 64, 67, 68, 71, 74, 75, 79, 81, 83, 87, 89, 91, 95, 96, 97,
+        99, 98, 100, 100, 100, 99, 100, 98, 69, 66, 64, 63, 61, 61, 59, 61, 62,
+        65, 66, 68, 72, 73, 76, 78, 80, 84, 85, 88, 91, 92, 97, 98, 98, 101,
+        100, 102, 102, 103, 101, 102, 70, 67, 65, 63, 62, 62, 60, 61, 63, 65,
+        66, 69, 71, 73, 76, 77, 81, 83, 85, 88, 90, 94, 95, 99, 100, 100, 103,
+        102, 104, 104, 105, 103, 71, 67, 67, 64, 63, 63, 61, 61, 64, 65, 67, 69,
+        71, 74, 75, 78, 80, 83, 85, 87, 91, 92, 95, 97, 100, 102, 102, 105, 104,
+        106, 106, 108, 72, 68, 68, 65, 65, 64, 62, 62, 64, 65, 68, 69, 72, 73,
+        76, 78, 80, 83, 84, 88, 89, 93, 95, 97, 100, 102, 104, 104, 107, 106,
+        108, 108, 73, 69, 69, 66, 66, 65, 64, 63, 64, 66, 68, 69, 72, 73, 77,
+        77, 81, 82, 86, 87, 90, 92, 95, 97, 99, 103, 104, 106, 106, 109, 108,
+        110, 74, 70, 70, 67, 67, 66, 65, 63, 64, 67, 68, 70, 72, 74, 76, 78, 80,
+        82, 85, 87, 90, 91, 95, 96, 100, 101, 105, 106, 108, 108, 111, 110, 75,
+        71, 71, 68, 68, 66, 66, 64, 64, 68, 68, 71, 71, 75, 75, 79, 79, 83, 84,
+        88, 89, 93, 93, 98, 98, 102, 103, 108, 108, 110, 110, 113,
+        /* Size 4x8 */
+        31, 47, 57, 65, 40, 45, 52, 61, 46, 55, 61, 63, 47, 60, 70, 72, 52, 64,
+        79, 81, 59, 68, 87, 90, 63, 66, 88, 99, 66, 69, 85, 102,
+        /* Size 8x4 */
+        31, 40, 46, 47, 52, 59, 63, 66, 47, 45, 55, 60, 64, 68, 66, 69, 57, 52,
+        61, 70, 79, 87, 88, 85, 65, 61, 63, 72, 81, 90, 99, 102,
+        /* Size 8x16 */
+        32, 35, 48, 50, 57, 63, 68, 70, 30, 38, 46, 46, 52, 58, 63, 65, 33, 41,
+        47, 46, 51, 56, 60, 63, 39, 46, 48, 47, 51, 55, 58, 61, 49, 48, 53, 54,
+        57, 60, 61, 61, 48, 46, 53, 56, 60, 64, 65, 65, 50, 46, 54, 61, 66, 70,
+        71, 69, 52, 47, 54, 63, 71, 75, 75, 74, 55, 49, 56, 65, 74, 79, 79, 78,
+        60, 53, 58, 68, 79, 85, 85, 82, 63, 55, 60, 70, 82, 89, 91, 87, 66, 58,
+        62, 72, 84, 91, 95, 91, 68, 60, 64, 71, 81, 94, 97, 96, 70, 62, 65, 73,
+        81, 89, 98, 101, 72, 65, 65, 72, 82, 92, 100, 103, 74, 67, 65, 71, 79,
+        89, 98, 105,
+        /* Size 16x8 */
+        32, 30, 33, 39, 49, 48, 50, 52, 55, 60, 63, 66, 68, 70, 72, 74, 35, 38,
+        41, 46, 48, 46, 46, 47, 49, 53, 55, 58, 60, 62, 65, 67, 48, 46, 47, 48,
+        53, 53, 54, 54, 56, 58, 60, 62, 64, 65, 65, 65, 50, 46, 46, 47, 54, 56,
+        61, 63, 65, 68, 70, 72, 71, 73, 72, 71, 57, 52, 51, 51, 57, 60, 66, 71,
+        74, 79, 82, 84, 81, 81, 82, 79, 63, 58, 56, 55, 60, 64, 70, 75, 79, 85,
+        89, 91, 94, 89, 92, 89, 68, 63, 60, 58, 61, 65, 71, 75, 79, 85, 91, 95,
+        97, 98, 100, 98, 70, 65, 63, 61, 61, 65, 69, 74, 78, 82, 87, 91, 96,
+        101, 103, 105,
+        /* Size 16x32 */
+        32, 31, 35, 38, 48, 49, 50, 52, 57, 61, 63, 67, 68, 69, 70, 71, 31, 31,
+        37, 40, 47, 47, 48, 50, 54, 57, 60, 63, 64, 65, 66, 67, 30, 32, 38, 40,
+        46, 45, 46, 48, 52, 55, 58, 61, 63, 64, 65, 67, 31, 33, 38, 41, 46, 45,
+        46, 48, 52, 55, 57, 60, 61, 62, 63, 64, 33, 36, 41, 44, 47, 46, 46, 47,
+        51, 54, 56, 59, 60, 61, 63, 64, 37, 40, 45, 47, 47, 45, 46, 47, 50, 52,
+        54, 57, 59, 61, 62, 62, 39, 41, 46, 47, 48, 47, 47, 48, 51, 54, 55, 57,
+        58, 59, 61, 62, 42, 43, 46, 48, 50, 49, 50, 50, 53, 56, 57, 60, 60, 59,
+        60, 60, 49, 46, 48, 49, 53, 53, 54, 54, 57, 59, 60, 63, 61, 62, 61, 61,
+        48, 46, 47, 48, 53, 55, 55, 56, 58, 61, 62, 64, 64, 63, 63, 64, 48, 46,
+        46, 48, 53, 56, 56, 57, 60, 62, 64, 66, 65, 65, 65, 64, 49, 45, 45, 47,
+        53, 58, 59, 61, 64, 66, 67, 69, 67, 67, 66, 67, 50, 46, 46, 48, 54, 59,
+        61, 63, 66, 68, 70, 71, 71, 68, 69, 67, 51, 47, 47, 48, 54, 60, 61, 64,
+        68, 70, 71, 73, 72, 72, 70, 71, 52, 48, 47, 48, 54, 61, 63, 66, 71, 73,
+        75, 77, 75, 73, 74, 71, 54, 50, 49, 50, 55, 62, 65, 68, 73, 76, 78, 79,
+        78, 76, 74, 75, 55, 51, 49, 50, 56, 63, 65, 69, 74, 77, 79, 81, 79, 78,
+        78, 75, 57, 52, 50, 51, 56, 64, 66, 70, 76, 79, 82, 85, 83, 81, 79, 79,
+        60, 54, 53, 53, 58, 65, 68, 72, 79, 82, 85, 87, 85, 84, 82, 80, 62, 56,
+        54, 55, 60, 66, 69, 74, 81, 84, 87, 88, 87, 85, 84, 84, 63, 57, 55, 56,
+        60, 67, 70, 75, 82, 86, 89, 92, 91, 89, 87, 84, 64, 59, 56, 57, 61, 68,
+        71, 75, 83, 87, 90, 93, 92, 90, 89, 89, 66, 60, 58, 58, 62, 69, 72, 76,
+        84, 88, 91, 94, 95, 93, 91, 89, 67, 61, 59, 58, 63, 68, 71, 78, 83, 86,
+        93, 96, 96, 96, 94, 94, 68, 62, 60, 59, 64, 67, 71, 79, 81, 86, 94, 95,
+        97, 98, 96, 94, 69, 63, 61, 60, 65, 66, 72, 77, 80, 88, 91, 96, 99, 99,
+        100, 98, 70, 64, 62, 60, 65, 66, 73, 76, 81, 87, 89, 97, 98, 100, 101,
+        99, 71, 65, 64, 61, 65, 67, 73, 74, 82, 85, 90, 95, 99, 102, 103, 104,
+        72, 65, 65, 62, 65, 68, 72, 75, 82, 83, 92, 93, 100, 102, 103, 104, 73,
+        66, 66, 63, 65, 69, 72, 76, 81, 85, 90, 93, 100, 102, 105, 106, 74, 67,
+        67, 64, 65, 70, 71, 77, 79, 86, 89, 94, 98, 103, 105, 106, 75, 68, 68,
+        65, 65, 71, 71, 78, 78, 87, 87, 96, 96, 105, 105, 109,
+        /* Size 32x16 */
+        32, 31, 30, 31, 33, 37, 39, 42, 49, 48, 48, 49, 50, 51, 52, 54, 55, 57,
+        60, 62, 63, 64, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 31, 31, 32, 33,
+        36, 40, 41, 43, 46, 46, 46, 45, 46, 47, 48, 50, 51, 52, 54, 56, 57, 59,
+        60, 61, 62, 63, 64, 65, 65, 66, 67, 68, 35, 37, 38, 38, 41, 45, 46, 46,
+        48, 47, 46, 45, 46, 47, 47, 49, 49, 50, 53, 54, 55, 56, 58, 59, 60, 61,
+        62, 64, 65, 66, 67, 68, 38, 40, 40, 41, 44, 47, 47, 48, 49, 48, 48, 47,
+        48, 48, 48, 50, 50, 51, 53, 55, 56, 57, 58, 58, 59, 60, 60, 61, 62, 63,
+        64, 65, 48, 47, 46, 46, 47, 47, 48, 50, 53, 53, 53, 53, 54, 54, 54, 55,
+        56, 56, 58, 60, 60, 61, 62, 63, 64, 65, 65, 65, 65, 65, 65, 65, 49, 47,
+        45, 45, 46, 45, 47, 49, 53, 55, 56, 58, 59, 60, 61, 62, 63, 64, 65, 66,
+        67, 68, 69, 68, 67, 66, 66, 67, 68, 69, 70, 71, 50, 48, 46, 46, 46, 46,
+        47, 50, 54, 55, 56, 59, 61, 61, 63, 65, 65, 66, 68, 69, 70, 71, 72, 71,
+        71, 72, 73, 73, 72, 72, 71, 71, 52, 50, 48, 48, 47, 47, 48, 50, 54, 56,
+        57, 61, 63, 64, 66, 68, 69, 70, 72, 74, 75, 75, 76, 78, 79, 77, 76, 74,
+        75, 76, 77, 78, 57, 54, 52, 52, 51, 50, 51, 53, 57, 58, 60, 64, 66, 68,
+        71, 73, 74, 76, 79, 81, 82, 83, 84, 83, 81, 80, 81, 82, 82, 81, 79, 78,
+        61, 57, 55, 55, 54, 52, 54, 56, 59, 61, 62, 66, 68, 70, 73, 76, 77, 79,
+        82, 84, 86, 87, 88, 86, 86, 88, 87, 85, 83, 85, 86, 87, 63, 60, 58, 57,
+        56, 54, 55, 57, 60, 62, 64, 67, 70, 71, 75, 78, 79, 82, 85, 87, 89, 90,
+        91, 93, 94, 91, 89, 90, 92, 90, 89, 87, 67, 63, 61, 60, 59, 57, 57, 60,
+        63, 64, 66, 69, 71, 73, 77, 79, 81, 85, 87, 88, 92, 93, 94, 96, 95, 96,
+        97, 95, 93, 93, 94, 96, 68, 64, 63, 61, 60, 59, 58, 60, 61, 64, 65, 67,
+        71, 72, 75, 78, 79, 83, 85, 87, 91, 92, 95, 96, 97, 99, 98, 99, 100,
+        100, 98, 96, 69, 65, 64, 62, 61, 61, 59, 59, 62, 63, 65, 67, 68, 72, 73,
+        76, 78, 81, 84, 85, 89, 90, 93, 96, 98, 99, 100, 102, 102, 102, 103,
+        105, 70, 66, 65, 63, 63, 62, 61, 60, 61, 63, 65, 66, 69, 70, 74, 74, 78,
+        79, 82, 84, 87, 89, 91, 94, 96, 100, 101, 103, 103, 105, 105, 105, 71,
+        67, 67, 64, 64, 62, 62, 60, 61, 64, 64, 67, 67, 71, 71, 75, 75, 79, 80,
+        84, 84, 89, 89, 94, 94, 98, 99, 104, 104, 106, 106, 109,
+        /* Size 4x16 */
+        31, 49, 61, 69, 32, 45, 55, 64, 36, 46, 54, 61, 41, 47, 54, 59, 46, 53,
+        59, 62, 46, 56, 62, 65, 46, 59, 68, 68, 48, 61, 73, 73, 51, 63, 77, 78,
+        54, 65, 82, 84, 57, 67, 86, 89, 60, 69, 88, 93, 62, 67, 86, 98, 64, 66,
+        87, 100, 65, 68, 83, 102, 67, 70, 86, 103,
+        /* Size 16x4 */
+        31, 32, 36, 41, 46, 46, 46, 48, 51, 54, 57, 60, 62, 64, 65, 67, 49, 45,
+        46, 47, 53, 56, 59, 61, 63, 65, 67, 69, 67, 66, 68, 70, 61, 55, 54, 54,
+        59, 62, 68, 73, 77, 82, 86, 88, 86, 87, 83, 86, 69, 64, 61, 59, 62, 65,
+        68, 73, 78, 84, 89, 93, 98, 100, 102, 103,
+        /* Size 8x32 */
+        32, 35, 48, 50, 57, 63, 68, 70, 31, 37, 47, 48, 54, 60, 64, 66, 30, 38,
+        46, 46, 52, 58, 63, 65, 31, 38, 46, 46, 52, 57, 61, 63, 33, 41, 47, 46,
+        51, 56, 60, 63, 37, 45, 47, 46, 50, 54, 59, 62, 39, 46, 48, 47, 51, 55,
+        58, 61, 42, 46, 50, 50, 53, 57, 60, 60, 49, 48, 53, 54, 57, 60, 61, 61,
+        48, 47, 53, 55, 58, 62, 64, 63, 48, 46, 53, 56, 60, 64, 65, 65, 49, 45,
+        53, 59, 64, 67, 67, 66, 50, 46, 54, 61, 66, 70, 71, 69, 51, 47, 54, 61,
+        68, 71, 72, 70, 52, 47, 54, 63, 71, 75, 75, 74, 54, 49, 55, 65, 73, 78,
+        78, 74, 55, 49, 56, 65, 74, 79, 79, 78, 57, 50, 56, 66, 76, 82, 83, 79,
+        60, 53, 58, 68, 79, 85, 85, 82, 62, 54, 60, 69, 81, 87, 87, 84, 63, 55,
+        60, 70, 82, 89, 91, 87, 64, 56, 61, 71, 83, 90, 92, 89, 66, 58, 62, 72,
+        84, 91, 95, 91, 67, 59, 63, 71, 83, 93, 96, 94, 68, 60, 64, 71, 81, 94,
+        97, 96, 69, 61, 65, 72, 80, 91, 99, 100, 70, 62, 65, 73, 81, 89, 98,
+        101, 71, 64, 65, 73, 82, 90, 99, 103, 72, 65, 65, 72, 82, 92, 100, 103,
+        73, 66, 65, 72, 81, 90, 100, 105, 74, 67, 65, 71, 79, 89, 98, 105, 75,
+        68, 65, 71, 78, 87, 96, 105,
+        /* Size 32x8 */
+        32, 31, 30, 31, 33, 37, 39, 42, 49, 48, 48, 49, 50, 51, 52, 54, 55, 57,
+        60, 62, 63, 64, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 35, 37, 38, 38,
+        41, 45, 46, 46, 48, 47, 46, 45, 46, 47, 47, 49, 49, 50, 53, 54, 55, 56,
+        58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 48, 47, 46, 46, 47, 47, 48, 50,
+        53, 53, 53, 53, 54, 54, 54, 55, 56, 56, 58, 60, 60, 61, 62, 63, 64, 65,
+        65, 65, 65, 65, 65, 65, 50, 48, 46, 46, 46, 46, 47, 50, 54, 55, 56, 59,
+        61, 61, 63, 65, 65, 66, 68, 69, 70, 71, 72, 71, 71, 72, 73, 73, 72, 72,
+        71, 71, 57, 54, 52, 52, 51, 50, 51, 53, 57, 58, 60, 64, 66, 68, 71, 73,
+        74, 76, 79, 81, 82, 83, 84, 83, 81, 80, 81, 82, 82, 81, 79, 78, 63, 60,
+        58, 57, 56, 54, 55, 57, 60, 62, 64, 67, 70, 71, 75, 78, 79, 82, 85, 87,
+        89, 90, 91, 93, 94, 91, 89, 90, 92, 90, 89, 87, 68, 64, 63, 61, 60, 59,
+        58, 60, 61, 64, 65, 67, 71, 72, 75, 78, 79, 83, 85, 87, 91, 92, 95, 96,
+        97, 99, 98, 99, 100, 100, 98, 96, 70, 66, 65, 63, 63, 62, 61, 60, 61,
+        63, 65, 66, 69, 70, 74, 74, 78, 79, 82, 84, 87, 89, 91, 94, 96, 100,
+        101, 103, 103, 105, 105, 105 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 38, 63, 86, 38, 56, 78, 97, 63, 78, 113, 130, 86, 97, 130, 169,
+        /* Size 8x8 */
+        32, 32, 35, 46, 57, 76, 85, 96, 32, 34, 37, 45, 54, 70, 79, 90, 35, 37,
+        48, 56, 64, 79, 87, 93, 46, 45, 56, 70, 80, 96, 100, 105, 57, 54, 64,
+        80, 93, 111, 121, 122, 76, 70, 79, 96, 111, 134, 138, 144, 85, 79, 87,
+        100, 121, 138, 156, 168, 96, 90, 93, 105, 122, 144, 168, 184,
+        /* Size 16x16 */
+        32, 31, 31, 32, 34, 39, 44, 49, 58, 65, 71, 81, 87, 93, 98, 104, 31, 32,
+        32, 32, 34, 38, 41, 46, 54, 60, 66, 75, 81, 86, 92, 98, 31, 32, 33, 34,
+        36, 39, 42, 46, 53, 59, 64, 73, 78, 83, 88, 94, 32, 32, 34, 35, 37, 40,
+        42, 46, 52, 58, 63, 71, 75, 80, 86, 92, 34, 34, 36, 37, 42, 47, 50, 53,
+        59, 65, 70, 77, 82, 85, 89, 92, 39, 38, 39, 40, 47, 54, 58, 62, 68, 73,
+        78, 85, 90, 90, 96, 98, 44, 41, 42, 42, 50, 58, 63, 68, 74, 79, 84, 91,
+        96, 98, 102, 104, 49, 46, 46, 46, 53, 62, 68, 73, 81, 87, 92, 99, 103,
+        107, 109, 112, 58, 54, 53, 52, 59, 68, 74, 81, 90, 97, 102, 110, 114,
+        118, 117, 121, 65, 60, 59, 58, 65, 73, 79, 87, 97, 105, 111, 120, 125,
+        125, 126, 130, 71, 66, 64, 63, 70, 78, 84, 92, 102, 111, 117, 127, 133,
+        134, 136, 141, 81, 75, 73, 71, 77, 85, 91, 99, 110, 120, 127, 137, 143,
+        145, 148, 152, 87, 81, 78, 75, 82, 90, 96, 103, 114, 125, 133, 143, 150,
+        156, 160, 163, 93, 86, 83, 80, 85, 90, 98, 107, 118, 125, 134, 145, 156,
+        163, 169, 177, 98, 92, 88, 86, 89, 96, 102, 109, 117, 126, 136, 148,
+        160, 169, 176, 184, 104, 98, 94, 92, 92, 98, 104, 112, 121, 130, 141,
+        152, 163, 177, 184, 191,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 32, 32, 34, 34, 36, 39, 41, 44, 48, 49, 54, 58, 59,
+        65, 69, 71, 80, 81, 83, 87, 90, 93, 95, 98, 101, 104, 107, 31, 32, 32,
+        32, 32, 32, 32, 34, 34, 35, 38, 39, 42, 46, 47, 51, 55, 57, 62, 66, 68,
+        76, 77, 78, 83, 85, 88, 90, 93, 96, 99, 101, 31, 32, 32, 32, 32, 32, 32,
+        33, 34, 34, 38, 39, 41, 45, 46, 50, 54, 55, 60, 64, 66, 73, 75, 76, 81,
+        83, 86, 89, 92, 95, 98, 101, 31, 32, 32, 32, 32, 32, 32, 33, 34, 34, 37,
+        38, 41, 44, 45, 49, 53, 54, 59, 63, 65, 72, 74, 75, 79, 81, 84, 86, 89,
+        91, 94, 97, 31, 32, 32, 32, 33, 33, 34, 35, 36, 36, 39, 40, 42, 45, 46,
+        50, 53, 54, 59, 63, 64, 71, 73, 74, 78, 80, 83, 85, 88, 91, 94, 97, 32,
+        32, 32, 32, 33, 34, 34, 36, 36, 37, 40, 40, 42, 45, 46, 49, 53, 54, 58,
+        62, 63, 70, 72, 73, 77, 79, 82, 85, 87, 90, 92, 95, 32, 32, 32, 32, 34,
+        34, 35, 37, 37, 38, 40, 41, 42, 45, 46, 49, 52, 54, 58, 61, 63, 69, 71,
+        72, 75, 78, 80, 83, 86, 89, 92, 95, 34, 34, 33, 33, 35, 36, 37, 39, 41,
+        42, 45, 46, 47, 50, 51, 54, 57, 59, 63, 66, 68, 74, 75, 76, 80, 81, 82,
+        83, 85, 87, 90, 93, 34, 34, 34, 34, 36, 36, 37, 41, 42, 45, 47, 48, 50,
+        53, 53, 56, 59, 61, 65, 68, 70, 76, 77, 78, 82, 83, 85, 88, 89, 90, 92,
+        93, 36, 35, 34, 34, 36, 37, 38, 42, 45, 48, 50, 51, 54, 56, 57, 60, 63,
+        64, 68, 71, 73, 79, 80, 81, 85, 87, 89, 89, 90, 93, 96, 99, 39, 38, 38,
+        37, 39, 40, 40, 45, 47, 50, 54, 55, 58, 61, 62, 65, 68, 69, 73, 76, 78,
+        84, 85, 86, 90, 89, 90, 93, 96, 97, 98, 99, 41, 39, 39, 38, 40, 40, 41,
+        46, 48, 51, 55, 56, 59, 62, 63, 67, 70, 71, 75, 78, 80, 86, 87, 88, 91,
+        93, 96, 97, 97, 99, 102, 105, 44, 42, 41, 41, 42, 42, 42, 47, 50, 54,
+        58, 59, 63, 66, 68, 71, 74, 75, 79, 83, 84, 90, 91, 92, 96, 98, 98, 99,
+        102, 104, 104, 105, 48, 46, 45, 44, 45, 45, 45, 50, 53, 56, 61, 62, 66,
+        70, 71, 76, 79, 80, 85, 88, 90, 96, 97, 98, 101, 100, 102, 105, 105,
+        105, 109, 112, 49, 47, 46, 45, 46, 46, 46, 51, 53, 57, 62, 63, 68, 71,
+        73, 77, 81, 82, 87, 90, 92, 98, 99, 100, 103, 106, 107, 106, 109, 112,
+        112, 112, 54, 51, 50, 49, 50, 49, 49, 54, 56, 60, 65, 67, 71, 76, 77,
+        82, 86, 87, 92, 96, 97, 104, 105, 106, 110, 110, 109, 113, 114, 113,
+        116, 120, 58, 55, 54, 53, 53, 53, 52, 57, 59, 63, 68, 70, 74, 79, 81,
+        86, 90, 91, 97, 100, 102, 109, 110, 111, 114, 114, 118, 116, 117, 121,
+        121, 120, 59, 57, 55, 54, 54, 54, 54, 59, 61, 64, 69, 71, 75, 80, 82,
+        87, 91, 93, 99, 102, 104, 111, 112, 113, 117, 121, 120, 122, 124, 122,
+        125, 129, 65, 62, 60, 59, 59, 58, 58, 63, 65, 68, 73, 75, 79, 85, 87,
+        92, 97, 99, 105, 109, 111, 118, 120, 121, 125, 124, 125, 127, 126, 130,
+        130, 129, 69, 66, 64, 63, 63, 62, 61, 66, 68, 71, 76, 78, 83, 88, 90,
+        96, 100, 102, 109, 113, 115, 123, 125, 126, 129, 130, 131, 130, 134,
+        133, 135, 139, 71, 68, 66, 65, 64, 63, 63, 68, 70, 73, 78, 80, 84, 90,
+        92, 97, 102, 104, 111, 115, 117, 125, 127, 128, 133, 136, 134, 139, 136,
+        139, 141, 140, 80, 76, 73, 72, 71, 70, 69, 74, 76, 79, 84, 86, 90, 96,
+        98, 104, 109, 111, 118, 123, 125, 134, 136, 137, 142, 138, 143, 140,
+        144, 144, 144, 149, 81, 77, 75, 74, 73, 72, 71, 75, 77, 80, 85, 87, 91,
+        97, 99, 105, 110, 112, 120, 125, 127, 136, 137, 139, 143, 148, 145, 148,
+        148, 150, 152, 149, 83, 78, 76, 75, 74, 73, 72, 76, 78, 81, 86, 88, 92,
+        98, 100, 106, 111, 113, 121, 126, 128, 137, 139, 140, 145, 149, 153,
+        153, 154, 155, 155, 161, 87, 83, 81, 79, 78, 77, 75, 80, 82, 85, 90, 91,
+        96, 101, 103, 110, 114, 117, 125, 129, 133, 142, 143, 145, 150, 151,
+        156, 159, 160, 160, 163, 161, 90, 85, 83, 81, 80, 79, 78, 81, 83, 87,
+        89, 93, 98, 100, 106, 110, 114, 121, 124, 130, 136, 138, 148, 149, 151,
+        156, 157, 162, 166, 168, 166, 172, 93, 88, 86, 84, 83, 82, 80, 82, 85,
+        89, 90, 96, 98, 102, 107, 109, 118, 120, 125, 131, 134, 143, 145, 153,
+        156, 157, 163, 164, 169, 172, 177, 172, 95, 90, 89, 86, 85, 85, 83, 83,
+        88, 89, 93, 97, 99, 105, 106, 113, 116, 122, 127, 130, 139, 140, 148,
+        153, 159, 162, 164, 169, 170, 176, 179, 185, 98, 93, 92, 89, 88, 87, 86,
+        85, 89, 90, 96, 97, 102, 105, 109, 114, 117, 124, 126, 134, 136, 144,
+        148, 154, 160, 166, 169, 170, 176, 177, 184, 186, 101, 96, 95, 91, 91,
+        90, 89, 87, 90, 93, 97, 99, 104, 105, 112, 113, 121, 122, 130, 133, 139,
+        144, 150, 155, 160, 168, 172, 176, 177, 184, 185, 191, 104, 99, 98, 94,
+        94, 92, 92, 90, 92, 96, 98, 102, 104, 109, 112, 116, 121, 125, 130, 135,
+        141, 144, 152, 155, 163, 166, 177, 179, 184, 185, 191, 192, 107, 101,
+        101, 97, 97, 95, 95, 93, 93, 99, 99, 105, 105, 112, 112, 120, 120, 129,
+        129, 139, 140, 149, 149, 161, 161, 172, 172, 185, 186, 191, 192, 199,
+        /* Size 4x8 */
+        32, 38, 62, 86, 32, 40, 58, 80, 34, 51, 68, 85, 44, 61, 85, 101, 54, 69,
+        98, 117, 72, 84, 118, 136, 82, 89, 129, 157, 92, 98, 127, 165,
+        /* Size 8x4 */
+        32, 32, 34, 44, 54, 72, 82, 92, 38, 40, 51, 61, 69, 84, 89, 98, 62, 58,
+        68, 85, 98, 118, 129, 127, 86, 80, 85, 101, 117, 136, 157, 165,
+        /* Size 8x16 */
+        32, 32, 36, 44, 58, 79, 88, 93, 31, 32, 35, 41, 54, 73, 81, 88, 32, 33,
+        36, 42, 53, 71, 78, 84, 32, 34, 38, 42, 52, 69, 76, 82, 34, 36, 44, 50,
+        59, 75, 81, 84, 39, 39, 50, 58, 68, 84, 88, 90, 44, 42, 53, 63, 74, 90,
+        97, 97, 49, 46, 57, 67, 81, 97, 104, 105, 57, 53, 63, 74, 90, 108, 111,
+        113, 65, 59, 68, 79, 97, 118, 123, 122, 71, 64, 73, 84, 102, 125, 135,
+        131, 81, 72, 80, 91, 110, 135, 145, 141, 87, 77, 85, 96, 114, 140, 148,
+        151, 92, 83, 88, 102, 117, 133, 153, 163, 98, 88, 89, 103, 121, 141,
+        160, 169, 103, 94, 92, 103, 119, 137, 158, 175,
+        /* Size 16x8 */
+        32, 31, 32, 32, 34, 39, 44, 49, 57, 65, 71, 81, 87, 92, 98, 103, 32, 32,
+        33, 34, 36, 39, 42, 46, 53, 59, 64, 72, 77, 83, 88, 94, 36, 35, 36, 38,
+        44, 50, 53, 57, 63, 68, 73, 80, 85, 88, 89, 92, 44, 41, 42, 42, 50, 58,
+        63, 67, 74, 79, 84, 91, 96, 102, 103, 103, 58, 54, 53, 52, 59, 68, 74,
+        81, 90, 97, 102, 110, 114, 117, 121, 119, 79, 73, 71, 69, 75, 84, 90,
+        97, 108, 118, 125, 135, 140, 133, 141, 137, 88, 81, 78, 76, 81, 88, 97,
+        104, 111, 123, 135, 145, 148, 153, 160, 158, 93, 88, 84, 82, 84, 90, 97,
+        105, 113, 122, 131, 141, 151, 163, 169, 175,
+        /* Size 16x32 */
+        32, 31, 32, 32, 36, 39, 44, 53, 58, 65, 79, 81, 88, 90, 93, 96, 31, 32,
+        32, 32, 35, 38, 42, 51, 55, 62, 75, 77, 83, 86, 88, 91, 31, 32, 32, 32,
+        35, 38, 41, 50, 54, 60, 73, 75, 81, 84, 88, 91, 31, 32, 32, 33, 34, 37,
+        41, 49, 53, 59, 72, 74, 79, 82, 84, 87, 32, 32, 33, 34, 36, 39, 42, 50,
+        53, 59, 71, 72, 78, 81, 84, 87, 32, 32, 34, 34, 37, 40, 42, 49, 53, 58,
+        70, 71, 77, 80, 83, 85, 32, 33, 34, 35, 38, 40, 42, 49, 52, 58, 69, 70,
+        76, 78, 82, 86, 34, 34, 35, 37, 42, 45, 48, 54, 57, 63, 73, 75, 79, 79,
+        81, 83, 34, 34, 36, 37, 44, 47, 50, 56, 59, 65, 75, 77, 81, 83, 84, 84,
+        36, 34, 37, 38, 48, 51, 54, 60, 63, 68, 78, 80, 85, 85, 86, 89, 39, 37,
+        39, 40, 50, 54, 58, 65, 68, 73, 84, 85, 88, 89, 90, 89, 40, 38, 40, 41,
+        51, 55, 59, 67, 70, 75, 85, 87, 91, 92, 92, 95, 44, 41, 42, 43, 53, 58,
+        63, 71, 74, 79, 90, 91, 97, 94, 97, 95, 47, 44, 45, 46, 56, 61, 66, 75,
+        79, 85, 95, 97, 99, 101, 98, 102, 49, 46, 46, 47, 57, 62, 67, 77, 81,
+        86, 97, 99, 104, 102, 105, 102, 53, 49, 50, 50, 60, 65, 71, 82, 86, 92,
+        103, 105, 109, 108, 106, 110, 57, 53, 53, 53, 63, 68, 74, 86, 90, 97,
+        108, 110, 111, 112, 113, 110, 59, 54, 54, 54, 64, 69, 75, 87, 91, 98,
+        111, 112, 119, 117, 115, 118, 65, 60, 59, 58, 68, 73, 79, 92, 97, 105,
+        118, 119, 123, 123, 122, 119, 69, 63, 62, 62, 71, 76, 83, 96, 100, 109,
+        122, 124, 127, 125, 125, 128, 71, 65, 64, 63, 73, 78, 84, 97, 102, 111,
+        125, 127, 135, 134, 131, 129, 79, 72, 71, 70, 79, 84, 90, 104, 109, 118,
+        133, 135, 137, 136, 136, 137, 81, 74, 72, 71, 80, 85, 91, 105, 110, 120,
+        135, 137, 145, 143, 141, 138, 82, 75, 73, 72, 81, 86, 92, 106, 111, 121,
+        136, 139, 147, 148, 147, 149, 87, 79, 77, 76, 85, 90, 96, 110, 114, 125,
+        140, 143, 148, 154, 151, 149, 90, 82, 80, 78, 87, 89, 99, 108, 113, 129,
+        135, 146, 153, 157, 160, 159, 92, 84, 83, 81, 88, 90, 102, 106, 117,
+        128, 133, 150, 153, 158, 163, 160, 95, 87, 85, 83, 88, 92, 103, 105,
+        120, 125, 137, 148, 155, 164, 168, 173, 98, 89, 88, 85, 89, 95, 103,
+        108, 121, 124, 141, 144, 160, 164, 169, 174, 100, 92, 91, 88, 90, 98,
+        103, 111, 120, 127, 139, 146, 161, 165, 175, 179, 103, 94, 94, 90, 92,
+        101, 103, 114, 119, 131, 137, 150, 158, 170, 175, 180, 106, 97, 97, 93,
+        93, 104, 104, 118, 118, 135, 135, 154, 155, 175, 176, 187,
+        /* Size 32x16 */
+        32, 31, 31, 31, 32, 32, 32, 34, 34, 36, 39, 40, 44, 47, 49, 53, 57, 59,
+        65, 69, 71, 79, 81, 82, 87, 90, 92, 95, 98, 100, 103, 106, 31, 32, 32,
+        32, 32, 32, 33, 34, 34, 34, 37, 38, 41, 44, 46, 49, 53, 54, 60, 63, 65,
+        72, 74, 75, 79, 82, 84, 87, 89, 92, 94, 97, 32, 32, 32, 32, 33, 34, 34,
+        35, 36, 37, 39, 40, 42, 45, 46, 50, 53, 54, 59, 62, 64, 71, 72, 73, 77,
+        80, 83, 85, 88, 91, 94, 97, 32, 32, 32, 33, 34, 34, 35, 37, 37, 38, 40,
+        41, 43, 46, 47, 50, 53, 54, 58, 62, 63, 70, 71, 72, 76, 78, 81, 83, 85,
+        88, 90, 93, 36, 35, 35, 34, 36, 37, 38, 42, 44, 48, 50, 51, 53, 56, 57,
+        60, 63, 64, 68, 71, 73, 79, 80, 81, 85, 87, 88, 88, 89, 90, 92, 93, 39,
+        38, 38, 37, 39, 40, 40, 45, 47, 51, 54, 55, 58, 61, 62, 65, 68, 69, 73,
+        76, 78, 84, 85, 86, 90, 89, 90, 92, 95, 98, 101, 104, 44, 42, 41, 41,
+        42, 42, 42, 48, 50, 54, 58, 59, 63, 66, 67, 71, 74, 75, 79, 83, 84, 90,
+        91, 92, 96, 99, 102, 103, 103, 103, 103, 104, 53, 51, 50, 49, 50, 49,
+        49, 54, 56, 60, 65, 67, 71, 75, 77, 82, 86, 87, 92, 96, 97, 104, 105,
+        106, 110, 108, 106, 105, 108, 111, 114, 118, 58, 55, 54, 53, 53, 53, 52,
+        57, 59, 63, 68, 70, 74, 79, 81, 86, 90, 91, 97, 100, 102, 109, 110, 111,
+        114, 113, 117, 120, 121, 120, 119, 118, 65, 62, 60, 59, 59, 58, 58, 63,
+        65, 68, 73, 75, 79, 85, 86, 92, 97, 98, 105, 109, 111, 118, 120, 121,
+        125, 129, 128, 125, 124, 127, 131, 135, 79, 75, 73, 72, 71, 70, 69, 73,
+        75, 78, 84, 85, 90, 95, 97, 103, 108, 111, 118, 122, 125, 133, 135, 136,
+        140, 135, 133, 137, 141, 139, 137, 135, 81, 77, 75, 74, 72, 71, 70, 75,
+        77, 80, 85, 87, 91, 97, 99, 105, 110, 112, 119, 124, 127, 135, 137, 139,
+        143, 146, 150, 148, 144, 146, 150, 154, 88, 83, 81, 79, 78, 77, 76, 79,
+        81, 85, 88, 91, 97, 99, 104, 109, 111, 119, 123, 127, 135, 137, 145,
+        147, 148, 153, 153, 155, 160, 161, 158, 155, 90, 86, 84, 82, 81, 80, 78,
+        79, 83, 85, 89, 92, 94, 101, 102, 108, 112, 117, 123, 125, 134, 136,
+        143, 148, 154, 157, 158, 164, 164, 165, 170, 175, 93, 88, 88, 84, 84,
+        83, 82, 81, 84, 86, 90, 92, 97, 98, 105, 106, 113, 115, 122, 125, 131,
+        136, 141, 147, 151, 160, 163, 168, 169, 175, 175, 176, 96, 91, 91, 87,
+        87, 85, 86, 83, 84, 89, 89, 95, 95, 102, 102, 110, 110, 118, 119, 128,
+        129, 137, 138, 149, 149, 159, 160, 173, 174, 179, 180, 187,
+        /* Size 4x16 */
+        31, 39, 65, 90, 32, 38, 60, 84, 32, 39, 59, 81, 33, 40, 58, 78, 34, 47,
+        65, 83, 37, 54, 73, 89, 41, 58, 79, 94, 46, 62, 86, 102, 53, 68, 97,
+        112, 60, 73, 105, 123, 65, 78, 111, 134, 74, 85, 120, 143, 79, 90, 125,
+        154, 84, 90, 128, 158, 89, 95, 124, 164, 94, 101, 131, 170,
+        /* Size 16x4 */
+        31, 32, 32, 33, 34, 37, 41, 46, 53, 60, 65, 74, 79, 84, 89, 94, 39, 38,
+        39, 40, 47, 54, 58, 62, 68, 73, 78, 85, 90, 90, 95, 101, 65, 60, 59, 58,
+        65, 73, 79, 86, 97, 105, 111, 120, 125, 128, 124, 131, 90, 84, 81, 78,
+        83, 89, 94, 102, 112, 123, 134, 143, 154, 158, 164, 170,
+        /* Size 8x32 */
+        32, 32, 36, 44, 58, 79, 88, 93, 31, 32, 35, 42, 55, 75, 83, 88, 31, 32,
+        35, 41, 54, 73, 81, 88, 31, 32, 34, 41, 53, 72, 79, 84, 32, 33, 36, 42,
+        53, 71, 78, 84, 32, 34, 37, 42, 53, 70, 77, 83, 32, 34, 38, 42, 52, 69,
+        76, 82, 34, 35, 42, 48, 57, 73, 79, 81, 34, 36, 44, 50, 59, 75, 81, 84,
+        36, 37, 48, 54, 63, 78, 85, 86, 39, 39, 50, 58, 68, 84, 88, 90, 40, 40,
+        51, 59, 70, 85, 91, 92, 44, 42, 53, 63, 74, 90, 97, 97, 47, 45, 56, 66,
+        79, 95, 99, 98, 49, 46, 57, 67, 81, 97, 104, 105, 53, 50, 60, 71, 86,
+        103, 109, 106, 57, 53, 63, 74, 90, 108, 111, 113, 59, 54, 64, 75, 91,
+        111, 119, 115, 65, 59, 68, 79, 97, 118, 123, 122, 69, 62, 71, 83, 100,
+        122, 127, 125, 71, 64, 73, 84, 102, 125, 135, 131, 79, 71, 79, 90, 109,
+        133, 137, 136, 81, 72, 80, 91, 110, 135, 145, 141, 82, 73, 81, 92, 111,
+        136, 147, 147, 87, 77, 85, 96, 114, 140, 148, 151, 90, 80, 87, 99, 113,
+        135, 153, 160, 92, 83, 88, 102, 117, 133, 153, 163, 95, 85, 88, 103,
+        120, 137, 155, 168, 98, 88, 89, 103, 121, 141, 160, 169, 100, 91, 90,
+        103, 120, 139, 161, 175, 103, 94, 92, 103, 119, 137, 158, 175, 106, 97,
+        93, 104, 118, 135, 155, 176,
+        /* Size 32x8 */
+        32, 31, 31, 31, 32, 32, 32, 34, 34, 36, 39, 40, 44, 47, 49, 53, 57, 59,
+        65, 69, 71, 79, 81, 82, 87, 90, 92, 95, 98, 100, 103, 106, 32, 32, 32,
+        32, 33, 34, 34, 35, 36, 37, 39, 40, 42, 45, 46, 50, 53, 54, 59, 62, 64,
+        71, 72, 73, 77, 80, 83, 85, 88, 91, 94, 97, 36, 35, 35, 34, 36, 37, 38,
+        42, 44, 48, 50, 51, 53, 56, 57, 60, 63, 64, 68, 71, 73, 79, 80, 81, 85,
+        87, 88, 88, 89, 90, 92, 93, 44, 42, 41, 41, 42, 42, 42, 48, 50, 54, 58,
+        59, 63, 66, 67, 71, 74, 75, 79, 83, 84, 90, 91, 92, 96, 99, 102, 103,
+        103, 103, 103, 104, 58, 55, 54, 53, 53, 53, 52, 57, 59, 63, 68, 70, 74,
+        79, 81, 86, 90, 91, 97, 100, 102, 109, 110, 111, 114, 113, 117, 120,
+        121, 120, 119, 118, 79, 75, 73, 72, 71, 70, 69, 73, 75, 78, 84, 85, 90,
+        95, 97, 103, 108, 111, 118, 122, 125, 133, 135, 136, 140, 135, 133, 137,
+        141, 139, 137, 135, 88, 83, 81, 79, 78, 77, 76, 79, 81, 85, 88, 91, 97,
+        99, 104, 109, 111, 119, 123, 127, 135, 137, 145, 147, 148, 153, 153,
+        155, 160, 161, 158, 155, 93, 88, 88, 84, 84, 83, 82, 81, 84, 86, 90, 92,
+        97, 98, 105, 106, 113, 115, 122, 125, 131, 136, 141, 147, 151, 160, 163,
+        168, 169, 175, 175, 176 },
+      { /* Chroma */
+        /* Size 4x4 */
+        32, 45, 53, 63, 45, 55, 62, 67, 53, 62, 80, 84, 63, 67, 84, 101,
+        /* Size 8x8 */
+        31, 36, 47, 48, 52, 60, 64, 67, 36, 43, 47, 46, 49, 55, 59, 63, 47, 47,
+        53, 54, 55, 60, 63, 64, 48, 46, 54, 61, 65, 70, 71, 71, 52, 49, 55, 65,
+        71, 78, 81, 79, 60, 55, 60, 70, 78, 89, 89, 89, 64, 59, 63, 71, 81, 89,
+        97, 99, 67, 63, 64, 71, 79, 89, 99, 104,
+        /* Size 16x16 */
+        32, 30, 33, 36, 44, 48, 49, 51, 54, 57, 60, 64, 67, 68, 70, 72, 30, 31,
+        35, 39, 44, 46, 46, 47, 50, 53, 55, 59, 61, 64, 66, 68, 33, 35, 39, 43,
+        46, 46, 45, 47, 49, 51, 53, 57, 59, 61, 63, 65, 36, 39, 43, 47, 47, 46,
+        45, 46, 48, 50, 52, 55, 57, 58, 61, 63, 44, 44, 46, 47, 50, 51, 51, 51,
+        53, 54, 56, 59, 61, 61, 63, 62, 48, 46, 46, 46, 51, 54, 55, 56, 58, 60,
+        61, 64, 65, 64, 66, 66, 49, 46, 45, 45, 51, 55, 58, 60, 62, 63, 65, 68,
+        69, 69, 69, 69, 51, 47, 47, 46, 51, 56, 60, 62, 65, 67, 69, 72, 73, 74,
+        73, 73, 54, 50, 49, 48, 53, 58, 62, 65, 70, 73, 75, 78, 79, 79, 77, 77,
+        57, 53, 51, 50, 54, 60, 63, 67, 73, 76, 79, 82, 84, 83, 82, 82, 60, 55,
+        53, 52, 56, 61, 65, 69, 75, 79, 82, 86, 88, 87, 86, 87, 64, 59, 57, 55,
+        59, 64, 68, 72, 78, 82, 86, 90, 93, 92, 91, 92, 67, 61, 59, 57, 61, 65,
+        69, 73, 79, 84, 88, 93, 95, 96, 96, 96, 68, 64, 61, 58, 61, 64, 69, 74,
+        79, 83, 87, 92, 96, 99, 100, 101, 70, 66, 63, 61, 63, 66, 69, 73, 77,
+        82, 86, 91, 96, 100, 103, 104, 72, 68, 65, 63, 62, 66, 69, 73, 77, 82,
+        87, 92, 96, 101, 104, 106,
+        /* Size 32x32 */
+        32, 31, 30, 30, 33, 35, 36, 41, 44, 49, 48, 48, 49, 50, 51, 52, 54, 55,
+        57, 59, 60, 63, 64, 65, 67, 68, 68, 69, 70, 71, 72, 73, 31, 31, 31, 31,
+        34, 36, 38, 42, 44, 47, 47, 47, 47, 48, 48, 50, 51, 52, 54, 56, 57, 60,
+        61, 61, 63, 64, 65, 66, 67, 67, 68, 69, 30, 31, 31, 31, 35, 37, 39, 42,
+        44, 47, 46, 46, 46, 47, 47, 48, 50, 51, 53, 54, 55, 58, 59, 60, 61, 63,
+        64, 65, 66, 67, 68, 69, 30, 31, 31, 32, 35, 37, 40, 42, 44, 46, 45, 45,
+        45, 46, 46, 47, 49, 50, 52, 53, 54, 57, 58, 58, 60, 61, 62, 63, 63, 64,
+        65, 66, 33, 34, 35, 35, 39, 41, 43, 45, 46, 47, 46, 46, 45, 46, 47, 47,
+        49, 49, 51, 53, 53, 56, 57, 57, 59, 60, 61, 62, 63, 64, 65, 66, 35, 36,
+        37, 37, 41, 43, 45, 46, 46, 47, 46, 46, 45, 46, 46, 47, 48, 49, 50, 52,
+        53, 55, 56, 56, 58, 59, 60, 61, 62, 63, 64, 64, 36, 38, 39, 40, 43, 45,
+        47, 47, 47, 48, 46, 46, 45, 46, 46, 47, 48, 48, 50, 51, 52, 54, 55, 55,
+        57, 58, 58, 59, 61, 62, 63, 64, 41, 42, 42, 42, 45, 46, 47, 48, 49, 50,
+        49, 49, 49, 50, 50, 50, 51, 52, 53, 54, 55, 57, 58, 58, 60, 60, 59, 59,
+        60, 61, 61, 62, 44, 44, 44, 44, 46, 46, 47, 49, 50, 51, 51, 51, 51, 51,
+        51, 52, 53, 53, 54, 56, 56, 59, 59, 59, 61, 61, 61, 62, 63, 62, 62, 62,
+        49, 47, 47, 46, 47, 47, 48, 50, 51, 53, 53, 53, 53, 54, 54, 54, 55, 55,
+        56, 58, 58, 60, 61, 61, 63, 63, 64, 63, 63, 64, 65, 66, 48, 47, 46, 45,
+        46, 46, 46, 49, 51, 53, 54, 54, 55, 56, 56, 57, 58, 59, 60, 61, 61, 63,
+        64, 64, 65, 65, 64, 65, 66, 66, 66, 66, 48, 47, 46, 45, 46, 46, 46, 49,
+        51, 53, 54, 55, 56, 57, 57, 58, 59, 60, 61, 62, 63, 65, 65, 65, 66, 67,
+        68, 67, 67, 67, 68, 69, 49, 47, 46, 45, 45, 45, 45, 49, 51, 53, 55, 56,
+        58, 59, 60, 61, 62, 62, 63, 65, 65, 67, 68, 68, 69, 70, 69, 69, 69, 70,
+        69, 69, 50, 48, 47, 46, 46, 46, 46, 50, 51, 54, 56, 57, 59, 61, 62, 63,
+        64, 65, 66, 68, 68, 70, 71, 71, 72, 71, 71, 72, 71, 71, 71, 72, 51, 48,
+        47, 46, 47, 46, 46, 50, 51, 54, 56, 57, 60, 62, 62, 64, 65, 66, 67, 69,
+        69, 71, 72, 72, 73, 74, 74, 72, 73, 74, 73, 73, 52, 50, 48, 47, 47, 47,
+        47, 50, 52, 54, 57, 58, 61, 63, 64, 66, 68, 68, 70, 72, 72, 75, 75, 75,
+        77, 76, 75, 76, 76, 74, 75, 76, 54, 51, 50, 49, 49, 48, 48, 51, 53, 55,
+        58, 59, 62, 64, 65, 68, 70, 70, 73, 74, 75, 77, 78, 78, 79, 78, 79, 78,
+        77, 78, 77, 77, 55, 52, 51, 50, 49, 49, 48, 52, 53, 55, 59, 60, 62, 65,
+        66, 68, 70, 71, 73, 75, 76, 78, 79, 79, 80, 81, 80, 80, 81, 79, 79, 81,
+        57, 54, 53, 52, 51, 50, 50, 53, 54, 56, 60, 61, 63, 66, 67, 70, 73, 73,
+        76, 78, 79, 82, 82, 83, 84, 83, 83, 83, 82, 83, 82, 81, 59, 56, 54, 53,
+        53, 52, 51, 54, 56, 58, 61, 62, 65, 68, 69, 72, 74, 75, 78, 80, 81, 84,
+        85, 85, 86, 86, 86, 84, 85, 84, 84, 85, 60, 57, 55, 54, 53, 53, 52, 55,
+        56, 58, 61, 63, 65, 68, 69, 72, 75, 76, 79, 81, 82, 85, 86, 86, 88, 88,
+        87, 88, 86, 87, 87, 85, 63, 60, 58, 57, 56, 55, 54, 57, 59, 60, 63, 65,
+        67, 70, 71, 75, 77, 78, 82, 84, 85, 89, 89, 90, 92, 89, 91, 89, 90, 89,
+        88, 89, 64, 61, 59, 58, 57, 56, 55, 58, 59, 61, 64, 65, 68, 71, 72, 75,
+        78, 79, 82, 85, 86, 89, 90, 91, 93, 94, 92, 92, 91, 91, 92, 90, 65, 61,
+        60, 58, 57, 56, 55, 58, 59, 61, 64, 65, 68, 71, 72, 75, 78, 79, 83, 85,
+        86, 90, 91, 91, 93, 94, 95, 94, 94, 94, 93, 94, 67, 63, 61, 60, 59, 58,
+        57, 60, 61, 63, 65, 66, 69, 72, 73, 77, 79, 80, 84, 86, 88, 92, 93, 93,
+        95, 95, 96, 97, 96, 95, 96, 94, 68, 64, 63, 61, 60, 59, 58, 60, 61, 63,
+        65, 67, 70, 71, 74, 76, 78, 81, 83, 86, 88, 89, 94, 94, 95, 97, 97, 98,
+        99, 99, 97, 99, 68, 65, 64, 62, 61, 60, 58, 59, 61, 64, 64, 68, 69, 71,
+        74, 75, 79, 80, 83, 86, 87, 91, 92, 95, 96, 97, 99, 99, 100, 100, 101,
+        99, 69, 66, 65, 63, 62, 61, 59, 59, 62, 63, 65, 67, 69, 72, 72, 76, 78,
+        80, 83, 84, 88, 89, 92, 94, 97, 98, 99, 101, 100, 102, 102, 104, 70, 67,
+        66, 63, 63, 62, 61, 60, 63, 63, 66, 67, 69, 71, 73, 76, 77, 81, 82, 85,
+        86, 90, 91, 94, 96, 99, 100, 100, 103, 102, 104, 104, 71, 67, 67, 64,
+        64, 63, 62, 61, 62, 64, 66, 67, 70, 71, 74, 74, 78, 79, 83, 84, 87, 89,
+        91, 94, 95, 99, 100, 102, 102, 104, 104, 106, 72, 68, 68, 65, 65, 64,
+        63, 61, 62, 65, 66, 68, 69, 71, 73, 75, 77, 79, 82, 84, 87, 88, 92, 93,
+        96, 97, 101, 102, 104, 104, 106, 106, 73, 69, 69, 66, 66, 64, 64, 62,
+        62, 66, 66, 69, 69, 72, 73, 76, 77, 81, 81, 85, 85, 89, 90, 94, 94, 99,
+        99, 104, 104, 106, 106, 108,
+        /* Size 4x8 */
+        31, 47, 54, 64, 38, 46, 50, 60, 46, 53, 57, 62, 46, 56, 66, 71, 50, 59,
+        74, 79, 57, 64, 82, 88, 61, 65, 85, 97, 65, 67, 82, 99,
+        /* Size 8x4 */
+        31, 38, 46, 46, 50, 57, 61, 65, 47, 46, 53, 56, 59, 64, 65, 67, 54, 50,
+        57, 66, 74, 82, 85, 82, 64, 60, 62, 71, 79, 88, 97, 99,
+        /* Size 8x16 */
+        32, 34, 48, 49, 54, 63, 67, 69, 31, 36, 46, 46, 50, 58, 62, 65, 33, 40,
+        47, 46, 49, 56, 59, 62, 37, 44, 47, 45, 48, 54, 57, 60, 44, 46, 51, 51,
+        53, 59, 60, 61, 48, 46, 53, 56, 58, 64, 64, 64, 49, 45, 53, 58, 62, 67,
+        70, 68, 51, 47, 54, 60, 65, 71, 73, 72, 54, 49, 55, 62, 70, 77, 77, 76,
+        57, 51, 56, 64, 73, 82, 83, 81, 60, 53, 58, 65, 75, 85, 89, 85, 64, 57,
+        61, 68, 78, 89, 93, 89, 66, 59, 63, 69, 79, 91, 94, 93, 68, 61, 63, 71,
+        79, 87, 96, 98, 70, 63, 63, 70, 80, 89, 97, 100, 72, 65, 63, 69, 77, 86,
+        95, 102,
+        /* Size 16x8 */
+        32, 31, 33, 37, 44, 48, 49, 51, 54, 57, 60, 64, 66, 68, 70, 72, 34, 36,
+        40, 44, 46, 46, 45, 47, 49, 51, 53, 57, 59, 61, 63, 65, 48, 46, 47, 47,
+        51, 53, 53, 54, 55, 56, 58, 61, 63, 63, 63, 63, 49, 46, 46, 45, 51, 56,
+        58, 60, 62, 64, 65, 68, 69, 71, 70, 69, 54, 50, 49, 48, 53, 58, 62, 65,
+        70, 73, 75, 78, 79, 79, 80, 77, 63, 58, 56, 54, 59, 64, 67, 71, 77, 82,
+        85, 89, 91, 87, 89, 86, 67, 62, 59, 57, 60, 64, 70, 73, 77, 83, 89, 93,
+        94, 96, 97, 95, 69, 65, 62, 60, 61, 64, 68, 72, 76, 81, 85, 89, 93, 98,
+        100, 102,
+        /* Size 16x32 */
+        32, 31, 34, 37, 48, 48, 49, 52, 54, 57, 63, 64, 67, 68, 69, 69, 31, 31,
+        35, 38, 47, 47, 47, 50, 51, 54, 60, 61, 63, 64, 65, 66, 31, 32, 36, 39,
+        46, 46, 46, 48, 50, 53, 58, 59, 62, 63, 65, 66, 30, 32, 36, 40, 46, 45,
+        45, 48, 49, 52, 57, 58, 60, 61, 62, 63, 33, 36, 40, 43, 47, 46, 46, 47,
+        49, 51, 56, 57, 59, 60, 62, 63, 35, 38, 42, 45, 47, 46, 45, 47, 48, 50,
+        55, 56, 58, 60, 61, 61, 37, 40, 44, 47, 47, 46, 45, 47, 48, 50, 54, 55,
+        57, 58, 60, 61, 42, 43, 45, 47, 50, 50, 49, 50, 51, 53, 57, 58, 59, 58,
+        59, 59, 44, 44, 46, 47, 51, 51, 51, 52, 53, 54, 59, 59, 60, 61, 61, 60,
+        49, 46, 47, 48, 53, 53, 53, 54, 55, 57, 60, 61, 63, 62, 62, 63, 48, 46,
+        46, 47, 53, 54, 56, 57, 58, 60, 64, 64, 64, 64, 64, 63, 48, 45, 46, 46,
+        53, 55, 56, 58, 59, 61, 65, 65, 66, 66, 65, 66, 49, 45, 45, 46, 53, 56,
+        58, 61, 62, 64, 67, 68, 70, 67, 68, 66, 50, 46, 46, 46, 54, 56, 59, 63,
+        65, 66, 70, 71, 70, 71, 68, 70, 51, 47, 47, 47, 54, 57, 60, 64, 65, 68,
+        71, 72, 73, 71, 72, 70, 52, 48, 47, 47, 54, 57, 61, 66, 68, 71, 75, 75,
+        76, 75, 73, 73, 54, 49, 49, 48, 55, 58, 62, 68, 70, 73, 77, 78, 77, 77,
+        76, 74, 54, 50, 49, 49, 55, 59, 62, 68, 70, 74, 78, 79, 81, 79, 77, 78,
+        57, 52, 51, 50, 56, 60, 64, 70, 73, 76, 82, 82, 83, 82, 81, 78, 59, 54,
+        52, 52, 58, 61, 65, 72, 74, 78, 84, 85, 85, 83, 82, 82, 60, 54, 53, 52,
+        58, 62, 65, 72, 75, 79, 85, 86, 89, 87, 85, 82, 63, 57, 56, 55, 60, 64,
+        67, 75, 77, 82, 89, 90, 90, 88, 87, 86, 64, 58, 57, 55, 61, 64, 68, 75,
+        78, 82, 89, 90, 93, 91, 89, 87, 64, 59, 57, 56, 61, 65, 68, 75, 78, 83,
+        90, 91, 94, 93, 92, 91, 66, 60, 59, 57, 63, 66, 69, 77, 79, 84, 91, 93,
+        94, 95, 93, 91, 67, 61, 60, 58, 63, 65, 70, 75, 78, 85, 88, 93, 96, 97,
+        97, 95, 68, 62, 61, 59, 63, 64, 71, 74, 79, 84, 87, 94, 96, 97, 98, 96,
+        69, 63, 62, 60, 63, 65, 71, 72, 80, 82, 88, 93, 96, 99, 100, 101, 70,
+        64, 63, 60, 63, 66, 70, 73, 80, 81, 89, 90, 97, 99, 100, 101, 71, 65,
+        64, 61, 63, 67, 70, 74, 78, 82, 88, 90, 97, 99, 102, 103, 72, 65, 65,
+        62, 63, 68, 69, 75, 77, 83, 86, 92, 95, 100, 102, 103, 73, 66, 66, 63,
+        63, 69, 69, 76, 76, 84, 84, 93, 93, 101, 101, 105,
+        /* Size 32x16 */
+        32, 31, 31, 30, 33, 35, 37, 42, 44, 49, 48, 48, 49, 50, 51, 52, 54, 54,
+        57, 59, 60, 63, 64, 64, 66, 67, 68, 69, 70, 71, 72, 73, 31, 31, 32, 32,
+        36, 38, 40, 43, 44, 46, 46, 45, 45, 46, 47, 48, 49, 50, 52, 54, 54, 57,
+        58, 59, 60, 61, 62, 63, 64, 65, 65, 66, 34, 35, 36, 36, 40, 42, 44, 45,
+        46, 47, 46, 46, 45, 46, 47, 47, 49, 49, 51, 52, 53, 56, 57, 57, 59, 60,
+        61, 62, 63, 64, 65, 66, 37, 38, 39, 40, 43, 45, 47, 47, 47, 48, 47, 46,
+        46, 46, 47, 47, 48, 49, 50, 52, 52, 55, 55, 56, 57, 58, 59, 60, 60, 61,
+        62, 63, 48, 47, 46, 46, 47, 47, 47, 50, 51, 53, 53, 53, 53, 54, 54, 54,
+        55, 55, 56, 58, 58, 60, 61, 61, 63, 63, 63, 63, 63, 63, 63, 63, 48, 47,
+        46, 45, 46, 46, 46, 50, 51, 53, 54, 55, 56, 56, 57, 57, 58, 59, 60, 61,
+        62, 64, 64, 65, 66, 65, 64, 65, 66, 67, 68, 69, 49, 47, 46, 45, 46, 45,
+        45, 49, 51, 53, 56, 56, 58, 59, 60, 61, 62, 62, 64, 65, 65, 67, 68, 68,
+        69, 70, 71, 71, 70, 70, 69, 69, 52, 50, 48, 48, 47, 47, 47, 50, 52, 54,
+        57, 58, 61, 63, 64, 66, 68, 68, 70, 72, 72, 75, 75, 75, 77, 75, 74, 72,
+        73, 74, 75, 76, 54, 51, 50, 49, 49, 48, 48, 51, 53, 55, 58, 59, 62, 65,
+        65, 68, 70, 70, 73, 74, 75, 77, 78, 78, 79, 78, 79, 80, 80, 78, 77, 76,
+        57, 54, 53, 52, 51, 50, 50, 53, 54, 57, 60, 61, 64, 66, 68, 71, 73, 74,
+        76, 78, 79, 82, 82, 83, 84, 85, 84, 82, 81, 82, 83, 84, 63, 60, 58, 57,
+        56, 55, 54, 57, 59, 60, 64, 65, 67, 70, 71, 75, 77, 78, 82, 84, 85, 89,
+        89, 90, 91, 88, 87, 88, 89, 88, 86, 84, 64, 61, 59, 58, 57, 56, 55, 58,
+        59, 61, 64, 65, 68, 71, 72, 75, 78, 79, 82, 85, 86, 90, 90, 91, 93, 93,
+        94, 93, 90, 90, 92, 93, 67, 63, 62, 60, 59, 58, 57, 59, 60, 63, 64, 66,
+        70, 70, 73, 76, 77, 81, 83, 85, 89, 90, 93, 94, 94, 96, 96, 96, 97, 97,
+        95, 93, 68, 64, 63, 61, 60, 60, 58, 58, 61, 62, 64, 66, 67, 71, 71, 75,
+        77, 79, 82, 83, 87, 88, 91, 93, 95, 97, 97, 99, 99, 99, 100, 101, 69,
+        65, 65, 62, 62, 61, 60, 59, 61, 62, 64, 65, 68, 68, 72, 73, 76, 77, 81,
+        82, 85, 87, 89, 92, 93, 97, 98, 100, 100, 102, 102, 101, 69, 66, 66, 63,
+        63, 61, 61, 59, 60, 63, 63, 66, 66, 70, 70, 73, 74, 78, 78, 82, 82, 86,
+        87, 91, 91, 95, 96, 101, 101, 103, 103, 105,
+        /* Size 4x16 */
+        31, 48, 57, 68, 32, 46, 53, 63, 36, 46, 51, 60, 40, 46, 50, 58, 44, 51,
+        54, 61, 46, 54, 60, 64, 45, 56, 64, 67, 47, 57, 68, 71, 49, 58, 73, 77,
+        52, 60, 76, 82, 54, 62, 79, 87, 58, 64, 82, 91, 60, 66, 84, 95, 62, 64,
+        84, 97, 64, 66, 81, 99, 65, 68, 83, 100,
+        /* Size 16x4 */
+        31, 32, 36, 40, 44, 46, 45, 47, 49, 52, 54, 58, 60, 62, 64, 65, 48, 46,
+        46, 46, 51, 54, 56, 57, 58, 60, 62, 64, 66, 64, 66, 68, 57, 53, 51, 50,
+        54, 60, 64, 68, 73, 76, 79, 82, 84, 84, 81, 83, 68, 63, 60, 58, 61, 64,
+        67, 71, 77, 82, 87, 91, 95, 97, 99, 100,
+        /* Size 8x32 */
+        32, 34, 48, 49, 54, 63, 67, 69, 31, 35, 47, 47, 51, 60, 63, 65, 31, 36,
+        46, 46, 50, 58, 62, 65, 30, 36, 46, 45, 49, 57, 60, 62, 33, 40, 47, 46,
+        49, 56, 59, 62, 35, 42, 47, 45, 48, 55, 58, 61, 37, 44, 47, 45, 48, 54,
+        57, 60, 42, 45, 50, 49, 51, 57, 59, 59, 44, 46, 51, 51, 53, 59, 60, 61,
+        49, 47, 53, 53, 55, 60, 63, 62, 48, 46, 53, 56, 58, 64, 64, 64, 48, 46,
+        53, 56, 59, 65, 66, 65, 49, 45, 53, 58, 62, 67, 70, 68, 50, 46, 54, 59,
+        65, 70, 70, 68, 51, 47, 54, 60, 65, 71, 73, 72, 52, 47, 54, 61, 68, 75,
+        76, 73, 54, 49, 55, 62, 70, 77, 77, 76, 54, 49, 55, 62, 70, 78, 81, 77,
+        57, 51, 56, 64, 73, 82, 83, 81, 59, 52, 58, 65, 74, 84, 85, 82, 60, 53,
+        58, 65, 75, 85, 89, 85, 63, 56, 60, 67, 77, 89, 90, 87, 64, 57, 61, 68,
+        78, 89, 93, 89, 64, 57, 61, 68, 78, 90, 94, 92, 66, 59, 63, 69, 79, 91,
+        94, 93, 67, 60, 63, 70, 78, 88, 96, 97, 68, 61, 63, 71, 79, 87, 96, 98,
+        69, 62, 63, 71, 80, 88, 96, 100, 70, 63, 63, 70, 80, 89, 97, 100, 71,
+        64, 63, 70, 78, 88, 97, 102, 72, 65, 63, 69, 77, 86, 95, 102, 73, 66,
+        63, 69, 76, 84, 93, 101,
+        /* Size 32x8 */
+        32, 31, 31, 30, 33, 35, 37, 42, 44, 49, 48, 48, 49, 50, 51, 52, 54, 54,
+        57, 59, 60, 63, 64, 64, 66, 67, 68, 69, 70, 71, 72, 73, 34, 35, 36, 36,
+        40, 42, 44, 45, 46, 47, 46, 46, 45, 46, 47, 47, 49, 49, 51, 52, 53, 56,
+        57, 57, 59, 60, 61, 62, 63, 64, 65, 66, 48, 47, 46, 46, 47, 47, 47, 50,
+        51, 53, 53, 53, 53, 54, 54, 54, 55, 55, 56, 58, 58, 60, 61, 61, 63, 63,
+        63, 63, 63, 63, 63, 63, 49, 47, 46, 45, 46, 45, 45, 49, 51, 53, 56, 56,
+        58, 59, 60, 61, 62, 62, 64, 65, 65, 67, 68, 68, 69, 70, 71, 71, 70, 70,
+        69, 69, 54, 51, 50, 49, 49, 48, 48, 51, 53, 55, 58, 59, 62, 65, 65, 68,
+        70, 70, 73, 74, 75, 77, 78, 78, 79, 78, 79, 80, 80, 78, 77, 76, 63, 60,
+        58, 57, 56, 55, 54, 57, 59, 60, 64, 65, 67, 70, 71, 75, 77, 78, 82, 84,
+        85, 89, 89, 90, 91, 88, 87, 88, 89, 88, 86, 84, 67, 63, 62, 60, 59, 58,
+        57, 59, 60, 63, 64, 66, 70, 70, 73, 76, 77, 81, 83, 85, 89, 90, 93, 94,
+        94, 96, 96, 96, 97, 97, 95, 93, 69, 65, 65, 62, 62, 61, 60, 59, 61, 62,
+        64, 65, 68, 68, 72, 73, 76, 77, 81, 82, 85, 87, 89, 92, 93, 97, 98, 100,
+        100, 102, 102, 101 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 37, 58, 81, 37, 54, 72, 91, 58, 72, 102, 121, 81, 91, 121, 156,
+        /* Size 8x8 */
+        32, 32, 35, 42, 53, 68, 78, 90, 32, 33, 36, 42, 51, 64, 74, 84, 35, 36,
+        46, 52, 60, 72, 80, 87, 42, 42, 52, 63, 73, 84, 92, 98, 53, 51, 60, 73,
+        86, 100, 109, 114, 68, 64, 72, 84, 100, 117, 128, 133, 78, 74, 80, 92,
+        109, 128, 140, 155, 90, 84, 87, 98, 114, 133, 155, 168,
+        /* Size 16x16 */
+        32, 31, 31, 32, 34, 36, 41, 47, 54, 59, 65, 74, 82, 87, 92, 97, 31, 32,
+        32, 32, 34, 35, 39, 45, 50, 55, 61, 69, 76, 81, 87, 92, 31, 32, 33, 33,
+        35, 36, 40, 44, 49, 54, 59, 67, 73, 78, 83, 88, 32, 32, 33, 35, 37, 38,
+        41, 45, 49, 53, 58, 65, 71, 75, 80, 86, 34, 34, 35, 37, 39, 42, 46, 50,
+        54, 58, 63, 70, 76, 80, 84, 85, 36, 35, 36, 38, 42, 48, 52, 56, 60, 64,
+        68, 75, 80, 85, 90, 91, 41, 39, 40, 41, 46, 52, 57, 62, 67, 71, 75, 83,
+        88, 92, 95, 97, 47, 45, 44, 45, 50, 56, 62, 69, 75, 79, 84, 91, 97, 100,
+        102, 104, 54, 50, 49, 49, 54, 60, 67, 75, 82, 87, 92, 100, 106, 110,
+        109, 112, 59, 55, 54, 53, 58, 64, 71, 79, 87, 92, 98, 106, 112, 117,
+        117, 121, 65, 61, 59, 58, 63, 68, 75, 84, 92, 98, 105, 114, 120, 125,
+        126, 130, 74, 69, 67, 65, 70, 75, 83, 91, 100, 106, 114, 123, 131, 135,
+        137, 140, 82, 76, 73, 71, 76, 80, 88, 97, 106, 112, 120, 131, 139, 144,
+        148, 150, 87, 81, 78, 75, 80, 85, 92, 100, 110, 117, 125, 135, 144, 150,
+        155, 162, 92, 87, 83, 80, 84, 90, 95, 102, 109, 117, 126, 137, 148, 155,
+        162, 168, 97, 92, 88, 86, 85, 91, 97, 104, 112, 121, 130, 140, 150, 162,
+        168, 174,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 32, 32, 34, 35, 36, 39, 41, 44, 47, 48, 54, 56,
+        59, 64, 65, 71, 74, 80, 82, 83, 87, 90, 92, 95, 97, 100, 31, 32, 32, 32,
+        32, 32, 32, 33, 34, 35, 35, 38, 40, 42, 45, 46, 51, 53, 56, 61, 62, 68,
+        71, 76, 78, 78, 83, 85, 88, 90, 92, 95, 31, 32, 32, 32, 32, 32, 32, 33,
+        34, 34, 35, 38, 39, 42, 45, 45, 50, 52, 55, 60, 61, 67, 69, 74, 76, 77,
+        81, 84, 87, 89, 92, 95, 31, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 37,
+        38, 41, 44, 44, 49, 51, 54, 58, 59, 65, 68, 72, 74, 75, 79, 81, 84, 86,
+        88, 90, 31, 32, 32, 32, 33, 33, 33, 34, 35, 36, 36, 39, 40, 42, 44, 45,
+        49, 51, 54, 58, 59, 64, 67, 71, 73, 74, 78, 80, 83, 85, 88, 90, 31, 32,
+        32, 32, 33, 33, 34, 34, 35, 36, 36, 39, 40, 42, 45, 45, 50, 51, 54, 58,
+        59, 64, 67, 71, 73, 74, 78, 80, 82, 84, 86, 89, 32, 32, 32, 32, 33, 34,
+        35, 36, 37, 38, 38, 40, 41, 42, 45, 46, 49, 51, 53, 57, 58, 63, 65, 69,
+        71, 72, 75, 78, 80, 83, 86, 89, 32, 33, 33, 33, 34, 34, 36, 36, 38, 39,
+        40, 42, 43, 44, 47, 47, 51, 53, 55, 59, 60, 65, 67, 71, 73, 73, 77, 78,
+        80, 82, 84, 86, 34, 34, 34, 33, 35, 35, 37, 38, 39, 42, 42, 45, 46, 47,
+        50, 51, 54, 56, 58, 62, 63, 68, 70, 74, 76, 76, 80, 82, 84, 85, 85, 86,
+        35, 35, 34, 34, 36, 36, 38, 39, 42, 46, 47, 49, 50, 52, 55, 55, 59, 60,
+        62, 66, 67, 72, 74, 78, 79, 80, 83, 84, 85, 87, 90, 92, 36, 35, 35, 34,
+        36, 36, 38, 40, 42, 47, 48, 50, 52, 54, 56, 57, 60, 61, 64, 67, 68, 73,
+        75, 79, 80, 81, 85, 87, 90, 91, 91, 92, 39, 38, 38, 37, 39, 39, 40, 42,
+        45, 49, 50, 54, 55, 58, 60, 61, 65, 66, 69, 72, 73, 78, 80, 84, 86, 86,
+        90, 91, 91, 92, 95, 97, 41, 40, 39, 38, 40, 40, 41, 43, 46, 50, 52, 55,
+        57, 60, 62, 63, 67, 69, 71, 75, 75, 80, 83, 86, 88, 89, 92, 93, 95, 97,
+        97, 98, 44, 42, 42, 41, 42, 42, 42, 44, 47, 52, 54, 58, 60, 63, 66, 67,
+        71, 73, 75, 79, 79, 84, 86, 90, 92, 92, 96, 98, 98, 98, 101, 104, 47,
+        45, 45, 44, 44, 45, 45, 47, 50, 55, 56, 60, 62, 66, 69, 70, 75, 77, 79,
+        83, 84, 89, 91, 95, 97, 97, 100, 99, 102, 105, 104, 104, 48, 46, 45, 44,
+        45, 45, 46, 47, 51, 55, 57, 61, 63, 67, 70, 71, 76, 78, 80, 84, 85, 90,
+        93, 96, 98, 99, 102, 106, 106, 105, 108, 111, 54, 51, 50, 49, 49, 50,
+        49, 51, 54, 59, 60, 65, 67, 71, 75, 76, 82, 84, 87, 91, 92, 97, 100,
+        104, 106, 106, 110, 108, 109, 112, 112, 111, 56, 53, 52, 51, 51, 51, 51,
+        53, 56, 60, 61, 66, 69, 73, 77, 78, 84, 86, 89, 93, 94, 100, 102, 106,
+        108, 109, 112, 113, 115, 114, 116, 119, 59, 56, 55, 54, 54, 54, 53, 55,
+        58, 62, 64, 69, 71, 75, 79, 80, 87, 89, 92, 97, 98, 103, 106, 110, 112,
+        113, 117, 118, 117, 121, 121, 119, 64, 61, 60, 58, 58, 58, 57, 59, 62,
+        66, 67, 72, 75, 79, 83, 84, 91, 93, 97, 102, 103, 109, 112, 116, 118,
+        119, 122, 121, 125, 123, 125, 128, 65, 62, 61, 59, 59, 59, 58, 60, 63,
+        67, 68, 73, 75, 79, 84, 85, 92, 94, 98, 103, 105, 111, 114, 118, 120,
+        121, 125, 129, 126, 129, 130, 129, 71, 68, 67, 65, 64, 64, 63, 65, 68,
+        72, 73, 78, 80, 84, 89, 90, 97, 100, 103, 109, 111, 117, 120, 125, 127,
+        128, 133, 130, 134, 133, 133, 137, 74, 71, 69, 68, 67, 67, 65, 67, 70,
+        74, 75, 80, 83, 86, 91, 93, 100, 102, 106, 112, 114, 120, 123, 128, 131,
+        131, 135, 137, 137, 138, 140, 137, 80, 76, 74, 72, 71, 71, 69, 71, 74,
+        78, 79, 84, 86, 90, 95, 96, 104, 106, 110, 116, 118, 125, 128, 134, 136,
+        137, 142, 141, 142, 143, 143, 147, 82, 78, 76, 74, 73, 73, 71, 73, 76,
+        79, 80, 86, 88, 92, 97, 98, 106, 108, 112, 118, 120, 127, 131, 136, 139,
+        139, 144, 147, 148, 147, 150, 148, 83, 78, 77, 75, 74, 74, 72, 73, 76,
+        80, 81, 86, 89, 92, 97, 99, 106, 109, 113, 119, 121, 128, 131, 137, 139,
+        140, 145, 150, 152, 155, 152, 157, 87, 83, 81, 79, 78, 78, 75, 77, 80,
+        83, 85, 90, 92, 96, 100, 102, 110, 112, 117, 122, 125, 133, 135, 142,
+        144, 145, 150, 151, 155, 158, 162, 158, 90, 85, 84, 81, 80, 80, 78, 78,
+        82, 84, 87, 91, 93, 98, 99, 106, 108, 113, 118, 121, 129, 130, 137, 141,
+        147, 150, 151, 156, 156, 161, 164, 169, 92, 88, 87, 84, 83, 82, 80, 80,
+        84, 85, 90, 91, 95, 98, 102, 106, 109, 115, 117, 125, 126, 134, 137,
+        142, 148, 152, 155, 156, 162, 162, 168, 170, 95, 90, 89, 86, 85, 84, 83,
+        82, 85, 87, 91, 92, 97, 98, 105, 105, 112, 114, 121, 123, 129, 133, 138,
+        143, 147, 155, 158, 161, 162, 168, 168, 174, 97, 92, 92, 88, 88, 86, 86,
+        84, 85, 90, 91, 95, 97, 101, 104, 108, 112, 116, 121, 125, 130, 133,
+        140, 143, 150, 152, 162, 164, 168, 168, 174, 175, 100, 95, 95, 90, 90,
+        89, 89, 86, 86, 92, 92, 97, 98, 104, 104, 111, 111, 119, 119, 128, 129,
+        137, 137, 147, 148, 157, 158, 169, 170, 174, 175, 181,
+        /* Size 4x8 */
+        32, 35, 59, 83, 32, 36, 57, 78, 34, 47, 65, 82, 41, 53, 78, 97, 51, 61,
+        92, 111, 65, 73, 108, 129, 75, 81, 117, 148, 86, 92, 119, 154,
+        /* Size 8x4 */
+        32, 32, 34, 41, 51, 65, 75, 86, 35, 36, 47, 53, 61, 73, 81, 92, 59, 57,
+        65, 78, 92, 108, 117, 119, 83, 78, 82, 97, 111, 129, 148, 154,
+        /* Size 8x16 */
+        32, 31, 35, 44, 53, 65, 82, 90, 31, 32, 34, 41, 50, 61, 76, 85, 31, 33,
+        35, 42, 49, 59, 73, 81, 32, 34, 37, 42, 49, 58, 71, 79, 34, 35, 41, 48,
+        54, 63, 76, 81, 36, 36, 46, 54, 60, 68, 80, 87, 41, 40, 49, 60, 67, 76,
+        88, 93, 47, 44, 53, 66, 75, 84, 97, 101, 53, 50, 57, 71, 82, 92, 106,
+        108, 58, 54, 61, 75, 87, 98, 112, 116, 65, 59, 66, 79, 92, 105, 120,
+        124, 74, 67, 73, 86, 100, 113, 131, 134, 82, 73, 79, 92, 105, 120, 139,
+        142, 87, 78, 83, 96, 110, 125, 144, 153, 92, 83, 84, 97, 114, 132, 150,
+        157, 97, 88, 86, 97, 111, 128, 147, 163,
+        /* Size 16x8 */
+        32, 31, 31, 32, 34, 36, 41, 47, 53, 58, 65, 74, 82, 87, 92, 97, 31, 32,
+        33, 34, 35, 36, 40, 44, 50, 54, 59, 67, 73, 78, 83, 88, 35, 34, 35, 37,
+        41, 46, 49, 53, 57, 61, 66, 73, 79, 83, 84, 86, 44, 41, 42, 42, 48, 54,
+        60, 66, 71, 75, 79, 86, 92, 96, 97, 97, 53, 50, 49, 49, 54, 60, 67, 75,
+        82, 87, 92, 100, 105, 110, 114, 111, 65, 61, 59, 58, 63, 68, 76, 84, 92,
+        98, 105, 113, 120, 125, 132, 128, 82, 76, 73, 71, 76, 80, 88, 97, 106,
+        112, 120, 131, 139, 144, 150, 147, 90, 85, 81, 79, 81, 87, 93, 101, 108,
+        116, 124, 134, 142, 153, 157, 163,
+        /* Size 16x32 */
+        32, 31, 31, 32, 35, 36, 44, 47, 53, 62, 65, 79, 82, 88, 90, 93, 31, 32,
+        32, 32, 35, 35, 42, 45, 51, 59, 62, 75, 78, 83, 86, 88, 31, 32, 32, 32,
+        34, 35, 41, 45, 50, 58, 61, 74, 76, 82, 85, 88, 31, 32, 32, 33, 34, 34,
+        41, 44, 49, 57, 59, 72, 74, 79, 82, 84, 31, 32, 33, 34, 35, 36, 42, 44,
+        49, 57, 59, 71, 73, 79, 81, 84, 32, 32, 33, 34, 36, 36, 42, 45, 50, 57,
+        59, 71, 73, 78, 80, 82, 32, 33, 34, 35, 37, 38, 42, 45, 49, 56, 58, 69,
+        71, 76, 79, 83, 32, 33, 34, 36, 39, 40, 44, 47, 51, 58, 60, 71, 73, 76,
+        78, 80, 34, 34, 35, 37, 41, 42, 48, 50, 54, 61, 63, 73, 76, 81, 81, 80,
+        35, 34, 36, 38, 45, 47, 52, 55, 59, 65, 67, 77, 79, 82, 83, 86, 36, 34,
+        36, 38, 46, 48, 54, 56, 60, 66, 68, 78, 80, 85, 87, 86, 39, 37, 39, 40,
+        48, 50, 58, 60, 65, 71, 73, 84, 86, 89, 88, 91, 41, 39, 40, 41, 49, 51,
+        60, 62, 67, 74, 76, 86, 88, 91, 93, 91, 44, 41, 42, 43, 51, 53, 63, 66,
+        71, 78, 79, 90, 92, 97, 94, 97, 47, 44, 44, 45, 53, 56, 66, 69, 75, 82,
+        84, 95, 97, 98, 101, 98, 48, 45, 45, 46, 54, 56, 67, 70, 76, 83, 85, 96,
+        98, 104, 101, 105, 53, 49, 50, 50, 57, 60, 71, 75, 82, 90, 92, 103, 106,
+        107, 108, 105, 55, 51, 51, 51, 59, 61, 72, 77, 84, 92, 94, 106, 108,
+        111, 110, 112, 58, 54, 54, 54, 61, 63, 75, 79, 87, 95, 98, 110, 112,
+        117, 116, 113, 63, 58, 58, 57, 65, 67, 78, 83, 91, 100, 103, 116, 118,
+        119, 119, 121, 65, 60, 59, 58, 66, 68, 79, 84, 92, 102, 105, 118, 120,
+        127, 124, 122, 71, 65, 64, 63, 71, 73, 84, 89, 97, 108, 111, 125, 127,
+        129, 129, 130, 74, 68, 67, 66, 73, 75, 86, 91, 100, 110, 113, 128, 131,
+        135, 134, 130, 79, 72, 71, 70, 77, 79, 90, 95, 104, 115, 118, 133, 136,
+        140, 139, 140, 82, 75, 73, 72, 79, 81, 92, 97, 105, 117, 120, 136, 139,
+        145, 142, 140, 82, 75, 74, 72, 79, 81, 92, 97, 106, 117, 121, 136, 139,
+        148, 150, 149, 87, 79, 78, 76, 83, 85, 96, 100, 110, 120, 125, 141, 144,
+        148, 153, 150, 89, 82, 81, 78, 83, 87, 97, 99, 113, 118, 128, 139, 145,
+        153, 157, 161, 92, 84, 83, 80, 84, 89, 97, 101, 114, 116, 132, 135, 150,
+        153, 157, 162, 94, 86, 85, 82, 85, 92, 97, 104, 112, 119, 130, 136, 151,
+        154, 163, 166, 97, 88, 88, 85, 86, 94, 97, 107, 111, 123, 128, 140, 147,
+        159, 163, 167, 99, 91, 91, 87, 87, 97, 97, 110, 110, 126, 126, 144, 144,
+        163, 163, 173,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 32, 32, 32, 34, 35, 36, 39, 41, 44, 47, 48, 53, 55,
+        58, 63, 65, 71, 74, 79, 82, 82, 87, 89, 92, 94, 97, 99, 31, 32, 32, 32,
+        32, 32, 33, 33, 34, 34, 34, 37, 39, 41, 44, 45, 49, 51, 54, 58, 60, 65,
+        68, 72, 75, 75, 79, 82, 84, 86, 88, 91, 31, 32, 32, 32, 33, 33, 34, 34,
+        35, 36, 36, 39, 40, 42, 44, 45, 50, 51, 54, 58, 59, 64, 67, 71, 73, 74,
+        78, 81, 83, 85, 88, 91, 32, 32, 32, 33, 34, 34, 35, 36, 37, 38, 38, 40,
+        41, 43, 45, 46, 50, 51, 54, 57, 58, 63, 66, 70, 72, 72, 76, 78, 80, 82,
+        85, 87, 35, 35, 34, 34, 35, 36, 37, 39, 41, 45, 46, 48, 49, 51, 53, 54,
+        57, 59, 61, 65, 66, 71, 73, 77, 79, 79, 83, 83, 84, 85, 86, 87, 36, 35,
+        35, 34, 36, 36, 38, 40, 42, 47, 48, 50, 51, 53, 56, 56, 60, 61, 63, 67,
+        68, 73, 75, 79, 81, 81, 85, 87, 89, 92, 94, 97, 44, 42, 41, 41, 42, 42,
+        42, 44, 48, 52, 54, 58, 60, 63, 66, 67, 71, 72, 75, 78, 79, 84, 86, 90,
+        92, 92, 96, 97, 97, 97, 97, 97, 47, 45, 45, 44, 44, 45, 45, 47, 50, 55,
+        56, 60, 62, 66, 69, 70, 75, 77, 79, 83, 84, 89, 91, 95, 97, 97, 100, 99,
+        101, 104, 107, 110, 53, 51, 50, 49, 49, 50, 49, 51, 54, 59, 60, 65, 67,
+        71, 75, 76, 82, 84, 87, 91, 92, 97, 100, 104, 105, 106, 110, 113, 114,
+        112, 111, 110, 62, 59, 58, 57, 57, 57, 56, 58, 61, 65, 66, 71, 74, 78,
+        82, 83, 90, 92, 95, 100, 102, 108, 110, 115, 117, 117, 120, 118, 116,
+        119, 123, 126, 65, 62, 61, 59, 59, 59, 58, 60, 63, 67, 68, 73, 76, 79,
+        84, 85, 92, 94, 98, 103, 105, 111, 113, 118, 120, 121, 125, 128, 132,
+        130, 128, 126, 79, 75, 74, 72, 71, 71, 69, 71, 73, 77, 78, 84, 86, 90,
+        95, 96, 103, 106, 110, 116, 118, 125, 128, 133, 136, 136, 141, 139, 135,
+        136, 140, 144, 82, 78, 76, 74, 73, 73, 71, 73, 76, 79, 80, 86, 88, 92,
+        97, 98, 106, 108, 112, 118, 120, 127, 131, 136, 139, 139, 144, 145, 150,
+        151, 147, 144, 88, 83, 82, 79, 79, 78, 76, 76, 81, 82, 85, 89, 91, 97,
+        98, 104, 107, 111, 117, 119, 127, 129, 135, 140, 145, 148, 148, 153,
+        153, 154, 159, 163, 90, 86, 85, 82, 81, 80, 79, 78, 81, 83, 87, 88, 93,
+        94, 101, 101, 108, 110, 116, 119, 124, 129, 134, 139, 142, 150, 153,
+        157, 157, 163, 163, 163, 93, 88, 88, 84, 84, 82, 83, 80, 80, 86, 86, 91,
+        91, 97, 98, 105, 105, 112, 113, 121, 122, 130, 130, 140, 140, 149, 150,
+        161, 162, 166, 167, 173,
+        /* Size 4x16 */
+        31, 36, 62, 88, 32, 35, 58, 82, 32, 36, 57, 79, 33, 38, 56, 76, 34, 42,
+        61, 81, 34, 48, 66, 85, 39, 51, 74, 91, 44, 56, 82, 98, 49, 60, 90, 107,
+        54, 63, 95, 117, 60, 68, 102, 127, 68, 75, 110, 135, 75, 81, 117, 145,
+        79, 85, 120, 148, 84, 89, 116, 153, 88, 94, 123, 159,
+        /* Size 16x4 */
+        31, 32, 32, 33, 34, 34, 39, 44, 49, 54, 60, 68, 75, 79, 84, 88, 36, 35,
+        36, 38, 42, 48, 51, 56, 60, 63, 68, 75, 81, 85, 89, 94, 62, 58, 57, 56,
+        61, 66, 74, 82, 90, 95, 102, 110, 117, 120, 116, 123, 88, 82, 79, 76,
+        81, 85, 91, 98, 107, 117, 127, 135, 145, 148, 153, 159,
+        /* Size 8x32 */
+        32, 31, 35, 44, 53, 65, 82, 90, 31, 32, 35, 42, 51, 62, 78, 86, 31, 32,
+        34, 41, 50, 61, 76, 85, 31, 32, 34, 41, 49, 59, 74, 82, 31, 33, 35, 42,
+        49, 59, 73, 81, 32, 33, 36, 42, 50, 59, 73, 80, 32, 34, 37, 42, 49, 58,
+        71, 79, 32, 34, 39, 44, 51, 60, 73, 78, 34, 35, 41, 48, 54, 63, 76, 81,
+        35, 36, 45, 52, 59, 67, 79, 83, 36, 36, 46, 54, 60, 68, 80, 87, 39, 39,
+        48, 58, 65, 73, 86, 88, 41, 40, 49, 60, 67, 76, 88, 93, 44, 42, 51, 63,
+        71, 79, 92, 94, 47, 44, 53, 66, 75, 84, 97, 101, 48, 45, 54, 67, 76, 85,
+        98, 101, 53, 50, 57, 71, 82, 92, 106, 108, 55, 51, 59, 72, 84, 94, 108,
+        110, 58, 54, 61, 75, 87, 98, 112, 116, 63, 58, 65, 78, 91, 103, 118,
+        119, 65, 59, 66, 79, 92, 105, 120, 124, 71, 64, 71, 84, 97, 111, 127,
+        129, 74, 67, 73, 86, 100, 113, 131, 134, 79, 71, 77, 90, 104, 118, 136,
+        139, 82, 73, 79, 92, 105, 120, 139, 142, 82, 74, 79, 92, 106, 121, 139,
+        150, 87, 78, 83, 96, 110, 125, 144, 153, 89, 81, 83, 97, 113, 128, 145,
+        157, 92, 83, 84, 97, 114, 132, 150, 157, 94, 85, 85, 97, 112, 130, 151,
+        163, 97, 88, 86, 97, 111, 128, 147, 163, 99, 91, 87, 97, 110, 126, 144,
+        163,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 32, 32, 32, 34, 35, 36, 39, 41, 44, 47, 48, 53, 55,
+        58, 63, 65, 71, 74, 79, 82, 82, 87, 89, 92, 94, 97, 99, 31, 32, 32, 32,
+        33, 33, 34, 34, 35, 36, 36, 39, 40, 42, 44, 45, 50, 51, 54, 58, 59, 64,
+        67, 71, 73, 74, 78, 81, 83, 85, 88, 91, 35, 35, 34, 34, 35, 36, 37, 39,
+        41, 45, 46, 48, 49, 51, 53, 54, 57, 59, 61, 65, 66, 71, 73, 77, 79, 79,
+        83, 83, 84, 85, 86, 87, 44, 42, 41, 41, 42, 42, 42, 44, 48, 52, 54, 58,
+        60, 63, 66, 67, 71, 72, 75, 78, 79, 84, 86, 90, 92, 92, 96, 97, 97, 97,
+        97, 97, 53, 51, 50, 49, 49, 50, 49, 51, 54, 59, 60, 65, 67, 71, 75, 76,
+        82, 84, 87, 91, 92, 97, 100, 104, 105, 106, 110, 113, 114, 112, 111,
+        110, 65, 62, 61, 59, 59, 59, 58, 60, 63, 67, 68, 73, 76, 79, 84, 85, 92,
+        94, 98, 103, 105, 111, 113, 118, 120, 121, 125, 128, 132, 130, 128, 126,
+        82, 78, 76, 74, 73, 73, 71, 73, 76, 79, 80, 86, 88, 92, 97, 98, 106,
+        108, 112, 118, 120, 127, 131, 136, 139, 139, 144, 145, 150, 151, 147,
+        144, 90, 86, 85, 82, 81, 80, 79, 78, 81, 83, 87, 88, 93, 94, 101, 101,
+        108, 110, 116, 119, 124, 129, 134, 139, 142, 150, 153, 157, 157, 163,
+        163, 163 },
+      { /* Chroma */
+        /* Size 4x4 */
+        32, 45, 51, 61, 45, 54, 59, 65, 51, 59, 75, 81, 61, 65, 81, 97,
+        /* Size 8x8 */
+        31, 34, 46, 47, 50, 57, 61, 65, 34, 39, 47, 45, 48, 53, 57, 61, 46, 47,
+        52, 52, 54, 58, 61, 62, 47, 45, 52, 58, 62, 65, 68, 68, 50, 48, 54, 62,
+        68, 73, 77, 76, 57, 53, 58, 65, 73, 82, 86, 86, 61, 57, 61, 68, 77, 86,
+        91, 95, 65, 61, 62, 68, 76, 86, 95, 100,
+        /* Size 16x16 */
+        32, 31, 33, 36, 41, 49, 49, 50, 52, 54, 57, 61, 64, 67, 68, 70, 31, 31,
+        34, 39, 42, 47, 46, 47, 49, 51, 53, 57, 60, 62, 64, 66, 33, 34, 37, 42,
+        44, 47, 46, 46, 47, 49, 51, 55, 57, 59, 61, 63, 36, 39, 42, 47, 47, 48,
+        46, 46, 47, 48, 50, 53, 55, 57, 59, 61, 41, 42, 44, 47, 48, 50, 49, 50,
+        50, 52, 53, 56, 58, 60, 61, 60, 49, 47, 47, 48, 50, 53, 53, 54, 54, 55,
+        56, 59, 61, 63, 64, 64, 49, 46, 46, 46, 49, 53, 55, 57, 59, 60, 61, 64,
+        66, 67, 67, 67, 50, 47, 46, 46, 50, 54, 57, 61, 63, 64, 66, 69, 70, 72,
+        71, 71, 52, 49, 47, 47, 50, 54, 59, 63, 66, 68, 70, 73, 75, 77, 75, 75,
+        54, 51, 49, 48, 52, 55, 60, 64, 68, 71, 73, 76, 79, 80, 79, 79, 57, 53,
+        51, 50, 53, 56, 61, 66, 70, 73, 76, 80, 82, 84, 83, 84, 61, 57, 55, 53,
+        56, 59, 64, 69, 73, 76, 80, 84, 87, 89, 88, 88, 64, 60, 57, 55, 58, 61,
+        66, 70, 75, 79, 82, 87, 91, 93, 93, 93, 67, 62, 59, 57, 60, 63, 67, 72,
+        77, 80, 84, 89, 93, 95, 96, 97, 68, 64, 61, 59, 61, 64, 67, 71, 75, 79,
+        83, 88, 93, 96, 99, 100, 70, 66, 63, 61, 60, 64, 67, 71, 75, 79, 84, 88,
+        93, 97, 100, 102,
+        /* Size 32x32 */
+        32, 31, 31, 30, 33, 33, 36, 38, 41, 47, 49, 48, 49, 49, 50, 50, 52, 53,
+        54, 56, 57, 60, 61, 63, 64, 65, 67, 67, 68, 69, 70, 71, 31, 31, 31, 31,
+        34, 34, 38, 40, 42, 46, 47, 47, 47, 47, 48, 48, 50, 50, 52, 54, 54, 57,
+        58, 60, 61, 61, 63, 64, 65, 65, 66, 67, 31, 31, 31, 31, 34, 35, 39, 40,
+        42, 46, 47, 46, 46, 46, 47, 47, 49, 50, 51, 53, 53, 56, 57, 59, 60, 60,
+        62, 63, 64, 65, 66, 67, 30, 31, 31, 32, 34, 35, 40, 41, 42, 45, 46, 45,
+        45, 45, 46, 46, 47, 48, 49, 51, 52, 54, 55, 57, 58, 58, 60, 61, 62, 62,
+        63, 64, 33, 34, 34, 34, 37, 38, 42, 43, 44, 46, 47, 46, 46, 45, 46, 46,
+        47, 48, 49, 51, 51, 53, 55, 56, 57, 57, 59, 60, 61, 62, 63, 64, 33, 34,
+        35, 35, 38, 39, 43, 44, 45, 47, 47, 46, 46, 45, 46, 46, 47, 48, 49, 51,
+        51, 53, 54, 56, 57, 57, 59, 60, 60, 61, 62, 62, 36, 38, 39, 40, 42, 43,
+        47, 47, 47, 47, 48, 46, 46, 45, 46, 46, 47, 47, 48, 49, 50, 52, 53, 54,
+        55, 55, 57, 58, 59, 60, 61, 62, 38, 40, 40, 41, 43, 44, 47, 47, 48, 48,
+        49, 48, 47, 47, 47, 47, 48, 49, 49, 51, 51, 53, 54, 55, 56, 56, 58, 58,
+        58, 59, 60, 60, 41, 42, 42, 42, 44, 45, 47, 48, 48, 50, 50, 49, 49, 49,
+        50, 50, 50, 51, 52, 53, 53, 55, 56, 57, 58, 58, 60, 61, 61, 61, 60, 60,
+        47, 46, 46, 45, 46, 47, 47, 48, 50, 52, 52, 52, 52, 52, 53, 53, 53, 54,
+        55, 55, 56, 58, 58, 60, 60, 61, 62, 61, 61, 62, 63, 64, 49, 47, 47, 46,
+        47, 47, 48, 49, 50, 52, 53, 53, 53, 53, 54, 54, 54, 55, 55, 56, 56, 58,
+        59, 60, 61, 61, 63, 63, 64, 64, 64, 64, 48, 47, 46, 45, 46, 46, 46, 48,
+        49, 52, 53, 54, 55, 55, 56, 56, 57, 58, 58, 59, 60, 61, 62, 63, 64, 64,
+        66, 65, 65, 65, 66, 67, 49, 47, 46, 45, 46, 46, 46, 47, 49, 52, 53, 55,
+        55, 57, 57, 58, 59, 59, 60, 61, 61, 63, 64, 65, 66, 66, 67, 67, 67, 68,
+        67, 67, 49, 47, 46, 45, 45, 45, 45, 47, 49, 52, 53, 55, 57, 58, 59, 60,
+        61, 62, 62, 63, 63, 65, 66, 67, 68, 68, 69, 70, 69, 68, 69, 70, 50, 48,
+        47, 46, 46, 46, 46, 47, 50, 53, 54, 56, 57, 59, 61, 61, 63, 64, 64, 66,
+        66, 68, 69, 70, 70, 71, 72, 70, 71, 72, 71, 70, 50, 48, 47, 46, 46, 46,
+        46, 47, 50, 53, 54, 56, 58, 60, 61, 61, 63, 64, 65, 66, 67, 68, 69, 71,
+        71, 71, 73, 74, 73, 72, 73, 74, 52, 50, 49, 47, 47, 47, 47, 48, 50, 53,
+        54, 57, 59, 61, 63, 63, 66, 67, 68, 70, 70, 72, 73, 75, 75, 75, 77, 75,
+        75, 76, 75, 74, 53, 50, 50, 48, 48, 48, 47, 49, 51, 54, 55, 58, 59, 62,
+        64, 64, 67, 68, 69, 71, 71, 73, 74, 76, 77, 77, 78, 78, 78, 76, 77, 78,
+        54, 52, 51, 49, 49, 49, 48, 49, 52, 55, 55, 58, 60, 62, 64, 65, 68, 69,
+        71, 73, 73, 75, 76, 78, 79, 79, 80, 80, 79, 80, 79, 78, 56, 54, 53, 51,
+        51, 51, 49, 51, 53, 55, 56, 59, 61, 63, 66, 66, 70, 71, 73, 75, 76, 78,
+        79, 81, 82, 82, 83, 81, 83, 81, 81, 82, 57, 54, 53, 52, 51, 51, 50, 51,
+        53, 56, 56, 60, 61, 63, 66, 67, 70, 71, 73, 76, 76, 79, 80, 82, 82, 83,
+        84, 85, 83, 84, 84, 82, 60, 57, 56, 54, 53, 53, 52, 53, 55, 58, 58, 61,
+        63, 65, 68, 68, 72, 73, 75, 78, 79, 82, 83, 85, 86, 86, 88, 86, 87, 86,
+        85, 86, 61, 58, 57, 55, 55, 54, 53, 54, 56, 58, 59, 62, 64, 66, 69, 69,
+        73, 74, 76, 79, 80, 83, 84, 86, 87, 88, 89, 89, 88, 88, 88, 86, 63, 60,
+        59, 57, 56, 56, 54, 55, 57, 60, 60, 63, 65, 67, 70, 71, 75, 76, 78, 81,
+        82, 85, 86, 89, 90, 90, 92, 91, 91, 90, 89, 91, 64, 61, 60, 58, 57, 57,
+        55, 56, 58, 60, 61, 64, 66, 68, 70, 71, 75, 77, 79, 82, 82, 86, 87, 90,
+        91, 91, 93, 93, 93, 92, 93, 91, 65, 61, 60, 58, 57, 57, 55, 56, 58, 61,
+        61, 64, 66, 68, 71, 71, 75, 77, 79, 82, 83, 86, 88, 90, 91, 91, 93, 94,
+        95, 95, 93, 95, 67, 63, 62, 60, 59, 59, 57, 58, 60, 62, 63, 66, 67, 69,
+        72, 73, 77, 78, 80, 83, 84, 88, 89, 92, 93, 93, 95, 95, 96, 96, 97, 95,
+        67, 64, 63, 61, 60, 60, 58, 58, 61, 61, 63, 65, 67, 70, 70, 74, 75, 78,
+        80, 81, 85, 86, 89, 91, 93, 94, 95, 97, 97, 98, 98, 100, 68, 65, 64, 62,
+        61, 60, 59, 58, 61, 61, 64, 65, 67, 69, 71, 73, 75, 78, 79, 83, 83, 87,
+        88, 91, 93, 95, 96, 97, 99, 98, 100, 100, 69, 65, 65, 62, 62, 61, 60,
+        59, 61, 62, 64, 65, 68, 68, 72, 72, 76, 76, 80, 81, 84, 86, 88, 90, 92,
+        95, 96, 98, 98, 100, 100, 101, 70, 66, 66, 63, 63, 62, 61, 60, 60, 63,
+        64, 66, 67, 69, 71, 73, 75, 77, 79, 81, 84, 85, 88, 89, 93, 93, 97, 98,
+        100, 100, 102, 101, 71, 67, 67, 64, 64, 62, 62, 60, 60, 64, 64, 67, 67,
+        70, 70, 74, 74, 78, 78, 82, 82, 86, 86, 91, 91, 95, 95, 100, 100, 101,
+        101, 104,
+        /* Size 4x8 */
+        31, 47, 53, 63, 36, 47, 50, 59, 46, 52, 55, 61, 45, 53, 63, 70, 49, 55,
+        71, 77, 54, 58, 77, 86, 59, 61, 81, 94, 63, 65, 80, 95,
+        /* Size 8x4 */
+        31, 36, 46, 45, 49, 54, 59, 63, 47, 47, 52, 53, 55, 58, 61, 65, 53, 50,
+        55, 63, 71, 77, 81, 80, 63, 59, 61, 70, 77, 86, 94, 95,
+        /* Size 8x16 */
+        32, 33, 45, 49, 52, 57, 64, 68, 31, 34, 45, 46, 49, 53, 60, 64, 33, 37,
+        46, 45, 47, 51, 57, 61, 37, 43, 47, 45, 47, 50, 55, 59, 42, 44, 49, 49,
+        50, 53, 58, 60, 49, 47, 52, 53, 54, 57, 61, 63, 48, 46, 51, 57, 59, 61,
+        66, 67, 50, 46, 52, 59, 63, 66, 71, 71, 52, 47, 53, 61, 66, 71, 75, 74,
+        54, 49, 54, 62, 68, 73, 79, 79, 57, 51, 55, 64, 70, 76, 83, 83, 61, 55,
+        58, 66, 73, 80, 87, 87, 64, 57, 60, 68, 75, 83, 91, 91, 66, 59, 61, 69,
+        77, 84, 93, 95, 68, 61, 61, 68, 77, 86, 94, 97, 70, 63, 61, 67, 75, 83,
+        92, 98,
+        /* Size 16x8 */
+        32, 31, 33, 37, 42, 49, 48, 50, 52, 54, 57, 61, 64, 66, 68, 70, 33, 34,
+        37, 43, 44, 47, 46, 46, 47, 49, 51, 55, 57, 59, 61, 63, 45, 45, 46, 47,
+        49, 52, 51, 52, 53, 54, 55, 58, 60, 61, 61, 61, 49, 46, 45, 45, 49, 53,
+        57, 59, 61, 62, 64, 66, 68, 69, 68, 67, 52, 49, 47, 47, 50, 54, 59, 63,
+        66, 68, 70, 73, 75, 77, 77, 75, 57, 53, 51, 50, 53, 57, 61, 66, 71, 73,
+        76, 80, 83, 84, 86, 83, 64, 60, 57, 55, 58, 61, 66, 71, 75, 79, 83, 87,
+        91, 93, 94, 92, 68, 64, 61, 59, 60, 63, 67, 71, 74, 79, 83, 87, 91, 95,
+        97, 98,
+        /* Size 16x32 */
+        32, 31, 33, 37, 45, 48, 49, 50, 52, 56, 57, 63, 64, 67, 68, 68, 31, 31,
+        34, 38, 45, 47, 47, 48, 50, 53, 54, 60, 61, 63, 64, 65, 31, 32, 34, 39,
+        45, 46, 46, 47, 49, 52, 53, 59, 60, 62, 64, 65, 30, 32, 35, 40, 44, 46,
+        45, 46, 48, 51, 52, 57, 58, 60, 61, 62, 33, 35, 37, 42, 46, 47, 45, 46,
+        47, 50, 51, 56, 57, 60, 61, 62, 33, 36, 38, 43, 46, 47, 46, 46, 47, 50,
+        51, 56, 57, 59, 60, 60, 37, 40, 43, 47, 47, 47, 45, 46, 47, 49, 50, 54,
+        55, 57, 59, 61, 39, 41, 43, 47, 48, 48, 47, 47, 48, 50, 51, 55, 56, 57,
+        58, 59, 42, 43, 44, 47, 49, 50, 49, 50, 50, 53, 53, 57, 58, 60, 60, 59,
+        47, 46, 46, 48, 51, 52, 53, 53, 53, 55, 56, 60, 61, 61, 61, 62, 49, 46,
+        47, 48, 52, 53, 53, 54, 54, 56, 57, 60, 61, 63, 63, 62, 48, 46, 46, 47,
+        51, 53, 56, 56, 57, 59, 60, 64, 64, 65, 64, 65, 48, 45, 46, 46, 51, 53,
+        57, 57, 59, 61, 61, 65, 66, 66, 67, 65, 49, 45, 45, 46, 51, 53, 58, 59,
+        61, 63, 64, 67, 68, 70, 67, 68, 50, 46, 46, 46, 52, 54, 59, 61, 63, 65,
+        66, 70, 71, 70, 71, 68, 50, 46, 46, 46, 52, 54, 59, 61, 64, 66, 67, 71,
+        71, 73, 71, 72, 52, 48, 47, 47, 53, 54, 61, 63, 66, 70, 71, 75, 75, 75,
+        74, 72, 53, 49, 48, 48, 53, 55, 61, 64, 67, 71, 72, 76, 77, 77, 75, 76,
+        54, 50, 49, 49, 54, 55, 62, 65, 68, 72, 73, 78, 79, 80, 79, 76, 56, 51,
+        51, 50, 55, 56, 63, 66, 70, 74, 76, 81, 82, 81, 80, 80, 57, 52, 51, 50,
+        55, 56, 64, 66, 70, 75, 76, 82, 83, 85, 83, 80, 60, 54, 54, 52, 57, 58,
+        65, 68, 72, 77, 79, 85, 86, 86, 85, 84, 61, 56, 55, 53, 58, 59, 66, 69,
+        73, 79, 80, 86, 87, 89, 87, 84, 63, 57, 56, 55, 59, 60, 67, 70, 75, 80,
+        82, 89, 90, 91, 89, 89, 64, 58, 57, 56, 60, 61, 68, 71, 75, 81, 83, 90,
+        91, 93, 91, 89, 64, 59, 58, 56, 60, 61, 68, 71, 75, 81, 83, 90, 91, 94,
+        94, 93, 66, 60, 59, 57, 61, 63, 69, 72, 77, 82, 84, 92, 93, 94, 95, 93,
+        67, 61, 60, 58, 61, 63, 69, 70, 78, 80, 85, 90, 93, 96, 97, 97, 68, 62,
+        61, 59, 61, 64, 68, 71, 77, 79, 86, 88, 94, 96, 97, 98, 69, 63, 62, 59,
+        61, 65, 68, 72, 76, 80, 85, 88, 94, 95, 99, 99, 70, 63, 63, 60, 61, 66,
+        67, 73, 75, 81, 83, 89, 92, 97, 98, 99, 70, 64, 64, 61, 61, 67, 67, 74,
+        74, 82, 82, 90, 90, 98, 98, 102,
+        /* Size 32x16 */
+        32, 31, 31, 30, 33, 33, 37, 39, 42, 47, 49, 48, 48, 49, 50, 50, 52, 53,
+        54, 56, 57, 60, 61, 63, 64, 64, 66, 67, 68, 69, 70, 70, 31, 31, 32, 32,
+        35, 36, 40, 41, 43, 46, 46, 46, 45, 45, 46, 46, 48, 49, 50, 51, 52, 54,
+        56, 57, 58, 59, 60, 61, 62, 63, 63, 64, 33, 34, 34, 35, 37, 38, 43, 43,
+        44, 46, 47, 46, 46, 45, 46, 46, 47, 48, 49, 51, 51, 54, 55, 56, 57, 58,
+        59, 60, 61, 62, 63, 64, 37, 38, 39, 40, 42, 43, 47, 47, 47, 48, 48, 47,
+        46, 46, 46, 46, 47, 48, 49, 50, 50, 52, 53, 55, 56, 56, 57, 58, 59, 59,
+        60, 61, 45, 45, 45, 44, 46, 46, 47, 48, 49, 51, 52, 51, 51, 51, 52, 52,
+        53, 53, 54, 55, 55, 57, 58, 59, 60, 60, 61, 61, 61, 61, 61, 61, 48, 47,
+        46, 46, 47, 47, 47, 48, 50, 52, 53, 53, 53, 53, 54, 54, 54, 55, 55, 56,
+        56, 58, 59, 60, 61, 61, 63, 63, 64, 65, 66, 67, 49, 47, 46, 45, 45, 46,
+        45, 47, 49, 53, 53, 56, 57, 58, 59, 59, 61, 61, 62, 63, 64, 65, 66, 67,
+        68, 68, 69, 69, 68, 68, 67, 67, 50, 48, 47, 46, 46, 46, 46, 47, 50, 53,
+        54, 56, 57, 59, 61, 61, 63, 64, 65, 66, 66, 68, 69, 70, 71, 71, 72, 70,
+        71, 72, 73, 74, 52, 50, 49, 48, 47, 47, 47, 48, 50, 53, 54, 57, 59, 61,
+        63, 64, 66, 67, 68, 70, 70, 72, 73, 75, 75, 75, 77, 78, 77, 76, 75, 74,
+        56, 53, 52, 51, 50, 50, 49, 50, 53, 55, 56, 59, 61, 63, 65, 66, 70, 71,
+        72, 74, 75, 77, 79, 80, 81, 81, 82, 80, 79, 80, 81, 82, 57, 54, 53, 52,
+        51, 51, 50, 51, 53, 56, 57, 60, 61, 64, 66, 67, 71, 72, 73, 76, 76, 79,
+        80, 82, 83, 83, 84, 85, 86, 85, 83, 82, 63, 60, 59, 57, 56, 56, 54, 55,
+        57, 60, 60, 64, 65, 67, 70, 71, 75, 76, 78, 81, 82, 85, 86, 89, 90, 90,
+        92, 90, 88, 88, 89, 90, 64, 61, 60, 58, 57, 57, 55, 56, 58, 61, 61, 64,
+        66, 68, 71, 71, 75, 77, 79, 82, 83, 86, 87, 90, 91, 91, 93, 93, 94, 94,
+        92, 90, 67, 63, 62, 60, 60, 59, 57, 57, 60, 61, 63, 65, 66, 70, 70, 73,
+        75, 77, 80, 81, 85, 86, 89, 91, 93, 94, 94, 96, 96, 95, 97, 98, 68, 64,
+        64, 61, 61, 60, 59, 58, 60, 61, 63, 64, 67, 67, 71, 71, 74, 75, 79, 80,
+        83, 85, 87, 89, 91, 94, 95, 97, 97, 99, 98, 98, 68, 65, 65, 62, 62, 60,
+        61, 59, 59, 62, 62, 65, 65, 68, 68, 72, 72, 76, 76, 80, 80, 84, 84, 89,
+        89, 93, 93, 97, 98, 99, 99, 102,
+        /* Size 4x16 */
+        31, 48, 56, 67, 32, 46, 52, 62, 35, 47, 50, 60, 40, 47, 49, 57, 43, 50,
+        53, 60, 46, 53, 56, 63, 45, 53, 61, 66, 46, 54, 65, 70, 48, 54, 70, 75,
+        50, 55, 72, 80, 52, 56, 75, 85, 56, 59, 79, 89, 58, 61, 81, 93, 60, 63,
+        82, 94, 62, 64, 79, 96, 63, 66, 81, 97,
+        /* Size 16x4 */
+        31, 32, 35, 40, 43, 46, 45, 46, 48, 50, 52, 56, 58, 60, 62, 63, 48, 46,
+        47, 47, 50, 53, 53, 54, 54, 55, 56, 59, 61, 63, 64, 66, 56, 52, 50, 49,
+        53, 56, 61, 65, 70, 72, 75, 79, 81, 82, 79, 81, 67, 62, 60, 57, 60, 63,
+        66, 70, 75, 80, 85, 89, 93, 94, 96, 97,
+        /* Size 8x32 */
+        32, 33, 45, 49, 52, 57, 64, 68, 31, 34, 45, 47, 50, 54, 61, 64, 31, 34,
+        45, 46, 49, 53, 60, 64, 30, 35, 44, 45, 48, 52, 58, 61, 33, 37, 46, 45,
+        47, 51, 57, 61, 33, 38, 46, 46, 47, 51, 57, 60, 37, 43, 47, 45, 47, 50,
+        55, 59, 39, 43, 48, 47, 48, 51, 56, 58, 42, 44, 49, 49, 50, 53, 58, 60,
+        47, 46, 51, 53, 53, 56, 61, 61, 49, 47, 52, 53, 54, 57, 61, 63, 48, 46,
+        51, 56, 57, 60, 64, 64, 48, 46, 51, 57, 59, 61, 66, 67, 49, 45, 51, 58,
+        61, 64, 68, 67, 50, 46, 52, 59, 63, 66, 71, 71, 50, 46, 52, 59, 64, 67,
+        71, 71, 52, 47, 53, 61, 66, 71, 75, 74, 53, 48, 53, 61, 67, 72, 77, 75,
+        54, 49, 54, 62, 68, 73, 79, 79, 56, 51, 55, 63, 70, 76, 82, 80, 57, 51,
+        55, 64, 70, 76, 83, 83, 60, 54, 57, 65, 72, 79, 86, 85, 61, 55, 58, 66,
+        73, 80, 87, 87, 63, 56, 59, 67, 75, 82, 90, 89, 64, 57, 60, 68, 75, 83,
+        91, 91, 64, 58, 60, 68, 75, 83, 91, 94, 66, 59, 61, 69, 77, 84, 93, 95,
+        67, 60, 61, 69, 78, 85, 93, 97, 68, 61, 61, 68, 77, 86, 94, 97, 69, 62,
+        61, 68, 76, 85, 94, 99, 70, 63, 61, 67, 75, 83, 92, 98, 70, 64, 61, 67,
+        74, 82, 90, 98,
+        /* Size 32x8 */
+        32, 31, 31, 30, 33, 33, 37, 39, 42, 47, 49, 48, 48, 49, 50, 50, 52, 53,
+        54, 56, 57, 60, 61, 63, 64, 64, 66, 67, 68, 69, 70, 70, 33, 34, 34, 35,
+        37, 38, 43, 43, 44, 46, 47, 46, 46, 45, 46, 46, 47, 48, 49, 51, 51, 54,
+        55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 45, 45, 45, 44, 46, 46, 47, 48,
+        49, 51, 52, 51, 51, 51, 52, 52, 53, 53, 54, 55, 55, 57, 58, 59, 60, 60,
+        61, 61, 61, 61, 61, 61, 49, 47, 46, 45, 45, 46, 45, 47, 49, 53, 53, 56,
+        57, 58, 59, 59, 61, 61, 62, 63, 64, 65, 66, 67, 68, 68, 69, 69, 68, 68,
+        67, 67, 52, 50, 49, 48, 47, 47, 47, 48, 50, 53, 54, 57, 59, 61, 63, 64,
+        66, 67, 68, 70, 70, 72, 73, 75, 75, 75, 77, 78, 77, 76, 75, 74, 57, 54,
+        53, 52, 51, 51, 50, 51, 53, 56, 57, 60, 61, 64, 66, 67, 71, 72, 73, 76,
+        76, 79, 80, 82, 83, 83, 84, 85, 86, 85, 83, 82, 64, 61, 60, 58, 57, 57,
+        55, 56, 58, 61, 61, 64, 66, 68, 71, 71, 75, 77, 79, 82, 83, 86, 87, 90,
+        91, 91, 93, 93, 94, 94, 92, 90, 68, 64, 64, 61, 61, 60, 59, 58, 60, 61,
+        63, 64, 67, 67, 71, 71, 74, 75, 79, 80, 83, 85, 87, 89, 91, 94, 95, 97,
+        97, 99, 98, 98 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 34, 53, 75, 34, 49, 64, 81, 53, 64, 91, 112, 75, 81, 112, 140,
+        /* Size 8x8 */
+        32, 32, 34, 39, 50, 62, 76, 84, 32, 33, 35, 40, 48, 59, 71, 79, 34, 35,
+        39, 46, 53, 63, 74, 81, 39, 40, 46, 56, 65, 75, 86, 92, 50, 48, 53, 65,
+        78, 90, 101, 106, 62, 59, 63, 75, 90, 105, 118, 123, 76, 71, 74, 86,
+        101, 118, 134, 142, 84, 79, 81, 92, 106, 123, 142, 153,
+        /* Size 16x16 */
+        32, 31, 31, 32, 33, 36, 39, 44, 48, 54, 59, 66, 74, 81, 86, 91, 31, 32,
+        32, 32, 33, 35, 38, 42, 46, 51, 56, 63, 70, 77, 81, 86, 31, 32, 32, 33,
+        34, 35, 38, 41, 45, 49, 54, 60, 67, 73, 77, 82, 32, 32, 33, 34, 36, 37,
+        40, 42, 45, 49, 53, 59, 66, 71, 75, 80, 33, 33, 34, 36, 38, 42, 44, 46,
+        50, 53, 57, 63, 69, 74, 78, 80, 36, 35, 35, 37, 42, 48, 50, 54, 57, 60,
+        64, 69, 75, 80, 84, 85, 39, 38, 38, 40, 44, 50, 54, 58, 61, 65, 69, 74,
+        80, 85, 89, 91, 44, 42, 41, 42, 46, 54, 58, 63, 67, 71, 75, 80, 86, 91,
+        95, 97, 48, 46, 45, 45, 50, 57, 61, 67, 71, 76, 80, 86, 93, 98, 101,
+        104, 54, 51, 49, 49, 53, 60, 65, 71, 76, 82, 87, 93, 100, 105, 109, 112,
+        59, 56, 54, 53, 57, 64, 69, 75, 80, 87, 92, 99, 106, 112, 116, 120, 66,
+        63, 60, 59, 63, 69, 74, 80, 86, 93, 99, 107, 115, 121, 125, 129, 74, 70,
+        67, 66, 69, 75, 80, 86, 93, 100, 106, 115, 123, 130, 135, 138, 81, 77,
+        73, 71, 74, 80, 85, 91, 98, 105, 112, 121, 130, 137, 142, 148, 86, 81,
+        77, 75, 78, 84, 89, 95, 101, 109, 116, 125, 135, 142, 147, 153, 91, 86,
+        82, 80, 80, 85, 91, 97, 104, 112, 120, 129, 138, 148, 153, 159,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 32, 32, 33, 34, 36, 36, 39, 41, 44, 46, 48, 52,
+        54, 58, 59, 65, 66, 71, 74, 80, 81, 83, 86, 89, 91, 93, 31, 32, 32, 32,
+        32, 32, 32, 32, 33, 34, 35, 35, 38, 39, 42, 44, 46, 50, 51, 56, 56, 62,
+        63, 68, 71, 76, 77, 78, 82, 84, 86, 88, 31, 32, 32, 32, 32, 32, 32, 32,
+        33, 34, 35, 35, 38, 39, 42, 44, 46, 49, 51, 55, 56, 61, 63, 67, 70, 75,
+        77, 78, 81, 84, 86, 88, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34,
+        37, 38, 41, 42, 44, 48, 49, 53, 54, 59, 60, 65, 68, 72, 74, 75, 78, 80,
+        82, 84, 31, 32, 32, 32, 32, 33, 33, 33, 34, 34, 35, 35, 38, 39, 41, 43,
+        45, 48, 49, 53, 54, 59, 60, 65, 67, 72, 73, 74, 77, 80, 82, 84, 31, 32,
+        32, 32, 33, 33, 33, 34, 35, 35, 36, 36, 39, 40, 42, 44, 45, 48, 50, 53,
+        54, 59, 60, 64, 67, 71, 73, 74, 77, 79, 81, 83, 32, 32, 32, 32, 33, 33,
+        34, 35, 36, 36, 37, 38, 40, 40, 42, 44, 45, 48, 49, 53, 53, 58, 59, 63,
+        66, 70, 71, 72, 75, 78, 80, 83, 32, 32, 32, 32, 33, 34, 35, 35, 36, 37,
+        38, 38, 40, 41, 42, 44, 46, 48, 49, 53, 53, 58, 59, 63, 65, 69, 71, 72,
+        74, 77, 79, 80, 33, 33, 33, 33, 34, 35, 36, 36, 38, 39, 42, 42, 44, 45,
+        46, 48, 50, 52, 53, 57, 57, 62, 63, 67, 69, 73, 74, 75, 78, 79, 80, 81,
+        34, 34, 34, 33, 34, 35, 36, 37, 39, 39, 42, 43, 45, 46, 47, 49, 51, 53,
+        54, 58, 58, 63, 64, 68, 70, 74, 75, 76, 79, 81, 84, 86, 36, 35, 35, 34,
+        35, 36, 37, 38, 42, 42, 48, 48, 50, 51, 54, 55, 57, 59, 60, 63, 64, 68,
+        69, 73, 75, 79, 80, 81, 84, 85, 85, 86, 36, 35, 35, 34, 35, 36, 38, 38,
+        42, 43, 48, 49, 51, 52, 54, 55, 57, 59, 60, 64, 64, 68, 69, 73, 75, 79,
+        80, 81, 84, 86, 88, 91, 39, 38, 38, 37, 38, 39, 40, 40, 44, 45, 50, 51,
+        54, 55, 58, 59, 61, 64, 65, 68, 69, 73, 74, 78, 80, 84, 85, 86, 89, 91,
+        91, 91, 41, 39, 39, 38, 39, 40, 40, 41, 45, 46, 51, 52, 55, 56, 59, 61,
+        63, 65, 67, 70, 70, 75, 76, 80, 82, 86, 87, 88, 91, 92, 94, 96, 44, 42,
+        42, 41, 41, 42, 42, 42, 46, 47, 54, 54, 58, 59, 63, 65, 67, 70, 71, 75,
+        75, 79, 80, 84, 86, 90, 91, 92, 95, 97, 97, 97, 46, 44, 44, 42, 43, 44,
+        44, 44, 48, 49, 55, 55, 59, 61, 65, 67, 69, 72, 74, 77, 78, 82, 83, 87,
+        89, 93, 94, 95, 98, 98, 100, 103, 48, 46, 46, 44, 45, 45, 45, 46, 50,
+        51, 57, 57, 61, 63, 67, 69, 71, 74, 76, 80, 80, 85, 86, 90, 93, 96, 98,
+        99, 101, 104, 104, 103, 52, 50, 49, 48, 48, 48, 48, 48, 52, 53, 59, 59,
+        64, 65, 70, 72, 74, 78, 80, 84, 85, 90, 91, 95, 97, 101, 103, 104, 106,
+        106, 107, 110, 54, 51, 51, 49, 49, 50, 49, 49, 53, 54, 60, 60, 65, 67,
+        71, 74, 76, 80, 82, 86, 87, 92, 93, 97, 100, 104, 105, 106, 109, 112,
+        112, 110, 58, 56, 55, 53, 53, 53, 53, 53, 57, 58, 63, 64, 68, 70, 75,
+        77, 80, 84, 86, 91, 91, 97, 98, 103, 105, 110, 111, 112, 115, 114, 115,
+        118, 59, 56, 56, 54, 54, 54, 53, 53, 57, 58, 64, 64, 69, 70, 75, 78, 80,
+        85, 87, 91, 92, 98, 99, 103, 106, 110, 112, 113, 116, 119, 120, 119, 65,
+        62, 61, 59, 59, 59, 58, 58, 62, 63, 68, 68, 73, 75, 79, 82, 85, 90, 92,
+        97, 98, 105, 106, 111, 114, 118, 120, 121, 124, 123, 123, 126, 66, 63,
+        63, 60, 60, 60, 59, 59, 63, 64, 69, 69, 74, 76, 80, 83, 86, 91, 93, 98,
+        99, 106, 107, 112, 115, 119, 121, 122, 125, 128, 129, 126, 71, 68, 67,
+        65, 65, 64, 63, 63, 67, 68, 73, 73, 78, 80, 84, 87, 90, 95, 97, 103,
+        103, 111, 112, 117, 120, 125, 127, 128, 131, 132, 132, 135, 74, 71, 70,
+        68, 67, 67, 66, 65, 69, 70, 75, 75, 80, 82, 86, 89, 93, 97, 100, 105,
+        106, 114, 115, 120, 123, 128, 130, 131, 135, 135, 138, 136, 80, 76, 75,
+        72, 72, 71, 70, 69, 73, 74, 79, 79, 84, 86, 90, 93, 96, 101, 104, 110,
+        110, 118, 119, 125, 128, 134, 136, 137, 140, 142, 140, 144, 81, 77, 77,
+        74, 73, 73, 71, 71, 74, 75, 80, 80, 85, 87, 91, 94, 98, 103, 105, 111,
+        112, 120, 121, 127, 130, 136, 137, 139, 142, 145, 148, 144, 83, 78, 78,
+        75, 74, 74, 72, 72, 75, 76, 81, 81, 86, 88, 92, 95, 99, 104, 106, 112,
+        113, 121, 122, 128, 131, 137, 139, 140, 144, 148, 150, 155, 86, 82, 81,
+        78, 77, 77, 75, 74, 78, 79, 84, 84, 89, 91, 95, 98, 101, 106, 109, 115,
+        116, 124, 125, 131, 135, 140, 142, 144, 147, 149, 153, 155, 89, 84, 84,
+        80, 80, 79, 78, 77, 79, 81, 85, 86, 91, 92, 97, 98, 104, 106, 112, 114,
+        119, 123, 128, 132, 135, 142, 145, 148, 149, 153, 154, 159, 91, 86, 86,
+        82, 82, 81, 80, 79, 80, 84, 85, 88, 91, 94, 97, 100, 104, 107, 112, 115,
+        120, 123, 129, 132, 138, 140, 148, 150, 153, 154, 159, 159, 93, 88, 88,
+        84, 84, 83, 83, 80, 81, 86, 86, 91, 91, 96, 97, 103, 103, 110, 110, 118,
+        119, 126, 126, 135, 136, 144, 144, 155, 155, 159, 159, 164,
+        /* Size 4x8 */
+        32, 35, 51, 77, 32, 36, 50, 72, 34, 42, 54, 75, 38, 51, 67, 87, 48, 59,
+        80, 103, 60, 68, 92, 119, 72, 79, 104, 135, 81, 86, 112, 144,
+        /* Size 8x4 */
+        32, 32, 34, 38, 48, 60, 72, 81, 35, 36, 42, 51, 59, 68, 79, 86, 51, 50,
+        54, 67, 80, 92, 104, 112, 77, 72, 75, 87, 103, 119, 135, 144,
+        /* Size 8x16 */
+        32, 31, 33, 40, 51, 65, 79, 87, 31, 32, 33, 39, 49, 61, 74, 82, 31, 32,
+        34, 38, 47, 59, 71, 79, 32, 33, 36, 40, 48, 58, 69, 77, 33, 34, 38, 44,
+        52, 62, 72, 78, 36, 35, 42, 51, 58, 68, 78, 84, 39, 38, 44, 54, 63, 73,
+        84, 89, 44, 41, 46, 59, 69, 79, 90, 96, 48, 45, 50, 62, 74, 85, 96, 103,
+        53, 49, 53, 66, 79, 92, 103, 111, 58, 54, 57, 70, 84, 98, 110, 118, 66,
+        60, 63, 75, 90, 106, 119, 126, 74, 67, 69, 81, 97, 113, 128, 134, 81,
+        73, 75, 86, 102, 120, 135, 143, 86, 78, 78, 90, 106, 124, 140, 147, 91,
+        82, 80, 90, 103, 119, 137, 151,
+        /* Size 16x8 */
+        32, 31, 31, 32, 33, 36, 39, 44, 48, 53, 58, 66, 74, 81, 86, 91, 31, 32,
+        32, 33, 34, 35, 38, 41, 45, 49, 54, 60, 67, 73, 78, 82, 33, 33, 34, 36,
+        38, 42, 44, 46, 50, 53, 57, 63, 69, 75, 78, 80, 40, 39, 38, 40, 44, 51,
+        54, 59, 62, 66, 70, 75, 81, 86, 90, 90, 51, 49, 47, 48, 52, 58, 63, 69,
+        74, 79, 84, 90, 97, 102, 106, 103, 65, 61, 59, 58, 62, 68, 73, 79, 85,
+        92, 98, 106, 113, 120, 124, 119, 79, 74, 71, 69, 72, 78, 84, 90, 96,
+        103, 110, 119, 128, 135, 140, 137, 87, 82, 79, 77, 78, 84, 89, 96, 103,
+        111, 118, 126, 134, 143, 147, 151,
+        /* Size 16x32 */
+        32, 31, 31, 32, 33, 36, 40, 44, 51, 53, 65, 66, 79, 81, 87, 90, 31, 32,
+        32, 32, 33, 35, 39, 42, 49, 51, 62, 63, 75, 77, 83, 85, 31, 32, 32, 32,
+        33, 35, 39, 42, 49, 51, 61, 62, 74, 76, 82, 85, 31, 32, 32, 33, 33, 34,
+        38, 41, 47, 49, 59, 60, 72, 74, 79, 81, 31, 32, 32, 33, 34, 35, 38, 41,
+        47, 49, 59, 60, 71, 73, 79, 81, 32, 32, 33, 34, 35, 36, 39, 42, 48, 50,
+        59, 60, 71, 72, 78, 80, 32, 32, 33, 35, 36, 37, 40, 42, 48, 49, 58, 59,
+        69, 71, 77, 80, 32, 33, 33, 35, 36, 38, 41, 42, 48, 49, 58, 59, 69, 70,
+        75, 77, 33, 33, 34, 36, 38, 41, 44, 46, 52, 53, 62, 63, 72, 74, 78, 78,
+        34, 34, 34, 37, 39, 42, 45, 48, 53, 54, 63, 64, 73, 75, 80, 83, 36, 34,
+        35, 38, 42, 48, 51, 54, 58, 60, 68, 69, 78, 80, 84, 83, 36, 35, 35, 38,
+        42, 48, 51, 54, 59, 60, 68, 69, 79, 80, 85, 87, 39, 37, 38, 40, 44, 50,
+        54, 58, 63, 65, 73, 74, 84, 85, 89, 88, 40, 38, 39, 41, 45, 51, 56, 59,
+        65, 67, 75, 76, 85, 87, 90, 93, 44, 41, 41, 43, 46, 53, 59, 63, 69, 71,
+        79, 80, 90, 91, 96, 93, 46, 43, 43, 44, 48, 55, 60, 65, 72, 73, 82, 83,
+        93, 94, 97, 100, 48, 45, 45, 46, 50, 56, 62, 67, 74, 76, 85, 86, 96, 98,
+        103, 100, 52, 48, 48, 49, 52, 59, 65, 70, 78, 80, 90, 91, 101, 103, 105,
+        107, 53, 49, 49, 50, 53, 60, 66, 71, 79, 82, 92, 93, 103, 105, 111, 107,
+        58, 53, 53, 53, 57, 63, 69, 74, 83, 86, 97, 98, 109, 111, 113, 115, 58,
+        54, 54, 54, 57, 63, 70, 75, 84, 87, 98, 99, 110, 112, 118, 115, 65, 60,
+        59, 58, 62, 68, 74, 79, 89, 92, 105, 106, 118, 119, 122, 123, 66, 61,
+        60, 59, 63, 69, 75, 80, 90, 93, 106, 107, 119, 121, 126, 123, 71, 65,
+        65, 63, 67, 73, 79, 84, 94, 97, 111, 112, 125, 127, 131, 132, 74, 68,
+        67, 66, 69, 75, 81, 86, 97, 100, 113, 115, 128, 130, 134, 132, 79, 72,
+        72, 70, 73, 79, 85, 90, 101, 104, 118, 119, 133, 135, 141, 140, 81, 74,
+        73, 71, 75, 80, 86, 91, 102, 105, 120, 121, 135, 137, 143, 140, 82, 75,
+        74, 72, 75, 81, 87, 92, 103, 106, 121, 122, 136, 139, 147, 151, 86, 78,
+        78, 75, 78, 84, 90, 95, 106, 109, 124, 125, 140, 142, 147, 151, 88, 81,
+        80, 77, 80, 86, 90, 98, 105, 112, 122, 127, 140, 144, 152, 155, 91, 83,
+        82, 79, 80, 88, 90, 100, 103, 114, 119, 130, 137, 148, 151, 155, 93, 85,
+        85, 81, 81, 90, 90, 102, 103, 117, 117, 134, 134, 151, 152, 160,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 32, 32, 32, 33, 34, 36, 36, 39, 40, 44, 46, 48, 52,
+        53, 58, 58, 65, 66, 71, 74, 79, 81, 82, 86, 88, 91, 93, 31, 32, 32, 32,
+        32, 32, 32, 33, 33, 34, 34, 35, 37, 38, 41, 43, 45, 48, 49, 53, 54, 60,
+        61, 65, 68, 72, 74, 75, 78, 81, 83, 85, 31, 32, 32, 32, 32, 33, 33, 33,
+        34, 34, 35, 35, 38, 39, 41, 43, 45, 48, 49, 53, 54, 59, 60, 65, 67, 72,
+        73, 74, 78, 80, 82, 85, 32, 32, 32, 33, 33, 34, 35, 35, 36, 37, 38, 38,
+        40, 41, 43, 44, 46, 49, 50, 53, 54, 58, 59, 63, 66, 70, 71, 72, 75, 77,
+        79, 81, 33, 33, 33, 33, 34, 35, 36, 36, 38, 39, 42, 42, 44, 45, 46, 48,
+        50, 52, 53, 57, 57, 62, 63, 67, 69, 73, 75, 75, 78, 80, 80, 81, 36, 35,
+        35, 34, 35, 36, 37, 38, 41, 42, 48, 48, 50, 51, 53, 55, 56, 59, 60, 63,
+        63, 68, 69, 73, 75, 79, 80, 81, 84, 86, 88, 90, 40, 39, 39, 38, 38, 39,
+        40, 41, 44, 45, 51, 51, 54, 56, 59, 60, 62, 65, 66, 69, 70, 74, 75, 79,
+        81, 85, 86, 87, 90, 90, 90, 90, 44, 42, 42, 41, 41, 42, 42, 42, 46, 48,
+        54, 54, 58, 59, 63, 65, 67, 70, 71, 74, 75, 79, 80, 84, 86, 90, 91, 92,
+        95, 98, 100, 102, 51, 49, 49, 47, 47, 48, 48, 48, 52, 53, 58, 59, 63,
+        65, 69, 72, 74, 78, 79, 83, 84, 89, 90, 94, 97, 101, 102, 103, 106, 105,
+        103, 103, 53, 51, 51, 49, 49, 50, 49, 49, 53, 54, 60, 60, 65, 67, 71,
+        73, 76, 80, 82, 86, 87, 92, 93, 97, 100, 104, 105, 106, 109, 112, 114,
+        117, 65, 62, 61, 59, 59, 59, 58, 58, 62, 63, 68, 68, 73, 75, 79, 82, 85,
+        90, 92, 97, 98, 105, 106, 111, 113, 118, 120, 121, 124, 122, 119, 117,
+        66, 63, 62, 60, 60, 60, 59, 59, 63, 64, 69, 69, 74, 76, 80, 83, 86, 91,
+        93, 98, 99, 106, 107, 112, 115, 119, 121, 122, 125, 127, 130, 134, 79,
+        75, 74, 72, 71, 71, 69, 69, 72, 73, 78, 79, 84, 85, 90, 93, 96, 101,
+        103, 109, 110, 118, 119, 125, 128, 133, 135, 136, 140, 140, 137, 134,
+        81, 77, 76, 74, 73, 72, 71, 70, 74, 75, 80, 80, 85, 87, 91, 94, 98, 103,
+        105, 111, 112, 119, 121, 127, 130, 135, 137, 139, 142, 144, 148, 151,
+        87, 83, 82, 79, 79, 78, 77, 75, 78, 80, 84, 85, 89, 90, 96, 97, 103,
+        105, 111, 113, 118, 122, 126, 131, 134, 141, 143, 147, 147, 152, 151,
+        152, 90, 85, 85, 81, 81, 80, 80, 77, 78, 83, 83, 87, 88, 93, 93, 100,
+        100, 107, 107, 115, 115, 123, 123, 132, 132, 140, 140, 151, 151, 155,
+        155, 160,
+        /* Size 4x16 */
+        31, 36, 53, 81, 32, 35, 51, 76, 32, 35, 49, 73, 32, 37, 49, 71, 33, 41,
+        53, 74, 34, 48, 60, 80, 37, 50, 65, 85, 41, 53, 71, 91, 45, 56, 76, 98,
+        49, 60, 82, 105, 54, 63, 87, 112, 61, 69, 93, 121, 68, 75, 100, 130, 74,
+        80, 105, 137, 78, 84, 109, 142, 83, 88, 114, 148,
+        /* Size 16x4 */
+        31, 32, 32, 32, 33, 34, 37, 41, 45, 49, 54, 61, 68, 74, 78, 83, 36, 35,
+        35, 37, 41, 48, 50, 53, 56, 60, 63, 69, 75, 80, 84, 88, 53, 51, 49, 49,
+        53, 60, 65, 71, 76, 82, 87, 93, 100, 105, 109, 114, 81, 76, 73, 71, 74,
+        80, 85, 91, 98, 105, 112, 121, 130, 137, 142, 148,
+        /* Size 8x32 */
+        32, 31, 33, 40, 51, 65, 79, 87, 31, 32, 33, 39, 49, 62, 75, 83, 31, 32,
+        33, 39, 49, 61, 74, 82, 31, 32, 33, 38, 47, 59, 72, 79, 31, 32, 34, 38,
+        47, 59, 71, 79, 32, 33, 35, 39, 48, 59, 71, 78, 32, 33, 36, 40, 48, 58,
+        69, 77, 32, 33, 36, 41, 48, 58, 69, 75, 33, 34, 38, 44, 52, 62, 72, 78,
+        34, 34, 39, 45, 53, 63, 73, 80, 36, 35, 42, 51, 58, 68, 78, 84, 36, 35,
+        42, 51, 59, 68, 79, 85, 39, 38, 44, 54, 63, 73, 84, 89, 40, 39, 45, 56,
+        65, 75, 85, 90, 44, 41, 46, 59, 69, 79, 90, 96, 46, 43, 48, 60, 72, 82,
+        93, 97, 48, 45, 50, 62, 74, 85, 96, 103, 52, 48, 52, 65, 78, 90, 101,
+        105, 53, 49, 53, 66, 79, 92, 103, 111, 58, 53, 57, 69, 83, 97, 109, 113,
+        58, 54, 57, 70, 84, 98, 110, 118, 65, 59, 62, 74, 89, 105, 118, 122, 66,
+        60, 63, 75, 90, 106, 119, 126, 71, 65, 67, 79, 94, 111, 125, 131, 74,
+        67, 69, 81, 97, 113, 128, 134, 79, 72, 73, 85, 101, 118, 133, 141, 81,
+        73, 75, 86, 102, 120, 135, 143, 82, 74, 75, 87, 103, 121, 136, 147, 86,
+        78, 78, 90, 106, 124, 140, 147, 88, 80, 80, 90, 105, 122, 140, 152, 91,
+        82, 80, 90, 103, 119, 137, 151, 93, 85, 81, 90, 103, 117, 134, 152,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 32, 32, 32, 33, 34, 36, 36, 39, 40, 44, 46, 48, 52,
+        53, 58, 58, 65, 66, 71, 74, 79, 81, 82, 86, 88, 91, 93, 31, 32, 32, 32,
+        32, 33, 33, 33, 34, 34, 35, 35, 38, 39, 41, 43, 45, 48, 49, 53, 54, 59,
+        60, 65, 67, 72, 73, 74, 78, 80, 82, 85, 33, 33, 33, 33, 34, 35, 36, 36,
+        38, 39, 42, 42, 44, 45, 46, 48, 50, 52, 53, 57, 57, 62, 63, 67, 69, 73,
+        75, 75, 78, 80, 80, 81, 40, 39, 39, 38, 38, 39, 40, 41, 44, 45, 51, 51,
+        54, 56, 59, 60, 62, 65, 66, 69, 70, 74, 75, 79, 81, 85, 86, 87, 90, 90,
+        90, 90, 51, 49, 49, 47, 47, 48, 48, 48, 52, 53, 58, 59, 63, 65, 69, 72,
+        74, 78, 79, 83, 84, 89, 90, 94, 97, 101, 102, 103, 106, 105, 103, 103,
+        65, 62, 61, 59, 59, 59, 58, 58, 62, 63, 68, 68, 73, 75, 79, 82, 85, 90,
+        92, 97, 98, 105, 106, 111, 113, 118, 120, 121, 124, 122, 119, 117, 79,
+        75, 74, 72, 71, 71, 69, 69, 72, 73, 78, 79, 84, 85, 90, 93, 96, 101,
+        103, 109, 110, 118, 119, 125, 128, 133, 135, 136, 140, 140, 137, 134,
+        87, 83, 82, 79, 79, 78, 77, 75, 78, 80, 84, 85, 89, 90, 96, 97, 103,
+        105, 111, 113, 118, 122, 126, 131, 134, 141, 143, 147, 147, 152, 151,
+        152 },
+      { /* Chroma */
+        /* Size 4x4 */
+        32, 46, 49, 58, 46, 53, 55, 62, 49, 55, 70, 78, 58, 62, 78, 91,
+        /* Size 8x8 */
+        31, 34, 42, 47, 49, 54, 60, 64, 34, 39, 45, 46, 47, 51, 56, 59, 42, 45,
+        48, 49, 50, 53, 57, 60, 47, 46, 49, 55, 58, 61, 65, 66, 49, 47, 50, 58,
+        65, 69, 73, 74, 54, 51, 53, 61, 69, 76, 82, 83, 60, 56, 57, 65, 73, 82,
+        89, 92, 64, 59, 60, 66, 74, 83, 92, 96,
+        /* Size 16x16 */
+        32, 31, 31, 35, 40, 49, 48, 49, 50, 52, 54, 57, 61, 64, 66, 68, 31, 31,
+        32, 37, 41, 47, 47, 46, 48, 49, 51, 54, 57, 60, 62, 64, 31, 32, 34, 39,
+        43, 46, 46, 45, 46, 47, 49, 52, 55, 57, 59, 61, 35, 37, 39, 44, 46, 47,
+        46, 45, 46, 47, 48, 51, 53, 56, 57, 59, 40, 41, 43, 46, 48, 50, 49, 48,
+        49, 49, 51, 53, 55, 57, 59, 59, 49, 47, 46, 47, 50, 53, 53, 53, 54, 54,
+        55, 57, 59, 61, 62, 62, 48, 47, 46, 46, 49, 53, 54, 55, 56, 57, 58, 60,
+        62, 64, 65, 65, 49, 46, 45, 45, 48, 53, 55, 58, 60, 61, 62, 64, 66, 68,
+        69, 69, 50, 48, 46, 46, 49, 54, 56, 60, 61, 63, 65, 67, 69, 71, 72, 72,
+        52, 49, 47, 47, 49, 54, 57, 61, 63, 66, 68, 71, 73, 75, 76, 77, 54, 51,
+        49, 48, 51, 55, 58, 62, 65, 68, 71, 74, 76, 78, 80, 81, 57, 54, 52, 51,
+        53, 57, 60, 64, 67, 71, 74, 77, 80, 83, 84, 85, 61, 57, 55, 53, 55, 59,
+        62, 66, 69, 73, 76, 80, 84, 87, 89, 89, 64, 60, 57, 56, 57, 61, 64, 68,
+        71, 75, 78, 83, 87, 90, 92, 94, 66, 62, 59, 57, 59, 62, 65, 69, 72, 76,
+        80, 84, 89, 92, 94, 96, 68, 64, 61, 59, 59, 62, 65, 69, 72, 77, 81, 85,
+        89, 94, 96, 98,
+        /* Size 32x32 */
+        32, 31, 31, 30, 31, 33, 35, 36, 40, 41, 49, 49, 48, 48, 49, 50, 50, 52,
+        52, 54, 54, 57, 57, 60, 61, 63, 64, 65, 66, 67, 68, 69, 31, 31, 31, 31,
+        32, 34, 37, 38, 41, 42, 47, 47, 47, 47, 47, 47, 48, 49, 50, 52, 52, 54,
+        55, 57, 58, 60, 61, 61, 63, 64, 64, 65, 31, 31, 31, 31, 32, 35, 37, 39,
+        41, 42, 47, 47, 47, 46, 46, 47, 48, 49, 49, 51, 51, 54, 54, 56, 57, 59,
+        60, 61, 62, 63, 64, 65, 30, 31, 31, 32, 33, 35, 38, 40, 42, 42, 46, 46,
+        45, 45, 45, 45, 46, 47, 47, 49, 49, 52, 52, 54, 55, 57, 58, 58, 60, 61,
+        61, 62, 31, 32, 32, 33, 34, 37, 39, 41, 43, 43, 46, 46, 46, 45, 45, 46,
+        46, 47, 47, 49, 49, 51, 52, 54, 55, 57, 57, 58, 59, 60, 61, 62, 33, 34,
+        35, 35, 37, 39, 41, 43, 44, 45, 47, 47, 46, 46, 45, 46, 46, 47, 47, 49,
+        49, 51, 51, 53, 54, 56, 57, 57, 58, 59, 60, 61, 35, 37, 37, 38, 39, 41,
+        44, 46, 46, 46, 47, 47, 46, 46, 45, 46, 46, 47, 47, 48, 48, 50, 51, 52,
+        53, 55, 56, 56, 57, 58, 59, 61, 36, 38, 39, 40, 41, 43, 46, 47, 47, 47,
+        48, 47, 46, 46, 45, 46, 46, 46, 47, 48, 48, 50, 50, 52, 53, 54, 55, 55,
+        56, 57, 58, 58, 40, 41, 41, 42, 43, 44, 46, 47, 48, 48, 50, 49, 49, 49,
+        48, 49, 49, 49, 49, 51, 51, 52, 53, 54, 55, 57, 57, 58, 59, 59, 59, 59,
+        41, 42, 42, 42, 43, 45, 46, 47, 48, 48, 50, 50, 49, 49, 49, 49, 50, 50,
+        50, 52, 52, 53, 53, 55, 56, 57, 58, 58, 59, 60, 61, 62, 49, 47, 47, 46,
+        46, 47, 47, 48, 50, 50, 53, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 56,
+        57, 58, 59, 60, 61, 61, 62, 62, 62, 62, 49, 47, 47, 46, 46, 47, 47, 47,
+        49, 50, 53, 53, 53, 53, 54, 54, 54, 54, 54, 55, 56, 57, 57, 59, 59, 61,
+        61, 62, 63, 63, 64, 65, 48, 47, 47, 45, 46, 46, 46, 46, 49, 49, 53, 53,
+        54, 54, 55, 56, 56, 57, 57, 58, 58, 60, 60, 61, 62, 63, 64, 64, 65, 66,
+        65, 65, 48, 47, 46, 45, 45, 46, 46, 46, 49, 49, 53, 53, 54, 55, 56, 57,
+        57, 58, 58, 59, 60, 61, 61, 63, 63, 65, 65, 65, 66, 66, 67, 68, 49, 47,
+        46, 45, 45, 45, 45, 45, 48, 49, 53, 54, 55, 56, 58, 59, 60, 61, 61, 62,
+        62, 63, 64, 65, 66, 67, 68, 68, 69, 70, 69, 68, 50, 47, 47, 45, 46, 46,
+        46, 46, 49, 49, 54, 54, 56, 57, 59, 60, 60, 62, 62, 63, 64, 65, 65, 67,
+        68, 69, 69, 70, 70, 70, 71, 71, 50, 48, 48, 46, 46, 46, 46, 46, 49, 50,
+        54, 54, 56, 57, 60, 60, 61, 63, 63, 65, 65, 67, 67, 68, 69, 71, 71, 71,
+        72, 73, 72, 71, 52, 49, 49, 47, 47, 47, 47, 46, 49, 50, 54, 54, 57, 58,
+        61, 62, 63, 65, 65, 67, 67, 69, 70, 71, 72, 73, 74, 74, 75, 74, 74, 75,
+        52, 50, 49, 47, 47, 47, 47, 47, 49, 50, 54, 54, 57, 58, 61, 62, 63, 65,
+        66, 68, 68, 70, 71, 72, 73, 75, 75, 75, 76, 77, 77, 75, 54, 52, 51, 49,
+        49, 49, 48, 48, 51, 52, 55, 55, 58, 59, 62, 63, 65, 67, 68, 70, 70, 73,
+        73, 75, 76, 78, 78, 78, 79, 78, 78, 79, 54, 52, 51, 49, 49, 49, 48, 48,
+        51, 52, 55, 56, 58, 60, 62, 64, 65, 67, 68, 70, 71, 73, 74, 75, 76, 78,
+        78, 79, 80, 81, 81, 79, 57, 54, 54, 52, 51, 51, 50, 50, 52, 53, 56, 57,
+        60, 61, 63, 65, 67, 69, 70, 73, 73, 76, 77, 79, 80, 82, 82, 83, 84, 83,
+        82, 83, 57, 55, 54, 52, 52, 51, 51, 50, 53, 53, 57, 57, 60, 61, 64, 65,
+        67, 70, 71, 73, 74, 77, 77, 79, 80, 82, 83, 83, 84, 85, 85, 83, 60, 57,
+        56, 54, 54, 53, 52, 52, 54, 55, 58, 59, 61, 63, 65, 67, 68, 71, 72, 75,
+        75, 79, 79, 82, 83, 85, 86, 86, 87, 87, 86, 87, 61, 58, 57, 55, 55, 54,
+        53, 53, 55, 56, 59, 59, 62, 63, 66, 68, 69, 72, 73, 76, 76, 80, 80, 83,
+        84, 86, 87, 88, 89, 89, 89, 87, 63, 60, 59, 57, 57, 56, 55, 54, 57, 57,
+        60, 61, 63, 65, 67, 69, 71, 73, 75, 78, 78, 82, 82, 85, 86, 89, 89, 90,
+        91, 92, 90, 91, 64, 61, 60, 58, 57, 57, 56, 55, 57, 58, 61, 61, 64, 65,
+        68, 69, 71, 74, 75, 78, 78, 82, 83, 86, 87, 89, 90, 91, 92, 93, 94, 91,
+        65, 61, 61, 58, 58, 57, 56, 55, 58, 58, 61, 62, 64, 65, 68, 70, 71, 74,
+        75, 78, 79, 83, 83, 86, 88, 90, 91, 91, 93, 94, 94, 96, 66, 63, 62, 60,
+        59, 58, 57, 56, 59, 59, 62, 63, 65, 66, 69, 70, 72, 75, 76, 79, 80, 84,
+        84, 87, 89, 91, 92, 93, 94, 94, 96, 96, 67, 64, 63, 61, 60, 59, 58, 57,
+        59, 60, 62, 63, 66, 66, 70, 70, 73, 74, 77, 78, 81, 83, 85, 87, 89, 92,
+        93, 94, 94, 96, 96, 97, 68, 64, 64, 61, 61, 60, 59, 58, 59, 61, 62, 64,
+        65, 67, 69, 71, 72, 74, 77, 78, 81, 82, 85, 86, 89, 90, 94, 94, 96, 96,
+        98, 97, 69, 65, 65, 62, 62, 61, 61, 58, 59, 62, 62, 65, 65, 68, 68, 71,
+        71, 75, 75, 79, 79, 83, 83, 87, 87, 91, 91, 96, 96, 97, 97, 99,
+        /* Size 4x8 */
+        31, 47, 50, 61, 36, 47, 47, 57, 43, 50, 50, 58, 45, 53, 58, 65, 47, 54,
+        66, 74, 52, 56, 70, 82, 57, 60, 75, 90, 61, 63, 77, 93,
+        /* Size 8x4 */
+        31, 36, 43, 45, 47, 52, 57, 61, 47, 47, 50, 53, 54, 56, 60, 63, 50, 47,
+        50, 58, 66, 70, 75, 77, 61, 57, 58, 65, 74, 82, 90, 93,
+        /* Size 8x16 */
+        32, 32, 40, 49, 51, 57, 63, 67, 31, 33, 41, 47, 49, 54, 59, 63, 31, 35,
+        43, 46, 47, 51, 57, 60, 35, 39, 46, 46, 47, 50, 55, 58, 41, 43, 48, 49,
+        49, 52, 57, 59, 49, 47, 50, 53, 54, 57, 60, 62, 48, 46, 49, 54, 57, 60,
+        64, 65, 49, 45, 48, 56, 61, 64, 67, 69, 50, 46, 49, 57, 63, 67, 71, 73,
+        52, 48, 50, 58, 65, 71, 75, 77, 54, 50, 51, 59, 67, 73, 78, 81, 57, 52,
+        53, 61, 69, 77, 82, 85, 61, 55, 56, 63, 72, 80, 86, 88, 64, 58, 58, 65,
+        73, 82, 89, 92, 66, 59, 59, 66, 75, 84, 91, 94, 68, 61, 59, 65, 72, 81,
+        89, 95,
+        /* Size 16x8 */
+        32, 31, 31, 35, 41, 49, 48, 49, 50, 52, 54, 57, 61, 64, 66, 68, 32, 33,
+        35, 39, 43, 47, 46, 45, 46, 48, 50, 52, 55, 58, 59, 61, 40, 41, 43, 46,
+        48, 50, 49, 48, 49, 50, 51, 53, 56, 58, 59, 59, 49, 47, 46, 46, 49, 53,
+        54, 56, 57, 58, 59, 61, 63, 65, 66, 65, 51, 49, 47, 47, 49, 54, 57, 61,
+        63, 65, 67, 69, 72, 73, 75, 72, 57, 54, 51, 50, 52, 57, 60, 64, 67, 71,
+        73, 77, 80, 82, 84, 81, 63, 59, 57, 55, 57, 60, 64, 67, 71, 75, 78, 82,
+        86, 89, 91, 89, 67, 63, 60, 58, 59, 62, 65, 69, 73, 77, 81, 85, 88, 92,
+        94, 95,
+        /* Size 16x32 */
+        32, 31, 32, 37, 40, 48, 49, 49, 51, 52, 57, 58, 63, 64, 67, 67, 31, 31,
+        33, 38, 41, 47, 47, 47, 49, 50, 54, 55, 60, 61, 63, 64, 31, 31, 33, 38,
+        41, 47, 47, 47, 49, 49, 54, 54, 59, 60, 63, 64, 30, 32, 33, 40, 42, 46,
+        45, 45, 47, 48, 52, 52, 57, 58, 60, 61, 31, 33, 35, 41, 43, 46, 46, 45,
+        47, 48, 51, 52, 57, 57, 60, 61, 33, 36, 37, 43, 44, 47, 46, 46, 47, 47,
+        51, 52, 56, 57, 59, 60, 35, 38, 39, 45, 46, 47, 46, 45, 47, 47, 50, 51,
+        55, 56, 58, 60, 37, 40, 41, 47, 47, 47, 46, 45, 46, 47, 50, 50, 54, 55,
+        57, 58, 41, 42, 43, 47, 48, 49, 49, 48, 49, 50, 52, 53, 57, 57, 59, 58,
+        42, 43, 43, 47, 48, 50, 49, 49, 50, 50, 53, 54, 57, 58, 60, 61, 49, 46,
+        47, 48, 50, 53, 53, 53, 54, 54, 57, 57, 60, 61, 62, 61, 49, 46, 47, 48,
+        50, 53, 53, 54, 54, 55, 57, 57, 61, 61, 63, 64, 48, 46, 46, 47, 49, 53,
+        54, 56, 57, 57, 60, 60, 64, 64, 65, 64, 48, 45, 46, 46, 49, 53, 55, 56,
+        58, 58, 61, 61, 65, 65, 66, 67, 49, 45, 45, 46, 48, 53, 56, 58, 61, 61,
+        64, 64, 67, 68, 69, 67, 49, 46, 46, 46, 49, 53, 57, 59, 62, 62, 65, 66,
+        69, 69, 70, 70, 50, 46, 46, 46, 49, 54, 57, 59, 63, 64, 67, 67, 71, 71,
+        73, 71, 51, 47, 47, 47, 49, 54, 58, 61, 64, 66, 69, 70, 73, 74, 74, 74,
+        52, 48, 48, 47, 50, 54, 58, 61, 65, 66, 71, 71, 75, 75, 77, 74, 54, 50,
+        49, 48, 51, 55, 59, 62, 67, 68, 73, 73, 77, 78, 78, 78, 54, 50, 50, 49,
+        51, 55, 59, 62, 67, 68, 73, 74, 78, 78, 81, 78, 57, 52, 52, 50, 52, 56,
+        60, 64, 69, 70, 76, 77, 82, 82, 83, 82, 57, 52, 52, 51, 53, 57, 61, 64,
+        69, 71, 77, 77, 82, 83, 85, 82, 60, 54, 54, 52, 55, 58, 62, 65, 71, 72,
+        79, 79, 85, 86, 87, 86, 61, 56, 55, 53, 56, 59, 63, 66, 72, 73, 80, 81,
+        86, 87, 88, 86, 63, 57, 57, 55, 57, 60, 64, 67, 73, 75, 82, 82, 89, 90,
+        92, 90, 64, 58, 58, 55, 58, 61, 65, 68, 73, 75, 82, 83, 89, 90, 92, 90,
+        64, 59, 58, 56, 58, 61, 65, 68, 74, 75, 83, 83, 90, 91, 94, 95, 66, 60,
+        59, 57, 59, 62, 66, 69, 75, 76, 84, 85, 91, 92, 94, 95, 67, 61, 60, 58,
+        59, 63, 66, 70, 74, 77, 82, 85, 91, 93, 96, 96, 68, 62, 61, 58, 59, 64,
+        65, 71, 72, 78, 81, 86, 89, 94, 95, 96, 68, 62, 62, 59, 59, 65, 65, 71,
+        71, 79, 79, 87, 87, 95, 95, 98,
+        /* Size 32x16 */
+        32, 31, 31, 30, 31, 33, 35, 37, 41, 42, 49, 49, 48, 48, 49, 49, 50, 51,
+        52, 54, 54, 57, 57, 60, 61, 63, 64, 64, 66, 67, 68, 68, 31, 31, 31, 32,
+        33, 36, 38, 40, 42, 43, 46, 46, 46, 45, 45, 46, 46, 47, 48, 50, 50, 52,
+        52, 54, 56, 57, 58, 59, 60, 61, 62, 62, 32, 33, 33, 33, 35, 37, 39, 41,
+        43, 43, 47, 47, 46, 46, 45, 46, 46, 47, 48, 49, 50, 52, 52, 54, 55, 57,
+        58, 58, 59, 60, 61, 62, 37, 38, 38, 40, 41, 43, 45, 47, 47, 47, 48, 48,
+        47, 46, 46, 46, 46, 47, 47, 48, 49, 50, 51, 52, 53, 55, 55, 56, 57, 58,
+        58, 59, 40, 41, 41, 42, 43, 44, 46, 47, 48, 48, 50, 50, 49, 49, 48, 49,
+        49, 49, 50, 51, 51, 52, 53, 55, 56, 57, 58, 58, 59, 59, 59, 59, 48, 47,
+        47, 46, 46, 47, 47, 47, 49, 50, 53, 53, 53, 53, 53, 53, 54, 54, 54, 55,
+        55, 56, 57, 58, 59, 60, 61, 61, 62, 63, 64, 65, 49, 47, 47, 45, 46, 46,
+        46, 46, 49, 49, 53, 53, 54, 55, 56, 57, 57, 58, 58, 59, 59, 60, 61, 62,
+        63, 64, 65, 65, 66, 66, 65, 65, 49, 47, 47, 45, 45, 46, 45, 45, 48, 49,
+        53, 54, 56, 56, 58, 59, 59, 61, 61, 62, 62, 64, 64, 65, 66, 67, 68, 68,
+        69, 70, 71, 71, 51, 49, 49, 47, 47, 47, 47, 46, 49, 50, 54, 54, 57, 58,
+        61, 62, 63, 64, 65, 67, 67, 69, 69, 71, 72, 73, 73, 74, 75, 74, 72, 71,
+        52, 50, 49, 48, 48, 47, 47, 47, 50, 50, 54, 55, 57, 58, 61, 62, 64, 66,
+        66, 68, 68, 70, 71, 72, 73, 75, 75, 75, 76, 77, 78, 79, 57, 54, 54, 52,
+        51, 51, 50, 50, 52, 53, 57, 57, 60, 61, 64, 65, 67, 69, 71, 73, 73, 76,
+        77, 79, 80, 82, 82, 83, 84, 82, 81, 79, 58, 55, 54, 52, 52, 52, 51, 50,
+        53, 54, 57, 57, 60, 61, 64, 66, 67, 70, 71, 73, 74, 77, 77, 79, 81, 82,
+        83, 83, 85, 85, 86, 87, 63, 60, 59, 57, 57, 56, 55, 54, 57, 57, 60, 61,
+        64, 65, 67, 69, 71, 73, 75, 77, 78, 82, 82, 85, 86, 89, 89, 90, 91, 91,
+        89, 87, 64, 61, 60, 58, 57, 57, 56, 55, 57, 58, 61, 61, 64, 65, 68, 69,
+        71, 74, 75, 78, 78, 82, 83, 86, 87, 90, 90, 91, 92, 93, 94, 95, 67, 63,
+        63, 60, 60, 59, 58, 57, 59, 60, 62, 63, 65, 66, 69, 70, 73, 74, 77, 78,
+        81, 83, 85, 87, 88, 92, 92, 94, 94, 96, 95, 95, 67, 64, 64, 61, 61, 60,
+        60, 58, 58, 61, 61, 64, 64, 67, 67, 70, 71, 74, 74, 78, 78, 82, 82, 86,
+        86, 90, 90, 95, 95, 96, 96, 98,
+        /* Size 4x16 */
+        31, 48, 52, 64, 31, 47, 49, 60, 33, 46, 48, 57, 38, 47, 47, 56, 42, 49,
+        50, 57, 46, 53, 54, 61, 46, 53, 57, 64, 45, 53, 61, 68, 46, 54, 64, 71,
+        48, 54, 66, 75, 50, 55, 68, 78, 52, 57, 71, 83, 56, 59, 73, 87, 58, 61,
+        75, 90, 60, 62, 76, 92, 62, 64, 78, 94,
+        /* Size 16x4 */
+        31, 31, 33, 38, 42, 46, 46, 45, 46, 48, 50, 52, 56, 58, 60, 62, 48, 47,
+        46, 47, 49, 53, 53, 53, 54, 54, 55, 57, 59, 61, 62, 64, 52, 49, 48, 47,
+        50, 54, 57, 61, 64, 66, 68, 71, 73, 75, 76, 78, 64, 60, 57, 56, 57, 61,
+        64, 68, 71, 75, 78, 83, 87, 90, 92, 94,
+        /* Size 8x32 */
+        32, 32, 40, 49, 51, 57, 63, 67, 31, 33, 41, 47, 49, 54, 60, 63, 31, 33,
+        41, 47, 49, 54, 59, 63, 30, 33, 42, 45, 47, 52, 57, 60, 31, 35, 43, 46,
+        47, 51, 57, 60, 33, 37, 44, 46, 47, 51, 56, 59, 35, 39, 46, 46, 47, 50,
+        55, 58, 37, 41, 47, 46, 46, 50, 54, 57, 41, 43, 48, 49, 49, 52, 57, 59,
+        42, 43, 48, 49, 50, 53, 57, 60, 49, 47, 50, 53, 54, 57, 60, 62, 49, 47,
+        50, 53, 54, 57, 61, 63, 48, 46, 49, 54, 57, 60, 64, 65, 48, 46, 49, 55,
+        58, 61, 65, 66, 49, 45, 48, 56, 61, 64, 67, 69, 49, 46, 49, 57, 62, 65,
+        69, 70, 50, 46, 49, 57, 63, 67, 71, 73, 51, 47, 49, 58, 64, 69, 73, 74,
+        52, 48, 50, 58, 65, 71, 75, 77, 54, 49, 51, 59, 67, 73, 77, 78, 54, 50,
+        51, 59, 67, 73, 78, 81, 57, 52, 52, 60, 69, 76, 82, 83, 57, 52, 53, 61,
+        69, 77, 82, 85, 60, 54, 55, 62, 71, 79, 85, 87, 61, 55, 56, 63, 72, 80,
+        86, 88, 63, 57, 57, 64, 73, 82, 89, 92, 64, 58, 58, 65, 73, 82, 89, 92,
+        64, 58, 58, 65, 74, 83, 90, 94, 66, 59, 59, 66, 75, 84, 91, 94, 67, 60,
+        59, 66, 74, 82, 91, 96, 68, 61, 59, 65, 72, 81, 89, 95, 68, 62, 59, 65,
+        71, 79, 87, 95,
+        /* Size 32x8 */
+        32, 31, 31, 30, 31, 33, 35, 37, 41, 42, 49, 49, 48, 48, 49, 49, 50, 51,
+        52, 54, 54, 57, 57, 60, 61, 63, 64, 64, 66, 67, 68, 68, 32, 33, 33, 33,
+        35, 37, 39, 41, 43, 43, 47, 47, 46, 46, 45, 46, 46, 47, 48, 49, 50, 52,
+        52, 54, 55, 57, 58, 58, 59, 60, 61, 62, 40, 41, 41, 42, 43, 44, 46, 47,
+        48, 48, 50, 50, 49, 49, 48, 49, 49, 49, 50, 51, 51, 52, 53, 55, 56, 57,
+        58, 58, 59, 59, 59, 59, 49, 47, 47, 45, 46, 46, 46, 46, 49, 49, 53, 53,
+        54, 55, 56, 57, 57, 58, 58, 59, 59, 60, 61, 62, 63, 64, 65, 65, 66, 66,
+        65, 65, 51, 49, 49, 47, 47, 47, 47, 46, 49, 50, 54, 54, 57, 58, 61, 62,
+        63, 64, 65, 67, 67, 69, 69, 71, 72, 73, 73, 74, 75, 74, 72, 71, 57, 54,
+        54, 52, 51, 51, 50, 50, 52, 53, 57, 57, 60, 61, 64, 65, 67, 69, 71, 73,
+        73, 76, 77, 79, 80, 82, 82, 83, 84, 82, 81, 79, 63, 60, 59, 57, 57, 56,
+        55, 54, 57, 57, 60, 61, 64, 65, 67, 69, 71, 73, 75, 77, 78, 82, 82, 85,
+        86, 89, 89, 90, 91, 91, 89, 87, 67, 63, 63, 60, 60, 59, 58, 57, 59, 60,
+        62, 63, 65, 66, 69, 70, 73, 74, 77, 78, 81, 83, 85, 87, 88, 92, 92, 94,
+        94, 96, 95, 95 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 34, 49, 72, 34, 48, 60, 79, 49, 60, 82, 104, 72, 79, 104, 134,
+        /* Size 8x8 */
+        32, 32, 34, 38, 46, 56, 68, 78, 32, 33, 35, 39, 45, 54, 64, 74, 34, 35,
+        39, 45, 51, 58, 68, 76, 38, 39, 45, 54, 61, 69, 78, 86, 46, 45, 51, 61,
+        71, 80, 90, 99, 56, 54, 58, 69, 80, 92, 103, 113, 68, 64, 68, 78, 90,
+        103, 117, 128, 78, 74, 76, 86, 99, 113, 128, 140,
+        /* Size 16x16 */
+        32, 31, 31, 31, 32, 34, 36, 39, 44, 48, 54, 59, 65, 71, 80, 83, 31, 32,
+        32, 32, 32, 34, 35, 38, 42, 46, 51, 56, 62, 68, 76, 78, 31, 32, 32, 32,
+        32, 33, 34, 37, 41, 44, 49, 54, 59, 65, 72, 75, 31, 32, 32, 33, 34, 35,
+        36, 39, 42, 45, 50, 54, 59, 64, 71, 74, 32, 32, 32, 34, 35, 37, 38, 40,
+        42, 46, 49, 53, 58, 63, 69, 72, 34, 34, 33, 35, 37, 39, 42, 45, 47, 51,
+        54, 58, 63, 68, 74, 76, 36, 35, 34, 36, 38, 42, 48, 50, 54, 57, 60, 64,
+        68, 73, 79, 81, 39, 38, 37, 39, 40, 45, 50, 54, 58, 61, 65, 69, 73, 78,
+        84, 86, 44, 42, 41, 42, 42, 47, 54, 58, 63, 67, 71, 75, 79, 84, 90, 92,
+        48, 46, 44, 45, 46, 51, 57, 61, 67, 71, 76, 80, 85, 90, 96, 99, 54, 51,
+        49, 50, 49, 54, 60, 65, 71, 76, 82, 87, 92, 97, 104, 106, 59, 56, 54,
+        54, 53, 58, 64, 69, 75, 80, 87, 92, 98, 103, 110, 113, 65, 62, 59, 59,
+        58, 63, 68, 73, 79, 85, 92, 98, 105, 111, 118, 121, 71, 68, 65, 64, 63,
+        68, 73, 78, 84, 90, 97, 103, 111, 117, 125, 128, 80, 76, 72, 71, 69, 74,
+        79, 84, 90, 96, 104, 110, 118, 125, 134, 137, 83, 78, 75, 74, 72, 76,
+        81, 86, 92, 99, 106, 113, 121, 128, 137, 140,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 32, 32, 34, 34, 36, 36, 39, 39, 44, 44, 48,
+        48, 54, 54, 59, 59, 65, 65, 71, 71, 80, 80, 83, 83, 87, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 34, 34, 35, 35, 38, 38, 42, 42, 46, 46, 51, 51, 56,
+        56, 62, 62, 68, 68, 76, 76, 78, 78, 83, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 34, 34, 35, 35, 38, 38, 42, 42, 46, 46, 51, 51, 56, 56, 62, 62, 68,
+        68, 76, 76, 78, 78, 83, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34,
+        34, 37, 37, 41, 41, 44, 44, 49, 49, 54, 54, 59, 59, 65, 65, 72, 72, 75,
+        75, 79, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 37, 37, 41,
+        41, 44, 44, 49, 49, 54, 54, 59, 59, 65, 65, 72, 72, 75, 75, 79, 31, 32,
+        32, 32, 32, 33, 33, 34, 34, 35, 35, 36, 36, 39, 39, 42, 42, 45, 45, 50,
+        50, 54, 54, 59, 59, 64, 64, 71, 71, 74, 74, 77, 31, 32, 32, 32, 32, 33,
+        33, 34, 34, 35, 35, 36, 36, 39, 39, 42, 42, 45, 45, 50, 50, 54, 54, 59,
+        59, 64, 64, 71, 71, 74, 74, 77, 32, 32, 32, 32, 32, 34, 34, 35, 35, 37,
+        37, 38, 38, 40, 40, 42, 42, 46, 46, 49, 49, 53, 53, 58, 58, 63, 63, 69,
+        69, 72, 72, 75, 32, 32, 32, 32, 32, 34, 34, 35, 35, 37, 37, 38, 38, 40,
+        40, 42, 42, 46, 46, 49, 49, 53, 53, 58, 58, 63, 63, 69, 69, 72, 72, 75,
+        34, 34, 34, 33, 33, 35, 35, 37, 37, 39, 39, 42, 42, 45, 45, 47, 47, 51,
+        51, 54, 54, 58, 58, 63, 63, 68, 68, 74, 74, 76, 76, 80, 34, 34, 34, 33,
+        33, 35, 35, 37, 37, 39, 39, 42, 42, 45, 45, 47, 47, 51, 51, 54, 54, 58,
+        58, 63, 63, 68, 68, 74, 74, 76, 76, 80, 36, 35, 35, 34, 34, 36, 36, 38,
+        38, 42, 42, 48, 48, 50, 50, 54, 54, 57, 57, 60, 60, 64, 64, 68, 68, 73,
+        73, 79, 79, 81, 81, 84, 36, 35, 35, 34, 34, 36, 36, 38, 38, 42, 42, 48,
+        48, 50, 50, 54, 54, 57, 57, 60, 60, 64, 64, 68, 68, 73, 73, 79, 79, 81,
+        81, 84, 39, 38, 38, 37, 37, 39, 39, 40, 40, 45, 45, 50, 50, 54, 54, 58,
+        58, 61, 61, 65, 65, 69, 69, 73, 73, 78, 78, 84, 84, 86, 86, 90, 39, 38,
+        38, 37, 37, 39, 39, 40, 40, 45, 45, 50, 50, 54, 54, 58, 58, 61, 61, 65,
+        65, 69, 69, 73, 73, 78, 78, 84, 84, 86, 86, 90, 44, 42, 42, 41, 41, 42,
+        42, 42, 42, 47, 47, 54, 54, 58, 58, 63, 63, 67, 67, 71, 71, 75, 75, 79,
+        79, 84, 84, 90, 90, 92, 92, 96, 44, 42, 42, 41, 41, 42, 42, 42, 42, 47,
+        47, 54, 54, 58, 58, 63, 63, 67, 67, 71, 71, 75, 75, 79, 79, 84, 84, 90,
+        90, 92, 92, 96, 48, 46, 46, 44, 44, 45, 45, 46, 46, 51, 51, 57, 57, 61,
+        61, 67, 67, 71, 71, 76, 76, 80, 80, 85, 85, 90, 90, 96, 96, 99, 99, 102,
+        48, 46, 46, 44, 44, 45, 45, 46, 46, 51, 51, 57, 57, 61, 61, 67, 67, 71,
+        71, 76, 76, 80, 80, 85, 85, 90, 90, 96, 96, 99, 99, 102, 54, 51, 51, 49,
+        49, 50, 50, 49, 49, 54, 54, 60, 60, 65, 65, 71, 71, 76, 76, 82, 82, 87,
+        87, 92, 92, 97, 97, 104, 104, 106, 106, 109, 54, 51, 51, 49, 49, 50, 50,
+        49, 49, 54, 54, 60, 60, 65, 65, 71, 71, 76, 76, 82, 82, 87, 87, 92, 92,
+        97, 97, 104, 104, 106, 106, 109, 59, 56, 56, 54, 54, 54, 54, 53, 53, 58,
+        58, 64, 64, 69, 69, 75, 75, 80, 80, 87, 87, 92, 92, 98, 98, 103, 103,
+        110, 110, 113, 113, 116, 59, 56, 56, 54, 54, 54, 54, 53, 53, 58, 58, 64,
+        64, 69, 69, 75, 75, 80, 80, 87, 87, 92, 92, 98, 98, 103, 103, 110, 110,
+        113, 113, 116, 65, 62, 62, 59, 59, 59, 59, 58, 58, 63, 63, 68, 68, 73,
+        73, 79, 79, 85, 85, 92, 92, 98, 98, 105, 105, 111, 111, 118, 118, 121,
+        121, 124, 65, 62, 62, 59, 59, 59, 59, 58, 58, 63, 63, 68, 68, 73, 73,
+        79, 79, 85, 85, 92, 92, 98, 98, 105, 105, 111, 111, 118, 118, 121, 121,
+        124, 71, 68, 68, 65, 65, 64, 64, 63, 63, 68, 68, 73, 73, 78, 78, 84, 84,
+        90, 90, 97, 97, 103, 103, 111, 111, 117, 117, 125, 125, 128, 128, 132,
+        71, 68, 68, 65, 65, 64, 64, 63, 63, 68, 68, 73, 73, 78, 78, 84, 84, 90,
+        90, 97, 97, 103, 103, 111, 111, 117, 117, 125, 125, 128, 128, 132, 80,
+        76, 76, 72, 72, 71, 71, 69, 69, 74, 74, 79, 79, 84, 84, 90, 90, 96, 96,
+        104, 104, 110, 110, 118, 118, 125, 125, 134, 134, 137, 137, 141, 80, 76,
+        76, 72, 72, 71, 71, 69, 69, 74, 74, 79, 79, 84, 84, 90, 90, 96, 96, 104,
+        104, 110, 110, 118, 118, 125, 125, 134, 134, 137, 137, 141, 83, 78, 78,
+        75, 75, 74, 74, 72, 72, 76, 76, 81, 81, 86, 86, 92, 92, 99, 99, 106,
+        106, 113, 113, 121, 121, 128, 128, 137, 137, 140, 140, 144, 83, 78, 78,
+        75, 75, 74, 74, 72, 72, 76, 76, 81, 81, 86, 86, 92, 92, 99, 99, 106,
+        106, 113, 113, 121, 121, 128, 128, 137, 137, 140, 140, 144, 87, 83, 83,
+        79, 79, 77, 77, 75, 75, 80, 80, 84, 84, 90, 90, 96, 96, 102, 102, 109,
+        109, 116, 116, 124, 124, 132, 132, 141, 141, 144, 144, 149,
+        /* Size 4x8 */
+        32, 35, 51, 75, 32, 36, 50, 71, 34, 42, 54, 73, 37, 50, 65, 84, 45, 56,
+        76, 96, 54, 63, 87, 110, 65, 73, 97, 125, 75, 81, 106, 136,
+        /* Size 8x4 */
+        32, 32, 34, 37, 45, 54, 65, 75, 35, 36, 42, 50, 56, 63, 73, 81, 51, 50,
+        54, 65, 76, 87, 97, 106, 75, 71, 73, 84, 96, 110, 125, 136,
+        /* Size 8x16 */
+        32, 31, 32, 36, 44, 53, 65, 79, 31, 32, 32, 35, 42, 51, 62, 75, 31, 32,
+        33, 34, 41, 49, 59, 72, 32, 32, 34, 36, 42, 50, 59, 71, 32, 33, 35, 38,
+        42, 49, 58, 69, 34, 34, 37, 42, 48, 54, 63, 73, 36, 34, 38, 48, 54, 60,
+        68, 78, 39, 37, 40, 50, 58, 65, 73, 84, 44, 41, 43, 53, 63, 71, 79, 90,
+        48, 45, 46, 56, 67, 76, 85, 96, 53, 49, 50, 60, 71, 82, 92, 103, 58, 54,
+        54, 63, 75, 87, 98, 110, 65, 60, 58, 68, 79, 92, 105, 118, 71, 65, 63,
+        73, 84, 97, 111, 125, 79, 72, 70, 79, 90, 104, 118, 133, 82, 75, 72, 81,
+        92, 106, 121, 136,
+        /* Size 16x8 */
+        32, 31, 31, 32, 32, 34, 36, 39, 44, 48, 53, 58, 65, 71, 79, 82, 31, 32,
+        32, 32, 33, 34, 34, 37, 41, 45, 49, 54, 60, 65, 72, 75, 32, 32, 33, 34,
+        35, 37, 38, 40, 43, 46, 50, 54, 58, 63, 70, 72, 36, 35, 34, 36, 38, 42,
+        48, 50, 53, 56, 60, 63, 68, 73, 79, 81, 44, 42, 41, 42, 42, 48, 54, 58,
+        63, 67, 71, 75, 79, 84, 90, 92, 53, 51, 49, 50, 49, 54, 60, 65, 71, 76,
+        82, 87, 92, 97, 104, 106, 65, 62, 59, 59, 58, 63, 68, 73, 79, 85, 92,
+        98, 105, 111, 118, 121, 79, 75, 72, 71, 69, 73, 78, 84, 90, 96, 103,
+        110, 118, 125, 133, 136,
+        /* Size 16x32 */
+        32, 31, 31, 32, 32, 36, 36, 44, 44, 53, 53, 65, 65, 79, 79, 87, 31, 32,
+        32, 32, 32, 35, 35, 42, 42, 51, 51, 62, 62, 75, 75, 82, 31, 32, 32, 32,
+        32, 35, 35, 42, 42, 51, 51, 62, 62, 75, 75, 82, 31, 32, 32, 33, 33, 34,
+        34, 41, 41, 49, 49, 59, 59, 72, 72, 78, 31, 32, 32, 33, 33, 34, 34, 41,
+        41, 49, 49, 59, 59, 72, 72, 78, 32, 32, 32, 34, 34, 36, 36, 42, 42, 50,
+        50, 59, 59, 71, 71, 77, 32, 32, 32, 34, 34, 36, 36, 42, 42, 50, 50, 59,
+        59, 71, 71, 77, 32, 33, 33, 35, 35, 38, 38, 42, 42, 49, 49, 58, 58, 69,
+        69, 75, 32, 33, 33, 35, 35, 38, 38, 42, 42, 49, 49, 58, 58, 69, 69, 75,
+        34, 34, 34, 37, 37, 42, 42, 48, 48, 54, 54, 63, 63, 73, 73, 79, 34, 34,
+        34, 37, 37, 42, 42, 48, 48, 54, 54, 63, 63, 73, 73, 79, 36, 34, 34, 38,
+        38, 48, 48, 54, 54, 60, 60, 68, 68, 78, 78, 84, 36, 34, 34, 38, 38, 48,
+        48, 54, 54, 60, 60, 68, 68, 78, 78, 84, 39, 37, 37, 40, 40, 50, 50, 58,
+        58, 65, 65, 73, 73, 84, 84, 89, 39, 37, 37, 40, 40, 50, 50, 58, 58, 65,
+        65, 73, 73, 84, 84, 89, 44, 41, 41, 43, 43, 53, 53, 63, 63, 71, 71, 79,
+        79, 90, 90, 95, 44, 41, 41, 43, 43, 53, 53, 63, 63, 71, 71, 79, 79, 90,
+        90, 95, 48, 45, 45, 46, 46, 56, 56, 67, 67, 76, 76, 85, 85, 96, 96, 102,
+        48, 45, 45, 46, 46, 56, 56, 67, 67, 76, 76, 85, 85, 96, 96, 102, 53, 49,
+        49, 50, 50, 60, 60, 71, 71, 82, 82, 92, 92, 103, 103, 109, 53, 49, 49,
+        50, 50, 60, 60, 71, 71, 82, 82, 92, 92, 103, 103, 109, 58, 54, 54, 54,
+        54, 63, 63, 75, 75, 87, 87, 98, 98, 110, 110, 116, 58, 54, 54, 54, 54,
+        63, 63, 75, 75, 87, 87, 98, 98, 110, 110, 116, 65, 60, 60, 58, 58, 68,
+        68, 79, 79, 92, 92, 105, 105, 118, 118, 124, 65, 60, 60, 58, 58, 68, 68,
+        79, 79, 92, 92, 105, 105, 118, 118, 124, 71, 65, 65, 63, 63, 73, 73, 84,
+        84, 97, 97, 111, 111, 125, 125, 132, 71, 65, 65, 63, 63, 73, 73, 84, 84,
+        97, 97, 111, 111, 125, 125, 132, 79, 72, 72, 70, 70, 79, 79, 90, 90,
+        104, 104, 118, 118, 133, 133, 141, 79, 72, 72, 70, 70, 79, 79, 90, 90,
+        104, 104, 118, 118, 133, 133, 141, 82, 75, 75, 72, 72, 81, 81, 92, 92,
+        106, 106, 121, 121, 136, 136, 144, 82, 75, 75, 72, 72, 81, 81, 92, 92,
+        106, 106, 121, 121, 136, 136, 144, 87, 79, 79, 76, 76, 84, 84, 96, 96,
+        109, 109, 124, 124, 141, 141, 149,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 36, 36, 39, 39, 44, 44, 48,
+        48, 53, 53, 58, 58, 65, 65, 71, 71, 79, 79, 82, 82, 87, 31, 32, 32, 32,
+        32, 32, 32, 33, 33, 34, 34, 34, 34, 37, 37, 41, 41, 45, 45, 49, 49, 54,
+        54, 60, 60, 65, 65, 72, 72, 75, 75, 79, 31, 32, 32, 32, 32, 32, 32, 33,
+        33, 34, 34, 34, 34, 37, 37, 41, 41, 45, 45, 49, 49, 54, 54, 60, 60, 65,
+        65, 72, 72, 75, 75, 79, 32, 32, 32, 33, 33, 34, 34, 35, 35, 37, 37, 38,
+        38, 40, 40, 43, 43, 46, 46, 50, 50, 54, 54, 58, 58, 63, 63, 70, 70, 72,
+        72, 76, 32, 32, 32, 33, 33, 34, 34, 35, 35, 37, 37, 38, 38, 40, 40, 43,
+        43, 46, 46, 50, 50, 54, 54, 58, 58, 63, 63, 70, 70, 72, 72, 76, 36, 35,
+        35, 34, 34, 36, 36, 38, 38, 42, 42, 48, 48, 50, 50, 53, 53, 56, 56, 60,
+        60, 63, 63, 68, 68, 73, 73, 79, 79, 81, 81, 84, 36, 35, 35, 34, 34, 36,
+        36, 38, 38, 42, 42, 48, 48, 50, 50, 53, 53, 56, 56, 60, 60, 63, 63, 68,
+        68, 73, 73, 79, 79, 81, 81, 84, 44, 42, 42, 41, 41, 42, 42, 42, 42, 48,
+        48, 54, 54, 58, 58, 63, 63, 67, 67, 71, 71, 75, 75, 79, 79, 84, 84, 90,
+        90, 92, 92, 96, 44, 42, 42, 41, 41, 42, 42, 42, 42, 48, 48, 54, 54, 58,
+        58, 63, 63, 67, 67, 71, 71, 75, 75, 79, 79, 84, 84, 90, 90, 92, 92, 96,
+        53, 51, 51, 49, 49, 50, 50, 49, 49, 54, 54, 60, 60, 65, 65, 71, 71, 76,
+        76, 82, 82, 87, 87, 92, 92, 97, 97, 104, 104, 106, 106, 109, 53, 51, 51,
+        49, 49, 50, 50, 49, 49, 54, 54, 60, 60, 65, 65, 71, 71, 76, 76, 82, 82,
+        87, 87, 92, 92, 97, 97, 104, 104, 106, 106, 109, 65, 62, 62, 59, 59, 59,
+        59, 58, 58, 63, 63, 68, 68, 73, 73, 79, 79, 85, 85, 92, 92, 98, 98, 105,
+        105, 111, 111, 118, 118, 121, 121, 124, 65, 62, 62, 59, 59, 59, 59, 58,
+        58, 63, 63, 68, 68, 73, 73, 79, 79, 85, 85, 92, 92, 98, 98, 105, 105,
+        111, 111, 118, 118, 121, 121, 124, 79, 75, 75, 72, 72, 71, 71, 69, 69,
+        73, 73, 78, 78, 84, 84, 90, 90, 96, 96, 103, 103, 110, 110, 118, 118,
+        125, 125, 133, 133, 136, 136, 141, 79, 75, 75, 72, 72, 71, 71, 69, 69,
+        73, 73, 78, 78, 84, 84, 90, 90, 96, 96, 103, 103, 110, 110, 118, 118,
+        125, 125, 133, 133, 136, 136, 141, 87, 82, 82, 78, 78, 77, 77, 75, 75,
+        79, 79, 84, 84, 89, 89, 95, 95, 102, 102, 109, 109, 116, 116, 124, 124,
+        132, 132, 141, 141, 144, 144, 149,
+        /* Size 4x16 */
+        31, 36, 53, 79, 32, 35, 51, 75, 32, 34, 49, 72, 32, 36, 50, 71, 33, 38,
+        49, 69, 34, 42, 54, 73, 34, 48, 60, 78, 37, 50, 65, 84, 41, 53, 71, 90,
+        45, 56, 76, 96, 49, 60, 82, 103, 54, 63, 87, 110, 60, 68, 92, 118, 65,
+        73, 97, 125, 72, 79, 104, 133, 75, 81, 106, 136,
+        /* Size 16x4 */
+        31, 32, 32, 32, 33, 34, 34, 37, 41, 45, 49, 54, 60, 65, 72, 75, 36, 35,
+        34, 36, 38, 42, 48, 50, 53, 56, 60, 63, 68, 73, 79, 81, 53, 51, 49, 50,
+        49, 54, 60, 65, 71, 76, 82, 87, 92, 97, 104, 106, 79, 75, 72, 71, 69,
+        73, 78, 84, 90, 96, 103, 110, 118, 125, 133, 136,
+        /* Size 8x32 */
+        32, 31, 32, 36, 44, 53, 65, 79, 31, 32, 32, 35, 42, 51, 62, 75, 31, 32,
+        32, 35, 42, 51, 62, 75, 31, 32, 33, 34, 41, 49, 59, 72, 31, 32, 33, 34,
+        41, 49, 59, 72, 32, 32, 34, 36, 42, 50, 59, 71, 32, 32, 34, 36, 42, 50,
+        59, 71, 32, 33, 35, 38, 42, 49, 58, 69, 32, 33, 35, 38, 42, 49, 58, 69,
+        34, 34, 37, 42, 48, 54, 63, 73, 34, 34, 37, 42, 48, 54, 63, 73, 36, 34,
+        38, 48, 54, 60, 68, 78, 36, 34, 38, 48, 54, 60, 68, 78, 39, 37, 40, 50,
+        58, 65, 73, 84, 39, 37, 40, 50, 58, 65, 73, 84, 44, 41, 43, 53, 63, 71,
+        79, 90, 44, 41, 43, 53, 63, 71, 79, 90, 48, 45, 46, 56, 67, 76, 85, 96,
+        48, 45, 46, 56, 67, 76, 85, 96, 53, 49, 50, 60, 71, 82, 92, 103, 53, 49,
+        50, 60, 71, 82, 92, 103, 58, 54, 54, 63, 75, 87, 98, 110, 58, 54, 54,
+        63, 75, 87, 98, 110, 65, 60, 58, 68, 79, 92, 105, 118, 65, 60, 58, 68,
+        79, 92, 105, 118, 71, 65, 63, 73, 84, 97, 111, 125, 71, 65, 63, 73, 84,
+        97, 111, 125, 79, 72, 70, 79, 90, 104, 118, 133, 79, 72, 70, 79, 90,
+        104, 118, 133, 82, 75, 72, 81, 92, 106, 121, 136, 82, 75, 72, 81, 92,
+        106, 121, 136, 87, 79, 76, 84, 96, 109, 124, 141,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 36, 36, 39, 39, 44, 44, 48,
+        48, 53, 53, 58, 58, 65, 65, 71, 71, 79, 79, 82, 82, 87, 31, 32, 32, 32,
+        32, 32, 32, 33, 33, 34, 34, 34, 34, 37, 37, 41, 41, 45, 45, 49, 49, 54,
+        54, 60, 60, 65, 65, 72, 72, 75, 75, 79, 32, 32, 32, 33, 33, 34, 34, 35,
+        35, 37, 37, 38, 38, 40, 40, 43, 43, 46, 46, 50, 50, 54, 54, 58, 58, 63,
+        63, 70, 70, 72, 72, 76, 36, 35, 35, 34, 34, 36, 36, 38, 38, 42, 42, 48,
+        48, 50, 50, 53, 53, 56, 56, 60, 60, 63, 63, 68, 68, 73, 73, 79, 79, 81,
+        81, 84, 44, 42, 42, 41, 41, 42, 42, 42, 42, 48, 48, 54, 54, 58, 58, 63,
+        63, 67, 67, 71, 71, 75, 75, 79, 79, 84, 84, 90, 90, 92, 92, 96, 53, 51,
+        51, 49, 49, 50, 50, 49, 49, 54, 54, 60, 60, 65, 65, 71, 71, 76, 76, 82,
+        82, 87, 87, 92, 92, 97, 97, 104, 104, 106, 106, 109, 65, 62, 62, 59, 59,
+        59, 59, 58, 58, 63, 63, 68, 68, 73, 73, 79, 79, 85, 85, 92, 92, 98, 98,
+        105, 105, 111, 111, 118, 118, 121, 121, 124, 79, 75, 75, 72, 72, 71, 71,
+        69, 69, 73, 73, 78, 78, 84, 84, 90, 90, 96, 96, 103, 103, 110, 110, 118,
+        118, 125, 125, 133, 133, 136, 136, 141 },
+      { /* Chroma */
+        /* Size 4x4 */
+        32, 46, 47, 57, 46, 53, 54, 60, 47, 54, 66, 75, 57, 60, 75, 89,
+        /* Size 8x8 */
+        31, 34, 42, 47, 48, 52, 57, 61, 34, 39, 45, 46, 46, 49, 53, 57, 42, 45,
+        48, 49, 50, 52, 55, 58, 47, 46, 49, 54, 56, 58, 61, 64, 48, 46, 50, 56,
+        61, 65, 68, 71, 52, 49, 52, 58, 65, 71, 75, 79, 57, 53, 55, 61, 68, 75,
+        82, 86, 61, 57, 58, 64, 71, 79, 86, 91,
+        /* Size 16x16 */
+        32, 31, 30, 33, 36, 41, 49, 48, 49, 50, 52, 54, 57, 60, 63, 65, 31, 31,
+        31, 34, 38, 42, 47, 47, 47, 48, 50, 52, 54, 57, 60, 61, 30, 31, 32, 35,
+        40, 42, 46, 45, 45, 46, 47, 49, 52, 54, 57, 58, 33, 34, 35, 39, 43, 45,
+        47, 46, 45, 46, 47, 49, 51, 53, 56, 57, 36, 38, 40, 43, 47, 47, 48, 46,
+        45, 46, 47, 48, 50, 52, 54, 55, 41, 42, 42, 45, 47, 48, 50, 49, 49, 50,
+        50, 52, 53, 55, 57, 58, 49, 47, 46, 47, 48, 50, 53, 53, 53, 54, 54, 55,
+        56, 58, 60, 61, 48, 47, 45, 46, 46, 49, 53, 54, 55, 56, 57, 58, 60, 61,
+        63, 64, 49, 47, 45, 45, 45, 49, 53, 55, 58, 60, 61, 62, 63, 65, 67, 68,
+        50, 48, 46, 46, 46, 50, 54, 56, 60, 61, 63, 65, 67, 68, 71, 71, 52, 50,
+        47, 47, 47, 50, 54, 57, 61, 63, 66, 68, 70, 72, 75, 75, 54, 52, 49, 49,
+        48, 52, 55, 58, 62, 65, 68, 71, 73, 75, 78, 79, 57, 54, 52, 51, 50, 53,
+        56, 60, 63, 67, 70, 73, 76, 79, 82, 83, 60, 57, 54, 53, 52, 55, 58, 61,
+        65, 68, 72, 75, 79, 82, 85, 86, 63, 60, 57, 56, 54, 57, 60, 63, 67, 71,
+        75, 78, 82, 85, 89, 90, 65, 61, 58, 57, 55, 58, 61, 64, 68, 71, 75, 79,
+        83, 86, 90, 91,
+        /* Size 32x32 */
+        32, 31, 31, 30, 30, 33, 33, 36, 36, 41, 41, 49, 49, 48, 48, 49, 49, 50,
+        50, 52, 52, 54, 54, 57, 57, 60, 60, 63, 63, 65, 65, 67, 31, 31, 31, 31,
+        31, 34, 34, 38, 38, 42, 42, 47, 47, 47, 47, 47, 47, 48, 48, 50, 50, 52,
+        52, 54, 54, 57, 57, 60, 60, 61, 61, 63, 31, 31, 31, 31, 31, 34, 34, 38,
+        38, 42, 42, 47, 47, 47, 47, 47, 47, 48, 48, 50, 50, 52, 52, 54, 54, 57,
+        57, 60, 60, 61, 61, 63, 30, 31, 31, 32, 32, 35, 35, 40, 40, 42, 42, 46,
+        46, 45, 45, 45, 45, 46, 46, 47, 47, 49, 49, 52, 52, 54, 54, 57, 57, 58,
+        58, 60, 30, 31, 31, 32, 32, 35, 35, 40, 40, 42, 42, 46, 46, 45, 45, 45,
+        45, 46, 46, 47, 47, 49, 49, 52, 52, 54, 54, 57, 57, 58, 58, 60, 33, 34,
+        34, 35, 35, 39, 39, 43, 43, 45, 45, 47, 47, 46, 46, 45, 45, 46, 46, 47,
+        47, 49, 49, 51, 51, 53, 53, 56, 56, 57, 57, 59, 33, 34, 34, 35, 35, 39,
+        39, 43, 43, 45, 45, 47, 47, 46, 46, 45, 45, 46, 46, 47, 47, 49, 49, 51,
+        51, 53, 53, 56, 56, 57, 57, 59, 36, 38, 38, 40, 40, 43, 43, 47, 47, 47,
+        47, 48, 48, 46, 46, 45, 45, 46, 46, 47, 47, 48, 48, 50, 50, 52, 52, 54,
+        54, 55, 55, 57, 36, 38, 38, 40, 40, 43, 43, 47, 47, 47, 47, 48, 48, 46,
+        46, 45, 45, 46, 46, 47, 47, 48, 48, 50, 50, 52, 52, 54, 54, 55, 55, 57,
+        41, 42, 42, 42, 42, 45, 45, 47, 47, 48, 48, 50, 50, 49, 49, 49, 49, 50,
+        50, 50, 50, 52, 52, 53, 53, 55, 55, 57, 57, 58, 58, 60, 41, 42, 42, 42,
+        42, 45, 45, 47, 47, 48, 48, 50, 50, 49, 49, 49, 49, 50, 50, 50, 50, 52,
+        52, 53, 53, 55, 55, 57, 57, 58, 58, 60, 49, 47, 47, 46, 46, 47, 47, 48,
+        48, 50, 50, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 56, 56, 58,
+        58, 60, 60, 61, 61, 62, 49, 47, 47, 46, 46, 47, 47, 48, 48, 50, 50, 53,
+        53, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 56, 56, 58, 58, 60, 60, 61,
+        61, 62, 48, 47, 47, 45, 45, 46, 46, 46, 46, 49, 49, 53, 53, 54, 54, 55,
+        55, 56, 56, 57, 57, 58, 58, 60, 60, 61, 61, 63, 63, 64, 64, 66, 48, 47,
+        47, 45, 45, 46, 46, 46, 46, 49, 49, 53, 53, 54, 54, 55, 55, 56, 56, 57,
+        57, 58, 58, 60, 60, 61, 61, 63, 63, 64, 64, 66, 49, 47, 47, 45, 45, 45,
+        45, 45, 45, 49, 49, 53, 53, 55, 55, 58, 58, 60, 60, 61, 61, 62, 62, 63,
+        63, 65, 65, 67, 67, 68, 68, 69, 49, 47, 47, 45, 45, 45, 45, 45, 45, 49,
+        49, 53, 53, 55, 55, 58, 58, 60, 60, 61, 61, 62, 62, 63, 63, 65, 65, 67,
+        67, 68, 68, 69, 50, 48, 48, 46, 46, 46, 46, 46, 46, 50, 50, 54, 54, 56,
+        56, 60, 60, 61, 61, 63, 63, 65, 65, 67, 67, 68, 68, 71, 71, 71, 71, 72,
+        50, 48, 48, 46, 46, 46, 46, 46, 46, 50, 50, 54, 54, 56, 56, 60, 60, 61,
+        61, 63, 63, 65, 65, 67, 67, 68, 68, 71, 71, 71, 71, 72, 52, 50, 50, 47,
+        47, 47, 47, 47, 47, 50, 50, 54, 54, 57, 57, 61, 61, 63, 63, 66, 66, 68,
+        68, 70, 70, 72, 72, 75, 75, 75, 75, 76, 52, 50, 50, 47, 47, 47, 47, 47,
+        47, 50, 50, 54, 54, 57, 57, 61, 61, 63, 63, 66, 66, 68, 68, 70, 70, 72,
+        72, 75, 75, 75, 75, 76, 54, 52, 52, 49, 49, 49, 49, 48, 48, 52, 52, 55,
+        55, 58, 58, 62, 62, 65, 65, 68, 68, 71, 71, 73, 73, 75, 75, 78, 78, 79,
+        79, 80, 54, 52, 52, 49, 49, 49, 49, 48, 48, 52, 52, 55, 55, 58, 58, 62,
+        62, 65, 65, 68, 68, 71, 71, 73, 73, 75, 75, 78, 78, 79, 79, 80, 57, 54,
+        54, 52, 52, 51, 51, 50, 50, 53, 53, 56, 56, 60, 60, 63, 63, 67, 67, 70,
+        70, 73, 73, 76, 76, 79, 79, 82, 82, 83, 83, 84, 57, 54, 54, 52, 52, 51,
+        51, 50, 50, 53, 53, 56, 56, 60, 60, 63, 63, 67, 67, 70, 70, 73, 73, 76,
+        76, 79, 79, 82, 82, 83, 83, 84, 60, 57, 57, 54, 54, 53, 53, 52, 52, 55,
+        55, 58, 58, 61, 61, 65, 65, 68, 68, 72, 72, 75, 75, 79, 79, 82, 82, 85,
+        85, 86, 86, 88, 60, 57, 57, 54, 54, 53, 53, 52, 52, 55, 55, 58, 58, 61,
+        61, 65, 65, 68, 68, 72, 72, 75, 75, 79, 79, 82, 82, 85, 85, 86, 86, 88,
+        63, 60, 60, 57, 57, 56, 56, 54, 54, 57, 57, 60, 60, 63, 63, 67, 67, 71,
+        71, 75, 75, 78, 78, 82, 82, 85, 85, 89, 89, 90, 90, 92, 63, 60, 60, 57,
+        57, 56, 56, 54, 54, 57, 57, 60, 60, 63, 63, 67, 67, 71, 71, 75, 75, 78,
+        78, 82, 82, 85, 85, 89, 89, 90, 90, 92, 65, 61, 61, 58, 58, 57, 57, 55,
+        55, 58, 58, 61, 61, 64, 64, 68, 68, 71, 71, 75, 75, 79, 79, 83, 83, 86,
+        86, 90, 90, 91, 91, 93, 65, 61, 61, 58, 58, 57, 57, 55, 55, 58, 58, 61,
+        61, 64, 64, 68, 68, 71, 71, 75, 75, 79, 79, 83, 83, 86, 86, 90, 90, 91,
+        91, 93, 67, 63, 63, 60, 60, 59, 59, 57, 57, 60, 60, 62, 62, 66, 66, 69,
+        69, 72, 72, 76, 76, 80, 80, 84, 84, 88, 88, 92, 92, 93, 93, 95,
+        /* Size 4x8 */
+        31, 47, 50, 60, 36, 47, 47, 56, 43, 50, 50, 57, 46, 53, 57, 64, 46, 54,
+        64, 71, 50, 55, 68, 78, 54, 58, 72, 85, 59, 61, 75, 90,
+        /* Size 8x4 */
+        31, 36, 43, 46, 46, 50, 54, 59, 47, 47, 50, 53, 54, 55, 58, 61, 50, 47,
+        50, 57, 64, 68, 72, 75, 60, 56, 57, 64, 71, 78, 85, 90,
+        /* Size 8x16 */
+        32, 31, 37, 48, 49, 52, 57, 63, 31, 31, 38, 47, 47, 50, 54, 60, 30, 32,
+        40, 46, 45, 48, 52, 57, 33, 36, 43, 47, 46, 47, 51, 56, 37, 40, 47, 47,
+        45, 47, 50, 54, 42, 43, 47, 50, 49, 50, 53, 57, 49, 46, 48, 53, 53, 54,
+        57, 60, 48, 46, 47, 53, 56, 57, 60, 64, 49, 45, 46, 53, 58, 61, 64, 67,
+        50, 46, 46, 54, 59, 64, 67, 71, 52, 48, 47, 54, 61, 66, 71, 75, 54, 50,
+        49, 55, 62, 68, 73, 78, 57, 52, 50, 56, 64, 70, 76, 82, 60, 54, 52, 58,
+        65, 72, 79, 85, 63, 57, 55, 60, 67, 75, 82, 89, 64, 59, 56, 61, 68, 75,
+        83, 90,
+        /* Size 16x8 */
+        32, 31, 30, 33, 37, 42, 49, 48, 49, 50, 52, 54, 57, 60, 63, 64, 31, 31,
+        32, 36, 40, 43, 46, 46, 45, 46, 48, 50, 52, 54, 57, 59, 37, 38, 40, 43,
+        47, 47, 48, 47, 46, 46, 47, 49, 50, 52, 55, 56, 48, 47, 46, 47, 47, 50,
+        53, 53, 53, 54, 54, 55, 56, 58, 60, 61, 49, 47, 45, 46, 45, 49, 53, 56,
+        58, 59, 61, 62, 64, 65, 67, 68, 52, 50, 48, 47, 47, 50, 54, 57, 61, 64,
+        66, 68, 70, 72, 75, 75, 57, 54, 52, 51, 50, 53, 57, 60, 64, 67, 71, 73,
+        76, 79, 82, 83, 63, 60, 57, 56, 54, 57, 60, 64, 67, 71, 75, 78, 82, 85,
+        89, 90,
+        /* Size 16x32 */
+        32, 31, 31, 37, 37, 48, 48, 49, 49, 52, 52, 57, 57, 63, 63, 66, 31, 31,
+        31, 38, 38, 47, 47, 47, 47, 50, 50, 54, 54, 60, 60, 63, 31, 31, 31, 38,
+        38, 47, 47, 47, 47, 50, 50, 54, 54, 60, 60, 63, 30, 32, 32, 40, 40, 46,
+        46, 45, 45, 48, 48, 52, 52, 57, 57, 60, 30, 32, 32, 40, 40, 46, 46, 45,
+        45, 48, 48, 52, 52, 57, 57, 60, 33, 36, 36, 43, 43, 47, 47, 46, 46, 47,
+        47, 51, 51, 56, 56, 59, 33, 36, 36, 43, 43, 47, 47, 46, 46, 47, 47, 51,
+        51, 56, 56, 59, 37, 40, 40, 47, 47, 47, 47, 45, 45, 47, 47, 50, 50, 54,
+        54, 57, 37, 40, 40, 47, 47, 47, 47, 45, 45, 47, 47, 50, 50, 54, 54, 57,
+        42, 43, 43, 47, 47, 50, 50, 49, 49, 50, 50, 53, 53, 57, 57, 60, 42, 43,
+        43, 47, 47, 50, 50, 49, 49, 50, 50, 53, 53, 57, 57, 60, 49, 46, 46, 48,
+        48, 53, 53, 53, 53, 54, 54, 57, 57, 60, 60, 62, 49, 46, 46, 48, 48, 53,
+        53, 53, 53, 54, 54, 57, 57, 60, 60, 62, 48, 46, 46, 47, 47, 53, 53, 56,
+        56, 57, 57, 60, 60, 64, 64, 66, 48, 46, 46, 47, 47, 53, 53, 56, 56, 57,
+        57, 60, 60, 64, 64, 66, 49, 45, 45, 46, 46, 53, 53, 58, 58, 61, 61, 64,
+        64, 67, 67, 69, 49, 45, 45, 46, 46, 53, 53, 58, 58, 61, 61, 64, 64, 67,
+        67, 69, 50, 46, 46, 46, 46, 54, 54, 59, 59, 64, 64, 67, 67, 71, 71, 73,
+        50, 46, 46, 46, 46, 54, 54, 59, 59, 64, 64, 67, 67, 71, 71, 73, 52, 48,
+        48, 47, 47, 54, 54, 61, 61, 66, 66, 71, 71, 75, 75, 77, 52, 48, 48, 47,
+        47, 54, 54, 61, 61, 66, 66, 71, 71, 75, 75, 77, 54, 50, 50, 49, 49, 55,
+        55, 62, 62, 68, 68, 73, 73, 78, 78, 80, 54, 50, 50, 49, 49, 55, 55, 62,
+        62, 68, 68, 73, 73, 78, 78, 80, 57, 52, 52, 50, 50, 56, 56, 64, 64, 70,
+        70, 76, 76, 82, 82, 84, 57, 52, 52, 50, 50, 56, 56, 64, 64, 70, 70, 76,
+        76, 82, 82, 84, 60, 54, 54, 52, 52, 58, 58, 65, 65, 72, 72, 79, 79, 85,
+        85, 88, 60, 54, 54, 52, 52, 58, 58, 65, 65, 72, 72, 79, 79, 85, 85, 88,
+        63, 57, 57, 55, 55, 60, 60, 67, 67, 75, 75, 82, 82, 89, 89, 92, 63, 57,
+        57, 55, 55, 60, 60, 67, 67, 75, 75, 82, 82, 89, 89, 92, 64, 59, 59, 56,
+        56, 61, 61, 68, 68, 75, 75, 83, 83, 90, 90, 93, 64, 59, 59, 56, 56, 61,
+        61, 68, 68, 75, 75, 83, 83, 90, 90, 93, 66, 60, 60, 57, 57, 63, 63, 69,
+        69, 77, 77, 84, 84, 92, 92, 95,
+        /* Size 32x16 */
+        32, 31, 31, 30, 30, 33, 33, 37, 37, 42, 42, 49, 49, 48, 48, 49, 49, 50,
+        50, 52, 52, 54, 54, 57, 57, 60, 60, 63, 63, 64, 64, 66, 31, 31, 31, 32,
+        32, 36, 36, 40, 40, 43, 43, 46, 46, 46, 46, 45, 45, 46, 46, 48, 48, 50,
+        50, 52, 52, 54, 54, 57, 57, 59, 59, 60, 31, 31, 31, 32, 32, 36, 36, 40,
+        40, 43, 43, 46, 46, 46, 46, 45, 45, 46, 46, 48, 48, 50, 50, 52, 52, 54,
+        54, 57, 57, 59, 59, 60, 37, 38, 38, 40, 40, 43, 43, 47, 47, 47, 47, 48,
+        48, 47, 47, 46, 46, 46, 46, 47, 47, 49, 49, 50, 50, 52, 52, 55, 55, 56,
+        56, 57, 37, 38, 38, 40, 40, 43, 43, 47, 47, 47, 47, 48, 48, 47, 47, 46,
+        46, 46, 46, 47, 47, 49, 49, 50, 50, 52, 52, 55, 55, 56, 56, 57, 48, 47,
+        47, 46, 46, 47, 47, 47, 47, 50, 50, 53, 53, 53, 53, 53, 53, 54, 54, 54,
+        54, 55, 55, 56, 56, 58, 58, 60, 60, 61, 61, 63, 48, 47, 47, 46, 46, 47,
+        47, 47, 47, 50, 50, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 56,
+        56, 58, 58, 60, 60, 61, 61, 63, 49, 47, 47, 45, 45, 46, 46, 45, 45, 49,
+        49, 53, 53, 56, 56, 58, 58, 59, 59, 61, 61, 62, 62, 64, 64, 65, 65, 67,
+        67, 68, 68, 69, 49, 47, 47, 45, 45, 46, 46, 45, 45, 49, 49, 53, 53, 56,
+        56, 58, 58, 59, 59, 61, 61, 62, 62, 64, 64, 65, 65, 67, 67, 68, 68, 69,
+        52, 50, 50, 48, 48, 47, 47, 47, 47, 50, 50, 54, 54, 57, 57, 61, 61, 64,
+        64, 66, 66, 68, 68, 70, 70, 72, 72, 75, 75, 75, 75, 77, 52, 50, 50, 48,
+        48, 47, 47, 47, 47, 50, 50, 54, 54, 57, 57, 61, 61, 64, 64, 66, 66, 68,
+        68, 70, 70, 72, 72, 75, 75, 75, 75, 77, 57, 54, 54, 52, 52, 51, 51, 50,
+        50, 53, 53, 57, 57, 60, 60, 64, 64, 67, 67, 71, 71, 73, 73, 76, 76, 79,
+        79, 82, 82, 83, 83, 84, 57, 54, 54, 52, 52, 51, 51, 50, 50, 53, 53, 57,
+        57, 60, 60, 64, 64, 67, 67, 71, 71, 73, 73, 76, 76, 79, 79, 82, 82, 83,
+        83, 84, 63, 60, 60, 57, 57, 56, 56, 54, 54, 57, 57, 60, 60, 64, 64, 67,
+        67, 71, 71, 75, 75, 78, 78, 82, 82, 85, 85, 89, 89, 90, 90, 92, 63, 60,
+        60, 57, 57, 56, 56, 54, 54, 57, 57, 60, 60, 64, 64, 67, 67, 71, 71, 75,
+        75, 78, 78, 82, 82, 85, 85, 89, 89, 90, 90, 92, 66, 63, 63, 60, 60, 59,
+        59, 57, 57, 60, 60, 62, 62, 66, 66, 69, 69, 73, 73, 77, 77, 80, 80, 84,
+        84, 88, 88, 92, 92, 93, 93, 95,
+        /* Size 4x16 */
+        31, 48, 52, 63, 31, 47, 50, 60, 32, 46, 48, 57, 36, 47, 47, 56, 40, 47,
+        47, 54, 43, 50, 50, 57, 46, 53, 54, 60, 46, 53, 57, 64, 45, 53, 61, 67,
+        46, 54, 64, 71, 48, 54, 66, 75, 50, 55, 68, 78, 52, 56, 70, 82, 54, 58,
+        72, 85, 57, 60, 75, 89, 59, 61, 75, 90,
+        /* Size 16x4 */
+        31, 31, 32, 36, 40, 43, 46, 46, 45, 46, 48, 50, 52, 54, 57, 59, 48, 47,
+        46, 47, 47, 50, 53, 53, 53, 54, 54, 55, 56, 58, 60, 61, 52, 50, 48, 47,
+        47, 50, 54, 57, 61, 64, 66, 68, 70, 72, 75, 75, 63, 60, 57, 56, 54, 57,
+        60, 64, 67, 71, 75, 78, 82, 85, 89, 90,
+        /* Size 8x32 */
+        32, 31, 37, 48, 49, 52, 57, 63, 31, 31, 38, 47, 47, 50, 54, 60, 31, 31,
+        38, 47, 47, 50, 54, 60, 30, 32, 40, 46, 45, 48, 52, 57, 30, 32, 40, 46,
+        45, 48, 52, 57, 33, 36, 43, 47, 46, 47, 51, 56, 33, 36, 43, 47, 46, 47,
+        51, 56, 37, 40, 47, 47, 45, 47, 50, 54, 37, 40, 47, 47, 45, 47, 50, 54,
+        42, 43, 47, 50, 49, 50, 53, 57, 42, 43, 47, 50, 49, 50, 53, 57, 49, 46,
+        48, 53, 53, 54, 57, 60, 49, 46, 48, 53, 53, 54, 57, 60, 48, 46, 47, 53,
+        56, 57, 60, 64, 48, 46, 47, 53, 56, 57, 60, 64, 49, 45, 46, 53, 58, 61,
+        64, 67, 49, 45, 46, 53, 58, 61, 64, 67, 50, 46, 46, 54, 59, 64, 67, 71,
+        50, 46, 46, 54, 59, 64, 67, 71, 52, 48, 47, 54, 61, 66, 71, 75, 52, 48,
+        47, 54, 61, 66, 71, 75, 54, 50, 49, 55, 62, 68, 73, 78, 54, 50, 49, 55,
+        62, 68, 73, 78, 57, 52, 50, 56, 64, 70, 76, 82, 57, 52, 50, 56, 64, 70,
+        76, 82, 60, 54, 52, 58, 65, 72, 79, 85, 60, 54, 52, 58, 65, 72, 79, 85,
+        63, 57, 55, 60, 67, 75, 82, 89, 63, 57, 55, 60, 67, 75, 82, 89, 64, 59,
+        56, 61, 68, 75, 83, 90, 64, 59, 56, 61, 68, 75, 83, 90, 66, 60, 57, 63,
+        69, 77, 84, 92,
+        /* Size 32x8 */
+        32, 31, 31, 30, 30, 33, 33, 37, 37, 42, 42, 49, 49, 48, 48, 49, 49, 50,
+        50, 52, 52, 54, 54, 57, 57, 60, 60, 63, 63, 64, 64, 66, 31, 31, 31, 32,
+        32, 36, 36, 40, 40, 43, 43, 46, 46, 46, 46, 45, 45, 46, 46, 48, 48, 50,
+        50, 52, 52, 54, 54, 57, 57, 59, 59, 60, 37, 38, 38, 40, 40, 43, 43, 47,
+        47, 47, 47, 48, 48, 47, 47, 46, 46, 46, 46, 47, 47, 49, 49, 50, 50, 52,
+        52, 55, 55, 56, 56, 57, 48, 47, 47, 46, 46, 47, 47, 47, 47, 50, 50, 53,
+        53, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 56, 56, 58, 58, 60, 60, 61,
+        61, 63, 49, 47, 47, 45, 45, 46, 46, 45, 45, 49, 49, 53, 53, 56, 56, 58,
+        58, 59, 59, 61, 61, 62, 62, 64, 64, 65, 65, 67, 67, 68, 68, 69, 52, 50,
+        50, 48, 48, 47, 47, 47, 47, 50, 50, 54, 54, 57, 57, 61, 61, 64, 64, 66,
+        66, 68, 68, 70, 70, 72, 72, 75, 75, 75, 75, 77, 57, 54, 54, 52, 52, 51,
+        51, 50, 50, 53, 53, 57, 57, 60, 60, 64, 64, 67, 67, 71, 71, 73, 73, 76,
+        76, 79, 79, 82, 82, 83, 83, 84, 63, 60, 60, 57, 57, 56, 56, 54, 54, 57,
+        57, 60, 60, 64, 64, 67, 67, 71, 71, 75, 75, 78, 78, 82, 82, 85, 85, 89,
+        89, 90, 90, 92 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 33, 45, 62, 33, 39, 51, 64, 45, 51, 71, 87, 62, 64, 87, 108,
+        /* Size 8x8 */
+        31, 32, 32, 35, 42, 51, 59, 69, 32, 32, 33, 35, 41, 49, 56, 65, 32, 33,
+        35, 38, 43, 49, 56, 64, 35, 35, 38, 48, 54, 59, 66, 73, 42, 41, 43, 54,
+        63, 71, 77, 85, 51, 49, 49, 59, 71, 81, 89, 97, 59, 56, 56, 66, 77, 89,
+        98, 108, 69, 65, 64, 73, 85, 97, 108, 119,
+        /* Size 16x16 */
+        32, 31, 31, 31, 32, 34, 35, 38, 41, 45, 48, 54, 59, 65, 71, 80, 31, 32,
+        32, 32, 32, 34, 35, 37, 40, 43, 46, 51, 56, 62, 68, 76, 31, 32, 32, 32,
+        32, 33, 34, 36, 38, 41, 44, 49, 54, 59, 65, 72, 31, 32, 32, 33, 34, 35,
+        36, 38, 40, 42, 45, 50, 54, 59, 64, 71, 32, 32, 32, 34, 35, 37, 38, 39,
+        41, 43, 46, 49, 53, 58, 63, 69, 34, 34, 33, 35, 37, 39, 42, 44, 46, 48,
+        51, 54, 58, 63, 68, 74, 35, 35, 34, 36, 38, 42, 46, 48, 50, 53, 55, 59,
+        62, 67, 72, 78, 38, 37, 36, 38, 39, 44, 48, 51, 54, 57, 59, 63, 67, 71,
+        76, 82, 41, 40, 38, 40, 41, 46, 50, 54, 57, 60, 63, 67, 71, 75, 80, 86,
+        45, 43, 41, 42, 43, 48, 53, 57, 60, 65, 68, 72, 76, 81, 85, 91, 48, 46,
+        44, 45, 46, 51, 55, 59, 63, 68, 71, 76, 80, 85, 90, 96, 54, 51, 49, 50,
+        49, 54, 59, 63, 67, 72, 76, 82, 87, 92, 97, 104, 59, 56, 54, 54, 53, 58,
+        62, 67, 71, 76, 80, 87, 92, 98, 103, 110, 65, 62, 59, 59, 58, 63, 67,
+        71, 75, 81, 85, 92, 98, 105, 111, 118, 71, 68, 65, 64, 63, 68, 72, 76,
+        80, 85, 90, 97, 103, 111, 117, 125, 80, 76, 72, 71, 69, 74, 78, 82, 86,
+        91, 96, 104, 110, 118, 125, 134,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 34, 34, 35, 36, 38, 39, 41, 44,
+        45, 48, 48, 53, 54, 57, 59, 62, 65, 67, 71, 72, 80, 80, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 34, 34, 35, 35, 37, 38, 40, 42, 43, 46, 46, 51,
+        52, 55, 56, 59, 62, 64, 68, 69, 76, 76, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 34, 34, 35, 35, 37, 38, 40, 42, 43, 46, 46, 51, 51, 55, 56, 59,
+        62, 64, 68, 69, 76, 76, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        34, 34, 36, 38, 39, 41, 42, 45, 45, 49, 50, 53, 54, 57, 60, 62, 66, 66,
+        73, 73, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 36, 37,
+        38, 41, 41, 44, 44, 49, 49, 52, 54, 56, 59, 61, 65, 65, 72, 72, 31, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35, 37, 38, 39, 41, 42, 45,
+        45, 49, 49, 52, 54, 56, 59, 61, 64, 65, 72, 72, 31, 32, 32, 32, 32, 33,
+        33, 33, 34, 34, 35, 35, 36, 36, 38, 39, 40, 42, 42, 45, 45, 49, 50, 52,
+        54, 56, 59, 60, 64, 65, 71, 71, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34,
+        35, 35, 36, 37, 38, 39, 40, 42, 43, 45, 45, 49, 49, 52, 54, 56, 59, 60,
+        64, 64, 70, 70, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 37, 37, 38, 38,
+        39, 40, 41, 42, 43, 46, 46, 49, 49, 52, 53, 55, 58, 59, 63, 63, 69, 69,
+        32, 32, 32, 32, 33, 33, 34, 34, 35, 35, 37, 37, 38, 38, 40, 41, 41, 43,
+        43, 46, 46, 49, 50, 52, 54, 56, 58, 60, 63, 64, 70, 70, 34, 34, 34, 33,
+        33, 34, 35, 35, 37, 37, 39, 39, 42, 42, 44, 45, 46, 47, 48, 51, 51, 54,
+        54, 57, 58, 60, 63, 64, 68, 68, 74, 74, 34, 34, 34, 33, 33, 34, 35, 35,
+        37, 37, 39, 39, 42, 42, 44, 45, 46, 47, 48, 51, 51, 54, 54, 57, 58, 60,
+        63, 64, 68, 68, 74, 74, 35, 35, 35, 34, 34, 35, 36, 36, 38, 38, 42, 42,
+        46, 47, 48, 49, 50, 52, 53, 55, 55, 58, 59, 61, 62, 64, 67, 68, 72, 72,
+        78, 78, 36, 35, 35, 34, 34, 35, 36, 37, 38, 38, 42, 42, 47, 48, 50, 50,
+        52, 54, 54, 57, 57, 59, 60, 62, 64, 66, 68, 69, 73, 73, 79, 79, 38, 37,
+        37, 36, 36, 37, 38, 38, 39, 40, 44, 44, 48, 50, 51, 52, 54, 56, 57, 59,
+        59, 62, 63, 65, 67, 69, 71, 72, 76, 76, 82, 82, 39, 38, 38, 38, 37, 38,
+        39, 39, 40, 41, 45, 45, 49, 50, 52, 54, 55, 58, 58, 61, 61, 64, 65, 67,
+        69, 71, 73, 74, 78, 78, 84, 84, 41, 40, 40, 39, 38, 39, 40, 40, 41, 41,
+        46, 46, 50, 52, 54, 55, 57, 60, 60, 63, 63, 67, 67, 70, 71, 73, 75, 77,
+        80, 81, 86, 86, 44, 42, 42, 41, 41, 41, 42, 42, 42, 43, 47, 47, 52, 54,
+        56, 58, 60, 63, 64, 67, 67, 71, 71, 74, 75, 77, 79, 81, 84, 85, 90, 90,
+        45, 43, 43, 42, 41, 42, 42, 43, 43, 43, 48, 48, 53, 54, 57, 58, 60, 64,
+        65, 68, 68, 72, 72, 75, 76, 78, 81, 82, 85, 86, 91, 91, 48, 46, 46, 45,
+        44, 45, 45, 45, 46, 46, 51, 51, 55, 57, 59, 61, 63, 67, 68, 71, 71, 75,
+        76, 79, 80, 83, 85, 87, 90, 91, 96, 96, 48, 46, 46, 45, 44, 45, 45, 45,
+        46, 46, 51, 51, 55, 57, 59, 61, 63, 67, 68, 71, 71, 75, 76, 79, 80, 83,
+        85, 87, 90, 91, 96, 96, 53, 51, 51, 49, 49, 49, 49, 49, 49, 49, 54, 54,
+        58, 59, 62, 64, 67, 71, 72, 75, 75, 81, 81, 85, 86, 89, 91, 93, 97, 97,
+        103, 103, 54, 52, 51, 50, 49, 49, 50, 49, 49, 50, 54, 54, 59, 60, 63,
+        65, 67, 71, 72, 76, 76, 81, 82, 85, 87, 89, 92, 94, 97, 98, 104, 104,
+        57, 55, 55, 53, 52, 52, 52, 52, 52, 52, 57, 57, 61, 62, 65, 67, 70, 74,
+        75, 79, 79, 85, 85, 89, 90, 93, 96, 98, 102, 102, 108, 108, 59, 56, 56,
+        54, 54, 54, 54, 54, 53, 54, 58, 58, 62, 64, 67, 69, 71, 75, 76, 80, 80,
+        86, 87, 90, 92, 95, 98, 99, 103, 104, 110, 110, 62, 59, 59, 57, 56, 56,
+        56, 56, 55, 56, 60, 60, 64, 66, 69, 71, 73, 77, 78, 83, 83, 89, 89, 93,
+        95, 98, 101, 103, 107, 108, 114, 114, 65, 62, 62, 60, 59, 59, 59, 59,
+        58, 58, 63, 63, 67, 68, 71, 73, 75, 79, 81, 85, 85, 91, 92, 96, 98, 101,
+        105, 106, 111, 111, 118, 118, 67, 64, 64, 62, 61, 61, 60, 60, 59, 60,
+        64, 64, 68, 69, 72, 74, 77, 81, 82, 87, 87, 93, 94, 98, 99, 103, 106,
+        108, 113, 113, 120, 120, 71, 68, 68, 66, 65, 64, 64, 64, 63, 63, 68, 68,
+        72, 73, 76, 78, 80, 84, 85, 90, 90, 97, 97, 102, 103, 107, 111, 113,
+        117, 118, 125, 125, 72, 69, 69, 66, 65, 65, 65, 64, 63, 64, 68, 68, 72,
+        73, 76, 78, 81, 85, 86, 91, 91, 97, 98, 102, 104, 108, 111, 113, 118,
+        119, 126, 126, 80, 76, 76, 73, 72, 72, 71, 70, 69, 70, 74, 74, 78, 79,
+        82, 84, 86, 90, 91, 96, 96, 103, 104, 108, 110, 114, 118, 120, 125, 126,
+        134, 134, 80, 76, 76, 73, 72, 72, 71, 70, 69, 70, 74, 74, 78, 79, 82,
+        84, 86, 90, 91, 96, 96, 103, 104, 108, 110, 114, 118, 120, 125, 126,
+        134, 134,
+        /* Size 4x8 */
+        32, 34, 43, 62, 32, 34, 42, 59, 33, 37, 44, 58, 35, 43, 54, 68, 41, 48,
+        64, 79, 49, 54, 71, 91, 57, 60, 78, 101, 66, 68, 86, 111,
+        /* Size 8x4 */
+        32, 32, 33, 35, 41, 49, 57, 66, 34, 34, 37, 43, 48, 54, 60, 68, 43, 42,
+        44, 54, 64, 71, 78, 86, 62, 59, 58, 68, 79, 91, 101, 111,
+        /* Size 8x16 */
+        32, 31, 32, 36, 44, 53, 62, 73, 31, 32, 32, 35, 42, 51, 59, 69, 31, 32,
+        33, 34, 41, 49, 57, 66, 32, 32, 34, 36, 42, 50, 57, 65, 32, 33, 35, 38,
+        42, 49, 56, 64, 34, 34, 37, 42, 48, 54, 61, 69, 35, 34, 38, 47, 52, 59,
+        65, 73, 38, 36, 40, 49, 56, 63, 69, 77, 41, 39, 41, 51, 60, 67, 74, 81,
+        44, 42, 43, 54, 64, 72, 79, 86, 48, 45, 46, 56, 67, 76, 83, 91, 53, 49,
+        50, 60, 71, 82, 90, 99, 58, 54, 54, 63, 75, 87, 95, 105, 65, 60, 58, 68,
+        79, 92, 102, 112, 71, 65, 63, 73, 84, 97, 108, 119, 79, 72, 70, 79, 90,
+        104, 115, 127,
+        /* Size 16x8 */
+        32, 31, 31, 32, 32, 34, 35, 38, 41, 44, 48, 53, 58, 65, 71, 79, 31, 32,
+        32, 32, 33, 34, 34, 36, 39, 42, 45, 49, 54, 60, 65, 72, 32, 32, 33, 34,
+        35, 37, 38, 40, 41, 43, 46, 50, 54, 58, 63, 70, 36, 35, 34, 36, 38, 42,
+        47, 49, 51, 54, 56, 60, 63, 68, 73, 79, 44, 42, 41, 42, 42, 48, 52, 56,
+        60, 64, 67, 71, 75, 79, 84, 90, 53, 51, 49, 50, 49, 54, 59, 63, 67, 72,
+        76, 82, 87, 92, 97, 104, 62, 59, 57, 57, 56, 61, 65, 69, 74, 79, 83, 90,
+        95, 102, 108, 115, 73, 69, 66, 65, 64, 69, 73, 77, 81, 86, 91, 99, 105,
+        112, 119, 127,
+        /* Size 16x32 */
+        32, 31, 31, 32, 32, 34, 36, 38, 44, 44, 53, 53, 62, 65, 73, 79, 31, 32,
+        32, 32, 32, 34, 35, 37, 42, 43, 51, 51, 60, 62, 70, 75, 31, 32, 32, 32,
+        32, 34, 35, 37, 42, 43, 51, 51, 59, 62, 69, 75, 31, 32, 32, 32, 32, 33,
+        35, 36, 41, 42, 50, 50, 58, 60, 67, 73, 31, 32, 32, 32, 33, 33, 34, 36,
+        41, 41, 49, 49, 57, 59, 66, 72, 31, 32, 32, 33, 33, 34, 35, 37, 41, 42,
+        49, 49, 57, 59, 66, 71, 32, 32, 32, 33, 34, 35, 36, 38, 42, 43, 50, 50,
+        57, 59, 65, 71, 32, 32, 32, 34, 34, 35, 37, 38, 42, 43, 49, 49, 56, 59,
+        65, 70, 32, 32, 33, 34, 35, 37, 38, 39, 42, 43, 49, 49, 56, 58, 64, 69,
+        32, 33, 33, 34, 35, 37, 39, 40, 43, 44, 50, 50, 56, 58, 64, 69, 34, 34,
+        34, 36, 37, 39, 42, 44, 48, 48, 54, 54, 61, 63, 69, 73, 34, 34, 34, 36,
+        37, 39, 42, 44, 48, 48, 54, 54, 61, 63, 69, 73, 35, 34, 34, 37, 38, 42,
+        47, 48, 52, 53, 59, 59, 65, 67, 73, 77, 36, 35, 34, 37, 38, 43, 48, 49,
+        54, 54, 60, 60, 66, 68, 74, 78, 38, 36, 36, 38, 40, 44, 49, 51, 56, 57,
+        63, 63, 69, 71, 77, 81, 39, 38, 37, 40, 40, 45, 50, 52, 58, 58, 65, 65,
+        71, 73, 79, 84, 41, 39, 39, 41, 41, 46, 51, 54, 60, 60, 67, 67, 74, 76,
+        81, 86, 44, 41, 41, 42, 43, 48, 53, 56, 63, 64, 71, 71, 78, 79, 85, 90,
+        44, 42, 42, 43, 43, 48, 54, 56, 64, 64, 72, 72, 79, 81, 86, 91, 48, 45,
+        45, 46, 46, 51, 56, 59, 67, 67, 76, 76, 83, 85, 91, 96, 48, 45, 45, 46,
+        46, 51, 56, 59, 67, 67, 76, 76, 83, 85, 91, 96, 53, 49, 49, 49, 49, 54,
+        59, 62, 71, 71, 81, 81, 89, 91, 98, 103, 53, 50, 49, 50, 50, 54, 60, 63,
+        71, 72, 82, 82, 90, 92, 99, 103, 57, 53, 52, 52, 52, 57, 62, 65, 74, 75,
+        85, 85, 94, 96, 103, 108, 58, 54, 54, 54, 54, 58, 63, 67, 75, 76, 87,
+        87, 95, 98, 105, 110, 61, 57, 57, 56, 56, 60, 66, 69, 77, 78, 89, 89,
+        98, 101, 108, 114, 65, 60, 60, 59, 58, 63, 68, 71, 79, 80, 92, 92, 102,
+        105, 112, 118, 67, 62, 61, 60, 60, 64, 69, 72, 81, 82, 94, 94, 103, 106,
+        114, 120, 71, 66, 65, 64, 63, 68, 73, 76, 84, 85, 97, 97, 108, 111, 119,
+        125, 72, 66, 66, 64, 64, 68, 73, 76, 85, 86, 98, 98, 108, 111, 119, 125,
+        79, 73, 72, 71, 70, 74, 79, 82, 90, 91, 104, 104, 115, 118, 127, 133,
+        79, 73, 72, 71, 70, 74, 79, 82, 90, 91, 104, 104, 115, 118, 127, 133,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 35, 36, 38, 39, 41, 44,
+        44, 48, 48, 53, 53, 57, 58, 61, 65, 67, 71, 72, 79, 79, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 34, 34, 34, 35, 36, 38, 39, 41, 42, 45, 45, 49,
+        50, 53, 54, 57, 60, 62, 66, 66, 73, 73, 31, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 34, 34, 34, 34, 36, 37, 39, 41, 42, 45, 45, 49, 49, 52, 54, 57,
+        60, 61, 65, 66, 72, 72, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 36, 36,
+        37, 37, 38, 40, 41, 42, 43, 46, 46, 49, 50, 52, 54, 56, 59, 60, 64, 64,
+        71, 71, 32, 32, 32, 32, 33, 33, 34, 34, 35, 35, 37, 37, 38, 38, 40, 40,
+        41, 43, 43, 46, 46, 49, 50, 52, 54, 56, 58, 60, 63, 64, 70, 70, 34, 34,
+        34, 33, 33, 34, 35, 35, 37, 37, 39, 39, 42, 43, 44, 45, 46, 48, 48, 51,
+        51, 54, 54, 57, 58, 60, 63, 64, 68, 68, 74, 74, 36, 35, 35, 35, 34, 35,
+        36, 37, 38, 39, 42, 42, 47, 48, 49, 50, 51, 53, 54, 56, 56, 59, 60, 62,
+        63, 66, 68, 69, 73, 73, 79, 79, 38, 37, 37, 36, 36, 37, 38, 38, 39, 40,
+        44, 44, 48, 49, 51, 52, 54, 56, 56, 59, 59, 62, 63, 65, 67, 69, 71, 72,
+        76, 76, 82, 82, 44, 42, 42, 41, 41, 41, 42, 42, 42, 43, 48, 48, 52, 54,
+        56, 58, 60, 63, 64, 67, 67, 71, 71, 74, 75, 77, 79, 81, 84, 85, 90, 90,
+        44, 43, 43, 42, 41, 42, 43, 43, 43, 44, 48, 48, 53, 54, 57, 58, 60, 64,
+        64, 67, 67, 71, 72, 75, 76, 78, 80, 82, 85, 86, 91, 91, 53, 51, 51, 50,
+        49, 49, 50, 49, 49, 50, 54, 54, 59, 60, 63, 65, 67, 71, 72, 76, 76, 81,
+        82, 85, 87, 89, 92, 94, 97, 98, 104, 104, 53, 51, 51, 50, 49, 49, 50,
+        49, 49, 50, 54, 54, 59, 60, 63, 65, 67, 71, 72, 76, 76, 81, 82, 85, 87,
+        89, 92, 94, 97, 98, 104, 104, 62, 60, 59, 58, 57, 57, 57, 56, 56, 56,
+        61, 61, 65, 66, 69, 71, 74, 78, 79, 83, 83, 89, 90, 94, 95, 98, 102,
+        103, 108, 108, 115, 115, 65, 62, 62, 60, 59, 59, 59, 59, 58, 58, 63, 63,
+        67, 68, 71, 73, 76, 79, 81, 85, 85, 91, 92, 96, 98, 101, 105, 106, 111,
+        111, 118, 118, 73, 70, 69, 67, 66, 66, 65, 65, 64, 64, 69, 69, 73, 74,
+        77, 79, 81, 85, 86, 91, 91, 98, 99, 103, 105, 108, 112, 114, 119, 119,
+        127, 127, 79, 75, 75, 73, 72, 71, 71, 70, 69, 69, 73, 73, 77, 78, 81,
+        84, 86, 90, 91, 96, 96, 103, 103, 108, 110, 114, 118, 120, 125, 125,
+        133, 133,
+        /* Size 4x16 */
+        31, 34, 44, 65, 32, 34, 43, 62, 32, 33, 41, 59, 32, 35, 43, 59, 32, 37,
+        43, 58, 34, 39, 48, 63, 34, 42, 53, 67, 36, 44, 57, 71, 39, 46, 60, 76,
+        42, 48, 64, 81, 45, 51, 67, 85, 50, 54, 72, 92, 54, 58, 76, 98, 60, 63,
+        80, 105, 66, 68, 85, 111, 73, 74, 91, 118,
+        /* Size 16x4 */
+        31, 32, 32, 32, 32, 34, 34, 36, 39, 42, 45, 50, 54, 60, 66, 73, 34, 34,
+        33, 35, 37, 39, 42, 44, 46, 48, 51, 54, 58, 63, 68, 74, 44, 43, 41, 43,
+        43, 48, 53, 57, 60, 64, 67, 72, 76, 80, 85, 91, 65, 62, 59, 59, 58, 63,
+        67, 71, 76, 81, 85, 92, 98, 105, 111, 118,
+        /* Size 8x32 */
+        32, 31, 32, 36, 44, 53, 62, 73, 31, 32, 32, 35, 42, 51, 60, 70, 31, 32,
+        32, 35, 42, 51, 59, 69, 31, 32, 32, 35, 41, 50, 58, 67, 31, 32, 33, 34,
+        41, 49, 57, 66, 31, 32, 33, 35, 41, 49, 57, 66, 32, 32, 34, 36, 42, 50,
+        57, 65, 32, 32, 34, 37, 42, 49, 56, 65, 32, 33, 35, 38, 42, 49, 56, 64,
+        32, 33, 35, 39, 43, 50, 56, 64, 34, 34, 37, 42, 48, 54, 61, 69, 34, 34,
+        37, 42, 48, 54, 61, 69, 35, 34, 38, 47, 52, 59, 65, 73, 36, 34, 38, 48,
+        54, 60, 66, 74, 38, 36, 40, 49, 56, 63, 69, 77, 39, 37, 40, 50, 58, 65,
+        71, 79, 41, 39, 41, 51, 60, 67, 74, 81, 44, 41, 43, 53, 63, 71, 78, 85,
+        44, 42, 43, 54, 64, 72, 79, 86, 48, 45, 46, 56, 67, 76, 83, 91, 48, 45,
+        46, 56, 67, 76, 83, 91, 53, 49, 49, 59, 71, 81, 89, 98, 53, 49, 50, 60,
+        71, 82, 90, 99, 57, 52, 52, 62, 74, 85, 94, 103, 58, 54, 54, 63, 75, 87,
+        95, 105, 61, 57, 56, 66, 77, 89, 98, 108, 65, 60, 58, 68, 79, 92, 102,
+        112, 67, 61, 60, 69, 81, 94, 103, 114, 71, 65, 63, 73, 84, 97, 108, 119,
+        72, 66, 64, 73, 85, 98, 108, 119, 79, 72, 70, 79, 90, 104, 115, 127, 79,
+        72, 70, 79, 90, 104, 115, 127,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 35, 36, 38, 39, 41, 44,
+        44, 48, 48, 53, 53, 57, 58, 61, 65, 67, 71, 72, 79, 79, 31, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 34, 34, 34, 34, 36, 37, 39, 41, 42, 45, 45, 49,
+        49, 52, 54, 57, 60, 61, 65, 66, 72, 72, 32, 32, 32, 32, 33, 33, 34, 34,
+        35, 35, 37, 37, 38, 38, 40, 40, 41, 43, 43, 46, 46, 49, 50, 52, 54, 56,
+        58, 60, 63, 64, 70, 70, 36, 35, 35, 35, 34, 35, 36, 37, 38, 39, 42, 42,
+        47, 48, 49, 50, 51, 53, 54, 56, 56, 59, 60, 62, 63, 66, 68, 69, 73, 73,
+        79, 79, 44, 42, 42, 41, 41, 41, 42, 42, 42, 43, 48, 48, 52, 54, 56, 58,
+        60, 63, 64, 67, 67, 71, 71, 74, 75, 77, 79, 81, 84, 85, 90, 90, 53, 51,
+        51, 50, 49, 49, 50, 49, 49, 50, 54, 54, 59, 60, 63, 65, 67, 71, 72, 76,
+        76, 81, 82, 85, 87, 89, 92, 94, 97, 98, 104, 104, 62, 60, 59, 58, 57,
+        57, 57, 56, 56, 56, 61, 61, 65, 66, 69, 71, 74, 78, 79, 83, 83, 89, 90,
+        94, 95, 98, 102, 103, 108, 108, 115, 115, 73, 70, 69, 67, 66, 66, 65,
+        65, 64, 64, 69, 69, 73, 74, 77, 79, 81, 85, 86, 91, 91, 98, 99, 103,
+        105, 108, 112, 114, 119, 119, 127, 127 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 42, 47, 53, 42, 48, 50, 54, 47, 50, 61, 67, 53, 54, 67, 78,
+        /* Size 8x8 */
+        31, 32, 38, 48, 47, 50, 53, 57, 32, 35, 42, 47, 45, 47, 50, 54, 38, 42,
+        47, 48, 45, 47, 49, 52, 48, 47, 48, 53, 53, 54, 56, 58, 47, 45, 45, 53,
+        58, 61, 63, 65, 50, 47, 47, 54, 61, 66, 69, 72, 53, 50, 49, 56, 63, 69,
+        73, 77, 57, 54, 52, 58, 65, 72, 77, 82,
+        /* Size 16x16 */
+        32, 31, 30, 33, 36, 41, 47, 49, 49, 49, 50, 52, 54, 57, 60, 63, 31, 31,
+        31, 34, 38, 42, 46, 47, 47, 47, 48, 50, 52, 54, 57, 60, 30, 31, 32, 35,
+        40, 42, 45, 46, 45, 45, 46, 47, 49, 52, 54, 57, 33, 34, 35, 39, 43, 45,
+        47, 46, 46, 45, 46, 47, 49, 51, 53, 56, 36, 38, 40, 43, 47, 47, 47, 47,
+        46, 45, 46, 47, 48, 50, 52, 54, 41, 42, 42, 45, 47, 48, 50, 50, 49, 49,
+        50, 50, 52, 53, 55, 57, 47, 46, 45, 47, 47, 50, 52, 52, 52, 52, 53, 53,
+        55, 56, 58, 60, 49, 47, 46, 46, 47, 50, 52, 53, 54, 55, 55, 56, 57, 58,
+        60, 62, 49, 47, 45, 46, 46, 49, 52, 54, 55, 57, 58, 59, 60, 61, 63, 65,
+        49, 47, 45, 45, 45, 49, 52, 55, 57, 59, 60, 61, 63, 64, 66, 68, 50, 48,
+        46, 46, 46, 50, 53, 55, 58, 60, 61, 63, 65, 67, 68, 71, 52, 50, 47, 47,
+        47, 50, 53, 56, 59, 61, 63, 66, 68, 70, 72, 75, 54, 52, 49, 49, 48, 52,
+        55, 57, 60, 63, 65, 68, 71, 73, 75, 78, 57, 54, 52, 51, 50, 53, 56, 58,
+        61, 64, 67, 70, 73, 76, 79, 82, 60, 57, 54, 53, 52, 55, 58, 60, 63, 66,
+        68, 72, 75, 79, 82, 85, 63, 60, 57, 56, 54, 57, 60, 62, 65, 68, 71, 75,
+        78, 82, 85, 89,
+        /* Size 32x32 */
+        32, 31, 31, 30, 30, 32, 33, 34, 36, 37, 41, 41, 47, 49, 49, 48, 49, 49,
+        49, 50, 50, 52, 52, 54, 54, 56, 57, 58, 60, 60, 63, 63, 31, 31, 31, 31,
+        31, 32, 34, 35, 38, 38, 42, 42, 46, 48, 47, 47, 47, 47, 47, 48, 48, 50,
+        50, 51, 52, 53, 54, 55, 57, 57, 60, 60, 31, 31, 31, 31, 31, 33, 34, 35,
+        38, 39, 42, 42, 46, 47, 47, 47, 47, 47, 47, 48, 48, 49, 50, 51, 52, 53,
+        54, 55, 57, 57, 60, 60, 30, 31, 31, 31, 31, 33, 35, 36, 39, 40, 42, 42,
+        46, 47, 46, 46, 46, 45, 46, 47, 47, 48, 48, 50, 50, 51, 52, 53, 55, 55,
+        58, 58, 30, 31, 31, 31, 32, 33, 35, 36, 40, 40, 42, 42, 45, 46, 46, 45,
+        45, 45, 45, 46, 46, 47, 47, 49, 49, 51, 52, 52, 54, 54, 57, 57, 32, 32,
+        33, 33, 33, 35, 37, 38, 41, 42, 43, 43, 46, 47, 46, 46, 45, 45, 45, 46,
+        46, 47, 47, 49, 49, 50, 51, 52, 54, 54, 57, 57, 33, 34, 34, 35, 35, 37,
+        39, 40, 43, 43, 45, 45, 47, 47, 46, 46, 46, 45, 45, 46, 46, 47, 47, 49,
+        49, 50, 51, 52, 53, 54, 56, 56, 34, 35, 35, 36, 36, 38, 40, 41, 44, 44,
+        45, 45, 47, 47, 47, 46, 46, 45, 45, 46, 46, 47, 47, 48, 49, 50, 51, 51,
+        53, 53, 55, 55, 36, 38, 38, 39, 40, 41, 43, 44, 47, 47, 47, 47, 47, 48,
+        47, 46, 46, 45, 45, 46, 46, 46, 47, 48, 48, 49, 50, 50, 52, 52, 54, 54,
+        37, 38, 39, 40, 40, 42, 43, 44, 47, 47, 47, 47, 48, 48, 47, 47, 46, 45,
+        46, 46, 46, 47, 47, 48, 48, 49, 50, 51, 52, 52, 55, 55, 41, 42, 42, 42,
+        42, 43, 45, 45, 47, 47, 48, 48, 50, 50, 50, 49, 49, 49, 49, 50, 50, 50,
+        50, 51, 52, 52, 53, 54, 55, 55, 57, 57, 41, 42, 42, 42, 42, 43, 45, 45,
+        47, 47, 48, 48, 50, 50, 50, 49, 49, 49, 49, 50, 50, 50, 50, 51, 52, 52,
+        53, 54, 55, 55, 57, 57, 47, 46, 46, 46, 45, 46, 47, 47, 47, 48, 50, 50,
+        52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53, 54, 55, 55, 56, 56, 58, 58,
+        60, 60, 49, 48, 47, 47, 46, 47, 47, 47, 48, 48, 50, 50, 52, 53, 53, 53,
+        53, 53, 53, 54, 54, 54, 54, 55, 55, 56, 56, 57, 58, 58, 60, 60, 49, 47,
+        47, 46, 46, 46, 46, 47, 47, 47, 50, 50, 52, 53, 53, 54, 54, 55, 55, 55,
+        55, 56, 56, 57, 57, 58, 58, 59, 60, 60, 62, 62, 48, 47, 47, 46, 45, 46,
+        46, 46, 46, 47, 49, 49, 52, 53, 54, 54, 55, 55, 56, 56, 56, 57, 57, 58,
+        58, 59, 60, 60, 61, 62, 63, 63, 49, 47, 47, 46, 45, 45, 46, 46, 46, 46,
+        49, 49, 52, 53, 54, 55, 55, 57, 57, 58, 58, 59, 59, 60, 60, 61, 61, 62,
+        63, 63, 65, 65, 49, 47, 47, 45, 45, 45, 45, 45, 45, 45, 49, 49, 52, 53,
+        55, 55, 57, 58, 59, 60, 60, 61, 61, 62, 62, 63, 63, 64, 65, 65, 67, 67,
+        49, 47, 47, 46, 45, 45, 45, 45, 45, 46, 49, 49, 52, 53, 55, 56, 57, 59,
+        59, 60, 60, 61, 61, 62, 63, 63, 64, 65, 66, 66, 68, 68, 50, 48, 48, 47,
+        46, 46, 46, 46, 46, 46, 50, 50, 53, 54, 55, 56, 58, 60, 60, 61, 61, 63,
+        63, 65, 65, 66, 67, 67, 68, 69, 71, 71, 50, 48, 48, 47, 46, 46, 46, 46,
+        46, 46, 50, 50, 53, 54, 55, 56, 58, 60, 60, 61, 61, 63, 63, 65, 65, 66,
+        67, 67, 68, 69, 71, 71, 52, 50, 49, 48, 47, 47, 47, 47, 46, 47, 50, 50,
+        53, 54, 56, 57, 59, 61, 61, 63, 63, 66, 66, 67, 68, 69, 70, 71, 72, 72,
+        74, 74, 52, 50, 50, 48, 47, 47, 47, 47, 47, 47, 50, 50, 53, 54, 56, 57,
+        59, 61, 61, 63, 63, 66, 66, 68, 68, 69, 70, 71, 72, 73, 75, 75, 54, 51,
+        51, 50, 49, 49, 49, 48, 48, 48, 51, 51, 54, 55, 57, 58, 60, 62, 62, 65,
+        65, 67, 68, 69, 70, 71, 72, 73, 74, 75, 77, 77, 54, 52, 52, 50, 49, 49,
+        49, 49, 48, 48, 52, 52, 55, 55, 57, 58, 60, 62, 63, 65, 65, 68, 68, 70,
+        71, 72, 73, 74, 75, 76, 78, 78, 56, 53, 53, 51, 51, 50, 50, 50, 49, 49,
+        52, 52, 55, 56, 58, 59, 61, 63, 63, 66, 66, 69, 69, 71, 72, 73, 75, 75,
+        77, 77, 80, 80, 57, 54, 54, 52, 52, 51, 51, 51, 50, 50, 53, 53, 56, 56,
+        58, 60, 61, 63, 64, 67, 67, 70, 70, 72, 73, 75, 76, 77, 79, 79, 82, 82,
+        58, 55, 55, 53, 52, 52, 52, 51, 50, 51, 54, 54, 56, 57, 59, 60, 62, 64,
+        65, 67, 67, 71, 71, 73, 74, 75, 77, 78, 80, 80, 83, 83, 60, 57, 57, 55,
+        54, 54, 53, 53, 52, 52, 55, 55, 58, 58, 60, 61, 63, 65, 66, 68, 68, 72,
+        72, 74, 75, 77, 79, 80, 82, 82, 85, 85, 60, 57, 57, 55, 54, 54, 54, 53,
+        52, 52, 55, 55, 58, 58, 60, 62, 63, 65, 66, 69, 69, 72, 73, 75, 76, 77,
+        79, 80, 82, 82, 85, 85, 63, 60, 60, 58, 57, 57, 56, 55, 54, 55, 57, 57,
+        60, 60, 62, 63, 65, 67, 68, 71, 71, 74, 75, 77, 78, 80, 82, 83, 85, 85,
+        89, 89, 63, 60, 60, 58, 57, 57, 56, 55, 54, 55, 57, 57, 60, 60, 62, 63,
+        65, 67, 68, 71, 71, 74, 75, 77, 78, 80, 82, 83, 85, 85, 89, 89,
+        /* Size 4x8 */
+        31, 42, 47, 54, 33, 44, 45, 51, 40, 47, 46, 50, 47, 50, 54, 57, 45, 49,
+        59, 64, 48, 50, 61, 70, 51, 52, 63, 75, 55, 55, 66, 79,
+        /* Size 8x4 */
+        31, 33, 40, 47, 45, 48, 51, 55, 42, 44, 47, 50, 49, 50, 52, 55, 47, 45,
+        46, 54, 59, 61, 63, 66, 54, 51, 50, 57, 64, 70, 75, 79,
+        /* Size 8x16 */
+        32, 31, 37, 48, 49, 52, 56, 61, 31, 31, 38, 47, 47, 50, 53, 57, 30, 32,
+        40, 46, 45, 48, 51, 55, 33, 36, 43, 47, 46, 47, 50, 54, 37, 40, 47, 47,
+        45, 47, 49, 52, 42, 43, 47, 50, 49, 50, 53, 56, 47, 46, 48, 52, 53, 53,
+        55, 58, 48, 46, 47, 53, 55, 56, 58, 61, 48, 45, 46, 53, 57, 59, 61, 63,
+        49, 45, 46, 53, 58, 62, 64, 66, 50, 46, 46, 54, 59, 64, 66, 69, 52, 48,
+        47, 54, 61, 66, 70, 73, 54, 50, 49, 55, 62, 68, 72, 76, 57, 52, 50, 56,
+        64, 70, 75, 79, 60, 54, 52, 58, 65, 72, 77, 82, 63, 57, 55, 60, 67, 75,
+        80, 86,
+        /* Size 16x8 */
+        32, 31, 30, 33, 37, 42, 47, 48, 48, 49, 50, 52, 54, 57, 60, 63, 31, 31,
+        32, 36, 40, 43, 46, 46, 45, 45, 46, 48, 50, 52, 54, 57, 37, 38, 40, 43,
+        47, 47, 48, 47, 46, 46, 46, 47, 49, 50, 52, 55, 48, 47, 46, 47, 47, 50,
+        52, 53, 53, 53, 54, 54, 55, 56, 58, 60, 49, 47, 45, 46, 45, 49, 53, 55,
+        57, 58, 59, 61, 62, 64, 65, 67, 52, 50, 48, 47, 47, 50, 53, 56, 59, 62,
+        64, 66, 68, 70, 72, 75, 56, 53, 51, 50, 49, 53, 55, 58, 61, 64, 66, 70,
+        72, 75, 77, 80, 61, 57, 55, 54, 52, 56, 58, 61, 63, 66, 69, 73, 76, 79,
+        82, 86,
+        /* Size 16x32 */
+        32, 31, 31, 35, 37, 42, 48, 48, 49, 49, 52, 52, 56, 57, 61, 63, 31, 31,
+        31, 36, 38, 42, 47, 47, 47, 47, 50, 50, 54, 54, 58, 60, 31, 31, 31, 36,
+        38, 42, 47, 47, 47, 47, 50, 50, 53, 54, 57, 60, 30, 32, 32, 37, 39, 42,
+        46, 46, 46, 46, 48, 48, 52, 52, 56, 58, 30, 32, 32, 37, 40, 42, 46, 46,
+        45, 45, 48, 48, 51, 52, 55, 57, 32, 33, 34, 39, 41, 44, 46, 46, 45, 45,
+        48, 48, 51, 51, 54, 57, 33, 35, 36, 40, 43, 45, 47, 46, 46, 46, 47, 47,
+        50, 51, 54, 56, 34, 37, 37, 42, 44, 45, 47, 47, 45, 46, 47, 47, 50, 51,
+        53, 55, 37, 40, 40, 45, 47, 47, 47, 47, 45, 46, 47, 47, 49, 50, 52, 54,
+        37, 40, 40, 45, 47, 47, 48, 47, 46, 46, 47, 47, 49, 50, 53, 55, 42, 43,
+        43, 46, 47, 48, 50, 50, 49, 49, 50, 50, 53, 53, 56, 57, 42, 43, 43, 46,
+        47, 48, 50, 50, 49, 49, 50, 50, 53, 53, 56, 57, 47, 46, 46, 47, 48, 50,
+        52, 52, 53, 53, 53, 53, 55, 56, 58, 60, 49, 47, 46, 47, 48, 50, 53, 53,
+        53, 54, 54, 54, 56, 57, 59, 60, 48, 46, 46, 47, 47, 50, 53, 53, 55, 55,
+        56, 56, 58, 58, 61, 62, 48, 46, 46, 46, 47, 50, 53, 54, 56, 56, 57, 57,
+        59, 60, 62, 64, 48, 46, 45, 46, 46, 49, 53, 54, 57, 57, 59, 59, 61, 61,
+        63, 65, 49, 45, 45, 45, 46, 49, 53, 55, 58, 59, 61, 61, 63, 64, 66, 67,
+        49, 46, 45, 46, 46, 49, 53, 55, 58, 59, 62, 62, 64, 64, 66, 68, 50, 47,
+        46, 46, 46, 50, 54, 55, 59, 60, 64, 64, 66, 67, 69, 71, 50, 47, 46, 46,
+        46, 50, 54, 55, 59, 60, 64, 64, 66, 67, 69, 71, 52, 48, 48, 47, 47, 50,
+        54, 56, 61, 61, 66, 66, 69, 70, 72, 74, 52, 48, 48, 47, 47, 50, 54, 56,
+        61, 61, 66, 66, 70, 71, 73, 75, 53, 50, 49, 48, 48, 51, 55, 57, 62, 62,
+        68, 68, 71, 72, 75, 77, 54, 50, 50, 49, 49, 52, 55, 57, 62, 63, 68, 68,
+        72, 73, 76, 78, 55, 51, 51, 50, 49, 52, 56, 58, 63, 63, 69, 69, 74, 75,
+        78, 80, 57, 52, 52, 51, 50, 53, 56, 58, 64, 64, 70, 70, 75, 76, 79, 82,
+        58, 53, 53, 51, 51, 54, 57, 59, 64, 65, 71, 71, 76, 77, 80, 83, 60, 55,
+        54, 53, 52, 55, 58, 60, 65, 66, 72, 72, 77, 79, 82, 85, 60, 55, 55, 53,
+        53, 55, 59, 60, 65, 66, 73, 73, 78, 79, 83, 85, 63, 58, 57, 56, 55, 58,
+        60, 62, 67, 68, 75, 75, 80, 82, 86, 89, 63, 58, 57, 56, 55, 58, 60, 62,
+        67, 68, 75, 75, 80, 82, 86, 89,
+        /* Size 32x16 */
+        32, 31, 31, 30, 30, 32, 33, 34, 37, 37, 42, 42, 47, 49, 48, 48, 48, 49,
+        49, 50, 50, 52, 52, 53, 54, 55, 57, 58, 60, 60, 63, 63, 31, 31, 31, 32,
+        32, 33, 35, 37, 40, 40, 43, 43, 46, 47, 46, 46, 46, 45, 46, 47, 47, 48,
+        48, 50, 50, 51, 52, 53, 55, 55, 58, 58, 31, 31, 31, 32, 32, 34, 36, 37,
+        40, 40, 43, 43, 46, 46, 46, 46, 45, 45, 45, 46, 46, 48, 48, 49, 50, 51,
+        52, 53, 54, 55, 57, 57, 35, 36, 36, 37, 37, 39, 40, 42, 45, 45, 46, 46,
+        47, 47, 47, 46, 46, 45, 46, 46, 46, 47, 47, 48, 49, 50, 51, 51, 53, 53,
+        56, 56, 37, 38, 38, 39, 40, 41, 43, 44, 47, 47, 47, 47, 48, 48, 47, 47,
+        46, 46, 46, 46, 46, 47, 47, 48, 49, 49, 50, 51, 52, 53, 55, 55, 42, 42,
+        42, 42, 42, 44, 45, 45, 47, 47, 48, 48, 50, 50, 50, 50, 49, 49, 49, 50,
+        50, 50, 50, 51, 52, 52, 53, 54, 55, 55, 58, 58, 48, 47, 47, 46, 46, 46,
+        47, 47, 47, 48, 50, 50, 52, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 55,
+        55, 56, 56, 57, 58, 59, 60, 60, 48, 47, 47, 46, 46, 46, 46, 47, 47, 47,
+        50, 50, 52, 53, 53, 54, 54, 55, 55, 55, 55, 56, 56, 57, 57, 58, 58, 59,
+        60, 60, 62, 62, 49, 47, 47, 46, 45, 45, 46, 45, 45, 46, 49, 49, 53, 53,
+        55, 56, 57, 58, 58, 59, 59, 61, 61, 62, 62, 63, 64, 64, 65, 65, 67, 67,
+        49, 47, 47, 46, 45, 45, 46, 46, 46, 46, 49, 49, 53, 54, 55, 56, 57, 59,
+        59, 60, 60, 61, 61, 62, 63, 63, 64, 65, 66, 66, 68, 68, 52, 50, 50, 48,
+        48, 48, 47, 47, 47, 47, 50, 50, 53, 54, 56, 57, 59, 61, 62, 64, 64, 66,
+        66, 68, 68, 69, 70, 71, 72, 73, 75, 75, 52, 50, 50, 48, 48, 48, 47, 47,
+        47, 47, 50, 50, 53, 54, 56, 57, 59, 61, 62, 64, 64, 66, 66, 68, 68, 69,
+        70, 71, 72, 73, 75, 75, 56, 54, 53, 52, 51, 51, 50, 50, 49, 49, 53, 53,
+        55, 56, 58, 59, 61, 63, 64, 66, 66, 69, 70, 71, 72, 74, 75, 76, 77, 78,
+        80, 80, 57, 54, 54, 52, 52, 51, 51, 51, 50, 50, 53, 53, 56, 57, 58, 60,
+        61, 64, 64, 67, 67, 70, 71, 72, 73, 75, 76, 77, 79, 79, 82, 82, 61, 58,
+        57, 56, 55, 54, 54, 53, 52, 53, 56, 56, 58, 59, 61, 62, 63, 66, 66, 69,
+        69, 72, 73, 75, 76, 78, 79, 80, 82, 83, 86, 86, 63, 60, 60, 58, 57, 57,
+        56, 55, 54, 55, 57, 57, 60, 60, 62, 64, 65, 67, 68, 71, 71, 74, 75, 77,
+        78, 80, 82, 83, 85, 85, 89, 89,
+        /* Size 4x16 */
+        31, 42, 49, 57, 31, 42, 47, 54, 32, 42, 45, 52, 35, 45, 46, 51, 40, 47,
+        46, 50, 43, 48, 49, 53, 46, 50, 53, 56, 46, 50, 55, 58, 46, 49, 57, 61,
+        46, 49, 59, 64, 47, 50, 60, 67, 48, 50, 61, 71, 50, 52, 63, 73, 52, 53,
+        64, 76, 55, 55, 66, 79, 58, 58, 68, 82,
+        /* Size 16x4 */
+        31, 31, 32, 35, 40, 43, 46, 46, 46, 46, 47, 48, 50, 52, 55, 58, 42, 42,
+        42, 45, 47, 48, 50, 50, 49, 49, 50, 50, 52, 53, 55, 58, 49, 47, 45, 46,
+        46, 49, 53, 55, 57, 59, 60, 61, 63, 64, 66, 68, 57, 54, 52, 51, 50, 53,
+        56, 58, 61, 64, 67, 71, 73, 76, 79, 82,
+        /* Size 8x32 */
+        32, 31, 37, 48, 49, 52, 56, 61, 31, 31, 38, 47, 47, 50, 54, 58, 31, 31,
+        38, 47, 47, 50, 53, 57, 30, 32, 39, 46, 46, 48, 52, 56, 30, 32, 40, 46,
+        45, 48, 51, 55, 32, 34, 41, 46, 45, 48, 51, 54, 33, 36, 43, 47, 46, 47,
+        50, 54, 34, 37, 44, 47, 45, 47, 50, 53, 37, 40, 47, 47, 45, 47, 49, 52,
+        37, 40, 47, 48, 46, 47, 49, 53, 42, 43, 47, 50, 49, 50, 53, 56, 42, 43,
+        47, 50, 49, 50, 53, 56, 47, 46, 48, 52, 53, 53, 55, 58, 49, 46, 48, 53,
+        53, 54, 56, 59, 48, 46, 47, 53, 55, 56, 58, 61, 48, 46, 47, 53, 56, 57,
+        59, 62, 48, 45, 46, 53, 57, 59, 61, 63, 49, 45, 46, 53, 58, 61, 63, 66,
+        49, 45, 46, 53, 58, 62, 64, 66, 50, 46, 46, 54, 59, 64, 66, 69, 50, 46,
+        46, 54, 59, 64, 66, 69, 52, 48, 47, 54, 61, 66, 69, 72, 52, 48, 47, 54,
+        61, 66, 70, 73, 53, 49, 48, 55, 62, 68, 71, 75, 54, 50, 49, 55, 62, 68,
+        72, 76, 55, 51, 49, 56, 63, 69, 74, 78, 57, 52, 50, 56, 64, 70, 75, 79,
+        58, 53, 51, 57, 64, 71, 76, 80, 60, 54, 52, 58, 65, 72, 77, 82, 60, 55,
+        53, 59, 65, 73, 78, 83, 63, 57, 55, 60, 67, 75, 80, 86, 63, 57, 55, 60,
+        67, 75, 80, 86,
+        /* Size 32x8 */
+        32, 31, 31, 30, 30, 32, 33, 34, 37, 37, 42, 42, 47, 49, 48, 48, 48, 49,
+        49, 50, 50, 52, 52, 53, 54, 55, 57, 58, 60, 60, 63, 63, 31, 31, 31, 32,
+        32, 34, 36, 37, 40, 40, 43, 43, 46, 46, 46, 46, 45, 45, 45, 46, 46, 48,
+        48, 49, 50, 51, 52, 53, 54, 55, 57, 57, 37, 38, 38, 39, 40, 41, 43, 44,
+        47, 47, 47, 47, 48, 48, 47, 47, 46, 46, 46, 46, 46, 47, 47, 48, 49, 49,
+        50, 51, 52, 53, 55, 55, 48, 47, 47, 46, 46, 46, 47, 47, 47, 48, 50, 50,
+        52, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 55, 55, 56, 56, 57, 58, 59,
+        60, 60, 49, 47, 47, 46, 45, 45, 46, 45, 45, 46, 49, 49, 53, 53, 55, 56,
+        57, 58, 58, 59, 59, 61, 61, 62, 62, 63, 64, 64, 65, 65, 67, 67, 52, 50,
+        50, 48, 48, 48, 47, 47, 47, 47, 50, 50, 53, 54, 56, 57, 59, 61, 62, 64,
+        64, 66, 66, 68, 68, 69, 70, 71, 72, 73, 75, 75, 56, 54, 53, 52, 51, 51,
+        50, 50, 49, 49, 53, 53, 55, 56, 58, 59, 61, 63, 64, 66, 66, 69, 70, 71,
+        72, 74, 75, 76, 77, 78, 80, 80, 61, 58, 57, 56, 55, 54, 54, 53, 52, 53,
+        56, 56, 58, 59, 61, 62, 63, 66, 66, 69, 69, 72, 73, 75, 76, 78, 79, 80,
+        82, 83, 86, 86 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 33, 42, 55, 33, 38, 46, 57, 42, 46, 63, 75, 55, 57, 75, 92,
+        /* Size 8x8 */
+        31, 32, 32, 34, 38, 46, 52, 63, 32, 32, 32, 34, 37, 44, 49, 59, 32, 32,
+        35, 37, 40, 45, 49, 58, 34, 34, 37, 42, 47, 52, 56, 65, 38, 37, 40, 47,
+        54, 60, 65, 73, 46, 44, 45, 52, 60, 69, 75, 84, 52, 49, 49, 56, 65, 75,
+        82, 92, 63, 59, 58, 65, 73, 84, 92, 105,
+        /* Size 16x16 */
+        32, 31, 31, 31, 32, 32, 34, 36, 38, 41, 44, 48, 54, 58, 61, 65, 31, 32,
+        32, 32, 32, 32, 34, 35, 38, 40, 42, 46, 51, 55, 58, 62, 31, 32, 32, 32,
+        32, 32, 33, 34, 37, 38, 41, 44, 49, 53, 56, 59, 31, 32, 32, 33, 33, 33,
+        35, 36, 38, 40, 42, 45, 49, 53, 56, 59, 32, 32, 32, 33, 34, 34, 36, 37,
+        39, 40, 42, 45, 49, 53, 55, 59, 32, 32, 32, 33, 34, 35, 37, 38, 40, 41,
+        42, 46, 49, 52, 55, 58, 34, 34, 33, 35, 36, 37, 39, 42, 44, 46, 47, 51,
+        54, 57, 60, 63, 36, 35, 34, 36, 37, 38, 42, 48, 50, 52, 54, 57, 60, 63,
+        65, 68, 38, 38, 37, 38, 39, 40, 44, 50, 52, 54, 57, 60, 64, 67, 69, 72,
+        41, 40, 38, 40, 40, 41, 46, 52, 54, 57, 60, 63, 67, 70, 73, 75, 44, 42,
+        41, 42, 42, 42, 47, 54, 57, 60, 63, 67, 71, 74, 77, 79, 48, 46, 44, 45,
+        45, 46, 51, 57, 60, 63, 67, 71, 76, 79, 82, 85, 54, 51, 49, 49, 49, 49,
+        54, 60, 64, 67, 71, 76, 82, 86, 89, 92, 58, 55, 53, 53, 53, 52, 57, 63,
+        67, 70, 74, 79, 86, 90, 93, 97, 61, 58, 56, 56, 55, 55, 60, 65, 69, 73,
+        77, 82, 89, 93, 97, 101, 65, 62, 59, 59, 59, 58, 63, 68, 72, 75, 79, 85,
+        92, 97, 101, 105,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 33, 34, 34, 36, 36, 38, 39,
+        41, 44, 44, 47, 48, 50, 54, 54, 58, 59, 61, 65, 65, 70, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 38, 38, 40, 42, 42, 46,
+        47, 49, 52, 52, 56, 57, 59, 63, 63, 67, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 34, 34, 35, 35, 38, 38, 40, 42, 42, 45, 46, 48, 51, 51,
+        55, 56, 58, 62, 62, 67, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        34, 34, 35, 35, 37, 38, 39, 42, 42, 45, 45, 47, 50, 50, 54, 55, 57, 61,
+        61, 65, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34,
+        37, 37, 38, 41, 41, 44, 44, 46, 49, 49, 53, 54, 56, 59, 59, 64, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 37, 37, 38, 41,
+        41, 44, 44, 46, 49, 49, 53, 54, 56, 59, 59, 64, 31, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 33, 34, 35, 35, 36, 36, 38, 39, 40, 42, 42, 44, 45, 47,
+        49, 49, 53, 54, 56, 59, 59, 63, 31, 32, 32, 32, 32, 32, 33, 33, 33, 34,
+        34, 35, 35, 36, 36, 36, 38, 39, 40, 42, 42, 45, 45, 47, 50, 50, 53, 54,
+        56, 59, 59, 63, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 35, 36, 36,
+        37, 37, 39, 39, 40, 42, 42, 45, 45, 47, 49, 49, 53, 54, 55, 59, 59, 63,
+        32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 36, 37, 37, 38, 38, 40, 40,
+        41, 42, 42, 45, 46, 47, 49, 49, 52, 53, 55, 58, 58, 62, 32, 32, 32, 32,
+        32, 32, 33, 34, 34, 35, 35, 36, 37, 37, 38, 38, 40, 40, 41, 42, 42, 45,
+        46, 47, 49, 49, 52, 53, 55, 58, 58, 62, 33, 33, 33, 33, 33, 33, 34, 35,
+        35, 36, 36, 38, 39, 40, 42, 42, 43, 44, 45, 46, 46, 49, 50, 51, 53, 53,
+        56, 57, 59, 62, 62, 66, 34, 34, 34, 34, 33, 33, 35, 35, 36, 37, 37, 39,
+        39, 41, 42, 42, 44, 45, 46, 47, 47, 50, 51, 52, 54, 54, 57, 58, 60, 63,
+        63, 67, 34, 34, 34, 34, 34, 34, 35, 36, 36, 37, 37, 40, 41, 42, 45, 45,
+        46, 47, 48, 50, 50, 52, 53, 54, 56, 56, 59, 60, 62, 65, 65, 69, 36, 35,
+        35, 35, 34, 34, 36, 36, 37, 38, 38, 42, 42, 45, 48, 48, 50, 50, 52, 54,
+        54, 56, 57, 58, 60, 60, 63, 64, 65, 68, 68, 72, 36, 35, 35, 35, 34, 34,
+        36, 36, 37, 38, 38, 42, 42, 45, 48, 48, 50, 50, 52, 54, 54, 56, 57, 58,
+        60, 60, 63, 64, 65, 68, 68, 72, 38, 38, 38, 37, 37, 37, 38, 38, 39, 40,
+        40, 43, 44, 46, 50, 50, 52, 53, 54, 57, 57, 59, 60, 61, 64, 64, 67, 68,
+        69, 72, 72, 76, 39, 38, 38, 38, 37, 37, 39, 39, 39, 40, 40, 44, 45, 47,
+        50, 50, 53, 54, 55, 58, 58, 60, 61, 62, 65, 65, 68, 69, 70, 73, 73, 77,
+        41, 40, 40, 39, 38, 38, 40, 40, 40, 41, 41, 45, 46, 48, 52, 52, 54, 55,
+        57, 60, 60, 62, 63, 65, 67, 67, 70, 71, 73, 75, 75, 79, 44, 42, 42, 42,
+        41, 41, 42, 42, 42, 42, 42, 46, 47, 50, 54, 54, 57, 58, 60, 63, 63, 66,
+        67, 68, 71, 71, 74, 75, 77, 79, 79, 83, 44, 42, 42, 42, 41, 41, 42, 42,
+        42, 42, 42, 46, 47, 50, 54, 54, 57, 58, 60, 63, 63, 66, 67, 68, 71, 71,
+        74, 75, 77, 79, 79, 83, 47, 46, 45, 45, 44, 44, 44, 45, 45, 45, 45, 49,
+        50, 52, 56, 56, 59, 60, 62, 66, 66, 69, 70, 72, 75, 75, 78, 79, 81, 84,
+        84, 88, 48, 47, 46, 45, 44, 44, 45, 45, 45, 46, 46, 50, 51, 53, 57, 57,
+        60, 61, 63, 67, 67, 70, 71, 73, 76, 76, 79, 80, 82, 85, 85, 89, 50, 49,
+        48, 47, 46, 46, 47, 47, 47, 47, 47, 51, 52, 54, 58, 58, 61, 62, 65, 68,
+        68, 72, 73, 75, 78, 78, 82, 83, 85, 88, 88, 92, 54, 52, 51, 50, 49, 49,
+        49, 50, 49, 49, 49, 53, 54, 56, 60, 60, 64, 65, 67, 71, 71, 75, 76, 78,
+        82, 82, 86, 87, 89, 92, 92, 96, 54, 52, 51, 50, 49, 49, 49, 50, 49, 49,
+        49, 53, 54, 56, 60, 60, 64, 65, 67, 71, 71, 75, 76, 78, 82, 82, 86, 87,
+        89, 92, 92, 96, 58, 56, 55, 54, 53, 53, 53, 53, 53, 52, 52, 56, 57, 59,
+        63, 63, 67, 68, 70, 74, 74, 78, 79, 82, 86, 86, 90, 91, 93, 97, 97, 101,
+        59, 57, 56, 55, 54, 54, 54, 54, 54, 53, 53, 57, 58, 60, 64, 64, 68, 69,
+        71, 75, 75, 79, 80, 83, 87, 87, 91, 92, 94, 98, 98, 102, 61, 59, 58, 57,
+        56, 56, 56, 56, 55, 55, 55, 59, 60, 62, 65, 65, 69, 70, 73, 77, 77, 81,
+        82, 85, 89, 89, 93, 94, 97, 101, 101, 105, 65, 63, 62, 61, 59, 59, 59,
+        59, 59, 58, 58, 62, 63, 65, 68, 68, 72, 73, 75, 79, 79, 84, 85, 88, 92,
+        92, 97, 98, 101, 105, 105, 109, 65, 63, 62, 61, 59, 59, 59, 59, 59, 58,
+        58, 62, 63, 65, 68, 68, 72, 73, 75, 79, 79, 84, 85, 88, 92, 92, 97, 98,
+        101, 105, 105, 109, 70, 67, 67, 65, 64, 64, 63, 63, 63, 62, 62, 66, 67,
+        69, 72, 72, 76, 77, 79, 83, 83, 88, 89, 92, 96, 96, 101, 102, 105, 109,
+        109, 114,
+        /* Size 4x8 */
+        32, 32, 42, 56, 32, 33, 41, 53, 32, 35, 42, 52, 34, 37, 50, 59, 38, 40,
+        58, 68, 44, 45, 66, 78, 50, 50, 71, 86, 61, 58, 79, 97,
+        /* Size 8x4 */
+        32, 32, 32, 34, 38, 44, 50, 61, 32, 33, 35, 37, 40, 45, 50, 58, 42, 41,
+        42, 50, 58, 66, 71, 79, 56, 53, 52, 59, 68, 78, 86, 97,
+        /* Size 8x16 */
+        32, 31, 32, 35, 39, 44, 53, 65, 31, 32, 32, 35, 38, 42, 51, 62, 31, 32,
+        33, 34, 37, 41, 49, 59, 31, 32, 34, 35, 38, 42, 49, 59, 32, 32, 34, 36,
+        39, 42, 49, 58, 32, 33, 35, 37, 40, 42, 49, 58, 34, 34, 37, 41, 44, 48,
+        54, 63, 36, 34, 38, 46, 50, 54, 60, 68, 38, 37, 40, 47, 52, 57, 64, 72,
+        41, 39, 41, 49, 54, 60, 67, 76, 44, 41, 43, 51, 57, 63, 71, 79, 48, 45,
+        46, 54, 60, 67, 76, 85, 53, 49, 50, 57, 64, 71, 82, 92, 57, 53, 53, 60,
+        67, 74, 86, 97, 61, 56, 56, 63, 69, 77, 89, 100, 65, 60, 58, 66, 72, 79,
+        92, 105,
+        /* Size 16x8 */
+        32, 31, 31, 31, 32, 32, 34, 36, 38, 41, 44, 48, 53, 57, 61, 65, 31, 32,
+        32, 32, 32, 33, 34, 34, 37, 39, 41, 45, 49, 53, 56, 60, 32, 32, 33, 34,
+        34, 35, 37, 38, 40, 41, 43, 46, 50, 53, 56, 58, 35, 35, 34, 35, 36, 37,
+        41, 46, 47, 49, 51, 54, 57, 60, 63, 66, 39, 38, 37, 38, 39, 40, 44, 50,
+        52, 54, 57, 60, 64, 67, 69, 72, 44, 42, 41, 42, 42, 42, 48, 54, 57, 60,
+        63, 67, 71, 74, 77, 79, 53, 51, 49, 49, 49, 49, 54, 60, 64, 67, 71, 76,
+        82, 86, 89, 92, 65, 62, 59, 59, 58, 58, 63, 68, 72, 76, 79, 85, 92, 97,
+        100, 105,
+        /* Size 16x32 */
+        32, 31, 31, 31, 32, 32, 35, 36, 39, 44, 44, 51, 53, 58, 65, 65, 31, 32,
+        32, 32, 32, 32, 35, 35, 38, 42, 42, 49, 52, 56, 63, 63, 31, 32, 32, 32,
+        32, 32, 35, 35, 38, 42, 42, 49, 51, 55, 62, 62, 31, 32, 32, 32, 32, 32,
+        34, 35, 37, 41, 41, 48, 50, 54, 61, 61, 31, 32, 32, 32, 33, 33, 34, 34,
+        37, 41, 41, 47, 49, 53, 59, 59, 31, 32, 32, 32, 33, 33, 34, 34, 37, 41,
+        41, 47, 49, 53, 59, 59, 31, 32, 32, 33, 34, 34, 35, 36, 38, 42, 42, 48,
+        49, 53, 59, 59, 32, 32, 32, 33, 34, 34, 36, 36, 38, 42, 42, 48, 50, 53,
+        59, 59, 32, 32, 32, 33, 34, 34, 36, 37, 39, 42, 42, 48, 49, 53, 58, 58,
+        32, 32, 33, 34, 35, 35, 37, 38, 40, 42, 42, 48, 49, 52, 58, 58, 32, 32,
+        33, 34, 35, 35, 37, 38, 40, 42, 42, 48, 49, 52, 58, 58, 33, 33, 33, 35,
+        36, 36, 40, 41, 43, 46, 46, 52, 53, 56, 62, 62, 34, 34, 34, 35, 37, 37,
+        41, 42, 44, 48, 48, 53, 54, 57, 63, 63, 34, 34, 34, 35, 37, 37, 43, 44,
+        46, 50, 50, 55, 56, 59, 65, 65, 36, 35, 34, 36, 38, 38, 46, 48, 50, 54,
+        54, 58, 60, 63, 68, 68, 36, 35, 34, 36, 38, 38, 46, 48, 50, 54, 54, 58,
+        60, 63, 68, 68, 38, 37, 37, 38, 40, 40, 47, 50, 52, 57, 57, 62, 64, 67,
+        72, 72, 39, 38, 37, 39, 40, 40, 48, 50, 53, 58, 58, 63, 65, 68, 73, 73,
+        41, 39, 39, 40, 41, 41, 49, 51, 54, 60, 60, 66, 67, 70, 76, 76, 44, 41,
+        41, 42, 43, 43, 51, 53, 57, 63, 63, 69, 71, 74, 79, 79, 44, 41, 41, 42,
+        43, 43, 51, 53, 57, 63, 63, 69, 71, 74, 79, 79, 47, 44, 44, 44, 45, 45,
+        53, 56, 59, 66, 66, 73, 75, 78, 84, 84, 48, 45, 45, 45, 46, 46, 54, 56,
+        60, 67, 67, 74, 76, 79, 85, 85, 50, 47, 46, 47, 47, 47, 55, 58, 61, 68,
+        68, 76, 78, 82, 88, 88, 53, 50, 49, 50, 50, 50, 57, 60, 64, 71, 71, 79,
+        82, 86, 92, 92, 53, 50, 49, 50, 50, 50, 57, 60, 64, 71, 71, 79, 82, 86,
+        92, 92, 57, 54, 53, 53, 53, 53, 60, 63, 67, 74, 74, 83, 86, 90, 97, 97,
+        58, 55, 54, 54, 54, 54, 61, 63, 68, 75, 75, 84, 87, 91, 98, 98, 61, 57,
+        56, 56, 56, 56, 63, 65, 69, 77, 77, 86, 89, 93, 100, 100, 65, 61, 60,
+        59, 58, 58, 66, 68, 72, 79, 79, 89, 92, 97, 105, 105, 65, 61, 60, 59,
+        58, 58, 66, 68, 72, 79, 79, 89, 92, 97, 105, 105, 70, 65, 64, 63, 62,
+        62, 70, 72, 76, 83, 83, 93, 96, 101, 109, 109,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 34, 34, 36, 36, 38, 39,
+        41, 44, 44, 47, 48, 50, 53, 53, 57, 58, 61, 65, 65, 70, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 37, 38, 39, 41, 41, 44,
+        45, 47, 50, 50, 54, 55, 57, 61, 61, 65, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 34, 34, 34, 34, 37, 37, 39, 41, 41, 44, 45, 46, 49, 49,
+        53, 54, 56, 60, 60, 64, 31, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 35,
+        35, 35, 36, 36, 38, 39, 40, 42, 42, 44, 45, 47, 50, 50, 53, 54, 56, 59,
+        59, 63, 32, 32, 32, 32, 33, 33, 34, 34, 34, 35, 35, 36, 37, 37, 38, 38,
+        40, 40, 41, 43, 43, 45, 46, 47, 50, 50, 53, 54, 56, 58, 58, 62, 32, 32,
+        32, 32, 33, 33, 34, 34, 34, 35, 35, 36, 37, 37, 38, 38, 40, 40, 41, 43,
+        43, 45, 46, 47, 50, 50, 53, 54, 56, 58, 58, 62, 35, 35, 35, 34, 34, 34,
+        35, 36, 36, 37, 37, 40, 41, 43, 46, 46, 47, 48, 49, 51, 51, 53, 54, 55,
+        57, 57, 60, 61, 63, 66, 66, 70, 36, 35, 35, 35, 34, 34, 36, 36, 37, 38,
+        38, 41, 42, 44, 48, 48, 50, 50, 51, 53, 53, 56, 56, 58, 60, 60, 63, 63,
+        65, 68, 68, 72, 39, 38, 38, 37, 37, 37, 38, 38, 39, 40, 40, 43, 44, 46,
+        50, 50, 52, 53, 54, 57, 57, 59, 60, 61, 64, 64, 67, 68, 69, 72, 72, 76,
+        44, 42, 42, 41, 41, 41, 42, 42, 42, 42, 42, 46, 48, 50, 54, 54, 57, 58,
+        60, 63, 63, 66, 67, 68, 71, 71, 74, 75, 77, 79, 79, 83, 44, 42, 42, 41,
+        41, 41, 42, 42, 42, 42, 42, 46, 48, 50, 54, 54, 57, 58, 60, 63, 63, 66,
+        67, 68, 71, 71, 74, 75, 77, 79, 79, 83, 51, 49, 49, 48, 47, 47, 48, 48,
+        48, 48, 48, 52, 53, 55, 58, 58, 62, 63, 66, 69, 69, 73, 74, 76, 79, 79,
+        83, 84, 86, 89, 89, 93, 53, 52, 51, 50, 49, 49, 49, 50, 49, 49, 49, 53,
+        54, 56, 60, 60, 64, 65, 67, 71, 71, 75, 76, 78, 82, 82, 86, 87, 89, 92,
+        92, 96, 58, 56, 55, 54, 53, 53, 53, 53, 53, 52, 52, 56, 57, 59, 63, 63,
+        67, 68, 70, 74, 74, 78, 79, 82, 86, 86, 90, 91, 93, 97, 97, 101, 65, 63,
+        62, 61, 59, 59, 59, 59, 58, 58, 58, 62, 63, 65, 68, 68, 72, 73, 76, 79,
+        79, 84, 85, 88, 92, 92, 97, 98, 100, 105, 105, 109, 65, 63, 62, 61, 59,
+        59, 59, 59, 58, 58, 58, 62, 63, 65, 68, 68, 72, 73, 76, 79, 79, 84, 85,
+        88, 92, 92, 97, 98, 100, 105, 105, 109,
+        /* Size 4x16 */
+        31, 32, 44, 58, 32, 32, 42, 55, 32, 33, 41, 53, 32, 34, 42, 53, 32, 34,
+        42, 53, 32, 35, 42, 52, 34, 37, 48, 57, 35, 38, 54, 63, 37, 40, 57, 67,
+        39, 41, 60, 70, 41, 43, 63, 74, 45, 46, 67, 79, 50, 50, 71, 86, 54, 53,
+        74, 90, 57, 56, 77, 93, 61, 58, 79, 97,
+        /* Size 16x4 */
+        31, 32, 32, 32, 32, 32, 34, 35, 37, 39, 41, 45, 50, 54, 57, 61, 32, 32,
+        33, 34, 34, 35, 37, 38, 40, 41, 43, 46, 50, 53, 56, 58, 44, 42, 41, 42,
+        42, 42, 48, 54, 57, 60, 63, 67, 71, 74, 77, 79, 58, 55, 53, 53, 53, 52,
+        57, 63, 67, 70, 74, 79, 86, 90, 93, 97,
+        /* Size 8x32 */
+        32, 31, 32, 35, 39, 44, 53, 65, 31, 32, 32, 35, 38, 42, 52, 63, 31, 32,
+        32, 35, 38, 42, 51, 62, 31, 32, 32, 34, 37, 41, 50, 61, 31, 32, 33, 34,
+        37, 41, 49, 59, 31, 32, 33, 34, 37, 41, 49, 59, 31, 32, 34, 35, 38, 42,
+        49, 59, 32, 32, 34, 36, 38, 42, 50, 59, 32, 32, 34, 36, 39, 42, 49, 58,
+        32, 33, 35, 37, 40, 42, 49, 58, 32, 33, 35, 37, 40, 42, 49, 58, 33, 33,
+        36, 40, 43, 46, 53, 62, 34, 34, 37, 41, 44, 48, 54, 63, 34, 34, 37, 43,
+        46, 50, 56, 65, 36, 34, 38, 46, 50, 54, 60, 68, 36, 34, 38, 46, 50, 54,
+        60, 68, 38, 37, 40, 47, 52, 57, 64, 72, 39, 37, 40, 48, 53, 58, 65, 73,
+        41, 39, 41, 49, 54, 60, 67, 76, 44, 41, 43, 51, 57, 63, 71, 79, 44, 41,
+        43, 51, 57, 63, 71, 79, 47, 44, 45, 53, 59, 66, 75, 84, 48, 45, 46, 54,
+        60, 67, 76, 85, 50, 46, 47, 55, 61, 68, 78, 88, 53, 49, 50, 57, 64, 71,
+        82, 92, 53, 49, 50, 57, 64, 71, 82, 92, 57, 53, 53, 60, 67, 74, 86, 97,
+        58, 54, 54, 61, 68, 75, 87, 98, 61, 56, 56, 63, 69, 77, 89, 100, 65, 60,
+        58, 66, 72, 79, 92, 105, 65, 60, 58, 66, 72, 79, 92, 105, 70, 64, 62,
+        70, 76, 83, 96, 109,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 34, 34, 36, 36, 38, 39,
+        41, 44, 44, 47, 48, 50, 53, 53, 57, 58, 61, 65, 65, 70, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 37, 37, 39, 41, 41, 44,
+        45, 46, 49, 49, 53, 54, 56, 60, 60, 64, 32, 32, 32, 32, 33, 33, 34, 34,
+        34, 35, 35, 36, 37, 37, 38, 38, 40, 40, 41, 43, 43, 45, 46, 47, 50, 50,
+        53, 54, 56, 58, 58, 62, 35, 35, 35, 34, 34, 34, 35, 36, 36, 37, 37, 40,
+        41, 43, 46, 46, 47, 48, 49, 51, 51, 53, 54, 55, 57, 57, 60, 61, 63, 66,
+        66, 70, 39, 38, 38, 37, 37, 37, 38, 38, 39, 40, 40, 43, 44, 46, 50, 50,
+        52, 53, 54, 57, 57, 59, 60, 61, 64, 64, 67, 68, 69, 72, 72, 76, 44, 42,
+        42, 41, 41, 41, 42, 42, 42, 42, 42, 46, 48, 50, 54, 54, 57, 58, 60, 63,
+        63, 66, 67, 68, 71, 71, 74, 75, 77, 79, 79, 83, 53, 52, 51, 50, 49, 49,
+        49, 50, 49, 49, 49, 53, 54, 56, 60, 60, 64, 65, 67, 71, 71, 75, 76, 78,
+        82, 82, 86, 87, 89, 92, 92, 96, 65, 63, 62, 61, 59, 59, 59, 59, 58, 58,
+        58, 62, 63, 65, 68, 68, 72, 73, 76, 79, 79, 84, 85, 88, 92, 92, 97, 98,
+        100, 105, 105, 109 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 41, 46, 51, 41, 48, 48, 51, 46, 48, 58, 62, 51, 51, 62, 71,
+        /* Size 8x8 */
+        31, 31, 38, 44, 47, 48, 50, 55, 31, 32, 40, 44, 45, 46, 47, 52, 38, 40,
+        47, 47, 46, 46, 47, 50, 44, 44, 47, 50, 51, 51, 52, 54, 47, 45, 46, 51,
+        54, 56, 57, 60, 48, 46, 46, 51, 56, 61, 63, 66, 50, 47, 47, 52, 57, 63,
+        66, 70, 55, 52, 50, 54, 60, 66, 70, 76,
+        /* Size 16x16 */
+        32, 31, 30, 33, 34, 36, 41, 49, 48, 49, 49, 50, 52, 54, 55, 57, 31, 31,
+        31, 34, 36, 38, 42, 47, 47, 47, 47, 48, 50, 51, 53, 54, 30, 31, 32, 34,
+        37, 40, 42, 46, 45, 45, 45, 46, 47, 49, 50, 52, 33, 34, 34, 37, 40, 42,
+        44, 47, 46, 46, 45, 46, 47, 49, 50, 51, 34, 36, 37, 40, 42, 45, 46, 47,
+        46, 46, 45, 46, 47, 48, 49, 50, 36, 38, 40, 42, 45, 47, 47, 48, 47, 46,
+        45, 46, 47, 48, 49, 50, 41, 42, 42, 44, 46, 47, 48, 50, 50, 49, 49, 50,
+        50, 51, 52, 53, 49, 47, 46, 47, 47, 48, 50, 53, 53, 53, 53, 54, 54, 55,
+        56, 56, 48, 47, 45, 46, 46, 47, 50, 53, 54, 54, 55, 56, 57, 58, 58, 59,
+        49, 47, 45, 46, 46, 46, 49, 53, 54, 55, 57, 58, 59, 60, 60, 61, 49, 47,
+        45, 45, 45, 45, 49, 53, 55, 57, 58, 60, 61, 62, 63, 63, 50, 48, 46, 46,
+        46, 46, 50, 54, 56, 58, 60, 61, 63, 65, 66, 67, 52, 50, 47, 47, 47, 47,
+        50, 54, 57, 59, 61, 63, 66, 68, 69, 70, 54, 51, 49, 49, 48, 48, 51, 55,
+        58, 60, 62, 65, 68, 70, 71, 73, 55, 53, 50, 50, 49, 49, 52, 56, 58, 60,
+        63, 66, 69, 71, 73, 74, 57, 54, 52, 51, 50, 50, 53, 56, 59, 61, 63, 67,
+        70, 73, 74, 76,
+        /* Size 32x32 */
+        32, 31, 31, 31, 30, 30, 33, 33, 34, 36, 36, 40, 41, 44, 49, 49, 48, 48,
+        49, 49, 49, 50, 50, 51, 52, 52, 54, 54, 55, 57, 57, 59, 31, 31, 31, 31,
+        31, 31, 33, 34, 36, 38, 38, 41, 42, 44, 48, 48, 47, 47, 47, 47, 47, 48,
+        49, 49, 50, 50, 52, 52, 53, 55, 55, 57, 31, 31, 31, 31, 31, 31, 34, 34,
+        36, 38, 38, 41, 42, 44, 47, 47, 47, 47, 47, 47, 47, 48, 48, 49, 50, 50,
+        51, 52, 53, 54, 54, 56, 31, 31, 31, 31, 31, 31, 34, 35, 36, 39, 39, 41,
+        42, 44, 47, 47, 46, 46, 46, 46, 46, 47, 47, 48, 49, 49, 50, 51, 52, 53,
+        53, 55, 30, 31, 31, 31, 32, 32, 34, 35, 37, 40, 40, 42, 42, 44, 46, 46,
+        45, 45, 45, 45, 45, 46, 46, 47, 47, 47, 49, 49, 50, 52, 52, 54, 30, 31,
+        31, 31, 32, 32, 34, 35, 37, 40, 40, 42, 42, 44, 46, 46, 45, 45, 45, 45,
+        45, 46, 46, 47, 47, 47, 49, 49, 50, 52, 52, 54, 33, 33, 34, 34, 34, 34,
+        37, 38, 40, 42, 42, 44, 44, 45, 47, 47, 46, 46, 46, 45, 45, 46, 46, 47,
+        47, 47, 49, 49, 50, 51, 51, 53, 33, 34, 34, 35, 35, 35, 38, 39, 40, 43,
+        43, 44, 45, 46, 47, 47, 46, 46, 46, 45, 45, 46, 46, 47, 47, 47, 49, 49,
+        50, 51, 51, 53, 34, 36, 36, 36, 37, 37, 40, 40, 42, 45, 45, 45, 46, 46,
+        47, 47, 46, 46, 46, 45, 45, 46, 46, 47, 47, 47, 48, 49, 49, 50, 50, 52,
+        36, 38, 38, 39, 40, 40, 42, 43, 45, 47, 47, 47, 47, 47, 48, 48, 47, 46,
+        46, 45, 45, 46, 46, 46, 47, 47, 48, 48, 49, 50, 50, 51, 36, 38, 38, 39,
+        40, 40, 42, 43, 45, 47, 47, 47, 47, 47, 48, 48, 47, 46, 46, 45, 45, 46,
+        46, 46, 47, 47, 48, 48, 49, 50, 50, 51, 40, 41, 41, 41, 42, 42, 44, 44,
+        45, 47, 47, 48, 48, 49, 50, 50, 49, 49, 49, 48, 48, 49, 49, 49, 49, 49,
+        51, 51, 51, 52, 52, 54, 41, 42, 42, 42, 42, 42, 44, 45, 46, 47, 47, 48,
+        48, 49, 50, 50, 50, 49, 49, 49, 49, 50, 50, 50, 50, 50, 51, 52, 52, 53,
+        53, 55, 44, 44, 44, 44, 44, 44, 45, 46, 46, 47, 47, 49, 49, 50, 51, 51,
+        51, 51, 51, 51, 51, 51, 51, 51, 52, 52, 53, 53, 54, 54, 54, 56, 49, 48,
+        47, 47, 46, 46, 47, 47, 47, 48, 48, 50, 50, 51, 53, 53, 53, 53, 53, 53,
+        53, 54, 54, 54, 54, 54, 55, 55, 56, 56, 56, 58, 49, 48, 47, 47, 46, 46,
+        47, 47, 47, 48, 48, 50, 50, 51, 53, 53, 53, 53, 53, 53, 53, 54, 54, 54,
+        54, 54, 55, 55, 56, 56, 56, 58, 48, 47, 47, 46, 45, 45, 46, 46, 46, 47,
+        47, 49, 50, 51, 53, 53, 54, 54, 54, 55, 55, 56, 56, 56, 57, 57, 58, 58,
+        58, 59, 59, 60, 48, 47, 47, 46, 45, 45, 46, 46, 46, 46, 46, 49, 49, 51,
+        53, 53, 54, 54, 55, 55, 55, 56, 56, 57, 57, 57, 58, 58, 59, 60, 60, 61,
+        49, 47, 47, 46, 45, 45, 46, 46, 46, 46, 46, 49, 49, 51, 53, 53, 54, 55,
+        55, 57, 57, 57, 58, 58, 59, 59, 60, 60, 60, 61, 61, 63, 49, 47, 47, 46,
+        45, 45, 45, 45, 45, 45, 45, 48, 49, 51, 53, 53, 55, 55, 57, 58, 58, 59,
+        60, 60, 61, 61, 62, 62, 63, 63, 63, 65, 49, 47, 47, 46, 45, 45, 45, 45,
+        45, 45, 45, 48, 49, 51, 53, 53, 55, 55, 57, 58, 58, 59, 60, 60, 61, 61,
+        62, 62, 63, 63, 63, 65, 50, 48, 48, 47, 46, 46, 46, 46, 46, 46, 46, 49,
+        50, 51, 54, 54, 56, 56, 57, 59, 59, 61, 61, 62, 63, 63, 64, 64, 65, 66,
+        66, 67, 50, 49, 48, 47, 46, 46, 46, 46, 46, 46, 46, 49, 50, 51, 54, 54,
+        56, 56, 58, 60, 60, 61, 61, 62, 63, 63, 65, 65, 66, 67, 67, 68, 51, 49,
+        49, 48, 47, 47, 47, 47, 47, 46, 46, 49, 50, 51, 54, 54, 56, 57, 58, 60,
+        60, 62, 62, 63, 65, 65, 66, 66, 67, 68, 68, 70, 52, 50, 50, 49, 47, 47,
+        47, 47, 47, 47, 47, 49, 50, 52, 54, 54, 57, 57, 59, 61, 61, 63, 63, 65,
+        66, 66, 68, 68, 69, 70, 70, 72, 52, 50, 50, 49, 47, 47, 47, 47, 47, 47,
+        47, 49, 50, 52, 54, 54, 57, 57, 59, 61, 61, 63, 63, 65, 66, 66, 68, 68,
+        69, 70, 70, 72, 54, 52, 51, 50, 49, 49, 49, 49, 48, 48, 48, 51, 51, 53,
+        55, 55, 58, 58, 60, 62, 62, 64, 65, 66, 68, 68, 70, 70, 71, 73, 73, 74,
+        54, 52, 52, 51, 49, 49, 49, 49, 49, 48, 48, 51, 52, 53, 55, 55, 58, 58,
+        60, 62, 62, 64, 65, 66, 68, 68, 70, 71, 72, 73, 73, 75, 55, 53, 53, 52,
+        50, 50, 50, 50, 49, 49, 49, 51, 52, 54, 56, 56, 58, 59, 60, 63, 63, 65,
+        66, 67, 69, 69, 71, 72, 73, 74, 74, 76, 57, 55, 54, 53, 52, 52, 51, 51,
+        50, 50, 50, 52, 53, 54, 56, 56, 59, 60, 61, 63, 63, 66, 67, 68, 70, 70,
+        73, 73, 74, 76, 76, 78, 57, 55, 54, 53, 52, 52, 51, 51, 50, 50, 50, 52,
+        53, 54, 56, 56, 59, 60, 61, 63, 63, 66, 67, 68, 70, 70, 73, 73, 74, 76,
+        76, 78, 59, 57, 56, 55, 54, 54, 53, 53, 52, 51, 51, 54, 55, 56, 58, 58,
+        60, 61, 63, 65, 65, 67, 68, 70, 72, 72, 74, 75, 76, 78, 78, 80,
+        /* Size 4x8 */
+        31, 38, 47, 52, 32, 40, 45, 49, 39, 47, 45, 48, 44, 47, 51, 53, 46, 47,
+        56, 58, 47, 46, 59, 64, 48, 47, 61, 68, 53, 50, 64, 73,
+        /* Size 8x4 */
+        31, 32, 39, 44, 46, 47, 48, 53, 38, 40, 47, 47, 47, 46, 47, 50, 47, 45,
+        45, 51, 56, 59, 61, 64, 52, 49, 48, 53, 58, 64, 68, 73,
+        /* Size 8x16 */
+        32, 31, 37, 45, 48, 49, 52, 57, 31, 31, 38, 45, 47, 47, 50, 54, 30, 32,
+        40, 44, 45, 45, 48, 52, 33, 35, 42, 46, 46, 45, 47, 51, 35, 37, 44, 46,
+        46, 45, 47, 51, 37, 40, 47, 47, 47, 45, 47, 50, 42, 43, 47, 49, 50, 49,
+        50, 53, 49, 46, 48, 52, 53, 53, 54, 57, 48, 46, 47, 51, 54, 55, 57, 59,
+        48, 45, 46, 51, 54, 57, 59, 61, 49, 45, 46, 51, 55, 58, 61, 64, 50, 46,
+        46, 52, 56, 59, 64, 67, 52, 48, 47, 53, 57, 61, 66, 71, 54, 49, 48, 54,
+        58, 62, 68, 73, 55, 51, 49, 54, 58, 63, 69, 74, 57, 52, 50, 55, 59, 64,
+        70, 76,
+        /* Size 16x8 */
+        32, 31, 30, 33, 35, 37, 42, 49, 48, 48, 49, 50, 52, 54, 55, 57, 31, 31,
+        32, 35, 37, 40, 43, 46, 46, 45, 45, 46, 48, 49, 51, 52, 37, 38, 40, 42,
+        44, 47, 47, 48, 47, 46, 46, 46, 47, 48, 49, 50, 45, 45, 44, 46, 46, 47,
+        49, 52, 51, 51, 51, 52, 53, 54, 54, 55, 48, 47, 45, 46, 46, 47, 50, 53,
+        54, 54, 55, 56, 57, 58, 58, 59, 49, 47, 45, 45, 45, 45, 49, 53, 55, 57,
+        58, 59, 61, 62, 63, 64, 52, 50, 48, 47, 47, 47, 50, 54, 57, 59, 61, 64,
+        66, 68, 69, 70, 57, 54, 52, 51, 51, 50, 53, 57, 59, 61, 64, 67, 71, 73,
+        74, 76,
+        /* Size 16x32 */
+        32, 31, 31, 33, 37, 37, 45, 48, 48, 49, 49, 51, 52, 54, 57, 57, 31, 31,
+        31, 34, 38, 38, 45, 47, 47, 47, 47, 50, 50, 52, 55, 55, 31, 31, 31, 34,
+        38, 38, 45, 47, 47, 47, 47, 49, 50, 51, 54, 54, 31, 31, 32, 34, 39, 39,
+        45, 46, 46, 46, 46, 48, 49, 51, 53, 53, 30, 32, 32, 35, 40, 40, 44, 46,
+        45, 45, 45, 47, 48, 49, 52, 52, 30, 32, 32, 35, 40, 40, 44, 46, 45, 45,
+        45, 47, 48, 49, 52, 52, 33, 34, 35, 37, 42, 42, 46, 47, 46, 45, 45, 47,
+        47, 49, 51, 51, 33, 35, 36, 38, 43, 43, 46, 47, 46, 46, 46, 47, 47, 49,
+        51, 51, 35, 37, 37, 40, 44, 44, 46, 47, 46, 45, 45, 47, 47, 48, 51, 51,
+        37, 39, 40, 43, 47, 47, 47, 47, 47, 45, 45, 46, 47, 48, 50, 50, 37, 39,
+        40, 43, 47, 47, 47, 47, 47, 45, 45, 46, 47, 48, 50, 50, 41, 42, 42, 44,
+        47, 47, 49, 49, 49, 48, 48, 49, 50, 51, 52, 52, 42, 42, 43, 44, 47, 47,
+        49, 50, 50, 49, 49, 50, 50, 51, 53, 53, 44, 44, 44, 45, 47, 47, 50, 51,
+        51, 51, 51, 52, 52, 53, 54, 54, 49, 47, 46, 47, 48, 48, 52, 53, 53, 53,
+        53, 54, 54, 55, 57, 57, 49, 47, 46, 47, 48, 48, 52, 53, 53, 53, 53, 54,
+        54, 55, 57, 57, 48, 46, 46, 46, 47, 47, 51, 53, 54, 55, 55, 56, 57, 58,
+        59, 59, 48, 46, 46, 46, 47, 47, 51, 53, 54, 56, 56, 57, 57, 58, 60, 60,
+        48, 46, 45, 46, 46, 46, 51, 53, 54, 57, 57, 58, 59, 60, 61, 61, 49, 46,
+        45, 45, 46, 46, 51, 53, 55, 58, 58, 61, 61, 62, 64, 64, 49, 46, 45, 45,
+        46, 46, 51, 53, 55, 58, 58, 61, 61, 62, 64, 64, 50, 47, 46, 46, 46, 46,
+        52, 54, 56, 59, 59, 62, 63, 64, 66, 66, 50, 47, 46, 46, 46, 46, 52, 54,
+        56, 59, 59, 63, 64, 65, 67, 67, 51, 48, 47, 47, 47, 47, 52, 54, 56, 60,
+        60, 64, 65, 66, 68, 68, 52, 48, 48, 47, 47, 47, 53, 54, 57, 61, 61, 65,
+        66, 68, 71, 71, 52, 48, 48, 47, 47, 47, 53, 54, 57, 61, 61, 65, 66, 68,
+        71, 71, 54, 50, 49, 49, 48, 48, 54, 55, 58, 62, 62, 67, 68, 70, 73, 73,
+        54, 51, 50, 49, 49, 49, 54, 55, 58, 62, 62, 67, 68, 70, 73, 73, 55, 51,
+        51, 50, 49, 49, 54, 56, 58, 63, 63, 68, 69, 71, 74, 74, 57, 53, 52, 51,
+        50, 50, 55, 56, 59, 64, 64, 69, 70, 73, 76, 76, 57, 53, 52, 51, 50, 50,
+        55, 56, 59, 64, 64, 69, 70, 73, 76, 76, 59, 55, 54, 53, 52, 52, 57, 58,
+        61, 65, 65, 70, 72, 74, 78, 78,
+        /* Size 32x16 */
+        32, 31, 31, 31, 30, 30, 33, 33, 35, 37, 37, 41, 42, 44, 49, 49, 48, 48,
+        48, 49, 49, 50, 50, 51, 52, 52, 54, 54, 55, 57, 57, 59, 31, 31, 31, 31,
+        32, 32, 34, 35, 37, 39, 39, 42, 42, 44, 47, 47, 46, 46, 46, 46, 46, 47,
+        47, 48, 48, 48, 50, 51, 51, 53, 53, 55, 31, 31, 31, 32, 32, 32, 35, 36,
+        37, 40, 40, 42, 43, 44, 46, 46, 46, 46, 45, 45, 45, 46, 46, 47, 48, 48,
+        49, 50, 51, 52, 52, 54, 33, 34, 34, 34, 35, 35, 37, 38, 40, 43, 43, 44,
+        44, 45, 47, 47, 46, 46, 46, 45, 45, 46, 46, 47, 47, 47, 49, 49, 50, 51,
+        51, 53, 37, 38, 38, 39, 40, 40, 42, 43, 44, 47, 47, 47, 47, 47, 48, 48,
+        47, 47, 46, 46, 46, 46, 46, 47, 47, 47, 48, 49, 49, 50, 50, 52, 37, 38,
+        38, 39, 40, 40, 42, 43, 44, 47, 47, 47, 47, 47, 48, 48, 47, 47, 46, 46,
+        46, 46, 46, 47, 47, 47, 48, 49, 49, 50, 50, 52, 45, 45, 45, 45, 44, 44,
+        46, 46, 46, 47, 47, 49, 49, 50, 52, 52, 51, 51, 51, 51, 51, 52, 52, 52,
+        53, 53, 54, 54, 54, 55, 55, 57, 48, 47, 47, 46, 46, 46, 47, 47, 47, 47,
+        47, 49, 50, 51, 53, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 55, 55,
+        56, 56, 56, 58, 48, 47, 47, 46, 45, 45, 46, 46, 46, 47, 47, 49, 50, 51,
+        53, 53, 54, 54, 54, 55, 55, 56, 56, 56, 57, 57, 58, 58, 58, 59, 59, 61,
+        49, 47, 47, 46, 45, 45, 45, 46, 45, 45, 45, 48, 49, 51, 53, 53, 55, 56,
+        57, 58, 58, 59, 59, 60, 61, 61, 62, 62, 63, 64, 64, 65, 49, 47, 47, 46,
+        45, 45, 45, 46, 45, 45, 45, 48, 49, 51, 53, 53, 55, 56, 57, 58, 58, 59,
+        59, 60, 61, 61, 62, 62, 63, 64, 64, 65, 51, 50, 49, 48, 47, 47, 47, 47,
+        47, 46, 46, 49, 50, 52, 54, 54, 56, 57, 58, 61, 61, 62, 63, 64, 65, 65,
+        67, 67, 68, 69, 69, 70, 52, 50, 50, 49, 48, 48, 47, 47, 47, 47, 47, 50,
+        50, 52, 54, 54, 57, 57, 59, 61, 61, 63, 64, 65, 66, 66, 68, 68, 69, 70,
+        70, 72, 54, 52, 51, 51, 49, 49, 49, 49, 48, 48, 48, 51, 51, 53, 55, 55,
+        58, 58, 60, 62, 62, 64, 65, 66, 68, 68, 70, 70, 71, 73, 73, 74, 57, 55,
+        54, 53, 52, 52, 51, 51, 51, 50, 50, 52, 53, 54, 57, 57, 59, 60, 61, 64,
+        64, 66, 67, 68, 71, 71, 73, 73, 74, 76, 76, 78, 57, 55, 54, 53, 52, 52,
+        51, 51, 51, 50, 50, 52, 53, 54, 57, 57, 59, 60, 61, 64, 64, 66, 67, 68,
+        71, 71, 73, 73, 74, 76, 76, 78,
+        /* Size 4x16 */
+        31, 37, 49, 54, 31, 38, 47, 51, 32, 40, 45, 49, 34, 42, 45, 49, 37, 44,
+        45, 48, 39, 47, 45, 48, 42, 47, 49, 51, 47, 48, 53, 55, 46, 47, 55, 58,
+        46, 46, 57, 60, 46, 46, 58, 62, 47, 46, 59, 65, 48, 47, 61, 68, 50, 48,
+        62, 70, 51, 49, 63, 71, 53, 50, 64, 73,
+        /* Size 16x4 */
+        31, 31, 32, 34, 37, 39, 42, 47, 46, 46, 46, 47, 48, 50, 51, 53, 37, 38,
+        40, 42, 44, 47, 47, 48, 47, 46, 46, 46, 47, 48, 49, 50, 49, 47, 45, 45,
+        45, 45, 49, 53, 55, 57, 58, 59, 61, 62, 63, 64, 54, 51, 49, 49, 48, 48,
+        51, 55, 58, 60, 62, 65, 68, 70, 71, 73,
+        /* Size 8x32 */
+        32, 31, 37, 45, 48, 49, 52, 57, 31, 31, 38, 45, 47, 47, 50, 55, 31, 31,
+        38, 45, 47, 47, 50, 54, 31, 32, 39, 45, 46, 46, 49, 53, 30, 32, 40, 44,
+        45, 45, 48, 52, 30, 32, 40, 44, 45, 45, 48, 52, 33, 35, 42, 46, 46, 45,
+        47, 51, 33, 36, 43, 46, 46, 46, 47, 51, 35, 37, 44, 46, 46, 45, 47, 51,
+        37, 40, 47, 47, 47, 45, 47, 50, 37, 40, 47, 47, 47, 45, 47, 50, 41, 42,
+        47, 49, 49, 48, 50, 52, 42, 43, 47, 49, 50, 49, 50, 53, 44, 44, 47, 50,
+        51, 51, 52, 54, 49, 46, 48, 52, 53, 53, 54, 57, 49, 46, 48, 52, 53, 53,
+        54, 57, 48, 46, 47, 51, 54, 55, 57, 59, 48, 46, 47, 51, 54, 56, 57, 60,
+        48, 45, 46, 51, 54, 57, 59, 61, 49, 45, 46, 51, 55, 58, 61, 64, 49, 45,
+        46, 51, 55, 58, 61, 64, 50, 46, 46, 52, 56, 59, 63, 66, 50, 46, 46, 52,
+        56, 59, 64, 67, 51, 47, 47, 52, 56, 60, 65, 68, 52, 48, 47, 53, 57, 61,
+        66, 71, 52, 48, 47, 53, 57, 61, 66, 71, 54, 49, 48, 54, 58, 62, 68, 73,
+        54, 50, 49, 54, 58, 62, 68, 73, 55, 51, 49, 54, 58, 63, 69, 74, 57, 52,
+        50, 55, 59, 64, 70, 76, 57, 52, 50, 55, 59, 64, 70, 76, 59, 54, 52, 57,
+        61, 65, 72, 78,
+        /* Size 32x8 */
+        32, 31, 31, 31, 30, 30, 33, 33, 35, 37, 37, 41, 42, 44, 49, 49, 48, 48,
+        48, 49, 49, 50, 50, 51, 52, 52, 54, 54, 55, 57, 57, 59, 31, 31, 31, 32,
+        32, 32, 35, 36, 37, 40, 40, 42, 43, 44, 46, 46, 46, 46, 45, 45, 45, 46,
+        46, 47, 48, 48, 49, 50, 51, 52, 52, 54, 37, 38, 38, 39, 40, 40, 42, 43,
+        44, 47, 47, 47, 47, 47, 48, 48, 47, 47, 46, 46, 46, 46, 46, 47, 47, 47,
+        48, 49, 49, 50, 50, 52, 45, 45, 45, 45, 44, 44, 46, 46, 46, 47, 47, 49,
+        49, 50, 52, 52, 51, 51, 51, 51, 51, 52, 52, 52, 53, 53, 54, 54, 54, 55,
+        55, 57, 48, 47, 47, 46, 45, 45, 46, 46, 46, 47, 47, 49, 50, 51, 53, 53,
+        54, 54, 54, 55, 55, 56, 56, 56, 57, 57, 58, 58, 58, 59, 59, 61, 49, 47,
+        47, 46, 45, 45, 45, 46, 45, 45, 45, 48, 49, 51, 53, 53, 55, 56, 57, 58,
+        58, 59, 59, 60, 61, 61, 62, 62, 63, 64, 64, 65, 52, 50, 50, 49, 48, 48,
+        47, 47, 47, 47, 47, 50, 50, 52, 54, 54, 57, 57, 59, 61, 61, 63, 64, 65,
+        66, 66, 68, 68, 69, 70, 70, 72, 57, 55, 54, 53, 52, 52, 51, 51, 51, 50,
+        50, 52, 53, 54, 57, 57, 59, 60, 61, 64, 64, 66, 67, 68, 71, 71, 73, 73,
+        74, 76, 76, 78 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 32, 38, 51, 32, 35, 40, 49, 38, 40, 54, 64, 51, 49, 64, 81,
+        /* Size 8x8 */
+        31, 32, 32, 34, 35, 41, 47, 53, 32, 32, 32, 33, 34, 40, 44, 50, 32, 32,
+        34, 35, 37, 41, 45, 51, 34, 33, 35, 39, 42, 47, 51, 55, 35, 34, 37, 42,
+        48, 53, 57, 61, 41, 40, 41, 47, 53, 60, 65, 70, 47, 44, 45, 51, 57, 65,
+        71, 77, 53, 50, 51, 55, 61, 70, 77, 85,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 32, 32, 34, 36, 38, 39, 44, 47, 49, 54, 59, 31, 32,
+        32, 32, 32, 32, 33, 34, 35, 37, 38, 42, 45, 47, 51, 56, 31, 32, 32, 32,
+        32, 32, 33, 33, 34, 36, 37, 41, 44, 46, 50, 54, 31, 32, 32, 32, 32, 33,
+        33, 34, 35, 36, 38, 41, 44, 45, 49, 54, 31, 32, 32, 32, 33, 34, 34, 35,
+        36, 38, 39, 42, 45, 46, 50, 54, 32, 32, 32, 33, 34, 35, 36, 37, 38, 39,
+        40, 42, 45, 46, 49, 53, 32, 33, 33, 33, 34, 36, 36, 38, 40, 41, 42, 44,
+        47, 48, 51, 55, 34, 34, 33, 34, 35, 37, 38, 39, 42, 44, 45, 47, 50, 51,
+        54, 58, 36, 35, 34, 35, 36, 38, 40, 42, 48, 50, 50, 54, 56, 57, 60, 64,
+        38, 37, 36, 36, 38, 39, 41, 44, 50, 51, 52, 56, 58, 60, 63, 67, 39, 38,
+        37, 38, 39, 40, 42, 45, 50, 52, 54, 58, 60, 62, 65, 69, 44, 42, 41, 41,
+        42, 42, 44, 47, 54, 56, 58, 63, 66, 68, 71, 75, 47, 45, 44, 44, 45, 45,
+        47, 50, 56, 58, 60, 66, 69, 71, 75, 79, 49, 47, 46, 45, 46, 46, 48, 51,
+        57, 60, 62, 68, 71, 73, 77, 81, 54, 51, 50, 49, 50, 49, 51, 54, 60, 63,
+        65, 71, 75, 77, 82, 87, 59, 56, 54, 54, 54, 53, 55, 58, 64, 67, 69, 75,
+        79, 81, 87, 92,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 35, 36, 36,
+        38, 39, 39, 42, 44, 44, 47, 48, 49, 53, 54, 55, 59, 59, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 35, 35, 37, 39, 39, 41,
+        43, 43, 46, 47, 48, 51, 52, 53, 57, 57, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 34, 34, 34, 35, 35, 37, 38, 38, 41, 42, 43, 45, 46,
+        47, 51, 51, 53, 56, 56, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 34, 34, 34, 35, 35, 37, 38, 38, 41, 42, 42, 45, 46, 47, 51, 51, 52,
+        56, 56, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34,
+        34, 34, 36, 37, 37, 40, 41, 41, 44, 45, 46, 49, 50, 51, 54, 54, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 36, 37,
+        37, 40, 41, 41, 44, 44, 45, 49, 49, 50, 54, 54, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 35, 35, 36, 38, 38, 40, 41, 41,
+        44, 45, 45, 49, 49, 50, 54, 54, 31, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        34, 34, 34, 35, 35, 35, 36, 36, 38, 39, 39, 41, 42, 42, 44, 45, 46, 49,
+        50, 51, 54, 54, 31, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 35,
+        35, 36, 36, 36, 38, 39, 39, 41, 42, 42, 45, 45, 46, 49, 50, 51, 54, 54,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 35, 35, 35, 36, 37, 37,
+        38, 39, 39, 41, 42, 42, 45, 45, 46, 49, 49, 51, 54, 54, 32, 32, 32, 32,
+        32, 32, 33, 34, 34, 34, 35, 35, 36, 37, 37, 37, 38, 38, 39, 40, 40, 42,
+        42, 43, 45, 46, 46, 49, 49, 50, 53, 53, 32, 32, 32, 32, 32, 32, 33, 34,
+        34, 34, 35, 35, 36, 37, 37, 37, 38, 38, 39, 40, 40, 42, 42, 43, 45, 46,
+        46, 49, 49, 50, 53, 53, 32, 33, 33, 33, 33, 33, 33, 34, 34, 35, 36, 36,
+        36, 38, 38, 39, 40, 40, 41, 42, 42, 44, 44, 45, 47, 47, 48, 51, 51, 52,
+        55, 55, 34, 34, 34, 34, 33, 33, 34, 35, 35, 35, 37, 37, 38, 39, 39, 41,
+        42, 42, 44, 45, 45, 47, 47, 48, 50, 51, 51, 54, 54, 55, 58, 58, 34, 34,
+        34, 34, 33, 33, 34, 35, 35, 35, 37, 37, 38, 39, 39, 41, 42, 42, 44, 45,
+        45, 47, 47, 48, 50, 51, 51, 54, 54, 55, 58, 58, 35, 34, 34, 34, 34, 34,
+        34, 35, 36, 36, 37, 37, 39, 41, 41, 43, 45, 45, 47, 47, 47, 49, 50, 51,
+        53, 53, 54, 57, 57, 58, 61, 61, 36, 35, 35, 35, 34, 34, 35, 36, 36, 37,
+        38, 38, 40, 42, 42, 45, 48, 48, 50, 50, 50, 53, 54, 54, 56, 57, 57, 59,
+        60, 61, 64, 64, 36, 35, 35, 35, 34, 34, 35, 36, 36, 37, 38, 38, 40, 42,
+        42, 45, 48, 48, 50, 50, 50, 53, 54, 54, 56, 57, 57, 59, 60, 61, 64, 64,
+        38, 37, 37, 37, 36, 36, 36, 38, 38, 38, 39, 39, 41, 44, 44, 47, 50, 50,
+        51, 52, 52, 55, 56, 56, 58, 59, 60, 62, 63, 64, 67, 67, 39, 39, 38, 38,
+        37, 37, 38, 39, 39, 39, 40, 40, 42, 45, 45, 47, 50, 50, 52, 54, 54, 56,
+        58, 58, 60, 61, 62, 64, 65, 66, 69, 69, 39, 39, 38, 38, 37, 37, 38, 39,
+        39, 39, 40, 40, 42, 45, 45, 47, 50, 50, 52, 54, 54, 56, 58, 58, 60, 61,
+        62, 64, 65, 66, 69, 69, 42, 41, 41, 41, 40, 40, 40, 41, 41, 41, 42, 42,
+        44, 47, 47, 49, 53, 53, 55, 56, 56, 60, 61, 62, 64, 65, 66, 69, 69, 70,
+        73, 73, 44, 43, 42, 42, 41, 41, 41, 42, 42, 42, 42, 42, 44, 47, 47, 50,
+        54, 54, 56, 58, 58, 61, 63, 64, 66, 67, 68, 71, 71, 72, 75, 75, 44, 43,
+        43, 42, 41, 41, 41, 42, 42, 42, 43, 43, 45, 48, 48, 51, 54, 54, 56, 58,
+        58, 62, 64, 64, 66, 67, 68, 71, 72, 73, 76, 76, 47, 46, 45, 45, 44, 44,
+        44, 44, 45, 45, 45, 45, 47, 50, 50, 53, 56, 56, 58, 60, 60, 64, 66, 66,
+        69, 70, 71, 74, 75, 76, 79, 79, 48, 47, 46, 46, 45, 44, 45, 45, 45, 45,
+        46, 46, 47, 51, 51, 53, 57, 57, 59, 61, 61, 65, 67, 67, 70, 71, 72, 75,
+        76, 77, 80, 80, 49, 48, 47, 47, 46, 45, 45, 46, 46, 46, 46, 46, 48, 51,
+        51, 54, 57, 57, 60, 62, 62, 66, 68, 68, 71, 72, 73, 77, 77, 78, 81, 81,
+        53, 51, 51, 51, 49, 49, 49, 49, 49, 49, 49, 49, 51, 54, 54, 57, 59, 59,
+        62, 64, 64, 69, 71, 71, 74, 75, 77, 81, 81, 83, 86, 86, 54, 52, 51, 51,
+        50, 49, 49, 50, 50, 49, 49, 49, 51, 54, 54, 57, 60, 60, 63, 65, 65, 69,
+        71, 72, 75, 76, 77, 81, 82, 83, 87, 87, 55, 53, 53, 52, 51, 50, 50, 51,
+        51, 51, 50, 50, 52, 55, 55, 58, 61, 61, 64, 66, 66, 70, 72, 73, 76, 77,
+        78, 83, 83, 85, 88, 88, 59, 57, 56, 56, 54, 54, 54, 54, 54, 54, 53, 53,
+        55, 58, 58, 61, 64, 64, 67, 69, 69, 73, 75, 76, 79, 80, 81, 86, 87, 88,
+        92, 92, 59, 57, 56, 56, 54, 54, 54, 54, 54, 54, 53, 53, 55, 58, 58, 61,
+        64, 64, 67, 69, 69, 73, 75, 76, 79, 80, 81, 86, 87, 88, 92, 92,
+        /* Size 4x8 */
+        32, 32, 37, 52, 32, 33, 36, 49, 32, 34, 38, 49, 34, 37, 44, 54, 35, 38,
+        49, 60, 40, 42, 55, 69, 46, 46, 59, 76, 52, 51, 64, 83,
+        /* Size 8x4 */
+        32, 32, 32, 34, 35, 40, 46, 52, 32, 33, 34, 37, 38, 42, 46, 51, 37, 36,
+        38, 44, 49, 55, 59, 64, 52, 49, 49, 54, 60, 69, 76, 83,
+        /* Size 8x16 */
+        32, 31, 32, 32, 36, 44, 47, 53, 31, 32, 32, 33, 35, 42, 45, 51, 31, 32,
+        32, 33, 35, 41, 44, 49, 31, 32, 33, 33, 35, 41, 44, 49, 32, 32, 34, 34,
+        36, 42, 45, 50, 32, 33, 35, 36, 38, 42, 45, 49, 32, 33, 35, 36, 40, 44,
+        47, 51, 34, 34, 36, 38, 42, 48, 50, 54, 36, 34, 37, 40, 48, 54, 56, 60,
+        38, 36, 39, 41, 49, 56, 58, 63, 39, 37, 40, 42, 50, 58, 60, 65, 44, 41,
+        42, 45, 53, 63, 66, 71, 47, 44, 45, 47, 56, 66, 69, 75, 49, 46, 47, 48,
+        57, 67, 71, 77, 53, 49, 50, 51, 60, 71, 75, 82, 58, 54, 54, 55, 63, 75,
+        79, 87,
+        /* Size 16x8 */
+        32, 31, 31, 31, 32, 32, 32, 34, 36, 38, 39, 44, 47, 49, 53, 58, 31, 32,
+        32, 32, 32, 33, 33, 34, 34, 36, 37, 41, 44, 46, 49, 54, 32, 32, 32, 33,
+        34, 35, 35, 36, 37, 39, 40, 42, 45, 47, 50, 54, 32, 33, 33, 33, 34, 36,
+        36, 38, 40, 41, 42, 45, 47, 48, 51, 55, 36, 35, 35, 35, 36, 38, 40, 42,
+        48, 49, 50, 53, 56, 57, 60, 63, 44, 42, 41, 41, 42, 42, 44, 48, 54, 56,
+        58, 63, 66, 67, 71, 75, 47, 45, 44, 44, 45, 45, 47, 50, 56, 58, 60, 66,
+        69, 71, 75, 79, 53, 51, 49, 49, 50, 49, 51, 54, 60, 63, 65, 71, 75, 77,
+        82, 87,
+        /* Size 16x32 */
+        32, 31, 31, 31, 32, 32, 32, 35, 36, 38, 44, 44, 47, 53, 53, 59, 31, 32,
+        32, 32, 32, 32, 33, 35, 35, 37, 43, 43, 46, 52, 52, 57, 31, 32, 32, 32,
+        32, 32, 33, 35, 35, 37, 42, 42, 45, 51, 51, 56, 31, 32, 32, 32, 32, 32,
+        33, 35, 35, 37, 42, 42, 45, 51, 51, 56, 31, 32, 32, 32, 32, 32, 33, 34,
+        35, 36, 41, 41, 44, 49, 49, 54, 31, 32, 32, 32, 32, 33, 33, 34, 34, 36,
+        41, 41, 44, 49, 49, 54, 31, 32, 32, 32, 33, 33, 33, 35, 35, 36, 41, 41,
+        44, 49, 49, 54, 32, 32, 32, 32, 33, 34, 34, 36, 36, 38, 42, 42, 45, 49,
+        49, 54, 32, 32, 32, 33, 34, 34, 34, 36, 36, 38, 42, 42, 45, 50, 50, 54,
+        32, 32, 32, 33, 34, 34, 35, 37, 37, 38, 42, 42, 45, 49, 49, 54, 32, 32,
+        33, 33, 35, 35, 36, 38, 38, 39, 42, 42, 45, 49, 49, 53, 32, 32, 33, 33,
+        35, 35, 36, 38, 38, 39, 42, 42, 45, 49, 49, 53, 32, 33, 33, 33, 35, 36,
+        36, 39, 40, 41, 44, 44, 47, 51, 51, 55, 34, 34, 34, 34, 36, 37, 38, 42,
+        42, 44, 48, 48, 50, 54, 54, 58, 34, 34, 34, 34, 36, 37, 38, 42, 42, 44,
+        48, 48, 50, 54, 54, 58, 35, 34, 34, 34, 37, 37, 39, 44, 45, 46, 50, 50,
+        53, 57, 57, 61, 36, 35, 34, 35, 37, 38, 40, 47, 48, 49, 54, 54, 56, 60,
+        60, 64, 36, 35, 34, 35, 37, 38, 40, 47, 48, 49, 54, 54, 56, 60, 60, 64,
+        38, 37, 36, 37, 39, 40, 41, 48, 49, 51, 56, 56, 58, 63, 63, 67, 39, 38,
+        37, 38, 40, 40, 42, 49, 50, 52, 58, 58, 60, 65, 65, 69, 39, 38, 37, 38,
+        40, 40, 42, 49, 50, 52, 58, 58, 60, 65, 65, 69, 42, 40, 40, 40, 42, 42,
+        44, 51, 52, 55, 61, 61, 64, 69, 69, 73, 44, 42, 41, 41, 42, 43, 45, 52,
+        53, 56, 63, 63, 66, 71, 71, 75, 44, 42, 41, 41, 43, 43, 45, 52, 54, 56,
+        63, 63, 66, 72, 72, 76, 47, 45, 44, 44, 45, 45, 47, 54, 56, 58, 66, 66,
+        69, 75, 75, 79, 48, 46, 45, 45, 46, 46, 48, 55, 56, 59, 67, 67, 70, 76,
+        76, 80, 49, 47, 46, 46, 47, 47, 48, 56, 57, 60, 67, 67, 71, 77, 77, 81,
+        53, 50, 49, 49, 49, 49, 51, 58, 59, 62, 71, 71, 74, 81, 81, 86, 53, 51,
+        49, 49, 50, 50, 51, 59, 60, 63, 71, 71, 75, 82, 82, 87, 55, 52, 51, 51,
+        51, 51, 53, 60, 61, 64, 72, 72, 76, 83, 83, 88, 58, 55, 54, 54, 54, 54,
+        55, 62, 63, 67, 75, 75, 79, 87, 87, 92, 58, 55, 54, 54, 54, 54, 55, 62,
+        63, 67, 75, 75, 79, 87, 87, 92,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 35, 36, 36,
+        38, 39, 39, 42, 44, 44, 47, 48, 49, 53, 53, 55, 58, 58, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 35, 35, 37, 38, 38, 40,
+        42, 42, 45, 46, 47, 50, 51, 52, 55, 55, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 36, 37, 37, 40, 41, 41, 44, 45,
+        46, 49, 49, 51, 54, 54, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 34, 34, 34, 35, 35, 37, 38, 38, 40, 41, 41, 44, 45, 46, 49, 49, 51,
+        54, 54, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 35, 35, 35, 36, 36, 37,
+        37, 37, 39, 40, 40, 42, 42, 43, 45, 46, 47, 49, 50, 51, 54, 54, 32, 32,
+        32, 32, 32, 33, 33, 34, 34, 34, 35, 35, 36, 37, 37, 37, 38, 38, 40, 40,
+        40, 42, 43, 43, 45, 46, 47, 49, 50, 51, 54, 54, 32, 33, 33, 33, 33, 33,
+        33, 34, 34, 35, 36, 36, 36, 38, 38, 39, 40, 40, 41, 42, 42, 44, 45, 45,
+        47, 48, 48, 51, 51, 53, 55, 55, 35, 35, 35, 35, 34, 34, 35, 36, 36, 37,
+        38, 38, 39, 42, 42, 44, 47, 47, 48, 49, 49, 51, 52, 52, 54, 55, 56, 58,
+        59, 60, 62, 62, 36, 35, 35, 35, 35, 34, 35, 36, 36, 37, 38, 38, 40, 42,
+        42, 45, 48, 48, 49, 50, 50, 52, 53, 54, 56, 56, 57, 59, 60, 61, 63, 63,
+        38, 37, 37, 37, 36, 36, 36, 38, 38, 38, 39, 39, 41, 44, 44, 46, 49, 49,
+        51, 52, 52, 55, 56, 56, 58, 59, 60, 62, 63, 64, 67, 67, 44, 43, 42, 42,
+        41, 41, 41, 42, 42, 42, 42, 42, 44, 48, 48, 50, 54, 54, 56, 58, 58, 61,
+        63, 63, 66, 67, 67, 71, 71, 72, 75, 75, 44, 43, 42, 42, 41, 41, 41, 42,
+        42, 42, 42, 42, 44, 48, 48, 50, 54, 54, 56, 58, 58, 61, 63, 63, 66, 67,
+        67, 71, 71, 72, 75, 75, 47, 46, 45, 45, 44, 44, 44, 45, 45, 45, 45, 45,
+        47, 50, 50, 53, 56, 56, 58, 60, 60, 64, 66, 66, 69, 70, 71, 74, 75, 76,
+        79, 79, 53, 52, 51, 51, 49, 49, 49, 49, 50, 49, 49, 49, 51, 54, 54, 57,
+        60, 60, 63, 65, 65, 69, 71, 72, 75, 76, 77, 81, 82, 83, 87, 87, 53, 52,
+        51, 51, 49, 49, 49, 49, 50, 49, 49, 49, 51, 54, 54, 57, 60, 60, 63, 65,
+        65, 69, 71, 72, 75, 76, 77, 81, 82, 83, 87, 87, 59, 57, 56, 56, 54, 54,
+        54, 54, 54, 54, 53, 53, 55, 58, 58, 61, 64, 64, 67, 69, 69, 73, 75, 76,
+        79, 80, 81, 86, 87, 88, 92, 92,
+        /* Size 4x16 */
+        31, 32, 38, 53, 32, 32, 37, 51, 32, 32, 36, 49, 32, 33, 36, 49, 32, 34,
+        38, 50, 32, 35, 39, 49, 33, 36, 41, 51, 34, 37, 44, 54, 35, 38, 49, 60,
+        37, 40, 51, 63, 38, 40, 52, 65, 42, 43, 56, 71, 45, 45, 58, 75, 47, 47,
+        60, 77, 51, 50, 63, 82, 55, 54, 67, 87,
+        /* Size 16x4 */
+        31, 32, 32, 32, 32, 32, 33, 34, 35, 37, 38, 42, 45, 47, 51, 55, 32, 32,
+        32, 33, 34, 35, 36, 37, 38, 40, 40, 43, 45, 47, 50, 54, 38, 37, 36, 36,
+        38, 39, 41, 44, 49, 51, 52, 56, 58, 60, 63, 67, 53, 51, 49, 49, 50, 49,
+        51, 54, 60, 63, 65, 71, 75, 77, 82, 87,
+        /* Size 8x32 */
+        32, 31, 32, 32, 36, 44, 47, 53, 31, 32, 32, 33, 35, 43, 46, 52, 31, 32,
+        32, 33, 35, 42, 45, 51, 31, 32, 32, 33, 35, 42, 45, 51, 31, 32, 32, 33,
+        35, 41, 44, 49, 31, 32, 32, 33, 34, 41, 44, 49, 31, 32, 33, 33, 35, 41,
+        44, 49, 32, 32, 33, 34, 36, 42, 45, 49, 32, 32, 34, 34, 36, 42, 45, 50,
+        32, 32, 34, 35, 37, 42, 45, 49, 32, 33, 35, 36, 38, 42, 45, 49, 32, 33,
+        35, 36, 38, 42, 45, 49, 32, 33, 35, 36, 40, 44, 47, 51, 34, 34, 36, 38,
+        42, 48, 50, 54, 34, 34, 36, 38, 42, 48, 50, 54, 35, 34, 37, 39, 45, 50,
+        53, 57, 36, 34, 37, 40, 48, 54, 56, 60, 36, 34, 37, 40, 48, 54, 56, 60,
+        38, 36, 39, 41, 49, 56, 58, 63, 39, 37, 40, 42, 50, 58, 60, 65, 39, 37,
+        40, 42, 50, 58, 60, 65, 42, 40, 42, 44, 52, 61, 64, 69, 44, 41, 42, 45,
+        53, 63, 66, 71, 44, 41, 43, 45, 54, 63, 66, 72, 47, 44, 45, 47, 56, 66,
+        69, 75, 48, 45, 46, 48, 56, 67, 70, 76, 49, 46, 47, 48, 57, 67, 71, 77,
+        53, 49, 49, 51, 59, 71, 74, 81, 53, 49, 50, 51, 60, 71, 75, 82, 55, 51,
+        51, 53, 61, 72, 76, 83, 58, 54, 54, 55, 63, 75, 79, 87, 58, 54, 54, 55,
+        63, 75, 79, 87,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 35, 36, 36,
+        38, 39, 39, 42, 44, 44, 47, 48, 49, 53, 53, 55, 58, 58, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 36, 37, 37, 40,
+        41, 41, 44, 45, 46, 49, 49, 51, 54, 54, 32, 32, 32, 32, 32, 32, 33, 33,
+        34, 34, 35, 35, 35, 36, 36, 37, 37, 37, 39, 40, 40, 42, 42, 43, 45, 46,
+        47, 49, 50, 51, 54, 54, 32, 33, 33, 33, 33, 33, 33, 34, 34, 35, 36, 36,
+        36, 38, 38, 39, 40, 40, 41, 42, 42, 44, 45, 45, 47, 48, 48, 51, 51, 53,
+        55, 55, 36, 35, 35, 35, 35, 34, 35, 36, 36, 37, 38, 38, 40, 42, 42, 45,
+        48, 48, 49, 50, 50, 52, 53, 54, 56, 56, 57, 59, 60, 61, 63, 63, 44, 43,
+        42, 42, 41, 41, 41, 42, 42, 42, 42, 42, 44, 48, 48, 50, 54, 54, 56, 58,
+        58, 61, 63, 63, 66, 67, 67, 71, 71, 72, 75, 75, 47, 46, 45, 45, 44, 44,
+        44, 45, 45, 45, 45, 45, 47, 50, 50, 53, 56, 56, 58, 60, 60, 64, 66, 66,
+        69, 70, 71, 74, 75, 76, 79, 79, 53, 52, 51, 51, 49, 49, 49, 49, 50, 49,
+        49, 49, 51, 54, 54, 57, 60, 60, 63, 65, 65, 69, 71, 72, 75, 76, 77, 81,
+        82, 83, 87, 87 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 38, 47, 49, 38, 47, 46, 46, 47, 46, 54, 57, 49, 46, 57, 66,
+        /* Size 8x8 */
+        31, 31, 35, 42, 48, 47, 49, 51, 31, 32, 36, 42, 46, 45, 46, 48, 35, 36,
+        41, 45, 47, 45, 46, 48, 42, 42, 45, 48, 50, 49, 50, 51, 48, 46, 47, 50,
+        53, 53, 54, 54, 47, 45, 45, 49, 53, 57, 59, 60, 49, 46, 46, 50, 54, 59,
+        61, 64, 51, 48, 48, 51, 54, 60, 64, 68,
+        /* Size 16x16 */
+        32, 31, 30, 31, 33, 36, 38, 41, 49, 49, 48, 49, 50, 51, 52, 54, 31, 31,
+        31, 32, 34, 38, 40, 42, 47, 47, 47, 47, 48, 48, 50, 52, 30, 31, 31, 32,
+        35, 39, 41, 42, 46, 46, 46, 45, 46, 47, 48, 50, 31, 32, 32, 33, 36, 40,
+        41, 43, 46, 46, 45, 45, 46, 46, 47, 49, 33, 34, 35, 36, 39, 43, 44, 45,
+        47, 46, 46, 45, 46, 47, 47, 49, 36, 38, 39, 40, 43, 47, 47, 47, 48, 47,
+        46, 45, 46, 46, 47, 48, 38, 40, 41, 41, 44, 47, 47, 48, 49, 48, 48, 47,
+        47, 47, 48, 49, 41, 42, 42, 43, 45, 47, 48, 48, 50, 50, 49, 49, 50, 50,
+        50, 52, 49, 47, 46, 46, 47, 48, 49, 50, 53, 53, 53, 53, 54, 54, 54, 55,
+        49, 47, 46, 46, 46, 47, 48, 50, 53, 53, 54, 55, 55, 55, 56, 57, 48, 47,
+        46, 45, 46, 46, 48, 49, 53, 54, 54, 55, 56, 56, 57, 58, 49, 47, 45, 45,
+        45, 45, 47, 49, 53, 55, 55, 58, 59, 60, 61, 62, 50, 48, 46, 46, 46, 46,
+        47, 50, 54, 55, 56, 59, 61, 61, 63, 64, 51, 48, 47, 46, 47, 46, 47, 50,
+        54, 55, 56, 60, 61, 62, 64, 66, 52, 50, 48, 47, 47, 47, 48, 50, 54, 56,
+        57, 61, 63, 64, 66, 68, 54, 52, 50, 49, 49, 48, 49, 52, 55, 57, 58, 62,
+        64, 66, 68, 71,
+        /* Size 32x32 */
+        32, 31, 31, 31, 30, 30, 31, 33, 33, 34, 36, 36, 38, 41, 41, 45, 49, 49,
+        49, 48, 48, 49, 49, 49, 50, 50, 51, 52, 52, 53, 54, 54, 31, 31, 31, 31,
+        31, 31, 31, 34, 34, 35, 38, 38, 39, 42, 42, 45, 48, 48, 47, 47, 47, 47,
+        47, 47, 49, 49, 49, 50, 50, 51, 53, 53, 31, 31, 31, 31, 31, 31, 32, 34,
+        34, 35, 38, 38, 40, 42, 42, 45, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48,
+        48, 49, 50, 50, 52, 52, 31, 31, 31, 31, 31, 31, 32, 34, 34, 36, 38, 38,
+        40, 42, 42, 45, 47, 47, 47, 47, 47, 47, 46, 47, 48, 48, 48, 49, 49, 50,
+        52, 52, 30, 31, 31, 31, 31, 31, 32, 35, 35, 36, 39, 39, 41, 42, 42, 44,
+        46, 46, 46, 46, 46, 45, 45, 45, 46, 47, 47, 48, 48, 48, 50, 50, 30, 31,
+        31, 31, 31, 32, 32, 35, 35, 36, 40, 40, 41, 42, 42, 44, 46, 46, 46, 45,
+        45, 45, 45, 45, 46, 46, 46, 47, 47, 48, 49, 49, 31, 31, 32, 32, 32, 32,
+        33, 35, 36, 37, 40, 40, 41, 43, 43, 44, 46, 46, 46, 45, 45, 45, 45, 45,
+        46, 46, 46, 47, 47, 48, 49, 49, 33, 34, 34, 34, 35, 35, 35, 38, 38, 40,
+        43, 43, 43, 44, 44, 46, 47, 47, 46, 46, 46, 45, 45, 45, 46, 46, 47, 47,
+        47, 48, 49, 49, 33, 34, 34, 34, 35, 35, 36, 38, 39, 40, 43, 43, 44, 45,
+        45, 46, 47, 47, 46, 46, 46, 45, 45, 45, 46, 46, 47, 47, 47, 48, 49, 49,
+        34, 35, 35, 36, 36, 36, 37, 40, 40, 41, 44, 44, 45, 45, 45, 46, 47, 47,
+        47, 46, 46, 45, 45, 45, 46, 46, 46, 47, 47, 48, 49, 49, 36, 38, 38, 38,
+        39, 40, 40, 43, 43, 44, 47, 47, 47, 47, 47, 47, 48, 48, 47, 46, 46, 45,
+        45, 45, 46, 46, 46, 46, 47, 47, 48, 48, 36, 38, 38, 38, 39, 40, 40, 43,
+        43, 44, 47, 47, 47, 47, 47, 47, 48, 48, 47, 46, 46, 45, 45, 45, 46, 46,
+        46, 46, 47, 47, 48, 48, 38, 39, 40, 40, 41, 41, 41, 43, 44, 45, 47, 47,
+        47, 48, 48, 48, 49, 49, 48, 48, 48, 47, 47, 47, 47, 47, 47, 48, 48, 48,
+        49, 49, 41, 42, 42, 42, 42, 42, 43, 44, 45, 45, 47, 47, 48, 48, 48, 49,
+        50, 50, 50, 49, 49, 49, 49, 49, 50, 50, 50, 50, 50, 51, 52, 52, 41, 42,
+        42, 42, 42, 42, 43, 44, 45, 45, 47, 47, 48, 48, 48, 49, 50, 50, 50, 49,
+        49, 49, 49, 49, 50, 50, 50, 50, 50, 51, 52, 52, 45, 45, 45, 45, 44, 44,
+        44, 46, 46, 46, 47, 47, 48, 49, 49, 50, 51, 51, 51, 51, 51, 51, 51, 51,
+        52, 52, 52, 52, 52, 52, 53, 53, 49, 48, 47, 47, 46, 46, 46, 47, 47, 47,
+        48, 48, 49, 50, 50, 51, 53, 53, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54,
+        54, 54, 55, 55, 49, 48, 47, 47, 46, 46, 46, 47, 47, 47, 48, 48, 49, 50,
+        50, 51, 53, 53, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 55, 55,
+        49, 47, 47, 47, 46, 46, 46, 46, 46, 47, 47, 47, 48, 50, 50, 51, 53, 53,
+        53, 54, 54, 54, 55, 55, 55, 55, 55, 56, 56, 56, 57, 57, 48, 47, 47, 47,
+        46, 45, 45, 46, 46, 46, 46, 46, 48, 49, 49, 51, 53, 53, 54, 54, 54, 55,
+        55, 56, 56, 56, 56, 57, 57, 58, 58, 58, 48, 47, 47, 47, 46, 45, 45, 46,
+        46, 46, 46, 46, 48, 49, 49, 51, 53, 53, 54, 54, 54, 55, 55, 56, 56, 56,
+        56, 57, 57, 58, 58, 58, 49, 47, 47, 47, 45, 45, 45, 45, 45, 45, 45, 45,
+        47, 49, 49, 51, 53, 53, 54, 55, 55, 57, 57, 58, 58, 59, 59, 60, 60, 60,
+        61, 61, 49, 47, 47, 46, 45, 45, 45, 45, 45, 45, 45, 45, 47, 49, 49, 51,
+        53, 53, 55, 55, 55, 57, 58, 58, 59, 60, 60, 61, 61, 61, 62, 62, 49, 47,
+        47, 47, 45, 45, 45, 45, 45, 45, 45, 45, 47, 49, 49, 51, 53, 53, 55, 56,
+        56, 58, 58, 59, 59, 60, 60, 61, 61, 62, 63, 63, 50, 49, 48, 48, 46, 46,
+        46, 46, 46, 46, 46, 46, 47, 50, 50, 52, 54, 54, 55, 56, 56, 58, 59, 59,
+        61, 61, 61, 63, 63, 63, 64, 64, 50, 49, 48, 48, 47, 46, 46, 46, 46, 46,
+        46, 46, 47, 50, 50, 52, 54, 54, 55, 56, 56, 59, 60, 60, 61, 61, 62, 63,
+        63, 64, 65, 65, 51, 49, 48, 48, 47, 46, 46, 47, 47, 46, 46, 46, 47, 50,
+        50, 52, 54, 54, 55, 56, 56, 59, 60, 60, 61, 62, 62, 64, 64, 64, 66, 66,
+        52, 50, 49, 49, 48, 47, 47, 47, 47, 47, 46, 46, 48, 50, 50, 52, 54, 54,
+        56, 57, 57, 60, 61, 61, 63, 63, 64, 66, 66, 67, 68, 68, 52, 50, 50, 49,
+        48, 47, 47, 47, 47, 47, 47, 47, 48, 50, 50, 52, 54, 54, 56, 57, 57, 60,
+        61, 61, 63, 63, 64, 66, 66, 67, 68, 68, 53, 51, 50, 50, 48, 48, 48, 48,
+        48, 48, 47, 47, 48, 51, 51, 52, 54, 54, 56, 58, 58, 60, 61, 62, 63, 64,
+        64, 67, 67, 68, 69, 69, 54, 53, 52, 52, 50, 49, 49, 49, 49, 49, 48, 48,
+        49, 52, 52, 53, 55, 55, 57, 58, 58, 61, 62, 63, 64, 65, 66, 68, 68, 69,
+        71, 71, 54, 53, 52, 52, 50, 49, 49, 49, 49, 49, 48, 48, 49, 52, 52, 53,
+        55, 55, 57, 58, 58, 61, 62, 63, 64, 65, 66, 68, 68, 69, 71, 71,
+        /* Size 4x8 */
+        31, 38, 47, 50, 31, 40, 46, 48, 36, 44, 47, 47, 42, 47, 50, 50, 47, 48,
+        53, 54, 46, 46, 54, 60, 48, 46, 55, 64, 50, 48, 56, 67,
+        /* Size 8x4 */
+        31, 31, 36, 42, 47, 46, 48, 50, 38, 40, 44, 47, 48, 46, 46, 48, 47, 46,
+        47, 50, 53, 54, 55, 56, 50, 48, 47, 50, 54, 60, 64, 67,
+        /* Size 8x16 */
+        32, 31, 35, 38, 48, 49, 50, 52, 31, 31, 37, 40, 47, 47, 48, 50, 30, 32,
+        38, 40, 46, 45, 46, 48, 31, 33, 38, 41, 46, 45, 46, 48, 33, 36, 41, 44,
+        47, 46, 46, 47, 37, 40, 45, 47, 47, 45, 46, 47, 39, 41, 46, 47, 48, 47,
+        47, 48, 42, 43, 46, 48, 50, 49, 50, 50, 49, 46, 48, 49, 53, 53, 54, 54,
+        48, 46, 47, 48, 53, 55, 55, 56, 48, 46, 46, 48, 53, 56, 56, 57, 49, 45,
+        45, 47, 53, 58, 59, 61, 50, 46, 46, 48, 54, 59, 61, 63, 51, 47, 47, 48,
+        54, 60, 61, 64, 52, 48, 47, 48, 54, 61, 63, 66, 54, 50, 49, 50, 55, 62,
+        65, 68,
+        /* Size 16x8 */
+        32, 31, 30, 31, 33, 37, 39, 42, 49, 48, 48, 49, 50, 51, 52, 54, 31, 31,
+        32, 33, 36, 40, 41, 43, 46, 46, 46, 45, 46, 47, 48, 50, 35, 37, 38, 38,
+        41, 45, 46, 46, 48, 47, 46, 45, 46, 47, 47, 49, 38, 40, 40, 41, 44, 47,
+        47, 48, 49, 48, 48, 47, 48, 48, 48, 50, 48, 47, 46, 46, 47, 47, 48, 50,
+        53, 53, 53, 53, 54, 54, 54, 55, 49, 47, 45, 45, 46, 45, 47, 49, 53, 55,
+        56, 58, 59, 60, 61, 62, 50, 48, 46, 46, 46, 46, 47, 50, 54, 55, 56, 59,
+        61, 61, 63, 65, 52, 50, 48, 48, 47, 47, 48, 50, 54, 56, 57, 61, 63, 64,
+        66, 68,
+        /* Size 16x32 */
+        32, 31, 31, 31, 35, 37, 38, 47, 48, 48, 49, 49, 50, 52, 52, 54, 31, 31,
+        31, 32, 36, 38, 39, 46, 47, 47, 48, 48, 49, 50, 50, 53, 31, 31, 31, 32,
+        37, 38, 40, 46, 47, 47, 47, 47, 48, 50, 50, 52, 31, 31, 31, 32, 37, 38,
+        40, 46, 47, 47, 47, 47, 48, 50, 50, 52, 30, 31, 32, 32, 38, 39, 40, 45,
+        46, 46, 45, 45, 46, 48, 48, 50, 30, 31, 32, 33, 38, 40, 41, 45, 46, 46,
+        45, 45, 46, 48, 48, 50, 31, 32, 33, 33, 38, 40, 41, 45, 46, 46, 45, 45,
+        46, 48, 48, 50, 33, 35, 35, 36, 41, 43, 43, 46, 47, 46, 45, 45, 46, 47,
+        47, 49, 33, 35, 36, 36, 41, 43, 44, 46, 47, 46, 46, 46, 46, 47, 47, 49,
+        34, 36, 37, 37, 42, 44, 45, 47, 47, 47, 45, 45, 46, 47, 47, 49, 37, 39,
+        40, 41, 45, 47, 47, 47, 47, 47, 45, 45, 46, 47, 47, 48, 37, 39, 40, 41,
+        45, 47, 47, 47, 47, 47, 45, 45, 46, 47, 47, 48, 39, 40, 41, 42, 46, 47,
+        47, 48, 48, 48, 47, 47, 47, 48, 48, 50, 42, 42, 43, 43, 46, 47, 48, 50,
+        50, 50, 49, 49, 50, 50, 50, 52, 42, 42, 43, 43, 46, 47, 48, 50, 50, 50,
+        49, 49, 50, 50, 50, 52, 45, 45, 44, 45, 47, 47, 48, 51, 51, 51, 51, 51,
+        52, 52, 52, 54, 49, 47, 46, 47, 48, 48, 49, 52, 53, 53, 53, 53, 54, 54,
+        54, 55, 49, 47, 46, 47, 48, 48, 49, 52, 53, 53, 53, 53, 54, 54, 54, 55,
+        48, 47, 46, 46, 47, 47, 48, 52, 53, 53, 55, 55, 55, 56, 56, 57, 48, 46,
+        46, 46, 46, 47, 48, 52, 53, 54, 56, 56, 56, 57, 57, 59, 48, 46, 46, 46,
+        46, 47, 48, 52, 53, 54, 56, 56, 56, 57, 57, 59, 49, 46, 45, 45, 46, 46,
+        47, 52, 53, 54, 57, 57, 58, 60, 60, 61, 49, 46, 45, 45, 45, 46, 47, 52,
+        53, 55, 58, 58, 59, 61, 61, 62, 49, 46, 45, 45, 46, 46, 47, 52, 53, 55,
+        58, 58, 60, 61, 61, 63, 50, 47, 46, 46, 46, 46, 48, 53, 54, 55, 59, 59,
+        61, 63, 63, 65, 50, 48, 46, 46, 46, 46, 48, 53, 54, 55, 59, 59, 61, 64,
+        64, 65, 51, 48, 47, 47, 47, 47, 48, 53, 54, 55, 60, 60, 61, 64, 64, 66,
+        52, 49, 48, 48, 47, 47, 48, 53, 54, 56, 61, 61, 63, 66, 66, 68, 52, 49,
+        48, 48, 47, 47, 48, 53, 54, 56, 61, 61, 63, 66, 66, 68, 53, 50, 48, 48,
+        48, 48, 49, 54, 54, 56, 61, 61, 63, 67, 67, 69, 54, 51, 50, 50, 49, 49,
+        50, 55, 55, 57, 62, 62, 65, 68, 68, 71, 54, 51, 50, 50, 49, 49, 50, 55,
+        55, 57, 62, 62, 65, 68, 68, 71,
+        /* Size 32x16 */
+        32, 31, 31, 31, 30, 30, 31, 33, 33, 34, 37, 37, 39, 42, 42, 45, 49, 49,
+        48, 48, 48, 49, 49, 49, 50, 50, 51, 52, 52, 53, 54, 54, 31, 31, 31, 31,
+        31, 31, 32, 35, 35, 36, 39, 39, 40, 42, 42, 45, 47, 47, 47, 46, 46, 46,
+        46, 46, 47, 48, 48, 49, 49, 50, 51, 51, 31, 31, 31, 31, 32, 32, 33, 35,
+        36, 37, 40, 40, 41, 43, 43, 44, 46, 46, 46, 46, 46, 45, 45, 45, 46, 46,
+        47, 48, 48, 48, 50, 50, 31, 32, 32, 32, 32, 33, 33, 36, 36, 37, 41, 41,
+        42, 43, 43, 45, 47, 47, 46, 46, 46, 45, 45, 45, 46, 46, 47, 48, 48, 48,
+        50, 50, 35, 36, 37, 37, 38, 38, 38, 41, 41, 42, 45, 45, 46, 46, 46, 47,
+        48, 48, 47, 46, 46, 46, 45, 46, 46, 46, 47, 47, 47, 48, 49, 49, 37, 38,
+        38, 38, 39, 40, 40, 43, 43, 44, 47, 47, 47, 47, 47, 47, 48, 48, 47, 47,
+        47, 46, 46, 46, 46, 46, 47, 47, 47, 48, 49, 49, 38, 39, 40, 40, 40, 41,
+        41, 43, 44, 45, 47, 47, 47, 48, 48, 48, 49, 49, 48, 48, 48, 47, 47, 47,
+        48, 48, 48, 48, 48, 49, 50, 50, 47, 46, 46, 46, 45, 45, 45, 46, 46, 47,
+        47, 47, 48, 50, 50, 51, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53,
+        53, 54, 55, 55, 48, 47, 47, 47, 46, 46, 46, 47, 47, 47, 47, 47, 48, 50,
+        50, 51, 53, 53, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 55, 55,
+        48, 47, 47, 47, 46, 46, 46, 46, 46, 47, 47, 47, 48, 50, 50, 51, 53, 53,
+        53, 54, 54, 54, 55, 55, 55, 55, 55, 56, 56, 56, 57, 57, 49, 48, 47, 47,
+        45, 45, 45, 45, 46, 45, 45, 45, 47, 49, 49, 51, 53, 53, 55, 56, 56, 57,
+        58, 58, 59, 59, 60, 61, 61, 61, 62, 62, 49, 48, 47, 47, 45, 45, 45, 45,
+        46, 45, 45, 45, 47, 49, 49, 51, 53, 53, 55, 56, 56, 57, 58, 58, 59, 59,
+        60, 61, 61, 61, 62, 62, 50, 49, 48, 48, 46, 46, 46, 46, 46, 46, 46, 46,
+        47, 50, 50, 52, 54, 54, 55, 56, 56, 58, 59, 60, 61, 61, 61, 63, 63, 63,
+        65, 65, 52, 50, 50, 50, 48, 48, 48, 47, 47, 47, 47, 47, 48, 50, 50, 52,
+        54, 54, 56, 57, 57, 60, 61, 61, 63, 64, 64, 66, 66, 67, 68, 68, 52, 50,
+        50, 50, 48, 48, 48, 47, 47, 47, 47, 47, 48, 50, 50, 52, 54, 54, 56, 57,
+        57, 60, 61, 61, 63, 64, 64, 66, 66, 67, 68, 68, 54, 53, 52, 52, 50, 50,
+        50, 49, 49, 49, 48, 48, 50, 52, 52, 54, 55, 55, 57, 59, 59, 61, 62, 63,
+        65, 65, 66, 68, 68, 69, 71, 71,
+        /* Size 4x16 */
+        31, 37, 48, 52, 31, 38, 47, 50, 31, 39, 46, 48, 32, 40, 46, 48, 35, 43,
+        46, 47, 39, 47, 47, 47, 40, 47, 48, 48, 42, 47, 50, 50, 47, 48, 53, 54,
+        47, 47, 53, 56, 46, 47, 54, 57, 46, 46, 55, 61, 47, 46, 55, 63, 48, 47,
+        55, 64, 49, 47, 56, 66, 51, 49, 57, 68,
+        /* Size 16x4 */
+        31, 31, 31, 32, 35, 39, 40, 42, 47, 47, 46, 46, 47, 48, 49, 51, 37, 38,
+        39, 40, 43, 47, 47, 47, 48, 47, 47, 46, 46, 47, 47, 49, 48, 47, 46, 46,
+        46, 47, 48, 50, 53, 53, 54, 55, 55, 55, 56, 57, 52, 50, 48, 48, 47, 47,
+        48, 50, 54, 56, 57, 61, 63, 64, 66, 68,
+        /* Size 8x32 */
+        32, 31, 35, 38, 48, 49, 50, 52, 31, 31, 36, 39, 47, 48, 49, 50, 31, 31,
+        37, 40, 47, 47, 48, 50, 31, 31, 37, 40, 47, 47, 48, 50, 30, 32, 38, 40,
+        46, 45, 46, 48, 30, 32, 38, 41, 46, 45, 46, 48, 31, 33, 38, 41, 46, 45,
+        46, 48, 33, 35, 41, 43, 47, 45, 46, 47, 33, 36, 41, 44, 47, 46, 46, 47,
+        34, 37, 42, 45, 47, 45, 46, 47, 37, 40, 45, 47, 47, 45, 46, 47, 37, 40,
+        45, 47, 47, 45, 46, 47, 39, 41, 46, 47, 48, 47, 47, 48, 42, 43, 46, 48,
+        50, 49, 50, 50, 42, 43, 46, 48, 50, 49, 50, 50, 45, 44, 47, 48, 51, 51,
+        52, 52, 49, 46, 48, 49, 53, 53, 54, 54, 49, 46, 48, 49, 53, 53, 54, 54,
+        48, 46, 47, 48, 53, 55, 55, 56, 48, 46, 46, 48, 53, 56, 56, 57, 48, 46,
+        46, 48, 53, 56, 56, 57, 49, 45, 46, 47, 53, 57, 58, 60, 49, 45, 45, 47,
+        53, 58, 59, 61, 49, 45, 46, 47, 53, 58, 60, 61, 50, 46, 46, 48, 54, 59,
+        61, 63, 50, 46, 46, 48, 54, 59, 61, 64, 51, 47, 47, 48, 54, 60, 61, 64,
+        52, 48, 47, 48, 54, 61, 63, 66, 52, 48, 47, 48, 54, 61, 63, 66, 53, 48,
+        48, 49, 54, 61, 63, 67, 54, 50, 49, 50, 55, 62, 65, 68, 54, 50, 49, 50,
+        55, 62, 65, 68,
+        /* Size 32x8 */
+        32, 31, 31, 31, 30, 30, 31, 33, 33, 34, 37, 37, 39, 42, 42, 45, 49, 49,
+        48, 48, 48, 49, 49, 49, 50, 50, 51, 52, 52, 53, 54, 54, 31, 31, 31, 31,
+        32, 32, 33, 35, 36, 37, 40, 40, 41, 43, 43, 44, 46, 46, 46, 46, 46, 45,
+        45, 45, 46, 46, 47, 48, 48, 48, 50, 50, 35, 36, 37, 37, 38, 38, 38, 41,
+        41, 42, 45, 45, 46, 46, 46, 47, 48, 48, 47, 46, 46, 46, 45, 46, 46, 46,
+        47, 47, 47, 48, 49, 49, 38, 39, 40, 40, 40, 41, 41, 43, 44, 45, 47, 47,
+        47, 48, 48, 48, 49, 49, 48, 48, 48, 47, 47, 47, 48, 48, 48, 48, 48, 49,
+        50, 50, 48, 47, 47, 47, 46, 46, 46, 47, 47, 47, 47, 47, 48, 50, 50, 51,
+        53, 53, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 55, 55, 49, 48,
+        47, 47, 45, 45, 45, 45, 46, 45, 45, 45, 47, 49, 49, 51, 53, 53, 55, 56,
+        56, 57, 58, 58, 59, 59, 60, 61, 61, 61, 62, 62, 50, 49, 48, 48, 46, 46,
+        46, 46, 46, 46, 46, 46, 47, 50, 50, 52, 54, 54, 55, 56, 56, 58, 59, 60,
+        61, 61, 61, 63, 63, 63, 65, 65, 52, 50, 50, 50, 48, 48, 48, 47, 47, 47,
+        47, 47, 48, 50, 50, 52, 54, 54, 56, 57, 57, 60, 61, 61, 63, 64, 64, 66,
+        66, 67, 68, 68 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 32, 35, 43, 32, 34, 37, 43, 35, 37, 48, 54, 43, 43, 54, 65,
+        /* Size 8x8 */
+        31, 31, 32, 32, 34, 37, 43, 47, 31, 32, 32, 32, 34, 36, 41, 44, 32, 32,
+        33, 34, 35, 38, 42, 45, 32, 32, 34, 35, 37, 39, 42, 46, 34, 34, 35, 37,
+        41, 45, 49, 52, 37, 36, 38, 39, 45, 51, 56, 59, 43, 41, 42, 42, 49, 56,
+        63, 67, 47, 44, 45, 46, 52, 59, 67, 71,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 31, 32, 32, 34, 35, 36, 39, 41, 44, 47, 48, 31, 32,
+        32, 32, 32, 32, 32, 33, 34, 35, 35, 38, 40, 42, 45, 46, 31, 32, 32, 32,
+        32, 32, 32, 33, 34, 34, 35, 38, 39, 42, 45, 45, 31, 32, 32, 32, 32, 32,
+        32, 33, 33, 34, 34, 37, 38, 41, 44, 44, 31, 32, 32, 32, 33, 33, 33, 34,
+        35, 36, 36, 39, 40, 42, 44, 45, 31, 32, 32, 32, 33, 33, 34, 34, 35, 36,
+        36, 39, 40, 42, 45, 45, 32, 32, 32, 32, 33, 34, 35, 36, 37, 38, 38, 40,
+        41, 42, 45, 46, 32, 33, 33, 33, 34, 34, 36, 36, 38, 39, 40, 42, 43, 44,
+        47, 47, 34, 34, 34, 33, 35, 35, 37, 38, 39, 42, 42, 45, 46, 47, 50, 51,
+        35, 35, 34, 34, 36, 36, 38, 39, 42, 46, 47, 49, 50, 52, 55, 55, 36, 35,
+        35, 34, 36, 36, 38, 40, 42, 47, 48, 50, 52, 54, 56, 57, 39, 38, 38, 37,
+        39, 39, 40, 42, 45, 49, 50, 54, 55, 58, 60, 61, 41, 40, 39, 38, 40, 40,
+        41, 43, 46, 50, 52, 55, 57, 60, 62, 63, 44, 42, 42, 41, 42, 42, 42, 44,
+        47, 52, 54, 58, 60, 63, 66, 67, 47, 45, 45, 44, 44, 45, 45, 47, 50, 55,
+        56, 60, 62, 66, 69, 70, 48, 46, 45, 44, 45, 45, 46, 47, 51, 55, 57, 61,
+        63, 67, 70, 71,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 34,
+        35, 36, 36, 38, 39, 39, 41, 44, 44, 45, 47, 48, 48, 51, 31, 31, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 35, 35, 35, 37,
+        39, 39, 40, 43, 43, 44, 46, 47, 47, 50, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 35, 35, 35, 37, 38, 38, 40, 42,
+        42, 43, 45, 46, 46, 49, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 34, 34, 34, 35, 35, 35, 37, 38, 38, 40, 42, 42, 43, 45, 46,
+        46, 49, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34,
+        34, 34, 34, 35, 35, 36, 38, 38, 39, 42, 42, 42, 45, 45, 45, 48, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34,
+        34, 36, 37, 37, 38, 41, 41, 41, 44, 44, 44, 47, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 36, 37, 37,
+        38, 41, 41, 41, 44, 44, 44, 47, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 36, 38, 38, 39, 41, 41, 42,
+        44, 45, 45, 47, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        34, 35, 35, 35, 36, 36, 36, 37, 39, 39, 40, 42, 42, 42, 44, 45, 45, 48,
+        31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 35, 35, 35,
+        36, 36, 36, 38, 39, 39, 40, 42, 42, 42, 45, 45, 45, 48, 31, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 35, 35, 35, 36, 36, 36, 38,
+        39, 39, 40, 42, 42, 42, 45, 45, 45, 48, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 34, 35, 35, 35, 36, 36, 36, 37, 37, 37, 39, 40, 40, 41, 42,
+        42, 43, 45, 45, 45, 48, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 35,
+        35, 35, 36, 37, 37, 37, 38, 38, 38, 39, 40, 40, 41, 42, 42, 43, 45, 46,
+        46, 48, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 35, 35, 35, 36, 37,
+        37, 37, 38, 38, 38, 39, 40, 40, 41, 42, 42, 43, 45, 46, 46, 48, 32, 33,
+        33, 33, 33, 33, 33, 33, 34, 34, 34, 35, 36, 36, 36, 38, 38, 38, 39, 40,
+        40, 41, 42, 42, 43, 44, 44, 45, 47, 47, 47, 50, 34, 34, 34, 34, 34, 33,
+        33, 34, 35, 35, 35, 36, 37, 37, 38, 39, 39, 40, 42, 42, 42, 44, 45, 45,
+        46, 47, 47, 48, 50, 51, 51, 53, 34, 34, 34, 34, 34, 33, 33, 34, 35, 35,
+        35, 36, 37, 37, 38, 39, 39, 40, 42, 42, 42, 44, 45, 45, 46, 47, 47, 48,
+        50, 51, 51, 53, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 36, 37, 37,
+        38, 40, 40, 41, 43, 44, 44, 45, 46, 46, 47, 49, 49, 49, 51, 52, 52, 54,
+        35, 35, 35, 35, 34, 34, 34, 34, 36, 36, 36, 37, 38, 38, 39, 42, 42, 43,
+        46, 47, 47, 48, 49, 49, 50, 52, 52, 53, 55, 55, 55, 57, 36, 35, 35, 35,
+        35, 34, 34, 35, 36, 36, 36, 37, 38, 38, 40, 42, 42, 44, 47, 48, 48, 50,
+        50, 50, 52, 54, 54, 54, 56, 57, 57, 58, 36, 35, 35, 35, 35, 34, 34, 35,
+        36, 36, 36, 37, 38, 38, 40, 42, 42, 44, 47, 48, 48, 50, 50, 50, 52, 54,
+        54, 54, 56, 57, 57, 58, 38, 37, 37, 37, 36, 36, 36, 36, 37, 38, 38, 39,
+        39, 39, 41, 44, 44, 45, 48, 50, 50, 51, 52, 52, 54, 56, 56, 57, 58, 59,
+        59, 61, 39, 39, 38, 38, 38, 37, 37, 38, 39, 39, 39, 40, 40, 40, 42, 45,
+        45, 46, 49, 50, 50, 52, 54, 54, 55, 58, 58, 58, 60, 61, 61, 63, 39, 39,
+        38, 38, 38, 37, 37, 38, 39, 39, 39, 40, 40, 40, 42, 45, 45, 46, 49, 50,
+        50, 52, 54, 54, 55, 58, 58, 58, 60, 61, 61, 63, 41, 40, 40, 40, 39, 38,
+        38, 39, 40, 40, 40, 41, 41, 41, 43, 46, 46, 47, 50, 52, 52, 54, 55, 55,
+        57, 60, 60, 60, 62, 63, 63, 66, 44, 43, 42, 42, 42, 41, 41, 41, 42, 42,
+        42, 42, 42, 42, 44, 47, 47, 49, 52, 54, 54, 56, 58, 58, 60, 63, 63, 64,
+        66, 67, 67, 69, 44, 43, 42, 42, 42, 41, 41, 41, 42, 42, 42, 42, 42, 42,
+        44, 47, 47, 49, 52, 54, 54, 56, 58, 58, 60, 63, 63, 64, 66, 67, 67, 69,
+        45, 44, 43, 43, 42, 41, 41, 42, 42, 42, 42, 43, 43, 43, 45, 48, 48, 49,
+        53, 54, 54, 57, 58, 58, 60, 64, 64, 65, 67, 68, 68, 70, 47, 46, 45, 45,
+        45, 44, 44, 44, 44, 45, 45, 45, 45, 45, 47, 50, 50, 51, 55, 56, 56, 58,
+        60, 60, 62, 66, 66, 67, 69, 70, 70, 73, 48, 47, 46, 46, 45, 44, 44, 45,
+        45, 45, 45, 45, 46, 46, 47, 51, 51, 52, 55, 57, 57, 59, 61, 61, 63, 67,
+        67, 68, 70, 71, 71, 74, 48, 47, 46, 46, 45, 44, 44, 45, 45, 45, 45, 45,
+        46, 46, 47, 51, 51, 52, 55, 57, 57, 59, 61, 61, 63, 67, 67, 68, 70, 71,
+        71, 74, 51, 50, 49, 49, 48, 47, 47, 47, 48, 48, 48, 48, 48, 48, 50, 53,
+        53, 54, 57, 58, 58, 61, 63, 63, 66, 69, 69, 70, 73, 74, 74, 77,
+        /* Size 4x8 */
+        31, 32, 35, 43, 32, 33, 34, 41, 32, 34, 36, 42, 32, 35, 38, 42, 34, 37,
+        43, 49, 37, 40, 49, 56, 42, 43, 53, 63, 46, 46, 56, 67,
+        /* Size 8x4 */
+        31, 32, 32, 32, 34, 37, 42, 46, 32, 33, 34, 35, 37, 40, 43, 46, 35, 34,
+        36, 38, 43, 49, 53, 56, 43, 41, 42, 42, 49, 56, 63, 67,
+        /* Size 8x16 */
+        32, 31, 31, 32, 35, 36, 44, 47, 31, 32, 32, 32, 35, 35, 42, 45, 31, 32,
+        32, 32, 34, 35, 41, 45, 31, 32, 32, 33, 34, 34, 41, 44, 31, 32, 33, 34,
+        35, 36, 42, 44, 32, 32, 33, 34, 36, 36, 42, 45, 32, 33, 34, 35, 37, 38,
+        42, 45, 32, 33, 34, 36, 39, 40, 44, 47, 34, 34, 35, 37, 41, 42, 48, 50,
+        35, 34, 36, 38, 45, 47, 52, 55, 36, 34, 36, 38, 46, 48, 54, 56, 39, 37,
+        39, 40, 48, 50, 58, 60, 41, 39, 40, 41, 49, 51, 60, 62, 44, 41, 42, 43,
+        51, 53, 63, 66, 47, 44, 44, 45, 53, 56, 66, 69, 48, 45, 45, 46, 54, 56,
+        67, 70,
+        /* Size 16x8 */
+        32, 31, 31, 31, 31, 32, 32, 32, 34, 35, 36, 39, 41, 44, 47, 48, 31, 32,
+        32, 32, 32, 32, 33, 33, 34, 34, 34, 37, 39, 41, 44, 45, 31, 32, 32, 32,
+        33, 33, 34, 34, 35, 36, 36, 39, 40, 42, 44, 45, 32, 32, 32, 33, 34, 34,
+        35, 36, 37, 38, 38, 40, 41, 43, 45, 46, 35, 35, 34, 34, 35, 36, 37, 39,
+        41, 45, 46, 48, 49, 51, 53, 54, 36, 35, 35, 34, 36, 36, 38, 40, 42, 47,
+        48, 50, 51, 53, 56, 56, 44, 42, 41, 41, 42, 42, 42, 44, 48, 52, 54, 58,
+        60, 63, 66, 67, 47, 45, 45, 44, 44, 45, 45, 47, 50, 55, 56, 60, 62, 66,
+        69, 70,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 32, 32, 32, 35, 36, 36, 40, 44, 44, 47, 53, 31, 31,
+        32, 32, 32, 32, 32, 33, 35, 35, 35, 39, 43, 43, 46, 52, 31, 32, 32, 32,
+        32, 32, 32, 33, 35, 35, 35, 39, 42, 42, 45, 51, 31, 32, 32, 32, 32, 32,
+        32, 33, 35, 35, 35, 39, 42, 42, 45, 51, 31, 32, 32, 32, 32, 32, 32, 33,
+        34, 35, 35, 39, 41, 41, 45, 50, 31, 32, 32, 32, 32, 33, 33, 33, 34, 34,
+        34, 38, 41, 41, 44, 49, 31, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 38,
+        41, 41, 44, 49, 31, 32, 32, 32, 32, 33, 33, 33, 34, 35, 35, 38, 41, 41,
+        44, 49, 31, 32, 32, 32, 33, 34, 34, 34, 35, 36, 36, 39, 42, 42, 44, 49,
+        32, 32, 32, 32, 33, 34, 34, 34, 36, 36, 36, 39, 42, 42, 45, 50, 32, 32,
+        32, 32, 33, 34, 34, 34, 36, 36, 36, 39, 42, 42, 45, 50, 32, 32, 32, 32,
+        33, 35, 35, 35, 37, 37, 37, 40, 42, 42, 45, 49, 32, 32, 33, 33, 34, 35,
+        35, 36, 37, 38, 38, 41, 42, 42, 45, 49, 32, 32, 33, 33, 34, 35, 35, 36,
+        37, 38, 38, 41, 42, 42, 45, 49, 32, 33, 33, 33, 34, 36, 36, 36, 39, 40,
+        40, 42, 44, 44, 47, 51, 34, 34, 34, 34, 35, 37, 37, 38, 41, 42, 42, 45,
+        48, 48, 50, 54, 34, 34, 34, 34, 35, 37, 37, 38, 41, 42, 42, 45, 48, 48,
+        50, 54, 34, 34, 34, 34, 35, 37, 37, 38, 42, 43, 43, 46, 49, 49, 51, 55,
+        35, 35, 34, 34, 36, 38, 38, 39, 45, 47, 47, 50, 52, 52, 55, 59, 36, 35,
+        34, 34, 36, 38, 38, 40, 46, 48, 48, 51, 54, 54, 56, 60, 36, 35, 34, 34,
+        36, 38, 38, 40, 46, 48, 48, 51, 54, 54, 56, 60, 38, 37, 36, 36, 37, 40,
+        40, 41, 47, 49, 49, 53, 56, 56, 58, 63, 39, 38, 37, 37, 39, 40, 40, 42,
+        48, 50, 50, 54, 58, 58, 60, 65, 39, 38, 37, 37, 39, 40, 40, 42, 48, 50,
+        50, 54, 58, 58, 60, 65, 41, 40, 39, 39, 40, 41, 41, 43, 49, 51, 51, 56,
+        60, 60, 62, 67, 44, 42, 41, 41, 42, 43, 43, 45, 51, 53, 53, 59, 63, 63,
+        66, 71, 44, 42, 41, 41, 42, 43, 43, 45, 51, 53, 53, 59, 63, 63, 66, 71,
+        44, 43, 42, 42, 42, 43, 43, 45, 51, 54, 54, 59, 64, 64, 67, 72, 47, 45,
+        44, 44, 44, 45, 45, 47, 53, 56, 56, 61, 66, 66, 69, 75, 48, 46, 45, 45,
+        45, 46, 46, 48, 54, 56, 56, 62, 67, 67, 70, 76, 48, 46, 45, 45, 45, 46,
+        46, 48, 54, 56, 56, 62, 67, 67, 70, 76, 51, 49, 47, 47, 48, 48, 48, 50,
+        56, 58, 58, 64, 69, 69, 73, 79,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 34,
+        35, 36, 36, 38, 39, 39, 41, 44, 44, 44, 47, 48, 48, 51, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 35, 35, 35, 37,
+        38, 38, 40, 42, 42, 43, 45, 46, 46, 49, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 34, 36, 37, 37, 39, 41,
+        41, 42, 44, 45, 45, 47, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 34, 34, 34, 34, 34, 34, 36, 37, 37, 39, 41, 41, 42, 44, 45,
+        45, 47, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 35,
+        35, 35, 36, 36, 36, 37, 39, 39, 40, 42, 42, 42, 44, 45, 45, 48, 32, 32,
+        32, 32, 32, 33, 33, 33, 34, 34, 34, 35, 35, 35, 36, 37, 37, 37, 38, 38,
+        38, 40, 40, 40, 41, 43, 43, 43, 45, 46, 46, 48, 32, 32, 32, 32, 32, 33,
+        33, 33, 34, 34, 34, 35, 35, 35, 36, 37, 37, 37, 38, 38, 38, 40, 40, 40,
+        41, 43, 43, 43, 45, 46, 46, 48, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34,
+        34, 35, 36, 36, 36, 38, 38, 38, 39, 40, 40, 41, 42, 42, 43, 45, 45, 45,
+        47, 48, 48, 50, 35, 35, 35, 35, 34, 34, 34, 34, 35, 36, 36, 37, 37, 37,
+        39, 41, 41, 42, 45, 46, 46, 47, 48, 48, 49, 51, 51, 51, 53, 54, 54, 56,
+        36, 35, 35, 35, 35, 34, 34, 35, 36, 36, 36, 37, 38, 38, 40, 42, 42, 43,
+        47, 48, 48, 49, 50, 50, 51, 53, 53, 54, 56, 56, 56, 58, 36, 35, 35, 35,
+        35, 34, 34, 35, 36, 36, 36, 37, 38, 38, 40, 42, 42, 43, 47, 48, 48, 49,
+        50, 50, 51, 53, 53, 54, 56, 56, 56, 58, 40, 39, 39, 39, 39, 38, 38, 38,
+        39, 39, 39, 40, 41, 41, 42, 45, 45, 46, 50, 51, 51, 53, 54, 54, 56, 59,
+        59, 59, 61, 62, 62, 64, 44, 43, 42, 42, 41, 41, 41, 41, 42, 42, 42, 42,
+        42, 42, 44, 48, 48, 49, 52, 54, 54, 56, 58, 58, 60, 63, 63, 64, 66, 67,
+        67, 69, 44, 43, 42, 42, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 44, 48,
+        48, 49, 52, 54, 54, 56, 58, 58, 60, 63, 63, 64, 66, 67, 67, 69, 47, 46,
+        45, 45, 45, 44, 44, 44, 44, 45, 45, 45, 45, 45, 47, 50, 50, 51, 55, 56,
+        56, 58, 60, 60, 62, 66, 66, 67, 69, 70, 70, 73, 53, 52, 51, 51, 50, 49,
+        49, 49, 49, 50, 50, 49, 49, 49, 51, 54, 54, 55, 59, 60, 60, 63, 65, 65,
+        67, 71, 71, 72, 75, 76, 76, 79,
+        /* Size 4x16 */
+        31, 32, 36, 44, 32, 32, 35, 42, 32, 32, 35, 41, 32, 33, 34, 41, 32, 34,
+        36, 42, 32, 34, 36, 42, 32, 35, 38, 42, 33, 36, 40, 44, 34, 37, 42, 48,
+        35, 38, 47, 52, 35, 38, 48, 54, 38, 40, 50, 58, 40, 41, 51, 60, 42, 43,
+        53, 63, 45, 45, 56, 66, 46, 46, 56, 67,
+        /* Size 16x4 */
+        31, 32, 32, 32, 32, 32, 32, 33, 34, 35, 35, 38, 40, 42, 45, 46, 32, 32,
+        32, 33, 34, 34, 35, 36, 37, 38, 38, 40, 41, 43, 45, 46, 36, 35, 35, 34,
+        36, 36, 38, 40, 42, 47, 48, 50, 51, 53, 56, 56, 44, 42, 41, 41, 42, 42,
+        42, 44, 48, 52, 54, 58, 60, 63, 66, 67,
+        /* Size 8x32 */
+        32, 31, 31, 32, 35, 36, 44, 47, 31, 32, 32, 32, 35, 35, 43, 46, 31, 32,
+        32, 32, 35, 35, 42, 45, 31, 32, 32, 32, 35, 35, 42, 45, 31, 32, 32, 32,
+        34, 35, 41, 45, 31, 32, 32, 33, 34, 34, 41, 44, 31, 32, 32, 33, 34, 34,
+        41, 44, 31, 32, 32, 33, 34, 35, 41, 44, 31, 32, 33, 34, 35, 36, 42, 44,
+        32, 32, 33, 34, 36, 36, 42, 45, 32, 32, 33, 34, 36, 36, 42, 45, 32, 32,
+        33, 35, 37, 37, 42, 45, 32, 33, 34, 35, 37, 38, 42, 45, 32, 33, 34, 35,
+        37, 38, 42, 45, 32, 33, 34, 36, 39, 40, 44, 47, 34, 34, 35, 37, 41, 42,
+        48, 50, 34, 34, 35, 37, 41, 42, 48, 50, 34, 34, 35, 37, 42, 43, 49, 51,
+        35, 34, 36, 38, 45, 47, 52, 55, 36, 34, 36, 38, 46, 48, 54, 56, 36, 34,
+        36, 38, 46, 48, 54, 56, 38, 36, 37, 40, 47, 49, 56, 58, 39, 37, 39, 40,
+        48, 50, 58, 60, 39, 37, 39, 40, 48, 50, 58, 60, 41, 39, 40, 41, 49, 51,
+        60, 62, 44, 41, 42, 43, 51, 53, 63, 66, 44, 41, 42, 43, 51, 53, 63, 66,
+        44, 42, 42, 43, 51, 54, 64, 67, 47, 44, 44, 45, 53, 56, 66, 69, 48, 45,
+        45, 46, 54, 56, 67, 70, 48, 45, 45, 46, 54, 56, 67, 70, 51, 47, 48, 48,
+        56, 58, 69, 73,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 34,
+        35, 36, 36, 38, 39, 39, 41, 44, 44, 44, 47, 48, 48, 51, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 34, 36,
+        37, 37, 39, 41, 41, 42, 44, 45, 45, 47, 31, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 34, 34, 34, 35, 35, 35, 36, 36, 36, 37, 39, 39, 40, 42,
+        42, 42, 44, 45, 45, 48, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 35,
+        35, 35, 36, 37, 37, 37, 38, 38, 38, 40, 40, 40, 41, 43, 43, 43, 45, 46,
+        46, 48, 35, 35, 35, 35, 34, 34, 34, 34, 35, 36, 36, 37, 37, 37, 39, 41,
+        41, 42, 45, 46, 46, 47, 48, 48, 49, 51, 51, 51, 53, 54, 54, 56, 36, 35,
+        35, 35, 35, 34, 34, 35, 36, 36, 36, 37, 38, 38, 40, 42, 42, 43, 47, 48,
+        48, 49, 50, 50, 51, 53, 53, 54, 56, 56, 56, 58, 44, 43, 42, 42, 41, 41,
+        41, 41, 42, 42, 42, 42, 42, 42, 44, 48, 48, 49, 52, 54, 54, 56, 58, 58,
+        60, 63, 63, 64, 66, 67, 67, 69, 47, 46, 45, 45, 45, 44, 44, 44, 44, 45,
+        45, 45, 45, 45, 47, 50, 50, 51, 55, 56, 56, 58, 60, 60, 62, 66, 66, 67,
+        69, 70, 70, 73 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 37, 47, 47, 37, 44, 47, 45, 47, 47, 53, 53, 47, 45, 53, 59,
+        /* Size 8x8 */
+        31, 31, 34, 37, 43, 48, 47, 49, 31, 32, 35, 40, 43, 46, 45, 46, 34, 35,
+        39, 43, 45, 46, 45, 46, 37, 40, 43, 47, 47, 47, 45, 46, 43, 43, 45, 47,
+        49, 50, 50, 50, 48, 46, 46, 47, 50, 53, 55, 55, 47, 45, 45, 45, 50, 55,
+        58, 60, 49, 46, 46, 46, 50, 55, 60, 61,
+        /* Size 16x16 */
+        32, 31, 31, 30, 33, 33, 36, 38, 41, 47, 49, 48, 49, 49, 50, 50, 31, 31,
+        31, 31, 34, 34, 38, 40, 42, 46, 47, 47, 47, 47, 48, 48, 31, 31, 31, 31,
+        34, 35, 39, 40, 42, 46, 47, 46, 46, 46, 47, 47, 30, 31, 31, 32, 34, 35,
+        40, 41, 42, 45, 46, 45, 45, 45, 46, 46, 33, 34, 34, 34, 37, 38, 42, 43,
+        44, 46, 47, 46, 46, 45, 46, 46, 33, 34, 35, 35, 38, 39, 43, 44, 45, 47,
+        47, 46, 46, 45, 46, 46, 36, 38, 39, 40, 42, 43, 47, 47, 47, 47, 48, 46,
+        46, 45, 46, 46, 38, 40, 40, 41, 43, 44, 47, 47, 48, 48, 49, 48, 47, 47,
+        47, 47, 41, 42, 42, 42, 44, 45, 47, 48, 48, 50, 50, 49, 49, 49, 50, 50,
+        47, 46, 46, 45, 46, 47, 47, 48, 50, 52, 52, 52, 52, 52, 53, 53, 49, 47,
+        47, 46, 47, 47, 48, 49, 50, 52, 53, 53, 53, 53, 54, 54, 48, 47, 46, 45,
+        46, 46, 46, 48, 49, 52, 53, 54, 55, 55, 56, 56, 49, 47, 46, 45, 46, 46,
+        46, 47, 49, 52, 53, 55, 55, 57, 57, 58, 49, 47, 46, 45, 45, 45, 45, 47,
+        49, 52, 53, 55, 57, 58, 59, 60, 50, 48, 47, 46, 46, 46, 46, 47, 50, 53,
+        54, 56, 57, 59, 61, 61, 50, 48, 47, 46, 46, 46, 46, 47, 50, 53, 54, 56,
+        58, 60, 61, 61,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 30, 30, 31, 33, 33, 33, 35, 36, 36, 38, 41, 41, 43,
+        47, 49, 49, 49, 48, 48, 49, 49, 49, 49, 50, 50, 50, 51, 31, 31, 31, 31,
+        31, 31, 31, 31, 33, 34, 34, 36, 37, 37, 39, 42, 42, 43, 47, 48, 48, 48,
+        47, 47, 47, 47, 47, 48, 49, 49, 49, 50, 31, 31, 31, 31, 31, 31, 31, 32,
+        34, 34, 34, 37, 38, 38, 40, 42, 42, 43, 46, 47, 47, 47, 47, 47, 47, 47,
+        47, 47, 48, 48, 48, 49, 31, 31, 31, 31, 31, 31, 31, 32, 34, 34, 34, 37,
+        38, 38, 40, 42, 42, 43, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48,
+        48, 49, 31, 31, 31, 31, 31, 31, 31, 32, 34, 35, 35, 37, 39, 39, 40, 42,
+        42, 43, 46, 47, 47, 46, 46, 46, 46, 46, 46, 46, 47, 47, 47, 48, 30, 31,
+        31, 31, 31, 32, 32, 32, 34, 35, 35, 38, 40, 40, 41, 42, 42, 43, 45, 46,
+        46, 46, 45, 45, 45, 45, 45, 45, 46, 46, 46, 47, 30, 31, 31, 31, 31, 32,
+        32, 32, 34, 35, 35, 38, 40, 40, 41, 42, 42, 43, 45, 46, 46, 46, 45, 45,
+        45, 45, 45, 45, 46, 46, 46, 47, 31, 31, 32, 32, 32, 32, 32, 33, 35, 36,
+        36, 38, 40, 40, 41, 43, 43, 43, 46, 46, 46, 46, 45, 45, 45, 45, 45, 45,
+        46, 46, 46, 47, 33, 33, 34, 34, 34, 34, 34, 35, 37, 38, 38, 41, 42, 42,
+        43, 44, 44, 45, 46, 47, 47, 46, 46, 46, 46, 45, 45, 45, 46, 46, 46, 47,
+        33, 34, 34, 34, 35, 35, 35, 36, 38, 39, 39, 41, 43, 43, 44, 45, 45, 45,
+        47, 47, 47, 46, 46, 46, 46, 45, 45, 45, 46, 46, 46, 47, 33, 34, 34, 34,
+        35, 35, 35, 36, 38, 39, 39, 41, 43, 43, 44, 45, 45, 45, 47, 47, 47, 46,
+        46, 46, 46, 45, 45, 45, 46, 46, 46, 47, 35, 36, 37, 37, 37, 38, 38, 38,
+        41, 41, 41, 44, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 46, 46, 46, 45,
+        45, 45, 46, 46, 46, 47, 36, 37, 38, 38, 39, 40, 40, 40, 42, 43, 43, 46,
+        47, 47, 47, 47, 47, 47, 47, 48, 48, 47, 46, 46, 46, 45, 45, 45, 46, 46,
+        46, 46, 36, 37, 38, 38, 39, 40, 40, 40, 42, 43, 43, 46, 47, 47, 47, 47,
+        47, 47, 47, 48, 48, 47, 46, 46, 46, 45, 45, 45, 46, 46, 46, 46, 38, 39,
+        40, 40, 40, 41, 41, 41, 43, 44, 44, 46, 47, 47, 47, 48, 48, 48, 48, 49,
+        49, 48, 48, 48, 47, 47, 47, 47, 47, 47, 47, 48, 41, 42, 42, 42, 42, 42,
+        42, 43, 44, 45, 45, 46, 47, 47, 48, 48, 48, 49, 50, 50, 50, 50, 49, 49,
+        49, 49, 49, 49, 50, 50, 50, 50, 41, 42, 42, 42, 42, 42, 42, 43, 44, 45,
+        45, 46, 47, 47, 48, 48, 48, 49, 50, 50, 50, 50, 49, 49, 49, 49, 49, 49,
+        50, 50, 50, 50, 43, 43, 43, 43, 43, 43, 43, 43, 45, 45, 45, 46, 47, 47,
+        48, 49, 49, 49, 50, 51, 51, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 51,
+        47, 47, 46, 46, 46, 45, 45, 46, 46, 47, 47, 47, 47, 47, 48, 50, 50, 50,
+        52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53, 49, 48, 47, 47,
+        47, 46, 46, 46, 47, 47, 47, 47, 48, 48, 49, 50, 50, 51, 52, 53, 53, 53,
+        53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 49, 48, 47, 47, 47, 46, 46, 46,
+        47, 47, 47, 47, 48, 48, 49, 50, 50, 51, 52, 53, 53, 53, 53, 53, 53, 53,
+        53, 53, 54, 54, 54, 54, 49, 48, 47, 47, 46, 46, 46, 46, 46, 46, 46, 47,
+        47, 47, 48, 50, 50, 50, 52, 53, 53, 53, 54, 54, 54, 55, 55, 55, 55, 55,
+        55, 56, 48, 47, 47, 47, 46, 45, 45, 45, 46, 46, 46, 46, 46, 46, 48, 49,
+        49, 50, 52, 53, 53, 54, 54, 54, 55, 55, 55, 56, 56, 56, 56, 57, 48, 47,
+        47, 47, 46, 45, 45, 45, 46, 46, 46, 46, 46, 46, 48, 49, 49, 50, 52, 53,
+        53, 54, 54, 54, 55, 55, 55, 56, 56, 56, 56, 57, 49, 47, 47, 47, 46, 45,
+        45, 45, 46, 46, 46, 46, 46, 46, 47, 49, 49, 50, 52, 53, 53, 54, 55, 55,
+        55, 57, 57, 57, 57, 58, 58, 58, 49, 47, 47, 47, 46, 45, 45, 45, 45, 45,
+        45, 45, 45, 45, 47, 49, 49, 50, 52, 53, 53, 55, 55, 55, 57, 58, 58, 59,
+        59, 60, 60, 60, 49, 47, 47, 47, 46, 45, 45, 45, 45, 45, 45, 45, 45, 45,
+        47, 49, 49, 50, 52, 53, 53, 55, 55, 55, 57, 58, 58, 59, 59, 60, 60, 60,
+        49, 48, 47, 47, 46, 45, 45, 45, 45, 45, 45, 45, 45, 45, 47, 49, 49, 50,
+        52, 53, 53, 55, 56, 56, 57, 59, 59, 59, 60, 60, 60, 61, 50, 49, 48, 48,
+        47, 46, 46, 46, 46, 46, 46, 46, 46, 46, 47, 50, 50, 50, 53, 54, 54, 55,
+        56, 56, 57, 59, 59, 60, 61, 61, 61, 62, 50, 49, 48, 48, 47, 46, 46, 46,
+        46, 46, 46, 46, 46, 46, 47, 50, 50, 50, 53, 54, 54, 55, 56, 56, 58, 60,
+        60, 60, 61, 61, 61, 63, 50, 49, 48, 48, 47, 46, 46, 46, 46, 46, 46, 46,
+        46, 46, 47, 50, 50, 50, 53, 54, 54, 55, 56, 56, 58, 60, 60, 60, 61, 61,
+        61, 63, 51, 50, 49, 49, 48, 47, 47, 47, 47, 47, 47, 47, 46, 46, 48, 50,
+        50, 51, 53, 54, 54, 56, 57, 57, 58, 60, 60, 61, 62, 63, 63, 64,
+        /* Size 4x8 */
+        31, 38, 47, 48, 31, 40, 46, 45, 35, 43, 47, 46, 39, 47, 47, 45, 43, 47,
+        50, 50, 47, 47, 53, 55, 46, 46, 53, 58, 48, 46, 54, 59,
+        /* Size 8x4 */
+        31, 31, 35, 39, 43, 47, 46, 48, 38, 40, 43, 47, 47, 47, 46, 46, 47, 46,
+        47, 47, 50, 53, 53, 54, 48, 45, 46, 45, 50, 55, 58, 59,
+        /* Size 8x16 */
+        32, 31, 33, 37, 45, 48, 49, 50, 31, 31, 34, 38, 45, 47, 47, 48, 31, 32,
+        34, 39, 45, 46, 46, 47, 30, 32, 35, 40, 44, 46, 45, 46, 33, 35, 37, 42,
+        46, 47, 45, 46, 33, 36, 38, 43, 46, 47, 46, 46, 37, 40, 43, 47, 47, 47,
+        45, 46, 39, 41, 43, 47, 48, 48, 47, 47, 42, 43, 44, 47, 49, 50, 49, 50,
+        47, 46, 46, 48, 51, 52, 53, 53, 49, 46, 47, 48, 52, 53, 53, 54, 48, 46,
+        46, 47, 51, 53, 56, 56, 48, 45, 46, 46, 51, 53, 57, 57, 49, 45, 45, 46,
+        51, 53, 58, 59, 50, 46, 46, 46, 52, 54, 59, 61, 50, 46, 46, 46, 52, 54,
+        59, 61,
+        /* Size 16x8 */
+        32, 31, 31, 30, 33, 33, 37, 39, 42, 47, 49, 48, 48, 49, 50, 50, 31, 31,
+        32, 32, 35, 36, 40, 41, 43, 46, 46, 46, 45, 45, 46, 46, 33, 34, 34, 35,
+        37, 38, 43, 43, 44, 46, 47, 46, 46, 45, 46, 46, 37, 38, 39, 40, 42, 43,
+        47, 47, 47, 48, 48, 47, 46, 46, 46, 46, 45, 45, 45, 44, 46, 46, 47, 48,
+        49, 51, 52, 51, 51, 51, 52, 52, 48, 47, 46, 46, 47, 47, 47, 48, 50, 52,
+        53, 53, 53, 53, 54, 54, 49, 47, 46, 45, 45, 46, 45, 47, 49, 53, 53, 56,
+        57, 58, 59, 59, 50, 48, 47, 46, 46, 46, 46, 47, 50, 53, 54, 56, 57, 59,
+        61, 61,
+        /* Size 16x32 */
+        32, 31, 31, 31, 33, 37, 37, 38, 45, 48, 48, 49, 49, 49, 50, 52, 31, 31,
+        31, 31, 33, 38, 38, 39, 45, 47, 47, 48, 48, 48, 49, 51, 31, 31, 31, 31,
+        34, 38, 38, 40, 45, 47, 47, 47, 47, 47, 48, 50, 31, 31, 31, 31, 34, 38,
+        38, 40, 45, 47, 47, 47, 47, 47, 48, 50, 31, 31, 32, 32, 34, 39, 39, 40,
+        45, 46, 46, 46, 46, 46, 47, 49, 30, 31, 32, 32, 35, 40, 40, 41, 44, 46,
+        46, 45, 45, 45, 46, 48, 30, 31, 32, 32, 35, 40, 40, 41, 44, 46, 46, 45,
+        45, 45, 46, 48, 31, 32, 33, 33, 35, 40, 40, 41, 45, 46, 46, 45, 45, 45,
+        46, 48, 33, 34, 35, 35, 37, 42, 42, 43, 46, 47, 47, 46, 45, 45, 46, 47,
+        33, 35, 36, 36, 38, 43, 43, 44, 46, 47, 47, 46, 46, 46, 46, 47, 33, 35,
+        36, 36, 38, 43, 43, 44, 46, 47, 47, 46, 46, 46, 46, 47, 35, 37, 38, 38,
+        41, 45, 45, 46, 47, 47, 47, 46, 45, 45, 46, 47, 37, 39, 40, 40, 43, 47,
+        47, 47, 47, 47, 47, 46, 45, 45, 46, 47, 37, 39, 40, 40, 43, 47, 47, 47,
+        47, 47, 47, 46, 45, 45, 46, 47, 39, 40, 41, 41, 43, 47, 47, 47, 48, 48,
+        48, 47, 47, 47, 47, 48, 42, 42, 43, 43, 44, 47, 47, 48, 49, 50, 50, 49,
+        49, 49, 50, 50, 42, 42, 43, 43, 44, 47, 47, 48, 49, 50, 50, 49, 49, 49,
+        50, 50, 43, 43, 43, 43, 45, 47, 47, 48, 50, 50, 50, 50, 50, 50, 50, 51,
+        47, 46, 46, 46, 46, 48, 48, 48, 51, 52, 52, 52, 53, 53, 53, 53, 49, 47,
+        46, 46, 47, 48, 48, 49, 52, 53, 53, 53, 53, 53, 54, 54, 49, 47, 46, 46,
+        47, 48, 48, 49, 52, 53, 53, 53, 53, 53, 54, 54, 48, 47, 46, 46, 46, 47,
+        47, 48, 52, 53, 53, 54, 55, 55, 55, 56, 48, 47, 46, 46, 46, 47, 47, 48,
+        51, 53, 53, 54, 56, 56, 56, 57, 48, 47, 46, 46, 46, 47, 47, 48, 51, 53,
+        53, 54, 56, 56, 56, 57, 48, 47, 45, 45, 46, 46, 46, 47, 51, 53, 53, 55,
+        57, 57, 57, 59, 49, 46, 45, 45, 45, 46, 46, 47, 51, 53, 53, 56, 58, 58,
+        59, 61, 49, 46, 45, 45, 45, 46, 46, 47, 51, 53, 53, 56, 58, 58, 59, 61,
+        49, 47, 45, 45, 45, 46, 46, 47, 52, 53, 53, 56, 58, 58, 60, 62, 50, 48,
+        46, 46, 46, 46, 46, 48, 52, 54, 54, 57, 59, 59, 61, 63, 50, 48, 46, 46,
+        46, 46, 46, 48, 52, 54, 54, 57, 59, 59, 61, 64, 50, 48, 46, 46, 46, 46,
+        46, 48, 52, 54, 54, 57, 59, 59, 61, 64, 51, 49, 47, 47, 47, 47, 47, 48,
+        52, 54, 54, 58, 60, 60, 62, 65,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 30, 30, 31, 33, 33, 33, 35, 37, 37, 39, 42, 42, 43,
+        47, 49, 49, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50, 51, 31, 31, 31, 31,
+        31, 31, 31, 32, 34, 35, 35, 37, 39, 39, 40, 42, 42, 43, 46, 47, 47, 47,
+        47, 47, 47, 46, 46, 47, 48, 48, 48, 49, 31, 31, 31, 31, 32, 32, 32, 33,
+        35, 36, 36, 38, 40, 40, 41, 43, 43, 43, 46, 46, 46, 46, 46, 46, 45, 45,
+        45, 45, 46, 46, 46, 47, 31, 31, 31, 31, 32, 32, 32, 33, 35, 36, 36, 38,
+        40, 40, 41, 43, 43, 43, 46, 46, 46, 46, 46, 46, 45, 45, 45, 45, 46, 46,
+        46, 47, 33, 33, 34, 34, 34, 35, 35, 35, 37, 38, 38, 41, 43, 43, 43, 44,
+        44, 45, 46, 47, 47, 46, 46, 46, 46, 45, 45, 45, 46, 46, 46, 47, 37, 38,
+        38, 38, 39, 40, 40, 40, 42, 43, 43, 45, 47, 47, 47, 47, 47, 47, 48, 48,
+        48, 47, 47, 47, 46, 46, 46, 46, 46, 46, 46, 47, 37, 38, 38, 38, 39, 40,
+        40, 40, 42, 43, 43, 45, 47, 47, 47, 47, 47, 47, 48, 48, 48, 47, 47, 47,
+        46, 46, 46, 46, 46, 46, 46, 47, 38, 39, 40, 40, 40, 41, 41, 41, 43, 44,
+        44, 46, 47, 47, 47, 48, 48, 48, 48, 49, 49, 48, 48, 48, 47, 47, 47, 47,
+        48, 48, 48, 48, 45, 45, 45, 45, 45, 44, 44, 45, 46, 46, 46, 47, 47, 47,
+        48, 49, 49, 50, 51, 52, 52, 52, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52,
+        48, 47, 47, 47, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 50, 50, 50,
+        52, 53, 53, 53, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 48, 47, 47, 47,
+        46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 50, 50, 50, 52, 53, 53, 53,
+        53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 49, 48, 47, 47, 46, 45, 45, 45,
+        46, 46, 46, 46, 46, 46, 47, 49, 49, 50, 52, 53, 53, 54, 54, 54, 55, 56,
+        56, 56, 57, 57, 57, 58, 49, 48, 47, 47, 46, 45, 45, 45, 45, 46, 46, 45,
+        45, 45, 47, 49, 49, 50, 53, 53, 53, 55, 56, 56, 57, 58, 58, 58, 59, 59,
+        59, 60, 49, 48, 47, 47, 46, 45, 45, 45, 45, 46, 46, 45, 45, 45, 47, 49,
+        49, 50, 53, 53, 53, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59, 60, 50, 49,
+        48, 48, 47, 46, 46, 46, 46, 46, 46, 46, 46, 46, 47, 50, 50, 50, 53, 54,
+        54, 55, 56, 56, 57, 59, 59, 60, 61, 61, 61, 62, 52, 51, 50, 50, 49, 48,
+        48, 48, 47, 47, 47, 47, 47, 47, 48, 50, 50, 51, 53, 54, 54, 56, 57, 57,
+        59, 61, 61, 62, 63, 64, 64, 65,
+        /* Size 4x16 */
+        31, 37, 48, 49, 31, 38, 47, 47, 31, 39, 46, 46, 31, 40, 46, 45, 34, 42,
+        47, 45, 35, 43, 47, 46, 39, 47, 47, 45, 40, 47, 48, 47, 42, 47, 50, 49,
+        46, 48, 52, 53, 47, 48, 53, 53, 47, 47, 53, 56, 47, 46, 53, 57, 46, 46,
+        53, 58, 48, 46, 54, 59, 48, 46, 54, 59,
+        /* Size 16x4 */
+        31, 31, 31, 31, 34, 35, 39, 40, 42, 46, 47, 47, 47, 46, 48, 48, 37, 38,
+        39, 40, 42, 43, 47, 47, 47, 48, 48, 47, 46, 46, 46, 46, 48, 47, 46, 46,
+        47, 47, 47, 48, 50, 52, 53, 53, 53, 53, 54, 54, 49, 47, 46, 45, 45, 46,
+        45, 47, 49, 53, 53, 56, 57, 58, 59, 59,
+        /* Size 8x32 */
+        32, 31, 33, 37, 45, 48, 49, 50, 31, 31, 33, 38, 45, 47, 48, 49, 31, 31,
+        34, 38, 45, 47, 47, 48, 31, 31, 34, 38, 45, 47, 47, 48, 31, 32, 34, 39,
+        45, 46, 46, 47, 30, 32, 35, 40, 44, 46, 45, 46, 30, 32, 35, 40, 44, 46,
+        45, 46, 31, 33, 35, 40, 45, 46, 45, 46, 33, 35, 37, 42, 46, 47, 45, 46,
+        33, 36, 38, 43, 46, 47, 46, 46, 33, 36, 38, 43, 46, 47, 46, 46, 35, 38,
+        41, 45, 47, 47, 45, 46, 37, 40, 43, 47, 47, 47, 45, 46, 37, 40, 43, 47,
+        47, 47, 45, 46, 39, 41, 43, 47, 48, 48, 47, 47, 42, 43, 44, 47, 49, 50,
+        49, 50, 42, 43, 44, 47, 49, 50, 49, 50, 43, 43, 45, 47, 50, 50, 50, 50,
+        47, 46, 46, 48, 51, 52, 53, 53, 49, 46, 47, 48, 52, 53, 53, 54, 49, 46,
+        47, 48, 52, 53, 53, 54, 48, 46, 46, 47, 52, 53, 55, 55, 48, 46, 46, 47,
+        51, 53, 56, 56, 48, 46, 46, 47, 51, 53, 56, 56, 48, 45, 46, 46, 51, 53,
+        57, 57, 49, 45, 45, 46, 51, 53, 58, 59, 49, 45, 45, 46, 51, 53, 58, 59,
+        49, 45, 45, 46, 52, 53, 58, 60, 50, 46, 46, 46, 52, 54, 59, 61, 50, 46,
+        46, 46, 52, 54, 59, 61, 50, 46, 46, 46, 52, 54, 59, 61, 51, 47, 47, 47,
+        52, 54, 60, 62,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 30, 30, 31, 33, 33, 33, 35, 37, 37, 39, 42, 42, 43,
+        47, 49, 49, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50, 51, 31, 31, 31, 31,
+        32, 32, 32, 33, 35, 36, 36, 38, 40, 40, 41, 43, 43, 43, 46, 46, 46, 46,
+        46, 46, 45, 45, 45, 45, 46, 46, 46, 47, 33, 33, 34, 34, 34, 35, 35, 35,
+        37, 38, 38, 41, 43, 43, 43, 44, 44, 45, 46, 47, 47, 46, 46, 46, 46, 45,
+        45, 45, 46, 46, 46, 47, 37, 38, 38, 38, 39, 40, 40, 40, 42, 43, 43, 45,
+        47, 47, 47, 47, 47, 47, 48, 48, 48, 47, 47, 47, 46, 46, 46, 46, 46, 46,
+        46, 47, 45, 45, 45, 45, 45, 44, 44, 45, 46, 46, 46, 47, 47, 47, 48, 49,
+        49, 50, 51, 52, 52, 52, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 48, 47,
+        47, 47, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 50, 50, 50, 52, 53,
+        53, 53, 53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 49, 48, 47, 47, 46, 45,
+        45, 45, 45, 46, 46, 45, 45, 45, 47, 49, 49, 50, 53, 53, 53, 55, 56, 56,
+        57, 58, 58, 58, 59, 59, 59, 60, 50, 49, 48, 48, 47, 46, 46, 46, 46, 46,
+        46, 46, 46, 46, 47, 50, 50, 50, 53, 54, 54, 55, 56, 56, 57, 59, 59, 60,
+        61, 61, 61, 62 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 32, 34, 38, 32, 33, 35, 39, 34, 35, 39, 45, 38, 39, 45, 54,
+        /* Size 8x8 */
+        31, 31, 32, 32, 33, 34, 37, 41, 31, 32, 32, 32, 33, 34, 36, 39, 32, 32,
+        32, 33, 34, 35, 37, 40, 32, 32, 33, 34, 35, 36, 38, 41, 33, 33, 34, 35,
+        37, 39, 41, 44, 34, 34, 35, 36, 39, 43, 46, 49, 37, 36, 37, 38, 41, 46,
+        51, 54, 41, 39, 40, 41, 44, 49, 54, 58,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 31, 31, 32, 32, 34, 34, 36, 36, 39, 39, 44, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 34, 34, 35, 35, 38, 38, 42, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 34, 34, 35, 35, 38, 38, 42, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 34, 34, 37, 37, 41, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 34, 34, 37, 37, 41, 31, 32, 32, 32, 32, 33, 33, 34, 34, 35,
+        35, 36, 36, 39, 39, 42, 31, 32, 32, 32, 32, 33, 33, 34, 34, 35, 35, 36,
+        36, 39, 39, 42, 32, 32, 32, 32, 32, 34, 34, 35, 35, 37, 37, 38, 38, 40,
+        40, 42, 32, 32, 32, 32, 32, 34, 34, 35, 35, 37, 37, 38, 38, 40, 40, 42,
+        34, 34, 34, 33, 33, 35, 35, 37, 37, 39, 39, 42, 42, 45, 45, 47, 34, 34,
+        34, 33, 33, 35, 35, 37, 37, 39, 39, 42, 42, 45, 45, 47, 36, 35, 35, 34,
+        34, 36, 36, 38, 38, 42, 42, 48, 48, 50, 50, 54, 36, 35, 35, 34, 34, 36,
+        36, 38, 38, 42, 42, 48, 48, 50, 50, 54, 39, 38, 38, 37, 37, 39, 39, 40,
+        40, 45, 45, 50, 50, 54, 54, 58, 39, 38, 38, 37, 37, 39, 39, 40, 40, 45,
+        45, 50, 50, 54, 54, 58, 44, 42, 42, 41, 41, 42, 42, 42, 42, 47, 47, 54,
+        54, 58, 58, 63,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33,
+        34, 34, 34, 35, 36, 36, 36, 37, 39, 39, 39, 41, 44, 44, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34,
+        35, 35, 35, 37, 39, 39, 39, 41, 43, 43, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 37,
+        38, 38, 38, 40, 42, 42, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 37, 38, 38, 38, 40,
+        42, 42, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 34, 34, 34, 34, 35, 35, 35, 37, 38, 38, 38, 40, 42, 42, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34,
+        34, 34, 35, 35, 35, 36, 38, 38, 38, 39, 41, 41, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34,
+        34, 36, 37, 37, 37, 39, 41, 41, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 36, 37, 37,
+        37, 39, 41, 41, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 36, 37, 37, 37, 39, 41, 41,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 34,
+        34, 34, 34, 35, 35, 35, 35, 37, 38, 38, 38, 40, 41, 41, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 36,
+        36, 36, 36, 38, 39, 39, 39, 40, 42, 42, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 36, 36, 36, 36, 38,
+        39, 39, 39, 40, 42, 42, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        33, 33, 34, 34, 34, 34, 35, 35, 35, 36, 36, 36, 36, 38, 39, 39, 39, 40,
+        42, 42, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34,
+        34, 35, 36, 36, 36, 36, 37, 37, 37, 38, 40, 40, 40, 41, 42, 42, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 36, 37, 37,
+        37, 37, 38, 38, 38, 39, 40, 40, 40, 41, 42, 42, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 36, 37, 37, 37, 37, 38, 38,
+        38, 39, 40, 40, 40, 41, 42, 42, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        34, 34, 34, 34, 35, 35, 35, 36, 37, 37, 37, 37, 38, 38, 38, 39, 40, 40,
+        40, 41, 42, 42, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 35,
+        36, 36, 36, 37, 38, 38, 38, 39, 40, 40, 40, 41, 42, 42, 42, 44, 45, 45,
+        34, 34, 34, 34, 34, 34, 33, 33, 33, 34, 35, 35, 35, 36, 37, 37, 37, 38,
+        39, 39, 39, 41, 42, 42, 42, 44, 45, 45, 45, 46, 47, 47, 34, 34, 34, 34,
+        34, 34, 33, 33, 33, 34, 35, 35, 35, 36, 37, 37, 37, 38, 39, 39, 39, 41,
+        42, 42, 42, 44, 45, 45, 45, 46, 47, 47, 34, 34, 34, 34, 34, 34, 33, 33,
+        33, 34, 35, 35, 35, 36, 37, 37, 37, 38, 39, 39, 39, 41, 42, 42, 42, 44,
+        45, 45, 45, 46, 47, 47, 35, 34, 34, 34, 34, 34, 34, 34, 34, 35, 36, 36,
+        36, 36, 37, 37, 37, 39, 41, 41, 41, 43, 45, 45, 45, 46, 47, 47, 47, 49,
+        50, 50, 36, 35, 35, 35, 35, 35, 34, 34, 34, 35, 36, 36, 36, 37, 38, 38,
+        38, 40, 42, 42, 42, 45, 48, 48, 48, 49, 50, 50, 50, 52, 54, 54, 36, 35,
+        35, 35, 35, 35, 34, 34, 34, 35, 36, 36, 36, 37, 38, 38, 38, 40, 42, 42,
+        42, 45, 48, 48, 48, 49, 50, 50, 50, 52, 54, 54, 36, 35, 35, 35, 35, 35,
+        34, 34, 34, 35, 36, 36, 36, 37, 38, 38, 38, 40, 42, 42, 42, 45, 48, 48,
+        48, 49, 50, 50, 50, 52, 54, 54, 37, 37, 37, 37, 37, 36, 36, 36, 36, 37,
+        38, 38, 38, 38, 39, 39, 39, 41, 44, 44, 44, 46, 49, 49, 49, 51, 52, 52,
+        52, 54, 56, 56, 39, 39, 38, 38, 38, 38, 37, 37, 37, 38, 39, 39, 39, 40,
+        40, 40, 40, 42, 45, 45, 45, 47, 50, 50, 50, 52, 54, 54, 54, 56, 58, 58,
+        39, 39, 38, 38, 38, 38, 37, 37, 37, 38, 39, 39, 39, 40, 40, 40, 40, 42,
+        45, 45, 45, 47, 50, 50, 50, 52, 54, 54, 54, 56, 58, 58, 39, 39, 38, 38,
+        38, 38, 37, 37, 37, 38, 39, 39, 39, 40, 40, 40, 40, 42, 45, 45, 45, 47,
+        50, 50, 50, 52, 54, 54, 54, 56, 58, 58, 41, 41, 40, 40, 40, 39, 39, 39,
+        39, 40, 40, 40, 40, 41, 41, 41, 41, 44, 46, 46, 46, 49, 52, 52, 52, 54,
+        56, 56, 56, 58, 60, 60, 44, 43, 42, 42, 42, 41, 41, 41, 41, 41, 42, 42,
+        42, 42, 42, 42, 42, 45, 47, 47, 47, 50, 54, 54, 54, 56, 58, 58, 58, 60,
+        63, 63, 44, 43, 42, 42, 42, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42,
+        42, 45, 47, 47, 47, 50, 54, 54, 54, 56, 58, 58, 58, 60, 63, 63,
+        /* Size 4x8 */
+        31, 32, 34, 39, 32, 32, 34, 38, 32, 33, 34, 38, 32, 33, 36, 40, 33, 34,
+        38, 42, 34, 36, 41, 47, 37, 38, 44, 52, 40, 40, 46, 56,
+        /* Size 8x4 */
+        31, 32, 32, 32, 33, 34, 37, 40, 32, 32, 33, 33, 34, 36, 38, 40, 34, 34,
+        34, 36, 38, 41, 44, 46, 39, 38, 38, 40, 42, 47, 52, 56,
+        /* Size 8x16 */
+        32, 31, 31, 32, 32, 36, 36, 44, 31, 32, 32, 32, 32, 35, 35, 42, 31, 32,
+        32, 32, 32, 35, 35, 42, 31, 32, 32, 33, 33, 34, 34, 41, 31, 32, 32, 33,
+        33, 34, 34, 41, 32, 32, 32, 34, 34, 36, 36, 42, 32, 32, 32, 34, 34, 36,
+        36, 42, 32, 33, 33, 35, 35, 38, 38, 42, 32, 33, 33, 35, 35, 38, 38, 42,
+        34, 34, 34, 37, 37, 42, 42, 48, 34, 34, 34, 37, 37, 42, 42, 48, 36, 34,
+        34, 38, 38, 48, 48, 54, 36, 34, 34, 38, 38, 48, 48, 54, 39, 37, 37, 40,
+        40, 50, 50, 58, 39, 37, 37, 40, 40, 50, 50, 58, 44, 41, 41, 43, 43, 53,
+        53, 63,
+        /* Size 16x8 */
+        32, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 36, 36, 39, 39, 44, 31, 32,
+        32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 34, 37, 37, 41, 31, 32, 32, 32,
+        32, 32, 32, 33, 33, 34, 34, 34, 34, 37, 37, 41, 32, 32, 32, 33, 33, 34,
+        34, 35, 35, 37, 37, 38, 38, 40, 40, 43, 32, 32, 32, 33, 33, 34, 34, 35,
+        35, 37, 37, 38, 38, 40, 40, 43, 36, 35, 35, 34, 34, 36, 36, 38, 38, 42,
+        42, 48, 48, 50, 50, 53, 36, 35, 35, 34, 34, 36, 36, 38, 38, 42, 42, 48,
+        48, 50, 50, 53, 44, 42, 42, 41, 41, 42, 42, 42, 42, 48, 48, 54, 54, 58,
+        58, 63,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 32, 32, 32, 32, 34, 36, 36, 36, 39, 44, 44, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 34, 35, 35, 35, 39, 43, 43, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 34, 35, 35, 35, 38, 42, 42, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 34, 35, 35, 35, 38, 42, 42, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 34, 35, 35, 35, 38, 42, 42, 31, 32, 32, 32, 32, 32, 32, 32, 32, 34,
+        35, 35, 35, 38, 41, 41, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34,
+        34, 37, 41, 41, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 37,
+        41, 41, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 37, 41, 41,
+        31, 32, 32, 32, 32, 33, 33, 33, 33, 34, 35, 35, 35, 38, 41, 41, 32, 32,
+        32, 32, 32, 33, 34, 34, 34, 35, 36, 36, 36, 39, 42, 42, 32, 32, 32, 32,
+        32, 33, 34, 34, 34, 35, 36, 36, 36, 39, 42, 42, 32, 32, 32, 32, 32, 33,
+        34, 34, 34, 35, 36, 36, 36, 39, 42, 42, 32, 32, 32, 32, 32, 33, 34, 34,
+        34, 36, 37, 37, 37, 40, 42, 42, 32, 32, 33, 33, 33, 34, 35, 35, 35, 37,
+        38, 38, 38, 40, 42, 42, 32, 32, 33, 33, 33, 34, 35, 35, 35, 37, 38, 38,
+        38, 40, 42, 42, 32, 32, 33, 33, 33, 34, 35, 35, 35, 37, 38, 38, 38, 40,
+        42, 42, 33, 33, 33, 33, 33, 34, 36, 36, 36, 38, 40, 40, 40, 42, 45, 45,
+        34, 34, 34, 34, 34, 35, 37, 37, 37, 39, 42, 42, 42, 45, 48, 48, 34, 34,
+        34, 34, 34, 35, 37, 37, 37, 39, 42, 42, 42, 45, 48, 48, 34, 34, 34, 34,
+        34, 35, 37, 37, 37, 39, 42, 42, 42, 45, 48, 48, 35, 34, 34, 34, 34, 36,
+        37, 37, 37, 41, 45, 45, 45, 47, 50, 50, 36, 35, 34, 34, 34, 36, 38, 38,
+        38, 43, 48, 48, 48, 51, 54, 54, 36, 35, 34, 34, 34, 36, 38, 38, 38, 43,
+        48, 48, 48, 51, 54, 54, 36, 35, 34, 34, 34, 36, 38, 38, 38, 43, 48, 48,
+        48, 51, 54, 54, 37, 37, 36, 36, 36, 38, 39, 39, 39, 44, 49, 49, 49, 52,
+        56, 56, 39, 38, 37, 37, 37, 39, 40, 40, 40, 45, 50, 50, 50, 54, 58, 58,
+        39, 38, 37, 37, 37, 39, 40, 40, 40, 45, 50, 50, 50, 54, 58, 58, 39, 38,
+        37, 37, 37, 39, 40, 40, 40, 45, 50, 50, 50, 54, 58, 58, 41, 40, 39, 39,
+        39, 40, 42, 42, 42, 46, 52, 52, 52, 56, 60, 60, 44, 42, 41, 41, 41, 42,
+        43, 43, 43, 48, 53, 53, 53, 58, 63, 63, 44, 42, 41, 41, 41, 42, 43, 43,
+        43, 48, 53, 53, 53, 58, 63, 63,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33,
+        34, 34, 34, 35, 36, 36, 36, 37, 39, 39, 39, 41, 44, 44, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34,
+        35, 35, 35, 37, 38, 38, 38, 40, 42, 42, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 36,
+        37, 37, 37, 39, 41, 41, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 36, 37, 37, 37, 39,
+        41, 41, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 34, 34, 34, 34, 34, 34, 34, 36, 37, 37, 37, 39, 41, 41, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35,
+        35, 36, 36, 36, 36, 38, 39, 39, 39, 40, 42, 42, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 36, 37, 37, 37, 37, 38, 38,
+        38, 39, 40, 40, 40, 42, 43, 43, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        34, 34, 34, 34, 35, 35, 35, 36, 37, 37, 37, 37, 38, 38, 38, 39, 40, 40,
+        40, 42, 43, 43, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34,
+        35, 35, 35, 36, 37, 37, 37, 37, 38, 38, 38, 39, 40, 40, 40, 42, 43, 43,
+        34, 34, 34, 34, 34, 34, 33, 33, 33, 34, 35, 35, 35, 36, 37, 37, 37, 38,
+        39, 39, 39, 41, 43, 43, 43, 44, 45, 45, 45, 46, 48, 48, 36, 35, 35, 35,
+        35, 35, 34, 34, 34, 35, 36, 36, 36, 37, 38, 38, 38, 40, 42, 42, 42, 45,
+        48, 48, 48, 49, 50, 50, 50, 52, 53, 53, 36, 35, 35, 35, 35, 35, 34, 34,
+        34, 35, 36, 36, 36, 37, 38, 38, 38, 40, 42, 42, 42, 45, 48, 48, 48, 49,
+        50, 50, 50, 52, 53, 53, 36, 35, 35, 35, 35, 35, 34, 34, 34, 35, 36, 36,
+        36, 37, 38, 38, 38, 40, 42, 42, 42, 45, 48, 48, 48, 49, 50, 50, 50, 52,
+        53, 53, 39, 39, 38, 38, 38, 38, 37, 37, 37, 38, 39, 39, 39, 40, 40, 40,
+        40, 42, 45, 45, 45, 47, 51, 51, 51, 52, 54, 54, 54, 56, 58, 58, 44, 43,
+        42, 42, 42, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 42, 45, 48, 48,
+        48, 50, 54, 54, 54, 56, 58, 58, 58, 60, 63, 63, 44, 43, 42, 42, 42, 41,
+        41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 42, 45, 48, 48, 48, 50, 54, 54,
+        54, 56, 58, 58, 58, 60, 63, 63,
+        /* Size 4x16 */
+        31, 32, 34, 39, 32, 32, 34, 38, 32, 32, 34, 38, 32, 32, 33, 37, 32, 32,
+        33, 37, 32, 33, 35, 39, 32, 33, 35, 39, 32, 34, 37, 40, 32, 34, 37, 40,
+        34, 35, 39, 45, 34, 35, 39, 45, 35, 36, 43, 51, 35, 36, 43, 51, 38, 39,
+        45, 54, 38, 39, 45, 54, 42, 42, 48, 58,
+        /* Size 16x4 */
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 35, 35, 38, 38, 42, 32, 32,
+        32, 32, 32, 33, 33, 34, 34, 35, 35, 36, 36, 39, 39, 42, 34, 34, 34, 33,
+        33, 35, 35, 37, 37, 39, 39, 43, 43, 45, 45, 48, 39, 38, 38, 37, 37, 39,
+        39, 40, 40, 45, 45, 51, 51, 54, 54, 58,
+        /* Size 8x32 */
+        32, 31, 31, 32, 32, 36, 36, 44, 31, 31, 31, 32, 32, 35, 35, 43, 31, 32,
+        32, 32, 32, 35, 35, 42, 31, 32, 32, 32, 32, 35, 35, 42, 31, 32, 32, 32,
+        32, 35, 35, 42, 31, 32, 32, 32, 32, 35, 35, 41, 31, 32, 32, 33, 33, 34,
+        34, 41, 31, 32, 32, 33, 33, 34, 34, 41, 31, 32, 32, 33, 33, 34, 34, 41,
+        31, 32, 32, 33, 33, 35, 35, 41, 32, 32, 32, 34, 34, 36, 36, 42, 32, 32,
+        32, 34, 34, 36, 36, 42, 32, 32, 32, 34, 34, 36, 36, 42, 32, 32, 32, 34,
+        34, 37, 37, 42, 32, 33, 33, 35, 35, 38, 38, 42, 32, 33, 33, 35, 35, 38,
+        38, 42, 32, 33, 33, 35, 35, 38, 38, 42, 33, 33, 33, 36, 36, 40, 40, 45,
+        34, 34, 34, 37, 37, 42, 42, 48, 34, 34, 34, 37, 37, 42, 42, 48, 34, 34,
+        34, 37, 37, 42, 42, 48, 35, 34, 34, 37, 37, 45, 45, 50, 36, 34, 34, 38,
+        38, 48, 48, 54, 36, 34, 34, 38, 38, 48, 48, 54, 36, 34, 34, 38, 38, 48,
+        48, 54, 37, 36, 36, 39, 39, 49, 49, 56, 39, 37, 37, 40, 40, 50, 50, 58,
+        39, 37, 37, 40, 40, 50, 50, 58, 39, 37, 37, 40, 40, 50, 50, 58, 41, 39,
+        39, 42, 42, 52, 52, 60, 44, 41, 41, 43, 43, 53, 53, 63, 44, 41, 41, 43,
+        43, 53, 53, 63,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33,
+        34, 34, 34, 35, 36, 36, 36, 37, 39, 39, 39, 41, 44, 44, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34,
+        34, 34, 34, 36, 37, 37, 37, 39, 41, 41, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 36,
+        37, 37, 37, 39, 41, 41, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34,
+        34, 34, 35, 35, 35, 36, 37, 37, 37, 37, 38, 38, 38, 39, 40, 40, 40, 42,
+        43, 43, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35,
+        35, 36, 37, 37, 37, 37, 38, 38, 38, 39, 40, 40, 40, 42, 43, 43, 36, 35,
+        35, 35, 35, 35, 34, 34, 34, 35, 36, 36, 36, 37, 38, 38, 38, 40, 42, 42,
+        42, 45, 48, 48, 48, 49, 50, 50, 50, 52, 53, 53, 36, 35, 35, 35, 35, 35,
+        34, 34, 34, 35, 36, 36, 36, 37, 38, 38, 38, 40, 42, 42, 42, 45, 48, 48,
+        48, 49, 50, 50, 50, 52, 53, 53, 44, 43, 42, 42, 42, 41, 41, 41, 41, 41,
+        42, 42, 42, 42, 42, 42, 42, 45, 48, 48, 48, 50, 54, 54, 54, 56, 58, 58,
+        58, 60, 63, 63 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 34, 42, 47, 34, 39, 45, 46, 42, 45, 48, 49, 47, 46, 49, 54,
+        /* Size 8x8 */
+        31, 31, 32, 35, 39, 45, 48, 48, 31, 31, 33, 37, 41, 44, 46, 46, 32, 33,
+        35, 39, 42, 45, 46, 45, 35, 37, 39, 43, 45, 47, 47, 46, 39, 41, 42, 45,
+        47, 48, 48, 47, 45, 44, 45, 47, 48, 50, 51, 51, 48, 46, 46, 47, 48, 51,
+        53, 54, 48, 46, 45, 46, 47, 51, 54, 56,
+        /* Size 16x16 */
+        32, 31, 31, 30, 30, 33, 33, 36, 36, 41, 41, 49, 49, 48, 48, 49, 31, 31,
+        31, 31, 31, 34, 34, 38, 38, 42, 42, 47, 47, 47, 47, 47, 31, 31, 31, 31,
+        31, 34, 34, 38, 38, 42, 42, 47, 47, 47, 47, 47, 30, 31, 31, 32, 32, 35,
+        35, 40, 40, 42, 42, 46, 46, 45, 45, 45, 30, 31, 31, 32, 32, 35, 35, 40,
+        40, 42, 42, 46, 46, 45, 45, 45, 33, 34, 34, 35, 35, 39, 39, 43, 43, 45,
+        45, 47, 47, 46, 46, 45, 33, 34, 34, 35, 35, 39, 39, 43, 43, 45, 45, 47,
+        47, 46, 46, 45, 36, 38, 38, 40, 40, 43, 43, 47, 47, 47, 47, 48, 48, 46,
+        46, 45, 36, 38, 38, 40, 40, 43, 43, 47, 47, 47, 47, 48, 48, 46, 46, 45,
+        41, 42, 42, 42, 42, 45, 45, 47, 47, 48, 48, 50, 50, 49, 49, 49, 41, 42,
+        42, 42, 42, 45, 45, 47, 47, 48, 48, 50, 50, 49, 49, 49, 49, 47, 47, 46,
+        46, 47, 47, 48, 48, 50, 50, 53, 53, 53, 53, 53, 49, 47, 47, 46, 46, 47,
+        47, 48, 48, 50, 50, 53, 53, 53, 53, 53, 48, 47, 47, 45, 45, 46, 46, 46,
+        46, 49, 49, 53, 53, 54, 54, 55, 48, 47, 47, 45, 45, 46, 46, 46, 46, 49,
+        49, 53, 53, 54, 54, 55, 49, 47, 47, 45, 45, 45, 45, 45, 45, 49, 49, 53,
+        53, 55, 55, 58,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 30, 30, 30, 32, 33, 33, 33, 35, 36, 36, 36, 39,
+        41, 41, 41, 45, 49, 49, 49, 49, 48, 48, 48, 49, 49, 49, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 32, 34, 34, 34, 35, 37, 37, 37, 39, 42, 42, 42, 45,
+        48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 33, 34, 34, 34, 36, 38, 38, 38, 40, 42, 42, 42, 45, 47, 47, 47, 47,
+        47, 47, 47, 47, 47, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 34,
+        34, 36, 38, 38, 38, 40, 42, 42, 42, 45, 47, 47, 47, 47, 47, 47, 47, 47,
+        47, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 34, 34, 36, 38, 38,
+        38, 40, 42, 42, 42, 45, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 33, 35, 35, 35, 37, 39, 39, 39, 41, 42, 42,
+        42, 44, 47, 47, 47, 46, 46, 46, 46, 46, 46, 46, 30, 31, 31, 31, 31, 31,
+        32, 32, 32, 33, 35, 35, 35, 37, 40, 40, 40, 41, 42, 42, 42, 44, 46, 46,
+        46, 46, 45, 45, 45, 45, 45, 45, 30, 31, 31, 31, 31, 31, 32, 32, 32, 33,
+        35, 35, 35, 37, 40, 40, 40, 41, 42, 42, 42, 44, 46, 46, 46, 46, 45, 45,
+        45, 45, 45, 45, 30, 31, 31, 31, 31, 31, 32, 32, 32, 33, 35, 35, 35, 37,
+        40, 40, 40, 41, 42, 42, 42, 44, 46, 46, 46, 46, 45, 45, 45, 45, 45, 45,
+        32, 32, 33, 33, 33, 33, 33, 33, 33, 35, 37, 37, 37, 39, 41, 41, 41, 42,
+        43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 46, 45, 45, 45, 33, 34, 34, 34,
+        34, 35, 35, 35, 35, 37, 39, 39, 39, 41, 43, 43, 43, 44, 45, 45, 45, 46,
+        47, 47, 47, 47, 46, 46, 46, 46, 45, 45, 33, 34, 34, 34, 34, 35, 35, 35,
+        35, 37, 39, 39, 39, 41, 43, 43, 43, 44, 45, 45, 45, 46, 47, 47, 47, 47,
+        46, 46, 46, 46, 45, 45, 33, 34, 34, 34, 34, 35, 35, 35, 35, 37, 39, 39,
+        39, 41, 43, 43, 43, 44, 45, 45, 45, 46, 47, 47, 47, 47, 46, 46, 46, 46,
+        45, 45, 35, 35, 36, 36, 36, 37, 37, 37, 37, 39, 41, 41, 41, 43, 45, 45,
+        45, 45, 46, 46, 46, 47, 47, 47, 47, 47, 46, 46, 46, 46, 45, 45, 36, 37,
+        38, 38, 38, 39, 40, 40, 40, 41, 43, 43, 43, 45, 47, 47, 47, 47, 47, 47,
+        47, 47, 48, 48, 48, 47, 46, 46, 46, 46, 45, 45, 36, 37, 38, 38, 38, 39,
+        40, 40, 40, 41, 43, 43, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48,
+        48, 47, 46, 46, 46, 46, 45, 45, 36, 37, 38, 38, 38, 39, 40, 40, 40, 41,
+        43, 43, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 47, 46, 46,
+        46, 46, 45, 45, 39, 39, 40, 40, 40, 41, 41, 41, 41, 42, 44, 44, 44, 45,
+        47, 47, 47, 47, 48, 48, 48, 48, 49, 49, 49, 48, 48, 48, 48, 47, 47, 47,
+        41, 42, 42, 42, 42, 42, 42, 42, 42, 43, 45, 45, 45, 46, 47, 47, 47, 48,
+        48, 48, 48, 49, 50, 50, 50, 50, 49, 49, 49, 49, 49, 49, 41, 42, 42, 42,
+        42, 42, 42, 42, 42, 43, 45, 45, 45, 46, 47, 47, 47, 48, 48, 48, 48, 49,
+        50, 50, 50, 50, 49, 49, 49, 49, 49, 49, 41, 42, 42, 42, 42, 42, 42, 42,
+        42, 43, 45, 45, 45, 46, 47, 47, 47, 48, 48, 48, 48, 49, 50, 50, 50, 50,
+        49, 49, 49, 49, 49, 49, 45, 45, 45, 45, 45, 44, 44, 44, 44, 45, 46, 46,
+        46, 47, 47, 47, 47, 48, 49, 49, 49, 50, 51, 51, 51, 51, 51, 51, 51, 51,
+        51, 51, 49, 48, 47, 47, 47, 47, 46, 46, 46, 47, 47, 47, 47, 47, 48, 48,
+        48, 49, 50, 50, 50, 51, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 49, 48,
+        47, 47, 47, 47, 46, 46, 46, 47, 47, 47, 47, 47, 48, 48, 48, 49, 50, 50,
+        50, 51, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 49, 48, 47, 47, 47, 47,
+        46, 46, 46, 47, 47, 47, 47, 47, 48, 48, 48, 49, 50, 50, 50, 51, 53, 53,
+        53, 53, 53, 53, 53, 53, 53, 53, 49, 48, 47, 47, 47, 46, 46, 46, 46, 46,
+        47, 47, 47, 47, 47, 47, 47, 48, 50, 50, 50, 51, 53, 53, 53, 53, 53, 53,
+        53, 54, 54, 54, 48, 48, 47, 47, 47, 46, 45, 45, 45, 46, 46, 46, 46, 46,
+        46, 46, 46, 48, 49, 49, 49, 51, 53, 53, 53, 53, 54, 54, 54, 55, 55, 55,
+        48, 48, 47, 47, 47, 46, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46, 46, 48,
+        49, 49, 49, 51, 53, 53, 53, 53, 54, 54, 54, 55, 55, 55, 48, 48, 47, 47,
+        47, 46, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46, 46, 48, 49, 49, 49, 51,
+        53, 53, 53, 53, 54, 54, 54, 55, 55, 55, 49, 48, 47, 47, 47, 46, 45, 45,
+        45, 45, 46, 46, 46, 46, 46, 46, 46, 47, 49, 49, 49, 51, 53, 53, 53, 54,
+        55, 55, 55, 56, 57, 57, 49, 48, 47, 47, 47, 46, 45, 45, 45, 45, 45, 45,
+        45, 45, 45, 45, 45, 47, 49, 49, 49, 51, 53, 53, 53, 54, 55, 55, 55, 57,
+        58, 58, 49, 48, 47, 47, 47, 46, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45,
+        45, 47, 49, 49, 49, 51, 53, 53, 53, 54, 55, 55, 55, 57, 58, 58,
+        /* Size 4x8 */
+        31, 34, 42, 48, 31, 35, 42, 46, 33, 37, 44, 46, 36, 41, 46, 46, 40, 44,
+        48, 48, 45, 46, 49, 51, 47, 47, 50, 54, 47, 46, 49, 55,
+        /* Size 8x4 */
+        31, 31, 33, 36, 40, 45, 47, 47, 34, 35, 37, 41, 44, 46, 47, 46, 42, 42,
+        44, 46, 48, 49, 50, 49, 48, 46, 46, 46, 48, 51, 54, 55,
+        /* Size 8x16 */
+        32, 31, 31, 37, 37, 48, 48, 49, 31, 31, 31, 38, 38, 47, 47, 47, 31, 31,
+        31, 38, 38, 47, 47, 47, 30, 32, 32, 40, 40, 46, 46, 45, 30, 32, 32, 40,
+        40, 46, 46, 45, 33, 36, 36, 43, 43, 47, 47, 46, 33, 36, 36, 43, 43, 47,
+        47, 46, 37, 40, 40, 47, 47, 47, 47, 45, 37, 40, 40, 47, 47, 47, 47, 45,
+        42, 43, 43, 47, 47, 50, 50, 49, 42, 43, 43, 47, 47, 50, 50, 49, 49, 46,
+        46, 48, 48, 53, 53, 53, 49, 46, 46, 48, 48, 53, 53, 53, 48, 46, 46, 47,
+        47, 53, 53, 56, 48, 46, 46, 47, 47, 53, 53, 56, 49, 45, 45, 46, 46, 53,
+        53, 58,
+        /* Size 16x8 */
+        32, 31, 31, 30, 30, 33, 33, 37, 37, 42, 42, 49, 49, 48, 48, 49, 31, 31,
+        31, 32, 32, 36, 36, 40, 40, 43, 43, 46, 46, 46, 46, 45, 31, 31, 31, 32,
+        32, 36, 36, 40, 40, 43, 43, 46, 46, 46, 46, 45, 37, 38, 38, 40, 40, 43,
+        43, 47, 47, 47, 47, 48, 48, 47, 47, 46, 37, 38, 38, 40, 40, 43, 43, 47,
+        47, 47, 47, 48, 48, 47, 47, 46, 48, 47, 47, 46, 46, 47, 47, 47, 47, 50,
+        50, 53, 53, 53, 53, 53, 48, 47, 47, 46, 46, 47, 47, 47, 47, 50, 50, 53,
+        53, 53, 53, 53, 49, 47, 47, 45, 45, 46, 46, 45, 45, 49, 49, 53, 53, 56,
+        56, 58,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 33, 37, 37, 37, 42, 48, 48, 48, 48, 49, 49, 31, 31,
+        31, 31, 31, 34, 37, 37, 37, 42, 47, 47, 47, 48, 48, 48, 31, 31, 31, 31,
+        31, 34, 38, 38, 38, 42, 47, 47, 47, 47, 47, 47, 31, 31, 31, 31, 31, 34,
+        38, 38, 38, 42, 47, 47, 47, 47, 47, 47, 31, 31, 31, 31, 31, 34, 38, 38,
+        38, 42, 47, 47, 47, 47, 47, 47, 31, 31, 32, 32, 32, 35, 39, 39, 39, 42,
+        46, 46, 46, 46, 46, 46, 30, 31, 32, 32, 32, 35, 40, 40, 40, 42, 46, 46,
+        46, 45, 45, 45, 30, 31, 32, 32, 32, 35, 40, 40, 40, 42, 46, 46, 46, 45,
+        45, 45, 30, 31, 32, 32, 32, 35, 40, 40, 40, 42, 46, 46, 46, 45, 45, 45,
+        32, 33, 34, 34, 34, 37, 41, 41, 41, 44, 46, 46, 46, 46, 45, 45, 33, 34,
+        36, 36, 36, 39, 43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 33, 34, 36, 36,
+        36, 39, 43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 33, 34, 36, 36, 36, 39,
+        43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 35, 36, 38, 38, 38, 41, 45, 45,
+        45, 46, 47, 47, 47, 46, 45, 45, 37, 38, 40, 40, 40, 43, 47, 47, 47, 47,
+        47, 47, 47, 46, 45, 45, 37, 38, 40, 40, 40, 43, 47, 47, 47, 47, 47, 47,
+        47, 46, 45, 45, 37, 38, 40, 40, 40, 43, 47, 47, 47, 47, 47, 47, 47, 46,
+        45, 45, 39, 40, 41, 41, 41, 44, 47, 47, 47, 48, 49, 49, 49, 48, 47, 47,
+        42, 42, 43, 43, 43, 45, 47, 47, 47, 48, 50, 50, 50, 50, 49, 49, 42, 42,
+        43, 43, 43, 45, 47, 47, 47, 48, 50, 50, 50, 50, 49, 49, 42, 42, 43, 43,
+        43, 45, 47, 47, 47, 48, 50, 50, 50, 50, 49, 49, 45, 45, 44, 44, 44, 46,
+        47, 47, 47, 49, 51, 51, 51, 51, 51, 51, 49, 48, 46, 46, 46, 47, 48, 48,
+        48, 50, 53, 53, 53, 53, 53, 53, 49, 48, 46, 46, 46, 47, 48, 48, 48, 50,
+        53, 53, 53, 53, 53, 53, 49, 48, 46, 46, 46, 47, 48, 48, 48, 50, 53, 53,
+        53, 53, 53, 53, 48, 47, 46, 46, 46, 47, 47, 47, 47, 50, 53, 53, 53, 54,
+        54, 54, 48, 47, 46, 46, 46, 46, 47, 47, 47, 50, 53, 53, 53, 54, 56, 56,
+        48, 47, 46, 46, 46, 46, 47, 47, 47, 50, 53, 53, 53, 54, 56, 56, 48, 47,
+        46, 46, 46, 46, 47, 47, 47, 50, 53, 53, 53, 54, 56, 56, 48, 47, 45, 45,
+        45, 46, 46, 46, 46, 49, 53, 53, 53, 55, 57, 57, 49, 47, 45, 45, 45, 45,
+        46, 46, 46, 49, 53, 53, 53, 56, 58, 58, 49, 47, 45, 45, 45, 45, 46, 46,
+        46, 49, 53, 53, 53, 56, 58, 58,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 30, 30, 30, 32, 33, 33, 33, 35, 37, 37, 37, 39,
+        42, 42, 42, 45, 49, 49, 49, 48, 48, 48, 48, 48, 49, 49, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 33, 34, 34, 34, 36, 38, 38, 38, 40, 42, 42, 42, 45,
+        48, 48, 48, 47, 47, 47, 47, 47, 47, 47, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 34, 36, 36, 36, 38, 40, 40, 40, 41, 43, 43, 43, 44, 46, 46, 46, 46,
+        46, 46, 46, 45, 45, 45, 31, 31, 31, 31, 31, 32, 32, 32, 32, 34, 36, 36,
+        36, 38, 40, 40, 40, 41, 43, 43, 43, 44, 46, 46, 46, 46, 46, 46, 46, 45,
+        45, 45, 31, 31, 31, 31, 31, 32, 32, 32, 32, 34, 36, 36, 36, 38, 40, 40,
+        40, 41, 43, 43, 43, 44, 46, 46, 46, 46, 46, 46, 46, 45, 45, 45, 33, 34,
+        34, 34, 34, 35, 35, 35, 35, 37, 39, 39, 39, 41, 43, 43, 43, 44, 45, 45,
+        45, 46, 47, 47, 47, 47, 46, 46, 46, 46, 45, 45, 37, 37, 38, 38, 38, 39,
+        40, 40, 40, 41, 43, 43, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48,
+        48, 47, 47, 47, 47, 46, 46, 46, 37, 37, 38, 38, 38, 39, 40, 40, 40, 41,
+        43, 43, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 47, 47, 47,
+        47, 46, 46, 46, 37, 37, 38, 38, 38, 39, 40, 40, 40, 41, 43, 43, 43, 45,
+        47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 47, 47, 47, 47, 46, 46, 46,
+        42, 42, 42, 42, 42, 42, 42, 42, 42, 44, 45, 45, 45, 46, 47, 47, 47, 48,
+        48, 48, 48, 49, 50, 50, 50, 50, 50, 50, 50, 49, 49, 49, 48, 47, 47, 47,
+        47, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 49, 50, 50, 50, 51,
+        53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 48, 47, 47, 47, 47, 46, 46, 46,
+        46, 46, 47, 47, 47, 47, 47, 47, 47, 49, 50, 50, 50, 51, 53, 53, 53, 53,
+        53, 53, 53, 53, 53, 53, 48, 47, 47, 47, 47, 46, 46, 46, 46, 46, 47, 47,
+        47, 47, 47, 47, 47, 49, 50, 50, 50, 51, 53, 53, 53, 53, 53, 53, 53, 53,
+        53, 53, 48, 48, 47, 47, 47, 46, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46,
+        46, 48, 50, 50, 50, 51, 53, 53, 53, 54, 54, 54, 54, 55, 56, 56, 49, 48,
+        47, 47, 47, 46, 45, 45, 45, 45, 46, 46, 46, 45, 45, 45, 45, 47, 49, 49,
+        49, 51, 53, 53, 53, 54, 56, 56, 56, 57, 58, 58, 49, 48, 47, 47, 47, 46,
+        45, 45, 45, 45, 46, 46, 46, 45, 45, 45, 45, 47, 49, 49, 49, 51, 53, 53,
+        53, 54, 56, 56, 56, 57, 58, 58,
+        /* Size 4x16 */
+        31, 33, 42, 48, 31, 34, 42, 47, 31, 34, 42, 47, 31, 35, 42, 45, 31, 35,
+        42, 45, 34, 39, 45, 46, 34, 39, 45, 46, 38, 43, 47, 46, 38, 43, 47, 46,
+        42, 45, 48, 50, 42, 45, 48, 50, 48, 47, 50, 53, 48, 47, 50, 53, 47, 46,
+        50, 54, 47, 46, 50, 54, 47, 45, 49, 56,
+        /* Size 16x4 */
+        31, 31, 31, 31, 31, 34, 34, 38, 38, 42, 42, 48, 48, 47, 47, 47, 33, 34,
+        34, 35, 35, 39, 39, 43, 43, 45, 45, 47, 47, 46, 46, 45, 42, 42, 42, 42,
+        42, 45, 45, 47, 47, 48, 48, 50, 50, 50, 50, 49, 48, 47, 47, 45, 45, 46,
+        46, 46, 46, 50, 50, 53, 53, 54, 54, 56,
+        /* Size 8x32 */
+        32, 31, 31, 37, 37, 48, 48, 49, 31, 31, 31, 37, 37, 47, 47, 48, 31, 31,
+        31, 38, 38, 47, 47, 47, 31, 31, 31, 38, 38, 47, 47, 47, 31, 31, 31, 38,
+        38, 47, 47, 47, 31, 32, 32, 39, 39, 46, 46, 46, 30, 32, 32, 40, 40, 46,
+        46, 45, 30, 32, 32, 40, 40, 46, 46, 45, 30, 32, 32, 40, 40, 46, 46, 45,
+        32, 34, 34, 41, 41, 46, 46, 45, 33, 36, 36, 43, 43, 47, 47, 46, 33, 36,
+        36, 43, 43, 47, 47, 46, 33, 36, 36, 43, 43, 47, 47, 46, 35, 38, 38, 45,
+        45, 47, 47, 45, 37, 40, 40, 47, 47, 47, 47, 45, 37, 40, 40, 47, 47, 47,
+        47, 45, 37, 40, 40, 47, 47, 47, 47, 45, 39, 41, 41, 47, 47, 49, 49, 47,
+        42, 43, 43, 47, 47, 50, 50, 49, 42, 43, 43, 47, 47, 50, 50, 49, 42, 43,
+        43, 47, 47, 50, 50, 49, 45, 44, 44, 47, 47, 51, 51, 51, 49, 46, 46, 48,
+        48, 53, 53, 53, 49, 46, 46, 48, 48, 53, 53, 53, 49, 46, 46, 48, 48, 53,
+        53, 53, 48, 46, 46, 47, 47, 53, 53, 54, 48, 46, 46, 47, 47, 53, 53, 56,
+        48, 46, 46, 47, 47, 53, 53, 56, 48, 46, 46, 47, 47, 53, 53, 56, 48, 45,
+        45, 46, 46, 53, 53, 57, 49, 45, 45, 46, 46, 53, 53, 58, 49, 45, 45, 46,
+        46, 53, 53, 58,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 30, 30, 30, 32, 33, 33, 33, 35, 37, 37, 37, 39,
+        42, 42, 42, 45, 49, 49, 49, 48, 48, 48, 48, 48, 49, 49, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 34, 36, 36, 36, 38, 40, 40, 40, 41, 43, 43, 43, 44,
+        46, 46, 46, 46, 46, 46, 46, 45, 45, 45, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 34, 36, 36, 36, 38, 40, 40, 40, 41, 43, 43, 43, 44, 46, 46, 46, 46,
+        46, 46, 46, 45, 45, 45, 37, 37, 38, 38, 38, 39, 40, 40, 40, 41, 43, 43,
+        43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 47, 47, 47, 47, 46,
+        46, 46, 37, 37, 38, 38, 38, 39, 40, 40, 40, 41, 43, 43, 43, 45, 47, 47,
+        47, 47, 47, 47, 47, 47, 48, 48, 48, 47, 47, 47, 47, 46, 46, 46, 48, 47,
+        47, 47, 47, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 49, 50, 50,
+        50, 51, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 48, 47, 47, 47, 47, 46,
+        46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 49, 50, 50, 50, 51, 53, 53,
+        53, 53, 53, 53, 53, 53, 53, 53, 49, 48, 47, 47, 47, 46, 45, 45, 45, 45,
+        46, 46, 46, 45, 45, 45, 45, 47, 49, 49, 49, 51, 53, 53, 53, 54, 56, 56,
+        56, 57, 58, 58 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 32, 32, 35, 32, 32, 33, 35, 32, 33, 35, 38, 35, 35, 38, 46,
+        /* Size 8x8 */
+        31, 31, 31, 32, 32, 32, 34, 35, 31, 32, 32, 32, 32, 33, 34, 35, 31, 32,
+        32, 32, 32, 33, 33, 34, 32, 32, 32, 33, 34, 34, 35, 36, 32, 32, 32, 34,
+        35, 35, 36, 38, 32, 33, 33, 34, 35, 36, 38, 40, 34, 34, 33, 35, 36, 38,
+        39, 42, 35, 35, 34, 36, 38, 40, 42, 48,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 33, 34, 34, 36, 36, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 34, 34, 34, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 34, 34, 34, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 34,
+        35, 35, 36, 36, 31, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 35, 35, 36,
+        36, 36, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 35, 36, 36, 37, 37,
+        32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 36, 37, 37, 38, 38, 32, 32,
+        32, 32, 32, 32, 33, 34, 34, 35, 35, 36, 37, 37, 38, 38, 33, 33, 33, 33,
+        33, 33, 34, 35, 35, 36, 36, 38, 39, 40, 42, 42, 34, 34, 34, 34, 33, 33,
+        35, 35, 36, 37, 37, 39, 39, 41, 42, 42, 34, 34, 34, 34, 34, 34, 35, 36,
+        36, 37, 37, 40, 41, 42, 45, 45, 36, 35, 35, 35, 34, 34, 36, 36, 37, 38,
+        38, 42, 42, 45, 48, 48, 36, 35, 35, 35, 34, 34, 36, 36, 37, 38, 38, 42,
+        42, 45, 48, 48,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 36, 36, 36, 37, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 34, 34, 34, 34, 35, 35, 35, 35, 37, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34,
+        34, 35, 35, 35, 35, 36, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 34, 35, 35, 35,
+        35, 36, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34,
+        34, 34, 34, 34, 35, 35, 35, 36, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 34, 34,
+        34, 34, 34, 35, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34,
+        34, 35, 35, 35, 35, 36, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 36, 36, 36,
+        36, 37, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33,
+        33, 34, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 36, 37, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34,
+        34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 36, 37, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35,
+        35, 35, 36, 36, 36, 36, 36, 37, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35, 36, 36, 36, 36, 37,
+        37, 37, 37, 38, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34,
+        34, 34, 34, 35, 35, 35, 35, 35, 36, 36, 36, 36, 37, 37, 38, 38, 38, 39,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 34, 35,
+        35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36,
+        36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 37, 37, 37,
+        37, 38, 38, 38, 38, 39, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        34, 34, 34, 34, 35, 35, 36, 36, 36, 36, 37, 38, 38, 38, 38, 39, 40, 40,
+        40, 41, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 35, 35, 35,
+        35, 36, 36, 36, 36, 37, 38, 39, 39, 39, 40, 41, 42, 42, 42, 42, 34, 34,
+        34, 34, 34, 34, 34, 33, 33, 33, 33, 34, 35, 35, 35, 35, 36, 36, 37, 37,
+        37, 38, 39, 39, 39, 39, 41, 42, 42, 42, 42, 43, 34, 34, 34, 34, 34, 34,
+        34, 33, 33, 33, 33, 34, 35, 35, 35, 35, 36, 36, 37, 37, 37, 38, 39, 39,
+        39, 39, 41, 42, 42, 42, 42, 43, 34, 34, 34, 34, 34, 34, 34, 33, 33, 33,
+        33, 34, 35, 35, 35, 35, 36, 36, 37, 37, 37, 38, 39, 39, 39, 39, 41, 42,
+        42, 42, 42, 43, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 36,
+        36, 36, 36, 37, 37, 37, 37, 38, 40, 41, 41, 41, 42, 44, 45, 45, 45, 45,
+        35, 35, 35, 35, 35, 35, 34, 34, 34, 34, 34, 35, 36, 36, 36, 36, 37, 37,
+        38, 38, 38, 39, 41, 42, 42, 42, 44, 46, 47, 47, 47, 48, 36, 35, 35, 35,
+        35, 35, 35, 34, 34, 34, 34, 35, 36, 36, 36, 36, 37, 38, 38, 38, 38, 40,
+        42, 42, 42, 42, 45, 47, 48, 48, 48, 49, 36, 35, 35, 35, 35, 35, 35, 34,
+        34, 34, 34, 35, 36, 36, 36, 36, 37, 38, 38, 38, 38, 40, 42, 42, 42, 42,
+        45, 47, 48, 48, 48, 49, 36, 35, 35, 35, 35, 35, 35, 34, 34, 34, 34, 35,
+        36, 36, 36, 36, 37, 38, 38, 38, 38, 40, 42, 42, 42, 42, 45, 47, 48, 48,
+        48, 49, 37, 37, 36, 36, 36, 36, 36, 35, 35, 35, 35, 36, 37, 37, 37, 37,
+        38, 39, 39, 39, 39, 41, 42, 43, 43, 43, 45, 48, 49, 49, 49, 50,
+        /* Size 4x8 */
+        31, 31, 32, 35, 32, 32, 32, 35, 32, 32, 33, 34, 32, 32, 34, 36, 32, 33,
+        35, 38, 33, 33, 36, 40, 34, 34, 37, 42, 35, 34, 38, 48,
+        /* Size 8x4 */
+        31, 32, 32, 32, 32, 33, 34, 35, 31, 32, 32, 32, 33, 33, 34, 34, 32, 32,
+        33, 34, 35, 36, 37, 38, 35, 35, 34, 36, 38, 40, 42, 48,
+        /* Size 8x16 */
+        32, 31, 31, 31, 32, 32, 35, 36, 31, 32, 32, 32, 32, 32, 35, 35, 31, 32,
+        32, 32, 32, 32, 35, 35, 31, 32, 32, 32, 32, 32, 34, 35, 31, 32, 32, 32,
+        33, 33, 34, 34, 31, 32, 32, 32, 33, 33, 34, 34, 31, 32, 32, 33, 34, 34,
+        35, 36, 32, 32, 32, 33, 34, 34, 36, 36, 32, 32, 32, 33, 34, 34, 36, 37,
+        32, 32, 33, 34, 35, 35, 37, 38, 32, 32, 33, 34, 35, 35, 37, 38, 33, 33,
+        33, 35, 36, 36, 40, 41, 34, 34, 34, 35, 37, 37, 41, 42, 34, 34, 34, 35,
+        37, 37, 43, 44, 36, 35, 34, 36, 38, 38, 46, 48, 36, 35, 34, 36, 38, 38,
+        46, 48,
+        /* Size 16x8 */
+        32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 34, 34, 36, 36, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 31, 32, 32, 32, 32, 32,
+        33, 33, 33, 34, 34, 35, 35, 35, 36, 36, 32, 32, 32, 32, 33, 33, 34, 34,
+        34, 35, 35, 36, 37, 37, 38, 38, 32, 32, 32, 32, 33, 33, 34, 34, 34, 35,
+        35, 36, 37, 37, 38, 38, 35, 35, 35, 34, 34, 34, 35, 36, 36, 37, 37, 40,
+        41, 43, 46, 46, 36, 35, 35, 35, 34, 34, 36, 36, 37, 38, 38, 41, 42, 44,
+        48, 48,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 35, 36, 36, 36, 31, 31,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 35, 35, 35, 35, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        34, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 35,
+        35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34,
+        31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 31, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 31, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 34, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32,
+        33, 33, 34, 34, 34, 34, 35, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34,
+        34, 34, 34, 35, 36, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34,
+        34, 35, 36, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34, 35,
+        36, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 36, 37,
+        37, 37, 32, 32, 32, 33, 33, 33, 33, 34, 35, 35, 35, 36, 37, 38, 38, 38,
+        32, 32, 32, 33, 33, 33, 34, 35, 35, 35, 35, 36, 37, 38, 38, 38, 32, 32,
+        32, 33, 33, 33, 34, 35, 35, 35, 35, 36, 37, 38, 38, 38, 32, 32, 32, 33,
+        33, 33, 34, 35, 35, 35, 35, 36, 37, 38, 38, 38, 32, 33, 33, 33, 33, 33,
+        34, 35, 36, 36, 36, 37, 39, 40, 40, 40, 33, 33, 33, 33, 33, 33, 35, 36,
+        36, 36, 36, 38, 40, 41, 41, 41, 34, 34, 34, 34, 34, 34, 35, 36, 37, 37,
+        37, 39, 41, 42, 42, 42, 34, 34, 34, 34, 34, 34, 35, 36, 37, 37, 37, 39,
+        41, 42, 42, 42, 34, 34, 34, 34, 34, 34, 35, 36, 37, 37, 37, 39, 41, 42,
+        42, 42, 34, 34, 34, 34, 34, 34, 35, 37, 37, 37, 37, 40, 43, 44, 44, 44,
+        35, 35, 34, 34, 34, 34, 36, 37, 38, 38, 38, 41, 45, 47, 47, 47, 36, 35,
+        35, 34, 34, 34, 36, 37, 38, 38, 38, 42, 46, 48, 48, 48, 36, 35, 35, 34,
+        34, 34, 36, 37, 38, 38, 38, 42, 46, 48, 48, 48, 36, 35, 35, 34, 34, 34,
+        36, 37, 38, 38, 38, 42, 46, 48, 48, 48, 37, 36, 36, 36, 36, 36, 37, 38,
+        39, 39, 39, 42, 46, 49, 49, 49,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 36, 36, 36, 37, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34,
+        34, 34, 35, 35, 35, 36, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34,
+        34, 36, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 36, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 36, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35,
+        35, 35, 35, 36, 36, 36, 36, 37, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 34, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 37, 37,
+        37, 37, 37, 38, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34,
+        34, 34, 34, 35, 35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35,
+        35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35, 35, 35, 36,
+        36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 34, 34, 35, 35, 35, 35, 36, 36, 36, 36, 37, 38, 39, 39, 39,
+        40, 41, 42, 42, 42, 42, 35, 35, 35, 35, 35, 35, 34, 34, 34, 34, 34, 35,
+        35, 36, 36, 36, 36, 37, 37, 37, 37, 39, 40, 41, 41, 41, 43, 45, 46, 46,
+        46, 46, 36, 35, 35, 35, 35, 35, 35, 35, 34, 34, 34, 35, 36, 36, 36, 36,
+        37, 38, 38, 38, 38, 40, 41, 42, 42, 42, 44, 47, 48, 48, 48, 49, 36, 35,
+        35, 35, 35, 35, 35, 35, 34, 34, 34, 35, 36, 36, 36, 36, 37, 38, 38, 38,
+        38, 40, 41, 42, 42, 42, 44, 47, 48, 48, 48, 49, 36, 35, 35, 35, 35, 35,
+        35, 35, 34, 34, 34, 35, 36, 36, 36, 36, 37, 38, 38, 38, 38, 40, 41, 42,
+        42, 42, 44, 47, 48, 48, 48, 49,
+        /* Size 4x16 */
+        31, 31, 32, 36, 31, 32, 32, 35, 32, 32, 32, 35, 32, 32, 32, 35, 32, 32,
+        33, 34, 32, 32, 33, 34, 32, 32, 34, 36, 32, 32, 34, 36, 32, 32, 34, 37,
+        32, 33, 35, 38, 32, 33, 35, 38, 33, 33, 36, 41, 34, 34, 37, 42, 34, 34,
+        37, 44, 35, 34, 38, 48, 35, 34, 38, 48,
+        /* Size 16x4 */
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 32, 32, 32, 32,
+        33, 33, 34, 34, 34, 35, 35, 36, 37, 37, 38, 38, 36, 35, 35, 35, 34, 34,
+        36, 36, 37, 38, 38, 41, 42, 44, 48, 48,
+        /* Size 8x32 */
+        32, 31, 31, 31, 32, 32, 35, 36, 31, 31, 31, 32, 32, 32, 35, 35, 31, 32,
+        32, 32, 32, 32, 35, 35, 31, 32, 32, 32, 32, 32, 35, 35, 31, 32, 32, 32,
+        32, 32, 35, 35, 31, 32, 32, 32, 32, 32, 35, 35, 31, 32, 32, 32, 32, 32,
+        34, 35, 31, 32, 32, 32, 32, 32, 34, 35, 31, 32, 32, 32, 33, 33, 34, 34,
+        31, 32, 32, 32, 33, 33, 34, 34, 31, 32, 32, 32, 33, 33, 34, 34, 31, 32,
+        32, 33, 33, 33, 35, 35, 31, 32, 32, 33, 34, 34, 35, 36, 32, 32, 32, 33,
+        34, 34, 36, 36, 32, 32, 32, 33, 34, 34, 36, 36, 32, 32, 32, 33, 34, 34,
+        36, 36, 32, 32, 32, 33, 34, 34, 36, 37, 32, 32, 33, 33, 35, 35, 37, 38,
+        32, 32, 33, 34, 35, 35, 37, 38, 32, 32, 33, 34, 35, 35, 37, 38, 32, 32,
+        33, 34, 35, 35, 37, 38, 32, 33, 33, 34, 36, 36, 39, 40, 33, 33, 33, 35,
+        36, 36, 40, 41, 34, 34, 34, 35, 37, 37, 41, 42, 34, 34, 34, 35, 37, 37,
+        41, 42, 34, 34, 34, 35, 37, 37, 41, 42, 34, 34, 34, 35, 37, 37, 43, 44,
+        35, 34, 34, 36, 38, 38, 45, 47, 36, 35, 34, 36, 38, 38, 46, 48, 36, 35,
+        34, 36, 38, 38, 46, 48, 36, 35, 34, 36, 38, 38, 46, 48, 37, 36, 36, 37,
+        39, 39, 46, 49,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 36, 36, 36, 37, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 34, 34, 34, 34, 34, 35, 35, 35, 36, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34,
+        34, 34, 34, 34, 34, 36, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 36, 36, 36,
+        36, 37, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34,
+        34, 35, 35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35, 35,
+        35, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 35, 35, 35, 35, 35, 35,
+        34, 34, 34, 34, 34, 35, 35, 36, 36, 36, 36, 37, 37, 37, 37, 39, 40, 41,
+        41, 41, 43, 45, 46, 46, 46, 46, 36, 35, 35, 35, 35, 35, 35, 35, 34, 34,
+        34, 35, 36, 36, 36, 36, 37, 38, 38, 38, 38, 40, 41, 42, 42, 42, 44, 47,
+        48, 48, 48, 49 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 32, 38, 46, 32, 34, 41, 46, 38, 41, 47, 47, 46, 46, 47, 52,
+        /* Size 8x8 */
+        31, 31, 30, 34, 36, 39, 42, 48, 31, 31, 31, 34, 37, 40, 42, 47, 30, 31,
+        32, 35, 39, 41, 42, 46, 34, 34, 35, 39, 42, 44, 45, 47, 36, 37, 39, 42,
+        46, 47, 47, 47, 39, 40, 41, 44, 47, 47, 48, 49, 42, 42, 42, 45, 47, 48,
+        48, 50, 48, 47, 46, 47, 47, 49, 50, 53,
+        /* Size 16x16 */
+        32, 31, 31, 31, 30, 30, 33, 33, 34, 36, 36, 40, 41, 44, 49, 49, 31, 31,
+        31, 31, 31, 31, 33, 34, 36, 38, 38, 41, 42, 44, 48, 48, 31, 31, 31, 31,
+        31, 31, 34, 34, 36, 38, 38, 41, 42, 44, 47, 47, 31, 31, 31, 31, 31, 31,
+        34, 35, 36, 39, 39, 41, 42, 44, 47, 47, 30, 31, 31, 31, 32, 32, 34, 35,
+        37, 40, 40, 42, 42, 44, 46, 46, 30, 31, 31, 31, 32, 32, 34, 35, 37, 40,
+        40, 42, 42, 44, 46, 46, 33, 33, 34, 34, 34, 34, 37, 38, 40, 42, 42, 44,
+        44, 45, 47, 47, 33, 34, 34, 35, 35, 35, 38, 39, 40, 43, 43, 44, 45, 46,
+        47, 47, 34, 36, 36, 36, 37, 37, 40, 40, 42, 45, 45, 45, 46, 46, 47, 47,
+        36, 38, 38, 39, 40, 40, 42, 43, 45, 47, 47, 47, 47, 47, 48, 48, 36, 38,
+        38, 39, 40, 40, 42, 43, 45, 47, 47, 47, 47, 47, 48, 48, 40, 41, 41, 41,
+        42, 42, 44, 44, 45, 47, 47, 48, 48, 49, 50, 50, 41, 42, 42, 42, 42, 42,
+        44, 45, 46, 47, 47, 48, 48, 49, 50, 50, 44, 44, 44, 44, 44, 44, 45, 46,
+        46, 47, 47, 49, 49, 50, 51, 51, 49, 48, 47, 47, 46, 46, 47, 47, 47, 48,
+        48, 50, 50, 51, 53, 53, 49, 48, 47, 47, 46, 46, 47, 47, 47, 48, 48, 50,
+        50, 51, 53, 53,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 31, 33, 33, 33, 33, 34, 36,
+        36, 36, 36, 38, 40, 41, 41, 41, 44, 47, 49, 49, 49, 49, 31, 31, 31, 31,
+        31, 31, 31, 31, 30, 30, 30, 32, 33, 34, 34, 34, 35, 36, 37, 37, 37, 39,
+        41, 42, 42, 42, 44, 47, 48, 48, 48, 48, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32, 33, 34, 34, 34, 36, 37, 38, 38, 38, 39, 41, 42, 42, 42,
+        44, 46, 48, 48, 48, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
+        34, 34, 34, 34, 36, 37, 38, 38, 38, 40, 41, 42, 42, 42, 44, 46, 47, 47,
+        47, 47, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 34, 34, 34, 34,
+        36, 37, 38, 38, 38, 40, 41, 42, 42, 42, 44, 46, 47, 47, 47, 47, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 34, 34, 34, 34, 36, 37, 38, 38,
+        38, 40, 41, 42, 42, 42, 44, 46, 47, 47, 47, 47, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 33, 34, 35, 35, 35, 36, 38, 39, 39, 39, 40, 41, 42,
+        42, 42, 44, 46, 47, 47, 47, 47, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 33, 34, 35, 35, 35, 37, 38, 39, 39, 39, 41, 42, 42, 42, 42, 44, 46,
+        46, 46, 46, 46, 30, 30, 31, 31, 31, 31, 31, 31, 32, 32, 32, 33, 34, 35,
+        35, 35, 37, 39, 40, 40, 40, 41, 42, 42, 42, 42, 44, 45, 46, 46, 46, 46,
+        30, 30, 31, 31, 31, 31, 31, 31, 32, 32, 32, 33, 34, 35, 35, 35, 37, 39,
+        40, 40, 40, 41, 42, 42, 42, 42, 44, 45, 46, 46, 46, 46, 30, 30, 31, 31,
+        31, 31, 31, 31, 32, 32, 32, 33, 34, 35, 35, 35, 37, 39, 40, 40, 40, 41,
+        42, 42, 42, 42, 44, 45, 46, 46, 46, 46, 31, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 34, 36, 37, 37, 37, 38, 40, 41, 41, 41, 42, 43, 43, 43, 43,
+        44, 46, 46, 46, 46, 46, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 36,
+        37, 38, 38, 38, 40, 41, 42, 42, 42, 43, 44, 44, 44, 44, 45, 46, 47, 47,
+        47, 46, 33, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 37, 38, 39, 39, 39,
+        40, 42, 43, 43, 43, 44, 44, 45, 45, 45, 46, 47, 47, 47, 47, 47, 33, 34,
+        34, 34, 34, 34, 35, 35, 35, 35, 35, 37, 38, 39, 39, 39, 40, 42, 43, 43,
+        43, 44, 44, 45, 45, 45, 46, 47, 47, 47, 47, 47, 33, 34, 34, 34, 34, 34,
+        35, 35, 35, 35, 35, 37, 38, 39, 39, 39, 40, 42, 43, 43, 43, 44, 44, 45,
+        45, 45, 46, 47, 47, 47, 47, 47, 34, 35, 36, 36, 36, 36, 36, 37, 37, 37,
+        37, 38, 40, 40, 40, 40, 42, 44, 45, 45, 45, 45, 45, 46, 46, 46, 46, 47,
+        47, 47, 47, 47, 36, 36, 37, 37, 37, 37, 38, 38, 39, 39, 39, 40, 41, 42,
+        42, 42, 44, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47,
+        36, 37, 38, 38, 38, 38, 39, 39, 40, 40, 40, 41, 42, 43, 43, 43, 45, 46,
+        47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 47, 36, 37, 38, 38,
+        38, 38, 39, 39, 40, 40, 40, 41, 42, 43, 43, 43, 45, 46, 47, 47, 47, 47,
+        47, 47, 47, 47, 47, 47, 48, 48, 48, 47, 36, 37, 38, 38, 38, 38, 39, 39,
+        40, 40, 40, 41, 42, 43, 43, 43, 45, 46, 47, 47, 47, 47, 47, 47, 47, 47,
+        47, 47, 48, 48, 48, 47, 38, 39, 39, 40, 40, 40, 40, 41, 41, 41, 41, 42,
+        43, 44, 44, 44, 45, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 49,
+        49, 48, 40, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 43, 44, 44, 44, 44,
+        45, 47, 47, 47, 47, 48, 48, 48, 48, 48, 49, 49, 50, 50, 50, 49, 41, 42,
+        42, 42, 42, 42, 42, 42, 42, 42, 42, 43, 44, 45, 45, 45, 46, 47, 47, 47,
+        47, 48, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50, 41, 42, 42, 42, 42, 42,
+        42, 42, 42, 42, 42, 43, 44, 45, 45, 45, 46, 47, 47, 47, 47, 48, 48, 48,
+        48, 48, 49, 50, 50, 50, 50, 50, 41, 42, 42, 42, 42, 42, 42, 42, 42, 42,
+        42, 43, 44, 45, 45, 45, 46, 47, 47, 47, 47, 48, 48, 48, 48, 48, 49, 50,
+        50, 50, 50, 50, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 45, 46,
+        46, 46, 46, 47, 47, 47, 47, 48, 49, 49, 49, 49, 50, 51, 51, 51, 51, 51,
+        47, 47, 46, 46, 46, 46, 46, 46, 45, 45, 45, 46, 46, 47, 47, 47, 47, 47,
+        47, 47, 47, 48, 49, 50, 50, 50, 51, 52, 52, 52, 52, 52, 49, 48, 48, 47,
+        47, 47, 47, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 49,
+        50, 50, 50, 50, 51, 52, 53, 53, 53, 53, 49, 48, 48, 47, 47, 47, 47, 46,
+        46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 49, 50, 50, 50, 50,
+        51, 52, 53, 53, 53, 53, 49, 48, 48, 47, 47, 47, 47, 46, 46, 46, 46, 46,
+        47, 47, 47, 47, 47, 47, 48, 48, 48, 49, 50, 50, 50, 50, 51, 52, 53, 53,
+        53, 53, 49, 48, 47, 47, 47, 47, 47, 46, 46, 46, 46, 46, 46, 47, 47, 47,
+        47, 47, 47, 47, 47, 48, 49, 50, 50, 50, 51, 52, 53, 53, 53, 53,
+        /* Size 4x8 */
+        31, 31, 37, 48, 31, 31, 38, 47, 31, 32, 40, 46, 34, 36, 43, 47, 37, 39,
+        46, 47, 39, 41, 47, 48, 42, 43, 47, 50, 48, 46, 48, 53,
+        /* Size 8x4 */
+        31, 31, 31, 34, 37, 39, 42, 48, 31, 31, 32, 36, 39, 41, 43, 46, 37, 38,
+        40, 43, 46, 47, 47, 48, 48, 47, 46, 47, 47, 48, 50, 53,
+        /* Size 8x16 */
+        32, 31, 31, 33, 37, 37, 45, 48, 31, 31, 31, 34, 38, 38, 45, 47, 31, 31,
+        31, 34, 38, 38, 45, 47, 31, 31, 32, 34, 39, 39, 45, 46, 30, 32, 32, 35,
+        40, 40, 44, 46, 30, 32, 32, 35, 40, 40, 44, 46, 33, 34, 35, 37, 42, 42,
+        46, 47, 33, 35, 36, 38, 43, 43, 46, 47, 35, 37, 37, 40, 44, 44, 46, 47,
+        37, 39, 40, 43, 47, 47, 47, 47, 37, 39, 40, 43, 47, 47, 47, 47, 41, 42,
+        42, 44, 47, 47, 49, 49, 42, 42, 43, 44, 47, 47, 49, 50, 44, 44, 44, 45,
+        47, 47, 50, 51, 49, 47, 46, 47, 48, 48, 52, 53, 49, 47, 46, 47, 48, 48,
+        52, 53,
+        /* Size 16x8 */
+        32, 31, 31, 31, 30, 30, 33, 33, 35, 37, 37, 41, 42, 44, 49, 49, 31, 31,
+        31, 31, 32, 32, 34, 35, 37, 39, 39, 42, 42, 44, 47, 47, 31, 31, 31, 32,
+        32, 32, 35, 36, 37, 40, 40, 42, 43, 44, 46, 46, 33, 34, 34, 34, 35, 35,
+        37, 38, 40, 43, 43, 44, 44, 45, 47, 47, 37, 38, 38, 39, 40, 40, 42, 43,
+        44, 47, 47, 47, 47, 47, 48, 48, 37, 38, 38, 39, 40, 40, 42, 43, 44, 47,
+        47, 47, 47, 47, 48, 48, 45, 45, 45, 45, 44, 44, 46, 46, 46, 47, 47, 49,
+        49, 50, 52, 52, 48, 47, 47, 46, 46, 46, 47, 47, 47, 47, 47, 49, 50, 51,
+        53, 53,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 31, 33, 35, 37, 37, 37, 40, 45, 48, 48, 48, 31, 31,
+        31, 31, 31, 31, 33, 36, 37, 37, 37, 41, 45, 48, 48, 48, 31, 31, 31, 31,
+        31, 31, 34, 36, 38, 38, 38, 41, 45, 47, 47, 47, 31, 31, 31, 31, 31, 31,
+        34, 37, 38, 38, 38, 41, 45, 47, 47, 47, 31, 31, 31, 31, 31, 31, 34, 37,
+        38, 38, 38, 41, 45, 47, 47, 47, 31, 31, 31, 31, 31, 31, 34, 37, 38, 38,
+        38, 41, 45, 47, 47, 47, 31, 31, 31, 32, 32, 32, 34, 37, 39, 39, 39, 41,
+        45, 46, 46, 46, 30, 31, 31, 32, 32, 32, 34, 38, 39, 39, 39, 42, 44, 46,
+        46, 46, 30, 31, 32, 32, 32, 32, 35, 38, 40, 40, 40, 42, 44, 46, 46, 46,
+        30, 31, 32, 32, 32, 32, 35, 38, 40, 40, 40, 42, 44, 46, 46, 46, 30, 31,
+        32, 32, 32, 32, 35, 38, 40, 40, 40, 42, 44, 46, 46, 46, 31, 32, 33, 33,
+        33, 33, 36, 39, 41, 41, 41, 43, 45, 46, 46, 46, 33, 34, 34, 35, 35, 35,
+        37, 40, 42, 42, 42, 44, 46, 47, 47, 47, 33, 34, 35, 36, 36, 36, 38, 41,
+        43, 43, 43, 44, 46, 47, 47, 47, 33, 34, 35, 36, 36, 36, 38, 41, 43, 43,
+        43, 44, 46, 47, 47, 47, 33, 34, 35, 36, 36, 36, 38, 41, 43, 43, 43, 44,
+        46, 47, 47, 47, 35, 36, 37, 37, 37, 37, 40, 43, 44, 44, 44, 45, 46, 47,
+        47, 47, 36, 37, 38, 39, 39, 39, 42, 44, 46, 46, 46, 47, 47, 47, 47, 47,
+        37, 38, 39, 40, 40, 40, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 37, 38,
+        39, 40, 40, 40, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 37, 38, 39, 40,
+        40, 40, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 39, 39, 40, 41, 41, 41,
+        43, 46, 47, 47, 47, 48, 48, 48, 48, 48, 41, 41, 42, 42, 42, 42, 44, 46,
+        47, 47, 47, 48, 49, 49, 49, 49, 42, 42, 42, 43, 43, 43, 44, 46, 47, 47,
+        47, 48, 49, 50, 50, 50, 42, 42, 42, 43, 43, 43, 44, 46, 47, 47, 47, 48,
+        49, 50, 50, 50, 42, 42, 42, 43, 43, 43, 44, 46, 47, 47, 47, 48, 49, 50,
+        50, 50, 44, 44, 44, 44, 44, 44, 45, 47, 47, 47, 47, 49, 50, 51, 51, 51,
+        47, 46, 46, 46, 46, 46, 46, 47, 48, 48, 48, 49, 51, 52, 52, 52, 49, 48,
+        47, 46, 46, 46, 47, 48, 48, 48, 48, 50, 52, 53, 53, 53, 49, 48, 47, 46,
+        46, 46, 47, 48, 48, 48, 48, 50, 52, 53, 53, 53, 49, 48, 47, 46, 46, 46,
+        47, 48, 48, 48, 48, 50, 52, 53, 53, 53, 49, 48, 47, 46, 46, 46, 47, 47,
+        47, 47, 47, 49, 52, 53, 53, 53,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 31, 33, 33, 33, 33, 35, 36,
+        37, 37, 37, 39, 41, 42, 42, 42, 44, 47, 49, 49, 49, 49, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 34, 34, 34, 34, 36, 37, 38, 38, 38, 39,
+        41, 42, 42, 42, 44, 46, 48, 48, 48, 48, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 33, 34, 35, 35, 35, 37, 38, 39, 39, 39, 40, 42, 42, 42, 42,
+        44, 46, 47, 47, 47, 47, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 33,
+        35, 36, 36, 36, 37, 39, 40, 40, 40, 41, 42, 43, 43, 43, 44, 46, 46, 46,
+        46, 46, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 35, 36, 36, 36,
+        37, 39, 40, 40, 40, 41, 42, 43, 43, 43, 44, 46, 46, 46, 46, 46, 31, 31,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 35, 36, 36, 36, 37, 39, 40, 40,
+        40, 41, 42, 43, 43, 43, 44, 46, 46, 46, 46, 46, 33, 33, 34, 34, 34, 34,
+        34, 34, 35, 35, 35, 36, 37, 38, 38, 38, 40, 42, 43, 43, 43, 43, 44, 44,
+        44, 44, 45, 46, 47, 47, 47, 47, 35, 36, 36, 37, 37, 37, 37, 38, 38, 38,
+        38, 39, 40, 41, 41, 41, 43, 44, 45, 45, 45, 46, 46, 46, 46, 46, 47, 47,
+        48, 48, 48, 47, 37, 37, 38, 38, 38, 38, 39, 39, 40, 40, 40, 41, 42, 43,
+        43, 43, 44, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 47,
+        37, 37, 38, 38, 38, 38, 39, 39, 40, 40, 40, 41, 42, 43, 43, 43, 44, 46,
+        47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 47, 37, 37, 38, 38,
+        38, 38, 39, 39, 40, 40, 40, 41, 42, 43, 43, 43, 44, 46, 47, 47, 47, 47,
+        47, 47, 47, 47, 47, 48, 48, 48, 48, 47, 40, 41, 41, 41, 41, 41, 41, 42,
+        42, 42, 42, 43, 44, 44, 44, 44, 45, 47, 47, 47, 47, 48, 48, 48, 48, 48,
+        49, 49, 50, 50, 50, 49, 45, 45, 45, 45, 45, 45, 45, 44, 44, 44, 44, 45,
+        46, 46, 46, 46, 46, 47, 47, 47, 47, 48, 49, 49, 49, 49, 50, 51, 52, 52,
+        52, 52, 48, 48, 47, 47, 47, 47, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47,
+        47, 47, 47, 47, 47, 48, 49, 50, 50, 50, 51, 52, 53, 53, 53, 53, 48, 48,
+        47, 47, 47, 47, 46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47,
+        47, 48, 49, 50, 50, 50, 51, 52, 53, 53, 53, 53, 48, 48, 47, 47, 47, 47,
+        46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 49, 50,
+        50, 50, 51, 52, 53, 53, 53, 53,
+        /* Size 4x16 */
+        31, 31, 37, 48, 31, 31, 38, 47, 31, 31, 38, 47, 31, 32, 39, 46, 31, 32,
+        40, 46, 31, 32, 40, 46, 34, 35, 42, 47, 34, 36, 43, 47, 36, 37, 44, 47,
+        38, 40, 47, 47, 38, 40, 47, 47, 41, 42, 47, 49, 42, 43, 47, 50, 44, 44,
+        47, 51, 48, 46, 48, 53, 48, 46, 48, 53,
+        /* Size 16x4 */
+        31, 31, 31, 31, 31, 31, 34, 34, 36, 38, 38, 41, 42, 44, 48, 48, 31, 31,
+        31, 32, 32, 32, 35, 36, 37, 40, 40, 42, 43, 44, 46, 46, 37, 38, 38, 39,
+        40, 40, 42, 43, 44, 47, 47, 47, 47, 47, 48, 48, 48, 47, 47, 46, 46, 46,
+        47, 47, 47, 47, 47, 49, 50, 51, 53, 53,
+        /* Size 8x32 */
+        32, 31, 31, 33, 37, 37, 45, 48, 31, 31, 31, 33, 37, 37, 45, 48, 31, 31,
+        31, 34, 38, 38, 45, 47, 31, 31, 31, 34, 38, 38, 45, 47, 31, 31, 31, 34,
+        38, 38, 45, 47, 31, 31, 31, 34, 38, 38, 45, 47, 31, 31, 32, 34, 39, 39,
+        45, 46, 30, 31, 32, 34, 39, 39, 44, 46, 30, 32, 32, 35, 40, 40, 44, 46,
+        30, 32, 32, 35, 40, 40, 44, 46, 30, 32, 32, 35, 40, 40, 44, 46, 31, 33,
+        33, 36, 41, 41, 45, 46, 33, 34, 35, 37, 42, 42, 46, 47, 33, 35, 36, 38,
+        43, 43, 46, 47, 33, 35, 36, 38, 43, 43, 46, 47, 33, 35, 36, 38, 43, 43,
+        46, 47, 35, 37, 37, 40, 44, 44, 46, 47, 36, 38, 39, 42, 46, 46, 47, 47,
+        37, 39, 40, 43, 47, 47, 47, 47, 37, 39, 40, 43, 47, 47, 47, 47, 37, 39,
+        40, 43, 47, 47, 47, 47, 39, 40, 41, 43, 47, 47, 48, 48, 41, 42, 42, 44,
+        47, 47, 49, 49, 42, 42, 43, 44, 47, 47, 49, 50, 42, 42, 43, 44, 47, 47,
+        49, 50, 42, 42, 43, 44, 47, 47, 49, 50, 44, 44, 44, 45, 47, 47, 50, 51,
+        47, 46, 46, 46, 48, 48, 51, 52, 49, 47, 46, 47, 48, 48, 52, 53, 49, 47,
+        46, 47, 48, 48, 52, 53, 49, 47, 46, 47, 48, 48, 52, 53, 49, 47, 46, 47,
+        47, 47, 52, 53,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 31, 33, 33, 33, 33, 35, 36,
+        37, 37, 37, 39, 41, 42, 42, 42, 44, 47, 49, 49, 49, 49, 31, 31, 31, 31,
+        31, 31, 31, 31, 32, 32, 32, 33, 34, 35, 35, 35, 37, 38, 39, 39, 39, 40,
+        42, 42, 42, 42, 44, 46, 47, 47, 47, 47, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 33, 35, 36, 36, 36, 37, 39, 40, 40, 40, 41, 42, 43, 43, 43,
+        44, 46, 46, 46, 46, 46, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 36,
+        37, 38, 38, 38, 40, 42, 43, 43, 43, 43, 44, 44, 44, 44, 45, 46, 47, 47,
+        47, 47, 37, 37, 38, 38, 38, 38, 39, 39, 40, 40, 40, 41, 42, 43, 43, 43,
+        44, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 47, 37, 37,
+        38, 38, 38, 38, 39, 39, 40, 40, 40, 41, 42, 43, 43, 43, 44, 46, 47, 47,
+        47, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 47, 45, 45, 45, 45, 45, 45,
+        45, 44, 44, 44, 44, 45, 46, 46, 46, 46, 46, 47, 47, 47, 47, 48, 49, 49,
+        49, 49, 50, 51, 52, 52, 52, 52, 48, 48, 47, 47, 47, 47, 46, 46, 46, 46,
+        46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 49, 50, 50, 50, 51, 52,
+        53, 53, 53, 53 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        31, 32, 32, 32, 32, 32, 32, 33, 32, 32, 33, 34, 32, 33, 34, 35,
+        /* Size 8x8 */
+        31, 31, 31, 31, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32,
+        32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32,
+        33, 33, 34, 35, 32, 32, 32, 32, 33, 34, 34, 35, 32, 32, 32, 32, 34, 34,
+        35, 36, 33, 33, 33, 33, 35, 35, 36, 38,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 34, 31, 31,
+        31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        33, 34, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 35,
+        31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 35, 31, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 35, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 33, 34, 35, 35, 35, 36, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 34, 34, 35, 35, 35, 36, 37, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 34, 34, 35, 35, 35, 36, 37, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34,
+        34, 35, 36, 36, 36, 38, 34, 34, 34, 34, 34, 33, 33, 34, 35, 35, 35, 36,
+        37, 37, 38, 39,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 34, 34, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        34, 34, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 34, 34, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 34, 34, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 33, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        33, 33, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 34, 34, 34, 34, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34,
+        34, 34, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33,
+        33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34,
+        34, 34, 34, 35, 35, 35, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 35, 35,
+        35, 35, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34,
+        34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35,
+        35, 35, 35, 35, 36, 36, 37, 37, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35,
+        36, 36, 37, 37, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 37, 37,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34,
+        34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 37, 37, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35,
+        35, 35, 36, 36, 36, 36, 36, 37, 38, 38, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36,
+        36, 36, 37, 38, 38, 38, 34, 34, 34, 34, 34, 34, 34, 34, 34, 33, 33, 33,
+        33, 33, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38,
+        39, 39, 34, 34, 34, 34, 34, 34, 34, 34, 34, 33, 33, 33, 33, 33, 34, 34,
+        35, 35, 35, 35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38, 39, 39,
+        /* Size 4x8 */
+        31, 31, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32,
+        33, 34, 32, 32, 34, 34, 32, 33, 34, 35, 33, 33, 35, 36,
+        /* Size 8x4 */
+        31, 31, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 33, 34, 34, 35, 32, 32, 32, 33, 34, 34, 35, 36,
+        /* Size 8x16 */
+        32, 31, 31, 31, 31, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 33, 31, 32,
+        32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32,
+        32, 32, 32, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 32, 33,
+        33, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 33, 34, 34, 34,
+        32, 32, 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 34, 32, 32,
+        32, 32, 33, 35, 35, 35, 32, 32, 33, 33, 34, 35, 35, 36, 32, 32, 33, 33,
+        34, 35, 35, 36, 32, 33, 33, 33, 34, 36, 36, 36, 34, 34, 34, 34, 35, 37,
+        37, 38,
+        /* Size 16x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 31, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 34, 34, 34, 35, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34,
+        34, 35, 35, 35, 36, 37, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 35,
+        35, 35, 36, 37, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 35, 36, 36,
+        36, 38,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 33, 33, 34, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 34, 34, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 34,
+        34, 35, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 34, 34, 34, 34, 34, 35, 35, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 34, 35, 35, 35, 35, 35, 36, 32, 32, 32, 32, 33, 33, 33, 33, 33, 34,
+        35, 35, 35, 35, 36, 36, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35,
+        35, 35, 36, 37, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35,
+        36, 37, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 36, 37,
+        32, 32, 32, 33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 36, 37, 32, 33,
+        33, 33, 33, 33, 33, 33, 34, 35, 36, 36, 36, 36, 36, 38, 33, 33, 33, 33,
+        33, 33, 33, 34, 34, 35, 36, 36, 36, 36, 37, 38, 34, 34, 34, 34, 34, 34,
+        34, 34, 35, 36, 37, 37, 37, 37, 38, 39, 34, 34, 34, 34, 34, 34, 34, 34,
+        35, 36, 37, 37, 37, 37, 38, 39,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 34, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        34, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 33, 33, 33, 34, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 34, 34, 34, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 35, 35, 36, 36, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34,
+        35, 35, 35, 35, 35, 35, 36, 36, 37, 37, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35,
+        35, 35, 36, 36, 37, 37, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36,
+        37, 37, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 37, 37, 32, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34,
+        34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 38, 34, 34, 34, 34, 34, 34,
+        34, 34, 34, 33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36,
+        37, 37, 37, 37, 38, 38, 39, 39,
+        /* Size 4x16 */
+        31, 31, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+        32, 32, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 33, 33, 32, 32, 33, 34,
+        32, 32, 33, 34, 32, 32, 33, 34, 32, 32, 34, 35, 32, 33, 34, 35, 32, 33,
+        34, 35, 33, 33, 35, 36, 34, 34, 36, 37,
+        /* Size 16x4 */
+        31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 35, 36, 32, 32, 32, 32, 32, 33,
+        33, 33, 34, 34, 34, 35, 35, 35, 36, 37,
+        /* Size 8x32 */
+        32, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 33, 31, 31,
+        32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32,
+        32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32,
+        32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33,
+        31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32,
+        32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32,
+        32, 33, 33, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 33, 33,
+        33, 34, 31, 32, 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 34,
+        32, 32, 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 34, 32, 32,
+        32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 35, 32, 32, 32, 32,
+        33, 35, 35, 35, 32, 32, 33, 33, 33, 35, 35, 36, 32, 32, 33, 33, 34, 35,
+        35, 36, 32, 32, 33, 33, 34, 35, 35, 36, 32, 32, 33, 33, 34, 35, 35, 36,
+        32, 32, 33, 33, 34, 35, 35, 36, 32, 33, 33, 33, 34, 36, 36, 36, 33, 33,
+        33, 33, 34, 36, 36, 37, 34, 34, 34, 34, 35, 37, 37, 38, 34, 34, 34, 34,
+        35, 37, 37, 38,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        33, 33, 33, 33, 34, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33,
+        34, 34, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34,
+        34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 37, 37, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35,
+        35, 35, 35, 35, 36, 36, 37, 37, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 36, 36, 36, 36, 36,
+        36, 37, 38, 38 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 31, 34, 38, 31, 32, 35, 40, 34, 35, 39, 43, 38, 40, 43, 47,
+        /* Size 8x8 */
+        31, 31, 31, 30, 34, 35, 37, 40, 31, 31, 31, 31, 34, 35, 38, 41, 31, 31,
+        31, 31, 35, 36, 39, 41, 30, 31, 31, 32, 35, 36, 40, 42, 34, 34, 35, 35,
+        39, 40, 43, 44, 35, 35, 36, 36, 40, 41, 44, 45, 37, 38, 39, 40, 43, 44,
+        47, 47, 40, 41, 41, 42, 44, 45, 47, 48,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 30, 30, 31, 33, 33, 33, 35, 36, 36, 38, 41, 31, 31,
+        31, 31, 31, 31, 31, 31, 33, 34, 34, 36, 37, 37, 39, 42, 31, 31, 31, 31,
+        31, 31, 31, 32, 34, 34, 34, 37, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31,
+        31, 32, 34, 34, 34, 37, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32,
+        34, 35, 35, 37, 39, 39, 40, 42, 30, 31, 31, 31, 31, 32, 32, 32, 34, 35,
+        35, 38, 40, 40, 41, 42, 30, 31, 31, 31, 31, 32, 32, 32, 34, 35, 35, 38,
+        40, 40, 41, 42, 31, 31, 32, 32, 32, 32, 32, 33, 35, 36, 36, 38, 40, 40,
+        41, 43, 33, 33, 34, 34, 34, 34, 34, 35, 37, 38, 38, 41, 42, 42, 43, 44,
+        33, 34, 34, 34, 35, 35, 35, 36, 38, 39, 39, 41, 43, 43, 44, 45, 33, 34,
+        34, 34, 35, 35, 35, 36, 38, 39, 39, 41, 43, 43, 44, 45, 35, 36, 37, 37,
+        37, 38, 38, 38, 41, 41, 41, 44, 46, 46, 46, 46, 36, 37, 38, 38, 39, 40,
+        40, 40, 42, 43, 43, 46, 47, 47, 47, 47, 36, 37, 38, 38, 39, 40, 40, 40,
+        42, 43, 43, 46, 47, 47, 47, 47, 38, 39, 40, 40, 40, 41, 41, 41, 43, 44,
+        44, 46, 47, 47, 47, 48, 41, 42, 42, 42, 42, 42, 42, 43, 44, 45, 45, 46,
+        47, 47, 48, 48,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 32, 33, 33,
+        33, 33, 33, 34, 35, 36, 36, 36, 36, 37, 38, 40, 41, 41, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 31, 32, 33, 34, 34, 34, 34, 35,
+        36, 37, 37, 37, 37, 37, 39, 40, 42, 42, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 34, 34, 34, 35, 36, 37, 37, 37,
+        37, 38, 39, 40, 42, 42, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 34, 34, 34, 34, 34, 35, 36, 38, 38, 38, 38, 38, 40, 41,
+        42, 42, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33,
+        34, 34, 34, 34, 34, 35, 37, 38, 38, 38, 38, 39, 40, 41, 42, 42, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 34, 34, 34,
+        34, 35, 37, 38, 38, 38, 38, 39, 40, 41, 42, 42, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 34, 34, 34, 34, 35, 37, 38,
+        38, 38, 38, 39, 40, 41, 42, 42, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 32, 33, 34, 34, 34, 34, 34, 36, 37, 38, 38, 38, 38, 39,
+        40, 41, 42, 42, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 33, 34, 35, 35, 35, 35, 36, 37, 38, 39, 39, 39, 39, 40, 41, 42, 42,
+        30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 35,
+        35, 35, 35, 36, 37, 39, 39, 39, 39, 40, 40, 41, 42, 42, 30, 30, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 34, 35, 35, 35, 35, 36,
+        38, 39, 40, 40, 40, 40, 41, 42, 42, 42, 30, 30, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 33, 34, 35, 35, 35, 35, 36, 38, 39, 40, 40,
+        40, 40, 41, 42, 42, 42, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 33, 34, 35, 35, 35, 35, 36, 38, 39, 40, 40, 40, 40, 41, 42,
+        42, 42, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 33,
+        34, 35, 35, 35, 35, 36, 38, 39, 40, 40, 40, 40, 41, 42, 42, 42, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 35, 36, 36, 36,
+        36, 37, 38, 40, 40, 40, 40, 41, 41, 42, 43, 43, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 34, 35, 36, 37, 37, 37, 37, 38, 39, 41,
+        41, 41, 41, 42, 42, 43, 43, 43, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34,
+        34, 34, 34, 34, 35, 36, 37, 38, 38, 38, 38, 39, 41, 42, 42, 42, 42, 43,
+        43, 44, 44, 44, 33, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35,
+        36, 37, 38, 39, 39, 39, 39, 40, 41, 43, 43, 43, 43, 43, 44, 44, 45, 45,
+        33, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 37, 38, 39,
+        39, 39, 39, 40, 41, 43, 43, 43, 43, 43, 44, 44, 45, 45, 33, 34, 34, 34,
+        34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 37, 38, 39, 39, 39, 39, 40,
+        41, 43, 43, 43, 43, 43, 44, 44, 45, 45, 33, 34, 34, 34, 34, 34, 34, 34,
+        35, 35, 35, 35, 35, 35, 36, 37, 38, 39, 39, 39, 39, 40, 41, 43, 43, 43,
+        43, 43, 44, 44, 45, 45, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36,
+        36, 36, 37, 38, 39, 40, 40, 40, 40, 41, 42, 44, 44, 44, 44, 44, 45, 45,
+        45, 45, 35, 36, 36, 36, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 39,
+        41, 41, 41, 41, 41, 42, 44, 45, 46, 46, 46, 46, 46, 46, 46, 46, 36, 37,
+        37, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 40, 41, 42, 43, 43, 43,
+        43, 44, 45, 46, 47, 47, 47, 47, 47, 47, 47, 47, 36, 37, 37, 38, 38, 38,
+        38, 38, 39, 39, 40, 40, 40, 40, 40, 41, 42, 43, 43, 43, 43, 44, 46, 47,
+        47, 47, 47, 47, 47, 47, 47, 47, 36, 37, 37, 38, 38, 38, 38, 38, 39, 39,
+        40, 40, 40, 40, 40, 41, 42, 43, 43, 43, 43, 44, 46, 47, 47, 47, 47, 47,
+        47, 47, 47, 47, 36, 37, 37, 38, 38, 38, 38, 38, 39, 39, 40, 40, 40, 40,
+        40, 41, 42, 43, 43, 43, 43, 44, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47,
+        37, 37, 38, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 41, 42, 43, 43,
+        43, 43, 43, 44, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 38, 39, 39, 40,
+        40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 42, 43, 44, 44, 44, 44, 45,
+        46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 40, 40, 40, 41, 41, 41, 41, 41,
+        41, 41, 42, 42, 42, 42, 42, 43, 44, 44, 44, 44, 44, 45, 46, 47, 47, 47,
+        47, 47, 48, 48, 48, 48, 41, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
+        42, 42, 43, 43, 44, 45, 45, 45, 45, 45, 46, 47, 47, 47, 47, 47, 48, 48,
+        48, 48, 41, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 43, 43,
+        44, 45, 45, 45, 45, 45, 46, 47, 47, 47, 47, 47, 48, 48, 48, 48,
+        /* Size 4x8 */
+        31, 31, 35, 37, 31, 31, 36, 38, 31, 32, 37, 39, 31, 32, 37, 40, 34, 36,
+        40, 43, 35, 37, 42, 44, 38, 40, 45, 47, 41, 42, 45, 47,
+        /* Size 8x4 */
+        31, 31, 31, 31, 34, 35, 38, 41, 31, 31, 32, 32, 36, 37, 40, 42, 35, 36,
+        37, 37, 40, 42, 45, 45, 37, 38, 39, 40, 43, 44, 47, 47,
+        /* Size 8x16 */
+        32, 31, 31, 31, 33, 37, 37, 38, 31, 31, 31, 31, 33, 38, 38, 39, 31, 31,
+        31, 31, 34, 38, 38, 40, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 32, 32,
+        34, 39, 39, 40, 30, 31, 32, 32, 35, 40, 40, 41, 30, 31, 32, 32, 35, 40,
+        40, 41, 31, 32, 33, 33, 35, 40, 40, 41, 33, 34, 35, 35, 37, 42, 42, 43,
+        33, 35, 36, 36, 38, 43, 43, 44, 33, 35, 36, 36, 38, 43, 43, 44, 35, 37,
+        38, 38, 41, 45, 45, 46, 37, 39, 40, 40, 43, 47, 47, 47, 37, 39, 40, 40,
+        43, 47, 47, 47, 39, 40, 41, 41, 43, 47, 47, 47, 42, 42, 43, 43, 44, 47,
+        47, 48,
+        /* Size 16x8 */
+        32, 31, 31, 31, 31, 30, 30, 31, 33, 33, 33, 35, 37, 37, 39, 42, 31, 31,
+        31, 31, 31, 31, 31, 32, 34, 35, 35, 37, 39, 39, 40, 42, 31, 31, 31, 31,
+        32, 32, 32, 33, 35, 36, 36, 38, 40, 40, 41, 43, 31, 31, 31, 31, 32, 32,
+        32, 33, 35, 36, 36, 38, 40, 40, 41, 43, 33, 33, 34, 34, 34, 35, 35, 35,
+        37, 38, 38, 41, 43, 43, 43, 44, 37, 38, 38, 38, 39, 40, 40, 40, 42, 43,
+        43, 45, 47, 47, 47, 47, 37, 38, 38, 38, 39, 40, 40, 40, 42, 43, 43, 45,
+        47, 47, 47, 47, 38, 39, 40, 40, 40, 41, 41, 41, 43, 44, 44, 46, 47, 47,
+        47, 48,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 33, 35, 37, 37, 37, 37, 38, 42, 31, 31,
+        31, 31, 31, 31, 31, 31, 33, 35, 37, 37, 37, 37, 39, 42, 31, 31, 31, 31,
+        31, 31, 31, 32, 33, 35, 38, 38, 38, 38, 39, 42, 31, 31, 31, 31, 31, 31,
+        31, 32, 34, 36, 38, 38, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32,
+        34, 36, 38, 38, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32, 34, 36,
+        38, 38, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32, 34, 36, 38, 38,
+        38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32, 34, 36, 38, 38, 38, 38,
+        40, 42, 31, 31, 31, 31, 32, 32, 32, 32, 34, 36, 39, 39, 39, 39, 40, 42,
+        30, 31, 31, 32, 32, 32, 32, 32, 34, 37, 39, 39, 39, 39, 40, 42, 30, 31,
+        31, 32, 32, 32, 32, 33, 35, 37, 40, 40, 40, 40, 41, 42, 30, 31, 31, 32,
+        32, 32, 32, 33, 35, 37, 40, 40, 40, 40, 41, 42, 30, 31, 31, 32, 32, 32,
+        32, 33, 35, 37, 40, 40, 40, 40, 41, 42, 30, 31, 31, 32, 32, 32, 32, 33,
+        35, 37, 40, 40, 40, 40, 41, 42, 31, 31, 32, 32, 33, 33, 33, 33, 35, 38,
+        40, 40, 40, 40, 41, 43, 32, 32, 33, 33, 34, 34, 34, 34, 36, 39, 41, 41,
+        41, 41, 42, 44, 33, 33, 34, 35, 35, 35, 35, 35, 37, 40, 42, 42, 42, 42,
+        43, 44, 33, 34, 35, 35, 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45,
+        33, 34, 35, 35, 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45, 33, 34,
+        35, 35, 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45, 33, 34, 35, 35,
+        36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45, 34, 35, 36, 37, 37, 37,
+        37, 37, 39, 42, 44, 44, 44, 44, 45, 45, 35, 36, 37, 38, 38, 38, 38, 39,
+        41, 43, 45, 45, 45, 45, 46, 46, 36, 37, 38, 39, 39, 39, 39, 40, 42, 44,
+        47, 47, 47, 47, 47, 47, 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47,
+        47, 47, 47, 47, 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47, 47, 47,
+        47, 47, 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47, 47, 47, 47, 47,
+        37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47, 47, 47, 47, 47, 39, 39,
+        40, 41, 41, 41, 41, 42, 43, 45, 47, 47, 47, 47, 47, 48, 40, 41, 41, 42,
+        42, 42, 42, 42, 44, 45, 47, 47, 47, 47, 47, 48, 42, 42, 42, 43, 43, 43,
+        43, 43, 44, 46, 47, 47, 47, 47, 48, 48, 42, 42, 42, 43, 43, 43, 43, 43,
+        44, 46, 47, 47, 47, 47, 48, 48,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 32, 33, 33,
+        33, 33, 33, 34, 35, 36, 37, 37, 37, 37, 39, 40, 42, 42, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 34, 34, 34, 35,
+        36, 37, 38, 38, 38, 38, 39, 41, 42, 42, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 33, 34, 35, 35, 35, 35, 36, 37, 38, 39, 39,
+        39, 39, 40, 41, 42, 42, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 33, 35, 35, 35, 35, 35, 37, 38, 39, 40, 40, 40, 40, 41, 42,
+        43, 43, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 33, 34,
+        35, 36, 36, 36, 36, 37, 38, 39, 40, 40, 40, 40, 41, 42, 43, 43, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 33, 34, 35, 36, 36, 36,
+        36, 37, 38, 39, 40, 40, 40, 40, 41, 42, 43, 43, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 33, 34, 35, 36, 36, 36, 36, 37, 38, 39,
+        40, 40, 40, 40, 41, 42, 43, 43, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 33, 34, 35, 36, 36, 36, 36, 37, 39, 40, 41, 41, 41, 41,
+        42, 42, 43, 43, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35,
+        35, 36, 37, 38, 38, 38, 38, 39, 41, 42, 43, 43, 43, 43, 43, 44, 44, 44,
+        35, 35, 35, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 38, 39, 40, 40,
+        40, 40, 40, 42, 43, 44, 45, 45, 45, 45, 45, 45, 46, 46, 37, 37, 38, 38,
+        38, 38, 38, 38, 39, 39, 40, 40, 40, 40, 40, 41, 42, 43, 43, 43, 43, 44,
+        45, 47, 47, 47, 47, 47, 47, 47, 47, 47, 37, 37, 38, 38, 38, 38, 38, 38,
+        39, 39, 40, 40, 40, 40, 40, 41, 42, 43, 43, 43, 43, 44, 45, 47, 47, 47,
+        47, 47, 47, 47, 47, 47, 37, 37, 38, 38, 38, 38, 38, 38, 39, 39, 40, 40,
+        40, 40, 40, 41, 42, 43, 43, 43, 43, 44, 45, 47, 47, 47, 47, 47, 47, 47,
+        47, 47, 37, 37, 38, 38, 38, 38, 38, 38, 39, 39, 40, 40, 40, 40, 40, 41,
+        42, 43, 43, 43, 43, 44, 45, 47, 47, 47, 47, 47, 47, 47, 47, 47, 38, 39,
+        39, 40, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 42, 43, 44, 44, 44,
+        44, 45, 46, 47, 47, 47, 47, 47, 47, 47, 48, 48, 42, 42, 42, 42, 42, 42,
+        42, 42, 42, 42, 42, 42, 42, 42, 43, 44, 44, 45, 45, 45, 45, 45, 46, 47,
+        47, 47, 47, 47, 48, 48, 48, 48,
+        /* Size 4x16 */
+        31, 31, 35, 37, 31, 31, 35, 38, 31, 31, 36, 38, 31, 31, 36, 38, 31, 32,
+        36, 39, 31, 32, 37, 40, 31, 32, 37, 40, 31, 33, 38, 40, 33, 35, 40, 42,
+        34, 36, 40, 43, 34, 36, 40, 43, 36, 38, 43, 45, 38, 40, 45, 47, 38, 40,
+        45, 47, 39, 41, 45, 47, 42, 43, 46, 47,
+        /* Size 16x4 */
+        31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 34, 36, 38, 38, 39, 42, 31, 31,
+        31, 31, 32, 32, 32, 33, 35, 36, 36, 38, 40, 40, 41, 43, 35, 35, 36, 36,
+        36, 37, 37, 38, 40, 40, 40, 43, 45, 45, 45, 46, 37, 38, 38, 38, 39, 40,
+        40, 40, 42, 43, 43, 45, 47, 47, 47, 47,
+        /* Size 8x32 */
+        32, 31, 31, 31, 33, 37, 37, 38, 31, 31, 31, 31, 33, 37, 37, 39, 31, 31,
+        31, 31, 33, 38, 38, 39, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 31, 31,
+        34, 38, 38, 40, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 31, 31, 34, 38,
+        38, 40, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 32, 32, 34, 39, 39, 40,
+        30, 31, 32, 32, 34, 39, 39, 40, 30, 31, 32, 32, 35, 40, 40, 41, 30, 31,
+        32, 32, 35, 40, 40, 41, 30, 31, 32, 32, 35, 40, 40, 41, 30, 31, 32, 32,
+        35, 40, 40, 41, 31, 32, 33, 33, 35, 40, 40, 41, 32, 33, 34, 34, 36, 41,
+        41, 42, 33, 34, 35, 35, 37, 42, 42, 43, 33, 35, 36, 36, 38, 43, 43, 44,
+        33, 35, 36, 36, 38, 43, 43, 44, 33, 35, 36, 36, 38, 43, 43, 44, 33, 35,
+        36, 36, 38, 43, 43, 44, 34, 36, 37, 37, 39, 44, 44, 45, 35, 37, 38, 38,
+        41, 45, 45, 46, 36, 38, 39, 39, 42, 47, 47, 47, 37, 39, 40, 40, 43, 47,
+        47, 47, 37, 39, 40, 40, 43, 47, 47, 47, 37, 39, 40, 40, 43, 47, 47, 47,
+        37, 39, 40, 40, 43, 47, 47, 47, 39, 40, 41, 41, 43, 47, 47, 47, 40, 41,
+        42, 42, 44, 47, 47, 47, 42, 42, 43, 43, 44, 47, 47, 48, 42, 42, 43, 43,
+        44, 47, 47, 48,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 32, 33, 33,
+        33, 33, 33, 34, 35, 36, 37, 37, 37, 37, 39, 40, 42, 42, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 35, 35, 35, 35, 36,
+        37, 38, 39, 39, 39, 39, 40, 41, 42, 42, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 33, 34, 35, 36, 36, 36, 36, 37, 38, 39, 40, 40,
+        40, 40, 41, 42, 43, 43, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 33, 34, 35, 36, 36, 36, 36, 37, 38, 39, 40, 40, 40, 40, 41, 42,
+        43, 43, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 36,
+        37, 38, 38, 38, 38, 39, 41, 42, 43, 43, 43, 43, 43, 44, 44, 44, 37, 37,
+        38, 38, 38, 38, 38, 38, 39, 39, 40, 40, 40, 40, 40, 41, 42, 43, 43, 43,
+        43, 44, 45, 47, 47, 47, 47, 47, 47, 47, 47, 47, 37, 37, 38, 38, 38, 38,
+        38, 38, 39, 39, 40, 40, 40, 40, 40, 41, 42, 43, 43, 43, 43, 44, 45, 47,
+        47, 47, 47, 47, 47, 47, 47, 47, 38, 39, 39, 40, 40, 40, 40, 40, 40, 40,
+        41, 41, 41, 41, 41, 42, 43, 44, 44, 44, 44, 45, 46, 47, 47, 47, 47, 47,
+        47, 47, 48, 48 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        31, 31, 31, 32, 31, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 33,
+        /* Size 8x8 */
+        31, 31, 31, 31, 31, 31, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
+        32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32,
+        32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 33,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        /* Size 4x8 */
+        31, 31, 31, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+        32, 32, 31, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33,
+        /* Size 8x4 */
+        31, 31, 31, 31, 31, 31, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        /* Size 8x16 */
+        32, 31, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31, 31, 31, 32, 32, 31, 31,
+        32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32,
+        32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
+        32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32,
+        31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
+        32, 32, 32, 32, 33, 33, 31, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+        32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32,
+        33, 34,
+        /* Size 16x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34,
+        34, 34,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 33, 33, 34, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+        34, 34, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 33, 34, 34,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        34, 34, 34, 34, 34, 34, 34, 34,
+        /* Size 4x16 */
+        31, 31, 31, 32, 31, 31, 31, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+        32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
+        31, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32,
+        32, 33, 32, 32, 32, 33, 32, 32, 32, 33,
+        /* Size 16x4 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 33, 33, 33,
+        /* Size 8x32 */
+        32, 31, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31, 31, 31, 32, 32, 31, 31,
+        31, 31, 31, 31, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
+        32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32,
+        31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
+        32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32,
+        32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
+        32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32,
+        31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
+        32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32,
+        32, 32, 33, 33, 31, 32, 32, 32, 32, 32, 33, 33, 31, 32, 32, 32, 32, 32,
+        33, 33, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34,
+        32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32,
+        32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32,
+        32, 32, 33, 34,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34,
+        34, 34, 34, 34 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 31, 31, 34, 31, 31, 31, 35, 31, 31, 32, 35, 34, 35, 35, 39,
+        /* Size 8x8 */
+        31, 31, 31, 31, 30, 31, 33, 33, 31, 31, 31, 31, 31, 32, 34, 34, 31, 31,
+        31, 31, 31, 32, 34, 34, 31, 31, 31, 31, 31, 32, 35, 35, 30, 31, 31, 31,
+        32, 32, 35, 35, 31, 32, 32, 32, 32, 33, 36, 36, 33, 34, 34, 35, 35, 36,
+        39, 39, 33, 34, 34, 35, 35, 36, 39, 39,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 31, 33, 33, 33, 33, 31, 31,
+        31, 31, 31, 31, 31, 31, 30, 30, 30, 32, 33, 34, 34, 34, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 34, 34, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 32, 34, 34, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32, 34, 34, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 34, 34, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33,
+        34, 35, 35, 35, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35,
+        35, 35, 30, 30, 31, 31, 31, 31, 31, 31, 32, 32, 32, 33, 34, 35, 35, 35,
+        30, 30, 31, 31, 31, 31, 31, 31, 32, 32, 32, 33, 34, 35, 35, 35, 30, 30,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 33, 34, 35, 35, 35, 31, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 34, 36, 37, 37, 37, 33, 33, 33, 34, 34, 34,
+        34, 34, 34, 34, 34, 36, 37, 38, 38, 38, 33, 34, 34, 34, 34, 34, 35, 35,
+        35, 35, 35, 37, 38, 39, 39, 39, 33, 34, 34, 34, 34, 34, 35, 35, 35, 35,
+        35, 37, 38, 39, 39, 39, 33, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 37,
+        38, 39, 39, 39,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30,
+        30, 30, 30, 31, 31, 32, 33, 33, 33, 33, 33, 33, 33, 34, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 31,
+        31, 32, 33, 33, 33, 33, 33, 33, 33, 34, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 31, 32, 32, 33, 34,
+        34, 34, 34, 34, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 33, 34, 34, 34, 34, 34,
+        34, 35, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 33, 33, 34, 34, 34, 34, 34, 34, 35, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 32, 33, 34, 34, 34, 34, 34, 34, 34, 35, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 33,
+        34, 34, 34, 34, 34, 34, 34, 35, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 33, 34, 34, 34, 34,
+        34, 34, 34, 35, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 33, 34, 34, 34, 34, 34, 34, 34, 35,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32, 32, 33, 34, 34, 34, 34, 34, 34, 34, 35, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
+        32, 33, 34, 34, 34, 34, 34, 34, 34, 35, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 33, 34, 35,
+        35, 35, 35, 35, 35, 35, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 33, 34, 35, 35, 35, 35, 35,
+        35, 35, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 32, 33, 33, 34, 35, 35, 35, 35, 35, 35, 36, 30, 30,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 33, 34, 34, 35, 35, 35, 35, 35, 35, 36, 30, 30, 30, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 34,
+        34, 35, 35, 35, 35, 35, 35, 36, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 35,
+        35, 35, 35, 36, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 35, 35, 35, 35, 36,
+        30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 33, 34, 34, 35, 35, 35, 35, 35, 35, 36, 30, 30, 30, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        33, 34, 34, 35, 35, 35, 35, 35, 35, 36, 30, 30, 30, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35,
+        35, 35, 35, 35, 35, 36, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 36, 36, 36, 36, 36,
+        36, 37, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 34, 34, 35, 36, 37, 37, 37, 37, 37, 37, 37, 32, 32,
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34,
+        34, 34, 35, 36, 37, 37, 37, 37, 37, 37, 37, 38, 33, 33, 33, 33, 33, 34,
+        34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 36, 37,
+        37, 38, 38, 38, 38, 38, 38, 39, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34,
+        34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 37, 37, 38, 39, 39, 39,
+        39, 39, 39, 40, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35,
+        35, 35, 35, 35, 35, 35, 35, 36, 37, 37, 38, 39, 39, 39, 39, 39, 39, 40,
+        33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35,
+        35, 35, 35, 36, 37, 37, 38, 39, 39, 39, 39, 39, 39, 40, 33, 33, 34, 34,
+        34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36,
+        37, 37, 38, 39, 39, 39, 39, 39, 39, 40, 33, 33, 34, 34, 34, 34, 34, 34,
+        34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 37, 37, 38, 39,
+        39, 39, 39, 39, 39, 40, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35,
+        35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 37, 37, 38, 39, 39, 39, 39, 39,
+        39, 40, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36,
+        36, 36, 36, 36, 36, 37, 37, 38, 39, 40, 40, 40, 40, 40, 40, 40,
+        /* Size 4x8 */
+        31, 31, 31, 34, 31, 31, 31, 35, 31, 31, 31, 35, 31, 32, 32, 36, 31, 32,
+        32, 36, 31, 33, 33, 37, 34, 36, 36, 40, 34, 36, 36, 40,
+        /* Size 8x4 */
+        31, 31, 31, 31, 31, 31, 34, 34, 31, 31, 31, 32, 32, 33, 36, 36, 31, 31,
+        31, 32, 32, 33, 36, 36, 34, 35, 35, 36, 36, 37, 40, 40,
+        /* Size 8x16 */
+        32, 31, 31, 31, 31, 31, 33, 35, 31, 31, 31, 31, 31, 31, 33, 36, 31, 31,
+        31, 31, 31, 31, 34, 36, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31,
+        31, 31, 34, 37, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 32, 32, 32,
+        34, 37, 30, 31, 31, 32, 32, 32, 34, 38, 30, 31, 32, 32, 32, 32, 35, 38,
+        30, 31, 32, 32, 32, 32, 35, 38, 30, 31, 32, 32, 32, 32, 35, 38, 31, 32,
+        33, 33, 33, 33, 36, 39, 33, 34, 34, 35, 35, 35, 37, 40, 33, 34, 35, 36,
+        36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36,
+        38, 41,
+        /* Size 16x8 */
+        32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 31, 33, 33, 33, 33, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 34, 34, 34, 34, 31, 31, 31, 31,
+        31, 31, 31, 31, 32, 32, 32, 33, 34, 35, 35, 35, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 33, 35, 36, 36, 36, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 33, 35, 36, 36, 36, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 33, 35, 36, 36, 36, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 36,
+        37, 38, 38, 38, 35, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 40, 41,
+        41, 41,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 35, 37, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 35, 37, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 36, 37, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 32, 33, 35, 36, 38, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32, 34, 35, 36, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 33, 34, 35, 37, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33,
+        34, 35, 37, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35,
+        37, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38, 31, 31, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 33, 34, 36, 37, 39, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 33, 34, 36, 37, 39, 30, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 33, 34, 36, 38, 39, 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33,
+        35, 36, 38, 40, 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36,
+        38, 40, 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40,
+        30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40, 30, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40, 30, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40, 31, 31, 31, 32, 32, 33,
+        33, 33, 33, 33, 33, 34, 35, 37, 38, 40, 31, 32, 32, 33, 33, 33, 33, 33,
+        33, 33, 33, 35, 36, 37, 39, 41, 32, 32, 33, 33, 34, 34, 34, 34, 34, 34,
+        34, 35, 37, 38, 40, 41, 33, 33, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36,
+        37, 39, 40, 42, 33, 34, 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40,
+        41, 43, 33, 34, 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43,
+        33, 34, 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 33, 34,
+        34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 33, 34, 34, 35,
+        35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 33, 34, 34, 35, 35, 36,
+        36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 34, 34, 35, 35, 36, 36, 36, 36,
+        36, 36, 36, 38, 39, 40, 42, 44,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30,
+        30, 30, 30, 31, 31, 32, 33, 33, 33, 33, 33, 33, 33, 34, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 32, 33, 34, 34, 34, 34, 34, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 34,
+        34, 34, 34, 34, 34, 35, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 33, 34, 35, 35, 35, 35, 35,
+        35, 35, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 35, 35, 35, 35, 36, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 34, 35, 36, 36, 36, 36, 36, 36, 36, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34,
+        35, 36, 36, 36, 36, 36, 36, 36, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 35, 36, 36, 36,
+        36, 36, 36, 36, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 35, 36, 36, 36, 36, 36, 36, 36,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 33, 33, 34, 35, 36, 36, 36, 36, 36, 36, 36, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+        33, 34, 35, 36, 36, 36, 36, 36, 36, 36, 32, 32, 32, 32, 32, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 35, 35, 36, 37,
+        37, 37, 37, 37, 37, 38, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 34,
+        34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 36, 37, 37, 38, 38, 38, 38, 38,
+        38, 39, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36,
+        36, 36, 36, 36, 36, 37, 37, 38, 39, 40, 40, 40, 40, 40, 40, 40, 35, 35,
+        36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38,
+        38, 38, 39, 40, 40, 41, 41, 41, 41, 41, 41, 42, 37, 37, 37, 38, 38, 38,
+        38, 38, 38, 38, 38, 38, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 41, 41,
+        42, 43, 43, 43, 43, 43, 43, 44,
+        /* Size 4x16 */
+        31, 31, 31, 34, 31, 31, 31, 34, 31, 31, 31, 35, 31, 31, 31, 35, 31, 31,
+        31, 35, 31, 31, 31, 35, 31, 32, 32, 36, 31, 32, 32, 36, 31, 32, 32, 36,
+        31, 32, 32, 36, 31, 32, 32, 36, 32, 33, 33, 37, 33, 35, 35, 39, 34, 36,
+        36, 40, 34, 36, 36, 40, 34, 36, 36, 40,
+        /* Size 16x4 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 34, 34, 31, 31,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 35, 36, 36, 36, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 33, 35, 36, 36, 36, 34, 34, 35, 35, 35, 35,
+        36, 36, 36, 36, 36, 37, 39, 40, 40, 40,
+        /* Size 8x32 */
+        32, 31, 31, 31, 31, 31, 33, 35, 31, 31, 31, 31, 31, 31, 33, 35, 31, 31,
+        31, 31, 31, 31, 33, 36, 31, 31, 31, 31, 31, 31, 33, 36, 31, 31, 31, 31,
+        31, 31, 34, 36, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31, 31, 31,
+        34, 37, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31, 31, 31, 34, 37,
+        31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31,
+        31, 31, 31, 31, 34, 37, 31, 31, 31, 32, 32, 32, 34, 37, 31, 31, 31, 32,
+        32, 32, 34, 37, 30, 31, 31, 32, 32, 32, 34, 38, 30, 31, 32, 32, 32, 32,
+        35, 38, 30, 31, 32, 32, 32, 32, 35, 38, 30, 31, 32, 32, 32, 32, 35, 38,
+        30, 31, 32, 32, 32, 32, 35, 38, 30, 31, 32, 32, 32, 32, 35, 38, 30, 31,
+        32, 32, 32, 32, 35, 38, 31, 31, 32, 33, 33, 33, 35, 38, 31, 32, 33, 33,
+        33, 33, 36, 39, 32, 33, 34, 34, 34, 34, 37, 40, 33, 34, 34, 35, 35, 35,
+        37, 40, 33, 34, 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41,
+        33, 34, 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41, 33, 34,
+        35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41, 34, 35, 36, 36,
+        36, 36, 39, 42,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30,
+        30, 30, 30, 31, 31, 32, 33, 33, 33, 33, 33, 33, 33, 34, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 33, 34, 34, 34, 34, 34, 34, 34, 35, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35,
+        35, 35, 35, 35, 35, 36, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 35, 36, 36, 36, 36, 36,
+        36, 36, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 33, 33, 34, 35, 36, 36, 36, 36, 36, 36, 36, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 33, 33, 34, 35, 36, 36, 36, 36, 36, 36, 36, 33, 33, 33, 33, 34, 34,
+        34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 36, 37,
+        37, 38, 38, 38, 38, 38, 38, 39, 35, 35, 36, 36, 36, 37, 37, 37, 37, 37,
+        37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 39, 40, 40, 41, 41, 41,
+        41, 41, 41, 42 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        31, 31, 31, 31, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
+        /* Size 8x8 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+        32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 4x8 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+        32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
+        /* Size 8x4 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+        32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+        /* Size 8x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 31, 31, 31, 32,
+        32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+        32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+        31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+        32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+        32, 32,
+        /* Size 16x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 4x16 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 31, 32,
+        32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
+        31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+        32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
+        /* Size 16x4 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 8x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
+        31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+        32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+        32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+        31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+        32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+        32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+        31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+        32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 32, 32,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        /* Size 8x8 */
+        31, 31, 31, 31, 31, 31, 31, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 30, 31, 31, 31, 31, 31, 31, 31,
+        /* Size 16x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32,
+        /* Size 32x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30,
+        30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 32, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+        /* Size 4x8 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 30, 31, 32, 32,
+        /* Size 8x4 */
+        31, 31, 31, 31, 31, 31, 31, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32,
+        /* Size 8x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31,
+        31, 32, 32, 32, 30, 31, 31, 31, 31, 32, 32, 32, 30, 31, 31, 31, 32, 32,
+        32, 32,
+        /* Size 16x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32,
+        /* Size 16x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        30, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 30, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 30, 30, 31, 31,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 30, 30, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 30, 30, 31, 31, 31, 31, 31, 31,
+        32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 32x16 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 4x16 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 31, 31,
+        32, 32, 31, 31, 32, 32, 30, 31, 32, 32,
+        /* Size 16x4 */
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+        /* Size 8x32 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 31, 32,
+        32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 31, 32, 32, 32,
+        30, 31, 31, 31, 31, 32, 32, 32, 30, 31, 31, 31, 31, 32, 32, 32, 30, 31,
+        31, 31, 32, 32, 32, 32, 30, 31, 31, 31, 32, 32, 32, 32, 30, 31, 31, 31,
+        32, 32, 32, 32,
+        /* Size 32x8 */
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+        32, 32, 32, 32 },
+  },
+};
+
+static const qm_val_t wt_matrix_ref[NUM_QM_LEVELS - 1][2][QM_TOTAL_SIZE] = {
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 24, 14, 11, 24, 15, 11, 9, 14, 11, 7, 7, 11, 9, 7, 5,
+        /* Size 8x8 */
+        32, 32, 27, 20, 15, 12, 11, 9, 32, 29, 26, 21, 16, 13, 12, 10, 27, 26,
+        19, 16, 13, 11, 10, 10, 20, 21, 16, 12, 11, 9, 9, 8, 15, 16, 13, 11, 9,
+        8, 7, 7, 12, 13, 11, 9, 8, 7, 6, 6, 11, 12, 10, 9, 7, 6, 6, 5, 9, 10,
+        10, 8, 7, 6, 5, 5,
+        /* Size 16x16 */
+        32, 33, 33, 30, 28, 23, 21, 17, 16, 13, 12, 11, 11, 10, 9, 9, 33, 32,
+        32, 31, 30, 25, 23, 19, 17, 14, 14, 12, 11, 11, 10, 9, 33, 32, 31, 29,
+        28, 24, 23, 19, 17, 14, 14, 13, 12, 11, 10, 10, 30, 31, 29, 26, 24, 22,
+        20, 18, 16, 14, 13, 13, 12, 11, 11, 10, 28, 30, 28, 24, 21, 19, 18, 16,
+        15, 13, 13, 12, 11, 11, 10, 10, 23, 25, 24, 22, 19, 16, 15, 14, 13, 11,
+        11, 11, 10, 10, 9, 9, 21, 23, 23, 20, 18, 15, 14, 13, 12, 11, 10, 10, 9,
+        9, 9, 9, 17, 19, 19, 18, 16, 14, 13, 11, 10, 9, 9, 9, 9, 8, 8, 8, 16,
+        17, 17, 16, 15, 13, 12, 10, 10, 9, 8, 8, 8, 8, 8, 7, 13, 14, 14, 14, 13,
+        11, 11, 9, 9, 8, 7, 7, 7, 7, 7, 7, 12, 14, 14, 13, 13, 11, 10, 9, 8, 7,
+        7, 7, 7, 7, 6, 6, 11, 12, 13, 13, 12, 11, 10, 9, 8, 7, 7, 6, 6, 6, 6, 6,
+        11, 11, 12, 12, 11, 10, 9, 9, 8, 7, 7, 6, 6, 6, 5, 5, 10, 11, 11, 11,
+        11, 10, 9, 8, 8, 7, 7, 6, 6, 5, 5, 5, 9, 10, 10, 11, 10, 9, 9, 8, 8, 7,
+        6, 6, 5, 5, 5, 5, 9, 9, 10, 10, 10, 9, 9, 8, 7, 7, 6, 6, 5, 5, 5, 4,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 32, 30, 29, 28, 26, 23, 22, 21, 19, 17, 17, 16, 14,
+        13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9, 8, 33, 32, 32, 32, 32,
+        32, 30, 30, 29, 27, 24, 23, 22, 20, 18, 17, 17, 15, 13, 13, 13, 12, 12,
+        12, 11, 11, 10, 10, 10, 9, 9, 9, 33, 32, 32, 32, 32, 32, 31, 30, 30, 28,
+        25, 24, 23, 21, 19, 18, 17, 16, 14, 14, 14, 13, 12, 12, 11, 11, 11, 10,
+        10, 9, 9, 9, 33, 32, 32, 32, 31, 31, 30, 29, 29, 27, 25, 24, 23, 21, 19,
+        18, 17, 16, 14, 14, 14, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 9, 33,
+        32, 32, 31, 31, 30, 29, 28, 28, 26, 24, 23, 23, 20, 19, 18, 17, 16, 14,
+        14, 14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 32, 32, 32, 31, 30,
+        29, 28, 28, 27, 26, 24, 23, 22, 21, 19, 19, 18, 16, 15, 15, 14, 13, 13,
+        12, 12, 12, 11, 11, 10, 10, 10, 9, 30, 30, 31, 30, 29, 28, 26, 25, 24,
+        23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11,
+        11, 11, 10, 10, 9, 29, 30, 30, 29, 28, 28, 25, 24, 23, 22, 20, 20, 19,
+        18, 17, 16, 16, 15, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 10, 10,
+        10, 28, 29, 30, 29, 28, 27, 24, 23, 21, 20, 19, 19, 18, 17, 16, 16, 15,
+        14, 13, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 26, 27, 28,
+        27, 26, 26, 23, 22, 20, 19, 18, 17, 17, 16, 15, 14, 14, 13, 12, 12, 12,
+        11, 11, 11, 11, 10, 10, 10, 10, 10, 9, 9, 23, 24, 25, 25, 24, 24, 22,
+        20, 19, 18, 16, 16, 15, 14, 14, 13, 13, 12, 11, 11, 11, 11, 11, 11, 10,
+        10, 10, 10, 9, 9, 9, 9, 22, 23, 24, 24, 23, 23, 21, 20, 19, 17, 16, 15,
+        15, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9,
+        8, 21, 22, 23, 23, 23, 22, 20, 19, 18, 17, 15, 15, 14, 13, 13, 12, 12,
+        11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 8, 19, 20, 21, 21, 20,
+        21, 19, 18, 17, 16, 14, 14, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9,
+        9, 9, 9, 9, 9, 8, 8, 8, 17, 18, 19, 19, 19, 19, 18, 17, 16, 15, 14, 13,
+        13, 12, 11, 11, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 17,
+        17, 18, 18, 18, 19, 17, 16, 16, 14, 13, 13, 12, 12, 11, 10, 10, 10, 9,
+        9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 16, 17, 17, 17, 17, 18, 16, 16,
+        15, 14, 13, 12, 12, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8,
+        7, 7, 7, 14, 15, 16, 16, 16, 16, 15, 15, 14, 13, 12, 12, 11, 11, 10, 10,
+        9, 9, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 13, 13, 14, 14, 14, 15,
+        14, 13, 13, 12, 11, 11, 11, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7,
+        7, 7, 7, 7, 7, 13, 13, 14, 14, 14, 15, 14, 13, 13, 12, 11, 11, 11, 10,
+        9, 9, 9, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 12, 13, 14, 14,
+        14, 14, 13, 13, 13, 12, 11, 11, 10, 10, 9, 9, 8, 8, 7, 7, 7, 7, 7, 7, 7,
+        7, 7, 6, 6, 6, 6, 6, 12, 12, 13, 13, 13, 13, 13, 12, 12, 11, 11, 10, 10,
+        9, 9, 9, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 11, 12, 12, 12,
+        13, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6,
+        6, 6, 6, 6, 6, 6, 6, 11, 12, 12, 12, 12, 12, 12, 12, 11, 11, 11, 10, 10,
+        9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 11, 11, 11, 12,
+        12, 12, 12, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6,
+        6, 6, 5, 5, 5, 5, 5, 10, 11, 11, 11, 12, 12, 12, 11, 11, 10, 10, 10, 9,
+        9, 9, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 10, 10, 11, 11,
+        11, 11, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6,
+        5, 5, 5, 5, 5, 5, 10, 10, 10, 11, 11, 11, 11, 11, 10, 10, 10, 9, 9, 9,
+        8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 9, 10, 10, 10, 10,
+        10, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5,
+        5, 5, 5, 5, 5, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 8, 8, 8, 7,
+        7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 4, 9, 9, 9, 10, 10, 10, 10,
+        10, 10, 9, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5,
+        4, 4, 8, 9, 9, 9, 9, 9, 9, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6,
+        6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4,
+        /* Size 4x8 */
+        32, 24, 14, 11, 31, 24, 15, 12, 28, 18, 12, 11, 21, 14, 10, 9, 16, 12,
+        8, 8, 13, 11, 7, 7, 11, 10, 7, 6, 10, 9, 7, 5,
+        /* Size 8x4 */
+        32, 31, 28, 21, 16, 13, 11, 10, 24, 24, 18, 14, 12, 11, 10, 9, 14, 15,
+        12, 10, 8, 7, 7, 7, 11, 12, 11, 9, 8, 7, 6, 5,
+        /* Size 8x16 */
+        32, 32, 28, 19, 16, 12, 11, 10, 33, 31, 30, 21, 17, 13, 12, 11, 32, 30,
+        28, 20, 17, 13, 12, 12, 30, 28, 24, 19, 16, 13, 13, 12, 28, 27, 21, 17,
+        15, 12, 12, 11, 23, 24, 19, 14, 13, 11, 11, 11, 21, 22, 18, 13, 12, 10,
+        10, 10, 18, 19, 16, 12, 10, 9, 9, 9, 16, 18, 15, 11, 10, 8, 8, 8, 13,
+        15, 13, 10, 9, 7, 8, 8, 12, 14, 13, 10, 8, 7, 7, 7, 11, 13, 12, 10, 8,
+        7, 6, 6, 11, 12, 11, 10, 8, 7, 6, 6, 10, 11, 10, 9, 8, 7, 6, 6, 9, 10,
+        10, 9, 7, 6, 6, 5, 9, 10, 10, 9, 8, 7, 6, 5,
+        /* Size 16x8 */
+        32, 33, 32, 30, 28, 23, 21, 18, 16, 13, 12, 11, 11, 10, 9, 9, 32, 31,
+        30, 28, 27, 24, 22, 19, 18, 15, 14, 13, 12, 11, 10, 10, 28, 30, 28, 24,
+        21, 19, 18, 16, 15, 13, 13, 12, 11, 10, 10, 10, 19, 21, 20, 19, 17, 14,
+        13, 12, 11, 10, 10, 10, 10, 9, 9, 9, 16, 17, 17, 16, 15, 13, 12, 10, 10,
+        9, 8, 8, 8, 8, 7, 8, 12, 13, 13, 13, 12, 11, 10, 9, 8, 7, 7, 7, 7, 7, 6,
+        7, 11, 12, 12, 13, 12, 11, 10, 9, 8, 8, 7, 6, 6, 6, 6, 6, 10, 11, 12,
+        12, 11, 11, 10, 9, 8, 8, 7, 6, 6, 6, 5, 5,
+        /* Size 16x32 */
+        32, 33, 32, 30, 28, 23, 19, 17, 16, 13, 12, 11, 11, 11, 10, 10, 33, 32,
+        32, 30, 29, 24, 20, 18, 17, 14, 12, 12, 12, 11, 11, 11, 33, 32, 31, 31,
+        30, 25, 21, 19, 17, 14, 13, 12, 12, 11, 11, 11, 33, 32, 31, 30, 29, 25,
+        21, 19, 17, 14, 13, 13, 12, 12, 11, 11, 32, 32, 30, 29, 28, 24, 20, 19,
+        17, 14, 13, 13, 12, 12, 12, 11, 32, 31, 29, 28, 27, 24, 21, 19, 18, 15,
+        14, 13, 12, 12, 12, 11, 30, 30, 28, 26, 24, 21, 19, 18, 16, 14, 13, 13,
+        13, 12, 12, 11, 29, 30, 28, 25, 23, 20, 18, 17, 16, 13, 12, 12, 12, 12,
+        12, 11, 28, 30, 27, 24, 21, 19, 17, 16, 15, 13, 12, 12, 12, 12, 11, 11,
+        26, 28, 26, 23, 20, 18, 16, 15, 14, 12, 12, 12, 11, 11, 11, 11, 23, 25,
+        24, 21, 19, 16, 14, 14, 13, 11, 11, 11, 11, 11, 11, 11, 22, 24, 23, 21,
+        19, 16, 14, 13, 12, 11, 10, 10, 10, 10, 10, 10, 21, 23, 22, 20, 18, 15,
+        13, 13, 12, 11, 10, 10, 10, 10, 10, 10, 19, 21, 20, 19, 17, 14, 12, 12,
+        11, 10, 9, 10, 10, 9, 10, 9, 18, 19, 19, 18, 16, 14, 12, 11, 10, 9, 9,
+        9, 9, 9, 9, 9, 17, 18, 18, 17, 16, 13, 12, 11, 10, 9, 9, 9, 9, 9, 9, 9,
+        16, 17, 18, 16, 15, 13, 11, 10, 10, 9, 8, 8, 8, 8, 8, 8, 14, 16, 16, 15,
+        14, 12, 11, 10, 9, 8, 8, 8, 8, 8, 8, 8, 13, 14, 15, 14, 13, 11, 10, 9,
+        9, 8, 7, 8, 8, 8, 8, 8, 13, 14, 14, 14, 13, 11, 10, 9, 9, 8, 7, 7, 7, 7,
+        7, 7, 12, 14, 14, 13, 13, 11, 10, 9, 8, 8, 7, 7, 7, 7, 7, 7, 12, 13, 13,
+        13, 12, 11, 9, 9, 8, 7, 7, 7, 7, 7, 7, 7, 11, 12, 13, 13, 12, 10, 10, 9,
+        8, 7, 7, 7, 6, 6, 6, 7, 11, 12, 12, 12, 11, 10, 10, 9, 8, 7, 7, 6, 6, 6,
+        6, 6, 11, 12, 12, 12, 11, 10, 10, 8, 8, 7, 7, 6, 6, 6, 6, 6, 10, 11, 12,
+        12, 11, 10, 9, 8, 8, 7, 7, 6, 6, 6, 6, 6, 10, 11, 11, 11, 10, 10, 9, 9,
+        8, 7, 7, 6, 6, 6, 6, 6, 10, 11, 11, 11, 10, 10, 9, 9, 8, 7, 7, 6, 6, 5,
+        5, 5, 9, 10, 10, 11, 10, 9, 9, 8, 7, 7, 6, 6, 6, 5, 5, 5, 9, 10, 10, 10,
+        10, 9, 9, 8, 7, 7, 6, 6, 6, 5, 5, 5, 9, 9, 10, 10, 10, 9, 9, 8, 8, 7, 7,
+        6, 6, 5, 5, 5, 8, 9, 9, 10, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5, 5, 5,
+        /* Size 32x16 */
+        32, 33, 33, 33, 32, 32, 30, 29, 28, 26, 23, 22, 21, 19, 18, 17, 16, 14,
+        13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9, 8, 33, 32, 32, 32, 32,
+        31, 30, 30, 30, 28, 25, 24, 23, 21, 19, 18, 17, 16, 14, 14, 14, 13, 12,
+        12, 12, 11, 11, 11, 10, 10, 9, 9, 32, 32, 31, 31, 30, 29, 28, 28, 27,
+        26, 24, 23, 22, 20, 19, 18, 18, 16, 15, 14, 14, 13, 13, 12, 12, 12, 11,
+        11, 10, 10, 10, 9, 30, 30, 31, 30, 29, 28, 26, 25, 24, 23, 21, 21, 20,
+        19, 18, 17, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10,
+        10, 28, 29, 30, 29, 28, 27, 24, 23, 21, 20, 19, 19, 18, 17, 16, 16, 15,
+        14, 13, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 23, 24, 25,
+        25, 24, 24, 21, 20, 19, 18, 16, 16, 15, 14, 14, 13, 13, 12, 11, 11, 11,
+        11, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 19, 20, 21, 21, 20, 21, 19, 18,
+        17, 16, 14, 14, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 10, 10, 10, 9, 9,
+        9, 9, 9, 9, 9, 17, 18, 19, 19, 19, 19, 18, 17, 16, 15, 14, 13, 13, 12,
+        11, 11, 10, 10, 9, 9, 9, 9, 9, 9, 8, 8, 9, 9, 8, 8, 8, 8, 16, 17, 17,
+        17, 17, 18, 16, 16, 15, 14, 13, 12, 12, 11, 10, 10, 10, 9, 9, 9, 8, 8,
+        8, 8, 8, 8, 8, 8, 7, 7, 8, 8, 13, 14, 14, 14, 14, 15, 14, 13, 13, 12,
+        11, 11, 11, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
+        12, 12, 13, 13, 13, 14, 13, 12, 12, 12, 11, 10, 10, 9, 9, 9, 8, 8, 7, 7,
+        7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 7, 7, 11, 12, 12, 13, 13, 13, 13, 12, 12,
+        12, 11, 10, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6,
+        6, 11, 12, 12, 12, 12, 12, 13, 12, 12, 11, 11, 10, 10, 10, 9, 9, 8, 8,
+        8, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 11, 11, 11, 12, 12, 12, 12,
+        12, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 5, 5,
+        5, 5, 5, 10, 11, 11, 11, 12, 12, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9,
+        8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 10, 11, 11, 11, 11, 11,
+        11, 11, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 5,
+        5, 5, 5, 5,
+        /* Size 4x16 */
+        33, 23, 13, 11, 32, 25, 14, 11, 32, 24, 14, 12, 30, 21, 14, 12, 30, 19,
+        13, 12, 25, 16, 11, 11, 23, 15, 11, 10, 19, 14, 9, 9, 17, 13, 9, 8, 14,
+        11, 8, 8, 14, 11, 8, 7, 12, 10, 7, 6, 12, 10, 7, 6, 11, 10, 7, 6, 10, 9,
+        7, 5, 9, 9, 7, 5,
+        /* Size 16x4 */
+        33, 32, 32, 30, 30, 25, 23, 19, 17, 14, 14, 12, 12, 11, 10, 9, 23, 25,
+        24, 21, 19, 16, 15, 14, 13, 11, 11, 10, 10, 10, 9, 9, 13, 14, 14, 14,
+        13, 11, 11, 9, 9, 8, 8, 7, 7, 7, 7, 7, 11, 11, 12, 12, 12, 11, 10, 9, 8,
+        8, 7, 6, 6, 6, 5, 5,
+        /* Size 8x32 */
+        32, 32, 28, 19, 16, 12, 11, 10, 33, 32, 29, 20, 17, 12, 12, 11, 33, 31,
+        30, 21, 17, 13, 12, 11, 33, 31, 29, 21, 17, 13, 12, 11, 32, 30, 28, 20,
+        17, 13, 12, 12, 32, 29, 27, 21, 18, 14, 12, 12, 30, 28, 24, 19, 16, 13,
+        13, 12, 29, 28, 23, 18, 16, 12, 12, 12, 28, 27, 21, 17, 15, 12, 12, 11,
+        26, 26, 20, 16, 14, 12, 11, 11, 23, 24, 19, 14, 13, 11, 11, 11, 22, 23,
+        19, 14, 12, 10, 10, 10, 21, 22, 18, 13, 12, 10, 10, 10, 19, 20, 17, 12,
+        11, 9, 10, 10, 18, 19, 16, 12, 10, 9, 9, 9, 17, 18, 16, 12, 10, 9, 9, 9,
+        16, 18, 15, 11, 10, 8, 8, 8, 14, 16, 14, 11, 9, 8, 8, 8, 13, 15, 13, 10,
+        9, 7, 8, 8, 13, 14, 13, 10, 9, 7, 7, 7, 12, 14, 13, 10, 8, 7, 7, 7, 12,
+        13, 12, 9, 8, 7, 7, 7, 11, 13, 12, 10, 8, 7, 6, 6, 11, 12, 11, 10, 8, 7,
+        6, 6, 11, 12, 11, 10, 8, 7, 6, 6, 10, 12, 11, 9, 8, 7, 6, 6, 10, 11, 10,
+        9, 8, 7, 6, 6, 10, 11, 10, 9, 8, 7, 6, 5, 9, 10, 10, 9, 7, 6, 6, 5, 9,
+        10, 10, 9, 7, 6, 6, 5, 9, 10, 10, 9, 8, 7, 6, 5, 8, 9, 10, 9, 8, 7, 6,
+        5,
+        /* Size 32x8 */
+        32, 33, 33, 33, 32, 32, 30, 29, 28, 26, 23, 22, 21, 19, 18, 17, 16, 14,
+        13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9, 8, 32, 32, 31, 31, 30,
+        29, 28, 28, 27, 26, 24, 23, 22, 20, 19, 18, 18, 16, 15, 14, 14, 13, 13,
+        12, 12, 12, 11, 11, 10, 10, 10, 9, 28, 29, 30, 29, 28, 27, 24, 23, 21,
+        20, 19, 19, 18, 17, 16, 16, 15, 14, 13, 13, 13, 12, 12, 11, 11, 11, 10,
+        10, 10, 10, 10, 10, 19, 20, 21, 21, 20, 21, 19, 18, 17, 16, 14, 14, 13,
+        12, 12, 12, 11, 11, 10, 10, 10, 9, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 16,
+        17, 17, 17, 17, 18, 16, 16, 15, 14, 13, 12, 12, 11, 10, 10, 10, 9, 9, 9,
+        8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 8, 8, 12, 12, 13, 13, 13, 14, 13, 12, 12,
+        12, 11, 10, 10, 9, 9, 9, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 7, 7,
+        11, 12, 12, 12, 12, 12, 13, 12, 12, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8,
+        7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 10, 11, 11, 11, 12, 12, 12, 12,
+        11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 5, 5, 5,
+        5, 5 },
+      { /* Chroma */
+        /* Size 4x4 */
+        29, 22, 18, 16, 22, 17, 15, 14, 18, 15, 11, 11, 16, 14, 11, 9,
+        /* Size 8x8 */
+        33, 27, 22, 20, 18, 16, 15, 14, 27, 22, 22, 22, 20, 18, 17, 15, 22, 22,
+        19, 18, 17, 16, 15, 15, 20, 22, 18, 16, 14, 13, 14, 14, 18, 20, 17, 14,
+        12, 12, 12, 12, 16, 18, 16, 13, 12, 11, 11, 11, 15, 17, 15, 14, 12, 11,
+        10, 10, 14, 15, 15, 14, 12, 11, 10, 9,
+        /* Size 16x16 */
+        32, 34, 31, 25, 21, 21, 20, 19, 18, 16, 16, 15, 15, 14, 14, 13, 34, 32,
+        29, 24, 22, 23, 22, 21, 20, 18, 18, 17, 16, 15, 15, 14, 31, 29, 26, 23,
+        22, 23, 22, 21, 20, 18, 18, 17, 17, 16, 16, 15, 25, 24, 23, 21, 20, 21,
+        20, 20, 19, 18, 18, 17, 17, 17, 16, 15, 21, 22, 22, 20, 19, 19, 19, 19,
+        18, 17, 17, 16, 16, 16, 16, 16, 21, 23, 23, 21, 19, 18, 17, 17, 16, 15,
+        15, 15, 15, 15, 15, 15, 20, 22, 22, 20, 19, 17, 17, 16, 15, 14, 14, 14,
+        14, 14, 14, 14, 19, 21, 21, 20, 19, 17, 16, 14, 14, 13, 13, 13, 13, 13,
+        13, 13, 18, 20, 20, 19, 18, 16, 15, 14, 13, 12, 12, 12, 12, 12, 12, 12,
+        16, 18, 18, 18, 17, 15, 14, 13, 12, 12, 11, 11, 12, 12, 12, 12, 16, 18,
+        18, 18, 17, 15, 14, 13, 12, 11, 11, 11, 11, 11, 11, 11, 15, 17, 17, 17,
+        16, 15, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 15, 16, 17, 17, 16, 15,
+        14, 13, 12, 12, 11, 10, 10, 10, 10, 10, 14, 15, 16, 17, 16, 15, 14, 13,
+        12, 12, 11, 10, 10, 10, 9, 9, 14, 15, 16, 16, 16, 15, 14, 13, 12, 12,
+        11, 10, 10, 9, 9, 9, 13, 14, 15, 15, 16, 15, 14, 13, 12, 12, 11, 10, 10,
+        9, 9, 9,
+        /* Size 32x32 */
+        32, 33, 34, 32, 31, 28, 25, 23, 21, 21, 21, 20, 20, 20, 19, 18, 18, 17,
+        16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 33, 33, 33, 31,
+        30, 27, 24, 23, 22, 22, 22, 22, 21, 20, 20, 19, 19, 18, 17, 17, 17, 16,
+        16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 34, 33, 32, 31, 29, 26, 24, 23,
+        22, 23, 23, 23, 22, 22, 21, 20, 20, 19, 18, 18, 18, 17, 17, 16, 16, 16,
+        15, 15, 15, 14, 14, 14, 32, 31, 31, 29, 28, 25, 24, 23, 22, 22, 23, 22,
+        22, 22, 21, 20, 20, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 16, 15, 15,
+        15, 15, 31, 30, 29, 28, 26, 24, 23, 22, 22, 22, 23, 22, 22, 22, 21, 20,
+        20, 19, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 28, 27,
+        26, 25, 24, 22, 22, 22, 21, 22, 23, 22, 22, 22, 21, 21, 20, 20, 19, 19,
+        19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 15, 25, 24, 24, 24, 23, 22,
+        21, 21, 20, 21, 21, 21, 20, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17,
+        17, 17, 17, 16, 16, 16, 15, 15, 23, 23, 23, 23, 22, 22, 21, 20, 20, 20,
+        20, 20, 20, 20, 19, 19, 19, 18, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16,
+        16, 16, 16, 16, 21, 22, 22, 22, 22, 21, 20, 20, 19, 19, 19, 19, 19, 19,
+        19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+        21, 22, 23, 22, 22, 22, 21, 20, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17,
+        16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 15, 21, 22, 23, 23,
+        23, 23, 21, 20, 19, 19, 18, 17, 17, 17, 17, 16, 16, 16, 15, 15, 15, 15,
+        15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 20, 22, 23, 22, 22, 22, 21, 20,
+        19, 18, 17, 17, 17, 17, 16, 16, 16, 15, 15, 15, 15, 15, 14, 14, 15, 15,
+        14, 14, 14, 14, 14, 14, 20, 21, 22, 22, 22, 22, 20, 20, 19, 18, 17, 17,
+        17, 16, 16, 16, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14,
+        14, 14, 20, 20, 22, 22, 22, 22, 20, 20, 19, 18, 17, 17, 16, 16, 15, 15,
+        15, 14, 14, 14, 14, 13, 14, 14, 13, 14, 14, 13, 14, 14, 13, 13, 19, 20,
+        21, 21, 21, 21, 20, 19, 19, 18, 17, 16, 16, 15, 14, 14, 14, 14, 13, 13,
+        13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 18, 19, 20, 20, 20, 21,
+        20, 19, 18, 17, 16, 16, 16, 15, 14, 14, 14, 13, 13, 13, 13, 13, 12, 13,
+        13, 13, 13, 13, 13, 13, 13, 12, 18, 19, 20, 20, 20, 20, 19, 19, 18, 17,
+        16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
+        12, 12, 12, 12, 17, 18, 19, 19, 19, 20, 19, 18, 18, 17, 16, 15, 15, 14,
+        14, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
+        16, 17, 18, 18, 18, 19, 18, 17, 17, 16, 15, 15, 14, 14, 13, 13, 12, 12,
+        12, 12, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 16, 17, 18, 18,
+        18, 19, 18, 17, 17, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11,
+        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 16, 17, 18, 18, 18, 19, 18, 17,
+        17, 16, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11,
+        11, 11, 11, 11, 11, 11, 15, 16, 17, 17, 17, 18, 17, 17, 16, 16, 15, 15,
+        14, 13, 13, 13, 12, 12, 11, 11, 11, 11, 11, 11, 10, 11, 11, 11, 11, 11,
+        11, 11, 15, 16, 17, 17, 17, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 12,
+        12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 15, 16,
+        16, 17, 17, 17, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 11,
+        11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 15, 16, 16, 16, 17, 17,
+        17, 16, 16, 16, 15, 15, 14, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10,
+        10, 10, 10, 10, 10, 10, 10, 10, 14, 15, 16, 16, 16, 17, 17, 16, 16, 15,
+        15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10,
+        10, 10, 10, 10, 14, 15, 15, 16, 16, 16, 17, 16, 16, 15, 15, 14, 14, 14,
+        13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 9, 9, 9, 10, 14,
+        15, 15, 16, 16, 16, 16, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12,
+        11, 11, 11, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 14, 15, 15, 15, 16, 16,
+        16, 16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10,
+        10, 10, 9, 9, 9, 9, 9, 9, 14, 14, 14, 15, 15, 15, 16, 16, 16, 15, 15,
+        14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9,
+        9, 9, 13, 14, 14, 15, 15, 15, 15, 16, 16, 15, 15, 14, 14, 13, 13, 13,
+        12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 13, 14, 14,
+        15, 15, 15, 15, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11,
+        11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9,
+        /* Size 4x8 */
+        33, 22, 17, 16, 26, 23, 19, 17, 22, 18, 16, 16, 21, 17, 14, 14, 19, 16,
+        12, 12, 17, 15, 11, 11, 16, 15, 11, 10, 15, 14, 12, 10,
+        /* Size 8x4 */
+        33, 26, 22, 21, 19, 17, 16, 15, 22, 23, 18, 17, 16, 15, 15, 14, 17, 19,
+        16, 14, 12, 11, 11, 12, 16, 17, 16, 14, 12, 11, 10, 10,
+        /* Size 8x16 */
+        32, 28, 21, 20, 18, 16, 15, 14, 34, 26, 22, 21, 20, 17, 16, 16, 31, 24,
+        22, 22, 20, 17, 17, 16, 24, 22, 20, 20, 19, 17, 17, 17, 21, 21, 19, 19,
+        18, 17, 17, 17, 21, 22, 19, 17, 16, 15, 16, 16, 20, 22, 19, 16, 15, 14,
+        14, 15, 19, 21, 19, 15, 14, 13, 13, 14, 18, 20, 18, 15, 13, 12, 13, 13,
+        16, 19, 17, 14, 12, 11, 12, 12, 16, 18, 17, 14, 12, 11, 11, 12, 15, 17,
+        16, 14, 12, 11, 10, 11, 15, 17, 16, 14, 12, 11, 10, 10, 14, 16, 16, 14,
+        12, 11, 10, 10, 14, 15, 16, 14, 12, 11, 10, 10, 13, 15, 15, 14, 12, 11,
+        10, 9,
+        /* Size 16x8 */
+        32, 34, 31, 24, 21, 21, 20, 19, 18, 16, 16, 15, 15, 14, 14, 13, 28, 26,
+        24, 22, 21, 22, 22, 21, 20, 19, 18, 17, 17, 16, 15, 15, 21, 22, 22, 20,
+        19, 19, 19, 19, 18, 17, 17, 16, 16, 16, 16, 15, 20, 21, 22, 20, 19, 17,
+        16, 15, 15, 14, 14, 14, 14, 14, 14, 14, 18, 20, 20, 19, 18, 16, 15, 14,
+        13, 12, 12, 12, 12, 12, 12, 12, 16, 17, 17, 17, 17, 15, 14, 13, 12, 11,
+        11, 11, 11, 11, 11, 11, 15, 16, 17, 17, 17, 16, 14, 13, 13, 12, 11, 10,
+        10, 10, 10, 10, 14, 16, 16, 17, 17, 16, 15, 14, 13, 12, 12, 11, 10, 10,
+        10, 9,
+        /* Size 16x32 */
+        32, 33, 28, 24, 21, 21, 20, 19, 18, 16, 16, 15, 15, 15, 14, 14, 33, 33,
+        27, 24, 22, 22, 20, 20, 19, 17, 16, 16, 16, 16, 15, 15, 34, 32, 26, 24,
+        22, 23, 21, 20, 20, 18, 17, 17, 16, 16, 16, 15, 32, 30, 25, 23, 22, 23,
+        21, 21, 20, 18, 17, 17, 17, 16, 16, 16, 31, 28, 24, 23, 22, 22, 22, 21,
+        20, 18, 17, 17, 17, 17, 16, 16, 28, 26, 22, 22, 22, 23, 22, 21, 20, 19,
+        18, 18, 17, 17, 17, 16, 24, 24, 22, 21, 20, 21, 20, 20, 19, 18, 17, 18,
+        17, 17, 17, 16, 23, 23, 22, 21, 20, 20, 20, 19, 19, 17, 17, 17, 17, 17,
+        17, 17, 21, 22, 21, 20, 19, 19, 19, 19, 18, 17, 17, 16, 17, 16, 17, 17,
+        21, 22, 22, 20, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 16, 16, 21, 23,
+        22, 21, 19, 18, 17, 17, 16, 15, 15, 15, 16, 16, 16, 16, 21, 22, 22, 21,
+        19, 17, 17, 16, 16, 15, 14, 15, 15, 15, 15, 15, 20, 22, 22, 20, 19, 17,
+        16, 16, 15, 14, 14, 14, 14, 15, 15, 15, 20, 21, 22, 20, 19, 17, 16, 15,
+        14, 14, 13, 14, 14, 14, 14, 14, 19, 20, 21, 20, 19, 17, 15, 14, 14, 13,
+        13, 13, 13, 14, 14, 14, 19, 20, 21, 20, 18, 16, 15, 14, 14, 13, 12, 13,
+        13, 13, 13, 13, 18, 20, 20, 19, 18, 16, 15, 14, 13, 12, 12, 12, 13, 13,
+        13, 13, 17, 19, 20, 19, 18, 16, 14, 14, 13, 12, 12, 12, 12, 12, 13, 13,
+        16, 18, 19, 18, 17, 15, 14, 13, 12, 12, 11, 12, 12, 12, 12, 13, 16, 18,
+        19, 18, 17, 15, 14, 13, 12, 12, 11, 11, 12, 12, 12, 12, 16, 17, 18, 18,
+        17, 15, 14, 13, 12, 11, 11, 11, 11, 11, 12, 12, 15, 17, 18, 17, 16, 15,
+        13, 13, 12, 11, 11, 11, 11, 11, 11, 11, 15, 17, 17, 17, 16, 14, 14, 13,
+        12, 11, 11, 11, 10, 11, 11, 11, 15, 17, 17, 17, 16, 15, 14, 13, 12, 12,
+        11, 10, 10, 10, 11, 11, 15, 16, 17, 17, 16, 15, 14, 13, 12, 12, 11, 11,
+        10, 10, 10, 11, 14, 16, 16, 17, 15, 15, 14, 13, 12, 11, 11, 10, 10, 10,
+        10, 10, 14, 16, 16, 17, 16, 15, 14, 13, 12, 12, 11, 10, 10, 10, 10, 10,
+        14, 16, 16, 16, 16, 15, 14, 13, 12, 12, 11, 10, 10, 10, 10, 10, 14, 15,
+        15, 16, 16, 15, 14, 13, 12, 12, 11, 11, 10, 10, 10, 10, 14, 15, 15, 16,
+        16, 14, 14, 13, 12, 12, 11, 11, 10, 10, 9, 9, 13, 15, 15, 16, 15, 14,
+        14, 13, 12, 12, 11, 11, 10, 10, 9, 9, 13, 15, 15, 15, 15, 14, 14, 13,
+        13, 11, 11, 10, 10, 9, 9, 9,
+        /* Size 32x16 */
+        32, 33, 34, 32, 31, 28, 24, 23, 21, 21, 21, 21, 20, 20, 19, 19, 18, 17,
+        16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 33, 33, 32, 30,
+        28, 26, 24, 23, 22, 22, 23, 22, 22, 21, 20, 20, 20, 19, 18, 18, 17, 17,
+        17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 28, 27, 26, 25, 24, 22, 22, 22,
+        21, 22, 22, 22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16,
+        16, 16, 15, 15, 15, 15, 24, 24, 24, 23, 23, 22, 21, 21, 20, 20, 21, 21,
+        20, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 17, 16, 16, 16,
+        16, 15, 21, 22, 22, 22, 22, 22, 20, 20, 19, 19, 19, 19, 19, 19, 19, 18,
+        18, 18, 17, 17, 17, 16, 16, 16, 16, 15, 16, 16, 16, 16, 15, 15, 21, 22,
+        23, 23, 22, 23, 21, 20, 19, 18, 18, 17, 17, 17, 17, 16, 16, 16, 15, 15,
+        15, 15, 14, 15, 15, 15, 15, 15, 15, 14, 14, 14, 20, 20, 21, 21, 22, 22,
+        20, 20, 19, 18, 17, 17, 16, 16, 15, 15, 15, 14, 14, 14, 14, 13, 14, 14,
+        14, 14, 14, 14, 14, 14, 14, 14, 19, 20, 20, 21, 21, 21, 20, 19, 19, 17,
+        17, 16, 16, 15, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
+        13, 13, 13, 13, 18, 19, 20, 20, 20, 20, 19, 19, 18, 17, 16, 16, 15, 14,
+        14, 14, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13,
+        16, 17, 18, 18, 18, 19, 18, 17, 17, 16, 15, 15, 14, 14, 13, 13, 12, 12,
+        12, 12, 11, 11, 11, 12, 12, 11, 12, 12, 12, 12, 12, 11, 16, 16, 17, 17,
+        17, 18, 17, 17, 17, 16, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11,
+        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 15, 16, 17, 17, 17, 18, 18, 17,
+        16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 10, 11, 10,
+        10, 10, 11, 11, 11, 10, 15, 16, 16, 17, 17, 17, 17, 17, 17, 16, 16, 15,
+        14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10,
+        10, 10, 15, 16, 16, 16, 17, 17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 13,
+        13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 9, 14, 15,
+        16, 16, 16, 17, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12,
+        12, 11, 11, 11, 10, 10, 10, 10, 10, 9, 9, 9, 14, 15, 15, 16, 16, 16, 16,
+        17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 11, 11, 11, 11,
+        10, 10, 10, 10, 9, 9, 9,
+        /* Size 4x16 */
+        33, 21, 16, 15, 32, 23, 18, 16, 28, 22, 18, 17, 24, 21, 18, 17, 22, 19,
+        17, 16, 23, 18, 15, 16, 22, 17, 14, 15, 20, 17, 13, 14, 20, 16, 12, 13,
+        18, 15, 12, 12, 17, 15, 11, 11, 17, 14, 11, 11, 16, 15, 12, 10, 16, 15,
+        12, 10, 15, 15, 12, 10, 15, 14, 12, 10,
+        /* Size 16x4 */
+        33, 32, 28, 24, 22, 23, 22, 20, 20, 18, 17, 17, 16, 16, 15, 15, 21, 23,
+        22, 21, 19, 18, 17, 17, 16, 15, 15, 14, 15, 15, 15, 14, 16, 18, 18, 18,
+        17, 15, 14, 13, 12, 12, 11, 11, 12, 12, 12, 12, 15, 16, 17, 17, 16, 16,
+        15, 14, 13, 12, 11, 11, 10, 10, 10, 10,
+        /* Size 8x32 */
+        32, 28, 21, 20, 18, 16, 15, 14, 33, 27, 22, 20, 19, 16, 16, 15, 34, 26,
+        22, 21, 20, 17, 16, 16, 32, 25, 22, 21, 20, 17, 17, 16, 31, 24, 22, 22,
+        20, 17, 17, 16, 28, 22, 22, 22, 20, 18, 17, 17, 24, 22, 20, 20, 19, 17,
+        17, 17, 23, 22, 20, 20, 19, 17, 17, 17, 21, 21, 19, 19, 18, 17, 17, 17,
+        21, 22, 19, 18, 17, 16, 16, 16, 21, 22, 19, 17, 16, 15, 16, 16, 21, 22,
+        19, 17, 16, 14, 15, 15, 20, 22, 19, 16, 15, 14, 14, 15, 20, 22, 19, 16,
+        14, 13, 14, 14, 19, 21, 19, 15, 14, 13, 13, 14, 19, 21, 18, 15, 14, 12,
+        13, 13, 18, 20, 18, 15, 13, 12, 13, 13, 17, 20, 18, 14, 13, 12, 12, 13,
+        16, 19, 17, 14, 12, 11, 12, 12, 16, 19, 17, 14, 12, 11, 12, 12, 16, 18,
+        17, 14, 12, 11, 11, 12, 15, 18, 16, 13, 12, 11, 11, 11, 15, 17, 16, 14,
+        12, 11, 10, 11, 15, 17, 16, 14, 12, 11, 10, 11, 15, 17, 16, 14, 12, 11,
+        10, 10, 14, 16, 15, 14, 12, 11, 10, 10, 14, 16, 16, 14, 12, 11, 10, 10,
+        14, 16, 16, 14, 12, 11, 10, 10, 14, 15, 16, 14, 12, 11, 10, 10, 14, 15,
+        16, 14, 12, 11, 10, 9, 13, 15, 15, 14, 12, 11, 10, 9, 13, 15, 15, 14,
+        13, 11, 10, 9,
+        /* Size 32x8 */
+        32, 33, 34, 32, 31, 28, 24, 23, 21, 21, 21, 21, 20, 20, 19, 19, 18, 17,
+        16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 28, 27, 26, 25,
+        24, 22, 22, 22, 21, 22, 22, 22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 18,
+        17, 17, 17, 16, 16, 16, 15, 15, 15, 15, 21, 22, 22, 22, 22, 22, 20, 20,
+        19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 16, 15,
+        16, 16, 16, 16, 15, 15, 20, 20, 21, 21, 22, 22, 20, 20, 19, 18, 17, 17,
+        16, 16, 15, 15, 15, 14, 14, 14, 14, 13, 14, 14, 14, 14, 14, 14, 14, 14,
+        14, 14, 18, 19, 20, 20, 20, 20, 19, 19, 18, 17, 16, 16, 15, 14, 14, 14,
+        13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 16, 16,
+        17, 17, 17, 18, 17, 17, 17, 16, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11,
+        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17,
+        17, 17, 17, 16, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10,
+        10, 10, 10, 10, 10, 10, 10, 10, 14, 15, 16, 16, 16, 17, 17, 17, 17, 16,
+        16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10,
+        10, 9, 9, 9 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 25, 15, 11, 25, 16, 12, 10, 15, 12, 8, 7, 11, 10, 7, 6,
+        /* Size 8x8 */
+        32, 32, 28, 22, 17, 13, 11, 10, 32, 29, 26, 22, 18, 14, 12, 11, 28, 26,
+        20, 17, 14, 12, 11, 10, 22, 22, 17, 14, 12, 10, 10, 9, 17, 18, 14, 12,
+        10, 8, 8, 8, 13, 14, 12, 10, 8, 7, 7, 7, 11, 12, 11, 10, 8, 7, 6, 6, 10,
+        11, 10, 9, 8, 7, 6, 5,
+        /* Size 16x16 */
+        32, 33, 33, 32, 28, 26, 22, 19, 17, 14, 13, 12, 11, 10, 10, 9, 33, 32,
+        32, 31, 30, 28, 23, 20, 18, 16, 14, 13, 12, 11, 10, 10, 33, 32, 31, 30,
+        28, 26, 23, 20, 18, 16, 14, 13, 12, 12, 11, 10, 32, 31, 30, 28, 26, 24,
+        22, 20, 18, 16, 14, 13, 13, 12, 11, 10, 28, 30, 28, 26, 21, 20, 18, 17,
+        16, 14, 13, 12, 12, 11, 11, 10, 26, 28, 26, 24, 20, 19, 17, 16, 15, 13,
+        12, 12, 11, 11, 10, 10, 22, 23, 23, 22, 18, 17, 15, 14, 13, 12, 11, 10,
+        10, 10, 9, 9, 19, 20, 20, 20, 17, 16, 14, 12, 12, 11, 10, 9, 9, 9, 9, 8,
+        17, 18, 18, 18, 16, 15, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 14, 16, 16,
+        16, 14, 13, 12, 11, 10, 9, 8, 8, 8, 8, 8, 7, 13, 14, 14, 14, 13, 12, 11,
+        10, 9, 8, 8, 7, 7, 7, 7, 7, 12, 13, 13, 13, 12, 12, 10, 9, 9, 8, 7, 7,
+        7, 7, 6, 6, 11, 12, 12, 13, 12, 11, 10, 9, 8, 8, 7, 7, 6, 6, 6, 6, 10,
+        11, 12, 12, 11, 11, 10, 9, 8, 8, 7, 7, 6, 6, 6, 5, 10, 10, 11, 11, 11,
+        10, 9, 9, 8, 8, 7, 6, 6, 6, 5, 5, 9, 10, 10, 10, 10, 10, 9, 8, 8, 7, 7,
+        6, 6, 5, 5, 5,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 32, 32, 30, 28, 27, 26, 23, 22, 21, 19, 17, 17, 16,
+        14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 33, 32, 32, 32, 32,
+        32, 31, 30, 29, 28, 27, 24, 23, 22, 20, 18, 18, 17, 15, 14, 13, 13, 12,
+        12, 12, 11, 11, 11, 10, 10, 10, 9, 33, 32, 32, 32, 32, 32, 31, 31, 30,
+        28, 28, 25, 23, 22, 20, 19, 18, 17, 16, 15, 14, 13, 13, 12, 12, 12, 11,
+        11, 10, 10, 10, 9, 33, 32, 32, 32, 32, 31, 31, 30, 29, 28, 27, 25, 23,
+        23, 21, 19, 18, 17, 16, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 10, 10,
+        10, 33, 32, 32, 32, 31, 30, 30, 29, 28, 27, 26, 24, 23, 22, 20, 19, 18,
+        17, 16, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 32, 32, 32,
+        31, 30, 29, 28, 28, 27, 26, 26, 24, 23, 22, 21, 19, 19, 18, 16, 16, 15,
+        14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 32, 31, 31, 31, 30, 28, 28,
+        27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 14, 13, 13, 13,
+        12, 12, 12, 11, 11, 10, 10, 30, 30, 31, 30, 29, 28, 27, 26, 24, 23, 23,
+        22, 20, 20, 19, 18, 17, 16, 15, 14, 14, 13, 13, 12, 12, 12, 12, 12, 11,
+        11, 11, 10, 28, 29, 30, 29, 28, 27, 26, 24, 21, 20, 20, 19, 18, 18, 17,
+        16, 16, 15, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 10, 10, 27,
+        28, 28, 28, 27, 26, 25, 23, 20, 20, 20, 18, 18, 17, 16, 15, 15, 14, 13,
+        13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 10, 10, 10, 26, 27, 28, 27, 26,
+        26, 24, 23, 20, 20, 19, 18, 17, 17, 16, 15, 15, 14, 13, 13, 12, 12, 12,
+        11, 11, 11, 11, 10, 10, 10, 10, 10, 23, 24, 25, 25, 24, 24, 23, 22, 19,
+        18, 18, 16, 16, 15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 11, 10, 10,
+        10, 10, 10, 9, 9, 22, 23, 23, 23, 23, 23, 22, 20, 18, 18, 17, 16, 15,
+        14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9,
+        21, 22, 22, 23, 22, 22, 21, 20, 18, 17, 17, 15, 14, 14, 13, 13, 12, 12,
+        11, 11, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 19, 20, 20, 21, 20,
+        21, 20, 19, 17, 16, 16, 14, 14, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9,
+        9, 9, 9, 9, 9, 9, 8, 8, 8, 17, 18, 19, 19, 19, 19, 19, 18, 16, 15, 15,
+        14, 13, 13, 12, 11, 11, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8,
+        17, 18, 18, 18, 18, 19, 18, 17, 16, 15, 15, 13, 13, 12, 12, 11, 11, 10,
+        10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 16, 17, 17, 17, 17, 18, 17,
+        16, 15, 14, 14, 13, 12, 12, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 8, 8, 8,
+        8, 8, 8, 8, 7, 14, 15, 16, 16, 16, 16, 16, 15, 14, 13, 13, 12, 12, 11,
+        11, 10, 10, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 13, 14, 15, 15,
+        15, 16, 15, 14, 13, 13, 13, 12, 11, 11, 10, 10, 9, 9, 8, 8, 8, 8, 8, 8,
+        7, 7, 7, 7, 7, 7, 7, 7, 13, 13, 14, 14, 14, 15, 14, 14, 13, 12, 12, 11,
+        11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 12, 13,
+        13, 14, 14, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 8, 8, 8, 7, 7,
+        7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 12, 12, 13, 13, 13, 14, 13, 13, 12, 12,
+        12, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 6, 7, 6, 6, 6, 6, 6,
+        12, 12, 12, 13, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8,
+        8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 11, 12, 12, 12, 12, 13, 13, 12,
+        12, 11, 11, 11, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6,
+        6, 6, 11, 11, 12, 12, 12, 12, 12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 8,
+        8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 10, 11, 11, 12, 12, 12, 12,
+        12, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6,
+        5, 5, 5, 10, 11, 11, 11, 11, 11, 12, 12, 11, 11, 10, 10, 10, 9, 9, 8, 8,
+        8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 10, 10, 10, 11, 11, 11, 11,
+        11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5,
+        5, 5, 5, 9, 10, 10, 10, 11, 11, 11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8,
+        8, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 9, 10, 10, 10, 10, 10, 10,
+        11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5,
+        5, 5, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7,
+        7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5,
+        /* Size 4x8 */
+        32, 24, 15, 12, 31, 24, 16, 12, 28, 18, 13, 12, 22, 15, 11, 10, 17, 13,
+        9, 8, 14, 11, 8, 7, 12, 11, 8, 6, 10, 10, 8, 6,
+        /* Size 8x4 */
+        32, 31, 28, 22, 17, 14, 12, 10, 24, 24, 18, 15, 13, 11, 11, 10, 15, 16,
+        13, 11, 9, 8, 8, 8, 12, 12, 12, 10, 8, 7, 6, 6,
+        /* Size 8x16 */
+        32, 32, 28, 22, 16, 13, 11, 11, 33, 32, 29, 23, 17, 14, 12, 11, 32, 30,
+        28, 23, 17, 14, 13, 12, 32, 29, 26, 22, 17, 14, 13, 12, 28, 28, 21, 18,
+        15, 13, 12, 12, 26, 26, 20, 17, 14, 12, 11, 11, 22, 23, 18, 15, 12, 11,
+        10, 10, 19, 20, 17, 14, 11, 10, 9, 9, 17, 18, 16, 13, 10, 9, 9, 9, 14,
+        16, 14, 12, 9, 8, 8, 8, 13, 15, 13, 11, 9, 8, 7, 7, 12, 13, 12, 10, 8,
+        7, 7, 7, 11, 12, 12, 10, 8, 7, 7, 6, 10, 12, 11, 9, 8, 7, 6, 6, 10, 11,
+        11, 9, 8, 7, 6, 6, 9, 10, 10, 9, 8, 7, 6, 5,
+        /* Size 16x8 */
+        32, 33, 32, 32, 28, 26, 22, 19, 17, 14, 13, 12, 11, 10, 10, 9, 32, 32,
+        30, 29, 28, 26, 23, 20, 18, 16, 15, 13, 12, 12, 11, 10, 28, 29, 28, 26,
+        21, 20, 18, 17, 16, 14, 13, 12, 12, 11, 11, 10, 22, 23, 23, 22, 18, 17,
+        15, 14, 13, 12, 11, 10, 10, 9, 9, 9, 16, 17, 17, 17, 15, 14, 12, 11, 10,
+        9, 9, 8, 8, 8, 8, 8, 13, 14, 14, 14, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7,
+        7, 7, 11, 12, 13, 13, 12, 11, 10, 9, 9, 8, 7, 7, 7, 6, 6, 6, 11, 11, 12,
+        12, 12, 11, 10, 9, 9, 8, 7, 7, 6, 6, 6, 5,
+        /* Size 16x32 */
+        32, 33, 32, 32, 28, 23, 22, 19, 16, 14, 13, 12, 11, 11, 11, 10, 33, 32,
+        32, 31, 29, 24, 23, 20, 17, 15, 14, 12, 12, 12, 11, 11, 33, 32, 32, 31,
+        29, 25, 23, 21, 17, 15, 14, 13, 12, 12, 11, 11, 33, 32, 31, 31, 29, 25,
+        23, 21, 17, 16, 14, 13, 12, 12, 12, 11, 32, 32, 30, 30, 28, 24, 23, 20,
+        17, 16, 14, 13, 13, 12, 12, 11, 32, 31, 29, 28, 27, 24, 23, 21, 18, 16,
+        15, 13, 13, 12, 12, 12, 32, 31, 29, 28, 26, 23, 22, 20, 17, 16, 14, 13,
+        13, 13, 12, 12, 30, 30, 28, 27, 24, 21, 20, 19, 16, 15, 14, 13, 12, 13,
+        12, 12, 28, 30, 28, 26, 21, 19, 18, 17, 15, 14, 13, 12, 12, 12, 12, 12,
+        27, 28, 26, 25, 21, 18, 18, 16, 14, 13, 13, 12, 12, 12, 11, 11, 26, 28,
+        26, 24, 20, 18, 17, 16, 14, 13, 12, 11, 11, 11, 11, 11, 23, 25, 24, 23,
+        19, 16, 16, 14, 13, 12, 11, 11, 11, 11, 11, 10, 22, 23, 23, 22, 18, 16,
+        15, 14, 12, 11, 11, 10, 10, 10, 10, 10, 21, 22, 22, 21, 18, 15, 14, 13,
+        12, 11, 11, 10, 10, 10, 10, 10, 19, 21, 20, 20, 17, 14, 14, 12, 11, 10,
+        10, 9, 9, 10, 9, 10, 18, 19, 19, 19, 16, 14, 13, 12, 10, 10, 9, 9, 9, 9,
+        9, 9, 17, 18, 18, 18, 16, 13, 13, 12, 10, 10, 9, 9, 9, 9, 9, 9, 16, 17,
+        17, 17, 15, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 8, 14, 16, 16, 16, 14, 12,
+        12, 11, 9, 9, 8, 8, 8, 8, 8, 8, 13, 15, 15, 15, 13, 12, 11, 10, 9, 8, 8,
+        8, 8, 8, 8, 8, 13, 14, 15, 14, 13, 11, 11, 10, 9, 8, 8, 7, 7, 7, 7, 8,
+        12, 14, 14, 14, 13, 11, 11, 10, 8, 8, 8, 7, 7, 7, 7, 7, 12, 13, 13, 13,
+        12, 11, 10, 9, 8, 8, 7, 7, 7, 7, 7, 7, 12, 13, 13, 13, 12, 11, 10, 9, 8,
+        8, 7, 7, 7, 7, 7, 6, 11, 12, 12, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7, 6, 6,
+        6, 11, 12, 12, 12, 11, 11, 10, 9, 9, 8, 7, 7, 6, 6, 6, 6, 10, 12, 12,
+        12, 11, 11, 9, 9, 8, 8, 7, 6, 6, 6, 6, 6, 10, 11, 11, 12, 11, 10, 9, 9,
+        8, 8, 7, 6, 6, 6, 6, 6, 10, 11, 11, 11, 11, 10, 9, 9, 8, 8, 7, 7, 6, 6,
+        6, 6, 10, 10, 11, 11, 11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5, 5, 9, 10, 10,
+        11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 6, 5, 5, 9, 10, 10, 10, 10, 9, 9, 8, 8,
+        7, 7, 6, 6, 5, 5, 5,
+        /* Size 32x16 */
+        32, 33, 33, 33, 32, 32, 32, 30, 28, 27, 26, 23, 22, 21, 19, 18, 17, 16,
+        14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 33, 32, 32, 32,
+        32, 31, 31, 30, 30, 28, 28, 25, 23, 22, 21, 19, 18, 17, 16, 15, 14, 14,
+        13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 32, 32, 32, 31, 30, 29, 29, 28,
+        28, 26, 26, 24, 23, 22, 20, 19, 18, 17, 16, 15, 15, 14, 13, 13, 12, 12,
+        12, 11, 11, 11, 10, 10, 32, 31, 31, 31, 30, 28, 28, 27, 26, 25, 24, 23,
+        22, 21, 20, 19, 18, 17, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11,
+        11, 10, 28, 29, 29, 29, 28, 27, 26, 24, 21, 21, 20, 19, 18, 18, 17, 16,
+        16, 15, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 10, 10, 23, 24,
+        25, 25, 24, 24, 23, 21, 19, 18, 18, 16, 16, 15, 14, 14, 13, 13, 12, 12,
+        11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 9, 9, 22, 23, 23, 23, 23, 23,
+        22, 20, 18, 18, 17, 16, 15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10,
+        10, 10, 9, 9, 9, 9, 9, 9, 19, 20, 21, 21, 20, 21, 20, 19, 17, 16, 16,
+        14, 14, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 8,
+        8, 16, 17, 17, 17, 17, 18, 17, 16, 15, 14, 14, 13, 12, 12, 11, 10, 10,
+        10, 9, 9, 9, 8, 8, 8, 8, 9, 8, 8, 8, 8, 8, 8, 14, 15, 15, 16, 16, 16,
+        16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 8, 8, 8,
+        8, 8, 8, 8, 7, 7, 13, 14, 14, 14, 14, 15, 14, 14, 13, 13, 12, 11, 11,
+        11, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 12, 12, 13,
+        13, 13, 13, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7,
+        7, 7, 7, 6, 6, 7, 7, 6, 6, 11, 12, 12, 12, 13, 13, 13, 12, 12, 12, 11,
+        11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 11,
+        12, 12, 12, 12, 12, 13, 13, 12, 12, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8,
+        7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 5, 11, 11, 11, 12, 12, 12, 12, 12, 12,
+        11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 5, 5,
+        5, 10, 11, 11, 11, 11, 12, 12, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 8,
+        8, 8, 8, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5, 5,
+        /* Size 4x16 */
+        33, 23, 14, 11, 32, 25, 15, 12, 32, 24, 16, 12, 31, 23, 16, 13, 30, 19,
+        14, 12, 28, 18, 13, 11, 23, 16, 11, 10, 21, 14, 10, 10, 18, 13, 10, 9,
+        16, 12, 9, 8, 14, 11, 8, 7, 13, 11, 8, 7, 12, 11, 8, 6, 12, 11, 8, 6,
+        11, 10, 8, 6, 10, 9, 7, 6,
+        /* Size 16x4 */
+        33, 32, 32, 31, 30, 28, 23, 21, 18, 16, 14, 13, 12, 12, 11, 10, 23, 25,
+        24, 23, 19, 18, 16, 14, 13, 12, 11, 11, 11, 11, 10, 9, 14, 15, 16, 16,
+        14, 13, 11, 10, 10, 9, 8, 8, 8, 8, 8, 7, 11, 12, 12, 13, 12, 11, 10, 10,
+        9, 8, 7, 7, 6, 6, 6, 6,
+        /* Size 8x32 */
+        32, 32, 28, 22, 16, 13, 11, 11, 33, 32, 29, 23, 17, 14, 12, 11, 33, 32,
+        29, 23, 17, 14, 12, 11, 33, 31, 29, 23, 17, 14, 12, 12, 32, 30, 28, 23,
+        17, 14, 13, 12, 32, 29, 27, 23, 18, 15, 13, 12, 32, 29, 26, 22, 17, 14,
+        13, 12, 30, 28, 24, 20, 16, 14, 12, 12, 28, 28, 21, 18, 15, 13, 12, 12,
+        27, 26, 21, 18, 14, 13, 12, 11, 26, 26, 20, 17, 14, 12, 11, 11, 23, 24,
+        19, 16, 13, 11, 11, 11, 22, 23, 18, 15, 12, 11, 10, 10, 21, 22, 18, 14,
+        12, 11, 10, 10, 19, 20, 17, 14, 11, 10, 9, 9, 18, 19, 16, 13, 10, 9, 9,
+        9, 17, 18, 16, 13, 10, 9, 9, 9, 16, 17, 15, 12, 10, 9, 8, 8, 14, 16, 14,
+        12, 9, 8, 8, 8, 13, 15, 13, 11, 9, 8, 8, 8, 13, 15, 13, 11, 9, 8, 7, 7,
+        12, 14, 13, 11, 8, 8, 7, 7, 12, 13, 12, 10, 8, 7, 7, 7, 12, 13, 12, 10,
+        8, 7, 7, 7, 11, 12, 12, 10, 8, 7, 7, 6, 11, 12, 11, 10, 9, 7, 6, 6, 10,
+        12, 11, 9, 8, 7, 6, 6, 10, 11, 11, 9, 8, 7, 6, 6, 10, 11, 11, 9, 8, 7,
+        6, 6, 10, 11, 11, 9, 8, 7, 6, 5, 9, 10, 10, 9, 8, 7, 6, 5, 9, 10, 10, 9,
+        8, 7, 6, 5,
+        /* Size 32x8 */
+        32, 33, 33, 33, 32, 32, 32, 30, 28, 27, 26, 23, 22, 21, 19, 18, 17, 16,
+        14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 32, 32, 32, 31,
+        30, 29, 29, 28, 28, 26, 26, 24, 23, 22, 20, 19, 18, 17, 16, 15, 15, 14,
+        13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 28, 29, 29, 29, 28, 27, 26, 24,
+        21, 21, 20, 19, 18, 18, 17, 16, 16, 15, 14, 13, 13, 13, 12, 12, 12, 11,
+        11, 11, 11, 11, 10, 10, 22, 23, 23, 23, 23, 23, 22, 20, 18, 18, 17, 16,
+        15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9,
+        9, 16, 17, 17, 17, 17, 18, 17, 16, 15, 14, 14, 13, 12, 12, 11, 10, 10,
+        10, 9, 9, 9, 8, 8, 8, 8, 9, 8, 8, 8, 8, 8, 8, 13, 14, 14, 14, 14, 15,
+        14, 14, 13, 13, 12, 11, 11, 11, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7,
+        7, 7, 7, 7, 7, 11, 12, 12, 12, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10,
+        9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 11, 11, 11, 12,
+        12, 12, 12, 12, 12, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6,
+        6, 6, 6, 6, 5, 5, 5 },
+      { /* Chroma */
+        /* Size 4x4 */
+        31, 23, 18, 16, 23, 18, 16, 15, 18, 16, 12, 12, 16, 15, 12, 10,
+        /* Size 8x8 */
+        33, 27, 22, 21, 19, 17, 16, 15, 27, 22, 22, 22, 20, 19, 17, 16, 22, 22,
+        19, 19, 18, 16, 16, 16, 21, 22, 19, 17, 15, 14, 14, 14, 19, 20, 18, 15,
+        13, 12, 12, 12, 17, 19, 16, 14, 12, 11, 11, 11, 16, 17, 16, 14, 12, 11,
+        10, 10, 15, 16, 16, 14, 12, 11, 10, 9,
+        /* Size 16x16 */
+        32, 34, 31, 27, 21, 21, 20, 20, 19, 17, 16, 16, 15, 15, 14, 14, 34, 33,
+        29, 25, 22, 22, 22, 21, 20, 19, 18, 17, 16, 16, 15, 15, 31, 29, 26, 23,
+        22, 22, 22, 22, 20, 19, 18, 18, 17, 17, 16, 15, 27, 25, 23, 22, 21, 21,
+        22, 21, 20, 19, 19, 18, 18, 17, 17, 16, 21, 22, 22, 21, 19, 19, 19, 19,
+        18, 18, 17, 17, 17, 16, 16, 16, 21, 22, 22, 21, 19, 19, 18, 18, 17, 17,
+        16, 16, 15, 16, 15, 15, 20, 22, 22, 22, 19, 18, 17, 16, 16, 15, 15, 14,
+        14, 14, 14, 14, 20, 21, 22, 21, 19, 18, 16, 16, 15, 14, 14, 13, 14, 13,
+        13, 13, 19, 20, 20, 20, 18, 17, 16, 15, 14, 13, 13, 13, 13, 13, 13, 13,
+        17, 19, 19, 19, 18, 17, 15, 14, 13, 12, 12, 12, 12, 12, 12, 12, 16, 18,
+        18, 19, 17, 16, 15, 14, 13, 12, 12, 11, 11, 11, 12, 11, 16, 17, 18, 18,
+        17, 16, 14, 13, 13, 12, 11, 11, 11, 11, 11, 11, 15, 16, 17, 18, 17, 15,
+        14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 15, 16, 17, 17, 16, 16, 14, 13,
+        13, 12, 11, 11, 10, 10, 10, 10, 14, 15, 16, 17, 16, 15, 14, 13, 13, 12,
+        12, 11, 10, 10, 10, 9, 14, 15, 15, 16, 16, 15, 14, 13, 13, 12, 11, 11,
+        10, 10, 9, 9,
+        /* Size 32x32 */
+        32, 33, 34, 33, 31, 28, 27, 25, 21, 21, 21, 21, 20, 20, 20, 19, 19, 18,
+        17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 33, 33, 33, 32,
+        30, 27, 26, 24, 22, 22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 17, 17, 17,
+        16, 16, 16, 16, 15, 15, 15, 15, 15, 14, 34, 33, 33, 32, 29, 26, 25, 24,
+        22, 22, 22, 23, 22, 22, 21, 20, 20, 20, 19, 18, 18, 17, 17, 17, 16, 16,
+        16, 15, 15, 15, 15, 14, 33, 32, 32, 31, 28, 26, 25, 24, 22, 22, 23, 23,
+        22, 22, 22, 21, 20, 20, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16,
+        15, 15, 31, 30, 29, 28, 26, 24, 23, 23, 22, 22, 22, 23, 22, 22, 22, 21,
+        20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 15, 15, 28, 27,
+        26, 26, 24, 22, 22, 22, 21, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 19,
+        19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16, 27, 26, 25, 25, 23, 22,
+        22, 21, 21, 21, 21, 22, 22, 22, 21, 21, 20, 20, 19, 19, 19, 18, 18, 18,
+        18, 17, 17, 17, 17, 16, 16, 16, 25, 24, 24, 24, 23, 22, 21, 21, 20, 20,
+        21, 21, 20, 20, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 17,
+        17, 16, 16, 16, 21, 22, 22, 22, 22, 21, 21, 20, 19, 19, 19, 19, 19, 19,
+        19, 19, 18, 18, 18, 17, 17, 17, 17, 16, 17, 17, 16, 16, 16, 16, 16, 16,
+        21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18,
+        17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 21, 22, 22, 23,
+        22, 22, 21, 21, 19, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 16, 16, 16,
+        16, 16, 15, 16, 16, 15, 15, 15, 15, 15, 21, 22, 23, 23, 23, 23, 22, 21,
+        19, 19, 19, 18, 17, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15,
+        15, 15, 15, 15, 15, 14, 20, 21, 22, 22, 22, 22, 22, 20, 19, 19, 18, 17,
+        17, 17, 16, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14,
+        14, 14, 20, 21, 22, 22, 22, 22, 22, 20, 19, 19, 18, 17, 17, 17, 16, 16,
+        16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 20, 20,
+        21, 22, 22, 22, 21, 20, 19, 18, 18, 17, 16, 16, 16, 15, 15, 15, 14, 14,
+        14, 14, 13, 13, 14, 13, 13, 14, 13, 13, 13, 14, 19, 20, 20, 21, 21, 21,
+        21, 20, 19, 18, 18, 17, 16, 16, 15, 14, 14, 14, 14, 13, 13, 13, 13, 13,
+        13, 13, 13, 13, 13, 13, 13, 13, 19, 19, 20, 20, 20, 21, 20, 20, 18, 18,
+        17, 16, 16, 16, 15, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
+        13, 13, 13, 13, 18, 19, 20, 20, 20, 20, 20, 19, 18, 18, 17, 16, 16, 15,
+        15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
+        17, 18, 19, 19, 19, 20, 19, 19, 18, 17, 17, 16, 15, 15, 14, 14, 13, 13,
+        12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 17, 17, 18, 18,
+        19, 19, 19, 18, 17, 17, 16, 16, 15, 15, 14, 13, 13, 13, 12, 12, 12, 12,
+        11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 16, 17, 18, 18, 18, 19, 19, 18,
+        17, 17, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11,
+        11, 11, 12, 11, 11, 12, 16, 17, 17, 18, 18, 19, 18, 18, 17, 16, 16, 15,
+        14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
+        11, 11, 16, 16, 17, 17, 18, 18, 18, 17, 17, 16, 16, 15, 14, 14, 13, 13,
+        13, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 15, 16,
+        17, 17, 17, 18, 18, 17, 16, 16, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12,
+        11, 11, 11, 11, 11, 10, 10, 11, 11, 11, 11, 10, 15, 16, 16, 17, 17, 17,
+        18, 17, 17, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11,
+        10, 10, 10, 10, 10, 10, 10, 10, 15, 16, 16, 16, 17, 17, 17, 17, 17, 16,
+        16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10,
+        10, 10, 10, 10, 15, 15, 16, 16, 17, 17, 17, 17, 16, 16, 16, 15, 14, 14,
+        13, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10,
+        14, 15, 15, 16, 16, 16, 17, 17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 12,
+        12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 9, 14, 15, 15, 16,
+        16, 16, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11,
+        11, 11, 10, 10, 10, 10, 10, 10, 9, 9, 14, 15, 15, 16, 16, 16, 16, 16,
+        16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 11, 10, 10,
+        10, 10, 10, 9, 9, 9, 14, 15, 15, 15, 15, 16, 16, 16, 16, 15, 15, 15, 14,
+        14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9,
+        14, 14, 14, 15, 15, 16, 16, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12,
+        12, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9,
+        /* Size 4x8 */
+        33, 22, 18, 16, 26, 23, 20, 17, 22, 19, 17, 16, 22, 17, 15, 14, 20, 16,
+        13, 13, 17, 15, 12, 11, 16, 16, 12, 10, 16, 15, 12, 10,
+        /* Size 8x4 */
+        33, 26, 22, 22, 20, 17, 16, 16, 22, 23, 19, 17, 16, 15, 16, 15, 18, 20,
+        17, 15, 13, 12, 12, 12, 16, 17, 16, 14, 13, 11, 10, 10,
+        /* Size 8x16 */
+        32, 29, 21, 20, 18, 16, 15, 15, 34, 27, 22, 22, 20, 18, 16, 16, 31, 25,
+        22, 22, 20, 18, 17, 16, 26, 22, 21, 22, 20, 19, 18, 17, 21, 21, 19, 19,
+        18, 17, 17, 17, 21, 22, 19, 18, 17, 16, 16, 16, 20, 22, 19, 17, 16, 15,
+        14, 15, 20, 22, 19, 16, 14, 14, 14, 14, 19, 21, 18, 16, 14, 13, 13, 13,
+        17, 19, 18, 15, 13, 12, 12, 12, 16, 19, 17, 15, 12, 12, 11, 12, 16, 18,
+        17, 14, 12, 11, 11, 11, 15, 17, 16, 14, 13, 11, 11, 11, 15, 17, 16, 14,
+        13, 12, 10, 10, 14, 16, 16, 14, 12, 11, 10, 10, 14, 15, 16, 14, 13, 12,
+        10, 10,
+        /* Size 16x8 */
+        32, 34, 31, 26, 21, 21, 20, 20, 19, 17, 16, 16, 15, 15, 14, 14, 29, 27,
+        25, 22, 21, 22, 22, 22, 21, 19, 19, 18, 17, 17, 16, 15, 21, 22, 22, 21,
+        19, 19, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 20, 22, 22, 22, 19, 18,
+        17, 16, 16, 15, 15, 14, 14, 14, 14, 14, 18, 20, 20, 20, 18, 17, 16, 14,
+        14, 13, 12, 12, 13, 13, 12, 13, 16, 18, 18, 19, 17, 16, 15, 14, 13, 12,
+        12, 11, 11, 12, 11, 12, 15, 16, 17, 18, 17, 16, 14, 14, 13, 12, 11, 11,
+        11, 10, 10, 10, 15, 16, 16, 17, 17, 16, 15, 14, 13, 12, 12, 11, 11, 10,
+        10, 10,
+        /* Size 16x32 */
+        32, 33, 29, 27, 21, 21, 20, 20, 18, 17, 16, 15, 15, 15, 15, 14, 33, 33,
+        28, 26, 22, 22, 21, 20, 19, 18, 17, 16, 16, 16, 16, 15, 34, 32, 27, 26,
+        22, 23, 22, 21, 20, 19, 18, 17, 16, 16, 16, 15, 33, 31, 27, 25, 22, 23,
+        22, 21, 20, 19, 18, 17, 17, 17, 16, 16, 31, 28, 25, 23, 22, 22, 22, 22,
+        20, 19, 18, 17, 17, 17, 16, 16, 28, 26, 23, 22, 22, 23, 22, 22, 20, 20,
+        19, 18, 17, 17, 17, 17, 26, 25, 22, 22, 21, 22, 22, 21, 20, 19, 19, 18,
+        18, 17, 17, 17, 24, 24, 22, 21, 20, 21, 20, 20, 19, 18, 18, 17, 17, 17,
+        17, 17, 21, 22, 21, 21, 19, 19, 19, 19, 18, 17, 17, 16, 17, 17, 17, 17,
+        21, 22, 22, 21, 19, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 21, 22,
+        22, 21, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 16, 16, 21, 23, 23, 22,
+        19, 18, 17, 17, 16, 16, 15, 15, 15, 15, 16, 15, 20, 22, 22, 21, 19, 17,
+        17, 16, 16, 15, 15, 14, 14, 15, 15, 15, 20, 22, 22, 21, 19, 17, 17, 16,
+        15, 15, 14, 14, 14, 14, 15, 14, 20, 21, 22, 21, 19, 17, 16, 16, 14, 14,
+        14, 13, 14, 14, 14, 14, 19, 20, 21, 20, 19, 17, 16, 15, 14, 13, 13, 13,
+        13, 13, 14, 14, 19, 20, 21, 20, 18, 16, 16, 15, 14, 13, 13, 13, 13, 13,
+        13, 14, 18, 20, 20, 20, 18, 16, 16, 15, 13, 13, 12, 12, 12, 13, 13, 13,
+        17, 19, 19, 19, 18, 16, 15, 14, 13, 12, 12, 12, 12, 12, 12, 13, 17, 18,
+        19, 19, 17, 16, 15, 14, 13, 12, 12, 12, 12, 12, 12, 12, 16, 18, 19, 18,
+        17, 15, 15, 14, 12, 12, 12, 11, 11, 12, 12, 12, 16, 17, 18, 18, 17, 15,
+        14, 14, 12, 12, 11, 11, 11, 11, 12, 12, 16, 17, 18, 18, 17, 15, 14, 13,
+        12, 12, 11, 11, 11, 11, 11, 12, 15, 17, 17, 18, 16, 15, 14, 13, 12, 12,
+        11, 11, 11, 11, 11, 11, 15, 17, 17, 17, 16, 15, 14, 13, 13, 12, 11, 11,
+        11, 10, 11, 11, 15, 16, 17, 17, 16, 16, 14, 13, 13, 12, 11, 11, 10, 10,
+        10, 10, 15, 16, 17, 17, 16, 16, 14, 13, 13, 12, 12, 11, 10, 10, 10, 10,
+        14, 16, 16, 17, 16, 15, 14, 14, 12, 12, 11, 11, 10, 10, 10, 10, 14, 16,
+        16, 17, 16, 15, 14, 14, 12, 12, 11, 11, 10, 10, 10, 10, 14, 16, 16, 16,
+        16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 10, 10, 14, 15, 15, 16, 16, 15,
+        14, 13, 13, 12, 12, 11, 10, 10, 10, 10, 14, 15, 15, 16, 16, 14, 14, 13,
+        13, 12, 12, 11, 11, 10, 10, 9,
+        /* Size 32x16 */
+        32, 33, 34, 33, 31, 28, 26, 24, 21, 21, 21, 21, 20, 20, 20, 19, 19, 18,
+        17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 33, 33, 32, 31,
+        28, 26, 25, 24, 22, 22, 22, 23, 22, 22, 21, 20, 20, 20, 19, 18, 18, 17,
+        17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 29, 28, 27, 27, 25, 23, 22, 22,
+        21, 22, 22, 23, 22, 22, 22, 21, 21, 20, 19, 19, 19, 18, 18, 17, 17, 17,
+        17, 16, 16, 16, 15, 15, 27, 26, 26, 25, 23, 22, 22, 21, 21, 21, 21, 22,
+        21, 21, 21, 20, 20, 20, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 17, 16,
+        16, 16, 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 19, 19, 19, 19, 19, 19,
+        18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 16, 21, 22,
+        23, 23, 22, 23, 22, 21, 19, 19, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16,
+        15, 15, 15, 15, 15, 16, 16, 15, 15, 15, 15, 14, 20, 21, 22, 22, 22, 22,
+        22, 20, 19, 19, 18, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 14, 14, 14,
+        14, 14, 14, 14, 14, 14, 14, 14, 20, 20, 21, 21, 22, 22, 21, 20, 19, 18,
+        18, 17, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 13, 13, 13, 13, 13, 14,
+        14, 13, 13, 13, 18, 19, 20, 20, 20, 20, 20, 19, 18, 18, 17, 16, 16, 15,
+        14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 13, 13, 13, 12, 12, 13, 13, 13,
+        17, 18, 19, 19, 19, 20, 19, 18, 17, 17, 17, 16, 15, 15, 14, 13, 13, 13,
+        12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 16, 17, 18, 18,
+        18, 19, 19, 18, 17, 17, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11,
+        11, 11, 11, 11, 12, 11, 11, 11, 12, 12, 15, 16, 17, 17, 17, 18, 18, 17,
+        16, 16, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 11,
+        11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17, 18, 17, 17, 16, 16, 15,
+        14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10,
+        10, 11, 15, 16, 16, 17, 17, 17, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13,
+        13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 15, 16,
+        16, 16, 16, 17, 17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12,
+        12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 14, 15, 15, 16, 16, 17,
+        17, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11,
+        11, 10, 10, 10, 10, 10, 10, 9,
+        /* Size 4x16 */
+        33, 21, 17, 15, 32, 23, 19, 16, 28, 22, 19, 17, 25, 22, 19, 17, 22, 19,
+        17, 17, 22, 18, 17, 16, 22, 17, 15, 15, 21, 17, 14, 14, 20, 16, 13, 13,
+        19, 16, 12, 12, 18, 15, 12, 12, 17, 15, 12, 11, 17, 15, 12, 10, 16, 16,
+        12, 10, 16, 15, 12, 10, 15, 15, 12, 10,
+        /* Size 16x4 */
+        33, 32, 28, 25, 22, 22, 22, 21, 20, 19, 18, 17, 17, 16, 16, 15, 21, 23,
+        22, 22, 19, 18, 17, 17, 16, 16, 15, 15, 15, 16, 15, 15, 17, 19, 19, 19,
+        17, 17, 15, 14, 13, 12, 12, 12, 12, 12, 12, 12, 15, 16, 17, 17, 17, 16,
+        15, 14, 13, 12, 12, 11, 10, 10, 10, 10,
+        /* Size 8x32 */
+        32, 29, 21, 20, 18, 16, 15, 15, 33, 28, 22, 21, 19, 17, 16, 16, 34, 27,
+        22, 22, 20, 18, 16, 16, 33, 27, 22, 22, 20, 18, 17, 16, 31, 25, 22, 22,
+        20, 18, 17, 16, 28, 23, 22, 22, 20, 19, 17, 17, 26, 22, 21, 22, 20, 19,
+        18, 17, 24, 22, 20, 20, 19, 18, 17, 17, 21, 21, 19, 19, 18, 17, 17, 17,
+        21, 22, 19, 19, 18, 17, 16, 16, 21, 22, 19, 18, 17, 16, 16, 16, 21, 23,
+        19, 17, 16, 15, 15, 16, 20, 22, 19, 17, 16, 15, 14, 15, 20, 22, 19, 17,
+        15, 14, 14, 15, 20, 22, 19, 16, 14, 14, 14, 14, 19, 21, 19, 16, 14, 13,
+        13, 14, 19, 21, 18, 16, 14, 13, 13, 13, 18, 20, 18, 16, 13, 12, 12, 13,
+        17, 19, 18, 15, 13, 12, 12, 12, 17, 19, 17, 15, 13, 12, 12, 12, 16, 19,
+        17, 15, 12, 12, 11, 12, 16, 18, 17, 14, 12, 11, 11, 12, 16, 18, 17, 14,
+        12, 11, 11, 11, 15, 17, 16, 14, 12, 11, 11, 11, 15, 17, 16, 14, 13, 11,
+        11, 11, 15, 17, 16, 14, 13, 11, 10, 10, 15, 17, 16, 14, 13, 12, 10, 10,
+        14, 16, 16, 14, 12, 11, 10, 10, 14, 16, 16, 14, 12, 11, 10, 10, 14, 16,
+        16, 14, 13, 11, 10, 10, 14, 15, 16, 14, 13, 12, 10, 10, 14, 15, 16, 14,
+        13, 12, 11, 10,
+        /* Size 32x8 */
+        32, 33, 34, 33, 31, 28, 26, 24, 21, 21, 21, 21, 20, 20, 20, 19, 19, 18,
+        17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 29, 28, 27, 27,
+        25, 23, 22, 22, 21, 22, 22, 23, 22, 22, 22, 21, 21, 20, 19, 19, 19, 18,
+        18, 17, 17, 17, 17, 16, 16, 16, 15, 15, 21, 22, 22, 22, 22, 22, 21, 20,
+        19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16,
+        16, 16, 16, 16, 16, 16, 20, 21, 22, 22, 22, 22, 22, 20, 19, 19, 18, 17,
+        17, 17, 16, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14,
+        14, 14, 18, 19, 20, 20, 20, 20, 20, 19, 18, 18, 17, 16, 16, 15, 14, 14,
+        14, 13, 13, 13, 12, 12, 12, 12, 13, 13, 13, 12, 12, 13, 13, 13, 16, 17,
+        18, 18, 18, 19, 19, 18, 17, 17, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12,
+        12, 11, 11, 11, 11, 11, 12, 11, 11, 11, 12, 12, 15, 16, 16, 17, 17, 17,
+        18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11,
+        11, 10, 10, 10, 10, 10, 10, 11, 15, 16, 16, 16, 16, 17, 17, 17, 17, 16,
+        16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 10, 10,
+        10, 10, 10, 10 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 27, 16, 12, 27, 18, 13, 11, 16, 13, 9, 8, 12, 11, 8, 6,
+        /* Size 8x8 */
+        32, 32, 29, 22, 18, 13, 12, 11, 32, 30, 28, 23, 19, 15, 13, 11, 29, 28,
+        21, 18, 16, 13, 12, 11, 22, 23, 18, 15, 13, 11, 10, 10, 18, 19, 16, 13,
+        11, 9, 8, 8, 13, 15, 13, 11, 9, 8, 7, 7, 12, 13, 12, 10, 8, 7, 7, 6, 11,
+        11, 11, 10, 8, 7, 6, 6,
+        /* Size 16x16 */
+        32, 33, 33, 32, 30, 26, 23, 21, 18, 16, 14, 13, 12, 11, 10, 10, 33, 32,
+        32, 32, 30, 27, 25, 22, 19, 17, 16, 14, 13, 12, 11, 10, 33, 32, 31, 30,
+        28, 26, 24, 22, 19, 17, 16, 14, 13, 12, 12, 11, 32, 32, 30, 29, 28, 26,
+        24, 22, 20, 18, 16, 14, 14, 13, 12, 11, 30, 30, 28, 28, 24, 22, 20, 19,
+        17, 16, 15, 13, 12, 12, 12, 11, 26, 27, 26, 26, 22, 19, 18, 17, 15, 14,
+        13, 12, 11, 11, 11, 10, 23, 25, 24, 24, 20, 18, 16, 15, 14, 13, 12, 11,
+        11, 10, 10, 10, 21, 22, 22, 22, 19, 17, 15, 14, 13, 12, 11, 10, 10, 10,
+        9, 9, 18, 19, 19, 20, 17, 15, 14, 13, 11, 11, 10, 9, 9, 9, 9, 8, 16, 17,
+        17, 18, 16, 14, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 14, 16, 16, 16, 15,
+        13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 7, 13, 14, 14, 14, 13, 12, 11, 10, 9,
+        9, 8, 7, 7, 7, 7, 7, 12, 13, 13, 14, 12, 11, 11, 10, 9, 8, 8, 7, 7, 7,
+        6, 6, 11, 12, 12, 13, 12, 11, 10, 10, 9, 8, 8, 7, 7, 6, 6, 6, 10, 11,
+        12, 12, 12, 11, 10, 9, 9, 8, 8, 7, 6, 6, 6, 6, 10, 10, 11, 11, 11, 10,
+        10, 9, 8, 8, 7, 7, 6, 6, 6, 5,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 32, 32, 30, 30, 28, 26, 25, 23, 21, 21, 19, 18, 17,
+        16, 15, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 33, 32, 32, 32,
+        32, 32, 32, 30, 30, 29, 27, 26, 24, 22, 22, 20, 19, 18, 17, 16, 15, 13,
+        13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 33, 32, 32, 32, 32, 32, 32, 31,
+        30, 30, 27, 26, 25, 23, 22, 20, 19, 19, 17, 16, 16, 14, 14, 13, 13, 12,
+        12, 12, 11, 11, 10, 10, 33, 32, 32, 32, 32, 32, 32, 31, 30, 30, 28, 27,
+        25, 23, 23, 21, 19, 19, 17, 16, 16, 14, 14, 14, 13, 13, 12, 12, 12, 11,
+        11, 11, 33, 32, 32, 32, 31, 31, 30, 29, 28, 28, 26, 26, 24, 23, 22, 20,
+        19, 19, 17, 16, 16, 14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 32, 32,
+        32, 32, 31, 30, 30, 28, 28, 28, 26, 26, 24, 23, 22, 21, 19, 19, 18, 17,
+        16, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 32, 32, 32, 32, 30, 30,
+        29, 28, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 14,
+        14, 13, 13, 12, 12, 12, 11, 11, 30, 30, 31, 31, 29, 28, 28, 26, 25, 24,
+        23, 22, 22, 20, 20, 19, 18, 17, 16, 16, 15, 14, 14, 13, 13, 13, 12, 12,
+        12, 12, 11, 11, 30, 30, 30, 30, 28, 28, 28, 25, 24, 23, 22, 21, 20, 19,
+        19, 18, 17, 17, 16, 15, 15, 13, 13, 13, 12, 12, 12, 12, 12, 11, 11, 11,
+        28, 29, 30, 30, 28, 28, 27, 24, 23, 21, 20, 20, 19, 18, 18, 17, 16, 16,
+        15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 26, 27, 27, 28,
+        26, 26, 26, 23, 22, 20, 19, 19, 18, 17, 17, 16, 15, 15, 14, 13, 13, 12,
+        12, 12, 11, 12, 11, 11, 11, 11, 10, 10, 25, 26, 26, 27, 26, 26, 25, 22,
+        21, 20, 19, 18, 17, 17, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11,
+        11, 11, 11, 10, 10, 10, 23, 24, 25, 25, 24, 24, 24, 22, 20, 19, 18, 17,
+        16, 16, 15, 14, 14, 14, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10,
+        10, 10, 21, 22, 23, 23, 23, 23, 23, 20, 19, 18, 17, 17, 16, 15, 14, 13,
+        13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 9, 9, 21, 22,
+        22, 23, 22, 22, 22, 20, 19, 18, 17, 16, 15, 14, 14, 13, 13, 12, 12, 11,
+        11, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 19, 20, 20, 21, 20, 21, 21,
+        19, 18, 17, 16, 15, 14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 9,
+        9, 9, 9, 9, 9, 9, 9, 18, 19, 19, 19, 19, 19, 20, 18, 17, 16, 15, 15, 14,
+        13, 13, 12, 11, 11, 11, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 8, 8, 9, 17, 18,
+        19, 19, 19, 19, 19, 17, 17, 16, 15, 14, 14, 13, 12, 12, 11, 11, 10, 10,
+        10, 9, 9, 9, 9, 8, 9, 8, 8, 8, 8, 8, 16, 17, 17, 17, 17, 18, 18, 16, 16,
+        15, 14, 14, 13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8,
+        8, 8, 15, 16, 16, 16, 16, 17, 17, 16, 15, 14, 13, 13, 12, 12, 11, 11,
+        10, 10, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 14, 15, 16, 16, 16,
+        16, 16, 15, 15, 14, 13, 13, 12, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8,
+        8, 8, 7, 8, 7, 7, 7, 13, 13, 14, 14, 14, 15, 15, 14, 13, 13, 12, 12, 11,
+        11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 13, 13, 14,
+        14, 14, 14, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7,
+        7, 7, 7, 7, 7, 7, 7, 7, 7, 12, 13, 13, 14, 14, 14, 14, 13, 13, 13, 12,
+        12, 11, 10, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 12,
+        12, 13, 13, 13, 13, 14, 13, 12, 12, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8,
+        8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 11, 12, 12, 13, 13, 13, 13, 13, 12,
+        12, 12, 11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6,
+        6, 11, 12, 12, 12, 12, 12, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9,
+        8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 11, 11, 12, 12, 12, 12, 12,
+        12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6,
+        6, 6, 6, 6, 10, 11, 11, 12, 12, 12, 12, 12, 12, 11, 11, 11, 10, 10, 9,
+        9, 9, 8, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 10, 11, 11, 11, 11,
+        11, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6,
+        6, 6, 6, 6, 6, 5, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 9,
+        9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5, 10, 10, 10, 11,
+        11, 11, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 8, 8, 7, 7, 7, 7, 6, 6,
+        6, 6, 6, 6, 5, 5, 5,
+        /* Size 4x8 */
+        32, 27, 17, 12, 32, 26, 18, 13, 30, 20, 15, 12, 23, 17, 12, 10, 19, 15,
+        10, 9, 14, 12, 9, 8, 12, 12, 8, 7, 11, 10, 8, 6,
+        /* Size 8x4 */
+        32, 32, 30, 23, 19, 14, 12, 11, 27, 26, 20, 17, 15, 12, 12, 10, 17, 18,
+        15, 12, 10, 9, 8, 8, 12, 13, 12, 10, 9, 8, 7, 6,
+        /* Size 8x16 */
+        32, 32, 28, 23, 18, 13, 12, 11, 33, 32, 29, 25, 19, 14, 13, 12, 32, 31,
+        28, 24, 19, 14, 13, 12, 32, 30, 27, 24, 20, 15, 13, 12, 30, 28, 23, 20,
+        17, 14, 13, 12, 26, 26, 20, 18, 15, 12, 12, 11, 23, 24, 19, 16, 14, 11,
+        11, 11, 21, 22, 18, 15, 13, 11, 10, 10, 18, 19, 16, 14, 11, 9, 9, 9, 16,
+        17, 15, 13, 11, 9, 8, 8, 14, 16, 14, 12, 10, 8, 8, 8, 13, 14, 13, 11, 9,
+        8, 7, 7, 12, 13, 12, 11, 9, 7, 7, 7, 11, 12, 12, 10, 9, 8, 7, 6, 10, 12,
+        12, 10, 8, 7, 6, 6, 10, 11, 11, 10, 9, 7, 6, 6,
+        /* Size 16x8 */
+        32, 33, 32, 32, 30, 26, 23, 21, 18, 16, 14, 13, 12, 11, 10, 10, 32, 32,
+        31, 30, 28, 26, 24, 22, 19, 17, 16, 14, 13, 12, 12, 11, 28, 29, 28, 27,
+        23, 20, 19, 18, 16, 15, 14, 13, 12, 12, 12, 11, 23, 25, 24, 24, 20, 18,
+        16, 15, 14, 13, 12, 11, 11, 10, 10, 10, 18, 19, 19, 20, 17, 15, 14, 13,
+        11, 11, 10, 9, 9, 9, 8, 9, 13, 14, 14, 15, 14, 12, 11, 11, 9, 9, 8, 8,
+        7, 8, 7, 7, 12, 13, 13, 13, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7, 6, 6, 11,
+        12, 12, 12, 12, 11, 11, 10, 9, 8, 8, 7, 7, 6, 6, 6,
+        /* Size 16x32 */
+        32, 33, 32, 32, 28, 26, 23, 19, 18, 16, 13, 13, 12, 11, 11, 11, 33, 32,
+        32, 32, 29, 27, 24, 20, 19, 17, 14, 13, 12, 12, 12, 11, 33, 32, 32, 32,
+        29, 27, 25, 20, 19, 17, 14, 14, 13, 12, 12, 11, 33, 32, 32, 31, 30, 28,
+        25, 21, 19, 17, 14, 14, 13, 12, 12, 12, 32, 32, 31, 30, 28, 26, 24, 20,
+        19, 17, 14, 14, 13, 13, 12, 12, 32, 32, 30, 30, 28, 26, 24, 21, 19, 18,
+        15, 14, 13, 13, 12, 12, 32, 31, 30, 29, 27, 26, 24, 21, 20, 18, 15, 15,
+        13, 13, 12, 12, 30, 30, 29, 28, 24, 23, 21, 19, 18, 16, 14, 14, 13, 13,
+        13, 12, 30, 30, 28, 28, 23, 22, 20, 18, 17, 16, 14, 13, 13, 12, 12, 12,
+        28, 30, 28, 27, 21, 20, 19, 17, 16, 15, 13, 13, 12, 12, 12, 12, 26, 28,
+        26, 26, 20, 19, 18, 16, 15, 14, 12, 12, 12, 12, 11, 12, 26, 27, 26, 25,
+        20, 19, 17, 15, 15, 14, 12, 12, 11, 11, 11, 11, 23, 25, 24, 24, 19, 18,
+        16, 14, 14, 13, 11, 11, 11, 11, 11, 11, 22, 23, 23, 22, 18, 17, 16, 14,
+        13, 12, 11, 11, 10, 10, 10, 10, 21, 22, 22, 22, 18, 17, 15, 13, 13, 12,
+        11, 10, 10, 10, 10, 10, 19, 21, 20, 20, 17, 16, 14, 12, 12, 11, 10, 10,
+        9, 9, 10, 9, 18, 19, 19, 19, 16, 15, 14, 12, 11, 11, 9, 9, 9, 9, 9, 9,
+        17, 19, 19, 19, 16, 15, 14, 12, 11, 10, 9, 9, 9, 9, 9, 9, 16, 17, 17,
+        18, 15, 14, 13, 11, 11, 10, 9, 9, 8, 8, 8, 9, 15, 16, 17, 17, 14, 13,
+        12, 11, 10, 9, 8, 8, 8, 8, 8, 8, 14, 16, 16, 16, 14, 13, 12, 11, 10, 9,
+        8, 8, 8, 8, 8, 8, 13, 14, 14, 15, 13, 12, 11, 10, 9, 9, 8, 8, 7, 8, 8,
+        7, 13, 14, 14, 14, 13, 12, 11, 10, 9, 9, 8, 7, 7, 7, 7, 7, 12, 14, 14,
+        14, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7, 7, 7, 12, 13, 13, 13, 12, 11, 11,
+        9, 9, 8, 7, 7, 7, 7, 7, 7, 11, 12, 13, 13, 12, 12, 10, 9, 9, 8, 8, 7, 7,
+        7, 6, 6, 11, 12, 12, 13, 12, 11, 10, 10, 9, 8, 8, 7, 7, 6, 6, 6, 11, 12,
+        12, 12, 12, 11, 10, 10, 9, 8, 7, 7, 7, 6, 6, 6, 10, 12, 12, 12, 12, 11,
+        10, 9, 8, 8, 7, 7, 6, 6, 6, 6, 10, 11, 11, 12, 11, 10, 10, 9, 9, 8, 7,
+        7, 6, 6, 6, 6, 10, 11, 11, 11, 11, 10, 10, 9, 9, 8, 7, 7, 6, 6, 6, 6,
+        10, 11, 11, 11, 11, 10, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5,
+        /* Size 32x16 */
+        32, 33, 33, 33, 32, 32, 32, 30, 30, 28, 26, 26, 23, 22, 21, 19, 18, 17,
+        16, 15, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 33, 32, 32, 32,
+        32, 32, 31, 30, 30, 30, 28, 27, 25, 23, 22, 21, 19, 19, 17, 16, 16, 14,
+        14, 14, 13, 12, 12, 12, 12, 11, 11, 11, 32, 32, 32, 32, 31, 30, 30, 29,
+        28, 28, 26, 26, 24, 23, 22, 20, 19, 19, 17, 17, 16, 14, 14, 14, 13, 13,
+        12, 12, 12, 11, 11, 11, 32, 32, 32, 31, 30, 30, 29, 28, 28, 27, 26, 25,
+        24, 22, 22, 20, 19, 19, 18, 17, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12,
+        11, 11, 28, 29, 29, 30, 28, 28, 27, 24, 23, 21, 20, 20, 19, 18, 18, 17,
+        16, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 12, 11, 11, 11, 26, 27,
+        27, 28, 26, 26, 26, 23, 22, 20, 19, 19, 18, 17, 17, 16, 15, 15, 14, 13,
+        13, 12, 12, 12, 11, 12, 11, 11, 11, 10, 10, 10, 23, 24, 25, 25, 24, 24,
+        24, 21, 20, 19, 18, 17, 16, 16, 15, 14, 14, 14, 13, 12, 12, 11, 11, 11,
+        11, 10, 10, 10, 10, 10, 10, 10, 19, 20, 20, 21, 20, 21, 21, 19, 18, 17,
+        16, 15, 14, 14, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 10, 10, 9,
+        9, 9, 9, 18, 19, 19, 19, 19, 19, 20, 18, 17, 16, 15, 15, 14, 13, 13, 12,
+        11, 11, 11, 10, 10, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9, 9, 16, 17, 17, 17, 17,
+        18, 18, 16, 16, 15, 14, 14, 13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 9, 8,
+        8, 8, 8, 8, 8, 8, 8, 8, 13, 14, 14, 14, 14, 15, 15, 14, 14, 13, 12, 12,
+        11, 11, 11, 10, 9, 9, 9, 8, 8, 8, 8, 8, 7, 8, 8, 7, 7, 7, 7, 8, 13, 13,
+        14, 14, 14, 14, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8,
+        8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 12, 12, 13, 13, 13, 13, 13, 13, 13, 12,
+        12, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 7,
+        11, 12, 12, 12, 13, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10, 9, 9, 9, 8,
+        8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 11, 12, 12, 12, 12, 12, 12, 13,
+        12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6,
+        6, 6, 6, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 10, 10, 9,
+        9, 9, 9, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5,
+        /* Size 4x16 */
+        33, 26, 16, 11, 32, 27, 17, 12, 32, 26, 17, 13, 31, 26, 18, 13, 30, 22,
+        16, 12, 28, 19, 14, 12, 25, 18, 13, 11, 22, 17, 12, 10, 19, 15, 11, 9,
+        17, 14, 10, 8, 16, 13, 9, 8, 14, 12, 9, 7, 13, 11, 8, 7, 12, 11, 8, 6,
+        12, 11, 8, 6, 11, 10, 8, 6,
+        /* Size 16x4 */
+        33, 32, 32, 31, 30, 28, 25, 22, 19, 17, 16, 14, 13, 12, 12, 11, 26, 27,
+        26, 26, 22, 19, 18, 17, 15, 14, 13, 12, 11, 11, 11, 10, 16, 17, 17, 18,
+        16, 14, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 11, 12, 13, 13, 12, 12, 11,
+        10, 9, 8, 8, 7, 7, 6, 6, 6,
+        /* Size 8x32 */
+        32, 32, 28, 23, 18, 13, 12, 11, 33, 32, 29, 24, 19, 14, 12, 12, 33, 32,
+        29, 25, 19, 14, 13, 12, 33, 32, 30, 25, 19, 14, 13, 12, 32, 31, 28, 24,
+        19, 14, 13, 12, 32, 30, 28, 24, 19, 15, 13, 12, 32, 30, 27, 24, 20, 15,
+        13, 12, 30, 29, 24, 21, 18, 14, 13, 13, 30, 28, 23, 20, 17, 14, 13, 12,
+        28, 28, 21, 19, 16, 13, 12, 12, 26, 26, 20, 18, 15, 12, 12, 11, 26, 26,
+        20, 17, 15, 12, 11, 11, 23, 24, 19, 16, 14, 11, 11, 11, 22, 23, 18, 16,
+        13, 11, 10, 10, 21, 22, 18, 15, 13, 11, 10, 10, 19, 20, 17, 14, 12, 10,
+        9, 10, 18, 19, 16, 14, 11, 9, 9, 9, 17, 19, 16, 14, 11, 9, 9, 9, 16, 17,
+        15, 13, 11, 9, 8, 8, 15, 17, 14, 12, 10, 8, 8, 8, 14, 16, 14, 12, 10, 8,
+        8, 8, 13, 14, 13, 11, 9, 8, 7, 8, 13, 14, 13, 11, 9, 8, 7, 7, 12, 14,
+        13, 11, 9, 8, 7, 7, 12, 13, 12, 11, 9, 7, 7, 7, 11, 13, 12, 10, 9, 8, 7,
+        6, 11, 12, 12, 10, 9, 8, 7, 6, 11, 12, 12, 10, 9, 7, 7, 6, 10, 12, 12,
+        10, 8, 7, 6, 6, 10, 11, 11, 10, 9, 7, 6, 6, 10, 11, 11, 10, 9, 7, 6, 6,
+        10, 11, 11, 10, 9, 8, 7, 6,
+        /* Size 32x8 */
+        32, 33, 33, 33, 32, 32, 32, 30, 30, 28, 26, 26, 23, 22, 21, 19, 18, 17,
+        16, 15, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 32, 32, 32, 32,
+        31, 30, 30, 29, 28, 28, 26, 26, 24, 23, 22, 20, 19, 19, 17, 17, 16, 14,
+        14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 28, 29, 29, 30, 28, 28, 27, 24,
+        23, 21, 20, 20, 19, 18, 18, 17, 16, 16, 15, 14, 14, 13, 13, 13, 12, 12,
+        12, 12, 12, 11, 11, 11, 23, 24, 25, 25, 24, 24, 24, 21, 20, 19, 18, 17,
+        16, 16, 15, 14, 14, 14, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10,
+        10, 10, 18, 19, 19, 19, 19, 19, 20, 18, 17, 16, 15, 15, 14, 13, 13, 12,
+        11, 11, 11, 10, 10, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9, 9, 13, 14, 14, 14, 14,
+        15, 15, 14, 14, 13, 12, 12, 11, 11, 11, 10, 9, 9, 9, 8, 8, 8, 8, 8, 7,
+        8, 8, 7, 7, 7, 7, 8, 12, 12, 13, 13, 13, 13, 13, 13, 13, 12, 12, 11, 11,
+        10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 7, 11, 12, 12,
+        12, 12, 12, 12, 13, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7,
+        7, 7, 6, 6, 6, 6, 6, 6, 6 },
+      { /* Chroma */
+        /* Size 4x4 */
+        32, 23, 19, 16, 23, 19, 17, 15, 19, 17, 13, 12, 16, 15, 12, 10,
+        /* Size 8x8 */
+        33, 28, 22, 21, 20, 17, 16, 15, 28, 24, 22, 22, 21, 19, 17, 16, 22, 22,
+        19, 19, 19, 17, 16, 16, 21, 22, 19, 17, 16, 15, 14, 14, 20, 21, 19, 16,
+        14, 13, 13, 13, 17, 19, 17, 15, 13, 12, 12, 12, 16, 17, 16, 14, 13, 12,
+        11, 10, 15, 16, 16, 14, 13, 12, 10, 10,
+        /* Size 16x16 */
+        32, 34, 31, 28, 23, 21, 21, 20, 19, 18, 17, 16, 15, 15, 15, 14, 34, 33,
+        29, 26, 23, 22, 22, 22, 20, 19, 19, 17, 17, 16, 16, 15, 31, 29, 26, 24,
+        22, 22, 23, 22, 21, 20, 19, 18, 17, 17, 16, 16, 28, 26, 24, 22, 22, 22,
+        23, 22, 21, 20, 20, 19, 18, 18, 17, 16, 23, 23, 22, 22, 20, 20, 20, 20,
+        19, 19, 18, 17, 17, 17, 16, 17, 21, 22, 22, 22, 20, 19, 19, 18, 18, 17,
+        17, 16, 16, 16, 16, 16, 21, 22, 23, 23, 20, 19, 18, 17, 17, 16, 16, 15,
+        15, 15, 15, 15, 20, 22, 22, 22, 20, 18, 17, 17, 16, 15, 15, 14, 14, 14,
+        14, 14, 19, 20, 21, 21, 19, 18, 17, 16, 15, 14, 14, 13, 13, 13, 13, 13,
+        18, 19, 20, 20, 19, 17, 16, 15, 14, 13, 13, 12, 12, 12, 12, 12, 17, 19,
+        19, 20, 18, 17, 16, 15, 14, 13, 12, 12, 12, 12, 12, 12, 16, 17, 18, 19,
+        17, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 11, 15, 17, 17, 18, 17, 16,
+        15, 14, 13, 12, 12, 11, 11, 11, 11, 11, 15, 16, 17, 18, 17, 16, 15, 14,
+        13, 12, 12, 11, 11, 10, 10, 10, 15, 16, 16, 17, 16, 16, 15, 14, 13, 12,
+        12, 11, 11, 10, 10, 10, 14, 15, 16, 16, 17, 16, 15, 14, 13, 12, 12, 11,
+        11, 10, 10, 10,
+        /* Size 32x32 */
+        32, 33, 34, 34, 31, 29, 28, 25, 23, 21, 21, 21, 21, 20, 20, 20, 19, 19,
+        18, 17, 17, 16, 16, 16, 15, 15, 15, 15, 15, 14, 14, 14, 33, 33, 33, 33,
+        30, 28, 27, 24, 23, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 18, 18, 17,
+        17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 34, 33, 33, 33, 29, 28, 26, 24,
+        23, 22, 22, 22, 22, 22, 22, 21, 20, 20, 19, 19, 19, 18, 17, 17, 17, 16,
+        16, 16, 16, 15, 15, 15, 34, 33, 33, 32, 29, 28, 26, 24, 23, 22, 23, 23,
+        23, 22, 22, 22, 21, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16,
+        16, 16, 31, 30, 29, 29, 26, 25, 24, 23, 22, 22, 22, 22, 23, 22, 22, 22,
+        21, 21, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 29, 28,
+        28, 28, 25, 24, 23, 22, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20,
+        19, 19, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 28, 27, 26, 26, 24, 23,
+        22, 22, 22, 21, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 19,
+        18, 18, 18, 17, 17, 17, 16, 16, 25, 24, 24, 24, 23, 22, 22, 21, 21, 20,
+        21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 18, 18, 18, 17, 17, 17, 17,
+        17, 17, 17, 17, 23, 23, 23, 23, 22, 22, 22, 21, 20, 20, 20, 20, 20, 20,
+        20, 20, 19, 19, 19, 18, 18, 17, 17, 17, 17, 17, 17, 17, 16, 17, 17, 17,
+        21, 22, 22, 22, 22, 22, 21, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        18, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 21, 22, 22, 23,
+        22, 22, 22, 21, 20, 19, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 16,
+        16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 21, 22, 22, 23, 22, 22, 22, 21,
+        20, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 16, 15,
+        15, 15, 15, 15, 15, 15, 21, 22, 22, 23, 23, 23, 23, 21, 20, 19, 19, 18,
+        18, 17, 17, 17, 17, 17, 16, 16, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15,
+        15, 15, 20, 21, 22, 22, 22, 22, 22, 20, 20, 19, 18, 18, 17, 17, 17, 16,
+        16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 20, 21,
+        22, 22, 22, 22, 22, 20, 20, 19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15,
+        15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 20, 20, 21, 22, 22, 22,
+        22, 20, 20, 19, 18, 18, 17, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14,
+        13, 13, 14, 13, 13, 14, 14, 13, 19, 20, 20, 21, 21, 21, 21, 20, 19, 19,
+        18, 17, 17, 16, 16, 15, 15, 15, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13,
+        13, 13, 13, 13, 19, 20, 20, 20, 21, 21, 21, 20, 19, 19, 17, 17, 17, 16,
+        16, 15, 15, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
+        18, 19, 19, 20, 20, 20, 20, 19, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14,
+        13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 17, 18, 19, 19,
+        19, 20, 20, 19, 18, 18, 17, 17, 16, 15, 15, 14, 14, 14, 13, 13, 13, 12,
+        12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 17, 18, 19, 19, 19, 19, 20, 19,
+        18, 18, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 12, 12,
+        12, 12, 12, 12, 12, 12, 16, 17, 18, 18, 18, 19, 19, 18, 17, 17, 16, 16,
+        15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 12, 11, 11, 12, 11, 12, 11, 12,
+        12, 12, 16, 17, 17, 18, 18, 18, 19, 18, 17, 17, 16, 16, 15, 14, 14, 14,
+        13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 16, 17,
+        17, 18, 18, 18, 19, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12,
+        12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 15, 16, 17, 17, 17, 18,
+        18, 17, 17, 16, 16, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11,
+        11, 11, 11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17, 18, 17, 17, 16,
+        16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 10,
+        10, 10, 11, 10, 15, 16, 16, 17, 17, 17, 18, 17, 17, 16, 16, 15, 15, 14,
+        14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10,
+        15, 16, 16, 16, 17, 17, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13,
+        12, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 15, 15, 16, 16,
+        16, 17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11,
+        11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 14, 15, 15, 16, 16, 16, 17, 17,
+        17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10,
+        10, 10, 10, 10, 10, 10, 14, 15, 15, 16, 16, 16, 16, 17, 17, 16, 16, 15,
+        15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10,
+        10, 10, 14, 15, 15, 16, 16, 16, 16, 17, 17, 16, 16, 15, 15, 14, 14, 13,
+        13, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 9,
+        /* Size 4x8 */
+        33, 22, 19, 16, 27, 22, 20, 17, 22, 19, 18, 17, 22, 18, 16, 14, 20, 17,
+        14, 13, 18, 16, 12, 12, 17, 16, 12, 11, 16, 15, 12, 10,
+        /* Size 8x4 */
+        33, 27, 22, 22, 20, 18, 17, 16, 22, 22, 19, 18, 17, 16, 16, 15, 19, 20,
+        18, 16, 14, 12, 12, 12, 16, 17, 17, 14, 13, 12, 11, 10,
+        /* Size 8x16 */
+        32, 30, 21, 21, 19, 16, 15, 15, 33, 28, 22, 22, 20, 18, 17, 16, 31, 26,
+        22, 22, 21, 18, 17, 17, 28, 23, 22, 23, 21, 19, 18, 17, 23, 22, 20, 20,
+        19, 17, 17, 17, 21, 22, 19, 18, 18, 16, 16, 16, 21, 23, 19, 18, 17, 15,
+        15, 15, 20, 22, 19, 17, 16, 14, 14, 14, 19, 21, 19, 17, 15, 13, 13, 13,
+        18, 20, 18, 16, 14, 12, 12, 13, 17, 19, 18, 16, 14, 12, 12, 12, 16, 18,
+        17, 15, 13, 12, 11, 12, 16, 17, 16, 15, 13, 11, 11, 11, 15, 17, 16, 14,
+        13, 12, 11, 10, 15, 16, 16, 15, 13, 12, 11, 10, 14, 16, 16, 15, 13, 12,
+        11, 10,
+        /* Size 16x8 */
+        32, 33, 31, 28, 23, 21, 21, 20, 19, 18, 17, 16, 16, 15, 15, 14, 30, 28,
+        26, 23, 22, 22, 23, 22, 21, 20, 19, 18, 17, 17, 16, 16, 21, 22, 22, 22,
+        20, 19, 19, 19, 19, 18, 18, 17, 16, 16, 16, 16, 21, 22, 22, 23, 20, 18,
+        18, 17, 17, 16, 16, 15, 15, 14, 15, 15, 19, 20, 21, 21, 19, 18, 17, 16,
+        15, 14, 14, 13, 13, 13, 13, 13, 16, 18, 18, 19, 17, 16, 15, 14, 13, 12,
+        12, 12, 11, 12, 12, 12, 15, 17, 17, 18, 17, 16, 15, 14, 13, 12, 12, 11,
+        11, 11, 11, 11, 15, 16, 17, 17, 17, 16, 15, 14, 13, 13, 12, 12, 11, 10,
+        10, 10,
+        /* Size 16x32 */
+        32, 33, 30, 28, 21, 21, 21, 20, 19, 18, 16, 16, 15, 15, 15, 15, 33, 33,
+        29, 27, 22, 22, 22, 20, 20, 19, 17, 17, 16, 16, 16, 16, 33, 32, 28, 26,
+        22, 22, 22, 21, 20, 19, 18, 17, 17, 16, 16, 16, 34, 32, 28, 26, 22, 23,
+        23, 21, 21, 20, 18, 18, 17, 17, 17, 16, 31, 28, 26, 24, 22, 22, 22, 22,
+        21, 20, 18, 18, 17, 17, 17, 16, 29, 27, 24, 23, 22, 22, 23, 22, 21, 20,
+        19, 18, 18, 17, 17, 17, 28, 26, 23, 22, 22, 22, 23, 22, 21, 20, 19, 19,
+        18, 18, 17, 17, 24, 24, 23, 22, 20, 20, 21, 20, 20, 19, 18, 18, 17, 18,
+        17, 17, 23, 23, 22, 22, 20, 20, 20, 20, 19, 19, 17, 17, 17, 17, 17, 17,
+        21, 22, 22, 21, 19, 19, 19, 19, 19, 18, 17, 17, 16, 17, 17, 16, 21, 22,
+        22, 22, 19, 19, 18, 18, 18, 17, 16, 16, 16, 16, 16, 16, 21, 23, 22, 22,
+        19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 16, 21, 23, 23, 22, 19, 18,
+        18, 17, 17, 16, 15, 15, 15, 15, 15, 16, 20, 22, 22, 22, 19, 18, 17, 16,
+        16, 16, 15, 14, 15, 14, 15, 15, 20, 22, 22, 22, 19, 18, 17, 16, 16, 15,
+        14, 14, 14, 14, 14, 15, 20, 21, 22, 22, 19, 18, 17, 16, 15, 14, 14, 14,
+        13, 14, 14, 14, 19, 21, 21, 21, 19, 18, 17, 15, 15, 14, 13, 13, 13, 13,
+        13, 14, 19, 20, 21, 21, 19, 17, 17, 15, 15, 14, 13, 13, 13, 13, 13, 13,
+        18, 20, 20, 20, 18, 17, 16, 15, 14, 13, 12, 12, 12, 12, 13, 13, 17, 19,
+        20, 20, 18, 17, 16, 14, 14, 13, 12, 12, 12, 12, 12, 12, 17, 19, 19, 20,
+        18, 17, 16, 14, 14, 13, 12, 12, 12, 12, 12, 12, 16, 18, 18, 19, 17, 16,
+        15, 14, 13, 12, 12, 11, 11, 12, 12, 12, 16, 18, 18, 19, 17, 16, 15, 14,
+        13, 12, 12, 11, 11, 11, 12, 12, 16, 17, 18, 18, 17, 16, 15, 14, 13, 12,
+        11, 11, 11, 11, 11, 11, 16, 17, 17, 18, 16, 16, 15, 13, 13, 12, 11, 11,
+        11, 11, 11, 11, 15, 17, 17, 18, 16, 16, 15, 14, 13, 12, 12, 11, 11, 11,
+        11, 11, 15, 17, 17, 17, 16, 16, 14, 14, 13, 12, 12, 11, 11, 11, 10, 11,
+        15, 16, 17, 17, 16, 16, 14, 14, 13, 12, 12, 11, 11, 10, 10, 10, 15, 16,
+        16, 17, 16, 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 10, 14, 16, 16, 17,
+        16, 15, 15, 14, 13, 12, 12, 11, 11, 10, 10, 10, 14, 16, 16, 17, 16, 15,
+        15, 14, 13, 12, 12, 11, 11, 10, 10, 10, 14, 16, 16, 16, 16, 15, 15, 13,
+        13, 12, 12, 11, 11, 10, 10, 10,
+        /* Size 32x16 */
+        32, 33, 33, 34, 31, 29, 28, 24, 23, 21, 21, 21, 21, 20, 20, 20, 19, 19,
+        18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 33, 33, 32, 32,
+        28, 27, 26, 24, 23, 22, 22, 23, 23, 22, 22, 21, 21, 20, 20, 19, 19, 18,
+        18, 17, 17, 17, 17, 16, 16, 16, 16, 16, 30, 29, 28, 28, 26, 24, 23, 23,
+        22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 19, 18, 18, 18, 17, 17,
+        17, 17, 16, 16, 16, 16, 28, 27, 26, 26, 24, 23, 22, 22, 22, 21, 22, 22,
+        22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17,
+        17, 16, 21, 22, 22, 22, 22, 22, 22, 20, 20, 19, 19, 19, 19, 19, 19, 19,
+        19, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 21, 22,
+        22, 23, 22, 22, 22, 20, 20, 19, 19, 19, 18, 18, 18, 18, 18, 17, 17, 17,
+        17, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 21, 22, 22, 23, 22, 23,
+        23, 21, 20, 19, 18, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 15, 15, 15,
+        15, 15, 14, 14, 15, 15, 15, 15, 20, 20, 21, 21, 22, 22, 22, 20, 20, 19,
+        18, 18, 17, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 13, 14, 14, 14,
+        14, 14, 14, 13, 19, 20, 20, 21, 21, 21, 21, 20, 19, 19, 18, 17, 17, 16,
+        16, 15, 15, 15, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13,
+        18, 19, 19, 20, 20, 20, 20, 19, 19, 18, 17, 17, 16, 16, 15, 14, 14, 14,
+        13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 13, 12, 12, 12, 16, 17, 18, 18,
+        18, 19, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12,
+        12, 11, 11, 12, 12, 12, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18, 19, 18,
+        17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11,
+        11, 11, 11, 11, 11, 11, 15, 16, 17, 17, 17, 18, 18, 17, 17, 16, 16, 16,
+        15, 15, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11,
+        11, 11, 15, 16, 16, 17, 17, 17, 18, 18, 17, 17, 16, 16, 15, 14, 14, 14,
+        13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 15, 16,
+        16, 17, 17, 17, 17, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12,
+        12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 15, 16, 16, 16, 16, 17,
+        17, 17, 17, 16, 16, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11,
+        11, 11, 11, 10, 10, 10, 10, 10,
+        /* Size 4x16 */
+        33, 21, 18, 15, 32, 22, 19, 16, 28, 22, 20, 17, 26, 22, 20, 18, 23, 20,
+        19, 17, 22, 19, 17, 16, 23, 18, 16, 15, 22, 18, 15, 14, 21, 18, 14, 13,
+        20, 17, 13, 12, 19, 17, 13, 12, 18, 16, 12, 11, 17, 16, 12, 11, 17, 16,
+        12, 11, 16, 16, 13, 10, 16, 15, 12, 10,
+        /* Size 16x4 */
+        33, 32, 28, 26, 23, 22, 23, 22, 21, 20, 19, 18, 17, 17, 16, 16, 21, 22,
+        22, 22, 20, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 15, 18, 19, 20, 20,
+        19, 17, 16, 15, 14, 13, 13, 12, 12, 12, 13, 12, 15, 16, 17, 18, 17, 16,
+        15, 14, 13, 12, 12, 11, 11, 11, 10, 10,
+        /* Size 8x32 */
+        32, 30, 21, 21, 19, 16, 15, 15, 33, 29, 22, 22, 20, 17, 16, 16, 33, 28,
+        22, 22, 20, 18, 17, 16, 34, 28, 22, 23, 21, 18, 17, 17, 31, 26, 22, 22,
+        21, 18, 17, 17, 29, 24, 22, 23, 21, 19, 18, 17, 28, 23, 22, 23, 21, 19,
+        18, 17, 24, 23, 20, 21, 20, 18, 17, 17, 23, 22, 20, 20, 19, 17, 17, 17,
+        21, 22, 19, 19, 19, 17, 16, 17, 21, 22, 19, 18, 18, 16, 16, 16, 21, 22,
+        19, 18, 17, 16, 16, 16, 21, 23, 19, 18, 17, 15, 15, 15, 20, 22, 19, 17,
+        16, 15, 15, 15, 20, 22, 19, 17, 16, 14, 14, 14, 20, 22, 19, 17, 15, 14,
+        13, 14, 19, 21, 19, 17, 15, 13, 13, 13, 19, 21, 19, 17, 15, 13, 13, 13,
+        18, 20, 18, 16, 14, 12, 12, 13, 17, 20, 18, 16, 14, 12, 12, 12, 17, 19,
+        18, 16, 14, 12, 12, 12, 16, 18, 17, 15, 13, 12, 11, 12, 16, 18, 17, 15,
+        13, 12, 11, 12, 16, 18, 17, 15, 13, 11, 11, 11, 16, 17, 16, 15, 13, 11,
+        11, 11, 15, 17, 16, 15, 13, 12, 11, 11, 15, 17, 16, 14, 13, 12, 11, 10,
+        15, 17, 16, 14, 13, 12, 11, 10, 15, 16, 16, 15, 13, 12, 11, 10, 14, 16,
+        16, 15, 13, 12, 11, 10, 14, 16, 16, 15, 13, 12, 11, 10, 14, 16, 16, 15,
+        13, 12, 11, 10,
+        /* Size 32x8 */
+        32, 33, 33, 34, 31, 29, 28, 24, 23, 21, 21, 21, 21, 20, 20, 20, 19, 19,
+        18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 30, 29, 28, 28,
+        26, 24, 23, 23, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 19, 18,
+        18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 21, 22, 22, 22, 22, 22, 22, 20,
+        20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16,
+        16, 16, 16, 16, 16, 16, 21, 22, 22, 23, 22, 23, 23, 21, 20, 19, 18, 18,
+        18, 17, 17, 17, 17, 17, 16, 16, 16, 15, 15, 15, 15, 15, 14, 14, 15, 15,
+        15, 15, 19, 20, 20, 21, 21, 21, 21, 20, 19, 19, 18, 17, 17, 16, 16, 15,
+        15, 15, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 16, 17,
+        18, 18, 18, 19, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12,
+        12, 12, 12, 11, 11, 12, 12, 12, 12, 12, 12, 12, 15, 16, 17, 17, 17, 18,
+        18, 17, 17, 16, 16, 16, 15, 15, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11,
+        11, 11, 11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17, 17, 17, 17, 17,
+        16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 10,
+        10, 10, 10, 10 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 28, 18, 13, 28, 19, 14, 11, 18, 14, 10, 8, 13, 11, 8, 7,
+        /* Size 8x8 */
+        32, 32, 29, 24, 19, 15, 13, 11, 32, 31, 28, 24, 20, 16, 14, 12, 29, 28,
+        22, 20, 17, 14, 13, 12, 24, 24, 20, 16, 14, 12, 11, 10, 19, 20, 17, 14,
+        12, 10, 9, 9, 15, 16, 14, 12, 10, 9, 8, 8, 13, 14, 13, 11, 9, 8, 7, 7,
+        11, 12, 12, 10, 9, 8, 7, 6,
+        /* Size 16x16 */
+        32, 33, 33, 32, 30, 28, 25, 22, 19, 17, 16, 14, 12, 12, 11, 11, 33, 32,
+        32, 32, 30, 29, 26, 23, 20, 19, 17, 15, 13, 13, 12, 11, 33, 32, 31, 31,
+        29, 28, 26, 23, 21, 19, 17, 15, 14, 13, 12, 12, 32, 32, 31, 29, 28, 27,
+        25, 23, 21, 19, 18, 16, 14, 14, 13, 12, 30, 30, 29, 28, 26, 24, 22, 20,
+        19, 18, 16, 15, 13, 13, 12, 12, 28, 29, 28, 27, 24, 21, 20, 18, 17, 16,
+        15, 14, 13, 12, 11, 11, 25, 26, 26, 25, 22, 20, 18, 17, 15, 14, 14, 12,
+        12, 11, 11, 11, 22, 23, 23, 23, 20, 18, 17, 15, 14, 13, 12, 11, 11, 10,
+        10, 10, 19, 20, 21, 21, 19, 17, 15, 14, 12, 12, 11, 10, 10, 9, 9, 9, 17,
+        19, 19, 19, 18, 16, 14, 13, 12, 11, 10, 10, 9, 9, 9, 8, 16, 17, 17, 18,
+        16, 15, 14, 12, 11, 10, 10, 9, 9, 8, 8, 8, 14, 15, 15, 16, 15, 14, 12,
+        11, 10, 10, 9, 8, 8, 8, 7, 7, 12, 13, 14, 14, 13, 13, 12, 11, 10, 9, 9,
+        8, 7, 7, 7, 7, 12, 13, 13, 14, 13, 12, 11, 10, 9, 9, 8, 8, 7, 7, 7, 6,
+        11, 12, 12, 13, 12, 11, 11, 10, 9, 9, 8, 7, 7, 7, 6, 6, 11, 11, 12, 12,
+        12, 11, 11, 10, 9, 8, 8, 7, 7, 6, 6, 6,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 32, 32, 30, 29, 28, 26, 25, 23, 22, 21, 19, 18,
+        17, 16, 16, 14, 14, 13, 12, 12, 12, 11, 11, 11, 11, 10, 33, 32, 32, 32,
+        32, 32, 32, 31, 30, 29, 29, 27, 26, 24, 23, 22, 20, 19, 18, 17, 17, 15,
+        14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 33, 32, 32, 32, 32, 32, 32, 31,
+        30, 30, 29, 27, 26, 24, 23, 23, 20, 20, 19, 17, 17, 15, 15, 14, 13, 13,
+        13, 12, 12, 12, 11, 11, 33, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 28,
+        27, 25, 23, 23, 21, 20, 19, 18, 17, 16, 15, 14, 14, 14, 13, 13, 12, 12,
+        12, 11, 33, 32, 32, 32, 31, 31, 31, 30, 29, 28, 28, 26, 26, 24, 23, 23,
+        21, 20, 19, 18, 17, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 11, 33, 32,
+        32, 32, 31, 31, 30, 30, 29, 28, 28, 26, 26, 24, 23, 23, 20, 20, 19, 18,
+        17, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 32, 32, 32, 32, 31, 30,
+        29, 28, 28, 27, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 18, 16, 16, 15,
+        14, 14, 14, 13, 13, 12, 12, 12, 32, 31, 31, 31, 30, 30, 28, 28, 27, 26,
+        26, 24, 24, 23, 22, 22, 20, 19, 19, 17, 17, 16, 15, 14, 14, 14, 13, 13,
+        13, 12, 12, 12, 30, 30, 30, 31, 29, 29, 28, 27, 26, 24, 24, 23, 22, 22,
+        20, 20, 19, 18, 18, 17, 16, 15, 15, 14, 13, 13, 13, 12, 12, 12, 12, 12,
+        29, 29, 30, 30, 28, 28, 27, 26, 24, 22, 22, 21, 20, 20, 19, 19, 17, 17,
+        17, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 28, 29, 29, 30,
+        28, 28, 27, 26, 24, 22, 21, 20, 20, 19, 18, 18, 17, 17, 16, 15, 15, 14,
+        14, 13, 13, 13, 12, 12, 11, 11, 11, 11, 26, 27, 27, 28, 26, 26, 26, 24,
+        23, 21, 20, 19, 19, 18, 17, 17, 16, 16, 15, 14, 14, 13, 13, 12, 12, 12,
+        11, 11, 11, 11, 11, 11, 25, 26, 26, 27, 26, 26, 25, 24, 22, 20, 20, 19,
+        18, 17, 17, 16, 15, 15, 14, 14, 14, 13, 12, 12, 12, 12, 11, 11, 11, 11,
+        11, 10, 23, 24, 24, 25, 24, 24, 24, 23, 22, 20, 19, 18, 17, 16, 16, 15,
+        14, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 22, 23,
+        23, 23, 23, 23, 23, 22, 20, 19, 18, 17, 17, 16, 15, 15, 14, 13, 13, 12,
+        12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 21, 22, 23, 23, 23, 23,
+        22, 22, 20, 19, 18, 17, 16, 15, 15, 14, 13, 13, 13, 12, 12, 11, 11, 11,
+        10, 10, 10, 10, 10, 10, 9, 9, 19, 20, 20, 21, 21, 20, 21, 20, 19, 17,
+        17, 16, 15, 14, 14, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9,
+        9, 9, 9, 18, 19, 20, 20, 20, 20, 20, 19, 18, 17, 17, 16, 15, 14, 13, 13,
+        12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 17, 18, 19, 19,
+        19, 19, 19, 19, 18, 17, 16, 15, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10,
+        10, 9, 9, 9, 9, 9, 9, 8, 8, 9, 16, 17, 17, 18, 18, 18, 18, 17, 17, 16,
+        15, 14, 14, 13, 12, 12, 11, 11, 11, 10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8,
+        8, 8, 16, 17, 17, 17, 17, 17, 18, 17, 16, 15, 15, 14, 14, 13, 12, 12,
+        11, 11, 10, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 14, 15, 15, 16, 16,
+        16, 16, 16, 15, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 9, 8,
+        8, 8, 8, 8, 8, 8, 8, 7, 14, 14, 15, 15, 15, 15, 16, 15, 15, 14, 14, 13,
+        12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, 13,
+        13, 14, 14, 14, 14, 15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 9,
+        9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 12, 13, 13, 14, 14, 14, 14, 14,
+        13, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7,
+        7, 7, 7, 12, 13, 13, 14, 14, 14, 14, 14, 13, 13, 13, 12, 12, 11, 11, 10,
+        10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 12, 12, 13, 13, 13, 13,
+        14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7,
+        7, 7, 6, 6, 6, 11, 12, 12, 13, 13, 13, 13, 13, 12, 12, 12, 11, 11, 10,
+        10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 11, 12, 12, 12,
+        12, 12, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7,
+        7, 7, 7, 7, 6, 6, 6, 6, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11,
+        11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 11, 11,
+        11, 12, 12, 12, 12, 12, 12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8,
+        7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 10, 11, 11, 11, 11, 12, 12, 12, 12, 11,
+        11, 11, 10, 10, 10, 9, 9, 9, 9, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6,
+        /* Size 4x8 */
+        32, 29, 17, 12, 32, 28, 18, 13, 30, 22, 16, 12, 25, 19, 13, 11, 20, 17,
+        11, 9, 16, 14, 9, 8, 14, 13, 9, 7, 12, 11, 9, 7,
+        /* Size 8x4 */
+        32, 32, 30, 25, 20, 16, 14, 12, 29, 28, 22, 19, 17, 14, 13, 11, 17, 18,
+        16, 13, 11, 9, 9, 9, 12, 13, 12, 11, 9, 8, 7, 7,
+        /* Size 8x16 */
+        32, 33, 29, 23, 19, 16, 12, 11, 33, 32, 30, 25, 20, 17, 13, 12, 33, 31,
+        29, 24, 21, 17, 14, 13, 32, 30, 28, 24, 21, 18, 14, 13, 30, 29, 25, 21,
+        19, 16, 13, 13, 28, 28, 22, 19, 17, 15, 13, 12, 25, 26, 21, 17, 15, 13,
+        12, 11, 22, 23, 19, 16, 14, 12, 11, 10, 19, 20, 18, 14, 12, 11, 10, 9,
+        18, 19, 17, 14, 12, 10, 9, 9, 16, 17, 16, 13, 11, 10, 9, 8, 14, 15, 14,
+        12, 10, 9, 8, 8, 12, 14, 13, 11, 10, 9, 7, 7, 12, 13, 12, 11, 9, 8, 7,
+        7, 11, 12, 12, 11, 9, 8, 7, 7, 11, 12, 12, 11, 9, 8, 7, 6,
+        /* Size 16x8 */
+        32, 33, 33, 32, 30, 28, 25, 22, 19, 18, 16, 14, 12, 12, 11, 11, 33, 32,
+        31, 30, 29, 28, 26, 23, 20, 19, 17, 15, 14, 13, 12, 12, 29, 30, 29, 28,
+        25, 22, 21, 19, 18, 17, 16, 14, 13, 12, 12, 12, 23, 25, 24, 24, 21, 19,
+        17, 16, 14, 14, 13, 12, 11, 11, 11, 11, 19, 20, 21, 21, 19, 17, 15, 14,
+        12, 12, 11, 10, 10, 9, 9, 9, 16, 17, 17, 18, 16, 15, 13, 12, 11, 10, 10,
+        9, 9, 8, 8, 8, 12, 13, 14, 14, 13, 13, 12, 11, 10, 9, 9, 8, 7, 7, 7, 7,
+        11, 12, 13, 13, 13, 12, 11, 10, 9, 9, 8, 8, 7, 7, 7, 6,
+        /* Size 16x32 */
+        32, 33, 33, 32, 29, 28, 23, 22, 19, 17, 16, 13, 12, 12, 11, 11, 33, 32,
+        32, 32, 29, 29, 24, 23, 20, 17, 17, 14, 13, 12, 12, 12, 33, 32, 32, 32,
+        30, 29, 25, 23, 20, 18, 17, 14, 13, 12, 12, 12, 33, 32, 32, 31, 30, 30,
+        25, 23, 21, 18, 17, 14, 14, 13, 12, 12, 33, 32, 31, 30, 29, 28, 24, 23,
+        21, 18, 17, 14, 14, 13, 13, 12, 32, 32, 31, 30, 28, 28, 24, 23, 20, 18,
+        17, 14, 14, 13, 13, 12, 32, 31, 30, 29, 28, 27, 24, 23, 21, 18, 18, 15,
+        14, 13, 13, 12, 32, 31, 30, 28, 26, 26, 23, 22, 20, 18, 17, 14, 14, 13,
+        13, 13, 30, 30, 29, 28, 25, 24, 21, 20, 19, 17, 16, 14, 13, 13, 13, 13,
+        29, 30, 28, 27, 23, 22, 20, 19, 17, 16, 15, 13, 13, 12, 12, 12, 28, 30,
+        28, 27, 22, 21, 19, 18, 17, 16, 15, 13, 13, 12, 12, 12, 26, 28, 26, 26,
+        21, 20, 18, 17, 16, 14, 14, 12, 12, 12, 12, 11, 25, 26, 26, 25, 21, 20,
+        17, 17, 15, 14, 13, 12, 12, 11, 11, 11, 23, 25, 24, 24, 20, 19, 16, 16,
+        14, 13, 13, 11, 11, 11, 11, 11, 22, 23, 23, 23, 19, 18, 16, 15, 14, 12,
+        12, 11, 11, 10, 10, 10, 21, 23, 23, 22, 19, 18, 15, 15, 13, 12, 12, 11,
+        10, 10, 10, 10, 19, 21, 20, 20, 18, 17, 14, 14, 12, 11, 11, 10, 10, 10,
+        9, 10, 19, 20, 20, 20, 17, 17, 14, 13, 12, 11, 11, 10, 9, 9, 9, 9, 18,
+        19, 19, 19, 17, 16, 14, 13, 12, 11, 10, 9, 9, 9, 9, 9, 16, 18, 18, 18,
+        16, 15, 13, 12, 11, 10, 10, 9, 9, 9, 9, 8, 16, 17, 17, 18, 16, 15, 13,
+        12, 11, 10, 10, 9, 9, 8, 8, 8, 14, 16, 16, 16, 14, 14, 12, 12, 11, 9, 9,
+        8, 8, 8, 8, 8, 14, 15, 15, 16, 14, 14, 12, 11, 10, 9, 9, 8, 8, 8, 8, 8,
+        13, 14, 14, 15, 13, 13, 11, 11, 10, 9, 9, 8, 8, 7, 7, 7, 12, 14, 14, 14,
+        13, 13, 11, 11, 10, 9, 9, 8, 7, 7, 7, 7, 12, 14, 14, 14, 13, 13, 11, 11,
+        10, 9, 8, 8, 7, 7, 7, 7, 12, 13, 13, 13, 12, 12, 11, 10, 9, 9, 8, 7, 7,
+        7, 7, 7, 12, 12, 13, 13, 12, 12, 11, 10, 9, 9, 8, 7, 7, 7, 7, 6, 11, 12,
+        12, 13, 12, 12, 11, 10, 9, 9, 8, 8, 7, 7, 7, 6, 11, 12, 12, 12, 12, 11,
+        11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 11, 12, 12, 12, 12, 11, 11, 10, 9, 8, 8,
+        7, 7, 6, 6, 6, 10, 11, 11, 12, 12, 11, 11, 9, 9, 8, 8, 7, 7, 6, 6, 6,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 32, 32, 32, 30, 29, 28, 26, 25, 23, 22, 21, 19, 19,
+        18, 16, 16, 14, 14, 13, 12, 12, 12, 12, 11, 11, 11, 10, 33, 32, 32, 32,
+        32, 32, 31, 31, 30, 30, 30, 28, 26, 25, 23, 23, 21, 20, 19, 18, 17, 16,
+        15, 14, 14, 14, 13, 12, 12, 12, 12, 11, 33, 32, 32, 32, 31, 31, 30, 30,
+        29, 28, 28, 26, 26, 24, 23, 23, 20, 20, 19, 18, 17, 16, 15, 14, 14, 14,
+        13, 13, 12, 12, 12, 11, 32, 32, 32, 31, 30, 30, 29, 28, 28, 27, 27, 26,
+        25, 24, 23, 22, 20, 20, 19, 18, 18, 16, 16, 15, 14, 14, 13, 13, 13, 12,
+        12, 12, 29, 29, 30, 30, 29, 28, 28, 26, 25, 23, 22, 21, 21, 20, 19, 19,
+        18, 17, 17, 16, 16, 14, 14, 13, 13, 13, 12, 12, 12, 12, 12, 12, 28, 29,
+        29, 30, 28, 28, 27, 26, 24, 22, 21, 20, 20, 19, 18, 18, 17, 17, 16, 15,
+        15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 23, 24, 25, 25, 24, 24,
+        24, 23, 21, 20, 19, 18, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 11,
+        11, 11, 11, 11, 11, 11, 11, 11, 22, 23, 23, 23, 23, 23, 23, 22, 20, 19,
+        18, 17, 17, 16, 15, 15, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 10, 10,
+        10, 10, 10, 9, 19, 20, 20, 21, 21, 20, 21, 20, 19, 17, 17, 16, 15, 14,
+        14, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 17,
+        17, 18, 18, 18, 18, 18, 18, 17, 16, 16, 14, 14, 13, 12, 12, 11, 11, 11,
+        10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8, 8, 16, 17, 17, 17, 17, 17, 18, 17,
+        16, 15, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 9, 8, 8, 8,
+        8, 8, 8, 8, 13, 14, 14, 14, 14, 14, 15, 14, 14, 13, 13, 12, 12, 11, 11,
+        11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 8, 7, 7, 8, 8, 7, 7, 12, 13, 13, 14,
+        14, 14, 14, 14, 13, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8,
+        7, 7, 7, 7, 7, 7, 7, 7, 12, 12, 12, 13, 13, 13, 13, 13, 13, 12, 12, 12,
+        11, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 6, 6, 11, 12,
+        12, 12, 13, 13, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10, 9, 9, 9, 9, 8,
+        8, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 11, 12, 12, 12, 12, 12, 12, 13, 13, 12,
+        12, 11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6,
+        /* Size 4x16 */
+        33, 28, 17, 12, 32, 29, 18, 12, 32, 28, 18, 13, 31, 27, 18, 13, 30, 24,
+        17, 13, 30, 21, 16, 12, 26, 20, 14, 11, 23, 18, 12, 10, 21, 17, 11, 10,
+        19, 16, 11, 9, 17, 15, 10, 8, 15, 14, 9, 8, 14, 13, 9, 7, 13, 12, 9, 7,
+        12, 12, 9, 7, 12, 11, 8, 6,
+        /* Size 16x4 */
+        33, 32, 32, 31, 30, 30, 26, 23, 21, 19, 17, 15, 14, 13, 12, 12, 28, 29,
+        28, 27, 24, 21, 20, 18, 17, 16, 15, 14, 13, 12, 12, 11, 17, 18, 18, 18,
+        17, 16, 14, 12, 11, 11, 10, 9, 9, 9, 9, 8, 12, 12, 13, 13, 13, 12, 11,
+        10, 10, 9, 8, 8, 7, 7, 7, 6,
+        /* Size 8x32 */
+        32, 33, 29, 23, 19, 16, 12, 11, 33, 32, 29, 24, 20, 17, 13, 12, 33, 32,
+        30, 25, 20, 17, 13, 12, 33, 32, 30, 25, 21, 17, 14, 12, 33, 31, 29, 24,
+        21, 17, 14, 13, 32, 31, 28, 24, 20, 17, 14, 13, 32, 30, 28, 24, 21, 18,
+        14, 13, 32, 30, 26, 23, 20, 17, 14, 13, 30, 29, 25, 21, 19, 16, 13, 13,
+        29, 28, 23, 20, 17, 15, 13, 12, 28, 28, 22, 19, 17, 15, 13, 12, 26, 26,
+        21, 18, 16, 14, 12, 12, 25, 26, 21, 17, 15, 13, 12, 11, 23, 24, 20, 16,
+        14, 13, 11, 11, 22, 23, 19, 16, 14, 12, 11, 10, 21, 23, 19, 15, 13, 12,
+        10, 10, 19, 20, 18, 14, 12, 11, 10, 9, 19, 20, 17, 14, 12, 11, 9, 9, 18,
+        19, 17, 14, 12, 10, 9, 9, 16, 18, 16, 13, 11, 10, 9, 9, 16, 17, 16, 13,
+        11, 10, 9, 8, 14, 16, 14, 12, 11, 9, 8, 8, 14, 15, 14, 12, 10, 9, 8, 8,
+        13, 14, 13, 11, 10, 9, 8, 7, 12, 14, 13, 11, 10, 9, 7, 7, 12, 14, 13,
+        11, 10, 8, 7, 7, 12, 13, 12, 11, 9, 8, 7, 7, 12, 13, 12, 11, 9, 8, 7, 7,
+        11, 12, 12, 11, 9, 8, 7, 7, 11, 12, 12, 11, 9, 8, 7, 6, 11, 12, 12, 11,
+        9, 8, 7, 6, 10, 11, 12, 11, 9, 8, 7, 6,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 32, 32, 32, 30, 29, 28, 26, 25, 23, 22, 21, 19, 19,
+        18, 16, 16, 14, 14, 13, 12, 12, 12, 12, 11, 11, 11, 10, 33, 32, 32, 32,
+        31, 31, 30, 30, 29, 28, 28, 26, 26, 24, 23, 23, 20, 20, 19, 18, 17, 16,
+        15, 14, 14, 14, 13, 13, 12, 12, 12, 11, 29, 29, 30, 30, 29, 28, 28, 26,
+        25, 23, 22, 21, 21, 20, 19, 19, 18, 17, 17, 16, 16, 14, 14, 13, 13, 13,
+        12, 12, 12, 12, 12, 12, 23, 24, 25, 25, 24, 24, 24, 23, 21, 20, 19, 18,
+        17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 11, 11, 11,
+        11, 11, 19, 20, 20, 21, 21, 20, 21, 20, 19, 17, 17, 16, 15, 14, 14, 13,
+        12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 16, 17, 17,
+        17, 17, 17, 18, 17, 16, 15, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10,
+        9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 12, 13, 13, 14, 14, 14, 14, 14, 13, 13,
+        13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7,
+        11, 12, 12, 12, 13, 13, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10, 9, 9, 9,
+        9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6 },
+      { /* Chroma */
+        /* Size 4x4 */
+        32, 23, 20, 17, 23, 19, 17, 16, 20, 17, 14, 13, 17, 16, 13, 11,
+        /* Size 8x8 */
+        33, 30, 22, 22, 20, 18, 17, 16, 30, 26, 22, 23, 21, 19, 18, 17, 22, 22,
+        20, 20, 19, 18, 17, 17, 22, 23, 20, 18, 17, 16, 15, 15, 20, 21, 19, 17,
+        15, 14, 13, 13, 18, 19, 18, 16, 14, 12, 12, 12, 17, 18, 17, 15, 13, 12,
+        11, 11, 16, 17, 17, 15, 13, 12, 11, 10,
+        /* Size 16x16 */
+        32, 33, 31, 28, 25, 21, 21, 20, 20, 19, 18, 17, 16, 15, 15, 15, 33, 33,
+        30, 26, 24, 22, 22, 22, 21, 20, 19, 18, 17, 17, 16, 16, 31, 30, 28, 24,
+        23, 22, 22, 22, 22, 21, 20, 19, 18, 17, 17, 16, 28, 26, 24, 22, 22, 21,
+        22, 22, 22, 21, 20, 19, 19, 18, 17, 17, 25, 24, 23, 22, 21, 20, 21, 20,
+        20, 20, 19, 18, 18, 17, 17, 17, 21, 22, 22, 21, 20, 19, 19, 19, 19, 19,
+        18, 17, 17, 16, 16, 16, 21, 22, 22, 22, 21, 19, 19, 18, 17, 17, 17, 16,
+        16, 15, 15, 15, 20, 22, 22, 22, 20, 19, 18, 17, 16, 16, 16, 15, 15, 14,
+        14, 14, 20, 21, 22, 22, 20, 19, 17, 16, 16, 15, 15, 14, 14, 13, 14, 14,
+        19, 20, 21, 21, 20, 19, 17, 16, 15, 14, 14, 13, 13, 13, 13, 13, 18, 19,
+        20, 20, 19, 18, 17, 16, 15, 14, 13, 13, 12, 12, 12, 12, 17, 18, 19, 19,
+        18, 17, 16, 15, 14, 13, 13, 12, 12, 12, 12, 12, 16, 17, 18, 19, 18, 17,
+        16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 15, 17, 17, 18, 17, 16, 15, 14,
+        13, 13, 12, 12, 11, 11, 11, 11, 15, 16, 17, 17, 17, 16, 15, 14, 14, 13,
+        12, 12, 11, 11, 10, 10, 15, 16, 16, 17, 17, 16, 15, 14, 14, 13, 12, 12,
+        11, 11, 10, 10,
+        /* Size 32x32 */
+        32, 33, 33, 34, 31, 31, 28, 27, 25, 22, 21, 21, 21, 21, 20, 20, 20, 19,
+        19, 18, 18, 17, 17, 16, 16, 16, 15, 15, 15, 15, 15, 14, 33, 33, 33, 33,
+        30, 30, 27, 26, 24, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 18,
+        18, 17, 17, 17, 16, 16, 16, 16, 16, 15, 33, 33, 33, 33, 30, 29, 26, 26,
+        24, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17,
+        17, 16, 16, 16, 16, 15, 34, 33, 33, 32, 30, 29, 26, 25, 24, 23, 22, 23,
+        23, 23, 22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17,
+        16, 16, 31, 30, 30, 30, 28, 27, 24, 24, 23, 22, 22, 22, 22, 23, 22, 22,
+        22, 21, 21, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 16, 16, 31, 30,
+        29, 29, 27, 26, 24, 23, 23, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20,
+        20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 17, 28, 27, 26, 26, 24, 24,
+        22, 22, 22, 22, 21, 22, 22, 23, 22, 22, 22, 22, 21, 21, 20, 20, 19, 19,
+        19, 19, 18, 18, 17, 17, 17, 17, 27, 26, 26, 25, 24, 23, 22, 22, 21, 21,
+        21, 21, 22, 22, 22, 22, 21, 21, 21, 20, 20, 19, 19, 19, 18, 18, 18, 18,
+        18, 17, 17, 17, 25, 24, 24, 24, 23, 23, 22, 21, 21, 20, 20, 21, 21, 21,
+        20, 20, 20, 20, 20, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17,
+        22, 22, 22, 23, 22, 22, 22, 21, 20, 20, 20, 20, 20, 20, 19, 19, 19, 19,
+        19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 16, 16, 21, 22, 22, 22,
+        22, 22, 21, 21, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18,
+        17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 21, 22, 22, 23, 22, 22, 22, 21,
+        21, 20, 19, 19, 19, 19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16,
+        16, 16, 16, 16, 16, 15, 21, 22, 22, 23, 22, 22, 22, 22, 21, 20, 19, 19,
+        19, 18, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 15,
+        15, 15, 21, 22, 22, 23, 23, 23, 23, 22, 21, 20, 19, 19, 18, 18, 17, 17,
+        17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15, 20, 21,
+        22, 22, 22, 22, 22, 22, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 16,
+        16, 15, 15, 15, 15, 14, 14, 15, 14, 14, 14, 15, 20, 21, 22, 22, 22, 22,
+        22, 22, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 14,
+        14, 14, 14, 14, 14, 14, 14, 14, 20, 20, 21, 22, 22, 22, 22, 21, 20, 19,
+        19, 18, 17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 14,
+        14, 13, 14, 14, 19, 20, 20, 21, 21, 21, 22, 21, 20, 19, 19, 18, 17, 17,
+        16, 16, 15, 15, 15, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13,
+        19, 20, 20, 21, 21, 21, 21, 21, 20, 19, 19, 18, 17, 17, 16, 16, 15, 15,
+        14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 18, 19, 19, 20,
+        20, 20, 21, 20, 19, 19, 18, 17, 17, 16, 16, 16, 15, 14, 14, 14, 13, 13,
+        13, 13, 12, 12, 12, 13, 12, 13, 13, 12, 18, 19, 19, 20, 20, 20, 20, 20,
+        19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 12,
+        12, 12, 12, 12, 12, 12, 17, 18, 18, 19, 19, 19, 20, 19, 19, 18, 18, 17,
+        16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12,
+        12, 12, 17, 18, 18, 19, 19, 19, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15,
+        14, 14, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 16, 17,
+        17, 18, 18, 18, 19, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13,
+        12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 12, 11, 16, 17, 17, 18, 18, 18,
+        19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11,
+        11, 11, 11, 11, 11, 11, 11, 11, 16, 17, 17, 18, 18, 18, 19, 18, 18, 17,
+        17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11,
+        11, 11, 11, 11, 15, 16, 17, 17, 17, 17, 18, 18, 17, 17, 16, 16, 15, 15,
+        14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11,
+        15, 16, 16, 17, 17, 17, 18, 18, 17, 17, 16, 16, 15, 15, 15, 14, 14, 13,
+        13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 11, 10, 10, 10, 15, 16, 16, 17,
+        17, 17, 17, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12,
+        12, 11, 11, 11, 11, 11, 10, 10, 10, 10, 15, 16, 16, 17, 17, 17, 17, 17,
+        17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 12, 11, 11, 11,
+        11, 10, 10, 10, 10, 10, 15, 16, 16, 16, 16, 17, 17, 17, 17, 16, 16, 16,
+        15, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 10, 10,
+        10, 10, 14, 15, 15, 16, 16, 17, 17, 17, 17, 16, 16, 15, 15, 15, 15, 14,
+        14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10,
+        /* Size 4x8 */
+        33, 22, 19, 16, 28, 22, 20, 17, 22, 20, 19, 17, 23, 19, 16, 15, 21, 19,
+        14, 13, 19, 18, 13, 12, 17, 17, 13, 11, 16, 16, 13, 11,
+        /* Size 8x4 */
+        33, 28, 22, 23, 21, 19, 17, 16, 22, 22, 20, 19, 19, 18, 17, 16, 19, 20,
+        19, 16, 14, 13, 13, 13, 16, 17, 17, 15, 13, 12, 11, 11,
+        /* Size 8x16 */
+        32, 31, 23, 21, 20, 18, 16, 15, 33, 30, 23, 22, 21, 19, 17, 16, 31, 28,
+        22, 23, 22, 20, 18, 17, 28, 24, 22, 23, 22, 20, 19, 17, 24, 23, 21, 21,
+        20, 19, 18, 17, 21, 22, 20, 19, 19, 18, 17, 16, 21, 22, 20, 18, 17, 17,
+        16, 15, 20, 22, 20, 17, 16, 16, 14, 14, 20, 22, 19, 17, 16, 14, 14, 14,
+        19, 21, 19, 17, 15, 14, 13, 13, 18, 20, 19, 16, 15, 13, 12, 12, 17, 19,
+        18, 16, 14, 13, 12, 12, 16, 18, 17, 15, 14, 12, 11, 11, 16, 17, 17, 15,
+        13, 12, 11, 11, 15, 17, 17, 15, 13, 12, 11, 11, 15, 16, 17, 15, 14, 12,
+        11, 10,
+        /* Size 16x8 */
+        32, 33, 31, 28, 24, 21, 21, 20, 20, 19, 18, 17, 16, 16, 15, 15, 31, 30,
+        28, 24, 23, 22, 22, 22, 22, 21, 20, 19, 18, 17, 17, 16, 23, 23, 22, 22,
+        21, 20, 20, 20, 19, 19, 19, 18, 17, 17, 17, 17, 21, 22, 23, 23, 21, 19,
+        18, 17, 17, 17, 16, 16, 15, 15, 15, 15, 20, 21, 22, 22, 20, 19, 17, 16,
+        16, 15, 15, 14, 14, 13, 13, 14, 18, 19, 20, 20, 19, 18, 17, 16, 14, 14,
+        13, 13, 12, 12, 12, 12, 16, 17, 18, 19, 18, 17, 16, 14, 14, 13, 12, 12,
+        11, 11, 11, 11, 15, 16, 17, 17, 17, 16, 15, 14, 14, 13, 12, 12, 11, 11,
+        11, 10,
+        /* Size 16x32 */
+        32, 33, 31, 28, 23, 21, 21, 20, 20, 18, 18, 16, 16, 15, 15, 15, 33, 33,
+        30, 27, 23, 22, 22, 21, 20, 19, 19, 17, 17, 16, 16, 16, 33, 32, 30, 26,
+        23, 22, 22, 22, 21, 20, 19, 17, 17, 17, 16, 16, 34, 32, 29, 26, 23, 22,
+        23, 22, 21, 20, 20, 18, 18, 17, 17, 17, 31, 29, 28, 24, 22, 22, 23, 22,
+        22, 20, 20, 18, 18, 17, 17, 17, 31, 28, 27, 24, 22, 22, 22, 22, 22, 20,
+        20, 18, 18, 17, 17, 17, 28, 26, 24, 22, 22, 22, 23, 22, 22, 21, 20, 19,
+        19, 18, 17, 17, 26, 25, 24, 22, 21, 21, 22, 22, 21, 20, 20, 19, 18, 18,
+        18, 17, 24, 24, 23, 22, 21, 20, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17,
+        22, 22, 22, 21, 20, 20, 19, 19, 19, 19, 18, 17, 17, 17, 17, 17, 21, 22,
+        22, 21, 20, 19, 19, 19, 19, 18, 18, 17, 17, 16, 16, 17, 21, 22, 22, 22,
+        20, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 16, 21, 23, 22, 22, 20, 19,
+        18, 18, 17, 17, 17, 16, 16, 16, 15, 16, 21, 23, 23, 22, 20, 19, 18, 17,
+        17, 16, 16, 15, 15, 15, 15, 15, 20, 22, 22, 22, 20, 19, 17, 17, 16, 16,
+        16, 15, 14, 15, 14, 15, 20, 22, 22, 22, 20, 19, 17, 17, 16, 16, 15, 14,
+        14, 14, 14, 14, 20, 21, 22, 22, 19, 19, 17, 16, 16, 15, 14, 14, 14, 14,
+        14, 14, 19, 21, 21, 21, 19, 19, 17, 16, 15, 14, 14, 13, 13, 13, 14, 13,
+        19, 20, 21, 21, 19, 19, 17, 16, 15, 14, 14, 13, 13, 13, 13, 13, 18, 20,
+        20, 20, 19, 18, 16, 16, 15, 14, 13, 13, 12, 13, 13, 13, 18, 20, 20, 20,
+        19, 18, 16, 16, 15, 14, 13, 12, 12, 12, 12, 13, 17, 19, 19, 20, 18, 18,
+        16, 15, 14, 13, 13, 12, 12, 12, 12, 12, 17, 18, 19, 19, 18, 17, 16, 15,
+        14, 13, 13, 12, 12, 12, 12, 12, 16, 18, 18, 19, 17, 17, 15, 15, 14, 13,
+        12, 12, 11, 11, 12, 12, 16, 18, 18, 18, 17, 17, 15, 14, 14, 13, 12, 11,
+        11, 11, 11, 12, 16, 17, 18, 18, 17, 17, 15, 14, 14, 13, 12, 11, 11, 11,
+        11, 11, 16, 17, 17, 18, 17, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 11,
+        15, 17, 17, 18, 17, 16, 15, 15, 13, 13, 12, 11, 11, 11, 11, 11, 15, 17,
+        17, 17, 17, 16, 15, 14, 13, 13, 12, 12, 11, 11, 11, 10, 15, 16, 17, 17,
+        17, 16, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 15, 16, 16, 17, 17, 16,
+        15, 14, 14, 13, 12, 12, 11, 11, 10, 10, 15, 16, 16, 17, 17, 15, 15, 14,
+        14, 12, 12, 11, 11, 10, 10, 10,
+        /* Size 32x16 */
+        32, 33, 33, 34, 31, 31, 28, 26, 24, 22, 21, 21, 21, 21, 20, 20, 20, 19,
+        19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 15, 33, 33, 32, 32,
+        29, 28, 26, 25, 24, 22, 22, 22, 23, 23, 22, 22, 21, 21, 20, 20, 20, 19,
+        18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 31, 30, 30, 29, 28, 27, 24, 24,
+        23, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 18, 18,
+        17, 17, 17, 17, 16, 16, 28, 27, 26, 26, 24, 24, 22, 22, 22, 21, 21, 22,
+        22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 18, 18, 18, 18, 17, 17,
+        17, 17, 23, 23, 23, 23, 22, 22, 22, 21, 21, 20, 20, 20, 20, 20, 20, 20,
+        19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 17, 17, 17, 17, 17, 17, 21, 22,
+        22, 22, 22, 22, 22, 21, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18,
+        18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 16, 15, 21, 22, 22, 23, 23, 22,
+        23, 22, 21, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15,
+        15, 15, 15, 15, 15, 15, 15, 15, 20, 21, 22, 22, 22, 22, 22, 22, 20, 19,
+        19, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 14, 14, 14, 15,
+        14, 14, 14, 14, 20, 20, 21, 21, 22, 22, 22, 21, 20, 19, 19, 18, 17, 17,
+        16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 13, 13, 14, 14,
+        18, 19, 20, 20, 20, 20, 21, 20, 19, 19, 18, 17, 17, 16, 16, 16, 15, 14,
+        14, 14, 14, 13, 13, 13, 13, 13, 12, 13, 13, 13, 13, 12, 18, 19, 19, 20,
+        20, 20, 20, 20, 19, 18, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 13,
+        13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18, 19, 19,
+        18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11,
+        11, 11, 12, 12, 12, 11, 16, 17, 17, 18, 18, 18, 19, 18, 18, 17, 17, 16,
+        16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11,
+        11, 11, 15, 16, 17, 17, 17, 17, 18, 18, 17, 17, 16, 16, 16, 15, 15, 14,
+        14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 10, 15, 16,
+        16, 17, 17, 17, 17, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13,
+        12, 12, 12, 12, 11, 11, 11, 11, 11, 10, 10, 10, 15, 16, 16, 17, 17, 17,
+        17, 17, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 12,
+        12, 11, 11, 11, 10, 10, 10, 10,
+        /* Size 4x16 */
+        33, 21, 18, 15, 32, 22, 20, 17, 29, 22, 20, 17, 26, 22, 21, 18, 24, 20,
+        19, 17, 22, 19, 18, 16, 23, 19, 17, 16, 22, 19, 16, 15, 21, 19, 15, 14,
+        20, 19, 14, 13, 20, 18, 14, 12, 18, 17, 13, 12, 18, 17, 13, 11, 17, 16,
+        12, 11, 17, 16, 13, 11, 16, 16, 13, 11,
+        /* Size 16x4 */
+        33, 32, 29, 26, 24, 22, 23, 22, 21, 20, 20, 18, 18, 17, 17, 16, 21, 22,
+        22, 22, 20, 19, 19, 19, 19, 19, 18, 17, 17, 16, 16, 16, 18, 20, 20, 21,
+        19, 18, 17, 16, 15, 14, 14, 13, 13, 12, 13, 13, 15, 17, 17, 18, 17, 16,
+        16, 15, 14, 13, 12, 12, 11, 11, 11, 11,
+        /* Size 8x32 */
+        32, 31, 23, 21, 20, 18, 16, 15, 33, 30, 23, 22, 20, 19, 17, 16, 33, 30,
+        23, 22, 21, 19, 17, 16, 34, 29, 23, 23, 21, 20, 18, 17, 31, 28, 22, 23,
+        22, 20, 18, 17, 31, 27, 22, 22, 22, 20, 18, 17, 28, 24, 22, 23, 22, 20,
+        19, 17, 26, 24, 21, 22, 21, 20, 18, 18, 24, 23, 21, 21, 20, 19, 18, 17,
+        22, 22, 20, 19, 19, 18, 17, 17, 21, 22, 20, 19, 19, 18, 17, 16, 21, 22,
+        20, 18, 18, 17, 16, 16, 21, 22, 20, 18, 17, 17, 16, 15, 21, 23, 20, 18,
+        17, 16, 15, 15, 20, 22, 20, 17, 16, 16, 14, 14, 20, 22, 20, 17, 16, 15,
+        14, 14, 20, 22, 19, 17, 16, 14, 14, 14, 19, 21, 19, 17, 15, 14, 13, 14,
+        19, 21, 19, 17, 15, 14, 13, 13, 18, 20, 19, 16, 15, 13, 12, 13, 18, 20,
+        19, 16, 15, 13, 12, 12, 17, 19, 18, 16, 14, 13, 12, 12, 17, 19, 18, 16,
+        14, 13, 12, 12, 16, 18, 17, 15, 14, 12, 11, 12, 16, 18, 17, 15, 14, 12,
+        11, 11, 16, 18, 17, 15, 14, 12, 11, 11, 16, 17, 17, 15, 13, 12, 11, 11,
+        15, 17, 17, 15, 13, 12, 11, 11, 15, 17, 17, 15, 13, 12, 11, 11, 15, 17,
+        17, 15, 13, 12, 11, 10, 15, 16, 17, 15, 14, 12, 11, 10, 15, 16, 17, 15,
+        14, 12, 11, 10,
+        /* Size 32x8 */
+        32, 33, 33, 34, 31, 31, 28, 26, 24, 22, 21, 21, 21, 21, 20, 20, 20, 19,
+        19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 15, 31, 30, 30, 29,
+        28, 27, 24, 24, 23, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 19,
+        19, 18, 18, 18, 17, 17, 17, 17, 16, 16, 23, 23, 23, 23, 22, 22, 22, 21,
+        21, 20, 20, 20, 20, 20, 20, 20, 19, 19, 19, 19, 19, 18, 18, 17, 17, 17,
+        17, 17, 17, 17, 17, 17, 21, 22, 22, 23, 23, 22, 23, 22, 21, 19, 19, 18,
+        18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 15,
+        15, 15, 20, 20, 21, 21, 22, 22, 22, 21, 20, 19, 19, 18, 17, 17, 16, 16,
+        16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 13, 13, 14, 14, 18, 19,
+        19, 20, 20, 20, 20, 20, 19, 18, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13,
+        13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18,
+        19, 18, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11,
+        11, 11, 11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17, 17, 18, 17, 17,
+        16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11,
+        11, 10, 10, 10 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 30, 19, 14, 30, 21, 16, 13, 19, 16, 11, 9, 14, 13, 9, 7,
+        /* Size 8x8 */
+        32, 32, 30, 26, 20, 17, 13, 12, 32, 31, 29, 26, 21, 17, 14, 13, 30, 29,
+        26, 22, 19, 16, 14, 13, 26, 26, 22, 18, 16, 14, 12, 11, 20, 21, 19, 16,
+        13, 11, 10, 10, 17, 17, 16, 14, 11, 10, 9, 8, 13, 14, 14, 12, 10, 9, 8,
+        7, 12, 13, 13, 11, 10, 8, 7, 7,
+        /* Size 16x16 */
+        32, 33, 33, 32, 31, 28, 26, 23, 21, 19, 17, 16, 14, 13, 12, 11, 33, 32,
+        32, 32, 31, 29, 27, 24, 22, 20, 18, 16, 15, 13, 13, 12, 33, 32, 32, 31,
+        30, 29, 27, 25, 23, 21, 19, 17, 15, 14, 13, 12, 32, 32, 31, 30, 28, 28,
+        26, 24, 23, 21, 19, 17, 16, 14, 14, 13, 31, 31, 30, 28, 27, 24, 23, 22,
+        20, 19, 18, 16, 15, 14, 13, 13, 28, 29, 29, 28, 24, 21, 20, 19, 18, 17,
+        16, 15, 14, 13, 12, 12, 26, 27, 27, 26, 23, 20, 19, 18, 17, 16, 15, 14,
+        13, 12, 12, 11, 23, 24, 25, 24, 22, 19, 18, 16, 15, 14, 14, 13, 12, 11,
+        11, 11, 21, 22, 23, 23, 20, 18, 17, 15, 14, 13, 13, 12, 11, 10, 10, 10,
+        19, 20, 21, 21, 19, 17, 16, 14, 13, 12, 12, 11, 10, 10, 9, 9, 17, 18,
+        19, 19, 18, 16, 15, 14, 13, 12, 11, 10, 10, 9, 9, 9, 16, 16, 17, 17, 16,
+        15, 14, 13, 12, 11, 10, 10, 9, 8, 8, 8, 14, 15, 15, 16, 15, 14, 13, 12,
+        11, 10, 10, 9, 8, 8, 8, 7, 13, 13, 14, 14, 14, 13, 12, 11, 10, 10, 9, 8,
+        8, 7, 7, 7, 12, 13, 13, 14, 13, 12, 12, 11, 10, 9, 9, 8, 8, 7, 7, 7, 11,
+        12, 12, 13, 13, 12, 11, 11, 10, 9, 9, 8, 7, 7, 7, 6,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 32, 32, 31, 30, 28, 28, 26, 25, 23, 22, 21, 20,
+        19, 18, 17, 16, 16, 14, 14, 13, 13, 12, 12, 12, 11, 11, 33, 32, 32, 32,
+        32, 32, 32, 32, 31, 30, 29, 29, 27, 26, 24, 23, 22, 20, 20, 18, 18, 17,
+        16, 15, 14, 13, 13, 13, 12, 12, 12, 12, 33, 32, 32, 32, 32, 32, 32, 32,
+        31, 30, 29, 29, 27, 26, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 15, 14,
+        13, 13, 13, 12, 12, 12, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30,
+        28, 27, 25, 24, 23, 21, 21, 19, 19, 17, 17, 16, 15, 14, 14, 14, 13, 13,
+        12, 12, 33, 32, 32, 32, 32, 31, 31, 31, 30, 30, 29, 29, 27, 26, 25, 24,
+        23, 21, 21, 19, 19, 17, 17, 16, 15, 14, 14, 14, 13, 13, 12, 12, 33, 32,
+        32, 32, 31, 31, 31, 30, 29, 29, 28, 28, 26, 26, 24, 23, 23, 21, 20, 19,
+        19, 17, 17, 16, 15, 14, 14, 14, 13, 13, 13, 12, 32, 32, 32, 32, 31, 31,
+        30, 29, 28, 28, 28, 27, 26, 26, 24, 23, 23, 21, 21, 19, 19, 18, 17, 16,
+        16, 15, 14, 14, 14, 13, 13, 12, 32, 32, 32, 32, 31, 30, 29, 29, 28, 28,
+        27, 27, 26, 25, 24, 23, 22, 21, 21, 19, 19, 18, 17, 16, 16, 15, 14, 14,
+        14, 13, 13, 13, 31, 31, 31, 31, 30, 29, 28, 28, 27, 26, 24, 24, 23, 23,
+        22, 21, 20, 20, 19, 18, 18, 17, 16, 15, 15, 14, 14, 14, 13, 13, 13, 13,
+        30, 30, 30, 31, 30, 29, 28, 28, 26, 26, 24, 24, 23, 22, 22, 21, 20, 19,
+        19, 18, 18, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 28, 29, 29, 30,
+        29, 28, 28, 27, 24, 24, 21, 21, 20, 20, 19, 19, 18, 17, 17, 16, 16, 15,
+        15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 28, 29, 29, 30, 29, 28, 27, 27,
+        24, 24, 21, 21, 20, 20, 19, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 13,
+        13, 13, 12, 12, 12, 11, 26, 27, 27, 28, 27, 26, 26, 26, 23, 23, 20, 20,
+        19, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11,
+        11, 11, 25, 26, 26, 27, 26, 26, 26, 25, 23, 22, 20, 20, 19, 18, 17, 17,
+        16, 16, 15, 15, 15, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 23, 24,
+        24, 25, 25, 24, 24, 24, 22, 22, 19, 19, 18, 17, 16, 16, 15, 15, 14, 14,
+        14, 13, 13, 12, 12, 11, 11, 11, 11, 11, 11, 11, 22, 23, 23, 24, 24, 23,
+        23, 23, 21, 21, 19, 19, 17, 17, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12,
+        12, 11, 11, 11, 10, 10, 10, 10, 21, 22, 22, 23, 23, 23, 23, 22, 20, 20,
+        18, 18, 17, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 11, 11, 11, 10, 10,
+        10, 10, 10, 10, 20, 20, 21, 21, 21, 21, 21, 21, 20, 19, 17, 17, 16, 16,
+        15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 9,
+        19, 20, 20, 21, 21, 20, 21, 21, 19, 19, 17, 17, 16, 15, 14, 14, 13, 13,
+        12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 18, 18, 19, 19, 19,
+        19, 19, 19, 18, 18, 16, 16, 15, 15, 14, 13, 13, 12, 12, 11, 11, 11, 10,
+        10, 10, 9, 9, 9, 9, 9, 9, 9, 17, 18, 18, 19, 19, 19, 19, 19, 18, 18, 16,
+        16, 15, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9,
+        9, 9, 16, 17, 17, 17, 17, 17, 18, 18, 17, 16, 15, 15, 14, 14, 13, 12,
+        12, 11, 11, 11, 10, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 16, 16, 16, 17,
+        17, 17, 17, 17, 16, 16, 15, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10,
+        10, 9, 9, 9, 8, 8, 8, 8, 8, 8, 14, 15, 15, 16, 16, 16, 16, 16, 15, 15,
+        14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8,
+        8, 8, 14, 14, 15, 15, 15, 15, 16, 16, 15, 15, 14, 14, 13, 12, 12, 12,
+        11, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 8, 8, 8, 7, 8, 13, 13, 14, 14, 14,
+        14, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8,
+        8, 8, 8, 7, 7, 7, 7, 7, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 13, 13,
+        12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 12,
+        13, 13, 14, 14, 14, 14, 14, 14, 13, 13, 13, 12, 12, 11, 11, 10, 10, 10,
+        9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 12, 12, 13, 13, 13, 13, 14, 14,
+        13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7,
+        7, 7, 7, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 12, 12, 11, 11, 11, 10,
+        10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 6, 11, 12, 12, 12, 12,
+        13, 13, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 7,
+        7, 7, 7, 7, 7, 6, 6, 11, 12, 12, 12, 12, 12, 12, 13, 13, 12, 12, 11, 11,
+        11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6,
+        /* Size 4x8 */
+        32, 29, 20, 13, 32, 28, 20, 14, 30, 24, 19, 14, 27, 20, 15, 12, 21, 17,
+        13, 10, 17, 15, 11, 9, 14, 13, 10, 8, 13, 12, 9, 7,
+        /* Size 8x4 */
+        32, 32, 30, 27, 21, 17, 14, 13, 29, 28, 24, 20, 17, 15, 13, 12, 20, 20,
+        19, 15, 13, 11, 10, 9, 13, 14, 14, 12, 10, 9, 8, 7,
+        /* Size 8x16 */
+        32, 33, 31, 26, 20, 16, 13, 12, 33, 32, 31, 26, 21, 17, 14, 12, 33, 32,
+        30, 27, 22, 17, 14, 13, 32, 31, 28, 26, 21, 18, 15, 13, 31, 30, 27, 23,
+        20, 17, 14, 13, 28, 29, 24, 20, 18, 15, 13, 12, 26, 27, 23, 19, 16, 14,
+        12, 12, 23, 25, 22, 17, 15, 13, 11, 11, 21, 23, 20, 17, 14, 12, 11, 10,
+        19, 21, 19, 16, 13, 11, 10, 9, 18, 19, 18, 15, 12, 10, 9, 9, 16, 17, 16,
+        14, 11, 10, 9, 8, 14, 15, 15, 13, 11, 9, 8, 8, 13, 14, 14, 12, 10, 9, 8,
+        7, 12, 13, 13, 11, 10, 8, 7, 7, 11, 12, 13, 11, 10, 9, 7, 7,
+        /* Size 16x8 */
+        32, 33, 33, 32, 31, 28, 26, 23, 21, 19, 18, 16, 14, 13, 12, 11, 33, 32,
+        32, 31, 30, 29, 27, 25, 23, 21, 19, 17, 15, 14, 13, 12, 31, 31, 30, 28,
+        27, 24, 23, 22, 20, 19, 18, 16, 15, 14, 13, 13, 26, 26, 27, 26, 23, 20,
+        19, 17, 17, 16, 15, 14, 13, 12, 11, 11, 20, 21, 22, 21, 20, 18, 16, 15,
+        14, 13, 12, 11, 11, 10, 10, 10, 16, 17, 17, 18, 17, 15, 14, 13, 12, 11,
+        10, 10, 9, 9, 8, 9, 13, 14, 14, 15, 14, 13, 12, 11, 11, 10, 9, 9, 8, 8,
+        7, 7, 12, 12, 13, 13, 13, 12, 12, 11, 10, 9, 9, 8, 8, 7, 7, 7,
+        /* Size 16x32 */
+        32, 33, 33, 32, 31, 28, 26, 23, 20, 19, 16, 16, 13, 13, 12, 11, 33, 32,
+        32, 32, 31, 29, 26, 24, 21, 20, 17, 16, 14, 13, 12, 12, 33, 32, 32, 32,
+        31, 29, 26, 24, 21, 20, 17, 17, 14, 13, 12, 12, 33, 32, 32, 31, 31, 30,
+        27, 25, 22, 21, 17, 17, 14, 14, 13, 13, 33, 32, 32, 31, 30, 29, 27, 25,
+        22, 21, 17, 17, 14, 14, 13, 13, 32, 32, 31, 30, 29, 28, 26, 24, 21, 20,
+        17, 17, 14, 14, 13, 13, 32, 32, 31, 29, 28, 28, 26, 24, 21, 21, 18, 17,
+        15, 14, 13, 13, 32, 31, 31, 29, 28, 27, 25, 24, 21, 21, 18, 17, 15, 15,
+        14, 13, 31, 31, 30, 28, 27, 25, 23, 22, 20, 19, 17, 16, 14, 14, 13, 13,
+        30, 30, 30, 28, 26, 24, 23, 21, 19, 19, 16, 16, 14, 14, 13, 12, 28, 30,
+        29, 27, 24, 21, 20, 19, 18, 17, 15, 15, 13, 13, 12, 12, 28, 29, 29, 27,
+        24, 21, 20, 19, 17, 17, 15, 15, 13, 13, 12, 12, 26, 28, 27, 26, 23, 20,
+        19, 18, 16, 16, 14, 14, 12, 12, 12, 12, 26, 27, 26, 25, 23, 20, 18, 17,
+        16, 15, 14, 13, 12, 12, 11, 11, 23, 25, 25, 24, 22, 19, 17, 16, 15, 14,
+        13, 13, 11, 11, 11, 11, 22, 24, 24, 23, 21, 19, 17, 16, 14, 14, 12, 12,
+        11, 11, 11, 10, 21, 23, 23, 22, 20, 18, 17, 15, 14, 13, 12, 12, 11, 10,
+        10, 10, 20, 21, 21, 21, 20, 17, 16, 15, 13, 13, 11, 11, 10, 10, 10, 10,
+        19, 21, 21, 20, 19, 17, 16, 14, 13, 12, 11, 11, 10, 10, 9, 10, 18, 19,
+        19, 19, 18, 16, 15, 14, 12, 12, 11, 10, 9, 9, 9, 9, 18, 19, 19, 19, 18,
+        16, 15, 14, 12, 12, 10, 10, 9, 9, 9, 9, 16, 17, 17, 18, 17, 15, 14, 13,
+        12, 11, 10, 10, 9, 9, 8, 8, 16, 17, 17, 17, 16, 15, 14, 13, 11, 11, 10,
+        10, 9, 8, 8, 8, 14, 16, 16, 16, 15, 14, 13, 12, 11, 11, 9, 9, 8, 8, 8,
+        8, 14, 15, 15, 16, 15, 14, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 13, 14, 14,
+        15, 14, 13, 12, 11, 10, 10, 9, 9, 8, 8, 7, 7, 13, 14, 14, 14, 14, 13,
+        12, 11, 10, 10, 9, 8, 8, 7, 7, 7, 12, 14, 14, 14, 14, 13, 12, 11, 10,
+        10, 8, 8, 8, 7, 7, 7, 12, 13, 13, 14, 13, 12, 11, 11, 10, 9, 8, 8, 7, 7,
+        7, 7, 12, 13, 13, 13, 13, 12, 11, 10, 10, 9, 8, 8, 7, 7, 7, 7, 11, 12,
+        12, 13, 13, 12, 11, 10, 10, 9, 9, 8, 7, 7, 7, 7, 11, 12, 12, 13, 13, 11,
+        11, 10, 10, 9, 9, 8, 8, 7, 7, 6,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 32, 32, 32, 31, 30, 28, 28, 26, 26, 23, 22, 21, 20,
+        19, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12, 11, 11, 33, 32, 32, 32,
+        32, 32, 32, 31, 31, 30, 30, 29, 28, 27, 25, 24, 23, 21, 21, 19, 19, 17,
+        17, 16, 15, 14, 14, 14, 13, 13, 12, 12, 33, 32, 32, 32, 32, 31, 31, 31,
+        30, 30, 29, 29, 27, 26, 25, 24, 23, 21, 21, 19, 19, 17, 17, 16, 15, 14,
+        14, 14, 13, 13, 12, 12, 32, 32, 32, 31, 31, 30, 29, 29, 28, 28, 27, 27,
+        26, 25, 24, 23, 22, 21, 20, 19, 19, 18, 17, 16, 16, 15, 14, 14, 14, 13,
+        13, 13, 31, 31, 31, 31, 30, 29, 28, 28, 27, 26, 24, 24, 23, 23, 22, 21,
+        20, 20, 19, 18, 18, 17, 16, 15, 15, 14, 14, 14, 13, 13, 13, 13, 28, 29,
+        29, 30, 29, 28, 28, 27, 25, 24, 21, 21, 20, 20, 19, 19, 18, 17, 17, 16,
+        16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 26, 26, 26, 27, 27, 26,
+        26, 25, 23, 23, 20, 20, 19, 18, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13,
+        13, 12, 12, 12, 11, 11, 11, 11, 23, 24, 24, 25, 25, 24, 24, 24, 22, 21,
+        19, 19, 18, 17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 11, 11, 11,
+        11, 10, 10, 10, 20, 21, 21, 22, 22, 21, 21, 21, 20, 19, 18, 17, 16, 16,
+        15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10,
+        19, 20, 20, 21, 21, 20, 21, 21, 19, 19, 17, 17, 16, 15, 14, 14, 13, 13,
+        12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 16, 17, 17, 17, 17,
+        17, 18, 18, 17, 16, 15, 15, 14, 14, 13, 12, 12, 11, 11, 11, 10, 10, 10,
+        9, 9, 9, 9, 8, 8, 8, 9, 9, 16, 16, 17, 17, 17, 17, 17, 17, 16, 16, 15,
+        15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 8, 8, 8, 8, 8,
+        8, 13, 14, 14, 14, 14, 14, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 11,
+        10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 7, 7, 7, 8, 13, 13, 13, 14, 14, 14,
+        14, 15, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 8, 8, 8, 8,
+        7, 7, 7, 7, 7, 7, 12, 12, 12, 13, 13, 13, 13, 14, 13, 13, 12, 12, 12,
+        11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 11, 12,
+        12, 13, 13, 13, 13, 13, 13, 12, 12, 12, 12, 11, 11, 10, 10, 10, 10, 9,
+        9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 6,
+        /* Size 4x16 */
+        33, 28, 19, 13, 32, 29, 20, 13, 32, 29, 21, 14, 32, 28, 21, 14, 31, 25,
+        19, 14, 30, 21, 17, 13, 28, 20, 16, 12, 25, 19, 14, 11, 23, 18, 13, 10,
+        21, 17, 12, 10, 19, 16, 12, 9, 17, 15, 11, 8, 15, 14, 10, 8, 14, 13, 10,
+        7, 13, 12, 9, 7, 12, 12, 9, 7,
+        /* Size 16x4 */
+        33, 32, 32, 32, 31, 30, 28, 25, 23, 21, 19, 17, 15, 14, 13, 12, 28, 29,
+        29, 28, 25, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 12, 19, 20, 21, 21,
+        19, 17, 16, 14, 13, 12, 12, 11, 10, 10, 9, 9, 13, 13, 14, 14, 14, 13,
+        12, 11, 10, 10, 9, 8, 8, 7, 7, 7,
+        /* Size 8x32 */
+        32, 33, 31, 26, 20, 16, 13, 12, 33, 32, 31, 26, 21, 17, 14, 12, 33, 32,
+        31, 26, 21, 17, 14, 12, 33, 32, 31, 27, 22, 17, 14, 13, 33, 32, 30, 27,
+        22, 17, 14, 13, 32, 31, 29, 26, 21, 17, 14, 13, 32, 31, 28, 26, 21, 18,
+        15, 13, 32, 31, 28, 25, 21, 18, 15, 14, 31, 30, 27, 23, 20, 17, 14, 13,
+        30, 30, 26, 23, 19, 16, 14, 13, 28, 29, 24, 20, 18, 15, 13, 12, 28, 29,
+        24, 20, 17, 15, 13, 12, 26, 27, 23, 19, 16, 14, 12, 12, 26, 26, 23, 18,
+        16, 14, 12, 11, 23, 25, 22, 17, 15, 13, 11, 11, 22, 24, 21, 17, 14, 12,
+        11, 11, 21, 23, 20, 17, 14, 12, 11, 10, 20, 21, 20, 16, 13, 11, 10, 10,
+        19, 21, 19, 16, 13, 11, 10, 9, 18, 19, 18, 15, 12, 11, 9, 9, 18, 19, 18,
+        15, 12, 10, 9, 9, 16, 17, 17, 14, 12, 10, 9, 8, 16, 17, 16, 14, 11, 10,
+        9, 8, 14, 16, 15, 13, 11, 9, 8, 8, 14, 15, 15, 13, 11, 9, 8, 8, 13, 14,
+        14, 12, 10, 9, 8, 7, 13, 14, 14, 12, 10, 9, 8, 7, 12, 14, 14, 12, 10, 8,
+        8, 7, 12, 13, 13, 11, 10, 8, 7, 7, 12, 13, 13, 11, 10, 8, 7, 7, 11, 12,
+        13, 11, 10, 9, 7, 7, 11, 12, 13, 11, 10, 9, 8, 7,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 32, 32, 32, 31, 30, 28, 28, 26, 26, 23, 22, 21, 20,
+        19, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12, 11, 11, 33, 32, 32, 32,
+        32, 31, 31, 31, 30, 30, 29, 29, 27, 26, 25, 24, 23, 21, 21, 19, 19, 17,
+        17, 16, 15, 14, 14, 14, 13, 13, 12, 12, 31, 31, 31, 31, 30, 29, 28, 28,
+        27, 26, 24, 24, 23, 23, 22, 21, 20, 20, 19, 18, 18, 17, 16, 15, 15, 14,
+        14, 14, 13, 13, 13, 13, 26, 26, 26, 27, 27, 26, 26, 25, 23, 23, 20, 20,
+        19, 18, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11,
+        11, 11, 20, 21, 21, 22, 22, 21, 21, 21, 20, 19, 18, 17, 16, 16, 15, 14,
+        14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 16, 17,
+        17, 17, 17, 17, 18, 18, 17, 16, 15, 15, 14, 14, 13, 12, 12, 11, 11, 11,
+        10, 10, 10, 9, 9, 9, 9, 8, 8, 8, 9, 9, 13, 14, 14, 14, 14, 14, 15, 15,
+        14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8,
+        7, 7, 7, 8, 12, 12, 12, 13, 13, 13, 13, 14, 13, 13, 12, 12, 12, 11, 11,
+        11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7 },
+      { /* Chroma */
+        /* Size 4x4 */
+        32, 22, 21, 18, 22, 19, 19, 17, 21, 19, 15, 13, 18, 17, 13, 11,
+        /* Size 8x8 */
+        33, 30, 24, 22, 21, 19, 17, 16, 30, 26, 23, 22, 22, 20, 18, 17, 24, 23,
+        21, 21, 20, 19, 18, 17, 22, 22, 21, 19, 18, 17, 16, 16, 21, 22, 20, 18,
+        16, 15, 14, 14, 19, 20, 19, 17, 15, 13, 12, 12, 17, 18, 18, 16, 14, 12,
+        12, 11, 16, 17, 17, 16, 14, 12, 11, 11,
+        /* Size 16x16 */
+        32, 33, 33, 29, 26, 21, 21, 21, 20, 20, 19, 18, 17, 16, 16, 15, 33, 33,
+        32, 28, 25, 22, 22, 22, 21, 21, 20, 19, 18, 17, 17, 16, 33, 32, 30, 26,
+        24, 22, 22, 23, 22, 22, 21, 20, 19, 18, 17, 17, 29, 28, 26, 23, 22, 22,
+        22, 23, 22, 22, 21, 20, 19, 18, 18, 17, 26, 25, 24, 22, 21, 20, 21, 21,
+        21, 21, 20, 19, 19, 18, 17, 17, 21, 22, 22, 22, 20, 19, 19, 19, 19, 19,
+        19, 18, 17, 17, 17, 17, 21, 22, 22, 22, 21, 19, 19, 19, 18, 18, 18, 17,
+        17, 16, 16, 16, 21, 22, 23, 23, 21, 19, 19, 18, 17, 17, 17, 16, 16, 15,
+        15, 15, 20, 21, 22, 22, 21, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14,
+        20, 21, 22, 22, 21, 19, 18, 17, 16, 16, 15, 14, 14, 14, 13, 13, 19, 20,
+        21, 21, 20, 19, 18, 17, 16, 15, 14, 14, 13, 13, 13, 13, 18, 19, 20, 20,
+        19, 18, 17, 16, 15, 14, 14, 13, 13, 12, 12, 12, 17, 18, 19, 19, 19, 17,
+        17, 16, 15, 14, 13, 13, 12, 12, 12, 12, 16, 17, 18, 18, 18, 17, 16, 15,
+        14, 14, 13, 12, 12, 11, 11, 11, 16, 17, 17, 18, 17, 17, 16, 15, 14, 13,
+        13, 12, 12, 11, 11, 11, 15, 16, 17, 17, 17, 17, 16, 15, 14, 13, 13, 12,
+        12, 11, 11, 10,
+        /* Size 32x32 */
+        32, 33, 33, 34, 33, 31, 29, 28, 26, 25, 21, 21, 21, 21, 21, 20, 20, 20,
+        20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 33, 33, 33, 33,
+        32, 30, 28, 27, 25, 24, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19,
+        19, 18, 18, 17, 17, 17, 16, 16, 16, 16, 33, 33, 33, 33, 32, 29, 28, 26,
+        25, 24, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 19, 19, 18, 18, 17,
+        17, 17, 17, 16, 16, 16, 34, 33, 33, 32, 31, 29, 27, 26, 24, 24, 22, 22,
+        23, 23, 23, 23, 22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 18, 18, 17, 17,
+        17, 17, 33, 32, 32, 31, 30, 28, 26, 25, 24, 24, 22, 22, 22, 23, 23, 22,
+        22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 31, 30,
+        29, 29, 28, 26, 25, 24, 23, 23, 22, 22, 22, 22, 23, 22, 22, 22, 22, 21,
+        21, 20, 20, 19, 19, 18, 18, 18, 18, 17, 17, 17, 29, 28, 28, 27, 26, 25,
+        23, 22, 22, 22, 22, 22, 22, 22, 23, 22, 22, 22, 22, 21, 21, 20, 20, 20,
+        19, 19, 18, 18, 18, 18, 17, 17, 28, 27, 26, 26, 25, 24, 22, 22, 22, 22,
+        21, 22, 22, 22, 23, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 19, 19,
+        18, 18, 18, 18, 26, 25, 25, 24, 24, 23, 22, 22, 21, 21, 20, 21, 21, 21,
+        21, 21, 21, 21, 21, 20, 20, 20, 19, 19, 19, 18, 18, 18, 17, 17, 17, 17,
+        25, 24, 24, 24, 24, 23, 22, 22, 21, 21, 20, 20, 21, 21, 21, 21, 20, 20,
+        20, 20, 20, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 21, 22, 22, 22,
+        22, 22, 22, 21, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18,
+        18, 18, 17, 17, 17, 17, 17, 17, 17, 17, 21, 22, 22, 22, 22, 22, 22, 22,
+        21, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17,
+        17, 17, 16, 16, 16, 16, 21, 22, 22, 23, 22, 22, 22, 22, 21, 21, 19, 19,
+        19, 19, 19, 18, 18, 18, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 16,
+        16, 16, 21, 22, 22, 23, 23, 22, 22, 22, 21, 21, 19, 19, 19, 19, 18, 18,
+        18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 15, 15, 21, 22,
+        22, 23, 23, 23, 23, 23, 21, 21, 19, 19, 19, 18, 18, 17, 17, 17, 17, 17,
+        17, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 15, 20, 22, 22, 23, 22, 22,
+        22, 22, 21, 21, 19, 19, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15,
+        15, 15, 15, 15, 15, 15, 14, 14, 20, 21, 21, 22, 22, 22, 22, 22, 21, 20,
+        19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14,
+        14, 14, 14, 14, 20, 21, 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 18, 18,
+        17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14,
+        20, 20, 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 18, 18, 17, 17, 16, 16,
+        16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 13, 13, 13, 14, 19, 20, 20, 21,
+        21, 21, 21, 21, 20, 20, 19, 19, 18, 17, 17, 16, 16, 15, 15, 15, 15, 14,
+        14, 14, 13, 13, 13, 13, 13, 13, 13, 13, 19, 20, 20, 21, 21, 21, 21, 21,
+        20, 20, 19, 18, 18, 17, 17, 16, 16, 15, 15, 15, 14, 14, 14, 14, 13, 13,
+        13, 13, 13, 13, 13, 13, 18, 19, 19, 20, 20, 20, 20, 20, 20, 19, 18, 18,
+        17, 17, 16, 16, 15, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 12,
+        12, 12, 18, 19, 19, 20, 20, 20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16,
+        15, 15, 14, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 12, 17, 18,
+        18, 19, 19, 19, 20, 20, 19, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14,
+        14, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12, 12, 17, 18, 18, 19, 19, 19,
+        19, 19, 19, 18, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12,
+        12, 12, 12, 12, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18, 19, 19, 18, 18,
+        17, 17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12, 12, 11,
+        11, 11, 11, 11, 16, 17, 17, 18, 18, 18, 18, 19, 18, 18, 17, 17, 16, 16,
+        15, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11,
+        16, 17, 17, 18, 18, 18, 18, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14,
+        14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 16, 16, 17, 17,
+        17, 18, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12,
+        12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17, 18, 18,
+        17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11,
+        11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17, 17, 18, 17, 17, 17, 16,
+        16, 15, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11,
+        10, 11, 15, 16, 16, 17, 17, 17, 17, 18, 17, 17, 17, 16, 16, 15, 15, 14,
+        14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 10,
+        /* Size 4x8 */
+        33, 22, 20, 17, 28, 22, 22, 18, 24, 20, 20, 18, 23, 19, 18, 16, 22, 19,
+        16, 14, 20, 18, 15, 12, 18, 17, 14, 11, 17, 16, 13, 11,
+        /* Size 8x4 */
+        33, 28, 24, 23, 22, 20, 18, 17, 22, 22, 20, 19, 19, 18, 17, 16, 20, 22,
+        20, 18, 16, 15, 14, 13, 17, 18, 18, 16, 14, 12, 11, 11,
+        /* Size 8x16 */
+        32, 32, 26, 21, 20, 18, 16, 15, 33, 31, 25, 22, 21, 19, 17, 16, 33, 29,
+        24, 22, 22, 20, 18, 17, 29, 26, 22, 22, 22, 20, 19, 18, 25, 24, 21, 21,
+        21, 20, 18, 17, 21, 22, 20, 19, 19, 18, 17, 17, 21, 22, 21, 19, 18, 17,
+        16, 16, 21, 23, 21, 18, 17, 16, 15, 15, 20, 22, 21, 18, 16, 15, 14, 14,
+        20, 21, 20, 18, 16, 14, 14, 13, 19, 20, 20, 17, 15, 14, 13, 13, 18, 20,
+        19, 17, 15, 13, 12, 12, 17, 19, 18, 16, 14, 13, 12, 12, 16, 18, 18, 16,
+        14, 12, 12, 11, 16, 17, 17, 16, 14, 12, 11, 11, 15, 17, 17, 16, 14, 13,
+        12, 11,
+        /* Size 16x8 */
+        32, 33, 33, 29, 25, 21, 21, 21, 20, 20, 19, 18, 17, 16, 16, 15, 32, 31,
+        29, 26, 24, 22, 22, 23, 22, 21, 20, 20, 19, 18, 17, 17, 26, 25, 24, 22,
+        21, 20, 21, 21, 21, 20, 20, 19, 18, 18, 17, 17, 21, 22, 22, 22, 21, 19,
+        19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 20, 21, 22, 22, 21, 19, 18, 17,
+        16, 16, 15, 15, 14, 14, 14, 14, 18, 19, 20, 20, 20, 18, 17, 16, 15, 14,
+        14, 13, 13, 12, 12, 13, 16, 17, 18, 19, 18, 17, 16, 15, 14, 14, 13, 12,
+        12, 12, 11, 12, 15, 16, 17, 18, 17, 17, 16, 15, 14, 13, 13, 12, 12, 11,
+        11, 11,
+        /* Size 16x32 */
+        32, 33, 32, 28, 26, 21, 21, 21, 20, 20, 18, 18, 16, 16, 15, 15, 33, 33,
+        31, 27, 25, 22, 22, 22, 21, 20, 19, 19, 17, 17, 16, 16, 33, 33, 31, 27,
+        25, 22, 22, 22, 21, 21, 19, 19, 17, 17, 16, 16, 34, 32, 31, 26, 24, 22,
+        23, 23, 22, 21, 20, 20, 18, 18, 17, 17, 33, 31, 29, 25, 24, 22, 22, 23,
+        22, 21, 20, 20, 18, 18, 17, 17, 31, 28, 28, 24, 23, 22, 22, 22, 22, 22,
+        20, 20, 18, 18, 17, 17, 29, 27, 26, 23, 22, 22, 22, 23, 22, 22, 20, 20,
+        19, 18, 18, 17, 28, 26, 25, 22, 22, 22, 22, 23, 22, 22, 20, 20, 19, 19,
+        18, 18, 25, 24, 24, 22, 21, 21, 21, 21, 21, 20, 20, 19, 18, 18, 17, 18,
+        24, 24, 24, 22, 21, 20, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 21, 22,
+        22, 21, 20, 19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 17, 21, 22, 22, 21,
+        20, 19, 19, 19, 19, 19, 18, 18, 17, 17, 16, 16, 21, 22, 22, 22, 21, 19,
+        19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 21, 23, 22, 22, 21, 19, 19, 18,
+        18, 18, 17, 17, 16, 16, 16, 15, 21, 23, 23, 22, 21, 19, 18, 18, 17, 17,
+        16, 16, 15, 15, 15, 15, 21, 22, 22, 22, 21, 19, 18, 17, 17, 17, 16, 16,
+        15, 15, 15, 15, 20, 22, 22, 22, 21, 19, 18, 17, 16, 16, 15, 15, 14, 14,
+        14, 14, 20, 22, 22, 22, 21, 19, 18, 17, 16, 16, 15, 15, 14, 14, 14, 14,
+        20, 21, 21, 22, 20, 19, 18, 17, 16, 16, 14, 14, 14, 14, 13, 14, 19, 20,
+        21, 21, 20, 19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 13, 19, 20, 20, 21,
+        20, 19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 13, 18, 20, 20, 20, 20, 18,
+        17, 16, 15, 15, 13, 13, 12, 12, 12, 12, 18, 20, 20, 20, 19, 18, 17, 16,
+        15, 14, 13, 13, 12, 12, 12, 12, 17, 19, 19, 20, 19, 18, 17, 16, 14, 14,
+        13, 13, 12, 12, 12, 12, 17, 18, 19, 19, 18, 17, 16, 16, 14, 14, 13, 13,
+        12, 12, 12, 12, 16, 18, 18, 19, 18, 17, 16, 15, 14, 14, 12, 12, 12, 11,
+        11, 11, 16, 18, 18, 19, 18, 17, 16, 15, 14, 14, 12, 12, 12, 11, 11, 11,
+        16, 17, 18, 18, 18, 17, 16, 15, 14, 14, 12, 12, 11, 11, 11, 11, 16, 17,
+        17, 18, 17, 17, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 15, 17, 17, 18,
+        17, 16, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 15, 17, 17, 18, 17, 16,
+        16, 14, 14, 13, 13, 12, 12, 11, 11, 11, 15, 17, 17, 17, 17, 16, 16, 14,
+        14, 13, 13, 12, 12, 11, 11, 10,
+        /* Size 32x16 */
+        32, 33, 33, 34, 33, 31, 29, 28, 25, 24, 21, 21, 21, 21, 21, 21, 20, 20,
+        20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 33, 33, 33, 32,
+        31, 28, 27, 26, 24, 24, 22, 22, 22, 23, 23, 22, 22, 22, 21, 20, 20, 20,
+        20, 19, 18, 18, 18, 17, 17, 17, 17, 17, 32, 31, 31, 31, 29, 28, 26, 25,
+        24, 24, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 18,
+        18, 18, 17, 17, 17, 17, 28, 27, 27, 26, 25, 24, 23, 22, 22, 22, 21, 21,
+        22, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 19, 18, 18, 18,
+        18, 17, 26, 25, 25, 24, 24, 23, 22, 22, 21, 21, 20, 20, 21, 21, 21, 21,
+        21, 21, 20, 20, 20, 20, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 21, 22,
+        22, 22, 22, 22, 22, 22, 21, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        19, 18, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 21, 22, 22, 23, 22, 22,
+        22, 22, 21, 21, 19, 19, 19, 19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17,
+        16, 16, 16, 16, 16, 16, 16, 16, 21, 22, 22, 23, 23, 22, 23, 23, 21, 21,
+        19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15,
+        15, 15, 14, 14, 20, 21, 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 18, 18,
+        17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14,
+        20, 20, 21, 21, 21, 22, 22, 22, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16,
+        16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 18, 19, 19, 20,
+        20, 20, 20, 20, 20, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 13,
+        13, 13, 13, 12, 12, 12, 12, 12, 13, 13, 18, 19, 19, 20, 20, 20, 20, 20,
+        19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 13, 13, 12,
+        12, 12, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18, 19, 19, 18, 18, 17, 17,
+        16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12, 12, 11, 11, 11,
+        12, 12, 16, 17, 17, 18, 18, 18, 18, 19, 18, 18, 17, 17, 16, 16, 15, 15,
+        14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 15, 16,
+        16, 17, 17, 17, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 13, 13,
+        13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17,
+        17, 18, 18, 17, 17, 16, 16, 15, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12,
+        12, 11, 11, 11, 11, 11, 11, 10,
+        /* Size 4x16 */
+        33, 21, 20, 16, 33, 22, 21, 17, 31, 22, 21, 18, 27, 22, 22, 18, 24, 21,
+        20, 18, 22, 19, 19, 17, 22, 19, 18, 16, 23, 19, 17, 15, 22, 19, 16, 14,
+        21, 19, 16, 14, 20, 19, 15, 13, 20, 18, 14, 12, 18, 17, 14, 12, 18, 17,
+        14, 11, 17, 17, 13, 11, 17, 16, 13, 11,
+        /* Size 16x4 */
+        33, 33, 31, 27, 24, 22, 22, 23, 22, 21, 20, 20, 18, 18, 17, 17, 21, 22,
+        22, 22, 21, 19, 19, 19, 19, 19, 19, 18, 17, 17, 17, 16, 20, 21, 21, 22,
+        20, 19, 18, 17, 16, 16, 15, 14, 14, 14, 13, 13, 16, 17, 18, 18, 18, 17,
+        16, 15, 14, 14, 13, 12, 12, 11, 11, 11,
+        /* Size 8x32 */
+        32, 32, 26, 21, 20, 18, 16, 15, 33, 31, 25, 22, 21, 19, 17, 16, 33, 31,
+        25, 22, 21, 19, 17, 16, 34, 31, 24, 23, 22, 20, 18, 17, 33, 29, 24, 22,
+        22, 20, 18, 17, 31, 28, 23, 22, 22, 20, 18, 17, 29, 26, 22, 22, 22, 20,
+        19, 18, 28, 25, 22, 22, 22, 20, 19, 18, 25, 24, 21, 21, 21, 20, 18, 17,
+        24, 24, 21, 21, 20, 19, 18, 17, 21, 22, 20, 19, 19, 18, 17, 17, 21, 22,
+        20, 19, 19, 18, 17, 16, 21, 22, 21, 19, 18, 17, 16, 16, 21, 22, 21, 19,
+        18, 17, 16, 16, 21, 23, 21, 18, 17, 16, 15, 15, 21, 22, 21, 18, 17, 16,
+        15, 15, 20, 22, 21, 18, 16, 15, 14, 14, 20, 22, 21, 18, 16, 15, 14, 14,
+        20, 21, 20, 18, 16, 14, 14, 13, 19, 21, 20, 17, 15, 14, 13, 13, 19, 20,
+        20, 17, 15, 14, 13, 13, 18, 20, 20, 17, 15, 13, 12, 12, 18, 20, 19, 17,
+        15, 13, 12, 12, 17, 19, 19, 17, 14, 13, 12, 12, 17, 19, 18, 16, 14, 13,
+        12, 12, 16, 18, 18, 16, 14, 12, 12, 11, 16, 18, 18, 16, 14, 12, 12, 11,
+        16, 18, 18, 16, 14, 12, 11, 11, 16, 17, 17, 16, 14, 12, 11, 11, 15, 17,
+        17, 16, 14, 12, 11, 11, 15, 17, 17, 16, 14, 13, 12, 11, 15, 17, 17, 16,
+        14, 13, 12, 11,
+        /* Size 32x8 */
+        32, 33, 33, 34, 33, 31, 29, 28, 25, 24, 21, 21, 21, 21, 21, 21, 20, 20,
+        20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 32, 31, 31, 31,
+        29, 28, 26, 25, 24, 24, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20,
+        20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 26, 25, 25, 24, 24, 23, 22, 22,
+        21, 21, 20, 20, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 19, 19, 18, 18,
+        18, 18, 17, 17, 17, 17, 21, 22, 22, 23, 22, 22, 22, 22, 21, 21, 19, 19,
+        19, 19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16,
+        16, 16, 20, 21, 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 18, 18, 17, 17,
+        16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 18, 19,
+        19, 20, 20, 20, 20, 20, 20, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14,
+        14, 13, 13, 13, 13, 12, 12, 12, 12, 12, 13, 13, 16, 17, 17, 18, 18, 18,
+        19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12,
+        12, 12, 12, 11, 11, 11, 12, 12, 15, 16, 16, 17, 17, 17, 18, 18, 17, 17,
+        17, 16, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11,
+        11, 11, 11, 11 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 30, 21, 14, 30, 21, 17, 13, 21, 17, 12, 10, 14, 13, 10, 8,
+        /* Size 8x8 */
+        32, 32, 30, 27, 22, 18, 15, 13, 32, 31, 29, 26, 23, 19, 16, 14, 30, 29,
+        26, 23, 20, 18, 15, 13, 27, 26, 23, 19, 17, 15, 13, 12, 22, 23, 20, 17,
+        14, 13, 11, 10, 18, 19, 18, 15, 13, 11, 10, 9, 15, 16, 15, 13, 11, 10,
+        9, 8, 13, 14, 13, 12, 10, 9, 8, 7,
+        /* Size 16x16 */
+        32, 33, 33, 33, 32, 30, 28, 26, 23, 21, 19, 17, 16, 14, 13, 12, 33, 32,
+        32, 32, 32, 30, 29, 27, 24, 22, 20, 18, 17, 15, 13, 13, 33, 32, 32, 32,
+        32, 31, 30, 28, 25, 23, 21, 19, 17, 16, 14, 14, 33, 32, 32, 31, 30, 29,
+        28, 26, 24, 23, 20, 19, 17, 16, 14, 14, 32, 32, 32, 30, 29, 28, 27, 26,
+        24, 22, 21, 19, 18, 16, 15, 14, 30, 30, 31, 29, 28, 26, 24, 23, 22, 20,
+        19, 18, 16, 15, 14, 13, 28, 29, 30, 28, 27, 24, 21, 20, 19, 18, 17, 16,
+        15, 14, 13, 13, 26, 27, 28, 26, 26, 23, 20, 19, 18, 17, 16, 15, 14, 13,
+        12, 12, 23, 24, 25, 24, 24, 22, 19, 18, 16, 15, 14, 14, 13, 12, 11, 11,
+        21, 22, 23, 23, 22, 20, 18, 17, 15, 14, 13, 13, 12, 11, 11, 10, 19, 20,
+        21, 20, 21, 19, 17, 16, 14, 13, 12, 12, 11, 11, 10, 10, 17, 18, 19, 19,
+        19, 18, 16, 15, 14, 13, 12, 11, 10, 10, 9, 9, 16, 17, 17, 17, 18, 16,
+        15, 14, 13, 12, 11, 10, 10, 9, 9, 8, 14, 15, 16, 16, 16, 15, 14, 13, 12,
+        11, 11, 10, 9, 9, 8, 8, 13, 13, 14, 14, 15, 14, 13, 12, 11, 11, 10, 9,
+        9, 8, 8, 7, 12, 13, 14, 14, 14, 13, 13, 12, 11, 10, 10, 9, 8, 8, 7, 7,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 32, 32, 30, 30, 28, 28, 26, 26, 23, 23, 21,
+        21, 19, 19, 17, 17, 16, 16, 14, 14, 13, 13, 12, 12, 12, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 30, 30, 29, 29, 27, 27, 24, 24, 22, 22, 20, 20, 18,
+        18, 17, 17, 15, 15, 13, 13, 13, 13, 12, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 30, 30, 29, 29, 27, 27, 24, 24, 22, 22, 20, 20, 18, 18, 17, 17, 15,
+        15, 13, 13, 13, 13, 12, 33, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30,
+        30, 28, 28, 25, 25, 23, 23, 21, 21, 19, 19, 17, 17, 16, 16, 14, 14, 14,
+        14, 13, 33, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 28, 28, 25,
+        25, 23, 23, 21, 21, 19, 19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 33, 32,
+        32, 32, 32, 31, 31, 30, 30, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 20,
+        20, 19, 19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 33, 32, 32, 32, 32, 31,
+        31, 30, 30, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 20, 20, 19, 19, 17,
+        17, 16, 16, 14, 14, 14, 14, 13, 32, 32, 32, 32, 32, 30, 30, 29, 29, 28,
+        28, 27, 27, 26, 26, 24, 24, 22, 22, 21, 21, 19, 19, 18, 18, 16, 16, 15,
+        15, 14, 14, 14, 32, 32, 32, 32, 32, 30, 30, 29, 29, 28, 28, 27, 27, 26,
+        26, 24, 24, 22, 22, 21, 21, 19, 19, 18, 18, 16, 16, 15, 15, 14, 14, 14,
+        30, 30, 30, 31, 31, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 22, 22, 20,
+        20, 19, 19, 18, 18, 16, 16, 15, 15, 14, 14, 13, 13, 13, 30, 30, 30, 31,
+        31, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 22, 22, 20, 20, 19, 19, 18,
+        18, 16, 16, 15, 15, 14, 14, 13, 13, 13, 28, 29, 29, 30, 30, 28, 28, 27,
+        27, 24, 24, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14,
+        14, 13, 13, 13, 13, 12, 28, 29, 29, 30, 30, 28, 28, 27, 27, 24, 24, 21,
+        21, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13,
+        13, 12, 26, 27, 27, 28, 28, 26, 26, 26, 26, 23, 23, 20, 20, 19, 19, 18,
+        18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11, 26, 27,
+        27, 28, 28, 26, 26, 26, 26, 23, 23, 20, 20, 19, 19, 18, 18, 17, 17, 16,
+        16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11, 23, 24, 24, 25, 25, 24,
+        24, 24, 24, 22, 22, 19, 19, 18, 18, 16, 16, 15, 15, 14, 14, 14, 14, 13,
+        13, 12, 12, 11, 11, 11, 11, 11, 23, 24, 24, 25, 25, 24, 24, 24, 24, 22,
+        22, 19, 19, 18, 18, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 11,
+        11, 11, 11, 11, 21, 22, 22, 23, 23, 23, 23, 22, 22, 20, 20, 18, 18, 17,
+        17, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10,
+        21, 22, 22, 23, 23, 23, 23, 22, 22, 20, 20, 18, 18, 17, 17, 15, 15, 14,
+        14, 13, 13, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 10, 19, 20, 20, 21,
+        21, 20, 20, 21, 21, 19, 19, 17, 17, 16, 16, 14, 14, 13, 13, 12, 12, 12,
+        12, 11, 11, 11, 11, 10, 10, 10, 10, 9, 19, 20, 20, 21, 21, 20, 20, 21,
+        21, 19, 19, 17, 17, 16, 16, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11,
+        11, 10, 10, 10, 10, 9, 17, 18, 18, 19, 19, 19, 19, 19, 19, 18, 18, 16,
+        16, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9,
+        9, 17, 18, 18, 19, 19, 19, 19, 19, 19, 18, 18, 16, 16, 15, 15, 14, 14,
+        13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9, 16, 17, 17, 17,
+        17, 17, 17, 18, 18, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 10,
+        10, 10, 10, 9, 9, 9, 9, 8, 8, 8, 16, 17, 17, 17, 17, 17, 17, 18, 18, 16,
+        16, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9,
+        8, 8, 8, 14, 15, 15, 16, 16, 16, 16, 16, 16, 15, 15, 14, 14, 13, 13, 12,
+        12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 14, 15, 15, 16,
+        16, 16, 16, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 10,
+        10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 13, 13, 13, 14, 14, 14, 14, 15, 15, 14,
+        14, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 7,
+        7, 7, 13, 13, 13, 14, 14, 14, 14, 15, 15, 14, 14, 13, 13, 12, 12, 11,
+        11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 12, 13, 13, 14, 14,
+        14, 14, 14, 14, 13, 13, 13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 8,
+        8, 8, 8, 7, 7, 7, 7, 7, 12, 13, 13, 14, 14, 14, 14, 14, 14, 13, 13, 13,
+        13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 12,
+        12, 12, 13, 13, 13, 13, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10,
+        9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7,
+        /* Size 4x8 */
+        32, 29, 20, 14, 32, 28, 20, 14, 30, 24, 19, 14, 28, 20, 16, 12, 23, 18,
+        13, 11, 19, 16, 12, 9, 16, 14, 11, 8, 14, 13, 10, 8,
+        /* Size 8x4 */
+        32, 32, 30, 28, 23, 19, 16, 14, 29, 28, 24, 20, 18, 16, 14, 13, 20, 20,
+        19, 16, 13, 12, 11, 10, 14, 14, 14, 12, 11, 9, 8, 8,
+        /* Size 8x16 */
+        32, 33, 32, 28, 23, 19, 16, 13, 33, 32, 32, 29, 24, 20, 17, 14, 33, 32,
+        31, 30, 25, 21, 17, 14, 32, 32, 30, 28, 24, 20, 17, 14, 32, 31, 29, 27,
+        24, 21, 18, 15, 30, 30, 28, 24, 21, 19, 16, 14, 28, 30, 27, 21, 19, 17,
+        15, 13, 26, 28, 26, 20, 18, 16, 14, 12, 23, 25, 24, 19, 16, 14, 13, 11,
+        21, 23, 22, 18, 15, 13, 12, 11, 19, 21, 20, 17, 14, 12, 11, 10, 18, 19,
+        19, 16, 14, 12, 10, 9, 16, 17, 18, 15, 13, 11, 10, 9, 14, 16, 16, 14,
+        12, 11, 9, 8, 13, 14, 15, 13, 11, 10, 9, 8, 12, 14, 14, 13, 11, 10, 8,
+        8,
+        /* Size 16x8 */
+        32, 33, 33, 32, 32, 30, 28, 26, 23, 21, 19, 18, 16, 14, 13, 12, 33, 32,
+        32, 32, 31, 30, 30, 28, 25, 23, 21, 19, 17, 16, 14, 14, 32, 32, 31, 30,
+        29, 28, 27, 26, 24, 22, 20, 19, 18, 16, 15, 14, 28, 29, 30, 28, 27, 24,
+        21, 20, 19, 18, 17, 16, 15, 14, 13, 13, 23, 24, 25, 24, 24, 21, 19, 18,
+        16, 15, 14, 14, 13, 12, 11, 11, 19, 20, 21, 20, 21, 19, 17, 16, 14, 13,
+        12, 12, 11, 11, 10, 10, 16, 17, 17, 17, 18, 16, 15, 14, 13, 12, 11, 10,
+        10, 9, 9, 8, 13, 14, 14, 14, 15, 14, 13, 12, 11, 11, 10, 9, 9, 8, 8, 8,
+        /* Size 16x32 */
+        32, 33, 33, 32, 32, 28, 28, 23, 23, 19, 19, 16, 16, 13, 13, 12, 33, 32,
+        32, 32, 32, 29, 29, 24, 24, 20, 20, 17, 17, 14, 14, 12, 33, 32, 32, 32,
+        32, 29, 29, 24, 24, 20, 20, 17, 17, 14, 14, 12, 33, 32, 32, 31, 31, 30,
+        30, 25, 25, 21, 21, 17, 17, 14, 14, 13, 33, 32, 32, 31, 31, 30, 30, 25,
+        25, 21, 21, 17, 17, 14, 14, 13, 32, 32, 32, 30, 30, 28, 28, 24, 24, 20,
+        20, 17, 17, 14, 14, 13, 32, 32, 32, 30, 30, 28, 28, 24, 24, 20, 20, 17,
+        17, 14, 14, 13, 32, 31, 31, 29, 29, 27, 27, 24, 24, 21, 21, 18, 18, 15,
+        15, 14, 32, 31, 31, 29, 29, 27, 27, 24, 24, 21, 21, 18, 18, 15, 15, 14,
+        30, 30, 30, 28, 28, 24, 24, 21, 21, 19, 19, 16, 16, 14, 14, 13, 30, 30,
+        30, 28, 28, 24, 24, 21, 21, 19, 19, 16, 16, 14, 14, 13, 28, 30, 30, 27,
+        27, 21, 21, 19, 19, 17, 17, 15, 15, 13, 13, 12, 28, 30, 30, 27, 27, 21,
+        21, 19, 19, 17, 17, 15, 15, 13, 13, 12, 26, 28, 28, 26, 26, 20, 20, 18,
+        18, 16, 16, 14, 14, 12, 12, 12, 26, 28, 28, 26, 26, 20, 20, 18, 18, 16,
+        16, 14, 14, 12, 12, 12, 23, 25, 25, 24, 24, 19, 19, 16, 16, 14, 14, 13,
+        13, 11, 11, 11, 23, 25, 25, 24, 24, 19, 19, 16, 16, 14, 14, 13, 13, 11,
+        11, 11, 21, 23, 23, 22, 22, 18, 18, 15, 15, 13, 13, 12, 12, 11, 11, 10,
+        21, 23, 23, 22, 22, 18, 18, 15, 15, 13, 13, 12, 12, 11, 11, 10, 19, 21,
+        21, 20, 20, 17, 17, 14, 14, 12, 12, 11, 11, 10, 10, 9, 19, 21, 21, 20,
+        20, 17, 17, 14, 14, 12, 12, 11, 11, 10, 10, 9, 18, 19, 19, 19, 19, 16,
+        16, 14, 14, 12, 12, 10, 10, 9, 9, 9, 18, 19, 19, 19, 19, 16, 16, 14, 14,
+        12, 12, 10, 10, 9, 9, 9, 16, 17, 17, 18, 18, 15, 15, 13, 13, 11, 11, 10,
+        10, 9, 9, 8, 16, 17, 17, 18, 18, 15, 15, 13, 13, 11, 11, 10, 10, 9, 9,
+        8, 14, 16, 16, 16, 16, 14, 14, 12, 12, 11, 11, 9, 9, 8, 8, 8, 14, 16,
+        16, 16, 16, 14, 14, 12, 12, 11, 11, 9, 9, 8, 8, 8, 13, 14, 14, 15, 15,
+        13, 13, 11, 11, 10, 10, 9, 9, 8, 8, 7, 13, 14, 14, 15, 15, 13, 13, 11,
+        11, 10, 10, 9, 9, 8, 8, 7, 12, 14, 14, 14, 14, 13, 13, 11, 11, 10, 10,
+        8, 8, 8, 8, 7, 12, 14, 14, 14, 14, 13, 13, 11, 11, 10, 10, 8, 8, 8, 8,
+        7, 12, 13, 13, 13, 13, 12, 12, 11, 11, 9, 9, 8, 8, 7, 7, 7,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 28, 28, 26, 26, 23, 23, 21,
+        21, 19, 19, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12, 33, 32, 32, 32,
+        32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 25, 25, 23, 23, 21, 21, 19,
+        19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 33, 32, 32, 32, 32, 32, 32, 31,
+        31, 30, 30, 30, 30, 28, 28, 25, 25, 23, 23, 21, 21, 19, 19, 17, 17, 16,
+        16, 14, 14, 14, 14, 13, 32, 32, 32, 31, 31, 30, 30, 29, 29, 28, 28, 27,
+        27, 26, 26, 24, 24, 22, 22, 20, 20, 19, 19, 18, 18, 16, 16, 15, 15, 14,
+        14, 13, 32, 32, 32, 31, 31, 30, 30, 29, 29, 28, 28, 27, 27, 26, 26, 24,
+        24, 22, 22, 20, 20, 19, 19, 18, 18, 16, 16, 15, 15, 14, 14, 13, 28, 29,
+        29, 30, 30, 28, 28, 27, 27, 24, 24, 21, 21, 20, 20, 19, 19, 18, 18, 17,
+        17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12, 28, 29, 29, 30, 30, 28,
+        28, 27, 27, 24, 24, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 15,
+        15, 14, 14, 13, 13, 13, 13, 12, 23, 24, 24, 25, 25, 24, 24, 24, 24, 21,
+        21, 19, 19, 18, 18, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 11,
+        11, 11, 11, 11, 23, 24, 24, 25, 25, 24, 24, 24, 24, 21, 21, 19, 19, 18,
+        18, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 11,
+        19, 20, 20, 21, 21, 20, 20, 21, 21, 19, 19, 17, 17, 16, 16, 14, 14, 13,
+        13, 12, 12, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 9, 19, 20, 20, 21,
+        21, 20, 20, 21, 21, 19, 19, 17, 17, 16, 16, 14, 14, 13, 13, 12, 12, 12,
+        12, 11, 11, 11, 11, 10, 10, 10, 10, 9, 16, 17, 17, 17, 17, 17, 17, 18,
+        18, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9,
+        9, 9, 9, 8, 8, 8, 16, 17, 17, 17, 17, 17, 17, 18, 18, 16, 16, 15, 15,
+        14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 8, 8, 8, 13,
+        14, 14, 14, 14, 14, 14, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11,
+        10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 7, 13, 14, 14, 14, 14, 14, 14, 15,
+        15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8,
+        8, 8, 8, 7, 12, 12, 12, 13, 13, 13, 13, 14, 14, 13, 13, 12, 12, 12, 12,
+        11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7,
+        /* Size 4x16 */
+        33, 28, 19, 13, 32, 29, 20, 14, 32, 30, 21, 14, 32, 28, 20, 14, 31, 27,
+        21, 15, 30, 24, 19, 14, 30, 21, 17, 13, 28, 20, 16, 12, 25, 19, 14, 11,
+        23, 18, 13, 11, 21, 17, 12, 10, 19, 16, 12, 9, 17, 15, 11, 9, 16, 14,
+        11, 8, 14, 13, 10, 8, 14, 13, 10, 8,
+        /* Size 16x4 */
+        33, 32, 32, 32, 31, 30, 30, 28, 25, 23, 21, 19, 17, 16, 14, 14, 28, 29,
+        30, 28, 27, 24, 21, 20, 19, 18, 17, 16, 15, 14, 13, 13, 19, 20, 21, 20,
+        21, 19, 17, 16, 14, 13, 12, 12, 11, 11, 10, 10, 13, 14, 14, 14, 15, 14,
+        13, 12, 11, 11, 10, 9, 9, 8, 8, 8,
+        /* Size 8x32 */
+        32, 33, 32, 28, 23, 19, 16, 13, 33, 32, 32, 29, 24, 20, 17, 14, 33, 32,
+        32, 29, 24, 20, 17, 14, 33, 32, 31, 30, 25, 21, 17, 14, 33, 32, 31, 30,
+        25, 21, 17, 14, 32, 32, 30, 28, 24, 20, 17, 14, 32, 32, 30, 28, 24, 20,
+        17, 14, 32, 31, 29, 27, 24, 21, 18, 15, 32, 31, 29, 27, 24, 21, 18, 15,
+        30, 30, 28, 24, 21, 19, 16, 14, 30, 30, 28, 24, 21, 19, 16, 14, 28, 30,
+        27, 21, 19, 17, 15, 13, 28, 30, 27, 21, 19, 17, 15, 13, 26, 28, 26, 20,
+        18, 16, 14, 12, 26, 28, 26, 20, 18, 16, 14, 12, 23, 25, 24, 19, 16, 14,
+        13, 11, 23, 25, 24, 19, 16, 14, 13, 11, 21, 23, 22, 18, 15, 13, 12, 11,
+        21, 23, 22, 18, 15, 13, 12, 11, 19, 21, 20, 17, 14, 12, 11, 10, 19, 21,
+        20, 17, 14, 12, 11, 10, 18, 19, 19, 16, 14, 12, 10, 9, 18, 19, 19, 16,
+        14, 12, 10, 9, 16, 17, 18, 15, 13, 11, 10, 9, 16, 17, 18, 15, 13, 11,
+        10, 9, 14, 16, 16, 14, 12, 11, 9, 8, 14, 16, 16, 14, 12, 11, 9, 8, 13,
+        14, 15, 13, 11, 10, 9, 8, 13, 14, 15, 13, 11, 10, 9, 8, 12, 14, 14, 13,
+        11, 10, 8, 8, 12, 14, 14, 13, 11, 10, 8, 8, 12, 13, 13, 12, 11, 9, 8, 7,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 28, 28, 26, 26, 23, 23, 21,
+        21, 19, 19, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12, 33, 32, 32, 32,
+        32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 25, 25, 23, 23, 21, 21, 19,
+        19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 32, 32, 32, 31, 31, 30, 30, 29,
+        29, 28, 28, 27, 27, 26, 26, 24, 24, 22, 22, 20, 20, 19, 19, 18, 18, 16,
+        16, 15, 15, 14, 14, 13, 28, 29, 29, 30, 30, 28, 28, 27, 27, 24, 24, 21,
+        21, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13,
+        13, 12, 23, 24, 24, 25, 25, 24, 24, 24, 24, 21, 21, 19, 19, 18, 18, 16,
+        16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 11, 19, 20,
+        20, 21, 21, 20, 20, 21, 21, 19, 19, 17, 17, 16, 16, 14, 14, 13, 13, 12,
+        12, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 9, 16, 17, 17, 17, 17, 17,
+        17, 18, 18, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10,
+        10, 9, 9, 9, 9, 8, 8, 8, 13, 14, 14, 14, 14, 14, 14, 15, 15, 14, 14, 13,
+        13, 12, 12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 7 },
+      { /* Chroma */
+        /* Size 4x4 */
+        32, 22, 22, 18, 22, 19, 19, 17, 22, 19, 16, 14, 18, 17, 14, 12,
+        /* Size 8x8 */
+        33, 30, 24, 22, 21, 20, 18, 17, 30, 26, 23, 22, 22, 21, 19, 18, 24, 23,
+        21, 21, 20, 20, 19, 18, 22, 22, 21, 19, 18, 18, 17, 16, 21, 22, 20, 18,
+        17, 16, 15, 14, 20, 21, 20, 18, 16, 14, 14, 13, 18, 19, 19, 17, 15, 14,
+        12, 12, 17, 18, 18, 16, 14, 13, 12, 11,
+        /* Size 16x16 */
+        32, 33, 34, 31, 28, 25, 21, 21, 21, 20, 20, 19, 18, 17, 16, 16, 33, 33,
+        33, 30, 27, 24, 22, 22, 22, 21, 20, 20, 19, 18, 17, 17, 34, 33, 32, 29,
+        26, 24, 22, 23, 23, 22, 22, 21, 20, 19, 18, 18, 31, 30, 29, 26, 24, 23,
+        22, 22, 23, 22, 22, 21, 20, 19, 18, 18, 28, 27, 26, 24, 22, 22, 21, 22,
+        23, 22, 22, 21, 20, 20, 19, 19, 25, 24, 24, 23, 22, 21, 20, 21, 21, 20,
+        20, 20, 19, 19, 18, 18, 21, 22, 22, 22, 21, 20, 19, 19, 19, 19, 19, 19,
+        18, 18, 17, 17, 21, 22, 23, 22, 22, 21, 19, 19, 19, 18, 18, 18, 17, 17,
+        16, 16, 21, 22, 23, 23, 23, 21, 19, 19, 18, 17, 17, 17, 16, 16, 15, 15,
+        20, 21, 22, 22, 22, 20, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 20, 20,
+        22, 22, 22, 20, 19, 18, 17, 16, 16, 15, 15, 14, 14, 14, 19, 20, 21, 21,
+        21, 20, 19, 18, 17, 16, 15, 14, 14, 14, 13, 13, 18, 19, 20, 20, 20, 19,
+        18, 17, 16, 15, 15, 14, 13, 13, 12, 12, 17, 18, 19, 19, 20, 19, 18, 17,
+        16, 15, 14, 14, 13, 12, 12, 12, 16, 17, 18, 18, 19, 18, 17, 16, 15, 14,
+        14, 13, 12, 12, 12, 11, 16, 17, 18, 18, 19, 18, 17, 16, 15, 14, 14, 13,
+        12, 12, 11, 11,
+        /* Size 32x32 */
+        32, 33, 33, 34, 34, 31, 31, 28, 28, 25, 25, 21, 21, 21, 21, 21, 21, 20,
+        20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 33, 33, 33, 33,
+        33, 30, 30, 27, 27, 24, 24, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20,
+        20, 19, 19, 18, 18, 17, 17, 17, 17, 16, 33, 33, 33, 33, 33, 30, 30, 27,
+        27, 24, 24, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 19, 19, 18,
+        18, 17, 17, 17, 17, 16, 34, 33, 33, 32, 32, 29, 29, 26, 26, 24, 24, 22,
+        22, 23, 23, 23, 23, 22, 22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 18, 18,
+        18, 17, 34, 33, 33, 32, 32, 29, 29, 26, 26, 24, 24, 22, 22, 23, 23, 23,
+        23, 22, 22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 18, 18, 18, 17, 31, 30,
+        30, 29, 29, 26, 26, 24, 24, 23, 23, 22, 22, 22, 22, 23, 23, 22, 22, 22,
+        22, 21, 21, 20, 20, 19, 19, 18, 18, 18, 18, 17, 31, 30, 30, 29, 29, 26,
+        26, 24, 24, 23, 23, 22, 22, 22, 22, 23, 23, 22, 22, 22, 22, 21, 21, 20,
+        20, 19, 19, 18, 18, 18, 18, 17, 28, 27, 27, 26, 26, 24, 24, 22, 22, 22,
+        22, 21, 21, 22, 22, 23, 23, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 19,
+        19, 19, 19, 18, 28, 27, 27, 26, 26, 24, 24, 22, 22, 22, 22, 21, 21, 22,
+        22, 23, 23, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 19, 19, 19, 19, 18,
+        25, 24, 24, 24, 24, 23, 23, 22, 22, 21, 21, 20, 20, 21, 21, 21, 21, 20,
+        20, 20, 20, 20, 20, 19, 19, 19, 19, 18, 18, 18, 18, 17, 25, 24, 24, 24,
+        24, 23, 23, 22, 22, 21, 21, 20, 20, 21, 21, 21, 21, 20, 20, 20, 20, 20,
+        20, 19, 19, 19, 19, 18, 18, 18, 18, 17, 21, 22, 22, 22, 22, 22, 22, 21,
+        21, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18,
+        18, 17, 17, 17, 17, 17, 21, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17,
+        17, 17, 21, 22, 22, 23, 23, 22, 22, 22, 22, 21, 21, 19, 19, 19, 19, 19,
+        19, 18, 18, 18, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 16, 21, 22,
+        22, 23, 23, 22, 22, 22, 22, 21, 21, 19, 19, 19, 19, 19, 19, 18, 18, 18,
+        18, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 16, 21, 22, 22, 23, 23, 23,
+        23, 23, 23, 21, 21, 19, 19, 19, 19, 18, 18, 17, 17, 17, 17, 17, 17, 16,
+        16, 16, 16, 15, 15, 15, 15, 15, 21, 22, 22, 23, 23, 23, 23, 23, 23, 21,
+        21, 19, 19, 19, 19, 18, 18, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15,
+        15, 15, 15, 15, 20, 21, 21, 22, 22, 22, 22, 22, 22, 20, 20, 19, 19, 18,
+        18, 17, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14,
+        20, 21, 21, 22, 22, 22, 22, 22, 22, 20, 20, 19, 19, 18, 18, 17, 17, 17,
+        17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 20, 20, 20, 22,
+        22, 22, 22, 22, 22, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15,
+        15, 15, 15, 14, 14, 14, 14, 14, 14, 13, 20, 20, 20, 22, 22, 22, 22, 22,
+        22, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14,
+        14, 14, 14, 14, 14, 13, 19, 20, 20, 21, 21, 21, 21, 21, 21, 20, 20, 19,
+        19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 14, 14, 13, 13, 13,
+        13, 13, 19, 20, 20, 21, 21, 21, 21, 21, 21, 20, 20, 19, 19, 18, 18, 17,
+        17, 16, 16, 15, 15, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 18, 19,
+        19, 20, 20, 20, 20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 15,
+        15, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 12, 18, 19, 19, 20, 20, 20,
+        20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 15, 15, 14, 14, 13,
+        13, 13, 13, 12, 12, 12, 12, 12, 17, 18, 18, 19, 19, 19, 19, 20, 20, 19,
+        19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12,
+        12, 12, 12, 12, 17, 18, 18, 19, 19, 19, 19, 20, 20, 19, 19, 18, 18, 17,
+        17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12, 12, 12,
+        16, 17, 17, 18, 18, 18, 18, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14,
+        14, 14, 14, 13, 13, 12, 12, 12, 12, 12, 12, 11, 11, 11, 16, 17, 17, 18,
+        18, 18, 18, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13,
+        13, 12, 12, 12, 12, 12, 12, 11, 11, 11, 16, 17, 17, 18, 18, 18, 18, 19,
+        19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12,
+        12, 11, 11, 11, 11, 11, 16, 17, 17, 18, 18, 18, 18, 19, 19, 18, 18, 17,
+        17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11,
+        11, 11, 15, 16, 16, 17, 17, 17, 17, 18, 18, 17, 17, 17, 17, 16, 16, 15,
+        15, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11,
+        /* Size 4x8 */
+        33, 22, 20, 17, 28, 22, 22, 18, 24, 20, 20, 18, 22, 19, 18, 16, 22, 19,
+        16, 14, 20, 19, 15, 13, 19, 18, 14, 12, 17, 17, 14, 11,
+        /* Size 8x4 */
+        33, 28, 24, 22, 22, 20, 19, 17, 22, 22, 20, 19, 19, 19, 18, 17, 20, 22,
+        20, 18, 16, 15, 14, 14, 17, 18, 18, 16, 14, 13, 12, 11,
+        /* Size 8x16 */
+        32, 33, 28, 21, 21, 20, 18, 16, 33, 33, 27, 22, 22, 20, 19, 17, 34, 32,
+        26, 22, 23, 21, 20, 18, 31, 28, 24, 22, 22, 22, 20, 18, 28, 26, 22, 22,
+        23, 22, 20, 19, 24, 24, 22, 20, 21, 20, 19, 18, 21, 22, 21, 19, 19, 19,
+        18, 17, 21, 22, 22, 19, 18, 18, 17, 16, 21, 23, 22, 19, 18, 17, 16, 15,
+        20, 22, 22, 19, 17, 16, 15, 14, 20, 21, 22, 19, 17, 16, 14, 14, 19, 20,
+        21, 19, 17, 15, 14, 13, 18, 20, 20, 18, 16, 15, 13, 12, 17, 19, 20, 18,
+        16, 14, 13, 12, 16, 18, 19, 17, 15, 14, 12, 12, 16, 17, 18, 17, 15, 14,
+        12, 11,
+        /* Size 16x8 */
+        32, 33, 34, 31, 28, 24, 21, 21, 21, 20, 20, 19, 18, 17, 16, 16, 33, 33,
+        32, 28, 26, 24, 22, 22, 23, 22, 21, 20, 20, 19, 18, 17, 28, 27, 26, 24,
+        22, 22, 21, 22, 22, 22, 22, 21, 20, 20, 19, 18, 21, 22, 22, 22, 22, 20,
+        19, 19, 19, 19, 19, 19, 18, 18, 17, 17, 21, 22, 23, 22, 23, 21, 19, 18,
+        18, 17, 17, 17, 16, 16, 15, 15, 20, 20, 21, 22, 22, 20, 19, 18, 17, 16,
+        16, 15, 15, 14, 14, 14, 18, 19, 20, 20, 20, 19, 18, 17, 16, 15, 14, 14,
+        13, 13, 12, 12, 16, 17, 18, 18, 19, 18, 17, 16, 15, 14, 14, 13, 12, 12,
+        12, 11,
+        /* Size 16x32 */
+        32, 33, 33, 28, 28, 21, 21, 21, 21, 20, 20, 18, 18, 16, 16, 16, 33, 33,
+        33, 27, 27, 22, 22, 22, 22, 20, 20, 19, 19, 17, 17, 16, 33, 33, 33, 27,
+        27, 22, 22, 22, 22, 20, 20, 19, 19, 17, 17, 16, 34, 32, 32, 26, 26, 22,
+        22, 23, 23, 21, 21, 20, 20, 18, 18, 17, 34, 32, 32, 26, 26, 22, 22, 23,
+        23, 21, 21, 20, 20, 18, 18, 17, 31, 28, 28, 24, 24, 22, 22, 22, 22, 22,
+        22, 20, 20, 18, 18, 17, 31, 28, 28, 24, 24, 22, 22, 22, 22, 22, 22, 20,
+        20, 18, 18, 17, 28, 26, 26, 22, 22, 22, 22, 23, 23, 22, 22, 20, 20, 19,
+        19, 18, 28, 26, 26, 22, 22, 22, 22, 23, 23, 22, 22, 20, 20, 19, 19, 18,
+        24, 24, 24, 22, 22, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 17, 24, 24,
+        24, 22, 22, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 17, 21, 22, 22, 21,
+        21, 19, 19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 21, 22, 22, 21, 21, 19,
+        19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 21, 22, 22, 22, 22, 19, 19, 18,
+        18, 18, 18, 17, 17, 16, 16, 16, 21, 22, 22, 22, 22, 19, 19, 18, 18, 18,
+        18, 17, 17, 16, 16, 16, 21, 23, 23, 22, 22, 19, 19, 18, 18, 17, 17, 16,
+        16, 15, 15, 15, 21, 23, 23, 22, 22, 19, 19, 18, 18, 17, 17, 16, 16, 15,
+        15, 15, 20, 22, 22, 22, 22, 19, 19, 17, 17, 16, 16, 15, 15, 14, 14, 14,
+        20, 22, 22, 22, 22, 19, 19, 17, 17, 16, 16, 15, 15, 14, 14, 14, 20, 21,
+        21, 22, 22, 19, 19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 20, 21, 21, 22,
+        22, 19, 19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 19, 20, 20, 21, 21, 19,
+        19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 19, 20, 20, 21, 21, 19, 19, 17,
+        17, 15, 15, 14, 14, 13, 13, 13, 18, 20, 20, 20, 20, 18, 18, 16, 16, 15,
+        15, 13, 13, 12, 12, 12, 18, 20, 20, 20, 20, 18, 18, 16, 16, 15, 15, 13,
+        13, 12, 12, 12, 17, 19, 19, 20, 20, 18, 18, 16, 16, 14, 14, 13, 13, 12,
+        12, 12, 17, 19, 19, 20, 20, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12,
+        16, 18, 18, 19, 19, 17, 17, 15, 15, 14, 14, 12, 12, 12, 12, 11, 16, 18,
+        18, 19, 19, 17, 17, 15, 15, 14, 14, 12, 12, 12, 12, 11, 16, 17, 17, 18,
+        18, 17, 17, 15, 15, 14, 14, 12, 12, 11, 11, 11, 16, 17, 17, 18, 18, 17,
+        17, 15, 15, 14, 14, 12, 12, 11, 11, 11, 16, 17, 17, 18, 18, 16, 16, 15,
+        15, 13, 13, 12, 12, 11, 11, 11,
+        /* Size 32x16 */
+        32, 33, 33, 34, 34, 31, 31, 28, 28, 24, 24, 21, 21, 21, 21, 21, 21, 20,
+        20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 33, 33, 33, 32,
+        32, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 23, 22, 22, 21, 21, 20,
+        20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 33, 33, 33, 32, 32, 28, 28, 26,
+        26, 24, 24, 22, 22, 22, 22, 23, 23, 22, 22, 21, 21, 20, 20, 20, 20, 19,
+        19, 18, 18, 17, 17, 17, 28, 27, 27, 26, 26, 24, 24, 22, 22, 22, 22, 21,
+        21, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 19, 19, 18,
+        18, 18, 28, 27, 27, 26, 26, 24, 24, 22, 22, 22, 22, 21, 21, 22, 22, 22,
+        22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 19, 19, 18, 18, 18, 21, 22,
+        22, 22, 22, 22, 22, 22, 22, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 16, 21, 22, 22, 22, 22, 22,
+        22, 22, 22, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18,
+        18, 18, 18, 17, 17, 17, 17, 16, 21, 22, 22, 23, 23, 22, 22, 23, 23, 21,
+        21, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15,
+        15, 15, 15, 15, 21, 22, 22, 23, 23, 22, 22, 23, 23, 21, 21, 19, 19, 18,
+        18, 18, 18, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 15,
+        20, 20, 20, 21, 21, 22, 22, 22, 22, 20, 20, 19, 19, 18, 18, 17, 17, 16,
+        16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 13, 20, 20, 20, 21,
+        21, 22, 22, 22, 22, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15,
+        15, 15, 15, 14, 14, 14, 14, 14, 14, 13, 18, 19, 19, 20, 20, 20, 20, 20,
+        20, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 13,
+        13, 12, 12, 12, 12, 12, 18, 19, 19, 20, 20, 20, 20, 20, 20, 19, 19, 18,
+        18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 13, 13, 12, 12, 12,
+        12, 12, 16, 17, 17, 18, 18, 18, 18, 19, 19, 18, 18, 17, 17, 16, 16, 15,
+        15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12, 12, 11, 11, 11, 16, 17,
+        17, 18, 18, 18, 18, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14,
+        14, 13, 13, 12, 12, 12, 12, 12, 12, 11, 11, 11, 16, 16, 16, 17, 17, 17,
+        17, 18, 18, 17, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12,
+        12, 12, 12, 11, 11, 11, 11, 11,
+        /* Size 4x16 */
+        33, 21, 20, 16, 33, 22, 20, 17, 32, 22, 21, 18, 28, 22, 22, 18, 26, 22,
+        22, 19, 24, 20, 20, 18, 22, 19, 19, 17, 22, 19, 18, 16, 23, 19, 17, 15,
+        22, 19, 16, 14, 21, 19, 16, 14, 20, 19, 15, 13, 20, 18, 15, 12, 19, 18,
+        14, 12, 18, 17, 14, 12, 17, 17, 14, 11,
+        /* Size 16x4 */
+        33, 33, 32, 28, 26, 24, 22, 22, 23, 22, 21, 20, 20, 19, 18, 17, 21, 22,
+        22, 22, 22, 20, 19, 19, 19, 19, 19, 19, 18, 18, 17, 17, 20, 20, 21, 22,
+        22, 20, 19, 18, 17, 16, 16, 15, 15, 14, 14, 14, 16, 17, 18, 18, 19, 18,
+        17, 16, 15, 14, 14, 13, 12, 12, 12, 11,
+        /* Size 8x32 */
+        32, 33, 28, 21, 21, 20, 18, 16, 33, 33, 27, 22, 22, 20, 19, 17, 33, 33,
+        27, 22, 22, 20, 19, 17, 34, 32, 26, 22, 23, 21, 20, 18, 34, 32, 26, 22,
+        23, 21, 20, 18, 31, 28, 24, 22, 22, 22, 20, 18, 31, 28, 24, 22, 22, 22,
+        20, 18, 28, 26, 22, 22, 23, 22, 20, 19, 28, 26, 22, 22, 23, 22, 20, 19,
+        24, 24, 22, 20, 21, 20, 19, 18, 24, 24, 22, 20, 21, 20, 19, 18, 21, 22,
+        21, 19, 19, 19, 18, 17, 21, 22, 21, 19, 19, 19, 18, 17, 21, 22, 22, 19,
+        18, 18, 17, 16, 21, 22, 22, 19, 18, 18, 17, 16, 21, 23, 22, 19, 18, 17,
+        16, 15, 21, 23, 22, 19, 18, 17, 16, 15, 20, 22, 22, 19, 17, 16, 15, 14,
+        20, 22, 22, 19, 17, 16, 15, 14, 20, 21, 22, 19, 17, 16, 14, 14, 20, 21,
+        22, 19, 17, 16, 14, 14, 19, 20, 21, 19, 17, 15, 14, 13, 19, 20, 21, 19,
+        17, 15, 14, 13, 18, 20, 20, 18, 16, 15, 13, 12, 18, 20, 20, 18, 16, 15,
+        13, 12, 17, 19, 20, 18, 16, 14, 13, 12, 17, 19, 20, 18, 16, 14, 13, 12,
+        16, 18, 19, 17, 15, 14, 12, 12, 16, 18, 19, 17, 15, 14, 12, 12, 16, 17,
+        18, 17, 15, 14, 12, 11, 16, 17, 18, 17, 15, 14, 12, 11, 16, 17, 18, 16,
+        15, 13, 12, 11,
+        /* Size 32x8 */
+        32, 33, 33, 34, 34, 31, 31, 28, 28, 24, 24, 21, 21, 21, 21, 21, 21, 20,
+        20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 33, 33, 33, 32,
+        32, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 23, 22, 22, 21, 21, 20,
+        20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 28, 27, 27, 26, 26, 24, 24, 22,
+        22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20,
+        20, 19, 19, 18, 18, 18, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 20, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17,
+        17, 16, 21, 22, 22, 23, 23, 22, 22, 23, 23, 21, 21, 19, 19, 18, 18, 18,
+        18, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 15, 20, 20,
+        20, 21, 21, 22, 22, 22, 22, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16,
+        16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 13, 18, 19, 19, 20, 20, 20,
+        20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13,
+        13, 13, 13, 12, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18, 18, 19, 19, 18,
+        18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12,
+        12, 11, 11, 11 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 31, 23, 17, 31, 26, 20, 16, 23, 20, 14, 12, 17, 16, 12, 9,
+        /* Size 8x8 */
+        33, 32, 32, 29, 24, 20, 17, 15, 32, 32, 31, 29, 25, 21, 18, 16, 32, 31,
+        29, 27, 24, 21, 18, 16, 29, 29, 27, 21, 19, 17, 16, 14, 24, 25, 24, 19,
+        16, 14, 13, 12, 20, 21, 21, 17, 14, 13, 12, 11, 17, 18, 18, 16, 13, 12,
+        10, 9, 15, 16, 16, 14, 12, 11, 9, 9,
+        /* Size 16x16 */
+        32, 33, 33, 33, 32, 30, 29, 27, 25, 23, 21, 19, 17, 16, 14, 13, 33, 32,
+        32, 32, 32, 30, 29, 28, 26, 24, 22, 20, 18, 17, 15, 13, 33, 32, 32, 32,
+        32, 31, 30, 28, 27, 25, 23, 21, 19, 17, 16, 14, 33, 32, 32, 31, 30, 29,
+        28, 27, 26, 24, 23, 20, 19, 17, 16, 14, 32, 32, 32, 30, 29, 28, 27, 26,
+        25, 24, 22, 21, 19, 18, 16, 15, 30, 30, 31, 29, 28, 26, 24, 23, 22, 21,
+        20, 19, 18, 16, 15, 14, 29, 29, 30, 28, 27, 24, 22, 21, 20, 19, 19, 17,
+        17, 15, 14, 13, 27, 28, 28, 27, 26, 23, 21, 20, 19, 18, 17, 16, 15, 14,
+        13, 12, 25, 26, 27, 26, 25, 22, 20, 19, 18, 17, 16, 15, 14, 14, 13, 12,
+        23, 24, 25, 24, 24, 21, 19, 18, 17, 16, 15, 14, 13, 13, 12, 11, 21, 22,
+        23, 23, 22, 20, 19, 17, 16, 15, 14, 13, 13, 12, 11, 11, 19, 20, 21, 20,
+        21, 19, 17, 16, 15, 14, 13, 12, 12, 11, 11, 10, 17, 18, 19, 19, 19, 18,
+        17, 15, 14, 13, 13, 12, 11, 10, 10, 9, 16, 17, 17, 17, 18, 16, 15, 14,
+        14, 13, 12, 11, 10, 10, 9, 9, 14, 15, 16, 16, 16, 15, 14, 13, 13, 12,
+        11, 11, 10, 9, 9, 8, 13, 13, 14, 14, 15, 14, 13, 12, 12, 11, 11, 10, 9,
+        9, 8, 8,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 30, 30, 29, 28, 27, 26, 25, 23,
+        23, 21, 21, 19, 19, 18, 17, 17, 16, 15, 14, 14, 13, 13, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 30, 30, 29, 29, 28, 27, 26, 24, 24, 22, 22, 20,
+        20, 19, 18, 17, 17, 16, 15, 15, 13, 13, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 30, 30, 29, 29, 28, 27, 26, 24, 24, 22, 22, 20, 20, 19, 18, 17,
+        17, 16, 15, 15, 13, 13, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        30, 30, 28, 27, 26, 25, 24, 23, 23, 21, 20, 19, 19, 18, 17, 17, 16, 16,
+        14, 14, 33, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 28, 28,
+        27, 25, 25, 23, 23, 21, 21, 20, 19, 18, 17, 17, 16, 16, 14, 14, 33, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29, 28, 27, 26, 25, 24, 23,
+        23, 21, 21, 20, 19, 18, 17, 17, 16, 16, 14, 14, 33, 32, 32, 32, 32, 31,
+        31, 31, 30, 30, 29, 29, 28, 28, 27, 26, 26, 24, 24, 23, 23, 21, 20, 20,
+        19, 18, 17, 17, 16, 16, 14, 14, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30,
+        29, 29, 28, 28, 27, 26, 26, 24, 24, 23, 23, 21, 21, 20, 19, 18, 17, 17,
+        16, 16, 15, 15, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 28, 28, 27, 27,
+        26, 26, 25, 24, 24, 22, 22, 21, 21, 20, 19, 19, 18, 17, 16, 16, 15, 15,
+        32, 32, 32, 32, 31, 31, 30, 30, 29, 29, 28, 28, 27, 27, 26, 25, 25, 24,
+        24, 22, 22, 21, 20, 20, 19, 18, 18, 17, 16, 16, 15, 15, 30, 30, 30, 31,
+        31, 30, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 22, 22, 21, 20, 20, 19,
+        19, 18, 18, 17, 16, 16, 15, 15, 14, 14, 30, 30, 30, 31, 31, 30, 29, 29,
+        28, 28, 26, 26, 24, 24, 23, 23, 22, 22, 21, 20, 20, 19, 19, 18, 18, 17,
+        16, 16, 15, 15, 14, 14, 29, 29, 29, 30, 30, 29, 28, 28, 27, 27, 24, 24,
+        22, 22, 21, 21, 20, 20, 19, 19, 19, 18, 17, 17, 17, 16, 15, 15, 14, 14,
+        13, 13, 28, 29, 29, 30, 30, 29, 28, 28, 27, 27, 24, 24, 22, 21, 20, 20,
+        20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 27, 28,
+        28, 28, 28, 28, 27, 27, 26, 26, 23, 23, 21, 20, 20, 20, 19, 18, 18, 17,
+        17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 26, 27, 27, 27, 28, 27,
+        26, 26, 26, 25, 23, 23, 21, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 15,
+        15, 14, 14, 14, 13, 13, 12, 12, 25, 26, 26, 26, 27, 26, 26, 26, 25, 25,
+        22, 22, 20, 20, 19, 19, 18, 17, 17, 16, 16, 15, 15, 15, 14, 14, 14, 13,
+        13, 13, 12, 12, 23, 24, 24, 25, 25, 25, 24, 24, 24, 24, 22, 22, 20, 19,
+        18, 18, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 12, 11, 11,
+        23, 24, 24, 24, 25, 24, 24, 24, 24, 24, 21, 21, 19, 19, 18, 18, 17, 16,
+        16, 15, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 21, 22, 22, 23,
+        23, 23, 23, 23, 22, 22, 20, 20, 19, 18, 17, 17, 16, 15, 15, 14, 14, 14,
+        13, 13, 13, 12, 12, 12, 11, 11, 11, 11, 21, 22, 22, 23, 23, 23, 23, 23,
+        22, 22, 20, 20, 19, 18, 17, 17, 16, 15, 15, 14, 14, 14, 13, 13, 13, 12,
+        12, 12, 11, 11, 11, 11, 19, 20, 20, 21, 21, 21, 21, 21, 21, 21, 19, 19,
+        18, 17, 17, 16, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11,
+        10, 10, 19, 20, 20, 20, 21, 21, 20, 21, 21, 20, 19, 19, 17, 17, 16, 16,
+        15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 10, 10, 18, 19,
+        19, 19, 20, 20, 20, 20, 20, 20, 18, 18, 17, 17, 16, 15, 15, 14, 14, 13,
+        13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 17, 18, 18, 19, 19, 19,
+        19, 19, 19, 19, 18, 18, 17, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 11,
+        11, 11, 10, 10, 10, 10, 9, 9, 17, 17, 17, 18, 18, 18, 18, 18, 19, 18,
+        17, 17, 16, 16, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 10, 10, 10,
+        10, 9, 9, 9, 16, 17, 17, 17, 17, 17, 17, 17, 18, 18, 16, 16, 15, 15, 14,
+        14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 15, 16,
+        16, 17, 17, 17, 17, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12,
+        12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9, 14, 15, 15, 16, 16, 16, 16,
+        16, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10,
+        10, 9, 9, 9, 9, 8, 8, 14, 15, 15, 16, 16, 16, 16, 16, 16, 16, 15, 15,
+        14, 14, 13, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9, 9, 9, 8, 8,
+        13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11,
+        11, 11, 11, 10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 13, 13, 13, 14, 14, 14,
+        14, 15, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 10, 10, 9,
+        9, 9, 9, 9, 8, 8, 8, 8,
+        /* Size 4x8 */
+        32, 30, 24, 17, 32, 30, 24, 17, 31, 28, 23, 18, 29, 24, 19, 15, 25, 21,
+        16, 13, 21, 19, 14, 11, 18, 17, 13, 10, 16, 15, 12, 9,
+        /* Size 8x4 */
+        32, 32, 31, 29, 25, 21, 18, 16, 30, 30, 28, 24, 21, 19, 17, 15, 24, 24,
+        23, 19, 16, 14, 13, 12, 17, 17, 18, 15, 13, 11, 10, 9,
+        /* Size 8x16 */
+        32, 33, 32, 28, 23, 19, 17, 14, 33, 32, 32, 29, 24, 20, 17, 15, 33, 32,
+        31, 30, 25, 21, 18, 16, 32, 32, 30, 28, 24, 20, 18, 16, 32, 31, 29, 27,
+        24, 21, 18, 16, 30, 30, 28, 24, 21, 19, 17, 15, 29, 30, 27, 22, 20, 17,
+        16, 14, 27, 28, 26, 21, 18, 16, 15, 13, 25, 26, 25, 20, 17, 15, 14, 13,
+        23, 24, 24, 19, 16, 14, 13, 12, 21, 23, 22, 18, 15, 13, 12, 11, 19, 21,
+        20, 17, 14, 12, 11, 10, 18, 19, 19, 16, 14, 12, 11, 10, 16, 17, 18, 15,
+        13, 11, 10, 9, 14, 16, 16, 14, 12, 11, 9, 9, 13, 14, 15, 13, 11, 10, 9,
+        8,
+        /* Size 16x8 */
+        32, 33, 33, 32, 32, 30, 29, 27, 25, 23, 21, 19, 18, 16, 14, 13, 33, 32,
+        32, 32, 31, 30, 30, 28, 26, 24, 23, 21, 19, 17, 16, 14, 32, 32, 31, 30,
+        29, 28, 27, 26, 25, 24, 22, 20, 19, 18, 16, 15, 28, 29, 30, 28, 27, 24,
+        22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 23, 24, 25, 24, 24, 21, 20, 18,
+        17, 16, 15, 14, 14, 13, 12, 11, 19, 20, 21, 20, 21, 19, 17, 16, 15, 14,
+        13, 12, 12, 11, 11, 10, 17, 17, 18, 18, 18, 17, 16, 15, 14, 13, 12, 11,
+        11, 10, 9, 9, 14, 15, 16, 16, 16, 15, 14, 13, 13, 12, 11, 10, 10, 9, 9,
+        8,
+        /* Size 16x32 */
+        32, 33, 33, 32, 32, 30, 28, 27, 23, 23, 19, 19, 17, 16, 14, 13, 33, 32,
+        32, 32, 32, 30, 29, 28, 24, 24, 20, 20, 17, 17, 15, 14, 33, 32, 32, 32,
+        32, 30, 29, 28, 24, 24, 20, 20, 17, 17, 15, 14, 33, 32, 32, 32, 32, 31,
+        29, 28, 25, 24, 20, 20, 18, 17, 15, 14, 33, 32, 32, 32, 31, 31, 30, 28,
+        25, 25, 21, 21, 18, 17, 16, 14, 33, 32, 32, 31, 31, 30, 29, 28, 25, 24,
+        21, 21, 18, 17, 16, 14, 32, 32, 32, 31, 30, 29, 28, 27, 24, 24, 20, 20,
+        18, 17, 16, 14, 32, 32, 32, 30, 30, 29, 28, 27, 24, 24, 21, 21, 18, 17,
+        16, 15, 32, 32, 31, 30, 29, 28, 27, 26, 24, 24, 21, 21, 18, 18, 16, 15,
+        32, 31, 31, 30, 29, 28, 26, 26, 24, 23, 20, 20, 18, 18, 16, 15, 30, 30,
+        30, 28, 28, 26, 24, 23, 21, 21, 19, 19, 17, 16, 15, 14, 30, 30, 30, 28,
+        28, 26, 24, 23, 21, 21, 19, 19, 17, 16, 15, 14, 29, 30, 30, 28, 27, 24,
+        22, 21, 20, 19, 17, 17, 16, 15, 14, 13, 28, 29, 30, 28, 27, 24, 21, 21,
+        19, 19, 17, 17, 16, 15, 14, 13, 27, 28, 28, 27, 26, 23, 21, 20, 18, 18,
+        16, 16, 15, 14, 13, 13, 26, 27, 28, 26, 26, 23, 20, 20, 18, 18, 16, 16,
+        14, 14, 13, 12, 25, 26, 26, 25, 25, 22, 20, 19, 17, 17, 15, 15, 14, 13,
+        13, 12, 23, 25, 25, 24, 24, 21, 19, 18, 16, 16, 14, 14, 13, 13, 12, 11,
+        23, 24, 24, 24, 24, 21, 19, 18, 16, 16, 14, 14, 13, 13, 12, 11, 21, 23,
+        23, 22, 22, 20, 18, 17, 15, 15, 13, 13, 12, 12, 11, 11, 21, 23, 23, 22,
+        22, 20, 18, 17, 15, 15, 13, 13, 12, 12, 11, 11, 19, 21, 21, 21, 21, 19,
+        17, 17, 14, 14, 13, 13, 12, 11, 10, 10, 19, 20, 21, 20, 20, 19, 17, 16,
+        14, 14, 12, 12, 11, 11, 10, 10, 18, 19, 20, 20, 20, 18, 17, 16, 14, 14,
+        12, 12, 11, 11, 10, 9, 18, 19, 19, 19, 19, 18, 16, 15, 14, 13, 12, 12,
+        11, 10, 10, 9, 17, 18, 18, 18, 18, 17, 16, 15, 13, 13, 12, 12, 10, 10,
+        9, 9, 16, 17, 17, 17, 18, 16, 15, 14, 13, 13, 11, 11, 10, 10, 9, 9, 15,
+        17, 17, 17, 17, 16, 15, 14, 13, 12, 11, 11, 10, 10, 9, 9, 14, 16, 16,
+        16, 16, 15, 14, 13, 12, 12, 11, 11, 9, 9, 9, 8, 14, 16, 16, 16, 16, 15,
+        14, 13, 12, 12, 10, 10, 9, 9, 9, 8, 13, 14, 14, 14, 15, 14, 13, 12, 11,
+        11, 10, 10, 9, 9, 8, 8, 13, 14, 14, 14, 15, 14, 13, 12, 11, 11, 10, 10,
+        9, 9, 8, 8,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 29, 28, 27, 26, 25, 23,
+        23, 21, 21, 19, 19, 18, 18, 17, 16, 15, 14, 14, 13, 13, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 30, 30, 30, 29, 28, 27, 26, 25, 24, 23, 23, 21,
+        20, 19, 19, 18, 17, 17, 16, 16, 14, 14, 33, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 30, 30, 30, 30, 28, 28, 26, 25, 24, 23, 23, 21, 21, 20, 19, 18,
+        17, 17, 16, 16, 14, 14, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 28, 28,
+        28, 28, 27, 26, 25, 24, 24, 22, 22, 21, 20, 20, 19, 18, 17, 17, 16, 16,
+        14, 14, 32, 32, 32, 32, 31, 31, 30, 30, 29, 29, 28, 28, 27, 27, 26, 26,
+        25, 24, 24, 22, 22, 21, 20, 20, 19, 18, 18, 17, 16, 16, 15, 15, 30, 30,
+        30, 31, 31, 30, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 22, 21, 21, 20,
+        20, 19, 19, 18, 18, 17, 16, 16, 15, 15, 14, 14, 28, 29, 29, 29, 30, 29,
+        28, 28, 27, 26, 24, 24, 22, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17,
+        16, 16, 15, 15, 14, 14, 13, 13, 27, 28, 28, 28, 28, 28, 27, 27, 26, 26,
+        23, 23, 21, 21, 20, 20, 19, 18, 18, 17, 17, 17, 16, 16, 15, 15, 14, 14,
+        13, 13, 12, 12, 23, 24, 24, 25, 25, 25, 24, 24, 24, 24, 21, 21, 20, 19,
+        18, 18, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 12, 11, 11,
+        23, 24, 24, 24, 25, 24, 24, 24, 24, 23, 21, 21, 19, 19, 18, 18, 17, 16,
+        16, 15, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 19, 20, 20, 20,
+        21, 21, 20, 21, 21, 20, 19, 19, 17, 17, 16, 16, 15, 14, 14, 13, 13, 13,
+        12, 12, 12, 12, 11, 11, 11, 10, 10, 10, 19, 20, 20, 20, 21, 21, 20, 21,
+        21, 20, 19, 19, 17, 17, 16, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12,
+        11, 11, 11, 10, 10, 10, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 17, 17,
+        16, 16, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9,
+        9, 16, 17, 17, 17, 17, 17, 17, 17, 18, 18, 16, 16, 15, 15, 14, 14, 13,
+        13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 14, 15, 15, 15,
+        16, 16, 16, 16, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 11, 11, 10,
+        10, 10, 10, 9, 9, 9, 9, 9, 8, 8, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15,
+        14, 14, 13, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 9, 9, 8, 8,
+        8, 8,
+        /* Size 4x16 */
+        33, 30, 23, 16, 32, 30, 24, 17, 32, 31, 25, 17, 32, 29, 24, 17, 32, 28,
+        24, 18, 30, 26, 21, 16, 30, 24, 19, 15, 28, 23, 18, 14, 26, 22, 17, 13,
+        24, 21, 16, 13, 23, 20, 15, 12, 20, 19, 14, 11, 19, 18, 13, 10, 17, 16,
+        13, 10, 16, 15, 12, 9, 14, 14, 11, 9,
+        /* Size 16x4 */
+        33, 32, 32, 32, 32, 30, 30, 28, 26, 24, 23, 20, 19, 17, 16, 14, 30, 30,
+        31, 29, 28, 26, 24, 23, 22, 21, 20, 19, 18, 16, 15, 14, 23, 24, 25, 24,
+        24, 21, 19, 18, 17, 16, 15, 14, 13, 13, 12, 11, 16, 17, 17, 17, 18, 16,
+        15, 14, 13, 13, 12, 11, 10, 10, 9, 9,
+        /* Size 8x32 */
+        32, 33, 32, 28, 23, 19, 17, 14, 33, 32, 32, 29, 24, 20, 17, 15, 33, 32,
+        32, 29, 24, 20, 17, 15, 33, 32, 32, 29, 25, 20, 18, 15, 33, 32, 31, 30,
+        25, 21, 18, 16, 33, 32, 31, 29, 25, 21, 18, 16, 32, 32, 30, 28, 24, 20,
+        18, 16, 32, 32, 30, 28, 24, 21, 18, 16, 32, 31, 29, 27, 24, 21, 18, 16,
+        32, 31, 29, 26, 24, 20, 18, 16, 30, 30, 28, 24, 21, 19, 17, 15, 30, 30,
+        28, 24, 21, 19, 17, 15, 29, 30, 27, 22, 20, 17, 16, 14, 28, 30, 27, 21,
+        19, 17, 16, 14, 27, 28, 26, 21, 18, 16, 15, 13, 26, 28, 26, 20, 18, 16,
+        14, 13, 25, 26, 25, 20, 17, 15, 14, 13, 23, 25, 24, 19, 16, 14, 13, 12,
+        23, 24, 24, 19, 16, 14, 13, 12, 21, 23, 22, 18, 15, 13, 12, 11, 21, 23,
+        22, 18, 15, 13, 12, 11, 19, 21, 21, 17, 14, 13, 12, 10, 19, 21, 20, 17,
+        14, 12, 11, 10, 18, 20, 20, 17, 14, 12, 11, 10, 18, 19, 19, 16, 14, 12,
+        11, 10, 17, 18, 18, 16, 13, 12, 10, 9, 16, 17, 18, 15, 13, 11, 10, 9,
+        15, 17, 17, 15, 13, 11, 10, 9, 14, 16, 16, 14, 12, 11, 9, 9, 14, 16, 16,
+        14, 12, 10, 9, 9, 13, 14, 15, 13, 11, 10, 9, 8, 13, 14, 15, 13, 11, 10,
+        9, 8,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 29, 28, 27, 26, 25, 23,
+        23, 21, 21, 19, 19, 18, 18, 17, 16, 15, 14, 14, 13, 13, 33, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 26, 25, 24, 23, 23, 21,
+        21, 20, 19, 18, 17, 17, 16, 16, 14, 14, 32, 32, 32, 32, 31, 31, 30, 30,
+        29, 29, 28, 28, 27, 27, 26, 26, 25, 24, 24, 22, 22, 21, 20, 20, 19, 18,
+        18, 17, 16, 16, 15, 15, 28, 29, 29, 29, 30, 29, 28, 28, 27, 26, 24, 24,
+        22, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 15, 15, 14, 14,
+        13, 13, 23, 24, 24, 25, 25, 25, 24, 24, 24, 24, 21, 21, 20, 19, 18, 18,
+        17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 12, 11, 11, 19, 20,
+        20, 20, 21, 21, 20, 21, 21, 20, 19, 19, 17, 17, 16, 16, 15, 14, 14, 13,
+        13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 10, 10, 17, 17, 17, 18, 18, 18,
+        18, 18, 18, 18, 17, 17, 16, 16, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11,
+        11, 10, 10, 10, 9, 9, 9, 9, 14, 15, 15, 15, 16, 16, 16, 16, 16, 16, 15,
+        15, 14, 14, 13, 13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9,
+        8, 8 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 24, 22, 19, 24, 21, 20, 19, 22, 20, 17, 15, 19, 19, 15, 13,
+        /* Size 8x8 */
+        33, 32, 27, 21, 22, 20, 19, 18, 32, 29, 24, 22, 23, 22, 20, 19, 27, 24,
+        22, 21, 23, 22, 21, 20, 21, 22, 21, 19, 19, 19, 18, 18, 22, 23, 23, 19,
+        18, 17, 16, 16, 20, 22, 22, 19, 17, 16, 15, 14, 19, 20, 21, 18, 16, 15,
+        14, 13, 18, 19, 20, 18, 16, 14, 13, 12,
+        /* Size 16x16 */
+        32, 33, 34, 31, 28, 25, 22, 21, 21, 21, 20, 20, 19, 18, 17, 16, 33, 33,
+        33, 30, 27, 24, 22, 22, 22, 22, 21, 20, 20, 19, 18, 17, 34, 33, 32, 29,
+        26, 24, 23, 22, 23, 23, 22, 22, 21, 20, 19, 18, 31, 30, 29, 26, 24, 23,
+        22, 22, 22, 23, 22, 22, 21, 20, 19, 18, 28, 27, 26, 24, 22, 22, 22, 22,
+        22, 23, 22, 22, 21, 20, 20, 19, 25, 24, 24, 23, 22, 21, 20, 20, 21, 21,
+        20, 20, 20, 19, 19, 18, 22, 22, 23, 22, 22, 20, 20, 20, 20, 20, 19, 19,
+        19, 18, 18, 17, 21, 22, 22, 22, 22, 20, 20, 19, 19, 19, 19, 18, 18, 18,
+        17, 17, 21, 22, 23, 22, 22, 21, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16,
+        21, 22, 23, 23, 23, 21, 20, 19, 18, 17, 17, 17, 16, 16, 16, 15, 20, 21,
+        22, 22, 22, 20, 19, 19, 18, 17, 17, 16, 16, 15, 15, 14, 20, 20, 22, 22,
+        22, 20, 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 19, 20, 21, 21, 21, 20,
+        19, 18, 17, 16, 16, 15, 14, 14, 14, 13, 18, 19, 20, 20, 20, 19, 18, 18,
+        17, 16, 15, 15, 14, 13, 13, 12, 17, 18, 19, 19, 20, 19, 18, 17, 16, 16,
+        15, 14, 14, 13, 12, 12, 16, 17, 18, 18, 19, 18, 17, 17, 16, 15, 14, 14,
+        13, 12, 12, 12,
+        /* Size 32x32 */
+        32, 33, 33, 34, 34, 32, 31, 30, 28, 28, 25, 25, 22, 21, 21, 21, 21, 21,
+        21, 20, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 16, 16, 33, 33, 33, 33,
+        33, 32, 30, 29, 27, 27, 24, 24, 22, 21, 22, 22, 22, 22, 22, 21, 21, 20,
+        20, 20, 20, 19, 19, 19, 18, 18, 17, 17, 33, 33, 33, 33, 33, 31, 30, 29,
+        27, 26, 24, 24, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 19,
+        19, 19, 18, 18, 17, 17, 34, 33, 33, 33, 33, 31, 29, 28, 26, 26, 24, 24,
+        22, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 20, 20, 19, 19, 19,
+        18, 18, 34, 33, 33, 33, 32, 31, 29, 28, 26, 26, 24, 24, 23, 22, 22, 23,
+        23, 23, 23, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 18, 18, 32, 32,
+        31, 31, 31, 29, 28, 27, 25, 24, 24, 24, 22, 22, 22, 22, 23, 23, 23, 22,
+        22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 18, 18, 31, 30, 30, 29, 29, 28,
+        26, 26, 24, 24, 23, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 22, 21,
+        21, 20, 20, 20, 19, 19, 18, 18, 30, 29, 29, 28, 28, 27, 26, 25, 23, 23,
+        23, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 22, 21, 21, 20, 20, 20,
+        19, 19, 19, 19, 28, 27, 27, 26, 26, 25, 24, 23, 22, 22, 22, 22, 22, 21,
+        22, 22, 22, 23, 23, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 20, 19, 19,
+        28, 27, 26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 21, 21, 22, 22, 22, 23,
+        22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 20, 19, 19, 25, 24, 24, 24,
+        24, 24, 23, 23, 22, 22, 21, 21, 20, 20, 20, 21, 21, 21, 21, 20, 20, 20,
+        20, 20, 20, 20, 19, 19, 19, 19, 18, 18, 25, 24, 24, 24, 24, 24, 23, 23,
+        22, 22, 21, 21, 20, 20, 20, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 20,
+        19, 19, 19, 19, 18, 18, 22, 22, 22, 22, 23, 22, 22, 22, 22, 21, 20, 20,
+        20, 20, 20, 20, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18,
+        17, 17, 21, 21, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 17, 17, 21, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19,
+        19, 18, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 21, 22, 22, 22, 23, 22,
+        22, 22, 22, 22, 21, 21, 20, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18,
+        18, 17, 17, 17, 17, 17, 16, 16, 21, 22, 22, 22, 23, 23, 22, 22, 22, 22,
+        21, 21, 20, 19, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17,
+        16, 16, 16, 16, 21, 22, 22, 23, 23, 23, 23, 23, 23, 23, 21, 21, 20, 19,
+        19, 19, 18, 18, 17, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15,
+        21, 22, 22, 22, 23, 23, 23, 23, 23, 22, 21, 21, 20, 19, 19, 18, 18, 17,
+        17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 15, 15, 20, 21, 21, 22,
+        22, 22, 22, 22, 22, 22, 20, 20, 19, 19, 19, 18, 18, 17, 17, 17, 17, 16,
+        16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 20, 21, 21, 22, 22, 22, 22, 22,
+        22, 22, 20, 20, 19, 19, 19, 18, 18, 17, 17, 17, 17, 16, 16, 16, 16, 16,
+        15, 15, 15, 15, 14, 14, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 20, 20,
+        19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14,
+        14, 14, 20, 20, 20, 21, 22, 22, 22, 22, 22, 22, 20, 20, 19, 19, 18, 18,
+        17, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 19, 20,
+        20, 20, 21, 21, 21, 21, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16,
+        16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 19, 20, 20, 20, 21, 21,
+        21, 21, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 15, 15, 15,
+        14, 14, 14, 14, 14, 13, 13, 13, 18, 19, 19, 20, 20, 20, 20, 20, 21, 21,
+        20, 20, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 14, 14, 14, 14, 14,
+        13, 13, 13, 13, 18, 19, 19, 20, 20, 20, 20, 20, 20, 20, 19, 19, 18, 18,
+        18, 17, 17, 16, 16, 15, 15, 15, 15, 14, 14, 14, 13, 13, 13, 13, 12, 12,
+        18, 19, 19, 19, 20, 20, 20, 20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16,
+        16, 15, 15, 14, 14, 14, 14, 14, 13, 13, 13, 13, 12, 12, 17, 18, 18, 19,
+        19, 19, 19, 19, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 15, 15, 14,
+        14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 17, 18, 18, 19, 19, 19, 19, 19,
+        20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 15, 15, 14, 14, 14, 13, 13,
+        13, 13, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18, 18, 19, 19, 19, 18, 18,
+        17, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 12, 12, 12,
+        12, 12, 16, 17, 17, 18, 18, 18, 18, 19, 19, 19, 18, 18, 17, 17, 17, 16,
+        16, 15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 12, 12,
+        /* Size 4x8 */
+        33, 24, 22, 19, 31, 23, 23, 20, 26, 22, 22, 20, 22, 20, 19, 18, 23, 21,
+        17, 16, 21, 20, 17, 15, 20, 20, 16, 14, 19, 19, 16, 13,
+        /* Size 8x4 */
+        33, 31, 26, 22, 23, 21, 20, 19, 24, 23, 22, 20, 21, 20, 20, 19, 22, 23,
+        22, 19, 17, 17, 16, 16, 19, 20, 20, 18, 16, 15, 14, 13,
+        /* Size 8x16 */
+        32, 33, 28, 21, 21, 20, 18, 17, 33, 33, 27, 22, 22, 20, 19, 18, 34, 32,
+        26, 22, 23, 21, 20, 19, 31, 28, 24, 22, 22, 22, 20, 19, 28, 26, 22, 22,
+        23, 22, 21, 20, 24, 24, 22, 20, 21, 20, 19, 18, 22, 22, 21, 20, 19, 19,
+        19, 18, 21, 22, 22, 19, 19, 18, 18, 17, 21, 23, 22, 19, 18, 17, 17, 16,
+        21, 23, 22, 19, 18, 17, 16, 16, 20, 22, 22, 19, 17, 16, 16, 15, 20, 21,
+        22, 19, 17, 16, 15, 14, 19, 20, 21, 19, 17, 15, 14, 13, 18, 20, 20, 18,
+        16, 15, 14, 13, 17, 19, 20, 18, 16, 14, 13, 12, 16, 18, 19, 17, 15, 14,
+        13, 12,
+        /* Size 16x8 */
+        32, 33, 34, 31, 28, 24, 22, 21, 21, 21, 20, 20, 19, 18, 17, 16, 33, 33,
+        32, 28, 26, 24, 22, 22, 23, 23, 22, 21, 20, 20, 19, 18, 28, 27, 26, 24,
+        22, 22, 21, 22, 22, 22, 22, 22, 21, 20, 20, 19, 21, 22, 22, 22, 22, 20,
+        20, 19, 19, 19, 19, 19, 19, 18, 18, 17, 21, 22, 23, 22, 23, 21, 19, 19,
+        18, 18, 17, 17, 17, 16, 16, 15, 20, 20, 21, 22, 22, 20, 19, 18, 17, 17,
+        16, 16, 15, 15, 14, 14, 18, 19, 20, 20, 21, 19, 19, 18, 17, 16, 16, 15,
+        14, 14, 13, 13, 17, 18, 19, 19, 20, 18, 18, 17, 16, 16, 15, 14, 13, 13,
+        12, 12,
+        /* Size 16x32 */
+        32, 33, 33, 29, 28, 24, 21, 21, 21, 21, 20, 20, 18, 18, 17, 16, 33, 33,
+        33, 28, 27, 24, 22, 22, 22, 22, 20, 20, 19, 19, 18, 17, 33, 33, 33, 28,
+        27, 24, 22, 22, 22, 22, 20, 20, 19, 19, 18, 17, 34, 32, 32, 28, 26, 24,
+        22, 22, 22, 22, 21, 21, 20, 20, 18, 18, 34, 32, 32, 28, 26, 24, 22, 22,
+        23, 23, 21, 21, 20, 20, 19, 18, 32, 31, 30, 26, 25, 23, 22, 22, 23, 23,
+        21, 21, 20, 20, 19, 18, 31, 29, 28, 26, 24, 23, 22, 22, 22, 22, 22, 22,
+        20, 20, 19, 18, 30, 28, 28, 24, 23, 23, 22, 22, 23, 22, 22, 22, 20, 20,
+        19, 19, 28, 26, 26, 23, 22, 22, 22, 22, 23, 22, 22, 22, 21, 20, 20, 19,
+        28, 26, 26, 23, 22, 22, 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 24, 24,
+        24, 22, 22, 21, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 24, 24, 24, 22,
+        22, 21, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 22, 22, 22, 22, 21, 20,
+        20, 20, 19, 19, 19, 19, 19, 18, 18, 17, 21, 22, 22, 22, 21, 20, 19, 19,
+        19, 19, 19, 19, 18, 18, 17, 17, 21, 22, 22, 22, 22, 20, 19, 19, 19, 19,
+        18, 18, 18, 18, 17, 17, 21, 22, 22, 22, 22, 20, 19, 19, 18, 18, 18, 18,
+        17, 17, 17, 16, 21, 22, 23, 22, 22, 21, 19, 19, 18, 18, 17, 17, 17, 17,
+        16, 16, 21, 23, 23, 23, 22, 21, 19, 19, 18, 17, 17, 17, 16, 16, 16, 15,
+        21, 22, 23, 22, 22, 21, 19, 19, 18, 17, 17, 17, 16, 16, 16, 15, 20, 22,
+        22, 22, 22, 20, 19, 19, 17, 17, 16, 16, 16, 15, 15, 14, 20, 22, 22, 22,
+        22, 20, 19, 19, 17, 17, 16, 16, 16, 15, 15, 14, 20, 21, 21, 22, 22, 20,
+        19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 20, 21, 21, 22, 22, 20, 19, 18,
+        17, 17, 16, 16, 15, 14, 14, 14, 19, 20, 21, 21, 21, 20, 19, 18, 17, 17,
+        15, 15, 14, 14, 14, 13, 19, 20, 20, 21, 21, 20, 19, 18, 17, 16, 15, 15,
+        14, 14, 13, 13, 19, 20, 20, 20, 21, 20, 18, 18, 16, 16, 15, 15, 14, 14,
+        13, 13, 18, 20, 20, 20, 20, 19, 18, 18, 16, 16, 15, 15, 14, 13, 13, 12,
+        18, 19, 19, 20, 20, 19, 18, 17, 16, 16, 14, 14, 13, 13, 13, 12, 17, 19,
+        19, 19, 20, 19, 18, 17, 16, 16, 14, 14, 13, 13, 12, 12, 17, 19, 19, 19,
+        19, 19, 17, 17, 16, 16, 14, 14, 13, 13, 12, 12, 16, 18, 18, 18, 19, 18,
+        17, 17, 15, 15, 14, 14, 13, 12, 12, 12, 16, 18, 18, 18, 19, 18, 17, 17,
+        15, 15, 14, 14, 13, 12, 12, 12,
+        /* Size 32x16 */
+        32, 33, 33, 34, 34, 32, 31, 30, 28, 28, 24, 24, 22, 21, 21, 21, 21, 21,
+        21, 20, 20, 20, 20, 19, 19, 19, 18, 18, 17, 17, 16, 16, 33, 33, 33, 32,
+        32, 31, 29, 28, 26, 26, 24, 24, 22, 22, 22, 22, 22, 23, 22, 22, 22, 21,
+        21, 20, 20, 20, 20, 19, 19, 19, 18, 18, 33, 33, 33, 32, 32, 30, 28, 28,
+        26, 26, 24, 24, 22, 22, 22, 22, 23, 23, 23, 22, 22, 21, 21, 21, 20, 20,
+        20, 19, 19, 19, 18, 18, 29, 28, 28, 28, 28, 26, 26, 24, 23, 23, 22, 22,
+        22, 22, 22, 22, 22, 23, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19,
+        18, 18, 28, 27, 27, 26, 26, 25, 24, 23, 22, 22, 22, 22, 21, 21, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 19, 19, 19, 24, 24,
+        24, 24, 24, 23, 23, 23, 22, 22, 21, 21, 20, 20, 20, 20, 21, 21, 21, 20,
+        20, 20, 20, 20, 20, 20, 19, 19, 19, 19, 18, 18, 21, 22, 22, 22, 22, 22,
+        22, 22, 22, 21, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        19, 18, 18, 18, 18, 17, 17, 17, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 17,
+        17, 17, 17, 17, 21, 22, 22, 22, 23, 23, 22, 23, 23, 22, 21, 21, 19, 19,
+        19, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15,
+        21, 22, 22, 22, 23, 23, 22, 22, 22, 22, 21, 21, 19, 19, 19, 18, 18, 17,
+        17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 15, 15, 20, 20, 20, 21,
+        21, 21, 22, 22, 22, 22, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16,
+        16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 20, 20, 20, 21, 21, 21, 22, 22,
+        22, 22, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 16, 15, 15, 15,
+        15, 14, 14, 14, 14, 14, 18, 19, 19, 20, 20, 20, 20, 20, 21, 21, 19, 19,
+        19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 13,
+        13, 13, 18, 19, 19, 20, 20, 20, 20, 20, 20, 20, 19, 19, 18, 18, 18, 17,
+        17, 16, 16, 15, 15, 15, 14, 14, 14, 14, 13, 13, 13, 13, 12, 12, 17, 18,
+        18, 18, 19, 19, 19, 19, 20, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 15,
+        15, 14, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18,
+        18, 19, 19, 19, 18, 18, 17, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13,
+        13, 13, 12, 12, 12, 12, 12, 12,
+        /* Size 4x16 */
+        33, 24, 21, 18, 33, 24, 22, 19, 32, 24, 23, 20, 29, 23, 22, 20, 26, 22,
+        22, 20, 24, 21, 21, 19, 22, 20, 19, 18, 22, 20, 19, 18, 22, 21, 18, 17,
+        22, 21, 17, 16, 22, 20, 17, 15, 21, 20, 17, 14, 20, 20, 16, 14, 20, 19,
+        16, 13, 19, 19, 16, 13, 18, 18, 15, 12,
+        /* Size 16x4 */
+        33, 33, 32, 29, 26, 24, 22, 22, 22, 22, 22, 21, 20, 20, 19, 18, 24, 24,
+        24, 23, 22, 21, 20, 20, 21, 21, 20, 20, 20, 19, 19, 18, 21, 22, 23, 22,
+        22, 21, 19, 19, 18, 17, 17, 17, 16, 16, 16, 15, 18, 19, 20, 20, 20, 19,
+        18, 18, 17, 16, 15, 14, 14, 13, 13, 12,
+        /* Size 8x32 */
+        32, 33, 28, 21, 21, 20, 18, 17, 33, 33, 27, 22, 22, 20, 19, 18, 33, 33,
+        27, 22, 22, 20, 19, 18, 34, 32, 26, 22, 22, 21, 20, 18, 34, 32, 26, 22,
+        23, 21, 20, 19, 32, 30, 25, 22, 23, 21, 20, 19, 31, 28, 24, 22, 22, 22,
+        20, 19, 30, 28, 23, 22, 23, 22, 20, 19, 28, 26, 22, 22, 23, 22, 21, 20,
+        28, 26, 22, 21, 22, 22, 21, 19, 24, 24, 22, 20, 21, 20, 19, 18, 24, 24,
+        22, 20, 21, 20, 19, 18, 22, 22, 21, 20, 19, 19, 19, 18, 21, 22, 21, 19,
+        19, 19, 18, 17, 21, 22, 22, 19, 19, 18, 18, 17, 21, 22, 22, 19, 18, 18,
+        17, 17, 21, 23, 22, 19, 18, 17, 17, 16, 21, 23, 22, 19, 18, 17, 16, 16,
+        21, 23, 22, 19, 18, 17, 16, 16, 20, 22, 22, 19, 17, 16, 16, 15, 20, 22,
+        22, 19, 17, 16, 16, 15, 20, 21, 22, 19, 17, 16, 15, 14, 20, 21, 22, 19,
+        17, 16, 15, 14, 19, 21, 21, 19, 17, 15, 14, 14, 19, 20, 21, 19, 17, 15,
+        14, 13, 19, 20, 21, 18, 16, 15, 14, 13, 18, 20, 20, 18, 16, 15, 14, 13,
+        18, 19, 20, 18, 16, 14, 13, 13, 17, 19, 20, 18, 16, 14, 13, 12, 17, 19,
+        19, 17, 16, 14, 13, 12, 16, 18, 19, 17, 15, 14, 13, 12, 16, 18, 19, 17,
+        15, 14, 13, 12,
+        /* Size 32x8 */
+        32, 33, 33, 34, 34, 32, 31, 30, 28, 28, 24, 24, 22, 21, 21, 21, 21, 21,
+        21, 20, 20, 20, 20, 19, 19, 19, 18, 18, 17, 17, 16, 16, 33, 33, 33, 32,
+        32, 30, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 23, 23, 22, 22, 21,
+        21, 21, 20, 20, 20, 19, 19, 19, 18, 18, 28, 27, 27, 26, 26, 25, 24, 23,
+        22, 22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21,
+        20, 20, 20, 19, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20,
+        20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 17,
+        17, 17, 21, 22, 22, 22, 23, 23, 22, 23, 23, 22, 21, 21, 19, 19, 19, 18,
+        18, 18, 18, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 20, 20,
+        20, 21, 21, 21, 22, 22, 22, 22, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16,
+        16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 18, 19, 19, 20, 20, 20,
+        20, 20, 21, 21, 19, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 14,
+        14, 14, 14, 13, 13, 13, 13, 13, 17, 18, 18, 18, 19, 19, 19, 19, 20, 19,
+        18, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 14, 13, 13, 13, 13,
+        12, 12, 12, 12 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 31, 24, 19, 31, 27, 22, 18, 24, 22, 16, 14, 19, 18, 14, 11,
+        /* Size 8x8 */
+        33, 32, 32, 30, 27, 22, 20, 16, 32, 32, 32, 30, 28, 23, 21, 17, 32, 32,
+        29, 28, 26, 23, 21, 18, 30, 30, 28, 24, 22, 20, 18, 16, 27, 28, 26, 22,
+        19, 17, 16, 14, 22, 23, 23, 20, 17, 15, 14, 12, 20, 21, 21, 18, 16, 14,
+        12, 11, 16, 17, 18, 16, 14, 12, 11, 10,
+        /* Size 16x16 */
+        32, 33, 33, 33, 32, 32, 30, 28, 27, 25, 23, 21, 19, 18, 17, 16, 33, 32,
+        32, 32, 32, 32, 30, 29, 27, 26, 24, 22, 20, 19, 18, 17, 33, 32, 32, 32,
+        32, 32, 31, 30, 28, 27, 25, 23, 21, 19, 18, 17, 33, 32, 32, 31, 31, 31,
+        29, 28, 27, 26, 24, 23, 21, 19, 18, 17, 32, 32, 32, 31, 30, 30, 28, 28,
+        26, 26, 24, 23, 21, 19, 19, 17, 32, 32, 32, 31, 30, 29, 28, 27, 26, 25,
+        24, 22, 21, 20, 19, 18, 30, 30, 31, 29, 28, 28, 26, 24, 23, 22, 22, 20,
+        19, 18, 17, 16, 28, 29, 30, 28, 28, 27, 24, 21, 20, 20, 19, 18, 17, 16,
+        16, 15, 27, 27, 28, 27, 26, 26, 23, 20, 20, 19, 18, 17, 16, 15, 15, 14,
+        25, 26, 27, 26, 26, 25, 22, 20, 19, 18, 17, 16, 15, 15, 14, 14, 23, 24,
+        25, 24, 24, 24, 22, 19, 18, 17, 16, 15, 14, 14, 13, 13, 21, 22, 23, 23,
+        23, 22, 20, 18, 17, 16, 15, 14, 13, 13, 12, 12, 19, 20, 21, 21, 21, 21,
+        19, 17, 16, 15, 14, 13, 12, 12, 12, 11, 18, 19, 19, 19, 19, 20, 18, 16,
+        15, 15, 14, 13, 12, 11, 11, 11, 17, 18, 18, 18, 19, 19, 17, 16, 15, 14,
+        13, 12, 12, 11, 11, 10, 16, 17, 17, 17, 17, 18, 16, 15, 14, 14, 13, 12,
+        11, 11, 10, 10,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 31, 30, 30, 28, 28, 27, 26,
+        25, 23, 23, 22, 21, 20, 19, 19, 18, 17, 17, 16, 16, 15, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 27, 27, 26, 24, 24, 22,
+        22, 21, 20, 20, 18, 18, 17, 16, 16, 15, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 30, 30, 29, 29, 27, 27, 26, 24, 24, 23, 22, 21, 20, 20,
+        19, 18, 18, 17, 17, 15, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        30, 30, 29, 29, 28, 27, 26, 24, 24, 23, 23, 22, 20, 20, 19, 19, 18, 17,
+        17, 16, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30,
+        28, 28, 27, 25, 25, 23, 23, 22, 21, 21, 19, 19, 18, 17, 17, 16, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 28, 28, 27, 25,
+        25, 23, 23, 22, 21, 21, 19, 19, 18, 17, 17, 16, 33, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 31, 30, 29, 29, 28, 28, 27, 26, 26, 24, 24, 23, 23, 22,
+        21, 21, 19, 19, 18, 17, 17, 16, 33, 32, 32, 32, 32, 32, 31, 31, 31, 30,
+        30, 29, 29, 28, 28, 28, 27, 26, 26, 24, 24, 23, 23, 22, 20, 20, 19, 19,
+        18, 17, 17, 16, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 29, 28, 28,
+        28, 28, 26, 26, 26, 24, 24, 23, 23, 22, 21, 21, 19, 19, 19, 17, 17, 16,
+        32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 28, 28, 28, 27, 27, 26, 26,
+        25, 24, 24, 23, 22, 22, 21, 21, 20, 19, 19, 18, 18, 17, 32, 32, 32, 32,
+        32, 32, 31, 30, 30, 29, 29, 28, 28, 28, 27, 27, 26, 26, 25, 24, 24, 23,
+        22, 22, 21, 21, 20, 19, 19, 18, 18, 17, 31, 31, 31, 31, 31, 31, 30, 29,
+        29, 28, 28, 27, 26, 26, 24, 24, 24, 23, 23, 22, 22, 21, 20, 20, 19, 19,
+        18, 18, 17, 17, 17, 16, 30, 30, 30, 30, 31, 31, 29, 29, 28, 28, 28, 26,
+        26, 25, 24, 24, 23, 23, 22, 22, 22, 20, 20, 20, 19, 19, 18, 18, 17, 16,
+        16, 15, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26, 25, 24, 23, 23,
+        22, 22, 21, 20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 15, 28, 29,
+        29, 29, 30, 30, 28, 28, 28, 27, 27, 24, 24, 23, 21, 21, 20, 20, 20, 19,
+        19, 18, 18, 18, 17, 17, 16, 16, 16, 15, 15, 14, 28, 29, 29, 29, 30, 30,
+        28, 28, 28, 27, 27, 24, 24, 23, 21, 21, 20, 20, 20, 19, 19, 18, 18, 18,
+        17, 17, 16, 16, 16, 15, 15, 14, 27, 27, 27, 28, 28, 28, 27, 27, 26, 26,
+        26, 24, 23, 22, 20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 15, 15,
+        15, 14, 14, 13, 26, 27, 27, 27, 28, 28, 26, 26, 26, 26, 26, 23, 23, 22,
+        20, 20, 19, 19, 19, 18, 18, 17, 17, 17, 16, 16, 15, 15, 15, 14, 14, 13,
+        25, 26, 26, 26, 27, 27, 26, 26, 26, 25, 25, 23, 22, 21, 20, 20, 19, 19,
+        18, 17, 17, 17, 16, 16, 15, 15, 15, 14, 14, 14, 14, 13, 23, 24, 24, 24,
+        25, 25, 24, 24, 24, 24, 24, 22, 22, 20, 19, 19, 18, 18, 17, 16, 16, 16,
+        15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 23, 24, 24, 24, 25, 25, 24, 24,
+        24, 24, 24, 22, 22, 20, 19, 19, 18, 18, 17, 16, 16, 16, 15, 15, 14, 14,
+        14, 14, 13, 13, 13, 12, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23, 21,
+        20, 20, 18, 18, 17, 17, 17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 13, 12,
+        12, 12, 21, 22, 22, 23, 23, 23, 23, 23, 23, 22, 22, 20, 20, 19, 18, 18,
+        17, 17, 16, 15, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 20, 21,
+        21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 20, 19, 18, 18, 17, 17, 16, 15,
+        15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12, 11, 19, 20, 20, 20, 21, 21,
+        21, 20, 21, 21, 21, 19, 19, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13,
+        12, 12, 12, 12, 12, 11, 11, 11, 19, 20, 20, 20, 21, 21, 21, 20, 21, 21,
+        21, 19, 19, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12,
+        12, 11, 11, 11, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 18, 18, 17,
+        16, 16, 15, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 10,
+        17, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 17, 16, 16, 15, 15,
+        14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 17, 17, 18, 18,
+        18, 18, 18, 18, 19, 19, 19, 17, 17, 17, 16, 16, 15, 15, 14, 13, 13, 13,
+        12, 12, 12, 12, 11, 11, 11, 10, 10, 10, 16, 16, 17, 17, 17, 17, 17, 17,
+        17, 18, 18, 17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12, 11, 11,
+        11, 10, 10, 10, 10, 9, 16, 16, 17, 17, 17, 17, 17, 17, 17, 18, 18, 17,
+        16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10,
+        10, 9, 15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 16, 15, 15, 14, 14,
+        13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9,
+        /* Size 4x8 */
+        32, 32, 24, 18, 32, 31, 25, 19, 32, 29, 24, 20, 30, 28, 20, 17, 27, 26,
+        18, 15, 23, 23, 16, 13, 20, 20, 14, 12, 17, 18, 13, 11,
+        /* Size 8x4 */
+        32, 32, 32, 30, 27, 23, 20, 17, 32, 31, 29, 28, 26, 23, 20, 18, 24, 25,
+        24, 20, 18, 16, 14, 13, 18, 19, 20, 17, 15, 13, 12, 11,
+        /* Size 8x16 */
+        32, 33, 32, 29, 26, 23, 19, 16, 33, 32, 32, 29, 27, 24, 20, 17, 33, 32,
+        31, 30, 28, 25, 21, 17, 33, 32, 30, 29, 27, 24, 21, 17, 32, 32, 30, 28,
+        26, 24, 21, 18, 32, 31, 29, 28, 26, 24, 21, 18, 30, 30, 28, 25, 23, 21,
+        19, 16, 28, 30, 27, 22, 20, 19, 17, 15, 27, 28, 26, 22, 20, 18, 16, 14,
+        25, 26, 25, 21, 19, 17, 15, 13, 23, 25, 24, 20, 18, 16, 14, 13, 21, 23,
+        22, 19, 17, 15, 13, 12, 19, 21, 20, 18, 16, 14, 12, 11, 18, 19, 19, 17,
+        15, 14, 12, 11, 17, 18, 18, 16, 15, 13, 12, 10, 16, 17, 18, 16, 14, 13,
+        11, 10,
+        /* Size 16x8 */
+        32, 33, 33, 33, 32, 32, 30, 28, 27, 25, 23, 21, 19, 18, 17, 16, 33, 32,
+        32, 32, 32, 31, 30, 30, 28, 26, 25, 23, 21, 19, 18, 17, 32, 32, 31, 30,
+        30, 29, 28, 27, 26, 25, 24, 22, 20, 19, 18, 18, 29, 29, 30, 29, 28, 28,
+        25, 22, 22, 21, 20, 19, 18, 17, 16, 16, 26, 27, 28, 27, 26, 26, 23, 20,
+        20, 19, 18, 17, 16, 15, 15, 14, 23, 24, 25, 24, 24, 24, 21, 19, 18, 17,
+        16, 15, 14, 14, 13, 13, 19, 20, 21, 21, 21, 21, 19, 17, 16, 15, 14, 13,
+        12, 12, 12, 11, 16, 17, 17, 17, 18, 18, 16, 15, 14, 13, 13, 12, 11, 11,
+        10, 10,
+        /* Size 16x32 */
+        32, 33, 33, 33, 32, 32, 29, 28, 26, 23, 23, 20, 19, 18, 16, 16, 33, 32,
+        32, 32, 32, 32, 29, 29, 27, 24, 24, 21, 20, 18, 16, 16, 33, 32, 32, 32,
+        32, 32, 29, 29, 27, 24, 24, 21, 20, 19, 17, 17, 33, 32, 32, 32, 32, 32,
+        30, 29, 28, 25, 25, 21, 20, 19, 17, 17, 33, 32, 32, 32, 31, 31, 30, 30,
+        28, 25, 25, 22, 21, 19, 17, 17, 33, 32, 32, 32, 31, 31, 30, 30, 28, 25,
+        25, 22, 21, 19, 17, 17, 33, 32, 32, 31, 30, 30, 29, 28, 27, 24, 24, 21,
+        21, 19, 17, 17, 32, 32, 32, 31, 30, 30, 28, 28, 27, 24, 24, 21, 20, 19,
+        17, 17, 32, 32, 32, 31, 30, 30, 28, 28, 26, 24, 24, 21, 21, 19, 18, 18,
+        32, 32, 31, 30, 29, 29, 28, 27, 26, 24, 24, 21, 21, 20, 18, 18, 32, 32,
+        31, 30, 29, 29, 28, 27, 26, 24, 24, 21, 21, 20, 18, 18, 31, 31, 31, 29,
+        28, 28, 26, 25, 24, 22, 22, 20, 19, 18, 17, 17, 30, 30, 30, 29, 28, 28,
+        25, 24, 23, 21, 21, 19, 19, 18, 16, 16, 30, 30, 30, 29, 28, 28, 24, 23,
+        22, 20, 20, 19, 18, 17, 16, 16, 28, 29, 30, 28, 27, 27, 22, 21, 20, 19,
+        19, 18, 17, 16, 15, 15, 28, 29, 30, 28, 27, 27, 22, 21, 20, 19, 19, 18,
+        17, 16, 15, 15, 27, 28, 28, 27, 26, 26, 22, 20, 20, 18, 18, 17, 16, 15,
+        14, 14, 26, 27, 28, 26, 26, 26, 21, 20, 19, 18, 18, 16, 16, 15, 14, 14,
+        25, 26, 26, 26, 25, 25, 21, 20, 19, 17, 17, 16, 15, 15, 13, 13, 23, 25,
+        25, 24, 24, 24, 20, 19, 18, 16, 16, 15, 14, 14, 13, 13, 23, 25, 25, 24,
+        24, 24, 20, 19, 18, 16, 16, 15, 14, 14, 13, 13, 22, 23, 23, 23, 23, 23,
+        19, 18, 17, 16, 16, 14, 14, 13, 12, 12, 21, 23, 23, 23, 22, 22, 19, 18,
+        17, 15, 15, 14, 13, 13, 12, 12, 20, 22, 22, 22, 22, 22, 19, 18, 17, 15,
+        15, 13, 13, 12, 12, 12, 19, 20, 21, 20, 20, 20, 18, 17, 16, 14, 14, 13,
+        12, 12, 11, 11, 19, 20, 21, 20, 20, 20, 18, 17, 16, 14, 14, 13, 12, 12,
+        11, 11, 18, 19, 19, 19, 19, 19, 17, 16, 15, 14, 14, 12, 12, 11, 11, 11,
+        18, 19, 19, 19, 19, 19, 17, 16, 15, 14, 14, 12, 12, 11, 10, 10, 17, 18,
+        18, 18, 18, 18, 16, 16, 15, 13, 13, 12, 12, 11, 10, 10, 16, 17, 17, 17,
+        18, 18, 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 16, 17, 17, 17, 18, 18,
+        16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 15, 16, 16, 16, 17, 17, 15, 14,
+        13, 12, 12, 11, 11, 10, 9, 9,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31, 30, 30, 28, 28, 27, 26,
+        25, 23, 23, 22, 21, 20, 19, 19, 18, 18, 17, 16, 16, 15, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 28, 27, 26, 25, 25, 23,
+        23, 22, 20, 20, 19, 19, 18, 17, 17, 16, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 30, 30, 30, 30, 28, 28, 26, 25, 25, 23, 23, 22, 21, 21,
+        19, 19, 18, 17, 17, 16, 33, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 29,
+        29, 29, 28, 28, 27, 26, 26, 24, 24, 23, 23, 22, 20, 20, 19, 19, 18, 17,
+        17, 16, 32, 32, 32, 32, 31, 31, 30, 30, 30, 29, 29, 28, 28, 28, 27, 27,
+        26, 26, 25, 24, 24, 23, 22, 22, 20, 20, 19, 19, 18, 18, 18, 17, 32, 32,
+        32, 32, 31, 31, 30, 30, 30, 29, 29, 28, 28, 28, 27, 27, 26, 26, 25, 24,
+        24, 23, 22, 22, 20, 20, 19, 19, 18, 18, 18, 17, 29, 29, 29, 30, 30, 30,
+        29, 28, 28, 28, 28, 26, 25, 24, 22, 22, 22, 21, 21, 20, 20, 19, 19, 19,
+        18, 18, 17, 17, 16, 16, 16, 15, 28, 29, 29, 29, 30, 30, 28, 28, 28, 27,
+        27, 25, 24, 23, 21, 21, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 16, 16,
+        16, 15, 15, 14, 26, 27, 27, 28, 28, 28, 27, 27, 26, 26, 26, 24, 23, 22,
+        20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 15, 15, 15, 14, 14, 13,
+        23, 24, 24, 25, 25, 25, 24, 24, 24, 24, 24, 22, 21, 20, 19, 19, 18, 18,
+        17, 16, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 23, 24, 24, 25,
+        25, 25, 24, 24, 24, 24, 24, 22, 21, 20, 19, 19, 18, 18, 17, 16, 16, 16,
+        15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 20, 21, 21, 21, 22, 22, 21, 21,
+        21, 21, 21, 20, 19, 19, 18, 18, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13,
+        12, 12, 12, 12, 12, 11, 19, 20, 20, 20, 21, 21, 21, 20, 21, 21, 21, 19,
+        19, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12, 11,
+        11, 11, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 18, 18, 17, 16, 16,
+        15, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 11, 10, 16, 16,
+        17, 17, 17, 17, 17, 17, 18, 18, 18, 17, 16, 16, 15, 15, 14, 14, 13, 13,
+        13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 16, 16, 17, 17, 17, 17,
+        17, 17, 18, 18, 18, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12,
+        11, 11, 11, 10, 10, 10, 10, 9,
+        /* Size 4x16 */
+        33, 32, 23, 18, 32, 32, 24, 19, 32, 31, 25, 19, 32, 30, 24, 19, 32, 30,
+        24, 19, 32, 29, 24, 20, 30, 28, 21, 18, 29, 27, 19, 16, 28, 26, 18, 15,
+        26, 25, 17, 15, 25, 24, 16, 14, 23, 22, 15, 13, 20, 20, 14, 12, 19, 19,
+        14, 11, 18, 18, 13, 11, 17, 18, 13, 11,
+        /* Size 16x4 */
+        33, 32, 32, 32, 32, 32, 30, 29, 28, 26, 25, 23, 20, 19, 18, 17, 32, 32,
+        31, 30, 30, 29, 28, 27, 26, 25, 24, 22, 20, 19, 18, 18, 23, 24, 25, 24,
+        24, 24, 21, 19, 18, 17, 16, 15, 14, 14, 13, 13, 18, 19, 19, 19, 19, 20,
+        18, 16, 15, 15, 14, 13, 12, 11, 11, 11,
+        /* Size 8x32 */
+        32, 33, 32, 29, 26, 23, 19, 16, 33, 32, 32, 29, 27, 24, 20, 16, 33, 32,
+        32, 29, 27, 24, 20, 17, 33, 32, 32, 30, 28, 25, 20, 17, 33, 32, 31, 30,
+        28, 25, 21, 17, 33, 32, 31, 30, 28, 25, 21, 17, 33, 32, 30, 29, 27, 24,
+        21, 17, 32, 32, 30, 28, 27, 24, 20, 17, 32, 32, 30, 28, 26, 24, 21, 18,
+        32, 31, 29, 28, 26, 24, 21, 18, 32, 31, 29, 28, 26, 24, 21, 18, 31, 31,
+        28, 26, 24, 22, 19, 17, 30, 30, 28, 25, 23, 21, 19, 16, 30, 30, 28, 24,
+        22, 20, 18, 16, 28, 30, 27, 22, 20, 19, 17, 15, 28, 30, 27, 22, 20, 19,
+        17, 15, 27, 28, 26, 22, 20, 18, 16, 14, 26, 28, 26, 21, 19, 18, 16, 14,
+        25, 26, 25, 21, 19, 17, 15, 13, 23, 25, 24, 20, 18, 16, 14, 13, 23, 25,
+        24, 20, 18, 16, 14, 13, 22, 23, 23, 19, 17, 16, 14, 12, 21, 23, 22, 19,
+        17, 15, 13, 12, 20, 22, 22, 19, 17, 15, 13, 12, 19, 21, 20, 18, 16, 14,
+        12, 11, 19, 21, 20, 18, 16, 14, 12, 11, 18, 19, 19, 17, 15, 14, 12, 11,
+        18, 19, 19, 17, 15, 14, 12, 10, 17, 18, 18, 16, 15, 13, 12, 10, 16, 17,
+        18, 16, 14, 13, 11, 10, 16, 17, 18, 16, 14, 13, 11, 10, 15, 16, 17, 15,
+        13, 12, 11, 9,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31, 30, 30, 28, 28, 27, 26,
+        25, 23, 23, 22, 21, 20, 19, 19, 18, 18, 17, 16, 16, 15, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 28, 28, 26, 25, 25, 23,
+        23, 22, 21, 21, 19, 19, 18, 17, 17, 16, 32, 32, 32, 32, 31, 31, 30, 30,
+        30, 29, 29, 28, 28, 28, 27, 27, 26, 26, 25, 24, 24, 23, 22, 22, 20, 20,
+        19, 19, 18, 18, 18, 17, 29, 29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 26,
+        25, 24, 22, 22, 22, 21, 21, 20, 20, 19, 19, 19, 18, 18, 17, 17, 16, 16,
+        16, 15, 26, 27, 27, 28, 28, 28, 27, 27, 26, 26, 26, 24, 23, 22, 20, 20,
+        20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 15, 15, 15, 14, 14, 13, 23, 24,
+        24, 25, 25, 25, 24, 24, 24, 24, 24, 22, 21, 20, 19, 19, 18, 18, 17, 16,
+        16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 19, 20, 20, 20, 21, 21,
+        21, 20, 21, 21, 21, 19, 19, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13,
+        12, 12, 12, 12, 12, 11, 11, 11, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18,
+        18, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 10,
+        10, 10, 10, 9 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 25, 22, 20, 25, 21, 21, 20, 22, 21, 18, 17, 20, 20, 17, 14,
+        /* Size 8x8 */
+        33, 33, 27, 23, 22, 21, 20, 19, 33, 32, 26, 23, 23, 22, 22, 20, 27, 26,
+        22, 22, 22, 22, 22, 20, 23, 23, 22, 20, 20, 20, 20, 19, 22, 23, 22, 20,
+        19, 18, 18, 17, 21, 22, 22, 20, 18, 17, 16, 16, 20, 22, 22, 20, 18, 16,
+        16, 15, 19, 20, 20, 19, 17, 16, 15, 13,
+        /* Size 16x16 */
+        32, 33, 34, 31, 30, 28, 25, 21, 21, 21, 21, 20, 20, 19, 19, 18, 33, 33,
+        33, 30, 28, 27, 24, 22, 22, 22, 22, 21, 20, 20, 19, 19, 34, 33, 32, 30,
+        28, 26, 24, 22, 23, 23, 23, 22, 22, 21, 20, 20, 31, 30, 30, 28, 26, 24,
+        23, 22, 22, 22, 23, 22, 22, 21, 20, 20, 30, 28, 28, 26, 24, 23, 22, 22,
+        22, 22, 23, 22, 22, 21, 21, 20, 28, 27, 26, 24, 23, 22, 22, 21, 22, 22,
+        23, 22, 22, 21, 21, 20, 25, 24, 24, 23, 22, 22, 21, 20, 20, 21, 21, 20,
+        20, 20, 20, 19, 21, 22, 22, 22, 22, 21, 20, 19, 19, 19, 19, 19, 19, 19,
+        18, 18, 21, 22, 23, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 18, 18, 17,
+        21, 22, 23, 22, 22, 22, 21, 19, 19, 19, 18, 18, 17, 17, 17, 17, 21, 22,
+        23, 23, 23, 23, 21, 19, 19, 18, 18, 17, 17, 17, 16, 16, 20, 21, 22, 22,
+        22, 22, 20, 19, 18, 18, 17, 17, 16, 16, 16, 15, 20, 20, 22, 22, 22, 22,
+        20, 19, 18, 17, 17, 16, 16, 15, 15, 15, 19, 20, 21, 21, 21, 21, 20, 19,
+        18, 17, 17, 16, 15, 15, 14, 14, 19, 19, 20, 20, 21, 21, 20, 18, 18, 17,
+        16, 16, 15, 14, 14, 14, 18, 19, 20, 20, 20, 20, 19, 18, 17, 17, 16, 15,
+        15, 14, 14, 13,
+        /* Size 32x32 */
+        32, 33, 33, 33, 34, 34, 31, 31, 30, 28, 28, 26, 25, 23, 21, 21, 21, 21,
+        21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 18, 18, 17, 33, 33, 33, 33,
+        33, 33, 31, 30, 28, 27, 27, 25, 24, 23, 21, 21, 22, 22, 22, 22, 22, 21,
+        21, 21, 20, 20, 20, 20, 19, 19, 19, 18, 33, 33, 33, 33, 33, 33, 30, 30,
+        28, 27, 27, 25, 24, 23, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20,
+        20, 20, 19, 19, 19, 18, 33, 33, 33, 33, 33, 33, 30, 29, 28, 26, 26, 25,
+        24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 19,
+        19, 19, 34, 33, 33, 33, 32, 32, 30, 29, 28, 26, 26, 24, 24, 23, 22, 22,
+        23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 34, 33,
+        33, 33, 32, 32, 30, 29, 28, 26, 26, 24, 24, 23, 22, 22, 23, 23, 23, 23,
+        23, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 31, 31, 30, 30, 30, 30,
+        28, 27, 26, 24, 24, 23, 23, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22,
+        22, 22, 21, 21, 20, 20, 20, 19, 31, 30, 30, 29, 29, 29, 27, 26, 26, 24,
+        24, 23, 23, 22, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 22, 22, 21, 21,
+        20, 20, 20, 19, 30, 28, 28, 28, 28, 28, 26, 26, 24, 23, 23, 23, 22, 22,
+        22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20,
+        28, 27, 27, 26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 22, 22,
+        22, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 28, 27, 27, 26,
+        26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 23, 23, 22,
+        22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 26, 25, 25, 25, 24, 24, 23, 23,
+        23, 22, 22, 21, 21, 21, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+        20, 20, 20, 20, 20, 19, 25, 24, 24, 24, 24, 24, 23, 23, 22, 22, 22, 21,
+        21, 21, 20, 20, 20, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 19,
+        19, 19, 23, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 21, 21, 20, 20, 20,
+        20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 19, 19, 19, 19, 19, 18, 21, 21,
+        22, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 19, 19, 19, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 21, 21, 22, 22, 22, 22,
+        22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        19, 19, 19, 19, 18, 18, 18, 18, 21, 22, 22, 22, 23, 23, 22, 22, 22, 22,
+        22, 21, 20, 20, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 18,
+        18, 17, 17, 17, 21, 22, 22, 22, 23, 23, 22, 22, 22, 22, 22, 21, 21, 20,
+        19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 18, 17, 17, 17, 17,
+        21, 22, 22, 22, 23, 23, 22, 22, 22, 22, 22, 21, 21, 20, 19, 19, 19, 19,
+        19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 16, 21, 22, 22, 22,
+        23, 23, 23, 23, 23, 23, 23, 21, 21, 20, 19, 19, 19, 19, 18, 18, 18, 17,
+        17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 21, 22, 22, 22, 23, 23, 23, 23,
+        23, 23, 23, 21, 21, 20, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17,
+        17, 17, 16, 16, 16, 16, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 21,
+        20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16,
+        16, 15, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 19, 19,
+        18, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 20, 21,
+        21, 21, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 19, 19, 18, 18, 18, 17,
+        17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 15, 20, 20, 20, 21, 22, 22,
+        22, 22, 22, 22, 22, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16,
+        16, 16, 15, 15, 15, 15, 15, 14, 20, 20, 20, 21, 22, 22, 22, 22, 22, 22,
+        22, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15,
+        15, 15, 15, 14, 19, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 20, 20, 19,
+        19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14,
+        19, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 20, 20, 19, 19, 19, 18, 18,
+        17, 17, 17, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 19, 19, 19, 20,
+        20, 20, 20, 20, 21, 21, 21, 20, 20, 19, 18, 18, 18, 17, 17, 16, 16, 16,
+        16, 15, 15, 15, 14, 14, 14, 14, 14, 13, 18, 19, 19, 19, 20, 20, 20, 20,
+        20, 20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 15, 15,
+        14, 14, 14, 13, 13, 13, 18, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20,
+        19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 13,
+        13, 13, 17, 18, 18, 19, 19, 19, 19, 19, 20, 20, 20, 19, 19, 18, 18, 18,
+        17, 17, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 13, 13, 13, 13,
+        /* Size 4x8 */
+        33, 27, 22, 20, 32, 26, 23, 21, 26, 22, 23, 21, 23, 22, 20, 19, 22, 22,
+        18, 18, 22, 22, 17, 16, 21, 22, 17, 15, 19, 20, 16, 14,
+        /* Size 8x4 */
+        33, 32, 26, 23, 22, 22, 21, 19, 27, 26, 22, 22, 22, 22, 22, 20, 22, 23,
+        23, 20, 18, 17, 17, 16, 20, 21, 21, 19, 18, 16, 15, 14,
+        /* Size 8x16 */
+        32, 33, 28, 23, 21, 21, 20, 18, 33, 33, 27, 23, 22, 22, 20, 19, 34, 32,
+        26, 23, 23, 23, 21, 20, 31, 29, 24, 22, 22, 23, 22, 20, 29, 28, 23, 22,
+        22, 23, 22, 20, 28, 26, 22, 22, 22, 23, 22, 20, 24, 24, 22, 21, 20, 21,
+        20, 19, 21, 22, 21, 20, 19, 19, 19, 18, 21, 22, 22, 20, 19, 19, 18, 17,
+        21, 23, 22, 20, 19, 18, 17, 17, 21, 23, 22, 20, 19, 18, 17, 16, 20, 22,
+        22, 20, 18, 17, 16, 15, 20, 21, 22, 19, 18, 17, 16, 14, 19, 21, 21, 19,
+        18, 17, 15, 14, 19, 20, 21, 19, 18, 16, 15, 14, 18, 20, 20, 19, 17, 16,
+        15, 13,
+        /* Size 16x8 */
+        32, 33, 34, 31, 29, 28, 24, 21, 21, 21, 21, 20, 20, 19, 19, 18, 33, 33,
+        32, 29, 28, 26, 24, 22, 22, 23, 23, 22, 21, 21, 20, 20, 28, 27, 26, 24,
+        23, 22, 22, 21, 22, 22, 22, 22, 22, 21, 21, 20, 23, 23, 23, 22, 22, 22,
+        21, 20, 20, 20, 20, 20, 19, 19, 19, 19, 21, 22, 23, 22, 22, 22, 20, 19,
+        19, 19, 19, 18, 18, 18, 18, 17, 21, 22, 23, 23, 23, 23, 21, 19, 19, 18,
+        18, 17, 17, 17, 16, 16, 20, 20, 21, 22, 22, 22, 20, 19, 18, 17, 17, 16,
+        16, 15, 15, 15, 18, 19, 20, 20, 20, 20, 19, 18, 17, 17, 16, 15, 14, 14,
+        14, 13,
+        /* Size 16x32 */
+        32, 33, 33, 31, 28, 28, 23, 21, 21, 21, 21, 20, 20, 19, 18, 18, 33, 33,
+        33, 30, 27, 27, 23, 22, 22, 22, 22, 20, 20, 20, 19, 19, 33, 33, 33, 30,
+        27, 27, 23, 22, 22, 22, 22, 21, 20, 20, 19, 19, 33, 33, 32, 30, 26, 26,
+        23, 22, 22, 22, 22, 21, 21, 20, 19, 19, 34, 32, 32, 29, 26, 26, 23, 22,
+        23, 23, 23, 22, 21, 21, 20, 20, 34, 32, 32, 29, 26, 26, 23, 22, 23, 23,
+        23, 22, 21, 21, 20, 20, 31, 30, 29, 28, 24, 24, 22, 22, 22, 23, 23, 22,
+        22, 21, 20, 20, 31, 29, 28, 27, 24, 24, 22, 22, 22, 22, 22, 22, 22, 21,
+        20, 20, 29, 28, 28, 26, 23, 23, 22, 22, 22, 23, 23, 22, 22, 21, 20, 20,
+        28, 26, 26, 24, 22, 22, 22, 22, 22, 23, 23, 22, 22, 21, 20, 20, 28, 26,
+        26, 24, 22, 22, 22, 22, 22, 23, 23, 22, 22, 21, 20, 20, 25, 24, 24, 23,
+        22, 22, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 24, 24, 24, 23, 22, 22,
+        21, 20, 20, 21, 21, 20, 20, 20, 19, 19, 23, 23, 23, 23, 22, 22, 20, 20,
+        20, 20, 20, 20, 20, 19, 19, 19, 21, 22, 22, 22, 21, 21, 20, 19, 19, 19,
+        19, 19, 19, 19, 18, 18, 21, 22, 22, 22, 21, 21, 20, 19, 19, 19, 19, 19,
+        19, 19, 18, 18, 21, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 18,
+        17, 17, 21, 22, 22, 22, 22, 22, 20, 19, 19, 18, 18, 18, 18, 18, 17, 17,
+        21, 22, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 21, 22,
+        23, 23, 22, 22, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 21, 22, 23, 23,
+        22, 22, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 20, 22, 22, 22, 22, 22,
+        20, 19, 18, 17, 17, 17, 16, 16, 16, 16, 20, 22, 22, 22, 22, 22, 20, 19,
+        18, 17, 17, 16, 16, 16, 15, 15, 20, 21, 22, 22, 22, 22, 20, 19, 18, 17,
+        17, 16, 16, 16, 15, 15, 20, 21, 21, 22, 22, 22, 19, 19, 18, 17, 17, 16,
+        16, 15, 14, 14, 20, 21, 21, 22, 22, 22, 19, 19, 18, 17, 17, 16, 16, 15,
+        14, 14, 19, 20, 21, 21, 21, 21, 19, 19, 18, 17, 17, 15, 15, 15, 14, 14,
+        19, 20, 20, 21, 21, 21, 19, 19, 18, 17, 17, 15, 15, 15, 14, 14, 19, 20,
+        20, 20, 21, 21, 19, 18, 18, 16, 16, 15, 15, 14, 14, 14, 18, 19, 20, 20,
+        20, 20, 19, 18, 17, 16, 16, 15, 15, 14, 13, 13, 18, 19, 20, 20, 20, 20,
+        19, 18, 17, 16, 16, 15, 15, 14, 13, 13, 17, 19, 19, 19, 20, 20, 18, 18,
+        17, 16, 16, 15, 14, 14, 13, 13,
+        /* Size 32x16 */
+        32, 33, 33, 33, 34, 34, 31, 31, 29, 28, 28, 25, 24, 23, 21, 21, 21, 21,
+        21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 18, 18, 17, 33, 33, 33, 33,
+        32, 32, 30, 29, 28, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 21, 21, 21, 20, 20, 20, 19, 19, 19, 33, 33, 33, 32, 32, 32, 29, 28,
+        28, 26, 26, 24, 24, 23, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 21, 21,
+        21, 20, 20, 20, 20, 19, 31, 30, 30, 30, 29, 29, 28, 27, 26, 24, 24, 23,
+        23, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 22, 22, 21, 21, 20, 20,
+        20, 19, 28, 27, 27, 26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 21, 21,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 28, 27,
+        27, 26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 23, 23, 23, 23, 23, 23,
+        22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
+        19, 19, 19, 19, 19, 19, 19, 18, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 21, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        18, 18, 18, 18, 21, 22, 22, 22, 23, 23, 22, 22, 22, 22, 22, 21, 20, 20,
+        19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 18, 18, 17, 17, 17,
+        21, 22, 22, 22, 23, 23, 23, 22, 23, 23, 23, 21, 21, 20, 19, 19, 19, 18,
+        18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 21, 22, 22, 22,
+        23, 23, 23, 22, 23, 23, 23, 21, 21, 20, 19, 19, 19, 18, 18, 18, 18, 17,
+        17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 20, 20, 21, 21, 22, 22, 22, 22,
+        22, 22, 22, 21, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 16,
+        15, 15, 15, 15, 15, 15, 20, 20, 20, 21, 21, 21, 22, 22, 22, 22, 22, 20,
+        20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 15,
+        15, 14, 19, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 20, 20, 19, 19, 19,
+        18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 18, 19,
+        19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16,
+        16, 16, 15, 15, 14, 14, 14, 14, 14, 13, 13, 13, 18, 19, 19, 19, 20, 20,
+        20, 20, 20, 20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15,
+        14, 14, 14, 14, 14, 13, 13, 13,
+        /* Size 4x16 */
+        33, 28, 21, 19, 33, 27, 22, 20, 32, 26, 23, 21, 30, 24, 23, 21, 28, 23,
+        23, 21, 26, 22, 23, 21, 24, 22, 21, 20, 22, 21, 19, 19, 22, 22, 19, 18,
+        22, 22, 18, 17, 22, 22, 18, 17, 22, 22, 17, 16, 21, 22, 17, 15, 20, 21,
+        17, 15, 20, 21, 16, 14, 19, 20, 16, 14,
+        /* Size 16x4 */
+        33, 33, 32, 30, 28, 26, 24, 22, 22, 22, 22, 22, 21, 20, 20, 19, 28, 27,
+        26, 24, 23, 22, 22, 21, 22, 22, 22, 22, 22, 21, 21, 20, 21, 22, 23, 23,
+        23, 23, 21, 19, 19, 18, 18, 17, 17, 17, 16, 16, 19, 20, 21, 21, 21, 21,
+        20, 19, 18, 17, 17, 16, 15, 15, 14, 14,
+        /* Size 8x32 */
+        32, 33, 28, 23, 21, 21, 20, 18, 33, 33, 27, 23, 22, 22, 20, 19, 33, 33,
+        27, 23, 22, 22, 20, 19, 33, 32, 26, 23, 22, 22, 21, 19, 34, 32, 26, 23,
+        23, 23, 21, 20, 34, 32, 26, 23, 23, 23, 21, 20, 31, 29, 24, 22, 22, 23,
+        22, 20, 31, 28, 24, 22, 22, 22, 22, 20, 29, 28, 23, 22, 22, 23, 22, 20,
+        28, 26, 22, 22, 22, 23, 22, 20, 28, 26, 22, 22, 22, 23, 22, 20, 25, 24,
+        22, 21, 21, 21, 20, 20, 24, 24, 22, 21, 20, 21, 20, 19, 23, 23, 22, 20,
+        20, 20, 20, 19, 21, 22, 21, 20, 19, 19, 19, 18, 21, 22, 21, 20, 19, 19,
+        19, 18, 21, 22, 22, 20, 19, 19, 18, 17, 21, 22, 22, 20, 19, 18, 18, 17,
+        21, 23, 22, 20, 19, 18, 17, 17, 21, 23, 22, 20, 19, 18, 17, 16, 21, 23,
+        22, 20, 19, 18, 17, 16, 20, 22, 22, 20, 18, 17, 16, 16, 20, 22, 22, 20,
+        18, 17, 16, 15, 20, 22, 22, 20, 18, 17, 16, 15, 20, 21, 22, 19, 18, 17,
+        16, 14, 20, 21, 22, 19, 18, 17, 16, 14, 19, 21, 21, 19, 18, 17, 15, 14,
+        19, 20, 21, 19, 18, 17, 15, 14, 19, 20, 21, 19, 18, 16, 15, 14, 18, 20,
+        20, 19, 17, 16, 15, 13, 18, 20, 20, 19, 17, 16, 15, 13, 17, 19, 20, 18,
+        17, 16, 14, 13,
+        /* Size 32x8 */
+        32, 33, 33, 33, 34, 34, 31, 31, 29, 28, 28, 25, 24, 23, 21, 21, 21, 21,
+        21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 18, 18, 17, 33, 33, 33, 32,
+        32, 32, 29, 28, 28, 26, 26, 24, 24, 23, 22, 22, 22, 22, 23, 23, 23, 22,
+        22, 22, 21, 21, 21, 20, 20, 20, 20, 19, 28, 27, 27, 26, 26, 26, 24, 24,
+        23, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        21, 21, 21, 20, 20, 20, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 21,
+        21, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 19, 19, 19, 19, 19, 19,
+        19, 18, 21, 22, 22, 22, 23, 23, 22, 22, 22, 22, 22, 21, 20, 20, 19, 19,
+        19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 18, 18, 17, 17, 17, 21, 22,
+        22, 22, 23, 23, 23, 22, 23, 23, 23, 21, 21, 20, 19, 19, 19, 18, 18, 18,
+        18, 17, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 20, 20, 20, 21, 21, 21,
+        22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16,
+        16, 16, 15, 15, 15, 15, 15, 14, 18, 19, 19, 19, 20, 20, 20, 20, 20, 20,
+        20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 14, 14,
+        14, 13, 13, 13 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 32, 27, 20, 32, 29, 26, 21, 27, 26, 19, 16, 20, 21, 16, 13,
+        /* Size 8x8 */
+        33, 32, 32, 30, 29, 25, 22, 19, 32, 32, 32, 31, 30, 26, 23, 20, 32, 32,
+        30, 29, 28, 25, 23, 20, 30, 31, 29, 26, 24, 22, 20, 19, 29, 30, 28, 24,
+        21, 19, 18, 17, 25, 26, 25, 22, 19, 17, 16, 15, 22, 23, 23, 20, 18, 16,
+        14, 13, 19, 20, 20, 19, 17, 15, 13, 12,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 32, 32, 30, 28, 27, 26, 23, 22, 21, 19, 17, 33, 32,
+        32, 32, 32, 32, 31, 30, 29, 28, 27, 24, 23, 22, 20, 18, 33, 32, 32, 32,
+        32, 32, 31, 31, 30, 28, 28, 25, 23, 22, 20, 19, 33, 32, 32, 32, 32, 31,
+        31, 30, 29, 28, 27, 25, 23, 23, 21, 19, 33, 32, 32, 32, 31, 30, 30, 29,
+        28, 27, 26, 24, 23, 22, 20, 19, 32, 32, 32, 31, 30, 29, 28, 28, 27, 26,
+        26, 24, 23, 22, 21, 19, 32, 31, 31, 31, 30, 28, 28, 27, 26, 25, 24, 23,
+        22, 21, 20, 19, 30, 30, 31, 30, 29, 28, 27, 26, 24, 23, 23, 22, 20, 20,
+        19, 18, 28, 29, 30, 29, 28, 27, 26, 24, 21, 20, 20, 19, 18, 18, 17, 16,
+        27, 28, 28, 28, 27, 26, 25, 23, 20, 20, 20, 18, 18, 17, 16, 15, 26, 27,
+        28, 27, 26, 26, 24, 23, 20, 20, 19, 18, 17, 17, 16, 15, 23, 24, 25, 25,
+        24, 24, 23, 22, 19, 18, 18, 16, 16, 15, 14, 14, 22, 23, 23, 23, 23, 23,
+        22, 20, 18, 18, 17, 16, 15, 14, 14, 13, 21, 22, 22, 23, 22, 22, 21, 20,
+        18, 17, 17, 15, 14, 14, 13, 13, 19, 20, 20, 21, 20, 21, 20, 19, 17, 16,
+        16, 14, 14, 13, 12, 12, 17, 18, 19, 19, 19, 19, 19, 18, 16, 15, 15, 14,
+        13, 13, 12, 11,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 29, 28, 28,
+        27, 26, 26, 24, 23, 23, 22, 21, 21, 19, 19, 19, 17, 17, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 29, 29, 28, 26, 26, 25,
+        24, 24, 22, 22, 21, 20, 20, 19, 18, 18, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 30, 30, 30, 29, 29, 28, 27, 27, 25, 24, 24, 23, 22,
+        22, 20, 20, 19, 18, 18, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 30, 30, 30, 29, 29, 28, 27, 27, 25, 24, 24, 23, 22, 22, 20, 20, 20,
+        18, 18, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30,
+        30, 30, 28, 28, 28, 26, 25, 25, 23, 23, 22, 21, 20, 20, 19, 19, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 28, 28,
+        28, 26, 25, 25, 23, 23, 23, 21, 21, 20, 19, 19, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 29, 29, 28, 27, 27, 26, 25, 25,
+        23, 23, 23, 21, 21, 20, 19, 19, 33, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        30, 30, 30, 29, 29, 29, 28, 28, 27, 26, 26, 25, 24, 24, 23, 23, 22, 21,
+        20, 20, 19, 19, 33, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 29,
+        29, 28, 28, 28, 27, 26, 26, 25, 24, 24, 23, 23, 22, 21, 20, 20, 19, 19,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 29, 29, 29, 28, 28, 28,
+        27, 26, 26, 25, 24, 24, 23, 23, 22, 21, 21, 20, 19, 19, 32, 32, 32, 32,
+        32, 32, 31, 30, 30, 30, 29, 29, 28, 28, 28, 28, 27, 27, 26, 26, 26, 24,
+        24, 24, 23, 22, 22, 21, 21, 20, 19, 19, 32, 32, 32, 32, 32, 32, 31, 30,
+        30, 30, 29, 29, 28, 28, 28, 28, 27, 27, 26, 26, 26, 24, 24, 24, 23, 22,
+        22, 21, 21, 20, 19, 19, 32, 31, 31, 31, 31, 31, 31, 30, 30, 29, 28, 28,
+        28, 27, 27, 26, 26, 26, 25, 24, 24, 23, 23, 23, 22, 22, 21, 20, 20, 20,
+        19, 19, 30, 30, 30, 30, 31, 31, 30, 29, 29, 29, 28, 28, 27, 26, 26, 25,
+        24, 24, 23, 23, 23, 22, 22, 21, 20, 20, 20, 19, 19, 19, 18, 18, 30, 30,
+        30, 30, 31, 31, 30, 29, 29, 29, 28, 28, 27, 26, 26, 25, 24, 24, 23, 23,
+        23, 22, 22, 21, 20, 20, 20, 19, 19, 19, 18, 18, 29, 30, 30, 30, 30, 30,
+        30, 29, 28, 28, 28, 28, 26, 25, 25, 24, 23, 23, 22, 22, 22, 21, 20, 20,
+        19, 19, 19, 18, 18, 18, 17, 17, 28, 29, 29, 29, 30, 30, 29, 28, 28, 28,
+        27, 27, 26, 24, 24, 23, 21, 21, 20, 20, 20, 19, 19, 19, 18, 18, 18, 17,
+        17, 17, 16, 16, 28, 29, 29, 29, 30, 30, 29, 28, 28, 28, 27, 27, 26, 24,
+        24, 23, 21, 21, 20, 20, 20, 19, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16,
+        27, 28, 28, 28, 28, 28, 28, 27, 27, 27, 26, 26, 25, 23, 23, 22, 20, 20,
+        20, 20, 20, 19, 18, 18, 18, 17, 17, 17, 16, 16, 15, 15, 26, 26, 27, 27,
+        28, 28, 27, 26, 26, 26, 26, 26, 24, 23, 23, 22, 20, 20, 20, 19, 19, 18,
+        18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 26, 26, 27, 27, 28, 28, 27, 26,
+        26, 26, 26, 26, 24, 23, 23, 22, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17,
+        17, 16, 16, 16, 15, 15, 24, 25, 25, 25, 26, 26, 26, 25, 25, 25, 24, 24,
+        23, 22, 22, 21, 19, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 15,
+        14, 14, 23, 24, 24, 24, 25, 25, 25, 24, 24, 24, 24, 24, 23, 22, 22, 20,
+        19, 19, 18, 18, 18, 17, 16, 16, 16, 15, 15, 14, 14, 14, 14, 14, 23, 24,
+        24, 24, 25, 25, 25, 24, 24, 24, 24, 24, 23, 21, 21, 20, 19, 19, 18, 18,
+        18, 17, 16, 16, 16, 15, 15, 14, 14, 14, 13, 13, 22, 22, 23, 23, 23, 23,
+        23, 23, 23, 23, 23, 23, 22, 20, 20, 19, 18, 18, 18, 17, 17, 16, 16, 16,
+        15, 15, 14, 14, 14, 13, 13, 13, 21, 22, 22, 22, 23, 23, 23, 23, 23, 23,
+        22, 22, 22, 20, 20, 19, 18, 18, 17, 17, 17, 16, 15, 15, 15, 14, 14, 14,
+        13, 13, 13, 13, 21, 21, 22, 22, 22, 23, 23, 22, 22, 22, 22, 22, 21, 20,
+        20, 19, 18, 18, 17, 17, 17, 16, 15, 15, 14, 14, 14, 13, 13, 13, 13, 13,
+        19, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 20, 19, 19, 18, 17, 17,
+        17, 16, 16, 15, 14, 14, 14, 14, 13, 13, 13, 12, 12, 12, 19, 20, 20, 20,
+        20, 21, 21, 20, 20, 21, 21, 21, 20, 19, 19, 18, 17, 17, 16, 16, 16, 15,
+        14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 19, 19, 19, 20, 20, 20, 20, 20,
+        20, 20, 20, 20, 20, 19, 19, 18, 17, 17, 16, 16, 16, 15, 14, 14, 13, 13,
+        13, 12, 12, 12, 12, 12, 17, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19,
+        19, 18, 18, 17, 16, 16, 15, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 12,
+        11, 11, 17, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 17,
+        16, 16, 15, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 12, 11, 11,
+        /* Size 4x8 */
+        32, 32, 28, 20, 32, 31, 28, 21, 32, 30, 27, 21, 30, 28, 23, 19, 29, 27,
+        21, 17, 26, 24, 19, 15, 22, 22, 17, 13, 20, 20, 16, 12,
+        /* Size 8x4 */
+        32, 32, 32, 30, 29, 26, 22, 20, 32, 31, 30, 28, 27, 24, 22, 20, 28, 28,
+        27, 23, 21, 19, 17, 16, 20, 21, 21, 19, 17, 15, 13, 12,
+        /* Size 8x16 */
+        32, 33, 32, 32, 28, 23, 22, 19, 33, 32, 32, 31, 29, 24, 23, 20, 33, 32,
+        32, 31, 29, 25, 23, 21, 33, 32, 31, 31, 29, 25, 23, 21, 32, 32, 30, 30,
+        28, 24, 23, 20, 32, 31, 29, 28, 27, 24, 23, 21, 32, 31, 29, 28, 26, 23,
+        22, 20, 30, 30, 28, 27, 24, 21, 20, 19, 28, 30, 28, 26, 21, 19, 18, 17,
+        27, 28, 26, 25, 21, 18, 18, 16, 26, 28, 26, 24, 20, 18, 17, 16, 23, 25,
+        24, 23, 19, 16, 16, 14, 22, 23, 23, 22, 18, 16, 15, 14, 21, 22, 22, 21,
+        18, 15, 14, 13, 19, 21, 20, 20, 17, 14, 14, 12, 18, 19, 19, 19, 16, 14,
+        13, 12,
+        /* Size 16x8 */
+        32, 33, 33, 33, 32, 32, 32, 30, 28, 27, 26, 23, 22, 21, 19, 18, 33, 32,
+        32, 32, 32, 31, 31, 30, 30, 28, 28, 25, 23, 22, 21, 19, 32, 32, 32, 31,
+        30, 29, 29, 28, 28, 26, 26, 24, 23, 22, 20, 19, 32, 31, 31, 31, 30, 28,
+        28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 28, 29, 29, 29, 28, 27, 26, 24,
+        21, 21, 20, 19, 18, 18, 17, 16, 23, 24, 25, 25, 24, 24, 23, 21, 19, 18,
+        18, 16, 16, 15, 14, 14, 22, 23, 23, 23, 23, 23, 22, 20, 18, 18, 17, 16,
+        15, 14, 14, 13, 19, 20, 21, 21, 20, 21, 20, 19, 17, 16, 16, 14, 14, 13,
+        12, 12,
+        /* Size 16x32 */
+        32, 33, 33, 33, 32, 32, 32, 29, 28, 27, 23, 23, 22, 19, 19, 17, 33, 32,
+        32, 32, 32, 32, 31, 29, 29, 28, 24, 24, 22, 20, 20, 18, 33, 32, 32, 32,
+        32, 32, 31, 29, 29, 28, 24, 24, 23, 20, 20, 18, 33, 32, 32, 32, 32, 32,
+        31, 29, 29, 28, 24, 24, 23, 20, 20, 18, 33, 32, 32, 32, 32, 32, 31, 30,
+        29, 28, 25, 25, 23, 21, 21, 19, 33, 32, 32, 32, 32, 31, 31, 30, 30, 28,
+        25, 25, 23, 21, 21, 19, 33, 32, 32, 32, 31, 31, 31, 29, 29, 28, 25, 25,
+        23, 21, 21, 19, 32, 32, 32, 32, 31, 30, 30, 28, 28, 27, 24, 24, 23, 21,
+        21, 19, 32, 32, 32, 31, 30, 30, 30, 28, 28, 27, 24, 24, 23, 20, 20, 19,
+        32, 32, 32, 31, 30, 30, 29, 28, 28, 27, 24, 24, 23, 21, 21, 19, 32, 32,
+        31, 31, 29, 29, 28, 27, 27, 26, 24, 24, 23, 21, 21, 19, 32, 32, 31, 31,
+        29, 29, 28, 27, 27, 26, 24, 24, 23, 21, 21, 19, 32, 31, 31, 31, 29, 28,
+        28, 26, 26, 25, 23, 23, 22, 20, 20, 19, 30, 30, 30, 30, 28, 28, 27, 24,
+        24, 23, 21, 21, 20, 19, 19, 18, 30, 30, 30, 30, 28, 28, 27, 24, 24, 23,
+        21, 21, 20, 19, 19, 18, 29, 30, 30, 30, 28, 28, 26, 23, 23, 22, 20, 20,
+        19, 18, 18, 17, 28, 29, 30, 29, 28, 27, 26, 22, 21, 21, 19, 19, 18, 17,
+        17, 16, 28, 29, 30, 29, 28, 27, 26, 22, 21, 21, 19, 19, 18, 17, 17, 16,
+        27, 28, 28, 28, 26, 26, 25, 21, 21, 20, 18, 18, 18, 16, 16, 15, 26, 27,
+        28, 27, 26, 26, 24, 21, 20, 20, 18, 18, 17, 16, 16, 15, 26, 27, 28, 27,
+        26, 26, 24, 21, 20, 20, 18, 18, 17, 16, 16, 15, 24, 26, 26, 26, 24, 24,
+        23, 20, 20, 19, 17, 17, 16, 15, 15, 14, 23, 24, 25, 25, 24, 24, 23, 20,
+        19, 18, 16, 16, 16, 14, 14, 14, 23, 24, 25, 25, 24, 24, 23, 20, 19, 18,
+        16, 16, 16, 14, 14, 13, 22, 23, 23, 23, 23, 23, 22, 19, 18, 18, 16, 16,
+        15, 14, 14, 13, 21, 22, 23, 23, 22, 22, 21, 19, 18, 17, 15, 15, 15, 13,
+        13, 13, 21, 22, 22, 22, 22, 22, 21, 18, 18, 17, 15, 15, 14, 13, 13, 13,
+        19, 20, 21, 21, 21, 21, 20, 18, 17, 17, 14, 14, 14, 13, 13, 12, 19, 20,
+        21, 21, 20, 20, 20, 17, 17, 16, 14, 14, 14, 12, 12, 12, 19, 20, 20, 20,
+        20, 20, 19, 17, 17, 16, 14, 14, 13, 12, 12, 12, 18, 19, 19, 19, 19, 19,
+        19, 17, 16, 15, 14, 14, 13, 12, 12, 11, 18, 19, 19, 19, 19, 19, 19, 17,
+        16, 15, 14, 14, 13, 12, 12, 11,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 30, 29, 28, 28,
+        27, 26, 26, 24, 23, 23, 22, 21, 21, 19, 19, 19, 18, 18, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 29, 29, 28, 27, 27, 26,
+        24, 24, 23, 22, 22, 20, 20, 20, 19, 19, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 28, 28, 28, 26, 25, 25, 23, 23,
+        22, 21, 21, 20, 19, 19, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 30, 30, 30, 29, 29, 28, 27, 27, 26, 25, 25, 23, 23, 22, 21, 21, 20,
+        19, 19, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 29, 29, 29, 28, 28, 28,
+        28, 28, 26, 26, 26, 24, 24, 24, 23, 22, 22, 21, 20, 20, 19, 19, 32, 32,
+        32, 32, 32, 31, 31, 30, 30, 30, 29, 29, 28, 28, 28, 28, 27, 27, 26, 26,
+        26, 24, 24, 24, 23, 22, 22, 21, 20, 20, 19, 19, 32, 31, 31, 31, 31, 31,
+        31, 30, 30, 29, 28, 28, 28, 27, 27, 26, 26, 26, 25, 24, 24, 23, 23, 23,
+        22, 21, 21, 20, 20, 19, 19, 19, 29, 29, 29, 29, 30, 30, 29, 28, 28, 28,
+        27, 27, 26, 24, 24, 23, 22, 22, 21, 21, 21, 20, 20, 20, 19, 19, 18, 18,
+        17, 17, 17, 17, 28, 29, 29, 29, 29, 30, 29, 28, 28, 28, 27, 27, 26, 24,
+        24, 23, 21, 21, 21, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16,
+        27, 28, 28, 28, 28, 28, 28, 27, 27, 27, 26, 26, 25, 23, 23, 22, 21, 21,
+        20, 20, 20, 19, 18, 18, 18, 17, 17, 17, 16, 16, 15, 15, 23, 24, 24, 24,
+        25, 25, 25, 24, 24, 24, 24, 24, 23, 21, 21, 20, 19, 19, 18, 18, 18, 17,
+        16, 16, 16, 15, 15, 14, 14, 14, 14, 14, 23, 24, 24, 24, 25, 25, 25, 24,
+        24, 24, 24, 24, 23, 21, 21, 20, 19, 19, 18, 18, 18, 17, 16, 16, 16, 15,
+        15, 14, 14, 14, 14, 14, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
+        22, 20, 20, 19, 18, 18, 18, 17, 17, 16, 16, 16, 15, 15, 14, 14, 14, 13,
+        13, 13, 19, 20, 20, 20, 21, 21, 21, 21, 20, 21, 21, 21, 20, 19, 19, 18,
+        17, 17, 16, 16, 16, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 19, 20,
+        20, 20, 21, 21, 21, 21, 20, 21, 21, 21, 20, 19, 19, 18, 17, 17, 16, 16,
+        16, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 17, 18, 18, 18, 19, 19,
+        19, 19, 19, 19, 19, 19, 19, 18, 18, 17, 16, 16, 15, 15, 15, 14, 14, 13,
+        13, 13, 13, 12, 12, 12, 11, 11,
+        /* Size 4x16 */
+        33, 32, 27, 19, 32, 32, 28, 20, 32, 32, 28, 21, 32, 31, 28, 21, 32, 30,
+        27, 20, 32, 29, 26, 21, 31, 28, 25, 20, 30, 28, 23, 19, 29, 27, 21, 17,
+        28, 26, 20, 16, 27, 26, 20, 16, 24, 24, 18, 14, 23, 23, 18, 14, 22, 22,
+        17, 13, 20, 20, 16, 12, 19, 19, 15, 12,
+        /* Size 16x4 */
+        33, 32, 32, 32, 32, 32, 31, 30, 29, 28, 27, 24, 23, 22, 20, 19, 32, 32,
+        32, 31, 30, 29, 28, 28, 27, 26, 26, 24, 23, 22, 20, 19, 27, 28, 28, 28,
+        27, 26, 25, 23, 21, 20, 20, 18, 18, 17, 16, 15, 19, 20, 21, 21, 20, 21,
+        20, 19, 17, 16, 16, 14, 14, 13, 12, 12,
+        /* Size 8x32 */
+        32, 33, 32, 32, 28, 23, 22, 19, 33, 32, 32, 31, 29, 24, 22, 20, 33, 32,
+        32, 31, 29, 24, 23, 20, 33, 32, 32, 31, 29, 24, 23, 20, 33, 32, 32, 31,
+        29, 25, 23, 21, 33, 32, 32, 31, 30, 25, 23, 21, 33, 32, 31, 31, 29, 25,
+        23, 21, 32, 32, 31, 30, 28, 24, 23, 21, 32, 32, 30, 30, 28, 24, 23, 20,
+        32, 32, 30, 29, 28, 24, 23, 21, 32, 31, 29, 28, 27, 24, 23, 21, 32, 31,
+        29, 28, 27, 24, 23, 21, 32, 31, 29, 28, 26, 23, 22, 20, 30, 30, 28, 27,
+        24, 21, 20, 19, 30, 30, 28, 27, 24, 21, 20, 19, 29, 30, 28, 26, 23, 20,
+        19, 18, 28, 30, 28, 26, 21, 19, 18, 17, 28, 30, 28, 26, 21, 19, 18, 17,
+        27, 28, 26, 25, 21, 18, 18, 16, 26, 28, 26, 24, 20, 18, 17, 16, 26, 28,
+        26, 24, 20, 18, 17, 16, 24, 26, 24, 23, 20, 17, 16, 15, 23, 25, 24, 23,
+        19, 16, 16, 14, 23, 25, 24, 23, 19, 16, 16, 14, 22, 23, 23, 22, 18, 16,
+        15, 14, 21, 23, 22, 21, 18, 15, 15, 13, 21, 22, 22, 21, 18, 15, 14, 13,
+        19, 21, 21, 20, 17, 14, 14, 13, 19, 21, 20, 20, 17, 14, 14, 12, 19, 20,
+        20, 19, 17, 14, 13, 12, 18, 19, 19, 19, 16, 14, 13, 12, 18, 19, 19, 19,
+        16, 14, 13, 12,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 30, 29, 28, 28,
+        27, 26, 26, 24, 23, 23, 22, 21, 21, 19, 19, 19, 18, 18, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 28, 28, 28, 26,
+        25, 25, 23, 23, 22, 21, 21, 20, 19, 19, 32, 32, 32, 32, 32, 32, 31, 31,
+        30, 30, 29, 29, 29, 28, 28, 28, 28, 28, 26, 26, 26, 24, 24, 24, 23, 22,
+        22, 21, 20, 20, 19, 19, 32, 31, 31, 31, 31, 31, 31, 30, 30, 29, 28, 28,
+        28, 27, 27, 26, 26, 26, 25, 24, 24, 23, 23, 23, 22, 21, 21, 20, 20, 19,
+        19, 19, 28, 29, 29, 29, 29, 30, 29, 28, 28, 28, 27, 27, 26, 24, 24, 23,
+        21, 21, 21, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16, 23, 24,
+        24, 24, 25, 25, 25, 24, 24, 24, 24, 24, 23, 21, 21, 20, 19, 19, 18, 18,
+        18, 17, 16, 16, 16, 15, 15, 14, 14, 14, 14, 14, 22, 22, 23, 23, 23, 23,
+        23, 23, 23, 23, 23, 23, 22, 20, 20, 19, 18, 18, 18, 17, 17, 16, 16, 16,
+        15, 15, 14, 14, 14, 13, 13, 13, 19, 20, 20, 20, 21, 21, 21, 21, 20, 21,
+        21, 21, 20, 19, 19, 18, 17, 17, 16, 16, 16, 15, 14, 14, 14, 13, 13, 13,
+        12, 12, 12, 12 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 27, 22, 21, 27, 22, 22, 22, 22, 22, 19, 18, 21, 22, 18, 16,
+        /* Size 8x8 */
+        33, 33, 29, 24, 21, 22, 21, 20, 33, 32, 28, 24, 22, 23, 22, 21, 29, 28,
+        25, 23, 22, 23, 22, 21, 24, 24, 23, 21, 20, 21, 20, 20, 21, 22, 22, 20,
+        19, 19, 19, 19, 22, 23, 23, 21, 19, 18, 17, 17, 21, 22, 22, 20, 19, 17,
+        17, 16, 20, 21, 21, 20, 19, 17, 16, 15,
+        /* Size 16x16 */
+        32, 33, 34, 33, 31, 28, 27, 25, 21, 21, 21, 21, 20, 20, 20, 19, 33, 33,
+        33, 32, 30, 27, 26, 24, 22, 22, 22, 22, 21, 21, 20, 20, 34, 33, 33, 32,
+        29, 26, 25, 24, 22, 22, 22, 23, 22, 22, 21, 20, 33, 32, 32, 31, 28, 26,
+        25, 24, 22, 22, 23, 23, 22, 22, 22, 21, 31, 30, 29, 28, 26, 24, 23, 23,
+        22, 22, 22, 23, 22, 22, 22, 21, 28, 27, 26, 26, 24, 22, 22, 22, 21, 22,
+        22, 23, 22, 22, 22, 21, 27, 26, 25, 25, 23, 22, 22, 21, 21, 21, 21, 22,
+        22, 22, 21, 21, 25, 24, 24, 24, 23, 22, 21, 21, 20, 20, 21, 21, 20, 20,
+        20, 20, 21, 22, 22, 22, 22, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 19,
+        21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 19, 19, 19, 19, 18, 18, 21, 22,
+        22, 23, 22, 22, 21, 21, 19, 19, 19, 19, 18, 18, 18, 18, 21, 22, 23, 23,
+        23, 23, 22, 21, 19, 19, 19, 18, 17, 17, 17, 17, 20, 21, 22, 22, 22, 22,
+        22, 20, 19, 19, 18, 17, 17, 17, 16, 16, 20, 21, 22, 22, 22, 22, 22, 20,
+        19, 19, 18, 17, 17, 17, 16, 16, 20, 20, 21, 22, 22, 22, 21, 20, 19, 18,
+        18, 17, 16, 16, 16, 15, 19, 20, 20, 21, 21, 21, 21, 20, 19, 18, 18, 17,
+        16, 16, 15, 14,
+        /* Size 32x32 */
+        32, 33, 33, 33, 34, 34, 33, 31, 31, 30, 28, 28, 27, 25, 25, 23, 21, 21,
+        21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 33, 33, 33, 33,
+        33, 33, 33, 30, 30, 29, 27, 27, 26, 24, 24, 23, 21, 21, 22, 22, 22, 22,
+        22, 22, 21, 21, 21, 20, 20, 20, 19, 19, 33, 33, 33, 33, 33, 33, 32, 30,
+        30, 29, 27, 27, 26, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21,
+        21, 21, 20, 20, 20, 20, 33, 33, 33, 33, 33, 33, 32, 30, 30, 28, 27, 27,
+        26, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 21, 20,
+        20, 20, 34, 33, 33, 33, 33, 33, 32, 29, 29, 28, 26, 26, 25, 24, 24, 23,
+        22, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 21, 21, 21, 20, 20, 34, 33,
+        33, 33, 33, 32, 32, 29, 29, 28, 26, 26, 25, 24, 24, 23, 22, 22, 22, 23,
+        23, 23, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21, 33, 33, 32, 32, 32, 32,
+        31, 29, 28, 28, 26, 26, 25, 24, 24, 23, 22, 22, 22, 23, 23, 23, 23, 23,
+        22, 22, 22, 22, 22, 21, 21, 21, 31, 30, 30, 30, 29, 29, 29, 27, 27, 26,
+        24, 24, 24, 23, 23, 22, 22, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22,
+        22, 21, 21, 21, 31, 30, 30, 30, 29, 29, 28, 27, 26, 26, 24, 24, 23, 23,
+        23, 22, 22, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21,
+        30, 29, 29, 28, 28, 28, 28, 26, 26, 25, 23, 23, 23, 23, 23, 22, 22, 22,
+        22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21, 28, 27, 27, 27,
+        26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 23,
+        23, 23, 22, 22, 22, 22, 22, 22, 21, 21, 28, 27, 27, 27, 26, 26, 26, 24,
+        24, 23, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 23, 23, 23, 22, 22,
+        22, 22, 22, 22, 21, 21, 27, 26, 26, 26, 25, 25, 25, 24, 23, 23, 22, 22,
+        22, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 21, 21, 21,
+        21, 21, 25, 24, 24, 24, 24, 24, 24, 23, 23, 23, 22, 22, 21, 21, 21, 21,
+        20, 20, 20, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 25, 24,
+        24, 24, 24, 24, 24, 23, 23, 23, 22, 22, 21, 21, 21, 21, 20, 20, 20, 21,
+        21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 23, 23, 23, 23, 23, 23,
+        23, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 20,
+        20, 20, 20, 20, 20, 20, 19, 19, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22,
+        21, 21, 21, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20,
+        20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 19, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 21, 22, 22, 22,
+        22, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19,
+        19, 18, 18, 18, 18, 18, 18, 18, 18, 18, 21, 22, 22, 22, 22, 23, 23, 22,
+        22, 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18,
+        18, 18, 18, 18, 18, 18, 21, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23,
+        22, 21, 21, 20, 19, 19, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 17,
+        17, 17, 21, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 22, 21, 21, 20,
+        19, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 17, 21, 22,
+        22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 22, 21, 21, 20, 19, 19, 19, 18,
+        18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 16, 16, 20, 21, 21, 21, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 19, 18, 18, 18, 17, 17,
+        17, 17, 17, 16, 16, 16, 16, 16, 20, 21, 21, 21, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 20, 20, 20, 19, 19, 19, 18, 18, 17, 17, 17, 17, 17, 17, 16,
+        16, 16, 16, 16, 20, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 20,
+        20, 20, 19, 19, 19, 18, 18, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16,
+        20, 20, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 19, 19,
+        18, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 20, 20, 20, 21,
+        21, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 19, 19, 18, 18, 18, 17,
+        17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 19, 20, 20, 20, 21, 21, 21, 21,
+        21, 21, 22, 22, 21, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16,
+        16, 15, 15, 15, 15, 15, 19, 19, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21,
+        21, 20, 20, 19, 19, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15,
+        14, 14, 19, 19, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 20, 20, 19,
+        19, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 14, 14,
+        /* Size 4x8 */
+        33, 27, 22, 20, 33, 26, 22, 21, 28, 23, 22, 22, 24, 22, 20, 20, 22, 21,
+        19, 19, 22, 22, 19, 17, 21, 22, 19, 16, 20, 21, 18, 15,
+        /* Size 8x4 */
+        33, 33, 28, 24, 22, 22, 21, 20, 27, 26, 23, 22, 21, 22, 22, 21, 22, 22,
+        22, 20, 19, 19, 19, 18, 20, 21, 22, 20, 19, 17, 16, 15,
+        /* Size 8x16 */
+        32, 33, 29, 27, 21, 21, 20, 20, 33, 33, 28, 26, 22, 22, 21, 20, 34, 32,
+        27, 26, 22, 23, 22, 21, 33, 31, 27, 25, 22, 23, 22, 21, 31, 28, 25, 23,
+        22, 22, 22, 22, 28, 26, 23, 22, 22, 23, 22, 22, 26, 25, 22, 22, 21, 22,
+        22, 21, 24, 24, 22, 21, 20, 21, 20, 20, 21, 22, 21, 21, 19, 19, 19, 19,
+        21, 22, 22, 21, 19, 19, 19, 18, 21, 22, 22, 21, 19, 18, 18, 18, 21, 23,
+        23, 22, 19, 18, 17, 17, 20, 22, 22, 21, 19, 17, 17, 16, 20, 22, 22, 21,
+        19, 17, 17, 16, 20, 21, 22, 21, 19, 17, 16, 16, 19, 20, 21, 20, 19, 17,
+        16, 15,
+        /* Size 16x8 */
+        32, 33, 34, 33, 31, 28, 26, 24, 21, 21, 21, 21, 20, 20, 20, 19, 33, 33,
+        32, 31, 28, 26, 25, 24, 22, 22, 22, 23, 22, 22, 21, 20, 29, 28, 27, 27,
+        25, 23, 22, 22, 21, 22, 22, 23, 22, 22, 22, 21, 27, 26, 26, 25, 23, 22,
+        22, 21, 21, 21, 21, 22, 21, 21, 21, 20, 21, 22, 22, 22, 22, 22, 21, 20,
+        19, 19, 19, 19, 19, 19, 19, 19, 21, 22, 23, 23, 22, 23, 22, 21, 19, 19,
+        18, 18, 17, 17, 17, 17, 20, 21, 22, 22, 22, 22, 22, 20, 19, 19, 18, 17,
+        17, 17, 16, 16, 20, 20, 21, 21, 22, 22, 21, 20, 19, 18, 18, 17, 16, 16,
+        16, 15,
+        /* Size 16x32 */
+        32, 33, 33, 33, 29, 28, 27, 22, 21, 21, 21, 21, 20, 20, 20, 19, 33, 33,
+        33, 32, 28, 27, 26, 22, 22, 22, 21, 21, 21, 20, 20, 19, 33, 33, 33, 32,
+        28, 27, 26, 22, 22, 22, 22, 22, 21, 20, 20, 20, 33, 33, 33, 32, 28, 27,
+        26, 22, 22, 22, 22, 22, 21, 20, 20, 20, 34, 33, 32, 32, 27, 26, 26, 23,
+        22, 22, 23, 23, 22, 21, 21, 20, 34, 33, 32, 31, 27, 26, 25, 23, 22, 22,
+        23, 23, 22, 21, 21, 20, 33, 32, 31, 31, 27, 26, 25, 23, 22, 22, 23, 23,
+        22, 21, 21, 20, 31, 29, 29, 28, 25, 24, 24, 22, 22, 22, 23, 23, 22, 22,
+        22, 21, 31, 29, 28, 28, 25, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21,
+        30, 28, 28, 28, 24, 23, 23, 22, 22, 22, 23, 23, 22, 22, 22, 21, 28, 26,
+        26, 25, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 21, 28, 26, 26, 25,
+        23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 21, 26, 26, 25, 24, 22, 22,
+        22, 21, 21, 21, 22, 22, 22, 21, 21, 20, 24, 24, 24, 24, 22, 22, 21, 20,
+        20, 20, 21, 21, 20, 20, 20, 20, 24, 24, 24, 24, 22, 22, 21, 20, 20, 20,
+        21, 21, 20, 20, 20, 20, 23, 23, 23, 23, 22, 22, 21, 20, 20, 20, 20, 20,
+        20, 20, 20, 19, 21, 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19,
+        19, 19, 21, 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 19,
+        21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 19, 19, 19, 18, 18, 18, 21, 22,
+        22, 22, 22, 22, 21, 20, 19, 19, 18, 18, 18, 18, 18, 17, 21, 22, 22, 22,
+        22, 22, 21, 20, 19, 19, 18, 18, 18, 18, 18, 17, 21, 22, 23, 23, 22, 22,
+        22, 20, 19, 19, 18, 18, 18, 17, 17, 17, 21, 22, 23, 23, 23, 22, 22, 20,
+        19, 19, 18, 18, 17, 17, 17, 17, 21, 22, 23, 23, 22, 22, 22, 20, 19, 19,
+        18, 18, 17, 17, 17, 16, 20, 22, 22, 22, 22, 22, 21, 19, 19, 19, 17, 17,
+        17, 16, 16, 16, 20, 21, 22, 22, 22, 22, 21, 19, 19, 19, 17, 17, 17, 16,
+        16, 16, 20, 21, 22, 22, 22, 22, 21, 19, 19, 19, 17, 17, 17, 16, 16, 16,
+        20, 21, 21, 21, 22, 22, 21, 19, 19, 18, 17, 17, 16, 16, 16, 15, 20, 21,
+        21, 21, 22, 22, 21, 19, 19, 18, 17, 17, 16, 16, 16, 15, 19, 20, 21, 21,
+        21, 21, 21, 19, 19, 18, 17, 17, 16, 15, 15, 15, 19, 20, 20, 20, 21, 21,
+        20, 19, 19, 18, 17, 17, 16, 15, 15, 14, 19, 20, 20, 20, 21, 21, 20, 19,
+        19, 18, 17, 17, 16, 15, 15, 14,
+        /* Size 32x16 */
+        32, 33, 33, 33, 34, 34, 33, 31, 31, 30, 28, 28, 26, 24, 24, 23, 21, 21,
+        21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 33, 33, 33, 33,
+        33, 33, 32, 29, 29, 28, 26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 33, 33, 33, 33, 32, 32, 31, 29,
+        28, 28, 26, 26, 25, 24, 24, 23, 22, 22, 22, 22, 22, 23, 23, 23, 22, 22,
+        22, 21, 21, 21, 20, 20, 33, 32, 32, 32, 32, 31, 31, 28, 28, 28, 25, 25,
+        24, 24, 24, 23, 22, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 21, 21, 21,
+        20, 20, 29, 28, 28, 28, 27, 27, 27, 25, 25, 24, 23, 23, 22, 22, 22, 22,
+        21, 21, 22, 22, 22, 22, 23, 22, 22, 22, 22, 22, 22, 21, 21, 21, 28, 27,
+        27, 27, 26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 27, 26, 26, 26, 26, 25,
+        25, 24, 23, 23, 22, 22, 22, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22,
+        21, 21, 21, 21, 21, 21, 20, 20, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22,
+        22, 22, 21, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 19, 19, 19, 19,
+        19, 19, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20,
+        20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 19, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 21, 21, 22, 22,
+        23, 23, 23, 23, 22, 23, 23, 23, 22, 21, 21, 20, 19, 19, 19, 18, 18, 18,
+        18, 18, 17, 17, 17, 17, 17, 17, 17, 17, 21, 21, 22, 22, 23, 23, 23, 23,
+        22, 23, 23, 23, 22, 21, 21, 20, 19, 19, 19, 18, 18, 18, 18, 18, 17, 17,
+        17, 17, 17, 17, 17, 17, 20, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 20, 20, 20, 19, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 16, 16, 16,
+        16, 16, 20, 20, 20, 20, 21, 21, 21, 22, 22, 22, 22, 22, 21, 20, 20, 20,
+        19, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 20, 20,
+        20, 20, 21, 21, 21, 22, 22, 22, 22, 22, 21, 20, 20, 20, 19, 19, 18, 18,
+        18, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 19, 19, 20, 20, 20, 20,
+        20, 21, 21, 21, 21, 21, 20, 20, 20, 19, 19, 19, 18, 17, 17, 17, 17, 16,
+        16, 16, 16, 15, 15, 15, 14, 14,
+        /* Size 4x16 */
+        33, 28, 21, 20, 33, 27, 22, 20, 33, 26, 22, 21, 32, 26, 22, 21, 29, 24,
+        22, 22, 26, 22, 22, 22, 26, 22, 21, 21, 24, 22, 20, 20, 22, 21, 19, 19,
+        22, 22, 19, 18, 22, 22, 19, 18, 22, 22, 19, 17, 22, 22, 19, 16, 21, 22,
+        19, 16, 21, 22, 18, 16, 20, 21, 18, 15,
+        /* Size 16x4 */
+        33, 33, 33, 32, 29, 26, 26, 24, 22, 22, 22, 22, 22, 21, 21, 20, 28, 27,
+        26, 26, 24, 22, 22, 22, 21, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22,
+        22, 22, 21, 20, 19, 19, 19, 19, 19, 19, 18, 18, 20, 20, 21, 21, 22, 22,
+        21, 20, 19, 18, 18, 17, 16, 16, 16, 15,
+        /* Size 8x32 */
+        32, 33, 29, 27, 21, 21, 20, 20, 33, 33, 28, 26, 22, 21, 21, 20, 33, 33,
+        28, 26, 22, 22, 21, 20, 33, 33, 28, 26, 22, 22, 21, 20, 34, 32, 27, 26,
+        22, 23, 22, 21, 34, 32, 27, 25, 22, 23, 22, 21, 33, 31, 27, 25, 22, 23,
+        22, 21, 31, 29, 25, 24, 22, 23, 22, 22, 31, 28, 25, 23, 22, 22, 22, 22,
+        30, 28, 24, 23, 22, 23, 22, 22, 28, 26, 23, 22, 22, 23, 22, 22, 28, 26,
+        23, 22, 22, 23, 22, 22, 26, 25, 22, 22, 21, 22, 22, 21, 24, 24, 22, 21,
+        20, 21, 20, 20, 24, 24, 22, 21, 20, 21, 20, 20, 23, 23, 22, 21, 20, 20,
+        20, 20, 21, 22, 21, 21, 19, 19, 19, 19, 21, 22, 21, 21, 19, 19, 19, 19,
+        21, 22, 22, 21, 19, 19, 19, 18, 21, 22, 22, 21, 19, 18, 18, 18, 21, 22,
+        22, 21, 19, 18, 18, 18, 21, 23, 22, 22, 19, 18, 18, 17, 21, 23, 23, 22,
+        19, 18, 17, 17, 21, 23, 22, 22, 19, 18, 17, 17, 20, 22, 22, 21, 19, 17,
+        17, 16, 20, 22, 22, 21, 19, 17, 17, 16, 20, 22, 22, 21, 19, 17, 17, 16,
+        20, 21, 22, 21, 19, 17, 16, 16, 20, 21, 22, 21, 19, 17, 16, 16, 19, 21,
+        21, 21, 19, 17, 16, 15, 19, 20, 21, 20, 19, 17, 16, 15, 19, 20, 21, 20,
+        19, 17, 16, 15,
+        /* Size 32x8 */
+        32, 33, 33, 33, 34, 34, 33, 31, 31, 30, 28, 28, 26, 24, 24, 23, 21, 21,
+        21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 33, 33, 33, 33,
+        32, 32, 31, 29, 28, 28, 26, 26, 25, 24, 24, 23, 22, 22, 22, 22, 22, 23,
+        23, 23, 22, 22, 22, 21, 21, 21, 20, 20, 29, 28, 28, 28, 27, 27, 27, 25,
+        25, 24, 23, 23, 22, 22, 22, 22, 21, 21, 22, 22, 22, 22, 23, 22, 22, 22,
+        22, 22, 22, 21, 21, 21, 27, 26, 26, 26, 26, 25, 25, 24, 23, 23, 22, 22,
+        22, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 21, 21, 21, 21, 21, 21,
+        20, 20, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 21,
+        22, 22, 23, 23, 23, 23, 22, 23, 23, 23, 22, 21, 21, 20, 19, 19, 19, 18,
+        18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 17, 20, 21, 21, 21, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 19, 18, 18, 18, 17, 17,
+        17, 17, 17, 16, 16, 16, 16, 16, 20, 20, 20, 20, 21, 21, 21, 22, 22, 22,
+        22, 22, 21, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 16,
+        16, 15, 15, 15 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 32, 29, 24, 32, 30, 28, 24, 29, 28, 21, 19, 24, 24, 19, 16,
+        /* Size 8x8 */
+        33, 33, 32, 32, 30, 28, 24, 22, 33, 32, 32, 32, 30, 28, 25, 23, 32, 32,
+        31, 30, 29, 27, 24, 23, 32, 32, 30, 29, 28, 26, 24, 22, 30, 30, 29, 28,
+        25, 23, 21, 20, 28, 28, 27, 26, 23, 20, 18, 17, 24, 25, 24, 24, 21, 18,
+        16, 15, 22, 23, 23, 22, 20, 17, 15, 14,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 33, 32, 32, 30, 29, 28, 26, 25, 23, 22, 21, 33, 32,
+        32, 32, 32, 32, 32, 31, 30, 29, 29, 27, 26, 24, 23, 22, 33, 32, 32, 32,
+        32, 32, 32, 31, 30, 30, 29, 27, 26, 24, 23, 23, 33, 32, 32, 32, 32, 32,
+        32, 31, 31, 30, 30, 28, 27, 25, 23, 23, 33, 32, 32, 32, 31, 31, 31, 30,
+        29, 28, 28, 26, 26, 24, 23, 23, 33, 32, 32, 32, 31, 31, 30, 30, 29, 28,
+        28, 26, 26, 24, 23, 23, 32, 32, 32, 32, 31, 30, 29, 28, 28, 27, 27, 26,
+        25, 24, 23, 22, 32, 31, 31, 31, 30, 30, 28, 28, 27, 26, 26, 24, 24, 23,
+        22, 22, 30, 30, 30, 31, 29, 29, 28, 27, 26, 24, 24, 23, 22, 22, 20, 20,
+        29, 29, 30, 30, 28, 28, 27, 26, 24, 22, 22, 21, 20, 20, 19, 19, 28, 29,
+        29, 30, 28, 28, 27, 26, 24, 22, 21, 20, 20, 19, 18, 18, 26, 27, 27, 28,
+        26, 26, 26, 24, 23, 21, 20, 19, 19, 18, 17, 17, 25, 26, 26, 27, 26, 26,
+        25, 24, 22, 20, 20, 19, 18, 17, 17, 16, 23, 24, 24, 25, 24, 24, 24, 23,
+        22, 20, 19, 18, 17, 16, 16, 15, 22, 23, 23, 23, 23, 23, 23, 22, 20, 19,
+        18, 17, 17, 16, 15, 15, 21, 22, 23, 23, 23, 23, 22, 22, 20, 19, 18, 17,
+        16, 15, 15, 14,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 30,
+        29, 28, 28, 27, 26, 26, 25, 23, 23, 23, 22, 21, 21, 20, 33, 33, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 29, 29, 29, 28,
+        26, 26, 26, 24, 24, 23, 22, 22, 22, 20, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 29, 29, 29, 28, 27, 27, 26, 24,
+        24, 24, 23, 22, 22, 21, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 30, 30, 30, 29, 29, 29, 28, 27, 27, 26, 24, 24, 24, 23, 22,
+        22, 21, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30,
+        30, 30, 30, 29, 29, 28, 27, 27, 26, 24, 24, 24, 23, 23, 23, 21, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30,
+        30, 28, 28, 28, 27, 25, 25, 25, 23, 23, 23, 22, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 28, 28, 28,
+        27, 25, 25, 25, 23, 23, 23, 22, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 30, 30, 30, 30, 29, 29, 28, 27, 27, 26, 25, 25, 24,
+        23, 23, 23, 22, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        30, 29, 29, 29, 28, 28, 28, 28, 26, 26, 26, 24, 24, 24, 23, 23, 23, 21,
+        33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 29, 29, 29,
+        28, 28, 28, 27, 26, 26, 26, 24, 24, 24, 23, 23, 23, 21, 33, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 29, 29, 29, 28, 28, 28, 27,
+        26, 26, 26, 24, 24, 24, 23, 23, 23, 21, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 30, 29, 29, 29, 28, 28, 28, 28, 28, 28, 26, 26, 26, 25, 24,
+        24, 24, 23, 23, 23, 21, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 29,
+        29, 29, 28, 28, 28, 28, 27, 27, 27, 26, 26, 26, 25, 24, 24, 24, 23, 22,
+        22, 21, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 29, 29, 29, 28, 28,
+        28, 28, 27, 27, 27, 26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 21, 32, 31,
+        31, 31, 31, 31, 31, 31, 30, 30, 30, 29, 28, 28, 28, 27, 27, 27, 26, 26,
+        26, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 20, 30, 30, 30, 30, 30, 31,
+        31, 30, 29, 29, 29, 28, 28, 28, 27, 26, 26, 26, 24, 24, 24, 23, 23, 23,
+        22, 22, 22, 21, 20, 20, 20, 19, 30, 30, 30, 30, 30, 31, 31, 30, 29, 29,
+        29, 28, 28, 28, 27, 26, 26, 26, 24, 24, 24, 23, 23, 23, 22, 22, 22, 21,
+        20, 20, 20, 19, 30, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 28, 28, 28,
+        27, 26, 26, 25, 24, 23, 23, 23, 22, 22, 22, 21, 21, 21, 20, 20, 20, 19,
+        29, 29, 29, 29, 30, 30, 30, 30, 28, 28, 28, 28, 27, 27, 26, 24, 24, 24,
+        22, 22, 22, 21, 21, 21, 20, 20, 20, 19, 19, 19, 19, 18, 28, 29, 29, 29,
+        29, 30, 30, 29, 28, 28, 28, 28, 27, 27, 26, 24, 24, 23, 22, 21, 21, 20,
+        20, 20, 20, 19, 19, 19, 18, 18, 18, 18, 28, 29, 29, 29, 29, 30, 30, 29,
+        28, 28, 28, 28, 27, 27, 26, 24, 24, 23, 22, 21, 21, 20, 20, 20, 20, 19,
+        19, 19, 18, 18, 18, 18, 27, 28, 28, 28, 28, 28, 28, 28, 28, 27, 27, 26,
+        26, 26, 25, 23, 23, 23, 21, 20, 20, 20, 20, 20, 19, 18, 18, 18, 18, 17,
+        17, 17, 26, 26, 27, 27, 27, 28, 28, 27, 26, 26, 26, 26, 26, 26, 24, 23,
+        23, 22, 21, 20, 20, 20, 19, 19, 19, 18, 18, 18, 17, 17, 17, 16, 26, 26,
+        27, 27, 27, 28, 28, 27, 26, 26, 26, 26, 26, 26, 24, 23, 23, 22, 21, 20,
+        20, 20, 19, 19, 19, 18, 18, 18, 17, 17, 17, 16, 25, 26, 26, 26, 26, 27,
+        27, 26, 26, 26, 26, 25, 25, 25, 24, 22, 22, 22, 20, 20, 20, 19, 19, 19,
+        18, 17, 17, 17, 17, 16, 16, 16, 23, 24, 24, 24, 24, 25, 25, 25, 24, 24,
+        24, 24, 24, 24, 23, 22, 22, 21, 20, 19, 19, 18, 18, 18, 17, 16, 16, 16,
+        16, 15, 15, 15, 23, 24, 24, 24, 24, 25, 25, 25, 24, 24, 24, 24, 24, 24,
+        23, 22, 22, 21, 20, 19, 19, 18, 18, 18, 17, 16, 16, 16, 16, 15, 15, 15,
+        23, 23, 24, 24, 24, 25, 25, 24, 24, 24, 24, 24, 24, 24, 23, 21, 21, 21,
+        19, 19, 19, 18, 18, 18, 17, 16, 16, 16, 15, 15, 15, 15, 22, 22, 23, 23,
+        23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 22, 20, 20, 20, 19, 18, 18, 18,
+        17, 17, 17, 16, 16, 15, 15, 15, 15, 14, 21, 22, 22, 22, 23, 23, 23, 23,
+        23, 23, 23, 23, 22, 22, 22, 20, 20, 20, 19, 18, 18, 17, 17, 17, 16, 15,
+        15, 15, 15, 14, 14, 14, 21, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23,
+        22, 22, 22, 20, 20, 20, 19, 18, 18, 17, 17, 17, 16, 15, 15, 15, 15, 14,
+        14, 14, 20, 20, 21, 21, 21, 22, 22, 22, 21, 21, 21, 21, 21, 21, 20, 19,
+        19, 19, 18, 18, 18, 17, 16, 16, 16, 15, 15, 15, 14, 14, 14, 13,
+        /* Size 4x8 */
+        33, 32, 29, 24, 32, 31, 30, 25, 32, 30, 28, 24, 32, 29, 27, 24, 30, 28,
+        24, 21, 28, 26, 21, 18, 24, 24, 19, 16, 22, 22, 18, 15,
+        /* Size 8x4 */
+        33, 32, 32, 32, 30, 28, 24, 22, 32, 31, 30, 29, 28, 26, 24, 22, 29, 30,
+        28, 27, 24, 21, 19, 18, 24, 25, 24, 24, 21, 18, 16, 15,
+        /* Size 8x16 */
+        32, 33, 33, 32, 29, 28, 23, 22, 33, 32, 32, 32, 29, 29, 24, 23, 33, 32,
+        32, 32, 30, 29, 25, 23, 33, 32, 32, 31, 30, 30, 25, 23, 33, 32, 31, 30,
+        29, 28, 24, 23, 32, 32, 31, 30, 28, 28, 24, 23, 32, 31, 30, 29, 28, 27,
+        24, 23, 32, 31, 30, 28, 26, 26, 23, 22, 30, 30, 29, 28, 25, 24, 21, 20,
+        29, 30, 28, 27, 23, 22, 20, 19, 28, 30, 28, 27, 22, 21, 19, 18, 26, 28,
+        26, 26, 21, 20, 18, 17, 25, 26, 26, 25, 21, 20, 17, 17, 23, 25, 24, 24,
+        20, 19, 16, 16, 22, 23, 23, 23, 19, 18, 16, 15, 21, 23, 23, 22, 19, 18,
+        15, 15,
+        /* Size 16x8 */
+        32, 33, 33, 33, 33, 32, 32, 32, 30, 29, 28, 26, 25, 23, 22, 21, 33, 32,
+        32, 32, 32, 32, 31, 31, 30, 30, 30, 28, 26, 25, 23, 23, 33, 32, 32, 32,
+        31, 31, 30, 30, 29, 28, 28, 26, 26, 24, 23, 23, 32, 32, 32, 31, 30, 30,
+        29, 28, 28, 27, 27, 26, 25, 24, 23, 22, 29, 29, 30, 30, 29, 28, 28, 26,
+        25, 23, 22, 21, 21, 20, 19, 19, 28, 29, 29, 30, 28, 28, 27, 26, 24, 22,
+        21, 20, 20, 19, 18, 18, 23, 24, 25, 25, 24, 24, 24, 23, 21, 20, 19, 18,
+        17, 16, 16, 15, 22, 23, 23, 23, 23, 23, 23, 22, 20, 19, 18, 17, 17, 16,
+        15, 15,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 32, 32, 32, 29, 28, 28, 26, 23, 23, 22, 19, 33, 33,
+        32, 32, 32, 32, 32, 31, 29, 29, 29, 26, 24, 24, 22, 20, 33, 32, 32, 32,
+        32, 32, 32, 31, 29, 29, 29, 26, 24, 24, 23, 20, 33, 32, 32, 32, 32, 32,
+        32, 31, 29, 29, 29, 26, 24, 24, 23, 20, 33, 32, 32, 32, 32, 32, 32, 31,
+        30, 29, 29, 26, 25, 25, 23, 20, 33, 32, 32, 32, 32, 31, 31, 31, 30, 30,
+        30, 27, 25, 25, 23, 21, 33, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 27,
+        25, 25, 23, 21, 33, 32, 32, 32, 32, 31, 31, 31, 30, 29, 29, 27, 25, 25,
+        23, 21, 33, 32, 32, 32, 31, 30, 30, 30, 29, 28, 28, 26, 24, 24, 23, 21,
+        32, 32, 32, 32, 31, 30, 30, 30, 28, 28, 28, 26, 24, 24, 23, 20, 32, 32,
+        32, 32, 31, 30, 30, 30, 28, 28, 28, 26, 24, 24, 23, 20, 32, 32, 32, 32,
+        31, 29, 29, 29, 28, 28, 28, 26, 24, 24, 23, 21, 32, 32, 31, 31, 30, 29,
+        29, 28, 28, 27, 27, 25, 24, 24, 23, 21, 32, 32, 31, 31, 30, 29, 29, 28,
+        28, 27, 27, 25, 24, 24, 23, 21, 32, 31, 31, 31, 30, 28, 28, 28, 26, 26,
+        26, 24, 23, 23, 22, 20, 30, 30, 30, 30, 29, 28, 28, 27, 25, 24, 24, 23,
+        21, 21, 20, 19, 30, 30, 30, 30, 29, 28, 28, 27, 25, 24, 24, 23, 21, 21,
+        20, 19, 30, 30, 30, 30, 29, 28, 28, 27, 24, 24, 24, 22, 21, 21, 20, 19,
+        29, 29, 30, 30, 28, 27, 27, 26, 23, 22, 22, 20, 20, 20, 19, 17, 28, 29,
+        30, 30, 28, 27, 27, 26, 22, 21, 21, 20, 19, 19, 18, 17, 28, 29, 30, 30,
+        28, 27, 27, 26, 22, 21, 21, 20, 19, 19, 18, 17, 27, 28, 28, 28, 28, 26,
+        26, 25, 22, 21, 21, 19, 18, 18, 18, 16, 26, 27, 28, 28, 26, 26, 26, 24,
+        21, 20, 20, 19, 18, 18, 17, 16, 26, 27, 28, 28, 26, 26, 26, 24, 21, 20,
+        20, 19, 18, 18, 17, 16, 25, 26, 26, 26, 26, 25, 25, 24, 21, 20, 20, 18,
+        17, 17, 17, 15, 23, 24, 25, 25, 24, 24, 24, 23, 20, 19, 19, 17, 16, 16,
+        16, 14, 23, 24, 25, 25, 24, 24, 24, 23, 20, 19, 19, 17, 16, 16, 16, 14,
+        23, 24, 24, 24, 24, 24, 24, 23, 20, 19, 19, 17, 16, 16, 15, 14, 22, 23,
+        23, 23, 23, 23, 23, 22, 19, 18, 18, 17, 16, 16, 15, 14, 21, 22, 23, 23,
+        23, 22, 22, 21, 19, 18, 18, 17, 15, 15, 15, 13, 21, 22, 23, 23, 23, 22,
+        22, 21, 19, 18, 18, 17, 15, 15, 15, 13, 20, 21, 22, 22, 21, 21, 21, 20,
+        18, 18, 18, 16, 15, 15, 14, 13,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 30, 30,
+        29, 28, 28, 27, 26, 26, 25, 23, 23, 23, 22, 21, 21, 20, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 29, 29, 29, 28,
+        27, 27, 26, 24, 24, 24, 23, 22, 22, 21, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 30, 28, 28, 28, 26, 25,
+        25, 24, 23, 23, 23, 22, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 30, 30, 30, 30, 30, 30, 28, 28, 28, 26, 25, 25, 24, 23, 23,
+        23, 22, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 29,
+        29, 29, 28, 28, 28, 28, 26, 26, 26, 24, 24, 24, 23, 23, 23, 21, 32, 32,
+        32, 32, 32, 31, 31, 31, 30, 30, 30, 29, 29, 29, 28, 28, 28, 28, 27, 27,
+        27, 26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 21, 32, 32, 32, 32, 32, 31,
+        31, 31, 30, 30, 30, 29, 29, 29, 28, 28, 28, 28, 27, 27, 27, 26, 26, 26,
+        25, 24, 24, 24, 23, 22, 22, 21, 32, 31, 31, 31, 31, 31, 31, 31, 30, 30,
+        30, 29, 28, 28, 28, 27, 27, 27, 26, 26, 26, 25, 24, 24, 24, 23, 23, 23,
+        22, 21, 21, 20, 29, 29, 29, 29, 30, 30, 30, 30, 29, 28, 28, 28, 28, 28,
+        26, 25, 25, 24, 23, 22, 22, 22, 21, 21, 21, 20, 20, 20, 19, 19, 19, 18,
+        28, 29, 29, 29, 29, 30, 30, 29, 28, 28, 28, 28, 27, 27, 26, 24, 24, 24,
+        22, 21, 21, 21, 20, 20, 20, 19, 19, 19, 18, 18, 18, 18, 28, 29, 29, 29,
+        29, 30, 30, 29, 28, 28, 28, 28, 27, 27, 26, 24, 24, 24, 22, 21, 21, 21,
+        20, 20, 20, 19, 19, 19, 18, 18, 18, 18, 26, 26, 26, 26, 26, 27, 27, 27,
+        26, 26, 26, 26, 25, 25, 24, 23, 23, 22, 20, 20, 20, 19, 19, 19, 18, 17,
+        17, 17, 17, 17, 17, 16, 23, 24, 24, 24, 25, 25, 25, 25, 24, 24, 24, 24,
+        24, 24, 23, 21, 21, 21, 20, 19, 19, 18, 18, 18, 17, 16, 16, 16, 16, 15,
+        15, 15, 23, 24, 24, 24, 25, 25, 25, 25, 24, 24, 24, 24, 24, 24, 23, 21,
+        21, 21, 20, 19, 19, 18, 18, 18, 17, 16, 16, 16, 16, 15, 15, 15, 22, 22,
+        23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 22, 20, 20, 20, 19, 18,
+        18, 18, 17, 17, 17, 16, 16, 15, 15, 15, 15, 14, 19, 20, 20, 20, 20, 21,
+        21, 21, 21, 20, 20, 21, 21, 21, 20, 19, 19, 19, 17, 17, 17, 16, 16, 16,
+        15, 14, 14, 14, 14, 13, 13, 13,
+        /* Size 4x16 */
+        33, 32, 28, 23, 32, 32, 29, 24, 32, 32, 29, 25, 32, 31, 30, 25, 32, 30,
+        28, 24, 32, 30, 28, 24, 32, 29, 27, 24, 31, 28, 26, 23, 30, 28, 24, 21,
+        29, 27, 22, 20, 29, 27, 21, 19, 27, 26, 20, 18, 26, 25, 20, 17, 24, 24,
+        19, 16, 23, 23, 18, 16, 22, 22, 18, 15,
+        /* Size 16x4 */
+        33, 32, 32, 32, 32, 32, 32, 31, 30, 29, 29, 27, 26, 24, 23, 22, 32, 32,
+        32, 31, 30, 30, 29, 28, 28, 27, 27, 26, 25, 24, 23, 22, 28, 29, 29, 30,
+        28, 28, 27, 26, 24, 22, 21, 20, 20, 19, 18, 18, 23, 24, 25, 25, 24, 24,
+        24, 23, 21, 20, 19, 18, 17, 16, 16, 15,
+        /* Size 8x32 */
+        32, 33, 33, 32, 29, 28, 23, 22, 33, 32, 32, 32, 29, 29, 24, 22, 33, 32,
+        32, 32, 29, 29, 24, 23, 33, 32, 32, 32, 29, 29, 24, 23, 33, 32, 32, 32,
+        30, 29, 25, 23, 33, 32, 32, 31, 30, 30, 25, 23, 33, 32, 32, 31, 30, 30,
+        25, 23, 33, 32, 32, 31, 30, 29, 25, 23, 33, 32, 31, 30, 29, 28, 24, 23,
+        32, 32, 31, 30, 28, 28, 24, 23, 32, 32, 31, 30, 28, 28, 24, 23, 32, 32,
+        31, 29, 28, 28, 24, 23, 32, 31, 30, 29, 28, 27, 24, 23, 32, 31, 30, 29,
+        28, 27, 24, 23, 32, 31, 30, 28, 26, 26, 23, 22, 30, 30, 29, 28, 25, 24,
+        21, 20, 30, 30, 29, 28, 25, 24, 21, 20, 30, 30, 29, 28, 24, 24, 21, 20,
+        29, 30, 28, 27, 23, 22, 20, 19, 28, 30, 28, 27, 22, 21, 19, 18, 28, 30,
+        28, 27, 22, 21, 19, 18, 27, 28, 28, 26, 22, 21, 18, 18, 26, 28, 26, 26,
+        21, 20, 18, 17, 26, 28, 26, 26, 21, 20, 18, 17, 25, 26, 26, 25, 21, 20,
+        17, 17, 23, 25, 24, 24, 20, 19, 16, 16, 23, 25, 24, 24, 20, 19, 16, 16,
+        23, 24, 24, 24, 20, 19, 16, 15, 22, 23, 23, 23, 19, 18, 16, 15, 21, 23,
+        23, 22, 19, 18, 15, 15, 21, 23, 23, 22, 19, 18, 15, 15, 20, 22, 21, 21,
+        18, 18, 15, 14,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 30, 30,
+        29, 28, 28, 27, 26, 26, 25, 23, 23, 23, 22, 21, 21, 20, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 30, 28,
+        28, 28, 26, 25, 25, 24, 23, 23, 23, 22, 33, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 30, 30, 30, 29, 29, 29, 28, 28, 28, 28, 26, 26, 26, 24,
+        24, 24, 23, 23, 23, 21, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 29,
+        29, 29, 28, 28, 28, 28, 27, 27, 27, 26, 26, 26, 25, 24, 24, 24, 23, 22,
+        22, 21, 29, 29, 29, 29, 30, 30, 30, 30, 29, 28, 28, 28, 28, 28, 26, 25,
+        25, 24, 23, 22, 22, 22, 21, 21, 21, 20, 20, 20, 19, 19, 19, 18, 28, 29,
+        29, 29, 29, 30, 30, 29, 28, 28, 28, 28, 27, 27, 26, 24, 24, 24, 22, 21,
+        21, 21, 20, 20, 20, 19, 19, 19, 18, 18, 18, 18, 23, 24, 24, 24, 25, 25,
+        25, 25, 24, 24, 24, 24, 24, 24, 23, 21, 21, 21, 20, 19, 19, 18, 18, 18,
+        17, 16, 16, 16, 16, 15, 15, 15, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23,
+        23, 23, 23, 23, 22, 20, 20, 20, 19, 18, 18, 18, 17, 17, 17, 16, 16, 15,
+        15, 15, 15, 14 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 28, 22, 22, 28, 23, 22, 23, 22, 22, 19, 19, 22, 23, 19, 17,
+        /* Size 8x8 */
+        33, 33, 30, 28, 24, 21, 22, 21, 33, 32, 29, 26, 24, 22, 23, 22, 30, 29,
+        26, 24, 23, 22, 23, 22, 28, 26, 24, 22, 22, 22, 23, 22, 24, 24, 23, 22,
+        21, 20, 20, 20, 21, 22, 22, 22, 20, 19, 19, 19, 22, 23, 23, 23, 20, 19,
+        18, 17, 21, 22, 22, 22, 20, 19, 17, 17,
+        /* Size 16x16 */
+        32, 33, 33, 34, 31, 31, 28, 27, 25, 22, 21, 21, 21, 21, 20, 20, 33, 33,
+        33, 33, 30, 30, 27, 26, 24, 22, 22, 22, 22, 22, 21, 21, 33, 33, 33, 33,
+        30, 29, 26, 26, 24, 22, 22, 22, 22, 22, 22, 22, 34, 33, 33, 32, 30, 29,
+        26, 25, 24, 23, 22, 23, 23, 23, 22, 22, 31, 30, 30, 30, 28, 27, 24, 24,
+        23, 22, 22, 22, 22, 23, 22, 22, 31, 30, 29, 29, 27, 26, 24, 23, 23, 22,
+        22, 22, 22, 23, 22, 22, 28, 27, 26, 26, 24, 24, 22, 22, 22, 22, 21, 22,
+        22, 23, 22, 22, 27, 26, 26, 25, 24, 23, 22, 22, 21, 21, 21, 21, 22, 22,
+        22, 22, 25, 24, 24, 24, 23, 23, 22, 21, 21, 20, 20, 21, 21, 21, 20, 20,
+        22, 22, 22, 23, 22, 22, 22, 21, 20, 20, 20, 20, 20, 20, 19, 19, 21, 22,
+        22, 22, 22, 22, 21, 21, 20, 20, 19, 19, 19, 19, 19, 19, 21, 22, 22, 23,
+        22, 22, 22, 21, 21, 20, 19, 19, 19, 19, 18, 18, 21, 22, 22, 23, 22, 22,
+        22, 22, 21, 20, 19, 19, 19, 18, 18, 18, 21, 22, 22, 23, 23, 23, 23, 22,
+        21, 20, 19, 19, 18, 18, 17, 17, 20, 21, 22, 22, 22, 22, 22, 22, 20, 19,
+        19, 18, 18, 17, 17, 17, 20, 21, 22, 22, 22, 22, 22, 22, 20, 19, 19, 18,
+        18, 17, 17, 17,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 34, 34, 33, 31, 31, 31, 29, 28, 28, 27, 25, 25, 24,
+        22, 21, 21, 21, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 33, 33, 33, 33,
+        33, 33, 33, 33, 31, 30, 30, 28, 28, 28, 26, 24, 24, 24, 22, 21, 21, 21,
+        22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 33, 33, 33, 33, 33, 33, 33, 32,
+        30, 30, 30, 28, 27, 27, 26, 24, 24, 24, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 21, 21, 21, 21, 33, 33, 33, 33, 33, 33, 33, 32, 30, 30, 30, 28,
+        27, 27, 26, 24, 24, 24, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21,
+        21, 21, 33, 33, 33, 33, 33, 33, 33, 32, 30, 29, 29, 28, 26, 26, 26, 24,
+        24, 24, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 34, 33,
+        33, 33, 33, 32, 32, 32, 30, 29, 29, 27, 26, 26, 25, 24, 24, 24, 23, 22,
+        22, 22, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 34, 33, 33, 33, 33, 32,
+        32, 32, 30, 29, 29, 27, 26, 26, 25, 24, 24, 24, 23, 22, 22, 22, 23, 23,
+        23, 23, 23, 23, 22, 22, 22, 22, 33, 33, 32, 32, 32, 32, 32, 31, 29, 28,
+        28, 27, 26, 26, 25, 24, 24, 24, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23,
+        22, 22, 22, 22, 31, 31, 30, 30, 30, 30, 30, 29, 28, 27, 27, 25, 24, 24,
+        24, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22,
+        31, 30, 30, 30, 29, 29, 29, 28, 27, 26, 26, 25, 24, 24, 23, 23, 23, 23,
+        22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 31, 30, 30, 30,
+        29, 29, 29, 28, 27, 26, 26, 25, 24, 24, 23, 23, 23, 23, 22, 22, 22, 22,
+        22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 29, 28, 28, 28, 28, 27, 27, 27,
+        25, 25, 25, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23,
+        23, 23, 22, 22, 22, 22, 28, 28, 27, 27, 26, 26, 26, 26, 24, 24, 24, 22,
+        22, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22, 22, 23, 23, 23, 22, 22,
+        22, 22, 28, 28, 27, 27, 26, 26, 26, 26, 24, 24, 24, 22, 22, 22, 22, 22,
+        22, 22, 22, 21, 21, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 27, 26,
+        26, 26, 26, 25, 25, 25, 24, 23, 23, 22, 22, 22, 22, 21, 21, 21, 21, 21,
+        21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 21, 25, 24, 24, 24, 24, 24,
+        24, 24, 23, 23, 23, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20, 21, 21,
+        21, 21, 21, 21, 20, 20, 20, 20, 25, 24, 24, 24, 24, 24, 24, 24, 23, 23,
+        23, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21,
+        20, 20, 20, 20, 24, 24, 24, 24, 24, 24, 24, 24, 23, 23, 23, 22, 22, 22,
+        21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
+        22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20,
+        20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 19, 19, 19, 19, 21, 21, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 20, 19, 19, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19,
+        19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 21, 20, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        19, 18, 21, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 22, 22, 21, 21,
+        21, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 21, 22,
+        22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 19,
+        19, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 21, 22, 22, 22, 22, 23,
+        23, 23, 22, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 19, 19, 19, 19, 19,
+        19, 18, 18, 18, 18, 18, 18, 18, 21, 22, 22, 22, 22, 23, 23, 23, 23, 23,
+        23, 23, 23, 23, 22, 21, 21, 20, 20, 19, 19, 19, 19, 19, 18, 18, 18, 17,
+        17, 17, 17, 17, 21, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23,
+        22, 21, 21, 20, 20, 19, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17,
+        21, 21, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23, 22, 21, 21, 20,
+        20, 19, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 20, 21, 21, 21,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 19, 19,
+        18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 20, 21, 21, 21, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 19, 19, 18, 18, 18, 17,
+        17, 17, 17, 17, 17, 16, 20, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 20, 20, 20, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17, 17, 17,
+        17, 16, 20, 20, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20,
+        20, 20, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16,
+        /* Size 4x8 */
+        33, 27, 22, 21, 33, 26, 22, 23, 29, 24, 22, 22, 26, 22, 22, 23, 24, 22,
+        20, 20, 22, 22, 19, 19, 22, 22, 19, 18, 21, 22, 19, 17,
+        /* Size 8x4 */
+        33, 33, 29, 26, 24, 22, 22, 21, 27, 26, 24, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 20, 19, 19, 19, 21, 23, 22, 23, 20, 19, 18, 17,
+        /* Size 8x16 */
+        32, 33, 31, 28, 23, 21, 21, 20, 33, 33, 30, 27, 23, 22, 22, 21, 33, 32,
+        30, 26, 23, 22, 22, 22, 34, 32, 29, 26, 23, 22, 23, 22, 31, 29, 28, 24,
+        22, 22, 23, 22, 31, 28, 27, 24, 22, 22, 22, 22, 28, 26, 24, 22, 22, 22,
+        23, 22, 26, 25, 24, 22, 21, 21, 22, 22, 24, 24, 23, 22, 21, 20, 21, 20,
+        22, 22, 22, 21, 20, 20, 19, 19, 21, 22, 22, 21, 20, 19, 19, 19, 21, 22,
+        22, 22, 20, 19, 18, 18, 21, 23, 22, 22, 20, 19, 18, 18, 21, 23, 23, 22,
+        20, 19, 18, 17, 20, 22, 22, 22, 20, 19, 17, 17, 20, 22, 22, 22, 20, 19,
+        17, 17,
+        /* Size 16x8 */
+        32, 33, 33, 34, 31, 31, 28, 26, 24, 22, 21, 21, 21, 21, 20, 20, 33, 33,
+        32, 32, 29, 28, 26, 25, 24, 22, 22, 22, 23, 23, 22, 22, 31, 30, 30, 29,
+        28, 27, 24, 24, 23, 22, 22, 22, 22, 23, 22, 22, 28, 27, 26, 26, 24, 24,
+        22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 23, 23, 23, 23, 22, 22, 22, 21,
+        21, 20, 20, 20, 20, 20, 20, 20, 21, 22, 22, 22, 22, 22, 22, 21, 20, 20,
+        19, 19, 19, 19, 19, 19, 21, 22, 22, 23, 23, 22, 23, 22, 21, 19, 19, 18,
+        18, 18, 17, 17, 20, 21, 22, 22, 22, 22, 22, 22, 20, 19, 19, 18, 18, 17,
+        17, 17,
+        /* Size 16x32 */
+        32, 33, 33, 33, 31, 28, 28, 27, 23, 21, 21, 21, 21, 21, 20, 20, 33, 33,
+        33, 33, 31, 27, 27, 26, 23, 22, 22, 21, 21, 21, 21, 20, 33, 33, 33, 33,
+        30, 27, 27, 26, 23, 22, 22, 22, 22, 22, 21, 20, 33, 33, 33, 33, 30, 27,
+        27, 26, 23, 22, 22, 22, 22, 22, 21, 20, 33, 33, 32, 32, 30, 26, 26, 26,
+        23, 22, 22, 22, 22, 22, 22, 21, 34, 33, 32, 32, 29, 26, 26, 25, 23, 22,
+        22, 23, 23, 23, 22, 21, 34, 33, 32, 32, 29, 26, 26, 25, 23, 22, 22, 23,
+        23, 23, 22, 21, 33, 32, 31, 31, 29, 26, 26, 25, 23, 22, 22, 23, 23, 23,
+        22, 21, 31, 30, 29, 29, 28, 24, 24, 24, 22, 22, 22, 22, 23, 23, 22, 22,
+        31, 29, 28, 28, 27, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 31, 29,
+        28, 28, 27, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 29, 28, 27, 27,
+        25, 23, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 28, 26, 26, 26, 24, 22,
+        22, 22, 22, 22, 22, 22, 23, 23, 22, 22, 28, 26, 26, 26, 24, 22, 22, 22,
+        22, 22, 22, 22, 23, 23, 22, 22, 26, 26, 25, 25, 24, 22, 22, 22, 21, 21,
+        21, 22, 22, 22, 22, 21, 24, 24, 24, 24, 23, 22, 22, 21, 21, 20, 20, 21,
+        21, 21, 20, 20, 24, 24, 24, 24, 23, 22, 22, 21, 21, 20, 20, 21, 21, 21,
+        20, 20, 24, 24, 24, 24, 23, 22, 22, 21, 20, 20, 20, 20, 20, 20, 20, 20,
+        22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 20, 19, 19, 19, 19, 21, 22,
+        22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 21, 22, 22, 22,
+        22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 21, 22, 22, 22, 22, 22,
+        22, 21, 20, 19, 19, 19, 19, 19, 19, 18, 21, 22, 22, 22, 22, 22, 22, 21,
+        20, 19, 19, 19, 18, 18, 18, 18, 21, 22, 22, 22, 22, 22, 22, 21, 20, 19,
+        19, 19, 18, 18, 18, 18, 21, 22, 23, 23, 22, 22, 22, 22, 20, 19, 19, 19,
+        18, 18, 18, 17, 21, 22, 23, 23, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18,
+        17, 17, 21, 22, 23, 23, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18, 17, 17,
+        21, 22, 23, 23, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18, 17, 17, 20, 21,
+        22, 22, 22, 22, 22, 21, 20, 19, 19, 18, 17, 17, 17, 16, 20, 21, 22, 22,
+        22, 22, 22, 21, 20, 19, 19, 18, 17, 17, 17, 16, 20, 21, 22, 22, 22, 22,
+        22, 21, 20, 19, 19, 18, 17, 17, 17, 16, 20, 21, 22, 22, 22, 22, 22, 21,
+        20, 19, 19, 18, 17, 17, 17, 16,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 34, 34, 33, 31, 31, 31, 29, 28, 28, 26, 24, 24, 24,
+        22, 21, 21, 21, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 33, 33, 33, 33,
+        33, 33, 33, 32, 30, 29, 29, 28, 26, 26, 26, 24, 24, 24, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 33, 33, 33, 33, 32, 32, 32, 31,
+        29, 28, 28, 27, 26, 26, 25, 24, 24, 24, 22, 22, 22, 22, 22, 22, 23, 23,
+        23, 23, 22, 22, 22, 22, 33, 33, 33, 33, 32, 32, 32, 31, 29, 28, 28, 27,
+        26, 26, 25, 24, 24, 24, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 22, 22,
+        22, 22, 31, 31, 30, 30, 30, 29, 29, 29, 28, 27, 27, 25, 24, 24, 24, 23,
+        23, 23, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 28, 27,
+        27, 27, 26, 26, 26, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 21, 21,
+        21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 28, 27, 27, 27, 26, 26,
+        26, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 21, 21, 21, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 27, 26, 26, 26, 26, 25, 25, 25, 24, 23,
+        23, 22, 22, 22, 22, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22,
+        21, 21, 21, 21, 23, 23, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22,
+        21, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
+        21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20,
+        20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 20, 19, 19, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 23, 23, 23,
+        22, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 19, 19, 19, 19, 19, 19, 18,
+        18, 18, 18, 18, 18, 18, 21, 21, 22, 22, 22, 23, 23, 23, 23, 22, 22, 23,
+        23, 23, 22, 21, 21, 20, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 17, 17,
+        17, 17, 21, 21, 22, 22, 22, 23, 23, 23, 23, 22, 22, 23, 23, 23, 22, 21,
+        21, 20, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 17, 17, 17, 17, 20, 21,
+        21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 20, 20, 20, 19, 19,
+        19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 20, 20, 20, 20, 21, 21,
+        21, 21, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 19, 19, 19, 18, 18, 18,
+        17, 17, 17, 17, 16, 16, 16, 16,
+        /* Size 4x16 */
+        33, 28, 21, 21, 33, 27, 22, 22, 33, 26, 22, 22, 33, 26, 22, 23, 30, 24,
+        22, 23, 29, 24, 22, 22, 26, 22, 22, 23, 26, 22, 21, 22, 24, 22, 20, 21,
+        22, 21, 20, 19, 22, 21, 19, 19, 22, 22, 19, 18, 22, 22, 19, 18, 22, 22,
+        19, 18, 21, 22, 19, 17, 21, 22, 19, 17,
+        /* Size 16x4 */
+        33, 33, 33, 33, 30, 29, 26, 26, 24, 22, 22, 22, 22, 22, 21, 21, 28, 27,
+        26, 26, 24, 24, 22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 21, 22, 22, 22,
+        22, 22, 22, 21, 20, 20, 19, 19, 19, 19, 19, 19, 21, 22, 22, 23, 23, 22,
+        23, 22, 21, 19, 19, 18, 18, 18, 17, 17,
+        /* Size 8x32 */
+        32, 33, 31, 28, 23, 21, 21, 20, 33, 33, 31, 27, 23, 22, 21, 21, 33, 33,
+        30, 27, 23, 22, 22, 21, 33, 33, 30, 27, 23, 22, 22, 21, 33, 32, 30, 26,
+        23, 22, 22, 22, 34, 32, 29, 26, 23, 22, 23, 22, 34, 32, 29, 26, 23, 22,
+        23, 22, 33, 31, 29, 26, 23, 22, 23, 22, 31, 29, 28, 24, 22, 22, 23, 22,
+        31, 28, 27, 24, 22, 22, 22, 22, 31, 28, 27, 24, 22, 22, 22, 22, 29, 27,
+        25, 23, 22, 22, 23, 22, 28, 26, 24, 22, 22, 22, 23, 22, 28, 26, 24, 22,
+        22, 22, 23, 22, 26, 25, 24, 22, 21, 21, 22, 22, 24, 24, 23, 22, 21, 20,
+        21, 20, 24, 24, 23, 22, 21, 20, 21, 20, 24, 24, 23, 22, 20, 20, 20, 20,
+        22, 22, 22, 21, 20, 20, 19, 19, 21, 22, 22, 21, 20, 19, 19, 19, 21, 22,
+        22, 21, 20, 19, 19, 19, 21, 22, 22, 22, 20, 19, 19, 19, 21, 22, 22, 22,
+        20, 19, 18, 18, 21, 22, 22, 22, 20, 19, 18, 18, 21, 23, 22, 22, 20, 19,
+        18, 18, 21, 23, 23, 22, 20, 19, 18, 17, 21, 23, 23, 22, 20, 19, 18, 17,
+        21, 23, 23, 22, 20, 19, 18, 17, 20, 22, 22, 22, 20, 19, 17, 17, 20, 22,
+        22, 22, 20, 19, 17, 17, 20, 22, 22, 22, 20, 19, 17, 17, 20, 22, 22, 22,
+        20, 19, 17, 17,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 34, 34, 33, 31, 31, 31, 29, 28, 28, 26, 24, 24, 24,
+        22, 21, 21, 21, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 33, 33, 33, 33,
+        32, 32, 32, 31, 29, 28, 28, 27, 26, 26, 25, 24, 24, 24, 22, 22, 22, 22,
+        22, 22, 23, 23, 23, 23, 22, 22, 22, 22, 31, 31, 30, 30, 30, 29, 29, 29,
+        28, 27, 27, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22, 23,
+        23, 23, 22, 22, 22, 22, 28, 27, 27, 27, 26, 26, 26, 26, 24, 24, 24, 23,
+        22, 22, 22, 22, 22, 22, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 21, 21,
+        21, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 20, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 23,
+        23, 23, 23, 22, 22, 23, 23, 23, 22, 21, 21, 20, 19, 19, 19, 19, 18, 18,
+        18, 18, 18, 18, 17, 17, 17, 17, 20, 21, 21, 21, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17,
+        17, 17, 17, 17 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 32, 30, 27, 32, 31, 29, 26, 30, 29, 26, 23, 27, 26, 23, 19,
+        /* Size 8x8 */
+        33, 33, 32, 32, 31, 30, 28, 25, 33, 32, 32, 32, 31, 30, 28, 26, 32, 32,
+        32, 31, 30, 29, 28, 26, 32, 32, 31, 30, 29, 28, 27, 25, 31, 31, 30, 29,
+        28, 26, 25, 23, 30, 30, 29, 28, 26, 24, 22, 21, 28, 28, 28, 27, 25, 22,
+        20, 19, 25, 26, 26, 25, 23, 21, 19, 18,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 33, 33, 32, 32, 30, 30, 28, 28, 26, 26, 23, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 30, 30, 29, 29, 27, 27, 24, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 30, 30, 29, 29, 27, 27, 24, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 30, 30, 28, 28, 25, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 30, 30, 28, 28, 25, 33, 32, 32, 32, 32, 31, 31, 30, 30, 29,
+        29, 28, 28, 26, 26, 24, 33, 32, 32, 32, 32, 31, 31, 30, 30, 29, 29, 28,
+        28, 26, 26, 24, 32, 32, 32, 32, 32, 30, 30, 29, 29, 28, 28, 27, 27, 26,
+        26, 24, 32, 32, 32, 32, 32, 30, 30, 29, 29, 28, 28, 27, 27, 26, 26, 24,
+        30, 30, 30, 31, 31, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 22, 30, 30,
+        30, 31, 31, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 22, 28, 29, 29, 30,
+        30, 28, 28, 27, 27, 24, 24, 21, 21, 20, 20, 19, 28, 29, 29, 30, 30, 28,
+        28, 27, 27, 24, 24, 21, 21, 20, 20, 19, 26, 27, 27, 28, 28, 26, 26, 26,
+        26, 23, 23, 20, 20, 19, 19, 18, 26, 27, 27, 28, 28, 26, 26, 26, 26, 23,
+        23, 20, 20, 19, 19, 18, 23, 24, 24, 25, 25, 24, 24, 24, 24, 22, 22, 19,
+        19, 18, 18, 16,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31,
+        30, 30, 30, 29, 28, 28, 28, 28, 26, 26, 26, 25, 23, 23, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30,
+        29, 29, 29, 28, 26, 26, 26, 25, 24, 24, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 29, 29, 28,
+        27, 27, 27, 26, 24, 24, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 29, 29, 28, 27, 27, 27, 26,
+        24, 24, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 30, 30, 30, 30, 29, 29, 29, 28, 27, 27, 27, 26, 24, 24, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30,
+        30, 30, 29, 29, 29, 28, 27, 27, 27, 26, 25, 25, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30,
+        30, 28, 28, 28, 28, 26, 25, 25, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 28, 28, 28,
+        28, 26, 25, 25, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 28, 28, 28, 28, 26, 25, 25,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 30,
+        30, 30, 30, 29, 29, 29, 29, 28, 27, 27, 27, 26, 25, 25, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 30, 30, 30, 30, 29, 29, 29, 28,
+        28, 28, 28, 27, 26, 26, 26, 26, 24, 24, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 31, 30, 30, 30, 30, 29, 29, 29, 28, 28, 28, 28, 27,
+        26, 26, 26, 26, 24, 24, 33, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        31, 31, 30, 30, 30, 30, 29, 29, 29, 28, 28, 28, 28, 27, 26, 26, 26, 26,
+        24, 24, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30,
+        30, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 26, 26, 25, 24, 24, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 29, 29, 28, 28, 28,
+        28, 28, 27, 27, 27, 26, 26, 26, 26, 25, 24, 24, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 30, 30, 30, 30, 29, 29, 29, 28, 28, 28, 28, 28, 27, 27,
+        27, 26, 26, 26, 26, 25, 24, 24, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        30, 30, 30, 30, 29, 29, 29, 28, 28, 28, 28, 28, 27, 27, 27, 26, 26, 26,
+        26, 25, 24, 24, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 29,
+        28, 28, 28, 28, 27, 27, 27, 26, 26, 26, 26, 25, 24, 24, 24, 23, 23, 23,
+        30, 30, 30, 30, 30, 30, 31, 31, 31, 30, 29, 29, 29, 28, 28, 28, 28, 27,
+        26, 26, 26, 25, 24, 24, 24, 23, 23, 23, 23, 22, 22, 22, 30, 30, 30, 30,
+        30, 30, 31, 31, 31, 30, 29, 29, 29, 28, 28, 28, 28, 27, 26, 26, 26, 25,
+        24, 24, 24, 23, 23, 23, 23, 22, 22, 22, 30, 30, 30, 30, 30, 30, 31, 31,
+        31, 30, 29, 29, 29, 28, 28, 28, 28, 27, 26, 26, 26, 25, 24, 24, 24, 23,
+        23, 23, 23, 22, 22, 22, 29, 30, 30, 30, 30, 30, 30, 30, 30, 29, 28, 28,
+        28, 28, 28, 28, 28, 26, 25, 25, 25, 24, 23, 23, 23, 22, 22, 22, 22, 21,
+        20, 20, 28, 29, 29, 29, 29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 27, 27,
+        27, 26, 24, 24, 24, 23, 21, 21, 21, 21, 20, 20, 20, 20, 19, 19, 28, 29,
+        29, 29, 29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 27, 27, 27, 26, 24, 24,
+        24, 23, 21, 21, 21, 21, 20, 20, 20, 20, 19, 19, 28, 29, 29, 29, 29, 29,
+        30, 30, 30, 29, 28, 28, 28, 28, 27, 27, 27, 26, 24, 24, 24, 23, 21, 21,
+        21, 21, 20, 20, 20, 20, 19, 19, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
+        27, 27, 27, 27, 26, 26, 26, 25, 23, 23, 23, 22, 21, 21, 21, 20, 20, 20,
+        20, 19, 18, 18, 26, 26, 27, 27, 27, 27, 28, 28, 28, 27, 26, 26, 26, 26,
+        26, 26, 26, 24, 23, 23, 23, 22, 20, 20, 20, 20, 19, 19, 19, 18, 18, 18,
+        26, 26, 27, 27, 27, 27, 28, 28, 28, 27, 26, 26, 26, 26, 26, 26, 26, 24,
+        23, 23, 23, 22, 20, 20, 20, 20, 19, 19, 19, 18, 18, 18, 26, 26, 27, 27,
+        27, 27, 28, 28, 28, 27, 26, 26, 26, 26, 26, 26, 26, 24, 23, 23, 23, 22,
+        20, 20, 20, 20, 19, 19, 19, 18, 18, 18, 25, 25, 26, 26, 26, 26, 26, 26,
+        26, 26, 26, 26, 26, 25, 25, 25, 25, 23, 22, 22, 22, 21, 20, 20, 20, 19,
+        18, 18, 18, 18, 17, 17, 23, 24, 24, 24, 24, 25, 25, 25, 25, 25, 24, 24,
+        24, 24, 24, 24, 24, 23, 22, 22, 22, 20, 19, 19, 19, 18, 18, 18, 18, 17,
+        16, 16, 23, 24, 24, 24, 24, 25, 25, 25, 25, 25, 24, 24, 24, 24, 24, 24,
+        24, 23, 22, 22, 22, 20, 19, 19, 19, 18, 18, 18, 18, 17, 16, 16,
+        /* Size 4x8 */
+        33, 32, 30, 26, 32, 32, 30, 27, 32, 31, 30, 27, 32, 31, 28, 26, 31, 30,
+        27, 24, 30, 28, 25, 22, 28, 27, 23, 20, 26, 26, 22, 18,
+        /* Size 8x4 */
+        33, 32, 32, 32, 31, 30, 28, 26, 32, 32, 31, 31, 30, 28, 27, 26, 30, 30,
+        30, 28, 27, 25, 23, 22, 26, 27, 27, 26, 24, 22, 20, 18,
+        /* Size 8x16 */
+        32, 33, 33, 32, 32, 28, 28, 23, 33, 32, 32, 32, 32, 29, 29, 24, 33, 32,
+        32, 32, 32, 29, 29, 24, 33, 32, 32, 31, 31, 30, 30, 25, 33, 32, 32, 31,
+        31, 30, 30, 25, 32, 32, 32, 30, 30, 28, 28, 24, 32, 32, 32, 30, 30, 28,
+        28, 24, 32, 31, 31, 29, 29, 27, 27, 24, 32, 31, 31, 29, 29, 27, 27, 24,
+        30, 30, 30, 28, 28, 24, 24, 21, 30, 30, 30, 28, 28, 24, 24, 21, 28, 30,
+        30, 27, 27, 21, 21, 19, 28, 30, 30, 27, 27, 21, 21, 19, 26, 28, 28, 26,
+        26, 20, 20, 18, 26, 28, 28, 26, 26, 20, 20, 18, 23, 25, 25, 24, 24, 19,
+        19, 16,
+        /* Size 16x8 */
+        32, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 28, 28, 26, 26, 23, 33, 32,
+        32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 25, 33, 32, 32, 32,
+        32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 25, 32, 32, 32, 31, 31, 30,
+        30, 29, 29, 28, 28, 27, 27, 26, 26, 24, 32, 32, 32, 31, 31, 30, 30, 29,
+        29, 28, 28, 27, 27, 26, 26, 24, 28, 29, 29, 30, 30, 28, 28, 27, 27, 24,
+        24, 21, 21, 20, 20, 19, 28, 29, 29, 30, 30, 28, 28, 27, 27, 24, 24, 21,
+        21, 20, 20, 19, 23, 24, 24, 25, 25, 24, 24, 24, 24, 21, 21, 19, 19, 18,
+        18, 16,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 32, 32, 32, 32, 30, 28, 28, 28, 26, 23, 23, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 30, 29, 29, 29, 26, 24, 24, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 30, 29, 29, 29, 27, 24, 24, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 30, 29, 29, 29, 27, 24, 24, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 30, 29, 29, 29, 27, 24, 24, 33, 32, 32, 32, 32, 32, 32, 32, 32, 30,
+        29, 29, 29, 27, 25, 25, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30,
+        30, 28, 25, 25, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 28,
+        25, 25, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 28, 25, 25,
+        33, 32, 32, 32, 32, 31, 31, 31, 31, 30, 29, 29, 29, 27, 25, 25, 32, 32,
+        32, 32, 32, 31, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 32, 32, 32, 32,
+        32, 31, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 32, 32, 32, 32, 32, 31,
+        30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 32, 32, 32, 32, 32, 31, 30, 30,
+        30, 28, 28, 28, 28, 26, 24, 24, 32, 32, 31, 31, 31, 30, 29, 29, 29, 28,
+        27, 27, 27, 26, 24, 24, 32, 32, 31, 31, 31, 30, 29, 29, 29, 28, 27, 27,
+        27, 26, 24, 24, 32, 32, 31, 31, 31, 30, 29, 29, 29, 28, 27, 27, 27, 26,
+        24, 24, 31, 31, 31, 31, 31, 30, 28, 28, 28, 27, 26, 26, 26, 24, 23, 23,
+        30, 30, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 24, 23, 21, 21, 30, 30,
+        30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 24, 23, 21, 21, 30, 30, 30, 30,
+        30, 29, 28, 28, 28, 26, 24, 24, 24, 23, 21, 21, 29, 30, 30, 30, 30, 28,
+        28, 28, 28, 25, 23, 23, 23, 22, 20, 20, 28, 29, 30, 30, 30, 28, 27, 27,
+        27, 24, 21, 21, 21, 20, 19, 19, 28, 29, 30, 30, 30, 28, 27, 27, 27, 24,
+        21, 21, 21, 20, 19, 19, 28, 29, 30, 30, 30, 28, 27, 27, 27, 24, 21, 21,
+        21, 20, 19, 19, 28, 28, 28, 28, 28, 27, 26, 26, 26, 23, 21, 21, 21, 20,
+        18, 18, 26, 27, 28, 28, 28, 26, 26, 26, 26, 23, 20, 20, 20, 19, 18, 18,
+        26, 27, 28, 28, 28, 26, 26, 26, 26, 23, 20, 20, 20, 19, 18, 18, 26, 27,
+        28, 28, 28, 26, 26, 26, 26, 23, 20, 20, 20, 19, 18, 18, 25, 26, 26, 26,
+        26, 26, 24, 24, 24, 22, 20, 20, 20, 18, 17, 17, 23, 24, 25, 25, 25, 24,
+        24, 24, 24, 21, 19, 19, 19, 18, 16, 16, 23, 24, 25, 25, 25, 24, 24, 24,
+        24, 21, 19, 19, 19, 18, 16, 16,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31,
+        30, 30, 30, 29, 28, 28, 28, 28, 26, 26, 26, 25, 23, 23, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30,
+        29, 29, 29, 28, 27, 27, 27, 26, 24, 24, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 28,
+        28, 28, 28, 26, 25, 25, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 28, 28, 28, 28, 26,
+        25, 25, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 30, 30, 30, 30, 30, 30, 30, 28, 28, 28, 28, 26, 25, 25, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 30, 30, 30, 30, 29, 29,
+        29, 28, 28, 28, 28, 27, 26, 26, 26, 26, 24, 24, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 30, 30, 30, 30, 29, 29, 29, 28, 28, 28, 28, 28, 27, 27,
+        27, 26, 26, 26, 26, 24, 24, 24, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        30, 30, 30, 30, 29, 29, 29, 28, 28, 28, 28, 28, 27, 27, 27, 26, 26, 26,
+        26, 24, 24, 24, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30,
+        29, 29, 29, 28, 28, 28, 28, 28, 27, 27, 27, 26, 26, 26, 26, 24, 24, 24,
+        30, 30, 30, 30, 30, 30, 31, 31, 31, 30, 29, 29, 29, 28, 28, 28, 28, 27,
+        26, 26, 26, 25, 24, 24, 24, 23, 23, 23, 23, 22, 21, 21, 28, 29, 29, 29,
+        29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 27, 27, 27, 26, 24, 24, 24, 23,
+        21, 21, 21, 21, 20, 20, 20, 20, 19, 19, 28, 29, 29, 29, 29, 29, 30, 30,
+        30, 29, 28, 28, 28, 28, 27, 27, 27, 26, 24, 24, 24, 23, 21, 21, 21, 21,
+        20, 20, 20, 20, 19, 19, 28, 29, 29, 29, 29, 29, 30, 30, 30, 29, 28, 28,
+        28, 28, 27, 27, 27, 26, 24, 24, 24, 23, 21, 21, 21, 21, 20, 20, 20, 20,
+        19, 19, 26, 26, 27, 27, 27, 27, 28, 28, 28, 27, 26, 26, 26, 26, 26, 26,
+        26, 24, 23, 23, 23, 22, 20, 20, 20, 20, 19, 19, 19, 18, 18, 18, 23, 24,
+        24, 24, 24, 25, 25, 25, 25, 25, 24, 24, 24, 24, 24, 24, 24, 23, 21, 21,
+        21, 20, 19, 19, 19, 18, 18, 18, 18, 17, 16, 16, 23, 24, 24, 24, 24, 25,
+        25, 25, 25, 25, 24, 24, 24, 24, 24, 24, 24, 23, 21, 21, 21, 20, 19, 19,
+        19, 18, 18, 18, 18, 17, 16, 16,
+        /* Size 4x16 */
+        33, 32, 30, 26, 32, 32, 30, 27, 32, 32, 30, 27, 32, 32, 31, 28, 32, 32,
+        31, 28, 32, 31, 29, 26, 32, 31, 29, 26, 32, 30, 28, 26, 32, 30, 28, 26,
+        30, 29, 26, 23, 30, 29, 26, 23, 29, 28, 24, 20, 29, 28, 24, 20, 27, 26,
+        23, 19, 27, 26, 23, 19, 24, 24, 21, 18,
+        /* Size 16x4 */
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 30, 30, 29, 29, 27, 27, 24, 32, 32,
+        32, 32, 32, 31, 31, 30, 30, 29, 29, 28, 28, 26, 26, 24, 30, 30, 30, 31,
+        31, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 21, 26, 27, 27, 28, 28, 26,
+        26, 26, 26, 23, 23, 20, 20, 19, 19, 18,
+        /* Size 8x32 */
+        32, 33, 33, 32, 32, 28, 28, 23, 33, 33, 33, 32, 32, 29, 29, 24, 33, 32,
+        32, 32, 32, 29, 29, 24, 33, 32, 32, 32, 32, 29, 29, 24, 33, 32, 32, 32,
+        32, 29, 29, 24, 33, 32, 32, 32, 32, 29, 29, 25, 33, 32, 32, 31, 31, 30,
+        30, 25, 33, 32, 32, 31, 31, 30, 30, 25, 33, 32, 32, 31, 31, 30, 30, 25,
+        33, 32, 32, 31, 31, 29, 29, 25, 32, 32, 32, 30, 30, 28, 28, 24, 32, 32,
+        32, 30, 30, 28, 28, 24, 32, 32, 32, 30, 30, 28, 28, 24, 32, 32, 32, 30,
+        30, 28, 28, 24, 32, 31, 31, 29, 29, 27, 27, 24, 32, 31, 31, 29, 29, 27,
+        27, 24, 32, 31, 31, 29, 29, 27, 27, 24, 31, 31, 31, 28, 28, 26, 26, 23,
+        30, 30, 30, 28, 28, 24, 24, 21, 30, 30, 30, 28, 28, 24, 24, 21, 30, 30,
+        30, 28, 28, 24, 24, 21, 29, 30, 30, 28, 28, 23, 23, 20, 28, 30, 30, 27,
+        27, 21, 21, 19, 28, 30, 30, 27, 27, 21, 21, 19, 28, 30, 30, 27, 27, 21,
+        21, 19, 28, 28, 28, 26, 26, 21, 21, 18, 26, 28, 28, 26, 26, 20, 20, 18,
+        26, 28, 28, 26, 26, 20, 20, 18, 26, 28, 28, 26, 26, 20, 20, 18, 25, 26,
+        26, 24, 24, 20, 20, 17, 23, 25, 25, 24, 24, 19, 19, 16, 23, 25, 25, 24,
+        24, 19, 19, 16,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31,
+        30, 30, 30, 29, 28, 28, 28, 28, 26, 26, 26, 25, 23, 23, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30,
+        30, 30, 30, 28, 28, 28, 28, 26, 25, 25, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 28,
+        28, 28, 28, 26, 25, 25, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30,
+        30, 30, 29, 29, 29, 28, 28, 28, 28, 28, 27, 27, 27, 26, 26, 26, 26, 24,
+        24, 24, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 29, 29,
+        29, 28, 28, 28, 28, 28, 27, 27, 27, 26, 26, 26, 26, 24, 24, 24, 28, 29,
+        29, 29, 29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 27, 27, 27, 26, 24, 24,
+        24, 23, 21, 21, 21, 21, 20, 20, 20, 20, 19, 19, 28, 29, 29, 29, 29, 29,
+        30, 30, 30, 29, 28, 28, 28, 28, 27, 27, 27, 26, 24, 24, 24, 23, 21, 21,
+        21, 21, 20, 20, 20, 20, 19, 19, 23, 24, 24, 24, 24, 25, 25, 25, 25, 25,
+        24, 24, 24, 24, 24, 24, 24, 23, 21, 21, 21, 20, 19, 19, 19, 18, 18, 18,
+        18, 17, 16, 16 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 30, 24, 22, 30, 26, 23, 22, 24, 23, 21, 21, 22, 22, 21, 19,
+        /* Size 8x8 */
+        33, 33, 32, 29, 26, 23, 21, 21, 33, 33, 31, 28, 25, 23, 22, 22, 32, 31,
+        29, 26, 24, 23, 22, 23, 29, 28, 26, 24, 23, 22, 22, 22, 26, 25, 24, 23,
+        22, 21, 21, 22, 23, 23, 23, 22, 21, 20, 20, 20, 21, 22, 22, 22, 21, 20,
+        19, 19, 21, 22, 23, 22, 22, 20, 19, 18,
+        /* Size 16x16 */
+        32, 33, 33, 34, 34, 31, 31, 28, 28, 25, 25, 21, 21, 21, 21, 21, 33, 33,
+        33, 33, 33, 30, 30, 27, 27, 24, 24, 22, 22, 22, 22, 22, 33, 33, 33, 33,
+        33, 30, 30, 27, 27, 24, 24, 22, 22, 22, 22, 22, 34, 33, 33, 32, 32, 29,
+        29, 26, 26, 24, 24, 22, 22, 23, 23, 23, 34, 33, 33, 32, 32, 29, 29, 26,
+        26, 24, 24, 22, 22, 23, 23, 23, 31, 30, 30, 29, 29, 26, 26, 24, 24, 23,
+        23, 22, 22, 22, 22, 23, 31, 30, 30, 29, 29, 26, 26, 24, 24, 23, 23, 22,
+        22, 22, 22, 23, 28, 27, 27, 26, 26, 24, 24, 22, 22, 22, 22, 21, 21, 22,
+        22, 23, 28, 27, 27, 26, 26, 24, 24, 22, 22, 22, 22, 21, 21, 22, 22, 23,
+        25, 24, 24, 24, 24, 23, 23, 22, 22, 21, 21, 20, 20, 21, 21, 21, 25, 24,
+        24, 24, 24, 23, 23, 22, 22, 21, 21, 20, 20, 21, 21, 21, 21, 22, 22, 22,
+        22, 22, 22, 21, 21, 20, 20, 19, 19, 19, 19, 19, 21, 22, 22, 22, 22, 22,
+        22, 21, 21, 20, 20, 19, 19, 19, 19, 19, 21, 22, 22, 23, 23, 22, 22, 22,
+        22, 21, 21, 19, 19, 19, 19, 19, 21, 22, 22, 23, 23, 22, 22, 22, 22, 21,
+        21, 19, 19, 19, 19, 19, 21, 22, 22, 23, 23, 23, 23, 23, 23, 21, 21, 19,
+        19, 19, 19, 18,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 34, 34, 34, 32, 31, 31, 31, 29, 28, 28, 28, 26,
+        25, 25, 25, 23, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 32, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 24, 23,
+        21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 31, 30, 30, 30, 28, 27, 27, 27, 26, 24, 24, 24, 23, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 30,
+        30, 28, 27, 27, 27, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 30, 30, 28, 27, 27,
+        27, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 31, 29, 29, 29, 28, 26, 26, 26, 25, 24, 24,
+        24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 34, 33, 33, 33, 33, 33,
+        32, 32, 32, 31, 29, 29, 29, 28, 26, 26, 26, 25, 24, 24, 24, 23, 22, 22,
+        22, 22, 23, 23, 23, 23, 23, 23, 34, 33, 33, 33, 33, 33, 32, 32, 32, 31,
+        29, 29, 29, 28, 26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 23, 23,
+        23, 23, 23, 23, 34, 33, 33, 33, 33, 33, 32, 32, 32, 31, 29, 29, 29, 28,
+        26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23,
+        32, 32, 31, 31, 31, 31, 31, 31, 31, 29, 28, 28, 28, 26, 25, 25, 25, 24,
+        24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 31, 30, 30, 30,
+        30, 29, 29, 29, 29, 28, 26, 26, 26, 25, 24, 24, 24, 23, 23, 23, 23, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 31, 30, 30, 30, 30, 29, 29, 29,
+        29, 28, 26, 26, 26, 25, 24, 24, 24, 23, 23, 23, 23, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 23, 23, 31, 30, 30, 30, 30, 29, 29, 29, 29, 28, 26, 26,
+        26, 25, 24, 24, 24, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        23, 23, 29, 29, 28, 28, 28, 28, 28, 28, 28, 26, 25, 25, 25, 24, 23, 23,
+        23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 28, 28,
+        27, 27, 27, 26, 26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22,
+        22, 22, 21, 21, 21, 22, 22, 22, 22, 22, 23, 23, 28, 28, 27, 27, 27, 26,
+        26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21,
+        21, 22, 22, 22, 22, 22, 23, 23, 28, 28, 27, 27, 27, 26, 26, 26, 26, 25,
+        24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 22, 22, 22,
+        22, 22, 23, 23, 26, 26, 26, 26, 26, 25, 25, 25, 25, 24, 23, 23, 23, 23,
+        22, 22, 22, 22, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22,
+        25, 24, 24, 24, 24, 24, 24, 24, 24, 24, 23, 23, 23, 22, 22, 22, 22, 21,
+        21, 21, 21, 21, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 25, 24, 24, 24,
+        24, 24, 24, 24, 24, 24, 23, 23, 23, 22, 22, 22, 22, 21, 21, 21, 21, 21,
+        20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 25, 24, 24, 24, 24, 24, 24, 24,
+        24, 24, 23, 23, 23, 22, 22, 22, 22, 21, 21, 21, 21, 21, 20, 20, 20, 20,
+        21, 21, 21, 21, 21, 21, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 22, 22,
+        22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 20,
+        20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21,
+        21, 21, 20, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 21,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20,
+        20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20, 19, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 20, 19, 19, 19, 19, 19, 19,
+        19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 22,
+        22, 22, 22, 21, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,
+        21, 21, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21,
+        21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22,
+        22, 22, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 20,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 23, 23,
+        23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19,
+        19, 19, 19, 18, 18, 18, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23,
+        23, 23, 23, 23, 23, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 18,
+        18, 18, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
+        23, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18,
+        /* Size 4x8 */
+        33, 30, 24, 21, 33, 29, 24, 22, 31, 28, 23, 22, 28, 25, 22, 22, 26, 23,
+        21, 21, 23, 22, 21, 20, 22, 22, 20, 19, 22, 22, 21, 19,
+        /* Size 8x4 */
+        33, 33, 31, 28, 26, 23, 22, 22, 30, 29, 28, 25, 23, 22, 22, 22, 24, 24,
+        23, 22, 21, 21, 20, 21, 21, 22, 22, 22, 21, 20, 19, 19,
+        /* Size 8x16 */
+        32, 33, 33, 28, 28, 21, 21, 21, 33, 33, 33, 27, 27, 22, 22, 22, 33, 33,
+        33, 27, 27, 22, 22, 22, 34, 32, 32, 26, 26, 22, 22, 23, 34, 32, 32, 26,
+        26, 22, 22, 23, 31, 28, 28, 24, 24, 22, 22, 22, 31, 28, 28, 24, 24, 22,
+        22, 22, 28, 26, 26, 22, 22, 22, 22, 23, 28, 26, 26, 22, 22, 22, 22, 23,
+        24, 24, 24, 22, 22, 20, 20, 21, 24, 24, 24, 22, 22, 20, 20, 21, 21, 22,
+        22, 21, 21, 19, 19, 19, 21, 22, 22, 21, 21, 19, 19, 19, 21, 22, 22, 22,
+        22, 19, 19, 18, 21, 22, 22, 22, 22, 19, 19, 18, 21, 23, 23, 22, 22, 19,
+        19, 18,
+        /* Size 16x8 */
+        32, 33, 33, 34, 34, 31, 31, 28, 28, 24, 24, 21, 21, 21, 21, 21, 33, 33,
+        33, 32, 32, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 33, 33, 33, 32,
+        32, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 28, 27, 27, 26, 26, 24,
+        24, 22, 22, 22, 22, 21, 21, 22, 22, 22, 28, 27, 27, 26, 26, 24, 24, 22,
+        22, 22, 22, 21, 21, 22, 22, 22, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20,
+        20, 19, 19, 19, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 20, 19,
+        19, 19, 19, 19, 21, 22, 22, 23, 23, 22, 22, 23, 23, 21, 21, 19, 19, 18,
+        18, 18,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 31, 28, 28, 28, 24, 21, 21, 21, 21, 21, 21, 33, 33,
+        33, 33, 33, 30, 28, 28, 28, 24, 22, 22, 22, 21, 21, 21, 33, 33, 33, 33,
+        33, 30, 27, 27, 27, 24, 22, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 30,
+        27, 27, 27, 24, 22, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 30, 27, 27,
+        27, 24, 22, 22, 22, 22, 22, 22, 33, 33, 32, 32, 32, 29, 26, 26, 26, 24,
+        22, 22, 22, 22, 22, 22, 34, 33, 32, 32, 32, 29, 26, 26, 26, 24, 22, 22,
+        22, 23, 23, 23, 34, 33, 32, 32, 32, 29, 26, 26, 26, 24, 22, 22, 22, 23,
+        23, 23, 34, 33, 32, 32, 32, 29, 26, 26, 26, 24, 22, 22, 22, 23, 23, 23,
+        32, 31, 30, 30, 30, 28, 25, 25, 25, 23, 22, 22, 22, 22, 23, 23, 31, 30,
+        28, 28, 28, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 31, 30, 28, 28,
+        28, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 31, 30, 28, 28, 28, 26,
+        24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 29, 28, 27, 27, 27, 25, 23, 23,
+        23, 22, 22, 22, 22, 22, 23, 23, 28, 27, 26, 26, 26, 24, 22, 22, 22, 22,
+        22, 22, 22, 22, 23, 23, 28, 27, 26, 26, 26, 24, 22, 22, 22, 22, 22, 22,
+        22, 22, 23, 23, 28, 27, 26, 26, 26, 24, 22, 22, 22, 22, 22, 22, 22, 22,
+        23, 23, 26, 26, 25, 25, 25, 23, 22, 22, 22, 21, 21, 21, 21, 21, 22, 22,
+        24, 24, 24, 24, 24, 23, 22, 22, 22, 21, 20, 20, 20, 20, 21, 21, 24, 24,
+        24, 24, 24, 23, 22, 22, 22, 21, 20, 20, 20, 20, 21, 21, 24, 24, 24, 24,
+        24, 23, 22, 22, 22, 21, 20, 20, 20, 20, 21, 21, 23, 23, 23, 23, 23, 22,
+        22, 22, 22, 21, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 21, 21,
+        21, 20, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 21, 21, 21, 20,
+        19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 21, 21, 21, 20, 19, 19,
+        19, 19, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19,
+        19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18,
+        21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 21, 22,
+        22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 21, 22, 23, 23,
+        23, 22, 22, 22, 22, 21, 19, 19, 19, 19, 18, 18, 21, 22, 23, 23, 23, 23,
+        22, 22, 22, 21, 19, 19, 19, 18, 18, 18, 21, 22, 23, 23, 23, 23, 22, 22,
+        22, 21, 19, 19, 19, 18, 18, 18,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 34, 34, 34, 32, 31, 31, 31, 29, 28, 28, 28, 26,
+        24, 24, 24, 23, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 31, 30, 30, 30, 28, 27, 27, 27, 26, 24, 24, 24, 23,
+        21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 30, 28, 28, 28, 27, 26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22,
+        22, 22, 22, 23, 23, 23, 33, 33, 33, 33, 33, 32, 32, 32, 32, 30, 28, 28,
+        28, 27, 26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 23,
+        23, 23, 33, 33, 33, 33, 33, 32, 32, 32, 32, 30, 28, 28, 28, 27, 26, 26,
+        26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 31, 30,
+        30, 30, 30, 29, 29, 29, 29, 28, 26, 26, 26, 25, 24, 24, 24, 23, 23, 23,
+        23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 28, 28, 27, 27, 27, 26,
+        26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21,
+        21, 22, 22, 22, 22, 22, 22, 22, 28, 28, 27, 27, 27, 26, 26, 26, 26, 25,
+        24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 22, 22, 22,
+        22, 22, 22, 22, 28, 28, 27, 27, 27, 26, 26, 26, 26, 25, 24, 24, 24, 23,
+        22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22,
+        24, 24, 24, 24, 24, 24, 24, 24, 24, 23, 23, 23, 23, 22, 22, 22, 22, 21,
+        21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 20,
+        19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 20, 19, 19, 19, 19,
+        19, 19, 19, 19, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 21, 20, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19,
+        19, 19, 21, 21, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22,
+        22, 21, 20, 20, 20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 21, 21,
+        22, 22, 22, 22, 23, 23, 23, 23, 22, 22, 22, 23, 23, 23, 23, 22, 21, 21,
+        21, 20, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 21, 21, 22, 22, 22, 22,
+        23, 23, 23, 23, 22, 22, 22, 23, 23, 23, 23, 22, 21, 21, 21, 20, 19, 19,
+        19, 19, 18, 18, 18, 18, 18, 18,
+        /* Size 4x16 */
+        33, 31, 24, 21, 33, 30, 24, 22, 33, 30, 24, 22, 33, 29, 24, 23, 33, 29,
+        24, 23, 30, 26, 23, 22, 30, 26, 23, 22, 27, 24, 22, 22, 27, 24, 22, 22,
+        24, 23, 21, 20, 24, 23, 21, 20, 21, 22, 20, 19, 21, 22, 20, 19, 22, 22,
+        20, 19, 22, 22, 20, 19, 22, 23, 21, 18,
+        /* Size 16x4 */
+        33, 33, 33, 33, 33, 30, 30, 27, 27, 24, 24, 21, 21, 22, 22, 22, 31, 30,
+        30, 29, 29, 26, 26, 24, 24, 23, 23, 22, 22, 22, 22, 23, 24, 24, 24, 24,
+        24, 23, 23, 22, 22, 21, 21, 20, 20, 20, 20, 21, 21, 22, 22, 23, 23, 22,
+        22, 22, 22, 20, 20, 19, 19, 19, 19, 18,
+        /* Size 8x32 */
+        32, 33, 33, 28, 28, 21, 21, 21, 33, 33, 33, 28, 28, 22, 22, 21, 33, 33,
+        33, 27, 27, 22, 22, 22, 33, 33, 33, 27, 27, 22, 22, 22, 33, 33, 33, 27,
+        27, 22, 22, 22, 33, 32, 32, 26, 26, 22, 22, 22, 34, 32, 32, 26, 26, 22,
+        22, 23, 34, 32, 32, 26, 26, 22, 22, 23, 34, 32, 32, 26, 26, 22, 22, 23,
+        32, 30, 30, 25, 25, 22, 22, 23, 31, 28, 28, 24, 24, 22, 22, 22, 31, 28,
+        28, 24, 24, 22, 22, 22, 31, 28, 28, 24, 24, 22, 22, 22, 29, 27, 27, 23,
+        23, 22, 22, 23, 28, 26, 26, 22, 22, 22, 22, 23, 28, 26, 26, 22, 22, 22,
+        22, 23, 28, 26, 26, 22, 22, 22, 22, 23, 26, 25, 25, 22, 22, 21, 21, 22,
+        24, 24, 24, 22, 22, 20, 20, 21, 24, 24, 24, 22, 22, 20, 20, 21, 24, 24,
+        24, 22, 22, 20, 20, 21, 23, 23, 23, 22, 22, 20, 20, 20, 21, 22, 22, 21,
+        21, 19, 19, 19, 21, 22, 22, 21, 21, 19, 19, 19, 21, 22, 22, 21, 21, 19,
+        19, 19, 21, 22, 22, 22, 22, 19, 19, 19, 21, 22, 22, 22, 22, 19, 19, 18,
+        21, 22, 22, 22, 22, 19, 19, 18, 21, 22, 22, 22, 22, 19, 19, 18, 21, 23,
+        23, 22, 22, 19, 19, 18, 21, 23, 23, 22, 22, 19, 19, 18, 21, 23, 23, 22,
+        22, 19, 19, 18,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 34, 34, 34, 32, 31, 31, 31, 29, 28, 28, 28, 26,
+        24, 24, 24, 23, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 30, 28, 28, 28, 27, 26, 26, 26, 25, 24, 24, 24, 23,
+        22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 30, 28, 28, 28, 27, 26, 26, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22,
+        22, 22, 22, 23, 23, 23, 28, 28, 27, 27, 27, 26, 26, 26, 26, 25, 24, 24,
+        24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 22, 22, 22, 22, 22,
+        22, 22, 28, 28, 27, 27, 27, 26, 26, 26, 26, 25, 24, 24, 24, 23, 22, 22,
+        22, 22, 22, 22, 22, 22, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 21, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20,
+        20, 20, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 21, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 20, 19, 19,
+        19, 19, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23,
+        22, 22, 22, 23, 23, 23, 23, 22, 21, 21, 21, 20, 19, 19, 19, 19, 18, 18,
+        18, 18, 18, 18 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        32, 32, 32, 29, 32, 32, 31, 29, 32, 31, 29, 27, 29, 29, 27, 22,
+        /* Size 8x8 */
+        33, 33, 33, 32, 32, 32, 30, 29, 33, 32, 32, 32, 32, 31, 30, 29, 33, 32,
+        32, 32, 32, 31, 31, 30, 32, 32, 32, 31, 30, 30, 29, 28, 32, 32, 32, 30,
+        29, 29, 28, 27, 32, 31, 31, 30, 29, 28, 27, 26, 30, 30, 31, 29, 28, 27,
+        26, 24, 29, 29, 30, 28, 27, 26, 24, 21,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 31, 30, 30, 28, 28, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 30, 30, 30, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 30, 30, 30, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 30,
+        29, 29, 28, 28, 33, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 29, 29, 28,
+        28, 28, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 29, 28, 28, 28, 28,
+        32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 28, 28, 28, 27, 27, 32, 32,
+        32, 32, 32, 32, 31, 30, 30, 29, 29, 28, 28, 28, 27, 27, 31, 31, 31, 31,
+        31, 31, 30, 29, 29, 28, 28, 27, 26, 26, 24, 24, 30, 30, 30, 30, 31, 31,
+        29, 29, 28, 28, 28, 26, 26, 25, 24, 24, 30, 30, 30, 30, 30, 30, 29, 28,
+        28, 28, 28, 26, 25, 24, 23, 23, 28, 29, 29, 29, 30, 30, 28, 28, 28, 27,
+        27, 24, 24, 23, 21, 21, 28, 29, 29, 29, 30, 30, 28, 28, 28, 27, 27, 24,
+        24, 23, 21, 21,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 28, 28, 28, 28, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 30, 30, 30, 30, 29, 29, 29, 29, 28, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30,
+        30, 29, 29, 29, 29, 28, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 30, 29, 29, 29,
+        29, 28, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 30, 29, 29, 29, 29, 28, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 30, 30, 30, 30, 29, 29, 29, 29, 28, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30,
+        30, 30, 30, 30, 29, 29, 29, 28, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 30, 30,
+        30, 30, 30, 29, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 29,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 29, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 31, 30, 30, 30, 30, 30, 29, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30,
+        30, 29, 29, 29, 29, 28, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 28, 28, 28,
+        28, 28, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31,
+        31, 30, 30, 30, 30, 30, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 30, 30,
+        30, 30, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 29, 29,
+        29, 29, 28, 28, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 29, 29, 28, 28, 28, 28, 28,
+        28, 28, 28, 27, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30,
+        30, 30, 30, 29, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28, 27, 27, 27, 26,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 30, 29,
+        29, 29, 29, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 30, 29, 29, 29, 29, 28,
+        28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 30, 30, 30, 30, 29, 29, 29, 29, 28, 28, 28, 28, 28,
+        28, 27, 27, 27, 27, 26, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        30, 30, 30, 30, 29, 29, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 26, 26,
+        26, 25, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 29, 29, 29,
+        29, 28, 28, 28, 28, 28, 27, 26, 26, 26, 26, 25, 24, 24, 24, 24, 30, 30,
+        30, 30, 30, 30, 30, 31, 31, 31, 31, 30, 29, 29, 29, 29, 28, 28, 28, 28,
+        28, 27, 26, 26, 26, 26, 25, 24, 24, 24, 24, 24, 30, 30, 30, 30, 30, 30,
+        30, 31, 31, 31, 31, 30, 29, 29, 29, 29, 28, 28, 28, 28, 28, 27, 26, 26,
+        26, 26, 25, 24, 24, 24, 24, 24, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31,
+        31, 30, 29, 29, 29, 29, 28, 28, 28, 28, 28, 27, 26, 26, 26, 26, 25, 24,
+        24, 24, 24, 24, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 29, 28,
+        28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 25, 25, 24, 23, 23, 23, 23, 23,
+        29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 28, 28,
+        27, 27, 27, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 21, 28, 29, 29, 29,
+        29, 29, 29, 30, 30, 30, 30, 29, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26,
+        24, 24, 24, 24, 23, 22, 21, 21, 21, 21, 28, 29, 29, 29, 29, 29, 29, 30,
+        30, 30, 30, 29, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 24, 24, 24, 24,
+        23, 22, 21, 21, 21, 21, 28, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 29,
+        28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 24, 24, 24, 24, 23, 22, 21, 21,
+        21, 21, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 28, 28, 28, 28, 28,
+        27, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 21, 21, 21, 21, 20,
+        /* Size 4x8 */
+        33, 33, 32, 29, 32, 32, 32, 29, 32, 32, 31, 30, 32, 32, 30, 28, 32, 31,
+        29, 27, 31, 31, 28, 26, 30, 30, 28, 24, 29, 30, 27, 21,
+        /* Size 8x4 */
+        33, 32, 32, 32, 32, 31, 30, 29, 33, 32, 32, 32, 31, 31, 30, 30, 32, 32,
+        31, 30, 29, 28, 28, 27, 29, 29, 30, 28, 27, 26, 24, 21,
+        /* Size 8x16 */
+        32, 33, 33, 33, 32, 32, 29, 28, 33, 32, 32, 32, 32, 32, 29, 29, 33, 32,
+        32, 32, 32, 32, 29, 29, 33, 32, 32, 32, 32, 32, 30, 29, 33, 32, 32, 32,
+        31, 31, 30, 30, 33, 32, 32, 32, 31, 31, 30, 30, 33, 32, 32, 31, 30, 30,
+        29, 28, 32, 32, 32, 31, 30, 30, 28, 28, 32, 32, 32, 31, 30, 30, 28, 28,
+        32, 32, 31, 30, 29, 29, 28, 27, 32, 32, 31, 30, 29, 29, 28, 27, 31, 31,
+        31, 29, 28, 28, 26, 25, 30, 30, 30, 29, 28, 28, 25, 24, 30, 30, 30, 29,
+        28, 28, 24, 23, 28, 29, 30, 28, 27, 27, 22, 21, 28, 29, 30, 28, 27, 27,
+        22, 21,
+        /* Size 16x8 */
+        32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31, 30, 30, 28, 28, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 33, 32, 32, 32, 32, 32,
+        31, 31, 31, 30, 30, 29, 29, 29, 28, 28, 32, 32, 32, 32, 31, 31, 30, 30,
+        30, 29, 29, 28, 28, 28, 27, 27, 32, 32, 32, 32, 31, 31, 30, 30, 30, 29,
+        29, 28, 28, 28, 27, 27, 29, 29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 26,
+        25, 24, 22, 22, 28, 29, 29, 29, 30, 30, 28, 28, 28, 27, 27, 25, 24, 23,
+        21, 21,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31, 29, 28, 28, 28, 33, 33,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 31, 29, 29, 29, 29, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        30, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 29,
+        29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30,
+        33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 33, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 33, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 30, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32,
+        31, 31, 30, 30, 30, 30, 29, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30,
+        30, 30, 30, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30,
+        30, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30, 29,
+        28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 28, 28,
+        28, 28, 32, 32, 32, 31, 31, 31, 31, 30, 29, 29, 29, 28, 28, 27, 27, 27,
+        32, 32, 32, 31, 31, 31, 30, 29, 29, 29, 29, 28, 28, 27, 27, 27, 32, 32,
+        32, 31, 31, 31, 30, 29, 29, 29, 29, 28, 28, 27, 27, 27, 32, 32, 32, 31,
+        31, 31, 30, 29, 29, 29, 29, 28, 28, 27, 27, 27, 32, 31, 31, 31, 31, 31,
+        30, 29, 28, 28, 28, 28, 26, 26, 26, 26, 31, 31, 31, 31, 31, 31, 29, 28,
+        28, 28, 28, 27, 26, 25, 25, 25, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28,
+        28, 26, 25, 24, 24, 24, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26,
+        25, 24, 24, 24, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26, 25, 24,
+        24, 24, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26, 24, 23, 23, 23,
+        29, 29, 30, 30, 30, 30, 28, 28, 27, 27, 27, 25, 23, 22, 22, 22, 28, 29,
+        29, 30, 30, 30, 28, 28, 27, 27, 27, 24, 22, 21, 21, 21, 28, 29, 29, 30,
+        30, 30, 28, 28, 27, 27, 27, 24, 22, 21, 21, 21, 28, 29, 29, 30, 30, 30,
+        28, 28, 27, 27, 27, 24, 22, 21, 21, 21, 28, 28, 28, 28, 28, 28, 28, 27,
+        26, 26, 26, 24, 22, 21, 21, 21,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 28, 28, 28, 28, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 30, 30, 30, 30, 29, 29, 29, 29, 28, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30,
+        30, 30, 29, 29, 29, 28, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30,
+        30, 28, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 30, 28, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 30, 28, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 29, 29,
+        29, 29, 29, 28, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 30, 30, 30, 30, 30, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28,
+        28, 28, 28, 27, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30,
+        30, 30, 30, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 29,
+        29, 29, 29, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 29, 29, 29, 29, 28,
+        28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 30, 30, 29, 29, 29, 29, 28, 28, 28, 28, 28, 27, 26, 26, 26,
+        26, 25, 24, 24, 24, 24, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 29,
+        29, 28, 28, 28, 28, 28, 28, 28, 28, 26, 26, 25, 25, 25, 24, 23, 22, 22,
+        22, 22, 28, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 29, 28, 28, 28, 28,
+        28, 27, 27, 27, 27, 26, 25, 24, 24, 24, 23, 22, 21, 21, 21, 21, 28, 29,
+        29, 29, 29, 29, 29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 28, 27, 27, 27,
+        27, 26, 25, 24, 24, 24, 23, 22, 21, 21, 21, 21, 28, 29, 29, 29, 29, 29,
+        29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 25, 24,
+        24, 24, 23, 22, 21, 21, 21, 21,
+        /* Size 4x16 */
+        33, 33, 32, 28, 33, 32, 32, 29, 32, 32, 32, 29, 32, 32, 32, 29, 32, 32,
+        31, 30, 32, 32, 31, 30, 32, 32, 30, 28, 32, 32, 30, 28, 32, 32, 30, 28,
+        32, 31, 29, 27, 32, 31, 29, 27, 31, 31, 28, 25, 30, 30, 28, 24, 30, 30,
+        28, 23, 29, 30, 27, 21, 29, 30, 27, 21,
+        /* Size 16x4 */
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 32, 32, 32, 32,
+        31, 31, 30, 30, 30, 29, 29, 28, 28, 28, 27, 27, 28, 29, 29, 29, 30, 30,
+        28, 28, 28, 27, 27, 25, 24, 23, 21, 21,
+        /* Size 8x32 */
+        32, 33, 33, 33, 32, 32, 29, 28, 33, 33, 33, 32, 32, 32, 29, 29, 33, 32,
+        32, 32, 32, 32, 29, 29, 33, 32, 32, 32, 32, 32, 29, 29, 33, 32, 32, 32,
+        32, 32, 29, 29, 33, 32, 32, 32, 32, 32, 29, 29, 33, 32, 32, 32, 32, 32,
+        30, 29, 33, 32, 32, 32, 32, 32, 30, 29, 33, 32, 32, 32, 31, 31, 30, 30,
+        33, 32, 32, 32, 31, 31, 30, 30, 33, 32, 32, 32, 31, 31, 30, 30, 33, 32,
+        32, 31, 31, 31, 29, 29, 33, 32, 32, 31, 30, 30, 29, 28, 32, 32, 32, 31,
+        30, 30, 28, 28, 32, 32, 32, 31, 30, 30, 28, 28, 32, 32, 32, 31, 30, 30,
+        28, 28, 32, 32, 32, 31, 30, 30, 28, 28, 32, 32, 31, 31, 29, 29, 28, 27,
+        32, 32, 31, 30, 29, 29, 28, 27, 32, 32, 31, 30, 29, 29, 28, 27, 32, 32,
+        31, 30, 29, 29, 28, 27, 32, 31, 31, 30, 28, 28, 26, 26, 31, 31, 31, 29,
+        28, 28, 26, 25, 30, 30, 30, 29, 28, 28, 25, 24, 30, 30, 30, 29, 28, 28,
+        25, 24, 30, 30, 30, 29, 28, 28, 25, 24, 30, 30, 30, 29, 28, 28, 24, 23,
+        29, 30, 30, 28, 27, 27, 23, 22, 28, 29, 30, 28, 27, 27, 22, 21, 28, 29,
+        30, 28, 27, 27, 22, 21, 28, 29, 30, 28, 27, 27, 22, 21, 28, 28, 28, 28,
+        26, 26, 22, 21,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 28, 28, 28, 28, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 30, 30, 30, 30, 30, 29, 29, 29, 28, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 30, 30,
+        30, 30, 30, 30, 30, 28, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 29, 29, 29, 29, 29, 28, 28, 28,
+        28, 28, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30,
+        30, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 29, 29, 29,
+        29, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 29, 29, 29, 29, 29, 29,
+        30, 30, 30, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 28, 26, 26, 25,
+        25, 25, 24, 23, 22, 22, 22, 22, 28, 29, 29, 29, 29, 29, 29, 29, 30, 30,
+        30, 29, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 25, 24, 24, 24, 23, 22,
+        21, 21, 21, 21 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 32, 27, 22, 32, 30, 25, 22, 27, 25, 22, 22, 22, 22, 22, 20,
+        /* Size 8x8 */
+        33, 33, 34, 30, 28, 26, 24, 21, 33, 33, 33, 30, 28, 26, 24, 22, 34, 33,
+        32, 29, 26, 25, 24, 22, 30, 30, 29, 26, 24, 23, 23, 22, 28, 28, 26, 24,
+        22, 22, 22, 22, 26, 26, 25, 23, 22, 22, 21, 21, 24, 24, 24, 23, 22, 21,
+        21, 20, 21, 22, 22, 22, 22, 21, 20, 19,
+        /* Size 16x16 */
+        32, 33, 33, 33, 34, 34, 31, 31, 30, 28, 28, 26, 25, 23, 21, 21, 33, 33,
+        33, 33, 33, 33, 31, 30, 28, 27, 27, 25, 24, 23, 21, 21, 33, 33, 33, 33,
+        33, 33, 30, 30, 28, 27, 27, 25, 24, 23, 22, 22, 33, 33, 33, 33, 33, 33,
+        30, 29, 28, 26, 26, 25, 24, 23, 22, 22, 34, 33, 33, 33, 32, 32, 30, 29,
+        28, 26, 26, 24, 24, 23, 22, 22, 34, 33, 33, 33, 32, 32, 30, 29, 28, 26,
+        26, 24, 24, 23, 22, 22, 31, 31, 30, 30, 30, 30, 28, 27, 26, 24, 24, 23,
+        23, 23, 22, 22, 31, 30, 30, 29, 29, 29, 27, 26, 26, 24, 24, 23, 23, 22,
+        22, 22, 30, 28, 28, 28, 28, 28, 26, 26, 24, 23, 23, 23, 22, 22, 22, 22,
+        28, 27, 27, 26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 28, 27,
+        27, 26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 26, 25, 25, 25,
+        24, 24, 23, 23, 23, 22, 22, 21, 21, 21, 20, 20, 25, 24, 24, 24, 24, 24,
+        23, 23, 22, 22, 22, 21, 21, 21, 20, 20, 23, 23, 23, 23, 23, 23, 23, 22,
+        22, 22, 22, 21, 21, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 21,
+        21, 20, 20, 20, 19, 19, 21, 21, 22, 22, 22, 22, 22, 22, 22, 21, 21, 20,
+        20, 20, 19, 19,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 33, 31, 31, 31, 31, 30, 28,
+        28, 28, 28, 27, 26, 25, 25, 25, 23, 22, 21, 21, 21, 21, 33, 33, 33, 33,
+        33, 33, 33, 33, 34, 34, 34, 32, 31, 30, 30, 30, 29, 28, 28, 28, 28, 26,
+        25, 24, 24, 24, 23, 22, 21, 21, 21, 21, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32, 31, 30, 30, 30, 28, 28, 27, 27, 27, 26, 25, 24, 24, 24,
+        23, 22, 21, 21, 21, 22, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
+        30, 30, 30, 30, 28, 28, 27, 27, 27, 26, 25, 24, 24, 24, 23, 22, 22, 22,
+        22, 22, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 30, 30, 30, 30,
+        28, 28, 27, 27, 27, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 22, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 30, 30, 30, 30, 28, 28, 27, 27,
+        27, 26, 25, 24, 24, 24, 23, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 31, 30, 29, 29, 29, 28, 27, 26, 26, 26, 26, 25, 24,
+        24, 24, 23, 22, 22, 22, 22, 22, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 31, 30, 29, 29, 29, 28, 27, 26, 26, 26, 25, 24, 24, 24, 24, 23, 22,
+        22, 22, 22, 22, 34, 34, 33, 33, 33, 33, 33, 33, 32, 32, 32, 31, 30, 29,
+        29, 29, 28, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 23, 22, 22, 22, 22,
+        34, 34, 33, 33, 33, 33, 33, 33, 32, 32, 32, 31, 30, 29, 29, 29, 28, 26,
+        26, 26, 26, 25, 24, 24, 24, 24, 23, 23, 22, 22, 22, 22, 34, 34, 33, 33,
+        33, 33, 33, 33, 32, 32, 32, 31, 30, 29, 29, 29, 28, 26, 26, 26, 26, 25,
+        24, 24, 24, 24, 23, 23, 22, 22, 22, 22, 33, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 30, 28, 28, 28, 28, 27, 26, 25, 25, 25, 24, 24, 24, 24, 24,
+        23, 22, 22, 22, 22, 22, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 30, 28,
+        28, 27, 27, 27, 26, 25, 24, 24, 24, 24, 23, 23, 23, 23, 23, 22, 22, 22,
+        22, 22, 31, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 28, 27, 26, 26, 26,
+        26, 24, 24, 24, 24, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 31, 30,
+        30, 30, 30, 30, 29, 29, 29, 29, 29, 28, 27, 26, 26, 26, 26, 24, 24, 24,
+        24, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 31, 30, 30, 30, 30, 30,
+        29, 29, 29, 29, 29, 28, 27, 26, 26, 26, 26, 24, 24, 24, 24, 23, 23, 23,
+        23, 23, 22, 22, 22, 22, 22, 22, 30, 29, 28, 28, 28, 28, 28, 28, 28, 28,
+        28, 27, 26, 26, 26, 26, 24, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 28, 28, 28, 28, 28, 28, 27, 27, 26, 26, 26, 26, 25, 24,
+        24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        28, 28, 27, 27, 27, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 22, 28, 28, 27, 27,
+        27, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 21, 21, 21, 22, 28, 28, 27, 27, 27, 27, 26, 26,
+        26, 26, 26, 25, 24, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 21, 21, 21, 22, 27, 26, 26, 26, 26, 26, 26, 25, 25, 25, 25, 24,
+        24, 23, 23, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21, 21, 21, 21, 21, 21,
+        21, 21, 26, 25, 25, 25, 25, 25, 25, 24, 24, 24, 24, 24, 23, 23, 23, 23,
+        23, 22, 22, 22, 22, 21, 21, 21, 21, 21, 21, 21, 20, 20, 20, 21, 25, 24,
+        24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 23, 23, 23, 23, 22, 22, 22, 22,
+        22, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 25, 24, 24, 24, 24, 24,
+        24, 24, 24, 24, 24, 24, 23, 23, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21,
+        21, 21, 21, 20, 20, 20, 20, 20, 25, 24, 24, 24, 24, 24, 24, 24, 24, 24,
+        24, 24, 23, 23, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21, 21, 21, 21, 20,
+        20, 20, 20, 20, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 22,
+        22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20,
+        22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 21, 21, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21,
+        20, 20, 20, 20, 20, 20, 19, 19, 19, 19, 21, 21, 21, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20,
+        20, 20, 19, 19, 19, 19, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20, 20, 20, 19, 19,
+        19, 19, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 19,
+        /* Size 4x8 */
+        33, 33, 28, 21, 33, 33, 27, 22, 33, 32, 26, 22, 30, 28, 24, 22, 28, 26,
+        22, 22, 26, 25, 22, 21, 24, 24, 22, 20, 21, 22, 21, 19,
+        /* Size 8x4 */
+        33, 33, 33, 30, 28, 26, 24, 21, 33, 33, 32, 28, 26, 25, 24, 22, 28, 27,
+        26, 24, 22, 22, 22, 21, 21, 22, 22, 22, 22, 21, 20, 19,
+        /* Size 8x16 */
+        32, 33, 33, 31, 28, 28, 23, 21, 33, 33, 33, 30, 27, 27, 23, 22, 33, 33,
+        33, 30, 27, 27, 23, 22, 33, 33, 32, 30, 26, 26, 23, 22, 34, 32, 32, 29,
+        26, 26, 23, 22, 34, 32, 32, 29, 26, 26, 23, 22, 31, 30, 29, 28, 24, 24,
+        22, 22, 31, 29, 28, 27, 24, 24, 22, 22, 29, 28, 28, 26, 23, 23, 22, 22,
+        28, 26, 26, 24, 22, 22, 22, 22, 28, 26, 26, 24, 22, 22, 22, 22, 25, 24,
+        24, 23, 22, 22, 21, 21, 24, 24, 24, 23, 22, 22, 21, 20, 23, 23, 23, 23,
+        22, 22, 20, 20, 21, 22, 22, 22, 21, 21, 20, 19, 21, 22, 22, 22, 21, 21,
+        20, 19,
+        /* Size 16x8 */
+        32, 33, 33, 33, 34, 34, 31, 31, 29, 28, 28, 25, 24, 23, 21, 21, 33, 33,
+        33, 33, 32, 32, 30, 29, 28, 26, 26, 24, 24, 23, 22, 22, 33, 33, 33, 32,
+        32, 32, 29, 28, 28, 26, 26, 24, 24, 23, 22, 22, 31, 30, 30, 30, 29, 29,
+        28, 27, 26, 24, 24, 23, 23, 23, 22, 22, 28, 27, 27, 26, 26, 26, 24, 24,
+        23, 22, 22, 22, 22, 22, 21, 21, 28, 27, 27, 26, 26, 26, 24, 24, 23, 22,
+        22, 22, 22, 22, 21, 21, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 21,
+        21, 20, 20, 20, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20,
+        19, 19,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 33, 31, 29, 28, 28, 28, 26, 23, 21, 21, 21, 33, 33,
+        33, 33, 33, 33, 31, 28, 28, 28, 28, 25, 23, 21, 21, 21, 33, 33, 33, 33,
+        33, 33, 30, 28, 27, 27, 27, 25, 23, 22, 22, 22, 33, 33, 33, 33, 33, 33,
+        30, 28, 27, 27, 27, 25, 23, 22, 22, 22, 33, 33, 33, 33, 33, 33, 30, 28,
+        27, 27, 27, 25, 23, 22, 22, 22, 33, 33, 33, 33, 33, 33, 30, 28, 27, 27,
+        27, 25, 23, 22, 22, 22, 33, 33, 33, 32, 32, 32, 30, 28, 26, 26, 26, 25,
+        23, 22, 22, 22, 34, 33, 33, 32, 32, 32, 30, 27, 26, 26, 26, 24, 23, 22,
+        22, 22, 34, 33, 32, 32, 32, 32, 29, 27, 26, 26, 26, 24, 23, 22, 22, 22,
+        34, 33, 32, 32, 32, 32, 29, 27, 26, 26, 26, 24, 23, 22, 22, 22, 34, 33,
+        32, 32, 32, 32, 29, 27, 26, 26, 26, 24, 23, 22, 22, 22, 33, 32, 31, 31,
+        31, 31, 28, 26, 25, 25, 25, 24, 23, 22, 22, 22, 31, 30, 30, 29, 29, 29,
+        28, 26, 24, 24, 24, 23, 22, 22, 22, 22, 31, 30, 29, 28, 28, 28, 27, 25,
+        24, 24, 24, 23, 22, 22, 22, 22, 31, 30, 29, 28, 28, 28, 27, 25, 24, 24,
+        24, 23, 22, 22, 22, 22, 31, 30, 29, 28, 28, 28, 27, 25, 24, 24, 24, 23,
+        22, 22, 22, 22, 29, 28, 28, 28, 28, 28, 26, 24, 23, 23, 23, 23, 22, 22,
+        22, 22, 28, 28, 27, 26, 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22,
+        28, 27, 26, 26, 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 28, 27,
+        26, 26, 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 28, 27, 26, 26,
+        26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 26, 26, 26, 25, 25, 25,
+        24, 22, 22, 22, 22, 21, 21, 21, 21, 21, 25, 25, 24, 24, 24, 24, 23, 22,
+        22, 22, 22, 21, 21, 21, 21, 21, 24, 24, 24, 24, 24, 24, 23, 22, 22, 22,
+        22, 21, 21, 20, 20, 20, 24, 24, 24, 24, 24, 24, 23, 22, 22, 22, 22, 21,
+        21, 20, 20, 20, 24, 24, 24, 24, 24, 24, 23, 22, 22, 22, 22, 21, 21, 20,
+        20, 20, 23, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 21, 20, 20, 20, 20,
+        22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20, 21, 21,
+        22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 19, 19, 19, 21, 21, 22, 22,
+        22, 22, 22, 21, 21, 21, 21, 20, 20, 19, 19, 19, 21, 21, 22, 22, 22, 22,
+        22, 21, 21, 21, 21, 20, 20, 19, 19, 19, 21, 21, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 21, 20, 19, 19, 19,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 33, 31, 31, 31, 31, 29, 28,
+        28, 28, 28, 26, 25, 24, 24, 24, 23, 22, 21, 21, 21, 21, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 30, 30, 30, 30, 28, 28, 27, 27, 27, 26,
+        25, 24, 24, 24, 23, 22, 21, 21, 21, 21, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 31, 30, 29, 29, 29, 28, 27, 26, 26, 26, 26, 24, 24, 24, 24,
+        23, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 31,
+        29, 28, 28, 28, 28, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 22, 22, 22,
+        22, 22, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 31, 29, 28, 28, 28,
+        28, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 22, 22, 22, 22, 22, 33, 33,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 31, 29, 28, 28, 28, 28, 26, 26, 26,
+        26, 25, 24, 24, 24, 24, 23, 22, 22, 22, 22, 22, 31, 31, 30, 30, 30, 30,
+        30, 30, 29, 29, 29, 28, 28, 27, 27, 27, 26, 24, 24, 24, 24, 24, 23, 23,
+        23, 23, 23, 22, 22, 22, 22, 22, 29, 28, 28, 28, 28, 28, 28, 27, 27, 27,
+        27, 26, 26, 25, 25, 25, 24, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22,
+        21, 21, 21, 22, 28, 28, 27, 27, 27, 27, 26, 26, 26, 26, 26, 25, 24, 24,
+        24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 22,
+        28, 28, 27, 27, 27, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 22, 28, 28, 27, 27,
+        27, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 21, 21, 21, 21, 22, 26, 25, 25, 25, 25, 25, 25, 24,
+        24, 24, 24, 24, 23, 23, 23, 23, 23, 22, 22, 22, 22, 21, 21, 21, 21, 21,
+        21, 21, 20, 20, 20, 21, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 21, 20, 20, 20, 20,
+        20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 19, 21, 21,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 20,
+        20, 20, 20, 20, 19, 19, 19, 19,
+        /* Size 4x16 */
+        33, 33, 28, 21, 33, 33, 27, 22, 33, 33, 27, 22, 33, 32, 26, 22, 33, 32,
+        26, 22, 33, 32, 26, 22, 30, 29, 24, 22, 30, 28, 24, 22, 28, 28, 23, 22,
+        27, 26, 22, 22, 27, 26, 22, 22, 25, 24, 22, 21, 24, 24, 22, 20, 23, 23,
+        22, 20, 21, 22, 21, 19, 21, 22, 21, 19,
+        /* Size 16x4 */
+        33, 33, 33, 33, 33, 33, 30, 30, 28, 27, 27, 25, 24, 23, 21, 21, 33, 33,
+        33, 32, 32, 32, 29, 28, 28, 26, 26, 24, 24, 23, 22, 22, 28, 27, 27, 26,
+        26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 21, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 21, 20, 20, 19, 19,
+        /* Size 8x32 */
+        32, 33, 33, 31, 28, 28, 23, 21, 33, 33, 33, 31, 28, 28, 23, 21, 33, 33,
+        33, 30, 27, 27, 23, 22, 33, 33, 33, 30, 27, 27, 23, 22, 33, 33, 33, 30,
+        27, 27, 23, 22, 33, 33, 33, 30, 27, 27, 23, 22, 33, 33, 32, 30, 26, 26,
+        23, 22, 34, 33, 32, 30, 26, 26, 23, 22, 34, 32, 32, 29, 26, 26, 23, 22,
+        34, 32, 32, 29, 26, 26, 23, 22, 34, 32, 32, 29, 26, 26, 23, 22, 33, 31,
+        31, 28, 25, 25, 23, 22, 31, 30, 29, 28, 24, 24, 22, 22, 31, 29, 28, 27,
+        24, 24, 22, 22, 31, 29, 28, 27, 24, 24, 22, 22, 31, 29, 28, 27, 24, 24,
+        22, 22, 29, 28, 28, 26, 23, 23, 22, 22, 28, 27, 26, 24, 22, 22, 22, 22,
+        28, 26, 26, 24, 22, 22, 22, 22, 28, 26, 26, 24, 22, 22, 22, 22, 28, 26,
+        26, 24, 22, 22, 22, 22, 26, 26, 25, 24, 22, 22, 21, 21, 25, 24, 24, 23,
+        22, 22, 21, 21, 24, 24, 24, 23, 22, 22, 21, 20, 24, 24, 24, 23, 22, 22,
+        21, 20, 24, 24, 24, 23, 22, 22, 21, 20, 23, 23, 23, 23, 22, 22, 20, 20,
+        22, 22, 22, 22, 21, 21, 20, 20, 21, 22, 22, 22, 21, 21, 20, 19, 21, 22,
+        22, 22, 21, 21, 20, 19, 21, 22, 22, 22, 21, 21, 20, 19, 21, 22, 22, 22,
+        22, 22, 20, 19,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 33, 31, 31, 31, 31, 29, 28,
+        28, 28, 28, 26, 25, 24, 24, 24, 23, 22, 21, 21, 21, 21, 33, 33, 33, 33,
+        33, 33, 33, 33, 32, 32, 32, 31, 30, 29, 29, 29, 28, 27, 26, 26, 26, 26,
+        24, 24, 24, 24, 23, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 31, 29, 28, 28, 28, 28, 26, 26, 26, 26, 25, 24, 24, 24, 24,
+        23, 22, 22, 22, 22, 22, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 29, 28,
+        28, 27, 27, 27, 26, 24, 24, 24, 24, 24, 23, 23, 23, 23, 23, 22, 22, 22,
+        22, 22, 28, 28, 27, 27, 27, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24,
+        23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 22, 28, 28,
+        27, 27, 27, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 22, 23, 23, 23, 23, 23, 23,
+        23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21,
+        21, 21, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 20,
+        19, 19, 19, 19 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        33, 32, 32, 32, 32, 32, 32, 31, 32, 32, 31, 30, 32, 31, 30, 29,
+        /* Size 8x8 */
+        33, 33, 33, 33, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32,
+        32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32,
+        31, 31, 30, 29, 32, 32, 32, 32, 31, 30, 30, 29, 32, 32, 32, 32, 30, 30,
+        29, 28, 31, 31, 31, 31, 29, 29, 28, 27,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 30, 33, 33,
+        33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        31, 30, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 29,
+        33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 29, 33, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 29, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 31, 30, 29, 29, 29, 28, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 30, 30, 29, 29, 29, 28, 28, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 30, 30, 29, 29, 29, 28, 28, 32, 31, 31, 31, 31, 31, 31, 31, 30, 30,
+        30, 29, 28, 28, 28, 27, 30, 30, 30, 30, 30, 31, 31, 30, 29, 29, 29, 28,
+        28, 28, 27, 26,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 30, 30, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        30, 30, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 30, 30, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 30, 30, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 31, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        31, 31, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 30, 30, 30, 30, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30,
+        30, 30, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 29,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 29, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31,
+        31, 30, 30, 30, 30, 30, 30, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30,
+        30, 30, 30, 29, 29, 29, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 29, 29,
+        29, 29, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30,
+        30, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 28, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 29,
+        29, 29, 29, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29,
+        28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 28, 28,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 30,
+        30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 28, 28, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29,
+        29, 29, 28, 28, 28, 28, 28, 28, 27, 27, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 28,
+        28, 28, 28, 27, 27, 27, 30, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31,
+        31, 31, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28, 27, 27,
+        26, 26, 30, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 30, 30,
+        29, 29, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28, 27, 27, 26, 26,
+        /* Size 4x8 */
+        33, 33, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32,
+        31, 30, 32, 32, 30, 30, 32, 31, 30, 29, 31, 31, 29, 28,
+        /* Size 8x4 */
+        33, 33, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+        32, 32, 31, 30, 30, 29, 32, 32, 32, 31, 30, 30, 29, 28,
+        /* Size 8x16 */
+        32, 33, 33, 33, 33, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 31, 33, 32,
+        32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32,
+        32, 32, 32, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 32, 31,
+        31, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 31, 30, 30, 30,
+        32, 32, 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 30, 32, 32,
+        32, 32, 31, 29, 29, 29, 32, 32, 31, 31, 30, 29, 29, 28, 32, 32, 31, 31,
+        30, 29, 29, 28, 32, 31, 31, 31, 30, 28, 28, 28, 30, 30, 30, 30, 29, 28,
+        28, 27,
+        /* Size 16x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 33, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 30, 30, 30, 29, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30,
+        30, 29, 29, 29, 28, 28, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 29,
+        29, 29, 28, 28, 32, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 29, 28, 28,
+        28, 27,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 31, 31, 30, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 30, 30, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 30,
+        30, 29, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 30, 30, 30, 30, 30, 29, 29, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 30, 29, 29, 29, 29, 29, 28, 32, 32, 32, 32, 31, 31, 31, 31, 31, 30,
+        29, 29, 29, 29, 28, 28, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29,
+        29, 29, 28, 28, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29,
+        28, 28, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 28, 28,
+        32, 32, 32, 31, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 28, 28, 32, 31,
+        31, 31, 31, 31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 27, 31, 31, 31, 31,
+        31, 31, 31, 30, 30, 29, 28, 28, 28, 28, 28, 27, 30, 30, 30, 30, 30, 30,
+        30, 30, 29, 28, 28, 28, 28, 28, 27, 26, 30, 30, 30, 30, 30, 30, 30, 30,
+        29, 28, 28, 28, 28, 28, 27, 26,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 30, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        30, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 30, 30, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 30, 30, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 31, 31, 31, 30, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 30, 30, 30, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 29, 29, 28, 28, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30,
+        29, 29, 29, 29, 29, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29,
+        29, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28,
+        28, 28, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 28, 28, 32, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30,
+        30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 27, 30, 30, 30, 30, 30, 30,
+        30, 30, 30, 31, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28,
+        28, 28, 28, 28, 27, 27, 26, 26,
+        /* Size 4x16 */
+        33, 33, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+        32, 32, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 31, 31, 32, 32, 31, 30,
+        32, 32, 31, 30, 32, 32, 31, 30, 32, 32, 30, 29, 32, 31, 30, 29, 32, 31,
+        30, 29, 31, 31, 29, 28, 30, 30, 28, 28,
+        /* Size 16x4 */
+        33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 29, 28, 32, 32, 32, 32, 32, 31,
+        31, 31, 30, 30, 30, 29, 29, 29, 28, 28,
+        /* Size 8x32 */
+        32, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 31, 33, 33,
+        32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32,
+        32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32,
+        32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31,
+        33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32,
+        32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32,
+        32, 31, 31, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 31, 31,
+        31, 30, 33, 32, 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 30,
+        32, 32, 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 30, 32, 32,
+        32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 29, 32, 32, 32, 32,
+        31, 29, 29, 29, 32, 32, 31, 31, 31, 29, 29, 28, 32, 32, 31, 31, 30, 29,
+        29, 28, 32, 32, 31, 31, 30, 29, 29, 28, 32, 32, 31, 31, 30, 29, 29, 28,
+        32, 32, 31, 31, 30, 29, 29, 28, 32, 31, 31, 31, 30, 28, 28, 28, 31, 31,
+        31, 31, 30, 28, 28, 28, 30, 30, 30, 30, 29, 28, 28, 27, 30, 30, 30, 30,
+        29, 28, 28, 27,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        31, 31, 31, 31, 30, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31,
+        30, 30, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30,
+        30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29,
+        29, 29, 29, 29, 28, 28, 28, 28, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 28, 28, 28, 28, 28,
+        28, 28, 27, 27 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 33, 30, 27, 33, 32, 29, 26, 30, 29, 26, 24, 27, 26, 24, 22,
+        /* Size 8x8 */
+        33, 33, 33, 34, 30, 29, 28, 26, 33, 33, 33, 33, 30, 29, 27, 25, 33, 33,
+        33, 33, 29, 28, 26, 25, 34, 33, 33, 32, 29, 28, 26, 24, 30, 30, 29, 29,
+        26, 26, 24, 23, 29, 29, 28, 28, 26, 25, 23, 23, 28, 27, 26, 26, 24, 23,
+        22, 22, 26, 25, 25, 24, 23, 23, 22, 21,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 34, 34, 33, 31, 31, 31, 29, 28, 28, 27, 25, 33, 33,
+        33, 33, 33, 33, 33, 33, 31, 30, 30, 28, 28, 28, 26, 24, 33, 33, 33, 33,
+        33, 33, 33, 32, 30, 30, 30, 28, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33,
+        33, 32, 30, 30, 30, 28, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32,
+        30, 29, 29, 28, 26, 26, 26, 24, 34, 33, 33, 33, 33, 32, 32, 32, 30, 29,
+        29, 27, 26, 26, 25, 24, 34, 33, 33, 33, 33, 32, 32, 32, 30, 29, 29, 27,
+        26, 26, 25, 24, 33, 33, 32, 32, 32, 32, 32, 31, 29, 28, 28, 27, 26, 26,
+        25, 24, 31, 31, 30, 30, 30, 30, 30, 29, 28, 27, 27, 25, 24, 24, 24, 23,
+        31, 30, 30, 30, 29, 29, 29, 28, 27, 26, 26, 25, 24, 24, 23, 23, 31, 30,
+        30, 30, 29, 29, 29, 28, 27, 26, 26, 25, 24, 24, 23, 23, 29, 28, 28, 28,
+        28, 27, 27, 27, 25, 25, 25, 23, 22, 22, 22, 22, 28, 28, 27, 27, 26, 26,
+        26, 26, 24, 24, 24, 22, 22, 22, 22, 22, 28, 28, 27, 27, 26, 26, 26, 26,
+        24, 24, 24, 22, 22, 22, 22, 22, 27, 26, 26, 26, 26, 25, 25, 25, 24, 23,
+        23, 22, 22, 22, 22, 21, 25, 24, 24, 24, 24, 24, 24, 24, 23, 23, 23, 22,
+        22, 22, 21, 21,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 32, 31, 31,
+        31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 27, 26, 25, 25, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 33, 32, 31, 30, 30, 30, 30, 29,
+        28, 28, 28, 28, 28, 28, 26, 26, 24, 24, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 30, 30, 30, 29, 28, 28, 28, 28,
+        28, 27, 26, 26, 24, 24, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 30, 30, 30, 30, 30, 29, 28, 27, 27, 27, 27, 27, 26, 25,
+        24, 24, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31,
+        30, 30, 30, 30, 30, 29, 28, 27, 27, 27, 27, 26, 26, 25, 24, 24, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 30, 30, 30,
+        30, 29, 28, 27, 27, 27, 27, 26, 26, 25, 24, 24, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 30, 30, 30, 30, 29, 28, 27,
+        27, 27, 27, 26, 26, 25, 24, 24, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 32, 31, 30, 30, 30, 30, 30, 28, 28, 27, 27, 27, 27, 26,
+        26, 25, 24, 24, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 31, 30, 29, 29, 29, 29, 28, 28, 27, 26, 26, 26, 26, 26, 25, 24, 24,
+        34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 29,
+        29, 29, 29, 28, 28, 26, 26, 26, 26, 26, 26, 25, 24, 24, 34, 34, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 31, 30, 29, 29, 29, 29, 28,
+        27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 34, 34, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 31, 30, 29, 29, 29, 29, 28, 27, 26, 26, 26,
+        26, 26, 25, 24, 24, 24, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 31, 30, 29, 29, 29, 29, 28, 27, 26, 26, 26, 26, 26, 25, 24,
+        24, 24, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 31,
+        30, 29, 29, 29, 29, 28, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 29, 28, 28, 28,
+        28, 28, 27, 26, 26, 26, 26, 25, 25, 24, 24, 24, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 27, 26, 25,
+        25, 25, 25, 24, 24, 24, 24, 24, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30,
+        30, 30, 30, 30, 29, 28, 28, 27, 27, 27, 27, 26, 25, 24, 24, 24, 24, 24,
+        24, 23, 23, 23, 31, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29,
+        28, 28, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 24, 23, 23, 23, 23,
+        31, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 27, 26,
+        26, 26, 26, 26, 25, 24, 24, 24, 24, 24, 23, 23, 23, 23, 31, 30, 30, 30,
+        30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28, 27, 26, 26, 26, 26, 26,
+        25, 24, 24, 24, 24, 24, 23, 23, 23, 23, 31, 30, 30, 30, 30, 30, 30, 30,
+        29, 29, 29, 29, 29, 29, 28, 28, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24,
+        24, 24, 23, 23, 23, 23, 30, 29, 29, 29, 29, 29, 29, 28, 28, 28, 28, 28,
+        28, 28, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 23, 23, 23, 23, 23, 23,
+        23, 23, 29, 28, 28, 28, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 27, 26,
+        25, 25, 25, 25, 25, 24, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 28, 28,
+        28, 27, 27, 27, 27, 27, 27, 26, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24,
+        24, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 28, 28, 28, 27, 27, 27,
+        27, 27, 26, 26, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 24, 23, 22, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 28, 28, 28, 27, 27, 27, 27, 27, 26, 26,
+        26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22,
+        22, 22, 22, 22, 28, 28, 28, 27, 27, 27, 27, 27, 26, 26, 26, 26, 26, 26,
+        26, 25, 24, 24, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+        28, 28, 27, 27, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 25, 24, 24, 24,
+        24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 27, 26, 26, 26,
+        26, 26, 26, 26, 26, 26, 25, 25, 25, 25, 25, 24, 24, 23, 23, 23, 23, 23,
+        22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 26, 26, 26, 25, 25, 25, 25, 25,
+        25, 25, 24, 24, 24, 24, 24, 24, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22,
+        22, 22, 21, 21, 21, 21, 25, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
+        24, 24, 24, 24, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 21, 21,
+        21, 21, 25, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
+        23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21,
+        /* Size 4x8 */
+        33, 33, 29, 28, 33, 33, 28, 27, 33, 32, 28, 26, 33, 32, 28, 26, 30, 28,
+        26, 24, 29, 28, 24, 23, 27, 26, 23, 22, 25, 24, 23, 22,
+        /* Size 8x4 */
+        33, 33, 33, 33, 30, 29, 27, 25, 33, 33, 32, 32, 28, 28, 26, 24, 29, 28,
+        28, 28, 26, 24, 23, 23, 28, 27, 26, 26, 24, 23, 22, 22,
+        /* Size 8x16 */
+        32, 33, 33, 33, 31, 28, 28, 27, 33, 33, 33, 33, 31, 27, 27, 26, 33, 33,
+        33, 33, 30, 27, 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 32, 32,
+        30, 26, 26, 26, 34, 33, 32, 32, 29, 26, 26, 25, 34, 33, 32, 32, 29, 26,
+        26, 25, 33, 32, 31, 31, 29, 26, 26, 25, 31, 30, 29, 29, 28, 24, 24, 24,
+        31, 29, 28, 28, 27, 24, 24, 23, 31, 29, 28, 28, 27, 24, 24, 23, 29, 28,
+        27, 27, 25, 23, 23, 22, 28, 26, 26, 26, 24, 22, 22, 22, 28, 26, 26, 26,
+        24, 22, 22, 22, 26, 26, 25, 25, 24, 22, 22, 22, 24, 24, 24, 24, 23, 22,
+        22, 21,
+        /* Size 16x8 */
+        32, 33, 33, 33, 33, 34, 34, 33, 31, 31, 31, 29, 28, 28, 26, 24, 33, 33,
+        33, 33, 33, 33, 33, 32, 30, 29, 29, 28, 26, 26, 26, 24, 33, 33, 33, 33,
+        32, 32, 32, 31, 29, 28, 28, 27, 26, 26, 25, 24, 33, 33, 33, 33, 32, 32,
+        32, 31, 29, 28, 28, 27, 26, 26, 25, 24, 31, 31, 30, 30, 30, 29, 29, 29,
+        28, 27, 27, 25, 24, 24, 24, 23, 28, 27, 27, 27, 26, 26, 26, 26, 24, 24,
+        24, 23, 22, 22, 22, 22, 28, 27, 27, 27, 26, 26, 26, 26, 24, 24, 24, 23,
+        22, 22, 22, 22, 27, 26, 26, 26, 26, 25, 25, 25, 24, 23, 23, 22, 22, 22,
+        22, 21,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 31, 29, 28, 28, 28, 28, 27, 24, 33, 33,
+        33, 33, 33, 33, 33, 33, 31, 29, 28, 28, 28, 28, 26, 24, 33, 33, 33, 33,
+        33, 33, 33, 32, 31, 29, 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33,
+        33, 32, 30, 28, 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32,
+        30, 28, 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32, 30, 28,
+        27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32, 30, 28, 27, 27,
+        27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32, 30, 28, 27, 27, 27, 27,
+        26, 24, 33, 33, 33, 33, 32, 32, 32, 32, 30, 28, 26, 26, 26, 26, 26, 24,
+        34, 33, 33, 32, 32, 32, 32, 32, 30, 28, 26, 26, 26, 26, 26, 24, 34, 33,
+        33, 32, 32, 32, 32, 31, 29, 28, 26, 26, 26, 26, 25, 24, 34, 33, 33, 32,
+        32, 32, 32, 31, 29, 28, 26, 26, 26, 26, 25, 24, 34, 33, 33, 32, 32, 32,
+        32, 31, 29, 28, 26, 26, 26, 26, 25, 24, 34, 33, 33, 32, 32, 32, 32, 31,
+        29, 28, 26, 26, 26, 26, 25, 24, 33, 33, 32, 32, 31, 31, 31, 31, 29, 27,
+        26, 26, 26, 26, 25, 24, 32, 32, 31, 31, 30, 30, 30, 30, 28, 26, 25, 25,
+        25, 25, 24, 23, 31, 31, 30, 29, 29, 29, 29, 29, 28, 26, 24, 24, 24, 24,
+        24, 23, 31, 30, 29, 29, 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23,
+        31, 30, 29, 29, 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23, 31, 30,
+        29, 29, 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23, 31, 30, 29, 29,
+        28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23, 30, 29, 28, 28, 28, 28,
+        28, 28, 26, 24, 23, 23, 23, 23, 23, 23, 29, 28, 28, 27, 27, 27, 27, 26,
+        25, 24, 23, 23, 23, 23, 22, 22, 28, 28, 27, 26, 26, 26, 26, 26, 24, 23,
+        22, 22, 22, 22, 22, 22, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22,
+        22, 22, 22, 22, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22, 22, 22,
+        22, 22, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22, 22, 22, 22, 22,
+        28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22, 22, 22, 22, 22, 26, 26,
+        26, 25, 25, 25, 25, 24, 24, 23, 22, 22, 22, 22, 22, 21, 26, 25, 25, 24,
+        24, 24, 24, 24, 23, 23, 22, 22, 22, 22, 22, 21, 24, 24, 24, 24, 24, 24,
+        24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 24, 24, 24, 24, 24, 24, 24, 24,
+        23, 22, 22, 22, 22, 22, 21, 21,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 32, 31, 31,
+        31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 26, 26, 24, 24, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 30, 30, 30, 29,
+        28, 28, 27, 27, 27, 27, 26, 25, 24, 24, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 31, 30, 29, 29, 29, 29, 28, 28, 27, 26, 26,
+        26, 26, 26, 25, 24, 24, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 31, 29, 29, 29, 29, 29, 28, 27, 26, 26, 26, 26, 26, 25, 24,
+        24, 24, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 31, 30,
+        29, 28, 28, 28, 28, 28, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 31, 30, 29, 28, 28, 28,
+        28, 28, 27, 26, 26, 26, 26, 26, 25, 24, 24, 24, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 31, 30, 29, 28, 28, 28, 28, 28, 27, 26,
+        26, 26, 26, 26, 25, 24, 24, 24, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 26, 26, 25, 25, 25, 25,
+        24, 24, 24, 24, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29,
+        29, 28, 28, 27, 27, 27, 27, 26, 25, 24, 24, 24, 24, 24, 24, 23, 23, 23,
+        29, 29, 29, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 27, 26, 26, 26,
+        26, 26, 26, 24, 24, 23, 23, 23, 23, 23, 23, 23, 22, 22, 28, 28, 27, 27,
+        27, 27, 27, 27, 26, 26, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 24, 23,
+        23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 28, 28, 27, 27, 27, 27, 27, 27,
+        26, 26, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 24, 23, 23, 22, 22, 22,
+        22, 22, 22, 22, 22, 22, 28, 28, 27, 27, 27, 27, 27, 27, 26, 26, 26, 26,
+        26, 26, 26, 25, 24, 24, 24, 24, 24, 23, 23, 22, 22, 22, 22, 22, 22, 22,
+        22, 22, 28, 28, 27, 27, 27, 27, 27, 27, 26, 26, 26, 26, 26, 26, 26, 25,
+        24, 24, 24, 24, 24, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 27, 26,
+        26, 26, 26, 26, 26, 26, 26, 26, 25, 25, 25, 25, 25, 24, 24, 23, 23, 23,
+        23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 24, 24, 24, 24, 24, 24,
+        24, 24, 24, 24, 24, 24, 24, 24, 24, 23, 23, 23, 23, 23, 23, 23, 22, 22,
+        22, 22, 22, 22, 21, 21, 21, 21,
+        /* Size 4x16 */
+        33, 33, 29, 28, 33, 33, 29, 27, 33, 33, 28, 27, 33, 33, 28, 27, 33, 32,
+        28, 26, 33, 32, 28, 26, 33, 32, 28, 26, 33, 31, 27, 26, 31, 29, 26, 24,
+        30, 28, 26, 24, 30, 28, 26, 24, 28, 27, 24, 23, 27, 26, 23, 22, 27, 26,
+        23, 22, 26, 25, 23, 22, 24, 24, 22, 22,
+        /* Size 16x4 */
+        33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 30, 28, 27, 27, 26, 24, 33, 33,
+        33, 33, 32, 32, 32, 31, 29, 28, 28, 27, 26, 26, 25, 24, 29, 29, 28, 28,
+        28, 28, 28, 27, 26, 26, 26, 24, 23, 23, 23, 22, 28, 27, 27, 27, 26, 26,
+        26, 26, 24, 24, 24, 23, 22, 22, 22, 22,
+        /* Size 8x32 */
+        32, 33, 33, 33, 31, 28, 28, 27, 33, 33, 33, 33, 31, 28, 28, 26, 33, 33,
+        33, 33, 31, 27, 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 33, 33,
+        30, 27, 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 33, 33, 30, 27,
+        27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 32, 32, 30, 26, 26, 26,
+        34, 33, 32, 32, 30, 26, 26, 26, 34, 33, 32, 32, 29, 26, 26, 25, 34, 33,
+        32, 32, 29, 26, 26, 25, 34, 33, 32, 32, 29, 26, 26, 25, 34, 33, 32, 32,
+        29, 26, 26, 25, 33, 32, 31, 31, 29, 26, 26, 25, 32, 31, 30, 30, 28, 25,
+        25, 24, 31, 30, 29, 29, 28, 24, 24, 24, 31, 29, 28, 28, 27, 24, 24, 23,
+        31, 29, 28, 28, 27, 24, 24, 23, 31, 29, 28, 28, 27, 24, 24, 23, 31, 29,
+        28, 28, 27, 24, 24, 23, 30, 28, 28, 28, 26, 23, 23, 23, 29, 28, 27, 27,
+        25, 23, 23, 22, 28, 27, 26, 26, 24, 22, 22, 22, 28, 26, 26, 26, 24, 22,
+        22, 22, 28, 26, 26, 26, 24, 22, 22, 22, 28, 26, 26, 26, 24, 22, 22, 22,
+        28, 26, 26, 26, 24, 22, 22, 22, 26, 26, 25, 25, 24, 22, 22, 22, 26, 25,
+        24, 24, 23, 22, 22, 22, 24, 24, 24, 24, 23, 22, 22, 21, 24, 24, 24, 24,
+        23, 22, 22, 21,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 32, 31, 31,
+        31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 26, 26, 24, 24, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 29, 29, 29, 29, 28,
+        28, 27, 26, 26, 26, 26, 26, 25, 24, 24, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 31, 30, 29, 28, 28, 28, 28, 28, 27, 26, 26, 26,
+        26, 26, 25, 24, 24, 24, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 31, 30, 29, 28, 28, 28, 28, 28, 27, 26, 26, 26, 26, 26, 25, 24,
+        24, 24, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 28,
+        28, 27, 27, 27, 27, 26, 25, 24, 24, 24, 24, 24, 24, 23, 23, 23, 28, 28,
+        27, 27, 27, 27, 27, 27, 26, 26, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24,
+        24, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 28, 28, 27, 27, 27, 27,
+        27, 27, 26, 26, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 24, 23, 23, 22,
+        22, 22, 22, 22, 22, 22, 22, 22, 27, 26, 26, 26, 26, 26, 26, 26, 26, 26,
+        25, 25, 25, 25, 25, 24, 24, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22,
+        22, 22, 21, 21 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        33, 33, 33, 32, 33, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 31,
+        /* Size 8x8 */
+        33, 33, 33, 33, 33, 33, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
+        32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32,
+        32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 31,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+        31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        /* Size 4x8 */
+        33, 33, 33, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+        32, 32, 33, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31,
+        /* Size 8x4 */
+        33, 33, 33, 33, 33, 33, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        /* Size 8x16 */
+        32, 33, 33, 33, 33, 33, 33, 32, 33, 33, 33, 33, 33, 33, 32, 32, 33, 33,
+        32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32,
+        32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
+        32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32,
+        33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
+        32, 32, 32, 32, 31, 31, 33, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+        32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32,
+        31, 30,
+        /* Size 16x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30,
+        30, 30,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 31, 31, 30, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+        30, 30, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 31, 30, 30,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31,
+        31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+        30, 30, 30, 30, 30, 30, 30, 30,
+        /* Size 4x16 */
+        33, 33, 33, 32, 33, 33, 33, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+        32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
+        33, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32,
+        32, 31, 32, 32, 32, 31, 32, 32, 32, 31,
+        /* Size 16x4 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 31, 31, 31,
+        /* Size 8x32 */
+        32, 33, 33, 33, 33, 33, 33, 32, 33, 33, 33, 33, 33, 33, 32, 32, 33, 33,
+        33, 33, 33, 33, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
+        32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32,
+        33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
+        32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32,
+        32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
+        32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32,
+        33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
+        32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32,
+        32, 32, 31, 31, 33, 32, 32, 32, 32, 32, 31, 31, 33, 32, 32, 32, 32, 32,
+        31, 31, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30,
+        32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32,
+        32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32,
+        32, 32, 31, 30,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30,
+        30, 30, 30, 30 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 33, 33, 30, 33, 33, 33, 29, 33, 33, 32, 29, 30, 29, 29, 26,
+        /* Size 8x8 */
+        33, 33, 33, 33, 34, 33, 31, 31, 33, 33, 33, 33, 33, 32, 30, 30, 33, 33,
+        33, 33, 33, 32, 30, 30, 33, 33, 33, 33, 33, 32, 29, 29, 34, 33, 33, 33,
+        32, 32, 29, 29, 33, 32, 32, 32, 32, 31, 28, 28, 31, 30, 30, 29, 29, 28,
+        26, 26, 31, 30, 30, 29, 29, 28, 26, 26,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 33, 31, 31, 31, 31, 33, 33,
+        33, 33, 33, 33, 33, 33, 34, 34, 34, 32, 31, 30, 30, 30, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 30, 30, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 32, 30, 30, 30, 30, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32, 30, 30, 30, 30, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 30, 30, 30, 30, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31,
+        30, 29, 29, 29, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29,
+        29, 29, 34, 34, 33, 33, 33, 33, 33, 33, 32, 32, 32, 31, 30, 29, 29, 29,
+        34, 34, 33, 33, 33, 33, 33, 33, 32, 32, 32, 31, 30, 29, 29, 29, 34, 34,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 31, 30, 29, 29, 29, 33, 32, 32, 32,
+        32, 32, 31, 31, 31, 31, 31, 30, 28, 28, 28, 28, 31, 31, 31, 30, 30, 30,
+        30, 30, 30, 30, 30, 28, 28, 27, 27, 27, 31, 30, 30, 30, 30, 30, 29, 29,
+        29, 29, 29, 28, 27, 26, 26, 26, 31, 30, 30, 30, 30, 30, 29, 29, 29, 29,
+        29, 28, 27, 26, 26, 26, 31, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 28,
+        27, 26, 26, 26,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34,
+        34, 34, 34, 33, 33, 32, 31, 31, 31, 31, 31, 31, 31, 30, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 33,
+        33, 32, 31, 31, 31, 31, 31, 31, 31, 30, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 33, 32, 32, 31, 30,
+        30, 30, 30, 30, 30, 30, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 31, 30, 30, 30, 30, 30,
+        30, 29, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 31, 31, 30, 30, 30, 30, 30, 30, 29, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 32, 31, 30, 30, 30, 30, 30, 30, 30, 29, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 31,
+        30, 30, 30, 30, 30, 30, 30, 29, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 31, 30, 30, 30, 30,
+        30, 30, 30, 29, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 31, 30, 30, 30, 30, 30, 30, 30, 29,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32, 32, 31, 30, 30, 30, 30, 30, 30, 30, 29, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
+        32, 31, 30, 30, 30, 30, 30, 30, 30, 29, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 31, 30, 29,
+        29, 29, 29, 29, 29, 29, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 31, 30, 29, 29, 29, 29, 29,
+        29, 29, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 32, 31, 31, 30, 29, 29, 29, 29, 29, 29, 28, 34, 34,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 31, 30, 30, 29, 29, 29, 29, 29, 29, 28, 34, 34, 34, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 30,
+        30, 29, 29, 29, 29, 29, 29, 28, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 29,
+        29, 29, 29, 28, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 29, 29, 29, 29, 28,
+        34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 31, 30, 30, 29, 29, 29, 29, 29, 29, 28, 34, 34, 34, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        31, 30, 30, 29, 29, 29, 29, 29, 29, 28, 34, 34, 34, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29,
+        29, 29, 29, 29, 29, 28, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 28, 28, 28, 28, 28,
+        28, 28, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+        31, 31, 31, 31, 31, 30, 30, 29, 28, 28, 28, 28, 28, 28, 28, 28, 32, 32,
+        32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30,
+        30, 30, 29, 28, 28, 28, 28, 28, 28, 28, 28, 27, 31, 31, 31, 31, 31, 30,
+        30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 29, 28, 28,
+        28, 27, 27, 27, 27, 27, 27, 26, 31, 31, 30, 30, 30, 30, 30, 30, 30, 30,
+        30, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 28, 28, 28, 27, 26, 26, 26,
+        26, 26, 26, 26, 31, 31, 30, 30, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29,
+        29, 29, 29, 29, 29, 29, 29, 28, 28, 28, 27, 26, 26, 26, 26, 26, 26, 26,
+        31, 31, 30, 30, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 29,
+        29, 29, 29, 28, 28, 28, 27, 26, 26, 26, 26, 26, 26, 26, 31, 31, 30, 30,
+        30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 28,
+        28, 28, 27, 26, 26, 26, 26, 26, 26, 26, 31, 31, 30, 30, 30, 30, 30, 30,
+        30, 30, 30, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 28, 28, 28, 27, 26,
+        26, 26, 26, 26, 26, 26, 31, 31, 30, 30, 30, 30, 30, 30, 30, 30, 30, 29,
+        29, 29, 29, 29, 29, 29, 29, 29, 29, 28, 28, 28, 27, 26, 26, 26, 26, 26,
+        26, 26, 30, 30, 30, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 28, 28, 28,
+        28, 28, 28, 28, 28, 28, 28, 27, 26, 26, 26, 26, 26, 26, 26, 26,
+        /* Size 4x8 */
+        33, 33, 33, 30, 33, 33, 33, 29, 33, 33, 33, 29, 33, 32, 32, 28, 33, 32,
+        32, 28, 33, 31, 31, 28, 30, 28, 28, 26, 30, 28, 28, 26,
+        /* Size 8x4 */
+        33, 33, 33, 33, 33, 33, 30, 30, 33, 33, 33, 32, 32, 31, 28, 28, 33, 33,
+        33, 32, 32, 31, 28, 28, 30, 29, 29, 28, 28, 28, 26, 26,
+        /* Size 8x16 */
+        32, 33, 33, 33, 33, 33, 31, 29, 33, 33, 33, 33, 33, 33, 31, 28, 33, 33,
+        33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33,
+        33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 32, 32, 32,
+        30, 28, 34, 33, 33, 32, 32, 32, 30, 27, 34, 33, 32, 32, 32, 32, 29, 27,
+        34, 33, 32, 32, 32, 32, 29, 27, 34, 33, 32, 32, 32, 32, 29, 27, 33, 32,
+        31, 31, 31, 31, 28, 26, 31, 30, 30, 29, 29, 29, 28, 26, 31, 30, 29, 28,
+        28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28,
+        27, 25,
+        /* Size 16x8 */
+        32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 33, 31, 31, 31, 31, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 30, 30, 30, 30, 33, 33, 33, 33,
+        33, 33, 33, 33, 32, 32, 32, 31, 30, 29, 29, 29, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 31, 29, 28, 28, 28, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 31, 29, 28, 28, 28, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 31, 29, 28, 28, 28, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 29, 28,
+        28, 27, 27, 27, 29, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 26, 25,
+        25, 25,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 29, 28, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 29, 28, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 28, 28, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 32, 31, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32, 30, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 31, 30, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31,
+        30, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29,
+        28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27, 33, 33, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 31, 30, 28, 28, 26, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 31, 30, 28, 28, 26, 34, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 31, 30, 28, 27, 26, 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31,
+        29, 28, 27, 26, 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28,
+        27, 26, 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26,
+        34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26, 34, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26, 34, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26, 33, 33, 33, 32, 32, 31,
+        31, 31, 31, 31, 31, 30, 29, 28, 27, 26, 33, 32, 32, 31, 31, 31, 31, 31,
+        31, 31, 31, 29, 28, 28, 26, 25, 32, 32, 31, 31, 30, 30, 30, 30, 30, 30,
+        30, 29, 28, 27, 26, 25, 31, 31, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28,
+        28, 26, 26, 24, 31, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26,
+        25, 24, 31, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24,
+        31, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 31, 30,
+        30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 31, 30, 30, 29,
+        29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 31, 30, 30, 29, 29, 28,
+        28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 30, 30, 29, 29, 28, 28, 28, 28,
+        28, 28, 28, 27, 26, 26, 24, 23,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34,
+        34, 34, 34, 33, 33, 32, 31, 31, 31, 31, 31, 31, 31, 30, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 32, 31, 30, 30, 30, 30, 30, 30, 30, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 30,
+        30, 30, 30, 30, 30, 29, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 31, 30, 29, 29, 29, 29, 29,
+        29, 29, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 29, 29, 29, 29, 28, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 30, 29, 28, 28, 28, 28, 28, 28, 28, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30,
+        29, 28, 28, 28, 28, 28, 28, 28, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 29, 28, 28, 28,
+        28, 28, 28, 28, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 29, 28, 28, 28, 28, 28, 28, 28,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 31, 31, 30, 29, 28, 28, 28, 28, 28, 28, 28, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+        31, 30, 29, 28, 28, 28, 28, 28, 28, 28, 32, 32, 32, 32, 32, 31, 31, 31,
+        31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 29, 29, 28, 28,
+        28, 28, 28, 28, 28, 27, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 30,
+        30, 30, 30, 29, 29, 29, 29, 29, 29, 29, 28, 28, 28, 27, 27, 27, 27, 27,
+        27, 26, 30, 30, 30, 29, 29, 29, 29, 29, 29, 29, 29, 29, 28, 28, 28, 28,
+        28, 28, 28, 28, 28, 28, 28, 27, 26, 26, 26, 26, 26, 26, 26, 26, 29, 29,
+        28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 27, 27,
+        27, 27, 26, 26, 26, 25, 25, 25, 25, 25, 25, 24, 28, 28, 28, 27, 27, 27,
+        27, 27, 27, 27, 27, 27, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 25, 25,
+        24, 24, 24, 24, 24, 24, 24, 23,
+        /* Size 4x16 */
+        33, 33, 33, 30, 33, 33, 33, 30, 33, 33, 33, 29, 33, 33, 33, 29, 33, 33,
+        33, 29, 33, 33, 33, 29, 33, 32, 32, 28, 33, 32, 32, 28, 33, 32, 32, 28,
+        33, 32, 32, 28, 33, 32, 32, 28, 32, 31, 31, 28, 31, 29, 29, 26, 30, 28,
+        28, 26, 30, 28, 28, 26, 30, 28, 28, 26,
+        /* Size 16x4 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 30, 30, 33, 33,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 31, 29, 28, 28, 28, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 31, 29, 28, 28, 28, 30, 30, 29, 29, 29, 29,
+        28, 28, 28, 28, 28, 28, 26, 26, 26, 26,
+        /* Size 8x32 */
+        32, 33, 33, 33, 33, 33, 31, 29, 33, 33, 33, 33, 33, 33, 31, 29, 33, 33,
+        33, 33, 33, 33, 31, 28, 33, 33, 33, 33, 33, 33, 31, 28, 33, 33, 33, 33,
+        33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33,
+        30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28,
+        33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33,
+        33, 33, 33, 33, 30, 28, 33, 33, 33, 32, 32, 32, 30, 28, 33, 33, 33, 32,
+        32, 32, 30, 28, 34, 33, 33, 32, 32, 32, 30, 27, 34, 33, 32, 32, 32, 32,
+        29, 27, 34, 33, 32, 32, 32, 32, 29, 27, 34, 33, 32, 32, 32, 32, 29, 27,
+        34, 33, 32, 32, 32, 32, 29, 27, 34, 33, 32, 32, 32, 32, 29, 27, 34, 33,
+        32, 32, 32, 32, 29, 27, 33, 33, 32, 31, 31, 31, 29, 27, 33, 32, 31, 31,
+        31, 31, 28, 26, 32, 31, 30, 30, 30, 30, 28, 26, 31, 30, 30, 29, 29, 29,
+        28, 26, 31, 30, 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25,
+        31, 30, 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25, 31, 30,
+        29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25, 30, 29, 28, 28,
+        28, 28, 26, 24,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34,
+        34, 34, 34, 33, 33, 32, 31, 31, 31, 31, 31, 31, 31, 30, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 31, 30, 30, 30, 30, 30, 30, 30, 29, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29,
+        29, 29, 29, 29, 29, 28, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 29, 28, 28, 28, 28, 28,
+        28, 28, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 31, 31, 30, 29, 28, 28, 28, 28, 28, 28, 28, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 31, 31, 30, 29, 28, 28, 28, 28, 28, 28, 28, 31, 31, 31, 31, 30, 30,
+        30, 30, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 29, 28, 28,
+        28, 27, 27, 27, 27, 27, 27, 26, 29, 29, 28, 28, 28, 28, 28, 28, 28, 28,
+        28, 28, 28, 28, 27, 27, 27, 27, 27, 27, 27, 27, 26, 26, 26, 25, 25, 25,
+        25, 25, 25, 24 },
+  },
+  {
+      { /* Luma */
+        /* Size 4x4 */
+        33, 33, 33, 33, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
+        /* Size 8x8 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+        32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 4x8 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+        32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
+        /* Size 8x4 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+        32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+        /* Size 8x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 33, 33, 33, 32,
+        32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+        32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+        33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+        32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+        32, 32,
+        /* Size 16x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 4x16 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 33, 32,
+        32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
+        33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+        32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
+        /* Size 16x4 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 8x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
+        33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+        32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+        32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+        33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+        32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+        32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+        33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+        32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+        32, 32, 32, 32,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 32, 32 },
+      { /* Chroma */
+        /* Size 4x4 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        /* Size 8x8 */
+        33, 33, 33, 33, 33, 33, 33, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 34, 33, 33, 33, 33, 33, 33, 33,
+        /* Size 16x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32,
+        /* Size 32x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 34, 34, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34,
+        34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 32, 34, 34, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+        /* Size 4x8 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 34, 33, 32, 32,
+        /* Size 8x4 */
+        33, 33, 33, 33, 33, 33, 33, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32,
+        /* Size 8x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33,
+        33, 32, 32, 32, 34, 33, 33, 33, 33, 32, 32, 32, 34, 33, 33, 33, 32, 32,
+        32, 32,
+        /* Size 16x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32,
+        /* Size 16x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+        32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        34, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 34, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 34, 34, 33, 33,
+        33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 33, 33, 33, 33,
+        33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 33, 33, 33, 33, 33, 33,
+        32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 32x16 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32,
+        32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+        32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32,
+        /* Size 4x16 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 33, 33,
+        32, 32, 33, 33, 32, 32, 34, 33, 32, 32,
+        /* Size 16x4 */
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+        /* Size 8x32 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 33, 32,
+        32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 33, 32, 32, 32,
+        34, 33, 33, 33, 33, 32, 32, 32, 34, 33, 33, 33, 33, 32, 32, 32, 34, 33,
+        33, 33, 32, 32, 32, 32, 34, 33, 33, 33, 32, 32, 32, 32, 34, 33, 33, 33,
+        32, 32, 32, 32,
+        /* Size 32x8 */
+        32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
+        32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+        33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+        32, 32, 32, 32 },
+  },
+};

diff --git a/libav1/av1/common/quant_common.h b/libav1/av1/common/quant_common.h
new file mode 100644
index 0000000..d1f52a6
--- /dev/null
+++ b/libav1/av1/common/quant_common.h

@@ -0,0 +1,63 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_QUANT_COMMON_H_
+#define AOM_AV1_COMMON_QUANT_COMMON_H_
+
+#include "aom/aom_codec.h"
+#include "av1/common/seg_common.h"
+#include "av1/common/enums.h"
+#include "av1/common/entropy.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MINQ 0
+#define MAXQ 255
+#define QINDEX_RANGE (MAXQ - MINQ + 1)
+#define QINDEX_BITS 8
+// Total number of QM sets stored
+#define QM_LEVEL_BITS 4
+#define NUM_QM_LEVELS (1 << QM_LEVEL_BITS)
+/* Range of QMS is between first and last value, with offset applied to inter
+ * blocks*/
+#define DEFAULT_QM_Y 10
+#define DEFAULT_QM_U 11
+#define DEFAULT_QM_V 12
+#define DEFAULT_QM_FIRST 5
+#define DEFAULT_QM_LAST 9
+
+struct AV1Common;
+
+int16_t av1_dc_quant_Q3(int qindex, int delta, aom_bit_depth_t bit_depth);
+int16_t av1_ac_quant_Q3(int qindex, int delta, aom_bit_depth_t bit_depth);
+int16_t av1_dc_quant_QTX(int qindex, int delta, aom_bit_depth_t bit_depth);
+int16_t av1_ac_quant_QTX(int qindex, int delta, aom_bit_depth_t bit_depth);
+
+int av1_get_qindex(const struct segmentation *seg, int segment_id,
+                   int base_qindex);
+// Reduce the large number of quantizers to a smaller number of levels for which
+// different matrices may be defined
+static INLINE int aom_get_qmlevel(int qindex, int first, int last) {
+  return first + (qindex * (last + 1 - first)) / QINDEX_RANGE;
+}
+void av1_qm_init(struct AV1Common *cm);
+const qm_val_t *av1_iqmatrix(struct AV1Common *cm, int qindex, int comp,
+                             TX_SIZE tx_size);
+const qm_val_t *av1_qmatrix(struct AV1Common *cm, int qindex, int comp,
+                            TX_SIZE tx_size);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_QUANT_COMMON_H_

diff --git a/libav1/av1/common/reconinter.c b/libav1/av1/common/reconinter.c
new file mode 100644
index 0000000..5042ee1
--- /dev/null
+++ b/libav1/av1/common/reconinter.c

@@ -0,0 +1,1102 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <stdio.h>
+#include <limits.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+#include "config/aom_scale_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/blend.h"
+
+#include "av1/common/blockd.h"
+#include "av1/common/mvref_common.h"
+#include "av1/common/reconinter.h"
+#include "av1/common/reconintra.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/obmc.h"
+
+#define USE_PRECOMPUTED_WEDGE_MASK 1
+#define USE_PRECOMPUTED_WEDGE_SIGN 1
+
+// This function will determine whether or not to create a warped
+// prediction.
+int av1_allow_warp(const MB_MODE_INFO *const mbmi,
+                   const WarpTypesAllowed *const warp_types,
+                   const WarpedMotionParams *const gm_params,
+                   int build_for_obmc, const struct scale_factors *const sf,
+                   WarpedMotionParams *final_warp_params) {
+  // Note: As per the spec, we must test the fixed point scales here, which are
+  // at a higher precision (1 << 14) than the xs and ys in subpel_params (that
+  // have 1 << 10 precision).
+  if (av1_is_scaled(sf)) return 0;
+
+  if (final_warp_params != NULL) *final_warp_params = default_warp_params;
+
+  if (build_for_obmc) return 0;
+
+  if (warp_types->local_warp_allowed && !mbmi->wm_params.invalid) {
+    if (final_warp_params != NULL)
+      memcpy(final_warp_params, &mbmi->wm_params, sizeof(*final_warp_params));
+    return 1;
+  } else if (warp_types->global_warp_allowed && !gm_params->invalid) {
+    if (final_warp_params != NULL)
+      memcpy(final_warp_params, gm_params, sizeof(*final_warp_params));
+    return 1;
+  }
+
+  return 0;
+}
+
+void av1_make_inter_predictor(const uint8_t *src, int src_stride, uint8_t *dst,
+                              int dst_stride, const SubpelParams *subpel_params,
+                              const struct scale_factors *sf, int w, int h,
+                              ConvolveParams *conv_params,
+                              InterpFilters interp_filters,
+                              const WarpTypesAllowed *warp_types, int p_col,
+                              int p_row, int plane, int ref,
+                              const MB_MODE_INFO *mi, int build_for_obmc,
+                              const MACROBLOCKD *xd, int can_use_previous) {
+  // Make sure the selected motion mode is valid for this configuration
+  assert_motion_mode_valid(mi->motion_mode, xd->global_motion, xd, mi,
+                           can_use_previous);
+  assert(IMPLIES(conv_params->is_compound, conv_params->dst != NULL));
+
+  WarpedMotionParams final_warp_params;
+  const int do_warp =
+      (w >= 8 && h >= 8 &&
+       av1_allow_warp(mi, warp_types, &xd->global_motion[mi->ref_frame[ref]],
+                      build_for_obmc, sf, &final_warp_params));
+  const int is_intrabc = mi->use_intrabc;
+  assert(IMPLIES(is_intrabc, !do_warp));
+
+  if (do_warp && xd->cur_frame_force_integer_mv == 0) {
+    const struct macroblockd_plane *const pd = &xd->plane[plane];
+    const struct buf_2d *const pre_buf = &pd->pre[ref];
+    av1_warp_plane(&final_warp_params, is_cur_buf_hbd(xd), xd->bd,
+                   pre_buf->buf0, pre_buf->width, pre_buf->height,
+                   pre_buf->stride, dst, p_col, p_row, w, h, dst_stride,
+                   pd->subsampling_x, pd->subsampling_y, conv_params);
+  } else if (is_cur_buf_hbd(xd)) {
+    highbd_inter_predictor(src, src_stride, dst, dst_stride, subpel_params, sf,
+                           w, h, conv_params, interp_filters, is_intrabc,
+                           xd->bd);
+  } else {
+    inter_predictor(src, src_stride, dst, dst_stride, subpel_params, sf, w, h,
+                    conv_params, interp_filters, is_intrabc);
+  }
+}
+
+#if USE_PRECOMPUTED_WEDGE_MASK
+static const uint8_t wedge_master_oblique_odd[MASK_MASTER_SIZE] = {
+  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  2,  6,  18,
+  37, 53, 60, 63, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
+  64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
+};
+static const uint8_t wedge_master_oblique_even[MASK_MASTER_SIZE] = {
+  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  4,  11, 27,
+  46, 58, 62, 63, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
+  64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
+};
+static const uint8_t wedge_master_vertical[MASK_MASTER_SIZE] = {
+  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  2,  7,  21,
+  43, 57, 62, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
+  64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
+};
+
+static void shift_copy(const uint8_t *src, uint8_t *dst, int shift, int width) {
+  if (shift >= 0) {
+    memcpy(dst + shift, src, width - shift);
+    memset(dst, src[0], shift);
+  } else {
+    shift = -shift;
+    memcpy(dst, src + shift, width - shift);
+    memset(dst + width - shift, src[width - 1], shift);
+  }
+}
+#endif  // USE_PRECOMPUTED_WEDGE_MASK
+
+#if USE_PRECOMPUTED_WEDGE_SIGN
+/* clang-format off */
+DECLARE_ALIGNED(16, static uint8_t,
+                wedge_signflip_lookup[BLOCK_SIZES_ALL][MAX_WEDGE_TYPES]) = {
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, },
+  { 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, },
+  { 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, },
+  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, },
+  { 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, },
+  { 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, },
+  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, },
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, },
+  { 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, },
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, },  // not used
+};
+/* clang-format on */
+#else
+DECLARE_ALIGNED(16, static uint8_t,
+                wedge_signflip_lookup[BLOCK_SIZES_ALL][MAX_WEDGE_TYPES]);
+#endif  // USE_PRECOMPUTED_WEDGE_SIGN
+
+// [negative][direction]
+DECLARE_ALIGNED(
+    16, static uint8_t,
+    wedge_mask_obl[2][WEDGE_DIRECTIONS][MASK_MASTER_SIZE * MASK_MASTER_SIZE]);
+
+// 4 * MAX_WEDGE_SQUARE is an easy to compute and fairly tight upper bound
+// on the sum of all mask sizes up to an including MAX_WEDGE_SQUARE.
+DECLARE_ALIGNED(16, uint8_t,
+                wedge_mask_buf[2 * MAX_WEDGE_TYPES * 4 * MAX_WEDGE_SQUARE]);
+
+static wedge_masks_type wedge_masks[BLOCK_SIZES_ALL][2];
+
+static const wedge_code_type wedge_codebook_16_hgtw[16] = {
+  { WEDGE_OBLIQUE27, 4, 4 },  { WEDGE_OBLIQUE63, 4, 4 },
+  { WEDGE_OBLIQUE117, 4, 4 }, { WEDGE_OBLIQUE153, 4, 4 },
+  { WEDGE_HORIZONTAL, 4, 2 }, { WEDGE_HORIZONTAL, 4, 4 },
+  { WEDGE_HORIZONTAL, 4, 6 }, { WEDGE_VERTICAL, 4, 4 },
+  { WEDGE_OBLIQUE27, 4, 2 },  { WEDGE_OBLIQUE27, 4, 6 },
+  { WEDGE_OBLIQUE153, 4, 2 }, { WEDGE_OBLIQUE153, 4, 6 },
+  { WEDGE_OBLIQUE63, 2, 4 },  { WEDGE_OBLIQUE63, 6, 4 },
+  { WEDGE_OBLIQUE117, 2, 4 }, { WEDGE_OBLIQUE117, 6, 4 },
+};
+
+static const wedge_code_type wedge_codebook_16_hltw[16] = {
+  { WEDGE_OBLIQUE27, 4, 4 },  { WEDGE_OBLIQUE63, 4, 4 },
+  { WEDGE_OBLIQUE117, 4, 4 }, { WEDGE_OBLIQUE153, 4, 4 },
+  { WEDGE_VERTICAL, 2, 4 },   { WEDGE_VERTICAL, 4, 4 },
+  { WEDGE_VERTICAL, 6, 4 },   { WEDGE_HORIZONTAL, 4, 4 },
+  { WEDGE_OBLIQUE27, 4, 2 },  { WEDGE_OBLIQUE27, 4, 6 },
+  { WEDGE_OBLIQUE153, 4, 2 }, { WEDGE_OBLIQUE153, 4, 6 },
+  { WEDGE_OBLIQUE63, 2, 4 },  { WEDGE_OBLIQUE63, 6, 4 },
+  { WEDGE_OBLIQUE117, 2, 4 }, { WEDGE_OBLIQUE117, 6, 4 },
+};
+
+static const wedge_code_type wedge_codebook_16_heqw[16] = {
+  { WEDGE_OBLIQUE27, 4, 4 },  { WEDGE_OBLIQUE63, 4, 4 },
+  { WEDGE_OBLIQUE117, 4, 4 }, { WEDGE_OBLIQUE153, 4, 4 },
+  { WEDGE_HORIZONTAL, 4, 2 }, { WEDGE_HORIZONTAL, 4, 6 },
+  { WEDGE_VERTICAL, 2, 4 },   { WEDGE_VERTICAL, 6, 4 },
+  { WEDGE_OBLIQUE27, 4, 2 },  { WEDGE_OBLIQUE27, 4, 6 },
+  { WEDGE_OBLIQUE153, 4, 2 }, { WEDGE_OBLIQUE153, 4, 6 },
+  { WEDGE_OBLIQUE63, 2, 4 },  { WEDGE_OBLIQUE63, 6, 4 },
+  { WEDGE_OBLIQUE117, 2, 4 }, { WEDGE_OBLIQUE117, 6, 4 },
+};
+
+const wedge_params_type wedge_params_lookup[BLOCK_SIZES_ALL] = {
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+  { 4, wedge_codebook_16_heqw, wedge_signflip_lookup[BLOCK_8X8],
+    wedge_masks[BLOCK_8X8] },
+  { 4, wedge_codebook_16_hgtw, wedge_signflip_lookup[BLOCK_8X16],
+    wedge_masks[BLOCK_8X16] },
+  { 4, wedge_codebook_16_hltw, wedge_signflip_lookup[BLOCK_16X8],
+    wedge_masks[BLOCK_16X8] },
+  { 4, wedge_codebook_16_heqw, wedge_signflip_lookup[BLOCK_16X16],
+    wedge_masks[BLOCK_16X16] },
+  { 4, wedge_codebook_16_hgtw, wedge_signflip_lookup[BLOCK_16X32],
+    wedge_masks[BLOCK_16X32] },
+  { 4, wedge_codebook_16_hltw, wedge_signflip_lookup[BLOCK_32X16],
+    wedge_masks[BLOCK_32X16] },
+  { 4, wedge_codebook_16_heqw, wedge_signflip_lookup[BLOCK_32X32],
+    wedge_masks[BLOCK_32X32] },
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+  { 4, wedge_codebook_16_hgtw, wedge_signflip_lookup[BLOCK_8X32],
+    wedge_masks[BLOCK_8X32] },
+  { 4, wedge_codebook_16_hltw, wedge_signflip_lookup[BLOCK_32X8],
+    wedge_masks[BLOCK_32X8] },
+  { 0, NULL, NULL, NULL },
+  { 0, NULL, NULL, NULL },
+};
+
+static const uint8_t *get_wedge_mask_inplace(int wedge_index, int neg,
+                                             BLOCK_SIZE sb_type) {
+  const uint8_t *master;
+  const int bh = block_size_high[sb_type];
+  const int bw = block_size_wide[sb_type];
+  const wedge_code_type *a =
+      wedge_params_lookup[sb_type].codebook + wedge_index;
+  int woff, hoff;
+  const uint8_t wsignflip = wedge_params_lookup[sb_type].signflip[wedge_index];
+
+  assert(wedge_index >= 0 &&
+         wedge_index < (1 << get_wedge_bits_lookup(sb_type)));
+  woff = (a->x_offset * bw) >> 3;
+  hoff = (a->y_offset * bh) >> 3;
+  master = wedge_mask_obl[neg ^ wsignflip][a->direction] +
+           MASK_MASTER_STRIDE * (MASK_MASTER_SIZE / 2 - hoff) +
+           MASK_MASTER_SIZE / 2 - woff;
+  return master;
+}
+
+const uint8_t *av1_get_compound_type_mask(
+    const INTERINTER_COMPOUND_DATA *const comp_data, BLOCK_SIZE sb_type) {
+  assert(is_masked_compound_type(comp_data->type));
+  (void)sb_type;
+#if 1
+  assert(0);
+  return NULL;
+#else
+  switch (comp_data->type) {
+    case COMPOUND_WEDGE:
+      return av1_get_contiguous_soft_mask(comp_data->wedge_index,
+                                          comp_data->wedge_sign, sb_type);
+    case COMPOUND_DIFFWTD: return comp_data->seg_mask;
+    default: assert(0); return NULL;
+  }
+#endif
+}
+
+static void diffwtd_mask_d16(uint8_t *mask, int which_inverse, int mask_base,
+                             const CONV_BUF_TYPE *src0, int src0_stride,
+                             const CONV_BUF_TYPE *src1, int src1_stride, int h,
+                             int w, ConvolveParams *conv_params, int bd) {
+  int round =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1 + (bd - 8);
+  int i, j, m, diff;
+  for (i = 0; i < h; ++i) {
+    for (j = 0; j < w; ++j) {
+      diff = abs(src0[i * src0_stride + j] - src1[i * src1_stride + j]);
+      diff = ROUND_POWER_OF_TWO(diff, round);
+      m = clamp(mask_base + (diff / DIFF_FACTOR), 0, AOM_BLEND_A64_MAX_ALPHA);
+      mask[i * w + j] = which_inverse ? AOM_BLEND_A64_MAX_ALPHA - m : m;
+    }
+  }
+}
+
+void av1_build_compound_diffwtd_mask_d16_c(
+    uint8_t *mask, DIFFWTD_MASK_TYPE mask_type, const CONV_BUF_TYPE *src0,
+    int src0_stride, const CONV_BUF_TYPE *src1, int src1_stride, int h, int w,
+    ConvolveParams *conv_params, int bd) {
+  switch (mask_type) {
+    case DIFFWTD_38:
+      diffwtd_mask_d16(mask, 0, 38, src0, src0_stride, src1, src1_stride, h, w,
+                       conv_params, bd);
+      break;
+    case DIFFWTD_38_INV:
+      diffwtd_mask_d16(mask, 1, 38, src0, src0_stride, src1, src1_stride, h, w,
+                       conv_params, bd);
+      break;
+    default: assert(0);
+  }
+}
+
+static void diffwtd_mask(uint8_t *mask, int which_inverse, int mask_base,
+                         const uint8_t *src0, int src0_stride,
+                         const uint8_t *src1, int src1_stride, int h, int w) {
+  int i, j, m, diff;
+  for (i = 0; i < h; ++i) {
+    for (j = 0; j < w; ++j) {
+      diff =
+          abs((int)src0[i * src0_stride + j] - (int)src1[i * src1_stride + j]);
+      m = clamp(mask_base + (diff / DIFF_FACTOR), 0, AOM_BLEND_A64_MAX_ALPHA);
+      mask[i * w + j] = which_inverse ? AOM_BLEND_A64_MAX_ALPHA - m : m;
+    }
+  }
+}
+
+void av1_build_compound_diffwtd_mask_c(uint8_t *mask,
+                                       DIFFWTD_MASK_TYPE mask_type,
+                                       const uint8_t *src0, int src0_stride,
+                                       const uint8_t *src1, int src1_stride,
+                                       int h, int w) {
+  switch (mask_type) {
+    case DIFFWTD_38:
+      diffwtd_mask(mask, 0, 38, src0, src0_stride, src1, src1_stride, h, w);
+      break;
+    case DIFFWTD_38_INV:
+      diffwtd_mask(mask, 1, 38, src0, src0_stride, src1, src1_stride, h, w);
+      break;
+    default: assert(0);
+  }
+}
+
+static AOM_FORCE_INLINE void diffwtd_mask_highbd(
+    uint8_t *mask, int which_inverse, int mask_base, const uint16_t *src0,
+    int src0_stride, const uint16_t *src1, int src1_stride, int h, int w,
+    const unsigned int bd) {
+  assert(bd >= 8);
+  if (bd == 8) {
+    if (which_inverse) {
+      for (int i = 0; i < h; ++i) {
+        for (int j = 0; j < w; ++j) {
+          int diff = abs((int)src0[j] - (int)src1[j]) / DIFF_FACTOR;
+          unsigned int m = negative_to_zero(mask_base + diff);
+          m = AOMMIN(m, AOM_BLEND_A64_MAX_ALPHA);
+          mask[j] = AOM_BLEND_A64_MAX_ALPHA - m;
+        }
+        src0 += src0_stride;
+        src1 += src1_stride;
+        mask += w;
+      }
+    } else {
+      for (int i = 0; i < h; ++i) {
+        for (int j = 0; j < w; ++j) {
+          int diff = abs((int)src0[j] - (int)src1[j]) / DIFF_FACTOR;
+          unsigned int m = negative_to_zero(mask_base + diff);
+          m = AOMMIN(m, AOM_BLEND_A64_MAX_ALPHA);
+          mask[j] = m;
+        }
+        src0 += src0_stride;
+        src1 += src1_stride;
+        mask += w;
+      }
+    }
+  } else {
+    const unsigned int bd_shift = bd - 8;
+    if (which_inverse) {
+      for (int i = 0; i < h; ++i) {
+        for (int j = 0; j < w; ++j) {
+          int diff =
+              (abs((int)src0[j] - (int)src1[j]) >> bd_shift) / DIFF_FACTOR;
+          unsigned int m = negative_to_zero(mask_base + diff);
+          m = AOMMIN(m, AOM_BLEND_A64_MAX_ALPHA);
+          mask[j] = AOM_BLEND_A64_MAX_ALPHA - m;
+        }
+        src0 += src0_stride;
+        src1 += src1_stride;
+        mask += w;
+      }
+    } else {
+      for (int i = 0; i < h; ++i) {
+        for (int j = 0; j < w; ++j) {
+          int diff =
+              (abs((int)src0[j] - (int)src1[j]) >> bd_shift) / DIFF_FACTOR;
+          unsigned int m = negative_to_zero(mask_base + diff);
+          m = AOMMIN(m, AOM_BLEND_A64_MAX_ALPHA);
+          mask[j] = m;
+        }
+        src0 += src0_stride;
+        src1 += src1_stride;
+        mask += w;
+      }
+    }
+  }
+}
+
+void av1_build_compound_diffwtd_mask_highbd_c(
+    uint8_t *mask, DIFFWTD_MASK_TYPE mask_type, const uint8_t *src0,
+    int src0_stride, const uint8_t *src1, int src1_stride, int h, int w,
+    int bd) {
+  switch (mask_type) {
+    case DIFFWTD_38:
+      diffwtd_mask_highbd(mask, 0, 38, CONVERT_TO_SHORTPTR(src0), src0_stride,
+                          CONVERT_TO_SHORTPTR(src1), src1_stride, h, w, bd);
+      break;
+    case DIFFWTD_38_INV:
+      diffwtd_mask_highbd(mask, 1, 38, CONVERT_TO_SHORTPTR(src0), src0_stride,
+                          CONVERT_TO_SHORTPTR(src1), src1_stride, h, w, bd);
+      break;
+    default: assert(0);
+  }
+}
+
+static void init_wedge_master_masks() {
+  int i, j;
+  const int w = MASK_MASTER_SIZE;
+  const int h = MASK_MASTER_SIZE;
+  const int stride = MASK_MASTER_STRIDE;
+// Note: index [0] stores the masters, and [1] its complement.
+#if USE_PRECOMPUTED_WEDGE_MASK
+  // Generate prototype by shifting the masters
+  int shift = h / 4;
+  for (i = 0; i < h; i += 2) {
+    shift_copy(wedge_master_oblique_even,
+               &wedge_mask_obl[0][WEDGE_OBLIQUE63][i * stride], shift,
+               MASK_MASTER_SIZE);
+    shift--;
+    shift_copy(wedge_master_oblique_odd,
+               &wedge_mask_obl[0][WEDGE_OBLIQUE63][(i + 1) * stride], shift,
+               MASK_MASTER_SIZE);
+    memcpy(&wedge_mask_obl[0][WEDGE_VERTICAL][i * stride],
+           wedge_master_vertical,
+           MASK_MASTER_SIZE * sizeof(wedge_master_vertical[0]));
+    memcpy(&wedge_mask_obl[0][WEDGE_VERTICAL][(i + 1) * stride],
+           wedge_master_vertical,
+           MASK_MASTER_SIZE * sizeof(wedge_master_vertical[0]));
+  }
+#else
+  static const double smoother_param = 2.85;
+  const int a[2] = { 2, 1 };
+  const double asqrt = sqrt(a[0] * a[0] + a[1] * a[1]);
+  for (i = 0; i < h; i++) {
+    for (j = 0; j < w; ++j) {
+      int x = (2 * j + 1 - w);
+      int y = (2 * i + 1 - h);
+      double d = (a[0] * x + a[1] * y) / asqrt;
+      const int msk = (int)rint((1.0 + tanh(d / smoother_param)) * 32);
+      wedge_mask_obl[0][WEDGE_OBLIQUE63][i * stride + j] = msk;
+      const int mskx = (int)rint((1.0 + tanh(x / smoother_param)) * 32);
+      wedge_mask_obl[0][WEDGE_VERTICAL][i * stride + j] = mskx;
+    }
+  }
+#endif  // USE_PRECOMPUTED_WEDGE_MASK
+  for (i = 0; i < h; ++i) {
+    for (j = 0; j < w; ++j) {
+      const int msk = wedge_mask_obl[0][WEDGE_OBLIQUE63][i * stride + j];
+      wedge_mask_obl[0][WEDGE_OBLIQUE27][j * stride + i] = msk;
+      wedge_mask_obl[0][WEDGE_OBLIQUE117][i * stride + w - 1 - j] =
+          wedge_mask_obl[0][WEDGE_OBLIQUE153][(w - 1 - j) * stride + i] =
+              (1 << WEDGE_WEIGHT_BITS) - msk;
+      wedge_mask_obl[1][WEDGE_OBLIQUE63][i * stride + j] =
+          wedge_mask_obl[1][WEDGE_OBLIQUE27][j * stride + i] =
+              (1 << WEDGE_WEIGHT_BITS) - msk;
+      wedge_mask_obl[1][WEDGE_OBLIQUE117][i * stride + w - 1 - j] =
+          wedge_mask_obl[1][WEDGE_OBLIQUE153][(w - 1 - j) * stride + i] = msk;
+      const int mskx = wedge_mask_obl[0][WEDGE_VERTICAL][i * stride + j];
+      wedge_mask_obl[0][WEDGE_HORIZONTAL][j * stride + i] = mskx;
+      wedge_mask_obl[1][WEDGE_VERTICAL][i * stride + j] =
+          wedge_mask_obl[1][WEDGE_HORIZONTAL][j * stride + i] =
+              (1 << WEDGE_WEIGHT_BITS) - mskx;
+    }
+  }
+}
+
+#if !USE_PRECOMPUTED_WEDGE_SIGN
+// If the signs for the wedges for various blocksizes are
+// inconsistent flip the sign flag. Do it only once for every
+// wedge codebook.
+static void init_wedge_signs() {
+  BLOCK_SIZE sb_type;
+  memset(wedge_signflip_lookup, 0, sizeof(wedge_signflip_lookup));
+  for (sb_type = BLOCK_4X4; sb_type < BLOCK_SIZES_ALL; ++sb_type) {
+    const int bw = block_size_wide[sb_type];
+    const int bh = block_size_high[sb_type];
+    const wedge_params_type wedge_params = wedge_params_lookup[sb_type];
+    const int wbits = wedge_params.bits;
+    const int wtypes = 1 << wbits;
+    int i, w;
+    if (wbits) {
+      for (w = 0; w < wtypes; ++w) {
+        // Get the mask master, i.e. index [0]
+        const uint8_t *mask = get_wedge_mask_inplace(w, 0, sb_type);
+        int avg = 0;
+        for (i = 0; i < bw; ++i) avg += mask[i];
+        for (i = 1; i < bh; ++i) avg += mask[i * MASK_MASTER_STRIDE];
+        avg = (avg + (bw + bh - 1) / 2) / (bw + bh - 1);
+        // Default sign of this wedge is 1 if the average < 32, 0 otherwise.
+        // If default sign is 1:
+        //   If sign requested is 0, we need to flip the sign and return
+        //   the complement i.e. index [1] instead. If sign requested is 1
+        //   we need to flip the sign and return index [0] instead.
+        // If default sign is 0:
+        //   If sign requested is 0, we need to return index [0] the master
+        //   if sign requested is 1, we need to return the complement index [1]
+        //   instead.
+        wedge_params.signflip[w] = (avg < 32);
+      }
+    }
+  }
+}
+#endif  // !USE_PRECOMPUTED_WEDGE_SIGN
+
+static void init_wedge_masks() {
+  uint8_t *dst = wedge_mask_buf;
+  BLOCK_SIZE bsize;
+  memset(wedge_masks, 0, sizeof(wedge_masks));
+  for (bsize = BLOCK_4X4; bsize < BLOCK_SIZES_ALL; ++bsize) {
+    const uint8_t *mask;
+    const int bw = block_size_wide[bsize];
+    const int bh = block_size_high[bsize];
+    const wedge_params_type *wedge_params = &wedge_params_lookup[bsize];
+    const int wbits = wedge_params->bits;
+    const int wtypes = 1 << wbits;
+    int w;
+    if (wbits == 0) continue;
+    for (w = 0; w < wtypes; ++w) {
+      mask = get_wedge_mask_inplace(w, 0, bsize);
+      aom_convolve_copy(mask, MASK_MASTER_STRIDE, dst, bw, NULL, 0, NULL, 0, bw,
+                        bh);
+      wedge_params->masks[0][w] = dst;
+      dst += bw * bh;
+
+      mask = get_wedge_mask_inplace(w, 1, bsize);
+      aom_convolve_copy(mask, MASK_MASTER_STRIDE, dst, bw, NULL, 0, NULL, 0, bw,
+                        bh);
+      wedge_params->masks[1][w] = dst;
+      dst += bw * bh;
+    }
+    assert(sizeof(wedge_mask_buf) >= (size_t)(dst - wedge_mask_buf));
+  }
+}
+
+// Equation of line: f(x, y) = a[0]*(x - a[2]*w/8) + a[1]*(y - a[3]*h/8) = 0
+void av1_init_wedge_masks() {
+  init_wedge_master_masks();
+#if !USE_PRECOMPUTED_WEDGE_SIGN
+  init_wedge_signs();
+#endif  // !USE_PRECOMPUTED_WEDGE_SIGN
+  init_wedge_masks();
+}
+
+void av1_dist_wtd_comp_weight_assign(const AV1_COMMON *cm,
+                                     const MB_MODE_INFO *mbmi, int order_idx,
+                                     int *fwd_offset, int *bck_offset,
+                                     int *use_dist_wtd_comp_avg,
+                                     int is_compound) {
+  assert(fwd_offset != NULL && bck_offset != NULL);
+  if (!is_compound || mbmi->compound_idx) {
+    *use_dist_wtd_comp_avg = 0;
+    return;
+  }
+
+  *use_dist_wtd_comp_avg = 1;
+  const RefCntBuffer *const bck_buf = get_ref_frame_buf(cm, mbmi->ref_frame[0]);
+  const RefCntBuffer *const fwd_buf = get_ref_frame_buf(cm, mbmi->ref_frame[1]);
+  const int cur_frame_index = cm->cur_frame->order_hint;
+  int bck_frame_index = 0, fwd_frame_index = 0;
+
+  if (bck_buf != NULL) bck_frame_index = bck_buf->order_hint;
+  if (fwd_buf != NULL) fwd_frame_index = fwd_buf->order_hint;
+
+  int d0 = clamp(abs(get_relative_dist(&cm->seq_params.order_hint_info, 
+                                       fwd_frame_index, cur_frame_index)),
+                 0, MAX_FRAME_DISTANCE);
+  int d1 = clamp(abs(get_relative_dist(&cm->seq_params.order_hint_info,
+                                       cur_frame_index, bck_frame_index)),
+                 0, MAX_FRAME_DISTANCE);
+
+  const int order = d0 <= d1;
+
+  if (d0 == 0 || d1 == 0) {
+    *fwd_offset = quant_dist_lookup_table[order_idx][3][order];
+    *bck_offset = quant_dist_lookup_table[order_idx][3][1 - order];
+    return;
+  }
+
+  int i;
+  for (i = 0; i < 3; ++i) {
+    int c0 = quant_dist_weight[i][order];
+    int c1 = quant_dist_weight[i][!order];
+    int d0_c0 = d0 * c0;
+    int d1_c1 = d1 * c1;
+    if ((d0 > d1 && d0_c0 < d1_c1) || (d0 <= d1 && d0_c0 > d1_c1)) break;
+  }
+
+  *fwd_offset = quant_dist_lookup_table[order_idx][i][order];
+  *bck_offset = quant_dist_lookup_table[order_idx][i][1 - order];
+}
+
+void av1_setup_dst_planes(struct macroblockd_plane *planes, BLOCK_SIZE bsize,
+                          const YV12_BUFFER_CONFIG *src, int mi_row, int mi_col,
+                          const int plane_start, const int plane_end) {
+  // We use AOMMIN(num_planes, MAX_MB_PLANE) instead of num_planes to quiet
+  // the static analysis warnings.
+  for (int i = plane_start; i < AOMMIN(plane_end, MAX_MB_PLANE); ++i) {
+    struct macroblockd_plane *const pd = &planes[i];
+    const int is_uv = i > 0;
+    setup_pred_plane(&pd->dst, bsize, src->buffers[i], src->crop_widths[is_uv],
+                     src->crop_heights[is_uv], src->strides[is_uv], mi_row,
+                     mi_col, NULL, pd->subsampling_x, pd->subsampling_y);
+  }
+}
+
+void av1_setup_pre_planes(MACROBLOCKD *xd, int idx,
+                          const YV12_BUFFER_CONFIG *src, int mi_row, int mi_col,
+                          const struct scale_factors *sf,
+                          const int num_planes) {
+  if (src != NULL) {
+    // We use AOMMIN(num_planes, MAX_MB_PLANE) instead of num_planes to quiet
+    // the static analysis warnings.
+    for (int i = 0; i < AOMMIN(num_planes, MAX_MB_PLANE); ++i) {
+      struct macroblockd_plane *const pd = &xd->plane[i];
+      const int is_uv = i > 0;
+      setup_pred_plane(&pd->pre[idx], xd->mi[0]->sb_type, src->buffers[i],
+                       src->crop_widths[is_uv], src->crop_heights[is_uv],
+                       src->strides[is_uv], mi_row, mi_col, sf,
+                       pd->subsampling_x, pd->subsampling_y);
+    }
+  }
+}
+
+// obmc_mask_N[overlap_position]
+static const uint8_t obmc_mask_1[1] = { 64 };
+DECLARE_ALIGNED(2, static const uint8_t, obmc_mask_2[2]) = { 45, 64 };
+
+DECLARE_ALIGNED(4, static const uint8_t, obmc_mask_4[4]) = { 39, 50, 59, 64 };
+
+static const uint8_t obmc_mask_8[8] = { 36, 42, 48, 53, 57, 61, 64, 64 };
+
+static const uint8_t obmc_mask_16[16] = { 34, 37, 40, 43, 46, 49, 52, 54,
+                                          56, 58, 60, 61, 64, 64, 64, 64 };
+
+static const uint8_t obmc_mask_32[32] = { 33, 35, 36, 38, 40, 41, 43, 44,
+                                          45, 47, 48, 50, 51, 52, 53, 55,
+                                          56, 57, 58, 59, 60, 60, 61, 62,
+                                          64, 64, 64, 64, 64, 64, 64, 64 };
+
+static const uint8_t obmc_mask_64[64] = {
+  33, 34, 35, 35, 36, 37, 38, 39, 40, 40, 41, 42, 43, 44, 44, 44,
+  45, 46, 47, 47, 48, 49, 50, 51, 51, 51, 52, 52, 53, 54, 55, 56,
+  56, 56, 57, 57, 58, 58, 59, 60, 60, 60, 60, 60, 61, 62, 62, 62,
+  62, 62, 63, 63, 63, 63, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
+};
+
+const uint8_t *av1_get_obmc_mask(int length) {
+  switch (length) {
+    case 1: return obmc_mask_1;
+    case 2: return obmc_mask_2;
+    case 4: return obmc_mask_4;
+    case 8: return obmc_mask_8;
+    case 16: return obmc_mask_16;
+    case 32: return obmc_mask_32;
+    case 64: return obmc_mask_64;
+    default: assert(0); return NULL;
+  }
+}
+
+static INLINE void increment_int_ptr(MACROBLOCKD *xd, int rel_mi_rc,
+                                     uint8_t mi_hw, MB_MODE_INFO *mi,
+                                     void *fun_ctxt, const int num_planes) {
+  (void)xd;
+  (void)rel_mi_rc;
+  (void)mi_hw;
+  (void)mi;
+  ++*(int *)fun_ctxt;
+  (void)num_planes;
+}
+
+void av1_count_overlappable_neighbors(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                      int mi_row, int mi_col) {
+  MB_MODE_INFO *mbmi = xd->mi[0];
+
+  xd->overlappable_neighbors[0] = 0;
+  xd->overlappable_neighbors[1] = 0;
+
+  if (!is_motion_variation_allowed_bsize(mbmi->sb_type)) return;
+
+  foreach_overlappable_nb_above(cm, xd, mi_col, INT_MAX, increment_int_ptr,
+                                &xd->overlappable_neighbors[0]);
+  foreach_overlappable_nb_left(cm, xd, mi_row, INT_MAX, increment_int_ptr,
+                               &xd->overlappable_neighbors[1]);
+}
+
+// HW does not support < 4x4 prediction. To limit the bandwidth requirement, if
+// block-size of current plane is smaller than 8x8, always only blend with the
+// left neighbor(s) (skip blending with the above side).
+#define DISABLE_CHROMA_U8X8_OBMC 0  // 0: one-sided obmc; 1: disable
+
+int av1_skip_u4x4_pred_in_obmc(BLOCK_SIZE bsize,
+                               const struct macroblockd_plane *pd, int dir) {
+  assert(is_motion_variation_allowed_bsize(bsize));
+
+  const BLOCK_SIZE bsize_plane =
+      get_plane_block_size(bsize, pd->subsampling_x, pd->subsampling_y);
+  switch (bsize_plane) {
+#if DISABLE_CHROMA_U8X8_OBMC
+    case BLOCK_4X4:
+    case BLOCK_8X4:
+    case BLOCK_4X8: return 1; break;
+#else
+    case BLOCK_4X4:
+    case BLOCK_8X4:
+    case BLOCK_4X8: return dir == 0; break;
+#endif
+    default: return 0;
+  }
+}
+
+void av1_modify_neighbor_predictor_for_obmc(MB_MODE_INFO *mbmi) {
+  mbmi->ref_frame[1] = NONE_FRAME;
+  mbmi->interinter_comp.type = COMPOUND_AVERAGE;
+
+  return;
+}
+
+struct obmc_inter_pred_ctxt {
+  uint8_t **adjacent;
+  int *adjacent_stride;
+};
+
+static INLINE void build_obmc_inter_pred_above(MACROBLOCKD *xd, int rel_mi_col,
+                                               uint8_t above_mi_width,
+                                               MB_MODE_INFO *above_mi,
+                                               void *fun_ctxt,
+                                               const int num_planes) {
+  (void)above_mi;
+  struct obmc_inter_pred_ctxt *ctxt = (struct obmc_inter_pred_ctxt *)fun_ctxt;
+  const BLOCK_SIZE bsize = xd->mi[0]->sb_type;
+  const int is_hbd = is_cur_buf_hbd(xd);
+  const int overlap =
+      AOMMIN(block_size_high[bsize], block_size_high[BLOCK_64X64]) >> 1;
+
+  for (int plane = 0; plane < num_planes; ++plane) {
+    const struct macroblockd_plane *pd = &xd->plane[plane];
+    const int bw = (above_mi_width * MI_SIZE) >> pd->subsampling_x;
+    const int bh = overlap >> pd->subsampling_y;
+    const int plane_col = (rel_mi_col * MI_SIZE) >> pd->subsampling_x;
+
+    if (av1_skip_u4x4_pred_in_obmc(bsize, pd, 0)) continue;
+
+    const int dst_stride = pd->dst.stride;
+    uint8_t *const dst = &pd->dst.buf[plane_col];
+    const int tmp_stride = ctxt->adjacent_stride[plane];
+    const uint8_t *const tmp = &ctxt->adjacent[plane][plane_col];
+    const uint8_t *const mask = av1_get_obmc_mask(bh);
+
+    if (is_hbd)
+      aom_highbd_blend_a64_vmask(dst, dst_stride, dst, dst_stride, tmp,
+                                 tmp_stride, mask, bw, bh, xd->bd);
+    else
+      aom_blend_a64_vmask(dst, dst_stride, dst, dst_stride, tmp, tmp_stride,
+                          mask, bw, bh);
+  }
+}
+
+static INLINE void build_obmc_inter_pred_left(MACROBLOCKD *xd, int rel_mi_row,
+                                              uint8_t left_mi_height,
+                                              MB_MODE_INFO *left_mi,
+                                              void *fun_ctxt,
+                                              const int num_planes) {
+  (void)left_mi;
+  struct obmc_inter_pred_ctxt *ctxt = (struct obmc_inter_pred_ctxt *)fun_ctxt;
+  const BLOCK_SIZE bsize = xd->mi[0]->sb_type;
+  const int overlap =
+      AOMMIN(block_size_wide[bsize], block_size_wide[BLOCK_64X64]) >> 1;
+  const int is_hbd = is_cur_buf_hbd(xd);
+
+  for (int plane = 0; plane < num_planes; ++plane) {
+    const struct macroblockd_plane *pd = &xd->plane[plane];
+    const int bw = overlap >> pd->subsampling_x;
+    const int bh = (left_mi_height * MI_SIZE) >> pd->subsampling_y;
+    const int plane_row = (rel_mi_row * MI_SIZE) >> pd->subsampling_y;
+
+    if (av1_skip_u4x4_pred_in_obmc(bsize, pd, 1)) continue;
+
+    const int dst_stride = pd->dst.stride;
+    uint8_t *const dst = &pd->dst.buf[plane_row * dst_stride];
+    const int tmp_stride = ctxt->adjacent_stride[plane];
+    const uint8_t *const tmp = &ctxt->adjacent[plane][plane_row * tmp_stride];
+    const uint8_t *const mask = av1_get_obmc_mask(bw);
+
+    if (is_hbd)
+      aom_highbd_blend_a64_hmask(dst, dst_stride, dst, dst_stride, tmp,
+                                 tmp_stride, mask, bw, bh, xd->bd);
+    else
+      aom_blend_a64_hmask(dst, dst_stride, dst, dst_stride, tmp, tmp_stride,
+                          mask, bw, bh);
+  }
+}
+
+// This function combines motion compensated predictions that are generated by
+// top/left neighboring blocks' inter predictors with the regular inter
+// prediction. We assume the original prediction (bmc) is stored in
+// xd->plane[].dst.buf
+void av1_build_obmc_inter_prediction(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                     int mi_row, int mi_col,
+                                     uint8_t *above[MAX_MB_PLANE],
+                                     int above_stride[MAX_MB_PLANE],
+                                     uint8_t *left[MAX_MB_PLANE],
+                                     int left_stride[MAX_MB_PLANE]) {
+  const BLOCK_SIZE bsize = xd->mi[0]->sb_type;
+
+  // handle above row
+  struct obmc_inter_pred_ctxt ctxt_above = { above, above_stride };
+  foreach_overlappable_nb_above(cm, xd, mi_col,
+                                max_neighbor_obmc[mi_size_wide_log2[bsize]],
+                                build_obmc_inter_pred_above, &ctxt_above);
+
+  // handle left column
+  struct obmc_inter_pred_ctxt ctxt_left = { left, left_stride };
+  foreach_overlappable_nb_left(cm, xd, mi_row,
+                               max_neighbor_obmc[mi_size_high_log2[bsize]],
+                               build_obmc_inter_pred_left, &ctxt_left);
+}
+
+void av1_setup_build_prediction_by_above_pred(
+    MACROBLOCKD *xd, int rel_mi_col, uint8_t above_mi_width,
+    MB_MODE_INFO *above_mbmi, struct build_prediction_ctxt *ctxt,
+    const int num_planes) {
+  const BLOCK_SIZE a_bsize = AOMMAX(BLOCK_8X8, above_mbmi->sb_type);
+  const int above_mi_col = ctxt->mi_col + rel_mi_col;
+
+  av1_modify_neighbor_predictor_for_obmc(above_mbmi);
+
+  for (int j = 0; j < num_planes; ++j) {
+    struct macroblockd_plane *const pd = &xd->plane[j];
+    setup_pred_plane(&pd->dst, a_bsize, ctxt->tmp_buf[j], ctxt->tmp_width[j],
+                     ctxt->tmp_height[j], ctxt->tmp_stride[j], 0, rel_mi_col,
+                     NULL, pd->subsampling_x, pd->subsampling_y);
+  }
+
+  const int num_refs = 1 + has_second_ref(above_mbmi);
+
+  for (int ref = 0; ref < num_refs; ++ref) {
+    const MV_REFERENCE_FRAME frame = above_mbmi->ref_frame[ref];
+
+    const RefCntBuffer *const ref_buf = get_ref_frame_buf(ctxt->cm, frame);
+    const struct scale_factors *const sf =
+        get_ref_scale_factors_const(ctxt->cm, frame);
+    xd->block_ref_scale_factors[ref] = sf;
+    if ((!av1_is_valid_scale(sf)))
+      aom_internal_error(xd->error_info, AOM_CODEC_UNSUP_BITSTREAM,
+                         "Reference frame has invalid dimensions");
+    av1_setup_pre_planes(xd, ref, &ref_buf->buf, ctxt->mi_row, above_mi_col, sf,
+                         num_planes);
+  }
+
+  xd->mb_to_left_edge = 8 * MI_SIZE * (-above_mi_col);
+  xd->mb_to_right_edge = ctxt->mb_to_far_edge +
+                         (xd->n4_w - rel_mi_col - above_mi_width) * MI_SIZE * 8;
+}
+
+void av1_setup_build_prediction_by_left_pred(MACROBLOCKD *xd, int rel_mi_row,
+                                             uint8_t left_mi_height,
+                                             MB_MODE_INFO *left_mbmi,
+                                             struct build_prediction_ctxt *ctxt,
+                                             const int num_planes) {
+  const BLOCK_SIZE l_bsize = AOMMAX(BLOCK_8X8, left_mbmi->sb_type);
+  const int left_mi_row = ctxt->mi_row + rel_mi_row;
+
+  av1_modify_neighbor_predictor_for_obmc(left_mbmi);
+
+  for (int j = 0; j < num_planes; ++j) {
+    struct macroblockd_plane *const pd = &xd->plane[j];
+    setup_pred_plane(&pd->dst, l_bsize, ctxt->tmp_buf[j], ctxt->tmp_width[j],
+                     ctxt->tmp_height[j], ctxt->tmp_stride[j], rel_mi_row, 0,
+                     NULL, pd->subsampling_x, pd->subsampling_y);
+  }
+
+  const int num_refs = 1 + has_second_ref(left_mbmi);
+
+  for (int ref = 0; ref < num_refs; ++ref) {
+    const MV_REFERENCE_FRAME frame = left_mbmi->ref_frame[ref];
+
+    const RefCntBuffer *const ref_buf = get_ref_frame_buf(ctxt->cm, frame);
+    const struct scale_factors *const ref_scale_factors =
+        get_ref_scale_factors_const(ctxt->cm, frame);
+
+    xd->block_ref_scale_factors[ref] = ref_scale_factors;
+    if ((!av1_is_valid_scale(ref_scale_factors)))
+      aom_internal_error(xd->error_info, AOM_CODEC_UNSUP_BITSTREAM,
+                         "Reference frame has invalid dimensions");
+    av1_setup_pre_planes(xd, ref, &ref_buf->buf, left_mi_row, ctxt->mi_col,
+                         ref_scale_factors, num_planes);
+  }
+
+  xd->mb_to_top_edge = 8 * MI_SIZE * (-left_mi_row);
+  xd->mb_to_bottom_edge =
+      ctxt->mb_to_far_edge +
+      (xd->n4_h - rel_mi_row - left_mi_height) * MI_SIZE * 8;
+}
+
+/* clang-format off */
+const uint8_t ii_weights1d[MAX_SB_SIZE] = {
+  60, 58, 56, 54, 52, 50, 48, 47, 45, 44, 42, 41, 39, 38, 37, 35, 34, 33, 32,
+  31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 22, 21, 20, 19, 19, 18, 18, 17, 16,
+  16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10,  9,  9,  9,  8,
+  8,  8,  8,  7,  7,  7,  7,  6,  6,  6,  6,  6,  5,  5,  5,  5,  5,  4,  4,
+  4,  4,  4,  4,  4,  4,  3,  3,  3,  3,  3,  3,  3,  3,  3,  2,  2,  2,  2,
+  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  1,  1,  1,  1,  1,  1,  1,  1,
+  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1
+};
+uint8_t ii_size_scales[BLOCK_SIZES_ALL] = {
+    32, 16, 16, 16, 8, 8, 8, 4,
+    4,  4,  2,  2,  2, 1, 1, 1,
+    8,  8,  4,  4,  2, 2
+};
+/* clang-format on */
+
+static void build_smooth_interintra_mask(uint8_t *mask, int stride,
+                                         BLOCK_SIZE plane_bsize,
+                                         INTERINTRA_MODE mode) {
+  int i, j;
+  const int bw = block_size_wide[plane_bsize];
+  const int bh = block_size_high[plane_bsize];
+  const int size_scale = ii_size_scales[plane_bsize];
+
+  switch (mode) {
+    case II_V_PRED:
+      for (i = 0; i < bh; ++i) {
+        memset(mask, ii_weights1d[i * size_scale], bw * sizeof(mask[0]));
+        mask += stride;
+      }
+      break;
+
+    case II_H_PRED:
+      for (i = 0; i < bh; ++i) {
+        for (j = 0; j < bw; ++j) mask[j] = ii_weights1d[j * size_scale];
+        mask += stride;
+      }
+      break;
+
+    case II_SMOOTH_PRED:
+      for (i = 0; i < bh; ++i) {
+        for (j = 0; j < bw; ++j)
+          mask[j] = ii_weights1d[(i < j ? i : j) * size_scale];
+        mask += stride;
+      }
+      break;
+
+    case II_DC_PRED:
+    default:
+      for (i = 0; i < bh; ++i) {
+        memset(mask, 32, bw * sizeof(mask[0]));
+        mask += stride;
+      }
+      break;
+  }
+}
+
+static void combine_interintra(INTERINTRA_MODE mode,
+                               int8_t use_wedge_interintra, int wedge_index,
+                               int wedge_sign, BLOCK_SIZE bsize,
+                               BLOCK_SIZE plane_bsize, uint8_t *comppred,
+                               int compstride, const uint8_t *interpred,
+                               int interstride, const uint8_t *intrapred,
+                               int intrastride) {
+  const int bw = block_size_wide[plane_bsize];
+  const int bh = block_size_high[plane_bsize];
+
+  if (use_wedge_interintra) {
+    if (is_interintra_wedge_used(bsize)) {
+      const uint8_t *mask =
+          av1_get_contiguous_soft_mask(wedge_index, wedge_sign, bsize);
+      const int subw = 2 * mi_size_wide[bsize] == bw;
+      const int subh = 2 * mi_size_high[bsize] == bh;
+      aom_blend_a64_mask(comppred, compstride, intrapred, intrastride,
+                         interpred, interstride, mask, block_size_wide[bsize],
+                         bw, bh, subw, subh);
+    }
+    return;
+  }
+
+  uint8_t mask[MAX_SB_SQUARE];
+  build_smooth_interintra_mask(mask, bw, plane_bsize, mode);
+  aom_blend_a64_mask(comppred, compstride, intrapred, intrastride, interpred,
+                     interstride, mask, bw, bw, bh, 0, 0);
+}
+
+static void combine_interintra_highbd(
+    INTERINTRA_MODE mode, int8_t use_wedge_interintra, int wedge_index,
+    int wedge_sign, BLOCK_SIZE bsize, BLOCK_SIZE plane_bsize,
+    uint8_t *comppred8, int compstride, const uint8_t *interpred8,
+    int interstride, const uint8_t *intrapred8, int intrastride, int bd) {
+  const int bw = block_size_wide[plane_bsize];
+  const int bh = block_size_high[plane_bsize];
+
+  if (use_wedge_interintra) {
+    if (is_interintra_wedge_used(bsize)) {
+      const uint8_t *mask =
+          av1_get_contiguous_soft_mask(wedge_index, wedge_sign, bsize);
+      const int subh = 2 * mi_size_high[bsize] == bh;
+      const int subw = 2 * mi_size_wide[bsize] == bw;
+      aom_highbd_blend_a64_mask(comppred8, compstride, intrapred8, intrastride,
+                                interpred8, interstride, mask,
+                                block_size_wide[bsize], bw, bh, subw, subh, bd);
+    }
+    return;
+  }
+
+  uint8_t mask[MAX_SB_SQUARE];
+  build_smooth_interintra_mask(mask, bw, plane_bsize, mode);
+  aom_highbd_blend_a64_mask(comppred8, compstride, intrapred8, intrastride,
+                            interpred8, interstride, mask, bw, bw, bh, 0, 0,
+                            bd);
+}
+
+void av1_build_intra_predictors_for_interintra(const AV1_COMMON *cm,
+                                               MACROBLOCKD *xd,
+                                               BLOCK_SIZE bsize, int plane,
+                                               const BUFFER_SET *ctx,
+                                               uint8_t *dst, int dst_stride) {
+  struct macroblockd_plane *const pd = &xd->plane[plane];
+  const int ssx = xd->plane[plane].subsampling_x;
+  const int ssy = xd->plane[plane].subsampling_y;
+  BLOCK_SIZE plane_bsize = get_plane_block_size(bsize, ssx, ssy);
+  PREDICTION_MODE mode = interintra_to_intra_mode[xd->mi[0]->interintra_mode];
+  assert(xd->mi[0]->angle_delta[PLANE_TYPE_Y] == 0);
+  assert(xd->mi[0]->angle_delta[PLANE_TYPE_UV] == 0);
+  assert(xd->mi[0]->filter_intra_mode_info.use_filter_intra == 0);
+  assert(xd->mi[0]->use_intrabc == 0);
+
+  av1_predict_intra_block(cm, xd, pd->width, pd->height,
+                          max_txsize_rect_lookup[plane_bsize], mode, 0, 0,
+                          FILTER_INTRA_MODES, ctx->plane[plane],
+                          ctx->stride[plane], dst, dst_stride, 0, 0, plane);
+}
+
+void av1_combine_interintra(MACROBLOCKD *xd, BLOCK_SIZE bsize, int plane,
+                            const uint8_t *inter_pred, int inter_stride,
+                            const uint8_t *intra_pred, int intra_stride) {
+  const int ssx = xd->plane[plane].subsampling_x;
+  const int ssy = xd->plane[plane].subsampling_y;
+  const BLOCK_SIZE plane_bsize = get_plane_block_size(bsize, ssx, ssy);
+  if (is_cur_buf_hbd(xd)) {
+    combine_interintra_highbd(
+        xd->mi[0]->interintra_mode, xd->mi[0]->use_wedge_interintra,
+        xd->mi[0]->interintra_wedge_index, xd->mi[0]->interintra_wedge_sign,
+        bsize, plane_bsize, xd->plane[plane].dst.buf,
+        xd->plane[plane].dst.stride, inter_pred, inter_stride, intra_pred,
+        intra_stride, xd->bd);
+    return;
+  }
+  combine_interintra(
+      xd->mi[0]->interintra_mode, xd->mi[0]->use_wedge_interintra,
+      xd->mi[0]->interintra_wedge_index, xd->mi[0]->interintra_wedge_sign,
+      bsize, plane_bsize, xd->plane[plane].dst.buf, xd->plane[plane].dst.stride,
+      inter_pred, inter_stride, intra_pred, intra_stride);
+}
+
+// build interintra_predictors for one plane
+void av1_build_interintra_predictors_sbp(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                         uint8_t *pred, int stride,
+                                         const BUFFER_SET *ctx, int plane,
+                                         BLOCK_SIZE bsize) {
+  if (is_cur_buf_hbd(xd)) {
+    DECLARE_ALIGNED(16, uint16_t, intrapredictor[MAX_SB_SQUARE]);
+    av1_build_intra_predictors_for_interintra(
+        cm, xd, bsize, plane, ctx, CONVERT_TO_BYTEPTR(intrapredictor),
+        MAX_SB_SIZE);
+    av1_combine_interintra(xd, bsize, plane, pred, stride,
+                           CONVERT_TO_BYTEPTR(intrapredictor), MAX_SB_SIZE);
+  } else {
+    DECLARE_ALIGNED(16, uint8_t, intrapredictor[MAX_SB_SQUARE]);
+    av1_build_intra_predictors_for_interintra(cm, xd, bsize, plane, ctx,
+                                              intrapredictor, MAX_SB_SIZE);
+    av1_combine_interintra(xd, bsize, plane, pred, stride, intrapredictor,
+                           MAX_SB_SIZE);
+  }
+}
+
+void av1_build_interintra_predictors_sbuv(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                          uint8_t *upred, uint8_t *vpred,
+                                          int ustride, int vstride,
+                                          const BUFFER_SET *ctx,
+                                          BLOCK_SIZE bsize) {
+  av1_build_interintra_predictors_sbp(cm, xd, upred, ustride, ctx, 1, bsize);
+  av1_build_interintra_predictors_sbp(cm, xd, vpred, vstride, ctx, 2, bsize);
+}

diff --git a/libav1/av1/common/reconinter.h b/libav1/av1/common/reconinter.h
new file mode 100644
index 0000000..944a700
--- /dev/null
+++ b/libav1/av1/common/reconinter.h

@@ -0,0 +1,365 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_RECONINTER_H_
+#define AOM_AV1_COMMON_RECONINTER_H_
+
+#include "av1/common/filter.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/convolve.h"
+#include "av1/common/warped_motion.h"
+#include "aom/aom_integer.h"
+
+// Work out how many pixels off the edge of a reference frame we're allowed
+// to go when forming an inter prediction.
+// The outermost row/col of each referernce frame is extended by
+// (AOM_BORDER_IN_PIXELS >> subsampling) pixels, but we need to keep
+// at least AOM_INTERP_EXTEND pixels within that to account for filtering.
+//
+// We have to break this up into two macros to keep both clang-format and
+// tools/lint-hunks.py happy.
+#define AOM_LEFT_TOP_MARGIN_PX(subsampling) \
+  ((AOM_BORDER_IN_PIXELS >> subsampling) - AOM_INTERP_EXTEND)
+#define AOM_LEFT_TOP_MARGIN_SCALED(subsampling) \
+  (AOM_LEFT_TOP_MARGIN_PX(subsampling) << SCALE_SUBPEL_BITS)
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Set to (1 << 5) if the 32-ary codebooks are used for any bock size
+#define MAX_WEDGE_TYPES (1 << 4)
+
+#define MAX_WEDGE_SIZE_LOG2 5  // 32x32
+#define MAX_WEDGE_SIZE (1 << MAX_WEDGE_SIZE_LOG2)
+#define MAX_WEDGE_SQUARE (MAX_WEDGE_SIZE * MAX_WEDGE_SIZE)
+
+#define WEDGE_WEIGHT_BITS 6
+
+#define WEDGE_NONE -1
+
+// Angles are with respect to horizontal anti-clockwise
+enum {
+  WEDGE_HORIZONTAL = 0,
+  WEDGE_VERTICAL = 1,
+  WEDGE_OBLIQUE27 = 2,
+  WEDGE_OBLIQUE63 = 3,
+  WEDGE_OBLIQUE117 = 4,
+  WEDGE_OBLIQUE153 = 5,
+  WEDGE_DIRECTIONS
+} UENUM1BYTE(WedgeDirectionType);
+
+// 3-tuple: {direction, x_offset, y_offset}
+typedef struct {
+  WedgeDirectionType direction;
+  int x_offset;
+  int y_offset;
+} wedge_code_type;
+
+typedef uint8_t *wedge_masks_type[MAX_WEDGE_TYPES];
+
+typedef struct {
+  int bits;
+  const wedge_code_type *codebook;
+  uint8_t *signflip;
+  wedge_masks_type *masks;
+} wedge_params_type;
+
+extern const wedge_params_type wedge_params_lookup[BLOCK_SIZES_ALL];
+extern uint8_t wedge_mask_buf[2 * MAX_WEDGE_TYPES * 4 * MAX_WEDGE_SQUARE];
+extern const uint8_t ii_weights1d[MAX_SB_SIZE];
+extern uint8_t ii_size_scales[BLOCK_SIZES_ALL];
+
+typedef struct SubpelParams {
+  int xs;
+  int ys;
+  int subpel_x;
+  int subpel_y;
+} SubpelParams;
+
+struct build_prediction_ctxt {
+  const AV1_COMMON *cm;
+  int mi_row;
+  int mi_col;
+  uint8_t **tmp_buf;
+  int *tmp_width;
+  int *tmp_height;
+  int *tmp_stride;
+  int mb_to_far_edge;
+};
+
+static INLINE int has_scale(int xs, int ys) {
+  return xs != SCALE_SUBPEL_SHIFTS || ys != SCALE_SUBPEL_SHIFTS;
+}
+
+static INLINE void revert_scale_extra_bits(SubpelParams *sp) {
+  sp->subpel_x >>= SCALE_EXTRA_BITS;
+  sp->subpel_y >>= SCALE_EXTRA_BITS;
+  sp->xs >>= SCALE_EXTRA_BITS;
+  sp->ys >>= SCALE_EXTRA_BITS;
+  assert(sp->subpel_x < SUBPEL_SHIFTS);
+  assert(sp->subpel_y < SUBPEL_SHIFTS);
+  assert(sp->xs <= SUBPEL_SHIFTS);
+  assert(sp->ys <= SUBPEL_SHIFTS);
+}
+
+static INLINE void inter_predictor(const uint8_t *src, int src_stride,
+                                   uint8_t *dst, int dst_stride,
+                                   const SubpelParams *subpel_params,
+                                   const struct scale_factors *sf, int w, int h,
+                                   ConvolveParams *conv_params,
+                                   InterpFilters interp_filters,
+                                   int is_intrabc) {
+  assert(conv_params->do_average == 0 || conv_params->do_average == 1);
+  assert(sf);
+  const int is_scaled = has_scale(subpel_params->xs, subpel_params->ys);
+  assert(IMPLIES(is_intrabc, !is_scaled));
+  if (is_scaled) {
+    av1_convolve_2d_facade(src, src_stride, dst, dst_stride, w, h,
+                           interp_filters, subpel_params->subpel_x,
+                           subpel_params->xs, subpel_params->subpel_y,
+                           subpel_params->ys, 1, conv_params, sf, is_intrabc);
+  } else {
+    SubpelParams sp = *subpel_params;
+    revert_scale_extra_bits(&sp);
+    av1_convolve_2d_facade(src, src_stride, dst, dst_stride, w, h,
+                           interp_filters, sp.subpel_x, sp.xs, sp.subpel_y,
+                           sp.ys, 0, conv_params, sf, is_intrabc);
+  }
+}
+
+static INLINE void highbd_inter_predictor(const uint8_t *src, int src_stride,
+                                          uint8_t *dst, int dst_stride,
+                                          const SubpelParams *subpel_params,
+                                          const struct scale_factors *sf, int w,
+                                          int h, ConvolveParams *conv_params,
+                                          InterpFilters interp_filters,
+                                          int is_intrabc, int bd) {
+  assert(conv_params->do_average == 0 || conv_params->do_average == 1);
+  assert(sf);
+  const int is_scaled = has_scale(subpel_params->xs, subpel_params->ys);
+  assert(IMPLIES(is_intrabc, !is_scaled));
+  if (is_scaled) {
+    av1_highbd_convolve_2d_facade(
+        src, src_stride, dst, dst_stride, w, h, interp_filters,
+        subpel_params->subpel_x, subpel_params->xs, subpel_params->subpel_y,
+        subpel_params->ys, 1, conv_params, sf, is_intrabc, bd);
+  } else {
+    SubpelParams sp = *subpel_params;
+    revert_scale_extra_bits(&sp);
+    av1_highbd_convolve_2d_facade(
+        src, src_stride, dst, dst_stride, w, h, interp_filters, sp.subpel_x,
+        sp.xs, sp.subpel_y, sp.ys, 0, conv_params, sf, is_intrabc, bd);
+  }
+}
+
+void av1_modify_neighbor_predictor_for_obmc(MB_MODE_INFO *mbmi);
+int av1_skip_u4x4_pred_in_obmc(BLOCK_SIZE bsize,
+                               const struct macroblockd_plane *pd, int dir);
+
+static INLINE int is_interinter_compound_used(COMPOUND_TYPE type,
+                                              BLOCK_SIZE sb_type) {
+  const int comp_allowed = is_comp_ref_allowed(sb_type);
+  switch (type) {
+    case COMPOUND_AVERAGE:
+    case COMPOUND_DISTWTD:
+    case COMPOUND_DIFFWTD: return comp_allowed;
+    case COMPOUND_WEDGE:
+      return comp_allowed && wedge_params_lookup[sb_type].bits > 0;
+    default: assert(0); return 0;
+  }
+}
+
+static INLINE int is_any_masked_compound_used(BLOCK_SIZE sb_type) {
+  COMPOUND_TYPE comp_type;
+  int i;
+  if (!is_comp_ref_allowed(sb_type)) return 0;
+  for (i = 0; i < COMPOUND_TYPES; i++) {
+    comp_type = (COMPOUND_TYPE)i;
+    if (is_masked_compound_type(comp_type) &&
+        is_interinter_compound_used(comp_type, sb_type))
+      return 1;
+  }
+  return 0;
+}
+
+static INLINE int get_wedge_bits_lookup(BLOCK_SIZE sb_type) {
+  return wedge_params_lookup[sb_type].bits;
+}
+
+static INLINE int get_interinter_wedge_bits(BLOCK_SIZE sb_type) {
+  const int wbits = wedge_params_lookup[sb_type].bits;
+  return (wbits > 0) ? wbits + 1 : 0;
+}
+
+static INLINE int is_interintra_wedge_used(BLOCK_SIZE sb_type) {
+  return wedge_params_lookup[sb_type].bits > 0;
+}
+
+static INLINE int get_interintra_wedge_bits(BLOCK_SIZE sb_type) {
+  return wedge_params_lookup[sb_type].bits;
+}
+
+void av1_make_inter_predictor(const uint8_t *src, int src_stride, uint8_t *dst,
+                              int dst_stride, const SubpelParams *subpel_params,
+                              const struct scale_factors *sf, int w, int h,
+                              ConvolveParams *conv_params,
+                              InterpFilters interp_filters,
+                              const WarpTypesAllowed *warp_types, int p_col,
+                              int p_row, int plane, int ref,
+                              const MB_MODE_INFO *mi, int build_for_obmc,
+                              const MACROBLOCKD *xd, int can_use_previous);
+
+// TODO(jkoleszar): yet another mv clamping function :-(
+static INLINE MV clamp_mv_to_umv_border_sb(const MACROBLOCKD *xd,
+                                           const MV *src_mv, int bw, int bh,
+                                           int ss_x, int ss_y) {
+  // If the MV points so far into the UMV border that no visible pixels
+  // are used for reconstruction, the subpel part of the MV can be
+  // discarded and the MV limited to 16 pixels with equivalent results.
+  const int spel_left = (AOM_INTERP_EXTEND + bw) << SUBPEL_BITS;
+  const int spel_right = spel_left - SUBPEL_SHIFTS;
+  const int spel_top = (AOM_INTERP_EXTEND + bh) << SUBPEL_BITS;
+  const int spel_bottom = spel_top - SUBPEL_SHIFTS;
+  MV clamped_mv = { (int16_t)(src_mv->row * (1 << (1 - ss_y))),
+                    (int16_t)(src_mv->col * (1 << (1 - ss_x))) };
+  assert(ss_x <= 1);
+  assert(ss_y <= 1);
+
+  clamp_mv(&clamped_mv, xd->mb_to_left_edge * (1 << (1 - ss_x)) - spel_left,
+           xd->mb_to_right_edge * (1 << (1 - ss_x)) + spel_right,
+           xd->mb_to_top_edge * (1 << (1 - ss_y)) - spel_top,
+           xd->mb_to_bottom_edge * (1 << (1 - ss_y)) + spel_bottom);
+
+  return clamped_mv;
+}
+
+static INLINE int scaled_buffer_offset(int x_offset, int y_offset, int stride,
+                                       const struct scale_factors *sf) {
+  const int x =
+      sf ? sf->scale_value_x(x_offset, sf) >> SCALE_EXTRA_BITS : x_offset;
+  const int y =
+      sf ? sf->scale_value_y(y_offset, sf) >> SCALE_EXTRA_BITS : y_offset;
+  return y * stride + x;
+}
+
+static INLINE void setup_pred_plane(struct buf_2d *dst, BLOCK_SIZE bsize,
+                                    uint8_t *src, int width, int height,
+                                    int stride, int mi_row, int mi_col,
+                                    const struct scale_factors *scale,
+                                    int subsampling_x, int subsampling_y) {
+  // Offset the buffer pointer
+  if (subsampling_y && (mi_row & 0x01) && (mi_size_high[bsize] == 1))
+    mi_row -= 1;
+  if (subsampling_x && (mi_col & 0x01) && (mi_size_wide[bsize] == 1))
+    mi_col -= 1;
+
+  const int x = (MI_SIZE * mi_col) >> subsampling_x;
+  const int y = (MI_SIZE * mi_row) >> subsampling_y;
+  dst->buf = src + scaled_buffer_offset(x, y, stride, scale);
+  dst->buf0 = src;
+  dst->width = width;
+  dst->height = height;
+  dst->stride = stride;
+}
+
+void av1_setup_dst_planes(struct macroblockd_plane *planes, BLOCK_SIZE bsize,
+                          const YV12_BUFFER_CONFIG *src, int mi_row, int mi_col,
+                          const int plane_start, const int plane_end);
+
+void av1_setup_pre_planes(MACROBLOCKD *xd, int idx,
+                          const YV12_BUFFER_CONFIG *src, int mi_row, int mi_col,
+                          const struct scale_factors *sf, const int num_planes);
+
+static INLINE void set_default_interp_filters(
+    MB_MODE_INFO *const mbmi, InterpFilter frame_interp_filter) {
+  mbmi->interp_filters =
+      av1_broadcast_interp_filter(av1_unswitchable_filter(frame_interp_filter));
+}
+
+static INLINE int av1_is_interp_needed(const MACROBLOCKD *const xd) {
+  const MB_MODE_INFO *const mbmi = xd->mi[0];
+  if (mbmi->skip_mode) return 0;
+  if (mbmi->motion_mode == WARPED_CAUSAL) return 0;
+  if (is_nontrans_global_motion(xd, xd->mi[0])) return 0;
+  return 1;
+}
+
+void av1_setup_build_prediction_by_above_pred(
+    MACROBLOCKD *xd, int rel_mi_col, uint8_t above_mi_width,
+    MB_MODE_INFO *above_mbmi, struct build_prediction_ctxt *ctxt,
+    const int num_planes);
+void av1_setup_build_prediction_by_left_pred(MACROBLOCKD *xd, int rel_mi_row,
+                                             uint8_t left_mi_height,
+                                             MB_MODE_INFO *left_mbmi,
+                                             struct build_prediction_ctxt *ctxt,
+                                             const int num_planes);
+void av1_build_obmc_inter_prediction(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                     int mi_row, int mi_col,
+                                     uint8_t *above[MAX_MB_PLANE],
+                                     int above_stride[MAX_MB_PLANE],
+                                     uint8_t *left[MAX_MB_PLANE],
+                                     int left_stride[MAX_MB_PLANE]);
+
+const uint8_t *av1_get_obmc_mask(int length);
+void av1_count_overlappable_neighbors(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                      int mi_row, int mi_col);
+
+#define MASK_MASTER_SIZE ((MAX_WEDGE_SIZE) << 1)
+#define MASK_MASTER_STRIDE (MASK_MASTER_SIZE)
+
+void av1_init_wedge_masks();
+
+static INLINE const uint8_t *av1_get_contiguous_soft_mask(int wedge_index,
+                                                          int wedge_sign,
+                                                          BLOCK_SIZE sb_type) {
+  return wedge_params_lookup[sb_type].masks[wedge_sign][wedge_index];
+}
+
+const uint8_t *av1_get_compound_type_mask(
+    const INTERINTER_COMPOUND_DATA *const comp_data, BLOCK_SIZE sb_type);
+
+// build interintra_predictors for one plane
+void av1_build_interintra_predictors_sbp(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                         uint8_t *pred, int stride,
+                                         const BUFFER_SET *ctx, int plane,
+                                         BLOCK_SIZE bsize);
+
+void av1_build_interintra_predictors_sbuv(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                          uint8_t *upred, uint8_t *vpred,
+                                          int ustride, int vstride,
+                                          const BUFFER_SET *ctx,
+                                          BLOCK_SIZE bsize);
+
+void av1_build_intra_predictors_for_interintra(
+    const AV1_COMMON *cm, MACROBLOCKD *xd, BLOCK_SIZE bsize, int plane,
+    const BUFFER_SET *ctx, uint8_t *intra_pred, int intra_stride);
+
+void av1_combine_interintra(MACROBLOCKD *xd, BLOCK_SIZE bsize, int plane,
+                            const uint8_t *inter_pred, int inter_stride,
+                            const uint8_t *intra_pred, int intra_stride);
+
+void av1_dist_wtd_comp_weight_assign(const AV1_COMMON *cm,
+                                     const MB_MODE_INFO *mbmi, int order_idx,
+                                     int *fwd_offset, int *bck_offset,
+                                     int *use_dist_wtd_comp_avg,
+                                     int is_compound);
+int av1_allow_warp(const MB_MODE_INFO *const mbmi,
+                   const WarpTypesAllowed *const warp_types,
+                   const WarpedMotionParams *const gm_params,
+                   int build_for_obmc, const struct scale_factors *const sf,
+                   WarpedMotionParams *final_warp_params);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_RECONINTER_H_

diff --git a/libav1/av1/common/reconintra.c b/libav1/av1/common/reconintra.c
new file mode 100644
index 0000000..39a724d
--- /dev/null
+++ b/libav1/av1/common/reconintra.c

@@ -0,0 +1,1647 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <math.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+#include "config/av1_rtcd.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/aom_once.h"
+#include "aom_ports/mem.h"
+#include "aom_ports/system_state.h"
+#include "av1/common/reconintra.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/cfl.h"
+
+enum {
+  NEED_LEFT = 1 << 1,
+  NEED_ABOVE = 1 << 2,
+  NEED_ABOVERIGHT = 1 << 3,
+  NEED_ABOVELEFT = 1 << 4,
+  NEED_BOTTOMLEFT = 1 << 5,
+};
+
+#define INTRA_EDGE_FILT 3
+#define INTRA_EDGE_TAPS 5
+#define MAX_UPSAMPLE_SZ 16
+
+static const uint8_t extend_modes[INTRA_MODES] = {
+  NEED_ABOVE | NEED_LEFT,                   // DC
+  NEED_ABOVE,                               // V
+  NEED_LEFT,                                // H
+  NEED_ABOVE | NEED_ABOVERIGHT,             // D45
+  NEED_LEFT | NEED_ABOVE | NEED_ABOVELEFT,  // D135
+  NEED_LEFT | NEED_ABOVE | NEED_ABOVELEFT,  // D113
+  NEED_LEFT | NEED_ABOVE | NEED_ABOVELEFT,  // D157
+  NEED_LEFT | NEED_BOTTOMLEFT,              // D203
+  NEED_ABOVE | NEED_ABOVERIGHT,             // D67
+  NEED_LEFT | NEED_ABOVE,                   // SMOOTH
+  NEED_LEFT | NEED_ABOVE,                   // SMOOTH_V
+  NEED_LEFT | NEED_ABOVE,                   // SMOOTH_H
+  NEED_LEFT | NEED_ABOVE | NEED_ABOVELEFT,  // PAETH
+};
+
+// Tables to store if the top-right reference pixels are available. The flags
+// are represented with bits, packed into 8-bit integers. E.g., for the 32x32
+// blocks in a 128x128 superblock, the index of the "o" block is 10 (in raster
+// order), so its flag is stored at the 3rd bit of the 2nd entry in the table,
+// i.e. (table[10 / 8] >> (10 % 8)) & 1.
+//       . . . .
+//       . . . .
+//       . . o .
+//       . . . .
+static uint8_t has_tr_4x4[128] = {
+  255, 255, 255, 255, 85, 85, 85, 85, 119, 119, 119, 119, 85, 85, 85, 85,
+  127, 127, 127, 127, 85, 85, 85, 85, 119, 119, 119, 119, 85, 85, 85, 85,
+  255, 127, 255, 127, 85, 85, 85, 85, 119, 119, 119, 119, 85, 85, 85, 85,
+  127, 127, 127, 127, 85, 85, 85, 85, 119, 119, 119, 119, 85, 85, 85, 85,
+  255, 255, 255, 127, 85, 85, 85, 85, 119, 119, 119, 119, 85, 85, 85, 85,
+  127, 127, 127, 127, 85, 85, 85, 85, 119, 119, 119, 119, 85, 85, 85, 85,
+  255, 127, 255, 127, 85, 85, 85, 85, 119, 119, 119, 119, 85, 85, 85, 85,
+  127, 127, 127, 127, 85, 85, 85, 85, 119, 119, 119, 119, 85, 85, 85, 85,
+};
+static uint8_t has_tr_4x8[64] = {
+  255, 255, 255, 255, 119, 119, 119, 119, 127, 127, 127, 127, 119,
+  119, 119, 119, 255, 127, 255, 127, 119, 119, 119, 119, 127, 127,
+  127, 127, 119, 119, 119, 119, 255, 255, 255, 127, 119, 119, 119,
+  119, 127, 127, 127, 127, 119, 119, 119, 119, 255, 127, 255, 127,
+  119, 119, 119, 119, 127, 127, 127, 127, 119, 119, 119, 119,
+};
+static uint8_t has_tr_8x4[64] = {
+  255, 255, 0, 0, 85, 85, 0, 0, 119, 119, 0, 0, 85, 85, 0, 0,
+  127, 127, 0, 0, 85, 85, 0, 0, 119, 119, 0, 0, 85, 85, 0, 0,
+  255, 127, 0, 0, 85, 85, 0, 0, 119, 119, 0, 0, 85, 85, 0, 0,
+  127, 127, 0, 0, 85, 85, 0, 0, 119, 119, 0, 0, 85, 85, 0, 0,
+};
+static uint8_t has_tr_8x8[32] = {
+  255, 255, 85, 85, 119, 119, 85, 85, 127, 127, 85, 85, 119, 119, 85, 85,
+  255, 127, 85, 85, 119, 119, 85, 85, 127, 127, 85, 85, 119, 119, 85, 85,
+};
+static uint8_t has_tr_8x16[16] = {
+  255, 255, 119, 119, 127, 127, 119, 119,
+  255, 127, 119, 119, 127, 127, 119, 119,
+};
+static uint8_t has_tr_16x8[16] = {
+  255, 0, 85, 0, 119, 0, 85, 0, 127, 0, 85, 0, 119, 0, 85, 0,
+};
+static uint8_t has_tr_16x16[8] = {
+  255, 85, 119, 85, 127, 85, 119, 85,
+};
+static uint8_t has_tr_16x32[4] = { 255, 119, 127, 119 };
+static uint8_t has_tr_32x16[4] = { 15, 5, 7, 5 };
+static uint8_t has_tr_32x32[2] = { 95, 87 };
+static uint8_t has_tr_32x64[1] = { 127 };
+static uint8_t has_tr_64x32[1] = { 19 };
+static uint8_t has_tr_64x64[1] = { 7 };
+static uint8_t has_tr_64x128[1] = { 3 };
+static uint8_t has_tr_128x64[1] = { 1 };
+static uint8_t has_tr_128x128[1] = { 1 };
+static uint8_t has_tr_4x16[32] = {
+  255, 255, 255, 255, 127, 127, 127, 127, 255, 127, 255,
+  127, 127, 127, 127, 127, 255, 255, 255, 127, 127, 127,
+  127, 127, 255, 127, 255, 127, 127, 127, 127, 127,
+};
+static uint8_t has_tr_16x4[32] = {
+  255, 0, 0, 0, 85, 0, 0, 0, 119, 0, 0, 0, 85, 0, 0, 0,
+  127, 0, 0, 0, 85, 0, 0, 0, 119, 0, 0, 0, 85, 0, 0, 0,
+};
+static uint8_t has_tr_8x32[8] = {
+  255, 255, 127, 127, 255, 127, 127, 127,
+};
+static uint8_t has_tr_32x8[8] = {
+  15, 0, 5, 0, 7, 0, 5, 0,
+};
+static uint8_t has_tr_16x64[2] = { 255, 127 };
+static uint8_t has_tr_64x16[2] = { 3, 1 };
+
+static const uint8_t *const has_tr_tables[BLOCK_SIZES_ALL] = {
+  // 4X4
+  has_tr_4x4,
+  // 4X8,       8X4,            8X8
+  has_tr_4x8, has_tr_8x4, has_tr_8x8,
+  // 8X16,      16X8,           16X16
+  has_tr_8x16, has_tr_16x8, has_tr_16x16,
+  // 16X32,     32X16,          32X32
+  has_tr_16x32, has_tr_32x16, has_tr_32x32,
+  // 32X64,     64X32,          64X64
+  has_tr_32x64, has_tr_64x32, has_tr_64x64,
+  // 64x128,    128x64,         128x128
+  has_tr_64x128, has_tr_128x64, has_tr_128x128,
+  // 4x16,      16x4,            8x32
+  has_tr_4x16, has_tr_16x4, has_tr_8x32,
+  // 32x8,      16x64,           64x16
+  has_tr_32x8, has_tr_16x64, has_tr_64x16
+};
+
+static uint8_t has_tr_vert_8x8[32] = {
+  255, 255, 0, 0, 119, 119, 0, 0, 127, 127, 0, 0, 119, 119, 0, 0,
+  255, 127, 0, 0, 119, 119, 0, 0, 127, 127, 0, 0, 119, 119, 0, 0,
+};
+static uint8_t has_tr_vert_16x16[8] = {
+  255, 0, 119, 0, 127, 0, 119, 0,
+};
+static uint8_t has_tr_vert_32x32[2] = { 15, 7 };
+static uint8_t has_tr_vert_64x64[1] = { 3 };
+
+// The _vert_* tables are like the ordinary tables above, but describe the
+// order we visit square blocks when doing a PARTITION_VERT_A or
+// PARTITION_VERT_B. This is the same order as normal except for on the last
+// split where we go vertically (TL, BL, TR, BR). We treat the rectangular block
+// as a pair of squares, which means that these tables work correctly for both
+// mixed vertical partition types.
+//
+// There are tables for each of the square sizes. Vertical rectangles (like
+// BLOCK_16X32) use their respective "non-vert" table
+static const uint8_t *const has_tr_vert_tables[BLOCK_SIZES] = {
+  // 4X4
+  NULL,
+  // 4X8,      8X4,         8X8
+  has_tr_4x8, NULL, has_tr_vert_8x8,
+  // 8X16,     16X8,        16X16
+  has_tr_8x16, NULL, has_tr_vert_16x16,
+  // 16X32,    32X16,       32X32
+  has_tr_16x32, NULL, has_tr_vert_32x32,
+  // 32X64,    64X32,       64X64
+  has_tr_32x64, NULL, has_tr_vert_64x64,
+  // 64x128,   128x64,      128x128
+  has_tr_64x128, NULL, has_tr_128x128
+};
+
+static const uint8_t *get_has_tr_table(PARTITION_TYPE partition,
+                                       BLOCK_SIZE bsize) {
+  const uint8_t *ret = NULL;
+  // If this is a mixed vertical partition, look up bsize in orders_vert.
+  if (partition == PARTITION_VERT_A || partition == PARTITION_VERT_B) {
+    assert(bsize < BLOCK_SIZES);
+    ret = has_tr_vert_tables[bsize];
+  } else {
+    ret = has_tr_tables[bsize];
+  }
+  assert(ret);
+  return ret;
+}
+
+int has_top_right(const AV1_COMMON *cm, BLOCK_SIZE bsize, int mi_row,
+                         int mi_col, int top_available, int right_available,
+                         PARTITION_TYPE partition, TX_SIZE txsz, int row_off,
+                         int col_off, int ss_x, int ss_y) {
+  if (!top_available || !right_available) return 0;
+
+  const int bw_unit = block_size_wide[bsize] >> tx_size_wide_log2[0];
+  const int plane_bw_unit = AOMMAX(bw_unit >> ss_x, 1);
+  const int top_right_count_unit = tx_size_wide_unit[txsz];
+
+  if (row_off > 0) {  // Just need to check if enough pixels on the right.
+    if (block_size_wide[bsize] > block_size_wide[BLOCK_64X64]) {
+      // Special case: For 128x128 blocks, the transform unit whose
+      // top-right corner is at the center of the block does in fact have
+      // pixels available at its top-right corner.
+      if (row_off == mi_size_high[BLOCK_64X64] >> ss_y &&
+          col_off + top_right_count_unit == mi_size_wide[BLOCK_64X64] >> ss_x) {
+        return 1;
+      }
+      const int plane_bw_unit_64 = mi_size_wide[BLOCK_64X64] >> ss_x;
+      const int col_off_64 = col_off % plane_bw_unit_64;
+      return col_off_64 + top_right_count_unit < plane_bw_unit_64;
+    }
+    return col_off + top_right_count_unit < plane_bw_unit;
+  } else {
+    // All top-right pixels are in the block above, which is already available.
+    if (col_off + top_right_count_unit < plane_bw_unit) return 1;
+
+    const int bw_in_mi_log2 = mi_size_wide_log2[bsize];
+    const int bh_in_mi_log2 = mi_size_high_log2[bsize];
+    const int sb_mi_size = mi_size_high[cm->seq_params.sb_size];
+    const int blk_row_in_sb = (mi_row & (sb_mi_size - 1)) >> bh_in_mi_log2;
+    const int blk_col_in_sb = (mi_col & (sb_mi_size - 1)) >> bw_in_mi_log2;
+
+    // Top row of superblock: so top-right pixels are in the top and/or
+    // top-right superblocks, both of which are already available.
+    if (blk_row_in_sb == 0) return 1;
+
+    // Rightmost column of superblock (and not the top row): so top-right pixels
+    // fall in the right superblock, which is not available yet.
+    if (((blk_col_in_sb + 1) << bw_in_mi_log2) >= sb_mi_size) {
+      return 0;
+    }
+
+    // General case (neither top row nor rightmost column): check if the
+    // top-right block is coded before the current block.
+    const int this_blk_index =
+        ((blk_row_in_sb + 0) << (MAX_MIB_SIZE_LOG2 - bw_in_mi_log2)) +
+        blk_col_in_sb + 0;
+    const int idx1 = this_blk_index / 8;
+    const int idx2 = this_blk_index % 8;
+    const uint8_t *has_tr_table = get_has_tr_table(partition, bsize);
+    return (has_tr_table[idx1] >> idx2) & 1;
+  }
+}
+
+// Similar to the has_tr_* tables, but store if the bottom-left reference
+// pixels are available.
+static uint8_t has_bl_4x4[128] = {
+  84, 85, 85, 85, 16, 17, 17, 17, 84, 85, 85, 85, 0,  1,  1,  1,  84, 85, 85,
+  85, 16, 17, 17, 17, 84, 85, 85, 85, 0,  0,  1,  0,  84, 85, 85, 85, 16, 17,
+  17, 17, 84, 85, 85, 85, 0,  1,  1,  1,  84, 85, 85, 85, 16, 17, 17, 17, 84,
+  85, 85, 85, 0,  0,  0,  0,  84, 85, 85, 85, 16, 17, 17, 17, 84, 85, 85, 85,
+  0,  1,  1,  1,  84, 85, 85, 85, 16, 17, 17, 17, 84, 85, 85, 85, 0,  0,  1,
+  0,  84, 85, 85, 85, 16, 17, 17, 17, 84, 85, 85, 85, 0,  1,  1,  1,  84, 85,
+  85, 85, 16, 17, 17, 17, 84, 85, 85, 85, 0,  0,  0,  0,
+};
+static uint8_t has_bl_4x8[64] = {
+  16, 17, 17, 17, 0, 1, 1, 1, 16, 17, 17, 17, 0, 0, 1, 0,
+  16, 17, 17, 17, 0, 1, 1, 1, 16, 17, 17, 17, 0, 0, 0, 0,
+  16, 17, 17, 17, 0, 1, 1, 1, 16, 17, 17, 17, 0, 0, 1, 0,
+  16, 17, 17, 17, 0, 1, 1, 1, 16, 17, 17, 17, 0, 0, 0, 0,
+};
+static uint8_t has_bl_8x4[64] = {
+  254, 255, 84, 85, 254, 255, 16, 17, 254, 255, 84, 85, 254, 255, 0, 1,
+  254, 255, 84, 85, 254, 255, 16, 17, 254, 255, 84, 85, 254, 255, 0, 0,
+  254, 255, 84, 85, 254, 255, 16, 17, 254, 255, 84, 85, 254, 255, 0, 1,
+  254, 255, 84, 85, 254, 255, 16, 17, 254, 255, 84, 85, 254, 255, 0, 0,
+};
+static uint8_t has_bl_8x8[32] = {
+  84, 85, 16, 17, 84, 85, 0, 1, 84, 85, 16, 17, 84, 85, 0, 0,
+  84, 85, 16, 17, 84, 85, 0, 1, 84, 85, 16, 17, 84, 85, 0, 0,
+};
+static uint8_t has_bl_8x16[16] = {
+  16, 17, 0, 1, 16, 17, 0, 0, 16, 17, 0, 1, 16, 17, 0, 0,
+};
+static uint8_t has_bl_16x8[16] = {
+  254, 84, 254, 16, 254, 84, 254, 0, 254, 84, 254, 16, 254, 84, 254, 0,
+};
+static uint8_t has_bl_16x16[8] = {
+  84, 16, 84, 0, 84, 16, 84, 0,
+};
+static uint8_t has_bl_16x32[4] = { 16, 0, 16, 0 };
+static uint8_t has_bl_32x16[4] = { 78, 14, 78, 14 };
+static uint8_t has_bl_32x32[2] = { 4, 4 };
+static uint8_t has_bl_32x64[1] = { 0 };
+static uint8_t has_bl_64x32[1] = { 34 };
+static uint8_t has_bl_64x64[1] = { 0 };
+static uint8_t has_bl_64x128[1] = { 0 };
+static uint8_t has_bl_128x64[1] = { 0 };
+static uint8_t has_bl_128x128[1] = { 0 };
+static uint8_t has_bl_4x16[32] = {
+  0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,
+  0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,
+};
+static uint8_t has_bl_16x4[32] = {
+  254, 254, 254, 84, 254, 254, 254, 16, 254, 254, 254, 84, 254, 254, 254, 0,
+  254, 254, 254, 84, 254, 254, 254, 16, 254, 254, 254, 84, 254, 254, 254, 0,
+};
+static uint8_t has_bl_8x32[8] = {
+  0, 1, 0, 0, 0, 1, 0, 0,
+};
+static uint8_t has_bl_32x8[8] = {
+  238, 78, 238, 14, 238, 78, 238, 14,
+};
+static uint8_t has_bl_16x64[2] = { 0, 0 };
+static uint8_t has_bl_64x16[2] = { 42, 42 };
+
+static const uint8_t *const has_bl_tables[BLOCK_SIZES_ALL] = {
+  // 4X4
+  has_bl_4x4,
+  // 4X8,         8X4,         8X8
+  has_bl_4x8, has_bl_8x4, has_bl_8x8,
+  // 8X16,        16X8,        16X16
+  has_bl_8x16, has_bl_16x8, has_bl_16x16,
+  // 16X32,       32X16,       32X32
+  has_bl_16x32, has_bl_32x16, has_bl_32x32,
+  // 32X64,       64X32,       64X64
+  has_bl_32x64, has_bl_64x32, has_bl_64x64,
+  // 64x128,      128x64,      128x128
+  has_bl_64x128, has_bl_128x64, has_bl_128x128,
+  // 4x16,        16x4,        8x32
+  has_bl_4x16, has_bl_16x4, has_bl_8x32,
+  // 32x8,        16x64,       64x16
+  has_bl_32x8, has_bl_16x64, has_bl_64x16
+};
+
+static uint8_t has_bl_vert_8x8[32] = {
+  254, 255, 16, 17, 254, 255, 0, 1, 254, 255, 16, 17, 254, 255, 0, 0,
+  254, 255, 16, 17, 254, 255, 0, 1, 254, 255, 16, 17, 254, 255, 0, 0,
+};
+static uint8_t has_bl_vert_16x16[8] = {
+  254, 16, 254, 0, 254, 16, 254, 0,
+};
+static uint8_t has_bl_vert_32x32[2] = { 14, 14 };
+static uint8_t has_bl_vert_64x64[1] = { 2 };
+
+// The _vert_* tables are like the ordinary tables above, but describe the
+// order we visit square blocks when doing a PARTITION_VERT_A or
+// PARTITION_VERT_B. This is the same order as normal except for on the last
+// split where we go vertically (TL, BL, TR, BR). We treat the rectangular block
+// as a pair of squares, which means that these tables work correctly for both
+// mixed vertical partition types.
+//
+// There are tables for each of the square sizes. Vertical rectangles (like
+// BLOCK_16X32) use their respective "non-vert" table
+static const uint8_t *const has_bl_vert_tables[BLOCK_SIZES] = {
+  // 4X4
+  NULL,
+  // 4X8,     8X4,         8X8
+  has_bl_4x8, NULL, has_bl_vert_8x8,
+  // 8X16,    16X8,        16X16
+  has_bl_8x16, NULL, has_bl_vert_16x16,
+  // 16X32,   32X16,       32X32
+  has_bl_16x32, NULL, has_bl_vert_32x32,
+  // 32X64,   64X32,       64X64
+  has_bl_32x64, NULL, has_bl_vert_64x64,
+  // 64x128,  128x64,      128x128
+  has_bl_64x128, NULL, has_bl_128x128
+};
+
+static const uint8_t *get_has_bl_table(PARTITION_TYPE partition,
+                                       BLOCK_SIZE bsize) {
+  const uint8_t *ret = NULL;
+  // If this is a mixed vertical partition, look up bsize in orders_vert.
+  if (partition == PARTITION_VERT_A || partition == PARTITION_VERT_B) {
+    assert(bsize < BLOCK_SIZES);
+    ret = has_bl_vert_tables[bsize];
+  } else {
+    ret = has_bl_tables[bsize];
+  }
+  assert(ret);
+  return ret;
+}
+
+int has_bottom_left(const AV1_COMMON *cm, BLOCK_SIZE bsize, int mi_row,
+                           int mi_col, int bottom_available, int left_available,
+                           PARTITION_TYPE partition, TX_SIZE txsz, int row_off,
+                           int col_off, int ss_x, int ss_y) {
+  if (!bottom_available || !left_available) return 0;
+
+  // Special case for 128x* blocks, when col_off is half the block width.
+  // This is needed because 128x* superblocks are divided into 64x* blocks in
+  // raster order
+  if (block_size_wide[bsize] > block_size_wide[BLOCK_64X64] && col_off > 0) {
+    const int plane_bw_unit_64 = mi_size_wide[BLOCK_64X64] >> ss_x;
+    const int col_off_64 = col_off % plane_bw_unit_64;
+    if (col_off_64 == 0) {
+      // We are at the left edge of top-right or bottom-right 64x* block.
+      const int plane_bh_unit_64 = mi_size_high[BLOCK_64X64] >> ss_y;
+      const int row_off_64 = row_off % plane_bh_unit_64;
+      const int plane_bh_unit =
+          AOMMIN(mi_size_high[bsize] >> ss_y, plane_bh_unit_64);
+      // Check if all bottom-left pixels are in the left 64x* block (which is
+      // already coded).
+      return row_off_64 + tx_size_high_unit[txsz] < plane_bh_unit;
+    }
+  }
+
+  if (col_off > 0) {
+    // Bottom-left pixels are in the bottom-left block, which is not available.
+    return 0;
+  } else {
+    const int bh_unit = block_size_high[bsize] >> tx_size_high_log2[0];
+    const int plane_bh_unit = AOMMAX(bh_unit >> ss_y, 1);
+    const int bottom_left_count_unit = tx_size_high_unit[txsz];
+
+    // All bottom-left pixels are in the left block, which is already available.
+    if (row_off + bottom_left_count_unit < plane_bh_unit) return 1;
+
+    const int bw_in_mi_log2 = mi_size_wide_log2[bsize];
+    const int bh_in_mi_log2 = mi_size_high_log2[bsize];
+    const int sb_mi_size = mi_size_high[cm->seq_params.sb_size];
+    const int blk_row_in_sb = (mi_row & (sb_mi_size - 1)) >> bh_in_mi_log2;
+    const int blk_col_in_sb = (mi_col & (sb_mi_size - 1)) >> bw_in_mi_log2;
+
+    // Leftmost column of superblock: so bottom-left pixels maybe in the left
+    // and/or bottom-left superblocks. But only the left superblock is
+    // available, so check if all required pixels fall in that superblock.
+    if (blk_col_in_sb == 0) {
+      const int blk_start_row_off = blk_row_in_sb
+                                        << (bh_in_mi_log2 + MI_SIZE_LOG2 -
+                                            tx_size_wide_log2[0]) >>
+                                    ss_y;
+      const int row_off_in_sb = blk_start_row_off + row_off;
+      const int sb_height_unit = sb_mi_size >> ss_y;
+      return row_off_in_sb + bottom_left_count_unit < sb_height_unit;
+    }
+
+    // Bottom row of superblock (and not the leftmost column): so bottom-left
+    // pixels fall in the bottom superblock, which is not available yet.
+    if (((blk_row_in_sb + 1) << bh_in_mi_log2) >= sb_mi_size) return 0;
+
+    // General case (neither leftmost column nor bottom row): check if the
+    // bottom-left block is coded before the current block.
+    const int this_blk_index =
+        ((blk_row_in_sb + 0) << (MAX_MIB_SIZE_LOG2 - bw_in_mi_log2)) +
+        blk_col_in_sb + 0;
+    const int idx1 = this_blk_index / 8;
+    const int idx2 = this_blk_index % 8;
+    const uint8_t *has_bl_table = get_has_bl_table(partition, bsize);
+    return (has_bl_table[idx1] >> idx2) & 1;
+  }
+}
+
+typedef void (*intra_pred_fn)(uint8_t *dst, ptrdiff_t stride,
+                              const uint8_t *above, const uint8_t *left);
+
+static intra_pred_fn pred[INTRA_MODES][TX_SIZES_ALL];
+static intra_pred_fn dc_pred[2][2][TX_SIZES_ALL];
+
+typedef void (*intra_high_pred_fn)(uint16_t *dst, ptrdiff_t stride,
+                                   const uint16_t *above, const uint16_t *left,
+                                   int bd);
+static intra_high_pred_fn pred_high[INTRA_MODES][TX_SIZES_ALL];
+static intra_high_pred_fn dc_pred_high[2][2][TX_SIZES_ALL];
+
+static void init_intra_predictors_internal(void) {
+  assert(NELEMENTS(mode_to_angle_map) == INTRA_MODES);
+
+#define INIT_RECTANGULAR(p, type)             \
+  p[TX_4X8] = aom_##type##_predictor_4x8;     \
+  p[TX_8X4] = aom_##type##_predictor_8x4;     \
+  p[TX_8X16] = aom_##type##_predictor_8x16;   \
+  p[TX_16X8] = aom_##type##_predictor_16x8;   \
+  p[TX_16X32] = aom_##type##_predictor_16x32; \
+  p[TX_32X16] = aom_##type##_predictor_32x16; \
+  p[TX_32X64] = aom_##type##_predictor_32x64; \
+  p[TX_64X32] = aom_##type##_predictor_64x32; \
+  p[TX_4X16] = aom_##type##_predictor_4x16;   \
+  p[TX_16X4] = aom_##type##_predictor_16x4;   \
+  p[TX_8X32] = aom_##type##_predictor_8x32;   \
+  p[TX_32X8] = aom_##type##_predictor_32x8;   \
+  p[TX_16X64] = aom_##type##_predictor_16x64; \
+  p[TX_64X16] = aom_##type##_predictor_64x16;
+
+#define INIT_NO_4X4(p, type)                  \
+  p[TX_8X8] = aom_##type##_predictor_8x8;     \
+  p[TX_16X16] = aom_##type##_predictor_16x16; \
+  p[TX_32X32] = aom_##type##_predictor_32x32; \
+  p[TX_64X64] = aom_##type##_predictor_64x64; \
+  INIT_RECTANGULAR(p, type)
+
+#define INIT_ALL_SIZES(p, type)           \
+  p[TX_4X4] = aom_##type##_predictor_4x4; \
+  INIT_NO_4X4(p, type)
+
+  INIT_ALL_SIZES(pred[V_PRED], v);
+  INIT_ALL_SIZES(pred[H_PRED], h);
+  INIT_ALL_SIZES(pred[PAETH_PRED], paeth);
+  INIT_ALL_SIZES(pred[SMOOTH_PRED], smooth);
+  INIT_ALL_SIZES(pred[SMOOTH_V_PRED], smooth_v);
+  INIT_ALL_SIZES(pred[SMOOTH_H_PRED], smooth_h);
+  INIT_ALL_SIZES(dc_pred[0][0], dc_128);
+  INIT_ALL_SIZES(dc_pred[0][1], dc_top);
+  INIT_ALL_SIZES(dc_pred[1][0], dc_left);
+  INIT_ALL_SIZES(dc_pred[1][1], dc);
+
+  INIT_ALL_SIZES(pred_high[V_PRED], highbd_v);
+  INIT_ALL_SIZES(pred_high[H_PRED], highbd_h);
+  INIT_ALL_SIZES(pred_high[PAETH_PRED], highbd_paeth);
+  INIT_ALL_SIZES(pred_high[SMOOTH_PRED], highbd_smooth);
+  INIT_ALL_SIZES(pred_high[SMOOTH_V_PRED], highbd_smooth_v);
+  INIT_ALL_SIZES(pred_high[SMOOTH_H_PRED], highbd_smooth_h);
+  INIT_ALL_SIZES(dc_pred_high[0][0], highbd_dc_128);
+  INIT_ALL_SIZES(dc_pred_high[0][1], highbd_dc_top);
+  INIT_ALL_SIZES(dc_pred_high[1][0], highbd_dc_left);
+  INIT_ALL_SIZES(dc_pred_high[1][1], highbd_dc);
+#undef intra_pred_allsizes
+}
+
+// Directional prediction, zone 1: 0 < angle < 90
+void av1_dr_prediction_z1_c(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
+                            const uint8_t *above, const uint8_t *left,
+                            int upsample_above, int dx, int dy) {
+  int r, c, x, base, shift, val;
+
+  (void)left;
+  (void)dy;
+  assert(dy == 1);
+  assert(dx > 0);
+
+  const int max_base_x = ((bw + bh) - 1) << upsample_above;
+  const int frac_bits = 6 - upsample_above;
+  const int base_inc = 1 << upsample_above;
+  x = dx;
+  for (r = 0; r < bh; ++r, dst += stride, x += dx) {
+    base = x >> frac_bits;
+    shift = ((x << upsample_above) & 0x3F) >> 1;
+
+    if (base >= max_base_x) {
+      for (int i = r; i < bh; ++i) {
+        memset(dst, above[max_base_x], bw * sizeof(dst[0]));
+        dst += stride;
+      }
+      return;
+    }
+
+    for (c = 0; c < bw; ++c, base += base_inc) {
+      if (base < max_base_x) {
+        val = above[base] * (32 - shift) + above[base + 1] * shift;
+        dst[c] = ROUND_POWER_OF_TWO(val, 5);
+      } else {
+        dst[c] = above[max_base_x];
+      }
+    }
+  }
+}
+
+// Directional prediction, zone 2: 90 < angle < 180
+void av1_dr_prediction_z2_c(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
+                            const uint8_t *above, const uint8_t *left,
+                            int upsample_above, int upsample_left, int dx,
+                            int dy) {
+  assert(dx > 0);
+  assert(dy > 0);
+
+  const int min_base_x = -(1 << upsample_above);
+  const int min_base_y = -(1 << upsample_left);
+  (void)min_base_y;
+  const int frac_bits_x = 6 - upsample_above;
+  const int frac_bits_y = 6 - upsample_left;
+
+  for (int r = 0; r < bh; ++r) {
+    for (int c = 0; c < bw; ++c) {
+      int val;
+      int y = r + 1;
+      int x = (c << 6) - y * dx;
+      const int base_x = x >> frac_bits_x;
+      if (base_x >= min_base_x) {
+        const int shift = ((x * (1 << upsample_above)) & 0x3F) >> 1;
+        val = above[base_x] * (32 - shift) + above[base_x + 1] * shift;
+        val = ROUND_POWER_OF_TWO(val, 5);
+      } else {
+        x = c + 1;
+        y = (r << 6) - x * dy;
+        const int base_y = y >> frac_bits_y;
+        assert(base_y >= min_base_y);
+        const int shift = ((y * (1 << upsample_left)) & 0x3F) >> 1;
+        val = left[base_y] * (32 - shift) + left[base_y + 1] * shift;
+        val = ROUND_POWER_OF_TWO(val, 5);
+      }
+      dst[c] = val;
+    }
+    dst += stride;
+  }
+}
+
+// Directional prediction, zone 3: 180 < angle < 270
+void av1_dr_prediction_z3_c(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
+                            const uint8_t *above, const uint8_t *left,
+                            int upsample_left, int dx, int dy) {
+  int r, c, y, base, shift, val;
+
+  (void)above;
+  (void)dx;
+
+  assert(dx == 1);
+  assert(dy > 0);
+
+  const int max_base_y = (bw + bh - 1) << upsample_left;
+  const int frac_bits = 6 - upsample_left;
+  const int base_inc = 1 << upsample_left;
+  y = dy;
+  for (c = 0; c < bw; ++c, y += dy) {
+    base = y >> frac_bits;
+    shift = ((y << upsample_left) & 0x3F) >> 1;
+
+    for (r = 0; r < bh; ++r, base += base_inc) {
+      if (base < max_base_y) {
+        val = left[base] * (32 - shift) + left[base + 1] * shift;
+        dst[r * stride + c] = val = ROUND_POWER_OF_TWO(val, 5);
+      } else {
+        for (; r < bh; ++r) dst[r * stride + c] = left[max_base_y];
+        break;
+      }
+    }
+  }
+}
+
+static void dr_predictor(uint8_t *dst, ptrdiff_t stride, TX_SIZE tx_size,
+                         const uint8_t *above, const uint8_t *left,
+                         int upsample_above, int upsample_left, int angle) {
+  const int dx = av1_get_dx(angle);
+  const int dy = av1_get_dy(angle);
+  const int bw = tx_size_wide[tx_size];
+  const int bh = tx_size_high[tx_size];
+  assert(angle > 0 && angle < 270);
+
+  if (angle > 0 && angle < 90) {
+    av1_dr_prediction_z1(dst, stride, bw, bh, above, left, upsample_above, dx,
+                         dy);
+  } else if (angle > 90 && angle < 180) {
+    av1_dr_prediction_z2(dst, stride, bw, bh, above, left, upsample_above,
+                         upsample_left, dx, dy);
+  } else if (angle > 180 && angle < 270) {
+    av1_dr_prediction_z3(dst, stride, bw, bh, above, left, upsample_left, dx,
+                         dy);
+  } else if (angle == 90) {
+    pred[V_PRED][tx_size](dst, stride, above, left);
+  } else if (angle == 180) {
+    pred[H_PRED][tx_size](dst, stride, above, left);
+  }
+}
+
+// Directional prediction, zone 1: 0 < angle < 90
+void av1_highbd_dr_prediction_z1_c(uint16_t *dst, ptrdiff_t stride, int bw,
+                                   int bh, const uint16_t *above,
+                                   const uint16_t *left, int upsample_above,
+                                   int dx, int dy, int bd) {
+  int r, c, x, base, shift, val;
+
+  (void)left;
+  (void)dy;
+  (void)bd;
+  assert(dy == 1);
+  assert(dx > 0);
+
+  const int max_base_x = ((bw + bh) - 1) << upsample_above;
+  const int frac_bits = 6 - upsample_above;
+  const int base_inc = 1 << upsample_above;
+  x = dx;
+  for (r = 0; r < bh; ++r, dst += stride, x += dx) {
+    base = x >> frac_bits;
+    shift = ((x << upsample_above) & 0x3F) >> 1;
+
+    if (base >= max_base_x) {
+      for (int i = r; i < bh; ++i) {
+        aom_memset16(dst, above[max_base_x], bw);
+        dst += stride;
+      }
+      return;
+    }
+
+    for (c = 0; c < bw; ++c, base += base_inc) {
+      if (base < max_base_x) {
+        val = above[base] * (32 - shift) + above[base + 1] * shift;
+        dst[c] = ROUND_POWER_OF_TWO(val, 5);
+      } else {
+        dst[c] = above[max_base_x];
+      }
+    }
+  }
+}
+
+// Directional prediction, zone 2: 90 < angle < 180
+void av1_highbd_dr_prediction_z2_c(uint16_t *dst, ptrdiff_t stride, int bw,
+                                   int bh, const uint16_t *above,
+                                   const uint16_t *left, int upsample_above,
+                                   int upsample_left, int dx, int dy, int bd) {
+  (void)bd;
+  assert(dx > 0);
+  assert(dy > 0);
+
+  const int min_base_x = -(1 << upsample_above);
+  const int min_base_y = -(1 << upsample_left);
+  (void)min_base_y;
+  const int frac_bits_x = 6 - upsample_above;
+  const int frac_bits_y = 6 - upsample_left;
+
+  for (int r = 0; r < bh; ++r) {
+    for (int c = 0; c < bw; ++c) {
+      int val;
+      int y = r + 1;
+      int x = (c << 6) - y * dx;
+      const int base_x = x >> frac_bits_x;
+      if (base_x >= min_base_x) {
+        const int shift = ((x * (1 << upsample_above)) & 0x3F) >> 1;
+        val = above[base_x] * (32 - shift) + above[base_x + 1] * shift;
+        val = ROUND_POWER_OF_TWO(val, 5);
+      } else {
+        x = c + 1;
+        y = (r << 6) - x * dy;
+        const int base_y = y >> frac_bits_y;
+        assert(base_y >= min_base_y);
+        const int shift = ((y * (1 << upsample_left)) & 0x3F) >> 1;
+        val = left[base_y] * (32 - shift) + left[base_y + 1] * shift;
+        val = ROUND_POWER_OF_TWO(val, 5);
+      }
+      dst[c] = val;
+    }
+    dst += stride;
+  }
+}
+
+// Directional prediction, zone 3: 180 < angle < 270
+void av1_highbd_dr_prediction_z3_c(uint16_t *dst, ptrdiff_t stride, int bw,
+                                   int bh, const uint16_t *above,
+                                   const uint16_t *left, int upsample_left,
+                                   int dx, int dy, int bd) {
+  int r, c, y, base, shift, val;
+
+  (void)above;
+  (void)dx;
+  (void)bd;
+  assert(dx == 1);
+  assert(dy > 0);
+
+  const int max_base_y = (bw + bh - 1) << upsample_left;
+  const int frac_bits = 6 - upsample_left;
+  const int base_inc = 1 << upsample_left;
+  y = dy;
+  for (c = 0; c < bw; ++c, y += dy) {
+    base = y >> frac_bits;
+    shift = ((y << upsample_left) & 0x3F) >> 1;
+
+    for (r = 0; r < bh; ++r, base += base_inc) {
+      if (base < max_base_y) {
+        val = left[base] * (32 - shift) + left[base + 1] * shift;
+        dst[r * stride + c] = ROUND_POWER_OF_TWO(val, 5);
+      } else {
+        for (; r < bh; ++r) dst[r * stride + c] = left[max_base_y];
+        break;
+      }
+    }
+  }
+}
+
+static void highbd_dr_predictor(uint16_t *dst, ptrdiff_t stride,
+                                TX_SIZE tx_size, const uint16_t *above,
+                                const uint16_t *left, int upsample_above,
+                                int upsample_left, int angle, int bd) {
+  const int dx = av1_get_dx(angle);
+  const int dy = av1_get_dy(angle);
+  const int bw = tx_size_wide[tx_size];
+  const int bh = tx_size_high[tx_size];
+  assert(angle > 0 && angle < 270);
+
+  if (angle > 0 && angle < 90) {
+    av1_highbd_dr_prediction_z1(dst, stride, bw, bh, above, left,
+                                upsample_above, dx, dy, bd);
+  } else if (angle > 90 && angle < 180) {
+    av1_highbd_dr_prediction_z2(dst, stride, bw, bh, above, left,
+                                upsample_above, upsample_left, dx, dy, bd);
+  } else if (angle > 180 && angle < 270) {
+    av1_highbd_dr_prediction_z3(dst, stride, bw, bh, above, left, upsample_left,
+                                dx, dy, bd);
+  } else if (angle == 90) {
+    pred_high[V_PRED][tx_size](dst, stride, above, left, bd);
+  } else if (angle == 180) {
+    pred_high[H_PRED][tx_size](dst, stride, above, left, bd);
+  }
+}
+
+DECLARE_ALIGNED(16, const int8_t,
+                av1_filter_intra_taps[FILTER_INTRA_MODES][8][8]) = {
+  {
+      { -6, 10, 0, 0, 0, 12, 0, 0 },
+      { -5, 2, 10, 0, 0, 9, 0, 0 },
+      { -3, 1, 1, 10, 0, 7, 0, 0 },
+      { -3, 1, 1, 2, 10, 5, 0, 0 },
+      { -4, 6, 0, 0, 0, 2, 12, 0 },
+      { -3, 2, 6, 0, 0, 2, 9, 0 },
+      { -3, 2, 2, 6, 0, 2, 7, 0 },
+      { -3, 1, 2, 2, 6, 3, 5, 0 },
+  },
+  {
+      { -10, 16, 0, 0, 0, 10, 0, 0 },
+      { -6, 0, 16, 0, 0, 6, 0, 0 },
+      { -4, 0, 0, 16, 0, 4, 0, 0 },
+      { -2, 0, 0, 0, 16, 2, 0, 0 },
+      { -10, 16, 0, 0, 0, 0, 10, 0 },
+      { -6, 0, 16, 0, 0, 0, 6, 0 },
+      { -4, 0, 0, 16, 0, 0, 4, 0 },
+      { -2, 0, 0, 0, 16, 0, 2, 0 },
+  },
+  {
+      { -8, 8, 0, 0, 0, 16, 0, 0 },
+      { -8, 0, 8, 0, 0, 16, 0, 0 },
+      { -8, 0, 0, 8, 0, 16, 0, 0 },
+      { -8, 0, 0, 0, 8, 16, 0, 0 },
+      { -4, 4, 0, 0, 0, 0, 16, 0 },
+      { -4, 0, 4, 0, 0, 0, 16, 0 },
+      { -4, 0, 0, 4, 0, 0, 16, 0 },
+      { -4, 0, 0, 0, 4, 0, 16, 0 },
+  },
+  {
+      { -2, 8, 0, 0, 0, 10, 0, 0 },
+      { -1, 3, 8, 0, 0, 6, 0, 0 },
+      { -1, 2, 3, 8, 0, 4, 0, 0 },
+      { 0, 1, 2, 3, 8, 2, 0, 0 },
+      { -1, 4, 0, 0, 0, 3, 10, 0 },
+      { -1, 3, 4, 0, 0, 4, 6, 0 },
+      { -1, 2, 3, 4, 0, 4, 4, 0 },
+      { -1, 2, 2, 3, 4, 3, 3, 0 },
+  },
+  {
+      { -12, 14, 0, 0, 0, 14, 0, 0 },
+      { -10, 0, 14, 0, 0, 12, 0, 0 },
+      { -9, 0, 0, 14, 0, 11, 0, 0 },
+      { -8, 0, 0, 0, 14, 10, 0, 0 },
+      { -10, 12, 0, 0, 0, 0, 14, 0 },
+      { -9, 1, 12, 0, 0, 0, 12, 0 },
+      { -8, 0, 0, 12, 0, 1, 11, 0 },
+      { -7, 0, 0, 1, 12, 1, 9, 0 },
+  },
+};
+
+void av1_filter_intra_predictor_c(uint8_t *dst, ptrdiff_t stride,
+                                  TX_SIZE tx_size, const uint8_t *above,
+                                  const uint8_t *left, int mode) {
+  int r, c;
+  uint8_t buffer[33][33];
+  const int bw = tx_size_wide[tx_size];
+  const int bh = tx_size_high[tx_size];
+
+  assert(bw <= 32 && bh <= 32);
+
+  // The initialization is just for silencing Jenkins static analysis warnings
+  for (r = 0; r < bh + 1; ++r)
+    memset(buffer[r], 0, (bw + 1) * sizeof(buffer[0][0]));
+
+  for (r = 0; r < bh; ++r) buffer[r + 1][0] = left[r];
+  memcpy(buffer[0], &above[-1], (bw + 1) * sizeof(uint8_t));
+
+  for (r = 1; r < bh + 1; r += 2)
+    for (c = 1; c < bw + 1; c += 4) {
+      const uint8_t p0 = buffer[r - 1][c - 1];
+      const uint8_t p1 = buffer[r - 1][c];
+      const uint8_t p2 = buffer[r - 1][c + 1];
+      const uint8_t p3 = buffer[r - 1][c + 2];
+      const uint8_t p4 = buffer[r - 1][c + 3];
+      const uint8_t p5 = buffer[r][c - 1];
+      const uint8_t p6 = buffer[r + 1][c - 1];
+      for (int k = 0; k < 8; ++k) {
+        int r_offset = k >> 2;
+        int c_offset = k & 0x03;
+        buffer[r + r_offset][c + c_offset] =
+            clip_pixel(ROUND_POWER_OF_TWO_SIGNED(
+                av1_filter_intra_taps[mode][k][0] * p0 +
+                    av1_filter_intra_taps[mode][k][1] * p1 +
+                    av1_filter_intra_taps[mode][k][2] * p2 +
+                    av1_filter_intra_taps[mode][k][3] * p3 +
+                    av1_filter_intra_taps[mode][k][4] * p4 +
+                    av1_filter_intra_taps[mode][k][5] * p5 +
+                    av1_filter_intra_taps[mode][k][6] * p6,
+                FILTER_INTRA_SCALE_BITS));
+      }
+    }
+
+  for (r = 0; r < bh; ++r) {
+    memcpy(dst, &buffer[r + 1][1], bw * sizeof(uint8_t));
+    dst += stride;
+  }
+}
+
+static void highbd_filter_intra_predictor(uint16_t *dst, ptrdiff_t stride,
+                                          TX_SIZE tx_size,
+                                          const uint16_t *above,
+                                          const uint16_t *left, int mode,
+                                          int bd) {
+  int r, c;
+  uint16_t buffer[33][33];
+  const int bw = tx_size_wide[tx_size];
+  const int bh = tx_size_high[tx_size];
+
+  assert(bw <= 32 && bh <= 32);
+
+  // The initialization is just for silencing Jenkins static analysis warnings
+  for (r = 0; r < bh + 1; ++r)
+    memset(buffer[r], 0, (bw + 1) * sizeof(buffer[0][0]));
+
+  for (r = 0; r < bh; ++r) buffer[r + 1][0] = left[r];
+  memcpy(buffer[0], &above[-1], (bw + 1) * sizeof(buffer[0][0]));
+
+  for (r = 1; r < bh + 1; r += 2)
+    for (c = 1; c < bw + 1; c += 4) {
+      const uint16_t p0 = buffer[r - 1][c - 1];
+      const uint16_t p1 = buffer[r - 1][c];
+      const uint16_t p2 = buffer[r - 1][c + 1];
+      const uint16_t p3 = buffer[r - 1][c + 2];
+      const uint16_t p4 = buffer[r - 1][c + 3];
+      const uint16_t p5 = buffer[r][c - 1];
+      const uint16_t p6 = buffer[r + 1][c - 1];
+      for (int k = 0; k < 8; ++k) {
+        int r_offset = k >> 2;
+        int c_offset = k & 0x03;
+        buffer[r + r_offset][c + c_offset] =
+            clip_pixel_highbd(ROUND_POWER_OF_TWO_SIGNED(
+                                  av1_filter_intra_taps[mode][k][0] * p0 +
+                                      av1_filter_intra_taps[mode][k][1] * p1 +
+                                      av1_filter_intra_taps[mode][k][2] * p2 +
+                                      av1_filter_intra_taps[mode][k][3] * p3 +
+                                      av1_filter_intra_taps[mode][k][4] * p4 +
+                                      av1_filter_intra_taps[mode][k][5] * p5 +
+                                      av1_filter_intra_taps[mode][k][6] * p6,
+                                  FILTER_INTRA_SCALE_BITS),
+                              bd);
+      }
+    }
+
+  for (r = 0; r < bh; ++r) {
+    memcpy(dst, &buffer[r + 1][1], bw * sizeof(dst[0]));
+    dst += stride;
+  }
+}
+
+static int is_smooth(const MB_MODE_INFO *mbmi, int plane) {
+  if (plane == 0) {
+    const PREDICTION_MODE mode = mbmi->mode;
+    return (mode == SMOOTH_PRED || mode == SMOOTH_V_PRED ||
+            mode == SMOOTH_H_PRED);
+  } else {
+    // uv_mode is not set for inter blocks, so need to explicitly
+    // detect that case.
+    if (is_inter_block(mbmi)) return 0;
+
+    const UV_PREDICTION_MODE uv_mode = mbmi->uv_mode;
+    return (uv_mode == UV_SMOOTH_PRED || uv_mode == UV_SMOOTH_V_PRED ||
+            uv_mode == UV_SMOOTH_H_PRED);
+  }
+}
+
+int get_filt_type(const MACROBLOCKD *xd, int plane) {
+  int ab_sm, le_sm;
+
+  if (plane == 0) {
+    const MB_MODE_INFO *ab = xd->above_mbmi;
+    const MB_MODE_INFO *le = xd->left_mbmi;
+    ab_sm = ab ? is_smooth(ab, plane) : 0;
+    le_sm = le ? is_smooth(le, plane) : 0;
+  } else {
+    const MB_MODE_INFO *ab = xd->chroma_above_mbmi;
+    const MB_MODE_INFO *le = xd->chroma_left_mbmi;
+    ab_sm = ab ? is_smooth(ab, plane) : 0;
+    le_sm = le ? is_smooth(le, plane) : 0;
+  }
+
+  return (ab_sm || le_sm) ? 1 : 0;
+}
+
+int intra_edge_filter_strength(int bs0, int bs1, int delta, int type) {
+  const int d = abs(delta);
+  int strength = 0;
+
+  const int blk_wh = bs0 + bs1;
+  if (type == 0) {
+    if (blk_wh <= 8) {
+      if (d >= 56) strength = 1;
+    } else if (blk_wh <= 12) {
+      if (d >= 40) strength = 1;
+    } else if (blk_wh <= 16) {
+      if (d >= 40) strength = 1;
+    } else if (blk_wh <= 24) {
+      if (d >= 8) strength = 1;
+      if (d >= 16) strength = 2;
+      if (d >= 32) strength = 3;
+    } else if (blk_wh <= 32) {
+      if (d >= 1) strength = 1;
+      if (d >= 4) strength = 2;
+      if (d >= 32) strength = 3;
+    } else {
+      if (d >= 1) strength = 3;
+    }
+  } else {
+    if (blk_wh <= 8) {
+      if (d >= 40) strength = 1;
+      if (d >= 64) strength = 2;
+    } else if (blk_wh <= 16) {
+      if (d >= 20) strength = 1;
+      if (d >= 48) strength = 2;
+    } else if (blk_wh <= 24) {
+      if (d >= 4) strength = 3;
+    } else {
+      if (d >= 1) strength = 3;
+    }
+  }
+  return strength;
+}
+
+void av1_filter_intra_edge_c(uint8_t *p, int sz, int strength) {
+  if (!strength) return;
+
+  const int kernel[INTRA_EDGE_FILT][INTRA_EDGE_TAPS] = {
+    { 0, 4, 8, 4, 0 }, { 0, 5, 6, 5, 0 }, { 2, 4, 4, 4, 2 }
+  };
+  const int filt = strength - 1;
+  uint8_t edge[129];
+
+  memcpy(edge, p, sz * sizeof(*p));
+  for (int i = 1; i < sz; i++) {
+    int s = 0;
+    for (int j = 0; j < INTRA_EDGE_TAPS; j++) {
+      int k = i - 2 + j;
+      k = (k < 0) ? 0 : k;
+      k = (k > sz - 1) ? sz - 1 : k;
+      s += edge[k] * kernel[filt][j];
+    }
+    s = (s + 8) >> 4;
+    p[i] = s;
+  }
+}
+
+static void filter_intra_edge_corner(uint8_t *p_above, uint8_t *p_left) {
+  const int kernel[3] = { 5, 6, 5 };
+
+  int s = (p_left[0] * kernel[0]) + (p_above[-1] * kernel[1]) +
+          (p_above[0] * kernel[2]);
+  s = (s + 8) >> 4;
+  p_above[-1] = s;
+  p_left[-1] = s;
+}
+
+void av1_filter_intra_edge_high_c(uint16_t *p, int sz, int strength) {
+  if (!strength) return;
+
+  const int kernel[INTRA_EDGE_FILT][INTRA_EDGE_TAPS] = {
+    { 0, 4, 8, 4, 0 }, { 0, 5, 6, 5, 0 }, { 2, 4, 4, 4, 2 }
+  };
+  const int filt = strength - 1;
+  uint16_t edge[129];
+
+  memcpy(edge, p, sz * sizeof(*p));
+  for (int i = 1; i < sz; i++) {
+    int s = 0;
+    for (int j = 0; j < INTRA_EDGE_TAPS; j++) {
+      int k = i - 2 + j;
+      k = (k < 0) ? 0 : k;
+      k = (k > sz - 1) ? sz - 1 : k;
+      s += edge[k] * kernel[filt][j];
+    }
+    s = (s + 8) >> 4;
+    p[i] = s;
+  }
+}
+
+static void filter_intra_edge_corner_high(uint16_t *p_above, uint16_t *p_left) {
+  const int kernel[3] = { 5, 6, 5 };
+
+  int s = (p_left[0] * kernel[0]) + (p_above[-1] * kernel[1]) +
+          (p_above[0] * kernel[2]);
+  s = (s + 8) >> 4;
+  p_above[-1] = s;
+  p_left[-1] = s;
+}
+
+void av1_upsample_intra_edge_c(uint8_t *p, int sz) {
+  // interpolate half-sample positions
+  assert(sz <= MAX_UPSAMPLE_SZ);
+
+  uint8_t in[MAX_UPSAMPLE_SZ + 3];
+  // copy p[-1..(sz-1)] and extend first and last samples
+  in[0] = p[-1];
+  in[1] = p[-1];
+  for (int i = 0; i < sz; i++) {
+    in[i + 2] = p[i];
+  }
+  in[sz + 2] = p[sz - 1];
+
+  // interpolate half-sample edge positions
+  p[-2] = in[0];
+  for (int i = 0; i < sz; i++) {
+    int s = -in[i] + (9 * in[i + 1]) + (9 * in[i + 2]) - in[i + 3];
+    s = clip_pixel((s + 8) >> 4);
+    p[2 * i - 1] = s;
+    p[2 * i] = in[i + 2];
+  }
+}
+
+void av1_upsample_intra_edge_high_c(uint16_t *p, int sz, int bd) {
+  // interpolate half-sample positions
+  assert(sz <= MAX_UPSAMPLE_SZ);
+
+  uint16_t in[MAX_UPSAMPLE_SZ + 3];
+  // copy p[-1..(sz-1)] and extend first and last samples
+  in[0] = p[-1];
+  in[1] = p[-1];
+  for (int i = 0; i < sz; i++) {
+    in[i + 2] = p[i];
+  }
+  in[sz + 2] = p[sz - 1];
+
+  // interpolate half-sample edge positions
+  p[-2] = in[0];
+  for (int i = 0; i < sz; i++) {
+    int s = -in[i] + (9 * in[i + 1]) + (9 * in[i + 2]) - in[i + 3];
+    s = (s + 8) >> 4;
+    s = clip_pixel_highbd(s, bd);
+    p[2 * i - 1] = s;
+    p[2 * i] = in[i + 2];
+  }
+}
+
+static void build_intra_predictors_high(
+    const MACROBLOCKD *xd, const uint8_t *ref8, int ref_stride, uint8_t *dst8,
+    int dst_stride, PREDICTION_MODE mode, int angle_delta,
+    FILTER_INTRA_MODE filter_intra_mode, TX_SIZE tx_size,
+    int disable_edge_filter, int n_top_px, int n_topright_px, int n_left_px,
+    int n_bottomleft_px, int plane) {
+  int i;
+  uint16_t *dst = CONVERT_TO_SHORTPTR(dst8);
+  uint16_t *ref = CONVERT_TO_SHORTPTR(ref8);
+  DECLARE_ALIGNED(16, uint16_t, left_data[MAX_TX_SIZE * 2 + 32]);
+  DECLARE_ALIGNED(16, uint16_t, above_data[MAX_TX_SIZE * 2 + 32]);
+  uint16_t *const above_row = above_data + 16;
+  uint16_t *const left_col = left_data + 16;
+  const int txwpx = tx_size_wide[tx_size];
+  const int txhpx = tx_size_high[tx_size];
+  int need_left = extend_modes[mode] & NEED_LEFT;
+  int need_above = extend_modes[mode] & NEED_ABOVE;
+  int need_above_left = extend_modes[mode] & NEED_ABOVELEFT;
+  const uint16_t *above_ref = ref - ref_stride;
+  const uint16_t *left_ref = ref - 1;
+  int p_angle = 0;
+  const int is_dr_mode = av1_is_directional_mode(mode);
+  const int use_filter_intra = filter_intra_mode != FILTER_INTRA_MODES;
+  int base = 128 << (xd->bd - 8);
+
+  // The default values if ref pixels are not available:
+  // base-1 base-1 base-1 .. base-1 base-1 base-1 base-1 base-1 base-1
+  // base+1   A      B  ..     Y      Z
+  // base+1   C      D  ..     W      X
+  // base+1   E      F  ..     U      V
+  // base+1   G      H  ..     S      T      T      T      T      T
+
+  if (is_dr_mode) {
+    p_angle = mode_to_angle_map[mode] + angle_delta;
+    if (p_angle <= 90)
+      need_above = 1, need_left = 0, need_above_left = 1;
+    else if (p_angle < 180)
+      need_above = 1, need_left = 1, need_above_left = 1;
+    else
+      need_above = 0, need_left = 1, need_above_left = 1;
+  }
+  if (use_filter_intra) need_left = need_above = need_above_left = 1;
+
+  assert(n_top_px >= 0);
+  assert(n_topright_px >= 0);
+  assert(n_left_px >= 0);
+  assert(n_bottomleft_px >= 0);
+
+  if ((!need_above && n_left_px == 0) || (!need_left && n_top_px == 0)) {
+    int val;
+    if (need_left) {
+      val = (n_top_px > 0) ? above_ref[0] : base + 1;
+    } else {
+      val = (n_left_px > 0) ? left_ref[0] : base - 1;
+    }
+    for (i = 0; i < txhpx; ++i) {
+      aom_memset16(dst, val, txwpx);
+      dst += dst_stride;
+    }
+    return;
+  }
+
+  // NEED_LEFT
+  if (need_left) {
+    int need_bottom = !!(extend_modes[mode] & NEED_BOTTOMLEFT);
+    if (use_filter_intra) need_bottom = 0;
+    if (is_dr_mode) need_bottom = p_angle > 180;
+    const int num_left_pixels_needed = txhpx + (need_bottom ? txwpx : 0);
+    i = 0;
+    if (n_left_px > 0) {
+      for (; i < n_left_px; i++) left_col[i] = left_ref[i * ref_stride];
+      if (need_bottom && n_bottomleft_px > 0) {
+        assert(i == txhpx);
+        for (; i < txhpx + n_bottomleft_px; i++)
+          left_col[i] = left_ref[i * ref_stride];
+      }
+      if (i < num_left_pixels_needed)
+        aom_memset16(&left_col[i], left_col[i - 1], num_left_pixels_needed - i);
+    } else {
+      if (n_top_px > 0) {
+        aom_memset16(left_col, above_ref[0], num_left_pixels_needed);
+      } else {
+        aom_memset16(left_col, base + 1, num_left_pixels_needed);
+      }
+    }
+  }
+
+  // NEED_ABOVE
+  if (need_above) {
+    int need_right = !!(extend_modes[mode] & NEED_ABOVERIGHT);
+    if (use_filter_intra) need_right = 0;
+    if (is_dr_mode) need_right = p_angle < 90;
+    const int num_top_pixels_needed = txwpx + (need_right ? txhpx : 0);
+    if (n_top_px > 0) {
+      memcpy(above_row, above_ref, n_top_px * sizeof(above_ref[0]));
+      i = n_top_px;
+      if (need_right && n_topright_px > 0) {
+        assert(n_top_px == txwpx);
+        memcpy(above_row + txwpx, above_ref + txwpx,
+               n_topright_px * sizeof(above_ref[0]));
+        i += n_topright_px;
+      }
+      if (i < num_top_pixels_needed)
+        aom_memset16(&above_row[i], above_row[i - 1],
+                     num_top_pixels_needed - i);
+    } else {
+      if (n_left_px > 0) {
+        aom_memset16(above_row, left_ref[0], num_top_pixels_needed);
+      } else {
+        aom_memset16(above_row, base - 1, num_top_pixels_needed);
+      }
+    }
+  }
+
+  if (need_above_left) {
+    if (n_top_px > 0 && n_left_px > 0) {
+      above_row[-1] = above_ref[-1];
+    } else if (n_top_px > 0) {
+      above_row[-1] = above_ref[0];
+    } else if (n_left_px > 0) {
+      above_row[-1] = left_ref[0];
+    } else {
+      above_row[-1] = base;
+    }
+    left_col[-1] = above_row[-1];
+  }
+
+  if (use_filter_intra) {
+    highbd_filter_intra_predictor(dst, dst_stride, tx_size, above_row, left_col,
+                                  filter_intra_mode, xd->bd);
+    return;
+  }
+
+  if (is_dr_mode) {
+    int upsample_above = 0;
+    int upsample_left = 0;
+    if (!disable_edge_filter) {
+      const int need_right = p_angle < 90;
+      const int need_bottom = p_angle > 180;
+      const int filt_type = get_filt_type(xd, plane);
+      if (p_angle != 90 && p_angle != 180) {
+        const int ab_le = need_above_left ? 1 : 0;
+        if (need_above && need_left && (txwpx + txhpx >= 24)) {
+          filter_intra_edge_corner_high(above_row, left_col);
+        }
+        if (need_above && n_top_px > 0) {
+          const int strength =
+              intra_edge_filter_strength(txwpx, txhpx, p_angle - 90, filt_type);
+          const int n_px = n_top_px + ab_le + (need_right ? txhpx : 0);
+          av1_filter_intra_edge_high(above_row - ab_le, n_px, strength);
+        }
+        if (need_left && n_left_px > 0) {
+          const int strength = intra_edge_filter_strength(
+              txhpx, txwpx, p_angle - 180, filt_type);
+          const int n_px = n_left_px + ab_le + (need_bottom ? txwpx : 0);
+          av1_filter_intra_edge_high(left_col - ab_le, n_px, strength);
+        }
+      }
+      upsample_above =
+          av1_use_intra_edge_upsample(txwpx, txhpx, p_angle - 90, filt_type);
+      if (need_above && upsample_above) {
+        const int n_px = txwpx + (need_right ? txhpx : 0);
+        av1_upsample_intra_edge_high(above_row, n_px, xd->bd);
+      }
+      upsample_left =
+          av1_use_intra_edge_upsample(txhpx, txwpx, p_angle - 180, filt_type);
+      if (need_left && upsample_left) {
+        const int n_px = txhpx + (need_bottom ? txwpx : 0);
+        av1_upsample_intra_edge_high(left_col, n_px, xd->bd);
+      }
+    }
+    highbd_dr_predictor(dst, dst_stride, tx_size, above_row, left_col,
+                        upsample_above, upsample_left, p_angle, xd->bd);
+    return;
+  }
+
+  // predict
+  if (mode == DC_PRED) {
+    dc_pred_high[n_left_px > 0][n_top_px > 0][tx_size](
+        dst, dst_stride, above_row, left_col, xd->bd);
+  } else {
+    pred_high[mode][tx_size](dst, dst_stride, above_row, left_col, xd->bd);
+  }
+}
+
+static void build_intra_predictors(const MACROBLOCKD *xd, const uint8_t *ref,
+                                   int ref_stride, uint8_t *dst, int dst_stride,
+                                   PREDICTION_MODE mode, int angle_delta,
+                                   FILTER_INTRA_MODE filter_intra_mode,
+                                   TX_SIZE tx_size, int disable_edge_filter,
+                                   int n_top_px, int n_topright_px,
+                                   int n_left_px, int n_bottomleft_px,
+                                   int plane) {
+  int i;
+  const uint8_t *above_ref = ref - ref_stride;
+  const uint8_t *left_ref = ref - 1;
+  DECLARE_ALIGNED(16, uint8_t, left_data[MAX_TX_SIZE * 2 + 32]);
+  DECLARE_ALIGNED(16, uint8_t, above_data[MAX_TX_SIZE * 2 + 32]);
+  uint8_t *const above_row = above_data + 16;
+  uint8_t *const left_col = left_data + 16;
+  const int txwpx = tx_size_wide[tx_size];
+  const int txhpx = tx_size_high[tx_size];
+  int need_left = extend_modes[mode] & NEED_LEFT;
+  int need_above = extend_modes[mode] & NEED_ABOVE;
+  int need_above_left = extend_modes[mode] & NEED_ABOVELEFT;
+  int p_angle = 0;
+  const int is_dr_mode = av1_is_directional_mode(mode);
+  const int use_filter_intra = filter_intra_mode != FILTER_INTRA_MODES;
+
+  // The default values if ref pixels are not available:
+  // 127 127 127 .. 127 127 127 127 127 127
+  // 129  A   B  ..  Y   Z
+  // 129  C   D  ..  W   X
+  // 129  E   F  ..  U   V
+  // 129  G   H  ..  S   T   T   T   T   T
+  // ..
+
+  if (is_dr_mode) {
+    p_angle = mode_to_angle_map[mode] + angle_delta;
+    if (p_angle <= 90)
+      need_above = 1, need_left = 0, need_above_left = 1;
+    else if (p_angle < 180)
+      need_above = 1, need_left = 1, need_above_left = 1;
+    else
+      need_above = 0, need_left = 1, need_above_left = 1;
+  }
+  if (use_filter_intra) need_left = need_above = need_above_left = 1;
+
+  assert(n_top_px >= 0);
+  assert(n_topright_px >= 0);
+  assert(n_left_px >= 0);
+  assert(n_bottomleft_px >= 0);
+
+  if ((!need_above && n_left_px == 0) || (!need_left && n_top_px == 0)) {
+    int val;
+    if (need_left) {
+      val = (n_top_px > 0) ? above_ref[0] : 129;
+    } else {
+      val = (n_left_px > 0) ? left_ref[0] : 127;
+    }
+    for (i = 0; i < txhpx; ++i) {
+      memset(dst, val, txwpx);
+      dst += dst_stride;
+    }
+    return;
+  }
+
+  // NEED_LEFT
+  if (need_left) {
+    int need_bottom = !!(extend_modes[mode] & NEED_BOTTOMLEFT);
+    if (use_filter_intra) need_bottom = 0;
+    if (is_dr_mode) need_bottom = p_angle > 180;
+    const int num_left_pixels_needed = txhpx + (need_bottom ? txwpx : 0);
+    i = 0;
+    if (n_left_px > 0) {
+      for (; i < n_left_px; i++) left_col[i] = left_ref[i * ref_stride];
+      if (need_bottom && n_bottomleft_px > 0) {
+        assert(i == txhpx);
+        for (; i < txhpx + n_bottomleft_px; i++)
+          left_col[i] = left_ref[i * ref_stride];
+      }
+      if (i < num_left_pixels_needed)
+        memset(&left_col[i], left_col[i - 1], num_left_pixels_needed - i);
+    } else {
+      if (n_top_px > 0) {
+        memset(left_col, above_ref[0], num_left_pixels_needed);
+      } else {
+        memset(left_col, 129, num_left_pixels_needed);
+      }
+    }
+  }
+
+  // NEED_ABOVE
+  if (need_above) {
+    int need_right = !!(extend_modes[mode] & NEED_ABOVERIGHT);
+    if (use_filter_intra) need_right = 0;
+    if (is_dr_mode) need_right = p_angle < 90;
+    const int num_top_pixels_needed = txwpx + (need_right ? txhpx : 0);
+    if (n_top_px > 0) {
+      memcpy(above_row, above_ref, n_top_px);
+      i = n_top_px;
+      if (need_right && n_topright_px > 0) {
+        assert(n_top_px == txwpx);
+        memcpy(above_row + txwpx, above_ref + txwpx, n_topright_px);
+        i += n_topright_px;
+      }
+      if (i < num_top_pixels_needed)
+        memset(&above_row[i], above_row[i - 1], num_top_pixels_needed - i);
+    } else {
+      if (n_left_px > 0) {
+        memset(above_row, left_ref[0], num_top_pixels_needed);
+      } else {
+        memset(above_row, 127, num_top_pixels_needed);
+      }
+    }
+  }
+
+  if (need_above_left) {
+    if (n_top_px > 0 && n_left_px > 0) {
+      above_row[-1] = above_ref[-1];
+    } else if (n_top_px > 0) {
+      above_row[-1] = above_ref[0];
+    } else if (n_left_px > 0) {
+      above_row[-1] = left_ref[0];
+    } else {
+      above_row[-1] = 128;
+    }
+    left_col[-1] = above_row[-1];
+  }
+
+  if (use_filter_intra) {
+    av1_filter_intra_predictor(dst, dst_stride, tx_size, above_row, left_col,
+                               filter_intra_mode);
+    return;
+  }
+
+  if (is_dr_mode) {
+    int upsample_above = 0;
+    int upsample_left = 0;
+    if (!disable_edge_filter) {
+      const int need_right = p_angle < 90;
+      const int need_bottom = p_angle > 180;
+      const int filt_type = get_filt_type(xd, plane);
+      if (p_angle != 90 && p_angle != 180) {
+        const int ab_le = need_above_left ? 1 : 0;
+        if (need_above && need_left && (txwpx + txhpx >= 24)) {
+          filter_intra_edge_corner(above_row, left_col);
+        }
+        if (need_above && n_top_px > 0) {
+          const int strength =
+              intra_edge_filter_strength(txwpx, txhpx, p_angle - 90, filt_type);
+          const int n_px = n_top_px + ab_le + (need_right ? txhpx : 0);
+          av1_filter_intra_edge(above_row - ab_le, n_px, strength);
+        }
+        if (need_left && n_left_px > 0) {
+          const int strength = intra_edge_filter_strength(
+              txhpx, txwpx, p_angle - 180, filt_type);
+          const int n_px = n_left_px + ab_le + (need_bottom ? txwpx : 0);
+          av1_filter_intra_edge(left_col - ab_le, n_px, strength);
+        }
+      }
+      upsample_above =
+          av1_use_intra_edge_upsample(txwpx, txhpx, p_angle - 90, filt_type);
+      if (need_above && upsample_above) {
+        const int n_px = txwpx + (need_right ? txhpx : 0);
+        av1_upsample_intra_edge(above_row, n_px);
+      }
+      upsample_left =
+          av1_use_intra_edge_upsample(txhpx, txwpx, p_angle - 180, filt_type);
+      if (need_left && upsample_left) {
+        const int n_px = txhpx + (need_bottom ? txwpx : 0);
+        av1_upsample_intra_edge(left_col, n_px);
+      }
+    }
+    dr_predictor(dst, dst_stride, tx_size, above_row, left_col, upsample_above,
+                 upsample_left, p_angle);
+    return;
+  }
+
+  // predict
+  if (mode == DC_PRED) {
+    dc_pred[n_left_px > 0][n_top_px > 0][tx_size](dst, dst_stride, above_row,
+                                                  left_col);
+  } else {
+    pred[mode][tx_size](dst, dst_stride, above_row, left_col);
+  }
+}
+
+void av1_predict_intra_block(
+    const AV1_COMMON *cm, const MACROBLOCKD *xd, int wpx, int hpx,
+    TX_SIZE tx_size, PREDICTION_MODE mode, int angle_delta, int use_palette,
+    FILTER_INTRA_MODE filter_intra_mode, const uint8_t *ref, int ref_stride,
+    uint8_t *dst, int dst_stride, int col_off, int row_off, int plane) {
+  const MB_MODE_INFO *const mbmi = xd->mi[0];
+  const int txwpx = tx_size_wide[tx_size];
+  const int txhpx = tx_size_high[tx_size];
+  const int x = col_off << tx_size_wide_log2[0];
+  const int y = row_off << tx_size_high_log2[0];
+
+  if (use_palette) {
+    int r, c;
+    const uint8_t *const map = xd->plane[plane != 0].color_index_map +
+                               xd->color_index_map_offset[plane != 0];
+    const uint16_t *const palette =
+        mbmi->palette_mode_info.palette_colors + plane * PALETTE_MAX_SIZE;
+    if (is_cur_buf_hbd(xd)) {
+      uint16_t *dst16 = CONVERT_TO_SHORTPTR(dst);
+      for (r = 0; r < txhpx; ++r) {
+        for (c = 0; c < txwpx; ++c) {
+          dst16[r * dst_stride + c] = palette[map[(r + y) * wpx + c + x]];
+        }
+      }
+    } else {
+      for (r = 0; r < txhpx; ++r) {
+        for (c = 0; c < txwpx; ++c) {
+          dst[r * dst_stride + c] =
+              (uint8_t)palette[map[(r + y) * wpx + c + x]];
+        }
+      }
+    }
+    return;
+  }
+
+  BLOCK_SIZE bsize = mbmi->sb_type;
+  const struct macroblockd_plane *const pd = &xd->plane[plane];
+  const int txw = tx_size_wide_unit[tx_size];
+  const int txh = tx_size_high_unit[tx_size];
+  const int have_top = row_off || (pd->subsampling_y ? xd->chroma_up_available
+                                                     : xd->up_available);
+  const int have_left =
+      col_off ||
+      (pd->subsampling_x ? xd->chroma_left_available : xd->left_available);
+  const int mi_row = -xd->mb_to_top_edge >> (3 + MI_SIZE_LOG2);
+  const int mi_col = -xd->mb_to_left_edge >> (3 + MI_SIZE_LOG2);
+  const int xr_chr_offset = 0;
+  const int yd_chr_offset = 0;
+
+  // Distance between the right edge of this prediction block to
+  // the frame right edge
+  const int xr = (xd->mb_to_right_edge >> (3 + pd->subsampling_x)) +
+                 (wpx - x - txwpx) - xr_chr_offset;
+  // Distance between the bottom edge of this prediction block to
+  // the frame bottom edge
+  const int yd = (xd->mb_to_bottom_edge >> (3 + pd->subsampling_y)) +
+                 (hpx - y - txhpx) - yd_chr_offset;
+  const int right_available =
+      mi_col + ((col_off + txw) << pd->subsampling_x) < xd->tile.mi_col_end;
+  const int bottom_available =
+      (yd > 0) &&
+      (mi_row + ((row_off + txh) << pd->subsampling_y) < xd->tile.mi_row_end);
+
+  const PARTITION_TYPE partition = mbmi->partition;
+
+  // force 4x4 chroma component block size.
+  bsize = scale_chroma_bsize(bsize, pd->subsampling_x, pd->subsampling_y);
+
+  const int have_top_right = has_top_right(
+      cm, bsize, mi_row, mi_col, have_top, right_available, partition, tx_size,
+      row_off, col_off, pd->subsampling_x, pd->subsampling_y);
+  const int have_bottom_left = has_bottom_left(
+      cm, bsize, mi_row, mi_col, bottom_available, have_left, partition,
+      tx_size, row_off, col_off, pd->subsampling_x, pd->subsampling_y);
+
+  const int disable_edge_filter = !cm->seq_params.enable_intra_edge_filter;
+  if (is_cur_buf_hbd(xd)) {
+    build_intra_predictors_high(
+        xd, ref, ref_stride, dst, dst_stride, mode, angle_delta,
+        filter_intra_mode, tx_size, disable_edge_filter,
+        have_top ? AOMMIN(txwpx, xr + txwpx) : 0,
+        have_top_right ? AOMMIN(txwpx, xr) : 0,
+        have_left ? AOMMIN(txhpx, yd + txhpx) : 0,
+        have_bottom_left ? AOMMIN(txhpx, yd) : 0, plane);
+    return;
+  }
+
+  build_intra_predictors(xd, ref, ref_stride, dst, dst_stride, mode,
+                         angle_delta, filter_intra_mode, tx_size,
+                         disable_edge_filter,
+                         have_top ? AOMMIN(txwpx, xr + txwpx) : 0,
+                         have_top_right ? AOMMIN(txwpx, xr) : 0,
+                         have_left ? AOMMIN(txhpx, yd + txhpx) : 0,
+                         have_bottom_left ? AOMMIN(txhpx, yd) : 0, plane);
+}
+
+void av1_predict_intra_block_facade(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                    int plane, int blk_col, int blk_row,
+                                    TX_SIZE tx_size) {
+  const MB_MODE_INFO *const mbmi = xd->mi[0];
+  struct macroblockd_plane *const pd = &xd->plane[plane];
+  const int dst_stride = pd->dst.stride;
+  uint8_t *dst =
+      &pd->dst.buf[(blk_row * dst_stride + blk_col) << tx_size_wide_log2[0]];
+  const PREDICTION_MODE mode =
+      (plane == AOM_PLANE_Y) ? mbmi->mode : get_uv_mode(mbmi->uv_mode);
+  const int use_palette = mbmi->palette_mode_info.palette_size[plane != 0] > 0;
+  const FILTER_INTRA_MODE filter_intra_mode =
+      (plane == AOM_PLANE_Y && mbmi->filter_intra_mode_info.use_filter_intra)
+          ? mbmi->filter_intra_mode_info.filter_intra_mode
+          : FILTER_INTRA_MODES;
+  const int angle_delta = mbmi->angle_delta[plane != AOM_PLANE_Y] * ANGLE_STEP;
+
+  if (plane != AOM_PLANE_Y && mbmi->uv_mode == UV_CFL_PRED) {
+#if CONFIG_DEBUG
+    assert(is_cfl_allowed(xd));
+    const BLOCK_SIZE plane_bsize = get_plane_block_size(
+        mbmi->sb_type, pd->subsampling_x, pd->subsampling_y);
+    (void)plane_bsize;
+    assert(plane_bsize < BLOCK_SIZES_ALL);
+    if (!xd->lossless[mbmi->segment_id]) {
+      assert(blk_col == 0);
+      assert(blk_row == 0);
+      assert(block_size_wide[plane_bsize] == tx_size_wide[tx_size]);
+      assert(block_size_high[plane_bsize] == tx_size_high[tx_size]);
+    }
+#endif
+    CFL_CTX *const cfl = &xd->cfl;
+    CFL_PRED_TYPE pred_plane = get_cfl_pred_type(plane);
+    if (cfl->dc_pred_is_cached[pred_plane] == 0) {
+      av1_predict_intra_block(cm, xd, pd->width, pd->height, tx_size, mode,
+                              angle_delta, use_palette, filter_intra_mode, dst,
+                              dst_stride, dst, dst_stride, blk_col, blk_row,
+                              plane);
+      if (cfl->use_dc_pred_cache) {
+        cfl_store_dc_pred(xd, dst, pred_plane, tx_size_wide[tx_size]);
+        cfl->dc_pred_is_cached[pred_plane] = 1;
+      }
+    } else {
+      cfl_load_dc_pred(xd, dst, dst_stride, tx_size, pred_plane);
+    }
+    cfl_predict_block(xd, dst, dst_stride, tx_size, plane);
+    return;
+  }
+  av1_predict_intra_block(cm, xd, pd->width, pd->height, tx_size, mode,
+                          angle_delta, use_palette, filter_intra_mode, dst,
+                          dst_stride, dst, dst_stride, blk_col, blk_row, plane);
+}
+
+void av1_init_intra_predictors(void) {
+  aom_once(init_intra_predictors_internal);
+}

diff --git a/libav1/av1/common/reconintra.h b/libav1/av1/common/reconintra.h
new file mode 100644
index 0000000..2f386e9
--- /dev/null
+++ b/libav1/av1/common/reconintra.h

@@ -0,0 +1,128 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_RECONINTRA_H_
+#define AOM_AV1_COMMON_RECONINTRA_H_
+
+#include <stdlib.h>
+
+#include "aom/aom_integer.h"
+#include "av1/common/blockd.h"
+#include "av1/common/onyxc_int.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void av1_init_intra_predictors(void);
+void av1_predict_intra_block_facade(const AV1_COMMON *cm, MACROBLOCKD *xd,
+                                    int plane, int blk_col, int blk_row,
+                                    TX_SIZE tx_size);
+void av1_predict_intra_block(const AV1_COMMON *cm, const MACROBLOCKD *xd,
+                             int bw, int bh, TX_SIZE tx_size,
+                             PREDICTION_MODE mode, int angle_delta,
+                             int use_palette,
+                             FILTER_INTRA_MODE filter_intra_mode,
+                             const uint8_t *ref, int ref_stride, uint8_t *dst,
+                             int dst_stride, int aoff, int loff, int plane);
+int get_filt_type(const MACROBLOCKD *xd, int plane);
+int has_bottom_left(const AV1_COMMON *cm, BLOCK_SIZE bsize, int mi_row,
+    int mi_col, int bottom_available, int left_available,
+    PARTITION_TYPE partition, TX_SIZE txsz, int row_off,
+    int col_off, int ss_x, int ss_y);
+int has_top_right(const AV1_COMMON *cm, BLOCK_SIZE bsize, int mi_row,
+    int mi_col, int top_available, int right_available,
+    PARTITION_TYPE partition, TX_SIZE txsz, int row_off,
+    int col_off, int ss_x, int ss_y);
+int intra_edge_filter_strength(int bs0, int bs1, int delta, int type);
+// Mapping of interintra to intra mode for use in the intra component
+static const PREDICTION_MODE interintra_to_intra_mode[INTERINTRA_MODES] = {
+  DC_PRED, V_PRED, H_PRED, SMOOTH_PRED
+};
+
+// Mapping of intra mode to the interintra mode
+static const INTERINTRA_MODE intra_to_interintra_mode[INTRA_MODES] = {
+  II_DC_PRED, II_V_PRED, II_H_PRED, II_V_PRED,      II_SMOOTH_PRED, II_V_PRED,
+  II_H_PRED,  II_H_PRED, II_V_PRED, II_SMOOTH_PRED, II_SMOOTH_PRED
+};
+
+#define FILTER_INTRA_SCALE_BITS 4
+
+static INLINE int av1_is_directional_mode(PREDICTION_MODE mode) {
+  return mode >= V_PRED && mode <= D67_PRED;
+}
+
+static INLINE int av1_use_angle_delta(BLOCK_SIZE bsize) {
+  return bsize >= BLOCK_8X8;
+}
+
+static INLINE int av1_allow_intrabc(const AV1_COMMON *const cm) {
+  return frame_is_intra_only(cm) && cm->allow_screen_content_tools &&
+         cm->allow_intrabc;
+}
+
+static INLINE int av1_filter_intra_allowed_bsize(const AV1_COMMON *const cm,
+                                                 BLOCK_SIZE bs) {
+  if (!cm->seq_params.enable_filter_intra || bs == BLOCK_INVALID) return 0;
+
+  return block_size_wide[bs] <= 32 && block_size_high[bs] <= 32;
+}
+
+static INLINE int av1_filter_intra_allowed(const AV1_COMMON *const cm,
+                                           const MB_MODE_INFO *mbmi) {
+  return mbmi->mode == DC_PRED &&
+         mbmi->palette_mode_info.palette_size[0] == 0 &&
+         av1_filter_intra_allowed_bsize(cm, mbmi->sb_type);
+}
+
+extern const int8_t av1_filter_intra_taps[FILTER_INTRA_MODES][8][8];
+
+// Get the shift (up-scaled by 256) in X w.r.t a unit change in Y.
+// If angle > 0 && angle < 90, dx = -((int)(256 / t));
+// If angle > 90 && angle < 180, dx = (int)(256 / t);
+// If angle > 180 && angle < 270, dx = 1;
+static INLINE int av1_get_dx(int angle) {
+  if (angle > 0 && angle < 90) {
+    return dr_intra_derivative[angle];
+  } else if (angle > 90 && angle < 180) {
+    return dr_intra_derivative[180 - angle];
+  } else {
+    // In this case, we are not really going to use dx. We may return any value.
+    return 1;
+  }
+}
+
+// Get the shift (up-scaled by 256) in Y w.r.t a unit change in X.
+// If angle > 0 && angle < 90, dy = 1;
+// If angle > 90 && angle < 180, dy = (int)(256 * t);
+// If angle > 180 && angle < 270, dy = -((int)(256 * t));
+static INLINE int av1_get_dy(int angle) {
+  if (angle > 90 && angle < 180) {
+    return dr_intra_derivative[angle - 90];
+  } else if (angle > 180 && angle < 270) {
+    return dr_intra_derivative[270 - angle];
+  } else {
+    // In this case, we are not really going to use dy. We may return any value.
+    return 1;
+  }
+}
+
+static INLINE int av1_use_intra_edge_upsample(int bs0, int bs1, int delta,
+                                              int type) {
+  const int d = abs(delta);
+  const int blk_wh = bs0 + bs1;
+  if (d == 0 || d >= 40) return 0;
+  return type ? (blk_wh <= 8) : (blk_wh <= 16);
+}
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+#endif  // AOM_AV1_COMMON_RECONINTRA_H_

diff --git a/libav1/av1/common/resize.c b/libav1/av1/common/resize.c
new file mode 100644
index 0000000..8b24ed0
--- /dev/null
+++ b/libav1/av1/common/resize.c

@@ -0,0 +1,1436 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <limits.h>
+#include <math.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "config/aom_config.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_ports/mem.h"
+#include "aom_scale/aom_scale.h"
+#include "av1/common/common.h"
+#include "av1/common/resize.h"
+
+#include "config/aom_scale_rtcd.h"
+
+// Filters for interpolation (0.5-band) - note this also filters integer pels.
+static const InterpKernel filteredinterp_filters500[(1 << RS_SUBPEL_BITS)] = {
+  { -3, 0, 35, 64, 35, 0, -3, 0 },    { -3, 0, 34, 64, 36, 0, -3, 0 },
+  { -3, -1, 34, 64, 36, 1, -3, 0 },   { -3, -1, 33, 64, 37, 1, -3, 0 },
+  { -3, -1, 32, 64, 38, 1, -3, 0 },   { -3, -1, 31, 64, 39, 1, -3, 0 },
+  { -3, -1, 31, 63, 39, 2, -3, 0 },   { -2, -2, 30, 63, 40, 2, -3, 0 },
+  { -2, -2, 29, 63, 41, 2, -3, 0 },   { -2, -2, 29, 63, 41, 3, -4, 0 },
+  { -2, -2, 28, 63, 42, 3, -4, 0 },   { -2, -2, 27, 63, 43, 3, -4, 0 },
+  { -2, -3, 27, 63, 43, 4, -4, 0 },   { -2, -3, 26, 62, 44, 5, -4, 0 },
+  { -2, -3, 25, 62, 45, 5, -4, 0 },   { -2, -3, 25, 62, 45, 5, -4, 0 },
+  { -2, -3, 24, 62, 46, 5, -4, 0 },   { -2, -3, 23, 61, 47, 6, -4, 0 },
+  { -2, -3, 23, 61, 47, 6, -4, 0 },   { -2, -3, 22, 61, 48, 7, -4, -1 },
+  { -2, -3, 21, 60, 49, 7, -4, 0 },   { -1, -4, 20, 60, 49, 8, -4, 0 },
+  { -1, -4, 20, 60, 50, 8, -4, -1 },  { -1, -4, 19, 59, 51, 9, -4, -1 },
+  { -1, -4, 19, 59, 51, 9, -4, -1 },  { -1, -4, 18, 58, 52, 10, -4, -1 },
+  { -1, -4, 17, 58, 52, 11, -4, -1 }, { -1, -4, 16, 58, 53, 11, -4, -1 },
+  { -1, -4, 16, 57, 53, 12, -4, -1 }, { -1, -4, 15, 57, 54, 12, -4, -1 },
+  { -1, -4, 15, 56, 54, 13, -4, -1 }, { -1, -4, 14, 56, 55, 13, -4, -1 },
+  { -1, -4, 14, 55, 55, 14, -4, -1 }, { -1, -4, 13, 55, 56, 14, -4, -1 },
+  { -1, -4, 13, 54, 56, 15, -4, -1 }, { -1, -4, 12, 54, 57, 15, -4, -1 },
+  { -1, -4, 12, 53, 57, 16, -4, -1 }, { -1, -4, 11, 53, 58, 16, -4, -1 },
+  { -1, -4, 11, 52, 58, 17, -4, -1 }, { -1, -4, 10, 52, 58, 18, -4, -1 },
+  { -1, -4, 9, 51, 59, 19, -4, -1 },  { -1, -4, 9, 51, 59, 19, -4, -1 },
+  { -1, -4, 8, 50, 60, 20, -4, -1 },  { 0, -4, 8, 49, 60, 20, -4, -1 },
+  { 0, -4, 7, 49, 60, 21, -3, -2 },   { -1, -4, 7, 48, 61, 22, -3, -2 },
+  { 0, -4, 6, 47, 61, 23, -3, -2 },   { 0, -4, 6, 47, 61, 23, -3, -2 },
+  { 0, -4, 5, 46, 62, 24, -3, -2 },   { 0, -4, 5, 45, 62, 25, -3, -2 },
+  { 0, -4, 5, 45, 62, 25, -3, -2 },   { 0, -4, 5, 44, 62, 26, -3, -2 },
+  { 0, -4, 4, 43, 63, 27, -3, -2 },   { 0, -4, 3, 43, 63, 27, -2, -2 },
+  { 0, -4, 3, 42, 63, 28, -2, -2 },   { 0, -4, 3, 41, 63, 29, -2, -2 },
+  { 0, -3, 2, 41, 63, 29, -2, -2 },   { 0, -3, 2, 40, 63, 30, -2, -2 },
+  { 0, -3, 2, 39, 63, 31, -1, -3 },   { 0, -3, 1, 39, 64, 31, -1, -3 },
+  { 0, -3, 1, 38, 64, 32, -1, -3 },   { 0, -3, 1, 37, 64, 33, -1, -3 },
+  { 0, -3, 1, 36, 64, 34, -1, -3 },   { 0, -3, 0, 36, 64, 34, 0, -3 },
+};
+
+// Filters for interpolation (0.625-band) - note this also filters integer pels.
+static const InterpKernel filteredinterp_filters625[(1 << RS_SUBPEL_BITS)] = {
+  { -1, -8, 33, 80, 33, -8, -1, 0 }, { -1, -8, 31, 80, 34, -8, -1, 1 },
+  { -1, -8, 30, 80, 35, -8, -1, 1 }, { -1, -8, 29, 80, 36, -7, -2, 1 },
+  { -1, -8, 28, 80, 37, -7, -2, 1 }, { -1, -8, 27, 80, 38, -7, -2, 1 },
+  { 0, -8, 26, 79, 39, -7, -2, 1 },  { 0, -8, 25, 79, 40, -7, -2, 1 },
+  { 0, -8, 24, 79, 41, -7, -2, 1 },  { 0, -8, 23, 78, 42, -6, -2, 1 },
+  { 0, -8, 22, 78, 43, -6, -2, 1 },  { 0, -8, 21, 78, 44, -6, -2, 1 },
+  { 0, -8, 20, 78, 45, -5, -3, 1 },  { 0, -8, 19, 77, 47, -5, -3, 1 },
+  { 0, -8, 18, 77, 48, -5, -3, 1 },  { 0, -8, 17, 77, 49, -5, -3, 1 },
+  { 0, -8, 16, 76, 50, -4, -3, 1 },  { 0, -8, 15, 76, 51, -4, -3, 1 },
+  { 0, -8, 15, 75, 52, -3, -4, 1 },  { 0, -7, 14, 74, 53, -3, -4, 1 },
+  { 0, -7, 13, 74, 54, -3, -4, 1 },  { 0, -7, 12, 73, 55, -2, -4, 1 },
+  { 0, -7, 11, 73, 56, -2, -4, 1 },  { 0, -7, 10, 72, 57, -1, -4, 1 },
+  { 1, -7, 10, 71, 58, -1, -5, 1 },  { 0, -7, 9, 71, 59, 0, -5, 1 },
+  { 1, -7, 8, 70, 60, 0, -5, 1 },    { 1, -7, 7, 69, 61, 1, -5, 1 },
+  { 1, -6, 6, 68, 62, 1, -5, 1 },    { 0, -6, 6, 68, 62, 2, -5, 1 },
+  { 1, -6, 5, 67, 63, 2, -5, 1 },    { 1, -6, 5, 66, 64, 3, -6, 1 },
+  { 1, -6, 4, 65, 65, 4, -6, 1 },    { 1, -6, 3, 64, 66, 5, -6, 1 },
+  { 1, -5, 2, 63, 67, 5, -6, 1 },    { 1, -5, 2, 62, 68, 6, -6, 0 },
+  { 1, -5, 1, 62, 68, 6, -6, 1 },    { 1, -5, 1, 61, 69, 7, -7, 1 },
+  { 1, -5, 0, 60, 70, 8, -7, 1 },    { 1, -5, 0, 59, 71, 9, -7, 0 },
+  { 1, -5, -1, 58, 71, 10, -7, 1 },  { 1, -4, -1, 57, 72, 10, -7, 0 },
+  { 1, -4, -2, 56, 73, 11, -7, 0 },  { 1, -4, -2, 55, 73, 12, -7, 0 },
+  { 1, -4, -3, 54, 74, 13, -7, 0 },  { 1, -4, -3, 53, 74, 14, -7, 0 },
+  { 1, -4, -3, 52, 75, 15, -8, 0 },  { 1, -3, -4, 51, 76, 15, -8, 0 },
+  { 1, -3, -4, 50, 76, 16, -8, 0 },  { 1, -3, -5, 49, 77, 17, -8, 0 },
+  { 1, -3, -5, 48, 77, 18, -8, 0 },  { 1, -3, -5, 47, 77, 19, -8, 0 },
+  { 1, -3, -5, 45, 78, 20, -8, 0 },  { 1, -2, -6, 44, 78, 21, -8, 0 },
+  { 1, -2, -6, 43, 78, 22, -8, 0 },  { 1, -2, -6, 42, 78, 23, -8, 0 },
+  { 1, -2, -7, 41, 79, 24, -8, 0 },  { 1, -2, -7, 40, 79, 25, -8, 0 },
+  { 1, -2, -7, 39, 79, 26, -8, 0 },  { 1, -2, -7, 38, 80, 27, -8, -1 },
+  { 1, -2, -7, 37, 80, 28, -8, -1 }, { 1, -2, -7, 36, 80, 29, -8, -1 },
+  { 1, -1, -8, 35, 80, 30, -8, -1 }, { 1, -1, -8, 34, 80, 31, -8, -1 },
+};
+
+// Filters for interpolation (0.75-band) - note this also filters integer pels.
+static const InterpKernel filteredinterp_filters750[(1 << RS_SUBPEL_BITS)] = {
+  { 2, -11, 25, 96, 25, -11, 2, 0 }, { 2, -11, 24, 96, 26, -11, 2, 0 },
+  { 2, -11, 22, 96, 28, -11, 2, 0 }, { 2, -10, 21, 96, 29, -12, 2, 0 },
+  { 2, -10, 19, 96, 31, -12, 2, 0 }, { 2, -10, 18, 95, 32, -11, 2, 0 },
+  { 2, -10, 17, 95, 34, -12, 2, 0 }, { 2, -9, 15, 95, 35, -12, 2, 0 },
+  { 2, -9, 14, 94, 37, -12, 2, 0 },  { 2, -9, 13, 94, 38, -12, 2, 0 },
+  { 2, -8, 12, 93, 40, -12, 1, 0 },  { 2, -8, 11, 93, 41, -12, 1, 0 },
+  { 2, -8, 9, 92, 43, -12, 1, 1 },   { 2, -8, 8, 92, 44, -12, 1, 1 },
+  { 2, -7, 7, 91, 46, -12, 1, 0 },   { 2, -7, 6, 90, 47, -12, 1, 1 },
+  { 2, -7, 5, 90, 49, -12, 1, 0 },   { 2, -6, 4, 89, 50, -12, 1, 0 },
+  { 2, -6, 3, 88, 52, -12, 0, 1 },   { 2, -6, 2, 87, 54, -12, 0, 1 },
+  { 2, -5, 1, 86, 55, -12, 0, 1 },   { 2, -5, 0, 85, 57, -12, 0, 1 },
+  { 2, -5, -1, 84, 58, -11, 0, 1 },  { 2, -5, -2, 83, 60, -11, 0, 1 },
+  { 2, -4, -2, 82, 61, -11, -1, 1 }, { 1, -4, -3, 81, 63, -10, -1, 1 },
+  { 2, -4, -4, 80, 64, -10, -1, 1 }, { 1, -4, -4, 79, 66, -10, -1, 1 },
+  { 1, -3, -5, 77, 67, -9, -1, 1 },  { 1, -3, -6, 76, 69, -9, -1, 1 },
+  { 1, -3, -6, 75, 70, -8, -2, 1 },  { 1, -2, -7, 74, 71, -8, -2, 1 },
+  { 1, -2, -7, 72, 72, -7, -2, 1 },  { 1, -2, -8, 71, 74, -7, -2, 1 },
+  { 1, -2, -8, 70, 75, -6, -3, 1 },  { 1, -1, -9, 69, 76, -6, -3, 1 },
+  { 1, -1, -9, 67, 77, -5, -3, 1 },  { 1, -1, -10, 66, 79, -4, -4, 1 },
+  { 1, -1, -10, 64, 80, -4, -4, 2 }, { 1, -1, -10, 63, 81, -3, -4, 1 },
+  { 1, -1, -11, 61, 82, -2, -4, 2 }, { 1, 0, -11, 60, 83, -2, -5, 2 },
+  { 1, 0, -11, 58, 84, -1, -5, 2 },  { 1, 0, -12, 57, 85, 0, -5, 2 },
+  { 1, 0, -12, 55, 86, 1, -5, 2 },   { 1, 0, -12, 54, 87, 2, -6, 2 },
+  { 1, 0, -12, 52, 88, 3, -6, 2 },   { 0, 1, -12, 50, 89, 4, -6, 2 },
+  { 0, 1, -12, 49, 90, 5, -7, 2 },   { 1, 1, -12, 47, 90, 6, -7, 2 },
+  { 0, 1, -12, 46, 91, 7, -7, 2 },   { 1, 1, -12, 44, 92, 8, -8, 2 },
+  { 1, 1, -12, 43, 92, 9, -8, 2 },   { 0, 1, -12, 41, 93, 11, -8, 2 },
+  { 0, 1, -12, 40, 93, 12, -8, 2 },  { 0, 2, -12, 38, 94, 13, -9, 2 },
+  { 0, 2, -12, 37, 94, 14, -9, 2 },  { 0, 2, -12, 35, 95, 15, -9, 2 },
+  { 0, 2, -12, 34, 95, 17, -10, 2 }, { 0, 2, -11, 32, 95, 18, -10, 2 },
+  { 0, 2, -12, 31, 96, 19, -10, 2 }, { 0, 2, -12, 29, 96, 21, -10, 2 },
+  { 0, 2, -11, 28, 96, 22, -11, 2 }, { 0, 2, -11, 26, 96, 24, -11, 2 },
+};
+
+// Filters for interpolation (0.875-band) - note this also filters integer pels.
+static const InterpKernel filteredinterp_filters875[(1 << RS_SUBPEL_BITS)] = {
+  { 3, -8, 13, 112, 13, -8, 3, 0 },   { 2, -7, 12, 112, 15, -8, 3, -1 },
+  { 3, -7, 10, 112, 17, -9, 3, -1 },  { 2, -6, 8, 112, 19, -9, 3, -1 },
+  { 2, -6, 7, 112, 21, -10, 3, -1 },  { 2, -5, 6, 111, 22, -10, 3, -1 },
+  { 2, -5, 4, 111, 24, -10, 3, -1 },  { 2, -4, 3, 110, 26, -11, 3, -1 },
+  { 2, -4, 1, 110, 28, -11, 3, -1 },  { 2, -4, 0, 109, 30, -12, 4, -1 },
+  { 1, -3, -1, 108, 32, -12, 4, -1 }, { 1, -3, -2, 108, 34, -13, 4, -1 },
+  { 1, -2, -4, 107, 36, -13, 4, -1 }, { 1, -2, -5, 106, 38, -13, 4, -1 },
+  { 1, -1, -6, 105, 40, -14, 4, -1 }, { 1, -1, -7, 104, 42, -14, 4, -1 },
+  { 1, -1, -7, 103, 44, -15, 4, -1 }, { 1, 0, -8, 101, 46, -15, 4, -1 },
+  { 1, 0, -9, 100, 48, -15, 4, -1 },  { 1, 0, -10, 99, 50, -15, 4, -1 },
+  { 1, 1, -11, 97, 53, -16, 4, -1 },  { 0, 1, -11, 96, 55, -16, 4, -1 },
+  { 0, 1, -12, 95, 57, -16, 4, -1 },  { 0, 2, -13, 93, 59, -16, 4, -1 },
+  { 0, 2, -13, 91, 61, -16, 4, -1 },  { 0, 2, -14, 90, 63, -16, 4, -1 },
+  { 0, 2, -14, 88, 65, -16, 4, -1 },  { 0, 2, -15, 86, 67, -16, 4, 0 },
+  { 0, 3, -15, 84, 69, -17, 4, 0 },   { 0, 3, -16, 83, 71, -17, 4, 0 },
+  { 0, 3, -16, 81, 73, -16, 3, 0 },   { 0, 3, -16, 79, 75, -16, 3, 0 },
+  { 0, 3, -16, 77, 77, -16, 3, 0 },   { 0, 3, -16, 75, 79, -16, 3, 0 },
+  { 0, 3, -16, 73, 81, -16, 3, 0 },   { 0, 4, -17, 71, 83, -16, 3, 0 },
+  { 0, 4, -17, 69, 84, -15, 3, 0 },   { 0, 4, -16, 67, 86, -15, 2, 0 },
+  { -1, 4, -16, 65, 88, -14, 2, 0 },  { -1, 4, -16, 63, 90, -14, 2, 0 },
+  { -1, 4, -16, 61, 91, -13, 2, 0 },  { -1, 4, -16, 59, 93, -13, 2, 0 },
+  { -1, 4, -16, 57, 95, -12, 1, 0 },  { -1, 4, -16, 55, 96, -11, 1, 0 },
+  { -1, 4, -16, 53, 97, -11, 1, 1 },  { -1, 4, -15, 50, 99, -10, 0, 1 },
+  { -1, 4, -15, 48, 100, -9, 0, 1 },  { -1, 4, -15, 46, 101, -8, 0, 1 },
+  { -1, 4, -15, 44, 103, -7, -1, 1 }, { -1, 4, -14, 42, 104, -7, -1, 1 },
+  { -1, 4, -14, 40, 105, -6, -1, 1 }, { -1, 4, -13, 38, 106, -5, -2, 1 },
+  { -1, 4, -13, 36, 107, -4, -2, 1 }, { -1, 4, -13, 34, 108, -2, -3, 1 },
+  { -1, 4, -12, 32, 108, -1, -3, 1 }, { -1, 4, -12, 30, 109, 0, -4, 2 },
+  { -1, 3, -11, 28, 110, 1, -4, 2 },  { -1, 3, -11, 26, 110, 3, -4, 2 },
+  { -1, 3, -10, 24, 111, 4, -5, 2 },  { -1, 3, -10, 22, 111, 6, -5, 2 },
+  { -1, 3, -10, 21, 112, 7, -6, 2 },  { -1, 3, -9, 19, 112, 8, -6, 2 },
+  { -1, 3, -9, 17, 112, 10, -7, 3 },  { -1, 3, -8, 15, 112, 12, -7, 2 },
+};
+
+const int16_t av1_resize_filter_normative[(
+    1 << RS_SUBPEL_BITS)][UPSCALE_NORMATIVE_TAPS] = {
+#if UPSCALE_NORMATIVE_TAPS == 8
+  { 0, 0, 0, 128, 0, 0, 0, 0 },        { 0, 0, -1, 128, 2, -1, 0, 0 },
+  { 0, 1, -3, 127, 4, -2, 1, 0 },      { 0, 1, -4, 127, 6, -3, 1, 0 },
+  { 0, 2, -6, 126, 8, -3, 1, 0 },      { 0, 2, -7, 125, 11, -4, 1, 0 },
+  { -1, 2, -8, 125, 13, -5, 2, 0 },    { -1, 3, -9, 124, 15, -6, 2, 0 },
+  { -1, 3, -10, 123, 18, -6, 2, -1 },  { -1, 3, -11, 122, 20, -7, 3, -1 },
+  { -1, 4, -12, 121, 22, -8, 3, -1 },  { -1, 4, -13, 120, 25, -9, 3, -1 },
+  { -1, 4, -14, 118, 28, -9, 3, -1 },  { -1, 4, -15, 117, 30, -10, 4, -1 },
+  { -1, 5, -16, 116, 32, -11, 4, -1 }, { -1, 5, -16, 114, 35, -12, 4, -1 },
+  { -1, 5, -17, 112, 38, -12, 4, -1 }, { -1, 5, -18, 111, 40, -13, 5, -1 },
+  { -1, 5, -18, 109, 43, -14, 5, -1 }, { -1, 6, -19, 107, 45, -14, 5, -1 },
+  { -1, 6, -19, 105, 48, -15, 5, -1 }, { -1, 6, -19, 103, 51, -16, 5, -1 },
+  { -1, 6, -20, 101, 53, -16, 6, -1 }, { -1, 6, -20, 99, 56, -17, 6, -1 },
+  { -1, 6, -20, 97, 58, -17, 6, -1 },  { -1, 6, -20, 95, 61, -18, 6, -1 },
+  { -2, 7, -20, 93, 64, -18, 6, -2 },  { -2, 7, -20, 91, 66, -19, 6, -1 },
+  { -2, 7, -20, 88, 69, -19, 6, -1 },  { -2, 7, -20, 86, 71, -19, 6, -1 },
+  { -2, 7, -20, 84, 74, -20, 7, -2 },  { -2, 7, -20, 81, 76, -20, 7, -1 },
+  { -2, 7, -20, 79, 79, -20, 7, -2 },  { -1, 7, -20, 76, 81, -20, 7, -2 },
+  { -2, 7, -20, 74, 84, -20, 7, -2 },  { -1, 6, -19, 71, 86, -20, 7, -2 },
+  { -1, 6, -19, 69, 88, -20, 7, -2 },  { -1, 6, -19, 66, 91, -20, 7, -2 },
+  { -2, 6, -18, 64, 93, -20, 7, -2 },  { -1, 6, -18, 61, 95, -20, 6, -1 },
+  { -1, 6, -17, 58, 97, -20, 6, -1 },  { -1, 6, -17, 56, 99, -20, 6, -1 },
+  { -1, 6, -16, 53, 101, -20, 6, -1 }, { -1, 5, -16, 51, 103, -19, 6, -1 },
+  { -1, 5, -15, 48, 105, -19, 6, -1 }, { -1, 5, -14, 45, 107, -19, 6, -1 },
+  { -1, 5, -14, 43, 109, -18, 5, -1 }, { -1, 5, -13, 40, 111, -18, 5, -1 },
+  { -1, 4, -12, 38, 112, -17, 5, -1 }, { -1, 4, -12, 35, 114, -16, 5, -1 },
+  { -1, 4, -11, 32, 116, -16, 5, -1 }, { -1, 4, -10, 30, 117, -15, 4, -1 },
+  { -1, 3, -9, 28, 118, -14, 4, -1 },  { -1, 3, -9, 25, 120, -13, 4, -1 },
+  { -1, 3, -8, 22, 121, -12, 4, -1 },  { -1, 3, -7, 20, 122, -11, 3, -1 },
+  { -1, 2, -6, 18, 123, -10, 3, -1 },  { 0, 2, -6, 15, 124, -9, 3, -1 },
+  { 0, 2, -5, 13, 125, -8, 2, -1 },    { 0, 1, -4, 11, 125, -7, 2, 0 },
+  { 0, 1, -3, 8, 126, -6, 2, 0 },      { 0, 1, -3, 6, 127, -4, 1, 0 },
+  { 0, 1, -2, 4, 127, -3, 1, 0 },      { 0, 0, -1, 2, 128, -1, 0, 0 },
+#else
+#error "Invalid value of UPSCALE_NORMATIVE_TAPS"
+#endif  // UPSCALE_NORMATIVE_TAPS == 8
+};
+
+// Filters for interpolation (full-band) - no filtering for integer pixels
+#define filteredinterp_filters1000 av1_resize_filter_normative
+
+// Filters for factor of 2 downsampling.
+static const int16_t av1_down2_symeven_half_filter[] = { 56, 12, -3, -1 };
+static const int16_t av1_down2_symodd_half_filter[] = { 64, 35, 0, -3 };
+
+static const InterpKernel *choose_interp_filter(int in_length, int out_length) {
+  int out_length16 = out_length * 16;
+  if (out_length16 >= in_length * 16)
+    return filteredinterp_filters1000;
+  else if (out_length16 >= in_length * 13)
+    return filteredinterp_filters875;
+  else if (out_length16 >= in_length * 11)
+    return filteredinterp_filters750;
+  else if (out_length16 >= in_length * 9)
+    return filteredinterp_filters625;
+  else
+    return filteredinterp_filters500;
+}
+
+static void interpolate_core(const uint8_t *const input, int in_length,
+                             uint8_t *output, int out_length,
+                             const int16_t *interp_filters, int interp_taps) {
+  const int32_t delta =
+      (((uint32_t)in_length << RS_SCALE_SUBPEL_BITS) + out_length / 2) /
+      out_length;
+  const int32_t offset =
+      in_length > out_length
+          ? (((int32_t)(in_length - out_length) << (RS_SCALE_SUBPEL_BITS - 1)) +
+             out_length / 2) /
+                out_length
+          : -(((int32_t)(out_length - in_length)
+               << (RS_SCALE_SUBPEL_BITS - 1)) +
+              out_length / 2) /
+                out_length;
+  uint8_t *optr = output;
+  int x, x1, x2, sum, k, int_pel, sub_pel;
+  int32_t y;
+
+  x = 0;
+  y = offset + RS_SCALE_EXTRA_OFF;
+  while ((y >> RS_SCALE_SUBPEL_BITS) < (interp_taps / 2 - 1)) {
+    x++;
+    y += delta;
+  }
+  x1 = x;
+  x = out_length - 1;
+  y = delta * x + offset + RS_SCALE_EXTRA_OFF;
+  while ((y >> RS_SCALE_SUBPEL_BITS) + (int32_t)(interp_taps / 2) >=
+         in_length) {
+    x--;
+    y -= delta;
+  }
+  x2 = x;
+  if (x1 > x2) {
+    for (x = 0, y = offset + RS_SCALE_EXTRA_OFF; x < out_length;
+         ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k) {
+        const int pk = int_pel - interp_taps / 2 + 1 + k;
+        sum += filter[k] * input[AOMMAX(AOMMIN(pk, in_length - 1), 0)];
+      }
+      *optr++ = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
+    }
+  } else {
+    // Initial part.
+    for (x = 0, y = offset + RS_SCALE_EXTRA_OFF; x < x1; ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k)
+        sum += filter[k] * input[AOMMAX(int_pel - interp_taps / 2 + 1 + k, 0)];
+      *optr++ = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
+    }
+    // Middle part.
+    for (; x <= x2; ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k)
+        sum += filter[k] * input[int_pel - interp_taps / 2 + 1 + k];
+      *optr++ = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
+    }
+    // End part.
+    for (; x < out_length; ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k)
+        sum += filter[k] *
+               input[AOMMIN(int_pel - interp_taps / 2 + 1 + k, in_length - 1)];
+      *optr++ = clip_pixel(ROUND_POWER_OF_TWO(sum, FILTER_BITS));
+    }
+  }
+}
+
+static void interpolate_core_double_prec(const double *const input,
+                                         int in_length, double *output,
+                                         int out_length,
+                                         const int16_t *interp_filters,
+                                         int interp_taps) {
+  const int32_t delta =
+      (((uint32_t)in_length << RS_SCALE_SUBPEL_BITS) + out_length / 2) /
+      out_length;
+  const int32_t offset =
+      in_length > out_length
+          ? (((int32_t)(in_length - out_length) << (RS_SCALE_SUBPEL_BITS - 1)) +
+             out_length / 2) /
+                out_length
+          : -(((int32_t)(out_length - in_length)
+               << (RS_SCALE_SUBPEL_BITS - 1)) +
+              out_length / 2) /
+                out_length;
+  double *optr = output;
+  int x, x1, x2, k, int_pel, sub_pel;
+  double sum;
+  int32_t y;
+
+  x = 0;
+  y = offset + RS_SCALE_EXTRA_OFF;
+  while ((y >> RS_SCALE_SUBPEL_BITS) < (interp_taps / 2 - 1)) {
+    x++;
+    y += delta;
+  }
+  x1 = x;
+  x = out_length - 1;
+  y = delta * x + offset + RS_SCALE_EXTRA_OFF;
+  while ((y >> RS_SCALE_SUBPEL_BITS) + (int32_t)(interp_taps / 2) >=
+         in_length) {
+    x--;
+    y -= delta;
+  }
+  x2 = x;
+  if (x1 > x2) {
+    for (x = 0, y = offset + RS_SCALE_EXTRA_OFF; x < out_length;
+         ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k) {
+        const int pk = int_pel - interp_taps / 2 + 1 + k;
+        sum += filter[k] * input[AOMMAX(AOMMIN(pk, in_length - 1), 0)];
+      }
+      *optr++ = sum / (1 << FILTER_BITS);
+    }
+  } else {
+    // Initial part.
+    for (x = 0, y = offset + RS_SCALE_EXTRA_OFF; x < x1; ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k)
+        sum += filter[k] * input[AOMMAX(int_pel - interp_taps / 2 + 1 + k, 0)];
+      *optr++ = sum / (1 << FILTER_BITS);
+    }
+    // Middle part.
+    for (; x <= x2; ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k)
+        sum += filter[k] * input[int_pel - interp_taps / 2 + 1 + k];
+      *optr++ = sum / (1 << FILTER_BITS);
+    }
+    // End part.
+    for (; x < out_length; ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k)
+        sum += filter[k] *
+               input[AOMMIN(int_pel - interp_taps / 2 + 1 + k, in_length - 1)];
+      *optr++ = sum / (1 << FILTER_BITS);
+    }
+  }
+}
+
+static void interpolate(const uint8_t *const input, int in_length,
+                        uint8_t *output, int out_length) {
+  const InterpKernel *interp_filters =
+      choose_interp_filter(in_length, out_length);
+
+  interpolate_core(input, in_length, output, out_length, &interp_filters[0][0],
+                   SUBPEL_TAPS);
+}
+
+static void interpolate_double_prec(const double *const input, int in_length,
+                                    double *output, int out_length) {
+  const InterpKernel *interp_filters =
+      choose_interp_filter(in_length, out_length);
+
+  interpolate_core_double_prec(input, in_length, output, out_length,
+                               &interp_filters[0][0], SUBPEL_TAPS);
+}
+
+int32_t av1_get_upscale_convolve_step(int in_length, int out_length) {
+  return ((in_length << RS_SCALE_SUBPEL_BITS) + out_length / 2) / out_length;
+}
+
+static int32_t get_upscale_convolve_x0(int in_length, int out_length,
+                                       int32_t x_step_qn) {
+  const int err = out_length * x_step_qn - (in_length << RS_SCALE_SUBPEL_BITS);
+  const int32_t x0 =
+      (-((out_length - in_length) << (RS_SCALE_SUBPEL_BITS - 1)) +
+       out_length / 2) /
+          out_length +
+      RS_SCALE_EXTRA_OFF - err / 2;
+  return (int32_t)((uint32_t)x0 & RS_SCALE_SUBPEL_MASK);
+}
+
+#ifndef __clang_analyzer__
+static void down2_symeven(const uint8_t *const input, int length,
+                          uint8_t *output) {
+  // Actual filter len = 2 * filter_len_half.
+  const int16_t *filter = av1_down2_symeven_half_filter;
+  const int filter_len_half = sizeof(av1_down2_symeven_half_filter) / 2;
+  int i, j;
+  uint8_t *optr = output;
+  int l1 = filter_len_half;
+  int l2 = (length - filter_len_half);
+  l1 += (l1 & 1);
+  l2 += (l2 & 1);
+  if (l1 > l2) {
+    // Short input length.
+    for (i = 0; i < length; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1));
+      for (j = 0; j < filter_len_half; ++j) {
+        sum +=
+            (input[AOMMAX(i - j, 0)] + input[AOMMIN(i + 1 + j, length - 1)]) *
+            filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel(sum);
+    }
+  } else {
+    // Initial part.
+    for (i = 0; i < l1; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1));
+      for (j = 0; j < filter_len_half; ++j) {
+        sum += (input[AOMMAX(i - j, 0)] + input[i + 1 + j]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel(sum);
+    }
+    // Middle part.
+    for (; i < l2; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1));
+      for (j = 0; j < filter_len_half; ++j) {
+        sum += (input[i - j] + input[i + 1 + j]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel(sum);
+    }
+    // End part.
+    for (; i < length; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1));
+      for (j = 0; j < filter_len_half; ++j) {
+        sum +=
+            (input[i - j] + input[AOMMIN(i + 1 + j, length - 1)]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel(sum);
+    }
+  }
+}
+#endif
+
+static void down2_symodd(const uint8_t *const input, int length,
+                         uint8_t *output) {
+  // Actual filter len = 2 * filter_len_half - 1.
+  const int16_t *filter = av1_down2_symodd_half_filter;
+  const int filter_len_half = sizeof(av1_down2_symodd_half_filter) / 2;
+  int i, j;
+  uint8_t *optr = output;
+  int l1 = filter_len_half - 1;
+  int l2 = (length - filter_len_half + 1);
+  l1 += (l1 & 1);
+  l2 += (l2 & 1);
+  if (l1 > l2) {
+    // Short input length.
+    for (i = 0; i < length; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1)) + input[i] * filter[0];
+      for (j = 1; j < filter_len_half; ++j) {
+        sum += (input[(i - j < 0 ? 0 : i - j)] +
+                input[(i + j >= length ? length - 1 : i + j)]) *
+               filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel(sum);
+    }
+  } else {
+    // Initial part.
+    for (i = 0; i < l1; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1)) + input[i] * filter[0];
+      for (j = 1; j < filter_len_half; ++j) {
+        sum += (input[(i - j < 0 ? 0 : i - j)] + input[i + j]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel(sum);
+    }
+    // Middle part.
+    for (; i < l2; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1)) + input[i] * filter[0];
+      for (j = 1; j < filter_len_half; ++j) {
+        sum += (input[i - j] + input[i + j]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel(sum);
+    }
+    // End part.
+    for (; i < length; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1)) + input[i] * filter[0];
+      for (j = 1; j < filter_len_half; ++j) {
+        sum += (input[i - j] + input[(i + j >= length ? length - 1 : i + j)]) *
+               filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel(sum);
+    }
+  }
+}
+
+static int get_down2_length(int length, int steps) {
+  for (int s = 0; s < steps; ++s) length = (length + 1) >> 1;
+  return length;
+}
+
+static int get_down2_steps(int in_length, int out_length) {
+  int steps = 0;
+  int proj_in_length;
+  while ((proj_in_length = get_down2_length(in_length, 1)) >= out_length) {
+    ++steps;
+    in_length = proj_in_length;
+    if (in_length == 1) {
+      // Special case: we break because any further calls to get_down2_length()
+      // with be with length == 1, which return 1, resulting in an infinite
+      // loop.
+      break;
+    }
+  }
+  return steps;
+}
+
+static void resize_multistep(const uint8_t *const input, int length,
+                             uint8_t *output, int olength, uint8_t *otmp) {
+  if (length == olength) {
+    memcpy(output, input, sizeof(output[0]) * length);
+    return;
+  }
+  const int steps = get_down2_steps(length, olength);
+
+  if (steps > 0) {
+    uint8_t *out = NULL;
+    int filteredlength = length;
+
+    assert(otmp != NULL);
+    uint8_t *otmp2 = otmp + get_down2_length(length, 1);
+    for (int s = 0; s < steps; ++s) {
+      const int proj_filteredlength = get_down2_length(filteredlength, 1);
+      const uint8_t *const in = (s == 0 ? input : out);
+      if (s == steps - 1 && proj_filteredlength == olength)
+        out = output;
+      else
+        out = (s & 1 ? otmp2 : otmp);
+      if (filteredlength & 1)
+        down2_symodd(in, filteredlength, out);
+      else
+        down2_symeven(in, filteredlength, out);
+      filteredlength = proj_filteredlength;
+    }
+    if (filteredlength != olength) {
+      interpolate(out, filteredlength, output, olength);
+    }
+  } else {
+    interpolate(input, length, output, olength);
+  }
+}
+
+static void upscale_multistep_double_prec(const double *const input, int length,
+                                          double *output, int olength) {
+  assert(length < olength);
+  interpolate_double_prec(input, length, output, olength);
+}
+
+static void fill_col_to_arr(uint8_t *img, int stride, int len, uint8_t *arr) {
+  int i;
+  uint8_t *iptr = img;
+  uint8_t *aptr = arr;
+  for (i = 0; i < len; ++i, iptr += stride) {
+    *aptr++ = *iptr;
+  }
+}
+
+static void fill_arr_to_col(uint8_t *img, int stride, int len, uint8_t *arr) {
+  int i;
+  uint8_t *iptr = img;
+  uint8_t *aptr = arr;
+  for (i = 0; i < len; ++i, iptr += stride) {
+    *iptr = *aptr++;
+  }
+}
+
+static void fill_col_to_arr_double_prec(double *img, int stride, int len,
+                                        double *arr) {
+  int i;
+  double *iptr = img;
+  double *aptr = arr;
+  for (i = 0; i < len; ++i, iptr += stride) {
+    *aptr++ = *iptr;
+  }
+}
+
+static void fill_arr_to_col_double_prec(double *img, int stride, int len,
+                                        double *arr) {
+  int i;
+  double *iptr = img;
+  double *aptr = arr;
+  for (i = 0; i < len; ++i, iptr += stride) {
+    *iptr = *aptr++;
+  }
+}
+
+void av1_resize_plane(const uint8_t *const input, int height, int width,
+                      int in_stride, uint8_t *output, int height2, int width2,
+                      int out_stride) {
+  int i;
+  uint8_t *intbuf = (uint8_t *)aom_malloc(sizeof(uint8_t) * width2 * height);
+  uint8_t *tmpbuf =
+      (uint8_t *)aom_malloc(sizeof(uint8_t) * AOMMAX(width, height));
+  uint8_t *arrbuf = (uint8_t *)aom_malloc(sizeof(uint8_t) * height);
+  uint8_t *arrbuf2 = (uint8_t *)aom_malloc(sizeof(uint8_t) * height2);
+  if (intbuf == NULL || tmpbuf == NULL || arrbuf == NULL || arrbuf2 == NULL)
+    goto Error;
+  assert(width > 0);
+  assert(height > 0);
+  assert(width2 > 0);
+  assert(height2 > 0);
+  for (i = 0; i < height; ++i)
+    resize_multistep(input + in_stride * i, width, intbuf + width2 * i, width2,
+                     tmpbuf);
+  for (i = 0; i < width2; ++i) {
+    fill_col_to_arr(intbuf + i, width2, height, arrbuf);
+    resize_multistep(arrbuf, height, arrbuf2, height2, tmpbuf);
+    fill_arr_to_col(output + i, out_stride, height2, arrbuf2);
+  }
+
+Error:
+  aom_free(intbuf);
+  aom_free(tmpbuf);
+  aom_free(arrbuf);
+  aom_free(arrbuf2);
+}
+
+void av1_upscale_plane_double_prec(const double *const input, int height,
+                                   int width, int in_stride, double *output,
+                                   int height2, int width2, int out_stride) {
+  int i;
+  double *intbuf = (double *)aom_malloc(sizeof(double) * width2 * height);
+  double *arrbuf = (double *)aom_malloc(sizeof(double) * height);
+  double *arrbuf2 = (double *)aom_malloc(sizeof(double) * height2);
+  if (intbuf == NULL || arrbuf == NULL || arrbuf2 == NULL) goto Error;
+  assert(width > 0);
+  assert(height > 0);
+  assert(width2 > 0);
+  assert(height2 > 0);
+  for (i = 0; i < height; ++i)
+    upscale_multistep_double_prec(input + in_stride * i, width,
+                                  intbuf + width2 * i, width2);
+  for (i = 0; i < width2; ++i) {
+    fill_col_to_arr_double_prec(intbuf + i, width2, height, arrbuf);
+    upscale_multistep_double_prec(arrbuf, height, arrbuf2, height2);
+    fill_arr_to_col_double_prec(output + i, out_stride, height2, arrbuf2);
+  }
+
+Error:
+  aom_free(intbuf);
+  aom_free(arrbuf);
+  aom_free(arrbuf2);
+}
+
+static void upscale_normative_rect(const uint8_t *const input, int height,
+                                   int width, int in_stride, uint8_t *output,
+                                   int height2, int width2, int out_stride,
+                                   int x_step_qn, int x0_qn, int pad_left,
+                                   int pad_right) {
+  assert(width > 0);
+  assert(height > 0);
+  assert(width2 > 0);
+  assert(height2 > 0);
+  assert(height2 == height);
+
+  // Extend the left/right pixels of the tile column if needed
+  // (either because we can't sample from other tiles, or because we're at
+  // a frame edge).
+  // Save the overwritten pixels into tmp_left and tmp_right.
+  // Note: Because we pass input-1 to av1_convolve_horiz_rs, we need one extra
+  // column of border pixels compared to what we'd naively think.
+  const int border_cols = UPSCALE_NORMATIVE_TAPS / 2 + 1;
+  uint8_t *tmp_left =
+      NULL;  // Silence spurious "may be used uninitialized" warnings
+  uint8_t *tmp_right = NULL;
+  uint8_t *const in_tl = (uint8_t *)(input - border_cols);  // Cast off 'const'
+  uint8_t *const in_tr = (uint8_t *)(input + width);
+  if (pad_left) {
+    tmp_left = (uint8_t *)aom_malloc(sizeof(*tmp_left) * border_cols * height);
+    for (int i = 0; i < height; i++) {
+      memcpy(tmp_left + i * border_cols, in_tl + i * in_stride, border_cols);
+      memset(in_tl + i * in_stride, input[i * in_stride], border_cols);
+    }
+  }
+  if (pad_right) {
+    tmp_right =
+        (uint8_t *)aom_malloc(sizeof(*tmp_right) * border_cols * height);
+    for (int i = 0; i < height; i++) {
+      memcpy(tmp_right + i * border_cols, in_tr + i * in_stride, border_cols);
+      memset(in_tr + i * in_stride, input[i * in_stride + width - 1],
+             border_cols);
+    }
+  }
+
+  av1_convolve_horiz_rs(input - 1, in_stride, output, out_stride, width2,
+                        height2, &av1_resize_filter_normative[0][0], x0_qn,
+                        x_step_qn);
+
+  // Restore the left/right border pixels
+  if (pad_left) {
+    for (int i = 0; i < height; i++) {
+      memcpy(in_tl + i * in_stride, tmp_left + i * border_cols, border_cols);
+    }
+    aom_free(tmp_left);
+  }
+  if (pad_right) {
+    for (int i = 0; i < height; i++) {
+      memcpy(in_tr + i * in_stride, tmp_right + i * border_cols, border_cols);
+    }
+    aom_free(tmp_right);
+  }
+}
+
+static void highbd_interpolate_core(const uint16_t *const input, int in_length,
+                                    uint16_t *output, int out_length, int bd,
+                                    const int16_t *interp_filters,
+                                    int interp_taps) {
+  const int32_t delta =
+      (((uint32_t)in_length << RS_SCALE_SUBPEL_BITS) + out_length / 2) /
+      out_length;
+  const int32_t offset =
+      in_length > out_length
+          ? (((int32_t)(in_length - out_length) << (RS_SCALE_SUBPEL_BITS - 1)) +
+             out_length / 2) /
+                out_length
+          : -(((int32_t)(out_length - in_length)
+               << (RS_SCALE_SUBPEL_BITS - 1)) +
+              out_length / 2) /
+                out_length;
+  uint16_t *optr = output;
+  int x, x1, x2, sum, k, int_pel, sub_pel;
+  int32_t y;
+
+  x = 0;
+  y = offset + RS_SCALE_EXTRA_OFF;
+  while ((y >> RS_SCALE_SUBPEL_BITS) < (interp_taps / 2 - 1)) {
+    x++;
+    y += delta;
+  }
+  x1 = x;
+  x = out_length - 1;
+  y = delta * x + offset + RS_SCALE_EXTRA_OFF;
+  while ((y >> RS_SCALE_SUBPEL_BITS) + (int32_t)(interp_taps / 2) >=
+         in_length) {
+    x--;
+    y -= delta;
+  }
+  x2 = x;
+  if (x1 > x2) {
+    for (x = 0, y = offset + RS_SCALE_EXTRA_OFF; x < out_length;
+         ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k) {
+        const int pk = int_pel - interp_taps / 2 + 1 + k;
+        sum += filter[k] * input[AOMMAX(AOMMIN(pk, in_length - 1), 0)];
+      }
+      *optr++ = clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
+    }
+  } else {
+    // Initial part.
+    for (x = 0, y = offset + RS_SCALE_EXTRA_OFF; x < x1; ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k)
+        sum += filter[k] * input[AOMMAX(int_pel - interp_taps / 2 + 1 + k, 0)];
+      *optr++ = clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
+    }
+    // Middle part.
+    for (; x <= x2; ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k)
+        sum += filter[k] * input[int_pel - interp_taps / 2 + 1 + k];
+      *optr++ = clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
+    }
+    // End part.
+    for (; x < out_length; ++x, y += delta) {
+      int_pel = y >> RS_SCALE_SUBPEL_BITS;
+      sub_pel = (y >> RS_SCALE_EXTRA_BITS) & RS_SUBPEL_MASK;
+      const int16_t *filter = &interp_filters[sub_pel * interp_taps];
+      sum = 0;
+      for (k = 0; k < interp_taps; ++k)
+        sum += filter[k] *
+               input[AOMMIN(int_pel - interp_taps / 2 + 1 + k, in_length - 1)];
+      *optr++ = clip_pixel_highbd(ROUND_POWER_OF_TWO(sum, FILTER_BITS), bd);
+    }
+  }
+}
+
+static void highbd_interpolate(const uint16_t *const input, int in_length,
+                               uint16_t *output, int out_length, int bd) {
+  const InterpKernel *interp_filters =
+      choose_interp_filter(in_length, out_length);
+
+  highbd_interpolate_core(input, in_length, output, out_length, bd,
+                          &interp_filters[0][0], SUBPEL_TAPS);
+}
+
+#ifndef __clang_analyzer__
+static void highbd_down2_symeven(const uint16_t *const input, int length,
+                                 uint16_t *output, int bd) {
+  // Actual filter len = 2 * filter_len_half.
+  static const int16_t *filter = av1_down2_symeven_half_filter;
+  const int filter_len_half = sizeof(av1_down2_symeven_half_filter) / 2;
+  int i, j;
+  uint16_t *optr = output;
+  int l1 = filter_len_half;
+  int l2 = (length - filter_len_half);
+  l1 += (l1 & 1);
+  l2 += (l2 & 1);
+  if (l1 > l2) {
+    // Short input length.
+    for (i = 0; i < length; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1));
+      for (j = 0; j < filter_len_half; ++j) {
+        sum +=
+            (input[AOMMAX(0, i - j)] + input[AOMMIN(i + 1 + j, length - 1)]) *
+            filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel_highbd(sum, bd);
+    }
+  } else {
+    // Initial part.
+    for (i = 0; i < l1; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1));
+      for (j = 0; j < filter_len_half; ++j) {
+        sum += (input[AOMMAX(0, i - j)] + input[i + 1 + j]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel_highbd(sum, bd);
+    }
+    // Middle part.
+    for (; i < l2; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1));
+      for (j = 0; j < filter_len_half; ++j) {
+        sum += (input[i - j] + input[i + 1 + j]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel_highbd(sum, bd);
+    }
+    // End part.
+    for (; i < length; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1));
+      for (j = 0; j < filter_len_half; ++j) {
+        sum +=
+            (input[i - j] + input[AOMMIN(i + 1 + j, length - 1)]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel_highbd(sum, bd);
+    }
+  }
+}
+
+static void highbd_down2_symodd(const uint16_t *const input, int length,
+                                uint16_t *output, int bd) {
+  // Actual filter len = 2 * filter_len_half - 1.
+  static const int16_t *filter = av1_down2_symodd_half_filter;
+  const int filter_len_half = sizeof(av1_down2_symodd_half_filter) / 2;
+  int i, j;
+  uint16_t *optr = output;
+  int l1 = filter_len_half - 1;
+  int l2 = (length - filter_len_half + 1);
+  l1 += (l1 & 1);
+  l2 += (l2 & 1);
+  if (l1 > l2) {
+    // Short input length.
+    for (i = 0; i < length; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1)) + input[i] * filter[0];
+      for (j = 1; j < filter_len_half; ++j) {
+        sum += (input[AOMMAX(i - j, 0)] + input[AOMMIN(i + j, length - 1)]) *
+               filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel_highbd(sum, bd);
+    }
+  } else {
+    // Initial part.
+    for (i = 0; i < l1; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1)) + input[i] * filter[0];
+      for (j = 1; j < filter_len_half; ++j) {
+        sum += (input[AOMMAX(i - j, 0)] + input[i + j]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel_highbd(sum, bd);
+    }
+    // Middle part.
+    for (; i < l2; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1)) + input[i] * filter[0];
+      for (j = 1; j < filter_len_half; ++j) {
+        sum += (input[i - j] + input[i + j]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel_highbd(sum, bd);
+    }
+    // End part.
+    for (; i < length; i += 2) {
+      int sum = (1 << (FILTER_BITS - 1)) + input[i] * filter[0];
+      for (j = 1; j < filter_len_half; ++j) {
+        sum += (input[i - j] + input[AOMMIN(i + j, length - 1)]) * filter[j];
+      }
+      sum >>= FILTER_BITS;
+      *optr++ = clip_pixel_highbd(sum, bd);
+    }
+  }
+}
+#endif
+
+static void highbd_resize_multistep(const uint16_t *const input, int length,
+                                    uint16_t *output, int olength,
+                                    uint16_t *otmp, int bd) {
+  if (length == olength) {
+    memcpy(output, input, sizeof(output[0]) * length);
+    return;
+  }
+  const int steps = get_down2_steps(length, olength);
+
+  if (steps > 0) {
+    uint16_t *out = NULL;
+    int filteredlength = length;
+
+    assert(otmp != NULL);
+    uint16_t *otmp2 = otmp + get_down2_length(length, 1);
+    for (int s = 0; s < steps; ++s) {
+      const int proj_filteredlength = get_down2_length(filteredlength, 1);
+      const uint16_t *const in = (s == 0 ? input : out);
+      if (s == steps - 1 && proj_filteredlength == olength)
+        out = output;
+      else
+        out = (s & 1 ? otmp2 : otmp);
+      if (filteredlength & 1)
+        highbd_down2_symodd(in, filteredlength, out, bd);
+      else
+        highbd_down2_symeven(in, filteredlength, out, bd);
+      filteredlength = proj_filteredlength;
+    }
+    if (filteredlength != olength) {
+      highbd_interpolate(out, filteredlength, output, olength, bd);
+    }
+  } else {
+    highbd_interpolate(input, length, output, olength, bd);
+  }
+}
+
+static void highbd_fill_col_to_arr(uint16_t *img, int stride, int len,
+                                   uint16_t *arr) {
+  int i;
+  uint16_t *iptr = img;
+  uint16_t *aptr = arr;
+  for (i = 0; i < len; ++i, iptr += stride) {
+    *aptr++ = *iptr;
+  }
+}
+
+static void highbd_fill_arr_to_col(uint16_t *img, int stride, int len,
+                                   uint16_t *arr) {
+  int i;
+  uint16_t *iptr = img;
+  uint16_t *aptr = arr;
+  for (i = 0; i < len; ++i, iptr += stride) {
+    *iptr = *aptr++;
+  }
+}
+
+void av1_highbd_resize_plane(const uint8_t *const input, int height, int width,
+                             int in_stride, uint8_t *output, int height2,
+                             int width2, int out_stride, int bd) {
+  int i;
+  uint16_t *intbuf = (uint16_t *)aom_malloc(sizeof(uint16_t) * width2 * height);
+  uint16_t *tmpbuf =
+      (uint16_t *)aom_malloc(sizeof(uint16_t) * AOMMAX(width, height));
+  uint16_t *arrbuf = (uint16_t *)aom_malloc(sizeof(uint16_t) * height);
+  uint16_t *arrbuf2 = (uint16_t *)aom_malloc(sizeof(uint16_t) * height2);
+  if (intbuf == NULL || tmpbuf == NULL || arrbuf == NULL || arrbuf2 == NULL)
+    goto Error;
+  for (i = 0; i < height; ++i) {
+    highbd_resize_multistep(CONVERT_TO_SHORTPTR(input + in_stride * i), width,
+                            intbuf + width2 * i, width2, tmpbuf, bd);
+  }
+  for (i = 0; i < width2; ++i) {
+    highbd_fill_col_to_arr(intbuf + i, width2, height, arrbuf);
+    highbd_resize_multistep(arrbuf, height, arrbuf2, height2, tmpbuf, bd);
+    highbd_fill_arr_to_col(CONVERT_TO_SHORTPTR(output + i), out_stride, height2,
+                           arrbuf2);
+  }
+
+Error:
+  aom_free(intbuf);
+  aom_free(tmpbuf);
+  aom_free(arrbuf);
+  aom_free(arrbuf2);
+}
+
+static void highbd_upscale_normative_rect(const uint8_t *const input,
+                                          int height, int width, int in_stride,
+                                          uint8_t *output, int height2,
+                                          int width2, int out_stride,
+                                          int x_step_qn, int x0_qn,
+                                          int pad_left, int pad_right, int bd) {
+  assert(width > 0);
+  assert(height > 0);
+  assert(width2 > 0);
+  assert(height2 > 0);
+  assert(height2 == height);
+
+  // Extend the left/right pixels of the tile column if needed
+  // (either because we can't sample from other tiles, or because we're at
+  // a frame edge).
+  // Save the overwritten pixels into tmp_left and tmp_right.
+  // Note: Because we pass input-1 to av1_convolve_horiz_rs, we need one extra
+  // column of border pixels compared to what we'd naively think.
+  const int border_cols = UPSCALE_NORMATIVE_TAPS / 2 + 1;
+  const int border_size = border_cols * sizeof(uint16_t);
+  uint16_t *tmp_left =
+      NULL;  // Silence spurious "may be used uninitialized" warnings
+  uint16_t *tmp_right = NULL;
+  uint16_t *const input16 = CONVERT_TO_SHORTPTR(input);
+  uint16_t *const in_tl = input16 - border_cols;
+  uint16_t *const in_tr = input16 + width;
+  if (pad_left) {
+    tmp_left = (uint16_t *)aom_malloc(sizeof(*tmp_left) * border_cols * height);
+    for (int i = 0; i < height; i++) {
+      memcpy(tmp_left + i * border_cols, in_tl + i * in_stride, border_size);
+      aom_memset16(in_tl + i * in_stride, input16[i * in_stride], border_cols);
+    }
+  }
+  if (pad_right) {
+    tmp_right =
+        (uint16_t *)aom_malloc(sizeof(*tmp_right) * border_cols * height);
+    for (int i = 0; i < height; i++) {
+      memcpy(tmp_right + i * border_cols, in_tr + i * in_stride, border_size);
+      aom_memset16(in_tr + i * in_stride, input16[i * in_stride + width - 1],
+                   border_cols);
+    }
+  }
+
+  av1_highbd_convolve_horiz_rs(CONVERT_TO_SHORTPTR(input - 1), in_stride,
+                               CONVERT_TO_SHORTPTR(output), out_stride, width2,
+                               height2, &av1_resize_filter_normative[0][0],
+                               x0_qn, x_step_qn, bd);
+
+  // Restore the left/right border pixels
+  if (pad_left) {
+    for (int i = 0; i < height; i++) {
+      memcpy(in_tl + i * in_stride, tmp_left + i * border_cols, border_size);
+    }
+    aom_free(tmp_left);
+  }
+  if (pad_right) {
+    for (int i = 0; i < height; i++) {
+      memcpy(in_tr + i * in_stride, tmp_right + i * border_cols, border_size);
+    }
+    aom_free(tmp_right);
+  }
+}
+
+void av1_resize_frame420(const uint8_t *const y, int y_stride,
+                         const uint8_t *const u, const uint8_t *const v,
+                         int uv_stride, int height, int width, uint8_t *oy,
+                         int oy_stride, uint8_t *ou, uint8_t *ov,
+                         int ouv_stride, int oheight, int owidth) {
+  av1_resize_plane(y, height, width, y_stride, oy, oheight, owidth, oy_stride);
+  av1_resize_plane(u, height / 2, width / 2, uv_stride, ou, oheight / 2,
+                   owidth / 2, ouv_stride);
+  av1_resize_plane(v, height / 2, width / 2, uv_stride, ov, oheight / 2,
+                   owidth / 2, ouv_stride);
+}
+
+void av1_resize_frame422(const uint8_t *const y, int y_stride,
+                         const uint8_t *const u, const uint8_t *const v,
+                         int uv_stride, int height, int width, uint8_t *oy,
+                         int oy_stride, uint8_t *ou, uint8_t *ov,
+                         int ouv_stride, int oheight, int owidth) {
+  av1_resize_plane(y, height, width, y_stride, oy, oheight, owidth, oy_stride);
+  av1_resize_plane(u, height, width / 2, uv_stride, ou, oheight, owidth / 2,
+                   ouv_stride);
+  av1_resize_plane(v, height, width / 2, uv_stride, ov, oheight, owidth / 2,
+                   ouv_stride);
+}
+
+void av1_resize_frame444(const uint8_t *const y, int y_stride,
+                         const uint8_t *const u, const uint8_t *const v,
+                         int uv_stride, int height, int width, uint8_t *oy,
+                         int oy_stride, uint8_t *ou, uint8_t *ov,
+                         int ouv_stride, int oheight, int owidth) {
+  av1_resize_plane(y, height, width, y_stride, oy, oheight, owidth, oy_stride);
+  av1_resize_plane(u, height, width, uv_stride, ou, oheight, owidth,
+                   ouv_stride);
+  av1_resize_plane(v, height, width, uv_stride, ov, oheight, owidth,
+                   ouv_stride);
+}
+
+void av1_highbd_resize_frame420(const uint8_t *const y, int y_stride,
+                                const uint8_t *const u, const uint8_t *const v,
+                                int uv_stride, int height, int width,
+                                uint8_t *oy, int oy_stride, uint8_t *ou,
+                                uint8_t *ov, int ouv_stride, int oheight,
+                                int owidth, int bd) {
+  av1_highbd_resize_plane(y, height, width, y_stride, oy, oheight, owidth,
+                          oy_stride, bd);
+  av1_highbd_resize_plane(u, height / 2, width / 2, uv_stride, ou, oheight / 2,
+                          owidth / 2, ouv_stride, bd);
+  av1_highbd_resize_plane(v, height / 2, width / 2, uv_stride, ov, oheight / 2,
+                          owidth / 2, ouv_stride, bd);
+}
+
+void av1_highbd_resize_frame422(const uint8_t *const y, int y_stride,
+                                const uint8_t *const u, const uint8_t *const v,
+                                int uv_stride, int height, int width,
+                                uint8_t *oy, int oy_stride, uint8_t *ou,
+                                uint8_t *ov, int ouv_stride, int oheight,
+                                int owidth, int bd) {
+  av1_highbd_resize_plane(y, height, width, y_stride, oy, oheight, owidth,
+                          oy_stride, bd);
+  av1_highbd_resize_plane(u, height, width / 2, uv_stride, ou, oheight,
+                          owidth / 2, ouv_stride, bd);
+  av1_highbd_resize_plane(v, height, width / 2, uv_stride, ov, oheight,
+                          owidth / 2, ouv_stride, bd);
+}
+
+void av1_highbd_resize_frame444(const uint8_t *const y, int y_stride,
+                                const uint8_t *const u, const uint8_t *const v,
+                                int uv_stride, int height, int width,
+                                uint8_t *oy, int oy_stride, uint8_t *ou,
+                                uint8_t *ov, int ouv_stride, int oheight,
+                                int owidth, int bd) {
+  av1_highbd_resize_plane(y, height, width, y_stride, oy, oheight, owidth,
+                          oy_stride, bd);
+  av1_highbd_resize_plane(u, height, width, uv_stride, ou, oheight, owidth,
+                          ouv_stride, bd);
+  av1_highbd_resize_plane(v, height, width, uv_stride, ov, oheight, owidth,
+                          ouv_stride, bd);
+}
+
+void av1_resize_and_extend_frame(const YV12_BUFFER_CONFIG *src,
+                                 YV12_BUFFER_CONFIG *dst, int bd,
+                                 const int num_planes) {
+  // TODO(dkovalev): replace YV12_BUFFER_CONFIG with aom_image_t
+
+  // We use AOMMIN(num_planes, MAX_MB_PLANE) instead of num_planes to quiet
+  // the static analysis warnings.
+  for (int i = 0; i < AOMMIN(num_planes, MAX_MB_PLANE); ++i) {
+    const int is_uv = i > 0;
+    if (src->flags & YV12_FLAG_HIGHBITDEPTH)
+      av1_highbd_resize_plane(src->buffers[i], src->crop_heights[is_uv],
+                              src->crop_widths[is_uv], src->strides[is_uv],
+                              dst->buffers[i], dst->crop_heights[is_uv],
+                              dst->crop_widths[is_uv], dst->strides[is_uv], bd);
+    else
+      av1_resize_plane(src->buffers[i], src->crop_heights[is_uv],
+                       src->crop_widths[is_uv], src->strides[is_uv],
+                       dst->buffers[i], dst->crop_heights[is_uv],
+                       dst->crop_widths[is_uv], dst->strides[is_uv]);
+  }
+  aom_extend_frame_borders(dst, num_planes);
+}
+
+void av1_upscale_normative_rows(const AV1_COMMON *cm, const uint8_t *src,
+                                int src_stride, uint8_t *dst, int dst_stride,
+                                int plane, int rows) {
+  const int is_uv = (plane > 0);
+  const int ss_x = is_uv && cm->seq_params.subsampling_x;
+  const int downscaled_plane_width = ROUND_POWER_OF_TWO(cm->width, ss_x);
+  const int upscaled_plane_width =
+      ROUND_POWER_OF_TWO(cm->superres_upscaled_width, ss_x);
+  const int superres_denom = cm->superres_scale_denominator;
+
+  TileInfo tile_col;
+  const int32_t x_step_qn = av1_get_upscale_convolve_step(
+      downscaled_plane_width, upscaled_plane_width);
+  int32_t x0_qn = get_upscale_convolve_x0(downscaled_plane_width,
+                                          upscaled_plane_width, x_step_qn);
+
+  for (int j = 0; j < cm->tile_cols; j++) {
+    av1_tile_set_col(&tile_col, cm, j);
+    // Determine the limits of this tile column in both the source
+    // and destination images.
+    // Note: The actual location which we start sampling from is
+    // (downscaled_x0 - 1 + (x0_qn/2^14)), and this quantity increases
+    // by exactly dst_width * (x_step_qn/2^14) pixels each iteration.
+    const int downscaled_x0 = tile_col.mi_col_start << (MI_SIZE_LOG2 - ss_x);
+    const int downscaled_x1 = tile_col.mi_col_end << (MI_SIZE_LOG2 - ss_x);
+    const int src_width = downscaled_x1 - downscaled_x0;
+
+    const int upscaled_x0 = (downscaled_x0 * superres_denom) / SCALE_NUMERATOR;
+    int upscaled_x1;
+    if (j == cm->tile_cols - 1) {
+      // Note that we can't just use AOMMIN here - due to rounding,
+      // (downscaled_x1 * superres_denom) / SCALE_NUMERATOR may be less than
+      // upscaled_plane_width.
+      upscaled_x1 = upscaled_plane_width;
+    } else {
+      upscaled_x1 = (downscaled_x1 * superres_denom) / SCALE_NUMERATOR;
+    }
+
+    const uint8_t *const src_ptr = src + downscaled_x0;
+    uint8_t *const dst_ptr = dst + upscaled_x0;
+    const int dst_width = upscaled_x1 - upscaled_x0;
+
+    const int pad_left = (j == 0);
+    const int pad_right = (j == cm->tile_cols - 1);
+
+    if (cm->seq_params.use_highbitdepth)
+      highbd_upscale_normative_rect(src_ptr, rows, src_width, src_stride,
+                                    dst_ptr, rows, dst_width, dst_stride,
+                                    x_step_qn, x0_qn, pad_left, pad_right,
+                                    cm->seq_params.bit_depth);
+    else
+      upscale_normative_rect(src_ptr, rows, src_width, src_stride, dst_ptr,
+                             rows, dst_width, dst_stride, x_step_qn, x0_qn,
+                             pad_left, pad_right);
+
+    // Update the fractional pixel offset to prepare for the next tile column.
+    x0_qn += (dst_width * x_step_qn) - (src_width << RS_SCALE_SUBPEL_BITS);
+  }
+}
+
+void av1_upscale_normative_and_extend_frame(const AV1_COMMON *cm,
+                                            const YV12_BUFFER_CONFIG *src,
+                                            YV12_BUFFER_CONFIG *dst) {
+  const int num_planes = av1_num_planes(cm);
+  for (int i = 0; i < num_planes; ++i) {
+    const int is_uv = (i > 0);
+    av1_upscale_normative_rows(cm, src->buffers[i], src->strides[is_uv],
+                               dst->buffers[i], dst->strides[is_uv], i,
+                               src->crop_heights[is_uv]);
+  }
+
+  aom_extend_frame_borders(dst, num_planes);
+}
+
+YV12_BUFFER_CONFIG *av1_scale_if_required(AV1_COMMON *cm,
+                                          YV12_BUFFER_CONFIG *unscaled,
+                                          YV12_BUFFER_CONFIG *scaled) {
+  const int num_planes = av1_num_planes(cm);
+  if (cm->width != unscaled->y_crop_width ||
+      cm->height != unscaled->y_crop_height) {
+    av1_resize_and_extend_frame(unscaled, scaled, (int)cm->seq_params.bit_depth,
+                                num_planes);
+    return scaled;
+  } else {
+    return unscaled;
+  }
+}
+
+// Calculates the scaled dimension given the original dimension and the scale
+// denominator.
+static void calculate_scaled_size_helper(int *dim, int denom) {
+  if (denom != SCALE_NUMERATOR) {
+    // We need to ensure the constraint in "Appendix A" of the spec:
+    // * FrameWidth is greater than or equal to 16
+    // * FrameHeight is greater than or equal to 16
+    // For this, we clamp the downscaled dimension to at least 16. One
+    // exception: if original dimension itself was < 16, then we keep the
+    // downscaled dimension to be same as the original, to ensure that resizing
+    // is valid.
+    const int min_dim = AOMMIN(16, *dim);
+    // Use this version if we need *dim to be even
+    // *width = (*width * SCALE_NUMERATOR + denom) / (2 * denom);
+    // *width <<= 1;
+    *dim = (*dim * SCALE_NUMERATOR + denom / 2) / (denom);
+    *dim = AOMMAX(*dim, min_dim);
+  }
+}
+
+void av1_calculate_scaled_size(int *width, int *height, int resize_denom) {
+  calculate_scaled_size_helper(width, resize_denom);
+  calculate_scaled_size_helper(height, resize_denom);
+}
+
+void av1_calculate_scaled_superres_size(int *width, int *height,
+                                        int superres_denom) {
+  (void)height;
+  calculate_scaled_size_helper(width, superres_denom);
+}
+
+void av1_calculate_unscaled_superres_size(int *width, int *height, int denom) {
+  if (denom != SCALE_NUMERATOR) {
+    // Note: av1_calculate_scaled_superres_size() rounds *up* after division
+    // when the resulting dimensions are odd. So here, we round *down*.
+    *width = *width * denom / SCALE_NUMERATOR;
+    (void)height;
+  }
+}
+
+// Copy only the config data from 'src' to 'dst'.
+static void copy_buffer_config(const YV12_BUFFER_CONFIG *const src,
+                               YV12_BUFFER_CONFIG *const dst) {
+  dst->bit_depth = src->bit_depth;
+  dst->color_primaries = src->color_primaries;
+  dst->transfer_characteristics = src->transfer_characteristics;
+  dst->matrix_coefficients = src->matrix_coefficients;
+  dst->monochrome = src->monochrome;
+  dst->chroma_sample_position = src->chroma_sample_position;
+  dst->color_range = src->color_range;
+}
+
+// TODO(afergs): Look for in-place upscaling
+// TODO(afergs): aom_ vs av1_ functions? Which can I use?
+// Upscale decoded image.
+void av1_superres_upscale(AV1_COMMON *cm, BufferPool *const pool) {
+  const int num_planes = av1_num_planes(cm);
+  if (!av1_superres_scaled(cm)) return;
+  const SequenceHeader *const seq_params = &cm->seq_params;
+
+  YV12_BUFFER_CONFIG copy_buffer;
+  memset(&copy_buffer, 0, sizeof(copy_buffer));
+
+  YV12_BUFFER_CONFIG *const frame_to_show = &cm->cur_frame->buf;
+
+  const int aligned_width = ALIGN_POWER_OF_TWO(cm->width, 3);
+  if (aom_alloc_frame_buffer(
+          &copy_buffer, aligned_width, cm->height, seq_params->subsampling_x,
+          seq_params->subsampling_y, seq_params->use_highbitdepth,
+          AOM_BORDER_IN_PIXELS, cm->byte_alignment))
+    aom_internal_error(&cm->error, AOM_CODEC_MEM_ERROR,
+                       "Failed to allocate copy buffer for superres upscaling");
+
+  // Copy function assumes the frames are the same size.
+  // Note that it does not copy YV12_BUFFER_CONFIG config data.
+  aom_yv12_copy_frame(frame_to_show, &copy_buffer, num_planes);
+
+  assert(copy_buffer.y_crop_width == aligned_width);
+  assert(copy_buffer.y_crop_height == cm->height);
+
+  // Realloc the current frame buffer at a higher resolution in place.
+  if (pool != NULL) {
+    // Use callbacks if on the decoder.
+    aom_codec_frame_buffer_t *fb = &cm->cur_frame->raw_frame_buffer;
+    aom_release_frame_buffer_cb_fn_t release_fb_cb = pool->release_fb_cb;
+    aom_get_frame_buffer_cb_fn_t cb = pool->get_fb_cb;
+    void *cb_priv = pool->cb_priv;
+
+    // Realloc with callback does not release the frame buffer - release first.
+    if (release_fb_cb(cb_priv, fb))
+      aom_internal_error(
+          &cm->error, AOM_CODEC_MEM_ERROR,
+          "Failed to free current frame buffer before superres upscaling");
+
+    // aom_realloc_frame_buffer() leaves config data for frame_to_show intact
+    if (aom_realloc_frame_buffer(
+            frame_to_show, cm->superres_upscaled_width,
+            cm->superres_upscaled_height, seq_params->subsampling_x,
+            seq_params->subsampling_y, seq_params->use_highbitdepth,
+            AOM_BORDER_IN_PIXELS, cm->byte_alignment, fb, cb, cb_priv))
+      aom_internal_error(
+          &cm->error, AOM_CODEC_MEM_ERROR,
+          "Failed to allocate current frame buffer for superres upscaling");
+  } else {
+    // Make a copy of the config data for frame_to_show in copy_buffer
+    copy_buffer_config(frame_to_show, &copy_buffer);
+
+    // Don't use callbacks on the encoder.
+    // aom_alloc_frame_buffer() clears the config data for frame_to_show
+    if (aom_alloc_frame_buffer(
+            frame_to_show, cm->superres_upscaled_width,
+            cm->superres_upscaled_height, seq_params->subsampling_x,
+            seq_params->subsampling_y, seq_params->use_highbitdepth,
+            AOM_BORDER_IN_PIXELS, cm->byte_alignment))
+      aom_internal_error(
+          &cm->error, AOM_CODEC_MEM_ERROR,
+          "Failed to reallocate current frame buffer for superres upscaling");
+
+    // Restore config data back to frame_to_show
+    copy_buffer_config(&copy_buffer, frame_to_show);
+  }
+  // TODO(afergs): verify frame_to_show is correct after realloc
+  //               encoder:
+  //               decoder:
+
+  assert(frame_to_show->y_crop_width == cm->superres_upscaled_width);
+  assert(frame_to_show->y_crop_height == cm->superres_upscaled_height);
+
+  // Scale up and back into frame_to_show.
+  assert(frame_to_show->y_crop_width != cm->width);
+  av1_upscale_normative_and_extend_frame(cm, &copy_buffer, frame_to_show);
+
+  // Free the copy buffer
+  aom_free_frame_buffer(&copy_buffer);
+}

diff --git a/libav1/av1/common/resize.h b/libav1/av1/common/resize.h
new file mode 100644
index 0000000..43bea58
--- /dev/null
+++ b/libav1/av1/common/resize.h

@@ -0,0 +1,115 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_RESIZE_H_
+#define AOM_AV1_COMMON_RESIZE_H_
+
+#include <stdio.h>
+#include "aom/aom_integer.h"
+#include "av1/common/onyxc_int.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void av1_resize_plane(const uint8_t *const input, int height, int width,
+                      int in_stride, uint8_t *output, int height2, int width2,
+                      int out_stride);
+void av1_upscale_plane_double_prec(const double *const input, int height,
+                                   int width, int in_stride, double *output,
+                                   int height2, int width2, int out_stride);
+void av1_resize_frame420(const uint8_t *const y, int y_stride,
+                         const uint8_t *const u, const uint8_t *const v,
+                         int uv_stride, int height, int width, uint8_t *oy,
+                         int oy_stride, uint8_t *ou, uint8_t *ov,
+                         int ouv_stride, int oheight, int owidth);
+void av1_resize_frame422(const uint8_t *const y, int y_stride,
+                         const uint8_t *const u, const uint8_t *const v,
+                         int uv_stride, int height, int width, uint8_t *oy,
+                         int oy_stride, uint8_t *ou, uint8_t *ov,
+                         int ouv_stride, int oheight, int owidth);
+void av1_resize_frame444(const uint8_t *const y, int y_stride,
+                         const uint8_t *const u, const uint8_t *const v,
+                         int uv_stride, int height, int width, uint8_t *oy,
+                         int oy_stride, uint8_t *ou, uint8_t *ov,
+                         int ouv_stride, int oheight, int owidth);
+
+void av1_highbd_resize_plane(const uint8_t *const input, int height, int width,
+                             int in_stride, uint8_t *output, int height2,
+                             int width2, int out_stride, int bd);
+void av1_highbd_resize_frame420(const uint8_t *const y, int y_stride,
+                                const uint8_t *const u, const uint8_t *const v,
+                                int uv_stride, int height, int width,
+                                uint8_t *oy, int oy_stride, uint8_t *ou,
+                                uint8_t *ov, int ouv_stride, int oheight,
+                                int owidth, int bd);
+void av1_highbd_resize_frame422(const uint8_t *const y, int y_stride,
+                                const uint8_t *const u, const uint8_t *const v,
+                                int uv_stride, int height, int width,
+                                uint8_t *oy, int oy_stride, uint8_t *ou,
+                                uint8_t *ov, int ouv_stride, int oheight,
+                                int owidth, int bd);
+void av1_highbd_resize_frame444(const uint8_t *const y, int y_stride,
+                                const uint8_t *const u, const uint8_t *const v,
+                                int uv_stride, int height, int width,
+                                uint8_t *oy, int oy_stride, uint8_t *ou,
+                                uint8_t *ov, int ouv_stride, int oheight,
+                                int owidth, int bd);
+void av1_resize_and_extend_frame(const YV12_BUFFER_CONFIG *src,
+                                 YV12_BUFFER_CONFIG *dst, int bd,
+                                 const int num_planes);
+
+void av1_upscale_normative_rows(const AV1_COMMON *cm, const uint8_t *src,
+                                int src_stride, uint8_t *dst, int dst_stride,
+                                int plane, int rows);
+void av1_upscale_normative_and_extend_frame(const AV1_COMMON *cm,
+                                            const YV12_BUFFER_CONFIG *src,
+                                            YV12_BUFFER_CONFIG *dst);
+
+YV12_BUFFER_CONFIG *av1_scale_if_required(AV1_COMMON *cm,
+                                          YV12_BUFFER_CONFIG *unscaled,
+                                          YV12_BUFFER_CONFIG *scaled);
+
+// Calculates the scaled dimensions from the given original dimensions and the
+// resize scale denominator.
+void av1_calculate_scaled_size(int *width, int *height, int resize_denom);
+
+// Similar to above, but calculates scaled dimensions after superres from the
+// given original dimensions and superres scale denominator.
+void av1_calculate_scaled_superres_size(int *width, int *height,
+                                        int superres_denom);
+
+// Inverse of av1_calculate_scaled_superres_size() above: calculates the
+// original dimensions from the given scaled dimensions and the scale
+// denominator.
+void av1_calculate_unscaled_superres_size(int *width, int *height, int denom);
+
+void av1_superres_upscale(AV1_COMMON *cm, BufferPool *const pool);
+
+// Returns 1 if a superres upscaled frame is scaled and 0 otherwise.
+static INLINE int av1_superres_scaled(const AV1_COMMON *cm) {
+  // Note: for some corner cases (e.g. cm->width of 1), there may be no scaling
+  // required even though cm->superres_scale_denominator != SCALE_NUMERATOR.
+  // So, the following check is more accurate.
+  return !(cm->width == cm->superres_upscaled_width);
+}
+
+#define UPSCALE_NORMATIVE_TAPS 8
+extern const int16_t av1_resize_filter_normative[1 << RS_SUBPEL_BITS]
+                                                [UPSCALE_NORMATIVE_TAPS];
+
+int32_t av1_get_upscale_convolve_step(int in_length, int out_length);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_RESIZE_H_

diff --git a/libav1/av1/common/restoration.c b/libav1/av1/common/restoration.c
new file mode 100644
index 0000000..c5c8529
--- /dev/null
+++ b/libav1/av1/common/restoration.c

@@ -0,0 +1,1532 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ *
+ */
+
+#include <math.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+#include "config/aom_scale_rtcd.h"
+
+#include "aom_mem/aom_mem.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/resize.h"
+#include "av1/common/restoration.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_mem/aom_mem.h"
+
+#include "aom_ports/mem.h"
+
+// The 's' values are calculated based on original 'r' and 'e' values in the
+// spec using GenSgrprojVtable().
+// Note: Setting r = 0 skips the filter; with corresponding s = -1 (invalid).
+const sgr_params_type sgr_params[SGRPROJ_PARAMS] = {
+  { { 2, 1 }, { 140, 3236 } }, { { 2, 1 }, { 112, 2158 } },
+  { { 2, 1 }, { 93, 1618 } },  { { 2, 1 }, { 80, 1438 } },
+  { { 2, 1 }, { 70, 1295 } },  { { 2, 1 }, { 58, 1177 } },
+  { { 2, 1 }, { 47, 1079 } },  { { 2, 1 }, { 37, 996 } },
+  { { 2, 1 }, { 30, 925 } },   { { 2, 1 }, { 25, 863 } },
+  { { 0, 1 }, { -1, 2589 } },  { { 0, 1 }, { -1, 1618 } },
+  { { 0, 1 }, { -1, 1177 } },  { { 0, 1 }, { -1, 925 } },
+  { { 2, 0 }, { 56, -1 } },    { { 2, 0 }, { 22, -1 } },
+};
+
+AV1PixelRect av1_whole_frame_rect(const AV1_COMMON *cm, int is_uv) {
+  AV1PixelRect rect;
+
+  int ss_x = is_uv && cm->seq_params.subsampling_x;
+  int ss_y = is_uv && cm->seq_params.subsampling_y;
+
+  rect.top = 0;
+  rect.bottom = ROUND_POWER_OF_TWO(cm->height, ss_y);
+  rect.left = 0;
+  rect.right = ROUND_POWER_OF_TWO(cm->superres_upscaled_width, ss_x);
+  return rect;
+}
+
+// Count horizontal or vertical units per tile (use a width or height for
+// tile_size, respectively). We basically want to divide the tile size by the
+// size of a restoration unit. Rather than rounding up unconditionally as you
+// might expect, we round to nearest, which models the way a right or bottom
+// restoration unit can extend to up to 150% its normal width or height. The
+// max with 1 is to deal with tiles that are smaller than half of a restoration
+// unit.
+int av1_lr_count_units_in_tile(int unit_size, int tile_size) {
+  return AOMMAX((tile_size + (unit_size >> 1)) / unit_size, 1);
+}
+
+void av1_alloc_restoration_struct(AV1_COMMON *cm, RestorationInfo *rsi,
+                                  int is_uv) {
+  // We need to allocate enough space for restoration units to cover the
+  // largest tile. Without CONFIG_MAX_TILE, this is always the tile at the
+  // top-left and we can use av1_get_tile_rect(). With CONFIG_MAX_TILE, we have
+  // to do the computation ourselves, iterating over the tiles and keeping
+  // track of the largest width and height, then upscaling.
+  const AV1PixelRect tile_rect = av1_whole_frame_rect(cm, is_uv);
+  const int max_tile_w = tile_rect.right - tile_rect.left;
+  const int max_tile_h = tile_rect.bottom - tile_rect.top;
+
+  // To calculate hpertile and vpertile (horizontal and vertical units per
+  // tile), we basically want to divide the largest tile width or height by the
+  // size of a restoration unit. Rather than rounding up unconditionally as you
+  // might expect, we round to nearest, which models the way a right or bottom
+  // restoration unit can extend to up to 150% its normal width or height. The
+  // max with 1 is to deal with tiles that are smaller than half of a
+  // restoration unit.
+  const int unit_size = rsi->restoration_unit_size;
+  const int hpertile = av1_lr_count_units_in_tile(unit_size, max_tile_w);
+  const int vpertile = av1_lr_count_units_in_tile(unit_size, max_tile_h);
+
+  rsi->units_per_tile = hpertile * vpertile;
+  rsi->horz_units_per_tile = hpertile;
+  rsi->vert_units_per_tile = vpertile;
+//  const int ntiles = 1;
+//  const int nunits = ntiles * rsi->units_per_tile;
+//
+//  aom_free(rsi->unit_info);
+//  CHECK_MEM_ERROR(cm, rsi->unit_info,
+//                  (RestorationUnitInfo *)aom_memalign(
+//                      16, sizeof(*rsi->unit_info) * nunits));
+}
+
+void av1_free_restoration_struct(RestorationInfo *rst_info) {
+  aom_free(rst_info->unit_info);
+  rst_info->unit_info = NULL;
+}
+
+#if 0
+// Pair of values for each sgrproj parameter:
+// Index 0 corresponds to r[0], e[0]
+// Index 1 corresponds to r[1], e[1]
+int sgrproj_mtable[SGRPROJ_PARAMS][2];
+
+static void GenSgrprojVtable() {
+  for (int i = 0; i < SGRPROJ_PARAMS; ++i) {
+    const sgr_params_type *const params = &sgr_params[i];
+    for (int j = 0; j < 2; ++j) {
+      const int e = params->e[j];
+      const int r = params->r[j];
+      if (r == 0) {                 // filter is disabled
+        sgrproj_mtable[i][j] = -1;  // mark invalid
+      } else {                      // filter is enabled
+        const int n = (2 * r + 1) * (2 * r + 1);
+        const int n2e = n * n * e;
+        assert(n2e != 0);
+        sgrproj_mtable[i][j] = (((1 << SGRPROJ_MTABLE_BITS) + n2e / 2) / n2e);
+      }
+    }
+  }
+}
+#endif
+
+void av1_loop_restoration_precal() {
+#if 0
+  GenSgrprojVtable();
+#endif
+}
+
+static void extend_frame_lowbd(uint8_t *data, int width, int height, int stride,
+                               int border_horz, int border_vert) {
+  uint8_t *data_p;
+  int i;
+  for (i = 0; i < height; ++i) {
+    data_p = data + i * stride;
+    memset(data_p - border_horz, data_p[0], border_horz);
+    memset(data_p + width, data_p[width - 1], border_horz);
+  }
+  data_p = data - border_horz;
+  for (i = -border_vert; i < 0; ++i) {
+    memcpy(data_p + i * stride, data_p, width + 2 * border_horz);
+  }
+  for (i = height; i < height + border_vert; ++i) {
+    memcpy(data_p + i * stride, data_p + (height - 1) * stride,
+           width + 2 * border_horz);
+  }
+}
+
+static void extend_frame_highbd(uint16_t *data, int width, int height,
+                                int stride, int border_horz, int border_vert) {
+  uint16_t *data_p;
+  int i, j;
+  for (i = 0; i < height; ++i) {
+    data_p = data + i * stride;
+    for (j = -border_horz; j < 0; ++j) data_p[j] = data_p[0];
+    for (j = width; j < width + border_horz; ++j) data_p[j] = data_p[width - 1];
+  }
+  data_p = data - border_horz;
+  for (i = -border_vert; i < 0; ++i) {
+    memcpy(data_p + i * stride, data_p,
+           (width + 2 * border_horz) * sizeof(uint16_t));
+  }
+  for (i = height; i < height + border_vert; ++i) {
+    memcpy(data_p + i * stride, data_p + (height - 1) * stride,
+           (width + 2 * border_horz) * sizeof(uint16_t));
+  }
+}
+
+void extend_frame(uint8_t *data, int width, int height, int stride,
+                  int border_horz, int border_vert, int highbd) {
+  if (highbd)
+    extend_frame_highbd(CONVERT_TO_SHORTPTR(data), width, height, stride,
+                        border_horz, border_vert);
+  else
+    extend_frame_lowbd(data, width, height, stride, border_horz, border_vert);
+}
+
+static void copy_tile_lowbd(int width, int height, const uint8_t *src,
+                            int src_stride, uint8_t *dst, int dst_stride) {
+  for (int i = 0; i < height; ++i)
+    memcpy(dst + i * dst_stride, src + i * src_stride, width);
+}
+
+static void copy_tile_highbd(int width, int height, const uint16_t *src,
+                             int src_stride, uint16_t *dst, int dst_stride) {
+  for (int i = 0; i < height; ++i)
+    memcpy(dst + i * dst_stride, src + i * src_stride, width * sizeof(*dst));
+}
+
+static void copy_tile(int width, int height, const uint8_t *src, int src_stride,
+                      uint8_t *dst, int dst_stride, int highbd) {
+  if (highbd)
+    copy_tile_highbd(width, height, CONVERT_TO_SHORTPTR(src), src_stride,
+                     CONVERT_TO_SHORTPTR(dst), dst_stride);
+  else
+    copy_tile_lowbd(width, height, src, src_stride, dst, dst_stride);
+}
+
+#define REAL_PTR(hbd, d) ((hbd) ? (uint8_t *)CONVERT_TO_SHORTPTR(d) : (d))
+
+// With striped loop restoration, the filtering for each 64-pixel stripe gets
+// most of its input from the output of CDEF (stored in data8), but we need to
+// fill out a border of 3 pixels above/below the stripe according to the
+// following
+// rules:
+//
+// * At a frame boundary, we copy the outermost row of CDEF pixels three times.
+//   This extension is done by a call to extend_frame() at the start of the loop
+//   restoration process, so the value of copy_above/copy_below doesn't strictly
+//   matter.
+//   However, by setting *copy_above = *copy_below = 1 whenever loop filtering
+//   across tiles is disabled, we can allow
+//   {setup,restore}_processing_stripe_boundary to assume that the top/bottom
+//   data has always been copied, simplifying the behaviour at the left and
+//   right edges of tiles.
+//
+// * If we're at a tile boundary and loop filtering across tiles is enabled,
+//   then there is a logical stripe which is 64 pixels high, but which is split
+//   into an 8px high and a 56px high stripe so that the processing (and
+//   coefficient set usage) can be aligned to tiles.
+//   In this case, we use the 3 rows of CDEF output across the boundary for
+//   context; this corresponds to leaving the frame buffer as-is.
+//
+// * If we're at a tile boundary and loop filtering across tiles is disabled,
+//   then we take the outermost row of CDEF pixels *within the current tile*
+//   and copy it three times. Thus we behave exactly as if the tile were a full
+//   frame.
+//
+// * Otherwise, we're at a stripe boundary within a tile. In that case, we
+//   take 2 rows of deblocked pixels and extend them to 3 rows of context.
+//
+// The distinction between the latter two cases is handled by the
+// av1_loop_restoration_save_boundary_lines() function, so here we just need
+// to decide if we're overwriting the above/below boundary pixels or not.
+static void get_stripe_boundary_info(const RestorationTileLimits *limits,
+                                     const AV1PixelRect *tile_rect, int ss_y,
+                                     int *copy_above, int *copy_below) {
+  *copy_above = 1;
+  *copy_below = 1;
+
+  const int full_stripe_height = RESTORATION_PROC_UNIT_SIZE >> ss_y;
+  const int runit_offset = RESTORATION_UNIT_OFFSET >> ss_y;
+
+  const int first_stripe_in_tile = (limits->v_start == tile_rect->top);
+  const int this_stripe_height =
+      full_stripe_height - (first_stripe_in_tile ? runit_offset : 0);
+  const int last_stripe_in_tile =
+      (limits->v_start + this_stripe_height >= tile_rect->bottom);
+
+  if (first_stripe_in_tile) *copy_above = 0;
+  if (last_stripe_in_tile) *copy_below = 0;
+}
+
+// Overwrite the border pixels around a processing stripe so that the conditions
+// listed above get_stripe_boundary_info() are preserved.
+// We save the pixels which get overwritten into a temporary buffer, so that
+// they can be restored by restore_processing_stripe_boundary() after we've
+// processed the stripe.
+//
+// limits gives the rectangular limits of the remaining stripes for the current
+// restoration unit. rsb is the stored stripe boundaries (taken from either
+// deblock or CDEF output as necessary).
+//
+// tile_rect is the limits of the current tile and tile_stripe0 is the index of
+// the first stripe in this tile (needed to convert the tile-relative stripe
+// index we get from limits into something we can look up in rsb).
+static void setup_processing_stripe_boundary(
+    const RestorationTileLimits *limits, const RestorationStripeBoundaries *rsb,
+    int rsb_row, int use_highbd, int h, uint8_t *data8, int data_stride,
+    RestorationLineBuffers *rlbs, int copy_above, int copy_below, int opt) {
+  // Offsets within the line buffers. The buffer logically starts at column
+  // -RESTORATION_EXTRA_HORZ so the 1st column (at x0 - RESTORATION_EXTRA_HORZ)
+  // has column x0 in the buffer.
+  const int buf_stride = rsb->stripe_boundary_stride;
+  const int buf_x0_off = limits->h_start;
+  const int line_width =
+      (limits->h_end - limits->h_start) + 2 * RESTORATION_EXTRA_HORZ;
+  const int line_size = line_width << use_highbd;
+
+  const int data_x0 = limits->h_start - RESTORATION_EXTRA_HORZ;
+
+  // Replace RESTORATION_BORDER pixels above the top of the stripe
+  // We expand RESTORATION_CTX_VERT=2 lines from rsb->stripe_boundary_above
+  // to fill RESTORATION_BORDER=3 lines of above pixels. This is done by
+  // duplicating the topmost of the 2 lines (see the AOMMAX call when
+  // calculating src_row, which gets the values 0, 0, 1 for i = -3, -2, -1).
+  //
+  // Special case: If we're at the top of a tile, which isn't on the topmost
+  // tile row, and we're allowed to loop filter across tiles, then we have a
+  // logical 64-pixel-high stripe which has been split into an 8-pixel high
+  // stripe and a 56-pixel high stripe (the current one). So, in this case,
+  // we want to leave the boundary alone!
+  if (!opt) {
+    if (copy_above) {
+      uint8_t *data8_tl = data8 + data_x0 + limits->v_start * data_stride;
+
+      for (int i = -RESTORATION_BORDER; i < 0; ++i) {
+        const int buf_row = rsb_row + AOMMAX(i + RESTORATION_CTX_VERT, 0);
+        const int buf_off = buf_x0_off + buf_row * buf_stride;
+        const uint8_t *buf =
+            rsb->stripe_boundary_above + (buf_off << use_highbd);
+        uint8_t *dst8 = data8_tl + i * data_stride;
+        // Save old pixels, then replace with data from stripe_boundary_above
+        memcpy(rlbs->tmp_save_above[i + RESTORATION_BORDER],
+               REAL_PTR(use_highbd, dst8), line_size);
+        memcpy(REAL_PTR(use_highbd, dst8), buf, line_size);
+      }
+    }
+
+    // Replace RESTORATION_BORDER pixels below the bottom of the stripe.
+    // The second buffer row is repeated, so src_row gets the values 0, 1, 1
+    // for i = 0, 1, 2.
+    if (copy_below) {
+      const int stripe_end = limits->v_start + h;
+      uint8_t *data8_bl = data8 + data_x0 + stripe_end * data_stride;
+
+      for (int i = 0; i < RESTORATION_BORDER; ++i) {
+        const int buf_row = rsb_row + AOMMIN(i, RESTORATION_CTX_VERT - 1);
+        const int buf_off = buf_x0_off + buf_row * buf_stride;
+        const uint8_t *src =
+            rsb->stripe_boundary_below + (buf_off << use_highbd);
+
+        uint8_t *dst8 = data8_bl + i * data_stride;
+        // Save old pixels, then replace with data from stripe_boundary_below
+        memcpy(rlbs->tmp_save_below[i], REAL_PTR(use_highbd, dst8), line_size);
+        memcpy(REAL_PTR(use_highbd, dst8), src, line_size);
+      }
+    }
+  } else {
+    if (copy_above) {
+      uint8_t *data8_tl = data8 + data_x0 + limits->v_start * data_stride;
+
+      // Only save and overwrite i=-RESTORATION_BORDER line.
+      uint8_t *dst8 = data8_tl + (-RESTORATION_BORDER) * data_stride;
+      // Save old pixels, then replace with data from stripe_boundary_above
+      memcpy(rlbs->tmp_save_above[0], REAL_PTR(use_highbd, dst8), line_size);
+      memcpy(REAL_PTR(use_highbd, dst8),
+             REAL_PTR(use_highbd,
+                      data8_tl + (-RESTORATION_BORDER + 1) * data_stride),
+             line_size);
+    }
+
+    if (copy_below) {
+      const int stripe_end = limits->v_start + h;
+      uint8_t *data8_bl = data8 + data_x0 + stripe_end * data_stride;
+
+      // Only save and overwrite i=2 line.
+      uint8_t *dst8 = data8_bl + 2 * data_stride;
+      // Save old pixels, then replace with data from stripe_boundary_below
+      memcpy(rlbs->tmp_save_below[2], REAL_PTR(use_highbd, dst8), line_size);
+      memcpy(REAL_PTR(use_highbd, dst8),
+             REAL_PTR(use_highbd, data8_bl + (2 - 1) * data_stride), line_size);
+    }
+  }
+}
+
+// This function restores the boundary lines modified by
+// setup_processing_stripe_boundary.
+//
+// Note: We need to be careful when handling the corners of the processing
+// unit, because (eg.) the top-left corner is considered to be part of
+// both the left and top borders. This means that, depending on the
+// loop_filter_across_tiles_enabled flag, the corner pixels might get
+// overwritten twice, once as part of the "top" border and once as part
+// of the "left" border (or similar for other corners).
+//
+// Everything works out fine as long as we make sure to reverse the order
+// when restoring, ie. we need to restore the left/right borders followed
+// by the top/bottom borders.
+static void restore_processing_stripe_boundary(
+    const RestorationTileLimits *limits, const RestorationLineBuffers *rlbs,
+    int use_highbd, int h, uint8_t *data8, int data_stride, int copy_above,
+    int copy_below, int opt) {
+  const int line_width =
+      (limits->h_end - limits->h_start) + 2 * RESTORATION_EXTRA_HORZ;
+  const int line_size = line_width << use_highbd;
+
+  const int data_x0 = limits->h_start - RESTORATION_EXTRA_HORZ;
+
+  if (!opt) {
+    if (copy_above) {
+      uint8_t *data8_tl = data8 + data_x0 + limits->v_start * data_stride;
+      for (int i = -RESTORATION_BORDER; i < 0; ++i) {
+        uint8_t *dst8 = data8_tl + i * data_stride;
+        memcpy(REAL_PTR(use_highbd, dst8),
+               rlbs->tmp_save_above[i + RESTORATION_BORDER], line_size);
+      }
+    }
+
+    if (copy_below) {
+      const int stripe_bottom = limits->v_start + h;
+      uint8_t *data8_bl = data8 + data_x0 + stripe_bottom * data_stride;
+
+      for (int i = 0; i < RESTORATION_BORDER; ++i) {
+        if (stripe_bottom + i >= limits->v_end + RESTORATION_BORDER) break;
+
+        uint8_t *dst8 = data8_bl + i * data_stride;
+        memcpy(REAL_PTR(use_highbd, dst8), rlbs->tmp_save_below[i], line_size);
+      }
+    }
+  } else {
+    if (copy_above) {
+      uint8_t *data8_tl = data8 + data_x0 + limits->v_start * data_stride;
+
+      // Only restore i=-RESTORATION_BORDER line.
+      uint8_t *dst8 = data8_tl + (-RESTORATION_BORDER) * data_stride;
+      memcpy(REAL_PTR(use_highbd, dst8), rlbs->tmp_save_above[0], line_size);
+    }
+
+    if (copy_below) {
+      const int stripe_bottom = limits->v_start + h;
+      uint8_t *data8_bl = data8 + data_x0 + stripe_bottom * data_stride;
+
+      // Only restore i=2 line.
+      if (stripe_bottom + 2 < limits->v_end + RESTORATION_BORDER) {
+        uint8_t *dst8 = data8_bl + 2 * data_stride;
+        memcpy(REAL_PTR(use_highbd, dst8), rlbs->tmp_save_below[2], line_size);
+      }
+    }
+  }
+}
+
+static void wiener_filter_stripe(const RestorationUnitInfo *rui,
+                                 int stripe_width, int stripe_height,
+                                 int procunit_width, const uint8_t *src,
+                                 int src_stride, uint8_t *dst, int dst_stride,
+                                 int32_t *tmpbuf, int bit_depth) {
+  (void)tmpbuf;
+  (void)bit_depth;
+  assert(bit_depth == 8);
+  const ConvolveParams conv_params = get_conv_params_wiener(8);
+
+  for (int j = 0; j < stripe_width; j += procunit_width) {
+    int w = AOMMIN(procunit_width, (stripe_width - j + 15) & ~15);
+    const uint8_t *src_p = src + j;
+    uint8_t *dst_p = dst + j;
+    av1_wiener_convolve_add_src(
+        src_p, src_stride, dst_p, dst_stride, rui->wiener_info.hfilter, 16,
+        rui->wiener_info.vfilter, 16, w, stripe_height, &conv_params);
+  }
+}
+
+/* Calculate windowed sums (if sqr=0) or sums of squares (if sqr=1)
+   over the input. The window is of size (2r + 1)x(2r + 1), and we
+   specialize to r = 1, 2, 3. A default function is used for r > 3.
+
+   Each loop follows the same format: We keep a window's worth of input
+   in individual variables and select data out of that as appropriate.
+*/
+static void boxsum1(int32_t *src, int width, int height, int src_stride,
+                    int sqr, int32_t *dst, int dst_stride) {
+  int i, j, a, b, c;
+  assert(width > 2 * SGRPROJ_BORDER_HORZ);
+  assert(height > 2 * SGRPROJ_BORDER_VERT);
+
+  // Vertical sum over 3-pixel regions, from src into dst.
+  if (!sqr) {
+    for (j = 0; j < width; ++j) {
+      a = src[j];
+      b = src[src_stride + j];
+      c = src[2 * src_stride + j];
+
+      dst[j] = a + b;
+      for (i = 1; i < height - 2; ++i) {
+        // Loop invariant: At the start of each iteration,
+        // a = src[(i - 1) * src_stride + j]
+        // b = src[(i    ) * src_stride + j]
+        // c = src[(i + 1) * src_stride + j]
+        dst[i * dst_stride + j] = a + b + c;
+        a = b;
+        b = c;
+        c = src[(i + 2) * src_stride + j];
+      }
+      dst[i * dst_stride + j] = a + b + c;
+      dst[(i + 1) * dst_stride + j] = b + c;
+    }
+  } else {
+    for (j = 0; j < width; ++j) {
+      a = src[j] * src[j];
+      b = src[src_stride + j] * src[src_stride + j];
+      c = src[2 * src_stride + j] * src[2 * src_stride + j];
+
+      dst[j] = a + b;
+      for (i = 1; i < height - 2; ++i) {
+        dst[i * dst_stride + j] = a + b + c;
+        a = b;
+        b = c;
+        c = src[(i + 2) * src_stride + j] * src[(i + 2) * src_stride + j];
+      }
+      dst[i * dst_stride + j] = a + b + c;
+      dst[(i + 1) * dst_stride + j] = b + c;
+    }
+  }
+
+  // Horizontal sum over 3-pixel regions of dst
+  for (i = 0; i < height; ++i) {
+    a = dst[i * dst_stride];
+    b = dst[i * dst_stride + 1];
+    c = dst[i * dst_stride + 2];
+
+    dst[i * dst_stride] = a + b;
+    for (j = 1; j < width - 2; ++j) {
+      // Loop invariant: At the start of each iteration,
+      // a = src[i * src_stride + (j - 1)]
+      // b = src[i * src_stride + (j    )]
+      // c = src[i * src_stride + (j + 1)]
+      dst[i * dst_stride + j] = a + b + c;
+      a = b;
+      b = c;
+      c = dst[i * dst_stride + (j + 2)];
+    }
+    dst[i * dst_stride + j] = a + b + c;
+    dst[i * dst_stride + (j + 1)] = b + c;
+  }
+}
+
+static void boxsum2(int32_t *src, int width, int height, int src_stride,
+                    int sqr, int32_t *dst, int dst_stride) {
+  int i, j, a, b, c, d, e;
+  assert(width > 2 * SGRPROJ_BORDER_HORZ);
+  assert(height > 2 * SGRPROJ_BORDER_VERT);
+
+  // Vertical sum over 5-pixel regions, from src into dst.
+  if (!sqr) {
+    for (j = 0; j < width; ++j) {
+      a = src[j];
+      b = src[src_stride + j];
+      c = src[2 * src_stride + j];
+      d = src[3 * src_stride + j];
+      e = src[4 * src_stride + j];
+
+      dst[j] = a + b + c;
+      dst[dst_stride + j] = a + b + c + d;
+      for (i = 2; i < height - 3; ++i) {
+        // Loop invariant: At the start of each iteration,
+        // a = src[(i - 2) * src_stride + j]
+        // b = src[(i - 1) * src_stride + j]
+        // c = src[(i    ) * src_stride + j]
+        // d = src[(i + 1) * src_stride + j]
+        // e = src[(i + 2) * src_stride + j]
+        dst[i * dst_stride + j] = a + b + c + d + e;
+        a = b;
+        b = c;
+        c = d;
+        d = e;
+        e = src[(i + 3) * src_stride + j];
+      }
+      dst[i * dst_stride + j] = a + b + c + d + e;
+      dst[(i + 1) * dst_stride + j] = b + c + d + e;
+      dst[(i + 2) * dst_stride + j] = c + d + e;
+    }
+  } else {
+    for (j = 0; j < width; ++j) {
+      a = src[j] * src[j];
+      b = src[src_stride + j] * src[src_stride + j];
+      c = src[2 * src_stride + j] * src[2 * src_stride + j];
+      d = src[3 * src_stride + j] * src[3 * src_stride + j];
+      e = src[4 * src_stride + j] * src[4 * src_stride + j];
+
+      dst[j] = a + b + c;
+      dst[dst_stride + j] = a + b + c + d;
+      for (i = 2; i < height - 3; ++i) {
+        dst[i * dst_stride + j] = a + b + c + d + e;
+        a = b;
+        b = c;
+        c = d;
+        d = e;
+        e = src[(i + 3) * src_stride + j] * src[(i + 3) * src_stride + j];
+      }
+      dst[i * dst_stride + j] = a + b + c + d + e;
+      dst[(i + 1) * dst_stride + j] = b + c + d + e;
+      dst[(i + 2) * dst_stride + j] = c + d + e;
+    }
+  }
+
+  // Horizontal sum over 5-pixel regions of dst
+  for (i = 0; i < height; ++i) {
+    a = dst[i * dst_stride];
+    b = dst[i * dst_stride + 1];
+    c = dst[i * dst_stride + 2];
+    d = dst[i * dst_stride + 3];
+    e = dst[i * dst_stride + 4];
+
+    dst[i * dst_stride] = a + b + c;
+    dst[i * dst_stride + 1] = a + b + c + d;
+    for (j = 2; j < width - 3; ++j) {
+      // Loop invariant: At the start of each iteration,
+      // a = src[i * src_stride + (j - 2)]
+      // b = src[i * src_stride + (j - 1)]
+      // c = src[i * src_stride + (j    )]
+      // d = src[i * src_stride + (j + 1)]
+      // e = src[i * src_stride + (j + 2)]
+      dst[i * dst_stride + j] = a + b + c + d + e;
+      a = b;
+      b = c;
+      c = d;
+      d = e;
+      e = dst[i * dst_stride + (j + 3)];
+    }
+    dst[i * dst_stride + j] = a + b + c + d + e;
+    dst[i * dst_stride + (j + 1)] = b + c + d + e;
+    dst[i * dst_stride + (j + 2)] = c + d + e;
+  }
+}
+
+static void boxsum(int32_t *src, int width, int height, int src_stride, int r,
+                   int sqr, int32_t *dst, int dst_stride) {
+  if (r == 1)
+    boxsum1(src, width, height, src_stride, sqr, dst, dst_stride);
+  else if (r == 2)
+    boxsum2(src, width, height, src_stride, sqr, dst, dst_stride);
+  else
+    assert(0 && "Invalid value of r in self-guided filter");
+}
+
+void decode_xq(const int *xqd, int *xq, const sgr_params_type *params) {
+  if (params->r[0] == 0) {
+    xq[0] = 0;
+    xq[1] = (1 << SGRPROJ_PRJ_BITS) - xqd[1];
+  } else if (params->r[1] == 0) {
+    xq[0] = xqd[0];
+    xq[1] = 0;
+  } else {
+    xq[0] = xqd[0];
+    xq[1] = (1 << SGRPROJ_PRJ_BITS) - xq[0] - xqd[1];
+  }
+}
+
+const int32_t x_by_xplus1[256] = {
+  // Special case: Map 0 -> 1 (corresponding to a value of 1/256)
+  // instead of 0. See comments in selfguided_restoration_internal() for why
+  1,   128, 171, 192, 205, 213, 219, 224, 228, 230, 233, 235, 236, 238, 239,
+  240, 241, 242, 243, 243, 244, 244, 245, 245, 246, 246, 247, 247, 247, 247,
+  248, 248, 248, 248, 249, 249, 249, 249, 249, 250, 250, 250, 250, 250, 250,
+  250, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 252, 252, 252, 252,
+  252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 252, 253, 253,
+  253, 253, 253, 253, 253, 253, 253, 253, 253, 253, 253, 253, 253, 253, 253,
+  253, 253, 253, 253, 253, 253, 253, 253, 253, 253, 253, 253, 254, 254, 254,
+  254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254,
+  254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254,
+  254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254,
+  254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254, 254,
+  254, 254, 254, 254, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
+  255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
+  255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
+  255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
+  255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
+  255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
+  256,
+};
+
+const int32_t one_by_x[MAX_NELEM] = {
+  4096, 2048, 1365, 1024, 819, 683, 585, 512, 455, 410, 372, 341, 315,
+  293,  273,  256,  241,  228, 216, 205, 195, 186, 178, 171, 164,
+};
+
+static void calculate_intermediate_result(int32_t *dgd, int width, int height,
+                                          int dgd_stride, int bit_depth,
+                                          int sgr_params_idx, int radius_idx,
+                                          int pass, int32_t *A, int32_t *B) {
+  const sgr_params_type *const params = &sgr_params[sgr_params_idx];
+  const int r = params->r[radius_idx];
+  const int width_ext = width + 2 * SGRPROJ_BORDER_HORZ;
+  const int height_ext = height + 2 * SGRPROJ_BORDER_VERT;
+  // Adjusting the stride of A and B here appears to avoid bad cache effects,
+  // leading to a significant speed improvement.
+  // We also align the stride to a multiple of 16 bytes, for consistency
+  // with the SIMD version of this function.
+  int buf_stride = ((width_ext + 3) & ~3) + 16;
+  const int step = pass == 0 ? 1 : 2;
+  int i, j;
+
+  assert(r <= MAX_RADIUS && "Need MAX_RADIUS >= r");
+  assert(r <= SGRPROJ_BORDER_VERT - 1 && r <= SGRPROJ_BORDER_HORZ - 1 &&
+         "Need SGRPROJ_BORDER_* >= r+1");
+
+  boxsum(dgd - dgd_stride * SGRPROJ_BORDER_VERT - SGRPROJ_BORDER_HORZ,
+         width_ext, height_ext, dgd_stride, r, 0, B, buf_stride);
+  boxsum(dgd - dgd_stride * SGRPROJ_BORDER_VERT - SGRPROJ_BORDER_HORZ,
+         width_ext, height_ext, dgd_stride, r, 1, A, buf_stride);
+  A += SGRPROJ_BORDER_VERT * buf_stride + SGRPROJ_BORDER_HORZ;
+  B += SGRPROJ_BORDER_VERT * buf_stride + SGRPROJ_BORDER_HORZ;
+  // Calculate the eventual A[] and B[] arrays. Include a 1-pixel border - ie,
+  // for a 64x64 processing unit, we calculate 66x66 pixels of A[] and B[].
+  for (i = -1; i < height + 1; i += step) {
+    for (j = -1; j < width + 1; ++j) {
+      const int k = i * buf_stride + j;
+      const int n = (2 * r + 1) * (2 * r + 1);
+
+      // a < 2^16 * n < 2^22 regardless of bit depth
+      uint32_t a = ROUND_POWER_OF_TWO(A[k], 2 * (bit_depth - 8));
+      // b < 2^8 * n < 2^14 regardless of bit depth
+      uint32_t b = ROUND_POWER_OF_TWO(B[k], bit_depth - 8);
+
+      // Each term in calculating p = a * n - b * b is < 2^16 * n^2 < 2^28,
+      // and p itself satisfies p < 2^14 * n^2 < 2^26.
+      // This bound on p is due to:
+      // https://en.wikipedia.org/wiki/Popoviciu's_inequality_on_variances
+      //
+      // Note: Sometimes, in high bit depth, we can end up with a*n < b*b.
+      // This is an artefact of rounding, and can only happen if all pixels
+      // are (almost) identical, so in this case we saturate to p=0.
+      uint32_t p = (a * n < b * b) ? 0 : a * n - b * b;
+
+      const uint32_t s = params->s[radius_idx];
+
+      // p * s < (2^14 * n^2) * round(2^20 / n^2 eps) < 2^34 / eps < 2^32
+      // as long as eps >= 4. So p * s fits into a uint32_t, and z < 2^12
+      // (this holds even after accounting for the rounding in s)
+      const uint32_t z = ROUND_POWER_OF_TWO(p * s, SGRPROJ_MTABLE_BITS);
+
+      // Note: We have to be quite careful about the value of A[k].
+      // This is used as a blend factor between individual pixel values and the
+      // local mean. So it logically has a range of [0, 256], including both
+      // endpoints.
+      //
+      // This is a pain for hardware, as we'd like something which can be stored
+      // in exactly 8 bits.
+      // Further, in the calculation of B[k] below, if z == 0 and r == 2,
+      // then A[k] "should be" 0. But then we can end up setting B[k] to a value
+      // slightly above 2^(8 + bit depth), due to rounding in the value of
+      // one_by_x[25-1].
+      //
+      // Thus we saturate so that, when z == 0, A[k] is set to 1 instead of 0.
+      // This fixes the above issues (256 - A[k] fits in a uint8, and we can't
+      // overflow), without significantly affecting the final result: z == 0
+      // implies that the image is essentially "flat", so the local mean and
+      // individual pixel values are very similar.
+      //
+      // Note that saturating on the other side, ie. requring A[k] <= 255,
+      // would be a bad idea, as that corresponds to the case where the image
+      // is very variable, when we want to preserve the local pixel value as
+      // much as possible.
+      A[k] = x_by_xplus1[AOMMIN(z, 255)];  // in range [1, 256]
+
+      // SGRPROJ_SGR - A[k] < 2^8 (from above), B[k] < 2^(bit_depth) * n,
+      // one_by_x[n - 1] = round(2^12 / n)
+      // => the product here is < 2^(20 + bit_depth) <= 2^32,
+      // and B[k] is set to a value < 2^(8 + bit depth)
+      // This holds even with the rounding in one_by_x and in the overall
+      // result, as long as SGRPROJ_SGR - A[k] is strictly less than 2^8.
+      B[k] = (int32_t)ROUND_POWER_OF_TWO((uint32_t)(SGRPROJ_SGR - A[k]) *
+                                             (uint32_t)B[k] *
+                                             (uint32_t)one_by_x[n - 1],
+                                         SGRPROJ_RECIP_BITS);
+    }
+  }
+}
+
+static void selfguided_restoration_fast_internal(
+    int32_t *dgd, int width, int height, int dgd_stride, int32_t *dst,
+    int dst_stride, int bit_depth, int sgr_params_idx, int radius_idx) {
+  const sgr_params_type *const params = &sgr_params[sgr_params_idx];
+  const int r = params->r[radius_idx];
+  const int width_ext = width + 2 * SGRPROJ_BORDER_HORZ;
+  // Adjusting the stride of A and B here appears to avoid bad cache effects,
+  // leading to a significant speed improvement.
+  // We also align the stride to a multiple of 16 bytes, for consistency
+  // with the SIMD version of this function.
+  int buf_stride = ((width_ext + 3) & ~3) + 16;
+  int32_t A_[RESTORATION_PROC_UNIT_PELS];
+  int32_t B_[RESTORATION_PROC_UNIT_PELS];
+  int32_t *A = A_;
+  int32_t *B = B_;
+  int i, j;
+  calculate_intermediate_result(dgd, width, height, dgd_stride, bit_depth,
+                                sgr_params_idx, radius_idx, 1, A, B);
+  A += SGRPROJ_BORDER_VERT * buf_stride + SGRPROJ_BORDER_HORZ;
+  B += SGRPROJ_BORDER_VERT * buf_stride + SGRPROJ_BORDER_HORZ;
+
+  // Use the A[] and B[] arrays to calculate the filtered image
+  (void)r;
+  assert(r == 2);
+  for (i = 0; i < height; ++i) {
+    if (!(i & 1)) {  // even row
+      for (j = 0; j < width; ++j) {
+        const int k = i * buf_stride + j;
+        const int l = i * dgd_stride + j;
+        const int m = i * dst_stride + j;
+        const int nb = 5;
+        const int32_t a = (A[k - buf_stride] + A[k + buf_stride]) * 6 +
+                          (A[k - 1 - buf_stride] + A[k - 1 + buf_stride] +
+                           A[k + 1 - buf_stride] + A[k + 1 + buf_stride]) *
+                              5;
+        const int32_t b = (B[k - buf_stride] + B[k + buf_stride]) * 6 +
+                          (B[k - 1 - buf_stride] + B[k - 1 + buf_stride] +
+                           B[k + 1 - buf_stride] + B[k + 1 + buf_stride]) *
+                              5;
+        const int32_t v = a * dgd[l] + b;
+        dst[m] =
+            ROUND_POWER_OF_TWO(v, SGRPROJ_SGR_BITS + nb - SGRPROJ_RST_BITS);
+      }
+    } else {  // odd row
+      for (j = 0; j < width; ++j) {
+        const int k = i * buf_stride + j;
+        const int l = i * dgd_stride + j;
+        const int m = i * dst_stride + j;
+        const int nb = 4;
+        const int32_t a = A[k] * 6 + (A[k - 1] + A[k + 1]) * 5;
+        const int32_t b = B[k] * 6 + (B[k - 1] + B[k + 1]) * 5;
+        const int32_t v = a * dgd[l] + b;
+        dst[m] =
+            ROUND_POWER_OF_TWO(v, SGRPROJ_SGR_BITS + nb - SGRPROJ_RST_BITS);
+      }
+    }
+  }
+}
+
+static void selfguided_restoration_internal(int32_t *dgd, int width, int height,
+                                            int dgd_stride, int32_t *dst,
+                                            int dst_stride, int bit_depth,
+                                            int sgr_params_idx,
+                                            int radius_idx) {
+  const int width_ext = width + 2 * SGRPROJ_BORDER_HORZ;
+  // Adjusting the stride of A and B here appears to avoid bad cache effects,
+  // leading to a significant speed improvement.
+  // We also align the stride to a multiple of 16 bytes, for consistency
+  // with the SIMD version of this function.
+  int buf_stride = ((width_ext + 3) & ~3) + 16;
+  int32_t A_[RESTORATION_PROC_UNIT_PELS];
+  int32_t B_[RESTORATION_PROC_UNIT_PELS];
+  int32_t *A = A_;
+  int32_t *B = B_;
+  int i, j;
+  calculate_intermediate_result(dgd, width, height, dgd_stride, bit_depth,
+                                sgr_params_idx, radius_idx, 0, A, B);
+  A += SGRPROJ_BORDER_VERT * buf_stride + SGRPROJ_BORDER_HORZ;
+  B += SGRPROJ_BORDER_VERT * buf_stride + SGRPROJ_BORDER_HORZ;
+
+  // Use the A[] and B[] arrays to calculate the filtered image
+  for (i = 0; i < height; ++i) {
+    for (j = 0; j < width; ++j) {
+      const int k = i * buf_stride + j;
+      const int l = i * dgd_stride + j;
+      const int m = i * dst_stride + j;
+      const int nb = 5;
+      const int32_t a =
+          (A[k] + A[k - 1] + A[k + 1] + A[k - buf_stride] + A[k + buf_stride]) *
+              4 +
+          (A[k - 1 - buf_stride] + A[k - 1 + buf_stride] +
+           A[k + 1 - buf_stride] + A[k + 1 + buf_stride]) *
+              3;
+      const int32_t b =
+          (B[k] + B[k - 1] + B[k + 1] + B[k - buf_stride] + B[k + buf_stride]) *
+              4 +
+          (B[k - 1 - buf_stride] + B[k - 1 + buf_stride] +
+           B[k + 1 - buf_stride] + B[k + 1 + buf_stride]) *
+              3;
+      const int32_t v = a * dgd[l] + b;
+      dst[m] = ROUND_POWER_OF_TWO(v, SGRPROJ_SGR_BITS + nb - SGRPROJ_RST_BITS);
+    }
+  }
+}
+
+int av1_selfguided_restoration_c(const uint8_t *dgd8, int width, int height,
+                                 int dgd_stride, int32_t *flt0, int32_t *flt1,
+                                 int flt_stride, int sgr_params_idx,
+                                 int bit_depth, int highbd) {
+  int32_t dgd32_[RESTORATION_PROC_UNIT_PELS];
+  const int dgd32_stride = width + 2 * SGRPROJ_BORDER_HORZ;
+  int32_t *dgd32 =
+      dgd32_ + dgd32_stride * SGRPROJ_BORDER_VERT + SGRPROJ_BORDER_HORZ;
+
+  if (highbd) {
+    const uint16_t *dgd16 = CONVERT_TO_SHORTPTR(dgd8);
+    for (int i = -SGRPROJ_BORDER_VERT; i < height + SGRPROJ_BORDER_VERT; ++i) {
+      for (int j = -SGRPROJ_BORDER_HORZ; j < width + SGRPROJ_BORDER_HORZ; ++j) {
+        dgd32[i * dgd32_stride + j] = dgd16[i * dgd_stride + j];
+      }
+    }
+  } else {
+    for (int i = -SGRPROJ_BORDER_VERT; i < height + SGRPROJ_BORDER_VERT; ++i) {
+      for (int j = -SGRPROJ_BORDER_HORZ; j < width + SGRPROJ_BORDER_HORZ; ++j) {
+        dgd32[i * dgd32_stride + j] = dgd8[i * dgd_stride + j];
+      }
+    }
+  }
+
+  const sgr_params_type *const params = &sgr_params[sgr_params_idx];
+  // If params->r == 0 we skip the corresponding filter. We only allow one of
+  // the radii to be 0, as having both equal to 0 would be equivalent to
+  // skipping SGR entirely.
+  assert(!(params->r[0] == 0 && params->r[1] == 0));
+
+  if (params->r[0] > 0)
+    selfguided_restoration_fast_internal(dgd32, width, height, dgd32_stride,
+                                         flt0, flt_stride, bit_depth,
+                                         sgr_params_idx, 0);
+  if (params->r[1] > 0)
+    selfguided_restoration_internal(dgd32, width, height, dgd32_stride, flt1,
+                                    flt_stride, bit_depth, sgr_params_idx, 1);
+  return 0;
+}
+
+void apply_selfguided_restoration_c(const uint8_t *dat8, int width, int height,
+                                    int stride, int eps, const int *xqd,
+                                    uint8_t *dst8, int dst_stride,
+                                    int32_t *tmpbuf, int bit_depth,
+                                    int highbd) {
+  int32_t *flt0 = tmpbuf;
+  int32_t *flt1 = flt0 + RESTORATION_UNITPELS_MAX;
+  assert(width * height <= RESTORATION_UNITPELS_MAX);
+
+  const int ret = av1_selfguided_restoration_c(
+      dat8, width, height, stride, flt0, flt1, width, eps, bit_depth, highbd);
+  (void)ret;
+  assert(!ret);
+  const sgr_params_type *const params = &sgr_params[eps];
+  int xq[2];
+  decode_xq(xqd, xq, params);
+  for (int i = 0; i < height; ++i) {
+    for (int j = 0; j < width; ++j) {
+      const int k = i * width + j;
+      uint8_t *dst8ij = dst8 + i * dst_stride + j;
+      const uint8_t *dat8ij = dat8 + i * stride + j;
+
+      const uint16_t pre_u = highbd ? *CONVERT_TO_SHORTPTR(dat8ij) : *dat8ij;
+      const int32_t u = (int32_t)pre_u << SGRPROJ_RST_BITS;
+      int32_t v = u << SGRPROJ_PRJ_BITS;
+      // If params->r == 0 then we skipped the filtering in
+      // av1_selfguided_restoration_c, i.e. flt[k] == u
+      if (params->r[0] > 0) v += xq[0] * (flt0[k] - u);
+      if (params->r[1] > 0) v += xq[1] * (flt1[k] - u);
+      const int16_t w =
+          (int16_t)ROUND_POWER_OF_TWO(v, SGRPROJ_PRJ_BITS + SGRPROJ_RST_BITS);
+
+      const uint16_t out = clip_pixel_highbd(w, bit_depth);
+      if (highbd)
+        *CONVERT_TO_SHORTPTR(dst8ij) = out;
+      else
+        *dst8ij = (uint8_t)out;
+    }
+  }
+}
+
+static void sgrproj_filter_stripe(const RestorationUnitInfo *rui,
+                                  int stripe_width, int stripe_height,
+                                  int procunit_width, const uint8_t *src,
+                                  int src_stride, uint8_t *dst, int dst_stride,
+                                  int32_t *tmpbuf, int bit_depth) {
+  (void)bit_depth;
+  assert(bit_depth == 8);
+
+  for (int j = 0; j < stripe_width; j += procunit_width) {
+    int w = AOMMIN(procunit_width, stripe_width - j);
+    apply_selfguided_restoration(src + j, w, stripe_height, src_stride,
+                                 rui->sgrproj_info.ep, rui->sgrproj_info.xqd,
+                                 dst + j, dst_stride, tmpbuf, bit_depth, 0);
+  }
+}
+
+static void wiener_filter_stripe_highbd(const RestorationUnitInfo *rui,
+                                        int stripe_width, int stripe_height,
+                                        int procunit_width, const uint8_t *src8,
+                                        int src_stride, uint8_t *dst8,
+                                        int dst_stride, int32_t *tmpbuf,
+                                        int bit_depth) {
+  (void)tmpbuf;
+  const ConvolveParams conv_params = get_conv_params_wiener(bit_depth);
+
+  for (int j = 0; j < stripe_width; j += procunit_width) {
+    int w = AOMMIN(procunit_width, (stripe_width - j + 15) & ~15);
+    const uint8_t *src8_p = src8 + j;
+    uint8_t *dst8_p = dst8 + j;
+    av1_highbd_wiener_convolve_add_src(src8_p, src_stride, dst8_p, dst_stride,
+                                       rui->wiener_info.hfilter, 16,
+                                       rui->wiener_info.vfilter, 16, w,
+                                       stripe_height, &conv_params, bit_depth);
+  }
+}
+
+static void sgrproj_filter_stripe_highbd(const RestorationUnitInfo *rui,
+                                         int stripe_width, int stripe_height,
+                                         int procunit_width,
+                                         const uint8_t *src8, int src_stride,
+                                         uint8_t *dst8, int dst_stride,
+                                         int32_t *tmpbuf, int bit_depth) {
+  for (int j = 0; j < stripe_width; j += procunit_width) {
+    int w = AOMMIN(procunit_width, stripe_width - j);
+    apply_selfguided_restoration(src8 + j, w, stripe_height, src_stride,
+                                 rui->sgrproj_info.ep, rui->sgrproj_info.xqd,
+                                 dst8 + j, dst_stride, tmpbuf, bit_depth, 1);
+  }
+}
+
+typedef void (*stripe_filter_fun)(const RestorationUnitInfo *rui,
+                                  int stripe_width, int stripe_height,
+                                  int procunit_width, const uint8_t *src,
+                                  int src_stride, uint8_t *dst, int dst_stride,
+                                  int32_t *tmpbuf, int bit_depth);
+
+#define NUM_STRIPE_FILTERS 4
+
+static const stripe_filter_fun stripe_filters[NUM_STRIPE_FILTERS] = {
+  wiener_filter_stripe, sgrproj_filter_stripe, wiener_filter_stripe_highbd,
+  sgrproj_filter_stripe_highbd
+};
+
+// Filter one restoration unit
+void av1_loop_restoration_filter_unit(
+    const RestorationTileLimits *limits, const RestorationUnitInfo *rui,
+    const RestorationStripeBoundaries *rsb, RestorationLineBuffers *rlbs,
+    const AV1PixelRect *tile_rect, int tile_stripe0, int ss_x, int ss_y,
+    int highbd, int bit_depth, uint8_t *data8, int stride, uint8_t *dst8,
+    int dst_stride, int32_t *tmpbuf, int optimized_lr) {
+  RestorationType unit_rtype = rui->restoration_type;
+
+  int unit_h = limits->v_end - limits->v_start;
+  int unit_w = limits->h_end - limits->h_start;
+  uint8_t *data8_tl = data8 + limits->v_start * stride + limits->h_start;
+  uint8_t *dst8_tl = dst8 + limits->v_start * dst_stride + limits->h_start;
+
+  if (unit_rtype == RESTORE_NONE) {
+    copy_tile(unit_w, unit_h, data8_tl, stride, dst8_tl, dst_stride, highbd);
+    return;
+  }
+
+  const int filter_idx = 2 * highbd + (unit_rtype == RESTORE_SGRPROJ);
+  assert(filter_idx < NUM_STRIPE_FILTERS);
+  const stripe_filter_fun stripe_filter = stripe_filters[filter_idx];
+
+  const int procunit_width = RESTORATION_PROC_UNIT_SIZE >> ss_x;
+
+  // Convolve the whole tile one stripe at a time
+  RestorationTileLimits remaining_stripes = *limits;
+  int i = 0;
+  while (i < unit_h) {
+    int copy_above, copy_below;
+    remaining_stripes.v_start = limits->v_start + i;
+
+    get_stripe_boundary_info(&remaining_stripes, tile_rect, ss_y, &copy_above,
+                             &copy_below);
+
+    const int full_stripe_height = RESTORATION_PROC_UNIT_SIZE >> ss_y;
+    const int runit_offset = RESTORATION_UNIT_OFFSET >> ss_y;
+
+    // Work out where this stripe's boundaries are within
+    // rsb->stripe_boundary_{above,below}
+    const int tile_stripe =
+        (remaining_stripes.v_start - tile_rect->top + runit_offset) /
+        full_stripe_height;
+    const int frame_stripe = tile_stripe0 + tile_stripe;
+    const int rsb_row = RESTORATION_CTX_VERT * frame_stripe;
+
+    // Calculate this stripe's height, based on two rules:
+    // * The topmost stripe in each tile is 8 luma pixels shorter than usual.
+    // * We can't extend past the end of the current restoration unit
+    const int nominal_stripe_height =
+        full_stripe_height - ((tile_stripe == 0) ? runit_offset : 0);
+    const int h = AOMMIN(nominal_stripe_height,
+                         remaining_stripes.v_end - remaining_stripes.v_start);
+
+    setup_processing_stripe_boundary(&remaining_stripes, rsb, rsb_row, highbd,
+                                     h, data8, stride, rlbs, copy_above,
+                                     copy_below, optimized_lr);
+
+    stripe_filter(rui, unit_w, h, procunit_width, data8_tl + i * stride, stride,
+                  dst8_tl + i * dst_stride, dst_stride, tmpbuf, bit_depth);
+
+    restore_processing_stripe_boundary(&remaining_stripes, rlbs, highbd, h,
+                                       data8, stride, copy_above, copy_below,
+                                       optimized_lr);
+
+    i += h;
+  }
+}
+
+static void filter_frame_on_unit(const RestorationTileLimits *limits,
+                                 const AV1PixelRect *tile_rect,
+                                 int rest_unit_idx, void *priv, int32_t *tmpbuf,
+                                 RestorationLineBuffers *rlbs) {
+  FilterFrameCtxt *ctxt = (FilterFrameCtxt *)priv;
+  const RestorationInfo *rsi = ctxt->rsi;
+
+  av1_loop_restoration_filter_unit(
+      limits, &rsi->unit_info[rest_unit_idx], &rsi->boundaries, rlbs, tile_rect,
+      ctxt->tile_stripe0, ctxt->ss_x, ctxt->ss_y, ctxt->highbd, ctxt->bit_depth,
+      ctxt->data8, ctxt->data_stride, ctxt->dst8, ctxt->dst_stride, tmpbuf,
+      rsi->optimized_lr);
+}
+
+void av1_loop_restoration_filter_frame_init(AV1LrStruct *lr_ctxt,
+                                            YV12_BUFFER_CONFIG *frame,
+                                            AV1_COMMON *cm, int optimized_lr,
+                                            int num_planes) {
+  const SequenceHeader *const seq_params = &cm->seq_params;
+  const int bit_depth = seq_params->bit_depth;
+  const int highbd = seq_params->use_highbitdepth;
+  lr_ctxt->dst = &cm->rst_frame;
+
+  lr_ctxt->on_rest_unit = filter_frame_on_unit;
+  lr_ctxt->frame = frame;
+  for (int plane = 0; plane < num_planes; ++plane) {
+    RestorationInfo *rsi = &cm->rst_info[plane];
+    RestorationType rtype = rsi->frame_restoration_type;
+    rsi->optimized_lr = optimized_lr;
+
+    if (rtype == RESTORE_NONE) {
+      continue;
+    }
+
+    const int is_uv = plane > 0;
+    FilterFrameCtxt *lr_plane_ctxt = &lr_ctxt->ctxt[plane];
+    lr_plane_ctxt->rsi = rsi;
+    lr_plane_ctxt->ss_x = is_uv && seq_params->subsampling_x;
+    lr_plane_ctxt->ss_y = is_uv && seq_params->subsampling_y;
+    lr_plane_ctxt->highbd = highbd;
+    lr_plane_ctxt->bit_depth = bit_depth;
+    lr_plane_ctxt->data8 = frame->buffers[plane];
+    lr_plane_ctxt->dst8 = lr_ctxt->dst->buffers[plane];
+    lr_plane_ctxt->data_stride = frame->strides[is_uv];
+    lr_plane_ctxt->dst_stride = lr_ctxt->dst->strides[is_uv];
+    lr_plane_ctxt->tile_rect = av1_whole_frame_rect(cm, is_uv);
+    lr_plane_ctxt->tile_stripe0 = 0;
+  }
+}
+
+void av1_loop_restoration_copy_planes(AV1LrStruct *loop_rest_ctxt,
+                                      AV1_COMMON *cm, int num_planes) {
+  typedef void (*copy_fun)(const YV12_BUFFER_CONFIG *src_ybc,
+                           YV12_BUFFER_CONFIG *dst_ybc, int hstart, int hend,
+                           int vstart, int vend);
+  static const copy_fun copy_funs[3] = { aom_yv12_partial_coloc_copy_y,
+                                         aom_yv12_partial_coloc_copy_u,
+                                         aom_yv12_partial_coloc_copy_v };
+
+  for (int plane = 0; plane < num_planes; ++plane) {
+    if (cm->rst_info[plane].frame_restoration_type == RESTORE_NONE) continue;
+    AV1PixelRect tile_rect = loop_rest_ctxt->ctxt[plane].tile_rect;
+    copy_funs[plane](loop_rest_ctxt->dst, loop_rest_ctxt->frame, tile_rect.left,
+                     tile_rect.right, tile_rect.top, tile_rect.bottom);
+  }
+}
+
+static void foreach_rest_unit_in_planes(AV1LrStruct *lr_ctxt, AV1_COMMON *cm,
+                                        int num_planes) {
+  FilterFrameCtxt *ctxt = lr_ctxt->ctxt;
+
+  for (int plane = 0; plane < num_planes; ++plane) {
+    if (cm->rst_info[plane].frame_restoration_type == RESTORE_NONE) {
+      continue;
+    }
+
+    av1_foreach_rest_unit_in_plane(cm, plane, lr_ctxt->on_rest_unit,
+                                   &ctxt[plane], &ctxt[plane].tile_rect,
+                                   cm->rst_tmpbuf, cm->rlbs);
+  }
+}
+
+void av1_loop_restoration_filter_frame(YV12_BUFFER_CONFIG *frame,
+                                       AV1_COMMON *cm, int optimized_lr,
+                                       void *lr_ctxt) {
+  assert(!cm->all_lossless);
+  const int num_planes = av1_num_planes(cm);
+
+  AV1LrStruct *loop_rest_ctxt = (AV1LrStruct *)lr_ctxt;
+
+  av1_loop_restoration_filter_frame_init(loop_rest_ctxt, frame, cm,
+                                         optimized_lr, num_planes);
+
+  foreach_rest_unit_in_planes(loop_rest_ctxt, cm, num_planes);
+
+  av1_loop_restoration_copy_planes(loop_rest_ctxt, cm, num_planes);
+}
+
+void av1_foreach_rest_unit_in_row(
+    RestorationTileLimits *limits, const AV1PixelRect *tile_rect,
+    rest_unit_visitor_t on_rest_unit, int row_number, int unit_size,
+    int unit_idx0, int hunits_per_tile, int vunits_per_tile, int plane,
+    void *priv, int32_t *tmpbuf, RestorationLineBuffers *rlbs,
+    sync_read_fn_t on_sync_read, sync_write_fn_t on_sync_write,
+    struct AV1LrSyncData *const lr_sync) {
+  const int tile_w = tile_rect->right - tile_rect->left;
+  const int ext_size = unit_size * 3 / 2;
+  int x0 = 0, j = 0;
+  while (x0 < tile_w) {
+    int remaining_w = tile_w - x0;
+    int w = (remaining_w < ext_size) ? remaining_w : unit_size;
+
+    limits->h_start = tile_rect->left + x0;
+    limits->h_end = tile_rect->left + x0 + w;
+    assert(limits->h_end <= tile_rect->right);
+
+    const int unit_idx = unit_idx0 + row_number * hunits_per_tile + j;
+
+    // No sync for even numbered rows
+    // For odd numbered rows, Loop Restoration of current block requires the LR
+    // of top-right and bottom-right blocks to be completed
+
+    // top-right sync
+    on_sync_read(lr_sync, row_number, j, plane);
+    if ((row_number + 1) < vunits_per_tile)
+      // bottom-right sync
+      on_sync_read(lr_sync, row_number + 2, j, plane);
+
+    on_rest_unit(limits, tile_rect, unit_idx, priv, tmpbuf, rlbs);
+
+    on_sync_write(lr_sync, row_number, j, hunits_per_tile, plane);
+
+    x0 += w;
+    ++j;
+  }
+}
+
+void av1_lr_sync_read_dummy(void *const lr_sync, int r, int c, int plane) {
+  (void)lr_sync;
+  (void)r;
+  (void)c;
+  (void)plane;
+}
+
+void av1_lr_sync_write_dummy(void *const lr_sync, int r, int c,
+                             const int sb_cols, int plane) {
+  (void)lr_sync;
+  (void)r;
+  (void)c;
+  (void)sb_cols;
+  (void)plane;
+}
+
+static void foreach_rest_unit_in_tile(
+    const AV1PixelRect *tile_rect, int tile_row, int tile_col, int tile_cols,
+    int hunits_per_tile, int vunits_per_tile, int units_per_tile, int unit_size,
+    int ss_y, int plane, rest_unit_visitor_t on_rest_unit, void *priv,
+    int32_t *tmpbuf, RestorationLineBuffers *rlbs) {
+  const int tile_h = tile_rect->bottom - tile_rect->top;
+  const int ext_size = unit_size * 3 / 2;
+
+  const int tile_idx = tile_col + tile_row * tile_cols;
+  const int unit_idx0 = tile_idx * units_per_tile;
+
+  int y0 = 0, i = 0;
+  while (y0 < tile_h) {
+    int remaining_h = tile_h - y0;
+    int h = (remaining_h < ext_size) ? remaining_h : unit_size;
+
+    RestorationTileLimits limits;
+    limits.v_start = tile_rect->top + y0;
+    limits.v_end = tile_rect->top + y0 + h;
+    assert(limits.v_end <= tile_rect->bottom);
+    // Offset the tile upwards to align with the restoration processing stripe
+    const int voffset = RESTORATION_UNIT_OFFSET >> ss_y;
+    limits.v_start = AOMMAX(tile_rect->top, limits.v_start - voffset);
+    if (limits.v_end < tile_rect->bottom) limits.v_end -= voffset;
+
+    av1_foreach_rest_unit_in_row(
+        &limits, tile_rect, on_rest_unit, i, unit_size, unit_idx0,
+        hunits_per_tile, vunits_per_tile, plane, priv, tmpbuf, rlbs,
+        av1_lr_sync_read_dummy, av1_lr_sync_write_dummy, NULL);
+
+    y0 += h;
+    ++i;
+  }
+}
+
+void av1_foreach_rest_unit_in_plane(const struct AV1Common *cm, int plane,
+                                    rest_unit_visitor_t on_rest_unit,
+                                    void *priv, AV1PixelRect *tile_rect,
+                                    int32_t *tmpbuf,
+                                    RestorationLineBuffers *rlbs) {
+  const int is_uv = plane > 0;
+  const int ss_y = is_uv && cm->seq_params.subsampling_y;
+
+  const RestorationInfo *rsi = &cm->rst_info[plane];
+
+  foreach_rest_unit_in_tile(tile_rect, LR_TILE_ROW, LR_TILE_COL, LR_TILE_COLS,
+                            rsi->horz_units_per_tile, rsi->vert_units_per_tile,
+                            rsi->units_per_tile, rsi->restoration_unit_size,
+                            ss_y, plane, on_rest_unit, priv, tmpbuf, rlbs);
+}
+
+int av1_loop_restoration_corners_in_sb(const struct AV1Common *cm, int plane,
+                                       int mi_row, int mi_col, BLOCK_SIZE bsize,
+                                       int *rcol0, int *rcol1, int *rrow0,
+                                       int *rrow1) {
+  assert(rcol0 && rcol1 && rrow0 && rrow1);
+
+  if (bsize != cm->seq_params.sb_size) return 0;
+  if (cm->rst_info[plane].frame_restoration_type == RESTORE_NONE) return 0;
+
+  assert(!cm->all_lossless);
+
+  const int is_uv = plane > 0;
+
+  const AV1PixelRect tile_rect = av1_whole_frame_rect(cm, is_uv);
+  const int tile_w = tile_rect.right - tile_rect.left;
+  const int tile_h = tile_rect.bottom - tile_rect.top;
+
+  const int mi_top = 0;
+  const int mi_left = 0;
+
+  // Compute the mi-unit corners of the superblock relative to the top-left of
+  // the tile
+  const int mi_rel_row0 = mi_row - mi_top;
+  const int mi_rel_col0 = mi_col - mi_left;
+  const int mi_rel_row1 = mi_rel_row0 + mi_size_high[bsize];
+  const int mi_rel_col1 = mi_rel_col0 + mi_size_wide[bsize];
+
+  const RestorationInfo *rsi = &cm->rst_info[plane];
+  const int size = rsi->restoration_unit_size;
+
+  // Calculate the number of restoration units in this tile (which might be
+  // strictly less than rsi->horz_units_per_tile and rsi->vert_units_per_tile)
+  const int horz_units = av1_lr_count_units_in_tile(size, tile_w);
+  const int vert_units = av1_lr_count_units_in_tile(size, tile_h);
+
+  // The size of an MI-unit on this plane of the image
+  const int ss_x = is_uv && cm->seq_params.subsampling_x;
+  const int ss_y = is_uv && cm->seq_params.subsampling_y;
+  const int mi_size_x = MI_SIZE >> ss_x;
+  const int mi_size_y = MI_SIZE >> ss_y;
+
+  // Write m for the relative mi column or row, D for the superres denominator
+  // and N for the superres numerator. If u is the upscaled pixel offset then
+  // we can write the downscaled pixel offset in two ways as:
+  //
+  //   MI_SIZE * m = N / D u
+  //
+  // from which we get u = D * MI_SIZE * m / N
+  const int mi_to_num_x = av1_superres_scaled(cm)
+                              ? mi_size_x * cm->superres_scale_denominator
+                              : mi_size_x;
+  const int mi_to_num_y = mi_size_y;
+  const int denom_x = av1_superres_scaled(cm) ? size * SCALE_NUMERATOR : size;
+  const int denom_y = size;
+
+  const int rnd_x = denom_x - 1;
+  const int rnd_y = denom_y - 1;
+
+  // rcol0/rrow0 should be the first column/row of restoration units (relative
+  // to the top-left of the tile) that doesn't start left/below of
+  // mi_col/mi_row. For this calculation, we need to round up the division (if
+  // the sb starts at runit column 10.1, the first matching runit has column
+  // index 11)
+  *rcol0 = (mi_rel_col0 * mi_to_num_x + rnd_x) / denom_x;
+  *rrow0 = (mi_rel_row0 * mi_to_num_y + rnd_y) / denom_y;
+
+  // rel_col1/rel_row1 is the equivalent calculation, but for the superblock
+  // below-right. If we're at the bottom or right of the tile, this restoration
+  // unit might not exist, in which case we'll clamp accordingly.
+  *rcol1 = AOMMIN((mi_rel_col1 * mi_to_num_x + rnd_x) / denom_x, horz_units);
+  *rrow1 = AOMMIN((mi_rel_row1 * mi_to_num_y + rnd_y) / denom_y, vert_units);
+
+  return *rcol0 < *rcol1 && *rrow0 < *rrow1;
+}
+
+// Extend to left and right
+static void extend_lines(uint8_t *buf, int width, int height, int stride,
+                         int extend, int use_highbitdepth) {
+  for (int i = 0; i < height; ++i) {
+    if (use_highbitdepth) {
+      uint16_t *buf16 = (uint16_t *)buf;
+      aom_memset16(buf16 - extend, buf16[0], extend);
+      aom_memset16(buf16 + width, buf16[width - 1], extend);
+    } else {
+      memset(buf - extend, buf[0], extend);
+      memset(buf + width, buf[width - 1], extend);
+    }
+    buf += stride;
+  }
+}
+
+static void save_deblock_boundary_lines(
+    const YV12_BUFFER_CONFIG *frame, const AV1_COMMON *cm, int plane, int row,
+    int stripe, int use_highbd, int is_above,
+    RestorationStripeBoundaries *boundaries) {
+  const int is_uv = plane > 0;
+  const uint8_t *src_buf = REAL_PTR(use_highbd, frame->buffers[plane]);
+  const int src_stride = frame->strides[is_uv] << use_highbd;
+  const uint8_t *src_rows = src_buf + row * src_stride;
+
+  uint8_t *bdry_buf = is_above ? boundaries->stripe_boundary_above
+                               : boundaries->stripe_boundary_below;
+  uint8_t *bdry_start = bdry_buf + (RESTORATION_EXTRA_HORZ << use_highbd);
+  const int bdry_stride = boundaries->stripe_boundary_stride << use_highbd;
+  uint8_t *bdry_rows = bdry_start + RESTORATION_CTX_VERT * stripe * bdry_stride;
+
+  // There is a rare case in which a processing stripe can end 1px above the
+  // crop border. In this case, we do want to use deblocked pixels from below
+  // the stripe (hence why we ended up in this function), but instead of
+  // fetching 2 "below" rows we need to fetch one and duplicate it.
+  // This is equivalent to clamping the sample locations against the crop border
+  const int lines_to_save =
+      AOMMIN(RESTORATION_CTX_VERT, frame->crop_heights[is_uv] - row);
+  assert(lines_to_save == 1 || lines_to_save == 2);
+
+  int upscaled_width;
+  int line_bytes;
+  if (av1_superres_scaled(cm)) {
+    const int ss_x = is_uv && cm->seq_params.subsampling_x;
+    upscaled_width = (cm->superres_upscaled_width + ss_x) >> ss_x;
+    line_bytes = upscaled_width << use_highbd;
+    if (use_highbd)
+      av1_upscale_normative_rows(
+          cm, CONVERT_TO_BYTEPTR(src_rows), frame->strides[is_uv],
+          CONVERT_TO_BYTEPTR(bdry_rows), boundaries->stripe_boundary_stride,
+          plane, lines_to_save);
+    else
+      av1_upscale_normative_rows(cm, src_rows, frame->strides[is_uv], bdry_rows,
+                                 boundaries->stripe_boundary_stride, plane,
+                                 lines_to_save);
+  } else {
+    upscaled_width = frame->crop_widths[is_uv];
+    line_bytes = upscaled_width << use_highbd;
+    for (int i = 0; i < lines_to_save; i++) {
+      memcpy(bdry_rows + i * bdry_stride, src_rows + i * src_stride,
+             line_bytes);
+    }
+  }
+  // If we only saved one line, then copy it into the second line buffer
+  if (lines_to_save == 1)
+    memcpy(bdry_rows + bdry_stride, bdry_rows, line_bytes);
+
+  extend_lines(bdry_rows, upscaled_width, RESTORATION_CTX_VERT, bdry_stride,
+               RESTORATION_EXTRA_HORZ, use_highbd);
+}
+
+static void save_cdef_boundary_lines(const YV12_BUFFER_CONFIG *frame,
+                                     const AV1_COMMON *cm, int plane, int row,
+                                     int stripe, int use_highbd, int is_above,
+                                     RestorationStripeBoundaries *boundaries) {
+  const int is_uv = plane > 0;
+  const uint8_t *src_buf = REAL_PTR(use_highbd, frame->buffers[plane]);
+  const int src_stride = frame->strides[is_uv] << use_highbd;
+  const uint8_t *src_rows = src_buf + row * src_stride;
+
+  uint8_t *bdry_buf = is_above ? boundaries->stripe_boundary_above
+                               : boundaries->stripe_boundary_below;
+  uint8_t *bdry_start = bdry_buf + (RESTORATION_EXTRA_HORZ << use_highbd);
+  const int bdry_stride = boundaries->stripe_boundary_stride << use_highbd;
+  uint8_t *bdry_rows = bdry_start + RESTORATION_CTX_VERT * stripe * bdry_stride;
+  const int src_width = frame->crop_widths[is_uv];
+
+  // At the point where this function is called, we've already applied
+  // superres. So we don't need to extend the lines here, we can just
+  // pull directly from the topmost row of the upscaled frame.
+  const int ss_x = is_uv && cm->seq_params.subsampling_x;
+  const int upscaled_width = av1_superres_scaled(cm)
+                                 ? (cm->superres_upscaled_width + ss_x) >> ss_x
+                                 : src_width;
+  const int line_bytes = upscaled_width << use_highbd;
+  for (int i = 0; i < RESTORATION_CTX_VERT; i++) {
+    // Copy the line at 'row' into both context lines. This is because
+    // we want to (effectively) extend the outermost row of CDEF data
+    // from this tile to produce a border, rather than using deblocked
+    // pixels from the tile above/below.
+    memcpy(bdry_rows + i * bdry_stride, src_rows, line_bytes);
+  }
+  extend_lines(bdry_rows, upscaled_width, RESTORATION_CTX_VERT, bdry_stride,
+               RESTORATION_EXTRA_HORZ, use_highbd);
+}
+
+static void save_tile_row_boundary_lines(const YV12_BUFFER_CONFIG *frame,
+                                         int use_highbd, int plane,
+                                         AV1_COMMON *cm, int after_cdef) {
+  const int is_uv = plane > 0;
+  const int ss_y = is_uv && cm->seq_params.subsampling_y;
+  const int stripe_height = RESTORATION_PROC_UNIT_SIZE >> ss_y;
+  const int stripe_off = RESTORATION_UNIT_OFFSET >> ss_y;
+
+  // Get the tile rectangle, with height rounded up to the next multiple of 8
+  // luma pixels (only relevant for the bottom tile of the frame)
+  const AV1PixelRect tile_rect = av1_whole_frame_rect(cm, is_uv);
+  const int stripe0 = 0;
+
+  RestorationStripeBoundaries *boundaries = &cm->rst_info[plane].boundaries;
+
+  const int plane_height = ROUND_POWER_OF_TWO(cm->height, ss_y);
+
+  int tile_stripe;
+  for (tile_stripe = 0;; ++tile_stripe) {
+    const int rel_y0 = AOMMAX(0, tile_stripe * stripe_height - stripe_off);
+    const int y0 = tile_rect.top + rel_y0;
+    if (y0 >= tile_rect.bottom) break;
+
+    const int rel_y1 = (tile_stripe + 1) * stripe_height - stripe_off;
+    const int y1 = AOMMIN(tile_rect.top + rel_y1, tile_rect.bottom);
+
+    const int frame_stripe = stripe0 + tile_stripe;
+
+    // In this case, we should only use CDEF pixels at the top
+    // and bottom of the frame as a whole; internal tile boundaries
+    // can use deblocked pixels from adjacent tiles for context.
+    const int use_deblock_above = (frame_stripe > 0);
+    const int use_deblock_below = (y1 < plane_height);
+
+    if (!after_cdef) {
+      // Save deblocked context where needed.
+      if (use_deblock_above) {
+        save_deblock_boundary_lines(frame, cm, plane, y0 - RESTORATION_CTX_VERT,
+                                    frame_stripe, use_highbd, 1, boundaries);
+      }
+      if (use_deblock_below) {
+        save_deblock_boundary_lines(frame, cm, plane, y1, frame_stripe,
+                                    use_highbd, 0, boundaries);
+      }
+    } else {
+      // Save CDEF context where needed. Note that we need to save the CDEF
+      // context for a particular boundary iff we *didn't* save deblocked
+      // context for that boundary.
+      //
+      // In addition, we need to save copies of the outermost line within
+      // the tile, rather than using data from outside the tile.
+      if (!use_deblock_above) {
+        save_cdef_boundary_lines(frame, cm, plane, y0, frame_stripe, use_highbd,
+                                 1, boundaries);
+      }
+      if (!use_deblock_below) {
+        save_cdef_boundary_lines(frame, cm, plane, y1 - 1, frame_stripe,
+                                 use_highbd, 0, boundaries);
+      }
+    }
+  }
+}
+
+// For each RESTORATION_PROC_UNIT_SIZE pixel high stripe, save 4 scan
+// lines to be used as boundary in the loop restoration process. The
+// lines are saved in rst_internal.stripe_boundary_lines
+void av1_loop_restoration_save_boundary_lines(const YV12_BUFFER_CONFIG *frame,
+                                              AV1_COMMON *cm, int after_cdef) {
+  const int num_planes = av1_num_planes(cm);
+  const int use_highbd = cm->seq_params.use_highbitdepth;
+  for (int p = 0; p < num_planes; ++p) {
+    save_tile_row_boundary_lines(frame, use_highbd, p, cm, after_cdef);
+  }
+}

diff --git a/libav1/av1/common/restoration.h b/libav1/av1/common/restoration.h
new file mode 100644
index 0000000..6d6ba37
--- /dev/null
+++ b/libav1/av1/common/restoration.h

@@ -0,0 +1,379 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_RESTORATION_H_
+#define AOM_AV1_COMMON_RESTORATION_H_
+
+#include "aom_ports/mem.h"
+#include "config/aom_config.h"
+
+#include "av1/common/blockd.h"
+#include "av1/common/enums.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Border for Loop restoration buffer
+#define AOM_RESTORATION_FRAME_BORDER 32
+#define CLIP(x, lo, hi) ((x) < (lo) ? (lo) : (x) > (hi) ? (hi) : (x))
+#define RINT(x) ((x) < 0 ? (int)((x)-0.5) : (int)((x) + 0.5))
+
+#define RESTORATION_PROC_UNIT_SIZE 64
+
+// Filter tile grid offset upwards compared to the superblock grid
+#define RESTORATION_UNIT_OFFSET 8
+
+#define SGRPROJ_BORDER_VERT 3  // Vertical border used for Sgr
+#define SGRPROJ_BORDER_HORZ 3  // Horizontal border used for Sgr
+
+#define WIENER_BORDER_VERT 2  // Vertical border used for Wiener
+#define WIENER_HALFWIN 3
+#define WIENER_BORDER_HORZ (WIENER_HALFWIN)  // Horizontal border for Wiener
+
+// RESTORATION_BORDER_VERT determines line buffer requirement for LR.
+// Should be set at the max of SGRPROJ_BORDER_VERT and WIENER_BORDER_VERT.
+// Note the line buffer needed is twice the value of this macro.
+#if SGRPROJ_BORDER_VERT >= WIENER_BORDER_VERT
+#define RESTORATION_BORDER_VERT (SGRPROJ_BORDER_VERT)
+#else
+#define RESTORATION_BORDER_VERT (WIENER_BORDER_VERT)
+#endif  // SGRPROJ_BORDER_VERT >= WIENER_BORDER_VERT
+
+#if SGRPROJ_BORDER_HORZ >= WIENER_BORDER_HORZ
+#define RESTORATION_BORDER_HORZ (SGRPROJ_BORDER_HORZ)
+#else
+#define RESTORATION_BORDER_HORZ (WIENER_BORDER_HORZ)
+#endif  // SGRPROJ_BORDER_VERT >= WIENER_BORDER_VERT
+
+// How many border pixels do we need for each processing unit?
+#define RESTORATION_BORDER 3
+
+// How many rows of deblocked pixels do we save above/below each processing
+// stripe?
+#define RESTORATION_CTX_VERT 2
+
+// Additional pixels to the left and right in above/below buffers
+// It is RESTORATION_BORDER_HORZ rounded up to get nicer buffer alignment
+#define RESTORATION_EXTRA_HORZ 4
+
+// Pad up to 20 more (may be much less is needed)
+#define RESTORATION_PADDING 20
+#define RESTORATION_PROC_UNIT_PELS                             \
+  ((RESTORATION_PROC_UNIT_SIZE + RESTORATION_BORDER_HORZ * 2 + \
+    RESTORATION_PADDING) *                                     \
+   (RESTORATION_PROC_UNIT_SIZE + RESTORATION_BORDER_VERT * 2 + \
+    RESTORATION_PADDING))
+
+#define RESTORATION_UNITSIZE_MAX 256
+#define RESTORATION_UNITPELS_HORZ_MAX \
+  (RESTORATION_UNITSIZE_MAX * 3 / 2 + 2 * RESTORATION_BORDER_HORZ + 16)
+#define RESTORATION_UNITPELS_VERT_MAX                                \
+  ((RESTORATION_UNITSIZE_MAX * 3 / 2 + 2 * RESTORATION_BORDER_VERT + \
+    RESTORATION_UNIT_OFFSET))
+#define RESTORATION_UNITPELS_MAX \
+  (RESTORATION_UNITPELS_HORZ_MAX * RESTORATION_UNITPELS_VERT_MAX)
+
+// Two 32-bit buffers needed for the restored versions from two filters
+// TODO(debargha, rupert): Refactor to not need the large tilesize to be stored
+// on the decoder side.
+#define SGRPROJ_TMPBUF_SIZE (RESTORATION_UNITPELS_MAX * 2 * sizeof(int32_t))
+
+#define SGRPROJ_EXTBUF_SIZE (0)
+#define SGRPROJ_PARAMS_BITS 4
+#define SGRPROJ_PARAMS (1 << SGRPROJ_PARAMS_BITS)
+
+// Precision bits for projection
+#define SGRPROJ_PRJ_BITS 7
+// Restoration precision bits generated higher than source before projection
+#define SGRPROJ_RST_BITS 4
+// Internal precision bits for core selfguided_restoration
+#define SGRPROJ_SGR_BITS 8
+#define SGRPROJ_SGR (1 << SGRPROJ_SGR_BITS)
+
+#define SGRPROJ_PRJ_MIN0 (-(1 << SGRPROJ_PRJ_BITS) * 3 / 4)
+#define SGRPROJ_PRJ_MAX0 (SGRPROJ_PRJ_MIN0 + (1 << SGRPROJ_PRJ_BITS) - 1)
+#define SGRPROJ_PRJ_MIN1 (-(1 << SGRPROJ_PRJ_BITS) / 4)
+#define SGRPROJ_PRJ_MAX1 (SGRPROJ_PRJ_MIN1 + (1 << SGRPROJ_PRJ_BITS) - 1)
+
+#define SGRPROJ_PRJ_SUBEXP_K 4
+
+#define SGRPROJ_BITS (SGRPROJ_PRJ_BITS * 2 + SGRPROJ_PARAMS_BITS)
+
+#define MAX_RADIUS 2  // Only 1, 2, 3 allowed
+#define MAX_NELEM ((2 * MAX_RADIUS + 1) * (2 * MAX_RADIUS + 1))
+#define SGRPROJ_MTABLE_BITS 20
+#define SGRPROJ_RECIP_BITS 12
+
+#define WIENER_HALFWIN1 (WIENER_HALFWIN + 1)
+#define WIENER_WIN (2 * WIENER_HALFWIN + 1)
+#define WIENER_WIN2 ((WIENER_WIN) * (WIENER_WIN))
+#define WIENER_TMPBUF_SIZE (0)
+#define WIENER_EXTBUF_SIZE (0)
+
+// If WIENER_WIN_CHROMA == WIENER_WIN - 2, that implies 5x5 filters are used for
+// chroma. To use 7x7 for chroma set WIENER_WIN_CHROMA to WIENER_WIN.
+#define WIENER_WIN_CHROMA (WIENER_WIN - 2)
+#define WIENER_WIN2_CHROMA ((WIENER_WIN_CHROMA) * (WIENER_WIN_CHROMA))
+
+#define WIENER_FILT_PREC_BITS 7
+#define WIENER_FILT_STEP (1 << WIENER_FILT_PREC_BITS)
+
+// Central values for the taps
+#define WIENER_FILT_TAP0_MIDV (3)
+#define WIENER_FILT_TAP1_MIDV (-7)
+#define WIENER_FILT_TAP2_MIDV (15)
+#define WIENER_FILT_TAP3_MIDV                                              \
+  (WIENER_FILT_STEP - 2 * (WIENER_FILT_TAP0_MIDV + WIENER_FILT_TAP1_MIDV + \
+                           WIENER_FILT_TAP2_MIDV))
+
+#define WIENER_FILT_TAP0_BITS 4
+#define WIENER_FILT_TAP1_BITS 5
+#define WIENER_FILT_TAP2_BITS 6
+
+#define WIENER_FILT_BITS \
+  ((WIENER_FILT_TAP0_BITS + WIENER_FILT_TAP1_BITS + WIENER_FILT_TAP2_BITS) * 2)
+
+#define WIENER_FILT_TAP0_MINV \
+  (WIENER_FILT_TAP0_MIDV - (1 << WIENER_FILT_TAP0_BITS) / 2)
+#define WIENER_FILT_TAP1_MINV \
+  (WIENER_FILT_TAP1_MIDV - (1 << WIENER_FILT_TAP1_BITS) / 2)
+#define WIENER_FILT_TAP2_MINV \
+  (WIENER_FILT_TAP2_MIDV - (1 << WIENER_FILT_TAP2_BITS) / 2)
+
+#define WIENER_FILT_TAP0_MAXV \
+  (WIENER_FILT_TAP0_MIDV - 1 + (1 << WIENER_FILT_TAP0_BITS) / 2)
+#define WIENER_FILT_TAP1_MAXV \
+  (WIENER_FILT_TAP1_MIDV - 1 + (1 << WIENER_FILT_TAP1_BITS) / 2)
+#define WIENER_FILT_TAP2_MAXV \
+  (WIENER_FILT_TAP2_MIDV - 1 + (1 << WIENER_FILT_TAP2_BITS) / 2)
+
+#define WIENER_FILT_TAP0_SUBEXP_K 1
+#define WIENER_FILT_TAP1_SUBEXP_K 2
+#define WIENER_FILT_TAP2_SUBEXP_K 3
+
+// Max of SGRPROJ_TMPBUF_SIZE, DOMAINTXFMRF_TMPBUF_SIZE, WIENER_TMPBUF_SIZE
+#define RESTORATION_TMPBUF_SIZE (SGRPROJ_TMPBUF_SIZE)
+
+// Max of SGRPROJ_EXTBUF_SIZE, WIENER_EXTBUF_SIZE
+#define RESTORATION_EXTBUF_SIZE (WIENER_EXTBUF_SIZE)
+
+// Check the assumptions of the existing code
+#if SUBPEL_TAPS != WIENER_WIN + 1
+#error "Wiener filter currently only works if SUBPEL_TAPS == WIENER_WIN + 1"
+#endif
+#if WIENER_FILT_PREC_BITS != 7
+#error "Wiener filter currently only works if WIENER_FILT_PREC_BITS == 7"
+#endif
+
+#define LR_TILE_ROW 0
+#define LR_TILE_COL 0
+#define LR_TILE_COLS 1
+
+typedef struct {
+  int r[2];  // radii
+  int s[2];  // sgr parameters for r[0] and r[1], based on GenSgrprojVtable()
+} sgr_params_type;
+
+typedef struct {
+  RestorationType restoration_type;
+  WienerInfo wiener_info;
+  SgrprojInfo sgrproj_info;
+} RestorationUnitInfo;
+
+// A restoration line buffer needs space for two lines plus a horizontal filter
+// margin of RESTORATION_EXTRA_HORZ on each side.
+#define RESTORATION_LINEBUFFER_WIDTH \
+  (RESTORATION_UNITSIZE_MAX * 3 / 2 + 2 * RESTORATION_EXTRA_HORZ)
+
+// Similarly, the column buffers (used when we're at a vertical tile edge
+// that we can't filter across) need space for one processing unit's worth
+// of pixels, plus the top/bottom border width
+#define RESTORATION_COLBUFFER_HEIGHT \
+  (RESTORATION_PROC_UNIT_SIZE + 2 * RESTORATION_BORDER)
+
+typedef struct {
+  // Temporary buffers to save/restore 3 lines above/below the restoration
+  // stripe.
+  uint16_t tmp_save_above[RESTORATION_BORDER][RESTORATION_LINEBUFFER_WIDTH];
+  uint16_t tmp_save_below[RESTORATION_BORDER][RESTORATION_LINEBUFFER_WIDTH];
+} RestorationLineBuffers;
+
+typedef struct {
+  uint8_t *stripe_boundary_above;
+  uint8_t *stripe_boundary_below;
+  int stripe_boundary_stride;
+  int stripe_boundary_size;
+} RestorationStripeBoundaries;
+
+typedef struct {
+  RestorationType frame_restoration_type;
+  int restoration_unit_size;
+
+  // Fields below here are allocated and initialised by
+  // av1_alloc_restoration_struct. (horz_)units_per_tile give the number of
+  // restoration units in (one row of) the largest tile in the frame. The data
+  // in unit_info is laid out with units_per_tile entries for each tile, which
+  // have stride horz_units_per_tile.
+  //
+  // Even if there are tiles of different sizes, the data in unit_info is laid
+  // out as if all tiles are of full size.
+  int units_per_tile;
+  int vert_units_per_tile, horz_units_per_tile;
+  RestorationUnitInfo *unit_info;
+  RestorationStripeBoundaries boundaries;
+  int optimized_lr;
+} RestorationInfo;
+
+static INLINE void set_default_sgrproj(SgrprojInfo *sgrproj_info) {
+  sgrproj_info->xqd[0] = (SGRPROJ_PRJ_MIN0 + SGRPROJ_PRJ_MAX0) / 2;
+  sgrproj_info->xqd[1] = (SGRPROJ_PRJ_MIN1 + SGRPROJ_PRJ_MAX1) / 2;
+}
+
+static INLINE void set_default_wiener(WienerInfo *wiener_info) {
+  wiener_info->vfilter[0] = wiener_info->hfilter[0] = WIENER_FILT_TAP0_MIDV;
+  wiener_info->vfilter[1] = wiener_info->hfilter[1] = WIENER_FILT_TAP1_MIDV;
+  wiener_info->vfilter[2] = wiener_info->hfilter[2] = WIENER_FILT_TAP2_MIDV;
+  wiener_info->vfilter[WIENER_HALFWIN] = wiener_info->hfilter[WIENER_HALFWIN] =
+      -2 *
+      (WIENER_FILT_TAP2_MIDV + WIENER_FILT_TAP1_MIDV + WIENER_FILT_TAP0_MIDV);
+  wiener_info->vfilter[4] = wiener_info->hfilter[4] = WIENER_FILT_TAP2_MIDV;
+  wiener_info->vfilter[5] = wiener_info->hfilter[5] = WIENER_FILT_TAP1_MIDV;
+  wiener_info->vfilter[6] = wiener_info->hfilter[6] = WIENER_FILT_TAP0_MIDV;
+}
+
+typedef struct {
+  int h_start, h_end, v_start, v_end;
+} RestorationTileLimits;
+
+typedef void (*rest_unit_visitor_t)(const RestorationTileLimits *limits,
+                                    const AV1PixelRect *tile_rect,
+                                    int rest_unit_idx, void *priv,
+                                    int32_t *tmpbuf,
+                                    RestorationLineBuffers *rlbs);
+
+typedef struct FilterFrameCtxt {
+  const RestorationInfo *rsi;
+  int tile_stripe0;
+  int ss_x, ss_y;
+  int highbd, bit_depth;
+  uint8_t *data8, *dst8;
+  int data_stride, dst_stride;
+  AV1PixelRect tile_rect;
+} FilterFrameCtxt;
+
+typedef struct AV1LrStruct {
+  rest_unit_visitor_t on_rest_unit;
+  FilterFrameCtxt ctxt[MAX_MB_PLANE];
+  YV12_BUFFER_CONFIG *frame;
+  YV12_BUFFER_CONFIG *dst;
+} AV1LrStruct;
+
+extern const sgr_params_type sgr_params[SGRPROJ_PARAMS];
+extern int sgrproj_mtable[SGRPROJ_PARAMS][2];
+extern const int32_t x_by_xplus1[256];
+extern const int32_t one_by_x[MAX_NELEM];
+
+void av1_alloc_restoration_struct(struct AV1Common *cm, RestorationInfo *rsi,
+                                  int is_uv);
+void av1_free_restoration_struct(RestorationInfo *rst_info);
+
+void extend_frame(uint8_t *data, int width, int height, int stride,
+                  int border_horz, int border_vert, int highbd);
+void decode_xq(const int *xqd, int *xq, const sgr_params_type *params);
+
+// Filter a single loop restoration unit.
+//
+// limits is the limits of the unit. rui gives the mode to use for this unit
+// and its coefficients. If striped loop restoration is enabled, rsb contains
+// deblocked pixels to use for stripe boundaries; rlbs is just some space to
+// use as a scratch buffer. tile_rect gives the limits of the tile containing
+// this unit. tile_stripe0 is the index of the first stripe in this tile.
+//
+// ss_x and ss_y are flags which should be 1 if this is a plane with
+// horizontal/vertical subsampling, respectively. highbd is a flag which should
+// be 1 in high bit depth mode, in which case bit_depth is the bit depth.
+//
+// data8 is the frame data (pointing at the top-left corner of the frame, not
+// the restoration unit) and stride is its stride. dst8 is the buffer where the
+// results will be written and has stride dst_stride. Like data8, dst8 should
+// point at the top-left corner of the frame.
+//
+// Finally tmpbuf is a scratch buffer used by the sgrproj filter which should
+// be at least SGRPROJ_TMPBUF_SIZE big.
+void av1_loop_restoration_filter_unit(
+    const RestorationTileLimits *limits, const RestorationUnitInfo *rui,
+    const RestorationStripeBoundaries *rsb, RestorationLineBuffers *rlbs,
+    const AV1PixelRect *tile_rect, int tile_stripe0, int ss_x, int ss_y,
+    int highbd, int bit_depth, uint8_t *data8, int stride, uint8_t *dst8,
+    int dst_stride, int32_t *tmpbuf, int optimized_lr);
+
+void av1_loop_restoration_filter_frame(YV12_BUFFER_CONFIG *frame,
+                                       struct AV1Common *cm, int optimized_lr,
+                                       void *lr_ctxt);
+void av1_loop_restoration_precal();
+
+typedef void (*rest_tile_start_visitor_t)(int tile_row, int tile_col,
+                                          void *priv);
+struct AV1LrSyncData;
+
+typedef void (*sync_read_fn_t)(void *const lr_sync, int r, int c, int plane);
+
+typedef void (*sync_write_fn_t)(void *const lr_sync, int r, int c,
+                                const int sb_cols, int plane);
+
+// Call on_rest_unit for each loop restoration unit in the plane.
+void av1_foreach_rest_unit_in_plane(const struct AV1Common *cm, int plane,
+                                    rest_unit_visitor_t on_rest_unit,
+                                    void *priv, AV1PixelRect *tile_rect,
+                                    int32_t *tmpbuf,
+                                    RestorationLineBuffers *rlbs);
+
+// Return 1 iff the block at mi_row, mi_col with size bsize is a
+// top-level superblock containing the top-left corner of at least one
+// loop restoration unit.
+//
+// If the block is a top-level superblock, the function writes to
+// *rcol0, *rcol1, *rrow0, *rrow1. The rectangle of restoration unit
+// indices given by [*rcol0, *rcol1) x [*rrow0, *rrow1) are relative
+// to the current tile, whose starting index is returned as
+// *tile_tl_idx.
+int av1_loop_restoration_corners_in_sb(const struct AV1Common *cm, int plane,
+                                       int mi_row, int mi_col, BLOCK_SIZE bsize,
+                                       int *rcol0, int *rcol1, int *rrow0,
+                                       int *rrow1);
+
+void av1_loop_restoration_save_boundary_lines(const YV12_BUFFER_CONFIG *frame,
+                                              struct AV1Common *cm,
+                                              int after_cdef);
+void av1_loop_restoration_filter_frame_init(AV1LrStruct *lr_ctxt,
+                                            YV12_BUFFER_CONFIG *frame,
+                                            struct AV1Common *cm,
+                                            int optimized_lr, int num_planes);
+void av1_loop_restoration_copy_planes(AV1LrStruct *loop_rest_ctxt,
+                                      struct AV1Common *cm, int num_planes);
+void av1_foreach_rest_unit_in_row(
+    RestorationTileLimits *limits, const AV1PixelRect *tile_rect,
+    rest_unit_visitor_t on_rest_unit, int row_number, int unit_size,
+    int unit_idx0, int hunits_per_tile, int vunits_per_tile, int plane,
+    void *priv, int32_t *tmpbuf, RestorationLineBuffers *rlbs,
+    sync_read_fn_t on_sync_read, sync_write_fn_t on_sync_write,
+    struct AV1LrSyncData *const lr_sync);
+AV1PixelRect av1_whole_frame_rect(const struct AV1Common *cm, int is_uv);
+int av1_lr_count_units_in_tile(int unit_size, int tile_size);
+void av1_lr_sync_read_dummy(void *const lr_sync, int r, int c, int plane);
+void av1_lr_sync_write_dummy(void *const lr_sync, int r, int c,
+                             const int sb_cols, int plane);
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_RESTORATION_H_

diff --git a/libav1/av1/common/scale.c b/libav1/av1/common/scale.c
new file mode 100644
index 0000000..bac7bd9
--- /dev/null
+++ b/libav1/av1/common/scale.c

@@ -0,0 +1,126 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "config/aom_dsp_rtcd.h"
+#include "config/av1_rtcd.h"
+
+#include "av1/common/filter.h"
+#include "av1/common/scale.h"
+#include "aom_dsp/aom_filter.h"
+
+// Note: Expect val to be in q4 precision
+static INLINE int scaled_x(int val, const struct scale_factors *sf) {
+  const int off =
+      (sf->x_scale_fp - (1 << REF_SCALE_SHIFT)) * (1 << (SUBPEL_BITS - 1));
+  const int64_t tval = (int64_t)val * sf->x_scale_fp + off;
+  return (int)ROUND_POWER_OF_TWO_SIGNED_64(tval,
+                                           REF_SCALE_SHIFT - SCALE_EXTRA_BITS);
+}
+
+// Note: Expect val to be in q4 precision
+static INLINE int scaled_y(int val, const struct scale_factors *sf) {
+  const int off =
+      (sf->y_scale_fp - (1 << REF_SCALE_SHIFT)) * (1 << (SUBPEL_BITS - 1));
+  const int64_t tval = (int64_t)val * sf->y_scale_fp + off;
+  return (int)ROUND_POWER_OF_TWO_SIGNED_64(tval,
+                                           REF_SCALE_SHIFT - SCALE_EXTRA_BITS);
+}
+
+// Note: Expect val to be in q4 precision
+static int unscaled_value(int val, const struct scale_factors *sf) {
+  (void)sf;
+  return val << SCALE_EXTRA_BITS;
+}
+
+static int get_fixed_point_scale_factor(int other_size, int this_size) {
+  // Calculate scaling factor once for each reference frame
+  // and use fixed point scaling factors in decoding and encoding routines.
+  // Hardware implementations can calculate scale factor in device driver
+  // and use multiplication and shifting on hardware instead of division.
+  return ((other_size << REF_SCALE_SHIFT) + this_size / 2) / this_size;
+}
+
+// Given the fixed point scale, calculate coarse point scale.
+static int fixed_point_scale_to_coarse_point_scale(int scale_fp) {
+  return ROUND_POWER_OF_TWO(scale_fp, REF_SCALE_SHIFT - SCALE_SUBPEL_BITS);
+}
+
+// Note: x and y are integer precision, mvq4 is q4 precision.
+MV32 av1_scale_mv(const MV *mvq4, int x, int y,
+                  const struct scale_factors *sf) {
+  const int x_off_q4 = scaled_x(x << SUBPEL_BITS, sf);
+  const int y_off_q4 = scaled_y(y << SUBPEL_BITS, sf);
+  const MV32 res = { scaled_y((y << SUBPEL_BITS) + mvq4->row, sf) - y_off_q4,
+                     scaled_x((x << SUBPEL_BITS) + mvq4->col, sf) - x_off_q4 };
+  return res;
+}
+
+void av1_setup_scale_factors_for_frame(struct scale_factors *sf, int other_w,
+                                       int other_h, int this_w, int this_h) {
+  if (!valid_ref_frame_size(other_w, other_h, this_w, this_h)) {
+    sf->x_scale_fp = REF_INVALID_SCALE;
+    sf->y_scale_fp = REF_INVALID_SCALE;
+    return;
+  }
+
+  sf->x_scale_fp = get_fixed_point_scale_factor(other_w, this_w);
+  sf->y_scale_fp = get_fixed_point_scale_factor(other_h, this_h);
+
+  sf->x_step_q4 = fixed_point_scale_to_coarse_point_scale(sf->x_scale_fp);
+  sf->y_step_q4 = fixed_point_scale_to_coarse_point_scale(sf->y_scale_fp);
+
+  if (av1_is_scaled(sf)) {
+    sf->scale_value_x = scaled_x;
+    sf->scale_value_y = scaled_y;
+  } else {
+    sf->scale_value_x = unscaled_value;
+    sf->scale_value_y = unscaled_value;
+  }
+
+  // AV1 convolve functions
+  // Special case convolve functions should produce the same result as
+  // av1_convolve_2d.
+  // subpel_x_q4 == 0 && subpel_y_q4 == 0
+  sf->convolve[0][0][0] = av1_convolve_2d_copy_sr;
+  // subpel_x_q4 == 0
+  sf->convolve[0][1][0] = av1_convolve_y_sr;
+  // subpel_y_q4 == 0
+  sf->convolve[1][0][0] = av1_convolve_x_sr;
+  // subpel_x_q4 != 0 && subpel_y_q4 != 0
+  sf->convolve[1][1][0] = av1_convolve_2d_sr;
+  // subpel_x_q4 == 0 && subpel_y_q4 == 0
+  sf->convolve[0][0][1] = av1_dist_wtd_convolve_2d_copy;
+  // subpel_x_q4 == 0
+  sf->convolve[0][1][1] = av1_dist_wtd_convolve_y;
+  // subpel_y_q4 == 0
+  sf->convolve[1][0][1] = av1_dist_wtd_convolve_x;
+  // subpel_x_q4 != 0 && subpel_y_q4 != 0
+  sf->convolve[1][1][1] = av1_dist_wtd_convolve_2d;
+  // AV1 High BD convolve functions
+  // Special case convolve functions should produce the same result as
+  // av1_highbd_convolve_2d.
+  // subpel_x_q4 == 0 && subpel_y_q4 == 0
+  sf->highbd_convolve[0][0][0] = av1_highbd_convolve_2d_copy_sr;
+  // subpel_x_q4 == 0
+  sf->highbd_convolve[0][1][0] = av1_highbd_convolve_y_sr;
+  // subpel_y_q4 == 0
+  sf->highbd_convolve[1][0][0] = av1_highbd_convolve_x_sr;
+  // subpel_x_q4 != 0 && subpel_y_q4 != 0
+  sf->highbd_convolve[1][1][0] = av1_highbd_convolve_2d_sr;
+  // subpel_x_q4 == 0 && subpel_y_q4 == 0
+  sf->highbd_convolve[0][0][1] = av1_highbd_dist_wtd_convolve_2d_copy;
+  // subpel_x_q4 == 0
+  sf->highbd_convolve[0][1][1] = av1_highbd_dist_wtd_convolve_y;
+  // subpel_y_q4 == 0
+  sf->highbd_convolve[1][0][1] = av1_highbd_dist_wtd_convolve_x;
+  // subpel_x_q4 != 0 && subpel_y_q4 != 0
+  sf->highbd_convolve[1][1][1] = av1_highbd_dist_wtd_convolve_2d;
+}

diff --git a/libav1/av1/common/scale.h b/libav1/av1/common/scale.h
new file mode 100644
index 0000000..748e958
--- /dev/null
+++ b/libav1/av1/common/scale.h

@@ -0,0 +1,67 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_SCALE_H_
+#define AOM_AV1_COMMON_SCALE_H_
+
+#include "av1/common/convolve.h"
+#include "av1/common/mv.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define SCALE_NUMERATOR 8
+
+#define REF_SCALE_SHIFT 14
+#define REF_NO_SCALE (1 << REF_SCALE_SHIFT)
+#define REF_INVALID_SCALE -1
+
+struct scale_factors {
+  int x_scale_fp;  // horizontal fixed point scale factor
+  int y_scale_fp;  // vertical fixed point scale factor
+  int x_step_q4;
+  int y_step_q4;
+
+  int (*scale_value_x)(int val, const struct scale_factors *sf);
+  int (*scale_value_y)(int val, const struct scale_factors *sf);
+
+  // convolve_fn_ptr[subpel_x != 0][subpel_y != 0][is_compound]
+  aom_convolve_fn_t convolve[2][2][2];
+  aom_highbd_convolve_fn_t highbd_convolve[2][2][2];
+};
+
+MV32 av1_scale_mv(const MV *mv, int x, int y, const struct scale_factors *sf);
+
+void av1_setup_scale_factors_for_frame(struct scale_factors *sf, int other_w,
+                                       int other_h, int this_w, int this_h);
+
+static INLINE int av1_is_valid_scale(const struct scale_factors *sf) {
+  return sf->x_scale_fp != REF_INVALID_SCALE &&
+         sf->y_scale_fp != REF_INVALID_SCALE;
+}
+
+static INLINE int av1_is_scaled(const struct scale_factors *sf) {
+  return av1_is_valid_scale(sf) &&
+         (sf->x_scale_fp != REF_NO_SCALE || sf->y_scale_fp != REF_NO_SCALE);
+}
+
+static INLINE int valid_ref_frame_size(int ref_width, int ref_height,
+                                       int this_width, int this_height) {
+  return 2 * this_width >= ref_width && 2 * this_height >= ref_height &&
+         this_width <= 16 * ref_width && this_height <= 16 * ref_height;
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_SCALE_H_

diff --git a/libav1/av1/common/scan.c b/libav1/av1/common/scan.c
new file mode 100644
index 0000000..29d78a0
--- /dev/null
+++ b/libav1/av1/common/scan.c

@@ -0,0 +1,3758 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "av1/common/common_data.h"
+#include "av1/common/scan.h"
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_4x4[16]) = {
+  0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_4x4[16]) = {
+  0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_4x4[16]) = {
+  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_4x8[32]) = {
+  0,  1,  4,  2,  5,  8,  3,  6,  9,  12, 7,  10, 13, 16, 11, 14,
+  17, 20, 15, 18, 21, 24, 19, 22, 25, 28, 23, 26, 29, 27, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_4x8[32]) = {
+  0, 4, 8,  12, 16, 20, 24, 28, 1, 5, 9,  13, 17, 21, 25, 29,
+  2, 6, 10, 14, 18, 22, 26, 30, 3, 7, 11, 15, 19, 23, 27, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_4x8[32]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_8x4[32]) = {
+  0,  8, 1,  16, 9,  2, 24, 17, 10, 3, 25, 18, 11, 4,  26, 19,
+  12, 5, 27, 20, 13, 6, 28, 21, 14, 7, 29, 22, 15, 30, 23, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x4[32]) = {
+  0, 8,  16, 24, 1, 9,  17, 25, 2, 10, 18, 26, 3, 11, 19, 27,
+  4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x4[32]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_4x16[64]) = {
+  0,  1,  4,  2,  5,  8,  3,  6,  9,  12, 7,  10, 13, 16, 11, 14,
+  17, 20, 15, 18, 21, 24, 19, 22, 25, 28, 23, 26, 29, 32, 27, 30,
+  33, 36, 31, 34, 37, 40, 35, 38, 41, 44, 39, 42, 45, 48, 43, 46,
+  49, 52, 47, 50, 53, 56, 51, 54, 57, 60, 55, 58, 61, 59, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_16x4[64]) = {
+  0,  16, 1,  32, 17, 2,  48, 33, 18, 3,  49, 34, 19, 4,  50, 35,
+  20, 5,  51, 36, 21, 6,  52, 37, 22, 7,  53, 38, 23, 8,  54, 39,
+  24, 9,  55, 40, 25, 10, 56, 41, 26, 11, 57, 42, 27, 12, 58, 43,
+  28, 13, 59, 44, 29, 14, 60, 45, 30, 15, 61, 46, 31, 62, 47, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_4x16[64]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+  32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+  48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x4[64]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+  32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+  48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_4x16[64]) = {
+  0, 4, 8,  12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
+  1, 5, 9,  13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61,
+  2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62,
+  3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_16x4[64]) = {
+  0,  16, 32, 48, 1,  17, 33, 49, 2,  18, 34, 50, 3,  19, 35, 51,
+  4,  20, 36, 52, 5,  21, 37, 53, 6,  22, 38, 54, 7,  23, 39, 55,
+  8,  24, 40, 56, 9,  25, 41, 57, 10, 26, 42, 58, 11, 27, 43, 59,
+  12, 28, 44, 60, 13, 29, 45, 61, 14, 30, 46, 62, 15, 31, 47, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_8x32[256]) = {
+  0,   1,   8,   2,   9,   16,  3,   10,  17,  24,  4,   11,  18,  25,  32,
+  5,   12,  19,  26,  33,  40,  6,   13,  20,  27,  34,  41,  48,  7,   14,
+  21,  28,  35,  42,  49,  56,  15,  22,  29,  36,  43,  50,  57,  64,  23,
+  30,  37,  44,  51,  58,  65,  72,  31,  38,  45,  52,  59,  66,  73,  80,
+  39,  46,  53,  60,  67,  74,  81,  88,  47,  54,  61,  68,  75,  82,  89,
+  96,  55,  62,  69,  76,  83,  90,  97,  104, 63,  70,  77,  84,  91,  98,
+  105, 112, 71,  78,  85,  92,  99,  106, 113, 120, 79,  86,  93,  100, 107,
+  114, 121, 128, 87,  94,  101, 108, 115, 122, 129, 136, 95,  102, 109, 116,
+  123, 130, 137, 144, 103, 110, 117, 124, 131, 138, 145, 152, 111, 118, 125,
+  132, 139, 146, 153, 160, 119, 126, 133, 140, 147, 154, 161, 168, 127, 134,
+  141, 148, 155, 162, 169, 176, 135, 142, 149, 156, 163, 170, 177, 184, 143,
+  150, 157, 164, 171, 178, 185, 192, 151, 158, 165, 172, 179, 186, 193, 200,
+  159, 166, 173, 180, 187, 194, 201, 208, 167, 174, 181, 188, 195, 202, 209,
+  216, 175, 182, 189, 196, 203, 210, 217, 224, 183, 190, 197, 204, 211, 218,
+  225, 232, 191, 198, 205, 212, 219, 226, 233, 240, 199, 206, 213, 220, 227,
+  234, 241, 248, 207, 214, 221, 228, 235, 242, 249, 215, 222, 229, 236, 243,
+  250, 223, 230, 237, 244, 251, 231, 238, 245, 252, 239, 246, 253, 247, 254,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_32x8[256]) = {
+  0,   32,  1,   64,  33,  2,   96,  65,  34,  3,   128, 97,  66,  35,  4,
+  160, 129, 98,  67,  36,  5,   192, 161, 130, 99,  68,  37,  6,   224, 193,
+  162, 131, 100, 69,  38,  7,   225, 194, 163, 132, 101, 70,  39,  8,   226,
+  195, 164, 133, 102, 71,  40,  9,   227, 196, 165, 134, 103, 72,  41,  10,
+  228, 197, 166, 135, 104, 73,  42,  11,  229, 198, 167, 136, 105, 74,  43,
+  12,  230, 199, 168, 137, 106, 75,  44,  13,  231, 200, 169, 138, 107, 76,
+  45,  14,  232, 201, 170, 139, 108, 77,  46,  15,  233, 202, 171, 140, 109,
+  78,  47,  16,  234, 203, 172, 141, 110, 79,  48,  17,  235, 204, 173, 142,
+  111, 80,  49,  18,  236, 205, 174, 143, 112, 81,  50,  19,  237, 206, 175,
+  144, 113, 82,  51,  20,  238, 207, 176, 145, 114, 83,  52,  21,  239, 208,
+  177, 146, 115, 84,  53,  22,  240, 209, 178, 147, 116, 85,  54,  23,  241,
+  210, 179, 148, 117, 86,  55,  24,  242, 211, 180, 149, 118, 87,  56,  25,
+  243, 212, 181, 150, 119, 88,  57,  26,  244, 213, 182, 151, 120, 89,  58,
+  27,  245, 214, 183, 152, 121, 90,  59,  28,  246, 215, 184, 153, 122, 91,
+  60,  29,  247, 216, 185, 154, 123, 92,  61,  30,  248, 217, 186, 155, 124,
+  93,  62,  31,  249, 218, 187, 156, 125, 94,  63,  250, 219, 188, 157, 126,
+  95,  251, 220, 189, 158, 127, 252, 221, 190, 159, 253, 222, 191, 254, 223,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x32[256]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_32x8[256]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x32[256]) = {
+  0,   8,   16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96,  104, 112,
+  120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232,
+  240, 248, 1,   9,   17,  25,  33,  41,  49,  57,  65,  73,  81,  89,  97,
+  105, 113, 121, 129, 137, 145, 153, 161, 169, 177, 185, 193, 201, 209, 217,
+  225, 233, 241, 249, 2,   10,  18,  26,  34,  42,  50,  58,  66,  74,  82,
+  90,  98,  106, 114, 122, 130, 138, 146, 154, 162, 170, 178, 186, 194, 202,
+  210, 218, 226, 234, 242, 250, 3,   11,  19,  27,  35,  43,  51,  59,  67,
+  75,  83,  91,  99,  107, 115, 123, 131, 139, 147, 155, 163, 171, 179, 187,
+  195, 203, 211, 219, 227, 235, 243, 251, 4,   12,  20,  28,  36,  44,  52,
+  60,  68,  76,  84,  92,  100, 108, 116, 124, 132, 140, 148, 156, 164, 172,
+  180, 188, 196, 204, 212, 220, 228, 236, 244, 252, 5,   13,  21,  29,  37,
+  45,  53,  61,  69,  77,  85,  93,  101, 109, 117, 125, 133, 141, 149, 157,
+  165, 173, 181, 189, 197, 205, 213, 221, 229, 237, 245, 253, 6,   14,  22,
+  30,  38,  46,  54,  62,  70,  78,  86,  94,  102, 110, 118, 126, 134, 142,
+  150, 158, 166, 174, 182, 190, 198, 206, 214, 222, 230, 238, 246, 254, 7,
+  15,  23,  31,  39,  47,  55,  63,  71,  79,  87,  95,  103, 111, 119, 127,
+  135, 143, 151, 159, 167, 175, 183, 191, 199, 207, 215, 223, 231, 239, 247,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_32x8[256]) = {
+  0,  32, 64, 96,  128, 160, 192, 224, 1,  33, 65, 97,  129, 161, 193, 225,
+  2,  34, 66, 98,  130, 162, 194, 226, 3,  35, 67, 99,  131, 163, 195, 227,
+  4,  36, 68, 100, 132, 164, 196, 228, 5,  37, 69, 101, 133, 165, 197, 229,
+  6,  38, 70, 102, 134, 166, 198, 230, 7,  39, 71, 103, 135, 167, 199, 231,
+  8,  40, 72, 104, 136, 168, 200, 232, 9,  41, 73, 105, 137, 169, 201, 233,
+  10, 42, 74, 106, 138, 170, 202, 234, 11, 43, 75, 107, 139, 171, 203, 235,
+  12, 44, 76, 108, 140, 172, 204, 236, 13, 45, 77, 109, 141, 173, 205, 237,
+  14, 46, 78, 110, 142, 174, 206, 238, 15, 47, 79, 111, 143, 175, 207, 239,
+  16, 48, 80, 112, 144, 176, 208, 240, 17, 49, 81, 113, 145, 177, 209, 241,
+  18, 50, 82, 114, 146, 178, 210, 242, 19, 51, 83, 115, 147, 179, 211, 243,
+  20, 52, 84, 116, 148, 180, 212, 244, 21, 53, 85, 117, 149, 181, 213, 245,
+  22, 54, 86, 118, 150, 182, 214, 246, 23, 55, 87, 119, 151, 183, 215, 247,
+  24, 56, 88, 120, 152, 184, 216, 248, 25, 57, 89, 121, 153, 185, 217, 249,
+  26, 58, 90, 122, 154, 186, 218, 250, 27, 59, 91, 123, 155, 187, 219, 251,
+  28, 60, 92, 124, 156, 188, 220, 252, 29, 61, 93, 125, 157, 189, 221, 253,
+  30, 62, 94, 126, 158, 190, 222, 254, 31, 63, 95, 127, 159, 191, 223, 255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_8x8[64]) = {
+  0,  1,  8,  16, 9,  2,  3,  10, 17, 24, 32, 25, 18, 11, 4,  5,
+  12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13, 6,  7,  14, 21, 28,
+  35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
+  58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x8[64]) = {
+  0, 8,  16, 24, 32, 40, 48, 56, 1, 9,  17, 25, 33, 41, 49, 57,
+  2, 10, 18, 26, 34, 42, 50, 58, 3, 11, 19, 27, 35, 43, 51, 59,
+  4, 12, 20, 28, 36, 44, 52, 60, 5, 13, 21, 29, 37, 45, 53, 61,
+  6, 14, 22, 30, 38, 46, 54, 62, 7, 15, 23, 31, 39, 47, 55, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x8[64]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+  32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+  48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_8x16[128]) = {
+  0,   1,   8,   2,   9,   16,  3,   10,  17,  24,  4,   11,  18,  25,  32,
+  5,   12,  19,  26,  33,  40,  6,   13,  20,  27,  34,  41,  48,  7,   14,
+  21,  28,  35,  42,  49,  56,  15,  22,  29,  36,  43,  50,  57,  64,  23,
+  30,  37,  44,  51,  58,  65,  72,  31,  38,  45,  52,  59,  66,  73,  80,
+  39,  46,  53,  60,  67,  74,  81,  88,  47,  54,  61,  68,  75,  82,  89,
+  96,  55,  62,  69,  76,  83,  90,  97,  104, 63,  70,  77,  84,  91,  98,
+  105, 112, 71,  78,  85,  92,  99,  106, 113, 120, 79,  86,  93,  100, 107,
+  114, 121, 87,  94,  101, 108, 115, 122, 95,  102, 109, 116, 123, 103, 110,
+  117, 124, 111, 118, 125, 119, 126, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_16x8[128]) = {
+  0,  16,  1,   32, 17,  2,   48,  33,  18, 3,  64,  49,  34,  19,  4,   80,
+  65, 50,  35,  20, 5,   96,  81,  66,  51, 36, 21,  6,   112, 97,  82,  67,
+  52, 37,  22,  7,  113, 98,  83,  68,  53, 38, 23,  8,   114, 99,  84,  69,
+  54, 39,  24,  9,  115, 100, 85,  70,  55, 40, 25,  10,  116, 101, 86,  71,
+  56, 41,  26,  11, 117, 102, 87,  72,  57, 42, 27,  12,  118, 103, 88,  73,
+  58, 43,  28,  13, 119, 104, 89,  74,  59, 44, 29,  14,  120, 105, 90,  75,
+  60, 45,  30,  15, 121, 106, 91,  76,  61, 46, 31,  122, 107, 92,  77,  62,
+  47, 123, 108, 93, 78,  63,  124, 109, 94, 79, 125, 110, 95,  126, 111, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x16[128]) = {
+  0, 8,  16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96,  104, 112, 120,
+  1, 9,  17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97,  105, 113, 121,
+  2, 10, 18, 26, 34, 42, 50, 58, 66, 74, 82, 90, 98,  106, 114, 122,
+  3, 11, 19, 27, 35, 43, 51, 59, 67, 75, 83, 91, 99,  107, 115, 123,
+  4, 12, 20, 28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 108, 116, 124,
+  5, 13, 21, 29, 37, 45, 53, 61, 69, 77, 85, 93, 101, 109, 117, 125,
+  6, 14, 22, 30, 38, 46, 54, 62, 70, 78, 86, 94, 102, 110, 118, 126,
+  7, 15, 23, 31, 39, 47, 55, 63, 71, 79, 87, 95, 103, 111, 119, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_16x8[128]) = {
+  0,  16, 32, 48, 64, 80, 96,  112, 1,  17, 33, 49, 65, 81, 97,  113,
+  2,  18, 34, 50, 66, 82, 98,  114, 3,  19, 35, 51, 67, 83, 99,  115,
+  4,  20, 36, 52, 68, 84, 100, 116, 5,  21, 37, 53, 69, 85, 101, 117,
+  6,  22, 38, 54, 70, 86, 102, 118, 7,  23, 39, 55, 71, 87, 103, 119,
+  8,  24, 40, 56, 72, 88, 104, 120, 9,  25, 41, 57, 73, 89, 105, 121,
+  10, 26, 42, 58, 74, 90, 106, 122, 11, 27, 43, 59, 75, 91, 107, 123,
+  12, 28, 44, 60, 76, 92, 108, 124, 13, 29, 45, 61, 77, 93, 109, 125,
+  14, 30, 46, 62, 78, 94, 110, 126, 15, 31, 47, 63, 79, 95, 111, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x16[128]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x8[128]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_16x32[512]) = {
+  0,   1,   16,  2,   17,  32,  3,   18,  33,  48,  4,   19,  34,  49,  64,
+  5,   20,  35,  50,  65,  80,  6,   21,  36,  51,  66,  81,  96,  7,   22,
+  37,  52,  67,  82,  97,  112, 8,   23,  38,  53,  68,  83,  98,  113, 128,
+  9,   24,  39,  54,  69,  84,  99,  114, 129, 144, 10,  25,  40,  55,  70,
+  85,  100, 115, 130, 145, 160, 11,  26,  41,  56,  71,  86,  101, 116, 131,
+  146, 161, 176, 12,  27,  42,  57,  72,  87,  102, 117, 132, 147, 162, 177,
+  192, 13,  28,  43,  58,  73,  88,  103, 118, 133, 148, 163, 178, 193, 208,
+  14,  29,  44,  59,  74,  89,  104, 119, 134, 149, 164, 179, 194, 209, 224,
+  15,  30,  45,  60,  75,  90,  105, 120, 135, 150, 165, 180, 195, 210, 225,
+  240, 31,  46,  61,  76,  91,  106, 121, 136, 151, 166, 181, 196, 211, 226,
+  241, 256, 47,  62,  77,  92,  107, 122, 137, 152, 167, 182, 197, 212, 227,
+  242, 257, 272, 63,  78,  93,  108, 123, 138, 153, 168, 183, 198, 213, 228,
+  243, 258, 273, 288, 79,  94,  109, 124, 139, 154, 169, 184, 199, 214, 229,
+  244, 259, 274, 289, 304, 95,  110, 125, 140, 155, 170, 185, 200, 215, 230,
+  245, 260, 275, 290, 305, 320, 111, 126, 141, 156, 171, 186, 201, 216, 231,
+  246, 261, 276, 291, 306, 321, 336, 127, 142, 157, 172, 187, 202, 217, 232,
+  247, 262, 277, 292, 307, 322, 337, 352, 143, 158, 173, 188, 203, 218, 233,
+  248, 263, 278, 293, 308, 323, 338, 353, 368, 159, 174, 189, 204, 219, 234,
+  249, 264, 279, 294, 309, 324, 339, 354, 369, 384, 175, 190, 205, 220, 235,
+  250, 265, 280, 295, 310, 325, 340, 355, 370, 385, 400, 191, 206, 221, 236,
+  251, 266, 281, 296, 311, 326, 341, 356, 371, 386, 401, 416, 207, 222, 237,
+  252, 267, 282, 297, 312, 327, 342, 357, 372, 387, 402, 417, 432, 223, 238,
+  253, 268, 283, 298, 313, 328, 343, 358, 373, 388, 403, 418, 433, 448, 239,
+  254, 269, 284, 299, 314, 329, 344, 359, 374, 389, 404, 419, 434, 449, 464,
+  255, 270, 285, 300, 315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465,
+  480, 271, 286, 301, 316, 331, 346, 361, 376, 391, 406, 421, 436, 451, 466,
+  481, 496, 287, 302, 317, 332, 347, 362, 377, 392, 407, 422, 437, 452, 467,
+  482, 497, 303, 318, 333, 348, 363, 378, 393, 408, 423, 438, 453, 468, 483,
+  498, 319, 334, 349, 364, 379, 394, 409, 424, 439, 454, 469, 484, 499, 335,
+  350, 365, 380, 395, 410, 425, 440, 455, 470, 485, 500, 351, 366, 381, 396,
+  411, 426, 441, 456, 471, 486, 501, 367, 382, 397, 412, 427, 442, 457, 472,
+  487, 502, 383, 398, 413, 428, 443, 458, 473, 488, 503, 399, 414, 429, 444,
+  459, 474, 489, 504, 415, 430, 445, 460, 475, 490, 505, 431, 446, 461, 476,
+  491, 506, 447, 462, 477, 492, 507, 463, 478, 493, 508, 479, 494, 509, 495,
+  510, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_32x16[512]) = {
+  0,   32,  1,   64,  33,  2,   96,  65,  34,  3,   128, 97,  66,  35,  4,
+  160, 129, 98,  67,  36,  5,   192, 161, 130, 99,  68,  37,  6,   224, 193,
+  162, 131, 100, 69,  38,  7,   256, 225, 194, 163, 132, 101, 70,  39,  8,
+  288, 257, 226, 195, 164, 133, 102, 71,  40,  9,   320, 289, 258, 227, 196,
+  165, 134, 103, 72,  41,  10,  352, 321, 290, 259, 228, 197, 166, 135, 104,
+  73,  42,  11,  384, 353, 322, 291, 260, 229, 198, 167, 136, 105, 74,  43,
+  12,  416, 385, 354, 323, 292, 261, 230, 199, 168, 137, 106, 75,  44,  13,
+  448, 417, 386, 355, 324, 293, 262, 231, 200, 169, 138, 107, 76,  45,  14,
+  480, 449, 418, 387, 356, 325, 294, 263, 232, 201, 170, 139, 108, 77,  46,
+  15,  481, 450, 419, 388, 357, 326, 295, 264, 233, 202, 171, 140, 109, 78,
+  47,  16,  482, 451, 420, 389, 358, 327, 296, 265, 234, 203, 172, 141, 110,
+  79,  48,  17,  483, 452, 421, 390, 359, 328, 297, 266, 235, 204, 173, 142,
+  111, 80,  49,  18,  484, 453, 422, 391, 360, 329, 298, 267, 236, 205, 174,
+  143, 112, 81,  50,  19,  485, 454, 423, 392, 361, 330, 299, 268, 237, 206,
+  175, 144, 113, 82,  51,  20,  486, 455, 424, 393, 362, 331, 300, 269, 238,
+  207, 176, 145, 114, 83,  52,  21,  487, 456, 425, 394, 363, 332, 301, 270,
+  239, 208, 177, 146, 115, 84,  53,  22,  488, 457, 426, 395, 364, 333, 302,
+  271, 240, 209, 178, 147, 116, 85,  54,  23,  489, 458, 427, 396, 365, 334,
+  303, 272, 241, 210, 179, 148, 117, 86,  55,  24,  490, 459, 428, 397, 366,
+  335, 304, 273, 242, 211, 180, 149, 118, 87,  56,  25,  491, 460, 429, 398,
+  367, 336, 305, 274, 243, 212, 181, 150, 119, 88,  57,  26,  492, 461, 430,
+  399, 368, 337, 306, 275, 244, 213, 182, 151, 120, 89,  58,  27,  493, 462,
+  431, 400, 369, 338, 307, 276, 245, 214, 183, 152, 121, 90,  59,  28,  494,
+  463, 432, 401, 370, 339, 308, 277, 246, 215, 184, 153, 122, 91,  60,  29,
+  495, 464, 433, 402, 371, 340, 309, 278, 247, 216, 185, 154, 123, 92,  61,
+  30,  496, 465, 434, 403, 372, 341, 310, 279, 248, 217, 186, 155, 124, 93,
+  62,  31,  497, 466, 435, 404, 373, 342, 311, 280, 249, 218, 187, 156, 125,
+  94,  63,  498, 467, 436, 405, 374, 343, 312, 281, 250, 219, 188, 157, 126,
+  95,  499, 468, 437, 406, 375, 344, 313, 282, 251, 220, 189, 158, 127, 500,
+  469, 438, 407, 376, 345, 314, 283, 252, 221, 190, 159, 501, 470, 439, 408,
+  377, 346, 315, 284, 253, 222, 191, 502, 471, 440, 409, 378, 347, 316, 285,
+  254, 223, 503, 472, 441, 410, 379, 348, 317, 286, 255, 504, 473, 442, 411,
+  380, 349, 318, 287, 505, 474, 443, 412, 381, 350, 319, 506, 475, 444, 413,
+  382, 351, 507, 476, 445, 414, 383, 508, 477, 446, 415, 509, 478, 447, 510,
+  479, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_16x32[512]) = {
+  0,   16,  32,  48,  64,  80,  96,  112, 128, 144, 160, 176, 192, 208, 224,
+  240, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464,
+  480, 496, 1,   17,  33,  49,  65,  81,  97,  113, 129, 145, 161, 177, 193,
+  209, 225, 241, 257, 273, 289, 305, 321, 337, 353, 369, 385, 401, 417, 433,
+  449, 465, 481, 497, 2,   18,  34,  50,  66,  82,  98,  114, 130, 146, 162,
+  178, 194, 210, 226, 242, 258, 274, 290, 306, 322, 338, 354, 370, 386, 402,
+  418, 434, 450, 466, 482, 498, 3,   19,  35,  51,  67,  83,  99,  115, 131,
+  147, 163, 179, 195, 211, 227, 243, 259, 275, 291, 307, 323, 339, 355, 371,
+  387, 403, 419, 435, 451, 467, 483, 499, 4,   20,  36,  52,  68,  84,  100,
+  116, 132, 148, 164, 180, 196, 212, 228, 244, 260, 276, 292, 308, 324, 340,
+  356, 372, 388, 404, 420, 436, 452, 468, 484, 500, 5,   21,  37,  53,  69,
+  85,  101, 117, 133, 149, 165, 181, 197, 213, 229, 245, 261, 277, 293, 309,
+  325, 341, 357, 373, 389, 405, 421, 437, 453, 469, 485, 501, 6,   22,  38,
+  54,  70,  86,  102, 118, 134, 150, 166, 182, 198, 214, 230, 246, 262, 278,
+  294, 310, 326, 342, 358, 374, 390, 406, 422, 438, 454, 470, 486, 502, 7,
+  23,  39,  55,  71,  87,  103, 119, 135, 151, 167, 183, 199, 215, 231, 247,
+  263, 279, 295, 311, 327, 343, 359, 375, 391, 407, 423, 439, 455, 471, 487,
+  503, 8,   24,  40,  56,  72,  88,  104, 120, 136, 152, 168, 184, 200, 216,
+  232, 248, 264, 280, 296, 312, 328, 344, 360, 376, 392, 408, 424, 440, 456,
+  472, 488, 504, 9,   25,  41,  57,  73,  89,  105, 121, 137, 153, 169, 185,
+  201, 217, 233, 249, 265, 281, 297, 313, 329, 345, 361, 377, 393, 409, 425,
+  441, 457, 473, 489, 505, 10,  26,  42,  58,  74,  90,  106, 122, 138, 154,
+  170, 186, 202, 218, 234, 250, 266, 282, 298, 314, 330, 346, 362, 378, 394,
+  410, 426, 442, 458, 474, 490, 506, 11,  27,  43,  59,  75,  91,  107, 123,
+  139, 155, 171, 187, 203, 219, 235, 251, 267, 283, 299, 315, 331, 347, 363,
+  379, 395, 411, 427, 443, 459, 475, 491, 507, 12,  28,  44,  60,  76,  92,
+  108, 124, 140, 156, 172, 188, 204, 220, 236, 252, 268, 284, 300, 316, 332,
+  348, 364, 380, 396, 412, 428, 444, 460, 476, 492, 508, 13,  29,  45,  61,
+  77,  93,  109, 125, 141, 157, 173, 189, 205, 221, 237, 253, 269, 285, 301,
+  317, 333, 349, 365, 381, 397, 413, 429, 445, 461, 477, 493, 509, 14,  30,
+  46,  62,  78,  94,  110, 126, 142, 158, 174, 190, 206, 222, 238, 254, 270,
+  286, 302, 318, 334, 350, 366, 382, 398, 414, 430, 446, 462, 478, 494, 510,
+  15,  31,  47,  63,  79,  95,  111, 127, 143, 159, 175, 191, 207, 223, 239,
+  255, 271, 287, 303, 319, 335, 351, 367, 383, 399, 415, 431, 447, 463, 479,
+  495, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_32x16[512]) = {
+  0,  32, 64, 96,  128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448, 480,
+  1,  33, 65, 97,  129, 161, 193, 225, 257, 289, 321, 353, 385, 417, 449, 481,
+  2,  34, 66, 98,  130, 162, 194, 226, 258, 290, 322, 354, 386, 418, 450, 482,
+  3,  35, 67, 99,  131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
+  4,  36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356, 388, 420, 452, 484,
+  5,  37, 69, 101, 133, 165, 197, 229, 261, 293, 325, 357, 389, 421, 453, 485,
+  6,  38, 70, 102, 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486,
+  7,  39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423, 455, 487,
+  8,  40, 72, 104, 136, 168, 200, 232, 264, 296, 328, 360, 392, 424, 456, 488,
+  9,  41, 73, 105, 137, 169, 201, 233, 265, 297, 329, 361, 393, 425, 457, 489,
+  10, 42, 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
+  11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363, 395, 427, 459, 491,
+  12, 44, 76, 108, 140, 172, 204, 236, 268, 300, 332, 364, 396, 428, 460, 492,
+  13, 45, 77, 109, 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493,
+  14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430, 462, 494,
+  15, 47, 79, 111, 143, 175, 207, 239, 271, 303, 335, 367, 399, 431, 463, 495,
+  16, 48, 80, 112, 144, 176, 208, 240, 272, 304, 336, 368, 400, 432, 464, 496,
+  17, 49, 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
+  18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370, 402, 434, 466, 498,
+  19, 51, 83, 115, 147, 179, 211, 243, 275, 307, 339, 371, 403, 435, 467, 499,
+  20, 52, 84, 116, 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500,
+  21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437, 469, 501,
+  22, 54, 86, 118, 150, 182, 214, 246, 278, 310, 342, 374, 406, 438, 470, 502,
+  23, 55, 87, 119, 151, 183, 215, 247, 279, 311, 343, 375, 407, 439, 471, 503,
+  24, 56, 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
+  25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377, 409, 441, 473, 505,
+  26, 58, 90, 122, 154, 186, 218, 250, 282, 314, 346, 378, 410, 442, 474, 506,
+  27, 59, 91, 123, 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507,
+  28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444, 476, 508,
+  29, 61, 93, 125, 157, 189, 221, 253, 285, 317, 349, 381, 413, 445, 477, 509,
+  30, 62, 94, 126, 158, 190, 222, 254, 286, 318, 350, 382, 414, 446, 478, 510,
+  31, 63, 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x32[512]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
+  270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
+  285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
+  300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
+  315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
+  330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
+  345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
+  360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
+  375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
+  390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
+  405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
+  420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
+  435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
+  450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
+  465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
+  480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
+  495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
+  510, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_32x16[512]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
+  270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
+  285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
+  300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
+  315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
+  330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
+  345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
+  360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
+  375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
+  390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
+  405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
+  420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
+  435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
+  450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
+  465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
+  480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
+  495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
+  510, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_16x16[256]) = {
+  0,   1,   16,  32,  17,  2,   3,   18,  33,  48,  64,  49,  34,  19,  4,
+  5,   20,  35,  50,  65,  80,  96,  81,  66,  51,  36,  21,  6,   7,   22,
+  37,  52,  67,  82,  97,  112, 128, 113, 98,  83,  68,  53,  38,  23,  8,
+  9,   24,  39,  54,  69,  84,  99,  114, 129, 144, 160, 145, 130, 115, 100,
+  85,  70,  55,  40,  25,  10,  11,  26,  41,  56,  71,  86,  101, 116, 131,
+  146, 161, 176, 192, 177, 162, 147, 132, 117, 102, 87,  72,  57,  42,  27,
+  12,  13,  28,  43,  58,  73,  88,  103, 118, 133, 148, 163, 178, 193, 208,
+  224, 209, 194, 179, 164, 149, 134, 119, 104, 89,  74,  59,  44,  29,  14,
+  15,  30,  45,  60,  75,  90,  105, 120, 135, 150, 165, 180, 195, 210, 225,
+  240, 241, 226, 211, 196, 181, 166, 151, 136, 121, 106, 91,  76,  61,  46,
+  31,  47,  62,  77,  92,  107, 122, 137, 152, 167, 182, 197, 212, 227, 242,
+  243, 228, 213, 198, 183, 168, 153, 138, 123, 108, 93,  78,  63,  79,  94,
+  109, 124, 139, 154, 169, 184, 199, 214, 229, 244, 245, 230, 215, 200, 185,
+  170, 155, 140, 125, 110, 95,  111, 126, 141, 156, 171, 186, 201, 216, 231,
+  246, 247, 232, 217, 202, 187, 172, 157, 142, 127, 143, 158, 173, 188, 203,
+  218, 233, 248, 249, 234, 219, 204, 189, 174, 159, 175, 190, 205, 220, 235,
+  250, 251, 236, 221, 206, 191, 207, 222, 237, 252, 253, 238, 223, 239, 254,
+  255
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_16x16[256]) = {
+  0,  16, 32, 48, 64, 80, 96,  112, 128, 144, 160, 176, 192, 208, 224, 240,
+  1,  17, 33, 49, 65, 81, 97,  113, 129, 145, 161, 177, 193, 209, 225, 241,
+  2,  18, 34, 50, 66, 82, 98,  114, 130, 146, 162, 178, 194, 210, 226, 242,
+  3,  19, 35, 51, 67, 83, 99,  115, 131, 147, 163, 179, 195, 211, 227, 243,
+  4,  20, 36, 52, 68, 84, 100, 116, 132, 148, 164, 180, 196, 212, 228, 244,
+  5,  21, 37, 53, 69, 85, 101, 117, 133, 149, 165, 181, 197, 213, 229, 245,
+  6,  22, 38, 54, 70, 86, 102, 118, 134, 150, 166, 182, 198, 214, 230, 246,
+  7,  23, 39, 55, 71, 87, 103, 119, 135, 151, 167, 183, 199, 215, 231, 247,
+  8,  24, 40, 56, 72, 88, 104, 120, 136, 152, 168, 184, 200, 216, 232, 248,
+  9,  25, 41, 57, 73, 89, 105, 121, 137, 153, 169, 185, 201, 217, 233, 249,
+  10, 26, 42, 58, 74, 90, 106, 122, 138, 154, 170, 186, 202, 218, 234, 250,
+  11, 27, 43, 59, 75, 91, 107, 123, 139, 155, 171, 187, 203, 219, 235, 251,
+  12, 28, 44, 60, 76, 92, 108, 124, 140, 156, 172, 188, 204, 220, 236, 252,
+  13, 29, 45, 61, 77, 93, 109, 125, 141, 157, 173, 189, 205, 221, 237, 253,
+  14, 30, 46, 62, 78, 94, 110, 126, 142, 158, 174, 190, 206, 222, 238, 254,
+  15, 31, 47, 63, 79, 95, 111, 127, 143, 159, 175, 191, 207, 223, 239, 255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x16[256]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_32x32[1024]) = {
+  0,   32,   64,  96,   128, 160,  192, 224,  256, 288,  320, 352,  384, 416,
+  448, 480,  512, 544,  576, 608,  640, 672,  704, 736,  768, 800,  832, 864,
+  896, 928,  960, 992,  1,   33,   65,  97,   129, 161,  193, 225,  257, 289,
+  321, 353,  385, 417,  449, 481,  513, 545,  577, 609,  641, 673,  705, 737,
+  769, 801,  833, 865,  897, 929,  961, 993,  2,   34,   66,  98,   130, 162,
+  194, 226,  258, 290,  322, 354,  386, 418,  450, 482,  514, 546,  578, 610,
+  642, 674,  706, 738,  770, 802,  834, 866,  898, 930,  962, 994,  3,   35,
+  67,  99,   131, 163,  195, 227,  259, 291,  323, 355,  387, 419,  451, 483,
+  515, 547,  579, 611,  643, 675,  707, 739,  771, 803,  835, 867,  899, 931,
+  963, 995,  4,   36,   68,  100,  132, 164,  196, 228,  260, 292,  324, 356,
+  388, 420,  452, 484,  516, 548,  580, 612,  644, 676,  708, 740,  772, 804,
+  836, 868,  900, 932,  964, 996,  5,   37,   69,  101,  133, 165,  197, 229,
+  261, 293,  325, 357,  389, 421,  453, 485,  517, 549,  581, 613,  645, 677,
+  709, 741,  773, 805,  837, 869,  901, 933,  965, 997,  6,   38,   70,  102,
+  134, 166,  198, 230,  262, 294,  326, 358,  390, 422,  454, 486,  518, 550,
+  582, 614,  646, 678,  710, 742,  774, 806,  838, 870,  902, 934,  966, 998,
+  7,   39,   71,  103,  135, 167,  199, 231,  263, 295,  327, 359,  391, 423,
+  455, 487,  519, 551,  583, 615,  647, 679,  711, 743,  775, 807,  839, 871,
+  903, 935,  967, 999,  8,   40,   72,  104,  136, 168,  200, 232,  264, 296,
+  328, 360,  392, 424,  456, 488,  520, 552,  584, 616,  648, 680,  712, 744,
+  776, 808,  840, 872,  904, 936,  968, 1000, 9,   41,   73,  105,  137, 169,
+  201, 233,  265, 297,  329, 361,  393, 425,  457, 489,  521, 553,  585, 617,
+  649, 681,  713, 745,  777, 809,  841, 873,  905, 937,  969, 1001, 10,  42,
+  74,  106,  138, 170,  202, 234,  266, 298,  330, 362,  394, 426,  458, 490,
+  522, 554,  586, 618,  650, 682,  714, 746,  778, 810,  842, 874,  906, 938,
+  970, 1002, 11,  43,   75,  107,  139, 171,  203, 235,  267, 299,  331, 363,
+  395, 427,  459, 491,  523, 555,  587, 619,  651, 683,  715, 747,  779, 811,
+  843, 875,  907, 939,  971, 1003, 12,  44,   76,  108,  140, 172,  204, 236,
+  268, 300,  332, 364,  396, 428,  460, 492,  524, 556,  588, 620,  652, 684,
+  716, 748,  780, 812,  844, 876,  908, 940,  972, 1004, 13,  45,   77,  109,
+  141, 173,  205, 237,  269, 301,  333, 365,  397, 429,  461, 493,  525, 557,
+  589, 621,  653, 685,  717, 749,  781, 813,  845, 877,  909, 941,  973, 1005,
+  14,  46,   78,  110,  142, 174,  206, 238,  270, 302,  334, 366,  398, 430,
+  462, 494,  526, 558,  590, 622,  654, 686,  718, 750,  782, 814,  846, 878,
+  910, 942,  974, 1006, 15,  47,   79,  111,  143, 175,  207, 239,  271, 303,
+  335, 367,  399, 431,  463, 495,  527, 559,  591, 623,  655, 687,  719, 751,
+  783, 815,  847, 879,  911, 943,  975, 1007, 16,  48,   80,  112,  144, 176,
+  208, 240,  272, 304,  336, 368,  400, 432,  464, 496,  528, 560,  592, 624,
+  656, 688,  720, 752,  784, 816,  848, 880,  912, 944,  976, 1008, 17,  49,
+  81,  113,  145, 177,  209, 241,  273, 305,  337, 369,  401, 433,  465, 497,
+  529, 561,  593, 625,  657, 689,  721, 753,  785, 817,  849, 881,  913, 945,
+  977, 1009, 18,  50,   82,  114,  146, 178,  210, 242,  274, 306,  338, 370,
+  402, 434,  466, 498,  530, 562,  594, 626,  658, 690,  722, 754,  786, 818,
+  850, 882,  914, 946,  978, 1010, 19,  51,   83,  115,  147, 179,  211, 243,
+  275, 307,  339, 371,  403, 435,  467, 499,  531, 563,  595, 627,  659, 691,
+  723, 755,  787, 819,  851, 883,  915, 947,  979, 1011, 20,  52,   84,  116,
+  148, 180,  212, 244,  276, 308,  340, 372,  404, 436,  468, 500,  532, 564,
+  596, 628,  660, 692,  724, 756,  788, 820,  852, 884,  916, 948,  980, 1012,
+  21,  53,   85,  117,  149, 181,  213, 245,  277, 309,  341, 373,  405, 437,
+  469, 501,  533, 565,  597, 629,  661, 693,  725, 757,  789, 821,  853, 885,
+  917, 949,  981, 1013, 22,  54,   86,  118,  150, 182,  214, 246,  278, 310,
+  342, 374,  406, 438,  470, 502,  534, 566,  598, 630,  662, 694,  726, 758,
+  790, 822,  854, 886,  918, 950,  982, 1014, 23,  55,   87,  119,  151, 183,
+  215, 247,  279, 311,  343, 375,  407, 439,  471, 503,  535, 567,  599, 631,
+  663, 695,  727, 759,  791, 823,  855, 887,  919, 951,  983, 1015, 24,  56,
+  88,  120,  152, 184,  216, 248,  280, 312,  344, 376,  408, 440,  472, 504,
+  536, 568,  600, 632,  664, 696,  728, 760,  792, 824,  856, 888,  920, 952,
+  984, 1016, 25,  57,   89,  121,  153, 185,  217, 249,  281, 313,  345, 377,
+  409, 441,  473, 505,  537, 569,  601, 633,  665, 697,  729, 761,  793, 825,
+  857, 889,  921, 953,  985, 1017, 26,  58,   90,  122,  154, 186,  218, 250,
+  282, 314,  346, 378,  410, 442,  474, 506,  538, 570,  602, 634,  666, 698,
+  730, 762,  794, 826,  858, 890,  922, 954,  986, 1018, 27,  59,   91,  123,
+  155, 187,  219, 251,  283, 315,  347, 379,  411, 443,  475, 507,  539, 571,
+  603, 635,  667, 699,  731, 763,  795, 827,  859, 891,  923, 955,  987, 1019,
+  28,  60,   92,  124,  156, 188,  220, 252,  284, 316,  348, 380,  412, 444,
+  476, 508,  540, 572,  604, 636,  668, 700,  732, 764,  796, 828,  860, 892,
+  924, 956,  988, 1020, 29,  61,   93,  125,  157, 189,  221, 253,  285, 317,
+  349, 381,  413, 445,  477, 509,  541, 573,  605, 637,  669, 701,  733, 765,
+  797, 829,  861, 893,  925, 957,  989, 1021, 30,  62,   94,  126,  158, 190,
+  222, 254,  286, 318,  350, 382,  414, 446,  478, 510,  542, 574,  606, 638,
+  670, 702,  734, 766,  798, 830,  862, 894,  926, 958,  990, 1022, 31,  63,
+  95,  127,  159, 191,  223, 255,  287, 319,  351, 383,  415, 447,  479, 511,
+  543, 575,  607, 639,  671, 703,  735, 767,  799, 831,  863, 895,  927, 959,
+  991, 1023,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_32x32[1024]) = {
+  0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    10,   11,   12,
+  13,   14,   15,   16,   17,   18,   19,   20,   21,   22,   23,   24,   25,
+  26,   27,   28,   29,   30,   31,   32,   33,   34,   35,   36,   37,   38,
+  39,   40,   41,   42,   43,   44,   45,   46,   47,   48,   49,   50,   51,
+  52,   53,   54,   55,   56,   57,   58,   59,   60,   61,   62,   63,   64,
+  65,   66,   67,   68,   69,   70,   71,   72,   73,   74,   75,   76,   77,
+  78,   79,   80,   81,   82,   83,   84,   85,   86,   87,   88,   89,   90,
+  91,   92,   93,   94,   95,   96,   97,   98,   99,   100,  101,  102,  103,
+  104,  105,  106,  107,  108,  109,  110,  111,  112,  113,  114,  115,  116,
+  117,  118,  119,  120,  121,  122,  123,  124,  125,  126,  127,  128,  129,
+  130,  131,  132,  133,  134,  135,  136,  137,  138,  139,  140,  141,  142,
+  143,  144,  145,  146,  147,  148,  149,  150,  151,  152,  153,  154,  155,
+  156,  157,  158,  159,  160,  161,  162,  163,  164,  165,  166,  167,  168,
+  169,  170,  171,  172,  173,  174,  175,  176,  177,  178,  179,  180,  181,
+  182,  183,  184,  185,  186,  187,  188,  189,  190,  191,  192,  193,  194,
+  195,  196,  197,  198,  199,  200,  201,  202,  203,  204,  205,  206,  207,
+  208,  209,  210,  211,  212,  213,  214,  215,  216,  217,  218,  219,  220,
+  221,  222,  223,  224,  225,  226,  227,  228,  229,  230,  231,  232,  233,
+  234,  235,  236,  237,  238,  239,  240,  241,  242,  243,  244,  245,  246,
+  247,  248,  249,  250,  251,  252,  253,  254,  255,  256,  257,  258,  259,
+  260,  261,  262,  263,  264,  265,  266,  267,  268,  269,  270,  271,  272,
+  273,  274,  275,  276,  277,  278,  279,  280,  281,  282,  283,  284,  285,
+  286,  287,  288,  289,  290,  291,  292,  293,  294,  295,  296,  297,  298,
+  299,  300,  301,  302,  303,  304,  305,  306,  307,  308,  309,  310,  311,
+  312,  313,  314,  315,  316,  317,  318,  319,  320,  321,  322,  323,  324,
+  325,  326,  327,  328,  329,  330,  331,  332,  333,  334,  335,  336,  337,
+  338,  339,  340,  341,  342,  343,  344,  345,  346,  347,  348,  349,  350,
+  351,  352,  353,  354,  355,  356,  357,  358,  359,  360,  361,  362,  363,
+  364,  365,  366,  367,  368,  369,  370,  371,  372,  373,  374,  375,  376,
+  377,  378,  379,  380,  381,  382,  383,  384,  385,  386,  387,  388,  389,
+  390,  391,  392,  393,  394,  395,  396,  397,  398,  399,  400,  401,  402,
+  403,  404,  405,  406,  407,  408,  409,  410,  411,  412,  413,  414,  415,
+  416,  417,  418,  419,  420,  421,  422,  423,  424,  425,  426,  427,  428,
+  429,  430,  431,  432,  433,  434,  435,  436,  437,  438,  439,  440,  441,
+  442,  443,  444,  445,  446,  447,  448,  449,  450,  451,  452,  453,  454,
+  455,  456,  457,  458,  459,  460,  461,  462,  463,  464,  465,  466,  467,
+  468,  469,  470,  471,  472,  473,  474,  475,  476,  477,  478,  479,  480,
+  481,  482,  483,  484,  485,  486,  487,  488,  489,  490,  491,  492,  493,
+  494,  495,  496,  497,  498,  499,  500,  501,  502,  503,  504,  505,  506,
+  507,  508,  509,  510,  511,  512,  513,  514,  515,  516,  517,  518,  519,
+  520,  521,  522,  523,  524,  525,  526,  527,  528,  529,  530,  531,  532,
+  533,  534,  535,  536,  537,  538,  539,  540,  541,  542,  543,  544,  545,
+  546,  547,  548,  549,  550,  551,  552,  553,  554,  555,  556,  557,  558,
+  559,  560,  561,  562,  563,  564,  565,  566,  567,  568,  569,  570,  571,
+  572,  573,  574,  575,  576,  577,  578,  579,  580,  581,  582,  583,  584,
+  585,  586,  587,  588,  589,  590,  591,  592,  593,  594,  595,  596,  597,
+  598,  599,  600,  601,  602,  603,  604,  605,  606,  607,  608,  609,  610,
+  611,  612,  613,  614,  615,  616,  617,  618,  619,  620,  621,  622,  623,
+  624,  625,  626,  627,  628,  629,  630,  631,  632,  633,  634,  635,  636,
+  637,  638,  639,  640,  641,  642,  643,  644,  645,  646,  647,  648,  649,
+  650,  651,  652,  653,  654,  655,  656,  657,  658,  659,  660,  661,  662,
+  663,  664,  665,  666,  667,  668,  669,  670,  671,  672,  673,  674,  675,
+  676,  677,  678,  679,  680,  681,  682,  683,  684,  685,  686,  687,  688,
+  689,  690,  691,  692,  693,  694,  695,  696,  697,  698,  699,  700,  701,
+  702,  703,  704,  705,  706,  707,  708,  709,  710,  711,  712,  713,  714,
+  715,  716,  717,  718,  719,  720,  721,  722,  723,  724,  725,  726,  727,
+  728,  729,  730,  731,  732,  733,  734,  735,  736,  737,  738,  739,  740,
+  741,  742,  743,  744,  745,  746,  747,  748,  749,  750,  751,  752,  753,
+  754,  755,  756,  757,  758,  759,  760,  761,  762,  763,  764,  765,  766,
+  767,  768,  769,  770,  771,  772,  773,  774,  775,  776,  777,  778,  779,
+  780,  781,  782,  783,  784,  785,  786,  787,  788,  789,  790,  791,  792,
+  793,  794,  795,  796,  797,  798,  799,  800,  801,  802,  803,  804,  805,
+  806,  807,  808,  809,  810,  811,  812,  813,  814,  815,  816,  817,  818,
+  819,  820,  821,  822,  823,  824,  825,  826,  827,  828,  829,  830,  831,
+  832,  833,  834,  835,  836,  837,  838,  839,  840,  841,  842,  843,  844,
+  845,  846,  847,  848,  849,  850,  851,  852,  853,  854,  855,  856,  857,
+  858,  859,  860,  861,  862,  863,  864,  865,  866,  867,  868,  869,  870,
+  871,  872,  873,  874,  875,  876,  877,  878,  879,  880,  881,  882,  883,
+  884,  885,  886,  887,  888,  889,  890,  891,  892,  893,  894,  895,  896,
+  897,  898,  899,  900,  901,  902,  903,  904,  905,  906,  907,  908,  909,
+  910,  911,  912,  913,  914,  915,  916,  917,  918,  919,  920,  921,  922,
+  923,  924,  925,  926,  927,  928,  929,  930,  931,  932,  933,  934,  935,
+  936,  937,  938,  939,  940,  941,  942,  943,  944,  945,  946,  947,  948,
+  949,  950,  951,  952,  953,  954,  955,  956,  957,  958,  959,  960,  961,
+  962,  963,  964,  965,  966,  967,  968,  969,  970,  971,  972,  973,  974,
+  975,  976,  977,  978,  979,  980,  981,  982,  983,  984,  985,  986,  987,
+  988,  989,  990,  991,  992,  993,  994,  995,  996,  997,  998,  999,  1000,
+  1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013,
+  1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_32x32[1024]) = {
+  0,    1,    32,   64,   33,   2,   3,    34,   65,   96,   128,  97,  66,
+  35,   4,    5,    36,   67,   98,  129,  160,  192,  161,  130,  99,  68,
+  37,   6,    7,    38,   69,   100, 131,  162,  193,  224,  256,  225, 194,
+  163,  132,  101,  70,   39,   8,   9,    40,   71,   102,  133,  164, 195,
+  226,  257,  288,  320,  289,  258, 227,  196,  165,  134,  103,  72,  41,
+  10,   11,   42,   73,   104,  135, 166,  197,  228,  259,  290,  321, 352,
+  384,  353,  322,  291,  260,  229, 198,  167,  136,  105,  74,   43,  12,
+  13,   44,   75,   106,  137,  168, 199,  230,  261,  292,  323,  354, 385,
+  416,  448,  417,  386,  355,  324, 293,  262,  231,  200,  169,  138, 107,
+  76,   45,   14,   15,   46,   77,  108,  139,  170,  201,  232,  263, 294,
+  325,  356,  387,  418,  449,  480, 512,  481,  450,  419,  388,  357, 326,
+  295,  264,  233,  202,  171,  140, 109,  78,   47,   16,   17,   48,  79,
+  110,  141,  172,  203,  234,  265, 296,  327,  358,  389,  420,  451, 482,
+  513,  544,  576,  545,  514,  483, 452,  421,  390,  359,  328,  297, 266,
+  235,  204,  173,  142,  111,  80,  49,   18,   19,   50,   81,   112, 143,
+  174,  205,  236,  267,  298,  329, 360,  391,  422,  453,  484,  515, 546,
+  577,  608,  640,  609,  578,  547, 516,  485,  454,  423,  392,  361, 330,
+  299,  268,  237,  206,  175,  144, 113,  82,   51,   20,   21,   52,  83,
+  114,  145,  176,  207,  238,  269, 300,  331,  362,  393,  424,  455, 486,
+  517,  548,  579,  610,  641,  672, 704,  673,  642,  611,  580,  549, 518,
+  487,  456,  425,  394,  363,  332, 301,  270,  239,  208,  177,  146, 115,
+  84,   53,   22,   23,   54,   85,  116,  147,  178,  209,  240,  271, 302,
+  333,  364,  395,  426,  457,  488, 519,  550,  581,  612,  643,  674, 705,
+  736,  768,  737,  706,  675,  644, 613,  582,  551,  520,  489,  458, 427,
+  396,  365,  334,  303,  272,  241, 210,  179,  148,  117,  86,   55,  24,
+  25,   56,   87,   118,  149,  180, 211,  242,  273,  304,  335,  366, 397,
+  428,  459,  490,  521,  552,  583, 614,  645,  676,  707,  738,  769, 800,
+  832,  801,  770,  739,  708,  677, 646,  615,  584,  553,  522,  491, 460,
+  429,  398,  367,  336,  305,  274, 243,  212,  181,  150,  119,  88,  57,
+  26,   27,   58,   89,   120,  151, 182,  213,  244,  275,  306,  337, 368,
+  399,  430,  461,  492,  523,  554, 585,  616,  647,  678,  709,  740, 771,
+  802,  833,  864,  896,  865,  834, 803,  772,  741,  710,  679,  648, 617,
+  586,  555,  524,  493,  462,  431, 400,  369,  338,  307,  276,  245, 214,
+  183,  152,  121,  90,   59,   28,  29,   60,   91,   122,  153,  184, 215,
+  246,  277,  308,  339,  370,  401, 432,  463,  494,  525,  556,  587, 618,
+  649,  680,  711,  742,  773,  804, 835,  866,  897,  928,  960,  929, 898,
+  867,  836,  805,  774,  743,  712, 681,  650,  619,  588,  557,  526, 495,
+  464,  433,  402,  371,  340,  309, 278,  247,  216,  185,  154,  123, 92,
+  61,   30,   31,   62,   93,   124, 155,  186,  217,  248,  279,  310, 341,
+  372,  403,  434,  465,  496,  527, 558,  589,  620,  651,  682,  713, 744,
+  775,  806,  837,  868,  899,  930, 961,  992,  993,  962,  931,  900, 869,
+  838,  807,  776,  745,  714,  683, 652,  621,  590,  559,  528,  497, 466,
+  435,  404,  373,  342,  311,  280, 249,  218,  187,  156,  125,  94,  63,
+  95,   126,  157,  188,  219,  250, 281,  312,  343,  374,  405,  436, 467,
+  498,  529,  560,  591,  622,  653, 684,  715,  746,  777,  808,  839, 870,
+  901,  932,  963,  994,  995,  964, 933,  902,  871,  840,  809,  778, 747,
+  716,  685,  654,  623,  592,  561, 530,  499,  468,  437,  406,  375, 344,
+  313,  282,  251,  220,  189,  158, 127,  159,  190,  221,  252,  283, 314,
+  345,  376,  407,  438,  469,  500, 531,  562,  593,  624,  655,  686, 717,
+  748,  779,  810,  841,  872,  903, 934,  965,  996,  997,  966,  935, 904,
+  873,  842,  811,  780,  749,  718, 687,  656,  625,  594,  563,  532, 501,
+  470,  439,  408,  377,  346,  315, 284,  253,  222,  191,  223,  254, 285,
+  316,  347,  378,  409,  440,  471, 502,  533,  564,  595,  626,  657, 688,
+  719,  750,  781,  812,  843,  874, 905,  936,  967,  998,  999,  968, 937,
+  906,  875,  844,  813,  782,  751, 720,  689,  658,  627,  596,  565, 534,
+  503,  472,  441,  410,  379,  348, 317,  286,  255,  287,  318,  349, 380,
+  411,  442,  473,  504,  535,  566, 597,  628,  659,  690,  721,  752, 783,
+  814,  845,  876,  907,  938,  969, 1000, 1001, 970,  939,  908,  877, 846,
+  815,  784,  753,  722,  691,  660, 629,  598,  567,  536,  505,  474, 443,
+  412,  381,  350,  319,  351,  382, 413,  444,  475,  506,  537,  568, 599,
+  630,  661,  692,  723,  754,  785, 816,  847,  878,  909,  940,  971, 1002,
+  1003, 972,  941,  910,  879,  848, 817,  786,  755,  724,  693,  662, 631,
+  600,  569,  538,  507,  476,  445, 414,  383,  415,  446,  477,  508, 539,
+  570,  601,  632,  663,  694,  725, 756,  787,  818,  849,  880,  911, 942,
+  973,  1004, 1005, 974,  943,  912, 881,  850,  819,  788,  757,  726, 695,
+  664,  633,  602,  571,  540,  509, 478,  447,  479,  510,  541,  572, 603,
+  634,  665,  696,  727,  758,  789, 820,  851,  882,  913,  944,  975, 1006,
+  1007, 976,  945,  914,  883,  852, 821,  790,  759,  728,  697,  666, 635,
+  604,  573,  542,  511,  543,  574, 605,  636,  667,  698,  729,  760, 791,
+  822,  853,  884,  915,  946,  977, 1008, 1009, 978,  947,  916,  885, 854,
+  823,  792,  761,  730,  699,  668, 637,  606,  575,  607,  638,  669, 700,
+  731,  762,  793,  824,  855,  886, 917,  948,  979,  1010, 1011, 980, 949,
+  918,  887,  856,  825,  794,  763, 732,  701,  670,  639,  671,  702, 733,
+  764,  795,  826,  857,  888,  919, 950,  981,  1012, 1013, 982,  951, 920,
+  889,  858,  827,  796,  765,  734, 703,  735,  766,  797,  828,  859, 890,
+  921,  952,  983,  1014, 1015, 984, 953,  922,  891,  860,  829,  798, 767,
+  799,  830,  861,  892,  923,  954, 985,  1016, 1017, 986,  955,  924, 893,
+  862,  831,  863,  894,  925,  956, 987,  1018, 1019, 988,  957,  926, 895,
+  927,  958,  989,  1020, 1021, 990, 959,  991,  1022, 1023
+};
+
+// Neighborhood 2-tuples for various scans and blocksizes,
+// in {top, left} order for each position in corresponding scan order.
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_4x4_neighbors[17 * MAX_NEIGHBORS]) = {
+  0, 0, 0, 0, 0,  0, 4, 4, 1, 4, 1,  1,  2,  2,  2,  5, 5,
+  8, 8, 8, 9, 12, 6, 9, 3, 6, 7, 10, 10, 13, 11, 14, 0, 0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_4x4_neighbors[17 * MAX_NEIGHBORS]) = {
+  0, 0, 0, 0, 4, 4,  8,  8, 0, 0, 1, 4, 5,  8,  9,  12, 1,
+  1, 2, 5, 6, 9, 10, 13, 2, 2, 3, 6, 7, 10, 11, 14, 0,  0,
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_4x4_neighbors[17 * MAX_NEIGHBORS]) = {
+  0, 0, 0, 0, 1, 1, 2,  2, 0, 0, 1,  4,  2,  5,  3,  6, 4,
+  4, 5, 8, 6, 9, 7, 10, 8, 8, 9, 12, 10, 13, 11, 14, 0, 0,
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_4x8_neighbors[33 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  0,  0,  1,  4,  1,  1,  4,  4,  2,  5,  5,  8,  6,
+  9,  2,  2,  8,  8,  3,  6,  9,  12, 7,  10, 10, 13, 12, 12, 13, 16,
+  11, 14, 14, 17, 15, 18, 16, 16, 17, 20, 18, 21, 19, 22, 20, 20, 21,
+  24, 22, 25, 23, 26, 24, 24, 25, 28, 26, 29, 27, 30, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_4x8_neighbors[33 * MAX_NEIGHBORS]) = {
+  0, 0, 0,  0,  4,  4,  8,  8,  12, 12, 16, 16, 20, 20, 24, 24, 0,
+  0, 1, 4,  5,  8,  9,  12, 13, 16, 17, 20, 21, 24, 25, 28, 1,  1,
+  2, 5, 6,  9,  10, 13, 14, 17, 18, 21, 22, 25, 26, 29, 2,  2,  3,
+  6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_4x8_neighbors[33 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  1,  1,  2,  2,  0,  0,  1,  4,  2,  5,  3,  6,  4,
+  4,  5,  8,  6,  9,  7,  10, 8,  8,  9,  12, 10, 13, 11, 14, 12, 12,
+  13, 16, 14, 17, 15, 18, 16, 16, 17, 20, 18, 21, 19, 22, 20, 20, 21,
+  24, 22, 25, 23, 26, 24, 24, 25, 28, 26, 29, 27, 30, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_8x4_neighbors[33 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  0,  0, 1,  8,  1,  1,  8,  8,  2,  9,  9, 16, 10,
+  17, 2,  2,  16, 16, 3, 10, 17, 24, 11, 18, 18, 25, 3,  3, 4,  11,
+  19, 26, 12, 19, 4,  4, 20, 27, 5,  12, 13, 20, 21, 28, 5, 5,  6,
+  13, 14, 21, 22, 29, 6, 6,  7,  14, 15, 22, 23, 30, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_8x4_neighbors[33 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  8,  8,  16, 16, 0,  0,  1,  8,  9,  16, 17, 24, 1,
+  1,  2,  9,  10, 17, 18, 25, 2,  2,  3,  10, 11, 18, 19, 26, 3,  3,
+  4,  11, 12, 19, 20, 27, 4,  4,  5,  12, 13, 20, 21, 28, 5,  5,  6,
+  13, 14, 21, 22, 29, 6,  6,  7,  14, 15, 22, 23, 30, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_8x4_neighbors[33 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  1,  1,  2,  2,  3,  3,  4,  4,  5,  5,  6,  6,  0,
+  0,  1,  8,  2,  9,  3,  10, 4,  11, 5,  12, 6,  13, 7,  14, 8,  8,
+  9,  16, 10, 17, 11, 18, 12, 19, 13, 20, 14, 21, 15, 22, 16, 16, 17,
+  24, 18, 25, 19, 26, 20, 27, 21, 28, 22, 29, 23, 30, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_4x16_neighbors[65 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  0,  0,  1,  4,  1,  1,  4,  4,  2,  5,  5,  8,  6,  9,  2,
+  2,  8,  8,  3,  6,  9,  12, 7,  10, 10, 13, 12, 12, 13, 16, 11, 14, 14, 17,
+  15, 18, 16, 16, 17, 20, 18, 21, 19, 22, 20, 20, 21, 24, 22, 25, 23, 26, 24,
+  24, 25, 28, 26, 29, 27, 30, 28, 28, 29, 32, 30, 33, 31, 34, 32, 32, 33, 36,
+  34, 37, 35, 38, 36, 36, 37, 40, 38, 41, 39, 42, 40, 40, 41, 44, 42, 45, 43,
+  46, 44, 44, 45, 48, 46, 49, 47, 50, 48, 48, 49, 52, 50, 53, 51, 54, 52, 52,
+  53, 56, 54, 57, 55, 58, 56, 56, 57, 60, 58, 61, 59, 62, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_16x4_neighbors[65 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  0,  0,  1,  16, 1,  1,  16, 16, 2,  17, 17, 32, 18, 33, 2,
+  2,  32, 32, 3,  18, 33, 48, 19, 34, 34, 49, 3,  3,  4,  19, 35, 50, 20, 35,
+  4,  4,  36, 51, 5,  20, 21, 36, 37, 52, 5,  5,  6,  21, 22, 37, 38, 53, 6,
+  6,  7,  22, 23, 38, 39, 54, 7,  7,  8,  23, 24, 39, 40, 55, 8,  8,  9,  24,
+  25, 40, 41, 56, 9,  9,  10, 25, 26, 41, 42, 57, 10, 10, 11, 26, 27, 42, 43,
+  58, 11, 11, 12, 27, 28, 43, 44, 59, 12, 12, 13, 28, 29, 44, 45, 60, 13, 13,
+  14, 29, 30, 45, 46, 61, 14, 14, 15, 30, 31, 46, 47, 62, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_4x16_neighbors[65 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  1,  1,  2,  2,  0,  0,  1,  4,  2,  5,  3,  6,  4,  4,  5,
+  8,  6,  9,  7,  10, 8,  8,  9,  12, 10, 13, 11, 14, 12, 12, 13, 16, 14, 17,
+  15, 18, 16, 16, 17, 20, 18, 21, 19, 22, 20, 20, 21, 24, 22, 25, 23, 26, 24,
+  24, 25, 28, 26, 29, 27, 30, 28, 28, 29, 32, 30, 33, 31, 34, 32, 32, 33, 36,
+  34, 37, 35, 38, 36, 36, 37, 40, 38, 41, 39, 42, 40, 40, 41, 44, 42, 45, 43,
+  46, 44, 44, 45, 48, 46, 49, 47, 50, 48, 48, 49, 52, 50, 53, 51, 54, 52, 52,
+  53, 56, 54, 57, 55, 58, 56, 56, 57, 60, 58, 61, 59, 62, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_16x4_neighbors[65 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  1,  1,  2,  2,  3,  3,  4,  4,  5,  5,  6,  6,  7,  7,  8,
+  8,  9,  9,  10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 0,  0,  1,  16, 2,  17,
+  3,  18, 4,  19, 5,  20, 6,  21, 7,  22, 8,  23, 9,  24, 10, 25, 11, 26, 12,
+  27, 13, 28, 14, 29, 15, 30, 16, 16, 17, 32, 18, 33, 19, 34, 20, 35, 21, 36,
+  22, 37, 23, 38, 24, 39, 25, 40, 26, 41, 27, 42, 28, 43, 29, 44, 30, 45, 31,
+  46, 32, 32, 33, 48, 34, 49, 35, 50, 36, 51, 37, 52, 38, 53, 39, 54, 40, 55,
+  41, 56, 42, 57, 43, 58, 44, 59, 45, 60, 46, 61, 47, 62, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_4x16_neighbors[65 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  4,  4,  8,  8,  12, 12, 16, 16, 20, 20, 24, 24, 28, 28, 32,
+  32, 36, 36, 40, 40, 44, 44, 48, 48, 52, 52, 56, 56, 0,  0,  1,  4,  5,  8,
+  9,  12, 13, 16, 17, 20, 21, 24, 25, 28, 29, 32, 33, 36, 37, 40, 41, 44, 45,
+  48, 49, 52, 53, 56, 57, 60, 1,  1,  2,  5,  6,  9,  10, 13, 14, 17, 18, 21,
+  22, 25, 26, 29, 30, 33, 34, 37, 38, 41, 42, 45, 46, 49, 50, 53, 54, 57, 58,
+  61, 2,  2,  3,  6,  7,  10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31, 34,
+  35, 38, 39, 42, 43, 46, 47, 50, 51, 54, 55, 58, 59, 62, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_16x4_neighbors[65 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  16, 16, 32, 32, 0,  0,  1,  16, 17, 32, 33, 48, 1,  1,  2,
+  17, 18, 33, 34, 49, 2,  2,  3,  18, 19, 34, 35, 50, 3,  3,  4,  19, 20, 35,
+  36, 51, 4,  4,  5,  20, 21, 36, 37, 52, 5,  5,  6,  21, 22, 37, 38, 53, 6,
+  6,  7,  22, 23, 38, 39, 54, 7,  7,  8,  23, 24, 39, 40, 55, 8,  8,  9,  24,
+  25, 40, 41, 56, 9,  9,  10, 25, 26, 41, 42, 57, 10, 10, 11, 26, 27, 42, 43,
+  58, 11, 11, 12, 27, 28, 43, 44, 59, 12, 12, 13, 28, 29, 44, 45, 60, 13, 13,
+  14, 29, 30, 45, 46, 61, 14, 14, 15, 30, 31, 46, 47, 62, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_8x32_neighbors[257 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   0,   0,   1,   1,   1,   8,   8,   8,   2,   2,   2,
+  9,   9,   16,  16,  16,  3,   3,   3,   10,  10,  17,  17,  24,  24,  24,
+  4,   4,   4,   11,  11,  18,  18,  25,  25,  32,  32,  32,  5,   5,   5,
+  12,  12,  19,  19,  26,  26,  33,  33,  40,  40,  40,  6,   6,   6,   13,
+  13,  20,  20,  27,  27,  34,  34,  41,  41,  48,  48,  48,  7,   14,  14,
+  21,  21,  28,  28,  35,  35,  42,  42,  49,  49,  56,  56,  56,  15,  22,
+  22,  29,  29,  36,  36,  43,  43,  50,  50,  57,  57,  64,  64,  64,  23,
+  30,  30,  37,  37,  44,  44,  51,  51,  58,  58,  65,  65,  72,  72,  72,
+  31,  38,  38,  45,  45,  52,  52,  59,  59,  66,  66,  73,  73,  80,  80,
+  80,  39,  46,  46,  53,  53,  60,  60,  67,  67,  74,  74,  81,  81,  88,
+  88,  88,  47,  54,  54,  61,  61,  68,  68,  75,  75,  82,  82,  89,  89,
+  96,  96,  96,  55,  62,  62,  69,  69,  76,  76,  83,  83,  90,  90,  97,
+  97,  104, 104, 104, 63,  70,  70,  77,  77,  84,  84,  91,  91,  98,  98,
+  105, 105, 112, 112, 112, 71,  78,  78,  85,  85,  92,  92,  99,  99,  106,
+  106, 113, 113, 120, 120, 120, 79,  86,  86,  93,  93,  100, 100, 107, 107,
+  114, 114, 121, 121, 128, 128, 128, 87,  94,  94,  101, 101, 108, 108, 115,
+  115, 122, 122, 129, 129, 136, 136, 136, 95,  102, 102, 109, 109, 116, 116,
+  123, 123, 130, 130, 137, 137, 144, 144, 144, 103, 110, 110, 117, 117, 124,
+  124, 131, 131, 138, 138, 145, 145, 152, 152, 152, 111, 118, 118, 125, 125,
+  132, 132, 139, 139, 146, 146, 153, 153, 160, 160, 160, 119, 126, 126, 133,
+  133, 140, 140, 147, 147, 154, 154, 161, 161, 168, 168, 168, 127, 134, 134,
+  141, 141, 148, 148, 155, 155, 162, 162, 169, 169, 176, 176, 176, 135, 142,
+  142, 149, 149, 156, 156, 163, 163, 170, 170, 177, 177, 184, 184, 184, 143,
+  150, 150, 157, 157, 164, 164, 171, 171, 178, 178, 185, 185, 192, 192, 192,
+  151, 158, 158, 165, 165, 172, 172, 179, 179, 186, 186, 193, 193, 200, 200,
+  200, 159, 166, 166, 173, 173, 180, 180, 187, 187, 194, 194, 201, 201, 208,
+  208, 208, 167, 174, 174, 181, 181, 188, 188, 195, 195, 202, 202, 209, 209,
+  216, 216, 216, 175, 182, 182, 189, 189, 196, 196, 203, 203, 210, 210, 217,
+  217, 224, 224, 224, 183, 190, 190, 197, 197, 204, 204, 211, 211, 218, 218,
+  225, 225, 232, 232, 232, 191, 198, 198, 205, 205, 212, 212, 219, 219, 226,
+  226, 233, 233, 240, 240, 240, 199, 206, 206, 213, 213, 220, 220, 227, 227,
+  234, 234, 241, 241, 248, 207, 214, 214, 221, 221, 228, 228, 235, 235, 242,
+  242, 249, 215, 222, 222, 229, 229, 236, 236, 243, 243, 250, 223, 230, 230,
+  237, 237, 244, 244, 251, 231, 238, 238, 245, 245, 252, 239, 246, 246, 253,
+  247, 254, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_32x8_neighbors[257 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   0,   0,   1,   1,   1,   32,  32,  32,  2,   2,   2,
+  33,  33,  64,  64,  64,  3,   3,   3,   34,  34,  65,  65,  96,  96,  96,
+  4,   4,   4,   35,  35,  66,  66,  97,  97,  128, 128, 128, 5,   5,   5,
+  36,  36,  67,  67,  98,  98,  129, 129, 160, 160, 160, 6,   6,   6,   37,
+  37,  68,  68,  99,  99,  130, 130, 161, 161, 192, 192, 192, 7,   7,   7,
+  38,  38,  69,  69,  100, 100, 131, 131, 162, 162, 193, 193, 224, 8,   8,
+  8,   39,  39,  70,  70,  101, 101, 132, 132, 163, 163, 194, 194, 225, 9,
+  9,   9,   40,  40,  71,  71,  102, 102, 133, 133, 164, 164, 195, 195, 226,
+  10,  10,  10,  41,  41,  72,  72,  103, 103, 134, 134, 165, 165, 196, 196,
+  227, 11,  11,  11,  42,  42,  73,  73,  104, 104, 135, 135, 166, 166, 197,
+  197, 228, 12,  12,  12,  43,  43,  74,  74,  105, 105, 136, 136, 167, 167,
+  198, 198, 229, 13,  13,  13,  44,  44,  75,  75,  106, 106, 137, 137, 168,
+  168, 199, 199, 230, 14,  14,  14,  45,  45,  76,  76,  107, 107, 138, 138,
+  169, 169, 200, 200, 231, 15,  15,  15,  46,  46,  77,  77,  108, 108, 139,
+  139, 170, 170, 201, 201, 232, 16,  16,  16,  47,  47,  78,  78,  109, 109,
+  140, 140, 171, 171, 202, 202, 233, 17,  17,  17,  48,  48,  79,  79,  110,
+  110, 141, 141, 172, 172, 203, 203, 234, 18,  18,  18,  49,  49,  80,  80,
+  111, 111, 142, 142, 173, 173, 204, 204, 235, 19,  19,  19,  50,  50,  81,
+  81,  112, 112, 143, 143, 174, 174, 205, 205, 236, 20,  20,  20,  51,  51,
+  82,  82,  113, 113, 144, 144, 175, 175, 206, 206, 237, 21,  21,  21,  52,
+  52,  83,  83,  114, 114, 145, 145, 176, 176, 207, 207, 238, 22,  22,  22,
+  53,  53,  84,  84,  115, 115, 146, 146, 177, 177, 208, 208, 239, 23,  23,
+  23,  54,  54,  85,  85,  116, 116, 147, 147, 178, 178, 209, 209, 240, 24,
+  24,  24,  55,  55,  86,  86,  117, 117, 148, 148, 179, 179, 210, 210, 241,
+  25,  25,  25,  56,  56,  87,  87,  118, 118, 149, 149, 180, 180, 211, 211,
+  242, 26,  26,  26,  57,  57,  88,  88,  119, 119, 150, 150, 181, 181, 212,
+  212, 243, 27,  27,  27,  58,  58,  89,  89,  120, 120, 151, 151, 182, 182,
+  213, 213, 244, 28,  28,  28,  59,  59,  90,  90,  121, 121, 152, 152, 183,
+  183, 214, 214, 245, 29,  29,  29,  60,  60,  91,  91,  122, 122, 153, 153,
+  184, 184, 215, 215, 246, 30,  30,  30,  61,  61,  92,  92,  123, 123, 154,
+  154, 185, 185, 216, 216, 247, 31,  62,  62,  93,  93,  124, 124, 155, 155,
+  186, 186, 217, 217, 248, 63,  94,  94,  125, 125, 156, 156, 187, 187, 218,
+  218, 249, 95,  126, 126, 157, 157, 188, 188, 219, 219, 250, 127, 158, 158,
+  189, 189, 220, 220, 251, 159, 190, 190, 221, 221, 252, 191, 222, 222, 253,
+  223, 254, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_8x32_neighbors[257 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   1,   1,   2,   2,   3,   3,   4,   4,   5,   5,   6,
+  6,   0,   0,   1,   8,   2,   9,   3,   10,  4,   11,  5,   12,  6,   13,
+  7,   14,  8,   8,   9,   16,  10,  17,  11,  18,  12,  19,  13,  20,  14,
+  21,  15,  22,  16,  16,  17,  24,  18,  25,  19,  26,  20,  27,  21,  28,
+  22,  29,  23,  30,  24,  24,  25,  32,  26,  33,  27,  34,  28,  35,  29,
+  36,  30,  37,  31,  38,  32,  32,  33,  40,  34,  41,  35,  42,  36,  43,
+  37,  44,  38,  45,  39,  46,  40,  40,  41,  48,  42,  49,  43,  50,  44,
+  51,  45,  52,  46,  53,  47,  54,  48,  48,  49,  56,  50,  57,  51,  58,
+  52,  59,  53,  60,  54,  61,  55,  62,  56,  56,  57,  64,  58,  65,  59,
+  66,  60,  67,  61,  68,  62,  69,  63,  70,  64,  64,  65,  72,  66,  73,
+  67,  74,  68,  75,  69,  76,  70,  77,  71,  78,  72,  72,  73,  80,  74,
+  81,  75,  82,  76,  83,  77,  84,  78,  85,  79,  86,  80,  80,  81,  88,
+  82,  89,  83,  90,  84,  91,  85,  92,  86,  93,  87,  94,  88,  88,  89,
+  96,  90,  97,  91,  98,  92,  99,  93,  100, 94,  101, 95,  102, 96,  96,
+  97,  104, 98,  105, 99,  106, 100, 107, 101, 108, 102, 109, 103, 110, 104,
+  104, 105, 112, 106, 113, 107, 114, 108, 115, 109, 116, 110, 117, 111, 118,
+  112, 112, 113, 120, 114, 121, 115, 122, 116, 123, 117, 124, 118, 125, 119,
+  126, 120, 120, 121, 128, 122, 129, 123, 130, 124, 131, 125, 132, 126, 133,
+  127, 134, 128, 128, 129, 136, 130, 137, 131, 138, 132, 139, 133, 140, 134,
+  141, 135, 142, 136, 136, 137, 144, 138, 145, 139, 146, 140, 147, 141, 148,
+  142, 149, 143, 150, 144, 144, 145, 152, 146, 153, 147, 154, 148, 155, 149,
+  156, 150, 157, 151, 158, 152, 152, 153, 160, 154, 161, 155, 162, 156, 163,
+  157, 164, 158, 165, 159, 166, 160, 160, 161, 168, 162, 169, 163, 170, 164,
+  171, 165, 172, 166, 173, 167, 174, 168, 168, 169, 176, 170, 177, 171, 178,
+  172, 179, 173, 180, 174, 181, 175, 182, 176, 176, 177, 184, 178, 185, 179,
+  186, 180, 187, 181, 188, 182, 189, 183, 190, 184, 184, 185, 192, 186, 193,
+  187, 194, 188, 195, 189, 196, 190, 197, 191, 198, 192, 192, 193, 200, 194,
+  201, 195, 202, 196, 203, 197, 204, 198, 205, 199, 206, 200, 200, 201, 208,
+  202, 209, 203, 210, 204, 211, 205, 212, 206, 213, 207, 214, 208, 208, 209,
+  216, 210, 217, 211, 218, 212, 219, 213, 220, 214, 221, 215, 222, 216, 216,
+  217, 224, 218, 225, 219, 226, 220, 227, 221, 228, 222, 229, 223, 230, 224,
+  224, 225, 232, 226, 233, 227, 234, 228, 235, 229, 236, 230, 237, 231, 238,
+  232, 232, 233, 240, 234, 241, 235, 242, 236, 243, 237, 244, 238, 245, 239,
+  246, 240, 240, 241, 248, 242, 249, 243, 250, 244, 251, 245, 252, 246, 253,
+  247, 254, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_32x8_neighbors[257 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   1,   1,   2,   2,   3,   3,   4,   4,   5,   5,   6,
+  6,   7,   7,   8,   8,   9,   9,   10,  10,  11,  11,  12,  12,  13,  13,
+  14,  14,  15,  15,  16,  16,  17,  17,  18,  18,  19,  19,  20,  20,  21,
+  21,  22,  22,  23,  23,  24,  24,  25,  25,  26,  26,  27,  27,  28,  28,
+  29,  29,  30,  30,  0,   0,   1,   32,  2,   33,  3,   34,  4,   35,  5,
+  36,  6,   37,  7,   38,  8,   39,  9,   40,  10,  41,  11,  42,  12,  43,
+  13,  44,  14,  45,  15,  46,  16,  47,  17,  48,  18,  49,  19,  50,  20,
+  51,  21,  52,  22,  53,  23,  54,  24,  55,  25,  56,  26,  57,  27,  58,
+  28,  59,  29,  60,  30,  61,  31,  62,  32,  32,  33,  64,  34,  65,  35,
+  66,  36,  67,  37,  68,  38,  69,  39,  70,  40,  71,  41,  72,  42,  73,
+  43,  74,  44,  75,  45,  76,  46,  77,  47,  78,  48,  79,  49,  80,  50,
+  81,  51,  82,  52,  83,  53,  84,  54,  85,  55,  86,  56,  87,  57,  88,
+  58,  89,  59,  90,  60,  91,  61,  92,  62,  93,  63,  94,  64,  64,  65,
+  96,  66,  97,  67,  98,  68,  99,  69,  100, 70,  101, 71,  102, 72,  103,
+  73,  104, 74,  105, 75,  106, 76,  107, 77,  108, 78,  109, 79,  110, 80,
+  111, 81,  112, 82,  113, 83,  114, 84,  115, 85,  116, 86,  117, 87,  118,
+  88,  119, 89,  120, 90,  121, 91,  122, 92,  123, 93,  124, 94,  125, 95,
+  126, 96,  96,  97,  128, 98,  129, 99,  130, 100, 131, 101, 132, 102, 133,
+  103, 134, 104, 135, 105, 136, 106, 137, 107, 138, 108, 139, 109, 140, 110,
+  141, 111, 142, 112, 143, 113, 144, 114, 145, 115, 146, 116, 147, 117, 148,
+  118, 149, 119, 150, 120, 151, 121, 152, 122, 153, 123, 154, 124, 155, 125,
+  156, 126, 157, 127, 158, 128, 128, 129, 160, 130, 161, 131, 162, 132, 163,
+  133, 164, 134, 165, 135, 166, 136, 167, 137, 168, 138, 169, 139, 170, 140,
+  171, 141, 172, 142, 173, 143, 174, 144, 175, 145, 176, 146, 177, 147, 178,
+  148, 179, 149, 180, 150, 181, 151, 182, 152, 183, 153, 184, 154, 185, 155,
+  186, 156, 187, 157, 188, 158, 189, 159, 190, 160, 160, 161, 192, 162, 193,
+  163, 194, 164, 195, 165, 196, 166, 197, 167, 198, 168, 199, 169, 200, 170,
+  201, 171, 202, 172, 203, 173, 204, 174, 205, 175, 206, 176, 207, 177, 208,
+  178, 209, 179, 210, 180, 211, 181, 212, 182, 213, 183, 214, 184, 215, 185,
+  216, 186, 217, 187, 218, 188, 219, 189, 220, 190, 221, 191, 222, 192, 192,
+  193, 224, 194, 225, 195, 226, 196, 227, 197, 228, 198, 229, 199, 230, 200,
+  231, 201, 232, 202, 233, 203, 234, 204, 235, 205, 236, 206, 237, 207, 238,
+  208, 239, 209, 240, 210, 241, 211, 242, 212, 243, 213, 244, 214, 245, 215,
+  246, 216, 247, 217, 248, 218, 249, 219, 250, 220, 251, 221, 252, 222, 253,
+  223, 254, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_8x32_neighbors[257 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   8,   8,   16,  16,  24,  24,  32,  32,  40,  40,  48,
+  48,  56,  56,  64,  64,  72,  72,  80,  80,  88,  88,  96,  96,  104, 104,
+  112, 112, 120, 120, 128, 128, 136, 136, 144, 144, 152, 152, 160, 160, 168,
+  168, 176, 176, 184, 184, 192, 192, 200, 200, 208, 208, 216, 216, 224, 224,
+  232, 232, 240, 240, 0,   0,   1,   8,   9,   16,  17,  24,  25,  32,  33,
+  40,  41,  48,  49,  56,  57,  64,  65,  72,  73,  80,  81,  88,  89,  96,
+  97,  104, 105, 112, 113, 120, 121, 128, 129, 136, 137, 144, 145, 152, 153,
+  160, 161, 168, 169, 176, 177, 184, 185, 192, 193, 200, 201, 208, 209, 216,
+  217, 224, 225, 232, 233, 240, 241, 248, 1,   1,   2,   9,   10,  17,  18,
+  25,  26,  33,  34,  41,  42,  49,  50,  57,  58,  65,  66,  73,  74,  81,
+  82,  89,  90,  97,  98,  105, 106, 113, 114, 121, 122, 129, 130, 137, 138,
+  145, 146, 153, 154, 161, 162, 169, 170, 177, 178, 185, 186, 193, 194, 201,
+  202, 209, 210, 217, 218, 225, 226, 233, 234, 241, 242, 249, 2,   2,   3,
+  10,  11,  18,  19,  26,  27,  34,  35,  42,  43,  50,  51,  58,  59,  66,
+  67,  74,  75,  82,  83,  90,  91,  98,  99,  106, 107, 114, 115, 122, 123,
+  130, 131, 138, 139, 146, 147, 154, 155, 162, 163, 170, 171, 178, 179, 186,
+  187, 194, 195, 202, 203, 210, 211, 218, 219, 226, 227, 234, 235, 242, 243,
+  250, 3,   3,   4,   11,  12,  19,  20,  27,  28,  35,  36,  43,  44,  51,
+  52,  59,  60,  67,  68,  75,  76,  83,  84,  91,  92,  99,  100, 107, 108,
+  115, 116, 123, 124, 131, 132, 139, 140, 147, 148, 155, 156, 163, 164, 171,
+  172, 179, 180, 187, 188, 195, 196, 203, 204, 211, 212, 219, 220, 227, 228,
+  235, 236, 243, 244, 251, 4,   4,   5,   12,  13,  20,  21,  28,  29,  36,
+  37,  44,  45,  52,  53,  60,  61,  68,  69,  76,  77,  84,  85,  92,  93,
+  100, 101, 108, 109, 116, 117, 124, 125, 132, 133, 140, 141, 148, 149, 156,
+  157, 164, 165, 172, 173, 180, 181, 188, 189, 196, 197, 204, 205, 212, 213,
+  220, 221, 228, 229, 236, 237, 244, 245, 252, 5,   5,   6,   13,  14,  21,
+  22,  29,  30,  37,  38,  45,  46,  53,  54,  61,  62,  69,  70,  77,  78,
+  85,  86,  93,  94,  101, 102, 109, 110, 117, 118, 125, 126, 133, 134, 141,
+  142, 149, 150, 157, 158, 165, 166, 173, 174, 181, 182, 189, 190, 197, 198,
+  205, 206, 213, 214, 221, 222, 229, 230, 237, 238, 245, 246, 253, 6,   6,
+  7,   14,  15,  22,  23,  30,  31,  38,  39,  46,  47,  54,  55,  62,  63,
+  70,  71,  78,  79,  86,  87,  94,  95,  102, 103, 110, 111, 118, 119, 126,
+  127, 134, 135, 142, 143, 150, 151, 158, 159, 166, 167, 174, 175, 182, 183,
+  190, 191, 198, 199, 206, 207, 214, 215, 222, 223, 230, 231, 238, 239, 246,
+  247, 254, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_32x8_neighbors[257 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  32, 32, 64, 64,  96,  96,  128, 128, 160, 160, 192, 192,
+  0,  0,  1,  32, 33, 64, 65, 96,  97,  128, 129, 160, 161, 192, 193, 224,
+  1,  1,  2,  33, 34, 65, 66, 97,  98,  129, 130, 161, 162, 193, 194, 225,
+  2,  2,  3,  34, 35, 66, 67, 98,  99,  130, 131, 162, 163, 194, 195, 226,
+  3,  3,  4,  35, 36, 67, 68, 99,  100, 131, 132, 163, 164, 195, 196, 227,
+  4,  4,  5,  36, 37, 68, 69, 100, 101, 132, 133, 164, 165, 196, 197, 228,
+  5,  5,  6,  37, 38, 69, 70, 101, 102, 133, 134, 165, 166, 197, 198, 229,
+  6,  6,  7,  38, 39, 70, 71, 102, 103, 134, 135, 166, 167, 198, 199, 230,
+  7,  7,  8,  39, 40, 71, 72, 103, 104, 135, 136, 167, 168, 199, 200, 231,
+  8,  8,  9,  40, 41, 72, 73, 104, 105, 136, 137, 168, 169, 200, 201, 232,
+  9,  9,  10, 41, 42, 73, 74, 105, 106, 137, 138, 169, 170, 201, 202, 233,
+  10, 10, 11, 42, 43, 74, 75, 106, 107, 138, 139, 170, 171, 202, 203, 234,
+  11, 11, 12, 43, 44, 75, 76, 107, 108, 139, 140, 171, 172, 203, 204, 235,
+  12, 12, 13, 44, 45, 76, 77, 108, 109, 140, 141, 172, 173, 204, 205, 236,
+  13, 13, 14, 45, 46, 77, 78, 109, 110, 141, 142, 173, 174, 205, 206, 237,
+  14, 14, 15, 46, 47, 78, 79, 110, 111, 142, 143, 174, 175, 206, 207, 238,
+  15, 15, 16, 47, 48, 79, 80, 111, 112, 143, 144, 175, 176, 207, 208, 239,
+  16, 16, 17, 48, 49, 80, 81, 112, 113, 144, 145, 176, 177, 208, 209, 240,
+  17, 17, 18, 49, 50, 81, 82, 113, 114, 145, 146, 177, 178, 209, 210, 241,
+  18, 18, 19, 50, 51, 82, 83, 114, 115, 146, 147, 178, 179, 210, 211, 242,
+  19, 19, 20, 51, 52, 83, 84, 115, 116, 147, 148, 179, 180, 211, 212, 243,
+  20, 20, 21, 52, 53, 84, 85, 116, 117, 148, 149, 180, 181, 212, 213, 244,
+  21, 21, 22, 53, 54, 85, 86, 117, 118, 149, 150, 181, 182, 213, 214, 245,
+  22, 22, 23, 54, 55, 86, 87, 118, 119, 150, 151, 182, 183, 214, 215, 246,
+  23, 23, 24, 55, 56, 87, 88, 119, 120, 151, 152, 183, 184, 215, 216, 247,
+  24, 24, 25, 56, 57, 88, 89, 120, 121, 152, 153, 184, 185, 216, 217, 248,
+  25, 25, 26, 57, 58, 89, 90, 121, 122, 153, 154, 185, 186, 217, 218, 249,
+  26, 26, 27, 58, 59, 90, 91, 122, 123, 154, 155, 186, 187, 218, 219, 250,
+  27, 27, 28, 59, 60, 91, 92, 123, 124, 155, 156, 187, 188, 219, 220, 251,
+  28, 28, 29, 60, 61, 92, 93, 124, 125, 156, 157, 188, 189, 220, 221, 252,
+  29, 29, 30, 61, 62, 93, 94, 125, 126, 157, 158, 189, 190, 221, 222, 253,
+  30, 30, 31, 62, 63, 94, 95, 126, 127, 158, 159, 190, 191, 222, 223, 254,
+  0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_8x8_neighbors[65 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  8,  8,  16, 16, 24, 24, 32, 32, 40, 40, 48, 48, 0,  0,  1,
+  8,  9,  16, 17, 24, 25, 32, 33, 40, 41, 48, 49, 56, 1,  1,  2,  9,  10, 17,
+  18, 25, 26, 33, 34, 41, 42, 49, 50, 57, 2,  2,  3,  10, 11, 18, 19, 26, 27,
+  34, 35, 42, 43, 50, 51, 58, 3,  3,  4,  11, 12, 19, 20, 27, 28, 35, 36, 43,
+  44, 51, 52, 59, 4,  4,  5,  12, 13, 20, 21, 28, 29, 36, 37, 44, 45, 52, 53,
+  60, 5,  5,  6,  13, 14, 21, 22, 29, 30, 37, 38, 45, 46, 53, 54, 61, 6,  6,
+  7,  14, 15, 22, 23, 30, 31, 38, 39, 46, 47, 54, 55, 62, 0,  0,
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_8x8_neighbors[65 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  1,  1,  2,  2,  3,  3,  4,  4,  5,  5,  6,  6,  0,  0,  1,
+  8,  2,  9,  3,  10, 4,  11, 5,  12, 6,  13, 7,  14, 8,  8,  9,  16, 10, 17,
+  11, 18, 12, 19, 13, 20, 14, 21, 15, 22, 16, 16, 17, 24, 18, 25, 19, 26, 20,
+  27, 21, 28, 22, 29, 23, 30, 24, 24, 25, 32, 26, 33, 27, 34, 28, 35, 29, 36,
+  30, 37, 31, 38, 32, 32, 33, 40, 34, 41, 35, 42, 36, 43, 37, 44, 38, 45, 39,
+  46, 40, 40, 41, 48, 42, 49, 43, 50, 44, 51, 45, 52, 46, 53, 47, 54, 48, 48,
+  49, 56, 50, 57, 51, 58, 52, 59, 53, 60, 54, 61, 55, 62, 0,  0,
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_8x8_neighbors[65 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  0,  0,  8,  8,  1,  8,  1,  1,  2,  2,  2,  9,  9,  16, 16,
+  16, 24, 24, 17, 24, 10, 17, 3,  10, 3,  3,  4,  4,  4,  11, 11, 18, 18, 25,
+  25, 32, 32, 32, 40, 40, 33, 40, 26, 33, 19, 26, 12, 19, 5,  12, 5,  5,  6,
+  6,  6,  13, 13, 20, 20, 27, 27, 34, 34, 41, 41, 48, 48, 48, 49, 56, 42, 49,
+  35, 42, 28, 35, 21, 28, 14, 21, 7,  14, 15, 22, 22, 29, 29, 36, 36, 43, 43,
+  50, 50, 57, 51, 58, 44, 51, 37, 44, 30, 37, 23, 30, 31, 38, 38, 45, 45, 52,
+  52, 59, 53, 60, 46, 53, 39, 46, 47, 54, 54, 61, 55, 62, 0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_8x16_neighbors[129 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   0,   0,   1,   1,   1,   8,   8,   8,   2,   2,   2,
+  9,   9,   16,  16,  16,  3,   3,   3,   10,  10,  17,  17,  24,  24,  24,
+  4,   4,   4,   11,  11,  18,  18,  25,  25,  32,  32,  32,  5,   5,   5,
+  12,  12,  19,  19,  26,  26,  33,  33,  40,  40,  40,  6,   6,   6,   13,
+  13,  20,  20,  27,  27,  34,  34,  41,  41,  48,  48,  48,  7,   14,  14,
+  21,  21,  28,  28,  35,  35,  42,  42,  49,  49,  56,  56,  56,  15,  22,
+  22,  29,  29,  36,  36,  43,  43,  50,  50,  57,  57,  64,  64,  64,  23,
+  30,  30,  37,  37,  44,  44,  51,  51,  58,  58,  65,  65,  72,  72,  72,
+  31,  38,  38,  45,  45,  52,  52,  59,  59,  66,  66,  73,  73,  80,  80,
+  80,  39,  46,  46,  53,  53,  60,  60,  67,  67,  74,  74,  81,  81,  88,
+  88,  88,  47,  54,  54,  61,  61,  68,  68,  75,  75,  82,  82,  89,  89,
+  96,  96,  96,  55,  62,  62,  69,  69,  76,  76,  83,  83,  90,  90,  97,
+  97,  104, 104, 104, 63,  70,  70,  77,  77,  84,  84,  91,  91,  98,  98,
+  105, 105, 112, 112, 112, 71,  78,  78,  85,  85,  92,  92,  99,  99,  106,
+  106, 113, 113, 120, 79,  86,  86,  93,  93,  100, 100, 107, 107, 114, 114,
+  121, 87,  94,  94,  101, 101, 108, 108, 115, 115, 122, 95,  102, 102, 109,
+  109, 116, 116, 123, 103, 110, 110, 117, 117, 124, 111, 118, 118, 125, 119,
+  126, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_16x8_neighbors[129 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   0,   0,  1,  1,   1,   16,  16,  16,  2,   2,   2,
+  17,  17,  32,  32,  32,  3,  3,  3,   18,  18,  33,  33,  48,  48,  48,
+  4,   4,   4,   19,  19,  34, 34, 49,  49,  64,  64,  64,  5,   5,   5,
+  20,  20,  35,  35,  50,  50, 65, 65,  80,  80,  80,  6,   6,   6,   21,
+  21,  36,  36,  51,  51,  66, 66, 81,  81,  96,  96,  96,  7,   7,   7,
+  22,  22,  37,  37,  52,  52, 67, 67,  82,  82,  97,  97,  112, 8,   8,
+  8,   23,  23,  38,  38,  53, 53, 68,  68,  83,  83,  98,  98,  113, 9,
+  9,   9,   24,  24,  39,  39, 54, 54,  69,  69,  84,  84,  99,  99,  114,
+  10,  10,  10,  25,  25,  40, 40, 55,  55,  70,  70,  85,  85,  100, 100,
+  115, 11,  11,  11,  26,  26, 41, 41,  56,  56,  71,  71,  86,  86,  101,
+  101, 116, 12,  12,  12,  27, 27, 42,  42,  57,  57,  72,  72,  87,  87,
+  102, 102, 117, 13,  13,  13, 28, 28,  43,  43,  58,  58,  73,  73,  88,
+  88,  103, 103, 118, 14,  14, 14, 29,  29,  44,  44,  59,  59,  74,  74,
+  89,  89,  104, 104, 119, 15, 30, 30,  45,  45,  60,  60,  75,  75,  90,
+  90,  105, 105, 120, 31,  46, 46, 61,  61,  76,  76,  91,  91,  106, 106,
+  121, 47,  62,  62,  77,  77, 92, 92,  107, 107, 122, 63,  78,  78,  93,
+  93,  108, 108, 123, 79,  94, 94, 109, 109, 124, 95,  110, 110, 125, 111,
+  126, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_8x16_neighbors[129 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  8,  8,  16, 16, 24, 24,  32,  32,  40,  40,  48,  48,
+  56, 56, 64, 64, 72, 72, 80, 80, 88, 88,  96,  96,  104, 104, 112, 112,
+  0,  0,  1,  8,  9,  16, 17, 24, 25, 32,  33,  40,  41,  48,  49,  56,
+  57, 64, 65, 72, 73, 80, 81, 88, 89, 96,  97,  104, 105, 112, 113, 120,
+  1,  1,  2,  9,  10, 17, 18, 25, 26, 33,  34,  41,  42,  49,  50,  57,
+  58, 65, 66, 73, 74, 81, 82, 89, 90, 97,  98,  105, 106, 113, 114, 121,
+  2,  2,  3,  10, 11, 18, 19, 26, 27, 34,  35,  42,  43,  50,  51,  58,
+  59, 66, 67, 74, 75, 82, 83, 90, 91, 98,  99,  106, 107, 114, 115, 122,
+  3,  3,  4,  11, 12, 19, 20, 27, 28, 35,  36,  43,  44,  51,  52,  59,
+  60, 67, 68, 75, 76, 83, 84, 91, 92, 99,  100, 107, 108, 115, 116, 123,
+  4,  4,  5,  12, 13, 20, 21, 28, 29, 36,  37,  44,  45,  52,  53,  60,
+  61, 68, 69, 76, 77, 84, 85, 92, 93, 100, 101, 108, 109, 116, 117, 124,
+  5,  5,  6,  13, 14, 21, 22, 29, 30, 37,  38,  45,  46,  53,  54,  61,
+  62, 69, 70, 77, 78, 85, 86, 93, 94, 101, 102, 109, 110, 117, 118, 125,
+  6,  6,  7,  14, 15, 22, 23, 30, 31, 38,  39,  46,  47,  54,  55,  62,
+  63, 70, 71, 78, 79, 86, 87, 94, 95, 102, 103, 110, 111, 118, 119, 126,
+  0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_16x8_neighbors[129 * MAX_NEIGHBORS]) = {
+  0,  0,  0,  0,  16, 16, 32, 32, 48, 48, 64, 64, 80, 80,  96,  96,
+  0,  0,  1,  16, 17, 32, 33, 48, 49, 64, 65, 80, 81, 96,  97,  112,
+  1,  1,  2,  17, 18, 33, 34, 49, 50, 65, 66, 81, 82, 97,  98,  113,
+  2,  2,  3,  18, 19, 34, 35, 50, 51, 66, 67, 82, 83, 98,  99,  114,
+  3,  3,  4,  19, 20, 35, 36, 51, 52, 67, 68, 83, 84, 99,  100, 115,
+  4,  4,  5,  20, 21, 36, 37, 52, 53, 68, 69, 84, 85, 100, 101, 116,
+  5,  5,  6,  21, 22, 37, 38, 53, 54, 69, 70, 85, 86, 101, 102, 117,
+  6,  6,  7,  22, 23, 38, 39, 54, 55, 70, 71, 86, 87, 102, 103, 118,
+  7,  7,  8,  23, 24, 39, 40, 55, 56, 71, 72, 87, 88, 103, 104, 119,
+  8,  8,  9,  24, 25, 40, 41, 56, 57, 72, 73, 88, 89, 104, 105, 120,
+  9,  9,  10, 25, 26, 41, 42, 57, 58, 73, 74, 89, 90, 105, 106, 121,
+  10, 10, 11, 26, 27, 42, 43, 58, 59, 74, 75, 90, 91, 106, 107, 122,
+  11, 11, 12, 27, 28, 43, 44, 59, 60, 75, 76, 91, 92, 107, 108, 123,
+  12, 12, 13, 28, 29, 44, 45, 60, 61, 76, 77, 92, 93, 108, 109, 124,
+  13, 13, 14, 29, 30, 45, 46, 61, 62, 77, 78, 93, 94, 109, 110, 125,
+  14, 14, 15, 30, 31, 46, 47, 62, 63, 78, 79, 94, 95, 110, 111, 126,
+  0,  0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_8x16_neighbors[129 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   1,   1,   2,   2,   3,   3,   4,   4,   5,   5,   6,
+  6,   0,   0,   1,   8,   2,   9,   3,   10,  4,   11,  5,   12,  6,   13,
+  7,   14,  8,   8,   9,   16,  10,  17,  11,  18,  12,  19,  13,  20,  14,
+  21,  15,  22,  16,  16,  17,  24,  18,  25,  19,  26,  20,  27,  21,  28,
+  22,  29,  23,  30,  24,  24,  25,  32,  26,  33,  27,  34,  28,  35,  29,
+  36,  30,  37,  31,  38,  32,  32,  33,  40,  34,  41,  35,  42,  36,  43,
+  37,  44,  38,  45,  39,  46,  40,  40,  41,  48,  42,  49,  43,  50,  44,
+  51,  45,  52,  46,  53,  47,  54,  48,  48,  49,  56,  50,  57,  51,  58,
+  52,  59,  53,  60,  54,  61,  55,  62,  56,  56,  57,  64,  58,  65,  59,
+  66,  60,  67,  61,  68,  62,  69,  63,  70,  64,  64,  65,  72,  66,  73,
+  67,  74,  68,  75,  69,  76,  70,  77,  71,  78,  72,  72,  73,  80,  74,
+  81,  75,  82,  76,  83,  77,  84,  78,  85,  79,  86,  80,  80,  81,  88,
+  82,  89,  83,  90,  84,  91,  85,  92,  86,  93,  87,  94,  88,  88,  89,
+  96,  90,  97,  91,  98,  92,  99,  93,  100, 94,  101, 95,  102, 96,  96,
+  97,  104, 98,  105, 99,  106, 100, 107, 101, 108, 102, 109, 103, 110, 104,
+  104, 105, 112, 106, 113, 107, 114, 108, 115, 109, 116, 110, 117, 111, 118,
+  112, 112, 113, 120, 114, 121, 115, 122, 116, 123, 117, 124, 118, 125, 119,
+  126, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_16x8_neighbors[129 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   1,   1,   2,   2,   3,   3,   4,   4,   5,   5,   6,
+  6,   7,   7,   8,   8,   9,   9,   10,  10,  11,  11,  12,  12,  13,  13,
+  14,  14,  0,   0,   1,   16,  2,   17,  3,   18,  4,   19,  5,   20,  6,
+  21,  7,   22,  8,   23,  9,   24,  10,  25,  11,  26,  12,  27,  13,  28,
+  14,  29,  15,  30,  16,  16,  17,  32,  18,  33,  19,  34,  20,  35,  21,
+  36,  22,  37,  23,  38,  24,  39,  25,  40,  26,  41,  27,  42,  28,  43,
+  29,  44,  30,  45,  31,  46,  32,  32,  33,  48,  34,  49,  35,  50,  36,
+  51,  37,  52,  38,  53,  39,  54,  40,  55,  41,  56,  42,  57,  43,  58,
+  44,  59,  45,  60,  46,  61,  47,  62,  48,  48,  49,  64,  50,  65,  51,
+  66,  52,  67,  53,  68,  54,  69,  55,  70,  56,  71,  57,  72,  58,  73,
+  59,  74,  60,  75,  61,  76,  62,  77,  63,  78,  64,  64,  65,  80,  66,
+  81,  67,  82,  68,  83,  69,  84,  70,  85,  71,  86,  72,  87,  73,  88,
+  74,  89,  75,  90,  76,  91,  77,  92,  78,  93,  79,  94,  80,  80,  81,
+  96,  82,  97,  83,  98,  84,  99,  85,  100, 86,  101, 87,  102, 88,  103,
+  89,  104, 90,  105, 91,  106, 92,  107, 93,  108, 94,  109, 95,  110, 96,
+  96,  97,  112, 98,  113, 99,  114, 100, 115, 101, 116, 102, 117, 103, 118,
+  104, 119, 105, 120, 106, 121, 107, 122, 108, 123, 109, 124, 110, 125, 111,
+  126, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_16x32_neighbors[513 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   0,   0,   1,   1,   1,   16,  16,  16,  2,   2,   2,
+  17,  17,  32,  32,  32,  3,   3,   3,   18,  18,  33,  33,  48,  48,  48,
+  4,   4,   4,   19,  19,  34,  34,  49,  49,  64,  64,  64,  5,   5,   5,
+  20,  20,  35,  35,  50,  50,  65,  65,  80,  80,  80,  6,   6,   6,   21,
+  21,  36,  36,  51,  51,  66,  66,  81,  81,  96,  96,  96,  7,   7,   7,
+  22,  22,  37,  37,  52,  52,  67,  67,  82,  82,  97,  97,  112, 112, 112,
+  8,   8,   8,   23,  23,  38,  38,  53,  53,  68,  68,  83,  83,  98,  98,
+  113, 113, 128, 128, 128, 9,   9,   9,   24,  24,  39,  39,  54,  54,  69,
+  69,  84,  84,  99,  99,  114, 114, 129, 129, 144, 144, 144, 10,  10,  10,
+  25,  25,  40,  40,  55,  55,  70,  70,  85,  85,  100, 100, 115, 115, 130,
+  130, 145, 145, 160, 160, 160, 11,  11,  11,  26,  26,  41,  41,  56,  56,
+  71,  71,  86,  86,  101, 101, 116, 116, 131, 131, 146, 146, 161, 161, 176,
+  176, 176, 12,  12,  12,  27,  27,  42,  42,  57,  57,  72,  72,  87,  87,
+  102, 102, 117, 117, 132, 132, 147, 147, 162, 162, 177, 177, 192, 192, 192,
+  13,  13,  13,  28,  28,  43,  43,  58,  58,  73,  73,  88,  88,  103, 103,
+  118, 118, 133, 133, 148, 148, 163, 163, 178, 178, 193, 193, 208, 208, 208,
+  14,  14,  14,  29,  29,  44,  44,  59,  59,  74,  74,  89,  89,  104, 104,
+  119, 119, 134, 134, 149, 149, 164, 164, 179, 179, 194, 194, 209, 209, 224,
+  224, 224, 15,  30,  30,  45,  45,  60,  60,  75,  75,  90,  90,  105, 105,
+  120, 120, 135, 135, 150, 150, 165, 165, 180, 180, 195, 195, 210, 210, 225,
+  225, 240, 240, 240, 31,  46,  46,  61,  61,  76,  76,  91,  91,  106, 106,
+  121, 121, 136, 136, 151, 151, 166, 166, 181, 181, 196, 196, 211, 211, 226,
+  226, 241, 241, 256, 256, 256, 47,  62,  62,  77,  77,  92,  92,  107, 107,
+  122, 122, 137, 137, 152, 152, 167, 167, 182, 182, 197, 197, 212, 212, 227,
+  227, 242, 242, 257, 257, 272, 272, 272, 63,  78,  78,  93,  93,  108, 108,
+  123, 123, 138, 138, 153, 153, 168, 168, 183, 183, 198, 198, 213, 213, 228,
+  228, 243, 243, 258, 258, 273, 273, 288, 288, 288, 79,  94,  94,  109, 109,
+  124, 124, 139, 139, 154, 154, 169, 169, 184, 184, 199, 199, 214, 214, 229,
+  229, 244, 244, 259, 259, 274, 274, 289, 289, 304, 304, 304, 95,  110, 110,
+  125, 125, 140, 140, 155, 155, 170, 170, 185, 185, 200, 200, 215, 215, 230,
+  230, 245, 245, 260, 260, 275, 275, 290, 290, 305, 305, 320, 320, 320, 111,
+  126, 126, 141, 141, 156, 156, 171, 171, 186, 186, 201, 201, 216, 216, 231,
+  231, 246, 246, 261, 261, 276, 276, 291, 291, 306, 306, 321, 321, 336, 336,
+  336, 127, 142, 142, 157, 157, 172, 172, 187, 187, 202, 202, 217, 217, 232,
+  232, 247, 247, 262, 262, 277, 277, 292, 292, 307, 307, 322, 322, 337, 337,
+  352, 352, 352, 143, 158, 158, 173, 173, 188, 188, 203, 203, 218, 218, 233,
+  233, 248, 248, 263, 263, 278, 278, 293, 293, 308, 308, 323, 323, 338, 338,
+  353, 353, 368, 368, 368, 159, 174, 174, 189, 189, 204, 204, 219, 219, 234,
+  234, 249, 249, 264, 264, 279, 279, 294, 294, 309, 309, 324, 324, 339, 339,
+  354, 354, 369, 369, 384, 384, 384, 175, 190, 190, 205, 205, 220, 220, 235,
+  235, 250, 250, 265, 265, 280, 280, 295, 295, 310, 310, 325, 325, 340, 340,
+  355, 355, 370, 370, 385, 385, 400, 400, 400, 191, 206, 206, 221, 221, 236,
+  236, 251, 251, 266, 266, 281, 281, 296, 296, 311, 311, 326, 326, 341, 341,
+  356, 356, 371, 371, 386, 386, 401, 401, 416, 416, 416, 207, 222, 222, 237,
+  237, 252, 252, 267, 267, 282, 282, 297, 297, 312, 312, 327, 327, 342, 342,
+  357, 357, 372, 372, 387, 387, 402, 402, 417, 417, 432, 432, 432, 223, 238,
+  238, 253, 253, 268, 268, 283, 283, 298, 298, 313, 313, 328, 328, 343, 343,
+  358, 358, 373, 373, 388, 388, 403, 403, 418, 418, 433, 433, 448, 448, 448,
+  239, 254, 254, 269, 269, 284, 284, 299, 299, 314, 314, 329, 329, 344, 344,
+  359, 359, 374, 374, 389, 389, 404, 404, 419, 419, 434, 434, 449, 449, 464,
+  464, 464, 255, 270, 270, 285, 285, 300, 300, 315, 315, 330, 330, 345, 345,
+  360, 360, 375, 375, 390, 390, 405, 405, 420, 420, 435, 435, 450, 450, 465,
+  465, 480, 480, 480, 271, 286, 286, 301, 301, 316, 316, 331, 331, 346, 346,
+  361, 361, 376, 376, 391, 391, 406, 406, 421, 421, 436, 436, 451, 451, 466,
+  466, 481, 481, 496, 287, 302, 302, 317, 317, 332, 332, 347, 347, 362, 362,
+  377, 377, 392, 392, 407, 407, 422, 422, 437, 437, 452, 452, 467, 467, 482,
+  482, 497, 303, 318, 318, 333, 333, 348, 348, 363, 363, 378, 378, 393, 393,
+  408, 408, 423, 423, 438, 438, 453, 453, 468, 468, 483, 483, 498, 319, 334,
+  334, 349, 349, 364, 364, 379, 379, 394, 394, 409, 409, 424, 424, 439, 439,
+  454, 454, 469, 469, 484, 484, 499, 335, 350, 350, 365, 365, 380, 380, 395,
+  395, 410, 410, 425, 425, 440, 440, 455, 455, 470, 470, 485, 485, 500, 351,
+  366, 366, 381, 381, 396, 396, 411, 411, 426, 426, 441, 441, 456, 456, 471,
+  471, 486, 486, 501, 367, 382, 382, 397, 397, 412, 412, 427, 427, 442, 442,
+  457, 457, 472, 472, 487, 487, 502, 383, 398, 398, 413, 413, 428, 428, 443,
+  443, 458, 458, 473, 473, 488, 488, 503, 399, 414, 414, 429, 429, 444, 444,
+  459, 459, 474, 474, 489, 489, 504, 415, 430, 430, 445, 445, 460, 460, 475,
+  475, 490, 490, 505, 431, 446, 446, 461, 461, 476, 476, 491, 491, 506, 447,
+  462, 462, 477, 477, 492, 492, 507, 463, 478, 478, 493, 493, 508, 479, 494,
+  494, 509, 495, 510, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_32x16_neighbors[513 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   0,   0,   1,   1,   1,   32,  32,  32,  2,   2,   2,
+  33,  33,  64,  64,  64,  3,   3,   3,   34,  34,  65,  65,  96,  96,  96,
+  4,   4,   4,   35,  35,  66,  66,  97,  97,  128, 128, 128, 5,   5,   5,
+  36,  36,  67,  67,  98,  98,  129, 129, 160, 160, 160, 6,   6,   6,   37,
+  37,  68,  68,  99,  99,  130, 130, 161, 161, 192, 192, 192, 7,   7,   7,
+  38,  38,  69,  69,  100, 100, 131, 131, 162, 162, 193, 193, 224, 224, 224,
+  8,   8,   8,   39,  39,  70,  70,  101, 101, 132, 132, 163, 163, 194, 194,
+  225, 225, 256, 256, 256, 9,   9,   9,   40,  40,  71,  71,  102, 102, 133,
+  133, 164, 164, 195, 195, 226, 226, 257, 257, 288, 288, 288, 10,  10,  10,
+  41,  41,  72,  72,  103, 103, 134, 134, 165, 165, 196, 196, 227, 227, 258,
+  258, 289, 289, 320, 320, 320, 11,  11,  11,  42,  42,  73,  73,  104, 104,
+  135, 135, 166, 166, 197, 197, 228, 228, 259, 259, 290, 290, 321, 321, 352,
+  352, 352, 12,  12,  12,  43,  43,  74,  74,  105, 105, 136, 136, 167, 167,
+  198, 198, 229, 229, 260, 260, 291, 291, 322, 322, 353, 353, 384, 384, 384,
+  13,  13,  13,  44,  44,  75,  75,  106, 106, 137, 137, 168, 168, 199, 199,
+  230, 230, 261, 261, 292, 292, 323, 323, 354, 354, 385, 385, 416, 416, 416,
+  14,  14,  14,  45,  45,  76,  76,  107, 107, 138, 138, 169, 169, 200, 200,
+  231, 231, 262, 262, 293, 293, 324, 324, 355, 355, 386, 386, 417, 417, 448,
+  448, 448, 15,  15,  15,  46,  46,  77,  77,  108, 108, 139, 139, 170, 170,
+  201, 201, 232, 232, 263, 263, 294, 294, 325, 325, 356, 356, 387, 387, 418,
+  418, 449, 449, 480, 16,  16,  16,  47,  47,  78,  78,  109, 109, 140, 140,
+  171, 171, 202, 202, 233, 233, 264, 264, 295, 295, 326, 326, 357, 357, 388,
+  388, 419, 419, 450, 450, 481, 17,  17,  17,  48,  48,  79,  79,  110, 110,
+  141, 141, 172, 172, 203, 203, 234, 234, 265, 265, 296, 296, 327, 327, 358,
+  358, 389, 389, 420, 420, 451, 451, 482, 18,  18,  18,  49,  49,  80,  80,
+  111, 111, 142, 142, 173, 173, 204, 204, 235, 235, 266, 266, 297, 297, 328,
+  328, 359, 359, 390, 390, 421, 421, 452, 452, 483, 19,  19,  19,  50,  50,
+  81,  81,  112, 112, 143, 143, 174, 174, 205, 205, 236, 236, 267, 267, 298,
+  298, 329, 329, 360, 360, 391, 391, 422, 422, 453, 453, 484, 20,  20,  20,
+  51,  51,  82,  82,  113, 113, 144, 144, 175, 175, 206, 206, 237, 237, 268,
+  268, 299, 299, 330, 330, 361, 361, 392, 392, 423, 423, 454, 454, 485, 21,
+  21,  21,  52,  52,  83,  83,  114, 114, 145, 145, 176, 176, 207, 207, 238,
+  238, 269, 269, 300, 300, 331, 331, 362, 362, 393, 393, 424, 424, 455, 455,
+  486, 22,  22,  22,  53,  53,  84,  84,  115, 115, 146, 146, 177, 177, 208,
+  208, 239, 239, 270, 270, 301, 301, 332, 332, 363, 363, 394, 394, 425, 425,
+  456, 456, 487, 23,  23,  23,  54,  54,  85,  85,  116, 116, 147, 147, 178,
+  178, 209, 209, 240, 240, 271, 271, 302, 302, 333, 333, 364, 364, 395, 395,
+  426, 426, 457, 457, 488, 24,  24,  24,  55,  55,  86,  86,  117, 117, 148,
+  148, 179, 179, 210, 210, 241, 241, 272, 272, 303, 303, 334, 334, 365, 365,
+  396, 396, 427, 427, 458, 458, 489, 25,  25,  25,  56,  56,  87,  87,  118,
+  118, 149, 149, 180, 180, 211, 211, 242, 242, 273, 273, 304, 304, 335, 335,
+  366, 366, 397, 397, 428, 428, 459, 459, 490, 26,  26,  26,  57,  57,  88,
+  88,  119, 119, 150, 150, 181, 181, 212, 212, 243, 243, 274, 274, 305, 305,
+  336, 336, 367, 367, 398, 398, 429, 429, 460, 460, 491, 27,  27,  27,  58,
+  58,  89,  89,  120, 120, 151, 151, 182, 182, 213, 213, 244, 244, 275, 275,
+  306, 306, 337, 337, 368, 368, 399, 399, 430, 430, 461, 461, 492, 28,  28,
+  28,  59,  59,  90,  90,  121, 121, 152, 152, 183, 183, 214, 214, 245, 245,
+  276, 276, 307, 307, 338, 338, 369, 369, 400, 400, 431, 431, 462, 462, 493,
+  29,  29,  29,  60,  60,  91,  91,  122, 122, 153, 153, 184, 184, 215, 215,
+  246, 246, 277, 277, 308, 308, 339, 339, 370, 370, 401, 401, 432, 432, 463,
+  463, 494, 30,  30,  30,  61,  61,  92,  92,  123, 123, 154, 154, 185, 185,
+  216, 216, 247, 247, 278, 278, 309, 309, 340, 340, 371, 371, 402, 402, 433,
+  433, 464, 464, 495, 31,  62,  62,  93,  93,  124, 124, 155, 155, 186, 186,
+  217, 217, 248, 248, 279, 279, 310, 310, 341, 341, 372, 372, 403, 403, 434,
+  434, 465, 465, 496, 63,  94,  94,  125, 125, 156, 156, 187, 187, 218, 218,
+  249, 249, 280, 280, 311, 311, 342, 342, 373, 373, 404, 404, 435, 435, 466,
+  466, 497, 95,  126, 126, 157, 157, 188, 188, 219, 219, 250, 250, 281, 281,
+  312, 312, 343, 343, 374, 374, 405, 405, 436, 436, 467, 467, 498, 127, 158,
+  158, 189, 189, 220, 220, 251, 251, 282, 282, 313, 313, 344, 344, 375, 375,
+  406, 406, 437, 437, 468, 468, 499, 159, 190, 190, 221, 221, 252, 252, 283,
+  283, 314, 314, 345, 345, 376, 376, 407, 407, 438, 438, 469, 469, 500, 191,
+  222, 222, 253, 253, 284, 284, 315, 315, 346, 346, 377, 377, 408, 408, 439,
+  439, 470, 470, 501, 223, 254, 254, 285, 285, 316, 316, 347, 347, 378, 378,
+  409, 409, 440, 440, 471, 471, 502, 255, 286, 286, 317, 317, 348, 348, 379,
+  379, 410, 410, 441, 441, 472, 472, 503, 287, 318, 318, 349, 349, 380, 380,
+  411, 411, 442, 442, 473, 473, 504, 319, 350, 350, 381, 381, 412, 412, 443,
+  443, 474, 474, 505, 351, 382, 382, 413, 413, 444, 444, 475, 475, 506, 383,
+  414, 414, 445, 445, 476, 476, 507, 415, 446, 446, 477, 477, 508, 447, 478,
+  478, 509, 479, 510, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_16x32_neighbors[513 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   16,  16,  32,  32,  48,  48,  64,  64,  80,  80,  96,
+  96,  112, 112, 128, 128, 144, 144, 160, 160, 176, 176, 192, 192, 208, 208,
+  224, 224, 240, 240, 256, 256, 272, 272, 288, 288, 304, 304, 320, 320, 336,
+  336, 352, 352, 368, 368, 384, 384, 400, 400, 416, 416, 432, 432, 448, 448,
+  464, 464, 480, 480, 0,   0,   1,   16,  17,  32,  33,  48,  49,  64,  65,
+  80,  81,  96,  97,  112, 113, 128, 129, 144, 145, 160, 161, 176, 177, 192,
+  193, 208, 209, 224, 225, 240, 241, 256, 257, 272, 273, 288, 289, 304, 305,
+  320, 321, 336, 337, 352, 353, 368, 369, 384, 385, 400, 401, 416, 417, 432,
+  433, 448, 449, 464, 465, 480, 481, 496, 1,   1,   2,   17,  18,  33,  34,
+  49,  50,  65,  66,  81,  82,  97,  98,  113, 114, 129, 130, 145, 146, 161,
+  162, 177, 178, 193, 194, 209, 210, 225, 226, 241, 242, 257, 258, 273, 274,
+  289, 290, 305, 306, 321, 322, 337, 338, 353, 354, 369, 370, 385, 386, 401,
+  402, 417, 418, 433, 434, 449, 450, 465, 466, 481, 482, 497, 2,   2,   3,
+  18,  19,  34,  35,  50,  51,  66,  67,  82,  83,  98,  99,  114, 115, 130,
+  131, 146, 147, 162, 163, 178, 179, 194, 195, 210, 211, 226, 227, 242, 243,
+  258, 259, 274, 275, 290, 291, 306, 307, 322, 323, 338, 339, 354, 355, 370,
+  371, 386, 387, 402, 403, 418, 419, 434, 435, 450, 451, 466, 467, 482, 483,
+  498, 3,   3,   4,   19,  20,  35,  36,  51,  52,  67,  68,  83,  84,  99,
+  100, 115, 116, 131, 132, 147, 148, 163, 164, 179, 180, 195, 196, 211, 212,
+  227, 228, 243, 244, 259, 260, 275, 276, 291, 292, 307, 308, 323, 324, 339,
+  340, 355, 356, 371, 372, 387, 388, 403, 404, 419, 420, 435, 436, 451, 452,
+  467, 468, 483, 484, 499, 4,   4,   5,   20,  21,  36,  37,  52,  53,  68,
+  69,  84,  85,  100, 101, 116, 117, 132, 133, 148, 149, 164, 165, 180, 181,
+  196, 197, 212, 213, 228, 229, 244, 245, 260, 261, 276, 277, 292, 293, 308,
+  309, 324, 325, 340, 341, 356, 357, 372, 373, 388, 389, 404, 405, 420, 421,
+  436, 437, 452, 453, 468, 469, 484, 485, 500, 5,   5,   6,   21,  22,  37,
+  38,  53,  54,  69,  70,  85,  86,  101, 102, 117, 118, 133, 134, 149, 150,
+  165, 166, 181, 182, 197, 198, 213, 214, 229, 230, 245, 246, 261, 262, 277,
+  278, 293, 294, 309, 310, 325, 326, 341, 342, 357, 358, 373, 374, 389, 390,
+  405, 406, 421, 422, 437, 438, 453, 454, 469, 470, 485, 486, 501, 6,   6,
+  7,   22,  23,  38,  39,  54,  55,  70,  71,  86,  87,  102, 103, 118, 119,
+  134, 135, 150, 151, 166, 167, 182, 183, 198, 199, 214, 215, 230, 231, 246,
+  247, 262, 263, 278, 279, 294, 295, 310, 311, 326, 327, 342, 343, 358, 359,
+  374, 375, 390, 391, 406, 407, 422, 423, 438, 439, 454, 455, 470, 471, 486,
+  487, 502, 7,   7,   8,   23,  24,  39,  40,  55,  56,  71,  72,  87,  88,
+  103, 104, 119, 120, 135, 136, 151, 152, 167, 168, 183, 184, 199, 200, 215,
+  216, 231, 232, 247, 248, 263, 264, 279, 280, 295, 296, 311, 312, 327, 328,
+  343, 344, 359, 360, 375, 376, 391, 392, 407, 408, 423, 424, 439, 440, 455,
+  456, 471, 472, 487, 488, 503, 8,   8,   9,   24,  25,  40,  41,  56,  57,
+  72,  73,  88,  89,  104, 105, 120, 121, 136, 137, 152, 153, 168, 169, 184,
+  185, 200, 201, 216, 217, 232, 233, 248, 249, 264, 265, 280, 281, 296, 297,
+  312, 313, 328, 329, 344, 345, 360, 361, 376, 377, 392, 393, 408, 409, 424,
+  425, 440, 441, 456, 457, 472, 473, 488, 489, 504, 9,   9,   10,  25,  26,
+  41,  42,  57,  58,  73,  74,  89,  90,  105, 106, 121, 122, 137, 138, 153,
+  154, 169, 170, 185, 186, 201, 202, 217, 218, 233, 234, 249, 250, 265, 266,
+  281, 282, 297, 298, 313, 314, 329, 330, 345, 346, 361, 362, 377, 378, 393,
+  394, 409, 410, 425, 426, 441, 442, 457, 458, 473, 474, 489, 490, 505, 10,
+  10,  11,  26,  27,  42,  43,  58,  59,  74,  75,  90,  91,  106, 107, 122,
+  123, 138, 139, 154, 155, 170, 171, 186, 187, 202, 203, 218, 219, 234, 235,
+  250, 251, 266, 267, 282, 283, 298, 299, 314, 315, 330, 331, 346, 347, 362,
+  363, 378, 379, 394, 395, 410, 411, 426, 427, 442, 443, 458, 459, 474, 475,
+  490, 491, 506, 11,  11,  12,  27,  28,  43,  44,  59,  60,  75,  76,  91,
+  92,  107, 108, 123, 124, 139, 140, 155, 156, 171, 172, 187, 188, 203, 204,
+  219, 220, 235, 236, 251, 252, 267, 268, 283, 284, 299, 300, 315, 316, 331,
+  332, 347, 348, 363, 364, 379, 380, 395, 396, 411, 412, 427, 428, 443, 444,
+  459, 460, 475, 476, 491, 492, 507, 12,  12,  13,  28,  29,  44,  45,  60,
+  61,  76,  77,  92,  93,  108, 109, 124, 125, 140, 141, 156, 157, 172, 173,
+  188, 189, 204, 205, 220, 221, 236, 237, 252, 253, 268, 269, 284, 285, 300,
+  301, 316, 317, 332, 333, 348, 349, 364, 365, 380, 381, 396, 397, 412, 413,
+  428, 429, 444, 445, 460, 461, 476, 477, 492, 493, 508, 13,  13,  14,  29,
+  30,  45,  46,  61,  62,  77,  78,  93,  94,  109, 110, 125, 126, 141, 142,
+  157, 158, 173, 174, 189, 190, 205, 206, 221, 222, 237, 238, 253, 254, 269,
+  270, 285, 286, 301, 302, 317, 318, 333, 334, 349, 350, 365, 366, 381, 382,
+  397, 398, 413, 414, 429, 430, 445, 446, 461, 462, 477, 478, 493, 494, 509,
+  14,  14,  15,  30,  31,  46,  47,  62,  63,  78,  79,  94,  95,  110, 111,
+  126, 127, 142, 143, 158, 159, 174, 175, 190, 191, 206, 207, 222, 223, 238,
+  239, 254, 255, 270, 271, 286, 287, 302, 303, 318, 319, 334, 335, 350, 351,
+  366, 367, 382, 383, 398, 399, 414, 415, 430, 431, 446, 447, 462, 463, 478,
+  479, 494, 495, 510, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_32x16_neighbors[513 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   32,  32,  64,  64,  96,  96,  128, 128, 160, 160, 192,
+  192, 224, 224, 256, 256, 288, 288, 320, 320, 352, 352, 384, 384, 416, 416,
+  448, 448, 0,   0,   1,   32,  33,  64,  65,  96,  97,  128, 129, 160, 161,
+  192, 193, 224, 225, 256, 257, 288, 289, 320, 321, 352, 353, 384, 385, 416,
+  417, 448, 449, 480, 1,   1,   2,   33,  34,  65,  66,  97,  98,  129, 130,
+  161, 162, 193, 194, 225, 226, 257, 258, 289, 290, 321, 322, 353, 354, 385,
+  386, 417, 418, 449, 450, 481, 2,   2,   3,   34,  35,  66,  67,  98,  99,
+  130, 131, 162, 163, 194, 195, 226, 227, 258, 259, 290, 291, 322, 323, 354,
+  355, 386, 387, 418, 419, 450, 451, 482, 3,   3,   4,   35,  36,  67,  68,
+  99,  100, 131, 132, 163, 164, 195, 196, 227, 228, 259, 260, 291, 292, 323,
+  324, 355, 356, 387, 388, 419, 420, 451, 452, 483, 4,   4,   5,   36,  37,
+  68,  69,  100, 101, 132, 133, 164, 165, 196, 197, 228, 229, 260, 261, 292,
+  293, 324, 325, 356, 357, 388, 389, 420, 421, 452, 453, 484, 5,   5,   6,
+  37,  38,  69,  70,  101, 102, 133, 134, 165, 166, 197, 198, 229, 230, 261,
+  262, 293, 294, 325, 326, 357, 358, 389, 390, 421, 422, 453, 454, 485, 6,
+  6,   7,   38,  39,  70,  71,  102, 103, 134, 135, 166, 167, 198, 199, 230,
+  231, 262, 263, 294, 295, 326, 327, 358, 359, 390, 391, 422, 423, 454, 455,
+  486, 7,   7,   8,   39,  40,  71,  72,  103, 104, 135, 136, 167, 168, 199,
+  200, 231, 232, 263, 264, 295, 296, 327, 328, 359, 360, 391, 392, 423, 424,
+  455, 456, 487, 8,   8,   9,   40,  41,  72,  73,  104, 105, 136, 137, 168,
+  169, 200, 201, 232, 233, 264, 265, 296, 297, 328, 329, 360, 361, 392, 393,
+  424, 425, 456, 457, 488, 9,   9,   10,  41,  42,  73,  74,  105, 106, 137,
+  138, 169, 170, 201, 202, 233, 234, 265, 266, 297, 298, 329, 330, 361, 362,
+  393, 394, 425, 426, 457, 458, 489, 10,  10,  11,  42,  43,  74,  75,  106,
+  107, 138, 139, 170, 171, 202, 203, 234, 235, 266, 267, 298, 299, 330, 331,
+  362, 363, 394, 395, 426, 427, 458, 459, 490, 11,  11,  12,  43,  44,  75,
+  76,  107, 108, 139, 140, 171, 172, 203, 204, 235, 236, 267, 268, 299, 300,
+  331, 332, 363, 364, 395, 396, 427, 428, 459, 460, 491, 12,  12,  13,  44,
+  45,  76,  77,  108, 109, 140, 141, 172, 173, 204, 205, 236, 237, 268, 269,
+  300, 301, 332, 333, 364, 365, 396, 397, 428, 429, 460, 461, 492, 13,  13,
+  14,  45,  46,  77,  78,  109, 110, 141, 142, 173, 174, 205, 206, 237, 238,
+  269, 270, 301, 302, 333, 334, 365, 366, 397, 398, 429, 430, 461, 462, 493,
+  14,  14,  15,  46,  47,  78,  79,  110, 111, 142, 143, 174, 175, 206, 207,
+  238, 239, 270, 271, 302, 303, 334, 335, 366, 367, 398, 399, 430, 431, 462,
+  463, 494, 15,  15,  16,  47,  48,  79,  80,  111, 112, 143, 144, 175, 176,
+  207, 208, 239, 240, 271, 272, 303, 304, 335, 336, 367, 368, 399, 400, 431,
+  432, 463, 464, 495, 16,  16,  17,  48,  49,  80,  81,  112, 113, 144, 145,
+  176, 177, 208, 209, 240, 241, 272, 273, 304, 305, 336, 337, 368, 369, 400,
+  401, 432, 433, 464, 465, 496, 17,  17,  18,  49,  50,  81,  82,  113, 114,
+  145, 146, 177, 178, 209, 210, 241, 242, 273, 274, 305, 306, 337, 338, 369,
+  370, 401, 402, 433, 434, 465, 466, 497, 18,  18,  19,  50,  51,  82,  83,
+  114, 115, 146, 147, 178, 179, 210, 211, 242, 243, 274, 275, 306, 307, 338,
+  339, 370, 371, 402, 403, 434, 435, 466, 467, 498, 19,  19,  20,  51,  52,
+  83,  84,  115, 116, 147, 148, 179, 180, 211, 212, 243, 244, 275, 276, 307,
+  308, 339, 340, 371, 372, 403, 404, 435, 436, 467, 468, 499, 20,  20,  21,
+  52,  53,  84,  85,  116, 117, 148, 149, 180, 181, 212, 213, 244, 245, 276,
+  277, 308, 309, 340, 341, 372, 373, 404, 405, 436, 437, 468, 469, 500, 21,
+  21,  22,  53,  54,  85,  86,  117, 118, 149, 150, 181, 182, 213, 214, 245,
+  246, 277, 278, 309, 310, 341, 342, 373, 374, 405, 406, 437, 438, 469, 470,
+  501, 22,  22,  23,  54,  55,  86,  87,  118, 119, 150, 151, 182, 183, 214,
+  215, 246, 247, 278, 279, 310, 311, 342, 343, 374, 375, 406, 407, 438, 439,
+  470, 471, 502, 23,  23,  24,  55,  56,  87,  88,  119, 120, 151, 152, 183,
+  184, 215, 216, 247, 248, 279, 280, 311, 312, 343, 344, 375, 376, 407, 408,
+  439, 440, 471, 472, 503, 24,  24,  25,  56,  57,  88,  89,  120, 121, 152,
+  153, 184, 185, 216, 217, 248, 249, 280, 281, 312, 313, 344, 345, 376, 377,
+  408, 409, 440, 441, 472, 473, 504, 25,  25,  26,  57,  58,  89,  90,  121,
+  122, 153, 154, 185, 186, 217, 218, 249, 250, 281, 282, 313, 314, 345, 346,
+  377, 378, 409, 410, 441, 442, 473, 474, 505, 26,  26,  27,  58,  59,  90,
+  91,  122, 123, 154, 155, 186, 187, 218, 219, 250, 251, 282, 283, 314, 315,
+  346, 347, 378, 379, 410, 411, 442, 443, 474, 475, 506, 27,  27,  28,  59,
+  60,  91,  92,  123, 124, 155, 156, 187, 188, 219, 220, 251, 252, 283, 284,
+  315, 316, 347, 348, 379, 380, 411, 412, 443, 444, 475, 476, 507, 28,  28,
+  29,  60,  61,  92,  93,  124, 125, 156, 157, 188, 189, 220, 221, 252, 253,
+  284, 285, 316, 317, 348, 349, 380, 381, 412, 413, 444, 445, 476, 477, 508,
+  29,  29,  30,  61,  62,  93,  94,  125, 126, 157, 158, 189, 190, 221, 222,
+  253, 254, 285, 286, 317, 318, 349, 350, 381, 382, 413, 414, 445, 446, 477,
+  478, 509, 30,  30,  31,  62,  63,  94,  95,  126, 127, 158, 159, 190, 191,
+  222, 223, 254, 255, 286, 287, 318, 319, 350, 351, 382, 383, 414, 415, 446,
+  447, 478, 479, 510, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_16x32_neighbors[513 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   1,   1,   2,   2,   3,   3,   4,   4,   5,   5,   6,
+  6,   7,   7,   8,   8,   9,   9,   10,  10,  11,  11,  12,  12,  13,  13,
+  14,  14,  0,   0,   1,   16,  2,   17,  3,   18,  4,   19,  5,   20,  6,
+  21,  7,   22,  8,   23,  9,   24,  10,  25,  11,  26,  12,  27,  13,  28,
+  14,  29,  15,  30,  16,  16,  17,  32,  18,  33,  19,  34,  20,  35,  21,
+  36,  22,  37,  23,  38,  24,  39,  25,  40,  26,  41,  27,  42,  28,  43,
+  29,  44,  30,  45,  31,  46,  32,  32,  33,  48,  34,  49,  35,  50,  36,
+  51,  37,  52,  38,  53,  39,  54,  40,  55,  41,  56,  42,  57,  43,  58,
+  44,  59,  45,  60,  46,  61,  47,  62,  48,  48,  49,  64,  50,  65,  51,
+  66,  52,  67,  53,  68,  54,  69,  55,  70,  56,  71,  57,  72,  58,  73,
+  59,  74,  60,  75,  61,  76,  62,  77,  63,  78,  64,  64,  65,  80,  66,
+  81,  67,  82,  68,  83,  69,  84,  70,  85,  71,  86,  72,  87,  73,  88,
+  74,  89,  75,  90,  76,  91,  77,  92,  78,  93,  79,  94,  80,  80,  81,
+  96,  82,  97,  83,  98,  84,  99,  85,  100, 86,  101, 87,  102, 88,  103,
+  89,  104, 90,  105, 91,  106, 92,  107, 93,  108, 94,  109, 95,  110, 96,
+  96,  97,  112, 98,  113, 99,  114, 100, 115, 101, 116, 102, 117, 103, 118,
+  104, 119, 105, 120, 106, 121, 107, 122, 108, 123, 109, 124, 110, 125, 111,
+  126, 112, 112, 113, 128, 114, 129, 115, 130, 116, 131, 117, 132, 118, 133,
+  119, 134, 120, 135, 121, 136, 122, 137, 123, 138, 124, 139, 125, 140, 126,
+  141, 127, 142, 128, 128, 129, 144, 130, 145, 131, 146, 132, 147, 133, 148,
+  134, 149, 135, 150, 136, 151, 137, 152, 138, 153, 139, 154, 140, 155, 141,
+  156, 142, 157, 143, 158, 144, 144, 145, 160, 146, 161, 147, 162, 148, 163,
+  149, 164, 150, 165, 151, 166, 152, 167, 153, 168, 154, 169, 155, 170, 156,
+  171, 157, 172, 158, 173, 159, 174, 160, 160, 161, 176, 162, 177, 163, 178,
+  164, 179, 165, 180, 166, 181, 167, 182, 168, 183, 169, 184, 170, 185, 171,
+  186, 172, 187, 173, 188, 174, 189, 175, 190, 176, 176, 177, 192, 178, 193,
+  179, 194, 180, 195, 181, 196, 182, 197, 183, 198, 184, 199, 185, 200, 186,
+  201, 187, 202, 188, 203, 189, 204, 190, 205, 191, 206, 192, 192, 193, 208,
+  194, 209, 195, 210, 196, 211, 197, 212, 198, 213, 199, 214, 200, 215, 201,
+  216, 202, 217, 203, 218, 204, 219, 205, 220, 206, 221, 207, 222, 208, 208,
+  209, 224, 210, 225, 211, 226, 212, 227, 213, 228, 214, 229, 215, 230, 216,
+  231, 217, 232, 218, 233, 219, 234, 220, 235, 221, 236, 222, 237, 223, 238,
+  224, 224, 225, 240, 226, 241, 227, 242, 228, 243, 229, 244, 230, 245, 231,
+  246, 232, 247, 233, 248, 234, 249, 235, 250, 236, 251, 237, 252, 238, 253,
+  239, 254, 240, 240, 241, 256, 242, 257, 243, 258, 244, 259, 245, 260, 246,
+  261, 247, 262, 248, 263, 249, 264, 250, 265, 251, 266, 252, 267, 253, 268,
+  254, 269, 255, 270, 256, 256, 257, 272, 258, 273, 259, 274, 260, 275, 261,
+  276, 262, 277, 263, 278, 264, 279, 265, 280, 266, 281, 267, 282, 268, 283,
+  269, 284, 270, 285, 271, 286, 272, 272, 273, 288, 274, 289, 275, 290, 276,
+  291, 277, 292, 278, 293, 279, 294, 280, 295, 281, 296, 282, 297, 283, 298,
+  284, 299, 285, 300, 286, 301, 287, 302, 288, 288, 289, 304, 290, 305, 291,
+  306, 292, 307, 293, 308, 294, 309, 295, 310, 296, 311, 297, 312, 298, 313,
+  299, 314, 300, 315, 301, 316, 302, 317, 303, 318, 304, 304, 305, 320, 306,
+  321, 307, 322, 308, 323, 309, 324, 310, 325, 311, 326, 312, 327, 313, 328,
+  314, 329, 315, 330, 316, 331, 317, 332, 318, 333, 319, 334, 320, 320, 321,
+  336, 322, 337, 323, 338, 324, 339, 325, 340, 326, 341, 327, 342, 328, 343,
+  329, 344, 330, 345, 331, 346, 332, 347, 333, 348, 334, 349, 335, 350, 336,
+  336, 337, 352, 338, 353, 339, 354, 340, 355, 341, 356, 342, 357, 343, 358,
+  344, 359, 345, 360, 346, 361, 347, 362, 348, 363, 349, 364, 350, 365, 351,
+  366, 352, 352, 353, 368, 354, 369, 355, 370, 356, 371, 357, 372, 358, 373,
+  359, 374, 360, 375, 361, 376, 362, 377, 363, 378, 364, 379, 365, 380, 366,
+  381, 367, 382, 368, 368, 369, 384, 370, 385, 371, 386, 372, 387, 373, 388,
+  374, 389, 375, 390, 376, 391, 377, 392, 378, 393, 379, 394, 380, 395, 381,
+  396, 382, 397, 383, 398, 384, 384, 385, 400, 386, 401, 387, 402, 388, 403,
+  389, 404, 390, 405, 391, 406, 392, 407, 393, 408, 394, 409, 395, 410, 396,
+  411, 397, 412, 398, 413, 399, 414, 400, 400, 401, 416, 402, 417, 403, 418,
+  404, 419, 405, 420, 406, 421, 407, 422, 408, 423, 409, 424, 410, 425, 411,
+  426, 412, 427, 413, 428, 414, 429, 415, 430, 416, 416, 417, 432, 418, 433,
+  419, 434, 420, 435, 421, 436, 422, 437, 423, 438, 424, 439, 425, 440, 426,
+  441, 427, 442, 428, 443, 429, 444, 430, 445, 431, 446, 432, 432, 433, 448,
+  434, 449, 435, 450, 436, 451, 437, 452, 438, 453, 439, 454, 440, 455, 441,
+  456, 442, 457, 443, 458, 444, 459, 445, 460, 446, 461, 447, 462, 448, 448,
+  449, 464, 450, 465, 451, 466, 452, 467, 453, 468, 454, 469, 455, 470, 456,
+  471, 457, 472, 458, 473, 459, 474, 460, 475, 461, 476, 462, 477, 463, 478,
+  464, 464, 465, 480, 466, 481, 467, 482, 468, 483, 469, 484, 470, 485, 471,
+  486, 472, 487, 473, 488, 474, 489, 475, 490, 476, 491, 477, 492, 478, 493,
+  479, 494, 480, 480, 481, 496, 482, 497, 483, 498, 484, 499, 485, 500, 486,
+  501, 487, 502, 488, 503, 489, 504, 490, 505, 491, 506, 492, 507, 493, 508,
+  494, 509, 495, 510, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_32x16_neighbors[513 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   1,   1,   2,   2,   3,   3,   4,   4,   5,   5,   6,
+  6,   7,   7,   8,   8,   9,   9,   10,  10,  11,  11,  12,  12,  13,  13,
+  14,  14,  15,  15,  16,  16,  17,  17,  18,  18,  19,  19,  20,  20,  21,
+  21,  22,  22,  23,  23,  24,  24,  25,  25,  26,  26,  27,  27,  28,  28,
+  29,  29,  30,  30,  0,   0,   1,   32,  2,   33,  3,   34,  4,   35,  5,
+  36,  6,   37,  7,   38,  8,   39,  9,   40,  10,  41,  11,  42,  12,  43,
+  13,  44,  14,  45,  15,  46,  16,  47,  17,  48,  18,  49,  19,  50,  20,
+  51,  21,  52,  22,  53,  23,  54,  24,  55,  25,  56,  26,  57,  27,  58,
+  28,  59,  29,  60,  30,  61,  31,  62,  32,  32,  33,  64,  34,  65,  35,
+  66,  36,  67,  37,  68,  38,  69,  39,  70,  40,  71,  41,  72,  42,  73,
+  43,  74,  44,  75,  45,  76,  46,  77,  47,  78,  48,  79,  49,  80,  50,
+  81,  51,  82,  52,  83,  53,  84,  54,  85,  55,  86,  56,  87,  57,  88,
+  58,  89,  59,  90,  60,  91,  61,  92,  62,  93,  63,  94,  64,  64,  65,
+  96,  66,  97,  67,  98,  68,  99,  69,  100, 70,  101, 71,  102, 72,  103,
+  73,  104, 74,  105, 75,  106, 76,  107, 77,  108, 78,  109, 79,  110, 80,
+  111, 81,  112, 82,  113, 83,  114, 84,  115, 85,  116, 86,  117, 87,  118,
+  88,  119, 89,  120, 90,  121, 91,  122, 92,  123, 93,  124, 94,  125, 95,
+  126, 96,  96,  97,  128, 98,  129, 99,  130, 100, 131, 101, 132, 102, 133,
+  103, 134, 104, 135, 105, 136, 106, 137, 107, 138, 108, 139, 109, 140, 110,
+  141, 111, 142, 112, 143, 113, 144, 114, 145, 115, 146, 116, 147, 117, 148,
+  118, 149, 119, 150, 120, 151, 121, 152, 122, 153, 123, 154, 124, 155, 125,
+  156, 126, 157, 127, 158, 128, 128, 129, 160, 130, 161, 131, 162, 132, 163,
+  133, 164, 134, 165, 135, 166, 136, 167, 137, 168, 138, 169, 139, 170, 140,
+  171, 141, 172, 142, 173, 143, 174, 144, 175, 145, 176, 146, 177, 147, 178,
+  148, 179, 149, 180, 150, 181, 151, 182, 152, 183, 153, 184, 154, 185, 155,
+  186, 156, 187, 157, 188, 158, 189, 159, 190, 160, 160, 161, 192, 162, 193,
+  163, 194, 164, 195, 165, 196, 166, 197, 167, 198, 168, 199, 169, 200, 170,
+  201, 171, 202, 172, 203, 173, 204, 174, 205, 175, 206, 176, 207, 177, 208,
+  178, 209, 179, 210, 180, 211, 181, 212, 182, 213, 183, 214, 184, 215, 185,
+  216, 186, 217, 187, 218, 188, 219, 189, 220, 190, 221, 191, 222, 192, 192,
+  193, 224, 194, 225, 195, 226, 196, 227, 197, 228, 198, 229, 199, 230, 200,
+  231, 201, 232, 202, 233, 203, 234, 204, 235, 205, 236, 206, 237, 207, 238,
+  208, 239, 209, 240, 210, 241, 211, 242, 212, 243, 213, 244, 214, 245, 215,
+  246, 216, 247, 217, 248, 218, 249, 219, 250, 220, 251, 221, 252, 222, 253,
+  223, 254, 224, 224, 225, 256, 226, 257, 227, 258, 228, 259, 229, 260, 230,
+  261, 231, 262, 232, 263, 233, 264, 234, 265, 235, 266, 236, 267, 237, 268,
+  238, 269, 239, 270, 240, 271, 241, 272, 242, 273, 243, 274, 244, 275, 245,
+  276, 246, 277, 247, 278, 248, 279, 249, 280, 250, 281, 251, 282, 252, 283,
+  253, 284, 254, 285, 255, 286, 256, 256, 257, 288, 258, 289, 259, 290, 260,
+  291, 261, 292, 262, 293, 263, 294, 264, 295, 265, 296, 266, 297, 267, 298,
+  268, 299, 269, 300, 270, 301, 271, 302, 272, 303, 273, 304, 274, 305, 275,
+  306, 276, 307, 277, 308, 278, 309, 279, 310, 280, 311, 281, 312, 282, 313,
+  283, 314, 284, 315, 285, 316, 286, 317, 287, 318, 288, 288, 289, 320, 290,
+  321, 291, 322, 292, 323, 293, 324, 294, 325, 295, 326, 296, 327, 297, 328,
+  298, 329, 299, 330, 300, 331, 301, 332, 302, 333, 303, 334, 304, 335, 305,
+  336, 306, 337, 307, 338, 308, 339, 309, 340, 310, 341, 311, 342, 312, 343,
+  313, 344, 314, 345, 315, 346, 316, 347, 317, 348, 318, 349, 319, 350, 320,
+  320, 321, 352, 322, 353, 323, 354, 324, 355, 325, 356, 326, 357, 327, 358,
+  328, 359, 329, 360, 330, 361, 331, 362, 332, 363, 333, 364, 334, 365, 335,
+  366, 336, 367, 337, 368, 338, 369, 339, 370, 340, 371, 341, 372, 342, 373,
+  343, 374, 344, 375, 345, 376, 346, 377, 347, 378, 348, 379, 349, 380, 350,
+  381, 351, 382, 352, 352, 353, 384, 354, 385, 355, 386, 356, 387, 357, 388,
+  358, 389, 359, 390, 360, 391, 361, 392, 362, 393, 363, 394, 364, 395, 365,
+  396, 366, 397, 367, 398, 368, 399, 369, 400, 370, 401, 371, 402, 372, 403,
+  373, 404, 374, 405, 375, 406, 376, 407, 377, 408, 378, 409, 379, 410, 380,
+  411, 381, 412, 382, 413, 383, 414, 384, 384, 385, 416, 386, 417, 387, 418,
+  388, 419, 389, 420, 390, 421, 391, 422, 392, 423, 393, 424, 394, 425, 395,
+  426, 396, 427, 397, 428, 398, 429, 399, 430, 400, 431, 401, 432, 402, 433,
+  403, 434, 404, 435, 405, 436, 406, 437, 407, 438, 408, 439, 409, 440, 410,
+  441, 411, 442, 412, 443, 413, 444, 414, 445, 415, 446, 416, 416, 417, 448,
+  418, 449, 419, 450, 420, 451, 421, 452, 422, 453, 423, 454, 424, 455, 425,
+  456, 426, 457, 427, 458, 428, 459, 429, 460, 430, 461, 431, 462, 432, 463,
+  433, 464, 434, 465, 435, 466, 436, 467, 437, 468, 438, 469, 439, 470, 440,
+  471, 441, 472, 442, 473, 443, 474, 444, 475, 445, 476, 446, 477, 447, 478,
+  448, 448, 449, 480, 450, 481, 451, 482, 452, 483, 453, 484, 454, 485, 455,
+  486, 456, 487, 457, 488, 458, 489, 459, 490, 460, 491, 461, 492, 462, 493,
+  463, 494, 464, 495, 465, 496, 466, 497, 467, 498, 468, 499, 469, 500, 470,
+  501, 471, 502, 472, 503, 473, 504, 474, 505, 475, 506, 476, 507, 477, 508,
+  478, 509, 479, 510, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_16x16_neighbors[257 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   16,  16,  32,  32,  48,  48,  64,  64,  80,  80,  96,
+  96,  112, 112, 128, 128, 144, 144, 160, 160, 176, 176, 192, 192, 208, 208,
+  224, 224, 0,   0,   1,   16,  17,  32,  33,  48,  49,  64,  65,  80,  81,
+  96,  97,  112, 113, 128, 129, 144, 145, 160, 161, 176, 177, 192, 193, 208,
+  209, 224, 225, 240, 1,   1,   2,   17,  18,  33,  34,  49,  50,  65,  66,
+  81,  82,  97,  98,  113, 114, 129, 130, 145, 146, 161, 162, 177, 178, 193,
+  194, 209, 210, 225, 226, 241, 2,   2,   3,   18,  19,  34,  35,  50,  51,
+  66,  67,  82,  83,  98,  99,  114, 115, 130, 131, 146, 147, 162, 163, 178,
+  179, 194, 195, 210, 211, 226, 227, 242, 3,   3,   4,   19,  20,  35,  36,
+  51,  52,  67,  68,  83,  84,  99,  100, 115, 116, 131, 132, 147, 148, 163,
+  164, 179, 180, 195, 196, 211, 212, 227, 228, 243, 4,   4,   5,   20,  21,
+  36,  37,  52,  53,  68,  69,  84,  85,  100, 101, 116, 117, 132, 133, 148,
+  149, 164, 165, 180, 181, 196, 197, 212, 213, 228, 229, 244, 5,   5,   6,
+  21,  22,  37,  38,  53,  54,  69,  70,  85,  86,  101, 102, 117, 118, 133,
+  134, 149, 150, 165, 166, 181, 182, 197, 198, 213, 214, 229, 230, 245, 6,
+  6,   7,   22,  23,  38,  39,  54,  55,  70,  71,  86,  87,  102, 103, 118,
+  119, 134, 135, 150, 151, 166, 167, 182, 183, 198, 199, 214, 215, 230, 231,
+  246, 7,   7,   8,   23,  24,  39,  40,  55,  56,  71,  72,  87,  88,  103,
+  104, 119, 120, 135, 136, 151, 152, 167, 168, 183, 184, 199, 200, 215, 216,
+  231, 232, 247, 8,   8,   9,   24,  25,  40,  41,  56,  57,  72,  73,  88,
+  89,  104, 105, 120, 121, 136, 137, 152, 153, 168, 169, 184, 185, 200, 201,
+  216, 217, 232, 233, 248, 9,   9,   10,  25,  26,  41,  42,  57,  58,  73,
+  74,  89,  90,  105, 106, 121, 122, 137, 138, 153, 154, 169, 170, 185, 186,
+  201, 202, 217, 218, 233, 234, 249, 10,  10,  11,  26,  27,  42,  43,  58,
+  59,  74,  75,  90,  91,  106, 107, 122, 123, 138, 139, 154, 155, 170, 171,
+  186, 187, 202, 203, 218, 219, 234, 235, 250, 11,  11,  12,  27,  28,  43,
+  44,  59,  60,  75,  76,  91,  92,  107, 108, 123, 124, 139, 140, 155, 156,
+  171, 172, 187, 188, 203, 204, 219, 220, 235, 236, 251, 12,  12,  13,  28,
+  29,  44,  45,  60,  61,  76,  77,  92,  93,  108, 109, 124, 125, 140, 141,
+  156, 157, 172, 173, 188, 189, 204, 205, 220, 221, 236, 237, 252, 13,  13,
+  14,  29,  30,  45,  46,  61,  62,  77,  78,  93,  94,  109, 110, 125, 126,
+  141, 142, 157, 158, 173, 174, 189, 190, 205, 206, 221, 222, 237, 238, 253,
+  14,  14,  15,  30,  31,  46,  47,  62,  63,  78,  79,  94,  95,  110, 111,
+  126, 127, 142, 143, 158, 159, 174, 175, 190, 191, 206, 207, 222, 223, 238,
+  239, 254, 0,   0,
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_16x16_neighbors[257 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   1,   1,   2,   2,   3,   3,   4,   4,   5,   5,   6,
+  6,   7,   7,   8,   8,   9,   9,   10,  10,  11,  11,  12,  12,  13,  13,
+  14,  14,  0,   0,   1,   16,  2,   17,  3,   18,  4,   19,  5,   20,  6,
+  21,  7,   22,  8,   23,  9,   24,  10,  25,  11,  26,  12,  27,  13,  28,
+  14,  29,  15,  30,  16,  16,  17,  32,  18,  33,  19,  34,  20,  35,  21,
+  36,  22,  37,  23,  38,  24,  39,  25,  40,  26,  41,  27,  42,  28,  43,
+  29,  44,  30,  45,  31,  46,  32,  32,  33,  48,  34,  49,  35,  50,  36,
+  51,  37,  52,  38,  53,  39,  54,  40,  55,  41,  56,  42,  57,  43,  58,
+  44,  59,  45,  60,  46,  61,  47,  62,  48,  48,  49,  64,  50,  65,  51,
+  66,  52,  67,  53,  68,  54,  69,  55,  70,  56,  71,  57,  72,  58,  73,
+  59,  74,  60,  75,  61,  76,  62,  77,  63,  78,  64,  64,  65,  80,  66,
+  81,  67,  82,  68,  83,  69,  84,  70,  85,  71,  86,  72,  87,  73,  88,
+  74,  89,  75,  90,  76,  91,  77,  92,  78,  93,  79,  94,  80,  80,  81,
+  96,  82,  97,  83,  98,  84,  99,  85,  100, 86,  101, 87,  102, 88,  103,
+  89,  104, 90,  105, 91,  106, 92,  107, 93,  108, 94,  109, 95,  110, 96,
+  96,  97,  112, 98,  113, 99,  114, 100, 115, 101, 116, 102, 117, 103, 118,
+  104, 119, 105, 120, 106, 121, 107, 122, 108, 123, 109, 124, 110, 125, 111,
+  126, 112, 112, 113, 128, 114, 129, 115, 130, 116, 131, 117, 132, 118, 133,
+  119, 134, 120, 135, 121, 136, 122, 137, 123, 138, 124, 139, 125, 140, 126,
+  141, 127, 142, 128, 128, 129, 144, 130, 145, 131, 146, 132, 147, 133, 148,
+  134, 149, 135, 150, 136, 151, 137, 152, 138, 153, 139, 154, 140, 155, 141,
+  156, 142, 157, 143, 158, 144, 144, 145, 160, 146, 161, 147, 162, 148, 163,
+  149, 164, 150, 165, 151, 166, 152, 167, 153, 168, 154, 169, 155, 170, 156,
+  171, 157, 172, 158, 173, 159, 174, 160, 160, 161, 176, 162, 177, 163, 178,
+  164, 179, 165, 180, 166, 181, 167, 182, 168, 183, 169, 184, 170, 185, 171,
+  186, 172, 187, 173, 188, 174, 189, 175, 190, 176, 176, 177, 192, 178, 193,
+  179, 194, 180, 195, 181, 196, 182, 197, 183, 198, 184, 199, 185, 200, 186,
+  201, 187, 202, 188, 203, 189, 204, 190, 205, 191, 206, 192, 192, 193, 208,
+  194, 209, 195, 210, 196, 211, 197, 212, 198, 213, 199, 214, 200, 215, 201,
+  216, 202, 217, 203, 218, 204, 219, 205, 220, 206, 221, 207, 222, 208, 208,
+  209, 224, 210, 225, 211, 226, 212, 227, 213, 228, 214, 229, 215, 230, 216,
+  231, 217, 232, 218, 233, 219, 234, 220, 235, 221, 236, 222, 237, 223, 238,
+  224, 224, 225, 240, 226, 241, 227, 242, 228, 243, 229, 244, 230, 245, 231,
+  246, 232, 247, 233, 248, 234, 249, 235, 250, 236, 251, 237, 252, 238, 253,
+  239, 254, 0,   0,
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_16x16_neighbors[257 * MAX_NEIGHBORS]) = {
+  0,   0,   0,   0,   0,   0,   16,  16,  1,   16,  1,   1,   2,   2,   2,
+  17,  17,  32,  32,  32,  48,  48,  33,  48,  18,  33,  3,   18,  3,   3,
+  4,   4,   4,   19,  19,  34,  34,  49,  49,  64,  64,  64,  80,  80,  65,
+  80,  50,  65,  35,  50,  20,  35,  5,   20,  5,   5,   6,   6,   6,   21,
+  21,  36,  36,  51,  51,  66,  66,  81,  81,  96,  96,  96,  112, 112, 97,
+  112, 82,  97,  67,  82,  52,  67,  37,  52,  22,  37,  7,   22,  7,   7,
+  8,   8,   8,   23,  23,  38,  38,  53,  53,  68,  68,  83,  83,  98,  98,
+  113, 113, 128, 128, 128, 144, 144, 129, 144, 114, 129, 99,  114, 84,  99,
+  69,  84,  54,  69,  39,  54,  24,  39,  9,   24,  9,   9,   10,  10,  10,
+  25,  25,  40,  40,  55,  55,  70,  70,  85,  85,  100, 100, 115, 115, 130,
+  130, 145, 145, 160, 160, 160, 176, 176, 161, 176, 146, 161, 131, 146, 116,
+  131, 101, 116, 86,  101, 71,  86,  56,  71,  41,  56,  26,  41,  11,  26,
+  11,  11,  12,  12,  12,  27,  27,  42,  42,  57,  57,  72,  72,  87,  87,
+  102, 102, 117, 117, 132, 132, 147, 147, 162, 162, 177, 177, 192, 192, 192,
+  208, 208, 193, 208, 178, 193, 163, 178, 148, 163, 133, 148, 118, 133, 103,
+  118, 88,  103, 73,  88,  58,  73,  43,  58,  28,  43,  13,  28,  13,  13,
+  14,  14,  14,  29,  29,  44,  44,  59,  59,  74,  74,  89,  89,  104, 104,
+  119, 119, 134, 134, 149, 149, 164, 164, 179, 179, 194, 194, 209, 209, 224,
+  224, 224, 225, 240, 210, 225, 195, 210, 180, 195, 165, 180, 150, 165, 135,
+  150, 120, 135, 105, 120, 90,  105, 75,  90,  60,  75,  45,  60,  30,  45,
+  15,  30,  31,  46,  46,  61,  61,  76,  76,  91,  91,  106, 106, 121, 121,
+  136, 136, 151, 151, 166, 166, 181, 181, 196, 196, 211, 211, 226, 226, 241,
+  227, 242, 212, 227, 197, 212, 182, 197, 167, 182, 152, 167, 137, 152, 122,
+  137, 107, 122, 92,  107, 77,  92,  62,  77,  47,  62,  63,  78,  78,  93,
+  93,  108, 108, 123, 123, 138, 138, 153, 153, 168, 168, 183, 183, 198, 198,
+  213, 213, 228, 228, 243, 229, 244, 214, 229, 199, 214, 184, 199, 169, 184,
+  154, 169, 139, 154, 124, 139, 109, 124, 94,  109, 79,  94,  95,  110, 110,
+  125, 125, 140, 140, 155, 155, 170, 170, 185, 185, 200, 200, 215, 215, 230,
+  230, 245, 231, 246, 216, 231, 201, 216, 186, 201, 171, 186, 156, 171, 141,
+  156, 126, 141, 111, 126, 127, 142, 142, 157, 157, 172, 172, 187, 187, 202,
+  202, 217, 217, 232, 232, 247, 233, 248, 218, 233, 203, 218, 188, 203, 173,
+  188, 158, 173, 143, 158, 159, 174, 174, 189, 189, 204, 204, 219, 219, 234,
+  234, 249, 235, 250, 220, 235, 205, 220, 190, 205, 175, 190, 191, 206, 206,
+  221, 221, 236, 236, 251, 237, 252, 222, 237, 207, 222, 223, 238, 238, 253,
+  239, 254, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mcol_scan_32x32_neighbors[1025 * MAX_NEIGHBORS]) = {
+  0,   0,    0,   0,    32,  32,   64,  64,   96,  96,   128, 128,  160, 160,
+  192, 192,  224, 224,  256, 256,  288, 288,  320, 320,  352, 352,  384, 384,
+  416, 416,  448, 448,  480, 480,  512, 512,  544, 544,  576, 576,  608, 608,
+  640, 640,  672, 672,  704, 704,  736, 736,  768, 768,  800, 800,  832, 832,
+  864, 864,  896, 896,  928, 928,  960, 960,  0,   0,    1,   32,   33,  64,
+  65,  96,   97,  128,  129, 160,  161, 192,  193, 224,  225, 256,  257, 288,
+  289, 320,  321, 352,  353, 384,  385, 416,  417, 448,  449, 480,  481, 512,
+  513, 544,  545, 576,  577, 608,  609, 640,  641, 672,  673, 704,  705, 736,
+  737, 768,  769, 800,  801, 832,  833, 864,  865, 896,  897, 928,  929, 960,
+  961, 992,  1,   1,    2,   33,   34,  65,   66,  97,   98,  129,  130, 161,
+  162, 193,  194, 225,  226, 257,  258, 289,  290, 321,  322, 353,  354, 385,
+  386, 417,  418, 449,  450, 481,  482, 513,  514, 545,  546, 577,  578, 609,
+  610, 641,  642, 673,  674, 705,  706, 737,  738, 769,  770, 801,  802, 833,
+  834, 865,  866, 897,  898, 929,  930, 961,  962, 993,  2,   2,    3,   34,
+  35,  66,   67,  98,   99,  130,  131, 162,  163, 194,  195, 226,  227, 258,
+  259, 290,  291, 322,  323, 354,  355, 386,  387, 418,  419, 450,  451, 482,
+  483, 514,  515, 546,  547, 578,  579, 610,  611, 642,  643, 674,  675, 706,
+  707, 738,  739, 770,  771, 802,  803, 834,  835, 866,  867, 898,  899, 930,
+  931, 962,  963, 994,  3,   3,    4,   35,   36,  67,   68,  99,   100, 131,
+  132, 163,  164, 195,  196, 227,  228, 259,  260, 291,  292, 323,  324, 355,
+  356, 387,  388, 419,  420, 451,  452, 483,  484, 515,  516, 547,  548, 579,
+  580, 611,  612, 643,  644, 675,  676, 707,  708, 739,  740, 771,  772, 803,
+  804, 835,  836, 867,  868, 899,  900, 931,  932, 963,  964, 995,  4,   4,
+  5,   36,   37,  68,   69,  100,  101, 132,  133, 164,  165, 196,  197, 228,
+  229, 260,  261, 292,  293, 324,  325, 356,  357, 388,  389, 420,  421, 452,
+  453, 484,  485, 516,  517, 548,  549, 580,  581, 612,  613, 644,  645, 676,
+  677, 708,  709, 740,  741, 772,  773, 804,  805, 836,  837, 868,  869, 900,
+  901, 932,  933, 964,  965, 996,  5,   5,    6,   37,   38,  69,   70,  101,
+  102, 133,  134, 165,  166, 197,  198, 229,  230, 261,  262, 293,  294, 325,
+  326, 357,  358, 389,  390, 421,  422, 453,  454, 485,  486, 517,  518, 549,
+  550, 581,  582, 613,  614, 645,  646, 677,  678, 709,  710, 741,  742, 773,
+  774, 805,  806, 837,  838, 869,  870, 901,  902, 933,  934, 965,  966, 997,
+  6,   6,    7,   38,   39,  70,   71,  102,  103, 134,  135, 166,  167, 198,
+  199, 230,  231, 262,  263, 294,  295, 326,  327, 358,  359, 390,  391, 422,
+  423, 454,  455, 486,  487, 518,  519, 550,  551, 582,  583, 614,  615, 646,
+  647, 678,  679, 710,  711, 742,  743, 774,  775, 806,  807, 838,  839, 870,
+  871, 902,  903, 934,  935, 966,  967, 998,  7,   7,    8,   39,   40,  71,
+  72,  103,  104, 135,  136, 167,  168, 199,  200, 231,  232, 263,  264, 295,
+  296, 327,  328, 359,  360, 391,  392, 423,  424, 455,  456, 487,  488, 519,
+  520, 551,  552, 583,  584, 615,  616, 647,  648, 679,  680, 711,  712, 743,
+  744, 775,  776, 807,  808, 839,  840, 871,  872, 903,  904, 935,  936, 967,
+  968, 999,  8,   8,    9,   40,   41,  72,   73,  104,  105, 136,  137, 168,
+  169, 200,  201, 232,  233, 264,  265, 296,  297, 328,  329, 360,  361, 392,
+  393, 424,  425, 456,  457, 488,  489, 520,  521, 552,  553, 584,  585, 616,
+  617, 648,  649, 680,  681, 712,  713, 744,  745, 776,  777, 808,  809, 840,
+  841, 872,  873, 904,  905, 936,  937, 968,  969, 1000, 9,   9,    10,  41,
+  42,  73,   74,  105,  106, 137,  138, 169,  170, 201,  202, 233,  234, 265,
+  266, 297,  298, 329,  330, 361,  362, 393,  394, 425,  426, 457,  458, 489,
+  490, 521,  522, 553,  554, 585,  586, 617,  618, 649,  650, 681,  682, 713,
+  714, 745,  746, 777,  778, 809,  810, 841,  842, 873,  874, 905,  906, 937,
+  938, 969,  970, 1001, 10,  10,   11,  42,   43,  74,   75,  106,  107, 138,
+  139, 170,  171, 202,  203, 234,  235, 266,  267, 298,  299, 330,  331, 362,
+  363, 394,  395, 426,  427, 458,  459, 490,  491, 522,  523, 554,  555, 586,
+  587, 618,  619, 650,  651, 682,  683, 714,  715, 746,  747, 778,  779, 810,
+  811, 842,  843, 874,  875, 906,  907, 938,  939, 970,  971, 1002, 11,  11,
+  12,  43,   44,  75,   76,  107,  108, 139,  140, 171,  172, 203,  204, 235,
+  236, 267,  268, 299,  300, 331,  332, 363,  364, 395,  396, 427,  428, 459,
+  460, 491,  492, 523,  524, 555,  556, 587,  588, 619,  620, 651,  652, 683,
+  684, 715,  716, 747,  748, 779,  780, 811,  812, 843,  844, 875,  876, 907,
+  908, 939,  940, 971,  972, 1003, 12,  12,   13,  44,   45,  76,   77,  108,
+  109, 140,  141, 172,  173, 204,  205, 236,  237, 268,  269, 300,  301, 332,
+  333, 364,  365, 396,  397, 428,  429, 460,  461, 492,  493, 524,  525, 556,
+  557, 588,  589, 620,  621, 652,  653, 684,  685, 716,  717, 748,  749, 780,
+  781, 812,  813, 844,  845, 876,  877, 908,  909, 940,  941, 972,  973, 1004,
+  13,  13,   14,  45,   46,  77,   78,  109,  110, 141,  142, 173,  174, 205,
+  206, 237,  238, 269,  270, 301,  302, 333,  334, 365,  366, 397,  398, 429,
+  430, 461,  462, 493,  494, 525,  526, 557,  558, 589,  590, 621,  622, 653,
+  654, 685,  686, 717,  718, 749,  750, 781,  782, 813,  814, 845,  846, 877,
+  878, 909,  910, 941,  942, 973,  974, 1005, 14,  14,   15,  46,   47,  78,
+  79,  110,  111, 142,  143, 174,  175, 206,  207, 238,  239, 270,  271, 302,
+  303, 334,  335, 366,  367, 398,  399, 430,  431, 462,  463, 494,  495, 526,
+  527, 558,  559, 590,  591, 622,  623, 654,  655, 686,  687, 718,  719, 750,
+  751, 782,  783, 814,  815, 846,  847, 878,  879, 910,  911, 942,  943, 974,
+  975, 1006, 15,  15,   16,  47,   48,  79,   80,  111,  112, 143,  144, 175,
+  176, 207,  208, 239,  240, 271,  272, 303,  304, 335,  336, 367,  368, 399,
+  400, 431,  432, 463,  464, 495,  496, 527,  528, 559,  560, 591,  592, 623,
+  624, 655,  656, 687,  688, 719,  720, 751,  752, 783,  784, 815,  816, 847,
+  848, 879,  880, 911,  912, 943,  944, 975,  976, 1007, 16,  16,   17,  48,
+  49,  80,   81,  112,  113, 144,  145, 176,  177, 208,  209, 240,  241, 272,
+  273, 304,  305, 336,  337, 368,  369, 400,  401, 432,  433, 464,  465, 496,
+  497, 528,  529, 560,  561, 592,  593, 624,  625, 656,  657, 688,  689, 720,
+  721, 752,  753, 784,  785, 816,  817, 848,  849, 880,  881, 912,  913, 944,
+  945, 976,  977, 1008, 17,  17,   18,  49,   50,  81,   82,  113,  114, 145,
+  146, 177,  178, 209,  210, 241,  242, 273,  274, 305,  306, 337,  338, 369,
+  370, 401,  402, 433,  434, 465,  466, 497,  498, 529,  530, 561,  562, 593,
+  594, 625,  626, 657,  658, 689,  690, 721,  722, 753,  754, 785,  786, 817,
+  818, 849,  850, 881,  882, 913,  914, 945,  946, 977,  978, 1009, 18,  18,
+  19,  50,   51,  82,   83,  114,  115, 146,  147, 178,  179, 210,  211, 242,
+  243, 274,  275, 306,  307, 338,  339, 370,  371, 402,  403, 434,  435, 466,
+  467, 498,  499, 530,  531, 562,  563, 594,  595, 626,  627, 658,  659, 690,
+  691, 722,  723, 754,  755, 786,  787, 818,  819, 850,  851, 882,  883, 914,
+  915, 946,  947, 978,  979, 1010, 19,  19,   20,  51,   52,  83,   84,  115,
+  116, 147,  148, 179,  180, 211,  212, 243,  244, 275,  276, 307,  308, 339,
+  340, 371,  372, 403,  404, 435,  436, 467,  468, 499,  500, 531,  532, 563,
+  564, 595,  596, 627,  628, 659,  660, 691,  692, 723,  724, 755,  756, 787,
+  788, 819,  820, 851,  852, 883,  884, 915,  916, 947,  948, 979,  980, 1011,
+  20,  20,   21,  52,   53,  84,   85,  116,  117, 148,  149, 180,  181, 212,
+  213, 244,  245, 276,  277, 308,  309, 340,  341, 372,  373, 404,  405, 436,
+  437, 468,  469, 500,  501, 532,  533, 564,  565, 596,  597, 628,  629, 660,
+  661, 692,  693, 724,  725, 756,  757, 788,  789, 820,  821, 852,  853, 884,
+  885, 916,  917, 948,  949, 980,  981, 1012, 21,  21,   22,  53,   54,  85,
+  86,  117,  118, 149,  150, 181,  182, 213,  214, 245,  246, 277,  278, 309,
+  310, 341,  342, 373,  374, 405,  406, 437,  438, 469,  470, 501,  502, 533,
+  534, 565,  566, 597,  598, 629,  630, 661,  662, 693,  694, 725,  726, 757,
+  758, 789,  790, 821,  822, 853,  854, 885,  886, 917,  918, 949,  950, 981,
+  982, 1013, 22,  22,   23,  54,   55,  86,   87,  118,  119, 150,  151, 182,
+  183, 214,  215, 246,  247, 278,  279, 310,  311, 342,  343, 374,  375, 406,
+  407, 438,  439, 470,  471, 502,  503, 534,  535, 566,  567, 598,  599, 630,
+  631, 662,  663, 694,  695, 726,  727, 758,  759, 790,  791, 822,  823, 854,
+  855, 886,  887, 918,  919, 950,  951, 982,  983, 1014, 23,  23,   24,  55,
+  56,  87,   88,  119,  120, 151,  152, 183,  184, 215,  216, 247,  248, 279,
+  280, 311,  312, 343,  344, 375,  376, 407,  408, 439,  440, 471,  472, 503,
+  504, 535,  536, 567,  568, 599,  600, 631,  632, 663,  664, 695,  696, 727,
+  728, 759,  760, 791,  792, 823,  824, 855,  856, 887,  888, 919,  920, 951,
+  952, 983,  984, 1015, 24,  24,   25,  56,   57,  88,   89,  120,  121, 152,
+  153, 184,  185, 216,  217, 248,  249, 280,  281, 312,  313, 344,  345, 376,
+  377, 408,  409, 440,  441, 472,  473, 504,  505, 536,  537, 568,  569, 600,
+  601, 632,  633, 664,  665, 696,  697, 728,  729, 760,  761, 792,  793, 824,
+  825, 856,  857, 888,  889, 920,  921, 952,  953, 984,  985, 1016, 25,  25,
+  26,  57,   58,  89,   90,  121,  122, 153,  154, 185,  186, 217,  218, 249,
+  250, 281,  282, 313,  314, 345,  346, 377,  378, 409,  410, 441,  442, 473,
+  474, 505,  506, 537,  538, 569,  570, 601,  602, 633,  634, 665,  666, 697,
+  698, 729,  730, 761,  762, 793,  794, 825,  826, 857,  858, 889,  890, 921,
+  922, 953,  954, 985,  986, 1017, 26,  26,   27,  58,   59,  90,   91,  122,
+  123, 154,  155, 186,  187, 218,  219, 250,  251, 282,  283, 314,  315, 346,
+  347, 378,  379, 410,  411, 442,  443, 474,  475, 506,  507, 538,  539, 570,
+  571, 602,  603, 634,  635, 666,  667, 698,  699, 730,  731, 762,  763, 794,
+  795, 826,  827, 858,  859, 890,  891, 922,  923, 954,  955, 986,  987, 1018,
+  27,  27,   28,  59,   60,  91,   92,  123,  124, 155,  156, 187,  188, 219,
+  220, 251,  252, 283,  284, 315,  316, 347,  348, 379,  380, 411,  412, 443,
+  444, 475,  476, 507,  508, 539,  540, 571,  572, 603,  604, 635,  636, 667,
+  668, 699,  700, 731,  732, 763,  764, 795,  796, 827,  828, 859,  860, 891,
+  892, 923,  924, 955,  956, 987,  988, 1019, 28,  28,   29,  60,   61,  92,
+  93,  124,  125, 156,  157, 188,  189, 220,  221, 252,  253, 284,  285, 316,
+  317, 348,  349, 380,  381, 412,  413, 444,  445, 476,  477, 508,  509, 540,
+  541, 572,  573, 604,  605, 636,  637, 668,  669, 700,  701, 732,  733, 764,
+  765, 796,  797, 828,  829, 860,  861, 892,  893, 924,  925, 956,  957, 988,
+  989, 1020, 29,  29,   30,  61,   62,  93,   94,  125,  126, 157,  158, 189,
+  190, 221,  222, 253,  254, 285,  286, 317,  318, 349,  350, 381,  382, 413,
+  414, 445,  446, 477,  478, 509,  510, 541,  542, 573,  574, 605,  606, 637,
+  638, 669,  670, 701,  702, 733,  734, 765,  766, 797,  798, 829,  830, 861,
+  862, 893,  894, 925,  926, 957,  958, 989,  990, 1021, 30,  30,   31,  62,
+  63,  94,   95,  126,  127, 158,  159, 190,  191, 222,  223, 254,  255, 286,
+  287, 318,  319, 350,  351, 382,  383, 414,  415, 446,  447, 478,  479, 510,
+  511, 542,  543, 574,  575, 606,  607, 638,  639, 670,  671, 702,  703, 734,
+  735, 766,  767, 798,  799, 830,  831, 862,  863, 894,  895, 926,  927, 958,
+  959, 990,  991, 1022, 0,   0,
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                mrow_scan_32x32_neighbors[1025 * MAX_NEIGHBORS]) = {
+  0,   0,    0,   0,    1,   1,    2,   2,    3,   3,    4,   4,    5,   5,
+  6,   6,    7,   7,    8,   8,    9,   9,    10,  10,   11,  11,   12,  12,
+  13,  13,   14,  14,   15,  15,   16,  16,   17,  17,   18,  18,   19,  19,
+  20,  20,   21,  21,   22,  22,   23,  23,   24,  24,   25,  25,   26,  26,
+  27,  27,   28,  28,   29,  29,   30,  30,   0,   0,    1,   32,   2,   33,
+  3,   34,   4,   35,   5,   36,   6,   37,   7,   38,   8,   39,   9,   40,
+  10,  41,   11,  42,   12,  43,   13,  44,   14,  45,   15,  46,   16,  47,
+  17,  48,   18,  49,   19,  50,   20,  51,   21,  52,   22,  53,   23,  54,
+  24,  55,   25,  56,   26,  57,   27,  58,   28,  59,   29,  60,   30,  61,
+  31,  62,   32,  32,   33,  64,   34,  65,   35,  66,   36,  67,   37,  68,
+  38,  69,   39,  70,   40,  71,   41,  72,   42,  73,   43,  74,   44,  75,
+  45,  76,   46,  77,   47,  78,   48,  79,   49,  80,   50,  81,   51,  82,
+  52,  83,   53,  84,   54,  85,   55,  86,   56,  87,   57,  88,   58,  89,
+  59,  90,   60,  91,   61,  92,   62,  93,   63,  94,   64,  64,   65,  96,
+  66,  97,   67,  98,   68,  99,   69,  100,  70,  101,  71,  102,  72,  103,
+  73,  104,  74,  105,  75,  106,  76,  107,  77,  108,  78,  109,  79,  110,
+  80,  111,  81,  112,  82,  113,  83,  114,  84,  115,  85,  116,  86,  117,
+  87,  118,  88,  119,  89,  120,  90,  121,  91,  122,  92,  123,  93,  124,
+  94,  125,  95,  126,  96,  96,   97,  128,  98,  129,  99,  130,  100, 131,
+  101, 132,  102, 133,  103, 134,  104, 135,  105, 136,  106, 137,  107, 138,
+  108, 139,  109, 140,  110, 141,  111, 142,  112, 143,  113, 144,  114, 145,
+  115, 146,  116, 147,  117, 148,  118, 149,  119, 150,  120, 151,  121, 152,
+  122, 153,  123, 154,  124, 155,  125, 156,  126, 157,  127, 158,  128, 128,
+  129, 160,  130, 161,  131, 162,  132, 163,  133, 164,  134, 165,  135, 166,
+  136, 167,  137, 168,  138, 169,  139, 170,  140, 171,  141, 172,  142, 173,
+  143, 174,  144, 175,  145, 176,  146, 177,  147, 178,  148, 179,  149, 180,
+  150, 181,  151, 182,  152, 183,  153, 184,  154, 185,  155, 186,  156, 187,
+  157, 188,  158, 189,  159, 190,  160, 160,  161, 192,  162, 193,  163, 194,
+  164, 195,  165, 196,  166, 197,  167, 198,  168, 199,  169, 200,  170, 201,
+  171, 202,  172, 203,  173, 204,  174, 205,  175, 206,  176, 207,  177, 208,
+  178, 209,  179, 210,  180, 211,  181, 212,  182, 213,  183, 214,  184, 215,
+  185, 216,  186, 217,  187, 218,  188, 219,  189, 220,  190, 221,  191, 222,
+  192, 192,  193, 224,  194, 225,  195, 226,  196, 227,  197, 228,  198, 229,
+  199, 230,  200, 231,  201, 232,  202, 233,  203, 234,  204, 235,  205, 236,
+  206, 237,  207, 238,  208, 239,  209, 240,  210, 241,  211, 242,  212, 243,
+  213, 244,  214, 245,  215, 246,  216, 247,  217, 248,  218, 249,  219, 250,
+  220, 251,  221, 252,  222, 253,  223, 254,  224, 224,  225, 256,  226, 257,
+  227, 258,  228, 259,  229, 260,  230, 261,  231, 262,  232, 263,  233, 264,
+  234, 265,  235, 266,  236, 267,  237, 268,  238, 269,  239, 270,  240, 271,
+  241, 272,  242, 273,  243, 274,  244, 275,  245, 276,  246, 277,  247, 278,
+  248, 279,  249, 280,  250, 281,  251, 282,  252, 283,  253, 284,  254, 285,
+  255, 286,  256, 256,  257, 288,  258, 289,  259, 290,  260, 291,  261, 292,
+  262, 293,  263, 294,  264, 295,  265, 296,  266, 297,  267, 298,  268, 299,
+  269, 300,  270, 301,  271, 302,  272, 303,  273, 304,  274, 305,  275, 306,
+  276, 307,  277, 308,  278, 309,  279, 310,  280, 311,  281, 312,  282, 313,
+  283, 314,  284, 315,  285, 316,  286, 317,  287, 318,  288, 288,  289, 320,
+  290, 321,  291, 322,  292, 323,  293, 324,  294, 325,  295, 326,  296, 327,
+  297, 328,  298, 329,  299, 330,  300, 331,  301, 332,  302, 333,  303, 334,
+  304, 335,  305, 336,  306, 337,  307, 338,  308, 339,  309, 340,  310, 341,
+  311, 342,  312, 343,  313, 344,  314, 345,  315, 346,  316, 347,  317, 348,
+  318, 349,  319, 350,  320, 320,  321, 352,  322, 353,  323, 354,  324, 355,
+  325, 356,  326, 357,  327, 358,  328, 359,  329, 360,  330, 361,  331, 362,
+  332, 363,  333, 364,  334, 365,  335, 366,  336, 367,  337, 368,  338, 369,
+  339, 370,  340, 371,  341, 372,  342, 373,  343, 374,  344, 375,  345, 376,
+  346, 377,  347, 378,  348, 379,  349, 380,  350, 381,  351, 382,  352, 352,
+  353, 384,  354, 385,  355, 386,  356, 387,  357, 388,  358, 389,  359, 390,
+  360, 391,  361, 392,  362, 393,  363, 394,  364, 395,  365, 396,  366, 397,
+  367, 398,  368, 399,  369, 400,  370, 401,  371, 402,  372, 403,  373, 404,
+  374, 405,  375, 406,  376, 407,  377, 408,  378, 409,  379, 410,  380, 411,
+  381, 412,  382, 413,  383, 414,  384, 384,  385, 416,  386, 417,  387, 418,
+  388, 419,  389, 420,  390, 421,  391, 422,  392, 423,  393, 424,  394, 425,
+  395, 426,  396, 427,  397, 428,  398, 429,  399, 430,  400, 431,  401, 432,
+  402, 433,  403, 434,  404, 435,  405, 436,  406, 437,  407, 438,  408, 439,
+  409, 440,  410, 441,  411, 442,  412, 443,  413, 444,  414, 445,  415, 446,
+  416, 416,  417, 448,  418, 449,  419, 450,  420, 451,  421, 452,  422, 453,
+  423, 454,  424, 455,  425, 456,  426, 457,  427, 458,  428, 459,  429, 460,
+  430, 461,  431, 462,  432, 463,  433, 464,  434, 465,  435, 466,  436, 467,
+  437, 468,  438, 469,  439, 470,  440, 471,  441, 472,  442, 473,  443, 474,
+  444, 475,  445, 476,  446, 477,  447, 478,  448, 448,  449, 480,  450, 481,
+  451, 482,  452, 483,  453, 484,  454, 485,  455, 486,  456, 487,  457, 488,
+  458, 489,  459, 490,  460, 491,  461, 492,  462, 493,  463, 494,  464, 495,
+  465, 496,  466, 497,  467, 498,  468, 499,  469, 500,  470, 501,  471, 502,
+  472, 503,  473, 504,  474, 505,  475, 506,  476, 507,  477, 508,  478, 509,
+  479, 510,  480, 480,  481, 512,  482, 513,  483, 514,  484, 515,  485, 516,
+  486, 517,  487, 518,  488, 519,  489, 520,  490, 521,  491, 522,  492, 523,
+  493, 524,  494, 525,  495, 526,  496, 527,  497, 528,  498, 529,  499, 530,
+  500, 531,  501, 532,  502, 533,  503, 534,  504, 535,  505, 536,  506, 537,
+  507, 538,  508, 539,  509, 540,  510, 541,  511, 542,  512, 512,  513, 544,
+  514, 545,  515, 546,  516, 547,  517, 548,  518, 549,  519, 550,  520, 551,
+  521, 552,  522, 553,  523, 554,  524, 555,  525, 556,  526, 557,  527, 558,
+  528, 559,  529, 560,  530, 561,  531, 562,  532, 563,  533, 564,  534, 565,
+  535, 566,  536, 567,  537, 568,  538, 569,  539, 570,  540, 571,  541, 572,
+  542, 573,  543, 574,  544, 544,  545, 576,  546, 577,  547, 578,  548, 579,
+  549, 580,  550, 581,  551, 582,  552, 583,  553, 584,  554, 585,  555, 586,
+  556, 587,  557, 588,  558, 589,  559, 590,  560, 591,  561, 592,  562, 593,
+  563, 594,  564, 595,  565, 596,  566, 597,  567, 598,  568, 599,  569, 600,
+  570, 601,  571, 602,  572, 603,  573, 604,  574, 605,  575, 606,  576, 576,
+  577, 608,  578, 609,  579, 610,  580, 611,  581, 612,  582, 613,  583, 614,
+  584, 615,  585, 616,  586, 617,  587, 618,  588, 619,  589, 620,  590, 621,
+  591, 622,  592, 623,  593, 624,  594, 625,  595, 626,  596, 627,  597, 628,
+  598, 629,  599, 630,  600, 631,  601, 632,  602, 633,  603, 634,  604, 635,
+  605, 636,  606, 637,  607, 638,  608, 608,  609, 640,  610, 641,  611, 642,
+  612, 643,  613, 644,  614, 645,  615, 646,  616, 647,  617, 648,  618, 649,
+  619, 650,  620, 651,  621, 652,  622, 653,  623, 654,  624, 655,  625, 656,
+  626, 657,  627, 658,  628, 659,  629, 660,  630, 661,  631, 662,  632, 663,
+  633, 664,  634, 665,  635, 666,  636, 667,  637, 668,  638, 669,  639, 670,
+  640, 640,  641, 672,  642, 673,  643, 674,  644, 675,  645, 676,  646, 677,
+  647, 678,  648, 679,  649, 680,  650, 681,  651, 682,  652, 683,  653, 684,
+  654, 685,  655, 686,  656, 687,  657, 688,  658, 689,  659, 690,  660, 691,
+  661, 692,  662, 693,  663, 694,  664, 695,  665, 696,  666, 697,  667, 698,
+  668, 699,  669, 700,  670, 701,  671, 702,  672, 672,  673, 704,  674, 705,
+  675, 706,  676, 707,  677, 708,  678, 709,  679, 710,  680, 711,  681, 712,
+  682, 713,  683, 714,  684, 715,  685, 716,  686, 717,  687, 718,  688, 719,
+  689, 720,  690, 721,  691, 722,  692, 723,  693, 724,  694, 725,  695, 726,
+  696, 727,  697, 728,  698, 729,  699, 730,  700, 731,  701, 732,  702, 733,
+  703, 734,  704, 704,  705, 736,  706, 737,  707, 738,  708, 739,  709, 740,
+  710, 741,  711, 742,  712, 743,  713, 744,  714, 745,  715, 746,  716, 747,
+  717, 748,  718, 749,  719, 750,  720, 751,  721, 752,  722, 753,  723, 754,
+  724, 755,  725, 756,  726, 757,  727, 758,  728, 759,  729, 760,  730, 761,
+  731, 762,  732, 763,  733, 764,  734, 765,  735, 766,  736, 736,  737, 768,
+  738, 769,  739, 770,  740, 771,  741, 772,  742, 773,  743, 774,  744, 775,
+  745, 776,  746, 777,  747, 778,  748, 779,  749, 780,  750, 781,  751, 782,
+  752, 783,  753, 784,  754, 785,  755, 786,  756, 787,  757, 788,  758, 789,
+  759, 790,  760, 791,  761, 792,  762, 793,  763, 794,  764, 795,  765, 796,
+  766, 797,  767, 798,  768, 768,  769, 800,  770, 801,  771, 802,  772, 803,
+  773, 804,  774, 805,  775, 806,  776, 807,  777, 808,  778, 809,  779, 810,
+  780, 811,  781, 812,  782, 813,  783, 814,  784, 815,  785, 816,  786, 817,
+  787, 818,  788, 819,  789, 820,  790, 821,  791, 822,  792, 823,  793, 824,
+  794, 825,  795, 826,  796, 827,  797, 828,  798, 829,  799, 830,  800, 800,
+  801, 832,  802, 833,  803, 834,  804, 835,  805, 836,  806, 837,  807, 838,
+  808, 839,  809, 840,  810, 841,  811, 842,  812, 843,  813, 844,  814, 845,
+  815, 846,  816, 847,  817, 848,  818, 849,  819, 850,  820, 851,  821, 852,
+  822, 853,  823, 854,  824, 855,  825, 856,  826, 857,  827, 858,  828, 859,
+  829, 860,  830, 861,  831, 862,  832, 832,  833, 864,  834, 865,  835, 866,
+  836, 867,  837, 868,  838, 869,  839, 870,  840, 871,  841, 872,  842, 873,
+  843, 874,  844, 875,  845, 876,  846, 877,  847, 878,  848, 879,  849, 880,
+  850, 881,  851, 882,  852, 883,  853, 884,  854, 885,  855, 886,  856, 887,
+  857, 888,  858, 889,  859, 890,  860, 891,  861, 892,  862, 893,  863, 894,
+  864, 864,  865, 896,  866, 897,  867, 898,  868, 899,  869, 900,  870, 901,
+  871, 902,  872, 903,  873, 904,  874, 905,  875, 906,  876, 907,  877, 908,
+  878, 909,  879, 910,  880, 911,  881, 912,  882, 913,  883, 914,  884, 915,
+  885, 916,  886, 917,  887, 918,  888, 919,  889, 920,  890, 921,  891, 922,
+  892, 923,  893, 924,  894, 925,  895, 926,  896, 896,  897, 928,  898, 929,
+  899, 930,  900, 931,  901, 932,  902, 933,  903, 934,  904, 935,  905, 936,
+  906, 937,  907, 938,  908, 939,  909, 940,  910, 941,  911, 942,  912, 943,
+  913, 944,  914, 945,  915, 946,  916, 947,  917, 948,  918, 949,  919, 950,
+  920, 951,  921, 952,  922, 953,  923, 954,  924, 955,  925, 956,  926, 957,
+  927, 958,  928, 928,  929, 960,  930, 961,  931, 962,  932, 963,  933, 964,
+  934, 965,  935, 966,  936, 967,  937, 968,  938, 969,  939, 970,  940, 971,
+  941, 972,  942, 973,  943, 974,  944, 975,  945, 976,  946, 977,  947, 978,
+  948, 979,  949, 980,  950, 981,  951, 982,  952, 983,  953, 984,  954, 985,
+  955, 986,  956, 987,  957, 988,  958, 989,  959, 990,  960, 960,  961, 992,
+  962, 993,  963, 994,  964, 995,  965, 996,  966, 997,  967, 998,  968, 999,
+  969, 1000, 970, 1001, 971, 1002, 972, 1003, 973, 1004, 974, 1005, 975, 1006,
+  976, 1007, 977, 1008, 978, 1009, 979, 1010, 980, 1011, 981, 1012, 982, 1013,
+  983, 1014, 984, 1015, 985, 1016, 986, 1017, 987, 1018, 988, 1019, 989, 1020,
+  990, 1021, 991, 1022, 0,   0,
+};
+
+DECLARE_ALIGNED(16, static const int16_t,
+                default_scan_32x32_neighbors[1025 * MAX_NEIGHBORS]) = {
+  0,   0,    0,   0,    0,   0,    32,  32,   1,   32,  1,   1,    2,   2,
+  2,   33,   33,  64,   64,  64,   96,  96,   65,  96,  34,  65,   3,   34,
+  3,   3,    4,   4,    4,   35,   35,  66,   66,  97,  97,  128,  128, 128,
+  160, 160,  129, 160,  98,  129,  67,  98,   36,  67,  5,   36,   5,   5,
+  6,   6,    6,   37,   37,  68,   68,  99,   99,  130, 130, 161,  161, 192,
+  192, 192,  224, 224,  193, 224,  162, 193,  131, 162, 100, 131,  69,  100,
+  38,  69,   7,   38,   7,   7,    8,   8,    8,   39,  39,  70,   70,  101,
+  101, 132,  132, 163,  163, 194,  194, 225,  225, 256, 256, 256,  288, 288,
+  257, 288,  226, 257,  195, 226,  164, 195,  133, 164, 102, 133,  71,  102,
+  40,  71,   9,   40,   9,   9,    10,  10,   10,  41,  41,  72,   72,  103,
+  103, 134,  134, 165,  165, 196,  196, 227,  227, 258, 258, 289,  289, 320,
+  320, 320,  352, 352,  321, 352,  290, 321,  259, 290, 228, 259,  197, 228,
+  166, 197,  135, 166,  104, 135,  73,  104,  42,  73,  11,  42,   11,  11,
+  12,  12,   12,  43,   43,  74,   74,  105,  105, 136, 136, 167,  167, 198,
+  198, 229,  229, 260,  260, 291,  291, 322,  322, 353, 353, 384,  384, 384,
+  416, 416,  385, 416,  354, 385,  323, 354,  292, 323, 261, 292,  230, 261,
+  199, 230,  168, 199,  137, 168,  106, 137,  75,  106, 44,  75,   13,  44,
+  13,  13,   14,  14,   14,  45,   45,  76,   76,  107, 107, 138,  138, 169,
+  169, 200,  200, 231,  231, 262,  262, 293,  293, 324, 324, 355,  355, 386,
+  386, 417,  417, 448,  448, 448,  480, 480,  449, 480, 418, 449,  387, 418,
+  356, 387,  325, 356,  294, 325,  263, 294,  232, 263, 201, 232,  170, 201,
+  139, 170,  108, 139,  77,  108,  46,  77,   15,  46,  15,  15,   16,  16,
+  16,  47,   47,  78,   78,  109,  109, 140,  140, 171, 171, 202,  202, 233,
+  233, 264,  264, 295,  295, 326,  326, 357,  357, 388, 388, 419,  419, 450,
+  450, 481,  481, 512,  512, 512,  544, 544,  513, 544, 482, 513,  451, 482,
+  420, 451,  389, 420,  358, 389,  327, 358,  296, 327, 265, 296,  234, 265,
+  203, 234,  172, 203,  141, 172,  110, 141,  79,  110, 48,  79,   17,  48,
+  17,  17,   18,  18,   18,  49,   49,  80,   80,  111, 111, 142,  142, 173,
+  173, 204,  204, 235,  235, 266,  266, 297,  297, 328, 328, 359,  359, 390,
+  390, 421,  421, 452,  452, 483,  483, 514,  514, 545, 545, 576,  576, 576,
+  608, 608,  577, 608,  546, 577,  515, 546,  484, 515, 453, 484,  422, 453,
+  391, 422,  360, 391,  329, 360,  298, 329,  267, 298, 236, 267,  205, 236,
+  174, 205,  143, 174,  112, 143,  81,  112,  50,  81,  19,  50,   19,  19,
+  20,  20,   20,  51,   51,  82,   82,  113,  113, 144, 144, 175,  175, 206,
+  206, 237,  237, 268,  268, 299,  299, 330,  330, 361, 361, 392,  392, 423,
+  423, 454,  454, 485,  485, 516,  516, 547,  547, 578, 578, 609,  609, 640,
+  640, 640,  672, 672,  641, 672,  610, 641,  579, 610, 548, 579,  517, 548,
+  486, 517,  455, 486,  424, 455,  393, 424,  362, 393, 331, 362,  300, 331,
+  269, 300,  238, 269,  207, 238,  176, 207,  145, 176, 114, 145,  83,  114,
+  52,  83,   21,  52,   21,  21,   22,  22,   22,  53,  53,  84,   84,  115,
+  115, 146,  146, 177,  177, 208,  208, 239,  239, 270, 270, 301,  301, 332,
+  332, 363,  363, 394,  394, 425,  425, 456,  456, 487, 487, 518,  518, 549,
+  549, 580,  580, 611,  611, 642,  642, 673,  673, 704, 704, 704,  736, 736,
+  705, 736,  674, 705,  643, 674,  612, 643,  581, 612, 550, 581,  519, 550,
+  488, 519,  457, 488,  426, 457,  395, 426,  364, 395, 333, 364,  302, 333,
+  271, 302,  240, 271,  209, 240,  178, 209,  147, 178, 116, 147,  85,  116,
+  54,  85,   23,  54,   23,  23,   24,  24,   24,  55,  55,  86,   86,  117,
+  117, 148,  148, 179,  179, 210,  210, 241,  241, 272, 272, 303,  303, 334,
+  334, 365,  365, 396,  396, 427,  427, 458,  458, 489, 489, 520,  520, 551,
+  551, 582,  582, 613,  613, 644,  644, 675,  675, 706, 706, 737,  737, 768,
+  768, 768,  800, 800,  769, 800,  738, 769,  707, 738, 676, 707,  645, 676,
+  614, 645,  583, 614,  552, 583,  521, 552,  490, 521, 459, 490,  428, 459,
+  397, 428,  366, 397,  335, 366,  304, 335,  273, 304, 242, 273,  211, 242,
+  180, 211,  149, 180,  118, 149,  87,  118,  56,  87,  25,  56,   25,  25,
+  26,  26,   26,  57,   57,  88,   88,  119,  119, 150, 150, 181,  181, 212,
+  212, 243,  243, 274,  274, 305,  305, 336,  336, 367, 367, 398,  398, 429,
+  429, 460,  460, 491,  491, 522,  522, 553,  553, 584, 584, 615,  615, 646,
+  646, 677,  677, 708,  708, 739,  739, 770,  770, 801, 801, 832,  832, 832,
+  864, 864,  833, 864,  802, 833,  771, 802,  740, 771, 709, 740,  678, 709,
+  647, 678,  616, 647,  585, 616,  554, 585,  523, 554, 492, 523,  461, 492,
+  430, 461,  399, 430,  368, 399,  337, 368,  306, 337, 275, 306,  244, 275,
+  213, 244,  182, 213,  151, 182,  120, 151,  89,  120, 58,  89,   27,  58,
+  27,  27,   28,  28,   28,  59,   59,  90,   90,  121, 121, 152,  152, 183,
+  183, 214,  214, 245,  245, 276,  276, 307,  307, 338, 338, 369,  369, 400,
+  400, 431,  431, 462,  462, 493,  493, 524,  524, 555, 555, 586,  586, 617,
+  617, 648,  648, 679,  679, 710,  710, 741,  741, 772, 772, 803,  803, 834,
+  834, 865,  865, 896,  896, 896,  928, 928,  897, 928, 866, 897,  835, 866,
+  804, 835,  773, 804,  742, 773,  711, 742,  680, 711, 649, 680,  618, 649,
+  587, 618,  556, 587,  525, 556,  494, 525,  463, 494, 432, 463,  401, 432,
+  370, 401,  339, 370,  308, 339,  277, 308,  246, 277, 215, 246,  184, 215,
+  153, 184,  122, 153,  91,  122,  60,  91,   29,  60,  29,  29,   30,  30,
+  30,  61,   61,  92,   92,  123,  123, 154,  154, 185, 185, 216,  216, 247,
+  247, 278,  278, 309,  309, 340,  340, 371,  371, 402, 402, 433,  433, 464,
+  464, 495,  495, 526,  526, 557,  557, 588,  588, 619, 619, 650,  650, 681,
+  681, 712,  712, 743,  743, 774,  774, 805,  805, 836, 836, 867,  867, 898,
+  898, 929,  929, 960,  960, 960,  961, 992,  930, 961, 899, 930,  868, 899,
+  837, 868,  806, 837,  775, 806,  744, 775,  713, 744, 682, 713,  651, 682,
+  620, 651,  589, 620,  558, 589,  527, 558,  496, 527, 465, 496,  434, 465,
+  403, 434,  372, 403,  341, 372,  310, 341,  279, 310, 248, 279,  217, 248,
+  186, 217,  155, 186,  124, 155,  93,  124,  62,  93,  31,  62,   63,  94,
+  94,  125,  125, 156,  156, 187,  187, 218,  218, 249, 249, 280,  280, 311,
+  311, 342,  342, 373,  373, 404,  404, 435,  435, 466, 466, 497,  497, 528,
+  528, 559,  559, 590,  590, 621,  621, 652,  652, 683, 683, 714,  714, 745,
+  745, 776,  776, 807,  807, 838,  838, 869,  869, 900, 900, 931,  931, 962,
+  962, 993,  963, 994,  932, 963,  901, 932,  870, 901, 839, 870,  808, 839,
+  777, 808,  746, 777,  715, 746,  684, 715,  653, 684, 622, 653,  591, 622,
+  560, 591,  529, 560,  498, 529,  467, 498,  436, 467, 405, 436,  374, 405,
+  343, 374,  312, 343,  281, 312,  250, 281,  219, 250, 188, 219,  157, 188,
+  126, 157,  95,  126,  127, 158,  158, 189,  189, 220, 220, 251,  251, 282,
+  282, 313,  313, 344,  344, 375,  375, 406,  406, 437, 437, 468,  468, 499,
+  499, 530,  530, 561,  561, 592,  592, 623,  623, 654, 654, 685,  685, 716,
+  716, 747,  747, 778,  778, 809,  809, 840,  840, 871, 871, 902,  902, 933,
+  933, 964,  964, 995,  965, 996,  934, 965,  903, 934, 872, 903,  841, 872,
+  810, 841,  779, 810,  748, 779,  717, 748,  686, 717, 655, 686,  624, 655,
+  593, 624,  562, 593,  531, 562,  500, 531,  469, 500, 438, 469,  407, 438,
+  376, 407,  345, 376,  314, 345,  283, 314,  252, 283, 221, 252,  190, 221,
+  159, 190,  191, 222,  222, 253,  253, 284,  284, 315, 315, 346,  346, 377,
+  377, 408,  408, 439,  439, 470,  470, 501,  501, 532, 532, 563,  563, 594,
+  594, 625,  625, 656,  656, 687,  687, 718,  718, 749, 749, 780,  780, 811,
+  811, 842,  842, 873,  873, 904,  904, 935,  935, 966, 966, 997,  967, 998,
+  936, 967,  905, 936,  874, 905,  843, 874,  812, 843, 781, 812,  750, 781,
+  719, 750,  688, 719,  657, 688,  626, 657,  595, 626, 564, 595,  533, 564,
+  502, 533,  471, 502,  440, 471,  409, 440,  378, 409, 347, 378,  316, 347,
+  285, 316,  254, 285,  223, 254,  255, 286,  286, 317, 317, 348,  348, 379,
+  379, 410,  410, 441,  441, 472,  472, 503,  503, 534, 534, 565,  565, 596,
+  596, 627,  627, 658,  658, 689,  689, 720,  720, 751, 751, 782,  782, 813,
+  813, 844,  844, 875,  875, 906,  906, 937,  937, 968, 968, 999,  969, 1000,
+  938, 969,  907, 938,  876, 907,  845, 876,  814, 845, 783, 814,  752, 783,
+  721, 752,  690, 721,  659, 690,  628, 659,  597, 628, 566, 597,  535, 566,
+  504, 535,  473, 504,  442, 473,  411, 442,  380, 411, 349, 380,  318, 349,
+  287, 318,  319, 350,  350, 381,  381, 412,  412, 443, 443, 474,  474, 505,
+  505, 536,  536, 567,  567, 598,  598, 629,  629, 660, 660, 691,  691, 722,
+  722, 753,  753, 784,  784, 815,  815, 846,  846, 877, 877, 908,  908, 939,
+  939, 970,  970, 1001, 971, 1002, 940, 971,  909, 940, 878, 909,  847, 878,
+  816, 847,  785, 816,  754, 785,  723, 754,  692, 723, 661, 692,  630, 661,
+  599, 630,  568, 599,  537, 568,  506, 537,  475, 506, 444, 475,  413, 444,
+  382, 413,  351, 382,  383, 414,  414, 445,  445, 476, 476, 507,  507, 538,
+  538, 569,  569, 600,  600, 631,  631, 662,  662, 693, 693, 724,  724, 755,
+  755, 786,  786, 817,  817, 848,  848, 879,  879, 910, 910, 941,  941, 972,
+  972, 1003, 973, 1004, 942, 973,  911, 942,  880, 911, 849, 880,  818, 849,
+  787, 818,  756, 787,  725, 756,  694, 725,  663, 694, 632, 663,  601, 632,
+  570, 601,  539, 570,  508, 539,  477, 508,  446, 477, 415, 446,  447, 478,
+  478, 509,  509, 540,  540, 571,  571, 602,  602, 633, 633, 664,  664, 695,
+  695, 726,  726, 757,  757, 788,  788, 819,  819, 850, 850, 881,  881, 912,
+  912, 943,  943, 974,  974, 1005, 975, 1006, 944, 975, 913, 944,  882, 913,
+  851, 882,  820, 851,  789, 820,  758, 789,  727, 758, 696, 727,  665, 696,
+  634, 665,  603, 634,  572, 603,  541, 572,  510, 541, 479, 510,  511, 542,
+  542, 573,  573, 604,  604, 635,  635, 666,  666, 697, 697, 728,  728, 759,
+  759, 790,  790, 821,  821, 852,  852, 883,  883, 914, 914, 945,  945, 976,
+  976, 1007, 977, 1008, 946, 977,  915, 946,  884, 915, 853, 884,  822, 853,
+  791, 822,  760, 791,  729, 760,  698, 729,  667, 698, 636, 667,  605, 636,
+  574, 605,  543, 574,  575, 606,  606, 637,  637, 668, 668, 699,  699, 730,
+  730, 761,  761, 792,  792, 823,  823, 854,  854, 885, 885, 916,  916, 947,
+  947, 978,  978, 1009, 979, 1010, 948, 979,  917, 948, 886, 917,  855, 886,
+  824, 855,  793, 824,  762, 793,  731, 762,  700, 731, 669, 700,  638, 669,
+  607, 638,  639, 670,  670, 701,  701, 732,  732, 763, 763, 794,  794, 825,
+  825, 856,  856, 887,  887, 918,  918, 949,  949, 980, 980, 1011, 981, 1012,
+  950, 981,  919, 950,  888, 919,  857, 888,  826, 857, 795, 826,  764, 795,
+  733, 764,  702, 733,  671, 702,  703, 734,  734, 765, 765, 796,  796, 827,
+  827, 858,  858, 889,  889, 920,  920, 951,  951, 982, 982, 1013, 983, 1014,
+  952, 983,  921, 952,  890, 921,  859, 890,  828, 859, 797, 828,  766, 797,
+  735, 766,  767, 798,  798, 829,  829, 860,  860, 891, 891, 922,  922, 953,
+  953, 984,  984, 1015, 985, 1016, 954, 985,  923, 954, 892, 923,  861, 892,
+  830, 861,  799, 830,  831, 862,  862, 893,  893, 924, 924, 955,  955, 986,
+  986, 1017, 987, 1018, 956, 987,  925, 956,  894, 925, 863, 894,  895, 926,
+  926, 957,  957, 988,  988, 1019, 989, 1020, 958, 989, 927, 958,  959, 990,
+  990, 1021, 991, 1022, 0,   0
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_4x4[16]) = {
+  0, 1, 5, 6, 2, 4, 7, 12, 3, 8, 11, 13, 9, 10, 14, 15
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_4x4[16]) = {
+  0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_4x4[16]) = {
+  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_4x8[32]) = {
+  0,  1,  3,  6,  2,  4,  7,  10, 5,  8,  11, 14, 9,  12, 15, 18,
+  13, 16, 19, 22, 17, 20, 23, 26, 21, 24, 27, 29, 25, 28, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_4x8[32]) = {
+  0, 8,  16, 24, 1, 9,  17, 25, 2, 10, 18, 26, 3, 11, 19, 27,
+  4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_4x8[32]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x4[32]) = {
+  0, 2, 5,  9,  13, 17, 21, 25, 1, 4,  8,  12, 16, 20, 24, 28,
+  3, 7, 11, 15, 19, 23, 27, 30, 6, 10, 14, 18, 22, 26, 29, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x4[32]) = {
+  0, 4, 8,  12, 16, 20, 24, 28, 1, 5, 9,  13, 17, 21, 25, 29,
+  2, 6, 10, 14, 18, 22, 26, 30, 3, 7, 11, 15, 19, 23, 27, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x4[32]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_4x16[64]) = {
+  0,  1,  3,  6,  2,  4,  7,  10, 5,  8,  11, 14, 9,  12, 15, 18,
+  13, 16, 19, 22, 17, 20, 23, 26, 21, 24, 27, 30, 25, 28, 31, 34,
+  29, 32, 35, 38, 33, 36, 39, 42, 37, 40, 43, 46, 41, 44, 47, 50,
+  45, 48, 51, 54, 49, 52, 55, 58, 53, 56, 59, 61, 57, 60, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x4[64]) = {
+  0, 2,  5,  9,  13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57,
+  1, 4,  8,  12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
+  3, 7,  11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 62,
+  6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 61, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_4x16[64]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+  32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+  48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x4[64]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+  32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+  48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_4x16[64]) = {
+  0,  16, 32, 48, 1,  17, 33, 49, 2,  18, 34, 50, 3,  19, 35, 51,
+  4,  20, 36, 52, 5,  21, 37, 53, 6,  22, 38, 54, 7,  23, 39, 55,
+  8,  24, 40, 56, 9,  25, 41, 57, 10, 26, 42, 58, 11, 27, 43, 59,
+  12, 28, 44, 60, 13, 29, 45, 61, 14, 30, 46, 62, 15, 31, 47, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_16x4[64]) = {
+  0, 4, 8,  12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
+  1, 5, 9,  13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61,
+  2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62,
+  3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x32[256]) = {
+  0,   1,   3,   6,   10,  15,  21,  28,  2,   4,   7,   11,  16,  22,  29,
+  36,  5,   8,   12,  17,  23,  30,  37,  44,  9,   13,  18,  24,  31,  38,
+  45,  52,  14,  19,  25,  32,  39,  46,  53,  60,  20,  26,  33,  40,  47,
+  54,  61,  68,  27,  34,  41,  48,  55,  62,  69,  76,  35,  42,  49,  56,
+  63,  70,  77,  84,  43,  50,  57,  64,  71,  78,  85,  92,  51,  58,  65,
+  72,  79,  86,  93,  100, 59,  66,  73,  80,  87,  94,  101, 108, 67,  74,
+  81,  88,  95,  102, 109, 116, 75,  82,  89,  96,  103, 110, 117, 124, 83,
+  90,  97,  104, 111, 118, 125, 132, 91,  98,  105, 112, 119, 126, 133, 140,
+  99,  106, 113, 120, 127, 134, 141, 148, 107, 114, 121, 128, 135, 142, 149,
+  156, 115, 122, 129, 136, 143, 150, 157, 164, 123, 130, 137, 144, 151, 158,
+  165, 172, 131, 138, 145, 152, 159, 166, 173, 180, 139, 146, 153, 160, 167,
+  174, 181, 188, 147, 154, 161, 168, 175, 182, 189, 196, 155, 162, 169, 176,
+  183, 190, 197, 204, 163, 170, 177, 184, 191, 198, 205, 212, 171, 178, 185,
+  192, 199, 206, 213, 220, 179, 186, 193, 200, 207, 214, 221, 228, 187, 194,
+  201, 208, 215, 222, 229, 235, 195, 202, 209, 216, 223, 230, 236, 241, 203,
+  210, 217, 224, 231, 237, 242, 246, 211, 218, 225, 232, 238, 243, 247, 250,
+  219, 226, 233, 239, 244, 248, 251, 253, 227, 234, 240, 245, 249, 252, 254,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_32x8[256]) = {
+  0,   2,   5,   9,   14,  20,  27,  35,  43,  51,  59,  67,  75,  83,  91,
+  99,  107, 115, 123, 131, 139, 147, 155, 163, 171, 179, 187, 195, 203, 211,
+  219, 227, 1,   4,   8,   13,  19,  26,  34,  42,  50,  58,  66,  74,  82,
+  90,  98,  106, 114, 122, 130, 138, 146, 154, 162, 170, 178, 186, 194, 202,
+  210, 218, 226, 234, 3,   7,   12,  18,  25,  33,  41,  49,  57,  65,  73,
+  81,  89,  97,  105, 113, 121, 129, 137, 145, 153, 161, 169, 177, 185, 193,
+  201, 209, 217, 225, 233, 240, 6,   11,  17,  24,  32,  40,  48,  56,  64,
+  72,  80,  88,  96,  104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184,
+  192, 200, 208, 216, 224, 232, 239, 245, 10,  16,  23,  31,  39,  47,  55,
+  63,  71,  79,  87,  95,  103, 111, 119, 127, 135, 143, 151, 159, 167, 175,
+  183, 191, 199, 207, 215, 223, 231, 238, 244, 249, 15,  22,  30,  38,  46,
+  54,  62,  70,  78,  86,  94,  102, 110, 118, 126, 134, 142, 150, 158, 166,
+  174, 182, 190, 198, 206, 214, 222, 230, 237, 243, 248, 252, 21,  29,  37,
+  45,  53,  61,  69,  77,  85,  93,  101, 109, 117, 125, 133, 141, 149, 157,
+  165, 173, 181, 189, 197, 205, 213, 221, 229, 236, 242, 247, 251, 254, 28,
+  36,  44,  52,  60,  68,  76,  84,  92,  100, 108, 116, 124, 132, 140, 148,
+  156, 164, 172, 180, 188, 196, 204, 212, 220, 228, 235, 241, 246, 250, 253,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x32[256]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_32x8[256]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x32[256]) = {
+  0,  32, 64, 96,  128, 160, 192, 224, 1,  33, 65, 97,  129, 161, 193, 225,
+  2,  34, 66, 98,  130, 162, 194, 226, 3,  35, 67, 99,  131, 163, 195, 227,
+  4,  36, 68, 100, 132, 164, 196, 228, 5,  37, 69, 101, 133, 165, 197, 229,
+  6,  38, 70, 102, 134, 166, 198, 230, 7,  39, 71, 103, 135, 167, 199, 231,
+  8,  40, 72, 104, 136, 168, 200, 232, 9,  41, 73, 105, 137, 169, 201, 233,
+  10, 42, 74, 106, 138, 170, 202, 234, 11, 43, 75, 107, 139, 171, 203, 235,
+  12, 44, 76, 108, 140, 172, 204, 236, 13, 45, 77, 109, 141, 173, 205, 237,
+  14, 46, 78, 110, 142, 174, 206, 238, 15, 47, 79, 111, 143, 175, 207, 239,
+  16, 48, 80, 112, 144, 176, 208, 240, 17, 49, 81, 113, 145, 177, 209, 241,
+  18, 50, 82, 114, 146, 178, 210, 242, 19, 51, 83, 115, 147, 179, 211, 243,
+  20, 52, 84, 116, 148, 180, 212, 244, 21, 53, 85, 117, 149, 181, 213, 245,
+  22, 54, 86, 118, 150, 182, 214, 246, 23, 55, 87, 119, 151, 183, 215, 247,
+  24, 56, 88, 120, 152, 184, 216, 248, 25, 57, 89, 121, 153, 185, 217, 249,
+  26, 58, 90, 122, 154, 186, 218, 250, 27, 59, 91, 123, 155, 187, 219, 251,
+  28, 60, 92, 124, 156, 188, 220, 252, 29, 61, 93, 125, 157, 189, 221, 253,
+  30, 62, 94, 126, 158, 190, 222, 254, 31, 63, 95, 127, 159, 191, 223, 255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_32x8[256]) = {
+  0,   8,   16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96,  104, 112,
+  120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232,
+  240, 248, 1,   9,   17,  25,  33,  41,  49,  57,  65,  73,  81,  89,  97,
+  105, 113, 121, 129, 137, 145, 153, 161, 169, 177, 185, 193, 201, 209, 217,
+  225, 233, 241, 249, 2,   10,  18,  26,  34,  42,  50,  58,  66,  74,  82,
+  90,  98,  106, 114, 122, 130, 138, 146, 154, 162, 170, 178, 186, 194, 202,
+  210, 218, 226, 234, 242, 250, 3,   11,  19,  27,  35,  43,  51,  59,  67,
+  75,  83,  91,  99,  107, 115, 123, 131, 139, 147, 155, 163, 171, 179, 187,
+  195, 203, 211, 219, 227, 235, 243, 251, 4,   12,  20,  28,  36,  44,  52,
+  60,  68,  76,  84,  92,  100, 108, 116, 124, 132, 140, 148, 156, 164, 172,
+  180, 188, 196, 204, 212, 220, 228, 236, 244, 252, 5,   13,  21,  29,  37,
+  45,  53,  61,  69,  77,  85,  93,  101, 109, 117, 125, 133, 141, 149, 157,
+  165, 173, 181, 189, 197, 205, 213, 221, 229, 237, 245, 253, 6,   14,  22,
+  30,  38,  46,  54,  62,  70,  78,  86,  94,  102, 110, 118, 126, 134, 142,
+  150, 158, 166, 174, 182, 190, 198, 206, 214, 222, 230, 238, 246, 254, 7,
+  15,  23,  31,  39,  47,  55,  63,  71,  79,  87,  95,  103, 111, 119, 127,
+  135, 143, 151, 159, 167, 175, 183, 191, 199, 207, 215, 223, 231, 239, 247,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x8[64]) = {
+  0, 8,  16, 24, 32, 40, 48, 56, 1, 9,  17, 25, 33, 41, 49, 57,
+  2, 10, 18, 26, 34, 42, 50, 58, 3, 11, 19, 27, 35, 43, 51, 59,
+  4, 12, 20, 28, 36, 44, 52, 60, 5, 13, 21, 29, 37, 45, 53, 61,
+  6, 14, 22, 30, 38, 46, 54, 62, 7, 15, 23, 31, 39, 47, 55, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x8[64]) = {
+  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+  32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+  48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x8[64]) = {
+  0,  1,  5,  6,  14, 15, 27, 28, 2,  4,  7,  13, 16, 26, 29, 42,
+  3,  8,  12, 17, 25, 30, 41, 43, 9,  11, 18, 24, 31, 40, 44, 53,
+  10, 19, 23, 32, 39, 45, 52, 54, 20, 22, 33, 38, 46, 51, 55, 60,
+  21, 34, 37, 47, 50, 56, 59, 61, 35, 36, 48, 49, 57, 58, 62, 63
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x16[128]) = {
+  0,  1,  3,   6,   10,  15,  21,  28,  2,  4,   7,   11,  16,  22,  29,  36,
+  5,  8,  12,  17,  23,  30,  37,  44,  9,  13,  18,  24,  31,  38,  45,  52,
+  14, 19, 25,  32,  39,  46,  53,  60,  20, 26,  33,  40,  47,  54,  61,  68,
+  27, 34, 41,  48,  55,  62,  69,  76,  35, 42,  49,  56,  63,  70,  77,  84,
+  43, 50, 57,  64,  71,  78,  85,  92,  51, 58,  65,  72,  79,  86,  93,  100,
+  59, 66, 73,  80,  87,  94,  101, 107, 67, 74,  81,  88,  95,  102, 108, 113,
+  75, 82, 89,  96,  103, 109, 114, 118, 83, 90,  97,  104, 110, 115, 119, 122,
+  91, 98, 105, 111, 116, 120, 123, 125, 99, 106, 112, 117, 121, 124, 126, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x8[128]) = {
+  0,  2,  5,  9,  14, 20, 27, 35, 43, 51,  59,  67,  75,  83,  91,  99,
+  1,  4,  8,  13, 19, 26, 34, 42, 50, 58,  66,  74,  82,  90,  98,  106,
+  3,  7,  12, 18, 25, 33, 41, 49, 57, 65,  73,  81,  89,  97,  105, 112,
+  6,  11, 17, 24, 32, 40, 48, 56, 64, 72,  80,  88,  96,  104, 111, 117,
+  10, 16, 23, 31, 39, 47, 55, 63, 71, 79,  87,  95,  103, 110, 116, 121,
+  15, 22, 30, 38, 46, 54, 62, 70, 78, 86,  94,  102, 109, 115, 120, 124,
+  21, 29, 37, 45, 53, 61, 69, 77, 85, 93,  101, 108, 114, 119, 123, 126,
+  28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 107, 113, 118, 122, 125, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x16[128]) = {
+  0,  16, 32, 48, 64, 80, 96,  112, 1,  17, 33, 49, 65, 81, 97,  113,
+  2,  18, 34, 50, 66, 82, 98,  114, 3,  19, 35, 51, 67, 83, 99,  115,
+  4,  20, 36, 52, 68, 84, 100, 116, 5,  21, 37, 53, 69, 85, 101, 117,
+  6,  22, 38, 54, 70, 86, 102, 118, 7,  23, 39, 55, 71, 87, 103, 119,
+  8,  24, 40, 56, 72, 88, 104, 120, 9,  25, 41, 57, 73, 89, 105, 121,
+  10, 26, 42, 58, 74, 90, 106, 122, 11, 27, 43, 59, 75, 91, 107, 123,
+  12, 28, 44, 60, 76, 92, 108, 124, 13, 29, 45, 61, 77, 93, 109, 125,
+  14, 30, 46, 62, 78, 94, 110, 126, 15, 31, 47, 63, 79, 95, 111, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_16x8[128]) = {
+  0, 8,  16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96,  104, 112, 120,
+  1, 9,  17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97,  105, 113, 121,
+  2, 10, 18, 26, 34, 42, 50, 58, 66, 74, 82, 90, 98,  106, 114, 122,
+  3, 11, 19, 27, 35, 43, 51, 59, 67, 75, 83, 91, 99,  107, 115, 123,
+  4, 12, 20, 28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 108, 116, 124,
+  5, 13, 21, 29, 37, 45, 53, 61, 69, 77, 85, 93, 101, 109, 117, 125,
+  6, 14, 22, 30, 38, 46, 54, 62, 70, 78, 86, 94, 102, 110, 118, 126,
+  7, 15, 23, 31, 39, 47, 55, 63, 71, 79, 87, 95, 103, 111, 119, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x16[128]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x8[128]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x32[512]) = {
+  0,   1,   3,   6,   10,  15,  21,  28,  36,  45,  55,  66,  78,  91,  105,
+  120, 2,   4,   7,   11,  16,  22,  29,  37,  46,  56,  67,  79,  92,  106,
+  121, 136, 5,   8,   12,  17,  23,  30,  38,  47,  57,  68,  80,  93,  107,
+  122, 137, 152, 9,   13,  18,  24,  31,  39,  48,  58,  69,  81,  94,  108,
+  123, 138, 153, 168, 14,  19,  25,  32,  40,  49,  59,  70,  82,  95,  109,
+  124, 139, 154, 169, 184, 20,  26,  33,  41,  50,  60,  71,  83,  96,  110,
+  125, 140, 155, 170, 185, 200, 27,  34,  42,  51,  61,  72,  84,  97,  111,
+  126, 141, 156, 171, 186, 201, 216, 35,  43,  52,  62,  73,  85,  98,  112,
+  127, 142, 157, 172, 187, 202, 217, 232, 44,  53,  63,  74,  86,  99,  113,
+  128, 143, 158, 173, 188, 203, 218, 233, 248, 54,  64,  75,  87,  100, 114,
+  129, 144, 159, 174, 189, 204, 219, 234, 249, 264, 65,  76,  88,  101, 115,
+  130, 145, 160, 175, 190, 205, 220, 235, 250, 265, 280, 77,  89,  102, 116,
+  131, 146, 161, 176, 191, 206, 221, 236, 251, 266, 281, 296, 90,  103, 117,
+  132, 147, 162, 177, 192, 207, 222, 237, 252, 267, 282, 297, 312, 104, 118,
+  133, 148, 163, 178, 193, 208, 223, 238, 253, 268, 283, 298, 313, 328, 119,
+  134, 149, 164, 179, 194, 209, 224, 239, 254, 269, 284, 299, 314, 329, 344,
+  135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345,
+  360, 151, 166, 181, 196, 211, 226, 241, 256, 271, 286, 301, 316, 331, 346,
+  361, 376, 167, 182, 197, 212, 227, 242, 257, 272, 287, 302, 317, 332, 347,
+  362, 377, 392, 183, 198, 213, 228, 243, 258, 273, 288, 303, 318, 333, 348,
+  363, 378, 393, 407, 199, 214, 229, 244, 259, 274, 289, 304, 319, 334, 349,
+  364, 379, 394, 408, 421, 215, 230, 245, 260, 275, 290, 305, 320, 335, 350,
+  365, 380, 395, 409, 422, 434, 231, 246, 261, 276, 291, 306, 321, 336, 351,
+  366, 381, 396, 410, 423, 435, 446, 247, 262, 277, 292, 307, 322, 337, 352,
+  367, 382, 397, 411, 424, 436, 447, 457, 263, 278, 293, 308, 323, 338, 353,
+  368, 383, 398, 412, 425, 437, 448, 458, 467, 279, 294, 309, 324, 339, 354,
+  369, 384, 399, 413, 426, 438, 449, 459, 468, 476, 295, 310, 325, 340, 355,
+  370, 385, 400, 414, 427, 439, 450, 460, 469, 477, 484, 311, 326, 341, 356,
+  371, 386, 401, 415, 428, 440, 451, 461, 470, 478, 485, 491, 327, 342, 357,
+  372, 387, 402, 416, 429, 441, 452, 462, 471, 479, 486, 492, 497, 343, 358,
+  373, 388, 403, 417, 430, 442, 453, 463, 472, 480, 487, 493, 498, 502, 359,
+  374, 389, 404, 418, 431, 443, 454, 464, 473, 481, 488, 494, 499, 503, 506,
+  375, 390, 405, 419, 432, 444, 455, 465, 474, 482, 489, 495, 500, 504, 507,
+  509, 391, 406, 420, 433, 445, 456, 466, 475, 483, 490, 496, 501, 505, 508,
+  510, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_32x16[512]) = {
+  0,   2,   5,   9,   14,  20,  27,  35,  44,  54,  65,  77,  90,  104, 119,
+  135, 151, 167, 183, 199, 215, 231, 247, 263, 279, 295, 311, 327, 343, 359,
+  375, 391, 1,   4,   8,   13,  19,  26,  34,  43,  53,  64,  76,  89,  103,
+  118, 134, 150, 166, 182, 198, 214, 230, 246, 262, 278, 294, 310, 326, 342,
+  358, 374, 390, 406, 3,   7,   12,  18,  25,  33,  42,  52,  63,  75,  88,
+  102, 117, 133, 149, 165, 181, 197, 213, 229, 245, 261, 277, 293, 309, 325,
+  341, 357, 373, 389, 405, 420, 6,   11,  17,  24,  32,  41,  51,  62,  74,
+  87,  101, 116, 132, 148, 164, 180, 196, 212, 228, 244, 260, 276, 292, 308,
+  324, 340, 356, 372, 388, 404, 419, 433, 10,  16,  23,  31,  40,  50,  61,
+  73,  86,  100, 115, 131, 147, 163, 179, 195, 211, 227, 243, 259, 275, 291,
+  307, 323, 339, 355, 371, 387, 403, 418, 432, 445, 15,  22,  30,  39,  49,
+  60,  72,  85,  99,  114, 130, 146, 162, 178, 194, 210, 226, 242, 258, 274,
+  290, 306, 322, 338, 354, 370, 386, 402, 417, 431, 444, 456, 21,  29,  38,
+  48,  59,  71,  84,  98,  113, 129, 145, 161, 177, 193, 209, 225, 241, 257,
+  273, 289, 305, 321, 337, 353, 369, 385, 401, 416, 430, 443, 455, 466, 28,
+  37,  47,  58,  70,  83,  97,  112, 128, 144, 160, 176, 192, 208, 224, 240,
+  256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 415, 429, 442, 454, 465,
+  475, 36,  46,  57,  69,  82,  96,  111, 127, 143, 159, 175, 191, 207, 223,
+  239, 255, 271, 287, 303, 319, 335, 351, 367, 383, 399, 414, 428, 441, 453,
+  464, 474, 483, 45,  56,  68,  81,  95,  110, 126, 142, 158, 174, 190, 206,
+  222, 238, 254, 270, 286, 302, 318, 334, 350, 366, 382, 398, 413, 427, 440,
+  452, 463, 473, 482, 490, 55,  67,  80,  94,  109, 125, 141, 157, 173, 189,
+  205, 221, 237, 253, 269, 285, 301, 317, 333, 349, 365, 381, 397, 412, 426,
+  439, 451, 462, 472, 481, 489, 496, 66,  79,  93,  108, 124, 140, 156, 172,
+  188, 204, 220, 236, 252, 268, 284, 300, 316, 332, 348, 364, 380, 396, 411,
+  425, 438, 450, 461, 471, 480, 488, 495, 501, 78,  92,  107, 123, 139, 155,
+  171, 187, 203, 219, 235, 251, 267, 283, 299, 315, 331, 347, 363, 379, 395,
+  410, 424, 437, 449, 460, 470, 479, 487, 494, 500, 505, 91,  106, 122, 138,
+  154, 170, 186, 202, 218, 234, 250, 266, 282, 298, 314, 330, 346, 362, 378,
+  394, 409, 423, 436, 448, 459, 469, 478, 486, 493, 499, 504, 508, 105, 121,
+  137, 153, 169, 185, 201, 217, 233, 249, 265, 281, 297, 313, 329, 345, 361,
+  377, 393, 408, 422, 435, 447, 458, 468, 477, 485, 492, 498, 503, 507, 510,
+  120, 136, 152, 168, 184, 200, 216, 232, 248, 264, 280, 296, 312, 328, 344,
+  360, 376, 392, 407, 421, 434, 446, 457, 467, 476, 484, 491, 497, 502, 506,
+  509, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_16x32[512]) = {
+  0,  32, 64, 96,  128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448, 480,
+  1,  33, 65, 97,  129, 161, 193, 225, 257, 289, 321, 353, 385, 417, 449, 481,
+  2,  34, 66, 98,  130, 162, 194, 226, 258, 290, 322, 354, 386, 418, 450, 482,
+  3,  35, 67, 99,  131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
+  4,  36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356, 388, 420, 452, 484,
+  5,  37, 69, 101, 133, 165, 197, 229, 261, 293, 325, 357, 389, 421, 453, 485,
+  6,  38, 70, 102, 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486,
+  7,  39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423, 455, 487,
+  8,  40, 72, 104, 136, 168, 200, 232, 264, 296, 328, 360, 392, 424, 456, 488,
+  9,  41, 73, 105, 137, 169, 201, 233, 265, 297, 329, 361, 393, 425, 457, 489,
+  10, 42, 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
+  11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363, 395, 427, 459, 491,
+  12, 44, 76, 108, 140, 172, 204, 236, 268, 300, 332, 364, 396, 428, 460, 492,
+  13, 45, 77, 109, 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493,
+  14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430, 462, 494,
+  15, 47, 79, 111, 143, 175, 207, 239, 271, 303, 335, 367, 399, 431, 463, 495,
+  16, 48, 80, 112, 144, 176, 208, 240, 272, 304, 336, 368, 400, 432, 464, 496,
+  17, 49, 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
+  18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370, 402, 434, 466, 498,
+  19, 51, 83, 115, 147, 179, 211, 243, 275, 307, 339, 371, 403, 435, 467, 499,
+  20, 52, 84, 116, 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500,
+  21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437, 469, 501,
+  22, 54, 86, 118, 150, 182, 214, 246, 278, 310, 342, 374, 406, 438, 470, 502,
+  23, 55, 87, 119, 151, 183, 215, 247, 279, 311, 343, 375, 407, 439, 471, 503,
+  24, 56, 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
+  25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377, 409, 441, 473, 505,
+  26, 58, 90, 122, 154, 186, 218, 250, 282, 314, 346, 378, 410, 442, 474, 506,
+  27, 59, 91, 123, 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507,
+  28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444, 476, 508,
+  29, 61, 93, 125, 157, 189, 221, 253, 285, 317, 349, 381, 413, 445, 477, 509,
+  30, 62, 94, 126, 158, 190, 222, 254, 286, 318, 350, 382, 414, 446, 478, 510,
+  31, 63, 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_32x16[512]) = {
+  0,   16,  32,  48,  64,  80,  96,  112, 128, 144, 160, 176, 192, 208, 224,
+  240, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464,
+  480, 496, 1,   17,  33,  49,  65,  81,  97,  113, 129, 145, 161, 177, 193,
+  209, 225, 241, 257, 273, 289, 305, 321, 337, 353, 369, 385, 401, 417, 433,
+  449, 465, 481, 497, 2,   18,  34,  50,  66,  82,  98,  114, 130, 146, 162,
+  178, 194, 210, 226, 242, 258, 274, 290, 306, 322, 338, 354, 370, 386, 402,
+  418, 434, 450, 466, 482, 498, 3,   19,  35,  51,  67,  83,  99,  115, 131,
+  147, 163, 179, 195, 211, 227, 243, 259, 275, 291, 307, 323, 339, 355, 371,
+  387, 403, 419, 435, 451, 467, 483, 499, 4,   20,  36,  52,  68,  84,  100,
+  116, 132, 148, 164, 180, 196, 212, 228, 244, 260, 276, 292, 308, 324, 340,
+  356, 372, 388, 404, 420, 436, 452, 468, 484, 500, 5,   21,  37,  53,  69,
+  85,  101, 117, 133, 149, 165, 181, 197, 213, 229, 245, 261, 277, 293, 309,
+  325, 341, 357, 373, 389, 405, 421, 437, 453, 469, 485, 501, 6,   22,  38,
+  54,  70,  86,  102, 118, 134, 150, 166, 182, 198, 214, 230, 246, 262, 278,
+  294, 310, 326, 342, 358, 374, 390, 406, 422, 438, 454, 470, 486, 502, 7,
+  23,  39,  55,  71,  87,  103, 119, 135, 151, 167, 183, 199, 215, 231, 247,
+  263, 279, 295, 311, 327, 343, 359, 375, 391, 407, 423, 439, 455, 471, 487,
+  503, 8,   24,  40,  56,  72,  88,  104, 120, 136, 152, 168, 184, 200, 216,
+  232, 248, 264, 280, 296, 312, 328, 344, 360, 376, 392, 408, 424, 440, 456,
+  472, 488, 504, 9,   25,  41,  57,  73,  89,  105, 121, 137, 153, 169, 185,
+  201, 217, 233, 249, 265, 281, 297, 313, 329, 345, 361, 377, 393, 409, 425,
+  441, 457, 473, 489, 505, 10,  26,  42,  58,  74,  90,  106, 122, 138, 154,
+  170, 186, 202, 218, 234, 250, 266, 282, 298, 314, 330, 346, 362, 378, 394,
+  410, 426, 442, 458, 474, 490, 506, 11,  27,  43,  59,  75,  91,  107, 123,
+  139, 155, 171, 187, 203, 219, 235, 251, 267, 283, 299, 315, 331, 347, 363,
+  379, 395, 411, 427, 443, 459, 475, 491, 507, 12,  28,  44,  60,  76,  92,
+  108, 124, 140, 156, 172, 188, 204, 220, 236, 252, 268, 284, 300, 316, 332,
+  348, 364, 380, 396, 412, 428, 444, 460, 476, 492, 508, 13,  29,  45,  61,
+  77,  93,  109, 125, 141, 157, 173, 189, 205, 221, 237, 253, 269, 285, 301,
+  317, 333, 349, 365, 381, 397, 413, 429, 445, 461, 477, 493, 509, 14,  30,
+  46,  62,  78,  94,  110, 126, 142, 158, 174, 190, 206, 222, 238, 254, 270,
+  286, 302, 318, 334, 350, 366, 382, 398, 414, 430, 446, 462, 478, 494, 510,
+  15,  31,  47,  63,  79,  95,  111, 127, 143, 159, 175, 191, 207, 223, 239,
+  255, 271, 287, 303, 319, 335, 351, 367, 383, 399, 415, 431, 447, 463, 479,
+  495, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x32[512]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
+  270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
+  285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
+  300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
+  315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
+  330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
+  345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
+  360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
+  375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
+  390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
+  405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
+  420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
+  435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
+  450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
+  465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
+  480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
+  495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
+  510, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_32x16[512]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
+  270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
+  285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
+  300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
+  315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
+  330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
+  345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
+  360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
+  375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
+  390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
+  405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
+  420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
+  435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
+  450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
+  465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
+  480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
+  495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
+  510, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_16x16[256]) = {
+  0,  16, 32, 48, 64, 80, 96,  112, 128, 144, 160, 176, 192, 208, 224, 240,
+  1,  17, 33, 49, 65, 81, 97,  113, 129, 145, 161, 177, 193, 209, 225, 241,
+  2,  18, 34, 50, 66, 82, 98,  114, 130, 146, 162, 178, 194, 210, 226, 242,
+  3,  19, 35, 51, 67, 83, 99,  115, 131, 147, 163, 179, 195, 211, 227, 243,
+  4,  20, 36, 52, 68, 84, 100, 116, 132, 148, 164, 180, 196, 212, 228, 244,
+  5,  21, 37, 53, 69, 85, 101, 117, 133, 149, 165, 181, 197, 213, 229, 245,
+  6,  22, 38, 54, 70, 86, 102, 118, 134, 150, 166, 182, 198, 214, 230, 246,
+  7,  23, 39, 55, 71, 87, 103, 119, 135, 151, 167, 183, 199, 215, 231, 247,
+  8,  24, 40, 56, 72, 88, 104, 120, 136, 152, 168, 184, 200, 216, 232, 248,
+  9,  25, 41, 57, 73, 89, 105, 121, 137, 153, 169, 185, 201, 217, 233, 249,
+  10, 26, 42, 58, 74, 90, 106, 122, 138, 154, 170, 186, 202, 218, 234, 250,
+  11, 27, 43, 59, 75, 91, 107, 123, 139, 155, 171, 187, 203, 219, 235, 251,
+  12, 28, 44, 60, 76, 92, 108, 124, 140, 156, 172, 188, 204, 220, 236, 252,
+  13, 29, 45, 61, 77, 93, 109, 125, 141, 157, 173, 189, 205, 221, 237, 253,
+  14, 30, 46, 62, 78, 94, 110, 126, 142, 158, 174, 190, 206, 222, 238, 254,
+  15, 31, 47, 63, 79, 95, 111, 127, 143, 159, 175, 191, 207, 223, 239, 255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x16[256]) = {
+  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,   10,  11,  12,  13,  14,
+  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,
+  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,
+  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,
+  60,  61,  62,  63,  64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,
+  75,  76,  77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
+  90,  91,  92,  93,  94,  95,  96,  97,  98,  99,  100, 101, 102, 103, 104,
+  105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+  120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+  135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+  150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+  165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+  180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+  195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+  210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+  225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+  240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+  255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x16[256]) = {
+  0,   1,   5,   6,   14,  15,  27,  28,  44,  45,  65,  66,  90,  91,  119,
+  120, 2,   4,   7,   13,  16,  26,  29,  43,  46,  64,  67,  89,  92,  118,
+  121, 150, 3,   8,   12,  17,  25,  30,  42,  47,  63,  68,  88,  93,  117,
+  122, 149, 151, 9,   11,  18,  24,  31,  41,  48,  62,  69,  87,  94,  116,
+  123, 148, 152, 177, 10,  19,  23,  32,  40,  49,  61,  70,  86,  95,  115,
+  124, 147, 153, 176, 178, 20,  22,  33,  39,  50,  60,  71,  85,  96,  114,
+  125, 146, 154, 175, 179, 200, 21,  34,  38,  51,  59,  72,  84,  97,  113,
+  126, 145, 155, 174, 180, 199, 201, 35,  37,  52,  58,  73,  83,  98,  112,
+  127, 144, 156, 173, 181, 198, 202, 219, 36,  53,  57,  74,  82,  99,  111,
+  128, 143, 157, 172, 182, 197, 203, 218, 220, 54,  56,  75,  81,  100, 110,
+  129, 142, 158, 171, 183, 196, 204, 217, 221, 234, 55,  76,  80,  101, 109,
+  130, 141, 159, 170, 184, 195, 205, 216, 222, 233, 235, 77,  79,  102, 108,
+  131, 140, 160, 169, 185, 194, 206, 215, 223, 232, 236, 245, 78,  103, 107,
+  132, 139, 161, 168, 186, 193, 207, 214, 224, 231, 237, 244, 246, 104, 106,
+  133, 138, 162, 167, 187, 192, 208, 213, 225, 230, 238, 243, 247, 252, 105,
+  134, 137, 163, 166, 188, 191, 209, 212, 226, 229, 239, 242, 248, 251, 253,
+  135, 136, 164, 165, 189, 190, 210, 211, 227, 228, 240, 241, 249, 250, 254,
+  255
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_32x32[1024]) = {
+  0,   32,   64,  96,   128, 160,  192, 224,  256, 288,  320, 352,  384, 416,
+  448, 480,  512, 544,  576, 608,  640, 672,  704, 736,  768, 800,  832, 864,
+  896, 928,  960, 992,  1,   33,   65,  97,   129, 161,  193, 225,  257, 289,
+  321, 353,  385, 417,  449, 481,  513, 545,  577, 609,  641, 673,  705, 737,
+  769, 801,  833, 865,  897, 929,  961, 993,  2,   34,   66,  98,   130, 162,
+  194, 226,  258, 290,  322, 354,  386, 418,  450, 482,  514, 546,  578, 610,
+  642, 674,  706, 738,  770, 802,  834, 866,  898, 930,  962, 994,  3,   35,
+  67,  99,   131, 163,  195, 227,  259, 291,  323, 355,  387, 419,  451, 483,
+  515, 547,  579, 611,  643, 675,  707, 739,  771, 803,  835, 867,  899, 931,
+  963, 995,  4,   36,   68,  100,  132, 164,  196, 228,  260, 292,  324, 356,
+  388, 420,  452, 484,  516, 548,  580, 612,  644, 676,  708, 740,  772, 804,
+  836, 868,  900, 932,  964, 996,  5,   37,   69,  101,  133, 165,  197, 229,
+  261, 293,  325, 357,  389, 421,  453, 485,  517, 549,  581, 613,  645, 677,
+  709, 741,  773, 805,  837, 869,  901, 933,  965, 997,  6,   38,   70,  102,
+  134, 166,  198, 230,  262, 294,  326, 358,  390, 422,  454, 486,  518, 550,
+  582, 614,  646, 678,  710, 742,  774, 806,  838, 870,  902, 934,  966, 998,
+  7,   39,   71,  103,  135, 167,  199, 231,  263, 295,  327, 359,  391, 423,
+  455, 487,  519, 551,  583, 615,  647, 679,  711, 743,  775, 807,  839, 871,
+  903, 935,  967, 999,  8,   40,   72,  104,  136, 168,  200, 232,  264, 296,
+  328, 360,  392, 424,  456, 488,  520, 552,  584, 616,  648, 680,  712, 744,
+  776, 808,  840, 872,  904, 936,  968, 1000, 9,   41,   73,  105,  137, 169,
+  201, 233,  265, 297,  329, 361,  393, 425,  457, 489,  521, 553,  585, 617,
+  649, 681,  713, 745,  777, 809,  841, 873,  905, 937,  969, 1001, 10,  42,
+  74,  106,  138, 170,  202, 234,  266, 298,  330, 362,  394, 426,  458, 490,
+  522, 554,  586, 618,  650, 682,  714, 746,  778, 810,  842, 874,  906, 938,
+  970, 1002, 11,  43,   75,  107,  139, 171,  203, 235,  267, 299,  331, 363,
+  395, 427,  459, 491,  523, 555,  587, 619,  651, 683,  715, 747,  779, 811,
+  843, 875,  907, 939,  971, 1003, 12,  44,   76,  108,  140, 172,  204, 236,
+  268, 300,  332, 364,  396, 428,  460, 492,  524, 556,  588, 620,  652, 684,
+  716, 748,  780, 812,  844, 876,  908, 940,  972, 1004, 13,  45,   77,  109,
+  141, 173,  205, 237,  269, 301,  333, 365,  397, 429,  461, 493,  525, 557,
+  589, 621,  653, 685,  717, 749,  781, 813,  845, 877,  909, 941,  973, 1005,
+  14,  46,   78,  110,  142, 174,  206, 238,  270, 302,  334, 366,  398, 430,
+  462, 494,  526, 558,  590, 622,  654, 686,  718, 750,  782, 814,  846, 878,
+  910, 942,  974, 1006, 15,  47,   79,  111,  143, 175,  207, 239,  271, 303,
+  335, 367,  399, 431,  463, 495,  527, 559,  591, 623,  655, 687,  719, 751,
+  783, 815,  847, 879,  911, 943,  975, 1007, 16,  48,   80,  112,  144, 176,
+  208, 240,  272, 304,  336, 368,  400, 432,  464, 496,  528, 560,  592, 624,
+  656, 688,  720, 752,  784, 816,  848, 880,  912, 944,  976, 1008, 17,  49,
+  81,  113,  145, 177,  209, 241,  273, 305,  337, 369,  401, 433,  465, 497,
+  529, 561,  593, 625,  657, 689,  721, 753,  785, 817,  849, 881,  913, 945,
+  977, 1009, 18,  50,   82,  114,  146, 178,  210, 242,  274, 306,  338, 370,
+  402, 434,  466, 498,  530, 562,  594, 626,  658, 690,  722, 754,  786, 818,
+  850, 882,  914, 946,  978, 1010, 19,  51,   83,  115,  147, 179,  211, 243,
+  275, 307,  339, 371,  403, 435,  467, 499,  531, 563,  595, 627,  659, 691,
+  723, 755,  787, 819,  851, 883,  915, 947,  979, 1011, 20,  52,   84,  116,
+  148, 180,  212, 244,  276, 308,  340, 372,  404, 436,  468, 500,  532, 564,
+  596, 628,  660, 692,  724, 756,  788, 820,  852, 884,  916, 948,  980, 1012,
+  21,  53,   85,  117,  149, 181,  213, 245,  277, 309,  341, 373,  405, 437,
+  469, 501,  533, 565,  597, 629,  661, 693,  725, 757,  789, 821,  853, 885,
+  917, 949,  981, 1013, 22,  54,   86,  118,  150, 182,  214, 246,  278, 310,
+  342, 374,  406, 438,  470, 502,  534, 566,  598, 630,  662, 694,  726, 758,
+  790, 822,  854, 886,  918, 950,  982, 1014, 23,  55,   87,  119,  151, 183,
+  215, 247,  279, 311,  343, 375,  407, 439,  471, 503,  535, 567,  599, 631,
+  663, 695,  727, 759,  791, 823,  855, 887,  919, 951,  983, 1015, 24,  56,
+  88,  120,  152, 184,  216, 248,  280, 312,  344, 376,  408, 440,  472, 504,
+  536, 568,  600, 632,  664, 696,  728, 760,  792, 824,  856, 888,  920, 952,
+  984, 1016, 25,  57,   89,  121,  153, 185,  217, 249,  281, 313,  345, 377,
+  409, 441,  473, 505,  537, 569,  601, 633,  665, 697,  729, 761,  793, 825,
+  857, 889,  921, 953,  985, 1017, 26,  58,   90,  122,  154, 186,  218, 250,
+  282, 314,  346, 378,  410, 442,  474, 506,  538, 570,  602, 634,  666, 698,
+  730, 762,  794, 826,  858, 890,  922, 954,  986, 1018, 27,  59,   91,  123,
+  155, 187,  219, 251,  283, 315,  347, 379,  411, 443,  475, 507,  539, 571,
+  603, 635,  667, 699,  731, 763,  795, 827,  859, 891,  923, 955,  987, 1019,
+  28,  60,   92,  124,  156, 188,  220, 252,  284, 316,  348, 380,  412, 444,
+  476, 508,  540, 572,  604, 636,  668, 700,  732, 764,  796, 828,  860, 892,
+  924, 956,  988, 1020, 29,  61,   93,  125,  157, 189,  221, 253,  285, 317,
+  349, 381,  413, 445,  477, 509,  541, 573,  605, 637,  669, 701,  733, 765,
+  797, 829,  861, 893,  925, 957,  989, 1021, 30,  62,   94,  126,  158, 190,
+  222, 254,  286, 318,  350, 382,  414, 446,  478, 510,  542, 574,  606, 638,
+  670, 702,  734, 766,  798, 830,  862, 894,  926, 958,  990, 1022, 31,  63,
+  95,  127,  159, 191,  223, 255,  287, 319,  351, 383,  415, 447,  479, 511,
+  543, 575,  607, 639,  671, 703,  735, 767,  799, 831,  863, 895,  927, 959,
+  991, 1023,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_32x32[1024]) = {
+  0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    10,   11,   12,
+  13,   14,   15,   16,   17,   18,   19,   20,   21,   22,   23,   24,   25,
+  26,   27,   28,   29,   30,   31,   32,   33,   34,   35,   36,   37,   38,
+  39,   40,   41,   42,   43,   44,   45,   46,   47,   48,   49,   50,   51,
+  52,   53,   54,   55,   56,   57,   58,   59,   60,   61,   62,   63,   64,
+  65,   66,   67,   68,   69,   70,   71,   72,   73,   74,   75,   76,   77,
+  78,   79,   80,   81,   82,   83,   84,   85,   86,   87,   88,   89,   90,
+  91,   92,   93,   94,   95,   96,   97,   98,   99,   100,  101,  102,  103,
+  104,  105,  106,  107,  108,  109,  110,  111,  112,  113,  114,  115,  116,
+  117,  118,  119,  120,  121,  122,  123,  124,  125,  126,  127,  128,  129,
+  130,  131,  132,  133,  134,  135,  136,  137,  138,  139,  140,  141,  142,
+  143,  144,  145,  146,  147,  148,  149,  150,  151,  152,  153,  154,  155,
+  156,  157,  158,  159,  160,  161,  162,  163,  164,  165,  166,  167,  168,
+  169,  170,  171,  172,  173,  174,  175,  176,  177,  178,  179,  180,  181,
+  182,  183,  184,  185,  186,  187,  188,  189,  190,  191,  192,  193,  194,
+  195,  196,  197,  198,  199,  200,  201,  202,  203,  204,  205,  206,  207,
+  208,  209,  210,  211,  212,  213,  214,  215,  216,  217,  218,  219,  220,
+  221,  222,  223,  224,  225,  226,  227,  228,  229,  230,  231,  232,  233,
+  234,  235,  236,  237,  238,  239,  240,  241,  242,  243,  244,  245,  246,
+  247,  248,  249,  250,  251,  252,  253,  254,  255,  256,  257,  258,  259,
+  260,  261,  262,  263,  264,  265,  266,  267,  268,  269,  270,  271,  272,
+  273,  274,  275,  276,  277,  278,  279,  280,  281,  282,  283,  284,  285,
+  286,  287,  288,  289,  290,  291,  292,  293,  294,  295,  296,  297,  298,
+  299,  300,  301,  302,  303,  304,  305,  306,  307,  308,  309,  310,  311,
+  312,  313,  314,  315,  316,  317,  318,  319,  320,  321,  322,  323,  324,
+  325,  326,  327,  328,  329,  330,  331,  332,  333,  334,  335,  336,  337,
+  338,  339,  340,  341,  342,  343,  344,  345,  346,  347,  348,  349,  350,
+  351,  352,  353,  354,  355,  356,  357,  358,  359,  360,  361,  362,  363,
+  364,  365,  366,  367,  368,  369,  370,  371,  372,  373,  374,  375,  376,
+  377,  378,  379,  380,  381,  382,  383,  384,  385,  386,  387,  388,  389,
+  390,  391,  392,  393,  394,  395,  396,  397,  398,  399,  400,  401,  402,
+  403,  404,  405,  406,  407,  408,  409,  410,  411,  412,  413,  414,  415,
+  416,  417,  418,  419,  420,  421,  422,  423,  424,  425,  426,  427,  428,
+  429,  430,  431,  432,  433,  434,  435,  436,  437,  438,  439,  440,  441,
+  442,  443,  444,  445,  446,  447,  448,  449,  450,  451,  452,  453,  454,
+  455,  456,  457,  458,  459,  460,  461,  462,  463,  464,  465,  466,  467,
+  468,  469,  470,  471,  472,  473,  474,  475,  476,  477,  478,  479,  480,
+  481,  482,  483,  484,  485,  486,  487,  488,  489,  490,  491,  492,  493,
+  494,  495,  496,  497,  498,  499,  500,  501,  502,  503,  504,  505,  506,
+  507,  508,  509,  510,  511,  512,  513,  514,  515,  516,  517,  518,  519,
+  520,  521,  522,  523,  524,  525,  526,  527,  528,  529,  530,  531,  532,
+  533,  534,  535,  536,  537,  538,  539,  540,  541,  542,  543,  544,  545,
+  546,  547,  548,  549,  550,  551,  552,  553,  554,  555,  556,  557,  558,
+  559,  560,  561,  562,  563,  564,  565,  566,  567,  568,  569,  570,  571,
+  572,  573,  574,  575,  576,  577,  578,  579,  580,  581,  582,  583,  584,
+  585,  586,  587,  588,  589,  590,  591,  592,  593,  594,  595,  596,  597,
+  598,  599,  600,  601,  602,  603,  604,  605,  606,  607,  608,  609,  610,
+  611,  612,  613,  614,  615,  616,  617,  618,  619,  620,  621,  622,  623,
+  624,  625,  626,  627,  628,  629,  630,  631,  632,  633,  634,  635,  636,
+  637,  638,  639,  640,  641,  642,  643,  644,  645,  646,  647,  648,  649,
+  650,  651,  652,  653,  654,  655,  656,  657,  658,  659,  660,  661,  662,
+  663,  664,  665,  666,  667,  668,  669,  670,  671,  672,  673,  674,  675,
+  676,  677,  678,  679,  680,  681,  682,  683,  684,  685,  686,  687,  688,
+  689,  690,  691,  692,  693,  694,  695,  696,  697,  698,  699,  700,  701,
+  702,  703,  704,  705,  706,  707,  708,  709,  710,  711,  712,  713,  714,
+  715,  716,  717,  718,  719,  720,  721,  722,  723,  724,  725,  726,  727,
+  728,  729,  730,  731,  732,  733,  734,  735,  736,  737,  738,  739,  740,
+  741,  742,  743,  744,  745,  746,  747,  748,  749,  750,  751,  752,  753,
+  754,  755,  756,  757,  758,  759,  760,  761,  762,  763,  764,  765,  766,
+  767,  768,  769,  770,  771,  772,  773,  774,  775,  776,  777,  778,  779,
+  780,  781,  782,  783,  784,  785,  786,  787,  788,  789,  790,  791,  792,
+  793,  794,  795,  796,  797,  798,  799,  800,  801,  802,  803,  804,  805,
+  806,  807,  808,  809,  810,  811,  812,  813,  814,  815,  816,  817,  818,
+  819,  820,  821,  822,  823,  824,  825,  826,  827,  828,  829,  830,  831,
+  832,  833,  834,  835,  836,  837,  838,  839,  840,  841,  842,  843,  844,
+  845,  846,  847,  848,  849,  850,  851,  852,  853,  854,  855,  856,  857,
+  858,  859,  860,  861,  862,  863,  864,  865,  866,  867,  868,  869,  870,
+  871,  872,  873,  874,  875,  876,  877,  878,  879,  880,  881,  882,  883,
+  884,  885,  886,  887,  888,  889,  890,  891,  892,  893,  894,  895,  896,
+  897,  898,  899,  900,  901,  902,  903,  904,  905,  906,  907,  908,  909,
+  910,  911,  912,  913,  914,  915,  916,  917,  918,  919,  920,  921,  922,
+  923,  924,  925,  926,  927,  928,  929,  930,  931,  932,  933,  934,  935,
+  936,  937,  938,  939,  940,  941,  942,  943,  944,  945,  946,  947,  948,
+  949,  950,  951,  952,  953,  954,  955,  956,  957,  958,  959,  960,  961,
+  962,  963,  964,  965,  966,  967,  968,  969,  970,  971,  972,  973,  974,
+  975,  976,  977,  978,  979,  980,  981,  982,  983,  984,  985,  986,  987,
+  988,  989,  990,  991,  992,  993,  994,  995,  996,  997,  998,  999,  1000,
+  1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013,
+  1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_32x32[1024]) = {
+  0,    1,    5,    6,    14,   15,   27,   28,   44,   45,   65,   66,   90,
+  91,   119,  120,  152,  153,  189,  190,  230,  231,  275,  276,  324,  325,
+  377,  378,  434,  435,  495,  496,  2,    4,    7,    13,   16,   26,   29,
+  43,   46,   64,   67,   89,   92,   118,  121,  151,  154,  188,  191,  229,
+  232,  274,  277,  323,  326,  376,  379,  433,  436,  494,  497,  558,  3,
+  8,    12,   17,   25,   30,   42,   47,   63,   68,   88,   93,   117,  122,
+  150,  155,  187,  192,  228,  233,  273,  278,  322,  327,  375,  380,  432,
+  437,  493,  498,  557,  559,  9,    11,   18,   24,   31,   41,   48,   62,
+  69,   87,   94,   116,  123,  149,  156,  186,  193,  227,  234,  272,  279,
+  321,  328,  374,  381,  431,  438,  492,  499,  556,  560,  617,  10,   19,
+  23,   32,   40,   49,   61,   70,   86,   95,   115,  124,  148,  157,  185,
+  194,  226,  235,  271,  280,  320,  329,  373,  382,  430,  439,  491,  500,
+  555,  561,  616,  618,  20,   22,   33,   39,   50,   60,   71,   85,   96,
+  114,  125,  147,  158,  184,  195,  225,  236,  270,  281,  319,  330,  372,
+  383,  429,  440,  490,  501,  554,  562,  615,  619,  672,  21,   34,   38,
+  51,   59,   72,   84,   97,   113,  126,  146,  159,  183,  196,  224,  237,
+  269,  282,  318,  331,  371,  384,  428,  441,  489,  502,  553,  563,  614,
+  620,  671,  673,  35,   37,   52,   58,   73,   83,   98,   112,  127,  145,
+  160,  182,  197,  223,  238,  268,  283,  317,  332,  370,  385,  427,  442,
+  488,  503,  552,  564,  613,  621,  670,  674,  723,  36,   53,   57,   74,
+  82,   99,   111,  128,  144,  161,  181,  198,  222,  239,  267,  284,  316,
+  333,  369,  386,  426,  443,  487,  504,  551,  565,  612,  622,  669,  675,
+  722,  724,  54,   56,   75,   81,   100,  110,  129,  143,  162,  180,  199,
+  221,  240,  266,  285,  315,  334,  368,  387,  425,  444,  486,  505,  550,
+  566,  611,  623,  668,  676,  721,  725,  770,  55,   76,   80,   101,  109,
+  130,  142,  163,  179,  200,  220,  241,  265,  286,  314,  335,  367,  388,
+  424,  445,  485,  506,  549,  567,  610,  624,  667,  677,  720,  726,  769,
+  771,  77,   79,   102,  108,  131,  141,  164,  178,  201,  219,  242,  264,
+  287,  313,  336,  366,  389,  423,  446,  484,  507,  548,  568,  609,  625,
+  666,  678,  719,  727,  768,  772,  813,  78,   103,  107,  132,  140,  165,
+  177,  202,  218,  243,  263,  288,  312,  337,  365,  390,  422,  447,  483,
+  508,  547,  569,  608,  626,  665,  679,  718,  728,  767,  773,  812,  814,
+  104,  106,  133,  139,  166,  176,  203,  217,  244,  262,  289,  311,  338,
+  364,  391,  421,  448,  482,  509,  546,  570,  607,  627,  664,  680,  717,
+  729,  766,  774,  811,  815,  852,  105,  134,  138,  167,  175,  204,  216,
+  245,  261,  290,  310,  339,  363,  392,  420,  449,  481,  510,  545,  571,
+  606,  628,  663,  681,  716,  730,  765,  775,  810,  816,  851,  853,  135,
+  137,  168,  174,  205,  215,  246,  260,  291,  309,  340,  362,  393,  419,
+  450,  480,  511,  544,  572,  605,  629,  662,  682,  715,  731,  764,  776,
+  809,  817,  850,  854,  887,  136,  169,  173,  206,  214,  247,  259,  292,
+  308,  341,  361,  394,  418,  451,  479,  512,  543,  573,  604,  630,  661,
+  683,  714,  732,  763,  777,  808,  818,  849,  855,  886,  888,  170,  172,
+  207,  213,  248,  258,  293,  307,  342,  360,  395,  417,  452,  478,  513,
+  542,  574,  603,  631,  660,  684,  713,  733,  762,  778,  807,  819,  848,
+  856,  885,  889,  918,  171,  208,  212,  249,  257,  294,  306,  343,  359,
+  396,  416,  453,  477,  514,  541,  575,  602,  632,  659,  685,  712,  734,
+  761,  779,  806,  820,  847,  857,  884,  890,  917,  919,  209,  211,  250,
+  256,  295,  305,  344,  358,  397,  415,  454,  476,  515,  540,  576,  601,
+  633,  658,  686,  711,  735,  760,  780,  805,  821,  846,  858,  883,  891,
+  916,  920,  945,  210,  251,  255,  296,  304,  345,  357,  398,  414,  455,
+  475,  516,  539,  577,  600,  634,  657,  687,  710,  736,  759,  781,  804,
+  822,  845,  859,  882,  892,  915,  921,  944,  946,  252,  254,  297,  303,
+  346,  356,  399,  413,  456,  474,  517,  538,  578,  599,  635,  656,  688,
+  709,  737,  758,  782,  803,  823,  844,  860,  881,  893,  914,  922,  943,
+  947,  968,  253,  298,  302,  347,  355,  400,  412,  457,  473,  518,  537,
+  579,  598,  636,  655,  689,  708,  738,  757,  783,  802,  824,  843,  861,
+  880,  894,  913,  923,  942,  948,  967,  969,  299,  301,  348,  354,  401,
+  411,  458,  472,  519,  536,  580,  597,  637,  654,  690,  707,  739,  756,
+  784,  801,  825,  842,  862,  879,  895,  912,  924,  941,  949,  966,  970,
+  987,  300,  349,  353,  402,  410,  459,  471,  520,  535,  581,  596,  638,
+  653,  691,  706,  740,  755,  785,  800,  826,  841,  863,  878,  896,  911,
+  925,  940,  950,  965,  971,  986,  988,  350,  352,  403,  409,  460,  470,
+  521,  534,  582,  595,  639,  652,  692,  705,  741,  754,  786,  799,  827,
+  840,  864,  877,  897,  910,  926,  939,  951,  964,  972,  985,  989,  1002,
+  351,  404,  408,  461,  469,  522,  533,  583,  594,  640,  651,  693,  704,
+  742,  753,  787,  798,  828,  839,  865,  876,  898,  909,  927,  938,  952,
+  963,  973,  984,  990,  1001, 1003, 405,  407,  462,  468,  523,  532,  584,
+  593,  641,  650,  694,  703,  743,  752,  788,  797,  829,  838,  866,  875,
+  899,  908,  928,  937,  953,  962,  974,  983,  991,  1000, 1004, 1013, 406,
+  463,  467,  524,  531,  585,  592,  642,  649,  695,  702,  744,  751,  789,
+  796,  830,  837,  867,  874,  900,  907,  929,  936,  954,  961,  975,  982,
+  992,  999,  1005, 1012, 1014, 464,  466,  525,  530,  586,  591,  643,  648,
+  696,  701,  745,  750,  790,  795,  831,  836,  868,  873,  901,  906,  930,
+  935,  955,  960,  976,  981,  993,  998,  1006, 1011, 1015, 1020, 465,  526,
+  529,  587,  590,  644,  647,  697,  700,  746,  749,  791,  794,  832,  835,
+  869,  872,  902,  905,  931,  934,  956,  959,  977,  980,  994,  997,  1007,
+  1010, 1016, 1019, 1021, 527,  528,  588,  589,  645,  646,  698,  699,  747,
+  748,  792,  793,  833,  834,  870,  871,  903,  904,  932,  933,  957,  958,
+  978,  979,  995,  996,  1008, 1009, 1017, 1018, 1022, 1023
+};
+
+const SCAN_ORDER av1_default_scan_orders[TX_SIZES] = {
+  { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+  { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+  { default_scan_16x16, av1_default_iscan_16x16, default_scan_16x16_neighbors },
+  { default_scan_32x32, av1_default_iscan_32x32, default_scan_32x32_neighbors },
+  // Half of the coefficients of tx64 at higher frequencies are set to
+  // zeros. So tx32's scan order is used.
+  { default_scan_32x32, av1_default_iscan_32x32, default_scan_32x32_neighbors },
+};
+
+const SCAN_ORDER av1_scan_orders[TX_SIZES_ALL][TX_TYPES] = {
+  {
+      // TX_4X4
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { default_scan_4x4, av1_default_iscan_4x4, default_scan_4x4_neighbors },
+      { mrow_scan_4x4, av1_mrow_iscan_4x4, mrow_scan_4x4_neighbors },
+      { mcol_scan_4x4, av1_mcol_iscan_4x4, mcol_scan_4x4_neighbors },
+      { mrow_scan_4x4, av1_mrow_iscan_4x4, mrow_scan_4x4_neighbors },
+      { mcol_scan_4x4, av1_mcol_iscan_4x4, mcol_scan_4x4_neighbors },
+      { mrow_scan_4x4, av1_mrow_iscan_4x4, mrow_scan_4x4_neighbors },
+      { mcol_scan_4x4, av1_mcol_iscan_4x4, mcol_scan_4x4_neighbors },
+  },
+  {
+      // TX_8X8
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { default_scan_8x8, av1_default_iscan_8x8, default_scan_8x8_neighbors },
+      { mrow_scan_8x8, av1_mrow_iscan_8x8, mrow_scan_8x8_neighbors },
+      { mcol_scan_8x8, av1_mcol_iscan_8x8, mcol_scan_8x8_neighbors },
+      { mrow_scan_8x8, av1_mrow_iscan_8x8, mrow_scan_8x8_neighbors },
+      { mcol_scan_8x8, av1_mcol_iscan_8x8, mcol_scan_8x8_neighbors },
+      { mrow_scan_8x8, av1_mrow_iscan_8x8, mrow_scan_8x8_neighbors },
+      { mcol_scan_8x8, av1_mcol_iscan_8x8, mcol_scan_8x8_neighbors },
+  },
+  {
+      // TX_16X16
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { default_scan_16x16, av1_default_iscan_16x16,
+        default_scan_16x16_neighbors },
+      { mrow_scan_16x16, av1_mrow_iscan_16x16, mrow_scan_16x16_neighbors },
+      { mcol_scan_16x16, av1_mcol_iscan_16x16, mcol_scan_16x16_neighbors },
+      { mrow_scan_16x16, av1_mrow_iscan_16x16, mrow_scan_16x16_neighbors },
+      { mcol_scan_16x16, av1_mcol_iscan_16x16, mcol_scan_16x16_neighbors },
+      { mrow_scan_16x16, av1_mrow_iscan_16x16, mrow_scan_16x16_neighbors },
+      { mcol_scan_16x16, av1_mcol_iscan_16x16, mcol_scan_16x16_neighbors },
+  },
+  {
+      // TX_32X32
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+  },
+  {
+      // TX_64X64
+      // Half of the coefficients of tx64 at higher frequencies are set to
+      // zeros. So tx32's scan order is used.
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+  },
+  {
+      // TX_4X8
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { default_scan_4x8, av1_default_iscan_4x8, default_scan_4x8_neighbors },
+      { mrow_scan_4x8, av1_mrow_iscan_4x8, mrow_scan_4x8_neighbors },
+      { mcol_scan_4x8, av1_mcol_iscan_4x8, mcol_scan_4x8_neighbors },
+      { mrow_scan_4x8, av1_mrow_iscan_4x8, mrow_scan_4x8_neighbors },
+      { mcol_scan_4x8, av1_mcol_iscan_4x8, mcol_scan_4x8_neighbors },
+      { mrow_scan_4x8, av1_mrow_iscan_4x8, mrow_scan_4x8_neighbors },
+      { mcol_scan_4x8, av1_mcol_iscan_4x8, mcol_scan_4x8_neighbors },
+  },
+  {
+      // TX_8X4
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { default_scan_8x4, av1_default_iscan_8x4, default_scan_8x4_neighbors },
+      { mrow_scan_8x4, av1_mrow_iscan_8x4, mrow_scan_8x4_neighbors },
+      { mcol_scan_8x4, av1_mcol_iscan_8x4, mcol_scan_8x4_neighbors },
+      { mrow_scan_8x4, av1_mrow_iscan_8x4, mrow_scan_8x4_neighbors },
+      { mcol_scan_8x4, av1_mcol_iscan_8x4, mcol_scan_8x4_neighbors },
+      { mrow_scan_8x4, av1_mrow_iscan_8x4, mrow_scan_8x4_neighbors },
+      { mcol_scan_8x4, av1_mcol_iscan_8x4, mcol_scan_8x4_neighbors },
+  },
+  {
+      // TX_8X16
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { default_scan_8x16, av1_default_iscan_8x16,
+        default_scan_8x16_neighbors },
+      { mrow_scan_8x16, av1_mrow_iscan_8x16, mrow_scan_8x16_neighbors },
+      { mcol_scan_8x16, av1_mcol_iscan_8x16, mcol_scan_8x16_neighbors },
+      { mrow_scan_8x16, av1_mrow_iscan_8x16, mrow_scan_8x16_neighbors },
+      { mcol_scan_8x16, av1_mcol_iscan_8x16, mcol_scan_8x16_neighbors },
+      { mrow_scan_8x16, av1_mrow_iscan_8x16, mrow_scan_8x16_neighbors },
+      { mcol_scan_8x16, av1_mcol_iscan_8x16, mcol_scan_8x16_neighbors },
+  },
+  {
+      // TX_16X8
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { default_scan_16x8, av1_default_iscan_16x8,
+        default_scan_16x8_neighbors },
+      { mrow_scan_16x8, av1_mrow_iscan_16x8, mrow_scan_16x8_neighbors },
+      { mcol_scan_16x8, av1_mcol_iscan_16x8, mcol_scan_16x8_neighbors },
+      { mrow_scan_16x8, av1_mrow_iscan_16x8, mrow_scan_16x8_neighbors },
+      { mcol_scan_16x8, av1_mcol_iscan_16x8, mcol_scan_16x8_neighbors },
+      { mrow_scan_16x8, av1_mrow_iscan_16x8, mrow_scan_16x8_neighbors },
+      { mcol_scan_16x8, av1_mcol_iscan_16x8, mcol_scan_16x8_neighbors },
+  },
+  {
+      // TX_16X32
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { mrow_scan_16x32, av1_mrow_iscan_16x32, mrow_scan_16x32_neighbors },
+      { mcol_scan_16x32, av1_mcol_iscan_16x32, mcol_scan_16x32_neighbors },
+      { mrow_scan_16x32, av1_mrow_iscan_16x32, mrow_scan_16x32_neighbors },
+      { mcol_scan_16x32, av1_mcol_iscan_16x32, mcol_scan_16x32_neighbors },
+      { mrow_scan_16x32, av1_mrow_iscan_16x32, mrow_scan_16x32_neighbors },
+      { mcol_scan_16x32, av1_mcol_iscan_16x32, mcol_scan_16x32_neighbors },
+  },
+  {
+      // TX_32X16
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { mrow_scan_32x16, av1_mrow_iscan_32x16, mrow_scan_32x16_neighbors },
+      { mcol_scan_32x16, av1_mcol_iscan_32x16, mcol_scan_32x16_neighbors },
+      { mrow_scan_32x16, av1_mrow_iscan_32x16, mrow_scan_32x16_neighbors },
+      { mcol_scan_32x16, av1_mcol_iscan_32x16, mcol_scan_32x16_neighbors },
+      { mrow_scan_32x16, av1_mrow_iscan_32x16, mrow_scan_32x16_neighbors },
+      { mcol_scan_32x16, av1_mcol_iscan_32x16, mcol_scan_32x16_neighbors },
+  },
+  {
+      // TX_32X64
+      // Half of the coefficients of tx64 at higher frequencies are set to
+      // zeros. So tx32's scan order is used.
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+  },
+  {
+      // TX_64X32
+      // Half of the coefficients of tx64 at higher frequencies are set to
+      // zeros. So tx32's scan order is used.
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { default_scan_32x32, av1_default_iscan_32x32,
+        default_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+      { mrow_scan_32x32, av1_mrow_iscan_32x32, mrow_scan_32x32_neighbors },
+      { mcol_scan_32x32, av1_mcol_iscan_32x32, mcol_scan_32x32_neighbors },
+  },
+  {
+      // TX_4X16
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { default_scan_4x16, av1_default_iscan_4x16,
+        default_scan_4x16_neighbors },
+      { mrow_scan_4x16, av1_mrow_iscan_4x16, mrow_scan_4x16_neighbors },
+      { mcol_scan_4x16, av1_mcol_iscan_4x16, mcol_scan_4x16_neighbors },
+      { mrow_scan_4x16, av1_mrow_iscan_4x16, mrow_scan_4x16_neighbors },
+      { mcol_scan_4x16, av1_mcol_iscan_4x16, mcol_scan_4x16_neighbors },
+      { mrow_scan_4x16, av1_mrow_iscan_4x16, mrow_scan_4x16_neighbors },
+      { mcol_scan_4x16, av1_mcol_iscan_4x16, mcol_scan_4x16_neighbors },
+  },
+  {
+      // TX_16X4
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { default_scan_16x4, av1_default_iscan_16x4,
+        default_scan_16x4_neighbors },
+      { mrow_scan_16x4, av1_mrow_iscan_16x4, mrow_scan_16x4_neighbors },
+      { mcol_scan_16x4, av1_mcol_iscan_16x4, mcol_scan_16x4_neighbors },
+      { mrow_scan_16x4, av1_mrow_iscan_16x4, mrow_scan_16x4_neighbors },
+      { mcol_scan_16x4, av1_mcol_iscan_16x4, mcol_scan_16x4_neighbors },
+      { mrow_scan_16x4, av1_mrow_iscan_16x4, mrow_scan_16x4_neighbors },
+      { mcol_scan_16x4, av1_mcol_iscan_16x4, mcol_scan_16x4_neighbors },
+  },
+  {
+      // TX_8X32
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { default_scan_8x32, av1_default_iscan_8x32,
+        default_scan_8x32_neighbors },
+      { mrow_scan_8x32, av1_mrow_iscan_8x32, mrow_scan_8x32_neighbors },
+      { mcol_scan_8x32, av1_mcol_iscan_8x32, mcol_scan_8x32_neighbors },
+      { mrow_scan_8x32, av1_mrow_iscan_8x32, mrow_scan_8x32_neighbors },
+      { mcol_scan_8x32, av1_mcol_iscan_8x32, mcol_scan_8x32_neighbors },
+      { mrow_scan_8x32, av1_mrow_iscan_8x32, mrow_scan_8x32_neighbors },
+      { mcol_scan_8x32, av1_mcol_iscan_8x32, mcol_scan_8x32_neighbors },
+  },
+  {
+      // TX_32X8
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { default_scan_32x8, av1_default_iscan_32x8,
+        default_scan_32x8_neighbors },
+      { mrow_scan_32x8, av1_mrow_iscan_32x8, mrow_scan_32x8_neighbors },
+      { mcol_scan_32x8, av1_mcol_iscan_32x8, mcol_scan_32x8_neighbors },
+      { mrow_scan_32x8, av1_mrow_iscan_32x8, mrow_scan_32x8_neighbors },
+      { mcol_scan_32x8, av1_mcol_iscan_32x8, mcol_scan_32x8_neighbors },
+      { mrow_scan_32x8, av1_mrow_iscan_32x8, mrow_scan_32x8_neighbors },
+      { mcol_scan_32x8, av1_mcol_iscan_32x8, mcol_scan_32x8_neighbors },
+  },
+  {
+      // TX_16X64
+      // Half of the coefficients of tx64 at higher frequencies are set to
+      // zeros. So tx32's scan order is used.
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { default_scan_16x32, av1_default_iscan_16x32,
+        default_scan_16x32_neighbors },
+      { mrow_scan_16x32, av1_mrow_iscan_16x32, mrow_scan_16x32_neighbors },
+      { mcol_scan_16x32, av1_mcol_iscan_16x32, mcol_scan_16x32_neighbors },
+      { mrow_scan_16x32, av1_mrow_iscan_16x32, mrow_scan_16x32_neighbors },
+      { mcol_scan_16x32, av1_mcol_iscan_16x32, mcol_scan_16x32_neighbors },
+      { mrow_scan_16x32, av1_mrow_iscan_16x32, mrow_scan_16x32_neighbors },
+      { mcol_scan_16x32, av1_mcol_iscan_16x32, mcol_scan_16x32_neighbors },
+  },
+  {
+      // TX_64X16
+      // Half of the coefficients of tx64 at higher frequencies are set to
+      // zeros. So tx32's scan order is used.
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { default_scan_32x16, av1_default_iscan_32x16,
+        default_scan_32x16_neighbors },
+      { mrow_scan_32x16, av1_mrow_iscan_32x16, mrow_scan_32x16_neighbors },
+      { mcol_scan_32x16, av1_mcol_iscan_32x16, mcol_scan_32x16_neighbors },
+      { mrow_scan_32x16, av1_mrow_iscan_32x16, mrow_scan_32x16_neighbors },
+      { mcol_scan_32x16, av1_mcol_iscan_32x16, mcol_scan_32x16_neighbors },
+      { mrow_scan_32x16, av1_mrow_iscan_32x16, mrow_scan_32x16_neighbors },
+      { mcol_scan_32x16, av1_mcol_iscan_32x16, mcol_scan_32x16_neighbors },
+  },
+};
+
+
+const int16_t * av1_idct_scans[TX_SIZES_ALL][3] = {
+{ default_scan_4x4        ,    mrow_scan_4x4    ,    mcol_scan_4x4 },    //  Tx_4x4,   
+{ default_scan_8x8      ,    mrow_scan_8x8      ,    mcol_scan_8x8 },    //  Tx_8x8,   
+{ default_scan_16x16    ,    mrow_scan_16x16    ,    mcol_scan_16x16 },    //  Tx_16x16, 
+{ default_scan_32x32    ,    mrow_scan_32x32    ,    mcol_scan_32x32 },    //  Tx_32x32, 
+{ default_scan_32x32    ,    mrow_scan_32x32    ,    mcol_scan_32x32 },    //  Tx_64x64, 
+{ default_scan_4x8      ,    mrow_scan_4x8      ,    mcol_scan_4x8 },    //  Tx_4x8,   
+{ default_scan_8x4      ,    mrow_scan_8x4      ,    mcol_scan_8x4 },    //  Tx_8x4,   
+{ default_scan_8x16     ,    mrow_scan_8x16     ,    mcol_scan_8x16 },    //  Tx_8x16,  
+{ default_scan_16x8     ,    mrow_scan_16x8     ,    mcol_scan_16x8 },    //  Tx_16x8,  
+{ default_scan_16x32    ,    mrow_scan_16x32    ,    mcol_scan_16x32 },    //  Tx_16x32, 
+{ default_scan_32x16    ,    mrow_scan_32x16    ,    mcol_scan_32x16 },    //  Tx_32x16, 
+{ default_scan_32x32    ,    mrow_scan_32x32    ,    mcol_scan_32x32 },    //  Tx_32x64, 
+{ default_scan_32x32    ,    mrow_scan_32x32    ,    mcol_scan_32x32 },    //  Tx_64x32, 
+{ default_scan_4x16     ,    mrow_scan_4x16     ,    mcol_scan_4x16 },    //  Tx_4x16,  
+{ default_scan_16x4     ,    mrow_scan_16x4     ,    mcol_scan_16x4 },    //  Tx_16x4,  
+{ default_scan_8x32     ,    mrow_scan_8x32     ,    mcol_scan_8x32 },    //  Tx_8x32,  
+{ default_scan_32x8     ,    mrow_scan_32x8     ,    mcol_scan_32x8 },    //  Tx_32x8,  
+{ default_scan_16x32    ,    mrow_scan_16x32    ,    mcol_scan_16x32 },    //  Tx_16x64, 
+{ default_scan_32x16    ,    mrow_scan_32x16    ,    mcol_scan_32x16 },    //  Tx_64x16, 
+};
\ No newline at end of file

diff --git a/libav1/av1/common/scan.h b/libav1/av1/common/scan.h
new file mode 100644
index 0000000..6b1cd9e
--- /dev/null
+++ b/libav1/av1/common/scan.h

@@ -0,0 +1,57 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_SCAN_H_
+#define AOM_AV1_COMMON_SCAN_H_
+
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
+
+#include "av1/common/enums.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/blockd.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_NEIGHBORS 2
+
+enum {
+  SCAN_MODE_ZIG_ZAG,
+  SCAN_MODE_COL_DIAG,
+  SCAN_MODE_ROW_DIAG,
+  SCAN_MODE_COL_1D,
+  SCAN_MODE_ROW_1D,
+  SCAN_MODES
+} UENUM1BYTE(SCAN_MODE);
+
+extern const SCAN_ORDER av1_default_scan_orders[TX_SIZES];
+extern const SCAN_ORDER av1_scan_orders[TX_SIZES_ALL][TX_TYPES];
+
+extern const int16_t * av1_idct_scans[TX_SIZES_ALL][3];
+
+void av1_deliver_eob_threshold(const AV1_COMMON *cm, MACROBLOCKD *xd);
+
+static INLINE const SCAN_ORDER *get_default_scan(TX_SIZE tx_size,
+                                                 TX_TYPE tx_type) {
+  return &av1_scan_orders[tx_size][tx_type];
+}
+
+static INLINE const SCAN_ORDER *get_scan(TX_SIZE tx_size, TX_TYPE tx_type) {
+  return get_default_scan(tx_size, tx_type);
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_SCAN_H_

diff --git a/libav1/av1/common/seg_common.c b/libav1/av1/common/seg_common.c
new file mode 100644
index 0000000..4650903
--- /dev/null
+++ b/libav1/av1/common/seg_common.c

@@ -0,0 +1,91 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "av1/common/av1_loopfilter.h"
+#include "av1/common/blockd.h"
+#include "av1/common/seg_common.h"
+#include "av1/common/quant_common.h"
+
+static const int seg_feature_data_signed[SEG_LVL_MAX] = {
+  1, 1, 1, 1, 1, 0, 0, 0
+};
+
+static const int seg_feature_data_max[SEG_LVL_MAX] = { MAXQ,
+                                                       MAX_LOOP_FILTER,
+                                                       MAX_LOOP_FILTER,
+                                                       MAX_LOOP_FILTER,
+                                                       MAX_LOOP_FILTER,
+                                                       7,
+                                                       0,
+                                                       0 };
+
+// These functions provide access to new segment level features.
+// Eventually these function may be "optimized out" but for the moment,
+// the coding mechanism is still subject to change so these provide a
+// convenient single point of change.
+
+void av1_clearall_segfeatures(struct segmentation *seg) {
+  av1_zero(seg->feature_data);
+  av1_zero(seg->feature_mask);
+}
+
+void calculate_segdata(struct segmentation *seg) {
+  seg->segid_preskip = 0;
+  seg->last_active_segid = 0;
+  for (int i = 0; i < MAX_SEGMENTS; i++) {
+    for (int j = 0; j < SEG_LVL_MAX; j++) {
+      if (seg->feature_mask[i] & (1 << j)) {
+        seg->segid_preskip |= (j >= SEG_LVL_REF_FRAME);
+        seg->last_active_segid = i;
+      }
+    }
+  }
+}
+
+void av1_enable_segfeature(struct segmentation *seg, int segment_id,
+                           SEG_LVL_FEATURES feature_id) {
+  seg->feature_mask[segment_id] |= 1 << feature_id;
+}
+
+int av1_seg_feature_data_max(SEG_LVL_FEATURES feature_id) {
+  return seg_feature_data_max[feature_id];
+}
+
+int av1_is_segfeature_signed(SEG_LVL_FEATURES feature_id) {
+  return seg_feature_data_signed[feature_id];
+}
+
+// The 'seg_data' given for each segment can be either deltas (from the default
+// value chosen for the frame) or absolute values.
+//
+// Valid range for abs values is (0-127 for MB_LVL_ALT_Q), (0-63 for
+// SEGMENT_ALT_LF)
+// Valid range for delta values are (+/-127 for MB_LVL_ALT_Q), (+/-63 for
+// SEGMENT_ALT_LF)
+//
+// abs_delta = SEGMENT_DELTADATA (deltas) abs_delta = SEGMENT_ABSDATA (use
+// the absolute values given).
+
+void av1_set_segdata(struct segmentation *seg, int segment_id,
+                     SEG_LVL_FEATURES feature_id, int seg_data) {
+  if (seg_data < 0) {
+    assert(seg_feature_data_signed[feature_id]);
+    assert(-seg_data <= seg_feature_data_max[feature_id]);
+  } else {
+    assert(seg_data <= seg_feature_data_max[feature_id]);
+  }
+
+  seg->feature_data[segment_id][feature_id] = seg_data;
+}
+
+// TBD? Functions to read and write segment data with range / validity checking

diff --git a/libav1/av1/common/seg_common.h b/libav1/av1/common/seg_common.h
new file mode 100644
index 0000000..fa7894c
--- /dev/null
+++ b/libav1/av1/common/seg_common.h

@@ -0,0 +1,104 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_SEG_COMMON_H_
+#define AOM_AV1_COMMON_SEG_COMMON_H_
+
+#include "aom_dsp/prob.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_SEGMENTS 8
+#define SEG_TREE_PROBS (MAX_SEGMENTS - 1)
+
+#define SEG_TEMPORAL_PRED_CTXS 3
+#define SPATIAL_PREDICTION_PROBS 3
+
+enum {
+  SEG_LVL_ALT_Q,       // Use alternate Quantizer ....
+  SEG_LVL_ALT_LF_Y_V,  // Use alternate loop filter value on y plane vertical
+  SEG_LVL_ALT_LF_Y_H,  // Use alternate loop filter value on y plane horizontal
+  SEG_LVL_ALT_LF_U,    // Use alternate loop filter value on u plane
+  SEG_LVL_ALT_LF_V,    // Use alternate loop filter value on v plane
+  SEG_LVL_REF_FRAME,   // Optional Segment reference frame
+  SEG_LVL_SKIP,        // Optional Segment (0,0) + skip mode
+  SEG_LVL_GLOBALMV,
+  SEG_LVL_MAX
+} UENUM1BYTE(SEG_LVL_FEATURES);
+
+struct segmentation {
+  uint8_t enabled;
+  uint8_t update_map;
+  uint8_t update_data;
+  uint8_t temporal_update;
+
+  int16_t feature_data[MAX_SEGMENTS][SEG_LVL_MAX];
+  unsigned int feature_mask[MAX_SEGMENTS];
+  int last_active_segid;  // The highest numbered segment id that has some
+                          // enabled feature.
+  uint8_t segid_preskip;  // Whether the segment id will be read before the
+                          // skip syntax element.
+                          // 1: the segment id will be read first.
+                          // 0: the skip syntax element will be read first.
+};
+
+struct segmentation_probs {
+  aom_cdf_prob tree_cdf[CDF_SIZE(MAX_SEGMENTS)];
+  aom_cdf_prob pred_cdf[SEG_TEMPORAL_PRED_CTXS][CDF_SIZE(2)];
+  aom_cdf_prob spatial_pred_seg_cdf[SPATIAL_PREDICTION_PROBS]
+                                   [CDF_SIZE(MAX_SEGMENTS)];
+};
+
+static INLINE int segfeature_active(const struct segmentation *seg,
+                                    int segment_id,
+                                    SEG_LVL_FEATURES feature_id) {
+  return seg->enabled && (seg->feature_mask[segment_id] & (1 << feature_id));
+}
+
+static INLINE void segfeatures_copy(struct segmentation *dst,
+                                    const struct segmentation *src) {
+  int i, j;
+  for (i = 0; i < MAX_SEGMENTS; i++) {
+    dst->feature_mask[i] = src->feature_mask[i];
+    for (j = 0; j < SEG_LVL_MAX; j++) {
+      dst->feature_data[i][j] = src->feature_data[i][j];
+    }
+  }
+  dst->segid_preskip = src->segid_preskip;
+  dst->last_active_segid = src->last_active_segid;
+}
+
+void av1_clearall_segfeatures(struct segmentation *seg);
+
+void av1_enable_segfeature(struct segmentation *seg, int segment_id,
+                           SEG_LVL_FEATURES feature_id);
+
+void calculate_segdata(struct segmentation *seg);
+
+int av1_seg_feature_data_max(SEG_LVL_FEATURES feature_id);
+
+int av1_is_segfeature_signed(SEG_LVL_FEATURES feature_id);
+
+void av1_set_segdata(struct segmentation *seg, int segment_id,
+                     SEG_LVL_FEATURES feature_id, int seg_data);
+
+static INLINE int get_segdata(const struct segmentation *seg, int segment_id,
+                              SEG_LVL_FEATURES feature_id) {
+  return seg->feature_data[segment_id][feature_id];
+}
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_SEG_COMMON_H_

diff --git a/libav1/av1/common/thread_common.c b/libav1/av1/common/thread_common.c
new file mode 100644
index 0000000..58c0b0b
--- /dev/null
+++ b/libav1/av1/common/thread_common.c

@@ -0,0 +1,927 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "config/aom_config.h"
+#include "config/aom_scale_rtcd.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_mem/aom_mem.h"
+#include "av1/common/av1_loopfilter.h"
+#include "av1/common/entropymode.h"
+#include "av1/common/thread_common.h"
+#include "av1/common/reconinter.h"
+
+// Set up nsync by width.
+static INLINE int get_sync_range(int width) {
+  // nsync numbers are picked by testing. For example, for 4k
+  // video, using 4 gives best performance.
+  if (width < 640)
+    return 1;
+  else if (width <= 1280)
+    return 2;
+  else if (width <= 4096)
+    return 4;
+  else
+    return 8;
+}
+
+static INLINE int get_lr_sync_range(int width) {
+#if 0
+  // nsync numbers are picked by testing. For example, for 4k
+  // video, using 4 gives best performance.
+  if (width < 640)
+    return 1;
+  else if (width <= 1280)
+    return 2;
+  else if (width <= 4096)
+    return 4;
+  else
+    return 8;
+#else
+  (void)width;
+  return 1;
+#endif
+}
+
+// Allocate memory for lf row synchronization
+static void loop_filter_alloc(AV1LfSync *lf_sync, AV1_COMMON *cm, int rows,
+                              int width, int num_workers) {
+  lf_sync->rows = rows;
+#if CONFIG_MULTITHREAD
+  {
+    int i, j;
+
+    for (j = 0; j < MAX_MB_PLANE; j++) {
+      CHECK_MEM_ERROR(cm, lf_sync->mutex_[j],
+                      aom_malloc(sizeof(*(lf_sync->mutex_[j])) * rows));
+      if (lf_sync->mutex_[j]) {
+        for (i = 0; i < rows; ++i) {
+          pthread_mutex_init(&lf_sync->mutex_[j][i], NULL);
+        }
+      }
+
+      CHECK_MEM_ERROR(cm, lf_sync->cond_[j],
+                      aom_malloc(sizeof(*(lf_sync->cond_[j])) * rows));
+      if (lf_sync->cond_[j]) {
+        for (i = 0; i < rows; ++i) {
+          pthread_cond_init(&lf_sync->cond_[j][i], NULL);
+        }
+      }
+    }
+
+    CHECK_MEM_ERROR(cm, lf_sync->job_mutex,
+                    aom_malloc(sizeof(*(lf_sync->job_mutex))));
+    if (lf_sync->job_mutex) {
+      pthread_mutex_init(lf_sync->job_mutex, NULL);
+    }
+  }
+#endif  // CONFIG_MULTITHREAD
+  CHECK_MEM_ERROR(cm, lf_sync->lfdata,
+                  aom_malloc(num_workers * sizeof(*(lf_sync->lfdata))));
+  lf_sync->num_workers = num_workers;
+
+  for (int j = 0; j < MAX_MB_PLANE; j++) {
+    CHECK_MEM_ERROR(cm, lf_sync->cur_sb_col[j],
+                    aom_malloc(sizeof(*(lf_sync->cur_sb_col[j])) * rows));
+  }
+  CHECK_MEM_ERROR(
+      cm, lf_sync->job_queue,
+      aom_malloc(sizeof(*(lf_sync->job_queue)) * rows * MAX_MB_PLANE * 2));
+  // Set up nsync.
+  lf_sync->sync_range = get_sync_range(width);
+}
+
+// Deallocate lf synchronization related mutex and data
+void av1_loop_filter_dealloc(AV1LfSync *lf_sync) {
+  if (lf_sync != NULL) {
+    int j;
+#if CONFIG_MULTITHREAD
+    int i;
+    for (j = 0; j < MAX_MB_PLANE; j++) {
+      if (lf_sync->mutex_[j] != NULL) {
+        for (i = 0; i < lf_sync->rows; ++i) {
+          pthread_mutex_destroy(&lf_sync->mutex_[j][i]);
+        }
+        aom_free(lf_sync->mutex_[j]);
+      }
+      if (lf_sync->cond_[j] != NULL) {
+        for (i = 0; i < lf_sync->rows; ++i) {
+          pthread_cond_destroy(&lf_sync->cond_[j][i]);
+        }
+        aom_free(lf_sync->cond_[j]);
+      }
+    }
+    if (lf_sync->job_mutex != NULL) {
+      pthread_mutex_destroy(lf_sync->job_mutex);
+      aom_free(lf_sync->job_mutex);
+    }
+#endif  // CONFIG_MULTITHREAD
+    aom_free(lf_sync->lfdata);
+    for (j = 0; j < MAX_MB_PLANE; j++) {
+      aom_free(lf_sync->cur_sb_col[j]);
+    }
+
+    aom_free(lf_sync->job_queue);
+    // clear the structure as the source of this call may be a resize in which
+    // case this call will be followed by an _alloc() which may fail.
+    av1_zero(*lf_sync);
+  }
+}
+
+static void loop_filter_data_reset(LFWorkerData *lf_data,
+                                   YV12_BUFFER_CONFIG *frame_buffer,
+                                   struct AV1Common *cm, MACROBLOCKD *xd) {
+  struct macroblockd_plane *pd = xd->plane;
+  lf_data->frame_buffer = frame_buffer;
+  lf_data->cm = cm;
+  lf_data->xd = xd;
+  for (int i = 0; i < MAX_MB_PLANE; i++) {
+    memcpy(&lf_data->planes[i].dst, &pd[i].dst, sizeof(lf_data->planes[i].dst));
+    lf_data->planes[i].subsampling_x = pd[i].subsampling_x;
+    lf_data->planes[i].subsampling_y = pd[i].subsampling_y;
+  }
+}
+
+static INLINE void sync_read(AV1LfSync *const lf_sync, int r, int c,
+                             int plane) {
+#if CONFIG_MULTITHREAD
+  const int nsync = lf_sync->sync_range;
+
+  if (r && !(c & (nsync - 1))) {
+    pthread_mutex_t *const mutex = &lf_sync->mutex_[plane][r - 1];
+    pthread_mutex_lock(mutex);
+
+    while (c > lf_sync->cur_sb_col[plane][r - 1] - nsync) {
+      pthread_cond_wait(&lf_sync->cond_[plane][r - 1], mutex);
+    }
+    pthread_mutex_unlock(mutex);
+  }
+#else
+  (void)lf_sync;
+  (void)r;
+  (void)c;
+  (void)plane;
+#endif  // CONFIG_MULTITHREAD
+}
+
+static INLINE void sync_write(AV1LfSync *const lf_sync, int r, int c,
+                              const int sb_cols, int plane) {
+#if CONFIG_MULTITHREAD
+  const int nsync = lf_sync->sync_range;
+  int cur;
+  // Only signal when there are enough filtered SB for next row to run.
+  int sig = 1;
+
+  if (c < sb_cols - 1) {
+    cur = c;
+    if (c % nsync) sig = 0;
+  } else {
+    cur = sb_cols + nsync;
+  }
+
+  if (sig) {
+    pthread_mutex_lock(&lf_sync->mutex_[plane][r]);
+
+    lf_sync->cur_sb_col[plane][r] = cur;
+
+    pthread_cond_broadcast(&lf_sync->cond_[plane][r]);
+    pthread_mutex_unlock(&lf_sync->mutex_[plane][r]);
+  }
+#else
+  (void)lf_sync;
+  (void)r;
+  (void)c;
+  (void)sb_cols;
+  (void)plane;
+#endif  // CONFIG_MULTITHREAD
+}
+
+static void enqueue_lf_jobs(AV1LfSync *lf_sync, AV1_COMMON *cm, int start,
+                            int stop,
+#if LOOP_FILTER_BITMASK
+                            int is_decoding,
+#endif
+                            int plane_start, int plane_end) {
+  int mi_row, plane, dir;
+  AV1LfMTInfo *lf_job_queue = lf_sync->job_queue;
+  lf_sync->jobs_enqueued = 0;
+  lf_sync->jobs_dequeued = 0;
+
+  for (dir = 0; dir < 2; dir++) {
+    for (plane = plane_start; plane < plane_end; plane++) {
+      if (plane == 0 && !(cm->lf.filter_level[0]) && !(cm->lf.filter_level[1]))
+        break;
+      else if (plane == 1 && !(cm->lf.filter_level_u))
+        continue;
+      else if (plane == 2 && !(cm->lf.filter_level_v))
+        continue;
+#if LOOP_FILTER_BITMASK
+      int step = MAX_MIB_SIZE;
+      if (is_decoding) {
+        step = MI_SIZE_64X64;
+      }
+      for (mi_row = start; mi_row < stop; mi_row += step)
+#else
+      for (mi_row = start; mi_row < stop; mi_row += MAX_MIB_SIZE)
+#endif
+      {
+        lf_job_queue->mi_row = mi_row;
+        lf_job_queue->plane = plane;
+        lf_job_queue->dir = dir;
+        lf_job_queue++;
+        lf_sync->jobs_enqueued++;
+      }
+    }
+  }
+}
+
+static AV1LfMTInfo *get_lf_job_info(AV1LfSync *lf_sync) {
+  AV1LfMTInfo *cur_job_info = NULL;
+
+#if CONFIG_MULTITHREAD
+  pthread_mutex_lock(lf_sync->job_mutex);
+
+  if (lf_sync->jobs_dequeued < lf_sync->jobs_enqueued) {
+    cur_job_info = lf_sync->job_queue + lf_sync->jobs_dequeued;
+    lf_sync->jobs_dequeued++;
+  }
+
+  pthread_mutex_unlock(lf_sync->job_mutex);
+#else
+  (void)lf_sync;
+#endif
+
+  return cur_job_info;
+}
+
+// Implement row loopfiltering for each thread.
+static INLINE void thread_loop_filter_rows(
+    const YV12_BUFFER_CONFIG *const frame_buffer, AV1_COMMON *const cm,
+    struct macroblockd_plane *planes, MACROBLOCKD *xd,
+    AV1LfSync *const lf_sync) {
+  const int sb_cols =
+      ALIGN_POWER_OF_TWO(cm->mi_cols, MAX_MIB_SIZE_LOG2) >> MAX_MIB_SIZE_LOG2;
+  int mi_row, mi_col, plane, dir;
+  int r, c;
+
+  while (1) {
+    AV1LfMTInfo *cur_job_info = get_lf_job_info(lf_sync);
+
+    if (cur_job_info != NULL) {
+      mi_row = cur_job_info->mi_row;
+      plane = cur_job_info->plane;
+      dir = cur_job_info->dir;
+      r = mi_row >> MAX_MIB_SIZE_LOG2;
+
+      if (dir == 0) {
+        for (mi_col = 0; mi_col < cm->mi_cols; mi_col += MAX_MIB_SIZE) {
+          c = mi_col >> MAX_MIB_SIZE_LOG2;
+
+          av1_setup_dst_planes(planes, cm->seq_params.sb_size, frame_buffer,
+                               mi_row, mi_col, plane, plane + 1);
+
+          av1_filter_block_plane_vert(cm, xd, plane, &planes[plane], mi_row,
+                                      mi_col);
+          sync_write(lf_sync, r, c, sb_cols, plane);
+        }
+      } else if (dir == 1) {
+        for (mi_col = 0; mi_col < cm->mi_cols; mi_col += MAX_MIB_SIZE) {
+          c = mi_col >> MAX_MIB_SIZE_LOG2;
+
+          // Wait for vertical edge filtering of the top-right block to be
+          // completed
+          sync_read(lf_sync, r, c, plane);
+
+          // Wait for vertical edge filtering of the right block to be
+          // completed
+          sync_read(lf_sync, r + 1, c, plane);
+
+          av1_setup_dst_planes(planes, cm->seq_params.sb_size, frame_buffer,
+                               mi_row, mi_col, plane, plane + 1);
+          av1_filter_block_plane_horz(cm, xd, plane, &planes[plane], mi_row,
+                                      mi_col);
+        }
+      }
+    } else {
+      break;
+    }
+  }
+}
+
+// Row-based multi-threaded loopfilter hook
+static int loop_filter_row_worker(void *arg1, void *arg2) {
+  AV1LfSync *const lf_sync = (AV1LfSync *)arg1;
+  LFWorkerData *const lf_data = (LFWorkerData *)arg2;
+  thread_loop_filter_rows(lf_data->frame_buffer, lf_data->cm, lf_data->planes,
+                          lf_data->xd, lf_sync);
+  return 1;
+}
+
+#if LOOP_FILTER_BITMASK
+static INLINE void thread_loop_filter_bitmask_rows(
+    const YV12_BUFFER_CONFIG *const frame_buffer, AV1_COMMON *const cm,
+    struct macroblockd_plane *planes, MACROBLOCKD *xd,
+    AV1LfSync *const lf_sync) {
+  const int sb_cols =
+      ALIGN_POWER_OF_TWO(cm->mi_cols, MIN_MIB_SIZE_LOG2) >> MIN_MIB_SIZE_LOG2;
+  int mi_row, mi_col, plane, dir;
+  int r, c;
+  (void)xd;
+
+  while (1) {
+    AV1LfMTInfo *cur_job_info = get_lf_job_info(lf_sync);
+
+    if (cur_job_info != NULL) {
+      mi_row = cur_job_info->mi_row;
+      plane = cur_job_info->plane;
+      dir = cur_job_info->dir;
+      r = mi_row >> MIN_MIB_SIZE_LOG2;
+
+      if (dir == 0) {
+        for (mi_col = 0; mi_col < cm->mi_cols; mi_col += MI_SIZE_64X64) {
+          c = mi_col >> MIN_MIB_SIZE_LOG2;
+
+          av1_setup_dst_planes(planes, BLOCK_64X64, frame_buffer, mi_row,
+                               mi_col, plane, plane + 1);
+
+          av1_filter_block_plane_bitmask_vert(cm, &planes[plane], plane, mi_row,
+                                              mi_col);
+          sync_write(lf_sync, r, c, sb_cols, plane);
+        }
+      } else if (dir == 1) {
+        for (mi_col = 0; mi_col < cm->mi_cols; mi_col += MI_SIZE_64X64) {
+          c = mi_col >> MIN_MIB_SIZE_LOG2;
+
+          // Wait for vertical edge filtering of the top-right block to be
+          // completed
+          sync_read(lf_sync, r, c, plane);
+
+          // Wait for vertical edge filtering of the right block to be
+          // completed
+          sync_read(lf_sync, r + 1, c, plane);
+
+          av1_setup_dst_planes(planes, BLOCK_64X64, frame_buffer, mi_row,
+                               mi_col, plane, plane + 1);
+          av1_filter_block_plane_bitmask_horz(cm, &planes[plane], plane, mi_row,
+                                              mi_col);
+        }
+      }
+    } else {
+      break;
+    }
+  }
+}
+
+// Row-based multi-threaded loopfilter hook
+static int loop_filter_bitmask_row_worker(void *arg1, void *arg2) {
+  AV1LfSync *const lf_sync = (AV1LfSync *)arg1;
+  LFWorkerData *const lf_data = (LFWorkerData *)arg2;
+  thread_loop_filter_bitmask_rows(lf_data->frame_buffer, lf_data->cm,
+                                  lf_data->planes, lf_data->xd, lf_sync);
+  return 1;
+}
+#endif  // LOOP_FILTER_BITMASK
+
+static void loop_filter_rows_mt(YV12_BUFFER_CONFIG *frame, AV1_COMMON *cm,
+                                MACROBLOCKD *xd, int start, int stop,
+                                int plane_start, int plane_end,
+#if LOOP_FILTER_BITMASK
+                                int is_decoding,
+#endif
+                                AVxWorker *workers, int nworkers,
+                                AV1LfSync *lf_sync) {
+  const AVxWorkerInterface *const winterface = aom_get_worker_interface();
+#if LOOP_FILTER_BITMASK
+  int sb_rows;
+  if (is_decoding) {
+    sb_rows =
+        ALIGN_POWER_OF_TWO(cm->mi_rows, MIN_MIB_SIZE_LOG2) >> MIN_MIB_SIZE_LOG2;
+  } else {
+    sb_rows =
+        ALIGN_POWER_OF_TWO(cm->mi_rows, MAX_MIB_SIZE_LOG2) >> MAX_MIB_SIZE_LOG2;
+  }
+#else
+  // Number of superblock rows and cols
+  const int sb_rows =
+      ALIGN_POWER_OF_TWO(cm->mi_rows, MAX_MIB_SIZE_LOG2) >> MAX_MIB_SIZE_LOG2;
+#endif
+  const int num_workers = nworkers;
+  int i;
+
+  if (!lf_sync->sync_range || sb_rows != lf_sync->rows ||
+      num_workers > lf_sync->num_workers) {
+    av1_loop_filter_dealloc(lf_sync);
+    loop_filter_alloc(lf_sync, cm, sb_rows, cm->width, num_workers);
+  }
+
+  // Initialize cur_sb_col to -1 for all SB rows.
+  for (i = 0; i < MAX_MB_PLANE; i++) {
+    memset(lf_sync->cur_sb_col[i], -1,
+           sizeof(*(lf_sync->cur_sb_col[i])) * sb_rows);
+  }
+
+  enqueue_lf_jobs(lf_sync, cm, start, stop,
+#if LOOP_FILTER_BITMASK
+                  is_decoding,
+#endif
+                  plane_start, plane_end);
+
+  // Set up loopfilter thread data.
+  for (i = 0; i < num_workers; ++i) {
+    AVxWorker *const worker = &workers[i];
+    LFWorkerData *const lf_data = &lf_sync->lfdata[i];
+
+#if LOOP_FILTER_BITMASK
+    if (is_decoding) {
+      worker->hook = loop_filter_bitmask_row_worker;
+    } else {
+      worker->hook = loop_filter_row_worker;
+    }
+#else
+    worker->hook = loop_filter_row_worker;
+#endif
+    worker->data1 = lf_sync;
+    worker->data2 = lf_data;
+
+    // Loopfilter data
+    loop_filter_data_reset(lf_data, frame, cm, xd);
+
+    // Start loopfiltering
+    if (i == num_workers - 1) {
+      winterface->execute(worker);
+    } else {
+      winterface->launch(worker);
+    }
+  }
+
+  // Wait till all rows are finished
+  for (i = 0; i < num_workers; ++i) {
+    winterface->sync(&workers[i]);
+  }
+}
+
+void av1_loop_filter_frame_mt(YV12_BUFFER_CONFIG *frame, AV1_COMMON *cm,
+                              MACROBLOCKD *xd, int plane_start, int plane_end,
+                              int partial_frame,
+#if LOOP_FILTER_BITMASK
+                              int is_decoding,
+#endif
+                              AVxWorker *workers, int num_workers,
+                              AV1LfSync *lf_sync) {
+  int start_mi_row, end_mi_row, mi_rows_to_filter;
+
+  start_mi_row = 0;
+  mi_rows_to_filter = cm->mi_rows;
+  if (partial_frame && cm->mi_rows > 8) {
+    start_mi_row = cm->mi_rows >> 1;
+    start_mi_row &= 0xfffffff8;
+    mi_rows_to_filter = AOMMAX(cm->mi_rows / 8, 8);
+  }
+  end_mi_row = start_mi_row + mi_rows_to_filter;
+  av1_loop_filter_frame_init(cm, plane_start, plane_end);
+
+#if LOOP_FILTER_BITMASK
+  if (is_decoding) {
+    cm->is_decoding = is_decoding;
+    // TODO(chengchen): currently use one thread to build bitmasks for the
+    // frame. Make it support multi-thread later.
+    for (int plane = plane_start; plane < plane_end; plane++) {
+      if (plane == 0 && !(cm->lf.filter_level[0]) && !(cm->lf.filter_level[1]))
+        break;
+      else if (plane == 1 && !(cm->lf.filter_level_u))
+        continue;
+      else if (plane == 2 && !(cm->lf.filter_level_v))
+        continue;
+
+      // TODO(chengchen): can we remove this?
+      struct macroblockd_plane *pd = xd->plane;
+      av1_setup_dst_planes(pd, cm->seq_params.sb_size, frame, 0, 0, plane,
+                           plane + 1);
+
+      av1_build_bitmask_vert_info(cm, &pd[plane], plane);
+      av1_build_bitmask_horz_info(cm, &pd[plane], plane);
+    }
+    loop_filter_rows_mt(frame, cm, xd, start_mi_row, end_mi_row, plane_start,
+                        plane_end, 1, workers, num_workers, lf_sync);
+  } else {
+    loop_filter_rows_mt(frame, cm, xd, start_mi_row, end_mi_row, plane_start,
+                        plane_end, 0, workers, num_workers, lf_sync);
+  }
+#else
+  loop_filter_rows_mt(frame, cm, xd, start_mi_row, end_mi_row, plane_start,
+                      plane_end, workers, num_workers, lf_sync);
+#endif
+}
+
+static INLINE void lr_sync_read(void *const lr_sync, int r, int c, int plane) {
+#if CONFIG_MULTITHREAD
+  AV1LrSync *const loop_res_sync = (AV1LrSync *)lr_sync;
+  const int nsync = loop_res_sync->sync_range;
+
+  if (r && !(c & (nsync - 1))) {
+    pthread_mutex_t *const mutex = &loop_res_sync->mutex_[plane][r - 1];
+    pthread_mutex_lock(mutex);
+
+    while (c > loop_res_sync->cur_sb_col[plane][r - 1] - nsync) {
+      pthread_cond_wait(&loop_res_sync->cond_[plane][r - 1], mutex);
+    }
+    pthread_mutex_unlock(mutex);
+  }
+#else
+  (void)lr_sync;
+  (void)r;
+  (void)c;
+  (void)plane;
+#endif  // CONFIG_MULTITHREAD
+}
+
+static INLINE void lr_sync_write(void *const lr_sync, int r, int c,
+                                 const int sb_cols, int plane) {
+#if CONFIG_MULTITHREAD
+  AV1LrSync *const loop_res_sync = (AV1LrSync *)lr_sync;
+  const int nsync = loop_res_sync->sync_range;
+  int cur;
+  // Only signal when there are enough filtered SB for next row to run.
+  int sig = 1;
+
+  if (c < sb_cols - 1) {
+    cur = c;
+    if (c % nsync) sig = 0;
+  } else {
+    cur = sb_cols + nsync;
+  }
+
+  if (sig) {
+    pthread_mutex_lock(&loop_res_sync->mutex_[plane][r]);
+
+    loop_res_sync->cur_sb_col[plane][r] = cur;
+
+    pthread_cond_broadcast(&loop_res_sync->cond_[plane][r]);
+    pthread_mutex_unlock(&loop_res_sync->mutex_[plane][r]);
+  }
+#else
+  (void)lr_sync;
+  (void)r;
+  (void)c;
+  (void)sb_cols;
+  (void)plane;
+#endif  // CONFIG_MULTITHREAD
+}
+
+// Allocate memory for loop restoration row synchronization
+static void loop_restoration_alloc(AV1LrSync *lr_sync, AV1_COMMON *cm,
+                                   int num_workers, int num_rows_lr,
+                                   int num_planes, int width) {
+  lr_sync->rows = num_rows_lr;
+  lr_sync->num_planes = num_planes;
+#if CONFIG_MULTITHREAD
+  {
+    int i, j;
+
+    for (j = 0; j < num_planes; j++) {
+      CHECK_MEM_ERROR(cm, lr_sync->mutex_[j],
+                      aom_malloc(sizeof(*(lr_sync->mutex_[j])) * num_rows_lr));
+      if (lr_sync->mutex_[j]) {
+        for (i = 0; i < num_rows_lr; ++i) {
+          pthread_mutex_init(&lr_sync->mutex_[j][i], NULL);
+        }
+      }
+
+      CHECK_MEM_ERROR(cm, lr_sync->cond_[j],
+                      aom_malloc(sizeof(*(lr_sync->cond_[j])) * num_rows_lr));
+      if (lr_sync->cond_[j]) {
+        for (i = 0; i < num_rows_lr; ++i) {
+          pthread_cond_init(&lr_sync->cond_[j][i], NULL);
+        }
+      }
+    }
+
+    CHECK_MEM_ERROR(cm, lr_sync->job_mutex,
+                    aom_malloc(sizeof(*(lr_sync->job_mutex))));
+    if (lr_sync->job_mutex) {
+      pthread_mutex_init(lr_sync->job_mutex, NULL);
+    }
+  }
+#endif  // CONFIG_MULTITHREAD
+  CHECK_MEM_ERROR(cm, lr_sync->lrworkerdata,
+                  aom_malloc(num_workers * sizeof(*(lr_sync->lrworkerdata))));
+  memset(lr_sync->lrworkerdata, 0, num_workers * sizeof(*(lr_sync->lrworkerdata)));
+
+  for (int worker_idx = 0; worker_idx < num_workers; ++worker_idx) {
+    if (worker_idx < num_workers - 1) {
+      CHECK_MEM_ERROR(cm, lr_sync->lrworkerdata[worker_idx].rst_tmpbuf,
+                      (int32_t *)aom_memalign(16, RESTORATION_TMPBUF_SIZE));
+      memset(lr_sync->lrworkerdata[worker_idx].rst_tmpbuf, 0, RESTORATION_TMPBUF_SIZE);
+
+      CHECK_MEM_ERROR(cm, lr_sync->lrworkerdata[worker_idx].rlbs,
+                      aom_malloc(sizeof(RestorationLineBuffers)));
+      memset(lr_sync->lrworkerdata[worker_idx].rlbs, 0, sizeof(RestorationLineBuffers));
+
+    } else {
+      lr_sync->lrworkerdata[worker_idx].rst_tmpbuf = cm->rst_tmpbuf;
+      lr_sync->lrworkerdata[worker_idx].rlbs = cm->rlbs;
+    }
+  }
+
+  lr_sync->num_workers = num_workers;
+
+  for (int j = 0; j < num_planes; j++) {
+    CHECK_MEM_ERROR(
+        cm, lr_sync->cur_sb_col[j],
+        aom_malloc(sizeof(*(lr_sync->cur_sb_col[j])) * num_rows_lr));
+  }
+  CHECK_MEM_ERROR(
+      cm, lr_sync->job_queue,
+      aom_malloc(sizeof(*(lr_sync->job_queue)) * num_rows_lr * num_planes));
+  // Set up nsync.
+  lr_sync->sync_range = get_lr_sync_range(width);
+}
+
+// Deallocate loop restoration synchronization related mutex and data
+void av1_loop_restoration_dealloc(AV1LrSync *lr_sync, int num_workers) {
+  if (lr_sync != NULL) {
+    int j;
+#if CONFIG_MULTITHREAD
+    int i;
+    for (j = 0; j < MAX_MB_PLANE; j++) {
+      if (lr_sync->mutex_[j] != NULL) {
+        for (i = 0; i < lr_sync->rows; ++i) {
+          pthread_mutex_destroy(&lr_sync->mutex_[j][i]);
+        }
+        aom_free(lr_sync->mutex_[j]);
+      }
+      if (lr_sync->cond_[j] != NULL) {
+        for (i = 0; i < lr_sync->rows; ++i) {
+          pthread_cond_destroy(&lr_sync->cond_[j][i]);
+        }
+        aom_free(lr_sync->cond_[j]);
+      }
+    }
+    if (lr_sync->job_mutex != NULL) {
+      pthread_mutex_destroy(lr_sync->job_mutex);
+      aom_free(lr_sync->job_mutex);
+    }
+#endif  // CONFIG_MULTITHREAD
+    for (j = 0; j < MAX_MB_PLANE; j++) {
+      aom_free(lr_sync->cur_sb_col[j]);
+    }
+
+    aom_free(lr_sync->job_queue);
+
+    if (lr_sync->lrworkerdata) {
+      for (int worker_idx = 0; worker_idx < num_workers - 1; worker_idx++) {
+        LRWorkerData *const workerdata_data =
+            lr_sync->lrworkerdata + worker_idx;
+
+        aom_free(workerdata_data->rst_tmpbuf);
+        aom_free(workerdata_data->rlbs);
+      }
+      aom_free(lr_sync->lrworkerdata);
+    }
+
+    // clear the structure as the source of this call may be a resize in which
+    // case this call will be followed by an _alloc() which may fail.
+    av1_zero(*lr_sync);
+  }
+}
+
+static void enqueue_lr_jobs(AV1LrSync *lr_sync, AV1LrStruct *lr_ctxt,
+                            AV1_COMMON *cm) {
+  FilterFrameCtxt *ctxt = lr_ctxt->ctxt;
+
+  const int num_planes = av1_num_planes(cm);
+  AV1LrMTInfo *lr_job_queue = lr_sync->job_queue;
+  int32_t lr_job_counter[2], num_even_lr_jobs = 0;
+  lr_sync->jobs_enqueued = 0;
+  lr_sync->jobs_dequeued = 0;
+
+  for (int plane = 0; plane < num_planes; plane++) {
+    if (cm->rst_info[plane].frame_restoration_type == RESTORE_NONE) continue;
+    num_even_lr_jobs =
+        num_even_lr_jobs + ((ctxt[plane].rsi->vert_units_per_tile + 1) >> 1);
+  }
+  lr_job_counter[0] = 0;
+  lr_job_counter[1] = num_even_lr_jobs;
+
+  for (int plane = 0; plane < num_planes; plane++) {
+    if (cm->rst_info[plane].frame_restoration_type == RESTORE_NONE) continue;
+    const int is_uv = plane > 0;
+    const int ss_y = is_uv && cm->seq_params.subsampling_y;
+
+    AV1PixelRect tile_rect = ctxt[plane].tile_rect;
+    const int unit_size = ctxt[plane].rsi->restoration_unit_size;
+
+    const int tile_h = tile_rect.bottom - tile_rect.top;
+    const int ext_size = unit_size * 3 / 2;
+
+    int y0 = 0, i = 0;
+    while (y0 < tile_h) {
+      int remaining_h = tile_h - y0;
+      int h = (remaining_h < ext_size) ? remaining_h : unit_size;
+
+      RestorationTileLimits limits;
+      limits.v_start = tile_rect.top + y0;
+      limits.v_end = tile_rect.top + y0 + h;
+      assert(limits.v_end <= tile_rect.bottom);
+      // Offset the tile upwards to align with the restoration processing stripe
+      const int voffset = RESTORATION_UNIT_OFFSET >> ss_y;
+      limits.v_start = AOMMAX(tile_rect.top, limits.v_start - voffset);
+      if (limits.v_end < tile_rect.bottom) limits.v_end -= voffset;
+
+      assert(lr_job_counter[0] <= num_even_lr_jobs);
+
+      lr_job_queue[lr_job_counter[i & 1]].lr_unit_row = i;
+      lr_job_queue[lr_job_counter[i & 1]].plane = plane;
+      lr_job_queue[lr_job_counter[i & 1]].v_start = limits.v_start;
+      lr_job_queue[lr_job_counter[i & 1]].v_end = limits.v_end;
+      lr_job_queue[lr_job_counter[i & 1]].sync_mode = i & 1;
+      if ((i & 1) == 0) {
+        lr_job_queue[lr_job_counter[i & 1]].v_copy_start =
+            limits.v_start + RESTORATION_BORDER;
+        lr_job_queue[lr_job_counter[i & 1]].v_copy_end =
+            limits.v_end - RESTORATION_BORDER;
+        if (i == 0) {
+          assert(limits.v_start == tile_rect.top);
+          lr_job_queue[lr_job_counter[i & 1]].v_copy_start = tile_rect.top;
+        }
+        if (i == (ctxt[plane].rsi->vert_units_per_tile - 1)) {
+          assert(limits.v_end == tile_rect.bottom);
+          lr_job_queue[lr_job_counter[i & 1]].v_copy_end = tile_rect.bottom;
+        }
+      } else {
+        lr_job_queue[lr_job_counter[i & 1]].v_copy_start =
+            AOMMAX(limits.v_start - RESTORATION_BORDER, tile_rect.top);
+        lr_job_queue[lr_job_counter[i & 1]].v_copy_end =
+            AOMMIN(limits.v_end + RESTORATION_BORDER, tile_rect.bottom);
+      }
+      lr_job_counter[i & 1]++;
+      lr_sync->jobs_enqueued++;
+
+      y0 += h;
+      ++i;
+    }
+  }
+}
+
+static AV1LrMTInfo *get_lr_job_info(AV1LrSync *lr_sync) {
+  AV1LrMTInfo *cur_job_info = NULL;
+
+#if CONFIG_MULTITHREAD
+  pthread_mutex_lock(lr_sync->job_mutex);
+
+  if (lr_sync->jobs_dequeued < lr_sync->jobs_enqueued) {
+    cur_job_info = lr_sync->job_queue + lr_sync->jobs_dequeued;
+    lr_sync->jobs_dequeued++;
+  }
+
+  pthread_mutex_unlock(lr_sync->job_mutex);
+#else
+  (void)lr_sync;
+#endif
+
+  return cur_job_info;
+}
+
+// Implement row loop restoration for each thread.
+static int loop_restoration_row_worker(void *arg1, void *arg2) {
+  AV1LrSync *const lr_sync = (AV1LrSync *)arg1;
+  LRWorkerData *lrworkerdata = (LRWorkerData *)arg2;
+  AV1LrStruct *lr_ctxt = (AV1LrStruct *)lrworkerdata->lr_ctxt;
+  FilterFrameCtxt *ctxt = lr_ctxt->ctxt;
+  int lr_unit_row;
+  int plane;
+  const int tile_row = LR_TILE_ROW;
+  const int tile_col = LR_TILE_COL;
+  const int tile_cols = LR_TILE_COLS;
+  const int tile_idx = tile_col + tile_row * tile_cols;
+  typedef void (*copy_fun)(const YV12_BUFFER_CONFIG *src_ybc,
+                           YV12_BUFFER_CONFIG *dst_ybc, int hstart, int hend,
+                           int vstart, int vend);
+  static const copy_fun copy_funs[3] = { aom_yv12_partial_coloc_copy_y,
+                                         aom_yv12_partial_coloc_copy_u,
+                                         aom_yv12_partial_coloc_copy_v };
+
+  while (1) {
+    AV1LrMTInfo *cur_job_info = get_lr_job_info(lr_sync);
+    if (cur_job_info != NULL) {
+      RestorationTileLimits limits;
+      sync_read_fn_t on_sync_read;
+      sync_write_fn_t on_sync_write;
+      limits.v_start = cur_job_info->v_start;
+      limits.v_end = cur_job_info->v_end;
+      lr_unit_row = cur_job_info->lr_unit_row;
+      plane = cur_job_info->plane;
+      const int unit_idx0 = tile_idx * ctxt[plane].rsi->units_per_tile;
+
+      // sync_mode == 1 implies only sync read is required in LR Multi-threading
+      // sync_mode == 0 implies only sync write is required.
+      on_sync_read =
+          cur_job_info->sync_mode == 1 ? lr_sync_read : av1_lr_sync_read_dummy;
+      on_sync_write = cur_job_info->sync_mode == 0 ? lr_sync_write
+                                                   : av1_lr_sync_write_dummy;
+
+      av1_foreach_rest_unit_in_row(
+          &limits, &(ctxt[plane].tile_rect), lr_ctxt->on_rest_unit, lr_unit_row,
+          ctxt[plane].rsi->restoration_unit_size, unit_idx0,
+          ctxt[plane].rsi->horz_units_per_tile,
+          ctxt[plane].rsi->vert_units_per_tile, plane, &ctxt[plane],
+          lrworkerdata->rst_tmpbuf, lrworkerdata->rlbs, on_sync_read,
+          on_sync_write, lr_sync);
+
+      copy_funs[plane](lr_ctxt->dst, lr_ctxt->frame, ctxt[plane].tile_rect.left,
+                       ctxt[plane].tile_rect.right, cur_job_info->v_copy_start,
+                       cur_job_info->v_copy_end);
+    } else {
+      break;
+    }
+  }
+  return 1;
+}
+
+static void foreach_rest_unit_in_planes_mt(AV1LrStruct *lr_ctxt,
+                                           AVxWorker *workers, int nworkers,
+                                           AV1LrSync *lr_sync, AV1_COMMON *cm) {
+  FilterFrameCtxt *ctxt = lr_ctxt->ctxt;
+
+  const int num_planes = av1_num_planes(cm);
+
+  const AVxWorkerInterface *const winterface = aom_get_worker_interface();
+  int num_rows_lr = 0;
+
+  for (int plane = 0; plane < num_planes; plane++) {
+    if (cm->rst_info[plane].frame_restoration_type == RESTORE_NONE) continue;
+
+    const AV1PixelRect tile_rect = ctxt[plane].tile_rect;
+    const int max_tile_h = tile_rect.bottom - tile_rect.top;
+
+    const int unit_size = cm->rst_info[plane].restoration_unit_size;
+
+    num_rows_lr =
+        AOMMAX(num_rows_lr, av1_lr_count_units_in_tile(unit_size, max_tile_h));
+  }
+
+  const int num_workers = nworkers;
+  int i;
+  assert(MAX_MB_PLANE == 3);
+
+  if (!lr_sync->sync_range || num_rows_lr != lr_sync->rows ||
+      num_workers > lr_sync->num_workers || num_planes != lr_sync->num_planes) {
+    av1_loop_restoration_dealloc(lr_sync, num_workers);
+    loop_restoration_alloc(lr_sync, cm, num_workers, num_rows_lr, num_planes,
+                           cm->width);
+  }
+
+  // Initialize cur_sb_col to -1 for all SB rows.
+  for (i = 0; i < num_planes; i++) {
+    memset(lr_sync->cur_sb_col[i], -1,
+           sizeof(*(lr_sync->cur_sb_col[i])) * num_rows_lr);
+  }
+
+  enqueue_lr_jobs(lr_sync, lr_ctxt, cm);
+
+  // Set up looprestoration thread data.
+  for (i = 0; i < num_workers; ++i) {
+    AVxWorker *const worker = &workers[i];
+    lr_sync->lrworkerdata[i].lr_ctxt = (void *)lr_ctxt;
+    worker->hook = loop_restoration_row_worker;
+    worker->data1 = lr_sync;
+    worker->data2 = &lr_sync->lrworkerdata[i];
+
+    // Start loopfiltering
+    if (i == num_workers - 1) {
+      winterface->execute(worker);
+    } else {
+      winterface->launch(worker);
+    }
+  }
+
+  // Wait till all rows are finished
+  for (i = 0; i < num_workers; ++i) {
+    winterface->sync(&workers[i]);
+  }
+}
+
+void av1_loop_restoration_filter_frame_mt(YV12_BUFFER_CONFIG *frame,
+                                          AV1_COMMON *cm, int optimized_lr,
+                                          AVxWorker *workers, int num_workers,
+                                          AV1LrSync *lr_sync, void *lr_ctxt) {
+  assert(!cm->all_lossless);
+
+  const int num_planes = av1_num_planes(cm);
+
+  AV1LrStruct *loop_rest_ctxt = (AV1LrStruct *)lr_ctxt;
+
+  av1_loop_restoration_filter_frame_init(loop_rest_ctxt, frame, cm,
+                                         optimized_lr, num_planes);
+
+  foreach_rest_unit_in_planes_mt(loop_rest_ctxt, workers, num_workers, lr_sync,
+                                 cm);
+}

diff --git a/libav1/av1/common/thread_common.h b/libav1/av1/common/thread_common.h
new file mode 100644
index 0000000..e7dbb8b
--- /dev/null
+++ b/libav1/av1/common/thread_common.h

@@ -0,0 +1,122 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_THREAD_COMMON_H_
+#define AOM_AV1_COMMON_THREAD_COMMON_H_
+
+#include "config/aom_config.h"
+
+#include "av1/common/av1_loopfilter.h"
+#include "aom_util/aom_thread.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct AV1Common;
+
+typedef struct AV1LfMTInfo {
+  int mi_row;
+  int plane;
+  int dir;
+} AV1LfMTInfo;
+
+// Loopfilter row synchronization
+typedef struct AV1LfSyncData {
+#if CONFIG_MULTITHREAD
+  pthread_mutex_t *mutex_[MAX_MB_PLANE];
+  pthread_cond_t *cond_[MAX_MB_PLANE];
+#endif
+  // Allocate memory to store the loop-filtered superblock index in each row.
+  int *cur_sb_col[MAX_MB_PLANE];
+  // The optimal sync_range for different resolution and platform should be
+  // determined by testing. Currently, it is chosen to be a power-of-2 number.
+  int sync_range;
+  int rows;
+
+  // Row-based parallel loopfilter data
+  LFWorkerData *lfdata;
+  int num_workers;
+
+#if CONFIG_MULTITHREAD
+  pthread_mutex_t *job_mutex;
+#endif
+  AV1LfMTInfo *job_queue;
+  int jobs_enqueued;
+  int jobs_dequeued;
+} AV1LfSync;
+
+typedef struct AV1LrMTInfo {
+  int v_start;
+  int v_end;
+  int lr_unit_row;
+  int plane;
+  int sync_mode;
+  int v_copy_start;
+  int v_copy_end;
+} AV1LrMTInfo;
+
+typedef struct LoopRestorationWorkerData {
+  int32_t *rst_tmpbuf;
+  void *rlbs;
+  void *lr_ctxt;
+} LRWorkerData;
+
+// Looprestoration row synchronization
+typedef struct AV1LrSyncData {
+#if CONFIG_MULTITHREAD
+  pthread_mutex_t *mutex_[MAX_MB_PLANE];
+  pthread_cond_t *cond_[MAX_MB_PLANE];
+#endif
+  // Allocate memory to store the loop-restoration block index in each row.
+  int *cur_sb_col[MAX_MB_PLANE];
+  // The optimal sync_range for different resolution and platform should be
+  // determined by testing. Currently, it is chosen to be a power-of-2 number.
+  int sync_range;
+  int rows;
+  int num_planes;
+
+  int num_workers;
+
+#if CONFIG_MULTITHREAD
+  pthread_mutex_t *job_mutex;
+#endif
+  // Row-based parallel loopfilter data
+  LRWorkerData *lrworkerdata;
+
+  AV1LrMTInfo *job_queue;
+  int jobs_enqueued;
+  int jobs_dequeued;
+} AV1LrSync;
+
+// Deallocate loopfilter synchronization related mutex and data.
+void av1_loop_filter_dealloc(AV1LfSync *lf_sync);
+
+void av1_loop_filter_frame_mt(YV12_BUFFER_CONFIG *frame, struct AV1Common *cm,
+                              struct macroblockd *mbd, int plane_start,
+                              int plane_end, int partial_frame,
+#if LOOP_FILTER_BITMASK
+                              int is_decoding,
+#endif
+                              AVxWorker *workers, int num_workers,
+                              AV1LfSync *lf_sync);
+void av1_loop_restoration_filter_frame_mt(YV12_BUFFER_CONFIG *frame,
+                                          struct AV1Common *cm,
+                                          int optimized_lr, AVxWorker *workers,
+                                          int num_workers, AV1LrSync *lr_sync,
+                                          void *lr_ctxt);
+void av1_loop_restoration_dealloc(AV1LrSync *lr_sync, int num_workers);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_THREAD_COMMON_H_

diff --git a/libav1/av1/common/tile_common.c b/libav1/av1/common/tile_common.c
new file mode 100644
index 0000000..4d90592
--- /dev/null
+++ b/libav1/av1/common/tile_common.c

@@ -0,0 +1,206 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "av1/common/tile_common.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/resize.h"
+#include "aom_dsp/aom_dsp_common.h"
+
+void av1_tile_init(TileInfo *tile, const AV1_COMMON *cm, int row, int col) {
+  av1_tile_set_row(tile, cm, row);
+  av1_tile_set_col(tile, cm, col);
+}
+
+// Find smallest k>=0 such that (blk_size << k) >= target
+static int tile_log2(int blk_size, int target) {
+  int k;
+  for (k = 0; (blk_size << k) < target; k++) {
+  }
+  return k;
+}
+
+void av1_get_tile_limits(AV1_COMMON *const cm) {
+  int mi_cols = ALIGN_POWER_OF_TWO(cm->mi_cols, cm->seq_params.mib_size_log2);
+  int mi_rows = ALIGN_POWER_OF_TWO(cm->mi_rows, cm->seq_params.mib_size_log2);
+  int sb_cols = mi_cols >> cm->seq_params.mib_size_log2;
+  int sb_rows = mi_rows >> cm->seq_params.mib_size_log2;
+
+  int sb_size_log2 = cm->seq_params.mib_size_log2 + MI_SIZE_LOG2;
+  cm->max_tile_width_sb = MAX_TILE_WIDTH >> sb_size_log2;
+  int max_tile_area_sb = MAX_TILE_AREA >> (2 * sb_size_log2);
+
+  cm->min_log2_tile_cols = tile_log2(cm->max_tile_width_sb, sb_cols);
+  cm->max_log2_tile_cols = tile_log2(1, AOMMIN(sb_cols, MAX_TILE_COLS));
+  cm->max_log2_tile_rows = tile_log2(1, AOMMIN(sb_rows, MAX_TILE_ROWS));
+  cm->min_log2_tiles = tile_log2(max_tile_area_sb, sb_cols * sb_rows);
+  cm->min_log2_tiles = AOMMAX(cm->min_log2_tiles, cm->min_log2_tile_cols);
+}
+
+void av1_calculate_tile_cols(AV1_COMMON *const cm) {
+  int mi_cols = ALIGN_POWER_OF_TWO(cm->mi_cols, cm->seq_params.mib_size_log2);
+  int mi_rows = ALIGN_POWER_OF_TWO(cm->mi_rows, cm->seq_params.mib_size_log2);
+  int sb_cols = mi_cols >> cm->seq_params.mib_size_log2;
+  int sb_rows = mi_rows >> cm->seq_params.mib_size_log2;
+  int i;
+
+  if (cm->uniform_tile_spacing_flag) {
+    int start_sb;
+    int size_sb = ALIGN_POWER_OF_TWO(sb_cols, cm->log2_tile_cols);
+    size_sb >>= cm->log2_tile_cols;
+    assert(size_sb > 0);
+    for (i = 0, start_sb = 0; start_sb < sb_cols; i++) {
+      cm->tile_col_start_sb[i] = start_sb;
+      start_sb += size_sb;
+    }
+    cm->tile_cols = i;
+    cm->tile_col_start_sb[i] = sb_cols;
+    cm->min_log2_tile_rows = AOMMAX(cm->min_log2_tiles - cm->log2_tile_cols, 0);
+    cm->max_tile_height_sb = sb_rows >> cm->min_log2_tile_rows;
+
+    cm->tile_width = size_sb << cm->seq_params.mib_size_log2;
+    cm->tile_width = AOMMIN(cm->tile_width, cm->mi_cols);
+  } else {
+    int max_tile_area_sb = (sb_rows * sb_cols);
+    int widest_tile_sb = 1;
+    cm->log2_tile_cols = tile_log2(1, cm->tile_cols);
+    for (i = 0; i < cm->tile_cols; i++) {
+      int size_sb = cm->tile_col_start_sb[i + 1] - cm->tile_col_start_sb[i];
+      widest_tile_sb = AOMMAX(widest_tile_sb, size_sb);
+    }
+    if (cm->min_log2_tiles) {
+      max_tile_area_sb >>= (cm->min_log2_tiles + 1);
+    }
+    cm->max_tile_height_sb = AOMMAX(max_tile_area_sb / widest_tile_sb, 1);
+  }
+}
+
+void av1_calculate_tile_rows(AV1_COMMON *const cm) {
+  int mi_rows = ALIGN_POWER_OF_TWO(cm->mi_rows, cm->seq_params.mib_size_log2);
+  int sb_rows = mi_rows >> cm->seq_params.mib_size_log2;
+  int start_sb, size_sb, i;
+
+  if (cm->uniform_tile_spacing_flag) {
+    size_sb = ALIGN_POWER_OF_TWO(sb_rows, cm->log2_tile_rows);
+    size_sb >>= cm->log2_tile_rows;
+    assert(size_sb > 0);
+    for (i = 0, start_sb = 0; start_sb < sb_rows; i++) {
+      cm->tile_row_start_sb[i] = start_sb;
+      start_sb += size_sb;
+    }
+    cm->tile_rows = i;
+    cm->tile_row_start_sb[i] = sb_rows;
+
+    cm->tile_height = size_sb << cm->seq_params.mib_size_log2;
+    cm->tile_height = AOMMIN(cm->tile_height, cm->mi_rows);
+  } else {
+    cm->log2_tile_rows = tile_log2(1, cm->tile_rows);
+  }
+}
+
+void av1_tile_set_row(TileInfo *tile, const AV1_COMMON *cm, int row) {
+  assert(row < cm->tile_rows);
+  int mi_row_start = cm->tile_row_start_sb[row] << cm->seq_params.mib_size_log2;
+  int mi_row_end = cm->tile_row_start_sb[row + 1]
+                   << cm->seq_params.mib_size_log2;
+  tile->tile_row = row;
+  tile->mi_row_start = mi_row_start;
+  tile->mi_row_end = AOMMIN(mi_row_end, cm->mi_rows);
+  assert(tile->mi_row_end > tile->mi_row_start);
+}
+
+void av1_tile_set_col(TileInfo *tile, const AV1_COMMON *cm, int col) {
+  assert(col < cm->tile_cols);
+  int mi_col_start = cm->tile_col_start_sb[col] << cm->seq_params.mib_size_log2;
+  int mi_col_end = cm->tile_col_start_sb[col + 1]
+                   << cm->seq_params.mib_size_log2;
+  tile->tile_col = col;
+  tile->mi_col_start = mi_col_start;
+  tile->mi_col_end = AOMMIN(mi_col_end, cm->mi_cols);
+  assert(tile->mi_col_end > tile->mi_col_start);
+}
+
+int av1_get_sb_rows_in_tile(AV1_COMMON *cm, TileInfo tile) {
+  int mi_rows_aligned_to_sb = ALIGN_POWER_OF_TWO(
+      tile.mi_row_end - tile.mi_row_start, cm->seq_params.mib_size_log2);
+  int sb_rows = mi_rows_aligned_to_sb >> cm->seq_params.mib_size_log2;
+
+  return sb_rows;
+}
+
+int av1_get_sb_cols_in_tile(AV1_COMMON *cm, TileInfo tile) {
+  int mi_cols_aligned_to_sb = ALIGN_POWER_OF_TWO(
+      tile.mi_col_end - tile.mi_col_start, cm->seq_params.mib_size_log2);
+  int sb_cols = mi_cols_aligned_to_sb >> cm->seq_params.mib_size_log2;
+
+  return sb_cols;
+}
+
+AV1PixelRect av1_get_tile_rect(const TileInfo *tile_info, const AV1_COMMON *cm,
+                               int is_uv) {
+  AV1PixelRect r;
+
+  // Calculate position in the Y plane
+  r.left = tile_info->mi_col_start * MI_SIZE;
+  r.right = tile_info->mi_col_end * MI_SIZE;
+  r.top = tile_info->mi_row_start * MI_SIZE;
+  r.bottom = tile_info->mi_row_end * MI_SIZE;
+
+  // If upscaling is enabled, the tile limits need scaling to match the
+  // upscaled frame where the restoration units live. To do this, scale up the
+  // top-left and bottom-right of the tile.
+  if (av1_superres_scaled(cm)) {
+    av1_calculate_unscaled_superres_size(&r.left, &r.top,
+                                         cm->superres_scale_denominator);
+    av1_calculate_unscaled_superres_size(&r.right, &r.bottom,
+                                         cm->superres_scale_denominator);
+  }
+
+  const int frame_w = cm->superres_upscaled_width;
+  const int frame_h = cm->superres_upscaled_height;
+
+  // Make sure we don't fall off the bottom-right of the frame.
+  r.right = AOMMIN(r.right, frame_w);
+  r.bottom = AOMMIN(r.bottom, frame_h);
+
+  // Convert to coordinates in the appropriate plane
+  const int ss_x = is_uv && cm->seq_params.subsampling_x;
+  const int ss_y = is_uv && cm->seq_params.subsampling_y;
+
+  r.left = ROUND_POWER_OF_TWO(r.left, ss_x);
+  r.right = ROUND_POWER_OF_TWO(r.right, ss_x);
+  r.top = ROUND_POWER_OF_TWO(r.top, ss_y);
+  r.bottom = ROUND_POWER_OF_TWO(r.bottom, ss_y);
+
+  return r;
+}
+
+void av1_get_uniform_tile_size(const AV1_COMMON *cm, int *w, int *h) {
+  if (cm->uniform_tile_spacing_flag) {
+    *w = cm->tile_width;
+    *h = cm->tile_height;
+  } else {
+    for (int i = 0; i < cm->tile_cols; ++i) {
+      const int tile_width_sb =
+          cm->tile_col_start_sb[i + 1] - cm->tile_col_start_sb[i];
+      const int tile_w = tile_width_sb * cm->seq_params.mib_size;
+      assert(i == 0 || tile_w == *w);  // ensure all tiles have same dimension
+      *w = tile_w;
+    }
+
+    for (int i = 0; i < cm->tile_rows; ++i) {
+      const int tile_height_sb =
+          cm->tile_row_start_sb[i + 1] - cm->tile_row_start_sb[i];
+      const int tile_h = tile_height_sb * cm->seq_params.mib_size;
+      assert(i == 0 || tile_h == *h);  // ensure all tiles have same dimension
+      *h = tile_h;
+    }
+  }
+}

diff --git a/libav1/av1/common/tile_common.h b/libav1/av1/common/tile_common.h
new file mode 100644
index 0000000..b7203ce
--- /dev/null
+++ b/libav1/av1/common/tile_common.h

@@ -0,0 +1,66 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_TILE_COMMON_H_
+#define AOM_AV1_COMMON_TILE_COMMON_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "config/aom_config.h"
+
+struct AV1Common;
+
+#define DEFAULT_MAX_NUM_TG 1
+
+typedef struct TileInfo {
+  int mi_row_start, mi_row_end;
+  int mi_col_start, mi_col_end;
+  int tile_row;
+  int tile_col;
+} TileInfo;
+
+// initializes 'tile->mi_(row|col)_(start|end)' for (row, col) based on
+// 'cm->log2_tile_(rows|cols)' & 'cm->mi_(rows|cols)'
+void av1_tile_init(TileInfo *tile, const struct AV1Common *cm, int row,
+                   int col);
+
+void av1_tile_set_row(TileInfo *tile, const struct AV1Common *cm, int row);
+void av1_tile_set_col(TileInfo *tile, const struct AV1Common *cm, int col);
+
+int av1_get_sb_rows_in_tile(struct AV1Common *cm, TileInfo tile);
+int av1_get_sb_cols_in_tile(struct AV1Common *cm, TileInfo tile);
+
+typedef struct {
+  int left, top, right, bottom;
+} AV1PixelRect;
+
+// Return the pixel extents of the given tile
+AV1PixelRect av1_get_tile_rect(const TileInfo *tile_info,
+                               const struct AV1Common *cm, int is_uv);
+
+// Define tile maximum width and area
+// There is no maximum height since height is limited by area and width limits
+// The minimum tile width or height is fixed at one superblock
+#define MAX_TILE_WIDTH (4096)        // Max Tile width in pixels
+#define MAX_TILE_AREA (4096 * 2304)  // Maximum tile area in pixels
+
+void av1_get_uniform_tile_size(const struct AV1Common *cm, int *w, int *h);
+void av1_get_tile_limits(struct AV1Common *const cm);
+void av1_calculate_tile_cols(struct AV1Common *const cm);
+void av1_calculate_tile_rows(struct AV1Common *const cm);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_TILE_COMMON_H_

diff --git a/libav1/av1/common/timing.c b/libav1/av1/common/timing.c
new file mode 100644
index 0000000..49dbde7
--- /dev/null
+++ b/libav1/av1/common/timing.c

@@ -0,0 +1,79 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "av1/common/timing.h"
+
+/* Tables for AV1 max bitrates for different levels of main and high tier.
+ * The tables are in Kbps instead of Mbps in the specification.
+ * Note that depending on the profile, a multiplier is needed.
+ */
+
+/* Max Bitrates for levels of Main Tier in kbps. Bitrate in main_kbps [31] */
+/* is a dummy value. The decoder model is not applicable for level 31. */
+static int32_t main_kbps[1 << LEVEL_BITS] = {
+  1500, 3000,  0,     0,     6000,  10000, 0,      0,      12000,  20000,    0,
+  0,    30000, 40000, 60000, 60000, 60000, 100000, 160000, 160000, 0,        0,
+  0,    0,     0,     0,     0,     0,     0,      0,      0,      (1 << 26)
+};
+
+/* Max Bitrates for levels of High Tier in kbps. Bitrate in high_kbps [31] */
+/* is a dummy value. The decoder model is not applicable for level 31. */
+static int32_t high_kbps[1 << LEVEL_BITS] = {
+  0,      0,      0,      0,      0,      0,      0,      0,
+  30000,  50000,  0,      0,      100000, 160000, 240000, 240000,
+  240000, 480000, 800000, 800000, 0,      0,      0,      0,
+  0,      0,      0,      0,      0,      0,      0,      (1 << 26)
+};
+
+/* BitrateProfileFactor */
+static int bitrate_profile_factor[1 << PROFILE_BITS] = {
+  1, 2, 3, 0, 0, 0, 0, 0
+};
+
+int64_t max_level_bitrate(BITSTREAM_PROFILE seq_profile, int seq_level_idx,
+                          int seq_tier) {
+  int64_t bitrate;
+
+  if (seq_tier) {
+    bitrate = high_kbps[seq_level_idx] * bitrate_profile_factor[seq_profile];
+  } else {
+    bitrate = main_kbps[seq_level_idx] * bitrate_profile_factor[seq_profile];
+  }
+
+  return bitrate * 1000;
+}
+
+void set_aom_dec_model_info(aom_dec_model_info_t *decoder_model) {
+  decoder_model->encoder_decoder_buffer_delay_length = 16;
+  decoder_model->buffer_removal_time_length = 10;
+  decoder_model->frame_presentation_time_length = 10;
+}
+
+void set_dec_model_op_parameters(aom_dec_model_op_parameters_t *op_params) {
+  op_params->decoder_model_param_present_flag = 1;
+  op_params->decoder_buffer_delay = 90000 >> 1;  //  0.5 s
+  op_params->encoder_buffer_delay = 90000 >> 1;  //  0.5 s
+  op_params->low_delay_mode_flag = 0;
+  op_params->display_model_param_present_flag = 1;
+  op_params->initial_display_delay = 8;  // 8 frames delay
+}
+
+void set_resource_availability_parameters(
+    aom_dec_model_op_parameters_t *op_params) {
+  op_params->decoder_model_param_present_flag = 0;
+  op_params->decoder_buffer_delay =
+      70000;  // Resource availability mode default
+  op_params->encoder_buffer_delay =
+      20000;                           // Resource availability mode default
+  op_params->low_delay_mode_flag = 0;  // Resource availability mode default
+  op_params->display_model_param_present_flag = 1;
+  op_params->initial_display_delay = 8;  // 8 frames delay
+}

diff --git a/libav1/av1/common/timing.h b/libav1/av1/common/timing.h
new file mode 100644
index 0000000..06939ae
--- /dev/null
+++ b/libav1/av1/common/timing.h

@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_TIMING_H_
+#define AOM_AV1_COMMON_TIMING_H_
+
+#include "aom/aom_integer.h"
+#include "av1/common/enums.h"
+
+#define MAX_NUM_OP_POINTS 32
+
+typedef struct aom_timing {
+  uint32_t num_units_in_display_tick;
+  uint32_t time_scale;
+  int equal_picture_interval;
+  uint32_t num_ticks_per_picture;
+} aom_timing_info_t;
+
+typedef struct aom_dec_model_info {
+  uint32_t num_units_in_decoding_tick;
+  int encoder_decoder_buffer_delay_length;
+  int buffer_removal_time_length;
+  int frame_presentation_time_length;
+} aom_dec_model_info_t;
+
+typedef struct aom_dec_model_op_parameters {
+  int decoder_model_param_present_flag;
+  int64_t bitrate;
+  int64_t buffer_size;
+  uint32_t decoder_buffer_delay;
+  uint32_t encoder_buffer_delay;
+  int low_delay_mode_flag;
+  int display_model_param_present_flag;
+  int initial_display_delay;
+} aom_dec_model_op_parameters_t;
+
+typedef struct aom_op_timing_info_t {
+  uint32_t buffer_removal_time;
+} aom_op_timing_info_t;
+
+void set_aom_dec_model_info(aom_dec_model_info_t *decoder_model);
+
+void set_dec_model_op_parameters(aom_dec_model_op_parameters_t *op_params);
+
+void set_resource_availability_parameters(
+    aom_dec_model_op_parameters_t *op_params);
+
+int64_t max_level_bitrate(BITSTREAM_PROFILE seq_profile, int seq_level_idx,
+                          int seq_tier);
+
+#endif  // AOM_AV1_COMMON_TIMING_H_

diff --git a/libav1/av1/common/token_cdfs.h b/libav1/av1/common/token_cdfs.h
new file mode 100644
index 0000000..53e9564
--- /dev/null
+++ b/libav1/av1/common/token_cdfs.h

@@ -0,0 +1,3555 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_TOKEN_CDFS_H_
+#define AOM_AV1_COMMON_TOKEN_CDFS_H_
+
+#include "config/aom_config.h"
+
+#include "av1/common/entropy.h"
+
+static const aom_cdf_prob
+    av1_default_dc_sign_cdfs[TOKEN_CDF_Q_CTXS][PLANE_TYPES][DC_SIGN_CONTEXTS]
+                            [CDF_SIZE(2)] = {
+                              { {
+                                    { AOM_CDF2(128 * 125) },
+                                    { AOM_CDF2(128 * 102) },
+                                    { AOM_CDF2(128 * 147) },
+                                },
+                                {
+                                    { AOM_CDF2(128 * 119) },
+                                    { AOM_CDF2(128 * 101) },
+                                    { AOM_CDF2(128 * 135) },
+                                } },
+                              { {
+                                    { AOM_CDF2(128 * 125) },
+                                    { AOM_CDF2(128 * 102) },
+                                    { AOM_CDF2(128 * 147) },
+                                },
+                                {
+                                    { AOM_CDF2(128 * 119) },
+                                    { AOM_CDF2(128 * 101) },
+                                    { AOM_CDF2(128 * 135) },
+                                } },
+                              { {
+                                    { AOM_CDF2(128 * 125) },
+                                    { AOM_CDF2(128 * 102) },
+                                    { AOM_CDF2(128 * 147) },
+                                },
+                                {
+                                    { AOM_CDF2(128 * 119) },
+                                    { AOM_CDF2(128 * 101) },
+                                    { AOM_CDF2(128 * 135) },
+                                } },
+                              { {
+                                    { AOM_CDF2(128 * 125) },
+                                    { AOM_CDF2(128 * 102) },
+                                    { AOM_CDF2(128 * 147) },
+                                },
+                                {
+                                    { AOM_CDF2(128 * 119) },
+                                    { AOM_CDF2(128 * 101) },
+                                    { AOM_CDF2(128 * 135) },
+                                } },
+                            };
+
+static const aom_cdf_prob
+    av1_default_txb_skip_cdfs[TOKEN_CDF_Q_CTXS][TX_SIZES][TXB_SKIP_CONTEXTS]
+                             [CDF_SIZE(2)] = { { { { AOM_CDF2(31849) },
+                                                   { AOM_CDF2(5892) },
+                                                   { AOM_CDF2(12112) },
+                                                   { AOM_CDF2(21935) },
+                                                   { AOM_CDF2(20289) },
+                                                   { AOM_CDF2(27473) },
+                                                   { AOM_CDF2(32487) },
+                                                   { AOM_CDF2(7654) },
+                                                   { AOM_CDF2(19473) },
+                                                   { AOM_CDF2(29984) },
+                                                   { AOM_CDF2(9961) },
+                                                   { AOM_CDF2(30242) },
+                                                   { AOM_CDF2(32117) } },
+                                                 { { AOM_CDF2(31548) },
+                                                   { AOM_CDF2(1549) },
+                                                   { AOM_CDF2(10130) },
+                                                   { AOM_CDF2(16656) },
+                                                   { AOM_CDF2(18591) },
+                                                   { AOM_CDF2(26308) },
+                                                   { AOM_CDF2(32537) },
+                                                   { AOM_CDF2(5403) },
+                                                   { AOM_CDF2(18096) },
+                                                   { AOM_CDF2(30003) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(29957) },
+                                                   { AOM_CDF2(5391) },
+                                                   { AOM_CDF2(18039) },
+                                                   { AOM_CDF2(23566) },
+                                                   { AOM_CDF2(22431) },
+                                                   { AOM_CDF2(25822) },
+                                                   { AOM_CDF2(32197) },
+                                                   { AOM_CDF2(3778) },
+                                                   { AOM_CDF2(15336) },
+                                                   { AOM_CDF2(28981) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(17920) },
+                                                   { AOM_CDF2(1818) },
+                                                   { AOM_CDF2(7282) },
+                                                   { AOM_CDF2(25273) },
+                                                   { AOM_CDF2(10923) },
+                                                   { AOM_CDF2(31554) },
+                                                   { AOM_CDF2(32624) },
+                                                   { AOM_CDF2(1366) },
+                                                   { AOM_CDF2(15628) },
+                                                   { AOM_CDF2(30462) },
+                                                   { AOM_CDF2(146) },
+                                                   { AOM_CDF2(5132) },
+                                                   { AOM_CDF2(31657) } },
+                                                 { { AOM_CDF2(6308) },
+                                                   { AOM_CDF2(117) },
+                                                   { AOM_CDF2(1638) },
+                                                   { AOM_CDF2(2161) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(10923) },
+                                                   { AOM_CDF2(30247) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } } },
+                                               { { { AOM_CDF2(30371) },
+                                                   { AOM_CDF2(7570) },
+                                                   { AOM_CDF2(13155) },
+                                                   { AOM_CDF2(20751) },
+                                                   { AOM_CDF2(20969) },
+                                                   { AOM_CDF2(27067) },
+                                                   { AOM_CDF2(32013) },
+                                                   { AOM_CDF2(5495) },
+                                                   { AOM_CDF2(17942) },
+                                                   { AOM_CDF2(28280) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(31782) },
+                                                   { AOM_CDF2(1836) },
+                                                   { AOM_CDF2(10689) },
+                                                   { AOM_CDF2(17604) },
+                                                   { AOM_CDF2(21622) },
+                                                   { AOM_CDF2(27518) },
+                                                   { AOM_CDF2(32399) },
+                                                   { AOM_CDF2(4419) },
+                                                   { AOM_CDF2(16294) },
+                                                   { AOM_CDF2(28345) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(31901) },
+                                                   { AOM_CDF2(10311) },
+                                                   { AOM_CDF2(18047) },
+                                                   { AOM_CDF2(24806) },
+                                                   { AOM_CDF2(23288) },
+                                                   { AOM_CDF2(27914) },
+                                                   { AOM_CDF2(32296) },
+                                                   { AOM_CDF2(4215) },
+                                                   { AOM_CDF2(15756) },
+                                                   { AOM_CDF2(28341) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(26726) },
+                                                   { AOM_CDF2(1045) },
+                                                   { AOM_CDF2(11703) },
+                                                   { AOM_CDF2(20590) },
+                                                   { AOM_CDF2(18554) },
+                                                   { AOM_CDF2(25970) },
+                                                   { AOM_CDF2(31938) },
+                                                   { AOM_CDF2(5583) },
+                                                   { AOM_CDF2(21313) },
+                                                   { AOM_CDF2(29390) },
+                                                   { AOM_CDF2(641) },
+                                                   { AOM_CDF2(22265) },
+                                                   { AOM_CDF2(31452) } },
+                                                 { { AOM_CDF2(26584) },
+                                                   { AOM_CDF2(188) },
+                                                   { AOM_CDF2(8847) },
+                                                   { AOM_CDF2(24519) },
+                                                   { AOM_CDF2(22938) },
+                                                   { AOM_CDF2(30583) },
+                                                   { AOM_CDF2(32608) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } } },
+                                               { { { AOM_CDF2(29614) },
+                                                   { AOM_CDF2(9068) },
+                                                   { AOM_CDF2(12924) },
+                                                   { AOM_CDF2(19538) },
+                                                   { AOM_CDF2(17737) },
+                                                   { AOM_CDF2(24619) },
+                                                   { AOM_CDF2(30642) },
+                                                   { AOM_CDF2(4119) },
+                                                   { AOM_CDF2(16026) },
+                                                   { AOM_CDF2(25657) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(31957) },
+                                                   { AOM_CDF2(3230) },
+                                                   { AOM_CDF2(11153) },
+                                                   { AOM_CDF2(18123) },
+                                                   { AOM_CDF2(20143) },
+                                                   { AOM_CDF2(26536) },
+                                                   { AOM_CDF2(31986) },
+                                                   { AOM_CDF2(3050) },
+                                                   { AOM_CDF2(14603) },
+                                                   { AOM_CDF2(25155) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(32363) },
+                                                   { AOM_CDF2(10692) },
+                                                   { AOM_CDF2(19090) },
+                                                   { AOM_CDF2(24357) },
+                                                   { AOM_CDF2(24442) },
+                                                   { AOM_CDF2(28312) },
+                                                   { AOM_CDF2(32169) },
+                                                   { AOM_CDF2(3648) },
+                                                   { AOM_CDF2(15690) },
+                                                   { AOM_CDF2(26815) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(30669) },
+                                                   { AOM_CDF2(3832) },
+                                                   { AOM_CDF2(11663) },
+                                                   { AOM_CDF2(18889) },
+                                                   { AOM_CDF2(19782) },
+                                                   { AOM_CDF2(23313) },
+                                                   { AOM_CDF2(31330) },
+                                                   { AOM_CDF2(5124) },
+                                                   { AOM_CDF2(18719) },
+                                                   { AOM_CDF2(28468) },
+                                                   { AOM_CDF2(3082) },
+                                                   { AOM_CDF2(20982) },
+                                                   { AOM_CDF2(29443) } },
+                                                 { { AOM_CDF2(28573) },
+                                                   { AOM_CDF2(3183) },
+                                                   { AOM_CDF2(17802) },
+                                                   { AOM_CDF2(25977) },
+                                                   { AOM_CDF2(26677) },
+                                                   { AOM_CDF2(27832) },
+                                                   { AOM_CDF2(32387) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } } },
+                                               { { { AOM_CDF2(26887) },
+                                                   { AOM_CDF2(6729) },
+                                                   { AOM_CDF2(10361) },
+                                                   { AOM_CDF2(17442) },
+                                                   { AOM_CDF2(15045) },
+                                                   { AOM_CDF2(22478) },
+                                                   { AOM_CDF2(29072) },
+                                                   { AOM_CDF2(2713) },
+                                                   { AOM_CDF2(11861) },
+                                                   { AOM_CDF2(20773) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(31903) },
+                                                   { AOM_CDF2(2044) },
+                                                   { AOM_CDF2(7528) },
+                                                   { AOM_CDF2(14618) },
+                                                   { AOM_CDF2(16182) },
+                                                   { AOM_CDF2(24168) },
+                                                   { AOM_CDF2(31037) },
+                                                   { AOM_CDF2(2786) },
+                                                   { AOM_CDF2(11194) },
+                                                   { AOM_CDF2(20155) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(32510) },
+                                                   { AOM_CDF2(8430) },
+                                                   { AOM_CDF2(17318) },
+                                                   { AOM_CDF2(24154) },
+                                                   { AOM_CDF2(23674) },
+                                                   { AOM_CDF2(28789) },
+                                                   { AOM_CDF2(32139) },
+                                                   { AOM_CDF2(3440) },
+                                                   { AOM_CDF2(13117) },
+                                                   { AOM_CDF2(22702) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } },
+                                                 { { AOM_CDF2(31671) },
+                                                   { AOM_CDF2(2056) },
+                                                   { AOM_CDF2(11746) },
+                                                   { AOM_CDF2(16852) },
+                                                   { AOM_CDF2(18635) },
+                                                   { AOM_CDF2(24715) },
+                                                   { AOM_CDF2(31484) },
+                                                   { AOM_CDF2(4656) },
+                                                   { AOM_CDF2(16074) },
+                                                   { AOM_CDF2(24704) },
+                                                   { AOM_CDF2(1806) },
+                                                   { AOM_CDF2(14645) },
+                                                   { AOM_CDF2(25336) } },
+                                                 { { AOM_CDF2(31539) },
+                                                   { AOM_CDF2(8433) },
+                                                   { AOM_CDF2(20576) },
+                                                   { AOM_CDF2(27904) },
+                                                   { AOM_CDF2(27852) },
+                                                   { AOM_CDF2(30026) },
+                                                   { AOM_CDF2(32441) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) },
+                                                   { AOM_CDF2(16384) } } } };
+
+static const aom_cdf_prob
+    av1_default_eob_extra_cdfs[TOKEN_CDF_Q_CTXS][TX_SIZES][PLANE_TYPES]
+                              [EOB_COEF_CONTEXTS][CDF_SIZE(2)] = {
+                                { { {
+                                        { AOM_CDF2(16961) },
+                                        { AOM_CDF2(17223) },
+                                        { AOM_CDF2(7621) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(19069) },
+                                        { AOM_CDF2(22525) },
+                                        { AOM_CDF2(13377) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(20401) },
+                                        { AOM_CDF2(17025) },
+                                        { AOM_CDF2(12845) },
+                                        { AOM_CDF2(12873) },
+                                        { AOM_CDF2(14094) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(20681) },
+                                        { AOM_CDF2(20701) },
+                                        { AOM_CDF2(15250) },
+                                        { AOM_CDF2(15017) },
+                                        { AOM_CDF2(14928) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(23905) },
+                                        { AOM_CDF2(17194) },
+                                        { AOM_CDF2(16170) },
+                                        { AOM_CDF2(17695) },
+                                        { AOM_CDF2(13826) },
+                                        { AOM_CDF2(15810) },
+                                        { AOM_CDF2(12036) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(23959) },
+                                        { AOM_CDF2(20799) },
+                                        { AOM_CDF2(19021) },
+                                        { AOM_CDF2(16203) },
+                                        { AOM_CDF2(17886) },
+                                        { AOM_CDF2(14144) },
+                                        { AOM_CDF2(12010) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(27399) },
+                                        { AOM_CDF2(16327) },
+                                        { AOM_CDF2(18071) },
+                                        { AOM_CDF2(19584) },
+                                        { AOM_CDF2(20721) },
+                                        { AOM_CDF2(18432) },
+                                        { AOM_CDF2(19560) },
+                                        { AOM_CDF2(10150) },
+                                        { AOM_CDF2(8805) },
+                                    },
+                                    {
+                                        { AOM_CDF2(24932) },
+                                        { AOM_CDF2(20833) },
+                                        { AOM_CDF2(12027) },
+                                        { AOM_CDF2(16670) },
+                                        { AOM_CDF2(19914) },
+                                        { AOM_CDF2(15106) },
+                                        { AOM_CDF2(17662) },
+                                        { AOM_CDF2(13783) },
+                                        { AOM_CDF2(28756) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(23406) },
+                                        { AOM_CDF2(21845) },
+                                        { AOM_CDF2(18432) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(17096) },
+                                        { AOM_CDF2(12561) },
+                                        { AOM_CDF2(17320) },
+                                        { AOM_CDF2(22395) },
+                                        { AOM_CDF2(21370) },
+                                    },
+                                    {
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } } },
+                                { { {
+                                        { AOM_CDF2(17471) },
+                                        { AOM_CDF2(20223) },
+                                        { AOM_CDF2(11357) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(20335) },
+                                        { AOM_CDF2(21667) },
+                                        { AOM_CDF2(14818) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(20430) },
+                                        { AOM_CDF2(20662) },
+                                        { AOM_CDF2(15367) },
+                                        { AOM_CDF2(16970) },
+                                        { AOM_CDF2(14657) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(22117) },
+                                        { AOM_CDF2(22028) },
+                                        { AOM_CDF2(18650) },
+                                        { AOM_CDF2(16042) },
+                                        { AOM_CDF2(15885) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(22409) },
+                                        { AOM_CDF2(21012) },
+                                        { AOM_CDF2(15650) },
+                                        { AOM_CDF2(17395) },
+                                        { AOM_CDF2(15469) },
+                                        { AOM_CDF2(20205) },
+                                        { AOM_CDF2(19511) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(24220) },
+                                        { AOM_CDF2(22480) },
+                                        { AOM_CDF2(17737) },
+                                        { AOM_CDF2(18916) },
+                                        { AOM_CDF2(19268) },
+                                        { AOM_CDF2(18412) },
+                                        { AOM_CDF2(18844) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(25991) },
+                                        { AOM_CDF2(20314) },
+                                        { AOM_CDF2(17731) },
+                                        { AOM_CDF2(19678) },
+                                        { AOM_CDF2(18649) },
+                                        { AOM_CDF2(17307) },
+                                        { AOM_CDF2(21798) },
+                                        { AOM_CDF2(17549) },
+                                        { AOM_CDF2(15630) },
+                                    },
+                                    {
+                                        { AOM_CDF2(26585) },
+                                        { AOM_CDF2(21469) },
+                                        { AOM_CDF2(20432) },
+                                        { AOM_CDF2(17735) },
+                                        { AOM_CDF2(19280) },
+                                        { AOM_CDF2(15235) },
+                                        { AOM_CDF2(20297) },
+                                        { AOM_CDF2(22471) },
+                                        { AOM_CDF2(28997) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(26605) },
+                                        { AOM_CDF2(11304) },
+                                        { AOM_CDF2(16726) },
+                                        { AOM_CDF2(16560) },
+                                        { AOM_CDF2(20866) },
+                                        { AOM_CDF2(23524) },
+                                        { AOM_CDF2(19878) },
+                                        { AOM_CDF2(13469) },
+                                        { AOM_CDF2(23084) },
+                                    },
+                                    {
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } } },
+                                { { {
+                                        { AOM_CDF2(18983) },
+                                        { AOM_CDF2(20512) },
+                                        { AOM_CDF2(14885) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(20090) },
+                                        { AOM_CDF2(19444) },
+                                        { AOM_CDF2(17286) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(19139) },
+                                        { AOM_CDF2(21487) },
+                                        { AOM_CDF2(18959) },
+                                        { AOM_CDF2(20910) },
+                                        { AOM_CDF2(19089) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(20536) },
+                                        { AOM_CDF2(20664) },
+                                        { AOM_CDF2(20625) },
+                                        { AOM_CDF2(19123) },
+                                        { AOM_CDF2(14862) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(19833) },
+                                        { AOM_CDF2(21502) },
+                                        { AOM_CDF2(17485) },
+                                        { AOM_CDF2(20267) },
+                                        { AOM_CDF2(18353) },
+                                        { AOM_CDF2(23329) },
+                                        { AOM_CDF2(21478) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(22041) },
+                                        { AOM_CDF2(23434) },
+                                        { AOM_CDF2(20001) },
+                                        { AOM_CDF2(20554) },
+                                        { AOM_CDF2(20951) },
+                                        { AOM_CDF2(20145) },
+                                        { AOM_CDF2(15562) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(23312) },
+                                        { AOM_CDF2(21607) },
+                                        { AOM_CDF2(16526) },
+                                        { AOM_CDF2(18957) },
+                                        { AOM_CDF2(18034) },
+                                        { AOM_CDF2(18934) },
+                                        { AOM_CDF2(24247) },
+                                        { AOM_CDF2(16921) },
+                                        { AOM_CDF2(17080) },
+                                    },
+                                    {
+                                        { AOM_CDF2(26579) },
+                                        { AOM_CDF2(24910) },
+                                        { AOM_CDF2(18637) },
+                                        { AOM_CDF2(19800) },
+                                        { AOM_CDF2(20388) },
+                                        { AOM_CDF2(9887) },
+                                        { AOM_CDF2(15642) },
+                                        { AOM_CDF2(30198) },
+                                        { AOM_CDF2(24721) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(26998) },
+                                        { AOM_CDF2(16737) },
+                                        { AOM_CDF2(17838) },
+                                        { AOM_CDF2(18922) },
+                                        { AOM_CDF2(19515) },
+                                        { AOM_CDF2(18636) },
+                                        { AOM_CDF2(17333) },
+                                        { AOM_CDF2(15776) },
+                                        { AOM_CDF2(22658) },
+                                    },
+                                    {
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } } },
+                                { { {
+                                        { AOM_CDF2(20177) },
+                                        { AOM_CDF2(20789) },
+                                        { AOM_CDF2(20262) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(21416) },
+                                        { AOM_CDF2(20855) },
+                                        { AOM_CDF2(23410) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(20238) },
+                                        { AOM_CDF2(21057) },
+                                        { AOM_CDF2(19159) },
+                                        { AOM_CDF2(22337) },
+                                        { AOM_CDF2(20159) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(20125) },
+                                        { AOM_CDF2(20559) },
+                                        { AOM_CDF2(21707) },
+                                        { AOM_CDF2(22296) },
+                                        { AOM_CDF2(17333) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(19941) },
+                                        { AOM_CDF2(20527) },
+                                        { AOM_CDF2(21470) },
+                                        { AOM_CDF2(22487) },
+                                        { AOM_CDF2(19558) },
+                                        { AOM_CDF2(22354) },
+                                        { AOM_CDF2(20331) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    },
+                                    {
+                                        { AOM_CDF2(22752) },
+                                        { AOM_CDF2(25006) },
+                                        { AOM_CDF2(22075) },
+                                        { AOM_CDF2(21576) },
+                                        { AOM_CDF2(17740) },
+                                        { AOM_CDF2(21690) },
+                                        { AOM_CDF2(19211) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(21442) },
+                                        { AOM_CDF2(22358) },
+                                        { AOM_CDF2(18503) },
+                                        { AOM_CDF2(20291) },
+                                        { AOM_CDF2(19945) },
+                                        { AOM_CDF2(21294) },
+                                        { AOM_CDF2(21178) },
+                                        { AOM_CDF2(19400) },
+                                        { AOM_CDF2(10556) },
+                                    },
+                                    {
+                                        { AOM_CDF2(24648) },
+                                        { AOM_CDF2(24949) },
+                                        { AOM_CDF2(20708) },
+                                        { AOM_CDF2(23905) },
+                                        { AOM_CDF2(20501) },
+                                        { AOM_CDF2(9558) },
+                                        { AOM_CDF2(9423) },
+                                        { AOM_CDF2(30365) },
+                                        { AOM_CDF2(19253) },
+                                    } },
+                                  { {
+                                        { AOM_CDF2(26064) },
+                                        { AOM_CDF2(22098) },
+                                        { AOM_CDF2(19613) },
+                                        { AOM_CDF2(20525) },
+                                        { AOM_CDF2(17595) },
+                                        { AOM_CDF2(16618) },
+                                        { AOM_CDF2(20497) },
+                                        { AOM_CDF2(18989) },
+                                        { AOM_CDF2(15513) },
+                                    },
+                                    {
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                        { AOM_CDF2(16384) },
+                                    } } }
+                              };
+
+static const aom_cdf_prob
+    av1_default_eob_multi16_cdfs[TOKEN_CDF_Q_CTXS][PLANE_TYPES][2][CDF_SIZE(
+        5)] = { { { { AOM_CDF5(840, 1039, 1980, 4895) },
+                    { AOM_CDF5(370, 671, 1883, 4471) } },
+                  { { AOM_CDF5(3247, 4950, 9688, 14563) },
+                    { AOM_CDF5(1904, 3354, 7763, 14647) } } },
+                { { { AOM_CDF5(2125, 2551, 5165, 8946) },
+                    { AOM_CDF5(513, 765, 1859, 6339) } },
+                  { { AOM_CDF5(7637, 9498, 14259, 19108) },
+                    { AOM_CDF5(2497, 4096, 8866, 16993) } } },
+                { { { AOM_CDF5(4016, 4897, 8881, 14968) },
+                    { AOM_CDF5(716, 1105, 2646, 10056) } },
+                  { { AOM_CDF5(11139, 13270, 18241, 23566) },
+                    { AOM_CDF5(3192, 5032, 10297, 19755) } } },
+                { { { AOM_CDF5(6708, 8958, 14746, 22133) },
+                    { AOM_CDF5(1222, 2074, 4783, 15410) } },
+                  { { AOM_CDF5(19575, 21766, 26044, 29709) },
+                    { AOM_CDF5(7297, 10767, 19273, 28194) } } } };
+
+static const aom_cdf_prob
+    av1_default_eob_multi32_cdfs[TOKEN_CDF_Q_CTXS][PLANE_TYPES][2][CDF_SIZE(
+        6)] = { { { { AOM_CDF6(400, 520, 977, 2102, 6542) },
+                    { AOM_CDF6(210, 405, 1315, 3326, 7537) } },
+                  { { AOM_CDF6(2636, 4273, 7588, 11794, 20401) },
+                    { AOM_CDF6(1786, 3179, 6902, 11357, 19054) } } },
+                { { { AOM_CDF6(989, 1249, 2019, 4151, 10785) },
+                    { AOM_CDF6(313, 441, 1099, 2917, 8562) } },
+                  { { AOM_CDF6(8394, 10352, 13932, 18855, 26014) },
+                    { AOM_CDF6(2578, 4124, 8181, 13670, 24234) } } },
+                { { { AOM_CDF6(2515, 3003, 4452, 8162, 16041) },
+                    { AOM_CDF6(574, 821, 1836, 5089, 13128) } },
+                  { { AOM_CDF6(13468, 16303, 20361, 25105, 29281) },
+                    { AOM_CDF6(3542, 5502, 10415, 16760, 25644) } } },
+                { { { AOM_CDF6(4617, 5709, 8446, 13584, 23135) },
+                    { AOM_CDF6(1156, 1702, 3675, 9274, 20539) } },
+                  { { AOM_CDF6(22086, 24282, 27010, 29770, 31743) },
+                    { AOM_CDF6(7699, 10897, 20891, 26926, 31628) } } } };
+
+static const aom_cdf_prob
+    av1_default_eob_multi64_cdfs[TOKEN_CDF_Q_CTXS][PLANE_TYPES][2][CDF_SIZE(
+        7)] = { { { { AOM_CDF7(329, 498, 1101, 1784, 3265, 7758) },
+                    { AOM_CDF7(335, 730, 1459, 5494, 8755, 12997) } },
+                  { { AOM_CDF7(3505, 5304, 10086, 13814, 17684, 23370) },
+                    { AOM_CDF7(1563, 2700, 4876, 10911, 14706, 22480) } } },
+                { { { AOM_CDF7(1260, 1446, 2253, 3712, 6652, 13369) },
+                    { AOM_CDF7(401, 605, 1029, 2563, 5845, 12626) } },
+                  { { AOM_CDF7(8609, 10612, 14624, 18714, 22614, 29024) },
+                    { AOM_CDF7(1923, 3127, 5867, 9703, 14277, 27100) } } },
+                { { { AOM_CDF7(2374, 2772, 4583, 7276, 12288, 19706) },
+                    { AOM_CDF7(497, 810, 1315, 3000, 7004, 15641) } },
+                  { { AOM_CDF7(15050, 17126, 21410, 24886, 28156, 30726) },
+                    { AOM_CDF7(4034, 6290, 10235, 14982, 21214, 28491) } } },
+                { { { AOM_CDF7(6307, 7541, 12060, 16358, 22553, 27865) },
+                    { AOM_CDF7(1289, 2320, 3971, 7926, 14153, 24291) } },
+                  { { AOM_CDF7(24212, 25708, 28268, 30035, 31307, 32049) },
+                    { AOM_CDF7(8726, 12378, 19409, 26450, 30038, 32462) } } } };
+
+static const aom_cdf_prob
+    av1_default_eob_multi128_cdfs[TOKEN_CDF_Q_CTXS][PLANE_TYPES][2][CDF_SIZE(
+        8)] = {
+      { { { AOM_CDF8(219, 482, 1140, 2091, 3680, 6028, 12586) },
+          { AOM_CDF8(371, 699, 1254, 4830, 9479, 12562, 17497) } },
+        { { AOM_CDF8(5245, 7456, 12880, 15852, 20033, 23932, 27608) },
+          { AOM_CDF8(2054, 3472, 5869, 14232, 18242, 20590, 26752) } } },
+      { { { AOM_CDF8(685, 933, 1488, 2714, 4766, 8562, 19254) },
+          { AOM_CDF8(217, 352, 618, 2303, 5261, 9969, 17472) } },
+        { { AOM_CDF8(8045, 11200, 15497, 19595, 23948, 27408, 30938) },
+          { AOM_CDF8(2310, 4160, 7471, 14997, 17931, 20768, 30240) } } },
+      { { { AOM_CDF8(1366, 1738, 2527, 5016, 9355, 15797, 24643) },
+          { AOM_CDF8(354, 558, 944, 2760, 7287, 14037, 21779) } },
+        { { AOM_CDF8(13627, 16246, 20173, 24429, 27948, 30415, 31863) },
+          { AOM_CDF8(6275, 9889, 14769, 23164, 27988, 30493, 32272) } } },
+      { { { AOM_CDF8(3472, 4885, 7489, 12481, 18517, 24536, 29635) },
+          { AOM_CDF8(886, 1731, 3271, 8469, 15569, 22126, 28383) } },
+        { { AOM_CDF8(24313, 26062, 28385, 30107, 31217, 31898, 32345) },
+          { AOM_CDF8(9165, 13282, 21150, 30286, 31894, 32571, 32712) } } }
+    };
+
+static const aom_cdf_prob
+    av1_default_eob_multi256_cdfs[TOKEN_CDF_Q_CTXS][PLANE_TYPES][2][CDF_SIZE(
+        9)] = {
+      { { { AOM_CDF9(310, 584, 1887, 3589, 6168, 8611, 11352, 15652) },
+          { AOM_CDF9(998, 1850, 2998, 5604, 17341, 19888, 22899, 25583) } },
+        { { AOM_CDF9(2520, 3240, 5952, 8870, 12577, 17558, 19954, 24168) },
+          { AOM_CDF9(2203, 4130, 7435, 10739, 20652, 23681, 25609, 27261) } } },
+      { { { AOM_CDF9(1448, 2109, 4151, 6263, 9329, 13260, 17944, 23300) },
+          { AOM_CDF9(399, 1019, 1749, 3038, 10444, 15546, 22739, 27294) } },
+        { { AOM_CDF9(6402, 8148, 12623, 15072, 18728, 22847, 26447, 29377) },
+          { AOM_CDF9(1674, 3252, 5734, 10159, 22397, 23802, 24821, 30940) } } },
+      { { { AOM_CDF9(3089, 3920, 6038, 9460, 14266, 19881, 25766, 29176) },
+          { AOM_CDF9(1084, 2358, 3488, 5122, 11483, 18103, 26023, 29799) } },
+        { { AOM_CDF9(11514, 13794, 17480, 20754, 24361, 27378, 29492, 31277) },
+          { AOM_CDF9(6571, 9610, 15516, 21826, 29092, 30829, 31842,
+                     32708) } } },
+      { { { AOM_CDF9(5348, 7113, 11820, 15924, 22106, 26777, 30334, 31757) },
+          { AOM_CDF9(2453, 4474, 6307, 8777, 16474, 22975, 29000, 31547) } },
+        { { AOM_CDF9(23110, 24597, 27140, 28894, 30167, 30927, 31392, 32094) },
+          { AOM_CDF9(9998, 17661, 25178, 28097, 31308, 32038, 32403,
+                     32695) } } }
+    };
+
+static const aom_cdf_prob
+    av1_default_eob_multi512_cdfs[TOKEN_CDF_Q_CTXS][PLANE_TYPES][2][CDF_SIZE(
+        10)] = { { { { AOM_CDF10(641, 983, 3707, 5430, 10234, 14958, 18788,
+                                 23412, 26061) },
+                     { AOM_CDF10(3277, 6554, 9830, 13107, 16384, 19661, 22938,
+                                 26214, 29491) } },
+                   { { AOM_CDF10(5095, 6446, 9996, 13354, 16017, 17986, 20919,
+                                 26129, 29140) },
+                     { AOM_CDF10(3277, 6554, 9830, 13107, 16384, 19661, 22938,
+                                 26214, 29491) } } },
+                 { { { AOM_CDF10(1230, 2278, 5035, 7776, 11871, 15346, 19590,
+                                 24584, 28749) },
+                     { AOM_CDF10(3277, 6554, 9830, 13107, 16384, 19661, 22938,
+                                 26214, 29491) } },
+                   { { AOM_CDF10(7265, 9979, 15819, 19250, 21780, 23846, 26478,
+                                 28396, 31811) },
+                     { AOM_CDF10(3277, 6554, 9830, 13107, 16384, 19661, 22938,
+                                 26214, 29491) } } },
+                 { { { AOM_CDF10(2624, 3936, 6480, 9686, 13979, 17726, 23267,
+                                 28410, 31078) },
+                     { AOM_CDF10(3277, 6554, 9830, 13107, 16384, 19661, 22938,
+                                 26214, 29491) } },
+                   { { AOM_CDF10(12015, 14769, 19588, 22052, 24222, 25812,
+                                 27300, 29219, 32114) },
+                     { AOM_CDF10(3277, 6554, 9830, 13107, 16384, 19661, 22938,
+                                 26214, 29491) } } },
+                 { { { AOM_CDF10(5927, 7809, 10923, 14597, 19439, 24135, 28456,
+                                 31142, 32060) },
+                     { AOM_CDF10(3277, 6554, 9830, 13107, 16384, 19661, 22938,
+                                 26214, 29491) } },
+                   { { AOM_CDF10(21093, 23043, 25742, 27658, 29097, 29716,
+                                 30073, 30820, 31956) },
+                     { AOM_CDF10(3277, 6554, 9830, 13107, 16384, 19661, 22938,
+                                 26214, 29491) } } } };
+
+static const aom_cdf_prob
+    av1_default_eob_multi1024_cdfs[TOKEN_CDF_Q_CTXS][PLANE_TYPES][2][CDF_SIZE(
+        11)] = { { { { AOM_CDF11(393, 421, 751, 1623, 3160, 6352, 13345, 18047,
+                                 22571, 25830) },
+                     { AOM_CDF11(2979, 5958, 8937, 11916, 14895, 17873, 20852,
+                                 23831, 26810, 29789) } },
+                   { { AOM_CDF11(1865, 1988, 2930, 4242, 10533, 16538, 21354,
+                                 27255, 28546, 31784) },
+                     { AOM_CDF11(2979, 5958, 8937, 11916, 14895, 17873, 20852,
+                                 23831, 26810, 29789) } } },
+                 { { { AOM_CDF11(696, 948, 3145, 5702, 9706, 13217, 17851,
+                                 21856, 25692, 28034) },
+                     { AOM_CDF11(2979, 5958, 8937, 11916, 14895, 17873, 20852,
+                                 23831, 26810, 29789) } },
+                   { { AOM_CDF11(2672, 3591, 9330, 17084, 22725, 24284, 26527,
+                                 28027, 28377, 30876) },
+                     { AOM_CDF11(2979, 5958, 8937, 11916, 14895, 17873, 20852,
+                                 23831, 26810, 29789) } } },
+                 { { { AOM_CDF11(2784, 3831, 7041, 10521, 14847, 18844, 23155,
+                                 26682, 29229, 31045) },
+                     { AOM_CDF11(2979, 5958, 8937, 11916, 14895, 17873, 20852,
+                                 23831, 26810, 29789) } },
+                   { { AOM_CDF11(9577, 12466, 17739, 20750, 22061, 23215, 24601,
+                                 25483, 25843, 32056) },
+                     { AOM_CDF11(2979, 5958, 8937, 11916, 14895, 17873, 20852,
+                                 23831, 26810, 29789) } } },
+                 { { { AOM_CDF11(6698, 8334, 11961, 15762, 20186, 23862, 27434,
+                                 29326, 31082, 32050) },
+                     { AOM_CDF11(2979, 5958, 8937, 11916, 14895, 17873, 20852,
+                                 23831, 26810, 29789) } },
+                   { { AOM_CDF11(20569, 22426, 25569, 26859, 28053, 28913,
+                                 29486, 29724, 29807, 32570) },
+                     { AOM_CDF11(2979, 5958, 8937, 11916, 14895, 17873, 20852,
+                                 23831, 26810, 29789) } } } };
+
+static const aom_cdf_prob av1_default_coeff_lps_multi_cdfs
+    [TOKEN_CDF_Q_CTXS][TX_SIZES][PLANE_TYPES][LEVEL_CONTEXTS]
+    [CDF_SIZE(BR_CDF_SIZE)] = {
+      { { { { AOM_CDF4(14298, 20718, 24174) },
+            { AOM_CDF4(12536, 19601, 23789) },
+            { AOM_CDF4(8712, 15051, 19503) },
+            { AOM_CDF4(6170, 11327, 15434) },
+            { AOM_CDF4(4742, 8926, 12538) },
+            { AOM_CDF4(3803, 7317, 10546) },
+            { AOM_CDF4(1696, 3317, 4871) },
+            { AOM_CDF4(14392, 19951, 22756) },
+            { AOM_CDF4(15978, 23218, 26818) },
+            { AOM_CDF4(12187, 19474, 23889) },
+            { AOM_CDF4(9176, 15640, 20259) },
+            { AOM_CDF4(7068, 12655, 17028) },
+            { AOM_CDF4(5656, 10442, 14472) },
+            { AOM_CDF4(2580, 4992, 7244) },
+            { AOM_CDF4(12136, 18049, 21426) },
+            { AOM_CDF4(13784, 20721, 24481) },
+            { AOM_CDF4(10836, 17621, 21900) },
+            { AOM_CDF4(8372, 14444, 18847) },
+            { AOM_CDF4(6523, 11779, 16000) },
+            { AOM_CDF4(5337, 9898, 13760) },
+            { AOM_CDF4(3034, 5860, 8462) } },
+          { { AOM_CDF4(15967, 22905, 26286) },
+            { AOM_CDF4(13534, 20654, 24579) },
+            { AOM_CDF4(9504, 16092, 20535) },
+            { AOM_CDF4(6975, 12568, 16903) },
+            { AOM_CDF4(5364, 10091, 14020) },
+            { AOM_CDF4(4357, 8370, 11857) },
+            { AOM_CDF4(2506, 4934, 7218) },
+            { AOM_CDF4(23032, 28815, 30936) },
+            { AOM_CDF4(19540, 26704, 29719) },
+            { AOM_CDF4(15158, 22969, 27097) },
+            { AOM_CDF4(11408, 18865, 23650) },
+            { AOM_CDF4(8885, 15448, 20250) },
+            { AOM_CDF4(7108, 12853, 17416) },
+            { AOM_CDF4(4231, 8041, 11480) },
+            { AOM_CDF4(19823, 26490, 29156) },
+            { AOM_CDF4(18890, 25929, 28932) },
+            { AOM_CDF4(15660, 23491, 27433) },
+            { AOM_CDF4(12147, 19776, 24488) },
+            { AOM_CDF4(9728, 16774, 21649) },
+            { AOM_CDF4(7919, 14277, 19066) },
+            { AOM_CDF4(5440, 10170, 14185) } } },
+        { { { AOM_CDF4(14406, 20862, 24414) },
+            { AOM_CDF4(11824, 18907, 23109) },
+            { AOM_CDF4(8257, 14393, 18803) },
+            { AOM_CDF4(5860, 10747, 14778) },
+            { AOM_CDF4(4475, 8486, 11984) },
+            { AOM_CDF4(3606, 6954, 10043) },
+            { AOM_CDF4(1736, 3410, 5048) },
+            { AOM_CDF4(14430, 20046, 22882) },
+            { AOM_CDF4(15593, 22899, 26709) },
+            { AOM_CDF4(12102, 19368, 23811) },
+            { AOM_CDF4(9059, 15584, 20262) },
+            { AOM_CDF4(6999, 12603, 17048) },
+            { AOM_CDF4(5684, 10497, 14553) },
+            { AOM_CDF4(2822, 5438, 7862) },
+            { AOM_CDF4(15785, 21585, 24359) },
+            { AOM_CDF4(18347, 25229, 28266) },
+            { AOM_CDF4(14974, 22487, 26389) },
+            { AOM_CDF4(11423, 18681, 23271) },
+            { AOM_CDF4(8863, 15350, 20008) },
+            { AOM_CDF4(7153, 12852, 17278) },
+            { AOM_CDF4(3707, 7036, 9982) } },
+          { { AOM_CDF4(15460, 21696, 25469) },
+            { AOM_CDF4(12170, 19249, 23191) },
+            { AOM_CDF4(8723, 15027, 19332) },
+            { AOM_CDF4(6428, 11704, 15874) },
+            { AOM_CDF4(4922, 9292, 13052) },
+            { AOM_CDF4(4139, 7695, 11010) },
+            { AOM_CDF4(2291, 4508, 6598) },
+            { AOM_CDF4(19856, 26920, 29828) },
+            { AOM_CDF4(17923, 25289, 28792) },
+            { AOM_CDF4(14278, 21968, 26297) },
+            { AOM_CDF4(10910, 18136, 22950) },
+            { AOM_CDF4(8423, 14815, 19627) },
+            { AOM_CDF4(6771, 12283, 16774) },
+            { AOM_CDF4(4074, 7750, 11081) },
+            { AOM_CDF4(19852, 26074, 28672) },
+            { AOM_CDF4(19371, 26110, 28989) },
+            { AOM_CDF4(16265, 23873, 27663) },
+            { AOM_CDF4(12758, 20378, 24952) },
+            { AOM_CDF4(10095, 17098, 21961) },
+            { AOM_CDF4(8250, 14628, 19451) },
+            { AOM_CDF4(5205, 9745, 13622) } } },
+        { { { AOM_CDF4(10563, 16233, 19763) },
+            { AOM_CDF4(9794, 16022, 19804) },
+            { AOM_CDF4(6750, 11945, 15759) },
+            { AOM_CDF4(4963, 9186, 12752) },
+            { AOM_CDF4(3845, 7435, 10627) },
+            { AOM_CDF4(3051, 6085, 8834) },
+            { AOM_CDF4(1311, 2596, 3830) },
+            { AOM_CDF4(11246, 16404, 19689) },
+            { AOM_CDF4(12315, 18911, 22731) },
+            { AOM_CDF4(10557, 17095, 21289) },
+            { AOM_CDF4(8136, 14006, 18249) },
+            { AOM_CDF4(6348, 11474, 15565) },
+            { AOM_CDF4(5196, 9655, 13400) },
+            { AOM_CDF4(2349, 4526, 6587) },
+            { AOM_CDF4(13337, 18730, 21569) },
+            { AOM_CDF4(19306, 26071, 28882) },
+            { AOM_CDF4(15952, 23540, 27254) },
+            { AOM_CDF4(12409, 19934, 24430) },
+            { AOM_CDF4(9760, 16706, 21389) },
+            { AOM_CDF4(8004, 14220, 18818) },
+            { AOM_CDF4(4138, 7794, 10961) } },
+          { { AOM_CDF4(10870, 16684, 20949) },
+            { AOM_CDF4(9664, 15230, 18680) },
+            { AOM_CDF4(6886, 12109, 15408) },
+            { AOM_CDF4(4825, 8900, 12305) },
+            { AOM_CDF4(3630, 7162, 10314) },
+            { AOM_CDF4(3036, 6429, 9387) },
+            { AOM_CDF4(1671, 3296, 4940) },
+            { AOM_CDF4(13819, 19159, 23026) },
+            { AOM_CDF4(11984, 19108, 23120) },
+            { AOM_CDF4(10690, 17210, 21663) },
+            { AOM_CDF4(7984, 14154, 18333) },
+            { AOM_CDF4(6868, 12294, 16124) },
+            { AOM_CDF4(5274, 8994, 12868) },
+            { AOM_CDF4(2988, 5771, 8424) },
+            { AOM_CDF4(19736, 26647, 29141) },
+            { AOM_CDF4(18933, 26070, 28984) },
+            { AOM_CDF4(15779, 23048, 27200) },
+            { AOM_CDF4(12638, 20061, 24532) },
+            { AOM_CDF4(10692, 17545, 22220) },
+            { AOM_CDF4(9217, 15251, 20054) },
+            { AOM_CDF4(5078, 9284, 12594) } } },
+        { { { AOM_CDF4(2331, 3662, 5244) },
+            { AOM_CDF4(2891, 4771, 6145) },
+            { AOM_CDF4(4598, 7623, 9729) },
+            { AOM_CDF4(3520, 6845, 9199) },
+            { AOM_CDF4(3417, 6119, 9324) },
+            { AOM_CDF4(2601, 5412, 7385) },
+            { AOM_CDF4(600, 1173, 1744) },
+            { AOM_CDF4(7672, 13286, 17469) },
+            { AOM_CDF4(4232, 7792, 10793) },
+            { AOM_CDF4(2915, 5317, 7397) },
+            { AOM_CDF4(2318, 4356, 6152) },
+            { AOM_CDF4(2127, 4000, 5554) },
+            { AOM_CDF4(1850, 3478, 5275) },
+            { AOM_CDF4(977, 1933, 2843) },
+            { AOM_CDF4(18280, 24387, 27989) },
+            { AOM_CDF4(15852, 22671, 26185) },
+            { AOM_CDF4(13845, 20951, 24789) },
+            { AOM_CDF4(11055, 17966, 22129) },
+            { AOM_CDF4(9138, 15422, 19801) },
+            { AOM_CDF4(7454, 13145, 17456) },
+            { AOM_CDF4(3370, 6393, 9013) } },
+          { { AOM_CDF4(5842, 9229, 10838) },
+            { AOM_CDF4(2313, 3491, 4276) },
+            { AOM_CDF4(2998, 6104, 7496) },
+            { AOM_CDF4(2420, 7447, 9868) },
+            { AOM_CDF4(3034, 8495, 10923) },
+            { AOM_CDF4(4076, 8937, 10975) },
+            { AOM_CDF4(1086, 2370, 3299) },
+            { AOM_CDF4(9714, 17254, 20444) },
+            { AOM_CDF4(8543, 13698, 17123) },
+            { AOM_CDF4(4918, 9007, 11910) },
+            { AOM_CDF4(4129, 7532, 10553) },
+            { AOM_CDF4(2364, 5533, 8058) },
+            { AOM_CDF4(1834, 3546, 5563) },
+            { AOM_CDF4(1473, 2908, 4133) },
+            { AOM_CDF4(15405, 21193, 25619) },
+            { AOM_CDF4(15691, 21952, 26561) },
+            { AOM_CDF4(12962, 19194, 24165) },
+            { AOM_CDF4(10272, 17855, 22129) },
+            { AOM_CDF4(8588, 15270, 20718) },
+            { AOM_CDF4(8682, 14669, 19500) },
+            { AOM_CDF4(4870, 9636, 13205) } } },
+        { { { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) } },
+          { { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) } } } },
+      { { { { AOM_CDF4(14995, 21341, 24749) },
+            { AOM_CDF4(13158, 20289, 24601) },
+            { AOM_CDF4(8941, 15326, 19876) },
+            { AOM_CDF4(6297, 11541, 15807) },
+            { AOM_CDF4(4817, 9029, 12776) },
+            { AOM_CDF4(3731, 7273, 10627) },
+            { AOM_CDF4(1847, 3617, 5354) },
+            { AOM_CDF4(14472, 19659, 22343) },
+            { AOM_CDF4(16806, 24162, 27533) },
+            { AOM_CDF4(12900, 20404, 24713) },
+            { AOM_CDF4(9411, 16112, 20797) },
+            { AOM_CDF4(7056, 12697, 17148) },
+            { AOM_CDF4(5544, 10339, 14460) },
+            { AOM_CDF4(2954, 5704, 8319) },
+            { AOM_CDF4(12464, 18071, 21354) },
+            { AOM_CDF4(15482, 22528, 26034) },
+            { AOM_CDF4(12070, 19269, 23624) },
+            { AOM_CDF4(8953, 15406, 20106) },
+            { AOM_CDF4(7027, 12730, 17220) },
+            { AOM_CDF4(5887, 10913, 15140) },
+            { AOM_CDF4(3793, 7278, 10447) } },
+          { { AOM_CDF4(15571, 22232, 25749) },
+            { AOM_CDF4(14506, 21575, 25374) },
+            { AOM_CDF4(10189, 17089, 21569) },
+            { AOM_CDF4(7316, 13301, 17915) },
+            { AOM_CDF4(5783, 10912, 15190) },
+            { AOM_CDF4(4760, 9155, 13088) },
+            { AOM_CDF4(2993, 5966, 8774) },
+            { AOM_CDF4(23424, 28903, 30778) },
+            { AOM_CDF4(20775, 27666, 30290) },
+            { AOM_CDF4(16474, 24410, 28299) },
+            { AOM_CDF4(12471, 20180, 24987) },
+            { AOM_CDF4(9410, 16487, 21439) },
+            { AOM_CDF4(7536, 13614, 18529) },
+            { AOM_CDF4(5048, 9586, 13549) },
+            { AOM_CDF4(21090, 27290, 29756) },
+            { AOM_CDF4(20796, 27402, 30026) },
+            { AOM_CDF4(17819, 25485, 28969) },
+            { AOM_CDF4(13860, 21909, 26462) },
+            { AOM_CDF4(11002, 18494, 23529) },
+            { AOM_CDF4(8953, 15929, 20897) },
+            { AOM_CDF4(6448, 11918, 16454) } } },
+        { { { AOM_CDF4(15999, 22208, 25449) },
+            { AOM_CDF4(13050, 19988, 24122) },
+            { AOM_CDF4(8594, 14864, 19378) },
+            { AOM_CDF4(6033, 11079, 15238) },
+            { AOM_CDF4(4554, 8683, 12347) },
+            { AOM_CDF4(3672, 7139, 10337) },
+            { AOM_CDF4(1900, 3771, 5576) },
+            { AOM_CDF4(15788, 21340, 23949) },
+            { AOM_CDF4(16825, 24235, 27758) },
+            { AOM_CDF4(12873, 20402, 24810) },
+            { AOM_CDF4(9590, 16363, 21094) },
+            { AOM_CDF4(7352, 13209, 17733) },
+            { AOM_CDF4(5960, 10989, 15184) },
+            { AOM_CDF4(3232, 6234, 9007) },
+            { AOM_CDF4(15761, 20716, 23224) },
+            { AOM_CDF4(19318, 25989, 28759) },
+            { AOM_CDF4(15529, 23094, 26929) },
+            { AOM_CDF4(11662, 18989, 23641) },
+            { AOM_CDF4(8955, 15568, 20366) },
+            { AOM_CDF4(7281, 13106, 17708) },
+            { AOM_CDF4(4248, 8059, 11440) } },
+          { { AOM_CDF4(14899, 21217, 24503) },
+            { AOM_CDF4(13519, 20283, 24047) },
+            { AOM_CDF4(9429, 15966, 20365) },
+            { AOM_CDF4(6700, 12355, 16652) },
+            { AOM_CDF4(5088, 9704, 13716) },
+            { AOM_CDF4(4243, 8154, 11731) },
+            { AOM_CDF4(2702, 5364, 7861) },
+            { AOM_CDF4(22745, 28388, 30454) },
+            { AOM_CDF4(20235, 27146, 29922) },
+            { AOM_CDF4(15896, 23715, 27637) },
+            { AOM_CDF4(11840, 19350, 24131) },
+            { AOM_CDF4(9122, 15932, 20880) },
+            { AOM_CDF4(7488, 13581, 18362) },
+            { AOM_CDF4(5114, 9568, 13370) },
+            { AOM_CDF4(20845, 26553, 28932) },
+            { AOM_CDF4(20981, 27372, 29884) },
+            { AOM_CDF4(17781, 25335, 28785) },
+            { AOM_CDF4(13760, 21708, 26297) },
+            { AOM_CDF4(10975, 18415, 23365) },
+            { AOM_CDF4(9045, 15789, 20686) },
+            { AOM_CDF4(6130, 11199, 15423) } } },
+        { { { AOM_CDF4(13549, 19724, 23158) },
+            { AOM_CDF4(11844, 18382, 22246) },
+            { AOM_CDF4(7919, 13619, 17773) },
+            { AOM_CDF4(5486, 10143, 13946) },
+            { AOM_CDF4(4166, 7983, 11324) },
+            { AOM_CDF4(3364, 6506, 9427) },
+            { AOM_CDF4(1598, 3160, 4674) },
+            { AOM_CDF4(15281, 20979, 23781) },
+            { AOM_CDF4(14939, 22119, 25952) },
+            { AOM_CDF4(11363, 18407, 22812) },
+            { AOM_CDF4(8609, 14857, 19370) },
+            { AOM_CDF4(6737, 12184, 16480) },
+            { AOM_CDF4(5506, 10263, 14262) },
+            { AOM_CDF4(2990, 5786, 8380) },
+            { AOM_CDF4(20249, 25253, 27417) },
+            { AOM_CDF4(21070, 27518, 30001) },
+            { AOM_CDF4(16854, 24469, 28074) },
+            { AOM_CDF4(12864, 20486, 25000) },
+            { AOM_CDF4(9962, 16978, 21778) },
+            { AOM_CDF4(8074, 14338, 19048) },
+            { AOM_CDF4(4494, 8479, 11906) } },
+          { { AOM_CDF4(13960, 19617, 22829) },
+            { AOM_CDF4(11150, 17341, 21228) },
+            { AOM_CDF4(7150, 12964, 17190) },
+            { AOM_CDF4(5331, 10002, 13867) },
+            { AOM_CDF4(4167, 7744, 11057) },
+            { AOM_CDF4(3480, 6629, 9646) },
+            { AOM_CDF4(1883, 3784, 5686) },
+            { AOM_CDF4(18752, 25660, 28912) },
+            { AOM_CDF4(16968, 24586, 28030) },
+            { AOM_CDF4(13520, 21055, 25313) },
+            { AOM_CDF4(10453, 17626, 22280) },
+            { AOM_CDF4(8386, 14505, 19116) },
+            { AOM_CDF4(6742, 12595, 17008) },
+            { AOM_CDF4(4273, 8140, 11499) },
+            { AOM_CDF4(22120, 27827, 30233) },
+            { AOM_CDF4(20563, 27358, 29895) },
+            { AOM_CDF4(17076, 24644, 28153) },
+            { AOM_CDF4(13362, 20942, 25309) },
+            { AOM_CDF4(10794, 17965, 22695) },
+            { AOM_CDF4(9014, 15652, 20319) },
+            { AOM_CDF4(5708, 10512, 14497) } } },
+        { { { AOM_CDF4(5705, 10930, 15725) },
+            { AOM_CDF4(7946, 12765, 16115) },
+            { AOM_CDF4(6801, 12123, 16226) },
+            { AOM_CDF4(5462, 10135, 14200) },
+            { AOM_CDF4(4189, 8011, 11507) },
+            { AOM_CDF4(3191, 6229, 9408) },
+            { AOM_CDF4(1057, 2137, 3212) },
+            { AOM_CDF4(10018, 17067, 21491) },
+            { AOM_CDF4(7380, 12582, 16453) },
+            { AOM_CDF4(6068, 10845, 14339) },
+            { AOM_CDF4(5098, 9198, 12555) },
+            { AOM_CDF4(4312, 8010, 11119) },
+            { AOM_CDF4(3700, 6966, 9781) },
+            { AOM_CDF4(1693, 3326, 4887) },
+            { AOM_CDF4(18757, 24930, 27774) },
+            { AOM_CDF4(17648, 24596, 27817) },
+            { AOM_CDF4(14707, 22052, 26026) },
+            { AOM_CDF4(11720, 18852, 23292) },
+            { AOM_CDF4(9357, 15952, 20525) },
+            { AOM_CDF4(7810, 13753, 18210) },
+            { AOM_CDF4(3879, 7333, 10328) } },
+          { { AOM_CDF4(8278, 13242, 15922) },
+            { AOM_CDF4(10547, 15867, 18919) },
+            { AOM_CDF4(9106, 15842, 20609) },
+            { AOM_CDF4(6833, 13007, 17218) },
+            { AOM_CDF4(4811, 9712, 13923) },
+            { AOM_CDF4(3985, 7352, 11128) },
+            { AOM_CDF4(1688, 3458, 5262) },
+            { AOM_CDF4(12951, 21861, 26510) },
+            { AOM_CDF4(9788, 16044, 20276) },
+            { AOM_CDF4(6309, 11244, 14870) },
+            { AOM_CDF4(5183, 9349, 12566) },
+            { AOM_CDF4(4389, 8229, 11492) },
+            { AOM_CDF4(3633, 6945, 10620) },
+            { AOM_CDF4(3600, 6847, 9907) },
+            { AOM_CDF4(21748, 28137, 30255) },
+            { AOM_CDF4(19436, 26581, 29560) },
+            { AOM_CDF4(16359, 24201, 27953) },
+            { AOM_CDF4(13961, 21693, 25871) },
+            { AOM_CDF4(11544, 18686, 23322) },
+            { AOM_CDF4(9372, 16462, 20952) },
+            { AOM_CDF4(6138, 11210, 15390) } } },
+        { { { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) } },
+          { { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) } } } },
+      { { { { AOM_CDF4(16138, 22223, 25509) },
+            { AOM_CDF4(15347, 22430, 26332) },
+            { AOM_CDF4(9614, 16736, 21332) },
+            { AOM_CDF4(6600, 12275, 16907) },
+            { AOM_CDF4(4811, 9424, 13547) },
+            { AOM_CDF4(3748, 7809, 11420) },
+            { AOM_CDF4(2254, 4587, 6890) },
+            { AOM_CDF4(15196, 20284, 23177) },
+            { AOM_CDF4(18317, 25469, 28451) },
+            { AOM_CDF4(13918, 21651, 25842) },
+            { AOM_CDF4(10052, 17150, 21995) },
+            { AOM_CDF4(7499, 13630, 18587) },
+            { AOM_CDF4(6158, 11417, 16003) },
+            { AOM_CDF4(4014, 7785, 11252) },
+            { AOM_CDF4(15048, 21067, 24384) },
+            { AOM_CDF4(18202, 25346, 28553) },
+            { AOM_CDF4(14302, 22019, 26356) },
+            { AOM_CDF4(10839, 18139, 23166) },
+            { AOM_CDF4(8715, 15744, 20806) },
+            { AOM_CDF4(7536, 13576, 18544) },
+            { AOM_CDF4(5413, 10335, 14498) } },
+          { { AOM_CDF4(17394, 24501, 27895) },
+            { AOM_CDF4(15889, 23420, 27185) },
+            { AOM_CDF4(11561, 19133, 23870) },
+            { AOM_CDF4(8285, 14812, 19844) },
+            { AOM_CDF4(6496, 12043, 16550) },
+            { AOM_CDF4(4771, 9574, 13677) },
+            { AOM_CDF4(3603, 6830, 10144) },
+            { AOM_CDF4(21656, 27704, 30200) },
+            { AOM_CDF4(21324, 27915, 30511) },
+            { AOM_CDF4(17327, 25336, 28997) },
+            { AOM_CDF4(13417, 21381, 26033) },
+            { AOM_CDF4(10132, 17425, 22338) },
+            { AOM_CDF4(8580, 15016, 19633) },
+            { AOM_CDF4(5694, 11477, 16411) },
+            { AOM_CDF4(24116, 29780, 31450) },
+            { AOM_CDF4(23853, 29695, 31591) },
+            { AOM_CDF4(20085, 27614, 30428) },
+            { AOM_CDF4(15326, 24335, 28575) },
+            { AOM_CDF4(11814, 19472, 24810) },
+            { AOM_CDF4(10221, 18611, 24767) },
+            { AOM_CDF4(7689, 14558, 20321) } } },
+        { { { AOM_CDF4(16214, 22380, 25770) },
+            { AOM_CDF4(14213, 21304, 25295) },
+            { AOM_CDF4(9213, 15823, 20455) },
+            { AOM_CDF4(6395, 11758, 16139) },
+            { AOM_CDF4(4779, 9187, 13066) },
+            { AOM_CDF4(3821, 7501, 10953) },
+            { AOM_CDF4(2293, 4567, 6795) },
+            { AOM_CDF4(15859, 21283, 23820) },
+            { AOM_CDF4(18404, 25602, 28726) },
+            { AOM_CDF4(14325, 21980, 26206) },
+            { AOM_CDF4(10669, 17937, 22720) },
+            { AOM_CDF4(8297, 14642, 19447) },
+            { AOM_CDF4(6746, 12389, 16893) },
+            { AOM_CDF4(4324, 8251, 11770) },
+            { AOM_CDF4(16532, 21631, 24475) },
+            { AOM_CDF4(20667, 27150, 29668) },
+            { AOM_CDF4(16728, 24510, 28175) },
+            { AOM_CDF4(12861, 20645, 25332) },
+            { AOM_CDF4(10076, 17361, 22417) },
+            { AOM_CDF4(8395, 14940, 19963) },
+            { AOM_CDF4(5731, 10683, 14912) } },
+          { { AOM_CDF4(14433, 21155, 24938) },
+            { AOM_CDF4(14658, 21716, 25545) },
+            { AOM_CDF4(9923, 16824, 21557) },
+            { AOM_CDF4(6982, 13052, 17721) },
+            { AOM_CDF4(5419, 10503, 15050) },
+            { AOM_CDF4(4852, 9162, 13014) },
+            { AOM_CDF4(3271, 6395, 9630) },
+            { AOM_CDF4(22210, 27833, 30109) },
+            { AOM_CDF4(20750, 27368, 29821) },
+            { AOM_CDF4(16894, 24828, 28573) },
+            { AOM_CDF4(13247, 21276, 25757) },
+            { AOM_CDF4(10038, 17265, 22563) },
+            { AOM_CDF4(8587, 14947, 20327) },
+            { AOM_CDF4(5645, 11371, 15252) },
+            { AOM_CDF4(22027, 27526, 29714) },
+            { AOM_CDF4(23098, 29146, 31221) },
+            { AOM_CDF4(19886, 27341, 30272) },
+            { AOM_CDF4(15609, 23747, 28046) },
+            { AOM_CDF4(11993, 20065, 24939) },
+            { AOM_CDF4(9637, 18267, 23671) },
+            { AOM_CDF4(7625, 13801, 19144) } } },
+        { { { AOM_CDF4(14438, 20798, 24089) },
+            { AOM_CDF4(12621, 19203, 23097) },
+            { AOM_CDF4(8177, 14125, 18402) },
+            { AOM_CDF4(5674, 10501, 14456) },
+            { AOM_CDF4(4236, 8239, 11733) },
+            { AOM_CDF4(3447, 6750, 9806) },
+            { AOM_CDF4(1986, 3950, 5864) },
+            { AOM_CDF4(16208, 22099, 24930) },
+            { AOM_CDF4(16537, 24025, 27585) },
+            { AOM_CDF4(12780, 20381, 24867) },
+            { AOM_CDF4(9767, 16612, 21416) },
+            { AOM_CDF4(7686, 13738, 18398) },
+            { AOM_CDF4(6333, 11614, 15964) },
+            { AOM_CDF4(3941, 7571, 10836) },
+            { AOM_CDF4(22819, 27422, 29202) },
+            { AOM_CDF4(22224, 28514, 30721) },
+            { AOM_CDF4(17660, 25433, 28913) },
+            { AOM_CDF4(13574, 21482, 26002) },
+            { AOM_CDF4(10629, 17977, 22938) },
+            { AOM_CDF4(8612, 15298, 20265) },
+            { AOM_CDF4(5607, 10491, 14596) } },
+          { { AOM_CDF4(13569, 19800, 23206) },
+            { AOM_CDF4(13128, 19924, 23869) },
+            { AOM_CDF4(8329, 14841, 19403) },
+            { AOM_CDF4(6130, 10976, 15057) },
+            { AOM_CDF4(4682, 8839, 12518) },
+            { AOM_CDF4(3656, 7409, 10588) },
+            { AOM_CDF4(2577, 5099, 7412) },
+            { AOM_CDF4(22427, 28684, 30585) },
+            { AOM_CDF4(20913, 27750, 30139) },
+            { AOM_CDF4(15840, 24109, 27834) },
+            { AOM_CDF4(12308, 20029, 24569) },
+            { AOM_CDF4(10216, 16785, 21458) },
+            { AOM_CDF4(8309, 14203, 19113) },
+            { AOM_CDF4(6043, 11168, 15307) },
+            { AOM_CDF4(23166, 28901, 30998) },
+            { AOM_CDF4(21899, 28405, 30751) },
+            { AOM_CDF4(18413, 26091, 29443) },
+            { AOM_CDF4(15233, 23114, 27352) },
+            { AOM_CDF4(12683, 20472, 25288) },
+            { AOM_CDF4(10702, 18259, 23409) },
+            { AOM_CDF4(8125, 14464, 19226) } } },
+        { { { AOM_CDF4(9040, 14786, 18360) },
+            { AOM_CDF4(9979, 15718, 19415) },
+            { AOM_CDF4(7913, 13918, 18311) },
+            { AOM_CDF4(5859, 10889, 15184) },
+            { AOM_CDF4(4593, 8677, 12510) },
+            { AOM_CDF4(3820, 7396, 10791) },
+            { AOM_CDF4(1730, 3471, 5192) },
+            { AOM_CDF4(11803, 18365, 22709) },
+            { AOM_CDF4(11419, 18058, 22225) },
+            { AOM_CDF4(9418, 15774, 20243) },
+            { AOM_CDF4(7539, 13325, 17657) },
+            { AOM_CDF4(6233, 11317, 15384) },
+            { AOM_CDF4(5137, 9656, 13545) },
+            { AOM_CDF4(2977, 5774, 8349) },
+            { AOM_CDF4(21207, 27246, 29640) },
+            { AOM_CDF4(19547, 26578, 29497) },
+            { AOM_CDF4(16169, 23871, 27690) },
+            { AOM_CDF4(12820, 20458, 25018) },
+            { AOM_CDF4(10224, 17332, 22214) },
+            { AOM_CDF4(8526, 15048, 19884) },
+            { AOM_CDF4(5037, 9410, 13118) } },
+          { { AOM_CDF4(12339, 17329, 20140) },
+            { AOM_CDF4(13505, 19895, 23225) },
+            { AOM_CDF4(9847, 16944, 21564) },
+            { AOM_CDF4(7280, 13256, 18348) },
+            { AOM_CDF4(4712, 10009, 14454) },
+            { AOM_CDF4(4361, 7914, 12477) },
+            { AOM_CDF4(2870, 5628, 7995) },
+            { AOM_CDF4(20061, 25504, 28526) },
+            { AOM_CDF4(15235, 22878, 26145) },
+            { AOM_CDF4(12985, 19958, 24155) },
+            { AOM_CDF4(9782, 16641, 21403) },
+            { AOM_CDF4(9456, 16360, 20760) },
+            { AOM_CDF4(6855, 12940, 18557) },
+            { AOM_CDF4(5661, 10564, 15002) },
+            { AOM_CDF4(25656, 30602, 31894) },
+            { AOM_CDF4(22570, 29107, 31092) },
+            { AOM_CDF4(18917, 26423, 29541) },
+            { AOM_CDF4(15940, 23649, 27754) },
+            { AOM_CDF4(12803, 20581, 25219) },
+            { AOM_CDF4(11082, 18695, 23376) },
+            { AOM_CDF4(7939, 14373, 19005) } } },
+        { { { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) } },
+          { { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) } } } },
+      { { { { AOM_CDF4(18315, 24289, 27551) },
+            { AOM_CDF4(16854, 24068, 27835) },
+            { AOM_CDF4(10140, 17927, 23173) },
+            { AOM_CDF4(6722, 12982, 18267) },
+            { AOM_CDF4(4661, 9826, 14706) },
+            { AOM_CDF4(3832, 8165, 12294) },
+            { AOM_CDF4(2795, 6098, 9245) },
+            { AOM_CDF4(17145, 23326, 26672) },
+            { AOM_CDF4(20733, 27680, 30308) },
+            { AOM_CDF4(16032, 24461, 28546) },
+            { AOM_CDF4(11653, 20093, 25081) },
+            { AOM_CDF4(9290, 16429, 22086) },
+            { AOM_CDF4(7796, 14598, 19982) },
+            { AOM_CDF4(6502, 12378, 17441) },
+            { AOM_CDF4(21681, 27732, 30320) },
+            { AOM_CDF4(22389, 29044, 31261) },
+            { AOM_CDF4(19027, 26731, 30087) },
+            { AOM_CDF4(14739, 23755, 28624) },
+            { AOM_CDF4(11358, 20778, 25511) },
+            { AOM_CDF4(10995, 18073, 24190) },
+            { AOM_CDF4(9162, 14990, 20617) } },
+          { { AOM_CDF4(21425, 27952, 30388) },
+            { AOM_CDF4(18062, 25838, 29034) },
+            { AOM_CDF4(11956, 19881, 24808) },
+            { AOM_CDF4(7718, 15000, 20980) },
+            { AOM_CDF4(5702, 11254, 16143) },
+            { AOM_CDF4(4898, 9088, 16864) },
+            { AOM_CDF4(3679, 6776, 11907) },
+            { AOM_CDF4(23294, 30160, 31663) },
+            { AOM_CDF4(24397, 29896, 31836) },
+            { AOM_CDF4(19245, 27128, 30593) },
+            { AOM_CDF4(13202, 19825, 26404) },
+            { AOM_CDF4(11578, 19297, 23957) },
+            { AOM_CDF4(8073, 13297, 21370) },
+            { AOM_CDF4(5461, 10923, 19745) },
+            { AOM_CDF4(27367, 30521, 31934) },
+            { AOM_CDF4(24904, 30671, 31940) },
+            { AOM_CDF4(23075, 28460, 31299) },
+            { AOM_CDF4(14400, 23658, 30417) },
+            { AOM_CDF4(13885, 23882, 28325) },
+            { AOM_CDF4(14746, 22938, 27853) },
+            { AOM_CDF4(5461, 16384, 27307) } } },
+        { { { AOM_CDF4(18274, 24813, 27890) },
+            { AOM_CDF4(15537, 23149, 27003) },
+            { AOM_CDF4(9449, 16740, 21827) },
+            { AOM_CDF4(6700, 12498, 17261) },
+            { AOM_CDF4(4988, 9866, 14198) },
+            { AOM_CDF4(4236, 8147, 11902) },
+            { AOM_CDF4(2867, 5860, 8654) },
+            { AOM_CDF4(17124, 23171, 26101) },
+            { AOM_CDF4(20396, 27477, 30148) },
+            { AOM_CDF4(16573, 24629, 28492) },
+            { AOM_CDF4(12749, 20846, 25674) },
+            { AOM_CDF4(10233, 17878, 22818) },
+            { AOM_CDF4(8525, 15332, 20363) },
+            { AOM_CDF4(6283, 11632, 16255) },
+            { AOM_CDF4(20466, 26511, 29286) },
+            { AOM_CDF4(23059, 29174, 31191) },
+            { AOM_CDF4(19481, 27263, 30241) },
+            { AOM_CDF4(15458, 23631, 28137) },
+            { AOM_CDF4(12416, 20608, 25693) },
+            { AOM_CDF4(10261, 18011, 23261) },
+            { AOM_CDF4(8016, 14655, 19666) } },
+          { { AOM_CDF4(17616, 24586, 28112) },
+            { AOM_CDF4(15809, 23299, 27155) },
+            { AOM_CDF4(10767, 18890, 23793) },
+            { AOM_CDF4(7727, 14255, 18865) },
+            { AOM_CDF4(6129, 11926, 16882) },
+            { AOM_CDF4(4482, 9704, 14861) },
+            { AOM_CDF4(3277, 7452, 11522) },
+            { AOM_CDF4(22956, 28551, 30730) },
+            { AOM_CDF4(22724, 28937, 30961) },
+            { AOM_CDF4(18467, 26324, 29580) },
+            { AOM_CDF4(13234, 20713, 25649) },
+            { AOM_CDF4(11181, 17592, 22481) },
+            { AOM_CDF4(8291, 18358, 24576) },
+            { AOM_CDF4(7568, 11881, 14984) },
+            { AOM_CDF4(24948, 29001, 31147) },
+            { AOM_CDF4(25674, 30619, 32151) },
+            { AOM_CDF4(20841, 26793, 29603) },
+            { AOM_CDF4(14669, 24356, 28666) },
+            { AOM_CDF4(11334, 23593, 28219) },
+            { AOM_CDF4(8922, 14762, 22873) },
+            { AOM_CDF4(8301, 13544, 20535) } } },
+        { { { AOM_CDF4(17113, 23733, 27081) },
+            { AOM_CDF4(14139, 21406, 25452) },
+            { AOM_CDF4(8552, 15002, 19776) },
+            { AOM_CDF4(5871, 11120, 15378) },
+            { AOM_CDF4(4455, 8616, 12253) },
+            { AOM_CDF4(3469, 6910, 10386) },
+            { AOM_CDF4(2255, 4553, 6782) },
+            { AOM_CDF4(18224, 24376, 27053) },
+            { AOM_CDF4(19290, 26710, 29614) },
+            { AOM_CDF4(14936, 22991, 27184) },
+            { AOM_CDF4(11238, 18951, 23762) },
+            { AOM_CDF4(8786, 15617, 20588) },
+            { AOM_CDF4(7317, 13228, 18003) },
+            { AOM_CDF4(5101, 9512, 13493) },
+            { AOM_CDF4(22639, 28222, 30210) },
+            { AOM_CDF4(23216, 29331, 31307) },
+            { AOM_CDF4(19075, 26762, 29895) },
+            { AOM_CDF4(15014, 23113, 27457) },
+            { AOM_CDF4(11938, 19857, 24752) },
+            { AOM_CDF4(9942, 17280, 22282) },
+            { AOM_CDF4(7167, 13144, 17752) } },
+          { { AOM_CDF4(15820, 22738, 26488) },
+            { AOM_CDF4(13530, 20885, 25216) },
+            { AOM_CDF4(8395, 15530, 20452) },
+            { AOM_CDF4(6574, 12321, 16380) },
+            { AOM_CDF4(5353, 10419, 14568) },
+            { AOM_CDF4(4613, 8446, 12381) },
+            { AOM_CDF4(3440, 7158, 9903) },
+            { AOM_CDF4(24247, 29051, 31224) },
+            { AOM_CDF4(22118, 28058, 30369) },
+            { AOM_CDF4(16498, 24768, 28389) },
+            { AOM_CDF4(12920, 21175, 26137) },
+            { AOM_CDF4(10730, 18619, 25352) },
+            { AOM_CDF4(10187, 16279, 22791) },
+            { AOM_CDF4(9310, 14631, 22127) },
+            { AOM_CDF4(24970, 30558, 32057) },
+            { AOM_CDF4(24801, 29942, 31698) },
+            { AOM_CDF4(22432, 28453, 30855) },
+            { AOM_CDF4(19054, 25680, 29580) },
+            { AOM_CDF4(14392, 23036, 28109) },
+            { AOM_CDF4(12495, 20947, 26650) },
+            { AOM_CDF4(12442, 20326, 26214) } } },
+        { { { AOM_CDF4(12162, 18785, 22648) },
+            { AOM_CDF4(12749, 19697, 23806) },
+            { AOM_CDF4(8580, 15297, 20346) },
+            { AOM_CDF4(6169, 11749, 16543) },
+            { AOM_CDF4(4836, 9391, 13448) },
+            { AOM_CDF4(3821, 7711, 11613) },
+            { AOM_CDF4(2228, 4601, 7070) },
+            { AOM_CDF4(16319, 24725, 28280) },
+            { AOM_CDF4(15698, 23277, 27168) },
+            { AOM_CDF4(12726, 20368, 25047) },
+            { AOM_CDF4(9912, 17015, 21976) },
+            { AOM_CDF4(7888, 14220, 19179) },
+            { AOM_CDF4(6777, 12284, 17018) },
+            { AOM_CDF4(4492, 8590, 12252) },
+            { AOM_CDF4(23249, 28904, 30947) },
+            { AOM_CDF4(21050, 27908, 30512) },
+            { AOM_CDF4(17440, 25340, 28949) },
+            { AOM_CDF4(14059, 22018, 26541) },
+            { AOM_CDF4(11288, 18903, 23898) },
+            { AOM_CDF4(9411, 16342, 21428) },
+            { AOM_CDF4(6278, 11588, 15944) } },
+          { { AOM_CDF4(13981, 20067, 23226) },
+            { AOM_CDF4(16922, 23580, 26783) },
+            { AOM_CDF4(11005, 19039, 24487) },
+            { AOM_CDF4(7389, 14218, 19798) },
+            { AOM_CDF4(5598, 11505, 17206) },
+            { AOM_CDF4(6090, 11213, 15659) },
+            { AOM_CDF4(3820, 7371, 10119) },
+            { AOM_CDF4(21082, 26925, 29675) },
+            { AOM_CDF4(21262, 28627, 31128) },
+            { AOM_CDF4(18392, 26454, 30437) },
+            { AOM_CDF4(14870, 22910, 27096) },
+            { AOM_CDF4(12620, 19484, 24908) },
+            { AOM_CDF4(9290, 16553, 22802) },
+            { AOM_CDF4(6668, 14288, 20004) },
+            { AOM_CDF4(27704, 31055, 31949) },
+            { AOM_CDF4(24709, 29978, 31788) },
+            { AOM_CDF4(21668, 29264, 31657) },
+            { AOM_CDF4(18295, 26968, 30074) },
+            { AOM_CDF4(16399, 24422, 29313) },
+            { AOM_CDF4(14347, 23026, 28104) },
+            { AOM_CDF4(12370, 19806, 24477) } } },
+        { { { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) } },
+          { { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) }, { AOM_CDF4(8192, 16384, 24576) },
+            { AOM_CDF4(8192, 16384, 24576) } } } }
+    };
+
+static const aom_cdf_prob av1_default_coeff_base_multi_cdfs
+    [TOKEN_CDF_Q_CTXS][TX_SIZES][PLANE_TYPES][SIG_COEF_CONTEXTS]
+    [CDF_SIZE(NUM_BASE_LEVELS + 2)] =
+        { { { { { AOM_CDF4(4034, 8930, 12727) },
+                { AOM_CDF4(18082, 29741, 31877) },
+                { AOM_CDF4(12596, 26124, 30493) },
+                { AOM_CDF4(9446, 21118, 27005) },
+                { AOM_CDF4(6308, 15141, 21279) },
+                { AOM_CDF4(2463, 6357, 9783) },
+                { AOM_CDF4(20667, 30546, 31929) },
+                { AOM_CDF4(13043, 26123, 30134) },
+                { AOM_CDF4(8151, 18757, 24778) },
+                { AOM_CDF4(5255, 12839, 18632) },
+                { AOM_CDF4(2820, 7206, 11161) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(15736, 27553, 30604) },
+                { AOM_CDF4(11210, 23794, 28787) },
+                { AOM_CDF4(5947, 13874, 19701) },
+                { AOM_CDF4(4215, 9323, 13891) },
+                { AOM_CDF4(2833, 6462, 10059) },
+                { AOM_CDF4(19605, 30393, 31582) },
+                { AOM_CDF4(13523, 26252, 30248) },
+                { AOM_CDF4(8446, 18622, 24512) },
+                { AOM_CDF4(3818, 10343, 15974) },
+                { AOM_CDF4(1481, 4117, 6796) },
+                { AOM_CDF4(22649, 31302, 32190) },
+                { AOM_CDF4(14829, 27127, 30449) },
+                { AOM_CDF4(8313, 17702, 23304) },
+                { AOM_CDF4(3022, 8301, 12786) },
+                { AOM_CDF4(1536, 4412, 7184) },
+                { AOM_CDF4(22354, 29774, 31372) },
+                { AOM_CDF4(14723, 25472, 29214) },
+                { AOM_CDF4(6673, 13745, 18662) },
+                { AOM_CDF4(2068, 5766, 9322) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(6302, 16444, 21761) },
+                { AOM_CDF4(23040, 31538, 32475) },
+                { AOM_CDF4(15196, 28452, 31496) },
+                { AOM_CDF4(10020, 22946, 28514) },
+                { AOM_CDF4(6533, 16862, 23501) },
+                { AOM_CDF4(3538, 9816, 15076) },
+                { AOM_CDF4(24444, 31875, 32525) },
+                { AOM_CDF4(15881, 28924, 31635) },
+                { AOM_CDF4(9922, 22873, 28466) },
+                { AOM_CDF4(6527, 16966, 23691) },
+                { AOM_CDF4(4114, 11303, 17220) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(20201, 30770, 32209) },
+                { AOM_CDF4(14754, 28071, 31258) },
+                { AOM_CDF4(8378, 20186, 26517) },
+                { AOM_CDF4(5916, 15299, 21978) },
+                { AOM_CDF4(4268, 11583, 17901) },
+                { AOM_CDF4(24361, 32025, 32581) },
+                { AOM_CDF4(18673, 30105, 31943) },
+                { AOM_CDF4(10196, 22244, 27576) },
+                { AOM_CDF4(5495, 14349, 20417) },
+                { AOM_CDF4(2676, 7415, 11498) },
+                { AOM_CDF4(24678, 31958, 32585) },
+                { AOM_CDF4(18629, 29906, 31831) },
+                { AOM_CDF4(9364, 20724, 26315) },
+                { AOM_CDF4(4641, 12318, 18094) },
+                { AOM_CDF4(2758, 7387, 11579) },
+                { AOM_CDF4(25433, 31842, 32469) },
+                { AOM_CDF4(18795, 29289, 31411) },
+                { AOM_CDF4(7644, 17584, 23592) },
+                { AOM_CDF4(3408, 9014, 15047) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(4536, 10072, 14001) },
+                { AOM_CDF4(25459, 31416, 32206) },
+                { AOM_CDF4(16605, 28048, 30818) },
+                { AOM_CDF4(11008, 22857, 27719) },
+                { AOM_CDF4(6915, 16268, 22315) },
+                { AOM_CDF4(2625, 6812, 10537) },
+                { AOM_CDF4(24257, 31788, 32499) },
+                { AOM_CDF4(16880, 29454, 31879) },
+                { AOM_CDF4(11958, 25054, 29778) },
+                { AOM_CDF4(7916, 18718, 25084) },
+                { AOM_CDF4(3383, 8777, 13446) },
+                { AOM_CDF4(22720, 31603, 32393) },
+                { AOM_CDF4(14960, 28125, 31335) },
+                { AOM_CDF4(9731, 22210, 27928) },
+                { AOM_CDF4(6304, 15832, 22277) },
+                { AOM_CDF4(2910, 7818, 12166) },
+                { AOM_CDF4(20375, 30627, 32131) },
+                { AOM_CDF4(13904, 27284, 30887) },
+                { AOM_CDF4(9368, 21558, 27144) },
+                { AOM_CDF4(5937, 14966, 21119) },
+                { AOM_CDF4(2667, 7225, 11319) },
+                { AOM_CDF4(23970, 31470, 32378) },
+                { AOM_CDF4(17173, 29734, 32018) },
+                { AOM_CDF4(12795, 25441, 29965) },
+                { AOM_CDF4(8981, 19680, 25893) },
+                { AOM_CDF4(4728, 11372, 16902) },
+                { AOM_CDF4(24287, 31797, 32439) },
+                { AOM_CDF4(16703, 29145, 31696) },
+                { AOM_CDF4(10833, 23554, 28725) },
+                { AOM_CDF4(6468, 16566, 23057) },
+                { AOM_CDF4(2415, 6562, 10278) },
+                { AOM_CDF4(26610, 32395, 32659) },
+                { AOM_CDF4(18590, 30498, 32117) },
+                { AOM_CDF4(12420, 25756, 29950) },
+                { AOM_CDF4(7639, 18746, 24710) },
+                { AOM_CDF4(3001, 8086, 12347) },
+                { AOM_CDF4(25076, 32064, 32580) },
+                { AOM_CDF4(17946, 30128, 32028) },
+                { AOM_CDF4(12024, 24985, 29378) },
+                { AOM_CDF4(7517, 18390, 24304) },
+                { AOM_CDF4(3243, 8781, 13331) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(6037, 16771, 21957) },
+                { AOM_CDF4(24774, 31704, 32426) },
+                { AOM_CDF4(16830, 28589, 31056) },
+                { AOM_CDF4(10602, 22828, 27760) },
+                { AOM_CDF4(6733, 16829, 23071) },
+                { AOM_CDF4(3250, 8914, 13556) },
+                { AOM_CDF4(25582, 32220, 32668) },
+                { AOM_CDF4(18659, 30342, 32223) },
+                { AOM_CDF4(12546, 26149, 30515) },
+                { AOM_CDF4(8420, 20451, 26801) },
+                { AOM_CDF4(4636, 12420, 18344) },
+                { AOM_CDF4(27581, 32362, 32639) },
+                { AOM_CDF4(18987, 30083, 31978) },
+                { AOM_CDF4(11327, 24248, 29084) },
+                { AOM_CDF4(7264, 17719, 24120) },
+                { AOM_CDF4(3995, 10768, 16169) },
+                { AOM_CDF4(25893, 31831, 32487) },
+                { AOM_CDF4(16577, 28587, 31379) },
+                { AOM_CDF4(10189, 22748, 28182) },
+                { AOM_CDF4(6832, 17094, 23556) },
+                { AOM_CDF4(3708, 10110, 15334) },
+                { AOM_CDF4(25904, 32282, 32656) },
+                { AOM_CDF4(19721, 30792, 32276) },
+                { AOM_CDF4(12819, 26243, 30411) },
+                { AOM_CDF4(8572, 20614, 26891) },
+                { AOM_CDF4(5364, 14059, 20467) },
+                { AOM_CDF4(26580, 32438, 32677) },
+                { AOM_CDF4(20852, 31225, 32340) },
+                { AOM_CDF4(12435, 25700, 29967) },
+                { AOM_CDF4(8691, 20825, 26976) },
+                { AOM_CDF4(4446, 12209, 17269) },
+                { AOM_CDF4(27350, 32429, 32696) },
+                { AOM_CDF4(21372, 30977, 32272) },
+                { AOM_CDF4(12673, 25270, 29853) },
+                { AOM_CDF4(9208, 20925, 26640) },
+                { AOM_CDF4(5018, 13351, 18732) },
+                { AOM_CDF4(27351, 32479, 32713) },
+                { AOM_CDF4(21398, 31209, 32387) },
+                { AOM_CDF4(12162, 25047, 29842) },
+                { AOM_CDF4(7896, 18691, 25319) },
+                { AOM_CDF4(4670, 12882, 18881) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(5487, 10460, 13708) },
+                { AOM_CDF4(21597, 28303, 30674) },
+                { AOM_CDF4(11037, 21953, 26476) },
+                { AOM_CDF4(8147, 17962, 22952) },
+                { AOM_CDF4(5242, 13061, 18532) },
+                { AOM_CDF4(1889, 5208, 8182) },
+                { AOM_CDF4(26774, 32133, 32590) },
+                { AOM_CDF4(17844, 29564, 31767) },
+                { AOM_CDF4(11690, 24438, 29171) },
+                { AOM_CDF4(7542, 18215, 24459) },
+                { AOM_CDF4(2993, 8050, 12319) },
+                { AOM_CDF4(28023, 32328, 32591) },
+                { AOM_CDF4(18651, 30126, 31954) },
+                { AOM_CDF4(12164, 25146, 29589) },
+                { AOM_CDF4(7762, 18530, 24771) },
+                { AOM_CDF4(3492, 9183, 13920) },
+                { AOM_CDF4(27591, 32008, 32491) },
+                { AOM_CDF4(17149, 28853, 31510) },
+                { AOM_CDF4(11485, 24003, 28860) },
+                { AOM_CDF4(7697, 18086, 24210) },
+                { AOM_CDF4(3075, 7999, 12218) },
+                { AOM_CDF4(28268, 32482, 32654) },
+                { AOM_CDF4(19631, 31051, 32404) },
+                { AOM_CDF4(13860, 27260, 31020) },
+                { AOM_CDF4(9605, 21613, 27594) },
+                { AOM_CDF4(4876, 12162, 17908) },
+                { AOM_CDF4(27248, 32316, 32576) },
+                { AOM_CDF4(18955, 30457, 32075) },
+                { AOM_CDF4(11824, 23997, 28795) },
+                { AOM_CDF4(7346, 18196, 24647) },
+                { AOM_CDF4(3403, 9247, 14111) },
+                { AOM_CDF4(29711, 32655, 32735) },
+                { AOM_CDF4(21169, 31394, 32417) },
+                { AOM_CDF4(13487, 27198, 30957) },
+                { AOM_CDF4(8828, 21683, 27614) },
+                { AOM_CDF4(4270, 11451, 17038) },
+                { AOM_CDF4(28708, 32578, 32731) },
+                { AOM_CDF4(20120, 31241, 32482) },
+                { AOM_CDF4(13692, 27550, 31321) },
+                { AOM_CDF4(9418, 22514, 28439) },
+                { AOM_CDF4(4999, 13283, 19462) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(5673, 14302, 19711) },
+                { AOM_CDF4(26251, 30701, 31834) },
+                { AOM_CDF4(12782, 23783, 27803) },
+                { AOM_CDF4(9127, 20657, 25808) },
+                { AOM_CDF4(6368, 16208, 21462) },
+                { AOM_CDF4(2465, 7177, 10822) },
+                { AOM_CDF4(29961, 32563, 32719) },
+                { AOM_CDF4(18318, 29891, 31949) },
+                { AOM_CDF4(11361, 24514, 29357) },
+                { AOM_CDF4(7900, 19603, 25607) },
+                { AOM_CDF4(4002, 10590, 15546) },
+                { AOM_CDF4(29637, 32310, 32595) },
+                { AOM_CDF4(18296, 29913, 31809) },
+                { AOM_CDF4(10144, 21515, 26871) },
+                { AOM_CDF4(5358, 14322, 20394) },
+                { AOM_CDF4(3067, 8362, 13346) },
+                { AOM_CDF4(28652, 32470, 32676) },
+                { AOM_CDF4(17538, 30771, 32209) },
+                { AOM_CDF4(13924, 26882, 30494) },
+                { AOM_CDF4(10496, 22837, 27869) },
+                { AOM_CDF4(7236, 16396, 21621) },
+                { AOM_CDF4(30743, 32687, 32746) },
+                { AOM_CDF4(23006, 31676, 32489) },
+                { AOM_CDF4(14494, 27828, 31120) },
+                { AOM_CDF4(10174, 22801, 28352) },
+                { AOM_CDF4(6242, 15281, 21043) },
+                { AOM_CDF4(25817, 32243, 32720) },
+                { AOM_CDF4(18618, 31367, 32325) },
+                { AOM_CDF4(13997, 28318, 31878) },
+                { AOM_CDF4(12255, 26534, 31383) },
+                { AOM_CDF4(9561, 21588, 28450) },
+                { AOM_CDF4(28188, 32635, 32724) },
+                { AOM_CDF4(22060, 32365, 32728) },
+                { AOM_CDF4(18102, 30690, 32528) },
+                { AOM_CDF4(14196, 28864, 31999) },
+                { AOM_CDF4(12262, 25792, 30865) },
+                { AOM_CDF4(24176, 32109, 32628) },
+                { AOM_CDF4(18280, 29681, 31963) },
+                { AOM_CDF4(10205, 23703, 29664) },
+                { AOM_CDF4(7889, 20025, 27676) },
+                { AOM_CDF4(6060, 16743, 23970) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(5141, 7096, 8260) },
+                { AOM_CDF4(27186, 29022, 29789) },
+                { AOM_CDF4(6668, 12568, 15682) },
+                { AOM_CDF4(2172, 6181, 8638) },
+                { AOM_CDF4(1126, 3379, 4531) },
+                { AOM_CDF4(443, 1361, 2254) },
+                { AOM_CDF4(26083, 31153, 32436) },
+                { AOM_CDF4(13486, 24603, 28483) },
+                { AOM_CDF4(6508, 14840, 19910) },
+                { AOM_CDF4(3386, 8800, 13286) },
+                { AOM_CDF4(1530, 4322, 7054) },
+                { AOM_CDF4(29639, 32080, 32548) },
+                { AOM_CDF4(15897, 27552, 30290) },
+                { AOM_CDF4(8588, 20047, 25383) },
+                { AOM_CDF4(4889, 13339, 19269) },
+                { AOM_CDF4(2240, 6871, 10498) },
+                { AOM_CDF4(28165, 32197, 32517) },
+                { AOM_CDF4(20735, 30427, 31568) },
+                { AOM_CDF4(14325, 24671, 27692) },
+                { AOM_CDF4(5119, 12554, 17805) },
+                { AOM_CDF4(1810, 5441, 8261) },
+                { AOM_CDF4(31212, 32724, 32748) },
+                { AOM_CDF4(23352, 31766, 32545) },
+                { AOM_CDF4(14669, 27570, 31059) },
+                { AOM_CDF4(8492, 20894, 27272) },
+                { AOM_CDF4(3644, 10194, 15204) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(2461, 7013, 9371) },
+                { AOM_CDF4(24749, 29600, 30986) },
+                { AOM_CDF4(9466, 19037, 22417) },
+                { AOM_CDF4(3584, 9280, 14400) },
+                { AOM_CDF4(1505, 3929, 5433) },
+                { AOM_CDF4(677, 1500, 2736) },
+                { AOM_CDF4(23987, 30702, 32117) },
+                { AOM_CDF4(13554, 24571, 29263) },
+                { AOM_CDF4(6211, 14556, 21155) },
+                { AOM_CDF4(3135, 10972, 15625) },
+                { AOM_CDF4(2435, 7127, 11427) },
+                { AOM_CDF4(31300, 32532, 32550) },
+                { AOM_CDF4(14757, 30365, 31954) },
+                { AOM_CDF4(4405, 11612, 18553) },
+                { AOM_CDF4(580, 4132, 7322) },
+                { AOM_CDF4(1695, 10169, 14124) },
+                { AOM_CDF4(30008, 32282, 32591) },
+                { AOM_CDF4(19244, 30108, 31748) },
+                { AOM_CDF4(11180, 24158, 29555) },
+                { AOM_CDF4(5650, 14972, 19209) },
+                { AOM_CDF4(2114, 5109, 8456) },
+                { AOM_CDF4(31856, 32716, 32748) },
+                { AOM_CDF4(23012, 31664, 32572) },
+                { AOM_CDF4(13694, 26656, 30636) },
+                { AOM_CDF4(8142, 19508, 26093) },
+                { AOM_CDF4(4253, 10955, 16724) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(601, 983, 1311) },
+                { AOM_CDF4(18725, 23406, 28087) },
+                { AOM_CDF4(5461, 8192, 10923) },
+                { AOM_CDF4(3781, 15124, 21425) },
+                { AOM_CDF4(2587, 7761, 12072) },
+                { AOM_CDF4(106, 458, 810) },
+                { AOM_CDF4(22282, 29710, 31894) },
+                { AOM_CDF4(8508, 20926, 25984) },
+                { AOM_CDF4(3726, 12713, 18083) },
+                { AOM_CDF4(1620, 7112, 10893) },
+                { AOM_CDF4(729, 2236, 3495) },
+                { AOM_CDF4(30163, 32474, 32684) },
+                { AOM_CDF4(18304, 30464, 32000) },
+                { AOM_CDF4(11443, 26526, 29647) },
+                { AOM_CDF4(6007, 15292, 21299) },
+                { AOM_CDF4(2234, 6703, 8937) },
+                { AOM_CDF4(30954, 32177, 32571) },
+                { AOM_CDF4(17363, 29562, 31076) },
+                { AOM_CDF4(9686, 22464, 27410) },
+                { AOM_CDF4(8192, 16384, 21390) },
+                { AOM_CDF4(1755, 8046, 11264) },
+                { AOM_CDF4(31168, 32734, 32748) },
+                { AOM_CDF4(22486, 31441, 32471) },
+                { AOM_CDF4(12833, 25627, 29738) },
+                { AOM_CDF4(6980, 17379, 23122) },
+                { AOM_CDF4(3111, 8887, 13479) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } } },
+          { { { { AOM_CDF4(6041, 11854, 15927) },
+                { AOM_CDF4(20326, 30905, 32251) },
+                { AOM_CDF4(14164, 26831, 30725) },
+                { AOM_CDF4(9760, 20647, 26585) },
+                { AOM_CDF4(6416, 14953, 21219) },
+                { AOM_CDF4(2966, 7151, 10891) },
+                { AOM_CDF4(23567, 31374, 32254) },
+                { AOM_CDF4(14978, 27416, 30946) },
+                { AOM_CDF4(9434, 20225, 26254) },
+                { AOM_CDF4(6658, 14558, 20535) },
+                { AOM_CDF4(3916, 8677, 12989) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(18088, 29545, 31587) },
+                { AOM_CDF4(13062, 25843, 30073) },
+                { AOM_CDF4(8940, 16827, 22251) },
+                { AOM_CDF4(7654, 13220, 17973) },
+                { AOM_CDF4(5733, 10316, 14456) },
+                { AOM_CDF4(22879, 31388, 32114) },
+                { AOM_CDF4(15215, 27993, 30955) },
+                { AOM_CDF4(9397, 19445, 24978) },
+                { AOM_CDF4(3442, 9813, 15344) },
+                { AOM_CDF4(1368, 3936, 6532) },
+                { AOM_CDF4(25494, 32033, 32406) },
+                { AOM_CDF4(16772, 27963, 30718) },
+                { AOM_CDF4(9419, 18165, 23260) },
+                { AOM_CDF4(2677, 7501, 11797) },
+                { AOM_CDF4(1516, 4344, 7170) },
+                { AOM_CDF4(26556, 31454, 32101) },
+                { AOM_CDF4(17128, 27035, 30108) },
+                { AOM_CDF4(8324, 15344, 20249) },
+                { AOM_CDF4(1903, 5696, 9469) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(8455, 19003, 24368) },
+                { AOM_CDF4(23563, 32021, 32604) },
+                { AOM_CDF4(16237, 29446, 31935) },
+                { AOM_CDF4(10724, 23999, 29358) },
+                { AOM_CDF4(6725, 17528, 24416) },
+                { AOM_CDF4(3927, 10927, 16825) },
+                { AOM_CDF4(26313, 32288, 32634) },
+                { AOM_CDF4(17430, 30095, 32095) },
+                { AOM_CDF4(11116, 24606, 29679) },
+                { AOM_CDF4(7195, 18384, 25269) },
+                { AOM_CDF4(4726, 12852, 19315) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(22822, 31648, 32483) },
+                { AOM_CDF4(16724, 29633, 31929) },
+                { AOM_CDF4(10261, 23033, 28725) },
+                { AOM_CDF4(7029, 17840, 24528) },
+                { AOM_CDF4(4867, 13886, 21502) },
+                { AOM_CDF4(25298, 31892, 32491) },
+                { AOM_CDF4(17809, 29330, 31512) },
+                { AOM_CDF4(9668, 21329, 26579) },
+                { AOM_CDF4(4774, 12956, 18976) },
+                { AOM_CDF4(2322, 7030, 11540) },
+                { AOM_CDF4(25472, 31920, 32543) },
+                { AOM_CDF4(17957, 29387, 31632) },
+                { AOM_CDF4(9196, 20593, 26400) },
+                { AOM_CDF4(4680, 12705, 19202) },
+                { AOM_CDF4(2917, 8456, 13436) },
+                { AOM_CDF4(26471, 32059, 32574) },
+                { AOM_CDF4(18458, 29783, 31909) },
+                { AOM_CDF4(8400, 19464, 25956) },
+                { AOM_CDF4(3812, 10973, 17206) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(6779, 13743, 17678) },
+                { AOM_CDF4(24806, 31797, 32457) },
+                { AOM_CDF4(17616, 29047, 31372) },
+                { AOM_CDF4(11063, 23175, 28003) },
+                { AOM_CDF4(6521, 16110, 22324) },
+                { AOM_CDF4(2764, 7504, 11654) },
+                { AOM_CDF4(25266, 32367, 32637) },
+                { AOM_CDF4(19054, 30553, 32175) },
+                { AOM_CDF4(12139, 25212, 29807) },
+                { AOM_CDF4(7311, 18162, 24704) },
+                { AOM_CDF4(3397, 9164, 14074) },
+                { AOM_CDF4(25988, 32208, 32522) },
+                { AOM_CDF4(16253, 28912, 31526) },
+                { AOM_CDF4(9151, 21387, 27372) },
+                { AOM_CDF4(5688, 14915, 21496) },
+                { AOM_CDF4(2717, 7627, 12004) },
+                { AOM_CDF4(23144, 31855, 32443) },
+                { AOM_CDF4(16070, 28491, 31325) },
+                { AOM_CDF4(8702, 20467, 26517) },
+                { AOM_CDF4(5243, 13956, 20367) },
+                { AOM_CDF4(2621, 7335, 11567) },
+                { AOM_CDF4(26636, 32340, 32630) },
+                { AOM_CDF4(19990, 31050, 32341) },
+                { AOM_CDF4(13243, 26105, 30315) },
+                { AOM_CDF4(8588, 19521, 25918) },
+                { AOM_CDF4(4717, 11585, 17304) },
+                { AOM_CDF4(25844, 32292, 32582) },
+                { AOM_CDF4(19090, 30635, 32097) },
+                { AOM_CDF4(11963, 24546, 28939) },
+                { AOM_CDF4(6218, 16087, 22354) },
+                { AOM_CDF4(2340, 6608, 10426) },
+                { AOM_CDF4(28046, 32576, 32694) },
+                { AOM_CDF4(21178, 31313, 32296) },
+                { AOM_CDF4(13486, 26184, 29870) },
+                { AOM_CDF4(7149, 17871, 23723) },
+                { AOM_CDF4(2833, 7958, 12259) },
+                { AOM_CDF4(27710, 32528, 32686) },
+                { AOM_CDF4(20674, 31076, 32268) },
+                { AOM_CDF4(12413, 24955, 29243) },
+                { AOM_CDF4(6676, 16927, 23097) },
+                { AOM_CDF4(2966, 8333, 12919) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(8639, 19339, 24429) },
+                { AOM_CDF4(24404, 31837, 32525) },
+                { AOM_CDF4(16997, 29425, 31784) },
+                { AOM_CDF4(11253, 24234, 29149) },
+                { AOM_CDF4(6751, 17394, 24028) },
+                { AOM_CDF4(3490, 9830, 15191) },
+                { AOM_CDF4(26283, 32471, 32714) },
+                { AOM_CDF4(19599, 31168, 32442) },
+                { AOM_CDF4(13146, 26954, 30893) },
+                { AOM_CDF4(8214, 20588, 26890) },
+                { AOM_CDF4(4699, 13081, 19300) },
+                { AOM_CDF4(28212, 32458, 32669) },
+                { AOM_CDF4(18594, 30316, 32100) },
+                { AOM_CDF4(11219, 24408, 29234) },
+                { AOM_CDF4(6865, 17656, 24149) },
+                { AOM_CDF4(3678, 10362, 16006) },
+                { AOM_CDF4(25825, 32136, 32616) },
+                { AOM_CDF4(17313, 29853, 32021) },
+                { AOM_CDF4(11197, 24471, 29472) },
+                { AOM_CDF4(6947, 17781, 24405) },
+                { AOM_CDF4(3768, 10660, 16261) },
+                { AOM_CDF4(27352, 32500, 32706) },
+                { AOM_CDF4(20850, 31468, 32469) },
+                { AOM_CDF4(14021, 27707, 31133) },
+                { AOM_CDF4(8964, 21748, 27838) },
+                { AOM_CDF4(5437, 14665, 21187) },
+                { AOM_CDF4(26304, 32492, 32698) },
+                { AOM_CDF4(20409, 31380, 32385) },
+                { AOM_CDF4(13682, 27222, 30632) },
+                { AOM_CDF4(8974, 21236, 26685) },
+                { AOM_CDF4(4234, 11665, 16934) },
+                { AOM_CDF4(26273, 32357, 32711) },
+                { AOM_CDF4(20672, 31242, 32441) },
+                { AOM_CDF4(14172, 27254, 30902) },
+                { AOM_CDF4(9870, 21898, 27275) },
+                { AOM_CDF4(5164, 13506, 19270) },
+                { AOM_CDF4(26725, 32459, 32728) },
+                { AOM_CDF4(20991, 31442, 32527) },
+                { AOM_CDF4(13071, 26434, 30811) },
+                { AOM_CDF4(8184, 20090, 26742) },
+                { AOM_CDF4(4803, 13255, 19895) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(7555, 14942, 18501) },
+                { AOM_CDF4(24410, 31178, 32287) },
+                { AOM_CDF4(14394, 26738, 30253) },
+                { AOM_CDF4(8413, 19554, 25195) },
+                { AOM_CDF4(4766, 12924, 18785) },
+                { AOM_CDF4(2029, 5806, 9207) },
+                { AOM_CDF4(26776, 32364, 32663) },
+                { AOM_CDF4(18732, 29967, 31931) },
+                { AOM_CDF4(11005, 23786, 28852) },
+                { AOM_CDF4(6466, 16909, 23510) },
+                { AOM_CDF4(3044, 8638, 13419) },
+                { AOM_CDF4(29208, 32582, 32704) },
+                { AOM_CDF4(20068, 30857, 32208) },
+                { AOM_CDF4(12003, 25085, 29595) },
+                { AOM_CDF4(6947, 17750, 24189) },
+                { AOM_CDF4(3245, 9103, 14007) },
+                { AOM_CDF4(27359, 32465, 32669) },
+                { AOM_CDF4(19421, 30614, 32174) },
+                { AOM_CDF4(11915, 25010, 29579) },
+                { AOM_CDF4(6950, 17676, 24074) },
+                { AOM_CDF4(3007, 8473, 13096) },
+                { AOM_CDF4(29002, 32676, 32735) },
+                { AOM_CDF4(22102, 31849, 32576) },
+                { AOM_CDF4(14408, 28009, 31405) },
+                { AOM_CDF4(9027, 21679, 27931) },
+                { AOM_CDF4(4694, 12678, 18748) },
+                { AOM_CDF4(28216, 32528, 32682) },
+                { AOM_CDF4(20849, 31264, 32318) },
+                { AOM_CDF4(12756, 25815, 29751) },
+                { AOM_CDF4(7565, 18801, 24923) },
+                { AOM_CDF4(3509, 9533, 14477) },
+                { AOM_CDF4(30133, 32687, 32739) },
+                { AOM_CDF4(23063, 31910, 32515) },
+                { AOM_CDF4(14588, 28051, 31132) },
+                { AOM_CDF4(9085, 21649, 27457) },
+                { AOM_CDF4(4261, 11654, 17264) },
+                { AOM_CDF4(29518, 32691, 32748) },
+                { AOM_CDF4(22451, 31959, 32613) },
+                { AOM_CDF4(14864, 28722, 31700) },
+                { AOM_CDF4(9695, 22964, 28716) },
+                { AOM_CDF4(4932, 13358, 19502) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(6465, 16958, 21688) },
+                { AOM_CDF4(25199, 31514, 32360) },
+                { AOM_CDF4(14774, 27149, 30607) },
+                { AOM_CDF4(9257, 21438, 26972) },
+                { AOM_CDF4(5723, 15183, 21882) },
+                { AOM_CDF4(3150, 8879, 13731) },
+                { AOM_CDF4(26989, 32262, 32682) },
+                { AOM_CDF4(17396, 29937, 32085) },
+                { AOM_CDF4(11387, 24901, 29784) },
+                { AOM_CDF4(7289, 18821, 25548) },
+                { AOM_CDF4(3734, 10577, 16086) },
+                { AOM_CDF4(29728, 32501, 32695) },
+                { AOM_CDF4(17431, 29701, 31903) },
+                { AOM_CDF4(9921, 22826, 28300) },
+                { AOM_CDF4(5896, 15434, 22068) },
+                { AOM_CDF4(3430, 9646, 14757) },
+                { AOM_CDF4(28614, 32511, 32705) },
+                { AOM_CDF4(19364, 30638, 32263) },
+                { AOM_CDF4(13129, 26254, 30402) },
+                { AOM_CDF4(8754, 20484, 26440) },
+                { AOM_CDF4(4378, 11607, 17110) },
+                { AOM_CDF4(30292, 32671, 32744) },
+                { AOM_CDF4(21780, 31603, 32501) },
+                { AOM_CDF4(14314, 27829, 31291) },
+                { AOM_CDF4(9611, 22327, 28263) },
+                { AOM_CDF4(4890, 13087, 19065) },
+                { AOM_CDF4(25862, 32567, 32733) },
+                { AOM_CDF4(20794, 32050, 32567) },
+                { AOM_CDF4(17243, 30625, 32254) },
+                { AOM_CDF4(13283, 27628, 31474) },
+                { AOM_CDF4(9669, 22532, 28918) },
+                { AOM_CDF4(27435, 32697, 32748) },
+                { AOM_CDF4(24922, 32390, 32714) },
+                { AOM_CDF4(21449, 31504, 32536) },
+                { AOM_CDF4(16392, 29729, 31832) },
+                { AOM_CDF4(11692, 24884, 29076) },
+                { AOM_CDF4(24193, 32290, 32735) },
+                { AOM_CDF4(18909, 31104, 32563) },
+                { AOM_CDF4(12236, 26841, 31403) },
+                { AOM_CDF4(8171, 21840, 29082) },
+                { AOM_CDF4(7224, 17280, 25275) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(3078, 6839, 9890) },
+                { AOM_CDF4(13837, 20450, 24479) },
+                { AOM_CDF4(5914, 14222, 19328) },
+                { AOM_CDF4(3866, 10267, 14762) },
+                { AOM_CDF4(2612, 7208, 11042) },
+                { AOM_CDF4(1067, 2991, 4776) },
+                { AOM_CDF4(25817, 31646, 32529) },
+                { AOM_CDF4(13708, 26338, 30385) },
+                { AOM_CDF4(7328, 18585, 24870) },
+                { AOM_CDF4(4691, 13080, 19276) },
+                { AOM_CDF4(1825, 5253, 8352) },
+                { AOM_CDF4(29386, 32315, 32624) },
+                { AOM_CDF4(17160, 29001, 31360) },
+                { AOM_CDF4(9602, 21862, 27396) },
+                { AOM_CDF4(5915, 15772, 22148) },
+                { AOM_CDF4(2786, 7779, 12047) },
+                { AOM_CDF4(29246, 32450, 32663) },
+                { AOM_CDF4(18696, 29929, 31818) },
+                { AOM_CDF4(10510, 23369, 28560) },
+                { AOM_CDF4(6229, 16499, 23125) },
+                { AOM_CDF4(2608, 7448, 11705) },
+                { AOM_CDF4(30753, 32710, 32748) },
+                { AOM_CDF4(21638, 31487, 32503) },
+                { AOM_CDF4(12937, 26854, 30870) },
+                { AOM_CDF4(8182, 20596, 26970) },
+                { AOM_CDF4(3637, 10269, 15497) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(5244, 12150, 16906) },
+                { AOM_CDF4(20486, 26858, 29701) },
+                { AOM_CDF4(7756, 18317, 23735) },
+                { AOM_CDF4(3452, 9256, 13146) },
+                { AOM_CDF4(2020, 5206, 8229) },
+                { AOM_CDF4(1801, 4993, 7903) },
+                { AOM_CDF4(27051, 31858, 32531) },
+                { AOM_CDF4(15988, 27531, 30619) },
+                { AOM_CDF4(9188, 21484, 26719) },
+                { AOM_CDF4(6273, 17186, 23800) },
+                { AOM_CDF4(3108, 9355, 14764) },
+                { AOM_CDF4(31076, 32520, 32680) },
+                { AOM_CDF4(18119, 30037, 31850) },
+                { AOM_CDF4(10244, 22969, 27472) },
+                { AOM_CDF4(4692, 14077, 19273) },
+                { AOM_CDF4(3694, 11677, 17556) },
+                { AOM_CDF4(30060, 32581, 32720) },
+                { AOM_CDF4(21011, 30775, 32120) },
+                { AOM_CDF4(11931, 24820, 29289) },
+                { AOM_CDF4(7119, 17662, 24356) },
+                { AOM_CDF4(3833, 10706, 16304) },
+                { AOM_CDF4(31954, 32731, 32748) },
+                { AOM_CDF4(23913, 31724, 32489) },
+                { AOM_CDF4(15520, 28060, 31286) },
+                { AOM_CDF4(11517, 23008, 28571) },
+                { AOM_CDF4(6193, 14508, 20629) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(1035, 2807, 4156) },
+                { AOM_CDF4(13162, 18138, 20939) },
+                { AOM_CDF4(2696, 6633, 8755) },
+                { AOM_CDF4(1373, 4161, 6853) },
+                { AOM_CDF4(1099, 2746, 4716) },
+                { AOM_CDF4(340, 1021, 1599) },
+                { AOM_CDF4(22826, 30419, 32135) },
+                { AOM_CDF4(10395, 21762, 26942) },
+                { AOM_CDF4(4726, 12407, 17361) },
+                { AOM_CDF4(2447, 7080, 10593) },
+                { AOM_CDF4(1227, 3717, 6011) },
+                { AOM_CDF4(28156, 31424, 31934) },
+                { AOM_CDF4(16915, 27754, 30373) },
+                { AOM_CDF4(9148, 20990, 26431) },
+                { AOM_CDF4(5950, 15515, 21148) },
+                { AOM_CDF4(2492, 7327, 11526) },
+                { AOM_CDF4(30602, 32477, 32670) },
+                { AOM_CDF4(20026, 29955, 31568) },
+                { AOM_CDF4(11220, 23628, 28105) },
+                { AOM_CDF4(6652, 17019, 22973) },
+                { AOM_CDF4(3064, 8536, 13043) },
+                { AOM_CDF4(31769, 32724, 32748) },
+                { AOM_CDF4(22230, 30887, 32373) },
+                { AOM_CDF4(12234, 25079, 29731) },
+                { AOM_CDF4(7326, 18816, 25353) },
+                { AOM_CDF4(3933, 10907, 16616) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } } },
+          { { { { AOM_CDF4(8896, 16227, 20630) },
+                { AOM_CDF4(23629, 31782, 32527) },
+                { AOM_CDF4(15173, 27755, 31321) },
+                { AOM_CDF4(10158, 21233, 27382) },
+                { AOM_CDF4(6420, 14857, 21558) },
+                { AOM_CDF4(3269, 8155, 12646) },
+                { AOM_CDF4(24835, 32009, 32496) },
+                { AOM_CDF4(16509, 28421, 31579) },
+                { AOM_CDF4(10957, 21514, 27418) },
+                { AOM_CDF4(7881, 15930, 22096) },
+                { AOM_CDF4(5388, 10960, 15918) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(20745, 30773, 32093) },
+                { AOM_CDF4(15200, 27221, 30861) },
+                { AOM_CDF4(13032, 20873, 25667) },
+                { AOM_CDF4(12285, 18663, 23494) },
+                { AOM_CDF4(11563, 17481, 21489) },
+                { AOM_CDF4(26260, 31982, 32320) },
+                { AOM_CDF4(15397, 28083, 31100) },
+                { AOM_CDF4(9742, 19217, 24824) },
+                { AOM_CDF4(3261, 9629, 15362) },
+                { AOM_CDF4(1480, 4322, 7499) },
+                { AOM_CDF4(27599, 32256, 32460) },
+                { AOM_CDF4(16857, 27659, 30774) },
+                { AOM_CDF4(9551, 18290, 23748) },
+                { AOM_CDF4(3052, 8933, 14103) },
+                { AOM_CDF4(2021, 5910, 9787) },
+                { AOM_CDF4(29005, 32015, 32392) },
+                { AOM_CDF4(17677, 27694, 30863) },
+                { AOM_CDF4(9204, 17356, 23219) },
+                { AOM_CDF4(2403, 7516, 12814) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(10808, 22056, 26896) },
+                { AOM_CDF4(25739, 32313, 32676) },
+                { AOM_CDF4(17288, 30203, 32221) },
+                { AOM_CDF4(11359, 24878, 29896) },
+                { AOM_CDF4(6949, 17767, 24893) },
+                { AOM_CDF4(4287, 11796, 18071) },
+                { AOM_CDF4(27880, 32521, 32705) },
+                { AOM_CDF4(19038, 31004, 32414) },
+                { AOM_CDF4(12564, 26345, 30768) },
+                { AOM_CDF4(8269, 19947, 26779) },
+                { AOM_CDF4(5674, 14657, 21674) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(25742, 32319, 32671) },
+                { AOM_CDF4(19557, 31164, 32454) },
+                { AOM_CDF4(13381, 26381, 30755) },
+                { AOM_CDF4(10101, 21466, 26722) },
+                { AOM_CDF4(9209, 19650, 26825) },
+                { AOM_CDF4(27107, 31917, 32432) },
+                { AOM_CDF4(18056, 28893, 31203) },
+                { AOM_CDF4(10200, 21434, 26764) },
+                { AOM_CDF4(4660, 12913, 19502) },
+                { AOM_CDF4(2368, 6930, 12504) },
+                { AOM_CDF4(26960, 32158, 32613) },
+                { AOM_CDF4(18628, 30005, 32031) },
+                { AOM_CDF4(10233, 22442, 28232) },
+                { AOM_CDF4(5471, 14630, 21516) },
+                { AOM_CDF4(3235, 10767, 17109) },
+                { AOM_CDF4(27696, 32440, 32692) },
+                { AOM_CDF4(20032, 31167, 32438) },
+                { AOM_CDF4(8700, 21341, 28442) },
+                { AOM_CDF4(5662, 14831, 21795) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(9704, 17294, 21132) },
+                { AOM_CDF4(26762, 32278, 32633) },
+                { AOM_CDF4(18382, 29620, 31819) },
+                { AOM_CDF4(10891, 23475, 28723) },
+                { AOM_CDF4(6358, 16583, 23309) },
+                { AOM_CDF4(3248, 9118, 14141) },
+                { AOM_CDF4(27204, 32573, 32699) },
+                { AOM_CDF4(19818, 30824, 32329) },
+                { AOM_CDF4(11772, 25120, 30041) },
+                { AOM_CDF4(6995, 18033, 25039) },
+                { AOM_CDF4(3752, 10442, 16098) },
+                { AOM_CDF4(27222, 32256, 32559) },
+                { AOM_CDF4(15356, 28399, 31475) },
+                { AOM_CDF4(8821, 20635, 27057) },
+                { AOM_CDF4(5511, 14404, 21239) },
+                { AOM_CDF4(2935, 8222, 13051) },
+                { AOM_CDF4(24875, 32120, 32529) },
+                { AOM_CDF4(15233, 28265, 31445) },
+                { AOM_CDF4(8605, 20570, 26932) },
+                { AOM_CDF4(5431, 14413, 21196) },
+                { AOM_CDF4(2994, 8341, 13223) },
+                { AOM_CDF4(28201, 32604, 32700) },
+                { AOM_CDF4(21041, 31446, 32456) },
+                { AOM_CDF4(13221, 26213, 30475) },
+                { AOM_CDF4(8255, 19385, 26037) },
+                { AOM_CDF4(4930, 12585, 18830) },
+                { AOM_CDF4(28768, 32448, 32627) },
+                { AOM_CDF4(19705, 30561, 32021) },
+                { AOM_CDF4(11572, 23589, 28220) },
+                { AOM_CDF4(5532, 15034, 21446) },
+                { AOM_CDF4(2460, 7150, 11456) },
+                { AOM_CDF4(29874, 32619, 32699) },
+                { AOM_CDF4(21621, 31071, 32201) },
+                { AOM_CDF4(12511, 24747, 28992) },
+                { AOM_CDF4(6281, 16395, 22748) },
+                { AOM_CDF4(3246, 9278, 14497) },
+                { AOM_CDF4(29715, 32625, 32712) },
+                { AOM_CDF4(20958, 31011, 32283) },
+                { AOM_CDF4(11233, 23671, 28806) },
+                { AOM_CDF4(6012, 16128, 22868) },
+                { AOM_CDF4(3427, 9851, 15414) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(11016, 22111, 26794) },
+                { AOM_CDF4(25946, 32357, 32677) },
+                { AOM_CDF4(17890, 30452, 32252) },
+                { AOM_CDF4(11678, 25142, 29816) },
+                { AOM_CDF4(6720, 17534, 24584) },
+                { AOM_CDF4(4230, 11665, 17820) },
+                { AOM_CDF4(28400, 32623, 32747) },
+                { AOM_CDF4(21164, 31668, 32575) },
+                { AOM_CDF4(13572, 27388, 31182) },
+                { AOM_CDF4(8234, 20750, 27358) },
+                { AOM_CDF4(5065, 14055, 20897) },
+                { AOM_CDF4(28981, 32547, 32705) },
+                { AOM_CDF4(18681, 30543, 32239) },
+                { AOM_CDF4(10919, 24075, 29286) },
+                { AOM_CDF4(6431, 17199, 24077) },
+                { AOM_CDF4(3819, 10464, 16618) },
+                { AOM_CDF4(26870, 32467, 32693) },
+                { AOM_CDF4(19041, 30831, 32347) },
+                { AOM_CDF4(11794, 25211, 30016) },
+                { AOM_CDF4(6888, 18019, 24970) },
+                { AOM_CDF4(4370, 12363, 18992) },
+                { AOM_CDF4(29578, 32670, 32744) },
+                { AOM_CDF4(23159, 32007, 32613) },
+                { AOM_CDF4(15315, 28669, 31676) },
+                { AOM_CDF4(9298, 22607, 28782) },
+                { AOM_CDF4(6144, 15913, 22968) },
+                { AOM_CDF4(28110, 32499, 32669) },
+                { AOM_CDF4(21574, 30937, 32015) },
+                { AOM_CDF4(12759, 24818, 28727) },
+                { AOM_CDF4(6545, 16761, 23042) },
+                { AOM_CDF4(3649, 10597, 16833) },
+                { AOM_CDF4(28163, 32552, 32728) },
+                { AOM_CDF4(22101, 31469, 32464) },
+                { AOM_CDF4(13160, 25472, 30143) },
+                { AOM_CDF4(7303, 18684, 25468) },
+                { AOM_CDF4(5241, 13975, 20955) },
+                { AOM_CDF4(28400, 32631, 32744) },
+                { AOM_CDF4(22104, 31793, 32603) },
+                { AOM_CDF4(13557, 26571, 30846) },
+                { AOM_CDF4(7749, 19861, 26675) },
+                { AOM_CDF4(4873, 14030, 21234) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(9800, 17635, 21073) },
+                { AOM_CDF4(26153, 31885, 32527) },
+                { AOM_CDF4(15038, 27852, 31006) },
+                { AOM_CDF4(8718, 20564, 26486) },
+                { AOM_CDF4(5128, 14076, 20514) },
+                { AOM_CDF4(2636, 7566, 11925) },
+                { AOM_CDF4(27551, 32504, 32701) },
+                { AOM_CDF4(18310, 30054, 32100) },
+                { AOM_CDF4(10211, 23420, 29082) },
+                { AOM_CDF4(6222, 16876, 23916) },
+                { AOM_CDF4(3462, 9954, 15498) },
+                { AOM_CDF4(29991, 32633, 32721) },
+                { AOM_CDF4(19883, 30751, 32201) },
+                { AOM_CDF4(11141, 24184, 29285) },
+                { AOM_CDF4(6420, 16940, 23774) },
+                { AOM_CDF4(3392, 9753, 15118) },
+                { AOM_CDF4(28465, 32616, 32712) },
+                { AOM_CDF4(19850, 30702, 32244) },
+                { AOM_CDF4(10983, 24024, 29223) },
+                { AOM_CDF4(6294, 16770, 23582) },
+                { AOM_CDF4(3244, 9283, 14509) },
+                { AOM_CDF4(30023, 32717, 32748) },
+                { AOM_CDF4(22940, 32032, 32626) },
+                { AOM_CDF4(14282, 27928, 31473) },
+                { AOM_CDF4(8562, 21327, 27914) },
+                { AOM_CDF4(4846, 13393, 19919) },
+                { AOM_CDF4(29981, 32590, 32695) },
+                { AOM_CDF4(20465, 30963, 32166) },
+                { AOM_CDF4(11479, 23579, 28195) },
+                { AOM_CDF4(5916, 15648, 22073) },
+                { AOM_CDF4(3031, 8605, 13398) },
+                { AOM_CDF4(31146, 32691, 32739) },
+                { AOM_CDF4(23106, 31724, 32444) },
+                { AOM_CDF4(13783, 26738, 30439) },
+                { AOM_CDF4(7852, 19468, 25807) },
+                { AOM_CDF4(3860, 11124, 16853) },
+                { AOM_CDF4(31014, 32724, 32748) },
+                { AOM_CDF4(23629, 32109, 32628) },
+                { AOM_CDF4(14747, 28115, 31403) },
+                { AOM_CDF4(8545, 21242, 27478) },
+                { AOM_CDF4(4574, 12781, 19067) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(9185, 19694, 24688) },
+                { AOM_CDF4(26081, 31985, 32621) },
+                { AOM_CDF4(16015, 29000, 31787) },
+                { AOM_CDF4(10542, 23690, 29206) },
+                { AOM_CDF4(6732, 17945, 24677) },
+                { AOM_CDF4(3916, 11039, 16722) },
+                { AOM_CDF4(28224, 32566, 32744) },
+                { AOM_CDF4(19100, 31138, 32485) },
+                { AOM_CDF4(12528, 26620, 30879) },
+                { AOM_CDF4(7741, 20277, 26885) },
+                { AOM_CDF4(4566, 12845, 18990) },
+                { AOM_CDF4(29933, 32593, 32718) },
+                { AOM_CDF4(17670, 30333, 32155) },
+                { AOM_CDF4(10385, 23600, 28909) },
+                { AOM_CDF4(6243, 16236, 22407) },
+                { AOM_CDF4(3976, 10389, 16017) },
+                { AOM_CDF4(28377, 32561, 32738) },
+                { AOM_CDF4(19366, 31175, 32482) },
+                { AOM_CDF4(13327, 27175, 31094) },
+                { AOM_CDF4(8258, 20769, 27143) },
+                { AOM_CDF4(4703, 13198, 19527) },
+                { AOM_CDF4(31086, 32706, 32748) },
+                { AOM_CDF4(22853, 31902, 32583) },
+                { AOM_CDF4(14759, 28186, 31419) },
+                { AOM_CDF4(9284, 22382, 28348) },
+                { AOM_CDF4(5585, 15192, 21868) },
+                { AOM_CDF4(28291, 32652, 32746) },
+                { AOM_CDF4(19849, 32107, 32571) },
+                { AOM_CDF4(14834, 26818, 29214) },
+                { AOM_CDF4(10306, 22594, 28672) },
+                { AOM_CDF4(6615, 17384, 23384) },
+                { AOM_CDF4(28947, 32604, 32745) },
+                { AOM_CDF4(25625, 32289, 32646) },
+                { AOM_CDF4(18758, 28672, 31403) },
+                { AOM_CDF4(10017, 23430, 28523) },
+                { AOM_CDF4(6862, 15269, 22131) },
+                { AOM_CDF4(23933, 32509, 32739) },
+                { AOM_CDF4(19927, 31495, 32631) },
+                { AOM_CDF4(11903, 26023, 30621) },
+                { AOM_CDF4(7026, 20094, 27252) },
+                { AOM_CDF4(5998, 18106, 24437) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(4456, 11274, 15533) },
+                { AOM_CDF4(21219, 29079, 31616) },
+                { AOM_CDF4(11173, 23774, 28567) },
+                { AOM_CDF4(7282, 18293, 24263) },
+                { AOM_CDF4(4890, 13286, 19115) },
+                { AOM_CDF4(1890, 5508, 8659) },
+                { AOM_CDF4(26651, 32136, 32647) },
+                { AOM_CDF4(14630, 28254, 31455) },
+                { AOM_CDF4(8716, 21287, 27395) },
+                { AOM_CDF4(5615, 15331, 22008) },
+                { AOM_CDF4(2675, 7700, 12150) },
+                { AOM_CDF4(29954, 32526, 32690) },
+                { AOM_CDF4(16126, 28982, 31633) },
+                { AOM_CDF4(9030, 21361, 27352) },
+                { AOM_CDF4(5411, 14793, 21271) },
+                { AOM_CDF4(2943, 8422, 13163) },
+                { AOM_CDF4(29539, 32601, 32730) },
+                { AOM_CDF4(18125, 30385, 32201) },
+                { AOM_CDF4(10422, 24090, 29468) },
+                { AOM_CDF4(6468, 17487, 24438) },
+                { AOM_CDF4(2970, 8653, 13531) },
+                { AOM_CDF4(30912, 32715, 32748) },
+                { AOM_CDF4(20666, 31373, 32497) },
+                { AOM_CDF4(12509, 26640, 30917) },
+                { AOM_CDF4(8058, 20629, 27290) },
+                { AOM_CDF4(4231, 12006, 18052) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(10202, 20633, 25484) },
+                { AOM_CDF4(27336, 31445, 32352) },
+                { AOM_CDF4(12420, 24384, 28552) },
+                { AOM_CDF4(7648, 18115, 23856) },
+                { AOM_CDF4(5662, 14341, 19902) },
+                { AOM_CDF4(3611, 10328, 15390) },
+                { AOM_CDF4(30945, 32616, 32736) },
+                { AOM_CDF4(18682, 30505, 32253) },
+                { AOM_CDF4(11513, 25336, 30203) },
+                { AOM_CDF4(7449, 19452, 26148) },
+                { AOM_CDF4(4482, 13051, 18886) },
+                { AOM_CDF4(32022, 32690, 32747) },
+                { AOM_CDF4(18578, 30501, 32146) },
+                { AOM_CDF4(11249, 23368, 28631) },
+                { AOM_CDF4(5645, 16958, 22158) },
+                { AOM_CDF4(5009, 11444, 16637) },
+                { AOM_CDF4(31357, 32710, 32748) },
+                { AOM_CDF4(21552, 31494, 32504) },
+                { AOM_CDF4(13891, 27677, 31340) },
+                { AOM_CDF4(9051, 22098, 28172) },
+                { AOM_CDF4(5190, 13377, 19486) },
+                { AOM_CDF4(32364, 32740, 32748) },
+                { AOM_CDF4(24839, 31907, 32551) },
+                { AOM_CDF4(17160, 28779, 31696) },
+                { AOM_CDF4(12452, 24137, 29602) },
+                { AOM_CDF4(6165, 15389, 22477) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(2575, 7281, 11077) },
+                { AOM_CDF4(14002, 20866, 25402) },
+                { AOM_CDF4(6343, 15056, 19658) },
+                { AOM_CDF4(4474, 11858, 17041) },
+                { AOM_CDF4(2865, 8299, 12534) },
+                { AOM_CDF4(1344, 3949, 6391) },
+                { AOM_CDF4(24720, 31239, 32459) },
+                { AOM_CDF4(12585, 25356, 29968) },
+                { AOM_CDF4(7181, 18246, 24444) },
+                { AOM_CDF4(5025, 13667, 19885) },
+                { AOM_CDF4(2521, 7304, 11605) },
+                { AOM_CDF4(29908, 32252, 32584) },
+                { AOM_CDF4(17421, 29156, 31575) },
+                { AOM_CDF4(9889, 22188, 27782) },
+                { AOM_CDF4(5878, 15647, 22123) },
+                { AOM_CDF4(2814, 8665, 13323) },
+                { AOM_CDF4(30183, 32568, 32713) },
+                { AOM_CDF4(18528, 30195, 32049) },
+                { AOM_CDF4(10982, 24606, 29657) },
+                { AOM_CDF4(6957, 18165, 25231) },
+                { AOM_CDF4(3508, 10118, 15468) },
+                { AOM_CDF4(31761, 32736, 32748) },
+                { AOM_CDF4(21041, 31328, 32546) },
+                { AOM_CDF4(12568, 26732, 31166) },
+                { AOM_CDF4(8052, 20720, 27733) },
+                { AOM_CDF4(4336, 12192, 18396) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } } },
+          { { { { AOM_CDF4(7062, 16472, 22319) },
+                { AOM_CDF4(24538, 32261, 32674) },
+                { AOM_CDF4(13675, 28041, 31779) },
+                { AOM_CDF4(8590, 20674, 27631) },
+                { AOM_CDF4(5685, 14675, 22013) },
+                { AOM_CDF4(3655, 9898, 15731) },
+                { AOM_CDF4(26493, 32418, 32658) },
+                { AOM_CDF4(16376, 29342, 32090) },
+                { AOM_CDF4(10594, 22649, 28970) },
+                { AOM_CDF4(8176, 17170, 24303) },
+                { AOM_CDF4(5605, 12694, 19139) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(23888, 31902, 32542) },
+                { AOM_CDF4(18612, 29687, 31987) },
+                { AOM_CDF4(16245, 24852, 29249) },
+                { AOM_CDF4(15765, 22608, 27559) },
+                { AOM_CDF4(19895, 24699, 27510) },
+                { AOM_CDF4(28401, 32212, 32457) },
+                { AOM_CDF4(15274, 27825, 30980) },
+                { AOM_CDF4(9364, 18128, 24332) },
+                { AOM_CDF4(2283, 8193, 15082) },
+                { AOM_CDF4(1228, 3972, 7881) },
+                { AOM_CDF4(29455, 32469, 32620) },
+                { AOM_CDF4(17981, 28245, 31388) },
+                { AOM_CDF4(10921, 20098, 26240) },
+                { AOM_CDF4(3743, 11829, 18657) },
+                { AOM_CDF4(2374, 9593, 15715) },
+                { AOM_CDF4(31068, 32466, 32635) },
+                { AOM_CDF4(20321, 29572, 31971) },
+                { AOM_CDF4(10771, 20255, 27119) },
+                { AOM_CDF4(2795, 10410, 17361) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(9320, 22102, 27840) },
+                { AOM_CDF4(27057, 32464, 32724) },
+                { AOM_CDF4(16331, 30268, 32309) },
+                { AOM_CDF4(10319, 23935, 29720) },
+                { AOM_CDF4(6189, 16448, 24106) },
+                { AOM_CDF4(3589, 10884, 18808) },
+                { AOM_CDF4(29026, 32624, 32748) },
+                { AOM_CDF4(19226, 31507, 32587) },
+                { AOM_CDF4(12692, 26921, 31203) },
+                { AOM_CDF4(7049, 19532, 27635) },
+                { AOM_CDF4(7727, 15669, 23252) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(28056, 32625, 32748) },
+                { AOM_CDF4(22383, 32075, 32669) },
+                { AOM_CDF4(15417, 27098, 31749) },
+                { AOM_CDF4(18127, 26493, 27190) },
+                { AOM_CDF4(5461, 16384, 21845) },
+                { AOM_CDF4(27982, 32091, 32584) },
+                { AOM_CDF4(19045, 29868, 31972) },
+                { AOM_CDF4(10397, 22266, 27932) },
+                { AOM_CDF4(5990, 13697, 21500) },
+                { AOM_CDF4(1792, 6912, 15104) },
+                { AOM_CDF4(28198, 32501, 32718) },
+                { AOM_CDF4(21534, 31521, 32569) },
+                { AOM_CDF4(11109, 25217, 30017) },
+                { AOM_CDF4(5671, 15124, 26151) },
+                { AOM_CDF4(4681, 14043, 18725) },
+                { AOM_CDF4(28688, 32580, 32741) },
+                { AOM_CDF4(22576, 32079, 32661) },
+                { AOM_CDF4(10627, 22141, 28340) },
+                { AOM_CDF4(9362, 14043, 28087) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(7754, 16948, 22142) },
+                { AOM_CDF4(25670, 32330, 32691) },
+                { AOM_CDF4(15663, 29225, 31994) },
+                { AOM_CDF4(9878, 23288, 29158) },
+                { AOM_CDF4(6419, 17088, 24336) },
+                { AOM_CDF4(3859, 11003, 17039) },
+                { AOM_CDF4(27562, 32595, 32725) },
+                { AOM_CDF4(17575, 30588, 32399) },
+                { AOM_CDF4(10819, 24838, 30309) },
+                { AOM_CDF4(7124, 18686, 25916) },
+                { AOM_CDF4(4479, 12688, 19340) },
+                { AOM_CDF4(28385, 32476, 32673) },
+                { AOM_CDF4(15306, 29005, 31938) },
+                { AOM_CDF4(8937, 21615, 28322) },
+                { AOM_CDF4(5982, 15603, 22786) },
+                { AOM_CDF4(3620, 10267, 16136) },
+                { AOM_CDF4(27280, 32464, 32667) },
+                { AOM_CDF4(15607, 29160, 32004) },
+                { AOM_CDF4(9091, 22135, 28740) },
+                { AOM_CDF4(6232, 16632, 24020) },
+                { AOM_CDF4(4047, 11377, 17672) },
+                { AOM_CDF4(29220, 32630, 32718) },
+                { AOM_CDF4(19650, 31220, 32462) },
+                { AOM_CDF4(13050, 26312, 30827) },
+                { AOM_CDF4(9228, 20870, 27468) },
+                { AOM_CDF4(6146, 15149, 21971) },
+                { AOM_CDF4(30169, 32481, 32623) },
+                { AOM_CDF4(17212, 29311, 31554) },
+                { AOM_CDF4(9911, 21311, 26882) },
+                { AOM_CDF4(4487, 13314, 20372) },
+                { AOM_CDF4(2570, 7772, 12889) },
+                { AOM_CDF4(30924, 32613, 32708) },
+                { AOM_CDF4(19490, 30206, 32107) },
+                { AOM_CDF4(11232, 23998, 29276) },
+                { AOM_CDF4(6769, 17955, 25035) },
+                { AOM_CDF4(4398, 12623, 19214) },
+                { AOM_CDF4(30609, 32627, 32722) },
+                { AOM_CDF4(19370, 30582, 32287) },
+                { AOM_CDF4(10457, 23619, 29409) },
+                { AOM_CDF4(6443, 17637, 24834) },
+                { AOM_CDF4(4645, 13236, 20106) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(8626, 20271, 26216) },
+                { AOM_CDF4(26707, 32406, 32711) },
+                { AOM_CDF4(16999, 30329, 32286) },
+                { AOM_CDF4(11445, 25123, 30286) },
+                { AOM_CDF4(6411, 18828, 25601) },
+                { AOM_CDF4(6801, 12458, 20248) },
+                { AOM_CDF4(29918, 32682, 32748) },
+                { AOM_CDF4(20649, 31739, 32618) },
+                { AOM_CDF4(12879, 27773, 31581) },
+                { AOM_CDF4(7896, 21751, 28244) },
+                { AOM_CDF4(5260, 14870, 23698) },
+                { AOM_CDF4(29252, 32593, 32731) },
+                { AOM_CDF4(17072, 30460, 32294) },
+                { AOM_CDF4(10653, 24143, 29365) },
+                { AOM_CDF4(6536, 17490, 23983) },
+                { AOM_CDF4(4929, 13170, 20085) },
+                { AOM_CDF4(28137, 32518, 32715) },
+                { AOM_CDF4(18171, 30784, 32407) },
+                { AOM_CDF4(11437, 25436, 30459) },
+                { AOM_CDF4(7252, 18534, 26176) },
+                { AOM_CDF4(4126, 13353, 20978) },
+                { AOM_CDF4(31162, 32726, 32748) },
+                { AOM_CDF4(23017, 32222, 32701) },
+                { AOM_CDF4(15629, 29233, 32046) },
+                { AOM_CDF4(9387, 22621, 29480) },
+                { AOM_CDF4(6922, 17616, 25010) },
+                { AOM_CDF4(28838, 32265, 32614) },
+                { AOM_CDF4(19701, 30206, 31920) },
+                { AOM_CDF4(11214, 22410, 27933) },
+                { AOM_CDF4(5320, 14177, 23034) },
+                { AOM_CDF4(5049, 12881, 17827) },
+                { AOM_CDF4(27484, 32471, 32734) },
+                { AOM_CDF4(21076, 31526, 32561) },
+                { AOM_CDF4(12707, 26303, 31211) },
+                { AOM_CDF4(8169, 21722, 28219) },
+                { AOM_CDF4(6045, 19406, 27042) },
+                { AOM_CDF4(27753, 32572, 32745) },
+                { AOM_CDF4(20832, 31878, 32653) },
+                { AOM_CDF4(13250, 27356, 31674) },
+                { AOM_CDF4(7718, 21508, 29858) },
+                { AOM_CDF4(7209, 18350, 25559) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(7876, 16901, 21741) },
+                { AOM_CDF4(24001, 31898, 32625) },
+                { AOM_CDF4(14529, 27959, 31451) },
+                { AOM_CDF4(8273, 20818, 27258) },
+                { AOM_CDF4(5278, 14673, 21510) },
+                { AOM_CDF4(2983, 8843, 14039) },
+                { AOM_CDF4(28016, 32574, 32732) },
+                { AOM_CDF4(17471, 30306, 32301) },
+                { AOM_CDF4(10224, 24063, 29728) },
+                { AOM_CDF4(6602, 17954, 25052) },
+                { AOM_CDF4(4002, 11585, 17759) },
+                { AOM_CDF4(30190, 32634, 32739) },
+                { AOM_CDF4(17497, 30282, 32270) },
+                { AOM_CDF4(10229, 23729, 29538) },
+                { AOM_CDF4(6344, 17211, 24440) },
+                { AOM_CDF4(3849, 11189, 17108) },
+                { AOM_CDF4(28570, 32583, 32726) },
+                { AOM_CDF4(17521, 30161, 32238) },
+                { AOM_CDF4(10153, 23565, 29378) },
+                { AOM_CDF4(6455, 17341, 24443) },
+                { AOM_CDF4(3907, 11042, 17024) },
+                { AOM_CDF4(30689, 32715, 32748) },
+                { AOM_CDF4(21546, 31840, 32610) },
+                { AOM_CDF4(13547, 27581, 31459) },
+                { AOM_CDF4(8912, 21757, 28309) },
+                { AOM_CDF4(5548, 15080, 22046) },
+                { AOM_CDF4(30783, 32540, 32685) },
+                { AOM_CDF4(17540, 29528, 31668) },
+                { AOM_CDF4(10160, 21468, 26783) },
+                { AOM_CDF4(4724, 13393, 20054) },
+                { AOM_CDF4(2702, 8174, 13102) },
+                { AOM_CDF4(31648, 32686, 32742) },
+                { AOM_CDF4(20954, 31094, 32337) },
+                { AOM_CDF4(12420, 25698, 30179) },
+                { AOM_CDF4(7304, 19320, 26248) },
+                { AOM_CDF4(4366, 12261, 18864) },
+                { AOM_CDF4(31581, 32723, 32748) },
+                { AOM_CDF4(21373, 31586, 32525) },
+                { AOM_CDF4(12744, 26625, 30885) },
+                { AOM_CDF4(7431, 20322, 26950) },
+                { AOM_CDF4(4692, 13323, 20111) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(7833, 18369, 24095) },
+                { AOM_CDF4(26650, 32273, 32702) },
+                { AOM_CDF4(16371, 29961, 32191) },
+                { AOM_CDF4(11055, 24082, 29629) },
+                { AOM_CDF4(6892, 18644, 25400) },
+                { AOM_CDF4(5006, 13057, 19240) },
+                { AOM_CDF4(29834, 32666, 32748) },
+                { AOM_CDF4(19577, 31335, 32570) },
+                { AOM_CDF4(12253, 26509, 31122) },
+                { AOM_CDF4(7991, 20772, 27711) },
+                { AOM_CDF4(5677, 15910, 23059) },
+                { AOM_CDF4(30109, 32532, 32720) },
+                { AOM_CDF4(16747, 30166, 32252) },
+                { AOM_CDF4(10134, 23542, 29184) },
+                { AOM_CDF4(5791, 16176, 23556) },
+                { AOM_CDF4(4362, 10414, 17284) },
+                { AOM_CDF4(29492, 32626, 32748) },
+                { AOM_CDF4(19894, 31402, 32525) },
+                { AOM_CDF4(12942, 27071, 30869) },
+                { AOM_CDF4(8346, 21216, 27405) },
+                { AOM_CDF4(6572, 17087, 23859) },
+                { AOM_CDF4(32035, 32735, 32748) },
+                { AOM_CDF4(22957, 31838, 32618) },
+                { AOM_CDF4(14724, 28572, 31772) },
+                { AOM_CDF4(10364, 23999, 29553) },
+                { AOM_CDF4(7004, 18433, 25655) },
+                { AOM_CDF4(27528, 32277, 32681) },
+                { AOM_CDF4(16959, 31171, 32096) },
+                { AOM_CDF4(10486, 23593, 27962) },
+                { AOM_CDF4(8192, 16384, 23211) },
+                { AOM_CDF4(8937, 17873, 20852) },
+                { AOM_CDF4(27715, 32002, 32615) },
+                { AOM_CDF4(15073, 29491, 31676) },
+                { AOM_CDF4(11264, 24576, 28672) },
+                { AOM_CDF4(2341, 18725, 23406) },
+                { AOM_CDF4(7282, 18204, 25486) },
+                { AOM_CDF4(28547, 32213, 32657) },
+                { AOM_CDF4(20788, 29773, 32239) },
+                { AOM_CDF4(6780, 21469, 30508) },
+                { AOM_CDF4(5958, 14895, 23831) },
+                { AOM_CDF4(16384, 21845, 27307) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(5992, 14304, 19765) },
+                { AOM_CDF4(22612, 31238, 32456) },
+                { AOM_CDF4(13456, 27162, 31087) },
+                { AOM_CDF4(8001, 20062, 26504) },
+                { AOM_CDF4(5168, 14105, 20764) },
+                { AOM_CDF4(2632, 7771, 12385) },
+                { AOM_CDF4(27034, 32344, 32709) },
+                { AOM_CDF4(15850, 29415, 31997) },
+                { AOM_CDF4(9494, 22776, 28841) },
+                { AOM_CDF4(6151, 16830, 23969) },
+                { AOM_CDF4(3461, 10039, 15722) },
+                { AOM_CDF4(30134, 32569, 32731) },
+                { AOM_CDF4(15638, 29422, 31945) },
+                { AOM_CDF4(9150, 21865, 28218) },
+                { AOM_CDF4(5647, 15719, 22676) },
+                { AOM_CDF4(3402, 9772, 15477) },
+                { AOM_CDF4(28530, 32586, 32735) },
+                { AOM_CDF4(17139, 30298, 32292) },
+                { AOM_CDF4(10200, 24039, 29685) },
+                { AOM_CDF4(6419, 17674, 24786) },
+                { AOM_CDF4(3544, 10225, 15824) },
+                { AOM_CDF4(31333, 32726, 32748) },
+                { AOM_CDF4(20618, 31487, 32544) },
+                { AOM_CDF4(12901, 27217, 31232) },
+                { AOM_CDF4(8624, 21734, 28171) },
+                { AOM_CDF4(5104, 14191, 20748) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(11206, 21090, 26561) },
+                { AOM_CDF4(28759, 32279, 32671) },
+                { AOM_CDF4(14171, 27952, 31569) },
+                { AOM_CDF4(9743, 22907, 29141) },
+                { AOM_CDF4(6871, 17886, 24868) },
+                { AOM_CDF4(4960, 13152, 19315) },
+                { AOM_CDF4(31077, 32661, 32748) },
+                { AOM_CDF4(19400, 31195, 32515) },
+                { AOM_CDF4(12752, 26858, 31040) },
+                { AOM_CDF4(8370, 22098, 28591) },
+                { AOM_CDF4(5457, 15373, 22298) },
+                { AOM_CDF4(31697, 32706, 32748) },
+                { AOM_CDF4(17860, 30657, 32333) },
+                { AOM_CDF4(12510, 24812, 29261) },
+                { AOM_CDF4(6180, 19124, 24722) },
+                { AOM_CDF4(5041, 13548, 17959) },
+                { AOM_CDF4(31552, 32716, 32748) },
+                { AOM_CDF4(21908, 31769, 32623) },
+                { AOM_CDF4(14470, 28201, 31565) },
+                { AOM_CDF4(9493, 22982, 28608) },
+                { AOM_CDF4(6858, 17240, 24137) },
+                { AOM_CDF4(32543, 32752, 32756) },
+                { AOM_CDF4(24286, 32097, 32666) },
+                { AOM_CDF4(15958, 29217, 32024) },
+                { AOM_CDF4(10207, 24234, 29958) },
+                { AOM_CDF4(6929, 18305, 25652) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } },
+            { { { AOM_CDF4(4137, 10847, 15682) },
+                { AOM_CDF4(17824, 27001, 30058) },
+                { AOM_CDF4(10204, 22796, 28291) },
+                { AOM_CDF4(6076, 15935, 22125) },
+                { AOM_CDF4(3852, 10937, 16816) },
+                { AOM_CDF4(2252, 6324, 10131) },
+                { AOM_CDF4(25840, 32016, 32662) },
+                { AOM_CDF4(15109, 28268, 31531) },
+                { AOM_CDF4(9385, 22231, 28340) },
+                { AOM_CDF4(6082, 16672, 23479) },
+                { AOM_CDF4(3318, 9427, 14681) },
+                { AOM_CDF4(30594, 32574, 32718) },
+                { AOM_CDF4(16836, 29552, 31859) },
+                { AOM_CDF4(9556, 22542, 28356) },
+                { AOM_CDF4(6305, 16725, 23540) },
+                { AOM_CDF4(3376, 9895, 15184) },
+                { AOM_CDF4(29383, 32617, 32745) },
+                { AOM_CDF4(18891, 30809, 32401) },
+                { AOM_CDF4(11688, 25942, 30687) },
+                { AOM_CDF4(7468, 19469, 26651) },
+                { AOM_CDF4(3909, 11358, 17012) },
+                { AOM_CDF4(31564, 32736, 32748) },
+                { AOM_CDF4(20906, 31611, 32600) },
+                { AOM_CDF4(13191, 27621, 31537) },
+                { AOM_CDF4(8768, 22029, 28676) },
+                { AOM_CDF4(5079, 14109, 20906) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } },
+              { { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) },
+                { AOM_CDF4(8192, 16384, 24576) } } } } };
+
+static const aom_cdf_prob av1_default_coeff_base_eob_multi_cdfs
+    [TOKEN_CDF_Q_CTXS][TX_SIZES][PLANE_TYPES][SIG_COEF_CONTEXTS_EOB][CDF_SIZE(
+        NUM_BASE_LEVELS + 1)] = { { { { { AOM_CDF3(17837, 29055) },
+                                        { AOM_CDF3(29600, 31446) },
+                                        { AOM_CDF3(30844, 31878) },
+                                        { AOM_CDF3(24926, 28948) } },
+                                      { { AOM_CDF3(21365, 30026) },
+                                        { AOM_CDF3(30512, 32423) },
+                                        { AOM_CDF3(31658, 32621) },
+                                        { AOM_CDF3(29630, 31881) } } },
+                                    { { { AOM_CDF3(5717, 26477) },
+                                        { AOM_CDF3(30491, 31703) },
+                                        { AOM_CDF3(31550, 32158) },
+                                        { AOM_CDF3(29648, 31491) } },
+                                      { { AOM_CDF3(12608, 27820) },
+                                        { AOM_CDF3(30680, 32225) },
+                                        { AOM_CDF3(30809, 32335) },
+                                        { AOM_CDF3(31299, 32423) } } },
+                                    { { { AOM_CDF3(1786, 12612) },
+                                        { AOM_CDF3(30663, 31625) },
+                                        { AOM_CDF3(32339, 32468) },
+                                        { AOM_CDF3(31148, 31833) } },
+                                      { { AOM_CDF3(18857, 23865) },
+                                        { AOM_CDF3(31428, 32428) },
+                                        { AOM_CDF3(31744, 32373) },
+                                        { AOM_CDF3(31775, 32526) } } },
+                                    { { { AOM_CDF3(1787, 2532) },
+                                        { AOM_CDF3(30832, 31662) },
+                                        { AOM_CDF3(31824, 32682) },
+                                        { AOM_CDF3(32133, 32569) } },
+                                      { { AOM_CDF3(13751, 22235) },
+                                        { AOM_CDF3(32089, 32409) },
+                                        { AOM_CDF3(27084, 27920) },
+                                        { AOM_CDF3(29291, 32594) } } },
+                                    { { { AOM_CDF3(1725, 3449) },
+                                        { AOM_CDF3(31102, 31935) },
+                                        { AOM_CDF3(32457, 32613) },
+                                        { AOM_CDF3(32412, 32649) } },
+                                      { { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) } } } },
+                                  { { { { AOM_CDF3(17560, 29888) },
+                                        { AOM_CDF3(29671, 31549) },
+                                        { AOM_CDF3(31007, 32056) },
+                                        { AOM_CDF3(27286, 30006) } },
+                                      { { AOM_CDF3(26594, 31212) },
+                                        { AOM_CDF3(31208, 32582) },
+                                        { AOM_CDF3(31835, 32637) },
+                                        { AOM_CDF3(30595, 32206) } } },
+                                    { { { AOM_CDF3(15239, 29932) },
+                                        { AOM_CDF3(31315, 32095) },
+                                        { AOM_CDF3(32130, 32434) },
+                                        { AOM_CDF3(30864, 31996) } },
+                                      { { AOM_CDF3(26279, 30968) },
+                                        { AOM_CDF3(31142, 32495) },
+                                        { AOM_CDF3(31713, 32540) },
+                                        { AOM_CDF3(31929, 32594) } } },
+                                    { { { AOM_CDF3(2644, 25198) },
+                                        { AOM_CDF3(32038, 32451) },
+                                        { AOM_CDF3(32639, 32695) },
+                                        { AOM_CDF3(32166, 32518) } },
+                                      { { AOM_CDF3(17187, 27668) },
+                                        { AOM_CDF3(31714, 32550) },
+                                        { AOM_CDF3(32283, 32678) },
+                                        { AOM_CDF3(31930, 32563) } } },
+                                    { { { AOM_CDF3(1044, 2257) },
+                                        { AOM_CDF3(30755, 31923) },
+                                        { AOM_CDF3(32208, 32693) },
+                                        { AOM_CDF3(32244, 32615) } },
+                                      { { AOM_CDF3(21317, 26207) },
+                                        { AOM_CDF3(29133, 30868) },
+                                        { AOM_CDF3(29311, 31231) },
+                                        { AOM_CDF3(29657, 31087) } } },
+                                    { { { AOM_CDF3(478, 1834) },
+                                        { AOM_CDF3(31005, 31987) },
+                                        { AOM_CDF3(32317, 32724) },
+                                        { AOM_CDF3(30865, 32648) } },
+                                      { { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) } } } },
+                                  { { { { AOM_CDF3(20092, 30774) },
+                                        { AOM_CDF3(30695, 32020) },
+                                        { AOM_CDF3(31131, 32103) },
+                                        { AOM_CDF3(28666, 30870) } },
+                                      { { AOM_CDF3(27258, 31095) },
+                                        { AOM_CDF3(31804, 32623) },
+                                        { AOM_CDF3(31763, 32528) },
+                                        { AOM_CDF3(31438, 32506) } } },
+                                    { { { AOM_CDF3(18049, 30489) },
+                                        { AOM_CDF3(31706, 32286) },
+                                        { AOM_CDF3(32163, 32473) },
+                                        { AOM_CDF3(31550, 32184) } },
+                                      { { AOM_CDF3(27116, 30842) },
+                                        { AOM_CDF3(31971, 32598) },
+                                        { AOM_CDF3(32088, 32576) },
+                                        { AOM_CDF3(32067, 32664) } } },
+                                    { { { AOM_CDF3(12854, 29093) },
+                                        { AOM_CDF3(32272, 32558) },
+                                        { AOM_CDF3(32667, 32729) },
+                                        { AOM_CDF3(32306, 32585) } },
+                                      { { AOM_CDF3(25476, 30366) },
+                                        { AOM_CDF3(32169, 32687) },
+                                        { AOM_CDF3(32479, 32689) },
+                                        { AOM_CDF3(31673, 32634) } } },
+                                    { { { AOM_CDF3(2809, 19301) },
+                                        { AOM_CDF3(32205, 32622) },
+                                        { AOM_CDF3(32338, 32730) },
+                                        { AOM_CDF3(31786, 32616) } },
+                                      { { AOM_CDF3(22737, 29105) },
+                                        { AOM_CDF3(30810, 32362) },
+                                        { AOM_CDF3(30014, 32627) },
+                                        { AOM_CDF3(30528, 32574) } } },
+                                    { { { AOM_CDF3(935, 3382) },
+                                        { AOM_CDF3(30789, 31909) },
+                                        { AOM_CDF3(32466, 32756) },
+                                        { AOM_CDF3(30860, 32513) } },
+                                      { { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) } } } },
+                                  { { { { AOM_CDF3(22497, 31198) },
+                                        { AOM_CDF3(31715, 32495) },
+                                        { AOM_CDF3(31606, 32337) },
+                                        { AOM_CDF3(30388, 31990) } },
+                                      { { AOM_CDF3(27877, 31584) },
+                                        { AOM_CDF3(32170, 32728) },
+                                        { AOM_CDF3(32155, 32688) },
+                                        { AOM_CDF3(32219, 32702) } } },
+                                    { { { AOM_CDF3(21457, 31043) },
+                                        { AOM_CDF3(31951, 32483) },
+                                        { AOM_CDF3(32153, 32562) },
+                                        { AOM_CDF3(31473, 32215) } },
+                                      { { AOM_CDF3(27558, 31151) },
+                                        { AOM_CDF3(32020, 32640) },
+                                        { AOM_CDF3(32097, 32575) },
+                                        { AOM_CDF3(32242, 32719) } } },
+                                    { { { AOM_CDF3(19980, 30591) },
+                                        { AOM_CDF3(32219, 32597) },
+                                        { AOM_CDF3(32581, 32706) },
+                                        { AOM_CDF3(31803, 32287) } },
+                                      { { AOM_CDF3(26473, 30507) },
+                                        { AOM_CDF3(32431, 32723) },
+                                        { AOM_CDF3(32196, 32611) },
+                                        { AOM_CDF3(31588, 32528) } } },
+                                    { { { AOM_CDF3(24647, 30463) },
+                                        { AOM_CDF3(32412, 32695) },
+                                        { AOM_CDF3(32468, 32720) },
+                                        { AOM_CDF3(31269, 32523) } },
+                                      { { AOM_CDF3(28482, 31505) },
+                                        { AOM_CDF3(32152, 32701) },
+                                        { AOM_CDF3(31732, 32598) },
+                                        { AOM_CDF3(31767, 32712) } } },
+                                    { { { AOM_CDF3(12358, 24977) },
+                                        { AOM_CDF3(31331, 32385) },
+                                        { AOM_CDF3(32634, 32756) },
+                                        { AOM_CDF3(30411, 32548) } },
+                                      { { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) },
+                                        { AOM_CDF3(10923, 21845) } } } } };
+
+#endif  // AOM_AV1_COMMON_TOKEN_CDFS_H_

diff --git a/libav1/av1/common/txb_common.c b/libav1/av1/common/txb_common.c
new file mode 100644
index 0000000..cb92bd8
--- /dev/null
+++ b/libav1/av1/common/txb_common.c

@@ -0,0 +1,458 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#include "aom/aom_integer.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/txb_common.h"
+
+const int8_t av1_coeff_band_4x4[16] = { 0, 1, 2,  3,  4,  5,  6,  7,
+                                        8, 9, 10, 11, 12, 13, 14, 15 };
+
+const int8_t av1_coeff_band_8x8[64] = {
+  0,  1,  2,  2,  3,  3,  4,  4,  5,  6,  2,  2,  3,  3,  4,  4,
+  7,  7,  8,  8,  9,  9,  10, 10, 7,  7,  8,  8,  9,  9,  10, 10,
+  11, 11, 12, 12, 13, 13, 14, 14, 11, 11, 12, 12, 13, 13, 14, 14,
+  15, 15, 16, 16, 17, 17, 18, 18, 15, 15, 16, 16, 17, 17, 18, 18,
+};
+
+const int8_t av1_coeff_band_16x16[256] = {
+  0,  1,  4,  4,  7,  7,  7,  7,  8,  8,  8,  8,  9,  9,  9,  9,  2,  3,  4,
+  4,  7,  7,  7,  7,  8,  8,  8,  8,  9,  9,  9,  9,  5,  5,  6,  6,  7,  7,
+  7,  7,  8,  8,  8,  8,  9,  9,  9,  9,  5,  5,  6,  6,  7,  7,  7,  7,  8,
+  8,  8,  8,  9,  9,  9,  9,  10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12,
+  13, 13, 13, 13, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13,
+  13, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 10, 10,
+  10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15,
+  15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 14, 14, 14, 14, 15, 15, 15, 15,
+  16, 16, 16, 16, 17, 17, 17, 17, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16,
+  16, 17, 17, 17, 17, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17,
+  17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 18,
+  18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 18, 18, 18, 18,
+  19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 18, 18, 18, 18, 19, 19, 19,
+  19, 20, 20, 20, 20, 21, 21, 21, 21,
+};
+
+const int8_t av1_coeff_band_32x32[1024] = {
+  0,  1,  4,  4,  7,  7,  7,  7,  10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11,
+  11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 2,  3,  4,  4,  7,  7,
+  7,  7,  10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 12,
+  12, 12, 12, 12, 12, 12, 12, 5,  5,  6,  6,  7,  7,  7,  7,  10, 10, 10, 10,
+  10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12,
+  12, 5,  5,  6,  6,  7,  7,  7,  7,  10, 10, 10, 10, 10, 10, 10, 10, 11, 11,
+  11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 8,  8,  8,  8,  9,
+  9,  9,  9,  10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11,
+  12, 12, 12, 12, 12, 12, 12, 12, 8,  8,  8,  8,  9,  9,  9,  9,  10, 10, 10,
+  10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12,
+  12, 12, 8,  8,  8,  8,  9,  9,  9,  9,  10, 10, 10, 10, 10, 10, 10, 10, 11,
+  11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 8,  8,  8,  8,
+  9,  9,  9,  9,  10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11,
+  11, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14,
+  14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16,
+  16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14,
+  15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 13, 13, 13,
+  13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15,
+  15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13, 14,
+  14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16,
+  16, 16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14,
+  14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 13, 13,
+  13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15,
+  15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13,
+  14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16,
+  16, 16, 16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14,
+  14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 17,
+  17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19,
+  19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17, 17, 17,
+  17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20,
+  20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18,
+  18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20,
+  17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19,
+  19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17, 17,
+  17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20,
+  20, 20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18,
+  18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20,
+  20, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19,
+  19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17,
+  17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19,
+  20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22,
+  22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24,
+  24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23,
+  23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 21, 21, 21, 21,
+  21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23,
+  23, 24, 24, 24, 24, 24, 24, 24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22,
+  22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24,
+  24, 24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22,
+  23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 21, 21, 21,
+  21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23,
+  23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22,
+  22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24,
+  24, 24, 24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22,
+  22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24,
+};
+
+// The ctx offset table when TX is TX_CLASS_2D.
+// TX col and row indices are clamped to 4
+
+const int8_t av1_nz_map_ctx_offset_4x4[16] = {
+  0, 1, 6, 6, 1, 6, 6, 21, 6, 6, 21, 21, 6, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_8x8[64] = {
+  0,  1,  6,  6,  21, 21, 21, 21, 1,  6,  6,  21, 21, 21, 21, 21,
+  6,  6,  21, 21, 21, 21, 21, 21, 6,  21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_16x16[256] = {
+  0,  1,  6,  6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 1,  6,  6,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 6,  6,  21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 6,  21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_32x32[1024] = {
+  0,  1,  6,  6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 1,  6,  6,  21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 6,  6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_8x4[32] = {
+  0,  16, 6,  6,  21, 21, 21, 21, 16, 16, 6,  21, 21, 21, 21, 21,
+  16, 16, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_8x16[128] = {
+  0,  11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 6,  6,  21,
+  21, 21, 21, 21, 21, 6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_16x8[128] = {
+  0,  16, 6,  6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 6,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_16x32[512] = {
+  0,  11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
+  11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 6,  6,  21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 6,  21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_32x16[512] = {
+  0,  16, 6,  6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 6,  21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_32x64[1024] = {
+  0,  11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
+  11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
+  11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
+  11, 11, 11, 11, 11, 11, 11, 6,  6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_64x32[1024] = {
+  0,  16, 6,  6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 6,  21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16,
+  16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_4x16[64] = {
+  0,  11, 11, 11, 11, 11, 11, 11, 6,  6,  21, 21, 6,  21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_16x4[64] = {
+  0,  16, 6,  6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  16, 16, 6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_8x32[256] = {
+  0,  11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 6,  6,  21,
+  21, 21, 21, 21, 21, 6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_32x8[256] = {
+  0,  16, 6,  6,  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 6,  21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+  21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t *av1_nz_map_ctx_offset[19] = {
+  av1_nz_map_ctx_offset_4x4,    // TX_4x4
+  av1_nz_map_ctx_offset_8x8,    // TX_8x8
+  av1_nz_map_ctx_offset_16x16,  // TX_16x16
+  av1_nz_map_ctx_offset_32x32,  // TX_32x32
+  av1_nz_map_ctx_offset_32x32,  // TX_32x32
+  av1_nz_map_ctx_offset_4x16,   // TX_4x8
+  av1_nz_map_ctx_offset_8x4,    // TX_8x4
+  av1_nz_map_ctx_offset_8x32,   // TX_8x16
+  av1_nz_map_ctx_offset_16x8,   // TX_16x8
+  av1_nz_map_ctx_offset_16x32,  // TX_16x32
+  av1_nz_map_ctx_offset_32x16,  // TX_32x16
+  av1_nz_map_ctx_offset_32x64,  // TX_32x64
+  av1_nz_map_ctx_offset_64x32,  // TX_64x32
+  av1_nz_map_ctx_offset_4x16,   // TX_4x16
+  av1_nz_map_ctx_offset_16x4,   // TX_16x4
+  av1_nz_map_ctx_offset_8x32,   // TX_8x32
+  av1_nz_map_ctx_offset_32x8,   // TX_32x8
+  av1_nz_map_ctx_offset_16x32,  // TX_16x64
+  av1_nz_map_ctx_offset_64x32,  // TX_64x16
+};
+
+const int16_t k_eob_group_start[12] = { 0,  1,  2,  3,   5,   9,
+                                        17, 33, 65, 129, 257, 513 };
+const int16_t k_eob_offset_bits[12] = { 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

diff --git a/libav1/av1/common/txb_common.h b/libav1/av1/common/txb_common.h
new file mode 100644
index 0000000..8a3932d
--- /dev/null
+++ b/libav1/av1/common/txb_common.h

@@ -0,0 +1,435 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_TXB_COMMON_H_
+#define AOM_AV1_COMMON_TXB_COMMON_H_
+
+#include "av1/common/onyxc_int.h"
+
+extern const int16_t k_eob_group_start[12];
+extern const int16_t k_eob_offset_bits[12];
+
+extern const int8_t av1_coeff_band_4x4[16];
+
+extern const int8_t av1_coeff_band_8x8[64];
+
+extern const int8_t av1_coeff_band_16x16[256];
+
+extern const int8_t av1_coeff_band_32x32[1024];
+
+extern const int8_t *av1_nz_map_ctx_offset[TX_SIZES_ALL];
+
+typedef struct txb_ctx {
+  int txb_skip_ctx;
+  int dc_sign_ctx;
+} TXB_CTX;
+
+static const int base_level_count_to_index[13] = {
+  0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3,
+};
+
+static const TX_CLASS tx_type_to_class[TX_TYPES] = {
+  TX_CLASS_2D,     // DCT_DCT
+  TX_CLASS_2D,     // ADST_DCT
+  TX_CLASS_2D,     // DCT_ADST
+  TX_CLASS_2D,     // ADST_ADST
+  TX_CLASS_2D,     // FLIPADST_DCT
+  TX_CLASS_2D,     // DCT_FLIPADST
+  TX_CLASS_2D,     // FLIPADST_FLIPADST
+  TX_CLASS_2D,     // ADST_FLIPADST
+  TX_CLASS_2D,     // FLIPADST_ADST
+  TX_CLASS_2D,     // IDTX
+  TX_CLASS_VERT,   // V_DCT
+  TX_CLASS_HORIZ,  // H_DCT
+  TX_CLASS_VERT,   // V_ADST
+  TX_CLASS_HORIZ,  // H_ADST
+  TX_CLASS_VERT,   // V_FLIPADST
+  TX_CLASS_HORIZ,  // H_FLIPADST
+};
+
+static INLINE int get_txb_bwl(TX_SIZE tx_size) {
+  tx_size = av1_get_adjusted_tx_size(tx_size);
+  return tx_size_wide_log2[tx_size];
+}
+
+static INLINE int get_txb_wide(TX_SIZE tx_size) {
+  tx_size = av1_get_adjusted_tx_size(tx_size);
+  return tx_size_wide[tx_size];
+}
+
+static INLINE int get_txb_high(TX_SIZE tx_size) {
+  tx_size = av1_get_adjusted_tx_size(tx_size);
+  return tx_size_high[tx_size];
+}
+
+static INLINE uint8_t *set_levels(uint8_t *const levels_buf, const int width) {
+  return levels_buf + TX_PAD_TOP * (width + TX_PAD_HOR);
+}
+
+static INLINE int get_padded_idx(const int idx, const int bwl) {
+  return idx + ((idx >> bwl) << TX_PAD_HOR_LOG2);
+}
+
+static INLINE int get_base_ctx_from_count_mag(int row, int col, int count,
+                                              int sig_mag) {
+  const int ctx = base_level_count_to_index[count];
+  int ctx_idx = -1;
+
+  if (row == 0 && col == 0) {
+    if (sig_mag >= 2) return ctx_idx = 0;
+    if (sig_mag == 1) {
+      if (count >= 2)
+        ctx_idx = 1;
+      else
+        ctx_idx = 2;
+
+      return ctx_idx;
+    }
+
+    ctx_idx = 3 + ctx;
+    assert(ctx_idx <= 6);
+    return ctx_idx;
+  } else if (row == 0) {
+    if (sig_mag >= 2) return ctx_idx = 6;
+    if (sig_mag == 1) {
+      if (count >= 2)
+        ctx_idx = 7;
+      else
+        ctx_idx = 8;
+      return ctx_idx;
+    }
+
+    ctx_idx = 9 + ctx;
+    assert(ctx_idx <= 11);
+    return ctx_idx;
+  } else if (col == 0) {
+    if (sig_mag >= 2) return ctx_idx = 12;
+    if (sig_mag == 1) {
+      if (count >= 2)
+        ctx_idx = 13;
+      else
+        ctx_idx = 14;
+
+      return ctx_idx;
+    }
+
+    ctx_idx = 15 + ctx;
+    assert(ctx_idx <= 17);
+    // TODO(angiebird): turn this on once the optimization is finalized
+    // assert(ctx_idx < 28);
+  } else {
+    if (sig_mag >= 2) return ctx_idx = 18;
+    if (sig_mag == 1) {
+      if (count >= 2)
+        ctx_idx = 19;
+      else
+        ctx_idx = 20;
+      return ctx_idx;
+    }
+
+    ctx_idx = 21 + ctx;
+
+    assert(ctx_idx <= 24);
+  }
+  return ctx_idx;
+}
+
+static INLINE int get_br_ctx_2d(const uint8_t *const levels,
+                                const int c,  // raster order
+                                const int bwl) {
+  assert(c > 0);
+  const int row = c >> bwl;
+  const int col = c - (row << bwl);
+  const int stride = (1 << bwl) + TX_PAD_HOR;
+  const int pos = row * stride + col;
+  int mag = AOMMIN(levels[pos + 1], MAX_BASE_BR_RANGE) +
+            AOMMIN(levels[pos + stride], MAX_BASE_BR_RANGE) +
+            AOMMIN(levels[pos + 1 + stride], MAX_BASE_BR_RANGE);
+  mag = AOMMIN((mag + 1) >> 1, 6);
+  //((row | col) < 2) is equivalent to ((row < 2) && (col < 2))
+  if ((row | col) < 2) return mag + 7;
+  return mag + 14;
+}
+
+static AOM_FORCE_INLINE int get_br_ctx_eob(const int c,  // raster order
+                                           const int bwl,
+                                           const TX_CLASS tx_class) {
+  const int row = c >> bwl;
+  const int col = c - (row << bwl);
+  if (c == 0) return 0;
+  if ((tx_class == TX_CLASS_2D && row < 2 && col < 2) ||
+      (tx_class == TX_CLASS_HORIZ && col == 0) ||
+      (tx_class == TX_CLASS_VERT && row == 0))
+    return 7;
+  return 14;
+}
+
+static AOM_FORCE_INLINE int get_br_ctx(const uint8_t *const levels,
+                                       const int c,  // raster order
+                                       const int bwl, const TX_CLASS tx_class) {
+  const int row = c >> bwl;
+  const int col = c - (row << bwl);
+  const int stride = (1 << bwl) + TX_PAD_HOR;
+  const int pos = row * stride + col;
+  int mag = levels[pos + 1];
+  mag += levels[pos + stride];
+  switch (tx_class) {
+    case TX_CLASS_2D:
+      mag += levels[pos + stride + 1];
+      mag = AOMMIN((mag + 1) >> 1, 6);
+      if (c == 0) return mag;
+      if ((row < 2) && (col < 2)) return mag + 7;
+      break;
+    case TX_CLASS_HORIZ:
+      mag += levels[pos + 2];
+      mag = AOMMIN((mag + 1) >> 1, 6);
+      if (c == 0) return mag;
+      if (col == 0) return mag + 7;
+      break;
+    case TX_CLASS_VERT:
+      mag += levels[pos + (stride << 1)];
+      mag = AOMMIN((mag + 1) >> 1, 6);
+      if (c == 0) return mag;
+      if (row == 0) return mag + 7;
+      break;
+    default: break;
+  }
+
+  return mag + 14;
+}
+
+static const uint8_t clip_max3[256] = {
+  0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3
+};
+
+static AOM_FORCE_INLINE int get_nz_mag(const uint8_t *const levels,
+                                       const int bwl, const TX_CLASS tx_class) {
+  int mag;
+
+  // Note: AOMMIN(level, 3) is useless for decoder since level < 3.
+  mag = clip_max3[levels[1]];                         // { 0, 1 }
+  mag += clip_max3[levels[(1 << bwl) + TX_PAD_HOR]];  // { 1, 0 }
+
+  if (tx_class == TX_CLASS_2D) {
+    mag += clip_max3[levels[(1 << bwl) + TX_PAD_HOR + 1]];          // { 1, 1 }
+    mag += clip_max3[levels[2]];                                    // { 0, 2 }
+    mag += clip_max3[levels[(2 << bwl) + (2 << TX_PAD_HOR_LOG2)]];  // { 2, 0 }
+  } else if (tx_class == TX_CLASS_VERT) {
+    mag += clip_max3[levels[(2 << bwl) + (2 << TX_PAD_HOR_LOG2)]];  // { 2, 0 }
+    mag += clip_max3[levels[(3 << bwl) + (3 << TX_PAD_HOR_LOG2)]];  // { 3, 0 }
+    mag += clip_max3[levels[(4 << bwl) + (4 << TX_PAD_HOR_LOG2)]];  // { 4, 0 }
+  } else {
+    mag += clip_max3[levels[2]];  // { 0, 2 }
+    mag += clip_max3[levels[3]];  // { 0, 3 }
+    mag += clip_max3[levels[4]];  // { 0, 4 }
+  }
+
+  return mag;
+}
+
+#define NZ_MAP_CTX_0 SIG_COEF_CONTEXTS_2D
+#define NZ_MAP_CTX_5 (NZ_MAP_CTX_0 + 5)
+#define NZ_MAP_CTX_10 (NZ_MAP_CTX_0 + 10)
+
+static const int nz_map_ctx_offset_1d[32] = {
+  NZ_MAP_CTX_0,  NZ_MAP_CTX_5,  NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10,
+  NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10,
+  NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10,
+  NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10,
+  NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10,
+  NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10, NZ_MAP_CTX_10,
+  NZ_MAP_CTX_10, NZ_MAP_CTX_10,
+};
+
+static AOM_FORCE_INLINE int get_nz_map_ctx_from_stats(
+    const int stats,
+    const int coeff_idx,  // raster order
+    const int bwl, const TX_SIZE tx_size, const TX_CLASS tx_class) {
+  // tx_class == 0(TX_CLASS_2D)
+  if ((tx_class | coeff_idx) == 0) return 0;
+  int ctx = (stats + 1) >> 1;
+  ctx = AOMMIN(ctx, 4);
+  switch (tx_class) {
+    case TX_CLASS_2D: {
+      // This is the algorithm to generate av1_nz_map_ctx_offset[][]
+      //   const int width = tx_size_wide[tx_size];
+      //   const int height = tx_size_high[tx_size];
+      //   if (width < height) {
+      //     if (row < 2) return 11 + ctx;
+      //   } else if (width > height) {
+      //     if (col < 2) return 16 + ctx;
+      //   }
+      //   if (row + col < 2) return ctx + 1;
+      //   if (row + col < 4) return 5 + ctx + 1;
+      //   return 21 + ctx;
+      return ctx + av1_nz_map_ctx_offset[tx_size][coeff_idx];
+    }
+    case TX_CLASS_HORIZ: {
+      const int row = coeff_idx >> bwl;
+      const int col = coeff_idx - (row << bwl);
+      return ctx + nz_map_ctx_offset_1d[col];
+    }
+    case TX_CLASS_VERT: {
+      const int row = coeff_idx >> bwl;
+      return ctx + nz_map_ctx_offset_1d[row];
+    }
+    default: break;
+  }
+  return 0;
+}
+
+typedef aom_cdf_prob (*base_cdf_arr)[CDF_SIZE(4)];
+typedef aom_cdf_prob (*br_cdf_arr)[CDF_SIZE(BR_CDF_SIZE)];
+
+static INLINE int get_lower_levels_ctx_eob(int bwl, int height, int scan_idx) {
+  if (scan_idx == 0) return 0;
+  if (scan_idx <= (height << bwl) / 8) return 1;
+  if (scan_idx <= (height << bwl) / 4) return 2;
+  return 3;
+}
+
+static INLINE int get_lower_levels_ctx_2d(const uint8_t *levels, int coeff_idx,
+                                          int bwl, TX_SIZE tx_size) {
+  assert(coeff_idx > 0);
+  int mag;
+  // Note: AOMMIN(level, 3) is useless for decoder since level < 3.
+  levels = levels + get_padded_idx(coeff_idx, bwl);
+  mag = AOMMIN(levels[1], 3);                                     // { 0, 1 }
+  mag += AOMMIN(levels[(1 << bwl) + TX_PAD_HOR], 3);              // { 1, 0 }
+  mag += AOMMIN(levels[(1 << bwl) + TX_PAD_HOR + 1], 3);          // { 1, 1 }
+  mag += AOMMIN(levels[2], 3);                                    // { 0, 2 }
+  mag += AOMMIN(levels[(2 << bwl) + (2 << TX_PAD_HOR_LOG2)], 3);  // { 2, 0 }
+
+  const int ctx = AOMMIN((mag + 1) >> 1, 4);
+  return ctx + av1_nz_map_ctx_offset[tx_size][coeff_idx];
+}
+static AOM_FORCE_INLINE int get_lower_levels_ctx(const uint8_t *levels,
+                                                 int coeff_idx, int bwl,
+                                                 TX_SIZE tx_size,
+                                                 TX_CLASS tx_class) {
+  const int stats =
+      get_nz_mag(levels + get_padded_idx(coeff_idx, bwl), bwl, tx_class);
+  return get_nz_map_ctx_from_stats(stats, coeff_idx, bwl, tx_size, tx_class);
+}
+
+static INLINE int get_lower_levels_ctx_general(int is_last, int scan_idx,
+                                               int bwl, int height,
+                                               const uint8_t *levels,
+                                               int coeff_idx, TX_SIZE tx_size,
+                                               TX_CLASS tx_class) {
+  if (is_last) {
+    if (scan_idx == 0) return 0;
+    if (scan_idx <= (height << bwl) >> 3) return 1;
+    if (scan_idx <= (height << bwl) >> 2) return 2;
+    return 3;
+  }
+  return get_lower_levels_ctx(levels, coeff_idx, bwl, tx_size, tx_class);
+}
+
+static INLINE void set_dc_sign(int *cul_level, int dc_val) {
+  if (dc_val < 0)
+    *cul_level |= 1 << COEFF_CONTEXT_BITS;
+  else if (dc_val > 0)
+    *cul_level += 2 << COEFF_CONTEXT_BITS;
+}
+
+static INLINE void get_txb_ctx(const BLOCK_SIZE plane_bsize,
+                               const TX_SIZE tx_size, const int plane,
+                               const ENTROPY_CONTEXT *const a,
+                               const ENTROPY_CONTEXT *const l,
+                               TXB_CTX *const txb_ctx) {
+#define MAX_TX_SIZE_UNIT 16
+  static const int8_t signs[3] = { 0, -1, 1 };
+  static const int8_t dc_sign_contexts[4 * MAX_TX_SIZE_UNIT + 1] = {
+    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
+    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
+  };
+  const int txb_w_unit = tx_size_wide_unit[tx_size];
+  const int txb_h_unit = tx_size_high_unit[tx_size];
+  int dc_sign = 0;
+  int k = 0;
+
+  do {
+    const unsigned int sign = ((uint8_t)a[k]) >> COEFF_CONTEXT_BITS;
+    assert(sign <= 2);
+    dc_sign += signs[sign];
+  } while (++k < txb_w_unit);
+
+  k = 0;
+  do {
+    const unsigned int sign = ((uint8_t)l[k]) >> COEFF_CONTEXT_BITS;
+    assert(sign <= 2);
+    dc_sign += signs[sign];
+  } while (++k < txb_h_unit);
+
+  txb_ctx->dc_sign_ctx = dc_sign_contexts[dc_sign + 2 * MAX_TX_SIZE_UNIT];
+
+  if (plane == 0) {
+    if (plane_bsize == txsize_to_bsize[tx_size]) {
+      txb_ctx->txb_skip_ctx = 0;
+    } else {
+      // This is the algorithm to generate table skip_contexts[min][max].
+      //    if (!max)
+      //      txb_skip_ctx = 1;
+      //    else if (!min)
+      //      txb_skip_ctx = 2 + (max > 3);
+      //    else if (max <= 3)
+      //      txb_skip_ctx = 4;
+      //    else if (min <= 3)
+      //      txb_skip_ctx = 5;
+      //    else
+      //      txb_skip_ctx = 6;
+      static const uint8_t skip_contexts[5][5] = { { 1, 2, 2, 2, 3 },
+                                                   { 1, 4, 4, 4, 5 },
+                                                   { 1, 4, 4, 4, 5 },
+                                                   { 1, 4, 4, 4, 5 },
+                                                   { 1, 4, 4, 4, 6 } };
+      int top = 0;
+      int left = 0;
+
+      k = 0;
+      do {
+        top |= a[k];
+      } while (++k < txb_w_unit);
+      top &= COEFF_CONTEXT_MASK;
+
+      k = 0;
+      do {
+        left |= l[k];
+      } while (++k < txb_h_unit);
+      left &= COEFF_CONTEXT_MASK;
+      const int max = AOMMIN(top | left, 4);
+      const int min = AOMMIN(AOMMIN(top, left), 4);
+
+      txb_ctx->txb_skip_ctx = skip_contexts[min][max];
+    }
+  } else {
+    const int ctx_base = get_entropy_context(tx_size, a, l);
+    const int ctx_offset = (num_pels_log2_lookup[plane_bsize] >
+                            num_pels_log2_lookup[txsize_to_bsize[tx_size]])
+                               ? 10
+                               : 7;
+    txb_ctx->txb_skip_ctx = ctx_base + ctx_offset;
+  }
+#undef MAX_TX_SIZE_UNIT
+}
+
+#endif  // AOM_AV1_COMMON_TXB_COMMON_H_

diff --git a/libav1/av1/common/warped_motion.c b/libav1/av1/common/warped_motion.c
new file mode 100644
index 0000000..e232e10
--- /dev/null
+++ b/libav1/av1/common/warped_motion.c

@@ -0,0 +1,1148 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <memory.h>
+#include <math.h>
+#include <assert.h>
+
+#include "config/av1_rtcd.h"
+
+#include "av1/common/warped_motion.h"
+#include "av1/common/scale.h"
+
+#define WARP_ERROR_BLOCK 32
+
+/* clang-format off */
+static const int error_measure_lut[512] = {
+  // pow 0.7
+  16384, 16339, 16294, 16249, 16204, 16158, 16113, 16068,
+  16022, 15977, 15932, 15886, 15840, 15795, 15749, 15703,
+  15657, 15612, 15566, 15520, 15474, 15427, 15381, 15335,
+  15289, 15242, 15196, 15149, 15103, 15056, 15010, 14963,
+  14916, 14869, 14822, 14775, 14728, 14681, 14634, 14587,
+  14539, 14492, 14445, 14397, 14350, 14302, 14254, 14206,
+  14159, 14111, 14063, 14015, 13967, 13918, 13870, 13822,
+  13773, 13725, 13676, 13628, 13579, 13530, 13481, 13432,
+  13383, 13334, 13285, 13236, 13187, 13137, 13088, 13038,
+  12988, 12939, 12889, 12839, 12789, 12739, 12689, 12639,
+  12588, 12538, 12487, 12437, 12386, 12335, 12285, 12234,
+  12183, 12132, 12080, 12029, 11978, 11926, 11875, 11823,
+  11771, 11719, 11667, 11615, 11563, 11511, 11458, 11406,
+  11353, 11301, 11248, 11195, 11142, 11089, 11036, 10982,
+  10929, 10875, 10822, 10768, 10714, 10660, 10606, 10552,
+  10497, 10443, 10388, 10333, 10279, 10224, 10168, 10113,
+  10058, 10002,  9947,  9891,  9835,  9779,  9723,  9666,
+  9610, 9553, 9497, 9440, 9383, 9326, 9268, 9211,
+  9153, 9095, 9037, 8979, 8921, 8862, 8804, 8745,
+  8686, 8627, 8568, 8508, 8449, 8389, 8329, 8269,
+  8208, 8148, 8087, 8026, 7965, 7903, 7842, 7780,
+  7718, 7656, 7593, 7531, 7468, 7405, 7341, 7278,
+  7214, 7150, 7086, 7021, 6956, 6891, 6826, 6760,
+  6695, 6628, 6562, 6495, 6428, 6361, 6293, 6225,
+  6157, 6089, 6020, 5950, 5881, 5811, 5741, 5670,
+  5599, 5527, 5456, 5383, 5311, 5237, 5164, 5090,
+  5015, 4941, 4865, 4789, 4713, 4636, 4558, 4480,
+  4401, 4322, 4242, 4162, 4080, 3998, 3916, 3832,
+  3748, 3663, 3577, 3490, 3402, 3314, 3224, 3133,
+  3041, 2948, 2854, 2758, 2661, 2562, 2461, 2359,
+  2255, 2148, 2040, 1929, 1815, 1698, 1577, 1452,
+  1323, 1187, 1045,  894,  731,  550,  339,    0,
+  339,  550,  731,  894, 1045, 1187, 1323, 1452,
+  1577, 1698, 1815, 1929, 2040, 2148, 2255, 2359,
+  2461, 2562, 2661, 2758, 2854, 2948, 3041, 3133,
+  3224, 3314, 3402, 3490, 3577, 3663, 3748, 3832,
+  3916, 3998, 4080, 4162, 4242, 4322, 4401, 4480,
+  4558, 4636, 4713, 4789, 4865, 4941, 5015, 5090,
+  5164, 5237, 5311, 5383, 5456, 5527, 5599, 5670,
+  5741, 5811, 5881, 5950, 6020, 6089, 6157, 6225,
+  6293, 6361, 6428, 6495, 6562, 6628, 6695, 6760,
+  6826, 6891, 6956, 7021, 7086, 7150, 7214, 7278,
+  7341, 7405, 7468, 7531, 7593, 7656, 7718, 7780,
+  7842, 7903, 7965, 8026, 8087, 8148, 8208, 8269,
+  8329, 8389, 8449, 8508, 8568, 8627, 8686, 8745,
+  8804, 8862, 8921, 8979, 9037, 9095, 9153, 9211,
+  9268, 9326, 9383, 9440, 9497, 9553, 9610, 9666,
+  9723,  9779,  9835,  9891,  9947, 10002, 10058, 10113,
+  10168, 10224, 10279, 10333, 10388, 10443, 10497, 10552,
+  10606, 10660, 10714, 10768, 10822, 10875, 10929, 10982,
+  11036, 11089, 11142, 11195, 11248, 11301, 11353, 11406,
+  11458, 11511, 11563, 11615, 11667, 11719, 11771, 11823,
+  11875, 11926, 11978, 12029, 12080, 12132, 12183, 12234,
+  12285, 12335, 12386, 12437, 12487, 12538, 12588, 12639,
+  12689, 12739, 12789, 12839, 12889, 12939, 12988, 13038,
+  13088, 13137, 13187, 13236, 13285, 13334, 13383, 13432,
+  13481, 13530, 13579, 13628, 13676, 13725, 13773, 13822,
+  13870, 13918, 13967, 14015, 14063, 14111, 14159, 14206,
+  14254, 14302, 14350, 14397, 14445, 14492, 14539, 14587,
+  14634, 14681, 14728, 14775, 14822, 14869, 14916, 14963,
+  15010, 15056, 15103, 15149, 15196, 15242, 15289, 15335,
+  15381, 15427, 15474, 15520, 15566, 15612, 15657, 15703,
+  15749, 15795, 15840, 15886, 15932, 15977, 16022, 16068,
+  16113, 16158, 16204, 16249, 16294, 16339, 16384, 16384,
+};
+/* clang-format on */
+
+// For warping, we really use a 6-tap filter, but we do blocks of 8 pixels
+// at a time. The zoom/rotation/shear in the model are applied to the
+// "fractional" position of each pixel, which therefore varies within
+// [-1, 2) * WARPEDPIXEL_PREC_SHIFTS.
+// We need an extra 2 taps to fit this in, for a total of 8 taps.
+/* clang-format off */
+const int16_t warped_filter[WARPEDPIXEL_PREC_SHIFTS * 3 + 1][8] = {
+#if WARPEDPIXEL_PREC_BITS == 6
+  // [-1, 0)
+  { 0,   0, 127,   1,   0, 0, 0, 0 }, { 0, - 1, 127,   2,   0, 0, 0, 0 },
+  { 1, - 3, 127,   4, - 1, 0, 0, 0 }, { 1, - 4, 126,   6, - 2, 1, 0, 0 },
+  { 1, - 5, 126,   8, - 3, 1, 0, 0 }, { 1, - 6, 125,  11, - 4, 1, 0, 0 },
+  { 1, - 7, 124,  13, - 4, 1, 0, 0 }, { 2, - 8, 123,  15, - 5, 1, 0, 0 },
+  { 2, - 9, 122,  18, - 6, 1, 0, 0 }, { 2, -10, 121,  20, - 6, 1, 0, 0 },
+  { 2, -11, 120,  22, - 7, 2, 0, 0 }, { 2, -12, 119,  25, - 8, 2, 0, 0 },
+  { 3, -13, 117,  27, - 8, 2, 0, 0 }, { 3, -13, 116,  29, - 9, 2, 0, 0 },
+  { 3, -14, 114,  32, -10, 3, 0, 0 }, { 3, -15, 113,  35, -10, 2, 0, 0 },
+  { 3, -15, 111,  37, -11, 3, 0, 0 }, { 3, -16, 109,  40, -11, 3, 0, 0 },
+  { 3, -16, 108,  42, -12, 3, 0, 0 }, { 4, -17, 106,  45, -13, 3, 0, 0 },
+  { 4, -17, 104,  47, -13, 3, 0, 0 }, { 4, -17, 102,  50, -14, 3, 0, 0 },
+  { 4, -17, 100,  52, -14, 3, 0, 0 }, { 4, -18,  98,  55, -15, 4, 0, 0 },
+  { 4, -18,  96,  58, -15, 3, 0, 0 }, { 4, -18,  94,  60, -16, 4, 0, 0 },
+  { 4, -18,  91,  63, -16, 4, 0, 0 }, { 4, -18,  89,  65, -16, 4, 0, 0 },
+  { 4, -18,  87,  68, -17, 4, 0, 0 }, { 4, -18,  85,  70, -17, 4, 0, 0 },
+  { 4, -18,  82,  73, -17, 4, 0, 0 }, { 4, -18,  80,  75, -17, 4, 0, 0 },
+  { 4, -18,  78,  78, -18, 4, 0, 0 }, { 4, -17,  75,  80, -18, 4, 0, 0 },
+  { 4, -17,  73,  82, -18, 4, 0, 0 }, { 4, -17,  70,  85, -18, 4, 0, 0 },
+  { 4, -17,  68,  87, -18, 4, 0, 0 }, { 4, -16,  65,  89, -18, 4, 0, 0 },
+  { 4, -16,  63,  91, -18, 4, 0, 0 }, { 4, -16,  60,  94, -18, 4, 0, 0 },
+  { 3, -15,  58,  96, -18, 4, 0, 0 }, { 4, -15,  55,  98, -18, 4, 0, 0 },
+  { 3, -14,  52, 100, -17, 4, 0, 0 }, { 3, -14,  50, 102, -17, 4, 0, 0 },
+  { 3, -13,  47, 104, -17, 4, 0, 0 }, { 3, -13,  45, 106, -17, 4, 0, 0 },
+  { 3, -12,  42, 108, -16, 3, 0, 0 }, { 3, -11,  40, 109, -16, 3, 0, 0 },
+  { 3, -11,  37, 111, -15, 3, 0, 0 }, { 2, -10,  35, 113, -15, 3, 0, 0 },
+  { 3, -10,  32, 114, -14, 3, 0, 0 }, { 2, - 9,  29, 116, -13, 3, 0, 0 },
+  { 2, - 8,  27, 117, -13, 3, 0, 0 }, { 2, - 8,  25, 119, -12, 2, 0, 0 },
+  { 2, - 7,  22, 120, -11, 2, 0, 0 }, { 1, - 6,  20, 121, -10, 2, 0, 0 },
+  { 1, - 6,  18, 122, - 9, 2, 0, 0 }, { 1, - 5,  15, 123, - 8, 2, 0, 0 },
+  { 1, - 4,  13, 124, - 7, 1, 0, 0 }, { 1, - 4,  11, 125, - 6, 1, 0, 0 },
+  { 1, - 3,   8, 126, - 5, 1, 0, 0 }, { 1, - 2,   6, 126, - 4, 1, 0, 0 },
+  { 0, - 1,   4, 127, - 3, 1, 0, 0 }, { 0,   0,   2, 127, - 1, 0, 0, 0 },
+
+  // [0, 1)
+  { 0,  0,   0, 127,   1,   0,  0,  0}, { 0,  0,  -1, 127,   2,   0,  0,  0},
+  { 0,  1,  -3, 127,   4,  -2,  1,  0}, { 0,  1,  -5, 127,   6,  -2,  1,  0},
+  { 0,  2,  -6, 126,   8,  -3,  1,  0}, {-1,  2,  -7, 126,  11,  -4,  2, -1},
+  {-1,  3,  -8, 125,  13,  -5,  2, -1}, {-1,  3, -10, 124,  16,  -6,  3, -1},
+  {-1,  4, -11, 123,  18,  -7,  3, -1}, {-1,  4, -12, 122,  20,  -7,  3, -1},
+  {-1,  4, -13, 121,  23,  -8,  3, -1}, {-2,  5, -14, 120,  25,  -9,  4, -1},
+  {-1,  5, -15, 119,  27, -10,  4, -1}, {-1,  5, -16, 118,  30, -11,  4, -1},
+  {-2,  6, -17, 116,  33, -12,  5, -1}, {-2,  6, -17, 114,  35, -12,  5, -1},
+  {-2,  6, -18, 113,  38, -13,  5, -1}, {-2,  7, -19, 111,  41, -14,  6, -2},
+  {-2,  7, -19, 110,  43, -15,  6, -2}, {-2,  7, -20, 108,  46, -15,  6, -2},
+  {-2,  7, -20, 106,  49, -16,  6, -2}, {-2,  7, -21, 104,  51, -16,  7, -2},
+  {-2,  7, -21, 102,  54, -17,  7, -2}, {-2,  8, -21, 100,  56, -18,  7, -2},
+  {-2,  8, -22,  98,  59, -18,  7, -2}, {-2,  8, -22,  96,  62, -19,  7, -2},
+  {-2,  8, -22,  94,  64, -19,  7, -2}, {-2,  8, -22,  91,  67, -20,  8, -2},
+  {-2,  8, -22,  89,  69, -20,  8, -2}, {-2,  8, -22,  87,  72, -21,  8, -2},
+  {-2,  8, -21,  84,  74, -21,  8, -2}, {-2,  8, -22,  82,  77, -21,  8, -2},
+  {-2,  8, -21,  79,  79, -21,  8, -2}, {-2,  8, -21,  77,  82, -22,  8, -2},
+  {-2,  8, -21,  74,  84, -21,  8, -2}, {-2,  8, -21,  72,  87, -22,  8, -2},
+  {-2,  8, -20,  69,  89, -22,  8, -2}, {-2,  8, -20,  67,  91, -22,  8, -2},
+  {-2,  7, -19,  64,  94, -22,  8, -2}, {-2,  7, -19,  62,  96, -22,  8, -2},
+  {-2,  7, -18,  59,  98, -22,  8, -2}, {-2,  7, -18,  56, 100, -21,  8, -2},
+  {-2,  7, -17,  54, 102, -21,  7, -2}, {-2,  7, -16,  51, 104, -21,  7, -2},
+  {-2,  6, -16,  49, 106, -20,  7, -2}, {-2,  6, -15,  46, 108, -20,  7, -2},
+  {-2,  6, -15,  43, 110, -19,  7, -2}, {-2,  6, -14,  41, 111, -19,  7, -2},
+  {-1,  5, -13,  38, 113, -18,  6, -2}, {-1,  5, -12,  35, 114, -17,  6, -2},
+  {-1,  5, -12,  33, 116, -17,  6, -2}, {-1,  4, -11,  30, 118, -16,  5, -1},
+  {-1,  4, -10,  27, 119, -15,  5, -1}, {-1,  4,  -9,  25, 120, -14,  5, -2},
+  {-1,  3,  -8,  23, 121, -13,  4, -1}, {-1,  3,  -7,  20, 122, -12,  4, -1},
+  {-1,  3,  -7,  18, 123, -11,  4, -1}, {-1,  3,  -6,  16, 124, -10,  3, -1},
+  {-1,  2,  -5,  13, 125,  -8,  3, -1}, {-1,  2,  -4,  11, 126,  -7,  2, -1},
+  { 0,  1,  -3,   8, 126,  -6,  2,  0}, { 0,  1,  -2,   6, 127,  -5,  1,  0},
+  { 0,  1,  -2,   4, 127,  -3,  1,  0}, { 0,  0,   0,   2, 127,  -1,  0,  0},
+
+  // [1, 2)
+  { 0, 0, 0,   1, 127,   0,   0, 0 }, { 0, 0, 0, - 1, 127,   2,   0, 0 },
+  { 0, 0, 1, - 3, 127,   4, - 1, 0 }, { 0, 0, 1, - 4, 126,   6, - 2, 1 },
+  { 0, 0, 1, - 5, 126,   8, - 3, 1 }, { 0, 0, 1, - 6, 125,  11, - 4, 1 },
+  { 0, 0, 1, - 7, 124,  13, - 4, 1 }, { 0, 0, 2, - 8, 123,  15, - 5, 1 },
+  { 0, 0, 2, - 9, 122,  18, - 6, 1 }, { 0, 0, 2, -10, 121,  20, - 6, 1 },
+  { 0, 0, 2, -11, 120,  22, - 7, 2 }, { 0, 0, 2, -12, 119,  25, - 8, 2 },
+  { 0, 0, 3, -13, 117,  27, - 8, 2 }, { 0, 0, 3, -13, 116,  29, - 9, 2 },
+  { 0, 0, 3, -14, 114,  32, -10, 3 }, { 0, 0, 3, -15, 113,  35, -10, 2 },
+  { 0, 0, 3, -15, 111,  37, -11, 3 }, { 0, 0, 3, -16, 109,  40, -11, 3 },
+  { 0, 0, 3, -16, 108,  42, -12, 3 }, { 0, 0, 4, -17, 106,  45, -13, 3 },
+  { 0, 0, 4, -17, 104,  47, -13, 3 }, { 0, 0, 4, -17, 102,  50, -14, 3 },
+  { 0, 0, 4, -17, 100,  52, -14, 3 }, { 0, 0, 4, -18,  98,  55, -15, 4 },
+  { 0, 0, 4, -18,  96,  58, -15, 3 }, { 0, 0, 4, -18,  94,  60, -16, 4 },
+  { 0, 0, 4, -18,  91,  63, -16, 4 }, { 0, 0, 4, -18,  89,  65, -16, 4 },
+  { 0, 0, 4, -18,  87,  68, -17, 4 }, { 0, 0, 4, -18,  85,  70, -17, 4 },
+  { 0, 0, 4, -18,  82,  73, -17, 4 }, { 0, 0, 4, -18,  80,  75, -17, 4 },
+  { 0, 0, 4, -18,  78,  78, -18, 4 }, { 0, 0, 4, -17,  75,  80, -18, 4 },
+  { 0, 0, 4, -17,  73,  82, -18, 4 }, { 0, 0, 4, -17,  70,  85, -18, 4 },
+  { 0, 0, 4, -17,  68,  87, -18, 4 }, { 0, 0, 4, -16,  65,  89, -18, 4 },
+  { 0, 0, 4, -16,  63,  91, -18, 4 }, { 0, 0, 4, -16,  60,  94, -18, 4 },
+  { 0, 0, 3, -15,  58,  96, -18, 4 }, { 0, 0, 4, -15,  55,  98, -18, 4 },
+  { 0, 0, 3, -14,  52, 100, -17, 4 }, { 0, 0, 3, -14,  50, 102, -17, 4 },
+  { 0, 0, 3, -13,  47, 104, -17, 4 }, { 0, 0, 3, -13,  45, 106, -17, 4 },
+  { 0, 0, 3, -12,  42, 108, -16, 3 }, { 0, 0, 3, -11,  40, 109, -16, 3 },
+  { 0, 0, 3, -11,  37, 111, -15, 3 }, { 0, 0, 2, -10,  35, 113, -15, 3 },
+  { 0, 0, 3, -10,  32, 114, -14, 3 }, { 0, 0, 2, - 9,  29, 116, -13, 3 },
+  { 0, 0, 2, - 8,  27, 117, -13, 3 }, { 0, 0, 2, - 8,  25, 119, -12, 2 },
+  { 0, 0, 2, - 7,  22, 120, -11, 2 }, { 0, 0, 1, - 6,  20, 121, -10, 2 },
+  { 0, 0, 1, - 6,  18, 122, - 9, 2 }, { 0, 0, 1, - 5,  15, 123, - 8, 2 },
+  { 0, 0, 1, - 4,  13, 124, - 7, 1 }, { 0, 0, 1, - 4,  11, 125, - 6, 1 },
+  { 0, 0, 1, - 3,   8, 126, - 5, 1 }, { 0, 0, 1, - 2,   6, 126, - 4, 1 },
+  { 0, 0, 0, - 1,   4, 127, - 3, 1 }, { 0, 0, 0,   0,   2, 127, - 1, 0 },
+  // dummy (replicate row index 191)
+  { 0, 0, 0,   0,   2, 127, - 1, 0 },
+
+#elif WARPEDPIXEL_PREC_BITS == 5
+  // [-1, 0)
+  {0,   0, 127,   1,   0, 0, 0, 0}, {1,  -3, 127,   4,  -1, 0, 0, 0},
+  {1,  -5, 126,   8,  -3, 1, 0, 0}, {1,  -7, 124,  13,  -4, 1, 0, 0},
+  {2,  -9, 122,  18,  -6, 1, 0, 0}, {2, -11, 120,  22,  -7, 2, 0, 0},
+  {3, -13, 117,  27,  -8, 2, 0, 0}, {3, -14, 114,  32, -10, 3, 0, 0},
+  {3, -15, 111,  37, -11, 3, 0, 0}, {3, -16, 108,  42, -12, 3, 0, 0},
+  {4, -17, 104,  47, -13, 3, 0, 0}, {4, -17, 100,  52, -14, 3, 0, 0},
+  {4, -18,  96,  58, -15, 3, 0, 0}, {4, -18,  91,  63, -16, 4, 0, 0},
+  {4, -18,  87,  68, -17, 4, 0, 0}, {4, -18,  82,  73, -17, 4, 0, 0},
+  {4, -18,  78,  78, -18, 4, 0, 0}, {4, -17,  73,  82, -18, 4, 0, 0},
+  {4, -17,  68,  87, -18, 4, 0, 0}, {4, -16,  63,  91, -18, 4, 0, 0},
+  {3, -15,  58,  96, -18, 4, 0, 0}, {3, -14,  52, 100, -17, 4, 0, 0},
+  {3, -13,  47, 104, -17, 4, 0, 0}, {3, -12,  42, 108, -16, 3, 0, 0},
+  {3, -11,  37, 111, -15, 3, 0, 0}, {3, -10,  32, 114, -14, 3, 0, 0},
+  {2,  -8,  27, 117, -13, 3, 0, 0}, {2,  -7,  22, 120, -11, 2, 0, 0},
+  {1,  -6,  18, 122,  -9, 2, 0, 0}, {1,  -4,  13, 124,  -7, 1, 0, 0},
+  {1,  -3,   8, 126,  -5, 1, 0, 0}, {0,  -1,   4, 127,  -3, 1, 0, 0},
+  // [0, 1)
+  { 0,  0,   0, 127,   1,   0,   0,  0}, { 0,  1,  -3, 127,   4,  -2,   1,  0},
+  { 0,  2,  -6, 126,   8,  -3,   1,  0}, {-1,  3,  -8, 125,  13,  -5,   2, -1},
+  {-1,  4, -11, 123,  18,  -7,   3, -1}, {-1,  4, -13, 121,  23,  -8,   3, -1},
+  {-1,  5, -15, 119,  27, -10,   4, -1}, {-2,  6, -17, 116,  33, -12,   5, -1},
+  {-2,  6, -18, 113,  38, -13,   5, -1}, {-2,  7, -19, 110,  43, -15,   6, -2},
+  {-2,  7, -20, 106,  49, -16,   6, -2}, {-2,  7, -21, 102,  54, -17,   7, -2},
+  {-2,  8, -22,  98,  59, -18,   7, -2}, {-2,  8, -22,  94,  64, -19,   7, -2},
+  {-2,  8, -22,  89,  69, -20,   8, -2}, {-2,  8, -21,  84,  74, -21,   8, -2},
+  {-2,  8, -21,  79,  79, -21,   8, -2}, {-2,  8, -21,  74,  84, -21,   8, -2},
+  {-2,  8, -20,  69,  89, -22,   8, -2}, {-2,  7, -19,  64,  94, -22,   8, -2},
+  {-2,  7, -18,  59,  98, -22,   8, -2}, {-2,  7, -17,  54, 102, -21,   7, -2},
+  {-2,  6, -16,  49, 106, -20,   7, -2}, {-2,  6, -15,  43, 110, -19,   7, -2},
+  {-1,  5, -13,  38, 113, -18,   6, -2}, {-1,  5, -12,  33, 116, -17,   6, -2},
+  {-1,  4, -10,  27, 119, -15,   5, -1}, {-1,  3,  -8,  23, 121, -13,   4, -1},
+  {-1,  3,  -7,  18, 123, -11,   4, -1}, {-1,  2,  -5,  13, 125,  -8,   3, -1},
+  { 0,  1,  -3,   8, 126,  -6,   2,  0}, { 0,  1,  -2,   4, 127,  -3,   1,  0},
+  // [1, 2)
+  {0, 0, 0,   1, 127,   0,   0, 0}, {0, 0, 1,  -3, 127,   4,  -1, 0},
+  {0, 0, 1,  -5, 126,   8,  -3, 1}, {0, 0, 1,  -7, 124,  13,  -4, 1},
+  {0, 0, 2,  -9, 122,  18,  -6, 1}, {0, 0, 2, -11, 120,  22,  -7, 2},
+  {0, 0, 3, -13, 117,  27,  -8, 2}, {0, 0, 3, -14, 114,  32, -10, 3},
+  {0, 0, 3, -15, 111,  37, -11, 3}, {0, 0, 3, -16, 108,  42, -12, 3},
+  {0, 0, 4, -17, 104,  47, -13, 3}, {0, 0, 4, -17, 100,  52, -14, 3},
+  {0, 0, 4, -18,  96,  58, -15, 3}, {0, 0, 4, -18,  91,  63, -16, 4},
+  {0, 0, 4, -18,  87,  68, -17, 4}, {0, 0, 4, -18,  82,  73, -17, 4},
+  {0, 0, 4, -18,  78,  78, -18, 4}, {0, 0, 4, -17,  73,  82, -18, 4},
+  {0, 0, 4, -17,  68,  87, -18, 4}, {0, 0, 4, -16,  63,  91, -18, 4},
+  {0, 0, 3, -15,  58,  96, -18, 4}, {0, 0, 3, -14,  52, 100, -17, 4},
+  {0, 0, 3, -13,  47, 104, -17, 4}, {0, 0, 3, -12,  42, 108, -16, 3},
+  {0, 0, 3, -11,  37, 111, -15, 3}, {0, 0, 3, -10,  32, 114, -14, 3},
+  {0, 0, 2,  -8,  27, 117, -13, 3}, {0, 0, 2,  -7,  22, 120, -11, 2},
+  {0, 0, 1,  -6,  18, 122,  -9, 2}, {0, 0, 1,  -4,  13, 124,  -7, 1},
+  {0, 0, 1,  -3,   8, 126,  -5, 1}, {0, 0, 0,  -1,   4, 127,  -3, 1},
+  // dummy (replicate row index 95)
+  {0, 0, 0,  -1,   4, 127,  -3, 1},
+
+#endif  // WARPEDPIXEL_PREC_BITS == 6
+};
+
+/* clang-format on */
+
+#define DIV_LUT_PREC_BITS 14
+#define DIV_LUT_BITS 8
+#define DIV_LUT_NUM (1 << DIV_LUT_BITS)
+
+static const uint16_t div_lut[DIV_LUT_NUM + 1] = {
+  16384, 16320, 16257, 16194, 16132, 16070, 16009, 15948, 15888, 15828, 15768,
+  15709, 15650, 15592, 15534, 15477, 15420, 15364, 15308, 15252, 15197, 15142,
+  15087, 15033, 14980, 14926, 14873, 14821, 14769, 14717, 14665, 14614, 14564,
+  14513, 14463, 14413, 14364, 14315, 14266, 14218, 14170, 14122, 14075, 14028,
+  13981, 13935, 13888, 13843, 13797, 13752, 13707, 13662, 13618, 13574, 13530,
+  13487, 13443, 13400, 13358, 13315, 13273, 13231, 13190, 13148, 13107, 13066,
+  13026, 12985, 12945, 12906, 12866, 12827, 12788, 12749, 12710, 12672, 12633,
+  12596, 12558, 12520, 12483, 12446, 12409, 12373, 12336, 12300, 12264, 12228,
+  12193, 12157, 12122, 12087, 12053, 12018, 11984, 11950, 11916, 11882, 11848,
+  11815, 11782, 11749, 11716, 11683, 11651, 11619, 11586, 11555, 11523, 11491,
+  11460, 11429, 11398, 11367, 11336, 11305, 11275, 11245, 11215, 11185, 11155,
+  11125, 11096, 11067, 11038, 11009, 10980, 10951, 10923, 10894, 10866, 10838,
+  10810, 10782, 10755, 10727, 10700, 10673, 10645, 10618, 10592, 10565, 10538,
+  10512, 10486, 10460, 10434, 10408, 10382, 10356, 10331, 10305, 10280, 10255,
+  10230, 10205, 10180, 10156, 10131, 10107, 10082, 10058, 10034, 10010, 9986,
+  9963,  9939,  9916,  9892,  9869,  9846,  9823,  9800,  9777,  9754,  9732,
+  9709,  9687,  9664,  9642,  9620,  9598,  9576,  9554,  9533,  9511,  9489,
+  9468,  9447,  9425,  9404,  9383,  9362,  9341,  9321,  9300,  9279,  9259,
+  9239,  9218,  9198,  9178,  9158,  9138,  9118,  9098,  9079,  9059,  9039,
+  9020,  9001,  8981,  8962,  8943,  8924,  8905,  8886,  8867,  8849,  8830,
+  8812,  8793,  8775,  8756,  8738,  8720,  8702,  8684,  8666,  8648,  8630,
+  8613,  8595,  8577,  8560,  8542,  8525,  8508,  8490,  8473,  8456,  8439,
+  8422,  8405,  8389,  8372,  8355,  8339,  8322,  8306,  8289,  8273,  8257,
+  8240,  8224,  8208,  8192,
+};
+
+// Decomposes a divisor D such that 1/D = y/2^shift, where y is returned
+// at precision of DIV_LUT_PREC_BITS along with the shift.
+static int16_t resolve_divisor_64(uint64_t D, int16_t *shift) {
+  int64_t f;
+  *shift = (int16_t)((D >> 32) ? get_msb((unsigned int)(D >> 32)) + 32
+                               : get_msb((unsigned int)D));
+  // e is obtained from D after resetting the most significant 1 bit.
+  const int64_t e = D - ((uint64_t)1 << *shift);
+  // Get the most significant DIV_LUT_BITS (8) bits of e into f
+  if (*shift > DIV_LUT_BITS)
+    f = ROUND_POWER_OF_TWO_64(e, *shift - DIV_LUT_BITS);
+  else
+    f = e << (DIV_LUT_BITS - *shift);
+  assert(f <= DIV_LUT_NUM);
+  *shift += DIV_LUT_PREC_BITS;
+  // Use f as lookup into the precomputed table of multipliers
+  return div_lut[f];
+}
+
+static int16_t resolve_divisor_32(uint32_t D, int16_t *shift) {
+  int32_t f;
+  *shift = get_msb(D);
+  // e is obtained from D after resetting the most significant 1 bit.
+  const int32_t e = D - ((uint32_t)1 << *shift);
+  // Get the most significant DIV_LUT_BITS (8) bits of e into f
+  if (*shift > DIV_LUT_BITS)
+    f = ROUND_POWER_OF_TWO(e, *shift - DIV_LUT_BITS);
+  else
+    f = e << (DIV_LUT_BITS - *shift);
+  assert(f <= DIV_LUT_NUM);
+  *shift += DIV_LUT_PREC_BITS;
+  // Use f as lookup into the precomputed table of multipliers
+  return div_lut[f];
+}
+
+static int is_affine_valid(const WarpedMotionParams *const wm) {
+  const int32_t *mat = wm->wmmat;
+  return (mat[2] > 0);
+}
+
+static int is_affine_shear_allowed(int16_t alpha, int16_t beta, int16_t gamma,
+                                   int16_t delta) {
+  if ((4 * abs(alpha) + 7 * abs(beta) >= (1 << WARPEDMODEL_PREC_BITS)) ||
+      (4 * abs(gamma) + 4 * abs(delta) >= (1 << WARPEDMODEL_PREC_BITS)))
+    return 0;
+  else
+    return 1;
+}
+
+// Returns 1 on success or 0 on an invalid affine set
+int get_shear_params(WarpedMotionParams *wm) {
+  const int32_t *mat = wm->wmmat;
+  if (!is_affine_valid(wm)) return 0;
+  wm->alpha =
+      clamp(mat[2] - (1 << WARPEDMODEL_PREC_BITS), INT16_MIN, INT16_MAX);
+  wm->beta = clamp(mat[3], INT16_MIN, INT16_MAX);
+  int16_t shift;
+  int16_t y = resolve_divisor_32(abs(mat[2]), &shift) * (mat[2] < 0 ? -1 : 1);
+  int64_t v = ((int64_t)mat[4] * (1 << WARPEDMODEL_PREC_BITS)) * y;
+  wm->gamma =
+      clamp((int)ROUND_POWER_OF_TWO_SIGNED_64(v, shift), INT16_MIN, INT16_MAX);
+  v = ((int64_t)mat[3] * mat[4]) * y;
+  wm->delta = clamp(mat[5] - (int)ROUND_POWER_OF_TWO_SIGNED_64(v, shift) -
+                        (1 << WARPEDMODEL_PREC_BITS),
+                    INT16_MIN, INT16_MAX);
+
+  wm->alpha = ROUND_POWER_OF_TWO_SIGNED(wm->alpha, WARP_PARAM_REDUCE_BITS) *
+              (1 << WARP_PARAM_REDUCE_BITS);
+  wm->beta = ROUND_POWER_OF_TWO_SIGNED(wm->beta, WARP_PARAM_REDUCE_BITS) *
+             (1 << WARP_PARAM_REDUCE_BITS);
+  wm->gamma = ROUND_POWER_OF_TWO_SIGNED(wm->gamma, WARP_PARAM_REDUCE_BITS) *
+              (1 << WARP_PARAM_REDUCE_BITS);
+  wm->delta = ROUND_POWER_OF_TWO_SIGNED(wm->delta, WARP_PARAM_REDUCE_BITS) *
+              (1 << WARP_PARAM_REDUCE_BITS);
+
+  if (!is_affine_shear_allowed(wm->alpha, wm->beta, wm->gamma, wm->delta))
+    return 0;
+
+  return 1;
+}
+
+static INLINE int highbd_error_measure(int err, int bd) {
+  const int b = bd - 8;
+  const int bmask = (1 << b) - 1;
+  const int v = (1 << b);
+  err = abs(err);
+  const int e1 = err >> b;
+  const int e2 = err & bmask;
+  return error_measure_lut[255 + e1] * (v - e2) +
+         error_measure_lut[256 + e1] * e2;
+}
+
+/* Note: For an explanation of the warp algorithm, and some notes on bit widths
+    for hardware implementations, see the comments above av1_warp_affine_c
+*/
+void av1_highbd_warp_affine_c(const int32_t *mat, const uint16_t *ref,
+                              int width, int height, int stride, uint16_t *pred,
+                              int p_col, int p_row, int p_width, int p_height,
+                              int p_stride, int subsampling_x,
+                              int subsampling_y, int bd,
+                              ConvolveParams *conv_params, int16_t alpha,
+                              int16_t beta, int16_t gamma, int16_t delta) {
+  int32_t tmp[15 * 8];
+  const int reduce_bits_horiz =
+      conv_params->round_0 +
+      AOMMAX(bd + FILTER_BITS - conv_params->round_0 - 14, 0);
+  const int reduce_bits_vert = conv_params->is_compound
+                                   ? conv_params->round_1
+                                   : 2 * FILTER_BITS - reduce_bits_horiz;
+  const int max_bits_horiz = bd + FILTER_BITS + 1 - reduce_bits_horiz;
+  const int offset_bits_horiz = bd + FILTER_BITS - 1;
+  const int offset_bits_vert = bd + 2 * FILTER_BITS - reduce_bits_horiz;
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  (void)max_bits_horiz;
+  assert(IMPLIES(conv_params->is_compound, conv_params->dst != NULL));
+
+  for (int i = p_row; i < p_row + p_height; i += 8) {
+    for (int j = p_col; j < p_col + p_width; j += 8) {
+      // Calculate the center of this 8x8 block,
+      // project to luma coordinates (if in a subsampled chroma plane),
+      // apply the affine transformation,
+      // then convert back to the original coordinates (if necessary)
+      const int32_t src_x = (j + 4) << subsampling_x;
+      const int32_t src_y = (i + 4) << subsampling_y;
+      const int32_t dst_x = mat[2] * src_x + mat[3] * src_y + mat[0];
+      const int32_t dst_y = mat[4] * src_x + mat[5] * src_y + mat[1];
+      const int32_t x4 = dst_x >> subsampling_x;
+      const int32_t y4 = dst_y >> subsampling_y;
+
+      const int32_t ix4 = x4 >> WARPEDMODEL_PREC_BITS;
+      int32_t sx4 = x4 & ((1 << WARPEDMODEL_PREC_BITS) - 1);
+      const int32_t iy4 = y4 >> WARPEDMODEL_PREC_BITS;
+      int32_t sy4 = y4 & ((1 << WARPEDMODEL_PREC_BITS) - 1);
+
+      sx4 += alpha * (-4) + beta * (-4);
+      sy4 += gamma * (-4) + delta * (-4);
+
+      sx4 &= ~((1 << WARP_PARAM_REDUCE_BITS) - 1);
+      sy4 &= ~((1 << WARP_PARAM_REDUCE_BITS) - 1);
+
+      // Horizontal filter
+      for (int k = -7; k < 8; ++k) {
+        const int iy = clamp(iy4 + k, 0, height - 1);
+
+        int sx = sx4 + beta * (k + 4);
+        for (int l = -4; l < 4; ++l) {
+          int ix = ix4 + l - 3;
+          const int offs = ROUND_POWER_OF_TWO(sx, WARPEDDIFF_PREC_BITS) +
+                           WARPEDPIXEL_PREC_SHIFTS;
+          assert(offs >= 0 && offs <= WARPEDPIXEL_PREC_SHIFTS * 3);
+          const int16_t *coeffs = warped_filter[offs];
+
+          int32_t sum = 1 << offset_bits_horiz;
+          for (int m = 0; m < 8; ++m) {
+            const int sample_x = clamp(ix + m, 0, width - 1);
+            sum += ref[iy * stride + sample_x] * coeffs[m];
+          }
+          sum = ROUND_POWER_OF_TWO(sum, reduce_bits_horiz);
+          assert(0 <= sum && sum < (1 << max_bits_horiz));
+          tmp[(k + 7) * 8 + (l + 4)] = sum;
+          sx += alpha;
+        }
+      }
+
+      // Vertical filter
+      for (int k = -4; k < AOMMIN(4, p_row + p_height - i - 4); ++k) {
+        int sy = sy4 + delta * (k + 4);
+        for (int l = -4; l < AOMMIN(4, p_col + p_width - j - 4); ++l) {
+          const int offs = ROUND_POWER_OF_TWO(sy, WARPEDDIFF_PREC_BITS) +
+                           WARPEDPIXEL_PREC_SHIFTS;
+          assert(offs >= 0 && offs <= WARPEDPIXEL_PREC_SHIFTS * 3);
+          const int16_t *coeffs = warped_filter[offs];
+
+          int32_t sum = 1 << offset_bits_vert;
+          for (int m = 0; m < 8; ++m) {
+            sum += tmp[(k + m + 4) * 8 + (l + 4)] * coeffs[m];
+          }
+
+          if (conv_params->is_compound) {
+            CONV_BUF_TYPE *p =
+                &conv_params
+                     ->dst[(i - p_row + k + 4) * conv_params->dst_stride +
+                           (j - p_col + l + 4)];
+            sum = ROUND_POWER_OF_TWO(sum, reduce_bits_vert);
+            if (conv_params->do_average) {
+              uint16_t *dst16 =
+                  &pred[(i - p_row + k + 4) * p_stride + (j - p_col + l + 4)];
+              int32_t tmp32 = *p;
+              if (conv_params->use_dist_wtd_comp_avg) {
+                tmp32 = tmp32 * conv_params->fwd_offset +
+                        sum * conv_params->bck_offset;
+                tmp32 = tmp32 >> DIST_PRECISION_BITS;
+              } else {
+                tmp32 += sum;
+                tmp32 = tmp32 >> 1;
+              }
+              tmp32 = tmp32 - (1 << (offset_bits - conv_params->round_1)) -
+                      (1 << (offset_bits - conv_params->round_1 - 1));
+              *dst16 =
+                  clip_pixel_highbd(ROUND_POWER_OF_TWO(tmp32, round_bits), bd);
+            } else {
+              *p = sum;
+            }
+          } else {
+            uint16_t *p =
+                &pred[(i - p_row + k + 4) * p_stride + (j - p_col + l + 4)];
+            sum = ROUND_POWER_OF_TWO(sum, reduce_bits_vert);
+            assert(0 <= sum && sum < (1 << (bd + 2)));
+            *p = clip_pixel_highbd(sum - (1 << (bd - 1)) - (1 << bd), bd);
+          }
+          sy += gamma;
+        }
+      }
+    }
+  }
+}
+
+static void highbd_warp_plane(WarpedMotionParams *wm, const uint8_t *const ref8,
+                              int width, int height, int stride,
+                              const uint8_t *const pred8, int p_col, int p_row,
+                              int p_width, int p_height, int p_stride,
+                              int subsampling_x, int subsampling_y, int bd,
+                              ConvolveParams *conv_params) {
+  assert(wm->wmtype <= AFFINE);
+  if (wm->wmtype == ROTZOOM) {
+    wm->wmmat[5] = wm->wmmat[2];
+    wm->wmmat[4] = -wm->wmmat[3];
+  }
+  const int32_t *const mat = wm->wmmat;
+  const int16_t alpha = wm->alpha;
+  const int16_t beta = wm->beta;
+  const int16_t gamma = wm->gamma;
+  const int16_t delta = wm->delta;
+
+  const uint16_t *const ref = CONVERT_TO_SHORTPTR(ref8);
+  uint16_t *pred = CONVERT_TO_SHORTPTR(pred8);
+  av1_highbd_warp_affine(mat, ref, width, height, stride, pred, p_col, p_row,
+                         p_width, p_height, p_stride, subsampling_x,
+                         subsampling_y, bd, conv_params, alpha, beta, gamma,
+                         delta);
+}
+
+static int64_t highbd_frame_error(const uint16_t *const ref, int stride,
+                                  const uint16_t *const dst, int p_width,
+                                  int p_height, int p_stride, int bd) {
+  int64_t sum_error = 0;
+  for (int i = 0; i < p_height; ++i) {
+    for (int j = 0; j < p_width; ++j) {
+      sum_error +=
+          highbd_error_measure(dst[j + i * p_stride] - ref[j + i * stride], bd);
+    }
+  }
+  return sum_error;
+}
+
+static int64_t highbd_warp_error(
+    WarpedMotionParams *wm, const uint8_t *const ref8, int width, int height,
+    int stride, const uint8_t *const dst8, int p_col, int p_row, int p_width,
+    int p_height, int p_stride, int subsampling_x, int subsampling_y, int bd,
+    int64_t best_error) {
+  int64_t gm_sumerr = 0;
+  const int error_bsize_w = AOMMIN(p_width, WARP_ERROR_BLOCK);
+  const int error_bsize_h = AOMMIN(p_height, WARP_ERROR_BLOCK);
+  uint16_t tmp[WARP_ERROR_BLOCK * WARP_ERROR_BLOCK];
+
+  ConvolveParams conv_params = get_conv_params(0, 0, bd);
+  conv_params.use_dist_wtd_comp_avg = 0;
+  for (int i = p_row; i < p_row + p_height; i += WARP_ERROR_BLOCK) {
+    for (int j = p_col; j < p_col + p_width; j += WARP_ERROR_BLOCK) {
+      // avoid warping extra 8x8 blocks in the padded region of the frame
+      // when p_width and p_height are not multiples of WARP_ERROR_BLOCK
+      const int warp_w = AOMMIN(error_bsize_w, p_col + p_width - j);
+      const int warp_h = AOMMIN(error_bsize_h, p_row + p_height - i);
+      highbd_warp_plane(wm, ref8, width, height, stride,
+                        CONVERT_TO_BYTEPTR(tmp), j, i, warp_w, warp_h,
+                        WARP_ERROR_BLOCK, subsampling_x, subsampling_y, bd,
+                        &conv_params);
+
+      gm_sumerr += highbd_frame_error(
+          tmp, WARP_ERROR_BLOCK, CONVERT_TO_SHORTPTR(dst8) + j + i * p_stride,
+          warp_w, warp_h, p_stride, bd);
+      if (gm_sumerr > best_error) return gm_sumerr;
+    }
+  }
+  return gm_sumerr;
+}
+
+static INLINE int error_measure(int err) {
+  return error_measure_lut[255 + err];
+}
+
+/* The warp filter for ROTZOOM and AFFINE models works as follows:
+   * Split the input into 8x8 blocks
+   * For each block, project the point (4, 4) within the block, to get the
+     overall block position. Split into integer and fractional coordinates,
+     maintaining full WARPEDMODEL precision
+   * Filter horizontally: Generate 15 rows of 8 pixels each. Each pixel gets a
+     variable horizontal offset. This means that, while the rows of the
+     intermediate buffer align with the rows of the *reference* image, the
+     columns align with the columns of the *destination* image.
+   * Filter vertically: Generate the output block (up to 8x8 pixels, but if the
+     destination is too small we crop the output at this stage). Each pixel has
+     a variable vertical offset, so that the resulting rows are aligned with
+     the rows of the destination image.
+
+   To accomplish these alignments, we factor the warp matrix as a
+   product of two shear / asymmetric zoom matrices:
+   / a b \  = /   1       0    \ * / 1+alpha  beta \
+   \ c d /    \ gamma  1+delta /   \    0      1   /
+   where a, b, c, d are wmmat[2], wmmat[3], wmmat[4], wmmat[5] respectively.
+   The horizontal shear (with alpha and beta) is applied first,
+   then the vertical shear (with gamma and delta) is applied second.
+
+   The only limitation is that, to fit this in a fixed 8-tap filter size,
+   the fractional pixel offsets must be at most +-1. Since the horizontal filter
+   generates 15 rows of 8 columns, and the initial point we project is at (4, 4)
+   within the block, the parameters must satisfy
+   4 * |alpha| + 7 * |beta| <= 1   and   4 * |gamma| + 4 * |delta| <= 1
+   for this filter to be applicable.
+
+   Note: This function assumes that the caller has done all of the relevant
+   checks, ie. that we have a ROTZOOM or AFFINE model, that wm[4] and wm[5]
+   are set appropriately (if using a ROTZOOM model), and that alpha, beta,
+   gamma, delta are all in range.
+
+   TODO(david.barker): Maybe support scaled references?
+*/
+/* A note on hardware implementation:
+    The warp filter is intended to be implementable using the same hardware as
+    the high-precision convolve filters from the loop-restoration and
+    convolve-round experiments.
+
+    For a single filter stage, considering all of the coefficient sets for the
+    warp filter and the regular convolution filter, an input in the range
+    [0, 2^k - 1] is mapped into the range [-56 * (2^k - 1), 184 * (2^k - 1)]
+    before rounding.
+
+    Allowing for some changes to the filter coefficient sets, call the range
+    [-64 * 2^k, 192 * 2^k]. Then, if we initialize the accumulator to 64 * 2^k,
+    we can replace this by the range [0, 256 * 2^k], which can be stored in an
+    unsigned value with 8 + k bits.
+
+    This allows the derivation of the appropriate bit widths and offsets for
+    the various intermediate values: If
+
+    F := FILTER_BITS = 7 (or else the above ranges need adjusting)
+         So a *single* filter stage maps a k-bit input to a (k + F + 1)-bit
+         intermediate value.
+    H := ROUND0_BITS
+    V := VERSHEAR_REDUCE_PREC_BITS
+    (and note that we must have H + V = 2*F for the output to have the same
+     scale as the input)
+
+    then we end up with the following offsets and ranges:
+    Horizontal filter: Apply an offset of 1 << (bd + F - 1), sum fits into a
+                       uint{bd + F + 1}
+    After rounding: The values stored in 'tmp' fit into a uint{bd + F + 1 - H}.
+    Vertical filter: Apply an offset of 1 << (bd + 2*F - H), sum fits into a
+                     uint{bd + 2*F + 2 - H}
+    After rounding: The final value, before undoing the offset, fits into a
+                    uint{bd + 2}.
+
+    Then we need to undo the offsets before clamping to a pixel. Note that,
+    if we do this at the end, the amount to subtract is actually independent
+    of H and V:
+
+    offset to subtract = (1 << ((bd + F - 1) - H + F - V)) +
+                         (1 << ((bd + 2*F - H) - V))
+                      == (1 << (bd - 1)) + (1 << bd)
+
+    This allows us to entirely avoid clamping in both the warp filter and
+    the convolve-round experiment. As of the time of writing, the Wiener filter
+    from loop-restoration can encode a central coefficient up to 216, which
+    leads to a maximum value of about 282 * 2^k after applying the offset.
+    So in that case we still need to clamp.
+*/
+void av1_warp_affine_c(const int32_t *mat, const uint8_t *ref, int width,
+                       int height, int stride, uint8_t *pred, int p_col,
+                       int p_row, int p_width, int p_height, int p_stride,
+                       int subsampling_x, int subsampling_y,
+                       ConvolveParams *conv_params, int16_t alpha, int16_t beta,
+                       int16_t gamma, int16_t delta) {
+  int32_t tmp[15 * 8];
+  const int bd = 8;
+  const int reduce_bits_horiz = conv_params->round_0;
+  const int reduce_bits_vert = conv_params->is_compound
+                                   ? conv_params->round_1
+                                   : 2 * FILTER_BITS - reduce_bits_horiz;
+  const int max_bits_horiz = bd + FILTER_BITS + 1 - reduce_bits_horiz;
+  const int offset_bits_horiz = bd + FILTER_BITS - 1;
+  const int offset_bits_vert = bd + 2 * FILTER_BITS - reduce_bits_horiz;
+  const int round_bits =
+      2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+  const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+  (void)max_bits_horiz;
+  assert(IMPLIES(conv_params->is_compound, conv_params->dst != NULL));
+  assert(IMPLIES(conv_params->do_average, conv_params->is_compound));
+
+  for (int i = p_row; i < p_row + p_height; i += 8) {
+    for (int j = p_col; j < p_col + p_width; j += 8) {
+      // Calculate the center of this 8x8 block,
+      // project to luma coordinates (if in a subsampled chroma plane),
+      // apply the affine transformation,
+      // then convert back to the original coordinates (if necessary)
+      const int32_t src_x = (j + 4) << subsampling_x;
+      const int32_t src_y = (i + 4) << subsampling_y;
+      const int32_t dst_x = mat[2] * src_x + mat[3] * src_y + mat[0];
+      const int32_t dst_y = mat[4] * src_x + mat[5] * src_y + mat[1];
+      const int32_t x4 = dst_x >> subsampling_x;
+      const int32_t y4 = dst_y >> subsampling_y;
+
+      int32_t ix4 = x4 >> WARPEDMODEL_PREC_BITS;
+      int32_t sx4 = x4 & ((1 << WARPEDMODEL_PREC_BITS) - 1);
+      int32_t iy4 = y4 >> WARPEDMODEL_PREC_BITS;
+      int32_t sy4 = y4 & ((1 << WARPEDMODEL_PREC_BITS) - 1);
+
+      sx4 += alpha * (-4) + beta * (-4);
+      sy4 += gamma * (-4) + delta * (-4);
+
+      sx4 &= ~((1 << WARP_PARAM_REDUCE_BITS) - 1);
+      sy4 &= ~((1 << WARP_PARAM_REDUCE_BITS) - 1);
+
+      // Horizontal filter
+      for (int k = -7; k < 8; ++k) {
+        // Clamp to top/bottom edge of the frame
+        const int iy = clamp(iy4 + k, 0, height - 1);
+
+        int sx = sx4 + beta * (k + 4);
+
+        for (int l = -4; l < 4; ++l) {
+          int ix = ix4 + l - 3;
+          // At this point, sx = sx4 + alpha * l + beta * k
+          const int offs = ROUND_POWER_OF_TWO(sx, WARPEDDIFF_PREC_BITS) +
+                           WARPEDPIXEL_PREC_SHIFTS;
+          assert(offs >= 0 && offs <= WARPEDPIXEL_PREC_SHIFTS * 3);
+          const int16_t *coeffs = warped_filter[offs];
+
+          int32_t sum = 1 << offset_bits_horiz;
+          for (int m = 0; m < 8; ++m) {
+            // Clamp to left/right edge of the frame
+            const int sample_x = clamp(ix + m, 0, width - 1);
+
+            sum += ref[iy * stride + sample_x] * coeffs[m];
+          }
+          sum = ROUND_POWER_OF_TWO(sum, reduce_bits_horiz);
+          assert(0 <= sum && sum < (1 << max_bits_horiz));
+          tmp[(k + 7) * 8 + (l + 4)] = sum;
+          sx += alpha;
+        }
+      }
+
+      // Vertical filter
+      for (int k = -4; k < AOMMIN(4, p_row + p_height - i - 4); ++k) {
+        int sy = sy4 + delta * (k + 4);
+        for (int l = -4; l < AOMMIN(4, p_col + p_width - j - 4); ++l) {
+          // At this point, sy = sy4 + gamma * l + delta * k
+          const int offs = ROUND_POWER_OF_TWO(sy, WARPEDDIFF_PREC_BITS) +
+                           WARPEDPIXEL_PREC_SHIFTS;
+          assert(offs >= 0 && offs <= WARPEDPIXEL_PREC_SHIFTS * 3);
+          const int16_t *coeffs = warped_filter[offs];
+
+          int32_t sum = 1 << offset_bits_vert;
+          for (int m = 0; m < 8; ++m) {
+            sum += tmp[(k + m + 4) * 8 + (l + 4)] * coeffs[m];
+          }
+
+          if (conv_params->is_compound) {
+            CONV_BUF_TYPE *p =
+                &conv_params
+                     ->dst[(i - p_row + k + 4) * conv_params->dst_stride +
+                           (j - p_col + l + 4)];
+            sum = ROUND_POWER_OF_TWO(sum, reduce_bits_vert);
+            if (conv_params->do_average) {
+              uint8_t *dst8 =
+                  &pred[(i - p_row + k + 4) * p_stride + (j - p_col + l + 4)];
+              int32_t tmp32 = *p;
+              if (conv_params->use_dist_wtd_comp_avg) {
+                tmp32 = tmp32 * conv_params->fwd_offset +
+                        sum * conv_params->bck_offset;
+                tmp32 = tmp32 >> DIST_PRECISION_BITS;
+              } else {
+                tmp32 += sum;
+                tmp32 = tmp32 >> 1;
+              }
+              tmp32 = tmp32 - (1 << (offset_bits - conv_params->round_1)) -
+                      (1 << (offset_bits - conv_params->round_1 - 1));
+              *dst8 = clip_pixel(ROUND_POWER_OF_TWO(tmp32, round_bits));
+            } else {
+              *p = sum;
+            }
+          } else {
+            uint8_t *p =
+                &pred[(i - p_row + k + 4) * p_stride + (j - p_col + l + 4)];
+            sum = ROUND_POWER_OF_TWO(sum, reduce_bits_vert);
+            assert(0 <= sum && sum < (1 << (bd + 2)));
+            *p = clip_pixel(sum - (1 << (bd - 1)) - (1 << bd));
+          }
+          sy += gamma;
+        }
+      }
+    }
+  }
+}
+
+static void warp_plane(WarpedMotionParams *wm, const uint8_t *const ref,
+                       int width, int height, int stride, uint8_t *pred,
+                       int p_col, int p_row, int p_width, int p_height,
+                       int p_stride, int subsampling_x, int subsampling_y,
+                       ConvolveParams *conv_params) {
+  assert(wm->wmtype <= AFFINE);
+  if (wm->wmtype == ROTZOOM) {
+    wm->wmmat[5] = wm->wmmat[2];
+    wm->wmmat[4] = -wm->wmmat[3];
+  }
+  const int32_t *const mat = wm->wmmat;
+  const int16_t alpha = wm->alpha;
+  const int16_t beta = wm->beta;
+  const int16_t gamma = wm->gamma;
+  const int16_t delta = wm->delta;
+  av1_warp_affine(mat, ref, width, height, stride, pred, p_col, p_row, p_width,
+                  p_height, p_stride, subsampling_x, subsampling_y, conv_params,
+                  alpha, beta, gamma, delta);
+}
+
+static int64_t frame_error(const uint8_t *const ref, int stride,
+                           const uint8_t *const dst, int p_width, int p_height,
+                           int p_stride) {
+  int64_t sum_error = 0;
+  for (int i = 0; i < p_height; ++i) {
+    for (int j = 0; j < p_width; ++j) {
+      sum_error +=
+          (int64_t)error_measure(dst[j + i * p_stride] - ref[j + i * stride]);
+    }
+  }
+  return sum_error;
+}
+
+static int64_t warp_error(WarpedMotionParams *wm, const uint8_t *const ref,
+                          int width, int height, int stride,
+                          const uint8_t *const dst, int p_col, int p_row,
+                          int p_width, int p_height, int p_stride,
+                          int subsampling_x, int subsampling_y,
+                          int64_t best_error) {
+  int64_t gm_sumerr = 0;
+  int warp_w, warp_h;
+  int error_bsize_w = AOMMIN(p_width, WARP_ERROR_BLOCK);
+  int error_bsize_h = AOMMIN(p_height, WARP_ERROR_BLOCK);
+  uint8_t tmp[WARP_ERROR_BLOCK * WARP_ERROR_BLOCK];
+  ConvolveParams conv_params = get_conv_params(0, 0, 8);
+  conv_params.use_dist_wtd_comp_avg = 0;
+
+  for (int i = p_row; i < p_row + p_height; i += WARP_ERROR_BLOCK) {
+    for (int j = p_col; j < p_col + p_width; j += WARP_ERROR_BLOCK) {
+      // avoid warping extra 8x8 blocks in the padded region of the frame
+      // when p_width and p_height are not multiples of WARP_ERROR_BLOCK
+      warp_w = AOMMIN(error_bsize_w, p_col + p_width - j);
+      warp_h = AOMMIN(error_bsize_h, p_row + p_height - i);
+      warp_plane(wm, ref, width, height, stride, tmp, j, i, warp_w, warp_h,
+                 WARP_ERROR_BLOCK, subsampling_x, subsampling_y, &conv_params);
+
+      gm_sumerr += frame_error(tmp, WARP_ERROR_BLOCK, dst + j + i * p_stride,
+                               warp_w, warp_h, p_stride);
+      if (gm_sumerr > best_error) return gm_sumerr;
+    }
+  }
+  return gm_sumerr;
+}
+
+int64_t av1_frame_error(int use_hbd, int bd, const uint8_t *ref, int stride,
+                        uint8_t *dst, int p_width, int p_height, int p_stride) {
+  if (use_hbd) {
+    return highbd_frame_error(CONVERT_TO_SHORTPTR(ref), stride,
+                              CONVERT_TO_SHORTPTR(dst), p_width, p_height,
+                              p_stride, bd);
+  }
+  return frame_error(ref, stride, dst, p_width, p_height, p_stride);
+}
+
+int64_t av1_warp_error(WarpedMotionParams *wm, int use_hbd, int bd,
+                       const uint8_t *ref, int width, int height, int stride,
+                       uint8_t *dst, int p_col, int p_row, int p_width,
+                       int p_height, int p_stride, int subsampling_x,
+                       int subsampling_y, int64_t best_error) {
+  if (wm->wmtype <= AFFINE)
+    if (!get_shear_params(wm)) return 1;
+  if (use_hbd)
+    return highbd_warp_error(wm, ref, width, height, stride, dst, p_col, p_row,
+                             p_width, p_height, p_stride, subsampling_x,
+                             subsampling_y, bd, best_error);
+  return warp_error(wm, ref, width, height, stride, dst, p_col, p_row, p_width,
+                    p_height, p_stride, subsampling_x, subsampling_y,
+                    best_error);
+}
+
+void av1_warp_plane(WarpedMotionParams *wm, int use_hbd, int bd,
+                    const uint8_t *ref, int width, int height, int stride,
+                    uint8_t *pred, int p_col, int p_row, int p_width,
+                    int p_height, int p_stride, int subsampling_x,
+                    int subsampling_y, ConvolveParams *conv_params) {
+  if (use_hbd)
+    highbd_warp_plane(wm, ref, width, height, stride, pred, p_col, p_row,
+                      p_width, p_height, p_stride, subsampling_x, subsampling_y,
+                      bd, conv_params);
+  else
+    warp_plane(wm, ref, width, height, stride, pred, p_col, p_row, p_width,
+               p_height, p_stride, subsampling_x, subsampling_y, conv_params);
+}
+
+#define LS_MV_MAX 256  // max mv in 1/8-pel
+// Use LS_STEP = 8 so that 2 less bits needed for A, Bx, By.
+#define LS_STEP 8
+
+// Assuming LS_MV_MAX is < MAX_SB_SIZE * 8,
+// the precision needed is:
+//   (MAX_SB_SIZE_LOG2 + 3) [for sx * sx magnitude] +
+//   (MAX_SB_SIZE_LOG2 + 4) [for sx * dx magnitude] +
+//   1 [for sign] +
+//   LEAST_SQUARES_SAMPLES_MAX_BITS
+//        [for adding up to LEAST_SQUARES_SAMPLES_MAX samples]
+// The value is 23
+#define LS_MAT_RANGE_BITS \
+  ((MAX_SB_SIZE_LOG2 + 4) * 2 + LEAST_SQUARES_SAMPLES_MAX_BITS)
+
+// Bit-depth reduction from the full-range
+#define LS_MAT_DOWN_BITS 2
+
+// bits range of A, Bx and By after downshifting
+#define LS_MAT_BITS (LS_MAT_RANGE_BITS - LS_MAT_DOWN_BITS)
+#define LS_MAT_MIN (-(1 << (LS_MAT_BITS - 1)))
+#define LS_MAT_MAX ((1 << (LS_MAT_BITS - 1)) - 1)
+
+// By setting LS_STEP = 8, the least 2 bits of every elements in A, Bx, By are
+// 0. So, we can reduce LS_MAT_RANGE_BITS(2) bits here.
+#define LS_SQUARE(a)                                          \
+  (((a) * (a)*4 + (a)*4 * LS_STEP + LS_STEP * LS_STEP * 2) >> \
+   (2 + LS_MAT_DOWN_BITS))
+#define LS_PRODUCT1(a, b)                                           \
+  (((a) * (b)*4 + ((a) + (b)) * 2 * LS_STEP + LS_STEP * LS_STEP) >> \
+   (2 + LS_MAT_DOWN_BITS))
+#define LS_PRODUCT2(a, b)                                               \
+  (((a) * (b)*4 + ((a) + (b)) * 2 * LS_STEP + LS_STEP * LS_STEP * 2) >> \
+   (2 + LS_MAT_DOWN_BITS))
+
+#define USE_LIMITED_PREC_MULT 0
+
+#if USE_LIMITED_PREC_MULT
+
+#define MUL_PREC_BITS 16
+static uint16_t resolve_multiplier_64(uint64_t D, int16_t *shift) {
+  int msb = 0;
+  uint16_t mult = 0;
+  *shift = 0;
+  if (D != 0) {
+    msb = (int16_t)((D >> 32) ? get_msb((unsigned int)(D >> 32)) + 32
+                              : get_msb((unsigned int)D));
+    if (msb >= MUL_PREC_BITS) {
+      mult = (uint16_t)ROUND_POWER_OF_TWO_64(D, msb + 1 - MUL_PREC_BITS);
+      *shift = msb + 1 - MUL_PREC_BITS;
+    } else {
+      mult = (uint16_t)D;
+      *shift = 0;
+    }
+  }
+  return mult;
+}
+
+static int32_t get_mult_shift_ndiag(int64_t Px, int16_t iDet, int shift) {
+  int32_t ret;
+  int16_t mshift;
+  uint16_t Mul = resolve_multiplier_64(llabs(Px), &mshift);
+  int32_t v = (int32_t)Mul * (int32_t)iDet * (Px < 0 ? -1 : 1);
+  shift -= mshift;
+  if (shift > 0) {
+    return (int32_t)clamp(ROUND_POWER_OF_TWO_SIGNED(v, shift),
+                          -WARPEDMODEL_NONDIAGAFFINE_CLAMP + 1,
+                          WARPEDMODEL_NONDIAGAFFINE_CLAMP - 1);
+  } else {
+    return (int32_t)clamp(v * (1 << (-shift)),
+                          -WARPEDMODEL_NONDIAGAFFINE_CLAMP + 1,
+                          WARPEDMODEL_NONDIAGAFFINE_CLAMP - 1);
+  }
+  return ret;
+}
+
+static int32_t get_mult_shift_diag(int64_t Px, int16_t iDet, int shift) {
+  int16_t mshift;
+  uint16_t Mul = resolve_multiplier_64(llabs(Px), &mshift);
+  int32_t v = (int32_t)Mul * (int32_t)iDet * (Px < 0 ? -1 : 1);
+  shift -= mshift;
+  if (shift > 0) {
+    return (int32_t)clamp(
+        ROUND_POWER_OF_TWO_SIGNED(v, shift),
+        (1 << WARPEDMODEL_PREC_BITS) - WARPEDMODEL_NONDIAGAFFINE_CLAMP + 1,
+        (1 << WARPEDMODEL_PREC_BITS) + WARPEDMODEL_NONDIAGAFFINE_CLAMP - 1);
+  } else {
+    return (int32_t)clamp(
+        v * (1 << (-shift)),
+        (1 << WARPEDMODEL_PREC_BITS) - WARPEDMODEL_NONDIAGAFFINE_CLAMP + 1,
+        (1 << WARPEDMODEL_PREC_BITS) + WARPEDMODEL_NONDIAGAFFINE_CLAMP - 1);
+  }
+}
+
+#else
+
+static int32_t get_mult_shift_ndiag(int64_t Px, int16_t iDet, int shift) {
+  int64_t v = Px * (int64_t)iDet;
+  return (int32_t)clamp64(ROUND_POWER_OF_TWO_SIGNED_64(v, shift),
+                          -WARPEDMODEL_NONDIAGAFFINE_CLAMP + 1,
+                          WARPEDMODEL_NONDIAGAFFINE_CLAMP - 1);
+}
+
+static int32_t get_mult_shift_diag(int64_t Px, int16_t iDet, int shift) {
+  int64_t v = Px * (int64_t)iDet;
+  return (int32_t)clamp64(
+      ROUND_POWER_OF_TWO_SIGNED_64(v, shift),
+      (1 << WARPEDMODEL_PREC_BITS) - WARPEDMODEL_NONDIAGAFFINE_CLAMP + 1,
+      (1 << WARPEDMODEL_PREC_BITS) + WARPEDMODEL_NONDIAGAFFINE_CLAMP - 1);
+}
+#endif  // USE_LIMITED_PREC_MULT
+
+static int find_affine_int(int np, const int *pts1, const int *pts2,
+                           BLOCK_SIZE bsize, int mvy, int mvx,
+                           WarpedMotionParams *wm, int mi_row, int mi_col) {
+  int32_t A[2][2] = { { 0, 0 }, { 0, 0 } };
+  int32_t Bx[2] = { 0, 0 };
+  int32_t By[2] = { 0, 0 };
+  int i;
+
+  const int bw = block_size_wide[bsize];
+  const int bh = block_size_high[bsize];
+  const int rsuy = (AOMMAX(bh, MI_SIZE) / 2 - 1);
+  const int rsux = (AOMMAX(bw, MI_SIZE) / 2 - 1);
+  const int suy = rsuy * 8;
+  const int sux = rsux * 8;
+  const int duy = suy + mvy;
+  const int dux = sux + mvx;
+  const int isuy = (mi_row * MI_SIZE + rsuy);
+  const int isux = (mi_col * MI_SIZE + rsux);
+
+  // Assume the center pixel of the block has exactly the same motion vector
+  // as transmitted for the block. First shift the origin of the source
+  // points to the block center, and the origin of the destination points to
+  // the block center added to the motion vector transmitted.
+  // Let (xi, yi) denote the source points and (xi', yi') denote destination
+  // points after origin shfifting, for i = 0, 1, 2, .... n-1.
+  // Then if  P = [x0, y0,
+  //               x1, y1
+  //               x2, y1,
+  //                ....
+  //              ]
+  //          q = [x0', x1', x2', ... ]'
+  //          r = [y0', y1', y2', ... ]'
+  // the least squares problems that need to be solved are:
+  //          [h1, h2]' = inv(P'P)P'q and
+  //          [h3, h4]' = inv(P'P)P'r
+  // where the affine transformation is given by:
+  //          x' = h1.x + h2.y
+  //          y' = h3.x + h4.y
+  //
+  // The loop below computes: A = P'P, Bx = P'q, By = P'r
+  // We need to just compute inv(A).Bx and inv(A).By for the solutions.
+  // Contribution from neighbor block
+  for (i = 0; i < np; i++) {
+    const int dx = pts2[i * 2] - dux;
+    const int dy = pts2[i * 2 + 1] - duy;
+    const int sx = pts1[i * 2] - sux;
+    const int sy = pts1[i * 2 + 1] - suy;
+    // (TODO)yunqing: This comparison wouldn't be necessary if the sample
+    // selection is done in find_samples(). Also, global offset can be removed
+    // while collecting samples.
+    if (abs(sx - dx) < LS_MV_MAX && abs(sy - dy) < LS_MV_MAX) {
+      A[0][0] += LS_SQUARE(sx);
+      A[0][1] += LS_PRODUCT1(sx, sy);
+      A[1][1] += LS_SQUARE(sy);
+      Bx[0] += LS_PRODUCT2(sx, dx);
+      Bx[1] += LS_PRODUCT1(sy, dx);
+      By[0] += LS_PRODUCT1(sx, dy);
+      By[1] += LS_PRODUCT2(sy, dy);
+    }
+  }
+
+  // Just for debugging, and can be removed later.
+  assert(A[0][0] >= LS_MAT_MIN && A[0][0] <= LS_MAT_MAX);
+  assert(A[0][1] >= LS_MAT_MIN && A[0][1] <= LS_MAT_MAX);
+  assert(A[1][1] >= LS_MAT_MIN && A[1][1] <= LS_MAT_MAX);
+  assert(Bx[0] >= LS_MAT_MIN && Bx[0] <= LS_MAT_MAX);
+  assert(Bx[1] >= LS_MAT_MIN && Bx[1] <= LS_MAT_MAX);
+  assert(By[0] >= LS_MAT_MIN && By[0] <= LS_MAT_MAX);
+  assert(By[1] >= LS_MAT_MIN && By[1] <= LS_MAT_MAX);
+
+  int64_t Det;
+  int16_t iDet, shift;
+
+  // Compute Determinant of A
+  Det = (int64_t)A[0][0] * A[1][1] - (int64_t)A[0][1] * A[0][1];
+  if (Det == 0) return 1;
+  iDet = resolve_divisor_64(llabs(Det), &shift) * (Det < 0 ? -1 : 1);
+  shift -= WARPEDMODEL_PREC_BITS;
+  if (shift < 0) {
+    iDet <<= (-shift);
+    shift = 0;
+  }
+
+  int64_t Px[2], Py[2];
+
+  // These divided by the Det, are the least squares solutions
+  Px[0] = (int64_t)A[1][1] * Bx[0] - (int64_t)A[0][1] * Bx[1];
+  Px[1] = -(int64_t)A[0][1] * Bx[0] + (int64_t)A[0][0] * Bx[1];
+  Py[0] = (int64_t)A[1][1] * By[0] - (int64_t)A[0][1] * By[1];
+  Py[1] = -(int64_t)A[0][1] * By[0] + (int64_t)A[0][0] * By[1];
+
+  wm->wmmat[2] = get_mult_shift_diag(Px[0], iDet, shift);
+  wm->wmmat[3] = get_mult_shift_ndiag(Px[1], iDet, shift);
+  wm->wmmat[4] = get_mult_shift_ndiag(Py[0], iDet, shift);
+  wm->wmmat[5] = get_mult_shift_diag(Py[1], iDet, shift);
+
+  // Note: In the vx, vy expressions below, the max value of each of the
+  // 2nd and 3rd terms are (2^16 - 1) * (2^13 - 1). That leaves enough room
+  // for the first term so that the overall sum in the worst case fits
+  // within 32 bits overall.
+  int32_t vx = mvx * (1 << (WARPEDMODEL_PREC_BITS - 3)) -
+               (isux * (wm->wmmat[2] - (1 << WARPEDMODEL_PREC_BITS)) +
+                isuy * wm->wmmat[3]);
+  int32_t vy = mvy * (1 << (WARPEDMODEL_PREC_BITS - 3)) -
+               (isux * wm->wmmat[4] +
+                isuy * (wm->wmmat[5] - (1 << WARPEDMODEL_PREC_BITS)));
+  wm->wmmat[0] =
+      clamp(vx, -WARPEDMODEL_TRANS_CLAMP, WARPEDMODEL_TRANS_CLAMP - 1);
+  wm->wmmat[1] =
+      clamp(vy, -WARPEDMODEL_TRANS_CLAMP, WARPEDMODEL_TRANS_CLAMP - 1);
+
+  wm->wmmat[6] = wm->wmmat[7] = 0;
+  return 0;
+}
+
+int find_projection(int np, int *pts1, int *pts2, BLOCK_SIZE bsize, int mvy,
+                    int mvx, WarpedMotionParams *wm_params, int mi_row,
+                    int mi_col) {
+  assert(wm_params->wmtype == AFFINE);
+
+  if (find_affine_int(np, pts1, pts2, bsize, mvy, mvx, wm_params, mi_row,
+                      mi_col))
+    return 1;
+
+  // check compatibility with the fast warp filter
+  if (!get_shear_params(wm_params)) return 1;
+
+  return 0;
+}

diff --git a/libav1/av1/common/warped_motion.h b/libav1/av1/common/warped_motion.h
new file mode 100644
index 0000000..b5d5665
--- /dev/null
+++ b/libav1/av1/common/warped_motion.h

@@ -0,0 +1,96 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_WARPED_MOTION_H_
+#define AOM_AV1_COMMON_WARPED_MOTION_H_
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <memory.h>
+#include <math.h>
+#include <assert.h>
+
+#include "config/aom_config.h"
+
+#include "aom_ports/mem.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "av1/common/mv.h"
+#include "av1/common/convolve.h"
+
+#define MAX_PARAMDIM 9
+#define LEAST_SQUARES_SAMPLES_MAX_BITS 3
+#define LEAST_SQUARES_SAMPLES_MAX (1 << LEAST_SQUARES_SAMPLES_MAX_BITS)
+#define SAMPLES_ARRAY_SIZE (LEAST_SQUARES_SAMPLES_MAX * 2)
+#define WARPED_MOTION_DEBUG 0
+#define DEFAULT_WMTYPE AFFINE
+
+extern const int16_t warped_filter[WARPEDPIXEL_PREC_SHIFTS * 3 + 1][8];
+
+static const uint8_t warp_pad_left[14][16] = {
+  { 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
+  { 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
+  { 3, 3, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
+  { 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
+  { 5, 5, 5, 5, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
+  { 6, 6, 6, 6, 6, 6, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
+  { 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
+  { 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 10, 11, 12, 13, 14, 15 },
+  { 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 11, 12, 13, 14, 15 },
+  { 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 12, 13, 14, 15 },
+  { 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 13, 14, 15 },
+  { 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 14, 15 },
+  { 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 15 },
+  { 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15 },
+};
+
+static const uint8_t warp_pad_right[14][16] = {
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14 },
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 13, 13 },
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 12, 12 },
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 11, 11, 11 },
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 10, 10 },
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 9, 9 },
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8, 8, 8, 8 },
+  { 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7 },
+  { 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6 },
+  { 0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5 },
+  { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4 },
+  { 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 },
+  { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 },
+  { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
+};
+
+
+// Returns the error between the result of applying motion 'wm' to the frame
+// described by 'ref' and the frame described by 'dst'.
+int64_t av1_warp_error(WarpedMotionParams *wm, int use_hbd, int bd,
+                       const uint8_t *ref, int width, int height, int stride,
+                       uint8_t *dst, int p_col, int p_row, int p_width,
+                       int p_height, int p_stride, int subsampling_x,
+                       int subsampling_y, int64_t best_error);
+
+// Returns the error between the frame described by 'ref' and the frame
+// described by 'dst'.
+int64_t av1_frame_error(int use_hbd, int bd, const uint8_t *ref, int stride,
+                        uint8_t *dst, int p_width, int p_height, int p_stride);
+
+void av1_warp_plane(WarpedMotionParams *wm, int use_hbd, int bd,
+                    const uint8_t *ref, int width, int height, int stride,
+                    uint8_t *pred, int p_col, int p_row, int p_width,
+                    int p_height, int p_stride, int subsampling_x,
+                    int subsampling_y, ConvolveParams *conv_params);
+
+int find_projection(int np, int *pts1, int *pts2, BLOCK_SIZE bsize, int mvy,
+                    int mvx, WarpedMotionParams *wm_params, int mi_row,
+                    int mi_col);
+
+int get_shear_params(WarpedMotionParams *wm);
+#endif  // AOM_AV1_COMMON_WARPED_MOTION_H_

diff --git a/libav1/av1/decoder/accounting.c b/libav1/av1/decoder/accounting.c
new file mode 100644
index 0000000..8d8f3df
--- /dev/null
+++ b/libav1/av1/decoder/accounting.c

@@ -0,0 +1,138 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "aom/aom_integer.h"
+#include "av1/decoder/accounting.h"
+
+static int aom_accounting_hash(const char *str) {
+  uint32_t val;
+  const unsigned char *ustr;
+  val = 0;
+  ustr = (const unsigned char *)str;
+  /* This is about the worst hash one can design, but it should be good enough
+     here. */
+  while (*ustr) val += *ustr++;
+  return val % AOM_ACCOUNTING_HASH_SIZE;
+}
+
+/* Dictionary lookup based on an open-addressing hash table. */
+int aom_accounting_dictionary_lookup(Accounting *accounting, const char *str) {
+  int hash;
+  size_t len;
+  AccountingDictionary *dictionary;
+  dictionary = &accounting->syms.dictionary;
+  hash = aom_accounting_hash(str);
+  while (accounting->hash_dictionary[hash] != -1) {
+    if (strcmp(dictionary->strs[accounting->hash_dictionary[hash]], str) == 0) {
+      return accounting->hash_dictionary[hash];
+    }
+    hash++;
+    if (hash == AOM_ACCOUNTING_HASH_SIZE) hash = 0;
+  }
+  /* No match found. */
+  assert(dictionary->num_strs + 1 < MAX_SYMBOL_TYPES);
+  accounting->hash_dictionary[hash] = dictionary->num_strs;
+  len = strlen(str);
+  dictionary->strs[dictionary->num_strs] = malloc(len + 1);
+  snprintf(dictionary->strs[dictionary->num_strs], len + 1, "%s", str);
+  dictionary->num_strs++;
+  return dictionary->num_strs - 1;
+}
+
+void aom_accounting_init(Accounting *accounting) {
+  int i;
+  accounting->num_syms_allocated = 1000;
+  accounting->syms.syms =
+      malloc(sizeof(AccountingSymbol) * accounting->num_syms_allocated);
+  accounting->syms.dictionary.num_strs = 0;
+  assert(AOM_ACCOUNTING_HASH_SIZE > 2 * MAX_SYMBOL_TYPES);
+  for (i = 0; i < AOM_ACCOUNTING_HASH_SIZE; i++)
+    accounting->hash_dictionary[i] = -1;
+  aom_accounting_reset(accounting);
+}
+
+void aom_accounting_reset(Accounting *accounting) {
+  accounting->syms.num_syms = 0;
+  accounting->syms.num_binary_syms = 0;
+  accounting->syms.num_multi_syms = 0;
+  accounting->context.x = -1;
+  accounting->context.y = -1;
+  accounting->last_tell_frac = 0;
+}
+
+void aom_accounting_clear(Accounting *accounting) {
+  int i;
+  AccountingDictionary *dictionary;
+  free(accounting->syms.syms);
+  dictionary = &accounting->syms.dictionary;
+  for (i = 0; i < dictionary->num_strs; i++) {
+    free(dictionary->strs[i]);
+  }
+}
+
+void aom_accounting_set_context(Accounting *accounting, int16_t x, int16_t y) {
+  accounting->context.x = x;
+  accounting->context.y = y;
+}
+
+void aom_accounting_record(Accounting *accounting, const char *str,
+                           uint32_t bits) {
+  AccountingSymbol sym;
+  // Reuse previous symbol if it has the same context and symbol id.
+  if (accounting->syms.num_syms) {
+    AccountingSymbol *last_sym;
+    last_sym = &accounting->syms.syms[accounting->syms.num_syms - 1];
+    if (memcmp(&last_sym->context, &accounting->context,
+               sizeof(AccountingSymbolContext)) == 0) {
+      uint32_t id;
+      id = aom_accounting_dictionary_lookup(accounting, str);
+      if (id == last_sym->id) {
+        last_sym->bits += bits;
+        last_sym->samples++;
+        return;
+      }
+    }
+  }
+  sym.context = accounting->context;
+  sym.samples = 1;
+  sym.bits = bits;
+  sym.id = aom_accounting_dictionary_lookup(accounting, str);
+  assert(sym.id <= 255);
+  if (accounting->syms.num_syms == accounting->num_syms_allocated) {
+    accounting->num_syms_allocated *= 2;
+    accounting->syms.syms =
+        realloc(accounting->syms.syms,
+                sizeof(AccountingSymbol) * accounting->num_syms_allocated);
+    assert(accounting->syms.syms != NULL);
+  }
+  accounting->syms.syms[accounting->syms.num_syms++] = sym;
+}
+
+void aom_accounting_dump(Accounting *accounting) {
+  int i;
+  AccountingSymbol *sym;
+  printf("\n----- Number of recorded syntax elements = %d -----\n",
+         accounting->syms.num_syms);
+  printf("----- Total number of symbol calls = %d (%d binary) -----\n",
+         accounting->syms.num_multi_syms + accounting->syms.num_binary_syms,
+         accounting->syms.num_binary_syms);
+  for (i = 0; i < accounting->syms.num_syms; i++) {
+    sym = &accounting->syms.syms[i];
+    printf("%s x: %d, y: %d bits: %f samples: %d\n",
+           accounting->syms.dictionary.strs[sym->id], sym->context.x,
+           sym->context.y, (float)sym->bits / 8.0, sym->samples);
+  }
+}

diff --git a/libav1/av1/decoder/accounting.h b/libav1/av1/decoder/accounting.h
new file mode 100644
index 0000000..288e5e6
--- /dev/null
+++ b/libav1/av1/decoder/accounting.h

@@ -0,0 +1,82 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AV1_DECODER_ACCOUNTING_H_
+#define AOM_AV1_DECODER_ACCOUNTING_H_
+#include <stdlib.h>
+#include "aom/aomdx.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif  // __cplusplus
+
+#define AOM_ACCOUNTING_HASH_SIZE (1021)
+
+/* Max number of entries for symbol types in the dictionary (increase as
+   necessary). */
+#define MAX_SYMBOL_TYPES (256)
+
+/*The resolution of fractional-precision bit usage measurements, i.e.,
+   3 => 1/8th bits.*/
+#define AOM_ACCT_BITRES (3)
+
+typedef struct {
+  int16_t x;
+  int16_t y;
+} AccountingSymbolContext;
+
+typedef struct {
+  AccountingSymbolContext context;
+  uint32_t id;
+  /** Number of bits in units of 1/8 bit. */
+  uint32_t bits;
+  uint32_t samples;
+} AccountingSymbol;
+
+/** Dictionary for translating strings into id. */
+typedef struct {
+  char *(strs[MAX_SYMBOL_TYPES]);
+  int num_strs;
+} AccountingDictionary;
+
+typedef struct {
+  /** All recorded symbols decoded. */
+  AccountingSymbol *syms;
+  /** Number of syntax actually recorded. */
+  int num_syms;
+  /** Raw symbol decoding calls for non-binary values. */
+  int num_multi_syms;
+  /** Raw binary symbol decoding calls. */
+  int num_binary_syms;
+  /** Dictionary for translating strings into id. */
+  AccountingDictionary dictionary;
+} AccountingSymbols;
+
+struct Accounting {
+  AccountingSymbols syms;
+  /** Size allocated for symbols (not all may be used). */
+  int num_syms_allocated;
+  int16_t hash_dictionary[AOM_ACCOUNTING_HASH_SIZE];
+  AccountingSymbolContext context;
+  uint32_t last_tell_frac;
+};
+
+void aom_accounting_init(Accounting *accounting);
+void aom_accounting_reset(Accounting *accounting);
+void aom_accounting_clear(Accounting *accounting);
+void aom_accounting_set_context(Accounting *accounting, int16_t x, int16_t y);
+int aom_accounting_dictionary_lookup(Accounting *accounting, const char *str);
+void aom_accounting_record(Accounting *accounting, const char *str,
+                           uint32_t bits);
+void aom_accounting_dump(Accounting *accounting);
+#ifdef __cplusplus
+}  // extern "C"
+#endif  // __cplusplus
+#endif  // AOM_AV1_DECODER_ACCOUNTING_H_

diff --git a/libav1/av1/decoder/decodeframe.c b/libav1/av1/decoder/decodeframe.c
new file mode 100644
index 0000000..9020d28
--- /dev/null
+++ b/libav1/av1/decoder/decodeframe.c

@@ -0,0 +1,4229 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <stddef.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+#include "config/aom_scale_rtcd.h"
+#include "config/av1_rtcd.h"
+
+#include "aom/aom_codec.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/binary_codes_reader.h"
+#include "aom_dsp/bitreader.h"
+#include "aom_dsp/bitreader_buffer.h"
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/aom_timer.h"
+#include "aom_ports/mem.h"
+#include "aom_ports/mem_ops.h"
+#include "aom_scale/aom_scale.h"
+#include "aom_util/aom_thread.h"
+
+#if CONFIG_BITSTREAM_DEBUG || CONFIG_MISMATCH_DEBUG
+#include "aom_util/debug_util.h"
+#endif  // CONFIG_BITSTREAM_DEBUG || CONFIG_MISMATCH_DEBUG
+
+#include "av1/common/alloccommon.h"
+#include "av1/common/cdef.h"
+#include "av1/common/cfl.h"
+#if CONFIG_INSPECTION
+#include "av1/decoder/inspection.h"
+#endif
+#include "av1/common/common.h"
+#include "av1/common/entropy.h"
+#include "av1/common/entropymode.h"
+#include "av1/common/entropymv.h"
+#include "av1/common/frame_buffers.h"
+#include "av1/common/idct.h"
+#include "av1/common/mvref_common.h"
+#include "av1/common/pred_common.h"
+#include "av1/common/quant_common.h"
+#include "av1/common/reconinter.h"
+#include "av1/common/reconintra.h"
+#include "av1/common/resize.h"
+#include "av1/common/seg_common.h"
+#include "av1/common/thread_common.h"
+#include "av1/common/tile_common.h"
+#include "av1/common/warped_motion.h"
+#include "av1/common/obmc.h"
+#include "av1/decoder/decodeframe.h"
+#include "av1/decoder/decodemv.h"
+#include "av1/decoder/decoder.h"
+#include "av1/decoder/decodetxb.h"
+#include "av1/decoder/detokenize.h"
+
+
+#include "dx\av1_core.h"
+
+
+#define ACCT_STR __func__
+
+#define AOM_MIN_THREADS_PER_TILE 1
+#define AOM_MAX_THREADS_PER_TILE 2
+
+// This is needed by ext_tile related unit tests.
+#define EXT_TILE_DEBUG 1
+#define MC_TEMP_BUF_PELS                       \
+  (((MAX_SB_SIZE)*2 + (AOM_INTERP_EXTEND)*2) * \
+   ((MAX_SB_SIZE)*2 + (AOM_INTERP_EXTEND)*2))
+
+// Checks that the remaining bits start with a 1 and ends with 0s.
+// It consumes an additional byte, if already byte aligned before the check.
+int av1_check_trailing_bits(AV1Decoder *pbi, struct aom_read_bit_buffer *rb) {
+  AV1_COMMON *const cm = &pbi->common;
+  // bit_offset is set to 0 (mod 8) when the reader is already byte aligned
+  int bits_before_alignment = 8 - rb->bit_offset % 8;
+  int trailing = aom_rb_read_literal(rb, bits_before_alignment);
+  if (trailing != (1 << (bits_before_alignment - 1))) {
+    cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+    return -1;
+  }
+  return 0;
+}
+
+// Use only_chroma = 1 to only set the chroma planes
+static void set_planes_to_neutral_grey(const SequenceHeader *const seq_params,
+                                       const YV12_BUFFER_CONFIG *const buf,
+                                       int only_chroma) {
+  HwFrameBuffer * hbuf = buf->hw_buffer;
+//  if (hbuf)
+//    return;
+//  if (seq_params->use_highbitdepth) {
+//    const int val = 1 << (seq_params->bit_depth - 1);
+//    for (int plane = only_chroma; plane < MAX_MB_PLANE; plane++) {
+//      uint16_t *const base = (uint16_t*)(hbuf->pool_ptr + hbuf->planes[plane].offset);
+//      const int sz = (plane ? hbuf->uv_crop_height : hbuf->y_crop_height) * hbuf->planes[plane].stride;
+//      if (sz > 0)
+//        aom_memset16(base, val, sz);
+//    }
+//  } else {
+//    for (int plane = only_chroma; plane < MAX_MB_PLANE; plane++) {
+//      int8_t *const base = hbuf->pool_ptr + hbuf->planes[plane].offset;
+//      const int sz = (plane ? hbuf->uv_crop_height : hbuf->y_crop_height) * hbuf->planes[plane].stride;
+//      if (sz > 0)
+//          memset(base, 1 << 7, sz);
+//    }
+//  }
+}
+
+static void loop_restoration_read_sb_coeffs(const AV1_COMMON *const cm,
+                                            MACROBLOCKD *xd,
+                                            aom_reader *const r, int plane,
+                                            int runit_idx);
+
+static void setup_compound_reference_mode(AV1_COMMON *cm) {
+  cm->comp_fwd_ref[0] = LAST_FRAME;
+  cm->comp_fwd_ref[1] = LAST2_FRAME;
+  cm->comp_fwd_ref[2] = LAST3_FRAME;
+  cm->comp_fwd_ref[3] = GOLDEN_FRAME;
+
+  cm->comp_bwd_ref[0] = BWDREF_FRAME;
+  cm->comp_bwd_ref[1] = ALTREF2_FRAME;
+  cm->comp_bwd_ref[2] = ALTREF_FRAME;
+}
+
+static int read_is_valid(const uint8_t *start, size_t len, const uint8_t *end) {
+  return len != 0 && len <= (size_t)(end - start);
+}
+
+static TX_MODE read_tx_mode(AV1_COMMON *cm, struct aom_read_bit_buffer *rb) {
+  if (cm->coded_lossless) return ONLY_4X4;
+  return aom_rb_read_bit(rb) ? TX_MODE_SELECT : TX_MODE_LARGEST;
+}
+
+static REFERENCE_MODE read_frame_reference_mode(
+    const AV1_COMMON *cm, struct aom_read_bit_buffer *rb) {
+  if (frame_is_intra_only(cm)) {
+    return SINGLE_REFERENCE;
+  } else {
+    return aom_rb_read_bit(rb) ? REFERENCE_MODE_SELECT : SINGLE_REFERENCE;
+  }
+}
+
+static void read_coeffs_tx_intra_block(const AV1_COMMON *const cm,
+                                       MACROBLOCKD *const xd,
+                                       aom_reader *const r, const int plane,
+                                       const int row, const int col,
+                                       const TX_SIZE tx_size) {
+  MB_MODE_INFO *mbmi = xd->mi[0];
+  if (!mbmi->skip) {
+#if TXCOEFF_TIMER
+    struct aom_usec_timer timer;
+    aom_usec_timer_start(&timer);
+#endif
+    av1_read_coeffs_txb_facade(cm, xd, r, plane, row, col, tx_size);
+#if TXCOEFF_TIMER
+    aom_usec_timer_mark(&timer);
+    const int64_t elapsed_time = aom_usec_timer_elapsed(&timer);
+    cm->txcoeff_timer += elapsed_time;
+    ++cm->txb_count;
+#endif
+  }
+}
+
+static void decode_block_void(const AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                              aom_reader *const r, const int plane,
+                              const int row, const int col,
+                              const TX_SIZE tx_size) {
+  (void)cm;
+  (void)xd;
+  (void)r;
+  (void)plane;
+  (void)row;
+  (void)col;
+  (void)tx_size;
+}
+
+static void predict_inter_block_void(AV1_COMMON *const cm,
+                                     MACROBLOCKD *const xd, int mi_row,
+                                     int mi_col, BLOCK_SIZE bsize) {
+  (void)cm;
+  (void)xd;
+  (void)mi_row;
+  (void)mi_col;
+  (void)bsize;
+}
+
+static void cfl_store_inter_block_void(AV1_COMMON *const cm,
+                                       MACROBLOCKD *const xd) {
+  (void)cm;
+  (void)xd;
+}
+
+static void set_cb_buffer_offsets(MACROBLOCKD *const xd, TX_SIZE tx_size,
+                                  int plane) {
+  xd->cb_offset[plane] += tx_size_wide[tx_size] * tx_size_high[tx_size];
+  xd->txb_offset[plane] =
+      xd->cb_offset[plane] / (TX_SIZE_W_MIN * TX_SIZE_H_MIN);
+}
+
+static void decode_reconstruct_tx(AV1_COMMON *cm, ThreadData *const td,
+                                  aom_reader *r, MB_MODE_INFO *const mbmi,
+                                  int plane, BLOCK_SIZE plane_bsize,
+                                  int blk_row, int blk_col, int block,
+                                  TX_SIZE tx_size, int *eob_total) {
+  MACROBLOCKD *const xd = &td->xd;
+  const struct macroblockd_plane *const pd = &xd->plane[plane];
+  const TX_SIZE plane_tx_size =
+      plane ? av1_get_max_uv_txsize(mbmi->sb_type, pd->subsampling_x,
+                                    pd->subsampling_y)
+            : mbmi->inter_tx_size[av1_get_txb_size_index(plane_bsize, blk_row,
+                                                         blk_col)];
+  // Scale to match transform block unit.
+  const int max_blocks_high = max_block_high(xd, plane_bsize, plane);
+  const int max_blocks_wide = max_block_wide(xd, plane_bsize, plane);
+
+  if (blk_row >= max_blocks_high || blk_col >= max_blocks_wide) return;
+
+  if (tx_size == plane_tx_size || plane) {
+    td->read_coeffs_tx_inter_block_visit(cm, xd, r, plane, blk_row, blk_col,
+                                         tx_size);
+
+    td->inverse_tx_inter_block_visit(cm, xd, r, plane, blk_row, blk_col,
+                                     tx_size);
+    eob_info *eob_data = pd->eob_data + xd->txb_offset[plane];
+    *eob_total += eob_data->eob;
+    set_cb_buffer_offsets(xd, tx_size, plane);
+  } else {
+    const TX_SIZE sub_txs = sub_tx_size_map[tx_size];
+    assert(IMPLIES(tx_size <= TX_4X4, sub_txs == tx_size));
+    assert(IMPLIES(tx_size > TX_4X4, sub_txs < tx_size));
+    const int bsw = tx_size_wide_unit[sub_txs];
+    const int bsh = tx_size_high_unit[sub_txs];
+    const int sub_step = bsw * bsh;
+
+    assert(bsw > 0 && bsh > 0);
+
+    for (int row = 0; row < tx_size_high_unit[tx_size]; row += bsh) {
+      for (int col = 0; col < tx_size_wide_unit[tx_size]; col += bsw) {
+        const int offsetr = blk_row + row;
+        const int offsetc = blk_col + col;
+
+        if (offsetr >= max_blocks_high || offsetc >= max_blocks_wide) continue;
+
+        decode_reconstruct_tx(cm, td, r, mbmi, plane, plane_bsize, offsetr,
+                              offsetc, block, sub_txs, eob_total);
+        block += sub_step;
+      }
+    }
+  }
+}
+
+
+static void set_offsets(AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                        BLOCK_SIZE bsize, int mi_row, int mi_col, int bw,
+                        int bh, int x_mis, int y_mis, ThreadData *const td) {
+  const int num_planes = av1_num_planes(cm);
+
+  const int offset = mi_row * cm->mi_stride + mi_col;
+  const TileInfo *const tile = &xd->tile;
+
+  xd->mi = cm->mi_grid_visible + offset;
+  if (td->mi_pool2) {
+    xd->mi[0] = &td->mi_pool2[td->mi_count2++];
+  } else {
+    xd->mi[0] = &td->mi_pool[td->mi_count++];
+  }
+  // TODO(slavarnway): Generate sb_type based on bwl and bhl, instead of
+  // passing bsize from decode_partition().
+  xd->mi[0]->sb_type = bsize;
+  xd->mi[0]->mi_row = mi_row;
+  xd->mi[0]->mi_col = mi_col;
+  xd->cfl.mi_row = mi_row;
+  xd->cfl.mi_col = mi_col;
+
+  assert(x_mis && y_mis);
+  for (int x = 1; x < x_mis; ++x) xd->mi[x] = xd->mi[0];
+  int idx = cm->mi_stride;
+  for (int y = 1; y < y_mis; ++y) {
+    memcpy(&xd->mi[idx], &xd->mi[0], x_mis * sizeof(xd->mi[0]));
+    idx += cm->mi_stride;
+  }
+
+  set_plane_n4(xd, bw, bh, num_planes);
+  set_skip_context(xd, mi_row, mi_col, num_planes);
+
+  // Distance of Mb to the various image edges. These are specified to 8th pel
+  // as they are always compared to values that are in 1/8th pel units
+  set_mi_row_col(xd, tile, mi_row, bh, mi_col, bw, cm->mi_rows, cm->mi_cols);
+
+  av1_setup_dst_planes(xd->plane, bsize, &cm->cur_frame->buf, mi_row, mi_col, 0,
+                       num_planes);
+}
+
+static void decode_mbmi_block(AV1Decoder *const pbi, MACROBLOCKD *const xd,
+                              int mi_row, int mi_col, aom_reader *r,
+                              PARTITION_TYPE partition, BLOCK_SIZE bsize,
+                              ThreadData *const td) {
+  AV1_COMMON *const cm = &pbi->common;
+  const SequenceHeader *const seq_params = &cm->seq_params;
+  const int bw = mi_size_wide[bsize];
+  const int bh = mi_size_high[bsize];
+  const int x_mis = AOMMIN(bw, cm->mi_cols - mi_col);
+  const int y_mis = AOMMIN(bh, cm->mi_rows - mi_row);
+
+#if CONFIG_ACCOUNTING
+  aom_accounting_set_context(&pbi->accounting, mi_col, mi_row);
+#endif
+  if (td->mi_count >= td->mi_count_max && td->mi_pool2 == NULL) {
+      av1_setup_sec_data(pbi, cm, td);
+  }
+  set_offsets(cm, xd, bsize, mi_row, mi_col, bw, bh, x_mis, y_mis, td);
+  xd->mi[0]->partition = partition;
+  av1_read_mode_info(pbi, xd, mi_row, mi_col, r, x_mis, y_mis);
+  if (bsize >= BLOCK_8X8 &&
+      (seq_params->subsampling_x || seq_params->subsampling_y)) {
+    const BLOCK_SIZE uv_subsize =
+        ss_size_lookup[bsize][seq_params->subsampling_x]
+                      [seq_params->subsampling_y];
+    if (uv_subsize == BLOCK_INVALID)
+      aom_internal_error(xd->error_info, AOM_CODEC_CORRUPT_FRAME,
+                         "Invalid block size.");
+  }
+}
+
+static void set_color_index_map_offset(AV1Decoder * pbi, MACROBLOCKD *const xd, int plane,
+                                       aom_reader *r) {
+  (void)r;
+  Av1ColorMapParam params;
+  const MB_MODE_INFO *const mbmi = xd->mi[0];
+  av1_get_block_dimensions(mbmi->sb_type, plane, xd, &params.plane_width,
+                           &params.plane_height, NULL, NULL);
+  xd->color_index_map_offset[plane] += params.plane_width * params.plane_height;
+}
+
+static void decode_token_recon_block(AV1Decoder *const pbi,
+                                     ThreadData *const td, int mi_row,
+                                     int mi_col, aom_reader *r,
+                                     BLOCK_SIZE bsize) {
+  AV1_COMMON *const cm = &pbi->common;
+  MACROBLOCKD *const xd = &td->xd;
+  const int num_planes = av1_num_planes(cm);
+
+  MB_MODE_INFO *mbmi = xd->mi[0];
+  CFL_CTX *const cfl = &xd->cfl;
+  cfl->is_chroma_reference = is_chroma_reference(
+      mi_row, mi_col, bsize, cfl->subsampling_x, cfl->subsampling_y);
+
+  if (!is_inter_block(mbmi)) {
+    int row, col;
+    assert(bsize == get_plane_block_size(bsize, xd->plane[0].subsampling_x,
+                                         xd->plane[0].subsampling_y));
+    const int max_blocks_wide = max_block_wide(xd, bsize, 0);
+    const int max_blocks_high = max_block_high(xd, bsize, 0);
+    const BLOCK_SIZE max_unit_bsize = BLOCK_64X64;
+    int mu_blocks_wide =
+        block_size_wide[max_unit_bsize] >> tx_size_wide_log2[0];
+    int mu_blocks_high =
+        block_size_high[max_unit_bsize] >> tx_size_high_log2[0];
+    mu_blocks_wide = AOMMIN(max_blocks_wide, mu_blocks_wide);
+    mu_blocks_high = AOMMIN(max_blocks_high, mu_blocks_high);
+
+    for (row = 0; row < max_blocks_high; row += mu_blocks_high) {
+      for (col = 0; col < max_blocks_wide; col += mu_blocks_wide) {
+        for (int plane = 0; plane < num_planes; ++plane) {
+          const struct macroblockd_plane *const pd = &xd->plane[plane];
+          if (!is_chroma_reference(mi_row, mi_col, bsize, pd->subsampling_x,
+                                   pd->subsampling_y))
+            continue;
+
+          const TX_SIZE tx_size = av1_get_tx_size(plane, xd);
+          const int stepr = tx_size_high_unit[tx_size];
+          const int stepc = tx_size_wide_unit[tx_size];
+
+          const int unit_height = ROUND_POWER_OF_TWO(
+              AOMMIN(mu_blocks_high + row, max_blocks_high), pd->subsampling_y);
+          const int unit_width = ROUND_POWER_OF_TWO(
+              AOMMIN(mu_blocks_wide + col, max_blocks_wide), pd->subsampling_x);
+
+          for (int blk_row = row >> pd->subsampling_y; blk_row < unit_height;
+               blk_row += stepr) {
+            for (int blk_col = col >> pd->subsampling_x; blk_col < unit_width;
+                 blk_col += stepc) {
+              td->read_coeffs_tx_intra_block_visit(cm, xd, r, plane, blk_row,
+                                                   blk_col, tx_size);
+              td->predict_and_recon_intra_block_visit(cm, xd, r, plane, blk_row,
+                                                      blk_col, tx_size);
+              set_cb_buffer_offsets(xd, tx_size, plane);
+            }
+          }
+        }
+      }
+    }
+  } else {
+    td->predict_inter_block_visit(cm, xd, mi_row, mi_col, bsize);
+    // Reconstruction
+    if (!mbmi->skip) {
+      int eobtotal = 0;
+
+      const int max_blocks_wide = max_block_wide(xd, bsize, 0);
+      const int max_blocks_high = max_block_high(xd, bsize, 0);
+      int row, col;
+
+      const BLOCK_SIZE max_unit_bsize = BLOCK_64X64;
+      assert(max_unit_bsize ==
+             get_plane_block_size(BLOCK_64X64, xd->plane[0].subsampling_x,
+                                  xd->plane[0].subsampling_y));
+      int mu_blocks_wide =
+          block_size_wide[max_unit_bsize] >> tx_size_wide_log2[0];
+      int mu_blocks_high =
+          block_size_high[max_unit_bsize] >> tx_size_high_log2[0];
+
+      mu_blocks_wide = AOMMIN(max_blocks_wide, mu_blocks_wide);
+      mu_blocks_high = AOMMIN(max_blocks_high, mu_blocks_high);
+
+      for (row = 0; row < max_blocks_high; row += mu_blocks_high) {
+        for (col = 0; col < max_blocks_wide; col += mu_blocks_wide) {
+          for (int plane = 0; plane < num_planes; ++plane) {
+            const struct macroblockd_plane *const pd = &xd->plane[plane];
+            if (!is_chroma_reference(mi_row, mi_col, bsize, pd->subsampling_x,
+                                     pd->subsampling_y))
+              continue;
+            const BLOCK_SIZE bsizec =
+                scale_chroma_bsize(bsize, pd->subsampling_x, pd->subsampling_y);
+            const BLOCK_SIZE plane_bsize = get_plane_block_size(
+                bsizec, pd->subsampling_x, pd->subsampling_y);
+
+            const TX_SIZE max_tx_size =
+                get_vartx_max_txsize(xd, plane_bsize, plane);
+            const int bh_var_tx = tx_size_high_unit[max_tx_size];
+            const int bw_var_tx = tx_size_wide_unit[max_tx_size];
+            int block = 0;
+            int step =
+                tx_size_wide_unit[max_tx_size] * tx_size_high_unit[max_tx_size];
+            int blk_row, blk_col;
+            const int unit_height = ROUND_POWER_OF_TWO(
+                AOMMIN(mu_blocks_high + row, max_blocks_high),
+                pd->subsampling_y);
+            const int unit_width = ROUND_POWER_OF_TWO(
+                AOMMIN(mu_blocks_wide + col, max_blocks_wide),
+                pd->subsampling_x);
+
+            for (blk_row = row >> pd->subsampling_y; blk_row < unit_height;
+                 blk_row += bh_var_tx) {
+              for (blk_col = col >> pd->subsampling_x; blk_col < unit_width;
+                   blk_col += bw_var_tx) {
+                decode_reconstruct_tx(cm, td, r, mbmi, plane, plane_bsize,
+                                      blk_row, blk_col, block, max_tx_size,
+                                      &eobtotal);
+                block += step;
+              }
+            }
+          }
+        }
+      }
+    }
+    td->cfl_store_inter_block_visit(cm, xd);
+  }
+
+  av1_visit_palette(pbi, xd, mi_row, mi_col, r, bsize,
+                    set_color_index_map_offset);
+}
+
+#if LOOP_FILTER_BITMASK
+static void store_bitmask_vartx(AV1_COMMON *cm, int mi_row, int mi_col,
+                                BLOCK_SIZE bsize, TX_SIZE tx_size,
+                                MB_MODE_INFO *mbmi);
+#endif
+
+static void set_inter_tx_size(MB_MODE_INFO *mbmi, int stride_log2,
+                              int tx_w_log2, int tx_h_log2, int min_txs,
+                              int split_size, int txs, int blk_row,
+                              int blk_col) {
+  for (int idy = 0; idy < tx_size_high_unit[split_size];
+       idy += tx_size_high_unit[min_txs]) {
+    for (int idx = 0; idx < tx_size_wide_unit[split_size];
+         idx += tx_size_wide_unit[min_txs]) {
+      const int index = (((blk_row + idy) >> tx_h_log2) << stride_log2) +
+                        ((blk_col + idx) >> tx_w_log2);
+      mbmi->inter_tx_size[index] = txs;
+    }
+  }
+}
+
+static void read_tx_size_vartx(MACROBLOCKD *xd, MB_MODE_INFO *mbmi,
+                               TX_SIZE tx_size, int depth,
+#if LOOP_FILTER_BITMASK
+                               AV1_COMMON *cm, int mi_row, int mi_col,
+                               int store_bitmask,
+#endif
+                               int blk_row, int blk_col, aom_reader *r) {
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+  int is_split = 0;
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  const int max_blocks_high = max_block_high(xd, bsize, 0);
+  const int max_blocks_wide = max_block_wide(xd, bsize, 0);
+  if (blk_row >= max_blocks_high || blk_col >= max_blocks_wide) return;
+  assert(tx_size > TX_4X4);
+  TX_SIZE txs = max_txsize_rect_lookup[bsize];
+  for (int level = 0; level < MAX_VARTX_DEPTH - 1; ++level)
+    txs = sub_tx_size_map[txs];
+  const int tx_w_log2 = tx_size_wide_log2[txs] - MI_SIZE_LOG2;
+  const int tx_h_log2 = tx_size_high_log2[txs] - MI_SIZE_LOG2;
+  const int bw_log2 = mi_size_wide_log2[bsize];
+  const int stride_log2 = bw_log2 - tx_w_log2;
+
+  if (depth == MAX_VARTX_DEPTH) {
+    set_inter_tx_size(mbmi, stride_log2, tx_w_log2, tx_h_log2, txs, tx_size,
+                      tx_size, blk_row, blk_col);
+    mbmi->tx_size = tx_size;
+    txfm_partition_update(xd->above_txfm_context + blk_col,
+                          xd->left_txfm_context + blk_row, tx_size, tx_size);
+    return;
+  }
+
+  const int ctx = txfm_partition_context(xd->above_txfm_context + blk_col,
+                                         xd->left_txfm_context + blk_row,
+                                         mbmi->sb_type, tx_size);
+  is_split = aom_read_symbol(r, ec_ctx->txfm_partition_cdf[ctx], 2, ACCT_STR);
+
+  if (is_split) {
+    const TX_SIZE sub_txs = sub_tx_size_map[tx_size];
+    const int bsw = tx_size_wide_unit[sub_txs];
+    const int bsh = tx_size_high_unit[sub_txs];
+
+    if (sub_txs == TX_4X4) {
+      set_inter_tx_size(mbmi, stride_log2, tx_w_log2, tx_h_log2, txs, tx_size,
+                        sub_txs, blk_row, blk_col);
+      mbmi->tx_size = sub_txs;
+      txfm_partition_update(xd->above_txfm_context + blk_col,
+                            xd->left_txfm_context + blk_row, sub_txs, tx_size);
+#if LOOP_FILTER_BITMASK
+      if (store_bitmask) {
+        store_bitmask_vartx(cm, mi_row + blk_row, mi_col + blk_col,
+                            txsize_to_bsize[tx_size], TX_4X4, mbmi);
+      }
+#endif
+      return;
+    }
+#if LOOP_FILTER_BITMASK
+    if (depth + 1 == MAX_VARTX_DEPTH && store_bitmask) {
+      store_bitmask_vartx(cm, mi_row + blk_row, mi_col + blk_col,
+                          txsize_to_bsize[tx_size], sub_txs, mbmi);
+      store_bitmask = 0;
+    }
+#endif
+
+    assert(bsw > 0 && bsh > 0);
+    for (int row = 0; row < tx_size_high_unit[tx_size]; row += bsh) {
+      for (int col = 0; col < tx_size_wide_unit[tx_size]; col += bsw) {
+        int offsetr = blk_row + row;
+        int offsetc = blk_col + col;
+        read_tx_size_vartx(xd, mbmi, sub_txs, depth + 1,
+#if LOOP_FILTER_BITMASK
+                           cm, mi_row, mi_col, store_bitmask,
+#endif
+                           offsetr, offsetc, r);
+      }
+    }
+  } else {
+    set_inter_tx_size(mbmi, stride_log2, tx_w_log2, tx_h_log2, txs, tx_size,
+                      tx_size, blk_row, blk_col);
+    mbmi->tx_size = tx_size;
+    txfm_partition_update(xd->above_txfm_context + blk_col,
+                          xd->left_txfm_context + blk_row, tx_size, tx_size);
+#if LOOP_FILTER_BITMASK
+    if (store_bitmask) {
+      store_bitmask_vartx(cm, mi_row + blk_row, mi_col + blk_col,
+                          txsize_to_bsize[tx_size], tx_size, mbmi);
+    }
+#endif
+  }
+}
+
+static TX_SIZE read_selected_tx_size(MACROBLOCKD *xd, aom_reader *r) {
+  // TODO(debargha): Clean up the logic here. This function should only
+  // be called for intra.
+  const BLOCK_SIZE bsize = xd->mi[0]->sb_type;
+  const int32_t tx_size_cat = bsize_to_tx_size_cat(bsize);
+  const int max_depths = bsize_to_max_depth(bsize);
+  const int ctx = get_tx_size_context(xd);
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+  const int depth = aom_read_symbol(r, ec_ctx->tx_size_cdf[tx_size_cat][ctx],
+                                    max_depths + 1, ACCT_STR);
+  assert(depth >= 0 && depth <= max_depths);
+  const TX_SIZE tx_size = depth_to_tx_size(depth, bsize);
+  return tx_size;
+}
+
+static TX_SIZE read_tx_size(AV1_COMMON *cm, MACROBLOCKD *xd, int is_inter,
+                            int allow_select_inter, aom_reader *r) {
+  const TX_MODE tx_mode = cm->tx_mode;
+  const BLOCK_SIZE bsize = xd->mi[0]->sb_type;
+  if (xd->lossless[xd->mi[0]->segment_id]) return TX_4X4;
+
+  if (block_signals_txsize(bsize)) {
+    if ((!is_inter || allow_select_inter) && tx_mode == TX_MODE_SELECT) {
+      const TX_SIZE coded_tx_size = read_selected_tx_size(xd, r);
+      return coded_tx_size;
+    } else {
+      return tx_size_from_tx_mode(bsize, tx_mode);
+    }
+  } else {
+    assert(IMPLIES(tx_mode == ONLY_4X4, bsize == BLOCK_4X4));
+    return max_txsize_rect_lookup[bsize];
+  }
+}
+
+#if LOOP_FILTER_BITMASK
+static void store_bitmask_vartx(AV1_COMMON *cm, int mi_row, int mi_col,
+                                BLOCK_SIZE bsize, TX_SIZE tx_size,
+                                MB_MODE_INFO *mbmi) {
+  LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row, mi_col);
+  const TX_SIZE tx_size_y_vert = txsize_vert_map[tx_size];
+  const TX_SIZE tx_size_y_horz = txsize_horz_map[tx_size];
+  const TX_SIZE tx_size_uv_vert = txsize_vert_map[av1_get_max_uv_txsize(
+      mbmi->sb_type, cm->seq_params.subsampling_x,
+      cm->seq_params.subsampling_y)];
+  const TX_SIZE tx_size_uv_horz = txsize_horz_map[av1_get_max_uv_txsize(
+      mbmi->sb_type, cm->seq_params.subsampling_x,
+      cm->seq_params.subsampling_y)];
+  const int is_square_transform_size = tx_size <= TX_64X64;
+  int mask_id = 0;
+  int offset = 0;
+  const int half_ratio_tx_size_max32 =
+      (tx_size > TX_64X64) & (tx_size <= TX_32X16);
+  if (is_square_transform_size) {
+    switch (tx_size) {
+      case TX_4X4: mask_id = mask_id_table_tx_4x4[bsize]; break;
+      case TX_8X8:
+        mask_id = mask_id_table_tx_8x8[bsize];
+        offset = 19;
+        break;
+      case TX_16X16:
+        mask_id = mask_id_table_tx_16x16[bsize];
+        offset = 33;
+        break;
+      case TX_32X32:
+        mask_id = mask_id_table_tx_32x32[bsize];
+        offset = 42;
+        break;
+      case TX_64X64: mask_id = 46; break;
+      default: assert(!is_square_transform_size); return;
+    }
+    mask_id += offset;
+  } else if (half_ratio_tx_size_max32) {
+    int tx_size_equal_block_size = bsize == txsize_to_bsize[tx_size];
+    mask_id = 47 + 2 * (tx_size - TX_4X8) + (tx_size_equal_block_size ? 0 : 1);
+  } else if (tx_size == TX_32X64) {
+    mask_id = 59;
+  } else if (tx_size == TX_64X32) {
+    mask_id = 60;
+  } else {  // quarter ratio tx size
+    mask_id = 61 + (tx_size - TX_4X16);
+  }
+  int index = 0;
+  const int row = mi_row % MI_SIZE_64X64;
+  const int col = mi_col % MI_SIZE_64X64;
+  const int shift = get_index_shift(col, row, &index);
+  const int vert_shift = tx_size_y_vert <= TX_8X8 ? shift : col;
+  for (int i = 0; i + index < 4; ++i) {
+    // y vertical.
+    lfm->tx_size_ver[0][tx_size_y_horz].bits[i + index] |=
+        (left_mask_univariant_reordered[mask_id].bits[i] << vert_shift);
+    // y horizontal.
+    lfm->tx_size_hor[0][tx_size_y_vert].bits[i + index] |=
+        (above_mask_univariant_reordered[mask_id].bits[i] << shift);
+    // u/v vertical.
+    lfm->tx_size_ver[1][tx_size_uv_horz].bits[i + index] |=
+        (left_mask_univariant_reordered[mask_id].bits[i] << vert_shift);
+    // u/v horizontal.
+    lfm->tx_size_hor[1][tx_size_uv_vert].bits[i + index] |=
+        (above_mask_univariant_reordered[mask_id].bits[i] << shift);
+  }
+}
+
+static void store_bitmask_univariant_tx(AV1_COMMON *cm, int mi_row, int mi_col,
+                                        BLOCK_SIZE bsize, MB_MODE_INFO *mbmi) {
+  // Use a lookup table that provides one bitmask for a given block size and
+  // a univariant transform size.
+  int index;
+  int shift;
+  int row;
+  int col;
+  LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row, mi_col);
+  const TX_SIZE tx_size_y_vert = txsize_vert_map[mbmi->tx_size];
+  const TX_SIZE tx_size_y_horz = txsize_horz_map[mbmi->tx_size];
+  const TX_SIZE tx_size_uv_vert = txsize_vert_map[av1_get_max_uv_txsize(
+      mbmi->sb_type, cm->seq_params.subsampling_x,
+      cm->seq_params.subsampling_y)];
+  const TX_SIZE tx_size_uv_horz = txsize_horz_map[av1_get_max_uv_txsize(
+      mbmi->sb_type, cm->seq_params.subsampling_x,
+      cm->seq_params.subsampling_y)];
+  const int is_square_transform_size = mbmi->tx_size <= TX_64X64;
+  int mask_id = 0;
+  int offset = 0;
+  const int half_ratio_tx_size_max32 =
+      (mbmi->tx_size > TX_64X64) & (mbmi->tx_size <= TX_32X16);
+  if (is_square_transform_size) {
+    switch (mbmi->tx_size) {
+      case TX_4X4: mask_id = mask_id_table_tx_4x4[bsize]; break;
+      case TX_8X8:
+        mask_id = mask_id_table_tx_8x8[bsize];
+        offset = 19;
+        break;
+      case TX_16X16:
+        mask_id = mask_id_table_tx_16x16[bsize];
+        offset = 33;
+        break;
+      case TX_32X32:
+        mask_id = mask_id_table_tx_32x32[bsize];
+        offset = 42;
+        break;
+      case TX_64X64: mask_id = 46; break;
+      default: assert(!is_square_transform_size); return;
+    }
+    mask_id += offset;
+  } else if (half_ratio_tx_size_max32) {
+    int tx_size_equal_block_size = bsize == txsize_to_bsize[mbmi->tx_size];
+    mask_id =
+        47 + 2 * (mbmi->tx_size - TX_4X8) + (tx_size_equal_block_size ? 0 : 1);
+  } else if (mbmi->tx_size == TX_32X64) {
+    mask_id = 59;
+  } else if (mbmi->tx_size == TX_64X32) {
+    mask_id = 60;
+  } else {  // quarter ratio tx size
+    mask_id = 61 + (mbmi->tx_size - TX_4X16);
+  }
+  row = mi_row % MI_SIZE_64X64;
+  col = mi_col % MI_SIZE_64X64;
+  shift = get_index_shift(col, row, &index);
+  const int vert_shift = tx_size_y_vert <= TX_8X8 ? shift : col;
+  for (int i = 0; i + index < 4; ++i) {
+    // y vertical.
+    lfm->tx_size_ver[0][tx_size_y_horz].bits[i + index] |=
+        (left_mask_univariant_reordered[mask_id].bits[i] << vert_shift);
+    // y horizontal.
+    lfm->tx_size_hor[0][tx_size_y_vert].bits[i + index] |=
+        (above_mask_univariant_reordered[mask_id].bits[i] << shift);
+    // u/v vertical.
+    lfm->tx_size_ver[1][tx_size_uv_horz].bits[i + index] |=
+        (left_mask_univariant_reordered[mask_id].bits[i] << vert_shift);
+    // u/v horizontal.
+    lfm->tx_size_hor[1][tx_size_uv_vert].bits[i + index] |=
+        (above_mask_univariant_reordered[mask_id].bits[i] << shift);
+  }
+}
+
+static void store_bitmask_other_info(AV1_COMMON *cm, int mi_row, int mi_col,
+                                     BLOCK_SIZE bsize, MB_MODE_INFO *mbmi,
+                                     int is_horz_coding_block_border,
+                                     int is_vert_coding_block_border) {
+  int index;
+  int shift;
+  int row;
+  LoopFilterMask *lfm = get_loop_filter_mask(cm, mi_row, mi_col);
+  const int row_start = mi_row % MI_SIZE_64X64;
+  const int col_start = mi_col % MI_SIZE_64X64;
+  shift = get_index_shift(col_start, row_start, &index);
+  if (is_horz_coding_block_border) {
+    const int block_shift = shift + mi_size_wide[bsize];
+    assert(block_shift <= 64);
+    const uint64_t right_edge_shift =
+        (block_shift == 64) ? 0xffffffffffffffff : ((uint64_t)1 << block_shift);
+    const uint64_t left_edge_shift = (block_shift == 64)
+                                         ? (((uint64_t)1 << shift) - 1)
+                                         : ((uint64_t)1 << shift);
+    assert(right_edge_shift > left_edge_shift);
+    const uint64_t top_edge_mask = right_edge_shift - left_edge_shift;
+    lfm->is_horz_border.bits[index] |= top_edge_mask;
+  }
+  if (is_vert_coding_block_border) {
+    const int is_vert_border = mask_id_table_vert_border[bsize];
+    const int vert_shift = block_size_high[bsize] <= 8 ? shift : col_start;
+    for (int i = 0; i + index < 4; ++i) {
+      lfm->is_vert_border.bits[i + index] |=
+          (left_mask_univariant_reordered[is_vert_border].bits[i]
+           << vert_shift);
+    }
+  }
+  const int is_skip = mbmi->skip && is_inter_block(mbmi);
+  if (is_skip) {
+    const int is_skip_mask = mask_id_table_tx_4x4[bsize];
+    for (int i = 0; i + index < 4; ++i) {
+      lfm->skip.bits[i + index] |=
+          (above_mask_univariant_reordered[is_skip_mask].bits[i] << shift);
+    }
+  }
+  const uint8_t level_vert_y = get_filter_level(cm, &cm->lf_info, 0, 0, mbmi);
+  const uint8_t level_horz_y = get_filter_level(cm, &cm->lf_info, 1, 0, mbmi);
+  const uint8_t level_u = get_filter_level(cm, &cm->lf_info, 0, 1, mbmi);
+  const uint8_t level_v = get_filter_level(cm, &cm->lf_info, 0, 2, mbmi);
+  for (int r = mi_row; r < mi_row + mi_size_high[bsize]; r++) {
+    index = 0;
+    row = r % MI_SIZE_64X64;
+    memset(&lfm->lfl_y_ver[row][col_start], level_vert_y,
+           sizeof(uint8_t) * mi_size_wide[bsize]);
+    memset(&lfm->lfl_y_hor[row][col_start], level_horz_y,
+           sizeof(uint8_t) * mi_size_wide[bsize]);
+    memset(&lfm->lfl_u_ver[row][col_start], level_u,
+           sizeof(uint8_t) * mi_size_wide[bsize]);
+    memset(&lfm->lfl_u_hor[row][col_start], level_u,
+           sizeof(uint8_t) * mi_size_wide[bsize]);
+    memset(&lfm->lfl_v_ver[row][col_start], level_v,
+           sizeof(uint8_t) * mi_size_wide[bsize]);
+    memset(&lfm->lfl_v_hor[row][col_start], level_v,
+           sizeof(uint8_t) * mi_size_wide[bsize]);
+  }
+}
+#endif
+
+static void parse_decode_block(AV1Decoder *const pbi, ThreadData *const td,
+                               int mi_row, int mi_col, aom_reader *r,
+                               PARTITION_TYPE partition, BLOCK_SIZE bsize) {
+  MACROBLOCKD *const xd = &td->xd;
+  decode_mbmi_block(pbi, xd, mi_row, mi_col, r, partition, bsize, td);
+
+  av1_visit_palette(pbi, xd, mi_row, mi_col, r, bsize,
+                    av1_decode_palette_tokens);
+
+  AV1_COMMON *cm = &pbi->common;
+  const int num_planes = av1_num_planes(cm);
+  MB_MODE_INFO *mbmi = xd->mi[0];
+  int inter_block_tx = is_inter_block(mbmi) || is_intrabc_block(mbmi);
+  if (cm->tx_mode == TX_MODE_SELECT && block_signals_txsize(bsize) &&
+      !mbmi->skip && inter_block_tx && !xd->lossless[mbmi->segment_id]) {
+    const TX_SIZE max_tx_size = max_txsize_rect_lookup[bsize];
+    const int bh = tx_size_high_unit[max_tx_size];
+    const int bw = tx_size_wide_unit[max_tx_size];
+    const int width = block_size_wide[bsize] >> tx_size_wide_log2[0];
+    const int height = block_size_high[bsize] >> tx_size_high_log2[0];
+
+    for (int idy = 0; idy < height; idy += bh)
+      for (int idx = 0; idx < width; idx += bw)
+        read_tx_size_vartx(xd, mbmi, max_tx_size, 0,
+#if LOOP_FILTER_BITMASK
+                           cm, mi_row, mi_col, 1,
+#endif
+                           idy, idx, r);
+  } else {
+    mbmi->tx_size = read_tx_size(cm, xd, inter_block_tx, !mbmi->skip, r);
+    if (inter_block_tx)
+      memset(mbmi->inter_tx_size, mbmi->tx_size, sizeof(mbmi->inter_tx_size));
+    set_txfm_ctxs(mbmi->tx_size, xd->n4_w, xd->n4_h,
+                  mbmi->skip && is_inter_block(mbmi), xd);
+#if LOOP_FILTER_BITMASK
+    const int w = mi_size_wide[bsize];
+    const int h = mi_size_high[bsize];
+    if (w <= mi_size_wide[BLOCK_64X64] && h <= mi_size_high[BLOCK_64X64]) {
+      store_bitmask_univariant_tx(cm, mi_row, mi_col, bsize, mbmi);
+    } else {
+      for (int row = 0; row < h; row += mi_size_high[BLOCK_64X64]) {
+        for (int col = 0; col < w; col += mi_size_wide[BLOCK_64X64]) {
+          store_bitmask_univariant_tx(cm, mi_row + row, mi_col + col,
+                                      BLOCK_64X64, mbmi);
+        }
+      }
+    }
+#endif
+  }
+#if LOOP_FILTER_BITMASK
+  const int w = mi_size_wide[bsize];
+  const int h = mi_size_high[bsize];
+  if (w <= mi_size_wide[BLOCK_64X64] && h <= mi_size_high[BLOCK_64X64]) {
+    store_bitmask_other_info(cm, mi_row, mi_col, bsize, mbmi, 1, 1);
+  } else {
+    for (int row = 0; row < h; row += mi_size_high[BLOCK_64X64]) {
+      for (int col = 0; col < w; col += mi_size_wide[BLOCK_64X64]) {
+        store_bitmask_other_info(cm, mi_row + row, mi_col + col, BLOCK_64X64,
+                                 mbmi, row == 0, col == 0);
+      }
+    }
+  }
+#endif
+
+  if (cm->delta_q_info.delta_q_present_flag) {
+    for (int i = 0; i < MAX_SEGMENTS; i++) {
+      const int current_qindex =
+          av1_get_qindex(&cm->seg, i, xd->current_qindex);
+      for (int j = 0; j < num_planes; ++j) {
+        const int dc_delta_q =
+            j == 0 ? cm->y_dc_delta_q
+                   : (j == 1 ? cm->u_dc_delta_q : cm->v_dc_delta_q);
+        const int ac_delta_q =
+            j == 0 ? 0 : (j == 1 ? cm->u_ac_delta_q : cm->v_ac_delta_q);
+        xd->plane[j].seg_dequant_QTX[i][0] = av1_dc_quant_QTX(
+            current_qindex, dc_delta_q, cm->seq_params.bit_depth);
+        xd->plane[j].seg_dequant_QTX[i][1] = av1_ac_quant_QTX(
+            current_qindex, ac_delta_q, cm->seq_params.bit_depth);
+      }
+    }
+  }
+  if (mbmi->skip) av1_reset_skip_context(xd, mi_row, mi_col, bsize, num_planes);
+
+  decode_token_recon_block(pbi, td, mi_row, mi_col, r, bsize);
+}
+
+static void set_offsets_for_pred_and_recon(AV1Decoder *const pbi,
+                                           ThreadData *const td, int mi_row,
+                                           int mi_col, BLOCK_SIZE bsize) {
+  AV1_COMMON *const cm = &pbi->common;
+  MACROBLOCKD *const xd = &td->xd;
+  const int bw = mi_size_wide[bsize];
+  const int bh = mi_size_high[bsize];
+  const int num_planes = av1_num_planes(cm);
+
+  const int offset = mi_row * cm->mi_stride + mi_col;
+  const TileInfo *const tile = &xd->tile;
+
+  xd->mi = cm->mi_grid_visible + offset;
+  xd->cfl.mi_row = mi_row;
+  xd->cfl.mi_col = mi_col;
+
+  set_plane_n4(xd, bw, bh, num_planes);
+
+  // Distance of Mb to the various image edges. These are specified to 8th pel
+  // as they are always compared to values that are in 1/8th pel units
+  set_mi_row_col(xd, tile, mi_row, bh, mi_col, bw, cm->mi_rows, cm->mi_cols);
+
+  av1_setup_dst_planes(xd->plane, bsize, &cm->cur_frame->buf, mi_row, mi_col, 0,
+                       num_planes);
+}
+
+static void decode_block(AV1Decoder *const pbi, ThreadData *const td,
+                         int mi_row, int mi_col, aom_reader *r,
+                         PARTITION_TYPE partition, BLOCK_SIZE bsize) {
+  (void)partition;
+  set_offsets_for_pred_and_recon(pbi, td, mi_row, mi_col, bsize);
+  decode_token_recon_block(pbi, td, mi_row, mi_col, r, bsize);
+}
+
+static PARTITION_TYPE read_partition(MACROBLOCKD *xd, int mi_row, int mi_col,
+                                     aom_reader *r, int has_rows, int has_cols,
+                                     BLOCK_SIZE bsize) {
+  const int ctx = partition_plane_context(xd, mi_row, mi_col, bsize);
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+
+  if (!has_rows && !has_cols) return PARTITION_SPLIT;
+
+  assert(ctx >= 0);
+  aom_cdf_prob *partition_cdf = ec_ctx->partition_cdf[ctx];
+  if (has_rows && has_cols) {
+    return (PARTITION_TYPE)aom_read_symbol(
+        r, partition_cdf, partition_cdf_length(bsize), ACCT_STR);
+  } else if (!has_rows && has_cols) {
+    assert(bsize > BLOCK_8X8);
+    aom_cdf_prob cdf[2];
+    partition_gather_vert_alike(cdf, partition_cdf, bsize);
+    assert(cdf[1] == AOM_ICDF(CDF_PROB_TOP));
+    return aom_read_cdf(r, cdf, 2, ACCT_STR) ? PARTITION_SPLIT : PARTITION_HORZ;
+  } else {
+    assert(has_rows && !has_cols);
+    assert(bsize > BLOCK_8X8);
+    aom_cdf_prob cdf[2];
+    partition_gather_horz_alike(cdf, partition_cdf, bsize);
+    assert(cdf[1] == AOM_ICDF(CDF_PROB_TOP));
+    return aom_read_cdf(r, cdf, 2, ACCT_STR) ? PARTITION_SPLIT : PARTITION_VERT;
+  }
+}
+
+// TODO(slavarnway): eliminate bsize and subsize in future commits
+static void decode_partition(AV1Decoder *const pbi, ThreadData *const td,
+                             int mi_row, int mi_col, aom_reader *reader,
+                             BLOCK_SIZE bsize, int parse_decode_flag) {
+  AV1_COMMON *const cm = &pbi->common;
+  MACROBLOCKD *const xd = &td->xd;
+  const int bw = mi_size_wide[bsize];
+  const int hbs = bw >> 1;
+  PARTITION_TYPE partition;
+  BLOCK_SIZE subsize;
+  const int quarter_step = bw / 4;
+  BLOCK_SIZE bsize2 = get_partition_subsize(bsize, PARTITION_SPLIT);
+  const int has_rows = (mi_row + hbs) < cm->mi_rows;
+  const int has_cols = (mi_col + hbs) < cm->mi_cols;
+
+  if (mi_row >= cm->mi_rows || mi_col >= cm->mi_cols) return;
+
+  // parse_decode_flag takes the following values :
+  // 01 - do parse only
+  // 10 - do decode only
+  // 11 - do parse and decode
+  static const block_visitor_fn_t block_visit[4] = {
+    NULL, parse_decode_block, decode_block, parse_decode_block
+  };
+
+  if (parse_decode_flag & 1) {
+    const int num_planes = av1_num_planes(cm);
+    for (int plane = 0; plane < num_planes; ++plane) {
+      int rcol0, rcol1, rrow0, rrow1;
+      if (av1_loop_restoration_corners_in_sb(cm, plane, mi_row, mi_col, bsize,
+                                             &rcol0, &rcol1, &rrow0, &rrow1)) {
+        const int rstride = cm->rst_info[plane].horz_units_per_tile;
+        for (int rrow = rrow0; rrow < rrow1; ++rrow) {
+          for (int rcol = rcol0; rcol < rcol1; ++rcol) {
+            const int runit_idx = rcol + rrow * rstride;
+            loop_restoration_read_sb_coeffs(cm, xd, reader, plane, runit_idx);
+          }
+        }
+      }
+    }
+
+    partition = (bsize < BLOCK_8X8) ? PARTITION_NONE
+                                    : read_partition(xd, mi_row, mi_col, reader,
+                                                     has_rows, has_cols, bsize);
+  } else {
+    partition = get_partition(cm, mi_row, mi_col, bsize);
+  }
+  subsize = get_partition_subsize(bsize, partition);
+
+  // Check the bitstream is conformant: if there is subsampling on the
+  // chroma planes, subsize must subsample to a valid block size.
+  const struct macroblockd_plane *const pd_u = &xd->plane[1];
+  if (get_plane_block_size(subsize, pd_u->subsampling_x, pd_u->subsampling_y) ==
+      BLOCK_INVALID) {
+    aom_internal_error(xd->error_info, AOM_CODEC_CORRUPT_FRAME,
+                       "Block size %dx%d invalid with this subsampling mode",
+                       block_size_wide[subsize], block_size_high[subsize]);
+  }
+
+#define DEC_BLOCK_STX_ARG
+#define DEC_BLOCK_EPT_ARG partition,
+#define DEC_BLOCK(db_r, db_c, db_subsize)                                  \
+  block_visit[parse_decode_flag](pbi, td, DEC_BLOCK_STX_ARG(db_r), (db_c), \
+                                 reader, DEC_BLOCK_EPT_ARG(db_subsize))
+#define DEC_PARTITION(db_r, db_c, db_subsize)                        \
+  decode_partition(pbi, td, DEC_BLOCK_STX_ARG(db_r), (db_c), reader, \
+                   (db_subsize), parse_decode_flag)
+
+  switch (partition) {
+    case PARTITION_NONE: DEC_BLOCK(mi_row, mi_col, subsize); break;
+    case PARTITION_HORZ:
+      DEC_BLOCK(mi_row, mi_col, subsize);
+      if (has_rows) DEC_BLOCK(mi_row + hbs, mi_col, subsize);
+      break;
+    case PARTITION_VERT:
+      DEC_BLOCK(mi_row, mi_col, subsize);
+      if (has_cols) DEC_BLOCK(mi_row, mi_col + hbs, subsize);
+      break;
+    case PARTITION_SPLIT:
+      DEC_PARTITION(mi_row, mi_col, subsize);
+      DEC_PARTITION(mi_row, mi_col + hbs, subsize);
+      DEC_PARTITION(mi_row + hbs, mi_col, subsize);
+      DEC_PARTITION(mi_row + hbs, mi_col + hbs, subsize);
+      break;
+    case PARTITION_HORZ_A:
+      DEC_BLOCK(mi_row, mi_col, bsize2);
+      DEC_BLOCK(mi_row, mi_col + hbs, bsize2);
+      DEC_BLOCK(mi_row + hbs, mi_col, subsize);
+      break;
+    case PARTITION_HORZ_B:
+      DEC_BLOCK(mi_row, mi_col, subsize);
+      DEC_BLOCK(mi_row + hbs, mi_col, bsize2);
+      DEC_BLOCK(mi_row + hbs, mi_col + hbs, bsize2);
+      break;
+    case PARTITION_VERT_A:
+      DEC_BLOCK(mi_row, mi_col, bsize2);
+      DEC_BLOCK(mi_row + hbs, mi_col, bsize2);
+      DEC_BLOCK(mi_row, mi_col + hbs, subsize);
+      break;
+    case PARTITION_VERT_B:
+      DEC_BLOCK(mi_row, mi_col, subsize);
+      DEC_BLOCK(mi_row, mi_col + hbs, bsize2);
+      DEC_BLOCK(mi_row + hbs, mi_col + hbs, bsize2);
+      break;
+    case PARTITION_HORZ_4:
+      for (int i = 0; i < 4; ++i) {
+        int this_mi_row = mi_row + i * quarter_step;
+        if (i > 0 && this_mi_row >= cm->mi_rows) break;
+        DEC_BLOCK(this_mi_row, mi_col, subsize);
+      }
+      break;
+    case PARTITION_VERT_4:
+      for (int i = 0; i < 4; ++i) {
+        int this_mi_col = mi_col + i * quarter_step;
+        if (i > 0 && this_mi_col >= cm->mi_cols) break;
+        DEC_BLOCK(mi_row, this_mi_col, subsize);
+      }
+      break;
+    default: assert(0 && "Invalid partition type");
+  }
+
+#undef DEC_PARTITION
+#undef DEC_BLOCK
+#undef DEC_BLOCK_EPT_ARG
+#undef DEC_BLOCK_STX_ARG
+
+  if (parse_decode_flag & 1)
+    update_ext_partition_context(xd, mi_row, mi_col, subsize, bsize, partition);
+}
+
+static void setup_bool_decoder(const uint8_t *data, const uint8_t *data_end,
+                               const size_t read_size,
+                               struct aom_internal_error_info *error_info,
+                               aom_reader *r, uint8_t allow_update_cdf) {
+  // Validate the calculated partition length. If the buffer
+  // described by the partition can't be fully read, then restrict
+  // it to the portion that can be (for EC mode) or throw an error.
+  if (!read_is_valid(data, read_size, data_end))
+    aom_internal_error(error_info, AOM_CODEC_CORRUPT_FRAME,
+                       "Truncated packet or corrupt tile length");
+
+  if (aom_reader_init(r, data, read_size))
+    aom_internal_error(error_info, AOM_CODEC_MEM_ERROR,
+                       "Failed to allocate bool decoder %d", 1);
+
+  r->allow_update_cdf = allow_update_cdf;
+}
+
+static void setup_segmentation(AV1_COMMON *const cm,
+                               struct aom_read_bit_buffer *rb) {
+  struct segmentation *const seg = &cm->seg;
+
+  seg->update_map = 0;
+  seg->update_data = 0;
+  seg->temporal_update = 0;
+
+  seg->enabled = aom_rb_read_bit(rb);
+  if (!seg->enabled) {
+    if (cm->cur_frame->seg_map)
+      memset(cm->cur_frame->seg_map, 0, (cm->mi_rows * cm->mi_cols));
+
+    memset(seg, 0, sizeof(*seg));
+    segfeatures_copy(&cm->cur_frame->seg, seg);
+    return;
+  }
+  if (cm->seg.enabled && cm->prev_frame &&
+      (cm->mi_rows == cm->prev_frame->mi_rows) &&
+      (cm->mi_cols == cm->prev_frame->mi_cols)) {
+    cm->last_frame_seg_map = cm->prev_frame->seg_map;
+  } else {
+    cm->last_frame_seg_map = NULL;
+  }
+  // Read update flags
+  if (cm->primary_ref_frame == PRIMARY_REF_NONE) {
+    // These frames can't use previous frames, so must signal map + features
+    seg->update_map = 1;
+    seg->temporal_update = 0;
+    seg->update_data = 1;
+  } else {
+    seg->update_map = aom_rb_read_bit(rb);
+    if (seg->update_map) {
+      seg->temporal_update = aom_rb_read_bit(rb);
+    } else {
+      seg->temporal_update = 0;
+    }
+    seg->update_data = aom_rb_read_bit(rb);
+  }
+
+  // Segmentation data update
+  if (seg->update_data) {
+    av1_clearall_segfeatures(seg);
+
+    for (int i = 0; i < MAX_SEGMENTS; i++) {
+      for (int j = 0; j < SEG_LVL_MAX; j++) {
+        int data = 0;
+        const int feature_enabled = aom_rb_read_bit(rb);
+        if (feature_enabled) {
+          av1_enable_segfeature(seg, i, j);
+
+          const int data_max = av1_seg_feature_data_max(j);
+          const int data_min = -data_max;
+          const int ubits = get_unsigned_bits(data_max);
+
+          if (av1_is_segfeature_signed(j)) {
+            data = aom_rb_read_inv_signed_literal(rb, ubits);
+          } else {
+            data = aom_rb_read_literal(rb, ubits);
+          }
+
+          data = clamp(data, data_min, data_max);
+        }
+        av1_set_segdata(seg, i, j, data);
+      }
+    }
+    calculate_segdata(seg);
+  } else if (cm->prev_frame) {
+    segfeatures_copy(seg, &cm->prev_frame->seg);
+  }
+  segfeatures_copy(&cm->cur_frame->seg, seg);
+}
+
+static void decode_restoration_mode(AV1_COMMON *cm,
+                                    struct aom_read_bit_buffer *rb) {
+  assert(!cm->all_lossless);
+  const int num_planes = av1_num_planes(cm);
+  if (cm->allow_intrabc) return;
+  int all_none = 1, chroma_none = 1;
+  for (int p = 0; p < num_planes; ++p) {
+    RestorationInfo *rsi = &cm->rst_info[p];
+    if (aom_rb_read_bit(rb)) {
+      rsi->frame_restoration_type =
+          aom_rb_read_bit(rb) ? RESTORE_SGRPROJ : RESTORE_WIENER;
+    } else {
+      rsi->frame_restoration_type =
+          aom_rb_read_bit(rb) ? RESTORE_SWITCHABLE : RESTORE_NONE;
+    }
+    if (rsi->frame_restoration_type != RESTORE_NONE) {
+      all_none = 0;
+      chroma_none &= p == 0;
+    }
+  }
+  if (!all_none) {
+    assert(cm->seq_params.sb_size == BLOCK_64X64 ||
+           cm->seq_params.sb_size == BLOCK_128X128);
+    const int sb_size = cm->seq_params.sb_size == BLOCK_128X128 ? 128 : 64;
+
+    for (int p = 0; p < num_planes; ++p)
+      cm->rst_info[p].restoration_unit_size = sb_size;
+
+    RestorationInfo *rsi = &cm->rst_info[0];
+
+    if (sb_size == 64) {
+      rsi->restoration_unit_size <<= aom_rb_read_bit(rb);
+    }
+    if (rsi->restoration_unit_size > 64) {
+      rsi->restoration_unit_size <<= aom_rb_read_bit(rb);
+    }
+  } else {
+    const int size = RESTORATION_UNITSIZE_MAX;
+    for (int p = 0; p < num_planes; ++p)
+      cm->rst_info[p].restoration_unit_size = size;
+  }
+
+  if (num_planes > 1) {
+    int s = AOMMIN(cm->seq_params.subsampling_x, cm->seq_params.subsampling_y);
+    if (s && !chroma_none) {
+      cm->rst_info[1].restoration_unit_size =
+          cm->rst_info[0].restoration_unit_size >> (aom_rb_read_bit(rb) * s);
+    } else {
+      cm->rst_info[1].restoration_unit_size =
+          cm->rst_info[0].restoration_unit_size;
+    }
+    cm->rst_info[2].restoration_unit_size =
+        cm->rst_info[1].restoration_unit_size;
+  }
+}
+
+static void read_wiener_filter(int wiener_win, WienerInfo *wiener_info,
+                               WienerInfo *ref_wiener_info, aom_reader *rb) {
+  memset(wiener_info->vfilter, 0, sizeof(wiener_info->vfilter));
+  memset(wiener_info->hfilter, 0, sizeof(wiener_info->hfilter));
+
+  if (wiener_win == WIENER_WIN)
+    wiener_info->vfilter[0] = wiener_info->vfilter[WIENER_WIN - 1] =
+        aom_read_primitive_refsubexpfin(
+            rb, WIENER_FILT_TAP0_MAXV - WIENER_FILT_TAP0_MINV + 1,
+            WIENER_FILT_TAP0_SUBEXP_K,
+            ref_wiener_info->vfilter[0] - WIENER_FILT_TAP0_MINV, ACCT_STR) +
+        WIENER_FILT_TAP0_MINV;
+  else
+    wiener_info->vfilter[0] = wiener_info->vfilter[WIENER_WIN - 1] = 0;
+  wiener_info->vfilter[1] = wiener_info->vfilter[WIENER_WIN - 2] =
+      aom_read_primitive_refsubexpfin(
+          rb, WIENER_FILT_TAP1_MAXV - WIENER_FILT_TAP1_MINV + 1,
+          WIENER_FILT_TAP1_SUBEXP_K,
+          ref_wiener_info->vfilter[1] - WIENER_FILT_TAP1_MINV, ACCT_STR) +
+      WIENER_FILT_TAP1_MINV;
+  wiener_info->vfilter[2] = wiener_info->vfilter[WIENER_WIN - 3] =
+      aom_read_primitive_refsubexpfin(
+          rb, WIENER_FILT_TAP2_MAXV - WIENER_FILT_TAP2_MINV + 1,
+          WIENER_FILT_TAP2_SUBEXP_K,
+          ref_wiener_info->vfilter[2] - WIENER_FILT_TAP2_MINV, ACCT_STR) +
+      WIENER_FILT_TAP2_MINV;
+  // The central element has an implicit +WIENER_FILT_STEP
+  wiener_info->vfilter[WIENER_HALFWIN] =
+      -2 * (wiener_info->vfilter[0] + wiener_info->vfilter[1] +
+            wiener_info->vfilter[2]);
+
+  if (wiener_win == WIENER_WIN)
+    wiener_info->hfilter[0] = wiener_info->hfilter[WIENER_WIN - 1] =
+        aom_read_primitive_refsubexpfin(
+            rb, WIENER_FILT_TAP0_MAXV - WIENER_FILT_TAP0_MINV + 1,
+            WIENER_FILT_TAP0_SUBEXP_K,
+            ref_wiener_info->hfilter[0] - WIENER_FILT_TAP0_MINV, ACCT_STR) +
+        WIENER_FILT_TAP0_MINV;
+  else
+    wiener_info->hfilter[0] = wiener_info->hfilter[WIENER_WIN - 1] = 0;
+  wiener_info->hfilter[1] = wiener_info->hfilter[WIENER_WIN - 2] =
+      aom_read_primitive_refsubexpfin(
+          rb, WIENER_FILT_TAP1_MAXV - WIENER_FILT_TAP1_MINV + 1,
+          WIENER_FILT_TAP1_SUBEXP_K,
+          ref_wiener_info->hfilter[1] - WIENER_FILT_TAP1_MINV, ACCT_STR) +
+      WIENER_FILT_TAP1_MINV;
+  wiener_info->hfilter[2] = wiener_info->hfilter[WIENER_WIN - 3] =
+      aom_read_primitive_refsubexpfin(
+          rb, WIENER_FILT_TAP2_MAXV - WIENER_FILT_TAP2_MINV + 1,
+          WIENER_FILT_TAP2_SUBEXP_K,
+          ref_wiener_info->hfilter[2] - WIENER_FILT_TAP2_MINV, ACCT_STR) +
+      WIENER_FILT_TAP2_MINV;
+  // The central element has an implicit +WIENER_FILT_STEP
+  wiener_info->hfilter[WIENER_HALFWIN] =
+      -2 * (wiener_info->hfilter[0] + wiener_info->hfilter[1] +
+            wiener_info->hfilter[2]);
+  memcpy(ref_wiener_info, wiener_info, sizeof(*wiener_info));
+}
+
+static void read_sgrproj_filter(SgrprojInfo *sgrproj_info,
+                                SgrprojInfo *ref_sgrproj_info, aom_reader *rb) {
+  sgrproj_info->ep = aom_read_literal(rb, SGRPROJ_PARAMS_BITS, ACCT_STR);
+  const sgr_params_type *params = &sgr_params[sgrproj_info->ep];
+
+  if (params->r[0] == 0) {
+    sgrproj_info->xqd[0] = 0;
+    sgrproj_info->xqd[1] =
+        aom_read_primitive_refsubexpfin(
+            rb, SGRPROJ_PRJ_MAX1 - SGRPROJ_PRJ_MIN1 + 1, SGRPROJ_PRJ_SUBEXP_K,
+            ref_sgrproj_info->xqd[1] - SGRPROJ_PRJ_MIN1, ACCT_STR) +
+        SGRPROJ_PRJ_MIN1;
+  } else if (params->r[1] == 0) {
+    sgrproj_info->xqd[0] =
+        aom_read_primitive_refsubexpfin(
+            rb, SGRPROJ_PRJ_MAX0 - SGRPROJ_PRJ_MIN0 + 1, SGRPROJ_PRJ_SUBEXP_K,
+            ref_sgrproj_info->xqd[0] - SGRPROJ_PRJ_MIN0, ACCT_STR) +
+        SGRPROJ_PRJ_MIN0;
+    sgrproj_info->xqd[1] = clamp((1 << SGRPROJ_PRJ_BITS) - sgrproj_info->xqd[0],
+                                 SGRPROJ_PRJ_MIN1, SGRPROJ_PRJ_MAX1);
+  } else {
+    sgrproj_info->xqd[0] =
+        aom_read_primitive_refsubexpfin(
+            rb, SGRPROJ_PRJ_MAX0 - SGRPROJ_PRJ_MIN0 + 1, SGRPROJ_PRJ_SUBEXP_K,
+            ref_sgrproj_info->xqd[0] - SGRPROJ_PRJ_MIN0, ACCT_STR) +
+        SGRPROJ_PRJ_MIN0;
+    sgrproj_info->xqd[1] =
+        aom_read_primitive_refsubexpfin(
+            rb, SGRPROJ_PRJ_MAX1 - SGRPROJ_PRJ_MIN1 + 1, SGRPROJ_PRJ_SUBEXP_K,
+            ref_sgrproj_info->xqd[1] - SGRPROJ_PRJ_MIN1, ACCT_STR) +
+        SGRPROJ_PRJ_MIN1;
+  }
+
+  memcpy(ref_sgrproj_info, sgrproj_info, sizeof(*sgrproj_info));
+}
+
+static void loop_restoration_read_sb_coeffs(const AV1_COMMON *const cm,
+                                            MACROBLOCKD *xd,
+                                            aom_reader *const r, int plane,
+                                            int runit_idx) {
+  const RestorationInfo *rsi = &cm->rst_info[plane];
+  RestorationUnitInfo *rui = &rsi->unit_info[runit_idx];
+  if (rsi->frame_restoration_type == RESTORE_NONE) return;
+
+  assert(!cm->all_lossless);
+
+  const int wiener_win = (plane > 0) ? WIENER_WIN_CHROMA : WIENER_WIN;
+  WienerInfo *wiener_info = xd->wiener_info + plane;
+  SgrprojInfo *sgrproj_info = xd->sgrproj_info + plane;
+
+  if (rsi->frame_restoration_type == RESTORE_SWITCHABLE) {
+    rui->restoration_type =
+        aom_read_symbol(r, xd->tile_ctx->switchable_restore_cdf,
+                        RESTORE_SWITCHABLE_TYPES, ACCT_STR);
+    switch (rui->restoration_type) {
+      case RESTORE_WIENER:
+        read_wiener_filter(wiener_win, &rui->wiener_info, wiener_info, r);
+        break;
+      case RESTORE_SGRPROJ:
+        read_sgrproj_filter(&rui->sgrproj_info, sgrproj_info, r);
+        break;
+      default: assert(rui->restoration_type == RESTORE_NONE); break;
+    }
+  } else if (rsi->frame_restoration_type == RESTORE_WIENER) {
+    if (aom_read_symbol(r, xd->tile_ctx->wiener_restore_cdf, 2, ACCT_STR)) {
+      rui->restoration_type = RESTORE_WIENER;
+      read_wiener_filter(wiener_win, &rui->wiener_info, wiener_info, r);
+    } else {
+      rui->restoration_type = RESTORE_NONE;
+    }
+  } else if (rsi->frame_restoration_type == RESTORE_SGRPROJ) {
+    if (aom_read_symbol(r, xd->tile_ctx->sgrproj_restore_cdf, 2, ACCT_STR)) {
+      rui->restoration_type = RESTORE_SGRPROJ;
+      read_sgrproj_filter(&rui->sgrproj_info, sgrproj_info, r);
+    } else {
+      rui->restoration_type = RESTORE_NONE;
+    }
+  }
+}
+
+static void setup_loopfilter(AV1_COMMON *cm, struct aom_read_bit_buffer *rb) {
+  const int num_planes = av1_num_planes(cm);
+  struct loopfilter *lf = &cm->lf;
+  if (cm->allow_intrabc || cm->coded_lossless) {
+    // write default deltas to frame buffer
+    av1_set_default_ref_deltas(cm->cur_frame->ref_deltas);
+    av1_set_default_mode_deltas(cm->cur_frame->mode_deltas);
+    return;
+  }
+  assert(!cm->coded_lossless);
+  if (cm->prev_frame) {
+    // write deltas to frame buffer
+    memcpy(lf->ref_deltas, cm->prev_frame->ref_deltas, REF_FRAMES);
+    memcpy(lf->mode_deltas, cm->prev_frame->mode_deltas, MAX_MODE_LF_DELTAS);
+  } else {
+    av1_set_default_ref_deltas(lf->ref_deltas);
+    av1_set_default_mode_deltas(lf->mode_deltas);
+  }
+  lf->filter_level[0] = aom_rb_read_literal(rb, 6);
+  lf->filter_level[1] = aom_rb_read_literal(rb, 6);
+  if (num_planes > 1) {
+    if (lf->filter_level[0] || lf->filter_level[1]) {
+      lf->filter_level_u = aom_rb_read_literal(rb, 6);
+      lf->filter_level_v = aom_rb_read_literal(rb, 6);
+    }
+  }
+  lf->sharpness_level = aom_rb_read_literal(rb, 3);
+
+  // Read in loop filter deltas applied at the MB level based on mode or ref
+  // frame.
+  lf->mode_ref_delta_update = 0;
+
+  lf->mode_ref_delta_enabled = aom_rb_read_bit(rb);
+  if (lf->mode_ref_delta_enabled) {
+    lf->mode_ref_delta_update = aom_rb_read_bit(rb);
+    if (lf->mode_ref_delta_update) {
+      for (int i = 0; i < REF_FRAMES; i++)
+        if (aom_rb_read_bit(rb))
+          lf->ref_deltas[i] = aom_rb_read_inv_signed_literal(rb, 6);
+
+      for (int i = 0; i < MAX_MODE_LF_DELTAS; i++)
+        if (aom_rb_read_bit(rb))
+          lf->mode_deltas[i] = aom_rb_read_inv_signed_literal(rb, 6);
+    }
+  }
+
+  // write deltas to frame buffer
+  memcpy(cm->cur_frame->ref_deltas, lf->ref_deltas, REF_FRAMES);
+  memcpy(cm->cur_frame->mode_deltas, lf->mode_deltas, MAX_MODE_LF_DELTAS);
+}
+
+static void setup_cdef(AV1_COMMON *cm, struct aom_read_bit_buffer *rb) {
+  const int num_planes = av1_num_planes(cm);
+  CdefInfo *const cdef_info = &cm->cdef_info;
+
+  if (cm->allow_intrabc) return;
+  cdef_info->cdef_pri_damping = aom_rb_read_literal(rb, 2) + 3;
+  cdef_info->cdef_sec_damping = cdef_info->cdef_pri_damping;
+  cdef_info->cdef_bits = aom_rb_read_literal(rb, 2);
+  cdef_info->nb_cdef_strengths = 1 << cdef_info->cdef_bits;
+  for (int i = 0; i < cdef_info->nb_cdef_strengths; i++) {
+    cdef_info->cdef_strengths[i] = aom_rb_read_literal(rb, CDEF_STRENGTH_BITS);
+    cdef_info->cdef_uv_strengths[i] =
+        num_planes > 1 ? aom_rb_read_literal(rb, CDEF_STRENGTH_BITS) : 0;
+  }
+}
+
+static INLINE int read_delta_q(struct aom_read_bit_buffer *rb) {
+  return aom_rb_read_bit(rb) ? aom_rb_read_inv_signed_literal(rb, 6) : 0;
+}
+
+static void setup_quantization(AV1_COMMON *const cm,
+                               struct aom_read_bit_buffer *rb) {
+  const SequenceHeader *const seq_params = &cm->seq_params;
+  const int num_planes = av1_num_planes(cm);
+  cm->base_qindex = aom_rb_read_literal(rb, QINDEX_BITS);
+  cm->y_dc_delta_q = read_delta_q(rb);
+  if (num_planes > 1) {
+    int diff_uv_delta = 0;
+    if (seq_params->separate_uv_delta_q) diff_uv_delta = aom_rb_read_bit(rb);
+    cm->u_dc_delta_q = read_delta_q(rb);
+    cm->u_ac_delta_q = read_delta_q(rb);
+    if (diff_uv_delta) {
+      cm->v_dc_delta_q = read_delta_q(rb);
+      cm->v_ac_delta_q = read_delta_q(rb);
+    } else {
+      cm->v_dc_delta_q = cm->u_dc_delta_q;
+      cm->v_ac_delta_q = cm->u_ac_delta_q;
+    }
+  } else {
+    cm->u_dc_delta_q = 0;
+    cm->u_ac_delta_q = 0;
+    cm->v_dc_delta_q = 0;
+    cm->v_ac_delta_q = 0;
+  }
+  cm->using_qmatrix = aom_rb_read_bit(rb);
+  if (cm->using_qmatrix) {
+    cm->qm_y = aom_rb_read_literal(rb, QM_LEVEL_BITS);
+    cm->qm_u = aom_rb_read_literal(rb, QM_LEVEL_BITS);
+    if (!seq_params->separate_uv_delta_q)
+      cm->qm_v = cm->qm_u;
+    else
+      cm->qm_v = aom_rb_read_literal(rb, QM_LEVEL_BITS);
+  } else {
+    cm->qm_y = 0;
+    cm->qm_u = 0;
+    cm->qm_v = 0;
+  }
+}
+
+// Build y/uv dequant values based on segmentation.
+static void setup_segmentation_dequant(AV1_COMMON *const cm,
+                                       MACROBLOCKD *const xd) {
+  const int bit_depth = cm->seq_params.bit_depth;
+  const int using_qm = cm->using_qmatrix;
+  // When segmentation is disabled, only the first value is used.  The
+  // remaining are don't cares.
+  const int max_segments = cm->seg.enabled ? MAX_SEGMENTS : 1;
+  for (int i = 0; i < max_segments; ++i) {
+    const int qindex = xd->qindex[i];
+    cm->y_dequant_QTX[i][0] =
+        av1_dc_quant_QTX(qindex, cm->y_dc_delta_q, bit_depth);
+    cm->y_dequant_QTX[i][1] = av1_ac_quant_QTX(qindex, 0, bit_depth);
+    cm->u_dequant_QTX[i][0] =
+        av1_dc_quant_QTX(qindex, cm->u_dc_delta_q, bit_depth);
+    cm->u_dequant_QTX[i][1] =
+        av1_ac_quant_QTX(qindex, cm->u_ac_delta_q, bit_depth);
+    cm->v_dequant_QTX[i][0] =
+        av1_dc_quant_QTX(qindex, cm->v_dc_delta_q, bit_depth);
+    cm->v_dequant_QTX[i][1] =
+        av1_ac_quant_QTX(qindex, cm->v_ac_delta_q, bit_depth);
+    const int lossless = xd->lossless[i];
+    // NB: depends on base index so there is only 1 set per frame
+    // No quant weighting when lossless or signalled not using QM
+    int qmlevel = (lossless || using_qm == 0) ? NUM_QM_LEVELS - 1 : cm->qm_y;
+    for (int j = 0; j < TX_SIZES_ALL; ++j) {
+      cm->y_iqmatrix[i][j] = av1_iqmatrix(cm, qmlevel, AOM_PLANE_Y, j);
+    }
+    qmlevel = (lossless || using_qm == 0) ? NUM_QM_LEVELS - 1 : cm->qm_u;
+    for (int j = 0; j < TX_SIZES_ALL; ++j) {
+      cm->u_iqmatrix[i][j] = av1_iqmatrix(cm, qmlevel, AOM_PLANE_U, j);
+    }
+    qmlevel = (lossless || using_qm == 0) ? NUM_QM_LEVELS - 1 : cm->qm_v;
+    for (int j = 0; j < TX_SIZES_ALL; ++j) {
+      cm->v_iqmatrix[i][j] = av1_iqmatrix(cm, qmlevel, AOM_PLANE_V, j);
+    }
+  }
+}
+
+static InterpFilter read_frame_interp_filter(struct aom_read_bit_buffer *rb) {
+  return aom_rb_read_bit(rb) ? SWITCHABLE
+                             : aom_rb_read_literal(rb, LOG_SWITCHABLE_FILTERS);
+}
+
+static void setup_render_size(AV1_COMMON *cm, struct aom_read_bit_buffer *rb) {
+  cm->render_width = cm->superres_upscaled_width;
+  cm->render_height = cm->superres_upscaled_height;
+  if (aom_rb_read_bit(rb))
+    av1_read_frame_size(rb, 16, 16, &cm->render_width, &cm->render_height);
+}
+
+// TODO(afergs): make "struct aom_read_bit_buffer *const rb"?
+static void setup_superres(AV1_COMMON *const cm, struct aom_read_bit_buffer *rb,
+                           int *width, int *height) {
+  cm->superres_upscaled_width = *width;
+  cm->superres_upscaled_height = *height;
+
+  const SequenceHeader *const seq_params = &cm->seq_params;
+  if (!seq_params->enable_superres) return;
+
+  if (aom_rb_read_bit(rb)) {
+    cm->superres_scale_denominator =
+        (uint8_t)aom_rb_read_literal(rb, SUPERRES_SCALE_BITS);
+    cm->superres_scale_denominator += SUPERRES_SCALE_DENOMINATOR_MIN;
+    // Don't edit cm->width or cm->height directly, or the buffers won't get
+    // resized correctly
+    av1_calculate_scaled_superres_size(width, height,
+                                       cm->superres_scale_denominator);
+  } else {
+    // 1:1 scaling - ie. no scaling, scale not provided
+    cm->superres_scale_denominator = SCALE_NUMERATOR;
+  }
+}
+
+static void resize_context_buffers(AV1_COMMON *cm, int width, int height) {
+#if CONFIG_SIZE_LIMIT
+  if (width > DECODE_WIDTH_LIMIT || height > DECODE_HEIGHT_LIMIT)
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Dimensions of %dx%d beyond allowed size of %dx%d.",
+                       width, height, DECODE_WIDTH_LIMIT, DECODE_HEIGHT_LIMIT);
+#endif
+  if (cm->width != width || cm->height != height) {
+    av1_set_mb_mi(cm, width, height);
+    cm->width = width;
+    cm->height = height;
+  }
+
+  cm->cur_frame->width = cm->width;
+  cm->cur_frame->height = cm->height;
+}
+
+static void setup_buffer_pool(AV1_COMMON *cm) {
+  BufferPool *const pool = cm->buffer_pool;
+  const SequenceHeader *const seq_params = &cm->seq_params;
+
+  lock_buffer_pool(pool);
+  //if (aom_realloc_frame_buffer(
+  //        &cm->cur_frame->buf, cm->width, cm->height, seq_params->subsampling_x,
+  //        seq_params->subsampling_y, seq_params->use_highbitdepth,
+  //        AOM_DEC_BORDER_IN_PIXELS, cm->byte_alignment,
+  //        &cm->cur_frame->raw_frame_buffer, pool->get_fb_cb, pool->cb_priv)) {
+  //  unlock_buffer_pool(pool);
+  //  aom_internal_error(&cm->error, AOM_CODEC_MEM_ERROR,
+  //                     "Failed to allocate frame buffer");
+  //}
+  if (av1_reallocate_frame_buffer(pool->cb_priv,
+                                &cm->cur_frame->buf, 
+                                cm->width, 
+                                cm->height, 
+                                cm->superres_upscaled_width,
+                                seq_params->use_highbitdepth)) {
+    unlock_buffer_pool(pool);
+    aom_internal_error(&cm->error, AOM_CODEC_MEM_ERROR, "Failed to allocate frame buffer");
+  }
+  unlock_buffer_pool(pool);
+  ensure_mv_buffer(cm->cur_frame, cm);
+
+  cm->cur_frame->buf.bit_depth = (unsigned int)seq_params->bit_depth;
+  cm->cur_frame->buf.color_primaries = seq_params->color_primaries;
+  cm->cur_frame->buf.transfer_characteristics =
+      seq_params->transfer_characteristics;
+  cm->cur_frame->buf.matrix_coefficients = seq_params->matrix_coefficients;
+  cm->cur_frame->buf.monochrome = seq_params->monochrome;
+  cm->cur_frame->buf.chroma_sample_position =
+      seq_params->chroma_sample_position;
+  cm->cur_frame->buf.color_range = seq_params->color_range;
+  cm->cur_frame->buf.render_width = cm->render_width;
+  cm->cur_frame->buf.render_height = cm->render_height;
+}
+
+static void setup_frame_size(AV1_COMMON *cm, int frame_size_override_flag,
+                             struct aom_read_bit_buffer *rb) {
+  const SequenceHeader *const seq_params = &cm->seq_params;
+  int width, height;
+
+  if (frame_size_override_flag) {
+    int num_bits_width = seq_params->num_bits_width;
+    int num_bits_height = seq_params->num_bits_height;
+    av1_read_frame_size(rb, num_bits_width, num_bits_height, &width, &height);
+    if (width > seq_params->max_frame_width ||
+        height > seq_params->max_frame_height) {
+      aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                         "Frame dimensions are larger than the maximum values");
+    }
+  } else {
+    width = seq_params->max_frame_width;
+    height = seq_params->max_frame_height;
+  }
+
+  setup_superres(cm, rb, &width, &height);
+  resize_context_buffers(cm, width, height);
+  setup_render_size(cm, rb);
+  setup_buffer_pool(cm);
+}
+
+static void setup_sb_size(SequenceHeader *seq_params,
+                          struct aom_read_bit_buffer *rb) {
+  set_sb_size(seq_params, aom_rb_read_bit(rb) ? BLOCK_128X128 : BLOCK_64X64);
+}
+
+static INLINE int valid_ref_frame_img_fmt(aom_bit_depth_t ref_bit_depth,
+                                          int ref_xss, int ref_yss,
+                                          aom_bit_depth_t this_bit_depth,
+                                          int this_xss, int this_yss) {
+  return ref_bit_depth == this_bit_depth && ref_xss == this_xss &&
+         ref_yss == this_yss;
+}
+
+static void setup_frame_size_with_refs(AV1_COMMON *cm,
+                                       struct aom_read_bit_buffer *rb) {
+  int width, height;
+  int found = 0;
+  int has_valid_ref_frame = 0;
+  for (int i = LAST_FRAME; i <= ALTREF_FRAME; ++i) {
+    if (aom_rb_read_bit(rb)) {
+      const RefCntBuffer *const ref_buf = get_ref_frame_buf(cm, i);
+      // This will never be NULL in a normal stream, as streams are required to
+      // have a shown keyframe before any inter frames, which would refresh all
+      // the reference buffers. However, it might be null if we're starting in
+      // the middle of a stream, and static analysis will error if we don't do
+      // a null check here.
+      if (ref_buf == NULL) {
+        aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                           "Invalid condition: invalid reference buffer");
+      } else {
+        const YV12_BUFFER_CONFIG *const buf = &ref_buf->buf;
+        width = buf->y_crop_width;
+        height = buf->y_crop_height;
+        cm->render_width = buf->render_width;
+        cm->render_height = buf->render_height;
+        setup_superres(cm, rb, &width, &height);
+        resize_context_buffers(cm, width, height);
+        found = 1;
+        break;
+      }
+    }
+  }
+
+  const SequenceHeader *const seq_params = &cm->seq_params;
+  if (!found) {
+    int num_bits_width = seq_params->num_bits_width;
+    int num_bits_height = seq_params->num_bits_height;
+
+    av1_read_frame_size(rb, num_bits_width, num_bits_height, &width, &height);
+    setup_superres(cm, rb, &width, &height);
+    resize_context_buffers(cm, width, height);
+    setup_render_size(cm, rb);
+  }
+
+  if (width <= 0 || height <= 0)
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Invalid frame size");
+
+  // Check to make sure at least one of frames that this frame references
+  // has valid dimensions.
+  for (int i = LAST_FRAME; i <= ALTREF_FRAME; ++i) {
+    const RefCntBuffer *const ref_frame = get_ref_frame_buf(cm, i);
+    has_valid_ref_frame |=
+        valid_ref_frame_size(ref_frame->buf.y_crop_width,
+                             ref_frame->buf.y_crop_height, width, height);
+  }
+  if (!has_valid_ref_frame)
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Referenced frame has invalid size");
+  for (int i = LAST_FRAME; i <= ALTREF_FRAME; ++i) {
+    const RefCntBuffer *const ref_frame = get_ref_frame_buf(cm, i);
+    if (!valid_ref_frame_img_fmt(
+            ref_frame->buf.bit_depth, ref_frame->buf.subsampling_x,
+            ref_frame->buf.subsampling_y, seq_params->bit_depth,
+            seq_params->subsampling_x, seq_params->subsampling_y))
+      aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                         "Referenced frame has incompatible color format");
+  }
+  setup_buffer_pool(cm);
+}
+
+// Same function as av1_read_uniform but reading from uncompresses header wb
+static int rb_read_uniform(struct aom_read_bit_buffer *const rb, int n) {
+  const int l = get_unsigned_bits(n);
+  const int m = (1 << l) - n;
+  const int v = aom_rb_read_literal(rb, l - 1);
+  assert(l != 0);
+  if (v < m)
+    return v;
+  else
+    return (v << 1) - m + aom_rb_read_bit(rb);
+}
+
+static void read_tile_info_max_tile(AV1_COMMON *const cm,
+                                    struct aom_read_bit_buffer *const rb) {
+  int width_mi = ALIGN_POWER_OF_TWO(cm->mi_cols, cm->seq_params.mib_size_log2);
+  int height_mi = ALIGN_POWER_OF_TWO(cm->mi_rows, cm->seq_params.mib_size_log2);
+  int width_sb = width_mi >> cm->seq_params.mib_size_log2;
+  int height_sb = height_mi >> cm->seq_params.mib_size_log2;
+
+  av1_get_tile_limits(cm);
+  cm->uniform_tile_spacing_flag = aom_rb_read_bit(rb);
+
+  // Read tile columns
+  if (cm->uniform_tile_spacing_flag) {
+    cm->log2_tile_cols = cm->min_log2_tile_cols;
+    while (cm->log2_tile_cols < cm->max_log2_tile_cols) {
+      if (!aom_rb_read_bit(rb)) {
+        break;
+      }
+      cm->log2_tile_cols++;
+    }
+  } else {
+    int i;
+    int start_sb;
+    for (i = 0, start_sb = 0; width_sb > 0 && i < MAX_TILE_COLS; i++) {
+      const int size_sb =
+          1 + rb_read_uniform(rb, AOMMIN(width_sb, cm->max_tile_width_sb));
+      cm->tile_col_start_sb[i] = start_sb;
+      start_sb += size_sb;
+      width_sb -= size_sb;
+    }
+    cm->tile_cols = i;
+    cm->tile_col_start_sb[i] = start_sb + width_sb;
+  }
+  av1_calculate_tile_cols(cm);
+
+  // Read tile rows
+  if (cm->uniform_tile_spacing_flag) {
+    cm->log2_tile_rows = cm->min_log2_tile_rows;
+    while (cm->log2_tile_rows < cm->max_log2_tile_rows) {
+      if (!aom_rb_read_bit(rb)) {
+        break;
+      }
+      cm->log2_tile_rows++;
+    }
+  } else {
+    int i;
+    int start_sb;
+    for (i = 0, start_sb = 0; height_sb > 0 && i < MAX_TILE_ROWS; i++) {
+      const int size_sb =
+          1 + rb_read_uniform(rb, AOMMIN(height_sb, cm->max_tile_height_sb));
+      cm->tile_row_start_sb[i] = start_sb;
+      start_sb += size_sb;
+      height_sb -= size_sb;
+    }
+    cm->tile_rows = i;
+    cm->tile_row_start_sb[i] = start_sb + height_sb;
+  }
+  av1_calculate_tile_rows(cm);
+}
+
+void av1_set_single_tile_decoding_mode(AV1_COMMON *const cm) {
+  cm->single_tile_decoding = 0;
+  if (cm->large_scale_tile) {
+    struct loopfilter *lf = &cm->lf;
+    RestorationInfo *const rst_info = cm->rst_info;
+    const CdefInfo *const cdef_info = &cm->cdef_info;
+
+    // Figure out single_tile_decoding by loopfilter_level.
+    const int no_loopfilter = !(lf->filter_level[0] || lf->filter_level[1]);
+    const int no_cdef = cdef_info->cdef_bits == 0 &&
+                        cdef_info->cdef_strengths[0] == 0 &&
+                        cdef_info->cdef_uv_strengths[0] == 0;
+    const int no_restoration =
+        rst_info[0].frame_restoration_type == RESTORE_NONE &&
+        rst_info[1].frame_restoration_type == RESTORE_NONE &&
+        rst_info[2].frame_restoration_type == RESTORE_NONE;
+    assert(IMPLIES(cm->coded_lossless, no_loopfilter && no_cdef));
+    assert(IMPLIES(cm->all_lossless, no_restoration));
+    cm->single_tile_decoding = no_loopfilter && no_cdef && no_restoration;
+  }
+}
+
+static void read_tile_info(AV1Decoder *const pbi,
+                           struct aom_read_bit_buffer *const rb) {
+  AV1_COMMON *const cm = &pbi->common;
+
+  read_tile_info_max_tile(cm, rb);
+
+  cm->context_update_tile_id = 0;
+  if (cm->tile_rows * cm->tile_cols > 1) {
+    // tile to use for cdf update
+    cm->context_update_tile_id =
+        aom_rb_read_literal(rb, cm->log2_tile_rows + cm->log2_tile_cols);
+    if (cm->context_update_tile_id >= cm->tile_rows * cm->tile_cols) {
+      aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                         "Invalid context_update_tile_id");
+    }
+    // tile size magnitude
+    pbi->tile_size_bytes = aom_rb_read_literal(rb, 2) + 1;
+  }
+}
+
+#if EXT_TILE_DEBUG
+static void read_ext_tile_info(AV1Decoder *const pbi,
+                               struct aom_read_bit_buffer *const rb) {
+  AV1_COMMON *const cm = &pbi->common;
+
+  // This information is stored as a separate byte.
+  int mod = rb->bit_offset % CHAR_BIT;
+  if (mod > 0) aom_rb_read_literal(rb, CHAR_BIT - mod);
+  assert(rb->bit_offset % CHAR_BIT == 0);
+
+  if (cm->tile_cols * cm->tile_rows > 1) {
+    // Read the number of bytes used to store tile size
+    pbi->tile_col_size_bytes = aom_rb_read_literal(rb, 2) + 1;
+    pbi->tile_size_bytes = aom_rb_read_literal(rb, 2) + 1;
+  }
+}
+#endif  // EXT_TILE_DEBUG
+
+static size_t mem_get_varsize(const uint8_t *src, int sz) {
+  switch (sz) {
+    case 1: return src[0];
+    case 2: return mem_get_le16(src);
+    case 3: return mem_get_le24(src);
+    case 4: return mem_get_le32(src);
+    default: assert(0 && "Invalid size"); return -1;
+  }
+}
+
+#if EXT_TILE_DEBUG
+// Reads the next tile returning its size and adjusting '*data' accordingly
+// based on 'is_last'. On return, '*data' is updated to point to the end of the
+// raw tile buffer in the bit stream.
+static void get_ls_tile_buffer(
+    const uint8_t *const data_end, struct aom_internal_error_info *error_info,
+    const uint8_t **data, TileBufferDec (*const tile_buffers)[MAX_TILE_COLS],
+    int tile_size_bytes, int col, int row, int tile_copy_mode) {
+  size_t size;
+
+  size_t copy_size = 0;
+  const uint8_t *copy_data = NULL;
+
+  if (!read_is_valid(*data, tile_size_bytes, data_end))
+    aom_internal_error(error_info, AOM_CODEC_CORRUPT_FRAME,
+                       "Truncated packet or corrupt tile length");
+  size = mem_get_varsize(*data, tile_size_bytes);
+
+  // If tile_copy_mode = 1, then the top bit of the tile header indicates copy
+  // mode.
+  if (tile_copy_mode && (size >> (tile_size_bytes * 8 - 1)) == 1) {
+    // The remaining bits in the top byte signal the row offset
+    int offset = (size >> (tile_size_bytes - 1) * 8) & 0x7f;
+
+    // Currently, only use tiles in same column as reference tiles.
+    copy_data = tile_buffers[row - offset][col].data;
+    copy_size = tile_buffers[row - offset][col].size;
+    size = 0;
+  } else {
+    size += AV1_MIN_TILE_SIZE_BYTES;
+  }
+
+  *data += tile_size_bytes;
+
+  if (size > (size_t)(data_end - *data))
+    aom_internal_error(error_info, AOM_CODEC_CORRUPT_FRAME,
+                       "Truncated packet or corrupt tile size");
+
+  if (size > 0) {
+    tile_buffers[row][col].data = *data;
+    tile_buffers[row][col].size = size;
+  } else {
+    tile_buffers[row][col].data = copy_data;
+    tile_buffers[row][col].size = copy_size;
+  }
+
+  *data += size;
+}
+
+// Returns the end of the last tile buffer
+// (tile_buffers[cm->tile_rows - 1][cm->tile_cols - 1]).
+static const uint8_t *get_ls_tile_buffers(
+    AV1Decoder *pbi, const uint8_t *data, const uint8_t *data_end,
+    TileBufferDec (*const tile_buffers)[MAX_TILE_COLS]) {
+  AV1_COMMON *const cm = &pbi->common;
+  const int tile_cols = cm->tile_cols;
+  const int tile_rows = cm->tile_rows;
+  const int have_tiles = tile_cols * tile_rows > 1;
+  const uint8_t *raw_data_end;  // The end of the last tile buffer
+
+  if (!have_tiles) {
+    const size_t tile_size = data_end - data;
+    tile_buffers[0][0].data = data;
+    tile_buffers[0][0].size = tile_size;
+    raw_data_end = NULL;
+  } else {
+    // We locate only the tile buffers that are required, which are the ones
+    // specified by pbi->dec_tile_col and pbi->dec_tile_row. Also, we always
+    // need the last (bottom right) tile buffer, as we need to know where the
+    // end of the compressed frame buffer is for proper superframe decoding.
+
+    const uint8_t *tile_col_data_end[MAX_TILE_COLS] = { NULL };
+    const uint8_t *const data_start = data;
+
+    const int dec_tile_row = AOMMIN(pbi->dec_tile_row, tile_rows);
+    const int single_row = pbi->dec_tile_row >= 0;
+    const int tile_rows_start = single_row ? dec_tile_row : 0;
+    const int tile_rows_end = single_row ? tile_rows_start + 1 : tile_rows;
+    const int dec_tile_col = AOMMIN(pbi->dec_tile_col, tile_cols);
+    const int single_col = pbi->dec_tile_col >= 0;
+    const int tile_cols_start = single_col ? dec_tile_col : 0;
+    const int tile_cols_end = single_col ? tile_cols_start + 1 : tile_cols;
+
+    const int tile_col_size_bytes = pbi->tile_col_size_bytes;
+    const int tile_size_bytes = pbi->tile_size_bytes;
+    int tile_width, tile_height;
+    av1_get_uniform_tile_size(cm, &tile_width, &tile_height);
+    const int tile_copy_mode =
+        ((AOMMAX(tile_width, tile_height) << MI_SIZE_LOG2) <= 256) ? 1 : 0;
+    // Read tile column sizes for all columns (we need the last tile buffer)
+    for (int c = 0; c < tile_cols; ++c) {
+      const int is_last = c == tile_cols - 1;
+      size_t tile_col_size;
+
+      if (!is_last) {
+        tile_col_size = mem_get_varsize(data, tile_col_size_bytes);
+        data += tile_col_size_bytes;
+        tile_col_data_end[c] = data + tile_col_size;
+      } else {
+        tile_col_size = data_end - data;
+        tile_col_data_end[c] = data_end;
+      }
+      data += tile_col_size;
+    }
+
+    data = data_start;
+
+    // Read the required tile sizes.
+    for (int c = tile_cols_start; c < tile_cols_end; ++c) {
+      const int is_last = c == tile_cols - 1;
+
+      if (c > 0) data = tile_col_data_end[c - 1];
+
+      if (!is_last) data += tile_col_size_bytes;
+
+      // Get the whole of the last column, otherwise stop at the required tile.
+      for (int r = 0; r < (is_last ? tile_rows : tile_rows_end); ++r) {
+        get_ls_tile_buffer(tile_col_data_end[c], &pbi->common.error, &data,
+                           tile_buffers, tile_size_bytes, c, r, tile_copy_mode);
+      }
+    }
+
+    // If we have not read the last column, then read it to get the last tile.
+    if (tile_cols_end != tile_cols) {
+      const int c = tile_cols - 1;
+
+      data = tile_col_data_end[c - 1];
+
+      for (int r = 0; r < tile_rows; ++r) {
+        get_ls_tile_buffer(tile_col_data_end[c], &pbi->common.error, &data,
+                           tile_buffers, tile_size_bytes, c, r, tile_copy_mode);
+      }
+    }
+    raw_data_end = data;
+  }
+  return raw_data_end;
+}
+#endif  // EXT_TILE_DEBUG
+
+// Reads the next tile returning its size and adjusting '*data' accordingly
+// based on 'is_last'.
+static void get_tile_buffer(const uint8_t *const data_end,
+                            const int tile_size_bytes, int is_last,
+                            struct aom_internal_error_info *error_info,
+                            const uint8_t **data, TileBufferDec *const buf) {
+  size_t size;
+
+  if (!is_last) {
+    if (!read_is_valid(*data, tile_size_bytes, data_end))
+      aom_internal_error(error_info, AOM_CODEC_CORRUPT_FRAME,
+                         "Truncated packet or corrupt tile length");
+
+    size = mem_get_varsize(*data, tile_size_bytes) + AV1_MIN_TILE_SIZE_BYTES;
+    *data += tile_size_bytes;
+
+    if (size > (size_t)(data_end - *data))
+      aom_internal_error(error_info, AOM_CODEC_CORRUPT_FRAME,
+                         "Truncated packet or corrupt tile size");
+  } else {
+    size = data_end - *data;
+  }
+
+  buf->data = *data;
+  buf->size = size;
+
+  *data += size;
+}
+
+static void get_tile_buffers(AV1Decoder *pbi, const uint8_t *data,
+                             const uint8_t *data_end,
+                             TileBufferDec (*const tile_buffers)[MAX_TILE_COLS],
+                             int start_tile, int end_tile) {
+  AV1_COMMON *const cm = &pbi->common;
+  const int tile_cols = cm->tile_cols;
+  const int tile_rows = cm->tile_rows;
+  int tc = 0;
+  int first_tile_in_tg = 0;
+
+  for (int r = 0; r < tile_rows; ++r) {
+    for (int c = 0; c < tile_cols; ++c, ++tc) {
+      TileBufferDec *const buf = &tile_buffers[r][c];
+
+      const int is_last = (tc == end_tile);
+      const size_t hdr_offset = 0;
+
+      if (tc < start_tile || tc > end_tile) continue;
+
+      if (data + hdr_offset >= data_end)
+        aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                           "Data ended before all tiles were read.");
+      first_tile_in_tg += tc == first_tile_in_tg ? pbi->tg_size : 0;
+      data += hdr_offset;
+      get_tile_buffer(data_end, pbi->tile_size_bytes, is_last,
+                      &pbi->common.error, &data, buf);
+    }
+  }
+}
+
+static void set_cb_buffer(AV1Decoder *pbi, MACROBLOCKD *const xd,
+                          CB_BUFFER *cb_buffer_base, const int num_planes,
+                          int mi_row, int mi_col) {
+  AV1_COMMON *const cm = &pbi->common;
+  int mib_size_log2 = cm->seq_params.mib_size_log2;
+  int stride = (cm->mi_cols >> mib_size_log2) + 1;
+  int offset = (mi_row >> mib_size_log2) * stride + (mi_col >> mib_size_log2);
+  CB_BUFFER *cb_buffer = cb_buffer_base + offset;
+  xd->cb_buffer = cb_buffer;
+  //cb_buffer->dqcoeff_ptr = 0;
+  for (int plane = 0; plane < num_planes; ++plane) {
+    xd->plane[plane].eob_data = cb_buffer->eob_data[plane];
+    xd->cb_offset[plane] = 0;
+    xd->txb_offset[plane] = 0;
+  }
+  xd->plane[0].color_index_map = cb_buffer->color_index_map[0];
+  xd->plane[1].color_index_map = cb_buffer->color_index_map[1];
+  xd->color_index_map_offset[0] = 0;
+  xd->color_index_map_offset[1] = 0;
+}
+
+static void decoder_alloc_tile_data(AV1Decoder *pbi, const int n_tiles) {
+  pbi->tile_data = pbi->tile_data_alloc;
+  pbi->allocated_tiles = n_tiles;
+  pbi->allocated_row_mt_sync_rows = 0;
+}
+
+
+// Deallocate decoder row synchronization related mutex and data
+void av1_dec_row_mt_dealloc(AV1DecRowMTSync *dec_row_mt_sync) {
+  if (dec_row_mt_sync != NULL) {
+#if CONFIG_MULTITHREAD
+    int i;
+    if (dec_row_mt_sync->mutex_ != NULL) {
+      for (i = 0; i < dec_row_mt_sync->allocated_sb_rows; ++i) {
+        pthread_mutex_destroy(&dec_row_mt_sync->mutex_[i]);
+      }
+      aom_free(dec_row_mt_sync->mutex_);
+    }
+    if (dec_row_mt_sync->cond_ != NULL) {
+      for (i = 0; i < dec_row_mt_sync->allocated_sb_rows; ++i) {
+        pthread_cond_destroy(&dec_row_mt_sync->cond_[i]);
+      }
+      aom_free(dec_row_mt_sync->cond_);
+    }
+#endif  // CONFIG_MULTITHREAD
+    aom_free(dec_row_mt_sync->cur_sb_col);
+
+    // clear the structure as the source of this call may be a resize in which
+    // case this call will be followed by an _alloc() which may fail.
+    av1_zero(*dec_row_mt_sync);
+  }
+}
+
+
+static int check_trailing_bits_after_symbol_coder(aom_reader *r) {
+  if (aom_reader_has_overflowed(r)) return -1;
+
+  uint32_t nb_bits = aom_reader_tell(r);
+  uint32_t nb_bytes = (nb_bits + 7) >> 3;
+  const uint8_t *p = aom_reader_find_begin(r) + nb_bytes;
+
+  // aom_reader_tell() returns 1 for a newly initialized decoder, and the
+  // return value only increases as values are decoded. So nb_bits > 0, and
+  // thus p > p_begin. Therefore accessing p[-1] is safe.
+  uint8_t last_byte = p[-1];
+  uint8_t pattern = 128 >> ((nb_bits - 1) & 7);
+  if ((last_byte & (2 * pattern - 1)) != pattern) return -1;
+
+  // Make sure that all padding bytes are zero as required by the spec.
+  const uint8_t *p_end = aom_reader_find_end(r);
+  while (p < p_end) {
+    if (*p != 0) return -1;
+    p++;
+  }
+  return 0;
+}
+
+static void set_decode_func_pointers(ThreadData *td, int parse_decode_flag) {
+  (void)parse_decode_flag;
+  td->read_coeffs_tx_intra_block_visit = decode_block_void;
+  td->predict_and_recon_intra_block_visit = decode_block_void;
+  td->read_coeffs_tx_inter_block_visit = decode_block_void;
+  td->inverse_tx_inter_block_visit = decode_block_void;
+  td->predict_inter_block_visit = predict_inter_block_void;
+  td->cfl_store_inter_block_visit = cfl_store_inter_block_void;
+
+  td->read_coeffs_tx_intra_block_visit = read_coeffs_tx_intra_block;
+  td->read_coeffs_tx_inter_block_visit = av1_read_coeffs_txb_facade;
+}
+
+static void decode_tile2(AV1Decoder *pbi, ThreadData *const td, int tile_row,
+                         int tile_col, int flags) {
+  TileInfo tile_info;
+  int block = 0;
+  AV1_COMMON *const cm = &pbi->common;
+  const int num_planes = av1_num_planes(cm);
+
+  static int tile_no = 0;
+  ++tile_no;
+
+  set_decode_func_pointers(td, 0x1);
+
+  av1_tile_set_row(&tile_info, cm, tile_row);
+  av1_tile_set_col(&tile_info, cm, tile_col);
+  av1_zero_above_context(cm, &td->xd, tile_info.mi_col_start,
+                         tile_info.mi_col_end, tile_row);
+  av1_reset_loop_filter_delta(&td->xd, num_planes);
+  av1_reset_loop_restoration(&td->xd, num_planes);
+
+  av1_setup_macroblockd(pbi, cm, td, &tile_info);
+
+  for (int mi_row = tile_info.mi_row_start; mi_row < tile_info.mi_row_end; mi_row += cm->seq_params.mib_size)
+  {
+    av1_zero_left_context(&td->xd);
+    for (int mi_col = tile_info.mi_col_start; mi_col < tile_info.mi_col_end; mi_col += cm->seq_params.mib_size)
+    {
+      if (!td->ext_idct_buffer && td->tile_data->dq_buffer_ptr >= td->tile_data->dq_buffer_max) {
+          av1_setup_ext_coef_buffer(pbi, cm, td);
+          td->ext_idct_buffer = 1;
+      }
+      
+      set_cb_buffer(pbi, &td->xd, &td->cb_buffer_base, num_planes, 0, 0);
+      // Bit-stream parsing and decoding of the superblock
+      decode_partition(pbi, td, mi_row, mi_col, td->bit_reader,
+                       cm->seq_params.sb_size, 1);
+
+      if (aom_reader_has_overflowed(td->bit_reader)) {
+        aom_merge_corrupted_flag(&td->xd.corrupted, 1);
+        return;
+      }
+    }
+  }
+
+  int corrupted =
+      (check_trailing_bits_after_symbol_coder(td->bit_reader)) ? 1 : 0;
+  aom_merge_corrupted_flag(&td->xd.corrupted, corrupted);
+
+#if 1
+  td->tile_data->mi_count = td->mi_count;
+  if (td->tile_data2)
+    td->tile_data2->mi_count = td->mi_count2;
+  for (block = 0; block < td->mi_count; ++block) {
+      MB_MODE_INFO *mi = &td->mi_pool[block];
+      const int mi_col = mi->mi_col;
+      const int mi_row = mi->mi_row;
+      const int bsize = mi->sb_type;
+      MACROBLOCKD *const xd = &td->xd;
+
+      set_offsets_for_pred_and_recon(pbi, td, mi_row, mi_col, bsize);
+      av1_mi_push_block(pbi, cm, xd);
+  }
+  for (block = 0; block < td->mi_count2; ++block) {
+      MB_MODE_INFO *mi = &td->mi_pool2[block];
+      const int mi_col = mi->mi_col;
+      const int mi_row = mi->mi_row;
+      const int bsize = mi->sb_type;
+      MACROBLOCKD *const xd = &td->xd;
+      set_offsets_for_pred_and_recon(pbi, td, mi_row, mi_col, bsize);
+      av1_mi_push_block(pbi, cm, xd);
+  }
+
+#else
+  set_decode_func_pointers(td, 0x2);
+
+  //idct ;
+  {
+      for (int mi_row = tile_info.mi_row_start; mi_row < tile_info.mi_row_end; mi_row += cm->seq_params.mib_size)
+      {
+          for (int mi_col = tile_info.mi_col_start; mi_col < tile_info.mi_col_end; mi_col += cm->seq_params.mib_size)
+          {
+              set_cb_buffer(pbi, &td->xd, pbi->cb_buffer_base, num_planes, mi_row, mi_col);
+              CB_BUFFER *cb = td->xd.cb_buffer;
+              for (int blk = 0; blk < cb->tx_block_count; ++blk)
+              {
+                  tx_block_info * tx_block = &cb->tx_block_data[blk];
+                  if (tx_block->eob)
+                  {
+                      const int bx = tx_block->x;
+                      const int by = tx_block->y;
+                      const int plane = tx_block->plane;
+                      const struct macroblockd_plane *const pd = &td->xd.plane[plane];
+                      const int dst_stride = ((cm->mi_cols + 31) & (~31)) * 4 >> (plane > 0);
+                      int16_t * dst = pbi->res_buffer_base[plane] + bx + by * dst_stride;
+                      tran_low_t *const dqcoeff = pd->dqcoeff_block + tx_block->dq_offset;
+                      uint16_t scan_line = tx_block->max_scan_line;
+                      uint16_t eob = tx_block->eob;
+                      TxfmParam params;
+                      params.tx_type = tx_block->tx_type;
+                      params.tx_size = tx_block->tx_size;
+                      params.eob = tx_block->eob;
+                      params.lossless = tx_block->flags;
+                      params.bd = td->xd.bd;
+                      params.is_hbd = is_cur_buf_hbd(&td->xd);
+                      params.tx_set_type = EXT_TX_SET_ALL16;
+
+
+                      av1_highbd_inv_txfm_c(dqcoeff, dst, dst_stride, &params);
+                    //  memset(dqcoeff, 0, (scan_line + 1) * sizeof(dqcoeff[0]));
+                  }
+              }
+          }
+      }
+
+      for (int mi_row = tile_info.mi_row_start; mi_row < tile_info.mi_row_end; mi_row += cm->seq_params.mib_size)
+      {
+          for (int mi_col = tile_info.mi_col_start; mi_col < tile_info.mi_col_end; mi_col += cm->seq_params.mib_size)
+          {
+              set_cb_buffer(pbi, &td->xd, pbi->cb_buffer_base, num_planes, mi_row, mi_col);
+              CB_BUFFER *cb = td->xd.cb_buffer;
+              for (int blk = 0; blk < cb->tx_block_count; ++blk)
+              {
+                  tx_block_info * tx_block = &cb->tx_block_data[blk];
+                  if (tx_block->eob)
+                  {
+                      const int plane = tx_block->plane;
+                      const struct macroblockd_plane *const pd = &td->xd.plane[plane];
+                      tran_low_t *const dqcoeff = pd->dqcoeff_block + tx_block->dq_offset;
+                      uint16_t scan_line = tx_block->max_scan_line;
+                      memset(dqcoeff, 0, (scan_line + 1) * sizeof(dqcoeff[0]));
+                  }
+              }
+          }
+      }
+  }
+
+  for (block = 0; block < td->mi_count; ++block) {
+    MB_MODE_INFO *mi = &td->mi_pool[block];
+    const int sb_mask = cm->seq_params.mib_size - 1;    
+    const int mi_col = mi->mi_col;
+    const int mi_row = mi->mi_row;
+    const int bsize = mi->sb_type;
+    MACROBLOCKD *const xd = &td->xd;
+
+    if ((mi_col & sb_mask) == 0 && (mi_row & sb_mask) == 0)
+      set_cb_buffer(pbi, &td->xd, pbi->cb_buffer_base, num_planes, mi_row, mi_col);
+
+    set_offsets_for_pred_and_recon(pbi, td, mi_row, mi_col, bsize);
+
+    av1_mi_push_block(cm, xd);
+
+    {
+      MB_MODE_INFO *mbmi = xd->mi[0];
+      CFL_CTX *const cfl = &xd->cfl;
+      cfl->is_chroma_reference = is_chroma_reference(
+          mi_row, mi_col, bsize, cfl->subsampling_x, cfl->subsampling_y);
+
+
+
+      if (!is_inter_block(mbmi)) {
+        int row, col;
+        assert(bsize == get_plane_block_size(bsize, xd->plane[0].subsampling_x, xd->plane[0].subsampling_y));
+        const int max_blocks_wide = max_block_wide(xd, bsize, 0);
+        const int max_blocks_high = max_block_high(xd, bsize, 0);
+        const BLOCK_SIZE max_unit_bsize = BLOCK_64X64;
+        int mu_blocks_wide =
+            block_size_wide[max_unit_bsize] >> tx_size_wide_log2[0];
+        int mu_blocks_high =
+            block_size_high[max_unit_bsize] >> tx_size_high_log2[0];
+
+        mu_blocks_wide = AOMMIN(max_blocks_wide, mu_blocks_wide);
+        mu_blocks_high = AOMMIN(max_blocks_high, mu_blocks_high);
+
+        for (row = 0; row < max_blocks_high; row += mu_blocks_high) {
+          for (col = 0; col < max_blocks_wide; col += mu_blocks_wide) {
+            for (int plane = 0; plane < num_planes; ++plane) {
+              const struct macroblockd_plane *const pd = &xd->plane[plane];
+              if (!is_chroma_reference(mi_row, mi_col, bsize, pd->subsampling_x,
+                                       pd->subsampling_y))
+                continue;
+
+              const TX_SIZE tx_size = av1_get_tx_size(plane, xd);
+              const int stepr = tx_size_high_unit[tx_size];
+              const int stepc = tx_size_wide_unit[tx_size];
+
+              const int unit_height = ROUND_POWER_OF_TWO(
+                  AOMMIN(mu_blocks_high + row, max_blocks_high),
+                  pd->subsampling_y);
+              const int unit_width = ROUND_POWER_OF_TWO(
+                  AOMMIN(mu_blocks_wide + col, max_blocks_wide),
+                  pd->subsampling_x);
+
+              for (int blk_row = row >> pd->subsampling_y; blk_row < unit_height; blk_row += stepr) {
+                  for (int blk_col = col >> pd->subsampling_x; blk_col < unit_width; blk_col += stepc) {
+                      int c, r;
+                      av1_predict_intra_block_facade(cm, xd, plane, blk_col, blk_row, tx_size);
+                      uint8_t *dst_ = &pd->dst.buf[(blk_row * pd->dst.stride + blk_col) << tx_size_wide_log2[0]];
+                      const int new_res_stride = ((cm->mi_cols + 31) & (~31)) * 4 >> (plane > 0);
+                      int res_x = (((mi_col * MI_SIZE) >> pd->subsampling_x) + (blk_col << tx_size_wide_log2[0])) & (~3);
+                      int res_y = (((mi_row * MI_SIZE) >> pd->subsampling_y) + (blk_row << tx_size_wide_log2[0])) & (~3);
+
+                      int16_t * new_res_ = pbi->res_buffer_base[plane] + res_x + res_y * new_res_stride;
+                          //((mi_col * MI_SIZE) >> pd->subsampling_x) +
+                          //((mi_row * MI_SIZE) >> pd->subsampling_y) * new_res_stride +
+                          //((blk_col + blk_row * new_res_stride) << tx_size_wide_log2[0]);
+                      uint8_t * dst = dst_;
+                      int16_t * new_res = new_res_;
+                      if (!mi->skip)
+                      {
+                          for (r = 0; r < (stepr << 2); ++r)
+                          {
+                              for (c = 0; c < (stepc << 2); ++c)
+                              {
+                                  dst[c] = clip_pixel_highbd((int)new_res[c] + (int)dst[c], xd->bd);
+                              }
+                              dst += pd->dst.stride;
+                              new_res += new_res_stride;
+                          }
+                      }
+
+                      if (plane == AOM_PLANE_Y && store_cfl_required(cm, xd)) {
+                          cfl_store_tx(xd, blk_row, blk_col, tx_size, mbmi->sb_type);
+                      }
+                  }
+              }
+            }
+          }
+        }
+      } else {
+        predict_inter_block(cm, xd, mi_row, mi_col, bsize);
+        
+        //if (is_interintra_pred(mbmi))
+        //    av1_inter_block_run(cm, xd);
+
+        if (!mbmi->skip)
+        {
+            for (int plane = 0; plane < 3; ++plane)
+            {
+                const struct macroblockd_plane *const pd = &xd->plane[plane];
+                if (is_chroma_reference(mi_row, mi_col, bsize, pd->subsampling_x, pd->subsampling_y))
+                {
+                    int x, y;
+                    int w = MAX(4, block_size_wide[bsize] >> pd->subsampling_x);
+                    int h = MAX(4, block_size_high[bsize] >> pd->subsampling_y);
+                    uint8_t * dst = pd->dst.buf;
+                    const int stride = pd->dst.stride;
+                    const int new_res_stride = ((cm->mi_cols + 31) & (~31)) * 4 >> (plane > 0);
+                    
+                    int mi_col1 = mi_col;
+                    int mi_row1 = mi_row;
+                    if (pd->subsampling_y && (mi_row & 0x01) && (mi_size_high[mbmi->sb_type] == 1))
+                        mi_row1 -= 1;
+                    if (pd->subsampling_x && (mi_col & 0x01) && (mi_size_wide[mbmi->sb_type] == 1))
+                        mi_col1 -= 1;
+                    int16_t * new_res = pbi->res_buffer_base[plane] +
+                        ((mi_col1 * MI_SIZE) >> pd->subsampling_x) +
+                        ((mi_row1 * MI_SIZE) >> pd->subsampling_y) * new_res_stride;
+                    for (y = 0; y < h; ++y)
+                    {
+                        for (x = 0; x < w; ++x)
+                        {
+                            dst[x] = clip_pixel_highbd((int)new_res[x] + (int)dst[x], xd->bd);
+                        }
+                        dst += stride;
+                        new_res += new_res_stride;
+                    }
+                }
+            }
+        }
+        cfl_store_inter_block(cm, xd);
+      }
+
+    //  if (/*!is_inter_block(mbmi) || is_intrabc_block(mbmi)*/ is_interintra_pred(mbmi))
+    //  {
+    //      int16_t * res[3];
+    //      const int res_stride = ((cm->mi_cols + 31) & (~31)) * 4;
+    //      res[0] = pbi->res_buffer_base[0] + (mi_col * MI_SIZE) + (mi_row * MI_SIZE) * res_stride;
+    //      res[1] = pbi->res_buffer_base[1] + ((mi_col * MI_SIZE) >> 1) + ((mi_row * MI_SIZE) >> 1) * (res_stride >> 1);
+    //      res[2] = pbi->res_buffer_base[2] + ((mi_col * MI_SIZE) >> 1) + ((mi_row * MI_SIZE) >> 1) * (res_stride >> 1);
+    //      av1_intra_block_run(cm, xd, res, res_stride);
+    //  }
+
+
+      av1_visit_palette(pbi, xd, mi_row, mi_col, NULL, bsize,
+                        set_color_index_map_offset);
+    }
+  }
+
+//  {
+//  static int tile_idx = 0;
+//  ++tile_idx;
+//    //      if (tile_idx >= 9)
+//          {
+//              int mib_size_log2 = cm->seq_params.mib_size_log2;
+//              int stride = (cm->mi_cols >> mib_size_log2) + 1;
+//              int y_stride = ((cm->mi_cols + 31) & (~31)) * 4;
+//              av1_idct_run(&td->xd, pbi->cb_buffer_base, tile_info, stride, mib_size_log2, pbi->res_buffer_base, y_stride);
+//              av1_prediction_run_all(cm, &tile_info);
+//          }
+//
+//  }
+#endif
+}
+
+
+static TileJobsDec *get_dec_job_info(AV1DecTileMT *tile_mt_info) {
+  TileJobsDec *cur_job_info = NULL;
+#if CONFIG_MULTITHREAD
+  pthread_mutex_lock(tile_mt_info->job_mutex);
+
+  if (tile_mt_info->jobs_dequeued < tile_mt_info->jobs_enqueued) {
+    cur_job_info = tile_mt_info->job_queue + tile_mt_info->jobs_dequeued;
+    tile_mt_info->jobs_dequeued++;
+  }
+
+  pthread_mutex_unlock(tile_mt_info->job_mutex);
+#else
+  (void)tile_mt_info;
+#endif
+  return cur_job_info;
+}
+
+static void tile_worker_hook_init(AV1Decoder *const pbi,
+                                  DecWorkerData *const thread_data,
+                                  const TileBufferDec *const tile_buffer,
+                                  TileDataDec *const tile_data,
+                                  uint8_t allow_update_cdf) {
+  AV1_COMMON *cm = &pbi->common;
+  ThreadData *const td = thread_data->td;
+  int tile_row = tile_data->tile_info.tile_row;
+  int tile_col = tile_data->tile_info.tile_col;
+
+  td->bit_reader = tile_data->is_last? &pbi->bit_reader_end : &td->thread_bit_reader;
+  av1_tile_init(&td->xd.tile, cm, tile_row, tile_col);
+  td->xd.current_qindex = cm->base_qindex;
+  setup_bool_decoder(tile_buffer->data, thread_data->data_end,
+                     tile_buffer->size, &thread_data->error_info,
+                     td->bit_reader, allow_update_cdf);
+#if CONFIG_ACCOUNTING
+  if (pbi->acct_enabled) {
+    td->bit_reader->accounting = &pbi->accounting;
+    td->bit_reader->accounting->last_tell_frac =
+        aom_reader_tell_frac(td->bit_reader);
+  } else {
+    td->bit_reader->accounting = NULL;
+  }
+#endif
+  av1_init_macroblockd(cm, &td->xd, NULL);
+  td->xd.error_info = &thread_data->error_info;
+  av1_init_above_context(cm, &td->xd, tile_row);
+
+  // Initialise the tile context from the frame context
+  td->xd.tile_ctx = tile_data->is_choosen_one ? &pbi->ctx_the_choosen_one : &td->tctx;
+  *td->xd.tile_ctx = *cm->fc;
+#if CONFIG_ACCOUNTING
+  if (pbi->acct_enabled) {
+    tile_data->bit_reader.accounting->last_tell_frac =
+        aom_reader_tell_frac(&tile_data->bit_reader);
+  }
+#endif
+}
+
+static int tile_worker_hook(void *arg1, void *arg2) {
+  DecWorkerData *const thread_data = (DecWorkerData *)arg1;
+  AV1Decoder *const pbi = (AV1Decoder *)arg2;
+  AV1_COMMON *cm = &pbi->common;
+  ThreadData *const td = thread_data->td;
+  uint8_t allow_update_cdf;
+
+  // The jmp_buf is valid only for the duration of the function that calls
+  // setjmp(). Therefore, this function must reset the 'setjmp' field to 0
+  // before it returns.
+  if (setjmp(thread_data->error_info.jmp)) {
+    thread_data->error_info.setjmp = 0;
+    thread_data->td->xd.corrupted = 1;
+    return 0;
+  }
+  thread_data->error_info.setjmp = 1;
+
+  allow_update_cdf = cm->large_scale_tile ? 0 : 1;
+  allow_update_cdf = allow_update_cdf && !cm->disable_cdf_update;
+
+  assert(cm->tile_cols > 0);
+  while (!td->xd.corrupted) {
+    TileJobsDec *cur_job_info = get_dec_job_info(&pbi->tile_mt_info);
+
+    if (cur_job_info != NULL) {
+      const TileBufferDec *const tile_buffer = cur_job_info->tile_buffer;
+      TileDataDec *const tile_data = cur_job_info->tile_data;
+      tile_worker_hook_init(pbi, thread_data, tile_buffer, tile_data,
+                            allow_update_cdf);
+      // decode tile
+      int tile_row = tile_data->tile_info.tile_row;
+      int tile_col = tile_data->tile_info.tile_col;
+
+      decode_tile2(pbi, td, tile_row, tile_col, 0);
+    } else {
+      break;
+    }
+  }
+  thread_data->error_info.setjmp = 0;
+  return !td->xd.corrupted;
+}
+
+// sorts in descending order
+static int compare_tile_buffers(const void *a, const void *b) {
+  const TileJobsDec *const buf1 = (const TileJobsDec *)a;
+  const TileJobsDec *const buf2 = (const TileJobsDec *)b;
+  return (((int)buf2->tile_buffer->size) - ((int)buf1->tile_buffer->size));
+}
+
+static void enqueue_tile_jobs(AV1Decoder *pbi, AV1_COMMON *cm,
+                              int tile_rows_start, int tile_rows_end,
+                              int tile_cols_start, int tile_cols_end,
+                              int start_tile, int end_tile) {
+  AV1DecTileMT *tile_mt_info = &pbi->tile_mt_info;
+  TileJobsDec *tile_job_queue = tile_mt_info->job_queue;
+  tile_mt_info->jobs_enqueued = 0;
+  tile_mt_info->jobs_dequeued = 0;
+
+  for (int row = tile_rows_start; row < tile_rows_end; row++) {
+    for (int col = tile_cols_start; col < tile_cols_end; col++) {
+      if (row * cm->tile_cols + col < start_tile ||
+          row * cm->tile_cols + col > end_tile)
+        continue;
+      tile_job_queue->tile_buffer = &pbi->tile_buffers[row][col];
+      tile_job_queue->tile_data = pbi->tile_data + row * cm->tile_cols + col;
+      tile_job_queue++;
+      tile_mt_info->jobs_enqueued++;
+    }
+  }
+}
+
+static void alloc_dec_jobs(AV1DecTileMT *tile_mt_info, AV1_COMMON *cm,
+                           int tile_rows, int tile_cols) {
+  tile_mt_info->alloc_tile_rows = tile_rows;
+  tile_mt_info->alloc_tile_cols = tile_cols;
+  int num_tiles = tile_rows * tile_cols;
+#if CONFIG_MULTITHREAD
+  {
+    for (int i = 0; i < num_tiles; i++) {
+      pthread_mutex_init(&tile_mt_info->job_mutex[i], NULL);
+    }
+  }
+#endif
+}
+
+void av1_free_mc_tmp_buf(ThreadData *thread_data) {
+  int ref;
+  if (thread_data)
+  {
+      for (ref = 0; ref < 2; ref++) {
+          if (thread_data->mc_buf_use_highbd)
+              aom_free(CONVERT_TO_SHORTPTR(thread_data->mc_buf[ref]));
+          else
+              aom_free(thread_data->mc_buf[ref]);
+          thread_data->mc_buf[ref] = NULL;
+      }
+      thread_data->mc_buf_size = 0;
+      thread_data->mc_buf_use_highbd = 0;
+
+      aom_free(thread_data->tmp_conv_dst);
+      thread_data->tmp_conv_dst = NULL;
+      for (int i = 0; i < 2; ++i) {
+          aom_free(thread_data->tmp_obmc_bufs[i]);
+          thread_data->tmp_obmc_bufs[i] = NULL;
+      }
+  }
+}
+
+
+static void reset_dec_workers(AV1Decoder *pbi, AVxWorkerHook worker_hook,
+                              int num_workers) {
+  const AVxWorkerInterface *const winterface = aom_get_worker_interface();
+
+  // Reset tile decoding hook
+  for (int worker_idx = 0; worker_idx < num_workers; ++worker_idx) {
+    AVxWorker *const worker = &pbi->tile_workers[worker_idx];
+    DecWorkerData *const thread_data = pbi->thread_data + worker_idx;
+    thread_data->td->xd = pbi->mb;
+    thread_data->td->xd.corrupted = 0;
+    thread_data->td->xd.mc_buf[0] = thread_data->td->mc_buf[0];
+    thread_data->td->xd.mc_buf[1] = thread_data->td->mc_buf[1];
+    thread_data->td->xd.tmp_conv_dst = thread_data->td->tmp_conv_dst;
+    for (int j = 0; j < 2; ++j) {
+      thread_data->td->xd.tmp_obmc_bufs[j] = thread_data->td->tmp_obmc_bufs[j];
+    }
+    thread_data->td->thread_id = worker_idx;
+
+    winterface->sync(worker);
+    
+    worker->hook = worker_hook;
+    worker->data1 = thread_data;
+    worker->data2 = pbi;
+  }
+#if CONFIG_ACCOUNTING
+  if (pbi->acct_enabled) {
+    aom_accounting_reset(&pbi->accounting);
+  }
+#endif
+}
+
+static void launch_dec_workers(AV1Decoder *pbi, const uint8_t *data_end,
+                               int num_workers) {
+  const AVxWorkerInterface *const winterface = aom_get_worker_interface();
+
+  for (int worker_idx = 0; worker_idx < num_workers; ++worker_idx) {
+    AVxWorker *const worker = &pbi->tile_workers[worker_idx];
+    DecWorkerData *const thread_data = (DecWorkerData *)worker->data1;
+
+    thread_data->data_end = data_end;
+
+    worker->had_error = 0;
+    if (worker_idx == num_workers - 1) {
+      winterface->execute(worker);
+    } else {
+      winterface->launch(worker);
+    }
+  }
+}
+
+static void sync_dec_workers(AV1Decoder *pbi, int num_workers) {
+  const AVxWorkerInterface *const winterface = aom_get_worker_interface();
+  int corrupted = 0;
+
+  for (int worker_idx = num_workers; worker_idx > 0; --worker_idx) {
+    AVxWorker *const worker = &pbi->tile_workers[worker_idx - 1];
+    aom_merge_corrupted_flag(&corrupted, !winterface->sync(worker));
+  }
+
+  pbi->mb.corrupted = corrupted;
+}
+
+static void decode_mt_init(AV1Decoder *pbi) {
+  AV1_COMMON *const cm = &pbi->common;
+  const AVxWorkerInterface *const winterface = aom_get_worker_interface();
+  int worker_idx;
+
+  // Create workers and thread_data
+  if (pbi->num_workers == 0) {
+    const int num_threads = pbi->max_threads;
+    memset(pbi->tile_workers, 0, sizeof(pbi->tile_workers));
+    memset(pbi->thread_data, 0, sizeof(pbi->thread_data));
+
+    for (worker_idx = 0; worker_idx < num_threads; ++worker_idx) {
+      AVxWorker *const worker = &pbi->tile_workers[worker_idx];
+      DecWorkerData *const thread_data = pbi->thread_data + worker_idx;
+      ++pbi->num_workers;
+
+      winterface->init(worker);
+      worker->thread_name = "aom tile worker";
+      if (worker_idx < num_threads - 1 && !winterface->reset(worker)) {
+        aom_internal_error(&cm->error, AOM_CODEC_MEM_ERROR,
+                           "Tile decoder thread creation failed");
+      }
+        // Main thread acts as a worker and uses the thread data in pbi
+      thread_data->td = &pbi->td[worker_idx];
+      thread_data->error_info.error_code = AOM_CODEC_OK;
+      thread_data->error_info.setjmp = 0;
+    }
+  }
+}
+
+static void tile_mt_queue(AV1Decoder *pbi, int tile_cols, int tile_rows,
+                          int tile_rows_start, int tile_rows_end,
+                          int tile_cols_start, int tile_cols_end,
+                          int start_tile, int end_tile) {
+  AV1_COMMON *const cm = &pbi->common;
+  if (pbi->tile_mt_info.alloc_tile_cols != tile_cols ||
+      pbi->tile_mt_info.alloc_tile_rows != tile_rows) {
+    av1_dealloc_dec_jobs(&pbi->tile_mt_info);
+    alloc_dec_jobs(&pbi->tile_mt_info, cm, tile_rows, tile_cols);
+  }
+  enqueue_tile_jobs(pbi, cm, tile_rows_start, tile_rows_end, tile_cols_start,
+                    tile_cols_end, start_tile, end_tile);
+  qsort(pbi->tile_mt_info.job_queue, pbi->tile_mt_info.jobs_enqueued,
+        sizeof(pbi->tile_mt_info.job_queue[0]), compare_tile_buffers);
+}
+
+static void dec_alloc_cb_buf(AV1Decoder *pbi) {
+    pbi->cb_buffer_base = NULL;
+//  AV1_COMMON *const cm = &pbi->common;
+//  int size = ((cm->mi_rows >> cm->seq_params.mib_size_log2) + 1) *
+//             ((cm->mi_cols >> cm->seq_params.mib_size_log2) + 1);
+//
+//  if (pbi->cb_buffer_alloc_size < size) {
+//    av1_dec_free_cb_buf(pbi);
+//    CHECK_MEM_ERROR(cm, pbi->cb_buffer_base,
+//                    aom_memalign(32, sizeof(*pbi->cb_buffer_base) * size));
+//    memset(pbi->cb_buffer_base, 0, sizeof(*pbi->cb_buffer_base) * size);
+//    pbi->cb_buffer_alloc_size = size;
+//  }
+}
+
+static const uint8_t *decode_tiles_mt(AV1Decoder *pbi, const uint8_t *data,
+                                      const uint8_t *data_end, int start_tile,
+                                      int end_tile) {
+  AV1_COMMON *const cm = &pbi->common;
+  const int tile_cols = cm->tile_cols;
+  const int tile_rows = cm->tile_rows;
+  const int n_tiles = tile_cols * tile_rows;
+  TileBufferDec(*const tile_buffers)[MAX_TILE_COLS] = pbi->tile_buffers;
+  const int dec_tile_row = AOMMIN(pbi->dec_tile_row, tile_rows);
+  const int single_row = pbi->dec_tile_row >= 0;
+  const int dec_tile_col = AOMMIN(pbi->dec_tile_col, tile_cols);
+  const int single_col = pbi->dec_tile_col >= 0;
+  int tile_rows_start;
+  int tile_rows_end;
+  int tile_cols_start;
+  int tile_cols_end;
+  int tile_count_tg;
+  int num_workers;
+  const uint8_t *raw_data_end = NULL;
+
+  if (cm->large_scale_tile) {
+    tile_rows_start = single_row ? dec_tile_row : 0;
+    tile_rows_end = single_row ? dec_tile_row + 1 : tile_rows;
+    tile_cols_start = single_col ? dec_tile_col : 0;
+    tile_cols_end = single_col ? tile_cols_start + 1 : tile_cols;
+  } else {
+    tile_rows_start = 0;
+    tile_rows_end = tile_rows;
+    tile_cols_start = 0;
+    tile_cols_end = tile_cols;
+  }
+  tile_count_tg = end_tile - start_tile + 1;
+  num_workers = AOMMIN(pbi->max_threads, tile_count_tg);
+
+  // No tiles to decode.
+  if (tile_rows_end <= tile_rows_start || tile_cols_end <= tile_cols_start ||
+      // First tile is larger than end_tile.
+      tile_rows_start * tile_cols + tile_cols_start > end_tile ||
+      // Last tile is smaller than start_tile.
+      (tile_rows_end - 1) * tile_cols + tile_cols_end - 1 < start_tile)
+    return data;
+
+  assert(tile_rows <= MAX_TILE_ROWS);
+  assert(tile_cols <= MAX_TILE_COLS);
+  assert(tile_count_tg > 0);
+  assert(num_workers > 0);
+  assert(start_tile <= end_tile);
+  assert(start_tile >= 0 && end_tile < n_tiles);
+
+  //decode_mt_init(pbi);
+  dec_alloc_cb_buf(pbi);
+
+  // get tile size in tile group
+#if EXT_TILE_DEBUG
+  if (cm->large_scale_tile) assert(pbi->ext_tile_debug == 1);
+  if (cm->large_scale_tile)
+    raw_data_end = get_ls_tile_buffers(pbi, data, data_end, tile_buffers);
+  else
+#endif  // EXT_TILE_DEBUG
+    get_tile_buffers(pbi, data, data_end, tile_buffers, start_tile, end_tile);
+
+  if (pbi->tile_data == NULL || n_tiles != pbi->allocated_tiles) {
+    decoder_alloc_tile_data(pbi, n_tiles);
+  }
+
+  for (int row = 0; row < tile_rows; row++) {
+    for (int col = 0; col < tile_cols; col++) {
+      TileDataDec *tile_data = pbi->tile_data + row * cm->tile_cols + col;
+      av1_tile_init(&tile_data->tile_info, cm, row, col);
+      tile_data->is_choosen_one = 0;
+      tile_data->is_last = 0;
+    }
+  }
+  if (cm->refresh_frame_context == REFRESH_FRAME_CONTEXT_BACKWARD)
+      pbi->tile_data[cm->context_update_tile_id].is_choosen_one = 1;
+  pbi->tile_data[end_tile].is_last = 1;
+
+
+  tile_mt_queue(pbi, tile_cols, tile_rows, tile_rows_start, tile_rows_end,
+                tile_cols_start, tile_cols_end, start_tile, end_tile);
+
+  reset_dec_workers(pbi, tile_worker_hook, num_workers);
+  launch_dec_workers(pbi, data_end, num_workers);
+  sync_dec_workers(pbi, num_workers);
+
+  if (pbi->mb.corrupted)
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Failed to decode tile data");
+
+  if (cm->large_scale_tile) {
+    if (n_tiles == 1) {
+      // Find the end of the single tile buffer
+      return aom_reader_find_end(&pbi->bit_reader_end);
+    }
+    // Return the end of the last tile buffer
+    return raw_data_end;
+  }
+  return aom_reader_find_end(&pbi->bit_reader_end);
+}
+
+static void error_handler(void *data) {
+  AV1_COMMON *const cm = (AV1_COMMON *)data;
+  aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME, "Truncated packet");
+}
+
+// Reads the high_bitdepth and twelve_bit fields in color_config() and sets
+// seq_params->bit_depth based on the values of those fields and
+// seq_params->profile. Reports errors by calling rb->error_handler() or
+// aom_internal_error().
+static void read_bitdepth(struct aom_read_bit_buffer *rb,
+                          SequenceHeader *seq_params,
+                          struct aom_internal_error_info *error_info) {
+  const int high_bitdepth = aom_rb_read_bit(rb);
+  if (seq_params->profile == PROFILE_2 && high_bitdepth) {
+    const int twelve_bit = aom_rb_read_bit(rb);
+    seq_params->bit_depth = twelve_bit ? AOM_BITS_12 : AOM_BITS_10;
+  } else if (seq_params->profile <= PROFILE_2) {
+    seq_params->bit_depth = high_bitdepth ? AOM_BITS_10 : AOM_BITS_8;
+  } else {
+    aom_internal_error(error_info, AOM_CODEC_UNSUP_BITSTREAM,
+                       "Unsupported profile/bit-depth combination");
+  }
+}
+
+void av1_read_film_grain_params(AV1_COMMON *cm,
+                                struct aom_read_bit_buffer *rb) {
+  aom_film_grain_t *pars = &cm->film_grain_params;
+  const SequenceHeader *const seq_params = &cm->seq_params;
+
+  pars->apply_grain = aom_rb_read_bit(rb);
+  if (!pars->apply_grain) {
+    memset(pars, 0, sizeof(*pars));
+    return;
+  }
+
+  pars->random_seed = aom_rb_read_literal(rb, 16);
+  if (cm->current_frame.frame_type == INTER_FRAME)
+    pars->update_parameters = aom_rb_read_bit(rb);
+  else
+    pars->update_parameters = 1;
+
+  pars->bit_depth = seq_params->bit_depth;
+
+  if (!pars->update_parameters) {
+    // inherit parameters from a previous reference frame
+    int film_grain_params_ref_idx = aom_rb_read_literal(rb, 3);
+    RefCntBuffer *const buf = cm->ref_frame_map[film_grain_params_ref_idx];
+    if (buf == NULL) {
+      aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                         "Invalid Film grain reference idx");
+    }
+    if (!buf->film_grain_params_present) {
+      aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                         "Film grain reference parameters not available");
+    }
+    uint16_t random_seed = pars->random_seed;
+    *pars = buf->film_grain_params;   // inherit paramaters
+    pars->random_seed = random_seed;  // with new random seed
+    return;
+  }
+
+  // Scaling functions parameters
+  pars->num_y_points = aom_rb_read_literal(rb, 4);  // max 14
+  if (pars->num_y_points > 14)
+    aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                       "Number of points for film grain luma scaling function "
+                       "exceeds the maximum value.");
+  for (int i = 0; i < pars->num_y_points; i++) {
+    pars->scaling_points_y[i][0] = aom_rb_read_literal(rb, 8);
+    if (i && pars->scaling_points_y[i - 1][0] >= pars->scaling_points_y[i][0])
+      aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                         "First coordinate of the scaling function points "
+                         "shall be increasing.");
+    pars->scaling_points_y[i][1] = aom_rb_read_literal(rb, 8);
+  }
+
+  if (!seq_params->monochrome)
+    pars->chroma_scaling_from_luma = aom_rb_read_bit(rb);
+  else
+    pars->chroma_scaling_from_luma = 0;
+
+  if (seq_params->monochrome || pars->chroma_scaling_from_luma ||
+      ((seq_params->subsampling_x == 1) && (seq_params->subsampling_y == 1) &&
+       (pars->num_y_points == 0))) {
+    pars->num_cb_points = 0;
+    pars->num_cr_points = 0;
+  } else {
+    pars->num_cb_points = aom_rb_read_literal(rb, 4);  // max 10
+    if (pars->num_cb_points > 10)
+      aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                         "Number of points for film grain cb scaling function "
+                         "exceeds the maximum value.");
+    for (int i = 0; i < pars->num_cb_points; i++) {
+      pars->scaling_points_cb[i][0] = aom_rb_read_literal(rb, 8);
+      if (i &&
+          pars->scaling_points_cb[i - 1][0] >= pars->scaling_points_cb[i][0])
+        aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                           "First coordinate of the scaling function points "
+                           "shall be increasing.");
+      pars->scaling_points_cb[i][1] = aom_rb_read_literal(rb, 8);
+    }
+
+    pars->num_cr_points = aom_rb_read_literal(rb, 4);  // max 10
+    if (pars->num_cr_points > 10)
+      aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                         "Number of points for film grain cr scaling function "
+                         "exceeds the maximum value.");
+    for (int i = 0; i < pars->num_cr_points; i++) {
+      pars->scaling_points_cr[i][0] = aom_rb_read_literal(rb, 8);
+      if (i &&
+          pars->scaling_points_cr[i - 1][0] >= pars->scaling_points_cr[i][0])
+        aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                           "First coordinate of the scaling function points "
+                           "shall be increasing.");
+      pars->scaling_points_cr[i][1] = aom_rb_read_literal(rb, 8);
+    }
+
+    if ((seq_params->subsampling_x == 1) && (seq_params->subsampling_y == 1) &&
+        (((pars->num_cb_points == 0) && (pars->num_cr_points != 0)) ||
+         ((pars->num_cb_points != 0) && (pars->num_cr_points == 0))))
+      aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                         "In YCbCr 4:2:0, film grain shall be applied "
+                         "to both chroma components or neither.");
+  }
+
+  pars->scaling_shift = aom_rb_read_literal(rb, 2) + 8;  // 8 + value
+
+  // AR coefficients
+  // Only sent if the corresponsing scaling function has
+  // more than 0 points
+
+  pars->ar_coeff_lag = aom_rb_read_literal(rb, 2);
+
+  int num_pos_luma = 2 * pars->ar_coeff_lag * (pars->ar_coeff_lag + 1);
+  int num_pos_chroma = num_pos_luma;
+  if (pars->num_y_points > 0) ++num_pos_chroma;
+
+  if (pars->num_y_points)
+    for (int i = 0; i < num_pos_luma; i++)
+      pars->ar_coeffs_y[i] = aom_rb_read_literal(rb, 8) - 128;
+
+  if (pars->num_cb_points || pars->chroma_scaling_from_luma)
+    for (int i = 0; i < num_pos_chroma; i++)
+      pars->ar_coeffs_cb[i] = aom_rb_read_literal(rb, 8) - 128;
+
+  if (pars->num_cr_points || pars->chroma_scaling_from_luma)
+    for (int i = 0; i < num_pos_chroma; i++)
+      pars->ar_coeffs_cr[i] = aom_rb_read_literal(rb, 8) - 128;
+
+  pars->ar_coeff_shift = aom_rb_read_literal(rb, 2) + 6;  // 6 + value
+
+  pars->grain_scale_shift = aom_rb_read_literal(rb, 2);
+
+  if (pars->num_cb_points) {
+    pars->cb_mult = aom_rb_read_literal(rb, 8);
+    pars->cb_luma_mult = aom_rb_read_literal(rb, 8);
+    pars->cb_offset = aom_rb_read_literal(rb, 9);
+  }
+
+  if (pars->num_cr_points) {
+    pars->cr_mult = aom_rb_read_literal(rb, 8);
+    pars->cr_luma_mult = aom_rb_read_literal(rb, 8);
+    pars->cr_offset = aom_rb_read_literal(rb, 9);
+  }
+
+  pars->overlap_flag = aom_rb_read_bit(rb);
+
+  pars->clip_to_restricted_range = aom_rb_read_bit(rb);
+}
+
+static void read_film_grain(AV1_COMMON *cm, struct aom_read_bit_buffer *rb) {
+  if (cm->seq_params.film_grain_params_present &&
+      (cm->show_frame || cm->showable_frame)) {
+    av1_read_film_grain_params(cm, rb);
+  } else {
+    memset(&cm->film_grain_params, 0, sizeof(cm->film_grain_params));
+  }
+  cm->film_grain_params.bit_depth = cm->seq_params.bit_depth;
+  memcpy(&cm->cur_frame->film_grain_params, &cm->film_grain_params,
+         sizeof(aom_film_grain_t));
+}
+
+void av1_read_color_config(struct aom_read_bit_buffer *rb,
+                           int allow_lowbitdepth, SequenceHeader *seq_params,
+                           struct aom_internal_error_info *error_info) {
+  read_bitdepth(rb, seq_params, error_info);
+
+  seq_params->use_highbitdepth =
+      seq_params->bit_depth > AOM_BITS_8 || !allow_lowbitdepth;
+
+  // monochrome bit (not needed for PROFILE_1)
+  const int is_monochrome =
+      seq_params->profile != PROFILE_1 ? aom_rb_read_bit(rb) : 0;
+  seq_params->monochrome = is_monochrome;
+  int color_description_present_flag = aom_rb_read_bit(rb);
+  if (color_description_present_flag) {
+    seq_params->color_primaries = aom_rb_read_literal(rb, 8);
+    seq_params->transfer_characteristics = aom_rb_read_literal(rb, 8);
+    seq_params->matrix_coefficients = aom_rb_read_literal(rb, 8);
+  } else {
+    seq_params->color_primaries = AOM_CICP_CP_UNSPECIFIED;
+    seq_params->transfer_characteristics = AOM_CICP_TC_UNSPECIFIED;
+    seq_params->matrix_coefficients = AOM_CICP_MC_UNSPECIFIED;
+  }
+  if (is_monochrome) {
+    // [16,235] (including xvycc) vs [0,255] range
+    seq_params->color_range = aom_rb_read_bit(rb);
+    seq_params->subsampling_y = seq_params->subsampling_x = 1;
+    seq_params->chroma_sample_position = AOM_CSP_UNKNOWN;
+    seq_params->separate_uv_delta_q = 0;
+    return;
+  }
+  if (seq_params->color_primaries == AOM_CICP_CP_BT_709 &&
+      seq_params->transfer_characteristics == AOM_CICP_TC_SRGB &&
+      seq_params->matrix_coefficients == AOM_CICP_MC_IDENTITY) {
+    seq_params->subsampling_y = seq_params->subsampling_x = 0;
+    seq_params->color_range = 1;  // assume full color-range
+    if (!(seq_params->profile == PROFILE_1 ||
+          (seq_params->profile == PROFILE_2 &&
+           seq_params->bit_depth == AOM_BITS_12))) {
+      aom_internal_error(
+          error_info, AOM_CODEC_UNSUP_BITSTREAM,
+          "sRGB colorspace not compatible with specified profile");
+    }
+  } else {
+    // [16,235] (including xvycc) vs [0,255] range
+    seq_params->color_range = aom_rb_read_bit(rb);
+    if (seq_params->profile == PROFILE_0) {
+      // 420 only
+      seq_params->subsampling_x = seq_params->subsampling_y = 1;
+    } else if (seq_params->profile == PROFILE_1) {
+      // 444 only
+      seq_params->subsampling_x = seq_params->subsampling_y = 0;
+    } else {
+      assert(seq_params->profile == PROFILE_2);
+      if (seq_params->bit_depth == AOM_BITS_12) {
+        seq_params->subsampling_x = aom_rb_read_bit(rb);
+        if (seq_params->subsampling_x)
+          seq_params->subsampling_y = aom_rb_read_bit(rb);  // 422 or 420
+        else
+          seq_params->subsampling_y = 0;  // 444
+      } else {
+        // 422
+        seq_params->subsampling_x = 1;
+        seq_params->subsampling_y = 0;
+      }
+    }
+    if (seq_params->matrix_coefficients == AOM_CICP_MC_IDENTITY &&
+        (seq_params->subsampling_x || seq_params->subsampling_y)) {
+      aom_internal_error(
+          error_info, AOM_CODEC_UNSUP_BITSTREAM,
+          "Identity CICP Matrix incompatible with non 4:4:4 color sampling");
+    }
+    if (seq_params->subsampling_x && seq_params->subsampling_y) {
+      seq_params->chroma_sample_position = aom_rb_read_literal(rb, 2);
+    }
+  }
+  seq_params->separate_uv_delta_q = aom_rb_read_bit(rb);
+}
+
+void av1_read_timing_info_header(AV1_COMMON *cm,
+                                 struct aom_read_bit_buffer *rb) {
+  cm->timing_info.num_units_in_display_tick = aom_rb_read_unsigned_literal(
+      rb, 32);  // Number of units in a display tick
+  cm->timing_info.time_scale =
+      aom_rb_read_unsigned_literal(rb, 32);  // Time scale
+  if (cm->timing_info.num_units_in_display_tick == 0 ||
+      cm->timing_info.time_scale == 0) {
+    aom_internal_error(
+        &cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+        "num_units_in_display_tick and time_scale must be greater than 0.");
+  }
+  cm->timing_info.equal_picture_interval =
+      aom_rb_read_bit(rb);  // Equal picture interval bit
+  if (cm->timing_info.equal_picture_interval) {
+    cm->timing_info.num_ticks_per_picture =
+        aom_rb_read_uvlc(rb) + 1;  // ticks per picture
+    if (cm->timing_info.num_ticks_per_picture == 0) {
+      aom_internal_error(
+          &cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+          "num_ticks_per_picture_minus_1 cannot be (1 << 32) − 1.");
+    }
+  }
+}
+
+void av1_read_decoder_model_info(AV1_COMMON *cm,
+                                 struct aom_read_bit_buffer *rb) {
+  cm->buffer_model.encoder_decoder_buffer_delay_length =
+      aom_rb_read_literal(rb, 5) + 1;
+  cm->buffer_model.num_units_in_decoding_tick = aom_rb_read_unsigned_literal(
+      rb, 32);  // Number of units in a decoding tick
+  cm->buffer_model.buffer_removal_time_length = aom_rb_read_literal(rb, 5) + 1;
+  cm->buffer_model.frame_presentation_time_length =
+      aom_rb_read_literal(rb, 5) + 1;
+}
+
+void av1_read_op_parameters_info(AV1_COMMON *const cm,
+                                 struct aom_read_bit_buffer *rb, int op_num) {
+  // The cm->op_params array has MAX_NUM_OPERATING_POINTS + 1 elements.
+  if (op_num > MAX_NUM_OPERATING_POINTS) {
+    aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                       "AV1 does not support %d decoder model operating points",
+                       op_num + 1);
+  }
+
+  cm->op_params[op_num].decoder_buffer_delay = aom_rb_read_unsigned_literal(
+      rb, cm->buffer_model.encoder_decoder_buffer_delay_length);
+
+  cm->op_params[op_num].encoder_buffer_delay = aom_rb_read_unsigned_literal(
+      rb, cm->buffer_model.encoder_decoder_buffer_delay_length);
+
+  cm->op_params[op_num].low_delay_mode_flag = aom_rb_read_bit(rb);
+}
+
+static void av1_read_temporal_point_info(AV1_COMMON *const cm,
+                                         struct aom_read_bit_buffer *rb) {
+  cm->frame_presentation_time = aom_rb_read_unsigned_literal(
+      rb, cm->buffer_model.frame_presentation_time_length);
+}
+
+void av1_read_sequence_header(AV1_COMMON *cm, struct aom_read_bit_buffer *rb,
+                              SequenceHeader *seq_params) {
+  const int num_bits_width = aom_rb_read_literal(rb, 4) + 1;
+  const int num_bits_height = aom_rb_read_literal(rb, 4) + 1;
+  const int max_frame_width = aom_rb_read_literal(rb, num_bits_width) + 1;
+  const int max_frame_height = aom_rb_read_literal(rb, num_bits_height) + 1;
+
+  seq_params->num_bits_width = num_bits_width;
+  seq_params->num_bits_height = num_bits_height;
+  seq_params->max_frame_width = max_frame_width;
+  seq_params->max_frame_height = max_frame_height;
+
+  if (seq_params->reduced_still_picture_hdr) {
+    seq_params->frame_id_numbers_present_flag = 0;
+  } else {
+    seq_params->frame_id_numbers_present_flag = aom_rb_read_bit(rb);
+  }
+  if (seq_params->frame_id_numbers_present_flag) {
+    // We must always have delta_frame_id_length < frame_id_length,
+    // in order for a frame to be referenced with a unique delta.
+    // Avoid wasting bits by using a coding that enforces this restriction.
+    seq_params->delta_frame_id_length = aom_rb_read_literal(rb, 4) + 2;
+    seq_params->frame_id_length =
+        aom_rb_read_literal(rb, 3) + seq_params->delta_frame_id_length + 1;
+    if (seq_params->frame_id_length > 16)
+      aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                         "Invalid frame_id_length");
+  }
+
+  setup_sb_size(seq_params, rb);
+
+  seq_params->enable_filter_intra = aom_rb_read_bit(rb);
+  seq_params->enable_intra_edge_filter = aom_rb_read_bit(rb);
+
+  if (seq_params->reduced_still_picture_hdr) {
+    seq_params->enable_interintra_compound = 0;
+    seq_params->enable_masked_compound = 0;
+    seq_params->enable_warped_motion = 0;
+    seq_params->enable_dual_filter = 0;
+    seq_params->order_hint_info.enable_order_hint = 0;
+    seq_params->order_hint_info.enable_dist_wtd_comp = 0;
+    seq_params->order_hint_info.enable_ref_frame_mvs = 0;
+    seq_params->force_screen_content_tools = 2;  // SELECT_SCREEN_CONTENT_TOOLS
+    seq_params->force_integer_mv = 2;            // SELECT_INTEGER_MV
+    seq_params->order_hint_info.order_hint_bits_minus_1 = -1;
+  } else {
+    seq_params->enable_interintra_compound = aom_rb_read_bit(rb);
+    seq_params->enable_masked_compound = aom_rb_read_bit(rb);
+    seq_params->enable_warped_motion = aom_rb_read_bit(rb);
+    seq_params->enable_dual_filter = aom_rb_read_bit(rb);
+
+    seq_params->order_hint_info.enable_order_hint = aom_rb_read_bit(rb);
+    seq_params->order_hint_info.enable_dist_wtd_comp =
+        seq_params->order_hint_info.enable_order_hint ? aom_rb_read_bit(rb) : 0;
+    seq_params->order_hint_info.enable_ref_frame_mvs =
+        seq_params->order_hint_info.enable_order_hint ? aom_rb_read_bit(rb) : 0;
+
+    if (aom_rb_read_bit(rb)) {
+      seq_params->force_screen_content_tools =
+          2;  // SELECT_SCREEN_CONTENT_TOOLS
+    } else {
+      seq_params->force_screen_content_tools = aom_rb_read_bit(rb);
+    }
+
+    if (seq_params->force_screen_content_tools > 0) {
+      if (aom_rb_read_bit(rb)) {
+        seq_params->force_integer_mv = 2;  // SELECT_INTEGER_MV
+      } else {
+        seq_params->force_integer_mv = aom_rb_read_bit(rb);
+      }
+    } else {
+      seq_params->force_integer_mv = 2;  // SELECT_INTEGER_MV
+    }
+    seq_params->order_hint_info.order_hint_bits_minus_1 =
+        seq_params->order_hint_info.enable_order_hint
+            ? aom_rb_read_literal(rb, 3)
+            : -1;
+  }
+
+  seq_params->enable_superres = aom_rb_read_bit(rb);
+  seq_params->enable_cdef = aom_rb_read_bit(rb);
+  seq_params->enable_restoration = aom_rb_read_bit(rb);
+}
+
+static int read_global_motion_params(WarpedMotionParams *params,
+                                     const WarpedMotionParams *ref_params,
+                                     struct aom_read_bit_buffer *rb,
+                                     int allow_hp) {
+  TransformationType type = aom_rb_read_bit(rb);
+  if (type != IDENTITY) {
+    if (aom_rb_read_bit(rb))
+      type = ROTZOOM;
+    else
+      type = aom_rb_read_bit(rb) ? TRANSLATION : AFFINE;
+  }
+
+  *params = default_warp_params;
+  params->wmtype = type;
+
+  if (type >= ROTZOOM) {
+    params->wmmat[2] = aom_rb_read_signed_primitive_refsubexpfin(
+                           rb, GM_ALPHA_MAX + 1, SUBEXPFIN_K,
+                           (ref_params->wmmat[2] >> GM_ALPHA_PREC_DIFF) -
+                               (1 << GM_ALPHA_PREC_BITS)) *
+                           GM_ALPHA_DECODE_FACTOR +
+                       (1 << WARPEDMODEL_PREC_BITS);
+    params->wmmat[3] = aom_rb_read_signed_primitive_refsubexpfin(
+                           rb, GM_ALPHA_MAX + 1, SUBEXPFIN_K,
+                           (ref_params->wmmat[3] >> GM_ALPHA_PREC_DIFF)) *
+                       GM_ALPHA_DECODE_FACTOR;
+  }
+
+  if (type >= AFFINE) {
+    params->wmmat[4] = aom_rb_read_signed_primitive_refsubexpfin(
+                           rb, GM_ALPHA_MAX + 1, SUBEXPFIN_K,
+                           (ref_params->wmmat[4] >> GM_ALPHA_PREC_DIFF)) *
+                       GM_ALPHA_DECODE_FACTOR;
+    params->wmmat[5] = aom_rb_read_signed_primitive_refsubexpfin(
+                           rb, GM_ALPHA_MAX + 1, SUBEXPFIN_K,
+                           (ref_params->wmmat[5] >> GM_ALPHA_PREC_DIFF) -
+                               (1 << GM_ALPHA_PREC_BITS)) *
+                           GM_ALPHA_DECODE_FACTOR +
+                       (1 << WARPEDMODEL_PREC_BITS);
+  } else {
+    params->wmmat[4] = -params->wmmat[3];
+    params->wmmat[5] = params->wmmat[2];
+  }
+
+  if (type >= TRANSLATION) {
+    const int trans_bits = (type == TRANSLATION)
+                               ? GM_ABS_TRANS_ONLY_BITS - !allow_hp
+                               : GM_ABS_TRANS_BITS;
+    const int trans_dec_factor =
+        (type == TRANSLATION) ? GM_TRANS_ONLY_DECODE_FACTOR * (1 << !allow_hp)
+                              : GM_TRANS_DECODE_FACTOR;
+    const int trans_prec_diff = (type == TRANSLATION)
+                                    ? GM_TRANS_ONLY_PREC_DIFF + !allow_hp
+                                    : GM_TRANS_PREC_DIFF;
+    params->wmmat[0] = aom_rb_read_signed_primitive_refsubexpfin(
+                           rb, (1 << trans_bits) + 1, SUBEXPFIN_K,
+                           (ref_params->wmmat[0] >> trans_prec_diff)) *
+                       trans_dec_factor;
+    params->wmmat[1] = aom_rb_read_signed_primitive_refsubexpfin(
+                           rb, (1 << trans_bits) + 1, SUBEXPFIN_K,
+                           (ref_params->wmmat[1] >> trans_prec_diff)) *
+                       trans_dec_factor;
+  }
+
+  if (params->wmtype <= AFFINE) {
+    int good_shear_params = get_shear_params(params);
+    if (!good_shear_params) return 0;
+  }
+
+  return 1;
+}
+
+static void read_global_motion(AV1_COMMON *cm, struct aom_read_bit_buffer *rb) {
+  for (int frame = LAST_FRAME; frame <= ALTREF_FRAME; ++frame) {
+    const WarpedMotionParams *ref_params =
+        cm->prev_frame ? &cm->prev_frame->global_motion[frame]
+                       : &default_warp_params;
+    int good_params = read_global_motion_params(
+        &cm->global_motion[frame], ref_params, rb, cm->allow_high_precision_mv);
+    if (!good_params) {
+#if WARPED_MOTION_DEBUG
+      printf("Warning: unexpected global motion shear params from aomenc\n");
+#endif
+      cm->global_motion[frame].invalid = 1;
+    }
+
+    // TODO(sarahparker, debargha): The logic in the commented out code below
+    // does not work currently and causes mismatches when resize is on. Fix it
+    // before turning the optimization back on.
+    /*
+    YV12_BUFFER_CONFIG *ref_buf = get_ref_frame(cm, frame);
+    if (cm->width == ref_buf->y_crop_width &&
+        cm->height == ref_buf->y_crop_height) {
+      read_global_motion_params(&cm->global_motion[frame],
+                                &cm->prev_frame->global_motion[frame], rb,
+                                cm->allow_high_precision_mv);
+    } else {
+      cm->global_motion[frame] = default_warp_params;
+    }
+    */
+    /*
+    printf("Dec Ref %d [%d/%d]: %d %d %d %d\n",
+           frame, cm->current_frame.frame_number, cm->show_frame,
+           cm->global_motion[frame].wmmat[0],
+           cm->global_motion[frame].wmmat[1],
+           cm->global_motion[frame].wmmat[2],
+           cm->global_motion[frame].wmmat[3]);
+           */
+  }
+  memcpy(cm->cur_frame->global_motion, cm->global_motion,
+         REF_FRAMES * sizeof(WarpedMotionParams));
+}
+
+// Release the references to the frame buffers in cm->ref_frame_map and reset
+// all elements of cm->ref_frame_map to NULL.
+static void reset_ref_frame_map(AV1_COMMON *const cm) {
+  BufferPool *const pool = cm->buffer_pool;
+
+  for (int i = 0; i < REF_FRAMES; i++) {
+    decrease_ref_count(cm->ref_frame_map[i], pool);
+    cm->ref_frame_map[i] = NULL;
+  }
+}
+
+// Generate next_ref_frame_map.
+static void generate_next_ref_frame_map(AV1Decoder *const pbi) {
+  AV1_COMMON *const cm = &pbi->common;
+  BufferPool *const pool = cm->buffer_pool;
+
+  lock_buffer_pool(pool);
+  // cm->next_ref_frame_map holds references to frame buffers. After storing a
+  // frame buffer index in cm->next_ref_frame_map, we need to increase the
+  // frame buffer's ref_count.
+  int ref_index = 0;
+  for (int mask = cm->current_frame.refresh_frame_flags; mask; mask >>= 1) {
+    if (mask & 1) {
+      cm->next_ref_frame_map[ref_index] = cm->cur_frame;
+    } else {
+      cm->next_ref_frame_map[ref_index] = cm->ref_frame_map[ref_index];
+    }
+    if (cm->next_ref_frame_map[ref_index] != NULL)
+      ++cm->next_ref_frame_map[ref_index]->ref_count;
+    ++ref_index;
+  }
+
+  for (; ref_index < REF_FRAMES; ++ref_index) {
+    cm->next_ref_frame_map[ref_index] = cm->ref_frame_map[ref_index];
+    if (cm->next_ref_frame_map[ref_index] != NULL)
+      ++cm->next_ref_frame_map[ref_index]->ref_count;
+  }
+  unlock_buffer_pool(pool);
+  pbi->hold_ref_buf = 1;
+}
+
+// If the refresh_frame_flags bitmask is set, update reference frame id values
+// and mark frames as valid for reference.
+static void update_ref_frame_id(AV1_COMMON *const cm, int frame_id) {
+  assert(cm->seq_params.frame_id_numbers_present_flag);
+  int refresh_frame_flags = cm->current_frame.refresh_frame_flags;
+  for (int i = 0; i < REF_FRAMES; i++) {
+    if ((refresh_frame_flags >> i) & 1) {
+      cm->ref_frame_id[i] = frame_id;
+      cm->valid_for_referencing[i] = 1;
+    }
+  }
+}
+
+static void show_existing_frame_reset(AV1Decoder *const pbi,
+                                      int existing_frame_idx) {
+  AV1_COMMON *const cm = &pbi->common;
+
+  assert(cm->show_existing_frame);
+
+  cm->current_frame.frame_type = KEY_FRAME;
+
+  cm->current_frame.refresh_frame_flags = (1 << REF_FRAMES) - 1;
+
+  for (int i = 0; i < INTER_REFS_PER_FRAME; ++i) {
+    cm->remapped_ref_idx[i] = INVALID_IDX;
+  }
+
+  if (pbi->need_resync) {
+    reset_ref_frame_map(cm);
+    pbi->need_resync = 0;
+  }
+
+  // Note that the displayed frame must be valid for referencing in order to
+  // have been selected.
+  if (cm->seq_params.frame_id_numbers_present_flag) {
+    update_ref_frame_id(cm, cm->ref_frame_id[existing_frame_idx]);
+  }
+
+  cm->refresh_frame_context = REFRESH_FRAME_CONTEXT_DISABLED;
+
+  generate_next_ref_frame_map(pbi);
+
+  // Reload the adapted CDFs from when we originally coded this keyframe
+  *cm->fc = cm->next_ref_frame_map[existing_frame_idx]->frame_context;
+}
+
+static INLINE void reset_frame_buffers(AV1_COMMON *cm) {
+  RefCntBuffer *const frame_bufs = cm->buffer_pool->frame_bufs;
+  int i;
+
+  // We have not stored any references to frame buffers in
+  // cm->next_ref_frame_map, so we can directly reset it to all NULL.
+  for (i = 0; i < REF_FRAMES; ++i) {
+    cm->next_ref_frame_map[i] = NULL;
+  }
+
+  lock_buffer_pool(cm->buffer_pool);
+  reset_ref_frame_map(cm);
+  assert(cm->cur_frame->ref_count == 1);
+  for (i = 0; i < FRAME_BUFFERS; ++i) {
+    // Reset all unreferenced frame buffers. We can also reset cm->cur_frame
+    // because we are the sole owner of cm->cur_frame.
+    if (frame_bufs[i].ref_count > 0 && &frame_bufs[i] != cm->cur_frame) {
+      continue;
+    }
+    frame_bufs[i].order_hint = 0;
+    av1_zero(frame_bufs[i].ref_order_hints);
+  }
+  av1_zero_unused_internal_frame_buffers(&cm->buffer_pool->int_frame_buffers);
+  unlock_buffer_pool(cm->buffer_pool);
+}
+
+// On success, returns 0. On failure, calls aom_internal_error and does not
+// return.
+static int read_uncompressed_header(AV1Decoder *pbi,
+                                    struct aom_read_bit_buffer *rb) {
+  AV1_COMMON *const cm = &pbi->common;
+  const SequenceHeader *const seq_params = &cm->seq_params;
+  CurrentFrame *const current_frame = &cm->current_frame;
+  MACROBLOCKD *const xd = &pbi->mb;
+  BufferPool *const pool = cm->buffer_pool;
+  RefCntBuffer *const frame_bufs = pool->frame_bufs;
+
+  if (!pbi->sequence_header_ready) {
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "No sequence header");
+  }
+
+  cm->last_frame_type = current_frame->frame_type;
+
+  if (seq_params->reduced_still_picture_hdr) {
+    cm->show_existing_frame = 0;
+    cm->show_frame = 1;
+    current_frame->frame_type = KEY_FRAME;
+    if (pbi->sequence_header_changed) {
+      // This is the start of a new coded video sequence.
+      pbi->sequence_header_changed = 0;
+      pbi->decoding_first_frame = 1;
+      reset_frame_buffers(cm);
+    }
+    cm->error_resilient_mode = 1;
+  } else {
+    cm->show_existing_frame = aom_rb_read_bit(rb);
+    pbi->reset_decoder_state = 0;
+
+    if (cm->show_existing_frame) {
+      if (pbi->sequence_header_changed) {
+        aom_internal_error(
+            &cm->error, AOM_CODEC_CORRUPT_FRAME,
+            "New sequence header starts with a show_existing_frame.");
+      }
+      // Show an existing frame directly.
+      const int existing_frame_idx = aom_rb_read_literal(rb, 3);
+      RefCntBuffer *const frame_to_show = cm->ref_frame_map[existing_frame_idx];
+      if (frame_to_show == NULL) {
+        aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                           "Buffer does not contain a decoded frame");
+      }
+      if (seq_params->decoder_model_info_present_flag &&
+          cm->timing_info.equal_picture_interval == 0) {
+        av1_read_temporal_point_info(cm, rb);
+      }
+      if (seq_params->frame_id_numbers_present_flag) {
+        int frame_id_length = seq_params->frame_id_length;
+        int display_frame_id = aom_rb_read_literal(rb, frame_id_length);
+        /* Compare display_frame_id with ref_frame_id and check valid for
+         * referencing */
+        if (display_frame_id != cm->ref_frame_id[existing_frame_idx] ||
+            cm->valid_for_referencing[existing_frame_idx] == 0)
+          aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                             "Reference buffer frame ID mismatch");
+      }
+      lock_buffer_pool(pool);
+      assert(frame_to_show->ref_count > 0);
+      // cm->cur_frame should be the buffer referenced by the return value
+      // of the get_free_fb() call in av1_receive_compressed_data(), and
+      // generate_next_ref_frame_map() has not been called, so ref_count
+      // should still be 1.
+      assert(cm->cur_frame->ref_count == 1);
+      // assign_frame_buffer_p() decrements ref_count directly rather than
+      // call decrease_ref_count(). If cm->cur_frame->raw_frame_buffer has
+      // already been allocated, it will not be released by
+      // assign_frame_buffer_p()!
+      assert(!cm->cur_frame->raw_frame_buffer.data);
+      assign_frame_buffer_p(&cm->cur_frame, frame_to_show);
+      pbi->reset_decoder_state = frame_to_show->frame_type == KEY_FRAME;
+      unlock_buffer_pool(pool);
+
+      cm->lf.filter_level[0] = 0;
+      cm->lf.filter_level[1] = 0;
+      cm->show_frame = 1;
+
+      if (!frame_to_show->showable_frame) {
+        aom_merge_corrupted_flag(&xd->corrupted, 1);
+      }
+      if (pbi->reset_decoder_state) frame_to_show->showable_frame = 0;
+
+      cm->film_grain_params = frame_to_show->film_grain_params;
+
+      if (pbi->reset_decoder_state) {
+        show_existing_frame_reset(pbi, existing_frame_idx);
+      } else {
+        current_frame->refresh_frame_flags = 0;
+      }
+      av1_decode_sef(pbi);
+      return 0;
+    }
+
+    current_frame->frame_type = (FRAME_TYPE)aom_rb_read_literal(rb, 2);
+    if (pbi->sequence_header_changed) {
+      if (current_frame->frame_type == KEY_FRAME) {
+        // This is the start of a new coded video sequence.
+        pbi->sequence_header_changed = 0;
+        pbi->decoding_first_frame = 1;
+        reset_frame_buffers(cm);
+      } else {
+        aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                           "Sequence header has changed without a keyframe.");
+      }
+    }
+
+    cm->show_frame = aom_rb_read_bit(rb);
+    if (seq_params->still_picture &&
+        (current_frame->frame_type != KEY_FRAME || !cm->show_frame)) {
+      aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                         "Still pictures must be coded as shown keyframes");
+    }
+    cm->showable_frame = current_frame->frame_type != KEY_FRAME;
+    if (cm->show_frame) {
+      if (seq_params->decoder_model_info_present_flag &&
+          cm->timing_info.equal_picture_interval == 0)
+        av1_read_temporal_point_info(cm, rb);
+    } else {
+      // See if this frame can be used as show_existing_frame in future
+      cm->showable_frame = aom_rb_read_bit(rb);
+    }
+    cm->cur_frame->showable_frame = cm->showable_frame;
+    cm->error_resilient_mode =
+        frame_is_sframe(cm) ||
+                (current_frame->frame_type == KEY_FRAME && cm->show_frame)
+            ? 1
+            : aom_rb_read_bit(rb);
+  }
+
+  cm->disable_cdf_update = aom_rb_read_bit(rb);
+  if (seq_params->force_screen_content_tools == 2) {
+    cm->allow_screen_content_tools = aom_rb_read_bit(rb);
+  } else {
+    cm->allow_screen_content_tools = seq_params->force_screen_content_tools;
+  }
+
+  if (cm->allow_screen_content_tools) {
+    if (seq_params->force_integer_mv == 2) {
+      cm->cur_frame_force_integer_mv = aom_rb_read_bit(rb);
+    } else {
+      cm->cur_frame_force_integer_mv = seq_params->force_integer_mv;
+    }
+  } else {
+    cm->cur_frame_force_integer_mv = 0;
+  }
+
+  int frame_size_override_flag = 0;
+  cm->allow_intrabc = 0;
+  cm->primary_ref_frame = PRIMARY_REF_NONE;
+
+  if (!seq_params->reduced_still_picture_hdr) {
+    if (seq_params->frame_id_numbers_present_flag) {
+      int frame_id_length = seq_params->frame_id_length;
+      int diff_len = seq_params->delta_frame_id_length;
+      int prev_frame_id = 0;
+      int have_prev_frame_id =
+          !pbi->decoding_first_frame &&
+          !(current_frame->frame_type == KEY_FRAME && cm->show_frame);
+      if (have_prev_frame_id) {
+        prev_frame_id = cm->current_frame_id;
+      }
+      cm->current_frame_id = aom_rb_read_literal(rb, frame_id_length);
+
+      if (have_prev_frame_id) {
+        int diff_frame_id;
+        if (cm->current_frame_id > prev_frame_id) {
+          diff_frame_id = cm->current_frame_id - prev_frame_id;
+        } else {
+          diff_frame_id =
+              (1 << frame_id_length) + cm->current_frame_id - prev_frame_id;
+        }
+        /* Check current_frame_id for conformance */
+        if (prev_frame_id == cm->current_frame_id ||
+            diff_frame_id >= (1 << (frame_id_length - 1))) {
+          aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                             "Invalid value of current_frame_id");
+        }
+      }
+      /* Check if some frames need to be marked as not valid for referencing */
+      for (int i = 0; i < REF_FRAMES; i++) {
+        if (current_frame->frame_type == KEY_FRAME && cm->show_frame) {
+          cm->valid_for_referencing[i] = 0;
+        } else if (cm->current_frame_id - (1 << diff_len) > 0) {
+          if (cm->ref_frame_id[i] > cm->current_frame_id ||
+              cm->ref_frame_id[i] < cm->current_frame_id - (1 << diff_len))
+            cm->valid_for_referencing[i] = 0;
+        } else {
+          if (cm->ref_frame_id[i] > cm->current_frame_id &&
+              cm->ref_frame_id[i] < (1 << frame_id_length) +
+                                        cm->current_frame_id - (1 << diff_len))
+            cm->valid_for_referencing[i] = 0;
+        }
+      }
+    }
+
+    frame_size_override_flag = frame_is_sframe(cm) ? 1 : aom_rb_read_bit(rb);
+
+    current_frame->order_hint = aom_rb_read_literal(
+        rb, seq_params->order_hint_info.order_hint_bits_minus_1 + 1);
+    current_frame->frame_number = current_frame->order_hint;
+
+    if (!cm->error_resilient_mode && !frame_is_intra_only(cm)) {
+      cm->primary_ref_frame = aom_rb_read_literal(rb, PRIMARY_REF_BITS);
+    }
+  }
+
+  if (seq_params->decoder_model_info_present_flag) {
+    cm->buffer_removal_time_present = aom_rb_read_bit(rb);
+    if (cm->buffer_removal_time_present) {
+      for (int op_num = 0;
+           op_num < seq_params->operating_points_cnt_minus_1 + 1; op_num++) {
+        if (cm->op_params[op_num].decoder_model_param_present_flag) {
+          if ((((seq_params->operating_point_idc[op_num] >>
+                 cm->temporal_layer_id) &
+                0x1) &&
+               ((seq_params->operating_point_idc[op_num] >>
+                 (cm->spatial_layer_id + 8)) &
+                0x1)) ||
+              seq_params->operating_point_idc[op_num] == 0) {
+            cm->op_frame_timing[op_num].buffer_removal_time =
+                aom_rb_read_unsigned_literal(
+                    rb, cm->buffer_model.buffer_removal_time_length);
+          } else {
+            cm->op_frame_timing[op_num].buffer_removal_time = 0;
+          }
+        } else {
+          cm->op_frame_timing[op_num].buffer_removal_time = 0;
+        }
+      }
+    }
+  }
+  if (current_frame->frame_type == KEY_FRAME) {
+    if (!cm->show_frame) {  // unshown keyframe (forward keyframe)
+      current_frame->refresh_frame_flags = aom_rb_read_literal(rb, REF_FRAMES);
+    } else {  // shown keyframe
+      current_frame->refresh_frame_flags = (1 << REF_FRAMES) - 1;
+    }
+
+    for (int i = 0; i < INTER_REFS_PER_FRAME; ++i) {
+      cm->remapped_ref_idx[i] = INVALID_IDX;
+    }
+    if (pbi->need_resync) {
+      reset_ref_frame_map(cm);
+      pbi->need_resync = 0;
+    }
+  } else {
+    if (current_frame->frame_type == INTRA_ONLY_FRAME) {
+      current_frame->refresh_frame_flags = aom_rb_read_literal(rb, REF_FRAMES);
+      if (current_frame->refresh_frame_flags == 0xFF) {
+        aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                           "Intra only frames cannot have refresh flags 0xFF");
+      }
+      if (pbi->need_resync) {
+        reset_ref_frame_map(cm);
+        pbi->need_resync = 0;
+      }
+    } else if (pbi->need_resync != 1) { /* Skip if need resync */
+      current_frame->refresh_frame_flags =
+          frame_is_sframe(cm) ? 0xFF : aom_rb_read_literal(rb, REF_FRAMES);
+    }
+  }
+
+  if (!frame_is_intra_only(cm) || current_frame->refresh_frame_flags != 0xFF) {
+    // Read all ref frame order hints if error_resilient_mode == 1
+    if (cm->error_resilient_mode &&
+        seq_params->order_hint_info.enable_order_hint) {
+      for (int ref_idx = 0; ref_idx < REF_FRAMES; ref_idx++) {
+        // Read order hint from bit stream
+        unsigned int order_hint = aom_rb_read_literal(
+            rb, seq_params->order_hint_info.order_hint_bits_minus_1 + 1);
+        // Get buffer
+        RefCntBuffer *buf = cm->ref_frame_map[ref_idx];
+        if (buf == NULL || order_hint != buf->order_hint) {
+          if (buf != NULL) {
+            lock_buffer_pool(pool);
+            decrease_ref_count(buf, pool);
+            unlock_buffer_pool(pool);
+          }
+          // If no corresponding buffer exists, allocate a new buffer with all
+          // pixels set to neutral grey.
+          int buf_idx = get_free_fb(cm);
+          if (buf_idx == INVALID_IDX) {
+            aom_internal_error(&cm->error, AOM_CODEC_MEM_ERROR,
+                               "Unable to find free frame buffer");
+          }
+          buf = &frame_bufs[buf_idx];
+          lock_buffer_pool(pool);
+
+          if (av1_reallocate_frame_buffer(pool->cb_priv, &buf->buf,
+                                          seq_params->max_frame_width,
+                                          seq_params->max_frame_height,
+                                          seq_params->max_frame_width,
+                                          seq_params->use_highbitdepth))
+          {
+              unlock_buffer_pool(pool);
+              aom_internal_error(&cm->error, AOM_CODEC_MEM_ERROR, "Failed to allocate frame buffer");
+          }
+          unlock_buffer_pool(pool);
+          set_planes_to_neutral_grey(seq_params, &buf->buf, 0);
+          cm->ref_frame_map[ref_idx] = buf;
+          buf->order_hint = order_hint;
+        }
+      }
+    }
+  }
+
+  if (current_frame->frame_type == KEY_FRAME) {
+    setup_frame_size(cm, frame_size_override_flag, rb);
+
+    if (cm->allow_screen_content_tools && !av1_superres_scaled(cm))
+      cm->allow_intrabc = aom_rb_read_bit(rb);
+    cm->allow_ref_frame_mvs = 0;
+    cm->prev_frame = NULL;
+  } else {
+    cm->allow_ref_frame_mvs = 0;
+
+    if (current_frame->frame_type == INTRA_ONLY_FRAME) {
+      cm->cur_frame->film_grain_params_present =
+          seq_params->film_grain_params_present;
+      setup_frame_size(cm, frame_size_override_flag, rb);
+      if (cm->allow_screen_content_tools && !av1_superres_scaled(cm))
+        cm->allow_intrabc = aom_rb_read_bit(rb);
+
+    } else if (pbi->need_resync != 1) { /* Skip if need resync */
+      int frame_refs_short_signaling = 0;
+      // Frame refs short signaling is off when error resilient mode is on.
+      if (seq_params->order_hint_info.enable_order_hint)
+        frame_refs_short_signaling = aom_rb_read_bit(rb);
+
+      if (frame_refs_short_signaling) {
+        // == LAST_FRAME ==
+        const int lst_ref = aom_rb_read_literal(rb, REF_FRAMES_LOG2);
+        const RefCntBuffer *const lst_buf = cm->ref_frame_map[lst_ref];
+
+        // == GOLDEN_FRAME ==
+        const int gld_ref = aom_rb_read_literal(rb, REF_FRAMES_LOG2);
+        const RefCntBuffer *const gld_buf = cm->ref_frame_map[gld_ref];
+
+        // Most of the time, streams start with a keyframe. In that case,
+        // ref_frame_map will have been filled in at that point and will not
+        // contain any NULLs. However, streams are explicitly allowed to start
+        // with an intra-only frame, so long as they don't then signal a
+        // reference to a slot that hasn't been set yet. That's what we are
+        // checking here.
+        if (lst_buf == NULL)
+          aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                             "Inter frame requests nonexistent reference");
+        if (gld_buf == NULL)
+          aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                             "Inter frame requests nonexistent reference");
+
+        av1_set_frame_refs(cm, cm->remapped_ref_idx, lst_ref, gld_ref);
+      }
+
+      for (int i = 0; i < INTER_REFS_PER_FRAME; ++i) {
+        int ref = 0;
+        if (!frame_refs_short_signaling) {
+          ref = aom_rb_read_literal(rb, REF_FRAMES_LOG2);
+
+          // Most of the time, streams start with a keyframe. In that case,
+          // ref_frame_map will have been filled in at that point and will not
+          // contain any -1's. However, streams are explicitly allowed to start
+          // with an intra-only frame, so long as they don't then signal a
+          // reference to a slot that hasn't been set yet. That's what we are
+          // checking here.
+          if (cm->ref_frame_map[ref] == NULL)
+            aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                               "Inter frame requests nonexistent reference");
+          cm->remapped_ref_idx[i] = ref;
+        } else {
+          ref = cm->remapped_ref_idx[i];
+        }
+
+        cm->ref_frame_sign_bias[LAST_FRAME + i] = 0;
+
+        if (seq_params->frame_id_numbers_present_flag) {
+          int frame_id_length = seq_params->frame_id_length;
+          int diff_len = seq_params->delta_frame_id_length;
+          int delta_frame_id_minus_1 = aom_rb_read_literal(rb, diff_len);
+          int ref_frame_id =
+              ((cm->current_frame_id - (delta_frame_id_minus_1 + 1) +
+                (1 << frame_id_length)) %
+               (1 << frame_id_length));
+          // Compare values derived from delta_frame_id_minus_1 and
+          // refresh_frame_flags. Also, check valid for referencing
+          if (ref_frame_id != cm->ref_frame_id[ref] ||
+              cm->valid_for_referencing[ref] == 0)
+            aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                               "Reference buffer frame ID mismatch");
+        }
+      }
+
+      if (!cm->error_resilient_mode && frame_size_override_flag) {
+        setup_frame_size_with_refs(cm, rb);
+      } else {
+        setup_frame_size(cm, frame_size_override_flag, rb);
+      }
+
+      if (cm->cur_frame_force_integer_mv) {
+        cm->allow_high_precision_mv = 0;
+      } else {
+        cm->allow_high_precision_mv = aom_rb_read_bit(rb);
+      }
+      cm->interp_filter = read_frame_interp_filter(rb);
+      cm->switchable_motion_mode = aom_rb_read_bit(rb);
+    }
+
+    cm->prev_frame = get_primary_ref_frame_buf(cm);
+    if (cm->primary_ref_frame != PRIMARY_REF_NONE &&
+        get_primary_ref_frame_buf(cm) == NULL) {
+      aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                         "Reference frame containing this frame's initial "
+                         "frame context is unavailable.");
+    }
+
+    if (!(current_frame->frame_type == INTRA_ONLY_FRAME) &&
+        pbi->need_resync != 1) {
+      if (frame_might_allow_ref_frame_mvs(cm))
+        cm->allow_ref_frame_mvs = aom_rb_read_bit(rb);
+      else
+        cm->allow_ref_frame_mvs = 0;
+
+      for (int i = LAST_FRAME; i <= ALTREF_FRAME; ++i) {
+        const RefCntBuffer *const ref_buf = get_ref_frame_buf(cm, i);
+        struct scale_factors *const ref_scale_factors =
+            get_ref_scale_factors(cm, i);
+        av1_setup_scale_factors_for_frame(
+            ref_scale_factors, ref_buf->buf.y_crop_width,
+            ref_buf->buf.y_crop_height, cm->width, cm->height);
+        if ((!av1_is_valid_scale(ref_scale_factors)))
+          aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                             "Reference frame has invalid dimensions");
+      }
+    }
+  }
+
+  av1_setup_frame_buf_refs(cm);
+
+  av1_setup_frame_sign_bias(cm);
+
+  cm->cur_frame->frame_type = current_frame->frame_type;
+
+  if (seq_params->frame_id_numbers_present_flag) {
+    update_ref_frame_id(cm, cm->current_frame_id);
+  }
+
+  const int might_bwd_adapt =
+      !(seq_params->reduced_still_picture_hdr) && !(cm->disable_cdf_update);
+  if (might_bwd_adapt) {
+    cm->refresh_frame_context = aom_rb_read_bit(rb)
+                                    ? REFRESH_FRAME_CONTEXT_DISABLED
+                                    : REFRESH_FRAME_CONTEXT_BACKWARD;
+  } else {
+    cm->refresh_frame_context = REFRESH_FRAME_CONTEXT_DISABLED;
+  }
+
+  cm->cur_frame->buf.bit_depth = seq_params->bit_depth;
+  cm->cur_frame->buf.color_primaries = seq_params->color_primaries;
+  cm->cur_frame->buf.transfer_characteristics =
+      seq_params->transfer_characteristics;
+  cm->cur_frame->buf.matrix_coefficients = seq_params->matrix_coefficients;
+  cm->cur_frame->buf.monochrome = seq_params->monochrome;
+  cm->cur_frame->buf.chroma_sample_position =
+      seq_params->chroma_sample_position;
+  cm->cur_frame->buf.color_range = seq_params->color_range;
+  cm->cur_frame->buf.render_width = cm->render_width;
+  cm->cur_frame->buf.render_height = cm->render_height;
+
+  if (pbi->need_resync) {
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Keyframe / intra-only frame required to reset decoder"
+                       " state");
+  }
+
+  generate_next_ref_frame_map(pbi);
+
+  if (cm->allow_intrabc) {
+    // Set parameters corresponding to no filtering.
+    struct loopfilter *lf = &cm->lf;
+    lf->filter_level[0] = 0;
+    lf->filter_level[1] = 0;
+    cm->cdef_info.cdef_bits = 0;
+    cm->cdef_info.cdef_strengths[0] = 0;
+    cm->cdef_info.nb_cdef_strengths = 1;
+    cm->cdef_info.cdef_uv_strengths[0] = 0;
+    cm->rst_info[0].frame_restoration_type = RESTORE_NONE;
+    cm->rst_info[1].frame_restoration_type = RESTORE_NONE;
+    cm->rst_info[2].frame_restoration_type = RESTORE_NONE;
+  }
+
+  read_tile_info(pbi, rb);
+  setup_quantization(cm, rb);
+  xd->bd = (int)seq_params->bit_depth;
+
+  //if (cm->num_allocated_above_context_planes < av1_num_planes(cm) ||
+  //    cm->num_allocated_above_context_mi_col < cm->mi_cols ||
+  //    cm->num_allocated_above_contexts < cm->tile_rows) {
+  //  av1_free_above_context_buffers(cm, cm->num_allocated_above_contexts);
+  //  if (av1_alloc_above_context_buffers(cm, cm->tile_rows))
+  //    aom_internal_error(&cm->error, AOM_CODEC_MEM_ERROR,
+  //                       "Failed to allocate context buffers");
+  //}
+  av1_setup_context_buffers(pbi);
+
+  if (cm->primary_ref_frame == PRIMARY_REF_NONE) {
+    av1_setup_past_independence(cm);
+  }
+
+  setup_segmentation(cm, rb);
+
+  cm->delta_q_info.delta_q_res = 1;
+  cm->delta_q_info.delta_lf_res = 1;
+  cm->delta_q_info.delta_lf_present_flag = 0;
+  cm->delta_q_info.delta_lf_multi = 0;
+  cm->delta_q_info.delta_q_present_flag =
+      cm->base_qindex > 0 ? aom_rb_read_bit(rb) : 0;
+  if (cm->delta_q_info.delta_q_present_flag) {
+    xd->current_qindex = cm->base_qindex;
+    cm->delta_q_info.delta_q_res = 1 << aom_rb_read_literal(rb, 2);
+    if (!cm->allow_intrabc)
+      cm->delta_q_info.delta_lf_present_flag = aom_rb_read_bit(rb);
+    if (cm->delta_q_info.delta_lf_present_flag) {
+      cm->delta_q_info.delta_lf_res = 1 << aom_rb_read_literal(rb, 2);
+      cm->delta_q_info.delta_lf_multi = aom_rb_read_bit(rb);
+      av1_reset_loop_filter_delta(xd, av1_num_planes(cm));
+    }
+  }
+
+  xd->cur_frame_force_integer_mv = cm->cur_frame_force_integer_mv;
+
+  for (int i = 0; i < MAX_SEGMENTS; ++i) {
+    const int qindex = av1_get_qindex(&cm->seg, i, cm->base_qindex);
+    xd->lossless[i] = qindex == 0 && cm->y_dc_delta_q == 0 &&
+                      cm->u_dc_delta_q == 0 && cm->u_ac_delta_q == 0 &&
+                      cm->v_dc_delta_q == 0 && cm->v_ac_delta_q == 0;
+    xd->qindex[i] = qindex;
+  }
+  cm->coded_lossless = is_coded_lossless(cm, xd);
+  cm->all_lossless = cm->coded_lossless && !av1_superres_scaled(cm);
+  setup_segmentation_dequant(cm, xd);
+  if (cm->coded_lossless) {
+    cm->lf.filter_level[0] = 0;
+    cm->lf.filter_level[1] = 0;
+  }
+  if (cm->coded_lossless || !seq_params->enable_cdef) {
+    cm->cdef_info.cdef_bits = 0;
+    cm->cdef_info.cdef_strengths[0] = 0;
+    cm->cdef_info.cdef_uv_strengths[0] = 0;
+  }
+  if (cm->all_lossless || !seq_params->enable_restoration) {
+    cm->rst_info[0].frame_restoration_type = RESTORE_NONE;
+    cm->rst_info[1].frame_restoration_type = RESTORE_NONE;
+    cm->rst_info[2].frame_restoration_type = RESTORE_NONE;
+  }
+  setup_loopfilter(cm, rb);
+
+  if (!cm->coded_lossless && seq_params->enable_cdef) {
+    setup_cdef(cm, rb);
+  }
+  if (!cm->all_lossless && seq_params->enable_restoration) {
+    decode_restoration_mode(cm, rb);
+  }
+
+  cm->tx_mode = read_tx_mode(cm, rb);
+  current_frame->reference_mode = read_frame_reference_mode(cm, rb);
+  if (current_frame->reference_mode != SINGLE_REFERENCE)
+    setup_compound_reference_mode(cm);
+
+  av1_setup_skip_mode_allowed(cm);
+  current_frame->skip_mode_info.skip_mode_flag =
+      current_frame->skip_mode_info.skip_mode_allowed ? aom_rb_read_bit(rb) : 0;
+
+  if (frame_might_allow_warped_motion(cm))
+    cm->allow_warped_motion = aom_rb_read_bit(rb);
+  else
+    cm->allow_warped_motion = 0;
+
+  cm->reduced_tx_set_used = aom_rb_read_bit(rb);
+
+  if (cm->allow_ref_frame_mvs && !frame_might_allow_ref_frame_mvs(cm)) {
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Frame wrongly requests reference frame MVs");
+  }
+
+  if (!frame_is_intra_only(cm)) read_global_motion(cm, rb);
+
+  cm->cur_frame->film_grain_params_present =
+      seq_params->film_grain_params_present;
+  read_film_grain(cm, rb);
+
+#if EXT_TILE_DEBUG
+  if (pbi->ext_tile_debug && cm->large_scale_tile) {
+    read_ext_tile_info(pbi, rb);
+    av1_set_single_tile_decoding_mode(cm);
+  }
+#endif  // EXT_TILE_DEBUG
+  return 0;
+}
+
+struct aom_read_bit_buffer *av1_init_read_bit_buffer(
+    AV1Decoder *pbi, struct aom_read_bit_buffer *rb, const uint8_t *data,
+    const uint8_t *data_end) {
+  rb->bit_offset = 0;
+  rb->error_handler = error_handler;
+  rb->error_handler_data = &pbi->common;
+  rb->bit_buffer = data;
+  rb->bit_buffer_end = data_end;
+  return rb;
+}
+
+void av1_read_frame_size(struct aom_read_bit_buffer *rb, int num_bits_width,
+                         int num_bits_height, int *width, int *height) {
+  *width = aom_rb_read_literal(rb, num_bits_width) + 1;
+  *height = aom_rb_read_literal(rb, num_bits_height) + 1;
+}
+
+BITSTREAM_PROFILE av1_read_profile(struct aom_read_bit_buffer *rb) {
+  int profile = aom_rb_read_literal(rb, PROFILE_BITS);
+  return (BITSTREAM_PROFILE)profile;
+}
+
+uint32_t av1_decode_frame_headers_and_setup(AV1Decoder *pbi,
+                                            struct aom_read_bit_buffer *rb,
+                                            const uint8_t *data,
+                                            const uint8_t **p_data_end,
+                                            int trailing_bits_present) {
+  AV1_COMMON *const cm = &pbi->common;
+  const int num_planes = av1_num_planes(cm);
+  MACROBLOCKD *const xd = &pbi->mb;
+
+#if CONFIG_BITSTREAM_DEBUG
+  bitstream_queue_set_frame_read(cm->current_frame.frame_number * 2 +
+                                 cm->show_frame);
+#endif
+#if CONFIG_MISMATCH_DEBUG
+  mismatch_move_frame_idx_r();
+#endif
+
+  for (int i = LAST_FRAME; i <= ALTREF_FRAME; ++i) {
+    cm->global_motion[i] = default_warp_params;
+    cm->cur_frame->global_motion[i] = default_warp_params;
+  }
+  xd->global_motion = cm->global_motion;
+
+  read_uncompressed_header(pbi, rb);
+
+  if (trailing_bits_present) av1_check_trailing_bits(pbi, rb);
+
+  // If cm->single_tile_decoding = 0, the independent decoding of a single tile
+  // or a section of a frame is not allowed.
+  if (!cm->single_tile_decoding &&
+      (pbi->dec_tile_row >= 0 || pbi->dec_tile_col >= 0)) {
+    pbi->dec_tile_row = -1;
+    pbi->dec_tile_col = -1;
+  }
+
+  const uint32_t uncomp_hdr_size =
+      (uint32_t)aom_rb_bytes_read(rb);  // Size of the uncompressed header
+  YV12_BUFFER_CONFIG *new_fb = &cm->cur_frame->buf;
+  xd->cur_buf = new_fb;
+  if (av1_allow_intrabc(cm)) {
+    av1_setup_scale_factors_for_frame(
+        &cm->sf_identity, xd->cur_buf->y_crop_width, xd->cur_buf->y_crop_height,
+        xd->cur_buf->y_crop_width, xd->cur_buf->y_crop_height);
+  }
+
+  if (cm->show_existing_frame) {
+    // showing a frame directly
+    *p_data_end = data + uncomp_hdr_size;
+    if (pbi->reset_decoder_state) {
+      // Use the default frame context values.
+      *cm->fc = *cm->default_frame_context;
+      if (!cm->fc->initialized)
+        aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                           "Uninitialized entropy context.");
+    }
+    return uncomp_hdr_size;
+  }
+  decode_mt_init(pbi);
+  av1_setup_motion_field(pbi);
+
+  av1_setup_block_planes(xd, cm->seq_params.subsampling_x,
+                         cm->seq_params.subsampling_y, num_planes);
+  if (cm->primary_ref_frame == PRIMARY_REF_NONE) {
+    // use the default frame context values
+    *cm->fc = *cm->default_frame_context;
+  } else {
+    *cm->fc = get_primary_ref_frame_buf(cm)->frame_context;
+  }
+  if (!cm->fc->initialized)
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Uninitialized entropy context.");
+
+  xd->corrupted = 0;
+  return uncomp_hdr_size;
+}
+
+// Once-per-frame initialization
+static void setup_frame_info(AV1Decoder *pbi) {
+  AV1_COMMON *const cm = &pbi->common;
+
+  if (cm->rst_info[0].frame_restoration_type != RESTORE_NONE ||
+      cm->rst_info[1].frame_restoration_type != RESTORE_NONE ||
+      cm->rst_info[2].frame_restoration_type != RESTORE_NONE) {
+    av1_alloc_restoration_buffers(cm);
+  }
+}
+
+void av1_decode_tg_tiles_and_wrapup(AV1Decoder *pbi, const uint8_t *data,
+                                    const uint8_t *data_end,
+                                    const uint8_t **p_data_end, int start_tile,
+                                    int end_tile, int initialize_flag) {
+  AV1_COMMON *const cm = &pbi->common;
+  MACROBLOCKD *const xd = &pbi->mb;
+  if (initialize_flag) {
+    setup_frame_info(pbi);
+    av1_setup_frame(pbi, cm);
+  }
+  *p_data_end = decode_tiles_mt(pbi, data, data_end, start_tile, end_tile);
+
+  if (end_tile != cm->tile_rows * cm->tile_cols - 1) {
+    return;
+  }
+  
+  if (!xd->corrupted) {
+    if (cm->refresh_frame_context == REFRESH_FRAME_CONTEXT_BACKWARD) {
+      assert(cm->context_update_tile_id < pbi->allocated_tiles);
+      *cm->fc = pbi->ctx_the_choosen_one;
+      av1_reset_cdf_symbol_counters(cm->fc);
+    }
+  } else {
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "Decode failed. Frame data is corrupted.");
+  }
+
+  av1_decode_frame_gpu(pbi);
+
+  // Non frame parallel update frame context here.
+  if (!cm->large_scale_tile) {
+    cm->cur_frame->frame_context = *cm->fc;
+  }
+}

diff --git a/libav1/av1/decoder/decodeframe.h b/libav1/av1/decoder/decodeframe.h
new file mode 100644
index 0000000..13b9696
--- /dev/null
+++ b/libav1/av1/decoder/decodeframe.h

@@ -0,0 +1,85 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_DECODER_DECODEFRAME_H_
+#define AOM_AV1_DECODER_DECODEFRAME_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct AV1Decoder;
+struct aom_read_bit_buffer;
+struct ThreadData;
+
+// Reads the middle part of the sequence header OBU (from
+// frame_width_bits_minus_1 to enable_restoration) into seq_params.
+// Reports errors by calling rb->error_handler() or aom_internal_error().
+void av1_read_sequence_header(AV1_COMMON *cm, struct aom_read_bit_buffer *rb,
+                              SequenceHeader *seq_params);
+
+void av1_read_frame_size(struct aom_read_bit_buffer *rb, int num_bits_width,
+                         int num_bits_height, int *width, int *height);
+BITSTREAM_PROFILE av1_read_profile(struct aom_read_bit_buffer *rb);
+
+// Returns 0 on success. Sets pbi->common.error.error_code and returns -1 on
+// failure.
+int av1_check_trailing_bits(struct AV1Decoder *pbi,
+                            struct aom_read_bit_buffer *rb);
+
+// On success, returns the frame header size. On failure, calls
+// aom_internal_error and does not return.
+// TODO(wtc): Figure out and document the p_data_end parameter.
+uint32_t av1_decode_frame_headers_and_setup(struct AV1Decoder *pbi,
+                                            struct aom_read_bit_buffer *rb,
+                                            const uint8_t *data,
+                                            const uint8_t **p_data_end,
+                                            int trailing_bits_present);
+
+void av1_decode_tg_tiles_and_wrapup(struct AV1Decoder *pbi, const uint8_t *data,
+                                    const uint8_t *data_end,
+                                    const uint8_t **p_data_end, int start_tile,
+                                    int end_tile, int initialize_flag);
+
+// Implements the color_config() function in the spec. Reports errors by
+// calling rb->error_handler() or aom_internal_error().
+void av1_read_color_config(struct aom_read_bit_buffer *rb,
+                           int allow_lowbitdepth, SequenceHeader *seq_params,
+                           struct aom_internal_error_info *error_info);
+
+// Implements the timing_info() function in the spec. Reports errors by calling
+// rb->error_handler().
+void av1_read_timing_info_header(AV1_COMMON *cm,
+                                 struct aom_read_bit_buffer *rb);
+
+// Implements the decoder_model_info() function in the spec. Reports errors by
+// calling rb->error_handler().
+void av1_read_decoder_model_info(AV1_COMMON *cm,
+                                 struct aom_read_bit_buffer *rb);
+
+// Implements the operating_parameters_info() function in the spec. Reports
+// errors by calling rb->error_handler() or aom_internal_error().
+void av1_read_op_parameters_info(AV1_COMMON *const cm,
+                                 struct aom_read_bit_buffer *rb, int op_num);
+
+struct aom_read_bit_buffer *av1_init_read_bit_buffer(
+    struct AV1Decoder *pbi, struct aom_read_bit_buffer *rb, const uint8_t *data,
+    const uint8_t *data_end);
+
+void av1_free_mc_tmp_buf(struct ThreadData *thread_data);
+
+void av1_set_single_tile_decoding_mode(AV1_COMMON *const cm);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_DECODER_DECODEFRAME_H_

diff --git a/libav1/av1/decoder/decodemv.c b/libav1/av1/decoder/decodemv.c
new file mode 100644
index 0000000..3c2b9cd
--- /dev/null
+++ b/libav1/av1/decoder/decodemv.c

@@ -0,0 +1,1559 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "av1/common/cfl.h"
+#include "av1/common/common.h"
+#include "av1/common/entropy.h"
+#include "av1/common/entropymode.h"
+#include "av1/common/entropymv.h"
+#include "av1/common/mvref_common.h"
+#include "av1/common/pred_common.h"
+#include "av1/common/reconinter.h"
+#include "av1/common/reconintra.h"
+#include "av1/common/seg_common.h"
+#include "av1/common/warped_motion.h"
+
+#include "av1/decoder/decodeframe.h"
+#include "av1/decoder/decodemv.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+
+#define ACCT_STR __func__
+
+#define DEC_MISMATCH_DEBUG 0
+
+static PREDICTION_MODE read_intra_mode(aom_reader *r, aom_cdf_prob *cdf) {
+  return (PREDICTION_MODE)aom_read_symbol(r, cdf, INTRA_MODES, ACCT_STR);
+}
+
+static void read_cdef(AV1_COMMON *cm, aom_reader *r, MACROBLOCKD *const xd,
+                      int mi_col, int mi_row) {
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  if (cm->coded_lossless) return;
+  if (cm->allow_intrabc) {
+    assert(cm->cdef_info.cdef_bits == 0);
+    return;
+  }
+
+  if (!(mi_col & (cm->seq_params.mib_size - 1)) &&
+      !(mi_row & (cm->seq_params.mib_size - 1))) {  // Top left?
+    xd->cdef_preset[0] = xd->cdef_preset[1] = xd->cdef_preset[2] =
+        xd->cdef_preset[3] = -1;
+  }
+  // Read CDEF param at the first non-skip coding block
+  const int mask = (1 << (6 - MI_SIZE_LOG2));
+  const int m = ~(mask - 1);
+  const int index = cm->seq_params.sb_size == BLOCK_128X128
+                        ? !!(mi_col & mask) + 2 * !!(mi_row & mask)
+                        : 0;
+  cm->mi_grid_visible[(mi_row & m) * cm->mi_stride + (mi_col & m)]
+      ->cdef_strength = xd->cdef_preset[index] =
+      xd->cdef_preset[index] == -1 && !mbmi->skip
+          ? aom_read_literal(r, cm->cdef_info.cdef_bits, ACCT_STR)
+          : xd->cdef_preset[index];
+}
+
+static int read_delta_qindex(AV1_COMMON *cm, const MACROBLOCKD *xd,
+                             aom_reader *r, MB_MODE_INFO *const mbmi,
+                             int mi_col, int mi_row) {
+  int sign, abs, reduced_delta_qindex = 0;
+  BLOCK_SIZE bsize = mbmi->sb_type;
+  const int b_col = mi_col & (cm->seq_params.mib_size - 1);
+  const int b_row = mi_row & (cm->seq_params.mib_size - 1);
+  const int read_delta_q_flag = (b_col == 0 && b_row == 0);
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+
+  if ((bsize != cm->seq_params.sb_size || mbmi->skip == 0) &&
+      read_delta_q_flag) {
+    abs = aom_read_symbol(r, ec_ctx->delta_q_cdf, DELTA_Q_PROBS + 1, ACCT_STR);
+    const int smallval = (abs < DELTA_Q_SMALL);
+
+    if (!smallval) {
+      const int rem_bits = aom_read_literal(r, 3, ACCT_STR) + 1;
+      const int thr = (1 << rem_bits) + 1;
+      abs = aom_read_literal(r, rem_bits, ACCT_STR) + thr;
+    }
+
+    if (abs) {
+      sign = aom_read_bit(r, ACCT_STR);
+    } else {
+      sign = 1;
+    }
+
+    reduced_delta_qindex = sign ? -abs : abs;
+  }
+  return reduced_delta_qindex;
+}
+static int read_delta_lflevel(const AV1_COMMON *const cm, aom_reader *r,
+                              aom_cdf_prob *const cdf,
+                              const MB_MODE_INFO *const mbmi, int mi_col,
+                              int mi_row) {
+  int reduced_delta_lflevel = 0;
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  const int b_col = mi_col & (cm->seq_params.mib_size - 1);
+  const int b_row = mi_row & (cm->seq_params.mib_size - 1);
+  const int read_delta_lf_flag = (b_col == 0 && b_row == 0);
+
+  if ((bsize != cm->seq_params.sb_size || mbmi->skip == 0) &&
+      read_delta_lf_flag) {
+    int abs = aom_read_symbol(r, cdf, DELTA_LF_PROBS + 1, ACCT_STR);
+    const int smallval = (abs < DELTA_LF_SMALL);
+    if (!smallval) {
+      const int rem_bits = aom_read_literal(r, 3, ACCT_STR) + 1;
+      const int thr = (1 << rem_bits) + 1;
+      abs = aom_read_literal(r, rem_bits, ACCT_STR) + thr;
+    }
+    const int sign = abs ? aom_read_bit(r, ACCT_STR) : 1;
+    reduced_delta_lflevel = sign ? -abs : abs;
+  }
+  return reduced_delta_lflevel;
+}
+
+static UV_PREDICTION_MODE read_intra_mode_uv(FRAME_CONTEXT *ec_ctx,
+                                             aom_reader *r,
+                                             CFL_ALLOWED_TYPE cfl_allowed,
+                                             PREDICTION_MODE y_mode) {
+  const UV_PREDICTION_MODE uv_mode =
+      aom_read_symbol(r, ec_ctx->uv_mode_cdf[cfl_allowed][y_mode],
+                      UV_INTRA_MODES - !cfl_allowed, ACCT_STR);
+  return uv_mode;
+}
+
+static int read_cfl_alphas(FRAME_CONTEXT *const ec_ctx, aom_reader *r,
+                           int *signs_out) {
+  const int joint_sign =
+      aom_read_symbol(r, ec_ctx->cfl_sign_cdf, CFL_JOINT_SIGNS, "cfl:signs");
+  int idx = 0;
+  // Magnitudes are only coded for nonzero values
+  if (CFL_SIGN_U(joint_sign) != CFL_SIGN_ZERO) {
+    aom_cdf_prob *cdf_u = ec_ctx->cfl_alpha_cdf[CFL_CONTEXT_U(joint_sign)];
+    idx = aom_read_symbol(r, cdf_u, CFL_ALPHABET_SIZE, "cfl:alpha_u")
+          << CFL_ALPHABET_SIZE_LOG2;
+  }
+  if (CFL_SIGN_V(joint_sign) != CFL_SIGN_ZERO) {
+    aom_cdf_prob *cdf_v = ec_ctx->cfl_alpha_cdf[CFL_CONTEXT_V(joint_sign)];
+    idx += aom_read_symbol(r, cdf_v, CFL_ALPHABET_SIZE, "cfl:alpha_v");
+  }
+  *signs_out = joint_sign;
+  return idx;
+}
+
+static INTERINTRA_MODE read_interintra_mode(MACROBLOCKD *xd, aom_reader *r,
+                                            int size_group) {
+  const INTERINTRA_MODE ii_mode = (INTERINTRA_MODE)aom_read_symbol(
+      r, xd->tile_ctx->interintra_mode_cdf[size_group], INTERINTRA_MODES,
+      ACCT_STR);
+  return ii_mode;
+}
+
+static PREDICTION_MODE read_inter_mode(FRAME_CONTEXT *ec_ctx, aom_reader *r,
+                                       int16_t ctx) {
+  int16_t mode_ctx = ctx & NEWMV_CTX_MASK;
+  int is_newmv, is_zeromv, is_refmv;
+  is_newmv = aom_read_symbol(r, ec_ctx->newmv_cdf[mode_ctx], 2, ACCT_STR) == 0;
+  if (is_newmv) return NEWMV;
+
+  mode_ctx = (ctx >> GLOBALMV_OFFSET) & GLOBALMV_CTX_MASK;
+  is_zeromv =
+      aom_read_symbol(r, ec_ctx->zeromv_cdf[mode_ctx], 2, ACCT_STR) == 0;
+  if (is_zeromv) return GLOBALMV;
+
+  mode_ctx = (ctx >> REFMV_OFFSET) & REFMV_CTX_MASK;
+  is_refmv = aom_read_symbol(r, ec_ctx->refmv_cdf[mode_ctx], 2, ACCT_STR) == 0;
+  if (is_refmv)
+    return NEARESTMV;
+  else
+    return NEARMV;
+}
+
+static void read_drl_idx(FRAME_CONTEXT *ec_ctx, MACROBLOCKD *xd,
+                         MB_MODE_INFO *mbmi, aom_reader *r) {
+  uint8_t ref_frame_type = av1_ref_frame_type(mbmi->ref_frame);
+  mbmi->ref_mv_idx = 0;
+  if (mbmi->mode == NEWMV || mbmi->mode == NEW_NEWMV) {
+    for (int idx = 0; idx < 2; ++idx) {
+      if (xd->ref_mv_count[ref_frame_type] > idx + 1) {
+        uint8_t drl_ctx = av1_drl_ctx(xd->ref_mv_stack[ref_frame_type], idx);
+        int drl_idx = aom_read_symbol(r, ec_ctx->drl_cdf[drl_ctx], 2, ACCT_STR);
+        mbmi->ref_mv_idx = idx + drl_idx;
+        if (!drl_idx) return;
+      }
+    }
+  }
+  if (have_nearmv_in_inter_mode(mbmi->mode)) {
+    // Offset the NEARESTMV mode.
+    // TODO(jingning): Unify the two syntax decoding loops after the NEARESTMV
+    // mode is factored in.
+    for (int idx = 1; idx < 3; ++idx) {
+      if (xd->ref_mv_count[ref_frame_type] > idx + 1) {
+        uint8_t drl_ctx = av1_drl_ctx(xd->ref_mv_stack[ref_frame_type], idx);
+        int drl_idx = aom_read_symbol(r, ec_ctx->drl_cdf[drl_ctx], 2, ACCT_STR);
+        mbmi->ref_mv_idx = idx + drl_idx - 1;
+        if (!drl_idx) return;
+      }
+    }
+  }
+}
+
+static MOTION_MODE read_motion_mode(AV1_COMMON *cm, MACROBLOCKD *xd,
+                                    MB_MODE_INFO *mbmi, aom_reader *r) {
+  if (cm->switchable_motion_mode == 0) return SIMPLE_TRANSLATION;
+  if (mbmi->skip_mode) return SIMPLE_TRANSLATION;
+
+  const MOTION_MODE last_motion_mode_allowed =
+      motion_mode_allowed(xd->global_motion, xd, mbmi, cm->allow_warped_motion);
+  int motion_mode;
+
+  if (last_motion_mode_allowed == SIMPLE_TRANSLATION) return SIMPLE_TRANSLATION;
+
+  if (last_motion_mode_allowed == OBMC_CAUSAL) {
+    motion_mode =
+        aom_read_symbol(r, xd->tile_ctx->obmc_cdf[mbmi->sb_type], 2, ACCT_STR);
+    return (MOTION_MODE)(SIMPLE_TRANSLATION + motion_mode);
+  } else {
+    motion_mode =
+        aom_read_symbol(r, xd->tile_ctx->motion_mode_cdf[mbmi->sb_type],
+                        MOTION_MODES, ACCT_STR);
+    return (MOTION_MODE)(SIMPLE_TRANSLATION + motion_mode);
+  }
+}
+
+static PREDICTION_MODE read_inter_compound_mode(MACROBLOCKD *xd, aom_reader *r,
+                                                int16_t ctx) {
+  const int mode =
+      aom_read_symbol(r, xd->tile_ctx->inter_compound_mode_cdf[ctx],
+                      INTER_COMPOUND_MODES, ACCT_STR);
+  assert(is_inter_compound_mode(NEAREST_NEARESTMV + mode));
+  return NEAREST_NEARESTMV + mode;
+}
+
+int av1_neg_deinterleave(int diff, int ref, int max) {
+  if (!ref) return diff;
+  if (ref >= (max - 1)) return max - diff - 1;
+  if (2 * ref < max) {
+    if (diff <= 2 * ref) {
+      if (diff & 1)
+        return ref + ((diff + 1) >> 1);
+      else
+        return ref - (diff >> 1);
+    }
+    return diff;
+  } else {
+    if (diff <= 2 * (max - ref - 1)) {
+      if (diff & 1)
+        return ref + ((diff + 1) >> 1);
+      else
+        return ref - (diff >> 1);
+    }
+    return max - (diff + 1);
+  }
+}
+
+static int read_segment_id(AV1_COMMON *const cm, const MACROBLOCKD *const xd,
+                           int mi_row, int mi_col, aom_reader *r, int skip) {
+  int cdf_num;
+  const int pred = av1_get_spatial_seg_pred(cm, xd, mi_row, mi_col, &cdf_num);
+  if (skip) return pred;
+
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+  struct segmentation *const seg = &cm->seg;
+  struct segmentation_probs *const segp = &ec_ctx->seg;
+  aom_cdf_prob *pred_cdf = segp->spatial_pred_seg_cdf[cdf_num];
+  const int coded_id = aom_read_symbol(r, pred_cdf, MAX_SEGMENTS, ACCT_STR);
+  const int segment_id =
+      av1_neg_deinterleave(coded_id, pred, seg->last_active_segid + 1);
+
+  if (segment_id < 0 || segment_id > seg->last_active_segid) {
+    aom_internal_error(xd->error_info, AOM_CODEC_CORRUPT_FRAME,
+                       "Corrupted segment_ids");
+  }
+  return segment_id;
+}
+
+static int dec_get_segment_id(const AV1_COMMON *cm, const uint8_t *segment_ids,
+                              int mi_offset, int x_mis, int y_mis) {
+  int segment_id = INT_MAX;
+
+  for (int y = 0; y < y_mis; y++)
+    for (int x = 0; x < x_mis; x++)
+      segment_id =
+          AOMMIN(segment_id, segment_ids[mi_offset + y * cm->mi_cols + x]);
+
+  assert(segment_id >= 0 && segment_id < MAX_SEGMENTS);
+  return segment_id;
+}
+
+static void set_segment_id(AV1_COMMON *cm, int mi_offset, int x_mis, int y_mis,
+                           int segment_id) {
+  assert(segment_id >= 0 && segment_id < MAX_SEGMENTS);
+
+  for (int y = 0; y < y_mis; y++)
+    for (int x = 0; x < x_mis; x++)
+      cm->cur_frame->seg_map[mi_offset + y * cm->mi_cols + x] = segment_id;
+}
+
+static int read_intra_segment_id(AV1_COMMON *const cm,
+                                 const MACROBLOCKD *const xd, int mi_row,
+                                 int mi_col, int bsize, aom_reader *r,
+                                 int skip) {
+  struct segmentation *const seg = &cm->seg;
+  if (!seg->enabled) return 0;  // Default for disabled segmentation
+
+  assert(seg->update_map && !seg->temporal_update);
+
+  const int mi_offset = mi_row * cm->mi_cols + mi_col;
+  const int bw = mi_size_wide[bsize];
+  const int bh = mi_size_high[bsize];
+  const int x_mis = AOMMIN(cm->mi_cols - mi_col, bw);
+  const int y_mis = AOMMIN(cm->mi_rows - mi_row, bh);
+  const int segment_id = read_segment_id(cm, xd, mi_row, mi_col, r, skip);
+  set_segment_id(cm, mi_offset, x_mis, y_mis, segment_id);
+  return segment_id;
+}
+
+static void copy_segment_id(const AV1_COMMON *cm,
+                            const uint8_t *last_segment_ids,
+                            uint8_t *current_segment_ids, int mi_offset,
+                            int x_mis, int y_mis) {
+  for (int y = 0; y < y_mis; y++)
+    for (int x = 0; x < x_mis; x++)
+      current_segment_ids[mi_offset + y * cm->mi_cols + x] =
+          last_segment_ids ? last_segment_ids[mi_offset + y * cm->mi_cols + x]
+                           : 0;
+}
+
+static int get_predicted_segment_id(AV1_COMMON *const cm, int mi_offset,
+                                    int x_mis, int y_mis) {
+  return cm->last_frame_seg_map ? dec_get_segment_id(cm, cm->last_frame_seg_map,
+                                                     mi_offset, x_mis, y_mis)
+                                : 0;
+}
+
+static int read_inter_segment_id(AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                                 int mi_row, int mi_col, int preskip,
+                                 aom_reader *r) {
+  struct segmentation *const seg = &cm->seg;
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  const int mi_offset = mi_row * cm->mi_cols + mi_col;
+  const int bw = mi_size_wide[mbmi->sb_type];
+  const int bh = mi_size_high[mbmi->sb_type];
+
+  // TODO(slavarnway): move x_mis, y_mis into xd ?????
+  const int x_mis = AOMMIN(cm->mi_cols - mi_col, bw);
+  const int y_mis = AOMMIN(cm->mi_rows - mi_row, bh);
+
+  if (!seg->enabled) return 0;  // Default for disabled segmentation
+
+  if (!seg->update_map) {
+    copy_segment_id(cm, cm->last_frame_seg_map, cm->cur_frame->seg_map,
+                    mi_offset, x_mis, y_mis);
+    return get_predicted_segment_id(cm, mi_offset, x_mis, y_mis);
+  }
+
+  int segment_id;
+  if (preskip) {
+    if (!seg->segid_preskip) return 0;
+  } else {
+    if (mbmi->skip) {
+      if (seg->temporal_update) {
+        mbmi->seg_id_predicted = 0;
+      }
+      segment_id = read_segment_id(cm, xd, mi_row, mi_col, r, 1);
+      set_segment_id(cm, mi_offset, x_mis, y_mis, segment_id);
+      return segment_id;
+    }
+  }
+
+  if (seg->temporal_update) {
+    const int ctx = av1_get_pred_context_seg_id(xd);
+    FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+    struct segmentation_probs *const segp = &ec_ctx->seg;
+    aom_cdf_prob *pred_cdf = segp->pred_cdf[ctx];
+    mbmi->seg_id_predicted = aom_read_symbol(r, pred_cdf, 2, ACCT_STR);
+    if (mbmi->seg_id_predicted) {
+      segment_id = get_predicted_segment_id(cm, mi_offset, x_mis, y_mis);
+    } else {
+      segment_id = read_segment_id(cm, xd, mi_row, mi_col, r, 0);
+    }
+  } else {
+    segment_id = read_segment_id(cm, xd, mi_row, mi_col, r, 0);
+  }
+  set_segment_id(cm, mi_offset, x_mis, y_mis, segment_id);
+  return segment_id;
+}
+
+static int read_skip_mode(AV1_COMMON *cm, const MACROBLOCKD *xd, int segment_id,
+                          aom_reader *r) {
+  if (!cm->current_frame.skip_mode_info.skip_mode_flag) return 0;
+
+  if (segfeature_active(&cm->seg, segment_id, SEG_LVL_SKIP)) {
+    return 0;
+  }
+
+  if (!is_comp_ref_allowed(xd->mi[0]->sb_type)) return 0;
+
+  if (segfeature_active(&cm->seg, segment_id, SEG_LVL_REF_FRAME) ||
+      segfeature_active(&cm->seg, segment_id, SEG_LVL_GLOBALMV)) {
+    // These features imply single-reference mode, while skip mode implies
+    // compound reference. Hence, the two are mutually exclusive.
+    // In other words, skip_mode is implicitly 0 here.
+    return 0;
+  }
+
+  const int ctx = av1_get_skip_mode_context(xd);
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+  const int skip_mode =
+      aom_read_symbol(r, ec_ctx->skip_mode_cdfs[ctx], 2, ACCT_STR);
+  return skip_mode;
+}
+
+static int read_skip(AV1_COMMON *cm, const MACROBLOCKD *xd, int segment_id,
+                     aom_reader *r) {
+  if (segfeature_active(&cm->seg, segment_id, SEG_LVL_SKIP)) {
+    return 1;
+  } else {
+    const int ctx = av1_get_skip_context(xd);
+    FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+    const int skip = aom_read_symbol(r, ec_ctx->skip_cdfs[ctx], 2, ACCT_STR);
+    return skip;
+  }
+}
+
+// Merge the sorted list of cached colors(cached_colors[0...n_cached_colors-1])
+// and the sorted list of transmitted colors(colors[n_cached_colors...n-1]) into
+// one single sorted list(colors[...]).
+static void merge_colors(uint16_t *colors, uint16_t *cached_colors,
+                         int n_colors, int n_cached_colors) {
+  if (n_cached_colors == 0) return;
+  int cache_idx = 0, trans_idx = n_cached_colors;
+  for (int i = 0; i < n_colors; ++i) {
+    if (cache_idx < n_cached_colors &&
+        (trans_idx >= n_colors ||
+         cached_colors[cache_idx] <= colors[trans_idx])) {
+      colors[i] = cached_colors[cache_idx++];
+    } else {
+      assert(trans_idx < n_colors);
+      colors[i] = colors[trans_idx++];
+    }
+  }
+}
+
+static void read_palette_colors_y(MACROBLOCKD *const xd, int bit_depth,
+                                  PALETTE_MODE_INFO *const pmi, aom_reader *r) {
+  uint16_t color_cache[2 * PALETTE_MAX_SIZE];
+  uint16_t cached_colors[PALETTE_MAX_SIZE];
+  const int n_cache = av1_get_palette_cache(xd, 0, color_cache);
+  const int n = pmi->palette_size[0];
+  int idx = 0;
+  for (int i = 0; i < n_cache && idx < n; ++i)
+    if (aom_read_bit(r, ACCT_STR)) cached_colors[idx++] = color_cache[i];
+  if (idx < n) {
+    const int n_cached_colors = idx;
+    pmi->palette_colors[idx++] = aom_read_literal(r, bit_depth, ACCT_STR);
+    if (idx < n) {
+      const int min_bits = bit_depth - 3;
+      int bits = min_bits + aom_read_literal(r, 2, ACCT_STR);
+      int range = (1 << bit_depth) - pmi->palette_colors[idx - 1] - 1;
+      for (; idx < n; ++idx) {
+        assert(range >= 0);
+        const int delta = aom_read_literal(r, bits, ACCT_STR) + 1;
+        pmi->palette_colors[idx] = clamp(pmi->palette_colors[idx - 1] + delta,
+                                         0, (1 << bit_depth) - 1);
+        range -= (pmi->palette_colors[idx] - pmi->palette_colors[idx - 1]);
+        bits = AOMMIN(bits, av1_ceil_log2(range));
+      }
+    }
+    merge_colors(pmi->palette_colors, cached_colors, n, n_cached_colors);
+  } else {
+    memcpy(pmi->palette_colors, cached_colors, n * sizeof(cached_colors[0]));
+  }
+}
+
+static void read_palette_colors_uv(MACROBLOCKD *const xd, int bit_depth,
+                                   PALETTE_MODE_INFO *const pmi,
+                                   aom_reader *r) {
+  const int n = pmi->palette_size[1];
+  // U channel colors.
+  uint16_t color_cache[2 * PALETTE_MAX_SIZE];
+  uint16_t cached_colors[PALETTE_MAX_SIZE];
+  const int n_cache = av1_get_palette_cache(xd, 1, color_cache);
+  int idx = 0;
+  for (int i = 0; i < n_cache && idx < n; ++i)
+    if (aom_read_bit(r, ACCT_STR)) cached_colors[idx++] = color_cache[i];
+  if (idx < n) {
+    const int n_cached_colors = idx;
+    idx += PALETTE_MAX_SIZE;
+    pmi->palette_colors[idx++] = aom_read_literal(r, bit_depth, ACCT_STR);
+    if (idx < PALETTE_MAX_SIZE + n) {
+      const int min_bits = bit_depth - 3;
+      int bits = min_bits + aom_read_literal(r, 2, ACCT_STR);
+      int range = (1 << bit_depth) - pmi->palette_colors[idx - 1];
+      for (; idx < PALETTE_MAX_SIZE + n; ++idx) {
+        assert(range >= 0);
+        const int delta = aom_read_literal(r, bits, ACCT_STR);
+        pmi->palette_colors[idx] = clamp(pmi->palette_colors[idx - 1] + delta,
+                                         0, (1 << bit_depth) - 1);
+        range -= (pmi->palette_colors[idx] - pmi->palette_colors[idx - 1]);
+        bits = AOMMIN(bits, av1_ceil_log2(range));
+      }
+    }
+    merge_colors(pmi->palette_colors + PALETTE_MAX_SIZE, cached_colors, n,
+                 n_cached_colors);
+  } else {
+    memcpy(pmi->palette_colors + PALETTE_MAX_SIZE, cached_colors,
+           n * sizeof(cached_colors[0]));
+  }
+
+  // V channel colors.
+  if (aom_read_bit(r, ACCT_STR)) {  // Delta encoding.
+    const int min_bits_v = bit_depth - 4;
+    const int max_val = 1 << bit_depth;
+    int bits = min_bits_v + aom_read_literal(r, 2, ACCT_STR);
+    pmi->palette_colors[2 * PALETTE_MAX_SIZE] =
+        aom_read_literal(r, bit_depth, ACCT_STR);
+    for (int i = 1; i < n; ++i) {
+      int delta = aom_read_literal(r, bits, ACCT_STR);
+      if (delta && aom_read_bit(r, ACCT_STR)) delta = -delta;
+      int val = (int)pmi->palette_colors[2 * PALETTE_MAX_SIZE + i - 1] + delta;
+      if (val < 0) val += max_val;
+      if (val >= max_val) val -= max_val;
+      pmi->palette_colors[2 * PALETTE_MAX_SIZE + i] = val;
+    }
+  } else {
+    for (int i = 0; i < n; ++i) {
+      pmi->palette_colors[2 * PALETTE_MAX_SIZE + i] =
+          aom_read_literal(r, bit_depth, ACCT_STR);
+    }
+  }
+}
+
+static void read_palette_mode_info(AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                                   int mi_row, int mi_col, aom_reader *r) {
+  const int num_planes = av1_num_planes(cm);
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  assert(av1_allow_palette(cm->allow_screen_content_tools, bsize));
+  PALETTE_MODE_INFO *const pmi = &mbmi->palette_mode_info;
+  const int bsize_ctx = av1_get_palette_bsize_ctx(bsize);
+
+  if (mbmi->mode == DC_PRED) {
+    const int palette_mode_ctx = av1_get_palette_mode_ctx(xd);
+    const int modev = aom_read_symbol(
+        r, xd->tile_ctx->palette_y_mode_cdf[bsize_ctx][palette_mode_ctx], 2,
+        ACCT_STR);
+    if (modev) {
+      pmi->palette_size[0] =
+          aom_read_symbol(r, xd->tile_ctx->palette_y_size_cdf[bsize_ctx],
+                          PALETTE_SIZES, ACCT_STR) +
+          2;
+      read_palette_colors_y(xd, cm->seq_params.bit_depth, pmi, r);
+    }
+  }
+  if (num_planes > 1 && mbmi->uv_mode == UV_DC_PRED &&
+      is_chroma_reference(mi_row, mi_col, bsize, xd->plane[1].subsampling_x,
+                          xd->plane[1].subsampling_y)) {
+    const int palette_uv_mode_ctx = (pmi->palette_size[0] > 0);
+    const int modev = aom_read_symbol(
+        r, xd->tile_ctx->palette_uv_mode_cdf[palette_uv_mode_ctx], 2, ACCT_STR);
+    if (modev) {
+      pmi->palette_size[1] =
+          aom_read_symbol(r, xd->tile_ctx->palette_uv_size_cdf[bsize_ctx],
+                          PALETTE_SIZES, ACCT_STR) +
+          2;
+      read_palette_colors_uv(xd, cm->seq_params.bit_depth, pmi, r);
+    }
+  }
+}
+
+static int read_angle_delta(aom_reader *r, aom_cdf_prob *cdf) {
+  const int sym = aom_read_symbol(r, cdf, 2 * MAX_ANGLE_DELTA + 1, ACCT_STR);
+  return sym - MAX_ANGLE_DELTA;
+}
+
+static void read_filter_intra_mode_info(const AV1_COMMON *const cm,
+                                        MACROBLOCKD *const xd, aom_reader *r) {
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  FILTER_INTRA_MODE_INFO *filter_intra_mode_info =
+      &mbmi->filter_intra_mode_info;
+
+  if (av1_filter_intra_allowed(cm, mbmi)) {
+    filter_intra_mode_info->use_filter_intra = aom_read_symbol(
+        r, xd->tile_ctx->filter_intra_cdfs[mbmi->sb_type], 2, ACCT_STR);
+    if (filter_intra_mode_info->use_filter_intra) {
+      filter_intra_mode_info->filter_intra_mode = aom_read_symbol(
+          r, xd->tile_ctx->filter_intra_mode_cdf, FILTER_INTRA_MODES, ACCT_STR);
+    }
+  } else {
+    filter_intra_mode_info->use_filter_intra = 0;
+  }
+}
+
+void av1_read_tx_type(const AV1_COMMON *const cm, MACROBLOCKD *xd, int blk_row,
+                      int blk_col, TX_SIZE tx_size, aom_reader *r) {
+  MB_MODE_INFO *mbmi = xd->mi[0];
+  const int txk_type_idx =
+      av1_get_txk_type_index(mbmi->sb_type, blk_row, blk_col);
+  TX_TYPE *tx_type = &xd->txk_type[txk_type_idx];
+  *tx_type = DCT_DCT;
+
+  // No need to read transform type if block is skipped.
+  if (mbmi->skip || segfeature_active(&cm->seg, mbmi->segment_id, SEG_LVL_SKIP))
+    return;
+
+  // No need to read transform type for lossless mode(qindex==0).
+  const int qindex = xd->qindex[mbmi->segment_id];
+  if (qindex == 0) return;
+
+  const int inter_block = is_inter_block(mbmi);
+  if (get_ext_tx_types(tx_size, inter_block, cm->reduced_tx_set_used) > 1) {
+    const TxSetType tx_set_type =
+        av1_get_ext_tx_set_type(tx_size, inter_block, cm->reduced_tx_set_used);
+    const int eset =
+        get_ext_tx_set(tx_size, inter_block, cm->reduced_tx_set_used);
+    // eset == 0 should correspond to a set with only DCT_DCT and
+    // there is no need to read the tx_type
+    assert(eset != 0);
+
+    const TX_SIZE square_tx_size = txsize_sqr_map[tx_size];
+    FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+    if (inter_block) {
+      *tx_type = av1_ext_tx_inv[tx_set_type][aom_read_symbol(
+          r, ec_ctx->inter_ext_tx_cdf[eset][square_tx_size],
+          av1_num_ext_tx_set[tx_set_type], ACCT_STR)];
+    } else {
+      const PREDICTION_MODE intra_mode =
+          mbmi->filter_intra_mode_info.use_filter_intra
+              ? fimode_to_intradir[mbmi->filter_intra_mode_info
+                                       .filter_intra_mode]
+              : mbmi->mode;
+      *tx_type = av1_ext_tx_inv[tx_set_type][aom_read_symbol(
+          r, ec_ctx->intra_ext_tx_cdf[eset][square_tx_size][intra_mode],
+          av1_num_ext_tx_set[tx_set_type], ACCT_STR)];
+    }
+  }
+}
+
+static INLINE void read_mv(aom_reader *r, MV *mv, const MV *ref,
+                           nmv_context *ctx, MvSubpelPrecision precision);
+
+static INLINE int is_mv_valid(const MV *mv);
+
+static INLINE int assign_dv(AV1_COMMON *cm, MACROBLOCKD *xd, int_mv *mv,
+                            const int_mv *ref_mv, int mi_row, int mi_col,
+                            BLOCK_SIZE bsize, aom_reader *r) {
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+  read_mv(r, &mv->as_mv, &ref_mv->as_mv, &ec_ctx->ndvc, MV_SUBPEL_NONE);
+  // DV should not have sub-pel.
+  assert((mv->as_mv.col & 7) == 0);
+  assert((mv->as_mv.row & 7) == 0);
+  mv->as_mv.col = (mv->as_mv.col >> 3) * 8;
+  mv->as_mv.row = (mv->as_mv.row >> 3) * 8;
+  int valid = is_mv_valid(&mv->as_mv) &&
+              av1_is_dv_valid(mv->as_mv, cm, xd, mi_row, mi_col, bsize,
+                              cm->seq_params.mib_size_log2);
+  return valid;
+}
+
+static void read_intrabc_info(AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                              int mi_row, int mi_col, aom_reader *r) {
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+  mbmi->use_intrabc = aom_read_symbol(r, ec_ctx->intrabc_cdf, 2, ACCT_STR);
+  if (mbmi->use_intrabc) {
+    BLOCK_SIZE bsize = mbmi->sb_type;
+    mbmi->mode = DC_PRED;
+    mbmi->uv_mode = UV_DC_PRED;
+    mbmi->interp_filters = av1_broadcast_interp_filter(BILINEAR);
+    mbmi->motion_mode = SIMPLE_TRANSLATION;
+
+    int16_t inter_mode_ctx[MODE_CTX_REF_FRAMES];
+    int_mv ref_mvs[INTRA_FRAME + 1][MAX_MV_REF_CANDIDATES];
+
+    av1_find_mv_refs(cm, xd, mbmi, INTRA_FRAME, xd->ref_mv_count,
+                     xd->ref_mv_stack, ref_mvs, /*global_mvs=*/NULL, mi_row,
+                     mi_col, inter_mode_ctx);
+
+    int_mv nearestmv, nearmv;
+
+    av1_find_best_ref_mvs(0, ref_mvs[INTRA_FRAME], &nearestmv, &nearmv, 0);
+    int_mv dv_ref = nearestmv.as_int == 0 ? nearmv : nearestmv;
+    if (dv_ref.as_int == 0)
+      av1_find_ref_dv(&dv_ref, &xd->tile, cm->seq_params.mib_size, mi_row,
+                      mi_col);
+    // Ref DV should not have sub-pel.
+    int valid_dv = (dv_ref.as_mv.col & 7) == 0 && (dv_ref.as_mv.row & 7) == 0;
+    dv_ref.as_mv.col = (dv_ref.as_mv.col >> 3) * 8;
+    dv_ref.as_mv.row = (dv_ref.as_mv.row >> 3) * 8;
+    valid_dv = valid_dv && assign_dv(cm, xd, &mbmi->mv[0], &dv_ref, mi_row,
+                                     mi_col, bsize, r);
+    if (!valid_dv) {
+      // Intra bc motion vectors are not valid - signal corrupt frame
+      aom_internal_error(xd->error_info, AOM_CODEC_CORRUPT_FRAME,
+                         "Invalid intrabc dv");
+    }
+  }
+}
+
+// If delta q is present, reads delta_q index.
+// Also reads delta_q loop filter levels, if present.
+static void read_delta_q_params(AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                                const int mi_row, const int mi_col,
+                                aom_reader *r) {
+  DeltaQInfo *const delta_q_info = &cm->delta_q_info;
+
+  if (delta_q_info->delta_q_present_flag) {
+    MB_MODE_INFO *const mbmi = xd->mi[0];
+    xd->current_qindex += read_delta_qindex(cm, xd, r, mbmi, mi_col, mi_row) *
+                          delta_q_info->delta_q_res;
+    /* Normative: Clamp to [1,MAXQ] to not interfere with lossless mode */
+    xd->current_qindex = clamp(xd->current_qindex, 1, MAXQ);
+    FRAME_CONTEXT *const ec_ctx = xd->tile_ctx;
+    if (delta_q_info->delta_lf_present_flag) {
+      if (delta_q_info->delta_lf_multi) {
+        const int frame_lf_count =
+            av1_num_planes(cm) > 1 ? FRAME_LF_COUNT : FRAME_LF_COUNT - 2;
+        for (int lf_id = 0; lf_id < frame_lf_count; ++lf_id) {
+          const int tmp_lvl =
+              xd->delta_lf[lf_id] +
+              read_delta_lflevel(cm, r, ec_ctx->delta_lf_multi_cdf[lf_id], mbmi,
+                                 mi_col, mi_row) *
+                  delta_q_info->delta_lf_res;
+          mbmi->delta_lf[lf_id] = xd->delta_lf[lf_id] =
+              clamp(tmp_lvl, -MAX_LOOP_FILTER, MAX_LOOP_FILTER);
+        }
+      } else {
+        const int tmp_lvl = xd->delta_lf_from_base +
+                            read_delta_lflevel(cm, r, ec_ctx->delta_lf_cdf,
+                                               mbmi, mi_col, mi_row) *
+                                delta_q_info->delta_lf_res;
+        mbmi->delta_lf_from_base = xd->delta_lf_from_base =
+            clamp(tmp_lvl, -MAX_LOOP_FILTER, MAX_LOOP_FILTER);
+      }
+    }
+  }
+}
+
+static void read_intra_frame_mode_info(AV1_COMMON *const cm,
+                                       MACROBLOCKD *const xd, int mi_row,
+                                       int mi_col, aom_reader *r) {
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  const MB_MODE_INFO *above_mi = xd->above_mbmi;
+  const MB_MODE_INFO *left_mi = xd->left_mbmi;
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  struct segmentation *const seg = &cm->seg;
+
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+
+  if (seg->segid_preskip)
+    mbmi->segment_id =
+        read_intra_segment_id(cm, xd, mi_row, mi_col, bsize, r, 0);
+
+  mbmi->skip = read_skip(cm, xd, mbmi->segment_id, r);
+
+  if (!seg->segid_preskip)
+    mbmi->segment_id =
+        read_intra_segment_id(cm, xd, mi_row, mi_col, bsize, r, mbmi->skip);
+
+  read_cdef(cm, r, xd, mi_col, mi_row);
+
+  read_delta_q_params(cm, xd, mi_row, mi_col, r);
+
+  mbmi->current_qindex = xd->current_qindex;
+
+  mbmi->ref_frame[0] = INTRA_FRAME;
+  mbmi->ref_frame[1] = NONE_FRAME;
+  mbmi->palette_mode_info.palette_size[0] = 0;
+  mbmi->palette_mode_info.palette_size[1] = 0;
+  mbmi->filter_intra_mode_info.use_filter_intra = 0;
+
+  xd->above_txfm_context = cm->above_txfm_context[xd->tile.tile_row] + mi_col;
+  xd->left_txfm_context =
+      xd->left_txfm_context_buffer + (mi_row & MAX_MIB_MASK);
+
+  if (av1_allow_intrabc(cm)) {
+    read_intrabc_info(cm, xd, mi_row, mi_col, r);
+    if (is_intrabc_block(mbmi)) return;
+  }
+
+  mbmi->mode = read_intra_mode(r, get_y_mode_cdf(ec_ctx, above_mi, left_mi));
+
+  const int use_angle_delta = av1_use_angle_delta(bsize);
+  mbmi->angle_delta[PLANE_TYPE_Y] =
+      (use_angle_delta && av1_is_directional_mode(mbmi->mode))
+          ? read_angle_delta(r, ec_ctx->angle_delta_cdf[mbmi->mode - V_PRED])
+          : 0;
+
+  if (!cm->seq_params.monochrome &&
+      is_chroma_reference(mi_row, mi_col, bsize, xd->plane[1].subsampling_x,
+                          xd->plane[1].subsampling_y)) {
+    xd->cfl.is_chroma_reference = 1;
+    mbmi->uv_mode =
+        read_intra_mode_uv(ec_ctx, r, is_cfl_allowed(xd), mbmi->mode);
+    if (mbmi->uv_mode == UV_CFL_PRED) {
+      mbmi->cfl_alpha_idx = read_cfl_alphas(ec_ctx, r, &mbmi->cfl_alpha_signs);
+    }
+    mbmi->angle_delta[PLANE_TYPE_UV] =
+        (use_angle_delta && av1_is_directional_mode(get_uv_mode(mbmi->uv_mode)))
+            ? read_angle_delta(r,
+                               ec_ctx->angle_delta_cdf[mbmi->uv_mode - V_PRED])
+            : 0;
+  } else {
+    // Avoid decoding angle_info if there is is no chroma prediction
+    mbmi->uv_mode = UV_DC_PRED;
+    xd->cfl.is_chroma_reference = 0;
+  }
+  xd->cfl.store_y = store_cfl_required(cm, xd);
+
+  if (av1_allow_palette(cm->allow_screen_content_tools, bsize))
+    read_palette_mode_info(cm, xd, mi_row, mi_col, r);
+
+  read_filter_intra_mode_info(cm, xd, r);
+}
+
+static int read_mv_component(aom_reader *r, nmv_component *mvcomp,
+                             int use_subpel, int usehp) {
+  int mag, d, fr, hp;
+  const int sign = aom_read_symbol(r, mvcomp->sign_cdf, 2, ACCT_STR);
+  const int mv_class =
+      aom_read_symbol(r, mvcomp->classes_cdf, MV_CLASSES, ACCT_STR);
+  const int class0 = mv_class == MV_CLASS_0;
+
+  // Integer part
+  if (class0) {
+    d = aom_read_symbol(r, mvcomp->class0_cdf, CLASS0_SIZE, ACCT_STR);
+    mag = 0;
+  } else {
+    const int n = mv_class + CLASS0_BITS - 1;  // number of bits
+    d = 0;
+    for (int i = 0; i < n; ++i)
+      d |= aom_read_symbol(r, mvcomp->bits_cdf[i], 2, ACCT_STR) << i;
+    mag = CLASS0_SIZE << (mv_class + 2);
+  }
+
+  if (use_subpel) {
+    // Fractional part
+    fr = aom_read_symbol(r, class0 ? mvcomp->class0_fp_cdf[d] : mvcomp->fp_cdf,
+                         MV_FP_SIZE, ACCT_STR);
+
+    // High precision part (if hp is not used, the default value of the hp is 1)
+    hp = usehp ? aom_read_symbol(
+                     r, class0 ? mvcomp->class0_hp_cdf : mvcomp->hp_cdf, 2,
+                     ACCT_STR)
+               : 1;
+  } else {
+    fr = 3;
+    hp = 1;
+  }
+
+  // Result
+  mag += ((d << 3) | (fr << 1) | hp) + 1;
+  return sign ? -mag : mag;
+}
+
+static INLINE void read_mv(aom_reader *r, MV *mv, const MV *ref,
+                           nmv_context *ctx, MvSubpelPrecision precision) {
+  MV diff = kZeroMv;
+  const MV_JOINT_TYPE joint_type =
+      (MV_JOINT_TYPE)aom_read_symbol(r, ctx->joints_cdf, MV_JOINTS, ACCT_STR);
+
+  if (mv_joint_vertical(joint_type))
+    diff.row = read_mv_component(r, &ctx->comps[0], precision > MV_SUBPEL_NONE,
+                                 precision > MV_SUBPEL_LOW_PRECISION);
+
+  if (mv_joint_horizontal(joint_type))
+    diff.col = read_mv_component(r, &ctx->comps[1], precision > MV_SUBPEL_NONE,
+                                 precision > MV_SUBPEL_LOW_PRECISION);
+
+  mv->row = ref->row + diff.row;
+  mv->col = ref->col + diff.col;
+}
+
+static REFERENCE_MODE read_block_reference_mode(AV1_COMMON *cm,
+                                                const MACROBLOCKD *xd,
+                                                aom_reader *r) {
+  if (!is_comp_ref_allowed(xd->mi[0]->sb_type)) return SINGLE_REFERENCE;
+  if (cm->current_frame.reference_mode == REFERENCE_MODE_SELECT) {
+    const int ctx = av1_get_reference_mode_context(xd);
+    const REFERENCE_MODE mode = (REFERENCE_MODE)aom_read_symbol(
+        r, xd->tile_ctx->comp_inter_cdf[ctx], 2, ACCT_STR);
+    return mode;  // SINGLE_REFERENCE or COMPOUND_REFERENCE
+  } else {
+    assert(cm->current_frame.reference_mode == SINGLE_REFERENCE);
+    return cm->current_frame.reference_mode;
+  }
+}
+
+#define READ_REF_BIT(pname) \
+  aom_read_symbol(r, av1_get_pred_cdf_##pname(xd), 2, ACCT_STR)
+
+static COMP_REFERENCE_TYPE read_comp_reference_type(const MACROBLOCKD *xd,
+                                                    aom_reader *r) {
+  const int ctx = av1_get_comp_reference_type_context(xd);
+  const COMP_REFERENCE_TYPE comp_ref_type =
+      (COMP_REFERENCE_TYPE)aom_read_symbol(
+          r, xd->tile_ctx->comp_ref_type_cdf[ctx], 2, ACCT_STR);
+  return comp_ref_type;  // UNIDIR_COMP_REFERENCE or BIDIR_COMP_REFERENCE
+}
+
+static void set_ref_frames_for_skip_mode(AV1_COMMON *const cm,
+                                         MV_REFERENCE_FRAME ref_frame[2]) {
+  ref_frame[0] = LAST_FRAME + cm->current_frame.skip_mode_info.ref_frame_idx_0;
+  ref_frame[1] = LAST_FRAME + cm->current_frame.skip_mode_info.ref_frame_idx_1;
+}
+
+// Read the referncence frame
+static void read_ref_frames(AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                            aom_reader *r, int segment_id,
+                            MV_REFERENCE_FRAME ref_frame[2]) {
+  if (xd->mi[0]->skip_mode) {
+    set_ref_frames_for_skip_mode(cm, ref_frame);
+    return;
+  }
+
+  if (segfeature_active(&cm->seg, segment_id, SEG_LVL_REF_FRAME)) {
+    ref_frame[0] = (MV_REFERENCE_FRAME)get_segdata(&cm->seg, segment_id,
+                                                   SEG_LVL_REF_FRAME);
+    ref_frame[1] = NONE_FRAME;
+  } else if (segfeature_active(&cm->seg, segment_id, SEG_LVL_SKIP) ||
+             segfeature_active(&cm->seg, segment_id, SEG_LVL_GLOBALMV)) {
+    ref_frame[0] = LAST_FRAME;
+    ref_frame[1] = NONE_FRAME;
+  } else {
+    const REFERENCE_MODE mode = read_block_reference_mode(cm, xd, r);
+
+    if (mode == COMPOUND_REFERENCE) {
+      const COMP_REFERENCE_TYPE comp_ref_type = read_comp_reference_type(xd, r);
+
+      if (comp_ref_type == UNIDIR_COMP_REFERENCE) {
+        const int bit = READ_REF_BIT(uni_comp_ref_p);
+        if (bit) {
+          ref_frame[0] = BWDREF_FRAME;
+          ref_frame[1] = ALTREF_FRAME;
+        } else {
+          const int bit1 = READ_REF_BIT(uni_comp_ref_p1);
+          if (bit1) {
+            const int bit2 = READ_REF_BIT(uni_comp_ref_p2);
+            if (bit2) {
+              ref_frame[0] = LAST_FRAME;
+              ref_frame[1] = GOLDEN_FRAME;
+            } else {
+              ref_frame[0] = LAST_FRAME;
+              ref_frame[1] = LAST3_FRAME;
+            }
+          } else {
+            ref_frame[0] = LAST_FRAME;
+            ref_frame[1] = LAST2_FRAME;
+          }
+        }
+
+        return;
+      }
+
+      assert(comp_ref_type == BIDIR_COMP_REFERENCE);
+
+      const int idx = 1;
+      const int bit = READ_REF_BIT(comp_ref_p);
+      // Decode forward references.
+      if (!bit) {
+        const int bit1 = READ_REF_BIT(comp_ref_p1);
+        ref_frame[!idx] = cm->comp_fwd_ref[bit1 ? 1 : 0];
+      } else {
+        const int bit2 = READ_REF_BIT(comp_ref_p2);
+        ref_frame[!idx] = cm->comp_fwd_ref[bit2 ? 3 : 2];
+      }
+
+      // Decode backward references.
+      const int bit_bwd = READ_REF_BIT(comp_bwdref_p);
+      if (!bit_bwd) {
+        const int bit1_bwd = READ_REF_BIT(comp_bwdref_p1);
+        ref_frame[idx] = cm->comp_bwd_ref[bit1_bwd];
+      } else {
+        ref_frame[idx] = cm->comp_bwd_ref[2];
+      }
+    } else if (mode == SINGLE_REFERENCE) {
+      const int bit0 = READ_REF_BIT(single_ref_p1);
+      if (bit0) {
+        const int bit1 = READ_REF_BIT(single_ref_p2);
+        if (!bit1) {
+          const int bit5 = READ_REF_BIT(single_ref_p6);
+          ref_frame[0] = bit5 ? ALTREF2_FRAME : BWDREF_FRAME;
+        } else {
+          ref_frame[0] = ALTREF_FRAME;
+        }
+      } else {
+        const int bit2 = READ_REF_BIT(single_ref_p3);
+        if (bit2) {
+          const int bit4 = READ_REF_BIT(single_ref_p5);
+          ref_frame[0] = bit4 ? GOLDEN_FRAME : LAST3_FRAME;
+        } else {
+          const int bit3 = READ_REF_BIT(single_ref_p4);
+          ref_frame[0] = bit3 ? LAST2_FRAME : LAST_FRAME;
+        }
+      }
+
+      ref_frame[1] = NONE_FRAME;
+    } else {
+      assert(0 && "Invalid prediction mode.");
+    }
+  }
+}
+
+static INLINE void read_mb_interp_filter(AV1_COMMON *const cm,
+                                         MACROBLOCKD *const xd,
+                                         MB_MODE_INFO *const mbmi,
+                                         aom_reader *r) {
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+
+  if (!av1_is_interp_needed(xd)) {
+    set_default_interp_filters(mbmi, cm->interp_filter);
+    return;
+  }
+
+  if (cm->interp_filter != SWITCHABLE) {
+    mbmi->interp_filters = av1_broadcast_interp_filter(cm->interp_filter);
+  } else {
+    InterpFilter ref0_filter[2] = { EIGHTTAP_REGULAR, EIGHTTAP_REGULAR };
+    for (int dir = 0; dir < 2; ++dir) {
+      const int ctx = av1_get_pred_context_switchable_interp(xd, dir);
+      ref0_filter[dir] = (InterpFilter)aom_read_symbol(
+          r, ec_ctx->switchable_interp_cdf[ctx], SWITCHABLE_FILTERS, ACCT_STR);
+      if (cm->seq_params.enable_dual_filter == 0) {
+        ref0_filter[1] = ref0_filter[0];
+        break;
+      }
+    }
+    // The index system works as: (0, 1) -> (vertical, horizontal) filter types
+    mbmi->interp_filters =
+        av1_make_interp_filters(ref0_filter[0], ref0_filter[1]);
+  }
+}
+
+static void read_intra_block_mode_info(AV1_COMMON *const cm, const int mi_row,
+                                       const int mi_col, MACROBLOCKD *const xd,
+                                       MB_MODE_INFO *const mbmi,
+                                       aom_reader *r) {
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  const int use_angle_delta = av1_use_angle_delta(bsize);
+
+  mbmi->ref_frame[0] = INTRA_FRAME;
+  mbmi->ref_frame[1] = NONE_FRAME;
+
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+
+  mbmi->mode = read_intra_mode(r, ec_ctx->y_mode_cdf[size_group_lookup[bsize]]);
+
+  mbmi->angle_delta[PLANE_TYPE_Y] =
+      use_angle_delta && av1_is_directional_mode(mbmi->mode)
+          ? read_angle_delta(r, ec_ctx->angle_delta_cdf[mbmi->mode - V_PRED])
+          : 0;
+  const int has_chroma =
+      is_chroma_reference(mi_row, mi_col, bsize, xd->plane[1].subsampling_x,
+                          xd->plane[1].subsampling_y);
+  xd->cfl.is_chroma_reference = has_chroma;
+  if (!cm->seq_params.monochrome && has_chroma) {
+    mbmi->uv_mode =
+        read_intra_mode_uv(ec_ctx, r, is_cfl_allowed(xd), mbmi->mode);
+    if (mbmi->uv_mode == UV_CFL_PRED) {
+      mbmi->cfl_alpha_idx =
+          read_cfl_alphas(xd->tile_ctx, r, &mbmi->cfl_alpha_signs);
+    }
+    mbmi->angle_delta[PLANE_TYPE_UV] =
+        use_angle_delta && av1_is_directional_mode(get_uv_mode(mbmi->uv_mode))
+            ? read_angle_delta(r,
+                               ec_ctx->angle_delta_cdf[mbmi->uv_mode - V_PRED])
+            : 0;
+  } else {
+    // Avoid decoding angle_info if there is is no chroma prediction
+    mbmi->uv_mode = UV_DC_PRED;
+  }
+  xd->cfl.store_y = store_cfl_required(cm, xd);
+
+  mbmi->palette_mode_info.palette_size[0] = 0;
+  mbmi->palette_mode_info.palette_size[1] = 0;
+  if (av1_allow_palette(cm->allow_screen_content_tools, bsize))
+    read_palette_mode_info(cm, xd, mi_row, mi_col, r);
+
+  read_filter_intra_mode_info(cm, xd, r);
+}
+
+static INLINE int is_mv_valid(const MV *mv) {
+  return mv->row > MV_LOW && mv->row < MV_UPP && mv->col > MV_LOW &&
+         mv->col < MV_UPP;
+}
+
+static INLINE int assign_mv(AV1_COMMON *cm, MACROBLOCKD *xd,
+                            PREDICTION_MODE mode,
+                            MV_REFERENCE_FRAME ref_frame[2], int_mv mv[2],
+                            int_mv ref_mv[2], int_mv nearest_mv[2],
+                            int_mv near_mv[2], int mi_row, int mi_col,
+                            int is_compound, int allow_hp, aom_reader *r) {
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+  MB_MODE_INFO *mbmi = xd->mi[0];
+  BLOCK_SIZE bsize = mbmi->sb_type;
+  if (cm->cur_frame_force_integer_mv) {
+    allow_hp = MV_SUBPEL_NONE;
+  }
+  switch (mode) {
+    case NEWMV: {
+      nmv_context *const nmvc = &ec_ctx->nmvc;
+      read_mv(r, &mv[0].as_mv, &ref_mv[0].as_mv, nmvc, allow_hp);
+      break;
+    }
+    case NEARESTMV: {
+      mv[0].as_int = nearest_mv[0].as_int;
+      break;
+    }
+    case NEARMV: {
+      mv[0].as_int = near_mv[0].as_int;
+      break;
+    }
+    case GLOBALMV: {
+      mv[0].as_int =
+          gm_get_motion_vector(&cm->global_motion[ref_frame[0]],
+                               cm->allow_high_precision_mv, bsize, mi_col,
+                               mi_row, cm->cur_frame_force_integer_mv)
+              .as_int;
+      break;
+    }
+    case NEW_NEWMV: {
+      assert(is_compound);
+      for (int i = 0; i < 2; ++i) {
+        nmv_context *const nmvc = &ec_ctx->nmvc;
+        read_mv(r, &mv[i].as_mv, &ref_mv[i].as_mv, nmvc, allow_hp);
+      }
+      break;
+    }
+    case NEAREST_NEARESTMV: {
+      assert(is_compound);
+      mv[0].as_int = nearest_mv[0].as_int;
+      mv[1].as_int = nearest_mv[1].as_int;
+      break;
+    }
+    case NEAR_NEARMV: {
+      assert(is_compound);
+      mv[0].as_int = near_mv[0].as_int;
+      mv[1].as_int = near_mv[1].as_int;
+      break;
+    }
+    case NEW_NEARESTMV: {
+      nmv_context *const nmvc = &ec_ctx->nmvc;
+      read_mv(r, &mv[0].as_mv, &ref_mv[0].as_mv, nmvc, allow_hp);
+      assert(is_compound);
+      mv[1].as_int = nearest_mv[1].as_int;
+      break;
+    }
+    case NEAREST_NEWMV: {
+      nmv_context *const nmvc = &ec_ctx->nmvc;
+      mv[0].as_int = nearest_mv[0].as_int;
+      read_mv(r, &mv[1].as_mv, &ref_mv[1].as_mv, nmvc, allow_hp);
+      assert(is_compound);
+      break;
+    }
+    case NEAR_NEWMV: {
+      nmv_context *const nmvc = &ec_ctx->nmvc;
+      mv[0].as_int = near_mv[0].as_int;
+      read_mv(r, &mv[1].as_mv, &ref_mv[1].as_mv, nmvc, allow_hp);
+      assert(is_compound);
+      break;
+    }
+    case NEW_NEARMV: {
+      nmv_context *const nmvc = &ec_ctx->nmvc;
+      read_mv(r, &mv[0].as_mv, &ref_mv[0].as_mv, nmvc, allow_hp);
+      assert(is_compound);
+      mv[1].as_int = near_mv[1].as_int;
+      break;
+    }
+    case GLOBAL_GLOBALMV: {
+      assert(is_compound);
+      mv[0].as_int =
+          gm_get_motion_vector(&cm->global_motion[ref_frame[0]],
+                               cm->allow_high_precision_mv, bsize, mi_col,
+                               mi_row, cm->cur_frame_force_integer_mv)
+              .as_int;
+      mv[1].as_int =
+          gm_get_motion_vector(&cm->global_motion[ref_frame[1]],
+                               cm->allow_high_precision_mv, bsize, mi_col,
+                               mi_row, cm->cur_frame_force_integer_mv)
+              .as_int;
+      break;
+    }
+    default: { return 0; }
+  }
+
+  int ret = is_mv_valid(&mv[0].as_mv);
+  if (is_compound) {
+    ret = ret && is_mv_valid(&mv[1].as_mv);
+  }
+  return ret;
+}
+
+static int read_is_inter_block(AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                               int segment_id, aom_reader *r) {
+  if (segfeature_active(&cm->seg, segment_id, SEG_LVL_REF_FRAME)) {
+    const int frame = get_segdata(&cm->seg, segment_id, SEG_LVL_REF_FRAME);
+    if (frame < LAST_FRAME) return 0;
+    return frame != INTRA_FRAME;
+  }
+  if (segfeature_active(&cm->seg, segment_id, SEG_LVL_GLOBALMV)) {
+    return 1;
+  }
+  const int ctx = av1_get_intra_inter_context(xd);
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+  const int is_inter =
+      aom_read_symbol(r, ec_ctx->intra_inter_cdf[ctx], 2, ACCT_STR);
+  return is_inter;
+}
+
+#if DEC_MISMATCH_DEBUG
+static void dec_dump_logs(AV1_COMMON *cm, MB_MODE_INFO *const mbmi, int mi_row,
+                          int mi_col, int16_t mode_ctx) {
+  int_mv mv[2] = { { 0 } };
+  for (int ref = 0; ref < 1 + has_second_ref(mbmi); ++ref)
+    mv[ref].as_mv = mbmi->mv[ref].as_mv;
+
+  const int16_t newmv_ctx = mode_ctx & NEWMV_CTX_MASK;
+  int16_t zeromv_ctx = -1;
+  int16_t refmv_ctx = -1;
+  if (mbmi->mode != NEWMV) {
+    zeromv_ctx = (mode_ctx >> GLOBALMV_OFFSET) & GLOBALMV_CTX_MASK;
+    if (mbmi->mode != GLOBALMV)
+      refmv_ctx = (mode_ctx >> REFMV_OFFSET) & REFMV_CTX_MASK;
+  }
+
+#define FRAME_TO_CHECK 11
+  if (cm->current_frame.frame_number == FRAME_TO_CHECK && cm->show_frame == 1) {
+    printf(
+        "=== DECODER ===: "
+        "Frame=%d, (mi_row,mi_col)=(%d,%d), skip_mode=%d, mode=%d, bsize=%d, "
+        "show_frame=%d, mv[0]=(%d,%d), mv[1]=(%d,%d), ref[0]=%d, "
+        "ref[1]=%d, motion_mode=%d, mode_ctx=%d, "
+        "newmv_ctx=%d, zeromv_ctx=%d, refmv_ctx=%d, tx_size=%d\n",
+        cm->current_frame.frame_number, mi_row, mi_col, mbmi->skip_mode,
+        mbmi->mode, mbmi->sb_type, cm->show_frame, mv[0].as_mv.row,
+        mv[0].as_mv.col, mv[1].as_mv.row, mv[1].as_mv.col, mbmi->ref_frame[0],
+        mbmi->ref_frame[1], mbmi->motion_mode, mode_ctx, newmv_ctx, zeromv_ctx,
+        refmv_ctx, mbmi->tx_size);
+  }
+}
+#endif  // DEC_MISMATCH_DEBUG
+
+static void read_inter_block_mode_info(AV1Decoder *const pbi,
+                                       MACROBLOCKD *const xd,
+                                       MB_MODE_INFO *const mbmi, int mi_row,
+                                       int mi_col, aom_reader *r) {
+  AV1_COMMON *const cm = &pbi->common;
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  const int allow_hp = cm->allow_high_precision_mv;
+  int_mv nearestmv[2], nearmv[2];
+  int_mv ref_mvs[MODE_CTX_REF_FRAMES][MAX_MV_REF_CANDIDATES] = { { { 0 } } };
+  int16_t inter_mode_ctx[MODE_CTX_REF_FRAMES];
+  int pts[SAMPLES_ARRAY_SIZE], pts_inref[SAMPLES_ARRAY_SIZE];
+  FRAME_CONTEXT *ec_ctx = xd->tile_ctx;
+
+  mbmi->uv_mode = UV_DC_PRED;
+  mbmi->palette_mode_info.palette_size[0] = 0;
+  mbmi->palette_mode_info.palette_size[1] = 0;
+
+  av1_collect_neighbors_ref_counts(xd);
+
+  read_ref_frames(cm, xd, r, mbmi->segment_id, mbmi->ref_frame);
+  const int is_compound = has_second_ref(mbmi);
+
+  MV_REFERENCE_FRAME ref_frame = av1_ref_frame_type(mbmi->ref_frame);
+  av1_find_mv_refs(cm, xd, mbmi, ref_frame, xd->ref_mv_count, xd->ref_mv_stack,
+                   ref_mvs, /*global_mvs=*/NULL, mi_row, mi_col,
+                   inter_mode_ctx);
+
+  int mode_ctx = av1_mode_context_analyzer(inter_mode_ctx, mbmi->ref_frame);
+  mbmi->ref_mv_idx = 0;
+
+  if (mbmi->skip_mode) {
+    assert(is_compound);
+    mbmi->mode = NEAREST_NEARESTMV;
+  } else {
+    if (segfeature_active(&cm->seg, mbmi->segment_id, SEG_LVL_SKIP) ||
+        segfeature_active(&cm->seg, mbmi->segment_id, SEG_LVL_GLOBALMV)) {
+      mbmi->mode = GLOBALMV;
+    } else {
+      if (is_compound)
+        mbmi->mode = read_inter_compound_mode(xd, r, mode_ctx);
+      else
+        mbmi->mode = read_inter_mode(ec_ctx, r, mode_ctx);
+      if (mbmi->mode == NEWMV || mbmi->mode == NEW_NEWMV ||
+          have_nearmv_in_inter_mode(mbmi->mode))
+        read_drl_idx(ec_ctx, xd, mbmi, r);
+    }
+  }
+
+  if (is_compound != is_inter_compound_mode(mbmi->mode)) {
+    aom_internal_error(xd->error_info, AOM_CODEC_CORRUPT_FRAME,
+                       "Prediction mode %d invalid with ref frame %d %d",
+                       mbmi->mode, mbmi->ref_frame[0], mbmi->ref_frame[1]);
+  }
+
+  if (!is_compound && mbmi->mode != GLOBALMV) {
+    av1_find_best_ref_mvs(allow_hp, ref_mvs[mbmi->ref_frame[0]], &nearestmv[0],
+                          &nearmv[0], cm->cur_frame_force_integer_mv);
+  }
+
+  if (is_compound && mbmi->mode != GLOBAL_GLOBALMV) {
+    int ref_mv_idx = mbmi->ref_mv_idx + 1;
+    nearestmv[0] = xd->ref_mv_stack[ref_frame][0].this_mv;
+    nearestmv[1] = xd->ref_mv_stack[ref_frame][0].comp_mv;
+    nearmv[0] = xd->ref_mv_stack[ref_frame][ref_mv_idx].this_mv;
+    nearmv[1] = xd->ref_mv_stack[ref_frame][ref_mv_idx].comp_mv;
+    lower_mv_precision(&nearestmv[0].as_mv, allow_hp,
+                       cm->cur_frame_force_integer_mv);
+    lower_mv_precision(&nearestmv[1].as_mv, allow_hp,
+                       cm->cur_frame_force_integer_mv);
+    lower_mv_precision(&nearmv[0].as_mv, allow_hp,
+                       cm->cur_frame_force_integer_mv);
+    lower_mv_precision(&nearmv[1].as_mv, allow_hp,
+                       cm->cur_frame_force_integer_mv);
+  } else if (mbmi->ref_mv_idx > 0 && mbmi->mode == NEARMV) {
+    int_mv cur_mv =
+        xd->ref_mv_stack[mbmi->ref_frame[0]][1 + mbmi->ref_mv_idx].this_mv;
+    nearmv[0] = cur_mv;
+  }
+
+  int_mv ref_mv[2];
+  ref_mv[0] = nearestmv[0];
+  ref_mv[1] = nearestmv[1];
+
+  if (is_compound) {
+    int ref_mv_idx = mbmi->ref_mv_idx;
+    // Special case: NEAR_NEWMV and NEW_NEARMV modes use
+    // 1 + mbmi->ref_mv_idx (like NEARMV) instead of
+    // mbmi->ref_mv_idx (like NEWMV)
+    if (mbmi->mode == NEAR_NEWMV || mbmi->mode == NEW_NEARMV)
+      ref_mv_idx = 1 + mbmi->ref_mv_idx;
+
+    // TODO(jingning, yunqing): Do we need a lower_mv_precision() call here?
+    if (compound_ref0_mode(mbmi->mode) == NEWMV)
+      ref_mv[0] = xd->ref_mv_stack[ref_frame][ref_mv_idx].this_mv;
+
+    if (compound_ref1_mode(mbmi->mode) == NEWMV)
+      ref_mv[1] = xd->ref_mv_stack[ref_frame][ref_mv_idx].comp_mv;
+  } else {
+    if (mbmi->mode == NEWMV) {
+      if (xd->ref_mv_count[ref_frame] > 1)
+        ref_mv[0] = xd->ref_mv_stack[ref_frame][mbmi->ref_mv_idx].this_mv;
+    }
+  }
+
+  if (mbmi->skip_mode) assert(mbmi->mode == NEAREST_NEARESTMV);
+
+  int mv_corrupted_flag =
+      !assign_mv(cm, xd, mbmi->mode, mbmi->ref_frame, mbmi->mv, ref_mv,
+                 nearestmv, nearmv, mi_row, mi_col, is_compound, allow_hp, r);
+  aom_merge_corrupted_flag(&xd->corrupted, mv_corrupted_flag);
+
+  mbmi->use_wedge_interintra = 0;
+  if (cm->seq_params.enable_interintra_compound && !mbmi->skip_mode &&
+      is_interintra_allowed(mbmi)) {
+    const int bsize_group = size_group_lookup[bsize];
+    const int interintra =
+        aom_read_symbol(r, ec_ctx->interintra_cdf[bsize_group], 2, ACCT_STR);
+    assert(mbmi->ref_frame[1] == NONE_FRAME);
+    if (interintra) {
+      const INTERINTRA_MODE interintra_mode =
+          read_interintra_mode(xd, r, bsize_group);
+      mbmi->ref_frame[1] = INTRA_FRAME;
+      mbmi->interintra_mode = interintra_mode;
+      mbmi->angle_delta[PLANE_TYPE_Y] = 0;
+      mbmi->angle_delta[PLANE_TYPE_UV] = 0;
+      mbmi->filter_intra_mode_info.use_filter_intra = 0;
+      if (is_interintra_wedge_used(bsize)) {
+        mbmi->use_wedge_interintra = aom_read_symbol(
+            r, ec_ctx->wedge_interintra_cdf[bsize], 2, ACCT_STR);
+        if (mbmi->use_wedge_interintra) {
+          mbmi->interintra_wedge_index =
+              aom_read_symbol(r, ec_ctx->wedge_idx_cdf[bsize], 16, ACCT_STR);
+          mbmi->interintra_wedge_sign = 0;
+        }
+      }
+    }
+  }
+
+  for (int ref = 0; ref < 1 + has_second_ref(mbmi); ++ref) {
+    const MV_REFERENCE_FRAME frame = mbmi->ref_frame[ref];
+    xd->block_ref_scale_factors[ref] = get_ref_scale_factors_const(cm, frame);
+  }
+
+  mbmi->motion_mode = SIMPLE_TRANSLATION;
+  if (is_motion_variation_allowed_bsize(mbmi->sb_type) && !mbmi->skip_mode &&
+      !has_second_ref(mbmi))
+    mbmi->num_proj_ref = findSamples(cm, xd, mi_row, mi_col, pts, pts_inref);
+  av1_count_overlappable_neighbors(cm, xd, mi_row, mi_col);
+
+  if (mbmi->ref_frame[1] != INTRA_FRAME)
+    mbmi->motion_mode = read_motion_mode(cm, xd, mbmi, r);
+
+  // init
+  mbmi->comp_group_idx = 0;
+  mbmi->compound_idx = 1;
+  mbmi->interinter_comp.type = COMPOUND_AVERAGE;
+
+  if (has_second_ref(mbmi) && !mbmi->skip_mode) {
+    // Read idx to indicate current compound inter prediction mode group
+    const int masked_compound_used = is_any_masked_compound_used(bsize) &&
+                                     cm->seq_params.enable_masked_compound;
+
+    if (masked_compound_used) {
+      const int ctx_comp_group_idx = get_comp_group_idx_context(xd);
+      mbmi->comp_group_idx = aom_read_symbol(
+          r, ec_ctx->comp_group_idx_cdf[ctx_comp_group_idx], 2, ACCT_STR);
+    }
+
+    if (mbmi->comp_group_idx == 0) {
+      if (cm->seq_params.order_hint_info.enable_dist_wtd_comp) {
+        const int comp_index_ctx = get_comp_index_context(cm, xd);
+        mbmi->compound_idx = aom_read_symbol(
+            r, ec_ctx->compound_index_cdf[comp_index_ctx], 2, ACCT_STR);
+        mbmi->interinter_comp.type =
+            mbmi->compound_idx ? COMPOUND_AVERAGE : COMPOUND_DISTWTD;
+      } else {
+        // Distance-weighted compound is disabled, so always use average
+        mbmi->compound_idx = 1;
+        mbmi->interinter_comp.type = COMPOUND_AVERAGE;
+      }
+    } else {
+      assert(cm->current_frame.reference_mode != SINGLE_REFERENCE &&
+             is_inter_compound_mode(mbmi->mode) &&
+             mbmi->motion_mode == SIMPLE_TRANSLATION);
+      assert(masked_compound_used);
+
+      // compound_diffwtd, wedge
+      if (is_interinter_compound_used(COMPOUND_WEDGE, bsize))
+        mbmi->interinter_comp.type =
+            COMPOUND_WEDGE + aom_read_symbol(r,
+                                             ec_ctx->compound_type_cdf[bsize],
+                                             MASKED_COMPOUND_TYPES, ACCT_STR);
+      else
+        mbmi->interinter_comp.type = COMPOUND_DIFFWTD;
+
+      if (mbmi->interinter_comp.type == COMPOUND_WEDGE) {
+        assert(is_interinter_compound_used(COMPOUND_WEDGE, bsize));
+        mbmi->interinter_comp.wedge_index =
+            aom_read_symbol(r, ec_ctx->wedge_idx_cdf[bsize], 16, ACCT_STR);
+        mbmi->interinter_comp.wedge_sign = aom_read_bit(r, ACCT_STR);
+      } else {
+        assert(mbmi->interinter_comp.type == COMPOUND_DIFFWTD);
+        mbmi->interinter_comp.mask_type =
+            aom_read_literal(r, MAX_DIFFWTD_MASK_BITS, ACCT_STR);
+      }
+    }
+  }
+
+  read_mb_interp_filter(cm, xd, mbmi, r);
+
+  if (mbmi->motion_mode == WARPED_CAUSAL) {
+    mbmi->wm_params.wmtype = DEFAULT_WMTYPE;
+    mbmi->wm_params.invalid = 0;
+
+    if (mbmi->num_proj_ref > 1)
+      mbmi->num_proj_ref = selectSamples(&mbmi->mv[0].as_mv, pts, pts_inref,
+                                         mbmi->num_proj_ref, bsize);
+
+    if (find_projection(mbmi->num_proj_ref, pts, pts_inref, bsize,
+                        mbmi->mv[0].as_mv.row, mbmi->mv[0].as_mv.col,
+                        &mbmi->wm_params, mi_row, mi_col)) {
+#if WARPED_MOTION_DEBUG
+      printf("Warning: unexpected warped model from aomenc\n");
+#endif
+      mbmi->wm_params.invalid = 1;
+    }
+  }
+
+  xd->cfl.is_chroma_reference =
+      is_chroma_reference(mi_row, mi_col, bsize, cm->seq_params.subsampling_x,
+                          cm->seq_params.subsampling_y);
+  xd->cfl.store_y = store_cfl_required(cm, xd);
+
+#if DEC_MISMATCH_DEBUG
+  dec_dump_logs(cm, mi, mi_row, mi_col, mode_ctx);
+#endif  // DEC_MISMATCH_DEBUG
+}
+
+static void read_inter_frame_mode_info(AV1Decoder *const pbi,
+                                       MACROBLOCKD *const xd, int mi_row,
+                                       int mi_col, aom_reader *r) {
+  AV1_COMMON *const cm = &pbi->common;
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  int inter_block = 1;
+
+  mbmi->mv[0].as_int = 0;
+  mbmi->mv[1].as_int = 0;
+  mbmi->segment_id = read_inter_segment_id(cm, xd, mi_row, mi_col, 1, r);
+
+  mbmi->skip_mode = read_skip_mode(cm, xd, mbmi->segment_id, r);
+
+  if (mbmi->skip_mode)
+    mbmi->skip = 1;
+  else
+    mbmi->skip = read_skip(cm, xd, mbmi->segment_id, r);
+
+  if (!cm->seg.segid_preskip)
+    mbmi->segment_id = read_inter_segment_id(cm, xd, mi_row, mi_col, 0, r);
+
+  read_cdef(cm, r, xd, mi_col, mi_row);
+
+  read_delta_q_params(cm, xd, mi_row, mi_col, r);
+
+  if (!mbmi->skip_mode)
+    inter_block = read_is_inter_block(cm, xd, mbmi->segment_id, r);
+
+  mbmi->current_qindex = xd->current_qindex;
+
+  xd->above_txfm_context = cm->above_txfm_context[xd->tile.tile_row] + mi_col;
+  xd->left_txfm_context =
+      xd->left_txfm_context_buffer + (mi_row & MAX_MIB_MASK);
+
+  if (inter_block)
+    read_inter_block_mode_info(pbi, xd, mbmi, mi_row, mi_col, r);
+  else
+    read_intra_block_mode_info(cm, mi_row, mi_col, xd, mbmi, r);
+}
+
+static void intra_copy_frame_mvs(AV1_COMMON *const cm, int mi_row, int mi_col,
+                                 int x_mis, int y_mis) {
+  const int frame_mvs_stride = ROUND_POWER_OF_TWO(cm->mi_cols, 1);
+  MV_REF *frame_mvs =
+      cm->cur_frame->mvs + (mi_row >> 1) * frame_mvs_stride + (mi_col >> 1);
+  x_mis = ROUND_POWER_OF_TWO(x_mis, 1);
+  y_mis = ROUND_POWER_OF_TWO(y_mis, 1);
+
+  for (int h = 0; h < y_mis; h++) {
+    MV_REF *mv = frame_mvs;
+    for (int w = 0; w < x_mis; w++) {
+      mv->ref_frame = NONE_FRAME;
+      mv++;
+    }
+    frame_mvs += frame_mvs_stride;
+  }
+}
+
+void av1_read_mode_info(AV1Decoder *const pbi, MACROBLOCKD *xd, int mi_row,
+                        int mi_col, aom_reader *r, int x_mis, int y_mis) {
+  AV1_COMMON *const cm = &pbi->common;
+  MB_MODE_INFO *const mi = xd->mi[0];
+  mi->use_intrabc = 0;
+
+  if (frame_is_intra_only(cm)) {
+    read_intra_frame_mode_info(cm, xd, mi_row, mi_col, r);
+    intra_copy_frame_mvs(cm, mi_row, mi_col, x_mis, y_mis);
+  } else {
+    read_inter_frame_mode_info(pbi, xd, mi_row, mi_col, r);
+    av1_copy_frame_mvs(cm, mi, mi_row, mi_col, x_mis, y_mis);
+  }
+}

diff --git a/libav1/av1/decoder/decodemv.h b/libav1/av1/decoder/decodemv.h
new file mode 100644
index 0000000..1625e5b
--- /dev/null
+++ b/libav1/av1/decoder/decodemv.h

@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_DECODER_DECODEMV_H_
+#define AOM_AV1_DECODER_DECODEMV_H_
+
+#include "aom_dsp/bitreader.h"
+
+#include "av1/decoder/decoder.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void av1_read_mode_info(AV1Decoder *const pbi, MACROBLOCKD *xd,
+
+                        int mi_row, int mi_col, aom_reader *r, int x_mis,
+                        int y_mis);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+void av1_read_tx_type(const AV1_COMMON *const cm, MACROBLOCKD *xd, int blk_row,
+                      int blk_col, TX_SIZE tx_size, aom_reader *r);
+
+#endif  // AOM_AV1_DECODER_DECODEMV_H_

diff --git a/libav1/av1/decoder/decoder.c b/libav1/av1/decoder/decoder.c
new file mode 100644
index 0000000..1575566
--- /dev/null
+++ b/libav1/av1/decoder/decoder.c

@@ -0,0 +1,422 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <limits.h>
+#include <stdio.h>
+
+#include "config/av1_rtcd.h"
+#include "config/aom_dsp_rtcd.h"
+#include "config/aom_scale_rtcd.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/system_state.h"
+#include "aom_ports/aom_once.h"
+#include "aom_ports/aom_timer.h"
+#include "aom_scale/aom_scale.h"
+#include "aom_util/aom_thread.h"
+
+#include "av1/common/alloccommon.h"
+#include "av1/common/av1_loopfilter.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/quant_common.h"
+#include "av1/common/reconinter.h"
+#include "av1/common/reconintra.h"
+
+#include "av1/decoder/decodeframe.h"
+#include "av1/decoder/decoder.h"
+#include "av1/decoder/detokenize.h"
+#include "av1/decoder/obu.h"
+
+static void initialize_dec(void) {
+  av1_rtcd();
+  aom_dsp_rtcd();
+  aom_scale_rtcd();
+  av1_init_intra_predictors();
+  av1_init_wedge_masks();
+}
+
+aom_codec_err_t av1_decoder_create(AV1Decoder * const pbi, BufferPool *const pool) {
+  AV1_COMMON *volatile const cm = &pbi->common;
+  // The jmp_buf is valid only for the duration of the function that calls
+  // setjmp(). Therefore, this function must reset the 'setjmp' field to 0
+  // before it returns.
+  if (setjmp(cm->error.jmp)) {
+    cm->error.setjmp = 0;
+    av1_decoder_remove(pbi);
+    return AOM_CODEC_ERROR;
+  }
+
+  cm->error.setjmp = 1;
+  cm->fc = &cm->fc_alloc[0];
+  cm->default_frame_context = &cm->fc_alloc[1];
+  memset(cm->fc, 0, sizeof(*cm->fc));
+  memset(cm->default_frame_context, 0, sizeof(*cm->default_frame_context));
+
+  pbi->need_resync = 1;
+  aom_once(initialize_dec);
+
+  // Initialize the references to not point to any frame buffers.
+  for (int i = 0; i < REF_FRAMES; i++) {
+    cm->ref_frame_map[i] = NULL;
+    cm->next_ref_frame_map[i] = NULL;
+  }
+
+  cm->current_frame.frame_number = 0;
+  pbi->decoding_first_frame = 1;
+  pbi->common.buffer_pool = pool;
+
+  cm->seq_params.bit_depth = AOM_BITS_8;
+
+  av1_loop_filter_init(cm);
+
+  av1_qm_init(cm);
+  av1_loop_restoration_precal();
+#if CONFIG_ACCOUNTING
+  pbi->acct_enabled = 1;
+  aom_accounting_init(&pbi->accounting);
+#endif
+  cm->error.setjmp = 0;
+  return AOM_CODEC_OK;
+}
+
+void av1_dealloc_dec_jobs(struct AV1DecTileMTData *tile_mt_info) {
+  if (tile_mt_info != NULL) {
+#if CONFIG_MULTITHREAD
+     for (int i = 0; i < tile_mt_info->alloc_tile_cols * tile_mt_info->alloc_tile_rows; ++i)
+        pthread_mutex_destroy(&tile_mt_info->job_mutex[i]);
+#endif
+    // clear the structure as the source of this call may be a resize in which
+    // case this call will be followed by an _alloc() which may fail.
+    av1_zero(*tile_mt_info);
+  }
+}
+
+void av1_dec_free_cb_buf(AV1Decoder *pbi) {
+  aom_free(pbi->cb_buffer_base);
+  pbi->cb_buffer_base = NULL;
+  pbi->cb_buffer_alloc_size = 0;
+}
+
+void av1_decoder_remove(AV1Decoder *pbi) {
+  int i;
+
+  if (!pbi) return;
+  for (i = 0; i < pbi->num_workers; ++i) {
+    AVxWorker *const worker = &pbi->tile_workers[i];
+    aom_get_worker_interface()->end(worker);
+  }
+
+  if (pbi->num_workers > 0) {
+    av1_dealloc_dec_jobs(&pbi->tile_mt_info);
+  }
+}
+
+void av1_visit_palette(AV1Decoder *const pbi, MACROBLOCKD *const xd, int mi_row,
+                       int mi_col, aom_reader *r, BLOCK_SIZE bsize,
+                       palette_visitor_fn_t visit) {
+  if (!is_inter_block(xd->mi[0])) {
+    for (int plane = 0; plane < AOMMIN(2, av1_num_planes(&pbi->common));
+         ++plane) {
+      const struct macroblockd_plane *const pd = &xd->plane[plane];
+      if (is_chroma_reference(mi_row, mi_col, bsize, pd->subsampling_x,
+                              pd->subsampling_y)) {
+        if (xd->mi[0]->palette_mode_info.palette_size[plane])
+          visit(pbi, xd, plane, r);
+      } else {
+        assert(xd->mi[0]->palette_mode_info.palette_size[plane] == 0);
+      }
+    }
+  }
+}
+
+static int equal_dimensions(const YV12_BUFFER_CONFIG *a,
+                            const YV12_BUFFER_CONFIG *b) {
+  return a->y_height == b->y_height && a->y_width == b->y_width &&
+         a->uv_height == b->uv_height && a->uv_width == b->uv_width;
+}
+
+aom_codec_err_t av1_copy_reference_dec(AV1Decoder *pbi, int idx,
+                                       YV12_BUFFER_CONFIG *sd) {
+  AV1_COMMON *cm = &pbi->common;
+  const int num_planes = av1_num_planes(cm);
+
+  const YV12_BUFFER_CONFIG *const cfg = get_ref_frame(cm, idx);
+  if (cfg == NULL) {
+    aom_internal_error(&cm->error, AOM_CODEC_ERROR, "No reference frame");
+    return AOM_CODEC_ERROR;
+  }
+  if (!equal_dimensions(cfg, sd))
+    aom_internal_error(&cm->error, AOM_CODEC_ERROR,
+                       "Incorrect buffer dimensions");
+  else
+    aom_yv12_copy_frame(cfg, sd, num_planes);
+
+  return cm->error.error_code;
+}
+
+static int equal_dimensions_and_border(const YV12_BUFFER_CONFIG *a,
+                                       const YV12_BUFFER_CONFIG *b) {
+  return a->y_height == b->y_height && a->y_width == b->y_width &&
+         a->uv_height == b->uv_height && a->uv_width == b->uv_width &&
+         a->y_stride == b->y_stride && a->uv_stride == b->uv_stride &&
+         a->border == b->border &&
+         (a->flags & YV12_FLAG_HIGHBITDEPTH) ==
+             (b->flags & YV12_FLAG_HIGHBITDEPTH);
+}
+
+aom_codec_err_t av1_set_reference_dec(AV1_COMMON *cm, int idx,
+                                      int use_external_ref,
+                                      YV12_BUFFER_CONFIG *sd) {
+  const int num_planes = av1_num_planes(cm);
+  YV12_BUFFER_CONFIG *ref_buf = NULL;
+
+  // Get the destination reference buffer.
+  ref_buf = get_ref_frame(cm, idx);
+
+  if (ref_buf == NULL) {
+    aom_internal_error(&cm->error, AOM_CODEC_ERROR, "No reference frame");
+    return AOM_CODEC_ERROR;
+  }
+
+  if (!use_external_ref) {
+    if (!equal_dimensions(ref_buf, sd)) {
+      aom_internal_error(&cm->error, AOM_CODEC_ERROR,
+                         "Incorrect buffer dimensions");
+    } else {
+      // Overwrite the reference frame buffer.
+      aom_yv12_copy_frame(sd, ref_buf, num_planes);
+    }
+  } else {
+    if (!equal_dimensions_and_border(ref_buf, sd)) {
+      aom_internal_error(&cm->error, AOM_CODEC_ERROR,
+                         "Incorrect buffer dimensions");
+    } else {
+      // Overwrite the reference frame buffer pointers.
+      // Once we no longer need the external reference buffer, these pointers
+      // are restored.
+      ref_buf->store_buf_adr[0] = ref_buf->y_buffer;
+      ref_buf->store_buf_adr[1] = ref_buf->u_buffer;
+      ref_buf->store_buf_adr[2] = ref_buf->v_buffer;
+      ref_buf->y_buffer = sd->y_buffer;
+      ref_buf->u_buffer = sd->u_buffer;
+      ref_buf->v_buffer = sd->v_buffer;
+      ref_buf->use_external_reference_buffers = 1;
+    }
+  }
+
+  return cm->error.error_code;
+}
+
+aom_codec_err_t av1_copy_new_frame_dec(AV1_COMMON *cm,
+                                       YV12_BUFFER_CONFIG *new_frame,
+                                       YV12_BUFFER_CONFIG *sd) {
+  const int num_planes = av1_num_planes(cm);
+
+  if (!equal_dimensions_and_border(new_frame, sd))
+    aom_internal_error(&cm->error, AOM_CODEC_ERROR,
+                       "Incorrect buffer dimensions");
+  else
+    aom_yv12_copy_frame(new_frame, sd, num_planes);
+
+  return cm->error.error_code;
+}
+
+static void release_frame_buffers(AV1Decoder *pbi) {
+  AV1_COMMON *const cm = &pbi->common;
+  BufferPool *const pool = cm->buffer_pool;
+
+  cm->cur_frame->buf.corrupted = 1;
+  lock_buffer_pool(pool);
+  // Release all the reference buffers in cm->next_ref_frame_map if the worker
+  // thread is holding them.
+  if (pbi->hold_ref_buf) {
+    for (int ref_index = 0; ref_index < REF_FRAMES; ++ref_index) {
+      decrease_ref_count(cm->next_ref_frame_map[ref_index], pool);
+      cm->next_ref_frame_map[ref_index] = NULL;
+    }
+    pbi->hold_ref_buf = 0;
+  }
+  // Release current frame.
+  decrease_ref_count(cm->cur_frame, pool);
+  unlock_buffer_pool(pool);
+  cm->cur_frame = NULL;
+}
+
+// If any buffer updating is signaled it should be done here.
+// Consumes a reference to cm->cur_frame.
+//
+// This functions returns void. It reports failure by setting
+// cm->error.error_code.
+static void swap_frame_buffers(AV1Decoder *pbi, int frame_decoded) {
+  int ref_index = 0, mask;
+  AV1_COMMON *const cm = &pbi->common;
+  BufferPool *const pool = cm->buffer_pool;
+
+  if (frame_decoded) {
+    lock_buffer_pool(pool);
+
+    // In ext-tile decoding, the camera frame header is only decoded once. So,
+    // we don't release the references here.
+    if (!pbi->camera_frame_header_ready) {
+      // If we are not holding reference buffers in cm->next_ref_frame_map,
+      // assert that the following two for loops are no-ops.
+      assert(IMPLIES(!pbi->hold_ref_buf,
+                     cm->current_frame.refresh_frame_flags == 0));
+      assert(IMPLIES(!pbi->hold_ref_buf,
+                     cm->show_existing_frame && !pbi->reset_decoder_state));
+
+      // The following two for loops need to release the reference stored in
+      // cm->ref_frame_map[ref_index] before transferring the reference stored
+      // in cm->next_ref_frame_map[ref_index] to cm->ref_frame_map[ref_index].
+      for (mask = cm->current_frame.refresh_frame_flags; mask; mask >>= 1) {
+        decrease_ref_count(cm->ref_frame_map[ref_index], pool);
+        cm->ref_frame_map[ref_index] = cm->next_ref_frame_map[ref_index];
+        cm->next_ref_frame_map[ref_index] = NULL;
+        ++ref_index;
+      }
+
+      const int check_on_show_existing_frame =
+          !cm->show_existing_frame || pbi->reset_decoder_state;
+      for (; ref_index < REF_FRAMES && check_on_show_existing_frame;
+           ++ref_index) {
+        decrease_ref_count(cm->ref_frame_map[ref_index], pool);
+        cm->ref_frame_map[ref_index] = cm->next_ref_frame_map[ref_index];
+        cm->next_ref_frame_map[ref_index] = NULL;
+      }
+    }
+
+    decrease_ref_count(cm->cur_frame, pool);
+
+    unlock_buffer_pool(pool);
+  } else {
+    // The code here assumes we are not holding reference buffers in
+    // cm->next_ref_frame_map. If this assertion fails, we are leaking the
+    // frame buffer references in cm->next_ref_frame_map.
+    assert(IMPLIES(!pbi->camera_frame_header_ready, !pbi->hold_ref_buf));
+    // Nothing was decoded, so just drop this frame buffer
+    lock_buffer_pool(pool);
+    decrease_ref_count(cm->cur_frame, pool);
+    unlock_buffer_pool(pool);
+  }
+  cm->cur_frame = NULL;
+
+  if (!pbi->camera_frame_header_ready) {
+    pbi->hold_ref_buf = 0;
+
+    // Invalidate these references until the next frame starts.
+    for (ref_index = 0; ref_index < INTER_REFS_PER_FRAME; ref_index++) {
+      cm->remapped_ref_idx[ref_index] = INVALID_IDX;
+    }
+  }
+}
+
+int av1_receive_compressed_data(AV1Decoder *pbi, size_t size,
+                                const uint8_t **psource) {
+  AV1_COMMON *volatile const cm = &pbi->common;
+  const uint8_t *source = *psource;
+  cm->error.error_code = AOM_CODEC_OK;
+  cm->error.has_detail = 0;
+
+  if (size == 0) {
+    // This is used to signal that we are missing frames.
+    // We do not know if the missing frame(s) was supposed to update
+    // any of the reference buffers, but we act conservative and
+    // mark only the last buffer as corrupted.
+    //
+    // TODO(jkoleszar): Error concealment is undefined and non-normative
+    // at this point, but if it becomes so, [0] may not always be the correct
+    // thing to do here.
+    RefCntBuffer *ref_buf = get_ref_frame_buf(cm, LAST_FRAME);
+    if (ref_buf != NULL) ref_buf->buf.corrupted = 1;
+  }
+
+  if (assign_cur_frame_new_fb(cm) == NULL) {
+    cm->error.error_code = AOM_CODEC_MEM_ERROR;
+    return 1;
+  }
+
+  if (!pbi->camera_frame_header_ready) pbi->hold_ref_buf = 0;
+
+  // The jmp_buf is valid only for the duration of the function that calls
+  // setjmp(). Therefore, this function must reset the 'setjmp' field to 0
+  // before it returns.
+  if (setjmp(cm->error.jmp)) {
+    const AVxWorkerInterface *const winterface = aom_get_worker_interface();
+    int i;
+
+    cm->error.setjmp = 0;
+
+    // Synchronize all threads immediately as a subsequent decode call may
+    // cause a resize invalidating some allocations.
+    for (i = 0; i < pbi->num_workers; ++i) {
+      winterface->sync(&pbi->tile_workers[i]);
+    }
+
+    release_frame_buffers(pbi);
+    aom_clear_system_state();
+    return -1;
+  }
+
+  cm->error.setjmp = 1;
+
+  int frame_decoded =
+      aom_decode_frame_from_obus(pbi, source, source + size, psource);
+
+  if (frame_decoded < 0) {
+    assert(cm->error.error_code != AOM_CODEC_OK);
+    release_frame_buffers(pbi);
+    cm->error.setjmp = 0;
+    return 1;
+  }
+
+#if TXCOEFF_TIMER
+  cm->cum_txcoeff_timer += cm->txcoeff_timer;
+  fprintf(stderr,
+          "txb coeff block number: %d, frame time: %ld, cum time %ld in us\n",
+          cm->txb_count, cm->txcoeff_timer, cm->cum_txcoeff_timer);
+  cm->txcoeff_timer = 0;
+  cm->txb_count = 0;
+#endif
+
+  // Note: At this point, this function holds a reference to cm->cur_frame
+  // in the buffer pool. This reference is consumed by swap_frame_buffers().
+  swap_frame_buffers(pbi, frame_decoded);
+
+  if (frame_decoded) {
+    pbi->decoding_first_frame = 0;
+  }
+
+  if (cm->error.error_code != AOM_CODEC_OK) {
+    cm->error.setjmp = 0;
+    return 1;
+  }
+
+  aom_clear_system_state();
+
+  if (!cm->show_existing_frame) {
+    if (cm->seg.enabled) {
+      if (cm->prev_frame && (cm->mi_rows == cm->prev_frame->mi_rows) &&
+          (cm->mi_cols == cm->prev_frame->mi_cols)) {
+        cm->last_frame_seg_map = cm->prev_frame->seg_map;
+      } else {
+        cm->last_frame_seg_map = NULL;
+      }
+    }
+  }
+
+  // Update progress in frame parallel decode.
+  cm->error.setjmp = 0;
+
+  return 0;
+}
+

diff --git a/libav1/av1/decoder/decoder.h b/libav1/av1/decoder/decoder.h
new file mode 100644
index 0000000..ae66101
--- /dev/null
+++ b/libav1/av1/decoder/decoder.h

@@ -0,0 +1,338 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_DECODER_DECODER_H_
+#define AOM_AV1_DECODER_DECODER_H_
+
+#include "config/aom_config.h"
+
+#include "aom/aom_codec.h"
+#include "aom_dsp/bitreader.h"
+#include "aom_scale/yv12config.h"
+#include "aom_util/aom_thread.h"
+
+#include "av1/common/thread_common.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/decoder/dthread.h"
+#if CONFIG_ACCOUNTING
+#include "av1/decoder/accounting.h"
+#endif
+#if CONFIG_INSPECTION
+#include "av1/decoder/inspection.h"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef void (*decode_block_visitor_fn_t)(const AV1_COMMON *const cm,
+                                          MACROBLOCKD *const xd,
+                                          aom_reader *const r, const int plane,
+                                          const int row, const int col,
+                                          const TX_SIZE tx_size);
+
+typedef void (*predict_inter_block_visitor_fn_t)(AV1_COMMON *const cm,
+                                                 MACROBLOCKD *const xd,
+                                                 int mi_row, int mi_col,
+                                                 BLOCK_SIZE bsize);
+
+typedef void (*cfl_store_inter_block_visitor_fn_t)(AV1_COMMON *const cm,
+                                                   MACROBLOCKD *const xd);
+
+typedef struct ThreadData {
+  DECLARE_ALIGNED(32, MACROBLOCKD, xd);
+  DECLARE_ALIGNED(32, FRAME_CONTEXT, tctx);
+  aom_reader thread_bit_reader;
+  CB_BUFFER cb_buffer_base;
+  aom_reader *bit_reader;
+  uint8_t *mc_buf[2];
+  int32_t mc_buf_size;
+  int mc_buf_use_highbd;  // Boolean: whether the byte pointers stored in
+                          // mc_buf were converted from highbd pointers.
+  av1_tile_data * tile_data;
+  av1_tile_data * tile_data2;
+  CONV_BUF_TYPE *tmp_conv_dst;
+  uint8_t *tmp_obmc_bufs[2];
+
+  MB_MODE_INFO *mi_pool;
+  MB_MODE_INFO *mi_pool2;
+  int32_t mi_count;
+  int32_t mi_count2;
+  int32_t mi_count_max;
+  int32_t ext_idct_buffer;
+  int32_t thread_id;
+  int32_t tile_id;
+  decode_block_visitor_fn_t read_coeffs_tx_intra_block_visit;
+  decode_block_visitor_fn_t predict_and_recon_intra_block_visit;
+  decode_block_visitor_fn_t read_coeffs_tx_inter_block_visit;
+  decode_block_visitor_fn_t inverse_tx_inter_block_visit;
+  predict_inter_block_visitor_fn_t predict_inter_block_visit;
+  cfl_store_inter_block_visitor_fn_t cfl_store_inter_block_visit;
+} ThreadData;
+
+typedef struct AV1DecRowMTJobInfo {
+  int tile_row;
+  int tile_col;
+  int mi_row;
+} AV1DecRowMTJobInfo;
+
+typedef struct AV1DecRowMTSyncData {
+#if CONFIG_MULTITHREAD
+  pthread_mutex_t *mutex_;
+  pthread_cond_t *cond_;
+#endif
+  int allocated_sb_rows;
+  int *cur_sb_col;
+  int sync_range;
+  int mi_rows;
+  int mi_cols;
+  int mi_rows_parse_done;
+  int mi_rows_decode_started;
+  int num_threads_working;
+} AV1DecRowMTSync;
+
+typedef struct AV1DecRowMTInfo {
+  int tile_rows_start;
+  int tile_rows_end;
+  int tile_cols_start;
+  int tile_cols_end;
+  int start_tile;
+  int end_tile;
+  int mi_rows_to_decode;
+
+  // Invariant:
+  //   mi_rows_parse_done >= mi_rows_decode_started.
+  // mi_rows_parse_done and mi_rows_decode_started are both initialized to 0.
+  // mi_rows_parse_done is incremented freely. mi_rows_decode_started may only
+  // be incremented to catch up with mi_rows_parse_done but is not allowed to
+  // surpass mi_rows_parse_done.
+  //
+  // When mi_rows_decode_started reaches mi_rows_to_decode, there are no more
+  // decode jobs.
+
+  // Indicates the progress of the bit-stream parsing of superblocks.
+  // Initialized to 0. Incremented by sb_mi_size when parse sb row is done.
+  int mi_rows_parse_done;
+  // Indicates the progress of the decoding of superblocks.
+  // Initialized to 0. Incremented by sb_mi_size when decode sb row is started.
+  int mi_rows_decode_started;
+  // Boolean: Initialized to 0 (false). Set to 1 (true) on error to abort
+  // decoding.
+  int row_mt_exit;
+} AV1DecRowMTInfo;
+
+typedef struct TileDataDec {
+  TileInfo tile_info;
+  int is_last;
+  int is_choosen_one;
+//  aom_reader bit_reader;
+//  DECLARE_ALIGNED(16, FRAME_CONTEXT, tctx);
+//  AV1DecRowMTSync dec_row_mt_sync;
+} TileDataDec;
+
+typedef struct TileBufferDec {
+  const uint8_t *data;
+  size_t size;
+} TileBufferDec;
+
+typedef struct DataBuffer {
+  const uint8_t *data;
+  size_t size;
+} DataBuffer;
+
+typedef struct EXTERNAL_REFERENCES {
+  YV12_BUFFER_CONFIG refs[MAX_EXTERNAL_REFERENCES];
+  int num;
+} EXTERNAL_REFERENCES;
+
+typedef struct TileJobsDec {
+  TileBufferDec *tile_buffer;
+  TileDataDec *tile_data;
+} TileJobsDec;
+
+typedef struct AV1DecTileMTData {
+#if CONFIG_MULTITHREAD
+  pthread_mutex_t job_mutex[MAX_TILE_COLS * MAX_TILE_ROWS];
+#endif
+  TileJobsDec job_queue[MAX_TILE_COLS * MAX_TILE_ROWS];
+  int jobs_enqueued;
+  int jobs_dequeued;
+  int alloc_tile_rows;
+  int alloc_tile_cols;
+} AV1DecTileMT;
+
+typedef struct {
+    int row_start;
+    int row_end;
+} MotionFieldWorkerData;
+
+typedef struct AV1Decoder {
+  DECLARE_ALIGNED(32, MACROBLOCKD, mb);
+
+  DECLARE_ALIGNED(32, AV1_COMMON, common);
+
+  struct Av1Core * gpu_decoder;
+
+  const uint8_t *data;
+  const uint8_t *data_end;
+  size_t data_size;
+  void *user_priv;
+
+  AV1LrStruct lr_ctxt;
+  AVxWorker tile_workers[8];
+  int num_workers;
+  DecWorkerData thread_data[8];
+  ThreadData td[8];
+  TileDataDec *tile_data;
+  MotionFieldWorkerData motion_field_data[8];
+  int allocated_tiles;
+  aom_reader bit_reader_end;
+  DECLARE_ALIGNED(16, FRAME_CONTEXT, ctx_the_choosen_one);
+
+  TileBufferDec tile_buffers[MAX_TILE_ROWS][MAX_TILE_COLS];
+  AV1DecTileMT tile_mt_info;
+
+  // In order to properly support random-access decoding, we need
+  // to behave slightly differently for the very first frame we decode.
+  // So we track whether this is the first frame or not.
+  int decoding_first_frame;
+
+  int allow_lowbitdepth;
+  int max_threads;
+  int inv_tile_order;
+  int need_resync;   // wait for key/intra-only frame.
+  int hold_ref_buf;  // Boolean: whether we are holding reference buffers in
+                     // common.next_ref_frame_map.
+  int reset_decoder_state;
+
+  int tile_size_bytes;
+  int tile_col_size_bytes;
+  int dec_tile_row, dec_tile_col;  // always -1 for non-VR tile encoding
+#if CONFIG_ACCOUNTING
+  int acct_enabled;
+  Accounting accounting;
+#endif
+  int tg_size;   // Number of tiles in the current tilegroup
+  int tg_start;  // First tile in the current tilegroup
+  int tg_size_bit_offset;
+  int sequence_header_ready;
+  int sequence_header_changed;
+#if CONFIG_INSPECTION
+  aom_inspect_cb inspect_cb;
+  void *inspect_ctx;
+#endif
+  int operating_point;
+  int current_operating_point;
+  int seen_frame_header;
+
+  // State if the camera frame header is already decoded while
+  // large_scale_tile = 1.
+  int camera_frame_header_ready;
+  size_t frame_header_size;
+  DataBuffer obu_size_hdr;
+  int output_frame_width_in_tiles_minus_1;
+  int output_frame_height_in_tiles_minus_1;
+  int tile_count_minus_1;
+  uint32_t coded_tile_data_size;
+  unsigned int ext_tile_debug;  // for ext-tile software debug & testing
+  unsigned int row_mt;
+  EXTERNAL_REFERENCES ext_refs;
+  YV12_BUFFER_CONFIG tile_list_outbuf;
+
+  CB_BUFFER *cb_buffer_base;
+  int cb_buffer_alloc_size;
+
+  int allocated_row_mt_sync_rows;
+
+#if CONFIG_MULTITHREAD
+  pthread_mutex_t *row_mt_mutex_;
+  pthread_cond_t *row_mt_cond_;
+#endif
+
+  AV1DecRowMTInfo frame_row_mt_info;
+  TileDataDec tile_data_alloc[MAX_TILE_ROWS*MAX_TILE_COLS];
+} AV1Decoder;
+
+// Returns 0 on success. Sets pbi->common.error.error_code to a nonzero error
+// code and returns a nonzero value on failure.
+int av1_receive_compressed_data(struct AV1Decoder *pbi, size_t size,
+                                const uint8_t **psource);
+
+// Get the frame at a particular index in the output queue
+int av1_get_raw_frame(AV1Decoder *pbi, size_t index, YV12_BUFFER_CONFIG **sd,
+                      aom_film_grain_t **grain_params);
+
+int av1_get_frame_to_show(struct AV1Decoder *pbi, YV12_BUFFER_CONFIG *frame);
+
+aom_codec_err_t av1_copy_reference_dec(struct AV1Decoder *pbi, int idx,
+                                       YV12_BUFFER_CONFIG *sd);
+
+aom_codec_err_t av1_set_reference_dec(AV1_COMMON *cm, int idx,
+                                      int use_external_ref,
+                                      YV12_BUFFER_CONFIG *sd);
+aom_codec_err_t av1_copy_new_frame_dec(AV1_COMMON *cm,
+                                       YV12_BUFFER_CONFIG *new_frame,
+                                       YV12_BUFFER_CONFIG *sd);
+
+aom_codec_err_t av1_decoder_create(struct AV1Decoder * const pbi, BufferPool *const pool);
+
+void av1_decoder_remove(struct AV1Decoder *pbi);
+void av1_dealloc_dec_jobs(struct AV1DecTileMTData *tile_mt_info);
+
+void av1_dec_row_mt_dealloc(AV1DecRowMTSync *dec_row_mt_sync);
+
+void av1_dec_free_cb_buf(AV1Decoder *pbi);
+
+static INLINE void decrease_ref_count(RefCntBuffer *const buf,
+                                      BufferPool *const pool) {
+  if (buf != NULL) {
+    --buf->ref_count;
+    // Reference counts should never become negative. If this assertion fails,
+    // there is a bug in our reference count management.
+    assert(buf->ref_count >= 0);
+    // A worker may only get a free framebuffer index when calling get_free_fb.
+    // But the raw frame buffer is not set up until we finish decoding header.
+    // So if any error happens during decoding header, frame_bufs[idx] will not
+    // have a valid raw frame buffer.
+    if (buf->ref_count == 0) {
+      pool->release_fb_cb_int(pool->cb_priv, &buf->buf);
+    }
+  }
+}
+
+#define ACCT_STR __func__
+static INLINE int av1_read_uniform(aom_reader *r, int n) {
+  const int l = get_unsigned_bits(n);
+  const int m = (1 << l) - n;
+  const int v = aom_read_literal(r, l - 1, ACCT_STR);
+  assert(l != 0);
+  if (v < m)
+    return v;
+  else
+    return (v << 1) - m + aom_read_literal(r, 1, ACCT_STR);
+}
+
+typedef void (*palette_visitor_fn_t)(AV1Decoder *const pbi, MACROBLOCKD *const xd, int plane,
+                                     aom_reader *r);
+
+void av1_visit_palette(AV1Decoder *const pbi, MACROBLOCKD *const xd, int mi_row,
+                       int mi_col, aom_reader *r, BLOCK_SIZE bsize,
+                       palette_visitor_fn_t visit);
+
+typedef void (*block_visitor_fn_t)(AV1Decoder *const pbi, ThreadData *const td,
+                                   int mi_row, int mi_col, aom_reader *r,
+                                   PARTITION_TYPE partition, BLOCK_SIZE bsize);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_DECODER_DECODER_H_

diff --git a/libav1/av1/decoder/decodetxb.c b/libav1/av1/decoder/decodetxb.c
new file mode 100644
index 0000000..c4451da
--- /dev/null
+++ b/libav1/av1/decoder/decodetxb.c

@@ -0,0 +1,418 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "av1/decoder/decodetxb.h"
+
+#include "aom_ports/mem.h"
+#include "av1/common/idct.h"
+#include "av1/common/scan.h"
+#include "av1/common/txb_common.h"
+#include "av1/decoder/decodemv.h"
+
+#define ACCT_STR __func__
+
+static int read_golomb(MACROBLOCKD *xd, aom_reader *r) {
+  int x = 1;
+  int length = 0;
+  int i = 0;
+
+  while (!i) {
+    i = aom_read_bit(r, ACCT_STR);
+    ++length;
+    if (length > 20) {
+      aom_internal_error(xd->error_info, AOM_CODEC_CORRUPT_FRAME,
+                         "Invalid length in read_golomb");
+      break;
+    }
+  }
+
+  for (i = 0; i < length - 1; ++i) {
+    x <<= 1;
+    x += aom_read_bit(r, ACCT_STR);
+  }
+
+  return x - 1;
+}
+
+static INLINE int rec_eob_pos(const int eob_token, const int extra) {
+  int eob = k_eob_group_start[eob_token];
+  if (eob > 2) {
+    eob += extra;
+  }
+  return eob;
+}
+
+static INLINE int get_dqv(const int16_t *dequant, int coeff_idx,
+                          const qm_val_t *iqmatrix) {
+  int dqv = dequant[!!coeff_idx];
+  if (iqmatrix != NULL)
+    dqv =
+        ((iqmatrix[coeff_idx] * dqv) + (1 << (AOM_QM_BITS - 1))) >> AOM_QM_BITS;
+  return dqv;
+}
+
+static INLINE void read_coeffs_reverse_2d(aom_reader *r, TX_SIZE tx_size,
+                                          int start_si, int end_si,
+                                          const int16_t *scan, int bwl,
+                                          uint8_t *levels,
+                                          base_cdf_arr base_cdf,
+                                          br_cdf_arr br_cdf) {
+  for (int c = end_si; c >= start_si; --c) {
+    const int pos = scan[c];
+    const int coeff_ctx = get_lower_levels_ctx_2d(levels, pos, bwl, tx_size);
+    const int nsymbs = 4;
+    int level = aom_read_symbol(r, base_cdf[coeff_ctx], nsymbs, ACCT_STR);
+    if (level > NUM_BASE_LEVELS) {
+      const int br_ctx = get_br_ctx_2d(levels, pos, bwl);
+      aom_cdf_prob *cdf = br_cdf[br_ctx];
+      for (int idx = 0; idx < COEFF_BASE_RANGE; idx += BR_CDF_SIZE - 1) {
+        const int k = aom_read_symbol(r, cdf, BR_CDF_SIZE, ACCT_STR);
+        level += k;
+        if (k < BR_CDF_SIZE - 1) break;
+      }
+    }
+    levels[get_padded_idx(pos, bwl)] = level;
+  }
+}
+
+static INLINE void read_coeffs_reverse(aom_reader *r, TX_SIZE tx_size,
+                                       TX_CLASS tx_class, int start_si,
+                                       int end_si, const int16_t *scan, int bwl,
+                                       uint8_t *levels, base_cdf_arr base_cdf,
+                                       br_cdf_arr br_cdf) {
+  for (int c = end_si; c >= start_si; --c) {
+    const int pos = scan[c];
+    const int coeff_ctx =
+        get_lower_levels_ctx(levels, pos, bwl, tx_size, tx_class);
+    const int nsymbs = 4;
+    int level = aom_read_symbol(r, base_cdf[coeff_ctx], nsymbs, ACCT_STR);
+    if (level > NUM_BASE_LEVELS) {
+      const int br_ctx = get_br_ctx(levels, pos, bwl, tx_class);
+      aom_cdf_prob *cdf = br_cdf[br_ctx];
+      for (int idx = 0; idx < COEFF_BASE_RANGE; idx += BR_CDF_SIZE - 1) {
+        const int k = aom_read_symbol(r, cdf, BR_CDF_SIZE, ACCT_STR);
+        level += k;
+        if (k < BR_CDF_SIZE - 1) break;
+      }
+    }
+    levels[get_padded_idx(pos, bwl)] = level;
+  }
+}
+
+
+#define USE_GPU_BLOCKS 1
+
+uint8_t av1_read_coeffs_txb(const AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                            aom_reader *const r, const int blk_row,
+                            const int blk_col, const int plane,
+                            const TXB_CTX *const txb_ctx,
+                            const TX_SIZE tx_size) {
+  FRAME_CONTEXT *const ec_ctx = xd->tile_ctx;
+  const int32_t max_value = (1 << (7 + xd->bd)) - 1;
+  const int32_t min_value = -(1 << (7 + xd->bd));
+  const TX_SIZE txs_ctx = get_txsize_entropy_ctx(tx_size);
+  const PLANE_TYPE plane_type = get_plane_type(plane);
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  struct macroblockd_plane *const pd = &xd->plane[plane];
+  const int16_t *const dequant = pd->seg_dequant_QTX[mbmi->segment_id];
+  av1_tile_data * tile_data = xd->tile_data;
+  tran_low_t *const tcoeffs_raw = tile_data->dq_buffer_base + tile_data->dq_buffer_ptr;
+  const int shift = av1_get_tx_scale(tx_size);
+  const int bwl = get_txb_bwl(tx_size);
+  const int width = get_txb_wide(tx_size);
+  const int height = get_txb_high(tx_size);
+  int cul_level = 0;
+  int dc_val = 0;
+  uint8_t levels_buf[TX_PAD_2D];
+  uint8_t *const levels = set_levels(levels_buf, width);
+  const int all_zero = aom_read_symbol(
+      r, ec_ctx->txb_skip_cdf[txs_ctx][txb_ctx->txb_skip_ctx], 2, ACCT_STR);
+  eob_info *eob_data = pd->eob_data + xd->txb_offset[plane];
+  uint16_t *const eob = &(eob_data->eob);
+  uint16_t *const max_scan_line = &(eob_data->max_scan_line);
+  *max_scan_line = 0;
+  *eob = 0;
+
+#if CONFIG_INSPECTION
+  if (plane == 0) {
+    const int txk_type_idx =
+        av1_get_txk_type_index(mbmi->sb_type, blk_row, blk_col);
+    mbmi->tx_skip[txk_type_idx] = all_zero;
+  }
+#endif
+
+  if (all_zero) {
+    *max_scan_line = 0;
+    if (plane == 0) {
+      const int txk_type_idx =
+          av1_get_txk_type_index(mbmi->sb_type, blk_row, blk_col);
+      xd->txk_type[txk_type_idx] = DCT_DCT;
+    }
+    return 0;
+  }
+
+  if (plane == AOM_PLANE_Y) {
+    // only y plane's tx_type is transmitted
+    av1_read_tx_type(cm, xd, blk_row, blk_col, tx_size, r);
+  }
+  const TX_TYPE tx_type = av1_get_tx_type(plane_type, xd, blk_row, blk_col,
+                                          tx_size, cm->reduced_tx_set_used);
+  const TX_CLASS tx_class = tx_type_to_class[tx_type];
+  const TX_SIZE qm_tx_size = av1_get_adjusted_tx_size(tx_size);
+  const qm_val_t *iqmatrix =
+      IS_2D_TRANSFORM(tx_type)
+          ? pd->seg_iqmatrix[mbmi->segment_id][qm_tx_size]
+          : cm->giqmatrix[NUM_QM_LEVELS - 1][0][qm_tx_size];
+  const SCAN_ORDER *const scan_order = get_scan(tx_size, tx_type);
+  const int16_t *const scan = scan_order->scan;
+  int eob_extra = 0;
+  int eob_pt = 1;
+
+  const int eob_multi_size = txsize_log2_minus4[tx_size];
+  const int eob_multi_ctx = (tx_class == TX_CLASS_2D) ? 0 : 1;
+  switch (eob_multi_size) {
+    case 0:
+      eob_pt =
+          aom_read_symbol(r, ec_ctx->eob_flag_cdf16[plane_type][eob_multi_ctx],
+                          5, ACCT_STR) +
+          1;
+      break;
+    case 1:
+      eob_pt =
+          aom_read_symbol(r, ec_ctx->eob_flag_cdf32[plane_type][eob_multi_ctx],
+                          6, ACCT_STR) +
+          1;
+      break;
+    case 2:
+      eob_pt =
+          aom_read_symbol(r, ec_ctx->eob_flag_cdf64[plane_type][eob_multi_ctx],
+                          7, ACCT_STR) +
+          1;
+      break;
+    case 3:
+      eob_pt =
+          aom_read_symbol(r, ec_ctx->eob_flag_cdf128[plane_type][eob_multi_ctx],
+                          8, ACCT_STR) +
+          1;
+      break;
+    case 4:
+      eob_pt =
+          aom_read_symbol(r, ec_ctx->eob_flag_cdf256[plane_type][eob_multi_ctx],
+                          9, ACCT_STR) +
+          1;
+      break;
+    case 5:
+      eob_pt =
+          aom_read_symbol(r, ec_ctx->eob_flag_cdf512[plane_type][eob_multi_ctx],
+                          10, ACCT_STR) +
+          1;
+      break;
+    case 6:
+    default:
+      eob_pt = aom_read_symbol(
+                   r, ec_ctx->eob_flag_cdf1024[plane_type][eob_multi_ctx], 11,
+                   ACCT_STR) +
+               1;
+      break;
+  }
+
+  const int eob_offset_bits = k_eob_offset_bits[eob_pt];
+  if (eob_offset_bits > 0) {
+    const int eob_ctx = eob_pt - 3;
+    int bit = aom_read_symbol(
+        r, ec_ctx->eob_extra_cdf[txs_ctx][plane_type][eob_ctx], 2, ACCT_STR);
+    if (bit) {
+      eob_extra += (1 << (eob_offset_bits - 1));
+    }
+
+    for (int i = 1; i < eob_offset_bits; i++) {
+      bit = aom_read_bit(r, ACCT_STR);
+      if (bit) {
+        eob_extra += (1 << (eob_offset_bits - 1 - i));
+      }
+    }
+  }
+  *eob = rec_eob_pos(eob_pt, eob_extra);
+
+  if (*eob > 1) {
+    memset(levels_buf, 0,
+           sizeof(*levels_buf) *
+               ((width + TX_PAD_HOR) * (height + TX_PAD_VER) + TX_PAD_END));
+  }
+
+  {
+    // Read the non-zero coefficient with scan index eob-1
+    // TODO(angiebird): Put this into a function
+    const int c = *eob - 1;
+    const int pos = scan[c];
+    const int coeff_ctx = get_lower_levels_ctx_eob(bwl, height, c);
+    const int nsymbs = 3;
+    aom_cdf_prob *cdf =
+        ec_ctx->coeff_base_eob_cdf[txs_ctx][plane_type][coeff_ctx];
+    int level = aom_read_symbol(r, cdf, nsymbs, ACCT_STR) + 1;
+    if (level > NUM_BASE_LEVELS) {
+      const int br_ctx = get_br_ctx_eob(pos, bwl, tx_class);
+      cdf = ec_ctx->coeff_br_cdf[AOMMIN(txs_ctx, TX_32X32)][plane_type][br_ctx];
+      for (int idx = 0; idx < COEFF_BASE_RANGE; idx += BR_CDF_SIZE - 1) {
+        const int k = aom_read_symbol(r, cdf, BR_CDF_SIZE, ACCT_STR);
+        level += k;
+        if (k < BR_CDF_SIZE - 1) break;
+      }
+    }
+    levels[get_padded_idx(pos, bwl)] = level;
+  }
+  if (*eob > 1) {
+    base_cdf_arr base_cdf = ec_ctx->coeff_base_cdf[txs_ctx][plane_type];
+    br_cdf_arr br_cdf =
+        ec_ctx->coeff_br_cdf[AOMMIN(txs_ctx, TX_32X32)][plane_type];
+    if (tx_class == TX_CLASS_2D) {
+      read_coeffs_reverse_2d(r, tx_size, 1, *eob - 1 - 1, scan, bwl, levels,
+                             base_cdf, br_cdf);
+      read_coeffs_reverse(r, tx_size, tx_class, 0, 0, scan, bwl, levels,
+                          base_cdf, br_cdf);
+    } else {
+      read_coeffs_reverse(r, tx_size, tx_class, 0, *eob - 1 - 1, scan, bwl,
+                          levels, base_cdf, br_cdf);
+    }
+  }
+
+  for (int c = 0; c < *eob; ++c) {
+    const int pos = scan[c];
+    uint8_t sign;
+    tran_low_t level = levels[get_padded_idx(pos, bwl)];
+    if (level) {
+      *max_scan_line = AOMMAX(*max_scan_line, pos);
+      if (c == 0) {
+        const int dc_sign_ctx = txb_ctx->dc_sign_ctx;
+        sign = aom_read_symbol(r, ec_ctx->dc_sign_cdf[plane_type][dc_sign_ctx],
+                               2, ACCT_STR);
+      } else {
+        sign = aom_read_bit(r, ACCT_STR);
+      }
+      if (level >= MAX_BASE_BR_RANGE) {
+        level += read_golomb(xd, r);
+      }
+
+      if (c == 0) dc_val = sign ? -level : level;
+
+      // Bitmasking to clamp level to valid range:
+      //   The valid range for 8/10/12 bit vdieo is at most 14/16/18 bit
+      level &= 0xfffff;
+      cul_level += level;
+      tran_low_t dq_coeff;
+      // Bitmasking to clamp dq_coeff to valid range:
+      //   The valid range for 8/10/12 bit video is at most 17/19/21 bit
+      dq_coeff = (tran_low_t)(
+          (int64_t)level * get_dqv(dequant, scan[c], iqmatrix) & 0xffffff);
+      dq_coeff = dq_coeff >> shift;
+      if (sign) {
+        dq_coeff = -dq_coeff;
+      }
+      tcoeffs_raw[c] = clamp(dq_coeff, min_value, max_value);
+    }
+    else
+        tcoeffs_raw[c] = 0;
+
+  }
+
+  cul_level = AOMMIN(COEFF_CONTEXT_MASK, cul_level);
+
+  // DC value
+  set_dc_sign(&cul_level, dc_val);
+  {
+      const int tx_types_flags[] = {
+          0,        //    DCT_DCT,          
+          0x080000,    //    ADST_DCT,         
+          0x020000,    //    DCT_ADST,         
+          0x0A0000,    //    ADST_ADST,        
+          0x090000,    //    FLIPADST_DCT,     
+          0x028000,    //    DCT_FLIPADST,     
+          0x0B8000,    //    FLIPADST_FLIPADST,
+          0x0A8000,    //    ADST_FLIPADST,    
+          0x0B0000,    //    FLIPADST_ADST,    
+          0x140000,    //    IDTX,             
+          0x040000,    //    V_DCT,            
+          0x100000,    //    H_DCT,            
+          0x0C0000,    //    V_ADST,           
+          0x120000,    //    H_ADST,           
+          0x0D0000,    //    V_FLIPADST,       
+          0x128000    //    H_FLIPADST,       
+      };
+
+      int mi_col = mbmi->mi_col;
+      int mi_row = mbmi->mi_row;
+      if (pd->subsampling_y && (mi_row & 0x01) && (mi_size_high[mbmi->sb_type] == 1))
+          mi_row -= 1;
+      if (pd->subsampling_x && (mi_col & 0x01) && (mi_size_wide[mbmi->sb_type] == 1))
+          mi_col -= 1;
+      const uint32_t x = ((mi_col * MI_SIZE) >> (pd->subsampling_x + 2)) + blk_col;
+      const uint32_t y = ((mi_row * MI_SIZE) >> (pd->subsampling_y + 2)) + blk_row;
+      tx_block_info_gpu * block = tile_data->idct_blocks_host + tile_data->idct_blocks_ptr;
+      const uint32_t coef_count = ((*eob) + 3) >> 2;
+      const int av1_idct_scans_lut[TX_TYPES] = {
+         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1, 2, 1, 2
+      };
+
+      for (uint32_t c = *eob; c < (coef_count << 2); ++c)
+          tcoeffs_raw[c] = 0;
+      const int type_index = (xd->lossless[xd->mi[0]->segment_id] && tx_size == 0)? TX_SIZES_ALL : tx_size;
+      block->flags = coef_count | tx_types_flags[tx_type] | (plane << 21) | (av1_idct_scans_lut[tx_type] << 11);
+      block->input_offset = tile_data->dq_buffer_ptr;
+      block->output_pos = x | (y << 16);
+      block->sorting_idx = (tile_data->idct_blocks_sizes[type_index] << 8) + type_index;
+      ++tile_data->idct_blocks_sizes[type_index];
+      ++tile_data->idct_blocks_ptr;
+      tile_data->dq_buffer_ptr += coef_count << 2;
+  }
+  return cul_level;
+}
+
+void av1_read_coeffs_txb_facade(const AV1_COMMON *const cm,
+                                MACROBLOCKD *const xd, aom_reader *const r,
+                                const int plane, const int row, const int col,
+                                const TX_SIZE tx_size) {
+#if TXCOEFF_TIMER
+  struct aom_usec_timer timer;
+  aom_usec_timer_start(&timer);
+#endif
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  struct macroblockd_plane *const pd = &xd->plane[plane];
+
+  const BLOCK_SIZE bsize = mbmi->sb_type;
+  const BLOCK_SIZE plane_bsize =
+      get_plane_block_size(bsize, pd->subsampling_x, pd->subsampling_y);
+
+  TXB_CTX txb_ctx;
+  get_txb_ctx(plane_bsize, tx_size, plane, pd->above_context + col,
+              pd->left_context + row, &txb_ctx);
+  const uint8_t cul_level =
+      av1_read_coeffs_txb(cm, xd, r, row, col, plane, &txb_ctx, tx_size);
+
+  av1_set_contexts(xd, pd, plane, plane_bsize, tx_size, cul_level, col, row);
+
+  if (is_inter_block(mbmi)) {
+    PLANE_TYPE plane_type = get_plane_type(plane);
+    // tx_type will be read out in av1_read_coeffs_txb_facade
+    const TX_TYPE tx_type = av1_get_tx_type(plane_type, xd, row, col, tx_size,
+                                            cm->reduced_tx_set_used);
+
+    if (plane == 0)
+      update_txk_array(xd->txk_type, mbmi->sb_type, row, col, tx_size,
+                       tx_type);
+  }
+
+#if TXCOEFF_TIMER
+  aom_usec_timer_mark(&timer);
+  const int64_t elapsed_time = aom_usec_timer_elapsed(&timer);
+  cm->txcoeff_timer += elapsed_time;
+  ++cm->txb_count;
+#endif
+}

diff --git a/libav1/av1/decoder/decodetxb.h b/libav1/av1/decoder/decodetxb.h
new file mode 100644
index 0000000..fe04f6a
--- /dev/null
+++ b/libav1/av1/decoder/decodetxb.h

@@ -0,0 +1,32 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_DECODER_DECODETXB_H_
+#define AOM_AV1_DECODER_DECODETXB_H_
+
+#include "config/aom_config.h"
+
+#include "av1/common/blockd.h"
+#include "av1/common/onyxc_int.h"
+#include "av1/common/txb_common.h"
+#include "aom_dsp/bitreader.h"
+
+uint8_t av1_read_coeffs_txb(const AV1_COMMON *const cm, MACROBLOCKD *const xd,
+                            aom_reader *const r, const int blk_row,
+                            const int blk_col, const int plane,
+                            const TXB_CTX *const txb_ctx,
+                            const TX_SIZE tx_size);
+
+void av1_read_coeffs_txb_facade(const AV1_COMMON *const cm,
+                                MACROBLOCKD *const xd, aom_reader *const r,
+                                const int plane, const int row, const int col,
+                                const TX_SIZE tx_size);
+#endif  // AOM_AV1_DECODER_DECODETXB_H_

diff --git a/libav1/av1/decoder/detokenize.c b/libav1/av1/decoder/detokenize.c
new file mode 100644
index 0000000..11ae19e
--- /dev/null
+++ b/libav1/av1/decoder/detokenize.c

@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "config/aom_config.h"
+
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/mem.h"
+#include "av1/common/blockd.h"
+#include "av1/decoder/detokenize.h"
+
+#define ACCT_STR __func__
+
+#include "av1/common/common.h"
+#include "av1/common/entropy.h"
+#include "av1/common/idct.h"
+#include "dx/av1_core.h"
+
+static void decode_color_map_tokens(Av1ColorMapParam *param, aom_reader *r) {
+  uint8_t color_order[PALETTE_MAX_SIZE];
+  const int n = param->n_colors;
+  uint8_t *const color_map = param->color_map;
+  MapCdf color_map_cdf = param->map_cdf;
+  int plane_block_width = param->plane_width;
+  int plane_block_height = param->plane_height;
+  int rows = param->rows;
+  int cols = param->cols;
+
+  // The first color index.
+  color_map[0] = av1_read_uniform(r, n);
+  assert(color_map[0] < n);
+
+  // Run wavefront on the palette map index decoding.
+  for (int i = 1; i < rows + cols - 1; ++i) {
+    for (int j = AOMMIN(i, cols - 1); j >= AOMMAX(0, i - rows + 1); --j) {
+      const int color_ctx = av1_get_palette_color_index_context(
+          color_map, plane_block_width, (i - j), j, n, color_order, NULL);
+      const int color_idx = aom_read_symbol(
+          r, color_map_cdf[n - PALETTE_MIN_SIZE][color_ctx], n, ACCT_STR);
+      assert(color_idx >= 0 && color_idx < n);
+      color_map[(i - j) * plane_block_width + j] = color_order[color_idx];
+    }
+  }
+  // Copy last column to extra columns.
+  if (cols < plane_block_width) {
+    for (int i = 0; i < rows; ++i) {
+      memset(color_map + i * plane_block_width + cols,
+             color_map[i * plane_block_width + cols - 1],
+             (plane_block_width - cols));
+    }
+  }
+  // Copy last row to extra rows.
+  for (int i = rows; i < plane_block_height; ++i) {
+    memcpy(color_map + i * plane_block_width,
+           color_map + (rows - 1) * plane_block_width, plane_block_width);
+  }
+}
+
+void av1_decode_palette_tokens(AV1Decoder *const pbi, MACROBLOCKD *const xd, int plane,
+                               aom_reader *r) {
+  assert(plane == 0 || plane == 1);
+  Av1ColorMapParam params;
+  params.color_map =
+      xd->plane[plane].color_index_map + xd->color_index_map_offset[plane];
+  params.map_cdf = plane ? xd->tile_ctx->palette_uv_color_index_cdf
+                         : xd->tile_ctx->palette_y_color_index_cdf;
+  MB_MODE_INFO *const mbmi = xd->mi[0];
+  params.n_colors = mbmi->palette_mode_info.palette_size[plane];
+  av1_get_block_dimensions(mbmi->sb_type, plane, xd, &params.plane_width,
+                           &params.plane_height, &params.rows, &params.cols);
+  decode_color_map_tokens(&params, r);
+  av1_intra_palette(pbi, mbmi, &params, plane);
+
+}

diff --git a/libav1/av1/decoder/detokenize.h b/libav1/av1/decoder/detokenize.h
new file mode 100644
index 0000000..05c5649
--- /dev/null
+++ b/libav1/av1/decoder/detokenize.h

@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_DECODER_DETOKENIZE_H_
+#define AOM_AV1_DECODER_DETOKENIZE_H_
+
+#include "config/aom_config.h"
+
+#include "av1/common/scan.h"
+#include "av1/decoder/decoder.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void av1_decode_palette_tokens(AV1Decoder *const pbi, MACROBLOCKD *const xd, int plane, aom_reader *r);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+#endif  // AOM_AV1_DECODER_DETOKENIZE_H_

diff --git a/libav1/av1/decoder/dthread.h b/libav1/av1/decoder/dthread.h
new file mode 100644
index 0000000..ef778eb
--- /dev/null
+++ b/libav1/av1/decoder/dthread.h

@@ -0,0 +1,39 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_DECODER_DTHREAD_H_
+#define AOM_AV1_DECODER_DTHREAD_H_
+
+#include "config/aom_config.h"
+
+#include "aom_util/aom_thread.h"
+#include "aom/internal/aom_codec_internal.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct AV1Common;
+struct AV1Decoder;
+struct ThreadData;
+
+typedef struct DecWorkerData {
+  struct ThreadData *td;
+  const uint8_t *data_end;
+  struct aom_internal_error_info error_info;
+} DecWorkerData;
+
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_DECODER_DTHREAD_H_

diff --git a/libav1/av1/decoder/inspection.h b/libav1/av1/decoder/inspection.h
new file mode 100644
index 0000000..ddea4a1
--- /dev/null
+++ b/libav1/av1/decoder/inspection.h

@@ -0,0 +1,88 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_AV1_DECODER_INSPECTION_H_
+#define AOM_AV1_DECODER_INSPECTION_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif  // __cplusplus
+
+#include "av1/common/seg_common.h"
+#if CONFIG_ACCOUNTING
+#include "av1/decoder/accounting.h"
+#endif
+
+#ifndef AOM_AOM_AOMDX_H_
+typedef void (*aom_inspect_cb)(void *decoder, void *data);
+#endif
+
+typedef struct insp_mv insp_mv;
+
+struct insp_mv {
+  int16_t row;
+  int16_t col;
+};
+
+typedef struct insp_mi_data insp_mi_data;
+
+struct insp_mi_data {
+  insp_mv mv[2];
+  int16_t ref_frame[2];
+  int16_t mode;
+  int16_t uv_mode;
+  int16_t sb_type;
+  int16_t skip;
+  int16_t segment_id;
+  int16_t dual_filter_type;
+  int16_t filter[2];
+  int16_t tx_type;
+  int16_t tx_size;
+  int16_t cdef_level;
+  int16_t cdef_strength;
+  int16_t cfl_alpha_idx;
+  int16_t cfl_alpha_sign;
+  int16_t current_qindex;
+  int16_t compound_type;
+  int16_t motion_mode;
+};
+
+typedef struct insp_frame_data insp_frame_data;
+
+struct insp_frame_data {
+#if CONFIG_ACCOUNTING
+  Accounting *accounting;
+#endif
+  insp_mi_data *mi_grid;
+  int16_t frame_number;
+  int show_frame;
+  int frame_type;
+  int base_qindex;
+  int mi_rows;
+  int mi_cols;
+  int tile_mi_rows;
+  int tile_mi_cols;
+  int16_t y_dequant[MAX_SEGMENTS][2];
+  int16_t u_dequant[MAX_SEGMENTS][2];
+  int16_t v_dequant[MAX_SEGMENTS][2];
+  // TODO(negge): add per frame CDEF data
+  int delta_q_present_flag;
+  int delta_q_res;
+  int show_existing_frame;
+};
+
+void ifd_init(insp_frame_data *fd, int frame_width, int frame_height);
+void ifd_clear(insp_frame_data *fd);
+int ifd_inspect(insp_frame_data *fd, void *decoder, int skip_not_transform);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif  // __cplusplus
+#endif  // AOM_AV1_DECODER_INSPECTION_H_

diff --git a/libav1/av1/decoder/obu.c b/libav1/av1/decoder/obu.c
new file mode 100644
index 0000000..58f7d2c
--- /dev/null
+++ b/libav1/av1/decoder/obu.c

@@ -0,0 +1,887 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+
+#include "config/aom_config.h"
+#include "config/aom_scale_rtcd.h"
+
+#include "aom/aom_codec.h"
+#include "aom_dsp/bitreader_buffer.h"
+#include "aom_ports/mem_ops.h"
+
+#include "av1/common/common.h"
+#include "av1/common/obu_util.h"
+#include "av1/common/timing.h"
+#include "av1/decoder/decoder.h"
+#include "av1/decoder/decodeframe.h"
+#include "av1/decoder/obu.h"
+
+// Picture prediction structures (0-12 are predefined) in scalability metadata.
+enum {
+  SCALABILITY_L1T2 = 0,
+  SCALABILITY_L1T3 = 1,
+  SCALABILITY_L2T1 = 2,
+  SCALABILITY_L2T2 = 3,
+  SCALABILITY_L2T3 = 4,
+  SCALABILITY_S2T1 = 5,
+  SCALABILITY_S2T2 = 6,
+  SCALABILITY_S2T3 = 7,
+  SCALABILITY_L2T1h = 8,
+  SCALABILITY_L2T2h = 9,
+  SCALABILITY_L2T3h = 10,
+  SCALABILITY_S2T1h = 11,
+  SCALABILITY_S2T2h = 12,
+  SCALABILITY_S2T3h = 13,
+  SCALABILITY_SS = 14
+} UENUM1BYTE(SCALABILITY_STRUCTURES);
+
+aom_codec_err_t aom_get_num_layers_from_operating_point_idc(
+    int operating_point_idc, unsigned int *number_spatial_layers,
+    unsigned int *number_temporal_layers) {
+  // derive number of spatial/temporal layers from operating_point_idc
+
+  if (!number_spatial_layers || !number_temporal_layers)
+    return AOM_CODEC_INVALID_PARAM;
+
+  if (operating_point_idc == 0) {
+    *number_temporal_layers = 1;
+    *number_spatial_layers = 1;
+  } else {
+    *number_spatial_layers = 0;
+    *number_temporal_layers = 0;
+    for (int j = 0; j < MAX_NUM_SPATIAL_LAYERS; j++) {
+      *number_spatial_layers +=
+          (operating_point_idc >> (j + MAX_NUM_TEMPORAL_LAYERS)) & 0x1;
+    }
+    for (int j = 0; j < MAX_NUM_TEMPORAL_LAYERS; j++) {
+      *number_temporal_layers += (operating_point_idc >> j) & 0x1;
+    }
+  }
+
+  return AOM_CODEC_OK;
+}
+
+static int is_obu_in_current_operating_point(AV1Decoder *pbi,
+                                             ObuHeader obu_header) {
+  if (!pbi->current_operating_point) {
+    return 1;
+  }
+
+  if ((pbi->current_operating_point >> obu_header.temporal_layer_id) & 0x1 &&
+      (pbi->current_operating_point >> (obu_header.spatial_layer_id + 8)) &
+          0x1) {
+    return 1;
+  }
+  return 0;
+}
+
+static int byte_alignment(AV1_COMMON *const cm,
+                          struct aom_read_bit_buffer *const rb) {
+  while (rb->bit_offset & 7) {
+    if (aom_rb_read_bit(rb)) {
+      cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+      return -1;
+    }
+  }
+  return 0;
+}
+
+static uint32_t read_temporal_delimiter_obu() { return 0; }
+
+// Returns a boolean that indicates success.
+static int read_bitstream_level(BitstreamLevel *bl,
+                                struct aom_read_bit_buffer *rb) {
+  const uint8_t seq_level_idx = aom_rb_read_literal(rb, LEVEL_BITS);
+  if (!is_valid_seq_level_idx(seq_level_idx)) return 0;
+  bl->major = (seq_level_idx >> LEVEL_MINOR_BITS) + LEVEL_MAJOR_MIN;
+  bl->minor = seq_level_idx & ((1 << LEVEL_MINOR_BITS) - 1);
+  return 1;
+}
+
+// Returns whether two sequence headers are consistent with each other.
+// TODO(huisu,wtc@google.com): make sure the code matches the spec exactly.
+static int are_seq_headers_consistent(const SequenceHeader *seq_params_old,
+                                      const SequenceHeader *seq_params_new) {
+  return !memcmp(seq_params_old, seq_params_new, sizeof(SequenceHeader));
+}
+
+// On success, sets pbi->sequence_header_ready to 1 and returns the number of
+// bytes read from 'rb'.
+// On failure, sets pbi->common.error.error_code and returns 0.
+static uint32_t read_sequence_header_obu(AV1Decoder *pbi,
+                                         struct aom_read_bit_buffer *rb) {
+  AV1_COMMON *const cm = &pbi->common;
+  const uint32_t saved_bit_offset = rb->bit_offset;
+
+  // Verify rb has been configured to report errors.
+  assert(rb->error_handler);
+
+  // Use a local variable to store the information as we decode. At the end,
+  // if no errors have occurred, cm->seq_params is updated.
+  SequenceHeader sh = cm->seq_params;
+  SequenceHeader *const seq_params = &sh;
+
+  seq_params->profile = av1_read_profile(rb);
+  if (seq_params->profile > CONFIG_MAX_DECODE_PROFILE) {
+    cm->error.error_code = AOM_CODEC_UNSUP_BITSTREAM;
+    return 0;
+  }
+
+  // Still picture or not
+  seq_params->still_picture = aom_rb_read_bit(rb);
+  seq_params->reduced_still_picture_hdr = aom_rb_read_bit(rb);
+  // Video must have reduced_still_picture_hdr = 0
+  if (!seq_params->still_picture && seq_params->reduced_still_picture_hdr) {
+    cm->error.error_code = AOM_CODEC_UNSUP_BITSTREAM;
+    return 0;
+  }
+
+  if (seq_params->reduced_still_picture_hdr) {
+    cm->timing_info_present = 0;
+    seq_params->decoder_model_info_present_flag = 0;
+    seq_params->display_model_info_present_flag = 0;
+    seq_params->operating_points_cnt_minus_1 = 0;
+    seq_params->operating_point_idc[0] = 0;
+    if (!read_bitstream_level(&seq_params->level[0], rb)) {
+      cm->error.error_code = AOM_CODEC_UNSUP_BITSTREAM;
+      return 0;
+    }
+    seq_params->tier[0] = 0;
+    cm->op_params[0].decoder_model_param_present_flag = 0;
+    cm->op_params[0].display_model_param_present_flag = 0;
+  } else {
+    cm->timing_info_present = aom_rb_read_bit(rb);  // timing_info_present_flag
+    if (cm->timing_info_present) {
+      av1_read_timing_info_header(cm, rb);
+
+      seq_params->decoder_model_info_present_flag = aom_rb_read_bit(rb);
+      if (seq_params->decoder_model_info_present_flag)
+        av1_read_decoder_model_info(cm, rb);
+    } else {
+      seq_params->decoder_model_info_present_flag = 0;
+    }
+    seq_params->display_model_info_present_flag = aom_rb_read_bit(rb);
+    seq_params->operating_points_cnt_minus_1 =
+        aom_rb_read_literal(rb, OP_POINTS_CNT_MINUS_1_BITS);
+    for (int i = 0; i < seq_params->operating_points_cnt_minus_1 + 1; i++) {
+      seq_params->operating_point_idc[i] =
+          aom_rb_read_literal(rb, OP_POINTS_IDC_BITS);
+      if (!read_bitstream_level(&seq_params->level[i], rb)) {
+        cm->error.error_code = AOM_CODEC_UNSUP_BITSTREAM;
+        return 0;
+      }
+      // This is the seq_level_idx[i] > 7 check in the spec. seq_level_idx 7
+      // is equivalent to level 3.3.
+      if (seq_params->level[i].major > 3)
+        seq_params->tier[i] = aom_rb_read_bit(rb);
+      else
+        seq_params->tier[i] = 0;
+      if (seq_params->decoder_model_info_present_flag) {
+        cm->op_params[i].decoder_model_param_present_flag = aom_rb_read_bit(rb);
+        if (cm->op_params[i].decoder_model_param_present_flag)
+          av1_read_op_parameters_info(cm, rb, i);
+      } else {
+        cm->op_params[i].decoder_model_param_present_flag = 0;
+      }
+      if (cm->timing_info_present &&
+          (cm->timing_info.equal_picture_interval ||
+           cm->op_params[i].decoder_model_param_present_flag)) {
+        cm->op_params[i].bitrate = max_level_bitrate(
+            seq_params->profile,
+            major_minor_to_seq_level_idx(seq_params->level[i]),
+            seq_params->tier[i]);
+        // Level with seq_level_idx = 31 returns a high "dummy" bitrate to pass
+        // the check
+        if (cm->op_params[i].bitrate == 0)
+          aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                             "AV1 does not support this combination of "
+                             "profile, level, and tier.");
+        // Buffer size in bits/s is bitrate in bits/s * 1 s
+        cm->op_params[i].buffer_size = cm->op_params[i].bitrate;
+      }
+      if (cm->timing_info_present && cm->timing_info.equal_picture_interval &&
+          !cm->op_params[i].decoder_model_param_present_flag) {
+        // When the decoder_model_parameters are not sent for this op, set
+        // the default ones that can be used with the resource availability mode
+        cm->op_params[i].decoder_buffer_delay = 70000;
+        cm->op_params[i].encoder_buffer_delay = 20000;
+        cm->op_params[i].low_delay_mode_flag = 0;
+      }
+
+      if (seq_params->display_model_info_present_flag) {
+        cm->op_params[i].display_model_param_present_flag = aom_rb_read_bit(rb);
+        if (cm->op_params[i].display_model_param_present_flag) {
+          cm->op_params[i].initial_display_delay =
+              aom_rb_read_literal(rb, 4) + 1;
+          if (cm->op_params[i].initial_display_delay > 10)
+            aom_internal_error(
+                &cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                "AV1 does not support more than 10 decoded frames delay");
+        } else {
+          cm->op_params[i].initial_display_delay = 10;
+        }
+      } else {
+        cm->op_params[i].display_model_param_present_flag = 0;
+        cm->op_params[i].initial_display_delay = 10;
+      }
+    }
+  }
+  // This decoder supports all levels.  Choose operating point provided by
+  // external means
+  int operating_point = pbi->operating_point;
+  if (operating_point < 0 ||
+      operating_point > seq_params->operating_points_cnt_minus_1)
+    operating_point = 0;
+  pbi->current_operating_point =
+      seq_params->operating_point_idc[operating_point];
+  if (aom_get_num_layers_from_operating_point_idc(
+          pbi->current_operating_point, &cm->number_spatial_layers,
+          &cm->number_temporal_layers) != AOM_CODEC_OK) {
+    cm->error.error_code = AOM_CODEC_ERROR;
+    return 0;
+  }
+
+  av1_read_sequence_header(cm, rb, seq_params);
+
+  av1_read_color_config(rb, pbi->allow_lowbitdepth, seq_params, &cm->error);
+  if (!(seq_params->subsampling_x == 0 && seq_params->subsampling_y == 0) &&
+      !(seq_params->subsampling_x == 1 && seq_params->subsampling_y == 1) &&
+      !(seq_params->subsampling_x == 1 && seq_params->subsampling_y == 0)) {
+    aom_internal_error(&cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+                       "Only 4:4:4, 4:2:2 and 4:2:0 are currently supported, "
+                       "%d %d subsampling is not supported.\n",
+                       seq_params->subsampling_x, seq_params->subsampling_y);
+  }
+
+  seq_params->film_grain_params_present = aom_rb_read_bit(rb);
+
+  if (av1_check_trailing_bits(pbi, rb) != 0) {
+    // cm->error.error_code is already set.
+    return 0;
+  }
+
+  // If a sequence header has been decoded before, we check if the new
+  // one is consistent with the old one.
+  if (pbi->sequence_header_ready) {
+    if (!are_seq_headers_consistent(&cm->seq_params, seq_params))
+      pbi->sequence_header_changed = 1;
+  }
+
+  cm->seq_params = *seq_params;
+  pbi->sequence_header_ready = 1;
+
+  return ((rb->bit_offset - saved_bit_offset + 7) >> 3);
+}
+
+// On success, returns the frame header size. On failure, calls
+// aom_internal_error and does not return.
+static uint32_t read_frame_header_obu(AV1Decoder *pbi,
+                                      struct aom_read_bit_buffer *rb,
+                                      const uint8_t *data,
+                                      const uint8_t **p_data_end,
+                                      int trailing_bits_present) {
+  return av1_decode_frame_headers_and_setup(pbi, rb, data, p_data_end,
+                                            trailing_bits_present);
+}
+
+// On success, returns the tile group header size. On failure, calls
+// aom_internal_error() and returns -1.
+static int32_t read_tile_group_header(AV1Decoder *pbi,
+                                      struct aom_read_bit_buffer *rb,
+                                      int *start_tile, int *end_tile,
+                                      int tile_start_implicit) {
+  AV1_COMMON *const cm = &pbi->common;
+  uint32_t saved_bit_offset = rb->bit_offset;
+  int tile_start_and_end_present_flag = 0;
+  const int num_tiles = pbi->common.tile_rows * pbi->common.tile_cols;
+
+  if (!pbi->common.large_scale_tile && num_tiles > 1) {
+    tile_start_and_end_present_flag = aom_rb_read_bit(rb);
+    if (tile_start_implicit && tile_start_and_end_present_flag) {
+      aom_internal_error(
+          &cm->error, AOM_CODEC_UNSUP_BITSTREAM,
+          "For OBU_FRAME type obu tile_start_and_end_present_flag must be 0");
+      return -1;
+    }
+  }
+  if (pbi->common.large_scale_tile || num_tiles == 1 ||
+      !tile_start_and_end_present_flag) {
+    *start_tile = 0;
+    *end_tile = num_tiles - 1;
+  } else {
+    int tile_bits = cm->log2_tile_rows + cm->log2_tile_cols;
+    *start_tile = aom_rb_read_literal(rb, tile_bits);
+    *end_tile = aom_rb_read_literal(rb, tile_bits);
+  }
+  if (*start_tile > *end_tile) {
+    aom_internal_error(&cm->error, AOM_CODEC_CORRUPT_FRAME,
+                       "tg_end must be greater than or equal to tg_start");
+    return -1;
+  }
+
+  return ((rb->bit_offset - saved_bit_offset + 7) >> 3);
+}
+
+// On success, returns the tile group OBU size. On failure, sets
+// pbi->common.error.error_code and returns 0.
+static uint32_t read_one_tile_group_obu(
+    AV1Decoder *pbi, struct aom_read_bit_buffer *rb, int is_first_tg,
+    const uint8_t *data, const uint8_t *data_end, const uint8_t **p_data_end,
+    int *is_last_tg, int tile_start_implicit) {
+  AV1_COMMON *const cm = &pbi->common;
+  int start_tile, end_tile;
+  int32_t header_size, tg_payload_size;
+
+  assert((rb->bit_offset & 7) == 0);
+  assert(rb->bit_buffer + aom_rb_bytes_read(rb) == data);
+
+  header_size = read_tile_group_header(pbi, rb, &start_tile, &end_tile,
+                                       tile_start_implicit);
+  if (header_size == -1 || byte_alignment(cm, rb)) return 0;
+  data += header_size;
+  av1_decode_tg_tiles_and_wrapup(pbi, data, data_end, p_data_end, start_tile,
+                                 end_tile, is_first_tg);
+
+  tg_payload_size = (uint32_t)(*p_data_end - data);
+
+  // TODO(shan):  For now, assume all tile groups received in order
+  *is_last_tg = end_tile == cm->tile_rows * cm->tile_cols - 1;
+  return header_size + tg_payload_size;
+}
+
+static void alloc_tile_list_buffer(AV1Decoder *pbi) {
+  // The resolution of the output frame is read out from the bitstream. The data
+  // are stored in the order of Y plane, U plane and V plane. As an example, for
+  // image format 4:2:0, the output frame of U plane and V plane is 1/4 of the
+  // output frame.
+  AV1_COMMON *const cm = &pbi->common;
+  int tile_width, tile_height;
+  av1_get_uniform_tile_size(cm, &tile_width, &tile_height);
+  const int tile_width_in_pixels = tile_width * MI_SIZE;
+  const int tile_height_in_pixels = tile_height * MI_SIZE;
+  const int output_frame_width =
+      (pbi->output_frame_width_in_tiles_minus_1 + 1) * tile_width_in_pixels;
+  const int output_frame_height =
+      (pbi->output_frame_height_in_tiles_minus_1 + 1) * tile_height_in_pixels;
+  // The output frame is used to store the decoded tile list. The decoded tile
+  // list has to fit into 1 output frame.
+  assert((pbi->tile_count_minus_1 + 1) <=
+         (pbi->output_frame_width_in_tiles_minus_1 + 1) *
+             (pbi->output_frame_height_in_tiles_minus_1 + 1));
+
+  // Allocate the tile list output buffer.
+  // Note: if cm->seq_params.use_highbitdepth is 1 and cm->seq_params.bit_depth
+  // is 8, we could allocate less memory, namely, 8 bits/pixel.
+  if (aom_alloc_frame_buffer(&pbi->tile_list_outbuf, output_frame_width,
+                             output_frame_height, cm->seq_params.subsampling_x,
+                             cm->seq_params.subsampling_y,
+                             (cm->seq_params.use_highbitdepth &&
+                              (cm->seq_params.bit_depth > AOM_BITS_8)),
+                             0, cm->byte_alignment))
+    aom_internal_error(&cm->error, AOM_CODEC_MEM_ERROR,
+                       "Failed to allocate the tile list output buffer");
+}
+
+static void yv12_tile_copy(const YV12_BUFFER_CONFIG *src, int hstart1,
+                           int hend1, int vstart1, int vend1,
+                           YV12_BUFFER_CONFIG *dst, int hstart2, int vstart2,
+                           int plane) {
+  const int src_stride = (plane > 0) ? src->strides[1] : src->strides[0];
+  const int dst_stride = (plane > 0) ? dst->strides[1] : dst->strides[0];
+  int row, col;
+
+  assert(src->flags & YV12_FLAG_HIGHBITDEPTH);
+  assert(!(dst->flags & YV12_FLAG_HIGHBITDEPTH));
+
+  const uint16_t *src16 =
+      CONVERT_TO_SHORTPTR(src->buffers[plane] + vstart1 * src_stride + hstart1);
+  uint8_t *dst8 = dst->buffers[plane] + vstart2 * dst_stride + hstart2;
+
+  for (row = vstart1; row < vend1; ++row) {
+    for (col = 0; col < (hend1 - hstart1); ++col) *dst8++ = (uint8_t)(*src16++);
+    src16 += src_stride - (hend1 - hstart1);
+    dst8 += dst_stride - (hend1 - hstart1);
+  }
+  return;
+}
+
+static void copy_decoded_tile_to_tile_list_buffer(AV1Decoder *pbi,
+                                                  int tile_idx) {
+  AV1_COMMON *const cm = &pbi->common;
+  int tile_width, tile_height;
+  av1_get_uniform_tile_size(cm, &tile_width, &tile_height);
+  const int tile_width_in_pixels = tile_width * MI_SIZE;
+  const int tile_height_in_pixels = tile_height * MI_SIZE;
+  const int ssy = cm->seq_params.subsampling_y;
+  const int ssx = cm->seq_params.subsampling_x;
+  const int num_planes = av1_num_planes(cm);
+
+  YV12_BUFFER_CONFIG *cur_frame = &cm->cur_frame->buf;
+  const int tr = tile_idx / (pbi->output_frame_width_in_tiles_minus_1 + 1);
+  const int tc = tile_idx % (pbi->output_frame_width_in_tiles_minus_1 + 1);
+  int plane;
+
+  // Copy decoded tile to the tile list output buffer.
+  for (plane = 0; plane < num_planes; ++plane) {
+    const int shift_x = plane > 0 ? ssx : 0;
+    const int shift_y = plane > 0 ? ssy : 0;
+    const int h = tile_height_in_pixels >> shift_y;
+    const int w = tile_width_in_pixels >> shift_x;
+
+    // src offset
+    int vstart1 = pbi->dec_tile_row * h;
+    int vend1 = vstart1 + h;
+    int hstart1 = pbi->dec_tile_col * w;
+    int hend1 = hstart1 + w;
+    // dst offset
+    int vstart2 = tr * h;
+    int hstart2 = tc * w;
+
+    if (cm->seq_params.use_highbitdepth &&
+        cm->seq_params.bit_depth == AOM_BITS_8) {
+      yv12_tile_copy(cur_frame, hstart1, hend1, vstart1, vend1,
+                     &pbi->tile_list_outbuf, hstart2, vstart2, plane);
+    } else {
+      switch (plane) {
+        case 0:
+          aom_yv12_partial_copy_y(cur_frame, hstart1, hend1, vstart1, vend1,
+                                  &pbi->tile_list_outbuf, hstart2, vstart2);
+          break;
+        case 1:
+          aom_yv12_partial_copy_u(cur_frame, hstart1, hend1, vstart1, vend1,
+                                  &pbi->tile_list_outbuf, hstart2, vstart2);
+          break;
+        case 2:
+          aom_yv12_partial_copy_v(cur_frame, hstart1, hend1, vstart1, vend1,
+                                  &pbi->tile_list_outbuf, hstart2, vstart2);
+          break;
+        default: assert(0);
+      }
+    }
+  }
+}
+
+// Only called while large_scale_tile = 1.
+//
+// On success, returns the tile list OBU size. On failure, sets
+// pbi->common.error.error_code and returns 0.
+static uint32_t read_and_decode_one_tile_list(AV1Decoder *pbi,
+                                              struct aom_read_bit_buffer *rb,
+                                              const uint8_t *data,
+                                              const uint8_t *data_end,
+                                              const uint8_t **p_data_end,
+                                              int *frame_decoding_finished) {
+  AV1_COMMON *const cm = &pbi->common;
+  uint32_t tile_list_payload_size = 0;
+  const int num_tiles = cm->tile_cols * cm->tile_rows;
+  const int start_tile = 0;
+  const int end_tile = num_tiles - 1;
+  int i = 0;
+
+  // Process the tile list info.
+  pbi->output_frame_width_in_tiles_minus_1 = aom_rb_read_literal(rb, 8);
+  pbi->output_frame_height_in_tiles_minus_1 = aom_rb_read_literal(rb, 8);
+  pbi->tile_count_minus_1 = aom_rb_read_literal(rb, 16);
+  if (pbi->tile_count_minus_1 > MAX_TILES - 1) {
+    cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+    return 0;
+  }
+
+  // Allocate output frame buffer for the tile list.
+  alloc_tile_list_buffer(pbi);
+
+  uint32_t tile_list_info_bytes = 4;
+  tile_list_payload_size += tile_list_info_bytes;
+  data += tile_list_info_bytes;
+
+  int tile_idx = 0;
+  for (i = 0; i <= pbi->tile_count_minus_1; i++) {
+    // Process 1 tile.
+    // Reset the bit reader.
+    rb->bit_offset = 0;
+    rb->bit_buffer = data;
+
+    // Read out the tile info.
+    uint32_t tile_info_bytes = 5;
+    // Set reference for each tile.
+    int ref_idx = aom_rb_read_literal(rb, 8);
+    if (ref_idx >= MAX_EXTERNAL_REFERENCES) {
+      cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+      return 0;
+    }
+    av1_set_reference_dec(cm, 0, 1, &pbi->ext_refs.refs[ref_idx]);
+
+    pbi->dec_tile_row = aom_rb_read_literal(rb, 8);
+    pbi->dec_tile_col = aom_rb_read_literal(rb, 8);
+    if (pbi->dec_tile_row < 0 || pbi->dec_tile_col < 0 ||
+        pbi->dec_tile_row >= cm->tile_rows ||
+        pbi->dec_tile_col >= cm->tile_cols) {
+      cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+      return 0;
+    }
+
+    pbi->coded_tile_data_size = aom_rb_read_literal(rb, 16) + 1;
+    data += tile_info_bytes;
+    if ((size_t)(data_end - data) < pbi->coded_tile_data_size) {
+      cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+      return 0;
+    }
+
+    av1_decode_tg_tiles_and_wrapup(pbi, data, data + pbi->coded_tile_data_size,
+                                   p_data_end, start_tile, end_tile, 0);
+    uint32_t tile_payload_size = (uint32_t)(*p_data_end - data);
+
+    tile_list_payload_size += tile_info_bytes + tile_payload_size;
+
+    // Update data ptr for next tile decoding.
+    data = *p_data_end;
+    assert(data <= data_end);
+
+    // Copy the decoded tile to the tile list output buffer.
+    copy_decoded_tile_to_tile_list_buffer(pbi, tile_idx);
+    tile_idx++;
+  }
+
+  *frame_decoding_finished = 1;
+  return tile_list_payload_size;
+}
+
+static void read_metadata_itut_t35(const uint8_t *data, size_t sz) {
+  struct aom_read_bit_buffer rb = { data, data + sz, 0, NULL, NULL };
+  for (size_t i = 0; i < sz; i++) {
+    aom_rb_read_literal(&rb, 8);
+  }
+}
+
+static void read_metadata_hdr_cll(const uint8_t *data, size_t sz) {
+  struct aom_read_bit_buffer rb = { data, data + sz, 0, NULL, NULL };
+  aom_rb_read_literal(&rb, 16);  // max_cll
+  aom_rb_read_literal(&rb, 16);  // max_fall
+}
+
+static void read_metadata_hdr_mdcv(const uint8_t *data, size_t sz) {
+  struct aom_read_bit_buffer rb = { data, data + sz, 0, NULL, NULL };
+  for (int i = 0; i < 3; i++) {
+    aom_rb_read_literal(&rb, 16);  // primary_i_chromaticity_x
+    aom_rb_read_literal(&rb, 16);  // primary_i_chromaticity_y
+  }
+
+  aom_rb_read_literal(&rb, 16);  // white_point_chromaticity_x
+  aom_rb_read_literal(&rb, 16);  // white_point_chromaticity_y
+
+  aom_rb_read_unsigned_literal(&rb, 32);  // luminance_max
+  aom_rb_read_unsigned_literal(&rb, 32);  // luminance_min
+}
+
+static void scalability_structure(struct aom_read_bit_buffer *rb) {
+  int spatial_layers_cnt = aom_rb_read_literal(rb, 2);
+  int spatial_layer_dimensions_present_flag = aom_rb_read_bit(rb);
+  int spatial_layer_description_present_flag = aom_rb_read_bit(rb);
+  int temporal_group_description_present_flag = aom_rb_read_bit(rb);
+  aom_rb_read_literal(rb, 3);  // reserved
+
+  if (spatial_layer_dimensions_present_flag) {
+    int i;
+    for (i = 0; i < spatial_layers_cnt + 1; i++) {
+      aom_rb_read_literal(rb, 16);
+      aom_rb_read_literal(rb, 16);
+    }
+  }
+  if (spatial_layer_description_present_flag) {
+    int i;
+    for (i = 0; i < spatial_layers_cnt + 1; i++) {
+      aom_rb_read_literal(rb, 8);
+    }
+  }
+  if (temporal_group_description_present_flag) {
+    int i, j, temporal_group_size;
+    temporal_group_size = aom_rb_read_literal(rb, 8);
+    for (i = 0; i < temporal_group_size; i++) {
+      aom_rb_read_literal(rb, 3);
+      aom_rb_read_bit(rb);
+      aom_rb_read_bit(rb);
+      int temporal_group_ref_cnt = aom_rb_read_literal(rb, 3);
+      for (j = 0; j < temporal_group_ref_cnt; j++) {
+        aom_rb_read_literal(rb, 8);
+      }
+    }
+  }
+}
+
+static void read_metadata_scalability(const uint8_t *data, size_t sz) {
+  struct aom_read_bit_buffer rb = { data, data + sz, 0, NULL, NULL };
+  int scalability_mode_idc = aom_rb_read_literal(&rb, 8);
+  if (scalability_mode_idc == SCALABILITY_SS) {
+    scalability_structure(&rb);
+  }
+}
+
+static void read_metadata_timecode(const uint8_t *data, size_t sz) {
+  struct aom_read_bit_buffer rb = { data, data + sz, 0, NULL, NULL };
+  aom_rb_read_literal(&rb, 5);                     // counting_type f(5)
+  int full_timestamp_flag = aom_rb_read_bit(&rb);  // full_timestamp_flag f(1)
+  aom_rb_read_bit(&rb);                            // discontinuity_flag (f1)
+  aom_rb_read_bit(&rb);                            // cnt_dropped_flag f(1)
+  aom_rb_read_literal(&rb, 9);                     // n_frames f(9)
+  if (full_timestamp_flag) {
+    aom_rb_read_literal(&rb, 6);  // seconds_value f(6)
+    aom_rb_read_literal(&rb, 6);  // minutes_value f(6)
+    aom_rb_read_literal(&rb, 5);  // hours_value f(5)
+  } else {
+    int seconds_flag = aom_rb_read_bit(&rb);  // seconds_flag f(1)
+    if (seconds_flag) {
+      aom_rb_read_literal(&rb, 6);              // seconds_value f(6)
+      int minutes_flag = aom_rb_read_bit(&rb);  // minutes_flag f(1)
+      if (minutes_flag) {
+        aom_rb_read_literal(&rb, 6);            // minutes_value f(6)
+        int hours_flag = aom_rb_read_bit(&rb);  // hours_flag f(1)
+        if (hours_flag) {
+          aom_rb_read_literal(&rb, 5);  // hours_value f(5)
+        }
+      }
+    }
+  }
+  // time_offset_length f(5)
+  int time_offset_length = aom_rb_read_literal(&rb, 5);
+  if (time_offset_length) {
+    aom_rb_read_literal(&rb, time_offset_length);  // f(time_offset_length)
+  }
+}
+
+// Not fully implemented. Always succeeds and returns sz.
+static size_t read_metadata(const uint8_t *data, size_t sz) {
+  size_t type_length;
+  uint64_t type_value;
+  OBU_METADATA_TYPE metadata_type;
+  if (aom_uleb_decode(data, sz, &type_value, &type_length) < 0) {
+    return sz;
+  }
+  metadata_type = (OBU_METADATA_TYPE)type_value;
+  if (metadata_type == OBU_METADATA_TYPE_ITUT_T35) {
+    read_metadata_itut_t35(data + type_length, sz - type_length);
+  } else if (metadata_type == OBU_METADATA_TYPE_HDR_CLL) {
+    read_metadata_hdr_cll(data + type_length, sz - type_length);
+  } else if (metadata_type == OBU_METADATA_TYPE_HDR_MDCV) {
+    read_metadata_hdr_mdcv(data + type_length, sz - type_length);
+  } else if (metadata_type == OBU_METADATA_TYPE_SCALABILITY) {
+    read_metadata_scalability(data + type_length, sz - type_length);
+  } else if (metadata_type == OBU_METADATA_TYPE_TIMECODE) {
+    read_metadata_timecode(data + type_length, sz - type_length);
+  }
+
+  return sz;
+}
+
+// On success, returns a boolean that indicates whether the decoding of the
+// current frame is finished. On failure, sets cm->error.error_code and
+// returns -1.
+int aom_decode_frame_from_obus(struct AV1Decoder *pbi, const uint8_t *data,
+                               const uint8_t *data_end,
+                               const uint8_t **p_data_end) {
+  AV1_COMMON *const cm = &pbi->common;
+  int frame_decoding_finished = 0;
+  int is_first_tg_obu_received = 1;
+  uint32_t frame_header_size = 0;
+  ObuHeader obu_header;
+  memset(&obu_header, 0, sizeof(obu_header));
+  pbi->seen_frame_header = 0;
+
+  if (data_end < data) {
+    cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+    return -1;
+  }
+
+  // Reset pbi->camera_frame_header_ready to 0 if cm->large_scale_tile = 0.
+  if (!cm->large_scale_tile) pbi->camera_frame_header_ready = 0;
+
+  // decode frame as a series of OBUs
+  while (!frame_decoding_finished && cm->error.error_code == AOM_CODEC_OK) {
+    struct aom_read_bit_buffer rb;
+    size_t payload_size = 0;
+    size_t decoded_payload_size = 0;
+    size_t obu_payload_offset = 0;
+    size_t bytes_read = 0;
+    const size_t bytes_available = data_end - data;
+
+    if (bytes_available == 0 && !pbi->seen_frame_header) {
+      *p_data_end = data;
+      cm->error.error_code = AOM_CODEC_OK;
+      break;
+    }
+
+    aom_codec_err_t status =
+        aom_read_obu_header_and_size(data, bytes_available, cm->is_annexb,
+                                     &obu_header, &payload_size, &bytes_read);
+
+    if (status != AOM_CODEC_OK) {
+      cm->error.error_code = status;
+      return -1;
+    }
+
+    // Record obu size header information.
+    pbi->obu_size_hdr.data = data + obu_header.size;
+    pbi->obu_size_hdr.size = bytes_read - obu_header.size;
+
+    // Note: aom_read_obu_header_and_size() takes care of checking that this
+    // doesn't cause 'data' to advance past 'data_end'.
+    data += bytes_read;
+
+    if ((size_t)(data_end - data) < payload_size) {
+      cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+      return -1;
+    }
+
+    cm->temporal_layer_id = obu_header.temporal_layer_id;
+    cm->spatial_layer_id = obu_header.spatial_layer_id;
+
+    if (obu_header.type != OBU_TEMPORAL_DELIMITER &&
+        obu_header.type != OBU_SEQUENCE_HEADER &&
+        obu_header.type != OBU_PADDING) {
+      // don't decode obu if it's not in current operating mode
+      if (!is_obu_in_current_operating_point(pbi, obu_header)) {
+        data += payload_size;
+        continue;
+      }
+    }
+
+    av1_init_read_bit_buffer(pbi, &rb, data, data + payload_size);
+
+    switch (obu_header.type) {
+      case OBU_TEMPORAL_DELIMITER:
+        decoded_payload_size = read_temporal_delimiter_obu();
+        pbi->seen_frame_header = 0;
+        break;
+      case OBU_SEQUENCE_HEADER:
+        decoded_payload_size = read_sequence_header_obu(pbi, &rb);
+        if (cm->error.error_code != AOM_CODEC_OK) return -1;
+        break;
+      case OBU_FRAME_HEADER:
+      case OBU_REDUNDANT_FRAME_HEADER:
+      case OBU_FRAME:
+        // Only decode first frame header received
+        if (!pbi->seen_frame_header ||
+            (cm->large_scale_tile && !pbi->camera_frame_header_ready)) {
+          frame_header_size = read_frame_header_obu(
+              pbi, &rb, data, p_data_end, obu_header.type != OBU_FRAME);
+          pbi->seen_frame_header = 1;
+          if (!pbi->ext_tile_debug && cm->large_scale_tile)
+            pbi->camera_frame_header_ready = 1;
+        } else {
+          // TODO(wtc): Verify that the frame_header_obu is identical to the
+          // original frame_header_obu. For now just skip frame_header_size
+          // bytes in the bit buffer.
+          if (frame_header_size > payload_size) {
+            cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+            return -1;
+          }
+          assert(rb.bit_offset == 0);
+          rb.bit_offset = 8 * frame_header_size;
+        }
+
+        decoded_payload_size = frame_header_size;
+        pbi->frame_header_size = frame_header_size;
+
+        if (cm->show_existing_frame) {
+          if (obu_header.type == OBU_FRAME) {
+            cm->error.error_code = AOM_CODEC_UNSUP_BITSTREAM;
+            return -1;
+          }
+          frame_decoding_finished = 1;
+          pbi->seen_frame_header = 0;
+          break;
+        }
+
+        // In large scale tile coding, decode the common camera frame header
+        // before any tile list OBU.
+        if (!pbi->ext_tile_debug && pbi->camera_frame_header_ready) {
+          frame_decoding_finished = 1;
+          // Skip the rest of the frame data.
+          decoded_payload_size = payload_size;
+          // Update data_end.
+          *p_data_end = data_end;
+          break;
+        }
+
+        if (obu_header.type != OBU_FRAME) break;
+        obu_payload_offset = frame_header_size;
+        // Byte align the reader before reading the tile group.
+        // byte_alignment() has set cm->error.error_code if it returns -1.
+        if (byte_alignment(cm, &rb)) return -1;
+        AOM_FALLTHROUGH_INTENDED;  // fall through to read tile group.
+      case OBU_TILE_GROUP:
+        if (!pbi->seen_frame_header) {
+          cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+          return -1;
+        }
+        if (obu_payload_offset > payload_size) {
+          cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+          return -1;
+        }
+        decoded_payload_size += read_one_tile_group_obu(
+            pbi, &rb, is_first_tg_obu_received, data + obu_payload_offset,
+            data + payload_size, p_data_end, &frame_decoding_finished,
+            obu_header.type == OBU_FRAME);
+        if (cm->error.error_code != AOM_CODEC_OK) return -1;
+        is_first_tg_obu_received = 0;
+        if (frame_decoding_finished) pbi->seen_frame_header = 0;
+        break;
+      case OBU_METADATA:
+        decoded_payload_size = read_metadata(data, payload_size);
+        break;
+      case OBU_TILE_LIST:
+        if (CONFIG_NORMAL_TILE_MODE) {
+          cm->error.error_code = AOM_CODEC_UNSUP_BITSTREAM;
+          return -1;
+        }
+
+        // This OBU type is purely for the large scale tile coding mode.
+        // The common camera frame header has to be already decoded.
+        if (!pbi->camera_frame_header_ready) {
+          cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+          return -1;
+        }
+
+        cm->large_scale_tile = 1;
+        av1_set_single_tile_decoding_mode(cm);
+        decoded_payload_size =
+            read_and_decode_one_tile_list(pbi, &rb, data, data + payload_size,
+                                          p_data_end, &frame_decoding_finished);
+        if (cm->error.error_code != AOM_CODEC_OK) return -1;
+        break;
+      case OBU_PADDING:
+      default:
+        // Skip unrecognized OBUs
+        decoded_payload_size = payload_size;
+        break;
+    }
+
+    // Check that the signalled OBU size matches the actual amount of data read
+    if (decoded_payload_size > payload_size) {
+      cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+      return -1;
+    }
+
+    // If there are extra padding bytes, they should all be zero
+    while (decoded_payload_size < payload_size) {
+      uint8_t padding_byte = data[decoded_payload_size++];
+      if (padding_byte != 0) {
+        cm->error.error_code = AOM_CODEC_CORRUPT_FRAME;
+        return -1;
+      }
+    }
+
+    data += payload_size;
+  }
+
+  if (cm->error.error_code != AOM_CODEC_OK) return -1;
+  return frame_decoding_finished;
+}

diff --git a/libav1/av1/decoder/obu.h b/libav1/av1/decoder/obu.h
new file mode 100644
index 0000000..d8ebe36
--- /dev/null
+++ b/libav1/av1/decoder/obu.h

@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_DECODER_OBU_H_
+#define AOM_AV1_DECODER_OBU_H_
+
+#include "aom/aom_codec.h"
+#include "av1/decoder/decoder.h"
+
+// Try to decode one frame from a buffer.
+// Returns 1 if we decoded a frame,
+//         0 if we didn't decode a frame but that's okay
+//           (eg, if there was a frame but we skipped it),
+//     or -1 on error
+int aom_decode_frame_from_obus(struct AV1Decoder *pbi, const uint8_t *data,
+                               const uint8_t *data_end,
+                               const uint8_t **p_data_end);
+
+aom_codec_err_t aom_get_num_layers_from_operating_point_idc(
+    int operating_point_idc, unsigned int *number_spatial_layers,
+    unsigned int *number_temporal_layers);
+
+#endif  // AOM_AV1_DECODER_OBU_H_

diff --git a/libav1/build/config/aom_config.c b/libav1/build/config/aom_config.c
new file mode 100644
index 0000000..b4b6feb
--- /dev/null
+++ b/libav1/build/config/aom_config.c

@@ -0,0 +1,13 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#include "aom/aom_codec.h"
+static const char* const cfg = "cmake ../../../../Documents/av1-ps4 -G \"Ninja\" -DAOM_TARGET_CPU=generic -DENABLE_TESTS=0";
+const char *aom_codec_build_config(void) {return cfg;}

diff --git a/libav1/build/config/aom_config.h b/libav1/build/config/aom_config.h
new file mode 100644
index 0000000..8eb75fc
--- /dev/null
+++ b/libav1/build/config/aom_config.h

@@ -0,0 +1,83 @@
+/*
+ * Copyright (c) 2019, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_CONFIG_H_
+#define AOM_CONFIG_H_
+#define ARCH_ARM 0
+#define ARCH_MIPS 0
+#define ARCH_PPC 0
+#define ARCH_X86 0
+#define ARCH_X86_64 0
+#define CONFIG_2PASS_PARTITION_SEARCH_LVL_END 4
+#define CONFIG_2PASS_PARTITION_SEARCH_LVL_START 1
+#define CONFIG_ACCOUNTING 0
+#define CONFIG_ANALYZER 0
+#define CONFIG_AV1_DECODER 1
+#define CONFIG_AV1_ENCODER 0
+#define CONFIG_BIG_ENDIAN 0
+#define CONFIG_BITSTREAM_DEBUG 0
+#define CONFIG_COEFFICIENT_RANGE_CHECKING 0
+#define CONFIG_COLLECT_COMPONENT_TIMING 0
+#define CONFIG_COLLECT_PARTITION_STATS 0
+#define CONFIG_COLLECT_RD_STATS 0
+#define CONFIG_DEBUG 1
+#define CONFIG_DENOISE 1
+#define CONFIG_DISABLE_FULL_PIXEL_SPLIT_8X8 1
+#define CONFIG_DIST_8X8 0
+#define CONFIG_ENTROPY_STATS 0
+#define CONFIG_FILEOPTIONS 1
+#define CONFIG_FP_MB_STATS 0
+#define CONFIG_GCC 0
+#define CONFIG_GCOV 0
+#define CONFIG_GPROF 0
+#define CONFIG_INSPECTION 0
+#define CONFIG_INTERNAL_STATS 0
+#define CONFIG_INTER_STATS_ONLY 0
+#define CONFIG_LIBYUV 0
+#define CONFIG_LOWBITDEPTH 1
+#define CONFIG_MAX_DECODE_PROFILE 2
+#define CONFIG_MISMATCH_DEBUG 0
+#define CONFIG_MULTITHREAD 1
+#define CONFIG_NORMAL_TILE_MODE 0
+#define CONFIG_ONE_PASS_SVM 0
+#define CONFIG_OS_SUPPORT 1
+#define CONFIG_PIC 0
+#define CONFIG_RD_DEBUG 0
+#define CONFIG_RUNTIME_CPU_DETECT 1
+#define CONFIG_SHARED 0
+#define CONFIG_SHARP_SETTINGS 0
+#define CONFIG_SIZE_LIMIT 0
+#define CONFIG_SPATIAL_RESAMPLING 1
+#define CONFIG_SPEED_STATS 0
+#define CONFIG_STATIC 1
+#define CONFIG_WEBM_IO 1
+#define DECODE_HEIGHT_LIMIT 0
+#define DECODE_WIDTH_LIMIT 0
+#define HAVE_AVX 0
+#define HAVE_AVX2 0
+#define HAVE_DSPR2 0
+#define HAVE_FEXCEPT 0
+#define HAVE_MIPS32 0
+#define HAVE_MIPS64 0
+#define HAVE_MMX 0
+#define HAVE_MSA 0
+#define HAVE_NEON 0
+#define HAVE_PTHREAD_H 0
+#define HAVE_SSE 0
+#define HAVE_SSE2 0
+#define HAVE_SSE3 0
+#define HAVE_SSE4_1 0
+#define HAVE_SSE4_2 0
+#define HAVE_SSSE3 0
+#define HAVE_UNISTD_H 0
+#define HAVE_VSX 0
+#define HAVE_WXWIDGETS 0
+#define INLINE inline
+#endif  // AOM_CONFIG_H_

diff --git a/libav1/build/config/aom_dsp_rtcd.h b/libav1/build/config/aom_dsp_rtcd.h
new file mode 100644
index 0000000..744d9e4
--- /dev/null
+++ b/libav1/build/config/aom_dsp_rtcd.h

@@ -0,0 +1,4300 @@
+// This file is generated. Do not edit.
+#ifndef AOM_DSP_RTCD_H_
+#define AOM_DSP_RTCD_H_
+
+#ifdef RTCD_C
+#define RTCD_EXTERN
+#else
+#define RTCD_EXTERN extern
+#endif
+
+/*
+ * DSP
+ */
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "av1/common/enums.h"
+#include "av1/common/blockd.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+unsigned int aom_avg_4x4_c(const uint8_t *, int p);
+#define aom_avg_4x4 aom_avg_4x4_c
+
+unsigned int aom_avg_8x8_c(const uint8_t *, int p);
+#define aom_avg_8x8 aom_avg_8x8_c
+
+void aom_blend_a64_hmask_c(uint8_t *dst, uint32_t dst_stride, const uint8_t *src0, uint32_t src0_stride, const uint8_t *src1, uint32_t src1_stride, const uint8_t *mask, int w, int h);
+#define aom_blend_a64_hmask aom_blend_a64_hmask_c
+
+void aom_blend_a64_mask_c(uint8_t *dst, uint32_t dst_stride, const uint8_t *src0, uint32_t src0_stride, const uint8_t *src1, uint32_t src1_stride, const uint8_t *mask, uint32_t mask_stride, int w, int h, int subx, int suby);
+#define aom_blend_a64_mask aom_blend_a64_mask_c
+
+void aom_blend_a64_vmask_c(uint8_t *dst, uint32_t dst_stride, const uint8_t *src0, uint32_t src0_stride, const uint8_t *src1, uint32_t src1_stride, const uint8_t *mask, int w, int h);
+#define aom_blend_a64_vmask aom_blend_a64_vmask_c
+
+void aom_comp_avg_pred_c(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride);
+#define aom_comp_avg_pred aom_comp_avg_pred_c
+
+void aom_comp_avg_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+                                                   const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
+                                                   int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
+                                                   int ref_stride, int subpel_search);
+#define aom_comp_avg_upsampled_pred aom_comp_avg_upsampled_pred_c
+
+void aom_comp_mask_pred_c(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask);
+#define aom_comp_mask_pred aom_comp_mask_pred_c
+
+void aom_comp_mask_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+                                                       const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
+                                                       int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
+                                                       int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
+                                                       int subpel_search);
+#define aom_comp_mask_upsampled_pred aom_comp_mask_upsampled_pred_c
+
+void aom_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
+#define aom_convolve8_horiz aom_convolve8_horiz_c
+
+void aom_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
+#define aom_convolve8_vert aom_convolve8_vert_c
+
+void aom_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
+#define aom_convolve_copy aom_convolve_copy_c
+
+void aom_dc_128_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x16 aom_dc_128_predictor_16x16_c
+
+void aom_dc_128_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x32 aom_dc_128_predictor_16x32_c
+
+void aom_dc_128_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x4 aom_dc_128_predictor_16x4_c
+
+void aom_dc_128_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x64 aom_dc_128_predictor_16x64_c
+
+void aom_dc_128_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x8 aom_dc_128_predictor_16x8_c
+
+void aom_dc_128_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_2x2 aom_dc_128_predictor_2x2_c
+
+void aom_dc_128_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x16 aom_dc_128_predictor_32x16_c
+
+void aom_dc_128_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x32 aom_dc_128_predictor_32x32_c
+
+void aom_dc_128_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x64 aom_dc_128_predictor_32x64_c
+
+void aom_dc_128_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x8 aom_dc_128_predictor_32x8_c
+
+void aom_dc_128_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_4x16 aom_dc_128_predictor_4x16_c
+
+void aom_dc_128_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_4x4 aom_dc_128_predictor_4x4_c
+
+void aom_dc_128_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_4x8 aom_dc_128_predictor_4x8_c
+
+void aom_dc_128_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_64x16 aom_dc_128_predictor_64x16_c
+
+void aom_dc_128_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_64x32 aom_dc_128_predictor_64x32_c
+
+void aom_dc_128_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_64x64 aom_dc_128_predictor_64x64_c
+
+void aom_dc_128_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x16 aom_dc_128_predictor_8x16_c
+
+void aom_dc_128_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x32 aom_dc_128_predictor_8x32_c
+
+void aom_dc_128_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x4 aom_dc_128_predictor_8x4_c
+
+void aom_dc_128_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x8 aom_dc_128_predictor_8x8_c
+
+void aom_dc_left_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x16 aom_dc_left_predictor_16x16_c
+
+void aom_dc_left_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x32 aom_dc_left_predictor_16x32_c
+
+void aom_dc_left_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x4 aom_dc_left_predictor_16x4_c
+
+void aom_dc_left_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x64 aom_dc_left_predictor_16x64_c
+
+void aom_dc_left_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x8 aom_dc_left_predictor_16x8_c
+
+void aom_dc_left_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_2x2 aom_dc_left_predictor_2x2_c
+
+void aom_dc_left_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x16 aom_dc_left_predictor_32x16_c
+
+void aom_dc_left_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x32 aom_dc_left_predictor_32x32_c
+
+void aom_dc_left_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x64 aom_dc_left_predictor_32x64_c
+
+void aom_dc_left_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x8 aom_dc_left_predictor_32x8_c
+
+void aom_dc_left_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_4x16 aom_dc_left_predictor_4x16_c
+
+void aom_dc_left_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_4x4 aom_dc_left_predictor_4x4_c
+
+void aom_dc_left_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_4x8 aom_dc_left_predictor_4x8_c
+
+void aom_dc_left_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_64x16 aom_dc_left_predictor_64x16_c
+
+void aom_dc_left_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_64x32 aom_dc_left_predictor_64x32_c
+
+void aom_dc_left_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_64x64 aom_dc_left_predictor_64x64_c
+
+void aom_dc_left_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x16 aom_dc_left_predictor_8x16_c
+
+void aom_dc_left_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x32 aom_dc_left_predictor_8x32_c
+
+void aom_dc_left_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x4 aom_dc_left_predictor_8x4_c
+
+void aom_dc_left_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x8 aom_dc_left_predictor_8x8_c
+
+void aom_dc_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x16 aom_dc_predictor_16x16_c
+
+void aom_dc_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x32 aom_dc_predictor_16x32_c
+
+void aom_dc_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x4 aom_dc_predictor_16x4_c
+
+void aom_dc_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x64 aom_dc_predictor_16x64_c
+
+void aom_dc_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x8 aom_dc_predictor_16x8_c
+
+void aom_dc_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_2x2 aom_dc_predictor_2x2_c
+
+void aom_dc_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x16 aom_dc_predictor_32x16_c
+
+void aom_dc_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x32 aom_dc_predictor_32x32_c
+
+void aom_dc_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x64 aom_dc_predictor_32x64_c
+
+void aom_dc_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x8 aom_dc_predictor_32x8_c
+
+void aom_dc_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_4x16 aom_dc_predictor_4x16_c
+
+void aom_dc_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_4x4 aom_dc_predictor_4x4_c
+
+void aom_dc_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_4x8 aom_dc_predictor_4x8_c
+
+void aom_dc_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_64x16 aom_dc_predictor_64x16_c
+
+void aom_dc_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_64x32 aom_dc_predictor_64x32_c
+
+void aom_dc_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_64x64 aom_dc_predictor_64x64_c
+
+void aom_dc_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x16 aom_dc_predictor_8x16_c
+
+void aom_dc_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x32 aom_dc_predictor_8x32_c
+
+void aom_dc_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x4 aom_dc_predictor_8x4_c
+
+void aom_dc_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x8 aom_dc_predictor_8x8_c
+
+void aom_dc_top_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x16 aom_dc_top_predictor_16x16_c
+
+void aom_dc_top_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x32 aom_dc_top_predictor_16x32_c
+
+void aom_dc_top_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x4 aom_dc_top_predictor_16x4_c
+
+void aom_dc_top_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x64 aom_dc_top_predictor_16x64_c
+
+void aom_dc_top_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x8 aom_dc_top_predictor_16x8_c
+
+void aom_dc_top_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_2x2 aom_dc_top_predictor_2x2_c
+
+void aom_dc_top_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x16 aom_dc_top_predictor_32x16_c
+
+void aom_dc_top_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x32 aom_dc_top_predictor_32x32_c
+
+void aom_dc_top_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x64 aom_dc_top_predictor_32x64_c
+
+void aom_dc_top_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x8 aom_dc_top_predictor_32x8_c
+
+void aom_dc_top_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_4x16 aom_dc_top_predictor_4x16_c
+
+void aom_dc_top_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_4x4 aom_dc_top_predictor_4x4_c
+
+void aom_dc_top_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_4x8 aom_dc_top_predictor_4x8_c
+
+void aom_dc_top_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_64x16 aom_dc_top_predictor_64x16_c
+
+void aom_dc_top_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_64x32 aom_dc_top_predictor_64x32_c
+
+void aom_dc_top_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_64x64 aom_dc_top_predictor_64x64_c
+
+void aom_dc_top_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x16 aom_dc_top_predictor_8x16_c
+
+void aom_dc_top_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x32 aom_dc_top_predictor_8x32_c
+
+void aom_dc_top_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x4 aom_dc_top_predictor_8x4_c
+
+void aom_dc_top_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x8 aom_dc_top_predictor_8x8_c
+
+void aom_dist_wtd_comp_avg_pred_c(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_comp_avg_pred aom_dist_wtd_comp_avg_pred_c
+
+void aom_dist_wtd_comp_avg_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+                                                       const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
+                                                       int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
+                                                       int ref_stride, const DIST_WTD_COMP_PARAMS *jcp_param, int subpel_search);
+#define aom_dist_wtd_comp_avg_upsampled_pred aom_dist_wtd_comp_avg_upsampled_pred_c
+
+unsigned int aom_dist_wtd_sad128x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad128x128_avg aom_dist_wtd_sad128x128_avg_c
+
+unsigned int aom_dist_wtd_sad128x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad128x64_avg aom_dist_wtd_sad128x64_avg_c
+
+unsigned int aom_dist_wtd_sad16x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x16_avg aom_dist_wtd_sad16x16_avg_c
+
+unsigned int aom_dist_wtd_sad16x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x32_avg aom_dist_wtd_sad16x32_avg_c
+
+unsigned int aom_dist_wtd_sad16x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x4_avg aom_dist_wtd_sad16x4_avg_c
+
+unsigned int aom_dist_wtd_sad16x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x64_avg aom_dist_wtd_sad16x64_avg_c
+
+unsigned int aom_dist_wtd_sad16x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x8_avg aom_dist_wtd_sad16x8_avg_c
+
+unsigned int aom_dist_wtd_sad32x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x16_avg aom_dist_wtd_sad32x16_avg_c
+
+unsigned int aom_dist_wtd_sad32x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x32_avg aom_dist_wtd_sad32x32_avg_c
+
+unsigned int aom_dist_wtd_sad32x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x64_avg aom_dist_wtd_sad32x64_avg_c
+
+unsigned int aom_dist_wtd_sad32x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x8_avg aom_dist_wtd_sad32x8_avg_c
+
+unsigned int aom_dist_wtd_sad4x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad4x16_avg aom_dist_wtd_sad4x16_avg_c
+
+unsigned int aom_dist_wtd_sad4x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad4x4_avg aom_dist_wtd_sad4x4_avg_c
+
+unsigned int aom_dist_wtd_sad4x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad4x8_avg aom_dist_wtd_sad4x8_avg_c
+
+unsigned int aom_dist_wtd_sad64x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x128_avg aom_dist_wtd_sad64x128_avg_c
+
+unsigned int aom_dist_wtd_sad64x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x16_avg aom_dist_wtd_sad64x16_avg_c
+
+unsigned int aom_dist_wtd_sad64x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x32_avg aom_dist_wtd_sad64x32_avg_c
+
+unsigned int aom_dist_wtd_sad64x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x64_avg aom_dist_wtd_sad64x64_avg_c
+
+unsigned int aom_dist_wtd_sad8x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x16_avg aom_dist_wtd_sad8x16_avg_c
+
+unsigned int aom_dist_wtd_sad8x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x32_avg aom_dist_wtd_sad8x32_avg_c
+
+unsigned int aom_dist_wtd_sad8x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x4_avg aom_dist_wtd_sad8x4_avg_c
+
+unsigned int aom_dist_wtd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x8_avg aom_dist_wtd_sad8x8_avg_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance128x128 aom_dist_wtd_sub_pixel_avg_variance128x128_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance128x64 aom_dist_wtd_sub_pixel_avg_variance128x64_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance16x16 aom_dist_wtd_sub_pixel_avg_variance16x16_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance16x32 aom_dist_wtd_sub_pixel_avg_variance16x32_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance16x4 aom_dist_wtd_sub_pixel_avg_variance16x4_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance16x64 aom_dist_wtd_sub_pixel_avg_variance16x64_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance16x8 aom_dist_wtd_sub_pixel_avg_variance16x8_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance32x16 aom_dist_wtd_sub_pixel_avg_variance32x16_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance32x32 aom_dist_wtd_sub_pixel_avg_variance32x32_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance32x64 aom_dist_wtd_sub_pixel_avg_variance32x64_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance32x8 aom_dist_wtd_sub_pixel_avg_variance32x8_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance4x16 aom_dist_wtd_sub_pixel_avg_variance4x16_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance4x4 aom_dist_wtd_sub_pixel_avg_variance4x4_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance4x8 aom_dist_wtd_sub_pixel_avg_variance4x8_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance64x128 aom_dist_wtd_sub_pixel_avg_variance64x128_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance64x16 aom_dist_wtd_sub_pixel_avg_variance64x16_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance64x32 aom_dist_wtd_sub_pixel_avg_variance64x32_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance64x64 aom_dist_wtd_sub_pixel_avg_variance64x64_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance8x16 aom_dist_wtd_sub_pixel_avg_variance8x16_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance8x32 aom_dist_wtd_sub_pixel_avg_variance8x32_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance8x4 aom_dist_wtd_sub_pixel_avg_variance8x4_c
+
+uint32_t aom_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sub_pixel_avg_variance8x8 aom_dist_wtd_sub_pixel_avg_variance8x8_c
+
+void aom_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
+#define aom_fdct8x8 aom_fdct8x8_c
+
+void aom_fft16x16_float_c(const float *input, float *temp, float *output);
+#define aom_fft16x16_float aom_fft16x16_float_c
+
+void aom_fft2x2_float_c(const float *input, float *temp, float *output);
+#define aom_fft2x2_float aom_fft2x2_float_c
+
+void aom_fft32x32_float_c(const float *input, float *temp, float *output);
+#define aom_fft32x32_float aom_fft32x32_float_c
+
+void aom_fft4x4_float_c(const float *input, float *temp, float *output);
+#define aom_fft4x4_float aom_fft4x4_float_c
+
+void aom_fft8x8_float_c(const float *input, float *temp, float *output);
+#define aom_fft8x8_float aom_fft8x8_float_c
+
+void aom_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
+#define aom_get16x16var aom_get16x16var_c
+
+unsigned int aom_get4x4sse_cs_c(const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride);
+#define aom_get4x4sse_cs aom_get4x4sse_cs_c
+
+void aom_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
+#define aom_get8x8var aom_get8x8var_c
+
+unsigned int aom_get_mb_ss_c(const int16_t *);
+#define aom_get_mb_ss aom_get_mb_ss_c
+
+void aom_h_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x16 aom_h_predictor_16x16_c
+
+void aom_h_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x32 aom_h_predictor_16x32_c
+
+void aom_h_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x4 aom_h_predictor_16x4_c
+
+void aom_h_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x64 aom_h_predictor_16x64_c
+
+void aom_h_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x8 aom_h_predictor_16x8_c
+
+void aom_h_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_2x2 aom_h_predictor_2x2_c
+
+void aom_h_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x16 aom_h_predictor_32x16_c
+
+void aom_h_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x32 aom_h_predictor_32x32_c
+
+void aom_h_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x64 aom_h_predictor_32x64_c
+
+void aom_h_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x8 aom_h_predictor_32x8_c
+
+void aom_h_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_4x16 aom_h_predictor_4x16_c
+
+void aom_h_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_4x4 aom_h_predictor_4x4_c
+
+void aom_h_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_4x8 aom_h_predictor_4x8_c
+
+void aom_h_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_64x16 aom_h_predictor_64x16_c
+
+void aom_h_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_64x32 aom_h_predictor_64x32_c
+
+void aom_h_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_64x64 aom_h_predictor_64x64_c
+
+void aom_h_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x16 aom_h_predictor_8x16_c
+
+void aom_h_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x32 aom_h_predictor_8x32_c
+
+void aom_h_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x4 aom_h_predictor_8x4_c
+
+void aom_h_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x8 aom_h_predictor_8x8_c
+
+void aom_hadamard_16x16_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_hadamard_16x16 aom_hadamard_16x16_c
+
+void aom_hadamard_32x32_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_hadamard_32x32 aom_hadamard_32x32_c
+
+void aom_hadamard_8x8_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_hadamard_8x8 aom_hadamard_8x8_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance128x128 aom_highbd_10_dist_wtd_sub_pixel_avg_variance128x128_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance128x64 aom_highbd_10_dist_wtd_sub_pixel_avg_variance128x64_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x16 aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x16_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x32 aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x32_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x4 aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x4_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x64 aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x64_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x8 aom_highbd_10_dist_wtd_sub_pixel_avg_variance16x8_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x16 aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x16_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x32 aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x32_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x64 aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x64_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x8 aom_highbd_10_dist_wtd_sub_pixel_avg_variance32x8_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance4x16 aom_highbd_10_dist_wtd_sub_pixel_avg_variance4x16_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance4x4 aom_highbd_10_dist_wtd_sub_pixel_avg_variance4x4_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance4x8 aom_highbd_10_dist_wtd_sub_pixel_avg_variance4x8_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x128 aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x128_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x16 aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x16_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x32 aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x32_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x64 aom_highbd_10_dist_wtd_sub_pixel_avg_variance64x64_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x16 aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x16_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x32 aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x32_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x4 aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x4_c
+
+uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c
+
+void aom_highbd_10_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
+#define aom_highbd_10_get16x16var aom_highbd_10_get16x16var_c
+
+void aom_highbd_10_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
+#define aom_highbd_10_get8x8var aom_highbd_10_get8x8var_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance128x128 aom_highbd_10_masked_sub_pixel_variance128x128_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance128x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance128x64 aom_highbd_10_masked_sub_pixel_variance128x64_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance16x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance16x16 aom_highbd_10_masked_sub_pixel_variance16x16_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance16x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance16x32 aom_highbd_10_masked_sub_pixel_variance16x32_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance16x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance16x4 aom_highbd_10_masked_sub_pixel_variance16x4_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance16x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance16x64 aom_highbd_10_masked_sub_pixel_variance16x64_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance16x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance16x8 aom_highbd_10_masked_sub_pixel_variance16x8_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance32x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance32x16 aom_highbd_10_masked_sub_pixel_variance32x16_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance32x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance32x32 aom_highbd_10_masked_sub_pixel_variance32x32_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance32x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance32x64 aom_highbd_10_masked_sub_pixel_variance32x64_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance32x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance32x8 aom_highbd_10_masked_sub_pixel_variance32x8_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance4x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance4x16 aom_highbd_10_masked_sub_pixel_variance4x16_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance4x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance4x4 aom_highbd_10_masked_sub_pixel_variance4x4_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance4x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance4x8 aom_highbd_10_masked_sub_pixel_variance4x8_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance64x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance64x128 aom_highbd_10_masked_sub_pixel_variance64x128_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance64x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance64x16 aom_highbd_10_masked_sub_pixel_variance64x16_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance64x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance64x32 aom_highbd_10_masked_sub_pixel_variance64x32_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance64x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance64x64 aom_highbd_10_masked_sub_pixel_variance64x64_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance8x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance8x16 aom_highbd_10_masked_sub_pixel_variance8x16_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance8x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance8x32 aom_highbd_10_masked_sub_pixel_variance8x32_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance8x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance8x4 aom_highbd_10_masked_sub_pixel_variance8x4_c
+
+unsigned int aom_highbd_10_masked_sub_pixel_variance8x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_10_masked_sub_pixel_variance8x8 aom_highbd_10_masked_sub_pixel_variance8x8_c
+
+unsigned int aom_highbd_10_mse16x16_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse16x16 aom_highbd_10_mse16x16_c
+
+unsigned int aom_highbd_10_mse16x8_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse16x8 aom_highbd_10_mse16x8_c
+
+unsigned int aom_highbd_10_mse8x16_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse8x16 aom_highbd_10_mse8x16_c
+
+unsigned int aom_highbd_10_mse8x8_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse8x8 aom_highbd_10_mse8x8_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance128x128 aom_highbd_10_obmc_sub_pixel_variance128x128_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance128x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance128x64 aom_highbd_10_obmc_sub_pixel_variance128x64_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance16x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance16x16 aom_highbd_10_obmc_sub_pixel_variance16x16_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance16x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance16x32 aom_highbd_10_obmc_sub_pixel_variance16x32_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance16x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance16x4 aom_highbd_10_obmc_sub_pixel_variance16x4_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance16x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance16x64 aom_highbd_10_obmc_sub_pixel_variance16x64_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance16x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance16x8 aom_highbd_10_obmc_sub_pixel_variance16x8_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance32x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance32x16 aom_highbd_10_obmc_sub_pixel_variance32x16_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance32x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance32x32 aom_highbd_10_obmc_sub_pixel_variance32x32_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance32x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance32x64 aom_highbd_10_obmc_sub_pixel_variance32x64_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance32x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance32x8 aom_highbd_10_obmc_sub_pixel_variance32x8_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance4x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance4x16 aom_highbd_10_obmc_sub_pixel_variance4x16_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance4x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance4x4 aom_highbd_10_obmc_sub_pixel_variance4x4_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance4x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance4x8 aom_highbd_10_obmc_sub_pixel_variance4x8_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance64x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance64x128 aom_highbd_10_obmc_sub_pixel_variance64x128_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance64x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance64x16 aom_highbd_10_obmc_sub_pixel_variance64x16_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance64x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance64x32 aom_highbd_10_obmc_sub_pixel_variance64x32_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance64x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance64x64 aom_highbd_10_obmc_sub_pixel_variance64x64_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance8x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance8x16 aom_highbd_10_obmc_sub_pixel_variance8x16_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance8x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance8x32 aom_highbd_10_obmc_sub_pixel_variance8x32_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance8x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance8x4 aom_highbd_10_obmc_sub_pixel_variance8x4_c
+
+unsigned int aom_highbd_10_obmc_sub_pixel_variance8x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_sub_pixel_variance8x8 aom_highbd_10_obmc_sub_pixel_variance8x8_c
+
+unsigned int aom_highbd_10_obmc_variance128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance128x128 aom_highbd_10_obmc_variance128x128_c
+
+unsigned int aom_highbd_10_obmc_variance128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance128x64 aom_highbd_10_obmc_variance128x64_c
+
+unsigned int aom_highbd_10_obmc_variance16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance16x16 aom_highbd_10_obmc_variance16x16_c
+
+unsigned int aom_highbd_10_obmc_variance16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance16x32 aom_highbd_10_obmc_variance16x32_c
+
+unsigned int aom_highbd_10_obmc_variance16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance16x4 aom_highbd_10_obmc_variance16x4_c
+
+unsigned int aom_highbd_10_obmc_variance16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance16x64 aom_highbd_10_obmc_variance16x64_c
+
+unsigned int aom_highbd_10_obmc_variance16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance16x8 aom_highbd_10_obmc_variance16x8_c
+
+unsigned int aom_highbd_10_obmc_variance32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance32x16 aom_highbd_10_obmc_variance32x16_c
+
+unsigned int aom_highbd_10_obmc_variance32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance32x32 aom_highbd_10_obmc_variance32x32_c
+
+unsigned int aom_highbd_10_obmc_variance32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance32x64 aom_highbd_10_obmc_variance32x64_c
+
+unsigned int aom_highbd_10_obmc_variance32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance32x8 aom_highbd_10_obmc_variance32x8_c
+
+unsigned int aom_highbd_10_obmc_variance4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance4x16 aom_highbd_10_obmc_variance4x16_c
+
+unsigned int aom_highbd_10_obmc_variance4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance4x4 aom_highbd_10_obmc_variance4x4_c
+
+unsigned int aom_highbd_10_obmc_variance4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance4x8 aom_highbd_10_obmc_variance4x8_c
+
+unsigned int aom_highbd_10_obmc_variance64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance64x128 aom_highbd_10_obmc_variance64x128_c
+
+unsigned int aom_highbd_10_obmc_variance64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance64x16 aom_highbd_10_obmc_variance64x16_c
+
+unsigned int aom_highbd_10_obmc_variance64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance64x32 aom_highbd_10_obmc_variance64x32_c
+
+unsigned int aom_highbd_10_obmc_variance64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance64x64 aom_highbd_10_obmc_variance64x64_c
+
+unsigned int aom_highbd_10_obmc_variance8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance8x16 aom_highbd_10_obmc_variance8x16_c
+
+unsigned int aom_highbd_10_obmc_variance8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance8x32 aom_highbd_10_obmc_variance8x32_c
+
+unsigned int aom_highbd_10_obmc_variance8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance8x4 aom_highbd_10_obmc_variance8x4_c
+
+unsigned int aom_highbd_10_obmc_variance8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_10_obmc_variance8x8 aom_highbd_10_obmc_variance8x8_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance128x128 aom_highbd_10_sub_pixel_avg_variance128x128_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance128x64 aom_highbd_10_sub_pixel_avg_variance128x64_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance16x16 aom_highbd_10_sub_pixel_avg_variance16x16_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance16x32 aom_highbd_10_sub_pixel_avg_variance16x32_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance16x4 aom_highbd_10_sub_pixel_avg_variance16x4_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance16x64 aom_highbd_10_sub_pixel_avg_variance16x64_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance16x8 aom_highbd_10_sub_pixel_avg_variance16x8_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance32x16 aom_highbd_10_sub_pixel_avg_variance32x16_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance32x32 aom_highbd_10_sub_pixel_avg_variance32x32_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance32x64 aom_highbd_10_sub_pixel_avg_variance32x64_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance32x8 aom_highbd_10_sub_pixel_avg_variance32x8_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance4x16 aom_highbd_10_sub_pixel_avg_variance4x16_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance4x4 aom_highbd_10_sub_pixel_avg_variance4x4_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance4x8 aom_highbd_10_sub_pixel_avg_variance4x8_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance64x128 aom_highbd_10_sub_pixel_avg_variance64x128_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance64x16 aom_highbd_10_sub_pixel_avg_variance64x16_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance64x32 aom_highbd_10_sub_pixel_avg_variance64x32_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance64x64 aom_highbd_10_sub_pixel_avg_variance64x64_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance8x16 aom_highbd_10_sub_pixel_avg_variance8x16_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance8x32 aom_highbd_10_sub_pixel_avg_variance8x32_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance8x4 aom_highbd_10_sub_pixel_avg_variance8x4_c
+
+uint32_t aom_highbd_10_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_10_sub_pixel_avg_variance8x8 aom_highbd_10_sub_pixel_avg_variance8x8_c
+
+uint32_t aom_highbd_10_sub_pixel_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance128x128 aom_highbd_10_sub_pixel_variance128x128_c
+
+uint32_t aom_highbd_10_sub_pixel_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance128x64 aom_highbd_10_sub_pixel_variance128x64_c
+
+uint32_t aom_highbd_10_sub_pixel_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance16x16 aom_highbd_10_sub_pixel_variance16x16_c
+
+uint32_t aom_highbd_10_sub_pixel_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance16x32 aom_highbd_10_sub_pixel_variance16x32_c
+
+uint32_t aom_highbd_10_sub_pixel_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance16x4 aom_highbd_10_sub_pixel_variance16x4_c
+
+uint32_t aom_highbd_10_sub_pixel_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance16x64 aom_highbd_10_sub_pixel_variance16x64_c
+
+uint32_t aom_highbd_10_sub_pixel_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance16x8 aom_highbd_10_sub_pixel_variance16x8_c
+
+uint32_t aom_highbd_10_sub_pixel_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance32x16 aom_highbd_10_sub_pixel_variance32x16_c
+
+uint32_t aom_highbd_10_sub_pixel_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance32x32 aom_highbd_10_sub_pixel_variance32x32_c
+
+uint32_t aom_highbd_10_sub_pixel_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance32x64 aom_highbd_10_sub_pixel_variance32x64_c
+
+uint32_t aom_highbd_10_sub_pixel_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance32x8 aom_highbd_10_sub_pixel_variance32x8_c
+
+uint32_t aom_highbd_10_sub_pixel_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance4x16 aom_highbd_10_sub_pixel_variance4x16_c
+
+uint32_t aom_highbd_10_sub_pixel_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance4x4 aom_highbd_10_sub_pixel_variance4x4_c
+
+uint32_t aom_highbd_10_sub_pixel_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance4x8 aom_highbd_10_sub_pixel_variance4x8_c
+
+uint32_t aom_highbd_10_sub_pixel_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance64x128 aom_highbd_10_sub_pixel_variance64x128_c
+
+uint32_t aom_highbd_10_sub_pixel_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance64x16 aom_highbd_10_sub_pixel_variance64x16_c
+
+uint32_t aom_highbd_10_sub_pixel_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance64x32 aom_highbd_10_sub_pixel_variance64x32_c
+
+uint32_t aom_highbd_10_sub_pixel_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance64x64 aom_highbd_10_sub_pixel_variance64x64_c
+
+uint32_t aom_highbd_10_sub_pixel_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance8x16 aom_highbd_10_sub_pixel_variance8x16_c
+
+uint32_t aom_highbd_10_sub_pixel_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance8x32 aom_highbd_10_sub_pixel_variance8x32_c
+
+uint32_t aom_highbd_10_sub_pixel_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance8x4 aom_highbd_10_sub_pixel_variance8x4_c
+
+uint32_t aom_highbd_10_sub_pixel_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_sub_pixel_variance8x8 aom_highbd_10_sub_pixel_variance8x8_c
+
+unsigned int aom_highbd_10_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance128x128 aom_highbd_10_variance128x128_c
+
+unsigned int aom_highbd_10_variance128x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance128x64 aom_highbd_10_variance128x64_c
+
+unsigned int aom_highbd_10_variance16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance16x16 aom_highbd_10_variance16x16_c
+
+unsigned int aom_highbd_10_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance16x32 aom_highbd_10_variance16x32_c
+
+unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_variance16x4 aom_highbd_10_variance16x4_c
+
+unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_variance16x64 aom_highbd_10_variance16x64_c
+
+unsigned int aom_highbd_10_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance16x8 aom_highbd_10_variance16x8_c
+
+unsigned int aom_highbd_10_variance2x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance2x2 aom_highbd_10_variance2x2_c
+
+unsigned int aom_highbd_10_variance2x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance2x4 aom_highbd_10_variance2x4_c
+
+unsigned int aom_highbd_10_variance32x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance32x16 aom_highbd_10_variance32x16_c
+
+unsigned int aom_highbd_10_variance32x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance32x32 aom_highbd_10_variance32x32_c
+
+unsigned int aom_highbd_10_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance32x64 aom_highbd_10_variance32x64_c
+
+unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_variance32x8 aom_highbd_10_variance32x8_c
+
+unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_variance4x16 aom_highbd_10_variance4x16_c
+
+unsigned int aom_highbd_10_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance4x2 aom_highbd_10_variance4x2_c
+
+unsigned int aom_highbd_10_variance4x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance4x4 aom_highbd_10_variance4x4_c
+
+unsigned int aom_highbd_10_variance4x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance4x8 aom_highbd_10_variance4x8_c
+
+unsigned int aom_highbd_10_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance64x128 aom_highbd_10_variance64x128_c
+
+unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_variance64x16 aom_highbd_10_variance64x16_c
+
+unsigned int aom_highbd_10_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance64x32 aom_highbd_10_variance64x32_c
+
+unsigned int aom_highbd_10_variance64x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance64x64 aom_highbd_10_variance64x64_c
+
+unsigned int aom_highbd_10_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance8x16 aom_highbd_10_variance8x16_c
+
+unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_10_variance8x32 aom_highbd_10_variance8x32_c
+
+unsigned int aom_highbd_10_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance8x4 aom_highbd_10_variance8x4_c
+
+unsigned int aom_highbd_10_variance8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_10_variance8x8 aom_highbd_10_variance8x8_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance128x128 aom_highbd_12_dist_wtd_sub_pixel_avg_variance128x128_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance128x64 aom_highbd_12_dist_wtd_sub_pixel_avg_variance128x64_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x16 aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x16_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x32 aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x32_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x4 aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x4_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x64 aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x64_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x8 aom_highbd_12_dist_wtd_sub_pixel_avg_variance16x8_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x16 aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x16_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x32 aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x32_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x64 aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x64_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x8 aom_highbd_12_dist_wtd_sub_pixel_avg_variance32x8_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance4x16 aom_highbd_12_dist_wtd_sub_pixel_avg_variance4x16_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance4x4 aom_highbd_12_dist_wtd_sub_pixel_avg_variance4x4_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance4x8 aom_highbd_12_dist_wtd_sub_pixel_avg_variance4x8_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x128 aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x128_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x16 aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x16_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x32 aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x32_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x64 aom_highbd_12_dist_wtd_sub_pixel_avg_variance64x64_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x16 aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x16_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x32 aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x32_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x4 aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x4_c
+
+uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c
+
+void aom_highbd_12_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
+#define aom_highbd_12_get16x16var aom_highbd_12_get16x16var_c
+
+void aom_highbd_12_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
+#define aom_highbd_12_get8x8var aom_highbd_12_get8x8var_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance128x128 aom_highbd_12_masked_sub_pixel_variance128x128_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance128x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance128x64 aom_highbd_12_masked_sub_pixel_variance128x64_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance16x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance16x16 aom_highbd_12_masked_sub_pixel_variance16x16_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance16x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance16x32 aom_highbd_12_masked_sub_pixel_variance16x32_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance16x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance16x4 aom_highbd_12_masked_sub_pixel_variance16x4_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance16x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance16x64 aom_highbd_12_masked_sub_pixel_variance16x64_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance16x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance16x8 aom_highbd_12_masked_sub_pixel_variance16x8_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance32x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance32x16 aom_highbd_12_masked_sub_pixel_variance32x16_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance32x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance32x32 aom_highbd_12_masked_sub_pixel_variance32x32_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance32x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance32x64 aom_highbd_12_masked_sub_pixel_variance32x64_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance32x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance32x8 aom_highbd_12_masked_sub_pixel_variance32x8_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance4x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance4x16 aom_highbd_12_masked_sub_pixel_variance4x16_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance4x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance4x4 aom_highbd_12_masked_sub_pixel_variance4x4_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance4x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance4x8 aom_highbd_12_masked_sub_pixel_variance4x8_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance64x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance64x128 aom_highbd_12_masked_sub_pixel_variance64x128_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance64x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance64x16 aom_highbd_12_masked_sub_pixel_variance64x16_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance64x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance64x32 aom_highbd_12_masked_sub_pixel_variance64x32_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance64x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance64x64 aom_highbd_12_masked_sub_pixel_variance64x64_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance8x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance8x16 aom_highbd_12_masked_sub_pixel_variance8x16_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance8x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance8x32 aom_highbd_12_masked_sub_pixel_variance8x32_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance8x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance8x4 aom_highbd_12_masked_sub_pixel_variance8x4_c
+
+unsigned int aom_highbd_12_masked_sub_pixel_variance8x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_12_masked_sub_pixel_variance8x8 aom_highbd_12_masked_sub_pixel_variance8x8_c
+
+unsigned int aom_highbd_12_mse16x16_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse16x16 aom_highbd_12_mse16x16_c
+
+unsigned int aom_highbd_12_mse16x8_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse16x8 aom_highbd_12_mse16x8_c
+
+unsigned int aom_highbd_12_mse8x16_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse8x16 aom_highbd_12_mse8x16_c
+
+unsigned int aom_highbd_12_mse8x8_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse8x8 aom_highbd_12_mse8x8_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance128x128 aom_highbd_12_obmc_sub_pixel_variance128x128_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance128x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance128x64 aom_highbd_12_obmc_sub_pixel_variance128x64_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance16x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance16x16 aom_highbd_12_obmc_sub_pixel_variance16x16_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance16x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance16x32 aom_highbd_12_obmc_sub_pixel_variance16x32_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance16x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance16x4 aom_highbd_12_obmc_sub_pixel_variance16x4_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance16x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance16x64 aom_highbd_12_obmc_sub_pixel_variance16x64_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance16x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance16x8 aom_highbd_12_obmc_sub_pixel_variance16x8_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance32x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance32x16 aom_highbd_12_obmc_sub_pixel_variance32x16_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance32x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance32x32 aom_highbd_12_obmc_sub_pixel_variance32x32_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance32x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance32x64 aom_highbd_12_obmc_sub_pixel_variance32x64_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance32x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance32x8 aom_highbd_12_obmc_sub_pixel_variance32x8_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance4x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance4x16 aom_highbd_12_obmc_sub_pixel_variance4x16_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance4x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance4x4 aom_highbd_12_obmc_sub_pixel_variance4x4_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance4x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance4x8 aom_highbd_12_obmc_sub_pixel_variance4x8_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance64x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance64x128 aom_highbd_12_obmc_sub_pixel_variance64x128_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance64x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance64x16 aom_highbd_12_obmc_sub_pixel_variance64x16_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance64x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance64x32 aom_highbd_12_obmc_sub_pixel_variance64x32_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance64x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance64x64 aom_highbd_12_obmc_sub_pixel_variance64x64_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance8x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance8x16 aom_highbd_12_obmc_sub_pixel_variance8x16_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance8x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance8x32 aom_highbd_12_obmc_sub_pixel_variance8x32_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance8x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance8x4 aom_highbd_12_obmc_sub_pixel_variance8x4_c
+
+unsigned int aom_highbd_12_obmc_sub_pixel_variance8x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_sub_pixel_variance8x8 aom_highbd_12_obmc_sub_pixel_variance8x8_c
+
+unsigned int aom_highbd_12_obmc_variance128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance128x128 aom_highbd_12_obmc_variance128x128_c
+
+unsigned int aom_highbd_12_obmc_variance128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance128x64 aom_highbd_12_obmc_variance128x64_c
+
+unsigned int aom_highbd_12_obmc_variance16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance16x16 aom_highbd_12_obmc_variance16x16_c
+
+unsigned int aom_highbd_12_obmc_variance16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance16x32 aom_highbd_12_obmc_variance16x32_c
+
+unsigned int aom_highbd_12_obmc_variance16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance16x4 aom_highbd_12_obmc_variance16x4_c
+
+unsigned int aom_highbd_12_obmc_variance16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance16x64 aom_highbd_12_obmc_variance16x64_c
+
+unsigned int aom_highbd_12_obmc_variance16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance16x8 aom_highbd_12_obmc_variance16x8_c
+
+unsigned int aom_highbd_12_obmc_variance32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance32x16 aom_highbd_12_obmc_variance32x16_c
+
+unsigned int aom_highbd_12_obmc_variance32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance32x32 aom_highbd_12_obmc_variance32x32_c
+
+unsigned int aom_highbd_12_obmc_variance32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance32x64 aom_highbd_12_obmc_variance32x64_c
+
+unsigned int aom_highbd_12_obmc_variance32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance32x8 aom_highbd_12_obmc_variance32x8_c
+
+unsigned int aom_highbd_12_obmc_variance4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance4x16 aom_highbd_12_obmc_variance4x16_c
+
+unsigned int aom_highbd_12_obmc_variance4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance4x4 aom_highbd_12_obmc_variance4x4_c
+
+unsigned int aom_highbd_12_obmc_variance4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance4x8 aom_highbd_12_obmc_variance4x8_c
+
+unsigned int aom_highbd_12_obmc_variance64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance64x128 aom_highbd_12_obmc_variance64x128_c
+
+unsigned int aom_highbd_12_obmc_variance64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance64x16 aom_highbd_12_obmc_variance64x16_c
+
+unsigned int aom_highbd_12_obmc_variance64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance64x32 aom_highbd_12_obmc_variance64x32_c
+
+unsigned int aom_highbd_12_obmc_variance64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance64x64 aom_highbd_12_obmc_variance64x64_c
+
+unsigned int aom_highbd_12_obmc_variance8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance8x16 aom_highbd_12_obmc_variance8x16_c
+
+unsigned int aom_highbd_12_obmc_variance8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance8x32 aom_highbd_12_obmc_variance8x32_c
+
+unsigned int aom_highbd_12_obmc_variance8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance8x4 aom_highbd_12_obmc_variance8x4_c
+
+unsigned int aom_highbd_12_obmc_variance8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_12_obmc_variance8x8 aom_highbd_12_obmc_variance8x8_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance128x128 aom_highbd_12_sub_pixel_avg_variance128x128_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance128x64 aom_highbd_12_sub_pixel_avg_variance128x64_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance16x16 aom_highbd_12_sub_pixel_avg_variance16x16_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance16x32 aom_highbd_12_sub_pixel_avg_variance16x32_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance16x4 aom_highbd_12_sub_pixel_avg_variance16x4_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance16x64 aom_highbd_12_sub_pixel_avg_variance16x64_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance16x8 aom_highbd_12_sub_pixel_avg_variance16x8_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance32x16 aom_highbd_12_sub_pixel_avg_variance32x16_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance32x32 aom_highbd_12_sub_pixel_avg_variance32x32_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance32x64 aom_highbd_12_sub_pixel_avg_variance32x64_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance32x8 aom_highbd_12_sub_pixel_avg_variance32x8_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance4x16 aom_highbd_12_sub_pixel_avg_variance4x16_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance4x4 aom_highbd_12_sub_pixel_avg_variance4x4_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance4x8 aom_highbd_12_sub_pixel_avg_variance4x8_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance64x128 aom_highbd_12_sub_pixel_avg_variance64x128_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance64x16 aom_highbd_12_sub_pixel_avg_variance64x16_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance64x32 aom_highbd_12_sub_pixel_avg_variance64x32_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance64x64 aom_highbd_12_sub_pixel_avg_variance64x64_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance8x16 aom_highbd_12_sub_pixel_avg_variance8x16_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance8x32 aom_highbd_12_sub_pixel_avg_variance8x32_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance8x4 aom_highbd_12_sub_pixel_avg_variance8x4_c
+
+uint32_t aom_highbd_12_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_12_sub_pixel_avg_variance8x8 aom_highbd_12_sub_pixel_avg_variance8x8_c
+
+uint32_t aom_highbd_12_sub_pixel_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance128x128 aom_highbd_12_sub_pixel_variance128x128_c
+
+uint32_t aom_highbd_12_sub_pixel_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance128x64 aom_highbd_12_sub_pixel_variance128x64_c
+
+uint32_t aom_highbd_12_sub_pixel_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance16x16 aom_highbd_12_sub_pixel_variance16x16_c
+
+uint32_t aom_highbd_12_sub_pixel_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance16x32 aom_highbd_12_sub_pixel_variance16x32_c
+
+uint32_t aom_highbd_12_sub_pixel_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance16x4 aom_highbd_12_sub_pixel_variance16x4_c
+
+uint32_t aom_highbd_12_sub_pixel_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance16x64 aom_highbd_12_sub_pixel_variance16x64_c
+
+uint32_t aom_highbd_12_sub_pixel_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance16x8 aom_highbd_12_sub_pixel_variance16x8_c
+
+uint32_t aom_highbd_12_sub_pixel_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance32x16 aom_highbd_12_sub_pixel_variance32x16_c
+
+uint32_t aom_highbd_12_sub_pixel_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance32x32 aom_highbd_12_sub_pixel_variance32x32_c
+
+uint32_t aom_highbd_12_sub_pixel_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance32x64 aom_highbd_12_sub_pixel_variance32x64_c
+
+uint32_t aom_highbd_12_sub_pixel_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance32x8 aom_highbd_12_sub_pixel_variance32x8_c
+
+uint32_t aom_highbd_12_sub_pixel_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance4x16 aom_highbd_12_sub_pixel_variance4x16_c
+
+uint32_t aom_highbd_12_sub_pixel_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance4x4 aom_highbd_12_sub_pixel_variance4x4_c
+
+uint32_t aom_highbd_12_sub_pixel_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance4x8 aom_highbd_12_sub_pixel_variance4x8_c
+
+uint32_t aom_highbd_12_sub_pixel_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance64x128 aom_highbd_12_sub_pixel_variance64x128_c
+
+uint32_t aom_highbd_12_sub_pixel_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance64x16 aom_highbd_12_sub_pixel_variance64x16_c
+
+uint32_t aom_highbd_12_sub_pixel_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance64x32 aom_highbd_12_sub_pixel_variance64x32_c
+
+uint32_t aom_highbd_12_sub_pixel_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance64x64 aom_highbd_12_sub_pixel_variance64x64_c
+
+uint32_t aom_highbd_12_sub_pixel_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance8x16 aom_highbd_12_sub_pixel_variance8x16_c
+
+uint32_t aom_highbd_12_sub_pixel_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance8x32 aom_highbd_12_sub_pixel_variance8x32_c
+
+uint32_t aom_highbd_12_sub_pixel_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance8x4 aom_highbd_12_sub_pixel_variance8x4_c
+
+uint32_t aom_highbd_12_sub_pixel_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_sub_pixel_variance8x8 aom_highbd_12_sub_pixel_variance8x8_c
+
+unsigned int aom_highbd_12_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance128x128 aom_highbd_12_variance128x128_c
+
+unsigned int aom_highbd_12_variance128x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance128x64 aom_highbd_12_variance128x64_c
+
+unsigned int aom_highbd_12_variance16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x16 aom_highbd_12_variance16x16_c
+
+unsigned int aom_highbd_12_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x32 aom_highbd_12_variance16x32_c
+
+unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_variance16x4 aom_highbd_12_variance16x4_c
+
+unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_variance16x64 aom_highbd_12_variance16x64_c
+
+unsigned int aom_highbd_12_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x8 aom_highbd_12_variance16x8_c
+
+unsigned int aom_highbd_12_variance2x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance2x2 aom_highbd_12_variance2x2_c
+
+unsigned int aom_highbd_12_variance2x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance2x4 aom_highbd_12_variance2x4_c
+
+unsigned int aom_highbd_12_variance32x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x16 aom_highbd_12_variance32x16_c
+
+unsigned int aom_highbd_12_variance32x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x32 aom_highbd_12_variance32x32_c
+
+unsigned int aom_highbd_12_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x64 aom_highbd_12_variance32x64_c
+
+unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_variance32x8 aom_highbd_12_variance32x8_c
+
+unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_variance4x16 aom_highbd_12_variance4x16_c
+
+unsigned int aom_highbd_12_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance4x2 aom_highbd_12_variance4x2_c
+
+unsigned int aom_highbd_12_variance4x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance4x4 aom_highbd_12_variance4x4_c
+
+unsigned int aom_highbd_12_variance4x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance4x8 aom_highbd_12_variance4x8_c
+
+unsigned int aom_highbd_12_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x128 aom_highbd_12_variance64x128_c
+
+unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_variance64x16 aom_highbd_12_variance64x16_c
+
+unsigned int aom_highbd_12_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x32 aom_highbd_12_variance64x32_c
+
+unsigned int aom_highbd_12_variance64x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x64 aom_highbd_12_variance64x64_c
+
+unsigned int aom_highbd_12_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x16 aom_highbd_12_variance8x16_c
+
+unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_12_variance8x32 aom_highbd_12_variance8x32_c
+
+unsigned int aom_highbd_12_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x4 aom_highbd_12_variance8x4_c
+
+unsigned int aom_highbd_12_variance8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x8 aom_highbd_12_variance8x8_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x128 aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x128_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x64 aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x64_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x16 aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x16_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x32 aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x32_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x4 aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x4_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x64 aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x64_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x8 aom_highbd_8_dist_wtd_sub_pixel_avg_variance16x8_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x16 aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x16_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x32 aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x32_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x64 aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x64_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x8 aom_highbd_8_dist_wtd_sub_pixel_avg_variance32x8_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance4x16 aom_highbd_8_dist_wtd_sub_pixel_avg_variance4x16_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance4x4 aom_highbd_8_dist_wtd_sub_pixel_avg_variance4x4_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance4x8 aom_highbd_8_dist_wtd_sub_pixel_avg_variance4x8_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x128 aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x128_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x16 aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x16_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x32 aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x32_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x64 aom_highbd_8_dist_wtd_sub_pixel_avg_variance64x64_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x16 aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x16_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x32 aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x32_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x4 aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x4_c
+
+uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c
+
+void aom_highbd_8_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
+#define aom_highbd_8_get16x16var aom_highbd_8_get16x16var_c
+
+void aom_highbd_8_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
+#define aom_highbd_8_get8x8var aom_highbd_8_get8x8var_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance128x128 aom_highbd_8_masked_sub_pixel_variance128x128_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance128x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance128x64 aom_highbd_8_masked_sub_pixel_variance128x64_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance16x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance16x16 aom_highbd_8_masked_sub_pixel_variance16x16_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance16x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance16x32 aom_highbd_8_masked_sub_pixel_variance16x32_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance16x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance16x4 aom_highbd_8_masked_sub_pixel_variance16x4_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance16x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance16x64 aom_highbd_8_masked_sub_pixel_variance16x64_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance16x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance16x8 aom_highbd_8_masked_sub_pixel_variance16x8_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance32x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance32x16 aom_highbd_8_masked_sub_pixel_variance32x16_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance32x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance32x32 aom_highbd_8_masked_sub_pixel_variance32x32_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance32x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance32x64 aom_highbd_8_masked_sub_pixel_variance32x64_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance32x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance32x8 aom_highbd_8_masked_sub_pixel_variance32x8_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance4x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance4x16 aom_highbd_8_masked_sub_pixel_variance4x16_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance4x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance4x4 aom_highbd_8_masked_sub_pixel_variance4x4_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance4x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance4x8 aom_highbd_8_masked_sub_pixel_variance4x8_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance64x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance64x128 aom_highbd_8_masked_sub_pixel_variance64x128_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance64x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance64x16 aom_highbd_8_masked_sub_pixel_variance64x16_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance64x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance64x32 aom_highbd_8_masked_sub_pixel_variance64x32_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance64x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance64x64 aom_highbd_8_masked_sub_pixel_variance64x64_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance8x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance8x16 aom_highbd_8_masked_sub_pixel_variance8x16_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance8x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance8x32 aom_highbd_8_masked_sub_pixel_variance8x32_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance8x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance8x4 aom_highbd_8_masked_sub_pixel_variance8x4_c
+
+unsigned int aom_highbd_8_masked_sub_pixel_variance8x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_highbd_8_masked_sub_pixel_variance8x8 aom_highbd_8_masked_sub_pixel_variance8x8_c
+
+unsigned int aom_highbd_8_mse16x16_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse16x16 aom_highbd_8_mse16x16_c
+
+unsigned int aom_highbd_8_mse16x8_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse16x8 aom_highbd_8_mse16x8_c
+
+unsigned int aom_highbd_8_mse8x16_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse8x16 aom_highbd_8_mse8x16_c
+
+unsigned int aom_highbd_8_mse8x8_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse8x8 aom_highbd_8_mse8x8_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance128x128 aom_highbd_8_sub_pixel_avg_variance128x128_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance128x64 aom_highbd_8_sub_pixel_avg_variance128x64_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance16x16 aom_highbd_8_sub_pixel_avg_variance16x16_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance16x32 aom_highbd_8_sub_pixel_avg_variance16x32_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance16x4 aom_highbd_8_sub_pixel_avg_variance16x4_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance16x64 aom_highbd_8_sub_pixel_avg_variance16x64_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance16x8 aom_highbd_8_sub_pixel_avg_variance16x8_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance32x16 aom_highbd_8_sub_pixel_avg_variance32x16_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance32x32 aom_highbd_8_sub_pixel_avg_variance32x32_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance32x64 aom_highbd_8_sub_pixel_avg_variance32x64_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance32x8 aom_highbd_8_sub_pixel_avg_variance32x8_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance4x16 aom_highbd_8_sub_pixel_avg_variance4x16_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance4x4 aom_highbd_8_sub_pixel_avg_variance4x4_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance4x8 aom_highbd_8_sub_pixel_avg_variance4x8_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance64x128 aom_highbd_8_sub_pixel_avg_variance64x128_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance64x16 aom_highbd_8_sub_pixel_avg_variance64x16_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance64x32 aom_highbd_8_sub_pixel_avg_variance64x32_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance64x64 aom_highbd_8_sub_pixel_avg_variance64x64_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance8x16 aom_highbd_8_sub_pixel_avg_variance8x16_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance8x32 aom_highbd_8_sub_pixel_avg_variance8x32_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance8x4 aom_highbd_8_sub_pixel_avg_variance8x4_c
+
+uint32_t aom_highbd_8_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_highbd_8_sub_pixel_avg_variance8x8 aom_highbd_8_sub_pixel_avg_variance8x8_c
+
+uint32_t aom_highbd_8_sub_pixel_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance128x128 aom_highbd_8_sub_pixel_variance128x128_c
+
+uint32_t aom_highbd_8_sub_pixel_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance128x64 aom_highbd_8_sub_pixel_variance128x64_c
+
+uint32_t aom_highbd_8_sub_pixel_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance16x16 aom_highbd_8_sub_pixel_variance16x16_c
+
+uint32_t aom_highbd_8_sub_pixel_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance16x32 aom_highbd_8_sub_pixel_variance16x32_c
+
+uint32_t aom_highbd_8_sub_pixel_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance16x4 aom_highbd_8_sub_pixel_variance16x4_c
+
+uint32_t aom_highbd_8_sub_pixel_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance16x64 aom_highbd_8_sub_pixel_variance16x64_c
+
+uint32_t aom_highbd_8_sub_pixel_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance16x8 aom_highbd_8_sub_pixel_variance16x8_c
+
+uint32_t aom_highbd_8_sub_pixel_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance32x16 aom_highbd_8_sub_pixel_variance32x16_c
+
+uint32_t aom_highbd_8_sub_pixel_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance32x32 aom_highbd_8_sub_pixel_variance32x32_c
+
+uint32_t aom_highbd_8_sub_pixel_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance32x64 aom_highbd_8_sub_pixel_variance32x64_c
+
+uint32_t aom_highbd_8_sub_pixel_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance32x8 aom_highbd_8_sub_pixel_variance32x8_c
+
+uint32_t aom_highbd_8_sub_pixel_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance4x16 aom_highbd_8_sub_pixel_variance4x16_c
+
+uint32_t aom_highbd_8_sub_pixel_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance4x4 aom_highbd_8_sub_pixel_variance4x4_c
+
+uint32_t aom_highbd_8_sub_pixel_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance4x8 aom_highbd_8_sub_pixel_variance4x8_c
+
+uint32_t aom_highbd_8_sub_pixel_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance64x128 aom_highbd_8_sub_pixel_variance64x128_c
+
+uint32_t aom_highbd_8_sub_pixel_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance64x16 aom_highbd_8_sub_pixel_variance64x16_c
+
+uint32_t aom_highbd_8_sub_pixel_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance64x32 aom_highbd_8_sub_pixel_variance64x32_c
+
+uint32_t aom_highbd_8_sub_pixel_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance64x64 aom_highbd_8_sub_pixel_variance64x64_c
+
+uint32_t aom_highbd_8_sub_pixel_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance8x16 aom_highbd_8_sub_pixel_variance8x16_c
+
+uint32_t aom_highbd_8_sub_pixel_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance8x32 aom_highbd_8_sub_pixel_variance8x32_c
+
+uint32_t aom_highbd_8_sub_pixel_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance8x4 aom_highbd_8_sub_pixel_variance8x4_c
+
+uint32_t aom_highbd_8_sub_pixel_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_sub_pixel_variance8x8 aom_highbd_8_sub_pixel_variance8x8_c
+
+unsigned int aom_highbd_8_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance128x128 aom_highbd_8_variance128x128_c
+
+unsigned int aom_highbd_8_variance128x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance128x64 aom_highbd_8_variance128x64_c
+
+unsigned int aom_highbd_8_variance16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x16 aom_highbd_8_variance16x16_c
+
+unsigned int aom_highbd_8_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x32 aom_highbd_8_variance16x32_c
+
+unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_variance16x4 aom_highbd_8_variance16x4_c
+
+unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_variance16x64 aom_highbd_8_variance16x64_c
+
+unsigned int aom_highbd_8_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x8 aom_highbd_8_variance16x8_c
+
+unsigned int aom_highbd_8_variance2x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance2x2 aom_highbd_8_variance2x2_c
+
+unsigned int aom_highbd_8_variance2x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance2x4 aom_highbd_8_variance2x4_c
+
+unsigned int aom_highbd_8_variance32x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x16 aom_highbd_8_variance32x16_c
+
+unsigned int aom_highbd_8_variance32x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x32 aom_highbd_8_variance32x32_c
+
+unsigned int aom_highbd_8_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x64 aom_highbd_8_variance32x64_c
+
+unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_variance32x8 aom_highbd_8_variance32x8_c
+
+unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_variance4x16 aom_highbd_8_variance4x16_c
+
+unsigned int aom_highbd_8_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance4x2 aom_highbd_8_variance4x2_c
+
+unsigned int aom_highbd_8_variance4x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance4x4 aom_highbd_8_variance4x4_c
+
+unsigned int aom_highbd_8_variance4x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance4x8 aom_highbd_8_variance4x8_c
+
+unsigned int aom_highbd_8_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x128 aom_highbd_8_variance64x128_c
+
+unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_variance64x16 aom_highbd_8_variance64x16_c
+
+unsigned int aom_highbd_8_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x32 aom_highbd_8_variance64x32_c
+
+unsigned int aom_highbd_8_variance64x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x64 aom_highbd_8_variance64x64_c
+
+unsigned int aom_highbd_8_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x16 aom_highbd_8_variance8x16_c
+
+unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_highbd_8_variance8x32 aom_highbd_8_variance8x32_c
+
+unsigned int aom_highbd_8_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x4 aom_highbd_8_variance8x4_c
+
+unsigned int aom_highbd_8_variance8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x8 aom_highbd_8_variance8x8_c
+
+void aom_highbd_blend_a64_d16_mask_c(uint8_t *dst, uint32_t dst_stride, const CONV_BUF_TYPE *src0, uint32_t src0_stride, const CONV_BUF_TYPE *src1, uint32_t src1_stride, const uint8_t *mask, uint32_t mask_stride, int w, int h, int subx, int suby, ConvolveParams *conv_params, const int bd);
+#define aom_highbd_blend_a64_d16_mask aom_highbd_blend_a64_d16_mask_c
+
+void aom_highbd_blend_a64_hmask_c(uint8_t *dst, uint32_t dst_stride, const uint8_t *src0, uint32_t src0_stride, const uint8_t *src1, uint32_t src1_stride, const uint8_t *mask, int w, int h, int bd);
+#define aom_highbd_blend_a64_hmask aom_highbd_blend_a64_hmask_c
+
+void aom_highbd_blend_a64_mask_c(uint8_t *dst, uint32_t dst_stride, const uint8_t *src0, uint32_t src0_stride, const uint8_t *src1, uint32_t src1_stride, const uint8_t *mask, uint32_t mask_stride, int w, int h, int subx, int suby, int bd);
+#define aom_highbd_blend_a64_mask aom_highbd_blend_a64_mask_c
+
+void aom_highbd_blend_a64_vmask_c(uint8_t *dst, uint32_t dst_stride, const uint8_t *src0, uint32_t src0_stride, const uint8_t *src1, uint32_t src1_stride, const uint8_t *mask, int w, int h, int bd);
+#define aom_highbd_blend_a64_vmask aom_highbd_blend_a64_vmask_c
+
+void aom_highbd_comp_avg_pred_c(uint8_t *comp_pred8, const uint8_t *pred8, int width, int height, const uint8_t *ref8, int ref_stride);
+#define aom_highbd_comp_avg_pred aom_highbd_comp_avg_pred_c
+
+void aom_highbd_comp_avg_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+                                                          const MV *const mv, uint8_t *comp_pred8, const uint8_t *pred8, int width,
+                                                          int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref8, int ref_stride, int bd, int subpel_search);
+#define aom_highbd_comp_avg_upsampled_pred aom_highbd_comp_avg_upsampled_pred_c
+
+void aom_highbd_comp_mask_pred_c(uint8_t *comp_pred, const uint8_t *pred8, int width, int height, const uint8_t *ref8, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask);
+#define aom_highbd_comp_mask_pred aom_highbd_comp_mask_pred_c
+
+void aom_highbd_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
+#define aom_highbd_convolve8_horiz aom_highbd_convolve8_horiz_c
+
+void aom_highbd_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
+#define aom_highbd_convolve8_vert aom_highbd_convolve8_vert_c
+
+void aom_highbd_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
+#define aom_highbd_convolve_copy aom_highbd_convolve_copy_c
+
+void aom_highbd_dc_128_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x16 aom_highbd_dc_128_predictor_16x16_c
+
+void aom_highbd_dc_128_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x32 aom_highbd_dc_128_predictor_16x32_c
+
+void aom_highbd_dc_128_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x4 aom_highbd_dc_128_predictor_16x4_c
+
+void aom_highbd_dc_128_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x64 aom_highbd_dc_128_predictor_16x64_c
+
+void aom_highbd_dc_128_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x8 aom_highbd_dc_128_predictor_16x8_c
+
+void aom_highbd_dc_128_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_2x2 aom_highbd_dc_128_predictor_2x2_c
+
+void aom_highbd_dc_128_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x16 aom_highbd_dc_128_predictor_32x16_c
+
+void aom_highbd_dc_128_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x32 aom_highbd_dc_128_predictor_32x32_c
+
+void aom_highbd_dc_128_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x64 aom_highbd_dc_128_predictor_32x64_c
+
+void aom_highbd_dc_128_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x8 aom_highbd_dc_128_predictor_32x8_c
+
+void aom_highbd_dc_128_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_4x16 aom_highbd_dc_128_predictor_4x16_c
+
+void aom_highbd_dc_128_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_4x4 aom_highbd_dc_128_predictor_4x4_c
+
+void aom_highbd_dc_128_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_4x8 aom_highbd_dc_128_predictor_4x8_c
+
+void aom_highbd_dc_128_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_64x16 aom_highbd_dc_128_predictor_64x16_c
+
+void aom_highbd_dc_128_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_64x32 aom_highbd_dc_128_predictor_64x32_c
+
+void aom_highbd_dc_128_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_64x64 aom_highbd_dc_128_predictor_64x64_c
+
+void aom_highbd_dc_128_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x16 aom_highbd_dc_128_predictor_8x16_c
+
+void aom_highbd_dc_128_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x32 aom_highbd_dc_128_predictor_8x32_c
+
+void aom_highbd_dc_128_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x4 aom_highbd_dc_128_predictor_8x4_c
+
+void aom_highbd_dc_128_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x8 aom_highbd_dc_128_predictor_8x8_c
+
+void aom_highbd_dc_left_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x16 aom_highbd_dc_left_predictor_16x16_c
+
+void aom_highbd_dc_left_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x32 aom_highbd_dc_left_predictor_16x32_c
+
+void aom_highbd_dc_left_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x4 aom_highbd_dc_left_predictor_16x4_c
+
+void aom_highbd_dc_left_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x64 aom_highbd_dc_left_predictor_16x64_c
+
+void aom_highbd_dc_left_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x8 aom_highbd_dc_left_predictor_16x8_c
+
+void aom_highbd_dc_left_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_2x2 aom_highbd_dc_left_predictor_2x2_c
+
+void aom_highbd_dc_left_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x16 aom_highbd_dc_left_predictor_32x16_c
+
+void aom_highbd_dc_left_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x32 aom_highbd_dc_left_predictor_32x32_c
+
+void aom_highbd_dc_left_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x64 aom_highbd_dc_left_predictor_32x64_c
+
+void aom_highbd_dc_left_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x8 aom_highbd_dc_left_predictor_32x8_c
+
+void aom_highbd_dc_left_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_4x16 aom_highbd_dc_left_predictor_4x16_c
+
+void aom_highbd_dc_left_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_4x4 aom_highbd_dc_left_predictor_4x4_c
+
+void aom_highbd_dc_left_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_4x8 aom_highbd_dc_left_predictor_4x8_c
+
+void aom_highbd_dc_left_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_64x16 aom_highbd_dc_left_predictor_64x16_c
+
+void aom_highbd_dc_left_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_64x32 aom_highbd_dc_left_predictor_64x32_c
+
+void aom_highbd_dc_left_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_64x64 aom_highbd_dc_left_predictor_64x64_c
+
+void aom_highbd_dc_left_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x16 aom_highbd_dc_left_predictor_8x16_c
+
+void aom_highbd_dc_left_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x32 aom_highbd_dc_left_predictor_8x32_c
+
+void aom_highbd_dc_left_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x4 aom_highbd_dc_left_predictor_8x4_c
+
+void aom_highbd_dc_left_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x8 aom_highbd_dc_left_predictor_8x8_c
+
+void aom_highbd_dc_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x16 aom_highbd_dc_predictor_16x16_c
+
+void aom_highbd_dc_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x32 aom_highbd_dc_predictor_16x32_c
+
+void aom_highbd_dc_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x4 aom_highbd_dc_predictor_16x4_c
+
+void aom_highbd_dc_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x64 aom_highbd_dc_predictor_16x64_c
+
+void aom_highbd_dc_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x8 aom_highbd_dc_predictor_16x8_c
+
+void aom_highbd_dc_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_2x2 aom_highbd_dc_predictor_2x2_c
+
+void aom_highbd_dc_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x16 aom_highbd_dc_predictor_32x16_c
+
+void aom_highbd_dc_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x32 aom_highbd_dc_predictor_32x32_c
+
+void aom_highbd_dc_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x64 aom_highbd_dc_predictor_32x64_c
+
+void aom_highbd_dc_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x8 aom_highbd_dc_predictor_32x8_c
+
+void aom_highbd_dc_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_4x16 aom_highbd_dc_predictor_4x16_c
+
+void aom_highbd_dc_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_4x4 aom_highbd_dc_predictor_4x4_c
+
+void aom_highbd_dc_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_4x8 aom_highbd_dc_predictor_4x8_c
+
+void aom_highbd_dc_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_64x16 aom_highbd_dc_predictor_64x16_c
+
+void aom_highbd_dc_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_64x32 aom_highbd_dc_predictor_64x32_c
+
+void aom_highbd_dc_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_64x64 aom_highbd_dc_predictor_64x64_c
+
+void aom_highbd_dc_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x16 aom_highbd_dc_predictor_8x16_c
+
+void aom_highbd_dc_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x32 aom_highbd_dc_predictor_8x32_c
+
+void aom_highbd_dc_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x4 aom_highbd_dc_predictor_8x4_c
+
+void aom_highbd_dc_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x8 aom_highbd_dc_predictor_8x8_c
+
+void aom_highbd_dc_top_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x16 aom_highbd_dc_top_predictor_16x16_c
+
+void aom_highbd_dc_top_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x32 aom_highbd_dc_top_predictor_16x32_c
+
+void aom_highbd_dc_top_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x4 aom_highbd_dc_top_predictor_16x4_c
+
+void aom_highbd_dc_top_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x64 aom_highbd_dc_top_predictor_16x64_c
+
+void aom_highbd_dc_top_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x8 aom_highbd_dc_top_predictor_16x8_c
+
+void aom_highbd_dc_top_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_2x2 aom_highbd_dc_top_predictor_2x2_c
+
+void aom_highbd_dc_top_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x16 aom_highbd_dc_top_predictor_32x16_c
+
+void aom_highbd_dc_top_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x32 aom_highbd_dc_top_predictor_32x32_c
+
+void aom_highbd_dc_top_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x64 aom_highbd_dc_top_predictor_32x64_c
+
+void aom_highbd_dc_top_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x8 aom_highbd_dc_top_predictor_32x8_c
+
+void aom_highbd_dc_top_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_4x16 aom_highbd_dc_top_predictor_4x16_c
+
+void aom_highbd_dc_top_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_4x4 aom_highbd_dc_top_predictor_4x4_c
+
+void aom_highbd_dc_top_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_4x8 aom_highbd_dc_top_predictor_4x8_c
+
+void aom_highbd_dc_top_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_64x16 aom_highbd_dc_top_predictor_64x16_c
+
+void aom_highbd_dc_top_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_64x32 aom_highbd_dc_top_predictor_64x32_c
+
+void aom_highbd_dc_top_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_64x64 aom_highbd_dc_top_predictor_64x64_c
+
+void aom_highbd_dc_top_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x16 aom_highbd_dc_top_predictor_8x16_c
+
+void aom_highbd_dc_top_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x32 aom_highbd_dc_top_predictor_8x32_c
+
+void aom_highbd_dc_top_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x4 aom_highbd_dc_top_predictor_8x4_c
+
+void aom_highbd_dc_top_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x8 aom_highbd_dc_top_predictor_8x8_c
+
+void aom_highbd_dist_wtd_comp_avg_pred_c(uint8_t *comp_pred8, const uint8_t *pred8, int width, int height, const uint8_t *ref8, int ref_stride, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_highbd_dist_wtd_comp_avg_pred aom_highbd_dist_wtd_comp_avg_pred_c
+
+void aom_highbd_dist_wtd_comp_avg_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+                                                              const MV *const mv, uint8_t *comp_pred8, const uint8_t *pred8, int width,
+                                                              int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref8,
+                                                              int ref_stride, int bd, const DIST_WTD_COMP_PARAMS *jcp_param, int subpel_search);
+#define aom_highbd_dist_wtd_comp_avg_upsampled_pred aom_highbd_dist_wtd_comp_avg_upsampled_pred_c
+
+unsigned int aom_highbd_dist_wtd_sad128x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad128x128_avg aom_highbd_dist_wtd_sad128x128_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad128x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad128x64_avg aom_highbd_dist_wtd_sad128x64_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad16x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad16x16_avg aom_highbd_dist_wtd_sad16x16_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad16x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad16x32_avg aom_highbd_dist_wtd_sad16x32_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad16x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad16x4_avg aom_highbd_dist_wtd_sad16x4_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad16x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad16x64_avg aom_highbd_dist_wtd_sad16x64_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad16x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad16x8_avg aom_highbd_dist_wtd_sad16x8_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad32x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad32x16_avg aom_highbd_dist_wtd_sad32x16_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad32x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad32x32_avg aom_highbd_dist_wtd_sad32x32_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad32x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad32x64_avg aom_highbd_dist_wtd_sad32x64_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad32x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad32x8_avg aom_highbd_dist_wtd_sad32x8_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad4x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad4x16_avg aom_highbd_dist_wtd_sad4x16_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad4x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad4x4_avg aom_highbd_dist_wtd_sad4x4_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad4x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad4x8_avg aom_highbd_dist_wtd_sad4x8_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad64x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad64x128_avg aom_highbd_dist_wtd_sad64x128_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad64x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad64x16_avg aom_highbd_dist_wtd_sad64x16_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad64x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad64x32_avg aom_highbd_dist_wtd_sad64x32_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad64x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad64x64_avg aom_highbd_dist_wtd_sad64x64_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad8x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad8x16_avg aom_highbd_dist_wtd_sad8x16_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad8x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad8x32_avg aom_highbd_dist_wtd_sad8x32_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad8x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad8x4_avg aom_highbd_dist_wtd_sad8x4_avg_c
+
+unsigned int aom_highbd_dist_wtd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
+#define aom_highbd_dist_wtd_sad8x8_avg aom_highbd_dist_wtd_sad8x8_avg_c
+
+void aom_highbd_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
+#define aom_highbd_fdct8x8 aom_highbd_fdct8x8_c
+
+void aom_highbd_h_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x16 aom_highbd_h_predictor_16x16_c
+
+void aom_highbd_h_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x32 aom_highbd_h_predictor_16x32_c
+
+void aom_highbd_h_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x4 aom_highbd_h_predictor_16x4_c
+
+void aom_highbd_h_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x64 aom_highbd_h_predictor_16x64_c
+
+void aom_highbd_h_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x8 aom_highbd_h_predictor_16x8_c
+
+void aom_highbd_h_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_2x2 aom_highbd_h_predictor_2x2_c
+
+void aom_highbd_h_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x16 aom_highbd_h_predictor_32x16_c
+
+void aom_highbd_h_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x32 aom_highbd_h_predictor_32x32_c
+
+void aom_highbd_h_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x64 aom_highbd_h_predictor_32x64_c
+
+void aom_highbd_h_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x8 aom_highbd_h_predictor_32x8_c
+
+void aom_highbd_h_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_4x16 aom_highbd_h_predictor_4x16_c
+
+void aom_highbd_h_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_4x4 aom_highbd_h_predictor_4x4_c
+
+void aom_highbd_h_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_4x8 aom_highbd_h_predictor_4x8_c
+
+void aom_highbd_h_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_64x16 aom_highbd_h_predictor_64x16_c
+
+void aom_highbd_h_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_64x32 aom_highbd_h_predictor_64x32_c
+
+void aom_highbd_h_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_64x64 aom_highbd_h_predictor_64x64_c
+
+void aom_highbd_h_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x16 aom_highbd_h_predictor_8x16_c
+
+void aom_highbd_h_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x32 aom_highbd_h_predictor_8x32_c
+
+void aom_highbd_h_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x4 aom_highbd_h_predictor_8x4_c
+
+void aom_highbd_h_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x8 aom_highbd_h_predictor_8x8_c
+
+void aom_highbd_lpf_horizontal_14_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
+#define aom_highbd_lpf_horizontal_14 aom_highbd_lpf_horizontal_14_c
+
+void aom_highbd_lpf_horizontal_14_dual_c(uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limt1, const uint8_t *thresh1,int bd);
+#define aom_highbd_lpf_horizontal_14_dual aom_highbd_lpf_horizontal_14_dual_c
+
+void aom_highbd_lpf_horizontal_4_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
+#define aom_highbd_lpf_horizontal_4 aom_highbd_lpf_horizontal_4_c
+
+void aom_highbd_lpf_horizontal_4_dual_c(uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1, int bd);
+#define aom_highbd_lpf_horizontal_4_dual aom_highbd_lpf_horizontal_4_dual_c
+
+void aom_highbd_lpf_horizontal_6_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
+#define aom_highbd_lpf_horizontal_6 aom_highbd_lpf_horizontal_6_c
+
+void aom_highbd_lpf_horizontal_6_dual_c(uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1, int bd);
+#define aom_highbd_lpf_horizontal_6_dual aom_highbd_lpf_horizontal_6_dual_c
+
+void aom_highbd_lpf_horizontal_8_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
+#define aom_highbd_lpf_horizontal_8 aom_highbd_lpf_horizontal_8_c
+
+void aom_highbd_lpf_horizontal_8_dual_c(uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1, int bd);
+#define aom_highbd_lpf_horizontal_8_dual aom_highbd_lpf_horizontal_8_dual_c
+
+void aom_highbd_lpf_vertical_14_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
+#define aom_highbd_lpf_vertical_14 aom_highbd_lpf_vertical_14_c
+
+void aom_highbd_lpf_vertical_14_dual_c(uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1, int bd);
+#define aom_highbd_lpf_vertical_14_dual aom_highbd_lpf_vertical_14_dual_c
+
+void aom_highbd_lpf_vertical_4_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
+#define aom_highbd_lpf_vertical_4 aom_highbd_lpf_vertical_4_c
+
+void aom_highbd_lpf_vertical_4_dual_c(uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1, int bd);
+#define aom_highbd_lpf_vertical_4_dual aom_highbd_lpf_vertical_4_dual_c
+
+void aom_highbd_lpf_vertical_6_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
+#define aom_highbd_lpf_vertical_6 aom_highbd_lpf_vertical_6_c
+
+void aom_highbd_lpf_vertical_6_dual_c(uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1, int bd);
+#define aom_highbd_lpf_vertical_6_dual aom_highbd_lpf_vertical_6_dual_c
+
+void aom_highbd_lpf_vertical_8_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
+#define aom_highbd_lpf_vertical_8 aom_highbd_lpf_vertical_8_c
+
+void aom_highbd_lpf_vertical_8_dual_c(uint16_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1, int bd);
+#define aom_highbd_lpf_vertical_8_dual aom_highbd_lpf_vertical_8_dual_c
+
+unsigned int aom_highbd_masked_sad128x128_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad128x128 aom_highbd_masked_sad128x128_c
+
+unsigned int aom_highbd_masked_sad128x64_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad128x64 aom_highbd_masked_sad128x64_c
+
+unsigned int aom_highbd_masked_sad16x16_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad16x16 aom_highbd_masked_sad16x16_c
+
+unsigned int aom_highbd_masked_sad16x32_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad16x32 aom_highbd_masked_sad16x32_c
+
+unsigned int aom_highbd_masked_sad16x4_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad16x4 aom_highbd_masked_sad16x4_c
+
+unsigned int aom_highbd_masked_sad16x64_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad16x64 aom_highbd_masked_sad16x64_c
+
+unsigned int aom_highbd_masked_sad16x8_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad16x8 aom_highbd_masked_sad16x8_c
+
+unsigned int aom_highbd_masked_sad32x16_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad32x16 aom_highbd_masked_sad32x16_c
+
+unsigned int aom_highbd_masked_sad32x32_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad32x32 aom_highbd_masked_sad32x32_c
+
+unsigned int aom_highbd_masked_sad32x64_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad32x64 aom_highbd_masked_sad32x64_c
+
+unsigned int aom_highbd_masked_sad32x8_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad32x8 aom_highbd_masked_sad32x8_c
+
+unsigned int aom_highbd_masked_sad4x16_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad4x16 aom_highbd_masked_sad4x16_c
+
+unsigned int aom_highbd_masked_sad4x4_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad4x4 aom_highbd_masked_sad4x4_c
+
+unsigned int aom_highbd_masked_sad4x8_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad4x8 aom_highbd_masked_sad4x8_c
+
+unsigned int aom_highbd_masked_sad64x128_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad64x128 aom_highbd_masked_sad64x128_c
+
+unsigned int aom_highbd_masked_sad64x16_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad64x16 aom_highbd_masked_sad64x16_c
+
+unsigned int aom_highbd_masked_sad64x32_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad64x32 aom_highbd_masked_sad64x32_c
+
+unsigned int aom_highbd_masked_sad64x64_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad64x64 aom_highbd_masked_sad64x64_c
+
+unsigned int aom_highbd_masked_sad8x16_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad8x16 aom_highbd_masked_sad8x16_c
+
+unsigned int aom_highbd_masked_sad8x32_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad8x32 aom_highbd_masked_sad8x32_c
+
+unsigned int aom_highbd_masked_sad8x4_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad8x4 aom_highbd_masked_sad8x4_c
+
+unsigned int aom_highbd_masked_sad8x8_c(const uint8_t *src8, int src_stride, const uint8_t *ref8, int ref_stride, const uint8_t *second_pred8, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_highbd_masked_sad8x8 aom_highbd_masked_sad8x8_c
+
+unsigned int aom_highbd_obmc_sad128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad128x128 aom_highbd_obmc_sad128x128_c
+
+unsigned int aom_highbd_obmc_sad128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad128x64 aom_highbd_obmc_sad128x64_c
+
+unsigned int aom_highbd_obmc_sad16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad16x16 aom_highbd_obmc_sad16x16_c
+
+unsigned int aom_highbd_obmc_sad16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad16x32 aom_highbd_obmc_sad16x32_c
+
+unsigned int aom_highbd_obmc_sad16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad16x4 aom_highbd_obmc_sad16x4_c
+
+unsigned int aom_highbd_obmc_sad16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad16x64 aom_highbd_obmc_sad16x64_c
+
+unsigned int aom_highbd_obmc_sad16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad16x8 aom_highbd_obmc_sad16x8_c
+
+unsigned int aom_highbd_obmc_sad32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad32x16 aom_highbd_obmc_sad32x16_c
+
+unsigned int aom_highbd_obmc_sad32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad32x32 aom_highbd_obmc_sad32x32_c
+
+unsigned int aom_highbd_obmc_sad32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad32x64 aom_highbd_obmc_sad32x64_c
+
+unsigned int aom_highbd_obmc_sad32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad32x8 aom_highbd_obmc_sad32x8_c
+
+unsigned int aom_highbd_obmc_sad4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad4x16 aom_highbd_obmc_sad4x16_c
+
+unsigned int aom_highbd_obmc_sad4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad4x4 aom_highbd_obmc_sad4x4_c
+
+unsigned int aom_highbd_obmc_sad4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad4x8 aom_highbd_obmc_sad4x8_c
+
+unsigned int aom_highbd_obmc_sad64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad64x128 aom_highbd_obmc_sad64x128_c
+
+unsigned int aom_highbd_obmc_sad64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad64x16 aom_highbd_obmc_sad64x16_c
+
+unsigned int aom_highbd_obmc_sad64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad64x32 aom_highbd_obmc_sad64x32_c
+
+unsigned int aom_highbd_obmc_sad64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad64x64 aom_highbd_obmc_sad64x64_c
+
+unsigned int aom_highbd_obmc_sad8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad8x16 aom_highbd_obmc_sad8x16_c
+
+unsigned int aom_highbd_obmc_sad8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad8x32 aom_highbd_obmc_sad8x32_c
+
+unsigned int aom_highbd_obmc_sad8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad8x4 aom_highbd_obmc_sad8x4_c
+
+unsigned int aom_highbd_obmc_sad8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_highbd_obmc_sad8x8 aom_highbd_obmc_sad8x8_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance128x128 aom_highbd_obmc_sub_pixel_variance128x128_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance128x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance128x64 aom_highbd_obmc_sub_pixel_variance128x64_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance16x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance16x16 aom_highbd_obmc_sub_pixel_variance16x16_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance16x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance16x32 aom_highbd_obmc_sub_pixel_variance16x32_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance16x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance16x4 aom_highbd_obmc_sub_pixel_variance16x4_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance16x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance16x64 aom_highbd_obmc_sub_pixel_variance16x64_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance16x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance16x8 aom_highbd_obmc_sub_pixel_variance16x8_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance32x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance32x16 aom_highbd_obmc_sub_pixel_variance32x16_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance32x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance32x32 aom_highbd_obmc_sub_pixel_variance32x32_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance32x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance32x64 aom_highbd_obmc_sub_pixel_variance32x64_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance32x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance32x8 aom_highbd_obmc_sub_pixel_variance32x8_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance4x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance4x16 aom_highbd_obmc_sub_pixel_variance4x16_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance4x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance4x4 aom_highbd_obmc_sub_pixel_variance4x4_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance4x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance4x8 aom_highbd_obmc_sub_pixel_variance4x8_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance64x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance64x128 aom_highbd_obmc_sub_pixel_variance64x128_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance64x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance64x16 aom_highbd_obmc_sub_pixel_variance64x16_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance64x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance64x32 aom_highbd_obmc_sub_pixel_variance64x32_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance64x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance64x64 aom_highbd_obmc_sub_pixel_variance64x64_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance8x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance8x16 aom_highbd_obmc_sub_pixel_variance8x16_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance8x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance8x32 aom_highbd_obmc_sub_pixel_variance8x32_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance8x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance8x4 aom_highbd_obmc_sub_pixel_variance8x4_c
+
+unsigned int aom_highbd_obmc_sub_pixel_variance8x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_sub_pixel_variance8x8 aom_highbd_obmc_sub_pixel_variance8x8_c
+
+unsigned int aom_highbd_obmc_variance128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance128x128 aom_highbd_obmc_variance128x128_c
+
+unsigned int aom_highbd_obmc_variance128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance128x64 aom_highbd_obmc_variance128x64_c
+
+unsigned int aom_highbd_obmc_variance16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance16x16 aom_highbd_obmc_variance16x16_c
+
+unsigned int aom_highbd_obmc_variance16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance16x32 aom_highbd_obmc_variance16x32_c
+
+unsigned int aom_highbd_obmc_variance16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance16x4 aom_highbd_obmc_variance16x4_c
+
+unsigned int aom_highbd_obmc_variance16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance16x64 aom_highbd_obmc_variance16x64_c
+
+unsigned int aom_highbd_obmc_variance16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance16x8 aom_highbd_obmc_variance16x8_c
+
+unsigned int aom_highbd_obmc_variance32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance32x16 aom_highbd_obmc_variance32x16_c
+
+unsigned int aom_highbd_obmc_variance32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance32x32 aom_highbd_obmc_variance32x32_c
+
+unsigned int aom_highbd_obmc_variance32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance32x64 aom_highbd_obmc_variance32x64_c
+
+unsigned int aom_highbd_obmc_variance32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance32x8 aom_highbd_obmc_variance32x8_c
+
+unsigned int aom_highbd_obmc_variance4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance4x16 aom_highbd_obmc_variance4x16_c
+
+unsigned int aom_highbd_obmc_variance4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance4x4 aom_highbd_obmc_variance4x4_c
+
+unsigned int aom_highbd_obmc_variance4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance4x8 aom_highbd_obmc_variance4x8_c
+
+unsigned int aom_highbd_obmc_variance64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance64x128 aom_highbd_obmc_variance64x128_c
+
+unsigned int aom_highbd_obmc_variance64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance64x16 aom_highbd_obmc_variance64x16_c
+
+unsigned int aom_highbd_obmc_variance64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance64x32 aom_highbd_obmc_variance64x32_c
+
+unsigned int aom_highbd_obmc_variance64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance64x64 aom_highbd_obmc_variance64x64_c
+
+unsigned int aom_highbd_obmc_variance8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance8x16 aom_highbd_obmc_variance8x16_c
+
+unsigned int aom_highbd_obmc_variance8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance8x32 aom_highbd_obmc_variance8x32_c
+
+unsigned int aom_highbd_obmc_variance8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance8x4 aom_highbd_obmc_variance8x4_c
+
+unsigned int aom_highbd_obmc_variance8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_highbd_obmc_variance8x8 aom_highbd_obmc_variance8x8_c
+
+void aom_highbd_paeth_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_16x16 aom_highbd_paeth_predictor_16x16_c
+
+void aom_highbd_paeth_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_16x32 aom_highbd_paeth_predictor_16x32_c
+
+void aom_highbd_paeth_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_16x4 aom_highbd_paeth_predictor_16x4_c
+
+void aom_highbd_paeth_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_16x64 aom_highbd_paeth_predictor_16x64_c
+
+void aom_highbd_paeth_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_16x8 aom_highbd_paeth_predictor_16x8_c
+
+void aom_highbd_paeth_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_2x2 aom_highbd_paeth_predictor_2x2_c
+
+void aom_highbd_paeth_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_32x16 aom_highbd_paeth_predictor_32x16_c
+
+void aom_highbd_paeth_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_32x32 aom_highbd_paeth_predictor_32x32_c
+
+void aom_highbd_paeth_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_32x64 aom_highbd_paeth_predictor_32x64_c
+
+void aom_highbd_paeth_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_32x8 aom_highbd_paeth_predictor_32x8_c
+
+void aom_highbd_paeth_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_4x16 aom_highbd_paeth_predictor_4x16_c
+
+void aom_highbd_paeth_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_4x4 aom_highbd_paeth_predictor_4x4_c
+
+void aom_highbd_paeth_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_4x8 aom_highbd_paeth_predictor_4x8_c
+
+void aom_highbd_paeth_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_64x16 aom_highbd_paeth_predictor_64x16_c
+
+void aom_highbd_paeth_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_64x32 aom_highbd_paeth_predictor_64x32_c
+
+void aom_highbd_paeth_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_64x64 aom_highbd_paeth_predictor_64x64_c
+
+void aom_highbd_paeth_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_8x16 aom_highbd_paeth_predictor_8x16_c
+
+void aom_highbd_paeth_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_8x32 aom_highbd_paeth_predictor_8x32_c
+
+void aom_highbd_paeth_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_8x4 aom_highbd_paeth_predictor_8x4_c
+
+void aom_highbd_paeth_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_paeth_predictor_8x8 aom_highbd_paeth_predictor_8x8_c
+
+void aom_highbd_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
+#define aom_highbd_quantize_b aom_highbd_quantize_b_c
+
+void aom_highbd_quantize_b_32x32_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
+#define aom_highbd_quantize_b_32x32 aom_highbd_quantize_b_32x32_c
+
+void aom_highbd_quantize_b_64x64_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
+#define aom_highbd_quantize_b_64x64 aom_highbd_quantize_b_64x64_c
+
+unsigned int aom_highbd_sad128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad128x128 aom_highbd_sad128x128_c
+
+unsigned int aom_highbd_sad128x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad128x128_avg aom_highbd_sad128x128_avg_c
+
+void aom_highbd_sad128x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad128x128x4d aom_highbd_sad128x128x4d_c
+
+unsigned int aom_highbd_sad128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad128x64 aom_highbd_sad128x64_c
+
+unsigned int aom_highbd_sad128x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad128x64_avg aom_highbd_sad128x64_avg_c
+
+void aom_highbd_sad128x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad128x64x4d aom_highbd_sad128x64x4d_c
+
+unsigned int aom_highbd_sad16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x16 aom_highbd_sad16x16_c
+
+unsigned int aom_highbd_sad16x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad16x16_avg aom_highbd_sad16x16_avg_c
+
+void aom_highbd_sad16x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x16x4d aom_highbd_sad16x16x4d_c
+
+unsigned int aom_highbd_sad16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x32 aom_highbd_sad16x32_c
+
+unsigned int aom_highbd_sad16x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad16x32_avg aom_highbd_sad16x32_avg_c
+
+void aom_highbd_sad16x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x32x4d aom_highbd_sad16x32x4d_c
+
+unsigned int aom_highbd_sad16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x4 aom_highbd_sad16x4_c
+
+unsigned int aom_highbd_sad16x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad16x4_avg aom_highbd_sad16x4_avg_c
+
+void aom_highbd_sad16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x4x4d aom_highbd_sad16x4x4d_c
+
+unsigned int aom_highbd_sad16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x64 aom_highbd_sad16x64_c
+
+unsigned int aom_highbd_sad16x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad16x64_avg aom_highbd_sad16x64_avg_c
+
+void aom_highbd_sad16x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x64x4d aom_highbd_sad16x64x4d_c
+
+unsigned int aom_highbd_sad16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x8 aom_highbd_sad16x8_c
+
+unsigned int aom_highbd_sad16x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad16x8_avg aom_highbd_sad16x8_avg_c
+
+void aom_highbd_sad16x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x8x4d aom_highbd_sad16x8x4d_c
+
+unsigned int aom_highbd_sad32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x16 aom_highbd_sad32x16_c
+
+unsigned int aom_highbd_sad32x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad32x16_avg aom_highbd_sad32x16_avg_c
+
+void aom_highbd_sad32x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x16x4d aom_highbd_sad32x16x4d_c
+
+unsigned int aom_highbd_sad32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x32 aom_highbd_sad32x32_c
+
+unsigned int aom_highbd_sad32x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad32x32_avg aom_highbd_sad32x32_avg_c
+
+void aom_highbd_sad32x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x32x4d aom_highbd_sad32x32x4d_c
+
+unsigned int aom_highbd_sad32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x64 aom_highbd_sad32x64_c
+
+unsigned int aom_highbd_sad32x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad32x64_avg aom_highbd_sad32x64_avg_c
+
+void aom_highbd_sad32x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x64x4d aom_highbd_sad32x64x4d_c
+
+unsigned int aom_highbd_sad32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x8 aom_highbd_sad32x8_c
+
+unsigned int aom_highbd_sad32x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad32x8_avg aom_highbd_sad32x8_avg_c
+
+void aom_highbd_sad32x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x8x4d aom_highbd_sad32x8x4d_c
+
+unsigned int aom_highbd_sad4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad4x16 aom_highbd_sad4x16_c
+
+unsigned int aom_highbd_sad4x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad4x16_avg aom_highbd_sad4x16_avg_c
+
+void aom_highbd_sad4x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad4x16x4d aom_highbd_sad4x16x4d_c
+
+unsigned int aom_highbd_sad4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad4x4 aom_highbd_sad4x4_c
+
+unsigned int aom_highbd_sad4x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad4x4_avg aom_highbd_sad4x4_avg_c
+
+void aom_highbd_sad4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad4x4x4d aom_highbd_sad4x4x4d_c
+
+unsigned int aom_highbd_sad4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad4x8 aom_highbd_sad4x8_c
+
+unsigned int aom_highbd_sad4x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad4x8_avg aom_highbd_sad4x8_avg_c
+
+void aom_highbd_sad4x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad4x8x4d aom_highbd_sad4x8x4d_c
+
+unsigned int aom_highbd_sad64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x128 aom_highbd_sad64x128_c
+
+unsigned int aom_highbd_sad64x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad64x128_avg aom_highbd_sad64x128_avg_c
+
+void aom_highbd_sad64x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x128x4d aom_highbd_sad64x128x4d_c
+
+unsigned int aom_highbd_sad64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x16 aom_highbd_sad64x16_c
+
+unsigned int aom_highbd_sad64x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad64x16_avg aom_highbd_sad64x16_avg_c
+
+void aom_highbd_sad64x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x16x4d aom_highbd_sad64x16x4d_c
+
+unsigned int aom_highbd_sad64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x32 aom_highbd_sad64x32_c
+
+unsigned int aom_highbd_sad64x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad64x32_avg aom_highbd_sad64x32_avg_c
+
+void aom_highbd_sad64x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x32x4d aom_highbd_sad64x32x4d_c
+
+unsigned int aom_highbd_sad64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x64 aom_highbd_sad64x64_c
+
+unsigned int aom_highbd_sad64x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad64x64_avg aom_highbd_sad64x64_avg_c
+
+void aom_highbd_sad64x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x64x4d aom_highbd_sad64x64x4d_c
+
+unsigned int aom_highbd_sad8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x16 aom_highbd_sad8x16_c
+
+unsigned int aom_highbd_sad8x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad8x16_avg aom_highbd_sad8x16_avg_c
+
+void aom_highbd_sad8x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x16x4d aom_highbd_sad8x16x4d_c
+
+unsigned int aom_highbd_sad8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x32 aom_highbd_sad8x32_c
+
+unsigned int aom_highbd_sad8x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad8x32_avg aom_highbd_sad8x32_avg_c
+
+void aom_highbd_sad8x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x32x4d aom_highbd_sad8x32x4d_c
+
+unsigned int aom_highbd_sad8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x4 aom_highbd_sad8x4_c
+
+unsigned int aom_highbd_sad8x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad8x4_avg aom_highbd_sad8x4_avg_c
+
+void aom_highbd_sad8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x4x4d aom_highbd_sad8x4x4d_c
+
+unsigned int aom_highbd_sad8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x8 aom_highbd_sad8x8_c
+
+unsigned int aom_highbd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_highbd_sad8x8_avg aom_highbd_sad8x8_avg_c
+
+void aom_highbd_sad8x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x8x4d aom_highbd_sad8x8x4d_c
+
+void aom_highbd_smooth_h_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_16x16 aom_highbd_smooth_h_predictor_16x16_c
+
+void aom_highbd_smooth_h_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_16x32 aom_highbd_smooth_h_predictor_16x32_c
+
+void aom_highbd_smooth_h_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_16x4 aom_highbd_smooth_h_predictor_16x4_c
+
+void aom_highbd_smooth_h_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_16x64 aom_highbd_smooth_h_predictor_16x64_c
+
+void aom_highbd_smooth_h_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_16x8 aom_highbd_smooth_h_predictor_16x8_c
+
+void aom_highbd_smooth_h_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_2x2 aom_highbd_smooth_h_predictor_2x2_c
+
+void aom_highbd_smooth_h_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_32x16 aom_highbd_smooth_h_predictor_32x16_c
+
+void aom_highbd_smooth_h_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_32x32 aom_highbd_smooth_h_predictor_32x32_c
+
+void aom_highbd_smooth_h_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_32x64 aom_highbd_smooth_h_predictor_32x64_c
+
+void aom_highbd_smooth_h_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_32x8 aom_highbd_smooth_h_predictor_32x8_c
+
+void aom_highbd_smooth_h_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_4x16 aom_highbd_smooth_h_predictor_4x16_c
+
+void aom_highbd_smooth_h_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_4x4 aom_highbd_smooth_h_predictor_4x4_c
+
+void aom_highbd_smooth_h_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_4x8 aom_highbd_smooth_h_predictor_4x8_c
+
+void aom_highbd_smooth_h_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_64x16 aom_highbd_smooth_h_predictor_64x16_c
+
+void aom_highbd_smooth_h_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_64x32 aom_highbd_smooth_h_predictor_64x32_c
+
+void aom_highbd_smooth_h_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_64x64 aom_highbd_smooth_h_predictor_64x64_c
+
+void aom_highbd_smooth_h_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_8x16 aom_highbd_smooth_h_predictor_8x16_c
+
+void aom_highbd_smooth_h_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_8x32 aom_highbd_smooth_h_predictor_8x32_c
+
+void aom_highbd_smooth_h_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_8x4 aom_highbd_smooth_h_predictor_8x4_c
+
+void aom_highbd_smooth_h_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_h_predictor_8x8 aom_highbd_smooth_h_predictor_8x8_c
+
+void aom_highbd_smooth_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_16x16 aom_highbd_smooth_predictor_16x16_c
+
+void aom_highbd_smooth_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_16x32 aom_highbd_smooth_predictor_16x32_c
+
+void aom_highbd_smooth_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_16x4 aom_highbd_smooth_predictor_16x4_c
+
+void aom_highbd_smooth_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_16x64 aom_highbd_smooth_predictor_16x64_c
+
+void aom_highbd_smooth_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_16x8 aom_highbd_smooth_predictor_16x8_c
+
+void aom_highbd_smooth_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_2x2 aom_highbd_smooth_predictor_2x2_c
+
+void aom_highbd_smooth_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_32x16 aom_highbd_smooth_predictor_32x16_c
+
+void aom_highbd_smooth_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_32x32 aom_highbd_smooth_predictor_32x32_c
+
+void aom_highbd_smooth_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_32x64 aom_highbd_smooth_predictor_32x64_c
+
+void aom_highbd_smooth_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_32x8 aom_highbd_smooth_predictor_32x8_c
+
+void aom_highbd_smooth_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_4x16 aom_highbd_smooth_predictor_4x16_c
+
+void aom_highbd_smooth_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_4x4 aom_highbd_smooth_predictor_4x4_c
+
+void aom_highbd_smooth_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_4x8 aom_highbd_smooth_predictor_4x8_c
+
+void aom_highbd_smooth_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_64x16 aom_highbd_smooth_predictor_64x16_c
+
+void aom_highbd_smooth_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_64x32 aom_highbd_smooth_predictor_64x32_c
+
+void aom_highbd_smooth_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_64x64 aom_highbd_smooth_predictor_64x64_c
+
+void aom_highbd_smooth_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_8x16 aom_highbd_smooth_predictor_8x16_c
+
+void aom_highbd_smooth_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_8x32 aom_highbd_smooth_predictor_8x32_c
+
+void aom_highbd_smooth_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_8x4 aom_highbd_smooth_predictor_8x4_c
+
+void aom_highbd_smooth_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_predictor_8x8 aom_highbd_smooth_predictor_8x8_c
+
+void aom_highbd_smooth_v_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_16x16 aom_highbd_smooth_v_predictor_16x16_c
+
+void aom_highbd_smooth_v_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_16x32 aom_highbd_smooth_v_predictor_16x32_c
+
+void aom_highbd_smooth_v_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_16x4 aom_highbd_smooth_v_predictor_16x4_c
+
+void aom_highbd_smooth_v_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_16x64 aom_highbd_smooth_v_predictor_16x64_c
+
+void aom_highbd_smooth_v_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_16x8 aom_highbd_smooth_v_predictor_16x8_c
+
+void aom_highbd_smooth_v_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_2x2 aom_highbd_smooth_v_predictor_2x2_c
+
+void aom_highbd_smooth_v_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_32x16 aom_highbd_smooth_v_predictor_32x16_c
+
+void aom_highbd_smooth_v_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_32x32 aom_highbd_smooth_v_predictor_32x32_c
+
+void aom_highbd_smooth_v_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_32x64 aom_highbd_smooth_v_predictor_32x64_c
+
+void aom_highbd_smooth_v_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_32x8 aom_highbd_smooth_v_predictor_32x8_c
+
+void aom_highbd_smooth_v_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_4x16 aom_highbd_smooth_v_predictor_4x16_c
+
+void aom_highbd_smooth_v_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_4x4 aom_highbd_smooth_v_predictor_4x4_c
+
+void aom_highbd_smooth_v_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_4x8 aom_highbd_smooth_v_predictor_4x8_c
+
+void aom_highbd_smooth_v_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_64x16 aom_highbd_smooth_v_predictor_64x16_c
+
+void aom_highbd_smooth_v_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_64x32 aom_highbd_smooth_v_predictor_64x32_c
+
+void aom_highbd_smooth_v_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_64x64 aom_highbd_smooth_v_predictor_64x64_c
+
+void aom_highbd_smooth_v_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_8x16 aom_highbd_smooth_v_predictor_8x16_c
+
+void aom_highbd_smooth_v_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_8x32 aom_highbd_smooth_v_predictor_8x32_c
+
+void aom_highbd_smooth_v_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_8x4 aom_highbd_smooth_v_predictor_8x4_c
+
+void aom_highbd_smooth_v_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_smooth_v_predictor_8x8 aom_highbd_smooth_v_predictor_8x8_c
+
+int64_t aom_highbd_sse_c(const uint8_t *a8, int a_stride, const uint8_t *b8,int b_stride, int width, int height);
+#define aom_highbd_sse aom_highbd_sse_c
+
+void aom_highbd_subtract_block_c(int rows, int cols, int16_t *diff_ptr, ptrdiff_t diff_stride, const uint8_t *src_ptr, ptrdiff_t src_stride, const uint8_t *pred_ptr, ptrdiff_t pred_stride, int bd);
+#define aom_highbd_subtract_block aom_highbd_subtract_block_c
+
+void aom_highbd_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+                                                 const MV *const mv, uint8_t *comp_pred8, int width, int height, int subpel_x_q3,
+                                                 int subpel_y_q3, const uint8_t *ref8, int ref_stride, int bd, int subpel_search);
+#define aom_highbd_upsampled_pred aom_highbd_upsampled_pred_c
+
+void aom_highbd_v_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_16x16 aom_highbd_v_predictor_16x16_c
+
+void aom_highbd_v_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_16x32 aom_highbd_v_predictor_16x32_c
+
+void aom_highbd_v_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_16x4 aom_highbd_v_predictor_16x4_c
+
+void aom_highbd_v_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_16x64 aom_highbd_v_predictor_16x64_c
+
+void aom_highbd_v_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_16x8 aom_highbd_v_predictor_16x8_c
+
+void aom_highbd_v_predictor_2x2_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_2x2 aom_highbd_v_predictor_2x2_c
+
+void aom_highbd_v_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_32x16 aom_highbd_v_predictor_32x16_c
+
+void aom_highbd_v_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_32x32 aom_highbd_v_predictor_32x32_c
+
+void aom_highbd_v_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_32x64 aom_highbd_v_predictor_32x64_c
+
+void aom_highbd_v_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_32x8 aom_highbd_v_predictor_32x8_c
+
+void aom_highbd_v_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_4x16 aom_highbd_v_predictor_4x16_c
+
+void aom_highbd_v_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_4x4 aom_highbd_v_predictor_4x4_c
+
+void aom_highbd_v_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_4x8 aom_highbd_v_predictor_4x8_c
+
+void aom_highbd_v_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_64x16 aom_highbd_v_predictor_64x16_c
+
+void aom_highbd_v_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_64x32 aom_highbd_v_predictor_64x32_c
+
+void aom_highbd_v_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_64x64 aom_highbd_v_predictor_64x64_c
+
+void aom_highbd_v_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_8x16 aom_highbd_v_predictor_8x16_c
+
+void aom_highbd_v_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_8x32 aom_highbd_v_predictor_8x32_c
+
+void aom_highbd_v_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_8x4 aom_highbd_v_predictor_8x4_c
+
+void aom_highbd_v_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_v_predictor_8x8 aom_highbd_v_predictor_8x8_c
+
+void aom_ifft16x16_float_c(const float *input, float *temp, float *output);
+#define aom_ifft16x16_float aom_ifft16x16_float_c
+
+void aom_ifft2x2_float_c(const float *input, float *temp, float *output);
+#define aom_ifft2x2_float aom_ifft2x2_float_c
+
+void aom_ifft32x32_float_c(const float *input, float *temp, float *output);
+#define aom_ifft32x32_float aom_ifft32x32_float_c
+
+void aom_ifft4x4_float_c(const float *input, float *temp, float *output);
+#define aom_ifft4x4_float aom_ifft4x4_float_c
+
+void aom_ifft8x8_float_c(const float *input, float *temp, float *output);
+#define aom_ifft8x8_float aom_ifft8x8_float_c
+
+void aom_lowbd_blend_a64_d16_mask_c(uint8_t *dst, uint32_t dst_stride, const CONV_BUF_TYPE *src0, uint32_t src0_stride, const CONV_BUF_TYPE *src1, uint32_t src1_stride, const uint8_t *mask, uint32_t mask_stride, int w, int h, int subx, int suby, ConvolveParams *conv_params);
+#define aom_lowbd_blend_a64_d16_mask aom_lowbd_blend_a64_d16_mask_c
+
+void aom_lpf_horizontal_14_c(uint8_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh);
+#define aom_lpf_horizontal_14 aom_lpf_horizontal_14_c
+
+void aom_lpf_horizontal_14_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1);
+#define aom_lpf_horizontal_14_dual aom_lpf_horizontal_14_dual_c
+
+void aom_lpf_horizontal_4_c(uint8_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh);
+#define aom_lpf_horizontal_4 aom_lpf_horizontal_4_c
+
+void aom_lpf_horizontal_4_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1);
+#define aom_lpf_horizontal_4_dual aom_lpf_horizontal_4_dual_c
+
+void aom_lpf_horizontal_6_c(uint8_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh);
+#define aom_lpf_horizontal_6 aom_lpf_horizontal_6_c
+
+void aom_lpf_horizontal_6_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1);
+#define aom_lpf_horizontal_6_dual aom_lpf_horizontal_6_dual_c
+
+void aom_lpf_horizontal_8_c(uint8_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh);
+#define aom_lpf_horizontal_8 aom_lpf_horizontal_8_c
+
+void aom_lpf_horizontal_8_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1);
+#define aom_lpf_horizontal_8_dual aom_lpf_horizontal_8_dual_c
+
+void aom_lpf_vertical_14_c(uint8_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh);
+#define aom_lpf_vertical_14 aom_lpf_vertical_14_c
+
+void aom_lpf_vertical_14_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1);
+#define aom_lpf_vertical_14_dual aom_lpf_vertical_14_dual_c
+
+void aom_lpf_vertical_4_c(uint8_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh);
+#define aom_lpf_vertical_4 aom_lpf_vertical_4_c
+
+void aom_lpf_vertical_4_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1);
+#define aom_lpf_vertical_4_dual aom_lpf_vertical_4_dual_c
+
+void aom_lpf_vertical_6_c(uint8_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh);
+#define aom_lpf_vertical_6 aom_lpf_vertical_6_c
+
+void aom_lpf_vertical_6_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1);
+#define aom_lpf_vertical_6_dual aom_lpf_vertical_6_dual_c
+
+void aom_lpf_vertical_8_c(uint8_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh);
+#define aom_lpf_vertical_8 aom_lpf_vertical_8_c
+
+void aom_lpf_vertical_8_dual_c(uint8_t *s, int pitch, const uint8_t *blimit0, const uint8_t *limit0, const uint8_t *thresh0, const uint8_t *blimit1, const uint8_t *limit1, const uint8_t *thresh1);
+#define aom_lpf_vertical_8_dual aom_lpf_vertical_8_dual_c
+
+unsigned int aom_masked_sad128x128_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad128x128 aom_masked_sad128x128_c
+
+unsigned int aom_masked_sad128x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad128x64 aom_masked_sad128x64_c
+
+unsigned int aom_masked_sad16x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x16 aom_masked_sad16x16_c
+
+unsigned int aom_masked_sad16x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x32 aom_masked_sad16x32_c
+
+unsigned int aom_masked_sad16x4_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x4 aom_masked_sad16x4_c
+
+unsigned int aom_masked_sad16x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x64 aom_masked_sad16x64_c
+
+unsigned int aom_masked_sad16x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x8 aom_masked_sad16x8_c
+
+unsigned int aom_masked_sad32x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x16 aom_masked_sad32x16_c
+
+unsigned int aom_masked_sad32x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x32 aom_masked_sad32x32_c
+
+unsigned int aom_masked_sad32x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x64 aom_masked_sad32x64_c
+
+unsigned int aom_masked_sad32x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x8 aom_masked_sad32x8_c
+
+unsigned int aom_masked_sad4x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad4x16 aom_masked_sad4x16_c
+
+unsigned int aom_masked_sad4x4_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad4x4 aom_masked_sad4x4_c
+
+unsigned int aom_masked_sad4x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad4x8 aom_masked_sad4x8_c
+
+unsigned int aom_masked_sad64x128_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x128 aom_masked_sad64x128_c
+
+unsigned int aom_masked_sad64x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x16 aom_masked_sad64x16_c
+
+unsigned int aom_masked_sad64x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x32 aom_masked_sad64x32_c
+
+unsigned int aom_masked_sad64x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x64 aom_masked_sad64x64_c
+
+unsigned int aom_masked_sad8x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x16 aom_masked_sad8x16_c
+
+unsigned int aom_masked_sad8x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x32 aom_masked_sad8x32_c
+
+unsigned int aom_masked_sad8x4_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x4 aom_masked_sad8x4_c
+
+unsigned int aom_masked_sad8x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x8 aom_masked_sad8x8_c
+
+unsigned int aom_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance128x128 aom_masked_sub_pixel_variance128x128_c
+
+unsigned int aom_masked_sub_pixel_variance128x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance128x64 aom_masked_sub_pixel_variance128x64_c
+
+unsigned int aom_masked_sub_pixel_variance16x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x16 aom_masked_sub_pixel_variance16x16_c
+
+unsigned int aom_masked_sub_pixel_variance16x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x32 aom_masked_sub_pixel_variance16x32_c
+
+unsigned int aom_masked_sub_pixel_variance16x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x4 aom_masked_sub_pixel_variance16x4_c
+
+unsigned int aom_masked_sub_pixel_variance16x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x64 aom_masked_sub_pixel_variance16x64_c
+
+unsigned int aom_masked_sub_pixel_variance16x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x8 aom_masked_sub_pixel_variance16x8_c
+
+unsigned int aom_masked_sub_pixel_variance32x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x16 aom_masked_sub_pixel_variance32x16_c
+
+unsigned int aom_masked_sub_pixel_variance32x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x32 aom_masked_sub_pixel_variance32x32_c
+
+unsigned int aom_masked_sub_pixel_variance32x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x64 aom_masked_sub_pixel_variance32x64_c
+
+unsigned int aom_masked_sub_pixel_variance32x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x8 aom_masked_sub_pixel_variance32x8_c
+
+unsigned int aom_masked_sub_pixel_variance4x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance4x16 aom_masked_sub_pixel_variance4x16_c
+
+unsigned int aom_masked_sub_pixel_variance4x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance4x4 aom_masked_sub_pixel_variance4x4_c
+
+unsigned int aom_masked_sub_pixel_variance4x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance4x8 aom_masked_sub_pixel_variance4x8_c
+
+unsigned int aom_masked_sub_pixel_variance64x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x128 aom_masked_sub_pixel_variance64x128_c
+
+unsigned int aom_masked_sub_pixel_variance64x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x16 aom_masked_sub_pixel_variance64x16_c
+
+unsigned int aom_masked_sub_pixel_variance64x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x32 aom_masked_sub_pixel_variance64x32_c
+
+unsigned int aom_masked_sub_pixel_variance64x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x64 aom_masked_sub_pixel_variance64x64_c
+
+unsigned int aom_masked_sub_pixel_variance8x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x16 aom_masked_sub_pixel_variance8x16_c
+
+unsigned int aom_masked_sub_pixel_variance8x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x32 aom_masked_sub_pixel_variance8x32_c
+
+unsigned int aom_masked_sub_pixel_variance8x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x4 aom_masked_sub_pixel_variance8x4_c
+
+unsigned int aom_masked_sub_pixel_variance8x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x8 aom_masked_sub_pixel_variance8x8_c
+
+void aom_minmax_8x8_c(const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max);
+#define aom_minmax_8x8 aom_minmax_8x8_c
+
+unsigned int aom_mse16x16_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_mse16x16 aom_mse16x16_c
+
+unsigned int aom_mse16x8_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_mse16x8 aom_mse16x8_c
+
+unsigned int aom_mse8x16_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_mse8x16 aom_mse8x16_c
+
+unsigned int aom_mse8x8_c(const uint8_t *src_ptr, int  source_stride, const uint8_t *ref_ptr, int  recon_stride, unsigned int *sse);
+#define aom_mse8x8 aom_mse8x8_c
+
+unsigned int aom_obmc_sad128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad128x128 aom_obmc_sad128x128_c
+
+unsigned int aom_obmc_sad128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad128x64 aom_obmc_sad128x64_c
+
+unsigned int aom_obmc_sad16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x16 aom_obmc_sad16x16_c
+
+unsigned int aom_obmc_sad16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x32 aom_obmc_sad16x32_c
+
+unsigned int aom_obmc_sad16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x4 aom_obmc_sad16x4_c
+
+unsigned int aom_obmc_sad16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x64 aom_obmc_sad16x64_c
+
+unsigned int aom_obmc_sad16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x8 aom_obmc_sad16x8_c
+
+unsigned int aom_obmc_sad32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x16 aom_obmc_sad32x16_c
+
+unsigned int aom_obmc_sad32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x32 aom_obmc_sad32x32_c
+
+unsigned int aom_obmc_sad32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x64 aom_obmc_sad32x64_c
+
+unsigned int aom_obmc_sad32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x8 aom_obmc_sad32x8_c
+
+unsigned int aom_obmc_sad4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad4x16 aom_obmc_sad4x16_c
+
+unsigned int aom_obmc_sad4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad4x4 aom_obmc_sad4x4_c
+
+unsigned int aom_obmc_sad4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad4x8 aom_obmc_sad4x8_c
+
+unsigned int aom_obmc_sad64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x128 aom_obmc_sad64x128_c
+
+unsigned int aom_obmc_sad64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x16 aom_obmc_sad64x16_c
+
+unsigned int aom_obmc_sad64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x32 aom_obmc_sad64x32_c
+
+unsigned int aom_obmc_sad64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x64 aom_obmc_sad64x64_c
+
+unsigned int aom_obmc_sad8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x16 aom_obmc_sad8x16_c
+
+unsigned int aom_obmc_sad8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x32 aom_obmc_sad8x32_c
+
+unsigned int aom_obmc_sad8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x4 aom_obmc_sad8x4_c
+
+unsigned int aom_obmc_sad8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x8 aom_obmc_sad8x8_c
+
+unsigned int aom_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance128x128 aom_obmc_sub_pixel_variance128x128_c
+
+unsigned int aom_obmc_sub_pixel_variance128x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance128x64 aom_obmc_sub_pixel_variance128x64_c
+
+unsigned int aom_obmc_sub_pixel_variance16x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x16 aom_obmc_sub_pixel_variance16x16_c
+
+unsigned int aom_obmc_sub_pixel_variance16x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x32 aom_obmc_sub_pixel_variance16x32_c
+
+unsigned int aom_obmc_sub_pixel_variance16x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x4 aom_obmc_sub_pixel_variance16x4_c
+
+unsigned int aom_obmc_sub_pixel_variance16x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x64 aom_obmc_sub_pixel_variance16x64_c
+
+unsigned int aom_obmc_sub_pixel_variance16x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x8 aom_obmc_sub_pixel_variance16x8_c
+
+unsigned int aom_obmc_sub_pixel_variance32x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x16 aom_obmc_sub_pixel_variance32x16_c
+
+unsigned int aom_obmc_sub_pixel_variance32x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x32 aom_obmc_sub_pixel_variance32x32_c
+
+unsigned int aom_obmc_sub_pixel_variance32x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x64 aom_obmc_sub_pixel_variance32x64_c
+
+unsigned int aom_obmc_sub_pixel_variance32x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x8 aom_obmc_sub_pixel_variance32x8_c
+
+unsigned int aom_obmc_sub_pixel_variance4x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance4x16 aom_obmc_sub_pixel_variance4x16_c
+
+unsigned int aom_obmc_sub_pixel_variance4x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance4x4 aom_obmc_sub_pixel_variance4x4_c
+
+unsigned int aom_obmc_sub_pixel_variance4x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance4x8 aom_obmc_sub_pixel_variance4x8_c
+
+unsigned int aom_obmc_sub_pixel_variance64x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x128 aom_obmc_sub_pixel_variance64x128_c
+
+unsigned int aom_obmc_sub_pixel_variance64x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x16 aom_obmc_sub_pixel_variance64x16_c
+
+unsigned int aom_obmc_sub_pixel_variance64x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x32 aom_obmc_sub_pixel_variance64x32_c
+
+unsigned int aom_obmc_sub_pixel_variance64x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x64 aom_obmc_sub_pixel_variance64x64_c
+
+unsigned int aom_obmc_sub_pixel_variance8x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x16 aom_obmc_sub_pixel_variance8x16_c
+
+unsigned int aom_obmc_sub_pixel_variance8x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x32 aom_obmc_sub_pixel_variance8x32_c
+
+unsigned int aom_obmc_sub_pixel_variance8x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x4 aom_obmc_sub_pixel_variance8x4_c
+
+unsigned int aom_obmc_sub_pixel_variance8x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x8 aom_obmc_sub_pixel_variance8x8_c
+
+unsigned int aom_obmc_variance128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance128x128 aom_obmc_variance128x128_c
+
+unsigned int aom_obmc_variance128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance128x64 aom_obmc_variance128x64_c
+
+unsigned int aom_obmc_variance16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x16 aom_obmc_variance16x16_c
+
+unsigned int aom_obmc_variance16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x32 aom_obmc_variance16x32_c
+
+unsigned int aom_obmc_variance16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x4 aom_obmc_variance16x4_c
+
+unsigned int aom_obmc_variance16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x64 aom_obmc_variance16x64_c
+
+unsigned int aom_obmc_variance16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x8 aom_obmc_variance16x8_c
+
+unsigned int aom_obmc_variance32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x16 aom_obmc_variance32x16_c
+
+unsigned int aom_obmc_variance32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x32 aom_obmc_variance32x32_c
+
+unsigned int aom_obmc_variance32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x64 aom_obmc_variance32x64_c
+
+unsigned int aom_obmc_variance32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x8 aom_obmc_variance32x8_c
+
+unsigned int aom_obmc_variance4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance4x16 aom_obmc_variance4x16_c
+
+unsigned int aom_obmc_variance4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance4x4 aom_obmc_variance4x4_c
+
+unsigned int aom_obmc_variance4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance4x8 aom_obmc_variance4x8_c
+
+unsigned int aom_obmc_variance64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x128 aom_obmc_variance64x128_c
+
+unsigned int aom_obmc_variance64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x16 aom_obmc_variance64x16_c
+
+unsigned int aom_obmc_variance64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x32 aom_obmc_variance64x32_c
+
+unsigned int aom_obmc_variance64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x64 aom_obmc_variance64x64_c
+
+unsigned int aom_obmc_variance8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x16 aom_obmc_variance8x16_c
+
+unsigned int aom_obmc_variance8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x32 aom_obmc_variance8x32_c
+
+unsigned int aom_obmc_variance8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x4 aom_obmc_variance8x4_c
+
+unsigned int aom_obmc_variance8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x8 aom_obmc_variance8x8_c
+
+void aom_paeth_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_16x16 aom_paeth_predictor_16x16_c
+
+void aom_paeth_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_16x32 aom_paeth_predictor_16x32_c
+
+void aom_paeth_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_16x4 aom_paeth_predictor_16x4_c
+
+void aom_paeth_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_16x64 aom_paeth_predictor_16x64_c
+
+void aom_paeth_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_16x8 aom_paeth_predictor_16x8_c
+
+void aom_paeth_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_2x2 aom_paeth_predictor_2x2_c
+
+void aom_paeth_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_32x16 aom_paeth_predictor_32x16_c
+
+void aom_paeth_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_32x32 aom_paeth_predictor_32x32_c
+
+void aom_paeth_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_32x64 aom_paeth_predictor_32x64_c
+
+void aom_paeth_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_32x8 aom_paeth_predictor_32x8_c
+
+void aom_paeth_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_4x16 aom_paeth_predictor_4x16_c
+
+void aom_paeth_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_4x4 aom_paeth_predictor_4x4_c
+
+void aom_paeth_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_4x8 aom_paeth_predictor_4x8_c
+
+void aom_paeth_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_64x16 aom_paeth_predictor_64x16_c
+
+void aom_paeth_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_64x32 aom_paeth_predictor_64x32_c
+
+void aom_paeth_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_64x64 aom_paeth_predictor_64x64_c
+
+void aom_paeth_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_8x16 aom_paeth_predictor_8x16_c
+
+void aom_paeth_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_8x32 aom_paeth_predictor_8x32_c
+
+void aom_paeth_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_8x4 aom_paeth_predictor_8x4_c
+
+void aom_paeth_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_paeth_predictor_8x8 aom_paeth_predictor_8x8_c
+
+void aom_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
+#define aom_quantize_b aom_quantize_b_c
+
+void aom_quantize_b_32x32_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
+#define aom_quantize_b_32x32 aom_quantize_b_32x32_c
+
+void aom_quantize_b_64x64_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
+#define aom_quantize_b_64x64 aom_quantize_b_64x64_c
+
+unsigned int aom_sad128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad128x128 aom_sad128x128_c
+
+unsigned int aom_sad128x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad128x128_avg aom_sad128x128_avg_c
+
+void aom_sad128x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad128x128x4d aom_sad128x128x4d_c
+
+unsigned int aom_sad128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad128x64 aom_sad128x64_c
+
+unsigned int aom_sad128x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad128x64_avg aom_sad128x64_avg_c
+
+void aom_sad128x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad128x64x4d aom_sad128x64x4d_c
+
+unsigned int aom_sad128xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
+#define aom_sad128xh aom_sad128xh_c
+
+unsigned int aom_sad16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad16x16 aom_sad16x16_c
+
+unsigned int aom_sad16x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad16x16_avg aom_sad16x16_avg_c
+
+void aom_sad16x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad16x16x4d aom_sad16x16x4d_c
+
+unsigned int aom_sad16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad16x32 aom_sad16x32_c
+
+unsigned int aom_sad16x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad16x32_avg aom_sad16x32_avg_c
+
+void aom_sad16x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad16x32x4d aom_sad16x32x4d_c
+
+unsigned int aom_sad16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad16x4 aom_sad16x4_c
+
+unsigned int aom_sad16x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad16x4_avg aom_sad16x4_avg_c
+
+void aom_sad16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad16x4x4d aom_sad16x4x4d_c
+
+unsigned int aom_sad16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad16x64 aom_sad16x64_c
+
+unsigned int aom_sad16x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad16x64_avg aom_sad16x64_avg_c
+
+void aom_sad16x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad16x64x4d aom_sad16x64x4d_c
+
+unsigned int aom_sad16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad16x8 aom_sad16x8_c
+
+unsigned int aom_sad16x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad16x8_avg aom_sad16x8_avg_c
+
+void aom_sad16x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad16x8x4d aom_sad16x8x4d_c
+
+unsigned int aom_sad16xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
+#define aom_sad16xh aom_sad16xh_c
+
+unsigned int aom_sad32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad32x16 aom_sad32x16_c
+
+unsigned int aom_sad32x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad32x16_avg aom_sad32x16_avg_c
+
+void aom_sad32x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad32x16x4d aom_sad32x16x4d_c
+
+unsigned int aom_sad32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad32x32 aom_sad32x32_c
+
+unsigned int aom_sad32x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad32x32_avg aom_sad32x32_avg_c
+
+void aom_sad32x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad32x32x4d aom_sad32x32x4d_c
+
+unsigned int aom_sad32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad32x64 aom_sad32x64_c
+
+unsigned int aom_sad32x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad32x64_avg aom_sad32x64_avg_c
+
+void aom_sad32x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad32x64x4d aom_sad32x64x4d_c
+
+unsigned int aom_sad32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad32x8 aom_sad32x8_c
+
+unsigned int aom_sad32x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad32x8_avg aom_sad32x8_avg_c
+
+void aom_sad32x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad32x8x4d aom_sad32x8x4d_c
+
+unsigned int aom_sad32xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
+#define aom_sad32xh aom_sad32xh_c
+
+unsigned int aom_sad4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad4x16 aom_sad4x16_c
+
+unsigned int aom_sad4x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad4x16_avg aom_sad4x16_avg_c
+
+void aom_sad4x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad4x16x4d aom_sad4x16x4d_c
+
+unsigned int aom_sad4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad4x4 aom_sad4x4_c
+
+unsigned int aom_sad4x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad4x4_avg aom_sad4x4_avg_c
+
+void aom_sad4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad4x4x4d aom_sad4x4x4d_c
+
+unsigned int aom_sad4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad4x8 aom_sad4x8_c
+
+unsigned int aom_sad4x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad4x8_avg aom_sad4x8_avg_c
+
+void aom_sad4x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad4x8x4d aom_sad4x8x4d_c
+
+unsigned int aom_sad4xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
+#define aom_sad4xh aom_sad4xh_c
+
+unsigned int aom_sad64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad64x128 aom_sad64x128_c
+
+unsigned int aom_sad64x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad64x128_avg aom_sad64x128_avg_c
+
+void aom_sad64x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad64x128x4d aom_sad64x128x4d_c
+
+unsigned int aom_sad64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad64x16 aom_sad64x16_c
+
+unsigned int aom_sad64x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad64x16_avg aom_sad64x16_avg_c
+
+void aom_sad64x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad64x16x4d aom_sad64x16x4d_c
+
+unsigned int aom_sad64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad64x32 aom_sad64x32_c
+
+unsigned int aom_sad64x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad64x32_avg aom_sad64x32_avg_c
+
+void aom_sad64x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad64x32x4d aom_sad64x32x4d_c
+
+unsigned int aom_sad64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad64x64 aom_sad64x64_c
+
+unsigned int aom_sad64x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad64x64_avg aom_sad64x64_avg_c
+
+void aom_sad64x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad64x64x4d aom_sad64x64x4d_c
+
+unsigned int aom_sad64xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
+#define aom_sad64xh aom_sad64xh_c
+
+unsigned int aom_sad8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad8x16 aom_sad8x16_c
+
+unsigned int aom_sad8x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad8x16_avg aom_sad8x16_avg_c
+
+void aom_sad8x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad8x16x4d aom_sad8x16x4d_c
+
+unsigned int aom_sad8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad8x32 aom_sad8x32_c
+
+unsigned int aom_sad8x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad8x32_avg aom_sad8x32_avg_c
+
+void aom_sad8x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad8x32x4d aom_sad8x32x4d_c
+
+unsigned int aom_sad8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad8x4 aom_sad8x4_c
+
+unsigned int aom_sad8x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad8x4_avg aom_sad8x4_avg_c
+
+void aom_sad8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad8x4x4d aom_sad8x4x4d_c
+
+unsigned int aom_sad8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad8x8 aom_sad8x8_c
+
+unsigned int aom_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
+#define aom_sad8x8_avg aom_sad8x8_avg_c
+
+void aom_sad8x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_sad8x8x4d aom_sad8x8x4d_c
+
+unsigned int aom_sad8xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
+#define aom_sad8xh aom_sad8xh_c
+
+int aom_satd_c(const tran_low_t *coeff, int length);
+#define aom_satd aom_satd_c
+
+void aom_smooth_h_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_16x16 aom_smooth_h_predictor_16x16_c
+
+void aom_smooth_h_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_16x32 aom_smooth_h_predictor_16x32_c
+
+void aom_smooth_h_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_16x4 aom_smooth_h_predictor_16x4_c
+
+void aom_smooth_h_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_16x64 aom_smooth_h_predictor_16x64_c
+
+void aom_smooth_h_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_16x8 aom_smooth_h_predictor_16x8_c
+
+void aom_smooth_h_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_2x2 aom_smooth_h_predictor_2x2_c
+
+void aom_smooth_h_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_32x16 aom_smooth_h_predictor_32x16_c
+
+void aom_smooth_h_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_32x32 aom_smooth_h_predictor_32x32_c
+
+void aom_smooth_h_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_32x64 aom_smooth_h_predictor_32x64_c
+
+void aom_smooth_h_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_32x8 aom_smooth_h_predictor_32x8_c
+
+void aom_smooth_h_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_4x16 aom_smooth_h_predictor_4x16_c
+
+void aom_smooth_h_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_4x4 aom_smooth_h_predictor_4x4_c
+
+void aom_smooth_h_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_4x8 aom_smooth_h_predictor_4x8_c
+
+void aom_smooth_h_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_64x16 aom_smooth_h_predictor_64x16_c
+
+void aom_smooth_h_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_64x32 aom_smooth_h_predictor_64x32_c
+
+void aom_smooth_h_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_64x64 aom_smooth_h_predictor_64x64_c
+
+void aom_smooth_h_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_8x16 aom_smooth_h_predictor_8x16_c
+
+void aom_smooth_h_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_8x32 aom_smooth_h_predictor_8x32_c
+
+void aom_smooth_h_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_8x4 aom_smooth_h_predictor_8x4_c
+
+void aom_smooth_h_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_h_predictor_8x8 aom_smooth_h_predictor_8x8_c
+
+void aom_smooth_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_16x16 aom_smooth_predictor_16x16_c
+
+void aom_smooth_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_16x32 aom_smooth_predictor_16x32_c
+
+void aom_smooth_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_16x4 aom_smooth_predictor_16x4_c
+
+void aom_smooth_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_16x64 aom_smooth_predictor_16x64_c
+
+void aom_smooth_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_16x8 aom_smooth_predictor_16x8_c
+
+void aom_smooth_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_2x2 aom_smooth_predictor_2x2_c
+
+void aom_smooth_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_32x16 aom_smooth_predictor_32x16_c
+
+void aom_smooth_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_32x32 aom_smooth_predictor_32x32_c
+
+void aom_smooth_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_32x64 aom_smooth_predictor_32x64_c
+
+void aom_smooth_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_32x8 aom_smooth_predictor_32x8_c
+
+void aom_smooth_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_4x16 aom_smooth_predictor_4x16_c
+
+void aom_smooth_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_4x4 aom_smooth_predictor_4x4_c
+
+void aom_smooth_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_4x8 aom_smooth_predictor_4x8_c
+
+void aom_smooth_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_64x16 aom_smooth_predictor_64x16_c
+
+void aom_smooth_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_64x32 aom_smooth_predictor_64x32_c
+
+void aom_smooth_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_64x64 aom_smooth_predictor_64x64_c
+
+void aom_smooth_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_8x16 aom_smooth_predictor_8x16_c
+
+void aom_smooth_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_8x32 aom_smooth_predictor_8x32_c
+
+void aom_smooth_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_8x4 aom_smooth_predictor_8x4_c
+
+void aom_smooth_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_predictor_8x8 aom_smooth_predictor_8x8_c
+
+void aom_smooth_v_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_16x16 aom_smooth_v_predictor_16x16_c
+
+void aom_smooth_v_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_16x32 aom_smooth_v_predictor_16x32_c
+
+void aom_smooth_v_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_16x4 aom_smooth_v_predictor_16x4_c
+
+void aom_smooth_v_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_16x64 aom_smooth_v_predictor_16x64_c
+
+void aom_smooth_v_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_16x8 aom_smooth_v_predictor_16x8_c
+
+void aom_smooth_v_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_2x2 aom_smooth_v_predictor_2x2_c
+
+void aom_smooth_v_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_32x16 aom_smooth_v_predictor_32x16_c
+
+void aom_smooth_v_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_32x32 aom_smooth_v_predictor_32x32_c
+
+void aom_smooth_v_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_32x64 aom_smooth_v_predictor_32x64_c
+
+void aom_smooth_v_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_32x8 aom_smooth_v_predictor_32x8_c
+
+void aom_smooth_v_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_4x16 aom_smooth_v_predictor_4x16_c
+
+void aom_smooth_v_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_4x4 aom_smooth_v_predictor_4x4_c
+
+void aom_smooth_v_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_4x8 aom_smooth_v_predictor_4x8_c
+
+void aom_smooth_v_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_64x16 aom_smooth_v_predictor_64x16_c
+
+void aom_smooth_v_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_64x32 aom_smooth_v_predictor_64x32_c
+
+void aom_smooth_v_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_64x64 aom_smooth_v_predictor_64x64_c
+
+void aom_smooth_v_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_8x16 aom_smooth_v_predictor_8x16_c
+
+void aom_smooth_v_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_8x32 aom_smooth_v_predictor_8x32_c
+
+void aom_smooth_v_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_8x4 aom_smooth_v_predictor_8x4_c
+
+void aom_smooth_v_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_smooth_v_predictor_8x8 aom_smooth_v_predictor_8x8_c
+
+int64_t aom_sse_c(const uint8_t *a, int a_stride, const uint8_t *b,int b_stride, int width, int height);
+#define aom_sse aom_sse_c
+
+uint32_t aom_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance128x128 aom_sub_pixel_avg_variance128x128_c
+
+uint32_t aom_sub_pixel_avg_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance128x64 aom_sub_pixel_avg_variance128x64_c
+
+uint32_t aom_sub_pixel_avg_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance16x16 aom_sub_pixel_avg_variance16x16_c
+
+uint32_t aom_sub_pixel_avg_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance16x32 aom_sub_pixel_avg_variance16x32_c
+
+uint32_t aom_sub_pixel_avg_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance16x4 aom_sub_pixel_avg_variance16x4_c
+
+uint32_t aom_sub_pixel_avg_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance16x64 aom_sub_pixel_avg_variance16x64_c
+
+uint32_t aom_sub_pixel_avg_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance16x8 aom_sub_pixel_avg_variance16x8_c
+
+uint32_t aom_sub_pixel_avg_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance32x16 aom_sub_pixel_avg_variance32x16_c
+
+uint32_t aom_sub_pixel_avg_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance32x32 aom_sub_pixel_avg_variance32x32_c
+
+uint32_t aom_sub_pixel_avg_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance32x64 aom_sub_pixel_avg_variance32x64_c
+
+uint32_t aom_sub_pixel_avg_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance32x8 aom_sub_pixel_avg_variance32x8_c
+
+uint32_t aom_sub_pixel_avg_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance4x16 aom_sub_pixel_avg_variance4x16_c
+
+uint32_t aom_sub_pixel_avg_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance4x4 aom_sub_pixel_avg_variance4x4_c
+
+uint32_t aom_sub_pixel_avg_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance4x8 aom_sub_pixel_avg_variance4x8_c
+
+uint32_t aom_sub_pixel_avg_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance64x128 aom_sub_pixel_avg_variance64x128_c
+
+uint32_t aom_sub_pixel_avg_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance64x16 aom_sub_pixel_avg_variance64x16_c
+
+uint32_t aom_sub_pixel_avg_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance64x32 aom_sub_pixel_avg_variance64x32_c
+
+uint32_t aom_sub_pixel_avg_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance64x64 aom_sub_pixel_avg_variance64x64_c
+
+uint32_t aom_sub_pixel_avg_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance8x16 aom_sub_pixel_avg_variance8x16_c
+
+uint32_t aom_sub_pixel_avg_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance8x32 aom_sub_pixel_avg_variance8x32_c
+
+uint32_t aom_sub_pixel_avg_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance8x4 aom_sub_pixel_avg_variance8x4_c
+
+uint32_t aom_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
+#define aom_sub_pixel_avg_variance8x8 aom_sub_pixel_avg_variance8x8_c
+
+uint32_t aom_sub_pixel_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance128x128 aom_sub_pixel_variance128x128_c
+
+uint32_t aom_sub_pixel_variance128x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance128x64 aom_sub_pixel_variance128x64_c
+
+uint32_t aom_sub_pixel_variance16x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance16x16 aom_sub_pixel_variance16x16_c
+
+uint32_t aom_sub_pixel_variance16x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance16x32 aom_sub_pixel_variance16x32_c
+
+uint32_t aom_sub_pixel_variance16x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance16x4 aom_sub_pixel_variance16x4_c
+
+uint32_t aom_sub_pixel_variance16x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance16x64 aom_sub_pixel_variance16x64_c
+
+uint32_t aom_sub_pixel_variance16x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance16x8 aom_sub_pixel_variance16x8_c
+
+uint32_t aom_sub_pixel_variance32x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance32x16 aom_sub_pixel_variance32x16_c
+
+uint32_t aom_sub_pixel_variance32x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance32x32 aom_sub_pixel_variance32x32_c
+
+uint32_t aom_sub_pixel_variance32x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance32x64 aom_sub_pixel_variance32x64_c
+
+uint32_t aom_sub_pixel_variance32x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance32x8 aom_sub_pixel_variance32x8_c
+
+uint32_t aom_sub_pixel_variance4x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance4x16 aom_sub_pixel_variance4x16_c
+
+uint32_t aom_sub_pixel_variance4x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance4x4 aom_sub_pixel_variance4x4_c
+
+uint32_t aom_sub_pixel_variance4x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance4x8 aom_sub_pixel_variance4x8_c
+
+uint32_t aom_sub_pixel_variance64x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance64x128 aom_sub_pixel_variance64x128_c
+
+uint32_t aom_sub_pixel_variance64x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance64x16 aom_sub_pixel_variance64x16_c
+
+uint32_t aom_sub_pixel_variance64x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance64x32 aom_sub_pixel_variance64x32_c
+
+uint32_t aom_sub_pixel_variance64x64_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance64x64 aom_sub_pixel_variance64x64_c
+
+uint32_t aom_sub_pixel_variance8x16_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance8x16 aom_sub_pixel_variance8x16_c
+
+uint32_t aom_sub_pixel_variance8x32_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance8x32 aom_sub_pixel_variance8x32_c
+
+uint32_t aom_sub_pixel_variance8x4_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance8x4 aom_sub_pixel_variance8x4_c
+
+uint32_t aom_sub_pixel_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int  yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+#define aom_sub_pixel_variance8x8 aom_sub_pixel_variance8x8_c
+
+void aom_subtract_block_c(int rows, int cols, int16_t *diff_ptr, ptrdiff_t diff_stride, const uint8_t *src_ptr, ptrdiff_t src_stride, const uint8_t *pred_ptr, ptrdiff_t pred_stride);
+#define aom_subtract_block aom_subtract_block_c
+
+uint64_t aom_sum_squares_2d_i16_c(const int16_t *src, int stride, int width, int height);
+#define aom_sum_squares_2d_i16 aom_sum_squares_2d_i16_c
+
+uint64_t aom_sum_squares_i16_c(const int16_t *src, uint32_t N);
+#define aom_sum_squares_i16 aom_sum_squares_i16_c
+
+void aom_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+                                          const MV *const mv, uint8_t *comp_pred, int width, int height, int subpel_x_q3,
+                                          int subpel_y_q3, const uint8_t *ref, int ref_stride, int subpel_search);
+#define aom_upsampled_pred aom_upsampled_pred_c
+
+void aom_v_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x16 aom_v_predictor_16x16_c
+
+void aom_v_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x32 aom_v_predictor_16x32_c
+
+void aom_v_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x4 aom_v_predictor_16x4_c
+
+void aom_v_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x64 aom_v_predictor_16x64_c
+
+void aom_v_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x8 aom_v_predictor_16x8_c
+
+void aom_v_predictor_2x2_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_2x2 aom_v_predictor_2x2_c
+
+void aom_v_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x16 aom_v_predictor_32x16_c
+
+void aom_v_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x32 aom_v_predictor_32x32_c
+
+void aom_v_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x64 aom_v_predictor_32x64_c
+
+void aom_v_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x8 aom_v_predictor_32x8_c
+
+void aom_v_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_4x16 aom_v_predictor_4x16_c
+
+void aom_v_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_4x4 aom_v_predictor_4x4_c
+
+void aom_v_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_4x8 aom_v_predictor_4x8_c
+
+void aom_v_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_64x16 aom_v_predictor_64x16_c
+
+void aom_v_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_64x32 aom_v_predictor_64x32_c
+
+void aom_v_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_64x64 aom_v_predictor_64x64_c
+
+void aom_v_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x16 aom_v_predictor_8x16_c
+
+void aom_v_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x32 aom_v_predictor_8x32_c
+
+void aom_v_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x4 aom_v_predictor_8x4_c
+
+void aom_v_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x8 aom_v_predictor_8x8_c
+
+unsigned int aom_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance128x128 aom_variance128x128_c
+
+unsigned int aom_variance128x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance128x64 aom_variance128x64_c
+
+unsigned int aom_variance16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance16x16 aom_variance16x16_c
+
+unsigned int aom_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance16x32 aom_variance16x32_c
+
+unsigned int aom_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance16x4 aom_variance16x4_c
+
+unsigned int aom_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance16x64 aom_variance16x64_c
+
+unsigned int aom_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance16x8 aom_variance16x8_c
+
+unsigned int aom_variance2x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance2x2 aom_variance2x2_c
+
+unsigned int aom_variance2x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance2x4 aom_variance2x4_c
+
+unsigned int aom_variance32x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance32x16 aom_variance32x16_c
+
+unsigned int aom_variance32x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance32x32 aom_variance32x32_c
+
+unsigned int aom_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance32x64 aom_variance32x64_c
+
+unsigned int aom_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance32x8 aom_variance32x8_c
+
+unsigned int aom_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance4x16 aom_variance4x16_c
+
+unsigned int aom_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance4x2 aom_variance4x2_c
+
+unsigned int aom_variance4x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance4x4 aom_variance4x4_c
+
+unsigned int aom_variance4x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance4x8 aom_variance4x8_c
+
+unsigned int aom_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance64x128 aom_variance64x128_c
+
+unsigned int aom_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance64x16 aom_variance64x16_c
+
+unsigned int aom_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance64x32 aom_variance64x32_c
+
+unsigned int aom_variance64x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance64x64 aom_variance64x64_c
+
+unsigned int aom_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance8x16 aom_variance8x16_c
+
+unsigned int aom_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance8x32 aom_variance8x32_c
+
+unsigned int aom_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance8x4 aom_variance8x4_c
+
+unsigned int aom_variance8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_variance8x8 aom_variance8x8_c
+
+void aom_dsp_rtcd(void);
+
+#include "config/aom_config.h"
+
+#ifdef RTCD_C
+static void setup_rtcd_internal(void)
+{
+}
+#endif
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif

diff --git a/libav1/build/config/aom_scale_rtcd.h b/libav1/build/config/aom_scale_rtcd.h
new file mode 100644
index 0000000..8b92a2d
--- /dev/null
+++ b/libav1/build/config/aom_scale_rtcd.h

@@ -0,0 +1,94 @@
+// This file is generated. Do not edit.
+#ifndef AOM_SCALE_RTCD_H_
+#define AOM_SCALE_RTCD_H_
+
+#ifdef RTCD_C
+#define RTCD_EXTERN
+#else
+#define RTCD_EXTERN extern
+#endif
+
+struct yv12_buffer_config;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void aom_extend_frame_borders_c(struct yv12_buffer_config *ybf, const int num_planes);
+#define aom_extend_frame_borders aom_extend_frame_borders_c
+
+void aom_extend_frame_borders_y_c(struct yv12_buffer_config *ybf);
+#define aom_extend_frame_borders_y aom_extend_frame_borders_y_c
+
+void aom_extend_frame_inner_borders_c(struct yv12_buffer_config *ybf, const int num_planes);
+#define aom_extend_frame_inner_borders aom_extend_frame_inner_borders_c
+
+void aom_horizontal_line_2_1_scale_c(const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width);
+#define aom_horizontal_line_2_1_scale aom_horizontal_line_2_1_scale_c
+
+void aom_horizontal_line_5_3_scale_c(const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width);
+#define aom_horizontal_line_5_3_scale aom_horizontal_line_5_3_scale_c
+
+void aom_horizontal_line_5_4_scale_c(const unsigned char *source, unsigned int source_width, unsigned char *dest, unsigned int dest_width);
+#define aom_horizontal_line_5_4_scale aom_horizontal_line_5_4_scale_c
+
+void aom_vertical_band_2_1_scale_c(unsigned char *source, int src_pitch, unsigned char *dest, int dest_pitch, unsigned int dest_width);
+#define aom_vertical_band_2_1_scale aom_vertical_band_2_1_scale_c
+
+void aom_vertical_band_2_1_scale_i_c(unsigned char *source, int src_pitch, unsigned char *dest, int dest_pitch, unsigned int dest_width);
+#define aom_vertical_band_2_1_scale_i aom_vertical_band_2_1_scale_i_c
+
+void aom_vertical_band_5_3_scale_c(unsigned char *source, int src_pitch, unsigned char *dest, int dest_pitch, unsigned int dest_width);
+#define aom_vertical_band_5_3_scale aom_vertical_band_5_3_scale_c
+
+void aom_vertical_band_5_4_scale_c(unsigned char *source, int src_pitch, unsigned char *dest, int dest_pitch, unsigned int dest_width);
+#define aom_vertical_band_5_4_scale aom_vertical_band_5_4_scale_c
+
+void aom_yv12_copy_frame_c(const struct yv12_buffer_config *src_bc, struct yv12_buffer_config *dst_bc, const int num_planes);
+#define aom_yv12_copy_frame aom_yv12_copy_frame_c
+
+void aom_yv12_copy_u_c(const struct yv12_buffer_config *src_bc, struct yv12_buffer_config *dst_bc);
+#define aom_yv12_copy_u aom_yv12_copy_u_c
+
+void aom_yv12_copy_v_c(const struct yv12_buffer_config *src_bc, struct yv12_buffer_config *dst_bc);
+#define aom_yv12_copy_v aom_yv12_copy_v_c
+
+void aom_yv12_copy_y_c(const struct yv12_buffer_config *src_ybc, struct yv12_buffer_config *dst_ybc);
+#define aom_yv12_copy_y aom_yv12_copy_y_c
+
+void aom_yv12_extend_frame_borders_c(struct yv12_buffer_config *ybf, const int num_planes);
+#define aom_yv12_extend_frame_borders aom_yv12_extend_frame_borders_c
+
+void aom_yv12_partial_coloc_copy_u_c(const struct yv12_buffer_config *src_bc, struct yv12_buffer_config *dst_bc, int hstart, int hend, int vstart, int vend);
+#define aom_yv12_partial_coloc_copy_u aom_yv12_partial_coloc_copy_u_c
+
+void aom_yv12_partial_coloc_copy_v_c(const struct yv12_buffer_config *src_bc, struct yv12_buffer_config *dst_bc, int hstart, int hend, int vstart, int vend);
+#define aom_yv12_partial_coloc_copy_v aom_yv12_partial_coloc_copy_v_c
+
+void aom_yv12_partial_coloc_copy_y_c(const struct yv12_buffer_config *src_ybc, struct yv12_buffer_config *dst_ybc, int hstart, int hend, int vstart, int vend);
+#define aom_yv12_partial_coloc_copy_y aom_yv12_partial_coloc_copy_y_c
+
+void aom_yv12_partial_copy_u_c(const struct yv12_buffer_config *src_bc, int hstart1, int hend1, int vstart1, int vend1, struct yv12_buffer_config *dst_bc, int hstart2, int vstart2);
+#define aom_yv12_partial_copy_u aom_yv12_partial_copy_u_c
+
+void aom_yv12_partial_copy_v_c(const struct yv12_buffer_config *src_bc, int hstart1, int hend1, int vstart1, int vend1, struct yv12_buffer_config *dst_bc, int hstart2, int vstart2);
+#define aom_yv12_partial_copy_v aom_yv12_partial_copy_v_c
+
+void aom_yv12_partial_copy_y_c(const struct yv12_buffer_config *src_ybc, int hstart1, int hend1, int vstart1, int vend1, struct yv12_buffer_config *dst_ybc, int hstart2, int vstart2);
+#define aom_yv12_partial_copy_y aom_yv12_partial_copy_y_c
+
+void aom_scale_rtcd(void);
+
+#include "config/aom_config.h"
+
+#ifdef RTCD_C
+static void setup_rtcd_internal(void)
+{
+}
+#endif
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif

diff --git a/libav1/build/config/aom_version.h b/libav1/build/config/aom_version.h
new file mode 100644
index 0000000..2cb85f4
--- /dev/null
+++ b/libav1/build/config/aom_version.h

@@ -0,0 +1,19 @@
+/*
+ * Copyright (c) 2019, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define VERSION_MAJOR 1
+#define VERSION_MINOR 0
+#define VERSION_PATCH 0
+#define VERSION_EXTRA ""
+#define VERSION_PACKED \
+  ((VERSION_MAJOR << 16) | (VERSION_MINOR << 8) | (VERSION_PATCH))
+#define VERSION_STRING_NOSP "v1.0.0"
+#define VERSION_STRING " v1.0.0"

diff --git a/libav1/build/config/av1_rtcd.h b/libav1/build/config/av1_rtcd.h
new file mode 100644
index 0000000..fe89a3a
--- /dev/null
+++ b/libav1/build/config/av1_rtcd.h

@@ -0,0 +1,474 @@
+// This file is generated. Do not edit.
+#ifndef AV1_RTCD_H_
+#define AV1_RTCD_H_
+
+#ifdef RTCD_C
+#define RTCD_EXTERN
+#else
+#define RTCD_EXTERN extern
+#endif
+
+/*
+ * AV1
+ */
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/txfm_common.h"
+#include "av1/common/common.h"
+#include "av1/common/enums.h"
+#include "av1/common/quant_common.h"
+#include "av1/common/filter.h"
+#include "av1/common/convolve.h"
+#include "av1/common/av1_txfm.h"
+#include "av1/common/odintrin.h"
+#include "av1/common/restoration.h"
+
+struct macroblockd;
+
+/* Encoder forward decls */
+struct macroblock;
+struct txfm_param;
+struct aom_variance_vtable;
+struct search_site_config;
+struct yv12_buffer_config;
+struct NN_CONFIG;
+typedef struct NN_CONFIG NN_CONFIG;
+
+/* Function pointers return by CfL functions */
+typedef void (*cfl_subsample_lbd_fn)(const uint8_t *input, int input_stride,
+                                     uint16_t *output_q3);
+
+typedef void (*cfl_subsample_hbd_fn)(const uint16_t *input, int input_stride,
+                                     uint16_t *output_q3);
+
+typedef void (*cfl_subtract_average_fn)(const uint16_t *src, int16_t *dst);
+
+typedef void (*cfl_predict_lbd_fn)(const int16_t *src, uint8_t *dst,
+                                   int dst_stride, int alpha_q3);
+
+typedef void (*cfl_predict_hbd_fn)(const int16_t *src, uint16_t *dst,
+                                   int dst_stride, int alpha_q3, int bd);
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+void apply_selfguided_restoration_c(const uint8_t *dat, int width, int height, int stride, int eps, const int *xqd, uint8_t *dst, int dst_stride, int32_t *tmpbuf, int bit_depth, int highbd);
+#define apply_selfguided_restoration apply_selfguided_restoration_c
+
+void av1_apply_temporal_filter_c(const uint8_t *y_frame1, int y_stride, const uint8_t *y_pred, int y_buf_stride, const uint8_t *u_frame1, const uint8_t *v_frame1, int uv_stride, const uint8_t *u_pred, const uint8_t *v_pred, int uv_buf_stride, unsigned int block_width, unsigned int block_height, int ss_x, int ss_y, int strength, const int *blk_fw, int use_32x32, uint32_t *y_accumulator, uint16_t *y_count, uint32_t *u_accumulator, uint16_t *u_count, uint32_t *v_accumulator, uint16_t *v_count);
+#define av1_apply_temporal_filter av1_apply_temporal_filter_c
+
+int64_t av1_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz);
+#define av1_block_error av1_block_error_c
+
+void av1_build_compound_diffwtd_mask_c(uint8_t *mask, DIFFWTD_MASK_TYPE mask_type, const uint8_t *src0, int src0_stride, const uint8_t *src1, int src1_stride, int h, int w);
+#define av1_build_compound_diffwtd_mask av1_build_compound_diffwtd_mask_c
+
+void av1_build_compound_diffwtd_mask_d16_c(uint8_t *mask, DIFFWTD_MASK_TYPE mask_type, const CONV_BUF_TYPE *src0, int src0_stride, const CONV_BUF_TYPE *src1, int src1_stride, int h, int w, ConvolveParams *conv_params, int bd);
+#define av1_build_compound_diffwtd_mask_d16 av1_build_compound_diffwtd_mask_d16_c
+
+void av1_build_compound_diffwtd_mask_highbd_c(uint8_t *mask, DIFFWTD_MASK_TYPE mask_type, const uint8_t *src0, int src0_stride, const uint8_t *src1, int src1_stride, int h, int w, int bd);
+#define av1_build_compound_diffwtd_mask_highbd av1_build_compound_diffwtd_mask_highbd_c
+
+void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H);
+#define av1_compute_stats av1_compute_stats_c
+
+void av1_compute_stats_highbd_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, aom_bit_depth_t bit_depth);
+#define av1_compute_stats_highbd av1_compute_stats_highbd_c
+
+void av1_convolve_2d_copy_sr_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params);
+#define av1_convolve_2d_copy_sr av1_convolve_2d_copy_sr_c
+
+void av1_convolve_2d_scale_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int x_step_qn, const int subpel_y_q4, const int y_step_qn, ConvolveParams *conv_params);
+#define av1_convolve_2d_scale av1_convolve_2d_scale_c
+
+void av1_convolve_2d_sr_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params);
+#define av1_convolve_2d_sr av1_convolve_2d_sr_c
+
+void av1_convolve_horiz_rs_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const int16_t *x_filters, int x0_qn, int x_step_qn);
+#define av1_convolve_horiz_rs av1_convolve_horiz_rs_c
+
+void av1_convolve_x_sr_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params);
+#define av1_convolve_x_sr av1_convolve_x_sr_c
+
+void av1_convolve_y_sr_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params);
+#define av1_convolve_y_sr av1_convolve_y_sr_c
+
+int av1_diamond_search_sad_c(struct macroblock *x, const struct search_site_config *cfg,  MV *ref_mv, MV *best_mv, int search_param, int sad_per_bit, int *num00, const struct aom_variance_vtable *fn_ptr, const MV *center_mv);
+#define av1_diamond_search_sad av1_diamond_search_sad_c
+
+void av1_dist_wtd_convolve_2d_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params);
+#define av1_dist_wtd_convolve_2d av1_dist_wtd_convolve_2d_c
+
+void av1_dist_wtd_convolve_2d_copy_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params);
+#define av1_dist_wtd_convolve_2d_copy av1_dist_wtd_convolve_2d_copy_c
+
+void av1_dist_wtd_convolve_x_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params);
+#define av1_dist_wtd_convolve_x av1_dist_wtd_convolve_x_c
+
+void av1_dist_wtd_convolve_y_c(const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params);
+#define av1_dist_wtd_convolve_y av1_dist_wtd_convolve_y_c
+
+void av1_dr_prediction_z1_c(uint8_t *dst, ptrdiff_t stride, int bw, int bh, const uint8_t *above, const uint8_t *left, int upsample_above, int dx, int dy);
+#define av1_dr_prediction_z1 av1_dr_prediction_z1_c
+
+void av1_dr_prediction_z2_c(uint8_t *dst, ptrdiff_t stride, int bw, int bh, const uint8_t *above, const uint8_t *left, int upsample_above, int upsample_left, int dx, int dy);
+#define av1_dr_prediction_z2 av1_dr_prediction_z2_c
+
+void av1_dr_prediction_z3_c(uint8_t *dst, ptrdiff_t stride, int bw, int bh, const uint8_t *above, const uint8_t *left, int upsample_left, int dx, int dy);
+#define av1_dr_prediction_z3 av1_dr_prediction_z3_c
+
+void av1_filter_intra_edge_c(uint8_t *p, int sz, int strength);
+#define av1_filter_intra_edge av1_filter_intra_edge_c
+
+void av1_filter_intra_edge_high_c(uint16_t *p, int sz, int strength);
+#define av1_filter_intra_edge_high av1_filter_intra_edge_high_c
+
+void av1_filter_intra_predictor_c(uint8_t *dst, ptrdiff_t stride, TX_SIZE tx_size, const uint8_t *above, const uint8_t *left, int mode);
+#define av1_filter_intra_predictor av1_filter_intra_predictor_c
+
+int av1_full_range_search_c(const struct macroblock *x, const struct search_site_config *cfg, MV *ref_mv, MV *best_mv, int search_param, int sad_per_bit, int *num00, const struct aom_variance_vtable *fn_ptr, const MV *center_mv);
+#define av1_full_range_search av1_full_range_search_c
+
+void av1_fwd_txfm2d_16x16_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_16x16 av1_fwd_txfm2d_16x16_c
+
+void av1_fwd_txfm2d_16x32_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_16x32 av1_fwd_txfm2d_16x32_c
+
+void av1_fwd_txfm2d_16x4_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_16x4 av1_fwd_txfm2d_16x4_c
+
+void av1_fwd_txfm2d_16x64_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_16x64 av1_fwd_txfm2d_16x64_c
+
+void av1_fwd_txfm2d_16x8_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_16x8 av1_fwd_txfm2d_16x8_c
+
+void av1_fwd_txfm2d_32x16_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_32x16 av1_fwd_txfm2d_32x16_c
+
+void av1_fwd_txfm2d_32x32_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_32x32 av1_fwd_txfm2d_32x32_c
+
+void av1_fwd_txfm2d_32x64_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_32x64 av1_fwd_txfm2d_32x64_c
+
+void av1_fwd_txfm2d_32x8_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_32x8 av1_fwd_txfm2d_32x8_c
+
+void av1_fwd_txfm2d_4x16_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_4x16 av1_fwd_txfm2d_4x16_c
+
+void av1_fwd_txfm2d_4x4_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_4x4 av1_fwd_txfm2d_4x4_c
+
+void av1_fwd_txfm2d_4x8_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_4x8 av1_fwd_txfm2d_4x8_c
+
+void av1_fwd_txfm2d_64x16_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_64x16 av1_fwd_txfm2d_64x16_c
+
+void av1_fwd_txfm2d_64x32_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_64x32 av1_fwd_txfm2d_64x32_c
+
+void av1_fwd_txfm2d_64x64_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_64x64 av1_fwd_txfm2d_64x64_c
+
+void av1_fwd_txfm2d_8x16_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_8x16 av1_fwd_txfm2d_8x16_c
+
+void av1_fwd_txfm2d_8x32_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_8x32 av1_fwd_txfm2d_8x32_c
+
+void av1_fwd_txfm2d_8x4_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_8x4 av1_fwd_txfm2d_8x4_c
+
+void av1_fwd_txfm2d_8x8_c(const int16_t *input, int32_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_fwd_txfm2d_8x8 av1_fwd_txfm2d_8x8_c
+
+void av1_fwht4x4_c(const int16_t *input, tran_low_t *output, int stride);
+#define av1_fwht4x4 av1_fwht4x4_c
+
+uint32_t av1_get_crc32c_value_c(void *crc_calculator, uint8_t *p, int length);
+#define av1_get_crc32c_value av1_get_crc32c_value_c
+
+void av1_get_horver_correlation_full_c( const int16_t *diff, int stride, int w, int h, float *hcorr, float *vcorr);
+#define av1_get_horver_correlation_full av1_get_horver_correlation_full_c
+
+void av1_get_nz_map_contexts_c(const uint8_t *const levels, const int16_t *const scan, const uint16_t eob, const TX_SIZE tx_size, const TX_CLASS tx_class, int8_t *const coeff_contexts);
+#define av1_get_nz_map_contexts av1_get_nz_map_contexts_c
+
+void av1_highbd_apply_temporal_filter_c(const uint8_t *yf, int y_stride, const uint8_t *yp, int y_buf_stride, const uint8_t *uf, const uint8_t *vf, int uv_stride, const uint8_t *up, const uint8_t *vp, int uv_buf_stride, unsigned int block_width, unsigned int block_height, int ss_x, int ss_y, int strength, const int *blk_fw, int use_32x32, uint32_t *y_accumulator, uint16_t *y_count, uint32_t *u_accumulator, uint16_t *u_count, uint32_t *v_accumulator, uint16_t *v_count);
+#define av1_highbd_apply_temporal_filter av1_highbd_apply_temporal_filter_c
+
+int64_t av1_highbd_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz, int bd);
+#define av1_highbd_block_error av1_highbd_block_error_c
+
+void av1_highbd_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
+#define av1_highbd_convolve8 av1_highbd_convolve8_c
+
+void av1_highbd_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
+#define av1_highbd_convolve8_horiz av1_highbd_convolve8_horiz_c
+
+void av1_highbd_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
+#define av1_highbd_convolve8_vert av1_highbd_convolve8_vert_c
+
+void av1_highbd_convolve_2d_copy_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_2d_copy_sr av1_highbd_convolve_2d_copy_sr_c
+
+void av1_highbd_convolve_2d_scale_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int x_step_qn, const int subpel_y_q4, const int y_step_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_2d_scale av1_highbd_convolve_2d_scale_c
+
+void av1_highbd_convolve_2d_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_2d_sr av1_highbd_convolve_2d_sr_c
+
+void av1_highbd_convolve_avg_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
+#define av1_highbd_convolve_avg av1_highbd_convolve_avg_c
+
+void av1_highbd_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
+#define av1_highbd_convolve_copy av1_highbd_convolve_copy_c
+
+void av1_highbd_convolve_horiz_rs_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const int16_t *x_filters, int x0_qn, int x_step_qn, int bd);
+#define av1_highbd_convolve_horiz_rs av1_highbd_convolve_horiz_rs_c
+
+void av1_highbd_convolve_x_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_x_sr av1_highbd_convolve_x_sr_c
+
+void av1_highbd_convolve_y_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_y_sr av1_highbd_convolve_y_sr_c
+
+void av1_highbd_dist_wtd_convolve_2d_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_2d av1_highbd_dist_wtd_convolve_2d_c
+
+void av1_highbd_dist_wtd_convolve_2d_copy_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_2d_copy av1_highbd_dist_wtd_convolve_2d_copy_c
+
+void av1_highbd_dist_wtd_convolve_x_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_x av1_highbd_dist_wtd_convolve_x_c
+
+void av1_highbd_dist_wtd_convolve_y_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_q4, const int subpel_y_q4, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_y av1_highbd_dist_wtd_convolve_y_c
+
+void av1_highbd_dr_prediction_z1_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_above, int dx, int dy, int bd);
+#define av1_highbd_dr_prediction_z1 av1_highbd_dr_prediction_z1_c
+
+void av1_highbd_dr_prediction_z2_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_above, int upsample_left, int dx, int dy, int bd);
+#define av1_highbd_dr_prediction_z2 av1_highbd_dr_prediction_z2_c
+
+void av1_highbd_dr_prediction_z3_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_left, int dx, int dy, int bd);
+#define av1_highbd_dr_prediction_z3 av1_highbd_dr_prediction_z3_c
+
+void av1_highbd_fwht4x4_c(const int16_t *input, tran_low_t *output, int stride);
+#define av1_highbd_fwht4x4 av1_highbd_fwht4x4_c
+
+void av1_highbd_inv_txfm_add_c(const tran_low_t *dqcoeff, uint8_t *dst, int stride, const TxfmParam *txfm_param);
+#define av1_highbd_inv_txfm_add av1_highbd_inv_txfm_add_c
+
+void av1_highbd_inv_txfm_add_16x4_c(const tran_low_t *dqcoeff, uint8_t *dst, int stride, const TxfmParam *txfm_param);
+#define av1_highbd_inv_txfm_add_16x4 av1_highbd_inv_txfm_add_16x4_c
+
+void av1_highbd_inv_txfm_add_4x16_c(const tran_low_t *dqcoeff, uint8_t *dst, int stride, const TxfmParam *txfm_param);
+#define av1_highbd_inv_txfm_add_4x16 av1_highbd_inv_txfm_add_4x16_c
+
+void av1_highbd_inv_txfm_add_4x4_c(const tran_low_t *dqcoeff, uint8_t *dst, int stride, const TxfmParam *txfm_param);
+#define av1_highbd_inv_txfm_add_4x4 av1_highbd_inv_txfm_add_4x4_c
+
+void av1_highbd_inv_txfm_add_4x8_c(const tran_low_t *dqcoeff, uint8_t *dst, int stride, const TxfmParam *txfm_param);
+#define av1_highbd_inv_txfm_add_4x8 av1_highbd_inv_txfm_add_4x8_c
+
+void av1_highbd_inv_txfm_add_8x4_c(const tran_low_t *dqcoeff, uint8_t *dst, int stride, const TxfmParam *txfm_param);
+#define av1_highbd_inv_txfm_add_8x4 av1_highbd_inv_txfm_add_8x4_c
+
+void av1_highbd_inv_txfm_add_8x8_c(const tran_low_t *dqcoeff, uint8_t *dst, int stride, const TxfmParam *txfm_param);
+#define av1_highbd_inv_txfm_add_8x8 av1_highbd_inv_txfm_add_8x8_c
+
+void av1_highbd_iwht4x4_16_add_c(const tran_low_t *input, uint8_t *dest, int dest_stride, int bd);
+#define av1_highbd_iwht4x4_16_add av1_highbd_iwht4x4_16_add_c
+
+void av1_highbd_iwht4x4_1_add_c(const tran_low_t *input, uint8_t *dest, int dest_stride, int bd);
+#define av1_highbd_iwht4x4_1_add av1_highbd_iwht4x4_1_add_c
+
+int64_t av1_highbd_pixel_proj_error_c( const uint8_t *src8, int width, int height, int src_stride, const uint8_t *dat8, int dat_stride, int32_t *flt0, int flt0_stride, int32_t *flt1, int flt1_stride, int xq[2], const sgr_params_type *params);
+#define av1_highbd_pixel_proj_error av1_highbd_pixel_proj_error_c
+
+void av1_highbd_quantize_fp_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan, int log_scale);
+#define av1_highbd_quantize_fp av1_highbd_quantize_fp_c
+
+void av1_highbd_warp_affine_c(const int32_t *mat, const uint16_t *ref, int width, int height, int stride, uint16_t *pred, int p_col, int p_row, int p_width, int p_height, int p_stride, int subsampling_x, int subsampling_y, int bd, ConvolveParams *conv_params, int16_t alpha, int16_t beta, int16_t gamma, int16_t delta);
+#define av1_highbd_warp_affine av1_highbd_warp_affine_c
+
+void av1_highbd_wiener_convolve_add_src_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, const ConvolveParams *conv_params, int bps);
+#define av1_highbd_wiener_convolve_add_src av1_highbd_wiener_convolve_add_src_c
+
+void av1_inv_txfm2d_add_16x16_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_16x16 av1_inv_txfm2d_add_16x16_c
+
+void av1_inv_txfm2d_add_16x32_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_16x32 av1_inv_txfm2d_add_16x32_c
+
+void av1_inv_txfm2d_add_16x4_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_16x4 av1_inv_txfm2d_add_16x4_c
+
+void av1_inv_txfm2d_add_16x64_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_16x64 av1_inv_txfm2d_add_16x64_c
+
+void av1_inv_txfm2d_add_16x8_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_16x8 av1_inv_txfm2d_add_16x8_c
+
+void av1_inv_txfm2d_add_32x16_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_32x16 av1_inv_txfm2d_add_32x16_c
+
+void av1_inv_txfm2d_add_32x32_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_32x32 av1_inv_txfm2d_add_32x32_c
+
+void av1_inv_txfm2d_add_32x64_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_32x64 av1_inv_txfm2d_add_32x64_c
+
+void av1_inv_txfm2d_add_32x8_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_32x8 av1_inv_txfm2d_add_32x8_c
+
+void av1_inv_txfm2d_add_4x16_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_4x16 av1_inv_txfm2d_add_4x16_c
+
+void av1_inv_txfm2d_add_4x4_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_4x4 av1_inv_txfm2d_add_4x4_c
+
+void av1_inv_txfm2d_add_4x8_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_4x8 av1_inv_txfm2d_add_4x8_c
+
+void av1_inv_txfm2d_add_64x16_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_64x16 av1_inv_txfm2d_add_64x16_c
+
+void av1_inv_txfm2d_add_64x32_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_64x32 av1_inv_txfm2d_add_64x32_c
+
+void av1_inv_txfm2d_add_64x64_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_64x64 av1_inv_txfm2d_add_64x64_c
+
+void av1_inv_txfm2d_add_8x16_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_8x16 av1_inv_txfm2d_add_8x16_c
+
+void av1_inv_txfm2d_add_8x32_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_8x32 av1_inv_txfm2d_add_8x32_c
+
+void av1_inv_txfm2d_add_8x4_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_8x4 av1_inv_txfm2d_add_8x4_c
+
+void av1_inv_txfm2d_add_8x8_c(const int32_t *input, uint16_t *output, int stride, TX_TYPE tx_type, int bd);
+#define av1_inv_txfm2d_add_8x8 av1_inv_txfm2d_add_8x8_c
+
+void av1_inv_txfm_add_c(const tran_low_t *dqcoeff, uint8_t *dst, int stride, const TxfmParam *txfm_param);
+#define av1_inv_txfm_add av1_inv_txfm_add_c
+
+void av1_lowbd_fwd_txfm_c(const int16_t *src_diff, tran_low_t *coeff, int diff_stride, TxfmParam *txfm_param);
+#define av1_lowbd_fwd_txfm av1_lowbd_fwd_txfm_c
+
+int64_t av1_lowbd_pixel_proj_error_c( const uint8_t *src8, int width, int height, int src_stride, const uint8_t *dat8, int dat_stride, int32_t *flt0, int flt0_stride, int32_t *flt1, int flt1_stride, int xq[2], const sgr_params_type *params);
+#define av1_lowbd_pixel_proj_error av1_lowbd_pixel_proj_error_c
+
+void av1_nn_predict_c( const float *input_nodes, const NN_CONFIG *const nn_config, float *const output);
+#define av1_nn_predict av1_nn_predict_c
+
+void av1_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan, const qm_val_t * qm_ptr, const qm_val_t * iqm_ptr, int log_scale);
+#define av1_quantize_b av1_quantize_b_c
+
+void av1_quantize_fp_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
+#define av1_quantize_fp av1_quantize_fp_c
+
+void av1_quantize_fp_32x32_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
+#define av1_quantize_fp_32x32 av1_quantize_fp_32x32_c
+
+void av1_quantize_fp_64x64_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
+#define av1_quantize_fp_64x64 av1_quantize_fp_64x64_c
+
+void av1_round_shift_array_c(int32_t *arr, int size, int bit);
+#define av1_round_shift_array av1_round_shift_array_c
+
+int av1_selfguided_restoration_c(const uint8_t *dgd8, int width, int height,
+                                 int dgd_stride, int32_t *flt0, int32_t *flt1, int flt_stride,
+                                 int sgr_params_idx, int bit_depth, int highbd);
+#define av1_selfguided_restoration av1_selfguided_restoration_c
+
+void av1_txb_init_levels_c(const tran_low_t *const coeff, const int width, const int height, uint8_t *const levels);
+#define av1_txb_init_levels av1_txb_init_levels_c
+
+void av1_upsample_intra_edge_c(uint8_t *p, int sz);
+#define av1_upsample_intra_edge av1_upsample_intra_edge_c
+
+void av1_upsample_intra_edge_high_c(uint16_t *p, int sz, int bd);
+#define av1_upsample_intra_edge_high av1_upsample_intra_edge_high_c
+
+void av1_warp_affine_c(const int32_t *mat, const uint8_t *ref, int width, int height, int stride, uint8_t *pred, int p_col, int p_row, int p_width, int p_height, int p_stride, int subsampling_x, int subsampling_y, ConvolveParams *conv_params, int16_t alpha, int16_t beta, int16_t gamma, int16_t delta);
+#define av1_warp_affine av1_warp_affine_c
+
+void av1_wedge_compute_delta_squares_c(int16_t *d, const int16_t *a, const int16_t *b, int N);
+#define av1_wedge_compute_delta_squares av1_wedge_compute_delta_squares_c
+
+int av1_wedge_sign_from_residuals_c(const int16_t *ds, const uint8_t *m, int N, int64_t limit);
+#define av1_wedge_sign_from_residuals av1_wedge_sign_from_residuals_c
+
+uint64_t av1_wedge_sse_from_residuals_c(const int16_t *r1, const int16_t *d, const uint8_t *m, int N);
+#define av1_wedge_sse_from_residuals av1_wedge_sse_from_residuals_c
+
+void av1_wiener_convolve_add_src_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, const ConvolveParams *conv_params);
+#define av1_wiener_convolve_add_src av1_wiener_convolve_add_src_c
+
+void cdef_filter_block_c(uint8_t *dst8, uint16_t *dst16, int dstride, const uint16_t *in, int pri_strength, int sec_strength, int dir, int pri_damping, int sec_damping, int bsize, int coeff_shift);
+#define cdef_filter_block cdef_filter_block_c
+
+int cdef_find_dir_c(const uint16_t *img, int stride, int32_t *var, int coeff_shift);
+#define cdef_find_dir cdef_find_dir_c
+
+cfl_subsample_hbd_fn cfl_get_luma_subsampling_420_hbd_c(TX_SIZE tx_size);
+#define cfl_get_luma_subsampling_420_hbd cfl_get_luma_subsampling_420_hbd_c
+
+cfl_subsample_lbd_fn cfl_get_luma_subsampling_420_lbd_c(TX_SIZE tx_size);
+#define cfl_get_luma_subsampling_420_lbd cfl_get_luma_subsampling_420_lbd_c
+
+cfl_subsample_hbd_fn cfl_get_luma_subsampling_422_hbd_c(TX_SIZE tx_size);
+#define cfl_get_luma_subsampling_422_hbd cfl_get_luma_subsampling_422_hbd_c
+
+cfl_subsample_lbd_fn cfl_get_luma_subsampling_422_lbd_c(TX_SIZE tx_size);
+#define cfl_get_luma_subsampling_422_lbd cfl_get_luma_subsampling_422_lbd_c
+
+cfl_subsample_hbd_fn cfl_get_luma_subsampling_444_hbd_c(TX_SIZE tx_size);
+#define cfl_get_luma_subsampling_444_hbd cfl_get_luma_subsampling_444_hbd_c
+
+cfl_subsample_lbd_fn cfl_get_luma_subsampling_444_lbd_c(TX_SIZE tx_size);
+#define cfl_get_luma_subsampling_444_lbd cfl_get_luma_subsampling_444_lbd_c
+
+double compute_cross_correlation_c(unsigned char *im1, int stride1, int x1, int y1, unsigned char *im2, int stride2, int x2, int y2);
+#define compute_cross_correlation compute_cross_correlation_c
+
+void copy_rect8_16bit_to_16bit_c(uint16_t *dst, int dstride, const uint16_t *src, int sstride, int v, int h);
+#define copy_rect8_16bit_to_16bit copy_rect8_16bit_to_16bit_c
+
+void copy_rect8_8bit_to_16bit_c(uint16_t *dst, int dstride, const uint8_t *src, int sstride, int v, int h);
+#define copy_rect8_8bit_to_16bit copy_rect8_8bit_to_16bit_c
+
+cfl_predict_hbd_fn get_predict_hbd_fn_c(TX_SIZE tx_size);
+#define get_predict_hbd_fn get_predict_hbd_fn_c
+
+cfl_predict_lbd_fn get_predict_lbd_fn_c(TX_SIZE tx_size);
+#define get_predict_lbd_fn get_predict_lbd_fn_c
+
+cfl_subtract_average_fn get_subtract_average_fn_c(TX_SIZE tx_size);
+#define get_subtract_average_fn get_subtract_average_fn_c
+
+void av1_rtcd(void);
+
+#include "config/aom_config.h"
+
+#ifdef RTCD_C
+static void setup_rtcd_internal(void)
+{
+}
+#endif
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif

diff --git a/libav1/build/libav1.vcxproj b/libav1/build/libav1.vcxproj
new file mode 100644
index 0000000..05d80f0
--- /dev/null
+++ b/libav1/build/libav1.vcxproj

@@ -0,0 +1,999 @@
+<?xml version="1.0" encoding="utf-8"?>

+<Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

+  <ItemGroup Label="ProjectConfigurations">

+    <ProjectConfiguration Include="Debug|Win32">

+      <Configuration>Debug</Configuration>

+      <Platform>Win32</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Debug|x64">

+      <Configuration>Debug</Configuration>

+      <Platform>x64</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Release|Win32">

+      <Configuration>Release</Configuration>

+      <Platform>Win32</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Release|x64">

+      <Configuration>Release</Configuration>

+      <Platform>x64</Platform>

+    </ProjectConfiguration>

+  </ItemGroup>

+  <PropertyGroup Label="Globals">

+    <ProjectGuid>{E68F3562-9876-4971-8D90-27035E27A825}</ProjectGuid>

+    <Keyword>StaticLibrary</Keyword>

+    <RootNamespace>libav1</RootNamespace>

+    <DefaultLanguage>en-US</DefaultLanguage>

+    <MinimumVisualStudioVersion>14.0</MinimumVisualStudioVersion>

+    <AppContainerApplication>true</AppContainerApplication>

+    <ApplicationType>Windows Store</ApplicationType>

+    <WindowsTargetPlatformVersion>10.0.17763.0</WindowsTargetPlatformVersion>

+    <WindowsTargetPlatformMinVersion>10.0.16299.0</WindowsTargetPlatformMinVersion>

+    <ApplicationTypeRevision>10.0</ApplicationTypeRevision>

+  </PropertyGroup>

+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>true</UseDebugLibraries>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>true</UseDebugLibraries>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>false</UseDebugLibraries>

+    <WholeProgramOptimization>true</WholeProgramOptimization>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>false</UseDebugLibraries>

+    <WholeProgramOptimization>true</WholeProgramOptimization>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />

+  <ImportGroup Label="ExtensionSettings">

+  </ImportGroup>

+  <ImportGroup Label="Shared">

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <PropertyGroup Label="UserMacros" />

+  <PropertyGroup />

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+    <ClCompile>

+      <PrecompiledHeader>Use</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+      <AdditionalIncludeDirectories>$(ProjectDir)..\;$(ProjectDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <PreprocessorDefinitions>_UNICODE;UNICODE;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+    <FxCompile>

+      <VariableName>%(Filename)</VariableName>

+    </FxCompile>

+    <FxCompile>

+      <HeaderFileOutput>$(SolutionDir)libav1\dx\shaders\h\%(Filename).h</HeaderFileOutput>

+      <ShaderModel>5.0</ShaderModel>

+      <ShaderType>Compute</ShaderType>

+      <ObjectFileOutput />

+    </FxCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+    <ClCompile>

+      <PrecompiledHeader>Use</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+      <AdditionalIncludeDirectories>$(ProjectDir)..\;$(ProjectDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <PreprocessorDefinitions>_UNICODE;UNICODE;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+    <FxCompile>

+      <VariableName>%(Filename)</VariableName>

+    </FxCompile>

+    <FxCompile>

+      <HeaderFileOutput>$(SolutionDir)libav1\dx\shaders\h\%(Filename).h</HeaderFileOutput>

+      <ShaderModel>5.0</ShaderModel>

+      <ShaderType>Compute</ShaderType>

+      <ObjectFileOutput />

+    </FxCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|arm'">

+    <ClCompile>

+      <PrecompiledHeader>Use</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|arm'">

+    <ClCompile>

+      <PrecompiledHeader>Use</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+    <ClCompile>

+      <PrecompiledHeader>NotUsing</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+      <PreprocessorDefinitions>_CRT_SECURE_NO_WARNINGS;_UNICODE;UNICODE;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+      <AdditionalIncludeDirectories>$(ProjectDir)..\;$(ProjectDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+    <FxCompile>

+      <VariableName>%(Filename)</VariableName>

+    </FxCompile>

+    <FxCompile>

+      <HeaderFileOutput>$(SolutionDir)libav1\dx\shaders\h\%(Filename).h</HeaderFileOutput>

+      <ShaderModel>5.0</ShaderModel>

+      <ShaderType>Compute</ShaderType>

+      <ObjectFileOutput />

+      <DisableOptimizations>true</DisableOptimizations>

+    </FxCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+    <ClCompile>

+      <PrecompiledHeader>NotUsing</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+      <PreprocessorDefinitions>_CRT_SECURE_NO_WARNINGS;_UNICODE;UNICODE;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+      <AdditionalIncludeDirectories>$(ProjectDir)..\;$(ProjectDir);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <Optimization>MinSpace</Optimization>

+      <FavorSizeOrSpeed>Speed</FavorSizeOrSpeed>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+    <FxCompile>

+      <VariableName>%(Filename)</VariableName>

+    </FxCompile>

+    <FxCompile>

+      <HeaderFileOutput>$(SolutionDir)libav1\dx\shaders\h\%(Filename).h</HeaderFileOutput>

+      <ShaderModel>5.0</ShaderModel>

+      <ShaderType>Compute</ShaderType>

+      <ObjectFileOutput />

+      <DisableOptimizations>false</DisableOptimizations>

+    </FxCompile>

+  </ItemDefinitionGroup>

+  <ItemGroup>

+    <ClCompile Include="..\aom\src\aom_codec.c" />

+    <ClCompile Include="..\aom\src\aom_decoder.c" />

+    <ClCompile Include="..\aom\src\aom_image.c" />

+    <ClCompile Include="..\aom\src\aom_integer.c" />

+    <ClCompile Include="..\aom_dsp\aom_convolve.c" />

+    <ClCompile Include="..\aom_dsp\aom_dsp_rtcd.c" />

+    <ClCompile Include="..\aom_dsp\avg.c" />

+    <ClCompile Include="..\aom_dsp\binary_codes_reader.c" />

+    <ClCompile Include="..\aom_dsp\bitreader_buffer.c" />

+    <ClCompile Include="..\aom_dsp\blend_a64_hmask.c" />

+    <ClCompile Include="..\aom_dsp\blend_a64_mask.c" />

+    <ClCompile Include="..\aom_dsp\blend_a64_vmask.c" />

+    <ClCompile Include="..\aom_dsp\daalaboolreader.c">

+      <Optimization Condition="'$(Configuration)|$(Platform)'=='Release|x64'">MaxSpeed</Optimization>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\entcode.c">

+      <Optimization Condition="'$(Configuration)|$(Platform)'=='Release|x64'">MaxSpeed</Optimization>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\entdec.c" />

+    <ClCompile Include="..\aom_dsp\fft.c" />

+    <ClCompile Include="..\aom_dsp\fwd_txfm.c" />

+    <ClCompile Include="..\aom_dsp\grain_synthesis.c" />

+    <ClCompile Include="..\aom_dsp\intrapred.c" />

+    <ClCompile Include="..\aom_dsp\loopfilter.c" />

+    <ClCompile Include="..\aom_dsp\subtract.c" />

+    <ClCompile Include="..\aom_mem\aom_mem.c" />

+    <ClCompile Include="..\aom_scale\aom_scale_rtcd.c" />

+    <ClCompile Include="..\aom_scale\generic\aom_scale.c" />

+    <ClCompile Include="..\aom_scale\generic\gen_scalers.c" />

+    <ClCompile Include="..\aom_scale\generic\yv12config.c" />

+    <ClCompile Include="..\aom_scale\generic\yv12extend.c" />

+    <ClCompile Include="..\aom_util\aom_thread.c" />

+    <ClCompile Include="..\aom_util\debug_util.c" />

+    <ClCompile Include="..\av1\av1_dx_iface.c" />

+    <ClCompile Include="..\av1\common\alloccommon.c" />

+    <ClCompile Include="..\av1\common\av1_inv_txfm1d.c" />

+    <ClCompile Include="..\av1\common\av1_inv_txfm2d.c" />

+    <ClCompile Include="..\av1\common\av1_loopfilter.c" />

+    <ClCompile Include="..\av1\common\av1_rtcd.c" />

+    <ClCompile Include="..\av1\common\av1_txfm.c" />

+    <ClCompile Include="..\av1\common\blockd.c" />

+    <ClCompile Include="..\av1\common\cdef.c" />

+    <ClCompile Include="..\av1\common\cdef_block.c" />

+    <ClCompile Include="..\av1\common\cfl.c" />

+    <ClCompile Include="..\av1\common\convolve.c" />

+    <ClCompile Include="..\av1\common\debugmodes.c" />

+    <ClCompile Include="..\av1\common\entropy.c" />

+    <ClCompile Include="..\av1\common\entropymode.c">

+      <Optimization Condition="'$(Configuration)|$(Platform)'=='Release|x64'">MaxSpeed</Optimization>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\entropymv.c">

+      <Optimization Condition="'$(Configuration)|$(Platform)'=='Release|x64'">MaxSpeed</Optimization>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\frame_buffers.c" />

+    <ClCompile Include="..\av1\common\idct.c" />

+    <ClCompile Include="..\av1\common\mvref_common.c">

+      <Optimization Condition="'$(Configuration)|$(Platform)'=='Release|x64'">MaxSpeed</Optimization>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\obu_util.c" />

+    <ClCompile Include="..\av1\common\odintrin.c" />

+    <ClCompile Include="..\av1\common\pred_common.c" />

+    <ClCompile Include="..\av1\common\quant_common.c" />

+    <ClCompile Include="..\av1\common\reconinter.c" />

+    <ClCompile Include="..\av1\common\reconintra.c" />

+    <ClCompile Include="..\av1\common\resize.c" />

+    <ClCompile Include="..\av1\common\restoration.c" />

+    <ClCompile Include="..\av1\common\scale.c" />

+    <ClCompile Include="..\av1\common\scan.c" />

+    <ClCompile Include="..\av1\common\seg_common.c" />

+    <ClCompile Include="..\av1\common\thread_common.c" />

+    <ClCompile Include="..\av1\common\tile_common.c" />

+    <ClCompile Include="..\av1\common\timing.c" />

+    <ClCompile Include="..\av1\common\txb_common.c" />

+    <ClCompile Include="..\av1\common\warped_motion.c" />

+    <ClCompile Include="..\av1\decoder\accounting.c" />

+    <ClCompile Include="..\av1\decoder\decodeframe.c" />

+    <ClCompile Include="..\av1\decoder\decodemv.c">

+      <Optimization Condition="'$(Configuration)|$(Platform)'=='Release|x64'">MaxSpeed</Optimization>

+    </ClCompile>

+    <ClCompile Include="..\av1\decoder\decoder.c" />

+    <ClCompile Include="..\av1\decoder\decodetxb.c" />

+    <ClCompile Include="..\av1\decoder\detokenize.c">

+      <Optimization Condition="'$(Configuration)|$(Platform)'=='Release|x64'">MaxSpeed</Optimization>

+    </ClCompile>

+    <ClCompile Include="..\av1\decoder\obu.c" />

+    <ClCompile Include="..\dx\av1_core.cpp" />

+    <ClCompile Include="..\dx\av1_memory.cpp" />

+    <ClCompile Include="..\dx\av1_compute.cpp" />

+    <ClCompile Include="..\dx\av1_thread.cpp" />

+    <ClCompile Include="..\dx\prediction.cpp" />

+    <ClCompile Include="..\dx\loopfilters.cpp">

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+    </ClCompile>

+    <ClCompile Include="..\dx\transform.cpp" />

+    <ClCompile Include="..\dx\vp9dx_linear_allocator.cpp" />

+  </ItemGroup>

+  <ItemGroup>

+    <ClInclude Include="..\aom\aom.h" />

+    <ClInclude Include="..\aom\aomdx.h" />

+    <ClInclude Include="..\aom\aom_codec.h" />

+    <ClInclude Include="..\aom\aom_decoder.h" />

+    <ClInclude Include="..\aom\aom_frame_buffer.h" />

+    <ClInclude Include="..\aom\aom_image.h" />

+    <ClInclude Include="..\aom\aom_integer.h" />

+    <ClInclude Include="..\aom\internal\aom_codec_internal.h" />

+    <ClInclude Include="..\aom_dsp\aom_dsp_common.h" />

+    <ClInclude Include="..\aom_dsp\aom_filter.h" />

+    <ClInclude Include="..\aom_dsp\aom_simd.h" />

+    <ClInclude Include="..\aom_dsp\aom_simd_inline.h" />

+    <ClInclude Include="..\aom_dsp\binary_codes_reader.h" />

+    <ClInclude Include="..\aom_dsp\bitreader.h" />

+    <ClInclude Include="..\aom_dsp\bitreader_buffer.h" />

+    <ClInclude Include="..\aom_dsp\blend.h" />

+    <ClInclude Include="..\aom_dsp\daalaboolreader.h" />

+    <ClInclude Include="..\aom_dsp\entcode.h" />

+    <ClInclude Include="..\aom_dsp\entdec.h" />

+    <ClInclude Include="..\aom_dsp\fft_common.h" />

+    <ClInclude Include="..\aom_dsp\grain_synthesis.h" />

+    <ClInclude Include="..\aom_dsp\intrapred_common.h" />

+    <ClInclude Include="..\aom_dsp\postproc.h" />

+    <ClInclude Include="..\aom_dsp\prob.h" />

+    <ClInclude Include="..\aom_dsp\recenter.h" />

+    <ClInclude Include="..\aom_dsp\txfm_common.h" />

+    <ClInclude Include="..\aom_mem\aom_mem.h" />

+    <ClInclude Include="..\aom_mem\include\aom_mem_intrnl.h" />

+    <ClInclude Include="..\aom_ports\aom_once.h" />

+    <ClInclude Include="..\aom_ports\aom_timer.h" />

+    <ClInclude Include="..\aom_ports\bitops.h" />

+    <ClInclude Include="..\aom_ports\emmintrin_compat.h" />

+    <ClInclude Include="..\aom_ports\mem.h" />

+    <ClInclude Include="..\aom_ports\mem_ops.h" />

+    <ClInclude Include="..\aom_ports\mem_ops_aligned.h" />

+    <ClInclude Include="..\aom_ports\msvc.h" />

+    <ClInclude Include="..\aom_ports\sanitizer.h" />

+    <ClInclude Include="..\aom_ports\system_state.h" />

+    <ClInclude Include="..\aom_ports\x86.h" />

+    <ClInclude Include="..\aom_scale\aom_scale.h" />

+    <ClInclude Include="..\aom_scale\yv12config.h" />

+    <ClInclude Include="..\aom_util\aom_thread.h" />

+    <ClInclude Include="..\aom_util\debug_util.h" />

+    <ClInclude Include="..\aom_util\endian_inl.h" />

+    <ClInclude Include="..\av1\av1_iface_common.h" />

+    <ClInclude Include="..\av1\common\alloccommon.h" />

+    <ClInclude Include="..\av1\common\av1_inv_txfm1d.h" />

+    <ClInclude Include="..\av1\common\av1_inv_txfm1d_cfg.h" />

+    <ClInclude Include="..\av1\common\av1_loopfilter.h" />

+    <ClInclude Include="..\av1\common\av1_txfm.h" />

+    <ClInclude Include="..\av1\common\blockd.h" />

+    <ClInclude Include="..\av1\common\cdef.h" />

+    <ClInclude Include="..\av1\common\cdef_block.h" />

+    <ClInclude Include="..\av1\common\cfl.h" />

+    <ClInclude Include="..\av1\common\common.h" />

+    <ClInclude Include="..\av1\common\common_data.h" />

+    <ClInclude Include="..\av1\common\convolve.h" />

+    <ClInclude Include="..\av1\common\entropy.h" />

+    <ClInclude Include="..\av1\common\entropymode.h" />

+    <ClInclude Include="..\av1\common\entropymv.h" />

+    <ClInclude Include="..\av1\common\enums.h" />

+    <ClInclude Include="..\av1\common\filter.h" />

+    <ClInclude Include="..\av1\common\frame_buffers.h" />

+    <ClInclude Include="..\av1\common\idct.h" />

+    <ClInclude Include="..\av1\common\mv.h" />

+    <ClInclude Include="..\av1\common\mvref_common.h" />

+    <ClInclude Include="..\av1\common\obmc.h" />

+    <ClInclude Include="..\av1\common\obu_util.h" />

+    <ClInclude Include="..\av1\common\odintrin.h" />

+    <ClInclude Include="..\av1\common\onyxc_int.h" />

+    <ClInclude Include="..\av1\common\pred_common.h" />

+    <ClInclude Include="..\av1\common\quant_common.h" />

+    <ClInclude Include="..\av1\common\reconinter.h" />

+    <ClInclude Include="..\av1\common\reconintra.h" />

+    <ClInclude Include="..\av1\common\resize.h" />

+    <ClInclude Include="..\av1\common\restoration.h" />

+    <ClInclude Include="..\av1\common\scale.h" />

+    <ClInclude Include="..\av1\common\scan.h" />

+    <ClInclude Include="..\av1\common\seg_common.h" />

+    <ClInclude Include="..\av1\common\thread_common.h" />

+    <ClInclude Include="..\av1\common\tile_common.h" />

+    <ClInclude Include="..\av1\common\timing.h" />

+    <ClInclude Include="..\av1\common\token_cdfs.h" />

+    <ClInclude Include="..\av1\common\txb_common.h" />

+    <ClInclude Include="..\av1\common\warped_motion.h" />

+    <ClInclude Include="..\av1\decoder\accounting.h" />

+    <ClInclude Include="..\av1\decoder\decodeframe.h" />

+    <ClInclude Include="..\av1\decoder\decodemv.h" />

+    <ClInclude Include="..\av1\decoder\decoder.h" />

+    <ClInclude Include="..\av1\decoder\decodetxb.h" />

+    <ClInclude Include="..\av1\decoder\detokenize.h" />

+    <ClInclude Include="..\av1\decoder\dthread.h" />

+    <ClInclude Include="..\av1\decoder\inspection.h" />

+    <ClInclude Include="..\av1\decoder\obu.h" />

+    <ClInclude Include="..\dx\av1_core.h" />

+    <ClInclude Include="..\dx\av1_memory.h" />

+    <ClInclude Include="..\dx\av1_compute.h" />

+    <ClInclude Include="..\dx\av1_thread.h" />

+    <ClInclude Include="..\dx\shaders\film_grain_const.h" />

+    <ClInclude Include="..\dx\shaders\mode_info.h" />

+    <ClInclude Include="..\dx\shaders\restoration.h" />

+    <ClInclude Include="..\dx\shaders\idct_shader_common.h">

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </ClInclude>

+    <ClInclude Include="..\dx\shaders\inter_common.h" />

+    <ClInclude Include="..\dx\types.h" />

+    <ClInclude Include="..\dx\vp9dx_linear_allocator.h" />

+  </ItemGroup>

+  <ItemGroup>

+    <FxCompile Include="..\dx\shaders\copy_plane.hlsl" />

+    <FxCompile Include="..\dx\shaders\copy_plane_10bit10x3.hlsl" />

+    <FxCompile Include="..\dx\shaders\fill_buffer.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">4.0</ShaderModel>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">4.0</ShaderModel>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\film_grain_filter.hlsl" />

+    <FxCompile Include="..\dx\shaders\film_grain_gen_chroma.hlsl" />

+    <FxCompile Include="..\dx\shaders\film_grain_gen_luma.hlsl" />

+    <FxCompile Include="..\dx\shaders\gen_lf_blocks_hor.hlsl" />

+    <FxCompile Include="..\dx\shaders\gen_lf_blocks_vert.hlsl" />

+    <FxCompile Include="..\dx\shaders\gen_pred_blocks.hlsl">

+      <EnableDebuggingInformation Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">false</EnableDebuggingInformation>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct16x16.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct16x32.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct16x4.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct16x64.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct16x8.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct32x16.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct32x32.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct32x64.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct32x8.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct4x16.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct4x4.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <AssemblerOutput Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </AssemblerOutput>

+      <AssemblerOutputFile Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </AssemblerOutputFile>

+      <AssemblerOutput Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </AssemblerOutput>

+      <AssemblerOutputFile Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </AssemblerOutputFile>

+      <AssemblerOutput Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </AssemblerOutput>

+      <AssemblerOutputFile Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </AssemblerOutputFile>

+      <AssemblerOutput Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </AssemblerOutput>

+      <AssemblerOutputFile Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </AssemblerOutputFile>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct4x8.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct64x16.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct64x32.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct64x64.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct8x16.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct8x32.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct8x4.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct8x8.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </ExcludedFromBuild>

+      <ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </ExcludedFromBuild>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct_lossless.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct_sort_blocks.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compute</ShaderType>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Compute</ShaderType>

+      <FileType>Document</FileType>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+      </DeploymentContent>

+      <DeploymentContent Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+      </DeploymentContent>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_2x2chroma.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_2x2chroma_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound_2x2.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound_2x2_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound_diff_chroma.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound_diff_chroma_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound_diff_luma.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound_diff_luma_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound_masked.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_compound_masked_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_ext_borders.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_ext_borders_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_main.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_main_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_obmc_above.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_obmc_above_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_obmc_left.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_obmc_left_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_2x2chroma.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_2x2chroma_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_2x2.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_2x2_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_diff_chroma.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_diff_chroma_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_diff_luma.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_diff_luma_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_masked.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_masked_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_obmc_above.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_obmc_above_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_obmc_left.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_obmc_left_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_warp_compound.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_scale_warp_compound_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_warp.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_warp_compound.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_warp_compound_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\inter_warp_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\intra_filter.hlsl" />

+    <FxCompile Include="..\dx\shaders\intra_filter_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\intra_main.hlsl">

+      <DisableOptimizations Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">true</DisableOptimizations>

+      <DisableOptimizations Condition="'$(Configuration)|$(Platform)'=='Release|x64'">true</DisableOptimizations>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\intra_main_hbd.hlsl">

+      <DisableOptimizations Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">true</DisableOptimizations>

+      <DisableOptimizations Condition="'$(Configuration)|$(Platform)'=='Release|x64'">true</DisableOptimizations>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\loopfilter_hor.hlsl" />

+    <FxCompile Include="..\dx\shaders\loopfilter_hor_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\loopfilter_vert.hlsl" />

+    <FxCompile Include="..\dx\shaders\loopfilter_vert_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\reconstruct_block.hlsl" />

+    <FxCompile Include="..\dx\shaders\reconstruct_block_hbd.hlsl" />

+    <FxCompile Include="..\dx\shaders\upscaling.hlsl" />

+  </ItemGroup>

+  <ItemGroup>

+    <FxCompile Include="..\dx\shaders\cdef_filter.hlsl">

+      <FileType>Document</FileType>

+    </FxCompile>

+  </ItemGroup>

+  <ItemGroup>

+    <FxCompile Include="..\dx\shaders\loop_restoration.hlsl">

+      <FileType>Document</FileType>

+    </FxCompile>

+  </ItemGroup>

+  <ItemGroup>

+    <None Include="..\LICENSE" />

+    <None Include="..\PATENTS" />

+  </ItemGroup>

+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />

+  <ImportGroup Label="ExtensionTargets">

+  </ImportGroup>

+</Project>
\ No newline at end of file

diff --git a/libav1/build/libav1.vcxproj.filters b/libav1/build/libav1.vcxproj.filters
new file mode 100644
index 0000000..3240c79
--- /dev/null
+++ b/libav1/build/libav1.vcxproj.filters

@@ -0,0 +1,861 @@
+<?xml version="1.0" encoding="utf-8"?>

+<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

+  <ItemGroup>

+    <ClCompile Include="..\av1\av1_dx_iface.c">

+      <Filter>av1</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\alloccommon.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\av1_inv_txfm1d.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\av1_inv_txfm2d.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\av1_loopfilter.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\av1_rtcd.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\av1_txfm.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\blockd.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\cdef.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\cdef_block.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\cfl.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\convolve.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\debugmodes.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\entropy.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\entropymode.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\entropymv.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\frame_buffers.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\idct.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\mvref_common.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\obu_util.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\odintrin.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\pred_common.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\quant_common.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\reconinter.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\reconintra.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\resize.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\restoration.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\scale.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\scan.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\seg_common.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\thread_common.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\tile_common.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\timing.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\txb_common.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\common\warped_motion.c">

+      <Filter>av1\common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\decoder\accounting.c">

+      <Filter>av1\decoder</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\decoder\decodeframe.c">

+      <Filter>av1\decoder</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\decoder\decodemv.c">

+      <Filter>av1\decoder</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\decoder\decoder.c">

+      <Filter>av1\decoder</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\decoder\decodetxb.c">

+      <Filter>av1\decoder</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\decoder\detokenize.c">

+      <Filter>av1\decoder</Filter>

+    </ClCompile>

+    <ClCompile Include="..\av1\decoder\obu.c">

+      <Filter>av1\decoder</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\aom_convolve.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\aom_dsp_rtcd.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\avg.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\binary_codes_reader.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\bitreader_buffer.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\blend_a64_hmask.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\blend_a64_mask.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\blend_a64_vmask.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\daalaboolreader.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\entcode.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\entdec.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\fft.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\fwd_txfm.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\grain_synthesis.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\intrapred.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\loopfilter.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_dsp\subtract.c">

+      <Filter>aom_dsp</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_mem\aom_mem.c">

+      <Filter>aom_mem</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_scale\aom_scale_rtcd.c">

+      <Filter>aom_scale</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_scale\generic\aom_scale.c">

+      <Filter>aom_scale</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_scale\generic\gen_scalers.c">

+      <Filter>aom_scale</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_scale\generic\yv12config.c">

+      <Filter>aom_scale</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_scale\generic\yv12extend.c">

+      <Filter>aom_scale</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_util\aom_thread.c">

+      <Filter>aom_util</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom_util\debug_util.c">

+      <Filter>aom_util</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom\src\aom_codec.c">

+      <Filter>aom</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom\src\aom_decoder.c">

+      <Filter>aom</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom\src\aom_image.c">

+      <Filter>aom</Filter>

+    </ClCompile>

+    <ClCompile Include="..\aom\src\aom_integer.c">

+      <Filter>aom</Filter>

+    </ClCompile>

+    <ClCompile Include="..\dx\av1_core.cpp">

+      <Filter>dx</Filter>

+    </ClCompile>

+    <ClCompile Include="..\dx\av1_memory.cpp">

+      <Filter>dx</Filter>

+    </ClCompile>

+    <ClCompile Include="..\dx\vp9dx_linear_allocator.cpp">

+      <Filter>dx</Filter>

+    </ClCompile>

+    <ClCompile Include="..\dx\av1_compute.cpp">

+      <Filter>dx</Filter>

+    </ClCompile>

+    <ClCompile Include="..\dx\transform.cpp">

+      <Filter>dx</Filter>

+    </ClCompile>

+    <ClCompile Include="..\dx\prediction.cpp">

+      <Filter>dx</Filter>

+    </ClCompile>

+    <ClCompile Include="..\dx\loopfilters.cpp">

+      <Filter>dx</Filter>

+    </ClCompile>

+    <ClCompile Include="..\dx\av1_thread.cpp">

+      <Filter>dx</Filter>

+    </ClCompile>

+  </ItemGroup>

+  <ItemGroup>

+    <ClInclude Include="..\av1\av1_iface_common.h">

+      <Filter>av1</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\alloccommon.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\av1_inv_txfm1d.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\av1_inv_txfm1d_cfg.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\av1_loopfilter.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\av1_txfm.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\blockd.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\cdef.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\cdef_block.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\cfl.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\common.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\common_data.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\convolve.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\entropy.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\entropymode.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\entropymv.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\enums.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\filter.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\frame_buffers.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\idct.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\mv.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\mvref_common.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\obmc.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\obu_util.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\odintrin.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\onyxc_int.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\pred_common.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\quant_common.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\reconinter.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\reconintra.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\resize.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\restoration.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\scale.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\scan.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\seg_common.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\thread_common.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\tile_common.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\timing.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\token_cdfs.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\txb_common.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\common\warped_motion.h">

+      <Filter>av1\common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\decoder\accounting.h">

+      <Filter>av1\decoder</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\decoder\decodeframe.h">

+      <Filter>av1\decoder</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\decoder\decodemv.h">

+      <Filter>av1\decoder</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\decoder\decoder.h">

+      <Filter>av1\decoder</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\decoder\decodetxb.h">

+      <Filter>av1\decoder</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\decoder\detokenize.h">

+      <Filter>av1\decoder</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\decoder\dthread.h">

+      <Filter>av1\decoder</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\decoder\inspection.h">

+      <Filter>av1\decoder</Filter>

+    </ClInclude>

+    <ClInclude Include="..\av1\decoder\obu.h">

+      <Filter>av1\decoder</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom\aom.h">

+      <Filter>aom</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom\aom_codec.h">

+      <Filter>aom</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom\aom_decoder.h">

+      <Filter>aom</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom\aom_frame_buffer.h">

+      <Filter>aom</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom\aom_image.h">

+      <Filter>aom</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom\aom_integer.h">

+      <Filter>aom</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom\aomdx.h">

+      <Filter>aom</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom\internal\aom_codec_internal.h">

+      <Filter>aom</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\aom_dsp_common.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\aom_filter.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\aom_simd.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\aom_simd_inline.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\binary_codes_reader.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\bitreader.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\bitreader_buffer.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\blend.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\daalaboolreader.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\entcode.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\entdec.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\fft_common.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\grain_synthesis.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\intrapred_common.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\postproc.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\prob.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\recenter.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_dsp\txfm_common.h">

+      <Filter>aom_dsp</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_mem\aom_mem.h">

+      <Filter>aom_mem</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_mem\include\aom_mem_intrnl.h">

+      <Filter>aom_mem</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\aom_once.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\aom_timer.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\bitops.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\emmintrin_compat.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\mem.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\mem_ops.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\mem_ops_aligned.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\msvc.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\sanitizer.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\system_state.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_ports\x86.h">

+      <Filter>aom_ports</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_scale\aom_scale.h">

+      <Filter>aom_scale</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_scale\yv12config.h">

+      <Filter>aom_scale</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_util\aom_thread.h">

+      <Filter>aom_util</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_util\debug_util.h">

+      <Filter>aom_util</Filter>

+    </ClInclude>

+    <ClInclude Include="..\aom_util\endian_inl.h">

+      <Filter>aom_util</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\av1_core.h">

+      <Filter>dx</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\av1_memory.h">

+      <Filter>dx</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\vp9dx_linear_allocator.h">

+      <Filter>dx</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\av1_compute.h">

+      <Filter>dx</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\shaders\idct_shader_common.h">

+      <Filter>dx\shaders</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\shaders\inter_common.h">

+      <Filter>dx\shaders</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\types.h">

+      <Filter>dx</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\shaders\restoration.h">

+      <Filter>dx\shaders</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\shaders\film_grain_const.h">

+      <Filter>dx\shaders</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\av1_thread.h">

+      <Filter>dx</Filter>

+    </ClInclude>

+    <ClInclude Include="..\dx\shaders\mode_info.h">

+      <Filter>dx\shaders</Filter>

+    </ClInclude>

+  </ItemGroup>

+  <ItemGroup>

+    <Filter Include="av1">

+      <UniqueIdentifier>{807f0bf5-c3bf-4e2e-862a-13905c9fba90}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="av1\common">

+      <UniqueIdentifier>{5adc485d-02d3-493f-936b-4a28b4067223}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="av1\decoder">

+      <UniqueIdentifier>{a2871d17-bd36-4f77-8ae5-fb7fff4a388e}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="aom">

+      <UniqueIdentifier>{c9451528-03cf-43f3-a299-aff11f3933c6}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="aom_dsp">

+      <UniqueIdentifier>{a0643da7-6247-42b0-88a0-48d6546bd6da}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="aom_mem">

+      <UniqueIdentifier>{23662c58-cbb8-49ea-9dca-0acbc1d4e499}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="aom_ports">

+      <UniqueIdentifier>{0a1cf790-c078-4806-abe4-836d189ad143}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="aom_scale">

+      <UniqueIdentifier>{4145e1f2-9b37-4d94-8c39-44d15628ddef}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="aom_util">

+      <UniqueIdentifier>{aa1780f0-b7d3-4975-96dc-76a182ab82b7}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="dx">

+      <UniqueIdentifier>{1aeab06d-a69e-4e7f-90e8-1560c6b05d43}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="dx\shaders">

+      <UniqueIdentifier>{bdfb7f97-4a0f-4adf-8233-520135551539}</UniqueIdentifier>

+    </Filter>

+  </ItemGroup>

+  <ItemGroup>

+    <FxCompile Include="..\dx\shaders\idct16x8.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct32x8.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct32x32.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct16x16.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct16x32.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct32x16.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct64x64.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct64x16.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct16x64.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct64x32.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct32x64.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct_lossless.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct_sort_blocks.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\fill_buffer.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\gen_pred_blocks.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_2x2chroma.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_2x2chroma_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound_2x2.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound_2x2_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound_diff_chroma.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound_diff_chroma_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound_diff_luma.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound_diff_luma_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound_masked.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_compound_masked_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_ext_borders.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_ext_borders_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_main.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_main_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_obmc_above.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_obmc_above_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_obmc_left.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_obmc_left_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_warp.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_warp_compound.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_warp_compound_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_warp_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\intra_filter.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\intra_filter_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\intra_main.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\intra_main_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\copy_plane.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\reconstruct_block.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\reconstruct_block_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\gen_lf_blocks_hor.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\gen_lf_blocks_vert.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\loopfilter_hor.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\loopfilter_hor_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\loopfilter_vert.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\loopfilter_vert_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\cdef_filter.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\loop_restoration.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\film_grain_filter.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\film_grain_gen_chroma.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\film_grain_gen_luma.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct4x4.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct4x8.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct4x16.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct8x4.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct8x8.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct8x16.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct8x32.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\idct16x4.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\copy_plane_10bit10x3.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_2x2chroma.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_2x2chroma_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_2x2.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_2x2_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_diff_chroma.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_diff_chroma_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_diff_luma.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_diff_luma_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_masked.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_compound_masked_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_obmc_above.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_obmc_above_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_obmc_left.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_obmc_left_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_warp_compound.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale_warp_compound_hbd.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\upscaling.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+    <FxCompile Include="..\dx\shaders\inter_scale.hlsl">

+      <Filter>dx\shaders</Filter>

+    </FxCompile>

+  </ItemGroup>

+  <ItemGroup>

+    <None Include="..\LICENSE" />

+    <None Include="..\PATENTS" />

+  </ItemGroup>

+</Project>
\ No newline at end of file

diff --git a/libav1/d3dx12.h b/libav1/d3dx12.h
new file mode 100644
index 0000000..7eb73ff
--- /dev/null
+++ b/libav1/d3dx12.h

@@ -0,0 +1,1445 @@
+//*********************************************************
+//
+// Copyright (c) Microsoft. All rights reserved.
+// This code is licensed under the MIT License (MIT).
+// THIS CODE IS PROVIDED *AS IS* WITHOUT WARRANTY OF
+// ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY
+// IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR
+// PURPOSE, MERCHANTABILITY, OR NON-INFRINGEMENT.
+//
+//*********************************************************
+
+#ifndef __D3DX12_H__
+#define __D3DX12_H__
+
+#include "d3d12.h"
+
+#if defined(__cplusplus)
+
+struct CD3DX12_DEFAULT {};
+extern const DECLSPEC_SELECTANY CD3DX12_DEFAULT D3D12_DEFAULT;
+
+//------------------------------------------------------------------------------------------------
+inline bool operator==(const D3D12_VIEWPORT& l, const D3D12_VIEWPORT& r) {
+  return l.TopLeftX == r.TopLeftX && l.TopLeftY == r.TopLeftY && l.Width == r.Width && l.Height == r.Height &&
+         l.MinDepth == r.MinDepth && l.MaxDepth == r.MaxDepth;
+}
+
+//------------------------------------------------------------------------------------------------
+inline bool operator!=(const D3D12_VIEWPORT& l, const D3D12_VIEWPORT& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RECT : public D3D12_RECT {
+  CD3DX12_RECT() {}
+  explicit CD3DX12_RECT(const D3D12_RECT& o) : D3D12_RECT(o) {}
+  explicit CD3DX12_RECT(LONG Left, LONG Top, LONG Right, LONG Bottom) {
+    left = Left;
+    top = Top;
+    right = Right;
+    bottom = Bottom;
+  }
+  ~CD3DX12_RECT() {}
+  operator const D3D12_RECT&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_VIEWPORT : public D3D12_VIEWPORT {
+  CD3DX12_VIEWPORT() {}
+  explicit CD3DX12_VIEWPORT(const D3D12_VIEWPORT& o) : D3D12_VIEWPORT(o) {}
+  explicit CD3DX12_VIEWPORT(FLOAT topLeftX, FLOAT topLeftY, FLOAT width, FLOAT height, FLOAT minDepth = D3D12_MIN_DEPTH,
+                            FLOAT maxDepth = D3D12_MAX_DEPTH) {
+    TopLeftX = topLeftX;
+    TopLeftY = topLeftY;
+    Width = width;
+    Height = height;
+    MinDepth = minDepth;
+    MaxDepth = maxDepth;
+  }
+  explicit CD3DX12_VIEWPORT(_In_ ID3D12Resource* pResource, UINT mipSlice = 0, FLOAT topLeftX = 0.0f,
+                            FLOAT topLeftY = 0.0f, FLOAT minDepth = D3D12_MIN_DEPTH, FLOAT maxDepth = D3D12_MAX_DEPTH) {
+    D3D12_RESOURCE_DESC Desc = pResource->GetDesc();
+    const UINT64 SubresourceWidth = Desc.Width >> mipSlice;
+    const UINT64 SubresourceHeight = Desc.Height >> mipSlice;
+    switch (Desc.Dimension) {
+      case D3D12_RESOURCE_DIMENSION_BUFFER:
+        TopLeftX = topLeftX;
+        TopLeftY = 0.0f;
+        Width = Desc.Width - topLeftX;
+        Height = 1.0f;
+        break;
+      case D3D12_RESOURCE_DIMENSION_TEXTURE1D:
+        TopLeftX = topLeftX;
+        TopLeftY = 0.0f;
+        Width = (SubresourceWidth ? SubresourceWidth : 1.0f) - topLeftX;
+        Height = 1.0f;
+        break;
+      case D3D12_RESOURCE_DIMENSION_TEXTURE2D:
+      case D3D12_RESOURCE_DIMENSION_TEXTURE3D:
+        TopLeftX = topLeftX;
+        TopLeftY = topLeftY;
+        Width = (SubresourceWidth ? SubresourceWidth : 1.0f) - topLeftX;
+        Height = (SubresourceHeight ? SubresourceHeight : 1.0f) - topLeftY;
+        break;
+      default:
+        break;
+    }
+
+    MinDepth = minDepth;
+    MaxDepth = maxDepth;
+  }
+  ~CD3DX12_VIEWPORT() {}
+  operator const D3D12_VIEWPORT&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_BOX : public D3D12_BOX {
+  CD3DX12_BOX() {}
+  explicit CD3DX12_BOX(const D3D12_BOX& o) : D3D12_BOX(o) {}
+  explicit CD3DX12_BOX(LONG Left, LONG Right) {
+    left = Left;
+    top = 0;
+    front = 0;
+    right = Right;
+    bottom = 1;
+    back = 1;
+  }
+  explicit CD3DX12_BOX(LONG Left, LONG Top, LONG Right, LONG Bottom) {
+    left = Left;
+    top = Top;
+    front = 0;
+    right = Right;
+    bottom = Bottom;
+    back = 1;
+  }
+  explicit CD3DX12_BOX(LONG Left, LONG Top, LONG Front, LONG Right, LONG Bottom, LONG Back) {
+    left = Left;
+    top = Top;
+    front = Front;
+    right = Right;
+    bottom = Bottom;
+    back = Back;
+  }
+  ~CD3DX12_BOX() {}
+  operator const D3D12_BOX&() const { return *this; }
+};
+inline bool operator==(const D3D12_BOX& l, const D3D12_BOX& r) {
+  return l.left == r.left && l.top == r.top && l.front == r.front && l.right == r.right && l.bottom == r.bottom &&
+         l.back == r.back;
+}
+inline bool operator!=(const D3D12_BOX& l, const D3D12_BOX& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_DEPTH_STENCIL_DESC : public D3D12_DEPTH_STENCIL_DESC {
+  CD3DX12_DEPTH_STENCIL_DESC() {}
+  explicit CD3DX12_DEPTH_STENCIL_DESC(const D3D12_DEPTH_STENCIL_DESC& o) : D3D12_DEPTH_STENCIL_DESC(o) {}
+  explicit CD3DX12_DEPTH_STENCIL_DESC(CD3DX12_DEFAULT) {
+    DepthEnable = TRUE;
+    DepthWriteMask = D3D12_DEPTH_WRITE_MASK_ALL;
+    DepthFunc = D3D12_COMPARISON_FUNC_LESS;
+    StencilEnable = FALSE;
+    StencilReadMask = D3D12_DEFAULT_STENCIL_READ_MASK;
+    StencilWriteMask = D3D12_DEFAULT_STENCIL_WRITE_MASK;
+    const D3D12_DEPTH_STENCILOP_DESC defaultStencilOp = {D3D12_STENCIL_OP_KEEP, D3D12_STENCIL_OP_KEEP,
+                                                         D3D12_STENCIL_OP_KEEP, D3D12_COMPARISON_FUNC_ALWAYS};
+    FrontFace = defaultStencilOp;
+    BackFace = defaultStencilOp;
+  }
+  explicit CD3DX12_DEPTH_STENCIL_DESC(BOOL depthEnable, D3D12_DEPTH_WRITE_MASK depthWriteMask,
+                                      D3D12_COMPARISON_FUNC depthFunc, BOOL stencilEnable, UINT8 stencilReadMask,
+                                      UINT8 stencilWriteMask, D3D12_STENCIL_OP frontStencilFailOp,
+                                      D3D12_STENCIL_OP frontStencilDepthFailOp, D3D12_STENCIL_OP frontStencilPassOp,
+                                      D3D12_COMPARISON_FUNC frontStencilFunc, D3D12_STENCIL_OP backStencilFailOp,
+                                      D3D12_STENCIL_OP backStencilDepthFailOp, D3D12_STENCIL_OP backStencilPassOp,
+                                      D3D12_COMPARISON_FUNC backStencilFunc) {
+    DepthEnable = depthEnable;
+    DepthWriteMask = depthWriteMask;
+    DepthFunc = depthFunc;
+    StencilEnable = stencilEnable;
+    StencilReadMask = stencilReadMask;
+    StencilWriteMask = stencilWriteMask;
+    FrontFace.StencilFailOp = frontStencilFailOp;
+    FrontFace.StencilDepthFailOp = frontStencilDepthFailOp;
+    FrontFace.StencilPassOp = frontStencilPassOp;
+    FrontFace.StencilFunc = frontStencilFunc;
+    BackFace.StencilFailOp = backStencilFailOp;
+    BackFace.StencilDepthFailOp = backStencilDepthFailOp;
+    BackFace.StencilPassOp = backStencilPassOp;
+    BackFace.StencilFunc = backStencilFunc;
+  }
+  ~CD3DX12_DEPTH_STENCIL_DESC() {}
+  operator const D3D12_DEPTH_STENCIL_DESC&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_BLEND_DESC : public D3D12_BLEND_DESC {
+  CD3DX12_BLEND_DESC() {}
+  explicit CD3DX12_BLEND_DESC(const D3D12_BLEND_DESC& o) : D3D12_BLEND_DESC(o) {}
+  explicit CD3DX12_BLEND_DESC(CD3DX12_DEFAULT) {
+    AlphaToCoverageEnable = FALSE;
+    IndependentBlendEnable = FALSE;
+    const D3D12_RENDER_TARGET_BLEND_DESC defaultRenderTargetBlendDesc = {
+        FALSE,
+        FALSE,
+        D3D12_BLEND_ONE,
+        D3D12_BLEND_ZERO,
+        D3D12_BLEND_OP_ADD,
+        D3D12_BLEND_ONE,
+        D3D12_BLEND_ZERO,
+        D3D12_BLEND_OP_ADD,
+        D3D12_LOGIC_OP_NOOP,
+        D3D12_COLOR_WRITE_ENABLE_ALL,
+    };
+    for (UINT i = 0; i < D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT; ++i) RenderTarget[i] = defaultRenderTargetBlendDesc;
+  }
+  ~CD3DX12_BLEND_DESC() {}
+  operator const D3D12_BLEND_DESC&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RASTERIZER_DESC : public D3D12_RASTERIZER_DESC {
+  CD3DX12_RASTERIZER_DESC() {}
+  explicit CD3DX12_RASTERIZER_DESC(const D3D12_RASTERIZER_DESC& o) : D3D12_RASTERIZER_DESC(o) {}
+  explicit CD3DX12_RASTERIZER_DESC(CD3DX12_DEFAULT) {
+    FillMode = D3D12_FILL_MODE_SOLID;
+    CullMode = D3D12_CULL_MODE_BACK;
+    FrontCounterClockwise = FALSE;
+    DepthBias = D3D12_DEFAULT_DEPTH_BIAS;
+    DepthBiasClamp = D3D12_DEFAULT_DEPTH_BIAS_CLAMP;
+    SlopeScaledDepthBias = D3D12_DEFAULT_SLOPE_SCALED_DEPTH_BIAS;
+    DepthClipEnable = TRUE;
+    MultisampleEnable = FALSE;
+    AntialiasedLineEnable = FALSE;
+    ForcedSampleCount = 0;
+    ConservativeRaster = D3D12_CONSERVATIVE_RASTERIZATION_MODE_OFF;
+  }
+  explicit CD3DX12_RASTERIZER_DESC(D3D12_FILL_MODE fillMode, D3D12_CULL_MODE cullMode, BOOL frontCounterClockwise,
+                                   INT depthBias, FLOAT depthBiasClamp, FLOAT slopeScaledDepthBias,
+                                   BOOL depthClipEnable, BOOL multisampleEnable, BOOL antialiasedLineEnable,
+                                   UINT forcedSampleCount, D3D12_CONSERVATIVE_RASTERIZATION_MODE conservativeRaster) {
+    FillMode = fillMode;
+    CullMode = cullMode;
+    FrontCounterClockwise = frontCounterClockwise;
+    DepthBias = depthBias;
+    DepthBiasClamp = depthBiasClamp;
+    SlopeScaledDepthBias = slopeScaledDepthBias;
+    DepthClipEnable = depthClipEnable;
+    MultisampleEnable = multisampleEnable;
+    AntialiasedLineEnable = antialiasedLineEnable;
+    ForcedSampleCount = forcedSampleCount;
+    ConservativeRaster = conservativeRaster;
+  }
+  ~CD3DX12_RASTERIZER_DESC() {}
+  operator const D3D12_RASTERIZER_DESC&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RESOURCE_ALLOCATION_INFO : public D3D12_RESOURCE_ALLOCATION_INFO {
+  CD3DX12_RESOURCE_ALLOCATION_INFO() {}
+  explicit CD3DX12_RESOURCE_ALLOCATION_INFO(const D3D12_RESOURCE_ALLOCATION_INFO& o)
+      : D3D12_RESOURCE_ALLOCATION_INFO(o) {}
+  CD3DX12_RESOURCE_ALLOCATION_INFO(UINT64 size, UINT64 alignment) {
+    SizeInBytes = size;
+    Alignment = alignment;
+  }
+  operator const D3D12_RESOURCE_ALLOCATION_INFO&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_HEAP_PROPERTIES : public D3D12_HEAP_PROPERTIES {
+  CD3DX12_HEAP_PROPERTIES() {}
+  explicit CD3DX12_HEAP_PROPERTIES(const D3D12_HEAP_PROPERTIES& o) : D3D12_HEAP_PROPERTIES(o) {}
+  CD3DX12_HEAP_PROPERTIES(D3D12_CPU_PAGE_PROPERTY cpuPageProperty, D3D12_MEMORY_POOL memoryPoolPreference,
+                          UINT creationNodeMask = 1, UINT nodeMask = 1) {
+    Type = D3D12_HEAP_TYPE_CUSTOM;
+    CPUPageProperty = cpuPageProperty;
+    MemoryPoolPreference = memoryPoolPreference;
+    CreationNodeMask = creationNodeMask;
+    VisibleNodeMask = nodeMask;
+  }
+  explicit CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE type, UINT creationNodeMask = 1, UINT nodeMask = 1) {
+    Type = type;
+    CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
+    MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
+    CreationNodeMask = creationNodeMask;
+    VisibleNodeMask = nodeMask;
+  }
+  operator const D3D12_HEAP_PROPERTIES&() const { return *this; }
+  bool IsCPUAccessible() const {
+    return Type == D3D12_HEAP_TYPE_UPLOAD || Type == D3D12_HEAP_TYPE_READBACK ||
+           (Type == D3D12_HEAP_TYPE_CUSTOM && (CPUPageProperty == D3D12_CPU_PAGE_PROPERTY_WRITE_COMBINE ||
+                                               CPUPageProperty == D3D12_CPU_PAGE_PROPERTY_WRITE_BACK));
+  }
+};
+inline bool operator==(const D3D12_HEAP_PROPERTIES& l, const D3D12_HEAP_PROPERTIES& r) {
+  return l.Type == r.Type && l.CPUPageProperty == r.CPUPageProperty &&
+         l.MemoryPoolPreference == r.MemoryPoolPreference && l.CreationNodeMask == r.CreationNodeMask &&
+         l.VisibleNodeMask == r.VisibleNodeMask;
+}
+inline bool operator!=(const D3D12_HEAP_PROPERTIES& l, const D3D12_HEAP_PROPERTIES& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_HEAP_DESC : public D3D12_HEAP_DESC {
+  CD3DX12_HEAP_DESC() {}
+  explicit CD3DX12_HEAP_DESC(const D3D12_HEAP_DESC& o) : D3D12_HEAP_DESC(o) {}
+  CD3DX12_HEAP_DESC(UINT64 size, D3D12_HEAP_PROPERTIES properties, UINT64 alignment = 0,
+                    D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = size;
+    Properties = properties;
+    Alignment = alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(UINT64 size, D3D12_HEAP_TYPE type, UINT64 alignment = 0,
+                    D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = size;
+    Properties = CD3DX12_HEAP_PROPERTIES(type);
+    Alignment = alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(UINT64 size, D3D12_CPU_PAGE_PROPERTY cpuPageProperty, D3D12_MEMORY_POOL memoryPoolPreference,
+                    UINT64 alignment = 0, D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = size;
+    Properties = CD3DX12_HEAP_PROPERTIES(cpuPageProperty, memoryPoolPreference);
+    Alignment = alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(const D3D12_RESOURCE_ALLOCATION_INFO& resAllocInfo, D3D12_HEAP_PROPERTIES properties,
+                    D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = resAllocInfo.SizeInBytes;
+    Properties = properties;
+    Alignment = resAllocInfo.Alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(const D3D12_RESOURCE_ALLOCATION_INFO& resAllocInfo, D3D12_HEAP_TYPE type,
+                    D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = resAllocInfo.SizeInBytes;
+    Properties = CD3DX12_HEAP_PROPERTIES(type);
+    Alignment = resAllocInfo.Alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(const D3D12_RESOURCE_ALLOCATION_INFO& resAllocInfo, D3D12_CPU_PAGE_PROPERTY cpuPageProperty,
+                    D3D12_MEMORY_POOL memoryPoolPreference, D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = resAllocInfo.SizeInBytes;
+    Properties = CD3DX12_HEAP_PROPERTIES(cpuPageProperty, memoryPoolPreference);
+    Alignment = resAllocInfo.Alignment;
+    Flags = flags;
+  }
+  operator const D3D12_HEAP_DESC&() const { return *this; }
+  bool IsCPUAccessible() const { return static_cast<const CD3DX12_HEAP_PROPERTIES*>(&Properties)->IsCPUAccessible(); }
+};
+inline bool operator==(const D3D12_HEAP_DESC& l, const D3D12_HEAP_DESC& r) {
+  return l.SizeInBytes == r.SizeInBytes && l.Properties == r.Properties && l.Alignment == r.Alignment &&
+         l.Flags == r.Flags;
+}
+inline bool operator!=(const D3D12_HEAP_DESC& l, const D3D12_HEAP_DESC& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_CLEAR_VALUE : public D3D12_CLEAR_VALUE {
+  CD3DX12_CLEAR_VALUE() {}
+  explicit CD3DX12_CLEAR_VALUE(const D3D12_CLEAR_VALUE& o) : D3D12_CLEAR_VALUE(o) {}
+  CD3DX12_CLEAR_VALUE(DXGI_FORMAT format, const FLOAT color[4]) {
+    Format = format;
+    memcpy(Color, color, sizeof(Color));
+  }
+  CD3DX12_CLEAR_VALUE(DXGI_FORMAT format, FLOAT depth, UINT8 stencil) {
+    Format = format;
+    /* Use memcpy to preserve NAN values */
+    memcpy(&DepthStencil.Depth, &depth, sizeof(depth));
+    DepthStencil.Stencil = stencil;
+  }
+  operator const D3D12_CLEAR_VALUE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RANGE : public D3D12_RANGE {
+  CD3DX12_RANGE() {}
+  explicit CD3DX12_RANGE(const D3D12_RANGE& o) : D3D12_RANGE(o) {}
+  CD3DX12_RANGE(SIZE_T begin, SIZE_T end) {
+    Begin = begin;
+    End = end;
+  }
+  operator const D3D12_RANGE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_SHADER_BYTECODE : public D3D12_SHADER_BYTECODE {
+  CD3DX12_SHADER_BYTECODE() {}
+  explicit CD3DX12_SHADER_BYTECODE(const D3D12_SHADER_BYTECODE& o) : D3D12_SHADER_BYTECODE(o) {}
+  CD3DX12_SHADER_BYTECODE(_In_ ID3DBlob* pShaderBlob) {
+    pShaderBytecode = pShaderBlob->GetBufferPointer();
+    BytecodeLength = pShaderBlob->GetBufferSize();
+  }
+  CD3DX12_SHADER_BYTECODE(_In_reads_(bytecodeLength) const void* _pShaderBytecode, SIZE_T bytecodeLength) {
+    pShaderBytecode = _pShaderBytecode;
+    BytecodeLength = bytecodeLength;
+  }
+  operator const D3D12_SHADER_BYTECODE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_TILED_RESOURCE_COORDINATE : public D3D12_TILED_RESOURCE_COORDINATE {
+  CD3DX12_TILED_RESOURCE_COORDINATE() {}
+  explicit CD3DX12_TILED_RESOURCE_COORDINATE(const D3D12_TILED_RESOURCE_COORDINATE& o)
+      : D3D12_TILED_RESOURCE_COORDINATE(o) {}
+  CD3DX12_TILED_RESOURCE_COORDINATE(UINT x, UINT y, UINT z, UINT subresource) {
+    X = x;
+    Y = y;
+    Z = z;
+    Subresource = subresource;
+  }
+  operator const D3D12_TILED_RESOURCE_COORDINATE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_TILE_REGION_SIZE : public D3D12_TILE_REGION_SIZE {
+  CD3DX12_TILE_REGION_SIZE() {}
+  explicit CD3DX12_TILE_REGION_SIZE(const D3D12_TILE_REGION_SIZE& o) : D3D12_TILE_REGION_SIZE(o) {}
+  CD3DX12_TILE_REGION_SIZE(UINT numTiles, BOOL useBox, UINT width, UINT16 height, UINT16 depth) {
+    NumTiles = numTiles;
+    UseBox = useBox;
+    Width = width;
+    Height = height;
+    Depth = depth;
+  }
+  operator const D3D12_TILE_REGION_SIZE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_SUBRESOURCE_TILING : public D3D12_SUBRESOURCE_TILING {
+  CD3DX12_SUBRESOURCE_TILING() {}
+  explicit CD3DX12_SUBRESOURCE_TILING(const D3D12_SUBRESOURCE_TILING& o) : D3D12_SUBRESOURCE_TILING(o) {}
+  CD3DX12_SUBRESOURCE_TILING(UINT widthInTiles, UINT16 heightInTiles, UINT16 depthInTiles,
+                             UINT startTileIndexInOverallResource) {
+    WidthInTiles = widthInTiles;
+    HeightInTiles = heightInTiles;
+    DepthInTiles = depthInTiles;
+    StartTileIndexInOverallResource = startTileIndexInOverallResource;
+  }
+  operator const D3D12_SUBRESOURCE_TILING&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_TILE_SHAPE : public D3D12_TILE_SHAPE {
+  CD3DX12_TILE_SHAPE() {}
+  explicit CD3DX12_TILE_SHAPE(const D3D12_TILE_SHAPE& o) : D3D12_TILE_SHAPE(o) {}
+  CD3DX12_TILE_SHAPE(UINT widthInTexels, UINT heightInTexels, UINT depthInTexels) {
+    WidthInTexels = widthInTexels;
+    HeightInTexels = heightInTexels;
+    DepthInTexels = depthInTexels;
+  }
+  operator const D3D12_TILE_SHAPE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RESOURCE_BARRIER : public D3D12_RESOURCE_BARRIER {
+  CD3DX12_RESOURCE_BARRIER() {}
+  explicit CD3DX12_RESOURCE_BARRIER(const D3D12_RESOURCE_BARRIER& o) : D3D12_RESOURCE_BARRIER(o) {}
+  static inline CD3DX12_RESOURCE_BARRIER Transition(
+      _In_ ID3D12Resource* pResource, D3D12_RESOURCE_STATES stateBefore, D3D12_RESOURCE_STATES stateAfter,
+      UINT subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES,
+      D3D12_RESOURCE_BARRIER_FLAGS flags = D3D12_RESOURCE_BARRIER_FLAG_NONE) {
+    CD3DX12_RESOURCE_BARRIER result;
+    ZeroMemory(&result, sizeof(result));
+    D3D12_RESOURCE_BARRIER& barrier = result;
+    result.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
+    result.Flags = flags;
+    barrier.Transition.pResource = pResource;
+    barrier.Transition.StateBefore = stateBefore;
+    barrier.Transition.StateAfter = stateAfter;
+    barrier.Transition.Subresource = subresource;
+    return result;
+  }
+  static inline CD3DX12_RESOURCE_BARRIER Aliasing(_In_ ID3D12Resource* pResourceBefore,
+                                                  _In_ ID3D12Resource* pResourceAfter) {
+    CD3DX12_RESOURCE_BARRIER result;
+    ZeroMemory(&result, sizeof(result));
+    D3D12_RESOURCE_BARRIER& barrier = result;
+    result.Type = D3D12_RESOURCE_BARRIER_TYPE_ALIASING;
+    barrier.Aliasing.pResourceBefore = pResourceBefore;
+    barrier.Aliasing.pResourceAfter = pResourceAfter;
+    return result;
+  }
+  static inline CD3DX12_RESOURCE_BARRIER UAV(_In_ ID3D12Resource* pResource) {
+    CD3DX12_RESOURCE_BARRIER result;
+    ZeroMemory(&result, sizeof(result));
+    D3D12_RESOURCE_BARRIER& barrier = result;
+    result.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
+    barrier.UAV.pResource = pResource;
+    return result;
+  }
+  operator const D3D12_RESOURCE_BARRIER&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_PACKED_MIP_INFO : public D3D12_PACKED_MIP_INFO {
+  CD3DX12_PACKED_MIP_INFO() {}
+  explicit CD3DX12_PACKED_MIP_INFO(const D3D12_PACKED_MIP_INFO& o) : D3D12_PACKED_MIP_INFO(o) {}
+  CD3DX12_PACKED_MIP_INFO(UINT8 numStandardMips, UINT8 numPackedMips, UINT numTilesForPackedMips,
+                          UINT startTileIndexInOverallResource) {
+    NumStandardMips = numStandardMips;
+    NumPackedMips = numPackedMips;
+    NumTilesForPackedMips = numTilesForPackedMips;
+    StartTileIndexInOverallResource = startTileIndexInOverallResource;
+  }
+  operator const D3D12_PACKED_MIP_INFO&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_SUBRESOURCE_FOOTPRINT : public D3D12_SUBRESOURCE_FOOTPRINT {
+  CD3DX12_SUBRESOURCE_FOOTPRINT() {}
+  explicit CD3DX12_SUBRESOURCE_FOOTPRINT(const D3D12_SUBRESOURCE_FOOTPRINT& o) : D3D12_SUBRESOURCE_FOOTPRINT(o) {}
+  CD3DX12_SUBRESOURCE_FOOTPRINT(DXGI_FORMAT format, UINT width, UINT height, UINT depth, UINT rowPitch) {
+    Format = format;
+    Width = width;
+    Height = height;
+    Depth = depth;
+    RowPitch = rowPitch;
+  }
+  explicit CD3DX12_SUBRESOURCE_FOOTPRINT(const D3D12_RESOURCE_DESC& resDesc, UINT rowPitch) {
+    Format = resDesc.Format;
+    Width = UINT(resDesc.Width);
+    Height = resDesc.Height;
+    Depth = (resDesc.Dimension == D3D12_RESOURCE_DIMENSION_TEXTURE3D ? resDesc.DepthOrArraySize : 1);
+    RowPitch = rowPitch;
+  }
+  operator const D3D12_SUBRESOURCE_FOOTPRINT&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_TEXTURE_COPY_LOCATION : public D3D12_TEXTURE_COPY_LOCATION {
+  CD3DX12_TEXTURE_COPY_LOCATION() {}
+  explicit CD3DX12_TEXTURE_COPY_LOCATION(const D3D12_TEXTURE_COPY_LOCATION& o) : D3D12_TEXTURE_COPY_LOCATION(o) {}
+  CD3DX12_TEXTURE_COPY_LOCATION(ID3D12Resource* pRes) { pResource = pRes; }
+  CD3DX12_TEXTURE_COPY_LOCATION(ID3D12Resource* pRes, D3D12_PLACED_SUBRESOURCE_FOOTPRINT const& Footprint) {
+    pResource = pRes;
+    Type = D3D12_TEXTURE_COPY_TYPE_PLACED_FOOTPRINT;
+    PlacedFootprint = Footprint;
+  }
+  CD3DX12_TEXTURE_COPY_LOCATION(ID3D12Resource* pRes, UINT Sub) {
+    pResource = pRes;
+    Type = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX;
+    SubresourceIndex = Sub;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_DESCRIPTOR_RANGE : public D3D12_DESCRIPTOR_RANGE {
+  CD3DX12_DESCRIPTOR_RANGE() {}
+  explicit CD3DX12_DESCRIPTOR_RANGE(const D3D12_DESCRIPTOR_RANGE& o) : D3D12_DESCRIPTOR_RANGE(o) {}
+  CD3DX12_DESCRIPTOR_RANGE(D3D12_DESCRIPTOR_RANGE_TYPE rangeType, UINT numDescriptors, UINT baseShaderRegister,
+                           UINT registerSpace = 0,
+                           UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    Init(rangeType, numDescriptors, baseShaderRegister, registerSpace, offsetInDescriptorsFromTableStart);
+  }
+
+  inline void Init(D3D12_DESCRIPTOR_RANGE_TYPE rangeType, UINT numDescriptors, UINT baseShaderRegister,
+                   UINT registerSpace = 0,
+                   UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    Init(*this, rangeType, numDescriptors, baseShaderRegister, registerSpace, offsetInDescriptorsFromTableStart);
+  }
+
+  static inline void Init(_Out_ D3D12_DESCRIPTOR_RANGE& range, D3D12_DESCRIPTOR_RANGE_TYPE rangeType,
+                          UINT numDescriptors, UINT baseShaderRegister, UINT registerSpace = 0,
+                          UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    range.RangeType = rangeType;
+    range.NumDescriptors = numDescriptors;
+    range.BaseShaderRegister = baseShaderRegister;
+    range.RegisterSpace = registerSpace;
+    range.OffsetInDescriptorsFromTableStart = offsetInDescriptorsFromTableStart;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_DESCRIPTOR_TABLE : public D3D12_ROOT_DESCRIPTOR_TABLE {
+  CD3DX12_ROOT_DESCRIPTOR_TABLE() {}
+  explicit CD3DX12_ROOT_DESCRIPTOR_TABLE(const D3D12_ROOT_DESCRIPTOR_TABLE& o) : D3D12_ROOT_DESCRIPTOR_TABLE(o) {}
+  CD3DX12_ROOT_DESCRIPTOR_TABLE(UINT numDescriptorRanges,
+                                _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE* _pDescriptorRanges) {
+    Init(numDescriptorRanges, _pDescriptorRanges);
+  }
+
+  inline void Init(UINT numDescriptorRanges,
+                   _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE* _pDescriptorRanges) {
+    Init(*this, numDescriptorRanges, _pDescriptorRanges);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_DESCRIPTOR_TABLE& rootDescriptorTable, UINT numDescriptorRanges,
+                          _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE* _pDescriptorRanges) {
+    rootDescriptorTable.NumDescriptorRanges = numDescriptorRanges;
+    rootDescriptorTable.pDescriptorRanges = _pDescriptorRanges;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_CONSTANTS : public D3D12_ROOT_CONSTANTS {
+  CD3DX12_ROOT_CONSTANTS() {}
+  explicit CD3DX12_ROOT_CONSTANTS(const D3D12_ROOT_CONSTANTS& o) : D3D12_ROOT_CONSTANTS(o) {}
+  CD3DX12_ROOT_CONSTANTS(UINT num32BitValues, UINT shaderRegister, UINT registerSpace = 0) {
+    Init(num32BitValues, shaderRegister, registerSpace);
+  }
+
+  inline void Init(UINT num32BitValues, UINT shaderRegister, UINT registerSpace = 0) {
+    Init(*this, num32BitValues, shaderRegister, registerSpace);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_CONSTANTS& rootConstants, UINT num32BitValues, UINT shaderRegister,
+                          UINT registerSpace = 0) {
+    rootConstants.Num32BitValues = num32BitValues;
+    rootConstants.ShaderRegister = shaderRegister;
+    rootConstants.RegisterSpace = registerSpace;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_DESCRIPTOR : public D3D12_ROOT_DESCRIPTOR {
+  CD3DX12_ROOT_DESCRIPTOR() {}
+  explicit CD3DX12_ROOT_DESCRIPTOR(const D3D12_ROOT_DESCRIPTOR& o) : D3D12_ROOT_DESCRIPTOR(o) {}
+  CD3DX12_ROOT_DESCRIPTOR(UINT shaderRegister, UINT registerSpace = 0) { Init(shaderRegister, registerSpace); }
+
+  inline void Init(UINT shaderRegister, UINT registerSpace = 0) { Init(*this, shaderRegister, registerSpace); }
+
+  static inline void Init(_Out_ D3D12_ROOT_DESCRIPTOR& table, UINT shaderRegister, UINT registerSpace = 0) {
+    table.ShaderRegister = shaderRegister;
+    table.RegisterSpace = registerSpace;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_PARAMETER : public D3D12_ROOT_PARAMETER {
+  CD3DX12_ROOT_PARAMETER() {}
+  explicit CD3DX12_ROOT_PARAMETER(const D3D12_ROOT_PARAMETER& o) : D3D12_ROOT_PARAMETER(o) {}
+
+  static inline void InitAsDescriptorTable(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT numDescriptorRanges,
+                                           _In_reads_(numDescriptorRanges)
+                                               const D3D12_DESCRIPTOR_RANGE* pDescriptorRanges,
+                                           D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR_TABLE::Init(rootParam.DescriptorTable, numDescriptorRanges, pDescriptorRanges);
+  }
+
+  static inline void InitAsConstants(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT num32BitValues, UINT shaderRegister,
+                                     UINT registerSpace = 0,
+                                     D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_CONSTANTS::Init(rootParam.Constants, num32BitValues, shaderRegister, registerSpace);
+  }
+
+  static inline void InitAsConstantBufferView(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT shaderRegister,
+                                              UINT registerSpace = 0,
+                                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR::Init(rootParam.Descriptor, shaderRegister, registerSpace);
+  }
+
+  static inline void InitAsShaderResourceView(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT shaderRegister,
+                                              UINT registerSpace = 0,
+                                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_SRV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR::Init(rootParam.Descriptor, shaderRegister, registerSpace);
+  }
+
+  static inline void InitAsUnorderedAccessView(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT shaderRegister,
+                                               UINT registerSpace = 0,
+                                               D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_UAV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR::Init(rootParam.Descriptor, shaderRegister, registerSpace);
+  }
+
+  inline void InitAsDescriptorTable(UINT numDescriptorRanges,
+                                    _In_reads_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE* pDescriptorRanges,
+                                    D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsDescriptorTable(*this, numDescriptorRanges, pDescriptorRanges, visibility);
+  }
+
+  inline void InitAsConstants(UINT num32BitValues, UINT shaderRegister, UINT registerSpace = 0,
+                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsConstants(*this, num32BitValues, shaderRegister, registerSpace, visibility);
+  }
+
+  inline void InitAsConstantBufferView(UINT shaderRegister, UINT registerSpace = 0,
+                                       D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsConstantBufferView(*this, shaderRegister, registerSpace, visibility);
+  }
+
+  inline void InitAsShaderResourceView(UINT shaderRegister, UINT registerSpace = 0,
+                                       D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsShaderResourceView(*this, shaderRegister, registerSpace, visibility);
+  }
+
+  inline void InitAsUnorderedAccessView(UINT shaderRegister, UINT registerSpace = 0,
+                                        D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsUnorderedAccessView(*this, shaderRegister, registerSpace, visibility);
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_STATIC_SAMPLER_DESC : public D3D12_STATIC_SAMPLER_DESC {
+  CD3DX12_STATIC_SAMPLER_DESC() {}
+  explicit CD3DX12_STATIC_SAMPLER_DESC(const D3D12_STATIC_SAMPLER_DESC& o) : D3D12_STATIC_SAMPLER_DESC(o) {}
+  CD3DX12_STATIC_SAMPLER_DESC(UINT shaderRegister, D3D12_FILTER filter = D3D12_FILTER_ANISOTROPIC,
+                              D3D12_TEXTURE_ADDRESS_MODE addressU = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                              D3D12_TEXTURE_ADDRESS_MODE addressV = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                              D3D12_TEXTURE_ADDRESS_MODE addressW = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                              FLOAT mipLODBias = 0, UINT maxAnisotropy = 16,
+                              D3D12_COMPARISON_FUNC comparisonFunc = D3D12_COMPARISON_FUNC_LESS_EQUAL,
+                              D3D12_STATIC_BORDER_COLOR borderColor = D3D12_STATIC_BORDER_COLOR_OPAQUE_WHITE,
+                              FLOAT minLOD = 0.f, FLOAT maxLOD = D3D12_FLOAT32_MAX,
+                              D3D12_SHADER_VISIBILITY shaderVisibility = D3D12_SHADER_VISIBILITY_ALL,
+                              UINT registerSpace = 0) {
+    Init(shaderRegister, filter, addressU, addressV, addressW, mipLODBias, maxAnisotropy, comparisonFunc, borderColor,
+         minLOD, maxLOD, shaderVisibility, registerSpace);
+  }
+
+  static inline void Init(_Out_ D3D12_STATIC_SAMPLER_DESC& samplerDesc, UINT shaderRegister,
+                          D3D12_FILTER filter = D3D12_FILTER_ANISOTROPIC,
+                          D3D12_TEXTURE_ADDRESS_MODE addressU = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                          D3D12_TEXTURE_ADDRESS_MODE addressV = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                          D3D12_TEXTURE_ADDRESS_MODE addressW = D3D12_TEXTURE_ADDRESS_MODE_WRAP, FLOAT mipLODBias = 0,
+                          UINT maxAnisotropy = 16,
+                          D3D12_COMPARISON_FUNC comparisonFunc = D3D12_COMPARISON_FUNC_LESS_EQUAL,
+                          D3D12_STATIC_BORDER_COLOR borderColor = D3D12_STATIC_BORDER_COLOR_OPAQUE_WHITE,
+                          FLOAT minLOD = 0.f, FLOAT maxLOD = D3D12_FLOAT32_MAX,
+                          D3D12_SHADER_VISIBILITY shaderVisibility = D3D12_SHADER_VISIBILITY_ALL,
+                          UINT registerSpace = 0) {
+    samplerDesc.ShaderRegister = shaderRegister;
+    samplerDesc.Filter = filter;
+    samplerDesc.AddressU = addressU;
+    samplerDesc.AddressV = addressV;
+    samplerDesc.AddressW = addressW;
+    samplerDesc.MipLODBias = mipLODBias;
+    samplerDesc.MaxAnisotropy = maxAnisotropy;
+    samplerDesc.ComparisonFunc = comparisonFunc;
+    samplerDesc.BorderColor = borderColor;
+    samplerDesc.MinLOD = minLOD;
+    samplerDesc.MaxLOD = maxLOD;
+    samplerDesc.ShaderVisibility = shaderVisibility;
+    samplerDesc.RegisterSpace = registerSpace;
+  }
+  inline void Init(UINT shaderRegister, D3D12_FILTER filter = D3D12_FILTER_ANISOTROPIC,
+                   D3D12_TEXTURE_ADDRESS_MODE addressU = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                   D3D12_TEXTURE_ADDRESS_MODE addressV = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                   D3D12_TEXTURE_ADDRESS_MODE addressW = D3D12_TEXTURE_ADDRESS_MODE_WRAP, FLOAT mipLODBias = 0,
+                   UINT maxAnisotropy = 16, D3D12_COMPARISON_FUNC comparisonFunc = D3D12_COMPARISON_FUNC_LESS_EQUAL,
+                   D3D12_STATIC_BORDER_COLOR borderColor = D3D12_STATIC_BORDER_COLOR_OPAQUE_WHITE, FLOAT minLOD = 0.f,
+                   FLOAT maxLOD = D3D12_FLOAT32_MAX,
+                   D3D12_SHADER_VISIBILITY shaderVisibility = D3D12_SHADER_VISIBILITY_ALL, UINT registerSpace = 0) {
+    Init(*this, shaderRegister, filter, addressU, addressV, addressW, mipLODBias, maxAnisotropy, comparisonFunc,
+         borderColor, minLOD, maxLOD, shaderVisibility, registerSpace);
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_SIGNATURE_DESC : public D3D12_ROOT_SIGNATURE_DESC {
+  CD3DX12_ROOT_SIGNATURE_DESC() {}
+  explicit CD3DX12_ROOT_SIGNATURE_DESC(const D3D12_ROOT_SIGNATURE_DESC& o) : D3D12_ROOT_SIGNATURE_DESC(o) {}
+  CD3DX12_ROOT_SIGNATURE_DESC(UINT numParameters,
+                              _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                              UINT numStaticSamplers = 0,
+                              _In_reads_opt_(numStaticSamplers)
+                                  const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                              D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init(numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+  CD3DX12_ROOT_SIGNATURE_DESC(CD3DX12_DEFAULT) { Init(0, NULL, 0, NULL, D3D12_ROOT_SIGNATURE_FLAG_NONE); }
+
+  inline void Init(UINT numParameters, _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                   UINT numStaticSamplers = 0,
+                   _In_reads_opt_(numStaticSamplers) const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                   D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init(*this, numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_SIGNATURE_DESC& desc, UINT numParameters,
+                          _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                          UINT numStaticSamplers = 0,
+                          _In_reads_opt_(numStaticSamplers) const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                          D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    desc.NumParameters = numParameters;
+    desc.pParameters = _pParameters;
+    desc.NumStaticSamplers = numStaticSamplers;
+    desc.pStaticSamplers = _pStaticSamplers;
+    desc.Flags = flags;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_DESCRIPTOR_RANGE1 : public D3D12_DESCRIPTOR_RANGE1 {
+  CD3DX12_DESCRIPTOR_RANGE1() {}
+  explicit CD3DX12_DESCRIPTOR_RANGE1(const D3D12_DESCRIPTOR_RANGE1& o) : D3D12_DESCRIPTOR_RANGE1(o) {}
+  CD3DX12_DESCRIPTOR_RANGE1(D3D12_DESCRIPTOR_RANGE_TYPE rangeType, UINT numDescriptors, UINT baseShaderRegister,
+                            UINT registerSpace = 0,
+                            D3D12_DESCRIPTOR_RANGE_FLAGS flags = D3D12_DESCRIPTOR_RANGE_FLAG_NONE,
+                            UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    Init(rangeType, numDescriptors, baseShaderRegister, registerSpace, flags, offsetInDescriptorsFromTableStart);
+  }
+
+  inline void Init(D3D12_DESCRIPTOR_RANGE_TYPE rangeType, UINT numDescriptors, UINT baseShaderRegister,
+                   UINT registerSpace = 0, D3D12_DESCRIPTOR_RANGE_FLAGS flags = D3D12_DESCRIPTOR_RANGE_FLAG_NONE,
+                   UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    Init(*this, rangeType, numDescriptors, baseShaderRegister, registerSpace, flags, offsetInDescriptorsFromTableStart);
+  }
+
+  static inline void Init(_Out_ D3D12_DESCRIPTOR_RANGE1& range, D3D12_DESCRIPTOR_RANGE_TYPE rangeType,
+                          UINT numDescriptors, UINT baseShaderRegister, UINT registerSpace = 0,
+                          D3D12_DESCRIPTOR_RANGE_FLAGS flags = D3D12_DESCRIPTOR_RANGE_FLAG_NONE,
+                          UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    range.RangeType = rangeType;
+    range.NumDescriptors = numDescriptors;
+    range.BaseShaderRegister = baseShaderRegister;
+    range.RegisterSpace = registerSpace;
+    range.Flags = flags;
+    range.OffsetInDescriptorsFromTableStart = offsetInDescriptorsFromTableStart;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_DESCRIPTOR_TABLE1 : public D3D12_ROOT_DESCRIPTOR_TABLE1 {
+  CD3DX12_ROOT_DESCRIPTOR_TABLE1() {}
+  explicit CD3DX12_ROOT_DESCRIPTOR_TABLE1(const D3D12_ROOT_DESCRIPTOR_TABLE1& o) : D3D12_ROOT_DESCRIPTOR_TABLE1(o) {}
+  CD3DX12_ROOT_DESCRIPTOR_TABLE1(UINT numDescriptorRanges, _In_reads_opt_(numDescriptorRanges)
+                                                               const D3D12_DESCRIPTOR_RANGE1* _pDescriptorRanges) {
+    Init(numDescriptorRanges, _pDescriptorRanges);
+  }
+
+  inline void Init(UINT numDescriptorRanges,
+                   _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE1* _pDescriptorRanges) {
+    Init(*this, numDescriptorRanges, _pDescriptorRanges);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_DESCRIPTOR_TABLE1& rootDescriptorTable, UINT numDescriptorRanges,
+                          _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE1* _pDescriptorRanges) {
+    rootDescriptorTable.NumDescriptorRanges = numDescriptorRanges;
+    rootDescriptorTable.pDescriptorRanges = _pDescriptorRanges;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_DESCRIPTOR1 : public D3D12_ROOT_DESCRIPTOR1 {
+  CD3DX12_ROOT_DESCRIPTOR1() {}
+  explicit CD3DX12_ROOT_DESCRIPTOR1(const D3D12_ROOT_DESCRIPTOR1& o) : D3D12_ROOT_DESCRIPTOR1(o) {}
+  CD3DX12_ROOT_DESCRIPTOR1(UINT shaderRegister, UINT registerSpace = 0,
+                           D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE) {
+    Init(shaderRegister, registerSpace, flags);
+  }
+
+  inline void Init(UINT shaderRegister, UINT registerSpace = 0,
+                   D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE) {
+    Init(*this, shaderRegister, registerSpace, flags);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_DESCRIPTOR1& table, UINT shaderRegister, UINT registerSpace = 0,
+                          D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE) {
+    table.ShaderRegister = shaderRegister;
+    table.RegisterSpace = registerSpace;
+    table.Flags = flags;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_PARAMETER1 : public D3D12_ROOT_PARAMETER1 {
+  CD3DX12_ROOT_PARAMETER1() {}
+  explicit CD3DX12_ROOT_PARAMETER1(const D3D12_ROOT_PARAMETER1& o) : D3D12_ROOT_PARAMETER1(o) {}
+
+  static inline void InitAsDescriptorTable(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT numDescriptorRanges,
+                                           _In_reads_(numDescriptorRanges)
+                                               const D3D12_DESCRIPTOR_RANGE1* pDescriptorRanges,
+                                           D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR_TABLE1::Init(rootParam.DescriptorTable, numDescriptorRanges, pDescriptorRanges);
+  }
+
+  static inline void InitAsConstants(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT num32BitValues, UINT shaderRegister,
+                                     UINT registerSpace = 0,
+                                     D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_CONSTANTS::Init(rootParam.Constants, num32BitValues, shaderRegister, registerSpace);
+  }
+
+  static inline void InitAsConstantBufferView(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT shaderRegister,
+                                              UINT registerSpace = 0,
+                                              D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR1::Init(rootParam.Descriptor, shaderRegister, registerSpace, flags);
+  }
+
+  static inline void InitAsShaderResourceView(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT shaderRegister,
+                                              UINT registerSpace = 0,
+                                              D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_SRV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR1::Init(rootParam.Descriptor, shaderRegister, registerSpace, flags);
+  }
+
+  static inline void InitAsUnorderedAccessView(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT shaderRegister,
+                                               UINT registerSpace = 0,
+                                               D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                               D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_UAV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR1::Init(rootParam.Descriptor, shaderRegister, registerSpace, flags);
+  }
+
+  inline void InitAsDescriptorTable(UINT numDescriptorRanges,
+                                    _In_reads_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE1* pDescriptorRanges,
+                                    D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsDescriptorTable(*this, numDescriptorRanges, pDescriptorRanges, visibility);
+  }
+
+  inline void InitAsConstants(UINT num32BitValues, UINT shaderRegister, UINT registerSpace = 0,
+                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsConstants(*this, num32BitValues, shaderRegister, registerSpace, visibility);
+  }
+
+  inline void InitAsConstantBufferView(UINT shaderRegister, UINT registerSpace = 0,
+                                       D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                       D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsConstantBufferView(*this, shaderRegister, registerSpace, flags, visibility);
+  }
+
+  inline void InitAsShaderResourceView(UINT shaderRegister, UINT registerSpace = 0,
+                                       D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                       D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsShaderResourceView(*this, shaderRegister, registerSpace, flags, visibility);
+  }
+
+  inline void InitAsUnorderedAccessView(UINT shaderRegister, UINT registerSpace = 0,
+                                        D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                        D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsUnorderedAccessView(*this, shaderRegister, registerSpace, flags, visibility);
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC : public D3D12_VERSIONED_ROOT_SIGNATURE_DESC {
+  CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC() {}
+  explicit CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(const D3D12_VERSIONED_ROOT_SIGNATURE_DESC& o)
+      : D3D12_VERSIONED_ROOT_SIGNATURE_DESC(o) {}
+  explicit CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(const D3D12_ROOT_SIGNATURE_DESC& o) {
+    Version = D3D_ROOT_SIGNATURE_VERSION_1_0;
+    Desc_1_0 = o;
+  }
+  explicit CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(const D3D12_ROOT_SIGNATURE_DESC1& o) {
+    Version = D3D_ROOT_SIGNATURE_VERSION_1_1;
+    Desc_1_1 = o;
+  }
+  CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(UINT numParameters,
+                                        _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                                        UINT numStaticSamplers = 0,
+                                        _In_reads_opt_(numStaticSamplers)
+                                            const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                                        D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init_1_0(numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+  CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(UINT numParameters,
+                                        _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER1* _pParameters,
+                                        UINT numStaticSamplers = 0,
+                                        _In_reads_opt_(numStaticSamplers)
+                                            const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                                        D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init_1_1(numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+  CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(CD3DX12_DEFAULT) { Init_1_1(0, NULL, 0, NULL, D3D12_ROOT_SIGNATURE_FLAG_NONE); }
+
+  inline void Init_1_0(UINT numParameters, _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                       UINT numStaticSamplers = 0,
+                       _In_reads_opt_(numStaticSamplers) const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                       D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init_1_0(*this, numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+
+  static inline void Init_1_0(_Out_ D3D12_VERSIONED_ROOT_SIGNATURE_DESC& desc, UINT numParameters,
+                              _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                              UINT numStaticSamplers = 0,
+                              _In_reads_opt_(numStaticSamplers)
+                                  const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                              D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    desc.Version = D3D_ROOT_SIGNATURE_VERSION_1_0;
+    desc.Desc_1_0.NumParameters = numParameters;
+    desc.Desc_1_0.pParameters = _pParameters;
+    desc.Desc_1_0.NumStaticSamplers = numStaticSamplers;
+    desc.Desc_1_0.pStaticSamplers = _pStaticSamplers;
+    desc.Desc_1_0.Flags = flags;
+  }
+
+  inline void Init_1_1(UINT numParameters, _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER1* _pParameters,
+                       UINT numStaticSamplers = 0,
+                       _In_reads_opt_(numStaticSamplers) const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                       D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init_1_1(*this, numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+
+  static inline void Init_1_1(_Out_ D3D12_VERSIONED_ROOT_SIGNATURE_DESC& desc, UINT numParameters,
+                              _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER1* _pParameters,
+                              UINT numStaticSamplers = 0,
+                              _In_reads_opt_(numStaticSamplers)
+                                  const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                              D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    desc.Version = D3D_ROOT_SIGNATURE_VERSION_1_1;
+    desc.Desc_1_1.NumParameters = numParameters;
+    desc.Desc_1_1.pParameters = _pParameters;
+    desc.Desc_1_1.NumStaticSamplers = numStaticSamplers;
+    desc.Desc_1_1.pStaticSamplers = _pStaticSamplers;
+    desc.Desc_1_1.Flags = flags;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_CPU_DESCRIPTOR_HANDLE : public D3D12_CPU_DESCRIPTOR_HANDLE {
+  CD3DX12_CPU_DESCRIPTOR_HANDLE() {}
+  explicit CD3DX12_CPU_DESCRIPTOR_HANDLE(const D3D12_CPU_DESCRIPTOR_HANDLE& o) : D3D12_CPU_DESCRIPTOR_HANDLE(o) {}
+  CD3DX12_CPU_DESCRIPTOR_HANDLE(CD3DX12_DEFAULT) { ptr = 0; }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& other, INT offsetScaledByIncrementSize) {
+    InitOffsetted(other, offsetScaledByIncrementSize);
+  }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& other, INT offsetInDescriptors,
+                                UINT descriptorIncrementSize) {
+    InitOffsetted(other, offsetInDescriptors, descriptorIncrementSize);
+  }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE& Offset(INT offsetInDescriptors, UINT descriptorIncrementSize) {
+    ptr += offsetInDescriptors * descriptorIncrementSize;
+    return *this;
+  }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE& Offset(INT offsetScaledByIncrementSize) {
+    ptr += offsetScaledByIncrementSize;
+    return *this;
+  }
+  bool operator==(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& other) const { return (ptr == other.ptr); }
+  bool operator!=(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& other) const { return (ptr != other.ptr); }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE& operator=(const D3D12_CPU_DESCRIPTOR_HANDLE& other) {
+    ptr = other.ptr;
+    return *this;
+  }
+
+  inline void InitOffsetted(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& base, INT offsetScaledByIncrementSize) {
+    InitOffsetted(*this, base, offsetScaledByIncrementSize);
+  }
+
+  inline void InitOffsetted(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& base, INT offsetInDescriptors,
+                            UINT descriptorIncrementSize) {
+    InitOffsetted(*this, base, offsetInDescriptors, descriptorIncrementSize);
+  }
+
+  static inline void InitOffsetted(_Out_ D3D12_CPU_DESCRIPTOR_HANDLE& handle,
+                                   _In_ const D3D12_CPU_DESCRIPTOR_HANDLE& base, INT offsetScaledByIncrementSize) {
+    handle.ptr = base.ptr + offsetScaledByIncrementSize;
+  }
+
+  static inline void InitOffsetted(_Out_ D3D12_CPU_DESCRIPTOR_HANDLE& handle,
+                                   _In_ const D3D12_CPU_DESCRIPTOR_HANDLE& base, INT offsetInDescriptors,
+                                   UINT descriptorIncrementSize) {
+    handle.ptr = base.ptr + offsetInDescriptors * descriptorIncrementSize;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_GPU_DESCRIPTOR_HANDLE : public D3D12_GPU_DESCRIPTOR_HANDLE {
+  CD3DX12_GPU_DESCRIPTOR_HANDLE() {}
+  explicit CD3DX12_GPU_DESCRIPTOR_HANDLE(const D3D12_GPU_DESCRIPTOR_HANDLE& o) : D3D12_GPU_DESCRIPTOR_HANDLE(o) {}
+  CD3DX12_GPU_DESCRIPTOR_HANDLE(CD3DX12_DEFAULT) { ptr = 0; }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& other, INT offsetScaledByIncrementSize) {
+    InitOffsetted(other, offsetScaledByIncrementSize);
+  }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& other, INT offsetInDescriptors,
+                                UINT descriptorIncrementSize) {
+    InitOffsetted(other, offsetInDescriptors, descriptorIncrementSize);
+  }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE& Offset(INT offsetInDescriptors, UINT descriptorIncrementSize) {
+    ptr += offsetInDescriptors * descriptorIncrementSize;
+    return *this;
+  }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE& Offset(INT offsetScaledByIncrementSize) {
+    ptr += offsetScaledByIncrementSize;
+    return *this;
+  }
+  inline bool operator==(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& other) const { return (ptr == other.ptr); }
+  inline bool operator!=(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& other) const { return (ptr != other.ptr); }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE& operator=(const D3D12_GPU_DESCRIPTOR_HANDLE& other) {
+    ptr = other.ptr;
+    return *this;
+  }
+
+  inline void InitOffsetted(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& base, INT offsetScaledByIncrementSize) {
+    InitOffsetted(*this, base, offsetScaledByIncrementSize);
+  }
+
+  inline void InitOffsetted(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& base, INT offsetInDescriptors,
+                            UINT descriptorIncrementSize) {
+    InitOffsetted(*this, base, offsetInDescriptors, descriptorIncrementSize);
+  }
+
+  static inline void InitOffsetted(_Out_ D3D12_GPU_DESCRIPTOR_HANDLE& handle,
+                                   _In_ const D3D12_GPU_DESCRIPTOR_HANDLE& base, INT offsetScaledByIncrementSize) {
+    handle.ptr = base.ptr + offsetScaledByIncrementSize;
+  }
+
+  static inline void InitOffsetted(_Out_ D3D12_GPU_DESCRIPTOR_HANDLE& handle,
+                                   _In_ const D3D12_GPU_DESCRIPTOR_HANDLE& base, INT offsetInDescriptors,
+                                   UINT descriptorIncrementSize) {
+    handle.ptr = base.ptr + offsetInDescriptors * descriptorIncrementSize;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+inline UINT D3D12CalcSubresource(UINT MipSlice, UINT ArraySlice, UINT PlaneSlice, UINT MipLevels, UINT ArraySize) {
+  return MipSlice + ArraySlice * MipLevels + PlaneSlice * MipLevels * ArraySize;
+}
+
+//------------------------------------------------------------------------------------------------
+template <typename T, typename U, typename V>
+inline void D3D12DecomposeSubresource(UINT Subresource, UINT MipLevels, UINT ArraySize, _Out_ T& MipSlice,
+                                      _Out_ U& ArraySlice, _Out_ V& PlaneSlice) {
+  MipSlice = static_cast<T>(Subresource % MipLevels);
+  ArraySlice = static_cast<U>((Subresource / MipLevels) % ArraySize);
+  PlaneSlice = static_cast<V>(Subresource / (MipLevels * ArraySize));
+}
+
+//------------------------------------------------------------------------------------------------
+inline UINT8 D3D12GetFormatPlaneCount(_In_ ID3D12Device* pDevice, DXGI_FORMAT Format) {
+  D3D12_FEATURE_DATA_FORMAT_INFO formatInfo = {Format};
+  if (FAILED(pDevice->CheckFeatureSupport(D3D12_FEATURE_FORMAT_INFO, &formatInfo, sizeof(formatInfo)))) {
+    return 0;
+  }
+  return formatInfo.PlaneCount;
+}
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RESOURCE_DESC : public D3D12_RESOURCE_DESC {
+  CD3DX12_RESOURCE_DESC() {}
+  explicit CD3DX12_RESOURCE_DESC(const D3D12_RESOURCE_DESC& o) : D3D12_RESOURCE_DESC(o) {}
+  CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION dimension, UINT64 alignment, UINT64 width, UINT height,
+                        UINT16 depthOrArraySize, UINT16 mipLevels, DXGI_FORMAT format, UINT sampleCount,
+                        UINT sampleQuality, D3D12_TEXTURE_LAYOUT layout, D3D12_RESOURCE_FLAGS flags) {
+    Dimension = dimension;
+    Alignment = alignment;
+    Width = width;
+    Height = height;
+    DepthOrArraySize = depthOrArraySize;
+    MipLevels = mipLevels;
+    Format = format;
+    SampleDesc.Count = sampleCount;
+    SampleDesc.Quality = sampleQuality;
+    Layout = layout;
+    Flags = flags;
+  }
+  static inline CD3DX12_RESOURCE_DESC Buffer(const D3D12_RESOURCE_ALLOCATION_INFO& resAllocInfo,
+                                             D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_BUFFER, resAllocInfo.Alignment, resAllocInfo.SizeInBytes, 1,
+                                 1, 1, DXGI_FORMAT_UNKNOWN, 1, 0, D3D12_TEXTURE_LAYOUT_ROW_MAJOR, flags);
+  }
+  static inline CD3DX12_RESOURCE_DESC Buffer(UINT64 width, D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE,
+                                             UINT64 alignment = 0) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_BUFFER, alignment, width, 1, 1, 1, DXGI_FORMAT_UNKNOWN, 1, 0,
+                                 D3D12_TEXTURE_LAYOUT_ROW_MAJOR, flags);
+  }
+  static inline CD3DX12_RESOURCE_DESC Tex1D(DXGI_FORMAT format, UINT64 width, UINT16 arraySize = 1,
+                                            UINT16 mipLevels = 0, D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE,
+                                            D3D12_TEXTURE_LAYOUT layout = D3D12_TEXTURE_LAYOUT_UNKNOWN,
+                                            UINT64 alignment = 0) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_TEXTURE1D, alignment, width, 1, arraySize, mipLevels, format,
+                                 1, 0, layout, flags);
+  }
+  static inline CD3DX12_RESOURCE_DESC Tex2D(DXGI_FORMAT format, UINT64 width, UINT height, UINT16 arraySize = 1,
+                                            UINT16 mipLevels = 0, UINT sampleCount = 1, UINT sampleQuality = 0,
+                                            D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE,
+                                            D3D12_TEXTURE_LAYOUT layout = D3D12_TEXTURE_LAYOUT_UNKNOWN,
+                                            UINT64 alignment = 0) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_TEXTURE2D, alignment, width, height, arraySize, mipLevels,
+                                 format, sampleCount, sampleQuality, layout, flags);
+  }
+  static inline CD3DX12_RESOURCE_DESC Tex3D(DXGI_FORMAT format, UINT64 width, UINT height, UINT16 depth,
+                                            UINT16 mipLevels = 0, D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE,
+                                            D3D12_TEXTURE_LAYOUT layout = D3D12_TEXTURE_LAYOUT_UNKNOWN,
+                                            UINT64 alignment = 0) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_TEXTURE3D, alignment, width, height, depth, mipLevels, format,
+                                 1, 0, layout, flags);
+  }
+  inline UINT16 Depth() const { return (Dimension == D3D12_RESOURCE_DIMENSION_TEXTURE3D ? DepthOrArraySize : 1); }
+  inline UINT16 ArraySize() const { return (Dimension != D3D12_RESOURCE_DIMENSION_TEXTURE3D ? DepthOrArraySize : 1); }
+  inline UINT8 PlaneCount(_In_ ID3D12Device* pDevice) const { return D3D12GetFormatPlaneCount(pDevice, Format); }
+  inline UINT Subresources(_In_ ID3D12Device* pDevice) const { return MipLevels * ArraySize() * PlaneCount(pDevice); }
+  inline UINT CalcSubresource(UINT MipSlice, UINT ArraySlice, UINT PlaneSlice) {
+    return D3D12CalcSubresource(MipSlice, ArraySlice, PlaneSlice, MipLevels, ArraySize());
+  }
+  operator const D3D12_RESOURCE_DESC&() const { return *this; }
+};
+inline bool operator==(const D3D12_RESOURCE_DESC& l, const D3D12_RESOURCE_DESC& r) {
+  return l.Dimension == r.Dimension && l.Alignment == r.Alignment && l.Width == r.Width && l.Height == r.Height &&
+         l.DepthOrArraySize == r.DepthOrArraySize && l.MipLevels == r.MipLevels && l.Format == r.Format &&
+         l.SampleDesc.Count == r.SampleDesc.Count && l.SampleDesc.Quality == r.SampleDesc.Quality &&
+         l.Layout == r.Layout && l.Flags == r.Flags;
+}
+inline bool operator!=(const D3D12_RESOURCE_DESC& l, const D3D12_RESOURCE_DESC& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+// Row-by-row memcpy
+inline void MemcpySubresource(_In_ const D3D12_MEMCPY_DEST* pDest, _In_ const D3D12_SUBRESOURCE_DATA* pSrc,
+                              SIZE_T RowSizeInBytes, UINT NumRows, UINT NumSlices) {
+  for (UINT z = 0; z < NumSlices; ++z) {
+    BYTE* pDestSlice = reinterpret_cast<BYTE*>(pDest->pData) + pDest->SlicePitch * z;
+    const BYTE* pSrcSlice = reinterpret_cast<const BYTE*>(pSrc->pData) + pSrc->SlicePitch * z;
+    for (UINT y = 0; y < NumRows; ++y) {
+      memcpy(pDestSlice + pDest->RowPitch * y, pSrcSlice + pSrc->RowPitch * y, RowSizeInBytes);
+    }
+  }
+}
+
+//------------------------------------------------------------------------------------------------
+// Returns required size of a buffer to be used for data upload
+inline UINT64 GetRequiredIntermediateSize(_In_ ID3D12Resource* pDestinationResource,
+                                          _In_range_(0, D3D12_REQ_SUBRESOURCES) UINT FirstSubresource,
+                                          _In_range_(0, D3D12_REQ_SUBRESOURCES - FirstSubresource)
+                                              UINT NumSubresources) {
+  D3D12_RESOURCE_DESC Desc = pDestinationResource->GetDesc();
+  UINT64 RequiredSize = 0;
+
+  ID3D12Device* pDevice;
+  pDestinationResource->GetDevice(__uuidof(*pDevice), reinterpret_cast<void**>(&pDevice));
+  pDevice->GetCopyableFootprints(&Desc, FirstSubresource, NumSubresources, 0, nullptr, nullptr, nullptr, &RequiredSize);
+  pDevice->Release();
+
+  return RequiredSize;
+}
+
+//------------------------------------------------------------------------------------------------
+// All arrays must be populated (e.g. by calling GetCopyableFootprints)
+inline UINT64 UpdateSubresources(_In_ ID3D12GraphicsCommandList* pCmdList, _In_ ID3D12Resource* pDestinationResource,
+                                 _In_ ID3D12Resource* pIntermediate,
+                                 _In_range_(0, D3D12_REQ_SUBRESOURCES) UINT FirstSubresource,
+                                 _In_range_(0, D3D12_REQ_SUBRESOURCES - FirstSubresource) UINT NumSubresources,
+                                 UINT64 RequiredSize,
+                                 _In_reads_(NumSubresources) const D3D12_PLACED_SUBRESOURCE_FOOTPRINT* pLayouts,
+                                 _In_reads_(NumSubresources) const UINT* pNumRows,
+                                 _In_reads_(NumSubresources) const UINT64* pRowSizesInBytes,
+                                 _In_reads_(NumSubresources) const D3D12_SUBRESOURCE_DATA* pSrcData) {
+  // Minor validation
+  D3D12_RESOURCE_DESC IntermediateDesc = pIntermediate->GetDesc();
+  D3D12_RESOURCE_DESC DestinationDesc = pDestinationResource->GetDesc();
+  if (IntermediateDesc.Dimension != D3D12_RESOURCE_DIMENSION_BUFFER ||
+      IntermediateDesc.Width < RequiredSize + pLayouts[0].Offset || RequiredSize > (SIZE_T)-1 ||
+      (DestinationDesc.Dimension == D3D12_RESOURCE_DIMENSION_BUFFER &&
+       (FirstSubresource != 0 || NumSubresources != 1))) {
+    return 0;
+  }
+
+  BYTE* pData;
+  HRESULT hr = pIntermediate->Map(0, NULL, reinterpret_cast<void**>(&pData));
+  if (FAILED(hr)) {
+    return 0;
+  }
+
+  for (UINT i = 0; i < NumSubresources; ++i) {
+    if (pRowSizesInBytes[i] > (SIZE_T)-1) return 0;
+    D3D12_MEMCPY_DEST DestData = {pData + pLayouts[i].Offset, pLayouts[i].Footprint.RowPitch,
+                                  pLayouts[i].Footprint.RowPitch * pNumRows[i]};
+    MemcpySubresource(&DestData, &pSrcData[i], (SIZE_T)pRowSizesInBytes[i], pNumRows[i], pLayouts[i].Footprint.Depth);
+  }
+  pIntermediate->Unmap(0, NULL);
+
+  if (DestinationDesc.Dimension == D3D12_RESOURCE_DIMENSION_BUFFER) {
+    CD3DX12_BOX SrcBox(UINT(pLayouts[0].Offset), UINT(pLayouts[0].Offset + pLayouts[0].Footprint.Width));
+    pCmdList->CopyBufferRegion(pDestinationResource, 0, pIntermediate, pLayouts[0].Offset, pLayouts[0].Footprint.Width);
+  } else {
+    for (UINT i = 0; i < NumSubresources; ++i) {
+      CD3DX12_TEXTURE_COPY_LOCATION Dst(pDestinationResource, i + FirstSubresource);
+      CD3DX12_TEXTURE_COPY_LOCATION Src(pIntermediate, pLayouts[i]);
+      pCmdList->CopyTextureRegion(&Dst, 0, 0, 0, &Src, nullptr);
+    }
+  }
+  return RequiredSize;
+}
+
+//------------------------------------------------------------------------------------------------
+// Heap-allocating UpdateSubresources implementation
+inline UINT64 UpdateSubresources(_In_ ID3D12GraphicsCommandList* pCmdList, _In_ ID3D12Resource* pDestinationResource,
+                                 _In_ ID3D12Resource* pIntermediate, UINT64 IntermediateOffset,
+                                 _In_range_(0, D3D12_REQ_SUBRESOURCES) UINT FirstSubresource,
+                                 _In_range_(0, D3D12_REQ_SUBRESOURCES - FirstSubresource) UINT NumSubresources,
+                                 _In_reads_(NumSubresources) D3D12_SUBRESOURCE_DATA* pSrcData) {
+  UINT64 RequiredSize = 0;
+  UINT64 MemToAlloc =
+      static_cast<UINT64>(sizeof(D3D12_PLACED_SUBRESOURCE_FOOTPRINT) + sizeof(UINT) + sizeof(UINT64)) * NumSubresources;
+  if (MemToAlloc > SIZE_MAX) {
+    return 0;
+  }
+  void* pMem = HeapAlloc(GetProcessHeap(), 0, static_cast<SIZE_T>(MemToAlloc));
+  if (pMem == NULL) {
+    return 0;
+  }
+  D3D12_PLACED_SUBRESOURCE_FOOTPRINT* pLayouts = reinterpret_cast<D3D12_PLACED_SUBRESOURCE_FOOTPRINT*>(pMem);
+  UINT64* pRowSizesInBytes = reinterpret_cast<UINT64*>(pLayouts + NumSubresources);
+  UINT* pNumRows = reinterpret_cast<UINT*>(pRowSizesInBytes + NumSubresources);
+
+  D3D12_RESOURCE_DESC Desc = pDestinationResource->GetDesc();
+  ID3D12Device* pDevice;
+  pDestinationResource->GetDevice(__uuidof(*pDevice), reinterpret_cast<void**>(&pDevice));
+  pDevice->GetCopyableFootprints(&Desc, FirstSubresource, NumSubresources, IntermediateOffset, pLayouts, pNumRows,
+                                 pRowSizesInBytes, &RequiredSize);
+  pDevice->Release();
+
+  UINT64 Result = UpdateSubresources(pCmdList, pDestinationResource, pIntermediate, FirstSubresource, NumSubresources,
+                                     RequiredSize, pLayouts, pNumRows, pRowSizesInBytes, pSrcData);
+  HeapFree(GetProcessHeap(), 0, pMem);
+  return Result;
+}
+
+//------------------------------------------------------------------------------------------------
+// Stack-allocating UpdateSubresources implementation
+template <UINT MaxSubresources>
+inline UINT64 UpdateSubresources(_In_ ID3D12GraphicsCommandList* pCmdList, _In_ ID3D12Resource* pDestinationResource,
+                                 _In_ ID3D12Resource* pIntermediate, UINT64 IntermediateOffset,
+                                 _In_range_(0, MaxSubresources) UINT FirstSubresource,
+                                 _In_range_(1, MaxSubresources - FirstSubresource) UINT NumSubresources,
+                                 _In_reads_(NumSubresources) D3D12_SUBRESOURCE_DATA* pSrcData) {
+  UINT64 RequiredSize = 0;
+  D3D12_PLACED_SUBRESOURCE_FOOTPRINT Layouts[MaxSubresources];
+  UINT NumRows[MaxSubresources];
+  UINT64 RowSizesInBytes[MaxSubresources];
+
+  D3D12_RESOURCE_DESC Desc = pDestinationResource->GetDesc();
+  ID3D12Device* pDevice;
+  pDestinationResource->GetDevice(__uuidof(*pDevice), reinterpret_cast<void**>(&pDevice));
+  pDevice->GetCopyableFootprints(&Desc, FirstSubresource, NumSubresources, IntermediateOffset, Layouts, NumRows,
+                                 RowSizesInBytes, &RequiredSize);
+  pDevice->Release();
+
+  return UpdateSubresources(pCmdList, pDestinationResource, pIntermediate, FirstSubresource, NumSubresources,
+                            RequiredSize, Layouts, NumRows, RowSizesInBytes, pSrcData);
+}
+
+//------------------------------------------------------------------------------------------------
+inline bool D3D12IsLayoutOpaque(D3D12_TEXTURE_LAYOUT Layout) {
+  return Layout == D3D12_TEXTURE_LAYOUT_UNKNOWN || Layout == D3D12_TEXTURE_LAYOUT_64KB_UNDEFINED_SWIZZLE;
+}
+
+//------------------------------------------------------------------------------------------------
+inline ID3D12CommandList* const* CommandListCast(ID3D12GraphicsCommandList* const* pp) {
+  // This cast is useful for passing strongly typed command list pointers into
+  // ExecuteCommandLists.
+  // This cast is valid as long as the const-ness is respected. D3D12 APIs do
+  // respect the const-ness of their arguments.
+  return reinterpret_cast<ID3D12CommandList* const*>(pp);
+}
+
+//------------------------------------------------------------------------------------------------
+// D3D12 exports a new method for serializing root signatures in the Windows 10 Anniversary Update.
+// To help enable root signature 1.1 features when they are available and not require maintaining
+// two code paths for building root signatures, this helper method reconstructs a 1.0 signature when
+// 1.1 is not supported.
+inline HRESULT D3DX12SerializeVersionedRootSignature(_In_ const D3D12_VERSIONED_ROOT_SIGNATURE_DESC* pRootSignatureDesc,
+                                                     D3D_ROOT_SIGNATURE_VERSION MaxVersion, _Outptr_ ID3DBlob** ppBlob,
+                                                     _Always_(_Outptr_opt_result_maybenull_) ID3DBlob** ppErrorBlob) {
+  if (ppErrorBlob != NULL) {
+    *ppErrorBlob = NULL;
+  }
+
+  switch (MaxVersion) {
+    case D3D_ROOT_SIGNATURE_VERSION_1_0:
+      switch (pRootSignatureDesc->Version) {
+        case D3D_ROOT_SIGNATURE_VERSION_1_0:
+          return D3D12SerializeRootSignature(&pRootSignatureDesc->Desc_1_0, D3D_ROOT_SIGNATURE_VERSION_1, ppBlob,
+                                             ppErrorBlob);
+
+        case D3D_ROOT_SIGNATURE_VERSION_1_1: {
+          HRESULT hr = S_OK;
+          const D3D12_ROOT_SIGNATURE_DESC1& desc_1_1 = pRootSignatureDesc->Desc_1_1;
+
+          const SIZE_T ParametersSize = sizeof(D3D12_ROOT_PARAMETER) * desc_1_1.NumParameters;
+          void* pParameters = (ParametersSize > 0) ? HeapAlloc(GetProcessHeap(), 0, ParametersSize) : NULL;
+          if (ParametersSize > 0 && pParameters == NULL) {
+            hr = E_OUTOFMEMORY;
+          }
+          D3D12_ROOT_PARAMETER* pParameters_1_0 = reinterpret_cast<D3D12_ROOT_PARAMETER*>(pParameters);
+
+          if (SUCCEEDED(hr)) {
+            for (UINT n = 0; n < desc_1_1.NumParameters; n++) {
+              __analysis_assume(ParametersSize == sizeof(D3D12_ROOT_PARAMETER) * desc_1_1.NumParameters);
+              pParameters_1_0[n].ParameterType = desc_1_1.pParameters[n].ParameterType;
+              pParameters_1_0[n].ShaderVisibility = desc_1_1.pParameters[n].ShaderVisibility;
+
+              switch (desc_1_1.pParameters[n].ParameterType) {
+                case D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS:
+                  pParameters_1_0[n].Constants.Num32BitValues = desc_1_1.pParameters[n].Constants.Num32BitValues;
+                  pParameters_1_0[n].Constants.RegisterSpace = desc_1_1.pParameters[n].Constants.RegisterSpace;
+                  pParameters_1_0[n].Constants.ShaderRegister = desc_1_1.pParameters[n].Constants.ShaderRegister;
+                  break;
+
+                case D3D12_ROOT_PARAMETER_TYPE_CBV:
+                case D3D12_ROOT_PARAMETER_TYPE_SRV:
+                case D3D12_ROOT_PARAMETER_TYPE_UAV:
+                  pParameters_1_0[n].Descriptor.RegisterSpace = desc_1_1.pParameters[n].Descriptor.RegisterSpace;
+                  pParameters_1_0[n].Descriptor.ShaderRegister = desc_1_1.pParameters[n].Descriptor.ShaderRegister;
+                  break;
+
+                case D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE:
+                  const D3D12_ROOT_DESCRIPTOR_TABLE1& table_1_1 = desc_1_1.pParameters[n].DescriptorTable;
+
+                  const SIZE_T DescriptorRangesSize = sizeof(D3D12_DESCRIPTOR_RANGE) * table_1_1.NumDescriptorRanges;
+                  void* pDescriptorRanges = (DescriptorRangesSize > 0 && SUCCEEDED(hr))
+                                                ? HeapAlloc(GetProcessHeap(), 0, DescriptorRangesSize)
+                                                : NULL;
+                  if (DescriptorRangesSize > 0 && pDescriptorRanges == NULL) {
+                    hr = E_OUTOFMEMORY;
+                  }
+                  D3D12_DESCRIPTOR_RANGE* pDescriptorRanges_1_0 =
+                      reinterpret_cast<D3D12_DESCRIPTOR_RANGE*>(pDescriptorRanges);
+
+                  if (SUCCEEDED(hr)) {
+                    for (UINT x = 0; x < table_1_1.NumDescriptorRanges; x++) {
+                      __analysis_assume(DescriptorRangesSize ==
+                                        sizeof(D3D12_DESCRIPTOR_RANGE) * table_1_1.NumDescriptorRanges);
+                      pDescriptorRanges_1_0[x].BaseShaderRegister = table_1_1.pDescriptorRanges[x].BaseShaderRegister;
+                      pDescriptorRanges_1_0[x].NumDescriptors = table_1_1.pDescriptorRanges[x].NumDescriptors;
+                      pDescriptorRanges_1_0[x].OffsetInDescriptorsFromTableStart =
+                          table_1_1.pDescriptorRanges[x].OffsetInDescriptorsFromTableStart;
+                      pDescriptorRanges_1_0[x].RangeType = table_1_1.pDescriptorRanges[x].RangeType;
+                      pDescriptorRanges_1_0[x].RegisterSpace = table_1_1.pDescriptorRanges[x].RegisterSpace;
+                    }
+                  }
+
+                  D3D12_ROOT_DESCRIPTOR_TABLE& table_1_0 = pParameters_1_0[n].DescriptorTable;
+                  table_1_0.NumDescriptorRanges = table_1_1.NumDescriptorRanges;
+                  table_1_0.pDescriptorRanges = pDescriptorRanges_1_0;
+              }
+            }
+          }
+
+          if (SUCCEEDED(hr)) {
+            CD3DX12_ROOT_SIGNATURE_DESC desc_1_0(desc_1_1.NumParameters, pParameters_1_0, desc_1_1.NumStaticSamplers,
+                                                 desc_1_1.pStaticSamplers, desc_1_1.Flags);
+            hr = D3D12SerializeRootSignature(&desc_1_0, D3D_ROOT_SIGNATURE_VERSION_1, ppBlob, ppErrorBlob);
+          }
+
+          if (pParameters) {
+            for (UINT n = 0; n < desc_1_1.NumParameters; n++) {
+              if (desc_1_1.pParameters[n].ParameterType == D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE) {
+                HeapFree(GetProcessHeap(), 0,
+                         reinterpret_cast<void*>(const_cast<D3D12_DESCRIPTOR_RANGE*>(
+                             pParameters_1_0[n].DescriptorTable.pDescriptorRanges)));
+              }
+            }
+            HeapFree(GetProcessHeap(), 0, pParameters);
+          }
+          return hr;
+        }
+      }
+      break;
+
+    case D3D_ROOT_SIGNATURE_VERSION_1_1:
+      return D3D12SerializeVersionedRootSignature(pRootSignatureDesc, ppBlob, ppErrorBlob);
+  }
+
+  return E_INVALIDARG;
+}
+
+#endif  // defined( __cplusplus )
+
+#endif  //__D3DX12_H__

diff --git a/libav1/dx/av1_compute.cpp b/libav1/dx/av1_compute.cpp
new file mode 100644
index 0000000..209bc40
--- /dev/null
+++ b/libav1/dx/av1_compute.cpp

@@ -0,0 +1,697 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "dx/av1_compute.h"
+extern "C" {
+#include "av1/common/warped_motion.h"
+#include "av1/common/reconinter.h"
+};
+#include "dx/types.h"
+#include <wrl.h>
+#include <dxgi1_4.h>
+#include <d3dx12.h>
+
+#include "dx/shaders/h/idct4x4.h"
+#include "dx/shaders/h/idct4x8.h"
+#include "dx/shaders/h/idct4x16.h"
+#include "dx/shaders/h/idct8x4.h"
+#include "dx/shaders/h/idct8x8.h"
+#include "dx/shaders/h/idct8x16.h"
+#include "dx/shaders/h/idct8x32.h"
+#include "dx/shaders/h/idct16x4.h"
+#include "dx/shaders/h/idct16x8.h"
+#include "dx/shaders/h/idct16x16.h"
+#include "dx/shaders/h/idct16x32.h"
+#include "dx/shaders/h/idct16x64.h"
+#include "dx/shaders/h/idct32x8.h"
+#include "dx/shaders/h/idct32x16.h"
+#include "dx/shaders/h/idct32x32.h"
+#include "dx/shaders/h/idct32x64.h"
+#include "dx/shaders/h/idct64x16.h"
+#include "dx/shaders/h/idct64x32.h"
+#include "dx/shaders/h/idct64x64.h"
+#include "dx/shaders/h/idct_lossless.h"
+#include "dx/shaders/h/idct_sort_blocks.h"
+#include "dx/shaders/h/fill_buffer.h"
+#include "dx/shaders/h/gen_pred_blocks.h"
+#include "dx/shaders/h/loop_restoration.h"
+#include "dx/shaders/h/cdef_filter.h"
+#include "dx/shaders/h/film_grain_filter.h"
+#include "dx/shaders/h/film_grain_gen_luma.h"
+#include "dx/shaders/h/film_grain_gen_chroma.h"
+
+#include "dx/shaders/h/copy_plane.h"
+#include "dx/shaders/h/copy_plane_10bit10x3.h"
+
+#include "dx/shaders/h/inter_2x2chroma.h"
+#include "dx/shaders/h/inter_2x2chroma_hbd.h"
+#include "dx/shaders/h/inter_compound.h"
+#include "dx/shaders/h/inter_compound_2x2.h"
+#include "dx/shaders/h/inter_compound_2x2_hbd.h"
+#include "dx/shaders/h/inter_compound_diff_chroma.h"
+#include "dx/shaders/h/inter_compound_diff_chroma_hbd.h"
+#include "dx/shaders/h/inter_compound_diff_luma.h"
+#include "dx/shaders/h/inter_compound_diff_luma_hbd.h"
+#include "dx/shaders/h/inter_compound_hbd.h"
+#include "dx/shaders/h/inter_compound_masked.h"
+#include "dx/shaders/h/inter_compound_masked_hbd.h"
+#include "dx/shaders/h/inter_ext_borders.h"
+#include "dx/shaders/h/inter_ext_borders_hbd.h"
+#include "dx/shaders/h/inter_main.h"
+#include "dx/shaders/h/inter_main_hbd.h"
+#include "dx/shaders/h/inter_obmc_above.h"
+#include "dx/shaders/h/inter_obmc_above_hbd.h"
+#include "dx/shaders/h/inter_obmc_left.h"
+#include "dx/shaders/h/inter_obmc_left_hbd.h"
+#include "dx/shaders/h/inter_warp.h"
+#include "dx/shaders/h/inter_warp_compound.h"
+#include "dx/shaders/h/inter_warp_compound_hbd.h"
+#include "dx/shaders/h/inter_warp_hbd.h"
+#include "dx/shaders/h/intra_filter.h"
+#include "dx/shaders/h/intra_filter_hbd.h"
+#include "dx/shaders/h/intra_main.h"
+#include "dx/shaders/h/intra_main_hbd.h"
+
+#include "dx/shaders/h/inter_scale.h"
+#include "dx/shaders/h/inter_scale_compound.h"
+#include "dx/shaders/h/inter_scale_compound_2x2.h"
+#include "dx/shaders/h/inter_scale_compound_2x2_hbd.h"
+#include "dx/shaders/h/inter_scale_compound_diff_chroma.h"
+#include "dx/shaders/h/inter_scale_compound_diff_chroma_hbd.h"
+#include "dx/shaders/h/inter_scale_compound_diff_luma.h"
+#include "dx/shaders/h/inter_scale_compound_diff_luma_hbd.h"
+#include "dx/shaders/h/inter_scale_compound_hbd.h"
+#include "dx/shaders/h/inter_scale_compound_masked.h"
+#include "dx/shaders/h/inter_scale_compound_masked_hbd.h"
+#include "dx/shaders/h/inter_scale_hbd.h"
+#include "dx/shaders/h/inter_scale_obmc_above.h"
+#include "dx/shaders/h/inter_scale_obmc_above_hbd.h"
+#include "dx/shaders/h/inter_scale_obmc_left.h"
+#include "dx/shaders/h/inter_scale_obmc_left_hbd.h"
+#include "dx/shaders/h/inter_scale_warp_compound.h"
+#include "dx/shaders/h/inter_scale_warp_compound_hbd.h"
+#include "dx/shaders/h/inter_scale_2x2chroma.h"
+#include "dx/shaders/h/inter_scale_2x2chroma_hbd.h"
+
+#include "dx/shaders/h/reconstruct_block.h"
+#include "dx/shaders/h/reconstruct_block_hbd.h"
+
+#include "dx/shaders/h/loopfilter_vert.h"
+#include "dx/shaders/h/loopfilter_hor.h"
+#include "dx/shaders/h/loopfilter_vert_hbd.h"
+#include "dx/shaders/h/loopfilter_hor_hbd.h"
+#include "dx/shaders/h/gen_lf_blocks_hor.h"
+#include "dx/shaders/h/gen_lf_blocks_vert.h"
+#include "dx/shaders/h/upscaling.h"
+
+ID3D12RootSignature* create_root_sig(ID3D12Device* device, int Srvs, int Uavs, int Cbs, int InlineSize) {
+  CD3DX12_ROOT_SIGNATURE_DESC computeSignatureDesc;
+  CD3DX12_ROOT_PARAMETER params[16];
+  int count = 0;
+  for (int i = 0; i < Srvs; ++i) params[count++].InitAsShaderResourceView(i);
+  for (int i = 0; i < Uavs; ++i) params[count++].InitAsUnorderedAccessView(i);
+  int cb;
+  for (cb = 0; cb < Cbs; ++cb) params[count++].InitAsConstantBufferView(cb);
+  if (InlineSize) {
+    params[count++].InitAsConstants(InlineSize, cb);
+  }
+  computeSignatureDesc.Init(count, params);
+
+  ComPtr<ID3DBlob> computeSignatureData;
+  ComPtr<ID3DBlob> compilerLog;
+  HRESULT hr = D3D12SerializeRootSignature(&computeSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &computeSignatureData,
+                                           &compilerLog);
+  if (FAILED(hr)) {
+    printf((const char*)compilerLog->GetBufferPointer());
+    return NULL;
+  }
+
+  ID3D12RootSignature* sig = NULL;
+  hr = device->CreateRootSignature(0, computeSignatureData->GetBufferPointer(), computeSignatureData->GetBufferSize(),
+                                   IID_PPV_ARGS(&sig));
+  return SUCCEEDED(hr) ? sig : NULL;
+}
+
+HRESULT create_shader(ID3D12Device* device, const BYTE* src, int size, ID3D12RootSignature* sig, ComputeShader* dst) {
+  dst->bytecode = src;
+  dst->size = size;
+  dst->signaturePtr = sig;
+  // dst->pso =
+  D3D12_COMPUTE_PIPELINE_STATE_DESC pipelineDesc = {};
+  pipelineDesc.pRootSignature = sig;
+  pipelineDesc.CS.pShaderBytecode = src;
+  pipelineDesc.CS.BytecodeLength = size;
+  HRESULT hr = device->CreateComputePipelineState(&pipelineDesc, IID_PPV_ARGS(&dst->pso));
+  return hr;
+}
+
+THREADFN create_shader_thread_hook(void* data) {
+  compute_shader_lib* shader_lib = static_cast<compute_shader_lib*>(data);
+  while (1) {
+    CreateShaderTask* task = (CreateShaderTask*)MTQueueGet(&shader_lib->task_queue);
+    if (!task)
+      break;
+    else {
+      shader_lib->create_shader_errors |= create_shader(task->device, task->src, task->size, task->sig, task->dst);
+      if (shader_lib->create_shader_errors) return THREAD_RETURN(1);
+    }
+  }
+  return THREAD_RETURN(0);
+}
+
+int wait_shader_create_complete(compute_shader_lib* lib) {
+  if (!lib) return -1;
+  if (!lib->threads)
+    return 0;
+  else {
+    for (int i = 0; i < lib->threads; i++) {
+      pthread_join(lib->create_shader_threads[i], 0);
+    }
+    lib->threads = 0;
+    if (lib->create_shader_tasks) delete[] lib->create_shader_tasks;
+    lib->create_shader_tasks = 0;
+    return lib->create_shader_errors;
+  }
+}
+
+void* av1_create_pipeline_cache_handle(void* d3d12device, int threads) {
+  bool err = 0;
+  if (!d3d12device) return 0;
+  compute_shader_lib* shader_lib = new (std::nothrow) compute_shader_lib();
+  if (!shader_lib) return 0;
+  ID3D12Device* _device = static_cast<ID3D12Device*>(d3d12device);
+
+  err |= (shader_lib->sig_idct = create_root_sig(_device, 2, 1, 2, 2)) == NULL;
+  err |= (shader_lib->sig_common111 = create_root_sig(_device, 1, 1, 1, 0)) == NULL;
+  err |= (shader_lib->sig_common0102 = create_root_sig(_device, 0, 1, 0, 2)) == NULL;
+  err |= (shader_lib->sig_common0110 = create_root_sig(_device, 0, 1, 1, 0)) == NULL;
+  err |= (shader_lib->sig_copy_plane = create_root_sig(_device, 0, 1, 0, 6)) == NULL;
+  err |= (shader_lib->sig_pred_blocks = create_root_sig(_device, 5, 2, 1, 7)) == NULL;
+  err |= (shader_lib->sig_intra_pred = create_root_sig(_device, 3, 1, 1, 10)) == NULL;
+  err |= (shader_lib->sig_inter_pred = create_root_sig(_device, 5, 1, 1, 4)) == NULL;
+  err |= (shader_lib->sig_loop_rest = create_root_sig(_device, 3, 1, 2, 2)) == NULL;
+  err |= (shader_lib->sig_cdef_filter = create_root_sig(_device, 2, 1, 1, 0)) == NULL;
+  err |= (shader_lib->sig_film_grain = create_root_sig(_device, 3, 1, 1, 0)) == NULL;
+  err |= (shader_lib->sig_lf = create_root_sig(_device, 1, 1, 1, 5)) == NULL;
+  err |= (shader_lib->sig_lf_gen = create_root_sig(_device, 2, 1, 1, 6)) == NULL;
+  err |= (shader_lib->sig_upscale = create_root_sig(_device, 0, 1, 1, 3)) == NULL;
+  if (err) return 0;
+
+  bitdepth_dependent_shaders* s8 = &shader_lib->shaders_8bit;
+  bitdepth_dependent_shaders* hbd = &shader_lib->shaders_hbd;
+
+  CreateShaderTask create_tasks[] = {
+      {_device, idct4x4, sizeof(idct4x4), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_4X4]},
+      {_device, idct4x8, sizeof(idct4x8), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_4X8]},
+      {_device, idct4x16, sizeof(idct4x16), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_4X16]},
+      {_device, idct8x4, sizeof(idct8x4), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_8X4]},
+      {_device, idct8x8, sizeof(idct8x8), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_8X8]},
+      {_device, idct8x16, sizeof(idct8x16), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_8X16]},
+      {_device, idct8x32, sizeof(idct8x32), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_8X32]},
+      {_device, idct16x4, sizeof(idct16x4), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_16X4]},
+      {_device, idct16x8, sizeof(idct16x8), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_16X8]},
+      {_device, idct16x16, sizeof(idct16x16), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_16X16]},
+      {_device, idct16x32, sizeof(idct16x32), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_16X32]},
+      {_device, idct16x64, sizeof(idct16x64), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_16X64]},
+      {_device, idct32x8, sizeof(idct32x8), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_32X8]},
+      {_device, idct32x16, sizeof(idct32x16), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_32X16]},
+      {_device, idct32x32, sizeof(idct32x32), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_32X32]},
+      {_device, idct32x64, sizeof(idct32x64), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_32X64]},
+      {_device, idct64x16, sizeof(idct64x16), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_64X16]},
+      {_device, idct64x32, sizeof(idct64x32), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_64X32]},
+      {_device, idct64x64, sizeof(idct64x64), shader_lib->sig_idct.Get(), &shader_lib->shader_idct[TX_64X64]},
+      {_device, idct_lossless, sizeof(idct_lossless), shader_lib->sig_idct.Get(),
+       &shader_lib->shader_idct[TX_SIZES_ALL]},
+      {_device, idct_sort_blocks, sizeof(idct_sort_blocks), shader_lib->sig_common111.Get(),
+       &shader_lib->shader_idct_sort},
+      {_device, fill_buffer, sizeof(fill_buffer), shader_lib->sig_common0102.Get(), &shader_lib->shader_fill_buffer},
+      {_device, loop_restoration, sizeof(loop_restoration), shader_lib->sig_loop_rest.Get(),
+       &shader_lib->shader_loop_rest},
+      {_device, film_grain_filter, sizeof(film_grain_filter), shader_lib->sig_film_grain.Get(),
+       &shader_lib->shader_filmgrain_filter},
+      {_device, film_grain_gen_luma, sizeof(film_grain_gen_luma), shader_lib->sig_film_grain.Get(),
+       &shader_lib->shader_filmgrain_luma_gen},
+      {_device, film_grain_gen_chroma, sizeof(film_grain_gen_chroma), shader_lib->sig_film_grain.Get(),
+       &shader_lib->shader_filmgrain_chroma_gen},
+      {_device, cdef_filter, sizeof(cdef_filter), shader_lib->sig_cdef_filter.Get(), &shader_lib->shader_cdef_filter},
+
+      {_device, gen_pred_blocks, sizeof(gen_pred_blocks), shader_lib->sig_pred_blocks.Get(),
+       &shader_lib->shader_gen_pred_blocks},
+      {_device, gen_lf_blocks_vert, sizeof(gen_lf_blocks_vert), shader_lib->sig_lf_gen.Get(),
+       &shader_lib->shader_gen_lf_vert},
+      {_device, gen_lf_blocks_hor, sizeof(gen_lf_blocks_hor), shader_lib->sig_lf_gen.Get(),
+       &shader_lib->shader_gen_lf_hor},
+      {_device, copy_plane, sizeof(copy_plane), shader_lib->sig_copy_plane.Get(), &shader_lib->shader_copy_plane},
+      {_device, copy_plane_10bit10x3, sizeof(copy_plane_10bit10x3), shader_lib->sig_copy_plane.Get(),
+       &shader_lib->shader_copy_plane_10bit10x3},
+      {_device, upscaling, sizeof(upscaling), shader_lib->sig_upscale.Get(), &shader_lib->shader_upscale},
+
+      {_device, inter_2x2chroma, sizeof(inter_2x2chroma), shader_lib->sig_inter_pred.Get(), &s8->inter_2x2},
+      {_device, inter_compound, sizeof(inter_compound), shader_lib->sig_inter_pred.Get(), &s8->inter_comp},
+      {_device, inter_compound_2x2, sizeof(inter_compound_2x2), shader_lib->sig_inter_pred.Get(), &s8->inter_comp_2x2},
+      {_device, inter_compound_diff_chroma, sizeof(inter_compound_diff_chroma), shader_lib->sig_inter_pred.Get(),
+       &s8->inter_comp_diff_uv},
+      {_device, inter_compound_diff_luma, sizeof(inter_compound_diff_luma), shader_lib->sig_inter_pred.Get(),
+       &s8->inter_comp_diff_y},
+      {_device, inter_compound_masked, sizeof(inter_compound_masked), shader_lib->sig_inter_pred.Get(),
+       &s8->inter_comp_masked},
+      {_device, inter_main, sizeof(inter_main), shader_lib->sig_inter_pred.Get(), &s8->inter_base},
+      {_device, inter_obmc_above, sizeof(inter_obmc_above), shader_lib->sig_inter_pred.Get(), &s8->inter_obmc_above},
+      {_device, inter_obmc_left, sizeof(inter_obmc_left), shader_lib->sig_inter_pred.Get(), &s8->inter_obmc_left},
+      {_device, inter_warp, sizeof(inter_warp), shader_lib->sig_inter_pred.Get(), &s8->inter_warp},
+      {_device, inter_scale_warp_compound, sizeof(inter_scale_warp_compound), shader_lib->sig_inter_pred.Get(),
+       &s8->inter_warp_comp},
+      {_device, intra_filter, sizeof(intra_filter), shader_lib->sig_intra_pred.Get(), &s8->intra_filter},
+      {_device, intra_main, sizeof(intra_main), shader_lib->sig_intra_pred.Get(), &s8->intra_main},
+      {_device, inter_ext_borders, sizeof(inter_ext_borders), shader_lib->sig_common0110.Get(), &s8->inter_ext_borders},
+      {_device, reconstruct_block, sizeof(reconstruct_block), shader_lib->sig_intra_pred.Get(), &s8->reconstruct_block},
+      {_device, loopfilter_vert, sizeof(loopfilter_vert), shader_lib->sig_lf.Get(), &s8->loopfilter_v},
+      {_device, loopfilter_hor, sizeof(loopfilter_hor), shader_lib->sig_lf.Get(), &s8->loopfilter_h},
+
+      {_device, inter_compound_2x2_hbd, sizeof(inter_compound_2x2_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_comp_2x2},
+      {_device, inter_2x2chroma_hbd, sizeof(inter_2x2chroma_hbd), shader_lib->sig_inter_pred.Get(), &hbd->inter_2x2},
+      {_device, inter_compound_diff_chroma_hbd, sizeof(inter_compound_diff_chroma_hbd),
+       shader_lib->sig_inter_pred.Get(), &hbd->inter_comp_diff_uv},
+      {_device, inter_compound_diff_luma_hbd, sizeof(inter_compound_diff_luma_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_comp_diff_y},
+      {_device, inter_compound_hbd, sizeof(inter_compound_hbd), shader_lib->sig_inter_pred.Get(), &hbd->inter_comp},
+      {_device, inter_compound_masked_hbd, sizeof(inter_compound_masked_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_comp_masked},
+      {_device, inter_main_hbd, sizeof(inter_main_hbd), shader_lib->sig_inter_pred.Get(), &hbd->inter_base},
+      {_device, inter_obmc_above_hbd, sizeof(inter_obmc_above_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_obmc_above},
+      {_device, inter_obmc_left_hbd, sizeof(inter_obmc_left_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_obmc_left},
+      {_device, inter_scale_warp_compound_hbd, sizeof(inter_scale_warp_compound_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_warp_comp},
+      {_device, inter_warp_hbd, sizeof(inter_warp_hbd), shader_lib->sig_inter_pred.Get(), &hbd->inter_warp},
+      {_device, intra_filter_hbd, sizeof(intra_filter_hbd), shader_lib->sig_intra_pred.Get(), &hbd->intra_filter},
+      {_device, intra_main_hbd, sizeof(intra_main_hbd), shader_lib->sig_intra_pred.Get(), &hbd->intra_main},
+      {_device, inter_ext_borders_hbd, sizeof(inter_ext_borders_hbd), shader_lib->sig_common0110.Get(),
+       &hbd->inter_ext_borders},
+      {_device, reconstruct_block_hbd, sizeof(reconstruct_block_hbd), shader_lib->sig_intra_pred.Get(),
+       &hbd->reconstruct_block},
+      {_device, loopfilter_vert_hbd, sizeof(loopfilter_vert_hbd), shader_lib->sig_lf.Get(), &hbd->loopfilter_v},
+      {_device, loopfilter_hor_hbd, sizeof(loopfilter_hor_hbd), shader_lib->sig_lf.Get(), &hbd->loopfilter_h},
+
+      {_device, inter_scale, sizeof(inter_scale), shader_lib->sig_inter_pred.Get(), &s8->inter_scale},
+      {_device, inter_scale_2x2chroma, sizeof(inter_scale_2x2chroma), shader_lib->sig_inter_pred.Get(),
+       &s8->inter_scale_2x2},
+      {_device, inter_scale_compound, sizeof(inter_scale_compound), shader_lib->sig_inter_pred.Get(),
+       &s8->inter_scale_comp},
+      {_device, inter_scale_compound_diff_chroma, sizeof(inter_scale_compound_diff_chroma),
+       shader_lib->sig_inter_pred.Get(), &s8->inter_scale_comp_diff_uv},
+      {_device, inter_scale_compound_diff_luma, sizeof(inter_scale_compound_diff_luma),
+       shader_lib->sig_inter_pred.Get(), &s8->inter_scale_comp_diff_y},
+      {_device, inter_scale_compound_masked, sizeof(inter_scale_compound_masked), shader_lib->sig_inter_pred.Get(),
+       &s8->inter_scale_comp_masked},
+      {_device, inter_scale_obmc_above, sizeof(inter_scale_obmc_above), shader_lib->sig_inter_pred.Get(),
+       &s8->inter_scale_obmc_above},
+      {_device, inter_scale_obmc_left, sizeof(inter_scale_obmc_left), shader_lib->sig_inter_pred.Get(),
+       &s8->inter_scale_obmc_left},
+      {_device, inter_scale_hbd, sizeof(inter_scale_hbd), shader_lib->sig_inter_pred.Get(), &hbd->inter_scale},
+      {_device, inter_scale_2x2chroma_hbd, sizeof(inter_scale_2x2chroma_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_scale_2x2},
+      {_device, inter_scale_compound_hbd, sizeof(inter_scale_compound_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_scale_comp},
+      {_device, inter_scale_compound_diff_chroma_hbd, sizeof(inter_scale_compound_diff_chroma_hbd),
+       shader_lib->sig_inter_pred.Get(), &hbd->inter_scale_comp_diff_uv},
+      {_device, inter_scale_compound_diff_luma_hbd, sizeof(inter_scale_compound_diff_luma_hbd),
+       shader_lib->sig_inter_pred.Get(), &hbd->inter_scale_comp_diff_y},
+      {_device, inter_scale_compound_masked_hbd, sizeof(inter_scale_compound_masked_hbd),
+       shader_lib->sig_inter_pred.Get(), &hbd->inter_scale_comp_masked},
+      {_device, inter_scale_obmc_above_hbd, sizeof(inter_scale_obmc_above_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_scale_obmc_above},
+      {_device, inter_scale_obmc_left_hbd, sizeof(inter_scale_obmc_left_hbd), shader_lib->sig_inter_pred.Get(),
+       &hbd->inter_scale_obmc_left}};
+
+  const int size = sizeof(create_tasks) / sizeof(CreateShaderTask);
+  shader_lib->threads = min(threads, MAX_CREATE_SHADER_THREADS);
+  assert((size + shader_lib->threads) <= BUFFERS_COUNT);
+
+  if (!threads) {
+    // sync case
+    for (int i = 0; i < size; i++) {
+      CreateShaderTask* task = &create_tasks[i];
+      err |= create_shader(task->device, task->src, task->size, task->sig, task->dst) != S_OK;
+    }
+  } else {
+    // async case
+    shader_lib->create_shader_tasks = new CreateShaderTask[size];
+    memcpy(shader_lib->create_shader_tasks, create_tasks, sizeof(create_tasks));
+    MTQueueInit(&shader_lib->task_queue);
+    for (int i = 0; i < size; i++) {
+      MTQueuePush(&shader_lib->task_queue, &shader_lib->create_shader_tasks[i]);
+    }
+    for (int i = 0; i < shader_lib->threads; i++) {
+      MTQueuePush(&shader_lib->task_queue, 0);
+      pthread_create(&shader_lib->create_shader_threads[i], 0, (LPTHREAD_START_ROUTINE)create_shader_thread_hook,
+                     shader_lib);
+    }
+  }
+  return err ? 0 : shader_lib;
+}
+
+void av1_destroy_pipeline_cache_handle(void* handle) {
+  if (handle) {
+    compute_shader_lib* shader_lib = static_cast<compute_shader_lib*>(handle);
+    wait_shader_create_complete(shader_lib);
+    delete shader_lib;
+  }
+}
+
+static int get_random_number_test(int val) {
+  unsigned int bit;
+  bit = ((val >> 0) ^ (val >> 1) ^ (val >> 3) ^ (val >> 12)) & 1;
+  val = (val >> 1) | (bit << 15);
+  return val;
+}
+
+HRESULT av1_upload_luts(Av1Core* dec) {
+  ComPtr<ID3D12Resource> tmp_fg_luma;
+  ComPtr<ID3D12Resource> tmp_fg_chroma;
+  ComPtr<ID3D12Resource> tmp_fg_gaus;
+  ComPtr<ID3D12Resource> tmp_pred_wedge;
+  ComPtr<ID3D12Resource> tmp_warp_filt;
+
+  GpuBufferObject* fg_luma = dec->filmgrain_random_luma;
+  GpuBufferObject* fg_chroma = dec->filmgrain_random_chroma;
+  GpuBufferObject* fg_gaus = dec->filmgrain_gaus;
+  GpuBufferObject* pred_lut = dec->inter_mask_lut;
+  GpuBufferObject* warp = dec->inter_warp_filter;
+
+  HRESULT hr;
+  Microsoft::WRL::ComPtr<ID3D12Device> device = dec->compute.device;
+  hr = device->CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), D3D12_HEAP_FLAG_NONE,
+                                       &CD3DX12_RESOURCE_DESC::Buffer(fg_luma->size), D3D12_RESOURCE_STATE_GENERIC_READ,
+                                       NULL, IID_PPV_ARGS(&tmp_fg_luma));
+  hr |= device->CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), D3D12_HEAP_FLAG_NONE,
+                                        &CD3DX12_RESOURCE_DESC::Buffer(fg_chroma->size),
+                                        D3D12_RESOURCE_STATE_GENERIC_READ, NULL, IID_PPV_ARGS(&tmp_fg_chroma));
+  hr |= device->CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), D3D12_HEAP_FLAG_NONE,
+                                        &CD3DX12_RESOURCE_DESC::Buffer(fg_gaus->size),
+                                        D3D12_RESOURCE_STATE_GENERIC_READ, NULL, IID_PPV_ARGS(&tmp_fg_gaus));
+  hr |= device->CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), D3D12_HEAP_FLAG_NONE,
+                                        &CD3DX12_RESOURCE_DESC::Buffer(pred_lut->size),
+                                        D3D12_RESOURCE_STATE_GENERIC_READ, NULL, IID_PPV_ARGS(&tmp_pred_wedge));
+  hr |= device->CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), D3D12_HEAP_FLAG_NONE,
+                                        &CD3DX12_RESOURCE_DESC::Buffer(warp->size), D3D12_RESOURCE_STATE_GENERIC_READ,
+                                        NULL, IID_PPV_ARGS(&tmp_warp_filt));
+  if (FAILED(hr)) return hr;
+
+  int* dst = NULL;
+  hr = tmp_fg_luma->Map(0, &CD3DX12_RANGE(0, 0), (void**)&dst);
+  if (FAILED(hr)) return hr;
+  for (int i = 0; i < 65536; i++) {
+    int random_register = i;
+    for (int j = 0; j < 82; j++) {
+      random_register = get_random_number_test(random_register);
+    }
+    dst[i] = random_register;
+  }
+  tmp_fg_luma->Unmap(0, NULL);
+
+  hr = tmp_fg_chroma->Map(0, &CD3DX12_RANGE(0, 0), (void**)&dst);
+  if (FAILED(hr)) return hr;
+  for (int i = 0; i < 65536; i++) {
+    int random_register = i;
+    for (int j = 0; j < 44; j++) {
+      random_register = get_random_number_test(random_register);
+    }
+    dst[i] = random_register;
+  }
+  tmp_fg_chroma->Unmap(0, NULL);
+
+  hr = tmp_fg_gaus->Map(0, &CD3DX12_RANGE(0, 0), (void**)&dst);
+  if (FAILED(hr)) return hr;
+  memcpy(dst, dx_gaussian_sequence, sizeof(dx_gaussian_sequence));
+  tmp_fg_gaus->Unmap(0, NULL);
+
+  hr = tmp_warp_filt->Map(0, &CD3DX12_RANGE(0, 0), (void**)&dst);
+  if (FAILED(hr)) return hr;
+
+  const int16_t* wrp_src = &warped_filter[0][0];
+  for (int i = 0; i < sizeof(warped_filter) / 2; ++i) {
+    dst[i] = wrp_src[i];
+  }
+  tmp_warp_filt->Unmap(0, NULL);
+
+  uint8_t* ptr = NULL;
+  hr = tmp_pred_wedge->Map(0, &CD3DX12_RANGE(0, 0), (void**)&ptr);
+  if (FAILED(hr)) return hr;
+  int offset = 0;
+  av1_init_wedge_masks();
+  for (int bsize = 0; bsize < BLOCK_SIZES_ALL; ++bsize) {
+    const wedge_params_type* wedge_params = &wedge_params_lookup[bsize];
+    if (!wedge_params->bits) continue;
+    const int bw = block_size_wide[bsize];
+    const int bh = block_size_high[bsize];
+    const int size = bw * bh;
+    dec->wedge_offsets[bsize][0] = (offset + 63) >> 6;
+    for (int index = 0; index < 16; ++index) {
+      memcpy(ptr + offset, wedge_params->masks[0][index], size);
+      offset += size;
+      memcpy(ptr + offset, wedge_params->masks[1][index], size);
+      offset += size;
+    }
+
+    const int size_scale = ii_size_scales[bsize];
+
+    for (int i = 0; i < bh; ++i) memset(ptr + offset + i * bw, 32, bw);
+    offset += size;
+    for (int i = 0; i < bh; ++i) memset(ptr + offset + i * bw, ii_weights1d[i * size_scale], bw);
+    offset += size;
+    for (int y = 0; y < bh; ++y)
+      for (int x = 0; x < bw; ++x) ptr[offset + x + y * bw] = ii_weights1d[x * size_scale];
+    offset += size;
+    for (int y = 0; y < bh; ++y)
+      for (int x = 0; x < bw; ++x) ptr[offset + x + y * bw] = ii_weights1d[(y < x ? y : x) * size_scale];
+    offset += size;
+  }
+
+  for (int bsize = 0; bsize < BLOCK_SIZES_ALL; ++bsize) {
+    const wedge_params_type* wedge_params = &wedge_params_lookup[bsize];
+    if (!wedge_params->bits) continue;
+    const int bw = block_size_wide[bsize] >> 1;
+    const int bh = block_size_high[bsize] >> 1;
+    const int size = AOMMAX(64, bw * bh);
+    dec->wedge_offsets[bsize][1] = (offset + 63) >> 6;
+    for (int index = 0; index < 16; ++index) {
+      for (int sign = 0; sign < 2; ++sign) {
+        int idx = 0;
+        for (int y = 0; y < bh; ++y) {
+          for (int x = 0; x < bw; ++x) {
+            ptr[offset + idx] = (wedge_params->masks[sign][index][x * 2 + (y * 2 + 0) * bw * 2] +
+                                 wedge_params->masks[sign][index][x * 2 + 1 + (y * 2 + 0) * bw * 2] +
+                                 wedge_params->masks[sign][index][x * 2 + (y * 2 + 1) * bw * 2] +
+                                 wedge_params->masks[sign][index][x * 2 + 1 + (y * 2 + 1) * bw * 2] + 2) >>
+                                2;
+            ++idx;
+          }
+        }
+        offset += size;
+      }
+    }
+    const int plane_bsize = ss_size_lookup[bsize][1][1];
+    const int size_scale = ii_size_scales[plane_bsize];
+    for (int i = 0; i < bh; ++i) memset(ptr + offset + i * bw, 32, bw);
+    offset += size;
+    for (int i = 0; i < bh; ++i) memset(ptr + offset + i * bw, ii_weights1d[i * size_scale], bw);
+    offset += size;
+    for (int y = 0; y < bh; ++y)
+      for (int x = 0; x < bw; ++x) ptr[offset + x + y * bw] = ii_weights1d[x * size_scale];
+    offset += size;
+    for (int y = 0; y < bh; ++y)
+      for (int x = 0; x < bw; ++x) ptr[offset + x + y * bw] = ii_weights1d[(y < x ? y : x) * size_scale];
+    offset += size;
+  }
+  tmp_pred_wedge->Unmap(0, NULL);
+
+  ComPtr<ID3D12CommandAllocator> computeAllocator;
+  hr = device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&computeAllocator));
+  if (FAILED(hr)) return hr;
+
+  ComPtr<ID3D12GraphicsCommandList> clist;
+  hr = device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, computeAllocator.Get(), NULL, IID_PPV_ARGS(&clist));
+  if (FAILED(hr)) return hr;
+
+  clist->CopyResource(fg_luma->dev, tmp_fg_luma.Get());
+  clist->CopyResource(fg_chroma->dev, tmp_fg_chroma.Get());
+  clist->CopyResource(fg_gaus->dev, tmp_fg_gaus.Get());
+  clist->CopyResource(pred_lut->dev, tmp_pred_wedge.Get());
+  clist->CopyResource(warp->dev, tmp_warp_filt.Get());
+
+  D3D12_RESOURCE_BARRIER barriers[] = {
+      CD3DX12_RESOURCE_BARRIER::Transition(fg_luma->dev, D3D12_RESOURCE_STATE_COPY_DEST,
+                                           D3D12_RESOURCE_STATE_GENERIC_READ),
+      CD3DX12_RESOURCE_BARRIER::Transition(fg_chroma->dev, D3D12_RESOURCE_STATE_COPY_DEST,
+                                           D3D12_RESOURCE_STATE_GENERIC_READ),
+      CD3DX12_RESOURCE_BARRIER::Transition(fg_gaus->dev, D3D12_RESOURCE_STATE_COPY_DEST,
+                                           D3D12_RESOURCE_STATE_GENERIC_READ),
+      CD3DX12_RESOURCE_BARRIER::Transition(pred_lut->dev, D3D12_RESOURCE_STATE_COPY_DEST,
+                                           D3D12_RESOURCE_STATE_GENERIC_READ),
+      CD3DX12_RESOURCE_BARRIER::Transition(warp->dev, D3D12_RESOURCE_STATE_COPY_DEST,
+                                           D3D12_RESOURCE_STATE_GENERIC_READ),
+  };
+
+  clist->ResourceBarrier(5, barriers);
+  clist->Close();
+
+  ID3D12CommandList* cl[] = {clist.Get()};
+  Microsoft::WRL::ComPtr<ID3D12CommandQueue> queue = dec->compute.queue_direct;
+  queue->ExecuteCommandLists(1, cl);
+
+  ComPtr<ID3D12Fence> fence;
+  hr = device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
+  if (FAILED(hr)) return hr;
+
+  queue->Signal(fence.Get(), 1);
+  HANDLE event = CreateEvent(nullptr, false, false, nullptr);
+  fence->SetEventOnCompletion(1, event);
+  WaitForSingleObject(event, INFINITE);
+  return hr;
+}
+
+static const int dx_gaussian_sequence[2048] = {
+    56,    568,   -180,  172,   124,   -84,   172,   -64,   -900,  24,    820,   224,   1248,  996,   272,   -8,
+    -916,  -388,  -732,  -104,  -188,  800,   112,   -652,  -320,  -376,  140,   -252,  492,   -168,  44,    -788,
+    588,   -584,  500,   -228,  12,    680,   272,   -476,  972,   -100,  652,   368,   432,   -196,  -720,  -192,
+    1000,  -332,  652,   -136,  -552,  -604,  -4,    192,   -220,  -136,  1000,  -52,   372,   -96,   -624,  124,
+    -24,   396,   540,   -12,   -104,  640,   464,   244,   -208,  -84,   368,   -528,  -740,  248,   -968,  -848,
+    608,   376,   -60,   -292,  -40,   -156,  252,   -292,  248,   224,   -280,  400,   -244,  244,   -60,   76,
+    -80,   212,   532,   340,   128,   -36,   824,   -352,  -60,   -264,  -96,   -612,  416,   -704,  220,   -204,
+    640,   -160,  1220,  -408,  900,   336,   20,    -336,  -96,   -792,  304,   48,    -28,   -1232, -1172, -448,
+    104,   -292,  -520,  244,   60,    -948,  0,     -708,  268,   108,   356,   -548,  488,   -344,  -136,  488,
+    -196,  -224,  656,   -236,  -1128, 60,    4,     140,   276,   -676,  -376,  168,   -108,  464,   8,     564,
+    64,    240,   308,   -300,  -400,  -456,  -136,  56,    120,   -408,  -116,  436,   504,   -232,  328,   844,
+    -164,  -84,   784,   -168,  232,   -224,  348,   -376,  128,   568,   96,    -1244, -288,  276,   848,   832,
+    -360,  656,   464,   -384,  -332,  -356,  728,   -388,  160,   -192,  468,   296,   224,   140,   -776,  -100,
+    280,   4,     196,   44,    -36,   -648,  932,   16,    1428,  28,    528,   808,   772,   20,    268,   88,
+    -332,  -284,  124,   -384,  -448,  208,   -228,  -1044, -328,  660,   380,   -148,  -300,  588,   240,   540,
+    28,    136,   -88,   -436,  256,   296,   -1000, 1400,  0,     -48,   1056,  -136,  264,   -528,  -1108, 632,
+    -484,  -592,  -344,  796,   124,   -668,  -768,  388,   1296,  -232,  -188,  -200,  -288,  -4,    308,   100,
+    -168,  256,   -500,  204,   -508,  648,   -136,  372,   -272,  -120,  -1004, -552,  -548,  -384,  548,   -296,
+    428,   -108,  -8,    -912,  -324,  -224,  -88,   -112,  -220,  -100,  996,   -796,  548,   360,   -216,  180,
+    428,   -200,  -212,  148,   96,    148,   284,   216,   -412,  -320,  120,   -300,  -384,  -604,  -572,  -332,
+    -8,    -180,  -176,  696,   116,   -88,   628,   76,    44,    -516,  240,   -208,  -40,   100,   -592,  344,
+    -308,  -452,  -228,  20,    916,   -1752, -136,  -340,  -804,  140,   40,    512,   340,   248,   184,   -492,
+    896,   -156,  932,   -628,  328,   -688,  -448,  -616,  -752,  -100,  560,   -1020, 180,   -800,  -64,   76,
+    576,   1068,  396,   660,   552,   -108,  -28,   320,   -628,  312,   -92,   -92,   -472,  268,   16,    560,
+    516,   -672,  -52,   492,   -100,  260,   384,   284,   292,   304,   -148,  88,    -152,  1012,  1064,  -228,
+    164,   -376,  -684,  592,   -392,  156,   196,   -524,  -64,   -884,  160,   -176,  636,   648,   404,   -396,
+    -436,  864,   424,   -728,  988,   -604,  904,   -592,  296,   -224,  536,   -176,  -920,  436,   -48,   1176,
+    -884,  416,   -776,  -824,  -884,  524,   -548,  -564,  -68,   -164,  -96,   692,   364,   -692,  -1012, -68,
+    260,   -480,  876,   -1116, 452,   -332,  -352,  892,   -1088, 1220,  -676,  12,    -292,  244,   496,   372,
+    -32,   280,   200,   112,   -440,  -96,   24,    -644,  -184,  56,    -432,  224,   -980,  272,   -260,  144,
+    -436,  420,   356,   364,   -528,  76,    172,   -744,  -368,  404,   -752,  -416,  684,   -688,  72,    540,
+    416,   92,    444,   480,   -72,   -1416, 164,   -1172, -68,   24,    424,   264,   1040,  128,   -912,  -524,
+    -356,  64,    876,   -12,   4,     -88,   532,   272,   -524,  320,   276,   -508,  940,   24,    -400,  -120,
+    756,   60,    236,   -412,  100,   376,   -484,  400,   -100,  -740,  -108,  -260,  328,   -268,  224,   -200,
+    -416,  184,   -604,  -564,  -20,   296,   60,    892,   -888,  60,    164,   68,    -760,  216,   -296,  904,
+    -336,  -28,   404,   -356,  -568,  -208,  -1480, -512,  296,   328,   -360,  -164,  -1560, -776,  1156,  -428,
+    164,   -504,  -112,  120,   -216,  -148,  -264,  308,   32,    64,    -72,   72,    116,   176,   -64,   -272,
+    460,   -536,  -784,  -280,  348,   108,   -752,  -132,  524,   -540,  -776,  116,   -296,  -1196, -288,  -560,
+    1040,  -472,  116,   -848,  -1116, 116,   636,   696,   284,   -176,  1016,  204,   -864,  -648,  -248,  356,
+    972,   -584,  -204,  264,   880,   528,   -24,   -184,  116,   448,   -144,  828,   524,   212,   -212,  52,
+    12,    200,   268,   -488,  -404,  -880,  824,   -672,  -40,   908,   -248,  500,   716,   -576,  492,   -576,
+    16,    720,   -108,  384,   124,   344,   280,   576,   -500,  252,   104,   -308,  196,   -188,  -8,    1268,
+    296,   1032,  -1196, 436,   316,   372,   -432,  -200,  -660,  704,   -224,  596,   -132,  268,   32,    -452,
+    884,   104,   -1008, 424,   -1348, -280,  4,     -1168, 368,   476,   696,   300,   -8,    24,    180,   -592,
+    -196,  388,   304,   500,   724,   -160,  244,   -84,   272,   -256,  -420,  320,   208,   -144,  -156,  156,
+    364,   452,   28,    540,   316,   220,   -644,  -248,  464,   72,    360,   32,    -388,  496,   -680,  -48,
+    208,   -116,  -408,  60,    -604,  -392,  548,   -840,  784,   -460,  656,   -544,  -388,  -264,  908,   -800,
+    -628,  -612,  -568,  572,   -220,  164,   288,   -16,   -308,  308,   -112,  -636,  -760,  280,   -668,  432,
+    364,   240,   -196,  604,   340,   384,   196,   592,   -44,   -500,  432,   -580,  -132,  636,   -76,   392,
+    4,     -412,  540,   508,   328,   -356,  -36,   16,    -220,  -64,   -248,  -60,   24,    -192,  368,   1040,
+    92,    -24,   -1044, -32,   40,    104,   148,   192,   -136,  -520,  56,    -816,  -224,  732,   392,   356,
+    212,   -80,   -424,  -1008, -324,  588,   -1496, 576,   460,   -816,  -848,  56,    -580,  -92,   -1372, -112,
+    -496,  200,   364,   52,    -140,  48,    -48,   -60,   84,    72,    40,    132,   -356,  -268,  -104,  -284,
+    -404,  732,   -520,  164,   -304,  -540,  120,   328,   -76,   -460,  756,   388,   588,   236,   -436,  -72,
+    -176,  -404,  -316,  -148,  716,   -604,  404,   -72,   -88,   -888,  -68,   944,   88,    -220,  -344,  960,
+    472,   460,   -232,  704,   120,   832,   -228,  692,   -508,  132,   -476,  844,   -748,  -364,  -44,   1116,
+    -1104, -1056, 76,    428,   552,   -692,  60,    356,   96,    -384,  -188,  -612,  -576,  736,   508,   892,
+    352,   -1132, 504,   -24,   -352,  324,   332,   -600,  -312,  292,   508,   -144,  -8,    484,   48,    284,
+    -260,  -240,  256,   -100,  -292,  -204,  -44,   472,   -204,  908,   -188,  -1000, -256,  92,    1164,  -392,
+    564,   356,   652,   -28,   -884,  256,   484,   -192,  760,   -176,  376,   -524,  -452,  -436,  860,   -736,
+    212,   124,   504,   -476,  468,   76,    -472,  552,   -692,  -944,  -620,  740,   -240,  400,   132,   20,
+    192,   -196,  264,   -668,  -1012, -60,   296,   -316,  -828,  76,    -156,  284,   -768,  -448,  -832,  148,
+    248,   652,   616,   1236,  288,   -328,  -400,  -124,  588,   220,   520,   -696,  1032,  768,   -740,  -92,
+    -272,  296,   448,   -464,  412,   -200,  392,   440,   -200,  264,   -152,  -260,  320,   1032,  216,   320,
+    -8,    -64,   156,   -1016, 1084,  1172,  536,   484,   -432,  132,   372,   -52,   -256,  84,    116,   -352,
+    48,    116,   304,   -384,  412,   924,   -300,  528,   628,   180,   648,   44,    -980,  -220,  1320,  48,
+    332,   748,   524,   -268,  -720,  540,   -276,  564,   -344,  -208,  -196,  436,   896,   88,    -392,  132,
+    80,    -964,  -288,  568,   56,    -48,   -456,  888,   8,     552,   -156,  -292,  948,   288,   128,   -716,
+    -292,  1192,  -152,  876,   352,   -600,  -260,  -812,  -468,  -28,   -120,  -32,   -44,   1284,  496,   192,
+    464,   312,   -76,   -516,  -380,  -456,  -1012, -48,   308,   -156,  36,    492,   -156,  -808,  188,   1652,
+    68,    -120,  -116,  316,   160,   -140,  352,   808,   -416,  592,   316,   -480,  56,    528,   -204,  -568,
+    372,   -232,  752,   -344,  744,   -4,    324,   -416,  -600,  768,   268,   -248,  -88,   -132,  -420,  -432,
+    80,    -288,  404,   -316,  -1216, -588,  520,   -108,  92,    -320,  368,   -480,  -216,  -92,   1688,  -300,
+    180,   1020,  -176,  820,   -68,   -228,  -260,  436,   -904,  20,    40,    -508,  440,   -736,  312,   332,
+    204,   760,   -372,  728,   96,    -20,   -632,  -520,  -560,  336,   1076,  -64,   -532,  776,   584,   192,
+    396,   -728,  -520,  276,   -188,  80,    -52,   -612,  -252,  -48,   648,   212,   -688,  228,   -52,   -260,
+    428,   -412,  -272,  -404,  180,   816,   -796,  48,    152,   484,   -88,   -216,  988,   696,   188,   -528,
+    648,   -116,  -180,  316,   476,   12,    -564,  96,    476,   -252,  -364,  -376,  -392,  556,   -256,  -576,
+    260,   -352,  120,   -16,   -136,  -260,  -492,  72,    556,   660,   580,   616,   772,   436,   424,   -32,
+    -324,  -1268, 416,   -324,  -80,   920,   160,   228,   724,   32,    -516,  64,    384,   68,    -128,  136,
+    240,   248,   -204,  -68,   252,   -932,  -120,  -480,  -628,  -84,   192,   852,   -404,  -288,  -132,  204,
+    100,   168,   -68,   -196,  -868,  460,   1080,  380,   -80,   244,   0,     484,   -888,  64,    184,   352,
+    600,   460,   164,   604,   -196,  320,   -64,   588,   -184,  228,   12,    372,   48,    -848,  -344,  224,
+    208,   -200,  484,   128,   -20,   272,   -468,  -840,  384,   256,   -720,  -520,  -464,  -580,  112,   -120,
+    644,   -356,  -208,  -608,  -528,  704,   560,   -424,  392,   828,   40,    84,    200,   -152,  0,     -144,
+    584,   280,   -120,  80,    -556,  -972,  -196,  -472,  724,   80,    168,   -32,   88,    160,   -688,  0,
+    160,   356,   372,   -776,  740,   -128,  676,   -248,  -480,  4,     -364,  96,    544,   232,   -1032, 956,
+    236,   356,   20,    -40,   300,   24,    -676,  -596,  132,   1120,  -104,  532,   -1096, 568,   648,   444,
+    508,   380,   188,   -376,  -604,  1488,  424,   24,    756,   -220,  -192,  716,   120,   920,   688,   168,
+    44,    -460,  568,   284,   1144,  1160,  600,   424,   888,   656,   -356,  -320,  220,   316,   -176,  -724,
+    -188,  -816,  -628,  -348,  -228,  -380,  1012,  -452,  -660,  736,   928,   404,   -696,  -72,   -268,  -892,
+    128,   184,   -344,  -780,  360,   336,   400,   344,   428,   548,   -112,  136,   -228,  -216,  -820,  -516,
+    340,   92,    -136,  116,   -300,  376,   -244,  100,   -316,  -520,  -284,  -12,   824,   164,   -548,  -180,
+    -128,  116,   -924,  -828,  268,   -368,  -580,  620,   192,   160,   0,     -1676, 1068,  424,   -56,   -360,
+    468,   -156,  720,   288,   -528,  556,   -364,  548,   -148,  504,   316,   152,   -648,  -620,  -684,  -24,
+    -376,  -384,  -108,  -920,  -1032, 768,   180,   -264,  -508,  -1268, -260,  -60,   300,   -240,  988,   724,
+    -376,  -576,  -212,  -736,  556,   192,   1092,  -620,  -880,  376,   -56,   -4,    -216,  -32,   836,   268,
+    396,   1332,  864,   -600,  100,   56,    -412,  -92,   356,   180,   884,   -468,  -436,  292,   -388,  -804,
+    -704,  -840,  368,   -348,  140,   -724,  1536,  940,   372,   112,   -372,  436,   -480,  1136,  296,   -32,
+    -228,  132,   -48,   -220,  868,   -1016, -60,   -1044, -464,  328,   916,   244,   12,    -736,  -296,  360,
+    468,   -376,  -108,  -92,   788,   368,   -56,   544,   400,   -672,  -420,  728,   16,    320,   44,    -284,
+    -380,  -796,  488,   132,   204,   -596,  -372,  88,    -152,  -908,  -636,  -572,  -624,  -116,  -692,  -200,
+    -56,   276,   -88,   484,   -324,  948,   864,   1000,  -456,  -184,  -276,  292,   -296,  156,   676,   320,
+    160,   908,   -84,   -1236, -288,  -116,  260,   -372,  -644,  732,   -756,  -96,   84,    344,   -520,  348,
+    -688,  240,   -84,   216,   -1044, -136,  -676,  -396,  -1500, 960,   -40,   176,   168,   1516,  420,   -504,
+    -344,  -364,  -360,  1216,  -940,  -380,  -212,  252,   -660,  -708,  484,   -444,  -152,  928,   -120,  1112,
+    476,   -260,  560,   -148,  -344,  108,   -196,  228,   -288,  504,   560,   -328,  -88,   288,   -1008, 460,
+    -228,  468,   -836,  -196,  76,    388,   232,   412,   -1168, -716,  -644,  756,   -172,  -356,  -504,  116,
+    432,   528,   48,    476,   -168,  -608,  448,   160,   -532,  -272,  28,    -676,  -12,   828,   980,   456,
+    520,   104,   -104,  256,   -344,  -4,    -28,   -368,  -52,   -524,  -572,  -556,  -200,  768,   1124,  -208,
+    -512,  176,   232,   248,   -148,  -888,  604,   -600,  -304,  804,   -156,  -212,  488,   -192,  -804,  -256,
+    368,   -360,  -916,  -328,  228,   -240,  -448,  -472,  856,   -556,  -364,  572,   -12,   -156,  -368,  -340,
+    432,   252,   -752,  -152,  288,   268,   -580,  -848,  -592,  108,   -76,   244,   312,   -716,  592,   -80,
+    436,   360,   4,     -248,  160,   516,   584,   732,   44,    -468,  -280,  -292,  -156,  -588,  28,    308,
+    912,   24,    124,   156,   180,   -252,  944,   -924,  -772,  -520,  -428,  -624,  300,   -212,  -1144, 32,
+    -724,  800,   -1128, -212,  -1288, -848,  180,   -416,  440,   192,   -576,  -792,  -76,   -1080, 80,    -532,
+    -352,  -132,  380,   -820,  148,   1112,  128,   164,   456,   700,   -924,  144,   -668,  -384,  648,   -832,
+    508,   552,   -52,   -100,  -656,  208,   -568,  748,   -88,   680,   232,   300,   192,   -408,  -1012, -152,
+    -252,  -268,  272,   -876,  -664,  -648,  -332,  -136,  16,    12,    1152,  -28,   332,   -536,  320,   -672,
+    -460,  -316,  532,   -260,  228,   -40,   1052,  -816,  180,   88,    -496,  -556,  -672,  -368,  428,   92,
+    356,   404,   -408,  252,   196,   -176,  -556,  792,   268,   32,    372,   40,    96,    -332,  328,   120,
+    372,   -900,  -40,   472,   -264,  -592,  952,   128,   656,   112,   664,   -232,  420,   4,     -344,  -464,
+    556,   244,   -416,  -32,   252,   0,     -412,  188,   -696,  508,   -476,  324,   -1096, 656,   -312,  560,
+    264,   -136,  304,   160,   -64,   -580,  248,   336,   -720,  560,   -348,  -288,  -276,  -196,  -500,  852,
+    -544,  -236,  -1128, -992,  -776,  116,   56,    52,    860,   884,   212,   -12,   168,   1020,  512,   -552,
+    924,   -148,  716,   188,   164,   -340,  -520,  -184,  880,   -152,  -680,  -208,  -1156, -300,  -528,  -472,
+    364,   100,   -744,  -1056, -32,   540,   280,   144,   -676,  -32,   -232,  -280,  -224,  96,    568,   -76,
+    172,   148,   148,   104,   32,    -296,  -32,   788,   -80,   32,    -16,   280,   288,   944,   428,   -484};

diff --git a/libav1/dx/av1_compute.h b/libav1/dx/av1_compute.h
new file mode 100644
index 0000000..703f6fd
--- /dev/null
+++ b/libav1/dx/av1_compute.h

@@ -0,0 +1,351 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+#include <wrl.h>
+#include <d3d12.h>
+#include <queue>
+#include "aom_util/aom_thread.h"
+#include "dx/av1_thread.h"
+
+#define MAX_CREATE_SHADER_THREADS 6
+typedef struct {
+  Microsoft::WRL::ComPtr<ID3D12PipelineState> pso;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> signaturePtr;
+  const BYTE* bytecode;
+  size_t size;
+} ComputeShader;
+
+typedef struct {
+  ID3D12Device* device;
+  const BYTE* src;
+  int size;
+  ID3D12RootSignature* sig;
+  ComputeShader* dst;
+} CreateShaderTask;
+
+typedef struct {
+  ComputeShader inter_base;
+  ComputeShader inter_2x2;
+  ComputeShader inter_comp;
+  ComputeShader inter_comp_diff_y;
+  ComputeShader inter_comp_diff_uv;
+  ComputeShader inter_comp_masked;
+  ComputeShader inter_comp_2x2;
+  ComputeShader inter_warp;
+  ComputeShader inter_obmc_above;
+  ComputeShader inter_obmc_left;
+  ComputeShader inter_warp_comp;
+  ComputeShader intra_filter;
+  ComputeShader intra_main;
+  ComputeShader reconstruct_block;
+  ComputeShader inter_ext_borders;
+  ComputeShader loopfilter_v;
+  ComputeShader loopfilter_h;
+  ComputeShader inter_scale;
+  ComputeShader inter_scale_2x2;
+  ComputeShader inter_scale_comp;
+  ComputeShader inter_scale_comp_diff_y;
+  ComputeShader inter_scale_comp_diff_uv;
+  ComputeShader inter_scale_comp_masked;
+  ComputeShader inter_scale_comp_2x2;
+  ComputeShader inter_scale_obmc_above;
+  ComputeShader inter_scale_obmc_left;
+  ComputeShader inter_scale_warp_comp;
+} bitdepth_dependent_shaders;
+
+typedef struct {
+  ComputeShader shader_idct[20];
+  ComputeShader shader_idct_sort;
+  ComputeShader shader_loop_rest;
+  ComputeShader shader_cdef_filter;
+  ComputeShader shader_filmgrain_luma_gen;
+  ComputeShader shader_filmgrain_chroma_gen;
+  ComputeShader shader_filmgrain_filter;
+  ComputeShader shader_copy_plane;
+  ComputeShader shader_copy_plane_10bit10x3;
+  ComputeShader shader_fill_buffer;
+  ComputeShader shader_gen_pred_blocks;
+  ComputeShader shader_gen_lf_vert;
+  ComputeShader shader_gen_lf_hor;
+  ComputeShader shader_upscale;
+  bitdepth_dependent_shaders shaders_8bit;
+  bitdepth_dependent_shaders shaders_hbd;
+
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_idct;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_copy_plane;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_common111;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_common0102;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_common0110;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_pred_blocks;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_intra_pred;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_inter_pred;
+
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_lf;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_lf_gen;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_loop_rest;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_cdef_filter;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_film_grain;
+  Microsoft::WRL::ComPtr<ID3D12RootSignature> sig_upscale;
+
+  int compliled_ = 0;
+  int threads = 0;
+  DataPtrQueueMT task_queue;
+  pthread_t create_shader_threads[MAX_CREATE_SHADER_THREADS] = {};
+  int create_shader_errors = 0;
+  const int task_cnt = 0;
+  CreateShaderTask* create_shader_tasks = 0;
+} compute_shader_lib;
+
+int wait_shader_create_complete(compute_shader_lib* lib);
+
+// struct Av1Core;
+HRESULT av1_upload_luts(struct Av1Core* dec);
+
+static const int obmc_mask[16 * 4] = {
+    // mask_2
+    45, 64, 64, 64,
+    // mask_4
+    39, 50, 59, 64,
+    // mask_8
+    36, 42, 48, 53, 57, 61, 64, 64,
+    // mask_16
+    34, 37, 40, 43, 46, 49, 52, 54, 56, 58, 60, 61, 64, 64, 64, 64,
+    // mask_32
+    33, 35, 36, 38, 40, 41, 43, 44, 45, 47, 48, 50, 51, 52, 53, 55, 56, 57, 58, 59, 60, 60, 61, 62, 64, 64, 64, 64, 64,
+    64, 64, 64};
+
+static const int InterBlockSizeIndexLUT[6][6] = {
+    // h:        4    8    16    32    64    128
+    {0, 1, 2, -1, -1, -1},     // w = 4 (4)
+    {3, 4, 5, 6, -1, -1},      // w = 8
+    {7, 8, 9, 10, 11, -1},     // w = 16
+    {-1, 12, 13, 14, 15, -1},  // w = 32
+    {-1, -1, 16, 17, 18, 19},  // w = 64
+    {-1, -1, -1, -1, 20, 21}   // w = 128
+};
+// TODO:
+// 0        BLOCK_4X4
+// 1        BLOCK_4X8    BLOCK_8X4
+// 2        BLOCK_8X8    BLOCK_4X16    BLOCK_16X4
+// 3        BLOCK_8X16    BLOCK_16X8
+// 4        BLOCK_16X16    BLOCK_8X32    BLOCK_32X8
+// 5        BLOCK_16X32    BLOCK_32X16
+// 6        BLOCK_32X32    BLOCK_16X64    BLOCK_64X16
+// 7        BLOCK_32X64    BLOCK_64X32
+// 8        BLOCK_64X64
+// 9        BLOCK_64X128    BLOCK_128X64
+// 10    BLOCK_128X128
+//
+// const int InterBlockSizeIndexLUT[][] =
+//{//h:        4    8    16    32    64    128
+//        {    0,    1,    2,    -1,    -1,    -1},    //w = 4 (4)
+//        {    1,    2,    3,    4,    -1,    -1},    //w = 8
+//        {    2,    3,    4,    5,    6,    -1},    //w = 16
+//        {    -1,    4,    5,    6,    7,    -1},    //w = 32
+//        {    -1, -1, 6,    7,    8,    9},        //w = 64
+//        {    -1, -1, -1, -1, 9,    10}        //w = 128
+//};
+
+const int InterBlockWidthLUT[] = {0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5};
+const int InterBlockHeightLUT[] = {0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5};
+
+const int dr_intra_derivative_reduced[28] = {0,  1023, 547, 372, 273, 215, 178, 151, 132, 116, 102, 90, 80, 71,
+                                             64, 57,   51,  45,  40,  35,  31,  27,  23,  19,  15,  11, 7,  3};
+const int intra_dx_index_lut[8][7] = {
+    {25, 26, 27, 0, 27, 26, 25},  {3, 2, 1, 0, 0, 0, 0},  {11, 12, 13, 14, 15, 16, 17}, {17, 16, 15, 14, 13, 12, 11},
+    {24, 23, 22, 21, 20, 19, 18}, {10, 9, 8, 7, 6, 5, 4}, {0, 0, 0, 0, 0, 0, 0},        {18, 19, 20, 21, 22, 23, 24}};
+
+const int intra_dy_index_lut[8][7] = {
+    {0, 0, 0, 0, 1, 2, 3},  {25, 26, 27, 0, 27, 26, 25},  {0, 0, 0, 0, 0, 0, 0},        {11, 12, 13, 14, 15, 16, 17},
+    {4, 5, 6, 7, 8, 9, 10}, {18, 19, 20, 21, 22, 23, 24}, {24, 23, 22, 21, 20, 19, 18}, {0, 0, 0, 0, 0, 0, 0}};
+const int intra_mode_shader_params[16 * 7][4] = {
+    // dx, dy, flags, bits
+    // DIR
+    {11, 0, 305, 5},
+    {7, 0, 305, 5},
+    {3, 0, 305, 5},
+    {0, 0, 273, 5},
+    {-3, -1023, 850, 5},
+    {-7, -547, 850, 5},
+    {-11, -372, 850, 5},
+    {-372, -11, 850, 5},
+    {-547, -7, 850, 5},
+    {-1023, -3, 850, 5},
+    {0, 0, 323, 5},
+    {0, 3, 451, 5},
+    {0, 7, 451, 5},
+    {0, 11, 451, 5},
+    {90, 0, 305, 5},
+    {80, 0, 305, 5},
+    {71, 0, 305, 5},
+    {64, 0, 305, 5},
+    {57, 0, 305, 5},
+    {51, 0, 305, 5},
+    {45, 0, 305, 5},
+    {-45, -90, 850, 5},
+    {-51, -80, 850, 5},
+    {-57, -71, 850, 5},
+    {-64, -64, 850, 5},
+    {-71, -57, 850, 5},
+    {-80, -51, 850, 5},
+    {-90, -45, 850, 5},
+    {-15, -273, 850, 5},
+    {-19, -215, 850, 5},
+    {-23, -178, 850, 5},
+    {-27, -151, 850, 5},
+    {-31, -132, 850, 5},
+    {-35, -116, 850, 5},
+    {-40, -102, 850, 5},
+    {-102, -40, 850, 5},
+    {-116, -35, 850, 5},
+    {-132, -31, 850, 5},
+    {-151, -27, 850, 5},
+    {-178, -23, 850, 5},
+    {-215, -19, 850, 5},
+    {-273, -15, 850, 5},
+    {0, 15, 451, 5},
+    {0, 19, 451, 5},
+    {0, 23, 451, 5},
+    {0, 27, 451, 5},
+    {0, 31, 451, 5},
+    {0, 35, 451, 5},
+    {0, 40, 451, 5},
+    {40, 0, 305, 5},
+    {35, 0, 305, 5},
+    {31, 0, 305, 5},
+    {27, 0, 305, 5},
+    {23, 0, 305, 5},
+    {19, 0, 305, 5},
+    {15, 0, 305, 5},
+    // SMOOTH
+    {0, 0, 92, 9},
+    {0, 0, 92, 9},
+    {0, 0, 92, 9},
+    {0, 0, 92, 9},
+    {0, 0, 92, 9},
+    {0, 0, 92, 9},
+    {0, 0, 92, 9},
+    {0, 0, 84, 8},
+    {0, 0, 84, 8},
+    {0, 0, 84, 8},
+    {0, 0, 84, 8},
+    {0, 0, 84, 8},
+    {0, 0, 84, 8},
+    {0, 0, 84, 8},
+    {0, 0, 88, 8},
+    {0, 0, 88, 8},
+    {0, 0, 88, 8},
+    {0, 0, 88, 8},
+    {0, 0, 88, 8},
+    {0, 0, 88, 8},
+    {0, 0, 88, 8},
+    // PAETH
+    {0, 0, 336, 5},
+    {0, 0, 336, 5},
+    {0, 0, 336, 5},
+    {0, 0, 336, 5},
+    {0, 0, 336, 5},
+    {0, 0, 336, 5},
+    {0, 0, 336, 5},
+    // INTRA_BC
+    {0, 0, 0, 0},
+    {0, 0, 0, 0},
+    {0, 0, 0, 0},
+    {0, 0, 0, 0},
+    {0, 0, 0, 0},
+    {0, 0, 0, 0},
+    {0, 0, 0, 0},
+    // Filter?
+    {0, 0, 80, 0},
+    {0, 0, 80, 0},
+    {0, 0, 80, 0},
+    {0, 0, 80, 0},
+    {0, 0, 80, 0},
+    {0, 0, 80, 0},
+    {0, 0, 80, 0},
+    // DC:
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    // CFL:
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+    {0, 0, 80, 5},
+};
+
+extern const int dx_gaussian_sequence[2048];
+
+/*
+int mode_params_lut[13][7][4];
+for (int m = 0; m <= PAETH_PRED; ++m)
+{
+for (int delta = 0; delta < 7; ++delta)
+{
+int dir = m >= V_PRED && m <= D67_PRED;
+int dx = 0;
+int dy = 0;
+
+int need_above = 1;
+int need_left = 1;
+int need_aboveleft = (m == PAETH_PRED); // | is_filter_mode;
+int need_right = 0;
+int need_bot = 0;
+int do_corner_filt = 0;
+if (dir)
+{
+    int angle = mode_to_angle_map[m] + (delta - 3) * 3;
+    dir = (angle > 0 && angle <= 90) ? 1 :
+        (angle > 90 && angle < 180) ? 2 :
+        (angle >= 180 && angle < 270) ? 3 : 0;
+    dx = (angle > 0 && angle < 90) ? dr_intra_derivative[angle] :
+        (angle > 90 && angle < 180) ? -dr_intra_derivative[180 - angle] : 0;
+    dy = (angle > 90 && angle < 180) ? -dr_intra_derivative[angle - 90] :
+        (angle > 180 && angle < 270) ? dr_intra_derivative[270 - angle] : 0;
+    need_aboveleft = 1;
+    need_above = angle < 180;
+    need_left = angle > 90;
+    need_right = angle < 90;
+    need_bot = angle > 180;
+    do_corner_filt = need_above && need_left;
+}
+
+int m1 = m == 0 ? 12 : (m - 1);
+mode_params_lut[m1][delta][0] = dx;
+mode_params_lut[m1][delta][1] = dy;
+mode_params_lut[m1][delta][2] = dir |
+    ((m == SMOOTH_V_PRED || m == SMOOTH_PRED) ? 4 : 0) |
+    ((m == SMOOTH_H_PRED || m == SMOOTH_PRED) ? 8 : 0) |
+    (need_above << 4) |
+    (need_right << 5) |
+    (need_left << 6) |
+    (need_bot << 7) |
+    (need_aboveleft << 8) |
+    (do_corner_filt << 9);
+mode_params_lut[m1][delta][3] =
+    (m == SMOOTH_H_PRED || m == SMOOTH_V_PRED) ? 8 :
+    (m == SMOOTH_PRED) ? 9 : 5;
+}
+}
+*/

diff --git a/libav1/dx/av1_core.cpp b/libav1/dx/av1_core.cpp
new file mode 100644
index 0000000..9f43a34
--- /dev/null
+++ b/libav1/dx/av1_core.cpp

@@ -0,0 +1,1061 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "dx/av1_core.h"
+#include "dx/av1_memory.h"
+#include "dx/av1_compute.h"
+#include <assert.h>
+#include <new>
+#include "dx/types.h"
+#include "av1/decoder/decoder.h"
+#include "av1\common\scan.h"
+#include "av1\common\idct.h"
+#include "av1\common\filter.h"
+#include "av1/common/reconinter.h"
+#include "av1/common/warped_motion.h"
+#include "av1/common/reconintra.h"
+#include "aom_dsp/intrapred_common.h"
+
+enum {
+  IdctBlockSize = 16,
+  IdctCoefCountDenum = 4 * 4 + 2 * 2 * 2,
+  IdctCoefCountNum = 17,  //~71%
+  TileSbSizeThreshold = 2,
+
+  FbYStripe = 256,
+  FbUvStripe = 128,
+};
+
+int av1_postprocess_copy_output(Av1Core *dec, AV1_COMMON *cm);
+void av1_loopfilter_gpu(Av1Core *dec, AV1_COMMON *cm, MACROBLOCKD *xd);
+void av1_cdef_looprestoration(Av1Core *dec, AV1_COMMON *cm, void *lr_ctxt);
+void av1_prediction_run_all(Av1Core *dec, AV1_COMMON *cm, TileInfo *tile);
+void av1_idct_run(Av1Core *dec);
+void av1_inter_ext_borders(Av1Core *dec, AV1_COMMON *cm);
+void av1_mi_push_block(AV1Decoder *pbi, AV1_COMMON *cm, MACROBLOCKD *xd);
+void av1_prediction_gen_blocks(AV1Decoder *pbi, Av1Core *dec);
+
+static THREADFN gpu_thread_hook(void *pdata);
+
+static int get_random_number_test(int val) {
+  unsigned int bit;
+  bit = ((val >> 0) ^ (val >> 1) ^ (val >> 3) ^ (val >> 12)) & 1;
+  val = (val >> 1) | (bit << 15);
+  return val;
+}
+
+#define CHECK_RESULT(DST, FUNC, DO_ASSIGN) \
+  {                                        \
+    auto x = FUNC;                         \
+    if (DO_ASSIGN) {                       \
+      if (!x)                              \
+        return -1;                         \
+      else                                 \
+        DST = x;                           \
+    }                                      \
+  }
+
+struct resource_config {
+  int bitdepth;
+  int width;
+  int height;
+  int fb_count;
+  int ref_fb_count;
+  int gpu_pipeline_depth;
+  int enable_superres;
+};
+
+int av1_allocate_buffers(Av1Core *dec, av1_memory_manager_base *mem, const resource_config &cfg, int do_assign) {
+  const int target_width = (cfg.width + 127) & ~127;
+  const int target_height = (cfg.height + 127) & ~127;
+  const int block_count_4x4 = target_width * target_height * 3 / (2 * 4 * 4);
+  const int mi_cols = target_width >> 2;
+  const int mi_rows = target_height >> 2;
+
+  const int max_tile_cols = target_width >> 6;
+  const int max_tile_rows = target_height >> 6;
+  dec->block_count4x4 = block_count_4x4;
+
+  CHECK_RESULT(dec->pbi_alloc, (void *)mem->host_allocate(sizeof(AV1Decoder)), do_assign);
+  CHECK_RESULT(dec->buf_pool_alloc, (void *)mem->host_allocate(sizeof(BufferPool)), do_assign);
+
+  for (int p = 0; p < 5; ++p)
+    CHECK_RESULT(dec->above_context_alloc[p], (void **)mem->host_allocate(sizeof(void *) * max_tile_rows), do_assign);
+  for (int row = 0; row < max_tile_rows; ++row)
+    for (int p = 0; p < 5; ++p)
+      CHECK_RESULT(dec->above_context_alloc[p][row], (void *)mem->host_allocate(sizeof(ENTROPY_CONTEXT) * mi_cols),
+                   do_assign);
+
+  const int y_max_rst_units = max_tile_cols * max_tile_rows;
+  const int uv_max_rst_units = (max_tile_cols >> 1) * (max_tile_rows >> 1);
+  CHECK_RESULT(dec->restoration_info_alloc[0],
+               (void *)mem->host_allocate(sizeof(RestorationUnitInfo) * y_max_rst_units), do_assign);
+  CHECK_RESULT(dec->restoration_info_alloc[1],
+               (void *)mem->host_allocate(sizeof(RestorationUnitInfo) * uv_max_rst_units), do_assign);
+  CHECK_RESULT(dec->restoration_info_alloc[2],
+               (void *)mem->host_allocate(sizeof(RestorationUnitInfo) * uv_max_rst_units), do_assign);
+  const int tmvs_size = ((mi_rows + MAX_MIB_SIZE) >> 1) * (mi_cols >> 1);
+  CHECK_RESULT(dec->tplmvs_alloc, (void *)mem->host_allocate(sizeof(TPL_MV_REF) * tmvs_size), do_assign);
+
+  const int residuals_size = target_width * target_height +
+                             2 * ((((target_width >> 1) + 127) & (~127)) * (((target_height >> 1) + 127) & (~127)));
+  CHECK_RESULT(dec->idct_residuals, mem->create_buffer(sizeof(short) * residuals_size, MemoryType::DeviceOnly),
+               do_assign);
+  CHECK_RESULT(dec->idct_blocks, mem->create_buffer(block_count_4x4 * IdctBlockSize, MemoryType::DeviceOnly),
+               do_assign);
+  CHECK_RESULT(dec->inter_mask_lut, mem->create_buffer(sizeof(wedge_mask_buf) * 2, MemoryType::DeviceOnlyConst),
+               do_assign);
+  CHECK_RESULT(dec->inter_warp_filter, mem->create_buffer(sizeof(warped_filter) * 2, MemoryType::DeviceOnlyConst),
+               do_assign);
+
+  CHECK_RESULT(dec->filmgrain_noise, mem->create_buffer(sizeof(int) * (96 * 96 + 48 * 48 * 2), MemoryType::DeviceOnly),
+               do_assign);
+  CHECK_RESULT(dec->filmgrain_gaus, mem->create_buffer(sizeof(dx_gaussian_sequence), MemoryType::DeviceOnlyConst),
+               do_assign);
+  CHECK_RESULT(dec->filmgrain_random_luma, mem->create_buffer(sizeof(int) * (65536 + 1), MemoryType::DeviceOnlyConst),
+               do_assign);
+  CHECK_RESULT(dec->filmgrain_random_chroma, mem->create_buffer(sizeof(int) * (65536 + 1), MemoryType::DeviceOnlyConst),
+               do_assign);
+
+  // frame buffer size:
+  const int border = 16;
+  const int y_stride = target_width + 2 * border;
+  const int y_size = y_stride * target_height;
+  const int uv_stride = (target_width >> 1) + border * 2;
+  const int uv_size = uv_stride * (target_height >> 1);
+  const int bpp = cfg.bitdepth > 8 ? 2 : 1;
+  const int fb_size = ((y_size + 2 * uv_size) * bpp + 255) & ~255;
+
+  dec->fb_size = fb_size;
+  dec->fb_offset = fb_size;
+  dec->enable_superres = cfg.enable_superres;
+  CHECK_RESULT(dec->frame_buffer_pool,
+               mem->create_buffer(fb_size * cfg.fb_count + dec->fb_offset, MemoryType::DeviceOnly), do_assign);
+
+  const int grid_w = mi_cols + 2 + 128;
+  const int grid_h = mi_rows + 2 + 128;
+  const int cdef_blocks = target_width * target_height / 64 / 64;
+
+  const int block_count_8x8 = block_count_4x4 >> 2;
+  const int mi_size = mi_cols * mi_rows;
+  const int lf_blk_count = (target_width / 64) * (target_height / 4) +       // vert luma
+                           (target_width / 128) * (target_height / 8) * 2 +  // vert chroma
+                           (target_height / 64) * (target_width / 4) +       // hor luma
+                           (target_height / 128) * (target_width / 8) * 2;   // hor chroma
+
+  const int max_tiles = max_tile_cols * max_tile_rows;
+
+  dec->pred_map_size = (256 + 32) * max_tiles + (mi_size >> 2) * 10;
+  CHECK_RESULT(dec->prediction_blocks, mem->create_buffer(block_count_4x4 * 16 * 2, MemoryType::DeviceOnly), do_assign);
+  CHECK_RESULT(dec->prediction_blocks_warp, mem->create_buffer(block_count_8x8 * 48, MemoryType::DeviceOnly),
+               do_assign);
+  CHECK_RESULT(dec->loopfilter_blocks, mem->create_buffer(lf_blk_count * 32, MemoryType::DeviceOnly), do_assign);
+  CHECK_RESULT(dec->mode_info_pool, mem->create_buffer(sizeof(MB_MODE_INFO) * mi_size, MemoryType::HostRW), do_assign);
+  const int coef_buffer_size = target_width * target_height * 3 * IdctCoefCountNum / (IdctCoefCountDenum * 2);
+  CHECK_RESULT(dec->idct_coefs, mem->create_buffer(sizeof(int) * coef_buffer_size * 2, MemoryType::DeviceUpload),
+               do_assign);
+  for (int i = 0; i < cfg.gpu_pipeline_depth; ++i) {
+    av1_frame_thread_data *td = &dec->frame_thread_data[i];
+    td->mode_info_max = mi_size >> 1;
+    td->mode_info_offset = i * (mi_size >> 1);
+    td->coef_buffer_offset = i * coef_buffer_size;
+    CHECK_RESULT(td->command_buffer.cb_alloc, mem->create_buffer(1024 * 1024, MemoryType::DeviceUpload), do_assign);
+    CHECK_RESULT(td->tile_data, (av1_tile_data *)mem->host_allocate(sizeof(av1_tile_data) * max_tiles), do_assign);
+    CHECK_RESULT(td->gen_mi_block_indexes, mem->create_buffer(sizeof(int) * block_count_4x4 * 2, MemoryType::HostRW),
+                 do_assign);
+    CHECK_RESULT(td->gen_intra_inter_grid, mem->create_buffer(grid_w * grid_h * 6, MemoryType::HostRW), do_assign);
+    CHECK_RESULT(td->gen_block_map, mem->create_buffer(dec->pred_map_size * sizeof(int), MemoryType::HostRW),
+                 do_assign);
+    // CHECK_RESULT(td->mode_info, mem->create_buffer(sizeof(MB_MODE_INFO) * mi_size, MemoryType::HostRW), do_assign);
+    CHECK_RESULT(td->mode_info_grid, mem->create_buffer(sizeof(MB_MODE_INFO *) * mi_size, MemoryType::HostRW),
+                 do_assign);
+    // CHECK_RESULT(td->idct_coefs, mem->create_buffer(sizeof(int) * target_width * target_height * 3 / 2,
+    // MemoryType::DeviceUpload), do_assign);
+    CHECK_RESULT(td->idct_blocks_unordered,
+                 mem->create_buffer(block_count_4x4 * IdctBlockSize, MemoryType::DeviceUpload), do_assign);
+
+    CHECK_RESULT(td->cdef_indexes, mem->create_buffer(sizeof(int) * cdef_blocks, MemoryType::DeviceUpload), do_assign);
+    CHECK_RESULT(td->cdef_skips, mem->create_buffer(sizeof(int) * cdef_blocks * 16 * 16, MemoryType::DeviceUpload),
+                 do_assign);
+    CHECK_RESULT(td->loop_rest_types, mem->create_buffer(16 * (block_count_4x4 >> 8), MemoryType::DeviceUpload),
+                 do_assign);
+    CHECK_RESULT(td->loop_rest_wiener, mem->create_buffer(64 * (block_count_4x4 >> 8), MemoryType::DeviceUpload),
+                 do_assign);
+    CHECK_RESULT(td->filmgrain_rand_offset,
+                 mem->create_buffer(sizeof(int) * (120 * (68 + 1)), MemoryType::DeviceUpload), do_assign);
+
+    CHECK_RESULT(td->palette_buffer, mem->create_buffer(fb_size, MemoryType::DeviceUpload), do_assign);
+  }
+
+  const int mvs_size = (target_width >> 3) * (target_height >> 3) * sizeof(MV_REF);
+  const int seg_size = (target_width >> 2) * (target_height >> 2);
+  for (int i = 0; i < cfg.ref_fb_count; ++i) {
+    HwFrameBuffer *fb = &dec->fb_pool_src[i];
+    CHECK_RESULT(fb->mvs_alloc, mem->host_allocate(mvs_size), do_assign);
+    CHECK_RESULT(fb->seg_alloc, (uint8_t *)mem->host_allocate(seg_size), do_assign);
+  }
+  return 0;
+}
+
+extern "C" int av1_query_memory_requirements(aom_codec_dec_cfg_t *cfg) {
+  av1_memory_allocator_dummy mem;
+
+  mem.host_allocate(32 * 1024);  // assume aom_codec_alg_priv size, actually ~20kb;
+  mem.host_allocate(sizeof(av1_memory_allocator));
+  mem.host_allocate(sizeof(Av1Core));
+  resource_config rcfg;
+  rcfg.bitdepth = cfg->bitdepth;
+  rcfg.width = cfg->width;
+  rcfg.height = cfg->height;
+  rcfg.ref_fb_count = 12;
+  rcfg.fb_count = rcfg.ref_fb_count;
+  rcfg.gpu_pipeline_depth = FrameThreadDataCount;
+  rcfg.enable_superres = 1;
+  Av1Core dec;
+  av1_allocate_buffers(&dec, &mem, rcfg, 0);
+  cfg->host_size = mem.get_host_size();
+  return 0;
+}
+
+int create_device(dx_compute_context *context) {
+  HRESULT hr = S_OK;
+  if (context->device == 0) {
+    UINT dxgiFactoryFlags = 0;
+#if defined(_DEBUG)
+    {
+      ComPtr<ID3D12Debug> debugController;
+      hr = D3D12GetDebugInterface(IID_PPV_ARGS(&debugController));
+      if (SUCCEEDED(hr)) {
+        debugController->EnableDebugLayer();
+        dxgiFactoryFlags |= DXGI_CREATE_FACTORY_DEBUG;
+      }
+    }
+#endif
+
+    ComPtr<IDXGIFactory4> factory;
+    hr = CreateDXGIFactory2(dxgiFactoryFlags, IID_PPV_ARGS(&factory));
+    if (FAILED(hr)) return hr;
+
+    ComPtr<IDXGIAdapter1> hardwareAdapter;
+    hr = E_FAIL;
+    for (int index = 0; DXGI_ERROR_NOT_FOUND != factory->EnumAdapters1(index, &hardwareAdapter); ++index) {
+      DXGI_ADAPTER_DESC1 desc;
+      hardwareAdapter->GetDesc1(&desc);
+      if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) continue;
+      hr = D3D12CreateDevice(hardwareAdapter.Get(), D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&context->device));
+      if (SUCCEEDED(hr)) break;
+    }
+
+    if (context->device == NULL || FAILED(hr)) return E_FAIL;
+  }
+
+  Microsoft::WRL::ComPtr<ID3D12Device> device = context->device;
+  D3D12_COMMAND_QUEUE_DESC desc = {};
+  if (!context->queue) {
+    desc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
+    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;  // enable_cpu_output == EnableHostOutput ?
+                                                 // D3D12_COMMAND_LIST_TYPE_DIRECT : D3D12_COMMAND_LIST_TYPE_COMPUTE;
+    hr = device->CreateCommandQueue(&desc, IID_PPV_ARGS(&context->queue));
+    if (FAILED(hr)) return hr;
+  }
+  context->queue_direct = context->queue;
+
+  if (FAILED(hr)) return hr;
+  return hr;
+}
+
+extern "C" int av1_create_gpu_decoder(Av1Core **gpu_dec, aom_codec_dec_cfg_t *cfg) {
+  if (cfg->host_size < sizeof(av1_memory_allocator)) return -1;
+
+  av1_memory_allocator *mem = new (cfg->host_memory) av1_memory_allocator;
+  mem->setup((uint8_t *)cfg->host_memory, cfg->host_size);
+  mem->host_allocate(sizeof(*mem));
+  Av1Core *dec = (Av1Core *)mem->host_allocate(sizeof(Av1Core));
+  if (!dec) return -1;
+
+  memset(dec, 0, sizeof(*dec));
+  dec->memory = mem;
+
+  dec->compute.device = static_cast<ID3D12Device *>(cfg->dx12device);
+  dec->compute.queue = static_cast<ID3D12CommandQueue *>(cfg->dx12command_queue);
+  if (FAILED(create_device(&dec->compute))) return -1;
+  dx_compute_context *compute = &dec->compute;
+  mem->set_dx_context(compute);
+
+  //    if (!cfg->out_buffers_cb.get_out_buffer_cb ||
+  //        !cfg->out_buffers_cb.release_out_buffer_cb)
+  //        return -1;
+  dec->cb_get_output_image = cfg->out_buffers_cb.get_out_buffer_cb;
+  dec->cb_release_image = cfg->out_buffers_cb.release_out_buffer_cb;
+  dec->cb_notify_frame_ready = cfg->out_buffers_cb.notify_frame_ready_cb;
+  dec->image_alloc_priv = cfg->out_buffers_cb.out_buffers_priv;
+  dec->shader_lib = static_cast<compute_shader_lib *>(cfg->dxPsos);
+  if (!dec->shader_lib) return -1;
+  if (wait_shader_create_complete(dec->shader_lib)) return -1;
+
+  resource_config rcfg;
+  rcfg.bitdepth = cfg->bitdepth;
+  rcfg.width = cfg->width;
+  rcfg.height = cfg->height;
+  rcfg.ref_fb_count = 12;
+  rcfg.fb_count = rcfg.ref_fb_count;
+  rcfg.gpu_pipeline_depth = FrameThreadDataCount;
+  rcfg.enable_superres = 1;
+  if (av1_allocate_buffers(dec, mem, rcfg, 1)) return -1;
+
+  Microsoft::WRL::ComPtr<ID3D12Device> device = compute->device;
+  dec->tryhdr10x3 = cfg->tryHDR10x3;
+  HRESULT hr;
+
+  MTQueueInit(&dec->frame_data_pool);
+  for (int i = 0; i < FrameThreadDataCount; ++i) {
+    av1_frame_thread_data *td = &dec->frame_thread_data[i];
+    ComputeCommandBuffer *cb = &td->command_buffer;
+    hr = device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&cb->allocator));
+    if (FAILED(hr)) return -1;
+    cb->fence_value = 0;
+    hr = device->CreateFence(cb->fence_value, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&cb->fence));
+    if (FAILED(hr)) return -1;
+    cb->event = CreateEvent(nullptr, false, false, nullptr);
+    td->frame_number = 0;
+    pthread_mutex_init(&td->sec_data_mutex, NULL);
+    MTQueuePush(&dec->frame_data_pool, td);
+  }
+
+  hr = device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
+                                 dec->frame_thread_data[0].command_buffer.allocator.Get(), NULL,
+                                 IID_PPV_ARGS(&compute->command_list));
+  if (FAILED(hr)) return -1;
+  if (FAILED(compute->command_list->Close())) return -1;
+
+  if (FAILED(av1_upload_luts(dec))) return -1;
+
+  MTQueueInit(&dec->gpu_item_pool);
+  MTQueueInit(&dec->gpu_waiting_queue);
+  for (int i = 0; i < 8; ++i) {
+    dec->gpu_item_pool_src[i].data = NULL;
+    dec->gpu_item_pool_src[i].image = NULL;
+    MTQueuePush(&dec->gpu_item_pool, &dec->gpu_item_pool_src[i]);
+  }
+
+  MTQueueInit(&dec->output_queue);
+  MTQueueInit(&dec->image_pool);
+  for (int i = 0; i < ImagePoolSize; ++i) {
+    HwOutputImage *img = &dec->image_pool_src[i];
+    img->size = 0;
+    img->fb_ptr = NULL;
+    img->is_valid = 0;
+    //        img->hw_buf = dec->output_frame_buffers[i];
+    MTQueuePush(&dec->image_pool, img);
+  }
+
+  dec->back_buffer1.size = dec->fb_offset;
+  dec->back_buffer1.base_offset = 0;
+  QueueInit(&dec->fb_pool);
+  for (int i = 0; i < rcfg.ref_fb_count; ++i) {
+    const int offset = dec->fb_offset + dec->fb_size * i;
+    HwFrameBuffer *fb = &dec->fb_pool_src[i];
+    // fb->pool_ptr = pool;
+    // fb->fb_ptr = pool + offset;
+    fb->size = dec->fb_size;
+    fb->base_offset = offset;
+    fb->ref_cnt = 0;
+    QueuePush(&dec->fb_pool, fb);
+  }
+
+  pthread_cond_init(&dec->fb_pool_empty_cond, NULL);
+  pthread_mutex_init(&dec->fb_pool_mutex, NULL);
+  *gpu_dec = dec;
+  return 0;
+}
+
+extern "C" void av1_allocate_pbi(Av1Core *dec, AV1Decoder **ppbi, BufferPool **pbp) {
+  AV1Decoder *pbi = (AV1Decoder *)dec->pbi_alloc;
+  BufferPool *bp = (BufferPool *)dec->buf_pool_alloc;
+  memset(pbi, 0, sizeof(*pbi));
+  memset(bp, 0, sizeof(*bp));
+  pbi->gpu_decoder = dec;
+  *ppbi = pbi;
+  *pbp = bp;
+}
+
+HwFrameBuffer *get_frame_buffer(Av1Core *dec) {
+  HwFrameBuffer *fb = NULL;
+
+  pthread_mutex_lock(&dec->fb_pool_mutex);
+  {
+    while (!dec->fb_pool.m_QueueNotEmpty) pthread_cond_wait(&dec->fb_pool_empty_cond, &dec->fb_pool_mutex);
+    fb = (HwFrameBuffer *)QueueGet(&dec->fb_pool);
+  }
+  fb->ref_cnt = 1;
+  pthread_mutex_unlock(&dec->fb_pool_mutex);
+  return fb;
+}
+
+void frame_buffer_acquire(Av1Core *dec, HwFrameBuffer *fb) {
+  pthread_mutex_lock(&dec->fb_pool_mutex);
+  ++fb->ref_cnt;
+  pthread_mutex_unlock(&dec->fb_pool_mutex);
+}
+
+void frame_buffer_release(Av1Core *dec, HwFrameBuffer *fb) {
+  pthread_mutex_lock(&dec->fb_pool_mutex);
+  --fb->ref_cnt;
+  if (fb->ref_cnt == 0) {
+    QueuePush(&dec->fb_pool, fb);
+    pthread_cond_signal(&dec->fb_pool_empty_cond);
+  }
+  pthread_mutex_unlock(&dec->fb_pool_mutex);
+}
+
+void av1_destroy_gpu_decoder(Av1Core *dec) {
+  if (!dec) return;
+  av1_drain_gpu_decoder(dec);
+
+  if (dec->cb_release_image) {
+    for (int i = 0; i < ImagePoolSize; ++i) {
+      HwOutputImage *img = &dec->image_pool_src[i];
+      if (img->is_valid) {
+        dec->cb_release_image(dec->image_alloc_priv, img->alloc_priv);
+      }
+    }
+  }
+
+  for (int i = 0; i < FrameThreadDataCount; ++i) {
+    av1_frame_thread_data *td = &dec->frame_thread_data[i];
+    CloseHandle(td->command_buffer.event);
+    pthread_mutex_destroy(&td->sec_data_mutex);
+  }
+
+  pthread_cond_destroy(&dec->fb_pool_empty_cond);
+  pthread_mutex_destroy(&dec->fb_pool_mutex);
+  MTQueueDestroy(&dec->output_queue);
+  MTQueueDestroy(&dec->image_pool);
+  MTQueueDestroy(&dec->gpu_waiting_queue);
+  MTQueueDestroy(&dec->gpu_item_pool);
+  MTQueueDestroy(&dec->frame_data_pool);
+
+  if (dec->memory) dec->memory->release();
+
+  dec->~Av1Core();
+}
+
+void av1_prepare_command_buffer(Av1Core *dec) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  dx_compute_context *compute = &dec->compute;
+
+  td->command_buffer.allocator->Reset();
+  compute->command_list->Reset(td->command_buffer.allocator.Get(), NULL);
+  td->command_buffer.Reset();
+
+  PutPerfMarker(td, &td->perf_markers[0]);
+}
+
+void av1_commit_command_buffer(Av1Core *dec) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  PutPerfMarker(td, &td->perf_markers[15]);
+
+  dx_compute_context *compute = &dec->compute;
+  ComputeCommandBuffer *cb = &td->command_buffer;
+
+  compute->command_list->Close();
+  ID3D12CommandList *list[] = {compute->command_list.Get()};
+
+  ++cb->fence_value;
+  compute->queue->ExecuteCommandLists(1, list);
+  compute->queue->Signal(cb->fence.Get(), cb->fence_value);
+  cb->fence->SetEventOnCompletion(cb->fence_value, cb->event);
+}
+
+void PutPerfMarker(av1_frame_thread_data *td, volatile int64_t *marker) {}
+
+extern "C" void av1_setup_context_buffers(AV1Decoder *pbi) {
+  AV1_COMMON *cm = &pbi->common;
+  Av1Core *dec = pbi->gpu_decoder;
+  // above context:
+  const int cols = (cm->mi_cols + 31) & ~31;
+  const int rows = cm->tile_rows;
+  cm->num_allocated_above_contexts = rows;
+  cm->num_allocated_above_context_mi_col = cols;
+  cm->num_allocated_above_context_planes = cm->seq_params.monochrome ? 1 : MAX_MB_PLANE;
+  cm->above_context[0] = (ENTROPY_CONTEXT **)dec->above_context_alloc[0];
+  cm->above_context[1] = (ENTROPY_CONTEXT **)dec->above_context_alloc[1];
+  cm->above_context[2] = (ENTROPY_CONTEXT **)dec->above_context_alloc[2];
+  cm->above_seg_context = (PARTITION_CONTEXT **)dec->above_context_alloc[3];
+  cm->above_txfm_context = (TXFM_CONTEXT **)dec->above_context_alloc[4];
+
+  cm->rst_info[0].unit_info = (RestorationUnitInfo *)dec->restoration_info_alloc[0];
+  cm->rst_info[1].unit_info = (RestorationUnitInfo *)dec->restoration_info_alloc[1];
+  cm->rst_info[2].unit_info = (RestorationUnitInfo *)dec->restoration_info_alloc[2];
+  cm->tpl_mvs = (TPL_MV_REF *)dec->tplmvs_alloc;
+}
+
+extern "C" void av1_show_frame(AV1Decoder *pbi, YV12_BUFFER_CONFIG *buf, int is_visible) {}
+
+void copy_to_img(Av1Core *dec, HwOutputImage *hw_img, aom_image_t *dst) {
+  dst->bit_depth = hw_img->hbd ? 10 : 8;
+  dst->w = hw_img->y_crop_width;
+  dst->h = hw_img->y_crop_height;
+  dst->d_w = hw_img->y_crop_width;
+  dst->d_h = hw_img->y_crop_height;
+  dst->r_w = hw_img->y_crop_width;
+  dst->r_h = hw_img->y_crop_height;
+  dst->x_chroma_shift = 1;
+  dst->y_chroma_shift = 1;
+  dst->planes[AOM_PLANE_Y] = hw_img->fb_ptr + hw_img->planes[0].offset;
+  dst->planes[AOM_PLANE_U] = hw_img->fb_ptr + hw_img->planes[1].offset;
+  dst->planes[AOM_PLANE_V] = hw_img->fb_ptr + hw_img->planes[2].offset;
+  dst->stride[AOM_PLANE_Y] = hw_img->planes[0].stride;
+  dst->stride[AOM_PLANE_U] = hw_img->planes[1].stride;
+  dst->stride[AOM_PLANE_V] = hw_img->planes[2].stride;
+  dst->user_priv = hw_img->user_priv;
+  dst->fb2_priv = hw_img->alloc_priv;
+  dst->monochrome = hw_img->monochrome;
+  if (hw_img->monochrome) {
+    dst->planes[AOM_PLANE_U] = NULL;
+    dst->planes[AOM_PLANE_V] = NULL;
+  }
+  dst->is_hdr10x3 = dec->tryhdr10x3;
+  hw_img->fb_ptr = NULL;
+  hw_img->alloc_priv = NULL;
+  hw_img->user_priv = NULL;
+  hw_img->is_valid = 0;
+}
+
+extern "C" int get_output_frame(AV1Decoder *pbi, aom_image_t *dst) {
+  Av1Core *dec = pbi->gpu_decoder;
+  if (MTQueueIsEmpty(&dec->output_queue)) return 1;
+  HwOutputImage *hw_img = (HwOutputImage *)MTQueueGet(&dec->output_queue);
+  copy_to_img(dec, hw_img, dst);
+  MTQueuePush(&dec->image_pool, hw_img);
+  return 0;
+}
+
+void av1_sync_gpu(av1_frame_thread_data *td) {
+  ComputeCommandBuffer *cb = &td->command_buffer;
+  WaitForSingleObject(cb->event, INFINITE);
+}
+
+int release_fb(Av1Core *dec, HwFrameBuffer *fb) {
+  --fb->ref_cnt;
+  if (fb->ref_cnt == 0) {
+    QueuePush(&dec->fb_pool, fb);
+    return 1;
+  }
+  return 0;
+}
+
+THREADFN gpu_thread_hook(void *data) {
+  Av1Core *dec = (Av1Core *)data;
+  while (1) {
+    GpuWorkItem *item = (GpuWorkItem *)MTQueueGet(&dec->gpu_waiting_queue);
+    if (!item) {
+      if (dec->cb_notify_frame_ready && dec->image_alloc_priv) dec->cb_notify_frame_ready(dec->image_alloc_priv, 0);
+      break;
+    }
+
+    av1_frame_thread_data *td = item->data;
+    if (td) {
+      // sync gpu
+      av1_sync_gpu(td);
+      pthread_mutex_lock(&dec->fb_pool_mutex);
+      int signal = 0;
+      if (td->back_buffer0) signal |= release_fb(dec, td->back_buffer0);
+      signal |= release_fb(dec, td->dst_frame_buffer);
+      for (int r = 0; r < 7; ++r) {
+        if (td->refs[r]) signal |= release_fb(dec, td->refs[r]);
+        td->refs[r] = 0;
+      }
+      td->dst_frame_buffer = 0;
+      td->back_buffer0 = 0;
+      if (signal) {
+        pthread_cond_signal(&dec->fb_pool_empty_cond);
+      }
+      if (td->sec_thread_data) {
+        MTQueuePush(&dec->frame_data_pool, td->sec_thread_data);
+      }
+      pthread_mutex_unlock(&dec->fb_pool_mutex);
+      td->sec_thread_data = NULL;
+      MTQueuePush(&dec->frame_data_pool, td);
+    }
+
+    HwOutputImage *hw_img = item->image;
+    if (hw_img) {
+      if (dec->cb_notify_frame_ready && dec->image_alloc_priv) {
+        aom_image_t img = {};
+        aom_image_t *dst = &img;
+        copy_to_img(dec, hw_img, dst);
+        MTQueuePush(&dec->image_pool, hw_img);
+        dec->cb_notify_frame_ready(dec->image_alloc_priv, dst);
+      } else {
+        MTQueuePush(&dec->output_queue, hw_img);
+      }
+    }
+    MTQueuePush(&dec->gpu_item_pool, item);
+  }
+
+  return THREAD_RETURN(0);
+}
+
+extern "C" int av1_reallocate_frame_buffer(void *priv, YV12_BUFFER_CONFIG *ybf, int width, int height,
+                                           int upscaled_width, int hbd) {
+  if (!ybf || !priv) return -1;
+
+  Av1Core *dec = (Av1Core *)priv;
+  ybf->hw_show_image = NULL;
+  if (!ybf->hw_buffer) {
+    ybf->hw_buffer = get_frame_buffer(dec);
+  }
+
+  if (!ybf->hw_buffer) return -1;
+  const int bpp = hbd + 1;
+  const int h_border = 16;
+  const int aligned_width = (width + 127) & ~127;
+  const int aligned_height = (height + 127) & ~127;
+  const int y_stride = aligned_width + 2 * h_border;
+  const int y_height = aligned_height;
+  const int y_size = y_height * y_stride;
+  const int uv_stride = (aligned_width >> 1) + 2 * h_border;
+  const int uv_size = (y_height >> 1) * uv_stride;
+  const int fb_size = (y_size + 2 * uv_size) * (1 + hbd);
+
+  const int superres_w = (upscaled_width + 127) & ~127;
+  const int superres_size =
+      (y_height * (superres_w + 2 * h_border) + 2 * (y_height >> 1) * ((superres_w >> 1) + 2 * h_border)) * (1 + hbd);
+  if (ybf->hw_buffer->size < fb_size || ybf->hw_buffer->size < superres_size ||
+      (width != upscaled_width && dec->enable_superres == 0)) {
+    frame_buffer_release(dec, ybf->hw_buffer);
+    ybf->hw_buffer = NULL;
+    return -1;
+  }
+
+  ybf->buffer_alloc = NULL;
+  ybf->buffer_alloc_sz = 0;
+  ybf->y_crop_width = width;
+  ybf->y_crop_height = height;
+  ybf->y_width = (width + 7) & ~7;
+  ybf->y_height = (height + 7) & ~7;
+  ybf->y_stride = y_stride;
+
+  ybf->uv_crop_width = (width + 1) >> 1;
+  ybf->uv_crop_height = (height + 1) >> 1;
+  ybf->uv_width = ybf->y_width >> 1;
+  ybf->uv_height = ybf->y_height >> 1;
+  ybf->uv_stride = uv_stride;
+
+  ybf->border = h_border;
+  ybf->frame_size = (size_t)fb_size;
+  ybf->subsampling_x = 1;
+  ybf->subsampling_y = 1;
+  ybf->use_external_reference_buffers = 0;
+  ybf->flags = hbd ? YV12_FLAG_HIGHBITDEPTH : 0;
+
+  ybf->y_buffer = NULL;
+  ybf->u_buffer = NULL;
+  ybf->v_buffer = NULL;
+  HwFrameBuffer *buf = ybf->hw_buffer;
+  buf->width = ybf->y_width;
+  buf->height = ybf->y_height;
+  buf->hbd = hbd;
+  buf->y_crop_width = ybf->y_crop_width;
+  buf->y_crop_height = ybf->y_crop_height;
+  buf->uv_crop_width = ybf->uv_crop_width;
+  buf->uv_crop_height = ybf->uv_crop_height;
+  buf->planes[0].stride = ybf->y_stride * bpp;
+  buf->planes[1].stride = ybf->uv_stride * bpp;
+  buf->planes[2].stride = ybf->uv_stride * bpp;
+  buf->planes[0].offset = static_cast<int>(buf->base_offset + h_border * bpp);
+  buf->planes[1].offset = static_cast<int>(buf->base_offset + (y_size + h_border) * bpp);
+  buf->planes[2].offset = static_cast<int>(buf->planes[1].offset + uv_size * bpp);
+  buf->planes[0].res_stride = sizeof(short) * ((ybf->y_width + 127) & (~127));
+  buf->planes[1].res_stride = sizeof(short) * ((ybf->uv_width + 127) & (~127));
+  buf->planes[2].res_stride = sizeof(short) * ((ybf->uv_width + 127) & (~127));
+  buf->planes[0].res_offset = 0;
+  buf->planes[1].res_offset = buf->planes[0].res_stride * ((ybf->y_height + 127) & (~127));
+  buf->planes[2].res_offset = buf->planes[1].res_offset + buf->planes[1].res_stride * ((ybf->uv_height + 127) & (~127));
+  return 0;
+}
+
+void av1_release_fb_callback(void *priv, YV12_BUFFER_CONFIG *ybf) {
+  Av1Core *dec = (Av1Core *)priv;
+
+  if (ybf->hw_buffer) {
+    frame_buffer_release(dec, ybf->hw_buffer);
+    ybf->hw_buffer = NULL;
+  }
+  HwOutputImage *img = ybf->hw_show_image;
+  ybf->hw_show_image = NULL;
+  if (img) {
+    dec->cb_release_image(dec->image_alloc_priv, img->alloc_priv);
+    img->fb_ptr = NULL;
+    img->size = 0;
+    img->alloc_priv = NULL;
+    img->is_valid = 0;
+    MTQueuePush(&dec->image_pool, img);
+  }
+}
+
+void av1_drain_gpu_decoder(Av1Core *dec) {
+  if (dec && dec->gpu_thread) {
+    MTQueuePush(&dec->gpu_waiting_queue, NULL);
+    pthread_join(dec->gpu_thread, 0);
+    dec->gpu_thread = NULL;
+  }
+}
+
+void get_sec_data(Av1Core *dec, av1_frame_thread_data *td) {
+  pthread_mutex_lock(&td->sec_data_mutex);
+  if (td->sec_thread_data == NULL) {
+    td->sec_thread_data = (av1_frame_thread_data *)MTQueueGet(&dec->frame_data_pool);
+    av1_frame_thread_data *sec = td->sec_thread_data;
+    for (int i = 0; i < td->tile_count; ++i) sec->tile_data[i].mi_count = 0;
+  }
+  pthread_mutex_unlock(&td->sec_data_mutex);
+}
+
+extern "C" void av1_setup_frame(AV1Decoder *pbi, AV1_COMMON *cm) {
+  Av1Core *dec = pbi->gpu_decoder;
+
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  if (td) {
+    pthread_mutex_lock(&dec->fb_pool_mutex);
+    if (td->back_buffer0) release_fb(dec, td->back_buffer0);
+    release_fb(dec, td->dst_frame_buffer);
+    for (int r = 0; r < 7; ++r) {
+      if (td->refs[r]) release_fb(dec, td->refs[r]);
+      td->refs[r] = 0;
+    }
+    td->dst_frame_buffer = 0;
+    td->back_buffer0 = 0;
+    pthread_mutex_unlock(&dec->fb_pool_mutex);
+    if (td->sec_thread_data) {
+      MTQueuePush(&dec->frame_data_pool, td->sec_thread_data);
+    }
+    MTQueuePush(&dec->frame_data_pool, td);
+    dec->curr_frame_data = NULL;
+  }
+
+  td = (av1_frame_thread_data *)MTQueueGet(&dec->frame_data_pool);
+  td->sec_thread_data = NULL;
+  dec->curr_frame_data = td;
+  td->frame_number = dec->frame_number;
+
+  YV12_BUFFER_CONFIG *buf = &cm->cur_frame->buf;
+  buf->hw_show_image = NULL;
+
+  cm->mi_grid_base = (MB_MODE_INFO **)td->mode_info_grid->host_ptr;
+  cm->mi_grid_visible = cm->mi_grid_base;
+  cm->mi_alloc_size = cm->mi_stride * ((cm->mi_rows + 31) & ~31);
+  memset(cm->mi_grid_base, 0, sizeof(MB_MODE_INFO **) * cm->mi_alloc_size);
+
+  const int grid_w = (((buf->y_width + 63) & (~63)) >> 2) + 2 + 128;
+  const int grid_h = (((buf->y_height + 63) & (~63)) >> 2) + 2 + 128;
+  td->iter_grid_stride = grid_w;
+  td->iter_grid_stride_uv = grid_w >> 1;
+  td->bitdepth = buf->bit_depth;
+
+  td->tile_count = cm->tile_cols * cm->tile_rows;
+  dec->thread_count = AOMMIN(pbi->max_threads, td->tile_count);
+
+  td->gen_intra_iter_y = (int *)td->gen_intra_inter_grid->host_ptr;
+  td->iter_grid_offset_uv = grid_w * grid_h;
+  td->gen_intra_iter_uv = td->gen_intra_iter_y + td->iter_grid_offset_uv;
+  memset(td->gen_intra_iter_y, -1, td->iter_grid_offset_uv * 4);
+  memset(td->gen_intra_iter_uv, -1, td->iter_grid_offset_uv);
+
+  td->is_hbd = buf->bit_depth > 8;
+  td->shaders = td->is_hbd ? &dec->shader_lib->shaders_hbd : &dec->shader_lib->shaders_8bit;
+
+  HwFrameBuffer *fb = buf->hw_buffer;
+  assert(fb);
+
+  fb->frame_number = dec->frame_number;
+  fb->hbd = td->is_hbd;
+  ++dec->frame_number;
+  td->do_superres = 1 && dec->enable_superres && cm->width != cm->superres_upscaled_width;
+  td->do_loop_rest = 1 && (cm->rst_info[0].frame_restoration_type != RESTORE_NONE ||
+                           cm->rst_info[1].frame_restoration_type != RESTORE_NONE ||
+                           cm->rst_info[2].frame_restoration_type != RESTORE_NONE);
+  td->do_cdef = 1 && !cm->skip_loop_filter && !cm->coded_lossless &&
+                (cm->cdef_info.cdef_bits || cm->cdef_info.cdef_strengths[0] || cm->cdef_info.cdef_uv_strengths[0]);
+  td->do_filmgrain = 1 && cm->film_grain_params.apply_grain;
+  td->dst_frame_buffer = fb;
+  td->back_buffer0 = NULL;
+  if (td->do_loop_rest || td->do_cdef || td->do_superres) {
+    HwFrameBuffer *new_buf = get_frame_buffer(dec);
+    td->back_buffer0 = new_buf;
+    new_buf->width = fb->width;
+    new_buf->height = fb->height;
+    new_buf->hbd = td->is_hbd;
+    new_buf->y_crop_width = fb->y_crop_width;
+    new_buf->uv_crop_width = fb->uv_crop_width;
+    new_buf->y_crop_height = fb->y_crop_height;
+    new_buf->uv_crop_height = fb->uv_crop_height;
+    memcpy(new_buf->planes, fb->planes, sizeof(fb->planes));
+    new_buf->planes[0].offset += static_cast<int>(new_buf->base_offset - fb->base_offset);
+    new_buf->planes[1].offset += static_cast<int>(new_buf->base_offset - fb->base_offset);
+    new_buf->planes[2].offset += static_cast<int>(new_buf->base_offset - fb->base_offset);
+    td->frame_buffer = td->back_buffer0;
+    dec->back_buffer1.width = fb->width;
+    dec->back_buffer1.height = fb->height;
+    memcpy(dec->back_buffer1.planes, fb->planes, sizeof(fb->planes));
+    dec->back_buffer1.planes[0].offset += static_cast<int>(dec->back_buffer1.base_offset - fb->base_offset);
+    dec->back_buffer1.planes[1].offset += static_cast<int>(dec->back_buffer1.base_offset - fb->base_offset);
+    dec->back_buffer1.planes[2].offset += static_cast<int>(dec->back_buffer1.base_offset - fb->base_offset);
+  } else
+    td->frame_buffer = td->dst_frame_buffer;
+
+  pthread_mutex_lock(&dec->fb_pool_mutex);
+  ++fb->ref_cnt;
+  for (int i = 0; i < 7; ++i) {
+    td->refs[i] = NULL;
+    if (cm->remapped_ref_idx[i] == -1) continue;
+    YV12_BUFFER_CONFIG *ref = &cm->ref_frame_map[cm->remapped_ref_idx[i]]->buf;
+    if (!ref) continue;
+    assert(ref->hw_buffer->size == dec->fb_size);
+    td->refs[i] = ref->hw_buffer;
+    ++td->refs[i]->ref_cnt;
+  }
+  pthread_mutex_unlock(&dec->fb_pool_mutex);
+
+  td->ext_idct_buffer = 0;
+  if (IdctCoefCountNum != IdctCoefCountDenum) {
+    for (int r = 0; r < cm->tile_rows; ++r)
+      for (int c = 0; c < cm->tile_cols; ++c) {
+        td->ext_idct_buffer |= ((cm->tile_col_start_sb[c + 1] - cm->tile_col_start_sb[c]) *
+                                (cm->tile_row_start_sb[r + 1] - cm->tile_row_start_sb[r])) <= TileSbSizeThreshold;
+      }
+    if (td->ext_idct_buffer) {
+      get_sec_data(dec, td);
+    }
+  }
+
+  assert(dec->thread_count <= EntropyThreadCount);
+  if (!dec->gpu_thread) pthread_create(&dec->gpu_thread, 0, (LPTHREAD_START_ROUTINE)gpu_thread_hook, dec);
+
+  td->scale_enable = 0;
+  for (int i = LAST_FRAME; i <= ALTREF_FRAME; ++i) {
+    scale_factors *sf = get_ref_scale_factors(cm, i);
+    if (!sf) {
+      td->scale_factors[i].x_scale = REF_NO_SCALE;
+      td->scale_factors[i].y_scale = REF_NO_SCALE;
+      td->scale_factors[i].x_step = 0;
+      td->scale_factors[i].y_step = 0;
+    } else {
+      td->scale_factors[i].x_step = sf->x_step_q4;
+      td->scale_factors[i].y_step = sf->y_step_q4;
+      td->scale_factors[i].x_scale = sf->x_scale_fp;
+      td->scale_factors[i].y_scale = sf->y_scale_fp;
+      td->scale_enable |= av1_is_scaled(sf);
+    }
+  }
+}
+
+extern "C" void av1_setup_ext_coef_buffer(AV1Decoder *pbi, AV1_COMMON *cm, ThreadData *thread_data) {
+  av1_tile_data *tile = thread_data->tile_data;
+  Av1Core *dec = pbi->gpu_decoder;
+  av1_frame_thread_data *cur_td = dec->curr_frame_data;
+  get_sec_data(dec, cur_td);
+  av1_frame_thread_data *td = cur_td->sec_thread_data;
+  tile->dq_buffer_ptr = td->coef_buffer_offset + tile->dq_buffer_offset;
+}
+
+extern "C" void av1_setup_sec_data(AV1Decoder *pbi, AV1_COMMON *cm, ThreadData *thread_data) {
+  av1_tile_data *prev_t = thread_data->tile_data;
+  const int tile_id = thread_data->tile_id;
+  Av1Core *dec = pbi->gpu_decoder;
+
+  av1_frame_thread_data *cur_td = dec->curr_frame_data;
+  get_sec_data(dec, cur_td);
+
+  av1_frame_thread_data *td = cur_td->sec_thread_data;
+  av1_tile_data *t = &td->tile_data[tile_id];
+  t->mi_count = 0;
+  t->mi_offset = prev_t->mi_offset + td->mode_info_offset - cur_td->mode_info_offset;
+  thread_data->mi_count2 = 0;
+  thread_data->mi_pool2 = ((MB_MODE_INFO *)dec->mode_info_pool->host_ptr) + t->mi_offset;
+  thread_data->tile_data2 = t;
+}
+
+extern "C" void av1_setup_macroblockd(AV1Decoder *pbi, AV1_COMMON *cm, ThreadData *thread_data, TileInfo *tile) {
+  MACROBLOCKD *xd = &thread_data->xd;
+  Av1Core *dec = pbi->gpu_decoder;
+  const int tile_id = tile->tile_col + tile->tile_row * cm->tile_cols;
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  av1_tile_data *t = &td->tile_data[tile_id];
+  memset(t, 0, sizeof(*t));
+  thread_data->tile_data = t;
+  xd->tile_data = t;
+  const int mi_offset = tile->mi_row_start * ((cm->mi_cols + 15) & ~15) +
+                        tile->mi_col_start * ((tile->mi_row_end - tile->mi_row_start + 15) & (~15));
+  const int mi_max =
+      ((tile->mi_row_end - tile->mi_row_start + 15) & (~15)) * ((tile->mi_col_end - tile->mi_col_start + 15) & (~15));
+
+  t->dq_buffer_offset = mi_offset * (td->ext_idct_buffer ? IdctCoefCountDenum : IdctCoefCountNum);
+  t->dq_buffer_ptr = td->coef_buffer_offset + t->dq_buffer_offset;
+  t->dq_buffer_base = (tran_low_t *)dec->idct_coefs->host_ptr;
+  const int mib_sz = cm->seq_params.mib_size;
+  const int sb_sz = mib_sz * mib_sz * IdctCoefCountDenum;
+  t->dq_buffer_max = t->dq_buffer_ptr + mi_max * IdctCoefCountNum - sb_sz;
+  assert(td->ext_idct_buffer || t->dq_buffer_max > t->dq_buffer_ptr);
+
+  const int blocks_4x4_count = (mi_offset * 3 + 1) >> 1;
+  t->intra_iter_max = -1;
+  t->intra_iter_max_uv = -1;
+
+  t->blocks_offset = blocks_4x4_count;
+  t->idct_blocks_host = (tx_block_info_gpu *)td->idct_blocks_unordered->host_ptr + blocks_4x4_count;
+
+  const int mi_offset_base =
+      tile->mi_row_start * cm->mi_stride + tile->mi_col_start * (tile->mi_row_end - tile->mi_row_start);
+  const int mi_max_base = (tile->mi_row_end - tile->mi_row_start) * (tile->mi_col_end - tile->mi_col_start);
+  t->mi_offset = (mi_offset_base >> 1) + td->mode_info_offset;
+  t->mi_count = 0;
+  thread_data->tile_id = tile_id;
+  thread_data->mi_count = 0;
+  thread_data->mi_count_max = mi_max_base >> 1;
+  thread_data->mi_pool = ((MB_MODE_INFO *)dec->mode_info_pool->host_ptr) + t->mi_offset;
+  thread_data->mi_count2 = 0;
+  thread_data->mi_pool2 = NULL;
+  thread_data->tile_data2 = NULL;
+  thread_data->ext_idct_buffer = td->ext_idct_buffer;
+
+  t->mi_col_start = tile->mi_col_start;
+  t->mi_row_start = tile->mi_row_start;
+  t->gen_index_ptr = 0;
+  t->gen_index_base = blocks_4x4_count * 2;
+  t->gen_indexes = ((unsigned int *)td->gen_mi_block_indexes->host_ptr) + t->gen_index_base;
+  t->gen_block_warp_offset = tile_id * (256 + 32) + (mi_offset >> 2) * 10;
+  t->gen_block_map_offset = t->gen_block_warp_offset + 32;
+  t->gen_block_map = ((int *)td->gen_block_map->host_ptr) + t->gen_block_map_offset;
+  t->gen_block_map_wrp = ((int *)td->gen_block_map->host_ptr) + t->gen_block_warp_offset;
+  t->gen_intra_iter_y = -1;
+  t->gen_intra_iter_uv = -1;
+  t->have_inter = 0;
+  t->gen_intra_max_iter = td->tile_count == 1 ? (dec->pred_map_size - 512) / 10
+                                              : ((((tile->mi_col_end - tile->mi_col_start + 15) & ~15) *
+                                                  ((tile->mi_row_end - tile->mi_row_start + 15) & ~15)) >>
+                                                 2) -
+                                                    2;
+  t->gen_intra_iter_set = AOMMIN(t->gen_intra_max_iter - 16, 256);
+  t->gen_iter_clear_offset = (t->gen_intra_iter_set + 16) * 10 + 256;
+  t->gen_iter_clear_size = sizeof(int) * (256 + t->gen_intra_max_iter * 10 - t->gen_iter_clear_offset);
+  memset(t->gen_block_map_wrp, 0, sizeof(int) * (32 + 16 + t->gen_iter_clear_offset));
+}
+
+extern "C" void av1_intra_palette(AV1Decoder *pbi, MB_MODE_INFO *mi, Av1ColorMapParam *params, int plane) {
+  Av1Core *dec = pbi->gpu_decoder;
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  int i;
+  const HwFrameBuffer *fb = td->frame_buffer;
+  const int bsz = 4 >> plane;
+  uint8_t *map = params->color_map;
+  uint8_t *palette_buf = (uint8_t *)td->palette_buffer->host_ptr;
+  if (fb->hbd) {
+    const int stride = fb->planes[plane].stride >> 1;
+    const int offset = ((mi->mi_row * bsz) & (~3)) * stride + ((mi->mi_col * bsz) & (~3));
+    uint16_t *dst = (uint16_t *)(palette_buf + fb->planes[plane].offset - fb->base_offset) + offset;
+    for (i = 0; i < params->plane_height; ++i) {
+      for (int j = 0; j < params->plane_width; ++j) dst[j] = mi->palette_mode_info.palette_colors[map[j] + 8 * plane];
+      dst += stride;
+      map += params->plane_width;
+    }
+    if (plane) {
+      dst = (uint16_t *)(palette_buf + fb->planes[2].offset - fb->base_offset) + offset;
+      map = params->color_map;
+      for (i = 0; i < params->plane_height; ++i) {
+        for (int j = 0; j < params->plane_width; ++j) dst[j] = mi->palette_mode_info.palette_colors[map[j] + 16];
+        dst += stride;
+        map += params->plane_width;
+      }
+    }
+  } else {
+    const int stride = fb->planes[plane].stride;
+    const int offset = ((mi->mi_row * bsz) & (~3)) * stride + ((mi->mi_col * bsz) & (~3));
+    uint8_t *dst = palette_buf + fb->planes[plane].offset - fb->base_offset + offset;
+    for (i = 0; i < params->plane_height; ++i) {
+      for (int j = 0; j < params->plane_width; ++j)
+        dst[j] = static_cast<uint8_t>(mi->palette_mode_info.palette_colors[map[j] + 8 * plane]);
+      dst += stride;
+      map += params->plane_width;
+    }
+    if (plane) {
+      dst = palette_buf + fb->planes[2].offset - fb->base_offset + offset;  // xd->dev_frame_planes[2] + offset;
+      map = params->color_map;
+      for (i = 0; i < params->plane_height; ++i) {
+        for (int j = 0; j < params->plane_width; ++j)
+          dst[j] = static_cast<uint8_t>(mi->palette_mode_info.palette_colors[map[j] + 16]);
+        dst += stride;
+        map += params->plane_width;
+      }
+    }
+  }
+}
+
+extern "C" void av1_decode_sef(AV1Decoder *pbi) {
+  Av1Core *dec = pbi->gpu_decoder;
+  AV1_COMMON *cm = &pbi->common;
+  YV12_BUFFER_CONFIG *buf = &cm->cur_frame->buf;
+  GpuWorkItem *item = (GpuWorkItem *)MTQueueGet(&dec->gpu_item_pool);
+  item->data = NULL;
+  item->image = buf->hw_show_image;
+
+  if (item->image == NULL && buf->hw_buffer) {
+    av1_frame_thread_data *td = (av1_frame_thread_data *)MTQueueGet(&dec->frame_data_pool);
+    td->dst_frame_buffer = buf->hw_buffer;
+    td->is_hbd = buf->bit_depth > 8;
+    td->shaders = td->is_hbd ? &dec->shader_lib->shaders_hbd : &dec->shader_lib->shaders_8bit;
+    td->do_filmgrain = cm->film_grain_params.apply_grain;
+    dec->curr_frame_data = td;
+    memset(td->refs, 0, sizeof(td->refs));
+    td->back_buffer0 = 0;
+    frame_buffer_acquire(dec, td->dst_frame_buffer);
+    av1_prepare_command_buffer(dec);
+    av1_postprocess_copy_output(dec, cm);
+    av1_commit_command_buffer(dec);
+    item->image = buf->hw_show_image;
+    item->data = td;
+  }
+  dec->curr_frame_data = NULL;
+  if (item->image) item->image->user_priv = pbi->user_priv;
+  buf->hw_show_image = NULL;
+  MTQueuePush(&dec->gpu_waiting_queue, item);
+}
+
+extern "C" int av1_decode_frame_gpu(AV1Decoder *pbi) {
+  Av1Core *dec = pbi->gpu_decoder;
+  AV1_COMMON *cm = &pbi->common;
+  av1_prepare_command_buffer(dec);
+  av1_prediction_gen_blocks(pbi, dec);
+  av1_idct_run(dec);
+  av1_prediction_run_all(dec, cm, NULL);
+  av1_loopfilter_gpu(dec, cm, &pbi->mb);
+  av1_cdef_looprestoration(dec, cm, &pbi->lr_ctxt);
+  av1_inter_ext_borders(dec, cm);
+  av1_postprocess_copy_output(dec, cm);
+  av1_commit_command_buffer(dec);
+  GpuWorkItem *item = (GpuWorkItem *)MTQueueGet(&dec->gpu_item_pool);
+  item->data = dec->curr_frame_data;
+  item->image = NULL;
+  dec->curr_frame_data = NULL;
+  if (cm->show_frame) {
+    item->image = cm->cur_frame->buf.hw_show_image;
+    item->image->user_priv = pbi->user_priv;
+    cm->cur_frame->buf.hw_show_image = NULL;
+  }
+  MTQueuePush(&dec->gpu_waiting_queue, item);
+  return 0;
+}

diff --git a/libav1/dx/av1_core.h b/libav1/dx/av1_core.h
new file mode 100644
index 0000000..1d88bbb
--- /dev/null
+++ b/libav1/dx/av1_core.h

@@ -0,0 +1,56 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AV1_CORE_H_
+#define AV1_CORE_H_
+
+#include "av1/decoder/decoder.h"
+#include "aom/aom_frame_buffer.h"
+#include "av1\common\blockd.h"
+#include "av1\common\onyxc_int.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+int av1_query_memory_requirements(aom_codec_dec_cfg_t *cfg);
+int av1_create_gpu_decoder(struct Av1Core **dec, aom_codec_dec_cfg_t *cfg);
+void av1_allocate_pbi(struct Av1Core *dec, AV1Decoder **ppbi, BufferPool **pbp);
+void av1_destroy_gpu_decoder(struct Av1Core *dec);
+void av1_drain_gpu_decoder(struct Av1Core *dec);
+
+int av1_decode_frame_gpu(AV1Decoder *pbi);
+
+void av1_release_fb_callback(void *priv, YV12_BUFFER_CONFIG *ybf);
+int av1_reallocate_frame_buffer(void *priv, YV12_BUFFER_CONFIG *ybf, int width, int height, int upscaled_width,
+                                int hbd);
+
+void av1_setup_frame(AV1Decoder *pbi, AV1_COMMON *cm);
+void av1_setup_macroblockd(AV1Decoder *pbi, AV1_COMMON *cm, ThreadData *thread_data, TileInfo *tile);
+void av1_setup_sec_data(AV1Decoder *pbi, AV1_COMMON *cm, ThreadData *thread_data);
+void av1_setup_ext_coef_buffer(AV1Decoder *pbi, AV1_COMMON *cm, ThreadData *thread_data);
+
+void av1_mi_push_block(AV1Decoder *pbi, AV1_COMMON *cm, MACROBLOCKD *xd);
+void av1_intra_palette(AV1Decoder *pbi, MB_MODE_INFO *mi, Av1ColorMapParam *params, int plane);
+
+void av1_show_frame(AV1Decoder *pbi, YV12_BUFFER_CONFIG *buf, int is_visible);
+void av1_decode_sef(AV1Decoder *pbi);
+int get_output_frame(AV1Decoder *pbi, aom_image_t *dst);
+void av1_setup_context_buffers(AV1Decoder *pbi);
+#ifdef __cplusplus
+}
+#endif
+
+#endif
\ No newline at end of file

diff --git a/libav1/dx/av1_memory.cpp b/libav1/dx/av1_memory.cpp
new file mode 100644
index 0000000..7c75fe9
--- /dev/null
+++ b/libav1/dx/av1_memory.cpp

@@ -0,0 +1,163 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "dx/av1_memory.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include <d3d12.h>
+#include <assert.h>
+
+enum {
+  HostMemAlign = 32,
+  DeviceMemAlign = 256,
+  HostMemAlign1 = HostMemAlign - 1,
+  DeviceMemAlign1 = DeviceMemAlign - 1,
+};
+
+void av1_memory_allocator::setup(uint8_t *host_mem, size_t host_size) {
+  size_t haddr = (size_t)host_mem;
+  size_t hoffset = ((haddr + 255) & ~255) - haddr;
+  host_base_addr = host_mem + hoffset;
+  host_max_size = host_size - hoffset;
+  host_offset = 0;
+  shader_obj_count = 0;
+  host_obj_count = 0;
+  dev_obj_count = 0;
+  memset(dev_buffer_pool, 0, sizeof(dev_buffer_pool));
+}
+
+HRESULT upload_helper(dx_compute_context *context, ID3D12Resource *dst, void *init, size_t size, size_t src_size) {
+  Microsoft::WRL::ComPtr<ID3D12Device> device = context->device;
+  ComPtr<ID3D12Resource> res;
+  HRESULT hr = device->CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), D3D12_HEAP_FLAG_NONE,
+                                               &CD3DX12_RESOURCE_DESC::Buffer(size), D3D12_RESOURCE_STATE_GENERIC_READ,
+                                               NULL, IID_PPV_ARGS(&res));
+
+  if (FAILED(hr)) return hr;
+
+  void *ptr = NULL;
+  hr = res->Map(0, &CD3DX12_RANGE(0, 0), &ptr);
+  if (FAILED(hr)) return hr;
+
+  memcpy(ptr, init, src_size);
+  res->Unmap(0, NULL);
+
+  ComPtr<ID3D12CommandAllocator> computeAllocator;
+  hr = device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&computeAllocator));
+  if (FAILED(hr)) return hr;
+
+  ComPtr<ID3D12GraphicsCommandList> clist;
+  hr = device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, computeAllocator.Get(), NULL, IID_PPV_ARGS(&clist));
+  if (FAILED(hr)) return hr;
+
+  clist->CopyResource(dst, res.Get());
+  clist->ResourceBarrier(
+      1, &CD3DX12_RESOURCE_BARRIER::Transition(dst, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_GENERIC_READ));
+  clist->Close();
+
+  ID3D12CommandList *cl[] = {clist.Get()};
+  Microsoft::WRL::ComPtr<ID3D12CommandQueue> queue = context->queue_direct;
+  queue->ExecuteCommandLists(1, cl);
+
+  ComPtr<ID3D12Fence> fence;
+  hr = device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
+  if (FAILED(hr)) return hr;
+
+  queue->Signal(fence.Get(), 1);
+  HANDLE event = CreateEvent(nullptr, false, false, nullptr);
+  fence->SetEventOnCompletion(1, event);
+
+  WaitForSingleObject(event, INFINITE);
+  return hr;
+}
+
+GpuBufferObject *av1_memory_allocator::create_buffer(size_t size, MemoryType mem) {
+  assert(dev_obj_count < DevMemPoolSize);
+  GpuBufferObject *obj = &dev_buffer_pool[dev_obj_count++];
+  obj->host_ptr = NULL;
+  obj->memtype = mem;
+  obj->size = (size + DeviceMemAlign1) & (~(DeviceMemAlign1));
+
+  D3D12_HEAP_TYPE heapLut[] = {D3D12_HEAP_TYPE_DEFAULT, D3D12_HEAP_TYPE_UPLOAD, D3D12_HEAP_TYPE_CUSTOM,
+                               D3D12_HEAP_TYPE_READBACK, D3D12_HEAP_TYPE_DEFAULT};
+  const D3D12_HEAP_TYPE heapType = heapLut[mem];
+  const D3D12_RESOURCE_STATES state = (mem == DeviceOnly) ? D3D12_RESOURCE_STATE_UNORDERED_ACCESS
+                                                          : (mem == DeviceOnlyConst)
+                                                                ? D3D12_RESOURCE_STATE_COPY_DEST
+                                                                : (mem == ReadBack) ? D3D12_RESOURCE_STATE_COPY_DEST
+                                                                                    : D3D12_RESOURCE_STATE_GENERIC_READ;
+
+  D3D12_HEAP_PROPERTIES heapProp;
+  heapProp.Type = heapType;
+  heapProp.CPUPageProperty = mem == HostRW ? D3D12_CPU_PAGE_PROPERTY_WRITE_BACK : D3D12_CPU_PAGE_PROPERTY_UNKNOWN;  //
+  heapProp.MemoryPoolPreference = mem == HostRW ? D3D12_MEMORY_POOL_L0 : D3D12_MEMORY_POOL_UNKNOWN;
+  heapProp.CreationNodeMask = 0;
+  heapProp.VisibleNodeMask = 0;
+
+  const D3D12_RESOURCE_FLAGS flags =
+      (mem == DeviceOnly) ? D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS : D3D12_RESOURCE_FLAG_NONE;
+  HRESULT hr = context->device->CreateCommittedResource(&heapProp, D3D12_HEAP_FLAG_NONE,
+                                                        &CD3DX12_RESOURCE_DESC::Buffer(obj->size, flags), state, NULL,
+                                                        __uuidof(*obj->dev), reinterpret_cast<void **>(&obj->dev));
+
+  if (SUCCEEDED(hr) && mem != DeviceOnly && mem != DeviceOnlyConst) {
+    hr = obj->dev->Map(0, &CD3DX12_RANGE(0, 0), &obj->host_ptr);
+  }
+
+  return SUCCEEDED(hr) ? obj : NULL;
+}
+
+void *av1_memory_allocator::host_allocate(size_t size, int align) {
+  align = AOMMAX(align, HostMemAlign) - 1;
+  size_t addr_offset = ((host_offset + align) & ~align) - host_offset;
+  size += addr_offset;
+  size = (size + HostMemAlign1) & (~HostMemAlign1);
+  if ((host_offset + size) > host_max_size) return NULL;
+  uint8_t *ptr = host_base_addr + host_offset + addr_offset;
+  host_offset += size;
+  return ptr;
+}
+
+void av1_memory_allocator::release() {
+  for (int i = 0; i < dev_obj_count; ++i) {
+    GpuBufferObject *obj = &dev_buffer_pool[i];
+    if (!obj->dev) continue;
+
+    if (obj->host_ptr) {
+      obj->dev->Unmap(0, &CD3DX12_RANGE(0, 0));
+      obj->host_ptr = NULL;
+    }
+
+    obj->dev->Release();
+    obj->dev = NULL;
+  }
+}
+
+GpuBufferObject *av1_memory_allocator_dummy::create_buffer(size_t size, MemoryType mem) {
+  size_t addr_offset = ((device_ptr + DeviceMemAlign1) & ~DeviceMemAlign1) - device_ptr;
+  size += addr_offset;
+  size = (size + DeviceMemAlign1) & (~DeviceMemAlign1);
+  device_ptr += size;
+  return NULL;
+}
+
+void *av1_memory_allocator_dummy::host_allocate(size_t size, int align) {
+  align = AOMMAX(align, HostMemAlign) - 1;
+  size_t addr_offset = ((host_ptr + align) & ~align) - host_ptr;
+  size += addr_offset;
+  size = (size + HostMemAlign1) & (~HostMemAlign1);
+  host_ptr += size;
+  return NULL;
+}

diff --git a/libav1/dx/av1_memory.h b/libav1/dx/av1_memory.h
new file mode 100644
index 0000000..6b88eda
--- /dev/null
+++ b/libav1/dx/av1_memory.h

@@ -0,0 +1,101 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+#include "dx\vp9dx_linear_allocator.h"
+#include <d3d12.h>
+#include <wrl.h>
+#include <dxgi1_4.h>
+#include <d3dx12.h>
+
+enum {
+  DevMemPoolSize = 128,
+  HostMemPoolSize = 128,
+  ShaderPoolSize = 80,
+};
+
+enum BufferTypes {
+  ByteBuffer,
+  StructuredBuffer,
+  ConstBuffer,
+  BufferUnknown,
+};
+
+enum MemoryType {
+  DeviceOnly = 0,
+  DeviceUpload = 1,
+  HostRW = 2,
+  ReadBack = 3,
+  DeviceOnlyConst = 4,
+};
+
+using Microsoft::WRL::ComPtr;
+
+struct GpuBufferObject {
+  ID3D12Resource* dev;
+  void* host_ptr;
+  size_t size;
+  size_t max_size;
+  MemoryType memtype;
+  const char* name;
+};
+
+struct dx_compute_context {
+  Microsoft::WRL::ComPtr<ID3D12Device> device;
+  Microsoft::WRL::ComPtr<ID3D12CommandQueue> queue;
+  Microsoft::WRL::ComPtr<ID3D12CommandQueue> queue_direct;
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list;
+};
+
+class av1_memory_manager_base {
+ public:
+  virtual GpuBufferObject* create_buffer(size_t size, MemoryType mem) = 0;
+  virtual void* host_allocate(size_t size, int align = 0) = 0;
+};
+
+struct av1_memory_allocator : public av1_memory_manager_base {
+ public:
+  av1_memory_allocator() {}
+  void setup(uint8_t* host_mem, size_t host_size);
+  void set_dx_context(dx_compute_context* ctx) { context = ctx; }
+  virtual GpuBufferObject* create_buffer(size_t size, MemoryType mem);
+  virtual void* host_allocate(size_t size, int align = 0);
+  void release();
+
+ private:
+  dx_compute_context* context;
+  GpuBufferObject dev_buffer_pool[DevMemPoolSize];
+
+  int shader_obj_count;
+  int host_obj_count;
+  int dev_obj_count;
+  uint8_t* host_base_addr;
+  size_t host_max_size;
+  size_t host_offset;
+};
+
+struct av1_memory_allocator_dummy : public av1_memory_manager_base {
+ public:
+  av1_memory_allocator_dummy() : host_ptr(0), device_ptr(0) {}
+  virtual GpuBufferObject* create_buffer(size_t size, MemoryType mem);
+  virtual void* host_allocate(size_t size, int align = 0);
+  size_t get_host_size() { return host_ptr; }
+  size_t get_device_size() { return device_ptr; }
+
+ private:
+  size_t host_ptr;
+  size_t device_ptr;
+};

diff --git a/libav1/dx/av1_thread.cpp b/libav1/dx/av1_thread.cpp
new file mode 100644
index 0000000..2c9989b
--- /dev/null
+++ b/libav1/dx/av1_thread.cpp

@@ -0,0 +1,151 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "dx/av1_thread.h"
+
+void QueueInit(DataPtrQueue* queue) {
+  queue->m_head = queue->m_tail = 0;
+  queue->m_QueueNotEmpty = 0;
+}
+
+void QueuePop(DataPtrQueue* queue) {
+  if (queue->m_tail == queue->m_head) return;
+
+  queue->m_element[queue->m_tail % BUFFERS_COUNT] = NULL;
+  queue->m_tail++;
+  if (queue->m_tail == queue->m_head)  // its empty now
+  {
+    queue->m_tail = queue->m_head = 0;
+    queue->m_QueueNotEmpty = 0;
+  }
+}
+
+void QueuePush(DataPtrQueue* queue, void* item) {
+  queue->m_element[queue->m_head % BUFFERS_COUNT] = item;
+  queue->m_head++;
+  queue->m_QueueNotEmpty = 1;
+}
+
+void* QueueFront(DataPtrQueue* queue) {
+  if (QueueIsEmpty(queue)) return NULL;
+  return queue->m_element[queue->m_tail % BUFFERS_COUNT];
+}
+
+void* QueueGet(DataPtrQueue* queue) {
+  void* item = NULL;
+  if (QueueIsEmpty(queue)) return NULL;
+  item = queue->m_element[queue->m_tail % BUFFERS_COUNT];
+  QueuePop(queue);
+  return item;
+}
+
+int QueueIsEmpty(DataPtrQueue* queue) { return queue->m_head == queue->m_tail; }
+
+void MTQueueInit(DataPtrQueueMT* queue) {
+  QueueInit(&queue->int_queue);
+  pthread_cond_init(&queue->m_empty_cond, NULL);
+  pthread_mutex_init(&queue->m_mutex, NULL);
+}
+
+void MTQueueDestroy(DataPtrQueueMT* queue) {
+  pthread_mutex_destroy(&queue->m_mutex);
+  pthread_cond_destroy(&queue->m_empty_cond);
+}
+
+void MTQueuePush(DataPtrQueueMT* queue, void* item) {
+  pthread_mutex_lock(&queue->m_mutex);  // cs.Lock();
+
+  QueuePush(&queue->int_queue, item);
+  pthread_cond_signal(&queue->m_empty_cond);
+
+  pthread_mutex_unlock(&queue->m_mutex);
+}
+
+void* MTQueueGet(DataPtrQueueMT* queue) {
+  void* frontElement = NULL;
+  pthread_mutex_lock(&queue->m_mutex);
+  {
+    while (!queue->int_queue.m_QueueNotEmpty) pthread_cond_wait(&queue->m_empty_cond, &queue->m_mutex);
+
+    frontElement = QueueGet(&queue->int_queue);
+  }
+  pthread_mutex_unlock(&queue->m_mutex);
+
+  return frontElement;
+}
+
+void MTQueuePushSafe(DataPtrQueueMT* queue, void* item) {
+  pthread_mutex_lock(&queue->m_mutex);  // cs.Lock();
+
+  while (queue->int_queue.m_head - queue->int_queue.m_tail >= BUFFERS_COUNT)
+    pthread_cond_wait(&queue->m_empty_cond, &queue->m_mutex);
+
+  QueuePush(&queue->int_queue, item);
+  pthread_cond_signal(&queue->m_empty_cond);
+
+  pthread_mutex_unlock(&queue->m_mutex);
+}
+
+void* MTQueueGetSafe(DataPtrQueueMT* queue) {
+  void* frontElement = NULL;
+  pthread_mutex_lock(&queue->m_mutex);
+  {
+    while (!queue->int_queue.m_QueueNotEmpty) pthread_cond_wait(&queue->m_empty_cond, &queue->m_mutex);
+
+    frontElement = QueueGet(&queue->int_queue);
+    pthread_cond_signal(&queue->m_empty_cond);
+  }
+  pthread_mutex_unlock(&queue->m_mutex);
+
+  return frontElement;
+}
+
+int MTQueueIsEmpty(DataPtrQueueMT* queue) {
+  int is_empty = 0;
+  pthread_mutex_lock(&queue->m_mutex);
+  is_empty = QueueIsEmpty(&queue->int_queue);
+  pthread_mutex_unlock(&queue->m_mutex);
+  return is_empty;
+}
+
+void* MTQueueFront(DataPtrQueueMT* queue) {
+  void* ret = NULL;
+  pthread_mutex_lock(&queue->m_mutex);
+  ret = QueueFront(&queue->int_queue);
+  pthread_mutex_unlock(&queue->m_mutex);
+  return ret;
+}
+
+void MTQueuePop(DataPtrQueueMT* queue) {
+  pthread_mutex_lock(&queue->m_mutex);
+  QueuePop(&queue->int_queue);
+  pthread_mutex_unlock(&queue->m_mutex);
+}
+
+void MTQueueClear(DataPtrQueueMT* queue) {
+  pthread_mutex_lock(&queue->m_mutex);
+  queue->int_queue.m_head = queue->int_queue.m_tail = 0;
+  queue->int_queue.m_QueueNotEmpty = 0;
+  pthread_mutex_unlock(&queue->m_mutex);
+}
+
+int MTQueueGetCount(DataPtrQueueMT* queue) {
+  int ret = 0;
+  pthread_mutex_lock(&queue->m_mutex);
+  ret = queue->int_queue.m_head - queue->int_queue.m_tail;
+  pthread_mutex_unlock(&queue->m_mutex);
+  return ret;
+}

diff --git a/libav1/dx/av1_thread.h b/libav1/dx/av1_thread.h
new file mode 100644
index 0000000..dad238c
--- /dev/null
+++ b/libav1/dx/av1_thread.h

@@ -0,0 +1,51 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+#include "aom_util/aom_thread.h"
+#define BUFFERS_COUNT 100
+
+typedef struct {
+  void* m_element[BUFFERS_COUNT];
+  int m_head;
+  int m_tail;
+  int m_QueueNotEmpty;
+} DataPtrQueue;
+
+typedef struct {
+  DataPtrQueue int_queue;
+  pthread_cond_t m_empty_cond;  // event
+  pthread_mutex_t m_mutex;
+} DataPtrQueueMT;
+
+void QueueInit(DataPtrQueue* queue);
+void QueuePop(DataPtrQueue* queue);
+void QueuePush(DataPtrQueue* queue, void* item);
+void* QueueFront(DataPtrQueue* queue);
+void* QueueGet(DataPtrQueue* queue);
+int QueueIsEmpty(DataPtrQueue* queue);
+
+void MTQueueInit(DataPtrQueueMT* queue);
+void MTQueuePush(DataPtrQueueMT* queue, void* buffer);
+void* MTQueueGet(DataPtrQueueMT* queue);
+void MTQueuePushSafe(DataPtrQueueMT* queue, void* buffer);
+void* MTQueueGetSafe(DataPtrQueueMT* queue);
+int MTQueueIsEmpty(DataPtrQueueMT* queue);
+void MTQueueDestroy(DataPtrQueueMT* queue);
+void* MTQueueFront(DataPtrQueueMT* queue);
+void MTQueuePop(DataPtrQueueMT* queue);
+void MTQueueClear(DataPtrQueueMT* queue);
+int MTQueueGetCount(DataPtrQueueMT* queue);

diff --git a/libav1/dx/loopfilters.cpp b/libav1/dx/loopfilters.cpp
new file mode 100644
index 0000000..e52bbbe
--- /dev/null
+++ b/libav1/dx/loopfilters.cpp

@@ -0,0 +1,1065 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "dx/types.h"
+#include "dx/av1_core.h"
+#include "dx/av1_memory.h"
+#include "dx/av1_compute.h"
+
+#include "av1/common/restoration.h"
+#include "av1/common/av1_loopfilter.h"
+
+struct LoopfilterData {
+  int planes[3][4];
+  int limits[64][4];
+};
+
+struct LoopfilterSRT {
+  int wicount;
+  int plane;
+  int offset_base;
+  int block_cols;
+  int block_id_offset;
+};
+
+struct GenLfData {
+  int mi_stride;
+  int mi_addr_base;
+  int delta_q_info_delta_lf_present_flag;
+  int delta_q_info_delta_lf_multi;
+  int lf_mode_ref_delta_enabled;
+  int r[3];
+
+  int lf_filter_level[2];
+  int lf_filter_level_u;
+  int lf_filter_level_v;
+
+  int lf_mode_deltas[2][4];
+  int lf_ref_deltas[8][4];
+  int seg_features[8][4];
+  int seg_data[8][8][4];
+  int lfi_n_lvl[3][8][8][2][4];
+};
+
+struct GenLfSRT {
+  int wicount;
+  int mi_cols;
+  int mi_rows;
+  int plane;
+  int dst_offset;
+  int dst_stride;
+};
+
+void av1_loopfilter_gpu(Av1Core *dec, AV1_COMMON *cm, MACROBLOCKD *xd) {
+  if (!(cm->lf.filter_level[0] || cm->lf.filter_level[1])) {
+    return;
+  }
+  av1_loop_filter_frame_init(cm, 0, 3);
+
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  HwFrameBuffer *dst = td->dst_frame_buffer;
+  ComputeCommandBuffer *cb = &td->command_buffer;
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+  ComputeShader *shader;
+
+  const int mi_cols_y = AOMMIN(cm->mi_cols, (dst->y_crop_width + 3) >> 2);
+  const int mi_rows_y = AOMMIN(cm->mi_rows, (dst->y_crop_height + 3) >> 2);
+  const int mi_cols_uv = AOMMIN((cm->mi_cols + 1) >> 1, (dst->uv_crop_width + 3) >> 2);
+  const int mi_rows_uv = AOMMIN((cm->mi_rows + 1) >> 1, (dst->uv_crop_height + 3) >> 2);
+
+  const int blk_cols[] = {
+      (mi_cols_y + 15) >> 4, (mi_cols_uv + 15) >> 4, (mi_cols_uv + 15) >> 4, mi_cols_y, mi_cols_uv, mi_cols_uv};
+
+  int blk_count[] = {
+      blk_cols[0] * mi_rows_y,
+      blk_cols[1] * mi_rows_uv,
+      blk_cols[2] * mi_rows_uv,
+      blk_cols[3] * ((mi_rows_y + 15) >> 4),
+      blk_cols[4] * ((mi_rows_uv + 15) >> 4),
+      blk_cols[5] * ((mi_rows_uv + 15) >> 4),
+  };
+
+  int blk_offsets[6];
+  blk_offsets[0] = 0;
+  for (int i = 1; i < 6; ++i) blk_offsets[i] = blk_offsets[i - 1] + blk_count[i - 1];
+
+  ConstantBufferObject cbo = cb->Alloc(sizeof(GenLfData));
+  GenLfData *gen_data = (GenLfData *)cbo.host_ptr;
+  gen_data->mi_stride = cm->mi_stride;
+  gen_data->mi_addr_base = static_cast<int>((uint64_t)dec->mode_info_pool->host_ptr);
+  gen_data->delta_q_info_delta_lf_multi = cm->delta_q_info.delta_lf_multi;
+  gen_data->delta_q_info_delta_lf_present_flag = cm->delta_q_info.delta_lf_present_flag;
+  gen_data->lf_mode_ref_delta_enabled = cm->lf.mode_ref_delta_enabled;
+  gen_data->lf_filter_level[0] = cm->lf.filter_level[0];
+  gen_data->lf_filter_level[1] = cm->lf.filter_level[1];
+  gen_data->lf_filter_level_u = cm->lf.filter_level_u;
+  gen_data->lf_filter_level_v = cm->lf.filter_level_v;
+  gen_data->lf_mode_deltas[0][0] = cm->lf.mode_deltas[0];
+  gen_data->lf_mode_deltas[1][0] = cm->lf.mode_deltas[1];
+  for (int i = 0; i < 8; ++i) gen_data->lf_ref_deltas[i][0] = cm->lf.ref_deltas[i];
+  for (int i = 0; i < 8; ++i) {
+    gen_data->seg_features[i][0] = xd->lossless[i];
+    gen_data->seg_features[i][1] = cm->seg.enabled ? cm->seg.feature_mask[i] : 0;
+    for (int j = 0; j < 8; ++j) gen_data->seg_data[i][j][0] = cm->seg.feature_data[i][j];
+  }
+  for (int p = 0; p < 3; ++p)
+    for (int s = 0; s < 8; ++s)
+      for (int r = 0; r < 8; ++r)
+        for (int m = 0; m < 2; ++m) {
+          gen_data->lfi_n_lvl[p][s][r][m][0] = cm->lf_info.lvl[p][s][0][r][m];
+          gen_data->lfi_n_lvl[p][s][r][m][1] = cm->lf_info.lvl[p][s][1][r][m];
+        }
+
+  GenLfSRT gen_srt;
+  shader = &dec->shader_lib->shader_gen_lf_vert;
+
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetComputeRootShaderResourceView(0, dec->mode_info_pool->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(1, td->mode_info_grid->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(2, dec->loopfilter_blocks->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(3, cbo.dev_address);
+
+  command_list->SetPipelineState(shader->pso.Get());
+  for (int plane = 0; plane < 3; ++plane) {
+    if ((plane == 1 && !cm->lf.filter_level_u) || (plane == 2 && !cm->lf.filter_level_v)) continue;
+
+    const int mi_cols = plane ? mi_cols_uv : mi_cols_y;
+    const int mi_rows = plane ? mi_rows_uv : mi_rows_y;
+    gen_srt.plane = plane;
+    gen_srt.mi_cols = mi_cols;
+    gen_srt.mi_rows = mi_rows;
+    gen_srt.dst_offset = blk_offsets[plane] * 32;
+    gen_srt.dst_stride = blk_cols[plane] * 32;
+    gen_srt.wicount = mi_rows * ((mi_cols + 15) >> 4);
+    command_list->SetComputeRoot32BitConstants(4, 6, &gen_srt, 0);
+    command_list->Dispatch((gen_srt.wicount + 63) >> 6, 1, 1);
+  }
+  shader = &dec->shader_lib->shader_gen_lf_hor;
+  command_list->SetPipelineState(shader->pso.Get());
+  for (int plane = 0; plane < 3; ++plane) {
+    if ((plane == 1 && !cm->lf.filter_level_u) || (plane == 2 && !cm->lf.filter_level_v)) continue;
+
+    const int mi_cols = plane ? mi_cols_uv : mi_cols_y;
+    const int mi_rows = plane ? mi_rows_uv : mi_rows_y;
+    gen_srt.plane = plane;
+    gen_srt.mi_cols = mi_cols;
+    gen_srt.mi_rows = mi_rows;
+    gen_srt.dst_offset = blk_offsets[plane + 3] * 32;
+    gen_srt.dst_stride = blk_cols[plane + 3] * 2;
+    gen_srt.wicount = mi_cols * ((mi_rows + 15) >> 4);
+    command_list->SetComputeRoot32BitConstants(4, 6, &gen_srt, 0);
+    command_list->Dispatch((gen_srt.wicount + 63) >> 6, 1, 1);
+  }
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->loopfilter_blocks->dev));
+
+  cbo = cb->Alloc(sizeof(LoopfilterData));
+  LoopfilterData *data = (LoopfilterData *)cbo.host_ptr;
+  memcpy(data->planes, td->frame_buffer->planes, sizeof(data->planes));
+  for (int lvl = 0; lvl < 64; ++lvl) {
+    data->limits[lvl][0] = cm->lf_info.lfthr[lvl].lim[0];
+    data->limits[lvl][1] = cm->lf_info.lfthr[lvl].mblim[0];
+    data->limits[lvl][2] = cm->lf_info.lfthr[lvl].hev_thr[0];
+  }
+
+  LoopfilterSRT srt;
+  shader = &td->shaders->loopfilter_v;
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetComputeRootShaderResourceView(0, dec->loopfilter_blocks->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(1, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(2, cbo.dev_address);
+  command_list->SetPipelineState(shader->pso.Get());
+  srt.block_id_offset = 0;
+  for (int p = 0; p < 3; ++p) {
+    if (p == 1 && !cm->lf.filter_level_u) continue;
+    if (p == 2 && !cm->lf.filter_level_v) continue;
+    srt.plane = p;
+    srt.offset_base = blk_offsets[p];
+    srt.block_cols = blk_cols[p];
+    srt.wicount = ((blk_count[p] + 1) >> 1) * 4;
+    command_list->SetComputeRoot32BitConstants(3, 5, &srt, 0);
+    command_list->Dispatch((srt.wicount + 63) >> 6, 1, 1);
+  }
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+
+  srt.block_id_offset = 1;
+  for (int p = 0; p < 3; ++p) {
+    if (p == 1 && !cm->lf.filter_level_u) continue;
+    if (p == 2 && !cm->lf.filter_level_v) continue;
+    srt.plane = p;
+    srt.offset_base = blk_offsets[p];
+    srt.block_cols = blk_cols[p];
+    srt.wicount = (blk_count[p] >> 1) * 4;
+    command_list->SetComputeRoot32BitConstants(3, 5, &srt, 0);
+    command_list->Dispatch((srt.wicount + 63) >> 6, 1, 1);
+  }
+
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+
+  shader = &td->shaders->loopfilter_h;
+  command_list->SetPipelineState(shader->pso.Get());
+  for (int p = 0; p < 3; ++p) {
+    if (p == 1 && !cm->lf.filter_level_u) continue;
+    if (p == 2 && !cm->lf.filter_level_v) continue;
+    int type = p + 3;
+    srt.plane = p;
+    srt.offset_base = blk_offsets[type];
+    srt.block_cols = blk_cols[type];
+    srt.wicount = blk_count[type] * 4;
+    command_list->SetComputeRoot32BitConstants(3, 5, &srt, 0);
+    command_list->Dispatch((srt.wicount + 63) >> 6, 1, 1);
+  }
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+}
+
+struct UpscaleData {
+  int src_planes[3][4];
+  int dst_planes[3][4];
+};
+
+struct UpscaleSRT {
+  int plane;
+  int wi_count;
+  int hbd;
+};
+
+void av1_superres(Av1Core *dec, HwFrameBuffer *src, HwFrameBuffer *dst, int iter) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  ComputeCommandBuffer *cb = &td->command_buffer;
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+  ConstantBufferObject cbo = cb->Alloc(sizeof(UpscaleData));
+  UpscaleData *data = (UpscaleData *)cbo.host_ptr;
+  memcpy(data->src_planes, src->planes, sizeof(src->planes));
+  memcpy(data->dst_planes, dst->planes, sizeof(dst->planes));
+  data->dst_planes[0][2] = dst->y_crop_width;
+  data->dst_planes[1][2] = dst->uv_crop_width;
+  data->dst_planes[2][2] = dst->uv_crop_width;
+  data->src_planes[0][2] = src->y_crop_width;
+  data->src_planes[1][2] = src->uv_crop_width;
+  data->src_planes[2][2] = src->uv_crop_width;
+  data->src_planes[0][3] = src->width;
+  data->src_planes[1][3] = (src->width + 1) >> 1;
+  data->src_planes[2][3] = (src->width + 1) >> 1;
+
+  UpscaleSRT srt;
+  ComputeShader *shader = &dec->shader_lib->shader_upscale;
+  command_list->SetPipelineState(shader->pso.Get());
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetComputeRootUnorderedAccessView(0, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(1, cbo.dev_address);
+  for (int plane = 0; plane < 3; ++plane) {
+    srt.plane = plane;
+    const int w = plane ? dst->uv_crop_width : dst->y_crop_width;
+    const int h = plane ? dst->uv_crop_height : dst->y_crop_height;
+    srt.wi_count = h * ((w + 3) >> 2);
+    srt.hbd = dst->hbd;
+    command_list->SetComputeRoot32BitConstants(2, 3, &srt, 0);
+    command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+  }
+
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+}
+
+struct PlaneInfo1 {
+  int stride;
+  int offset;
+  int width;
+  int height;
+};
+
+struct UnitsInfo {
+  int Rows;
+  int Cols;
+  int Size;
+  int Stride;
+};
+
+struct PlaneRestorationData {
+  PlaneInfo1 plane;
+  UnitsInfo units;
+
+  int pp_offset;
+  int dst_offset;
+  int Lr_buffer_offset;
+  int subsampling;
+  int hbd;
+  int bit_depth;
+  int pad[2];
+};
+
+struct RestorationData {
+  int Sgr_Params[16][4];
+};
+
+struct CDefData {
+  int plane[4];
+  int uv_stride;
+  int dst_offset[3];
+
+  int uv_offset[2];
+  int index_stride;
+  int skips_stride;
+
+  int pri_damping;
+  int sec_damping;
+  int pli;
+  int hbd;
+
+  int bit_depth;
+  int _dummie[3];
+
+  int cdef_directions[16][2][4];
+  int cdef_strength[16][4];
+  int cdef_uv_strength[16][4];
+};
+
+void av1_cdef_filter_run(Av1Core *dec, AV1_COMMON *cm, HwFrameBuffer *src, HwFrameBuffer *dst) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  if (!td->do_cdef) {
+    PutPerfMarker(td, &td->perf_markers[4]);
+    return;
+  }
+
+  const CdefInfo *const cdef_info = &cm->cdef_info;
+
+  int *cdef_indexes = (int *)td->cdef_indexes->host_ptr;
+  int *cdef_skips = (int *)td->cdef_skips->host_ptr;
+  const int nvfb = (cm->mi_rows + MI_SIZE_64X64 - 1) / MI_SIZE_64X64;
+  const int nhfb = (cm->mi_cols + MI_SIZE_64X64 - 1) / MI_SIZE_64X64;
+
+  for (int fbr = 0; fbr < nvfb; fbr++) {
+    for (int fbc = 0; fbc < nhfb; fbc++) {
+      cdef_indexes[fbr * nhfb + fbc] =
+          cm->mi_grid_visible[MI_SIZE_64X64 * fbr * cm->mi_stride + MI_SIZE_64X64 * fbc]->cdef_strength;
+    }
+  }
+  for (int r = 0; r < AOMMIN(nvfb * 16, cm->mi_rows / 2); r++) {
+    for (int c = 0; c < AOMMIN(nhfb * 16, cm->mi_cols / 2); c++) {
+      cdef_skips[r * nhfb * 16 + c] = cm->mi_grid_visible[(r * 2 + 0) * cm->mi_stride + c * 2 + 0]->skip &&
+                                      cm->mi_grid_visible[(r * 2 + 0) * cm->mi_stride + c * 2 + 1]->skip &&
+                                      cm->mi_grid_visible[(r * 2 + 1) * cm->mi_stride + c * 2 + 0]->skip &&
+                                      cm->mi_grid_visible[(r * 2 + 1) * cm->mi_stride + c * 2 + 1]->skip;
+    }
+  }
+
+  ComputeCommandBuffer *cb = &td->command_buffer;
+  ConstantBufferObject cbo = cb->Alloc(sizeof(CDefData));
+  CDefData *data = reinterpret_cast<CDefData *>(cbo.host_ptr);
+
+  /* Generated from gen_filter_tables.c. */
+  const int cdef_directions_new[8][4] = {{-1, +1, -2, +2}, {0, +1, -1, +2}, {0, +1, 0, +2}, {0, +1, 1, +2},
+                                         {1, +1, 2, +2},   {1, +0, 2, +1},  {1, +0, 2, +0}, {1, +0, 2, -1}};
+  for (int i = 0; i < 8; i++) {
+    data->cdef_directions[i][0][1] = cdef_directions_new[i][0];
+    data->cdef_directions[i][0][0] = cdef_directions_new[i][1];
+    data->cdef_directions[i][1][1] = cdef_directions_new[i][2];
+    data->cdef_directions[i][1][0] = cdef_directions_new[i][3];
+    data->cdef_directions[8 + i][0][1] = cdef_directions_new[i][0];
+    data->cdef_directions[8 + i][0][0] = cdef_directions_new[i][1];
+    data->cdef_directions[8 + i][1][1] = cdef_directions_new[i][2];
+    data->cdef_directions[8 + i][1][0] = cdef_directions_new[i][3];
+  }
+  for (int i = 0; i < 16; i++) {
+    data->cdef_strength[i][0] = cdef_info->cdef_strengths[i];
+    data->cdef_uv_strength[i][0] = cdef_info->cdef_uv_strengths[i];
+  }
+
+  data->index_stride = nhfb;
+  data->skips_stride = nhfb * 16;
+  data->hbd = td->is_hbd;
+  data->bit_depth = td->bitdepth;
+
+  data->pri_damping = cdef_info->cdef_pri_damping;
+  data->sec_damping = cdef_info->cdef_sec_damping;
+  int w = src->width;
+  int h = src->height;
+  data->plane[0] = src->planes[0].stride;
+  data->plane[1] = src->planes[0].offset;
+  data->plane[2] = w;
+  data->plane[3] = h;
+  data->uv_stride = src->planes[1].stride;
+  data->uv_offset[0] = src->planes[1].offset;
+  data->uv_offset[1] = src->planes[2].offset;
+
+  // srt.data->pli = i;
+  data->dst_offset[0] = dst->planes[0].offset;
+  data->dst_offset[1] = dst->planes[1].offset;
+  data->dst_offset[2] = dst->planes[2].offset;
+
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+  ComputeShader *shader = &dec->shader_lib->shader_cdef_filter;
+  command_list->SetPipelineState(shader->pso.Get());
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetComputeRootShaderResourceView(0, td->cdef_indexes->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(1, td->cdef_skips->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(2, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(3, cbo.dev_address);
+  command_list->Dispatch((w + 31) >> 5, (h + 31) >> 5, 1);
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+  PutPerfMarker(td, &td->perf_markers[4]);
+}
+
+void av1_looprestoration(Av1Core *dec, AV1_COMMON *cm, void *lr_ctxt) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  if (!td->do_loop_rest) {
+    PutPerfMarker(td, &td->perf_markers[5]);
+    return;
+  }
+
+  if (cm->rst_info[0].frame_restoration_type != RESTORE_NONE ||
+      cm->rst_info[1].frame_restoration_type != RESTORE_NONE || cm->rst_info[2].frame_restoration_type != RESTORE_NONE)
+    av1_loop_restoration_filter_frame_init((AV1LrStruct *)lr_ctxt, (YV12_BUFFER_CONFIG *)&cm->cur_frame->buf, cm, 0, 3);
+
+  AV1LrStruct *loop_rest_ctxt = (AV1LrStruct *)lr_ctxt;
+  int lr_ptr = 0;
+  // PlaneRestorationData pl_data[3] = { 0, 0, 0 };
+  int pl_width[3] = {0, 0, 0};
+  int pl_height[3] = {0, 0, 0};
+  HwFrameBuffer *dst_buffer = td->dst_frame_buffer;
+  HwFrameBuffer *src_buffer = (td->do_cdef == td->do_superres) ? td->frame_buffer : &dec->back_buffer1;
+  HwFrameBuffer *pp_buffer = td->do_superres ? &dec->back_buffer1 : td->frame_buffer;
+
+  ComputeCommandBuffer *cb = &td->command_buffer;
+  ConstantBufferObject cbo1 = cb->Alloc(sizeof(RestorationData));
+  RestorationData *data = reinterpret_cast<RestorationData *>(cbo1.host_ptr);
+  ConstantBufferObject cbo2 = cb->Alloc(sizeof(PlaneRestorationData) * 3);
+  PlaneRestorationData *pl_data = reinterpret_cast<PlaneRestorationData *>(cbo2.host_ptr);
+
+  for (int p = 0; p < 3; ++p) {
+    int subsampling = p != 0;
+    pl_width[p] = dst_buffer->width >> subsampling;
+    pl_height[p] = dst_buffer->height >> subsampling;
+
+    pl_data[p].plane.width = p ? dst_buffer->uv_crop_width : dst_buffer->y_crop_width;     // pl_width[p];
+    pl_data[p].plane.height = p ? dst_buffer->uv_crop_height : dst_buffer->y_crop_height;  // pl_height[p];
+    pl_data[p].plane.stride = dst_buffer->planes[p].stride;
+    pl_data[p].plane.offset = src_buffer->planes[p].offset;
+    pl_data[p].pp_offset = pp_buffer->planes[p].offset;
+    pl_data[p].dst_offset = dst_buffer->planes[p].offset;
+    pl_data[p].subsampling = p != 0;
+    pl_data[p].hbd = td->is_hbd;
+    pl_data[p].bit_depth = td->bitdepth;
+
+    if (cm->rst_info[p].frame_restoration_type == RESTORE_NONE) {
+      continue;
+    }
+
+    FilterFrameCtxt *ctx = &loop_rest_ctxt->ctxt[p];
+    pl_data[p].units.Rows = ctx->rsi->vert_units_per_tile;
+    pl_data[p].units.Cols = ctx->rsi->horz_units_per_tile;
+    pl_data[p].units.Size = ctx->rsi->restoration_unit_size;
+    pl_data[p].units.Stride = ctx->rsi->horz_units_per_tile;
+    pl_data[p].Lr_buffer_offset = lr_ptr;
+
+    int *lr_type = (int *)td->loop_rest_types->host_ptr;
+    int *lr_wiener = (int *)td->loop_rest_wiener->host_ptr;
+    for (int u = 0; u < ctx->rsi->units_per_tile; ++u) {
+      RestorationUnitInfo *unit = ctx->rsi->unit_info + u;
+      int *dst_type = lr_type + lr_ptr * 4;
+      int *dst_wiener = lr_wiener + lr_ptr * 16;
+      ++lr_ptr;
+      dst_type[0] = unit->restoration_type;
+      dst_type[1] = unit->sgrproj_info.xqd[0];
+      dst_type[2] = unit->sgrproj_info.xqd[1];
+      dst_type[3] = unit->sgrproj_info.ep;
+      if (unit->restoration_type == RESTORE_WIENER) {
+        for (int i = 0; i < 8; ++i) dst_wiener[i] = unit->wiener_info.hfilter[i];
+        for (int i = 0; i < 8; ++i) dst_wiener[i + 8] = unit->wiener_info.vfilter[i];
+      }
+    }
+  }
+
+  {
+    struct CBuffer {
+      int do_restoration;
+      int plane;
+    } cBuffer;
+    memcpy(data->Sgr_Params, sgr_params, sizeof(sgr_params));
+
+    Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+    ComputeShader *shader = &dec->shader_lib->shader_loop_rest;
+    command_list->SetPipelineState(shader->pso.Get());
+    command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+    command_list->SetComputeRootShaderResourceView(0, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootShaderResourceView(1, td->loop_rest_types->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootShaderResourceView(2, td->loop_rest_wiener->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootUnorderedAccessView(3, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootConstantBufferView(4, cbo1.dev_address);
+    command_list->SetComputeRootConstantBufferView(5, cbo2.dev_address);
+
+    for (int p = 0; p < 3; ++p) {
+      int w = pl_width[p];
+      int h = pl_height[p];
+      cBuffer.do_restoration = cm->rst_info[p].frame_restoration_type != RESTORE_NONE;
+      cBuffer.plane = p;
+      command_list->SetComputeRoot32BitConstants(6, 2, &cBuffer, 0);
+      command_list->Dispatch((w + 15) >> 4, (h + 3) >> 2, 1);
+    }
+
+    command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+  }
+  PutPerfMarker(td, &td->perf_markers[5]);
+}
+
+void av1_cdef_looprestoration(Av1Core *dec, AV1_COMMON *cm, void *lr_ctxt) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  ComputeCommandBuffer *cb = &td->command_buffer;
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+
+  PlaneRestorationData *pl_data[3] = {0, 0, 0};
+  RestorationData *lr_data = NULL;
+  HwFrameBuffer *cdef_dst = (td->do_loop_rest == td->do_superres) ? td->dst_frame_buffer : &dec->back_buffer1;
+  HwFrameBuffer *dst_fb = td->dst_frame_buffer;
+
+  av1_cdef_filter_run(dec, cm, td->frame_buffer, cdef_dst);
+
+  if (td->do_superres) {
+    const int h_border = 16;
+    const int bpp = 1 + dst_fb->hbd;
+    const int upscaled_width = cm->superres_upscaled_width;
+    const int y_height = (dst_fb->height + 127) & ~127;
+    const int superres_w = (upscaled_width + 127) & ~127;
+    const int superres_y_stride = (superres_w + 2 * h_border) * bpp;
+    const int superres_uv_stride = ((superres_w >> 1) + 2 * h_border) * bpp;
+    const int superres_y_size = y_height * superres_y_stride;
+    const int superres_uv_size = (y_height >> 1) * superres_uv_stride;
+    if (td->do_loop_rest) {
+      // upscale pre cdef frame (td->frame_buffer) to temp buffer
+      HwFrameBuffer *dst = &dec->back_buffer1;
+      assert(dst->size >= (superres_y_size + 2 * superres_uv_size));
+      dst->y_crop_width = upscaled_width;
+      dst->uv_crop_width = (upscaled_width + 1) >> 1;
+      dst->y_crop_height = dst_fb->y_crop_height;
+      dst->uv_crop_height = dst_fb->uv_crop_height;
+      dst->planes[0].stride = superres_y_stride;
+      dst->planes[1].stride = superres_uv_stride;
+      dst->planes[2].stride = superres_uv_stride;
+      dst->planes[0].offset = static_cast<int>(dst->base_offset + h_border * bpp);
+      dst->planes[1].offset = dst->planes[0].offset + superres_y_size;
+      dst->planes[2].offset = dst->planes[1].offset + superres_uv_size;
+      dst->hbd = dst_fb->hbd;
+      av1_superres(dec, td->frame_buffer, dst, 2);
+    }
+
+    if (td->do_cdef || (td->do_cdef == 0 && td->do_loop_rest == 0)) {
+      // upscale post cdef frame (cdef_dst) or src (td->frame_buffer) to:
+      HwFrameBuffer *src = td->do_cdef ? cdef_dst : td->frame_buffer;
+      HwFrameBuffer *dst = td->do_loop_rest ? td->frame_buffer : dst_fb;
+      assert(dst->size >= (superres_y_size + 2 * superres_uv_size));
+      dst->y_crop_width = upscaled_width;
+      dst->uv_crop_width = (upscaled_width + 1) >> 1;
+      dst->y_crop_height = dst_fb->y_crop_height;
+      dst->uv_crop_height = dst_fb->uv_crop_height;
+      dst->planes[0].stride = superres_y_stride;
+      dst->planes[1].stride = superres_uv_stride;
+      dst->planes[2].stride = superres_uv_stride;
+      dst->planes[0].offset = static_cast<int>(dst->base_offset + h_border * bpp);
+      dst->planes[1].offset = dst->planes[0].offset + superres_y_size;
+      dst->planes[2].offset = dst->planes[1].offset + superres_uv_size;
+      dst->hbd = dst_fb->hbd;
+      av1_superres(dec, src, dst, 3);
+    }
+
+    // update dst fb if not yet updated
+    dst_fb->y_crop_width = upscaled_width;
+    dst_fb->uv_crop_width = (upscaled_width + 1) >> 1;
+    dst_fb->width = (upscaled_width + 7) & ~7;
+    dst_fb->planes[0].stride = superres_y_stride;
+    dst_fb->planes[1].stride = superres_uv_stride;
+    dst_fb->planes[2].stride = superres_uv_stride;
+    dst_fb->planes[0].offset = static_cast<int>(dst_fb->base_offset + h_border * bpp);
+    dst_fb->planes[1].offset = dst_fb->planes[0].offset + superres_y_size;
+    dst_fb->planes[2].offset = dst_fb->planes[1].offset + superres_uv_size;
+    YV12_BUFFER_CONFIG *const buf = &cm->cur_frame->buf;
+    buf->y_crop_width = dst_fb->y_crop_width;
+    buf->uv_crop_width = dst_fb->uv_crop_width;
+    buf->y_width = dst_fb->width;
+    buf->uv_width = buf->y_width >> 1;
+    /*
+    YV12_BUFFER_CONFIG *const buf = &cm->cur_frame->buf;
+    buf->y_crop_width = upscaled_width;
+    buf->uv_crop_width = (upscaled_width + 1) >> 1;
+    buf->y_width = (upscaled_width + 7) & ~7;
+    buf->uv_width = buf->y_width >> 1;*/
+  }
+
+  av1_looprestoration(dec, cm, lr_ctxt);
+}
+
+struct GrainParams {
+  // This structure is compared element-by-element in the function
+  // av1_check_grain_params_equiv: this function must be updated if any changes
+  // are made to this structure.
+  int apply_grain;
+  int update_parameters;
+  int num_y_points;   // value: 0..14
+  int num_cb_points;  // value: 0..10
+
+  int num_cr_points;   // value: 0..10
+  int scaling_shift;   // values : 8..11
+  int ar_coeff_lag;    // values:  0..3
+  int ar_coeff_shift;  // values : 6..9
+
+  // Shift value: AR coeffs range
+  // 6: [-2, 2)
+  // 7: [-1, 1)
+  // 8: [-0.5, 0.5)
+  // 9: [-0.25, 0.25)
+
+  int cb_mult;       // 8 bits
+  int cb_luma_mult;  // 8 bits
+  int cb_offset;     // 9 bits
+  int cr_mult;       // 8 bits
+
+  int cr_luma_mult;  // 8 bits
+  int cr_offset;     // 9 bits
+  int overlap_flag;
+  int clip_to_restricted_range;
+
+  unsigned int bit_depth;  // video bit depth
+  int chroma_scaling_from_luma;
+  int grain_scale_shift;
+  unsigned int random_seed;
+
+  // Y 8 bit values 14*2 = 7*4
+  // UV 8 bit values 10*2 = 5*4+2 (padding 2)
+  int scaling_points[14][2][4];
+
+  // Y 8 bit values 24 = 6*4
+  // UV 8 bit values 25 = 6*4 + 1 (padding 3)
+  int ar_coeffs[3][25][4];  // xyz = y, cb, cr
+};
+
+struct FilmGrainGenData {
+  GrainParams params;
+
+  int luma_block_size_y;
+  int luma_block_size_x;
+  int luma_grain_stride;
+  int chroma_block_size_y;
+
+  int chroma_block_size_x;
+  int chroma_grain_stride;
+  int left_pad;
+  int top_pad;
+
+  int right_pad;
+  int bottom_pad;
+  int dst_offset_u;
+  int dst_offset_v;
+
+  int pred_pos[25][3][4];
+};
+
+struct FilmGrainData {
+  GrainParams params;
+
+  int src_planes[3][4];
+  int dst_planes[3][4];
+
+  int enable_chroma;
+  int random_offset_stride;
+  int width;
+  int height;
+
+  int mc_identity;
+  int luma_grain_stride;
+  int chroma_grain_stride;
+  int left_pad;
+
+  int right_pad;
+  int top_pad;
+  int bottom_pad;
+  int ar_padding;
+
+  int grain_offset_u;
+  int grain_offset_v;
+  int is_10x3;
+  int pad;
+
+  int scaling_lut[256][4];
+};
+
+static void init_scaling_function(const int scaling_points[][2], int num_points, int scaling_lut[][4], int p) {
+  if (num_points == 0) {
+    for (int i = 0; i < 256; i++) scaling_lut[i][p] = 0;
+    return;
+  }
+  for (int i = 0; i < scaling_points[0][0]; i++) scaling_lut[i][p] = scaling_points[0][1];
+  for (int point = 0; point < num_points - 1; point++) {
+    int delta_y = scaling_points[point + 1][1] - scaling_points[point][1];
+    int delta_x = scaling_points[point + 1][0] - scaling_points[point][0];
+    int64_t delta = delta_y * ((65536 + (delta_x >> 1)) / delta_x);
+    for (int x = 0; x < delta_x; x++) {
+      scaling_lut[scaling_points[point][0] + x][p] = scaling_points[point][1] + (int)((x * delta + 32768) >> 16);
+    }
+  }
+  for (int i = scaling_points[num_points - 1][0]; i < 256; i++) scaling_lut[i][p] = scaling_points[num_points - 1][1];
+}
+
+struct RNG {
+  uint16_t random_register;
+
+  void init(int luma_line, uint16_t seed) {
+    uint16_t msb = (seed >> 8) & 255;
+    uint16_t lsb = seed & 255;
+    random_register = (msb << 8) + lsb;
+    int luma_num = luma_line >> 5;
+    random_register ^= ((luma_num * 37 + 178) & 255) << 8;
+    random_register ^= ((luma_num * 173 + 105) & 255);
+  }
+
+  int get_random_number(int bits) {
+    uint16_t bit;
+    bit = ((random_register >> 0) ^ (random_register >> 1) ^ (random_register >> 3) ^ (random_register >> 12)) & 1;
+    random_register = (random_register >> 1) | (bit << 15);
+    return (random_register >> (16 - bits)) & ((1 << bits) - 1);
+  }
+};
+
+void av1_filmgrain_run(Av1Core *dec, AV1_COMMON *cm, int enable_chorma) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  aom_film_grain_t *params = &cm->film_grain_params;
+
+  ComputeCommandBuffer *cb = &td->command_buffer;
+  ConstantBufferObject cbo1 = cb->Alloc(sizeof(FilmGrainGenData));
+  FilmGrainGenData *gen_data = reinterpret_cast<FilmGrainGenData *>(cbo1.host_ptr);
+
+  gen_data->params.apply_grain = params->apply_grain;
+  gen_data->params.ar_coeff_lag = params->ar_coeff_lag;
+  gen_data->params.ar_coeff_shift = params->ar_coeff_shift;
+  gen_data->params.bit_depth = params->bit_depth;
+  gen_data->params.cb_luma_mult = params->cb_luma_mult;
+  gen_data->params.cb_mult = params->cb_mult;
+  gen_data->params.cb_offset = params->cb_offset;
+  gen_data->params.chroma_scaling_from_luma = params->chroma_scaling_from_luma;
+  gen_data->params.clip_to_restricted_range = params->clip_to_restricted_range;
+  gen_data->params.cr_luma_mult = params->cr_luma_mult;
+  gen_data->params.cr_mult = params->cr_mult;
+  gen_data->params.cr_offset = params->cr_offset;
+  gen_data->params.grain_scale_shift = params->grain_scale_shift;
+  gen_data->params.num_cb_points = params->num_cb_points;
+  gen_data->params.num_cr_points = params->num_cr_points;
+  gen_data->params.num_y_points = params->num_y_points;
+  gen_data->params.overlap_flag = params->overlap_flag;
+  gen_data->params.random_seed = params->random_seed;
+  gen_data->params.update_parameters = params->update_parameters;
+  gen_data->params.scaling_shift = params->scaling_shift;
+  for (int i = 0; i < 24; ++i) {
+    gen_data->params.ar_coeffs[0][i][0] = params->ar_coeffs_y[i];
+    gen_data->params.ar_coeffs[1][i][0] = params->ar_coeffs_cb[i];
+    gen_data->params.ar_coeffs[2][i][0] = params->ar_coeffs_cr[i];
+  }
+  gen_data->params.ar_coeffs[1][24][0] = params->ar_coeffs_cb[24];
+  gen_data->params.ar_coeffs[2][24][0] = params->ar_coeffs_cr[24];
+
+  for (int i = 0; i < 14; ++i) {
+    gen_data->params.scaling_points[i][0][0] = params->scaling_points_y[i][0];
+    gen_data->params.scaling_points[i][1][0] = params->scaling_points_y[i][1];
+    if (i < 10) {
+      gen_data->params.scaling_points[i][0][1] = params->scaling_points_cb[i][0];
+      gen_data->params.scaling_points[i][1][1] = params->scaling_points_cb[i][1];
+      gen_data->params.scaling_points[i][0][2] = params->scaling_points_cr[i][0];
+      gen_data->params.scaling_points[i][1][2] = params->scaling_points_cr[i][1];
+    }
+  }
+
+  const int left_pad = 3;
+  const int right_pad = 3;  // padding to offset for AR coefficients
+  const int top_pad = 3;
+  const int bottom_pad = 0;
+  const int ar_padding = 3;  // maximum lag used for stabilization of AR coefficients
+  const int luma_subblock_size_y = 32;
+  const int luma_subblock_size_x = 32;
+  const int chroma_subblock_size_y = luma_subblock_size_y >> 1;
+  const int chroma_subblock_size_x = luma_subblock_size_x >> 1;
+  const int luma_block_size_y = top_pad + 2 * ar_padding + luma_subblock_size_y * 2 + bottom_pad;
+  const int luma_block_size_x = left_pad + 2 * ar_padding + luma_subblock_size_x * 2 + 2 * ar_padding + right_pad;
+  const int chroma_block_size_y = top_pad + (2 >> 1) * ar_padding + chroma_subblock_size_y * 2 + bottom_pad;
+  const int chroma_block_size_x =
+      left_pad + (2 >> 1) * ar_padding + chroma_subblock_size_x * 2 + (2 >> 1) * ar_padding + right_pad;
+
+  gen_data->luma_block_size_y = luma_block_size_y;
+  gen_data->luma_block_size_x = luma_block_size_x;
+  gen_data->luma_grain_stride = luma_block_size_x;
+  gen_data->chroma_block_size_y = chroma_block_size_y;
+  gen_data->chroma_block_size_x = chroma_block_size_x;
+  gen_data->chroma_grain_stride = chroma_block_size_x;
+  gen_data->left_pad = left_pad;
+  gen_data->top_pad = top_pad;
+  gen_data->right_pad = right_pad;
+  gen_data->bottom_pad = bottom_pad;
+  gen_data->dst_offset_u = luma_block_size_x * luma_block_size_y;
+  gen_data->dst_offset_v = luma_block_size_x * luma_block_size_y + chroma_block_size_x * chroma_block_size_y;
+  int pos_ar_index = 0;
+  for (int row = -params->ar_coeff_lag; row < 0; row++) {
+    for (int col = -params->ar_coeff_lag; col < params->ar_coeff_lag + 1; col++) {
+      gen_data->pred_pos[pos_ar_index][0][0] = row;
+      gen_data->pred_pos[pos_ar_index][1][0] = col;
+      gen_data->pred_pos[pos_ar_index][2][0] = 0;
+
+      gen_data->pred_pos[pos_ar_index][0][1] = row;
+      gen_data->pred_pos[pos_ar_index][1][1] = col;
+      gen_data->pred_pos[pos_ar_index][2][1] = 0;
+      ++pos_ar_index;
+    }
+  }
+
+  for (int col = -params->ar_coeff_lag; col < 0; col++) {
+    gen_data->pred_pos[pos_ar_index][0][0] = 0;
+    gen_data->pred_pos[pos_ar_index][1][0] = col;
+    gen_data->pred_pos[pos_ar_index][2][0] = 0;
+
+    gen_data->pred_pos[pos_ar_index][0][1] = 0;
+    gen_data->pred_pos[pos_ar_index][1][1] = col;
+    gen_data->pred_pos[pos_ar_index][2][1] = 0;
+    ++pos_ar_index;
+  }
+  if (params->num_y_points > 0) {
+    gen_data->pred_pos[pos_ar_index][0][1] = 0;
+    gen_data->pred_pos[pos_ar_index][1][1] = 0;
+    gen_data->pred_pos[pos_ar_index][2][1] = 1;
+  }
+
+  ConstantBufferObject cbo2 = cb->Alloc(sizeof(FilmGrainData));
+  FilmGrainData *data = reinterpret_cast<FilmGrainData *>(cbo2.host_ptr);
+
+  data->params = gen_data->params;
+  data->enable_chroma = enable_chorma;
+
+  memcpy(data->dst_planes, dec->back_buffer1.planes, sizeof(dec->back_buffer1.planes));
+
+  HwFrameBuffer *src = td->dst_frame_buffer;
+  memcpy(data->src_planes, src->planes, sizeof(src->planes));
+
+  data->luma_grain_stride = luma_block_size_x;
+  data->chroma_grain_stride = chroma_block_size_x;
+  data->grain_offset_u = luma_block_size_x * luma_block_size_y;
+  data->grain_offset_v = luma_block_size_x * luma_block_size_y + chroma_block_size_x * chroma_block_size_y;
+  data->mc_identity = cm->cur_frame->buf.matrix_coefficients == AOM_CICP_MC_IDENTITY ? 1 : 0;
+  data->left_pad = left_pad;
+  data->right_pad = right_pad;
+  data->top_pad = top_pad;
+  data->bottom_pad = bottom_pad;
+  data->ar_padding = ar_padding;
+  data->width = src->y_crop_width;
+  data->height = src->y_crop_height;
+  data->random_offset_stride = (src->width / 32) + 1;
+  data->is_10x3 = dec->tryhdr10x3 && src->hbd;
+  // data->scaling_lut[256][4];
+  init_scaling_function(params->scaling_points_y, params->num_y_points, data->scaling_lut, 0);
+  if (params->chroma_scaling_from_luma) {
+    for (int i = 0; i < 256; ++i) {
+      data->scaling_lut[i][1] = data->scaling_lut[i][0];
+      data->scaling_lut[i][2] = data->scaling_lut[i][0];
+    }
+  } else {
+    init_scaling_function(params->scaling_points_cb, params->num_cb_points, data->scaling_lut, 1);
+    init_scaling_function(params->scaling_points_cr, params->num_cr_points, data->scaling_lut, 2);
+  }
+
+  RNG rng;
+  // const int random_offset_width = src->width / luma_subblock_size_x;
+  int *random_offset = (int *)td->filmgrain_rand_offset->host_ptr;
+  for (int y = 0; y < src->height / 2; y += (luma_subblock_size_y >> 1)) {
+    rng.init(y * 2, params->random_seed);
+    for (int x = 0; x < src->width / 2; x += (luma_subblock_size_x >> 1)) {
+      random_offset[(y / (luma_subblock_size_x >> 1)) * data->random_offset_stride + x / (luma_subblock_size_x >> 1)] =
+          rng.get_random_number(8);
+    }
+  }
+
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+  ComputeShader *shader = &dec->shader_lib->shader_filmgrain_luma_gen;
+  command_list->SetPipelineState(shader->pso.Get());
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetComputeRootShaderResourceView(0, dec->filmgrain_random_luma->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(1, dec->filmgrain_random_chroma->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(2, dec->filmgrain_gaus->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(3, dec->filmgrain_noise->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(4, cbo1.dev_address);
+  command_list->Dispatch(1, 1, 1);
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->filmgrain_noise->dev));
+
+  shader = &dec->shader_lib->shader_filmgrain_chroma_gen;
+  command_list->SetPipelineState(shader->pso.Get());
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetComputeRootShaderResourceView(0, dec->filmgrain_random_luma->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(1, dec->filmgrain_random_chroma->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(2, dec->filmgrain_gaus->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(3, dec->filmgrain_noise->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(4, cbo1.dev_address);
+  command_list->Dispatch(1, 2, 1);
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->filmgrain_noise->dev));
+
+  shader = &dec->shader_lib->shader_filmgrain_filter;
+  command_list->SetPipelineState(shader->pso.Get());
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetComputeRootShaderResourceView(0, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(1, dec->filmgrain_noise->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(2, td->filmgrain_rand_offset->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(3, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(4, cbo2.dev_address);
+  command_list->Dispatch(src->width / 32 + 1, src->height / 32 + 1, 1);
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+
+  PutPerfMarker(td, &td->perf_markers[7]);
+}
+
+struct CopyPlaneData {
+  unsigned int cb_wi_count;
+  unsigned int cb_src_offset;
+  unsigned int cb_src_stride;
+  unsigned int cb_dst_width;
+  unsigned int cb_dst_offset;
+  unsigned int cb_dst_stride;
+};
+
+int av1_postprocess_copy_output(Av1Core *dec, AV1_COMMON *cm) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  cm->cur_frame->buf.hw_show_image = NULL;
+  if (!cm->showable_frame && !cm->show_frame) return 0;
+
+  HwFrameBuffer *src = td->dst_frame_buffer;
+  const int is_monochrome = cm->cur_frame->buf.monochrome;
+  const int is_10x3 = dec->tryhdr10x3 && src->hbd;
+  const int align1 = 255;
+  int y_stride = 0;
+  int uv_stride = 0;
+  const int bpp = src->hbd ? (is_10x3 ? 4 : 2) : 1;
+  int dst_y_texture_width = is_10x3 ? (src->y_crop_width + 2) / 3 : src->y_crop_width;
+  int dst_uv_texture_width = is_10x3 ? (src->uv_crop_width + 2) / 3 : src->uv_crop_width;
+  if (is_monochrome) {
+    y_stride = ((dst_y_texture_width * bpp + align1) & (~align1));
+  } else {
+    uv_stride = ((dst_uv_texture_width * bpp + align1) & (~align1));
+    y_stride = uv_stride * 2;
+  }
+
+  const int num_planes = is_monochrome ? 1 : 3;
+  const int y_size = (y_stride * src->y_crop_height + 511) & ~511;
+  const int uv_size = (uv_stride * src->uv_crop_height + 511) & ~511;
+  const int size = y_size + (num_planes - 1) * uv_size;
+  // callback here:
+  av1_decoded_frame_buffer_t fb = {0};
+
+  int err = 1;
+  frame_buffer_type fb_type = td->is_hbd ? (is_10x3 ? fbt10x3 : fbt10bit) : fbt8bit;
+  if (dec->cb_get_output_image)
+    err = dec->cb_get_output_image(dec->image_alloc_priv, size, src->y_crop_width, src->y_crop_height, fb_type, &fb);
+  if (err || (fb.dx12_buffer == 0 && fb.dx12_texture[0] == 0)) {
+    aom_internal_error(&cm->error, AOM_CODEC_MEM_ERROR, "Failed to allocate output frame.");
+  }
+
+  HwOutputImage *img = (HwOutputImage *)MTQueueGet(&dec->image_pool);
+  img->fb_ptr = (uint8_t *)fb.buffer_host_ptr;
+  img->alloc_priv = fb.priv;
+  img->is_valid = 1;
+  img->y_crop_width = src->y_crop_width;
+  img->y_crop_height = src->y_crop_height;
+  img->uv_crop_width = src->uv_crop_width;
+  img->uv_crop_height = src->uv_crop_height;
+  img->frame_number = src->frame_number;
+  img->hbd = td->is_hbd;
+  img->monochrome = is_monochrome;
+  img->planes[0].stride = y_stride;
+  img->planes[1].stride = uv_stride;
+  img->planes[2].stride = uv_stride;
+  img->planes[0].offset = 0;
+  img->planes[1].offset = y_size;
+  img->planes[2].offset = y_size + uv_size;
+  img->is_valid = 1;
+
+  if (is_monochrome) {
+    img->planes[1].offset = 0;
+    img->planes[2].offset = 0;
+  }
+  img->size = fb.buffer_size;
+  cm->cur_frame->buf.hw_show_image = img;
+
+  const int film_grain_chroma = td->do_filmgrain && !is_monochrome &&
+                                (cm->film_grain_params.num_cb_points || cm->film_grain_params.chroma_scaling_from_luma);
+
+  HwFrameBuffer *dst = &dec->back_buffer1;
+  dst->planes[0].stride = img->planes[0].stride;
+  dst->planes[1].stride = img->planes[1].stride;
+  dst->planes[2].stride = img->planes[2].stride;
+  dst->planes[0].offset = static_cast<int>(img->planes[0].offset + dst->base_offset);
+  dst->planes[1].offset = static_cast<int>(img->planes[1].offset + dst->base_offset);
+  dst->planes[2].offset = static_cast<int>(img->planes[2].offset + dst->base_offset);
+  if (td->do_filmgrain) {
+    av1_filmgrain_run(dec, cm, film_grain_chroma);
+  }
+
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+  if (!td->do_filmgrain || !film_grain_chroma) {
+    ComputeShader *shader =
+        is_10x3 ? &dec->shader_lib->shader_copy_plane_10bit10x3 : &dec->shader_lib->shader_copy_plane;
+    command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+    command_list->SetPipelineState(shader->pso.Get());
+    command_list->SetComputeRootUnorderedAccessView(0, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+
+    CopyPlaneData data;
+
+    for (int plane = td->do_filmgrain; plane < num_planes; ++plane) {
+      const int w = is_10x3 ? (((plane == 0 ? img->y_crop_width : img->uv_crop_width) + 23) / 24)
+                            : (((plane == 0 ? img->y_crop_width : img->uv_crop_width) * bpp + 15) >> 4);
+      const int h = plane == 0 ? img->y_crop_height : img->uv_crop_height;
+      data.cb_wi_count = w * h;
+      data.cb_src_offset = src->planes[plane].offset;
+      data.cb_src_stride = src->planes[plane].stride;
+      data.cb_dst_width = w;
+      data.cb_dst_offset = dst->planes[plane].offset;
+      data.cb_dst_stride = dst->planes[plane].stride;
+      command_list->SetComputeRoot32BitConstants(1, 6, &data, 0);
+      command_list->Dispatch((data.cb_wi_count + 63) >> 6, 1, 1);
+    }
+    command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+    PutPerfMarker(td, &td->perf_markers[6]);
+    PutPerfMarker(td, &td->perf_markers[7]);
+  }
+
+  command_list->ResourceBarrier(
+      1, &CD3DX12_RESOURCE_BARRIER::Transition(dec->frame_buffer_pool->dev, D3D12_RESOURCE_STATE_UNORDERED_ACCESS,
+                                               D3D12_RESOURCE_STATE_COPY_SOURCE));
+  if (fb.dx12_buffer) {
+    command_list->CopyBufferRegion(static_cast<ID3D12Resource *>(fb.dx12_buffer), 0, dec->frame_buffer_pool->dev,
+                                   dst->base_offset, size);
+  }
+
+  if (fb.dx12_texture[0]) {
+    for (int i = 0; i < num_planes; i++) {
+      ID3D12Resource *dst_texture = static_cast<ID3D12Resource *>(fb.dx12_texture[i]);
+      if (!dst_texture) continue;
+      UINT64 RequiredSize = 0;
+      D3D12_PLACED_SUBRESOURCE_FOOTPRINT buffer_layout = {};
+      buffer_layout.Offset = dst->planes[i].offset;
+      buffer_layout.Footprint.Depth = 1;
+      buffer_layout.Footprint.Format =
+          img->hbd ? (is_10x3 ? DXGI_FORMAT_R10G10B10A2_UNORM : DXGI_FORMAT_R16_UNORM) : DXGI_FORMAT_R8_UNORM;
+      buffer_layout.Footprint.Width = i ? dst_uv_texture_width : dst_y_texture_width;
+      buffer_layout.Footprint.Height = i ? img->uv_crop_height : img->y_crop_height;
+      buffer_layout.Footprint.RowPitch = dst->planes[i].stride;
+
+      CD3DX12_TEXTURE_COPY_LOCATION Dst(dst_texture, 0);
+      CD3DX12_TEXTURE_COPY_LOCATION Src(dec->frame_buffer_pool->dev, buffer_layout);
+      command_list->CopyTextureRegion(&Dst, 0, 0, 0, &Src, nullptr);
+    }
+  }
+  command_list->ResourceBarrier(
+      1, &CD3DX12_RESOURCE_BARRIER::Transition(dec->frame_buffer_pool->dev, D3D12_RESOURCE_STATE_COPY_SOURCE,
+                                               D3D12_RESOURCE_STATE_UNORDERED_ACCESS));
+  return 0;
+}

diff --git a/libav1/dx/prediction.cpp b/libav1/dx/prediction.cpp
new file mode 100644
index 0000000..4332728
--- /dev/null
+++ b/libav1/dx/prediction.cpp

@@ -0,0 +1,1088 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "dx/types.h"
+#include "dx/av1_core.h"
+#include "dx/av1_memory.h"
+#include "dx/av1_compute.h"
+
+#include "av1\common\filter.h"
+#include "av1/common/reconinter.h"
+#include "av1/common/reconintra.h"
+#include "av1/common/warped_motion.h"
+#include "aom_dsp/intrapred_common.h"
+
+enum {
+  NeedAboveLut = 0x3f7f,      // 11 1111 0111 1?11
+  NeedLeftLut = 0x3Ef7,       // 11 1110 1111 01?1
+  NeedRightLut = 0x010A,      // 0 0001 0000 10?0
+  NeedBotLut = 0x0084,        // 0 0000 1000 0?00
+  NeedAboveLeftLut = 0x11ff,  // 1 0001 1111 111?  - set to 1 for DC mode to workaround Filter mode
+};
+
+int get_relative_dist(const OrderHintInfo *oh, int a, int b) {
+  if (!oh->enable_order_hint) return 0;
+
+  const int bits = oh->order_hint_bits_minus_1 + 1;
+
+  assert(bits >= 1);
+  assert(a >= 0 && a < (1 << bits));
+  assert(b >= 0 && b < (1 << bits));
+
+  int diff = a - b;
+  const int m = 1 << (bits - 1);
+  diff = (diff & (m - 1)) - (diff & m);
+  return diff;
+}
+
+void av1_mi_push_block(AV1Decoder *pbi, AV1_COMMON *cm, MACROBLOCKD *xd) {
+  Av1Core *dec = pbi->gpu_decoder;
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  av1_tile_data *tile = xd->tile_data;
+  unsigned int *index_arr = tile->gen_indexes;
+
+  MB_MODE_INFO *mi = xd->mi[0];
+  mi->index_base = tile->gen_index_ptr;
+  int bsize = mi->sb_type;
+  const int bw_log = mi_size_wide_log2[bsize];
+  const int bh_log = mi_size_high_log2[bsize];
+  const int bw = 1 << bw_log;
+  const int bh = 1 << bh_log;
+  const int bw_log_uv = AOMMAX(0, bw_log - 1);
+  const int bh_log_uv = AOMMAX(0, bh_log - 1);
+
+  const int mi_row = mi->mi_row;
+  const int mi_col = mi->mi_col;
+
+  const int is_chroma_ref = ((mi_row & 1) == 0 && (bh & 1) == 1) || ((mi_col & 1) == 0 && (bw & 1) == 1);
+
+  const int is_inter = mi->ref_frame[0] > INTRA_FRAME;
+  const int is_inter_intra = is_interintra_pred(mi);
+  const int is_compound = mi->ref_frame[1] > INTRA_FRAME;
+  const int is_obmc = is_inter && mi->motion_mode == OBMC_CAUSAL && (xd->up_available || xd->left_available);
+  if (is_inter) {
+    tile->have_inter = 1;
+    int is_global_warp[2] = {0, 0};
+    if (xd->cur_frame_force_integer_mv == 0 && bsize >= BLOCK_8X8 &&
+        (mi->mode == GLOBALMV || mi->mode == GLOBAL_GLOBALMV)) {
+      for (int ref = 0; ref < 1 + is_compound; ++ref) {
+        is_global_warp[ref] = xd->global_motion[mi->ref_frame[ref]].wmtype > TRANSLATION &&
+                              !xd->global_motion[mi->ref_frame[ref]].invalid;
+      }
+    }
+    const int is_local_warp =
+        xd->cur_frame_force_integer_mv == 0 && mi->motion_mode == WARPED_CAUSAL && !mi->wm_params.invalid;
+    const int is_luma_warp = is_inter && (is_local_warp || is_global_warp[0]) && bw_log >= 1 && bh_log >= 1;
+    const int is_chroma_warp = is_luma_warp && bw_log >= 2 && bh_log >= 2;
+
+    // Y:
+    const int block_size_id_y = InterBlockSizeIndexLUT[bw_log][bh_log];
+    if (is_compound) {
+      const int is_warp_compound = is_global_warp[0] || is_global_warp[1];
+      int type = is_warp_compound ? CompoundGlobalWarp
+                                  : mi->interinter_comp.type == COMPOUND_WEDGE
+                                        ? CompoundMasked
+                                        : mi->interinter_comp.type == COMPOUND_DIFFWTD ? CompoundDiff : CompoundAvrg;
+      int type_index = (type - 1) * InterTypes::InterSizesAllCommon + block_size_id_y;
+      index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index]++;
+    } else {
+      if (is_luma_warp) {
+        index_arr[tile->gen_index_ptr++] = tile->gen_block_map_wrp[block_size_id_y]++;
+      } else {
+        index_arr[tile->gen_index_ptr++] = tile->gen_block_map[block_size_id_y]++;
+      }
+    }
+
+    // UV:
+    if (!is_chroma_ref) {
+      const int plane_bw = AOMMAX(0, bw_log - 1);
+      const int plane_bh = AOMMAX(0, bh_log - 1);
+      const int block_size_id_uv = InterBlockSizeIndexLUT[plane_bw][plane_bh];
+      int sub8x8 = bw_log == 0 || bh_log == 0;
+      if (sub8x8) {
+        int dy = bh_log == 0 ? -1 : 0;
+        int dx = bw_log == 0 ? -1 : 0;
+        const MB_MODE_INFO *prev_mi[] = {
+            xd->mi[dx + dy * xd->mi_stride],
+            xd->mi[dx],
+            xd->mi[dy * xd->mi_stride],
+        };
+        for (int i = 0; i < 3; ++i) {
+          sub8x8 &= prev_mi[i]->ref_frame[0] > INTRA_FRAME;
+        }
+      }
+
+      if (sub8x8) {
+        const int brows = bh_log == 2 ? 4 : 2;
+        const int bcols = bw_log == 2 ? 4 : 2;
+        for (int row = 0; row < brows; ++row) {
+          int mi_index = (bh_log == 0 ? row - 1 : 0) * xd->mi_stride;
+          MB_MODE_INFO *sub_mi = xd->mi[mi_index];
+
+          if (bw_log == 0) {
+            MB_MODE_INFO *sub_mi0 = xd->mi[mi_index - 1];
+            const int is_compound1 = sub_mi->ref_frame[1] > INTRA_FRAME;
+            const int is_compound0 = sub_mi0->ref_frame[1] > INTRA_FRAME;
+            int type_index0 = Inter2x2ArrOffset + (is_compound0 << static_cast<int>(is_compound0 != is_compound1));
+            int type_index1 = Inter2x2ArrOffset + (is_compound1 << static_cast<int>(is_compound0 != is_compound1));
+            index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index0];
+            if (type_index0 != type_index1) index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index1];
+            tile->gen_block_map[type_index0] += 2;
+            tile->gen_block_map[type_index1] += 2;
+          } else {
+            const int type_index = Inter2x2ArrOffset + is_compound;
+            index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index];
+            tile->gen_block_map[type_index] += 2 * bcols;
+          }
+        }
+      } else {
+        if (is_compound) {
+          const int is_warp_compound = (is_global_warp[0] || is_global_warp[1]) && bw_log >= 2 && bh_log >= 2;
+          int type =
+              is_warp_compound
+                  ? (mi->interinter_comp.type == COMPOUND_DIFFWTD ? CompoundDiffUvGlobalWarp : CompoundGlobalWarp)
+                  : mi->interinter_comp.type == COMPOUND_WEDGE
+                        ? CompoundMasked
+                        : mi->interinter_comp.type == COMPOUND_DIFFWTD ? CompoundDiffUv : CompoundAvrg;
+          int type_index = (type - 1) * InterTypes::InterSizesAllCommon + block_size_id_uv;
+          index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index];
+          tile->gen_block_map[type_index] += 2;
+        } else {
+          if (is_chroma_warp) {
+            index_arr[tile->gen_index_ptr++] = tile->gen_block_map_wrp[block_size_id_uv];
+            tile->gen_block_map_wrp[block_size_id_uv] += 2;
+          } else {
+            index_arr[tile->gen_index_ptr++] = tile->gen_block_map[block_size_id_uv];
+            tile->gen_block_map[block_size_id_uv] += 2;
+          }
+        }
+      }
+    }
+
+    if (is_obmc) {
+      int count = 0;
+      if (xd->up_available) {
+        const int x_mis = AOMMIN(bw, cm->mi_cols - mi_col);
+        int h = bh_log > 4 ? 3 : (bh_log - 1);
+        int huv = h == 0 ? 0 : h - 1;
+        int obmc_chroma = bsize > BLOCK_16X8 && bsize != BLOCK_4X16 && bsize != BLOCK_16X4;
+        for (int col = 0; col < x_mis && count < max_neighbor_obmc[bw_log];) {
+          MB_MODE_INFO **above = xd->mi - xd->mi_stride + col;
+          int w = mi_size_wide_log2[above[0]->sb_type];
+          if (w == 0) {
+            w = 1;
+            ++above;
+          }
+          if (w > bw_log) w = bw_log;
+          if (above[0]->ref_frame[0] > INTRA_FRAME) {
+            count += 1 + (w == 5);
+            const int type_index_y =
+                (ObmcAbove - 1) * InterTypes::InterSizesAllCommon + ((w << 2) | h);  // InterBlockSizeIndexLUT[w][h];
+            index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index_y]++;
+            if (obmc_chroma) {
+              const int type_index_uv = (ObmcAbove - 1) * InterTypes::InterSizesAllCommon + (((w - 1) << 2) | huv);
+              index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index_uv];
+              tile->gen_block_map[type_index_uv] += 2;
+            }
+          }
+
+          assert(above[0]->use_intrabc == 0);
+          col += 1 << w;
+        }
+      }
+      count = 0;
+      if (xd->left_available) {
+        const int y_mis = AOMMIN(bh, cm->mi_rows - mi_row);
+        int w = bw_log > 4 ? 3 : (bw_log - 1);
+        int wuv = w == 0 ? 0 : w - 1;
+        for (int row = 0; row < y_mis && count < max_neighbor_obmc[bh_log];) {
+          MB_MODE_INFO **left = xd->mi - 1 + xd->mi_stride * row;
+          int h = mi_size_high_log2[left[0]->sb_type];
+          if (h == 0) {
+            h = 1;
+            left += xd->mi_stride;
+          }
+          if (h > bh_log) h = bh_log;
+
+          if (left[0]->ref_frame[0] > INTRA_FRAME) {
+            count += 1 + (h == 5);
+
+            const int type_index_y = (ObmcLeft - 1) * InterTypes::InterSizesAllCommon + ((h << 2) | w);
+            const int type_index_uv = (ObmcLeft - 1) * InterTypes::InterSizesAllCommon + (((h - 1) << 2) | wuv);
+            index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index_y]++;
+            index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index_uv];
+            tile->gen_block_map[type_index_uv] += 2;
+          }
+          assert(left[0]->use_intrabc == 0);
+          row += 1 << h;
+        }
+      }
+    }
+  }
+
+  const int y_use_palette = mi->palette_mode_info.palette_size[0] > 0;
+  const int uv_use_palette = mi->palette_mode_info.palette_size[1] > 0;
+  if (!is_inter || is_inter_intra) {
+    const int interintra_mode = mi->interintra_mode;
+    const int is_intrabc = mi->use_intrabc;
+    const BLOCK_SIZE bsize_uv = scale_chroma_bsize((BLOCK_SIZE)bsize, 1, 1);
+    TX_SIZE tx_sizes[2];
+    tx_sizes[0] = av1_get_tx_size(0, xd);
+    tx_sizes[1] = av1_get_tx_size(1, xd);
+    const int txw = is_intrabc ? AOMMIN(bw_log, 4) : is_inter_intra ? bw_log : (tx_size_wide_log2[tx_sizes[0]] - 2);
+    const int txh = is_intrabc ? AOMMIN(bh_log, 4) : is_inter_intra ? bh_log : (tx_size_high_log2[tx_sizes[0]] - 2);
+    const int max_cnt_x = (cm->mi_cols - mi_col + (1 << txw) - 1) >> txw;
+    const int max_cnt_y = (cm->mi_rows - mi_row + (1 << txh) - 1) >> txh;
+    const int unit_x_log = bw_log == 5 && !is_intrabc;
+    const int unit_y_log = bh_log == 5 && !is_intrabc;
+    int cnt_y = 1 << (bh_log - txh);
+    int cnt_x = 1 << (bw_log - txw);
+    if (!y_use_palette) {
+      int iter_grid_stride = td->iter_grid_stride;
+      int *iter_grid = td->gen_intra_iter_y + mi_col + mi_row * iter_grid_stride;
+      const int mode = is_inter_intra ? interintra_to_intra_mode[interintra_mode] : mi->mode;
+
+      int need_above = (NeedAboveLut >> mode) & 1;
+      int need_left = (NeedLeftLut >> mode) & 1;
+      const int need_right = (NeedRightLut >> mode) & 1;
+      const int need_bot = (NeedBotLut >> mode) & 1;
+      int need_aboveleft = (NeedAboveLeftLut >> mode) & 1;
+      const int use_filter = mi->filter_intra_mode_info.use_filter_intra;
+
+      const int type_size = txw + txh;
+      const int type_idx_base = use_filter ? IntraBlockOffset : (IntraBlockOffset - 1 - type_size);
+
+      if (is_intrabc) {
+        need_above = 0;
+        need_left = 0;
+        need_aboveleft = 0;
+      }
+
+      cnt_x >>= unit_x_log;
+      cnt_y >>= unit_y_log;
+      for (int unit_y = 0; unit_y <= unit_y_log; ++unit_y) {
+        for (int unit_x = 0; unit_x <= unit_x_log; ++unit_x) {
+          const int x_start = unit_x * cnt_x;
+          const int x_end = AOMMIN(x_start + cnt_x, max_cnt_x);
+          const int y_start = unit_y * cnt_y;
+          const int y_end = AOMMIN(y_start + cnt_y, max_cnt_y);
+          for (int y = y_start; y < y_end; ++y) {
+            for (int x = x_start; x < x_end; ++x) {
+              const int col_off = x << txw;
+              const int row_off = y << txh;
+              const int subblk_w = 1 << txw;
+              const int subblk_h = 1 << txh;
+              const int have_top = y || xd->up_available;
+              const int have_left = x || xd->left_available;
+
+              int above_available = have_top;
+              int have_top_right = 0;
+              if (need_above) {
+                const int xr = (xd->mb_to_right_edge >> 3) + ((4 << bw_log) - (col_off << 2) - (4 << txw));
+                if (need_right) {
+                  const int right_available = mi_col + (col_off + subblk_w) < xd->tile.mi_col_end;
+                  have_top_right = has_top_right(cm, (BLOCK_SIZE)bsize, mi_row, mi_col, have_top, right_available,
+                                                 mi->partition, tx_sizes[0], row_off, col_off, 0, 0);
+                }
+                above_available = (have_top ? AOMMIN(subblk_w, subblk_w + (xr >> 2)) : 0) +
+                                  (have_top_right ? AOMMIN(subblk_w, xr >> 2) : 0);
+              }
+
+              int left_available = have_left;
+              int have_bottom_left = 0;
+              if (need_left) {
+                const int yd = (xd->mb_to_bottom_edge >> 3) + ((4 << bh_log) - (row_off << 2) - (4 << txh));
+                if (need_bot) {
+                  const int bottom_available = (yd > 0) && ((mi_row + row_off + subblk_h) < xd->tile.mi_row_end);
+                  have_bottom_left = has_bottom_left(cm, (BLOCK_SIZE)bsize, mi_row, mi_col, bottom_available, have_left,
+                                                     mi->partition, tx_sizes[0], row_off, col_off, 0, 0);
+                }
+
+                left_available = (have_left ? AOMMIN(subblk_h, subblk_h + (yd >> 2)) : 0) +
+                                 (have_bottom_left ? AOMMIN(subblk_h, yd >> 2) : 0);
+              }
+
+              int iter = -1;
+              int *iter_grid_blk = iter_grid + subblk_w * x + subblk_h * y * iter_grid_stride;
+              const int above_left = have_top && have_left && need_aboveleft;
+              for (int it = 1 - above_left; it <= above_available; ++it) {
+                iter = AOMMAX(iter, iter_grid_blk[it]);
+              }
+              for (int it = 1; it <= left_available; ++it) {
+                iter = AOMMAX(iter, iter_grid_blk[it * iter_grid_stride]);
+              }
+              ++iter;
+              if (is_intrabc) iter = tile->gen_intra_iter_y + 1;
+              iter = AOMMIN(iter, tile->gen_intra_max_iter);
+              tile->gen_intra_iter_y = AOMMAX(iter, tile->gen_intra_iter_y);
+              for (int it = 1; it <= subblk_h; ++it) {
+                iter_grid_blk[it * iter_grid_stride + subblk_w] = iter;
+              }
+              for (int it = 1; it <= subblk_w; ++it) {
+                iter_grid_blk[it + subblk_h * iter_grid_stride] = iter;
+              }
+
+              if (iter > tile->gen_intra_iter_set) {
+                memset(tile->gen_block_map + tile->gen_iter_clear_offset, 0, tile->gen_iter_clear_size);
+                tile->gen_intra_iter_set = tile->gen_intra_max_iter;
+              }
+
+              const int type_index = iter * IntraTypeCount + type_idx_base;
+              index_arr[tile->gen_index_ptr++] =
+                  (tile->gen_block_map[type_index] << 2) | have_top_right | (have_bottom_left << 1);
+              ++tile->gen_block_map[type_index];
+            }
+          }
+        }
+      }
+    }
+
+    if (!uv_use_palette && !is_chroma_ref) {
+      const int mi_col_uv = mi_col >> 1;
+      const int mi_row_uv = mi_row >> 1;
+      const int iter_grid_stride_uv = td->iter_grid_stride_uv;
+      int *iter_grid = td->gen_intra_iter_uv + mi_col_uv + mi_row_uv * iter_grid_stride_uv;
+      const int mode = is_inter_intra ? interintra_to_intra_mode[interintra_mode] : mi->uv_mode;
+
+      int need_above = (NeedAboveLut >> mode) & 1;
+      int need_left = (NeedLeftLut >> mode) & 1;
+      const int need_right = (NeedRightLut >> mode) & 1;
+      const int need_bot = (NeedBotLut >> mode) & 1;
+      int need_aboveleft = (NeedAboveLeftLut >> mode) & 1;
+
+      const int txw = (is_intrabc || is_inter_intra) ? bw_log_uv : (tx_size_wide_log2[tx_sizes[1]] - 2);
+      const int txh = (is_intrabc || is_inter_intra) ? bh_log_uv : (tx_size_high_log2[tx_sizes[1]] - 2);
+
+      const int cnt_y = 1 << (bh_log_uv - txh - unit_y_log);
+      const int cnt_x = 1 << (bw_log_uv - txw - unit_x_log);
+
+      const int type_size = txw + txh;
+
+      if (is_intrabc) {
+        need_above = 0;
+        need_left = 0;
+        need_aboveleft = 0;
+      }
+
+      for (int unit_y = 0; unit_y <= unit_y_log; ++unit_y) {
+        for (int unit_x = 0; unit_x <= unit_x_log; ++unit_x) {
+          for (int suby = 0; suby < cnt_y; ++suby) {
+            for (int subx = 0; subx < cnt_x; ++subx) {
+              const int x = subx + unit_x * cnt_x;
+              const int y = suby + unit_y * cnt_y;
+              const int col_off = x << txw;
+              const int row_off = y << txh;
+              const int subblk_w = 1 << txw;
+              const int subblk_h = 1 << txh;
+              const int have_top = y || xd->chroma_up_available;
+              const int have_left = x || xd->chroma_left_available;
+              int above_available = have_top;
+              int have_top_right = 0;
+              if (need_above) {
+                const int xr = (xd->mb_to_right_edge >> 4) + ((4 << bw_log_uv) - (col_off << 2) - (4 << txw));
+                if (need_right) {
+                  const int right_available = mi_col + ((col_off + subblk_w) << 1) < xd->tile.mi_col_end;
+                  have_top_right = has_top_right(cm, bsize_uv, mi_row, mi_col, have_top, right_available, mi->partition,
+                                                 tx_sizes[1], row_off, col_off, 1, 1);
+                }
+                above_available = (have_top ? AOMMIN(subblk_w, subblk_w + (xr >> 2)) : 0) +
+                                  (have_top_right ? AOMMIN(subblk_w, xr >> 2) : 0);
+              }
+
+              int left_available = have_left;
+              int have_bottom_left = 0;
+              if (need_left) {
+                const int yd = (xd->mb_to_bottom_edge >> 4) + ((4 << bh_log_uv) - (row_off << 2) - (4 << txh));
+                if (need_bot) {
+                  const int bottom_available =
+                      (yd > 0) && ((mi_row + ((row_off + subblk_h) << 1)) < xd->tile.mi_row_end);
+                  have_bottom_left = has_bottom_left(cm, bsize_uv, mi_row, mi_col, bottom_available, have_left,
+                                                     mi->partition, tx_sizes[1], row_off, col_off, 1, 1);
+                }
+
+                left_available = (have_left ? AOMMIN(subblk_h, subblk_h + (yd >> 2)) : 0) +
+                                 (have_bottom_left ? AOMMIN(subblk_h, yd >> 2) : 0);
+              }
+
+              int iter = -1;
+              int *iter_grid_blk = iter_grid + subblk_w * x + subblk_h * y * iter_grid_stride_uv;
+              const int above_left = need_aboveleft && have_top && have_left;
+              for (int it = 1 - above_left; it <= above_available; ++it) {
+                iter = AOMMAX(iter, iter_grid_blk[it]);
+              }
+              for (int it = 1; it <= left_available; ++it) {
+                iter = AOMMAX(iter, iter_grid_blk[it * iter_grid_stride_uv]);
+              }
+              if (mode == UV_CFL_PRED) {
+                int iter_grid_stride = td->iter_grid_stride;
+                int *y_grid = td->gen_intra_iter_y + ((mi_col_uv + col_off) << 1) +
+                              ((mi_row_uv + row_off) << 1) * iter_grid_stride;
+                for (int yit = 1; yit <= (subblk_h << 1); ++yit) {
+                  for (int xit = 1; xit <= (subblk_w << 1); ++xit) {
+                    iter = AOMMAX(iter, y_grid[xit + yit * iter_grid_stride]);
+                  }
+                }
+              }
+
+              ++iter;
+              if (is_intrabc) iter = tile->gen_intra_iter_uv + 1;
+              iter = AOMMIN(iter, tile->gen_intra_max_iter);
+              tile->gen_intra_iter_uv = AOMMAX(iter, tile->gen_intra_iter_uv);
+              for (int it = 1; it <= subblk_h; ++it) {
+                iter_grid_blk[it * iter_grid_stride_uv + subblk_w] = iter;
+              }
+              for (int it = 1; it <= subblk_w; ++it) {
+                iter_grid_blk[it + subblk_h * iter_grid_stride_uv] = iter;
+              }
+
+              if (iter > tile->gen_intra_iter_set) {
+                memset(tile->gen_block_map + tile->gen_iter_clear_offset, 0, tile->gen_iter_clear_size);
+                tile->gen_intra_iter_set = tile->gen_intra_max_iter;
+              }
+
+              const int type_index = iter * IntraTypeCount + IntraBlockOffset - 1 - type_size;
+              index_arr[tile->gen_index_ptr++] =
+                  (tile->gen_block_map[type_index] << 2) | have_top_right | (have_bottom_left << 1);
+              tile->gen_block_map[type_index] += 2;
+            }
+          }
+        }
+      }
+    }
+  }
+
+  if (mi->skip == 0) {
+    if (y_use_palette || is_obmc) {
+      const int type_index = ReconBlockOffset + bw_log + 6 * bh_log;
+      index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index]++;
+    }
+    if (uv_use_palette || (is_obmc && !is_chroma_ref)) {
+      const int type_index = ReconBlockOffset + bw_log_uv + 6 * bh_log_uv;
+      index_arr[tile->gen_index_ptr++] = tile->gen_block_map[type_index];
+      tile->gen_block_map[type_index] += 2;
+    }
+  }
+}
+
+struct GenBlockData {
+  int mi_cols;
+  int mi_rows;
+  int mi_stride;
+  unsigned int mi_addr_base;
+  int iter_grid_stride;
+  int iter_grid_offset_uv;
+  int iter_grid_stride_uv;
+  int disable_edge_filter;
+  int force_integet_mv;
+  int reserved[3];
+  int wedge_offsets[22][4];  //??
+  int dist_wtd[8 * 8][4];
+  int lossless_seg[8][4];
+
+  int global_warp[8][4];
+  struct {
+    WarpedMotionParams params;
+    int pad;
+  } wm_params[8];
+};
+
+struct GenBlockSRT {
+  int wi_count;
+  int mi_offset;
+  int mi_idx_base;
+  int col_srart;
+  int row_srart;
+  int index_offset;
+  int index_offset_warp;
+};
+
+void av1_prediction_gen_blocks(AV1Decoder *pbi, Av1Core *dec) {
+  AV1_COMMON *cm = &pbi->common;
+  MACROBLOCKD *const xd = &pbi->mb;
+  auto td = dec->curr_frame_data;
+
+  ComputeCommandBuffer *cb = &td->command_buffer;
+  ConstantBufferObject cbo = cb->Alloc(sizeof(GenBlockData));
+  GenBlockData *data = (GenBlockData *)cbo.host_ptr;
+  data->mi_cols = cm->mi_cols;
+  data->mi_rows = cm->mi_rows;
+  data->mi_stride = cm->mi_stride;
+  data->mi_addr_base = static_cast<unsigned int>(reinterpret_cast<uint64_t>(dec->mode_info_pool->host_ptr));
+  data->iter_grid_stride = td->iter_grid_stride;
+  data->iter_grid_offset_uv = td->iter_grid_offset_uv;
+  data->iter_grid_stride_uv = td->iter_grid_stride_uv;
+  data->disable_edge_filter = !cm->seq_params.enable_intra_edge_filter;
+  data->force_integet_mv = cm->cur_frame_force_integer_mv;
+  for (int bs = 0; bs < 22; ++bs) {
+    data->wedge_offsets[bs][0] = dec->wedge_offsets[bs][0];
+    data->wedge_offsets[bs][1] = dec->wedge_offsets[bs][1];
+  }
+  for (int i = 0; i < 8; ++i) data->lossless_seg[i][0] = xd->lossless[i];
+  for (int i = 0; i < 7; ++i)
+    data->global_warp[i][0] = cm->global_motion[i + 1].wmtype > TRANSLATION && !cm->global_motion[i + 1].invalid;
+  for (int r1 = 0; r1 < 7; ++r1) {
+    for (int r0 = 0; r0 < 7; ++r0) {
+      int fwd_offset = 8;
+      int bck_offset = 8;
+      const RefCntBuffer *const bck_buf = get_ref_frame_buf(cm, r0 + 1);
+      const RefCntBuffer *const fwd_buf = get_ref_frame_buf(cm, r1 + 1);
+      const int cur_frame_index = cm->cur_frame->order_hint;
+      int bck_frame_index = 0, fwd_frame_index = 0;
+      if (bck_buf != NULL) bck_frame_index = bck_buf->order_hint;
+      if (fwd_buf != NULL) fwd_frame_index = fwd_buf->order_hint;
+
+      int d0 = clamp(abs(get_relative_dist(&cm->seq_params.order_hint_info, fwd_frame_index, cur_frame_index)), 0,
+                     MAX_FRAME_DISTANCE);
+      int d1 = clamp(abs(get_relative_dist(&cm->seq_params.order_hint_info, cur_frame_index, bck_frame_index)), 0,
+                     MAX_FRAME_DISTANCE);
+      const int order = d0 <= d1;
+      if (d0 == 0 || d1 == 0) {
+        fwd_offset = quant_dist_lookup_table[0][3][order];
+        bck_offset = quant_dist_lookup_table[0][3][1 - order];
+      } else {
+        int i;
+        for (i = 0; i < 3; ++i) {
+          int c0 = quant_dist_weight[i][order];
+          int c1 = quant_dist_weight[i][!order];
+          int d0_c0 = d0 * c0;
+          int d1_c1 = d1 * c1;
+          if ((d0 > d1 && d0_c0 < d1_c1) || (d0 <= d1 && d0_c0 > d1_c1)) break;
+        }
+        fwd_offset = quant_dist_lookup_table[0][i][order];
+        bck_offset = quant_dist_lookup_table[0][i][1 - order];
+      }
+      data->dist_wtd[r0 + 8 * r1][0] = fwd_offset | (bck_offset << 4);
+    }
+  }
+  for (int i = 0; i < 7; ++i)
+    memcpy(&data->wm_params[i].params, &xd->global_motion[i + 1], sizeof(data->wm_params[0].params));
+
+  const int tile_count = td->tile_count;
+  av1_tile_data *tiles = td->tile_data;
+  int itra_iters = -1;
+
+  int main_tile = 0;
+  for (int t = 0; t < tile_count; ++t) {
+    int tile_iters = -1;
+    tile_iters = AOMMAX(tile_iters, tiles[t].gen_intra_iter_y);
+    tile_iters = AOMMAX(tile_iters, tiles[t].gen_intra_iter_uv);
+    if (tile_iters > itra_iters) {
+      itra_iters = tile_iters;
+      main_tile = t;
+    }
+    tiles[t].gen_pred_map_max =
+        InterTypes::InterCountsAll + ReconstructBlockSizes + (tile_iters + 1) * IntraTypeCount + 1;
+  }
+
+  if (main_tile) {
+    av1_tile_data tile0 = tiles[0];
+    tiles[0] = tiles[main_tile];
+    tiles[main_tile] = tile0;
+    if (td->sec_thread_data) {
+      av1_tile_data *tile2 = td->sec_thread_data->tile_data;
+      tile0 = tile2[0];
+      tile2[0] = tile2[main_tile];
+      tile2[main_tile] = tile0;
+    }
+  }
+
+  td->intra_iters = itra_iters;
+  const int pred_block_types = InterTypes::InterCountsAll + ReconstructBlockSizes + (itra_iters + 1) * IntraTypeCount;
+  int offset = 0;
+  for (int it = 0; it <= pred_block_types; ++it) {
+    for (int t = 0; t < tile_count; ++t) {
+      if (it < tiles[t].gen_pred_map_max) {
+        int cnt = tiles[t].gen_block_map[it];
+        tiles[t].gen_block_map[it] = offset;
+        offset += cnt;
+      }
+    }
+  }
+  offset = 0;
+  for (int it = 0; it <= InterTypes::InterSizesAllCommon; ++it) {
+    for (int t = 0; t < tile_count; ++t) {
+      int cnt = tiles[t].gen_block_map_wrp[it];
+      tiles[t].gen_block_map_wrp[it] = offset;
+      offset += cnt;
+    }
+  }
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+  ComputeShader *shader = &dec->shader_lib->shader_gen_pred_blocks;
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetPipelineState(shader->pso.Get());
+  command_list->SetComputeRootShaderResourceView(0, dec->mode_info_pool->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(1, td->gen_mi_block_indexes->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(2, td->gen_block_map->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(3, td->mode_info_grid->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(4, td->gen_intra_inter_grid->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(5, dec->prediction_blocks->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(6, dec->prediction_blocks_warp->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(7, cbo.dev_address);
+  GenBlockSRT srt;
+  for (int i = 0; i < tile_count; ++i) {
+    srt.wi_count = tiles[i].mi_count;
+    srt.mi_offset = tiles[i].mi_offset;
+    srt.mi_idx_base = tiles[i].gen_index_base;
+    srt.col_srart = tiles[i].mi_col_start;
+    srt.row_srart = tiles[i].mi_row_start;
+    srt.index_offset = tiles[i].gen_block_map_offset;
+    srt.index_offset_warp = tiles[i].gen_block_warp_offset;
+    command_list->SetComputeRoot32BitConstants(8, 7, &srt, 0);
+    command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+  }
+
+  if (td->sec_thread_data) {
+    av1_tile_data *tiles2 = td->sec_thread_data->tile_data;
+    for (int i = 0; i < tile_count; ++i) {
+      srt.wi_count = tiles2[i].mi_count;
+      srt.mi_offset = tiles2[i].mi_offset;
+      if (srt.wi_count) {
+        srt.mi_idx_base = tiles[i].gen_index_base;
+        srt.col_srart = tiles[i].mi_col_start;
+        srt.row_srart = tiles[i].mi_row_start;
+        srt.index_offset = tiles[i].gen_block_map_offset;
+        srt.index_offset_warp = tiles[i].gen_block_warp_offset;
+        command_list->SetComputeRoot32BitConstants(8, 7, &srt, 0);
+        command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+      }
+    }
+  }
+  D3D12_RESOURCE_BARRIER barriers[] = {CD3DX12_RESOURCE_BARRIER::UAV(dec->prediction_blocks->dev),
+                                       CD3DX12_RESOURCE_BARRIER::UAV(dec->prediction_blocks_warp->dev)};
+  command_list->ResourceBarrier(2, barriers);
+  PutPerfMarker(td, &td->perf_markers[1]);
+}
+
+struct GlobalMotionWarp {
+  int is_warp;
+  int reserved;
+  int mat[6];
+  int alpha;
+  int beta;
+  int delta;
+  int gamma;
+};
+
+typedef struct {
+  PlaneInfo planes[3];
+  RefPlaneInfo refplanes[3 * 7];
+  int dims[2][4];
+  int pixel_max[4];
+  int kernels[8 * 16][8];
+  ScaleFactor scale[8];
+  int obmc_mask[(1 + 1 + 2 + 4 + 8) * 4];
+  GlobalMotionWarp warp[7];
+} PSSLInterData;
+
+typedef struct {
+  unsigned int wi_count;
+  unsigned int pass_offset;
+  unsigned int width_log2;
+  unsigned int height_log2;
+} PSSLInterSRT;
+
+struct PSSLIntraData {
+  PlaneInfo planes[3];
+  int flags[4];
+  int filter[5][8][8];
+  int mode_params_lut[16][7][4];
+  int sm_weight_arrays[128][4];
+};
+
+struct PSSLIntraSRT {
+  int counts0[8];
+  int wi_count;
+  int pass_offset;
+};
+
+struct PSSLReconSRT {
+  int wi_count;
+  int pass_offset;
+  int width_log2;
+  int height_log2;
+  int fb_base_offset;
+  int r[5];
+};
+
+void av1_prediction_run_all(Av1Core *dec, AV1_COMMON *cm, TileInfo *tile) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  const int tile_count = td->tile_count;
+  av1_tile_data *tiles = td->tile_data;
+
+  // 1 Prepare command buffer and common data
+  bitdepth_dependent_shaders *shaders = td->shaders;
+  ComputeCommandBuffer *cb = &td->command_buffer;
+
+  int do_inter = 0;
+  for (int i = 0; i < tile_count; ++i) {
+    do_inter |= tiles[i].have_inter;
+  }
+
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+  if (do_inter) {
+    ConstantBufferObject cbo = cb->Alloc(sizeof(PSSLInterData));
+    PSSLInterData *inter_data = (PSSLInterData *)cbo.host_ptr;
+    memcpy(inter_data->planes, td->frame_buffer->planes, sizeof(inter_data->planes));
+    memcpy(inter_data->scale, td->scale_factors, sizeof(inter_data->scale));
+    for (int i = 0; i < 7; ++i) {
+      for (int p = 0; p < 3; ++p) {
+        inter_data->refplanes[i * 3 + p].offset = td->refs[i]->planes[p].offset;
+        inter_data->refplanes[i * 3 + p].stride = td->refs[i]->planes[p].stride;
+        inter_data->refplanes[i * 3 + p].width =
+            p == 0 ? td->refs[i]->y_crop_width - 1 : td->refs[i]->uv_crop_width - 1;
+        inter_data->refplanes[i * 3 + p].height =
+            p == 0 ? td->refs[i]->y_crop_height - 1 : td->refs[i]->uv_crop_height - 1;
+      }
+    }
+    inter_data->dims[0][0] = inter_data->refplanes[0].width;
+    inter_data->dims[0][1] = inter_data->refplanes[0].height;
+    inter_data->dims[1][0] = inter_data->refplanes[1].width;
+    inter_data->dims[1][1] = inter_data->refplanes[1].height;
+
+    memset(inter_data->kernels, 0, sizeof(inter_data->kernels));
+    // add 4tap filters first:
+    const int16_t *ker_arr[] = {
+        (const int16_t *)av1_sub_pel_filters_4,       (const int16_t *)av1_sub_pel_filters_4smooth,
+        (const int16_t *)av1_bilinear_filters,        (const int16_t *)av1_sub_pel_filters_8,
+        (const int16_t *)av1_sub_pel_filters_8smooth, (const int16_t *)av1_sub_pel_filters_8sharp};
+    for (int i = 0; i < 6; ++i)
+      for (int j = 0; j < 16; ++j)
+        for (int k = 0; k < 8; ++k) inter_data->kernels[i * 16 + j][k] = ker_arr[i][j * 8 + k];
+    memcpy(inter_data->obmc_mask, obmc_mask, sizeof(obmc_mask));
+
+    int enable_gm_warp = 0;
+    for (int i = 0; i < 7; ++i) {
+      WarpedMotionParams *wm = &cm->global_motion[i + 1];
+      const int is_gmwarp = wm->wmtype > TRANSLATION && !wm->invalid;
+      enable_gm_warp |= is_gmwarp;
+      inter_data->warp[i].is_warp = is_gmwarp;
+      if (is_gmwarp) {
+        inter_data->warp[i].mat[0] = wm->wmmat[0];
+        inter_data->warp[i].mat[1] = wm->wmmat[1];
+        inter_data->warp[i].mat[2] = wm->wmmat[2];
+        inter_data->warp[i].mat[3] = wm->wmmat[3];
+        inter_data->warp[i].mat[4] = wm->wmmat[4];
+        inter_data->warp[i].mat[5] = wm->wmmat[5];
+        inter_data->warp[i].alpha = wm->alpha;
+        inter_data->warp[i].beta = wm->beta;
+        inter_data->warp[i].delta = wm->delta;
+        inter_data->warp[i].gamma = wm->gamma;
+      }
+    }
+
+    // 2 Run independent blocks:
+    ComputeShader *shaders_inter[InterTypes::Inter2x2 + 1];
+    shaders_inter[InterTypes::Warp] = &shaders->inter_warp;
+    const int wi_per_block = td->scale_enable ? 4 : 1;
+    if (td->scale_enable) {
+      shaders_inter[InterTypes::CasualInter] = &shaders->inter_scale;
+      shaders_inter[InterTypes::CompoundAvrg] = &shaders->inter_scale_comp;
+      shaders_inter[InterTypes::CompoundDiff] = &shaders->inter_scale_comp_diff_y;
+      shaders_inter[InterTypes::CompoundMasked] = &shaders->inter_scale_comp_masked;
+      shaders_inter[InterTypes::CompoundDiffUv] = &shaders->inter_scale_comp_diff_uv;
+      shaders_inter[InterTypes::ObmcAbove] = &shaders->inter_scale_obmc_above;
+      shaders_inter[InterTypes::ObmcLeft] = &shaders->inter_scale_obmc_left;
+      shaders_inter[InterTypes::Inter2x2] = &shaders->inter_scale_2x2;
+      shaders_inter[InterTypes::CompoundGlobalWarp] = &shaders->inter_warp_comp;
+      shaders_inter[InterTypes::CompoundDiffUvGlobalWarp] = &shaders->inter_warp_comp;
+    } else {
+      shaders_inter[InterTypes::CasualInter] = &shaders->inter_base;
+      shaders_inter[InterTypes::CompoundAvrg] = &shaders->inter_comp;
+      shaders_inter[InterTypes::CompoundDiff] = &shaders->inter_comp_diff_y;
+      shaders_inter[InterTypes::CompoundMasked] = &shaders->inter_comp_masked;
+      shaders_inter[InterTypes::CompoundDiffUv] = &shaders->inter_comp_diff_uv;
+      shaders_inter[InterTypes::ObmcAbove] = &shaders->inter_obmc_above;
+      shaders_inter[InterTypes::ObmcLeft] = &shaders->inter_obmc_left;
+      shaders_inter[InterTypes::Inter2x2] = &shaders->inter_2x2;
+      shaders_inter[InterTypes::CompoundGlobalWarp] = &shaders->inter_warp_comp;
+      shaders_inter[InterTypes::CompoundDiffUvGlobalWarp] = &shaders->inter_warp_comp;
+    }
+
+    ComputeShader *shader = &shaders->inter_base;
+    command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+    command_list->SetComputeRootShaderResourceView(0, dec->prediction_blocks->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootShaderResourceView(1, dec->idct_residuals->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootShaderResourceView(2, dec->inter_mask_lut->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootShaderResourceView(3, dec->prediction_blocks_warp->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootShaderResourceView(4, dec->inter_warp_filter->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootUnorderedAccessView(5, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootConstantBufferView(6, cbo.dev_address);
+    PSSLInterSRT srt;
+    int *offsets = NULL;
+    shader = NULL;
+    for (int t = InterTypes::Warp; t <= InterTypes::CompoundMasked; ++t) {
+      const int wi_per_subb = t == InterTypes::Warp ? 1 : wi_per_block;
+      offsets = (t == InterTypes::Warp) ? tiles->gen_block_map_wrp
+                                        : (tiles->gen_block_map + (t - 1) * InterTypes::InterSizesAllCommon);
+      const int is_4x2 = td->scale_enable == 0 && t > InterTypes::CasualInter;
+      for (int i = 0; i < InterTypes::InterSizesAllCommon; ++i) {
+        const int offset = offsets[i];
+        const int count = offsets[i + 1] - offset;
+        if (!count) continue;
+        srt.pass_offset = offset;
+        srt.width_log2 = InterBlockWidthLUT[i];
+        srt.height_log2 = InterBlockHeightLUT[i] + is_4x2;
+        srt.wi_count = (wi_per_subb << (srt.width_log2 + srt.height_log2)) * count;
+
+        if (shader != shaders_inter[t]) {
+          shader = shaders_inter[t];
+          command_list->SetPipelineState(shader->pso.Get());
+        }
+        command_list->SetComputeRoot32BitConstants(7, 4, &srt, 0);
+        command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+      }
+    }
+
+    if (enable_gm_warp) {
+      offsets = tiles->gen_block_map + (InterTypes::CompoundGlobalWarp - 1) * InterTypes::InterSizesAllCommon;
+      for (int i = 0; i < InterTypes::InterSizesAllCommon; ++i) {
+        const int offset = offsets[i];
+        const int count = offsets[i + 1] - offset;
+        if (!count) continue;
+        srt.pass_offset = offset;
+        srt.width_log2 = InterBlockWidthLUT[i];
+        srt.height_log2 = InterBlockHeightLUT[i];
+        srt.wi_count = (4 << (srt.width_log2 + srt.height_log2)) * count;
+
+        if (shader != &shaders->inter_warp_comp) {
+          shader = &shaders->inter_warp_comp;
+          command_list->SetPipelineState(shader->pso.Get());
+        }
+        command_list->SetComputeRoot32BitConstants(7, 4, &srt, 0);
+        command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+      }
+    }
+
+    if (tiles->gen_block_map[InterTypes::Inter2x2ArrOffset + 1] > tiles->gen_block_map[InterTypes::Inter2x2ArrOffset]) {
+      shader = shaders_inter[InterTypes::Inter2x2];
+      const int offset = tiles->gen_block_map[InterTypes::Inter2x2ArrOffset];
+      const int count = tiles->gen_block_map[InterTypes::Inter2x2ArrOffset + 1] - offset;
+      srt.pass_offset = offset;
+      srt.width_log2 = 0;
+      srt.height_log2 = 0;
+      srt.wi_count = count << td->scale_enable;
+      command_list->SetPipelineState(shader->pso.Get());
+      command_list->SetComputeRoot32BitConstants(7, 4, &srt, 0);
+      command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+    }
+
+    // sync 0
+    command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+
+    offsets = tiles->gen_block_map + (InterTypes::CompoundDiffUv - 1) * InterTypes::InterSizesAllCommon;
+    for (int i = 0; i < InterTypes::InterSizesAllCommon; ++i) {
+      const int offset = offsets[i];
+      const int count = offsets[i + 1] - offset;
+      if (!count) continue;
+      srt.pass_offset = offset;
+      srt.width_log2 = InterBlockWidthLUT[i];
+      srt.height_log2 = InterBlockHeightLUT[i] + (td->scale_enable == 0);
+      srt.wi_count = (wi_per_block << (srt.width_log2 + srt.height_log2)) * count;
+      if (shader != shaders_inter[InterTypes::CompoundDiffUv]) {
+        shader = shaders_inter[InterTypes::CompoundDiffUv];
+        command_list->SetPipelineState(shader->pso.Get());
+      }
+      command_list->SetComputeRoot32BitConstants(7, 4, &srt, 0);
+      command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+    }
+
+    if (enable_gm_warp) {
+      offsets = tiles->gen_block_map + (InterTypes::CompoundDiffUvGlobalWarp - 1) * InterTypes::InterSizesAllCommon;
+      for (int i = 0; i < InterTypes::InterSizesAllCommon; ++i) {
+        const int offset = offsets[i];              // i == 0 ? offset0 : offsets[i - 1];
+        const int count = offsets[i + 1] - offset;  // offsets[i] - offset;
+        if (!count) continue;
+        srt.pass_offset = offset;
+        srt.width_log2 = InterBlockWidthLUT[i];
+        srt.height_log2 = InterBlockHeightLUT[i];
+        srt.wi_count = (4 << (srt.width_log2 + srt.height_log2)) * count;
+        if (shader != &shaders->inter_warp_comp) {
+          shader = &shaders->inter_warp_comp;
+          command_list->SetPipelineState(shader->pso.Get());
+        }
+        command_list->SetComputeRoot32BitConstants(7, 4, &srt, 0);
+        command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+      }
+    }
+
+    offsets = tiles->gen_block_map + (InterTypes::ObmcAbove - 1) * InterTypes::InterSizesAllCommon;
+    for (int i = 0; i < InterTypes::InterSizesAllCommon; ++i) {
+      const int offset = offsets[i];
+      const int count = offsets[i + 1] - offset;
+      if (!count) continue;
+      srt.pass_offset = offset;  //((w << 2) | h)
+      srt.width_log2 = i >> 2;
+      srt.height_log2 = i & 3;
+      srt.wi_count = (wi_per_block << (srt.width_log2 + srt.height_log2)) * count;
+      if (shader != shaders_inter[InterTypes::ObmcAbove]) {
+        shader = shaders_inter[InterTypes::ObmcAbove];
+        command_list->SetPipelineState(shader->pso.Get());
+      }
+      command_list->SetComputeRoot32BitConstants(7, 4, &srt, 0);
+      command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+    }
+
+    // sync 1
+    command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+
+    // pass 2: Obmc-left
+    offsets = tiles->gen_block_map + (ObmcLeft - 1) * InterTypes::InterSizesAllCommon;
+    for (int i = 0; i < InterTypes::InterSizesAllCommon; ++i) {
+      const int offset = offsets[i];
+      const int count = offsets[i + 1] - offset;
+      if (!count) continue;
+      srt.pass_offset = offset;
+      srt.width_log2 = i & 3;
+      srt.height_log2 = i >> 2;
+      srt.wi_count = (wi_per_block << (srt.width_log2 + srt.height_log2)) * count;
+      if (shader != shaders_inter[InterTypes::ObmcLeft]) {
+        shader = shaders_inter[InterTypes::ObmcLeft];
+        command_list->SetPipelineState(shader->pso.Get());
+      }
+      command_list->SetComputeRoot32BitConstants(7, 4, &srt, 0);
+      command_list->Dispatch((srt.wi_count + 63) >> 6, 1, 1);
+    }
+
+    command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+  }
+
+  const int recon_blocks =
+      tiles->gen_block_map[ReconBlockOffset + ReconstructBlockSizes] - tiles->gen_block_map[ReconBlockOffset];
+  if (td->intra_iters >= 0 || recon_blocks) {
+    ConstantBufferObject cbo = cb->Alloc(sizeof(PSSLIntraData));
+    PSSLIntraData *data = (PSSLIntraData *)cbo.host_ptr;
+    ComputeShader *shader = &shaders->intra_main;
+
+    data->flags[0] = cm->seq_params.enable_intra_edge_filter;
+    memcpy(data->planes, td->frame_buffer->planes, sizeof(data->planes));
+    for (int m = 0; m < 5; ++m)
+      for (int k = 0; k < 8; ++k)
+        for (int i = 0; i < 8; ++i) data->filter[m][k][i] = av1_filter_intra_taps[m][k][i];
+    memcpy(data->mode_params_lut, intra_mode_shader_params, sizeof(intra_mode_shader_params));
+    for (int i = 0; i < 128; ++i) data->sm_weight_arrays[i][0] = sm_weight_arrays[i];
+
+    command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+    command_list->SetComputeRootShaderResourceView(0, dec->prediction_blocks->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootShaderResourceView(1, dec->idct_residuals->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootUnorderedAccessView(3, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+    command_list->SetComputeRootConstantBufferView(4, cbo.dev_address);
+    if (recon_blocks) {
+      ComputeShader *shader = &shaders->reconstruct_block;
+      command_list->SetPipelineState(shader->pso.Get());
+      command_list->SetComputeRootShaderResourceView(2, td->palette_buffer->dev->GetGPUVirtualAddress());
+      PSSLReconSRT recon_srt;
+      recon_srt.fb_base_offset = static_cast<int>(td->frame_buffer->base_offset);
+      for (int i = 0; i < ReconstructBlockSizes; ++i) {
+        int offset = tiles->gen_block_map[ReconBlockOffset + i];
+        int count = tiles->gen_block_map[ReconBlockOffset + i + 1] - offset;
+        if (count) {
+          recon_srt.pass_offset = offset;
+          recon_srt.width_log2 = i % 6;
+          recon_srt.height_log2 = i / 6;
+          recon_srt.wi_count = count << (recon_srt.width_log2 + recon_srt.height_log2);
+
+          command_list->SetComputeRoot32BitConstants(5, 10, &recon_srt, 0);
+          command_list->Dispatch((recon_srt.wi_count + 63) >> 6, 1, 1);
+        }
+      }
+
+      command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+    }
+
+    if (td->intra_iters >= 0) {
+      PSSLIntraSRT srt;
+      command_list->SetComputeRootShaderResourceView(2, dec->inter_mask_lut->dev->GetGPUVirtualAddress());
+      const int *offsets = tiles->gen_block_map + InterTypes::InterCountsAll + ReconstructBlockSizes;
+      ComputeShader *shader = NULL;
+      for (int it = 0; it <= td->intra_iters; ++it) {
+        const int base = it * IntraTypeCount + 1;
+        int offset = offsets[-1 + base];
+        srt.counts0[0] = offsets[base] - offset;
+        srt.counts0[1] = offsets[base + 1] - offsets[base + 0];
+        srt.counts0[2] = offsets[base + 2] - offsets[base + 1];
+        srt.counts0[3] = offsets[base + 3] - offsets[base + 2];
+        srt.counts0[4] = offsets[base + 4] - offsets[base + 3];
+        srt.counts0[5] = offsets[base + 5] - offsets[base + 4];
+        srt.counts0[6] = offsets[base + 6] - offsets[base + 5];
+        srt.counts0[7] = offsets[base + 7] - offsets[base + 6];
+        int counts8 = offsets[base + 8] - offsets[base + 7];
+        srt.pass_offset = offset;
+        srt.wi_count = (srt.counts0[0] << 10) + (srt.counts0[1] << 9) + (srt.counts0[2] << 8) + (srt.counts0[3] << 7) +
+                       (srt.counts0[4] << 6) + (srt.counts0[5] << 5) + (srt.counts0[6] << 4) + (srt.counts0[7] << 3) +
+                       (counts8 << 2);
+        if (srt.wi_count) {
+          if (shader != &shaders->intra_main) {
+            shader = &shaders->intra_main;
+            command_list->SetPipelineState(shader->pso.Get());
+          }
+          command_list->SetComputeRoot32BitConstants(5, 10, &srt, 0);
+          command_list->Dispatch((srt.wi_count + 255) >> 8, 1, 1);
+        }
+
+        offset = offsets[base + IntraSizes - 1];
+        int count = offsets[base + IntraSizes] - offset;
+        if (count) {
+          int filt_srt[2];
+          filt_srt[0] = count << 3;
+          filt_srt[1] = offset;
+          if (shader != &shaders->intra_filter) {
+            shader = &shaders->intra_filter;
+            command_list->SetPipelineState(shader->pso.Get());
+          }
+          command_list->SetComputeRoot32BitConstants(5, 2, filt_srt, 0);
+          command_list->Dispatch(count, 1, 1);
+        }
+
+        if (count || srt.wi_count) {
+          command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+        } else {
+          assert(0);
+        }
+      }
+    }
+  }
+  PutPerfMarker(td, &td->perf_markers[3]);
+}
+
+void av1_inter_ext_borders(Av1Core *dec, AV1_COMMON *cm) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+  ComputeCommandBuffer *cb = &td->command_buffer;
+
+  typedef struct {
+    int planes[3][4];
+    int dims[2][4];
+  } CBData;
+
+  ConstantBufferObject cbo = cb->Alloc(sizeof(CBData));
+  CBData *cb_data = static_cast<CBData *>(cbo.host_ptr);
+  memcpy(cb_data->planes, td->dst_frame_buffer->planes, sizeof(cb_data->planes));
+  YV12_BUFFER_CONFIG *buf = &cm->cur_frame->buf;
+
+  cb_data->dims[0][0] = buf->y_crop_width;
+  cb_data->dims[0][1] = buf->y_crop_height;
+  cb_data->dims[1][0] = buf->uv_crop_width;
+  cb_data->dims[1][1] = buf->uv_crop_height;
+
+  ComputeShader *shader = &td->shaders->inter_ext_borders;
+  const int wi_count = buf->y_crop_height + buf->uv_crop_height * 2;
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetPipelineState(shader->pso.Get());
+  command_list->SetComputeRootUnorderedAccessView(0, dec->frame_buffer_pool->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(1, cbo.dev_address);
+  command_list->Dispatch((wi_count + 63) / 64, 1, 1);
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->frame_buffer_pool->dev));
+}

diff --git a/libav1/dx/shaders/cdef_filter.hlsl b/libav1/dx/shaders/cdef_filter.hlsl
new file mode 100644
index 0000000..610e8f6
--- /dev/null
+++ b/libav1/dx/shaders/cdef_filter.hlsl

@@ -0,0 +1,592 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma warning(disable : 4714)
+ByteAddressBuffer cdef_index : register(t0);
+ByteAddressBuffer skips : register(t1);
+RWByteAddressBuffer dst : register(u0);
+
+struct CDefData {
+  int4 pl;
+
+  int uv_stride;
+  int dst_offset_y;
+  int dst_offset_u;
+  int dst_offset_v;
+
+  int uv_offset_u;
+  int uv_offset_v;
+  int index_stride;
+  int skips_stride;
+
+  int pri_damping;
+  int sec_damping;
+  int pli;
+  int hbd;
+
+  int bit_depth;
+  int3 _dummie;
+
+  int4 cdef_directions[16][2];
+  int4 cdef_strength[16];
+  int4 cdef_uv_strength[16];
+};
+
+cbuffer cb_cdef_data : register(b0) { CDefData data; }
+
+#define CDEF_BLOCK_WIDTH 32
+#define CDEF_BLOCK_HEIGHT 32
+#define CDEF_UV_BLOCK_WIDTH 16
+#define CDEF_UV_BLOCK_HEIGHT 16
+
+#define CDEF_WIDTH 64
+#define CDEF_HEIGHT 64
+
+#define CDEF_VERY_LARGE (30000)
+
+#define WG_WIDTH 4
+#define WG_HEIGHT 4
+
+groupshared int input[CDEF_BLOCK_HEIGHT + 18][CDEF_BLOCK_WIDTH + 8];
+groupshared int output[CDEF_BLOCK_HEIGHT][CDEF_BLOCK_WIDTH];
+groupshared int costs[8][WG_HEIGHT][WG_WIDTH];
+groupshared int2 temp[WG_HEIGHT][WG_WIDTH];
+
+// TODO: reorganize load (optimization)
+void load_input(int4 plane, int gx, int gy, int llx, int lly, int llz) {
+  uint id = ((llz * 4 + lly) * 4 + llx);
+  int lx = id % (CDEF_BLOCK_WIDTH / 4 + 2);
+  int ly = id / (CDEF_BLOCK_WIDTH / 4 + 2);
+
+  plane.z >>= 2;
+  for (int y = gy - 3 + ly; y < gy - 3 + CDEF_BLOCK_HEIGHT + 6; y += WG_HEIGHT) {
+    const int gx4 = gx >> 2;
+    for (int x = gx4 - 1 + lx; x < gx4 - 1 + CDEF_BLOCK_WIDTH / 4 + 2; x += WG_WIDTH) {
+      int is_clamp = (x < 0) || (y < 0) || (x >= plane.z) || (y >= plane.w);
+      int4 in_4 = {CDEF_VERY_LARGE, CDEF_VERY_LARGE, CDEF_VERY_LARGE, CDEF_VERY_LARGE};
+      if (!is_clamp) {
+        uint input_char = dst.Load(plane.y + y * plane.x + x * 4);
+        in_4.x = (input_char >> 0) & 255;
+        in_4.y = (input_char >> 8) & 255;
+        in_4.z = (input_char >> 16) & 255;
+        in_4.w = (input_char >> 24) & 255;
+      }
+      input[y - (gy - 3)][(x - (gx4 - 1)) * 4 + 0] = in_4.x;
+      input[y - (gy - 3)][(x - (gx4 - 1)) * 4 + 1] = in_4.y;
+      input[y - (gy - 3)][(x - (gx4 - 1)) * 4 + 2] = in_4.z;
+      input[y - (gy - 3)][(x - (gx4 - 1)) * 4 + 3] = in_4.w;
+    }
+  }
+}
+void load_input_hbd(int4 plane, int gx, int gy, int llx, int lly, int llz) {
+  uint id = ((llz * 4 + lly) * 4 + llx);
+  int lx = id % (CDEF_BLOCK_WIDTH / 4 + 2);
+  int ly = id / (CDEF_BLOCK_WIDTH / 4 + 2);
+
+  plane.z >>= 2;
+  for (int y = gy - 3 + ly; y < gy - 3 + CDEF_BLOCK_HEIGHT + 6; y += WG_HEIGHT) {
+    const int gx4 = gx >> 2;
+    for (int x = gx4 - 1 + lx; x < gx4 - 1 + CDEF_BLOCK_WIDTH / 4 + 2; x += WG_WIDTH) {
+      int is_clamp = (x < 0) || (y < 0) || (x >= plane.z) || (y >= plane.w);
+      int4 in_4 = {CDEF_VERY_LARGE, CDEF_VERY_LARGE, CDEF_VERY_LARGE, CDEF_VERY_LARGE};
+      if (!is_clamp) {
+        uint2 input_char = dst.Load2(plane.y + y * plane.x + (x << 3));
+        in_4.x = (input_char.x >> 0) & 0x03ff;
+        in_4.y = (input_char.x >> 16) & 0x03ff;
+        in_4.z = (input_char.y >> 0) & 0x03ff;
+        in_4.w = (input_char.y >> 16) & 0x03ff;
+      }
+      input[y - (gy - 3)][(x - (gx4 - 1)) * 4 + 0] = in_4.x;
+      input[y - (gy - 3)][(x - (gx4 - 1)) * 4 + 1] = in_4.y;
+      input[y - (gy - 3)][(x - (gx4 - 1)) * 4 + 2] = in_4.z;
+      input[y - (gy - 3)][(x - (gx4 - 1)) * 4 + 3] = in_4.w;
+    }
+  }
+}
+
+// TODO: reorganize load (optimization)
+void load_uv_input(int4 plane, int gx, int gy, int llx, int lly, int llz, int pid) {
+  uint id = ((llz * 4 + lly) * 4 + llx);
+  int lx = id % (CDEF_UV_BLOCK_WIDTH / 4 + 2);
+  int ly = id / (CDEF_UV_BLOCK_WIDTH / 4 + 2);
+  plane.z >>= 2;
+  for (int y = gy - 3 + ly; y < gy - 3 + CDEF_UV_BLOCK_HEIGHT + 6; y += WG_HEIGHT) {
+    const int gx4 = gx >> 2;
+    for (int x = gx4 - 1 + lx; x < gx4 - 1 + CDEF_UV_BLOCK_WIDTH / 4 + 2; x += WG_WIDTH) {
+      int is_clamp = (x < 0) || (y < 0) || (x >= plane.z) || (y >= plane.w);
+      int4 in_4 = {CDEF_VERY_LARGE, CDEF_VERY_LARGE, CDEF_VERY_LARGE, CDEF_VERY_LARGE};
+      if (!is_clamp) {
+        uint input_char = dst.Load(plane.y + y * plane.x + x * 4);
+        in_4.x = (input_char >> 0) & 255;
+        in_4.y = (input_char >> 8) & 255;
+        in_4.z = (input_char >> 16) & 255;
+        in_4.w = (input_char >> 24) & 255;
+      }
+      input[y - (gy - 3) + (pid - 1) * 24][(x - (gx4 - 1)) * 4 + 0] = in_4.x;
+      input[y - (gy - 3) + (pid - 1) * 24][(x - (gx4 - 1)) * 4 + 1] = in_4.y;
+      input[y - (gy - 3) + (pid - 1) * 24][(x - (gx4 - 1)) * 4 + 2] = in_4.z;
+      input[y - (gy - 3) + (pid - 1) * 24][(x - (gx4 - 1)) * 4 + 3] = in_4.w;
+    }
+  }
+}
+void load_uv_input_hbd(int4 plane, int gx, int gy, int llx, int lly, int llz, int pid) {
+  uint id = ((llz * 4 + lly) * 4 + llx);
+  int lx = id % (CDEF_UV_BLOCK_WIDTH / 4 + 2);
+  int ly = id / (CDEF_UV_BLOCK_WIDTH / 4 + 2);
+  plane.z >>= 2;
+  for (int y = gy - 3 + ly; y < gy - 3 + CDEF_UV_BLOCK_HEIGHT + 6; y += WG_HEIGHT) {
+    const int gx4 = gx >> 2;
+    for (int x = gx4 - 1 + lx; x < gx4 - 1 + CDEF_UV_BLOCK_WIDTH / 4 + 2; x += WG_WIDTH) {
+      int is_clamp = (x < 0) || (y < 0) || (x >= plane.z) || (y >= plane.w);
+      int4 in_4 = {CDEF_VERY_LARGE, CDEF_VERY_LARGE, CDEF_VERY_LARGE, CDEF_VERY_LARGE};
+      if (!is_clamp) {
+        uint2 input_char = dst.Load2(plane.y + y * plane.x + (x << 3));
+        in_4.x = (input_char.x >> 0) & 0x03ff;
+        in_4.y = (input_char.x >> 16) & 0x03ff;
+        in_4.z = (input_char.y >> 0) & 0x03ff;
+        in_4.w = (input_char.y >> 16) & 0x03ff;
+      }
+      input[y - (gy - 3) + (pid - 1) * 24][(x - (gx4 - 1)) * 4 + 0] = in_4.x;
+      input[y - (gy - 3) + (pid - 1) * 24][(x - (gx4 - 1)) * 4 + 1] = in_4.y;
+      input[y - (gy - 3) + (pid - 1) * 24][(x - (gx4 - 1)) * 4 + 2] = in_4.z;
+      input[y - (gy - 3) + (pid - 1) * 24][(x - (gx4 - 1)) * 4 + 3] = in_4.w;
+    }
+  }
+}
+
+int get_loaded_source_sample(int x, int y) { return input[y + 3][x + 4]; }
+
+int2 get_block_dir(int x0, int y0, int lx, int ly, int lz, int coeff_shift) {
+  int i;
+  int z0 = lz;
+  int cost[8] = {0, 0, 0, 0, 0, 0, 0, 0};
+
+  const int div_table[] = {0, 840, 420, 280, 210, 168, 140, 120, 105};
+  const int div_table_idx[8][8] = {{1, 2, 3, 4, 5, 6, 7, 8}, {2, 4, 6, 8, 8, 8, 8, 8}, {8, 8, 8, 8, 8, 8, 8, 8},
+                                   {2, 4, 6, 8, 8, 8, 8, 8}, {1, 2, 3, 4, 5, 6, 7, 8}, {2, 4, 6, 8, 8, 8, 8, 8},
+                                   {8, 8, 8, 8, 8, 8, 8, 8}, {2, 4, 6, 8, 8, 8, 8, 8}};
+  const int4 prt_idx[8] = {{1, 1, 0, 0},  {1, 1, 0, 1},  {1, 0, 0, 0}, {1, -1, 0, 1},
+                           {1, -1, 0, 0}, {-1, 1, 1, 0}, {0, 1, 0, 0}, {1, 1, 1, 0}};
+  const int prt_idx_shift[8] = {0, 0, 0, 3, 7, 3, 0, 0};
+
+  const int cost_prt_idx[8][8] = {{14, 13, 12, 11, 10, 9, 8, -1},
+                                  {10, 9, 8, -1, -1, -1, -1, -1},
+                                  {
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                  },
+                                  {10, 9, 8, -1, -1, -1, -1, -1},
+                                  {14, 13, 12, 11, 10, 9, 8, -1},
+                                  {10, 9, 8, -1, -1, -1, -1, -1},
+                                  {
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                      -1,
+                                  },
+                                  {10, 9, 8, -1, -1, -1, -1, -1}};
+  int partial[15] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+  int best_cost = 0;
+  int best_cost_2 = 0;
+  int2 best_dir = {0, 0};
+  for (i = 0; i < 8; i++) {
+    int j;
+    for (j = 0; j < 8; j++) {
+      int x;
+      x = (get_loaded_source_sample(x0 + j, y0 + i) >> coeff_shift) - 128;
+      int4 idx = prt_idx[z0];
+      partial[idx.x * (i >> idx.z) + idx.y * (j >> idx.w) + prt_idx_shift[z0]] += x;
+    }
+  }
+  // for (i = 0; i < 8; i++)
+  { costs[lz][ly][lx] = 0; }
+  for (i = 0; i < 8; i++) {
+    int pt1 = partial[i];
+    int pt2 = cost_prt_idx[z0][i] >= 0 ? partial[cost_prt_idx[z0][i]] : 0;
+    costs[lz][ly][lx] += (pt1 * pt1 + pt2 * pt2) * div_table[div_table_idx[z0][i]];
+  }
+  GroupMemoryBarrierWithGroupSync();
+  for (i = 0; i < 8; i++) {
+    cost[i] = costs[i][ly][lx];
+  }
+  for (i = 0; i < 8; i++) {
+    if (cost[i] > best_cost) {
+      best_cost = cost[i];
+      best_cost_2 = cost[(i + 4) & 7];
+      best_dir.x = i;
+    }
+  }
+  best_dir.y = best_cost - best_cost_2;
+  best_dir.y >>= 10;
+  return best_dir;
+}
+
+int2 get_block_dir_old(int x0, int y0, int coeff_shift) {
+  int i;
+  int j;
+  int cost[8] = {0, 0, 0, 0, 0, 0, 0, 0};
+  int partial[8][15] = {{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
+                        {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
+                        {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
+                        {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};
+  const int div_table[] = {0, 840, 420, 280, 210, 168, 140, 120, 105};
+
+  int best_cost = 0;
+  int best_cost_2 = 0;
+  int2 best_dir = {0, 0};
+  for (i = 0; i < 8; i++) {
+    for (j = 0; j < 8; j++) {
+      int x;
+      x = (get_loaded_source_sample(x0 + j, y0 + i) >> coeff_shift) - 128;
+      partial[0][i + j] += x;
+      partial[1][i + (j >> 1)] += x;
+      partial[2][i] += x;
+      partial[3][3 + i - (j >> 1)] += x;
+      partial[4][7 + i - j] += x;
+      partial[5][3 - (i >> 1) + j] += x;
+      partial[6][j] += x;
+      partial[7][(i >> 1) + j] += x;
+    }
+  }
+  for (i = 0; i < 8; i++) {
+    cost[2] += partial[2][i] * partial[2][i];
+    cost[6] += partial[6][i] * partial[6][i];
+  }
+  cost[2] *= div_table[8];
+  cost[6] *= div_table[8];
+  for (i = 0; i < 7; i++) {
+    cost[0] += (partial[0][i] * partial[0][i] + partial[0][14 - i] * partial[0][14 - i]) * div_table[i + 1];
+    cost[4] += (partial[4][i] * partial[4][i] + partial[4][14 - i] * partial[4][14 - i]) * div_table[i + 1];
+  }
+  cost[0] += partial[0][7] * partial[0][7] * div_table[8];
+  cost[4] += partial[4][7] * partial[4][7] * div_table[8];
+  for (i = 1; i < 8; i += 2) {
+    for (j = 0; j < 4 + 1; j++) {
+      cost[i] += partial[i][3 + j] * partial[i][3 + j];
+    }
+    cost[i] *= div_table[8];
+    for (j = 0; j < 4 - 1; j++) {
+      cost[i] += (partial[i][j] * partial[i][j] + partial[i][10 - j] * partial[i][10 - j]) * div_table[2 * j + 2];
+    }
+  }
+  for (i = 0; i < 8; i++) {
+    if (cost[i] > best_cost) {
+      best_cost = cost[i];
+      best_cost_2 = cost[(i + 4) & 7];
+      best_dir.x = i;
+    }
+  }
+  best_dir.y = best_cost - best_cost_2;
+  best_dir.y >>= 10;
+  return best_dir;
+}
+
+#define MAX_SB_SIZE_LOG2 7
+#define ALIGN_POWER_OF_TWO(value, n) (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))
+#define CDEF_VBORDER (3)
+#define CDEF_HBORDER (8)
+#define CDEF_BSTRIDE ALIGN_POWER_OF_TWO((1 << MAX_SB_SIZE_LOG2) + 2 * CDEF_HBORDER, 3)
+
+int get_msb(unsigned int n) {
+  int log = 0;
+  unsigned int value = n;
+  int i;
+
+  for (i = 4; i >= 0; --i) {
+    const int shift = (1 << i);
+    const unsigned int x = value >> shift;
+    if (x != 0) {
+      value = x;
+      log += shift;
+    }
+  }
+  return log;
+}
+
+int constrain(int diff, int threshold, int shift) {
+  return sign(diff) * min(abs(diff), max(0, threshold - (abs(diff) >> shift)));
+}
+
+int adjust_strength(int strength, int var) {
+  const int i = var >> 6 ? min(get_msb(var >> 6), 12) : 0;
+  return var ? (strength * (4 + i) + 8) >> 4 : 0;
+}
+
+[numthreads(WG_WIDTH, WG_HEIGHT, 8)] void main(uint3 Gid
+                                               : SV_GroupID, uint3 DTid
+                                               : SV_DispatchThreadID, uint3 GTid
+                                               : SV_GroupThreadID, uint GI
+                                               : SV_GroupIndex) {
+  int gx = Gid.x * CDEF_BLOCK_WIDTH;
+  int gy = Gid.y * CDEF_BLOCK_HEIGHT;
+  int guvx = Gid.x * CDEF_UV_BLOCK_WIDTH;
+  int guvy = Gid.y * CDEF_UV_BLOCK_HEIGHT;
+  int lx = GTid.x;
+  int ly = GTid.y;
+  int lz = GTid.z;
+  int2 dir_var;
+  int bit_depth = data.bit_depth;
+  int coeff_shift = bit_depth - 8;
+
+  int skip = skips.Load(((Gid.y * WG_HEIGHT + ly) * data.skips_stride + Gid.x * WG_WIDTH + lx) * 4);
+  if (data.hbd) {
+    load_input_hbd(data.pl, gx, gy, lx, ly, lz);
+  } else {
+    load_input(data.pl, gx, gy, lx, ly, lz);
+  }
+  GroupMemoryBarrierWithGroupSync();
+
+  if (lz == 0 && !skip) {
+    temp[ly][lx] = get_block_dir_old(lx * 8, ly * 8, coeff_shift);
+  }
+  int index = cdef_index.Load(((Gid.y >> 1) * data.index_stride + (Gid.x >> 1)) * 4);
+  uint strength = 0;
+  if (index >= 0 && index < 16) {
+    strength = data.cdef_strength[index].x;
+  }
+  int t = skip ? 0 : strength / 4;
+  int s = skip ? 0 : strength % 4;
+  s += s == 3;
+
+  GroupMemoryBarrierWithGroupSync();
+  dir_var = temp[ly][lx];
+
+  const int damping = data.pri_damping + coeff_shift;
+  if (!skip) {
+    const int pri_taps[2][2] = {{4, 2}, {3, 3}};
+    const int sec_taps[2][2] = {{2, 1}, {2, 1}};
+
+    const int x0 = lx * 8;
+    const int y0 = ly * 8;
+    const int z0 = lz;
+    const int pri_strength = adjust_strength(t << coeff_shift, dir_var.y);
+    const int sec_strength = s << coeff_shift;
+    const int dir = t ? dir_var.x : 0;
+    int i, j, k;
+    const int p_t = (pri_strength >> coeff_shift) & 1;
+    const int s_t = (pri_strength >> coeff_shift) & 1;
+    const int pri_msb = get_msb(pri_strength);
+    const int sec_msb = get_msb(sec_strength);
+    const int pri_shift = max(0, damping - pri_msb);
+    const int sec_shift = max(0, damping - sec_msb);
+
+    // for (i = 0; i < 8; i++) {
+    {
+      i = z0;
+      for (j = 0; j < 8; j++) {
+        int sum = 0;
+        int y;
+        int x = get_loaded_source_sample(x0 + j, y0 + i);
+        int mmax = x;
+        int mmin = x;
+        for (k = 0; k < 2; k++) {
+          if (pri_strength) {
+            int2 dir_pri = data.cdef_directions[dir][k].xy;
+            int p0 = get_loaded_source_sample(x0 + j + dir_pri.x, y0 + i + dir_pri.y);
+            int p1 = get_loaded_source_sample(x0 + j - dir_pri.x, y0 + i - dir_pri.y);
+            sum += pri_taps[p_t][k] * constrain(p0 - x, pri_strength, pri_shift);
+            sum += pri_taps[p_t][k] * constrain(p1 - x, pri_strength, pri_shift);
+
+            // mmax = max(p0 & 0xff, mmax);
+            // mmax = max(p1 & 0xff, mmax);
+            // NOTE!: (CDEF_VERY_LARGE & 0xff) = 48, can we adjust CDEF_VERY_LARGE???
+            if (p0 != CDEF_VERY_LARGE) mmax = max(p0, mmax);
+            if (p1 != CDEF_VERY_LARGE) mmax = max(p1, mmax);
+            mmin = min(p0, mmin);
+            mmin = min(p1, mmin);
+          }
+          if (sec_strength) {
+            int2 dir1_sec = data.cdef_directions[dir + 2][k].xy;
+            int2 dir2_sec = data.cdef_directions[dir + 6][k].xy;
+            int s0 = get_loaded_source_sample(x0 + j + dir1_sec.x, y0 + i + dir1_sec.y);
+            int s1 = get_loaded_source_sample(x0 + j - dir1_sec.x, y0 + i - dir1_sec.y);
+            int s2 = get_loaded_source_sample(x0 + j + dir2_sec.x, y0 + i + dir2_sec.y);
+            int s3 = get_loaded_source_sample(x0 + j - dir2_sec.x, y0 + i - dir2_sec.y);
+            // mmax = max(s0 & 0xff, mmax);
+            // mmax = max(s1 & 0xff, mmax);
+            // mmax = max(s2 & 0xff, mmax);
+            // mmax = max(s3 & 0xff, mmax);
+            // NOTE!: (CDEF_VERY_LARGE & 0xff) = 48, can we adjust CDEF_VERY_LARGE???
+            if (s0 != CDEF_VERY_LARGE) mmax = max(s0, mmax);
+            if (s1 != CDEF_VERY_LARGE) mmax = max(s1, mmax);
+            if (s2 != CDEF_VERY_LARGE) mmax = max(s2, mmax);
+            if (s3 != CDEF_VERY_LARGE) mmax = max(s3, mmax);
+            mmin = min(s0, mmin);
+            mmin = min(s1, mmin);
+            mmin = min(s2, mmin);
+            mmin = min(s3, mmin);
+            sum += sec_taps[s_t][k] * constrain(s0 - x, sec_strength, sec_shift);
+            sum += sec_taps[s_t][k] * constrain(s1 - x, sec_strength, sec_shift);
+            sum += sec_taps[s_t][k] * constrain(s2 - x, sec_strength, sec_shift);
+            sum += sec_taps[s_t][k] * constrain(s3 - x, sec_strength, sec_shift);
+          }
+        }
+        y = clamp(x + ((8 + sum - (sum < 0)) >> 4), mmin, mmax);
+
+        output[y0 + i][x0 + j] = y;
+      }
+    }
+
+  } else {
+    for (int j = 0; j < 8; j++) output[ly * 8 + lz][lx * 8 + j] = get_loaded_source_sample(lx * 8 + j, ly * 8 + lz);
+  }
+  GroupMemoryBarrierWithGroupSync();
+  int dy = lz;
+  {
+    for (int dx = 0; dx < 2; dx++) {
+      int4 out_sample;
+      out_sample.x = output[ly * 8 + dy][lx * 8 + dx * 4 + 0];
+      out_sample.y = output[ly * 8 + dy][lx * 8 + dx * 4 + 1];
+      out_sample.z = output[ly * 8 + dy][lx * 8 + dx * 4 + 2];
+      out_sample.w = output[ly * 8 + dy][lx * 8 + dx * 4 + 3];
+
+      if (data.hbd) {
+        dst.Store2(data.dst_offset_y + (gy + ly * 8 + dy) * data.pl.x + ((gx + lx * 8 + dx * 4) << 1),
+                   uint2((out_sample.x << 0) | (out_sample.y << 16), (out_sample.z << 0) | (out_sample.w << 16)));
+      } else {
+        dst.Store(data.dst_offset_y + (gy + ly * 8 + dy) * data.pl.x + gx + lx * 8 + dx * 4,
+                  (out_sample.x << 0) | (out_sample.y << 8) | (out_sample.z << 16) | (out_sample.w << 24));
+      }
+    }
+  }
+  // Chroma processing
+  int pid = 1 + (lz >> 2);
+  int llz = lz & 3;
+  int4 plane = data.pl;
+  plane.x = data.uv_stride;
+  plane.y = pid == 1 ? data.uv_offset_u : data.uv_offset_v;
+  plane.z >>= 1;
+  plane.w >>= 1;
+  GroupMemoryBarrierWithGroupSync();
+  if (data.hbd) {
+    load_uv_input_hbd(plane, guvx, guvy, lx, ly, llz, pid);
+  } else {
+    load_uv_input(plane, guvx, guvy, lx, ly, llz, pid);
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (index >= 0 && index < 16) {
+    strength = data.cdef_uv_strength[index].x;
+  }
+  t = skip ? 0 : strength / 4;
+  s = skip ? 0 : strength % 4;
+  s += s == 3;
+
+  if (!skip) {
+    const int pri_taps[2][2] = {{4, 2}, {3, 3}};
+    const int sec_taps[2][2] = {{2, 1}, {2, 1}};
+
+    const int x0 = lx * 4;
+    const int y0 = ly * 4;
+    const int z0 = llz;
+    const int pri_strength = t << coeff_shift;
+    const int sec_strength = s << coeff_shift;
+    const int dir = pri_strength ? dir_var.x : 0;
+    int i, j, k;
+    const int p_t = (pri_strength >> coeff_shift) & 1;
+    const int s_t = (pri_strength >> coeff_shift) & 1;
+    const int pri_msb = get_msb(pri_strength);
+    const int sec_msb = get_msb(sec_strength);
+    const int pri_shift = max(0, damping - 1 - pri_msb);
+    const int sec_shift = max(0, damping - 1 - sec_msb);
+
+    // for (i = 0; i < 8; i++) {
+    {
+      i = z0;
+      for (j = 0; j < 4; j++) {
+        int sum = 0;
+        int y;
+        int x = get_loaded_source_sample(x0 + j, y0 + i + (pid - 1) * 24);
+        int mmax = x;
+        int mmin = x;
+        for (k = 0; k < 2; k++) {
+          if (pri_strength) {
+            int2 dir_pri = data.cdef_directions[dir][k].xy;
+            int p0 = get_loaded_source_sample(x0 + j + dir_pri.x, y0 + i + dir_pri.y + (pid - 1) * 24);
+            int p1 = get_loaded_source_sample(x0 + j - dir_pri.x, y0 + i - dir_pri.y + (pid - 1) * 24);
+            sum += pri_taps[p_t][k] * constrain(p0 - x, pri_strength, pri_shift);
+            sum += pri_taps[p_t][k] * constrain(p1 - x, pri_strength, pri_shift);
+            // mmax = max(p0 & 0xff, mmax);
+            // mmax = max(p1 & 0xff, mmax);
+            // NOTE!: (CDEF_VERY_LARGE & 0xff) = 48, can we adjust CDEF_VERY_LARGE???
+            if (p0 != CDEF_VERY_LARGE) mmax = max(p0, mmax);
+            if (p1 != CDEF_VERY_LARGE) mmax = max(p1, mmax);
+            mmin = min(p0, mmin);
+            mmin = min(p1, mmin);
+          }
+          if (sec_strength) {
+            int2 dir1_sec = data.cdef_directions[dir + 2][k].xy;
+            int2 dir2_sec = data.cdef_directions[dir + 6][k].xy;
+            int s0 = get_loaded_source_sample(x0 + j + dir1_sec.x, y0 + i + dir1_sec.y + (pid - 1) * 24);
+            int s1 = get_loaded_source_sample(x0 + j - dir1_sec.x, y0 + i - dir1_sec.y + (pid - 1) * 24);
+            int s2 = get_loaded_source_sample(x0 + j + dir2_sec.x, y0 + i + dir2_sec.y + (pid - 1) * 24);
+            int s3 = get_loaded_source_sample(x0 + j - dir2_sec.x, y0 + i - dir2_sec.y + (pid - 1) * 24);
+            // mmax = max(s0 & 0xff, mmax);
+            // mmax = max(s1 & 0xff, mmax);
+            // mmax = max(s2 & 0xff, mmax);
+            // mmax = max(s3 & 0xff, mmax);
+            // NOTE!: (CDEF_VERY_LARGE & 0xff) = 48, can we adjust CDEF_VERY_LARGE???
+            if (s0 != CDEF_VERY_LARGE) mmax = max(s0, mmax);
+            if (s1 != CDEF_VERY_LARGE) mmax = max(s1, mmax);
+            if (s2 != CDEF_VERY_LARGE) mmax = max(s2, mmax);
+            if (s3 != CDEF_VERY_LARGE) mmax = max(s3, mmax);
+            mmin = min(s0, mmin);
+            mmin = min(s1, mmin);
+            mmin = min(s2, mmin);
+            mmin = min(s3, mmin);
+            sum += sec_taps[s_t][k] * constrain(s0 - x, sec_strength, sec_shift);
+            sum += sec_taps[s_t][k] * constrain(s1 - x, sec_strength, sec_shift);
+            sum += sec_taps[s_t][k] * constrain(s2 - x, sec_strength, sec_shift);
+            sum += sec_taps[s_t][k] * constrain(s3 - x, sec_strength, sec_shift);
+          }
+        }
+        y = clamp(x + ((8 + sum - (sum < 0)) >> 4), mmin, mmax);
+        output[y0 + i + (pid - 1) * 16][x0 + j] = y;
+      }
+    }
+
+  } else {
+    for (int j = 0; j < 4; j++)
+      output[ly * 4 + llz + (pid - 1) * 16][lx * 4 + j] =
+          get_loaded_source_sample(lx * 4 + j, ly * 4 + llz + (pid - 1) * 24);
+  }
+  GroupMemoryBarrierWithGroupSync();
+  dy = llz;
+  {
+    int4 out_sample;
+    out_sample.x = output[ly * 4 + dy + (pid - 1) * 16][lx * 4 + 0];
+    out_sample.y = output[ly * 4 + dy + (pid - 1) * 16][lx * 4 + 1];
+    out_sample.z = output[ly * 4 + dy + (pid - 1) * 16][lx * 4 + 2];
+    out_sample.w = output[ly * 4 + dy + (pid - 1) * 16][lx * 4 + 3];
+    const int dst_offset_uv = pid == 1 ? data.dst_offset_u : data.dst_offset_v;
+    if (data.hbd) {
+      dst.Store2(dst_offset_uv + (guvy + ly * 4 + dy) * data.uv_stride + ((guvx + lx * 4) << 1),
+                 uint2((out_sample.x << 0) | (out_sample.y << 16), (out_sample.z << 0) | (out_sample.w << 16)));
+    } else {
+      dst.Store(dst_offset_uv + (guvy + ly * 4 + dy) * data.uv_stride + guvx + lx * 4,
+                (out_sample.x << 0) | (out_sample.y << 8) | (out_sample.z << 16) | (out_sample.w << 24));
+    }
+  }
+}

diff --git a/libav1/dx/shaders/copy_plane.hlsl b/libav1/dx/shaders/copy_plane.hlsl
new file mode 100644
index 0000000..79a2e27
--- /dev/null
+++ b/libav1/dx/shaders/copy_plane.hlsl

@@ -0,0 +1,37 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+RWByteAddressBuffer pool : register(u0);
+
+cbuffer cb_sort_data : register(b0) {
+  uint cb_wi_count;
+  uint cb_src_offset;
+  uint cb_src_stride;
+  uint cb_dst_width;
+  uint cb_dst_offset;
+  uint cb_dst_stride;
+};
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int x = thread.x % cb_dst_width;
+  const int y = thread.x / cb_dst_width;
+
+  uint4 pixels = pool.Load4(cb_src_offset + x * 16 + y * cb_src_stride);
+  pool.Store4(cb_dst_offset + x * 16 + y * cb_dst_stride, pixels);
+}

diff --git a/libav1/dx/shaders/copy_plane_10bit10x3.hlsl b/libav1/dx/shaders/copy_plane_10bit10x3.hlsl
new file mode 100644
index 0000000..57b1074
--- /dev/null
+++ b/libav1/dx/shaders/copy_plane_10bit10x3.hlsl

@@ -0,0 +1,57 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+RWByteAddressBuffer pool : register(u0);
+
+cbuffer cb_sort_data : register(b0) {
+  uint cb_wi_count;
+  uint cb_src_offset;
+  uint cb_src_stride;
+  uint cb_dst_width;
+  uint cb_dst_offset;
+  uint cb_dst_stride;
+};
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int x = thread.x % cb_dst_width;
+  const int y = thread.x / cb_dst_width;
+
+  int src_stride = cb_src_stride;
+  int dst_stride = cb_dst_stride;
+  int src_offset = cb_src_offset;
+  int dst_offset = cb_dst_offset;
+  uint base_src = y * src_stride + x * 48 + src_offset;
+  uint base_trg = y * dst_stride + x * 32 + dst_offset;
+  uint4 srcval1 = pool.Load4(base_src);
+  uint4 srcval2 = pool.Load4(base_src + 16);
+  uint4 srcval3 = pool.Load4(base_src + 32);
+  uint4 trgval;
+
+  trgval.x = (srcval1.x & 0x03FF) | ((srcval1.x & 0x03FF0000) >> 6) | ((srcval1.y & 0x03FF) << 20);
+  trgval.y = ((srcval1.y & 0x03FF0000) >> 16) | ((srcval1.z & 0x03FF) << 10) | ((srcval1.z & 0x03FF0000) << 4);
+  trgval.z = (srcval1.w & 0x03FF) | ((srcval1.w & 0x03FF0000) >> 6) | ((srcval2.x & 0x03FF) << 20);
+  trgval.w = ((srcval2.x & 0x03FF0000) >> 16) | ((srcval2.y & 0x03FF) << 10) | ((srcval2.y & 0x03FF0000) << 4);
+  pool.Store4(base_trg, trgval);
+
+  trgval.x = (srcval2.z & 0x03FF) | ((srcval2.z & 0x03FF0000) >> 6) | ((srcval2.w & 0x03FF) << 20);
+  trgval.y = ((srcval2.w & 0x03FF0000) >> 16) | ((srcval3.x & 0x03FF) << 10) | ((srcval3.x & 0x03FF0000) << 4);
+  trgval.z = (srcval3.y & 0x03FF) | ((srcval3.y & 0x03FF0000) >> 6) | ((srcval3.z & 0x03FF) << 20);
+  trgval.w = ((srcval3.z & 0x03FF0000) >> 16) | ((srcval3.w & 0x03FF) << 10) | ((srcval3.w & 0x03FF0000) << 4);
+  pool.Store4(base_trg + 16, trgval);
+}

diff --git a/libav1/dx/shaders/fill_buffer.hlsl b/libav1/dx/shaders/fill_buffer.hlsl
new file mode 100644
index 0000000..17a4b66
--- /dev/null
+++ b/libav1/dx/shaders/fill_buffer.hlsl

@@ -0,0 +1,30 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+RWByteAddressBuffer dst : register(u0);
+
+cbuffer cb_sort_data : register(b0) {
+  uint cb_wi_count;
+  uint cb_value;
+};
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+  uint4 p;
+  p.x = p.y = p.z = p.w = cb_value;
+  dst.Store4(thread.x * 16, p);
+}

diff --git a/libav1/dx/shaders/film_grain_const.h b/libav1/dx/shaders/film_grain_const.h
new file mode 100644
index 0000000..654a010
--- /dev/null
+++ b/libav1/dx/shaders/film_grain_const.h

@@ -0,0 +1,286 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define gauss_bits 11
+#define sizeof_int 4
+#define luma_subblock_size_y 32
+#define luma_subblock_size_x 32
+#define chroma_subblock_size_y 16
+#define chroma_subblock_size_x 16
+#define min_luma_legal_range 16
+#define max_luma_legal_range 235
+#define min_chroma_legal_range 16
+#define max_chroma_legal_range 240
+
+struct GrainParams {
+  // This structure is compared element-by-element in the function
+  // av1_check_grain_params_equiv: this function must be updated if any changes
+  // are made to this structure.
+  int apply_grain;
+  int update_parameters;
+  int num_y_points;   // value: 0..14
+  int num_cb_points;  // value: 0..10
+
+  int num_cr_points;   // value: 0..10
+  int scaling_shift;   // values : 8..11
+  int ar_coeff_lag;    // values:  0..3
+  int ar_coeff_shift;  // values : 6..9
+
+  // Shift value: AR coeffs range
+  // 6: [-2, 2)
+  // 7: [-1, 1)
+  // 8: [-0.5, 0.5)
+  // 9: [-0.25, 0.25)
+
+  int cb_mult;       // 8 bits
+  int cb_luma_mult;  // 8 bits
+  int cb_offset;     // 9 bits
+  int cr_mult;       // 8 bits
+
+  int cr_luma_mult;  // 8 bits
+  int cr_offset;     // 9 bits
+  int overlap_flag;
+  int clip_to_restricted_range;
+
+  unsigned int bit_depth;  // video bit depth
+  int chroma_scaling_from_luma;
+  int grain_scale_shift;
+  unsigned int random_seed;
+
+  // Y 8 bit values 14*2 = 7*4
+  // UV 8 bit values 10*2 = 5*4+2 (padding 2)
+  int4 scaling_points[14][2];
+
+  // Y 8 bit values 24 = 6*4
+  // UV 8 bit values 25 = 6*4 + 1 (padding 3)
+  int4 ar_coeffs[3][25];  // xyz = y, cb, cr
+};
+
+struct FilmGrainGenData {
+  // RW_ByteBuffer dst;
+  GrainParams params;
+
+  int luma_block_size_y;
+  int luma_block_size_x;
+  int luma_grain_stride;
+  int chroma_block_size_y;
+
+  int chroma_block_size_x;
+  int chroma_grain_stride;
+  int left_pad;
+  int top_pad;
+
+  int right_pad;
+  int bottom_pad;
+  int dst_offset_u;
+  int dst_offset_v;
+
+  int4 pred_pos[25][3];
+};
+// Samples with Gaussian distribution in the range of [-2048, 2047] (12 bits)
+// with zero mean and standard deviation of about 512.
+// should be divided by 4 for 10-bit range and 16 for 8-bit range.
+/*
+const int gaussian_sequence[2048] = {
+  56,    568,   -180,  172,   124,   -84,   172,   -64,   -900,  24,   820,
+  224,   1248,  996,   272,   -8,    -916,  -388,  -732,  -104,  -188, 800,
+  112,   -652,  -320,  -376,  140,   -252,  492,   -168,  44,    -788, 588,
+  -584,  500,   -228,  12,    680,   272,   -476,  972,   -100,  652,  368,
+  432,   -196,  -720,  -192,  1000,  -332,  652,   -136,  -552,  -604, -4,
+  192,   -220,  -136,  1000,  -52,   372,   -96,   -624,  124,   -24,  396,
+  540,   -12,   -104,  640,   464,   244,   -208,  -84,   368,   -528, -740,
+  248,   -968,  -848,  608,   376,   -60,   -292,  -40,   -156,  252,  -292,
+  248,   224,   -280,  400,   -244,  244,   -60,   76,    -80,   212,  532,
+  340,   128,   -36,   824,   -352,  -60,   -264,  -96,   -612,  416,  -704,
+  220,   -204,  640,   -160,  1220,  -408,  900,   336,   20,    -336, -96,
+  -792,  304,   48,    -28,   -1232, -1172, -448,  104,   -292,  -520, 244,
+  60,    -948,  0,     -708,  268,   108,   356,   -548,  488,   -344, -136,
+  488,   -196,  -224,  656,   -236,  -1128, 60,    4,     140,   276,  -676,
+  -376,  168,   -108,  464,   8,     564,   64,    240,   308,   -300, -400,
+  -456,  -136,  56,    120,   -408,  -116,  436,   504,   -232,  328,  844,
+  -164,  -84,   784,   -168,  232,   -224,  348,   -376,  128,   568,  96,
+  -1244, -288,  276,   848,   832,   -360,  656,   464,   -384,  -332, -356,
+  728,   -388,  160,   -192,  468,   296,   224,   140,   -776,  -100, 280,
+  4,     196,   44,    -36,   -648,  932,   16,    1428,  28,    528,  808,
+  772,   20,    268,   88,    -332,  -284,  124,   -384,  -448,  208,  -228,
+  -1044, -328,  660,   380,   -148,  -300,  588,   240,   540,   28,   136,
+  -88,   -436,  256,   296,   -1000, 1400,  0,     -48,   1056,  -136, 264,
+  -528,  -1108, 632,   -484,  -592,  -344,  796,   124,   -668,  -768, 388,
+  1296,  -232,  -188,  -200,  -288,  -4,    308,   100,   -168,  256,  -500,
+  204,   -508,  648,   -136,  372,   -272,  -120,  -1004, -552,  -548, -384,
+  548,   -296,  428,   -108,  -8,    -912,  -324,  -224,  -88,   -112, -220,
+  -100,  996,   -796,  548,   360,   -216,  180,   428,   -200,  -212, 148,
+  96,    148,   284,   216,   -412,  -320,  120,   -300,  -384,  -604, -572,
+  -332,  -8,    -180,  -176,  696,   116,   -88,   628,   76,    44,   -516,
+  240,   -208,  -40,   100,   -592,  344,   -308,  -452,  -228,  20,   916,
+  -1752, -136,  -340,  -804,  140,   40,    512,   340,   248,   184,  -492,
+  896,   -156,  932,   -628,  328,   -688,  -448,  -616,  -752,  -100, 560,
+  -1020, 180,   -800,  -64,   76,    576,   1068,  396,   660,   552,  -108,
+  -28,   320,   -628,  312,   -92,   -92,   -472,  268,   16,    560,  516,
+  -672,  -52,   492,   -100,  260,   384,   284,   292,   304,   -148, 88,
+  -152,  1012,  1064,  -228,  164,   -376,  -684,  592,   -392,  156,  196,
+  -524,  -64,   -884,  160,   -176,  636,   648,   404,   -396,  -436, 864,
+  424,   -728,  988,   -604,  904,   -592,  296,   -224,  536,   -176, -920,
+  436,   -48,   1176,  -884,  416,   -776,  -824,  -884,  524,   -548, -564,
+  -68,   -164,  -96,   692,   364,   -692,  -1012, -68,   260,   -480, 876,
+  -1116, 452,   -332,  -352,  892,   -1088, 1220,  -676,  12,    -292, 244,
+  496,   372,   -32,   280,   200,   112,   -440,  -96,   24,    -644, -184,
+  56,    -432,  224,   -980,  272,   -260,  144,   -436,  420,   356,  364,
+  -528,  76,    172,   -744,  -368,  404,   -752,  -416,  684,   -688, 72,
+  540,   416,   92,    444,   480,   -72,   -1416, 164,   -1172, -68,  24,
+  424,   264,   1040,  128,   -912,  -524,  -356,  64,    876,   -12,  4,
+  -88,   532,   272,   -524,  320,   276,   -508,  940,   24,    -400, -120,
+  756,   60,    236,   -412,  100,   376,   -484,  400,   -100,  -740, -108,
+  -260,  328,   -268,  224,   -200,  -416,  184,   -604,  -564,  -20,  296,
+  60,    892,   -888,  60,    164,   68,    -760,  216,   -296,  904,  -336,
+  -28,   404,   -356,  -568,  -208,  -1480, -512,  296,   328,   -360, -164,
+  -1560, -776,  1156,  -428,  164,   -504,  -112,  120,   -216,  -148, -264,
+  308,   32,    64,    -72,   72,    116,   176,   -64,   -272,  460,  -536,
+  -784,  -280,  348,   108,   -752,  -132,  524,   -540,  -776,  116,  -296,
+  -1196, -288,  -560,  1040,  -472,  116,   -848,  -1116, 116,   636,  696,
+  284,   -176,  1016,  204,   -864,  -648,  -248,  356,   972,   -584, -204,
+  264,   880,   528,   -24,   -184,  116,   448,   -144,  828,   524,  212,
+  -212,  52,    12,    200,   268,   -488,  -404,  -880,  824,   -672, -40,
+  908,   -248,  500,   716,   -576,  492,   -576,  16,    720,   -108, 384,
+  124,   344,   280,   576,   -500,  252,   104,   -308,  196,   -188, -8,
+  1268,  296,   1032,  -1196, 436,   316,   372,   -432,  -200,  -660, 704,
+  -224,  596,   -132,  268,   32,    -452,  884,   104,   -1008, 424,  -1348,
+  -280,  4,     -1168, 368,   476,   696,   300,   -8,    24,    180,  -592,
+  -196,  388,   304,   500,   724,   -160,  244,   -84,   272,   -256, -420,
+  320,   208,   -144,  -156,  156,   364,   452,   28,    540,   316,  220,
+  -644,  -248,  464,   72,    360,   32,    -388,  496,   -680,  -48,  208,
+  -116,  -408,  60,    -604,  -392,  548,   -840,  784,   -460,  656,  -544,
+  -388,  -264,  908,   -800,  -628,  -612,  -568,  572,   -220,  164,  288,
+  -16,   -308,  308,   -112,  -636,  -760,  280,   -668,  432,   364,  240,
+  -196,  604,   340,   384,   196,   592,   -44,   -500,  432,   -580, -132,
+  636,   -76,   392,   4,     -412,  540,   508,   328,   -356,  -36,  16,
+  -220,  -64,   -248,  -60,   24,    -192,  368,   1040,  92,    -24,  -1044,
+  -32,   40,    104,   148,   192,   -136,  -520,  56,    -816,  -224, 732,
+  392,   356,   212,   -80,   -424,  -1008, -324,  588,   -1496, 576,  460,
+  -816,  -848,  56,    -580,  -92,   -1372, -112,  -496,  200,   364,  52,
+  -140,  48,    -48,   -60,   84,    72,    40,    132,   -356,  -268, -104,
+  -284,  -404,  732,   -520,  164,   -304,  -540,  120,   328,   -76,  -460,
+  756,   388,   588,   236,   -436,  -72,   -176,  -404,  -316,  -148, 716,
+  -604,  404,   -72,   -88,   -888,  -68,   944,   88,    -220,  -344, 960,
+  472,   460,   -232,  704,   120,   832,   -228,  692,   -508,  132,  -476,
+  844,   -748,  -364,  -44,   1116,  -1104, -1056, 76,    428,   552,  -692,
+  60,    356,   96,    -384,  -188,  -612,  -576,  736,   508,   892,  352,
+  -1132, 504,   -24,   -352,  324,   332,   -600,  -312,  292,   508,  -144,
+  -8,    484,   48,    284,   -260,  -240,  256,   -100,  -292,  -204, -44,
+  472,   -204,  908,   -188,  -1000, -256,  92,    1164,  -392,  564,  356,
+  652,   -28,   -884,  256,   484,   -192,  760,   -176,  376,   -524, -452,
+  -436,  860,   -736,  212,   124,   504,   -476,  468,   76,    -472, 552,
+  -692,  -944,  -620,  740,   -240,  400,   132,   20,    192,   -196, 264,
+  -668,  -1012, -60,   296,   -316,  -828,  76,    -156,  284,   -768, -448,
+  -832,  148,   248,   652,   616,   1236,  288,   -328,  -400,  -124, 588,
+  220,   520,   -696,  1032,  768,   -740,  -92,   -272,  296,   448,  -464,
+  412,   -200,  392,   440,   -200,  264,   -152,  -260,  320,   1032, 216,
+  320,   -8,    -64,   156,   -1016, 1084,  1172,  536,   484,   -432, 132,
+  372,   -52,   -256,  84,    116,   -352,  48,    116,   304,   -384, 412,
+  924,   -300,  528,   628,   180,   648,   44,    -980,  -220,  1320, 48,
+  332,   748,   524,   -268,  -720,  540,   -276,  564,   -344,  -208, -196,
+  436,   896,   88,    -392,  132,   80,    -964,  -288,  568,   56,   -48,
+  -456,  888,   8,     552,   -156,  -292,  948,   288,   128,   -716, -292,
+  1192,  -152,  876,   352,   -600,  -260,  -812,  -468,  -28,   -120, -32,
+  -44,   1284,  496,   192,   464,   312,   -76,   -516,  -380,  -456, -1012,
+  -48,   308,   -156,  36,    492,   -156,  -808,  188,   1652,  68,   -120,
+  -116,  316,   160,   -140,  352,   808,   -416,  592,   316,   -480, 56,
+  528,   -204,  -568,  372,   -232,  752,   -344,  744,   -4,    324,  -416,
+  -600,  768,   268,   -248,  -88,   -132,  -420,  -432,  80,    -288, 404,
+  -316,  -1216, -588,  520,   -108,  92,    -320,  368,   -480,  -216, -92,
+  1688,  -300,  180,   1020,  -176,  820,   -68,   -228,  -260,  436,  -904,
+  20,    40,    -508,  440,   -736,  312,   332,   204,   760,   -372, 728,
+  96,    -20,   -632,  -520,  -560,  336,   1076,  -64,   -532,  776,  584,
+  192,   396,   -728,  -520,  276,   -188,  80,    -52,   -612,  -252, -48,
+  648,   212,   -688,  228,   -52,   -260,  428,   -412,  -272,  -404, 180,
+  816,   -796,  48,    152,   484,   -88,   -216,  988,   696,   188,  -528,
+  648,   -116,  -180,  316,   476,   12,    -564,  96,    476,   -252, -364,
+  -376,  -392,  556,   -256,  -576,  260,   -352,  120,   -16,   -136, -260,
+  -492,  72,    556,   660,   580,   616,   772,   436,   424,   -32,  -324,
+  -1268, 416,   -324,  -80,   920,   160,   228,   724,   32,    -516, 64,
+  384,   68,    -128,  136,   240,   248,   -204,  -68,   252,   -932, -120,
+  -480,  -628,  -84,   192,   852,   -404,  -288,  -132,  204,   100,  168,
+  -68,   -196,  -868,  460,   1080,  380,   -80,   244,   0,     484,  -888,
+  64,    184,   352,   600,   460,   164,   604,   -196,  320,   -64,  588,
+  -184,  228,   12,    372,   48,    -848,  -344,  224,   208,   -200, 484,
+  128,   -20,   272,   -468,  -840,  384,   256,   -720,  -520,  -464, -580,
+  112,   -120,  644,   -356,  -208,  -608,  -528,  704,   560,   -424, 392,
+  828,   40,    84,    200,   -152,  0,     -144,  584,   280,   -120, 80,
+  -556,  -972,  -196,  -472,  724,   80,    168,   -32,   88,    160,  -688,
+  0,     160,   356,   372,   -776,  740,   -128,  676,   -248,  -480, 4,
+  -364,  96,    544,   232,   -1032, 956,   236,   356,   20,    -40,  300,
+  24,    -676,  -596,  132,   1120,  -104,  532,   -1096, 568,   648,  444,
+  508,   380,   188,   -376,  -604,  1488,  424,   24,    756,   -220, -192,
+  716,   120,   920,   688,   168,   44,    -460,  568,   284,   1144, 1160,
+  600,   424,   888,   656,   -356,  -320,  220,   316,   -176,  -724, -188,
+  -816,  -628,  -348,  -228,  -380,  1012,  -452,  -660,  736,   928,  404,
+  -696,  -72,   -268,  -892,  128,   184,   -344,  -780,  360,   336,  400,
+  344,   428,   548,   -112,  136,   -228,  -216,  -820,  -516,  340,  92,
+  -136,  116,   -300,  376,   -244,  100,   -316,  -520,  -284,  -12,  824,
+  164,   -548,  -180,  -128,  116,   -924,  -828,  268,   -368,  -580, 620,
+  192,   160,   0,     -1676, 1068,  424,   -56,   -360,  468,   -156, 720,
+  288,   -528,  556,   -364,  548,   -148,  504,   316,   152,   -648, -620,
+  -684,  -24,   -376,  -384,  -108,  -920,  -1032, 768,   180,   -264, -508,
+  -1268, -260,  -60,   300,   -240,  988,   724,   -376,  -576,  -212, -736,
+  556,   192,   1092,  -620,  -880,  376,   -56,   -4,    -216,  -32,  836,
+  268,   396,   1332,  864,   -600,  100,   56,    -412,  -92,   356,  180,
+  884,   -468,  -436,  292,   -388,  -804,  -704,  -840,  368,   -348, 140,
+  -724,  1536,  940,   372,   112,   -372,  436,   -480,  1136,  296,  -32,
+  -228,  132,   -48,   -220,  868,   -1016, -60,   -1044, -464,  328,  916,
+  244,   12,    -736,  -296,  360,   468,   -376,  -108,  -92,   788,  368,
+  -56,   544,   400,   -672,  -420,  728,   16,    320,   44,    -284, -380,
+  -796,  488,   132,   204,   -596,  -372,  88,    -152,  -908,  -636, -572,
+  -624,  -116,  -692,  -200,  -56,   276,   -88,   484,   -324,  948,  864,
+  1000,  -456,  -184,  -276,  292,   -296,  156,   676,   320,   160,  908,
+  -84,   -1236, -288,  -116,  260,   -372,  -644,  732,   -756,  -96,  84,
+  344,   -520,  348,   -688,  240,   -84,   216,   -1044, -136,  -676, -396,
+  -1500, 960,   -40,   176,   168,   1516,  420,   -504,  -344,  -364, -360,
+  1216,  -940,  -380,  -212,  252,   -660,  -708,  484,   -444,  -152, 928,
+  -120,  1112,  476,   -260,  560,   -148,  -344,  108,   -196,  228,  -288,
+  504,   560,   -328,  -88,   288,   -1008, 460,   -228,  468,   -836, -196,
+  76,    388,   232,   412,   -1168, -716,  -644,  756,   -172,  -356, -504,
+  116,   432,   528,   48,    476,   -168,  -608,  448,   160,   -532, -272,
+  28,    -676,  -12,   828,   980,   456,   520,   104,   -104,  256,  -344,
+  -4,    -28,   -368,  -52,   -524,  -572,  -556,  -200,  768,   1124, -208,
+  -512,  176,   232,   248,   -148,  -888,  604,   -600,  -304,  804,  -156,
+  -212,  488,   -192,  -804,  -256,  368,   -360,  -916,  -328,  228,  -240,
+  -448,  -472,  856,   -556,  -364,  572,   -12,   -156,  -368,  -340, 432,
+  252,   -752,  -152,  288,   268,   -580,  -848,  -592,  108,   -76,  244,
+  312,   -716,  592,   -80,   436,   360,   4,     -248,  160,   516,  584,
+  732,   44,    -468,  -280,  -292,  -156,  -588,  28,    308,   912,  24,
+  124,   156,   180,   -252,  944,   -924,  -772,  -520,  -428,  -624, 300,
+  -212,  -1144, 32,    -724,  800,   -1128, -212,  -1288, -848,  180,  -416,
+  440,   192,   -576,  -792,  -76,   -1080, 80,    -532,  -352,  -132, 380,
+  -820,  148,   1112,  128,   164,   456,   700,   -924,  144,   -668, -384,
+  648,   -832,  508,   552,   -52,   -100,  -656,  208,   -568,  748,  -88,
+  680,   232,   300,   192,   -408,  -1012, -152,  -252,  -268,  272,  -876,
+  -664,  -648,  -332,  -136,  16,    12,    1152,  -28,   332,   -536, 320,
+  -672,  -460,  -316,  532,   -260,  228,   -40,   1052,  -816,  180,  88,
+  -496,  -556,  -672,  -368,  428,   92,    356,   404,   -408,  252,  196,
+  -176,  -556,  792,   268,   32,    372,   40,    96,    -332,  328,  120,
+  372,   -900,  -40,   472,   -264,  -592,  952,   128,   656,   112,  664,
+  -232,  420,   4,     -344,  -464,  556,   244,   -416,  -32,   252,  0,
+  -412,  188,   -696,  508,   -476,  324,   -1096, 656,   -312,  560,  264,
+  -136,  304,   160,   -64,   -580,  248,   336,   -720,  560,   -348, -288,
+  -276,  -196,  -500,  852,   -544,  -236,  -1128, -992,  -776,  116,  56,
+  52,    860,   884,   212,   -12,   168,   1020,  512,   -552,  924,  -148,
+  716,   188,   164,   -340,  -520,  -184,  880,   -152,  -680,  -208, -1156,
+  -300,  -528,  -472,  364,   100,   -744,  -1056, -32,   540,   280,  144,
+  -676,  -32,   -232,  -280,  -224,  96,    568,   -76,   172,   148,  148,
+  104,   32,    -296,  -32,   788,   -80,   32,    -16,   280,   288,  944,
+  428,   -484
+};
+*/
\ No newline at end of file

diff --git a/libav1/dx/shaders/film_grain_filter.hlsl b/libav1/dx/shaders/film_grain_filter.hlsl
new file mode 100644
index 0000000..b0210cc
--- /dev/null
+++ b/libav1/dx/shaders/film_grain_filter.hlsl

@@ -0,0 +1,592 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "film_grain_const.h"
+
+ByteAddressBuffer src : register(t0);
+StructuredBuffer<int> grain_block : register(t1);
+StructuredBuffer<int> random_offset : register(t2);
+RWByteAddressBuffer dst : register(u0);
+
+struct FilmGrainData {
+  GrainParams params;
+
+  int4 src_planes[3];
+  int4 dst_planes[3];
+
+  int enable_chroma;
+  int random_offset_stride;
+  int width;
+  int height;
+
+  int mc_identity;
+  int luma_grain_stride;
+  int chroma_grain_stride;
+  int left_pad;
+
+  int right_pad;
+  int top_pad;
+  int bottom_pad;
+  int ar_padding;
+
+  int grain_offset_u;
+  int grain_offset_v;
+  int is_10x3;
+  int pad;
+
+  int4 scaling_lut[256];
+};
+
+cbuffer cb_film_grain_data : register(b0) { FilmGrainData data; };
+
+#define clamp_ln0(a, b) clamp((a * 23 + b * 22 + 16) >> 5, grain_min, grain_max)
+#define clamp_ln1(a, b) clamp((a * 27 + b * 17 + 16) >> 5, grain_min, grain_max)
+#define clamp_ln2(a, b) clamp((a * 17 + b * 27 + 16) >> 5, grain_min, grain_max)
+
+#define clamp_ln(n, a, b) ((n == 1) ? clamp_ln1(a, b) : clamp_ln2(a, b))
+
+groupshared int luma_grain_temp[3][32 * 32];
+groupshared int avarage_luma[3][32 * 32];
+groupshared int cr_grain_temp[3][16 * 16];
+groupshared int cb_grain_temp[3][16 * 16];
+
+#define chroma_subsamp_x 1
+#define chroma_subsamp_y 1
+
+//    return scaling_lut[x] + (((scaling_lut[x + 1] - scaling_lut[x]) *
+//                                  (index & ((1 << (bit_depth - 8)) - 1)) +
+//                              (1 << (bit_depth - 9))) >>
+//                             (bit_depth - 8));
+
+#define scale_LUT(index)                                                                                          \
+  (data.scaling_lut[index >> (bit_depth - 8)].x +                                                                 \
+   ((index >> (bit_depth - 8)) == 255                                                                             \
+        ? 0                                                                                                       \
+        : (((data.scaling_lut[(index >> (bit_depth - 8)) + 1].x - data.scaling_lut[index >> (bit_depth - 8)].x) * \
+                (index & ((1 << (bit_depth - 8)) - 1)) +                                                          \
+            (1 << (bit_depth - 9))) >>                                                                            \
+           (bit_depth - 8))))
+
+#define scale_LUT_cb(index)                                                                                       \
+  (data.scaling_lut[index >> (bit_depth - 8)].y +                                                                 \
+   ((index >> (bit_depth - 8)) == 255                                                                             \
+        ? 0                                                                                                       \
+        : (((data.scaling_lut[(index >> (bit_depth - 8)) + 1].y - data.scaling_lut[index >> (bit_depth - 8)].y) * \
+                (index & ((1 << (bit_depth - 8)) - 1)) +                                                          \
+            (1 << (bit_depth - 9))) >>                                                                            \
+           (bit_depth - 8))))
+
+#define scale_LUT_cr(index)                                                                                       \
+  (data.scaling_lut[index >> (bit_depth - 8)].z +                                                                 \
+   ((index >> (bit_depth - 8)) == 255                                                                             \
+        ? 0                                                                                                       \
+        : (((data.scaling_lut[(index >> (bit_depth - 8)) + 1].z - data.scaling_lut[index >> (bit_depth - 8)].z) * \
+                (index & ((1 << (bit_depth - 8)) - 1)) +                                                          \
+            (1 << (bit_depth - 9))) >>                                                                            \
+           (bit_depth - 8))))
+
+[numthreads(luma_subblock_size_x / 4, luma_subblock_size_y, 3)] void main(int3 Gid
+                                                                          : SV_GroupID, int3 GTid
+                                                                          : SV_GroupThreadID) {
+  int overlap = data.params.overlap_flag;
+  int bit_depth = data.params.bit_depth;
+  int lid = GTid.z;
+  int ii = Gid.y;
+  int jj = Gid.x * 3 + lid;
+  int grain_center = 128 << (bit_depth - 8);
+  int grain_min = 0 - grain_center;
+  int grain_max = (256 << (bit_depth - 8)) - 1 - grain_center;
+  const int enable_chroma = data.enable_chroma;
+
+  {
+    int y = ii * luma_subblock_size_y;
+    {
+      int x = jj * luma_subblock_size_x;
+
+      int true_chroma_subblock_size_y = luma_subblock_size_y >> chroma_subsamp_y;
+      int true_chroma_subblock_size_x = luma_subblock_size_x >> chroma_subsamp_x;
+      // Grain blocks offset calculation
+      const int random_offset_stride = data.random_offset_stride;
+      int offset_y_up = y ? random_offset[(ii - (y ? 1 : 0)) * random_offset_stride + jj] : 0;
+      int offset_y_up_left =
+          (y > 0) && (x > 0) ? random_offset[(ii - (y ? 1 : 0)) * random_offset_stride + jj - (x ? 1 : 0)] : 0;
+      int offset_y_left = x ? random_offset[ii * random_offset_stride + jj - (x ? 1 : 0)] : 0;
+      int offset_y = random_offset[ii * random_offset_stride + jj];
+      int offset_x_up = (offset_y_up >> 4) & 15;
+      int offset_x_up_left = (offset_y_up_left >> 4) & 15;
+      int offset_x_left = (offset_y_left >> 4) & 15;
+      int offset_x = (offset_y >> 4) & 15;
+      offset_y_up &= 15;
+      offset_y_up_left &= 15;
+      offset_y_left &= 15;
+      offset_y &= 15;
+      const int ar_padding = data.ar_padding;
+      int luma_offset_y = 2 * ar_padding + (offset_y << 1);
+      int luma_offset_x = 2 * ar_padding + (offset_x << 1);
+      int luma_offset_y_left = 2 * ar_padding + (offset_y_left << 1);
+      int luma_offset_x_left = 2 * ar_padding + (offset_x_left << 1) + 32;
+      int luma_offset_y_up = 2 * ar_padding + (offset_y_up << 1) + 32;
+      int luma_offset_x_up = 2 * ar_padding + (offset_x_up << 1);
+      int luma_offset_y_up_left = 2 * ar_padding + (offset_y_up_left << 1) + 32;
+      int luma_offset_x_up_left = 2 * ar_padding + (offset_x_up_left << 1) + 32;
+
+      const int top_pad = data.top_pad;
+      const int left_pad = data.left_pad;
+      int chroma_offset_y = top_pad + (luma_offset_y >> chroma_subsamp_y);
+      int chroma_offset_x = left_pad + (luma_offset_x >> chroma_subsamp_x);
+      int chroma_offset_y_left = top_pad + (luma_offset_y_left >> chroma_subsamp_y);
+      int chroma_offset_x_left = left_pad + (luma_offset_x_left >> chroma_subsamp_x);
+      int chroma_offset_y_up = top_pad + (luma_offset_y_up >> chroma_subsamp_y);
+      int chroma_offset_x_up = left_pad + (luma_offset_x_up >> chroma_subsamp_x);
+      int chroma_offset_y_up_left = top_pad + (luma_offset_y_up_left >> chroma_subsamp_y);
+      int chroma_offset_x_up_left = left_pad + (luma_offset_x_up_left >> chroma_subsamp_x);
+
+      luma_offset_y += top_pad;
+      luma_offset_x += left_pad;
+      luma_offset_y_left += top_pad;
+      luma_offset_x_left += left_pad;
+      luma_offset_y_up += top_pad;
+      luma_offset_x_up += left_pad;
+      luma_offset_y_up_left += top_pad;
+      luma_offset_x_up_left += left_pad;
+
+      const int grain_offset_u = data.grain_offset_u;
+      const int grain_offset_v = data.grain_offset_v;
+      const int luma_grain_stride = data.luma_grain_stride;
+      const int chroma_grain_stride = data.chroma_grain_stride;
+      // Grain blocks fetching
+      //            for (int i = 0; i < 32; i++) {
+      {
+        int i = GTid.y;
+        for (int j = GTid.x; j < luma_subblock_size_x; j += luma_subblock_size_x / 4) {
+          // Luma grain fetching
+          luma_grain_temp[lid][i * luma_subblock_size_x + j] =
+              (grain_block[(luma_offset_y + i) * luma_grain_stride + luma_offset_x + j]);
+          // Chroma grain fetching
+          if (i < true_chroma_subblock_size_y && j < true_chroma_subblock_size_x && enable_chroma) {
+            cb_grain_temp[lid][i * true_chroma_subblock_size_x + j] =
+                (grain_block[grain_offset_u + (chroma_offset_y + i) * chroma_grain_stride + chroma_offset_x + j]);
+            cr_grain_temp[lid][i * true_chroma_subblock_size_x + j] =
+                grain_block[grain_offset_v + (chroma_offset_y + i) * chroma_grain_stride + chroma_offset_x + j];
+          }
+        }
+      }
+      GroupMemoryBarrierWithGroupSync();
+
+      // Overlap processing on X axis
+      if (overlap && x) {
+        int i = GTid.y;
+        int j = GTid.x;
+        // Luma overlap
+        if (j < 2) {
+          int test_luma_left = (grain_block[(luma_offset_y_left + i) * luma_grain_stride + luma_offset_x_left + j]);
+          luma_grain_temp[lid][i * luma_subblock_size_x + j] =
+              clamp_ln(j + 1, test_luma_left, luma_grain_temp[lid][i * luma_subblock_size_x + j]);
+        }
+        // Chroma overlap
+        if (i < true_chroma_subblock_size_y && j <= (1 - chroma_subsamp_x) && enable_chroma) {
+          int test_cb_left = (grain_block[grain_offset_u + (chroma_offset_y_left + i) * chroma_grain_stride +
+                                          chroma_offset_x_left + j]);
+          int test_cr_left = (grain_block[grain_offset_v + (chroma_offset_y_left + i) * chroma_grain_stride +
+                                          chroma_offset_x_left + j]);
+          cb_grain_temp[lid][i * chroma_subblock_size_x + j] =
+              clamp_ln0(test_cb_left, cb_grain_temp[lid][i * true_chroma_subblock_size_x + j]);
+          cr_grain_temp[lid][i * chroma_subblock_size_x + j] =
+              clamp_ln0(test_cr_left, cr_grain_temp[lid][i * true_chroma_subblock_size_x + j]);
+        }
+      }
+      GroupMemoryBarrierWithGroupSync();
+
+      // Overlap processing on Y axis
+      if (overlap && y) {
+        int i = GTid.y;
+        for (int j = GTid.x; j < 32; j += luma_subblock_size_x / 4) {
+          // Luma overlap
+          if (i < 2) {
+            int test_luma_up_left =
+                (grain_block[(luma_offset_y_up_left + i) * luma_grain_stride + luma_offset_x_up_left + j]);
+            int test_luma_up = (grain_block[(luma_offset_y_up + i) * luma_grain_stride + luma_offset_x_up + j]);
+            if (x && (j < 2)) {
+              test_luma_up = clamp_ln(j + 1, test_luma_up_left, test_luma_up);
+            }
+            luma_grain_temp[lid][i * luma_subblock_size_x + j] =
+                clamp_ln(i + 1, test_luma_up, luma_grain_temp[lid][i * luma_subblock_size_x + j]);
+          }
+          // Chroma overlap
+          if ((i <= (1 - chroma_subsamp_y)) && (j < true_chroma_subblock_size_x) && enable_chroma) {
+            int test_cb_up_left = (grain_block[grain_offset_u + (chroma_offset_y_up_left + i) * chroma_grain_stride +
+                                               chroma_offset_x_up_left + j]);
+
+            int test_cb_up =
+                (grain_block[grain_offset_u + (chroma_offset_y_up + i) * chroma_grain_stride + chroma_offset_x_up + j]);
+
+            int test_cr_up_left = (grain_block[grain_offset_v + (chroma_offset_y_up_left + i) * chroma_grain_stride +
+                                               chroma_offset_x_up_left + j]);
+
+            int test_cr_up =
+                (grain_block[grain_offset_v + (chroma_offset_y_up + i) * chroma_grain_stride + chroma_offset_x_up + j]);
+
+            if (x && (j == 0)) {
+              test_cb_up = clamp_ln0(test_cb_up_left, test_cb_up);
+              test_cr_up = clamp_ln0(test_cr_up_left, test_cr_up);
+            }
+
+            cb_grain_temp[lid][i * true_chroma_subblock_size_x + j] =
+                clamp_ln0(test_cb_up, cb_grain_temp[lid][i * true_chroma_subblock_size_x + j]);
+            cr_grain_temp[lid][i * true_chroma_subblock_size_x + j] =
+                clamp_ln0(test_cr_up, cr_grain_temp[lid][i * true_chroma_subblock_size_x + j]);
+          }
+        }
+      }
+      GroupMemoryBarrierWithGroupSync();
+
+      // Grain blocks application
+      int rounding_offset = (1 << (data.params.scaling_shift - 1));
+      int min_luma, max_luma, min_chroma, max_chroma;
+
+      if (data.params.clip_to_restricted_range) {
+        min_luma = min_luma_legal_range << (bit_depth - 8);
+        max_luma = max_luma_legal_range << (bit_depth - 8);
+        if (data.mc_identity) {
+          min_chroma = min_luma_legal_range << (bit_depth - 8);
+          max_chroma = max_luma_legal_range << (bit_depth - 8);
+        } else {
+          min_chroma = min_chroma_legal_range << (bit_depth - 8);
+          max_chroma = max_chroma_legal_range << (bit_depth - 8);
+        }
+      } else {
+        min_luma = min_chroma = 0;
+        max_luma = max_chroma = (256 << (bit_depth - 8)) - 1;
+      }
+      int cb_mult = data.params.cb_mult - 128;            // fixed scale
+      int cb_luma_mult = data.params.cb_luma_mult - 128;  // fixed scale
+      int cb_offset = (data.params.cb_offset << (bit_depth - 8)) - (1 << bit_depth);
+
+      int cr_mult = data.params.cr_mult - 128;            // fixed scale
+      int cr_luma_mult = data.params.cr_luma_mult - 128;  // fixed scale
+      int cr_offset = (data.params.cr_offset << (bit_depth - 8)) - (1 << bit_depth);
+
+      if (data.params.chroma_scaling_from_luma) {
+        cb_mult = 0;        // fixed scale
+        cb_luma_mult = 64;  // fixed scale
+        cb_offset = 0;
+
+        cr_mult = 0;        // fixed scale
+        cr_luma_mult = 64;  // fixed scale
+        cr_offset = 0;
+      }
+      int apply_y = data.params.num_y_points > 0 ? 1 : 0;
+      int apply_cb = (data.params.num_cb_points > 0 || data.params.chroma_scaling_from_luma) ? 1 : 0;
+      int apply_cr = (data.params.num_cr_points > 0 || data.params.chroma_scaling_from_luma) ? 1 : 0;
+
+      // for (int i = 0; i < (luma_subblock_size_y); i++) {
+      {
+        // for (int j = 0; j < (luma_subblock_size_x); j += 4) {
+        {
+          // Luma grain block application
+          int i = GTid.y;
+          int j = GTid.x * 4;
+
+          const int2 src_luma_plane = data.src_planes[0].xy;
+          int4 in_luma;
+          if (bit_depth == 8) {
+            uint luma_uint = src.Load(src_luma_plane.y + (y + i) * src_luma_plane.x + x + j);
+            in_luma.x = (luma_uint >> 0) & 255;
+            in_luma.y = (luma_uint >> 8) & 255;
+            in_luma.z = (luma_uint >> 16) & 255;
+            in_luma.w = (luma_uint >> 24) & 255;
+          } else {
+            uint2 luma_uint = src.Load2(src_luma_plane.y + (y + i) * src_luma_plane.x + (x + j) * 2);
+            in_luma.x = (luma_uint.x >> 0) & 0x03ff;
+            in_luma.y = (luma_uint.x >> 16) & 0x03ff;
+            in_luma.z = (luma_uint.y >> 0) & 0x03ff;
+            in_luma.w = (luma_uint.y >> 16) & 0x03ff;
+          }
+          int4 scaled_luma;
+
+          if (bit_depth == 8) {
+            scaled_luma.x = data.scaling_lut[in_luma.x].x * luma_grain_temp[lid][i * luma_subblock_size_x + j + 0];
+            scaled_luma.y = data.scaling_lut[in_luma.y].x * luma_grain_temp[lid][i * luma_subblock_size_x + j + 1];
+            scaled_luma.z = data.scaling_lut[in_luma.z].x * luma_grain_temp[lid][i * luma_subblock_size_x + j + 2];
+            scaled_luma.w = data.scaling_lut[in_luma.w].x * luma_grain_temp[lid][i * luma_subblock_size_x + j + 3];
+          } else {
+            scaled_luma.x = scale_LUT(in_luma.x) * luma_grain_temp[lid][i * luma_subblock_size_x + j + 0];
+            scaled_luma.y = scale_LUT(in_luma.y) * luma_grain_temp[lid][i * luma_subblock_size_x + j + 1];
+            scaled_luma.z = scale_LUT(in_luma.z) * luma_grain_temp[lid][i * luma_subblock_size_x + j + 2];
+            scaled_luma.w = scale_LUT(in_luma.w) * luma_grain_temp[lid][i * luma_subblock_size_x + j + 3];
+          }
+          int4 out_luma =
+              clamp(in_luma + ((scaled_luma + rounding_offset) >> data.params.scaling_shift), min_luma, max_luma);
+          if (!apply_y) {
+            out_luma = in_luma;
+          }
+
+          if (data.is_10x3) {
+            luma_grain_temp[lid][i * luma_subblock_size_x + j + 0] = out_luma.x;
+            luma_grain_temp[lid][i * luma_subblock_size_x + j + 1] = out_luma.y;
+            luma_grain_temp[lid][i * luma_subblock_size_x + j + 2] = out_luma.z;
+            luma_grain_temp[lid][i * luma_subblock_size_x + j + 3] = out_luma.w;
+          } else {
+            const int2 dst_luma_plane = data.dst_planes[0].xy;
+            if (((y + i) < data.height) && ((x + j) < data.width)) {
+              if (bit_depth == 8) {
+                dst.Store(dst_luma_plane.y + (y + i) * dst_luma_plane.x + x + j,
+                          out_luma.x | (out_luma.y << 8) | (out_luma.z << 16) | (out_luma.w << 24));
+              } else {
+                dst.Store2(dst_luma_plane.y + (y + i) * dst_luma_plane.x + (x + j) * 2,
+                           uint2((out_luma.x << 0) | (out_luma.y << 16), (out_luma.z << 0) | (out_luma.w << 16)));
+              }
+            }
+          }
+
+          GroupMemoryBarrierWithGroupSync();
+
+          if (data.is_10x3) {
+            int x3 = Gid.x * luma_subblock_size_x * 3;
+            const int2 dst_luma_plane = data.dst_planes[0].xy;
+            for (int j3 = GTid.z * (luma_subblock_size_x / 4) + GTid.x; j3 < luma_subblock_size_x;
+                 j3 += 3 * (luma_subblock_size_x / 4)) {
+              uint3 res;
+              res.x = luma_grain_temp[(j3 * 3 + 0) / (uint)luma_subblock_size_x]
+                                     [i * luma_subblock_size_x + (j3 * 3 + 0) % (uint)luma_subblock_size_x];
+              res.y = luma_grain_temp[(j3 * 3 + 1) / (uint)luma_subblock_size_x]
+                                     [i * luma_subblock_size_x + (j3 * 3 + 1) % (uint)luma_subblock_size_x];
+              res.z = luma_grain_temp[(j3 * 3 + 2) / (uint)luma_subblock_size_x]
+                                     [i * luma_subblock_size_x + (j3 * 3 + 2) % (uint)luma_subblock_size_x];
+              if (((y + i) < data.height) && ((x3 + j3 * 3) < data.width)) {
+                dst.Store(dst_luma_plane.y + (y + i) * dst_luma_plane.x + 4 * x3 / 3U + j3 * 4,
+                          ((res.z & 0x3ff) << 20) | ((res.y & 0x3ff) << 10) | (res.x & 0x3ff));
+              }
+            }
+          }
+
+          // Mean luma calculation
+          if (chroma_subsamp_x) {
+            if ((i & 1) == 0) {
+              avarage_luma[lid][(i >> 1) * chroma_subblock_size_x + (j >> 1) + 0] = (in_luma.x + in_luma.y + 1) >> 1;
+              avarage_luma[lid][(i >> 1) * chroma_subblock_size_x + (j >> 1) + 1] = (in_luma.z + in_luma.w + 1) >> 1;
+            }
+          } else {
+            avarage_luma[lid][i * chroma_subblock_size_x + j + 0] = in_luma.x;
+            avarage_luma[lid][i * chroma_subblock_size_x + j + 1] = in_luma.y;
+            avarage_luma[lid][i * chroma_subblock_size_x + j + 2] = in_luma.z;
+            avarage_luma[lid][i * chroma_subblock_size_x + j + 3] = in_luma.w;
+          }
+
+          GroupMemoryBarrierWithGroupSync();
+
+          // Chroma block application
+          if (i < chroma_subblock_size_y && j < chroma_subblock_size_x && enable_chroma) {
+            {  // cb
+              int4 avarage_luma_c;
+              avarage_luma_c.x = avarage_luma[lid][i * true_chroma_subblock_size_x + j + 0];
+              avarage_luma_c.y = avarage_luma[lid][i * true_chroma_subblock_size_x + j + 1];
+              avarage_luma_c.z = avarage_luma[lid][i * true_chroma_subblock_size_x + j + 2];
+              avarage_luma_c.w = avarage_luma[lid][i * true_chroma_subblock_size_x + j + 3];
+
+              int2 chroma_plane = data.src_planes[1].xy;
+              int4 in_cb;
+              if (bit_depth == 8) {
+                uint cb_uint = src.Load(chroma_plane.y + ((y >> chroma_subsamp_y) + i) * chroma_plane.x +
+                                        (x >> chroma_subsamp_x) + j);
+                in_cb.x = (cb_uint >> 0) & 255;
+                in_cb.y = (cb_uint >> 8) & 255;
+                in_cb.z = (cb_uint >> 16) & 255;
+                in_cb.w = (cb_uint >> 24) & 255;
+              } else {
+                uint2 cb_uint = src.Load2(chroma_plane.y + ((y >> chroma_subsamp_y) + i) * chroma_plane.x +
+                                          ((x >> chroma_subsamp_x) + j) * 2);
+                in_cb.x = (cb_uint.x >> 0) & 0x03ff;
+                in_cb.y = (cb_uint.x >> 16) & 0x03ff;
+                in_cb.z = (cb_uint.y >> 0) & 0x03ff;
+                in_cb.w = (cb_uint.y >> 16) & 0x03ff;
+              }
+              int4 cb_to_scale = clamp(((avarage_luma_c * cb_luma_mult + cb_mult * in_cb) >> 6) + cb_offset, 0,
+                                       (256 << (bit_depth - 8)) - 1);
+              int4 scaled_cb;
+              if (bit_depth == 8) {
+                scaled_cb.x =
+                    data.scaling_lut[cb_to_scale.x].y * (cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 0]);
+                scaled_cb.y =
+                    data.scaling_lut[cb_to_scale.y].y * (cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 1]);
+                scaled_cb.z =
+                    data.scaling_lut[cb_to_scale.z].y * (cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 2]);
+                scaled_cb.w =
+                    data.scaling_lut[cb_to_scale.w].y * (cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 3]);
+              } else {
+                scaled_cb.x =
+                    scale_LUT_cb(cb_to_scale.x) * (cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 0]);
+                scaled_cb.y =
+                    scale_LUT_cb(cb_to_scale.y) * (cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 1]);
+                scaled_cb.z =
+                    scale_LUT_cb(cb_to_scale.z) * (cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 2]);
+                scaled_cb.w =
+                    scale_LUT_cb(cb_to_scale.w) * (cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 3]);
+              }
+              int4 out_cb =
+                  clamp(in_cb + ((scaled_cb + rounding_offset) >> data.params.scaling_shift), min_chroma, max_chroma);
+              if (!apply_cb) {
+                out_cb = in_cb;
+              }
+              if (data.is_10x3) {
+                cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 0] = out_cb.x;
+                cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 1] = out_cb.y;
+                cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 2] = out_cb.z;
+                cb_grain_temp[lid][i * true_chroma_subblock_size_x + j + 3] = out_cb.w;
+              } else {
+                if ((((y >> chroma_subsamp_y) + i) < (data.height >> chroma_subsamp_y)) &&
+                    (((x >> chroma_subsamp_x) + j) < (data.width >> chroma_subsamp_x))) {
+                  chroma_plane = data.dst_planes[1].xy;
+                  if (bit_depth == 8) {
+                    dst.Store(
+                        chroma_plane.y + ((y >> chroma_subsamp_y) + i) * chroma_plane.x + (x >> chroma_subsamp_x) + j,
+                        out_cb.x | (out_cb.y << 8) | (out_cb.z << 16) | (out_cb.w << 24));
+                  } else {
+                    dst.Store2(chroma_plane.y + ((y >> chroma_subsamp_y) + i) * chroma_plane.x +
+                                   ((x >> chroma_subsamp_x) + j) * 2,
+                               uint2((out_cb.x << 0) | (out_cb.y << 16), (out_cb.z << 0) | (out_cb.w << 16)));
+                  }
+                }
+              }
+            }
+            {  // cr
+              int4 avarage_luma_c;
+              avarage_luma_c.x = avarage_luma[lid][i * true_chroma_subblock_size_x + j + 0];
+              avarage_luma_c.y = avarage_luma[lid][i * true_chroma_subblock_size_x + j + 1];
+              avarage_luma_c.z = avarage_luma[lid][i * true_chroma_subblock_size_x + j + 2];
+              avarage_luma_c.w = avarage_luma[lid][i * true_chroma_subblock_size_x + j + 3];
+
+              int2 chroma_plane = data.src_planes[2].xy;
+              int4 in_cr;
+              if (bit_depth == 8) {
+                uint cr_uint = src.Load(chroma_plane.y + ((y >> chroma_subsamp_y) + i) * chroma_plane.x +
+                                        (x >> chroma_subsamp_x) + j);
+                in_cr.x = (cr_uint >> 0) & 255;
+                in_cr.y = (cr_uint >> 8) & 255;
+                in_cr.z = (cr_uint >> 16) & 255;
+                in_cr.w = (cr_uint >> 24) & 255;
+              } else {
+                uint2 cr_uint = src.Load2(chroma_plane.y + ((y >> chroma_subsamp_y) + i) * chroma_plane.x +
+                                          ((x >> chroma_subsamp_x) + j) * 2);
+                in_cr.x = (cr_uint.x >> 0) & 0x03ff;
+                in_cr.y = (cr_uint.x >> 16) & 0x03ff;
+                in_cr.z = (cr_uint.y >> 0) & 0x03ff;
+                in_cr.w = (cr_uint.y >> 16) & 0x03ff;
+              }
+
+              int4 cr_to_scale = clamp(((avarage_luma_c * cr_luma_mult + cr_mult * in_cr) >> 6) + cr_offset, 0,
+                                       (256 << (bit_depth - 8)) - 1);
+
+              int4 scaled_cr;
+              if (bit_depth == 8) {
+                scaled_cr.x =
+                    data.scaling_lut[cr_to_scale.x].z * (cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 0]);
+                scaled_cr.y =
+                    data.scaling_lut[cr_to_scale.y].z * (cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 1]);
+                scaled_cr.z =
+                    data.scaling_lut[cr_to_scale.z].z * (cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 2]);
+                scaled_cr.w =
+                    data.scaling_lut[cr_to_scale.w].z * (cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 3]);
+              } else {
+                scaled_cr.x =
+                    scale_LUT_cr(cr_to_scale.x) * (cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 0]);
+                scaled_cr.y =
+                    scale_LUT_cr(cr_to_scale.y) * (cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 1]);
+                scaled_cr.z =
+                    scale_LUT_cr(cr_to_scale.z) * (cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 2]);
+                scaled_cr.w =
+                    scale_LUT_cr(cr_to_scale.w) * (cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 3]);
+              }
+
+              int4 out_cr =
+                  clamp(in_cr + ((scaled_cr + rounding_offset) >> data.params.scaling_shift), min_chroma, max_chroma);
+              if (!apply_cr) {
+                out_cr = in_cr;
+              }
+              if (data.is_10x3) {
+                cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 0] = out_cr.x;
+                cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 1] = out_cr.y;
+                cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 2] = out_cr.z;
+                cr_grain_temp[lid][i * true_chroma_subblock_size_x + j + 3] = out_cr.w;
+              } else {
+                if ((((y >> chroma_subsamp_y) + i) < (data.height >> chroma_subsamp_y)) &&
+                    (((x >> chroma_subsamp_x) + j) < (data.width >> chroma_subsamp_x))) {
+                  chroma_plane = data.dst_planes[2].xy;
+                  if (bit_depth == 8) {
+                    dst.Store(
+                        chroma_plane.y + ((y >> chroma_subsamp_y) + i) * chroma_plane.x + (x >> chroma_subsamp_x) + j,
+                        out_cr.x | (out_cr.y << 8) | (out_cr.z << 16) | (out_cr.w << 24));
+                  } else {
+                    dst.Store2(chroma_plane.y + ((y >> chroma_subsamp_y) + i) * chroma_plane.x +
+                                   ((x >> chroma_subsamp_x) + j) * 2,
+                               uint2((out_cr.x << 0) | (out_cr.y << 16), (out_cr.z << 0) | (out_cr.w << 16)));
+                  }
+                }
+              }
+            }
+          }
+
+          GroupMemoryBarrierWithGroupSync();
+
+          if (data.is_10x3) {
+            int x3 = Gid.x * 3 * true_chroma_subblock_size_x;
+            if (GTid.y < true_chroma_subblock_size_y) {
+              i = GTid.y;
+              int2 chroma_plane = data.dst_planes[1].xy;
+              for (int j3 = GTid.z * (true_chroma_subblock_size_x / 4) + GTid.x; j3 < true_chroma_subblock_size_x;
+                   j3 += (true_chroma_subblock_size_x / 4)) {
+                uint3 res;
+                res.x =
+                    cb_grain_temp[(j3 * 3 + 0) / (uint)true_chroma_subblock_size_x]
+                                 [i * true_chroma_subblock_size_x + (j3 * 3 + 0) % (uint)true_chroma_subblock_size_x];
+                res.y =
+                    cb_grain_temp[(j3 * 3 + 1) / (uint)true_chroma_subblock_size_x]
+                                 [i * true_chroma_subblock_size_x + (j3 * 3 + 1) % (uint)true_chroma_subblock_size_x];
+                res.z =
+                    cb_grain_temp[(j3 * 3 + 2) / (uint)true_chroma_subblock_size_x]
+                                 [i * true_chroma_subblock_size_x + (j3 * 3 + 2) % (uint)true_chroma_subblock_size_x];
+                if ((((y >> 1) + i) < (data.height >> 1)) && ((x3 + j3 * 3) < (data.width >> 1))) {
+                  dst.Store(chroma_plane.y + ((y >> 1) + i) * chroma_plane.x + 4 * x3 / 3U + j3 * 4,
+                            ((res.z & 0x3ff) << 20) | ((res.y & 0x3ff) << 10) | (res.x & 0x3ff));
+                }
+              }
+            } else {
+              i = GTid.y - true_chroma_subblock_size_y;
+              int2 chroma_plane = data.dst_planes[2].xy;
+              for (int j3 = GTid.z * (true_chroma_subblock_size_x / 4) + GTid.x; j3 < true_chroma_subblock_size_x;
+                   j3 += (true_chroma_subblock_size_x / 4)) {
+                uint3 res;
+                res.x =
+                    cr_grain_temp[(j3 * 3 + 0) / (uint)true_chroma_subblock_size_x]
+                                 [i * true_chroma_subblock_size_x + (j3 * 3 + 0) % (uint)true_chroma_subblock_size_x];
+                res.y =
+                    cr_grain_temp[(j3 * 3 + 1) / (uint)chroma_subblock_size_x]
+                                 [i * true_chroma_subblock_size_x + (j3 * 3 + 1) % (uint)true_chroma_subblock_size_x];
+                res.z =
+                    cr_grain_temp[(j3 * 3 + 2) / (uint)chroma_subblock_size_x]
+                                 [i * true_chroma_subblock_size_x + (j3 * 3 + 2) % (uint)true_chroma_subblock_size_x];
+                if ((((y >> 1) + i) < (data.height >> 1)) && ((x3 + j3 * 3) < (data.width >> 1))) {
+                  dst.Store(chroma_plane.y + (y / 2U + i) * chroma_plane.x + 4 * x3 / 3U + j3 * 4,
+                            ((res.z & 0x3ff) << 20) | ((res.y & 0x3ff) << 10) | (res.x & 0x3ff));
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}

diff --git a/libav1/dx/shaders/film_grain_gen_chroma.hlsl b/libav1/dx/shaders/film_grain_gen_chroma.hlsl
new file mode 100644
index 0000000..f592834
--- /dev/null
+++ b/libav1/dx/shaders/film_grain_gen_chroma.hlsl

@@ -0,0 +1,166 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "film_grain_const.h"
+
+StructuredBuffer<int> random : register(t0);
+StructuredBuffer<int> random_uv : register(t1);
+StructuredBuffer<int> gaussian_sequence : register(t2);
+RWStructuredBuffer<int> dst : register(u0);
+
+cbuffer cb_film_grain_data : register(b0) { FilmGrainGenData data; };
+
+uint get_random_number(uint random_register_new) {
+  uint bit;
+  bit = ((random_register_new >> 0) ^ (random_register_new >> 1) ^ (random_register_new >> 3) ^
+         (random_register_new >> 12)) &
+        1;
+  return (random_register_new >> 1) | (bit << 15);
+}
+int clamp_rand(int bits, uint rand) { return (rand >> (16 - bits)) & ((1 << bits) - 1); }
+
+uint init_random_generator(int luma_line, uint seed) {
+  // same for the picture
+
+  uint msb = (seed >> 8) & 255;
+  uint lsb = seed & 255;
+
+  uint random_register = (msb << 8) + lsb;
+
+  //  changes for each row
+  int luma_num = luma_line >> 5;
+
+  random_register ^= ((luma_num * 37 + 178) & 255) << 8;
+  random_register ^= ((luma_num * 173 + 105) & 255);
+  return random_register;
+}
+
+#define WGH 64
+groupshared int crcb_grain_temp[38 * 44];
+groupshared int luma_grain_temp[70 * 76];
+
+[numthreads(WGH, 1, 1)] void main(uint3 DTid
+                                  : SV_DispatchThreadID, uint3 GTid
+                                  : SV_GroupThreadID) {
+  const int chroma_block_size_y = data.chroma_block_size_y;
+  const int chroma_block_size_x = data.chroma_block_size_x;
+  const int chroma_grain_stride = data.chroma_grain_stride;
+  const int luma_grain_stride = data.luma_grain_stride;
+  const int dst_offset_u = data.dst_offset_u;
+  const int dst_offset_v = data.dst_offset_v;
+  int bit_depth = data.params.bit_depth;
+  int grain_center = 128 << (bit_depth - 8);
+  int grain_min = 0 - grain_center;
+  int grain_max = (256 << (bit_depth - 8)) - 1 - grain_center;
+  int gauss_sec_shift = 12 - bit_depth + data.params.grain_scale_shift;
+
+  int num_pos_chroma = 2 * data.params.ar_coeff_lag * (data.params.ar_coeff_lag + 1);
+  if (data.params.num_y_points > 0) ++num_pos_chroma;
+  int rounding_offset = (1 << (data.params.ar_coeff_shift - 1));
+  int skip = 0;
+  if (DTid.y == 0) {
+    if (!(data.params.num_cb_points || data.params.chroma_scaling_from_luma)) {
+      skip = 1;
+    }
+  } else {
+    if (!(data.params.num_cr_points || data.params.chroma_scaling_from_luma)) {
+      skip = 1;
+    }
+  }
+  if (!skip) {
+    if (GTid.x == 0) {
+      uint rand_reg = init_random_generator((7 + DTid.y * 4) << 5, data.params.random_seed);
+      rand_reg = get_random_number(rand_reg);
+      crcb_grain_temp[0] = rand_reg;
+      // could be optimizaed by 2^ setps parallelization
+      for (int i = 1; i < chroma_block_size_y; i++) {
+        rand_reg = random_uv[rand_reg];
+        crcb_grain_temp[i * chroma_grain_stride] = rand_reg;
+      }
+    }
+    GroupMemoryBarrier();
+    for (int i = GTid.x; i < chroma_block_size_y; i += WGH) {
+      uint rand_reg = crcb_grain_temp[i * chroma_grain_stride];
+      for (int j = 1; j < chroma_block_size_x; j++) {
+        rand_reg = get_random_number(rand_reg);
+        crcb_grain_temp[i * chroma_grain_stride + j] = rand_reg;
+      }
+    }
+    GroupMemoryBarrier();
+    for (int j = GTid.x; j < chroma_block_size_y * chroma_block_size_x; j += WGH) {
+      crcb_grain_temp[j] =
+          (gaussian_sequence[clamp_rand(gauss_bits, crcb_grain_temp[j])] + ((1 << gauss_sec_shift) >> 1)) >>
+          gauss_sec_shift;
+    }
+  } else {
+    for (int j = GTid.x; j < chroma_block_size_y * chroma_block_size_x; j += WGH) crcb_grain_temp[j] = 0;
+  }
+  int i;
+  for (i = 0; i < 70; i++) {
+    for (int j = GTid.x; j < 76; j += WGH) {
+      luma_grain_temp[i * 76 + j] = dst[(data.top_pad + i) * luma_grain_stride + j + data.left_pad];
+    }
+  }
+  GroupMemoryBarrierWithGroupSync();
+  for (i = data.top_pad + GTid.x; i < chroma_block_size_y - data.bottom_pad; i += WGH) {
+    for (int j = data.left_pad - GTid.x * (data.left_pad + 1); j < chroma_block_size_x - data.right_pad; j++) {
+      if (j < data.left_pad) {
+        continue;
+      }
+      int wsum_crcb = 0;
+      if (data.params.ar_coeff_lag == 3) {
+        for (uint pos = 0; pos < 24; pos++) {
+          wsum_crcb = wsum_crcb + data.params.ar_coeffs[1 + DTid.y][pos].x *
+                                      crcb_grain_temp[(i + pos / 7 - 3) * chroma_grain_stride + j + pos % 7 - 3];
+        }
+      } else {
+        for (int pos = 0; pos < num_pos_chroma - 1; pos++) {
+          wsum_crcb =
+              wsum_crcb +
+              data.params.ar_coeffs[1 + DTid.y][pos].x *
+                  crcb_grain_temp[(i + data.pred_pos[pos][0].y) * chroma_grain_stride + j + data.pred_pos[pos][1].y];
+        }
+      }
+      if (data.params.num_y_points > 0) {
+        int av_luma = 0;
+        int luma_coord_y = ((i - data.top_pad) << 1);
+        int luma_coord_x = ((j - data.left_pad) << 1);
+
+        for (int k = luma_coord_y; k < luma_coord_y + 1 + 1; k++)
+          for (int l = luma_coord_x; l < luma_coord_x + 1 + 1; l++) av_luma += luma_grain_temp[k * 76 + l];
+
+        av_luma = (av_luma + ((1 << (1 + 1)) >> 1)) >> (1 + 1);
+
+        wsum_crcb = wsum_crcb + data.params.ar_coeffs[DTid.y + 1][num_pos_chroma - 1].x * av_luma;
+      }
+      if (!skip)
+        crcb_grain_temp[i * chroma_grain_stride + j] =
+            clamp(crcb_grain_temp[i * chroma_grain_stride + j] +
+                      ((wsum_crcb + rounding_offset) >> data.params.ar_coeff_shift),
+                  grain_min, grain_max);
+    }
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (DTid.y == 0) {
+    for (int j = GTid.x; j < chroma_block_size_y * chroma_block_size_x; j += WGH) {
+      dst[dst_offset_u + j] = crcb_grain_temp[j];
+    }
+  } else {
+    for (int j = GTid.x; j < chroma_block_size_y * chroma_block_size_x; j += WGH) {
+      dst[dst_offset_v + j] = crcb_grain_temp[j];
+    }
+  }
+}

diff --git a/libav1/dx/shaders/film_grain_gen_luma.hlsl b/libav1/dx/shaders/film_grain_gen_luma.hlsl
new file mode 100644
index 0000000..a4c1b57
--- /dev/null
+++ b/libav1/dx/shaders/film_grain_gen_luma.hlsl

@@ -0,0 +1,108 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "film_grain_const.h"
+
+StructuredBuffer<int> random : register(t0);
+StructuredBuffer<int> random_uv : register(t1);
+StructuredBuffer<int> gaussian_sequence : register(t2);
+RWStructuredBuffer<int> dst : register(u0);
+
+cbuffer cb_film_grain_data : register(b0) { FilmGrainGenData data; };
+
+uint get_random_number(uint random_register_new) {
+  uint bit;
+  bit = ((random_register_new >> 0) ^ (random_register_new >> 1) ^ (random_register_new >> 3) ^
+         (random_register_new >> 12)) &
+        1;
+  return (random_register_new >> 1) | (bit << 15);
+}
+int clamp_rand(int bits, uint rand) { return (rand >> (16 - bits)) & ((1 << bits) - 1); }
+
+groupshared int luma_grain_temp[73 * 82];
+#define WGH 64
+[numthreads(WGH, 1, 1)] void main(uint3 thread
+                                  : SV_DispatchThreadID) {
+  const int luma_block_size_x = data.luma_block_size_x;
+  const int luma_block_size_y = data.luma_block_size_y;
+  const int luma_grain_stride = data.luma_grain_stride;
+
+  if (data.params.num_y_points == 0) {
+    for (int j = thread.x; j < luma_block_size_x * luma_block_size_y; j += WGH) dst[j] = 0;
+    return;
+  }
+  int bit_depth = data.params.bit_depth;
+  int grain_center = 128 << (bit_depth - 8);
+  int grain_min = 0 - grain_center;
+  int grain_max = (256 << (bit_depth - 8)) - 1 - grain_center;
+  int gauss_sec_shift = 12 - bit_depth + data.params.grain_scale_shift;
+
+  int rounding_offset = (1 << (data.params.ar_coeff_shift - 1));
+  int num_pos_luma = 2 * data.params.ar_coeff_lag * (data.params.ar_coeff_lag + 1);
+
+  if (thread.x == 0) {
+    uint rand_reg = get_random_number(data.params.random_seed);
+    luma_grain_temp[0] = rand_reg;
+    // could be optimizaed by 2^ setps parallelization
+    for (int i = 1; i < luma_block_size_y; i++) {
+      rand_reg = random[rand_reg];
+      luma_grain_temp[i * luma_grain_stride] = rand_reg;
+    }
+  }
+  GroupMemoryBarrierWithGroupSync();
+  int i, j;
+  for (i = thread.x; i < luma_block_size_y; i += WGH) {
+    uint rand_reg = luma_grain_temp[i * luma_grain_stride];
+    for (int j = 1; j < luma_block_size_x; j++) {
+      rand_reg = get_random_number(rand_reg);
+      luma_grain_temp[i * luma_grain_stride + j] = rand_reg;
+    }
+  }
+  GroupMemoryBarrierWithGroupSync();
+  for (j = thread.x; j < luma_block_size_x * luma_block_size_y; j += WGH) {
+    luma_grain_temp[j] =
+        (gaussian_sequence[clamp_rand(gauss_bits, luma_grain_temp[j])] + ((1 << gauss_sec_shift) >> 1)) >>
+        gauss_sec_shift;
+  }
+  GroupMemoryBarrierWithGroupSync();
+  for (i = data.top_pad + (thread.x); i < luma_block_size_y - data.bottom_pad; i += WGH) {
+    for (int j = data.left_pad - (thread.x) * (data.left_pad + 1); j < luma_block_size_x - data.right_pad; j++) {
+      if (j < data.left_pad) {
+        continue;
+      }
+      int wsum = 0;
+      int offset = (i - 3) * luma_grain_stride + j - 3;
+      if (data.params.ar_coeff_lag == 3) {
+        for (uint pos = 0; pos < 24; pos++) {
+          wsum += data.params.ar_coeffs[0][pos].x * luma_grain_temp[offset + (pos / 7) * luma_grain_stride + pos % 7];
+        }
+      } else {
+        for (int pos = 0; pos < num_pos_luma; pos++) {
+          wsum += data.params.ar_coeffs[0][pos].x *
+                  luma_grain_temp[(i + data.pred_pos[pos][0].x) * luma_grain_stride + j + data.pred_pos[pos][1].x];
+        }
+      }
+
+      int val =
+          clamp(luma_grain_temp[i * luma_grain_stride + j] + ((wsum + rounding_offset) >> data.params.ar_coeff_shift),
+                grain_min, grain_max);
+
+      luma_grain_temp[i * luma_grain_stride + j] = val;
+    }
+  }
+  GroupMemoryBarrierWithGroupSync();
+  for (j = thread.x; j < luma_block_size_x * luma_block_size_y; j += WGH) dst[j] = luma_grain_temp[j];
+}
\ No newline at end of file

diff --git a/libav1/dx/shaders/gen_lf_blocks_hor.hlsl b/libav1/dx/shaders/gen_lf_blocks_hor.hlsl
new file mode 100644
index 0000000..5fbdd4f
--- /dev/null
+++ b/libav1/dx/shaders/gen_lf_blocks_hor.hlsl

@@ -0,0 +1,228 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "mode_info.h"
+
+StructuredBuffer<MB_MODE_INFO> mi_pool : register(t0);
+ByteAddressBuffer mi_grid : register(t1);
+RWByteAddressBuffer dst_buf : register(u0);
+
+cbuffer GenLfData : register(b0) {
+  uint cb_mi_stride;
+  uint cb_mi_addr_base;
+  uint cb_delta_q_info_delta_lf_present_flag;
+  int cb_delta_q_info_delta_lf_multi;
+  int cb_lf_mode_ref_delta_enabled;
+  int3 reserved;
+  int2 cb_lf_filter_level;
+  int cb_lf_filter_level_u;
+  int cb_lf_filter_level_v;
+  int4 cb_lf_mode_deltas[2];
+  int4 cb_lf_ref_deltas[8];
+  int4 cb_seg_features[8];
+  int4 cb_seg_data[64];
+  int4 cb_lfi_n_lvl[3 * 8 * 8 * 2];
+};
+
+cbuffer GenLfSRT : register(b1) {
+  uint cb_wicount;
+  uint cb_mi_cols;
+  uint cb_mi_rows;
+  uint cb_plane;
+  uint cb_dst_offset;
+  uint cb_dst_stride;
+};
+
+int get_mi_index(ByteAddressBuffer grid, int index, uint base) {
+  uint2 addr = grid.Load2(index * 8);
+  if (addr.x == 0 && addr.y == 0)
+    return -1;
+  else {
+    addr.x -= base;
+    return addr.x / ModeInfoSize;
+  }
+}
+
+int get_filter_level_hor(int plane, int ref0, int mode, int segment_id, int delta_from_base, int delta[4]) {
+  segment_id = (segment_id >> 24) & 7;
+  const int mode_lut = (0x017f6000 >> (mode & 255)) & 1;
+  if (cb_delta_q_info_delta_lf_present_flag) {
+    const int lut_val = plane + 1;  // plane + 1 for DIR = 1;
+    int delta_lf;
+    if (cb_delta_q_info_delta_lf_multi) {
+      delta_lf = delta[lut_val];
+    } else {
+      delta_lf = delta_from_base;
+    }
+
+    int base_level;
+    if (plane == 0)
+      base_level = cb_lf_filter_level.y;
+    else if (plane == 1)
+      base_level = cb_lf_filter_level_u;
+    else
+      base_level = cb_lf_filter_level_v;
+
+    int lvl_seg = clamp(delta_lf + base_level, 0, 63);
+
+    const int seg_lf_feature_id = lut_val + 1;
+    if (cb_seg_features[segment_id].y & (1 << seg_lf_feature_id)) {
+      const int d = cb_seg_data[segment_id * 8 + seg_lf_feature_id].x;
+      lvl_seg = clamp(lvl_seg + d, 0, 63);
+    }
+
+    if (cb_lf_mode_ref_delta_enabled) {
+      const int scale = 1 << (lvl_seg >> 5);
+      lvl_seg += cb_lf_ref_deltas[ref0].x * scale;
+      if (ref0 > 0) lvl_seg += cb_lf_mode_deltas[mode_lut].x * scale;
+      lvl_seg = clamp(lvl_seg, 0, 63);
+    }
+    return lvl_seg;
+  } else {
+    return cb_lfi_n_lvl[2 * 8 * 8 * plane + 2 * 8 * segment_id + 2 * ref0 + mode_lut].y;
+  }
+}
+
+int get_tx_size_hor(int plane, int col, int row, int bw_log, int bh_log, int is_inter, int skip, uint tx_info,
+                    uint inter_tx_size[4]) {
+  int txstep = 1;
+  if (!cb_seg_features[(tx_info >> 24) & 7].x)  //! is_lossless?
+  {
+    int tx_size = 0;
+    if (plane == 0) {
+      if (is_inter && !skip) {
+        const int blk_row = row & ((1 << bh_log) - 1);
+        const int blk_col = col & ((1 << bw_log) - 1);
+        int bw_log1 = min(4, bw_log);
+        int bh_log1 = min(4, bh_log);
+        int tx_w_log = max(0, bw_log1 - (bw_log1 >= bh_log1));
+        int tx_h_log = max(0, bh_log1 - (bh_log1 >= bw_log1));
+        const int idx = ((blk_row >> tx_h_log) << (bw_log - tx_w_log)) + (blk_col >> tx_w_log);
+        tx_size = inter_tx_size[idx >> 2] >> ((idx & 3) << 3);
+      } else {
+        tx_size = tx_info;
+      }
+      tx_size &= 255;
+      // tx_size = clamp(tx_size, 0, 18);
+      const int tx_size_high[] = {1, 2, 4, 8, 16, 2, 1, 4, 2, 8, 4, 16, 8, 4, 1, 8, 2, 16, 4};
+      txstep = tx_size_high[tx_size];
+    } else {
+      txstep = 1 << clamp(bh_log - 1, 0, 3);
+    }
+  }
+  return txstep;
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wicount) return;
+
+  const int mi_size_wide_log2[] = {0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 0, 2, 1, 3, 2, 4};
+  const int mi_size_high_log2[] = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 2, 0, 3, 1, 4, 2};
+
+  const int plane = cb_plane;
+  const int subsampling = plane > 0;
+  const int mi_cols = cb_mi_cols;
+
+  const int mi_col = thread.x % mi_cols;
+  const int mi_row = (thread.x / mi_cols) * 16;
+
+  int r_end = clamp(cb_mi_rows - mi_row, 0, 16);
+
+  const int dst_addr = cb_dst_offset + mi_col * 32 + cb_dst_stride * mi_row;
+  dst_buf.Store4(dst_addr, uint4(0, 0, 0, 0));
+  dst_buf.Store4(dst_addr + 16, uint4(0, 0, 0, 0));
+
+  int r = 0;
+  uint edges = 0;
+  for (; r < r_end;) {
+    const int col = (mi_col << subsampling) | subsampling;
+    const int row = ((mi_row + r) << subsampling) | subsampling;
+
+    const int mi0 = col + row * cb_mi_stride;
+    const int mi_idx = get_mi_index(mi_grid, mi0, cb_mi_addr_base);
+
+    int filter_length = 0;
+    int filter_level = 0;
+
+    if (mi_idx == -1) break;
+
+    uint tx_info = mi_pool[mi_idx].tx_info;
+    uint blk_type = mi_pool[mi_idx].block_type;
+
+    const int bsize = blk_type & 255;
+    const int bw_log = mi_size_wide_log2[bsize];
+    const int bh_log = mi_size_high_log2[bsize];
+    const int skip = (tx_info & 0xff00) != 0;
+    const int ref0 = ((int)blk_type << 8) >> 24;
+    const int is_inter = ref0 > 0 || (mi_pool[mi_idx].intra_mode_flags & 0x100) != 0;
+    const int ts =
+        get_tx_size_hor(plane, col, row, bw_log, bh_log, is_inter, skip, tx_info, mi_pool[mi_idx].inter_tx_size);
+
+    if ((mi_row + r) & (ts - 1)) break;
+
+    const int curr_level = get_filter_level_hor(plane, ref0, mi_pool[mi_idx].modes, tx_info,
+                                                mi_pool[mi_idx].delta_lf_from_base, mi_pool[mi_idx].delta_lf);
+    const int curr_skipped = skip && is_inter;
+    filter_level = curr_level;
+    if ((mi_row + r) > 0) {
+      const int mi1 = mi0 - (cb_mi_stride << subsampling);
+      const int mi1_idx = get_mi_index(mi_grid, mi1, cb_mi_addr_base);
+      if (mi1_idx == -1) break;
+
+      uint pv_tx_info = mi_pool[mi1_idx].tx_info;
+      uint pv_blk_type = mi_pool[mi1_idx].block_type;
+      const int pv_bsize = pv_blk_type & 255;
+      const int pv_bw_log = mi_size_wide_log2[pv_bsize];
+      const int pv_bh_log = mi_size_high_log2[pv_bsize];
+      const int pv_skip = (pv_tx_info & 0xff00) != 0;
+      const int pv_ref0 = ((int)pv_blk_type << 8) >> 24;
+      const int pv_is_inter = pv_ref0 > 0 || (mi_pool[mi1_idx].intra_mode_flags & 0x100) != 0;
+      const int pv_ts = get_tx_size_hor(plane, col, row - (1 << subsampling), pv_bw_log, pv_bh_log, pv_is_inter,
+                                        pv_skip, pv_tx_info, mi_pool[mi1_idx].inter_tx_size);
+      const int pv_lvl = get_filter_level_hor(plane, pv_ref0, mi_pool[mi1_idx].modes, pv_tx_info,
+                                              mi_pool[mi1_idx].delta_lf_from_base, mi_pool[mi1_idx].delta_lf);
+
+      const int pu_edge = ((mi_row + r) & ((1 << max(bh_log - subsampling, 0)) - 1)) == 0;
+
+      if ((curr_level || pv_lvl) && (!(pv_skip && pv_is_inter) || !curr_skipped || pu_edge)) {
+        const int min_ts = min(ts, pv_ts);
+        if (min_ts <= 1) {
+          filter_length = 1;  // 4;
+        } else if (min_ts == 2) {
+          filter_length = 3 - subsampling;
+        } else {
+          filter_length = 4;
+          if (plane != 0) {
+            filter_length = 2;  // 6;
+          }
+        }
+        filter_level = (curr_level) ? (curr_level) : (pv_lvl);
+      }
+    }
+
+    uint edge = 0;
+    if (filter_length)
+      edge = filter_level | ((filter_length - 1) << 6) | (ts << 8);
+    else
+      edge = ts << 8;
+
+    if (r & 1) edge = edges | (edge << 16);
+    dst_buf.Store(dst_addr + (r & ~1) * 2, edge);
+    edges = edge;
+    r += ts;
+  }
+}

diff --git a/libav1/dx/shaders/gen_lf_blocks_vert.hlsl b/libav1/dx/shaders/gen_lf_blocks_vert.hlsl
new file mode 100644
index 0000000..68bbf15
--- /dev/null
+++ b/libav1/dx/shaders/gen_lf_blocks_vert.hlsl

@@ -0,0 +1,235 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "mode_info.h"
+
+StructuredBuffer<MB_MODE_INFO> mi_pool : register(t0);
+ByteAddressBuffer mi_grid : register(t1);
+RWByteAddressBuffer dst_buf : register(u0);
+
+cbuffer GenLfData : register(b0) {
+  uint cb_mi_stride;
+  uint cb_mi_addr_base;
+  uint cb_delta_q_info_delta_lf_present_flag;
+  int cb_delta_q_info_delta_lf_multi;
+  int cb_lf_mode_ref_delta_enabled;
+  int3 reserved;
+  int2 cb_lf_filter_level;
+  int cb_lf_filter_level_u;
+  int cb_lf_filter_level_v;
+  int4 cb_lf_mode_deltas[2];
+  int4 cb_lf_ref_deltas[8];
+  int4 cb_seg_features[8];
+  int4 cb_seg_data[64];
+  int4 cb_lfi_n_lvl[3 * 8 * 8 * 2];
+};
+
+cbuffer GenLfSRT : register(b1) {
+  uint cb_wicount;
+  uint cb_mi_cols;
+  uint cb_mi_rows;
+  uint cb_plane;
+  uint cb_dst_offset;
+  uint cb_dst_stride;
+};
+
+int get_mi_index(ByteAddressBuffer grid, int index, uint base) {
+  uint2 addr = grid.Load2(index * 8);
+  if (addr.x == 0 && addr.y == 0)
+    return -1;
+  else {
+    addr.x -= base;
+    return addr.x / ModeInfoSize;
+  }
+}
+
+int get_filter_level_hor(int plane, int ref0, int mode, int segment_id, int delta_from_base, int delta[4]) {
+  segment_id = (segment_id >> 24) & 7;
+  const int mode_lut = (0x017f6000 >> (mode & 255)) & 1;
+  if (cb_delta_q_info_delta_lf_present_flag) {
+    const int lut_val = plane + (plane > 0);  // plane + 1 for DIR = 1;
+    int delta_lf;
+    if (cb_delta_q_info_delta_lf_multi) {
+      delta_lf = delta[lut_val];
+    } else {
+      delta_lf = delta_from_base;
+    }
+
+    int base_level;
+    if (plane == 0)
+      base_level = cb_lf_filter_level.x;
+    else if (plane == 1)
+      base_level = cb_lf_filter_level_u;
+    else
+      base_level = cb_lf_filter_level_v;
+
+    int lvl_seg = clamp(delta_lf + base_level, 0, 63);
+
+    const int seg_lf_feature_id = lut_val + 1;
+    if (cb_seg_features[segment_id].y & (1 << seg_lf_feature_id)) {
+      const int d = cb_seg_data[segment_id * 8 + seg_lf_feature_id].x;
+      lvl_seg = clamp(lvl_seg + d, 0, 63);
+    }
+
+    if (cb_lf_mode_ref_delta_enabled) {
+      const int scale = 1 << (lvl_seg >> 5);
+      lvl_seg += cb_lf_ref_deltas[ref0].x * scale;
+      if (ref0 > 0) lvl_seg += cb_lf_mode_deltas[mode_lut].x * scale;
+      lvl_seg = clamp(lvl_seg, 0, 63);
+    }
+    return lvl_seg;
+  } else {
+    return cb_lfi_n_lvl[2 * 8 * 8 * plane + 2 * 8 * segment_id + 2 * ref0 + mode_lut].x;
+  }
+}
+
+int get_tx_size_hor(int plane, int col, int row, int bw_log, int bh_log, int is_inter, int skip, uint tx_info,
+                    uint inter_tx_size[4]) {
+  int txstep = 1;
+  if (!cb_seg_features[(tx_info >> 24) & 7].x)  //! is_lossless?
+  {
+    int tx_size = 0;
+    if (plane == 0) {
+      if (is_inter && !skip) {
+        const int blk_row = row & ((1 << bh_log) - 1);
+        const int blk_col = col & ((1 << bw_log) - 1);
+        int bw_log1 = min(4, bw_log);
+        int bh_log1 = min(4, bh_log);
+        int tx_w_log = max(0, bw_log1 - (bw_log1 >= bh_log1));
+        int tx_h_log = max(0, bh_log1 - (bh_log1 >= bw_log1));
+        const int idx = ((blk_row >> tx_h_log) << (bw_log - tx_w_log)) + (blk_col >> tx_w_log);
+        tx_size = inter_tx_size[idx >> 2] >> ((idx & 3) << 3);
+      } else {
+        tx_size = tx_info;
+      }
+      tx_size &= 255;
+      // tx_size = clamp(tx_size, 0, 18);
+      const int tx_size_wide[] = {1, 2, 4, 8, 16, 1, 2, 2, 4, 4, 8, 8, 16, 1, 4, 2, 8, 4, 16};
+      txstep = tx_size_wide[tx_size];
+    } else {
+      txstep = 1 << clamp(bw_log - 1, 0, 3);
+    }
+  }
+  return txstep;
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wicount) return;
+
+  const int mi_size_wide_log2[] = {0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 0, 2, 1, 3, 2, 4};
+  const int mi_size_high_log2[] = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 2, 0, 3, 1, 4, 2};
+
+  const int plane = cb_plane;
+  const int subsampling = plane > 0;
+  const int mi_cols = cb_mi_cols;
+
+  const int count_x = (mi_cols + 15) >> 4;
+  const int mi_col = (thread.x % count_x) * 16;
+  const int mi_row = thread.x / count_x;
+
+  int c_end = clamp(mi_cols - mi_col, 0, 16);
+  const int dst_addr = cb_dst_offset + mi_col * 2 + cb_dst_stride * mi_row;
+  dst_buf.Store4(dst_addr, uint4(0, 0, 0, 0));
+  dst_buf.Store4(dst_addr + 16, uint4(0, 0, 0, 0));
+
+  int err = 0;
+  int c = 0;
+  uint edges = 0;
+  for (; c < c_end;) {
+    const int col = ((mi_col + c) << subsampling) | subsampling;
+    const int row = (mi_row << subsampling) | subsampling;
+
+    const int mi0 = col + row * cb_mi_stride;
+    const int mi_idx = get_mi_index(mi_grid, mi0, cb_mi_addr_base);
+
+    int filter_length = 0;
+    int filter_level = 0;
+
+    if (mi_idx == -1) {
+      err = 1;
+      break;
+    }
+    uint tx_info = mi_pool[mi_idx].tx_info;
+    uint blk_type = mi_pool[mi_idx].block_type;
+
+    const int bsize = blk_type & 255;
+    const int bw_log = mi_size_wide_log2[bsize];
+    const int bh_log = mi_size_high_log2[bsize];
+    const int skip = (tx_info & 0xff00) != 0;
+    const int ref0 = ((int)blk_type << 8) >> 24;
+    const int is_inter = ref0 > 0 || (mi_pool[mi_idx].intra_mode_flags & 0x100) != 0;
+    const int ts =
+        get_tx_size_hor(plane, col, row, bw_log, bh_log, is_inter, skip, tx_info, mi_pool[mi_idx].inter_tx_size);
+
+    if ((mi_col + c) & (ts - 1)) {
+      err = 2;
+      break;
+    }
+    const int curr_level = get_filter_level_hor(plane, ref0, mi_pool[mi_idx].modes, tx_info,
+                                                mi_pool[mi_idx].delta_lf_from_base, mi_pool[mi_idx].delta_lf);
+    const int curr_skipped = skip && is_inter;
+    filter_level = curr_level;
+    if ((mi_col + c) > 0) {
+      const int mi1 = mi0 - (1 << subsampling);
+      const int mi1_idx = get_mi_index(mi_grid, mi1, cb_mi_addr_base);
+      if (mi1_idx == -1) {
+        err = 3;
+        break;
+      }
+      uint pv_tx_info = mi_pool[mi1_idx].tx_info;
+      uint pv_blk_type = mi_pool[mi1_idx].block_type;
+      const int pv_bsize = pv_blk_type & 255;
+      const int pv_bw_log = mi_size_wide_log2[pv_bsize];
+      const int pv_bh_log = mi_size_high_log2[pv_bsize];
+      const int pv_skip = (pv_tx_info & 0xff00) != 0;
+      const int pv_ref0 = ((int)pv_blk_type << 8) >> 24;
+      const int pv_is_inter = pv_ref0 > 0 || (mi_pool[mi1_idx].intra_mode_flags & 0x100) != 0;
+      const int pv_ts = get_tx_size_hor(plane, col - (1 << subsampling), row, pv_bw_log, pv_bh_log, pv_is_inter,
+                                        pv_skip, pv_tx_info, mi_pool[mi1_idx].inter_tx_size);
+      const int pv_lvl = get_filter_level_hor(plane, pv_ref0, mi_pool[mi1_idx].modes, pv_tx_info,
+                                              mi_pool[mi1_idx].delta_lf_from_base, mi_pool[mi1_idx].delta_lf);
+
+      const int pu_edge = ((mi_col + c) & ((1 << max(bw_log - subsampling, 0)) - 1)) == 0;
+
+      if ((curr_level || pv_lvl) && (!(pv_skip && pv_is_inter) || !curr_skipped || pu_edge)) {
+        const int min_ts = min(ts, pv_ts);
+        if (min_ts <= 1) {
+          filter_length = 1;  // 4;
+        } else if (min_ts == 2) {
+          filter_length = 3 - subsampling;
+        } else {
+          filter_length = 4;
+          if (plane != 0) {
+            filter_length = 2;  // 6;
+          }
+        }
+        filter_level = (curr_level) ? (curr_level) : (pv_lvl);
+      }
+    }
+
+    uint edge = 0;
+    if (filter_length)
+      edge = filter_level | ((filter_length - 1) << 6) | (ts << 8);
+    else
+      edge = ts << 8;
+
+    if (c & 1) edge = edges | (edge << 16);
+    dst_buf.Store(dst_addr + (c & ~1) * 2, edge);
+    edges = edge;
+    c += ts;
+  }
+}

diff --git a/libav1/dx/shaders/gen_pred_blocks.hlsl b/libav1/dx/shaders/gen_pred_blocks.hlsl
new file mode 100644
index 0000000..5bfbee9
--- /dev/null
+++ b/libav1/dx/shaders/gen_pred_blocks.hlsl

@@ -0,0 +1,978 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "mode_info.h"
+
+#define Warp 0
+#define CasualInter 1
+#define CompoundAvrg 2
+#define CompoundDiff 3
+#define CompoundMasked 4
+#define CompoundGlobalWarp 5
+#define CompoundDiffUv 6
+#define CompoundDiffUvGlobalWarp 7
+#define ObmcAbove 8
+#define ObmcLeft 9
+#define Inter2x2 10
+#define Inter2x2Comp 11
+#define Inter2x2CompP2 12
+#define InterSizesAllCommon 24
+#define Inter2x2ArrOffset 216
+#define Inter2x2Count 3
+#define InterCountsAll 219
+
+#define CompoundTypeAvrg 0
+#define CompoundTypeMasked 1
+#define CompoundTypeDiffY 2
+#define CompoundTypeDiffUv 3
+
+#define IntraSizes 9
+#define ReconstructBlockSizes 36
+#define IntraTypeCount 10
+#define IntraBlockOffset 264
+#define ReconBlockOffset 219
+
+#define DC_PRED 0
+#define V_PRED 1
+#define H_PRED 2
+#define D45_PRED 3
+#define D135_PRED 4
+#define D113_PRED 5
+#define D157_PRED 6
+#define D203_PRED 7
+#define D67_PRED 8
+#define SMOOTH_PRED 9
+#define SMOOTH_V_PRED 10
+#define SMOOTH_H_PRED 11
+#define PAETH_PRED 12
+#define UV_CFL_PRED 13
+#define NEARESTMV 13
+#define NEARMV 14
+#define GLOBALMV 15
+#define NEWMV 16
+// Compound ref compound modes
+#define NEAREST_NEARESTMV 17
+#define NEAR_NEARMV 18
+#define NEAREST_NEWMV 19
+#define NEW_NEARESTMV 20
+#define NEAR_NEWMV 21
+#define NEW_NEARMV 22
+#define GLOBAL_GLOBALMV 23
+#define NEW_NEWMV 24
+#define MB_MODE_COUNT 25
+#define SINGLE_INTER_MODE_START NEARESTMV
+#define SINGLE_INTER_MODE_END NEAREST_NEARESTMV
+
+#define BLOCK_4X4 0
+#define BLOCK_4X8 1
+#define BLOCK_8X4 2
+#define BLOCK_8X8 3
+#define BLOCK_8X16 4
+#define BLOCK_16X8 5
+#define BLOCK_16X16 6
+#define BLOCK_16X32 7
+#define BLOCK_32X16 8
+#define BLOCK_32X32 9
+#define BLOCK_32X64 10
+#define BLOCK_64X32 11
+#define BLOCK_64X64 12
+#define BLOCK_64X128 13
+#define BLOCK_128X64 14
+#define BLOCK_128X128 15
+#define BLOCK_4X16 16
+#define BLOCK_16X4 17
+#define BLOCK_8X32 18
+#define BLOCK_32X8 19
+#define BLOCK_16X64 20
+#define BLOCK_64X16 21
+
+#define SIMPLE_TRANSLATION 0
+#define OBMC_CAUSAL 1
+#define WARPED_CAUSAL 2
+#define MOTION_MODES 3
+
+#define COMPOUND_AVERAGE 0
+#define COMPOUND_DISTWTD 1
+#define COMPOUND_WEDGE 2
+#define COMPOUND_DIFFWTD 3
+#define COMPOUND_TYPES 4
+#define MASKED_COMPOUND_TYPES 2
+
+#define InterNoSkipFlag 0x2000
+#define NeedAboveLut 0x3f7f
+#define NeedLeftLut 0x3Ef7
+#define NeedRightLut 0x010A
+#define NeedBotLut 0x0084
+#define NeedAboveLeftLut 0x11ff
+#define InterFilterLut 0x25432010
+
+StructuredBuffer<MB_MODE_INFO> buffer_mi : register(t0);
+ByteAddressBuffer blocks_indexes : register(t1);
+ByteAddressBuffer blocks_index_base : register(t2);
+ByteAddressBuffer mi_grid : register(t3);
+ByteAddressBuffer intra_iter_grid : register(t4);
+
+RWByteAddressBuffer pred_blocks : register(u0);
+RWByteAddressBuffer pred_blocks_warp : register(u1);
+
+cbuffer GenBlockData : register(b0) {
+  uint cb_mi_cols;
+  uint cb_mi_rows;
+  uint cb_mi_stride;
+  uint cb_mi_addr_base;
+  uint cb_iter_grid_stride;
+  uint cb_iter_grid_offset_uv;
+  uint cb_iter_grid_stride_uv;
+  uint cb_disable_edge_filter;
+  uint cb_force_integet_mv;
+  int3 cb_reserved;
+  int4 cb_wedge_offsets[22];  //??
+  int4 cb_dist_wtd[8 * 8];
+  int4 cb_lossless_seg[8];
+  int4 cb_global_warp[8];
+  struct {
+    WarpedMotionParams params;
+    int pad;
+  } cb_wm_params[8];
+};
+
+cbuffer GenBlockSRT : register(b1) {
+  uint cb_wi_count;
+  uint cb_mi_offset;
+  uint cb_mi_idx_base;
+  uint cb_col_srart;
+  uint cb_row_srart;
+  uint cb_index_offset;
+  uint cb_index_offset_warp;
+};
+
+int intra_edge_filter_strength(int blk_wh, int d, int type) {
+  int strength = 0;
+  if (type == 0) {
+    if (blk_wh <= 8) {
+      if (d >= 56) strength = 1;
+    } else if (blk_wh <= 12) {
+      if (d >= 40) strength = 1;
+    } else if (blk_wh <= 16) {
+      if (d >= 40) strength = 1;
+    } else if (blk_wh <= 24) {
+      if (d >= 8) strength = 1;
+      if (d >= 16) strength = 2;
+      if (d >= 32) strength = 3;
+    } else if (blk_wh <= 32) {
+      if (d >= 1) strength = 1;
+      if (d >= 4) strength = 2;
+      if (d >= 32) strength = 3;
+    } else {
+      if (d >= 1) strength = 3;
+    }
+  } else {
+    if (blk_wh <= 8) {
+      if (d >= 40) strength = 1;
+      if (d >= 64) strength = 2;
+    } else if (blk_wh <= 16) {
+      if (d >= 20) strength = 1;
+      if (d >= 48) strength = 2;
+    } else if (blk_wh <= 24) {
+      if (d >= 4) strength = 3;
+    } else {
+      if (d >= 1) strength = 3;
+    }
+  }
+  return strength;
+}
+
+uint get_mi_index(ByteAddressBuffer grid, int index, uint base) {
+  uint addr = grid.Load(index * 8);
+  addr -= base;
+  return addr / ModeInfoSize;
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int InterBlockSizeIndexLUT[6][6] = {
+      // h:        4    8    16    32    64    128
+      {0, 1, 2, -1, -1, -1},     // w = 4 (4)
+      {3, 4, 5, 6, -1, -1},      // w = 8
+      {7, 8, 9, 10, 11, -1},     // w = 16
+      {-1, 12, 13, 14, 15, -1},  // w = 32
+      {-1, -1, 16, 17, 18, 19},  // w = 64
+      {-1, -1, -1, -1, 20, 21}   // w = 128
+  };
+  const int mi_size_wide_log2[] = {0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 0, 2, 1, 3, 2, 4};
+  const int mi_size_high_log2[] = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5, 4, 5, 2, 0, 3, 1, 4, 2};
+
+  const int mi_addr = cb_mi_offset + thread.x;
+  MB_MODE_INFO mi = buffer_mi[mi_addr];
+
+  int bsize = mi.block_type & 255;
+  const int bw_log = mi_size_wide_log2[bsize];
+  const int bh_log = mi_size_high_log2[bsize];
+  const int bw = 1 << bw_log;
+  const int bh = 1 << bh_log;
+  const int bw_log_uv = max(0, bw_log - 1);
+  const int bh_log_uv = max(0, bh_log - 1);
+
+  const uint mi_row = mi.mi_row;
+  const uint mi_col = mi.mi_col;
+  const int is_chroma_ref = ((mi_row & 1) == 0 && (bh & 1) == 1) || ((mi_col & 1) == 0 && (bw & 1) == 1);
+
+  const int ref0 = ((int)mi.block_type << 8) >> 24;
+  const int ref1 = ((int)mi.block_type) >> 24;
+
+  const int mode = mi.modes & 255;
+  const int is_inter_intra = ref0 > 0 && ref1 == 0 && bsize >= BLOCK_8X8 && bsize <= BLOCK_32X32 &&
+                             mode >= SINGLE_INTER_MODE_START && mode < SINGLE_INTER_MODE_END;
+  int index_addr = (mi.index_base + cb_mi_idx_base) * 4;
+  const int motion_mode = mi.modes >> 24;
+  const int is_obmc_left = ref0 > 0 && motion_mode == OBMC_CAUSAL && mi_col > cb_col_srart;
+  const int is_obmc_above = ref0 > 0 && motion_mode == OBMC_CAUSAL && mi_row > cb_row_srart;
+
+  if (ref0 > 0) {
+    const int is_compound = ref1 > 0;
+    const int allow_warp = cb_force_integet_mv == 0 && bw_log > 0 && bh_log > 0;
+    const int allow_global_warp = allow_warp && (mode == GLOBALMV || mode == GLOBAL_GLOBALMV);
+    const int is_global_warp0 = cb_global_warp[ref0 - 1].x && allow_global_warp;
+    const int is_global_warp1 = (is_compound == 0 || allow_global_warp == 0) ? 0 : cb_global_warp[ref1 - 1].x;
+    const int is_local_warp = motion_mode == WARPED_CAUSAL && (mi.wm_params.type & 0x10000) == 0;
+    const int is_luma_warp = (is_local_warp || is_global_warp0) && allow_warp;
+
+    const int no_skip_flag =
+        ((mi.tx_info & 0xff00) == 0 && !is_inter_intra && !is_obmc_left && !is_obmc_above) ? InterNoSkipFlag : 0;
+    const int block_size_id_y = InterBlockSizeIndexLUT[bw_log][bh_log];
+    const int comp_type = mi.interinter_comp.type;
+    uint wtd = 0;
+    const int wedge_idx = mi.interinter_comp.wedge_sign + mi.interinter_comp.wedge_index * 2;
+
+    if (is_compound) {
+      wtd = 0x88;
+      if (comp_type == COMPOUND_DISTWTD) {
+        wtd = cb_dist_wtd[ref0 - 1 + (ref1 - 1) * 8].x;
+      } else if (comp_type == COMPOUND_WEDGE) {
+        wtd = cb_wedge_offsets[bsize].x + (wedge_idx << (bw_log + bh_log - 2));
+      } else if (comp_type == COMPOUND_DIFFWTD) {
+        wtd = mi.interinter_comp.mask_type;
+      }
+      wtd <<= 17;
+
+      const int is_warp_compound = is_global_warp0 || is_global_warp1;
+      const uint filter_type_h =
+          (InterFilterLut >> ((((mi.interp_filters >> 16) & 15) << 2) + ((bw_log > 0) << 4))) & 7;
+      const uint filter_type_v = (InterFilterLut >> (((mi.interp_filters & 15) << 2) + ((bh_log > 0) << 4))) & 7;
+      const int gpu_comp_type = (0x2100 >> (comp_type * 4)) & 15;
+
+      uint4 block;
+      block.x = mi_col | (mi_row << 16);
+      block.y = ((ref0 - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) | ((ref1 - 1) << 14) | wtd |
+                no_skip_flag | (gpu_comp_type << 30);
+
+      block.z = (mi.mv[0] << 1) & 0xfffeffff;
+      block.w = (mi.mv[1] << 1) & 0xfffeffff;
+
+      const int pass_type = is_warp_compound ? 5 : ((0x3422 >> (comp_type * 4)) & 15);
+      int pass_type_index = (pass_type - 1) * InterSizesAllCommon + block_size_id_y;
+
+      int dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + pass_type_index)) + blocks_indexes.Load(index_addr);
+      index_addr += 4;
+      pred_blocks.Store4(dst_ptr * 16, block);
+    } else {
+      if (is_luma_warp) {
+        int dst_ptr =
+            blocks_index_base.Load(4 * (cb_index_offset_warp + block_size_id_y)) + blocks_indexes.Load(index_addr);
+        index_addr += 4;
+        dst_ptr *= 48;
+        pred_blocks_warp.Store(dst_ptr, mi_col | (mi_row << 16));
+        pred_blocks_warp.Store(dst_ptr + 4, ((ref0 - 1) << 2) | no_skip_flag);
+
+        WarpedMotionParams params;
+        if (is_local_warp)
+          params = mi.wm_params;
+        else
+          params = cb_wm_params[ref0 - 1].params;
+
+        pred_blocks_warp.Store4(dst_ptr + 8, params.mat[0]);
+        pred_blocks_warp.Store2(dst_ptr + 24, params.mat[1].xy);
+        int4 angle32;
+        angle32.x = ((int)(params.angles.x << 16)) >> 16;
+        angle32.y = ((int)params.angles.x) >> 16;
+        angle32.w = ((int)(params.angles.y << 16)) >> 16;
+        angle32.z = ((int)params.angles.y) >> 16;
+        pred_blocks_warp.Store4(dst_ptr + 32, angle32);
+      } else {
+        uint4 block;
+        block.x = mi_col | (mi_row << 16);
+        const uint filter_type_h =
+            (InterFilterLut >> ((((mi.interp_filters >> 16) & 15) << 2) + ((bw_log > 0) << 4))) & 7;
+        const uint filter_type_v = (InterFilterLut >> (((mi.interp_filters & 15) << 2) + ((bh_log > 0) << 4))) & 7;
+        block.y = ((ref0 - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) | no_skip_flag;
+        block.z = (mi.mv[0] << 1) & 0xfffeffff;
+        block.w = 0;
+
+        int dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + block_size_id_y)) + blocks_indexes.Load(index_addr);
+        index_addr += 4;
+        pred_blocks.Store4(dst_ptr * 16, block);
+      }
+    }
+
+    if (!is_chroma_ref) {
+      const int block_size_id_uv = InterBlockSizeIndexLUT[bw_log_uv][bh_log_uv];
+      const int mi_col_uv = mi_col >> 1;
+      const int mi_row_uv = mi_row >> 1;
+      int sub8x8 = bw_log == 0 || bh_log == 0;
+
+      int mi_addr_above = mi_addr;
+      int mi_addr_left = mi_addr;
+      int mi_addr_aboveleft = mi_addr;
+      if (sub8x8) {
+        int dy = bh_log == 0 ? -1 : 0;
+        int dx = bw_log == 0 ? -1 : 0;
+
+        mi_addr_left = get_mi_index(mi_grid, mi_col + dx + mi_row * cb_mi_stride, cb_mi_addr_base);
+        mi_addr_above = get_mi_index(mi_grid, mi_col + (mi_row + dy) * cb_mi_stride, cb_mi_addr_base);
+        mi_addr_aboveleft = get_mi_index(mi_grid, mi_col + dx + (mi_row + dy) * cb_mi_stride, cb_mi_addr_base);
+
+        sub8x8 &= (((int)buffer_mi[mi_addr_left].block_type << 8) >> 24) > 0;
+        sub8x8 &= (((int)buffer_mi[mi_addr_above].block_type << 8) >> 24) > 0;
+        sub8x8 &= (((int)buffer_mi[mi_addr_aboveleft].block_type << 8) >> 24) > 0;
+      }
+
+      if (sub8x8) {
+        int x = mi_col & (~1);
+        int y = mi_row & (~1);
+        const int brows = bh_log == 2 ? 4 : 2;
+        const int bcols = bw_log == 2 ? 4 : 2;
+        const int bh_flag = bh_log != 0 ? ((brows - 1) << 28) : 0;  // for scale
+        const int bw_flag = bw_log != 0 ? ((bcols - 1) << 26) : 0;
+        for (int row = 0; row < brows; ++row) {
+          int mi_index_1 = row == 0 ? mi_addr_above : mi_addr;
+          if (bw_log == 0) {
+            const int mi_index_0 = row == 0 ? mi_addr_aboveleft : mi_addr_left;
+
+            const int block_type0 = (int)buffer_mi[mi_index_0].block_type;
+            const int block_type1 = (int)buffer_mi[mi_index_1].block_type;
+
+            const int is_compound0 = (block_type0 >> 24) > 0;
+            const int is_compound1 = (block_type1 >> 24) > 0;
+
+            const int diff_comp = is_compound0 != is_compound1;
+            int type_index0 = Inter2x2ArrOffset + (is_compound0 << diff_comp);
+            int type_index1 = Inter2x2ArrOffset + (is_compound1 << diff_comp);
+
+            const int interp_filters0 = buffer_mi[mi_index_0].interp_filters;
+            const int interp_filters1 = buffer_mi[mi_index_1].interp_filters;
+            const int flags = 1 |                                     // U-plane
+                              (is_compound0 == is_compound1) << 25 |  // combo write
+                              no_skip_flag | bh_flag;
+
+            int dst_ptr0 =
+                blocks_index_base.Load(4 * (cb_index_offset + type_index0)) + blocks_indexes.Load(index_addr);
+            index_addr += 4;
+            int dst_ptr1 = dst_ptr0 + 1;
+            if (diff_comp) {
+              dst_ptr1 = blocks_index_base.Load(4 * (cb_index_offset + type_index1)) + blocks_indexes.Load(index_addr);
+              index_addr += 4;
+            }
+
+            dst_ptr0 *= 16;
+            dst_ptr1 *= 16;
+
+            uint filter_type_h =
+                (InterFilterLut >> ((((interp_filters0 >> 16) & 15) << 2) + ((bw_log_uv > 0) << 4))) & 7;
+            uint filter_type_v = (InterFilterLut >> (((interp_filters0 & 15) << 2) + ((bh_log_uv > 0) << 4))) & 7;
+            uint4 block0;
+            block0.x = x | ((y + row) << 16);
+            block0.y = flags | ((((block_type0 << 8) >> 24) - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) |
+                       ((((block_type0 >> 24) - 1) & 7) << 14);
+            block0.z = buffer_mi[mi_index_0].mv[0];
+            block0.w = buffer_mi[mi_index_0].mv[1];
+
+            filter_type_h = (InterFilterLut >> ((((interp_filters1 >> 16) & 15) << 2) + ((bw_log_uv > 0) << 4))) & 7;
+            filter_type_v = (InterFilterLut >> (((interp_filters1 & 15) << 2) + ((bh_log_uv > 0) << 4))) & 7;
+            uint4 block1;
+            block1.x = block0.x + 1;
+            block1.y = flags | ((((block_type1 << 8) >> 24) - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) |
+                       ((((block_type1 >> 24) - 1) & 7) << 14);
+            block1.z = buffer_mi[mi_index_1].mv[0];
+            block1.w = buffer_mi[mi_index_1].mv[1];
+
+            pred_blocks.Store4(dst_ptr0, block0);
+            pred_blocks.Store4(dst_ptr1, block1);
+            block0.y ^= 3;
+            block1.y ^= 3;
+            dst_ptr0 += 32 >> diff_comp;
+            dst_ptr1 += 32 >> diff_comp;
+            pred_blocks.Store4(dst_ptr0, block0);
+            pred_blocks.Store4(dst_ptr1, block1);
+          } else {
+            const int type_index = Inter2x2ArrOffset + is_compound;
+            int dst_addr =
+                16 * (blocks_index_base.Load(4 * (cb_index_offset + type_index)) + blocks_indexes.Load(index_addr));
+            index_addr += 4;
+
+            const int interp_filters = buffer_mi[mi_index_1].interp_filters;
+            uint filter_type_h =
+                (InterFilterLut >> ((((interp_filters >> 16) & 15) << 2) + ((bw_log_uv > 0) << 4))) & 7;
+            uint filter_type_v = (InterFilterLut >> (((interp_filters & 15) << 2) + ((bh_log_uv > 0) << 4))) & 7;
+            const int block_type = (int)buffer_mi[mi_index_1].block_type;
+
+            uint mode_base = ((((block_type << 8) >> 24) - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) |
+                             ((((block_type >> 24) - 1) & 7) << 14) | (1 << 25) | no_skip_flag | bw_flag;
+
+            uint4 block;
+            block.z = buffer_mi[mi_index_1].mv[0];
+            block.w = buffer_mi[mi_index_1].mv[1];
+
+            for (int p = 1; p < 3; ++p) {
+              for (int col = 0; col < bcols; ++col) {
+                block.x = (x + col) | ((y + row) << 16);
+                block.y = mode_base | p;
+                pred_blocks.Store4(dst_addr, block);
+                dst_addr += 16;
+              }
+            }
+          }
+        }
+      } else  //! sub8x8
+      {
+        const uint filter_type_h =
+            (InterFilterLut >> ((((mi.interp_filters >> 16) & 15) << 2) + ((bw_log_uv > 0) << 4))) & 7;
+        const uint filter_type_v = (InterFilterLut >> (((mi.interp_filters & 15) << 2) + ((bh_log_uv > 0) << 4))) & 7;
+        if (is_compound) {
+          if (comp_type == COMPOUND_WEDGE) {
+            wtd = cb_wedge_offsets[bsize].y + (wedge_idx << max(0, bw_log_uv + bh_log_uv - 2));
+            wtd <<= 17;
+          }
+
+          const int is_warp_compound = (is_global_warp0 || is_global_warp1) && bw_log > 1 && bh_log > 1;
+          const int gpu_comp_type = (0x3100 >> (comp_type * 4)) & 15;
+
+          uint4 block;
+          block.x = mi_col_uv | (mi_row_uv << 16);
+          block.y = 1 | ((ref0 - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) | ((ref1 - 1) << 14) | wtd |
+                    no_skip_flag | (gpu_comp_type << 30);
+          block.z = mi.mv[0];
+          block.w = mi.mv[1];
+          const int pass_type =
+              is_warp_compound ? (comp_type == COMPOUND_DIFFWTD ? 7 : 5) : ((0x6422 >> (comp_type * 4)) & 15);
+          int pass_type_index = (pass_type - 1) * InterSizesAllCommon + block_size_id_uv;
+          int dst_ptr =
+              blocks_index_base.Load(4 * (cb_index_offset + pass_type_index)) + blocks_indexes.Load(index_addr);
+          index_addr += 4;
+          dst_ptr *= 16;
+          pred_blocks.Store4(dst_ptr, block);
+          block.y ^= 3;
+          pred_blocks.Store4(dst_ptr + 16, block);
+        } else {
+          const int is_chroma_warp = is_luma_warp && bw_log >= 2 && bh_log >= 2;
+          if (is_chroma_warp) {
+            int dst_ptr =
+                blocks_index_base.Load(4 * (cb_index_offset_warp + block_size_id_uv)) + blocks_indexes.Load(index_addr);
+            index_addr += 4;
+            dst_ptr *= 48;
+
+            WarpedMotionParams params;
+            if (is_local_warp)
+              params = mi.wm_params;
+            else
+              params = cb_wm_params[ref0 - 1].params;
+
+            int4 angle32;
+            angle32.x = ((int)(params.angles.x << 16)) >> 16;
+            angle32.y = ((int)params.angles.x) >> 16;
+            angle32.w = ((int)(params.angles.y << 16)) >> 16;
+            angle32.z = ((int)params.angles.y) >> 16;
+
+            pred_blocks_warp.Store(dst_ptr, mi_col_uv | (mi_row_uv << 16));
+            pred_blocks_warp.Store(dst_ptr + 4, 1 | ((ref0 - 1) << 2) | no_skip_flag);
+            pred_blocks_warp.Store4(dst_ptr + 8, params.mat[0]);
+            pred_blocks_warp.Store2(dst_ptr + 24, params.mat[1].xy);
+            pred_blocks_warp.Store4(dst_ptr + 32, angle32);
+            dst_ptr += 48;
+            pred_blocks_warp.Store(dst_ptr, mi_col_uv | (mi_row_uv << 16));
+            pred_blocks_warp.Store(dst_ptr + 4, 2 | ((ref0 - 1) << 2) | no_skip_flag);
+            pred_blocks_warp.Store4(dst_ptr + 8, params.mat[0]);
+            pred_blocks_warp.Store2(dst_ptr + 24, params.mat[1].xy);
+            pred_blocks_warp.Store4(dst_ptr + 32, angle32);
+
+          } else {
+            uint4 block;
+            block.x = mi_col_uv | (mi_row_uv << 16);
+            block.y = 1 | ((ref0 - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) | no_skip_flag;
+            block.z = mi.mv[0];
+            block.w = 0;
+
+            int dst_ptr =
+                blocks_index_base.Load(4 * (cb_index_offset + block_size_id_uv)) + blocks_indexes.Load(index_addr);
+            index_addr += 4;
+            dst_ptr *= 16;
+            pred_blocks.Store4(dst_ptr, block);
+            dst_ptr += 16;
+            block.y ^= 3;
+            pred_blocks.Store4(dst_ptr, block);
+          }
+        }
+      }
+    }
+
+    if (is_obmc_above) {
+      const int x_mis = min(bw, cb_mi_cols - mi_col);
+      int h = bh_log > 4 ? 3 : (bh_log - 1);
+      int huv = h == 0 ? 0 : h - 1;
+      int obmc_chroma = bsize > BLOCK_16X8 && bsize != BLOCK_4X16 && bsize != BLOCK_16X4;
+      int count = 0;
+      for (int col = 0; col < x_mis && count < min(bw_log, 4);) {
+        int mi_addr_above = get_mi_index(mi_grid, mi_col + col + (mi_row - 1) * cb_mi_stride, cb_mi_addr_base);
+        int w = mi_size_wide_log2[buffer_mi[mi_addr_above].block_type & 255];
+        if (w == 0) {
+          w = 1;
+          mi_addr_above = get_mi_index(mi_grid, mi_col + col + 1 + (mi_row - 1) * cb_mi_stride, cb_mi_addr_base);
+        }
+        if (w > bw_log) w = bw_log;
+
+        uint above_ref = ((int)buffer_mi[mi_addr_above].block_type << 8) >> 24;
+        if (above_ref > 0) {
+          count += 1 + (w == 5);
+          const int filters = buffer_mi[mi_addr_above].interp_filters;
+
+          uint filter_type_h = (InterFilterLut >> ((((filters >> 16) & 15) << 2) + ((w > 0) << 4))) & 7;
+          uint filter_type_v = (InterFilterLut >> (((filters & 15) << 2) + ((h > 0) << 4))) & 7;
+
+          int4 block;
+          block.x = (mi_col + col) | (mi_row << 16);
+          block.y = 0 | ((above_ref - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) | ((1 << h) << 17);
+          block.z = (buffer_mi[mi_addr_above].mv[0] << 1) & 0xfffeffff;
+          block.w = 0;
+
+          int type_index = (ObmcAbove - 1) * InterSizesAllCommon + ((w << 2) | h);  // InterBlockSizeIndexLUT[w][h];
+          int dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + type_index)) + blocks_indexes.Load(index_addr);
+          index_addr += 4;
+          pred_blocks.Store4(dst_ptr * 16, block);
+
+          if (obmc_chroma) {
+            filter_type_h = (InterFilterLut >> ((((filters >> 16) & 15) << 2) + ((w > 1) << 4))) & 7;
+            filter_type_v = (InterFilterLut >> (((filters & 15) << 2) + ((huv > 0) << 4))) & 7;
+            block.x = ((mi_col + col) >> 1) | ((mi_row >> 1) << 16);
+            block.y = 1 | ((above_ref - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) |
+                      (((0x84210 >> (h * 4)) & 15) << 17);
+            block.z = buffer_mi[mi_addr_above].mv[0];
+
+            type_index = (ObmcAbove - 1) * InterSizesAllCommon + (((w - 1) << 2) | huv);
+            dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + type_index)) + blocks_indexes.Load(index_addr);
+            index_addr += 4;
+            dst_ptr *= 16;
+            pred_blocks.Store4(dst_ptr, block);
+            block.y ^= 3;
+            dst_ptr += 16;
+            pred_blocks.Store4(dst_ptr, block);
+          }
+        }
+        col += 1 << w;
+      }
+    }
+
+    if (is_obmc_left) {
+      const int y_mis = min(bh, cb_mi_rows - mi_row);
+      int w = bw_log > 4 ? 3 : (bw_log - 1);
+      int wuv = w == 0 ? 0 : w - 1;
+      int count = 0;
+      for (int row = 0; row < y_mis && count < min(bh_log, 4);) {
+        int mi_addr_left = get_mi_index(mi_grid, mi_col - 1 + (mi_row + row) * cb_mi_stride, cb_mi_addr_base);
+        int h = mi_size_high_log2[buffer_mi[mi_addr_left].block_type & 255];
+        if (h == 0) {
+          h = 1;
+          // left += xd->mi_stride;
+          mi_addr_left = get_mi_index(mi_grid, mi_col - 1 + (mi_row + row + 1) * cb_mi_stride, cb_mi_addr_base);
+        }
+        if (h > bh_log) h = bh_log;
+
+        uint left_ref = ((int)buffer_mi[mi_addr_left].block_type << 8) >> 24;
+        if (left_ref > 0) {
+          count += 1 + (h == 5);
+
+          const int filters = buffer_mi[mi_addr_left].interp_filters;
+          uint filter_type_h = (InterFilterLut >> ((((filters >> 16) & 15) << 2) + ((w > 0) << 4))) & 7;
+          uint filter_type_v = (InterFilterLut >> (((filters & 15) << 2) + ((h > 0) << 4))) & 7;
+
+          int4 block;
+          block.x = mi_col | ((mi_row + row) << 16);
+          block.y = 0 | ((left_ref - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) | ((1 << w) << 17);
+          block.z = (buffer_mi[mi_addr_left].mv[0] << 1) & 0xfffeffff;
+          block.w = 0;
+
+          int type_index = (ObmcLeft - 1) * InterSizesAllCommon + ((h << 2) | w);  // InterBlockSizeIndexLUT[w][h];
+          int dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + type_index)) + blocks_indexes.Load(index_addr);
+          index_addr += 4;
+          pred_blocks.Store4(dst_ptr * 16, block);
+          filter_type_h = (InterFilterLut >> ((((filters >> 16) & 15) << 2) + ((wuv > 0) << 4))) & 7;
+          filter_type_v = (InterFilterLut >> (((filters & 15) << 2) + ((h > 1) << 4))) & 7;
+          block.x = (mi_col >> 1) | (((mi_row + row) >> 1) << 16);
+          block.y = 1 | ((left_ref - 1) << 2) | (filter_type_h << 5) | (filter_type_v << 9) |
+                    (((0x84210 >> (w * 4)) & 15) << 17);
+          block.z = buffer_mi[mi_addr_left].mv[0];
+
+          type_index = (ObmcLeft - 1) * InterSizesAllCommon + (((h - 1) << 2) | wuv);
+          dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + type_index)) + blocks_indexes.Load(index_addr);
+          index_addr += 4;
+          dst_ptr *= 16;
+          pred_blocks.Store4(dst_ptr, block);
+          block.y ^= 3;
+          dst_ptr += 16;
+          pred_blocks.Store4(dst_ptr, block);
+        }
+        row += 1 << h;
+      }
+    }
+  }
+
+  const int y_use_palette = (mi.palette_mode_info.sizes & 0xffff) != 0;
+  const int uv_use_palette = (mi.palette_mode_info.sizes & 0xffff0000) != 0;
+
+  if (ref0 <= 0 || is_inter_intra) {
+    const int tx_size_wide_log2[] = {0, 1, 2, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 0, 2, 1, 3, 2, 4};
+    const int tx_size_high_log2[] = {0, 1, 2, 3, 4, 1, 0, 2, 1, 3, 2, 4, 3, 2, 0, 3, 1, 4, 2};
+    const int mode_to_angle_map[] = {
+        0, 90, 180, 45, 135, 113, 157, 203, 67,
+    };
+
+    const int disable_edge_filter = cb_disable_edge_filter;
+    const int intra_mode_flags = mi.intra_mode_flags;
+    const int is_intrabc = (intra_mode_flags & 0x100) != 0;
+    const int interintra_mode = (mi.modes >> 16) & 255;
+
+    uint tx_info = mi.tx_info;
+    uint tx_size = tx_info & 255;
+
+    int txw = tx_size_wide_log2[tx_size];
+    int txh = tx_size_high_log2[tx_size];
+    const int tx_uv_add = is_intrabc || is_inter_intra;
+    int txw_uv = min(bw_log_uv, 3 + tx_uv_add);
+    int txh_uv = min(bh_log_uv, 3 + tx_uv_add);
+    if (cb_lossless_seg[(tx_info >> 24) & 7].x && !tx_uv_add) {
+      txw = 0;
+      txh = 0;
+      txw_uv = 0;
+      txh_uv = 0;
+    }
+
+    txw = is_intrabc ? min(bw_log, 4) : is_inter_intra ? bw_log : txw;
+    txh = is_intrabc ? min(bh_log, 4) : is_inter_intra ? bh_log : txh;
+
+    const int max_cnt_x = (cb_mi_cols - mi_col + (1 << txw) - 1) >> txw;
+    const int max_cnt_y = (cb_mi_rows - mi_row + (1 << txh) - 1) >> txh;
+    const int unit_x_log = bw_log == 5 && !is_intrabc;
+    const int unit_y_log = bh_log == 5 && !is_intrabc;
+
+    int cnt_y = 1 << (bh_log - txh);
+    int cnt_x = 1 << (bw_log - txw);
+    const int cfl_max_x = (mi_col + (min(cnt_x, max_cnt_x) << txw)) << 2;
+    const int cfl_max_y = (mi_row + (min(cnt_y, max_cnt_y) << txh)) << 2;
+
+    if (!y_use_palette) {
+      const int mode1 = (is_inter_intra ? (0x9210 >> (interintra_mode * 4)) : mi.modes) & 15;
+      int need_above = (NeedAboveLut >> mode1) & 1;
+      int need_left = (NeedLeftLut >> mode1) & 1;
+
+      const int use_filter = mi.filter_intra_mode_info >> 8;
+
+      const int mode_gpu = is_intrabc ? (12 << 6) : use_filter ? (13 << 6) : mode1 ? ((mode1 - 1) << 6) : (14 << 6);
+
+      int is_dir = mode1 >= V_PRED && mode1 <= D67_PRED;
+
+      int mode_flags_base = txw | (((tx_info & 0xff00) == 0) << 5) | mode_gpu;
+
+      int dir_above_filter = 0;
+      int dir_left_filter = 0;
+
+      if (is_dir) {
+        int upsample_above = 0;
+        int upsample_left = 0;
+        int angle_delta = (intra_mode_flags << 8) >> 24;
+        int angle = mode_to_angle_map[mode1] + angle_delta * 3;
+        const int mode_angle = angle_delta + 3;
+        if (!disable_edge_filter) {
+          int filt_type = 0;
+          if (mi_row > cb_row_srart) {
+            const int above_idx = get_mi_index(mi_grid, mi_col + (mi_row - 1) * cb_mi_stride, cb_mi_addr_base);
+            const int m = buffer_mi[above_idx].modes & 255;
+            filt_type = m == SMOOTH_PRED || m == SMOOTH_V_PRED || m == SMOOTH_H_PRED;
+          }
+          if (mi_col > cb_col_srart) {
+            const int left_idx = get_mi_index(mi_grid, mi_col - 1 + mi_row * cb_mi_stride, cb_mi_addr_base);
+            const int m = buffer_mi[left_idx].modes & 255;
+            filt_type |= m == SMOOTH_PRED || m == SMOOTH_V_PRED || m == SMOOTH_H_PRED;
+          }
+          int d90 = abs(angle - 90);
+          int d180 = abs(angle - 180);
+          int blk_wh = (4 << txw) + (4 << txh);
+          upsample_above = d90 != 0 && d90 < 40 && blk_wh <= (16 >> filt_type);
+          upsample_left = d180 != 0 && d180 < 40 && blk_wh <= (16 >> filt_type);
+          dir_above_filter = intra_edge_filter_strength(blk_wh, d90, filt_type);
+          dir_left_filter = intra_edge_filter_strength(blk_wh, d180, filt_type);
+        }
+        mode_flags_base |= (upsample_above << 22) | (upsample_left << 23) | (mode_angle << 28);
+      }
+
+      mode_flags_base |= is_inter_intra << 31;
+
+      int mode_info0 = 0;
+      if (is_inter_intra) {
+        const int w_idx = mi.interintra_wedge_sign + mi.interintra_wedge_index * 2;
+        const int w_ofs = cb_wedge_offsets[bsize].x;
+        const int w_sz = 1 << max(0, bw_log + bh_log - 2);
+        mode_info0 = w_ofs + w_sz * ((intra_mode_flags & 1) ? w_idx : (32 + interintra_mode));
+      } else if (is_intrabc) {
+        mode_info0 = (mi.mv[0] << 1) & 0xfffeffff;
+        need_left = 0;
+        need_above = 0;
+      } else if (use_filter) {
+        mode_info0 = txh | ((mi.filter_intra_mode_info & 255) << 4);
+      }
+
+      const int type_size = txw + txh;
+      const int type_idx_base = use_filter ? IntraBlockOffset : (IntraBlockOffset - 1 - type_size);
+      cnt_x >>= unit_x_log;
+      cnt_y >>= unit_y_log;
+
+      for (int unit_y = 0; unit_y <= unit_y_log; ++unit_y) {
+        for (int unit_x = 0; unit_x <= unit_x_log; ++unit_x) {
+          const int x_start = unit_x * cnt_x;
+          const int x_end = min(x_start + cnt_x, max_cnt_x);
+          const int y_start = unit_y * cnt_y;
+          const int y_end = min(y_start + cnt_y, max_cnt_y);
+          for (int y = y_start; y < y_end; ++y) {
+            for (int x = x_start; x < x_end; ++x) {
+              const int col = mi_col + (x << txw);
+              const int row = mi_row + (y << txh);
+              const int subblk_w = 1 << txw;
+              const int subblk_h = 1 << txh;
+              const int have_top = y || (mi_row > cb_row_srart);
+              const int have_left = x || (mi_col > cb_col_srart);
+              uint block_index = blocks_indexes.Load(index_addr);
+              index_addr += 4;
+
+              int above_available = have_top;
+              if (need_above) {
+                const int xr = cb_mi_cols - col - subblk_w;
+                int have_top_right = block_index & 1;
+                above_available =
+                    (have_top ? min(subblk_w, subblk_w + xr) : 0) + (have_top_right ? min(subblk_w, xr) : 0);
+              }
+
+              int left_available = have_left;
+              if (need_left) {
+                const int yd = cb_mi_rows - row - subblk_h;
+                int have_bottom_left = block_index & 2;
+                left_available =
+                    (have_left ? min(subblk_h, subblk_h + yd) : 0) + (have_bottom_left ? min(subblk_h, yd) : 0);
+              }
+
+              int iter_grid_stride = cb_iter_grid_stride;
+              int iter = intra_iter_grid.Load((col + subblk_w + (row + 1) * iter_grid_stride) * 4);
+              const int type_index = iter * IntraTypeCount + type_idx_base;
+              const int dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + type_index)) + (block_index >> 2);
+
+              uint4 block;
+              block.x = col | (row << 16);
+              block.y = mode_flags_base | (above_available << 10) | (left_available << 16) |
+                        (above_available ? (dir_above_filter << 24) : 0) |
+                        (left_available ? (dir_left_filter << 26) : 0);
+              block.z = mode_info0;
+              block.w = 0;
+              pred_blocks.Store4(dst_ptr * 16, block);
+            }
+          }
+        }
+      }
+    }
+
+    if (!uv_use_palette && !is_chroma_ref) {
+      const int mi_col_uv = mi_col >> 1;
+      const int mi_row_uv = mi_row >> 1;
+      const int mode1 = (is_inter_intra ? (0x9210 >> (interintra_mode * 4)) : (mi.modes >> 8)) & 15;
+
+      int need_above = (NeedAboveLut >> mode1) & 1;
+      int need_left = (NeedLeftLut >> mode1) & 1;
+
+      const int mode_gpu =
+          is_intrabc ? (12 << 6) : mode1 == UV_CFL_PRED ? (15 << 6) : mode1 ? ((mode1 - 1) << 6) : (14 << 6);
+
+      int is_dir = mode1 >= V_PRED && mode1 <= D67_PRED;
+
+      int mode_flags_base = txw_uv | (((tx_info & 0xff00) == 0) << 5) | mode_gpu;
+
+      int dir_above_filter = 0;
+      int dir_left_filter = 0;
+
+      const uint mi_col1 = mi_col & ~1;
+      const uint mi_row1 = mi_row & ~1;
+      if (is_dir) {
+        int upsample_above = 0;
+        int upsample_left = 0;
+        int angle_delta = intra_mode_flags >> 24;
+        int angle = mode_to_angle_map[mode1] + angle_delta * 3;
+        const int mode_angle = angle_delta + 3;
+        if (!disable_edge_filter) {
+          int filt_type = 0;
+          const int mi_base = mi_col1 + mi_row1 * cb_mi_stride;
+          if (mi_row1 > cb_row_srart) {
+            const int above_idx = get_mi_index(mi_grid, mi_base + 1 - cb_mi_stride, cb_mi_addr_base);
+            const int m = (buffer_mi[above_idx].modes >> 8) & 255;
+            filt_type = (m == SMOOTH_PRED || m == SMOOTH_V_PRED || m == SMOOTH_H_PRED) &&
+                        (((int)buffer_mi[above_idx].block_type << 8) >> 24) <= 0 &&
+                        (buffer_mi[above_idx].intra_mode_flags & 0x100) == 0;
+          }
+          if (mi_col1 > cb_col_srart) {
+            const int left_idx = get_mi_index(mi_grid, mi_base - 1 + cb_mi_stride, cb_mi_addr_base);
+            const int m = (buffer_mi[left_idx].modes >> 8) & 255;
+            filt_type |= (m == SMOOTH_PRED || m == SMOOTH_V_PRED || m == SMOOTH_H_PRED) &&
+                         (((int)buffer_mi[left_idx].block_type << 8) >> 24) <= 0 &&
+                         (buffer_mi[left_idx].intra_mode_flags & 0x100) == 0;
+          }
+          int d90 = abs(angle - 90);
+          int d180 = abs(angle - 180);
+          int blk_wh = (4 << txw_uv) + (4 << txh_uv);
+          upsample_above = d90 != 0 && d90 < 40 && blk_wh <= (16 >> filt_type);
+          upsample_left = d180 != 0 && d180 < 40 && blk_wh <= (16 >> filt_type);
+          dir_above_filter = intra_edge_filter_strength(blk_wh, d90, filt_type);
+          dir_left_filter = intra_edge_filter_strength(blk_wh, d180, filt_type);
+        }
+        mode_flags_base |= (upsample_above << 22) | (upsample_left << 23) | (mode_angle << 28);
+      }
+
+      mode_flags_base |= is_inter_intra << 31;
+
+      int mode_u = 1 << 3;  // plane
+      int mode_v = 2 << 3;  // plane
+      int mode_info0 = 0;
+      if (mode1 == UV_CFL_PRED) {
+        int sign_u = ((mi.cfl_alpha_signs + 1) * 11) >> 5;   // CFL_SIGN_U(cfl_alpha_signs);
+        int sign_v = (mi.cfl_alpha_signs + 1) - 3 * sign_u;  // CFL_SIGN_V(cfl_alpha_signs);
+        int idx_u = (sign_u == 2) ? (mi.cfl_alpha_idx >> 4) + 1 : (sign_u == 1) ? -(mi.cfl_alpha_idx >> 4) - 1 : 0;
+        int idx_v = (sign_v == 2) ? (mi.cfl_alpha_idx & 15) + 1 : (sign_v == 1) ? -(mi.cfl_alpha_idx & 15) - 1 : 0;
+        mode_u |= (idx_u + 16) << 22;
+        mode_v |= (idx_v + 16) << 22;
+        mode_info0 = cfl_max_x | (cfl_max_y << 16);
+      }
+
+      if (is_inter_intra) {
+        const int w_idx = mi.interintra_wedge_sign + mi.interintra_wedge_index * 2;
+        const int w_ofs = cb_wedge_offsets[bsize].y;
+        const int w_sz = 1 << max(0, bw_log_uv + bh_log_uv - 2);
+        mode_info0 = w_ofs + w_sz * ((intra_mode_flags & 1) ? w_idx : (32 + interintra_mode));
+      } else if (is_intrabc) {
+        mode_info0 = mi.mv[0];
+        need_left = 0;
+        need_above = 0;
+      }
+
+      const int type_idx_base = IntraBlockOffset - 1 - txw_uv - txh_uv;
+      const int cnt_y_uv = 1 << (bh_log_uv - txh_uv - unit_y_log);
+      const int cnt_x_uv = 1 << (bw_log_uv - txw_uv - unit_x_log);
+      for (int unit_y = 0; unit_y <= unit_y_log; ++unit_y) {
+        for (int unit_x = 0; unit_x <= unit_x_log; ++unit_x) {
+          for (int suby = 0; suby < cnt_y_uv; ++suby) {
+            for (int subx = 0; subx < cnt_x_uv; ++subx) {
+              const int x = subx + unit_x * cnt_x_uv;
+              const int y = suby + unit_y * cnt_y_uv;
+              const int col = mi_col_uv + (x << txw_uv);
+              const int row = mi_row_uv + (y << txh_uv);
+              const int subblk_w = 1 << txw_uv;
+              const int subblk_h = 1 << txh_uv;
+              const int have_top = y || (mi_row1 > cb_row_srart);
+              const int have_left = x || (mi_col1 > cb_col_srart);
+
+              uint block_index = blocks_indexes.Load(index_addr);
+              index_addr += 4;
+
+              int above_available = have_top;
+              if (need_above) {
+                const int xr = ((cb_mi_cols - mi_col - bw) + (2 << bw_log_uv) - ((x + 1) << (txw_uv + 1))) >> 1;
+                int have_top_right = block_index & 1;
+                above_available =
+                    (have_top ? min(subblk_w, subblk_w + xr) : 0) + (have_top_right ? min(subblk_w, xr) : 0);
+              }
+
+              int left_available = have_left;
+              if (need_left) {
+                const int yd = ((cb_mi_rows - mi_row - bh) + (2 << bh_log_uv) - ((y + 1) << (txh_uv + 1))) >> 1;
+                int have_bottom_left = block_index & 2;
+                left_available =
+                    (have_left ? min(subblk_h, subblk_h + yd) : 0) + (have_bottom_left ? min(subblk_h, yd) : 0);
+              }
+
+              int iter_grid_offset = cb_iter_grid_offset_uv;
+              int iter_grid_stride = cb_iter_grid_stride_uv;
+              int iter = intra_iter_grid.Load((iter_grid_offset + col + subblk_w + (row + 1) * iter_grid_stride) * 4);
+              const int type_index = iter * IntraTypeCount + type_idx_base;
+              int dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + type_index)) + (block_index >> 2);
+              dst_ptr *= 16;
+
+              uint uv_mode = mode_flags_base | (above_available << 10) | (left_available << 16) |
+                             (above_available ? (dir_above_filter << 24) : 0) |
+                             (left_available ? (dir_left_filter << 26) : 0);
+              uint4 block;
+              block.x = col | (row << 16);
+              block.y = uv_mode | mode_u;
+              block.z = mode_info0;
+              block.w = 0;
+              pred_blocks.Store4(dst_ptr, block);
+
+              dst_ptr += 16;
+              block.y = uv_mode | mode_v;
+              pred_blocks.Store4(dst_ptr, block);
+            }
+          }
+        }
+      }
+    }
+  }
+
+  const int do_recon = (mi.tx_info & 0xff00) == 0;
+  const int inter_recon = do_recon && (is_obmc_above || is_obmc_left);
+  if (y_use_palette || inter_recon) {
+    const int type_index = ReconBlockOffset + bw_log + 6 * bh_log;
+    int dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + type_index)) + blocks_indexes.Load(index_addr);
+
+    index_addr += 4;
+    dst_ptr *= 16;
+
+    uint4 block;
+    block.x = mi_col | (mi_row << 16);
+    block.y = (do_recon << 2) | (y_use_palette << 3);
+    block.z = 0;
+    block.w = 0;
+    pred_blocks.Store4(dst_ptr, block);
+  }
+  if ((uv_use_palette || inter_recon) && !is_chroma_ref) {
+    const int type_index = ReconBlockOffset + bw_log_uv + 6 * bh_log_uv;
+    int dst_ptr = blocks_index_base.Load(4 * (cb_index_offset + type_index)) + blocks_indexes.Load(index_addr);
+
+    index_addr += 4;
+    dst_ptr *= 16;
+
+    uint4 block;
+    block.x = (mi_col >> 1) | ((mi_row >> 1) << 16);
+    block.y = 1 | (do_recon << 2) | (uv_use_palette << 3);
+    block.z = 0;
+    block.w = 0;
+    pred_blocks.Store4(dst_ptr, block);
+    dst_ptr += 16;
+    block.y = 2 | (do_recon << 2) | (uv_use_palette << 3);
+    pred_blocks.Store4(dst_ptr, block);
+  }
+}

diff --git a/libav1/dx/shaders/idct16x16.hlsl b/libav1/dx/shaders/idct16x16.hlsl
new file mode 100644
index 0000000..aecfd31
--- /dev/null
+++ b/libav1/dx/shaders/idct16x16.hlsl

@@ -0,0 +1,61 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 16
+#define ScanSize 64
+#define IdctOutputShiftH 2
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_16x16 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[16 * 64];
+
+IDCT_GEN(N, 1, 1, 1, N / 4, 0, TRANSFORM_SELECT16, TRANSFORM_SELECT16);
+
+/*
+#include "idct_shader_common.h"
+
+
+enum
+{
+    N = 16,
+    ScanSize = 64,
+    IdctOutputShiftH = 2,
+    IdctOutputShiftV = 4,
+};
+
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans16x16;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans16x16 * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[16 * 64];
+
+
+IDCT_GEN(N, 1, 1, 1, N/4, 0, TRANSFORM_SELECT16, TRANSFORM_SELECT16);
+
+*/

diff --git a/libav1/dx/shaders/idct16x32.hlsl b/libav1/dx/shaders/idct16x32.hlsl
new file mode 100644
index 0000000..a2f1cb9
--- /dev/null
+++ b/libav1/dx/shaders/idct16x32.hlsl

@@ -0,0 +1,57 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 16
+#define ScanSize 128
+#define IdctOutputShiftH 1
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_16x32 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[32 * 64];
+
+IDCT_GEN(N, 1, 2, 2, 8, 1, TRANSFORM_SELECT16, TRANSFORM_SELECT32);
+
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 16,
+    ScanSize = 128,
+    IdctOutputShiftH = 1,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[32 * 64];
+
+IDCT_GEN(N, 1, 2, 2, 8, 1, TRANSFORM_SELECT16, TRANSFORM_SELECT32);
+*/

diff --git a/libav1/dx/shaders/idct16x4.hlsl b/libav1/dx/shaders/idct16x4.hlsl
new file mode 100644
index 0000000..e08ac42
--- /dev/null
+++ b/libav1/dx/shaders/idct16x4.hlsl

@@ -0,0 +1,57 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 4
+#define ScanSize 16
+#define IdctOutputShiftH 1
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_16x4 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[16 * 64];
+
+IDCT_GEN(N, 4, 1, 1, 4, 0, TRANSFORM_SELECT16, TRANSFORM_SELECT4);
+
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 4,
+    ScanSize = 16,
+    IdctOutputShiftH = 1,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[16 * 64];
+
+IDCT_GEN(N, 4, 1, 1, 4, 0, TRANSFORM_SELECT16, TRANSFORM_SELECT4);
+*/

diff --git a/libav1/dx/shaders/idct16x64.hlsl b/libav1/dx/shaders/idct16x64.hlsl
new file mode 100644
index 0000000..808eb31
--- /dev/null
+++ b/libav1/dx/shaders/idct16x64.hlsl

@@ -0,0 +1,59 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 16
+#define ScanSize 128
+#define IdctOutputShiftH 2
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_16x64 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[64 * 64];
+
+IDCT_GEN(N, 1, 4, 2, 8, 0, TRANSFORM_SELECT16, TRANSFORM_64);
+
+/*
+#include "idct_shader_common.h"
+
+
+enum
+{
+    N = 16,
+    ScanSize = 128,
+    IdctOutputShiftH = 2,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[64 * 64];
+
+IDCT_GEN(N, 1, 4, 2, 8, 0, TRANSFORM_SELECT16, TRANSFORM_64);
+
+*/

diff --git a/libav1/dx/shaders/idct16x8.hlsl b/libav1/dx/shaders/idct16x8.hlsl
new file mode 100644
index 0000000..5c2a541
--- /dev/null
+++ b/libav1/dx/shaders/idct16x8.hlsl

@@ -0,0 +1,56 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 8
+#define ScanSize 32
+#define IdctOutputShiftH 1
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_16x8 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[16 * 64];
+
+IDCT_GEN(N, 2, 1, 1, 4, 1, TRANSFORM_SELECT16, TRANSFORM_SELECT8);
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 8,
+    ScanSize = 32,
+    IdctOutputShiftH = 1,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[16 * 64];
+
+IDCT_GEN(N, 2, 1, 1, 4, 1, TRANSFORM_SELECT16, TRANSFORM_SELECT8);
+*/

diff --git a/libav1/dx/shaders/idct32x16.hlsl b/libav1/dx/shaders/idct32x16.hlsl
new file mode 100644
index 0000000..0002821
--- /dev/null
+++ b/libav1/dx/shaders/idct32x16.hlsl

@@ -0,0 +1,58 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 16
+#define ScanSize 128
+#define IdctOutputShiftH 1
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_32x16 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[32 * 64];
+
+IDCT_GEN(N, 2, 1, 1, 8, 1, TRANSFORM_SELECT32, TRANSFORM_SELECT16);
+
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 16,
+    ScanSize = 128,
+    IdctOutputShiftH = 1,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[32 * 64];
+
+IDCT_GEN(N, 2, 1, 1, 8, 1, TRANSFORM_SELECT32, TRANSFORM_SELECT16);
+
+*/

diff --git a/libav1/dx/shaders/idct32x32.hlsl b/libav1/dx/shaders/idct32x32.hlsl
new file mode 100644
index 0000000..12a9621
--- /dev/null
+++ b/libav1/dx/shaders/idct32x32.hlsl

@@ -0,0 +1,59 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 32
+#define ScanSize 256
+#define IdctOutputShiftH 2
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_32x32 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[32 * 64];
+
+IDCT_GEN(N, 1, 1, 1, N / 4, 0, TRANSFORM_SELECT32, TRANSFORM_SELECT32);
+
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 32,
+    ScanSize = 256,
+    IdctOutputShiftH = 2,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[32 * 64];
+
+
+IDCT_GEN(N, 1, 1, 1, N/4, 0, TRANSFORM_SELECT32, TRANSFORM_SELECT32);
+
+*/

diff --git a/libav1/dx/shaders/idct32x64.hlsl b/libav1/dx/shaders/idct32x64.hlsl
new file mode 100644
index 0000000..7e6979d
--- /dev/null
+++ b/libav1/dx/shaders/idct32x64.hlsl

@@ -0,0 +1,58 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 32
+#define ScanSize 256
+#define IdctOutputShiftH 1
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_32x64 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[64 * 64];
+
+IDCT_GEN(N, 1, 2, 1, 8, 1, TRANSFORM_SELECT32, TRANSFORM_64);
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 32,
+    ScanSize = 256,
+    IdctOutputShiftH = 1,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[64 * 64];
+
+IDCT_GEN(N, 1, 2, 1, 8, 1, TRANSFORM_SELECT32, TRANSFORM_64);
+
+
+*/

diff --git a/libav1/dx/shaders/idct32x8.hlsl b/libav1/dx/shaders/idct32x8.hlsl
new file mode 100644
index 0000000..0651530
--- /dev/null
+++ b/libav1/dx/shaders/idct32x8.hlsl

@@ -0,0 +1,57 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 8
+#define ScanSize 64
+#define IdctOutputShiftH 2
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_32x8 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[32 * 64];
+
+IDCT_GEN(N, 4, 1, 1, 8, 0, TRANSFORM_SELECT32, TRANSFORM_SELECT8);
+
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 8,
+    ScanSize = 64,
+    IdctOutputShiftH = 2,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[32 * 64];
+
+IDCT_GEN(N, 4, 1, 1, 8, 0, TRANSFORM_SELECT32, TRANSFORM_SELECT8);
+*/

diff --git a/libav1/dx/shaders/idct4x16.hlsl b/libav1/dx/shaders/idct4x16.hlsl
new file mode 100644
index 0000000..4f522d1
--- /dev/null
+++ b/libav1/dx/shaders/idct4x16.hlsl

@@ -0,0 +1,56 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 4
+#define ScanSize 16
+#define IdctOutputShiftH 1
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_4x16 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[16 * 64];
+
+IDCT_GEN(N, 1, 4, 4, 4, 0, TRANSFORM_SELECT4, TRANSFORM_SELECT16);
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 4,
+    ScanSize = 16,
+    IdctOutputShiftH = 1,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[16 * 64];
+
+IDCT_GEN(N, 1, 4, 4, 4, 0, TRANSFORM_SELECT4, TRANSFORM_SELECT16);
+*/

diff --git a/libav1/dx/shaders/idct4x4.hlsl b/libav1/dx/shaders/idct4x4.hlsl
new file mode 100644
index 0000000..b159ba3
--- /dev/null
+++ b/libav1/dx/shaders/idct4x4.hlsl

@@ -0,0 +1,28 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 4
+#define ScanSize 4
+#define IdctOutputShiftH 0
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_4x4 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[4 * 64];
+
+IDCT_GEN(N, 1, 1, 1, N / 4, 0, TRANSFORM_SELECT4, TRANSFORM_SELECT4);

diff --git a/libav1/dx/shaders/idct4x8.hlsl b/libav1/dx/shaders/idct4x8.hlsl
new file mode 100644
index 0000000..bb20ade
--- /dev/null
+++ b/libav1/dx/shaders/idct4x8.hlsl

@@ -0,0 +1,59 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 4
+#define ScanSize 8
+#define IdctOutputShiftH 0
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_4x8 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[8 * 64];
+
+IDCT_GEN(N, 1, 2, 2, 2, 1, TRANSFORM_SELECT4, TRANSFORM_SELECT8);
+
+/*
+#include "idct_shader_common.h"
+
+
+enum
+{
+    N = 4,
+    ScanSize = 8,
+    IdctOutputShiftH = 0,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans4x8;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans4x8 * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[8 * 64];
+
+IDCT_GEN(N, 1, 2, 2, 2, 1, TRANSFORM_SELECT4, TRANSFORM_SELECT8);
+
+*/
\ No newline at end of file

diff --git a/libav1/dx/shaders/idct64x16.hlsl b/libav1/dx/shaders/idct64x16.hlsl
new file mode 100644
index 0000000..f4278fa
--- /dev/null
+++ b/libav1/dx/shaders/idct64x16.hlsl

@@ -0,0 +1,57 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 16
+#define ScanSize 128
+#define IdctOutputShiftH 2
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_64x16 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[64 * 64];
+
+IDCT_GEN(N, 4, 1, 1, 8, 0, TRANSFORM_64, TRANSFORM_SELECT16);
+
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 16,
+    ScanSize = 128,
+    IdctOutputShiftH = 2,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[64 * 64];
+
+IDCT_GEN(N, 4, 1, 1, 8, 0, TRANSFORM_64, TRANSFORM_SELECT16);
+*/

diff --git a/libav1/dx/shaders/idct64x32.hlsl b/libav1/dx/shaders/idct64x32.hlsl
new file mode 100644
index 0000000..067e7a4
--- /dev/null
+++ b/libav1/dx/shaders/idct64x32.hlsl

@@ -0,0 +1,57 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 32
+#define ScanSize 256
+#define IdctOutputShiftH 1
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_64x32 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[64 * 64];
+
+IDCT_GEN(N, 2, 1, 1, 8, 1, TRANSFORM_64, TRANSFORM_SELECT32);
+
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 32,
+    ScanSize = 256,
+    IdctOutputShiftH = 1,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[64 * 64];
+
+IDCT_GEN(N, 2, 1, 1, 8, 1, TRANSFORM_64, TRANSFORM_SELECT32);
+*/

diff --git a/libav1/dx/shaders/idct64x64.hlsl b/libav1/dx/shaders/idct64x64.hlsl
new file mode 100644
index 0000000..ab98fb6
--- /dev/null
+++ b/libav1/dx/shaders/idct64x64.hlsl

@@ -0,0 +1,60 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 64
+#define ScanSize 256
+#define IdctOutputShiftH 2
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_64x64 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[64 * 64];
+
+IDCT_GEN(N, 1, 1, 1, 4, 0, TRANSFORM_64_HALF, TRANSFORM_64);
+
+/*
+#include "idct_shader_common.h"
+
+
+enum
+{
+    N = 64,
+    ScanSize = 256,
+    IdctOutputShiftH = 2,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[64 * 64];
+
+IDCT_GEN(N, 1, 1, 1, 4, 0, TRANSFORM_64_HALF, TRANSFORM_64);
+
+
+*/

diff --git a/libav1/dx/shaders/idct8x16.hlsl b/libav1/dx/shaders/idct8x16.hlsl
new file mode 100644
index 0000000..f29802d
--- /dev/null
+++ b/libav1/dx/shaders/idct8x16.hlsl

@@ -0,0 +1,56 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 8
+#define ScanSize 32
+#define IdctOutputShiftH 1
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_8x16 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[16 * 64];
+
+IDCT_GEN(N, 1, 2, 2, 4, 1, TRANSFORM_SELECT8, TRANSFORM_SELECT16);
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 8,
+    ScanSize = 32,
+    IdctOutputShiftH = 1,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[16 * 64];
+
+IDCT_GEN(N, 1, 2, 2, 4, 1, TRANSFORM_SELECT8, TRANSFORM_SELECT16);
+*/

diff --git a/libav1/dx/shaders/idct8x32.hlsl b/libav1/dx/shaders/idct8x32.hlsl
new file mode 100644
index 0000000..2e1ca0a
--- /dev/null
+++ b/libav1/dx/shaders/idct8x32.hlsl

@@ -0,0 +1,56 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 8
+#define ScanSize 64
+#define IdctOutputShiftH 2
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_8x32 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[32 * 64];
+
+IDCT_GEN(N, 1, 4, 4, 8, 0, TRANSFORM_SELECT8, TRANSFORM_SELECT32);
+/*
+#include "idct_shader_common.h"
+
+enum
+{
+    N = 8,
+    ScanSize = 64,
+    IdctOutputShiftH = 2,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[32 * 64];
+
+IDCT_GEN(N, 1, 4, 4, 8, 0, TRANSFORM_SELECT8, TRANSFORM_SELECT32);
+*/

diff --git a/libav1/dx/shaders/idct8x4.hlsl b/libav1/dx/shaders/idct8x4.hlsl
new file mode 100644
index 0000000..766e3ca
--- /dev/null
+++ b/libav1/dx/shaders/idct8x4.hlsl

@@ -0,0 +1,58 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 4
+#define ScanSize 8
+#define IdctOutputShiftH 0
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_8x4 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[8 * 64];
+
+IDCT_GEN(N, 2, 1, 1, 2, 1, TRANSFORM_SELECT8, TRANSFORM_SELECT4);
+/*
+#include "idct_shader_common.h"
+
+
+enum
+{
+    N = 4,
+    ScanSize = 8,
+    IdctOutputShiftH = 0,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans8x4;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans8x4 * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[8 * 64];
+
+
+IDCT_GEN(N, 2, 1, 1, 2, 1, TRANSFORM_SELECT8, TRANSFORM_SELECT4);
+*/

diff --git a/libav1/dx/shaders/idct8x8.hlsl b/libav1/dx/shaders/idct8x8.hlsl
new file mode 100644
index 0000000..e447dec
--- /dev/null
+++ b/libav1/dx/shaders/idct8x8.hlsl

@@ -0,0 +1,58 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define N 8
+#define ScanSize 16
+#define IdctOutputShiftH 1
+#define IdctOutputShiftV 4
+
+cbuffer cb_scans_8x8 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_mem[8 * 64];
+
+IDCT_GEN(N, 1, 1, 1, N / 4, 0, TRANSFORM_SELECT8, TRANSFORM_SELECT8);
+/*
+#include "idct_shader_common.h"
+
+
+enum
+{
+    N = 8,
+    ScanSize = 16,
+    IdctOutputShiftH = 1,
+    IdctOutputShiftV = 4,
+};
+
+typedef struct
+{
+    int4 table[ScanSize*3];
+} Scans8x8;
+
+typedef struct
+{
+    PSSLIdctData * data;
+    Scans8x8 * scans;
+    uint index_offset;
+    uint wicount;
+} PSSLIdctSRT;
+
+thread_group_memory int shared_mem[8 * 64];
+
+IDCT_GEN(N, 1, 1, 1, N/4, 0, TRANSFORM_SELECT8, TRANSFORM_SELECT8);
+
+*/

diff --git a/libav1/dx/shaders/idct_lossless.hlsl b/libav1/dx/shaders/idct_lossless.hlsl
new file mode 100644
index 0000000..15de470
--- /dev/null
+++ b/libav1/dx/shaders/idct_lossless.hlsl

@@ -0,0 +1,96 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "idct_shader_common.h"
+
+#define ScanSize 4
+#define QuantShift 2
+
+cbuffer cb_scans_4x4 : register(b0) { int4 cb_scans[ScanSize * 3]; };
+
+groupshared int shared_4x4_mem[4 * 64];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wicount) return;
+  int wi = thread.x & 3;
+  uint4 block = buf_blocks.Load4((cb_index_offset + thread.x / 4) * IdctBlockSize);
+  const int plane = (block.x >> 21) & 3;
+  const int loc_mem_offset = 4 * (thread.x & (64 - 4));
+  const int scan_offset = ScanSize * ((block.x >> ScanShift) & ScanMask) + wi;
+  const int coef_count = block.x & 0x7ff;
+
+  const int input_offset = (block.y + wi * 4) << 2;
+  int4 coefs = (wi < coef_count) ? (int4)buf_input.Load4(input_offset) : int4(0, 0, 0, 0);
+
+  shared_4x4_mem[loc_mem_offset + cb_scans[scan_offset].x] = coefs.x;
+  shared_4x4_mem[loc_mem_offset + cb_scans[scan_offset].y] = coefs.y;
+  shared_4x4_mem[loc_mem_offset + cb_scans[scan_offset].z] = coefs.z;
+  shared_4x4_mem[loc_mem_offset + cb_scans[scan_offset].w] = coefs.w;
+
+  GroupMemoryBarrier();
+
+  const int loc_offset_row = (thread.x & 63) * 4;
+  const int loc_offset_col = loc_mem_offset + wi;
+
+  int a1, b1, c1, d1, e1;
+  a1 = shared_4x4_mem[loc_offset_row + 0] >> QuantShift;
+  c1 = shared_4x4_mem[loc_offset_row + 1] >> QuantShift;
+  d1 = shared_4x4_mem[loc_offset_row + 2] >> QuantShift;
+  b1 = shared_4x4_mem[loc_offset_row + 3] >> QuantShift;
+
+  a1 += c1;
+  d1 -= b1;
+  e1 = (a1 - d1) >> 1;
+  b1 = e1 - b1;
+  c1 = e1 - c1;
+  a1 -= b1;
+  d1 += c1;
+
+  shared_4x4_mem[loc_offset_row + 0] = a1;
+  shared_4x4_mem[loc_offset_row + 1] = b1;
+  shared_4x4_mem[loc_offset_row + 2] = c1;
+  shared_4x4_mem[loc_offset_row + 3] = d1;
+
+  GroupMemoryBarrier();
+
+  a1 = shared_4x4_mem[loc_offset_col + 0];
+  c1 = shared_4x4_mem[loc_offset_col + 4];
+  d1 = shared_4x4_mem[loc_offset_col + 8];
+  b1 = shared_4x4_mem[loc_offset_col + 12];
+  a1 += c1;
+  d1 -= b1;
+  e1 = (a1 - d1) >> 1;
+  b1 = e1 - b1;
+  c1 = e1 - c1;
+  a1 -= b1;
+  d1 += c1;
+  shared_4x4_mem[loc_offset_col + 0] = a1;
+  shared_4x4_mem[loc_offset_col + 4] = b1;
+  shared_4x4_mem[loc_offset_col + 8] = c1;
+  shared_4x4_mem[loc_offset_col + 12] = d1;
+
+  GroupMemoryBarrier();
+
+  const int stride = cb_planes[plane].x;
+  const int offset =
+      cb_planes[plane].y + 8 * ((block.z & 0xffff) + (wi >> 2)) + (4 * (block.z >> 16) + (wi & 3)) * stride;
+
+  const int output_offset = loc_mem_offset + (wi & 3) * 4 + 4 * (wi >> 2);
+  buf_dst.Store2(offset,
+                 int2((shared_4x4_mem[output_offset + 0] & 0xffff) | (shared_4x4_mem[output_offset + 1] << 16),
+                      (shared_4x4_mem[output_offset + 2] & 0xffff) | (shared_4x4_mem[output_offset + 3] << 16)));
+}

diff --git a/libav1/dx/shaders/idct_shader_common.h b/libav1/dx/shaders/idct_shader_common.h
new file mode 100644
index 0000000..63bed6a
--- /dev/null
+++ b/libav1/dx/shaders/idct_shader_common.h

@@ -0,0 +1,1695 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma warning(disable : 3557)
+#pragma warning(disable : 4714)
+
+#define IdctBlockSize 16
+
+ByteAddressBuffer buf_input : register(t0);
+ByteAddressBuffer buf_blocks : register(t1);
+RWByteAddressBuffer buf_dst : register(u0);
+
+cbuffer cb_idct_frame_data : register(b1) {
+  uint4 cb_planes[3];
+  int cb_bitdepth;
+};
+
+cbuffer cb_idct_dispatch_data : register(b2) {
+  uint cb_index_offset;
+  uint cb_wicount;
+};
+
+#define NewSqrt2Bits 12
+#define NewSqrt2 5793
+#define NewInvSqrt2 2896
+#define CosBits 12
+#define ScanShift 11
+#define ScanMask 3
+#define IdctType_H_Flip (1 << 15)
+#define IdctType_V_Flip (1 << 16)
+#define IdctType_H_Shift 17
+#define IdctType_V_Shift 19
+#define ClampMaxValue16 ((1 << 15) - 1)
+#define ClampMinValue16 (-(1 << 15))
+
+#define IDCT_GEN(N, COLS, ROWS, ROWS0, COEF_LOOP, SCALE_COEF, TRANSFORM_ROW, TRANSFORM_COL)                           \
+  [numthreads(64, 1, 1)] void main(uint3 thread                                                                       \
+                                   : SV_DispatchThreadID) {                                                           \
+    if (thread.x >= cb_wicount) return;                                                                               \
+    const uint wi = thread.x & (N - 1);                                                                               \
+    uint4 block = buf_blocks.Load4((cb_index_offset + thread.x / N) * IdctBlockSize);                                 \
+    const uint coef_count = block.x & 0x7ff;                                                                          \
+    const uint scan_offset = ScanSize * ((block.x >> ScanShift) & ScanMask) + wi;                                     \
+    const uint loc_mem_offset = COLS * ROWS * N * (thread.x & (64 - N));                                              \
+    const uint input_offset = block.y + wi * 4;                                                                       \
+    const int2 row_clamp = int2(-(1 << (cb_bitdepth + 7)), (1 << (cb_bitdepth + 7)) - 1);                             \
+    uint i;                                                                                                           \
+    for (i = 0; i < COEF_LOOP; ++i) {                                                                                 \
+      int4 coefs = int4(0, 0, 0, 0);                                                                                  \
+      if ((wi + i * N) < coef_count) {                                                                                \
+        coefs = (int4)buf_input.Load4(input_offset * 4 + i * N * 16);                                                 \
+        if (SCALE_COEF) {                                                                                             \
+          coefs.x = round_shift(coefs.x * NewInvSqrt2, NewSqrt2Bits);                                                 \
+          coefs.y = round_shift(coefs.y * NewInvSqrt2, NewSqrt2Bits);                                                 \
+          coefs.z = round_shift(coefs.z * NewInvSqrt2, NewSqrt2Bits);                                                 \
+          coefs.w = round_shift(coefs.w * NewInvSqrt2, NewSqrt2Bits);                                                 \
+        }                                                                                                             \
+      }                                                                                                               \
+      shared_mem[loc_mem_offset + cb_scans[scan_offset + i * N].x] = coefs.x;                                         \
+      shared_mem[loc_mem_offset + cb_scans[scan_offset + i * N].y] = coefs.y;                                         \
+      shared_mem[loc_mem_offset + cb_scans[scan_offset + i * N].z] = coefs.z;                                         \
+      shared_mem[loc_mem_offset + cb_scans[scan_offset + i * N].w] = coefs.w;                                         \
+    }                                                                                                                 \
+    GroupMemoryBarrier();                                                                                             \
+    const uint h_type = (block.x >> IdctType_H_Shift) & 3;                                                            \
+    const uint loc_row_offset = /*(thread.x & 63) * N * ROWS0 * COLS;*/ loc_mem_offset + wi * ROWS0 * COLS * N;       \
+    for (int row = 0; row < ROWS0; ++row) {                                                                           \
+      const uint row_offset = loc_row_offset + row * N;                                                               \
+      TRANSFORM_ROW(N* COLS, h_type, row_offset, 1, row_offset, 1, IdctOutputShiftH, row_clamp);                      \
+    }                                                                                                                 \
+                                                                                                                      \
+    GroupMemoryBarrier();                                                                                             \
+                                                                                                                      \
+    const int v_type = (block.x >> IdctType_V_Shift) & 3;                                                             \
+    const int loc_out_offset = (block.x & IdctType_V_Flip) ? loc_mem_offset + wi * COLS + (N * ROWS - 1) * N * COLS   \
+                                                           : loc_mem_offset + wi * COLS;                              \
+    const int loc_out_stride = (block.x & IdctType_V_Flip) ? -N * COLS : N * COLS;                                    \
+    const int2 col_clamp = int2(ClampMinValue16, ClampMaxValue16);                                                    \
+    for (int c = 0; c < COLS; ++c) {                                                                                  \
+      const int in_offset = loc_mem_offset + wi * COLS + c;                                                           \
+      const int out_offset = loc_out_offset + c;                                                                      \
+      TRANSFORM_COL(N* ROWS, v_type, in_offset, (N * COLS), out_offset, loc_out_stride, IdctOutputShiftV, col_clamp); \
+    }                                                                                                                 \
+                                                                                                                      \
+    GroupMemoryBarrier();                                                                                             \
+                                                                                                                      \
+    const uint plane = (block.x >> 21) & 3;                                                                           \
+    const uint stride = cb_planes[plane].x;                                                                           \
+    const int output_offset = cb_planes[plane].y + 8 * ((wi & (N * COLS / 4 - 1)) + (block.z & 0xffff)) +             \
+                              (wi / (N * COLS / 4) + (block.z >> 16) * 4) * stride;                                   \
+    const uint col = wi & (N * COLS / 4 - 1);                                                                         \
+    const int local_offset = loc_mem_offset + (wi / (N * COLS / 4)) * (N * COLS) +                                    \
+                             ((block.x & IdctType_H_Flip) ? N * COLS - 4 - 4 * col : 4 * col);                        \
+    for (i = 0; i < N * COLS * ROWS / 4; ++i) {                                                                       \
+      int4 sm;                                                                                                        \
+      sm.x = shared_mem[local_offset + 0 + i * (4 * N)];                                                              \
+      sm.y = shared_mem[local_offset + 1 + i * (4 * N)];                                                              \
+      sm.z = shared_mem[local_offset + 2 + i * (4 * N)];                                                              \
+      sm.w = shared_mem[local_offset + 3 + i * (4 * N)];                                                              \
+      if (block.x & IdctType_H_Flip) sm.xyzw = sm.wzyx;                                                               \
+      buf_dst.Store2(output_offset + (4 / COLS) * i * stride,                                                         \
+                     int2((sm.x & 0xffff) | (sm.y << 16), (sm.z & 0xffff) | (sm.w << 16)));                           \
+    }                                                                                                                 \
+  }
+
+#define TRANSFORM_SELECT4(N, type, in_offset, in_stride, out_offset, out_stride, shift, range)                     \
+  {                                                                                                                \
+    if (type == 0) {                                                                                               \
+      IDCT4(shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift);                   \
+    } else if (type == 1) {                                                                                        \
+      IADST4(shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift);                  \
+    } else {                                                                                                       \
+      IIDENTITY_N(N, NewSqrt2, NewSqrt2Bits, shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, \
+                  range, shift);                                                                                   \
+    }                                                                                                              \
+  }
+
+#define TRANSFORM_SELECT8(N, type, in_offset, in_stride, out_offset, out_stride, shift, range)                  \
+  {                                                                                                             \
+    if (type == 0) {                                                                                            \
+      IDCT8(shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift);                \
+    } else if (type == 1) {                                                                                     \
+      IADST8(shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift);               \
+    } else {                                                                                                    \
+      IIDENTITY_N(N, 2, 0, shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift); \
+    }                                                                                                           \
+  }
+
+#define TRANSFORM_SELECT16(N, type, in_offset, in_stride, out_offset, out_stride, shift, range)                        \
+  {                                                                                                                    \
+    if (type == 0) {                                                                                                   \
+      IDCT16(shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift);                      \
+    } else if (type == 1) {                                                                                            \
+      IADST16(shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift);                     \
+    } else {                                                                                                           \
+      IIDENTITY_N(N, 2 * NewSqrt2, NewSqrt2Bits, shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, \
+                  range, shift);                                                                                       \
+    }                                                                                                                  \
+  }
+
+#define TRANSFORM_SELECT32(N, type, in_offset, in_stride, out_offset, out_stride, shift, range)                 \
+  {                                                                                                             \
+    if (type == 0) {                                                                                            \
+      IDCT32(shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift);               \
+    } else {                                                                                                    \
+      IIDENTITY_N(N, 4, 0, shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift); \
+    }                                                                                                           \
+  }
+
+#define TRANSFORM_64_HALF(N, type, in_offset, in_stride, out_offset, out_stride, shift, range)    \
+  {                                                                                               \
+    if (wi < 32) {                                                                                \
+      IDCT64(shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift); \
+    }                                                                                             \
+  }
+
+#define TRANSFORM_64(N, type, in_offset, in_stride, out_offset, out_stride, shift, range) \
+  { IDCT64(shared_mem, in_offset, in_stride, shared_mem, out_offset, out_stride, range, shift); }
+
+static const int cospi[] = {4096, 4095, 4091, 4085, 4076, 4065, 4052, 4036, 4017, 3996, 3973, 3948, 3920,
+                            3889, 3857, 3822, 3784, 3745, 3703, 3659, 3612, 3564, 3513, 3461, 3406, 3349,
+                            3290, 3229, 3166, 3102, 3035, 2967, 2896, 2824, 2751, 2675, 2598, 2520, 2440,
+                            2359, 2276, 2191, 2106, 2019, 1931, 1842, 1751, 1660, 1567, 1474, 1380, 1285,
+                            1189, 1092, 995,  897,  799,  700,  601,  501,  401,  301,  201,  101};
+
+static const int sinpi[] = {0, 1321, 2482, 3344, 3803};
+
+int clamp_value(int value, int2 range) { return clamp(value, range.x, range.y); }
+
+int round_shift(int value, int bit) {
+  if (bit == 0) return value;
+  return (int)((value + (1 << (bit - 1))) >> bit);
+}
+
+int half_btf(int w0, int in0, int w1, int in1, int bit) { return (w0 * in0 + w1 * in1 + (1 << (bit - 1))) >> bit; }
+
+#define IDCT4(input, in_offset, in_stride, output, out_offset, out_stride, range, output_shift)                 \
+  {                                                                                                             \
+    int step[4];                                                                                                \
+    int step1[4];                                                                                               \
+    step[0] = clamp_value(input[in_offset + 0 * (in_stride)], range);                                           \
+    step[1] = clamp_value(input[in_offset + 2 * (in_stride)], range);                                           \
+    step[2] = clamp_value(input[in_offset + 1 * (in_stride)], range);                                           \
+    step[3] = clamp_value(input[in_offset + 3 * (in_stride)], range);                                           \
+    step1[0] = half_btf(cospi[32], step[0], cospi[32], step[1], CosBits);                                       \
+    step1[1] = half_btf(cospi[32], step[0], -cospi[32], step[1], CosBits);                                      \
+    step1[2] = half_btf(cospi[48], step[2], -cospi[16], step[3], CosBits);                                      \
+    step1[3] = half_btf(cospi[16], step[2], cospi[48], step[3], CosBits);                                       \
+    output[out_offset + 0 * (out_stride)] = round_shift(clamp_value(step1[0] + step1[3], range), output_shift); \
+    output[out_offset + 1 * (out_stride)] = round_shift(clamp_value(step1[1] + step1[2], range), output_shift); \
+    output[out_offset + 2 * (out_stride)] = round_shift(clamp_value(step1[1] - step1[2], range), output_shift); \
+    output[out_offset + 3 * (out_stride)] = round_shift(clamp_value(step1[0] - step1[3], range), output_shift); \
+  }
+
+#define IADST4(input, in_offset, in_stride, output, out_offset, out_stride, range, output_shift) \
+  {                                                                                              \
+    int s0, s1, s2, s3, s4, s5, s6, s7;                                                          \
+    int x0 = clamp_value(input[in_offset + 0 * (in_stride)], range);                             \
+    int x1 = clamp_value(input[in_offset + 1 * (in_stride)], range);                             \
+    int x2 = clamp_value(input[in_offset + 2 * (in_stride)], range);                             \
+    int x3 = clamp_value(input[in_offset + 3 * (in_stride)], range);                             \
+    s0 = sinpi[1] * x0;                                                                          \
+    s1 = sinpi[2] * x0;                                                                          \
+    s2 = sinpi[3] * x1;                                                                          \
+    s3 = sinpi[4] * x2;                                                                          \
+    s4 = sinpi[1] * x2;                                                                          \
+    s5 = sinpi[2] * x3;                                                                          \
+    s6 = sinpi[4] * x3;                                                                          \
+    s7 = (x0 - x2) + x3;                                                                         \
+    s0 = s0 + s3;                                                                                \
+    s1 = s1 - s4;                                                                                \
+    s3 = s2;                                                                                     \
+    s2 = sinpi[3] * s7;                                                                          \
+    s0 = s0 + s5;                                                                                \
+    s1 = s1 - s6;                                                                                \
+    x0 = s0 + s3;                                                                                \
+    x1 = s1 + s3;                                                                                \
+    x2 = s2;                                                                                     \
+    x3 = s0 + s1;                                                                                \
+    x3 = x3 - s3;                                                                                \
+    output[out_offset + 0 * (out_stride)] = round_shift(round_shift(x0, CosBits), output_shift); \
+    output[out_offset + 1 * (out_stride)] = round_shift(round_shift(x1, CosBits), output_shift); \
+    output[out_offset + 2 * (out_stride)] = round_shift(round_shift(x2, CosBits), output_shift); \
+    output[out_offset + 3 * (out_stride)] = round_shift(round_shift(x3, CosBits), output_shift); \
+  }
+
+#define IADST8(input, in_offset, in_stride, output, out_offset, out_stride, range, output_shift) \
+  {                                                                                              \
+    int step0[8];                                                                                \
+    int step1[8];                                                                                \
+    step0[0] = clamp_value(input[in_offset + 7 * (in_stride)], range);                           \
+    step0[1] = clamp_value(input[in_offset + 0 * (in_stride)], range);                           \
+    step0[2] = clamp_value(input[in_offset + 5 * (in_stride)], range);                           \
+    step0[3] = clamp_value(input[in_offset + 2 * (in_stride)], range);                           \
+    step0[4] = clamp_value(input[in_offset + 3 * (in_stride)], range);                           \
+    step0[5] = clamp_value(input[in_offset + 4 * (in_stride)], range);                           \
+    step0[6] = clamp_value(input[in_offset + 1 * (in_stride)], range);                           \
+    step0[7] = clamp_value(input[in_offset + 6 * (in_stride)], range);                           \
+    step1[0] = half_btf(cospi[4], step0[0], cospi[60], step0[1], CosBits);                       \
+    step1[1] = half_btf(cospi[60], step0[0], -cospi[4], step0[1], CosBits);                      \
+    step1[2] = half_btf(cospi[20], step0[2], cospi[44], step0[3], CosBits);                      \
+    step1[3] = half_btf(cospi[44], step0[2], -cospi[20], step0[3], CosBits);                     \
+    step1[4] = half_btf(cospi[36], step0[4], cospi[28], step0[5], CosBits);                      \
+    step1[5] = half_btf(cospi[28], step0[4], -cospi[36], step0[5], CosBits);                     \
+    step1[6] = half_btf(cospi[52], step0[6], cospi[12], step0[7], CosBits);                      \
+    step1[7] = half_btf(cospi[12], step0[6], -cospi[52], step0[7], CosBits);                     \
+    step0[0] = clamp_value(step1[0] + step1[4], range);                                          \
+    step0[1] = clamp_value(step1[1] + step1[5], range);                                          \
+    step0[2] = clamp_value(step1[2] + step1[6], range);                                          \
+    step0[3] = clamp_value(step1[3] + step1[7], range);                                          \
+    step0[4] = clamp_value(step1[0] - step1[4], range);                                          \
+    step0[5] = clamp_value(step1[1] - step1[5], range);                                          \
+    step0[6] = clamp_value(step1[2] - step1[6], range);                                          \
+    step0[7] = clamp_value(step1[3] - step1[7], range);                                          \
+    step1[0] = step0[0];                                                                         \
+    step1[1] = step0[1];                                                                         \
+    step1[2] = step0[2];                                                                         \
+    step1[3] = step0[3];                                                                         \
+    step1[4] = half_btf(cospi[16], step0[4], cospi[48], step0[5], CosBits);                      \
+    step1[5] = half_btf(cospi[48], step0[4], -cospi[16], step0[5], CosBits);                     \
+    step1[6] = half_btf(-cospi[48], step0[6], cospi[16], step0[7], CosBits);                     \
+    step1[7] = half_btf(cospi[16], step0[6], cospi[48], step0[7], CosBits);                      \
+    step0[0] = clamp_value(step1[0] + step1[2], range);                                          \
+    step0[1] = clamp_value(step1[1] + step1[3], range);                                          \
+    step0[2] = clamp_value(step1[0] - step1[2], range);                                          \
+    step0[3] = clamp_value(step1[1] - step1[3], range);                                          \
+    step0[4] = clamp_value(step1[4] + step1[6], range);                                          \
+    step0[5] = clamp_value(step1[5] + step1[7], range);                                          \
+    step0[6] = clamp_value(step1[4] - step1[6], range);                                          \
+    step0[7] = clamp_value(step1[5] - step1[7], range);                                          \
+    step1[0] = step0[0];                                                                         \
+    step1[1] = step0[1];                                                                         \
+    step1[2] = half_btf(cospi[32], step0[2], cospi[32], step0[3], CosBits);                      \
+    step1[3] = half_btf(cospi[32], step0[2], -cospi[32], step0[3], CosBits);                     \
+    step1[4] = step0[4];                                                                         \
+    step1[5] = step0[5];                                                                         \
+    step1[6] = half_btf(cospi[32], step0[6], cospi[32], step0[7], CosBits);                      \
+    step1[7] = half_btf(cospi[32], step0[6], -cospi[32], step0[7], CosBits);                     \
+    output[out_offset + 0 * (out_stride)] = round_shift(step1[0], output_shift);                 \
+    output[out_offset + 1 * (out_stride)] = round_shift(-step1[4], output_shift);                \
+    output[out_offset + 2 * (out_stride)] = round_shift(step1[6], output_shift);                 \
+    output[out_offset + 3 * (out_stride)] = round_shift(-step1[2], output_shift);                \
+    output[out_offset + 4 * (out_stride)] = round_shift(step1[3], output_shift);                 \
+    output[out_offset + 5 * (out_stride)] = round_shift(-step1[7], output_shift);                \
+    output[out_offset + 6 * (out_stride)] = round_shift(step1[5], output_shift);                 \
+    output[out_offset + 7 * (out_stride)] = round_shift(-step1[1], output_shift);                \
+  }
+
+#define IDCT8(input, in_offset, in_stride, output, out_offset, out_stride, range, output_shift)                 \
+  {                                                                                                             \
+    int step0[8];                                                                                               \
+    int step1[8];                                                                                               \
+    step0[0] = clamp_value(input[in_offset + 0 * (in_stride)], range);                                          \
+    step0[1] = clamp_value(input[in_offset + 4 * (in_stride)], range);                                          \
+    step0[2] = clamp_value(input[in_offset + 2 * (in_stride)], range);                                          \
+    step0[3] = clamp_value(input[in_offset + 6 * (in_stride)], range);                                          \
+    step0[4] = clamp_value(input[in_offset + 1 * (in_stride)], range);                                          \
+    step0[5] = clamp_value(input[in_offset + 5 * (in_stride)], range);                                          \
+    step0[6] = clamp_value(input[in_offset + 3 * (in_stride)], range);                                          \
+    step0[7] = clamp_value(input[in_offset + 7 * (in_stride)], range);                                          \
+    step1[0] = step0[0];                                                                                        \
+    step1[1] = step0[1];                                                                                        \
+    step1[2] = step0[2];                                                                                        \
+    step1[3] = step0[3];                                                                                        \
+    step1[4] = half_btf(cospi[56], step0[4], -cospi[8], step0[7], CosBits);                                     \
+    step1[5] = half_btf(cospi[24], step0[5], -cospi[40], step0[6], CosBits);                                    \
+    step1[6] = half_btf(cospi[40], step0[5], cospi[24], step0[6], CosBits);                                     \
+    step1[7] = half_btf(cospi[8], step0[4], cospi[56], step0[7], CosBits);                                      \
+    step0[0] = half_btf(cospi[32], step1[0], cospi[32], step1[1], CosBits);                                     \
+    step0[1] = half_btf(cospi[32], step1[0], -cospi[32], step1[1], CosBits);                                    \
+    step0[2] = half_btf(cospi[48], step1[2], -cospi[16], step1[3], CosBits);                                    \
+    step0[3] = half_btf(cospi[16], step1[2], cospi[48], step1[3], CosBits);                                     \
+    step0[4] = clamp_value(step1[4] + step1[5], range);                                                         \
+    step0[5] = clamp_value(step1[4] - step1[5], range);                                                         \
+    step0[6] = clamp_value(-step1[6] + step1[7], range);                                                        \
+    step0[7] = clamp_value(step1[6] + step1[7], range);                                                         \
+    step1[0] = clamp_value(step0[0] + step0[3], range);                                                         \
+    step1[1] = clamp_value(step0[1] + step0[2], range);                                                         \
+    step1[2] = clamp_value(step0[1] - step0[2], range);                                                         \
+    step1[3] = clamp_value(step0[0] - step0[3], range);                                                         \
+    step1[4] = step0[4];                                                                                        \
+    step1[5] = half_btf(-cospi[32], step0[5], cospi[32], step0[6], CosBits);                                    \
+    step1[6] = half_btf(cospi[32], step0[5], cospi[32], step0[6], CosBits);                                     \
+    step1[7] = step0[7];                                                                                        \
+    output[out_offset + 0 * (out_stride)] = round_shift(clamp_value(step1[0] + step1[7], range), output_shift); \
+    output[out_offset + 1 * (out_stride)] = round_shift(clamp_value(step1[1] + step1[6], range), output_shift); \
+    output[out_offset + 2 * (out_stride)] = round_shift(clamp_value(step1[2] + step1[5], range), output_shift); \
+    output[out_offset + 3 * (out_stride)] = round_shift(clamp_value(step1[3] + step1[4], range), output_shift); \
+    output[out_offset + 4 * (out_stride)] = round_shift(clamp_value(step1[3] - step1[4], range), output_shift); \
+    output[out_offset + 5 * (out_stride)] = round_shift(clamp_value(step1[2] - step1[5], range), output_shift); \
+    output[out_offset + 6 * (out_stride)] = round_shift(clamp_value(step1[1] - step1[6], range), output_shift); \
+    output[out_offset + 7 * (out_stride)] = round_shift(clamp_value(step1[0] - step1[7], range), output_shift); \
+  }
+
+#define IDCT16(input, in_offset, in_stride, output, out_offset, out_stride, range, output_shift)                \
+  {                                                                                                             \
+    int step0[16];                                                                                              \
+    int step1[16];                                                                                              \
+    step0[0] = clamp_value(input[in_offset + (in_stride)*0], range);                                            \
+    step0[1] = clamp_value(input[in_offset + (in_stride)*8], range);                                            \
+    step0[2] = clamp_value(input[in_offset + (in_stride)*4], range);                                            \
+    step0[3] = clamp_value(input[in_offset + (in_stride)*12], range);                                           \
+    step0[4] = clamp_value(input[in_offset + (in_stride)*2], range);                                            \
+    step0[5] = clamp_value(input[in_offset + (in_stride)*10], range);                                           \
+    step0[6] = clamp_value(input[in_offset + (in_stride)*6], range);                                            \
+    step0[7] = clamp_value(input[in_offset + (in_stride)*14], range);                                           \
+    step0[8] = clamp_value(input[in_offset + (in_stride)*1], range);                                            \
+    step0[9] = clamp_value(input[in_offset + (in_stride)*9], range);                                            \
+    step0[10] = clamp_value(input[in_offset + (in_stride)*5], range);                                           \
+    step0[11] = clamp_value(input[in_offset + (in_stride)*13], range);                                          \
+    step0[12] = clamp_value(input[in_offset + (in_stride)*3], range);                                           \
+    step0[13] = clamp_value(input[in_offset + (in_stride)*11], range);                                          \
+    step0[14] = clamp_value(input[in_offset + (in_stride)*7], range);                                           \
+    step0[15] = clamp_value(input[in_offset + (in_stride)*15], range);                                          \
+    step1[0] = step0[0];                                                                                        \
+    step1[1] = step0[1];                                                                                        \
+    step1[2] = step0[2];                                                                                        \
+    step1[3] = step0[3];                                                                                        \
+    step1[4] = step0[4];                                                                                        \
+    step1[5] = step0[5];                                                                                        \
+    step1[6] = step0[6];                                                                                        \
+    step1[7] = step0[7];                                                                                        \
+    step1[8] = half_btf(cospi[60], step0[8], -cospi[4], step0[15], CosBits);                                    \
+    step1[9] = half_btf(cospi[28], step0[9], -cospi[36], step0[14], CosBits);                                   \
+    step1[10] = half_btf(cospi[44], step0[10], -cospi[20], step0[13], CosBits);                                 \
+    step1[11] = half_btf(cospi[12], step0[11], -cospi[52], step0[12], CosBits);                                 \
+    step1[12] = half_btf(cospi[52], step0[11], cospi[12], step0[12], CosBits);                                  \
+    step1[13] = half_btf(cospi[20], step0[10], cospi[44], step0[13], CosBits);                                  \
+    step1[14] = half_btf(cospi[36], step0[9], cospi[28], step0[14], CosBits);                                   \
+    step1[15] = half_btf(cospi[4], step0[8], cospi[60], step0[15], CosBits);                                    \
+    step0[0] = step1[0];                                                                                        \
+    step0[1] = step1[1];                                                                                        \
+    step0[2] = step1[2];                                                                                        \
+    step0[3] = step1[3];                                                                                        \
+    step0[4] = half_btf(cospi[56], step1[4], -cospi[8], step1[7], CosBits);                                     \
+    step0[5] = half_btf(cospi[24], step1[5], -cospi[40], step1[6], CosBits);                                    \
+    step0[6] = half_btf(cospi[40], step1[5], cospi[24], step1[6], CosBits);                                     \
+    step0[7] = half_btf(cospi[8], step1[4], cospi[56], step1[7], CosBits);                                      \
+    step0[8] = clamp_value(step1[8] + step1[9], range);                                                         \
+    step0[9] = clamp_value(step1[8] - step1[9], range);                                                         \
+    step0[10] = clamp_value(-step1[10] + step1[11], range);                                                     \
+    step0[11] = clamp_value(step1[10] + step1[11], range);                                                      \
+    step0[12] = clamp_value(step1[12] + step1[13], range);                                                      \
+    step0[13] = clamp_value(step1[12] - step1[13], range);                                                      \
+    step0[14] = clamp_value(-step1[14] + step1[15], range);                                                     \
+    step0[15] = clamp_value(step1[14] + step1[15], range);                                                      \
+    step1[0] = half_btf(cospi[32], step0[0], cospi[32], step0[1], CosBits);                                     \
+    step1[1] = half_btf(cospi[32], step0[0], -cospi[32], step0[1], CosBits);                                    \
+    step1[2] = half_btf(cospi[48], step0[2], -cospi[16], step0[3], CosBits);                                    \
+    step1[3] = half_btf(cospi[16], step0[2], cospi[48], step0[3], CosBits);                                     \
+    step1[4] = clamp_value(step0[4] + step0[5], range);                                                         \
+    step1[5] = clamp_value(step0[4] - step0[5], range);                                                         \
+    step1[6] = clamp_value(-step0[6] + step0[7], range);                                                        \
+    step1[7] = clamp_value(step0[6] + step0[7], range);                                                         \
+    step1[8] = step0[8];                                                                                        \
+    step1[9] = half_btf(-cospi[16], step0[9], cospi[48], step0[14], CosBits);                                   \
+    step1[10] = half_btf(-cospi[48], step0[10], -cospi[16], step0[13], CosBits);                                \
+    step1[11] = step0[11];                                                                                      \
+    step1[12] = step0[12];                                                                                      \
+    step1[13] = half_btf(-cospi[16], step0[10], cospi[48], step0[13], CosBits);                                 \
+    step1[14] = half_btf(cospi[48], step0[9], cospi[16], step0[14], CosBits);                                   \
+    step1[15] = step0[15];                                                                                      \
+    step0[0] = clamp_value(step1[0] + step1[3], range);                                                         \
+    step0[1] = clamp_value(step1[1] + step1[2], range);                                                         \
+    step0[2] = clamp_value(step1[1] - step1[2], range);                                                         \
+    step0[3] = clamp_value(step1[0] - step1[3], range);                                                         \
+    step0[4] = step1[4];                                                                                        \
+    step0[5] = half_btf(-cospi[32], step1[5], cospi[32], step1[6], CosBits);                                    \
+    step0[6] = half_btf(cospi[32], step1[5], cospi[32], step1[6], CosBits);                                     \
+    step0[7] = step1[7];                                                                                        \
+    step0[8] = clamp_value(step1[8] + step1[11], range);                                                        \
+    step0[9] = clamp_value(step1[9] + step1[10], range);                                                        \
+    step0[10] = clamp_value(step1[9] - step1[10], range);                                                       \
+    step0[11] = clamp_value(step1[8] - step1[11], range);                                                       \
+    step0[12] = clamp_value(-step1[12] + step1[15], range);                                                     \
+    step0[13] = clamp_value(-step1[13] + step1[14], range);                                                     \
+    step0[14] = clamp_value(step1[13] + step1[14], range);                                                      \
+    step0[15] = clamp_value(step1[12] + step1[15], range);                                                      \
+    step1[0] = clamp_value(step0[0] + step0[7], range);                                                         \
+    step1[1] = clamp_value(step0[1] + step0[6], range);                                                         \
+    step1[2] = clamp_value(step0[2] + step0[5], range);                                                         \
+    step1[3] = clamp_value(step0[3] + step0[4], range);                                                         \
+    step1[4] = clamp_value(step0[3] - step0[4], range);                                                         \
+    step1[5] = clamp_value(step0[2] - step0[5], range);                                                         \
+    step1[6] = clamp_value(step0[1] - step0[6], range);                                                         \
+    step1[7] = clamp_value(step0[0] - step0[7], range);                                                         \
+    step1[8] = step0[8];                                                                                        \
+    step1[9] = step0[9];                                                                                        \
+    step1[10] = half_btf(-cospi[32], step0[10], cospi[32], step0[13], CosBits);                                 \
+    step1[11] = half_btf(-cospi[32], step0[11], cospi[32], step0[12], CosBits);                                 \
+    step1[12] = half_btf(cospi[32], step0[11], cospi[32], step0[12], CosBits);                                  \
+    step1[13] = half_btf(cospi[32], step0[10], cospi[32], step0[13], CosBits);                                  \
+    step1[14] = step0[14];                                                                                      \
+    step1[15] = step0[15];                                                                                      \
+    output[out_offset + (out_stride)*0] = round_shift(clamp_value(step1[0] + step1[15], range), output_shift);  \
+    output[out_offset + (out_stride)*1] = round_shift(clamp_value(step1[1] + step1[14], range), output_shift);  \
+    output[out_offset + (out_stride)*2] = round_shift(clamp_value(step1[2] + step1[13], range), output_shift);  \
+    output[out_offset + (out_stride)*3] = round_shift(clamp_value(step1[3] + step1[12], range), output_shift);  \
+    output[out_offset + (out_stride)*4] = round_shift(clamp_value(step1[4] + step1[11], range), output_shift);  \
+    output[out_offset + (out_stride)*5] = round_shift(clamp_value(step1[5] + step1[10], range), output_shift);  \
+    output[out_offset + (out_stride)*6] = round_shift(clamp_value(step1[6] + step1[9], range), output_shift);   \
+    output[out_offset + (out_stride)*7] = round_shift(clamp_value(step1[7] + step1[8], range), output_shift);   \
+    output[out_offset + (out_stride)*8] = round_shift(clamp_value(step1[7] - step1[8], range), output_shift);   \
+    output[out_offset + (out_stride)*9] = round_shift(clamp_value(step1[6] - step1[9], range), output_shift);   \
+    output[out_offset + (out_stride)*10] = round_shift(clamp_value(step1[5] - step1[10], range), output_shift); \
+    output[out_offset + (out_stride)*11] = round_shift(clamp_value(step1[4] - step1[11], range), output_shift); \
+    output[out_offset + (out_stride)*12] = round_shift(clamp_value(step1[3] - step1[12], range), output_shift); \
+    output[out_offset + (out_stride)*13] = round_shift(clamp_value(step1[2] - step1[13], range), output_shift); \
+    output[out_offset + (out_stride)*14] = round_shift(clamp_value(step1[1] - step1[14], range), output_shift); \
+    output[out_offset + (out_stride)*15] = round_shift(clamp_value(step1[0] - step1[15], range), output_shift); \
+  }
+
+#define IADST16(input, in_offset, in_stride, output, out_offset, out_stride, range, output_shift) \
+  {                                                                                               \
+    int step0[16];                                                                                \
+    int step1[16];                                                                                \
+    step0[0] = clamp_value(input[in_offset + (in_stride)*15], range);                             \
+    step0[1] = clamp_value(input[in_offset + (in_stride)*0], range);                              \
+    step0[2] = clamp_value(input[in_offset + (in_stride)*13], range);                             \
+    step0[3] = clamp_value(input[in_offset + (in_stride)*2], range);                              \
+    step0[4] = clamp_value(input[in_offset + (in_stride)*11], range);                             \
+    step0[5] = clamp_value(input[in_offset + (in_stride)*4], range);                              \
+    step0[6] = clamp_value(input[in_offset + (in_stride)*9], range);                              \
+    step0[7] = clamp_value(input[in_offset + (in_stride)*6], range);                              \
+    step0[8] = clamp_value(input[in_offset + (in_stride)*7], range);                              \
+    step0[9] = clamp_value(input[in_offset + (in_stride)*8], range);                              \
+    step0[10] = clamp_value(input[in_offset + (in_stride)*5], range);                             \
+    step0[11] = clamp_value(input[in_offset + (in_stride)*10], range);                            \
+    step0[12] = clamp_value(input[in_offset + (in_stride)*3], range);                             \
+    step0[13] = clamp_value(input[in_offset + (in_stride)*12], range);                            \
+    step0[14] = clamp_value(input[in_offset + (in_stride)*1], range);                             \
+    step0[15] = clamp_value(input[in_offset + (in_stride)*14], range);                            \
+    step1[0] = half_btf(cospi[2], step0[0], cospi[62], step0[1], CosBits);                        \
+    step1[1] = half_btf(cospi[62], step0[0], -cospi[2], step0[1], CosBits);                       \
+    step1[2] = half_btf(cospi[10], step0[2], cospi[54], step0[3], CosBits);                       \
+    step1[3] = half_btf(cospi[54], step0[2], -cospi[10], step0[3], CosBits);                      \
+    step1[4] = half_btf(cospi[18], step0[4], cospi[46], step0[5], CosBits);                       \
+    step1[5] = half_btf(cospi[46], step0[4], -cospi[18], step0[5], CosBits);                      \
+    step1[6] = half_btf(cospi[26], step0[6], cospi[38], step0[7], CosBits);                       \
+    step1[7] = half_btf(cospi[38], step0[6], -cospi[26], step0[7], CosBits);                      \
+    step1[8] = half_btf(cospi[34], step0[8], cospi[30], step0[9], CosBits);                       \
+    step1[9] = half_btf(cospi[30], step0[8], -cospi[34], step0[9], CosBits);                      \
+    step1[10] = half_btf(cospi[42], step0[10], cospi[22], step0[11], CosBits);                    \
+    step1[11] = half_btf(cospi[22], step0[10], -cospi[42], step0[11], CosBits);                   \
+    step1[12] = half_btf(cospi[50], step0[12], cospi[14], step0[13], CosBits);                    \
+    step1[13] = half_btf(cospi[14], step0[12], -cospi[50], step0[13], CosBits);                   \
+    step1[14] = half_btf(cospi[58], step0[14], cospi[6], step0[15], CosBits);                     \
+    step1[15] = half_btf(cospi[6], step0[14], -cospi[58], step0[15], CosBits);                    \
+    step0[0] = clamp_value(step1[0] + step1[8], range);                                           \
+    step0[1] = clamp_value(step1[1] + step1[9], range);                                           \
+    step0[2] = clamp_value(step1[2] + step1[10], range);                                          \
+    step0[3] = clamp_value(step1[3] + step1[11], range);                                          \
+    step0[4] = clamp_value(step1[4] + step1[12], range);                                          \
+    step0[5] = clamp_value(step1[5] + step1[13], range);                                          \
+    step0[6] = clamp_value(step1[6] + step1[14], range);                                          \
+    step0[7] = clamp_value(step1[7] + step1[15], range);                                          \
+    step0[8] = clamp_value(step1[0] - step1[8], range);                                           \
+    step0[9] = clamp_value(step1[1] - step1[9], range);                                           \
+    step0[10] = clamp_value(step1[2] - step1[10], range);                                         \
+    step0[11] = clamp_value(step1[3] - step1[11], range);                                         \
+    step0[12] = clamp_value(step1[4] - step1[12], range);                                         \
+    step0[13] = clamp_value(step1[5] - step1[13], range);                                         \
+    step0[14] = clamp_value(step1[6] - step1[14], range);                                         \
+    step0[15] = clamp_value(step1[7] - step1[15], range);                                         \
+    step1[0] = step0[0];                                                                          \
+    step1[1] = step0[1];                                                                          \
+    step1[2] = step0[2];                                                                          \
+    step1[3] = step0[3];                                                                          \
+    step1[4] = step0[4];                                                                          \
+    step1[5] = step0[5];                                                                          \
+    step1[6] = step0[6];                                                                          \
+    step1[7] = step0[7];                                                                          \
+    step1[8] = half_btf(cospi[8], step0[8], cospi[56], step0[9], CosBits);                        \
+    step1[9] = half_btf(cospi[56], step0[8], -cospi[8], step0[9], CosBits);                       \
+    step1[10] = half_btf(cospi[40], step0[10], cospi[24], step0[11], CosBits);                    \
+    step1[11] = half_btf(cospi[24], step0[10], -cospi[40], step0[11], CosBits);                   \
+    step1[12] = half_btf(-cospi[56], step0[12], cospi[8], step0[13], CosBits);                    \
+    step1[13] = half_btf(cospi[8], step0[12], cospi[56], step0[13], CosBits);                     \
+    step1[14] = half_btf(-cospi[24], step0[14], cospi[40], step0[15], CosBits);                   \
+    step1[15] = half_btf(cospi[40], step0[14], cospi[24], step0[15], CosBits);                    \
+    step0[0] = clamp_value(step1[0] + step1[4], range);                                           \
+    step0[1] = clamp_value(step1[1] + step1[5], range);                                           \
+    step0[2] = clamp_value(step1[2] + step1[6], range);                                           \
+    step0[3] = clamp_value(step1[3] + step1[7], range);                                           \
+    step0[4] = clamp_value(step1[0] - step1[4], range);                                           \
+    step0[5] = clamp_value(step1[1] - step1[5], range);                                           \
+    step0[6] = clamp_value(step1[2] - step1[6], range);                                           \
+    step0[7] = clamp_value(step1[3] - step1[7], range);                                           \
+    step0[8] = clamp_value(step1[8] + step1[12], range);                                          \
+    step0[9] = clamp_value(step1[9] + step1[13], range);                                          \
+    step0[10] = clamp_value(step1[10] + step1[14], range);                                        \
+    step0[11] = clamp_value(step1[11] + step1[15], range);                                        \
+    step0[12] = clamp_value(step1[8] - step1[12], range);                                         \
+    step0[13] = clamp_value(step1[9] - step1[13], range);                                         \
+    step0[14] = clamp_value(step1[10] - step1[14], range);                                        \
+    step0[15] = clamp_value(step1[11] - step1[15], range);                                        \
+    step1[0] = step0[0];                                                                          \
+    step1[1] = step0[1];                                                                          \
+    step1[2] = step0[2];                                                                          \
+    step1[3] = step0[3];                                                                          \
+    step1[4] = half_btf(cospi[16], step0[4], cospi[48], step0[5], CosBits);                       \
+    step1[5] = half_btf(cospi[48], step0[4], -cospi[16], step0[5], CosBits);                      \
+    step1[6] = half_btf(-cospi[48], step0[6], cospi[16], step0[7], CosBits);                      \
+    step1[7] = half_btf(cospi[16], step0[6], cospi[48], step0[7], CosBits);                       \
+    step1[8] = step0[8];                                                                          \
+    step1[9] = step0[9];                                                                          \
+    step1[10] = step0[10];                                                                        \
+    step1[11] = step0[11];                                                                        \
+    step1[12] = half_btf(cospi[16], step0[12], cospi[48], step0[13], CosBits);                    \
+    step1[13] = half_btf(cospi[48], step0[12], -cospi[16], step0[13], CosBits);                   \
+    step1[14] = half_btf(-cospi[48], step0[14], cospi[16], step0[15], CosBits);                   \
+    step1[15] = half_btf(cospi[16], step0[14], cospi[48], step0[15], CosBits);                    \
+    step0[0] = clamp_value(step1[0] + step1[2], range);                                           \
+    step0[1] = clamp_value(step1[1] + step1[3], range);                                           \
+    step0[2] = clamp_value(step1[0] - step1[2], range);                                           \
+    step0[3] = clamp_value(step1[1] - step1[3], range);                                           \
+    step0[4] = clamp_value(step1[4] + step1[6], range);                                           \
+    step0[5] = clamp_value(step1[5] + step1[7], range);                                           \
+    step0[6] = clamp_value(step1[4] - step1[6], range);                                           \
+    step0[7] = clamp_value(step1[5] - step1[7], range);                                           \
+    step0[8] = clamp_value(step1[8] + step1[10], range);                                          \
+    step0[9] = clamp_value(step1[9] + step1[11], range);                                          \
+    step0[10] = clamp_value(step1[8] - step1[10], range);                                         \
+    step0[11] = clamp_value(step1[9] - step1[11], range);                                         \
+    step0[12] = clamp_value(step1[12] + step1[14], range);                                        \
+    step0[13] = clamp_value(step1[13] + step1[15], range);                                        \
+    step0[14] = clamp_value(step1[12] - step1[14], range);                                        \
+    step0[15] = clamp_value(step1[13] - step1[15], range);                                        \
+    step1[0] = step0[0];                                                                          \
+    step1[1] = step0[1];                                                                          \
+    step1[2] = half_btf(cospi[32], step0[2], cospi[32], step0[3], CosBits);                       \
+    step1[3] = half_btf(cospi[32], step0[2], -cospi[32], step0[3], CosBits);                      \
+    step1[4] = step0[4];                                                                          \
+    step1[5] = step0[5];                                                                          \
+    step1[6] = half_btf(cospi[32], step0[6], cospi[32], step0[7], CosBits);                       \
+    step1[7] = half_btf(cospi[32], step0[6], -cospi[32], step0[7], CosBits);                      \
+    step1[8] = step0[8];                                                                          \
+    step1[9] = step0[9];                                                                          \
+    step1[10] = half_btf(cospi[32], step0[10], cospi[32], step0[11], CosBits);                    \
+    step1[11] = half_btf(cospi[32], step0[10], -cospi[32], step0[11], CosBits);                   \
+    step1[12] = step0[12];                                                                        \
+    step1[13] = step0[13];                                                                        \
+    step1[14] = half_btf(cospi[32], step0[14], cospi[32], step0[15], CosBits);                    \
+    step1[15] = half_btf(cospi[32], step0[14], -cospi[32], step0[15], CosBits);                   \
+    output[out_offset + (out_stride)*0] = round_shift(step1[0], output_shift);                    \
+    output[out_offset + (out_stride)*1] = round_shift(-step1[8], output_shift);                   \
+    output[out_offset + (out_stride)*2] = round_shift(step1[12], output_shift);                   \
+    output[out_offset + (out_stride)*3] = round_shift(-step1[4], output_shift);                   \
+    output[out_offset + (out_stride)*4] = round_shift(step1[6], output_shift);                    \
+    output[out_offset + (out_stride)*5] = round_shift(-step1[14], output_shift);                  \
+    output[out_offset + (out_stride)*6] = round_shift(step1[10], output_shift);                   \
+    output[out_offset + (out_stride)*7] = round_shift(-step1[2], output_shift);                   \
+    output[out_offset + (out_stride)*8] = round_shift(step1[3], output_shift);                    \
+    output[out_offset + (out_stride)*9] = round_shift(-step1[11], output_shift);                  \
+    output[out_offset + (out_stride)*10] = round_shift(step1[15], output_shift);                  \
+    output[out_offset + (out_stride)*11] = round_shift(-step1[7], output_shift);                  \
+    output[out_offset + (out_stride)*12] = round_shift(step1[5], output_shift);                   \
+    output[out_offset + (out_stride)*13] = round_shift(-step1[13], output_shift);                 \
+    output[out_offset + (out_stride)*14] = round_shift(step1[9], output_shift);                   \
+    output[out_offset + (out_stride)*15] = round_shift(-step1[1], output_shift);                  \
+  }
+
+#define IDCT32(input, in_offset, in_stride, output, out_offset, out_stride, range, output_shift)                 \
+  {                                                                                                              \
+    int step0[32];                                                                                               \
+    int step1[32];                                                                                               \
+    step0[0] = clamp_value(input[in_offset + (in_stride)*0], range);                                             \
+    step0[1] = clamp_value(input[in_offset + (in_stride)*16], range);                                            \
+    step0[2] = clamp_value(input[in_offset + (in_stride)*8], range);                                             \
+    step0[3] = clamp_value(input[in_offset + (in_stride)*24], range);                                            \
+    step0[4] = clamp_value(input[in_offset + (in_stride)*4], range);                                             \
+    step0[5] = clamp_value(input[in_offset + (in_stride)*20], range);                                            \
+    step0[6] = clamp_value(input[in_offset + (in_stride)*12], range);                                            \
+    step0[7] = clamp_value(input[in_offset + (in_stride)*28], range);                                            \
+    step0[8] = clamp_value(input[in_offset + (in_stride)*2], range);                                             \
+    step0[9] = clamp_value(input[in_offset + (in_stride)*18], range);                                            \
+    step0[10] = clamp_value(input[in_offset + (in_stride)*10], range);                                           \
+    step0[11] = clamp_value(input[in_offset + (in_stride)*26], range);                                           \
+    step0[12] = clamp_value(input[in_offset + (in_stride)*6], range);                                            \
+    step0[13] = clamp_value(input[in_offset + (in_stride)*22], range);                                           \
+    step0[14] = clamp_value(input[in_offset + (in_stride)*14], range);                                           \
+    step0[15] = clamp_value(input[in_offset + (in_stride)*30], range);                                           \
+    step0[16] = clamp_value(input[in_offset + (in_stride)*1], range);                                            \
+    step0[17] = clamp_value(input[in_offset + (in_stride)*17], range);                                           \
+    step0[18] = clamp_value(input[in_offset + (in_stride)*9], range);                                            \
+    step0[19] = clamp_value(input[in_offset + (in_stride)*25], range);                                           \
+    step0[20] = clamp_value(input[in_offset + (in_stride)*5], range);                                            \
+    step0[21] = clamp_value(input[in_offset + (in_stride)*21], range);                                           \
+    step0[22] = clamp_value(input[in_offset + (in_stride)*13], range);                                           \
+    step0[23] = clamp_value(input[in_offset + (in_stride)*29], range);                                           \
+    step0[24] = clamp_value(input[in_offset + (in_stride)*3], range);                                            \
+    step0[25] = clamp_value(input[in_offset + (in_stride)*19], range);                                           \
+    step0[26] = clamp_value(input[in_offset + (in_stride)*11], range);                                           \
+    step0[27] = clamp_value(input[in_offset + (in_stride)*27], range);                                           \
+    step0[28] = clamp_value(input[in_offset + (in_stride)*7], range);                                            \
+    step0[29] = clamp_value(input[in_offset + (in_stride)*23], range);                                           \
+    step0[30] = clamp_value(input[in_offset + (in_stride)*15], range);                                           \
+    step0[31] = clamp_value(input[in_offset + (in_stride)*31], range);                                           \
+    step1[0] = step0[0];                                                                                         \
+    step1[1] = step0[1];                                                                                         \
+    step1[2] = step0[2];                                                                                         \
+    step1[3] = step0[3];                                                                                         \
+    step1[4] = step0[4];                                                                                         \
+    step1[5] = step0[5];                                                                                         \
+    step1[6] = step0[6];                                                                                         \
+    step1[7] = step0[7];                                                                                         \
+    step1[8] = step0[8];                                                                                         \
+    step1[9] = step0[9];                                                                                         \
+    step1[10] = step0[10];                                                                                       \
+    step1[11] = step0[11];                                                                                       \
+    step1[12] = step0[12];                                                                                       \
+    step1[13] = step0[13];                                                                                       \
+    step1[14] = step0[14];                                                                                       \
+    step1[15] = step0[15];                                                                                       \
+    step1[16] = half_btf(cospi[62], step0[16], -cospi[2], step0[31], CosBits);                                   \
+    step1[17] = half_btf(cospi[30], step0[17], -cospi[34], step0[30], CosBits);                                  \
+    step1[18] = half_btf(cospi[46], step0[18], -cospi[18], step0[29], CosBits);                                  \
+    step1[19] = half_btf(cospi[14], step0[19], -cospi[50], step0[28], CosBits);                                  \
+    step1[20] = half_btf(cospi[54], step0[20], -cospi[10], step0[27], CosBits);                                  \
+    step1[21] = half_btf(cospi[22], step0[21], -cospi[42], step0[26], CosBits);                                  \
+    step1[22] = half_btf(cospi[38], step0[22], -cospi[26], step0[25], CosBits);                                  \
+    step1[23] = half_btf(cospi[6], step0[23], -cospi[58], step0[24], CosBits);                                   \
+    step1[24] = half_btf(cospi[58], step0[23], cospi[6], step0[24], CosBits);                                    \
+    step1[25] = half_btf(cospi[26], step0[22], cospi[38], step0[25], CosBits);                                   \
+    step1[26] = half_btf(cospi[42], step0[21], cospi[22], step0[26], CosBits);                                   \
+    step1[27] = half_btf(cospi[10], step0[20], cospi[54], step0[27], CosBits);                                   \
+    step1[28] = half_btf(cospi[50], step0[19], cospi[14], step0[28], CosBits);                                   \
+    step1[29] = half_btf(cospi[18], step0[18], cospi[46], step0[29], CosBits);                                   \
+    step1[30] = half_btf(cospi[34], step0[17], cospi[30], step0[30], CosBits);                                   \
+    step1[31] = half_btf(cospi[2], step0[16], cospi[62], step0[31], CosBits);                                    \
+    step0[0] = step1[0];                                                                                         \
+    step0[1] = step1[1];                                                                                         \
+    step0[2] = step1[2];                                                                                         \
+    step0[3] = step1[3];                                                                                         \
+    step0[4] = step1[4];                                                                                         \
+    step0[5] = step1[5];                                                                                         \
+    step0[6] = step1[6];                                                                                         \
+    step0[7] = step1[7];                                                                                         \
+    step0[8] = half_btf(cospi[60], step1[8], -cospi[4], step1[15], CosBits);                                     \
+    step0[9] = half_btf(cospi[28], step1[9], -cospi[36], step1[14], CosBits);                                    \
+    step0[10] = half_btf(cospi[44], step1[10], -cospi[20], step1[13], CosBits);                                  \
+    step0[11] = half_btf(cospi[12], step1[11], -cospi[52], step1[12], CosBits);                                  \
+    step0[12] = half_btf(cospi[52], step1[11], cospi[12], step1[12], CosBits);                                   \
+    step0[13] = half_btf(cospi[20], step1[10], cospi[44], step1[13], CosBits);                                   \
+    step0[14] = half_btf(cospi[36], step1[9], cospi[28], step1[14], CosBits);                                    \
+    step0[15] = half_btf(cospi[4], step1[8], cospi[60], step1[15], CosBits);                                     \
+    step0[16] = clamp_value(step1[16] + step1[17], range);                                                       \
+    step0[17] = clamp_value(step1[16] - step1[17], range);                                                       \
+    step0[18] = clamp_value(-step1[18] + step1[19], range);                                                      \
+    step0[19] = clamp_value(step1[18] + step1[19], range);                                                       \
+    step0[20] = clamp_value(step1[20] + step1[21], range);                                                       \
+    step0[21] = clamp_value(step1[20] - step1[21], range);                                                       \
+    step0[22] = clamp_value(-step1[22] + step1[23], range);                                                      \
+    step0[23] = clamp_value(step1[22] + step1[23], range);                                                       \
+    step0[24] = clamp_value(step1[24] + step1[25], range);                                                       \
+    step0[25] = clamp_value(step1[24] - step1[25], range);                                                       \
+    step0[26] = clamp_value(-step1[26] + step1[27], range);                                                      \
+    step0[27] = clamp_value(step1[26] + step1[27], range);                                                       \
+    step0[28] = clamp_value(step1[28] + step1[29], range);                                                       \
+    step0[29] = clamp_value(step1[28] - step1[29], range);                                                       \
+    step0[30] = clamp_value(-step1[30] + step1[31], range);                                                      \
+    step0[31] = clamp_value(step1[30] + step1[31], range);                                                       \
+    step1[0] = step0[0];                                                                                         \
+    step1[1] = step0[1];                                                                                         \
+    step1[2] = step0[2];                                                                                         \
+    step1[3] = step0[3];                                                                                         \
+    step1[4] = half_btf(cospi[56], step0[4], -cospi[8], step0[7], CosBits);                                      \
+    step1[5] = half_btf(cospi[24], step0[5], -cospi[40], step0[6], CosBits);                                     \
+    step1[6] = half_btf(cospi[40], step0[5], cospi[24], step0[6], CosBits);                                      \
+    step1[7] = half_btf(cospi[8], step0[4], cospi[56], step0[7], CosBits);                                       \
+    step1[8] = clamp_value(step0[8] + step0[9], range);                                                          \
+    step1[9] = clamp_value(step0[8] - step0[9], range);                                                          \
+    step1[10] = clamp_value(-step0[10] + step0[11], range);                                                      \
+    step1[11] = clamp_value(step0[10] + step0[11], range);                                                       \
+    step1[12] = clamp_value(step0[12] + step0[13], range);                                                       \
+    step1[13] = clamp_value(step0[12] - step0[13], range);                                                       \
+    step1[14] = clamp_value(-step0[14] + step0[15], range);                                                      \
+    step1[15] = clamp_value(step0[14] + step0[15], range);                                                       \
+    step1[16] = step0[16];                                                                                       \
+    step1[17] = half_btf(-cospi[8], step0[17], cospi[56], step0[30], CosBits);                                   \
+    step1[18] = half_btf(-cospi[56], step0[18], -cospi[8], step0[29], CosBits);                                  \
+    step1[19] = step0[19];                                                                                       \
+    step1[20] = step0[20];                                                                                       \
+    step1[21] = half_btf(-cospi[40], step0[21], cospi[24], step0[26], CosBits);                                  \
+    step1[22] = half_btf(-cospi[24], step0[22], -cospi[40], step0[25], CosBits);                                 \
+    step1[23] = step0[23];                                                                                       \
+    step1[24] = step0[24];                                                                                       \
+    step1[25] = half_btf(-cospi[40], step0[22], cospi[24], step0[25], CosBits);                                  \
+    step1[26] = half_btf(cospi[24], step0[21], cospi[40], step0[26], CosBits);                                   \
+    step1[27] = step0[27];                                                                                       \
+    step1[28] = step0[28];                                                                                       \
+    step1[29] = half_btf(-cospi[8], step0[18], cospi[56], step0[29], CosBits);                                   \
+    step1[30] = half_btf(cospi[56], step0[17], cospi[8], step0[30], CosBits);                                    \
+    step1[31] = step0[31];                                                                                       \
+    step0[0] = half_btf(cospi[32], step1[0], cospi[32], step1[1], CosBits);                                      \
+    step0[1] = half_btf(cospi[32], step1[0], -cospi[32], step1[1], CosBits);                                     \
+    step0[2] = half_btf(cospi[48], step1[2], -cospi[16], step1[3], CosBits);                                     \
+    step0[3] = half_btf(cospi[16], step1[2], cospi[48], step1[3], CosBits);                                      \
+    step0[4] = clamp_value(step1[4] + step1[5], range);                                                          \
+    step0[5] = clamp_value(step1[4] - step1[5], range);                                                          \
+    step0[6] = clamp_value(-step1[6] + step1[7], range);                                                         \
+    step0[7] = clamp_value(step1[6] + step1[7], range);                                                          \
+    step0[8] = step1[8];                                                                                         \
+    step0[9] = half_btf(-cospi[16], step1[9], cospi[48], step1[14], CosBits);                                    \
+    step0[10] = half_btf(-cospi[48], step1[10], -cospi[16], step1[13], CosBits);                                 \
+    step0[11] = step1[11];                                                                                       \
+    step0[12] = step1[12];                                                                                       \
+    step0[13] = half_btf(-cospi[16], step1[10], cospi[48], step1[13], CosBits);                                  \
+    step0[14] = half_btf(cospi[48], step1[9], cospi[16], step1[14], CosBits);                                    \
+    step0[15] = step1[15];                                                                                       \
+    step0[16] = clamp_value(step1[16] + step1[19], range);                                                       \
+    step0[17] = clamp_value(step1[17] + step1[18], range);                                                       \
+    step0[18] = clamp_value(step1[17] - step1[18], range);                                                       \
+    step0[19] = clamp_value(step1[16] - step1[19], range);                                                       \
+    step0[20] = clamp_value(-step1[20] + step1[23], range);                                                      \
+    step0[21] = clamp_value(-step1[21] + step1[22], range);                                                      \
+    step0[22] = clamp_value(step1[21] + step1[22], range);                                                       \
+    step0[23] = clamp_value(step1[20] + step1[23], range);                                                       \
+    step0[24] = clamp_value(step1[24] + step1[27], range);                                                       \
+    step0[25] = clamp_value(step1[25] + step1[26], range);                                                       \
+    step0[26] = clamp_value(step1[25] - step1[26], range);                                                       \
+    step0[27] = clamp_value(step1[24] - step1[27], range);                                                       \
+    step0[28] = clamp_value(-step1[28] + step1[31], range);                                                      \
+    step0[29] = clamp_value(-step1[29] + step1[30], range);                                                      \
+    step0[30] = clamp_value(step1[29] + step1[30], range);                                                       \
+    step0[31] = clamp_value(step1[28] + step1[31], range);                                                       \
+    step1[0] = clamp_value(step0[0] + step0[3], range);                                                          \
+    step1[1] = clamp_value(step0[1] + step0[2], range);                                                          \
+    step1[2] = clamp_value(step0[1] - step0[2], range);                                                          \
+    step1[3] = clamp_value(step0[0] - step0[3], range);                                                          \
+    step1[4] = step0[4];                                                                                         \
+    step1[5] = half_btf(-cospi[32], step0[5], cospi[32], step0[6], CosBits);                                     \
+    step1[6] = half_btf(cospi[32], step0[5], cospi[32], step0[6], CosBits);                                      \
+    step1[7] = step0[7];                                                                                         \
+    step1[8] = clamp_value(step0[8] + step0[11], range);                                                         \
+    step1[9] = clamp_value(step0[9] + step0[10], range);                                                         \
+    step1[10] = clamp_value(step0[9] - step0[10], range);                                                        \
+    step1[11] = clamp_value(step0[8] - step0[11], range);                                                        \
+    step1[12] = clamp_value(-step0[12] + step0[15], range);                                                      \
+    step1[13] = clamp_value(-step0[13] + step0[14], range);                                                      \
+    step1[14] = clamp_value(step0[13] + step0[14], range);                                                       \
+    step1[15] = clamp_value(step0[12] + step0[15], range);                                                       \
+    step1[16] = step0[16];                                                                                       \
+    step1[17] = step0[17];                                                                                       \
+    step1[18] = half_btf(-cospi[16], step0[18], cospi[48], step0[29], CosBits);                                  \
+    step1[19] = half_btf(-cospi[16], step0[19], cospi[48], step0[28], CosBits);                                  \
+    step1[20] = half_btf(-cospi[48], step0[20], -cospi[16], step0[27], CosBits);                                 \
+    step1[21] = half_btf(-cospi[48], step0[21], -cospi[16], step0[26], CosBits);                                 \
+    step1[22] = step0[22];                                                                                       \
+    step1[23] = step0[23];                                                                                       \
+    step1[24] = step0[24];                                                                                       \
+    step1[25] = step0[25];                                                                                       \
+    step1[26] = half_btf(-cospi[16], step0[21], cospi[48], step0[26], CosBits);                                  \
+    step1[27] = half_btf(-cospi[16], step0[20], cospi[48], step0[27], CosBits);                                  \
+    step1[28] = half_btf(cospi[48], step0[19], cospi[16], step0[28], CosBits);                                   \
+    step1[29] = half_btf(cospi[48], step0[18], cospi[16], step0[29], CosBits);                                   \
+    step1[30] = step0[30];                                                                                       \
+    step1[31] = step0[31];                                                                                       \
+    step0[0] = clamp_value(step1[0] + step1[7], range);                                                          \
+    step0[1] = clamp_value(step1[1] + step1[6], range);                                                          \
+    step0[2] = clamp_value(step1[2] + step1[5], range);                                                          \
+    step0[3] = clamp_value(step1[3] + step1[4], range);                                                          \
+    step0[4] = clamp_value(step1[3] - step1[4], range);                                                          \
+    step0[5] = clamp_value(step1[2] - step1[5], range);                                                          \
+    step0[6] = clamp_value(step1[1] - step1[6], range);                                                          \
+    step0[7] = clamp_value(step1[0] - step1[7], range);                                                          \
+    step0[8] = step1[8];                                                                                         \
+    step0[9] = step1[9];                                                                                         \
+    step0[10] = half_btf(-cospi[32], step1[10], cospi[32], step1[13], CosBits);                                  \
+    step0[11] = half_btf(-cospi[32], step1[11], cospi[32], step1[12], CosBits);                                  \
+    step0[12] = half_btf(cospi[32], step1[11], cospi[32], step1[12], CosBits);                                   \
+    step0[13] = half_btf(cospi[32], step1[10], cospi[32], step1[13], CosBits);                                   \
+    step0[14] = step1[14];                                                                                       \
+    step0[15] = step1[15];                                                                                       \
+    step0[16] = clamp_value(step1[16] + step1[23], range);                                                       \
+    step0[17] = clamp_value(step1[17] + step1[22], range);                                                       \
+    step0[18] = clamp_value(step1[18] + step1[21], range);                                                       \
+    step0[19] = clamp_value(step1[19] + step1[20], range);                                                       \
+    step0[20] = clamp_value(step1[19] - step1[20], range);                                                       \
+    step0[21] = clamp_value(step1[18] - step1[21], range);                                                       \
+    step0[22] = clamp_value(step1[17] - step1[22], range);                                                       \
+    step0[23] = clamp_value(step1[16] - step1[23], range);                                                       \
+    step0[24] = clamp_value(-step1[24] + step1[31], range);                                                      \
+    step0[25] = clamp_value(-step1[25] + step1[30], range);                                                      \
+    step0[26] = clamp_value(-step1[26] + step1[29], range);                                                      \
+    step0[27] = clamp_value(-step1[27] + step1[28], range);                                                      \
+    step0[28] = clamp_value(step1[27] + step1[28], range);                                                       \
+    step0[29] = clamp_value(step1[26] + step1[29], range);                                                       \
+    step0[30] = clamp_value(step1[25] + step1[30], range);                                                       \
+    step0[31] = clamp_value(step1[24] + step1[31], range);                                                       \
+    step1[0] = clamp_value(step0[0] + step0[15], range);                                                         \
+    step1[1] = clamp_value(step0[1] + step0[14], range);                                                         \
+    step1[2] = clamp_value(step0[2] + step0[13], range);                                                         \
+    step1[3] = clamp_value(step0[3] + step0[12], range);                                                         \
+    step1[4] = clamp_value(step0[4] + step0[11], range);                                                         \
+    step1[5] = clamp_value(step0[5] + step0[10], range);                                                         \
+    step1[6] = clamp_value(step0[6] + step0[9], range);                                                          \
+    step1[7] = clamp_value(step0[7] + step0[8], range);                                                          \
+    step1[8] = clamp_value(step0[7] - step0[8], range);                                                          \
+    step1[9] = clamp_value(step0[6] - step0[9], range);                                                          \
+    step1[10] = clamp_value(step0[5] - step0[10], range);                                                        \
+    step1[11] = clamp_value(step0[4] - step0[11], range);                                                        \
+    step1[12] = clamp_value(step0[3] - step0[12], range);                                                        \
+    step1[13] = clamp_value(step0[2] - step0[13], range);                                                        \
+    step1[14] = clamp_value(step0[1] - step0[14], range);                                                        \
+    step1[15] = clamp_value(step0[0] - step0[15], range);                                                        \
+    step1[16] = step0[16];                                                                                       \
+    step1[17] = step0[17];                                                                                       \
+    step1[18] = step0[18];                                                                                       \
+    step1[19] = step0[19];                                                                                       \
+    step1[20] = half_btf(-cospi[32], step0[20], cospi[32], step0[27], CosBits);                                  \
+    step1[21] = half_btf(-cospi[32], step0[21], cospi[32], step0[26], CosBits);                                  \
+    step1[22] = half_btf(-cospi[32], step0[22], cospi[32], step0[25], CosBits);                                  \
+    step1[23] = half_btf(-cospi[32], step0[23], cospi[32], step0[24], CosBits);                                  \
+    step1[24] = half_btf(cospi[32], step0[23], cospi[32], step0[24], CosBits);                                   \
+    step1[25] = half_btf(cospi[32], step0[22], cospi[32], step0[25], CosBits);                                   \
+    step1[26] = half_btf(cospi[32], step0[21], cospi[32], step0[26], CosBits);                                   \
+    step1[27] = half_btf(cospi[32], step0[20], cospi[32], step0[27], CosBits);                                   \
+    step1[28] = step0[28];                                                                                       \
+    step1[29] = step0[29];                                                                                       \
+    step1[30] = step0[30];                                                                                       \
+    step1[31] = step0[31];                                                                                       \
+    output[out_offset + (out_stride)*0] = round_shift(clamp_value(step1[0] + step1[31], range), output_shift);   \
+    output[out_offset + (out_stride)*1] = round_shift(clamp_value(step1[1] + step1[30], range), output_shift);   \
+    output[out_offset + (out_stride)*2] = round_shift(clamp_value(step1[2] + step1[29], range), output_shift);   \
+    output[out_offset + (out_stride)*3] = round_shift(clamp_value(step1[3] + step1[28], range), output_shift);   \
+    output[out_offset + (out_stride)*4] = round_shift(clamp_value(step1[4] + step1[27], range), output_shift);   \
+    output[out_offset + (out_stride)*5] = round_shift(clamp_value(step1[5] + step1[26], range), output_shift);   \
+    output[out_offset + (out_stride)*6] = round_shift(clamp_value(step1[6] + step1[25], range), output_shift);   \
+    output[out_offset + (out_stride)*7] = round_shift(clamp_value(step1[7] + step1[24], range), output_shift);   \
+    output[out_offset + (out_stride)*8] = round_shift(clamp_value(step1[8] + step1[23], range), output_shift);   \
+    output[out_offset + (out_stride)*9] = round_shift(clamp_value(step1[9] + step1[22], range), output_shift);   \
+    output[out_offset + (out_stride)*10] = round_shift(clamp_value(step1[10] + step1[21], range), output_shift); \
+    output[out_offset + (out_stride)*11] = round_shift(clamp_value(step1[11] + step1[20], range), output_shift); \
+    output[out_offset + (out_stride)*12] = round_shift(clamp_value(step1[12] + step1[19], range), output_shift); \
+    output[out_offset + (out_stride)*13] = round_shift(clamp_value(step1[13] + step1[18], range), output_shift); \
+    output[out_offset + (out_stride)*14] = round_shift(clamp_value(step1[14] + step1[17], range), output_shift); \
+    output[out_offset + (out_stride)*15] = round_shift(clamp_value(step1[15] + step1[16], range), output_shift); \
+    output[out_offset + (out_stride)*16] = round_shift(clamp_value(step1[15] - step1[16], range), output_shift); \
+    output[out_offset + (out_stride)*17] = round_shift(clamp_value(step1[14] - step1[17], range), output_shift); \
+    output[out_offset + (out_stride)*18] = round_shift(clamp_value(step1[13] - step1[18], range), output_shift); \
+    output[out_offset + (out_stride)*19] = round_shift(clamp_value(step1[12] - step1[19], range), output_shift); \
+    output[out_offset + (out_stride)*20] = round_shift(clamp_value(step1[11] - step1[20], range), output_shift); \
+    output[out_offset + (out_stride)*21] = round_shift(clamp_value(step1[10] - step1[21], range), output_shift); \
+    output[out_offset + (out_stride)*22] = round_shift(clamp_value(step1[9] - step1[22], range), output_shift);  \
+    output[out_offset + (out_stride)*23] = round_shift(clamp_value(step1[8] - step1[23], range), output_shift);  \
+    output[out_offset + (out_stride)*24] = round_shift(clamp_value(step1[7] - step1[24], range), output_shift);  \
+    output[out_offset + (out_stride)*25] = round_shift(clamp_value(step1[6] - step1[25], range), output_shift);  \
+    output[out_offset + (out_stride)*26] = round_shift(clamp_value(step1[5] - step1[26], range), output_shift);  \
+    output[out_offset + (out_stride)*27] = round_shift(clamp_value(step1[4] - step1[27], range), output_shift);  \
+    output[out_offset + (out_stride)*28] = round_shift(clamp_value(step1[3] - step1[28], range), output_shift);  \
+    output[out_offset + (out_stride)*29] = round_shift(clamp_value(step1[2] - step1[29], range), output_shift);  \
+    output[out_offset + (out_stride)*30] = round_shift(clamp_value(step1[1] - step1[30], range), output_shift);  \
+    output[out_offset + (out_stride)*31] = round_shift(clamp_value(step1[0] - step1[31], range), output_shift);  \
+  }
+
+/*
+step0[0]  = input[in_offset + (in_stride) * 0]; \
+step0[1]  = input[in_offset + (in_stride) * 32]; \
+step0[2]  = input[in_offset + (in_stride) * 16]; \
+step0[3]  = input[in_offset + (in_stride) * 48]; \
+step0[4]  = input[in_offset + (in_stride) * 8]; \
+step0[5]  = input[in_offset + (in_stride) * 40]; \
+step0[6]  = input[in_offset + (in_stride) * 24]; \
+step0[7]  = input[in_offset + (in_stride) * 56]; \
+step0[8]  = input[in_offset + (in_stride) * 4]; \
+step0[9]  = input[in_offset + (in_stride) * 36]; \
+step0[10] = input[in_offset + (in_stride) * 20]; \
+step0[11] = input[in_offset + (in_stride) * 52]; \
+step0[12] = input[in_offset + (in_stride) * 12]; \
+step0[13] = input[in_offset + (in_stride) * 44]; \
+step0[14] = input[in_offset + (in_stride) * 28]; \
+step0[15] = input[in_offset + (in_stride) * 60]; \
+step0[16] = input[in_offset + (in_stride) * 2]; \
+step0[17] = input[in_offset + (in_stride) * 34]; \
+step0[18] = input[in_offset + (in_stride) * 18]; \
+step0[19] = input[in_offset + (in_stride) * 50]; \
+step0[20] = input[in_offset + (in_stride) * 10]; \
+step0[21] = input[in_offset + (in_stride) * 42]; \
+step0[22] = input[in_offset + (in_stride) * 26]; \
+step0[23] = input[in_offset + (in_stride) * 58]; \
+step0[24] = input[in_offset + (in_stride) * 6]; \
+step0[25] = input[in_offset + (in_stride) * 38]; \
+step0[26] = input[in_offset + (in_stride) * 22]; \
+step0[27] = input[in_offset + (in_stride) * 54]; \
+step0[28] = input[in_offset + (in_stride) * 14]; \
+step0[29] = input[in_offset + (in_stride) * 46]; \
+step0[30] = input[in_offset + (in_stride) * 30]; \
+step0[31] = input[in_offset + (in_stride) * 62]; \
+step0[32] = input[in_offset + (in_stride) * 1]; \
+step0[33] = input[in_offset + (in_stride) * 33]; \
+step0[34] = input[in_offset + (in_stride) * 17]; \
+step0[35] = input[in_offset + (in_stride) * 49]; \
+step0[36] = input[in_offset + (in_stride) * 9]; \
+step0[37] = input[in_offset + (in_stride) * 41]; \
+step0[38] = input[in_offset + (in_stride) * 25]; \
+step0[39] = input[in_offset + (in_stride) * 57]; \
+step0[40] = input[in_offset + (in_stride) * 5]; \
+step0[41] = input[in_offset + (in_stride) * 37]; \
+step0[42] = input[in_offset + (in_stride) * 21]; \
+step0[43] = input[in_offset + (in_stride) * 53]; \
+step0[44] = input[in_offset + (in_stride) * 13]; \
+step0[45] = input[in_offset + (in_stride) * 45]; \
+step0[46] = input[in_offset + (in_stride) * 29]; \
+step0[47] = input[in_offset + (in_stride) * 61]; \
+step0[48] = input[in_offset + (in_stride) * 3]; \
+step0[49] = input[in_offset + (in_stride) * 35]; \
+step0[50] = input[in_offset + (in_stride) * 19]; \
+step0[51] = input[in_offset + (in_stride) * 51]; \
+step0[52] = input[in_offset + (in_stride) * 11]; \
+step0[53] = input[in_offset + (in_stride) * 43]; \
+step0[54] = input[in_offset + (in_stride) * 27]; \
+step0[55] = input[in_offset + (in_stride) * 59]; \
+step0[56] = input[in_offset + (in_stride) * 7]; \
+step0[57] = input[in_offset + (in_stride) * 39]; \
+step0[58] = input[in_offset + (in_stride) * 23]; \
+step0[59] = input[in_offset + (in_stride) * 55]; \
+step0[60] = input[in_offset + (in_stride) * 15]; \
+step0[61] = input[in_offset + (in_stride) * 47]; \
+step0[62] = input[in_offset + (in_stride) * 31]; \
+step0[63] = input[in_offset + (in_stride) * 63]; \
+*/
+
+#define IDCT64(input, in_offset, in_stride, output, out_offset, out_stride, range, output_shift)                 \
+  {                                                                                                              \
+    int step0[64];                                                                                               \
+    int step1[64];                                                                                               \
+    step0[0] = clamp_value(input[in_offset + (in_stride)*0], range);                                             \
+    step0[1] = 0;                                                                                                \
+    step0[2] = clamp_value(input[in_offset + (in_stride)*16], range);                                            \
+    step0[3] = 0;                                                                                                \
+    step0[4] = clamp_value(input[in_offset + (in_stride)*8], range);                                             \
+    step0[5] = 0;                                                                                                \
+    step0[6] = clamp_value(input[in_offset + (in_stride)*24], range);                                            \
+    step0[7] = 0;                                                                                                \
+    step0[8] = clamp_value(input[in_offset + (in_stride)*4], range);                                             \
+    step0[9] = 0;                                                                                                \
+    step0[10] = clamp_value(input[in_offset + (in_stride)*20], range);                                           \
+    step0[11] = 0;                                                                                               \
+    step0[12] = clamp_value(input[in_offset + (in_stride)*12], range);                                           \
+    step0[13] = 0;                                                                                               \
+    step0[14] = clamp_value(input[in_offset + (in_stride)*28], range);                                           \
+    step0[15] = 0;                                                                                               \
+    step0[16] = clamp_value(input[in_offset + (in_stride)*2], range);                                            \
+    step0[17] = 0;                                                                                               \
+    step0[18] = clamp_value(input[in_offset + (in_stride)*18], range);                                           \
+    step0[19] = 0;                                                                                               \
+    step0[20] = clamp_value(input[in_offset + (in_stride)*10], range);                                           \
+    step0[21] = 0;                                                                                               \
+    step0[22] = clamp_value(input[in_offset + (in_stride)*26], range);                                           \
+    step0[23] = 0;                                                                                               \
+    step0[24] = clamp_value(input[in_offset + (in_stride)*6], range);                                            \
+    step0[25] = 0;                                                                                               \
+    step0[26] = clamp_value(input[in_offset + (in_stride)*22], range);                                           \
+    step0[27] = 0;                                                                                               \
+    step0[28] = clamp_value(input[in_offset + (in_stride)*14], range);                                           \
+    step0[29] = 0;                                                                                               \
+    step0[30] = clamp_value(input[in_offset + (in_stride)*30], range);                                           \
+    step0[31] = 0;                                                                                               \
+    step0[32] = clamp_value(input[in_offset + (in_stride)*1], range);                                            \
+    step0[33] = 0;                                                                                               \
+    step0[34] = clamp_value(input[in_offset + (in_stride)*17], range);                                           \
+    step0[35] = 0;                                                                                               \
+    step0[36] = clamp_value(input[in_offset + (in_stride)*9], range);                                            \
+    step0[37] = 0;                                                                                               \
+    step0[38] = clamp_value(input[in_offset + (in_stride)*25], range);                                           \
+    step0[39] = 0;                                                                                               \
+    step0[40] = clamp_value(input[in_offset + (in_stride)*5], range);                                            \
+    step0[41] = 0;                                                                                               \
+    step0[42] = clamp_value(input[in_offset + (in_stride)*21], range);                                           \
+    step0[43] = 0;                                                                                               \
+    step0[44] = clamp_value(input[in_offset + (in_stride)*13], range);                                           \
+    step0[45] = 0;                                                                                               \
+    step0[46] = clamp_value(input[in_offset + (in_stride)*29], range);                                           \
+    step0[47] = 0;                                                                                               \
+    step0[48] = clamp_value(input[in_offset + (in_stride)*3], range);                                            \
+    step0[49] = 0;                                                                                               \
+    step0[50] = clamp_value(input[in_offset + (in_stride)*19], range);                                           \
+    step0[51] = 0;                                                                                               \
+    step0[52] = clamp_value(input[in_offset + (in_stride)*11], range);                                           \
+    step0[53] = 0;                                                                                               \
+    step0[54] = clamp_value(input[in_offset + (in_stride)*27], range);                                           \
+    step0[55] = 0;                                                                                               \
+    step0[56] = clamp_value(input[in_offset + (in_stride)*7], range);                                            \
+    step0[57] = 0;                                                                                               \
+    step0[58] = clamp_value(input[in_offset + (in_stride)*23], range);                                           \
+    step0[59] = 0;                                                                                               \
+    step0[60] = clamp_value(input[in_offset + (in_stride)*15], range);                                           \
+    step0[61] = 0;                                                                                               \
+    step0[62] = clamp_value(input[in_offset + (in_stride)*31], range);                                           \
+    step0[63] = 0;                                                                                               \
+    step1[0] = step0[0];                                                                                         \
+    step1[1] = step0[1];                                                                                         \
+    step1[2] = step0[2];                                                                                         \
+    step1[3] = step0[3];                                                                                         \
+    step1[4] = step0[4];                                                                                         \
+    step1[5] = step0[5];                                                                                         \
+    step1[6] = step0[6];                                                                                         \
+    step1[7] = step0[7];                                                                                         \
+    step1[8] = step0[8];                                                                                         \
+    step1[9] = step0[9];                                                                                         \
+    step1[10] = step0[10];                                                                                       \
+    step1[11] = step0[11];                                                                                       \
+    step1[12] = step0[12];                                                                                       \
+    step1[13] = step0[13];                                                                                       \
+    step1[14] = step0[14];                                                                                       \
+    step1[15] = step0[15];                                                                                       \
+    step1[16] = step0[16];                                                                                       \
+    step1[17] = step0[17];                                                                                       \
+    step1[18] = step0[18];                                                                                       \
+    step1[19] = step0[19];                                                                                       \
+    step1[20] = step0[20];                                                                                       \
+    step1[21] = step0[21];                                                                                       \
+    step1[22] = step0[22];                                                                                       \
+    step1[23] = step0[23];                                                                                       \
+    step1[24] = step0[24];                                                                                       \
+    step1[25] = step0[25];                                                                                       \
+    step1[26] = step0[26];                                                                                       \
+    step1[27] = step0[27];                                                                                       \
+    step1[28] = step0[28];                                                                                       \
+    step1[29] = step0[29];                                                                                       \
+    step1[30] = step0[30];                                                                                       \
+    step1[31] = step0[31];                                                                                       \
+    step1[32] = half_btf(cospi[63], step0[32], -cospi[1], step0[63], CosBits);                                   \
+    step1[33] = half_btf(cospi[31], step0[33], -cospi[33], step0[62], CosBits);                                  \
+    step1[34] = half_btf(cospi[47], step0[34], -cospi[17], step0[61], CosBits);                                  \
+    step1[35] = half_btf(cospi[15], step0[35], -cospi[49], step0[60], CosBits);                                  \
+    step1[36] = half_btf(cospi[55], step0[36], -cospi[9], step0[59], CosBits);                                   \
+    step1[37] = half_btf(cospi[23], step0[37], -cospi[41], step0[58], CosBits);                                  \
+    step1[38] = half_btf(cospi[39], step0[38], -cospi[25], step0[57], CosBits);                                  \
+    step1[39] = half_btf(cospi[7], step0[39], -cospi[57], step0[56], CosBits);                                   \
+    step1[40] = half_btf(cospi[59], step0[40], -cospi[5], step0[55], CosBits);                                   \
+    step1[41] = half_btf(cospi[27], step0[41], -cospi[37], step0[54], CosBits);                                  \
+    step1[42] = half_btf(cospi[43], step0[42], -cospi[21], step0[53], CosBits);                                  \
+    step1[43] = half_btf(cospi[11], step0[43], -cospi[53], step0[52], CosBits);                                  \
+    step1[44] = half_btf(cospi[51], step0[44], -cospi[13], step0[51], CosBits);                                  \
+    step1[45] = half_btf(cospi[19], step0[45], -cospi[45], step0[50], CosBits);                                  \
+    step1[46] = half_btf(cospi[35], step0[46], -cospi[29], step0[49], CosBits);                                  \
+    step1[47] = half_btf(cospi[3], step0[47], -cospi[61], step0[48], CosBits);                                   \
+    step1[48] = half_btf(cospi[61], step0[47], cospi[3], step0[48], CosBits);                                    \
+    step1[49] = half_btf(cospi[29], step0[46], cospi[35], step0[49], CosBits);                                   \
+    step1[50] = half_btf(cospi[45], step0[45], cospi[19], step0[50], CosBits);                                   \
+    step1[51] = half_btf(cospi[13], step0[44], cospi[51], step0[51], CosBits);                                   \
+    step1[52] = half_btf(cospi[53], step0[43], cospi[11], step0[52], CosBits);                                   \
+    step1[53] = half_btf(cospi[21], step0[42], cospi[43], step0[53], CosBits);                                   \
+    step1[54] = half_btf(cospi[37], step0[41], cospi[27], step0[54], CosBits);                                   \
+    step1[55] = half_btf(cospi[5], step0[40], cospi[59], step0[55], CosBits);                                    \
+    step1[56] = half_btf(cospi[57], step0[39], cospi[7], step0[56], CosBits);                                    \
+    step1[57] = half_btf(cospi[25], step0[38], cospi[39], step0[57], CosBits);                                   \
+    step1[58] = half_btf(cospi[41], step0[37], cospi[23], step0[58], CosBits);                                   \
+    step1[59] = half_btf(cospi[9], step0[36], cospi[55], step0[59], CosBits);                                    \
+    step1[60] = half_btf(cospi[49], step0[35], cospi[15], step0[60], CosBits);                                   \
+    step1[61] = half_btf(cospi[17], step0[34], cospi[47], step0[61], CosBits);                                   \
+    step1[62] = half_btf(cospi[33], step0[33], cospi[31], step0[62], CosBits);                                   \
+    step1[63] = half_btf(cospi[1], step0[32], cospi[63], step0[63], CosBits);                                    \
+    step0[0] = step1[0];                                                                                         \
+    step0[1] = step1[1];                                                                                         \
+    step0[2] = step1[2];                                                                                         \
+    step0[3] = step1[3];                                                                                         \
+    step0[4] = step1[4];                                                                                         \
+    step0[5] = step1[5];                                                                                         \
+    step0[6] = step1[6];                                                                                         \
+    step0[7] = step1[7];                                                                                         \
+    step0[8] = step1[8];                                                                                         \
+    step0[9] = step1[9];                                                                                         \
+    step0[10] = step1[10];                                                                                       \
+    step0[11] = step1[11];                                                                                       \
+    step0[12] = step1[12];                                                                                       \
+    step0[13] = step1[13];                                                                                       \
+    step0[14] = step1[14];                                                                                       \
+    step0[15] = step1[15];                                                                                       \
+    step0[16] = half_btf(cospi[62], step1[16], -cospi[2], step1[31], CosBits);                                   \
+    step0[17] = half_btf(cospi[30], step1[17], -cospi[34], step1[30], CosBits);                                  \
+    step0[18] = half_btf(cospi[46], step1[18], -cospi[18], step1[29], CosBits);                                  \
+    step0[19] = half_btf(cospi[14], step1[19], -cospi[50], step1[28], CosBits);                                  \
+    step0[20] = half_btf(cospi[54], step1[20], -cospi[10], step1[27], CosBits);                                  \
+    step0[21] = half_btf(cospi[22], step1[21], -cospi[42], step1[26], CosBits);                                  \
+    step0[22] = half_btf(cospi[38], step1[22], -cospi[26], step1[25], CosBits);                                  \
+    step0[23] = half_btf(cospi[6], step1[23], -cospi[58], step1[24], CosBits);                                   \
+    step0[24] = half_btf(cospi[58], step1[23], cospi[6], step1[24], CosBits);                                    \
+    step0[25] = half_btf(cospi[26], step1[22], cospi[38], step1[25], CosBits);                                   \
+    step0[26] = half_btf(cospi[42], step1[21], cospi[22], step1[26], CosBits);                                   \
+    step0[27] = half_btf(cospi[10], step1[20], cospi[54], step1[27], CosBits);                                   \
+    step0[28] = half_btf(cospi[50], step1[19], cospi[14], step1[28], CosBits);                                   \
+    step0[29] = half_btf(cospi[18], step1[18], cospi[46], step1[29], CosBits);                                   \
+    step0[30] = half_btf(cospi[34], step1[17], cospi[30], step1[30], CosBits);                                   \
+    step0[31] = half_btf(cospi[2], step1[16], cospi[62], step1[31], CosBits);                                    \
+    step0[32] = clamp_value(step1[32] + step1[33], range);                                                       \
+    step0[33] = clamp_value(step1[32] - step1[33], range);                                                       \
+    step0[34] = clamp_value(-step1[34] + step1[35], range);                                                      \
+    step0[35] = clamp_value(step1[34] + step1[35], range);                                                       \
+    step0[36] = clamp_value(step1[36] + step1[37], range);                                                       \
+    step0[37] = clamp_value(step1[36] - step1[37], range);                                                       \
+    step0[38] = clamp_value(-step1[38] + step1[39], range);                                                      \
+    step0[39] = clamp_value(step1[38] + step1[39], range);                                                       \
+    step0[40] = clamp_value(step1[40] + step1[41], range);                                                       \
+    step0[41] = clamp_value(step1[40] - step1[41], range);                                                       \
+    step0[42] = clamp_value(-step1[42] + step1[43], range);                                                      \
+    step0[43] = clamp_value(step1[42] + step1[43], range);                                                       \
+    step0[44] = clamp_value(step1[44] + step1[45], range);                                                       \
+    step0[45] = clamp_value(step1[44] - step1[45], range);                                                       \
+    step0[46] = clamp_value(-step1[46] + step1[47], range);                                                      \
+    step0[47] = clamp_value(step1[46] + step1[47], range);                                                       \
+    step0[48] = clamp_value(step1[48] + step1[49], range);                                                       \
+    step0[49] = clamp_value(step1[48] - step1[49], range);                                                       \
+    step0[50] = clamp_value(-step1[50] + step1[51], range);                                                      \
+    step0[51] = clamp_value(step1[50] + step1[51], range);                                                       \
+    step0[52] = clamp_value(step1[52] + step1[53], range);                                                       \
+    step0[53] = clamp_value(step1[52] - step1[53], range);                                                       \
+    step0[54] = clamp_value(-step1[54] + step1[55], range);                                                      \
+    step0[55] = clamp_value(step1[54] + step1[55], range);                                                       \
+    step0[56] = clamp_value(step1[56] + step1[57], range);                                                       \
+    step0[57] = clamp_value(step1[56] - step1[57], range);                                                       \
+    step0[58] = clamp_value(-step1[58] + step1[59], range);                                                      \
+    step0[59] = clamp_value(step1[58] + step1[59], range);                                                       \
+    step0[60] = clamp_value(step1[60] + step1[61], range);                                                       \
+    step0[61] = clamp_value(step1[60] - step1[61], range);                                                       \
+    step0[62] = clamp_value(-step1[62] + step1[63], range);                                                      \
+    step0[63] = clamp_value(step1[62] + step1[63], range);                                                       \
+    step1[0] = step0[0];                                                                                         \
+    step1[1] = step0[1];                                                                                         \
+    step1[2] = step0[2];                                                                                         \
+    step1[3] = step0[3];                                                                                         \
+    step1[4] = step0[4];                                                                                         \
+    step1[5] = step0[5];                                                                                         \
+    step1[6] = step0[6];                                                                                         \
+    step1[7] = step0[7];                                                                                         \
+    step1[8] = half_btf(cospi[60], step0[8], -cospi[4], step0[15], CosBits);                                     \
+    step1[9] = half_btf(cospi[28], step0[9], -cospi[36], step0[14], CosBits);                                    \
+    step1[10] = half_btf(cospi[44], step0[10], -cospi[20], step0[13], CosBits);                                  \
+    step1[11] = half_btf(cospi[12], step0[11], -cospi[52], step0[12], CosBits);                                  \
+    step1[12] = half_btf(cospi[52], step0[11], cospi[12], step0[12], CosBits);                                   \
+    step1[13] = half_btf(cospi[20], step0[10], cospi[44], step0[13], CosBits);                                   \
+    step1[14] = half_btf(cospi[36], step0[9], cospi[28], step0[14], CosBits);                                    \
+    step1[15] = half_btf(cospi[4], step0[8], cospi[60], step0[15], CosBits);                                     \
+    step1[16] = clamp_value(step0[16] + step0[17], range);                                                       \
+    step1[17] = clamp_value(step0[16] - step0[17], range);                                                       \
+    step1[18] = clamp_value(-step0[18] + step0[19], range);                                                      \
+    step1[19] = clamp_value(step0[18] + step0[19], range);                                                       \
+    step1[20] = clamp_value(step0[20] + step0[21], range);                                                       \
+    step1[21] = clamp_value(step0[20] - step0[21], range);                                                       \
+    step1[22] = clamp_value(-step0[22] + step0[23], range);                                                      \
+    step1[23] = clamp_value(step0[22] + step0[23], range);                                                       \
+    step1[24] = clamp_value(step0[24] + step0[25], range);                                                       \
+    step1[25] = clamp_value(step0[24] - step0[25], range);                                                       \
+    step1[26] = clamp_value(-step0[26] + step0[27], range);                                                      \
+    step1[27] = clamp_value(step0[26] + step0[27], range);                                                       \
+    step1[28] = clamp_value(step0[28] + step0[29], range);                                                       \
+    step1[29] = clamp_value(step0[28] - step0[29], range);                                                       \
+    step1[30] = clamp_value(-step0[30] + step0[31], range);                                                      \
+    step1[31] = clamp_value(step0[30] + step0[31], range);                                                       \
+    step1[32] = step0[32];                                                                                       \
+    step1[33] = half_btf(-cospi[4], step0[33], cospi[60], step0[62], CosBits);                                   \
+    step1[34] = half_btf(-cospi[60], step0[34], -cospi[4], step0[61], CosBits);                                  \
+    step1[35] = step0[35];                                                                                       \
+    step1[36] = step0[36];                                                                                       \
+    step1[37] = half_btf(-cospi[36], step0[37], cospi[28], step0[58], CosBits);                                  \
+    step1[38] = half_btf(-cospi[28], step0[38], -cospi[36], step0[57], CosBits);                                 \
+    step1[39] = step0[39];                                                                                       \
+    step1[40] = step0[40];                                                                                       \
+    step1[41] = half_btf(-cospi[20], step0[41], cospi[44], step0[54], CosBits);                                  \
+    step1[42] = half_btf(-cospi[44], step0[42], -cospi[20], step0[53], CosBits);                                 \
+    step1[43] = step0[43];                                                                                       \
+    step1[44] = step0[44];                                                                                       \
+    step1[45] = half_btf(-cospi[52], step0[45], cospi[12], step0[50], CosBits);                                  \
+    step1[46] = half_btf(-cospi[12], step0[46], -cospi[52], step0[49], CosBits);                                 \
+    step1[47] = step0[47];                                                                                       \
+    step1[48] = step0[48];                                                                                       \
+    step1[49] = half_btf(-cospi[52], step0[46], cospi[12], step0[49], CosBits);                                  \
+    step1[50] = half_btf(cospi[12], step0[45], cospi[52], step0[50], CosBits);                                   \
+    step1[51] = step0[51];                                                                                       \
+    step1[52] = step0[52];                                                                                       \
+    step1[53] = half_btf(-cospi[20], step0[42], cospi[44], step0[53], CosBits);                                  \
+    step1[54] = half_btf(cospi[44], step0[41], cospi[20], step0[54], CosBits);                                   \
+    step1[55] = step0[55];                                                                                       \
+    step1[56] = step0[56];                                                                                       \
+    step1[57] = half_btf(-cospi[36], step0[38], cospi[28], step0[57], CosBits);                                  \
+    step1[58] = half_btf(cospi[28], step0[37], cospi[36], step0[58], CosBits);                                   \
+    step1[59] = step0[59];                                                                                       \
+    step1[60] = step0[60];                                                                                       \
+    step1[61] = half_btf(-cospi[4], step0[34], cospi[60], step0[61], CosBits);                                   \
+    step1[62] = half_btf(cospi[60], step0[33], cospi[4], step0[62], CosBits);                                    \
+    step1[63] = step0[63];                                                                                       \
+    step0[0] = step1[0];                                                                                         \
+    step0[1] = step1[1];                                                                                         \
+    step0[2] = step1[2];                                                                                         \
+    step0[3] = step1[3];                                                                                         \
+    step0[4] = half_btf(cospi[56], step1[4], -cospi[8], step1[7], CosBits);                                      \
+    step0[5] = half_btf(cospi[24], step1[5], -cospi[40], step1[6], CosBits);                                     \
+    step0[6] = half_btf(cospi[40], step1[5], cospi[24], step1[6], CosBits);                                      \
+    step0[7] = half_btf(cospi[8], step1[4], cospi[56], step1[7], CosBits);                                       \
+    step0[8] = clamp_value(step1[8] + step1[9], range);                                                          \
+    step0[9] = clamp_value(step1[8] - step1[9], range);                                                          \
+    step0[10] = clamp_value(-step1[10] + step1[11], range);                                                      \
+    step0[11] = clamp_value(step1[10] + step1[11], range);                                                       \
+    step0[12] = clamp_value(step1[12] + step1[13], range);                                                       \
+    step0[13] = clamp_value(step1[12] - step1[13], range);                                                       \
+    step0[14] = clamp_value(-step1[14] + step1[15], range);                                                      \
+    step0[15] = clamp_value(step1[14] + step1[15], range);                                                       \
+    step0[16] = step1[16];                                                                                       \
+    step0[17] = half_btf(-cospi[8], step1[17], cospi[56], step1[30], CosBits);                                   \
+    step0[18] = half_btf(-cospi[56], step1[18], -cospi[8], step1[29], CosBits);                                  \
+    step0[19] = step1[19];                                                                                       \
+    step0[20] = step1[20];                                                                                       \
+    step0[21] = half_btf(-cospi[40], step1[21], cospi[24], step1[26], CosBits);                                  \
+    step0[22] = half_btf(-cospi[24], step1[22], -cospi[40], step1[25], CosBits);                                 \
+    step0[23] = step1[23];                                                                                       \
+    step0[24] = step1[24];                                                                                       \
+    step0[25] = half_btf(-cospi[40], step1[22], cospi[24], step1[25], CosBits);                                  \
+    step0[26] = half_btf(cospi[24], step1[21], cospi[40], step1[26], CosBits);                                   \
+    step0[27] = step1[27];                                                                                       \
+    step0[28] = step1[28];                                                                                       \
+    step0[29] = half_btf(-cospi[8], step1[18], cospi[56], step1[29], CosBits);                                   \
+    step0[30] = half_btf(cospi[56], step1[17], cospi[8], step1[30], CosBits);                                    \
+    step0[31] = step1[31];                                                                                       \
+    step0[32] = clamp_value(step1[32] + step1[35], range);                                                       \
+    step0[33] = clamp_value(step1[33] + step1[34], range);                                                       \
+    step0[34] = clamp_value(step1[33] - step1[34], range);                                                       \
+    step0[35] = clamp_value(step1[32] - step1[35], range);                                                       \
+    step0[36] = clamp_value(-step1[36] + step1[39], range);                                                      \
+    step0[37] = clamp_value(-step1[37] + step1[38], range);                                                      \
+    step0[38] = clamp_value(step1[37] + step1[38], range);                                                       \
+    step0[39] = clamp_value(step1[36] + step1[39], range);                                                       \
+    step0[40] = clamp_value(step1[40] + step1[43], range);                                                       \
+    step0[41] = clamp_value(step1[41] + step1[42], range);                                                       \
+    step0[42] = clamp_value(step1[41] - step1[42], range);                                                       \
+    step0[43] = clamp_value(step1[40] - step1[43], range);                                                       \
+    step0[44] = clamp_value(-step1[44] + step1[47], range);                                                      \
+    step0[45] = clamp_value(-step1[45] + step1[46], range);                                                      \
+    step0[46] = clamp_value(step1[45] + step1[46], range);                                                       \
+    step0[47] = clamp_value(step1[44] + step1[47], range);                                                       \
+    step0[48] = clamp_value(step1[48] + step1[51], range);                                                       \
+    step0[49] = clamp_value(step1[49] + step1[50], range);                                                       \
+    step0[50] = clamp_value(step1[49] - step1[50], range);                                                       \
+    step0[51] = clamp_value(step1[48] - step1[51], range);                                                       \
+    step0[52] = clamp_value(-step1[52] + step1[55], range);                                                      \
+    step0[53] = clamp_value(-step1[53] + step1[54], range);                                                      \
+    step0[54] = clamp_value(step1[53] + step1[54], range);                                                       \
+    step0[55] = clamp_value(step1[52] + step1[55], range);                                                       \
+    step0[56] = clamp_value(step1[56] + step1[59], range);                                                       \
+    step0[57] = clamp_value(step1[57] + step1[58], range);                                                       \
+    step0[58] = clamp_value(step1[57] - step1[58], range);                                                       \
+    step0[59] = clamp_value(step1[56] - step1[59], range);                                                       \
+    step0[60] = clamp_value(-step1[60] + step1[63], range);                                                      \
+    step0[61] = clamp_value(-step1[61] + step1[62], range);                                                      \
+    step0[62] = clamp_value(step1[61] + step1[62], range);                                                       \
+    step0[63] = clamp_value(step1[60] + step1[63], range);                                                       \
+    step1[0] = half_btf(cospi[32], step0[0], cospi[32], step0[1], CosBits);                                      \
+    step1[1] = half_btf(cospi[32], step0[0], -cospi[32], step0[1], CosBits);                                     \
+    step1[2] = half_btf(cospi[48], step0[2], -cospi[16], step0[3], CosBits);                                     \
+    step1[3] = half_btf(cospi[16], step0[2], cospi[48], step0[3], CosBits);                                      \
+    step1[4] = clamp_value(step0[4] + step0[5], range);                                                          \
+    step1[5] = clamp_value(step0[4] - step0[5], range);                                                          \
+    step1[6] = clamp_value(-step0[6] + step0[7], range);                                                         \
+    step1[7] = clamp_value(step0[6] + step0[7], range);                                                          \
+    step1[8] = step0[8];                                                                                         \
+    step1[9] = half_btf(-cospi[16], step0[9], cospi[48], step0[14], CosBits);                                    \
+    step1[10] = half_btf(-cospi[48], step0[10], -cospi[16], step0[13], CosBits);                                 \
+    step1[11] = step0[11];                                                                                       \
+    step1[12] = step0[12];                                                                                       \
+    step1[13] = half_btf(-cospi[16], step0[10], cospi[48], step0[13], CosBits);                                  \
+    step1[14] = half_btf(cospi[48], step0[9], cospi[16], step0[14], CosBits);                                    \
+    step1[15] = step0[15];                                                                                       \
+    step1[16] = clamp_value(step0[16] + step0[19], range);                                                       \
+    step1[17] = clamp_value(step0[17] + step0[18], range);                                                       \
+    step1[18] = clamp_value(step0[17] - step0[18], range);                                                       \
+    step1[19] = clamp_value(step0[16] - step0[19], range);                                                       \
+    step1[20] = clamp_value(-step0[20] + step0[23], range);                                                      \
+    step1[21] = clamp_value(-step0[21] + step0[22], range);                                                      \
+    step1[22] = clamp_value(step0[21] + step0[22], range);                                                       \
+    step1[23] = clamp_value(step0[20] + step0[23], range);                                                       \
+    step1[24] = clamp_value(step0[24] + step0[27], range);                                                       \
+    step1[25] = clamp_value(step0[25] + step0[26], range);                                                       \
+    step1[26] = clamp_value(step0[25] - step0[26], range);                                                       \
+    step1[27] = clamp_value(step0[24] - step0[27], range);                                                       \
+    step1[28] = clamp_value(-step0[28] + step0[31], range);                                                      \
+    step1[29] = clamp_value(-step0[29] + step0[30], range);                                                      \
+    step1[30] = clamp_value(step0[29] + step0[30], range);                                                       \
+    step1[31] = clamp_value(step0[28] + step0[31], range);                                                       \
+    step1[32] = step0[32];                                                                                       \
+    step1[33] = step0[33];                                                                                       \
+    step1[34] = half_btf(-cospi[8], step0[34], cospi[56], step0[61], CosBits);                                   \
+    step1[35] = half_btf(-cospi[8], step0[35], cospi[56], step0[60], CosBits);                                   \
+    step1[36] = half_btf(-cospi[56], step0[36], -cospi[8], step0[59], CosBits);                                  \
+    step1[37] = half_btf(-cospi[56], step0[37], -cospi[8], step0[58], CosBits);                                  \
+    step1[38] = step0[38];                                                                                       \
+    step1[39] = step0[39];                                                                                       \
+    step1[40] = step0[40];                                                                                       \
+    step1[41] = step0[41];                                                                                       \
+    step1[42] = half_btf(-cospi[40], step0[42], cospi[24], step0[53], CosBits);                                  \
+    step1[43] = half_btf(-cospi[40], step0[43], cospi[24], step0[52], CosBits);                                  \
+    step1[44] = half_btf(-cospi[24], step0[44], -cospi[40], step0[51], CosBits);                                 \
+    step1[45] = half_btf(-cospi[24], step0[45], -cospi[40], step0[50], CosBits);                                 \
+    step1[46] = step0[46];                                                                                       \
+    step1[47] = step0[47];                                                                                       \
+    step1[48] = step0[48];                                                                                       \
+    step1[49] = step0[49];                                                                                       \
+    step1[50] = half_btf(-cospi[40], step0[45], cospi[24], step0[50], CosBits);                                  \
+    step1[51] = half_btf(-cospi[40], step0[44], cospi[24], step0[51], CosBits);                                  \
+    step1[52] = half_btf(cospi[24], step0[43], cospi[40], step0[52], CosBits);                                   \
+    step1[53] = half_btf(cospi[24], step0[42], cospi[40], step0[53], CosBits);                                   \
+    step1[54] = step0[54];                                                                                       \
+    step1[55] = step0[55];                                                                                       \
+    step1[56] = step0[56];                                                                                       \
+    step1[57] = step0[57];                                                                                       \
+    step1[58] = half_btf(-cospi[8], step0[37], cospi[56], step0[58], CosBits);                                   \
+    step1[59] = half_btf(-cospi[8], step0[36], cospi[56], step0[59], CosBits);                                   \
+    step1[60] = half_btf(cospi[56], step0[35], cospi[8], step0[60], CosBits);                                    \
+    step1[61] = half_btf(cospi[56], step0[34], cospi[8], step0[61], CosBits);                                    \
+    step1[62] = step0[62];                                                                                       \
+    step1[63] = step0[63];                                                                                       \
+    step0[0] = clamp_value(step1[0] + step1[3], range);                                                          \
+    step0[1] = clamp_value(step1[1] + step1[2], range);                                                          \
+    step0[2] = clamp_value(step1[1] - step1[2], range);                                                          \
+    step0[3] = clamp_value(step1[0] - step1[3], range);                                                          \
+    step0[4] = step1[4];                                                                                         \
+    step0[5] = half_btf(-cospi[32], step1[5], cospi[32], step1[6], CosBits);                                     \
+    step0[6] = half_btf(cospi[32], step1[5], cospi[32], step1[6], CosBits);                                      \
+    step0[7] = step1[7];                                                                                         \
+    step0[8] = clamp_value(step1[8] + step1[11], range);                                                         \
+    step0[9] = clamp_value(step1[9] + step1[10], range);                                                         \
+    step0[10] = clamp_value(step1[9] - step1[10], range);                                                        \
+    step0[11] = clamp_value(step1[8] - step1[11], range);                                                        \
+    step0[12] = clamp_value(-step1[12] + step1[15], range);                                                      \
+    step0[13] = clamp_value(-step1[13] + step1[14], range);                                                      \
+    step0[14] = clamp_value(step1[13] + step1[14], range);                                                       \
+    step0[15] = clamp_value(step1[12] + step1[15], range);                                                       \
+    step0[16] = step1[16];                                                                                       \
+    step0[17] = step1[17];                                                                                       \
+    step0[18] = half_btf(-cospi[16], step1[18], cospi[48], step1[29], CosBits);                                  \
+    step0[19] = half_btf(-cospi[16], step1[19], cospi[48], step1[28], CosBits);                                  \
+    step0[20] = half_btf(-cospi[48], step1[20], -cospi[16], step1[27], CosBits);                                 \
+    step0[21] = half_btf(-cospi[48], step1[21], -cospi[16], step1[26], CosBits);                                 \
+    step0[22] = step1[22];                                                                                       \
+    step0[23] = step1[23];                                                                                       \
+    step0[24] = step1[24];                                                                                       \
+    step0[25] = step1[25];                                                                                       \
+    step0[26] = half_btf(-cospi[16], step1[21], cospi[48], step1[26], CosBits);                                  \
+    step0[27] = half_btf(-cospi[16], step1[20], cospi[48], step1[27], CosBits);                                  \
+    step0[28] = half_btf(cospi[48], step1[19], cospi[16], step1[28], CosBits);                                   \
+    step0[29] = half_btf(cospi[48], step1[18], cospi[16], step1[29], CosBits);                                   \
+    step0[30] = step1[30];                                                                                       \
+    step0[31] = step1[31];                                                                                       \
+    step0[32] = clamp_value(step1[32] + step1[39], range);                                                       \
+    step0[33] = clamp_value(step1[33] + step1[38], range);                                                       \
+    step0[34] = clamp_value(step1[34] + step1[37], range);                                                       \
+    step0[35] = clamp_value(step1[35] + step1[36], range);                                                       \
+    step0[36] = clamp_value(step1[35] - step1[36], range);                                                       \
+    step0[37] = clamp_value(step1[34] - step1[37], range);                                                       \
+    step0[38] = clamp_value(step1[33] - step1[38], range);                                                       \
+    step0[39] = clamp_value(step1[32] - step1[39], range);                                                       \
+    step0[40] = clamp_value(-step1[40] + step1[47], range);                                                      \
+    step0[41] = clamp_value(-step1[41] + step1[46], range);                                                      \
+    step0[42] = clamp_value(-step1[42] + step1[45], range);                                                      \
+    step0[43] = clamp_value(-step1[43] + step1[44], range);                                                      \
+    step0[44] = clamp_value(step1[43] + step1[44], range);                                                       \
+    step0[45] = clamp_value(step1[42] + step1[45], range);                                                       \
+    step0[46] = clamp_value(step1[41] + step1[46], range);                                                       \
+    step0[47] = clamp_value(step1[40] + step1[47], range);                                                       \
+    step0[48] = clamp_value(step1[48] + step1[55], range);                                                       \
+    step0[49] = clamp_value(step1[49] + step1[54], range);                                                       \
+    step0[50] = clamp_value(step1[50] + step1[53], range);                                                       \
+    step0[51] = clamp_value(step1[51] + step1[52], range);                                                       \
+    step0[52] = clamp_value(step1[51] - step1[52], range);                                                       \
+    step0[53] = clamp_value(step1[50] - step1[53], range);                                                       \
+    step0[54] = clamp_value(step1[49] - step1[54], range);                                                       \
+    step0[55] = clamp_value(step1[48] - step1[55], range);                                                       \
+    step0[56] = clamp_value(-step1[56] + step1[63], range);                                                      \
+    step0[57] = clamp_value(-step1[57] + step1[62], range);                                                      \
+    step0[58] = clamp_value(-step1[58] + step1[61], range);                                                      \
+    step0[59] = clamp_value(-step1[59] + step1[60], range);                                                      \
+    step0[60] = clamp_value(step1[59] + step1[60], range);                                                       \
+    step0[61] = clamp_value(step1[58] + step1[61], range);                                                       \
+    step0[62] = clamp_value(step1[57] + step1[62], range);                                                       \
+    step0[63] = clamp_value(step1[56] + step1[63], range);                                                       \
+    step1[0] = clamp_value(step0[0] + step0[7], range);                                                          \
+    step1[1] = clamp_value(step0[1] + step0[6], range);                                                          \
+    step1[2] = clamp_value(step0[2] + step0[5], range);                                                          \
+    step1[3] = clamp_value(step0[3] + step0[4], range);                                                          \
+    step1[4] = clamp_value(step0[3] - step0[4], range);                                                          \
+    step1[5] = clamp_value(step0[2] - step0[5], range);                                                          \
+    step1[6] = clamp_value(step0[1] - step0[6], range);                                                          \
+    step1[7] = clamp_value(step0[0] - step0[7], range);                                                          \
+    step1[8] = step0[8];                                                                                         \
+    step1[9] = step0[9];                                                                                         \
+    step1[10] = half_btf(-cospi[32], step0[10], cospi[32], step0[13], CosBits);                                  \
+    step1[11] = half_btf(-cospi[32], step0[11], cospi[32], step0[12], CosBits);                                  \
+    step1[12] = half_btf(cospi[32], step0[11], cospi[32], step0[12], CosBits);                                   \
+    step1[13] = half_btf(cospi[32], step0[10], cospi[32], step0[13], CosBits);                                   \
+    step1[14] = step0[14];                                                                                       \
+    step1[15] = step0[15];                                                                                       \
+    step1[16] = clamp_value(step0[16] + step0[23], range);                                                       \
+    step1[17] = clamp_value(step0[17] + step0[22], range);                                                       \
+    step1[18] = clamp_value(step0[18] + step0[21], range);                                                       \
+    step1[19] = clamp_value(step0[19] + step0[20], range);                                                       \
+    step1[20] = clamp_value(step0[19] - step0[20], range);                                                       \
+    step1[21] = clamp_value(step0[18] - step0[21], range);                                                       \
+    step1[22] = clamp_value(step0[17] - step0[22], range);                                                       \
+    step1[23] = clamp_value(step0[16] - step0[23], range);                                                       \
+    step1[24] = clamp_value(-step0[24] + step0[31], range);                                                      \
+    step1[25] = clamp_value(-step0[25] + step0[30], range);                                                      \
+    step1[26] = clamp_value(-step0[26] + step0[29], range);                                                      \
+    step1[27] = clamp_value(-step0[27] + step0[28], range);                                                      \
+    step1[28] = clamp_value(step0[27] + step0[28], range);                                                       \
+    step1[29] = clamp_value(step0[26] + step0[29], range);                                                       \
+    step1[30] = clamp_value(step0[25] + step0[30], range);                                                       \
+    step1[31] = clamp_value(step0[24] + step0[31], range);                                                       \
+    step1[32] = step0[32];                                                                                       \
+    step1[33] = step0[33];                                                                                       \
+    step1[34] = step0[34];                                                                                       \
+    step1[35] = step0[35];                                                                                       \
+    step1[36] = half_btf(-cospi[16], step0[36], cospi[48], step0[59], CosBits);                                  \
+    step1[37] = half_btf(-cospi[16], step0[37], cospi[48], step0[58], CosBits);                                  \
+    step1[38] = half_btf(-cospi[16], step0[38], cospi[48], step0[57], CosBits);                                  \
+    step1[39] = half_btf(-cospi[16], step0[39], cospi[48], step0[56], CosBits);                                  \
+    step1[40] = half_btf(-cospi[48], step0[40], -cospi[16], step0[55], CosBits);                                 \
+    step1[41] = half_btf(-cospi[48], step0[41], -cospi[16], step0[54], CosBits);                                 \
+    step1[42] = half_btf(-cospi[48], step0[42], -cospi[16], step0[53], CosBits);                                 \
+    step1[43] = half_btf(-cospi[48], step0[43], -cospi[16], step0[52], CosBits);                                 \
+    step1[44] = step0[44];                                                                                       \
+    step1[45] = step0[45];                                                                                       \
+    step1[46] = step0[46];                                                                                       \
+    step1[47] = step0[47];                                                                                       \
+    step1[48] = step0[48];                                                                                       \
+    step1[49] = step0[49];                                                                                       \
+    step1[50] = step0[50];                                                                                       \
+    step1[51] = step0[51];                                                                                       \
+    step1[52] = half_btf(-cospi[16], step0[43], cospi[48], step0[52], CosBits);                                  \
+    step1[53] = half_btf(-cospi[16], step0[42], cospi[48], step0[53], CosBits);                                  \
+    step1[54] = half_btf(-cospi[16], step0[41], cospi[48], step0[54], CosBits);                                  \
+    step1[55] = half_btf(-cospi[16], step0[40], cospi[48], step0[55], CosBits);                                  \
+    step1[56] = half_btf(cospi[48], step0[39], cospi[16], step0[56], CosBits);                                   \
+    step1[57] = half_btf(cospi[48], step0[38], cospi[16], step0[57], CosBits);                                   \
+    step1[58] = half_btf(cospi[48], step0[37], cospi[16], step0[58], CosBits);                                   \
+    step1[59] = half_btf(cospi[48], step0[36], cospi[16], step0[59], CosBits);                                   \
+    step1[60] = step0[60];                                                                                       \
+    step1[61] = step0[61];                                                                                       \
+    step1[62] = step0[62];                                                                                       \
+    step1[63] = step0[63];                                                                                       \
+    step0[0] = clamp_value(step1[0] + step1[15], range);                                                         \
+    step0[1] = clamp_value(step1[1] + step1[14], range);                                                         \
+    step0[2] = clamp_value(step1[2] + step1[13], range);                                                         \
+    step0[3] = clamp_value(step1[3] + step1[12], range);                                                         \
+    step0[4] = clamp_value(step1[4] + step1[11], range);                                                         \
+    step0[5] = clamp_value(step1[5] + step1[10], range);                                                         \
+    step0[6] = clamp_value(step1[6] + step1[9], range);                                                          \
+    step0[7] = clamp_value(step1[7] + step1[8], range);                                                          \
+    step0[8] = clamp_value(step1[7] - step1[8], range);                                                          \
+    step0[9] = clamp_value(step1[6] - step1[9], range);                                                          \
+    step0[10] = clamp_value(step1[5] - step1[10], range);                                                        \
+    step0[11] = clamp_value(step1[4] - step1[11], range);                                                        \
+    step0[12] = clamp_value(step1[3] - step1[12], range);                                                        \
+    step0[13] = clamp_value(step1[2] - step1[13], range);                                                        \
+    step0[14] = clamp_value(step1[1] - step1[14], range);                                                        \
+    step0[15] = clamp_value(step1[0] - step1[15], range);                                                        \
+    step0[16] = step1[16];                                                                                       \
+    step0[17] = step1[17];                                                                                       \
+    step0[18] = step1[18];                                                                                       \
+    step0[19] = step1[19];                                                                                       \
+    step0[20] = half_btf(-cospi[32], step1[20], cospi[32], step1[27], CosBits);                                  \
+    step0[21] = half_btf(-cospi[32], step1[21], cospi[32], step1[26], CosBits);                                  \
+    step0[22] = half_btf(-cospi[32], step1[22], cospi[32], step1[25], CosBits);                                  \
+    step0[23] = half_btf(-cospi[32], step1[23], cospi[32], step1[24], CosBits);                                  \
+    step0[24] = half_btf(cospi[32], step1[23], cospi[32], step1[24], CosBits);                                   \
+    step0[25] = half_btf(cospi[32], step1[22], cospi[32], step1[25], CosBits);                                   \
+    step0[26] = half_btf(cospi[32], step1[21], cospi[32], step1[26], CosBits);                                   \
+    step0[27] = half_btf(cospi[32], step1[20], cospi[32], step1[27], CosBits);                                   \
+    step0[28] = step1[28];                                                                                       \
+    step0[29] = step1[29];                                                                                       \
+    step0[30] = step1[30];                                                                                       \
+    step0[31] = step1[31];                                                                                       \
+    step0[32] = clamp_value(step1[32] + step1[47], range);                                                       \
+    step0[33] = clamp_value(step1[33] + step1[46], range);                                                       \
+    step0[34] = clamp_value(step1[34] + step1[45], range);                                                       \
+    step0[35] = clamp_value(step1[35] + step1[44], range);                                                       \
+    step0[36] = clamp_value(step1[36] + step1[43], range);                                                       \
+    step0[37] = clamp_value(step1[37] + step1[42], range);                                                       \
+    step0[38] = clamp_value(step1[38] + step1[41], range);                                                       \
+    step0[39] = clamp_value(step1[39] + step1[40], range);                                                       \
+    step0[40] = clamp_value(step1[39] - step1[40], range);                                                       \
+    step0[41] = clamp_value(step1[38] - step1[41], range);                                                       \
+    step0[42] = clamp_value(step1[37] - step1[42], range);                                                       \
+    step0[43] = clamp_value(step1[36] - step1[43], range);                                                       \
+    step0[44] = clamp_value(step1[35] - step1[44], range);                                                       \
+    step0[45] = clamp_value(step1[34] - step1[45], range);                                                       \
+    step0[46] = clamp_value(step1[33] - step1[46], range);                                                       \
+    step0[47] = clamp_value(step1[32] - step1[47], range);                                                       \
+    step0[48] = clamp_value(-step1[48] + step1[63], range);                                                      \
+    step0[49] = clamp_value(-step1[49] + step1[62], range);                                                      \
+    step0[50] = clamp_value(-step1[50] + step1[61], range);                                                      \
+    step0[51] = clamp_value(-step1[51] + step1[60], range);                                                      \
+    step0[52] = clamp_value(-step1[52] + step1[59], range);                                                      \
+    step0[53] = clamp_value(-step1[53] + step1[58], range);                                                      \
+    step0[54] = clamp_value(-step1[54] + step1[57], range);                                                      \
+    step0[55] = clamp_value(-step1[55] + step1[56], range);                                                      \
+    step0[56] = clamp_value(step1[55] + step1[56], range);                                                       \
+    step0[57] = clamp_value(step1[54] + step1[57], range);                                                       \
+    step0[58] = clamp_value(step1[53] + step1[58], range);                                                       \
+    step0[59] = clamp_value(step1[52] + step1[59], range);                                                       \
+    step0[60] = clamp_value(step1[51] + step1[60], range);                                                       \
+    step0[61] = clamp_value(step1[50] + step1[61], range);                                                       \
+    step0[62] = clamp_value(step1[49] + step1[62], range);                                                       \
+    step0[63] = clamp_value(step1[48] + step1[63], range);                                                       \
+    step1[0] = clamp_value(step0[0] + step0[31], range);                                                         \
+    step1[1] = clamp_value(step0[1] + step0[30], range);                                                         \
+    step1[2] = clamp_value(step0[2] + step0[29], range);                                                         \
+    step1[3] = clamp_value(step0[3] + step0[28], range);                                                         \
+    step1[4] = clamp_value(step0[4] + step0[27], range);                                                         \
+    step1[5] = clamp_value(step0[5] + step0[26], range);                                                         \
+    step1[6] = clamp_value(step0[6] + step0[25], range);                                                         \
+    step1[7] = clamp_value(step0[7] + step0[24], range);                                                         \
+    step1[8] = clamp_value(step0[8] + step0[23], range);                                                         \
+    step1[9] = clamp_value(step0[9] + step0[22], range);                                                         \
+    step1[10] = clamp_value(step0[10] + step0[21], range);                                                       \
+    step1[11] = clamp_value(step0[11] + step0[20], range);                                                       \
+    step1[12] = clamp_value(step0[12] + step0[19], range);                                                       \
+    step1[13] = clamp_value(step0[13] + step0[18], range);                                                       \
+    step1[14] = clamp_value(step0[14] + step0[17], range);                                                       \
+    step1[15] = clamp_value(step0[15] + step0[16], range);                                                       \
+    step1[16] = clamp_value(step0[15] - step0[16], range);                                                       \
+    step1[17] = clamp_value(step0[14] - step0[17], range);                                                       \
+    step1[18] = clamp_value(step0[13] - step0[18], range);                                                       \
+    step1[19] = clamp_value(step0[12] - step0[19], range);                                                       \
+    step1[20] = clamp_value(step0[11] - step0[20], range);                                                       \
+    step1[21] = clamp_value(step0[10] - step0[21], range);                                                       \
+    step1[22] = clamp_value(step0[9] - step0[22], range);                                                        \
+    step1[23] = clamp_value(step0[8] - step0[23], range);                                                        \
+    step1[24] = clamp_value(step0[7] - step0[24], range);                                                        \
+    step1[25] = clamp_value(step0[6] - step0[25], range);                                                        \
+    step1[26] = clamp_value(step0[5] - step0[26], range);                                                        \
+    step1[27] = clamp_value(step0[4] - step0[27], range);                                                        \
+    step1[28] = clamp_value(step0[3] - step0[28], range);                                                        \
+    step1[29] = clamp_value(step0[2] - step0[29], range);                                                        \
+    step1[30] = clamp_value(step0[1] - step0[30], range);                                                        \
+    step1[31] = clamp_value(step0[0] - step0[31], range);                                                        \
+    step1[32] = step0[32];                                                                                       \
+    step1[33] = step0[33];                                                                                       \
+    step1[34] = step0[34];                                                                                       \
+    step1[35] = step0[35];                                                                                       \
+    step1[36] = step0[36];                                                                                       \
+    step1[37] = step0[37];                                                                                       \
+    step1[38] = step0[38];                                                                                       \
+    step1[39] = step0[39];                                                                                       \
+    step1[40] = half_btf(-cospi[32], step0[40], cospi[32], step0[55], CosBits);                                  \
+    step1[41] = half_btf(-cospi[32], step0[41], cospi[32], step0[54], CosBits);                                  \
+    step1[42] = half_btf(-cospi[32], step0[42], cospi[32], step0[53], CosBits);                                  \
+    step1[43] = half_btf(-cospi[32], step0[43], cospi[32], step0[52], CosBits);                                  \
+    step1[44] = half_btf(-cospi[32], step0[44], cospi[32], step0[51], CosBits);                                  \
+    step1[45] = half_btf(-cospi[32], step0[45], cospi[32], step0[50], CosBits);                                  \
+    step1[46] = half_btf(-cospi[32], step0[46], cospi[32], step0[49], CosBits);                                  \
+    step1[47] = half_btf(-cospi[32], step0[47], cospi[32], step0[48], CosBits);                                  \
+    step1[48] = half_btf(cospi[32], step0[47], cospi[32], step0[48], CosBits);                                   \
+    step1[49] = half_btf(cospi[32], step0[46], cospi[32], step0[49], CosBits);                                   \
+    step1[50] = half_btf(cospi[32], step0[45], cospi[32], step0[50], CosBits);                                   \
+    step1[51] = half_btf(cospi[32], step0[44], cospi[32], step0[51], CosBits);                                   \
+    step1[52] = half_btf(cospi[32], step0[43], cospi[32], step0[52], CosBits);                                   \
+    step1[53] = half_btf(cospi[32], step0[42], cospi[32], step0[53], CosBits);                                   \
+    step1[54] = half_btf(cospi[32], step0[41], cospi[32], step0[54], CosBits);                                   \
+    step1[55] = half_btf(cospi[32], step0[40], cospi[32], step0[55], CosBits);                                   \
+    step1[56] = step0[56];                                                                                       \
+    step1[57] = step0[57];                                                                                       \
+    step1[58] = step0[58];                                                                                       \
+    step1[59] = step0[59];                                                                                       \
+    step1[60] = step0[60];                                                                                       \
+    step1[61] = step0[61];                                                                                       \
+    step1[62] = step0[62];                                                                                       \
+    step1[63] = step0[63];                                                                                       \
+    output[out_offset + (out_stride)*0] = round_shift(clamp_value(step1[0] + step1[63], range), output_shift);   \
+    output[out_offset + (out_stride)*1] = round_shift(clamp_value(step1[1] + step1[62], range), output_shift);   \
+    output[out_offset + (out_stride)*2] = round_shift(clamp_value(step1[2] + step1[61], range), output_shift);   \
+    output[out_offset + (out_stride)*3] = round_shift(clamp_value(step1[3] + step1[60], range), output_shift);   \
+    output[out_offset + (out_stride)*4] = round_shift(clamp_value(step1[4] + step1[59], range), output_shift);   \
+    output[out_offset + (out_stride)*5] = round_shift(clamp_value(step1[5] + step1[58], range), output_shift);   \
+    output[out_offset + (out_stride)*6] = round_shift(clamp_value(step1[6] + step1[57], range), output_shift);   \
+    output[out_offset + (out_stride)*7] = round_shift(clamp_value(step1[7] + step1[56], range), output_shift);   \
+    output[out_offset + (out_stride)*8] = round_shift(clamp_value(step1[8] + step1[55], range), output_shift);   \
+    output[out_offset + (out_stride)*9] = round_shift(clamp_value(step1[9] + step1[54], range), output_shift);   \
+    output[out_offset + (out_stride)*10] = round_shift(clamp_value(step1[10] + step1[53], range), output_shift); \
+    output[out_offset + (out_stride)*11] = round_shift(clamp_value(step1[11] + step1[52], range), output_shift); \
+    output[out_offset + (out_stride)*12] = round_shift(clamp_value(step1[12] + step1[51], range), output_shift); \
+    output[out_offset + (out_stride)*13] = round_shift(clamp_value(step1[13] + step1[50], range), output_shift); \
+    output[out_offset + (out_stride)*14] = round_shift(clamp_value(step1[14] + step1[49], range), output_shift); \
+    output[out_offset + (out_stride)*15] = round_shift(clamp_value(step1[15] + step1[48], range), output_shift); \
+    output[out_offset + (out_stride)*16] = round_shift(clamp_value(step1[16] + step1[47], range), output_shift); \
+    output[out_offset + (out_stride)*17] = round_shift(clamp_value(step1[17] + step1[46], range), output_shift); \
+    output[out_offset + (out_stride)*18] = round_shift(clamp_value(step1[18] + step1[45], range), output_shift); \
+    output[out_offset + (out_stride)*19] = round_shift(clamp_value(step1[19] + step1[44], range), output_shift); \
+    output[out_offset + (out_stride)*20] = round_shift(clamp_value(step1[20] + step1[43], range), output_shift); \
+    output[out_offset + (out_stride)*21] = round_shift(clamp_value(step1[21] + step1[42], range), output_shift); \
+    output[out_offset + (out_stride)*22] = round_shift(clamp_value(step1[22] + step1[41], range), output_shift); \
+    output[out_offset + (out_stride)*23] = round_shift(clamp_value(step1[23] + step1[40], range), output_shift); \
+    output[out_offset + (out_stride)*24] = round_shift(clamp_value(step1[24] + step1[39], range), output_shift); \
+    output[out_offset + (out_stride)*25] = round_shift(clamp_value(step1[25] + step1[38], range), output_shift); \
+    output[out_offset + (out_stride)*26] = round_shift(clamp_value(step1[26] + step1[37], range), output_shift); \
+    output[out_offset + (out_stride)*27] = round_shift(clamp_value(step1[27] + step1[36], range), output_shift); \
+    output[out_offset + (out_stride)*28] = round_shift(clamp_value(step1[28] + step1[35], range), output_shift); \
+    output[out_offset + (out_stride)*29] = round_shift(clamp_value(step1[29] + step1[34], range), output_shift); \
+    output[out_offset + (out_stride)*30] = round_shift(clamp_value(step1[30] + step1[33], range), output_shift); \
+    output[out_offset + (out_stride)*31] = round_shift(clamp_value(step1[31] + step1[32], range), output_shift); \
+    output[out_offset + (out_stride)*32] = round_shift(clamp_value(step1[31] - step1[32], range), output_shift); \
+    output[out_offset + (out_stride)*33] = round_shift(clamp_value(step1[30] - step1[33], range), output_shift); \
+    output[out_offset + (out_stride)*34] = round_shift(clamp_value(step1[29] - step1[34], range), output_shift); \
+    output[out_offset + (out_stride)*35] = round_shift(clamp_value(step1[28] - step1[35], range), output_shift); \
+    output[out_offset + (out_stride)*36] = round_shift(clamp_value(step1[27] - step1[36], range), output_shift); \
+    output[out_offset + (out_stride)*37] = round_shift(clamp_value(step1[26] - step1[37], range), output_shift); \
+    output[out_offset + (out_stride)*38] = round_shift(clamp_value(step1[25] - step1[38], range), output_shift); \
+    output[out_offset + (out_stride)*39] = round_shift(clamp_value(step1[24] - step1[39], range), output_shift); \
+    output[out_offset + (out_stride)*40] = round_shift(clamp_value(step1[23] - step1[40], range), output_shift); \
+    output[out_offset + (out_stride)*41] = round_shift(clamp_value(step1[22] - step1[41], range), output_shift); \
+    output[out_offset + (out_stride)*42] = round_shift(clamp_value(step1[21] - step1[42], range), output_shift); \
+    output[out_offset + (out_stride)*43] = round_shift(clamp_value(step1[20] - step1[43], range), output_shift); \
+    output[out_offset + (out_stride)*44] = round_shift(clamp_value(step1[19] - step1[44], range), output_shift); \
+    output[out_offset + (out_stride)*45] = round_shift(clamp_value(step1[18] - step1[45], range), output_shift); \
+    output[out_offset + (out_stride)*46] = round_shift(clamp_value(step1[17] - step1[46], range), output_shift); \
+    output[out_offset + (out_stride)*47] = round_shift(clamp_value(step1[16] - step1[47], range), output_shift); \
+    output[out_offset + (out_stride)*48] = round_shift(clamp_value(step1[15] - step1[48], range), output_shift); \
+    output[out_offset + (out_stride)*49] = round_shift(clamp_value(step1[14] - step1[49], range), output_shift); \
+    output[out_offset + (out_stride)*50] = round_shift(clamp_value(step1[13] - step1[50], range), output_shift); \
+    output[out_offset + (out_stride)*51] = round_shift(clamp_value(step1[12] - step1[51], range), output_shift); \
+    output[out_offset + (out_stride)*52] = round_shift(clamp_value(step1[11] - step1[52], range), output_shift); \
+    output[out_offset + (out_stride)*53] = round_shift(clamp_value(step1[10] - step1[53], range), output_shift); \
+    output[out_offset + (out_stride)*54] = round_shift(clamp_value(step1[9] - step1[54], range), output_shift);  \
+    output[out_offset + (out_stride)*55] = round_shift(clamp_value(step1[8] - step1[55], range), output_shift);  \
+    output[out_offset + (out_stride)*56] = round_shift(clamp_value(step1[7] - step1[56], range), output_shift);  \
+    output[out_offset + (out_stride)*57] = round_shift(clamp_value(step1[6] - step1[57], range), output_shift);  \
+    output[out_offset + (out_stride)*58] = round_shift(clamp_value(step1[5] - step1[58], range), output_shift);  \
+    output[out_offset + (out_stride)*59] = round_shift(clamp_value(step1[4] - step1[59], range), output_shift);  \
+    output[out_offset + (out_stride)*60] = round_shift(clamp_value(step1[3] - step1[60], range), output_shift);  \
+    output[out_offset + (out_stride)*61] = round_shift(clamp_value(step1[2] - step1[61], range), output_shift);  \
+    output[out_offset + (out_stride)*62] = round_shift(clamp_value(step1[1] - step1[62], range), output_shift);  \
+    output[out_offset + (out_stride)*63] = round_shift(clamp_value(step1[0] - step1[63], range), output_shift);  \
+  }
+
+#define IIDENTITY_N(N, mult, shift, input, in_offset, in_stride, output, out_offset, out_stride, range, output_shift) \
+  {                                                                                                                   \
+    for (int i = 0; i < N; ++i) {                                                                                     \
+      output[out_offset + i * (out_stride)] =                                                                         \
+          clamp_value(round_shift(round_shift(mult * clamp_value(input[in_offset + i * (in_stride)], range), shift),  \
+                                  output_shift),                                                                      \
+                      range);                                                                                         \
+    }                                                                                                                 \
+  }

diff --git a/libav1/dx/shaders/idct_sort_blocks.hlsl b/libav1/dx/shaders/idct_sort_blocks.hlsl
new file mode 100644
index 0000000..4dd02fe
--- /dev/null
+++ b/libav1/dx/shaders/idct_sort_blocks.hlsl

@@ -0,0 +1,41 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define Types 20
+#define TypeMask 0xff
+#define TypeSize 8
+
+cbuffer cb_sort_data : register(b0) {
+  uint cb_count;
+  uint cb_src_offset;
+  int2 reserved;
+  int4 cb_offsets[Types];
+};
+
+ByteAddressBuffer src : register(t0);
+RWByteAddressBuffer dst : register(u0);
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_count) return;
+  int4 block = src.Load4((thread.x + cb_src_offset) * 16);
+  uint type = block.w & TypeMask;
+  uint offset = block.w >> TypeSize;
+  if (type >= Types) return;
+
+  int index = cb_offsets[type].x + offset;
+  dst.Store4(index * 16, block);
+}

diff --git a/libav1/dx/shaders/inter_2x2chroma.hlsl b/libav1/dx/shaders/inter_2x2chroma.hlsl
new file mode 100644
index 0000000..5df868a
--- /dev/null
+++ b/libav1/dx/shaders/inter_2x2chroma.hlsl

@@ -0,0 +1,130 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 2
+#define SubblockH 2
+#define OutputShift 11
+#define OutputRoundAdd (1 << (OutputShift - 1))
+#define OffsetBits 19
+#define SumAdd (1 << OffsetBits)
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define DualWriteBlock (1 << 25)
+#define SUM1 1 << OffsetBits
+
+groupshared uint2 mem[64];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + thread.x) * 16);
+
+  int x = SubblockW * (block.x & 0xffff);
+  int y = SubblockH * (block.x >> 16);
+
+  const int2 dims = cb_dims[1].xy;
+  const int plane = block.y & 3;
+  int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+
+  int4 tmp = {SUM1, SUM1, SUM1, SUM1};
+
+  int2 l;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.xy += l * cb_kernels[filter_v][0].x;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][0].x;
+  tmp.xy += l * cb_kernels[filter_v][0].y;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][0].y;
+  tmp.xy += l * cb_kernels[filter_v][0].z;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][0].z;
+  tmp.xy += l * cb_kernels[filter_v][0].w;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][0].w;
+  tmp.xy += l * cb_kernels[filter_v][1].x;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][1].x;
+  tmp.xy += l * cb_kernels[filter_v][1].y;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][1].y;
+  tmp.xy += l * cb_kernels[filter_v][1].z;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][1].z;
+  tmp.xy += l * cb_kernels[filter_v][1].w;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][1].w;
+  tmp.x = clamp((int)(((tmp.x + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+  tmp.y = clamp((int)(((tmp.y + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+  tmp.z = clamp((int)(((tmp.z + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+  tmp.w = clamp((int)(((tmp.w + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+
+  /// add residuals here
+
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + (x << 1) + y * res_stride;
+  if (noskip) {
+    int r0 = residuals.Load(res_offset);
+    int r1 = residuals.Load(res_offset + res_stride);
+    tmp.x += (r0 << 16) >> 16;
+    tmp.y += r0 >> 16;
+    tmp.z += (r1 << 16) >> 16;
+    tmp.w += r1 >> 16;
+    tmp = clamp(tmp, 0, 255);
+  }
+
+  uint2 output;
+  output.x = (tmp.x | (tmp.y << 8)) << ((x & 2) * 8);
+  output.y = (tmp.z | (tmp.w << 8)) << ((x & 2) * 8);
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + (x & (~3)) + y * output_stride;
+
+  if (block.y & DualWriteBlock) {
+    mem[thread.x & 63] = output;
+    GroupMemoryBarrier();
+    if ((thread.x & 1) == 0) {
+      int2 output1 = mem[(thread.x & 63) + 1];
+      output.x |= output1.x;
+      output.y |= output1.y;
+      dst_frame.Store(output_offset, output.x);
+      dst_frame.Store(output_offset + output_stride, output.y);
+    }
+  } else {
+    const uint mask = 0xffff0000 >> ((x & 2) * 8);
+    output.x |= dst_frame.Load(output_offset) & mask;
+    output.y |= dst_frame.Load(output_offset + output_stride) & mask;
+    dst_frame.Store(output_offset, output.x);
+    dst_frame.Store(output_offset + output_stride, output.y);
+  }
+}

diff --git a/libav1/dx/shaders/inter_2x2chroma_hbd.hlsl b/libav1/dx/shaders/inter_2x2chroma_hbd.hlsl
new file mode 100644
index 0000000..03073d1
--- /dev/null
+++ b/libav1/dx/shaders/inter_2x2chroma_hbd.hlsl

@@ -0,0 +1,108 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 2
+#define SubblockH 2
+#define OutputShift 11
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define PixelMax 1023
+
+groupshared uint2 mem[64];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + thread.x) * 16);
+
+  int x = SubblockW * (block.x & 0xffff);
+  int y = SubblockH * (block.x >> 16);
+
+  const int2 dims = cb_dims[1].xy;
+  const int plane = block.y & 3;
+  int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+
+  int4 tmp = {0, 0, 0, 0};
+
+  int2 l;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.xy += l * cb_kernels[filter_v][0].x;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][0].x;
+  tmp.xy += l * cb_kernels[filter_v][0].y;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][0].y;
+  tmp.xy += l * cb_kernels[filter_v][0].z;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][0].z;
+  tmp.xy += l * cb_kernels[filter_v][0].w;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][0].w;
+  tmp.xy += l * cb_kernels[filter_v][1].x;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][1].x;
+  tmp.xy += l * cb_kernels[filter_v][1].y;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][1].y;
+  tmp.xy += l * cb_kernels[filter_v][1].z;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][1].z;
+  tmp.xy += l * cb_kernels[filter_v][1].w;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  tmp.zw += l * cb_kernels[filter_v][1].w;
+  tmp.x = clamp((int)(((tmp.x + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+  tmp.y = clamp((int)(((tmp.y + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+  tmp.z = clamp((int)(((tmp.z + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+  tmp.w = clamp((int)(((tmp.w + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+
+  /// add residuals here
+  x <<= 1;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + x + y * res_stride;
+  if (noskip) {
+    int r0 = residuals.Load(res_offset);
+    int r1 = residuals.Load(res_offset + res_stride);
+    tmp.x += (r0 << 16) >> 16;
+    tmp.y += r0 >> 16;
+    tmp.z += (r1 << 16) >> 16;
+    tmp.w += r1 >> 16;
+    tmp = clamp(tmp, 0, PixelMax);
+  }
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  dst_frame.Store(output_offset, tmp.x | (tmp.y << 16));
+  dst_frame.Store(output_offset + output_stride, tmp.z | (tmp.w << 16));
+}

diff --git a/libav1/dx/shaders/inter_common.h b/libav1/dx/shaders/inter_common.h
new file mode 100644
index 0000000..9a43771
--- /dev/null
+++ b/libav1/dx/shaders/inter_common.h

@@ -0,0 +1,439 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define FILTER_MASK 3
+#define SUBPEL_BITS 4
+#define SUBPEL_MASK ((1 << SUBPEL_BITS) - 1)
+#define BLOCK_POS_MASK 0x7fff
+#define SCALE_BITS 14
+#define REF_SCALE_SHIFT 14
+#define SCALE_SUBPEL_BITS 10
+#define SCALE_SUBPEL_SHIFTS (1 << SCALE_SUBPEL_BITS)        // 1024
+#define SCALE_SUBPEL_MASK (SCALE_SUBPEL_SHIFTS - 1)         // 1023
+#define SCALE_EXTRA_BITS (SCALE_SUBPEL_BITS - SUBPEL_BITS)  // 6
+#define SCALE_EXTRA_OFF ((1 << SCALE_EXTRA_BITS) / 2)       // 32
+#define NON_SQR_FLAG_SHIFT_X 15
+#define NON_SQR_FLAG_SHIFT_Y 31
+#define RefCount 7
+#define NoSkipFlag 0x2000
+
+#define FilterLineShift 3
+#define FilterLineAdd8bit ((1 << (FilterLineShift - 1)) + (1 << 14))
+#define FilterLineAdd10bit ((1 << (FilterLineShift - 1)) + (1 << 16))
+
+#define WarpHorizBits 3
+#define WarpHorizSumAdd ((1 << 14) + (1 << (WarpHorizBits - 1)))
+#define WarpHorizSumAdd10 ((1 << 16) + (1 << (WarpHorizBits - 1)))
+#define PixelMax10 1023
+#define WarpPrecBits 16
+#define WarpReduceBits 6
+#define WarpFiltRoundBits (WarpPrecBits - WarpReduceBits)
+#define WarpFiltRoundAdd (1 << (WarpFiltRoundBits - 1))
+#define WarpFiltOffset 64
+#define WarpFilterSize 32
+#define WarpBlockSize 48
+
+ByteAddressBuffer pred_blocks : register(t0);
+ByteAddressBuffer residuals : register(t1);
+ByteAddressBuffer comp_mask : register(t2);
+ByteAddressBuffer warp_blocks : register(t3);
+ByteAddressBuffer warp_filter : register(t4);
+RWByteAddressBuffer dst_frame : register(u0);
+
+typedef struct {
+  int4 info0;
+  int4 info1;
+  int4 info2;
+} GlobalMotionWarp;
+
+cbuffer PSSLInterData : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_refplanes[3 * RefCount];
+  int4 cb_dims[2];
+  int4 cb_pixel_max;
+  int4 cb_kernels[8 * 16][2];
+  int4 cb_scale[8];
+  int4 cb_obmc_mask[1 + 1 + 2 + 4 + 8];
+  GlobalMotionWarp cb_gm_warp[RefCount];
+};
+
+cbuffer PSSLInterSRT : register(b1) {
+  uint cb_wi_count;
+  uint cb_pass_offset;
+  uint cb_width_log2;
+  uint cb_height_log2;
+};
+
+int4 filter_line(RWByteAddressBuffer ref, int offset, int4 fkernel0, int4 fkernel1) {
+  uint4 l = ref.Load4(offset & (~3));
+  uint shift = (offset & 3) * 8;
+  l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+  l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+  l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+
+  int4 sum = {0, 0, 0, 0};
+  int filter_val = 0;
+  filter_val = fkernel0.x;
+  sum.x += filter_val * (int)((l.x >> 0) & 0xff);
+  sum.y += filter_val * (int)((l.x >> 8) & 0xff);
+  sum.z += filter_val * (int)((l.x >> 16) & 0xff);
+  sum.w += filter_val * (int)((l.x >> 24) & 0xff);
+
+  filter_val = fkernel0.y;
+  sum.x += filter_val * (int)((l.x >> 8) & 0xff);
+  sum.y += filter_val * (int)((l.x >> 16) & 0xff);
+  sum.z += filter_val * (int)((l.x >> 24) & 0xff);
+  sum.w += filter_val * (int)((l.y >> 0) & 0xff);
+
+  filter_val = fkernel0.z;
+  sum.x += filter_val * (int)((l.x >> 16) & 0xff);
+  sum.y += filter_val * (int)((l.x >> 24) & 0xff);
+  sum.z += filter_val * (int)((l.y >> 0) & 0xff);
+  sum.w += filter_val * (int)((l.y >> 8) & 0xff);
+
+  filter_val = fkernel0.w;
+  sum.x += filter_val * (int)((l.x >> 24) & 0xff);
+  sum.y += filter_val * (int)((l.y >> 0) & 0xff);
+  sum.z += filter_val * (int)((l.y >> 8) & 0xff);
+  sum.w += filter_val * (int)((l.y >> 16) & 0xff);
+
+  filter_val = fkernel1.x;
+  sum.x += filter_val * (int)((l.y >> 0) & 0xff);
+  sum.y += filter_val * (int)((l.y >> 8) & 0xff);
+  sum.z += filter_val * (int)((l.y >> 16) & 0xff);
+  sum.w += filter_val * (int)((l.y >> 24) & 0xff);
+
+  filter_val = fkernel1.y;
+  sum.x += filter_val * (int)((l.y >> 8) & 0xff);
+  sum.y += filter_val * (int)((l.y >> 16) & 0xff);
+  sum.z += filter_val * (int)((l.y >> 24) & 0xff);
+  sum.w += filter_val * (int)((l.z >> 0) & 0xff);
+
+  filter_val = fkernel1.z;
+  sum.x += filter_val * (int)((l.y >> 16) & 0xff);
+  sum.y += filter_val * (int)((l.y >> 24) & 0xff);
+  sum.z += filter_val * (int)((l.z >> 0) & 0xff);
+  sum.w += filter_val * (int)((l.z >> 8) & 0xff);
+
+  filter_val = fkernel1.w;
+  sum.x += filter_val * (int)((l.y >> 24) & 0xff);
+  sum.y += filter_val * (int)((l.z >> 0) & 0xff);
+  sum.z += filter_val * (int)((l.z >> 8) & 0xff);
+  sum.w += filter_val * (int)((l.z >> 16) & 0xff);
+
+  sum.x = (sum.x + FilterLineAdd8bit) >> FilterLineShift;
+  sum.y = (sum.y + FilterLineAdd8bit) >> FilterLineShift;
+  sum.z = (sum.z + FilterLineAdd8bit) >> FilterLineShift;
+  sum.w = (sum.w + FilterLineAdd8bit) >> FilterLineShift;
+  return sum;
+}
+
+int4 filter_line_hbd(RWByteAddressBuffer ref, int offset, int4 fkernel0, int4 fkernel1) {
+  const int shift = 8 * (offset & 2);
+  offset &= ~3;
+  uint4 l0 = ref.Load4(offset);
+  uint2 l1 = ref.Load2(offset + 16);
+
+  l0.x = (l0.x >> shift) | ((l0.y << (24 - shift)) << 8);
+  l0.y = (l0.y >> shift) | ((l0.z << (24 - shift)) << 8);
+  l0.z = (l0.z >> shift) | ((l0.w << (24 - shift)) << 8);
+  l0.w = (l0.w >> shift) | ((l1.x << (24 - shift)) << 8);
+  l1.x = (l1.x >> shift) | ((l1.y << (24 - shift)) << 8);
+  l1.y = l1.y >> shift;
+  int4 sum = {0, 0, 0, 0};
+
+  sum.x += fkernel0.x * (int)((l0.x >> 0) & 0xffff);
+  sum.y += fkernel0.x * (int)((l0.x >> 16) & 0xffff);
+  sum.z += fkernel0.x * (int)((l0.y >> 0) & 0xffff);
+  sum.w += fkernel0.x * (int)((l0.y >> 16) & 0xffff);
+
+  sum.x += fkernel0.y * (int)((l0.x >> 16) & 0xffff);
+  sum.y += fkernel0.y * (int)((l0.y >> 0) & 0xffff);
+  sum.z += fkernel0.y * (int)((l0.y >> 16) & 0xffff);
+  sum.w += fkernel0.y * (int)((l0.z >> 0) & 0xffff);
+
+  sum.x += fkernel0.z * (int)((l0.y >> 0) & 0xffff);
+  sum.y += fkernel0.z * (int)((l0.y >> 16) & 0xffff);
+  sum.z += fkernel0.z * (int)((l0.z >> 0) & 0xffff);
+  sum.w += fkernel0.z * (int)((l0.z >> 16) & 0xffff);
+
+  sum.x += fkernel0.w * (int)((l0.y >> 16) & 0xffff);
+  sum.y += fkernel0.w * (int)((l0.z >> 0) & 0xffff);
+  sum.z += fkernel0.w * (int)((l0.z >> 16) & 0xffff);
+  sum.w += fkernel0.w * (int)((l0.w >> 0) & 0xffff);
+
+  sum.x += fkernel1.x * (int)((l0.z >> 0) & 0xffff);
+  sum.y += fkernel1.x * (int)((l0.z >> 16) & 0xffff);
+  sum.z += fkernel1.x * (int)((l0.w >> 0) & 0xffff);
+  sum.w += fkernel1.x * (int)((l0.w >> 16) & 0xffff);
+
+  sum.x += fkernel1.y * (int)((l0.z >> 16) & 0xffff);
+  sum.y += fkernel1.y * (int)((l0.w >> 0) & 0xffff);
+  sum.z += fkernel1.y * (int)((l0.w >> 16) & 0xffff);
+  sum.w += fkernel1.y * (int)((l1.x >> 0) & 0xffff);
+
+  sum.x += fkernel1.z * (int)((l0.w >> 0) & 0xffff);
+  sum.y += fkernel1.z * (int)((l0.w >> 16) & 0xffff);
+  sum.z += fkernel1.z * (int)((l1.x >> 0) & 0xffff);
+  sum.w += fkernel1.z * (int)((l1.x >> 16) & 0xffff);
+
+  sum.x += fkernel1.w * (int)((l0.w >> 16) & 0xffff);
+  sum.y += fkernel1.w * (int)((l1.x >> 0) & 0xffff);
+  sum.z += fkernel1.w * (int)((l1.x >> 16) & 0xffff);
+  sum.w += fkernel1.w * (int)((l1.y >> 0) & 0xffff);
+
+  sum.x = (sum.x + FilterLineAdd10bit) >> FilterLineShift;
+  sum.y = (sum.y + FilterLineAdd10bit) >> FilterLineShift;
+  sum.z = (sum.z + FilterLineAdd10bit) >> FilterLineShift;
+  sum.w = (sum.w + FilterLineAdd10bit) >> FilterLineShift;
+  return sum;
+}
+
+int2 filter_line2(RWByteAddressBuffer ref, int offset, int4 fkernel0, int4 fkernel1) {
+  uint4 l = ref.Load4(offset & (~3));
+  uint shift = (offset & 3) * 8;
+  l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+  l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+  l.z = (l.z >> shift);
+
+  int2 sum = {0, 0};
+
+  sum.x += fkernel0.x * (int)((l.x >> 0) & 0xff);
+  sum.y += fkernel0.x * (int)((l.x >> 8) & 0xff);
+
+  sum.x += fkernel0.y * (int)((l.x >> 8) & 0xff);
+  sum.y += fkernel0.y * (int)((l.x >> 16) & 0xff);
+
+  sum.x += fkernel0.z * (int)((l.x >> 16) & 0xff);
+  sum.y += fkernel0.z * (int)((l.x >> 24) & 0xff);
+
+  sum.x += fkernel0.w * (int)((l.x >> 24) & 0xff);
+  sum.y += fkernel0.w * (int)((l.y >> 0) & 0xff);
+
+  sum.x += fkernel1.x * (int)((l.y >> 0) & 0xff);
+  sum.y += fkernel1.x * (int)((l.y >> 8) & 0xff);
+
+  sum.x += fkernel1.y * (int)((l.y >> 8) & 0xff);
+  sum.y += fkernel1.y * (int)((l.y >> 16) & 0xff);
+
+  sum.x += fkernel1.z * (int)((l.y >> 16) & 0xff);
+  sum.y += fkernel1.z * (int)((l.y >> 24) & 0xff);
+
+  sum.x += fkernel1.w * (int)((l.y >> 24) & 0xff);
+  sum.y += fkernel1.w * (int)((l.z >> 0) & 0xff);
+
+  sum.x = (sum.x + FilterLineAdd8bit) >> FilterLineShift;
+  sum.y = (sum.y + FilterLineAdd8bit) >> FilterLineShift;
+  return sum;
+}
+
+int2 filter_line2_hbd(RWByteAddressBuffer ref, int offset, int4 fkernel0, int4 fkernel1) {
+  const int shift = 8 * (offset & 2);
+  offset &= ~3;
+  uint4 l0 = ref.Load4(offset);
+  uint l1 = ref.Load(offset + 16);
+
+  l0.x = (l0.x >> shift) | ((l0.y << (24 - shift)) << 8);
+  l0.y = (l0.y >> shift) | ((l0.z << (24 - shift)) << 8);
+  l0.z = (l0.z >> shift) | ((l0.w << (24 - shift)) << 8);
+  l0.w = (l0.w >> shift) | ((l1.x << (24 - shift)) << 8);
+  l1.x = l1.x >> shift;
+  int2 sum = {0, 0};
+
+  sum.x += fkernel0.x * (int)((l0.x >> 0) & 0xffff);
+  sum.y += fkernel0.x * (int)((l0.x >> 16) & 0xffff);
+  sum.x += fkernel0.y * (int)((l0.x >> 16) & 0xffff);
+  sum.y += fkernel0.y * (int)((l0.y >> 0) & 0xffff);
+  sum.x += fkernel0.z * (int)((l0.y >> 0) & 0xffff);
+  sum.y += fkernel0.z * (int)((l0.y >> 16) & 0xffff);
+  sum.x += fkernel0.w * (int)((l0.y >> 16) & 0xffff);
+  sum.y += fkernel0.w * (int)((l0.z >> 0) & 0xffff);
+  sum.x += fkernel1.x * (int)((l0.z >> 0) & 0xffff);
+  sum.y += fkernel1.x * (int)((l0.z >> 16) & 0xffff);
+  sum.x += fkernel1.y * (int)((l0.z >> 16) & 0xffff);
+  sum.y += fkernel1.y * (int)((l0.w >> 0) & 0xffff);
+  sum.x += fkernel1.z * (int)((l0.w >> 0) & 0xffff);
+  sum.y += fkernel1.z * (int)((l0.w >> 16) & 0xffff);
+  sum.x += fkernel1.w * (int)((l0.w >> 16) & 0xffff);
+  sum.y += fkernel1.w * (int)((l1.x >> 0) & 0xffff);
+
+  sum.x = (sum.x + FilterLineAdd10bit) >> FilterLineShift;
+  sum.y = (sum.y + FilterLineAdd10bit) >> FilterLineShift;
+  return sum;
+}
+
+#if 0
+int scale_value(int val, int sf)
+{
+    const int off = (sf - (1 << REF_SCALE_SHIFT)) * (1 << (SUBPEL_BITS - 1));
+    long tval = (long)val * sf + off;
+    return (tval < 0) ? (int)(-((-tval + 128) >> 8)) : (int)((tval + 128) >> 8);
+}
+#else
+int scale_value(int val, int sf) {
+  const int off = (sf - (1 << REF_SCALE_SHIFT)) * (1 << (SUBPEL_BITS - 1));
+  double dval = (double)val * sf + off;
+  dval = (dval < 0) ? (dval - 128) : (dval + 128);
+  dval *= 0.00390625;
+  return (int)dval;
+}
+#endif
+
+int4 filter_line_warp(RWByteAddressBuffer ref, int offset, ByteAddressBuffer filter, int sx, int alpha) {
+  uint4 l = ref.Load4(offset & (~3));
+  uint shift = (offset & 3) * 8;
+  l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+  l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+  l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+  int4 sum = int4(WarpHorizSumAdd, WarpHorizSumAdd, WarpHorizSumAdd, WarpHorizSumAdd);
+
+  int filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+  uint4 f0 = filter.Load4(filter_offset);
+  uint4 f1 = filter.Load4(filter_offset + 16);
+
+  sum.x += f0.x * (int)((l.x >> 0) & 0xff);
+  sum.x += f0.y * (int)((l.x >> 8) & 0xff);
+  sum.x += f0.z * (int)((l.x >> 16) & 0xff);
+  sum.x += f0.w * (int)((l.x >> 24) & 0xff);
+  sum.x += f1.x * (int)((l.y >> 0) & 0xff);
+  sum.x += f1.y * (int)((l.y >> 8) & 0xff);
+  sum.x += f1.z * (int)((l.y >> 16) & 0xff);
+  sum.x += f1.w * (int)((l.y >> 24) & 0xff);
+
+  sx += alpha;
+  filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+  f0 = filter.Load4(filter_offset);
+  f1 = filter.Load4(filter_offset + 16);
+
+  sum.y += f0.x * (int)((l.x >> 8) & 0xff);
+  sum.y += f0.y * (int)((l.x >> 16) & 0xff);
+  sum.y += f0.z * (int)((l.x >> 24) & 0xff);
+  sum.y += f0.w * (int)((l.y >> 0) & 0xff);
+  sum.y += f1.x * (int)((l.y >> 8) & 0xff);
+  sum.y += f1.y * (int)((l.y >> 16) & 0xff);
+  sum.y += f1.z * (int)((l.y >> 24) & 0xff);
+  sum.y += f1.w * (int)((l.z >> 0) & 0xff);
+
+  sx += alpha;
+  filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+  f0 = filter.Load4(filter_offset);
+  f1 = filter.Load4(filter_offset + 16);
+
+  sum.z += f0.x * (int)((l.x >> 16) & 0xff);
+  sum.z += f0.y * (int)((l.x >> 24) & 0xff);
+  sum.z += f0.z * (int)((l.y >> 0) & 0xff);
+  sum.z += f0.w * (int)((l.y >> 8) & 0xff);
+  sum.z += f1.x * (int)((l.y >> 16) & 0xff);
+  sum.z += f1.y * (int)((l.y >> 24) & 0xff);
+  sum.z += f1.z * (int)((l.z >> 0) & 0xff);
+  sum.z += f1.w * (int)((l.z >> 8) & 0xff);
+
+  sx += alpha;
+  filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+  f0 = filter.Load4(filter_offset);
+  f1 = filter.Load4(filter_offset + 16);
+
+  sum.w += f0.x * (int)((l.x >> 24) & 0xff);
+  sum.w += f0.y * (int)((l.y >> 0) & 0xff);
+  sum.w += f0.z * (int)((l.y >> 8) & 0xff);
+  sum.w += f0.w * (int)((l.y >> 16) & 0xff);
+  sum.w += f1.x * (int)((l.y >> 24) & 0xff);
+  sum.w += f1.y * (int)((l.z >> 0) & 0xff);
+  sum.w += f1.z * (int)((l.z >> 8) & 0xff);
+  sum.w += f1.w * (int)((l.z >> 16) & 0xff);
+
+  sum.x = sum.x >> WarpHorizBits;
+  sum.y = sum.y >> WarpHorizBits;
+  sum.z = sum.z >> WarpHorizBits;
+  sum.w = sum.w >> WarpHorizBits;
+  return sum;
+}
+
+int4 filter_line_warp_hbd(RWByteAddressBuffer ref, int offset, ByteAddressBuffer filter, int sx, int alpha) {
+  uint shift = (offset & 3) * 8;
+  offset &= ~3;
+  uint4 l0 = ref.Load4(offset);
+  uint2 l4 = ref.Load2(offset + 16);
+  l0.x = (l0.x >> shift) | ((l0.y << (24 - shift)) << 8);
+  l0.y = (l0.y >> shift) | ((l0.z << (24 - shift)) << 8);
+  l0.z = (l0.z >> shift) | ((l0.w << (24 - shift)) << 8);
+  l0.w = (l0.w >> shift) | ((l4.x << (24 - shift)) << 8);
+  l4.x = (l4.x >> shift) | ((l4.y << (24 - shift)) << 8);
+  l4.y >>= shift;
+
+  int4 sum = int4(WarpHorizSumAdd10, WarpHorizSumAdd10, WarpHorizSumAdd10, WarpHorizSumAdd10);
+
+  int filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+  uint4 f0 = filter.Load4(filter_offset);
+  uint4 f1 = filter.Load4(filter_offset + 16);
+
+  sum.x += f0.x * (int)((l0.x >> 0) & PixelMax10);
+  sum.x += f0.y * (int)((l0.x >> 16) & PixelMax10);
+  sum.x += f0.z * (int)((l0.y >> 0) & PixelMax10);
+  sum.x += f0.w * (int)((l0.y >> 16) & PixelMax10);
+  sum.x += f1.x * (int)((l0.z >> 0) & PixelMax10);
+  sum.x += f1.y * (int)((l0.z >> 16) & PixelMax10);
+  sum.x += f1.z * (int)((l0.w >> 0) & PixelMax10);
+  sum.x += f1.w * (int)((l0.w >> 16) & PixelMax10);
+
+  sx += alpha;
+  filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+  f0 = filter.Load4(filter_offset);
+  f1 = filter.Load4(filter_offset + 16);
+
+  sum.y += f0.x * (int)((l0.x >> 16) & PixelMax10);
+  sum.y += f0.y * (int)((l0.y >> 0) & PixelMax10);
+  sum.y += f0.z * (int)((l0.y >> 16) & PixelMax10);
+  sum.y += f0.w * (int)((l0.z >> 0) & PixelMax10);
+  sum.y += f1.x * (int)((l0.z >> 16) & PixelMax10);
+  sum.y += f1.y * (int)((l0.w >> 0) & PixelMax10);
+  sum.y += f1.z * (int)((l0.w >> 16) & PixelMax10);
+  sum.y += f1.w * (int)((l4.x >> 0) & PixelMax10);
+
+  sx += alpha;
+  filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+  f0 = filter.Load4(filter_offset);
+  f1 = filter.Load4(filter_offset + 16);
+
+  sum.z += f0.x * (int)((l0.y >> 0) & PixelMax10);
+  sum.z += f0.y * (int)((l0.y >> 16) & PixelMax10);
+  sum.z += f0.z * (int)((l0.z >> 0) & PixelMax10);
+  sum.z += f0.w * (int)((l0.z >> 16) & PixelMax10);
+  sum.z += f1.x * (int)((l0.w >> 0) & PixelMax10);
+  sum.z += f1.y * (int)((l0.w >> 16) & PixelMax10);
+  sum.z += f1.z * (int)((l4.x >> 0) & PixelMax10);
+  sum.z += f1.w * (int)((l4.x >> 16) & PixelMax10);
+
+  sx += alpha;
+  filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+  f0 = filter.Load4(filter_offset);
+  f1 = filter.Load4(filter_offset + 16);
+
+  sum.w += f0.x * (int)((l0.y >> 16) & PixelMax10);
+  sum.w += f0.y * (int)((l0.z >> 0) & PixelMax10);
+  sum.w += f0.z * (int)((l0.z >> 16) & PixelMax10);
+  sum.w += f0.w * (int)((l0.w >> 0) & PixelMax10);
+  sum.w += f1.x * (int)((l0.w >> 16) & PixelMax10);
+  sum.w += f1.y * (int)((l4.x >> 0) & PixelMax10);
+  sum.w += f1.z * (int)((l4.x >> 16) & PixelMax10);
+  sum.w += f1.w * (int)((l4.y >> 0) & PixelMax10);
+
+  sum.x = sum.x >> WarpHorizBits;
+  sum.y = sum.y >> WarpHorizBits;
+  sum.z = sum.z >> WarpHorizBits;
+  sum.w = sum.w >> WarpHorizBits;
+  return sum;
+}

diff --git a/libav1/dx/shaders/inter_compound.hlsl b/libav1/dx/shaders/inter_compound.hlsl
new file mode 100644
index 0000000..6c75ca9
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound.hlsl

@@ -0,0 +1,216 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 2
+#define OutputShift 7
+#define OutputRoundAdd (1 << (OutputShift - 1))
+#define OffsetBits 19
+#define SumAdd (1 << OffsetBits)
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define SUM1 1 << OffsetBits
+
+int blend(int src0, int src1, int coef0, int coef1) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * coef0 + src1 * coef1) >> DistBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * (((block.x >> 16) << 1) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output[2] = {{SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  int4 l;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * kernel_v0.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.x;
+  output[0] += l * kernel_v0.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  // output[2] += l * kernel_v0.x;
+  output[1] += l * kernel_v0.y;
+  output[0] += l * kernel_v0.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.x;
+  // output[2] += l * kernel_v0.y;
+  output[1] += l * kernel_v0.z;
+  output[0] += l * kernel_v0.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.y;
+  // output[2] += l * kernel_v0.z;
+  output[1] += l * kernel_v0.w;
+  output[0] += l * kernel_v1.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.z;
+  // output[2] += l * kernel_v0.w;
+  output[1] += l * kernel_v1.x;
+  output[0] += l * kernel_v1.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.w;
+  // output[2] += l * kernel_v1.x;
+  output[1] += l * kernel_v1.y;
+  output[0] += l * kernel_v1.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.x;
+  // output[2] += l * kernel_v1.y;
+  output[1] += l * kernel_v1.z;
+  output[0] += l * kernel_v1.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.y;
+  // output[2] += l * kernel_v1.z;
+  output[1] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.z;
+  // output[2] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1[2] = {{SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1[0] += l * kernel_v0.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.x;
+  output1[0] += l * kernel_v0.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[2] += l * kernel_v0.x;
+  output1[1] += l * kernel_v0.y;
+  output1[0] += l * kernel_v0.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.x;
+  // output1[2] += l * kernel_v0.y;
+  output1[1] += l * kernel_v0.z;
+  output1[0] += l * kernel_v0.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.y;
+  // output1[2] += l * kernel_v0.z;
+  output1[1] += l * kernel_v0.w;
+  output1[0] += l * kernel_v1.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.z;
+  // output1[2] += l * kernel_v0.w;
+  output1[1] += l * kernel_v1.x;
+  output1[0] += l * kernel_v1.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.w;
+  // output1[2] += l * kernel_v1.x;
+  output1[1] += l * kernel_v1.y;
+  output1[0] += l * kernel_v1.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.x;
+  // output1[2] += l * kernel_v1.y;
+  output1[1] += l * kernel_v1.z;
+  output1[0] += l * kernel_v1.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.y;
+  // output1[2] += l * kernel_v1.z;
+  output1[1] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.z;
+  // output1[2] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.w;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int coef0 = (block.y >> 17) & 15;
+  int coef1 = (block.y >> 21) & 15;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w;
+
+  for (int i = 0; i < 2; ++i) {
+    int4 pix4;
+    pix4.x = blend(output[i].x, output1[i].x, coef0, coef1);
+    pix4.y = blend(output[i].y, output1[i].y, coef0, coef1);
+    pix4.z = blend(output[i].z, output1[i].z, coef0, coef1);
+    pix4.w = blend(output[i].w, output1[i].w, coef0, coef1);
+
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + (x << 1) + (y + i) * res_stride);
+      pix4.x += (r.x << 16) >> 16;
+      pix4.y += r.x >> 16;
+      pix4.z += (r.y << 16) >> 16;
+      pix4.w += r.y >> 16;
+      pix4 = clamp(pix4, 0, 255);
+    }
+    dst_frame.Store(output_offset + i * output_stride, pix4.x | (pix4.y << 8) | (pix4.z << 16) | (pix4.w << 24));
+  }
+}

diff --git a/libav1/dx/shaders/inter_compound_2x2.hlsl b/libav1/dx/shaders/inter_compound_2x2.hlsl
new file mode 100644
index 0000000..6d0a7af
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound_2x2.hlsl

@@ -0,0 +1,188 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 2
+#define SubblockH 2
+#define OutputShift 7
+#define OutputRoundAdd (1 << (OutputShift - 1))
+#define OffsetBits 19
+#define SumAdd (1 << OffsetBits)
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define DualWriteBlock (1 << 25)
+#define SUM1 1 << OffsetBits
+
+int blend(int src0, int src1, int coef0, int coef1) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * coef0 + src1 * coef1) >> DistBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+groupshared int2 mem[64];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + thread.x) * 16);
+
+  int x = SubblockW * (block.x & 0xffff);
+  int y = SubblockH * (block.x >> 16);
+
+  const int2 dims = cb_dims[1].xy;
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output0 = {SUM1, SUM1, SUM1, SUM1};
+
+  int2 l;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output0.xy += l * kernel_v0.x;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v0.x;
+  output0.xy += l * kernel_v0.y;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v0.y;
+  output0.xy += l * kernel_v0.z;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v0.z;
+  output0.xy += l * kernel_v0.w;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v0.w;
+  output0.xy += l * kernel_v1.x;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v1.x;
+  output0.xy += l * kernel_v1.y;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v1.y;
+  output0.xy += l * kernel_v1.z;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v1.z;
+  output0.xy += l * kernel_v1.w;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1 = {SUM1, SUM1, SUM1, SUM1};
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1.xy += l * kernel_v0.x;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v0.x;
+  output1.xy += l * kernel_v0.y;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v0.y;
+  output1.xy += l * kernel_v0.z;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v0.z;
+  output1.xy += l * kernel_v0.w;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v0.w;
+  output1.xy += l * kernel_v1.x;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v1.x;
+  output1.xy += l * kernel_v1.y;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v1.y;
+  output1.xy += l * kernel_v1.z;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v1.z;
+  output1.xy += l * kernel_v1.w;
+  l = filter_line2(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v1.w;
+
+  int coef0 = (block.y >> 17) & 15;
+  int coef1 = (block.y >> 21) & 15;
+
+  int4 out4;
+  out4.x = blend(output0.x, output1.x, coef0, coef1);
+  out4.y = blend(output0.y, output1.y, coef0, coef1);
+  out4.z = blend(output0.z, output1.z, coef0, coef1);
+  out4.w = blend(output0.w, output1.w, coef0, coef1);
+
+  if (noskip) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_offset = cb_planes[plane].w + (x << 1) + y * res_stride;
+    int r0 = residuals.Load(res_offset);
+    int r1 = residuals.Load(res_offset + res_stride);
+    out4.x += (r0 << 16) >> 16;
+    out4.y += r0 >> 16;
+    out4.z += (r1 << 16) >> 16;
+    out4.w += r1 >> 16;
+    out4 = clamp(out4, 0, 255);
+  }
+
+  int2 output;
+  output.x = (out4.x | (out4.y << 8)) << ((x & 2) << 3);
+  output.y = (out4.z | (out4.w << 8)) << ((x & 2) << 3);
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + (x & (~3)) + y * output_stride;
+
+  if (block.y & DualWriteBlock) {
+    mem[thread.x & 63] = output;
+    GroupMemoryBarrier();
+    if ((thread.x & 1) == 0) {
+      int2 output2 = mem[(thread.x & 63) + 1];
+      output.x |= output2.x;
+      output.y |= output2.y;
+      dst_frame.Store(output_offset, output.x);
+      dst_frame.Store(output_offset + output_stride, output.y);
+    }
+  } else {
+    const uint mask = 0xffff0000 >> ((x & 2) * 8);
+    output.x |= dst_frame.Load(output_offset) & mask;
+    output.y |= dst_frame.Load(output_offset + output_stride) & mask;
+    dst_frame.Store(output_offset, output.x);
+    dst_frame.Store(output_offset + output_stride, output.y);
+  }
+}

diff --git a/libav1/dx/shaders/inter_compound_2x2_hbd.hlsl b/libav1/dx/shaders/inter_compound_2x2_hbd.hlsl
new file mode 100644
index 0000000..41d7acc
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound_2x2_hbd.hlsl

@@ -0,0 +1,167 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 2
+#define SubblockH 2
+#define OutputShift 7
+#define OffsetBits 21
+#define SumAdd (1 << OffsetBits)
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define PixelMax 1023
+
+int blend(int src0, int src1, int coef0, int coef1) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * coef0 + src1 * coef1) >> DistBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+groupshared int2 mem[64];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + thread.x) * 16);
+
+  int x = SubblockW * (block.x & 0xffff);
+  int y = SubblockH * (block.x >> 16);
+
+  const int2 dims = cb_dims[1].xy;
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output0 = {0, 0, 0, 0};
+  int2 l;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output0.xy += l * kernel_v0.x;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v0.x;
+  output0.xy += l * kernel_v0.y;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v0.y;
+  output0.xy += l * kernel_v0.z;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v0.z;
+  output0.xy += l * kernel_v0.w;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v0.w;
+  output0.xy += l * kernel_v1.x;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v1.x;
+  output0.xy += l * kernel_v1.y;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v1.y;
+  output0.xy += l * kernel_v1.z;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v1.z;
+  output0.xy += l * kernel_v1.w;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output0.zw += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1 = {0, 0, 0, 0};
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1.xy += l * kernel_v0.x;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v0.x;
+  output1.xy += l * kernel_v0.y;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v0.y;
+  output1.xy += l * kernel_v0.z;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v0.z;
+  output1.xy += l * kernel_v0.w;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v0.w;
+  output1.xy += l * kernel_v1.x;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v1.x;
+  output1.xy += l * kernel_v1.y;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v1.y;
+  output1.xy += l * kernel_v1.z;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v1.z;
+  output1.xy += l * kernel_v1.w;
+  l = filter_line2_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output1.zw += l * kernel_v1.w;
+
+  int coef0 = (block.y >> 17) & 15;
+  int coef1 = (block.y >> 21) & 15;
+
+  int4 out4;
+  out4.x = blend(output0.x, output1.x, coef0, coef1);
+  out4.y = blend(output0.y, output1.y, coef0, coef1);
+  out4.z = blend(output0.z, output1.z, coef0, coef1);
+  out4.w = blend(output0.w, output1.w, coef0, coef1);
+
+  x <<= 1;
+  if (noskip) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_offset = cb_planes[plane].w + x + y * res_stride;
+    int r0 = residuals.Load(res_offset);
+    int r1 = residuals.Load(res_offset + res_stride);
+    out4.x += (r0 << 16) >> 16;
+    out4.y += r0 >> 16;
+    out4.z += (r1 << 16) >> 16;
+    out4.w += r1 >> 16;
+    out4 = clamp(out4, 0, PixelMax);
+  }
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  dst_frame.Store(output_offset, out4.x | (out4.y << 16));
+  dst_frame.Store(output_offset + output_stride, out4.z | (out4.w << 16));
+}

diff --git a/libav1/dx/shaders/inter_compound_diff_chroma.hlsl b/libav1/dx/shaders/inter_compound_diff_chroma.hlsl
new file mode 100644
index 0000000..0620db0
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound_diff_chroma.hlsl

@@ -0,0 +1,216 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 2
+#define OutputShift 7
+#define OutputRoundAdd (1 << (OutputShift - 1))
+#define OffsetBits 19
+#define SumAdd (1 << OffsetBits)
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4,
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+#define SUM1 1 << OffsetBits
+
+int blend(int src0, int src1, int m) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * m + src1 * (DiffWTDMax - m)) >> DiffWTDBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * (((block.x >> 16) << 1) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output[2] = {{SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  int4 l;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * kernel_v0.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.x;
+  output[0] += l * kernel_v0.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  // output[2] += l * kernel_v0.x;
+  output[1] += l * kernel_v0.y;
+  output[0] += l * kernel_v0.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.x;
+  // output[2] += l * kernel_v0.y;
+  output[1] += l * kernel_v0.z;
+  output[0] += l * kernel_v0.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.y;
+  // output[2] += l * kernel_v0.z;
+  output[1] += l * kernel_v0.w;
+  output[0] += l * kernel_v1.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.z;
+  // output[2] += l * kernel_v0.w;
+  output[1] += l * kernel_v1.x;
+  output[0] += l * kernel_v1.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.w;
+  // output[2] += l * kernel_v1.x;
+  output[1] += l * kernel_v1.y;
+  output[0] += l * kernel_v1.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.x;
+  // output[2] += l * kernel_v1.y;
+  output[1] += l * kernel_v1.z;
+  output[0] += l * kernel_v1.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.y;
+  // output[2] += l * kernel_v1.z;
+  output[1] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.z;
+  // output[2] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1[2] = {{SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1[0] += l * kernel_v0.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.x;
+  output1[0] += l * kernel_v0.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[2] += l * kernel_v0.x;
+  output1[1] += l * kernel_v0.y;
+  output1[0] += l * kernel_v0.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.x;
+  // output1[2] += l * kernel_v0.y;
+  output1[1] += l * kernel_v0.z;
+  output1[0] += l * kernel_v0.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.y;
+  // output1[2] += l * kernel_v0.z;
+  output1[1] += l * kernel_v0.w;
+  output1[0] += l * kernel_v1.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.z;
+  // output1[2] += l * kernel_v0.w;
+  output1[1] += l * kernel_v1.x;
+  output1[0] += l * kernel_v1.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.w;
+  // output1[2] += l * kernel_v1.x;
+  output1[1] += l * kernel_v1.y;
+  output1[0] += l * kernel_v1.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.x;
+  // output1[2] += l * kernel_v1.y;
+  output1[1] += l * kernel_v1.z;
+  output1[0] += l * kernel_v1.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.y;
+  // output1[2] += l * kernel_v1.z;
+  output1[1] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.z;
+  // output1[2] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.w;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + (x << 1) + y * res_stride;
+  for (int i = 0; i < 2; ++i) {
+    uint mask = dst_frame.Load(output_offset + i * output_stride);
+    output[i].x = blend(output[i].x, output1[i].x, (mask >> 0) & 255);
+    output[i].y = blend(output[i].y, output1[i].y, (mask >> 8) & 255);
+    output[i].z = blend(output[i].z, output1[i].z, (mask >> 16) & 255);
+    output[i].w = blend(output[i].w, output1[i].w, (mask >> 24) & 255);
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + i * res_stride);
+      output[i].x += (r.x << 16) >> 16;
+      output[i].y += r.x >> 16;
+      output[i].z += (r.y << 16) >> 16;
+      output[i].w += r.y >> 16;
+      output[i] = clamp(output[i], 0, 255);
+    }
+
+    uint pixels = output[i].x | (output[i].y << 8) | (output[i].z << 16) | (output[i].w << 24);
+
+    dst_frame.Store(output_offset + i * output_stride, pixels);
+  }
+}

diff --git a/libav1/dx/shaders/inter_compound_diff_chroma_hbd.hlsl b/libav1/dx/shaders/inter_compound_diff_chroma_hbd.hlsl
new file mode 100644
index 0000000..3b1c634
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound_diff_chroma_hbd.hlsl

@@ -0,0 +1,179 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 2
+#define OutputShift 7
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define PixelMax 1023
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+
+int blend(int src0, int src1, int m) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * m + src1 * (DiffWTDMax - m)) >> DiffWTDBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * (((block.x >> 16) << 1) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output[2] = {{0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  int4 l;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * kernel_v0.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.x;
+  output[0] += l * kernel_v0.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.y;
+  output[0] += l * kernel_v0.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.z;
+  output[0] += l * kernel_v0.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.w;
+  output[0] += l * kernel_v1.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.x;
+  output[0] += l * kernel_v1.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.y;
+  output[0] += l * kernel_v1.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.z;
+  output[0] += l * kernel_v1.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1[2] = {{0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1[0] += l * kernel_v0.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.x;
+  output1[0] += l * kernel_v0.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.y;
+  output1[0] += l * kernel_v0.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.z;
+  output1[0] += l * kernel_v0.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.w;
+  output1[0] += l * kernel_v1.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.x;
+  output1[0] += l * kernel_v1.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.y;
+  output1[0] += l * kernel_v1.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.z;
+  output1[0] += l * kernel_v1.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.w;
+
+  x <<= 1;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + x + y * res_stride;
+  for (int i = 0; i < 2; ++i) {
+    uint mask = dst_frame.Load(output_offset + i * output_stride);
+    output[i].x = blend(output[i].x, output1[i].x, (mask >> 0) & 255);
+    output[i].y = blend(output[i].y, output1[i].y, (mask >> 8) & 255);
+    output[i].z = blend(output[i].z, output1[i].z, (mask >> 16) & 255);
+    output[i].w = blend(output[i].w, output1[i].w, (mask >> 24) & 255);
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + i * res_stride);
+      output[i].x += (r.x << 16) >> 16;
+      output[i].y += r.x >> 16;
+      output[i].z += (r.y << 16) >> 16;
+      output[i].w += r.y >> 16;
+      output[i] = clamp(output[i], 0, PixelMax);
+    }
+
+    dst_frame.Store2(output_offset + i * output_stride,
+                     uint2(output[i].x | (output[i].y << 16), output[i].z | (output[i].w << 16)));
+  }
+}

diff --git a/libav1/dx/shaders/inter_compound_diff_luma.hlsl b/libav1/dx/shaders/inter_compound_diff_luma.hlsl
new file mode 100644
index 0000000..23157ba
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound_diff_luma.hlsl

@@ -0,0 +1,261 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 2
+#define OutputShift 7
+#define OutputRoundAdd (1 << (OutputShift - 1))
+#define OffsetBits 19
+#define SumAdd (1 << OffsetBits)
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define DiffWTDBase 38
+#define DiffWTDRoundAdd 8
+#define DiffWTDRoundShft 8
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+#define SUM1 1 << OffsetBits
+
+int compute_mask(int src0, int src1, int inv) {
+  int m = clamp(DiffWTDBase + ((abs(src0 - src1) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+  return inv ? DiffWTDMax - m : m;
+}
+
+int blend(int src0, int src1, int m) {
+  int result = (src0 * m + src1 * (DiffWTDMax - m)) >> DiffWTDBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+groupshared int2 mem[64];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * (((block.x >> 16) << 1) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output[2] = {{SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  int4 l;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * kernel_v0.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.x;
+  output[0] += l * kernel_v0.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  // output[2] += l * kernel_v0.x;
+  output[1] += l * kernel_v0.y;
+  output[0] += l * kernel_v0.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.x;
+  // output[2] += l * kernel_v0.y;
+  output[1] += l * kernel_v0.z;
+  output[0] += l * kernel_v0.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.y;
+  // output[2] += l * kernel_v0.z;
+  output[1] += l * kernel_v0.w;
+  output[0] += l * kernel_v1.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.z;
+  // output[2] += l * kernel_v0.w;
+  output[1] += l * kernel_v1.x;
+  output[0] += l * kernel_v1.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.w;
+  // output[2] += l * kernel_v1.x;
+  output[1] += l * kernel_v1.y;
+  output[0] += l * kernel_v1.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.x;
+  // output[2] += l * kernel_v1.y;
+  output[1] += l * kernel_v1.z;
+  output[0] += l * kernel_v1.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.y;
+  // output[2] += l * kernel_v1.z;
+  output[1] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.z;
+  // output[2] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1[2] = {{SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1[0] += l * kernel_v0.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.x;
+  output1[0] += l * kernel_v0.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[2] += l * kernel_v0.x;
+  output1[1] += l * kernel_v0.y;
+  output1[0] += l * kernel_v0.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.x;
+  // output1[2] += l * kernel_v0.y;
+  output1[1] += l * kernel_v0.z;
+  output1[0] += l * kernel_v0.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.y;
+  // output1[2] += l * kernel_v0.z;
+  output1[1] += l * kernel_v0.w;
+  output1[0] += l * kernel_v1.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.z;
+  // output1[2] += l * kernel_v0.w;
+  output1[1] += l * kernel_v1.x;
+  output1[0] += l * kernel_v1.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.w;
+  // output1[2] += l * kernel_v1.x;
+  output1[1] += l * kernel_v1.y;
+  output1[0] += l * kernel_v1.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.x;
+  // output1[2] += l * kernel_v1.y;
+  output1[1] += l * kernel_v1.z;
+  output1[0] += l * kernel_v1.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.y;
+  // output1[2] += l * kernel_v1.z;
+  output1[1] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.z;
+  // output1[2] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.w;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int2 m0;
+  m0 = int2(0, 0);
+  int inv = (block.y >> 17) & 1;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + (x << 1) + y * res_stride;
+  for (int i = 0; i < 2; ++i) {
+    int4 pix4;
+    int src0 = (output[i].x + OutputRoundAdd) >> OutputShift;
+    int src1 = (output1[i].x + OutputRoundAdd) >> OutputShift;
+    int m = compute_mask(src0, src1, inv);
+    pix4.x = blend(src0, src1, m);
+    m0.x += m;
+
+    src0 = (output[i].y + OutputRoundAdd) >> OutputShift;
+    src1 = (output1[i].y + OutputRoundAdd) >> OutputShift;
+    m = compute_mask(src0, src1, inv);
+    pix4.y = blend(src0, src1, m);
+    m0.x += m;
+
+    src0 = (output[i].z + OutputRoundAdd) >> OutputShift;
+    src1 = (output1[i].z + OutputRoundAdd) >> OutputShift;
+    m = compute_mask(src0, src1, inv);
+    pix4.z = blend(src0, src1, m);
+    m0.y += m;
+
+    src0 = (output[i].w + OutputRoundAdd) >> OutputShift;
+    src1 = (output1[i].w + OutputRoundAdd) >> OutputShift;
+    m = compute_mask(src0, src1, inv);
+    pix4.w = blend(src0, src1, m);
+    m0.y += m;
+
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + i * res_stride);
+      pix4.x += (r.x << 16) >> 16;
+      pix4.y += r.x >> 16;
+      pix4.z += (r.y << 16) >> 16;
+      pix4.w += r.y >> 16;
+      pix4 = clamp(pix4, 0, 255);
+    }
+
+    dst_frame.Store(output_offset + i * output_stride, pix4.x | (pix4.y << 8) | (pix4.z << 16) | (pix4.w << 24));
+  }
+
+  m0.x = (m0.x + 2) >> 2;
+  m0.y = (m0.y + 2) >> 2;
+
+  mem[thread.x & 63] = m0;
+
+  GroupMemoryBarrier();
+
+  if ((thread.x & 1) == 0) {
+    int2 m1 = mem[(thread.x & 63) + 1];
+    int chroma_offset = (x + y * cb_planes[1].x) >> 1;
+    uint mask = m0.x | (m0.y << 8) | (m1.x << 16) | (m1.y << 24);
+    dst_frame.Store(cb_planes[1].y + chroma_offset, mask);
+    dst_frame.Store(cb_planes[2].y + chroma_offset, mask);
+  }
+}

diff --git a/libav1/dx/shaders/inter_compound_diff_luma_hbd.hlsl b/libav1/dx/shaders/inter_compound_diff_luma_hbd.hlsl
new file mode 100644
index 0000000..181f973
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound_diff_luma_hbd.hlsl

@@ -0,0 +1,224 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 2
+#define OutputShift 7
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define PixelMax 1023
+#define DiffWTDBase 38
+#define DiffWTDRoundAdd 32
+#define DiffWTDRoundShft 10
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+
+int compute_mask(int src0, int src1, int inv) {
+  int m = clamp(DiffWTDBase + ((abs(src0 - src1) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+  return inv ? DiffWTDMax - m : m;
+}
+
+int blend(int src0, int src1, int m) {
+  int result = (src0 * m + src1 * (DiffWTDMax - m)) >> DiffWTDBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+groupshared int2 mem[64];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * (((block.x >> 16) << 1) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output[2] = {{0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  int4 l;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * kernel_v0.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.x;
+  output[0] += l * kernel_v0.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.y;
+  output[0] += l * kernel_v0.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.z;
+  output[0] += l * kernel_v0.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.w;
+  output[0] += l * kernel_v1.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.x;
+  output[0] += l * kernel_v1.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.y;
+  output[0] += l * kernel_v1.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.z;
+  output[0] += l * kernel_v1.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1[2] = {{0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1[0] += l * kernel_v0.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.x;
+  output1[0] += l * kernel_v0.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.y;
+  output1[0] += l * kernel_v0.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.z;
+  output1[0] += l * kernel_v0.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.w;
+  output1[0] += l * kernel_v1.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.x;
+  output1[0] += l * kernel_v1.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.y;
+  output1[0] += l * kernel_v1.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.z;
+  output1[0] += l * kernel_v1.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.w;
+  x <<= 1;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int2 m0;
+  m0 = int2(0, 0);
+  int inv = (block.y >> 17) & 1;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + x + y * res_stride;
+  for (int i = 0; i < 2; ++i) {
+    int4 pix4;
+    int src0 = (output[i].x + OutputRoundAdd) >> OutputShift;
+    int src1 = (output1[i].x + OutputRoundAdd) >> OutputShift;
+    int m = compute_mask(src0, src1, inv);
+    pix4.x = blend(src0, src1, m);
+    m0.x += m;
+
+    src0 = (output[i].y + OutputRoundAdd) >> OutputShift;
+    src1 = (output1[i].y + OutputRoundAdd) >> OutputShift;
+    m = compute_mask(src0, src1, inv);
+    pix4.y = blend(src0, src1, m);
+    m0.x += m;
+
+    src0 = (output[i].z + OutputRoundAdd) >> OutputShift;
+    src1 = (output1[i].z + OutputRoundAdd) >> OutputShift;
+    m = compute_mask(src0, src1, inv);
+    pix4.z = blend(src0, src1, m);
+    m0.y += m;
+
+    src0 = (output[i].w + OutputRoundAdd) >> OutputShift;
+    src1 = (output1[i].w + OutputRoundAdd) >> OutputShift;
+    m = compute_mask(src0, src1, inv);
+    pix4.w = blend(src0, src1, m);
+    m0.y += m;
+
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + i * res_stride);
+      pix4.x += (r.x << 16) >> 16;
+      pix4.y += r.x >> 16;
+      pix4.z += (r.y << 16) >> 16;
+      pix4.w += r.y >> 16;
+      pix4 = clamp(pix4, 0, PixelMax);
+    }
+
+    dst_frame.Store2(output_offset + i * output_stride, uint2(pix4.x | (pix4.y << 16), pix4.z | (pix4.w << 16)));
+  }
+
+  m0.x = (m0.x + 2) >> 2;
+  m0.y = (m0.y + 2) >> 2;
+
+  mem[thread.x & 63] = m0;
+
+  GroupMemoryBarrier();
+
+  if ((thread.x & 1) == 0) {
+    int2 m1 = mem[(thread.x & 63) + 1];
+    int chroma_offset = (x + y * cb_planes[1].x) >> 1;
+    uint mask = m0.x | (m0.y << 8) | (m1.x << 16) | (m1.y << 24);
+    dst_frame.Store(cb_planes[1].y + chroma_offset, mask);
+    dst_frame.Store(cb_planes[2].y + chroma_offset, mask);
+  }
+}

diff --git a/libav1/dx/shaders/inter_compound_hbd.hlsl b/libav1/dx/shaders/inter_compound_hbd.hlsl
new file mode 100644
index 0000000..1e48f73
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound_hbd.hlsl

@@ -0,0 +1,179 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 2
+#define OutputShift 7
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define PixelMax 1023
+
+int blend(int src0, int src1, int coef0, int coef1) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * coef0 + src1 * coef1) >> DistBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * (((block.x >> 16) << 1) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output[2] = {{0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  int4 l;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * kernel_v0.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.x;
+  output[0] += l * kernel_v0.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.y;
+  output[0] += l * kernel_v0.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.z;
+  output[0] += l * kernel_v0.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.w;
+  output[0] += l * kernel_v1.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.x;
+  output[0] += l * kernel_v1.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.y;
+  output[0] += l * kernel_v1.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.z;
+  output[0] += l * kernel_v1.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1[2] = {{0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1[0] += l * kernel_v0.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.x;
+  output1[0] += l * kernel_v0.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.y;
+  output1[0] += l * kernel_v0.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.z;
+  output1[0] += l * kernel_v0.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.w;
+  output1[0] += l * kernel_v1.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.x;
+  output1[0] += l * kernel_v1.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.y;
+  output1[0] += l * kernel_v1.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.z;
+  output1[0] += l * kernel_v1.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.w;
+
+  x <<= 1;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int coef0 = (block.y >> 17) & 15;
+  int coef1 = (block.y >> 21) & 15;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w;
+  for (int i = 0; i < 2; ++i) {
+    int4 pix4;
+    pix4.x = blend(output[i].x, output1[i].x, coef0, coef1);
+    pix4.y = blend(output[i].y, output1[i].y, coef0, coef1);
+    pix4.z = blend(output[i].z, output1[i].z, coef0, coef1);
+    pix4.w = blend(output[i].w, output1[i].w, coef0, coef1);
+
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + x + (y + i) * res_stride);
+      pix4.x += (r.x << 16) >> 16;
+      pix4.y += r.x >> 16;
+      pix4.z += (r.y << 16) >> 16;
+      pix4.w += r.y >> 16;
+      pix4 = clamp(pix4, 0, PixelMax);
+    }
+    dst_frame.Store2(output_offset + i * output_stride, uint2(pix4.x | (pix4.y << 16), pix4.z | (pix4.w << 16)));
+  }
+}

diff --git a/libav1/dx/shaders/inter_compound_masked.hlsl b/libav1/dx/shaders/inter_compound_masked.hlsl
new file mode 100644
index 0000000..8e1de9f
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound_masked.hlsl

@@ -0,0 +1,218 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 2
+#define OutputShift 7
+#define OutputRoundAdd (1 << (OutputShift - 1))
+#define OffsetBits 19
+#define SumAdd (1 << OffsetBits)
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define MaskBits 6
+#define MaskMax 64
+#define SUM1 1 << OffsetBits
+
+int blend(int src0, int src1, int mask) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * mask + src1 * (MaskMax - mask)) >> MaskBits;  // maybe this needs rounding (+32 before shift)
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * (((block.x >> 16) << 1) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output[2] = {{SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  int4 l;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * kernel_v0.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.x;
+  output[0] += l * kernel_v0.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  // output[2] += l * kernel_v0.x;
+  output[1] += l * kernel_v0.y;
+  output[0] += l * kernel_v0.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.x;
+  // output[2] += l * kernel_v0.y;
+  output[1] += l * kernel_v0.z;
+  output[0] += l * kernel_v0.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.y;
+  // output[2] += l * kernel_v0.z;
+  output[1] += l * kernel_v0.w;
+  output[0] += l * kernel_v1.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.z;
+  // output[2] += l * kernel_v0.w;
+  output[1] += l * kernel_v1.x;
+  output[0] += l * kernel_v1.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v0.w;
+  // output[2] += l * kernel_v1.x;
+  output[1] += l * kernel_v1.y;
+  output[0] += l * kernel_v1.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.x;
+  // output[2] += l * kernel_v1.y;
+  output[1] += l * kernel_v1.z;
+  output[0] += l * kernel_v1.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.y;
+  // output[2] += l * kernel_v1.z;
+  output[1] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.z;
+  // output[2] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  // output[3] += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1[2] = {{SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1[0] += l * kernel_v0.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.x;
+  output1[0] += l * kernel_v0.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[2] += l * kernel_v0.x;
+  output1[1] += l * kernel_v0.y;
+  output1[0] += l * kernel_v0.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.x;
+  // output1[2] += l * kernel_v0.y;
+  output1[1] += l * kernel_v0.z;
+  output1[0] += l * kernel_v0.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.y;
+  // output1[2] += l * kernel_v0.z;
+  output1[1] += l * kernel_v0.w;
+  output1[0] += l * kernel_v1.x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.z;
+  // output1[2] += l * kernel_v0.w;
+  output1[1] += l * kernel_v1.x;
+  output1[0] += l * kernel_v1.y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v0.w;
+  // output1[2] += l * kernel_v1.x;
+  output1[1] += l * kernel_v1.y;
+  output1[0] += l * kernel_v1.z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.x;
+  // output1[2] += l * kernel_v1.y;
+  output1[1] += l * kernel_v1.z;
+  output1[0] += l * kernel_v1.w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.y;
+  // output1[2] += l * kernel_v1.z;
+  output1[1] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.z;
+  // output1[2] += l * kernel_v1.w;
+  // l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  // output1[3] += l * kernel_v1.w;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int wedge_stride = SubblockW << w_log;
+  int wedge_addr = (((block.y >> 17) & 0x1fff) << 6) + SubblockW * (subblock & ((1 << w_log) - 1)) +
+                   SubblockH * (subblock >> w_log) * wedge_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + (x << 1) + y * res_stride;
+  for (int i = 0; i < 2; ++i) {
+    uint wedge = comp_mask.Load(wedge_addr + wedge_stride * i);
+    output[i].x = blend(output[i].x, output1[i].x, (wedge >> 0) & 255);
+    output[i].y = blend(output[i].y, output1[i].y, (wedge >> 8) & 255);
+    output[i].z = blend(output[i].z, output1[i].z, (wedge >> 16) & 255);
+    output[i].w = blend(output[i].w, output1[i].w, (wedge >> 24) & 255);
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + i * res_stride);
+      output[i].x += (r.x << 16) >> 16;
+      output[i].y += r.x >> 16;
+      output[i].z += (r.y << 16) >> 16;
+      output[i].w += r.y >> 16;
+      output[i] = clamp(output[i], 0, 255);
+    }
+    uint pixels = output[i].x | (output[i].y << 8) | (output[i].z << 16) | (output[i].w << 24);
+    dst_frame.Store(output_offset + i * output_stride, pixels);
+  }
+}

diff --git a/libav1/dx/shaders/inter_compound_masked_hbd.hlsl b/libav1/dx/shaders/inter_compound_masked_hbd.hlsl
new file mode 100644
index 0000000..b2cd072
--- /dev/null
+++ b/libav1/dx/shaders/inter_compound_masked_hbd.hlsl

@@ -0,0 +1,182 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 2
+#define OutputShift 7
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define PixelMax 1023
+#define MaskBits 6
+#define MaskMax 64
+
+int blend(int src0, int src1, int mask) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * mask + src1 * (MaskMax - mask)) >> MaskBits;  // maybe this needs rounding (+32 before shift)
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * (((block.x >> 16) << 1) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output[2] = {{0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  int4 l;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * kernel_v0.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.x;
+  output[0] += l * kernel_v0.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.y;
+  output[0] += l * kernel_v0.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.z;
+  output[0] += l * kernel_v0.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v0.w;
+  output[0] += l * kernel_v1.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.x;
+  output[0] += l * kernel_v1.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.y;
+  output[0] += l * kernel_v1.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.z;
+  output[0] += l * kernel_v1.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * kernel_v1.w;
+
+  mv = block.w;
+  mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  refplane = ((block.y >> 14) & 7) * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+
+  int4 output1[2] = {{0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output1[0] += l * kernel_v0.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.x;
+  output1[0] += l * kernel_v0.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.y;
+  output1[0] += l * kernel_v0.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.z;
+  output1[0] += l * kernel_v0.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v0.w;
+  output1[0] += l * kernel_v1.x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.x;
+  output1[0] += l * kernel_v1.y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.y;
+  output1[0] += l * kernel_v1.z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.z;
+  output1[0] += l * kernel_v1.w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output1[1] += l * kernel_v1.w;
+
+  x <<= 1;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int wedge_stride = SubblockW << w_log;
+  int wedge_addr = (((block.y >> 17) & 0x1fff) << 6) + SubblockW * (subblock & ((1 << w_log) - 1)) +
+                   SubblockH * (subblock >> w_log) * wedge_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + x + y * res_stride;
+  for (int i = 0; i < 2; ++i) {
+    uint wedge = comp_mask.Load(wedge_addr + wedge_stride * i);
+    output[i].x = blend(output[i].x, output1[i].x, (wedge >> 0) & 255);
+    output[i].y = blend(output[i].y, output1[i].y, (wedge >> 8) & 255);
+    output[i].z = blend(output[i].z, output1[i].z, (wedge >> 16) & 255);
+    output[i].w = blend(output[i].w, output1[i].w, (wedge >> 24) & 255);
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + i * res_stride);
+      output[i].x += (r.x << 16) >> 16;
+      output[i].y += r.x >> 16;
+      output[i].z += (r.y << 16) >> 16;
+      output[i].w += r.y >> 16;
+      output[i] = clamp(output[i], 0, PixelMax);
+    }
+    dst_frame.Store2(output_offset + i * output_stride,
+                     uint2(output[i].x | (output[i].y << 16), output[i].z | (output[i].w << 16)));
+  }
+}

diff --git a/libav1/dx/shaders/inter_ext_borders.hlsl b/libav1/dx/shaders/inter_ext_borders.hlsl
new file mode 100644
index 0000000..5554506
--- /dev/null
+++ b/libav1/dx/shaders/inter_ext_borders.hlsl

@@ -0,0 +1,52 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+cbuffer ExtBorderSRT : register(b0) {
+  uint4 cb_planes[3];
+  uint4 cb_dims[2];
+};
+
+RWByteAddressBuffer Dst : register(u0);
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  const uint yh = cb_dims[0].y;
+  const uint uvh = cb_dims[1].y;
+
+  if (thread.x >= (yh + 2 * uvh)) return;
+
+  const uint plane = (thread.x >= yh) + (thread.x >= (yh + uvh));
+  const uint wi = thread.x - yh * (thread.x >= yh) - uvh * (thread.x >= (yh + uvh));
+
+  const uint left_ptr = cb_planes[plane].y + wi * cb_planes[plane].x;
+  const uint right_ptr = left_ptr + ((cb_dims[plane > 0].x - 1) & (~3));
+  const uint shft = ((cb_dims[plane > 0].x - 1) & 3) * 8;
+  uint left = (Dst.Load(left_ptr) & 0xff) * 0x01010101;
+  uint right0 = Dst.Load(right_ptr);
+  uint right = ((right0 >> shft) & 0xff) * 0x01010101;
+
+  uint r0mask = (1 << shft) - 1;
+  right0 = (right0 & r0mask) | (right & (~r0mask));
+
+  Dst.Store(left_ptr - 12, left);
+  Dst.Store(left_ptr - 8, left);
+  Dst.Store(left_ptr - 4, left);
+
+  Dst.Store(right_ptr + 0, right0);
+  Dst.Store(right_ptr + 4, right);
+  Dst.Store(right_ptr + 8, right);
+  Dst.Store(right_ptr + 12, right);
+}
\ No newline at end of file

diff --git a/libav1/dx/shaders/inter_ext_borders_hbd.hlsl b/libav1/dx/shaders/inter_ext_borders_hbd.hlsl
new file mode 100644
index 0000000..b751e88
--- /dev/null
+++ b/libav1/dx/shaders/inter_ext_borders_hbd.hlsl

@@ -0,0 +1,51 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+cbuffer ExtBorderSRT : register(b0) {
+  uint4 cb_planes[3];
+  uint4 cb_dims[2];
+};
+
+RWByteAddressBuffer Dst : register(u0);
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  const uint yh = cb_dims[0].y;
+  const uint uvh = cb_dims[1].y;
+
+  if (thread.x >= (yh + 2 * uvh)) return;
+
+  const uint plane = (thread.x >= yh) + (thread.x >= (yh + uvh));
+  const uint wi = thread.x - yh * (thread.x >= yh) - uvh * (thread.x >= (yh + uvh));
+
+  const uint left_ptr = cb_planes[plane].y + wi * cb_planes[plane].x;
+  const uint last_pix = (cb_dims[plane > 0].x << 1) - 2;
+  const uint right_ptr = left_ptr + (last_pix & (~3));
+
+  const uint shft = (last_pix & 3) * 8;
+  uint left = (Dst.Load(left_ptr) & 0xffff) * 0x00010001;
+  uint right0 = Dst.Load(right_ptr);
+  uint right = ((right0 >> shft) & 0xffff) * 0x00010001;
+
+  uint r0mask = (1 << shft) - 1;
+  right0 = (right0 & r0mask) | (right & (~r0mask));
+
+  int i;
+  for (i = -24; i < 0; i += 4) Dst.Store(left_ptr + i, left);
+
+  Dst.Store(right_ptr, right0);
+  for (i = 4; i <= 24; i += 4) Dst.Store(right_ptr + i, right);
+}
\ No newline at end of file

diff --git a/libav1/dx/shaders/inter_main.hlsl b/libav1/dx/shaders/inter_main.hlsl
new file mode 100644
index 0000000..491ea4d
--- /dev/null
+++ b/libav1/dx/shaders/inter_main.hlsl

@@ -0,0 +1,147 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OutputRoundAdd (1 << (OutputShift - 1))
+#define OffsetBits 19
+#define SumAdd (1 << OffsetBits)
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define SUM1 1 << OffsetBits
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log));
+  uint4 block = pred_blocks.Load4(block_index << 4);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        1    compound?
+
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  const int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  const int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+
+  int4 output[4] = {
+      {SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  int4 l;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * cb_kernels[filter_v][0].x;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * cb_kernels[filter_v][0].x;
+  output[0] += l * cb_kernels[filter_v][0].y;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[2] += l * cb_kernels[filter_v][0].x;
+  output[1] += l * cb_kernels[filter_v][0].y;
+  output[0] += l * cb_kernels[filter_v][0].z;
+
+  //
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].x;
+  output[2] += l * cb_kernels[filter_v][0].y;
+  output[1] += l * cb_kernels[filter_v][0].z;
+  output[0] += l * cb_kernels[filter_v][0].w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].y;
+  output[2] += l * cb_kernels[filter_v][0].z;
+  output[1] += l * cb_kernels[filter_v][0].w;
+  output[0] += l * cb_kernels[filter_v][1].x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].z;
+  output[2] += l * cb_kernels[filter_v][0].w;
+  output[1] += l * cb_kernels[filter_v][1].x;
+  output[0] += l * cb_kernels[filter_v][1].y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].w;
+  output[2] += l * cb_kernels[filter_v][1].x;
+  output[1] += l * cb_kernels[filter_v][1].y;
+  output[0] += l * cb_kernels[filter_v][1].z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].x;
+  output[2] += l * cb_kernels[filter_v][1].y;
+  output[1] += l * cb_kernels[filter_v][1].z;
+  output[0] += l * cb_kernels[filter_v][1].w;
+  //
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].y;
+  output[2] += l * cb_kernels[filter_v][1].z;
+  output[1] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].z;
+  output[2] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].w;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + (x << 1) + y * res_stride;
+  for (int i = 0; i < 4; ++i) {
+    output[i].x = clamp((int)(((output[i].x + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+    output[i].y = clamp((int)(((output[i].y + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+    output[i].z = clamp((int)(((output[i].z + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+    output[i].w = clamp((int)(((output[i].w + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + i * res_stride);
+      output[i].x += (r.x << 16) >> 16;
+      output[i].y += r.x >> 16;
+      output[i].z += (r.y << 16) >> 16;
+      output[i].w += r.y >> 16;
+      output[i] = clamp(output[i], 0, 255);
+    }
+
+    dst_frame.Store(output_offset + i * output_stride,
+                    output[i].x | (output[i].y << 8) | (output[i].z << 16) | (output[i].w << 24));
+  }
+}

diff --git a/libav1/dx/shaders/inter_main_hbd.hlsl b/libav1/dx/shaders/inter_main_hbd.hlsl
new file mode 100644
index 0000000..efbf794
--- /dev/null
+++ b/libav1/dx/shaders/inter_main_hbd.hlsl

@@ -0,0 +1,147 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define PixelMax 1023
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log));
+  uint4 block = pred_blocks.Load4(block_index << 4);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        1    compound?
+
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  const int noskip = block.y & NoSkipFlag;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  const int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  const int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+
+  int4 output[4] = {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  int4 l;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * cb_kernels[filter_v][0].x;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * cb_kernels[filter_v][0].x;
+  output[0] += l * cb_kernels[filter_v][0].y;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[2] += l * cb_kernels[filter_v][0].x;
+  output[1] += l * cb_kernels[filter_v][0].y;
+  output[0] += l * cb_kernels[filter_v][0].z;
+
+  //
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].x;
+  output[2] += l * cb_kernels[filter_v][0].y;
+  output[1] += l * cb_kernels[filter_v][0].z;
+  output[0] += l * cb_kernels[filter_v][0].w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].y;
+  output[2] += l * cb_kernels[filter_v][0].z;
+  output[1] += l * cb_kernels[filter_v][0].w;
+  output[0] += l * cb_kernels[filter_v][1].x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].z;
+  output[2] += l * cb_kernels[filter_v][0].w;
+  output[1] += l * cb_kernels[filter_v][1].x;
+  output[0] += l * cb_kernels[filter_v][1].y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].w;
+  output[2] += l * cb_kernels[filter_v][1].x;
+  output[1] += l * cb_kernels[filter_v][1].y;
+  output[0] += l * cb_kernels[filter_v][1].z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].x;
+  output[2] += l * cb_kernels[filter_v][1].y;
+  output[1] += l * cb_kernels[filter_v][1].z;
+  output[0] += l * cb_kernels[filter_v][1].w;
+  //
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].y;
+  output[2] += l * cb_kernels[filter_v][1].z;
+  output[1] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].z;
+  output[2] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].w;
+
+  x <<= 1;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + x + y * res_stride;
+  for (int i = 0; i < 4; ++i) {
+    output[i].x = clamp((int)(((output[i].x + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+    output[i].y = clamp((int)(((output[i].y + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+    output[i].z = clamp((int)(((output[i].z + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+    output[i].w = clamp((int)(((output[i].w + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + i * res_stride);
+      output[i].x += (r.x << 16) >> 16;
+      output[i].y += r.x >> 16;
+      output[i].z += (r.y << 16) >> 16;
+      output[i].w += r.y >> 16;
+      output[i] = clamp(output[i], 0, PixelMax);
+    }
+
+    dst_frame.Store2(output_offset + i * output_stride,
+                     uint2(output[i].x | (output[i].y << 16), output[i].z | (output[i].w << 16)));
+  }
+}

diff --git a/libav1/dx/shaders/inter_obmc_above.hlsl b/libav1/dx/shaders/inter_obmc_above.hlsl
new file mode 100644
index 0000000..1a8b661
--- /dev/null
+++ b/libav1/dx/shaders/inter_obmc_above.hlsl

@@ -0,0 +1,140 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OutputRoundAdd (1 << (OutputShift - 1))
+#define OffsetBits 19
+#define SumAdd (1 << OffsetBits)
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define SUM1 1 << OffsetBits
+
+void store_vblend(RWByteAddressBuffer buf, uint addr, int4 v, int mask) {
+  uint src0 = buf.Load(addr);
+  v.x = (((src0 >> 0) & 255) * mask + v.x * (64 - mask) + 32) >> 6;
+  v.y = (((src0 >> 8) & 255) * mask + v.y * (64 - mask) + 32) >> 6;
+  v.z = (((src0 >> 16) & 255) * mask + v.z * (64 - mask) + 32) >> 6;
+  v.w = (((src0 >> 24) & 255) * mask + v.w * (64 - mask) + 32) >> 6;
+  buf.Store(addr, v.x | (v.y << 8) | (v.z << 16) | (v.w << 24));
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  const int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  const int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+
+  int4 output[4] = {
+      {SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  int4 l;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * cb_kernels[filter_v][0].x;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * cb_kernels[filter_v][0].x;
+  output[0] += l * cb_kernels[filter_v][0].y;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[2] += l * cb_kernels[filter_v][0].x;
+  output[1] += l * cb_kernels[filter_v][0].y;
+  output[0] += l * cb_kernels[filter_v][0].z;
+
+  //
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].x;
+  output[2] += l * cb_kernels[filter_v][0].y;
+  output[1] += l * cb_kernels[filter_v][0].z;
+  output[0] += l * cb_kernels[filter_v][0].w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].y;
+  output[2] += l * cb_kernels[filter_v][0].z;
+  output[1] += l * cb_kernels[filter_v][0].w;
+  output[0] += l * cb_kernels[filter_v][1].x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].z;
+  output[2] += l * cb_kernels[filter_v][0].w;
+  output[1] += l * cb_kernels[filter_v][1].x;
+  output[0] += l * cb_kernels[filter_v][1].y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].w;
+  output[2] += l * cb_kernels[filter_v][1].x;
+  output[1] += l * cb_kernels[filter_v][1].y;
+  output[0] += l * cb_kernels[filter_v][1].z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].x;
+  output[2] += l * cb_kernels[filter_v][1].y;
+  output[1] += l * cb_kernels[filter_v][1].z;
+  output[0] += l * cb_kernels[filter_v][1].w;
+  //
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].y;
+  output[2] += l * cb_kernels[filter_v][1].z;
+  output[1] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].z;
+  output[2] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].w;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  for (int i = 0; i < 4; ++i) {
+    output[i].x = clamp((int)(((output[i].x + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+    output[i].y = clamp((int)(((output[i].y + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+    output[i].z = clamp((int)(((output[i].z + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+    output[i].w = clamp((int)(((output[i].w + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+  }
+
+  int4 m = cb_obmc_mask[(block.y >> 17) + (subblock >> w_log)];
+
+  store_vblend(dst_frame, output_offset + 0 * output_stride, output[0], m.x);
+  store_vblend(dst_frame, output_offset + 1 * output_stride, output[1], m.y);
+  store_vblend(dst_frame, output_offset + 2 * output_stride, output[2], m.z);
+  store_vblend(dst_frame, output_offset + 3 * output_stride, output[3], m.w);
+}

diff --git a/libav1/dx/shaders/inter_obmc_above_hbd.hlsl b/libav1/dx/shaders/inter_obmc_above_hbd.hlsl
new file mode 100644
index 0000000..c62241d
--- /dev/null
+++ b/libav1/dx/shaders/inter_obmc_above_hbd.hlsl

@@ -0,0 +1,138 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define PixelMax 1023
+
+void store_vblend(RWByteAddressBuffer buf, uint addr, int4 v, int mask) {
+  uint2 src0 = buf.Load2(addr);
+  v.x = (((src0.x >> 0) & 1023) * mask + v.x * (64 - mask) + 32) >> 6;
+  v.y = (((src0.x >> 16) & 1023) * mask + v.y * (64 - mask) + 32) >> 6;
+  v.z = (((src0.y >> 0) & 1023) * mask + v.z * (64 - mask) + 32) >> 6;
+  v.w = (((src0.y >> 16) & 1023) * mask + v.w * (64 - mask) + 32) >> 6;
+  buf.Store2(addr, uint2(v.x | (v.y << 16), v.z | (v.w << 16)));
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  const int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  const int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+
+  int4 output[4] = {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  int4 l;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * cb_kernels[filter_v][0].x;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * cb_kernels[filter_v][0].x;
+  output[0] += l * cb_kernels[filter_v][0].y;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[2] += l * cb_kernels[filter_v][0].x;
+  output[1] += l * cb_kernels[filter_v][0].y;
+  output[0] += l * cb_kernels[filter_v][0].z;
+
+  //
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].x;
+  output[2] += l * cb_kernels[filter_v][0].y;
+  output[1] += l * cb_kernels[filter_v][0].z;
+  output[0] += l * cb_kernels[filter_v][0].w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].y;
+  output[2] += l * cb_kernels[filter_v][0].z;
+  output[1] += l * cb_kernels[filter_v][0].w;
+  output[0] += l * cb_kernels[filter_v][1].x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].z;
+  output[2] += l * cb_kernels[filter_v][0].w;
+  output[1] += l * cb_kernels[filter_v][1].x;
+  output[0] += l * cb_kernels[filter_v][1].y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].w;
+  output[2] += l * cb_kernels[filter_v][1].x;
+  output[1] += l * cb_kernels[filter_v][1].y;
+  output[0] += l * cb_kernels[filter_v][1].z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].x;
+  output[2] += l * cb_kernels[filter_v][1].y;
+  output[1] += l * cb_kernels[filter_v][1].z;
+  output[0] += l * cb_kernels[filter_v][1].w;
+  //
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].y;
+  output[2] += l * cb_kernels[filter_v][1].z;
+  output[1] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].z;
+  output[2] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].w;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + (x << 1) + y * output_stride;
+
+  for (int i = 0; i < 4; ++i) {
+    output[i].x = clamp((int)(((output[i].x + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+    output[i].y = clamp((int)(((output[i].y + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+    output[i].z = clamp((int)(((output[i].z + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+    output[i].w = clamp((int)(((output[i].w + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+  }
+
+  int4 m = cb_obmc_mask[(block.y >> 17) + (subblock >> w_log)];
+
+  store_vblend(dst_frame, output_offset + 0 * output_stride, output[0], m.x);
+  store_vblend(dst_frame, output_offset + 1 * output_stride, output[1], m.y);
+  store_vblend(dst_frame, output_offset + 2 * output_stride, output[2], m.z);
+  store_vblend(dst_frame, output_offset + 3 * output_stride, output[3], m.w);
+}

diff --git a/libav1/dx/shaders/inter_obmc_left.hlsl b/libav1/dx/shaders/inter_obmc_left.hlsl
new file mode 100644
index 0000000..b19cd01
--- /dev/null
+++ b/libav1/dx/shaders/inter_obmc_left.hlsl

@@ -0,0 +1,139 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OutputRoundAdd (1 << (OutputShift - 1))
+#define OffsetBits 19
+#define SumAdd (1 << OffsetBits)
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define SUM1 1 << OffsetBits
+
+void store_hblend(RWByteAddressBuffer buf, uint addr, int4 v, int4 mask) {
+  uint src0 = buf.Load(addr);
+  v.x = (((src0 >> 0) & 255) * mask.x + v.x * (64 - mask.x) + 32) >> 6;
+  v.y = (((src0 >> 8) & 255) * mask.y + v.y * (64 - mask.y) + 32) >> 6;
+  v.z = (((src0 >> 16) & 255) * mask.z + v.z * (64 - mask.z) + 32) >> 6;
+  v.w = (((src0 >> 24) & 255) * mask.w + v.w * (64 - mask.w) + 32) >> 6;
+  buf.Store(addr, v.x | (v.y << 8) | (v.z << 16) | (v.w << 24));
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x);
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  const int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  const int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+
+  int4 output[4] = {
+      {SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}, {SUM1, SUM1, SUM1, SUM1}};
+
+  int4 l;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * cb_kernels[filter_v][0].x;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * cb_kernels[filter_v][0].x;
+  output[0] += l * cb_kernels[filter_v][0].y;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[2] += l * cb_kernels[filter_v][0].x;
+  output[1] += l * cb_kernels[filter_v][0].y;
+  output[0] += l * cb_kernels[filter_v][0].z;
+
+  //
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].x;
+  output[2] += l * cb_kernels[filter_v][0].y;
+  output[1] += l * cb_kernels[filter_v][0].z;
+  output[0] += l * cb_kernels[filter_v][0].w;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].y;
+  output[2] += l * cb_kernels[filter_v][0].z;
+  output[1] += l * cb_kernels[filter_v][0].w;
+  output[0] += l * cb_kernels[filter_v][1].x;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].z;
+  output[2] += l * cb_kernels[filter_v][0].w;
+  output[1] += l * cb_kernels[filter_v][1].x;
+  output[0] += l * cb_kernels[filter_v][1].y;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].w;
+  output[2] += l * cb_kernels[filter_v][1].x;
+  output[1] += l * cb_kernels[filter_v][1].y;
+  output[0] += l * cb_kernels[filter_v][1].z;
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].x;
+  output[2] += l * cb_kernels[filter_v][1].y;
+  output[1] += l * cb_kernels[filter_v][1].z;
+  output[0] += l * cb_kernels[filter_v][1].w;
+  //
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].y;
+  output[2] += l * cb_kernels[filter_v][1].z;
+  output[1] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].z;
+  output[2] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].w;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  for (int i = 0; i < 4; ++i) {
+    output[i].x = clamp((int)(((output[i].x + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+    output[i].y = clamp((int)(((output[i].y + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+    output[i].z = clamp((int)(((output[i].z + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+    output[i].w = clamp((int)(((output[i].w + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+  }
+
+  int4 m = cb_obmc_mask[(block.y >> 17) + (subblock & ((1 << w_log) - 1))];
+  store_hblend(dst_frame, output_offset + 0 * output_stride, output[0], m);
+  store_hblend(dst_frame, output_offset + 1 * output_stride, output[1], m);
+  store_hblend(dst_frame, output_offset + 2 * output_stride, output[2], m);
+  store_hblend(dst_frame, output_offset + 3 * output_stride, output[3], m);
+}

diff --git a/libav1/dx/shaders/inter_obmc_left_hbd.hlsl b/libav1/dx/shaders/inter_obmc_left_hbd.hlsl
new file mode 100644
index 0000000..af8725d
--- /dev/null
+++ b/libav1/dx/shaders/inter_obmc_left_hbd.hlsl

@@ -0,0 +1,137 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define PixelMax 1023
+
+void store_hblend(RWByteAddressBuffer buf, uint addr, int4 v, int4 mask) {
+  uint2 src0 = buf.Load2(addr);
+  v.x = (((src0.x >> 0) & 1023) * mask.x + v.x * (64 - mask.x) + 32) >> 6;
+  v.y = (((src0.x >> 16) & 1023) * mask.y + v.y * (64 - mask.y) + 32) >> 6;
+  v.z = (((src0.y >> 0) & 1023) * mask.z + v.z * (64 - mask.z) + 32) >> 6;
+  v.w = (((src0.y >> 16) & 1023) * mask.w + v.w * (64 - mask.w) + 32) >> 6;
+  buf.Store2(addr, uint2(v.x | (v.y << 16), v.z | (v.w << 16)));
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> (w_log + h_log))) * 16);
+
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+
+  int mv = block.z;
+  int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+  int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+  mvx = clamp(mvx, -11, dims.x) << 1;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+  const int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+
+  const int refplane = ((block.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+
+  int4 output[4] = {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}};
+
+  int4 l;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 0, 0, dims.y), kernel_h0, kernel_h1);
+  output[0] += l * cb_kernels[filter_v][0].x;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 1, 0, dims.y), kernel_h0, kernel_h1);
+  output[1] += l * cb_kernels[filter_v][0].x;
+  output[0] += l * cb_kernels[filter_v][0].y;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 2, 0, dims.y), kernel_h0, kernel_h1);
+  output[2] += l * cb_kernels[filter_v][0].x;
+  output[1] += l * cb_kernels[filter_v][0].y;
+  output[0] += l * cb_kernels[filter_v][0].z;
+
+  //
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 3, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].x;
+  output[2] += l * cb_kernels[filter_v][0].y;
+  output[1] += l * cb_kernels[filter_v][0].z;
+  output[0] += l * cb_kernels[filter_v][0].w;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 4, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].y;
+  output[2] += l * cb_kernels[filter_v][0].z;
+  output[1] += l * cb_kernels[filter_v][0].w;
+  output[0] += l * cb_kernels[filter_v][1].x;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 5, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].z;
+  output[2] += l * cb_kernels[filter_v][0].w;
+  output[1] += l * cb_kernels[filter_v][1].x;
+  output[0] += l * cb_kernels[filter_v][1].y;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 6, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][0].w;
+  output[2] += l * cb_kernels[filter_v][1].x;
+  output[1] += l * cb_kernels[filter_v][1].y;
+  output[0] += l * cb_kernels[filter_v][1].z;
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 7, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].x;
+  output[2] += l * cb_kernels[filter_v][1].y;
+  output[1] += l * cb_kernels[filter_v][1].z;
+  output[0] += l * cb_kernels[filter_v][1].w;
+  //
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 8, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].y;
+  output[2] += l * cb_kernels[filter_v][1].z;
+  output[1] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 9, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].z;
+  output[2] += l * cb_kernels[filter_v][1].w;
+
+  l = filter_line_hbd(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + 10, 0, dims.y), kernel_h0, kernel_h1);
+  output[3] += l * cb_kernels[filter_v][1].w;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + (x << 1) + y * output_stride;
+
+  for (int i = 0; i < 4; ++i) {
+    output[i].x = clamp((int)(((output[i].x + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+    output[i].y = clamp((int)(((output[i].y + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+    output[i].z = clamp((int)(((output[i].z + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+    output[i].w = clamp((int)(((output[i].w + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+  }
+
+  int4 m = cb_obmc_mask[(block.y >> 17) + (subblock & ((1 << w_log) - 1))];
+  store_hblend(dst_frame, output_offset + 0 * output_stride, output[0], m);
+  store_hblend(dst_frame, output_offset + 1 * output_stride, output[1], m);
+  store_hblend(dst_frame, output_offset + 2 * output_stride, output[2], m);
+  store_hblend(dst_frame, output_offset + 3 * output_stride, output[3], m);
+}

diff --git a/libav1/dx/shaders/inter_scale.hlsl b/libav1/dx/shaders/inter_scale.hlsl
new file mode 100644
index 0000000..cb2722e
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale.hlsl

@@ -0,0 +1,125 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OffsetBits 19
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define LocalStride 20
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index << 4);
+
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int ref_frm = (block.y >> 2) & 7;
+  const int refplane = ref_frm * 3 + plane;
+  const int ref_offset = cb_refplanes[refplane].y;
+  const int ref_stride = cb_refplanes[refplane].x;
+  const int ref_w = cb_refplanes[refplane].z;
+  const int ref_h = cb_refplanes[refplane].w;
+
+  int4 scale = cb_scale[ref_frm + 1];
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+
+  mbx += dx;
+  mby += dy + wi;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  const int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  const int4 kernel_v0 = cb_kernels[filter_v][0];
+  const int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = clamp((int)(((sum + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+  }
+
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + (mbx << 1) + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, 255);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, 255);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, 255);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, 255);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store(output_addr, output[0] | (output[1] << 8) | (output[2] << 16) | (output[3] << 24));
+}

diff --git a/libav1/dx/shaders/inter_scale_2x2chroma.hlsl b/libav1/dx/shaders/inter_scale_2x2chroma.hlsl
new file mode 100644
index 0000000..98cdaf2
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_2x2chroma.hlsl

@@ -0,0 +1,133 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 2
+#define SubblockH 2
+#define OutputShift 11
+#define OffsetBits 19
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define DualWriteBlock (1 << 25)
+#define LocalStride 16
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> 1)) << 4);
+  const int wi = thread.x & 1;
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int ref_frm = (block.y >> 2) & 7;
+  const int refplane = ref_frm * 3 + plane;
+  const int ref_offset = cb_refplanes[refplane].y;
+  const int ref_stride = cb_refplanes[refplane].x;
+  const int ref_w = cb_refplanes[refplane].z;
+  const int ref_h = cb_refplanes[refplane].w;
+
+  int4 scale = cb_scale[ref_frm + 1];
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  const int dx = mbx & (block.y >> (26 - 1)) & 6;
+  const int dy = mby & (block.y >> (28 - 1)) & 6;
+  int mv = block.z;
+  int mvx = scale_value(((mbx - dx) << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value(((mby - dy) << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+
+  mby += wi;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  const int lines = 8 + (((SubblockH - 1) * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  const int4 kernel_v0 = cb_kernels[filter_v][0];
+  const int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 62) * LocalStride;
+  int output[2];
+  for (i = 0; i < 2; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = clamp((int)(((sum + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+  }
+
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + (mbx << 1) + mby * cb_planes[plane].z;
+    int r = residuals.Load(res_addr);
+    output[0] = clamp(output[0] + ((r << 16) >> 16), 0, 255);
+    output[1] = clamp(output[1] + (r >> 16), 0, 255);
+  }
+
+  uint output4 = (output[0] | (output[1] << 8)) << ((mbx & 2) * 8);
+  const int output_addr = cb_planes[plane].y + (mbx & ~3) + mby * cb_planes[plane].x;
+
+  if (block.y & DualWriteBlock) {
+    GroupMemoryBarrier();
+    intermediate_buffer[thread.x & 63] = output4;
+    GroupMemoryBarrier();
+    if ((thread.x & 2) == 0) {
+      output4 |= intermediate_buffer[(thread.x & 63) + 2];
+      dst_frame.Store(output_addr, output4);
+    }
+  } else {
+    const uint mask = 0xffff0000 >> ((mbx & 2) * 8);
+    output4 |= dst_frame.Load(output_addr) & mask;
+    dst_frame.Store(output_addr, output4);
+  }
+}

diff --git a/libav1/dx/shaders/inter_scale_2x2chroma_hbd.hlsl b/libav1/dx/shaders/inter_scale_2x2chroma_hbd.hlsl
new file mode 100644
index 0000000..2d22031
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_2x2chroma_hbd.hlsl

@@ -0,0 +1,125 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 2
+#define SubblockH 2
+#define OutputShift 11
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define PixelMax 1023
+#define LocalStride 16
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> 1)) << 4);
+  const int wi = thread.x & 1;
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int ref_frm = (block.y >> 2) & 7;
+  const int refplane = ref_frm * 3 + plane;
+  const int ref_offset = cb_refplanes[refplane].y;
+  const int ref_stride = cb_refplanes[refplane].x;
+  const int ref_w = cb_refplanes[refplane].z;
+  const int ref_h = cb_refplanes[refplane].w;
+
+  int4 scale = cb_scale[ref_frm + 1];
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  const int dx = mbx & (block.y >> (26 - 1)) & 6;
+  const int dy = mby & (block.y >> (28 - 1)) & 6;
+
+  int mv = block.z;
+  int mvx = scale_value(((mbx - dx) << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value(((mby - dy) << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (wi + dx) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+
+  mby += wi;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  const int lines = 8 + (((SubblockH - 1) * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  const int4 kernel_v0 = cb_kernels[filter_v][0];
+  const int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 62) * LocalStride;
+  int output[2];
+  for (i = 0; i < 2; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = clamp((int)(((sum + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+  }
+
+  mbx <<= 1;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + mbx + mby * cb_planes[plane].z;
+    int r = residuals.Load(res_addr);
+    output[0] = clamp(output[0] + ((r << 16) >> 16), 0, PixelMax);
+    output[1] = clamp(output[1] + (r >> 16), 0, PixelMax);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store(output_addr, output[0] | (output[1] << 16));
+}

diff --git a/libav1/dx/shaders/inter_scale_compound.hlsl b/libav1/dx/shaders/inter_scale_compound.hlsl
new file mode 100644
index 0000000..08c82a9
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound.hlsl

@@ -0,0 +1,206 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 7
+#define OffsetBits 19
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define LocalStride 20
+
+int blend(int src0, int src1, int coef0, int coef1) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * coef0 + src1 * coef1) >> DistBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int coef0 = (block.y >> 17) & 15;
+  int coef1 = (block.y >> 21) & 15;
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = blend(output[i], sum, coef0, coef1);
+  }
+
+  mbx += dx;
+  mby += dy + wi;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + (mbx << 1) + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, 255);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, 255);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, 255);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, 255);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store(output_addr, output[0] | (output[1] << 8) | (output[2] << 16) | (output[3] << 24));
+}

diff --git a/libav1/dx/shaders/inter_scale_compound_2x2.hlsl b/libav1/dx/shaders/inter_scale_compound_2x2.hlsl
new file mode 100644
index 0000000..405fd84
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound_2x2.hlsl

@@ -0,0 +1,213 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 2
+#define SubblockH 2
+#define OutputShift 7
+#define OffsetBits 19
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define DualWriteBlock (1 << 25)
+#define LocalStride 16
+
+int blend(int src0, int src1, int coef0, int coef1) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * coef0 + src1 * coef1) >> DistBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int wi = thread.x & 1;
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> 1)) * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  const int dx = mbx & (block.y >> (26 - 1)) & 6;
+  const int dy = mby & (block.y >> (28 - 1)) & 6;
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value(((mbx - dx) << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value(((mby - dy) << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (wi + dx) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + (((SubblockH - 1) * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 62) * LocalStride;
+  int output[2];
+  for (i = 0; i < 2; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value(((mbx - dx) << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value(((mby - dy) << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (wi + dx) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + (((SubblockH - 1) * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 62) * LocalStride;
+  int coef0 = (block.y >> 17) & 15;
+  int coef1 = (block.y >> 21) & 15;
+  for (i = 0; i < 2; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = blend(output[i], sum, coef0, coef1);
+  }
+
+  mby += wi;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + (mbx << 1) + mby * cb_planes[plane].z;
+    int r = residuals.Load(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, 255);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, 255);
+  }
+
+  uint output4 = (output[0] | (output[1] << 8)) << ((mbx & 2) * 8);
+  const int output_addr = cb_planes[plane].y + (mbx & ~3) + mby * cb_planes[plane].x;
+
+  if (block.y & DualWriteBlock) {
+    GroupMemoryBarrier();
+    intermediate_buffer[thread.x & 63] = output4;
+    GroupMemoryBarrier();
+    if ((thread.x & 2) == 0) {
+      output4 |= intermediate_buffer[(thread.x & 63) + 2];
+      dst_frame.Store(output_addr, output4);
+    }
+  } else {
+    const uint mask = 0xffff0000 >> ((mbx & 2) * 8);
+    output4 |= dst_frame.Load(output_addr) & mask;
+    dst_frame.Store(output_addr, output4);
+  }
+}

diff --git a/libav1/dx/shaders/inter_scale_compound_2x2_hbd.hlsl b/libav1/dx/shaders/inter_scale_compound_2x2_hbd.hlsl
new file mode 100644
index 0000000..0f1de9e
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound_2x2_hbd.hlsl

@@ -0,0 +1,208 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 2
+#define SubblockH 2
+#define OutputShift 7
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define PixelMax 1023
+#define LocalStride 16
+
+int blend(int src0, int src1, int coef0, int coef1) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * coef0 + src1 * coef1) >> DistBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int wi = thread.x & 1;
+  uint4 block = pred_blocks.Load4((cb_pass_offset + (thread.x >> 1)) * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  const int dx = mbx & (block.y >> (26 - 1)) & 6;
+  const int dy = mby & (block.y >> (28 - 1)) & 6;
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value(((mbx - dx) << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value(((mby - dy) << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (wi + dx) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + (((SubblockH - 1) * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 62) * LocalStride;
+  int output[2];
+  for (i = 0; i < 2; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value(((mbx - dx) << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value(((mby - dy) << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (wi + dx) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + (((SubblockH - 1) * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 62) * LocalStride;
+  int coef0 = (block.y >> 17) & 15;
+  int coef1 = (block.y >> 21) & 15;
+  for (i = 0; i < 2; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = blend(output[i], sum, coef0, coef1);
+  }
+
+  mbx <<= 1;
+  mby += wi;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + mbx + mby * cb_planes[plane].z;
+    int r = residuals.Load(res_addr);
+    output[0] = clamp(output[0] + ((r << 16) >> 16), 0, PixelMax);
+    output[1] = clamp(output[1] + (r >> 16), 0, PixelMax);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store(output_addr, output[0] | (output[1] << 16));
+}

diff --git a/libav1/dx/shaders/inter_scale_compound_diff_chroma.hlsl b/libav1/dx/shaders/inter_scale_compound_diff_chroma.hlsl
new file mode 100644
index 0000000..10d8d37
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound_diff_chroma.hlsl

@@ -0,0 +1,208 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 7
+#define OffsetBits 19
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+#define LocalStride 20
+
+int blend(int src0, int src1, int m) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * m + src1 * (DiffWTDMax - m)) >> DiffWTDBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+
+  mbx += dx;
+  mby += dy + wi;
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  uint mask = dst_frame.Load(output_addr);
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = blend(output[i], sum, (mask >> (i * 8)) & 255);
+  }
+
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + (mbx << 1) + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, 255);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, 255);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, 255);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, 255);
+  }
+
+  dst_frame.Store(output_addr, output[0] | (output[1] << 8) | (output[2] << 16) | (output[3] << 24));
+}

diff --git a/libav1/dx/shaders/inter_scale_compound_diff_chroma_hbd.hlsl b/libav1/dx/shaders/inter_scale_compound_diff_chroma_hbd.hlsl
new file mode 100644
index 0000000..9f4953e
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound_diff_chroma_hbd.hlsl

@@ -0,0 +1,217 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 7
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define PixelMax 1023
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+#define LocalStride 20
+
+int blend(int src0, int src1, int m) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * m + src1 * (DiffWTDMax - m)) >> DiffWTDBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+
+  mbx = (mbx + dx) << 1;
+  mby += dy + wi;
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  uint mask = dst_frame.Load(output_addr);
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = blend(output[i], sum, (mask >> (i * 8)) & 255);
+  }
+
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + mbx + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, PixelMax);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, PixelMax);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, PixelMax);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, PixelMax);
+  }
+
+  dst_frame.Store2(output_addr, uint2(output[0] | (output[1] << 16), output[2] | (output[3] << 16)));
+}

diff --git a/libav1/dx/shaders/inter_scale_compound_diff_luma.hlsl b/libav1/dx/shaders/inter_scale_compound_diff_luma.hlsl
new file mode 100644
index 0000000..dba61fb
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound_diff_luma.hlsl

@@ -0,0 +1,243 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 7
+#define OffsetBits 19
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define DiffWTDBase 38
+#define DiffWTDRoundAdd (1 << 3)
+#define DiffWTDRoundShft 8
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+#define LocalStride 20
+
+int compute_mask(int src0, int src1, int inv) {
+  int m = clamp(DiffWTDBase + ((abs(src0 - src1) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+  return inv ? DiffWTDMax - m : m;
+}
+
+int blend(int src0, int src1, int m) {
+  int result = (src0 * m + src1 * (DiffWTDMax - m)) >> DiffWTDBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  const int inv = (block.y >> 17) & 1;
+
+  int mask[4];
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+
+    int src0 = (output[i] + OutputRoundAdd) >> OutputShift;
+    int src1 = (sum + OutputRoundAdd) >> OutputShift;
+    int m = compute_mask(src0, src1, inv);
+    output[i] = blend(src0, src1, m);
+    mask[i] = m;
+  }
+  int m0 = mask[0] + mask[1];
+  int m1 = mask[2] + mask[3];
+
+  mbx += dx;
+  mby += dy + wi;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + (mbx << 1) + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, 255);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, 255);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, 255);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, 255);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store(output_addr, output[0] | (output[1] << 8) | (output[2] << 16) | (output[3] << 24));
+
+  GroupMemoryBarrier();
+
+  local_base = (thread.x & 63) << 1;
+  intermediate_buffer[local_base + 0] = m0;
+  intermediate_buffer[local_base + 1] = m1;
+
+  GroupMemoryBarrier();
+
+  // even lines of every even 4x4 subblock
+  if ((thread.x & 5) == 0) {
+    m0 = (m0 + intermediate_buffer[local_base + 2] + 2) >> 2;
+    m1 = (m1 + intermediate_buffer[local_base + 3] + 2) >> 2;
+    // mask from the next 4x4 block:
+    int m2 = (intermediate_buffer[local_base + 8] + intermediate_buffer[local_base + 10] + 2) >> 2;
+    int m3 = (intermediate_buffer[local_base + 9] + intermediate_buffer[local_base + 11] + 2) >> 2;
+    int chroma_offset = (mbx + mby * cb_planes[1].x) >> 1;
+    uint chroma_mask = m0 | (m1 << 8) | (m2 << 16) | (m3 << 24);
+    dst_frame.Store(cb_planes[1].y + chroma_offset, chroma_mask);
+    dst_frame.Store(cb_planes[2].y + chroma_offset, chroma_mask);
+  }
+}

diff --git a/libav1/dx/shaders/inter_scale_compound_diff_luma_hbd.hlsl b/libav1/dx/shaders/inter_scale_compound_diff_luma_hbd.hlsl
new file mode 100644
index 0000000..fcd4855
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound_diff_luma_hbd.hlsl

@@ -0,0 +1,252 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 7
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define PixelMax 1023
+#define DiffWTDBase 38
+#define DiffWTDRoundAdd (1 << 5)
+#define DiffWTDRoundShft (6 + 4)
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+#define LocalStride 20
+
+int compute_mask(int src0, int src1, int inv) {
+  int m = clamp(DiffWTDBase + ((abs(src0 - src1) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+  return inv ? DiffWTDMax - m : m;
+}
+
+int blend(int src0, int src1, int m) {
+  int result = (src0 * m + src1 * (DiffWTDMax - m)) >> DiffWTDBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  const int inv = (block.y >> 17) & 1;
+
+  int mask[4];
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+
+    int src0 = (output[i] + OutputRoundAdd) >> OutputShift;
+    int src1 = (sum + OutputRoundAdd) >> OutputShift;
+    int m = compute_mask(src0, src1, inv);
+    output[i] = blend(src0, src1, m);
+    mask[i] = m;
+  }
+  int m0 = mask[0] + mask[1];
+  int m1 = mask[2] + mask[3];
+
+  mbx = (mbx + dx) << 1;
+  mby += dy + wi;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + mbx + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, PixelMax);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, PixelMax);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, PixelMax);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, PixelMax);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store2(output_addr, uint2(output[0] | (output[1] << 16), output[2] | (output[3] << 16)));
+
+  GroupMemoryBarrier();
+
+  local_base = (thread.x & 63) << 1;
+  intermediate_buffer[local_base + 0] = m0;
+  intermediate_buffer[local_base + 1] = m1;
+
+  GroupMemoryBarrier();
+
+  // even lines of every even 4x4 subblock
+  if ((thread.x & 5) == 0) {
+    m0 = (m0 + intermediate_buffer[local_base + 2] + 2) >> 2;
+    m1 = (m1 + intermediate_buffer[local_base + 3] + 2) >> 2;
+    // mask from the next 4x4 block:
+    int m2 = (intermediate_buffer[local_base + 8] + intermediate_buffer[local_base + 10] + 2) >> 2;
+    int m3 = (intermediate_buffer[local_base + 9] + intermediate_buffer[local_base + 11] + 2) >> 2;
+    int chroma_offset = (mbx + mby * cb_planes[1].x) >> 1;
+    uint chroma_mask = m0 | (m1 << 8) | (m2 << 16) | (m3 << 24);
+    dst_frame.Store(cb_planes[1].y + chroma_offset, chroma_mask);
+    dst_frame.Store(cb_planes[2].y + chroma_offset, chroma_mask);
+  }
+}

diff --git a/libav1/dx/shaders/inter_scale_compound_hbd.hlsl b/libav1/dx/shaders/inter_scale_compound_hbd.hlsl
new file mode 100644
index 0000000..f495765
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound_hbd.hlsl

@@ -0,0 +1,215 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 7
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define PixelMax 1023
+#define LocalStride 20
+
+int blend(int src0, int src1, int coef0, int coef1) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * coef0 + src1 * coef1) >> DistBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int coef0 = (block.y >> 17) & 15;
+  int coef1 = (block.y >> 21) & 15;
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = blend(output[i], sum, coef0, coef1);
+  }
+
+  mbx = (mbx + dx) << 1;
+  mby += dy + wi;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + mbx + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, PixelMax);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, PixelMax);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, PixelMax);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, PixelMax);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store2(output_addr, uint2(output[0] | (output[1] << 16), output[2] | (output[3] << 16)));
+}

diff --git a/libav1/dx/shaders/inter_scale_compound_masked.hlsl b/libav1/dx/shaders/inter_scale_compound_masked.hlsl
new file mode 100644
index 0000000..e04c3d7
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound_masked.hlsl

@@ -0,0 +1,210 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 7
+#define OffsetBits 19
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define MaskBits 6
+#define MaskMax 64
+#define LocalStride 20
+
+int blend(int src0, int src1, int mask) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * mask + src1 * (MaskMax - mask)) >> MaskBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+
+  int wedge_stride = SubblockW << w_log;
+  int wedge_addr = (((block.y >> 17) & 0x1fff) << 6) + (dy + wi) * wedge_stride + dx;
+  uint wedge = comp_mask.Load(wedge_addr);
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = blend(output[i], sum, (wedge >> (i * 8)) & 255);
+  }
+
+  mbx += dx;
+  mby += dy + wi;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + (mbx << 1) + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, 255);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, 255);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, 255);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, 255);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store(output_addr, output[0] | (output[1] << 8) | (output[2] << 16) | (output[3] << 24));
+}

diff --git a/libav1/dx/shaders/inter_scale_compound_masked_hbd.hlsl b/libav1/dx/shaders/inter_scale_compound_masked_hbd.hlsl
new file mode 100644
index 0000000..f67ebb8
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_compound_masked_hbd.hlsl

@@ -0,0 +1,218 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 7
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define RoundFinal 4
+#define DistBits 4
+#define MaskBits 6
+#define MaskMax 64
+#define PixelMax 1023
+#define LocalStride 20
+
+int blend(int src0, int src1, int mask) {
+  src0 = (src0 + OutputRoundAdd) >> OutputShift;
+  src1 = (src1 + OutputRoundAdd) >> OutputShift;
+  int result = (src0 * mask + src1 * (MaskMax - mask)) >> MaskBits;
+  result = (result - OutputSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  // block.x - pos xy
+  // block.y - flags:
+  //        2    plane
+  //        3    ref
+  //        4    filter_x
+  //        4    filter_y
+  //        1    skip
+  //        3    ref1
+  //
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+
+  int ref_frm = (block.y >> 2) & 7;
+  int refplane = ref_frm * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 scale = cb_scale[ref_frm + 1];
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  int4 kernel_v0 = cb_kernels[filter_v][0];
+  int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = sum;
+  }
+
+  GroupMemoryBarrier();
+
+  ref_frm = (block.y >> 14) & 7;
+  refplane = ref_frm * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  scale = cb_scale[ref_frm + 1];
+  mv = block.w;
+  mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+  kernel_h0 = cb_kernels[filter_h][0];
+  kernel_h1 = cb_kernels[filter_h][1];
+  local_base = (thread.x & 63) * LocalStride;
+
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  kernel_v0 = cb_kernels[filter_v][0];
+  kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int wedge_stride = SubblockW << w_log;
+  int wedge_addr = (((block.y >> 17) & 0x1fff) << 6) + (dy + wi) * wedge_stride + dx;
+  uint wedge = comp_mask.Load(wedge_addr);
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = blend(output[i], sum, (wedge >> (i * 8)) & 255);
+  }
+
+  mbx = (mbx + dx) << 1;
+  mby += dy + wi;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + mbx + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, PixelMax);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, PixelMax);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, PixelMax);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, PixelMax);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store2(output_addr, uint2(output[0] | (output[1] << 16), output[2] | (output[3] << 16)));
+}

diff --git a/libav1/dx/shaders/inter_scale_hbd.hlsl b/libav1/dx/shaders/inter_scale_hbd.hlsl
new file mode 100644
index 0000000..eb4df18
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_hbd.hlsl

@@ -0,0 +1,130 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define PixelMax 1023
+#define LocalStride 20
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index << 4);
+
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int ref_frm = (block.y >> 2) & 7;
+  const int refplane = ref_frm * 3 + plane;
+  const int ref_offset = cb_refplanes[refplane].y;
+  const int ref_stride = cb_refplanes[refplane].x;
+  const int ref_w = cb_refplanes[refplane].z;
+  const int ref_h = cb_refplanes[refplane].w;
+
+  int4 scale = cb_scale[ref_frm + 1];
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  mbx += dx;
+  mby += dy + wi;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  const int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  const int4 kernel_v0 = cb_kernels[filter_v][0];
+  const int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = clamp((int)(((sum + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+  }
+
+  mbx <<= 1;
+  if (noskip) {
+    const int res_addr = cb_planes[plane].w + mbx + mby * cb_planes[plane].z;
+    int2 r = (int2)residuals.Load2(res_addr);
+    output[0] = clamp(output[0] + ((r.x << 16) >> 16), 0, PixelMax);
+    output[1] = clamp(output[1] + (r.x >> 16), 0, PixelMax);
+    output[2] = clamp(output[2] + ((r.y << 16) >> 16), 0, PixelMax);
+    output[3] = clamp(output[3] + (r.y >> 16), 0, PixelMax);
+  }
+
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  dst_frame.Store2(output_addr, uint2(output[0] | (output[1] << 16), output[2] | (output[3] << 16)));
+}

diff --git a/libav1/dx/shaders/inter_scale_obmc_above.hlsl b/libav1/dx/shaders/inter_scale_obmc_above.hlsl
new file mode 100644
index 0000000..9f86545
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_obmc_above.hlsl

@@ -0,0 +1,123 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OffsetBits 19
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define LocalStride 20
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index << 4);
+
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  const int plane = block.y & 3;
+  const int ref_frm = (block.y >> 2) & 7;
+  const int refplane = ref_frm * 3 + plane;
+  const int ref_offset = cb_refplanes[refplane].y;
+  const int ref_stride = cb_refplanes[refplane].x;
+  const int ref_w = cb_refplanes[refplane].z;
+  const int ref_h = cb_refplanes[refplane].w;
+
+  int4 scale = cb_scale[ref_frm + 1];
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+
+  mbx += dx;
+  mby += dy + wi;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  const int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  const int4 kernel_v0 = cb_kernels[filter_v][0];
+  const int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = clamp((int)(((sum + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+  }
+
+  int4 m = cb_obmc_mask[(block.y >> 17) + (subblock >> w_log)];
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+
+  int mask = ((m.x | (m.y << 8) | (m.z << 16) | (m.w << 24)) >> (wi * 8)) & 255;
+  uint src0 = dst_frame.Load(output_addr);
+  output[0] = (((src0 >> 0) & 255) * mask + output[0] * (64 - mask) + 32) >> 6;
+  output[1] = (((src0 >> 8) & 255) * mask + output[1] * (64 - mask) + 32) >> 6;
+  output[2] = (((src0 >> 16) & 255) * mask + output[2] * (64 - mask) + 32) >> 6;
+  output[3] = (((src0 >> 24) & 255) * mask + output[3] * (64 - mask) + 32) >> 6;
+  dst_frame.Store(output_addr, output[0] | (output[1] << 8) | (output[2] << 16) | (output[3] << 24));
+}

diff --git a/libav1/dx/shaders/inter_scale_obmc_above_hbd.hlsl b/libav1/dx/shaders/inter_scale_obmc_above_hbd.hlsl
new file mode 100644
index 0000000..85fa6dc
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_obmc_above_hbd.hlsl

@@ -0,0 +1,129 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define PixelMax 1023
+#define LocalStride 20
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index << 4);
+
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  const int plane = block.y & 3;
+  const int ref_frm = (block.y >> 2) & 7;
+  const int refplane = ref_frm * 3 + plane;
+  const int ref_offset = cb_refplanes[refplane].y;
+  const int ref_stride = cb_refplanes[refplane].x;
+  const int ref_w = cb_refplanes[refplane].z;
+  const int ref_h = cb_refplanes[refplane].w;
+
+  int4 scale = cb_scale[ref_frm + 1];
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  mbx += dx;
+  mby += dy + wi;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  const int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  const int4 kernel_v0 = cb_kernels[filter_v][0];
+  const int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = clamp((int)(((sum + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+  }
+
+  mbx <<= 1;
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+
+  int4 m = cb_obmc_mask[(block.y >> 17) + (subblock >> w_log)];
+
+  int mask = ((m.x | (m.y << 8) | (m.z << 16) | (m.w << 24)) >> (wi * 8)) & 255;
+  uint2 src = dst_frame.Load2(output_addr);
+  output[0] = (((src.x >> 0) & PixelMax) * mask + output[0] * (64 - mask) + 32) >> 6;
+  output[1] = (((src.x >> 16) & PixelMax) * mask + output[1] * (64 - mask) + 32) >> 6;
+  output[2] = (((src.y >> 0) & PixelMax) * mask + output[2] * (64 - mask) + 32) >> 6;
+  output[3] = (((src.y >> 16) & PixelMax) * mask + output[3] * (64 - mask) + 32) >> 6;
+  dst_frame.Store2(output_addr, uint2(output[0] | (output[1] << 16), output[2] | (output[3] << 16)));
+}

diff --git a/libav1/dx/shaders/inter_scale_obmc_left.hlsl b/libav1/dx/shaders/inter_scale_obmc_left.hlsl
new file mode 100644
index 0000000..1911208
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_obmc_left.hlsl

@@ -0,0 +1,121 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OffsetBits 19
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define LocalStride 20
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index << 4);
+
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  const int plane = block.y & 3;
+  const int ref_frm = (block.y >> 2) & 7;
+  const int refplane = ref_frm * 3 + plane;
+  const int ref_offset = cb_refplanes[refplane].y;
+  const int ref_stride = cb_refplanes[refplane].x;
+  const int ref_w = cb_refplanes[refplane].z;
+  const int ref_h = cb_refplanes[refplane].w;
+
+  int4 scale = cb_scale[ref_frm + 1];
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+
+  mbx += dx;
+  mby += dy + wi;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  const int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    uint3 l = dst_frame.Load3(ref_addr & (~3));
+    const uint shift = (ref_addr & 3) * 8;
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+    sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+    sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+    sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+    sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+    sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+    sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+    sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  const int4 kernel_v0 = cb_kernels[filter_v][0];
+  const int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = clamp((int)(((sum + OutputRoundAdd) >> OutputShift) - OutputSub), 0, 255);
+  }
+
+  int4 mask = cb_obmc_mask[(block.y >> 17) + (subblock & ((1 << w_log) - 1))];
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  uint src0 = dst_frame.Load(output_addr);
+  output[0] = (((src0 >> 0) & 255) * mask.x + output[0] * (64 - mask.x) + 32) >> 6;
+  output[1] = (((src0 >> 8) & 255) * mask.y + output[1] * (64 - mask.y) + 32) >> 6;
+  output[2] = (((src0 >> 16) & 255) * mask.z + output[2] * (64 - mask.z) + 32) >> 6;
+  output[3] = (((src0 >> 24) & 255) * mask.w + output[3] * (64 - mask.w) + 32) >> 6;
+  dst_frame.Store(output_addr, output[0] | (output[1] << 8) | (output[2] << 16) | (output[3] << 24));
+}

diff --git a/libav1/dx/shaders/inter_scale_obmc_left_hbd.hlsl b/libav1/dx/shaders/inter_scale_obmc_left_hbd.hlsl
new file mode 100644
index 0000000..2d76f4d
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_obmc_left_hbd.hlsl

@@ -0,0 +1,126 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define OutputShift 11
+#define OffsetBits 21
+#define OutputRoundAdd ((1 << (OutputShift - 1)) + (1 << OffsetBits))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+#define PixelMax 1023
+#define LocalStride 20
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index << 4);
+
+  const int wi = thread.x & 3;
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  const int plane = block.y & 3;
+  const int ref_frm = (block.y >> 2) & 7;
+  const int refplane = ref_frm * 3 + plane;
+  const int ref_offset = cb_refplanes[refplane].y;
+  const int ref_stride = cb_refplanes[refplane].x;
+  const int ref_w = cb_refplanes[refplane].z;
+  const int ref_h = cb_refplanes[refplane].w;
+
+  int4 scale = cb_scale[ref_frm + 1];
+  int mbx = SubblockW * (block.x & 0xffff);
+  int mby = SubblockH * (block.x >> 16);
+  int mv = block.z;
+  int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+  int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+  mvx += (dx + wi) * scale.y;
+  mvy += dy * scale.w;
+  int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+  int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+  mvx &= SCALE_SUBPEL_MASK;
+  mvy &= SCALE_SUBPEL_MASK;
+  mbx += dx;
+  mby += dy + wi;
+
+  const int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+  const int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+
+  int4 kernel_h0 = cb_kernels[filter_h][0];
+  int4 kernel_h1 = cb_kernels[filter_h][1];
+  int local_base = (thread.x & 63) * LocalStride;
+  int i;
+  for (i = 0; i < lines; ++i) {
+    int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+    const uint shift = (ref_addr & 2) * 8;
+    ref_addr &= ~3;
+    uint4 l = dst_frame.Load4(ref_addr);
+    uint l5 = dst_frame.Load(ref_addr + 16);
+    l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+    l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+    l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+    l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+    int sum = 0;
+    sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+    sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+    sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+    sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+    sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+    sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+    sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+    sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+    intermediate_buffer[local_base + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+  }
+
+  GroupMemoryBarrier();
+
+  mvy += wi * scale.w;
+  const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+  const int4 kernel_v0 = cb_kernels[filter_v][0];
+  const int4 kernel_v1 = cb_kernels[filter_v][1];
+  local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+  int output[4];  /// int4???
+  for (i = 0; i < 4; ++i) {
+    int sum = 0;
+    int loc_addr = local_base + i * LocalStride;
+    sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+    sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+    sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+    sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+    sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+    sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+    sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+    sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+    output[i] = clamp((int)(((sum + OutputRoundAdd) >> OutputShift) - OutputSub), 0, PixelMax);
+  }
+
+  mbx <<= 1;
+  int4 mask = cb_obmc_mask[(block.y >> 17) + (subblock & ((1 << w_log) - 1))];
+  const int output_addr = cb_planes[plane].y + mbx + mby * cb_planes[plane].x;
+  uint2 src = dst_frame.Load2(output_addr);
+  output[0] = (((src.x >> 0) & PixelMax) * mask.x + output[0] * (64 - mask.x) + 32) >> 6;
+  output[1] = (((src.x >> 16) & PixelMax) * mask.y + output[1] * (64 - mask.y) + 32) >> 6;
+  output[2] = (((src.y >> 0) & PixelMax) * mask.z + output[2] * (64 - mask.z) + 32) >> 6;
+  output[3] = (((src.y >> 16) & PixelMax) * mask.w + output[3] * (64 - mask.w) + 32) >> 6;
+  dst_frame.Store2(output_addr, uint2(output[0] | (output[1] << 16), output[2] | (output[3] << 16)));
+}

diff --git a/libav1/dx/shaders/inter_scale_warp_compound.hlsl b/libav1/dx/shaders/inter_scale_warp_compound.hlsl
new file mode 100644
index 0000000..53df9b9
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_warp_compound.hlsl

@@ -0,0 +1,400 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define VertBits 7
+#define VertSumAdd ((1 << 19) + (1 << (VertBits - 1)))
+#define VertSub ((1 << (19 - VertBits - 1)) + (1 << (19 - VertBits)))
+#define RoundFinal 4
+#define DiffWTDBase 38
+#define DiffWTDRoundAdd (1 << 3)
+#define DiffWTDRoundShft 8
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+#define LocalStride 20
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+int blend(int src0, int src1, int coef) {
+  int result = (src0 * coef + src1 * (64 - coef)) >> 6;
+  result = (result - VertSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  int wi = thread.x & 3;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  const int mbx = SubblockW * (block.x & 0xffff);
+  const int mby = SubblockH * (block.x >> 16);
+  int x = mbx + dx;
+  int y = mby + dy;
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int subsampling = plane > 0;
+  const int local_ofs = (thread.x & 63) * LocalStride;
+
+  // REF0
+  int ref = (block.y >> 2) & 7;
+  int refplane = ref * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 info0 = cb_gm_warp[ref].info0;
+  int output0[4];
+  int i;
+  if (info0.x) {
+    int4 info1 = cb_gm_warp[ref].info1;
+    int4 info2 = cb_gm_warp[ref].info2;
+    const int src_x = ((x & (~7)) + 4) << subsampling;
+    const int src_y = ((y & (~7)) + 4) << subsampling;
+    const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+    const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+    const int x4 = dst_x >> subsampling;
+    const int y4 = dst_y >> subsampling;
+
+    int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, ref_w);
+    int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+    int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+    int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+    sx4 += info2.x * (-4) + info2.y * (-4);
+    sy4 += info2.w * (-4) + info2.z * (-4);
+
+    sx4 &= ~((1 << WarpReduceBits) - 1);
+    sy4 &= ~((1 << WarpReduceBits) - 1);
+
+    sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+    sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+    ref_offset += ix4 + wi;
+    for (i = 0; i < 11; ++i) {
+      const int offset = ref_offset + ref_stride * clamp(iy4 + i, 0, ref_h);
+      uint3 l = dst_frame.Load3(offset & (~3));
+      uint shift = (offset & 3) * 8;
+      l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+      l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+      int sum = WarpHorizSumAdd;
+      const int sx = sx4 + info2.y * i + wi * info2.x;
+      int filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+      uint4 f0 = warp_filter.Load4(filter_offset);
+      uint4 f1 = warp_filter.Load4(filter_offset + 16);
+      sum += f0.x * (int)((l.x >> 0) & 0xff);
+      sum += f0.y * (int)((l.x >> 8) & 0xff);
+      sum += f0.z * (int)((l.x >> 16) & 0xff);
+      sum += f0.w * (int)((l.x >> 24) & 0xff);
+      sum += f1.x * (int)((l.y >> 0) & 0xff);
+      sum += f1.y * (int)((l.y >> 8) & 0xff);
+      sum += f1.z * (int)((l.y >> 16) & 0xff);
+      sum += f1.w * (int)((l.y >> 24) & 0xff);
+      intermediate_buffer[local_ofs + i] = sum >> WarpHorizBits;
+    }
+    GroupMemoryBarrier();
+
+    int sy = sy4 + wi * info2.z;
+    for (i = 0; i < 4; ++i) {
+      int filter_addr =
+          WarpFilterSize * (((sy + i * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+      int4 filter0 = warp_filter.Load4(filter_addr);
+      int4 filter1 = warp_filter.Load4(filter_addr + 16);
+      int local_grp = ((thread.x & 60) + i) * LocalStride + wi;
+      output0[i] = intermediate_buffer[local_grp + 0] * filter0.x + intermediate_buffer[local_grp + 1] * filter0.y +
+                   intermediate_buffer[local_grp + 2] * filter0.z + intermediate_buffer[local_grp + 3] * filter0.w +
+                   intermediate_buffer[local_grp + 4] * filter1.x + intermediate_buffer[local_grp + 5] * filter1.y +
+                   intermediate_buffer[local_grp + 6] * filter1.z + intermediate_buffer[local_grp + 7] * filter1.w;
+    }
+  } else {
+    int4 scale = cb_scale[ref + 1];
+    int mv = block.z;
+    int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+    int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+    mvx += (dx + wi) * scale.y;
+    mvy += dy * scale.w;
+    int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+    int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+    mvx &= SCALE_SUBPEL_MASK;
+    mvy &= SCALE_SUBPEL_MASK;
+    int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+    int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+    int4 kernel_h0 = cb_kernels[filter_h][0];
+    int4 kernel_h1 = cb_kernels[filter_h][1];
+
+    for (i = 0; i < lines; ++i) {
+      int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+      uint3 l = dst_frame.Load3(ref_addr & (~3));
+      const uint shift = (ref_addr & 3) * 8;
+      l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+      l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+      int sum = 0;
+      sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+      sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+      sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+      sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+      sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+      sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+      sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+      sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+      intermediate_buffer[local_ofs + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+    }
+
+    GroupMemoryBarrier();
+
+    mvy += wi * scale.w;
+    const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+    const int4 kernel_v0 = cb_kernels[filter_v][0];
+    const int4 kernel_v1 = cb_kernels[filter_v][1];
+    const int local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+    for (i = 0; i < 4; ++i) {
+      int sum = 0;
+      int loc_addr = local_base + i * LocalStride;
+      sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+      sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+      sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+      sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+      sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+      sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+      sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+      sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+      output0[i] = sum;
+    }
+  }
+  GroupMemoryBarrier();
+
+  // REF1
+  ref = (block.y >> 14) & 7;
+  refplane = ref * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  info0 = cb_gm_warp[ref].info0;
+
+  int output1[4];
+  if (info0.x) {
+    int4 info1 = cb_gm_warp[ref].info1;
+    int4 info2 = cb_gm_warp[ref].info2;
+    const int src_x = ((x & (~7)) + 4) << subsampling;
+    const int src_y = ((y & (~7)) + 4) << subsampling;
+    const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+    const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+    const int x4 = dst_x >> subsampling;
+    const int y4 = dst_y >> subsampling;
+
+    int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, ref_w);
+    int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+    int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+    int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+    sx4 += info2.x * (-4) + info2.y * (-4);
+    sy4 += info2.w * (-4) + info2.z * (-4);
+
+    sx4 &= ~((1 << WarpReduceBits) - 1);
+    sy4 &= ~((1 << WarpReduceBits) - 1);
+
+    sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+    sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+    ref_offset += ix4 + wi;
+    for (i = 0; i < 11; ++i) {
+      const int offset = ref_offset + ref_stride * clamp(iy4 + i, 0, ref_h);
+      uint3 l = dst_frame.Load3(offset & (~3));
+      uint shift = (offset & 3) * 8;
+      l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+      l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+      int sum = WarpHorizSumAdd;
+      const int sx = sx4 + info2.y * i + wi * info2.x;
+      int filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+      uint4 f0 = warp_filter.Load4(filter_offset);
+      uint4 f1 = warp_filter.Load4(filter_offset + 16);
+      sum += f0.x * (int)((l.x >> 0) & 0xff);
+      sum += f0.y * (int)((l.x >> 8) & 0xff);
+      sum += f0.z * (int)((l.x >> 16) & 0xff);
+      sum += f0.w * (int)((l.x >> 24) & 0xff);
+      sum += f1.x * (int)((l.y >> 0) & 0xff);
+      sum += f1.y * (int)((l.y >> 8) & 0xff);
+      sum += f1.z * (int)((l.y >> 16) & 0xff);
+      sum += f1.w * (int)((l.y >> 24) & 0xff);
+      intermediate_buffer[local_ofs + i] = sum >> WarpHorizBits;
+    }
+    GroupMemoryBarrier();
+
+    int sy = sy4 + wi * info2.z;
+    for (i = 0; i < 4; ++i) {
+      int filter_addr =
+          WarpFilterSize * (((sy + i * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+      int4 filter0 = warp_filter.Load4(filter_addr);
+      int4 filter1 = warp_filter.Load4(filter_addr + 16);
+      int local_grp = ((thread.x & 60) + i) * LocalStride + wi;
+      output1[i] = intermediate_buffer[local_grp + 0] * filter0.x + intermediate_buffer[local_grp + 1] * filter0.y +
+                   intermediate_buffer[local_grp + 2] * filter0.z + intermediate_buffer[local_grp + 3] * filter0.w +
+                   intermediate_buffer[local_grp + 4] * filter1.x + intermediate_buffer[local_grp + 5] * filter1.y +
+                   intermediate_buffer[local_grp + 6] * filter1.z + intermediate_buffer[local_grp + 7] * filter1.w;
+    }
+  } else {
+    int4 scale = cb_scale[ref + 1];
+    int mv = block.w;
+    int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+    int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+    mvx += (dx + wi) * scale.y;
+    mvy += dy * scale.w;
+    int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w);
+    int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+    mvx &= SCALE_SUBPEL_MASK;
+    mvy &= SCALE_SUBPEL_MASK;
+    int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+    int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+    int4 kernel_h0 = cb_kernels[filter_h][0];
+    int4 kernel_h1 = cb_kernels[filter_h][1];
+
+    for (i = 0; i < lines; ++i) {
+      int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+      uint3 l = dst_frame.Load3(ref_addr & (~3));
+      const uint shift = (ref_addr & 3) * 8;
+      l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+      l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+      int sum = 0;
+      sum += kernel_h0.x * (int)((l.x >> 0) & 0xff);
+      sum += kernel_h0.y * (int)((l.x >> 8) & 0xff);
+      sum += kernel_h0.z * (int)((l.x >> 16) & 0xff);
+      sum += kernel_h0.w * (int)((l.x >> 24) & 0xff);
+      sum += kernel_h1.x * (int)((l.y >> 0) & 0xff);
+      sum += kernel_h1.y * (int)((l.y >> 8) & 0xff);
+      sum += kernel_h1.z * (int)((l.y >> 16) & 0xff);
+      sum += kernel_h1.w * (int)((l.y >> 24) & 0xff);
+      intermediate_buffer[local_ofs + i] = (sum + FilterLineAdd8bit) >> FilterLineShift;
+    }
+
+    GroupMemoryBarrier();
+
+    mvy += wi * scale.w;
+    const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+    const int4 kernel_v0 = cb_kernels[filter_v][0];
+    const int4 kernel_v1 = cb_kernels[filter_v][1];
+    const int local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+    for (i = 0; i < 4; ++i) {
+      int sum = 0;
+      int loc_addr = local_base + i * LocalStride;
+      sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+      sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+      sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+      sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+      sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+      sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+      sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+      sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+      output1[i] = sum;
+    }
+  }
+
+  output0[0] = (output0[0] + VertSumAdd) >> VertBits;
+  output0[1] = (output0[1] + VertSumAdd) >> VertBits;
+  output0[2] = (output0[2] + VertSumAdd) >> VertBits;
+  output0[3] = (output0[3] + VertSumAdd) >> VertBits;
+  output1[0] = (output1[0] + VertSumAdd) >> VertBits;
+  output1[1] = (output1[1] + VertSumAdd) >> VertBits;
+  output1[2] = (output1[2] + VertSumAdd) >> VertBits;
+  output1[3] = (output1[3] + VertSumAdd) >> VertBits;
+
+  y += wi;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int compound_type = block.y >> 30;
+  int4 coefs;
+  if (compound_type == 0) {
+    coefs.x = ((block.y >> 17) & 15) << 2;
+    coefs.yzw = coefs.xxx;
+  } else if (compound_type == 1) {
+    int wedge_stride = SubblockW << w_log;
+    int wedge_addr = (((block.y >> 17) & 0x1fff) << 6) + SubblockW * (subblock & ((1 << w_log) - 1)) +
+                     (SubblockH * (subblock >> w_log) + wi) * wedge_stride;
+    uint wedge = comp_mask.Load(wedge_addr);
+    coefs.x = (wedge >> 0) & 255;
+    coefs.y = (wedge >> 8) & 255;
+    coefs.z = (wedge >> 16) & 255;
+    coefs.w = (wedge >> 24) & 255;
+  } else if (compound_type == 3) {
+    uint m = dst_frame.Load(output_offset);
+    coefs.x = (m >> 0) & 255;
+    coefs.y = (m >> 8) & 255;
+    coefs.z = (m >> 16) & 255;
+    coefs.w = (m >> 24) & 255;
+  } else {
+    coefs.x =
+        clamp(DiffWTDBase + ((abs(output0[0] - output1[0]) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.y =
+        clamp(DiffWTDBase + ((abs(output0[1] - output1[1]) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.z =
+        clamp(DiffWTDBase + ((abs(output0[2] - output1[2]) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.w =
+        clamp(DiffWTDBase + ((abs(output0[3] - output1[3]) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    if ((block.y >> 17) & 1) coefs = int4(64, 64, 64, 64) - coefs;
+  }
+
+  output0[0] = blend(output0[0], output1[0], coefs.x);
+  output0[1] = blend(output0[1], output1[1], coefs.y);
+  output0[2] = blend(output0[2], output1[2], coefs.z);
+  output0[3] = blend(output0[3], output1[3], coefs.w);
+
+  if (noskip) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_offset = cb_planes[plane].w;
+    int2 r = (int2)residuals.Load2(res_offset + (x << 1) + y * res_stride);
+    output0[0] = clamp(output0[0] + ((r.x << 16) >> 16), 0, 255);
+    output0[1] = clamp(output0[1] + (r.x >> 16), 0, 255);
+    output0[2] = clamp(output0[2] + ((r.y << 16) >> 16), 0, 255);
+    output0[3] = clamp(output0[3] + (r.y >> 16), 0, 255);
+  }
+  dst_frame.Store(output_offset, output0[0] | (output0[1] << 8) | (output0[2] << 16) | (output0[3] << 24));
+
+  if (compound_type == 2) {
+    wi = (thread.x & 63) << 1;
+    coefs.x = coefs.x + coefs.y;
+    coefs.y = coefs.z + coefs.w;
+    intermediate_buffer[wi] = coefs.x;
+    intermediate_buffer[wi + 1] = coefs.y;
+
+    if ((wi & 10) == 0)  // filter odd cols and lines
+    {
+      // next row
+      coefs.x = (coefs.x + intermediate_buffer[wi + 2] + 2) >> 2;
+      coefs.y = (coefs.y + intermediate_buffer[wi + 3] + 2) >> 2;
+      // next 4x4
+      coefs.z = (intermediate_buffer[wi + 8] + intermediate_buffer[wi + 10] + 2) >> 2;
+      coefs.w = (intermediate_buffer[wi + 9] + intermediate_buffer[wi + 11] + 2) >> 2;
+
+      int chroma_offset = (x + y * cb_planes[1].x) >> 1;
+      uint mask = coefs.x | (coefs.y << 8) | (coefs.z << 16) | (coefs.w << 24);
+      dst_frame.Store(cb_planes[1].y + chroma_offset, mask);
+      dst_frame.Store(cb_planes[2].y + chroma_offset, mask);
+    }
+  }
+}

diff --git a/libav1/dx/shaders/inter_scale_warp_compound_hbd.hlsl b/libav1/dx/shaders/inter_scale_warp_compound_hbd.hlsl
new file mode 100644
index 0000000..87891e6
--- /dev/null
+++ b/libav1/dx/shaders/inter_scale_warp_compound_hbd.hlsl

@@ -0,0 +1,425 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define VertBits 7
+#define VertSumAdd ((1 << 21) + (1 << (VertBits - 1)))
+#define VertSub ((1 << (21 - VertBits - 1)) + (1 << (21 - VertBits)))
+#define RoundFinal 4
+#define DiffWTDBase 38
+#define DiffWTDRoundAdd (1 << 5)
+#define DiffWTDRoundShft 10
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+#define PixelMax 1023
+#define LocalStride 20
+
+groupshared int intermediate_buffer[64 * LocalStride];
+
+int blend(int src0, int src1, int coef) {
+  int result = (src0 * coef + src1 * (64 - coef)) >> 6;
+  result = (result - VertSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  int wi = thread.x & 3;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+
+  const int dx = SubblockW * (subblock & ((1 << w_log) - 1));
+  const int dy = SubblockH * (subblock >> w_log);
+  const int mbx = SubblockW * (block.x & 0xffff);
+  const int mby = SubblockH * (block.x >> 16);
+  int x = mbx + dx;
+  int y = mby + dy;
+
+  const int plane = block.y & 3;
+  const int noskip = block.y & NoSkipFlag;
+  const int subsampling = plane > 0;
+  const int local_ofs = (thread.x & 63) * LocalStride;
+
+  // REF0
+  int ref = (block.y >> 2) & 7;
+  int refplane = ref * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int ref_w = cb_refplanes[refplane].z;
+  int ref_h = cb_refplanes[refplane].w;
+  int4 info0 = cb_gm_warp[ref].info0;
+  int output0[4];
+  int i;
+  if (info0.x) {
+    int4 info1 = cb_gm_warp[ref].info1;
+    int4 info2 = cb_gm_warp[ref].info2;
+    const int src_x = ((x & (~7)) + 4) << subsampling;
+    const int src_y = ((y & (~7)) + 4) << subsampling;
+    const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+    const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+    const int x4 = dst_x >> subsampling;
+    const int y4 = dst_y >> subsampling;
+
+    int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, ref_w);
+    int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+    int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+    int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+    sx4 += info2.x * (-4) + info2.y * (-4);
+    sy4 += info2.w * (-4) + info2.z * (-4);
+
+    sx4 &= ~((1 << WarpReduceBits) - 1);
+    sy4 &= ~((1 << WarpReduceBits) - 1);
+
+    sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+    sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+    ref_offset += (ix4 + wi) << 1;
+    for (i = 0; i < 11; ++i) {
+      int offset = ref_offset + ref_stride * clamp(iy4 + i, 0, ref_h);
+      uint shift = (offset & 3) * 8;
+      offset &= ~3;
+      uint4 l0 = dst_frame.Load4(offset);
+      uint l4 = dst_frame.Load(offset + 16);
+      l0.x = (l0.x >> shift) | ((l0.y << (24 - shift)) << 8);
+      l0.y = (l0.y >> shift) | ((l0.z << (24 - shift)) << 8);
+      l0.z = (l0.z >> shift) | ((l0.w << (24 - shift)) << 8);
+      l0.w = (l0.w >> shift) | ((l4 << (24 - shift)) << 8);
+
+      int sum = WarpHorizSumAdd10;
+      const int sx = sx4 + info2.y * i + wi * info2.x;
+      int filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+      uint4 f0 = warp_filter.Load4(filter_offset);
+      uint4 f1 = warp_filter.Load4(filter_offset + 16);
+      sum += f0.x * (int)((l0.x >> 0) & PixelMax10);
+      sum += f0.y * (int)((l0.x >> 16) & PixelMax10);
+      sum += f0.z * (int)((l0.y >> 0) & PixelMax10);
+      sum += f0.w * (int)((l0.y >> 16) & PixelMax10);
+      sum += f1.x * (int)((l0.z >> 0) & PixelMax10);
+      sum += f1.y * (int)((l0.z >> 16) & PixelMax10);
+      sum += f1.z * (int)((l0.w >> 0) & PixelMax10);
+      sum += f1.w * (int)((l0.w >> 16) & PixelMax10);
+      intermediate_buffer[local_ofs + i] = sum >> WarpHorizBits;
+    }
+
+    GroupMemoryBarrier();
+
+    int sy = sy4 + wi * info2.z;
+    for (i = 0; i < 4; ++i) {
+      int filter_addr =
+          WarpFilterSize * (((sy + i * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+      int4 filter0 = warp_filter.Load4(filter_addr);
+      int4 filter1 = warp_filter.Load4(filter_addr + 16);
+      int local_grp = ((thread.x & 60) + i) * LocalStride + wi;
+      output0[i] = intermediate_buffer[local_grp + 0] * filter0.x + intermediate_buffer[local_grp + 1] * filter0.y +
+                   intermediate_buffer[local_grp + 2] * filter0.z + intermediate_buffer[local_grp + 3] * filter0.w +
+                   intermediate_buffer[local_grp + 4] * filter1.x + intermediate_buffer[local_grp + 5] * filter1.y +
+                   intermediate_buffer[local_grp + 6] * filter1.z + intermediate_buffer[local_grp + 7] * filter1.w;
+    }
+  } else {
+    int4 scale = cb_scale[ref + 1];
+    int mv = block.z;
+    int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+    int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+    mvx += (dx + wi) * scale.y;
+    mvy += dy * scale.w;
+    int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+    int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+    mvx &= SCALE_SUBPEL_MASK;
+    mvy &= SCALE_SUBPEL_MASK;
+    int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+    int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+    int4 kernel_h0 = cb_kernels[filter_h][0];
+    int4 kernel_h1 = cb_kernels[filter_h][1];
+
+    for (i = 0; i < lines; ++i) {
+      int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+      const uint shift = (ref_addr & 2) * 8;
+      ref_addr &= ~3;
+      uint4 l = dst_frame.Load4(ref_addr);
+      uint l5 = dst_frame.Load(ref_addr + 16);
+      l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+      l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+      l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+      l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+      int sum = 0;
+      sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+      sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+      sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+      sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+      sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+      sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+      sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+      sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+      intermediate_buffer[local_ofs + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+    }
+
+    GroupMemoryBarrier();
+
+    mvy += wi * scale.w;
+    const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+    const int4 kernel_v0 = cb_kernels[filter_v][0];
+    const int4 kernel_v1 = cb_kernels[filter_v][1];
+    const int local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+    for (i = 0; i < 4; ++i) {
+      int sum = 0;
+      int loc_addr = local_base + i * LocalStride;
+      sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+      sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+      sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+      sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+      sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+      sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+      sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+      sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+      output0[i] = sum;
+    }
+  }
+
+  GroupMemoryBarrier();
+
+  // REF1
+  ref = (block.y >> 14) & 7;
+  refplane = ref * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  ref_w = cb_refplanes[refplane].z;
+  ref_h = cb_refplanes[refplane].w;
+  info0 = cb_gm_warp[ref].info0;
+
+  int output1[4];
+  if (info0.x) {
+    int4 info1 = cb_gm_warp[ref].info1;
+    int4 info2 = cb_gm_warp[ref].info2;
+    const int src_x = ((x & (~7)) + 4) << subsampling;
+    const int src_y = ((y & (~7)) + 4) << subsampling;
+    const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+    const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+    const int x4 = dst_x >> subsampling;
+    const int y4 = dst_y >> subsampling;
+
+    int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, ref_w);
+    int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+    int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+    int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+    sx4 += info2.x * (-4) + info2.y * (-4);
+    sy4 += info2.w * (-4) + info2.z * (-4);
+
+    sx4 &= ~((1 << WarpReduceBits) - 1);
+    sy4 &= ~((1 << WarpReduceBits) - 1);
+
+    sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+    sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+    ref_offset += (ix4 + wi) << 1;
+    for (i = 0; i < 11; ++i) {
+      int offset = ref_offset + ref_stride * clamp(iy4 + i, 0, ref_h);
+      uint shift = (offset & 3) * 8;
+      offset &= ~3;
+      uint4 l0 = dst_frame.Load4(offset);
+      uint l4 = dst_frame.Load(offset + 16);
+      l0.x = (l0.x >> shift) | ((l0.y << (24 - shift)) << 8);
+      l0.y = (l0.y >> shift) | ((l0.z << (24 - shift)) << 8);
+      l0.z = (l0.z >> shift) | ((l0.w << (24 - shift)) << 8);
+      l0.w = (l0.w >> shift) | ((l4 << (24 - shift)) << 8);
+
+      int sum = WarpHorizSumAdd10;
+      const int sx = sx4 + info2.y * i + wi * info2.x;
+      int filter_offset = WarpFilterSize * (((sx + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+      uint4 f0 = warp_filter.Load4(filter_offset);
+      uint4 f1 = warp_filter.Load4(filter_offset + 16);
+      sum += f0.x * (int)((l0.x >> 0) & PixelMax10);
+      sum += f0.y * (int)((l0.x >> 16) & PixelMax10);
+      sum += f0.z * (int)((l0.y >> 0) & PixelMax10);
+      sum += f0.w * (int)((l0.y >> 16) & PixelMax10);
+      sum += f1.x * (int)((l0.z >> 0) & PixelMax10);
+      sum += f1.y * (int)((l0.z >> 16) & PixelMax10);
+      sum += f1.z * (int)((l0.w >> 0) & PixelMax10);
+      sum += f1.w * (int)((l0.w >> 16) & PixelMax10);
+      intermediate_buffer[local_ofs + i] = sum >> WarpHorizBits;
+    }
+
+    GroupMemoryBarrier();
+
+    int sy = sy4 + wi * info2.z;
+    for (i = 0; i < 4; ++i) {
+      int filter_addr =
+          WarpFilterSize * (((sy + i * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+      int4 filter0 = warp_filter.Load4(filter_addr);
+      int4 filter1 = warp_filter.Load4(filter_addr + 16);
+      int local_grp = ((thread.x & 60) + i) * LocalStride + wi;
+      output1[i] = intermediate_buffer[local_grp + 0] * filter0.x + intermediate_buffer[local_grp + 1] * filter0.y +
+                   intermediate_buffer[local_grp + 2] * filter0.z + intermediate_buffer[local_grp + 3] * filter0.w +
+                   intermediate_buffer[local_grp + 4] * filter1.x + intermediate_buffer[local_grp + 5] * filter1.y +
+                   intermediate_buffer[local_grp + 6] * filter1.z + intermediate_buffer[local_grp + 7] * filter1.w;
+    }
+  } else {
+    int4 scale = cb_scale[ref + 1];
+    int mv = block.w;
+    int mvx = scale_value((mbx << SUBPEL_BITS) + (mv >> 16), scale.x) + SCALE_EXTRA_OFF;
+    int mvy = scale_value((mby << SUBPEL_BITS) + ((mv << 16) >> 16), scale.z) + SCALE_EXTRA_OFF;
+    mvx += (dx + wi) * scale.y;
+    mvy += dy * scale.w;
+    int x0 = clamp((mvx >> SCALE_SUBPEL_BITS) - 3, -11, ref_w) << 1;
+    int y0 = (mvy >> SCALE_SUBPEL_BITS) - 3;
+    mvx &= SCALE_SUBPEL_MASK;
+    mvy &= SCALE_SUBPEL_MASK;
+    int filter_h = (((block.y >> 5) & 15) << 4) + (mvx >> SCALE_EXTRA_BITS);
+    int lines = 8 + ((3 * scale.w + mvy) >> SCALE_SUBPEL_BITS);
+    int4 kernel_h0 = cb_kernels[filter_h][0];
+    int4 kernel_h1 = cb_kernels[filter_h][1];
+
+    for (i = 0; i < lines; ++i) {
+      int ref_addr = ref_offset + ref_stride * clamp(y0 + i, 0, ref_h) + x0;
+      const uint shift = (ref_addr & 2) * 8;
+      ref_addr &= ~3;
+      uint4 l = dst_frame.Load4(ref_addr);
+      uint l5 = dst_frame.Load(ref_addr + 16);
+      l.x = (l.x >> shift) | ((l.y << (24 - shift)) << 8);
+      l.y = (l.y >> shift) | ((l.z << (24 - shift)) << 8);
+      l.z = (l.z >> shift) | ((l.w << (24 - shift)) << 8);
+      l.w = (l.w >> shift) | ((l5 << (24 - shift)) << 8);
+      int sum = 0;
+      sum += kernel_h0.x * (int)((l.x >> 0) & 0xffff);
+      sum += kernel_h0.y * (int)((l.x >> 16) & 0xffff);
+      sum += kernel_h0.z * (int)((l.y >> 0) & 0xffff);
+      sum += kernel_h0.w * (int)((l.y >> 16) & 0xffff);
+      sum += kernel_h1.x * (int)((l.z >> 0) & 0xffff);
+      sum += kernel_h1.y * (int)((l.z >> 16) & 0xffff);
+      sum += kernel_h1.z * (int)((l.w >> 0) & 0xffff);
+      sum += kernel_h1.w * (int)((l.w >> 16) & 0xffff);
+      intermediate_buffer[local_ofs + i] = (sum + FilterLineAdd10bit) >> FilterLineShift;
+    }
+
+    GroupMemoryBarrier();
+
+    mvy += wi * scale.w;
+    const int filter_v = (((block.y >> 9) & 15) << 4) + ((mvy & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS);
+    const int4 kernel_v0 = cb_kernels[filter_v][0];
+    const int4 kernel_v1 = cb_kernels[filter_v][1];
+    const int local_base = (mvy >> SCALE_SUBPEL_BITS) + (thread.x & 60) * LocalStride;
+    for (i = 0; i < 4; ++i) {
+      int sum = 0;
+      int loc_addr = local_base + i * LocalStride;
+      sum += kernel_v0.x * intermediate_buffer[loc_addr + 0];
+      sum += kernel_v0.y * intermediate_buffer[loc_addr + 1];
+      sum += kernel_v0.z * intermediate_buffer[loc_addr + 2];
+      sum += kernel_v0.w * intermediate_buffer[loc_addr + 3];
+      sum += kernel_v1.x * intermediate_buffer[loc_addr + 4];
+      sum += kernel_v1.y * intermediate_buffer[loc_addr + 5];
+      sum += kernel_v1.z * intermediate_buffer[loc_addr + 6];
+      sum += kernel_v1.w * intermediate_buffer[loc_addr + 7];
+      output1[i] = sum;
+    }
+  }
+
+  output0[0] = (output0[0] + VertSumAdd) >> VertBits;
+  output0[1] = (output0[1] + VertSumAdd) >> VertBits;
+  output0[2] = (output0[2] + VertSumAdd) >> VertBits;
+  output0[3] = (output0[3] + VertSumAdd) >> VertBits;
+  output1[0] = (output1[0] + VertSumAdd) >> VertBits;
+  output1[1] = (output1[1] + VertSumAdd) >> VertBits;
+  output1[2] = (output1[2] + VertSumAdd) >> VertBits;
+  output1[3] = (output1[3] + VertSumAdd) >> VertBits;
+
+  y += wi;
+  x <<= 1;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int compound_type = block.y >> 30;
+  int4 coefs;
+  if (compound_type == 0) {
+    coefs.x = ((block.y >> 17) & 15) << 2;
+    coefs.yzw = coefs.xxx;
+  } else if (compound_type == 1) {
+    int wedge_stride = SubblockW << w_log;
+    int wedge_addr = (((block.y >> 17) & 0x1fff) << 6) + SubblockW * (subblock & ((1 << w_log) - 1)) +
+                     (SubblockH * (subblock >> w_log) + wi) * wedge_stride;
+    uint wedge = comp_mask.Load(wedge_addr);
+    coefs.x = (wedge >> 0) & 255;
+    coefs.y = (wedge >> 8) & 255;
+    coefs.z = (wedge >> 16) & 255;
+    coefs.w = (wedge >> 24) & 255;
+  } else if (compound_type == 3) {
+    uint m = dst_frame.Load(output_offset);
+    coefs.x = (m >> 0) & 255;
+    coefs.y = (m >> 8) & 255;
+    coefs.z = (m >> 16) & 255;
+    coefs.w = (m >> 24) & 255;
+  } else {
+    coefs.x =
+        clamp(DiffWTDBase + ((abs(output0[0] - output1[0]) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.y =
+        clamp(DiffWTDBase + ((abs(output0[1] - output1[1]) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.z =
+        clamp(DiffWTDBase + ((abs(output0[2] - output1[2]) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.w =
+        clamp(DiffWTDBase + ((abs(output0[3] - output1[3]) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    if ((block.y >> 17) & 1) coefs = int4(64, 64, 64, 64) - coefs;
+  }
+
+  output0[0] = blend(output0[0], output1[0], coefs.x);
+  output0[1] = blend(output0[1], output1[1], coefs.y);
+  output0[2] = blend(output0[2], output1[2], coefs.z);
+  output0[3] = blend(output0[3], output1[3], coefs.w);
+
+  if (noskip) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_offset = cb_planes[plane].w;
+    int2 r = (int2)residuals.Load2(res_offset + x + y * res_stride);
+    output0[0] = clamp(output0[0] + ((r.x << 16) >> 16), 0, PixelMax);
+    output0[1] = clamp(output0[1] + (r.x >> 16), 0, PixelMax);
+    output0[2] = clamp(output0[2] + ((r.y << 16) >> 16), 0, PixelMax);
+    output0[3] = clamp(output0[3] + (r.y >> 16), 0, PixelMax);
+  }
+
+  dst_frame.Store2(output_offset, uint2(output0[0] | (output0[1] << 16), output0[2] | (output0[3] << 16)));
+
+  if (compound_type == 2) {
+    wi = (thread.x & 63) << 1;
+    coefs.x = coefs.x + coefs.y;
+    coefs.y = coefs.z + coefs.w;
+    intermediate_buffer[wi] = coefs.x;
+    intermediate_buffer[wi + 1] = coefs.y;
+
+    if ((wi & 10) == 0)  // filter odd cols and lines
+    {
+      // next row
+      coefs.x = (coefs.x + intermediate_buffer[wi + 2] + 2) >> 2;
+      coefs.y = (coefs.y + intermediate_buffer[wi + 3] + 2) >> 2;
+      // next 4x4
+      coefs.z = (intermediate_buffer[wi + 8] + intermediate_buffer[wi + 10] + 2) >> 2;
+      coefs.w = (intermediate_buffer[wi + 9] + intermediate_buffer[wi + 11] + 2) >> 2;
+
+      int chroma_offset = (x + y * cb_planes[1].x) >> 1;
+      uint mask = coefs.x | (coefs.y << 8) | (coefs.z << 16) | (coefs.w << 24);
+      dst_frame.Store(cb_planes[1].y + chroma_offset, mask);
+      dst_frame.Store(cb_planes[2].y + chroma_offset, mask);
+    }
+  }
+}

diff --git a/libav1/dx/shaders/inter_warp.hlsl b/libav1/dx/shaders/inter_warp.hlsl
new file mode 100644
index 0000000..b7c101d
--- /dev/null
+++ b/libav1/dx/shaders/inter_warp.hlsl

@@ -0,0 +1,170 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define VertBits 11
+#define VertSumAdd ((1 << 19) + (1 << (VertBits - 1)))
+#define VertSub ((1 << (8 - 1)) + (1 << 8))
+#define FilterSize 32
+#define BlockSize 48
+
+groupshared int4 mem[11 * 64];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  const int offset = (cb_pass_offset + (thread.x >> (w_log + h_log))) * BlockSize;
+  uint4 info0 = warp_blocks.Load4(offset);
+  int4 info1 = warp_blocks.Load4(offset + 16);
+  int4 info2 = warp_blocks.Load4(offset + 32);
+  // struct
+  //{
+  //    uint pos;
+  //    uint flags;
+  //    int mat[6];
+  //    int alpha;
+  //    int beta;
+  //    int delta;
+  //    int gamma;
+  //};
+
+  int x = SubblockW * ((info0.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((info0.x >> 16) + (subblock >> w_log));
+
+  const int plane = info0.y & 3;
+  const int subsampling = plane > 0;
+  const int2 dims = cb_dims[subsampling].xy;
+  const int noskip = info0.y & NoSkipFlag;
+
+  const int src_x = ((x & (~7)) + 4) << subsampling;
+  const int src_y = ((y & (~7)) + 4) << subsampling;
+  const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+  const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+  const int x4 = dst_x >> subsampling;
+  const int y4 = dst_y >> subsampling;
+
+  int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, dims.x);
+  int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+  int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+  int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+  sx4 += info2.x * (-4) + info2.y * (-4);
+  sy4 += info2.w * (-4) + info2.z * (-4);
+
+  sx4 &= ~((1 << WarpReduceBits) - 1);
+  sy4 &= ~((1 << WarpReduceBits) - 1);
+
+  sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+  sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+  int refplane = ((info0.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y + ix4;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int local_ofs = (thread.x & 63) * 11;
+  mem[local_ofs + 0] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 0, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 0, info2.x);
+  mem[local_ofs + 1] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 1, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 1, info2.x);
+  mem[local_ofs + 2] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 2, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 2, info2.x);
+  mem[local_ofs + 3] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 3, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 3, info2.x);
+  mem[local_ofs + 4] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 4, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 4, info2.x);
+  mem[local_ofs + 5] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 5, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 5, info2.x);
+  mem[local_ofs + 6] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 6, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 6, info2.x);
+  mem[local_ofs + 7] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 7, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 7, info2.x);
+  mem[local_ofs + 8] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 8, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 8, info2.x);
+  mem[local_ofs + 9] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 9, 0, dims.y), warp_filter,
+                                        sx4 + info2.y * 9, info2.x);
+  mem[local_ofs + 10] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + 10, 0, dims.y), warp_filter,
+                                         sx4 + info2.y * 10, info2.x);
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + (x << 1) + y * res_stride;
+  for (int l = 0; l < 4; ++l) {
+    int4 output;
+    int sy = sy4 + l * info2.z;
+
+    int filter_addr;
+    int4 filter0, filter1;
+
+    filter_addr = FilterSize * (((sy + 0 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output.x = mem[local_ofs + l + 0].x * filter0.x + mem[local_ofs + l + 1].x * filter0.y +
+               mem[local_ofs + l + 2].x * filter0.z + mem[local_ofs + l + 3].x * filter0.w +
+               mem[local_ofs + l + 4].x * filter1.x + mem[local_ofs + l + 5].x * filter1.y +
+               mem[local_ofs + l + 6].x * filter1.z + mem[local_ofs + l + 7].x * filter1.w;
+
+    filter_addr = FilterSize * (((sy + 1 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output.y = mem[local_ofs + l + 0].y * filter0.x + mem[local_ofs + l + 1].y * filter0.y +
+               mem[local_ofs + l + 2].y * filter0.z + mem[local_ofs + l + 3].y * filter0.w +
+               mem[local_ofs + l + 4].y * filter1.x + mem[local_ofs + l + 5].y * filter1.y +
+               mem[local_ofs + l + 6].y * filter1.z + mem[local_ofs + l + 7].y * filter1.w;
+
+    filter_addr = FilterSize * (((sy + 2 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output.z = mem[local_ofs + l + 0].z * filter0.x + mem[local_ofs + l + 1].z * filter0.y +
+               mem[local_ofs + l + 2].z * filter0.z + mem[local_ofs + l + 3].z * filter0.w +
+               mem[local_ofs + l + 4].z * filter1.x + mem[local_ofs + l + 5].z * filter1.y +
+               mem[local_ofs + l + 6].z * filter1.z + mem[local_ofs + l + 7].z * filter1.w;
+
+    filter_addr = FilterSize * (((sy + 3 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output.w = mem[local_ofs + l + 0].w * filter0.x + mem[local_ofs + l + 1].w * filter0.y +
+               mem[local_ofs + l + 2].w * filter0.z + mem[local_ofs + l + 3].w * filter0.w +
+               mem[local_ofs + l + 4].w * filter1.x + mem[local_ofs + l + 5].w * filter1.y +
+               mem[local_ofs + l + 6].w * filter1.z + mem[local_ofs + l + 7].w * filter1.w;
+
+    output.x = clamp((int)(((output.x + VertSumAdd) >> VertBits) - VertSub), 0, 255);
+    output.y = clamp((int)(((output.y + VertSumAdd) >> VertBits) - VertSub), 0, 255);
+    output.z = clamp((int)(((output.z + VertSumAdd) >> VertBits) - VertSub), 0, 255);
+    output.w = clamp((int)(((output.w + VertSumAdd) >> VertBits) - VertSub), 0, 255);
+
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + l * res_stride);
+      output.x += (r.x << 16) >> 16;
+      output.y += r.x >> 16;
+      output.z += (r.y << 16) >> 16;
+      output.w += r.y >> 16;
+      output = clamp(output, 0, 255);
+    }
+
+    dst_frame.Store(output_offset + l * output_stride,
+                    output.x | (output.y << 8) | (output.z << 16) | (output.w << 24));
+  }
+}

diff --git a/libav1/dx/shaders/inter_warp_compound.hlsl b/libav1/dx/shaders/inter_warp_compound.hlsl
new file mode 100644
index 0000000..fed905f
--- /dev/null
+++ b/libav1/dx/shaders/inter_warp_compound.hlsl

@@ -0,0 +1,363 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define VertBits 7
+#define VertSumAdd ((1 << 19) + (1 << (VertBits - 1)))
+#define VertSub ((1 << (19 - VertBits - 1)) + (1 << (19 - VertBits)))
+#define RoundFinal 4
+#define DiffWTDBase 38
+#define DiffWTDRoundAdd 8
+#define DiffWTDRoundShft 8
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+
+groupshared int4 mem[11 * 16];
+
+int blend(int src0, int src1, int coef) {
+  int result = (src0 * coef + src1 * (64 - coef)) >> 6;
+  result = (result - VertSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, 255);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  int wi = thread.x & 3;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+  const int noskip = block.y & NoSkipFlag;
+  const int subsampling = plane > 0;
+  const int local_ofs = ((thread.x & 63) >> 2) * 11;
+
+  // REF0
+  int ref = (block.y >> 2) & 7;
+  int refplane = ref * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int4 info0 = cb_gm_warp[ref].info0;
+  int4 output0;
+
+  if (info0.x) {
+    int4 info1 = cb_gm_warp[ref].info1;
+    int4 info2 = cb_gm_warp[ref].info2;
+    const int src_x = ((x & (~7)) + 4) << subsampling;
+    const int src_y = ((y & (~7)) + 4) << subsampling;
+    const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+    const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+    const int x4 = dst_x >> subsampling;
+    const int y4 = dst_y >> subsampling;
+
+    int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, dims.x);
+    int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+    int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+    int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+    sx4 += info2.x * (-4) + info2.y * (-4);
+    sy4 += info2.w * (-4) + info2.z * (-4);
+
+    sx4 &= ~((1 << WarpReduceBits) - 1);
+    sy4 &= ~((1 << WarpReduceBits) - 1);
+
+    sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+    sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+    ref_offset += ix4;
+
+    for (int i = wi; i < 11; i += 4) {
+      mem[local_ofs + i] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + i, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * i, info2.x);
+    }
+
+    int sy = sy4 + wi * info2.z;
+
+    int filter_addr;
+    int4 filter0, filter1;
+
+    filter_addr = WarpFilterSize * (((sy + 0 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output0.x = mem[local_ofs + wi + 0].x * filter0.x + mem[local_ofs + wi + 1].x * filter0.y +
+                mem[local_ofs + wi + 2].x * filter0.z + mem[local_ofs + wi + 3].x * filter0.w +
+                mem[local_ofs + wi + 4].x * filter1.x + mem[local_ofs + wi + 5].x * filter1.y +
+                mem[local_ofs + wi + 6].x * filter1.z + mem[local_ofs + wi + 7].x * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 1 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output0.y = mem[local_ofs + wi + 0].y * filter0.x + mem[local_ofs + wi + 1].y * filter0.y +
+                mem[local_ofs + wi + 2].y * filter0.z + mem[local_ofs + wi + 3].y * filter0.w +
+                mem[local_ofs + wi + 4].y * filter1.x + mem[local_ofs + wi + 5].y * filter1.y +
+                mem[local_ofs + wi + 6].y * filter1.z + mem[local_ofs + wi + 7].y * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 2 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output0.z = mem[local_ofs + wi + 0].z * filter0.x + mem[local_ofs + wi + 1].z * filter0.y +
+                mem[local_ofs + wi + 2].z * filter0.z + mem[local_ofs + wi + 3].z * filter0.w +
+                mem[local_ofs + wi + 4].z * filter1.x + mem[local_ofs + wi + 5].z * filter1.y +
+                mem[local_ofs + wi + 6].z * filter1.z + mem[local_ofs + wi + 7].z * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 3 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output0.w = mem[local_ofs + wi + 0].w * filter0.x + mem[local_ofs + wi + 1].w * filter0.y +
+                mem[local_ofs + wi + 2].w * filter0.z + mem[local_ofs + wi + 3].w * filter0.w +
+                mem[local_ofs + wi + 4].w * filter1.x + mem[local_ofs + wi + 5].w * filter1.y +
+                mem[local_ofs + wi + 6].w * filter1.z + mem[local_ofs + wi + 7].w * filter1.w;
+  } else {
+    int mv = block.z;
+    int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+    int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+    mvx = clamp(mvx, -11, dims.x);
+    int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+    int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+    int4 kernel_h0 = cb_kernels[filter_h][0];
+    int4 kernel_h1 = cb_kernels[filter_h][1];
+
+    for (int i = wi; i < 11; i += 4) {
+      mem[local_ofs + i] =
+          filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + i, 0, dims.y), kernel_h0, kernel_h1);
+    }
+
+    int4 kernel_v0 = cb_kernels[filter_v][0];
+    int4 kernel_v1 = cb_kernels[filter_v][1];
+
+    output0.x = mem[local_ofs + wi + 0].x * kernel_v0.x + mem[local_ofs + wi + 1].x * kernel_v0.y +
+                mem[local_ofs + wi + 2].x * kernel_v0.z + mem[local_ofs + wi + 3].x * kernel_v0.w +
+                mem[local_ofs + wi + 4].x * kernel_v1.x + mem[local_ofs + wi + 5].x * kernel_v1.y +
+                mem[local_ofs + wi + 6].x * kernel_v1.z + mem[local_ofs + wi + 7].x * kernel_v1.w;
+    output0.y = mem[local_ofs + wi + 0].y * kernel_v0.x + mem[local_ofs + wi + 1].y * kernel_v0.y +
+                mem[local_ofs + wi + 2].y * kernel_v0.z + mem[local_ofs + wi + 3].y * kernel_v0.w +
+                mem[local_ofs + wi + 4].y * kernel_v1.x + mem[local_ofs + wi + 5].y * kernel_v1.y +
+                mem[local_ofs + wi + 6].y * kernel_v1.z + mem[local_ofs + wi + 7].y * kernel_v1.w;
+    output0.z = mem[local_ofs + wi + 0].z * kernel_v0.x + mem[local_ofs + wi + 1].z * kernel_v0.y +
+                mem[local_ofs + wi + 2].z * kernel_v0.z + mem[local_ofs + wi + 3].z * kernel_v0.w +
+                mem[local_ofs + wi + 4].z * kernel_v1.x + mem[local_ofs + wi + 5].z * kernel_v1.y +
+                mem[local_ofs + wi + 6].z * kernel_v1.z + mem[local_ofs + wi + 7].z * kernel_v1.w;
+    output0.w = mem[local_ofs + wi + 0].w * kernel_v0.x + mem[local_ofs + wi + 1].w * kernel_v0.y +
+                mem[local_ofs + wi + 2].w * kernel_v0.z + mem[local_ofs + wi + 3].w * kernel_v0.w +
+                mem[local_ofs + wi + 4].w * kernel_v1.x + mem[local_ofs + wi + 5].w * kernel_v1.y +
+                mem[local_ofs + wi + 6].w * kernel_v1.z + mem[local_ofs + wi + 7].w * kernel_v1.w;
+  }
+
+  // REF1
+  ref = (block.y >> 14) & 7;
+  refplane = ref * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  info0 = cb_gm_warp[ref].info0;
+
+  int4 output1;
+  if (info0.x) {
+    int4 info1 = cb_gm_warp[ref].info1;
+    int4 info2 = cb_gm_warp[ref].info2;
+    const int src_x = ((x & (~7)) + 4) << subsampling;
+    const int src_y = ((y & (~7)) + 4) << subsampling;
+    const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+    const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+    const int x4 = dst_x >> subsampling;
+    const int y4 = dst_y >> subsampling;
+
+    int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, dims.x);
+    int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+    int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+    int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+    sx4 += info2.x * (-4) + info2.y * (-4);
+    sy4 += info2.w * (-4) + info2.z * (-4);
+
+    sx4 &= ~((1 << WarpReduceBits) - 1);
+    sy4 &= ~((1 << WarpReduceBits) - 1);
+
+    sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+    sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+    ref_offset += ix4;
+
+    for (int i = wi; i < 11; i += 4) {
+      mem[local_ofs + i] = filter_line_warp(dst_frame, ref_offset + ref_stride * clamp(iy4 + i, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * i, info2.x);
+    }
+
+    int sy = sy4 + wi * info2.z;
+    int filter_addr;
+    int4 filter0, filter1;
+
+    filter_addr = WarpFilterSize * (((sy + 0 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output1.x = mem[local_ofs + wi + 0].x * filter0.x + mem[local_ofs + wi + 1].x * filter0.y +
+                mem[local_ofs + wi + 2].x * filter0.z + mem[local_ofs + wi + 3].x * filter0.w +
+                mem[local_ofs + wi + 4].x * filter1.x + mem[local_ofs + wi + 5].x * filter1.y +
+                mem[local_ofs + wi + 6].x * filter1.z + mem[local_ofs + wi + 7].x * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 1 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output1.y = mem[local_ofs + wi + 0].y * filter0.x + mem[local_ofs + wi + 1].y * filter0.y +
+                mem[local_ofs + wi + 2].y * filter0.z + mem[local_ofs + wi + 3].y * filter0.w +
+                mem[local_ofs + wi + 4].y * filter1.x + mem[local_ofs + wi + 5].y * filter1.y +
+                mem[local_ofs + wi + 6].y * filter1.z + mem[local_ofs + wi + 7].y * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 2 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output1.z = mem[local_ofs + wi + 0].z * filter0.x + mem[local_ofs + wi + 1].z * filter0.y +
+                mem[local_ofs + wi + 2].z * filter0.z + mem[local_ofs + wi + 3].z * filter0.w +
+                mem[local_ofs + wi + 4].z * filter1.x + mem[local_ofs + wi + 5].z * filter1.y +
+                mem[local_ofs + wi + 6].z * filter1.z + mem[local_ofs + wi + 7].z * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 3 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output1.w = mem[local_ofs + wi + 0].w * filter0.x + mem[local_ofs + wi + 1].w * filter0.y +
+                mem[local_ofs + wi + 2].w * filter0.z + mem[local_ofs + wi + 3].w * filter0.w +
+                mem[local_ofs + wi + 4].w * filter1.x + mem[local_ofs + wi + 5].w * filter1.y +
+                mem[local_ofs + wi + 6].w * filter1.z + mem[local_ofs + wi + 7].w * filter1.w;
+  } else {
+    int mv = block.w;
+    int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+    int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+    mvx = clamp(mvx, -11, dims.x);
+
+    int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+    int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+    int4 kernel_h0 = cb_kernels[filter_h][0];
+    int4 kernel_h1 = cb_kernels[filter_h][1];
+
+    for (int i = wi; i < 11; i += 4) {
+      mem[local_ofs + i] =
+          filter_line(dst_frame, ref_offset + mvx + ref_stride * clamp(mvy + i, 0, dims.y), kernel_h0, kernel_h1);
+    }
+
+    int4 kernel_v0 = cb_kernels[filter_v][0];
+    int4 kernel_v1 = cb_kernels[filter_v][1];
+
+    output1.x = mem[local_ofs + wi + 0].x * kernel_v0.x + mem[local_ofs + wi + 1].x * kernel_v0.y +
+                mem[local_ofs + wi + 2].x * kernel_v0.z + mem[local_ofs + wi + 3].x * kernel_v0.w +
+                mem[local_ofs + wi + 4].x * kernel_v1.x + mem[local_ofs + wi + 5].x * kernel_v1.y +
+                mem[local_ofs + wi + 6].x * kernel_v1.z + mem[local_ofs + wi + 7].x * kernel_v1.w;
+    output1.y = mem[local_ofs + wi + 0].y * kernel_v0.x + mem[local_ofs + wi + 1].y * kernel_v0.y +
+                mem[local_ofs + wi + 2].y * kernel_v0.z + mem[local_ofs + wi + 3].y * kernel_v0.w +
+                mem[local_ofs + wi + 4].y * kernel_v1.x + mem[local_ofs + wi + 5].y * kernel_v1.y +
+                mem[local_ofs + wi + 6].y * kernel_v1.z + mem[local_ofs + wi + 7].y * kernel_v1.w;
+    output1.z = mem[local_ofs + wi + 0].z * kernel_v0.x + mem[local_ofs + wi + 1].z * kernel_v0.y +
+                mem[local_ofs + wi + 2].z * kernel_v0.z + mem[local_ofs + wi + 3].z * kernel_v0.w +
+                mem[local_ofs + wi + 4].z * kernel_v1.x + mem[local_ofs + wi + 5].z * kernel_v1.y +
+                mem[local_ofs + wi + 6].z * kernel_v1.z + mem[local_ofs + wi + 7].z * kernel_v1.w;
+    output1.w = mem[local_ofs + wi + 0].w * kernel_v0.x + mem[local_ofs + wi + 1].w * kernel_v0.y +
+                mem[local_ofs + wi + 2].w * kernel_v0.z + mem[local_ofs + wi + 3].w * kernel_v0.w +
+                mem[local_ofs + wi + 4].w * kernel_v1.x + mem[local_ofs + wi + 5].w * kernel_v1.y +
+                mem[local_ofs + wi + 6].w * kernel_v1.z + mem[local_ofs + wi + 7].w * kernel_v1.w;
+  }
+
+  output0.x = (output0.x + VertSumAdd) >> VertBits;
+  output0.y = (output0.y + VertSumAdd) >> VertBits;
+  output0.z = (output0.z + VertSumAdd) >> VertBits;
+  output0.w = (output0.w + VertSumAdd) >> VertBits;
+  output1.x = (output1.x + VertSumAdd) >> VertBits;
+  output1.y = (output1.y + VertSumAdd) >> VertBits;
+  output1.z = (output1.z + VertSumAdd) >> VertBits;
+  output1.w = (output1.w + VertSumAdd) >> VertBits;
+
+  y += wi;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int compound_type = block.y >> 30;
+  int4 coefs;
+  if (compound_type == 0) {
+    coefs.x = ((block.y >> 17) & 15) << 2;
+    coefs.yzw = coefs.xxx;
+  } else if (compound_type == 1) {
+    int wedge_stride = SubblockW << w_log;
+    int wedge_addr = (((block.y >> 17) & 0x1fff) << 6) + SubblockW * (subblock & ((1 << w_log) - 1)) +
+                     (SubblockH * (subblock >> w_log) + wi) * wedge_stride;
+    uint wedge = comp_mask.Load(wedge_addr);
+    coefs.x = (wedge >> 0) & 255;
+    coefs.y = (wedge >> 8) & 255;
+    coefs.z = (wedge >> 16) & 255;
+    coefs.w = (wedge >> 24) & 255;
+  } else if (compound_type == 3) {
+    uint m = dst_frame.Load(output_offset);
+    coefs.x = (m >> 0) & 255;
+    coefs.y = (m >> 8) & 255;
+    coefs.z = (m >> 16) & 255;
+    coefs.w = (m >> 24) & 255;
+  } else {
+    coefs.x = clamp(DiffWTDBase + ((abs(output0.x - output1.x) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.y = clamp(DiffWTDBase + ((abs(output0.y - output1.y) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.z = clamp(DiffWTDBase + ((abs(output0.z - output1.z) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.w = clamp(DiffWTDBase + ((abs(output0.w - output1.w) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    if ((block.y >> 17) & 1) coefs = int4(64, 64, 64, 64) - coefs;
+  }
+
+  output0.x = blend(output0.x, output1.x, coefs.x);
+  output0.y = blend(output0.y, output1.y, coefs.y);
+  output0.z = blend(output0.z, output1.z, coefs.z);
+  output0.w = blend(output0.w, output1.w, coefs.w);
+
+  if (noskip) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_offset = cb_planes[plane].w;
+    int2 r = (int2)residuals.Load2(res_offset + (x << 1) + y * res_stride);
+    output0.x += (r.x << 16) >> 16;
+    output0.y += r.x >> 16;
+    output0.z += (r.y << 16) >> 16;
+    output0.w += r.y >> 16;
+    output0 = clamp(output0, 0, 255);
+  }
+  dst_frame.Store(output_offset, output0.x | (output0.y << 8) | (output0.z << 16) | (output0.w << 24));
+
+  if (compound_type == 2) {
+    wi = thread.x & 63;
+    coefs.x = coefs.x + coefs.y;
+    coefs.y = coefs.z + coefs.w;
+    mem[wi].xy = coefs.xy;
+
+    if ((wi & 5) == 0)  // filter odd cols and lines
+    {
+      coefs.xy += mem[wi + 1].xy;  // next line
+      coefs.zw = mem[wi + 4].xy;   // next 4x4 col
+      coefs.zw += mem[wi + 5].xy;  // next 4x4 col, next line
+      coefs.x = (coefs.x + 2) >> 2;
+      coefs.y = (coefs.y + 2) >> 2;
+      coefs.z = (coefs.z + 2) >> 2;
+      coefs.w = (coefs.w + 2) >> 2;
+
+      int chroma_offset = (x + y * cb_planes[1].x) >> 1;
+      uint mask = coefs.x | (coefs.y << 8) | (coefs.z << 16) | (coefs.w << 24);
+      dst_frame.Store(cb_planes[1].y + chroma_offset, mask);
+      dst_frame.Store(cb_planes[2].y + chroma_offset, mask);
+    }
+  }
+}

diff --git a/libav1/dx/shaders/inter_warp_compound_hbd.hlsl b/libav1/dx/shaders/inter_warp_compound_hbd.hlsl
new file mode 100644
index 0000000..214dd79
--- /dev/null
+++ b/libav1/dx/shaders/inter_warp_compound_hbd.hlsl

@@ -0,0 +1,368 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define VertBits 7
+#define VertSumAdd ((1 << 21) + (1 << (VertBits - 1)))
+#define VertSub ((1 << (21 - VertBits - 1)) + (1 << (21 - VertBits)))
+#define RoundFinal 4
+#define DiffWTDBase 38
+#define DiffWTDRoundAdd 32
+#define DiffWTDRoundShft 10
+#define DiffWTDBits 6
+#define DiffWTDMax 64
+#define PixelMax 1023
+
+groupshared int4 mem[11 * 16];
+
+int blend(int src0, int src1, int coef) {
+  int result = (src0 * coef + src1 * (64 - coef)) >> 6;
+  result = (result - VertSub + (1 << (RoundFinal - 1))) >> RoundFinal;
+  return clamp(result, 0, PixelMax);
+}
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  int wi = thread.x & 3;
+  const int subblock = (thread.x >> 2) & ((1 << (w_log + h_log)) - 1);
+  int block_index = cb_pass_offset + (thread.x >> (w_log + h_log + 2));
+  uint4 block = pred_blocks.Load4(block_index * 16);
+
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int2 dims = cb_dims[plane > 0].xy;
+  const int noskip = block.y & NoSkipFlag;
+  const int subsampling = plane > 0;
+  const int local_ofs = ((thread.x & 63) >> 2) * 11;
+
+  // REF0
+  int ref = (block.y >> 2) & 7;
+  int refplane = ref * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y;
+  int ref_stride = cb_refplanes[refplane].x;
+  int4 info0 = cb_gm_warp[ref].info0;
+  int4 output0;
+
+  if (info0.x) {
+    int4 info1 = cb_gm_warp[ref].info1;
+    int4 info2 = cb_gm_warp[ref].info2;
+    const int src_x = ((x & (~7)) + 4) << subsampling;
+    const int src_y = ((y & (~7)) + 4) << subsampling;
+    const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+    const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+    const int x4 = dst_x >> subsampling;
+    const int y4 = dst_y >> subsampling;
+
+    int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, dims.x);
+    int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+    int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+    int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+    sx4 += info2.x * (-4) + info2.y * (-4);
+    sy4 += info2.w * (-4) + info2.z * (-4);
+
+    sx4 &= ~((1 << WarpReduceBits) - 1);
+    sy4 &= ~((1 << WarpReduceBits) - 1);
+
+    sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+    sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+    ref_offset += ix4 << 1;
+
+    for (int i = wi; i < 11; i += 4) {
+      mem[local_ofs + i] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + i, 0, dims.y),
+                                                warp_filter, sx4 + info2.y * i, info2.x);
+    }
+
+    int sy = sy4 + wi * info2.z;
+
+    int filter_addr;
+    int4 filter0, filter1;
+
+    filter_addr = WarpFilterSize * (((sy + 0 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output0.x = mem[local_ofs + wi + 0].x * filter0.x + mem[local_ofs + wi + 1].x * filter0.y +
+                mem[local_ofs + wi + 2].x * filter0.z + mem[local_ofs + wi + 3].x * filter0.w +
+                mem[local_ofs + wi + 4].x * filter1.x + mem[local_ofs + wi + 5].x * filter1.y +
+                mem[local_ofs + wi + 6].x * filter1.z + mem[local_ofs + wi + 7].x * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 1 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output0.y = mem[local_ofs + wi + 0].y * filter0.x + mem[local_ofs + wi + 1].y * filter0.y +
+                mem[local_ofs + wi + 2].y * filter0.z + mem[local_ofs + wi + 3].y * filter0.w +
+                mem[local_ofs + wi + 4].y * filter1.x + mem[local_ofs + wi + 5].y * filter1.y +
+                mem[local_ofs + wi + 6].y * filter1.z + mem[local_ofs + wi + 7].y * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 2 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output0.z = mem[local_ofs + wi + 0].z * filter0.x + mem[local_ofs + wi + 1].z * filter0.y +
+                mem[local_ofs + wi + 2].z * filter0.z + mem[local_ofs + wi + 3].z * filter0.w +
+                mem[local_ofs + wi + 4].z * filter1.x + mem[local_ofs + wi + 5].z * filter1.y +
+                mem[local_ofs + wi + 6].z * filter1.z + mem[local_ofs + wi + 7].z * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 3 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output0.w = mem[local_ofs + wi + 0].w * filter0.x + mem[local_ofs + wi + 1].w * filter0.y +
+                mem[local_ofs + wi + 2].w * filter0.z + mem[local_ofs + wi + 3].w * filter0.w +
+                mem[local_ofs + wi + 4].w * filter1.x + mem[local_ofs + wi + 5].w * filter1.y +
+                mem[local_ofs + wi + 6].w * filter1.z + mem[local_ofs + wi + 7].w * filter1.w;
+  } else {
+    int mv = block.z;
+    int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+    int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+    mvx = clamp(mvx, -11, dims.x);
+    int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+    int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+    int4 kernel_h0 = cb_kernels[filter_h][0];
+    int4 kernel_h1 = cb_kernels[filter_h][1];
+    ref_offset += mvx << 1;
+
+    for (int i = wi; i < 11; i += 4) {
+      mem[local_ofs + i] =
+          filter_line_hbd(dst_frame, ref_offset + ref_stride * clamp(mvy + i, 0, dims.y), kernel_h0, kernel_h1);
+    }
+
+    int4 kernel_v0 = cb_kernels[filter_v][0];
+    int4 kernel_v1 = cb_kernels[filter_v][1];
+
+    output0.x = mem[local_ofs + wi + 0].x * kernel_v0.x + mem[local_ofs + wi + 1].x * kernel_v0.y +
+                mem[local_ofs + wi + 2].x * kernel_v0.z + mem[local_ofs + wi + 3].x * kernel_v0.w +
+                mem[local_ofs + wi + 4].x * kernel_v1.x + mem[local_ofs + wi + 5].x * kernel_v1.y +
+                mem[local_ofs + wi + 6].x * kernel_v1.z + mem[local_ofs + wi + 7].x * kernel_v1.w;
+    output0.y = mem[local_ofs + wi + 0].y * kernel_v0.x + mem[local_ofs + wi + 1].y * kernel_v0.y +
+                mem[local_ofs + wi + 2].y * kernel_v0.z + mem[local_ofs + wi + 3].y * kernel_v0.w +
+                mem[local_ofs + wi + 4].y * kernel_v1.x + mem[local_ofs + wi + 5].y * kernel_v1.y +
+                mem[local_ofs + wi + 6].y * kernel_v1.z + mem[local_ofs + wi + 7].y * kernel_v1.w;
+    output0.z = mem[local_ofs + wi + 0].z * kernel_v0.x + mem[local_ofs + wi + 1].z * kernel_v0.y +
+                mem[local_ofs + wi + 2].z * kernel_v0.z + mem[local_ofs + wi + 3].z * kernel_v0.w +
+                mem[local_ofs + wi + 4].z * kernel_v1.x + mem[local_ofs + wi + 5].z * kernel_v1.y +
+                mem[local_ofs + wi + 6].z * kernel_v1.z + mem[local_ofs + wi + 7].z * kernel_v1.w;
+    output0.w = mem[local_ofs + wi + 0].w * kernel_v0.x + mem[local_ofs + wi + 1].w * kernel_v0.y +
+                mem[local_ofs + wi + 2].w * kernel_v0.z + mem[local_ofs + wi + 3].w * kernel_v0.w +
+                mem[local_ofs + wi + 4].w * kernel_v1.x + mem[local_ofs + wi + 5].w * kernel_v1.y +
+                mem[local_ofs + wi + 6].w * kernel_v1.z + mem[local_ofs + wi + 7].w * kernel_v1.w;
+  }
+
+  // REF1
+  ref = (block.y >> 14) & 7;
+  refplane = ref * 3 + plane;
+  ref_offset = cb_refplanes[refplane].y;
+  ref_stride = cb_refplanes[refplane].x;
+  info0 = cb_gm_warp[ref].info0;
+
+  int4 output1;
+  if (info0.x) {
+    int4 info1 = cb_gm_warp[ref].info1;
+    int4 info2 = cb_gm_warp[ref].info2;
+    const int src_x = ((x & (~7)) + 4) << subsampling;
+    const int src_y = ((y & (~7)) + 4) << subsampling;
+    const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+    const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+    const int x4 = dst_x >> subsampling;
+    const int y4 = dst_y >> subsampling;
+
+    int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, dims.x);
+    int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+    int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+    int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+    sx4 += info2.x * (-4) + info2.y * (-4);
+    sy4 += info2.w * (-4) + info2.z * (-4);
+
+    sx4 &= ~((1 << WarpReduceBits) - 1);
+    sy4 &= ~((1 << WarpReduceBits) - 1);
+
+    sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+    sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+    ref_offset += ix4 << 1;
+
+    for (int i = wi; i < 11; i += 4) {
+      mem[local_ofs + i] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + i, 0, dims.y),
+                                                warp_filter, sx4 + info2.y * i, info2.x);
+    }
+
+    int sy = sy4 + wi * info2.z;
+    int filter_addr;
+    int4 filter0, filter1;
+
+    filter_addr = WarpFilterSize * (((sy + 0 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output1.x = mem[local_ofs + wi + 0].x * filter0.x + mem[local_ofs + wi + 1].x * filter0.y +
+                mem[local_ofs + wi + 2].x * filter0.z + mem[local_ofs + wi + 3].x * filter0.w +
+                mem[local_ofs + wi + 4].x * filter1.x + mem[local_ofs + wi + 5].x * filter1.y +
+                mem[local_ofs + wi + 6].x * filter1.z + mem[local_ofs + wi + 7].x * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 1 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output1.y = mem[local_ofs + wi + 0].y * filter0.x + mem[local_ofs + wi + 1].y * filter0.y +
+                mem[local_ofs + wi + 2].y * filter0.z + mem[local_ofs + wi + 3].y * filter0.w +
+                mem[local_ofs + wi + 4].y * filter1.x + mem[local_ofs + wi + 5].y * filter1.y +
+                mem[local_ofs + wi + 6].y * filter1.z + mem[local_ofs + wi + 7].y * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 2 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output1.z = mem[local_ofs + wi + 0].z * filter0.x + mem[local_ofs + wi + 1].z * filter0.y +
+                mem[local_ofs + wi + 2].z * filter0.z + mem[local_ofs + wi + 3].z * filter0.w +
+                mem[local_ofs + wi + 4].z * filter1.x + mem[local_ofs + wi + 5].z * filter1.y +
+                mem[local_ofs + wi + 6].z * filter1.z + mem[local_ofs + wi + 7].z * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 3 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output1.w = mem[local_ofs + wi + 0].w * filter0.x + mem[local_ofs + wi + 1].w * filter0.y +
+                mem[local_ofs + wi + 2].w * filter0.z + mem[local_ofs + wi + 3].w * filter0.w +
+                mem[local_ofs + wi + 4].w * filter1.x + mem[local_ofs + wi + 5].w * filter1.y +
+                mem[local_ofs + wi + 6].w * filter1.z + mem[local_ofs + wi + 7].w * filter1.w;
+  } else {
+    int mv = block.w;
+    int mvx = x + ((mv) >> (16 + SUBPEL_BITS)) - 3;
+    int mvy = y + ((mv << 16) >> (16 + SUBPEL_BITS)) - 3;
+    mvx = clamp(mvx, -11, dims.x);
+
+    int filter_h = (((block.y >> 5) & 15) << 4) + ((mv >> 16) & SUBPEL_MASK);
+    int filter_v = (((block.y >> 9) & 15) << 4) + (mv & SUBPEL_MASK);
+    int4 kernel_h0 = cb_kernels[filter_h][0];
+    int4 kernel_h1 = cb_kernels[filter_h][1];
+
+    ref_offset += mvx << 1;
+    for (int i = wi; i < 11; i += 4) {
+      mem[local_ofs + i] =
+          filter_line_hbd(dst_frame, ref_offset + ref_stride * clamp(mvy + i, 0, dims.y), kernel_h0, kernel_h1);
+    }
+
+    int4 kernel_v0 = cb_kernels[filter_v][0];
+    int4 kernel_v1 = cb_kernels[filter_v][1];
+
+    output1.x = mem[local_ofs + wi + 0].x * kernel_v0.x + mem[local_ofs + wi + 1].x * kernel_v0.y +
+                mem[local_ofs + wi + 2].x * kernel_v0.z + mem[local_ofs + wi + 3].x * kernel_v0.w +
+                mem[local_ofs + wi + 4].x * kernel_v1.x + mem[local_ofs + wi + 5].x * kernel_v1.y +
+                mem[local_ofs + wi + 6].x * kernel_v1.z + mem[local_ofs + wi + 7].x * kernel_v1.w;
+    output1.y = mem[local_ofs + wi + 0].y * kernel_v0.x + mem[local_ofs + wi + 1].y * kernel_v0.y +
+                mem[local_ofs + wi + 2].y * kernel_v0.z + mem[local_ofs + wi + 3].y * kernel_v0.w +
+                mem[local_ofs + wi + 4].y * kernel_v1.x + mem[local_ofs + wi + 5].y * kernel_v1.y +
+                mem[local_ofs + wi + 6].y * kernel_v1.z + mem[local_ofs + wi + 7].y * kernel_v1.w;
+    output1.z = mem[local_ofs + wi + 0].z * kernel_v0.x + mem[local_ofs + wi + 1].z * kernel_v0.y +
+                mem[local_ofs + wi + 2].z * kernel_v0.z + mem[local_ofs + wi + 3].z * kernel_v0.w +
+                mem[local_ofs + wi + 4].z * kernel_v1.x + mem[local_ofs + wi + 5].z * kernel_v1.y +
+                mem[local_ofs + wi + 6].z * kernel_v1.z + mem[local_ofs + wi + 7].z * kernel_v1.w;
+    output1.w = mem[local_ofs + wi + 0].w * kernel_v0.x + mem[local_ofs + wi + 1].w * kernel_v0.y +
+                mem[local_ofs + wi + 2].w * kernel_v0.z + mem[local_ofs + wi + 3].w * kernel_v0.w +
+                mem[local_ofs + wi + 4].w * kernel_v1.x + mem[local_ofs + wi + 5].w * kernel_v1.y +
+                mem[local_ofs + wi + 6].w * kernel_v1.z + mem[local_ofs + wi + 7].w * kernel_v1.w;
+  }
+
+  output0.x = (output0.x + VertSumAdd) >> VertBits;
+  output0.y = (output0.y + VertSumAdd) >> VertBits;
+  output0.z = (output0.z + VertSumAdd) >> VertBits;
+  output0.w = (output0.w + VertSumAdd) >> VertBits;
+  output1.x = (output1.x + VertSumAdd) >> VertBits;
+  output1.y = (output1.y + VertSumAdd) >> VertBits;
+  output1.z = (output1.z + VertSumAdd) >> VertBits;
+  output1.w = (output1.w + VertSumAdd) >> VertBits;
+
+  y += wi;
+  x <<= 1;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+
+  int compound_type = block.y >> 30;
+  int4 coefs;
+  if (compound_type == 0) {
+    coefs.x = ((block.y >> 17) & 15) << 2;
+    coefs.yzw = coefs.xxx;
+  } else if (compound_type == 1) {
+    int wedge_stride = SubblockW << w_log;
+    int wedge_addr = (((block.y >> 17) & 0x1fff) << 6) + SubblockW * (subblock & ((1 << w_log) - 1)) +
+                     (SubblockH * (subblock >> w_log) + wi) * wedge_stride;
+    uint wedge = comp_mask.Load(wedge_addr);
+    coefs.x = (wedge >> 0) & 255;
+    coefs.y = (wedge >> 8) & 255;
+    coefs.z = (wedge >> 16) & 255;
+    coefs.w = (wedge >> 24) & 255;
+  } else if (compound_type == 3) {
+    uint m = dst_frame.Load(output_offset);
+    coefs.x = (m >> 0) & 255;
+    coefs.y = (m >> 8) & 255;
+    coefs.z = (m >> 16) & 255;
+    coefs.w = (m >> 24) & 255;
+  } else {
+    coefs.x = clamp(DiffWTDBase + ((abs(output0.x - output1.x) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.y = clamp(DiffWTDBase + ((abs(output0.y - output1.y) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.z = clamp(DiffWTDBase + ((abs(output0.z - output1.z) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    coefs.w = clamp(DiffWTDBase + ((abs(output0.w - output1.w) + DiffWTDRoundAdd) >> DiffWTDRoundShft), 0, DiffWTDMax);
+    if ((block.y >> 17) & 1) coefs = int4(64, 64, 64, 64) - coefs;
+  }
+
+  output0.x = blend(output0.x, output1.x, coefs.x);
+  output0.y = blend(output0.y, output1.y, coefs.y);
+  output0.z = blend(output0.z, output1.z, coefs.z);
+  output0.w = blend(output0.w, output1.w, coefs.w);
+
+  if (noskip) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_offset = cb_planes[plane].w;
+    int2 r = (int2)residuals.Load2(res_offset + x + y * res_stride);
+    output0.x += (r.x << 16) >> 16;
+    output0.y += r.x >> 16;
+    output0.z += (r.y << 16) >> 16;
+    output0.w += r.y >> 16;
+    output0 = clamp(output0, 0, PixelMax);
+  }
+
+  dst_frame.Store2(output_offset, uint2(output0.x | (output0.y << 16), output0.z | (output0.w << 16)));
+
+  if (compound_type == 2) {
+    wi = thread.x & 63;
+    coefs.x = coefs.x + coefs.y;
+    coefs.y = coefs.z + coefs.w;
+    mem[wi].xy = coefs.xy;
+
+    if ((wi & 5) == 0)  // filter odd cols and lines
+    {
+      coefs.xy += mem[wi + 1].xy;  // next line
+      coefs.zw = mem[wi + 4].xy;   // next 4x4 col
+      coefs.zw += mem[wi + 5].xy;  // next 4x4 col, next line
+      coefs.x = (coefs.x + 2) >> 2;
+      coefs.y = (coefs.y + 2) >> 2;
+      coefs.z = (coefs.z + 2) >> 2;
+      coefs.w = (coefs.w + 2) >> 2;
+
+      int chroma_offset = (x + y * cb_planes[1].x) >> 1;
+      uint mask = coefs.x | (coefs.y << 8) | (coefs.z << 16) | (coefs.w << 24);
+      dst_frame.Store(cb_planes[1].y + chroma_offset, mask);
+      dst_frame.Store(cb_planes[2].y + chroma_offset, mask);
+    }
+  }
+}

diff --git a/libav1/dx/shaders/inter_warp_hbd.hlsl b/libav1/dx/shaders/inter_warp_hbd.hlsl
new file mode 100644
index 0000000..d466e0b
--- /dev/null
+++ b/libav1/dx/shaders/inter_warp_hbd.hlsl

@@ -0,0 +1,170 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "inter_common.h"
+
+#define SubblockW 4
+#define SubblockH 4
+#define VertBits 11
+#define VertSumAdd ((1 << 21) + (1 << (VertBits - 1)))
+#define VertSub ((1 << (10 - 1)) + (1 << 10))
+#define PixelMax 1023
+
+groupshared int4 mem[11 * 64];
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+
+  const int offset = (cb_pass_offset + (thread.x >> (w_log + h_log))) * WarpBlockSize;
+  uint4 info0 = warp_blocks.Load4(offset);
+  int4 info1 = warp_blocks.Load4(offset + 16);
+  int4 info2 = warp_blocks.Load4(offset + 32);
+  // struct
+  //{
+  //    uint pos;
+  //    uint flags;
+  //    int mat[6];
+  //    int alpha;
+  //    int beta;
+  //    int delta;
+  //    int gamma;
+  //};
+
+  int x = SubblockW * ((info0.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((info0.x >> 16) + (subblock >> w_log));
+
+  const int plane = info0.y & 3;
+  const int subsampling = plane > 0;
+  const int2 dims = cb_dims[subsampling].xy;
+  const int noskip = info0.y & NoSkipFlag;
+
+  const int src_x = ((x & (~7)) + 4) << subsampling;
+  const int src_y = ((y & (~7)) + 4) << subsampling;
+  const int dst_x = info1.x * src_x + info1.y * src_y + (int)info0.z;
+  const int dst_y = info1.z * src_x + info1.w * src_y + (int)info0.w;
+  const int x4 = dst_x >> subsampling;
+  const int y4 = dst_y >> subsampling;
+
+  int ix4 = clamp((x4 >> WarpPrecBits) - 7 + (x & 7), -11, dims.x) << 1;
+  int iy4 = (y4 >> WarpPrecBits) - 7 + (y & 7);
+
+  int sx4 = x4 & ((1 << WarpPrecBits) - 1);
+  int sy4 = y4 & ((1 << WarpPrecBits) - 1);
+
+  sx4 += info2.x * (-4) + info2.y * (-4);
+  sy4 += info2.w * (-4) + info2.z * (-4);
+
+  sx4 &= ~((1 << WarpReduceBits) - 1);
+  sy4 &= ~((1 << WarpReduceBits) - 1);
+
+  sx4 += info2.y * ((y & 7) - 3) + info2.x * (x & 7);
+  sy4 += info2.z * (y & 7) + info2.w * (x & 7);
+
+  int refplane = ((info0.y >> 2) & 7) * 3 + plane;
+  int ref_offset = cb_refplanes[refplane].y + ix4;
+  int ref_stride = cb_refplanes[refplane].x;
+
+  int local_ofs = (thread.x & 63) * 11;
+  mem[local_ofs + 0] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 0, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 0, info2.x);
+  mem[local_ofs + 1] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 1, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 1, info2.x);
+  mem[local_ofs + 2] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 2, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 2, info2.x);
+  mem[local_ofs + 3] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 3, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 3, info2.x);
+  mem[local_ofs + 4] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 4, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 4, info2.x);
+  mem[local_ofs + 5] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 5, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 5, info2.x);
+  mem[local_ofs + 6] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 6, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 6, info2.x);
+  mem[local_ofs + 7] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 7, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 7, info2.x);
+  mem[local_ofs + 8] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 8, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 8, info2.x);
+  mem[local_ofs + 9] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 9, 0, dims.y), warp_filter,
+                                            sx4 + info2.y * 9, info2.x);
+  mem[local_ofs + 10] = filter_line_warp_hbd(dst_frame, ref_offset + ref_stride * clamp(iy4 + 10, 0, dims.y),
+                                             warp_filter, sx4 + info2.y * 10, info2.x);
+
+  x <<= 1;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_offset = cb_planes[plane].w + x + y * res_stride;
+  for (int l = 0; l < 4; ++l) {
+    int4 output;
+    int sy = sy4 + l * info2.z;
+
+    int filter_addr;
+    int4 filter0, filter1;
+
+    filter_addr = WarpFilterSize * (((sy + 0 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output.x = mem[local_ofs + l + 0].x * filter0.x + mem[local_ofs + l + 1].x * filter0.y +
+               mem[local_ofs + l + 2].x * filter0.z + mem[local_ofs + l + 3].x * filter0.w +
+               mem[local_ofs + l + 4].x * filter1.x + mem[local_ofs + l + 5].x * filter1.y +
+               mem[local_ofs + l + 6].x * filter1.z + mem[local_ofs + l + 7].x * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 1 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output.y = mem[local_ofs + l + 0].y * filter0.x + mem[local_ofs + l + 1].y * filter0.y +
+               mem[local_ofs + l + 2].y * filter0.z + mem[local_ofs + l + 3].y * filter0.w +
+               mem[local_ofs + l + 4].y * filter1.x + mem[local_ofs + l + 5].y * filter1.y +
+               mem[local_ofs + l + 6].y * filter1.z + mem[local_ofs + l + 7].y * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 2 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output.z = mem[local_ofs + l + 0].z * filter0.x + mem[local_ofs + l + 1].z * filter0.y +
+               mem[local_ofs + l + 2].z * filter0.z + mem[local_ofs + l + 3].z * filter0.w +
+               mem[local_ofs + l + 4].z * filter1.x + mem[local_ofs + l + 5].z * filter1.y +
+               mem[local_ofs + l + 6].z * filter1.z + mem[local_ofs + l + 7].z * filter1.w;
+
+    filter_addr = WarpFilterSize * (((sy + 3 * info2.w + WarpFiltRoundAdd) >> WarpFiltRoundBits) + WarpFiltOffset);
+    filter0 = warp_filter.Load4(filter_addr);
+    filter1 = warp_filter.Load4(filter_addr + 16);
+    output.w = mem[local_ofs + l + 0].w * filter0.x + mem[local_ofs + l + 1].w * filter0.y +
+               mem[local_ofs + l + 2].w * filter0.z + mem[local_ofs + l + 3].w * filter0.w +
+               mem[local_ofs + l + 4].w * filter1.x + mem[local_ofs + l + 5].w * filter1.y +
+               mem[local_ofs + l + 6].w * filter1.z + mem[local_ofs + l + 7].w * filter1.w;
+
+    output.x = clamp((int)(((output.x + VertSumAdd) >> VertBits) - VertSub), 0, PixelMax);
+    output.y = clamp((int)(((output.y + VertSumAdd) >> VertBits) - VertSub), 0, PixelMax);
+    output.z = clamp((int)(((output.z + VertSumAdd) >> VertBits) - VertSub), 0, PixelMax);
+    output.w = clamp((int)(((output.w + VertSumAdd) >> VertBits) - VertSub), 0, PixelMax);
+
+    if (noskip) {
+      int2 r = (int2)residuals.Load2(res_offset + l * res_stride);
+      output.x += (r.x << 16) >> 16;
+      output.y += r.x >> 16;
+      output.z += (r.y << 16) >> 16;
+      output.w += r.y >> 16;
+      output = clamp(output, 0, PixelMax);
+    }
+
+    dst_frame.Store2(output_offset + l * output_stride,
+                     uint2(output.x | (output.y << 16), output.z | (output.w << 16)));
+  }
+}

diff --git a/libav1/dx/shaders/intra_filter.hlsl b/libav1/dx/shaders/intra_filter.hlsl
new file mode 100644
index 0000000..8cf54bc
--- /dev/null
+++ b/libav1/dx/shaders/intra_filter.hlsl

@@ -0,0 +1,183 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+cbuffer IntraDataCommon : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_flags;
+  int4 cb_filter[5][8][2];
+  int4 cb_mode_params_lut[16][7];
+  int4 cb_sm_weight_arrays[128];
+};
+
+cbuffer PSSLIntraSRT : register(b1) {
+  uint cb_wi_count;
+  int cb_pass_offset;
+  int4 cb_counts0;
+  int4 cb_counts1;
+};
+
+ByteAddressBuffer pred_blocks : register(t0);
+ByteAddressBuffer residuals : register(t1);
+ByteAddressBuffer wedge_mask : register(t2);
+RWByteAddressBuffer dst_frame : register(u0);
+
+#define ROWS 8
+groupshared int mem[ROWS][8];
+
+groupshared int4 loc_above[8];
+groupshared int loc_left[32];
+groupshared int loc_corner[18];
+
+[numthreads(8, ROWS, 1)] void main(uint3 thread
+                                   : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  uint3 block = pred_blocks.Load3((cb_pass_offset + (thread.x >> 3)) << 4);
+
+  const int wi = thread.x & 7;
+  int x = 4 * (block.x & 0xffff);
+  int y = 4 * (block.x >> 16);
+
+  //    block.y bits:
+  //    0    3        bw_log
+  //    3    2        plane
+  //    5    1        non skip
+  //    6    4        mode
+  //        all mods except intra_bc:
+  //    10    6        above_available
+  //    16    6        left_available
+  //            dir mode params:
+  //    22    2        upsample
+  //    24    2        edge_filter above
+  //    26    2        edge_filter left
+  //    28    3        mode_angle
+  //    31    1        inter_intra?
+  //            CFL:
+  //    22    4        alpha
+  //
+  //    block.z
+  //        intra_bc - mv
+  //        inter-intra    - coef. table indexes;
+  //        filter - mode_info0 = txh | (filter_mode << 4);
+  //    block.w - reserved; (prob. used for sorting);
+
+  const int bw = 1 << (block.y & 3);
+  const int bh = 2 << (block.z & 3);
+  const int plane = (block.y >> 3) & 3;
+  const int mode = (block.z >> 4) & 7;
+  const int stride = cb_planes[plane].x;
+  const int offset = cb_planes[plane].y + x + y * stride;
+
+  const int above_available = (block.y >> 10) & 63;
+  const int left_available = ((block.y >> 16) & 63) << 2;
+  if (wi < bw) {
+    uint pixels = above_available ? dst_frame.Load(offset - stride + min(above_available - 1, wi) * 4)
+                                  : left_available ? dst_frame.Load(offset - 4) : 0x7f7f7f7f;
+    if (wi >= above_available) pixels = (pixels >> 24) * 0x01010101;
+    loc_above[wi].x = (pixels >> 0) & 255;
+    loc_above[wi].y = (pixels >> 8) & 255;
+    loc_above[wi].z = (pixels >> 16) & 255;
+    loc_above[wi].w = (pixels >> 24) & 255;
+  }
+  GroupMemoryBarrier();
+
+  int row;
+  for (row = wi; row < bh * 2; row += 8) {
+    const int l_addr = offset - 4 + min(row, left_available - 1) * stride;
+    int left = left_available ? (dst_frame.Load(l_addr) >> 24) : above_available ? loc_above[0].x : 129;
+
+    loc_left[row] = left;
+    if (row & 1) loc_corner[(row + 1) >> 1] = left;
+  }
+
+  if (wi == 0) {
+    loc_corner[0] = (left_available && above_available) ? (dst_frame.Load(offset - 4 - stride) >> 24)
+                                                        : (left_available || above_available) ? loc_above[0].x : 128;
+  }
+
+  GroupMemoryBarrier();
+
+  int4 filter0 = cb_filter[mode][wi][0];
+  int3 filter1 = cb_filter[mode][wi][1].xyz;
+
+  for (row = thread.y; row < bh; row += ROWS)
+    for (int col = -(int)thread.y - 1; col < bw; ++col) {
+      if (col < 0) {
+        continue;
+      }
+
+      int p0 = loc_corner[row];
+      int4 p14 = loc_above[col];
+      int p5 = loc_left[2 * row];
+      int p6 = loc_left[2 * row + 1];
+
+      int pixel = p0 * filter0.x + p14.x * filter0.y + p14.y * filter0.z + p14.z * filter0.w + p14.w * filter1.x +
+                  p5 * filter1.y + p6 * filter1.z;
+      pixel = clamp((pixel + 8) >> 4, 0, 255);
+      mem[thread.y][wi] = pixel;
+
+      GroupMemoryBarrier();
+
+      if (wi < 2) {
+        uint pixel4;
+        int4 pix;
+        pix.x = mem[thread.y][wi * 4 + 0];
+        pix.y = mem[thread.y][wi * 4 + 1];
+        pix.z = mem[thread.y][wi * 4 + 2];
+        pix.w = mem[thread.y][wi * 4 + 3];
+
+        loc_left[2 * row + wi] = pix.w;
+        if (wi == 1) {
+          loc_above[col] = pix;
+          loc_corner[row] = p14.w;
+        }
+        pixel4 = pix.x;
+        pixel4 |= pix.y << 8;
+        pixel4 |= pix.z << 16;
+        pixel4 |= pix.w << 24;
+
+        const int addr = offset + col * 4 + (row * 2 + wi) * stride;
+        dst_frame.Store(addr, pixel4);
+      }
+
+      GroupMemoryBarrier();
+    }
+
+  if (block.y & (1 << 5)) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_offset = cb_planes[plane].w + 2 * x + y * res_stride;
+    // 4x4: (11-wi)/8 = 1 1 1 1 0 0 0 0
+    // 8x8: (23-wi)/8=    2 2 2 2 2 2 2 2
+    // 16x16: (2*4*8+7 - wi) / 8 = 8 8 8 8 8 8 8 8
+
+    const int words = (2 * bw * bh + 7 - wi) >> 3;
+    for (int i = 0; i < words; ++i) {
+      const int idx = i * 8 + wi;
+      const int wx = (idx & (bw - 1));
+      const int wy = idx >> (block.y & 3);  //(block.y & 3) = bw_log2
+      const int addr = offset + 4 * wx + stride * wy;
+      uint pix = dst_frame.Load(addr);
+
+      const int res_addr = res_offset + 8 * wx + res_stride * wy;
+      int2 r = residuals.Load2(res_addr);
+      uint result = (clamp((int)((pix >> 0) & 255) + (int)((r.x << 16) >> 16), 0, 255) << 0) |
+                    (clamp((int)((pix >> 8) & 255) + (int)(r.x >> 16), 0, 255) << 8) |
+                    (clamp((int)((pix >> 16) & 255) + (int)((r.y << 16) >> 16), 0, 255) << 16) |
+                    (clamp((int)((pix >> 24) & 255) + (int)(r.y >> 16), 0, 255) << 24);
+      dst_frame.Store(addr, result);
+    }
+  }
+}

diff --git a/libav1/dx/shaders/intra_filter_hbd.hlsl b/libav1/dx/shaders/intra_filter_hbd.hlsl
new file mode 100644
index 0000000..88d98f5
--- /dev/null
+++ b/libav1/dx/shaders/intra_filter_hbd.hlsl

@@ -0,0 +1,182 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+cbuffer IntraDataCommon : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_flags;
+  int4 cb_filter[5][8][2];
+  int4 cb_mode_params_lut[16][7];
+  int4 cb_sm_weight_arrays[128];
+};
+
+cbuffer PSSLIntraSRT : register(b1) {
+  uint cb_wi_count;
+  int cb_pass_offset;
+  int4 cb_counts0;
+  int4 cb_counts1;
+};
+
+ByteAddressBuffer pred_blocks : register(t0);
+ByteAddressBuffer residuals : register(t1);
+ByteAddressBuffer wedge_mask : register(t2);
+RWByteAddressBuffer dst_frame : register(u0);
+
+#define ROWS 8
+groupshared int mem[ROWS][8];
+
+groupshared int4 loc_above[8];
+groupshared int loc_left[32];
+groupshared int loc_corner[18];
+
+[numthreads(8, ROWS, 1)] void main(uint3 thread
+                                   : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  uint3 block = pred_blocks.Load3((cb_pass_offset + (thread.x >> 3)) << 4);
+
+  const int wi = thread.x & 7;
+  int x = 8 * (block.x & 0xffff);
+  int y = 4 * (block.x >> 16);
+
+  //    block.y bits:
+  //    0    3        bw_log
+  //    3    2        plane
+  //    5    1        non skip
+  //    6    4        mode
+  //        all mods except intra_bc:
+  //    10    6        above_available
+  //    16    6        left_available
+  //            dir mode params:
+  //    22    2        upsample
+  //    24    2        edge_filter above
+  //    26    2        edge_filter left
+  //    28    3        mode_angle
+  //    31    1        inter_intra?
+  //            CFL:
+  //    22    4        alpha
+  //
+  //    block.z
+  //        intra_bc - mv
+  //        inter-intra    - coef. table indexes;
+  //        filter - mode_info0 = txh | (filter_mode << 4);
+  //    block.w - reserved; (prob. used for sorting);
+
+  const int bw = 1 << (block.y & 3);
+  const int bh = 2 << (block.z & 3);
+  const int plane = (block.y >> 3) & 3;
+  const int mode = (block.z >> 4) & 7;
+  const int stride = cb_planes[plane].x;
+  const int offset = cb_planes[plane].y + x + y * stride;
+
+  const int above_available = (block.y >> 10) & 63;
+  const int left_available = ((block.y >> 16) & 63) << 2;
+
+  if (wi < bw) {
+    uint2 pixels = uint2(0, 0x01ff0000);
+    if (above_available)
+      pixels = dst_frame.Load2(offset - stride + min(above_available - 1, wi) * 8);
+    else if (left_available)
+      pixels.y = dst_frame.Load(offset - 4);
+    if (wi >= above_available) {
+      pixels.y = (pixels.y & 0x03ff0000) | (pixels.y >> 16);
+      pixels.x = pixels.y;
+    }
+    loc_above[wi].x = (pixels.x >> 0) & 1023;
+    loc_above[wi].y = (pixels.x >> 16) & 1023;
+    loc_above[wi].z = (pixels.y >> 0) & 1023;
+    loc_above[wi].w = (pixels.y >> 16) & 1023;
+  }
+
+  GroupMemoryBarrier();
+
+  int row;
+  for (row = wi; row < bh * 2; row += 8) {
+    const int l_addr = offset - 4 + min(row, left_available - 1) * stride;
+    int left = left_available ? (dst_frame.Load(l_addr) >> 16) : above_available ? loc_above[0].x : 513;
+
+    loc_left[row] = left;
+    if (row & 1) loc_corner[((row + 1) >> 1)] = left;
+  }
+
+  if (wi == 0) {
+    loc_corner[0] = (left_available && above_available) ? (dst_frame.Load(offset - 4 - stride) >> 16)
+                                                        : (left_available || above_available) ? loc_above[0].x : 512;
+  }
+
+  GroupMemoryBarrier();
+
+  int4 filter0 = cb_filter[mode][wi][0];
+  int3 filter1 = cb_filter[mode][wi][1].xyz;
+
+  for (row = thread.y; row < bh; row += ROWS)
+    for (int col = -(int)thread.y - 1; col < bw; ++col) {
+      if (col < 0) {
+        continue;
+      }
+
+      int p0 = loc_corner[row];
+      int4 p14 = loc_above[col];
+      int p5 = loc_left[2 * row];
+      int p6 = loc_left[2 * row + 1];
+      int pixel = p0 * filter0.x + p14.x * filter0.y + p14.y * filter0.z + p14.z * filter0.w + p14.w * filter1.x +
+                  p5 * filter1.y + p6 * filter1.z;
+      pixel = clamp((pixel + 8) >> 4, 0, 1023);
+      mem[thread.y][wi] = pixel;
+
+      GroupMemoryBarrier();
+
+      if (wi < 2) {
+        int4 pix;
+        pix.x = mem[thread.y][wi * 4 + 0];
+        pix.y = mem[thread.y][wi * 4 + 1];
+        pix.z = mem[thread.y][wi * 4 + 2];
+        pix.w = mem[thread.y][wi * 4 + 3];
+
+        loc_left[2 * row + wi] = pix.w;
+        if (wi == 1) {
+          loc_above[col] = pix;
+          loc_corner[row] = p14.w;
+        }
+        uint2 pixel4;
+        pixel4.x = pix.x;
+        pixel4.x |= pix.y << 16;
+        pixel4.y = pix.z;
+        pixel4.y |= pix.w << 16;
+        const int addr = offset + col * 8 + (row * 2 + wi) * stride;
+        dst_frame.Store2(addr, pixel4);
+      }
+      GroupMemoryBarrier();
+    }
+
+  if (block.y & (1 << 5)) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_offset = cb_planes[plane].w + x + y * res_stride;
+    // 4x4: (11-wi)/8 = 1 1 1 1 0 0 0 0
+    // 8x8: (23-wi)/8=    2 2 2 2 2 2 2 2
+    // 16x16: (2*4*8+7 - wi) / 8 = 8 8 8 8 8 8 8 8
+    const int wi2 = wi + thread.y * 8;
+    for (int i = wi2; i < (4 * bw * bh); i += (8 * ROWS)) {
+      const int wx = i & ((bw << 1) - 1);
+      const int wy = i >> ((block.y & 3) + 1);
+      const int addr = offset + 4 * wx + stride * wy;
+      uint pix = dst_frame.Load(addr);
+      int r = residuals.Load(res_offset + 4 * wx + res_stride * wy);
+      uint result = (clamp((int)((pix >> 0) & 1023) + (int)((r.x << 16) >> 16), 0, 1023) << 0) |
+                    (clamp((int)((pix >> 16) & 1023) + (int)(r.x >> 16), 0, 1023) << 16);
+      dst_frame.Store(addr, result);
+    }
+  }
+}

diff --git a/libav1/dx/shaders/intra_main.hlsl b/libav1/dx/shaders/intra_main.hlsl
new file mode 100644
index 0000000..cbe010b
--- /dev/null
+++ b/libav1/dx/shaders/intra_main.hlsl

@@ -0,0 +1,592 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+cbuffer IntraDataCommon : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_flags;
+  int4 cb_filter[5][8][2];
+  int4 cb_mode_params_lut[16][7];
+  int4 cb_sm_weight_arrays[128];
+};
+
+cbuffer PSSLIntraSRT : register(b1) {
+  int4 cb_counts0;
+  int4 cb_counts1;
+  uint cb_wi_count;
+  int cb_pass_offset;
+};
+
+ByteAddressBuffer pred_blocks : register(t0);
+ByteAddressBuffer residuals : register(t1);
+ByteAddressBuffer wedge_mask : register(t2);
+RWByteAddressBuffer dst_frame : register(u0);
+
+#define Dir1 1
+#define Dir2 2
+#define Dir3 3
+#define SmoothV 4
+#define SmoothH 8
+
+#define MODE_PAETH 11
+#define MODE_INTRA_BC 12
+#define MODE_FILTER 13
+#define MODE_DC 14
+#define MODE_CFL 15
+
+#define NeedAboveShift 4
+#define NeedRightShift 5
+#define NeedLeftShift 6
+#define NeedBotShift 7
+#define NeedAboveLeftShift 8
+#define FilterAboveLeftFlag 0x200
+#define NeedAboveLeftLUT 0x08ff
+// intra bc:
+#define SubpelBits 4
+#define FilterLineShift 3
+#define OutputShift 11
+#define OffsetBits 19
+#define SumAddHor (1 << 14)
+#define SumAddVert ((1 << OffsetBits) + (1 << (OutputShift - 1)))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+
+groupshared int above[64 * 20];
+groupshared int left[64 * 20];
+
+int compute_bc(int p0, int p1, int p2, int p3, int fh, int fv) {
+  int l0 = ((1 << 14) + p0 * fh + p1 * (128 - fh) + (1 << (FilterLineShift - 1))) >> FilterLineShift;
+  int l1 = ((1 << 14) + p2 * fh + p3 * (128 - fh) + (1 << (FilterLineShift - 1))) >> FilterLineShift;
+  int output = SumAddVert + l0 * fv + l1 * (128 - fv);
+  return clamp((output >> OutputShift) - OutputSub, 0, 255);
+}
+
+uint compute_cfl_pixel(int dc, int value) {
+  value = (value < 0) ? -((-value + (1 << 5)) >> 6) : ((value + (1 << 5)) >> 6);
+  return clamp(dc + value, 0, 255);
+}
+
+int intra_inter_blend(int mask, int intra, int inter) {
+  return clamp((intra * mask + inter * (64 - mask) + 32) >> 6, 0, 255);
+}
+#define WG_SIZE 256
+[numthreads(256, 1, 1)] void main(uint3 thread
+                                  : SV_DispatchThreadID) {
+  int4 counts0 = cb_counts0;
+  int4 counts1 = cb_counts1;
+  int bsize_log = 10;
+  int offset = cb_pass_offset;
+  int threadx = thread.x;
+  uint3 block = uint3(0, 0, 0);
+  int wi_count = 0;
+  int wi = 1024;
+  if (thread.x < cb_wi_count) {
+    if (threadx >= (counts0.x << 10)) {
+      offset += cb_counts0.x;
+      threadx -= counts0.x << 10;
+      bsize_log = 9;
+      if (threadx >= (counts0.y << 9)) {
+        offset += cb_counts0.y;
+        threadx -= counts0.y << 9;
+        bsize_log = 8;
+        if (threadx >= (counts0.z << 8)) {
+          offset += cb_counts0.z;
+          threadx -= counts0.z << 8;
+          bsize_log = 7;
+          if (threadx >= (counts0.w << 7)) {
+            offset += cb_counts0.w;
+            threadx -= counts0.w << 7;
+            bsize_log = 6;
+            if (threadx >= (counts1.x << 6)) {
+              offset += cb_counts1.x;
+              threadx -= counts1.x << 6;
+              bsize_log = 5;
+              if (threadx >= (counts1.y << 5)) {
+                offset += cb_counts1.y;
+                threadx -= counts1.y << 5;
+                bsize_log = 4;
+                if (threadx >= (counts1.z << 4)) {
+                  offset += cb_counts1.z;
+                  threadx -= counts1.z << 4;
+                  bsize_log = 3;
+                  if (threadx >= (counts1.w << 3)) {
+                    offset += cb_counts1.w;
+                    threadx -= counts1.w << 3;
+                    bsize_log = 2;
+                  }
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+
+    int block_index = offset + (threadx >> bsize_log);
+    block = pred_blocks.Load3(block_index << 4);
+    wi_count = min(WG_SIZE, 1 << bsize_log);
+    wi = thread.x & (wi_count - 1);
+  }
+
+  const int bw_log = block.y & 7;  // 0..4
+  const int bw = 4 << bw_log;
+  const int bh = 1 << (bsize_log - bw_log);
+  int bx = (block.x & 0xffff) << 2;
+  int by = (block.x >> 16) << 2;
+  const int plane = (block.y >> 3) & 3;
+  const int above_available = (block.y >> 10) & 63;
+  const int left_available = ((block.y >> 16) & 63) << 2;
+  const int mode = (block.y >> 6) & 15;
+  const int mode_angle = (block.y >> 28) & 7;
+  int4 params = cb_mode_params_lut[mode][mode_angle];
+
+  //    block.y bits:
+  //    0    3        bw_log
+  //    3    2        plane
+  //    5    1        non skip
+  //    6    4        mode
+  //        all mods except intra_bc:
+  //    10    6        above_available
+  //    16    6        left_available
+  //            dir mode params:
+  //    22    2        upsample
+  //    24    2        edge_filter above
+  //    26    2        edge_filter left
+  //    28    3        mode_angle
+  //    31    1        inter_intra?
+  //            CFL:
+  //    22    4        alpha
+  //
+  //    block.z
+  //        intra_bc - mv
+  //        inter-intra    - coef. table indexes;
+  //        filter - bh_log | filter_mode;
+  //    block.w - reserved; (prob. used for sorting);
+
+  const int stride = cb_planes[plane].x;
+  const int corner = cb_planes[plane].y + (bx - 4) + (by - 1) * stride;
+
+  const int loc_base = (((thread.x & (WG_SIZE - 1)) - wi) >> 2) * 20 + 2;
+  int above_count = ((params.z >> NeedAboveShift) & 1) * bw + ((params.z >> NeedRightShift) & 1) * bh;
+  // Above:
+  if (wi < (above_count >> 2) || wi == 0) {
+    int addr = corner + 4 + 4 * min(wi, above_available - 1);
+    uint pixels =
+        above_available ? dst_frame.Load(addr) : left_available ? dst_frame.Load(corner + stride) : 0x7f7f7f7f;
+    if (wi >= above_available) pixels = (pixels >> 24) * 0x01010101;
+    above[loc_base + wi * 4 + 0] = (pixels >> 0) & 255;
+    above[loc_base + wi * 4 + 1] = (pixels >> 8) & 255;
+    above[loc_base + wi * 4 + 2] = (pixels >> 16) & 255;
+    above[loc_base + wi * 4 + 3] = (pixels >> 24) & 255;
+  }
+
+  // Left:
+  GroupMemoryBarrierWithGroupSync();
+
+  int left_count = ((params.z >> NeedLeftShift) & 1) * bh + ((params.z >> NeedBotShift) & 1) * bw;
+
+  if (wi < left_count || wi == 0) {
+    const int addr = corner + (min(wi, left_available - 1) + 1) * stride;
+    left[loc_base + wi] = left_available ? (dst_frame.Load(addr) >> 24) : above_available ? above[loc_base] : 129;
+  }
+  if (wi < (left_count - wi_count)) {
+    const int addr = corner + (min(wi + wi_count, left_available - 1) + 1) * stride;
+    left[loc_base + wi + wi_count] =
+        left_available ? (dst_frame.Load(addr) >> 24) : above_available ? above[loc_base] : 129;
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  const int need_aboveleft = (params.z >> NeedAboveLeftShift) & 1;
+  if (wi == 0 && need_aboveleft) {
+    uint t = above[loc_base];
+    uint l = left[loc_base];
+    uint topleft = (left_available && above_available) ? (dst_frame.Load(corner) >> 24)
+                                                       : left_available ? l : above_available ? t : 128;
+
+    if ((params.z & FilterAboveLeftFlag) && (bw + bh >= 24) && cb_flags.x)
+      topleft = ((l + t) * 5 + topleft * 6 + 8) >> 4;
+
+    above[loc_base - 1] = topleft;
+    left[loc_base - 1] = topleft;
+    above[loc_base - 2] = topleft;
+    left[loc_base - 2] = topleft;
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  int dir_mode = mode < MODE_DC;
+  int upsample_above = (block.y >> 22) & dir_mode;
+  int upsample_left = (block.y >> 23) & dir_mode;
+
+  int filter_count = 0;
+  if (above_available && ((params.z >> NeedAboveShift) & 1)) {
+    filter_count = min(above_available << 2, bw) + ((params.z >> NeedRightShift) & 1) * bh + need_aboveleft - 1;
+  }
+  int edge_filter = (block.y >> 24) & ((wi < filter_count && dir_mode) * 3);
+
+  int sum0 = 8;
+  int sum1 = 8;
+  const int wi1 = wi + wi_count;
+  if (edge_filter) {
+    edge_filter <<= 2;
+    const int base = loc_base - need_aboveleft;
+    const int last = above_count;
+    //{ 0, 4, 8, 4, 0 }, { 0, 5, 6, 5, 0 }, { 2, 4, 4, 4, 2 }
+    sum0 += above[base + max(wi - 1, 0)] * ((0x2000 >> edge_filter) & 15);
+    sum0 += above[base + (wi + 0)] * ((0x4540 >> edge_filter) & 15);
+    sum0 += above[base + (wi + 1)] * ((0x4680 >> edge_filter) & 15);
+    sum0 += above[base + min(wi + 2, last)] * ((0x4540 >> edge_filter) & 15);
+    sum0 += above[base + min(wi + 3, last)] * ((0x2000 >> edge_filter) & 15);
+    if (wi1 < above_count)  // rare, some mods for 4x4, 8x4, 4x8 blocks
+    {
+      sum1 += above[base + max(wi1 - 1, 0)] * ((0x2000 >> edge_filter) & 15);
+      sum1 += above[base + (wi1 + 0)] * ((0x4540 >> edge_filter) & 15);
+      sum1 += above[base + (wi1 + 1)] * ((0x4680 >> edge_filter) & 15);
+      sum1 += above[base + min(wi1 + 2, last)] * ((0x4540 >> edge_filter) & 15);
+      sum1 += above[base + min(wi1 + 3, last)] * ((0x2000 >> edge_filter) & 15);
+    }
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (edge_filter) {
+    above[loc_base + wi] = sum0 >> 4;
+    if (wi1 < filter_count) {
+      above[loc_base + wi1] = sum1 >> 4;
+    }
+  }
+
+  if (left_available && ((params.z >> NeedLeftShift) & 1)) {
+    filter_count = min(left_available, bh) + ((params.z >> NeedBotShift) & 1) * bw + need_aboveleft - 1;
+  }
+  edge_filter = (block.y >> 26) & ((wi < filter_count && dir_mode) * 3);
+  if (edge_filter) {
+    sum0 = 8;
+    sum1 = 8;
+    edge_filter *= 4;
+    const int base = loc_base - need_aboveleft;
+    const int last = filter_count;
+    sum0 += left[base + max(wi - 1, 0)] * ((0x2000 >> edge_filter) & 15);
+    sum0 += left[base + (wi + 0)] * ((0x4540 >> edge_filter) & 15);
+    sum0 += left[base + (wi + 1)] * ((0x4680 >> edge_filter) & 15);
+    sum0 += left[base + min(wi + 2, last)] * ((0x4540 >> edge_filter) & 15);
+    sum0 += left[base + min(wi + 3, last)] * ((0x2000 >> edge_filter) & 15);
+    if (wi1 < filter_count) {
+      sum1 += left[base + max(wi1 - 1, 0)] * ((0x2000 >> edge_filter) & 15);
+      sum1 += left[base + (wi1 + 0)] * ((0x4540 >> edge_filter) & 15);
+      sum1 += left[base + (wi1 + 1)] * ((0x4680 >> edge_filter) & 15);
+      sum1 += left[base + min(wi1 + 2, last)] * ((0x4540 >> edge_filter) & 15);
+      sum1 += left[base + min(wi1 + 3, last)] * ((0x2000 >> edge_filter) & 15);
+    }
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (edge_filter) {
+    left[loc_base + wi] = sum0 >> 4;
+    if (wi1 < filter_count) {
+      left[loc_base + wi1] = sum1 >> 4;
+    }
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  int p0 = 0, p1 = 0, p2 = 0, p3 = 0, p4 = 0;
+  int do_upsample = upsample_above && wi < (above_count >> 1);
+  if (do_upsample) {
+    p0 = above[loc_base + wi * 2 - 2];
+    p1 = above[loc_base + wi * 2 - 1];
+    p2 = above[loc_base + wi * 2 + 0];
+    p3 = above[loc_base + wi * 2 + 1];
+    p4 = above[loc_base + min(wi * 2 + 2, above_count - 1)];
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (do_upsample) {
+    above[loc_base - 1 + wi * 4] = clamp((-p0 + 9 * p1 + 9 * p2 - p3 + 8) >> 4, 0, 255);
+    above[loc_base + 0 + wi * 4] = p2;
+    above[loc_base + 1 + wi * 4] = clamp((-p1 + 9 * p2 + 9 * p3 - p4 + 8) >> 4, 0, 255);
+    above[loc_base + 2 + wi * 4] = p3;
+  }
+
+  do_upsample = upsample_left && wi < (left_count >> 1);
+  if (do_upsample) {
+    p0 = left[loc_base + wi * 2 - 2];
+    p1 = left[loc_base + wi * 2 - 1];
+    p2 = left[loc_base + wi * 2 + 0];
+    p3 = left[loc_base + wi * 2 + 1];
+    p4 = left[loc_base + min(wi * 2 + 2, left_count - 1)];
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (do_upsample) {
+    left[loc_base - 1 + wi * 4] = clamp((-p0 + 9 * p1 + 9 * p2 - p3 + 8) >> 4, 0, 255);
+    left[loc_base + 0 + wi * 4] = p2;
+    left[loc_base + 1 + wi * 4] = clamp((-p1 + 9 * p2 + 9 * p3 - p4 + 8) >> 4, 0, 255);
+    left[loc_base + 2 + wi * 4] = p3;
+  }
+  GroupMemoryBarrierWithGroupSync();
+
+  // int x0 = (wi & ((1 << bw_log) - 1)) << 2;
+  // int y = wi >> bw_log;
+  int x0 = (thread.x & ((1 << bw_log) - 1)) << 2;
+  int y = (thread.x & ((1 << bsize_log) - 1)) >> bw_log;
+
+  uint pixels = 0;
+  if (mode == MODE_INTRA_BC) {
+    int mv = (int)block.z;
+    int mvx = bx + x0 + ((mv) >> (16 + SubpelBits));
+    int mvy = by + y + ((mv << 16) >> (16 + SubpelBits));
+    const int filt_h = 128 >> ((mv >> 19) & 1);
+    const int filt_v = 128 >> ((mv >> 3) & 1);
+    int addr = cb_planes[plane].y + mvx + mvy * stride;
+
+    uint2 ref0, ref1 = 0;
+
+    const uint shift = (addr & 3) << 3;
+    addr &= ~3;
+    ref0 = dst_frame.Load2(addr);
+    ref0.x = (ref0.x >> shift) | ((ref0.y << (24 - shift)) << 8);
+    ref0.y = ref0.y >> shift;
+
+    if (filt_v != 128) {
+      ref1 = dst_frame.Load2(addr + stride);
+      ref1.x = (ref1.x >> shift) | ((ref1.y << (24 - shift)) << 8);
+      ref1.y = ref1.y >> shift;
+    }
+
+    pixels =
+        (compute_bc((ref0.x >> 0) & 255, (ref0.x >> 8) & 255, (ref1.x >> 0) & 255, (ref1.x >> 8) & 255, filt_h, filt_v)
+         << 0) |
+        (compute_bc((ref0.x >> 8) & 255, (ref0.x >> 16) & 255, (ref1.x >> 8) & 255, (ref1.x >> 16) & 255, filt_h,
+                    filt_v)
+         << 8) |
+        (compute_bc((ref0.x >> 16) & 255, (ref0.x >> 24) & 255, (ref1.x >> 16) & 255, (ref1.x >> 24) & 255, filt_h,
+                    filt_v)
+         << 16) |
+        (compute_bc((ref0.x >> 24) & 255, (ref0.y >> 0) & 255, (ref1.x >> 24) & 255, (ref1.y >> 0) & 255, filt_h,
+                    filt_v)
+         << 24);
+  }
+
+  uint dc = 0;
+  if (mode >= MODE_DC) {
+    uint count = 0;
+    if (above_available) {
+      for (int i = 0; i < bw; ++i) dc += above[loc_base + i];
+      count += bw;
+    }
+    if (left_available) {
+      for (int i = 0; i < bh; ++i) dc += left[loc_base + i];
+      count += bh;
+    }
+    dc += count >> 1;
+    dc = count ? dc / count : 128;
+    pixels = 0x01010101 * dc;
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  int4 luma = 0;
+  if (mode == MODE_CFL) {
+    const int max_x = (block.z & 0xffff) - 4;
+    const int max_y = (block.z >> 16) - 2;
+    const int y_stride = cb_planes[0].x;
+    const int luma_x = (bx + x0) << 1;
+    const int luma_y = min((by + y) << 1, max_y);
+    int y_offset = cb_planes[0].y + min(luma_x, max_x) + luma_y * y_stride;
+    uint2 luma0 = dst_frame.Load2(y_offset);
+    uint2 luma1 = dst_frame.Load2(y_offset + y_stride);
+    if (luma_x >= max_x) {
+      luma0.y = (luma0.x >> 16) | (luma0.x & 0xffff0000);
+      luma1.y = (luma1.x >> 16) | (luma1.x & 0xffff0000);
+      if (luma_x > max_x) {
+        luma0.x = luma0.y;
+        luma1.x = luma1.y;
+      }
+    }
+    luma.x = ((luma0.x >> 0) & 255) + ((luma0.x >> 8) & 255) + ((luma1.x >> 0) & 255) + ((luma1.x >> 8) & 255);
+    luma.y = ((luma0.x >> 16) & 255) + ((luma0.x >> 24) & 255) + ((luma1.x >> 16) & 255) + ((luma1.x >> 24) & 255);
+    luma.z = ((luma0.y >> 0) & 255) + ((luma0.y >> 8) & 255) + ((luma1.y >> 0) & 255) + ((luma1.y >> 8) & 255);
+    luma.w = ((luma0.y >> 16) & 255) + ((luma0.y >> 24) & 255) + ((luma1.y >> 16) & 255) + ((luma1.y >> 24) & 255);
+    luma <<= 1;
+    above[loc_base + wi] = luma.x + luma.y + luma.z + luma.w;
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  if (mode == MODE_CFL)  // reduce
+  {
+    int count = wi_count >> 2;
+    // max cfl block  32x32    /    4(pixels) = 256 wi (wi_count);
+    // count = 256wi / 4  = up to 64wi == wavefront size, => we can use GroupMemoryBarrier();
+    int ofs = loc_base + wi * 4;
+    if (wi < count) {
+      int sum = above[ofs + 0] + above[ofs + 1] + above[ofs + 2] + above[ofs + 3];
+      GroupMemoryBarrier();
+      above[loc_base + wi] = sum;
+      GroupMemoryBarrier();
+    }
+    ofs = loc_base + wi * 2;
+    count >>= 1;
+    while (wi < count) {
+      int sum = above[ofs + 0] + above[ofs + 1];
+      GroupMemoryBarrier();
+      above[loc_base + wi] = sum;
+      GroupMemoryBarrier();
+      count >>= 1;
+    }
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  if (thread.x >= cb_wi_count) return;
+
+  if (mode == MODE_CFL) {
+    int avrg = (above[loc_base] + (2 << bsize_log)) >> (bsize_log + 2);
+    int alpha = ((block.y >> 22) & 63) - 16;
+
+    pixels = compute_cfl_pixel(dc, (luma.x - avrg) * alpha);
+    pixels |= compute_cfl_pixel(dc, (luma.y - avrg) * alpha) << 8;
+    pixels |= compute_cfl_pixel(dc, (luma.z - avrg) * alpha) << 16;
+    pixels |= compute_cfl_pixel(dc, (luma.w - avrg) * alpha) << 24;
+  }
+
+  if (mode < MODE_INTRA_BC) {
+    const int frac_bits_x = 6 - upsample_above;
+    const int frac_bits_y = 6 - upsample_left;
+    const int min_base_x = -(1 << upsample_above);
+
+    above_count <<= upsample_above;
+    left_count <<= upsample_left;
+    uint topleft = above[loc_base - 1];
+
+    for (int i = 0; i < 4; ++i) {
+      int x = i + x0;
+      ////1:
+      // int offset_x = (1 + y) * dx;
+      // int base_x = offset_x >> frac_bits_x;
+      //
+      ////2:
+      // int offset_x = (x << 6) - (y + 1) * dx;
+      // const int base_x = offset_x >> frac_bits_x;
+      //
+      // int offset_y = (y << 6) - (x + 1) * dy;
+      // const int base_y = offset_y >> frac_bits_y;
+      //
+      ////3:
+      // int offset_y = (1 + x) * dy;
+      // int base_y = offset_x >> frac_bits_x;
+      //
+      // int shift_y = ((offset_y << upsample_above) & 0x3F) >> 1;
+      // int shift_x = ((offset_x << upsample_above) & 0x3F) >> 1;
+      // Note:     for non dir modes:
+      //        dx = 0; dy = 0;     upsample_above = 0;     upsample_left = 0;
+
+      const int offset_x = (x << 6) + (y + 1) * params.x;
+      const int offset_y = (y << 6) + (x + 1) * params.y;
+      const int shift_x = ((offset_x << upsample_above) & 0x3F) >> 1;
+      const int shift_y = ((offset_y << upsample_left) & 0x3F) >> 1;
+
+      // Note: shift_x = 0 & shift_y = 0 if (dx == 0) & (dy == 0);
+      // maybe this can be used to simplify weight calc;
+
+      const int base_x = offset_x >> frac_bits_x;
+      int l0 = left[loc_base + clamp(offset_y >> frac_bits_y, -2, left_count - 1)];
+      int l1 = left[loc_base + clamp((offset_y >> frac_bits_y) + 1, -2, left_count - 1)];
+      int t0 = above[loc_base + clamp(base_x, -2, above_count - 1)];
+      int t1 = above[loc_base + clamp(base_x + 1, -2, above_count - 1)];
+
+      // switch(mode)
+      //{
+      //    case (SMOOTH_V | SMOOTH_H):
+      //        w_t0 = *(sm_weight_arrays + bh + y);
+      //        w_l0 = *(sm_weight_arrays + bw + x);
+      //
+      //    case SMOOTH_V:
+      //        w_t0 = *(sm_weight_arrays + bh + y);
+      //
+      //    case SMOOTH_H:
+      //        w_l0 = *(sm_weight_arrays + bw + x);
+      //
+      //    case V:
+      //    case DIR_1:
+      //        w_t0 = 32 - shift_x;
+      //
+      //    case H:
+      //    case DIR_3:
+      //        w_l0 = 32 - shift_y;
+      //
+      //    case MODE_PAETH:
+      //        int base = t0 + l0 - topleft;
+      //        int diff_t = abs(t0 - base);
+      //        int diff_l = abs(l0 - base);
+      //        int diff_tl = abs(topleft - base);
+      //        w_t0 = diff_t < diff_l && diff_t < diff_tl;
+      //        w_l0 = diff_l < diff_t && diff_l < diff_tl;
+      //};
+
+      int mode_info = params.z & 15;
+      if (mode_info == Dir2)  // eliminate dir2
+        mode_info = base_x >= min_base_x ? Dir1 : Dir3;
+
+      int w_t0 = (mode_info & SmoothV) ? cb_sm_weight_arrays[bh + y].x : (mode_info == Dir1) ? (32 - shift_x) : 0;
+      int w_l0 = (mode_info & SmoothH) ? cb_sm_weight_arrays[bw + x].x : (mode_info == Dir3) ? (32 - shift_y) : 0;
+
+      if (mode == MODE_PAETH) {
+        int diff_t = topleft - l0;
+        int diff_l = topleft - t0;
+        int diff_tl = abs(diff_t + diff_l);
+        diff_t = abs(diff_t);
+        diff_l = abs(diff_l);
+        w_l0 = diff_l <= diff_t && diff_l <= diff_tl;
+        w_t0 = diff_t < diff_l && diff_t <= diff_tl;
+        w_l0 <<= 5;
+        w_t0 <<= 5;
+      }
+
+      int w_tl = (mode == MODE_PAETH) ? (32 - w_t0 - w_l0) : 0;
+      int w_l1 = (mode_info == Dir3) ? shift_y : 0;
+      int w_t1 = (mode_info == Dir1) ? shift_x : 0;
+      int w_b = (mode_info & SmoothV) ? 0x100 - w_t0 : 0;
+      int w_r = (mode_info & SmoothH) ? 0x100 - w_l0 : 0;
+
+      int sum = w_tl * topleft + w_b * left[loc_base + bh - 1] + w_r * above[loc_base + bw - 1] + w_l0 * l0 +
+                w_t0 * t0 + w_l1 * l1 + w_t1 * t1;
+
+      sum += 1 << (params.w - 1);
+      sum >>= params.w;
+      pixels |= clamp(sum, 0, 255) << (i * 8);
+    }
+  }
+
+  const int addr = corner + 4 + x0 + (y + 1) * stride;
+  if (block.y & 0x80000000)  // inter-intra
+  {
+    int wedge_addr = (block.z << 6) + x0 + y * bw;
+    uint wedge = wedge_mask.Load(wedge_addr);
+    uint inter = dst_frame.Load(addr);
+
+    pixels = (intra_inter_blend((wedge >> 0) & 255, (pixels >> 0) & 255, (inter >> 0) & 255) << 0) |
+             (intra_inter_blend((wedge >> 8) & 255, (pixels >> 8) & 255, (inter >> 8) & 255) << 8) |
+             (intra_inter_blend((wedge >> 16) & 255, (pixels >> 16) & 255, (inter >> 16) & 255) << 16) |
+             (intra_inter_blend((wedge >> 24) & 255, (pixels >> 24) & 255, (inter >> 24) & 255) << 24);
+  }
+  if (block.y & (1 << 5)) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_addr = cb_planes[plane].w + 2 * (bx + x0) + (by + y) * res_stride;
+    int2 res = residuals.Load2(res_addr);
+    pixels = (clamp((int)((pixels >> 0) & 255) + (int)((res.x << 16) >> 16), 0, 255) << 0) |
+             (clamp((int)((pixels >> 8) & 255) + (int)(res.x >> 16), 0, 255) << 8) |
+             (clamp((int)((pixels >> 16) & 255) + (int)((res.y << 16) >> 16), 0, 255) << 16) |
+             (clamp((int)((pixels >> 24) & 255) + (int)(res.y >> 16), 0, 255) << 24);
+  }
+
+  dst_frame.Store(addr, pixels);
+}

diff --git a/libav1/dx/shaders/intra_main_hbd.hlsl b/libav1/dx/shaders/intra_main_hbd.hlsl
new file mode 100644
index 0000000..c819dfb
--- /dev/null
+++ b/libav1/dx/shaders/intra_main_hbd.hlsl

@@ -0,0 +1,607 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+cbuffer IntraDataCommon : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_flags;
+  int4 cb_filter[5][8][2];
+  int4 cb_mode_params_lut[16][7];
+  int4 cb_sm_weight_arrays[128];
+};
+
+cbuffer PSSLIntraSRT : register(b1) {
+  int4 cb_counts0;
+  int4 cb_counts1;
+  uint cb_wi_count;
+  int cb_pass_offset;
+};
+
+ByteAddressBuffer pred_blocks : register(t0);
+ByteAddressBuffer residuals : register(t1);
+ByteAddressBuffer wedge_mask : register(t2);
+RWByteAddressBuffer dst_frame : register(u0);
+
+#define Dir1 1
+#define Dir2 2
+#define Dir3 3
+#define SmoothV 4
+#define SmoothH 8
+
+#define MODE_PAETH 11
+#define MODE_INTRA_BC 12
+#define MODE_FILTER 13
+#define MODE_DC 14
+#define MODE_CFL 15
+
+#define NeedAboveShift 4
+#define NeedRightShift 5
+#define NeedLeftShift 6
+#define NeedBotShift 7
+#define NeedAboveLeftShift 8
+#define FilterAboveLeftFlag 0x200
+#define NeedAboveLeftLUT 0x08ff
+// intra bc:
+#define SubpelBits 4
+#define FilterLineShift 3
+#define OutputShift 11
+#define OffsetBits 19
+#define SumAddHor (1 << 14)
+#define SumAddVert ((1 << OffsetBits) + (1 << (OutputShift - 1)))
+#define OutputSub ((1 << (OffsetBits - OutputShift)) + (1 << (OffsetBits - OutputShift - 1)))
+
+groupshared int above[64 * 20];
+groupshared int left[64 * 20];
+
+int compute_bc(int p0, int p1, int p2, int p3, int fh, int fv) {
+  int l0 = ((1 << 14) + p0 * fh + p1 * (128 - fh) + (1 << (FilterLineShift - 1))) >> FilterLineShift;
+  int l1 = ((1 << 14) + p2 * fh + p3 * (128 - fh) + (1 << (FilterLineShift - 1))) >> FilterLineShift;
+  int output = SumAddVert + l0 * fv + l1 * (128 - fv);
+  return clamp((output >> OutputShift) - OutputSub, 0, 1023);
+}
+
+uint compute_cfl_pixel(int dc, int value) {
+  value = (value < 0) ? -((-value + (1 << 5)) >> 6) : ((value + (1 << 5)) >> 6);
+  return clamp(dc + value, 0, 1023);
+}
+
+int intra_inter_blend(int mask, int intra, int inter) {
+  return clamp((intra * mask + inter * (64 - mask) + 32) >> 6, 0, 1023);
+}
+#define WG_SIZE 256
+[numthreads(WG_SIZE, 1, 1)] void main(uint3 thread
+                                      : SV_DispatchThreadID) {
+  int4 counts0 = cb_counts0;
+  int4 counts1 = cb_counts1;
+  int bsize_log = 10;
+  int offset = cb_pass_offset;
+  int threadx = thread.x;
+  uint3 block = uint3(0, 0, 0);
+  int wi_count = 0;
+  int wi = 1024;
+  if (thread.x < cb_wi_count) {
+    if (threadx >= (counts0.x << 10)) {
+      offset += cb_counts0.x;
+      threadx -= counts0.x << 10;
+      bsize_log = 9;
+      if (threadx >= (counts0.y << 9)) {
+        offset += cb_counts0.y;
+        threadx -= counts0.y << 9;
+        bsize_log = 8;
+        if (threadx >= (counts0.z << 8)) {
+          offset += cb_counts0.z;
+          threadx -= counts0.z << 8;
+          bsize_log = 7;
+          if (threadx >= (counts0.w << 7)) {
+            offset += cb_counts0.w;
+            threadx -= counts0.w << 7;
+            bsize_log = 6;
+            if (threadx >= (counts1.x << 6)) {
+              offset += cb_counts1.x;
+              threadx -= counts1.x << 6;
+              bsize_log = 5;
+              if (threadx >= (counts1.y << 5)) {
+                offset += cb_counts1.y;
+                threadx -= counts1.y << 5;
+                bsize_log = 4;
+                if (threadx >= (counts1.z << 4)) {
+                  offset += cb_counts1.z;
+                  threadx -= counts1.z << 4;
+                  bsize_log = 3;
+                  if (threadx >= (counts1.w << 3)) {
+                    offset += cb_counts1.w;
+                    threadx -= counts1.w << 3;
+                    bsize_log = 2;
+                  }
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+
+    int block_index = offset + (threadx >> bsize_log);
+    block = pred_blocks.Load3(block_index << 4);
+    wi_count = min(WG_SIZE, 1 << bsize_log);
+    wi = thread.x & (wi_count - 1);
+  }
+  const int bw_log = block.y & 7;  // 0..4
+  const int bw = 4 << bw_log;
+  const int bh = 1 << (bsize_log - bw_log);
+
+  int bx = (block.x & 0xffff) << 2;
+  int by = (block.x >> 16) << 2;
+
+  const int plane = (block.y >> 3) & 3;
+  const int above_available = ((block.y >> 10) & 63) << 1;
+  const int left_available = ((block.y >> 16) & 63) << 2;
+
+  const int mode = (block.y >> 6) & 15;
+
+  const int mode_angle = (block.y >> 28) & 7;
+  int4 params = cb_mode_params_lut[mode][mode_angle];
+
+  //    block.y bits:
+  //    0    3        bw_log
+  //    3    2        plane
+  //    5    1        non skip
+  //    6    4        mode
+  //        all mods except intra_bc:
+  //    10    6        above_available
+  //    16    6        left_available
+  //            dir mode params:
+  //    22    2        upsample
+  //    24    2        edge_filter above
+  //    26    2        edge_filter left
+  //    28    3        mode_angle
+  //    31    1        inter_intra?
+  //            CFL:
+  //    22    4        alpha
+  //
+  //    block.z
+  //        intra_bc - mv
+  //        inter-intra    - coef. table indexes;
+  //        filter - bh_log | filter_mode;
+  //    block.w - reserved; (prob. used for sorting);
+
+  const int stride = cb_planes[plane].x;
+  const int corner = cb_planes[plane].y + ((bx << 1) - 4) + (by - 1) * stride;
+
+  const int loc_base = (((thread.x & (WG_SIZE - 1)) - wi) >> 2) * 20 + 2;
+  int above_count = ((params.z >> NeedAboveShift) & 1) * bw + ((params.z >> NeedRightShift) & 1) * bh;
+
+  if (wi < (above_count >> 1) || wi == 0) {
+    int addr = corner + 4 + 4 * min(wi, above_available - 1);
+    uint pixels =
+        above_available ? dst_frame.Load(addr) : left_available ? dst_frame.Load(corner + stride) : 0x01ff0000;
+    if (wi >= above_available) pixels = (pixels & 0xffff0000) | (pixels >> 16);
+    above[loc_base + (wi << 1) + 0] = (pixels >> 0) & 1023;
+    above[loc_base + (wi << 1) + 1] = (pixels >> 16) & 1023;
+  }
+
+  // Left:
+  GroupMemoryBarrierWithGroupSync();
+
+  int left_count = ((params.z >> NeedLeftShift) & 1) * bh + ((params.z >> NeedBotShift) & 1) * bw;
+
+  if (wi < left_count || wi == 0) {
+    const int addr = corner + (min(wi, left_available - 1) + 1) * stride;
+    left[loc_base + wi] = left_available ? (dst_frame.Load(addr) >> 16) : above_available ? above[loc_base] : 513;
+  }
+  if (wi < (left_count - wi_count)) {
+    const int addr = corner + (min(wi + wi_count, left_available - 1) + 1) * stride;
+    left[loc_base + wi + wi_count] =
+        left_available ? (dst_frame.Load(addr) >> 16) : above_available ? above[loc_base] : 513;
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  const int need_aboveleft = (params.z >> NeedAboveLeftShift) & 1;
+  if (wi == 0 && need_aboveleft) {
+    uint t = above[loc_base];
+    uint l = left[loc_base];
+    uint topleft = (left_available && above_available) ? (dst_frame.Load(corner) >> 16)
+                                                       : left_available ? l : above_available ? t : 512;
+
+    if ((params.z & FilterAboveLeftFlag) && (bw + bh >= 24) && cb_flags.x)
+      topleft = ((l + t) * 5 + topleft * 6 + 8) >> 4;
+
+    above[loc_base - 1] = topleft;
+    left[loc_base - 1] = topleft;
+    above[loc_base - 2] = topleft;
+    left[loc_base - 2] = topleft;
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  int dir_mode = mode < MODE_DC;
+  int upsample_above = (block.y >> 22) & dir_mode;
+  int upsample_left = (block.y >> 23) & dir_mode;
+
+  int filter_count = 0;
+  if (above_available && ((params.z >> NeedAboveShift) & 1)) {
+    filter_count = min(above_available << 2, bw) + ((params.z >> NeedRightShift) & 1) * bh + need_aboveleft - 1;
+  }
+  int edge_filter = (block.y >> 24) & ((wi < filter_count && dir_mode) * 3);
+
+  int sum0 = 8;
+  int sum1 = 8;
+  const int wi1 = wi + wi_count;
+  if (edge_filter) {
+    edge_filter <<= 2;
+    const int base = loc_base - need_aboveleft;
+    const int last = filter_count;
+    //{ 0, 4, 8, 4, 0 }, { 0, 5, 6, 5, 0 }, { 2, 4, 4, 4, 2 }
+    sum0 += above[base + max(wi - 1, 0)] * ((0x2000 >> edge_filter) & 15);
+    sum0 += above[base + (wi + 0)] * ((0x4540 >> edge_filter) & 15);
+    sum0 += above[base + (wi + 1)] * ((0x4680 >> edge_filter) & 15);
+    sum0 += above[base + min(wi + 2, last)] * ((0x4540 >> edge_filter) & 15);
+    sum0 += above[base + min(wi + 3, last)] * ((0x2000 >> edge_filter) & 15);
+    if (wi1 < filter_count)  // rare, some mods for 4x4, 8x4, 4x8 blocks
+    {
+      sum1 += above[base + max(wi1 - 1, 0)] * ((0x2000 >> edge_filter) & 15);
+      sum1 += above[base + (wi1 + 0)] * ((0x4540 >> edge_filter) & 15);
+      sum1 += above[base + (wi1 + 1)] * ((0x4680 >> edge_filter) & 15);
+      sum1 += above[base + min(wi1 + 2, last)] * ((0x4540 >> edge_filter) & 15);
+      sum1 += above[base + min(wi1 + 3, last)] * ((0x2000 >> edge_filter) & 15);
+    }
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (edge_filter) {
+    above[loc_base + wi] = sum0 >> 4;
+    if (wi1 < filter_count) {
+      above[loc_base + wi1] = sum1 >> 4;
+    }
+  }
+
+  if (left_available && ((params.z >> NeedLeftShift) & 1)) {
+    filter_count = min(left_available, bh) + ((params.z >> NeedBotShift) & 1) * bw + need_aboveleft - 1;
+  }
+  edge_filter = (block.y >> 26) & ((wi < filter_count && dir_mode) * 3);
+  if (edge_filter) {
+    sum0 = 8;
+    sum1 = 8;
+    edge_filter *= 4;
+    const int base = loc_base - need_aboveleft;
+    const int last = filter_count;
+    sum0 += left[base + max(wi - 1, 0)] * ((0x2000 >> edge_filter) & 15);
+    sum0 += left[base + (wi + 0)] * ((0x4540 >> edge_filter) & 15);
+    sum0 += left[base + (wi + 1)] * ((0x4680 >> edge_filter) & 15);
+    sum0 += left[base + min(wi + 2, last)] * ((0x4540 >> edge_filter) & 15);
+    sum0 += left[base + min(wi + 3, last)] * ((0x2000 >> edge_filter) & 15);
+    if (wi1 < filter_count) {
+      sum1 += left[base + max(wi1 - 1, 0)] * ((0x2000 >> edge_filter) & 15);
+      sum1 += left[base + (wi1 + 0)] * ((0x4540 >> edge_filter) & 15);
+      sum1 += left[base + (wi1 + 1)] * ((0x4680 >> edge_filter) & 15);
+      sum1 += left[base + min(wi1 + 2, last)] * ((0x4540 >> edge_filter) & 15);
+      sum1 += left[base + min(wi1 + 3, last)] * ((0x2000 >> edge_filter) & 15);
+    }
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (edge_filter) {
+    left[loc_base + wi] = sum0 >> 4;
+    if (wi1 < filter_count) {
+      left[loc_base + wi1] = sum1 >> 4;
+    }
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  int p0 = 0, p1 = 0, p2 = 0, p3 = 0, p4 = 0;
+  int do_upsample = upsample_above && wi < (above_count >> 1);
+  if (do_upsample) {
+    p0 = above[loc_base + wi * 2 - 2];
+    p1 = above[loc_base + wi * 2 - 1];
+    p2 = above[loc_base + wi * 2 + 0];
+    p3 = above[loc_base + wi * 2 + 1];
+    p4 = above[loc_base + min(wi * 2 + 2, above_count - 1)];
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (do_upsample) {
+    above[loc_base - 1 + wi * 4] = clamp((-p0 + 9 * p1 + 9 * p2 - p3 + 8) >> 4, 0, 1023);
+    above[loc_base + 0 + wi * 4] = p2;
+    above[loc_base + 1 + wi * 4] = clamp((-p1 + 9 * p2 + 9 * p3 - p4 + 8) >> 4, 0, 1023);
+    above[loc_base + 2 + wi * 4] = p3;
+  }
+
+  do_upsample = upsample_left && wi < (left_count >> 1);
+  if (do_upsample) {
+    p0 = left[loc_base + wi * 2 - 2];
+    p1 = left[loc_base + wi * 2 - 1];
+    p2 = left[loc_base + wi * 2 + 0];
+    p3 = left[loc_base + wi * 2 + 1];
+    p4 = left[loc_base + min(wi * 2 + 2, left_count - 1)];
+  }
+  GroupMemoryBarrierWithGroupSync();
+  if (do_upsample) {
+    left[loc_base - 1 + wi * 4] = clamp((-p0 + 9 * p1 + 9 * p2 - p3 + 8) >> 4, 0, 1023);
+    left[loc_base + 0 + wi * 4] = p2;
+    left[loc_base + 1 + wi * 4] = clamp((-p1 + 9 * p2 + 9 * p3 - p4 + 8) >> 4, 0, 1023);
+    left[loc_base + 2 + wi * 4] = p3;
+  }
+  GroupMemoryBarrierWithGroupSync();
+
+  // int x0 = (wi & ((1 << bw_log) - 1)) << 2;
+  // int y = wi >> bw_log;
+  int x0 = (thread.x & ((1 << bw_log) - 1)) << 2;
+  int y = (thread.x & ((1 << bsize_log) - 1)) >> bw_log;
+
+  uint2 pixels = uint2(0, 0);
+  if (mode == MODE_INTRA_BC) {
+    int mv = (int)block.z;
+    int mvx = bx + x0 + ((mv) >> (16 + SubpelBits));
+    int mvy = by + y + ((mv << 16) >> (16 + SubpelBits));
+    const int filt_h = 128 >> ((mv >> 19) & 1);
+    const int filt_v = 128 >> ((mv >> 3) & 1);
+    int addr = cb_planes[plane].y + (mvx << 1) + mvy * stride;
+    uint3 ref0, ref1 = 0;
+
+    const uint shift = (addr & 2) << 3;
+    addr &= ~3;
+    ref0 = dst_frame.Load3(addr);
+    ref0.x = (ref0.x >> shift) | ((ref0.y << (24 - shift)) << 8);
+    ref0.y = (ref0.y >> shift) | ((ref0.z << (24 - shift)) << 8);
+    ref0.z = ref0.z >> shift;
+
+    if (filt_v != 128) {
+      ref1 = dst_frame.Load3(addr + stride);
+      ref1.x = (ref1.x >> shift) | ((ref1.y << (24 - shift)) << 8);
+      ref1.y = (ref1.y >> shift) | ((ref1.z << (24 - shift)) << 8);
+      ref1.z = ref1.z >> shift;
+    }
+
+    pixels.x = (compute_bc((ref0.x >> 0) & 1023, (ref0.x >> 16) & 1023, (ref1.x >> 0) & 1023, (ref1.x >> 16) & 1023,
+                           filt_h, filt_v)
+                << 0) |
+               (compute_bc((ref0.x >> 16) & 1023, (ref0.y >> 0) & 1023, (ref1.x >> 8) & 1023, (ref1.y >> 0) & 1023,
+                           filt_h, filt_v)
+                << 16);
+    pixels.y = (compute_bc((ref0.y >> 0) & 1023, (ref0.y >> 16) & 1023, (ref1.y >> 0) & 1023, (ref1.y >> 16) & 1023,
+                           filt_h, filt_v)
+                << 0) |
+               (compute_bc((ref0.y >> 16) & 1023, (ref0.z >> 0) & 1023, (ref1.y >> 16) & 1023, (ref1.z >> 0) & 1023,
+                           filt_h, filt_v)
+                << 16);
+  }
+
+  uint dc = 0;
+  if (mode >= MODE_DC) {
+    uint count = 0;
+    // todo: maybe some optization here?
+    if (above_available) {
+      for (int i = 0; i < bw; ++i) dc += above[loc_base + i];
+      count += bw;
+    }
+    if (left_available) {
+      for (int i = 0; i < bh; ++i) dc += left[loc_base + i];
+      count += bh;
+    }
+    dc += count >> 1;
+    dc = count ? dc / count : 512;
+
+    pixels.x = dc | (dc << 16);
+    pixels.y = pixels.x;
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  int4 luma = 0;
+  if (mode == MODE_CFL) {
+    const int max_y = (block.z >> 16) - 2;
+    const int y_stride = cb_planes[0].x;
+    const int luma_y = min((by + y) << 1, max_y);
+
+    // const int max_x = (block.z & 0xffff) - 4;
+    // const int luma_x = (bx + x0) << 1;
+    // int y_offset = cb_planes[0].y + min(luma_x, max_x) + luma_y * y_stride;
+    const int max_x = ((block.z & 0xffff) << 1) - 8;
+    const int luma_x = (bx + x0) << 2;
+    int y_offset = cb_planes[0].y + min(luma_x, max_x) + luma_y * y_stride;
+
+    uint4 luma0 = dst_frame.Load4(y_offset);
+    uint4 luma1 = dst_frame.Load4(y_offset + y_stride);
+
+    if (luma_x >= max_x) {
+      luma0.zw = luma0.yy;
+      luma1.zw = luma1.yy;
+      if (luma_x > max_x) {
+        luma0.x = luma0.y;
+        luma1.x = luma1.y;
+      }
+    }
+
+    luma.x = ((luma0.x >> 0) & 1023) + ((luma0.x >> 16) & 1023) + ((luma1.x >> 0) & 1023) + ((luma1.x >> 16) & 1023);
+    luma.y = ((luma0.y >> 0) & 1023) + ((luma0.y >> 16) & 1023) + ((luma1.y >> 0) & 1023) + ((luma1.y >> 16) & 1023);
+    luma.z = ((luma0.z >> 0) & 1023) + ((luma0.z >> 16) & 1023) + ((luma1.z >> 0) & 1023) + ((luma1.z >> 16) & 1023);
+    luma.w = ((luma0.w >> 0) & 1023) + ((luma0.w >> 16) & 1023) + ((luma1.w >> 0) & 1023) + ((luma1.w >> 16) & 1023);
+    luma <<= 1;
+    above[loc_base + wi] = luma.x + luma.y + luma.z + luma.w;
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+
+  if (mode == MODE_CFL)  // reduce
+  {
+    int count = wi_count >> 2;
+    // max cfl block  32x32    /    4(pixels) = 256 wi (wi_count);
+    // count = 256wi / 4  = up to 64wi == wavefront size, => we can use GroupMemoryBarrier();
+    int ofs = loc_base + wi * 4;
+    if (wi < count) {
+      int sum = above[ofs + 0] + above[ofs + 1] + above[ofs + 2] + above[ofs + 3];
+      GroupMemoryBarrier();
+      above[loc_base + wi] = sum;
+      GroupMemoryBarrier();
+    }
+    ofs = loc_base + wi * 2;
+    count >>= 1;
+    while (wi < count) {
+      int sum = above[ofs + 0] + above[ofs + 1];
+      GroupMemoryBarrier();
+      above[loc_base + wi] = sum;
+      GroupMemoryBarrier();
+      count >>= 1;
+    }
+  }
+
+  GroupMemoryBarrierWithGroupSync();
+  if (thread.x >= cb_wi_count) return;
+
+  if (mode == MODE_CFL) {
+    int avrg = (above[loc_base] + (2 << bsize_log)) >> (bsize_log + 2);
+    int alpha = ((block.y >> 22) & 63) - 16;
+    pixels.x = compute_cfl_pixel(dc, (luma.x - avrg) * alpha);
+    pixels.x |= compute_cfl_pixel(dc, (luma.y - avrg) * alpha) << 16;
+    pixels.y = compute_cfl_pixel(dc, (luma.z - avrg) * alpha);
+    pixels.y |= compute_cfl_pixel(dc, (luma.w - avrg) * alpha) << 16;
+  }
+
+  if (mode < MODE_INTRA_BC) {
+    const int frac_bits_x = 6 - upsample_above;
+    const int frac_bits_y = 6 - upsample_left;
+    const int min_base_x = -(1 << upsample_above);
+
+    above_count <<= upsample_above;
+    left_count <<= upsample_left;
+    uint topleft = above[loc_base - 1];
+
+    for (int i = 0; i < 4; ++i) {
+      int x = i + x0;
+      ////1:
+      // int offset_x = (1 + y) * dx;
+      // int base_x = offset_x >> frac_bits_x;
+      //
+      ////2:
+      // int offset_x = (x << 6) - (y + 1) * dx;
+      // const int base_x = offset_x >> frac_bits_x;
+      //
+      // int offset_y = (y << 6) - (x + 1) * dy;
+      // const int base_y = offset_y >> frac_bits_y;
+      //
+      ////3:
+      // int offset_y = (1 + x) * dy;
+      // int base_y = offset_x >> frac_bits_x;
+      //
+      // int shift_y = ((offset_y << upsample_above) & 0x3F) >> 1;
+      // int shift_x = ((offset_x << upsample_above) & 0x3F) >> 1;
+      // Note:     for non dir modes:
+      //        dx = 0; dy = 0;     upsample_above = 0;     upsample_left = 0;
+
+      const int offset_x = (x << 6) + (y + 1) * params.x;
+      const int offset_y = (y << 6) + (x + 1) * params.y;
+      const int shift_x = ((offset_x << upsample_above) & 0x3F) >> 1;
+      const int shift_y = ((offset_y << upsample_left) & 0x3F) >> 1;
+
+      // Note: shift_x = 0 & shift_y = 0 if (dx == 0) & (dy == 0);
+      // maybe this can be used to simplify weight calc;
+
+      const int base_x = offset_x >> frac_bits_x;
+      int l0 = left[loc_base + clamp(offset_y >> frac_bits_y, -2, left_count - 1)];
+      int l1 = left[loc_base + clamp((offset_y >> frac_bits_y) + 1, -2, left_count - 1)];
+      int t0 = above[loc_base + clamp(base_x, -2, above_count - 1)];
+      int t1 = above[loc_base + clamp(base_x + 1, -2, above_count - 1)];
+
+      // switch(mode)
+      //{
+      //    case (SMOOTH_V | SMOOTH_H):
+      //        w_t0 = *(sm_weight_arrays + bh + y);
+      //        w_l0 = *(sm_weight_arrays + bw + x);
+      //
+      //    case SMOOTH_V:
+      //        w_t0 = *(sm_weight_arrays + bh + y);
+      //
+      //    case SMOOTH_H:
+      //        w_l0 = *(sm_weight_arrays + bw + x);
+      //
+      //    case V:
+      //    case DIR_1:
+      //        w_t0 = 32 - shift_x;
+      //
+      //    case H:
+      //    case DIR_3:
+      //        w_l0 = 32 - shift_y;
+      //
+      //    case MODE_PAETH:
+      //        int base = t0 + l0 - topleft;
+      //        int diff_t = abs(t0 - base);
+      //        int diff_l = abs(l0 - base);
+      //        int diff_tl = abs(topleft - base);
+      //        w_t0 = diff_t < diff_l && diff_t < diff_tl;
+      //        w_l0 = diff_l < diff_t && diff_l < diff_tl;
+      //};
+
+      int mode_info = params.z & 15;
+      if (mode_info == Dir2)  // eliminate dir2
+        mode_info = base_x >= min_base_x ? Dir1 : Dir3;
+
+      int w_t0 = (mode_info & SmoothV) ? cb_sm_weight_arrays[bh + y].x : (mode_info == Dir1) ? (32 - shift_x) : 0;
+      int w_l0 = (mode_info & SmoothH) ? cb_sm_weight_arrays[bw + x].x : (mode_info == Dir3) ? (32 - shift_y) : 0;
+
+      if (mode == MODE_PAETH) {
+        int diff_t = topleft - l0;
+        int diff_l = topleft - t0;
+        int diff_tl = abs(diff_t + diff_l);
+        diff_t = abs(diff_t);
+        diff_l = abs(diff_l);
+        w_l0 = diff_l <= diff_t && diff_l <= diff_tl;
+        w_t0 = diff_t < diff_l && diff_t <= diff_tl;
+        w_l0 <<= 5;
+        w_t0 <<= 5;
+      }
+
+      int w_tl = (mode == MODE_PAETH) ? (32 - w_t0 - w_l0) : 0;
+      int w_l1 = (mode_info == Dir3) ? shift_y : 0;
+      int w_t1 = (mode_info == Dir1) ? shift_x : 0;
+      int w_b = (mode_info & SmoothV) ? 0x100 - w_t0 : 0;
+      int w_r = (mode_info & SmoothH) ? 0x100 - w_l0 : 0;
+
+      int sum = w_tl * topleft + w_b * left[loc_base + bh - 1] + w_r * above[loc_base + bw - 1] + w_l0 * l0 +
+                w_t0 * t0 + w_l1 * l1 + w_t1 * t1;
+
+      sum += 1 << (params.w - 1);
+      sum >>= params.w;
+      int val = clamp(sum, 0, 1023) << ((i & 1) * 16);
+      if (i & 2)
+        pixels.y |= val;
+      else
+        pixels.x |= val;
+    }
+  }
+
+  const int addr = corner + 4 + (x0 << 1) + (y + 1) * stride;
+  if (block.y & 0x80000000)  // inter-intra
+  {
+    int wedge_addr = (block.z << 6) + x0 + y * bw;
+    uint wedge = wedge_mask.Load(wedge_addr);
+    uint2 inter = dst_frame.Load2(addr);
+
+    pixels.x = intra_inter_blend((wedge >> 0) & 255, (pixels.x >> 0) & 1023, (inter.x >> 0) & 1023) |
+               (intra_inter_blend((wedge >> 8) & 255, (pixels.x >> 16) & 1023, (inter.x >> 16) & 1023) << 16);
+    pixels.y = intra_inter_blend((wedge >> 16) & 255, (pixels.y >> 0) & 1023, (inter.y >> 0) & 1023) |
+               (intra_inter_blend((wedge >> 24) & 255, (pixels.y >> 16) & 1023, (inter.y >> 16) & 1023) << 16);
+  }
+
+  if (block.y & (1 << 5)) {
+    const int res_stride = cb_planes[plane].z;
+    const int res_addr = cb_planes[plane].w + ((bx + x0) << 1) + (by + y) * res_stride;
+    int2 res = residuals.Load2(res_addr);
+    pixels.x = clamp((int)((pixels.x >> 0) & 1023) + (int)((res.x << 16) >> 16), 0, 1023) |
+               (clamp((int)((pixels.x >> 16) & 1023) + (int)(res.x >> 16), 0, 1023) << 16);
+    pixels.y = clamp((int)((pixels.y >> 0) & 1023) + (int)((res.y << 16) >> 16), 0, 1023) |
+               (clamp((int)((pixels.y >> 16) & 1023) + (int)(res.y >> 16), 0, 1023) << 16);
+  }
+
+  dst_frame.Store2(addr, pixels);
+}

diff --git a/libav1/dx/shaders/loop_restoration.hlsl b/libav1/dx/shaders/loop_restoration.hlsl
new file mode 100644
index 0000000..989b417
--- /dev/null
+++ b/libav1/dx/shaders/loop_restoration.hlsl

@@ -0,0 +1,353 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma warning(disable : 3557)
+#include "restoration.h"
+
+#define MI_SIZE 4
+#define MI_SIZE_LOG2 2
+#define Round2(value, n) (((value) + (((1 << (n)) >> 1))) >> (n))
+#define STRIPE_SIZE 64
+#define RESTORE_NONE 0
+#define RESTORE_WIENER 1
+#define RESTORE_SGRPROJ 2
+#define RESTORE_SWITCHABLE 3
+#define RESTORE_SWITCHABLE_TYPES RESTORE_SWITCHABLE
+#define RESTORE_TYPES 4
+#define WIENER_ROUND0_BITS 3
+#define FILTER_BITS 7
+#define InterRound0 WIENER_ROUND0_BITS
+#define InterRound1 (2 * FILTER_BITS - WIENER_ROUND0_BITS)
+
+ByteAddressBuffer src : register(t0);        // SRV
+ByteAddressBuffer LrTypeSgr : register(t1);  // SRV
+ByteAddressBuffer LrWiener : register(t2);   // SRV
+RWByteAddressBuffer dst : register(u0);      // UAV
+
+cbuffer PlaneRestorationData : register(b0) {
+  struct {
+    int4 Sgr_Params[16];
+  } data;
+};
+
+struct PlaneInfo {
+  int stride;
+  int offset;
+  int width;
+  int height;
+};
+
+struct UnitsInfo {
+  int Rows;
+  int Cols;
+  int Size;
+  int Stride;
+};
+
+struct PlaneRestorationData {
+  PlaneInfo plane;
+  UnitsInfo units;
+
+  int pp_offset;
+  int dst_offset;
+  int Lr_buffer_offset;
+  int subsampling;
+  int hbd;
+  int bit_depth;
+  int2 pad;
+};
+cbuffer PlaneRestorationConstBuffer : register(b1) { PlaneRestorationData pl[3]; };
+
+cbuffer cb_loop_rest_data : register(b2) {
+  int do_restoration;
+  int plane_id;
+}
+
+#define WG_WIDTH 16
+#define WG_HEIGHT 4
+
+groupshared int input[WG_HEIGHT + 6][WG_WIDTH + 8];
+groupshared int output[WG_HEIGHT][WG_WIDTH];
+groupshared int intermediate[WG_HEIGHT + 6][WG_WIDTH];
+
+groupshared int flt[2][WG_HEIGHT][WG_WIDTH];
+groupshared int A[WG_HEIGHT + 2][WG_WIDTH + 2];
+groupshared int B[WG_HEIGHT + 2][WG_WIDTH + 2];
+
+#define get_loaded_source_sample(x, y) input[y + 3][x + 4]
+void box_filter0(int w, int h, int r, int eps, int lx, int ly, int bit_depth) {
+  uint n = (2 * r + 1) * (2 * r + 1);
+  int i;
+  for (i = ly - 1; i < h + 1; i += WG_HEIGHT) {
+    for (int j = lx - 1; j < w + 1; j += WG_WIDTH) {
+      uint a = 0;
+      uint b = 0;
+      for (int dy = -r; dy <= r; dy++) {
+        for (int dx = -r; dx <= r; dx++) {
+          uint c = get_loaded_source_sample(j + dx, i + dy);
+          a += c * c;
+          b += c;
+        }
+      }
+      a = Round2(a, 2 * (bit_depth - 8));
+      uint d = Round2(b, bit_depth - 8);
+      uint p = max(0, int(a * n - d * d));
+      uint z = Round2(p * eps, SGRPROJ_MTABLE_BITS);  // p*s in documentation
+      z = min(z, 255);
+      // int a2 = x_by_xplus1[z];
+      uint a2 = 0;
+      if (z >= 255)
+        a2 = 256;
+      else if (z == 0)
+        a2 = 1;
+      else
+        a2 = ((z << SGRPROJ_SGR_BITS) + (z >> 1)) / (z + 1);
+      uint oneOverN = ((1 << SGRPROJ_RECIP_BITS) + (n >> 1)) / n;
+      uint b2 = ((1 << SGRPROJ_SGR_BITS) - a2) * b * oneOverN;
+      A[1 + i][1 + j] = a2;
+      B[1 + i][1 + j] = Round2(b2, SGRPROJ_RECIP_BITS);
+    }
+  }
+  for (i = ly; i < h; i += WG_HEIGHT) {
+    int shift = 5;  // -((1 - stage) * (i & 1));
+    if (i & 1) {
+      shift = 4;
+    }
+    for (int j = lx; j < w; j += WG_WIDTH) {
+      int a = 0;
+      int b = 0;
+      for (int dy = -1; dy <= 1; dy++) {
+        for (int dx = -1; dx <= 1; dx++) {
+          int weight = 0;
+          if ((i + dy) & 1) {
+            weight = (dx == 0) ? 6 : 5;
+          } else {
+            weight = 0;
+          }
+          a += weight * A[1 + i + dy][1 + j + dx];
+          b += weight * B[1 + i + dy][1 + j + dx];
+        }
+      }
+      int v = a * get_loaded_source_sample(j, i) + b;
+      flt[0][i][j] = Round2(v, SGRPROJ_SGR_BITS + shift - SGRPROJ_RST_BITS);
+    }
+  }
+}
+
+void box_filter1(int w, int h, int r, int eps, int lx, int ly, int bit_depth) {
+  uint n = (2 * r + 1) * (2 * r + 1);
+  int i;
+  for (i = ly - 1; i < h + 1; i += WG_HEIGHT) {
+    for (int j = lx - 1; j < w + 1; j += WG_WIDTH) {
+      uint a = 0;
+      uint b = 0;
+      for (int dy = -r; dy <= r; dy++) {
+        for (int dx = -r; dx <= r; dx++) {
+          uint c = get_loaded_source_sample(j + dx, i + dy);
+          a += c * c;
+          b += c;
+        }
+      }
+      a = Round2(a, 2 * (bit_depth - 8));
+      uint d = Round2(b, bit_depth - 8);
+      uint p = max(0, int(a * n - d * d));
+      uint z = Round2(p * eps, SGRPROJ_MTABLE_BITS);  // p*s in documentation
+      z = min(z, 255);
+      uint a2 = 0;
+      if (z >= 255)
+        a2 = 256;
+      else if (z == 0)
+        a2 = 1;
+      else
+        a2 = ((z << SGRPROJ_SGR_BITS) + (z >> 1)) / (z + 1);
+      uint oneOverN = ((1 << SGRPROJ_RECIP_BITS) + (n >> 1)) / n;
+      uint b2 = ((1 << SGRPROJ_SGR_BITS) - a2) * b * oneOverN;
+      A[1 + i][1 + j] = a2;
+      B[1 + i][1 + j] = Round2(b2, SGRPROJ_RECIP_BITS);
+    }
+  }
+  for (i = ly; i < h; i += WG_HEIGHT) {
+    int shift = 5;  // -((1 - stage) * (i & 1));
+    for (int j = lx; j < w; j += WG_WIDTH) {
+      int a = 0;
+      int b = 0;
+      for (int dy = -1; dy <= 1; dy++) {
+        for (int dx = -1; dx <= 1; dx++) {
+          int weight = 0;
+          weight = (dx == 0 || dy == 0) ? 4 : 3;
+          a += weight * A[1 + i + dy][1 + j + dx];
+          b += weight * B[1 + i + dy][1 + j + dx];
+        }
+      }
+      int v = a * get_loaded_source_sample(j, i) + b;
+      flt[1][i][j] = Round2(v, SGRPROJ_SGR_BITS + shift - SGRPROJ_RST_BITS);
+    }
+  }
+}
+
+[numthreads(WG_WIDTH, WG_HEIGHT, 1)] void main(uint3 thread
+                                               : SV_DispatchThreadID) {
+  const int gx = thread.x & (~(WG_WIDTH - 1));
+  const int gy = thread.y & (~(WG_HEIGHT - 1));
+  const int lx = thread.x & (WG_WIDTH - 1);
+  const int ly = thread.y & (WG_HEIGHT - 1);
+
+  const int subsampling = pl[plane_id].subsampling;
+  const int bit_depth = pl[plane_id].bit_depth;
+
+  // Load block
+  int stripe_id = uint((gy << subsampling) + 8) / STRIPE_SIZE;
+  int StripeStartY = ((stripe_id * STRIPE_SIZE) - 8) >> subsampling;
+  int StripeEndY = StripeStartY + (STRIPE_SIZE >> subsampling) - 1;
+
+  PlaneInfo plane = pl[plane_id].plane;
+  int block_offset = 0;
+
+  const int dst_offset = pl[plane_id].dst_offset;
+
+  UnitsInfo units = pl[plane_id].units;
+  int unitCol = min(units.Cols - 1, uint(gx + 1) / units.Size);
+  int unitRow = min(units.Rows - 1, uint((gy + 1 + (8 >> subsampling))) / units.Size);
+  int unitId = unitRow * units.Stride + unitCol + pl[plane_id].Lr_buffer_offset;
+  int4 rType = int4(RESTORE_NONE, 0, 0, 0);
+  if (do_restoration) rType = LrTypeSgr.Load4(unitId * 16);
+
+  if (rType.x == RESTORE_NONE || !do_restoration) {
+    if ((thread.x & 3) == 0) {
+      if (pl[plane_id].hbd) {
+        uint2 input_char = src.Load2(plane.offset + plane.stride * thread.y + (thread.x << 1));
+        dst.Store2(dst_offset + plane.stride * thread.y + (thread.x << 1), input_char);
+      } else {
+        uint input_char = src.Load(plane.offset + plane.stride * thread.y + thread.x);
+        dst.Store(dst_offset + plane.stride * thread.y + thread.x, input_char);
+      }
+    }
+    return;
+  }
+
+  for (int y = ly; y < WG_HEIGHT + 6; y += WG_HEIGHT)
+    for (int x = lx; x < WG_WIDTH / 4 + 2; x += WG_WIDTH) {
+      int c_y = clamp(y + gy - 3, StripeStartY - 2, StripeEndY + 2);
+      c_y = clamp(c_y, 0, plane.height - 1);
+      int nc_x = (x << 2) + gx - 4;
+      int c_x = clamp(nc_x, 0, plane.width - 1) & (~3);
+      int offset = (c_y < StripeStartY || c_y > StripeEndY) ? pl[plane_id].pp_offset : plane.offset;
+
+      if (pl[plane_id].hbd) {
+        int shift_max = nc_x < 0 ? 0 : (plane.width - c_x - 1) * 16;
+        int shift_min = (nc_x - c_x) >= 4 ? shift_max : 0;
+        block_offset = c_y * plane.stride + (c_x << 1);
+
+        uint2 input_char = src.Load2(offset + block_offset);
+        uint4 shift = uint4(clamp(0, shift_min, shift_max), clamp(16, shift_min, shift_max),
+                            clamp(32, shift_min, shift_max), clamp(48, shift_min, shift_max));
+        input[y][x * 4 + 0] = (shift.x > 16 ? (input_char.y >> (shift.x - 32)) : (input_char.x >> shift.x)) & 0xffff;
+        input[y][x * 4 + 1] = (shift.y > 16 ? (input_char.y >> (shift.y - 32)) : (input_char.x >> shift.y)) & 0xffff;
+        input[y][x * 4 + 2] = (shift.z > 16 ? (input_char.y >> (shift.z - 32)) : (input_char.x >> shift.z)) & 0xffff;
+        input[y][x * 4 + 3] = (shift.w > 16 ? (input_char.y >> (shift.w - 32)) : (input_char.x >> shift.w)) & 0xffff;
+      } else {
+        int shift_max = nc_x < 0 ? 0 : (plane.width - c_x - 1) * 8;
+        int shift_min = (nc_x - c_x) >= 4 ? shift_max : 0;
+        block_offset = c_y * plane.stride + c_x;
+        uint input_char = src.Load(offset + block_offset);
+        input[y][x * 4 + 0] = (input_char >> clamp(0, shift_min, shift_max)) & 255;
+        input[y][x * 4 + 1] = (input_char >> clamp(8, shift_min, shift_max)) & 255;
+        input[y][x * 4 + 2] = (input_char >> clamp(16, shift_min, shift_max)) & 255;
+        input[y][x * 4 + 3] = (input_char >> clamp(24, shift_min, shift_max)) & 255;
+      }
+    }
+
+  GroupMemoryBarrier();
+
+  if (rType.x == RESTORE_WIENER) {
+    int limit = (1 << (bit_depth + 1 + FILTER_BITS - InterRound0)) - 1;
+
+    int Lr_offset = unitId * 64;
+    int4 hfilter0 = LrWiener.Load4(Lr_offset + 0);
+    int4 hfilter1 = LrWiener.Load4(Lr_offset + 16);
+
+    for (int r = ly; r < WG_HEIGHT + 6; r += WG_HEIGHT) {
+      int s = (input[r][lx + 4] << 7) + (1 << (bit_depth + 7 - 1));
+
+      s += hfilter0.x * input[r][lx + 1];
+      s += hfilter0.y * input[r][lx + 2];
+      s += hfilter0.z * input[r][lx + 3];
+      s += hfilter0.w * input[r][lx + 4];
+      s += hfilter1.x * input[r][lx + 5];
+      s += hfilter1.y * input[r][lx + 6];
+      s += hfilter1.z * input[r][lx + 7];
+
+      int v = Round2(s, InterRound0);
+      intermediate[r][lx] = clamp(v, 0, limit);
+    }
+
+    int4 vfilter0 = LrWiener.Load4(Lr_offset + 32);
+    int4 vfilter1 = LrWiener.Load4(Lr_offset + 48);
+
+    int s = (intermediate[ly + 3][lx] << 7) - (1 << (bit_depth + InterRound1 - 1));
+
+    s += vfilter0.x * intermediate[ly + 0][lx];
+    s += vfilter0.y * intermediate[ly + 1][lx];
+    s += vfilter0.z * intermediate[ly + 2][lx];
+    s += vfilter0.w * intermediate[ly + 3][lx];
+    s += vfilter1.x * intermediate[ly + 4][lx];
+    s += vfilter1.y * intermediate[ly + 5][lx];
+    s += vfilter1.z * intermediate[ly + 6][lx];
+    int v = Round2(s, InterRound1);
+    output[ly][lx] = clamp(v, 0, (1 << bit_depth) - 1);  // input[ly][lx + 4];//
+    if (lx < WG_WIDTH / 4) {
+      if (pl[plane_id].hbd) {
+        dst.Store2(dst_offset + (gy + ly) * plane.stride + (gx + lx * 4) * 2,
+                   uint2((output[ly][lx * 4 + 0] << 0) | (output[ly][lx * 4 + 1] << 16),
+                         (output[ly][lx * 4 + 2] << 0) | (output[ly][lx * 4 + 3] << 16)));
+      } else {
+        dst.Store(dst_offset + (gy + ly) * plane.stride + gx + lx * 4,
+                  (output[ly][lx * 4 + 0] << 0) | (output[ly][lx * 4 + 1] << 8) | (output[ly][lx * 4 + 2] << 16) |
+                      (output[ly][lx * 4 + 3] << 24));
+      }
+    }
+  } else if (rType.x == RESTORE_SGRPROJ) {
+    const int w0 = rType.y;
+    const int w1 = rType.z;
+    const int w2 = (1 << SGRPROJ_PRJ_BITS) - w0 - w1;
+
+    int r0 = data.Sgr_Params[rType.w].x;
+    int r1 = data.Sgr_Params[rType.w].y;
+    int eps0 = data.Sgr_Params[rType.w].z;
+    int eps1 = data.Sgr_Params[rType.w].w;
+
+    box_filter0(WG_WIDTH, WG_HEIGHT, r0, eps0, lx, ly, bit_depth);
+    box_filter1(WG_WIDTH, WG_HEIGHT, r1, eps1, lx, ly, bit_depth);
+
+    int u = input[ly + 3][lx + 4] << SGRPROJ_RST_BITS;
+    int v = w1 * u;
+    v += w0 * (r0 ? flt[0][ly][lx] : u);
+    v += w2 * (r1 ? flt[1][ly][lx] : u);
+    int s = Round2(v, (SGRPROJ_RST_BITS + SGRPROJ_PRJ_BITS));
+    output[ly][lx] = clamp(s, 0, (1 << bit_depth) - 1);
+    if (lx < WG_WIDTH / 4) {
+      if (pl[plane_id].hbd) {
+        dst.Store2(dst_offset + (gy + ly) * plane.stride + (gx + lx * 4) * 2,
+                   uint2((output[ly][lx * 4 + 0] << 0) | (output[ly][lx * 4 + 1] << 16),
+                         (output[ly][lx * 4 + 2] << 0) | (output[ly][lx * 4 + 3] << 16)));
+      } else {
+        dst.Store(dst_offset + (gy + ly) * plane.stride + gx + lx * 4,
+                  (output[ly][lx * 4 + 0] << 0) | (output[ly][lx * 4 + 1] << 8) | (output[ly][lx * 4 + 2] << 16) |
+                      (output[ly][lx * 4 + 3] << 24));
+      }
+    }
+  }
+}

diff --git a/libav1/dx/shaders/loopfilter_hor.hlsl b/libav1/dx/shaders/loopfilter_hor.hlsl
new file mode 100644
index 0000000..2aad708
--- /dev/null
+++ b/libav1/dx/shaders/loopfilter_hor.hlsl

@@ -0,0 +1,232 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define BlockSize 32
+#define Stride64 64
+
+groupshared uint input_edges[16 * (64 / 4)];
+groupshared uint shared_mem[64 * 16];
+
+void filter_horizontal_edge(RWByteAddressBuffer buffer, uint offset, uint stride, int4 limits, uint type, int wi) {
+  const int shift = (offset & 3) * 8;
+  offset &= ~3;
+  int4 p0 = int4(0, 0, 0, 0);
+  int4 q1 = int4(0, 0, 0, 0);
+  int4 p1, q0;
+  if (type == 3) {
+    p0.x = (buffer.Load(offset - 8 * stride) >> shift) & 0xff;
+    p0.y = (buffer.Load(offset - 7 * stride) >> shift) & 0xff;
+    p0.z = (buffer.Load(offset - 6 * stride) >> shift) & 0xff;
+    p0.w = (buffer.Load(offset - 5 * stride) >> shift) & 0xff;
+  }
+  p1.x = (buffer.Load(offset - 4 * stride) >> shift) & 0xff;
+  p1.y = (buffer.Load(offset - 3 * stride) >> shift) & 0xff;
+  p1.z = (buffer.Load(offset - 2 * stride) >> shift) & 0xff;
+  p1.w = (buffer.Load(offset - 1 * stride) >> shift) & 0xff;
+  q0.x = (buffer.Load(offset) >> shift) & 0xff;
+  q0.y = (buffer.Load(offset + 1 * stride) >> shift) & 0xff;
+  q0.z = (buffer.Load(offset + 2 * stride) >> shift) & 0xff;
+  q0.w = (buffer.Load(offset + 3 * stride) >> shift) & 0xff;
+
+  if (type == 3) {
+    q1.x = (buffer.Load(offset + 4 * stride) >> shift) & 0xff;
+    q1.y = (buffer.Load(offset + 5 * stride) >> shift) & 0xff;
+    q1.z = (buffer.Load(offset + 6 * stride) >> shift) & 0xff;
+    q1.w = (buffer.Load(offset + 7 * stride) >> shift) & 0xff;
+  }
+
+  shared_mem[Stride64 * 0 + wi] = p0.x;
+  shared_mem[Stride64 * 1 + wi] = p0.y;
+  shared_mem[Stride64 * 2 + wi] = p0.z;
+  shared_mem[Stride64 * 3 + wi] = p0.w;
+  shared_mem[Stride64 * 4 + wi] = p1.x;
+  shared_mem[Stride64 * 5 + wi] = p1.y;
+  shared_mem[Stride64 * 6 + wi] = p1.z;
+  shared_mem[Stride64 * 7 + wi] = p1.w;
+  shared_mem[Stride64 * 8 + wi] = q0.x;
+  shared_mem[Stride64 * 9 + wi] = q0.y;
+  shared_mem[Stride64 * 10 + wi] = q0.z;
+  shared_mem[Stride64 * 11 + wi] = q0.w;
+  shared_mem[Stride64 * 12 + wi] = q1.x;
+  shared_mem[Stride64 * 13 + wi] = q1.y;
+  shared_mem[Stride64 * 14 + wi] = q1.z;
+  shared_mem[Stride64 * 15 + wi] = q1.w;
+
+  int mask = abs(p1.z - p1.w) <= limits.x && abs(q0.x - q0.y) <= limits.x &&
+             abs(p1.w - q0.x) * 2 + (abs(p1.z - q0.y) >> 1) <= limits.y;
+
+  mask &= (abs(p1.y - p1.z) <= limits.x && abs(q0.y - q0.z) <= limits.x) | (type == 0);
+  mask &= (abs(p1.x - p1.y) <= limits.x && abs(q0.z - q0.w) <= limits.x) | (type <= 1);
+
+  if (mask) {
+    int flat_uv = abs(p1.y - p1.w) <= 1 && abs(p1.z - p1.w) <= 1 && abs(q0.y - q0.x) <= 1 && abs(q0.z - q0.x) <= 1;
+    int flat = flat_uv && abs(p1.x - p1.w) <= 1 && abs(q0.w - q0.x) <= 1;
+    int flat2 = abs(p0.y - p1.w) <= 1 && abs(p0.z - p1.w) <= 1 && abs(p0.w - p1.w) <= 1 && abs(q1.x - q0.x) <= 1 &&
+                abs(q1.y - q0.x) <= 1 && abs(q1.z - q0.x) <= 1;
+
+    if (type == 3 && flat && flat2) {
+      shared_mem[Stride64 * 2 + wi] =
+          (p0.y * 7 + p0.z * 2 + p0.w * 2 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + 8) >> 4;
+      shared_mem[Stride64 * 3 + wi] =
+          (p0.y * 5 + p0.z * 2 + p0.w * 2 + p1.x * 2 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + 8) >> 4;
+      shared_mem[Stride64 * 4 + wi] = (p0.y * 4 + p0.z * 1 + p0.w * 2 + p1.x * 2 + p1.y * 2 + p1.z * 1 + p1.w * 1 +
+                                       q0.x * 1 + q0.y * 1 + q0.z * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 5 + wi] = (p0.y * 3 + p0.z * 1 + p0.w * 1 + p1.x * 2 + p1.y * 2 + p1.z * 2 + p1.w * 1 +
+                                       q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 6 + wi] = (p0.y * 2 + p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 2 + p1.z * 2 + p1.w * 2 +
+                                       q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 1 + q1.x * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 7 + wi] = (p0.y * 1 + p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 2 + p1.w * 2 +
+                                       q0.x * 2 + q0.y * 1 + q0.z * 1 + q0.w * 1 + q1.x * 1 + q1.y * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 8 + wi] = (p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 2 + q0.x * 2 +
+                                       q0.y * 2 + q0.z * 1 + q0.w * 1 + q1.x * 1 + q1.y * 1 + q1.z * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 9 + wi] = (p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 2 + q0.y * 2 +
+                                       q0.z * 2 + q0.w * 1 + q1.x * 1 + q1.y * 1 + q1.z * 2 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 10 + wi] = (p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 2 + q0.z * 2 +
+                                        q0.w * 2 + q1.x * 1 + q1.y * 1 + q1.z * 3 + 8) >>
+                                       4;
+      shared_mem[Stride64 * 11 + wi] = (p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 2 + q0.w * 2 +
+                                        q1.x * 2 + q1.y * 1 + q1.z * 4 + 8) >>
+                                       4;
+      shared_mem[Stride64 * 12 + wi] =
+          (p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 2 + q1.x * 2 + q1.y * 2 + q1.z * 5 + 8) >> 4;
+      shared_mem[Stride64 * 13 + wi] =
+          (p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 1 + q1.x * 2 + q1.y * 2 + q1.z * 7 + 8) >> 4;
+    } else if (type >= 2 && flat) {
+      int v = p1.x * 3 + p1.y + p1.z + p1.w + q0.x;
+      shared_mem[Stride64 * 5 + wi] = (v + p1.y + 4) >> 3;
+      v += -p1.x + q0.y;
+      shared_mem[Stride64 * 6 + wi] = (v + p1.z + 4) >> 3;
+      v += -p1.x + q0.z;
+      shared_mem[Stride64 * 7 + wi] = (v + p1.w + 4) >> 3;
+      v += -p1.x + q0.w;
+      shared_mem[Stride64 * 8 + wi] = (v + q0.x + 4) >> 3;
+      v += -p1.y + q0.w;
+      shared_mem[Stride64 * 9 + wi] = (v + q0.y + 4) >> 3;
+      v += -p1.z + q0.w;
+      shared_mem[Stride64 * 10 + wi] = (v + q0.z + 4) >> 3;
+    } else if (type == 1 && flat_uv) {
+      // 5-tap filter [1, 2, 2, 2, 1]
+      shared_mem[Stride64 * 6 + wi] = (p1.y * 3 + p1.z * 2 + p1.w * 2 + q0.x + 4) >> 3;
+      shared_mem[Stride64 * 7 + wi] = (p1.y + p1.z * 2 + p1.w * 2 + q0.x * 2 + q0.y + 4) >> 3;
+      shared_mem[Stride64 * 8 + wi] = (p1.z + p1.w * 2 + q0.x * 2 + q0.y * 2 + q0.z + 4) >> 3;
+      shared_mem[Stride64 * 9 + wi] = (p1.w + q0.x * 2 + q0.y * 2 + q0.z * 3 + 4) >> 3;
+    } else {
+      uint hev = (abs(p1.w - p1.z) > limits.z || abs(q0.x - q0.y) > limits.z) ? 0xffffffff : 0;
+      int ps1 = p1.z - 128;
+      int ps0 = p1.w - 128;
+      int qs0 = q0.x - 128;
+      int qs1 = q0.y - 128;
+
+      int f0 = clamp(ps1 - qs1, -128, 127) & hev;
+      f0 = clamp(f0 + 3 * (qs0 - ps0), -128, 124);
+      int f1 = min(f0 + 4, 127) >> 3;
+      int f2 = (f0 + 3) >> 3;
+
+      shared_mem[Stride64 * 7 + wi] = clamp(ps0 + f2, -128, 127) + 128;
+      shared_mem[Stride64 * 8 + wi] = (clamp(qs0 - f1, -128, 127) + 128);
+
+      f0 = ((f1 + 1) >> 1) & (~hev);
+
+      shared_mem[Stride64 * 6 + wi] = clamp(ps1 + f0, -128, 127) + 128;
+      shared_mem[Stride64 * 9 + wi] = clamp(qs1 - f0, -128, 127) + 128;
+    }
+  }
+
+  GroupMemoryBarrier();
+
+  const int wi4 = wi & 3;
+  const int base = (wi - wi4) + wi4 * Stride64;
+  const int dst_offset = offset + wi4 * stride;
+
+  if (type == 3 && wi4 >= 2) {
+    buffer.Store(dst_offset - 8 * stride,
+                 shared_mem[base + Stride64 * 0 + 0] | shared_mem[base + Stride64 * 0 + 1] << 8 |
+                     shared_mem[base + Stride64 * 0 + 2] << 16 | shared_mem[base + Stride64 * 0 + 3] << 24);
+  }
+  if (type > 1 || wi4 >= 2) {
+    buffer.Store(dst_offset - 4 * stride,
+                 shared_mem[base + Stride64 * 4 + 0] | shared_mem[base + Stride64 * 4 + 1] << 8 |
+                     shared_mem[base + Stride64 * 4 + 2] << 16 | shared_mem[base + Stride64 * 4 + 3] << 24);
+  }
+  if (type > 1 || wi4 < 2) {
+    buffer.Store(dst_offset - 0 * stride,
+                 shared_mem[base + Stride64 * 8 + 0] | shared_mem[base + Stride64 * 8 + 1] << 8 |
+                     shared_mem[base + Stride64 * 8 + 2] << 16 | shared_mem[base + Stride64 * 8 + 3] << 24);
+  }
+  if (type == 3 && wi4 < 2) {
+    buffer.Store(dst_offset + 4 * stride,
+                 shared_mem[base + Stride64 * 12 + 0] | shared_mem[base + Stride64 * 12 + 1] << 8 |
+                     shared_mem[base + Stride64 * 12 + 2] << 16 | shared_mem[base + Stride64 * 12 + 3] << 24);
+  }
+}
+
+RWByteAddressBuffer dst_frame : register(u0);
+ByteAddressBuffer lf_blocks : register(t0);
+
+cbuffer LoopfilterData : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_limits[64];
+};
+
+cbuffer LoopfilterSRT : register(b1) {
+  uint cb_wicount;
+  uint cb_plane;
+  uint cb_offset_base;
+  uint cb_block_cols;
+  uint cb_block_id_offset;
+};
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wicount) return;
+  const int plane = cb_plane;
+  const int wi = thread.x & 3;
+  const uint block_id = thread.x >> 2;
+  uint2 data = lf_blocks.Load2((cb_offset_base + block_id) * BlockSize + (wi << 3));
+  const int local_offset = (thread.x & (64 - 4)) << 2;
+  input_edges[local_offset + (wi << 2) + 0] = data.x & 0xffff;
+  input_edges[local_offset + (wi << 2) + 1] = (data.x >> 16);
+  input_edges[local_offset + (wi << 2) + 2] = data.y & 0xffff;
+  input_edges[local_offset + (wi << 2) + 3] = (data.y >> 16);
+
+  GroupMemoryBarrier();
+
+  const int block_x = block_id % cb_block_cols;
+  const int block_y = block_id / cb_block_cols;
+  const int stride = cb_planes[plane].x;
+  const int addr = cb_planes[plane].y + block_x * 4 + wi + block_y * 64 * stride;
+
+  for (int row = 0; row < 16;) {
+    uint edge = input_edges[local_offset + row];
+
+    if (!edge) break;
+    int level = edge & 63;
+    int filter = (edge >> 6) & 3;
+    int step = edge >> 8;
+
+    if (level) {
+      const int4 limits = cb_limits[level];
+      filter_horizontal_edge(dst_frame, addr + row * 4 * stride, stride, limits, filter, thread.x & 63);
+    }
+    row += step;
+  }
+}

diff --git a/libav1/dx/shaders/loopfilter_hor_hbd.hlsl b/libav1/dx/shaders/loopfilter_hor_hbd.hlsl
new file mode 100644
index 0000000..4dbbc50
--- /dev/null
+++ b/libav1/dx/shaders/loopfilter_hor_hbd.hlsl

@@ -0,0 +1,236 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define BlockSize 32
+#define Stride64 64
+#define SignedPixelMin -512
+#define SignedPixelMax 511
+#define PixelGray 512
+#define Thresh10 4
+
+groupshared uint input_edges[16 * (64 / 4)];
+groupshared uint shared_mem[64 * 16];
+
+void filter_horizontal_edge(RWByteAddressBuffer buffer, uint offset, uint stride, int4 limits, uint type, int wi) {
+  const int shift = (offset & 2) * 8;
+  offset &= ~3;
+  int4 p0 = int4(0, 0, 0, 0);
+  int4 q1 = int4(0, 0, 0, 0);
+  int4 p1, q0;
+  if (type == 3) {
+    p0.x = (buffer.Load(offset - 8 * stride) >> shift) & 0xffff;
+    p0.y = (buffer.Load(offset - 7 * stride) >> shift) & 0xffff;
+    p0.z = (buffer.Load(offset - 6 * stride) >> shift) & 0xffff;
+    p0.w = (buffer.Load(offset - 5 * stride) >> shift) & 0xffff;
+  }
+  p1.x = (buffer.Load(offset - 4 * stride) >> shift) & 0xffff;
+  p1.y = (buffer.Load(offset - 3 * stride) >> shift) & 0xffff;
+  p1.z = (buffer.Load(offset - 2 * stride) >> shift) & 0xffff;
+  p1.w = (buffer.Load(offset - 1 * stride) >> shift) & 0xffff;
+  q0.x = (buffer.Load(offset) >> shift) & 0xffff;
+  q0.y = (buffer.Load(offset + 1 * stride) >> shift) & 0xffff;
+  q0.z = (buffer.Load(offset + 2 * stride) >> shift) & 0xffff;
+  q0.w = (buffer.Load(offset + 3 * stride) >> shift) & 0xffff;
+
+  if (type == 3) {
+    q1.x = (buffer.Load(offset + 4 * stride) >> shift) & 0xffff;
+    q1.y = (buffer.Load(offset + 5 * stride) >> shift) & 0xffff;
+    q1.z = (buffer.Load(offset + 6 * stride) >> shift) & 0xffff;
+    q1.w = (buffer.Load(offset + 7 * stride) >> shift) & 0xffff;
+  }
+
+  shared_mem[Stride64 * 0 + wi] = p0.x;
+  shared_mem[Stride64 * 1 + wi] = p0.y;
+  shared_mem[Stride64 * 2 + wi] = p0.z;
+  shared_mem[Stride64 * 3 + wi] = p0.w;
+  shared_mem[Stride64 * 4 + wi] = p1.x;
+  shared_mem[Stride64 * 5 + wi] = p1.y;
+  shared_mem[Stride64 * 6 + wi] = p1.z;
+  shared_mem[Stride64 * 7 + wi] = p1.w;
+  shared_mem[Stride64 * 8 + wi] = q0.x;
+  shared_mem[Stride64 * 9 + wi] = q0.y;
+  shared_mem[Stride64 * 10 + wi] = q0.z;
+  shared_mem[Stride64 * 11 + wi] = q0.w;
+  shared_mem[Stride64 * 12 + wi] = q1.x;
+  shared_mem[Stride64 * 13 + wi] = q1.y;
+  shared_mem[Stride64 * 14 + wi] = q1.z;
+  shared_mem[Stride64 * 15 + wi] = q1.w;
+
+  int mask = abs(p1.z - p1.w) <= limits.x && abs(q0.x - q0.y) <= limits.x &&
+             abs(p1.w - q0.x) * 2 + (abs(p1.z - q0.y) >> 1) <= limits.y;
+
+  mask &= (abs(p1.y - p1.z) <= limits.x && abs(q0.y - q0.z) <= limits.x) | (type == 0);
+  mask &= (abs(p1.x - p1.y) <= limits.x && abs(q0.z - q0.w) <= limits.x) | (type <= 1);
+
+  if (mask) {
+    int flat_uv = abs(p1.y - p1.w) <= Thresh10 && abs(p1.z - p1.w) <= Thresh10 && abs(q0.y - q0.x) <= Thresh10 &&
+                  abs(q0.z - q0.x) <= Thresh10;
+    int flat = flat_uv && abs(p1.x - p1.w) <= Thresh10 && abs(q0.w - q0.x) <= Thresh10;
+    int flat2 = abs(p0.y - p1.w) <= Thresh10 && abs(p0.z - p1.w) <= Thresh10 && abs(p0.w - p1.w) <= Thresh10 &&
+                abs(q1.x - q0.x) <= Thresh10 && abs(q1.y - q0.x) <= Thresh10 && abs(q1.z - q0.x) <= Thresh10;
+
+    if (type == 3 && flat && flat2) {
+      shared_mem[Stride64 * 2 + wi] =
+          (p0.y * 7 + p0.z * 2 + p0.w * 2 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + 8) >> 4;
+      shared_mem[Stride64 * 3 + wi] =
+          (p0.y * 5 + p0.z * 2 + p0.w * 2 + p1.x * 2 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + 8) >> 4;
+      shared_mem[Stride64 * 4 + wi] = (p0.y * 4 + p0.z * 1 + p0.w * 2 + p1.x * 2 + p1.y * 2 + p1.z * 1 + p1.w * 1 +
+                                       q0.x * 1 + q0.y * 1 + q0.z * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 5 + wi] = (p0.y * 3 + p0.z * 1 + p0.w * 1 + p1.x * 2 + p1.y * 2 + p1.z * 2 + p1.w * 1 +
+                                       q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 6 + wi] = (p0.y * 2 + p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 2 + p1.z * 2 + p1.w * 2 +
+                                       q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 1 + q1.x * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 7 + wi] = (p0.y * 1 + p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 2 + p1.w * 2 +
+                                       q0.x * 2 + q0.y * 1 + q0.z * 1 + q0.w * 1 + q1.x * 1 + q1.y * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 8 + wi] = (p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 2 + q0.x * 2 +
+                                       q0.y * 2 + q0.z * 1 + q0.w * 1 + q1.x * 1 + q1.y * 1 + q1.z * 1 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 9 + wi] = (p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 2 + q0.y * 2 +
+                                       q0.z * 2 + q0.w * 1 + q1.x * 1 + q1.y * 1 + q1.z * 2 + 8) >>
+                                      4;
+      shared_mem[Stride64 * 10 + wi] = (p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 2 + q0.z * 2 +
+                                        q0.w * 2 + q1.x * 1 + q1.y * 1 + q1.z * 3 + 8) >>
+                                       4;
+      shared_mem[Stride64 * 11 + wi] = (p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 2 + q0.w * 2 +
+                                        q1.x * 2 + q1.y * 1 + q1.z * 4 + 8) >>
+                                       4;
+      shared_mem[Stride64 * 12 + wi] =
+          (p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 2 + q1.x * 2 + q1.y * 2 + q1.z * 5 + 8) >> 4;
+      shared_mem[Stride64 * 13 + wi] =
+          (p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 1 + q1.x * 2 + q1.y * 2 + q1.z * 7 + 8) >> 4;
+    } else if (type >= 2 && flat) {
+      int v = p1.x * 3 + p1.y + p1.z + p1.w + q0.x;
+      shared_mem[Stride64 * 5 + wi] = (v + p1.y + 4) >> 3;
+      v += -p1.x + q0.y;
+      shared_mem[Stride64 * 6 + wi] = (v + p1.z + 4) >> 3;
+      v += -p1.x + q0.z;
+      shared_mem[Stride64 * 7 + wi] = (v + p1.w + 4) >> 3;
+      v += -p1.x + q0.w;
+      shared_mem[Stride64 * 8 + wi] = (v + q0.x + 4) >> 3;
+      v += -p1.y + q0.w;
+      shared_mem[Stride64 * 9 + wi] = (v + q0.y + 4) >> 3;
+      v += -p1.z + q0.w;
+      shared_mem[Stride64 * 10 + wi] = (v + q0.z + 4) >> 3;
+    } else if (type == 1 && flat_uv) {
+      // 5-tap filter [1, 2, 2, 2, 1]
+      shared_mem[Stride64 * 6 + wi] = (p1.y * 3 + p1.z * 2 + p1.w * 2 + q0.x + 4) >> 3;
+      shared_mem[Stride64 * 7 + wi] = (p1.y + p1.z * 2 + p1.w * 2 + q0.x * 2 + q0.y + 4) >> 3;
+      shared_mem[Stride64 * 8 + wi] = (p1.z + p1.w * 2 + q0.x * 2 + q0.y * 2 + q0.z + 4) >> 3;
+      shared_mem[Stride64 * 9 + wi] = (p1.w + q0.x * 2 + q0.y * 2 + q0.z * 3 + 4) >> 3;
+    } else {
+      uint hev = (abs(p1.w - p1.z) > limits.z || abs(q0.x - q0.y) > limits.z) ? 0xffffffff : 0;
+      int ps1 = p1.z - PixelGray;
+      int ps0 = p1.w - PixelGray;
+      int qs0 = q0.x - PixelGray;
+      int qs1 = q0.y - PixelGray;
+
+      int f0 = clamp(ps1 - qs1, SignedPixelMin, SignedPixelMax) & hev;
+      f0 = clamp(f0 + 3 * (qs0 - ps0), SignedPixelMin, SignedPixelMax);
+      int f1 = clamp(f0 + 4, SignedPixelMin, SignedPixelMax) >> 3;
+      int f2 = clamp(f0 + 3, SignedPixelMin, SignedPixelMax) >> 3;
+
+      shared_mem[Stride64 * 7 + wi] = clamp(ps0 + f2, SignedPixelMin, SignedPixelMax) + PixelGray;
+      shared_mem[Stride64 * 8 + wi] = (clamp(qs0 - f1, SignedPixelMin, SignedPixelMax) + PixelGray);
+
+      f0 = ((f1 + 1) >> 1) & (~hev);
+
+      shared_mem[Stride64 * 6 + wi] = clamp(ps1 + f0, SignedPixelMin, SignedPixelMax) + PixelGray;
+      shared_mem[Stride64 * 9 + wi] = clamp(qs1 - f0, SignedPixelMin, SignedPixelMax) + PixelGray;
+    }
+  }
+
+  GroupMemoryBarrier();
+
+  const int wi4 = wi & 3;
+  const int base = (wi - wi4) + wi4 * Stride64;
+  const int dst_offset = (offset & (~7)) + wi4 * stride;
+
+  if (type == 3 && wi4 >= 2) {
+    buffer.Store2(dst_offset - 8 * stride,
+                  uint2(shared_mem[base + Stride64 * 0 + 0] | shared_mem[base + Stride64 * 0 + 1] << 16,
+                        shared_mem[base + Stride64 * 0 + 2] | shared_mem[base + Stride64 * 0 + 3] << 16));
+  }
+  if (type > 1 || wi4 >= 2) {
+    buffer.Store2(dst_offset - 4 * stride,
+                  uint2(shared_mem[base + Stride64 * 4 + 0] | shared_mem[base + Stride64 * 4 + 1] << 16,
+                        shared_mem[base + Stride64 * 4 + 2] | shared_mem[base + Stride64 * 4 + 3] << 16));
+  }
+  if (type > 1 || wi4 < 2) {
+    buffer.Store2(dst_offset - 0 * stride,
+                  uint2(shared_mem[base + Stride64 * 8 + 0] | shared_mem[base + Stride64 * 8 + 1] << 16,
+                        shared_mem[base + Stride64 * 8 + 2] | shared_mem[base + Stride64 * 8 + 3] << 16));
+  }
+  if (type == 3 && wi4 < 2) {
+    buffer.Store2(dst_offset + 4 * stride,
+                  uint2(shared_mem[base + Stride64 * 12 + 0] | shared_mem[base + Stride64 * 12 + 1] << 16,
+                        shared_mem[base + Stride64 * 12 + 2] | shared_mem[base + Stride64 * 12 + 3] << 16));
+  }
+}
+
+RWByteAddressBuffer dst_frame : register(u0);
+ByteAddressBuffer lf_blocks : register(t0);
+
+cbuffer LoopfilterData : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_limits[64];
+};
+
+cbuffer LoopfilterSRT : register(b1) {
+  uint cb_wicount;
+  uint cb_plane;
+  uint cb_offset_base;
+  uint cb_block_cols;
+  uint cb_block_id_offset;
+};
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wicount) return;
+  const int plane = cb_plane;
+  const int wi = thread.x & 3;
+  const uint block_id = thread.x >> 2;
+  uint2 data = lf_blocks.Load2((cb_offset_base + block_id) * BlockSize + (wi << 3));
+  const int local_offset = (thread.x & (64 - 4)) << 2;
+  input_edges[local_offset + (wi << 2) + 0] = data.x & 0xffff;
+  input_edges[local_offset + (wi << 2) + 1] = (data.x >> 16);
+  input_edges[local_offset + (wi << 2) + 2] = data.y & 0xffff;
+  input_edges[local_offset + (wi << 2) + 3] = (data.y >> 16);
+
+  GroupMemoryBarrier();
+
+  const int block_x = block_id % cb_block_cols;
+  const int block_y = block_id / cb_block_cols;
+  const int stride = cb_planes[plane].x;
+  const int addr = cb_planes[plane].y + block_x * 8 + wi * 2 + block_y * 64 * stride;
+
+  for (int row = 0; row < 16;) {
+    uint edge = input_edges[local_offset + row];
+    if (!edge) break;
+    int level = edge & 63;
+    int filter = (edge >> 6) & 3;
+    int step = edge >> 8;
+
+    if (level) {
+      const int4 limits = cb_limits[level] << 2;
+      filter_horizontal_edge(dst_frame, addr + row * 4 * stride, stride, limits, filter, thread.x & 63);
+    }
+    row += step;
+  }
+}

diff --git a/libav1/dx/shaders/loopfilter_vert.hlsl b/libav1/dx/shaders/loopfilter_vert.hlsl
new file mode 100644
index 0000000..fae8192
--- /dev/null
+++ b/libav1/dx/shaders/loopfilter_vert.hlsl

@@ -0,0 +1,202 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define BlockSize 32
+
+groupshared uint input_edges[16 * (64 / 4)];
+
+void filter_vertical_edge(RWByteAddressBuffer buffer, uint offset, int4 limits, uint type) {
+  uint4 pixels = buffer.Load4(offset - 8);
+  int4 p0, p1, q0, q1;
+  p0.x = (pixels.x >> 0) & 0xff;
+  p0.y = (pixels.x >> 8) & 0xff;
+  p0.z = (pixels.x >> 16) & 0xff;
+  p0.w = (pixels.x >> 24) & 0xff;
+
+  p1.x = (pixels.y >> 0) & 0xff;
+  p1.y = (pixels.y >> 8) & 0xff;
+  p1.z = (pixels.y >> 16) & 0xff;
+  p1.w = (pixels.y >> 24) & 0xff;
+
+  q0.x = (pixels.z >> 0) & 0xff;
+  q0.y = (pixels.z >> 8) & 0xff;
+  q0.z = (pixels.z >> 16) & 0xff;
+  q0.w = (pixels.z >> 24) & 0xff;
+
+  q1.x = (pixels.w >> 0) & 0xff;
+  q1.y = (pixels.w >> 8) & 0xff;
+  q1.z = (pixels.w >> 16) & 0xff;
+  q1.w = (pixels.w >> 24) & 0xff;
+
+  int mask = abs(p1.z - p1.w) <= limits.x && abs(q0.x - q0.y) <= limits.x &&
+             abs(p1.w - q0.x) * 2 + (abs(p1.z - q0.y) >> 1) <= limits.y;
+  mask &= (abs(p1.y - p1.z) <= limits.x && abs(q0.y - q0.z) <= limits.x) | (type == 0);
+  mask &= (abs(p1.x - p1.y) <= limits.x && abs(q0.z - q0.w) <= limits.x) | (type <= 1);
+
+  if (!mask) return;
+
+  int flat_uv = abs(p1.y - p1.w) <= 1 && abs(p1.z - p1.w) <= 1 && abs(q0.y - q0.x) <= 1 && abs(q0.z - q0.x) <= 1;
+  int flat = flat_uv && abs(p1.x - p1.w) <= 1 && abs(q0.w - q0.x) <= 1;
+  int flat2 = abs(p0.y - p1.w) <= 1 && abs(p0.z - p1.w) <= 1 && abs(p0.w - p1.w) <= 1 && abs(q1.x - q0.x) <= 1 &&
+              abs(q1.y - q0.x) <= 1 && abs(q1.z - q0.x) <= 1;
+
+  if (type == 3 && flat && flat2) {
+    pixels.x &= 0x0000ffff;
+    pixels.w &= 0xffff0000;
+    pixels.x |= ((p0.y * 7 + p0.z * 2 + p0.w * 2 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + 8) >> 4)
+                << 16;
+    pixels.x |=
+        ((p0.y * 5 + p0.z * 2 + p0.w * 2 + p1.x * 2 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + 8) >> 4)
+        << 24;
+    pixels.y = ((p0.y * 4 + p0.z * 1 + p0.w * 2 + p1.x * 2 + p1.y * 2 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 +
+                 q0.z * 1 + 8) >>
+                4)
+               << 0;
+    pixels.y |= ((p0.y * 3 + p0.z * 1 + p0.w * 1 + p1.x * 2 + p1.y * 2 + p1.z * 2 + p1.w * 1 + q0.x * 1 + q0.y * 1 +
+                  q0.z * 1 + q0.w * 1 + 8) >>
+                 4)
+                << 8;
+    pixels.y |= ((p0.y * 2 + p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 2 + p1.z * 2 + p1.w * 2 + q0.x * 1 + q0.y * 1 +
+                  q0.z * 1 + q0.w * 1 + q1.x * 1 + 8) >>
+                 4)
+                << 16;
+    pixels.y |= ((p0.y * 1 + p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 2 + p1.w * 2 + q0.x * 2 + q0.y * 1 +
+                  q0.z * 1 + q0.w * 1 + q1.x * 1 + q1.y * 1 + 8) >>
+                 4)
+                << 24;
+    pixels.z = ((p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 2 + q0.x * 2 + q0.y * 2 + q0.z * 1 +
+                 q0.w * 1 + q1.x * 1 + q1.y * 1 + q1.z * 1 + 8) >>
+                4)
+               << 0;
+    pixels.z |= ((p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 2 + q0.y * 2 + q0.z * 2 + q0.w * 1 +
+                  q1.x * 1 + q1.y * 1 + q1.z * 2 + 8) >>
+                 4)
+                << 8;
+    pixels.z |= ((p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 2 + q0.z * 2 + q0.w * 2 + q1.x * 1 +
+                  q1.y * 1 + q1.z * 3 + 8) >>
+                 4)
+                << 16;
+    pixels.z |= ((p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 2 + q0.w * 2 + q1.x * 2 + q1.y * 1 +
+                  q1.z * 4 + 8) >>
+                 4)
+                << 24;
+    pixels.w |=
+        ((p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 2 + q1.x * 2 + q1.y * 2 + q1.z * 5 + 8) >> 4)
+        << 0;
+    pixels.w |= ((p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 1 + q1.x * 2 + q1.y * 2 + q1.z * 7 + 8) >> 4) << 8;
+    buffer.Store4(offset - 8, pixels);
+  } else if (type >= 2 && flat) {
+    int v = p1.x * 3 + p1.y + p1.z + p1.w + q0.x;
+    pixels.y = p1.x;
+    pixels.y |= ((v + p1.y + 4) >> 3) << 8;
+    v += -p1.x + q0.y;
+    pixels.y |= ((v + p1.z + 4) >> 3) << 16;
+    v += -p1.x + q0.z;
+    pixels.y |= ((v + p1.w + 4) >> 3) << 24;
+
+    v += -p1.x + q0.w;
+    pixels.z = (v + q0.x + 4) >> 3;
+    v += -p1.y + q0.w;
+    pixels.z |= ((v + q0.y + 4) >> 3) << 8;
+    v += -p1.z + q0.w;
+    pixels.z |= ((v + q0.z + 4) >> 3) << 16;
+    pixels.z |= q0.w << 24;
+    buffer.Store2(offset - 4, pixels.yz);
+  } else if (type == 1 && flat_uv) {
+    pixels.y &= 0x0000ffff;
+    pixels.z &= 0xffff0000;
+    // 5-tap filter [1, 2, 2, 2, 1]
+    pixels.y |= ((p1.y * 3 + p1.z * 2 + p1.w * 2 + q0.x + 4) >> 3) << 16;
+    pixels.y |= ((p1.y + p1.z * 2 + p1.w * 2 + q0.x * 2 + q0.y + 4) >> 3) << 24;
+    pixels.z |= ((p1.z + p1.w * 2 + q0.x * 2 + q0.y * 2 + q0.z + 4) >> 3);
+    pixels.z |= ((p1.w + q0.x * 2 + q0.y * 2 + q0.z * 3 + 4) >> 3) << 8;
+    buffer.Store2(offset - 4, pixels.yz);
+  } else {
+    uint hev = (abs(p1.w - p1.z) > limits.z || abs(q0.x - q0.y) > limits.z) ? 0xffffffff : 0;
+    int ps1 = p1.z - 128;
+    int ps0 = p1.w - 128;
+    int qs0 = q0.x - 128;
+    int qs1 = q0.y - 128;
+
+    int f0 = clamp(ps1 - qs1, -128, 127) & hev;
+    f0 = clamp(f0 + 3 * (qs0 - ps0), -128, 124);
+    int f1 = min(f0 + 4, 127) >> 3;
+    int f2 = (f0 + 3) >> 3;
+
+    pixels.y &= 0x0000FFFF;
+    pixels.z &= 0xFFFF0000;
+
+    pixels.y |= (clamp(ps0 + f2, -128, 127) + 128) << 24;
+    pixels.z |= clamp(qs0 - f1, -128, 127) + 128;
+
+    f0 = ((f1 + 1) >> 1) & (~hev);
+
+    pixels.y |= (clamp(ps1 + f0, -128, 127) + 128) << 16;
+    pixels.z |= (clamp(qs1 - f0, -128, 127) + 128) << 8;
+    buffer.Store2(offset - 4, pixels.yz);
+  }
+}
+
+RWByteAddressBuffer dst_frame : register(u0);
+ByteAddressBuffer lf_blocks : register(t0);
+
+cbuffer LoopfilterData : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_limits[64];
+};
+
+cbuffer LoopfilterSRT : register(b1) {
+  uint cb_wicount;
+  uint cb_plane;
+  uint cb_offset_base;
+  uint cb_block_cols;
+  uint cb_block_id_offset;
+};
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wicount) return;
+  const int plane = cb_plane;
+  const int wi = thread.x & 3;
+  const uint block_id = ((thread.x >> 2) << 1) + cb_block_id_offset;
+  uint2 data = lf_blocks.Load2((cb_offset_base + block_id) * BlockSize + (wi << 3));
+  const int local_offset = (thread.x & (64 - 4)) << 2;
+  input_edges[local_offset + (wi << 2) + 0] = data.x & 0xffff;
+  input_edges[local_offset + (wi << 2) + 1] = (data.x >> 16);
+  input_edges[local_offset + (wi << 2) + 2] = data.y & 0xffff;
+  input_edges[local_offset + (wi << 2) + 3] = (data.y >> 16);
+
+  GroupMemoryBarrier();
+
+  const int block_x = block_id % cb_block_cols;
+  const int block_y = block_id / cb_block_cols;
+  const int stride = cb_planes[plane].x;
+  const int addr = cb_planes[plane].y + block_x * 64 + (block_y * 4 + wi) * stride;
+
+  for (int col = 0; col < 16;) {
+    uint edge = input_edges[local_offset + col];
+    if (!edge) break;
+    int level = edge & 63;
+    int filter = (edge >> 6) & 3;
+    int step = edge >> 8;
+
+    if (level) {
+      const int4 limits = cb_limits[level];
+      filter_vertical_edge(dst_frame, addr + col * 4, limits, filter);
+    }
+    col += step;
+  }
+}

diff --git a/libav1/dx/shaders/loopfilter_vert_hbd.hlsl b/libav1/dx/shaders/loopfilter_vert_hbd.hlsl
new file mode 100644
index 0000000..3812660
--- /dev/null
+++ b/libav1/dx/shaders/loopfilter_vert_hbd.hlsl

@@ -0,0 +1,204 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define BlockSize 32
+#define SignedPixelMin -512
+#define SignedPixelMax 511
+#define PixelGray 512
+#define Thresh10 4
+
+groupshared uint input_edges[16 * (64 / 4)];
+
+void filter_vertical_edge(RWByteAddressBuffer buffer, uint offset, int4 limits, uint type) {
+  uint2 pixels0 = uint2(0, 0);
+  uint2 pixels2 = uint2(0, 0);
+
+  if (type == 3) pixels0 = buffer.Load2(offset - 16);
+  uint4 pixels = buffer.Load4(offset - 8);
+  if (type == 3) pixels2 = buffer.Load2(offset + 8);
+
+  int4 p0, p1, q0, q1;
+  p0.x = (pixels0.x >> 0) & 0xffff;
+  p0.y = (pixels0.x >> 16) & 0xffff;
+  p0.z = (pixels0.y >> 0) & 0xffff;
+  p0.w = (pixels0.y >> 16) & 0xffff;
+
+  p1.x = (pixels.x >> 0) & 0xffff;
+  p1.y = (pixels.x >> 16) & 0xffff;
+  p1.z = (pixels.y >> 0) & 0xffff;
+  p1.w = (pixels.y >> 16) & 0xffff;
+
+  q0.x = (pixels.z >> 0) & 0xffff;
+  q0.y = (pixels.z >> 16) & 0xffff;
+  q0.z = (pixels.w >> 0) & 0xffff;
+  q0.w = (pixels.w >> 16) & 0xffff;
+
+  q1.x = (pixels2.x >> 0) & 0xffff;
+  q1.y = (pixels2.x >> 16) & 0xffff;
+  q1.z = (pixels2.y >> 0) & 0xffff;
+  q1.w = (pixels2.y >> 16) & 0xffff;
+
+  int mask = abs(p1.z - p1.w) <= limits.x && abs(q0.x - q0.y) <= limits.x &&
+             abs(p1.w - q0.x) * 2 + (abs(p1.z - q0.y) >> 1) <= limits.y;
+  mask &= (abs(p1.y - p1.z) <= limits.x && abs(q0.y - q0.z) <= limits.x) | (type == 0);
+  mask &= (abs(p1.x - p1.y) <= limits.x && abs(q0.z - q0.w) <= limits.x) | (type <= 1);
+
+  if (!mask) return;
+
+  int flat_uv = abs(p1.y - p1.w) <= Thresh10 && abs(p1.z - p1.w) <= Thresh10 && abs(q0.y - q0.x) <= Thresh10 &&
+                abs(q0.z - q0.x) <= Thresh10;
+  int flat = flat_uv && abs(p1.x - p1.w) <= Thresh10 && abs(q0.w - q0.x) <= Thresh10;
+  int flat2 = abs(p0.y - p1.w) <= Thresh10 && abs(p0.z - p1.w) <= Thresh10 && abs(p0.w - p1.w) <= Thresh10 &&
+              abs(q1.x - q0.x) <= Thresh10 && abs(q1.y - q0.x) <= Thresh10 && abs(q1.z - q0.x) <= Thresh10;
+  if (type == 3 && flat && flat2) {
+    uint pix_l;
+    uint pix_r;
+    pix_l = ((p0.y * 7 + p0.z * 2 + p0.w * 2 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + 8) >> 4);
+    pix_l |=
+        ((p0.y * 5 + p0.z * 2 + p0.w * 2 + p1.x * 2 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + 8) >> 4)
+        << 16;
+    pixels.x = ((p0.y * 4 + p0.z * 1 + p0.w * 2 + p1.x * 2 + p1.y * 2 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 +
+                 q0.z * 1 + 8) >>
+                4);
+    pixels.x |= ((p0.y * 3 + p0.z * 1 + p0.w * 1 + p1.x * 2 + p1.y * 2 + p1.z * 2 + p1.w * 1 + q0.x * 1 + q0.y * 1 +
+                  q0.z * 1 + q0.w * 1 + 8) >>
+                 4)
+                << 16;
+    pixels.y = ((p0.y * 2 + p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 2 + p1.z * 2 + p1.w * 2 + q0.x * 1 + q0.y * 1 +
+                 q0.z * 1 + q0.w * 1 + q1.x * 1 + 8) >>
+                4);
+    pixels.y |= ((p0.y * 1 + p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 2 + p1.w * 2 + q0.x * 2 + q0.y * 1 +
+                  q0.z * 1 + q0.w * 1 + q1.x * 1 + q1.y * 1 + 8) >>
+                 4)
+                << 16;
+    pixels.z = ((p0.z * 1 + p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 2 + q0.x * 2 + q0.y * 2 + q0.z * 1 +
+                 q0.w * 1 + q1.x * 1 + q1.y * 1 + q1.z * 1 + 8) >>
+                4);
+    pixels.z |= ((p0.w * 1 + p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 2 + q0.y * 2 + q0.z * 2 + q0.w * 1 +
+                  q1.x * 1 + q1.y * 1 + q1.z * 2 + 8) >>
+                 4)
+                << 16;
+    pixels.w = ((p1.x * 1 + p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 2 + q0.z * 2 + q0.w * 2 + q1.x * 1 +
+                 q1.y * 1 + q1.z * 3 + 8) >>
+                4);
+    pixels.w |= ((p1.y * 1 + p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 2 + q0.w * 2 + q1.x * 2 + q1.y * 1 +
+                  q1.z * 4 + 8) >>
+                 4)
+                << 16;
+    pix_r =
+        ((p1.z * 1 + p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 2 + q1.x * 2 + q1.y * 2 + q1.z * 5 + 8) >> 4);
+    pix_r |= ((p1.w * 1 + q0.x * 1 + q0.y * 1 + q0.z * 1 + q0.w * 1 + q1.x * 2 + q1.y * 2 + q1.z * 7 + 8) >> 4) << 16;
+    buffer.Store(offset - 12, pix_l);
+    buffer.Store4(offset - 8, pixels);
+    buffer.Store(offset + 8, pix_r);
+  } else if (type >= 2 && flat) {
+    int v = p1.x * 3 + p1.y + p1.z + p1.w + q0.x;
+    pixels.x = p1.x;
+    pixels.x |= ((v + p1.y + 4) >> 3) << 16;
+    v += -p1.x + q0.y;
+    pixels.y = ((v + p1.z + 4) >> 3);
+    v += -p1.x + q0.z;
+    pixels.y |= ((v + p1.w + 4) >> 3) << 16;
+
+    v += -p1.x + q0.w;
+    pixels.z = (v + q0.x + 4) >> 3;
+    v += -p1.y + q0.w;
+    pixels.z |= ((v + q0.y + 4) >> 3) << 16;
+    v += -p1.z + q0.w;
+    pixels.w = ((v + q0.z + 4) >> 3);
+    pixels.w |= q0.w << 16;
+    buffer.Store4(offset - 8, pixels);
+  } else if (type == 1 && flat_uv) {
+    // 5-tap filter [1, 2, 2, 2, 1]
+    pixels.y = ((p1.y * 3 + p1.z * 2 + p1.w * 2 + q0.x + 4) >> 3);
+    pixels.y |= ((p1.y + p1.z * 2 + p1.w * 2 + q0.x * 2 + q0.y + 4) >> 3) << 16;
+    pixels.z = ((p1.z + p1.w * 2 + q0.x * 2 + q0.y * 2 + q0.z + 4) >> 3);
+    pixels.z |= ((p1.w + q0.x * 2 + q0.y * 2 + q0.z * 3 + 4) >> 3) << 16;
+    buffer.Store2(offset - 4, pixels.yz);
+  } else {
+    uint hev = (abs(p1.w - p1.z) > limits.z || abs(q0.x - q0.y) > limits.z) ? 0xffffffff : 0;
+
+    int ps1 = p1.z - PixelGray;
+    int ps0 = p1.w - PixelGray;
+    int qs0 = q0.x - PixelGray;
+    int qs1 = q0.y - PixelGray;
+
+    int f0 = clamp(ps1 - qs1, SignedPixelMin, SignedPixelMax) & hev;
+    f0 = clamp(f0 + 3 * (qs0 - ps0), SignedPixelMin, SignedPixelMax);
+    int f1 = clamp(f0 + 4, SignedPixelMin, SignedPixelMax) >> 3;
+    int f2 = clamp(f0 + 3, SignedPixelMin, SignedPixelMax) >> 3;
+
+    pixels.y = (clamp(ps0 + f2, SignedPixelMin, SignedPixelMax) + PixelGray) << 16;
+    pixels.z = clamp(qs0 - f1, SignedPixelMin, SignedPixelMax) + PixelGray;
+
+    f0 = ((f1 + 1) >> 1) & (~hev);
+
+    pixels.y |= clamp(ps1 + f0, SignedPixelMin, SignedPixelMax) + PixelGray;
+    pixels.z |= (clamp(qs1 - f0, SignedPixelMin, SignedPixelMax) + PixelGray) << 16;
+    buffer.Store2(offset - 4, pixels.yz);
+  }
+}
+
+RWByteAddressBuffer dst_frame : register(u0);
+ByteAddressBuffer lf_blocks : register(t0);
+
+cbuffer LoopfilterData : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_limits[64];
+};
+
+cbuffer LoopfilterSRT : register(b1) {
+  uint cb_wicount;
+  uint cb_plane;
+  uint cb_offset_base;
+  uint cb_block_cols;
+  uint cb_block_id_offset;
+};
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wicount) return;
+  const int plane = cb_plane;
+  const int wi = thread.x & 3;
+  const uint block_id = ((thread.x >> 2) << 1) + cb_block_id_offset;
+  uint2 data = lf_blocks.Load2((cb_offset_base + block_id) * BlockSize + (wi << 3));
+  const int local_offset = (thread.x & (64 - 4)) << 2;
+  input_edges[local_offset + (wi << 2) + 0] = data.x & 0xffff;
+  input_edges[local_offset + (wi << 2) + 1] = (data.x >> 16);
+  input_edges[local_offset + (wi << 2) + 2] = data.y & 0xffff;
+  input_edges[local_offset + (wi << 2) + 3] = (data.y >> 16);
+
+  GroupMemoryBarrier();
+
+  const int block_x = block_id % cb_block_cols;
+  const int block_y = block_id / cb_block_cols;
+  const int stride = cb_planes[plane].x;
+  const int addr = cb_planes[plane].y + block_x * 128 + (block_y * 4 + wi) * stride;
+
+  for (int col = 0; col < 16;) {
+    uint edge = input_edges[local_offset + col];
+    if (!edge) break;
+    int level = edge & 63;
+    int filter = (edge >> 6) & 3;
+    int step = edge >> 8;
+
+    if (level) {
+      const int4 limits = cb_limits[level] << 2;
+      filter_vertical_edge(dst_frame, addr + col * 8, limits, filter);
+    }
+    col += step;
+  }
+}

diff --git a/libav1/dx/shaders/mode_info.h b/libav1/dx/shaders/mode_info.h
new file mode 100644
index 0000000..fb13cd2
--- /dev/null
+++ b/libav1/dx/shaders/mode_info.h

@@ -0,0 +1,69 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+struct INTERINTER_COMPOUND_DATA {
+  int type;
+  int mask_type;
+  int wedge_index;
+  int wedge_sign;
+};
+
+struct WarpedMotionParams {
+  int4 mat[2];
+  int2 angles;
+  int type;
+};
+
+struct PALETTE_MODE_INFO {
+  int colors[12];  // TODO remove this?
+  uint sizes;
+};
+
+struct MB_MODE_INFO {
+  PALETTE_MODE_INFO palette_mode_info;
+  WarpedMotionParams wm_params;
+  INTERINTER_COMPOUND_DATA interinter_comp;
+  uint filter_intra_mode_info;
+  int mv[2];
+  uint interp_filters;
+  // TODO(debargha): Consolidate these flags
+  int interintra_wedge_index;
+  int interintra_wedge_sign;
+  int current_qindex;
+  int delta_lf_from_base;
+  int delta_lf[4];
+
+  int mi_row;
+  int mi_col;
+  uint index_base;
+
+  int num_proj_ref;
+  // Index of the alpha Cb and alpha Cr combination
+  int cfl_alpha_idx;
+  // Joint sign of alpha Cb and alpha Cr
+  int cfl_alpha_signs;
+
+  uint comp_idx;
+  uint block_type;
+  uint modes;
+  //    uint txk_type[16];
+  uint intra_mode_flags;
+  uint tx_info;
+  uint inter_tx_size[4];
+  uint seg_cdef_flags;
+};  // sizeof(MB_MODE_INFO)==224
+
+#define ModeInfoSize 224
\ No newline at end of file

diff --git a/libav1/dx/shaders/reconstruct_block.hlsl b/libav1/dx/shaders/reconstruct_block.hlsl
new file mode 100644
index 0000000..d68e134
--- /dev/null
+++ b/libav1/dx/shaders/reconstruct_block.hlsl

@@ -0,0 +1,79 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define SubblockW 4
+#define SubblockH 4
+
+cbuffer IntraDataCommon : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_flags;
+  int4 cb_filter[5][8][2];
+  int4 cb_mode_params_lut[16][7];
+  int4 cb_sm_weight_arrays[128];
+};
+
+cbuffer PSSLIntraSRT : register(b1) {
+  uint cb_wi_count;
+  int cb_pass_offset;
+  uint cb_width_log2;
+  uint cb_height_log2;
+  uint cb_fb_base_offset;
+  int r[5];
+};
+
+ByteAddressBuffer pred_blocks : register(t0);
+ByteAddressBuffer residuals : register(t1);
+ByteAddressBuffer palette_buf : register(t2);
+RWByteAddressBuffer dst_frame : register(u0);
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log));
+  uint2 block = pred_blocks.Load2(block_index << 4);
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+
+  const int do_recon = block.y & 4;
+  const int do_palette = block.y & 8;
+
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_addr = cb_planes[plane].w + 2 * x + y * res_stride;
+  for (int i = 0; i < 4; ++i) {
+    uint addr = output_offset + i * output_stride;
+    uint pixels;
+    if (do_palette)
+      pixels = palette_buf.Load(addr - cb_fb_base_offset);
+    else
+      pixels = dst_frame.Load(addr);
+    if (do_recon) {
+      int2 res = residuals.Load2(res_addr + i * res_stride);
+      pixels = (clamp((int)((pixels >> 0) & 255) + (int)((res.x << 16) >> 16), 0, 255) << 0) |
+               (clamp((int)((pixels >> 8) & 255) + (int)(res.x >> 16), 0, 255) << 8) |
+               (clamp((int)((pixels >> 16) & 255) + (int)((res.y << 16) >> 16), 0, 255) << 16) |
+               (clamp((int)((pixels >> 24) & 255) + (int)(res.y >> 16), 0, 255) << 24);
+    }
+    dst_frame.Store(addr, pixels);
+  }
+}
\ No newline at end of file

diff --git a/libav1/dx/shaders/reconstruct_block_hbd.hlsl b/libav1/dx/shaders/reconstruct_block_hbd.hlsl
new file mode 100644
index 0000000..5103df1
--- /dev/null
+++ b/libav1/dx/shaders/reconstruct_block_hbd.hlsl

@@ -0,0 +1,80 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#define SubblockW 4
+#define SubblockH 4
+#define PixelMax 1023
+
+cbuffer IntraDataCommon : register(b0) {
+  int4 cb_planes[3];
+  int4 cb_flags;
+  int4 cb_filter[5][8][2];
+  int4 cb_mode_params_lut[16][7];
+  int4 cb_sm_weight_arrays[128];
+};
+
+cbuffer PSSLIntraSRT : register(b1) {
+  uint cb_wi_count;
+  int cb_pass_offset;
+  uint cb_width_log2;
+  uint cb_height_log2;
+  uint cb_fb_base_offset;
+  int r[5];
+};
+
+ByteAddressBuffer pred_blocks : register(t0);
+ByteAddressBuffer residuals : register(t1);
+ByteAddressBuffer palette_buf : register(t2);
+RWByteAddressBuffer dst_frame : register(u0);
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  if (thread.x >= cb_wi_count) return;
+
+  const int w_log = cb_width_log2;
+  const int h_log = cb_height_log2;
+  const int subblock = thread.x & ((1 << (w_log + h_log)) - 1);
+  const int block_index = cb_pass_offset + (thread.x >> (w_log + h_log));
+  uint2 block = pred_blocks.Load2(block_index << 4);
+  int x = SubblockW * ((block.x & 0xffff) + (subblock & ((1 << w_log) - 1)));
+  int y = SubblockH * ((block.x >> 16) + (subblock >> w_log));
+
+  const int plane = block.y & 3;
+  const int do_recon = block.y & 4;
+  const int do_palette = block.y & 8;
+  x <<= 1;
+  const int output_stride = cb_planes[plane].x;
+  const int output_offset = cb_planes[plane].y + x + y * output_stride;
+  const int res_stride = cb_planes[plane].z;
+  const int res_addr = cb_planes[plane].w + x + y * res_stride;
+
+  for (int i = 0; i < 4; ++i) {
+    uint addr = output_offset + i * output_stride;
+    uint2 pixels;
+    if (do_palette)
+      pixels = palette_buf.Load2(addr - cb_fb_base_offset);
+    else
+      pixels = dst_frame.Load2(addr);
+    if (do_recon) {
+      int2 res = residuals.Load2(res_addr + i * res_stride);
+      pixels.x = clamp((int)((pixels.x >> 0) & PixelMax) + (int)((res.x << 16) >> 16), 0, PixelMax) |
+                 (clamp((int)((pixels.x >> 16) & PixelMax) + (int)(res.x >> 16), 0, PixelMax) << 16);
+      pixels.y = clamp((int)((pixels.y >> 0) & PixelMax) + (int)((res.y << 16) >> 16), 0, PixelMax) |
+                 (clamp((int)((pixels.y >> 16) & PixelMax) + (int)(res.y >> 16), 0, PixelMax) << 16);
+    }
+    dst_frame.Store2(addr, pixels);
+  }
+}

diff --git a/libav1/dx/shaders/restoration.h b/libav1/dx/shaders/restoration.h
new file mode 100644
index 0000000..c91333f
--- /dev/null
+++ b/libav1/dx/shaders/restoration.h

@@ -0,0 +1,166 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_RESTORATION_H_
+#define AOM_AV1_COMMON_RESTORATION_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Border for Loop restoration buffer
+#define AOM_RESTORATION_FRAME_BORDER 32
+#define CLIP(x, lo, hi) ((x) < (lo) ? (lo) : (x) > (hi) ? (hi) : (x))
+#define RINT(x) ((x) < 0 ? (int)((x)-0.5) : (int)((x) + 0.5))
+
+#define RESTORATION_PROC_UNIT_SIZE 64
+
+// Filter tile grid offset upwards compared to the superblock grid
+#define RESTORATION_UNIT_OFFSET 8
+
+#define SGRPROJ_BORDER_VERT 3  // Vertical border used for Sgr
+#define SGRPROJ_BORDER_HORZ 3  // Horizontal border used for Sgr
+
+#define WIENER_BORDER_VERT 2  // Vertical border used for Wiener
+#define WIENER_HALFWIN 3
+#define WIENER_BORDER_HORZ (WIENER_HALFWIN)  // Horizontal border for Wiener
+
+// RESTORATION_BORDER_VERT determines line buffer requirement for LR.
+// Should be set at the max of SGRPROJ_BORDER_VERT and WIENER_BORDER_VERT.
+// Note the line buffer needed is twice the value of this macro.
+#if SGRPROJ_BORDER_VERT >= WIENER_BORDER_VERT
+#define RESTORATION_BORDER_VERT (SGRPROJ_BORDER_VERT)
+#else
+#define RESTORATION_BORDER_VERT (WIENER_BORDER_VERT)
+#endif  // SGRPROJ_BORDER_VERT >= WIENER_BORDER_VERT
+
+#if SGRPROJ_BORDER_HORZ >= WIENER_BORDER_HORZ
+#define RESTORATION_BORDER_HORZ (SGRPROJ_BORDER_HORZ)
+#else
+#define RESTORATION_BORDER_HORZ (WIENER_BORDER_HORZ)
+#endif  // SGRPROJ_BORDER_VERT >= WIENER_BORDER_VERT
+
+// How many border pixels do we need for each processing unit?
+#define RESTORATION_BORDER 3
+
+// How many rows of deblocked pixels do we save above/below each processing
+// stripe?
+#define RESTORATION_CTX_VERT 2
+
+// Additional pixels to the left and right in above/below buffers
+// It is RESTORATION_BORDER_HORZ rounded up to get nicer buffer alignment
+#define RESTORATION_EXTRA_HORZ 4
+
+// Pad up to 20 more (may be much less is needed)
+#define RESTORATION_PADDING 20
+#define RESTORATION_PROC_UNIT_PELS                                                    \
+  ((RESTORATION_PROC_UNIT_SIZE + RESTORATION_BORDER_HORZ * 2 + RESTORATION_PADDING) * \
+   (RESTORATION_PROC_UNIT_SIZE + RESTORATION_BORDER_VERT * 2 + RESTORATION_PADDING))
+
+#define RESTORATION_UNITSIZE_MAX 256
+#define RESTORATION_UNITPELS_HORZ_MAX (RESTORATION_UNITSIZE_MAX * 3 / 2 + 2 * RESTORATION_BORDER_HORZ + 16)
+#define RESTORATION_UNITPELS_VERT_MAX \
+  ((RESTORATION_UNITSIZE_MAX * 3 / 2 + 2 * RESTORATION_BORDER_VERT + RESTORATION_UNIT_OFFSET))
+#define RESTORATION_UNITPELS_MAX (RESTORATION_UNITPELS_HORZ_MAX * RESTORATION_UNITPELS_VERT_MAX)
+
+// Two 32-bit buffers needed for the restored versions from two filters
+// TODO(debargha, rupert): Refactor to not need the large tilesize to be stored
+// on the decoder side.
+#define SGRPROJ_TMPBUF_SIZE (RESTORATION_UNITPELS_MAX * 2 * sizeof(int32_t))
+
+#define SGRPROJ_EXTBUF_SIZE (0)
+#define SGRPROJ_PARAMS_BITS 4
+#define SGRPROJ_PARAMS (1 << SGRPROJ_PARAMS_BITS)
+
+// Precision bits for projection
+#define SGRPROJ_PRJ_BITS 7
+// Restoration precision bits generated higher than source before projection
+#define SGRPROJ_RST_BITS 4
+// Internal precision bits for core selfguided_restoration
+#define SGRPROJ_SGR_BITS 8
+#define SGRPROJ_SGR (1 << SGRPROJ_SGR_BITS)
+
+#define SGRPROJ_PRJ_MIN0 (-(1 << SGRPROJ_PRJ_BITS) * 3 / 4)
+#define SGRPROJ_PRJ_MAX0 (SGRPROJ_PRJ_MIN0 + (1 << SGRPROJ_PRJ_BITS) - 1)
+#define SGRPROJ_PRJ_MIN1 (-(1 << SGRPROJ_PRJ_BITS) / 4)
+#define SGRPROJ_PRJ_MAX1 (SGRPROJ_PRJ_MIN1 + (1 << SGRPROJ_PRJ_BITS) - 1)
+
+#define SGRPROJ_PRJ_SUBEXP_K 4
+
+#define SGRPROJ_BITS (SGRPROJ_PRJ_BITS * 2 + SGRPROJ_PARAMS_BITS)
+
+#define MAX_RADIUS 2  // Only 1, 2, 3 allowed
+#define MAX_NELEM ((2 * MAX_RADIUS + 1) * (2 * MAX_RADIUS + 1))
+#define SGRPROJ_MTABLE_BITS 20
+#define SGRPROJ_RECIP_BITS 12
+
+#define WIENER_HALFWIN1 (WIENER_HALFWIN + 1)
+#define WIENER_WIN (2 * WIENER_HALFWIN + 1)
+#define WIENER_WIN2 ((WIENER_WIN) * (WIENER_WIN))
+#define WIENER_TMPBUF_SIZE (0)
+#define WIENER_EXTBUF_SIZE (0)
+
+// If WIENER_WIN_CHROMA == WIENER_WIN - 2, that implies 5x5 filters are used for
+// chroma. To use 7x7 for chroma set WIENER_WIN_CHROMA to WIENER_WIN.
+#define WIENER_WIN_CHROMA (WIENER_WIN - 2)
+#define WIENER_WIN2_CHROMA ((WIENER_WIN_CHROMA) * (WIENER_WIN_CHROMA))
+
+#define WIENER_FILT_PREC_BITS 7
+#define WIENER_FILT_STEP (1 << WIENER_FILT_PREC_BITS)
+
+// Central values for the taps
+#define WIENER_FILT_TAP0_MIDV (3)
+#define WIENER_FILT_TAP1_MIDV (-7)
+#define WIENER_FILT_TAP2_MIDV (15)
+#define WIENER_FILT_TAP3_MIDV \
+  (WIENER_FILT_STEP - 2 * (WIENER_FILT_TAP0_MIDV + WIENER_FILT_TAP1_MIDV + WIENER_FILT_TAP2_MIDV))
+
+#define WIENER_FILT_TAP0_BITS 4
+#define WIENER_FILT_TAP1_BITS 5
+#define WIENER_FILT_TAP2_BITS 6
+
+#define WIENER_FILT_BITS ((WIENER_FILT_TAP0_BITS + WIENER_FILT_TAP1_BITS + WIENER_FILT_TAP2_BITS) * 2)
+
+#define WIENER_FILT_TAP0_MINV (WIENER_FILT_TAP0_MIDV - (1 << WIENER_FILT_TAP0_BITS) / 2)
+#define WIENER_FILT_TAP1_MINV (WIENER_FILT_TAP1_MIDV - (1 << WIENER_FILT_TAP1_BITS) / 2)
+#define WIENER_FILT_TAP2_MINV (WIENER_FILT_TAP2_MIDV - (1 << WIENER_FILT_TAP2_BITS) / 2)
+
+#define WIENER_FILT_TAP0_MAXV (WIENER_FILT_TAP0_MIDV - 1 + (1 << WIENER_FILT_TAP0_BITS) / 2)
+#define WIENER_FILT_TAP1_MAXV (WIENER_FILT_TAP1_MIDV - 1 + (1 << WIENER_FILT_TAP1_BITS) / 2)
+#define WIENER_FILT_TAP2_MAXV (WIENER_FILT_TAP2_MIDV - 1 + (1 << WIENER_FILT_TAP2_BITS) / 2)
+
+#define WIENER_FILT_TAP0_SUBEXP_K 1
+#define WIENER_FILT_TAP1_SUBEXP_K 2
+#define WIENER_FILT_TAP2_SUBEXP_K 3
+
+// Max of SGRPROJ_TMPBUF_SIZE, DOMAINTXFMRF_TMPBUF_SIZE, WIENER_TMPBUF_SIZE
+#define RESTORATION_TMPBUF_SIZE (SGRPROJ_TMPBUF_SIZE)
+
+// Max of SGRPROJ_EXTBUF_SIZE, WIENER_EXTBUF_SIZE
+#define RESTORATION_EXTBUF_SIZE (WIENER_EXTBUF_SIZE)
+
+#if WIENER_FILT_PREC_BITS != 7
+#error "Wiener filter currently only works if WIENER_FILT_PREC_BITS == 7"
+#endif
+
+#define LR_TILE_ROW 0
+#define LR_TILE_COL 0
+#define LR_TILE_COLS 1
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_AV1_COMMON_RESTORATION_H_

diff --git a/libav1/dx/shaders/upscaling.hlsl b/libav1/dx/shaders/upscaling.hlsl
new file mode 100644
index 0000000..f20dcd2
--- /dev/null
+++ b/libav1/dx/shaders/upscaling.hlsl

@@ -0,0 +1,155 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma warning(disable : 3556)
+#define Round2(value, n) (((value) + (((1 << (n)) >> 1))) >> (n))
+#define SUPERRES_SCALE_BITS 14
+#define SUPERRES_SCALE_MASK ((1 << 14) - 1)
+#define SUPERRES_EXTRA_BITS 8
+#define SUPERRES_FILTER_OFFSET 3
+#define SUPERRES_FILTER_TAPS 8
+#define SUPERRES_FILTER_BITS 6
+#define SUPERRES_FILTER_SHIFTS (1 << SUPERRES_FILTER_BITS)
+#define FILTER_BITS 7
+
+RWByteAddressBuffer dst_frame : register(u0);
+
+cbuffer UpscaleData : register(b0) {
+  uint4 src_planes[3];
+  uint4 dst_planes[3];
+};
+
+cbuffer UpscaleSRT : register(b1) {
+  uint cb_plane;
+  uint cb_wi_count;
+  uint cb_hbd;
+};
+
+[numthreads(64, 1, 1)] void main(uint3 thread
+                                 : SV_DispatchThreadID) {
+  const int Upscale_Filter[SUPERRES_FILTER_SHIFTS][SUPERRES_FILTER_TAPS] = {
+      {0, 0, 0, 128, 0, 0, 0, 0},        {0, 0, -1, 128, 2, -1, 0, 0},      {0, 1, -3, 127, 4, -2, 1, 0},
+      {0, 1, -4, 127, 6, -3, 1, 0},      {0, 2, -6, 126, 8, -3, 1, 0},      {0, 2, -7, 125, 11, -4, 1, 0},
+      {-1, 2, -8, 125, 13, -5, 2, 0},    {-1, 3, -9, 124, 15, -6, 2, 0},    {-1, 3, -10, 123, 18, -6, 2, -1},
+      {-1, 3, -11, 122, 20, -7, 3, -1},  {-1, 4, -12, 121, 22, -8, 3, -1},  {-1, 4, -13, 120, 25, -9, 3, -1},
+      {-1, 4, -14, 118, 28, -9, 3, -1},  {-1, 4, -15, 117, 30, -10, 4, -1}, {-1, 5, -16, 116, 32, -11, 4, -1},
+      {-1, 5, -16, 114, 35, -12, 4, -1}, {-1, 5, -17, 112, 38, -12, 4, -1}, {-1, 5, -18, 111, 40, -13, 5, -1},
+      {-1, 5, -18, 109, 43, -14, 5, -1}, {-1, 6, -19, 107, 45, -14, 5, -1}, {-1, 6, -19, 105, 48, -15, 5, -1},
+      {-1, 6, -19, 103, 51, -16, 5, -1}, {-1, 6, -20, 101, 53, -16, 6, -1}, {-1, 6, -20, 99, 56, -17, 6, -1},
+      {-1, 6, -20, 97, 58, -17, 6, -1},  {-1, 6, -20, 95, 61, -18, 6, -1},  {-2, 7, -20, 93, 64, -18, 6, -2},
+      {-2, 7, -20, 91, 66, -19, 6, -1},  {-2, 7, -20, 88, 69, -19, 6, -1},  {-2, 7, -20, 86, 71, -19, 6, -1},
+      {-2, 7, -20, 84, 74, -20, 7, -2},  {-2, 7, -20, 81, 76, -20, 7, -1},  {-2, 7, -20, 79, 79, -20, 7, -2},
+      {-1, 7, -20, 76, 81, -20, 7, -2},  {-2, 7, -20, 74, 84, -20, 7, -2},  {-1, 6, -19, 71, 86, -20, 7, -2},
+      {-1, 6, -19, 69, 88, -20, 7, -2},  {-1, 6, -19, 66, 91, -20, 7, -2},  {-2, 6, -18, 64, 93, -20, 7, -2},
+      {-1, 6, -18, 61, 95, -20, 6, -1},  {-1, 6, -17, 58, 97, -20, 6, -1},  {-1, 6, -17, 56, 99, -20, 6, -1},
+      {-1, 6, -16, 53, 101, -20, 6, -1}, {-1, 5, -16, 51, 103, -19, 6, -1}, {-1, 5, -15, 48, 105, -19, 6, -1},
+      {-1, 5, -14, 45, 107, -19, 6, -1}, {-1, 5, -14, 43, 109, -18, 5, -1}, {-1, 5, -13, 40, 111, -18, 5, -1},
+      {-1, 4, -12, 38, 112, -17, 5, -1}, {-1, 4, -12, 35, 114, -16, 5, -1}, {-1, 4, -11, 32, 116, -16, 5, -1},
+      {-1, 4, -10, 30, 117, -15, 4, -1}, {-1, 3, -9, 28, 118, -14, 4, -1},  {-1, 3, -9, 25, 120, -13, 4, -1},
+      {-1, 3, -8, 22, 121, -12, 4, -1},  {-1, 3, -7, 20, 122, -11, 3, -1},  {-1, 2, -6, 18, 123, -10, 3, -1},
+      {0, 2, -6, 15, 124, -9, 3, -1},    {0, 2, -5, 13, 125, -8, 2, -1},    {0, 1, -4, 11, 125, -7, 2, 0},
+      {0, 1, -3, 8, 126, -6, 2, 0},      {0, 1, -3, 6, 127, -4, 1, 0},      {0, 1, -2, 4, 127, -3, 1, 0},
+      {0, 0, -1, 2, 128, -1, 0, 0},
+  };
+
+  if (thread.x >= cb_wi_count) return;
+  uint4 dst_plane = dst_planes[cb_plane];
+  uint4 src_plane = src_planes[cb_plane];
+  const uint w = (dst_plane.z + 3) >> 2;
+  const uint x = thread.x % w;
+  const uint y = thread.x / w;
+
+  const int downscaledPlaneW = src_plane.z;
+  const int upscaledPlaneW = dst_plane.z;
+  const int stepX = ((downscaledPlaneW << SUPERRES_SCALE_BITS) + (upscaledPlaneW >> 1)) / upscaledPlaneW;
+  const int err = (upscaledPlaneW * stepX) - (downscaledPlaneW << SUPERRES_SCALE_BITS);
+  int initialSubpelX =
+      (-((upscaledPlaneW - downscaledPlaneW) << (SUPERRES_SCALE_BITS - 1)) + (upscaledPlaneW >> 1)) / upscaledPlaneW +
+      (1 << (SUPERRES_EXTRA_BITS - 1)) - err / 2;
+  initialSubpelX &= SUPERRES_SCALE_MASK;
+  int minX = 0;
+  int maxX = src_plane.w - 1;
+  // for (int y = 0; y < planeH; y++ ) {
+  // for (int x = 0; x < upscaledPlaneW; x+=4 ) {
+  int4 srcX;
+  srcX.x = -(1 << SUPERRES_SCALE_BITS) + initialSubpelX + (x * 4 + 0) * stepX;
+  srcX.y = -(1 << SUPERRES_SCALE_BITS) + initialSubpelX + (x * 4 + 1) * stepX;
+  srcX.z = -(1 << SUPERRES_SCALE_BITS) + initialSubpelX + (x * 4 + 2) * stepX;
+  srcX.w = -(1 << SUPERRES_SCALE_BITS) + initialSubpelX + (x * 4 + 3) * stepX;
+  int4 srcXPx = (srcX >> SUPERRES_SCALE_BITS);
+  int4 srcXSubpel = (srcX & SUPERRES_SCALE_MASK) >> SUPERRES_EXTRA_BITS;
+  int4 sum = 0;
+  int first_x = (srcXPx.x - SUPERRES_FILTER_OFFSET) >> 2;
+  int samples[16];
+  for (int i = 0; i < 4; i++) {
+    if (cb_hbd) {
+      if (first_x + i < minX) {
+        uint pixels = dst_frame.Load(src_plane.y + y * src_plane.x);
+        samples[i * 4 + 0] = pixels & 0x3ff;
+        samples[i * 4 + 1] = pixels & 0x3ff;
+        samples[i * 4 + 2] = pixels & 0x3ff;
+        samples[i * 4 + 3] = pixels & 0x3ff;
+      } else if (first_x + i > (maxX >> 2)) {
+        uint2 pixels = dst_frame.Load2(src_plane.y + ((maxX >> 2) << 3) + y * src_plane.x);
+        samples[i * 4 + 0] = (pixels.y >> 16) & 0x3ff;
+        samples[i * 4 + 1] = (pixels.y >> 16) & 0x3ff;
+        samples[i * 4 + 2] = (pixels.y >> 16) & 0x3ff;
+        samples[i * 4 + 3] = (pixels.y >> 16) & 0x3ff;
+      } else {
+        uint2 pixels = dst_frame.Load2(src_plane.y + (first_x + i) * 8 + y * src_plane.x);
+        samples[i * 4 + 0] = (pixels.x >> 0) & 0x3ff;
+        samples[i * 4 + 1] = (pixels.x >> 16) & 0x3ff;
+        samples[i * 4 + 2] = (pixels.y >> 0) & 0x3ff;
+        samples[i * 4 + 3] = (pixels.y >> 16) & 0x3ff;
+      }
+    } else {
+      if (first_x + i < minX) {
+        uint pixels = dst_frame.Load(src_plane.y + y * src_plane.x);
+        samples[i * 4 + 0] = pixels & 0xff;
+        samples[i * 4 + 1] = pixels & 0xff;
+        samples[i * 4 + 2] = pixels & 0xff;
+        samples[i * 4 + 3] = pixels & 0xff;
+      } else if (first_x + i > (maxX >> 2)) {
+        uint pixels = dst_frame.Load(src_plane.y + ((maxX >> 2) << 2) + y * src_plane.x);
+        samples[i * 4 + 0] = (pixels >> 24) & 0xff;
+        samples[i * 4 + 1] = (pixels >> 24) & 0xff;
+        samples[i * 4 + 2] = (pixels >> 24) & 0xff;
+        samples[i * 4 + 3] = (pixels >> 24) & 0xff;
+      } else {
+        uint pixels = dst_frame.Load(src_plane.y + (first_x + i) * 4 + y * src_plane.x);
+        samples[i * 4 + 0] = (pixels >> 0) & 0xff;
+        samples[i * 4 + 1] = (pixels >> 8) & 0xff;
+        samples[i * 4 + 2] = (pixels >> 16) & 0xff;
+        samples[i * 4 + 3] = (pixels >> 24) & 0xff;
+      }
+    }
+  }
+  for (int k = 0; k < SUPERRES_FILTER_TAPS; k++) {
+    int4 sampleX = srcXPx + (k - SUPERRES_FILTER_OFFSET) - (first_x << 2);
+    sum.x += samples[sampleX.x] * Upscale_Filter[srcXSubpel.x][k];
+    sum.y += samples[sampleX.y] * Upscale_Filter[srcXSubpel.y][k];
+    sum.z += samples[sampleX.z] * Upscale_Filter[srcXSubpel.z][k];
+    sum.w += samples[sampleX.w] * Upscale_Filter[srcXSubpel.w][k];
+  }
+  sum = clamp(Round2(sum, FILTER_BITS), 0, cb_hbd ? 1023 : 255);
+
+  if (cb_hbd) {
+    dst_frame.Store2(dst_plane.y + x * 8 + y * dst_plane.x, uint2(sum.x | (sum.y << 16), sum.z | (sum.w << 16)));
+  } else {
+    uint res = (sum.x << 0) | (sum.y << 8) | (sum.z << 16) | (sum.w << 24);
+    dst_frame.Store(dst_plane.y + x * 4 + y * dst_plane.x, res);
+  }
+}

diff --git a/libav1/dx/transform.cpp b/libav1/dx/transform.cpp
new file mode 100644
index 0000000..3b3e844
--- /dev/null
+++ b/libav1/dx/transform.cpp

@@ -0,0 +1,137 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "dx/types.h"
+#include "dx/av1_core.h"
+#include "dx/av1_memory.h"
+#include "dx/av1_compute.h"
+#include "av1\common\scan.h"
+#include "av1\common\idct.h"
+
+struct cb_sort_data {
+  int count;
+  int src_offset;
+  int reserved[2];
+  int offsets[20][4];
+};
+
+struct idct_frame_data {
+  int planes[3][4];
+  int bitdepth;
+};
+
+void av1_idct_run(Av1Core *dec) {
+  av1_frame_thread_data *td = dec->curr_frame_data;
+
+  const int tile_count = td->tile_count;
+  int offset = 0;
+  for (int i = 0; i <= TX_SIZES_ALL + 1; ++i) {
+    for (int t = 0; t < tile_count; ++t) {
+      int count = td->tile_data[t].idct_blocks_sizes[i];
+      td->tile_data[t].idct_blocks_sizes[i] = offset;
+      offset += count;
+    }
+  }
+
+  ComputeCommandBuffer *cb = &td->command_buffer;
+  Microsoft::WRL::ComPtr<ID3D12GraphicsCommandList> command_list = dec->compute.command_list;
+  ComputeShader *shader = &dec->shader_lib->shader_fill_buffer;
+
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetPipelineState(shader->pso.Get());
+  command_list->SetComputeRootUnorderedAccessView(0, dec->idct_residuals->dev->GetGPUVirtualAddress());
+  int value[2];
+  value[0] = static_cast<int>(dec->idct_residuals->size >> 4);
+  value[1] = 0;
+  command_list->SetComputeRoot32BitConstants(1, 2, value, 0);
+  command_list->Dispatch((value[0] + 63) / 64, 1, 1);
+
+  shader = &dec->shader_lib->shader_idct_sort;
+  command_list->SetComputeRootSignature(shader->signaturePtr.Get());
+  command_list->SetPipelineState(shader->pso.Get());
+  command_list->SetComputeRootShaderResourceView(0, td->idct_blocks_unordered->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(1, dec->idct_blocks->dev->GetGPUVirtualAddress());
+  for (int t = 0; t < tile_count; ++t) {
+    ConstantBufferObject cbo = cb->Alloc(sizeof(cb_sort_data));
+    cb_sort_data *sort_data = reinterpret_cast<cb_sort_data *>(cbo.host_ptr);
+
+    // av1_tile_thread_data * thr = td->thread_data + thread;
+    av1_tile_data *tile = td->tile_data + t;
+    for (int i = 0; i < TX_SIZES_ALL + 1; ++i) {
+      sort_data->offsets[i][0] = tile->idct_blocks_sizes[i];
+    }
+    sort_data->count = tile->idct_blocks_ptr;
+    sort_data->src_offset = tile->blocks_offset;
+    if (tile->idct_blocks_ptr) {
+      command_list->SetComputeRootConstantBufferView(2, cbo.dev_address);
+      command_list->Dispatch((tile->idct_blocks_ptr + 63) / 64, 1, 1);
+    }
+  }
+
+  D3D12_RESOURCE_BARRIER barriers[] = {CD3DX12_RESOURCE_BARRIER::UAV(dec->idct_blocks->dev),
+                                       CD3DX12_RESOURCE_BARRIER::UAV(dec->idct_residuals->dev)};
+  command_list->ResourceBarrier(2, barriers);
+  ConstantBufferObject cbo = cb->Alloc(sizeof(idct_frame_data));
+  idct_frame_data *data = reinterpret_cast<idct_frame_data *>(cbo.host_ptr);
+  data->planes[0][0] = td->frame_buffer->planes[0].res_stride;
+  data->planes[0][1] = td->frame_buffer->planes[0].res_offset;
+  data->planes[1][0] = td->frame_buffer->planes[1].res_stride;
+  data->planes[1][1] = td->frame_buffer->planes[1].res_offset;
+  data->planes[2][0] = td->frame_buffer->planes[2].res_stride;
+  data->planes[2][1] = td->frame_buffer->planes[2].res_offset;
+  data->bitdepth = td->bitdepth;
+  av1_tile_data *tdata = td->tile_data;
+
+  command_list->SetComputeRootSignature(dec->shader_lib->sig_idct.Get());
+  command_list->SetComputeRootShaderResourceView(0, dec->idct_coefs->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootShaderResourceView(1, dec->idct_blocks->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootUnorderedAccessView(2, dec->idct_residuals->dev->GetGPUVirtualAddress());
+  command_list->SetComputeRootConstantBufferView(4, cbo.dev_address);
+  for (int type = 0; type <= TX_SIZES_ALL; ++type) {
+    int offset = tdata->idct_blocks_sizes[type];
+    int count = tdata->idct_blocks_sizes[type + 1] - offset;
+    if (count == 0) continue;
+
+    const int tx_size = type == TX_SIZES_ALL ? TX_4X4 : type;
+    const int block_w = tx_size_wide[tx_size];
+    const int block_h = tx_size_high[tx_size];
+    const int block_size = block_w * block_h;
+    const int wi_per_block = AOMMIN(block_w, block_h);
+
+    ConstantBufferObject scan_cbo = cb->Alloc(block_size * 3 * 4);
+    int *gpu_scans = reinterpret_cast<int *>(scan_cbo.host_ptr);
+    for (int s = 0; s < 3; ++s) {
+      int *dst = gpu_scans + s * block_size;
+      const int16_t *scan = av1_idct_scans[tx_size][s];
+      if (block_w > 32) {
+        for (int i = 0; i < 32 * block_h; ++i) {
+          int s = scan[i];
+          dst[i] = (s & 31) + (s / 32) * 64;
+        }
+      } else
+        for (int i = 0; i < block_size; ++i) dst[i] = scan[i];
+    }
+
+    int const_inline[2];
+    const_inline[0] = offset;
+    const_inline[1] = wi_per_block * count;
+    command_list->SetPipelineState(dec->shader_lib->shader_idct[type].pso.Get());
+    command_list->SetComputeRootConstantBufferView(3, scan_cbo.dev_address);
+    command_list->SetComputeRoot32BitConstants(5, 2, const_inline, 0);
+    command_list->Dispatch((const_inline[1] + 63) / 64, 1, 1);
+  }
+  command_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::UAV(dec->idct_residuals->dev));
+}

diff --git a/libav1/dx/types.h b/libav1/dx/types.h
new file mode 100644
index 0000000..b4fba6d
--- /dev/null
+++ b/libav1/dx/types.h

@@ -0,0 +1,246 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+#include "av1/common/blockd.h"
+#include "dx/av1_compute.h"
+#include "dx/av1_memory.h"
+#include "dx/av1_thread.h"
+#include "av1/common/onyxc_int.h"
+
+enum {
+  UsePipeline = 0,
+  PipelineSize = 2,
+  FrameThreadDataCount = 2,
+  EntropyThreadCount = 8,
+  ImagePoolSize = 16,
+};
+
+struct InterSortingInfo {
+  uint8_t bw_log2;
+  uint8_t bh_log2;
+  uint8_t type;
+  uint8_t type_index;
+  uint32_t index;
+};
+
+struct InterBlock {
+  uint16_t x, y;
+  uint32_t flags;
+  MV mv0;
+  MV mv1;
+  // uint32_t sorting_idx;
+};
+
+enum InterTypes {
+  // warp;
+  Warp = 0,
+
+  CasualInter,
+  CompoundAvrg,
+  CompoundDiff,
+  CompoundMasked,
+  CompoundGlobalWarp,
+  // Inter2x2, Inter2x2Comp here;
+  // sync
+  CompoundDiffUv,
+  CompoundDiffUvGlobalWarp,
+  ObmcAbove,
+  // sync
+  ObmcLeft,
+
+  // 2x2 only
+  Inter2x2,  //-> first pass
+  Inter2x2Comp,
+
+  Inter2x2CompP2,  //-> pass 2
+
+  InterSizesAllCommon = 24,
+  InterTypesNon2x2 = Inter2x2,
+  Inter2x2ArrOffset = InterSizesAllCommon * (InterTypesNon2x2 - 1),
+  Inter2x2Count = 3,
+  InterCountsAll = Inter2x2ArrOffset + Inter2x2Count,
+
+  CompoundTypeAvrg = 0,
+  CompoundTypeMasked = 1,
+  CompoundTypeDiffY = 2,
+  CompoundTypeDiffUv = 3,
+};
+
+enum {
+  IntraSizes = 9,                   // bw_log + bh_log = (0..4) + (0..4) = 0..8
+  ReconstructBlockSizes = 36,       //(0..5)*(0..5)
+  IntraTypeCount = IntraSizes + 1,  /// + filter_intra
+  IntraBlockOffset = InterTypes::InterCountsAll + ReconstructBlockSizes + IntraSizes,
+  ReconBlockOffset = InterTypes::InterCountsAll,
+};
+
+struct ScaleFactor {
+  int x_scale;
+  int x_step;
+  int y_scale;
+  int y_step;
+};
+
+struct ConstantBufferObject {
+  void* host_ptr;
+  D3D12_GPU_VIRTUAL_ADDRESS dev_address;
+  ConstantBufferObject(void* ptr, D3D12_GPU_VIRTUAL_ADDRESS addr) : host_ptr(ptr), dev_address(addr) {}
+};
+
+struct ComputeCommandBuffer {
+  Microsoft::WRL::ComPtr<ID3D12CommandAllocator> allocator;
+  Microsoft::WRL::ComPtr<ID3D12Fence> fence;
+  HANDLE event;
+
+  GpuBufferObject* cb_alloc;
+  size_t cb_ptr;
+  uint32_t fence_value;
+
+  ConstantBufferObject Alloc(size_t size) {
+    size = (size + 255) & (~255);
+    assert(cb_ptr + size < cb_alloc->size);
+    size_t p = cb_ptr;
+    cb_ptr += size;
+    return ConstantBufferObject(reinterpret_cast<char*>(cb_alloc->host_ptr) + p,
+                                cb_alloc->dev->GetGPUVirtualAddress() + p);
+  }
+  void Reset() { cb_ptr = 0; }
+};
+
+struct av1_frame_thread_data {
+  ComputeCommandBuffer command_buffer;
+  bitdepth_dependent_shaders* shaders;
+  GpuBufferObject* mode_info_grid;
+  GpuBufferObject* idct_blocks_unordered;
+
+  GpuBufferObject* loop_rest_types;
+  GpuBufferObject* loop_rest_wiener;
+  GpuBufferObject* cdef_indexes;
+  GpuBufferObject* cdef_skips;
+  GpuBufferObject* filmgrain_rand_offset;
+
+  GpuBufferObject* gen_mi_block_indexes;
+  GpuBufferObject* gen_intra_inter_grid;
+  GpuBufferObject* gen_block_map;
+
+  GpuBufferObject* palette_buffer;
+
+  int* gen_intra_iter_y;
+  int* gen_intra_iter_uv;
+  int intra_iters;
+  int do_cdef;
+  int do_loop_rest;
+  int do_filmgrain;
+  int do_superres;
+  int is_hbd;
+
+  int ext_idct_buffer;
+  int iter_grid_stride;
+  int iter_grid_stride_uv;
+  int iter_grid_offset_uv;
+  int frame_number;
+  int bitdepth;
+  int thread_count;
+  int tile_count;
+  int mode_info_offset;
+  int mode_info_max;
+  int coef_buffer_offset;
+  av1_tile_data* tile_data;
+  HwFrameBuffer* frame_buffer;
+  HwFrameBuffer* back_buffer0;
+  HwFrameBuffer* dst_frame_buffer;
+  HwFrameBuffer* refs[8];
+
+  av1_frame_thread_data* sec_thread_data;
+  pthread_mutex_t sec_data_mutex;
+  int scale_enable;
+  ScaleFactor scale_factors[8];
+  volatile int64_t perf_markers[16];
+};
+
+typedef struct {
+  av1_frame_thread_data* data;
+  HwOutputImage* image;
+} GpuWorkItem;
+
+typedef struct Av1Core {
+  dx_compute_context compute;
+  av1_memory_allocator* memory;
+  compute_shader_lib* shader_lib;
+  av1_get_decoded_buffer_cb_fn_t cb_get_output_image;
+  av1_release_decoded_buffer_cb_fn_t cb_release_image;
+  aom_notify_frame_ready_cb_fn_t cb_notify_frame_ready;
+  void* image_alloc_priv;
+
+  void* ring_buf_host;
+  void* pbi_alloc;
+  void* buf_pool_alloc;
+  void** above_context_alloc[5];
+  void* restoration_info_alloc[3];
+  void* tplmvs_alloc;
+  GpuBufferObject* ring_buf_dev;
+  GpuBufferObject* idct_blocks;
+  GpuBufferObject* idct_residuals;
+  GpuBufferObject* frame_buffer_pool;
+  GpuBufferObject* inter_mask_lut;
+  GpuBufferObject* inter_warp_filter;
+  GpuBufferObject* mode_info_pool;
+  GpuBufferObject* idct_coefs;
+
+  GpuBufferObject* filmgrain_noise;
+  GpuBufferObject* filmgrain_gaus;
+  GpuBufferObject* filmgrain_random_luma;
+  GpuBufferObject* filmgrain_random_chroma;
+
+  GpuBufferObject* prediction_blocks;
+  GpuBufferObject* prediction_blocks_warp;
+  GpuBufferObject* loopfilter_blocks;
+
+  size_t fb_pool_alloc_ptr;
+  size_t fb_pool_alloc_reset;
+
+  int pred_map_size;
+  int block_count4x4;
+
+  av1_frame_thread_data frame_thread_data[FrameThreadDataCount];
+  av1_frame_thread_data* curr_frame_data;
+  int frame_number;
+  int thread_count;
+  int corrupted_seq;
+  HwFrameBuffer back_buffer1;
+  int fb_size;
+  int fb_offset;
+  int enable_superres;
+  int wedge_offsets[BLOCK_SIZES_ALL][2];
+
+  HwFrameBuffer fb_pool_src[16];
+  HwOutputImage image_pool_src[ImagePoolSize];
+  pthread_cond_t fb_pool_empty_cond;
+  pthread_mutex_t fb_pool_mutex;
+  DataPtrQueue fb_pool;
+  DataPtrQueueMT image_pool;
+  DataPtrQueueMT output_queue;
+  DataPtrQueueMT frame_data_pool;
+  GpuWorkItem gpu_item_pool_src[8];
+  DataPtrQueueMT gpu_item_pool;
+  DataPtrQueueMT gpu_waiting_queue;
+  pthread_t gpu_thread;
+  int tryhdr10x3;
+} Av1Core;
+
+Av1Core* Get(uint32_t may_be_null = 0);
+void PutPerfMarker(av1_frame_thread_data* td, volatile int64_t* marker);

diff --git a/libav1/dx/vp9dx_linear_allocator.cpp b/libav1/dx/vp9dx_linear_allocator.cpp
new file mode 100644
index 0000000..46427d7
--- /dev/null
+++ b/libav1/dx/vp9dx_linear_allocator.cpp

@@ -0,0 +1,121 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "dx/vp9dx_linear_allocator.h"
+
+static void remove_item(AllocItem* items, int items_count, int pos) {
+  for (int i = pos + 1; i < items_count; ++i) {
+    items[i - 1] = items[i];
+  }
+}
+
+void la_init(LinearAllocator* allocator, uint8_t* ptr, size_t size) {
+  allocator->base_ptr = ptr;
+  allocator->free_chunks_count = 1;
+  allocator->allocated_item_count = 0;
+  allocator->free_mem_chuncks[0].offset = 0;
+  allocator->free_mem_chuncks[0].size = size;
+}
+
+uint8_t* la_allocate(LinearAllocator* allocator, size_t size) {
+  size = (size + BlockAlignment - 1) & (~(BlockAlignment - 1));
+
+  int chunk = -1;
+  for (int i = 0; i < allocator->free_chunks_count; ++i) {
+    if (allocator->free_mem_chuncks[i].size >= size) {
+      chunk = i;
+      break;
+    }
+  }
+
+  if (chunk == -1) return NULL;
+
+  size_t offset = allocator->free_mem_chuncks[chunk].offset;
+  allocator->free_mem_chuncks[chunk].offset += size;
+  allocator->free_mem_chuncks[chunk].size -= size;
+
+  if (allocator->free_mem_chuncks[chunk].size == 0) {
+    remove_item(allocator->free_mem_chuncks, allocator->free_chunks_count, chunk);
+    --allocator->free_chunks_count;
+  }
+
+  uint8_t* ptr = allocator->base_ptr + offset;
+
+  allocator->allocated_items[allocator->allocated_item_count].size = size;
+  allocator->allocated_items[allocator->allocated_item_count].offset = offset;
+
+  ++allocator->allocated_item_count;
+  return ptr;
+}
+
+void la_free(LinearAllocator* allocator, uint8_t* ptr) {
+  size_t offset = (size_t)(ptr - allocator->base_ptr);
+  int index = -1;
+  for (int i = 0; i < allocator->allocated_item_count; ++i) {
+    if (allocator->allocated_items[i].offset == offset) {
+      index = i;
+      break;
+    }
+  }
+
+  if (index == -1) {
+    return;
+  }
+  size_t size = allocator->allocated_items[index].size;
+
+  remove_item(allocator->allocated_items, allocator->allocated_item_count, index);
+  --allocator->allocated_item_count;
+
+  int next_chunk = -1;
+  for (int i = 0; i < allocator->free_chunks_count; ++i) {
+    if (allocator->free_mem_chuncks[i].offset > offset) {
+      next_chunk = i;
+      break;
+    }
+  }
+
+  int prev_chunk = (next_chunk == -1) ? allocator->free_chunks_count - 1 : next_chunk - 1;
+
+  // add new memory chunk and/or merge with existing chunks
+  if (prev_chunk >= 0 &&
+      (allocator->free_mem_chuncks[prev_chunk].offset + allocator->free_mem_chuncks[prev_chunk].size) == offset) {
+    allocator->free_mem_chuncks[prev_chunk].size += size;
+    const size_t chunk_end =
+        allocator->free_mem_chuncks[prev_chunk].offset + allocator->free_mem_chuncks[prev_chunk].size;
+    if (prev_chunk < (allocator->free_chunks_count - 1) &&
+        chunk_end == allocator->free_mem_chuncks[prev_chunk + 1].offset) {
+      allocator->free_mem_chuncks[prev_chunk].size += allocator->free_mem_chuncks[prev_chunk + 1].size;
+      remove_item(allocator->free_mem_chuncks, allocator->free_chunks_count, prev_chunk + 1);
+      --allocator->free_chunks_count;
+    }
+  } else {
+    if (next_chunk != -1 && (offset + size) == allocator->free_mem_chuncks[next_chunk].offset) {
+      allocator->free_mem_chuncks[next_chunk].offset = offset;
+      allocator->free_mem_chuncks[next_chunk].size += size;
+    } else {
+      int pos = allocator->free_chunks_count;
+      if (next_chunk != -1) {
+        pos = next_chunk;
+        for (int i = allocator->free_chunks_count; i > next_chunk; --i) {
+          allocator->free_mem_chuncks[i] = allocator->free_mem_chuncks[i - 1];
+        }
+      }
+      allocator->free_mem_chuncks[pos].offset = offset;
+      allocator->free_mem_chuncks[pos].size = size;
+      ++allocator->free_chunks_count;
+    }
+  }
+}

diff --git a/libav1/dx/vp9dx_linear_allocator.h b/libav1/dx/vp9dx_linear_allocator.h
new file mode 100644
index 0000000..aaaa369
--- /dev/null
+++ b/libav1/dx/vp9dx_linear_allocator.h

@@ -0,0 +1,41 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+
+#include <inttypes.h>
+#include <cstring>
+
+enum { MaxChuncks = 256, MaxAllocations = 256, BlockAlignment = 65536 };
+
+struct AllocItem {
+  size_t offset;
+  size_t size;
+};
+
+struct LinearAllocator {
+  AllocItem free_mem_chuncks[MaxChuncks];
+  AllocItem allocated_items[MaxAllocations];
+
+  int free_chunks_count;
+  int allocated_item_count;
+
+  uint8_t* base_ptr;
+};
+
+void la_init(LinearAllocator* allocator, uint8_t* ptr, size_t size);
+uint8_t* la_allocate(LinearAllocator* allocator, size_t size);
+void la_free(LinearAllocator* allocator, uint8_t* ptr);

diff --git a/testing/common/d3dx12.h b/testing/common/d3dx12.h
new file mode 100644
index 0000000..7eb73ff
--- /dev/null
+++ b/testing/common/d3dx12.h

@@ -0,0 +1,1445 @@
+//*********************************************************
+//
+// Copyright (c) Microsoft. All rights reserved.
+// This code is licensed under the MIT License (MIT).
+// THIS CODE IS PROVIDED *AS IS* WITHOUT WARRANTY OF
+// ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY
+// IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR
+// PURPOSE, MERCHANTABILITY, OR NON-INFRINGEMENT.
+//
+//*********************************************************
+
+#ifndef __D3DX12_H__
+#define __D3DX12_H__
+
+#include "d3d12.h"
+
+#if defined(__cplusplus)
+
+struct CD3DX12_DEFAULT {};
+extern const DECLSPEC_SELECTANY CD3DX12_DEFAULT D3D12_DEFAULT;
+
+//------------------------------------------------------------------------------------------------
+inline bool operator==(const D3D12_VIEWPORT& l, const D3D12_VIEWPORT& r) {
+  return l.TopLeftX == r.TopLeftX && l.TopLeftY == r.TopLeftY && l.Width == r.Width && l.Height == r.Height &&
+         l.MinDepth == r.MinDepth && l.MaxDepth == r.MaxDepth;
+}
+
+//------------------------------------------------------------------------------------------------
+inline bool operator!=(const D3D12_VIEWPORT& l, const D3D12_VIEWPORT& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RECT : public D3D12_RECT {
+  CD3DX12_RECT() {}
+  explicit CD3DX12_RECT(const D3D12_RECT& o) : D3D12_RECT(o) {}
+  explicit CD3DX12_RECT(LONG Left, LONG Top, LONG Right, LONG Bottom) {
+    left = Left;
+    top = Top;
+    right = Right;
+    bottom = Bottom;
+  }
+  ~CD3DX12_RECT() {}
+  operator const D3D12_RECT&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_VIEWPORT : public D3D12_VIEWPORT {
+  CD3DX12_VIEWPORT() {}
+  explicit CD3DX12_VIEWPORT(const D3D12_VIEWPORT& o) : D3D12_VIEWPORT(o) {}
+  explicit CD3DX12_VIEWPORT(FLOAT topLeftX, FLOAT topLeftY, FLOAT width, FLOAT height, FLOAT minDepth = D3D12_MIN_DEPTH,
+                            FLOAT maxDepth = D3D12_MAX_DEPTH) {
+    TopLeftX = topLeftX;
+    TopLeftY = topLeftY;
+    Width = width;
+    Height = height;
+    MinDepth = minDepth;
+    MaxDepth = maxDepth;
+  }
+  explicit CD3DX12_VIEWPORT(_In_ ID3D12Resource* pResource, UINT mipSlice = 0, FLOAT topLeftX = 0.0f,
+                            FLOAT topLeftY = 0.0f, FLOAT minDepth = D3D12_MIN_DEPTH, FLOAT maxDepth = D3D12_MAX_DEPTH) {
+    D3D12_RESOURCE_DESC Desc = pResource->GetDesc();
+    const UINT64 SubresourceWidth = Desc.Width >> mipSlice;
+    const UINT64 SubresourceHeight = Desc.Height >> mipSlice;
+    switch (Desc.Dimension) {
+      case D3D12_RESOURCE_DIMENSION_BUFFER:
+        TopLeftX = topLeftX;
+        TopLeftY = 0.0f;
+        Width = Desc.Width - topLeftX;
+        Height = 1.0f;
+        break;
+      case D3D12_RESOURCE_DIMENSION_TEXTURE1D:
+        TopLeftX = topLeftX;
+        TopLeftY = 0.0f;
+        Width = (SubresourceWidth ? SubresourceWidth : 1.0f) - topLeftX;
+        Height = 1.0f;
+        break;
+      case D3D12_RESOURCE_DIMENSION_TEXTURE2D:
+      case D3D12_RESOURCE_DIMENSION_TEXTURE3D:
+        TopLeftX = topLeftX;
+        TopLeftY = topLeftY;
+        Width = (SubresourceWidth ? SubresourceWidth : 1.0f) - topLeftX;
+        Height = (SubresourceHeight ? SubresourceHeight : 1.0f) - topLeftY;
+        break;
+      default:
+        break;
+    }
+
+    MinDepth = minDepth;
+    MaxDepth = maxDepth;
+  }
+  ~CD3DX12_VIEWPORT() {}
+  operator const D3D12_VIEWPORT&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_BOX : public D3D12_BOX {
+  CD3DX12_BOX() {}
+  explicit CD3DX12_BOX(const D3D12_BOX& o) : D3D12_BOX(o) {}
+  explicit CD3DX12_BOX(LONG Left, LONG Right) {
+    left = Left;
+    top = 0;
+    front = 0;
+    right = Right;
+    bottom = 1;
+    back = 1;
+  }
+  explicit CD3DX12_BOX(LONG Left, LONG Top, LONG Right, LONG Bottom) {
+    left = Left;
+    top = Top;
+    front = 0;
+    right = Right;
+    bottom = Bottom;
+    back = 1;
+  }
+  explicit CD3DX12_BOX(LONG Left, LONG Top, LONG Front, LONG Right, LONG Bottom, LONG Back) {
+    left = Left;
+    top = Top;
+    front = Front;
+    right = Right;
+    bottom = Bottom;
+    back = Back;
+  }
+  ~CD3DX12_BOX() {}
+  operator const D3D12_BOX&() const { return *this; }
+};
+inline bool operator==(const D3D12_BOX& l, const D3D12_BOX& r) {
+  return l.left == r.left && l.top == r.top && l.front == r.front && l.right == r.right && l.bottom == r.bottom &&
+         l.back == r.back;
+}
+inline bool operator!=(const D3D12_BOX& l, const D3D12_BOX& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_DEPTH_STENCIL_DESC : public D3D12_DEPTH_STENCIL_DESC {
+  CD3DX12_DEPTH_STENCIL_DESC() {}
+  explicit CD3DX12_DEPTH_STENCIL_DESC(const D3D12_DEPTH_STENCIL_DESC& o) : D3D12_DEPTH_STENCIL_DESC(o) {}
+  explicit CD3DX12_DEPTH_STENCIL_DESC(CD3DX12_DEFAULT) {
+    DepthEnable = TRUE;
+    DepthWriteMask = D3D12_DEPTH_WRITE_MASK_ALL;
+    DepthFunc = D3D12_COMPARISON_FUNC_LESS;
+    StencilEnable = FALSE;
+    StencilReadMask = D3D12_DEFAULT_STENCIL_READ_MASK;
+    StencilWriteMask = D3D12_DEFAULT_STENCIL_WRITE_MASK;
+    const D3D12_DEPTH_STENCILOP_DESC defaultStencilOp = {D3D12_STENCIL_OP_KEEP, D3D12_STENCIL_OP_KEEP,
+                                                         D3D12_STENCIL_OP_KEEP, D3D12_COMPARISON_FUNC_ALWAYS};
+    FrontFace = defaultStencilOp;
+    BackFace = defaultStencilOp;
+  }
+  explicit CD3DX12_DEPTH_STENCIL_DESC(BOOL depthEnable, D3D12_DEPTH_WRITE_MASK depthWriteMask,
+                                      D3D12_COMPARISON_FUNC depthFunc, BOOL stencilEnable, UINT8 stencilReadMask,
+                                      UINT8 stencilWriteMask, D3D12_STENCIL_OP frontStencilFailOp,
+                                      D3D12_STENCIL_OP frontStencilDepthFailOp, D3D12_STENCIL_OP frontStencilPassOp,
+                                      D3D12_COMPARISON_FUNC frontStencilFunc, D3D12_STENCIL_OP backStencilFailOp,
+                                      D3D12_STENCIL_OP backStencilDepthFailOp, D3D12_STENCIL_OP backStencilPassOp,
+                                      D3D12_COMPARISON_FUNC backStencilFunc) {
+    DepthEnable = depthEnable;
+    DepthWriteMask = depthWriteMask;
+    DepthFunc = depthFunc;
+    StencilEnable = stencilEnable;
+    StencilReadMask = stencilReadMask;
+    StencilWriteMask = stencilWriteMask;
+    FrontFace.StencilFailOp = frontStencilFailOp;
+    FrontFace.StencilDepthFailOp = frontStencilDepthFailOp;
+    FrontFace.StencilPassOp = frontStencilPassOp;
+    FrontFace.StencilFunc = frontStencilFunc;
+    BackFace.StencilFailOp = backStencilFailOp;
+    BackFace.StencilDepthFailOp = backStencilDepthFailOp;
+    BackFace.StencilPassOp = backStencilPassOp;
+    BackFace.StencilFunc = backStencilFunc;
+  }
+  ~CD3DX12_DEPTH_STENCIL_DESC() {}
+  operator const D3D12_DEPTH_STENCIL_DESC&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_BLEND_DESC : public D3D12_BLEND_DESC {
+  CD3DX12_BLEND_DESC() {}
+  explicit CD3DX12_BLEND_DESC(const D3D12_BLEND_DESC& o) : D3D12_BLEND_DESC(o) {}
+  explicit CD3DX12_BLEND_DESC(CD3DX12_DEFAULT) {
+    AlphaToCoverageEnable = FALSE;
+    IndependentBlendEnable = FALSE;
+    const D3D12_RENDER_TARGET_BLEND_DESC defaultRenderTargetBlendDesc = {
+        FALSE,
+        FALSE,
+        D3D12_BLEND_ONE,
+        D3D12_BLEND_ZERO,
+        D3D12_BLEND_OP_ADD,
+        D3D12_BLEND_ONE,
+        D3D12_BLEND_ZERO,
+        D3D12_BLEND_OP_ADD,
+        D3D12_LOGIC_OP_NOOP,
+        D3D12_COLOR_WRITE_ENABLE_ALL,
+    };
+    for (UINT i = 0; i < D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT; ++i) RenderTarget[i] = defaultRenderTargetBlendDesc;
+  }
+  ~CD3DX12_BLEND_DESC() {}
+  operator const D3D12_BLEND_DESC&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RASTERIZER_DESC : public D3D12_RASTERIZER_DESC {
+  CD3DX12_RASTERIZER_DESC() {}
+  explicit CD3DX12_RASTERIZER_DESC(const D3D12_RASTERIZER_DESC& o) : D3D12_RASTERIZER_DESC(o) {}
+  explicit CD3DX12_RASTERIZER_DESC(CD3DX12_DEFAULT) {
+    FillMode = D3D12_FILL_MODE_SOLID;
+    CullMode = D3D12_CULL_MODE_BACK;
+    FrontCounterClockwise = FALSE;
+    DepthBias = D3D12_DEFAULT_DEPTH_BIAS;
+    DepthBiasClamp = D3D12_DEFAULT_DEPTH_BIAS_CLAMP;
+    SlopeScaledDepthBias = D3D12_DEFAULT_SLOPE_SCALED_DEPTH_BIAS;
+    DepthClipEnable = TRUE;
+    MultisampleEnable = FALSE;
+    AntialiasedLineEnable = FALSE;
+    ForcedSampleCount = 0;
+    ConservativeRaster = D3D12_CONSERVATIVE_RASTERIZATION_MODE_OFF;
+  }
+  explicit CD3DX12_RASTERIZER_DESC(D3D12_FILL_MODE fillMode, D3D12_CULL_MODE cullMode, BOOL frontCounterClockwise,
+                                   INT depthBias, FLOAT depthBiasClamp, FLOAT slopeScaledDepthBias,
+                                   BOOL depthClipEnable, BOOL multisampleEnable, BOOL antialiasedLineEnable,
+                                   UINT forcedSampleCount, D3D12_CONSERVATIVE_RASTERIZATION_MODE conservativeRaster) {
+    FillMode = fillMode;
+    CullMode = cullMode;
+    FrontCounterClockwise = frontCounterClockwise;
+    DepthBias = depthBias;
+    DepthBiasClamp = depthBiasClamp;
+    SlopeScaledDepthBias = slopeScaledDepthBias;
+    DepthClipEnable = depthClipEnable;
+    MultisampleEnable = multisampleEnable;
+    AntialiasedLineEnable = antialiasedLineEnable;
+    ForcedSampleCount = forcedSampleCount;
+    ConservativeRaster = conservativeRaster;
+  }
+  ~CD3DX12_RASTERIZER_DESC() {}
+  operator const D3D12_RASTERIZER_DESC&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RESOURCE_ALLOCATION_INFO : public D3D12_RESOURCE_ALLOCATION_INFO {
+  CD3DX12_RESOURCE_ALLOCATION_INFO() {}
+  explicit CD3DX12_RESOURCE_ALLOCATION_INFO(const D3D12_RESOURCE_ALLOCATION_INFO& o)
+      : D3D12_RESOURCE_ALLOCATION_INFO(o) {}
+  CD3DX12_RESOURCE_ALLOCATION_INFO(UINT64 size, UINT64 alignment) {
+    SizeInBytes = size;
+    Alignment = alignment;
+  }
+  operator const D3D12_RESOURCE_ALLOCATION_INFO&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_HEAP_PROPERTIES : public D3D12_HEAP_PROPERTIES {
+  CD3DX12_HEAP_PROPERTIES() {}
+  explicit CD3DX12_HEAP_PROPERTIES(const D3D12_HEAP_PROPERTIES& o) : D3D12_HEAP_PROPERTIES(o) {}
+  CD3DX12_HEAP_PROPERTIES(D3D12_CPU_PAGE_PROPERTY cpuPageProperty, D3D12_MEMORY_POOL memoryPoolPreference,
+                          UINT creationNodeMask = 1, UINT nodeMask = 1) {
+    Type = D3D12_HEAP_TYPE_CUSTOM;
+    CPUPageProperty = cpuPageProperty;
+    MemoryPoolPreference = memoryPoolPreference;
+    CreationNodeMask = creationNodeMask;
+    VisibleNodeMask = nodeMask;
+  }
+  explicit CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE type, UINT creationNodeMask = 1, UINT nodeMask = 1) {
+    Type = type;
+    CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
+    MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
+    CreationNodeMask = creationNodeMask;
+    VisibleNodeMask = nodeMask;
+  }
+  operator const D3D12_HEAP_PROPERTIES&() const { return *this; }
+  bool IsCPUAccessible() const {
+    return Type == D3D12_HEAP_TYPE_UPLOAD || Type == D3D12_HEAP_TYPE_READBACK ||
+           (Type == D3D12_HEAP_TYPE_CUSTOM && (CPUPageProperty == D3D12_CPU_PAGE_PROPERTY_WRITE_COMBINE ||
+                                               CPUPageProperty == D3D12_CPU_PAGE_PROPERTY_WRITE_BACK));
+  }
+};
+inline bool operator==(const D3D12_HEAP_PROPERTIES& l, const D3D12_HEAP_PROPERTIES& r) {
+  return l.Type == r.Type && l.CPUPageProperty == r.CPUPageProperty &&
+         l.MemoryPoolPreference == r.MemoryPoolPreference && l.CreationNodeMask == r.CreationNodeMask &&
+         l.VisibleNodeMask == r.VisibleNodeMask;
+}
+inline bool operator!=(const D3D12_HEAP_PROPERTIES& l, const D3D12_HEAP_PROPERTIES& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_HEAP_DESC : public D3D12_HEAP_DESC {
+  CD3DX12_HEAP_DESC() {}
+  explicit CD3DX12_HEAP_DESC(const D3D12_HEAP_DESC& o) : D3D12_HEAP_DESC(o) {}
+  CD3DX12_HEAP_DESC(UINT64 size, D3D12_HEAP_PROPERTIES properties, UINT64 alignment = 0,
+                    D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = size;
+    Properties = properties;
+    Alignment = alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(UINT64 size, D3D12_HEAP_TYPE type, UINT64 alignment = 0,
+                    D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = size;
+    Properties = CD3DX12_HEAP_PROPERTIES(type);
+    Alignment = alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(UINT64 size, D3D12_CPU_PAGE_PROPERTY cpuPageProperty, D3D12_MEMORY_POOL memoryPoolPreference,
+                    UINT64 alignment = 0, D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = size;
+    Properties = CD3DX12_HEAP_PROPERTIES(cpuPageProperty, memoryPoolPreference);
+    Alignment = alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(const D3D12_RESOURCE_ALLOCATION_INFO& resAllocInfo, D3D12_HEAP_PROPERTIES properties,
+                    D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = resAllocInfo.SizeInBytes;
+    Properties = properties;
+    Alignment = resAllocInfo.Alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(const D3D12_RESOURCE_ALLOCATION_INFO& resAllocInfo, D3D12_HEAP_TYPE type,
+                    D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = resAllocInfo.SizeInBytes;
+    Properties = CD3DX12_HEAP_PROPERTIES(type);
+    Alignment = resAllocInfo.Alignment;
+    Flags = flags;
+  }
+  CD3DX12_HEAP_DESC(const D3D12_RESOURCE_ALLOCATION_INFO& resAllocInfo, D3D12_CPU_PAGE_PROPERTY cpuPageProperty,
+                    D3D12_MEMORY_POOL memoryPoolPreference, D3D12_HEAP_FLAGS flags = D3D12_HEAP_FLAG_NONE) {
+    SizeInBytes = resAllocInfo.SizeInBytes;
+    Properties = CD3DX12_HEAP_PROPERTIES(cpuPageProperty, memoryPoolPreference);
+    Alignment = resAllocInfo.Alignment;
+    Flags = flags;
+  }
+  operator const D3D12_HEAP_DESC&() const { return *this; }
+  bool IsCPUAccessible() const { return static_cast<const CD3DX12_HEAP_PROPERTIES*>(&Properties)->IsCPUAccessible(); }
+};
+inline bool operator==(const D3D12_HEAP_DESC& l, const D3D12_HEAP_DESC& r) {
+  return l.SizeInBytes == r.SizeInBytes && l.Properties == r.Properties && l.Alignment == r.Alignment &&
+         l.Flags == r.Flags;
+}
+inline bool operator!=(const D3D12_HEAP_DESC& l, const D3D12_HEAP_DESC& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_CLEAR_VALUE : public D3D12_CLEAR_VALUE {
+  CD3DX12_CLEAR_VALUE() {}
+  explicit CD3DX12_CLEAR_VALUE(const D3D12_CLEAR_VALUE& o) : D3D12_CLEAR_VALUE(o) {}
+  CD3DX12_CLEAR_VALUE(DXGI_FORMAT format, const FLOAT color[4]) {
+    Format = format;
+    memcpy(Color, color, sizeof(Color));
+  }
+  CD3DX12_CLEAR_VALUE(DXGI_FORMAT format, FLOAT depth, UINT8 stencil) {
+    Format = format;
+    /* Use memcpy to preserve NAN values */
+    memcpy(&DepthStencil.Depth, &depth, sizeof(depth));
+    DepthStencil.Stencil = stencil;
+  }
+  operator const D3D12_CLEAR_VALUE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RANGE : public D3D12_RANGE {
+  CD3DX12_RANGE() {}
+  explicit CD3DX12_RANGE(const D3D12_RANGE& o) : D3D12_RANGE(o) {}
+  CD3DX12_RANGE(SIZE_T begin, SIZE_T end) {
+    Begin = begin;
+    End = end;
+  }
+  operator const D3D12_RANGE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_SHADER_BYTECODE : public D3D12_SHADER_BYTECODE {
+  CD3DX12_SHADER_BYTECODE() {}
+  explicit CD3DX12_SHADER_BYTECODE(const D3D12_SHADER_BYTECODE& o) : D3D12_SHADER_BYTECODE(o) {}
+  CD3DX12_SHADER_BYTECODE(_In_ ID3DBlob* pShaderBlob) {
+    pShaderBytecode = pShaderBlob->GetBufferPointer();
+    BytecodeLength = pShaderBlob->GetBufferSize();
+  }
+  CD3DX12_SHADER_BYTECODE(_In_reads_(bytecodeLength) const void* _pShaderBytecode, SIZE_T bytecodeLength) {
+    pShaderBytecode = _pShaderBytecode;
+    BytecodeLength = bytecodeLength;
+  }
+  operator const D3D12_SHADER_BYTECODE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_TILED_RESOURCE_COORDINATE : public D3D12_TILED_RESOURCE_COORDINATE {
+  CD3DX12_TILED_RESOURCE_COORDINATE() {}
+  explicit CD3DX12_TILED_RESOURCE_COORDINATE(const D3D12_TILED_RESOURCE_COORDINATE& o)
+      : D3D12_TILED_RESOURCE_COORDINATE(o) {}
+  CD3DX12_TILED_RESOURCE_COORDINATE(UINT x, UINT y, UINT z, UINT subresource) {
+    X = x;
+    Y = y;
+    Z = z;
+    Subresource = subresource;
+  }
+  operator const D3D12_TILED_RESOURCE_COORDINATE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_TILE_REGION_SIZE : public D3D12_TILE_REGION_SIZE {
+  CD3DX12_TILE_REGION_SIZE() {}
+  explicit CD3DX12_TILE_REGION_SIZE(const D3D12_TILE_REGION_SIZE& o) : D3D12_TILE_REGION_SIZE(o) {}
+  CD3DX12_TILE_REGION_SIZE(UINT numTiles, BOOL useBox, UINT width, UINT16 height, UINT16 depth) {
+    NumTiles = numTiles;
+    UseBox = useBox;
+    Width = width;
+    Height = height;
+    Depth = depth;
+  }
+  operator const D3D12_TILE_REGION_SIZE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_SUBRESOURCE_TILING : public D3D12_SUBRESOURCE_TILING {
+  CD3DX12_SUBRESOURCE_TILING() {}
+  explicit CD3DX12_SUBRESOURCE_TILING(const D3D12_SUBRESOURCE_TILING& o) : D3D12_SUBRESOURCE_TILING(o) {}
+  CD3DX12_SUBRESOURCE_TILING(UINT widthInTiles, UINT16 heightInTiles, UINT16 depthInTiles,
+                             UINT startTileIndexInOverallResource) {
+    WidthInTiles = widthInTiles;
+    HeightInTiles = heightInTiles;
+    DepthInTiles = depthInTiles;
+    StartTileIndexInOverallResource = startTileIndexInOverallResource;
+  }
+  operator const D3D12_SUBRESOURCE_TILING&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_TILE_SHAPE : public D3D12_TILE_SHAPE {
+  CD3DX12_TILE_SHAPE() {}
+  explicit CD3DX12_TILE_SHAPE(const D3D12_TILE_SHAPE& o) : D3D12_TILE_SHAPE(o) {}
+  CD3DX12_TILE_SHAPE(UINT widthInTexels, UINT heightInTexels, UINT depthInTexels) {
+    WidthInTexels = widthInTexels;
+    HeightInTexels = heightInTexels;
+    DepthInTexels = depthInTexels;
+  }
+  operator const D3D12_TILE_SHAPE&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RESOURCE_BARRIER : public D3D12_RESOURCE_BARRIER {
+  CD3DX12_RESOURCE_BARRIER() {}
+  explicit CD3DX12_RESOURCE_BARRIER(const D3D12_RESOURCE_BARRIER& o) : D3D12_RESOURCE_BARRIER(o) {}
+  static inline CD3DX12_RESOURCE_BARRIER Transition(
+      _In_ ID3D12Resource* pResource, D3D12_RESOURCE_STATES stateBefore, D3D12_RESOURCE_STATES stateAfter,
+      UINT subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES,
+      D3D12_RESOURCE_BARRIER_FLAGS flags = D3D12_RESOURCE_BARRIER_FLAG_NONE) {
+    CD3DX12_RESOURCE_BARRIER result;
+    ZeroMemory(&result, sizeof(result));
+    D3D12_RESOURCE_BARRIER& barrier = result;
+    result.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
+    result.Flags = flags;
+    barrier.Transition.pResource = pResource;
+    barrier.Transition.StateBefore = stateBefore;
+    barrier.Transition.StateAfter = stateAfter;
+    barrier.Transition.Subresource = subresource;
+    return result;
+  }
+  static inline CD3DX12_RESOURCE_BARRIER Aliasing(_In_ ID3D12Resource* pResourceBefore,
+                                                  _In_ ID3D12Resource* pResourceAfter) {
+    CD3DX12_RESOURCE_BARRIER result;
+    ZeroMemory(&result, sizeof(result));
+    D3D12_RESOURCE_BARRIER& barrier = result;
+    result.Type = D3D12_RESOURCE_BARRIER_TYPE_ALIASING;
+    barrier.Aliasing.pResourceBefore = pResourceBefore;
+    barrier.Aliasing.pResourceAfter = pResourceAfter;
+    return result;
+  }
+  static inline CD3DX12_RESOURCE_BARRIER UAV(_In_ ID3D12Resource* pResource) {
+    CD3DX12_RESOURCE_BARRIER result;
+    ZeroMemory(&result, sizeof(result));
+    D3D12_RESOURCE_BARRIER& barrier = result;
+    result.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
+    barrier.UAV.pResource = pResource;
+    return result;
+  }
+  operator const D3D12_RESOURCE_BARRIER&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_PACKED_MIP_INFO : public D3D12_PACKED_MIP_INFO {
+  CD3DX12_PACKED_MIP_INFO() {}
+  explicit CD3DX12_PACKED_MIP_INFO(const D3D12_PACKED_MIP_INFO& o) : D3D12_PACKED_MIP_INFO(o) {}
+  CD3DX12_PACKED_MIP_INFO(UINT8 numStandardMips, UINT8 numPackedMips, UINT numTilesForPackedMips,
+                          UINT startTileIndexInOverallResource) {
+    NumStandardMips = numStandardMips;
+    NumPackedMips = numPackedMips;
+    NumTilesForPackedMips = numTilesForPackedMips;
+    StartTileIndexInOverallResource = startTileIndexInOverallResource;
+  }
+  operator const D3D12_PACKED_MIP_INFO&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_SUBRESOURCE_FOOTPRINT : public D3D12_SUBRESOURCE_FOOTPRINT {
+  CD3DX12_SUBRESOURCE_FOOTPRINT() {}
+  explicit CD3DX12_SUBRESOURCE_FOOTPRINT(const D3D12_SUBRESOURCE_FOOTPRINT& o) : D3D12_SUBRESOURCE_FOOTPRINT(o) {}
+  CD3DX12_SUBRESOURCE_FOOTPRINT(DXGI_FORMAT format, UINT width, UINT height, UINT depth, UINT rowPitch) {
+    Format = format;
+    Width = width;
+    Height = height;
+    Depth = depth;
+    RowPitch = rowPitch;
+  }
+  explicit CD3DX12_SUBRESOURCE_FOOTPRINT(const D3D12_RESOURCE_DESC& resDesc, UINT rowPitch) {
+    Format = resDesc.Format;
+    Width = UINT(resDesc.Width);
+    Height = resDesc.Height;
+    Depth = (resDesc.Dimension == D3D12_RESOURCE_DIMENSION_TEXTURE3D ? resDesc.DepthOrArraySize : 1);
+    RowPitch = rowPitch;
+  }
+  operator const D3D12_SUBRESOURCE_FOOTPRINT&() const { return *this; }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_TEXTURE_COPY_LOCATION : public D3D12_TEXTURE_COPY_LOCATION {
+  CD3DX12_TEXTURE_COPY_LOCATION() {}
+  explicit CD3DX12_TEXTURE_COPY_LOCATION(const D3D12_TEXTURE_COPY_LOCATION& o) : D3D12_TEXTURE_COPY_LOCATION(o) {}
+  CD3DX12_TEXTURE_COPY_LOCATION(ID3D12Resource* pRes) { pResource = pRes; }
+  CD3DX12_TEXTURE_COPY_LOCATION(ID3D12Resource* pRes, D3D12_PLACED_SUBRESOURCE_FOOTPRINT const& Footprint) {
+    pResource = pRes;
+    Type = D3D12_TEXTURE_COPY_TYPE_PLACED_FOOTPRINT;
+    PlacedFootprint = Footprint;
+  }
+  CD3DX12_TEXTURE_COPY_LOCATION(ID3D12Resource* pRes, UINT Sub) {
+    pResource = pRes;
+    Type = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX;
+    SubresourceIndex = Sub;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_DESCRIPTOR_RANGE : public D3D12_DESCRIPTOR_RANGE {
+  CD3DX12_DESCRIPTOR_RANGE() {}
+  explicit CD3DX12_DESCRIPTOR_RANGE(const D3D12_DESCRIPTOR_RANGE& o) : D3D12_DESCRIPTOR_RANGE(o) {}
+  CD3DX12_DESCRIPTOR_RANGE(D3D12_DESCRIPTOR_RANGE_TYPE rangeType, UINT numDescriptors, UINT baseShaderRegister,
+                           UINT registerSpace = 0,
+                           UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    Init(rangeType, numDescriptors, baseShaderRegister, registerSpace, offsetInDescriptorsFromTableStart);
+  }
+
+  inline void Init(D3D12_DESCRIPTOR_RANGE_TYPE rangeType, UINT numDescriptors, UINT baseShaderRegister,
+                   UINT registerSpace = 0,
+                   UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    Init(*this, rangeType, numDescriptors, baseShaderRegister, registerSpace, offsetInDescriptorsFromTableStart);
+  }
+
+  static inline void Init(_Out_ D3D12_DESCRIPTOR_RANGE& range, D3D12_DESCRIPTOR_RANGE_TYPE rangeType,
+                          UINT numDescriptors, UINT baseShaderRegister, UINT registerSpace = 0,
+                          UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    range.RangeType = rangeType;
+    range.NumDescriptors = numDescriptors;
+    range.BaseShaderRegister = baseShaderRegister;
+    range.RegisterSpace = registerSpace;
+    range.OffsetInDescriptorsFromTableStart = offsetInDescriptorsFromTableStart;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_DESCRIPTOR_TABLE : public D3D12_ROOT_DESCRIPTOR_TABLE {
+  CD3DX12_ROOT_DESCRIPTOR_TABLE() {}
+  explicit CD3DX12_ROOT_DESCRIPTOR_TABLE(const D3D12_ROOT_DESCRIPTOR_TABLE& o) : D3D12_ROOT_DESCRIPTOR_TABLE(o) {}
+  CD3DX12_ROOT_DESCRIPTOR_TABLE(UINT numDescriptorRanges,
+                                _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE* _pDescriptorRanges) {
+    Init(numDescriptorRanges, _pDescriptorRanges);
+  }
+
+  inline void Init(UINT numDescriptorRanges,
+                   _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE* _pDescriptorRanges) {
+    Init(*this, numDescriptorRanges, _pDescriptorRanges);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_DESCRIPTOR_TABLE& rootDescriptorTable, UINT numDescriptorRanges,
+                          _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE* _pDescriptorRanges) {
+    rootDescriptorTable.NumDescriptorRanges = numDescriptorRanges;
+    rootDescriptorTable.pDescriptorRanges = _pDescriptorRanges;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_CONSTANTS : public D3D12_ROOT_CONSTANTS {
+  CD3DX12_ROOT_CONSTANTS() {}
+  explicit CD3DX12_ROOT_CONSTANTS(const D3D12_ROOT_CONSTANTS& o) : D3D12_ROOT_CONSTANTS(o) {}
+  CD3DX12_ROOT_CONSTANTS(UINT num32BitValues, UINT shaderRegister, UINT registerSpace = 0) {
+    Init(num32BitValues, shaderRegister, registerSpace);
+  }
+
+  inline void Init(UINT num32BitValues, UINT shaderRegister, UINT registerSpace = 0) {
+    Init(*this, num32BitValues, shaderRegister, registerSpace);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_CONSTANTS& rootConstants, UINT num32BitValues, UINT shaderRegister,
+                          UINT registerSpace = 0) {
+    rootConstants.Num32BitValues = num32BitValues;
+    rootConstants.ShaderRegister = shaderRegister;
+    rootConstants.RegisterSpace = registerSpace;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_DESCRIPTOR : public D3D12_ROOT_DESCRIPTOR {
+  CD3DX12_ROOT_DESCRIPTOR() {}
+  explicit CD3DX12_ROOT_DESCRIPTOR(const D3D12_ROOT_DESCRIPTOR& o) : D3D12_ROOT_DESCRIPTOR(o) {}
+  CD3DX12_ROOT_DESCRIPTOR(UINT shaderRegister, UINT registerSpace = 0) { Init(shaderRegister, registerSpace); }
+
+  inline void Init(UINT shaderRegister, UINT registerSpace = 0) { Init(*this, shaderRegister, registerSpace); }
+
+  static inline void Init(_Out_ D3D12_ROOT_DESCRIPTOR& table, UINT shaderRegister, UINT registerSpace = 0) {
+    table.ShaderRegister = shaderRegister;
+    table.RegisterSpace = registerSpace;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_PARAMETER : public D3D12_ROOT_PARAMETER {
+  CD3DX12_ROOT_PARAMETER() {}
+  explicit CD3DX12_ROOT_PARAMETER(const D3D12_ROOT_PARAMETER& o) : D3D12_ROOT_PARAMETER(o) {}
+
+  static inline void InitAsDescriptorTable(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT numDescriptorRanges,
+                                           _In_reads_(numDescriptorRanges)
+                                               const D3D12_DESCRIPTOR_RANGE* pDescriptorRanges,
+                                           D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR_TABLE::Init(rootParam.DescriptorTable, numDescriptorRanges, pDescriptorRanges);
+  }
+
+  static inline void InitAsConstants(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT num32BitValues, UINT shaderRegister,
+                                     UINT registerSpace = 0,
+                                     D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_CONSTANTS::Init(rootParam.Constants, num32BitValues, shaderRegister, registerSpace);
+  }
+
+  static inline void InitAsConstantBufferView(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT shaderRegister,
+                                              UINT registerSpace = 0,
+                                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR::Init(rootParam.Descriptor, shaderRegister, registerSpace);
+  }
+
+  static inline void InitAsShaderResourceView(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT shaderRegister,
+                                              UINT registerSpace = 0,
+                                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_SRV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR::Init(rootParam.Descriptor, shaderRegister, registerSpace);
+  }
+
+  static inline void InitAsUnorderedAccessView(_Out_ D3D12_ROOT_PARAMETER& rootParam, UINT shaderRegister,
+                                               UINT registerSpace = 0,
+                                               D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_UAV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR::Init(rootParam.Descriptor, shaderRegister, registerSpace);
+  }
+
+  inline void InitAsDescriptorTable(UINT numDescriptorRanges,
+                                    _In_reads_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE* pDescriptorRanges,
+                                    D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsDescriptorTable(*this, numDescriptorRanges, pDescriptorRanges, visibility);
+  }
+
+  inline void InitAsConstants(UINT num32BitValues, UINT shaderRegister, UINT registerSpace = 0,
+                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsConstants(*this, num32BitValues, shaderRegister, registerSpace, visibility);
+  }
+
+  inline void InitAsConstantBufferView(UINT shaderRegister, UINT registerSpace = 0,
+                                       D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsConstantBufferView(*this, shaderRegister, registerSpace, visibility);
+  }
+
+  inline void InitAsShaderResourceView(UINT shaderRegister, UINT registerSpace = 0,
+                                       D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsShaderResourceView(*this, shaderRegister, registerSpace, visibility);
+  }
+
+  inline void InitAsUnorderedAccessView(UINT shaderRegister, UINT registerSpace = 0,
+                                        D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsUnorderedAccessView(*this, shaderRegister, registerSpace, visibility);
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_STATIC_SAMPLER_DESC : public D3D12_STATIC_SAMPLER_DESC {
+  CD3DX12_STATIC_SAMPLER_DESC() {}
+  explicit CD3DX12_STATIC_SAMPLER_DESC(const D3D12_STATIC_SAMPLER_DESC& o) : D3D12_STATIC_SAMPLER_DESC(o) {}
+  CD3DX12_STATIC_SAMPLER_DESC(UINT shaderRegister, D3D12_FILTER filter = D3D12_FILTER_ANISOTROPIC,
+                              D3D12_TEXTURE_ADDRESS_MODE addressU = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                              D3D12_TEXTURE_ADDRESS_MODE addressV = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                              D3D12_TEXTURE_ADDRESS_MODE addressW = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                              FLOAT mipLODBias = 0, UINT maxAnisotropy = 16,
+                              D3D12_COMPARISON_FUNC comparisonFunc = D3D12_COMPARISON_FUNC_LESS_EQUAL,
+                              D3D12_STATIC_BORDER_COLOR borderColor = D3D12_STATIC_BORDER_COLOR_OPAQUE_WHITE,
+                              FLOAT minLOD = 0.f, FLOAT maxLOD = D3D12_FLOAT32_MAX,
+                              D3D12_SHADER_VISIBILITY shaderVisibility = D3D12_SHADER_VISIBILITY_ALL,
+                              UINT registerSpace = 0) {
+    Init(shaderRegister, filter, addressU, addressV, addressW, mipLODBias, maxAnisotropy, comparisonFunc, borderColor,
+         minLOD, maxLOD, shaderVisibility, registerSpace);
+  }
+
+  static inline void Init(_Out_ D3D12_STATIC_SAMPLER_DESC& samplerDesc, UINT shaderRegister,
+                          D3D12_FILTER filter = D3D12_FILTER_ANISOTROPIC,
+                          D3D12_TEXTURE_ADDRESS_MODE addressU = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                          D3D12_TEXTURE_ADDRESS_MODE addressV = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                          D3D12_TEXTURE_ADDRESS_MODE addressW = D3D12_TEXTURE_ADDRESS_MODE_WRAP, FLOAT mipLODBias = 0,
+                          UINT maxAnisotropy = 16,
+                          D3D12_COMPARISON_FUNC comparisonFunc = D3D12_COMPARISON_FUNC_LESS_EQUAL,
+                          D3D12_STATIC_BORDER_COLOR borderColor = D3D12_STATIC_BORDER_COLOR_OPAQUE_WHITE,
+                          FLOAT minLOD = 0.f, FLOAT maxLOD = D3D12_FLOAT32_MAX,
+                          D3D12_SHADER_VISIBILITY shaderVisibility = D3D12_SHADER_VISIBILITY_ALL,
+                          UINT registerSpace = 0) {
+    samplerDesc.ShaderRegister = shaderRegister;
+    samplerDesc.Filter = filter;
+    samplerDesc.AddressU = addressU;
+    samplerDesc.AddressV = addressV;
+    samplerDesc.AddressW = addressW;
+    samplerDesc.MipLODBias = mipLODBias;
+    samplerDesc.MaxAnisotropy = maxAnisotropy;
+    samplerDesc.ComparisonFunc = comparisonFunc;
+    samplerDesc.BorderColor = borderColor;
+    samplerDesc.MinLOD = minLOD;
+    samplerDesc.MaxLOD = maxLOD;
+    samplerDesc.ShaderVisibility = shaderVisibility;
+    samplerDesc.RegisterSpace = registerSpace;
+  }
+  inline void Init(UINT shaderRegister, D3D12_FILTER filter = D3D12_FILTER_ANISOTROPIC,
+                   D3D12_TEXTURE_ADDRESS_MODE addressU = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                   D3D12_TEXTURE_ADDRESS_MODE addressV = D3D12_TEXTURE_ADDRESS_MODE_WRAP,
+                   D3D12_TEXTURE_ADDRESS_MODE addressW = D3D12_TEXTURE_ADDRESS_MODE_WRAP, FLOAT mipLODBias = 0,
+                   UINT maxAnisotropy = 16, D3D12_COMPARISON_FUNC comparisonFunc = D3D12_COMPARISON_FUNC_LESS_EQUAL,
+                   D3D12_STATIC_BORDER_COLOR borderColor = D3D12_STATIC_BORDER_COLOR_OPAQUE_WHITE, FLOAT minLOD = 0.f,
+                   FLOAT maxLOD = D3D12_FLOAT32_MAX,
+                   D3D12_SHADER_VISIBILITY shaderVisibility = D3D12_SHADER_VISIBILITY_ALL, UINT registerSpace = 0) {
+    Init(*this, shaderRegister, filter, addressU, addressV, addressW, mipLODBias, maxAnisotropy, comparisonFunc,
+         borderColor, minLOD, maxLOD, shaderVisibility, registerSpace);
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_SIGNATURE_DESC : public D3D12_ROOT_SIGNATURE_DESC {
+  CD3DX12_ROOT_SIGNATURE_DESC() {}
+  explicit CD3DX12_ROOT_SIGNATURE_DESC(const D3D12_ROOT_SIGNATURE_DESC& o) : D3D12_ROOT_SIGNATURE_DESC(o) {}
+  CD3DX12_ROOT_SIGNATURE_DESC(UINT numParameters,
+                              _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                              UINT numStaticSamplers = 0,
+                              _In_reads_opt_(numStaticSamplers)
+                                  const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                              D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init(numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+  CD3DX12_ROOT_SIGNATURE_DESC(CD3DX12_DEFAULT) { Init(0, NULL, 0, NULL, D3D12_ROOT_SIGNATURE_FLAG_NONE); }
+
+  inline void Init(UINT numParameters, _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                   UINT numStaticSamplers = 0,
+                   _In_reads_opt_(numStaticSamplers) const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                   D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init(*this, numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_SIGNATURE_DESC& desc, UINT numParameters,
+                          _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                          UINT numStaticSamplers = 0,
+                          _In_reads_opt_(numStaticSamplers) const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                          D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    desc.NumParameters = numParameters;
+    desc.pParameters = _pParameters;
+    desc.NumStaticSamplers = numStaticSamplers;
+    desc.pStaticSamplers = _pStaticSamplers;
+    desc.Flags = flags;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_DESCRIPTOR_RANGE1 : public D3D12_DESCRIPTOR_RANGE1 {
+  CD3DX12_DESCRIPTOR_RANGE1() {}
+  explicit CD3DX12_DESCRIPTOR_RANGE1(const D3D12_DESCRIPTOR_RANGE1& o) : D3D12_DESCRIPTOR_RANGE1(o) {}
+  CD3DX12_DESCRIPTOR_RANGE1(D3D12_DESCRIPTOR_RANGE_TYPE rangeType, UINT numDescriptors, UINT baseShaderRegister,
+                            UINT registerSpace = 0,
+                            D3D12_DESCRIPTOR_RANGE_FLAGS flags = D3D12_DESCRIPTOR_RANGE_FLAG_NONE,
+                            UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    Init(rangeType, numDescriptors, baseShaderRegister, registerSpace, flags, offsetInDescriptorsFromTableStart);
+  }
+
+  inline void Init(D3D12_DESCRIPTOR_RANGE_TYPE rangeType, UINT numDescriptors, UINT baseShaderRegister,
+                   UINT registerSpace = 0, D3D12_DESCRIPTOR_RANGE_FLAGS flags = D3D12_DESCRIPTOR_RANGE_FLAG_NONE,
+                   UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    Init(*this, rangeType, numDescriptors, baseShaderRegister, registerSpace, flags, offsetInDescriptorsFromTableStart);
+  }
+
+  static inline void Init(_Out_ D3D12_DESCRIPTOR_RANGE1& range, D3D12_DESCRIPTOR_RANGE_TYPE rangeType,
+                          UINT numDescriptors, UINT baseShaderRegister, UINT registerSpace = 0,
+                          D3D12_DESCRIPTOR_RANGE_FLAGS flags = D3D12_DESCRIPTOR_RANGE_FLAG_NONE,
+                          UINT offsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND) {
+    range.RangeType = rangeType;
+    range.NumDescriptors = numDescriptors;
+    range.BaseShaderRegister = baseShaderRegister;
+    range.RegisterSpace = registerSpace;
+    range.Flags = flags;
+    range.OffsetInDescriptorsFromTableStart = offsetInDescriptorsFromTableStart;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_DESCRIPTOR_TABLE1 : public D3D12_ROOT_DESCRIPTOR_TABLE1 {
+  CD3DX12_ROOT_DESCRIPTOR_TABLE1() {}
+  explicit CD3DX12_ROOT_DESCRIPTOR_TABLE1(const D3D12_ROOT_DESCRIPTOR_TABLE1& o) : D3D12_ROOT_DESCRIPTOR_TABLE1(o) {}
+  CD3DX12_ROOT_DESCRIPTOR_TABLE1(UINT numDescriptorRanges, _In_reads_opt_(numDescriptorRanges)
+                                                               const D3D12_DESCRIPTOR_RANGE1* _pDescriptorRanges) {
+    Init(numDescriptorRanges, _pDescriptorRanges);
+  }
+
+  inline void Init(UINT numDescriptorRanges,
+                   _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE1* _pDescriptorRanges) {
+    Init(*this, numDescriptorRanges, _pDescriptorRanges);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_DESCRIPTOR_TABLE1& rootDescriptorTable, UINT numDescriptorRanges,
+                          _In_reads_opt_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE1* _pDescriptorRanges) {
+    rootDescriptorTable.NumDescriptorRanges = numDescriptorRanges;
+    rootDescriptorTable.pDescriptorRanges = _pDescriptorRanges;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_DESCRIPTOR1 : public D3D12_ROOT_DESCRIPTOR1 {
+  CD3DX12_ROOT_DESCRIPTOR1() {}
+  explicit CD3DX12_ROOT_DESCRIPTOR1(const D3D12_ROOT_DESCRIPTOR1& o) : D3D12_ROOT_DESCRIPTOR1(o) {}
+  CD3DX12_ROOT_DESCRIPTOR1(UINT shaderRegister, UINT registerSpace = 0,
+                           D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE) {
+    Init(shaderRegister, registerSpace, flags);
+  }
+
+  inline void Init(UINT shaderRegister, UINT registerSpace = 0,
+                   D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE) {
+    Init(*this, shaderRegister, registerSpace, flags);
+  }
+
+  static inline void Init(_Out_ D3D12_ROOT_DESCRIPTOR1& table, UINT shaderRegister, UINT registerSpace = 0,
+                          D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE) {
+    table.ShaderRegister = shaderRegister;
+    table.RegisterSpace = registerSpace;
+    table.Flags = flags;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_ROOT_PARAMETER1 : public D3D12_ROOT_PARAMETER1 {
+  CD3DX12_ROOT_PARAMETER1() {}
+  explicit CD3DX12_ROOT_PARAMETER1(const D3D12_ROOT_PARAMETER1& o) : D3D12_ROOT_PARAMETER1(o) {}
+
+  static inline void InitAsDescriptorTable(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT numDescriptorRanges,
+                                           _In_reads_(numDescriptorRanges)
+                                               const D3D12_DESCRIPTOR_RANGE1* pDescriptorRanges,
+                                           D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR_TABLE1::Init(rootParam.DescriptorTable, numDescriptorRanges, pDescriptorRanges);
+  }
+
+  static inline void InitAsConstants(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT num32BitValues, UINT shaderRegister,
+                                     UINT registerSpace = 0,
+                                     D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_CONSTANTS::Init(rootParam.Constants, num32BitValues, shaderRegister, registerSpace);
+  }
+
+  static inline void InitAsConstantBufferView(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT shaderRegister,
+                                              UINT registerSpace = 0,
+                                              D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR1::Init(rootParam.Descriptor, shaderRegister, registerSpace, flags);
+  }
+
+  static inline void InitAsShaderResourceView(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT shaderRegister,
+                                              UINT registerSpace = 0,
+                                              D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_SRV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR1::Init(rootParam.Descriptor, shaderRegister, registerSpace, flags);
+  }
+
+  static inline void InitAsUnorderedAccessView(_Out_ D3D12_ROOT_PARAMETER1& rootParam, UINT shaderRegister,
+                                               UINT registerSpace = 0,
+                                               D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                               D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    rootParam.ParameterType = D3D12_ROOT_PARAMETER_TYPE_UAV;
+    rootParam.ShaderVisibility = visibility;
+    CD3DX12_ROOT_DESCRIPTOR1::Init(rootParam.Descriptor, shaderRegister, registerSpace, flags);
+  }
+
+  inline void InitAsDescriptorTable(UINT numDescriptorRanges,
+                                    _In_reads_(numDescriptorRanges) const D3D12_DESCRIPTOR_RANGE1* pDescriptorRanges,
+                                    D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsDescriptorTable(*this, numDescriptorRanges, pDescriptorRanges, visibility);
+  }
+
+  inline void InitAsConstants(UINT num32BitValues, UINT shaderRegister, UINT registerSpace = 0,
+                              D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsConstants(*this, num32BitValues, shaderRegister, registerSpace, visibility);
+  }
+
+  inline void InitAsConstantBufferView(UINT shaderRegister, UINT registerSpace = 0,
+                                       D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                       D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsConstantBufferView(*this, shaderRegister, registerSpace, flags, visibility);
+  }
+
+  inline void InitAsShaderResourceView(UINT shaderRegister, UINT registerSpace = 0,
+                                       D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                       D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsShaderResourceView(*this, shaderRegister, registerSpace, flags, visibility);
+  }
+
+  inline void InitAsUnorderedAccessView(UINT shaderRegister, UINT registerSpace = 0,
+                                        D3D12_ROOT_DESCRIPTOR_FLAGS flags = D3D12_ROOT_DESCRIPTOR_FLAG_NONE,
+                                        D3D12_SHADER_VISIBILITY visibility = D3D12_SHADER_VISIBILITY_ALL) {
+    InitAsUnorderedAccessView(*this, shaderRegister, registerSpace, flags, visibility);
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC : public D3D12_VERSIONED_ROOT_SIGNATURE_DESC {
+  CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC() {}
+  explicit CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(const D3D12_VERSIONED_ROOT_SIGNATURE_DESC& o)
+      : D3D12_VERSIONED_ROOT_SIGNATURE_DESC(o) {}
+  explicit CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(const D3D12_ROOT_SIGNATURE_DESC& o) {
+    Version = D3D_ROOT_SIGNATURE_VERSION_1_0;
+    Desc_1_0 = o;
+  }
+  explicit CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(const D3D12_ROOT_SIGNATURE_DESC1& o) {
+    Version = D3D_ROOT_SIGNATURE_VERSION_1_1;
+    Desc_1_1 = o;
+  }
+  CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(UINT numParameters,
+                                        _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                                        UINT numStaticSamplers = 0,
+                                        _In_reads_opt_(numStaticSamplers)
+                                            const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                                        D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init_1_0(numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+  CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(UINT numParameters,
+                                        _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER1* _pParameters,
+                                        UINT numStaticSamplers = 0,
+                                        _In_reads_opt_(numStaticSamplers)
+                                            const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                                        D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init_1_1(numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+  CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC(CD3DX12_DEFAULT) { Init_1_1(0, NULL, 0, NULL, D3D12_ROOT_SIGNATURE_FLAG_NONE); }
+
+  inline void Init_1_0(UINT numParameters, _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                       UINT numStaticSamplers = 0,
+                       _In_reads_opt_(numStaticSamplers) const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                       D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init_1_0(*this, numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+
+  static inline void Init_1_0(_Out_ D3D12_VERSIONED_ROOT_SIGNATURE_DESC& desc, UINT numParameters,
+                              _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER* _pParameters,
+                              UINT numStaticSamplers = 0,
+                              _In_reads_opt_(numStaticSamplers)
+                                  const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                              D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    desc.Version = D3D_ROOT_SIGNATURE_VERSION_1_0;
+    desc.Desc_1_0.NumParameters = numParameters;
+    desc.Desc_1_0.pParameters = _pParameters;
+    desc.Desc_1_0.NumStaticSamplers = numStaticSamplers;
+    desc.Desc_1_0.pStaticSamplers = _pStaticSamplers;
+    desc.Desc_1_0.Flags = flags;
+  }
+
+  inline void Init_1_1(UINT numParameters, _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER1* _pParameters,
+                       UINT numStaticSamplers = 0,
+                       _In_reads_opt_(numStaticSamplers) const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                       D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    Init_1_1(*this, numParameters, _pParameters, numStaticSamplers, _pStaticSamplers, flags);
+  }
+
+  static inline void Init_1_1(_Out_ D3D12_VERSIONED_ROOT_SIGNATURE_DESC& desc, UINT numParameters,
+                              _In_reads_opt_(numParameters) const D3D12_ROOT_PARAMETER1* _pParameters,
+                              UINT numStaticSamplers = 0,
+                              _In_reads_opt_(numStaticSamplers)
+                                  const D3D12_STATIC_SAMPLER_DESC* _pStaticSamplers = NULL,
+                              D3D12_ROOT_SIGNATURE_FLAGS flags = D3D12_ROOT_SIGNATURE_FLAG_NONE) {
+    desc.Version = D3D_ROOT_SIGNATURE_VERSION_1_1;
+    desc.Desc_1_1.NumParameters = numParameters;
+    desc.Desc_1_1.pParameters = _pParameters;
+    desc.Desc_1_1.NumStaticSamplers = numStaticSamplers;
+    desc.Desc_1_1.pStaticSamplers = _pStaticSamplers;
+    desc.Desc_1_1.Flags = flags;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_CPU_DESCRIPTOR_HANDLE : public D3D12_CPU_DESCRIPTOR_HANDLE {
+  CD3DX12_CPU_DESCRIPTOR_HANDLE() {}
+  explicit CD3DX12_CPU_DESCRIPTOR_HANDLE(const D3D12_CPU_DESCRIPTOR_HANDLE& o) : D3D12_CPU_DESCRIPTOR_HANDLE(o) {}
+  CD3DX12_CPU_DESCRIPTOR_HANDLE(CD3DX12_DEFAULT) { ptr = 0; }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& other, INT offsetScaledByIncrementSize) {
+    InitOffsetted(other, offsetScaledByIncrementSize);
+  }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& other, INT offsetInDescriptors,
+                                UINT descriptorIncrementSize) {
+    InitOffsetted(other, offsetInDescriptors, descriptorIncrementSize);
+  }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE& Offset(INT offsetInDescriptors, UINT descriptorIncrementSize) {
+    ptr += offsetInDescriptors * descriptorIncrementSize;
+    return *this;
+  }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE& Offset(INT offsetScaledByIncrementSize) {
+    ptr += offsetScaledByIncrementSize;
+    return *this;
+  }
+  bool operator==(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& other) const { return (ptr == other.ptr); }
+  bool operator!=(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& other) const { return (ptr != other.ptr); }
+  CD3DX12_CPU_DESCRIPTOR_HANDLE& operator=(const D3D12_CPU_DESCRIPTOR_HANDLE& other) {
+    ptr = other.ptr;
+    return *this;
+  }
+
+  inline void InitOffsetted(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& base, INT offsetScaledByIncrementSize) {
+    InitOffsetted(*this, base, offsetScaledByIncrementSize);
+  }
+
+  inline void InitOffsetted(_In_ const D3D12_CPU_DESCRIPTOR_HANDLE& base, INT offsetInDescriptors,
+                            UINT descriptorIncrementSize) {
+    InitOffsetted(*this, base, offsetInDescriptors, descriptorIncrementSize);
+  }
+
+  static inline void InitOffsetted(_Out_ D3D12_CPU_DESCRIPTOR_HANDLE& handle,
+                                   _In_ const D3D12_CPU_DESCRIPTOR_HANDLE& base, INT offsetScaledByIncrementSize) {
+    handle.ptr = base.ptr + offsetScaledByIncrementSize;
+  }
+
+  static inline void InitOffsetted(_Out_ D3D12_CPU_DESCRIPTOR_HANDLE& handle,
+                                   _In_ const D3D12_CPU_DESCRIPTOR_HANDLE& base, INT offsetInDescriptors,
+                                   UINT descriptorIncrementSize) {
+    handle.ptr = base.ptr + offsetInDescriptors * descriptorIncrementSize;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_GPU_DESCRIPTOR_HANDLE : public D3D12_GPU_DESCRIPTOR_HANDLE {
+  CD3DX12_GPU_DESCRIPTOR_HANDLE() {}
+  explicit CD3DX12_GPU_DESCRIPTOR_HANDLE(const D3D12_GPU_DESCRIPTOR_HANDLE& o) : D3D12_GPU_DESCRIPTOR_HANDLE(o) {}
+  CD3DX12_GPU_DESCRIPTOR_HANDLE(CD3DX12_DEFAULT) { ptr = 0; }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& other, INT offsetScaledByIncrementSize) {
+    InitOffsetted(other, offsetScaledByIncrementSize);
+  }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& other, INT offsetInDescriptors,
+                                UINT descriptorIncrementSize) {
+    InitOffsetted(other, offsetInDescriptors, descriptorIncrementSize);
+  }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE& Offset(INT offsetInDescriptors, UINT descriptorIncrementSize) {
+    ptr += offsetInDescriptors * descriptorIncrementSize;
+    return *this;
+  }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE& Offset(INT offsetScaledByIncrementSize) {
+    ptr += offsetScaledByIncrementSize;
+    return *this;
+  }
+  inline bool operator==(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& other) const { return (ptr == other.ptr); }
+  inline bool operator!=(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& other) const { return (ptr != other.ptr); }
+  CD3DX12_GPU_DESCRIPTOR_HANDLE& operator=(const D3D12_GPU_DESCRIPTOR_HANDLE& other) {
+    ptr = other.ptr;
+    return *this;
+  }
+
+  inline void InitOffsetted(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& base, INT offsetScaledByIncrementSize) {
+    InitOffsetted(*this, base, offsetScaledByIncrementSize);
+  }
+
+  inline void InitOffsetted(_In_ const D3D12_GPU_DESCRIPTOR_HANDLE& base, INT offsetInDescriptors,
+                            UINT descriptorIncrementSize) {
+    InitOffsetted(*this, base, offsetInDescriptors, descriptorIncrementSize);
+  }
+
+  static inline void InitOffsetted(_Out_ D3D12_GPU_DESCRIPTOR_HANDLE& handle,
+                                   _In_ const D3D12_GPU_DESCRIPTOR_HANDLE& base, INT offsetScaledByIncrementSize) {
+    handle.ptr = base.ptr + offsetScaledByIncrementSize;
+  }
+
+  static inline void InitOffsetted(_Out_ D3D12_GPU_DESCRIPTOR_HANDLE& handle,
+                                   _In_ const D3D12_GPU_DESCRIPTOR_HANDLE& base, INT offsetInDescriptors,
+                                   UINT descriptorIncrementSize) {
+    handle.ptr = base.ptr + offsetInDescriptors * descriptorIncrementSize;
+  }
+};
+
+//------------------------------------------------------------------------------------------------
+inline UINT D3D12CalcSubresource(UINT MipSlice, UINT ArraySlice, UINT PlaneSlice, UINT MipLevels, UINT ArraySize) {
+  return MipSlice + ArraySlice * MipLevels + PlaneSlice * MipLevels * ArraySize;
+}
+
+//------------------------------------------------------------------------------------------------
+template <typename T, typename U, typename V>
+inline void D3D12DecomposeSubresource(UINT Subresource, UINT MipLevels, UINT ArraySize, _Out_ T& MipSlice,
+                                      _Out_ U& ArraySlice, _Out_ V& PlaneSlice) {
+  MipSlice = static_cast<T>(Subresource % MipLevels);
+  ArraySlice = static_cast<U>((Subresource / MipLevels) % ArraySize);
+  PlaneSlice = static_cast<V>(Subresource / (MipLevels * ArraySize));
+}
+
+//------------------------------------------------------------------------------------------------
+inline UINT8 D3D12GetFormatPlaneCount(_In_ ID3D12Device* pDevice, DXGI_FORMAT Format) {
+  D3D12_FEATURE_DATA_FORMAT_INFO formatInfo = {Format};
+  if (FAILED(pDevice->CheckFeatureSupport(D3D12_FEATURE_FORMAT_INFO, &formatInfo, sizeof(formatInfo)))) {
+    return 0;
+  }
+  return formatInfo.PlaneCount;
+}
+
+//------------------------------------------------------------------------------------------------
+struct CD3DX12_RESOURCE_DESC : public D3D12_RESOURCE_DESC {
+  CD3DX12_RESOURCE_DESC() {}
+  explicit CD3DX12_RESOURCE_DESC(const D3D12_RESOURCE_DESC& o) : D3D12_RESOURCE_DESC(o) {}
+  CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION dimension, UINT64 alignment, UINT64 width, UINT height,
+                        UINT16 depthOrArraySize, UINT16 mipLevels, DXGI_FORMAT format, UINT sampleCount,
+                        UINT sampleQuality, D3D12_TEXTURE_LAYOUT layout, D3D12_RESOURCE_FLAGS flags) {
+    Dimension = dimension;
+    Alignment = alignment;
+    Width = width;
+    Height = height;
+    DepthOrArraySize = depthOrArraySize;
+    MipLevels = mipLevels;
+    Format = format;
+    SampleDesc.Count = sampleCount;
+    SampleDesc.Quality = sampleQuality;
+    Layout = layout;
+    Flags = flags;
+  }
+  static inline CD3DX12_RESOURCE_DESC Buffer(const D3D12_RESOURCE_ALLOCATION_INFO& resAllocInfo,
+                                             D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_BUFFER, resAllocInfo.Alignment, resAllocInfo.SizeInBytes, 1,
+                                 1, 1, DXGI_FORMAT_UNKNOWN, 1, 0, D3D12_TEXTURE_LAYOUT_ROW_MAJOR, flags);
+  }
+  static inline CD3DX12_RESOURCE_DESC Buffer(UINT64 width, D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE,
+                                             UINT64 alignment = 0) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_BUFFER, alignment, width, 1, 1, 1, DXGI_FORMAT_UNKNOWN, 1, 0,
+                                 D3D12_TEXTURE_LAYOUT_ROW_MAJOR, flags);
+  }
+  static inline CD3DX12_RESOURCE_DESC Tex1D(DXGI_FORMAT format, UINT64 width, UINT16 arraySize = 1,
+                                            UINT16 mipLevels = 0, D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE,
+                                            D3D12_TEXTURE_LAYOUT layout = D3D12_TEXTURE_LAYOUT_UNKNOWN,
+                                            UINT64 alignment = 0) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_TEXTURE1D, alignment, width, 1, arraySize, mipLevels, format,
+                                 1, 0, layout, flags);
+  }
+  static inline CD3DX12_RESOURCE_DESC Tex2D(DXGI_FORMAT format, UINT64 width, UINT height, UINT16 arraySize = 1,
+                                            UINT16 mipLevels = 0, UINT sampleCount = 1, UINT sampleQuality = 0,
+                                            D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE,
+                                            D3D12_TEXTURE_LAYOUT layout = D3D12_TEXTURE_LAYOUT_UNKNOWN,
+                                            UINT64 alignment = 0) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_TEXTURE2D, alignment, width, height, arraySize, mipLevels,
+                                 format, sampleCount, sampleQuality, layout, flags);
+  }
+  static inline CD3DX12_RESOURCE_DESC Tex3D(DXGI_FORMAT format, UINT64 width, UINT height, UINT16 depth,
+                                            UINT16 mipLevels = 0, D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE,
+                                            D3D12_TEXTURE_LAYOUT layout = D3D12_TEXTURE_LAYOUT_UNKNOWN,
+                                            UINT64 alignment = 0) {
+    return CD3DX12_RESOURCE_DESC(D3D12_RESOURCE_DIMENSION_TEXTURE3D, alignment, width, height, depth, mipLevels, format,
+                                 1, 0, layout, flags);
+  }
+  inline UINT16 Depth() const { return (Dimension == D3D12_RESOURCE_DIMENSION_TEXTURE3D ? DepthOrArraySize : 1); }
+  inline UINT16 ArraySize() const { return (Dimension != D3D12_RESOURCE_DIMENSION_TEXTURE3D ? DepthOrArraySize : 1); }
+  inline UINT8 PlaneCount(_In_ ID3D12Device* pDevice) const { return D3D12GetFormatPlaneCount(pDevice, Format); }
+  inline UINT Subresources(_In_ ID3D12Device* pDevice) const { return MipLevels * ArraySize() * PlaneCount(pDevice); }
+  inline UINT CalcSubresource(UINT MipSlice, UINT ArraySlice, UINT PlaneSlice) {
+    return D3D12CalcSubresource(MipSlice, ArraySlice, PlaneSlice, MipLevels, ArraySize());
+  }
+  operator const D3D12_RESOURCE_DESC&() const { return *this; }
+};
+inline bool operator==(const D3D12_RESOURCE_DESC& l, const D3D12_RESOURCE_DESC& r) {
+  return l.Dimension == r.Dimension && l.Alignment == r.Alignment && l.Width == r.Width && l.Height == r.Height &&
+         l.DepthOrArraySize == r.DepthOrArraySize && l.MipLevels == r.MipLevels && l.Format == r.Format &&
+         l.SampleDesc.Count == r.SampleDesc.Count && l.SampleDesc.Quality == r.SampleDesc.Quality &&
+         l.Layout == r.Layout && l.Flags == r.Flags;
+}
+inline bool operator!=(const D3D12_RESOURCE_DESC& l, const D3D12_RESOURCE_DESC& r) { return !(l == r); }
+
+//------------------------------------------------------------------------------------------------
+// Row-by-row memcpy
+inline void MemcpySubresource(_In_ const D3D12_MEMCPY_DEST* pDest, _In_ const D3D12_SUBRESOURCE_DATA* pSrc,
+                              SIZE_T RowSizeInBytes, UINT NumRows, UINT NumSlices) {
+  for (UINT z = 0; z < NumSlices; ++z) {
+    BYTE* pDestSlice = reinterpret_cast<BYTE*>(pDest->pData) + pDest->SlicePitch * z;
+    const BYTE* pSrcSlice = reinterpret_cast<const BYTE*>(pSrc->pData) + pSrc->SlicePitch * z;
+    for (UINT y = 0; y < NumRows; ++y) {
+      memcpy(pDestSlice + pDest->RowPitch * y, pSrcSlice + pSrc->RowPitch * y, RowSizeInBytes);
+    }
+  }
+}
+
+//------------------------------------------------------------------------------------------------
+// Returns required size of a buffer to be used for data upload
+inline UINT64 GetRequiredIntermediateSize(_In_ ID3D12Resource* pDestinationResource,
+                                          _In_range_(0, D3D12_REQ_SUBRESOURCES) UINT FirstSubresource,
+                                          _In_range_(0, D3D12_REQ_SUBRESOURCES - FirstSubresource)
+                                              UINT NumSubresources) {
+  D3D12_RESOURCE_DESC Desc = pDestinationResource->GetDesc();
+  UINT64 RequiredSize = 0;
+
+  ID3D12Device* pDevice;
+  pDestinationResource->GetDevice(__uuidof(*pDevice), reinterpret_cast<void**>(&pDevice));
+  pDevice->GetCopyableFootprints(&Desc, FirstSubresource, NumSubresources, 0, nullptr, nullptr, nullptr, &RequiredSize);
+  pDevice->Release();
+
+  return RequiredSize;
+}
+
+//------------------------------------------------------------------------------------------------
+// All arrays must be populated (e.g. by calling GetCopyableFootprints)
+inline UINT64 UpdateSubresources(_In_ ID3D12GraphicsCommandList* pCmdList, _In_ ID3D12Resource* pDestinationResource,
+                                 _In_ ID3D12Resource* pIntermediate,
+                                 _In_range_(0, D3D12_REQ_SUBRESOURCES) UINT FirstSubresource,
+                                 _In_range_(0, D3D12_REQ_SUBRESOURCES - FirstSubresource) UINT NumSubresources,
+                                 UINT64 RequiredSize,
+                                 _In_reads_(NumSubresources) const D3D12_PLACED_SUBRESOURCE_FOOTPRINT* pLayouts,
+                                 _In_reads_(NumSubresources) const UINT* pNumRows,
+                                 _In_reads_(NumSubresources) const UINT64* pRowSizesInBytes,
+                                 _In_reads_(NumSubresources) const D3D12_SUBRESOURCE_DATA* pSrcData) {
+  // Minor validation
+  D3D12_RESOURCE_DESC IntermediateDesc = pIntermediate->GetDesc();
+  D3D12_RESOURCE_DESC DestinationDesc = pDestinationResource->GetDesc();
+  if (IntermediateDesc.Dimension != D3D12_RESOURCE_DIMENSION_BUFFER ||
+      IntermediateDesc.Width < RequiredSize + pLayouts[0].Offset || RequiredSize > (SIZE_T)-1 ||
+      (DestinationDesc.Dimension == D3D12_RESOURCE_DIMENSION_BUFFER &&
+       (FirstSubresource != 0 || NumSubresources != 1))) {
+    return 0;
+  }
+
+  BYTE* pData;
+  HRESULT hr = pIntermediate->Map(0, NULL, reinterpret_cast<void**>(&pData));
+  if (FAILED(hr)) {
+    return 0;
+  }
+
+  for (UINT i = 0; i < NumSubresources; ++i) {
+    if (pRowSizesInBytes[i] > (SIZE_T)-1) return 0;
+    D3D12_MEMCPY_DEST DestData = {pData + pLayouts[i].Offset, pLayouts[i].Footprint.RowPitch,
+                                  pLayouts[i].Footprint.RowPitch * pNumRows[i]};
+    MemcpySubresource(&DestData, &pSrcData[i], (SIZE_T)pRowSizesInBytes[i], pNumRows[i], pLayouts[i].Footprint.Depth);
+  }
+  pIntermediate->Unmap(0, NULL);
+
+  if (DestinationDesc.Dimension == D3D12_RESOURCE_DIMENSION_BUFFER) {
+    CD3DX12_BOX SrcBox(UINT(pLayouts[0].Offset), UINT(pLayouts[0].Offset + pLayouts[0].Footprint.Width));
+    pCmdList->CopyBufferRegion(pDestinationResource, 0, pIntermediate, pLayouts[0].Offset, pLayouts[0].Footprint.Width);
+  } else {
+    for (UINT i = 0; i < NumSubresources; ++i) {
+      CD3DX12_TEXTURE_COPY_LOCATION Dst(pDestinationResource, i + FirstSubresource);
+      CD3DX12_TEXTURE_COPY_LOCATION Src(pIntermediate, pLayouts[i]);
+      pCmdList->CopyTextureRegion(&Dst, 0, 0, 0, &Src, nullptr);
+    }
+  }
+  return RequiredSize;
+}
+
+//------------------------------------------------------------------------------------------------
+// Heap-allocating UpdateSubresources implementation
+inline UINT64 UpdateSubresources(_In_ ID3D12GraphicsCommandList* pCmdList, _In_ ID3D12Resource* pDestinationResource,
+                                 _In_ ID3D12Resource* pIntermediate, UINT64 IntermediateOffset,
+                                 _In_range_(0, D3D12_REQ_SUBRESOURCES) UINT FirstSubresource,
+                                 _In_range_(0, D3D12_REQ_SUBRESOURCES - FirstSubresource) UINT NumSubresources,
+                                 _In_reads_(NumSubresources) D3D12_SUBRESOURCE_DATA* pSrcData) {
+  UINT64 RequiredSize = 0;
+  UINT64 MemToAlloc =
+      static_cast<UINT64>(sizeof(D3D12_PLACED_SUBRESOURCE_FOOTPRINT) + sizeof(UINT) + sizeof(UINT64)) * NumSubresources;
+  if (MemToAlloc > SIZE_MAX) {
+    return 0;
+  }
+  void* pMem = HeapAlloc(GetProcessHeap(), 0, static_cast<SIZE_T>(MemToAlloc));
+  if (pMem == NULL) {
+    return 0;
+  }
+  D3D12_PLACED_SUBRESOURCE_FOOTPRINT* pLayouts = reinterpret_cast<D3D12_PLACED_SUBRESOURCE_FOOTPRINT*>(pMem);
+  UINT64* pRowSizesInBytes = reinterpret_cast<UINT64*>(pLayouts + NumSubresources);
+  UINT* pNumRows = reinterpret_cast<UINT*>(pRowSizesInBytes + NumSubresources);
+
+  D3D12_RESOURCE_DESC Desc = pDestinationResource->GetDesc();
+  ID3D12Device* pDevice;
+  pDestinationResource->GetDevice(__uuidof(*pDevice), reinterpret_cast<void**>(&pDevice));
+  pDevice->GetCopyableFootprints(&Desc, FirstSubresource, NumSubresources, IntermediateOffset, pLayouts, pNumRows,
+                                 pRowSizesInBytes, &RequiredSize);
+  pDevice->Release();
+
+  UINT64 Result = UpdateSubresources(pCmdList, pDestinationResource, pIntermediate, FirstSubresource, NumSubresources,
+                                     RequiredSize, pLayouts, pNumRows, pRowSizesInBytes, pSrcData);
+  HeapFree(GetProcessHeap(), 0, pMem);
+  return Result;
+}
+
+//------------------------------------------------------------------------------------------------
+// Stack-allocating UpdateSubresources implementation
+template <UINT MaxSubresources>
+inline UINT64 UpdateSubresources(_In_ ID3D12GraphicsCommandList* pCmdList, _In_ ID3D12Resource* pDestinationResource,
+                                 _In_ ID3D12Resource* pIntermediate, UINT64 IntermediateOffset,
+                                 _In_range_(0, MaxSubresources) UINT FirstSubresource,
+                                 _In_range_(1, MaxSubresources - FirstSubresource) UINT NumSubresources,
+                                 _In_reads_(NumSubresources) D3D12_SUBRESOURCE_DATA* pSrcData) {
+  UINT64 RequiredSize = 0;
+  D3D12_PLACED_SUBRESOURCE_FOOTPRINT Layouts[MaxSubresources];
+  UINT NumRows[MaxSubresources];
+  UINT64 RowSizesInBytes[MaxSubresources];
+
+  D3D12_RESOURCE_DESC Desc = pDestinationResource->GetDesc();
+  ID3D12Device* pDevice;
+  pDestinationResource->GetDevice(__uuidof(*pDevice), reinterpret_cast<void**>(&pDevice));
+  pDevice->GetCopyableFootprints(&Desc, FirstSubresource, NumSubresources, IntermediateOffset, Layouts, NumRows,
+                                 RowSizesInBytes, &RequiredSize);
+  pDevice->Release();
+
+  return UpdateSubresources(pCmdList, pDestinationResource, pIntermediate, FirstSubresource, NumSubresources,
+                            RequiredSize, Layouts, NumRows, RowSizesInBytes, pSrcData);
+}
+
+//------------------------------------------------------------------------------------------------
+inline bool D3D12IsLayoutOpaque(D3D12_TEXTURE_LAYOUT Layout) {
+  return Layout == D3D12_TEXTURE_LAYOUT_UNKNOWN || Layout == D3D12_TEXTURE_LAYOUT_64KB_UNDEFINED_SWIZZLE;
+}
+
+//------------------------------------------------------------------------------------------------
+inline ID3D12CommandList* const* CommandListCast(ID3D12GraphicsCommandList* const* pp) {
+  // This cast is useful for passing strongly typed command list pointers into
+  // ExecuteCommandLists.
+  // This cast is valid as long as the const-ness is respected. D3D12 APIs do
+  // respect the const-ness of their arguments.
+  return reinterpret_cast<ID3D12CommandList* const*>(pp);
+}
+
+//------------------------------------------------------------------------------------------------
+// D3D12 exports a new method for serializing root signatures in the Windows 10 Anniversary Update.
+// To help enable root signature 1.1 features when they are available and not require maintaining
+// two code paths for building root signatures, this helper method reconstructs a 1.0 signature when
+// 1.1 is not supported.
+inline HRESULT D3DX12SerializeVersionedRootSignature(_In_ const D3D12_VERSIONED_ROOT_SIGNATURE_DESC* pRootSignatureDesc,
+                                                     D3D_ROOT_SIGNATURE_VERSION MaxVersion, _Outptr_ ID3DBlob** ppBlob,
+                                                     _Always_(_Outptr_opt_result_maybenull_) ID3DBlob** ppErrorBlob) {
+  if (ppErrorBlob != NULL) {
+    *ppErrorBlob = NULL;
+  }
+
+  switch (MaxVersion) {
+    case D3D_ROOT_SIGNATURE_VERSION_1_0:
+      switch (pRootSignatureDesc->Version) {
+        case D3D_ROOT_SIGNATURE_VERSION_1_0:
+          return D3D12SerializeRootSignature(&pRootSignatureDesc->Desc_1_0, D3D_ROOT_SIGNATURE_VERSION_1, ppBlob,
+                                             ppErrorBlob);
+
+        case D3D_ROOT_SIGNATURE_VERSION_1_1: {
+          HRESULT hr = S_OK;
+          const D3D12_ROOT_SIGNATURE_DESC1& desc_1_1 = pRootSignatureDesc->Desc_1_1;
+
+          const SIZE_T ParametersSize = sizeof(D3D12_ROOT_PARAMETER) * desc_1_1.NumParameters;
+          void* pParameters = (ParametersSize > 0) ? HeapAlloc(GetProcessHeap(), 0, ParametersSize) : NULL;
+          if (ParametersSize > 0 && pParameters == NULL) {
+            hr = E_OUTOFMEMORY;
+          }
+          D3D12_ROOT_PARAMETER* pParameters_1_0 = reinterpret_cast<D3D12_ROOT_PARAMETER*>(pParameters);
+
+          if (SUCCEEDED(hr)) {
+            for (UINT n = 0; n < desc_1_1.NumParameters; n++) {
+              __analysis_assume(ParametersSize == sizeof(D3D12_ROOT_PARAMETER) * desc_1_1.NumParameters);
+              pParameters_1_0[n].ParameterType = desc_1_1.pParameters[n].ParameterType;
+              pParameters_1_0[n].ShaderVisibility = desc_1_1.pParameters[n].ShaderVisibility;
+
+              switch (desc_1_1.pParameters[n].ParameterType) {
+                case D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS:
+                  pParameters_1_0[n].Constants.Num32BitValues = desc_1_1.pParameters[n].Constants.Num32BitValues;
+                  pParameters_1_0[n].Constants.RegisterSpace = desc_1_1.pParameters[n].Constants.RegisterSpace;
+                  pParameters_1_0[n].Constants.ShaderRegister = desc_1_1.pParameters[n].Constants.ShaderRegister;
+                  break;
+
+                case D3D12_ROOT_PARAMETER_TYPE_CBV:
+                case D3D12_ROOT_PARAMETER_TYPE_SRV:
+                case D3D12_ROOT_PARAMETER_TYPE_UAV:
+                  pParameters_1_0[n].Descriptor.RegisterSpace = desc_1_1.pParameters[n].Descriptor.RegisterSpace;
+                  pParameters_1_0[n].Descriptor.ShaderRegister = desc_1_1.pParameters[n].Descriptor.ShaderRegister;
+                  break;
+
+                case D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE:
+                  const D3D12_ROOT_DESCRIPTOR_TABLE1& table_1_1 = desc_1_1.pParameters[n].DescriptorTable;
+
+                  const SIZE_T DescriptorRangesSize = sizeof(D3D12_DESCRIPTOR_RANGE) * table_1_1.NumDescriptorRanges;
+                  void* pDescriptorRanges = (DescriptorRangesSize > 0 && SUCCEEDED(hr))
+                                                ? HeapAlloc(GetProcessHeap(), 0, DescriptorRangesSize)
+                                                : NULL;
+                  if (DescriptorRangesSize > 0 && pDescriptorRanges == NULL) {
+                    hr = E_OUTOFMEMORY;
+                  }
+                  D3D12_DESCRIPTOR_RANGE* pDescriptorRanges_1_0 =
+                      reinterpret_cast<D3D12_DESCRIPTOR_RANGE*>(pDescriptorRanges);
+
+                  if (SUCCEEDED(hr)) {
+                    for (UINT x = 0; x < table_1_1.NumDescriptorRanges; x++) {
+                      __analysis_assume(DescriptorRangesSize ==
+                                        sizeof(D3D12_DESCRIPTOR_RANGE) * table_1_1.NumDescriptorRanges);
+                      pDescriptorRanges_1_0[x].BaseShaderRegister = table_1_1.pDescriptorRanges[x].BaseShaderRegister;
+                      pDescriptorRanges_1_0[x].NumDescriptors = table_1_1.pDescriptorRanges[x].NumDescriptors;
+                      pDescriptorRanges_1_0[x].OffsetInDescriptorsFromTableStart =
+                          table_1_1.pDescriptorRanges[x].OffsetInDescriptorsFromTableStart;
+                      pDescriptorRanges_1_0[x].RangeType = table_1_1.pDescriptorRanges[x].RangeType;
+                      pDescriptorRanges_1_0[x].RegisterSpace = table_1_1.pDescriptorRanges[x].RegisterSpace;
+                    }
+                  }
+
+                  D3D12_ROOT_DESCRIPTOR_TABLE& table_1_0 = pParameters_1_0[n].DescriptorTable;
+                  table_1_0.NumDescriptorRanges = table_1_1.NumDescriptorRanges;
+                  table_1_0.pDescriptorRanges = pDescriptorRanges_1_0;
+              }
+            }
+          }
+
+          if (SUCCEEDED(hr)) {
+            CD3DX12_ROOT_SIGNATURE_DESC desc_1_0(desc_1_1.NumParameters, pParameters_1_0, desc_1_1.NumStaticSamplers,
+                                                 desc_1_1.pStaticSamplers, desc_1_1.Flags);
+            hr = D3D12SerializeRootSignature(&desc_1_0, D3D_ROOT_SIGNATURE_VERSION_1, ppBlob, ppErrorBlob);
+          }
+
+          if (pParameters) {
+            for (UINT n = 0; n < desc_1_1.NumParameters; n++) {
+              if (desc_1_1.pParameters[n].ParameterType == D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE) {
+                HeapFree(GetProcessHeap(), 0,
+                         reinterpret_cast<void*>(const_cast<D3D12_DESCRIPTOR_RANGE*>(
+                             pParameters_1_0[n].DescriptorTable.pDescriptorRanges)));
+              }
+            }
+            HeapFree(GetProcessHeap(), 0, pParameters);
+          }
+          return hr;
+        }
+      }
+      break;
+
+    case D3D_ROOT_SIGNATURE_VERSION_1_1:
+      return D3D12SerializeVersionedRootSignature(pRootSignatureDesc, ppBlob, ppErrorBlob);
+  }
+
+  return E_INVALIDARG;
+}
+
+#endif  // defined( __cplusplus )
+
+#endif  //__D3DX12_H__

diff --git a/testing/common/frame_outs.cpp b/testing/common/frame_outs.cpp
new file mode 100644
index 0000000..ddbae16
--- /dev/null
+++ b/testing/common/frame_outs.cpp

@@ -0,0 +1,179 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "frame_outs.h"
+#include <assert.h>
+#include "d3dx12.h"
+
+#if defined(_DEBUG)
+inline void SetName(ID3D12Object* pObject, LPCWSTR name) { pObject->SetName(name); }
+inline void SetNameIndexed(ID3D12Object* pObject, LPCWSTR name, UINT index) {
+  WCHAR fullName[50];
+  if (swprintf_s(fullName, L"%s[%u]", name, index) > 0) {
+    pObject->SetName(fullName);
+  }
+}
+
+#define NAME_D3D12_OBJECT(x) SetName(x.Get(), L#x)
+#define NAME_D3D12_OBJECT_INDEXED(x, n) SetNameIndexed(x[n].Get(), L#x, n)
+#else
+#define NAME_D3D12_OBJECT(x)
+#define NAME_D3D12_OBJECT_INDEXED(x, n)
+#endif
+
+OutBufferWrapper::~OutBufferWrapper() {
+  if (storage_ && buffer_) {
+    XB_LOGD << "Release OutBuffer " << buffer_;
+    storage_->Push(buffer_);
+  }
+}
+
+void OutBufferWrapper::ReleaseRented() {
+  if (storage_ && buffer_) {
+    storage_->ReleaseRentedBuffer(buffer_);
+  }
+}
+
+OutBufferStorage::OutBufferStorage(int queue_size, ID3D12Device* dx12_device)
+    : storage_(queue_size), queue_free_(queue_size), queue_size_(queue_size), dx12_device_(dx12_device) {
+  for (int i = 0; i < queue_size_; i++) {
+    OutBuffer& item = storage_[i];
+    item.shown = 0;
+    queue_free_.Push(&item);
+  }
+}
+
+void OutBufferStorage::Push(OutBuffer* frame) { queue_free_.Push(frame); }
+
+std::shared_ptr<OutBufferWrapper> OutBufferStorage::GetFreeBuffer(uint32_t width, uint32_t height,
+                                                                  frame_buffer_type fb_type, size_t buffer_size,
+                                                                  uint32_t flags) {
+  HRESULT hr;
+  int has_errors = 0;
+  queue_free_.DoForAll(1, [&](OutBuffer* buffer) {
+    if (flags & CBUFTEXTURE) {
+      if (buffer->renderWidth != width || buffer->renderHeight != height || buffer->fb_type != fb_type) {
+        for (int i = 0; i < 3; i++) {
+          const int subsampling = i > 0;
+          const int _width = (fb_type == fbt10x3) ? (((width + subsampling) >> subsampling) + 2) / 3
+                                                  : ((width + subsampling) >> subsampling);
+          const int _height = (height + subsampling) >> subsampling;
+
+          // Create interop resources
+          D3D12_RESOURCE_DESC textureDesc = {};
+          textureDesc.MipLevels = 1;
+          textureDesc.Format = (fb_type == fbt10bit)
+                                   ? DXGI_FORMAT_R16_UNORM
+                                   : ((fb_type == fbt8bit) ? DXGI_FORMAT_R8_UNORM : DXGI_FORMAT_R10G10B10A2_UNORM);
+          textureDesc.Width = _width;
+          textureDesc.Height = _height;
+          textureDesc.Flags = D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET;
+          textureDesc.DepthOrArraySize = 1;
+          textureDesc.SampleDesc.Count = 1;
+          textureDesc.SampleDesc.Quality = 0;
+          textureDesc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
+          textureDesc.Layout = D3D12_TEXTURE_LAYOUT_UNKNOWN;
+          hr = dx12_device_->CreateCommittedResource(
+              &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), D3D12_HEAP_FLAG_SHARED, &textureDesc,
+              D3D12_RESOURCE_STATE_COPY_DEST, 0,
+              IID_PPV_ARGS(&buffer->d3d12textures[i]));
+          if (FAILED(hr)) {
+            has_errors = 1;
+            return 0;
+          }
+          NAME_D3D12_OBJECT_INDEXED(buffer->d3d12textures, i);
+        }
+        buffer->renderWidth = width;
+        buffer->renderHeight = height;
+        buffer->fb_type = fb_type;
+      }
+    }
+    if (flags & CBUFREADBACK) {
+      if (buffer->buffer_size < buffer_size) {
+        // Create readback buffer (if any)
+        const D3D12_HEAP_TYPE heapType = D3D12_HEAP_TYPE_READBACK;
+        const D3D12_RESOURCE_STATES state = D3D12_RESOURCE_STATE_COPY_DEST;
+        D3D12_HEAP_PROPERTIES heapProp;
+        heapProp.Type = heapType;
+        heapProp.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;  //
+        heapProp.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
+        heapProp.CreationNodeMask = 0;
+        heapProp.VisibleNodeMask = 0;
+
+        const D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE;
+        hr = dx12_device_->CreateCommittedResource(&heapProp, D3D12_HEAP_FLAG_NONE,
+                                                   &CD3DX12_RESOURCE_DESC::Buffer(buffer_size, flags), state, NULL,
+                                                   IID_PPV_ARGS(&buffer->d3d12buffer));
+
+        if (SUCCEEDED(hr)) {
+          hr = buffer->d3d12buffer->Map(0, NULL, &buffer->pMem);
+          if (FAILED(hr)) {
+            has_errors = 1;
+            return 0;
+          }
+          buffer->d3d12buffer->Unmap(0, 0);
+        }
+        buffer->buffer_size = buffer_size;
+      }
+    }
+    return 1;
+  });
+  if (has_errors) {
+    XB_LOGE << "Failed to allocate output buffers.";
+    queue_free_.DoForAll(1, [&](OutBuffer* buffer) {
+      buffer->Reset();
+      return 1;
+    });
+    return 0;
+  }
+  OutBuffer* buffer = queue_free_.Pull();
+  if (!(flags & CBUFREADBACK)) {
+    buffer->d3d12buffer.Reset();
+    buffer->pMem = 0;
+  }
+  std::unique_lock<std::mutex> lock(cs_);
+  OutBufferWrapper* bf = new (std::nothrow) OutBufferWrapper(buffer, this);
+  if (!bf) return 0;
+  std::shared_ptr<OutBufferWrapper> retbuf = std::shared_ptr<OutBufferWrapper>(bf);
+  rented_.push_back(retbuf);
+  return retbuf;
+}
+
+std::shared_ptr<OutBufferWrapper> OutBufferStorage::ReleaseRentedBuffer(void* ptr) {
+  std::unique_lock<std::mutex> lock(cs_);
+  for (auto iter = rented_.begin(); iter != rented_.end(); iter++) {
+    if ((*iter)->Buffer() == ptr) {
+      XB_LOGD << "ReleaseRentedBuffer " << ptr;
+      std::shared_ptr<OutBufferWrapper> buff = *iter;
+      rented_.erase(iter);
+      return buff;
+    }
+  }
+  assert(0 && "buffer not found");
+  return 0;
+}
+
+std::shared_ptr<OutBufferWrapper> OutBufferStorage::FindRentedBuffer(void* ptr) {
+  std::unique_lock<std::mutex> lock(cs_);
+  for (auto iter = rented_.begin(); iter != rented_.end(); iter++) {
+    if ((*iter)->Buffer() == ptr) {
+      std::shared_ptr<OutBufferWrapper> buff = *iter;
+      return buff;
+    }
+  }
+  assert(0 && "buffer not found");
+  return 0;
+}

diff --git a/testing/common/frame_outs.h b/testing/common/frame_outs.h
new file mode 100644
index 0000000..f439974
--- /dev/null
+++ b/testing/common/frame_outs.h

@@ -0,0 +1,123 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+#include "frame_queue.h"
+#include "log.h"
+#include <vector>
+#include <list>
+#include <mutex>
+#include <memory>
+#include "aom/aom_decoder.h"
+
+#include <wrl.h>
+#include <wrl/client.h>
+#include <d3d12.h>
+#include <assert.h>
+
+using namespace Microsoft::WRL;
+
+#define MEM_BLOCK_SIZE 65536
+#define TEXTURE_MEM_ALIGNMENT 256
+#define TEXTURE_MEM_ALIGNMENT_1 (TEXTURE_MEM_ALIGNMENT - 1)
+#define TEXTURE_MEM_ALIGNMENT_SHIFT 8
+#define MEM_ALIGN_64K (1 << 16)
+
+typedef struct OutBuffer {
+  void* pMem;
+  size_t memSize;
+  uint32_t frameWidth;
+  uint32_t frameHeight;
+
+  uint32_t renderWidth;
+  uint32_t renderHeight;
+
+  uint8_t* planes[3];
+  uint32_t strides[3];
+  frame_buffer_type fb_type;
+  ComPtr<ID3D12Resource> d3d12buffer;
+  ComPtr<ID3D12Resource> d3d12textures[3];
+  size_t buffer_size;
+
+  uint64_t pts;
+  uint32_t frame_no;
+  uint32_t frame_no_offset;
+  uint32_t shown;
+  OutBuffer() { memset(this, 0, sizeof(OutBuffer)); }
+  ~OutBuffer() {}
+  void Reset() {
+    for (int i = 0; i < 3; i++) {
+      d3d12textures[i].Reset();
+      d3d12buffer.Reset();
+      planes[i] = 0;
+      strides[i] = 0;
+    }
+    pMem = 0;
+    memSize = 0;
+    renderWidth = 0;
+    renderHeight = 0;
+    frameWidth = 0;
+    frameHeight = 0;
+    fb_type = fbtNone;
+  }
+} OutBuffer;
+
+class OutBufferStorage;
+class OutBufferWrapper {
+ public:
+  OutBufferWrapper(OutBuffer* buffer, OutBufferStorage* storage) : buffer_(buffer), storage_(storage) {}
+  ~OutBufferWrapper();
+  OutBuffer* Buffer() { return buffer_; }
+  uint32_t Shown() { return buffer_->shown; }
+  void SetShown(uint32_t val) { buffer_->shown = val; }
+  void ReleaseRented();
+
+ private:
+  OutBuffer* buffer_;
+  OutBufferStorage* storage_;
+};
+
+typedef enum { CBUFREADBACK = 0x1, CBUFTEXTURE = 0x2, CBUFALL = 0x3 } cbufFlags;
+
+class OutBufferStorage {
+ public:
+  OutBufferStorage(int queue_size, ID3D12Device* dx12_device);
+  ~OutBufferStorage() {
+    if (queue_free_.Size() != queue_size_) {
+      XB_LOGE << "OutBufferStorage still has unreleashed buffers!! " << (queue_size_ - queue_free_.Size());
+      rented_.clear();
+      XB_LOGE << "Release it anyway";
+    }
+  }
+
+  size_t GetSize() { return queue_free_.Size(); }
+  std::shared_ptr<OutBufferWrapper> GetFreeBuffer(uint32_t width, uint32_t height, frame_buffer_type fb_type,
+                                                  size_t buffer_size, uint32_t flags);
+  std::shared_ptr<OutBufferWrapper> ReleaseRentedBuffer(void* ptr);
+  std::shared_ptr<OutBufferWrapper> FindRentedBuffer(void* ptr);
+
+ private:
+  void Push(OutBuffer* frame);
+  friend class OutBufferWrapper;
+
+ private:
+  std::vector<OutBuffer> storage_;
+  std::list<std::shared_ptr<OutBufferWrapper> > rented_;
+  myqueue<OutBuffer*> queue_free_;
+  int queue_size_;
+  std::mutex cs_;
+  ComPtr<ID3D12Device> dx12_device_;
+};

diff --git a/testing/common/frame_queue.h b/testing/common/frame_queue.h
new file mode 100644
index 0000000..fa731fa
--- /dev/null
+++ b/testing/common/frame_queue.h

@@ -0,0 +1,96 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+#include <thread>
+#include <mutex>
+#include <condition_variable>
+#include <queue>
+
+template <class T>
+class myqueue {
+ public:
+  myqueue(unsigned int cap = 5) : cap_(cap) {}
+  void Push(T value) {
+    std::unique_lock<std::mutex> lock(cs_);
+    cv_.wait(lock, [this] { return this->queue_.size() < this->cap_; });
+    queue_.push(value);
+    cv_.notify_one();
+  }
+  void PushWait(T value) {
+    Push(value);
+    std::unique_lock<std::mutex> lock(cs_get_);
+    wait_ = true;
+    cv_get_.wait(lock, [this] { return !this->wait_; });
+  }
+  T Pull() {
+    std::unique_lock<std::mutex> lock(cs_);
+    cv_.wait(lock, [this] { return !this->queue_.empty(); });
+    T r = queue_.front();
+    queue_.pop();
+    cv_.notify_one();
+    return r;
+  }
+  T PullLimited(uint32_t max_size) {
+    std::unique_lock<std::mutex> lock(cs_);
+    cv_.wait(lock, [this] { return this->cap_ - this->queue_.size() >= max_size; });
+    T r = queue_.front();
+    queue_.pop();
+    cv_.notify_one();
+    return r;
+  }
+  T& FrontUnsafe() {
+    std::unique_lock<std::mutex> lock(cs_);
+    cv_.wait(lock, [this] { return !this->queue_.empty(); });
+    T& r = queue_.front();
+    return r;
+  }
+  void EndWait() {
+    std::unique_lock<std::mutex> lock(cs_get_);
+    wait_ = false;
+    cv_get_.notify_one();
+  }
+  bool IsEmpty() {
+    std::lock_guard<std::mutex> lock(cs_);
+    return queue_.empty();
+  }
+  bool notFull() {
+    std::lock_guard<std::mutex> lock(cs_);
+    return queue_.size() < cap_;
+  }
+  size_t Size() {
+    std::lock_guard<std::mutex> lock(cs_);
+    return queue_.size();
+  }
+  template <class Predicate>
+  void DoForAll(int waitNotEmpty, Predicate func) {
+    std::unique_lock<std::mutex> lock(cs_);
+    if (waitNotEmpty) cv_.wait(lock, [this] { return !this->queue_.empty(); });
+    for (auto iter = queue_._Get_container().begin(); iter != queue_._Get_container().end(); iter++) {
+      T t = *iter;
+      if (!func(t)) break;
+    }
+  }
+
+ private:
+  std::queue<T> queue_;
+  std::mutex cs_;
+  std::condition_variable cv_;
+  std::mutex cs_get_;
+  std::condition_variable cv_get_;
+  unsigned int cap_;
+  bool wait_ = false;
+};

diff --git a/testing/common/ivfdec.c b/testing/common/ivfdec.c
new file mode 100644
index 0000000..5ccb7ca
--- /dev/null
+++ b/testing/common/ivfdec.c

@@ -0,0 +1,115 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "ivfdec.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "aom_ports/mem_ops.h"
+#include "aom_ports/sanitizer.h"
+
+static const char *IVF_SIGNATURE = "DKIF";
+
+static void fix_framerate(int *num, int *den) {
+  if (*den <= 0 || *den >= 1000000000 || *num <= 0 || *num >= 1000) {
+    // framerate seems to be invalid, just default to 30fps.
+    *num = 30;
+    *den = 1;
+  }
+}
+
+int file_is_ivf(struct AvxInputContext *input_ctx) {
+  char raw_hdr[32];
+  int is_ivf = 0;
+
+  if (fread(raw_hdr, 1, 32, input_ctx->file) == 32) {
+    if (memcmp(IVF_SIGNATURE, raw_hdr, 4) == 0) {
+      is_ivf = 1;
+
+      if (mem_get_le16(raw_hdr + 4) != 0) {
+        fprintf(stderr,
+                "Error: Unrecognized IVF version! This file may not"
+                " decode properly.");
+      }
+
+      input_ctx->fourcc = mem_get_le32(raw_hdr + 8);
+      input_ctx->width = mem_get_le16(raw_hdr + 12);
+      input_ctx->height = mem_get_le16(raw_hdr + 14);
+      input_ctx->framerate.numerator = mem_get_le32(raw_hdr + 16);
+      input_ctx->framerate.denominator = mem_get_le32(raw_hdr + 20);
+      fix_framerate(&input_ctx->framerate.numerator, &input_ctx->framerate.denominator);
+    }
+  }
+
+  if (!is_ivf) {
+    rewind(input_ctx->file);
+    input_ctx->detect.buf_read = 0;
+  } else {
+    input_ctx->detect.position = 4;
+  }
+  return is_ivf;
+}
+
+void ivf_free_buffer(uint8_t **buffer) {
+  if (*buffer) {
+    free(*buffer);
+    *buffer = 0;
+  }
+}
+
+int ivf_read_frame(FILE *infile, uint8_t **buffer, size_t *bytes_read, size_t *buffer_size, aom_codec_pts_t *pts) {
+  char raw_header[IVF_FRAME_HDR_SZ] = {0};
+  size_t frame_size = 0;
+
+  if (fread(raw_header, IVF_FRAME_HDR_SZ, 1, infile) != 1) {
+    if (!feof(infile)) warn("Failed to read frame size");
+  } else {
+    frame_size = mem_get_le32(raw_header);
+
+    if (frame_size > 256 * 1024 * 1024) {
+      warn("Read invalid frame size (%u)", (unsigned int)frame_size);
+      frame_size = 0;
+    }
+
+    if (frame_size > *buffer_size) {
+      uint8_t *new_buffer = (uint8_t *)realloc(*buffer, 2 * frame_size);
+
+      if (new_buffer) {
+        *buffer = new_buffer;
+        *buffer_size = 2 * frame_size;
+      } else {
+        warn("Failed to allocate compressed data buffer");
+        frame_size = 0;
+      }
+    }
+
+    if (pts) {
+      *pts = mem_get_le32(&raw_header[4]);
+      *pts += ((aom_codec_pts_t)mem_get_le32(&raw_header[8]) << 32);
+    }
+  }
+
+  if (!feof(infile)) {
+    ASAN_UNPOISON_MEMORY_REGION(*buffer, *buffer_size);
+    if (fread(*buffer, 1, frame_size, infile) != frame_size) {
+      warn("Failed to read full frame");
+      return 1;
+    }
+
+    ASAN_POISON_MEMORY_REGION(*buffer + frame_size, *buffer_size - frame_size);
+    *bytes_read = frame_size;
+    return 0;
+  }
+
+  return 1;
+}

diff --git a/testing/common/ivfdec.h b/testing/common/ivfdec.h
new file mode 100644
index 0000000..8d03f9c
--- /dev/null
+++ b/testing/common/ivfdec.h

@@ -0,0 +1,30 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_COMMON_IVFDEC_H_
+#define AOM_COMMON_IVFDEC_H_
+
+#include "tools_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+int file_is_ivf(struct AvxInputContext *input);
+
+typedef int64_t aom_codec_pts_t;
+int ivf_read_frame(FILE *infile, uint8_t **buffer, size_t *bytes_read, size_t *buffer_size, aom_codec_pts_t *pts);
+void ivf_free_buffer(uint8_t **buffer);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif  // AOM_COMMON_IVFDEC_H_

diff --git a/testing/common/log.cpp b/testing/common/log.cpp
new file mode 100644
index 0000000..1376c99
--- /dev/null
+++ b/testing/common/log.cpp

@@ -0,0 +1,70 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <wrl.h>
+#include "log.h"
+
+const char* log_priority_names[] = {
+    "DEBUG", "INFO", "WARNING", "ERROR", "FATAL",
+};
+logging::LogPriority g_priority = logging::LOG_ERROR;
+std::string g_log_file;
+
+namespace logging {
+void set_log_priority(logging::LogPriority priority) { g_priority = priority; }
+void set_log_file(const char* file_path) { g_log_file = file_path; }
+
+LogMessage::LogMessage(const char* file, int line, LogPriority priority)
+    : priority_(priority), file_(file), line_(line) {
+  Init(file_, line_);
+}
+LogMessage::~LogMessage() {
+  if (priority_ >= g_priority) {
+    stream_ << std::endl;
+    std::string str_newline(stream_.str());
+    std::unique_lock<std::mutex> lock(cs_);
+    RawLog(str_newline);
+  }
+}
+void LogMessage::Init(const char* file, int line) {
+  std::string filename(file);
+  size_t last_slash_pos = filename.find_last_of("\\/");
+  if (last_slash_pos != std::string::npos) {
+    filename.erase(0, last_slash_pos + 1);
+  }
+  stream_ << '[';
+  stream_ << std::this_thread::get_id() << ':';
+  stream_ << ((double)clock() / (double)CLOCKS_PER_SEC) << ':';
+  stream_ << log_priority_names[priority_];
+  stream_ << ":" << filename << "(" << line << ")] ";
+  message_start_ = stream_.tellp();
+}
+void LogMessage::RawLog(std::string& message) {
+#if (WINAPI_FAMILY == WINAPI_FAMILY_APP)
+  OutputDebugStringA((message + "\n").c_str());
+#else
+  printf("%s\n", message.c_str());
+#endif
+  if (!g_log_file.empty()) {
+    FILE* f = fopen(g_log_file.c_str(), "a+");
+    if (f) {
+      fprintf(f, "%s", message.c_str());
+      fclose(f);
+    }
+  }
+}
+
+}  // namespace logging
\ No newline at end of file

diff --git a/testing/common/log.h b/testing/common/log.h
new file mode 100644
index 0000000..80d3f6f
--- /dev/null
+++ b/testing/common/log.h

@@ -0,0 +1,75 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+
+#include <iostream>
+#include <sstream>
+#include <thread>
+#include <chrono>
+#include <mutex>
+#include <string>
+
+//#define XB_LOG_OFF
+
+namespace logging {
+typedef enum LogPriority { LOG_DEBUG = 0, LOG_INFO, LOG_WARNING, LOG_ERROR, LOG_FATAL } LogPriority;
+
+class LogMessage {
+ public:
+  LogMessage(const char* file, int line, LogPriority priority);
+  ~LogMessage();
+  std::ostream& stream() { return stream_; }
+
+ private:
+  void Init(const char* file, int line);
+  void RawLog(std::string& message);
+  std::ostringstream stream_;
+  LogPriority priority_;
+  size_t message_start_;
+  const char* file_;
+  const int line_;
+  std::mutex cs_;
+};
+class DummyLogMessage {
+ public:
+  DummyLogMessage& operator<<(const char* data) { return *this; }
+  DummyLogMessage& operator<<(int data) { return *this; }
+  DummyLogMessage& operator<<(size_t data) { return *this; }
+  DummyLogMessage& operator<<(void* data) { return *this; }
+};
+void set_log_priority(logging::LogPriority priority);
+void set_log_file(const char* file_path);
+}  // namespace logging
+#define LOG_MESSAGE_DEBUG ::logging::LogMessage(__FILE__, __LINE__, ::logging::LOG_DEBUG)
+#define LOG_MESSAGE_INFO ::logging::LogMessage(__FILE__, __LINE__, ::logging::LOG_INFO)
+#define LOG_MESSAGE_WARNING ::logging::LogMessage(__FILE__, __LINE__, ::logging::LOG_WARNING)
+#define LOG_MESSAGE_ERROR_ ::logging::LogMessage(__FILE__, __LINE__, ::logging::LOG_ERROR)
+#define LOG_MESSAGE_FATAL ::logging::LogMessage(__FILE__, __LINE__, ::logging::LOG_FATAL)
+
+#define LOG_STREAM(severity) LOG_MESSAGE_##severity.stream()
+#define DUMMY_LOG ::logging::DummyLogMessage()
+#ifndef XB_LOG_OFF
+#define XB_LOG(severity) LOG_STREAM(severity)
+#else
+#define XB_LOG(severity) DUMMY_LOG
+#endif
+
+#define XB_LOGD XB_LOG(DEBUG)
+#define XB_LOGI XB_LOG(INFO)
+#define XB_LOGW XB_LOG(WARNING)
+#define XB_LOGE XB_LOG(ERROR_)
+#define XB_LOGFATAL XB_LOG(FATAL)
\ No newline at end of file

diff --git a/testing/common/md5_utils.c b/testing/common/md5_utils.c
new file mode 100644
index 0000000..e9cbee8
--- /dev/null
+++ b/testing/common/md5_utils.c

@@ -0,0 +1,244 @@
+/*
+ * This code implements the MD5 message-digest algorithm.
+ * The algorithm is due to Ron Rivest.  This code was
+ * written by Colin Plumb in 1993, no copyright is claimed.
+ * This code is in the public domain; do with it what you wish.
+ *
+ * Equivalent code is available from RSA Data Security, Inc.
+ * This code has been tested against that, and is equivalent,
+ * except that you don't need to include two pages of legalese
+ * with every copy.
+ *
+ * To compute the message digest of a chunk of bytes, declare an
+ * MD5Context structure, pass it to MD5Init, call MD5Update as
+ * needed on buffers full of bytes, and then call MD5Final, which
+ * will fill a supplied 16-byte array with the digest.
+ *
+ * Changed so as no longer to depend on Colin Plumb's `usual.h' header
+ * definitions
+ *  - Ian Jackson <ian@chiark.greenend.org.uk>.
+ * Still in the public domain.
+ */
+
+#include <string.h> /* for memcpy() */
+
+#include "common/md5_utils.h"
+
+static void byteSwap(UWORD32 *buf, unsigned words) {
+  md5byte *p;
+
+  /* Only swap bytes for big endian machines */
+  int i = 1;
+
+  if (*(char *)&i == 1) return;
+
+  p = (md5byte *)buf;
+
+  do {
+    *buf++ = (UWORD32)((unsigned)p[3] << 8 | p[2]) << 16 | ((unsigned)p[1] << 8 | p[0]);
+    p += 4;
+  } while (--words);
+}
+
+/*
+ * Start MD5 accumulation.  Set bit count to 0 and buffer to mysterious
+ * initialization constants.
+ */
+void MD5Init(struct MD5Context *ctx) {
+  ctx->buf[0] = 0x67452301;
+  ctx->buf[1] = 0xefcdab89;
+  ctx->buf[2] = 0x98badcfe;
+  ctx->buf[3] = 0x10325476;
+
+  ctx->bytes[0] = 0;
+  ctx->bytes[1] = 0;
+}
+
+/*
+ * Update context to reflect the concatenation of another buffer full
+ * of bytes.
+ */
+void MD5Update(struct MD5Context *ctx, md5byte const *buf, unsigned len) {
+  UWORD32 t;
+
+  /* Update byte count */
+
+  t = ctx->bytes[0];
+
+  if ((ctx->bytes[0] = t + len) < t) ctx->bytes[1]++; /* Carry from low to high */
+
+  t = 64 - (t & 0x3f); /* Space available in ctx->in (at least 1) */
+
+  if (t > len) {
+    memcpy((md5byte *)ctx->in + 64 - t, buf, len);
+    return;
+  }
+
+  /* First chunk is an odd size */
+  memcpy((md5byte *)ctx->in + 64 - t, buf, t);
+  byteSwap(ctx->in, 16);
+  MD5Transform(ctx->buf, ctx->in);
+  buf += t;
+  len -= t;
+
+  /* Process data in 64-byte chunks */
+  while (len >= 64) {
+    memcpy(ctx->in, buf, 64);
+    byteSwap(ctx->in, 16);
+    MD5Transform(ctx->buf, ctx->in);
+    buf += 64;
+    len -= 64;
+  }
+
+  /* Handle any remaining bytes of data. */
+  memcpy(ctx->in, buf, len);
+}
+
+/*
+ * Final wrapup - pad to 64-byte boundary with the bit pattern
+ * 1 0* (64-bit count of bits processed, MSB-first)
+ */
+void MD5Final(md5byte digest[16], struct MD5Context *ctx) {
+  int count = ctx->bytes[0] & 0x3f; /* Number of bytes in ctx->in */
+  md5byte *p = (md5byte *)ctx->in + count;
+
+  /* Set the first char of padding to 0x80.  There is always room. */
+  *p++ = 0x80;
+
+  /* Bytes of padding needed to make 56 bytes (-8..55) */
+  count = 56 - 1 - count;
+
+  if (count < 0) { /* Padding forces an extra block */
+    memset(p, 0, count + 8);
+    byteSwap(ctx->in, 16);
+    MD5Transform(ctx->buf, ctx->in);
+    p = (md5byte *)ctx->in;
+    count = 56;
+  }
+
+  memset(p, 0, count);
+  byteSwap(ctx->in, 14);
+
+  /* Append length in bits and transform */
+  ctx->in[14] = ctx->bytes[0] << 3;
+  ctx->in[15] = ctx->bytes[1] << 3 | ctx->bytes[0] >> 29;
+  MD5Transform(ctx->buf, ctx->in);
+
+  byteSwap(ctx->buf, 4);
+  memcpy(digest, ctx->buf, 16);
+  memset(ctx, 0, sizeof(*ctx)); /* In case it's sensitive */
+}
+
+#ifndef ASM_MD5
+
+/* The four core functions - F1 is optimized somewhat */
+
+/* #define F1(x, y, z) (x & y | ~x & z) */
+#define F1(x, y, z) (z ^ (x & (y ^ z)))
+#define F2(x, y, z) F1(z, x, y)
+#define F3(x, y, z) (x ^ y ^ z)
+#define F4(x, y, z) (y ^ (x | ~z))
+
+/* This is the central step in the MD5 algorithm. */
+#define MD5STEP(f, w, x, y, z, in, s) (w += f(x, y, z) + in, w = (w << s | w >> (32 - s)) + x)
+
+#if defined(__clang__) && defined(__has_attribute)
+#if __has_attribute(no_sanitize)
+#define AOM_NO_UNSIGNED_OVERFLOW_CHECK __attribute__((no_sanitize("unsigned-integer-overflow")))
+#endif
+#endif
+
+#ifndef AOM_NO_UNSIGNED_OVERFLOW_CHECK
+#define AOM_NO_UNSIGNED_OVERFLOW_CHECK
+#endif
+
+/*
+ * The core of the MD5 algorithm, this alters an existing MD5 hash to
+ * reflect the addition of 16 longwords of new data.  MD5Update blocks
+ * the data and converts bytes into longwords for this routine.
+ */
+AOM_NO_UNSIGNED_OVERFLOW_CHECK void MD5Transform(UWORD32 buf[4], UWORD32 const in[16]) {
+  register UWORD32 a, b, c, d;
+
+  a = buf[0];
+  b = buf[1];
+  c = buf[2];
+  d = buf[3];
+
+  MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7);
+  MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12);
+  MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17);
+  MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22);
+  MD5STEP(F1, a, b, c, d, in[4] + 0xf57c0faf, 7);
+  MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12);
+  MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17);
+  MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22);
+  MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7);
+  MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12);
+  MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17);
+  MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22);
+  MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7);
+  MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12);
+  MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17);
+  MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22);
+
+  MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5);
+  MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9);
+  MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14);
+  MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20);
+  MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5);
+  MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9);
+  MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14);
+  MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20);
+  MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5);
+  MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9);
+  MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14);
+  MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20);
+  MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5);
+  MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9);
+  MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14);
+  MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20);
+
+  MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4);
+  MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11);
+  MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16);
+  MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23);
+  MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4);
+  MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11);
+  MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16);
+  MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23);
+  MD5STEP(F3, a, b, c, d, in[13] + 0x289b7ec6, 4);
+  MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11);
+  MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16);
+  MD5STEP(F3, b, c, d, a, in[6] + 0x04881d05, 23);
+  MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4);
+  MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11);
+  MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16);
+  MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23);
+
+  MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6);
+  MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10);
+  MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15);
+  MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21);
+  MD5STEP(F4, a, b, c, d, in[12] + 0x655b59c3, 6);
+  MD5STEP(F4, d, a, b, c, in[3] + 0x8f0ccc92, 10);
+  MD5STEP(F4, c, d, a, b, in[10] + 0xffeff47d, 15);
+  MD5STEP(F4, b, c, d, a, in[1] + 0x85845dd1, 21);
+  MD5STEP(F4, a, b, c, d, in[8] + 0x6fa87e4f, 6);
+  MD5STEP(F4, d, a, b, c, in[15] + 0xfe2ce6e0, 10);
+  MD5STEP(F4, c, d, a, b, in[6] + 0xa3014314, 15);
+  MD5STEP(F4, b, c, d, a, in[13] + 0x4e0811a1, 21);
+  MD5STEP(F4, a, b, c, d, in[4] + 0xf7537e82, 6);
+  MD5STEP(F4, d, a, b, c, in[11] + 0xbd3af235, 10);
+  MD5STEP(F4, c, d, a, b, in[2] + 0x2ad7d2bb, 15);
+  MD5STEP(F4, b, c, d, a, in[9] + 0xeb86d391, 21);
+
+  buf[0] += a;
+  buf[1] += b;
+  buf[2] += c;
+  buf[3] += d;
+}
+
+#undef AOM_NO_UNSIGNED_OVERFLOW_CHECK
+
+#endif

diff --git a/testing/common/md5_utils.h b/testing/common/md5_utils.h
new file mode 100644
index 0000000..144fa3a
--- /dev/null
+++ b/testing/common/md5_utils.h

@@ -0,0 +1,49 @@
+/*
+ * This is the header file for the MD5 message-digest algorithm.
+ * The algorithm is due to Ron Rivest.  This code was
+ * written by Colin Plumb in 1993, no copyright is claimed.
+ * This code is in the public domain; do with it what you wish.
+ *
+ * Equivalent code is available from RSA Data Security, Inc.
+ * This code has been tested against that, and is equivalent,
+ * except that you don't need to include two pages of legalese
+ * with every copy.
+ *
+ * To compute the message digest of a chunk of bytes, declare an
+ * MD5Context structure, pass it to MD5Init, call MD5Update as
+ * needed on buffers full of bytes, and then call MD5Final, which
+ * will fill a supplied 16-byte array with the digest.
+ *
+ * Changed so as no longer to depend on Colin Plumb's `usual.h'
+ * header definitions
+ *  - Ian Jackson <ian@chiark.greenend.org.uk>.
+ * Still in the public domain.
+ */
+
+#ifndef AOM_COMMON_MD5_UTILS_H_
+#define AOM_COMMON_MD5_UTILS_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define md5byte unsigned char
+#define UWORD32 unsigned int
+
+typedef struct MD5Context MD5Context;
+struct MD5Context {
+  UWORD32 buf[4];
+  UWORD32 bytes[2];
+  UWORD32 in[16];
+};
+
+void MD5Init(struct MD5Context *context);
+void MD5Update(struct MD5Context *context, md5byte const *buf, unsigned len);
+void MD5Final(unsigned char digest[16], struct MD5Context *context);
+void MD5Transform(UWORD32 buf[4], UWORD32 const in[16]);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_COMMON_MD5_UTILS_H_

diff --git a/testing/common/mp4parser.cc b/testing/common/mp4parser.cc
new file mode 100644
index 0000000..9a51a4d
--- /dev/null
+++ b/testing/common/mp4parser.cc

@@ -0,0 +1,607 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "mp4parser.h"
+#include <assert.h>
+
+#define MOV_TRUN_DATA_OFFSET 0x01
+#define MOV_TRUN_FIRST_SAMPLE_FLAGS 0x04
+#define MOV_TRUN_SAMPLE_DURATION 0x100
+#define MOV_TRUN_SAMPLE_SIZE 0x200
+#define MOV_TRUN_SAMPLE_FLAGS 0x400
+#define MOV_TRUN_SAMPLE_CTS 0x800
+
+#define MOV_FRAG_SAMPLE_FLAG_DEGRADATION_PRIORITY_MASK 0x0000ffff
+#define MOV_FRAG_SAMPLE_FLAG_IS_NON_SYNC 0x00010000
+#define MOV_FRAG_SAMPLE_FLAG_PADDING_MASK 0x000e0000
+#define MOV_FRAG_SAMPLE_FLAG_REDUNDANCY_MASK 0x00300000
+#define MOV_FRAG_SAMPLE_FLAG_DEPENDED_MASK 0x00c00000
+#define MOV_FRAG_SAMPLE_FLAG_DEPENDS_MASK 0x03000000
+
+#define MOV_TFHD_BASE_DATA_OFFSET 0x01
+#define MOV_TFHD_STSD_ID 0x02
+#define MOV_TFHD_DEFAULT_DURATION 0x08
+#define MOV_TFHD_DEFAULT_SIZE 0x10
+#define MOV_TFHD_DEFAULT_FLAGS 0x20
+#define MOV_TFHD_DURATION_IS_EMPTY 0x010000
+#define MOV_TFHD_DEFAULT_BASE_IS_MOOF 0x020000
+
+int Mp4Parser::findAtom(const char* name, mp4atom* atom) {
+  int ret;
+  while (1) {
+    ret = readAtom(atom);
+    if (ret) return ret;
+    if (!strcmp((const char*)atom->name, name)) return 0;
+    ret = fseek(file_, atom->offset + atom->size, SEEK_SET);
+    if (ret) return ret;
+  }
+}
+
+int Mp4Parser::readAtom(const char* name, mp4atom* atom) {
+  int ret;
+  ret = readAtom(atom);
+  if (ret) return ret;
+  return strcmp((const char*)atom->name, name);
+}
+
+int Mp4Parser::parse_stsd() {
+  uint32_t val;
+  int ret = readUINT32(&val);
+  if (ret) return ret;
+  ret = readUINT32(&val);
+  if (ret || val != 1) return -1;
+  mp4atom atom;
+  ret = readAtom(&atom);
+  if (ret) return ret;
+  if (strcmp((const char*)atom.name, "av01")) return -1;
+  fseek(file_, 24, SEEK_CUR);
+  ret = readUINT16(&width_);
+  if (ret) return ret;
+  ret = readUINT16(&height_);
+  if (ret) return ret;
+  return 0;
+}
+
+int Mp4Parser::parse_stsc() {
+  int ret = findAtom("stsc", &stsc_atom_.atom);
+  if (ret) return ret;
+  ret = readUINT8(&stsc_atom_.version);
+  if (ret) return ret;
+  ret = readUINT24(&stsc_atom_.flags);
+  if (ret) return ret;
+  ret = readUINT32(&stsc_atom_.entry_count);
+  if (ret) return ret;
+  for (uint32_t i = 0; i < stsc_atom_.entry_count; i++) {
+    stscEntry stc;
+    ret = readUINT32(&stc.first_chunk);
+    if (ret) return ret;
+    ret = readUINT32(&stc.samples_per_chunk);
+    if (ret) return ret;
+    ret = readUINT32(&stc.sample_description_index);
+    if (ret) return ret;
+    stsc_atom_.table.push_back(stc);
+  }
+  return 0;
+}
+
+int Mp4Parser::parse_stts() {
+  int ret = findAtom("stts", &stts_atom_.atom);
+  if (ret) return ret;
+  ret = readUINT8(&stts_atom_.version);
+  if (ret) return ret;
+  ret = readUINT24(&stts_atom_.flags);
+  if (ret) return ret;
+  ret = readUINT32(&stts_atom_.entry_count);
+  if (ret) return ret;
+  stts_atom_.total_samples = 0;
+  for (uint32_t i = 0; i < stts_atom_.entry_count; i++) {
+    sttsEntry stts_entry;
+    ret = readUINT32(&stts_entry.sample_count);
+    if (ret) return ret;
+    ret = readUINT32(&stts_entry.sample_delta);
+    stts_atom_.table.push_back(stts_entry);
+    stts_atom_.total_samples += stts_entry.sample_count;
+  }
+  return 0;
+}
+
+int Mp4Parser::parse_stsz() {
+  int ret = findAtom("stsz", &stsz_atom_.atom);
+  if (ret) return ret;
+  ret = readUINT8(&stsz_atom_.version);
+  if (ret) return ret;
+  ret = readUINT24(&stsz_atom_.flags);
+  if (ret) return ret;
+  ret = readUINT32(&stsz_atom_.size);
+  if (ret) return ret;
+  ret = readUINT32(&stsz_atom_.entry_count);
+  if (ret) return ret;
+  for (uint32_t i = 0; i < stsz_atom_.entry_count; i++) {
+    uint32_t stsz_entry;
+    ret = readUINT32(&stsz_entry);
+    stsz_atom_.table.push_back(stsz_entry);
+  }
+  return 0;
+}
+
+int Mp4Parser::parse_stco() {
+  int ret = findAtom("stco", &stco_atom_.atom);
+  if (ret) return ret;
+  ret = readUINT8(&stco_atom_.version);
+  if (ret) return ret;
+  ret = readUINT24(&stco_atom_.flags);
+  if (ret) return ret;
+  ret = readUINT32(&stco_atom_.entry_count);
+  if (ret) return ret;
+  for (uint32_t i = 0; i < stsz_atom_.entry_count; i++) {
+    uint32_t stco_entry;
+    ret = readUINT32(&stco_entry);
+    stco_atom_.table.push_back(stco_entry);
+  }
+  return 0;
+}
+
+int Mp4Parser::parse_sidx() {
+  int ret = fseek(file_, moov_atom_.offset + moov_atom_.size, SEEK_SET);
+  if (ret) return ret;
+  ret = readAtom("sidx", &sidx_atom_.atom);
+  if (ret) return 1;
+  ret = readUINT8(&sidx_atom_.version);
+  if (ret) return ret;
+  ret = readUINT24(&sidx_atom_.flags);
+  if (ret) return ret;
+  ret = readUINT32(&sidx_atom_.track_id);
+  if (ret) return ret;
+  ret = readUINT32(&sidx_atom_.timescale);
+  if (ret) return ret;
+  if (sidx_atom_.version == 0) {
+    ret = readUINT32(&sidx_atom_.first_pts);
+    if (ret) return ret;
+    ret = readUINT32(&sidx_atom_.first_offset);
+    if (ret) return ret;
+  } else {
+    ret = readUINT64(&sidx_atom_.first_pts);
+    if (ret) return ret;
+    ret = readUINT64(&sidx_atom_.first_offset);
+    if (ret) return ret;
+  }
+  uint16_t val;
+  ret = readUINT16(&val);
+  if (ret) return ret;
+  ret = readUINT16(&sidx_atom_.item_count);
+  if (ret) return ret;
+
+  sidx_atom_.table.reserve(sidx_atom_.item_count);
+  for (uint32_t i = 0; i < sidx_atom_.item_count; i++) {
+    sidxEntry entry;
+    uint32_t tmp;
+    ret = readUINT32(&tmp);
+    if (ret) return ret;
+    entry.reference_type = (tmp & 0x80000000) >> 24;
+    entry.referenced_size = tmp & 0x7fffffff;
+    ret = readUINT32(&entry.subsegment_duration);
+    if (ret) return ret;
+    ret = readUINT32(&tmp);
+    if (ret) return ret;
+    entry.starts_with_SAP = (tmp & 0x80000000) >> 24;
+    entry.SAP_type = (tmp & 0x70000000) >> 24;
+    entry.SAP_delta_time = tmp & 0xfffffff;
+    sidx_atom_.table.push_back(entry);
+  }
+
+  return 0;
+}
+
+int Mp4Parser::parse_trun() {
+  mp4atom atom;
+  int ret;
+  ret = findAtom("moof", &current_moof_);
+  if (ret) return ret;
+  ret = findAtom("traf", &atom);
+  if (ret) return ret;
+  ret = findAtom("trun", &trun_atom_.atom);
+  if (ret) return ret;
+  ret = readUINT8(&trun_atom_.version);
+  if (ret) return ret;
+  ret = readUINT24(&trun_atom_.flags);
+  if (ret) return ret;
+  ret = readUINT32(&trun_atom_.count);
+  if (ret) return ret;
+  trun_atom_.data_offset = 0;
+  if (trun_atom_.flags & MOV_TRUN_DATA_OFFSET) {
+    ret = readUINT32(&trun_atom_.data_offset);
+    if (ret) return ret;
+  }
+  uint64_t data_offset =
+      trun_atom_.data_offset + (default_base_is_moof_ ? current_moof_.offset : tfhd_atom_.base_data_offset);
+  trun_atom_.first_sample_flags = 0;
+  if (trun_atom_.flags & MOV_TRUN_FIRST_SAMPLE_FLAGS) {
+    ret = readUINT32(&trun_atom_.first_sample_flags);
+    if (ret) return ret;
+  }
+  uint32_t prev_size = 0;
+  uint64_t frame_time = 0;
+  trun_atom_.table.clear();
+  trun_atom_.table.reserve(trun_atom_.count);
+  trun_atom_.table_pointer = 0;
+  for (uint32_t i = 0; i < trun_atom_.count; i++) {
+    trunEntry entry;
+    data_offset += prev_size;
+    if (trun_atom_.flags & MOV_TRUN_SAMPLE_DURATION) readUINT32(&entry.duration);
+    if (trun_atom_.flags & MOV_TRUN_SAMPLE_SIZE) readUINT32(&entry.size);
+    if (trun_atom_.flags & MOV_TRUN_SAMPLE_FLAGS) readUINT32(&entry.flags);
+    if (trun_atom_.flags & MOV_TRUN_SAMPLE_CTS) {
+      assert(0);
+    }
+    entry.file_offset = data_offset;
+    entry.time = frame_time;
+    frame_time += (trun_atom_.flags & MOV_TRUN_SAMPLE_DURATION) ? entry.duration : tfhd_atom_.defaultSampleDuration;
+    prev_size = entry.size;
+    trun_atom_.table.push_back(entry);
+  }
+  trun_atom_.valid = 1;
+  return 0;
+}
+
+int Mp4Parser::parse_mvhd() {
+  int ret = findAtom("mvhd", &mvhd_atom_.atom);
+  if (ret) return ret;
+  ret = readUINT8(&mvhd_atom_.version);
+  if (ret) return ret;
+  ret = readUINT24(&mvhd_atom_.reserved);
+  if (ret) return ret;
+  ret = readUINT32(&mvhd_atom_.creation_time);
+  if (ret) return ret;
+  ret = readUINT32(&mvhd_atom_.modification_time);
+  if (ret) return ret;
+  ret = readUINT32(&mvhd_atom_.timescale);
+  if (ret) return ret;
+
+  ret = fseek(file_, mvhd_atom_.atom.offset + mvhd_atom_.atom.size, SEEK_SET);
+  if (ret) return ret;
+  return 0;
+}
+
+int Mp4Parser::parse_mdhd() {
+  int ret = findAtom("mdhd", &mdhd_atom_.atom);
+  if (ret) return ret;
+  ret = readUINT8(&mdhd_atom_.version);
+  if (ret) return ret;
+  ret = readUINT24(&mdhd_atom_.reserved);
+  if (ret) return ret;
+  ret = readUINT32(&mdhd_atom_.creation_time);
+  if (ret) return ret;
+  ret = readUINT32(&mdhd_atom_.modification_time);
+  if (ret) return ret;
+  ret = readUINT32(&mdhd_atom_.timescale);
+  if (ret) return ret;
+
+  ret = fseek(file_, mdhd_atom_.atom.offset + mdhd_atom_.atom.size, SEEK_SET);
+  if (ret) return ret;
+  return 0;
+}
+
+int Mp4Parser::parse_tfhd() {
+  mp4atom atom;
+  int ret = fseek(file_, moov_atom_.offset + moov_atom_.size, SEEK_SET);
+  if (ret) return ret;
+  ret = findAtom("moof", &moof_atom_);
+  if (ret) return ret;
+  ret = findAtom("traf", &atom);
+  if (ret) return ret;
+  ret = findAtom("tfhd", &tfhd_atom_.atom);
+  if (ret) return ret;
+  ret = readUINT8(&tfhd_atom_.version);
+  if (ret) return ret;
+  ret = readUINT24(&tfhd_atom_.flags);
+  if (ret) return ret;
+  ret = readUINT32(&tfhd_atom_.track_id);
+  if (ret) return ret;
+  if (MOV_TFHD_BASE_DATA_OFFSET & tfhd_atom_.flags) {
+    ret = readUINT64(&tfhd_atom_.base_data_offset);
+    if (ret) return ret;
+  }
+  if (MOV_TFHD_STSD_ID & tfhd_atom_.flags) {
+    ret = readUINT32(&tfhd_atom_.sample_desc_index);
+    if (ret) return ret;
+  }
+  if (MOV_TFHD_DEFAULT_DURATION & tfhd_atom_.flags) {
+    ret = readUINT32(&tfhd_atom_.defaultSampleDuration);
+    if (ret) return ret;
+  }
+
+  default_base_is_moof_ = MOV_TFHD_DEFAULT_BASE_IS_MOOF & tfhd_atom_.flags;
+
+  return 0;
+}
+
+int Mp4Parser::OpenFile(FILE* file) {
+  file_ = file;
+  need_fclose_ = 0;
+  int ret = readAtom(&ftyp_);
+  if (ret) return ret;
+  if (strcmp((const char*)ftyp_.name, "ftyp")) return -1;
+  ret = fseek(file_, ftyp_.offset + ftyp_.size, SEEK_SET);
+  if (ret) return ret;
+  ret = findAtom("moov", &moov_atom_);
+  if (ret) return ret;
+  ret = parse_mvhd();
+  if (ret) return ret;
+  ret = findAtom("trak", &any_atom_1_);
+  if (ret) return ret;
+  ret = findAtom("mdia", &any_atom_2_);
+  if (ret) return ret;
+  ret = parse_mdhd();
+  if (ret) return ret;
+  ret = findAtom("minf", &any_atom_3_);
+  if (ret) return ret;
+  ret = findAtom("stbl", &stbl_atom_);
+  if (ret) return ret;
+  // next level
+  ret = findAtom("stsd", &stsd_atom_);
+  if (ret) return ret;
+  ret = parse_stsd();
+  if (ret) return ret;
+
+  ret = fseek(file_, stsd_atom_.offset + stsd_atom_.size, SEEK_SET);
+  ret = parse_stts();
+  if (ret) return ret;
+  if (stts_atom_.entry_count) {
+    // quicktime format
+    is_qt_format_ = 1;
+    ret = fseek(file_, stbl_atom_.offset + 8, SEEK_SET);
+    if (ret) return ret;
+
+    ret = parse_stsc();
+    if (ret) return ret;
+    ret = parse_stsz();
+    if (ret) return ret;
+    ret = parse_stco();
+    if (ret) return ret;
+    return 0;
+  } else {
+    ret = fseek(file_, moov_atom_.offset + moov_atom_.size, SEEK_SET);
+    ret = parse_tfhd();
+    if (ret) return ret;
+
+    ret = parse_sidx();
+    if (ret != 1 && ret != 0) return ret;
+    is_fragmented_ = ret == 0;
+    ret = fseek(file_, moov_atom_.offset + moov_atom_.size, SEEK_SET);
+    if (ret) return ret;
+    ret = parse_trun();
+    if (ret) return ret;
+    return 0;
+  }
+}
+
+int Mp4Parser::OpenFile(const char* path) {
+  FILE* file = fopen(path, "rb");
+  if (!file) {
+    printf("Failed to open file %s\n", path);
+    return -1;
+  }
+  need_fclose_ = 1;
+  return OpenFile(file);
+}
+
+int Mp4Parser::readName(uint8_t* buf) {
+  size_t read = fread(buf, 1, 4, file_);
+  if (read != 4) return -1;
+  buf[4] = 0;
+  return 0;
+}
+
+int Mp4Parser::readUINT8(uint32_t* val) {
+  uint8_t buf;
+  size_t read = fread(&buf, 1, 1, file_);
+  if (read != 1) return -1;
+  *val = buf;
+  return 0;
+}
+
+int Mp4Parser::readUINT24(uint32_t* val) {
+  uint8_t buf[3];
+  size_t read = fread(buf, 1, 3, file_);
+  if (read != 3) return -1;
+  *val = (buf[0] << 16) | (buf[1] << 8) | (buf[2]);
+  return 0;
+}
+
+int Mp4Parser::readUINT16(uint16_t* val) {
+  uint8_t buf[2];
+  size_t read = fread(buf, 1, 2, file_);
+  if (read != 2) return -1;
+  *val = (buf[0] << 8) | (buf[1]);
+  return 0;
+}
+
+int Mp4Parser::readUINT16(uint32_t* val) {
+  uint8_t buf[2];
+  size_t read = fread(buf, 1, 2, file_);
+  if (read != 2) return -1;
+  *val = (buf[0] << 8) | (buf[1]);
+  return 0;
+}
+
+int Mp4Parser::readUINT32(uint32_t* val) {
+  uint8_t buf[4];
+  size_t read = fread(buf, 1, 4, file_);
+  if (read != 4) return -1;
+  *val = (buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | (buf[3]);
+  return 0;
+}
+
+int Mp4Parser::readUINT32(uint64_t* val) {
+  uint8_t buf[4];
+  size_t read = fread(buf, 1, 4, file_);
+  if (read != 4) return -1;
+  *val = (buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | (buf[3]);
+  return 0;
+}
+
+int Mp4Parser::readUINT64(uint64_t* val) {
+  uint8_t buf[8];
+  size_t read = fread(buf, 1, 8, file_);
+  if (read != 8) return -1;
+  *val = ((uint64_t)((buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | (buf[3] << 0)) << 32) | (buf[4] << 24) |
+         (buf[5] << 16) | (buf[6] << 8) | (buf[7]);
+  return 0;
+}
+
+int Mp4Parser::readAtom(mp4atom* atom) {
+  atom->offset = ftell(file_);
+  int ret = readUINT32(&atom->size);
+  if (ret) return ret;
+  ret = readName(atom->name);
+  if (ret) return ret;
+  if (atom->size == 1) {
+    ret = readUINT64(&atom->ext_size);
+    if (ret) return ret;
+  }
+  return 0;
+}
+
+int Mp4Parser::Read(uint8_t* buffer) {
+  if (is_qt_format_) {
+    int ret = calcChunk();
+    if (ret) return -1;
+    uint32_t chunk_offset = stco_atom_.table[current_chunk_];
+    uint32_t inchunk_offset = 0;
+    for (uint32_t sample = first_in_current_chunk_; sample < current_sample_; sample++)
+      inchunk_offset += stsz_atom_.table[sample];
+    uint32_t file_offset = chunk_offset + inchunk_offset;
+    fseek(file_, file_offset, SEEK_SET);
+    size_t _read = fread(buffer, 1, stsz_atom_.table[current_sample_], file_);
+    if (_read != stsz_atom_.table[current_sample_]) return -1;
+    current_sample_++;
+    return 0;
+  } else {
+    trunEntry& entry = trun_atom_.table[trun_atom_.table_pointer];
+    fseek(file_, (long)entry.file_offset, SEEK_SET);
+    size_t _read = fread(buffer, 1, entry.size, file_);
+    if (_read != entry.size) return -1;
+    trun_atom_.table_pointer++;
+    return 0;
+  }
+}
+
+int Mp4Parser::calcChunk() {
+  while (1) {
+    if (current_sample_ >= first_in_current_chunk_ &&
+        current_sample_ < (first_in_current_chunk_ + stsc_atom_.table[current_chunk_id_].samples_per_chunk)) {
+      return 0;
+    }
+    current_chunk_++;
+    first_in_current_chunk_ += stsc_atom_.table[current_chunk_id_].samples_per_chunk;
+    for (uint32_t ch_id = 0; ch_id < stsc_atom_.entry_count; ch_id++) {
+      if (current_chunk_ >= (stsc_atom_.table[ch_id].first_chunk - 1))
+        current_chunk_id_ = ch_id;
+      else
+        break;
+    }
+  }
+  // never happen
+  return -1;
+}
+
+uint32_t Mp4Parser::GetSize() {
+  int ret;
+  if (is_qt_format_) {
+    if (current_sample_ < stts_atom_.total_samples) {
+      return stsz_atom_.table[current_sample_];
+    } else
+      return 0;  // out of samples count
+  } else {
+    while (1) {
+      if (trun_atom_.table_pointer < trun_atom_.table.size())
+        return trun_atom_.table[trun_atom_.table_pointer].size;
+      else if (is_fragmented_) {
+        // read next fragment
+        ret = fseek(file_, current_moof_.offset + current_moof_.size, SEEK_SET);
+        if (ret) return 0;
+        ret = parse_trun();
+        if (ret) return 0;
+        trun_atom_.table_pointer = 0;
+      } else
+        return 0;
+    }
+  }
+}
+
+AvxRational Mp4Parser::FrameRate() {
+  AvxRational ret;
+  if (is_qt_format_) {
+    ret.numerator = (int)mdhd_atom_.timescale;
+    ret.denominator = (int)stts_atom_.table[0].sample_delta;
+  } else {
+    ret.numerator = (int)mvhd_atom_.timescale;
+    ret.denominator = (int)tfhd_atom_.defaultSampleDuration;
+  }
+  if (ret.numerator == 0 || ret.denominator == 0) {
+    ret.numerator = 30;
+    ret.denominator = 1;
+  }
+  return ret;
+}
+
+extern "C" mp4context* create_mp4_ctx() {
+  mp4context* ctx = new mp4context;
+  memset(ctx, 0, sizeof(mp4context));
+  ctx->parser = new Mp4Parser();
+  return ctx;
+}
+
+extern "C" int destroy_mp4_ctx(mp4context* ctx) {
+  if (!ctx || !ctx->parser) return -1;
+  Mp4Parser* parser = static_cast<Mp4Parser*>(ctx->parser);
+  if (ctx->buffer) delete[] ctx->buffer;
+  delete parser;
+  delete ctx;
+  return 0;
+}
+
+extern "C" int file_is_mp4(mp4context* ctx, AvxInputContext* input_ctx) {
+  if (!ctx || !ctx->file || !ctx->parser) return 0;
+  Mp4Parser* parser = static_cast<Mp4Parser*>(ctx->parser);
+  long pos = ftell(ctx->file);
+  fseek(ctx->file, 0, SEEK_SET);
+  int ret = parser->OpenFile(ctx->file);
+  if (!ret) {
+    fseek(ctx->file, pos, SEEK_SET);
+    if (input_ctx) input_ctx->framerate = parser->FrameRate();
+    return 1;
+  }
+  return 0;
+}
+
+extern "C" int mp4_read_frame(mp4context* ctx, uint8_t** buf, size_t* bytes_in_buf, size_t* buffer_size) {
+  if (!ctx || !ctx->file || !ctx->parser) return -1;
+  Mp4Parser* parser = static_cast<Mp4Parser*>(ctx->parser);
+  uint32_t buf_sz = parser->GetSize();
+  if (!buf_sz) return -1;
+  if (ctx->buffer_capacity < buf_sz) {
+    if (ctx->buffer) delete[] ctx->buffer;
+    ctx->buffer = new uint8_t[buf_sz];
+    ctx->buffer_capacity = buf_sz;
+  }
+  int ret = parser->Read(ctx->buffer);
+  if (ret) return -1;
+  *buf = ctx->buffer;
+  *bytes_in_buf = buf_sz;
+  *buffer_size = ctx->buffer_capacity;
+  return 0;
+}
\ No newline at end of file

diff --git a/testing/common/mp4parser.h b/testing/common/mp4parser.h
new file mode 100644
index 0000000..2138aff
--- /dev/null
+++ b/testing/common/mp4parser.h

@@ -0,0 +1,245 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+
+#include "common/tools_common.h"
+
+typedef struct mp4context {
+  void* parser;
+  FILE* file;
+  uint8_t* buffer;
+  size_t buffer_capacity;
+  size_t bytes_buffered;
+  int frate_denominator;
+  int frate_numerator;
+} mp4context;
+
+#ifdef __cplusplus
+
+extern "C" {
+mp4context* create_mp4_ctx();
+int file_is_mp4(mp4context* ctx, AvxInputContext* input_ctx);
+int mp4_read_frame(mp4context* ctx, uint8_t** buf, size_t* bytes_in_buf, size_t* buffer_size);
+int destroy_mp4_ctx(mp4context* ctx);
+}
+
+#include <vector>
+#include <list>
+
+class Mp4Parser {
+ public:
+  ~Mp4Parser() {
+    if (need_fclose_) CloseFile();
+  }
+  void CloseFile() {
+    if (file_) fclose(file_);
+    file_ = 0;
+  }
+  int OpenFile(const char* path);
+  int OpenFile(FILE* file);
+  uint32_t GetSize();
+  int Read(uint8_t* buffer);
+  AvxRational FrameRate();
+  uint32_t Width() { return width_; }
+  uint32_t Height() { return height_; }
+
+ protected:
+  struct mp4atom {
+    uint32_t offset;
+    uint32_t size;
+    uint64_t ext_size;
+    uint8_t name[5];
+    mp4atom() { memset(this, 0, sizeof(mp4atom)); }
+  };
+
+  struct mvhdAtom {
+    mp4atom atom;
+    uint32_t version;
+    uint32_t reserved;
+    uint32_t creation_time;
+    uint32_t modification_time;
+    uint32_t timescale;
+  };
+
+  struct mdhdAtom {
+    mp4atom atom;
+    uint32_t version;
+    uint32_t reserved;
+    uint32_t creation_time;
+    uint32_t modification_time;
+    uint32_t timescale;
+  };
+
+  struct tfhdAtom {
+    mp4atom atom;
+    uint32_t version;
+    uint32_t flags;
+    uint32_t track_id;
+    // All of the following are optional fields
+    uint64_t base_data_offset;
+    uint32_t sample_desc_index;
+    uint32_t defaultSampleDuration;
+    uint32_t defaultSampleSize;
+    uint32_t defaultSampleFlags;
+  };
+
+  struct trunEntry {
+    uint32_t duration;
+    uint32_t size;
+    uint32_t flags;
+    int64_t CompositionTimeOffset;
+    uint64_t file_offset;
+    uint64_t time;
+  };
+
+  struct trunAtom {
+    mp4atom atom;
+    uint32_t version;
+    uint32_t flags;
+    uint32_t count;
+    uint32_t data_offset;
+    uint32_t first_sample_flags;
+    std::vector<trunEntry> table;
+    uint32_t table_pointer = 0;
+    uint32_t valid = 0;
+  };
+  struct sidxEntry {
+    uint32_t reference_type;
+    uint32_t referenced_size;
+    uint32_t subsegment_duration;
+    uint32_t starts_with_SAP;
+    uint32_t SAP_type;
+    uint32_t SAP_delta_time;
+  };
+  struct sidxAtom {
+    mp4atom atom;
+    uint32_t version;
+    uint32_t flags;
+    uint32_t track_id;
+    uint32_t timescale;
+    uint64_t first_pts;
+    uint64_t first_offset;
+    uint32_t item_count;
+    std::vector<sidxEntry> table;
+  };
+  struct sttsEntry {
+    uint32_t sample_count;
+    uint32_t sample_delta;
+  };
+  struct sttsAtom {
+    mp4atom atom;
+    uint32_t version;
+    uint32_t flags;
+    uint32_t entry_count;
+    uint32_t total_samples;
+    std::vector<sttsEntry> table;
+  };
+  struct stscEntry {
+    uint32_t first_chunk;
+    uint32_t samples_per_chunk;
+    uint32_t sample_description_index;
+  };
+  struct stscAtom {
+    mp4atom atom;
+    uint32_t version;
+    uint32_t flags;
+    uint32_t entry_count;
+    std::vector<stscEntry> table;
+  };
+  struct stszAtom {
+    mp4atom atom;
+    uint32_t version;
+    uint32_t flags;
+    uint32_t size;
+    uint32_t entry_count;
+    std::vector<uint32_t> table;
+  };
+  struct stcoAtom {
+    mp4atom atom;
+    uint32_t version;
+    uint32_t flags;
+    uint32_t size;
+    uint32_t entry_count;
+    std::vector<uint32_t> table;
+  };
+
+ protected:
+  int readAtom(mp4atom* atom);
+  int readUINT8(uint32_t* val);
+  int readUINT16(uint16_t* val);
+  int readUINT16(uint32_t* val);
+  int readUINT24(uint32_t* val);
+  int readUINT32(uint32_t* val);
+  int readUINT32(uint64_t* val);
+  int readUINT64(uint64_t* val);
+  int readName(uint8_t* buf);
+  int findAtom(const char* name, mp4atom* atom);
+  int readAtom(const char* name, mp4atom* atom);
+  int parse_stsd();
+  int parse_trun();
+  int parse_tfhd();
+  int parse_mvhd();
+  int parse_mdhd();
+  int parse_sidx();
+  int parse_stts();
+  int parse_stsc();
+  int parse_stsz();
+  int parse_stco();
+  int calcChunk();
+
+ private:
+  FILE* file_ = 0;
+  int need_fclose_ = 0;
+  uint16_t width_ = 0;
+  uint16_t height_ = 0;
+  int default_base_is_moof_ = 0;
+  int is_fragmented_ = 0;
+  int is_qt_format_ = 0;
+
+  uint32_t current_chunk_ = 0;
+  uint32_t current_chunk_id_ = 0;
+  uint32_t first_in_current_chunk_ = 0;
+  uint32_t current_sample_ = 0;
+
+  mp4atom ftyp_;
+  mp4atom moov_atom_;
+  mvhdAtom mvhd_atom_;
+  mp4atom any_atom_1_;
+  mp4atom any_atom_2_;
+  mp4atom any_atom_3_;
+  mp4atom any_atom_4_;
+  mp4atom any_atom_5_;
+  mp4atom any_atom_6_;
+  mp4atom stbl_atom_;
+  mp4atom stsd_atom_;
+  mp4atom av01_atom_;
+  mp4atom current_moof_;
+  mp4atom moof_atom_;
+  trunAtom trun_atom_;
+  tfhdAtom tfhd_atom_;
+  sidxAtom sidx_atom_;
+  sttsAtom stts_atom_;
+  stscAtom stsc_atom_;
+  stszAtom stsz_atom_;
+  stcoAtom stco_atom_;
+  mdhdAtom mdhd_atom_;
+};
+#endif

diff --git a/testing/common/obudec.c b/testing/common/obudec.c
new file mode 100644
index 0000000..fe6540d
--- /dev/null
+++ b/testing/common/obudec.c

@@ -0,0 +1,450 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+
+#include "common/obudec.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_ports/mem_ops.h"
+#include "av1/common/common.h"
+#include "av1/common/obu_util.h"
+
+#define OBU_BUFFER_SIZE (500 * 1024)
+
+#define OBU_HEADER_SIZE 1
+#define OBU_EXTENSION_SIZE 1
+#define OBU_MAX_LENGTH_FIELD_SIZE 8
+
+#define OBU_MAX_HEADER_SIZE (OBU_HEADER_SIZE + OBU_EXTENSION_SIZE + 2 * OBU_MAX_LENGTH_FIELD_SIZE)
+
+#define OBU_DETECTION_SIZE (OBU_HEADER_SIZE + OBU_EXTENSION_SIZE + 4 * OBU_MAX_LENGTH_FIELD_SIZE)
+
+// Reads unsigned LEB128 integer and returns 0 upon successful read and decode.
+// Stores raw bytes in 'value_buffer', length of the number in 'value_length',
+// and decoded value in 'value'.
+static int obudec_read_leb128(FILE *f, uint8_t *value_buffer, size_t *value_length, uint64_t *value) {
+  if (!f || !value_buffer || !value_length || !value) return -1;
+  size_t len;
+  for (len = 0; len < OBU_MAX_LENGTH_FIELD_SIZE; ++len) {
+    const size_t num_read = fread(&value_buffer[len], 1, 1, f);
+    if (num_read == 0) {
+      if (len == 0 && feof(f)) {
+        *value_length = 0;
+        return 0;
+      }
+      // Ran out of data before completing read of value.
+      return -1;
+    }
+    if ((value_buffer[len] >> 7) == 0) {
+      ++len;
+      *value_length = len;
+      break;
+    }
+  }
+
+  return aom_uleb_decode(value_buffer, len, value, NULL);
+}
+
+// Reads OBU header from 'f'. The 'buffer_capacity' passed in must be large
+// enough to store an OBU header with extension (2 bytes). Raw OBU data is
+// written to 'obu_data', parsed OBU header values are written to 'obu_header',
+// and total bytes read from file are written to 'bytes_read'. Returns 0 for
+// success, and non-zero on failure. When end of file is reached, the return
+// value is 0 and the 'bytes_read' value is set to 0.
+static int obudec_read_obu_header(FILE *f, size_t buffer_capacity, int is_annexb, uint8_t *obu_data,
+                                  ObuHeader *obu_header, size_t *bytes_read) {
+  if (!f || buffer_capacity < (OBU_HEADER_SIZE + OBU_EXTENSION_SIZE) || !obu_data || !obu_header || !bytes_read) {
+    return -1;
+  }
+  *bytes_read = fread(obu_data, 1, 1, f);
+
+  if (feof(f) && *bytes_read == 0) {
+    return 0;
+  } else if (*bytes_read != 1) {
+    fprintf(stderr, "obudec: Failure reading OBU header.\n");
+    return -1;
+  }
+
+  const int has_extension = (obu_data[0] >> 2) & 0x1;
+  if (has_extension) {
+    if (fread(&obu_data[1], 1, 1, f) != 1) {
+      fprintf(stderr, "obudec: Failure reading OBU extension.");
+      return -1;
+    }
+    ++*bytes_read;
+  }
+
+  size_t obu_bytes_parsed = 0;
+  const aom_codec_err_t parse_result =
+      aom_read_obu_header(obu_data, *bytes_read, &obu_bytes_parsed, obu_header, is_annexb);
+  if (parse_result != AOM_CODEC_OK || *bytes_read != obu_bytes_parsed) {
+    // fprintf(stderr, "obudec: Error parsing OBU header.\n");
+    return -1;
+  }
+
+  return 0;
+}
+
+// Reads OBU payload from 'f' and returns 0 for success when all payload bytes
+// are read from the file. Payload data is written to 'obu_data', and actual
+// bytes read added to 'bytes_read'.
+static int obudec_read_obu_payload(FILE *f, size_t payload_length, uint8_t *obu_data, size_t *bytes_read) {
+  if (!f || payload_length == 0 || !obu_data || !bytes_read) return -1;
+
+  if (fread(obu_data, 1, payload_length, f) != payload_length) {
+    fprintf(stderr, "obudec: Failure reading OBU payload.\n");
+    return -1;
+  }
+
+  *bytes_read += payload_length;
+  return 0;
+}
+
+static int obudec_read_obu_header_and_size(FILE *f, size_t buffer_capacity, int is_annexb, uint8_t *buffer,
+                                           size_t *bytes_read, size_t *payload_length, ObuHeader *obu_header) {
+  const size_t kMinimumBufferSize = OBU_MAX_HEADER_SIZE;
+  if (!f || !buffer || !bytes_read || !payload_length || !obu_header || buffer_capacity < kMinimumBufferSize) {
+    return -1;
+  }
+
+  size_t leb128_length_obu = 0;
+  size_t leb128_length_payload = 0;
+  uint64_t obu_size = 0;
+  if (is_annexb) {
+    if (obudec_read_leb128(f, &buffer[0], &leb128_length_obu, &obu_size) != 0) {
+      fprintf(stderr, "obudec: Failure reading OBU size length.\n");
+      return -1;
+    } else if (leb128_length_obu == 0) {
+      *payload_length = 0;
+      return 0;
+    }
+    if (obu_size > UINT32_MAX) {
+      fprintf(stderr, "obudec: OBU payload length too large.\n");
+      return -1;
+    }
+  }
+
+  size_t header_size = 0;
+  if (obudec_read_obu_header(f, buffer_capacity - leb128_length_obu, is_annexb, buffer + leb128_length_obu, obu_header,
+                             &header_size) != 0) {
+    return -1;
+  } else if (header_size == 0) {
+    *payload_length = 0;
+    return 0;
+  }
+
+  if (!obu_header->has_size_field) {
+    assert(is_annexb);
+    if (obu_size < header_size) {
+      fprintf(stderr, "obudec: OBU size is too small.\n");
+      return -1;
+    }
+    *payload_length = (size_t)obu_size - header_size;
+  } else {
+    uint64_t u64_payload_length = 0;
+    if (obudec_read_leb128(f, &buffer[leb128_length_obu + header_size], &leb128_length_payload, &u64_payload_length) !=
+        0) {
+      fprintf(stderr, "obudec: Failure reading OBU payload length.\n");
+      return -1;
+    }
+    if (u64_payload_length > UINT32_MAX) {
+      fprintf(stderr, "obudec: OBU payload length too large.\n");
+      return -1;
+    }
+
+    *payload_length = (size_t)u64_payload_length;
+  }
+
+  *bytes_read = leb128_length_obu + header_size + leb128_length_payload;
+  return 0;
+}
+
+static int obudec_grow_buffer(size_t growth_amount, uint8_t **obu_buffer, size_t *obu_buffer_capacity) {
+  if (!*obu_buffer || !obu_buffer_capacity || growth_amount == 0) {
+    return -1;
+  }
+
+  const size_t capacity = *obu_buffer_capacity;
+  if (SIZE_MAX - growth_amount < capacity) {
+    fprintf(stderr, "obudec: cannot grow buffer, capacity will roll over.\n");
+    return -1;
+  }
+
+  const size_t new_capacity = capacity + growth_amount;
+
+#if defined AOM_MAX_ALLOCABLE_MEMORY
+  if (new_capacity > AOM_MAX_ALLOCABLE_MEMORY) {
+    fprintf(stderr, "obudec: OBU size exceeds max alloc size.\n");
+    return -1;
+  }
+#endif
+
+  uint8_t *new_buffer = (uint8_t *)realloc(*obu_buffer, new_capacity);
+  if (!new_buffer) {
+    fprintf(stderr, "obudec: Failed to allocate compressed data buffer.\n");
+    return -1;
+  }
+
+  *obu_buffer = new_buffer;
+  *obu_buffer_capacity = new_capacity;
+  return 0;
+}
+
+static int obudec_read_one_obu(FILE *f, uint8_t **obu_buffer, size_t obu_bytes_buffered, size_t *obu_buffer_capacity,
+                               size_t *obu_length, ObuHeader *obu_header, int is_annexb) {
+  if (!f || !(*obu_buffer) || !obu_buffer_capacity || !obu_length || !obu_header) {
+    return -1;
+  }
+
+  size_t bytes_read = 0;
+  size_t obu_payload_length = 0;
+  size_t available_buffer_capacity = *obu_buffer_capacity - obu_bytes_buffered;
+
+  if (available_buffer_capacity < OBU_MAX_HEADER_SIZE) {
+    if (obudec_grow_buffer(AOMMAX(*obu_buffer_capacity, OBU_MAX_HEADER_SIZE), obu_buffer, obu_buffer_capacity) != 0) {
+      *obu_length = bytes_read;
+      return -1;
+    }
+    available_buffer_capacity += AOMMAX(*obu_buffer_capacity, OBU_MAX_HEADER_SIZE);
+  }
+
+  const int status =
+      obudec_read_obu_header_and_size(f, available_buffer_capacity, is_annexb, *obu_buffer + obu_bytes_buffered,
+                                      &bytes_read, &obu_payload_length, obu_header);
+  if (status < 0) return status;
+
+  if (obu_payload_length > SIZE_MAX - bytes_read) return -1;
+
+  if (obu_payload_length > 256 * 1024 * 1024) {
+    fprintf(stderr, "obudec: Read invalid OBU size (%u)\n", (unsigned int)obu_payload_length);
+    *obu_length = bytes_read + obu_payload_length;
+    return -1;
+  }
+
+  if (bytes_read + obu_payload_length > available_buffer_capacity &&
+      obudec_grow_buffer(AOMMAX(*obu_buffer_capacity, obu_payload_length), obu_buffer, obu_buffer_capacity) != 0) {
+    *obu_length = bytes_read + obu_payload_length;
+    return -1;
+  }
+
+  if (obu_payload_length > 0 &&
+      obudec_read_obu_payload(f, obu_payload_length, *obu_buffer + obu_bytes_buffered + bytes_read, &bytes_read) != 0) {
+    return -1;
+  }
+
+  *obu_length = bytes_read;
+  return 0;
+}
+
+int file_is_obu(struct ObuDecInputContext *obu_ctx) {
+  if (!obu_ctx || !obu_ctx->avx_ctx) return 0;
+
+  struct AvxInputContext *avx_ctx = obu_ctx->avx_ctx;
+  uint8_t detect_buf[OBU_DETECTION_SIZE] = {0};
+  const int is_annexb = obu_ctx->is_annexb;
+  FILE *f = avx_ctx->file;
+  size_t payload_length = 0;
+  ObuHeader obu_header;
+  memset(&obu_header, 0, sizeof(obu_header));
+  size_t length_of_unit_size = 0;
+  size_t annexb_header_length = 0;
+  uint64_t unit_size = 0;
+
+  if (is_annexb) {
+    // read the size of first temporal unit
+    if (obudec_read_leb128(f, &detect_buf[0], &length_of_unit_size, &unit_size) != 0) {
+      fprintf(stderr, "obudec: Failure reading temporal unit header\n");
+      return 0;
+    }
+
+    // read the size of first frame unit
+    if (obudec_read_leb128(f, &detect_buf[length_of_unit_size], &annexb_header_length, &unit_size) != 0) {
+      fprintf(stderr, "obudec: Failure reading frame unit header\n");
+      return 0;
+    }
+    annexb_header_length += length_of_unit_size;
+  }
+
+  size_t bytes_read = 0;
+  if (obudec_read_obu_header_and_size(f, OBU_DETECTION_SIZE - annexb_header_length, is_annexb,
+                                      &detect_buf[annexb_header_length], &bytes_read, &payload_length,
+                                      &obu_header) != 0) {
+    //    fprintf(stderr, "obudec: Failure reading first OBU.\n");
+    rewind(f);
+    return 0;
+  }
+
+  if (is_annexb) {
+    bytes_read += annexb_header_length;
+  }
+
+  if (obu_header.type != OBU_TEMPORAL_DELIMITER && obu_header.type != OBU_SEQUENCE_HEADER) {
+    return 0;
+  }
+
+  if (obu_header.has_size_field) {
+    if (obu_header.type == OBU_TEMPORAL_DELIMITER && payload_length != 0) {
+      fprintf(stderr, "obudec: Invalid OBU_TEMPORAL_DELIMITER payload length (non-zero).");
+      rewind(f);
+      return 0;
+    }
+  } else if (!is_annexb) {
+    fprintf(stderr, "obudec: OBU size fields required, cannot decode input.\n");
+    rewind(f);
+    return 0;
+  }
+
+  // Appears that input is valid Section 5 AV1 stream.
+  obu_ctx->buffer = (uint8_t *)malloc(OBU_BUFFER_SIZE);
+  if (!obu_ctx->buffer) {
+    fprintf(stderr, "Out of memory.\n");
+    rewind(f);
+    return 0;
+  }
+  obu_ctx->buffer_capacity = OBU_BUFFER_SIZE;
+
+  memcpy(obu_ctx->buffer, &detect_buf[0], bytes_read);
+  obu_ctx->bytes_buffered = bytes_read;
+  // If the first OBU is a SEQUENCE_HEADER, then it will have a payload.
+  // We need to read this in so that our buffer only contains complete OBUs.
+  if (payload_length > 0) {
+    if (payload_length > (obu_ctx->buffer_capacity - bytes_read)) {
+      fprintf(stderr, "obudec: First OBU's payload is too large\n");
+      rewind(f);
+      return 0;
+    }
+
+    size_t payload_bytes = 0;
+    const int status = obudec_read_obu_payload(f, payload_length, &obu_ctx->buffer[bytes_read], &payload_bytes);
+    if (status < 0) {
+      rewind(f);
+      return 0;
+    }
+    obu_ctx->bytes_buffered += payload_bytes;
+  }
+  return 1;
+}
+
+int obudec_read_temporal_unit(struct ObuDecInputContext *obu_ctx, uint8_t **buffer, size_t *bytes_read,
+                              size_t *buffer_size) {
+  FILE *f = obu_ctx->avx_ctx->file;
+  if (!f) return -1;
+
+  *buffer_size = 0;
+  *bytes_read = 0;
+
+  if (feof(f)) {
+    return 1;
+  }
+
+  size_t tu_size;
+  size_t obu_size = 0;
+  size_t length_of_temporal_unit_size = 0;
+  uint8_t tuheader[OBU_MAX_LENGTH_FIELD_SIZE] = {0};
+
+  if (obu_ctx->is_annexb) {
+    uint64_t size = 0;
+
+    if (obu_ctx->bytes_buffered == 0) {
+      if (obudec_read_leb128(f, &tuheader[0], &length_of_temporal_unit_size, &size) != 0) {
+        fprintf(stderr, "obudec: Failure reading temporal unit header\n");
+        return -1;
+      }
+      if (size == 0 && feof(f)) {
+        return 1;
+      }
+    } else {
+      // temporal unit size was already stored in buffer
+      if (aom_uleb_decode(obu_ctx->buffer, obu_ctx->bytes_buffered, &size, &length_of_temporal_unit_size) != 0) {
+        fprintf(stderr, "obudec: Failure reading temporal unit header\n");
+        return -1;
+      }
+    }
+
+    if (size > UINT32_MAX || size + length_of_temporal_unit_size > UINT32_MAX) {
+      fprintf(stderr, "obudec: TU too large.\n");
+      return -1;
+    }
+
+    size += length_of_temporal_unit_size;
+    tu_size = (size_t)size;
+  } else {
+    while (1) {
+      ObuHeader obu_header;
+      memset(&obu_header, 0, sizeof(obu_header));
+
+      if (obudec_read_one_obu(f, &obu_ctx->buffer, obu_ctx->bytes_buffered, &obu_ctx->buffer_capacity, &obu_size,
+                              &obu_header, 0) != 0) {
+        fprintf(stderr, "obudec: read_one_obu failed in TU loop\n");
+        return -1;
+      }
+
+      if (obu_header.type == OBU_TEMPORAL_DELIMITER || obu_size == 0) {
+        tu_size = obu_ctx->bytes_buffered;
+        break;
+      } else {
+        obu_ctx->bytes_buffered += obu_size;
+      }
+    }
+  }
+
+#if defined AOM_MAX_ALLOCABLE_MEMORY
+  if (tu_size > AOM_MAX_ALLOCABLE_MEMORY) {
+    fprintf(stderr, "obudec: Temporal Unit size exceeds max alloc size.\n");
+    return -1;
+  }
+#endif
+  uint8_t *new_buffer = (uint8_t *)realloc(*buffer, tu_size);
+  if (!new_buffer) {
+    free(*buffer);
+    fprintf(stderr, "obudec: Out of memory.\n");
+    return -1;
+  }
+  *buffer = new_buffer;
+  *bytes_read = tu_size;
+  *buffer_size = tu_size;
+
+  if (!obu_ctx->is_annexb) {
+    memcpy(*buffer, obu_ctx->buffer, tu_size);
+
+    // At this point, (obu_ctx->buffer + obu_ctx->bytes_buffered + obu_size)
+    // points to the end of the buffer.
+    memmove(obu_ctx->buffer, obu_ctx->buffer + obu_ctx->bytes_buffered, obu_size);
+    obu_ctx->bytes_buffered = obu_size;
+  } else {
+    if (!feof(f)) {
+      size_t data_size;
+      size_t offset;
+      if (!obu_ctx->bytes_buffered) {
+        data_size = tu_size - length_of_temporal_unit_size;
+        memcpy(*buffer, &tuheader[0], length_of_temporal_unit_size);
+        offset = length_of_temporal_unit_size;
+      } else {
+        memcpy(*buffer, obu_ctx->buffer, obu_ctx->bytes_buffered);
+        offset = obu_ctx->bytes_buffered;
+        data_size = tu_size - obu_ctx->bytes_buffered;
+        obu_ctx->bytes_buffered = 0;
+      }
+
+      if (fread(*buffer + offset, 1, data_size, f) != data_size) {
+        fprintf(stderr, "obudec: Failed to read full temporal unit\n");
+        return -1;
+      }
+    }
+  }
+  return 0;
+}
+
+void obudec_free(struct ObuDecInputContext *obu_ctx) { free(obu_ctx->buffer); }

diff --git a/testing/common/obudec.h b/testing/common/obudec.h
new file mode 100644
index 0000000..20f7e17
--- /dev/null
+++ b/testing/common/obudec.h

@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2017, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_COMMON_OBUDEC_H_
+#define AOM_COMMON_OBUDEC_H_
+
+#include "common/tools_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct ObuDecInputContext {
+  struct AvxInputContext *avx_ctx;
+  uint8_t *buffer;
+  size_t buffer_capacity;
+  size_t bytes_buffered;
+  int is_annexb;
+};
+
+// Returns 1 when file data starts (if Annex B stream, after reading the
+// size of the OBU) with what appears to be a Temporal Delimiter
+// OBU as defined by Section 5 of the AV1 bitstream specification.
+int file_is_obu(struct ObuDecInputContext *obu_ctx);
+
+// Reads one Temporal Unit from the input file. Returns 0 when a TU is
+// successfully read, 1 when end of file is reached, and less than 0 when an
+// error occurs. Stores TU data in 'buffer'. Reallocs buffer to match TU size,
+// returns buffer capacity via 'buffer_size', and returns size of buffered data
+// via 'bytes_read'.
+int obudec_read_temporal_unit(struct ObuDecInputContext *obu_ctx, uint8_t **buffer, size_t *bytes_read,
+                              size_t *buffer_size);
+
+void obudec_free(struct ObuDecInputContext *obu_ctx);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif  // AOM_COMMON_OBUDEC_H_

diff --git a/testing/common/stream_reader.cpp b/testing/common/stream_reader.cpp
new file mode 100644
index 0000000..3ccc647
--- /dev/null
+++ b/testing/common/stream_reader.cpp

@@ -0,0 +1,140 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "stream_reader.h"
+#include "log.h"
+
+void ReadBufferWrapper::InternalRelease() {
+  if (buffer_) buffer_->releaser_->ReleaseData(buffer_);
+  buffer_.reset();
+}
+ReadBufferWrapper::~ReadBufferWrapper() { InternalRelease(); }
+
+void StreamReader::ReleaseData(reader_buffer_ptr data) {
+  if (data) reader_queue_.Push(data);
+}
+
+int StreamReader::readFrame(uint8_t **buf, size_t *bytes_in_buffer) {
+  switch (aom_input_ctx_.file_type) {
+    case FILE_TYPE_WEBM:
+      return webm_read_frame(&webm_ctx_, buf, bytes_in_buffer, &input_buffer_capacity_);
+    case FILE_TYPE_IVF:
+      return ivf_read_frame(aom_input_ctx_.file, buf, bytes_in_buffer, &input_buffer_capacity_, NULL);
+    case FILE_TYPE_OBU:
+      return obudec_read_temporal_unit(&obu_ctx_, buf, bytes_in_buffer, &input_buffer_capacity_);
+    case FILE_TYPE_MP4:
+      return mp4_read_frame(mp4_context_, buf, bytes_in_buffer, &input_buffer_capacity_);
+    default:
+      return 1;
+  }
+}
+
+StreamReader::StreamReader(uint32_t mode, uint32_t buffers_num)
+    : reader_queue_(buffers_num), ready_queue_(buffers_num + 1), mode_(mode) {
+  for (uint32_t i = 0; i < buffers_num; i++) {
+    reader_queue_.Push(reader_buffer_ptr(new ReaderBuffer(this)));
+  }
+}
+
+int StreamReader::Start(std::string &source_path) {
+  sourec_file_ = source_path;
+  aom_input_ctx_.filename = sourec_file_.c_str();
+  aom_input_ctx_.file = fopen(aom_input_ctx_.filename, "rb");
+  obu_ctx_.avx_ctx = &aom_input_ctx_;
+  mp4_context_ = create_mp4_ctx();
+  mp4_context_->file = aom_input_ctx_.file;
+  if (!aom_input_ctx_.file) return -1;
+  if (file_is_ivf(&aom_input_ctx_))
+    aom_input_ctx_.file_type = FILE_TYPE_IVF;
+  else if (file_is_webm(&webm_ctx_, &aom_input_ctx_)) {
+    aom_input_ctx_.file_type = FILE_TYPE_WEBM;
+    if (webm_guess_framerate(&webm_ctx_, &aom_input_ctx_)) {
+      XB_LOGE << "Failed to guess framerate -- error parsing. webm file?";
+      fclose(aom_input_ctx_.file);
+      aom_input_ctx_.file = 0;
+      return -1;
+    }
+  } else if (file_is_obu(&obu_ctx_))
+    aom_input_ctx_.file_type = FILE_TYPE_OBU;
+  else if (file_is_mp4(mp4_context_, &aom_input_ctx_))
+    aom_input_ctx_.file_type = FILE_TYPE_MP4;
+  else {
+    XB_LOGE << "Unsupported input file type";
+    fclose(aom_input_ctx_.file);
+    aom_input_ctx_.file = 0;
+    return -1;
+  }
+  if (mode_ == srmodeAsync) {
+    stop_ = 0;
+    eof_ = 0;
+    reading_thread_ = std::thread(ThreadFunc, this);
+  }
+  return 0;
+}
+
+int StreamReader::ReadLoop() {
+  started_ = 1;
+  while (!stop_) {
+    size_t buf_size = 0;
+    reader_buffer_ptr buffer = reader_queue_.Pull();
+    if (!buffer || readFrame(&buf_, &buf_size)) {
+      break;
+    } else {
+      int ret = buffer->Put(buf_, (uint32_t)buf_size);
+      if (ret) {
+        XB_LOGE << "Out of memory while allocating " << buf_size << " bytes";
+        ready_queue_.Push(0);
+        return 1;
+      }
+      ready_queue_.Push(buffer);
+    }
+  }
+  ready_queue_.Push(0);
+  if (aom_input_ctx_.file) {
+    fclose(aom_input_ctx_.file);
+    aom_input_ctx_.file = 0;
+  }
+  return 0;
+}
+
+reader_buffer_ptr StreamReader::GetData() {
+  if (mode_ == srmodeAsync) {
+    reader_buffer_ptr buffer = ready_queue_.Pull();
+    if (!buffer) {
+      return 0;
+    } else
+      return buffer;
+  } else {
+    size_t buf_size = 0;
+    reader_buffer_ptr buffer(reader_queue_.Pull());
+    if (!buffer || readFrame(&buf_, &buf_size)) {
+      if (aom_input_ctx_.file) {
+        fclose(aom_input_ctx_.file);
+        aom_input_ctx_.file = 0;
+      }
+      return 0;
+    } else {
+      int ret = buffer->Put(buf_, (uint32_t)buf_size);
+      if (ret) {
+        XB_LOGE << "Out of memory";
+        return 0;
+      }
+    }
+    return buffer;
+  }
+}
+
+size_t StreamReader::GetCount() { return ready_queue_.Size(); }

diff --git a/testing/common/stream_reader.h b/testing/common/stream_reader.h
new file mode 100644
index 0000000..e0378c5
--- /dev/null
+++ b/testing/common/stream_reader.h

@@ -0,0 +1,158 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+
+#include <thread>
+#include <mutex>
+#include <memory>
+#include <vector>
+#include <list>
+#include "frame_queue.h"
+#include "common/ivfdec.h"
+#include "common/webmdec.h"
+#include "common/obudec.h"
+#include "common/mp4parser.h"
+#include "mkvparser/mkvparser.h"
+#include "mkvparser/mkvreader.h"
+#include "log.h"
+
+class StreamReader;
+
+class ReaderBuffer {
+ public:
+  ReaderBuffer(StreamReader* releaser) : buffer_(0), size_(0), releaser_(releaser) {}
+  ~ReaderBuffer() {
+    XB_LOGD << "~ReaderBuffer";
+    delete[] buffer_;
+  }
+  int Put(const uint8_t* src, uint32_t size, int last = 0) {
+    if (src && size) {
+      if (size > capacity_) {
+        XB_LOGD << "ReaderBuffer::Put " << size;
+        delete[] buffer_;
+        buffer_ = new (std::nothrow) uint8_t[size];
+        if (!buffer_) return -1;
+        capacity_ = size;
+      }
+      memcpy(buffer_, src, size);
+      size_ = size;
+    } else {
+      delete[] buffer_;  // eof case
+      buffer_ = 0;
+      size_ = 0;
+      capacity_ = 0;
+    }
+    isFinal_ = last;
+    return 0;
+  }
+  uint8_t* Data() { return buffer_; }
+  uint32_t Size() { return size_; }
+  int IsFinal() const { return isFinal_; }
+
+ private:
+  uint8_t* buffer_ = 0;
+  uint32_t size_ = 0;
+  uint32_t capacity_ = 0;
+  uint32_t isFinal_ = 0;
+  StreamReader* releaser_;
+  friend class ReadBufferWrapper;
+};
+
+typedef std::shared_ptr<ReaderBuffer> reader_buffer_ptr;
+class ReadBufferWrapper {
+ public:
+  ReadBufferWrapper(reader_buffer_ptr buffer = reader_buffer_ptr(0)) : buffer_(buffer) {}
+  ~ReadBufferWrapper();
+  ReadBufferWrapper& operator=(reader_buffer_ptr other) {
+    InternalRelease();
+    buffer_ = other;
+    return *this;
+  }
+  void Reset() {
+    InternalRelease();
+    buffer_ = 0;
+  }
+  ReaderBuffer* operator*() { return buffer_.get(); }
+  operator bool() { return buffer_ != 0; }
+  ReaderBuffer* operator->() { return buffer_.get(); }
+
+ private:
+  void InternalRelease();
+
+ private:
+  reader_buffer_ptr buffer_;
+};
+
+enum StreamReaderMode { srmodeAsync = 0, srmodeSync = 1 };
+
+class StreamReader {
+ public:
+  StreamReader(uint32_t mode, uint32_t buffers_num = 20);
+  ~StreamReader() {
+    Stop();
+    if (aom_input_ctx_.file) fclose(aom_input_ctx_.file);
+    if (mp4_context_) destroy_mp4_ctx(mp4_context_);
+    if (aom_input_ctx_.file_type == FILE_TYPE_IVF)
+      ivf_free_buffer(&buf_);
+    else if (aom_input_ctx_.file_type == FILE_TYPE_WEBM)
+      webm_free(&webm_ctx_);
+    else if (aom_input_ctx_.file_type == FILE_TYPE_OBU)
+      obudec_free(&obu_ctx_);
+  }
+  int Start(std::string& source_path);
+
+  void Stop() {
+    if (!stop_) {
+      stop_ = 1;
+      if (mode_ == srmodeAsync) {
+        reader_queue_.Push(0);
+        reading_thread_.join();
+      }
+    }
+  }
+  reader_buffer_ptr GetData();
+  void ReleaseData(reader_buffer_ptr data);
+  size_t GetCount();
+  double FrameRate() {
+    if (aom_input_ctx_.framerate.denominator)
+      return (double)aom_input_ctx_.framerate.numerator / (double)aom_input_ctx_.framerate.denominator;
+    else
+      return 30.0;
+  }
+
+ private:
+  int readFrame(uint8_t** buf, size_t* bytes_in_buffer);
+  static void ThreadFunc(StreamReader* ptr) { ptr->ReadLoop(); }
+  int ReadLoop();
+
+  std::string sourec_file_;
+  AvxInputContext aom_input_ctx_ = {};
+  WebmInputContext webm_ctx_ = {};
+  ObuDecInputContext obu_ctx_ = {};
+  mp4context* mp4_context_ = 0;
+  volatile int stop_ = 1;
+  volatile int started_ = 0;
+  std::thread reading_thread_;
+  std::mutex cs_;
+  myqueue<reader_buffer_ptr> reader_queue_;
+  myqueue<reader_buffer_ptr> ready_queue_;
+  int eof_ = 0;
+  size_t input_buffer_capacity_ = 0;
+  uint8_t* buf_ = 0;
+  uint32_t mode_ = 0;
+  friend class ReaderBuffer;
+};
\ No newline at end of file

diff --git a/testing/common/tools_common.c b/testing/common/tools_common.c
new file mode 100644
index 0000000..1ace66c
--- /dev/null
+++ b/testing/common/tools_common.c

@@ -0,0 +1,486 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "tools_common.h"
+
+#include <math.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#if CONFIG_AV1_ENCODER
+#include "aom/aomcx.h"
+#endif
+
+#if CONFIG_AV1_DECODER
+#include "aom/aomdx.h"
+#endif
+
+#if defined(_WIN32) || defined(__OS2__)
+#include <io.h>
+#include <fcntl.h>
+
+#ifdef __OS2__
+#define _setmode setmode
+#define _fileno fileno
+#define _O_BINARY O_BINARY
+#endif
+#endif
+
+#define LOG_ERROR(label)               \
+  do {                                 \
+    const char *l = label;             \
+    va_list ap;                        \
+    va_start(ap, fmt);                 \
+    if (l) fprintf(stderr, "%s: ", l); \
+    vfprintf(stderr, fmt, ap);         \
+    fprintf(stderr, "\n");             \
+    va_end(ap);                        \
+  } while (0)
+
+FILE *set_binary_mode(FILE *stream) {
+  (void)stream;
+#if defined(_WIN32) || defined(__OS2__)
+  _setmode(_fileno(stream), _O_BINARY);
+#endif
+  return stream;
+}
+
+void die(const char *fmt, ...) {
+  LOG_ERROR(NULL);
+  usage_exit();
+}
+
+void fatal(const char *fmt, ...) {
+  LOG_ERROR("Fatal");
+  exit(EXIT_FAILURE);
+}
+
+void warn(const char *fmt, ...) { LOG_ERROR("Warning"); }
+
+void die_codec(aom_codec_ctx_t *ctx, const char *s) {
+  const char *detail = aom_codec_error_detail(ctx);
+
+  printf("%s: %s\n", s, aom_codec_error(ctx));
+  if (detail) printf("    %s\n", detail);
+  exit(EXIT_FAILURE);
+}
+
+int read_yuv_frame(struct AvxInputContext *input_ctx, aom_image_t *yuv_frame) {
+  FILE *f = input_ctx->file;
+  struct FileTypeDetectionBuffer *detect = &input_ctx->detect;
+  int plane = 0;
+  int shortread = 0;
+  const int bytespp = (yuv_frame->fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 2 : 1;
+
+  for (plane = 0; plane < 3; ++plane) {
+    uint8_t *ptr;
+    const int w = aom_img_plane_width(yuv_frame, plane);
+    const int h = aom_img_plane_height(yuv_frame, plane);
+    int r;
+
+    /* Determine the correct plane based on the image format. The for-loop
+     * always counts in Y,U,V order, but this may not match the order of
+     * the data on disk.
+     */
+    switch (plane) {
+      case 1:
+        ptr = yuv_frame->planes[yuv_frame->fmt == AOM_IMG_FMT_YV12 ? AOM_PLANE_V : AOM_PLANE_U];
+        break;
+      case 2:
+        ptr = yuv_frame->planes[yuv_frame->fmt == AOM_IMG_FMT_YV12 ? AOM_PLANE_U : AOM_PLANE_V];
+        break;
+      default:
+        ptr = yuv_frame->planes[plane];
+    }
+
+    for (r = 0; r < h; ++r) {
+      size_t needed = w * bytespp;
+      size_t buf_position = 0;
+      const size_t left = detect->buf_read - detect->position;
+      if (left > 0) {
+        const size_t more = (left < needed) ? left : needed;
+        memcpy(ptr, detect->buf + detect->position, more);
+        buf_position = more;
+        needed -= more;
+        detect->position += more;
+      }
+      if (needed > 0) {
+        shortread |= (fread(ptr + buf_position, 1, needed, f) < needed);
+      }
+
+      ptr += yuv_frame->stride[plane];
+    }
+  }
+
+  return shortread;
+}
+
+#if CONFIG_AV1_ENCODER
+static const AvxInterface aom_encoders[] = {
+    {"av1", AV1_FOURCC, &aom_codec_av1_cx},
+};
+
+int get_aom_encoder_count(void) { return sizeof(aom_encoders) / sizeof(aom_encoders[0]); }
+
+const AvxInterface *get_aom_encoder_by_index(int i) { return &aom_encoders[i]; }
+
+const AvxInterface *get_aom_encoder_by_name(const char *name) {
+  int i;
+
+  for (i = 0; i < get_aom_encoder_count(); ++i) {
+    const AvxInterface *encoder = get_aom_encoder_by_index(i);
+    if (strcmp(encoder->name, name) == 0) return encoder;
+  }
+
+  return NULL;
+}
+
+// large scale tile encoding
+static const AvxInterface aom_lst_encoder = {"av1", LST_FOURCC, &aom_codec_av1_cx};
+const AvxInterface *get_aom_lst_encoder(void) { return &aom_lst_encoder; }
+#endif  // CONFIG_AV1_ENCODER
+
+#if CONFIG_AV1_DECODER
+static const AvxInterface aom_decoders[] = {
+    {"av1", AV1_FOURCC, &aom_codec_av1_dx},
+};
+
+int get_aom_decoder_count(void) { return sizeof(aom_decoders) / sizeof(aom_decoders[0]); }
+
+const AvxInterface *get_aom_decoder_by_index(int i) { return &aom_decoders[i]; }
+
+const AvxInterface *get_aom_decoder_by_name(const char *name) {
+  int i;
+
+  for (i = 0; i < get_aom_decoder_count(); ++i) {
+    const AvxInterface *const decoder = get_aom_decoder_by_index(i);
+    if (strcmp(decoder->name, name) == 0) return decoder;
+  }
+
+  return NULL;
+}
+
+const AvxInterface *get_aom_decoder_by_fourcc(uint32_t fourcc) {
+  int i;
+
+  for (i = 0; i < get_aom_decoder_count(); ++i) {
+    const AvxInterface *const decoder = get_aom_decoder_by_index(i);
+    if (decoder->fourcc == fourcc) return decoder;
+  }
+
+  return NULL;
+}
+#endif  // CONFIG_AV1_DECODER
+
+void aom_img_write(const aom_image_t *img, FILE *file) {
+  int plane;
+
+  for (plane = 0; plane < 3; ++plane) {
+    const unsigned char *buf = img->planes[plane];
+    const int stride = img->stride[plane];
+    const int w = aom_img_plane_width(img, plane) * ((img->fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 2 : 1);
+    const int h = aom_img_plane_height(img, plane);
+    int y;
+
+    for (y = 0; y < h; ++y) {
+      fwrite(buf, 1, w, file);
+      buf += stride;
+    }
+  }
+}
+
+int aom_img_read(aom_image_t *img, FILE *file) {
+  int plane;
+
+  for (plane = 0; plane < 3; ++plane) {
+    unsigned char *buf = img->planes[plane];
+    const int stride = img->stride[plane];
+    const int w = aom_img_plane_width(img, plane) * ((img->fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 2 : 1);
+    const int h = aom_img_plane_height(img, plane);
+    int y;
+
+    for (y = 0; y < h; ++y) {
+      if (fread(buf, 1, w, file) != (size_t)w) return 0;
+      buf += stride;
+    }
+  }
+
+  return 1;
+}
+
+// TODO(dkovalev) change sse_to_psnr signature: double -> int64_t
+double sse_to_psnr(double samples, double peak, double sse) {
+  static const double kMaxPSNR = 100.0;
+
+  if (sse > 0.0) {
+    const double psnr = 10.0 * log10(samples * peak * peak / sse);
+    return psnr > kMaxPSNR ? kMaxPSNR : psnr;
+  } else {
+    return kMaxPSNR;
+  }
+}
+
+// TODO(debargha): Consolidate the functions below into a separate file.
+static void highbd_img_upshift(aom_image_t *dst, const aom_image_t *src, int input_shift) {
+  // Note the offset is 1 less than half.
+  const int offset = input_shift > 0 ? (1 << (input_shift - 1)) - 1 : 0;
+  int plane;
+  if (dst->d_w != src->d_w || dst->d_h != src->d_h || dst->x_chroma_shift != src->x_chroma_shift ||
+      dst->y_chroma_shift != src->y_chroma_shift || dst->fmt != src->fmt || input_shift < 0) {
+    fatal("Unsupported image conversion");
+  }
+  switch (src->fmt) {
+    case AOM_IMG_FMT_I42016:
+    case AOM_IMG_FMT_I42216:
+    case AOM_IMG_FMT_I44416:
+      break;
+    default:
+      fatal("Unsupported image conversion");
+      break;
+  }
+  for (plane = 0; plane < 3; plane++) {
+    int w = src->d_w;
+    int h = src->d_h;
+    int x, y;
+    if (plane) {
+      w = (w + src->x_chroma_shift) >> src->x_chroma_shift;
+      h = (h + src->y_chroma_shift) >> src->y_chroma_shift;
+    }
+    for (y = 0; y < h; y++) {
+      const uint16_t *p_src = (const uint16_t *)(src->planes[plane] + y * src->stride[plane]);
+      uint16_t *p_dst = (uint16_t *)(dst->planes[plane] + y * dst->stride[plane]);
+      for (x = 0; x < w; x++) *p_dst++ = (*p_src++ << input_shift) + offset;
+    }
+  }
+}
+
+static void lowbd_img_upshift(aom_image_t *dst, const aom_image_t *src, int input_shift) {
+  // Note the offset is 1 less than half.
+  const int offset = input_shift > 0 ? (1 << (input_shift - 1)) - 1 : 0;
+  int plane;
+  if (dst->d_w != src->d_w || dst->d_h != src->d_h || dst->x_chroma_shift != src->x_chroma_shift ||
+      dst->y_chroma_shift != src->y_chroma_shift || dst->fmt != src->fmt + AOM_IMG_FMT_HIGHBITDEPTH ||
+      input_shift < 0) {
+    fatal("Unsupported image conversion");
+  }
+  switch (src->fmt) {
+    case AOM_IMG_FMT_YV12:
+    case AOM_IMG_FMT_I420:
+    case AOM_IMG_FMT_I422:
+    case AOM_IMG_FMT_I444:
+      break;
+    default:
+      fatal("Unsupported image conversion");
+      break;
+  }
+  for (plane = 0; plane < 3; plane++) {
+    int w = src->d_w;
+    int h = src->d_h;
+    int x, y;
+    if (plane) {
+      w = (w + src->x_chroma_shift) >> src->x_chroma_shift;
+      h = (h + src->y_chroma_shift) >> src->y_chroma_shift;
+    }
+    for (y = 0; y < h; y++) {
+      const uint8_t *p_src = src->planes[plane] + y * src->stride[plane];
+      uint16_t *p_dst = (uint16_t *)(dst->planes[plane] + y * dst->stride[plane]);
+      for (x = 0; x < w; x++) {
+        *p_dst++ = (*p_src++ << input_shift) + offset;
+      }
+    }
+  }
+}
+
+void aom_img_upshift(aom_image_t *dst, const aom_image_t *src, int input_shift) {
+  if (src->fmt & AOM_IMG_FMT_HIGHBITDEPTH) {
+    highbd_img_upshift(dst, src, input_shift);
+  } else {
+    lowbd_img_upshift(dst, src, input_shift);
+  }
+}
+
+void aom_img_truncate_16_to_8(aom_image_t *dst, const aom_image_t *src) {
+  int plane;
+  if (dst->fmt + AOM_IMG_FMT_HIGHBITDEPTH != src->fmt || dst->d_w != src->d_w || dst->d_h != src->d_h ||
+      dst->x_chroma_shift != src->x_chroma_shift || dst->y_chroma_shift != src->y_chroma_shift) {
+    fatal("Unsupported image conversion");
+  }
+  switch (dst->fmt) {
+    case AOM_IMG_FMT_I420:
+    case AOM_IMG_FMT_I422:
+    case AOM_IMG_FMT_I444:
+      break;
+    default:
+      fatal("Unsupported image conversion");
+      break;
+  }
+  for (plane = 0; plane < 3; plane++) {
+    int w = src->d_w;
+    int h = src->d_h;
+    int x, y;
+    if (plane) {
+      w = (w + src->x_chroma_shift) >> src->x_chroma_shift;
+      h = (h + src->y_chroma_shift) >> src->y_chroma_shift;
+    }
+    for (y = 0; y < h; y++) {
+      const uint16_t *p_src = (const uint16_t *)(src->planes[plane] + y * src->stride[plane]);
+      uint8_t *p_dst = dst->planes[plane] + y * dst->stride[plane];
+      for (x = 0; x < w; x++) {
+        *p_dst++ = (uint8_t)(*p_src++);
+      }
+    }
+  }
+}
+
+static void highbd_img_downshift(aom_image_t *dst, const aom_image_t *src, int down_shift) {
+  int plane;
+  if (dst->d_w != src->d_w || dst->d_h != src->d_h || dst->x_chroma_shift != src->x_chroma_shift ||
+      dst->y_chroma_shift != src->y_chroma_shift || dst->fmt != src->fmt || down_shift < 0) {
+    fatal("Unsupported image conversion");
+  }
+  switch (src->fmt) {
+    case AOM_IMG_FMT_I42016:
+    case AOM_IMG_FMT_I42216:
+    case AOM_IMG_FMT_I44416:
+      break;
+    default:
+      fatal("Unsupported image conversion");
+      break;
+  }
+  for (plane = 0; plane < 3; plane++) {
+    int w = src->d_w;
+    int h = src->d_h;
+    int x, y;
+    if (plane) {
+      w = (w + src->x_chroma_shift) >> src->x_chroma_shift;
+      h = (h + src->y_chroma_shift) >> src->y_chroma_shift;
+    }
+    for (y = 0; y < h; y++) {
+      const uint16_t *p_src = (const uint16_t *)(src->planes[plane] + y * src->stride[plane]);
+      uint16_t *p_dst = (uint16_t *)(dst->planes[plane] + y * dst->stride[plane]);
+      for (x = 0; x < w; x++) *p_dst++ = *p_src++ >> down_shift;
+    }
+  }
+}
+
+static void lowbd_img_downshift(aom_image_t *dst, const aom_image_t *src, int down_shift) {
+  int plane;
+  if (dst->d_w != src->d_w || dst->d_h != src->d_h || dst->x_chroma_shift != src->x_chroma_shift ||
+      dst->y_chroma_shift != src->y_chroma_shift || src->fmt != dst->fmt + AOM_IMG_FMT_HIGHBITDEPTH || down_shift < 0) {
+    fatal("Unsupported image conversion");
+  }
+  switch (dst->fmt) {
+    case AOM_IMG_FMT_I420:
+    case AOM_IMG_FMT_I422:
+    case AOM_IMG_FMT_I444:
+      break;
+    default:
+      fatal("Unsupported image conversion");
+      break;
+  }
+  for (plane = 0; plane < 3; plane++) {
+    int w = src->d_w;
+    int h = src->d_h;
+    int x, y;
+    if (plane) {
+      w = (w + src->x_chroma_shift) >> src->x_chroma_shift;
+      h = (h + src->y_chroma_shift) >> src->y_chroma_shift;
+    }
+    for (y = 0; y < h; y++) {
+      const uint16_t *p_src = (const uint16_t *)(src->planes[plane] + y * src->stride[plane]);
+      uint8_t *p_dst = dst->planes[plane] + y * dst->stride[plane];
+      for (x = 0; x < w; x++) {
+        *p_dst++ = *p_src++ >> down_shift;
+      }
+    }
+  }
+}
+
+void aom_img_downshift(aom_image_t *dst, const aom_image_t *src, int down_shift) {
+  if (dst->fmt & AOM_IMG_FMT_HIGHBITDEPTH) {
+    highbd_img_downshift(dst, src, down_shift);
+  } else {
+    lowbd_img_downshift(dst, src, down_shift);
+  }
+}
+
+static int img_shifted_realloc_required(const aom_image_t *img, const aom_image_t *shifted,
+                                        aom_img_fmt_t required_fmt) {
+  return img->d_w != shifted->d_w || img->d_h != shifted->d_h || required_fmt != shifted->fmt;
+}
+
+void aom_shift_img(unsigned int output_bit_depth, aom_image_t **img_ptr, aom_image_t **img_shifted_ptr) {
+  aom_image_t *img = *img_ptr;
+  aom_image_t *img_shifted = *img_shifted_ptr;
+
+  const aom_img_fmt_t shifted_fmt =
+      output_bit_depth == 8 ? img->fmt & ~AOM_IMG_FMT_HIGHBITDEPTH : img->fmt | AOM_IMG_FMT_HIGHBITDEPTH;
+
+  if (shifted_fmt != img->fmt || output_bit_depth != img->bit_depth) {
+    if (img_shifted && img_shifted_realloc_required(img, img_shifted, shifted_fmt)) {
+      aom_img_free(img_shifted);
+      img_shifted = NULL;
+    }
+    if (img_shifted) {
+      img_shifted->monochrome = img->monochrome;
+    }
+    if (!img_shifted) {
+      img_shifted = aom_img_alloc(NULL, shifted_fmt, img->d_w, img->d_h, 16);
+      img_shifted->bit_depth = output_bit_depth;
+      img_shifted->monochrome = img->monochrome;
+      img_shifted->csp = img->csp;
+    }
+    if (output_bit_depth > img->bit_depth) {
+      aom_img_upshift(img_shifted, img, output_bit_depth - img->bit_depth);
+    } else {
+      aom_img_downshift(img_shifted, img, img->bit_depth - output_bit_depth);
+    }
+    *img_shifted_ptr = img_shifted;
+    *img_ptr = img_shifted;
+  }
+}
+
+// Related to I420, NV12 format has one luma "luminance" plane Y and one plane
+// with U and V values interleaved.
+void aom_img_write_nv12(const aom_image_t *img, FILE *file) {
+  // Y plane
+  const unsigned char *buf = img->planes[0];
+  int stride = img->stride[0];
+  int w = aom_img_plane_width(img, 0) * ((img->fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 2 : 1);
+  int h = aom_img_plane_height(img, 0);
+  int x, y;
+
+  for (y = 0; y < h; ++y) {
+    fwrite(buf, 1, w, file);
+    buf += stride;
+  }
+
+  // Interleaved U and V plane
+  const unsigned char *ubuf = img->planes[1];
+  const unsigned char *vbuf = img->planes[2];
+  const size_t size = (img->fmt & AOM_IMG_FMT_HIGHBITDEPTH) ? 2 : 1;
+  stride = img->stride[1];
+  w = aom_img_plane_width(img, 1);
+  h = aom_img_plane_height(img, 1);
+
+  for (y = 0; y < h; ++y) {
+    for (x = 0; x < w; ++x) {
+      fwrite(ubuf, size, 1, file);
+      fwrite(vbuf, size, 1, file);
+      ubuf += size;
+      vbuf += size;
+    }
+    ubuf += (stride - w * size);
+    vbuf += (stride - w * size);
+  }
+}

diff --git a/testing/common/tools_common.h b/testing/common/tools_common.h
new file mode 100644
index 0000000..85beff5
--- /dev/null
+++ b/testing/common/tools_common.h

@@ -0,0 +1,173 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_COMMON_TOOLS_COMMON_H_
+#define AOM_COMMON_TOOLS_COMMON_H_
+
+#include <stdio.h>
+
+#include "config/aom_config.h"
+
+#include "aom/aom_codec.h"
+#include "aom/aom_image.h"
+#include "aom/aom_integer.h"
+#include "aom_ports/mem.h"
+#include "aom_ports/msvc.h"
+
+#if CONFIG_AV1_ENCODER
+#include "common/y4minput.h"
+#endif
+
+#if defined(_MSC_VER)
+/* MSVS uses _f{seek,tell}i64. */
+#define fseeko _fseeki64
+#define ftello _ftelli64
+typedef int64_t FileOffset;
+#elif defined(_WIN32)
+#include <sys/types.h> /* NOLINT*/
+/* MinGW uses f{seek,tell}o64 for large files. */
+#define fseeko fseeko64
+#define ftello ftello64
+typedef off64_t FileOffset;
+#elif CONFIG_OS_SUPPORT
+#include <sys/types.h> /* NOLINT*/
+typedef off_t FileOffset;
+/* Use 32-bit file operations in WebM file format when building ARM
+ * executables (.axf) with RVCT. */
+#else
+#define fseeko fseek
+#define ftello ftell
+typedef long FileOffset; /* NOLINT */
+#endif /* CONFIG_OS_SUPPORT */
+
+#if CONFIG_OS_SUPPORT
+#if defined(_MSC_VER)
+#include <io.h> /* NOLINT */
+#define isatty _isatty
+#define fileno _fileno
+#else
+#include <unistd.h> /* NOLINT */
+#endif              /* _MSC_VER */
+#endif              /* CONFIG_OS_SUPPORT */
+
+#define LITERALU64(hi, lo) ((((uint64_t)hi) << 32) | lo)
+
+#ifndef PATH_MAX
+#define PATH_MAX 512
+#endif
+
+#define IVF_FRAME_HDR_SZ (4 + 8) /* 4 byte size + 8 byte timestamp */
+#define IVF_FILE_HDR_SZ 32
+
+#define RAW_FRAME_HDR_SZ sizeof(uint32_t)
+
+#define AV1_FOURCC 0x31305641
+
+enum VideoFileType { FILE_TYPE_OBU, FILE_TYPE_RAW, FILE_TYPE_IVF, FILE_TYPE_Y4M, FILE_TYPE_WEBM, FILE_TYPE_MP4 };
+
+// Used in lightfield example.
+enum {
+  YUV1D,  // 1D tile output for conformance test.
+  YUV,    // Tile output in YUV format.
+  NV12,   // Tile output in NV12 format.
+} UENUM1BYTE(OUTPUT_FORMAT);
+
+// The fourcc for large_scale_tile encoding is "LSTC".
+#define LST_FOURCC 0x4354534c
+
+struct FileTypeDetectionBuffer {
+  char buf[4];
+  size_t buf_read;
+  size_t position;
+};
+
+struct AvxRational {
+  int numerator;
+  int denominator;
+};
+
+struct AvxInputContext {
+  const char *filename;
+  FILE *file;
+  int64_t length;
+  struct FileTypeDetectionBuffer detect;
+  enum VideoFileType file_type;
+  uint32_t width;
+  uint32_t height;
+  struct AvxRational pixel_aspect_ratio;
+  aom_img_fmt_t fmt;
+  aom_bit_depth_t bit_depth;
+  int only_i420;
+  uint32_t fourcc;
+  struct AvxRational framerate;
+#if CONFIG_AV1_ENCODER
+  y4m_input y4m;
+#endif
+};
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if defined(__GNUC__)
+#define AOM_NO_RETURN __attribute__((noreturn))
+#else
+#define AOM_NO_RETURN
+#endif
+
+/* Sets a stdio stream into binary mode */
+FILE *set_binary_mode(FILE *stream);
+
+void die(const char *fmt, ...) AOM_NO_RETURN;
+void fatal(const char *fmt, ...) AOM_NO_RETURN;
+void warn(const char *fmt, ...);
+
+void die_codec(aom_codec_ctx_t *ctx, const char *s) AOM_NO_RETURN;
+
+/* The tool including this file must define usage_exit() */
+void usage_exit(void) AOM_NO_RETURN;
+
+#undef AOM_NO_RETURN
+
+int read_yuv_frame(struct AvxInputContext *input_ctx, aom_image_t *yuv_frame);
+
+typedef struct AvxInterface {
+  const char *const name;
+  const uint32_t fourcc;
+  aom_codec_iface_t *(*const codec_interface)();
+} AvxInterface;
+
+int get_aom_encoder_count(void);
+const AvxInterface *get_aom_encoder_by_index(int i);
+const AvxInterface *get_aom_encoder_by_name(const char *name);
+const AvxInterface *get_aom_lst_encoder(void);
+
+int get_aom_decoder_count(void);
+const AvxInterface *get_aom_decoder_by_index(int i);
+const AvxInterface *get_aom_decoder_by_name(const char *name);
+const AvxInterface *get_aom_decoder_by_fourcc(uint32_t fourcc);
+
+void aom_img_write(const aom_image_t *img, FILE *file);
+int aom_img_read(aom_image_t *img, FILE *file);
+
+double sse_to_psnr(double samples, double peak, double mse);
+void aom_img_upshift(aom_image_t *dst, const aom_image_t *src, int input_shift);
+void aom_img_downshift(aom_image_t *dst, const aom_image_t *src, int down_shift);
+void aom_shift_img(unsigned int output_bit_depth, aom_image_t **img_ptr, aom_image_t **img_shifted_ptr);
+void aom_img_truncate_16_to_8(aom_image_t *dst, const aom_image_t *src);
+
+// Output in NV12 format.
+void aom_img_write_nv12(const aom_image_t *img, FILE *file);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif  // AOM_COMMON_TOOLS_COMMON_H_

diff --git a/testing/common/utils.h b/testing/common/utils.h
new file mode 100644
index 0000000..367cc6b
--- /dev/null
+++ b/testing/common/utils.h

@@ -0,0 +1,48 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+#include <stdint.h>
+#include <assert.h>
+
+#define timers_number 10
+class StreamTimer {
+ public:
+  StreamTimer() : frame_no_(0) { QueryPerformanceFrequency(&frequency_); }
+
+  uint64_t GetCurrent(int idx, uint64_t* duration) {
+    assert(idx >= 0 && idx < timers_number);
+    if (st2_[idx].QuadPart == 0) {
+      QueryPerformanceCounter(&st2_[idx]);
+      last2_[idx] = st2_[idx];
+      *duration = 0;
+      return 0;
+    }
+    LARGE_INTEGER cur;
+    QueryPerformanceCounter(&cur);
+    *duration = (cur.QuadPart - last2_[idx].QuadPart) * 1000000 / frequency_.QuadPart;
+    uint64_t ct = (cur.QuadPart - st2_[idx].QuadPart) * 1000000 / frequency_.QuadPart;
+    last2_[idx] = cur;
+    return ct;
+  }
+
+ private:
+  uint32_t frame_no_ = 0;
+
+  LARGE_INTEGER st2_[timers_number] = {};
+  LARGE_INTEGER last2_[timers_number] = {};
+  LARGE_INTEGER frequency_;
+};
\ No newline at end of file

diff --git a/testing/common/webmdec.cc b/testing/common/webmdec.cc
new file mode 100644
index 0000000..e97ce81
--- /dev/null
+++ b/testing/common/webmdec.cc

@@ -0,0 +1,213 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "webmdec.h"
+
+#include <cassert>
+#include <cstring>
+#include <cstdio>
+
+#include "mkvparser/mkvparser.h"
+#include "mkvparser/mkvreader.h"
+
+namespace {
+
+void reset(struct WebmInputContext *const webm_ctx) {
+  if (webm_ctx->reader != NULL) {
+    mkvparser::MkvReader *const reader = reinterpret_cast<mkvparser::MkvReader *>(webm_ctx->reader);
+    delete reader;
+  }
+  if (webm_ctx->segment != NULL) {
+    mkvparser::Segment *const segment = reinterpret_cast<mkvparser::Segment *>(webm_ctx->segment);
+    delete segment;
+  }
+  if (webm_ctx->buffer != NULL) {
+    delete[] webm_ctx->buffer;
+  }
+  webm_ctx->reader = NULL;
+  webm_ctx->segment = NULL;
+  webm_ctx->buffer = NULL;
+  webm_ctx->cluster = NULL;
+  webm_ctx->block_entry = NULL;
+  webm_ctx->block = NULL;
+  webm_ctx->block_frame_index = 0;
+  webm_ctx->video_track_index = 0;
+  webm_ctx->timestamp_ns = 0;
+  webm_ctx->is_key_frame = false;
+}
+
+void get_first_cluster(struct WebmInputContext *const webm_ctx) {
+  mkvparser::Segment *const segment = reinterpret_cast<mkvparser::Segment *>(webm_ctx->segment);
+  const mkvparser::Cluster *const cluster = segment->GetFirst();
+  webm_ctx->cluster = cluster;
+}
+
+void rewind_and_reset(struct WebmInputContext *const webm_ctx, struct AvxInputContext *const aom_ctx) {
+  rewind(aom_ctx->file);
+  reset(webm_ctx);
+}
+
+}  // namespace
+
+int file_is_webm(struct WebmInputContext *webm_ctx, struct AvxInputContext *aom_ctx) {
+  mkvparser::MkvReader *const reader = new mkvparser::MkvReader(aom_ctx->file);
+  webm_ctx->reader = reader;
+  webm_ctx->reached_eos = 0;
+
+  mkvparser::EBMLHeader header;
+  long long pos = 0;
+  if (header.Parse(reader, pos) < 0) {
+    rewind_and_reset(webm_ctx, aom_ctx);
+    return 0;
+  }
+
+  mkvparser::Segment *segment;
+  if (mkvparser::Segment::CreateInstance(reader, pos, segment)) {
+    rewind_and_reset(webm_ctx, aom_ctx);
+    return 0;
+  }
+  webm_ctx->segment = segment;
+  if (segment->Load() < 0) {
+    rewind_and_reset(webm_ctx, aom_ctx);
+    return 0;
+  }
+
+  const mkvparser::Tracks *const tracks = segment->GetTracks();
+  const mkvparser::VideoTrack *video_track = NULL;
+  for (unsigned long i = 0; i < tracks->GetTracksCount(); ++i) {
+    const mkvparser::Track *const track = tracks->GetTrackByIndex(i);
+    if (track->GetType() == mkvparser::Track::kVideo) {
+      video_track = static_cast<const mkvparser::VideoTrack *>(track);
+      webm_ctx->video_track_index = static_cast<int>(track->GetNumber());
+      break;
+    }
+  }
+
+  if (video_track == NULL || video_track->GetCodecId() == NULL) {
+    rewind_and_reset(webm_ctx, aom_ctx);
+    return 0;
+  }
+
+  if (!strncmp(video_track->GetCodecId(), "V_AV1", 5)) {
+    aom_ctx->fourcc = AV1_FOURCC;
+  } else {
+    rewind_and_reset(webm_ctx, aom_ctx);
+    return 0;
+  }
+
+  aom_ctx->framerate.denominator = 0;
+  aom_ctx->framerate.numerator = 0;
+  aom_ctx->width = static_cast<uint32_t>(video_track->GetWidth());
+  aom_ctx->height = static_cast<uint32_t>(video_track->GetHeight());
+
+  get_first_cluster(webm_ctx);
+
+  return 1;
+}
+
+int webm_read_frame(struct WebmInputContext *webm_ctx, uint8_t **buffer, size_t *bytes_read, size_t *buffer_size) {
+  assert(webm_ctx->buffer == *buffer);
+  // This check is needed for frame parallel decoding, in which case this
+  // function could be called even after it has reached end of input stream.
+  if (webm_ctx->reached_eos) {
+    return 1;
+  }
+  mkvparser::Segment *const segment = reinterpret_cast<mkvparser::Segment *>(webm_ctx->segment);
+  const mkvparser::Cluster *cluster = reinterpret_cast<const mkvparser::Cluster *>(webm_ctx->cluster);
+  const mkvparser::Block *block = reinterpret_cast<const mkvparser::Block *>(webm_ctx->block);
+  const mkvparser::BlockEntry *block_entry = reinterpret_cast<const mkvparser::BlockEntry *>(webm_ctx->block_entry);
+  bool block_entry_eos = false;
+  do {
+    long status = 0;
+    bool get_new_block = false;
+    if (block_entry == NULL && !block_entry_eos) {
+      status = cluster->GetFirst(block_entry);
+      get_new_block = true;
+    } else if (block_entry_eos || block_entry->EOS()) {
+      cluster = segment->GetNext(cluster);
+      if (cluster == NULL || cluster->EOS()) {
+        *bytes_read = 0;
+        webm_ctx->reached_eos = 1;
+        return 1;
+      }
+      status = cluster->GetFirst(block_entry);
+      block_entry_eos = false;
+      get_new_block = true;
+    } else if (block == NULL || webm_ctx->block_frame_index == block->GetFrameCount() ||
+               block->GetTrackNumber() != webm_ctx->video_track_index) {
+      status = cluster->GetNext(block_entry, block_entry);
+      if (block_entry == NULL || block_entry->EOS()) {
+        block_entry_eos = true;
+        continue;
+      }
+      get_new_block = true;
+    }
+    if (status || block_entry == NULL) {
+      return -1;
+    }
+    if (get_new_block) {
+      block = block_entry->GetBlock();
+      if (block == NULL) return -1;
+      webm_ctx->block_frame_index = 0;
+    }
+  } while (block_entry_eos || block->GetTrackNumber() != webm_ctx->video_track_index);
+
+  webm_ctx->cluster = cluster;
+  webm_ctx->block_entry = block_entry;
+  webm_ctx->block = block;
+
+  const mkvparser::Block::Frame &frame = block->GetFrame(webm_ctx->block_frame_index);
+  ++webm_ctx->block_frame_index;
+  if (frame.len > static_cast<long>(*buffer_size)) {
+    delete[] * buffer;
+    *buffer = new uint8_t[frame.len];
+    webm_ctx->buffer = *buffer;
+    if (*buffer == NULL) {
+      return -1;
+    }
+    *buffer_size = frame.len;
+  }
+  *bytes_read = frame.len;
+  webm_ctx->timestamp_ns = block->GetTime(cluster);
+  webm_ctx->is_key_frame = block->IsKey();
+
+  mkvparser::MkvReader *const reader = reinterpret_cast<mkvparser::MkvReader *>(webm_ctx->reader);
+  return frame.Read(reader, *buffer) ? -1 : 0;
+}
+
+int webm_guess_framerate(struct WebmInputContext *webm_ctx, struct AvxInputContext *aom_ctx) {
+  uint32_t i = 0;
+  uint8_t *buffer = NULL;
+  size_t buffer_size = 0;
+  size_t bytes_read = 0;
+  assert(webm_ctx->buffer == NULL);
+  while (webm_ctx->timestamp_ns < 1000000000 && i < 50) {
+    if (webm_read_frame(webm_ctx, &buffer, &bytes_read, &buffer_size)) {
+      break;
+    }
+    ++i;
+  }
+  aom_ctx->framerate.numerator = (i - 1) * 1000000;
+  aom_ctx->framerate.denominator = static_cast<int>(webm_ctx->timestamp_ns / 1000);
+  delete[] buffer;
+  webm_ctx->buffer = NULL;
+
+  get_first_cluster(webm_ctx);
+  webm_ctx->block = NULL;
+  webm_ctx->block_entry = NULL;
+  webm_ctx->block_frame_index = 0;
+  webm_ctx->timestamp_ns = 0;
+  webm_ctx->reached_eos = 0;
+
+  return 0;
+}
+
+void webm_free(struct WebmInputContext *webm_ctx) { reset(webm_ctx); }

diff --git a/testing/common/webmdec.h b/testing/common/webmdec.h
new file mode 100644
index 0000000..2848fdf
--- /dev/null
+++ b/testing/common/webmdec.h

@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#ifndef AOM_COMMON_WEBMDEC_H_
+#define AOM_COMMON_WEBMDEC_H_
+
+#include "tools_common.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct AvxInputContext;
+
+struct WebmInputContext {
+  void *reader;
+  void *segment;
+  uint8_t *buffer;
+  const void *cluster;
+  const void *block_entry;
+  const void *block;
+  int block_frame_index;
+  int video_track_index;
+  uint64_t timestamp_ns;
+  int is_key_frame;
+  int reached_eos;
+};
+
+// Checks if the input is a WebM file. If so, initializes WebMInputContext so
+// that webm_read_frame can be called to retrieve a video frame.
+// Returns 1 on success and 0 on failure or input is not WebM file.
+// TODO(vigneshv): Refactor this function into two smaller functions specific
+// to their task.
+int file_is_webm(struct WebmInputContext *webm_ctx, struct AvxInputContext *aom_ctx);
+
+// Reads a WebM Video Frame. Memory for the buffer is created, owned and managed
+// by this function. For the first call, |buffer| should be NULL and
+// |*buffer_size| should be 0. Once all the frames are read and used,
+// webm_free() should be called, otherwise there will be a leak.
+// Parameters:
+//      webm_ctx - WebmInputContext object
+//      buffer - pointer where the frame data will be filled.
+//      bytes_read - pointer to bytes read.
+//      buffer_size - pointer to buffer size.
+// Return values:
+//      0 - Success
+//      1 - End of Stream
+//     -1 - Error
+int webm_read_frame(struct WebmInputContext *webm_ctx, uint8_t **buffer, size_t *bytes_read, size_t *buffer_size);
+
+// Guesses the frame rate of the input file based on the container timestamps.
+int webm_guess_framerate(struct WebmInputContext *webm_ctx, struct AvxInputContext *aom_ctx);
+
+// Resets the WebMInputContext.
+void webm_free(struct WebmInputContext *webm_ctx);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif
+
+#endif  // AOM_COMMON_WEBMDEC_H_

diff --git a/testing/player_common/cmd_parser.cpp b/testing/player_common/cmd_parser.cpp
new file mode 100644
index 0000000..7f765b9
--- /dev/null
+++ b/testing/player_common/cmd_parser.cpp

@@ -0,0 +1,192 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "cmd_parser.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <map>
+
+int CParameters::Parse(one_task& args, std::string& Err) {
+  const ParamStruct avaiting[] = {
+      {"i", atString, anAtLeastOne, "source file path", "", &params_.source_file_},
+      {"o", atString, anOptional, "target YUV file path", "output", &params_.target_file_},
+      {"dir", atString, anAtLeastOne, "source directory", "", &params_.source_dir_},
+      {"t", atInt, anOptional, "threads", "", &params_.threads_},
+      {"loop", atInt, anOptional, "loops", "", &params_.loops_},
+      {"limit", atInt, anOptional, "decoding frame limit", "", &params_.limit_},
+      {"progress", atSingle, anOptional, "show decoding time for each frame", "", &params_.progress_},
+      {"save_md5", atString, anOptional, "Save md5 for each frame", "", &params_.save_md5_},
+      {"md5_frame_check", atSingle, anOptional, "Calculate md5 for each frame & compare it with md5 file", "",
+       &params_.md5_frame_check_},
+      {"start", atInt, anOptional, "Start from frame No", "", &params_.start_frame_},
+      {"frate", atInt, anOptional,
+       "Frame rate. 0 - webm header framerate(for webm only), -1 - maximum frate, default 0", "", &params_.frate_},
+      {"dropinfo", atSingle, anOptional, "Show all drops", "", &params_.show_drops_},
+      {"queue_size", atInt, anOptional, "Player frame queue size. Default 16", "", &params_.player_queue_size_},
+      {"conformance", atSingle, anOptional, "Do conformance test for directory", "", &params_.conformance_},
+      {"try10x3", atSingle, anOptional, "Use 10_10_10_2 display mode. Default false", "", &params_.hdr10x3_},
+      {"r", atSingle, anOptional, "Recreate decoder for every file", "", &params_.recreate_}};
+  params_.command_line_ = args.command_line_;
+  const int list_length = sizeof(avaiting) / sizeof(ParamStruct);
+  bool has_value[list_length] = {false};
+  Err = "";
+  for (std::vector<std::string>::iterator iter = args.begin(); iter != args.end(); iter++) {
+    const char* cur = iter->c_str();
+    int par;
+    for (par = 0; par < list_length; par++) {
+      if (cur[0] != '-') {
+        int i = (int)(iter - args.begin());
+        GetHelp(i, iter->c_str(), Err, avaiting, list_length);
+        return i;
+      }
+      if (strcmp(&cur[1], avaiting[par].arg_name) == 0) {
+        switch (avaiting[par].type) {
+          case atString:
+            iter++;
+            *static_cast<std::string*>(avaiting[par].pValue) += *iter;
+            break;
+          case atSingle:
+            *static_cast<bool*>(avaiting[par].pValue) = true;
+            break;
+          case atInt:
+            iter++;
+            *static_cast<int*>(avaiting[par].pValue) = atoi(iter->c_str());
+            break;
+        }
+        has_value[par] = true;
+        break;
+      }
+    }
+    if (par == list_length) {
+      int i = (int)(iter - args.begin());
+      GetHelp(i, iter->c_str(), Err, avaiting, list_length);
+      return i;
+    }
+  }
+  int oneof = 0;
+  std::vector<int> idxs;
+  std::map<std::string, int> exclusives;
+  for (int i = 0; i < list_length; i++) {
+    if (avaiting[i].need == anMandatory && !has_value[i]) {
+      Err += "You MUST specify arg ";
+      Err += avaiting[i].arg_name;
+      Err += " value.\nUse \n";
+      GetHelp(0, 0, Err, avaiting, list_length);
+      return -1;
+    } else if (avaiting[i].need == anAtLeastOne) {
+      idxs.push_back(i);
+      if (has_value[i]) oneof++;
+    }
+    if (strlen(avaiting[i].exclusive) && has_value[i]) {
+      std::map<std::string, int>::iterator iter = exclusives.find(avaiting[i].exclusive);
+      if (iter != exclusives.end()) {
+        Err += "You can't use following args together: ";
+        Err += "\"";
+        Err += avaiting[iter->second].arg_name;
+        Err += "\" ";
+        Err += "\"";
+        Err += avaiting[i].arg_name;
+        Err += "\"";
+        return -1;
+      }
+      exclusives.insert(std::pair<std::string, int>(avaiting[i].exclusive, i));
+    }
+  }
+  if (!idxs.empty() && !oneof) {
+    Err += "You MUST specify at least one of the following args: ";
+    for (std::vector<int>::iterator iter = idxs.begin(); iter < idxs.end(); iter++) {
+      Err += avaiting[*iter].arg_name;
+      Err += ", ";
+    }
+    Err += "\nUse \n";
+    GetHelp(0, 0, Err, avaiting, list_length);
+    return -1;
+  }
+
+  params_.isDirectory_ = (params_.source_file_.empty() && !params_.source_dir_.empty());
+  params_.isSubList_ = (params_.source_file_.empty() && params_.source_dir_.empty() && !params_.view_list_.empty());
+  params_.needReadBack_ = !params_.target_file_.empty() || params_.md5_ || params_.md5_frame_check_ ||
+                          !params_.save_md5_.empty() || params_.conformance_ || params_.noninterop_;
+  if (!params_.source_file_.empty()) params_.source_file_ = read_root_path_ + params_.source_file_;
+  if (!params_.target_file_.empty()) params_.target_file_ = write_root_path_ + params_.target_file_;
+  if (!params_.source_dir_.empty()) params_.source_dir_ = read_root_path_ + params_.source_dir_;
+  if (!params_.save_md5_.empty()) params_.save_md5_ = write_root_path_ + params_.save_md5_;
+  if (params_.conformance_) {
+    params_.frate_ = -1;
+    params_.md5_ = true;
+  }
+  return 0;
+}
+
+void CParameters::Reset() {
+  params_.source_file_.clear();
+  params_.source_dir_.clear();
+  params_.bad_conform_dir_.clear();
+  params_.view_list_.clear();
+  params_.use_log_.clear();
+  params_.save_wrong_yuv_.clear();
+  params_.save_md5_.clear();
+  params_.isDirectory_ = false;
+  params_.isSubList_ = false;
+  params_.showIt_ = false;
+  params_.threads_ = 1;
+  params_.loops_ = 1;
+  params_.limit_ = -1;
+  params_.progress_ = false;
+  params_.start_frame_ = 0;
+  params_.frate_ = 0;
+  params_.needReadBack_ = false;
+  params_.md5_ = false;
+  params_.md5_frame_check_ = false;
+  params_.scheme_ = 0;
+  params_.player_queue_size_ = PLAYER_QUEUE_SIZE;
+  params_.extra_fb_num_ = 0;
+  params_.dec_task_queue_depth_ = 6;
+  params_.hdr10x3_ = 0;
+  params_.autoSyncAu_ = false;
+  params_.nonwait_sync_ = false;
+  params_.rawDecode_ = false;
+  params_.show_drops_ = false;
+  params_.conformance_ = false;
+  params_.noninterop_ = false;
+  params_.recreate_ = false;
+}
+
+void CParameters::GetHelp(int pos, const char* arg, std::string& errMsg, const ParamStruct* arr, int size) {
+  if (arg) {
+    char ss[128];
+    sprintf(ss, "Invalid command string.\nError in arg %d (%s).\nUse \n", pos, arg);
+    errMsg += ss;
+  }
+  errMsg += "{-i <source file path> | -dir <source directory>}\n";
+  for (int i = 2; i < size; i++) {
+    const ParamStruct& item = arr[i];
+    errMsg += " [-";
+    errMsg += item.arg_name;
+    if (item.type != atSingle) {
+      errMsg += " <";
+      errMsg += item.descr;
+      errMsg += ">";
+    }
+    errMsg += "]";
+    if (item.type == atSingle) {
+      errMsg += "\t";
+      errMsg += item.descr;
+    }
+    errMsg += "\n";
+  }
+  errMsg += "\n";
+}

diff --git a/testing/player_common/cmd_parser.h b/testing/player_common/cmd_parser.h
new file mode 100644
index 0000000..8ed4fad
--- /dev/null
+++ b/testing/player_common/cmd_parser.h

@@ -0,0 +1,99 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+#ifndef _CMD_PARSER_H_
+#define _CMD_PARSER_H_
+
+#include <chrono>
+#include <thread>
+#include <string>
+#include <vector>
+
+#define PLAYER_QUEUE_SIZE 16
+
+struct one_task : public std::vector<std::string> {
+  std::string command_line_;
+  one_task(std::string& command_line) : command_line_(command_line) {}
+};
+
+class CParameters {
+ public:
+  struct Parameters {
+    std::string command_line_;
+    std::string source_file_;
+    std::string source_dir_;
+    std::string bad_conform_dir_;
+    std::string bad_conform_file_;
+    std::string view_list_;
+    std::string target_file_;
+    std::string use_log_;
+    std::string save_wrong_yuv_;
+    std::string save_md5_;
+    bool isDirectory_;
+    bool isSubList_;
+    bool showIt_;
+    bool rawDecode_;
+    int threads_;
+    int loops_;
+    int limit_;
+    bool progress_;
+    int start_frame_;
+    int frate_;
+    bool needReadBack_;
+    bool md5_;
+    bool md5_frame_check_;
+    int scheme_;
+    int player_queue_size_;
+    int extra_fb_num_;
+    int dec_task_queue_depth_;
+    bool hdr10x3_;
+    bool autoSyncAu_;
+    bool nonwait_sync_;
+    bool show_drops_;
+    bool conformance_;
+    bool noninterop_;
+    bool recreate_;
+  };
+  enum arg_type { atSingle, atInt, atString };
+  enum arg_need { anOptional, anMandatory, anAtLeastOne };
+
+  struct ParamStruct {
+    const char* arg_name;
+    arg_type type;
+    arg_need need;
+    const char* descr;
+    const char* exclusive;
+    void* pValue;
+  };
+
+  CParameters(const char* read_root = "", const char* write_root = "")
+      : read_root_path_(read_root), write_root_path_(write_root) {
+    Reset();
+  }
+  int Parse(one_task& args, std::string& Err);
+  void Reset();
+  void GetHelp(int pos, const char* arg, std::string& errMsg, const ParamStruct* arr, int size);
+  const Parameters& GetParams() const { return params_; }
+  Parameters* GetParamsPtr() { return &params_; }
+
+ protected:
+  Parameters params_;
+  std::string read_root_path_;
+  std::string write_root_path_;
+};
+
+#endif  //_CMD_PARSER_H_
\ No newline at end of file

diff --git a/testing/player_common/decoder.cpp b/testing/player_common/decoder.cpp
new file mode 100644
index 0000000..271287f
--- /dev/null
+++ b/testing/player_common/decoder.cpp

@@ -0,0 +1,676 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "decoder.h"
+#include "decoder_creator.h"
+
+#define PLANE_WIDTH_ALIGMENT 256
+#define PLANE_WIDTH_ALIGMENT1 (PLANE_WIDTH_ALIGMENT - 1)
+
+using sys_clock = std::chrono::system_clock;
+using time_point = sys_clock::time_point;
+// using duration_uint64 = std::chrono::duration<uint64_t, std::micro>;
+
+static const int direct_mem_align = 16 * 1024;
+
+char* pdummy = 0;
+int AV1Decoder::DecodeLoop() {
+  char dummy;
+  pdummy = &dummy;
+  is_running_ = 1;
+  for (int i = 0; i < params_.loops_; i++) {
+    XB_LOGI << " ";
+    XB_LOGI << "start loop " << i << " for " << params_.source_file_;
+    DecodeOne();
+  }
+  XB_LOGI << " ";
+  XB_LOGI << "All works're done. Bye";
+  is_running_ = 0;
+  return 0;
+}
+
+int BufferLocker::GetBufferInt(size_t buf_size, uint16_t w, uint16_t h, frame_buffer_type fb_type,
+                               av1_decoded_frame_buffer_t* fb) {
+  uint32_t flags = params_->needReadBack_ ? (params_->noninterop_ ? CBUFREADBACK : CBUFALL)
+                                          : CBUFTEXTURE;
+  std::shared_ptr<OutBufferWrapper> buffer = buffers_pool_->GetFreeBuffer(w, h, fb_type, buf_size, flags);
+  if (!buffer) return -1;
+  OutBuffer* buf = buffer->Buffer();
+  fb->dx12_buffer = buf->d3d12buffer.Get();
+  fb->buffer_host_ptr = buf->pMem;
+  fb->dx12_texture[0] = buf->d3d12textures[0].Get();
+  fb->dx12_texture[1] = buf->d3d12textures[1].Get();
+  fb->dx12_texture[2] = buf->d3d12textures[2].Get();
+  fb->priv = buf;
+  return 0;
+}
+int BufferLocker::ReleaseBufferInt(void* buffer_priv) {
+  buffers_pool_->ReleaseRentedBuffer(buffer_priv);
+  return 0;
+}
+
+int AV1Decoder::DecodeOne() {
+  // int res;
+  int dec_flags = 0;
+  StreamReader reader(srmodeAsync);
+  if (params_.source_file_.empty() || params_.source_file_.length() < 4) {
+    XB_LOGE << "Invalid parameters. File path is empty";
+    return -1;
+  }
+  std::string source_path(params_.source_file_);
+  if (reader.Start(source_path)) {
+    XB_LOGE << "Unable to start file reader for " << source_path;
+    return -1;
+  }
+  ReadBufferWrapper buffer;
+
+  // skip a few first frames if required
+  for (int i = 0; i < params_.start_frame_; i++) {
+    buffer = reader.GetData();
+    if (!buffer) {
+      XB_LOGE << "end of stream reached before start decoding";
+      return 0;
+    }
+  }
+  int cur_frame = params_.start_frame_;
+  if (initialized_ && need_reinit_) {
+    XB_LOGI << "Destroy decoder";
+    aom_codec_destroy(&decctx_);
+    memset(&decctx_, 0, sizeof(decctx_));
+    initialized_ = 0;
+  }
+  if (params_.recreate_ || !initialized_) {
+    XB_LOGI << "(Re)create decoder";
+    try {
+      storage_.reset();
+      host_mem_.clear();
+      host_mem_.shrink_to_fit();
+      aom_codec_dec_cfg_t cfg;
+      memset(&cfg, 0, sizeof(aom_codec_dec_cfg_t));
+      cfg.width = 4096;
+      cfg.height = 2176;
+      cfg.bitdepth = 10;
+      cfg.threads = params_.threads_;
+      cfg.cfg.ext_partition = 1;  //?
+      cfg.allow_lowbitdepth = CONFIG_LOWBITDEPTH;
+      cfg.out_buffers_cb.get_out_buffer_cb = GetBuffer;
+      cfg.out_buffers_cb.release_out_buffer_cb = ReleaseBuffer;
+#if CB_OUTPUT
+      cfg.out_buffers_cb.notify_frame_ready_cb = NotifyFrameReady;
+#endif
+
+      cfg.out_buffers_cb.out_buffers_priv = this;
+      cfg.host_size = 0;
+      cfg.tryHDR10x3 = params_.hdr10x3_;
+      av1_codec_query_memory_requirements(&cfg);
+      host_mem_.resize(cfg.host_size, 0);
+      cfg.host_memory = &host_mem_[0];  // onion_mem.Data();
+      cfg.host_size = host_mem_.size();
+      cfg.dx12device = dx_desources_.dx12_device.Get();
+      cfg.dx12command_queue = dx_desources_.dx12_command_queue.Get();
+      cfg.dxPsos = dx_desources_.dx12_psos;
+      if (!cfg.dxPsos) {
+        XB_LOGE << "Unable to create compute shaders";
+        return 1;
+      }
+
+      if (aom_codec_dec_init(&decctx_, aom_codec_av1_dx(), &cfg, dec_flags)) {
+        host_mem_.clear();
+        host_mem_.shrink_to_fit();
+        XB_LOGE << "Failed to initialize decoder: " << aom_codec_error(&decctx_);
+        throw 2;
+      }
+      storage_ = std::unique_ptr<OutBufferStorage>(
+          new OutBufferStorage(params_.player_queue_size_, dx_desources_.dx12_device.Get()));
+      initialized_ = 1;
+    } catch (int error) {
+      XB_LOGE << "Unable to create decoder. Error " << error;
+      return -1;
+    } catch (std::bad_alloc error) {
+      XB_LOGE << "Unable to create decoder. Error " << error.what();
+      return -1;
+    } catch (...) {
+      XB_LOGE << "Unable to create decoder. Unknown error";
+      return -1;
+    }
+  }
+  // skip non-key frames (if start_frame_ != 0)
+  int is_key = 0;
+  buffer.Reset();
+  aom_codec_stream_info_t si;
+  memset(&si, 0, sizeof(si));
+  while (1) {
+    buffer = reader.GetData();
+    if (!buffer) {
+      XB_LOGE << "End of Stream reached looking for key frame ";
+      return -1;
+    }
+    aom_codec_err_t err = aom_codec_peek_stream_info(aom_codec_av1_dx(), buffer->Data(), buffer->Size(), &si);
+    if (AOM_CODEC_OK != err) {
+      XB_LOGE << "Failed to get stream info";
+      return -1;
+    }
+    is_key = si.is_kf;
+    if (is_key) break;
+    cur_frame++;
+  }
+  if (!buffer) {
+    buffer = reader.GetData();
+    if (!buffer) {
+      XB_LOGE << "End of stream ";
+      return -1;
+    }
+  }
+  double framedur = 16667.0;
+  double orig_frate = reader.FrameRate();
+  if (params_.frate_ > 0)
+    framedur = (double)1000000.0 / (double)params_.frate_;
+  else if (params_.frate_ == 0)
+    framedur = 1000000.0 / orig_frate;
+  if (params_.frate_ < 0)
+    XB_LOGI << "Frame rate = unlimited (-1)"
+            << " (" << params_.threads_ << " threads, player_queue_size " << params_.player_queue_size_ << ")";
+  else
+    XB_LOGI << "Frame rate = " << (1000000.0 / framedur) << " (" << params_.threads_ << " threads, player_queue_size "
+            << params_.player_queue_size_ << ")";
+  need_preload_ = params_.frate_ >= 0;
+  preload_num_ = params_.player_queue_size_ < PRELOAD_SIZE ? params_.player_queue_size_ : PRELOAD_SIZE;
+
+  output_queue_ = std::unique_ptr<BufferLocker>(new BufferLocker(&params_, cur_frame, storage_));
+  if (display_) display_->AddSource(this);
+
+#if (ASYNC_OUTPUT)
+  StartFetch();
+#elif !CB_OUTPUT
+  int frame_out = 0;
+  aom_image_t* img;
+#endif  // ASYNC_OUTPUT
+
+  int frame_in = 0;
+  int cnt = 0;
+  int limit = params_.limit_;
+  int has_error = 0;
+  int at_least_one = 0;
+  while (buffer && limit-- && !(*stop_) && !output_queue_->hasFatalError_) {
+    XB_LOGD << cnt++ << " - " << buffer->Size();
+    uint64_t user_priv = static_cast<uint64_t>(frame_in++ * framedur);
+    if (aom_codec_decode(&decctx_, buffer->Data(), buffer->Size(), (void*)user_priv)) {
+      XB_LOGE << "aom_codec_decode error " << aom_codec_error(&decctx_);
+      has_error = 1;
+      output_queue_->hasFatalError_ = 1;
+      break;
+    }
+    at_least_one = 1;
+#if (!ASYNC_OUTPUT && !CB_OUTPUT)
+    aom_codec_iter_t iter = NULL;
+    while ((img = aom_codec_get_frame(&decctx_, &iter))) {
+      XB_LOGD << "got " << ++frame_out;
+      if (output_queue_->Push(img)) {
+        output_queue_->hasFatalError_ = 1;
+        break;
+      }
+    }
+#endif  // ASYNC_OUTPUT
+    buffer = reader.GetData();
+  }
+  // drain
+  aom_codec_decode(&decctx_, 0, 0, 0);
+#if (!ASYNC_OUTPUT && !CB_OUTPUT)
+  if (!output_queue_->hasFatalError_) {
+    aom_codec_iter_t iter = NULL;
+    while ((img = aom_codec_get_frame(&decctx_, &iter))) {
+      XB_LOGD << "got " << ++frame_out;
+      output_queue_->Push(img);
+    }
+  }
+  output_queue_->Push(0);
+#elif ASYNC_OUTPUT
+  StopFetch();
+#endif  // ASYNC_OUTPUT
+  need_preload_ = 0;
+  if (*stop_) {
+    if (display_) display_->RemoveSource(this);
+    output_queue_->Flush();
+    XB_LOGI << "Decoding process was terminated by user";
+  } else {
+    if (at_least_one) output_queue_->WaitLast();
+    if (display_) display_->RemoveSource(this);
+  }
+  if (has_error || params_.recreate_) {
+    XB_LOGI << "Destroy decoder";
+    aom_codec_destroy(&decctx_);
+    memset(&decctx_, 0, sizeof(decctx_));
+    initialized_ = 0;
+  }
+  output_queue_->Finalize();
+  uint32_t has_md5_errors = output_queue_->has_md5_errors_;
+  if (*stop_ == taskNext) *stop_ = TaskNone;
+  if (params_.conformance_) {
+    if (has_md5_errors) {
+      if (!params_.bad_conform_file_.empty()) {
+        rename(params_.source_file_.c_str(), params_.bad_conform_file_.c_str());
+        rename((params_.source_file_ + ".md5").c_str(), (params_.bad_conform_file_ + ".md5").c_str());
+      }
+    } else {
+      remove(params_.source_file_.c_str());
+      remove((params_.source_file_ + ".md5").c_str());
+    }
+  }
+
+  return 0;
+}
+
+int AV1Decoder::FetchLoop() {
+  aom_image_t* img;
+  aom_codec_iter_t iter = NULL;
+  while (!stop_fetch_) {
+    while ((img = aom_codec_get_frame(&decctx_, &iter))) {
+      output_queue_->Push(img);
+    }
+  }
+  while ((img = aom_codec_get_frame(&decctx_, &iter))) {
+    output_queue_->Push(img);
+  }
+  output_queue_->Push(0);
+  return 0;
+}
+
+void AV1Decoder::FlushOutput() {
+  if (output_queue_.get()) {
+    output_queue_->Flush();
+  }
+}
+
+std::shared_ptr<OutBufferWrapper> AV1Decoder::GetFrame(StreamStats* stats) {
+  if (!output_queue_.get()) return 0;
+  if (need_preload_) {
+    if (output_queue_->Size() < preload_num_ && output_queue_->hasFreeBuffers())
+      return 0;
+    else
+      need_preload_ = 0;
+  }
+
+  if (stats) {
+    stats->frate = output_queue_->FRate();
+    stats->dropped = output_queue_->Drops();
+    stats->fname = params_.source_file_;
+  }
+  return output_queue_->Pop();
+}
+
+void BufferLocker::Flush() {
+  while (!queue_.empty()) {
+    Pop();
+  }
+}
+
+int BufferLocker::Push(aom_image_t* img) {
+  if (!img) {
+    std::lock_guard<std::mutex> lock(cs_);
+    queue_.push(0);
+    return 0;
+  }
+  if (got_last_frame_) got_last_frame_ = 0;
+  if (img && (img->d_w != width_ || img->d_h != height_ || img->bit_depth != bpp_)) {
+    width_ = img->d_w;
+    height_ = img->d_h;
+    bpp_ = img->bit_depth;
+    XB_LOGI << "Got first frame (" << (uint64_t)img->user_priv << ") " << width_ << " x " << height_ << " x " << bpp_;
+  }
+  if (!params_->target_file_.empty() || params_->md5_frame_check_ || params_->md5_ || md5_saver_) {
+    if (!params_->target_file_.empty() && img && params_->save_wrong_yuv_.empty()) {
+      SaveToFile(img, false);
+    }
+    if (md5_saver_) {
+      int ret = md5_saver_->Put(img);
+      if (ret) md5_saver_.reset(0);
+    }
+    if ((params_->md5_frame_check_ || params_->md5_) && img) {
+      int err = md5_checker_->Put(img);
+      if (err && !params_->save_wrong_yuv_.empty()) {
+        SaveToFile(img, true);
+      }
+    }
+  }
+  if (params_->rawDecode_ || params_->progress_) {
+    uint64_t duration;
+    uint64_t stop_decode_time = timer_.GetCurrent(5, &duration);
+    if (frame_counter_) {
+      double current_frate = (double)frame_counter_ * 1000000.0 / (double)stop_decode_time;
+      XB_LOGI << "Current frate " << frame_counter_ << "  " << current_frate;
+      XB_LOGI << "## " << frame_counter_ << " " << duration << "(" << stop_decode_time << ")";
+    }
+    frame_counter_++;
+    if (params_->rawDecode_) return 0;
+  }
+  std::shared_ptr<OutBufferWrapper> buffer = buffers_pool_->ReleaseRentedBuffer(img->fb2_priv);  //!!!!
+  assert(buffer);
+  OutBuffer* buf = buffer->Buffer();
+  buf->frame_no_offset = start_frame_;
+  buf->frameWidth = img->w;
+  buf->frameHeight = img->h;
+  buf->renderWidth = img->d_w;
+  buf->renderHeight = img->d_h;
+  buf->fb_type = (img->bit_depth == 10) ? (img->is_hdr10x3 ? fbt10x3 : fbt10bit) : fbt8bit;
+  buf->frame_no = frame_no_++;
+  buf->pts = (uint64_t)img->user_priv;
+  memcpy(buffer->Buffer()->planes, img->planes, sizeof(img->planes));
+  memcpy(buffer->Buffer()->strides, img->stride, sizeof(img->stride));
+  std::lock_guard<std::mutex> lock(cs_);
+  buffer->SetShown(0);
+  queue_.push(buffer);
+  return 0;
+}
+
+int BufferLocker::SaveToFile(aom_image_t* img, bool one_frame) {
+  std::string target_file;
+  if (one_frame) {
+    size_t pos = params_->source_file_.rfind("/");
+    if (pos != std::string::npos) {
+      char ss[32];
+      sprintf_s(ss, 32, "%d", (int)(frame_no_ + start_frame_));
+      std::string fn = params_->source_file_.substr(pos);
+      target_file = params_->save_wrong_yuv_ + "/" + fn + "_" + ss + ".yuv";
+    }
+  } else
+    target_file = params_->target_file_;
+  if (target_file.empty()) {
+    XB_LOGE << "Target file path isn't defined";
+    return -1;
+  }
+  if (!out_file_) {
+    out_file_ = fopen(target_file.c_str(), "wb");
+    if (!out_file_) {
+      XB_LOGE << "Unable to open target file " << target_file;
+      return -1;
+    }
+  }
+  uint8_t unpack_buff[4096 * 4];
+  uint8_t* ptr = img->planes[0];
+  uint8_t* wr_ptr;
+  uint32_t width_in_byte = img->d_w * (img->bit_depth == 8 ? 1 : 2);
+  for (uint32_t i = 0; i < img->d_h; i++) {
+    if (img->is_hdr10x3) {
+      uint8_t* targ = unpack_buff;
+      Md5Checker::Unpack10x3(ptr, targ, img->d_w);
+      wr_ptr = targ;
+    } else
+      wr_ptr = ptr;
+    size_t written = fwrite(wr_ptr, sizeof(uint8_t), width_in_byte, out_file_);
+    if (written != width_in_byte) {
+      XB_LOGE << "Error in target file " << target_file;
+      return -1;
+    }
+    ptr += img->stride[0];
+  }
+  if (!img->monochrome) {
+    width_in_byte /= 2;
+    uint32_t uv_width = (img->d_w + 1) / 2;
+    uint32_t uv_width_in_byte = (img->d_w + 1) / 2 * (img->bit_depth == 8 ? 1 : 2);
+    uint32_t uv_height = (img->d_h + 1) / 2;
+    for (int k = 1; k < 3; k++) {
+      ptr = img->planes[k];
+      for (uint32_t i = 0; i < uv_height; i++) {
+        if (img->is_hdr10x3) {
+          uint8_t* targ = unpack_buff;
+          Md5Checker::Unpack10x3(ptr, targ, uv_width);
+          wr_ptr = targ;
+        } else
+          wr_ptr = ptr;
+        size_t written = fwrite(wr_ptr, sizeof(uint8_t), uv_width_in_byte, out_file_);
+        if (written != uv_width_in_byte) {
+          XB_LOGE << "Error in target file " << target_file;
+          return -1;
+        }
+        ptr += img->stride[k];
+      }
+    }
+  }
+  if (one_frame && out_file_) {
+    fclose(out_file_);
+    out_file_ = 0;
+    XB_LOGI << "Decoded frame No " << (frame_no_ + start_frame_) << " was saved to " << target_file;
+  }
+  return 0;
+}
+
+std::shared_ptr<OutBufferWrapper> BufferLocker::Pop() {
+  std::lock_guard<std::mutex> lock(cs_);
+  std::shared_ptr<OutBufferWrapper> ret(0);
+  if (!started_ && queue_.empty()) return ret;
+  started_ = 1;
+  int64_t delay = 0;
+  uint64_t duration;
+  uint64_t current = timer_.GetCurrent(0, &duration);
+  int finished = 0;
+  int local_drops = 0;
+  // clean dropped
+  if (params_->frate_ >= 0) {
+    while (!queue_.empty()) {
+      ret = queue_.front();
+      if (ret) {
+        uint64_t pts = ret->Buffer()->pts;
+        delay = current - pts;
+        if (delay < DISPLAY_MIN_TIME) {
+          XB_LOGD << "Too early " << ret->Buffer()->frame_no << " " << delay;
+          ret = 0;
+          break;
+        }
+        queue_.pop();
+        if (delay <= DROP_TIME) {
+          local_drops = 0;
+          break;
+        }
+        local_drops++;
+      } else {
+        finished = 1;
+        queue_.pop();
+      }
+    }
+  } else if (queue_.size() > 0) {
+    while (queue_.size() > 1) {
+      queue_.pop();
+    }
+    ret = queue_.front();
+    if (!ret) finished = 1;
+    queue_.pop();
+  } else {
+  }
+  if (ret) {
+    last_frame_ = ret;
+    int frame_no = last_frame_->Buffer()->frame_no;
+    received_ = frame_no + 1;
+    current_frate_ = (double)frame_no * 1000000.0 / (double)current;
+    frame_dropped_ += local_drops;
+    if (params_->show_drops_ && local_drops) {
+      XB_LOGW << ret->Buffer()->frame_no << " - drop " << frame_dropped_ << " " << (delay / 1000) << "ms";
+    }
+  }
+  if (finished) {
+    XB_LOGI << "The average fps is " << current_frate_ << " (" << frame_dropped_ << " drops) in " << received_
+            << " frames";
+  }
+
+  if (finished && queue_.empty()) {
+    got_last_frame_ = 1;
+    cv_.notify_one();
+  }
+  return last_frame_;
+}
+
+int Md5Checker::Initialize(const char* md5_file, bool for_stream, uint32_t start_frame) {
+  std::string md5file(md5_file);
+  md5file += ".md5";
+  if (ParseMd5File(md5file.c_str())) return -1;
+  current_ = start_frame;
+  for_stream_ = for_stream;
+  if (for_stream_) MD5Init(&md5_ctx_);
+  return 0;
+}
+
+int Md5Saver::Put(aom_image_t* img) {
+  char ss_md5[3];
+  std::string frame_md5;
+  if (!file_) {
+    file_ = fopen(md5_file_.c_str(), "w");
+    if (!file_) {
+      return -1;
+    }
+  }
+  MD5Init(&md5_ctx_);
+  uint8_t* ptr = img->planes[0];
+  uint32_t bpp = img->bit_depth == 8 ? 1 : 2;
+  uint32_t w_c = (img->d_w + 1) / 2;
+  uint32_t h_c = (img->d_h + 1) / 2;
+  for (uint32_t i = 0; i < img->d_h; i++) {
+    MD5Update(&md5_ctx_, ptr, img->d_w * bpp);
+    ptr += img->stride[0];
+  }
+  if (!img->monochrome) {
+    for (int k = 1; k < 3; k++) {
+      uint8_t* ptr = img->planes[k];
+      for (uint32_t i = 0; i < h_c; i++) {
+        MD5Update(&md5_ctx_, ptr, w_c * bpp);
+        ptr += img->stride[k];
+      }
+    }
+  }
+  MD5Final(md5_digest_, &md5_ctx_);
+  for (int i = 0; i < 16; ++i) {
+    sprintf_s(ss_md5, 3, "%02x", md5_digest_[i]);
+    frame_md5.append(ss_md5, 2);
+  }
+  fprintf(file_, "%s   %d\n", frame_md5.c_str(), current_);
+  current_++;
+  return 0;
+}
+
+void Md5Checker::Unpack10x3(uint8_t* ptr_src, uint8_t* ptr_trg, int width_in_pix) {
+  uint32_t triads = (width_in_pix + 5) / 6;
+  uint32_t* col_ptr = (uint32_t*)ptr_trg;
+  uint32_t* src_ptr = (uint32_t*)ptr_src;
+  for (uint32_t i = 0; i < triads; i++) {
+    uint32_t val = (src_ptr[0] & 0x000003FF) | ((src_ptr[0] & 0x000FFC00) << 6);
+    col_ptr[0] = val;
+    val = (src_ptr[0] & 0x3FF00000) >> 20 | ((src_ptr[1] & 0x000003FF) << 16);
+    col_ptr[1] = val;
+    val = (src_ptr[1] & 0x000FFC00) >> 10 | ((src_ptr[1] & 0x3FF00000) >> 4);
+    col_ptr[2] = val;
+
+    src_ptr += 2;
+    col_ptr += 3;
+  }
+}
+
+int Md5Checker::Put(aom_image_t* img) {
+  char ss_md5[3];
+  std::string frame_md5;
+  int error = 0;
+  uint8_t unpack_buff[4096 * 4];
+  if (img->is_hdr10x3 && img->d_w > 9182) return 1;
+  if (!for_stream_) MD5Init(&md5_ctx_);
+  uint8_t* ptr = img->planes[0];
+  uint32_t bpp = img->bit_depth == 8 ? 1 : 2;
+  uint32_t w_c = (img->d_w + 1) / 2;
+  uint32_t h_c = (img->d_h + 1) / 2;
+  for (uint32_t i = 0; i < img->d_h; i++) {
+    if (img->is_hdr10x3) {
+      uint8_t* targ = unpack_buff;
+      Unpack10x3(ptr, targ, img->d_w);
+      MD5Update(&md5_ctx_, targ, img->d_w * bpp);
+    } else {
+      MD5Update(&md5_ctx_, ptr, img->d_w * bpp);
+    }
+    ptr += img->stride[0];
+  }
+  if (!img->monochrome) {
+    for (int k = 1; k < 3; k++) {
+      uint8_t* ptr = img->planes[k];
+      for (uint32_t i = 0; i < h_c; i++) {
+        if (img->is_hdr10x3) {
+          uint8_t* targ = unpack_buff;
+          Unpack10x3(ptr, targ, w_c);
+          MD5Update(&md5_ctx_, targ, w_c * bpp);
+        } else {
+          MD5Update(&md5_ctx_, ptr, w_c * bpp);
+        }
+        ptr += img->stride[k];
+      }
+    }
+  }
+  if (!for_stream_) {
+    MD5Final(md5_digest_, &md5_ctx_);
+    for (int i = 0; i < 16; ++i) {
+      sprintf_s(ss_md5, 3, "%02x", md5_digest_[i]);
+      frame_md5.append(ss_md5, 2);
+    }
+    if (md5_strings_.empty()) {
+      XB_LOGE << "MD5 checker doesn't have referenced md5 values";
+    } else if (frame_md5.compare(md5_strings_[current_]) != 0) {
+      XB_LOGE << "MD5 checksum error in " << current_ << " frame. Avaiting " << md5_strings_[current_] << " In real "
+              << frame_md5;
+      hasErrors_++;
+      error = 1;
+    }
+  }
+  current_++;
+  return error;
+}
+void Md5Checker::Finalize() {
+  char ss_md5[3];
+  std::string frame_md5;
+
+  MD5Final(md5_digest_, &md5_ctx_);
+  for (int i = 0; i < 16; ++i) {
+    sprintf_s(ss_md5, 3, "%02x", md5_digest_[i]);
+    frame_md5.append(ss_md5, 2);
+  }
+  if (md5_strings_.empty()) {
+    XB_LOGE << "MD5 checker doesn't have referenced md5 values";
+  } else if (frame_md5.compare(md5_strings_[0]) != 0) {
+    hasErrors_++;
+  }
+}
+
+uint32_t Md5Checker::ShowResult(uint32_t fatal_error) {
+  if (for_stream_) Finalize();
+  if (hasErrors_) {
+    if (for_stream_)
+      XB_LOGE << "MD5 checksum doesn't match";
+    else
+      XB_LOGE << "MD5 checksum errors in " << hasErrors_ << " frames";
+  } else if (!fatal_error)
+    XB_LOGI << "MD5 is OK";
+  if (fatal_error) XB_LOGE << "Unable to decode whole stream die to fatal error";
+  return hasErrors_;
+}
+
+int Md5Checker::ParseMd5File(const char* md5_file) {
+  md5_strings_.clear();
+  std::ifstream input(md5_file);
+  if (input.is_open()) {
+    std::string line;
+
+    while (std::getline(input, line)) {
+      md5_strings_.push_back(line.substr(0, 32));
+    }
+  } else {
+    XB_LOGE << "Invalid md5 file path '" << md5_file << "'";
+    return -1;
+  }
+
+  return 0;
+}

diff --git a/testing/player_common/decoder.h b/testing/player_common/decoder.h
new file mode 100644
index 0000000..6393ca4
--- /dev/null
+++ b/testing/player_common/decoder.h

@@ -0,0 +1,286 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+
+#include <stdint.h>
+#include <memory>
+#include <string>
+#include <thread>
+#include <mutex>
+#include <queue>
+#include <assert.h>
+#include <fstream>
+
+#include <wrl.h>
+#include <wrl/client.h>
+#include <d3d11.h>
+
+#include "common/log.h"
+#include "cmd_parser.h"
+#include "common/utils.h"
+#include "common/stream_reader.h"
+#include "aom/aom_codec.h"
+#include "aom/aom_decoder.h"
+#include "aom/aomdx.h"
+#include "common/frame_outs.h"
+
+#include "common/md5_utils.h"
+
+#define DROP_TIME 8000
+#define DISPLAY_MIN_TIME (-10000)
+#define DEFAULT_FPS 60
+#define MAGIG_IDX 0xFFFF
+#define PRELOAD_SIZE 10
+#define MAX_PLAYER_QUEUE_SIZE 16
+
+#define STREAM_READER_CACHE_SIZE 20
+
+#define CB_OUTPUT 1
+#if CB_OUTPUT
+#define ASYNC_OUTPUT 0
+#else
+#define ASYNC_OUTPUT 1
+#endif
+
+// using namespace DirectX;
+using namespace Microsoft::WRL;
+
+class DisplayImpl;
+class AV1Decoder;
+interface DisplayCallbacks;
+
+#define AVAR_SIZE 8
+
+typedef struct d3d_resources {
+  ComPtr<ID3D12Device> dx12_device;
+  ComPtr<ID3D12CommandQueue> dx12_command_queue;
+  void* dx12_psos;
+  d3d_resources(d3d_resources& other) {
+    dx12_device = other.dx12_device;
+    dx12_command_queue = other.dx12_command_queue;
+    dx12_psos = other.dx12_psos;
+  };
+  d3d_resources(ID3D12Device* dx12_device, ID3D12CommandQueue* dx12_command_queue) {
+    this->dx12_device = dx12_device;
+    this->dx12_command_queue = dx12_command_queue;
+  }
+} d3d_resources;
+
+class Md5Checker {
+ public:
+  int Initialize(const char* md5_file, bool for_stream, uint32_t start_frame);
+  int Put(aom_image_t* img);
+  void Finalize();
+  uint32_t ShowResult(uint32_t fatal_error);
+  static void Unpack10x3(uint8_t* ptr_src, uint8_t* ptr_trg, int width_in_pix);
+
+ private:
+  int ParseMd5File(const char* md5_file);
+  std::string md5_file_;
+  std::vector<std::string> md5_strings_;
+  MD5Context md5_ctx_;
+  unsigned char md5_digest_[16];
+  uint32_t current_ = 0;
+  uint32_t hasErrors_ = 0;
+  bool for_stream_ = 0;
+  //  bool initialized_ = false;
+};
+
+class Md5Saver {
+ public:
+  Md5Saver(const char* md5_file, uint32_t start_frame) : md5_file_(md5_file), current_(start_frame) {}
+  ~Md5Saver() {
+    if (file_) fclose(file_);
+  }
+  int Put(aom_image_t* img);
+
+ private:
+  std::string md5_file_;
+  MD5Context md5_ctx_;
+  unsigned char md5_digest_[16];
+  uint32_t current_ = 0;
+  FILE* file_ = 0;
+};
+
+struct StreamStats {
+  double frate;
+  uint32_t dropped;
+  std::string fname;
+};
+
+class BufferLocker {
+ public:
+  BufferLocker(const CParameters::Parameters* params, uint32_t start_frame, std::shared_ptr<OutBufferStorage> storage)
+      : params_(params), start_frame_(start_frame), buffers_pool_(storage) {
+    if (params_->md5_frame_check_ || params_->md5_) {
+      md5_checker_.reset(new Md5Checker());
+      if (md5_checker_) {
+        md5_checker_->Initialize(params_->source_file_.c_str(), params_->md5_, start_frame);
+      }
+    }
+    if (!params_->save_md5_.empty()) {
+      md5_saver_.reset(new Md5Saver(params_->save_md5_.c_str(), start_frame));
+    }
+  }
+  ~BufferLocker() { Finalize(); }
+  bool IsEmpty() {
+    std::lock_guard<std::mutex> lock(cs_);
+    return queue_.empty();
+  }
+  void WaitEmpty() {
+    std::unique_lock<std::mutex> lock(cs_);
+    cv_.wait(lock, [this] { return !this->queue_.empty(); });
+  }
+  void WaitLast() {
+    std::unique_lock<std::mutex> lock(cs_);
+    cv_.wait(lock, [this] { return this->got_last_frame_; });
+  }
+
+  int Push(aom_image_t* img);
+  std::shared_ptr<OutBufferWrapper> Pop();
+  int Drops() const { return frame_dropped_; }
+  double FRate() { return current_frate_; }
+  size_t Size() {
+    std::lock_guard<std::mutex> lock(cs_);
+    return queue_.size();
+  }
+  int hasFreeBuffers() { return buffers_pool_->GetSize() > 0; }
+
+  int SaveToFile(aom_image_t* img, bool one_frame);
+  int GetBufferInt(size_t buf_size, uint16_t w, uint16_t h, frame_buffer_type fb_type, av1_decoded_frame_buffer_t* fb);
+  int ReleaseBufferInt(void* buffer_priv);
+  void Flush();
+  void Finalize() {
+    if (!finalized_) {
+      if (out_file_) fclose(out_file_);
+      if (md5_checker_) has_md5_errors_ = md5_checker_->ShowResult(hasFatalError_);
+      last_frame_.reset();
+      // buffers_pool_.reset();
+      finalized_ = true;
+      started_ = 0;
+    }
+  }
+
+ private:
+  std::queue<std::shared_ptr<OutBufferWrapper> > queue_;
+  std::shared_ptr<OutBufferWrapper> last_frame_;
+  std::unique_ptr<Md5Checker> md5_checker_;
+  std::unique_ptr<Md5Saver> md5_saver_;
+  std::mutex cs_;
+  StreamTimer timer_;
+  int frame_dropped_ = 0;
+  FILE* out_file_ = 0;
+  std::condition_variable cv_;
+  uint32_t received_ = 0;
+  double current_frate_ = 0.0;
+  std::shared_ptr<OutBufferStorage> buffers_pool_;
+  int frame_no_ = 0;
+  int frame_counter_ = 0;
+  int got_last_frame_ = 0;
+  const CParameters::Parameters* params_;
+  uint32_t hasFatalError_ = 0;
+  uint32_t width_ = 0;
+  uint32_t height_ = 0;
+  uint32_t bpp_ = 0;
+  uint32_t start_frame_ = 0;
+  ComPtr<ID3D12Device> dx12_device_;
+  bool has_md5_errors_ = false;
+  bool finalized_ = false;
+  int started_ = 0;
+  friend class AV1Decoder;
+};
+
+class AV1Decoder {
+ public:
+  AV1Decoder(d3d_resources dx_resources, DisplayCallbacks* display, uint32_t* stop = 0)
+      : dx_desources_(dx_resources), display_(display), stop_(stop) {}
+  ~AV1Decoder() {
+    if (initialized_) {
+      aom_codec_destroy(&decctx_);
+    }
+  }
+  void Start(CParameters* params) {
+    need_reinit_ = params && (params->GetParams().threads_ != params_.threads_ ||
+                              params->GetParams().player_queue_size_ != params_.player_queue_size_ ||
+                              params->GetParams().recreate_);
+    params_ = params->GetParams();
+    stop_decode_ = 0;
+    decode_thread_ = std::thread(RunDecoderLoop, this);
+    //        DecodeLoop();
+  }
+
+  void Stop() {
+    stop_decode_ = 1;
+    WaitForFinish();
+  }
+
+  void WaitForFinish() { decode_thread_.join(); }
+
+  std::shared_ptr<OutBufferWrapper> GetFrame(StreamStats* stats = 0);
+  void FlushOutput();
+
+  static int GetBuffer(void* priv, size_t buf_size, uint16_t w, uint16_t h, frame_buffer_type fb_type,
+                       av1_decoded_frame_buffer_t* fb) {
+    if (!priv) return -1;
+    AV1Decoder* This = static_cast<AV1Decoder*>(priv);
+    return This->output_queue_->GetBufferInt(buf_size, w, h, fb_type, fb);
+  }
+  static int ReleaseBuffer(void* priv, void* buffer_priv) {
+    if (!priv) return -1;
+    AV1Decoder* This = static_cast<AV1Decoder*>(priv);
+    return This->output_queue_->ReleaseBufferInt(buffer_priv);
+  }
+  static int NotifyFrameReady(void* priv, void* img) {
+    if (!priv) return -1;
+    AV1Decoder* This = static_cast<AV1Decoder*>(priv);
+    return This->output_queue_->Push((aom_image_t*)img);
+  }
+
+ protected:
+  static void RunDecoderLoop(AV1Decoder* ptr) { ptr->DecodeLoop(); }
+  static void RunFetchLoop(AV1Decoder* ptr) { ptr->FetchLoop(); }
+  int DecodeLoop();
+  int DecodeOne();
+  int FetchLoop();
+  void StartFetch() {
+    stop_fetch_ = 0;
+    fetch_thread_ = std::thread(RunFetchLoop, this);
+  }
+  void StopFetch() {
+    stop_fetch_ = 1;
+    fetch_thread_.join();
+  }
+
+ private:
+  std::thread fetch_thread_;
+  std::thread decode_thread_;
+  CParameters::Parameters params_;
+  volatile int is_running_ = 0;
+  aom_codec_ctx_t decctx_;
+  volatile int stop_decode_ = 0;
+  volatile int stop_fetch_ = 0;
+  d3d_resources dx_desources_;
+  uint32_t* stop_;
+  std::unique_ptr<BufferLocker> output_queue_;
+  int need_preload_ = 1;
+  uint32_t preload_num_ = PRELOAD_SIZE;
+  DisplayCallbacks* display_ = 0;
+  std::vector<uint8_t> host_mem_;
+  std::shared_ptr<OutBufferStorage> storage_;
+  int initialized_ = 0;
+  int need_reinit_ = 0;
+};
\ No newline at end of file

diff --git a/testing/player_common/decoder_creator.cpp b/testing/player_common/decoder_creator.cpp
new file mode 100644
index 0000000..c382f29
--- /dev/null
+++ b/testing/player_common/decoder_creator.cpp

@@ -0,0 +1,217 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <vector>
+#include <list>
+#include <string>
+#include <fstream>
+#include <sstream>
+#include <chrono>
+#include <thread>
+#include <set>
+
+#include "common\log.h"
+#include "cmd_parser.h"
+#include "decoder.h"
+#include "aom/aom.h"
+#include "aom/aom_decoder.h"
+
+#include "common/stream_reader.h"
+#include "decoder_creator.h"
+
+#if (WINAPI_FAMILY == WINAPI_FAMILY_APP)
+using namespace Platform;
+using namespace Windows::Foundation;
+using namespace Windows::Foundation::Collections;
+using namespace Windows::Storage::Streams;
+
+using namespace Windows::Storage;
+using namespace Platform;
+#endif
+
+extern "C" void usage_exit(void) { exit(EXIT_FAILURE); }
+
+#define ERR_DESC(x) \
+  { x, #x }
+struct error_decript {
+  int err;
+  const char* err_string;
+};
+const error_decript error_decripts[] = {ERR_DESC(AOM_CODEC_OK),
+                                        ERR_DESC(AOM_CODEC_ERROR),
+                                        ERR_DESC(AOM_CODEC_MEM_ERROR),
+                                        ERR_DESC(AOM_CODEC_ABI_MISMATCH),
+                                        ERR_DESC(AOM_CODEC_INCAPABLE),
+                                        ERR_DESC(AOM_CODEC_UNSUP_BITSTREAM),
+                                        ERR_DESC(AOM_CODEC_CORRUPT_FRAME),
+                                        ERR_DESC(AOM_CODEC_INVALID_PARAM),
+                                        ERR_DESC(AOM_CODEC_UNSUP_FEATURE)};
+const int errors_size = sizeof(error_decripts) / sizeof(error_decript);
+
+#if (WINAPI_FAMILY == WINAPI_FAMILY_APP)
+static std::string pstos(Platform::String ^ ps) {
+  std::string s;
+  std::wstring ws(ps->Data());
+  s.assign(ws.begin(), ws.end());
+  return s;
+}
+#endif
+
+const char* GetVpxErrorString(int err) {
+  for (int i = 0; i < errors_size; i++) {
+    if (error_decripts[i].err == err) return error_decripts[i].err_string;
+  }
+  return "Unspecified error";
+}
+
+static void parse_file(const std::string& str, std::vector<one_task>& list) {
+  std::ifstream input(str.c_str());
+  if (input.is_open()) {
+    std::string line;
+
+    while (std::getline(input, line)) {
+      if (line[0] == '#') continue;
+      std::string ss;
+      int brace_is_open = 0;
+      list.push_back(one_task(line));
+      std::vector<std::string>& current = list.back();
+      for (int i = 0; i < line.length(); i++) {
+        int s = line[i];
+        if (s == '\"') {
+          if (brace_is_open) {
+            if (ss.length()) current.push_back(ss);
+            ss = "";
+            brace_is_open = 0;
+            continue;
+          } else {
+            brace_is_open = 1;
+            continue;
+          }
+        }
+        if (isspace(s) && !brace_is_open) {
+          if (ss.length()) current.push_back(ss);
+          ss = "";
+          continue;
+        }
+        ss += (char)s;
+      }
+      if (ss.length()) current.push_back(ss);
+    }
+  } else {
+    XB_LOGE << "Invalid file path '" << str.c_str() << "'";
+  }
+}
+
+int Decoders::process_directory(CParameters& params) {
+  XB_LOGI << "#######################################";
+  XB_LOGI << "# Process directory " << params.GetParams().source_dir_;
+  XB_LOGI << "#######################################";
+  std::vector<std::string> names;
+  std::string search_path = /*read_path_ + "\\" +*/ params.GetParams().source_dir_ + "\\*.*";
+  WIN32_FIND_DATAA fd;
+  HANDLE hFind = ::FindFirstFileA(search_path.c_str(), &fd);
+  if (hFind != INVALID_HANDLE_VALUE) {
+    do {
+      // read all (real) files in current folder
+      // , delete '!' read other 2 default folder . and ..
+      if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
+        std::string str(fd.cFileName);
+        if (str.substr(str.length() - 4, 4) == ".ivf" || str.substr(str.length() - 4, 4) == ".av1" ||
+            str.substr(str.length() - 5, 5) == ".webm" || str.substr(str.length() - 4, 4) == ".obu" ||
+            str.substr(str.length() - 4, 4) == ".mp4")
+          names.push_back(str);
+      }
+    } while (::FindNextFileA(hFind, &fd));
+    ::FindClose(hFind);
+  }
+  for (std::vector<std::string>::iterator iter = names.begin(); iter != names.end() && !stop_; iter++) {
+    std::string path(params.GetParams().source_dir_);
+    if (path.substr(path.length() - 1, 1) != "\\" || path.substr(path.length() - 1, 1) != "/") path += "\\";
+    params.GetParamsPtr()->bad_conform_file_ = params.GetParams().bad_conform_dir_ + *iter;
+    path += *iter;
+    XB_LOGD << "Create decoder for " << params.GetParams().source_file_.c_str();
+    params.GetParamsPtr()->source_file_ = path;
+    decoder_->Start(&params);
+    decoder_->WaitForFinish();
+  }
+  return 0;
+}
+
+Decoders::Decoders(int is_xbox, d3d_resources dx_resources, DisplayCallbacks* display, const char* root_path)
+    : is_xbox_(is_xbox), dx_resources_(dx_resources), display_(display), root_path_(root_path ? root_path : "") {}
+
+void Decoders::GetAvailablePaths(std::string& read_path, std::string& write_path, std::string& root_path) {
+#if (WINAPI_FAMILY == WINAPI_FAMILY_APP)
+  write_path = "..\\data";
+  read_path = "..\\data";
+#else
+  read_path = write_path = root_path.empty() ? "./" : root_path;
+#endif
+}
+
+int Decoders::DecodeLoop() {
+  std::string err;
+  std::string args_path(_CONFIG_PATH_);
+  std::vector<one_task> arg_lists;
+  std::string write_path;
+  std::string read_path;
+  GetAvailablePaths(read_path, write_path, root_path_);
+  write_path_ = write_path;
+  read_path_ = read_path;
+  std::string full_args_path = read_path + args_path;
+  parse_file(full_args_path, arg_lists);
+  if (arg_lists.size() == 0) {
+    XB_LOGE << "No args were found";
+    return -1;
+  }
+  dx_resources_.dx12_psos =
+      av1_create_pipeline_cache_handle(dx_resources_.dx12_device.Get(), CREATE_SHADERS_THREAD_CNT);
+  if (!dx_resources_.dx12_psos) {
+    XB_LOGE << "Unable to create compute shaders";
+    return -1;
+  }
+  decoder_ = std::shared_ptr<AV1Decoder>(new AV1Decoder(dx_resources_, display_, &stop_decode_));
+  for (std::vector<one_task>::iterator iter = arg_lists.begin(); iter != arg_lists.end() && !stop_; iter++) {
+    // check if line is not empty
+    if (iter->empty()) continue;
+    CParameters params(read_path.c_str(), write_path.c_str());
+    int ret = params.Parse(*iter, err);
+    if (ret || !err.empty()) {
+      XB_LOGE << "Invalid arguments string. " << err;
+      continue;
+    }
+    if (params.GetParams().isDirectory_) {
+      if (params.GetParams().conformance_) {
+        params.GetParamsPtr()->bad_conform_dir_ = params.GetParamsPtr()->source_dir_ + "\\trg\\";
+        params.GetParamsPtr()->source_dir_ += "\\src";
+      }
+      process_directory(params);
+    } else {
+      XB_LOGD << "Run decoder for " << params.GetParams().source_file_.c_str();
+      decoder_->Start(&params);
+      decoder_->WaitForFinish();
+      int a = 0;
+    }
+  }
+  // finished all tasks
+  decoder_.reset();
+  if (display_) display_->AllJobDoneNotify();
+  av1_destroy_pipeline_cache_handle(dx_resources_.dx12_psos);
+  dx_resources_.dx12_psos = 0;
+  return 0;
+}

diff --git a/testing/player_common/decoder_creator.h b/testing/player_common/decoder_creator.h
new file mode 100644
index 0000000..ffeaf1a
--- /dev/null
+++ b/testing/player_common/decoder_creator.h

@@ -0,0 +1,82 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+
+#include "decoder.h"
+
+#define CREATE_SHADERS_THREAD_CNT 4  // 0 - sync creation, async (1 - 6 threads num)
+#define _CONFIG_PATH_ "/args/args_av1.txt"
+
+typedef enum { TaskNone = 0, taskStop = 1, taskNext = 2, taskStopNoWait = 3, taskStopFlush = 4 } decodeTask;
+
+#if (WINAPI_FAMILY == WINAPI_FAMILY_APP)
+using namespace Windows::Storage;
+#endif
+
+class Decoders;
+
+interface DisplayCallbacks {
+  virtual void AddSource(AV1Decoder * source) = 0;
+  virtual void RemoveSource(AV1Decoder * source) = 0;
+  virtual void AllJobDoneNotify() = 0;
+};
+
+class Decoders {
+ public:
+  Decoders(int is_xbox, d3d_resources dx_resources, DisplayCallbacks* display, const char* root_path = 0);
+
+  static void GetAvailablePaths(std::string& read_path, std::string& write_path, std::string& root_path);
+  void Start() {
+    stop_decode_ = 0;
+    stop_ = 0;
+    work_thread_ = std::thread(RunDecoderLoop, this);
+  }
+
+  void Stop(decodeTask stop_task) {
+    stop_decode_ = stop_task;
+    if (stop_task == taskStop) {
+      stop_ = 1;
+      WaitForFinish();
+    } else if (stop_task == taskStopNoWait) {
+      stop_ = 1;
+    } else if (stop_task == taskStopFlush) {
+      stop_ = 1;
+      decoder_->FlushOutput();
+      WaitForFinish();
+    }
+  }
+
+  void WaitForFinish() { work_thread_.join(); }
+
+ protected:
+  static void RunDecoderLoop(Decoders* ptr) { ptr->DecodeLoop(); }
+  int DecodeLoop();
+  int process_directory(CParameters& params);
+
+ private:
+  int is_xbox_ = 0;
+  DisplayCallbacks* display_;
+  std::shared_ptr<AV1Decoder> decoder_;
+  std::thread work_thread_;
+  volatile int is_running_ = 0;
+  volatile int stop_ = 0;
+  uint32_t stop_decode_ = 0;
+  std::string read_path_;
+  std::string write_path_;
+  d3d_resources dx_resources_;
+  std::string root_path_;
+};
\ No newline at end of file

diff --git a/testing/uwp_common/uwp_common.vcxproj b/testing/uwp_common/uwp_common.vcxproj
new file mode 100644
index 0000000..74da033
--- /dev/null
+++ b/testing/uwp_common/uwp_common.vcxproj

@@ -0,0 +1,270 @@
+<?xml version="1.0" encoding="utf-8"?>

+<Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

+  <ItemGroup Label="ProjectConfigurations">

+    <ProjectConfiguration Include="Debug|ARM">

+      <Configuration>Debug</Configuration>

+      <Platform>ARM</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Debug|Win32">

+      <Configuration>Debug</Configuration>

+      <Platform>Win32</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Debug|x64">

+      <Configuration>Debug</Configuration>

+      <Platform>x64</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Release|ARM">

+      <Configuration>Release</Configuration>

+      <Platform>ARM</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Release|Win32">

+      <Configuration>Release</Configuration>

+      <Platform>Win32</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Release|x64">

+      <Configuration>Release</Configuration>

+      <Platform>x64</Platform>

+    </ProjectConfiguration>

+  </ItemGroup>

+  <ItemGroup>

+    <ClCompile Include="..\..\third_party\libwebm\mkvparser\mkvparser.cc" />

+    <ClCompile Include="..\..\third_party\libwebm\mkvparser\mkvreader.cc" />

+    <ClCompile Include="..\common\frame_outs.cpp" />

+    <ClCompile Include="..\common\ivfdec.c">

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|ARM'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|x64'">false</CompileAsWinRT>

+    </ClCompile>

+    <ClCompile Include="..\common\log.cpp" />

+    <ClCompile Include="..\common\md5_utils.c">

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|ARM'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|x64'">false</CompileAsWinRT>

+    </ClCompile>

+    <ClCompile Include="..\common\mp4parser.cc" />

+    <ClCompile Include="..\common\obudec.c">

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|ARM'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|x64'">false</CompileAsWinRT>

+    </ClCompile>

+    <ClCompile Include="..\common\stream_reader.cpp" />

+    <ClCompile Include="..\common\tools_common.c">

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|ARM'">false</CompileAsWinRT>

+      <CompileAsWinRT Condition="'$(Configuration)|$(Platform)'=='Release|x64'">false</CompileAsWinRT>

+    </ClCompile>

+    <ClCompile Include="..\common\webmdec.cc" />

+    <ClCompile Include="..\player_common\cmd_parser.cpp" />

+    <ClCompile Include="..\player_common\decoder.cpp" />

+    <ClCompile Include="..\player_common\decoder_creator.cpp" />

+  </ItemGroup>

+  <ItemGroup>

+    <ClInclude Include="..\common\frame_outs.h" />

+    <ClInclude Include="..\common\frame_queue.h" />

+    <ClInclude Include="..\common\ivfdec.h" />

+    <ClInclude Include="..\common\log.h" />

+    <ClInclude Include="..\common\md5_utils.h" />

+    <ClInclude Include="..\common\mp4parser.h" />

+    <ClInclude Include="..\common\obudec.h" />

+    <ClInclude Include="..\common\stream_reader.h" />

+    <ClInclude Include="..\common\tools_common.h" />

+    <ClInclude Include="..\common\utils.h" />

+    <ClInclude Include="..\common\webmdec.h" />

+    <ClInclude Include="..\player_common\cmd_parser.h" />

+    <ClInclude Include="..\player_common\decoder.h" />

+    <ClInclude Include="..\player_common\decoder_creator.h" />

+  </ItemGroup>

+  <ItemGroup>

+    <ProjectReference Include="..\..\libav1\build\libav1.vcxproj">

+      <Project>{e68f3562-9876-4971-8d90-27035e27a825}</Project>

+    </ProjectReference>

+  </ItemGroup>

+  <PropertyGroup Label="Globals">

+    <ProjectGuid>{c7b73550-4ca4-45c1-8436-936afd7f0ca5}</ProjectGuid>

+    <Keyword>StaticLibrary</Keyword>

+    <RootNamespace>uwp_common</RootNamespace>

+    <DefaultLanguage>en-US</DefaultLanguage>

+    <MinimumVisualStudioVersion>14.0</MinimumVisualStudioVersion>

+    <AppContainerApplication>true</AppContainerApplication>

+    <ApplicationType>Windows Store</ApplicationType>

+    <WindowsTargetPlatformVersion>10.0.17763.0</WindowsTargetPlatformVersion>

+    <WindowsTargetPlatformMinVersion>10.0.16299.0</WindowsTargetPlatformMinVersion>

+    <ApplicationTypeRevision>10.0</ApplicationTypeRevision>

+  </PropertyGroup>

+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>true</UseDebugLibraries>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>true</UseDebugLibraries>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>true</UseDebugLibraries>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>false</UseDebugLibraries>

+    <WholeProgramOptimization>true</WholeProgramOptimization>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|ARM'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>false</UseDebugLibraries>

+    <WholeProgramOptimization>true</WholeProgramOptimization>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">

+    <ConfigurationType>StaticLibrary</ConfigurationType>

+    <UseDebugLibraries>false</UseDebugLibraries>

+    <WholeProgramOptimization>true</WholeProgramOptimization>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />

+  <ImportGroup Label="ExtensionSettings">

+  </ImportGroup>

+  <ImportGroup Label="Shared">

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|ARM'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <PropertyGroup Label="UserMacros" />

+  <PropertyGroup />

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|ARM'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+    <GenerateManifest>false</GenerateManifest>

+  </PropertyGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+    <ClCompile>

+      <PrecompiledHeader>Use</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+    <ClCompile>

+      <PrecompiledHeader>Use</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|arm'">

+    <ClCompile>

+      <PrecompiledHeader>Use</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|arm'">

+    <ClCompile>

+      <PrecompiledHeader>Use</PrecompiledHeader>

+      <CompileAsWinRT>false</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+    <ClCompile>

+      <PrecompiledHeader>NotUsing</PrecompiledHeader>

+      <CompileAsWinRT>true</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+      <AdditionalIncludeDirectories>$(SolutionDir)third_party\libwebm;$(SolutionDir)testing;$(SolutionDir)libav1;$(SolutionDir)libav1\build</AdditionalIncludeDirectories>

+      <PreprocessorDefinitions>_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+      <DisableSpecificWarnings>4264</DisableSpecificWarnings>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+    <Lib>

+      <AdditionalOptions>/ignore:4264 %(AdditionalOptions)</AdditionalOptions>

+    </Lib>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+    <ClCompile>

+      <PrecompiledHeader>NotUsing</PrecompiledHeader>

+      <CompileAsWinRT>true</CompileAsWinRT>

+      <SDLCheck>true</SDLCheck>

+      <AdditionalIncludeDirectories>$(SolutionDir)third_party\libwebm;$(SolutionDir)testing;$(SolutionDir)libav1;$(SolutionDir)libav1\build</AdditionalIncludeDirectories>

+      <PreprocessorDefinitions>_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+      <DisableSpecificWarnings>4264</DisableSpecificWarnings>

+    </ClCompile>

+    <Link>

+      <SubSystem>Console</SubSystem>

+      <IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>

+      <GenerateWindowsMetadata>false</GenerateWindowsMetadata>

+    </Link>

+    <Lib>

+      <AdditionalOptions>/ignore:4264 %(AdditionalOptions)</AdditionalOptions>

+    </Lib>

+  </ItemDefinitionGroup>

+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />

+  <ImportGroup Label="ExtensionTargets">

+  </ImportGroup>

+</Project>
\ No newline at end of file

diff --git a/testing/uwp_common/uwp_common.vcxproj.filters b/testing/uwp_common/uwp_common.vcxproj.filters
new file mode 100644
index 0000000..4cac99d
--- /dev/null
+++ b/testing/uwp_common/uwp_common.vcxproj.filters

@@ -0,0 +1,103 @@
+<?xml version="1.0" encoding="utf-8"?>

+<Project ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

+  <ItemGroup>

+    <Filter Include="Resource Files">

+      <UniqueIdentifier>{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}</UniqueIdentifier>

+      <Extensions>rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tga;tiff;tif;png;wav;mfcribbon-ms</Extensions>

+    </Filter>

+    <Filter Include="Source Files">

+      <UniqueIdentifier>{31633f1f-9fde-41ec-b1e6-b7aed05a9e0d}</UniqueIdentifier>

+    </Filter>

+    <Filter Include="common">

+      <UniqueIdentifier>{c38ac1ea-4980-4198-b475-16fc598a70f9}</UniqueIdentifier>

+    </Filter>

+  </ItemGroup>

+  <ItemGroup>

+    <ClCompile Include="..\..\third_party\libwebm\mkvparser\mkvparser.cc">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\..\third_party\libwebm\mkvparser\mkvreader.cc">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\common\frame_outs.cpp">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\common\ivfdec.c">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\common\log.cpp">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\common\md5_utils.c">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\common\mp4parser.cc">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\common\obudec.c">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\common\stream_reader.cpp">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\common\tools_common.c">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\common\webmdec.cc">

+      <Filter>common</Filter>

+    </ClCompile>

+    <ClCompile Include="..\player_common\cmd_parser.cpp">

+      <Filter>Source Files</Filter>

+    </ClCompile>

+    <ClCompile Include="..\player_common\decoder.cpp">

+      <Filter>Source Files</Filter>

+    </ClCompile>

+    <ClCompile Include="..\player_common\decoder_creator.cpp">

+      <Filter>Source Files</Filter>

+    </ClCompile>

+  </ItemGroup>

+  <ItemGroup>

+    <ClInclude Include="..\common\frame_outs.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\frame_queue.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\ivfdec.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\log.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\md5_utils.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\mp4parser.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\obudec.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\stream_reader.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\tools_common.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\utils.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\common\webmdec.h">

+      <Filter>common</Filter>

+    </ClInclude>

+    <ClInclude Include="..\player_common\cmd_parser.h">

+      <Filter>Source Files</Filter>

+    </ClInclude>

+    <ClInclude Include="..\player_common\decoder.h">

+      <Filter>Source Files</Filter>

+    </ClInclude>

+    <ClInclude Include="..\player_common\decoder_creator.h">

+      <Filter>Source Files</Filter>

+    </ClInclude>

+  </ItemGroup>

+</Project>
\ No newline at end of file

diff --git a/testing/uwp_dx12_player/App.cpp b/testing/uwp_dx12_player/App.cpp
new file mode 100644
index 0000000..985add8
--- /dev/null
+++ b/testing/uwp_dx12_player/App.cpp

@@ -0,0 +1,199 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "pch.h"
+#include "App.h"
+
+#include <ppltasks.h>
+#include <Xinput.h>
+
+using namespace uwp_dx12_player;
+
+using namespace concurrency;
+using namespace Windows::ApplicationModel;
+using namespace Windows::ApplicationModel::Core;
+using namespace Windows::ApplicationModel::Activation;
+using namespace Windows::UI::Core;
+using namespace Windows::UI::Input;
+using namespace Windows::System;
+using namespace Windows::Foundation;
+using namespace Windows::Graphics::Display;
+
+using Microsoft::WRL::ComPtr;
+
+// The DirectX 12 Application template is documented at https://go.microsoft.com/fwlink/?LinkID=613670&clcid=0x409
+
+// The main function is only used to initialize our IFrameworkView class.
+[Platform::MTAThread] int main(Platform::Array<Platform::String ^> ^) {
+  PlayerWindow::InitializeLog(logging::LOG_INFO);
+  XB_LOGI << "Start application";
+
+  auto direct3DApplicationSource = ref new Direct3DApplicationSource();
+  CoreApplication::Run(direct3DApplicationSource);
+  return 0;
+}
+
+    IFrameworkView
+    ^ Direct3DApplicationSource::CreateView() {
+  return ref new App();
+}
+
+App::App() : m_windowClosed(false), m_windowVisible(true) {}
+
+// The first method called when the IFrameworkView is being created.
+void App::Initialize(CoreApplicationView ^ applicationView) {
+  // Register event handlers for app lifecycle. This example includes Activated, so that we
+  // can make the CoreWindow active and start rendering on the window.
+  applicationView->Activated +=
+      ref new TypedEventHandler<CoreApplicationView ^, IActivatedEventArgs ^>(this, &App::OnActivated);
+
+  CoreApplication::Suspending += ref new EventHandler<SuspendingEventArgs ^>(this, &App::OnSuspending);
+
+  CoreApplication::Resuming += ref new EventHandler<Platform::Object ^>(this, &App::OnResuming);
+}
+
+// Called when the CoreWindow object is created (or re-created).
+void App::SetWindow(CoreWindow ^ window) {
+  window->SizeChanged +=
+      ref new TypedEventHandler<CoreWindow ^, WindowSizeChangedEventArgs ^>(this, &App::OnWindowSizeChanged);
+
+  window->VisibilityChanged +=
+      ref new TypedEventHandler<CoreWindow ^, VisibilityChangedEventArgs ^>(this, &App::OnVisibilityChanged);
+
+  window->Closed += ref new TypedEventHandler<CoreWindow ^, CoreWindowEventArgs ^>(this, &App::OnWindowClosed);
+
+  DisplayInformation ^ currentDisplayInformation = DisplayInformation::GetForCurrentView();
+
+  currentDisplayInformation->DpiChanged +=
+      ref new TypedEventHandler<DisplayInformation ^, Object ^>(this, &App::OnDpiChanged);
+
+  currentDisplayInformation->OrientationChanged +=
+      ref new TypedEventHandler<DisplayInformation ^, Object ^>(this, &App::OnOrientationChanged);
+
+  DisplayInformation::DisplayContentsInvalidated +=
+      ref new TypedEventHandler<DisplayInformation ^, Object ^>(this, &App::OnDisplayContentsInvalidated);
+}
+
+// Initializes scene resources, or loads a previously saved app state.
+void App::Load(Platform::String ^ entryPoint) {
+  if (m_main == nullptr) {
+    m_main = std::unique_ptr<PlayerWindow>(new PlayerWindow(CoreWindow::GetForCurrentThread()));
+    m_decoder_state = m_main->Run();
+  }
+}
+
+// This method is called after the window becomes active.
+void App::Run() {
+  DWORD lastPacketNumber = 0;
+  while (!m_windowClosed) {
+    if (m_decoder_state == -1) {
+      // Decoder creation failed. Exit
+      XB_LOGE << "Unable to create decoder instance. May be due to wrong application mode (must be GAME MODE)";
+      break;
+    }
+    if (m_windowVisible) {
+      CoreWindow::GetForCurrentThread()->Dispatcher->ProcessEvents(CoreProcessEventsOption::ProcessAllIfPresent);
+      XINPUT_STATE state;
+      ZeroMemory(&state, sizeof(XINPUT_STATE));
+      if (XInputGetState(0, &state) == ERROR_SUCCESS) {
+        if (state.dwPacketNumber != lastPacketNumber) {
+          if (state.Gamepad.wButtons == XINPUT_GAMEPAD_DPAD_RIGHT) {
+            m_main->StopDecode(taskNext);
+          }
+          lastPacketNumber = state.dwPacketNumber;
+        }
+      }
+      if (m_windowVisible) {
+        m_main->ShowFrame();
+      }
+      if (m_main->isJobsDone()) {
+        XB_LOGI << "Close main window";
+        m_main->StopDecode(taskStop);
+        m_windowClosed = true;
+      }
+
+    } else {
+      CoreWindow::GetForCurrentThread()->Dispatcher->ProcessEvents(CoreProcessEventsOption::ProcessOneAndAllPending);
+      m_main->StopDecode(taskStopFlush);
+      m_windowClosed = true;
+    }
+  }
+}
+
+// Required for IFrameworkView.
+// Terminate events do not cause Uninitialize to be called. It will be called if your IFrameworkView
+// class is torn down while the app is in the foreground.
+void App::Uninitialize() {}
+
+// Application lifecycle event handlers.
+
+void App::OnActivated(CoreApplicationView ^ applicationView, IActivatedEventArgs ^ args) {
+  // Run() won't start until the CoreWindow is activated.
+  CoreWindow::GetForCurrentThread()->Activate();
+}
+
+void App::OnSuspending(Platform::Object ^ sender, SuspendingEventArgs ^ args) {
+  // Save app state asynchronously after requesting a deferral. Holding a deferral
+  // indicates that the application is busy performing suspending operations. Be
+  // aware that a deferral may not be held indefinitely. After about five seconds,
+  // the app will be forced to exit.
+  SuspendingDeferral ^ deferral = args->SuspendingOperation->GetDeferral();
+
+  create_task([this, deferral]() {
+    deferral->Complete();
+  });
+}
+
+void App::OnResuming(Platform::Object ^ sender, Platform::Object ^ args) {
+  // Restore any data or state that was unloaded on suspend. By default, data
+  // and state are persisted when resuming from suspend. Note that this event
+  // does not occur if the app was previously terminated.
+
+  // m_main->OnResuming();
+}
+
+// Window event handlers.
+
+void App::OnWindowSizeChanged(CoreWindow ^ sender, WindowSizeChangedEventArgs ^ args) {
+  // GetDeviceResources()->SetLogicalSize(Size(sender->Bounds.Width, sender->Bounds.Height));
+  // m_main->OnWindowSizeChanged();
+}
+
+void App::OnVisibilityChanged(CoreWindow ^ sender, VisibilityChangedEventArgs ^ args) {
+  m_windowVisible = args->Visible;
+}
+
+void App::OnWindowClosed(CoreWindow ^ sender, CoreWindowEventArgs ^ args) { m_windowClosed = true; }
+
+// DisplayInformation event handlers.
+
+void App::OnDpiChanged(DisplayInformation ^ sender, Object ^ args) {
+  // Note: The value for LogicalDpi retrieved here may not match the effective DPI of the app
+  // if it is being scaled for high resolution devices. Once the DPI is set on DeviceResources,
+  // you should always retrieve it using the GetDpi method.
+  // See DeviceResources.cpp for more details.
+  // GetDeviceResources()->SetDpi(sender->LogicalDpi);
+  // m_main->OnWindowSizeChanged();
+}
+
+void App::OnOrientationChanged(DisplayInformation ^ sender, Object ^ args) {
+  // GetDeviceResources()->SetCurrentOrientation(sender->CurrentOrientation);
+  // m_main->OnWindowSizeChanged();
+}
+
+void App::OnDisplayContentsInvalidated(DisplayInformation ^ sender, Object ^ args) {
+  // GetDeviceResources()->ValidateDevice();
+}

diff --git a/testing/uwp_dx12_player/App.h b/testing/uwp_dx12_player/App.h
new file mode 100644
index 0000000..458cc72
--- /dev/null
+++ b/testing/uwp_dx12_player/App.h

@@ -0,0 +1,66 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+
+#include "pch.h"
+#include "dxengine.h"
+#include "playerwnd.h"
+
+namespace uwp_dx12_player {
+// Main entry point for our app. Connects the app with the Windows shell and handles application lifecycle events.
+ref class App sealed : public Windows::ApplicationModel::Core::IFrameworkView {
+ public:
+  App();
+
+  // IFrameworkView methods.
+  virtual void Initialize(Windows::ApplicationModel::Core::CoreApplicationView ^ applicationView);
+  virtual void SetWindow(Windows::UI::Core::CoreWindow ^ window);
+  virtual void Load(Platform::String ^ entryPoint);
+  virtual void Run();
+  virtual void Uninitialize();
+
+ protected:
+  // Application lifecycle event handlers.
+  void OnActivated(Windows::ApplicationModel::Core::CoreApplicationView ^ applicationView,
+                   Windows::ApplicationModel::Activation::IActivatedEventArgs ^ args);
+  void OnSuspending(Platform::Object ^ sender, Windows::ApplicationModel::SuspendingEventArgs ^ args);
+  void OnResuming(Platform::Object ^ sender, Platform::Object ^ args);
+
+  // Window event handlers.
+  void OnWindowSizeChanged(Windows::UI::Core::CoreWindow ^ sender,
+                           Windows::UI::Core::WindowSizeChangedEventArgs ^ args);
+  void OnVisibilityChanged(Windows::UI::Core::CoreWindow ^ sender,
+                           Windows::UI::Core::VisibilityChangedEventArgs ^ args);
+  void OnWindowClosed(Windows::UI::Core::CoreWindow ^ sender, Windows::UI::Core::CoreWindowEventArgs ^ args);
+
+  // DisplayInformation event handlers.
+  void OnDpiChanged(Windows::Graphics::Display::DisplayInformation ^ sender, Platform::Object ^ args);
+  void OnOrientationChanged(Windows::Graphics::Display::DisplayInformation ^ sender, Platform::Object ^ args);
+  void OnDisplayContentsInvalidated(Windows::Graphics::Display::DisplayInformation ^ sender, Platform::Object ^ args);
+
+ private:
+  std::unique_ptr<PlayerWindow> m_main;
+  bool m_windowClosed;
+  bool m_windowVisible;
+  int m_decoder_state = 0;
+};
+}  // namespace uwp_dx12_player
+
+ref class Direct3DApplicationSource sealed : Windows::ApplicationModel::Core::IFrameworkViewSource {
+ public:
+  virtual Windows::ApplicationModel::Core::IFrameworkView ^ CreateView();
+};

diff --git a/testing/uwp_dx12_player/Assets/LockScreenLogo.scale-200.png b/testing/uwp_dx12_player/Assets/LockScreenLogo.scale-200.png
new file mode 100644
index 0000000..735f57a
--- /dev/null
+++ b/testing/uwp_dx12_player/Assets/LockScreenLogo.scale-200.png
Binary files differ

diff --git a/testing/uwp_dx12_player/Assets/SplashScreen.scale-200.png b/testing/uwp_dx12_player/Assets/SplashScreen.scale-200.png
new file mode 100644
index 0000000..023e7f1
--- /dev/null
+++ b/testing/uwp_dx12_player/Assets/SplashScreen.scale-200.png
Binary files differ

diff --git a/testing/uwp_dx12_player/Assets/Square150x150Logo.scale-200.png b/testing/uwp_dx12_player/Assets/Square150x150Logo.scale-200.png
new file mode 100644
index 0000000..af49fec
--- /dev/null
+++ b/testing/uwp_dx12_player/Assets/Square150x150Logo.scale-200.png
Binary files differ

diff --git a/testing/uwp_dx12_player/Assets/Square44x44Logo.scale-200.png b/testing/uwp_dx12_player/Assets/Square44x44Logo.scale-200.png
new file mode 100644
index 0000000..ce342a2
--- /dev/null
+++ b/testing/uwp_dx12_player/Assets/Square44x44Logo.scale-200.png
Binary files differ

diff --git a/testing/uwp_dx12_player/Assets/Square44x44Logo.targetsize-24_altform-unplated.png b/testing/uwp_dx12_player/Assets/Square44x44Logo.targetsize-24_altform-unplated.png
new file mode 100644
index 0000000..f6c02ce
--- /dev/null
+++ b/testing/uwp_dx12_player/Assets/Square44x44Logo.targetsize-24_altform-unplated.png
Binary files differ

diff --git a/testing/uwp_dx12_player/Assets/StoreLogo.png b/testing/uwp_dx12_player/Assets/StoreLogo.png
new file mode 100644
index 0000000..7385b56
--- /dev/null
+++ b/testing/uwp_dx12_player/Assets/StoreLogo.png
Binary files differ

diff --git a/testing/uwp_dx12_player/Assets/Wide310x150Logo.scale-200.png b/testing/uwp_dx12_player/Assets/Wide310x150Logo.scale-200.png
new file mode 100644
index 0000000..288995b
--- /dev/null
+++ b/testing/uwp_dx12_player/Assets/Wide310x150Logo.scale-200.png
Binary files differ

diff --git a/testing/uwp_dx12_player/Package.appxmanifest b/testing/uwp_dx12_player/Package.appxmanifest
new file mode 100644
index 0000000..4bb13d4
--- /dev/null
+++ b/testing/uwp_dx12_player/Package.appxmanifest

@@ -0,0 +1,49 @@
+<?xml version="1.0" encoding="utf-8"?>

+

+<Package

+  xmlns="http://schemas.microsoft.com/appx/manifest/foundation/windows10"

+  xmlns:mp="http://schemas.microsoft.com/appx/2014/phone/manifest"

+  xmlns:uap="http://schemas.microsoft.com/appx/manifest/uap/windows10"

+  IgnorableNamespaces="uap mp">

+

+  <Identity

+    Name="5fcaab29-e491-4021-be67-ca4afc9abf9b"

+    Publisher="CN=vpasoshnikov"

+    Version="1.0.0.0" />

+

+  <mp:PhoneIdentity PhoneProductId="5fcaab29-e491-4021-be67-ca4afc9abf9b" PhonePublisherId="00000000-0000-0000-0000-000000000000"/>

+

+  <Properties>

+    <DisplayName>uwp_dx12_player</DisplayName>

+    <PublisherDisplayName>vpasoshnikov</PublisherDisplayName>

+    <Logo>Assets\StoreLogo.png</Logo>

+  </Properties>

+

+  <Dependencies>

+    <TargetDeviceFamily Name="Windows.Universal" MinVersion="10.0.0.0" MaxVersionTested="10.0.0.0" />

+  </Dependencies>

+

+  <Resources>

+    <Resource Language="x-generate"/>

+  </Resources>

+

+  <Applications>

+    <Application Id="App"

+      Executable="$targetnametoken$.exe"

+      EntryPoint="uwp_dx12_player.App">

+      <uap:VisualElements

+        DisplayName="uwp_dx12_player"

+        Square150x150Logo="Assets\Square150x150Logo.png"

+        Square44x44Logo="Assets\Square44x44Logo.png"

+        Description="uwp_dx12_player"

+        BackgroundColor="transparent">

+        <uap:DefaultTile Wide310x150Logo="Assets\Wide310x150Logo.png"/>

+        <uap:SplashScreen Image="Assets\SplashScreen.png" />

+      </uap:VisualElements>

+    </Application>

+  </Applications>

+

+  <Capabilities>

+    <Capability Name="internetClient" />

+  </Capabilities>

+</Package>
\ No newline at end of file

diff --git a/testing/uwp_dx12_player/SamplePixelShader.hlsl b/testing/uwp_dx12_player/SamplePixelShader.hlsl
new file mode 100644
index 0000000..c2d56a1
--- /dev/null
+++ b/testing/uwp_dx12_player/SamplePixelShader.hlsl

@@ -0,0 +1,176 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "ps_cbuf.h"
+
+#define kDefaultWhiteLevelSRGB 100.0f  // sRGB
+#define kDefaultWhiteLevelPQ \
+  10000.0f  // PQ https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.2100-0-201607-I!!PDF-E.pdf
+#define kDefaultWhiteLevelHLG \
+  1000.0  // HLG https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.2100-0-201607-I!!PDF-E.pdf
+
+// color conversion matrixs. It works with liner colors
+static const float3x3 BT709_TO_BT2020 = {  // ref: ARIB STD-B62 and BT.2087
+    0.6274, 0.3293, 0.0433, 0.0691, 0.9195, 0.0114, 0.0164, 0.0880, 0.8956};
+
+static const float3x3 BT2020_TO_BT709 = {1.6605, -0.5877, -0.0728, -0.1246, 1.1330, -0.0084, -0.0182, -0.1006, 1.1187};
+
+// YUV to RGB matrixs:
+// BT.709
+static const float3x3 YCbCR_TO_SRGB = {1.1644f, 0.0f, 1.7927f, 1.1644f, -0.2133f, -0.5329f, 1.1644f, 2.1124f, 0.0f};
+// BT.2020
+static const float3x3 YCbCr_TO_BT2020 = {1.163746465f, -0.028815145f, 2.823537589f, 1.164383561f, -0.258509894f,
+                                         0.379693635f, 1.164383561f,  2.385315708f, 0.021554502f};
+
+float3 SRGB_OETF(float3 L) {
+  const float thr = 1.0;
+  L = L / kDefaultWhiteLevelSRGB;
+  float3 dark = L * 12.92;
+  float3 light = 1.055 * (float3)pow(abs(L), 1.0 / 2.4) - 0.055;
+
+  float3 r;
+  bool3 cri = L <= 0.0031308;
+  r = cri ? dark : light;
+
+//#define SHOW_COLOR
+#ifdef SHOW_COLOR
+  float Y = 0.2126f * r.x + 0.7152f * r.y + 0.0722f * r.z;
+  float3 ret = r;
+  float gray = (3.0 - Y) / 4.0;
+  float3 gray3 = float3(gray, gray, gray);
+  if (Y > 3.0) {
+    ret = float3(1.0, 0.0, 0.0);
+  } else if (gray < 0.5) {
+    ret = gray3;
+  }
+  return ret;
+#else
+  return r;
+#endif
+}
+
+float3 SRGB_EOTF(float3 E) {
+  float3 dark = E / 12.92;
+  float3 light = pow(abs((E + 0.055) / (1 + 0.055)), 2.4);
+  bool3 cri = E.xyz <= 0.04045;
+  float3 r = cri ? dark : light;
+  r = r * kDefaultWhiteLevelSRGB;
+  return r;
+}
+
+float3 BT709_EOTF(float3 rgb) {
+  const float a = 1.09929682f;
+  const float b = 0.0180539685f;
+  const float threshold = 0.081242864f;  // fromLiner(BT709, 0.0180539685f)
+  bool3 cri = rgb < threshold;
+  float3 dark = rgb / 4.5f;
+  float3 light = pow(abs((rgb + a - 1.0f) / a), 1.0f / 0.45f);
+  float3 liner709 = cri ? dark : light;
+
+  return liner709 * kDefaultWhiteLevelSRGB;
+}
+
+// input: normalized L in units of RefWhite (1.0=100nits), output: normalized E
+float3 PQ_OETF(float3 L) {
+  const float c1 = 0.8359375;       // 3424.f/4096.f;
+  const float c2 = 18.8515625;      // 2413.f/4096.f*32.f;
+  const float c3 = 18.6875;         // 2392.f/4096.f*32.f;
+  const float m1 = 0.159301758125;  // 2610.f / 4096.f / 4;
+  const float m2 = 78.84375;        // 2523.f / 4096.f * 128.f;
+  L = L / kDefaultWhiteLevelPQ;     // normalize liner color
+  float3 Lm1 = pow(abs(L), m1);
+  float3 X = (c1 + c2 * Lm1) / (1 + c3 * Lm1);
+  float3 res = pow(abs(X), m2);
+  return res;
+}
+
+// input: normalized E (0.0, 1.0), output: normalized L in units of RefWhite
+float3 PQ_EOTF(float3 E) {
+  const float c1 = 0.8359375;       // 3424.f/4096.f;
+  const float c2 = 18.8515625;      // 2413.f/4096.f*32.f;
+  const float c3 = 18.6875;         // 2392.f/4096.f*32.f;
+  const float m1 = 0.159301758125;  // 2610.f / 4096.f / 4;
+  const float m2 = 78.84375;        // 2523.f / 4096.f * 128.f;
+  float3 M = c2 - c3 * pow(abs(E), 1 / m2);
+  float3 N = max(pow(abs(E), 1 / m2) - c1, 0);
+  float3 L = pow(abs(N / M), 1 / m1);  // normalized nits (1.0 = 10000nits)
+  L = L * kDefaultWhiteLevelPQ;        // white level in range (0 - kDefaultWhiteLevelPQ);
+  return L;
+}
+
+float3 ARIB_STD_B67_EOTF(float3 E) {
+  const float a = 0.17883277f;
+  const float b = 0.28466892f;
+  const float c = 0.55991073f;
+  const float Lmax = 12.0f;
+  float3 dark = (E * 2.0f) * (E * 2.0f);
+  float3 light = exp((E - c) / a) + b;
+  bool3 cri = E <= 0.5f;
+  float3 r = cri ? dark : light;
+  return r / Lmax * kDefaultWhiteLevelHLG;
+}
+struct PSInput {
+  float4 position : SV_POSITION;
+  float2 uv : TEXCOORD;
+};
+
+Texture2D g_textureY : register(t0);
+Texture2D g_textureU : register(t1);
+Texture2D g_textureV : register(t2);
+SamplerState g_samplerY : register(s0);
+SamplerState g_samplerUV : register(s1);
+
+float4 main(PSInput fragment) : SV_TARGET {
+  float3 ycbcr;
+  if (hdr_10_10_10_2) {
+    uint2 DTid, ptex_dim, ptex_ind, ptex_ind_temp;
+    float2 ptex_pos, tex_pos, mag_ratio;
+
+    mag_ratio.x = float(disp_w) / float(frame_w);
+    mag_ratio.y = float(disp_h) / float(frame_h);
+    DTid.x = uint(floor(fragment.uv.x * disp_w));
+    DTid.y = uint(floor(fragment.uv.y * disp_h));
+    tex_pos = (float2(DTid) + 0.5f) / mag_ratio;
+
+    ptex_dim = uint2(ptex_w, ptex_h);
+    ptex_ind = uint2(floor(tex_pos.x / 3), floor(tex_pos.y));
+    ptex_pos.x = ((float)ptex_ind + 0.5f) / (float)(ptex_dim.x);
+    ptex_pos.y = tex_pos.y / (float)(ptex_dim.y);
+    ycbcr.x = g_textureY.Sample(g_samplerY, ptex_pos)[uint(floor(tex_pos.x)) % 3];
+
+    ptex_dim >>= 1;
+    DTid >>= 1;
+    ptex_ind_temp = uint2(floor((float(DTid.x + 0.5f) / mag_ratio.x) / 3), floor(float(DTid.y + 0.5f) / mag_ratio.y));
+    ptex_ind = ptex_ind_temp;  // >> 1;
+    ptex_pos = ((float2)ptex_ind + 0.5f) / (float2)(ptex_dim);
+
+    ycbcr.y = g_textureU.Sample(g_samplerUV, ptex_pos)[uint(floor(float(DTid.x + 0.5f) / mag_ratio.x)) % 3];
+    ycbcr.z = g_textureV.Sample(g_samplerUV, ptex_pos)[uint(floor(float(DTid.x + 0.5f) / mag_ratio.x)) % 3];
+  } else if (isMonochrome) {
+    float2 cbcr = isMonochrome ? float2(0.5, 0.5)
+                               : float2(g_textureU.Sample(g_samplerUV, fragment.uv).x,
+                                        g_textureV.Sample(g_samplerUV, fragment.uv).x);
+    ycbcr = float3(g_textureY.Sample(g_samplerY, fragment.uv).x, cbcr);
+
+  } else {
+    ycbcr = float3(g_textureY.Sample(g_samplerY, fragment.uv).x, g_textureU.Sample(g_samplerUV, fragment.uv).x,
+                   g_textureV.Sample(g_samplerUV, fragment.uv).x);
+  }
+  ycbcr *= scale_factor;
+  ycbcr -= float3(0.0625, 0.5, 0.5);
+  float4 result = float4(mul(YCbCR_TO_SRGB, ycbcr), 1.0f);
+  return result;
+}
\ No newline at end of file

diff --git a/testing/uwp_dx12_player/SampleVertexShader.hlsl b/testing/uwp_dx12_player/SampleVertexShader.hlsl
new file mode 100644
index 0000000..3194487
--- /dev/null
+++ b/testing/uwp_dx12_player/SampleVertexShader.hlsl

@@ -0,0 +1,32 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+// Per-vertex data used as input to the vertex shader.
+
+struct PSInput {
+  float4 position : SV_POSITION;
+  float2 uv : TEXCOORD;
+  //    float4 color : COLOR;
+};
+
+PSInput main(float4 position : POSITION, float2 uv : TEXCOORD) {
+  PSInput result;
+
+  result.position = position;
+  result.uv = uv;
+
+  return result;
+}

diff --git a/testing/uwp_dx12_player/ShaderStructures.h b/testing/uwp_dx12_player/ShaderStructures.h
new file mode 100644
index 0000000..744a5ab
--- /dev/null
+++ b/testing/uwp_dx12_player/ShaderStructures.h

@@ -0,0 +1,11 @@
+#pragma once
+
+namespace av1xboxplayer {
+// Used to send per-vertex data to the vertex shader.
+struct VertexPositionTex {
+  DirectX::XMFLOAT3 pos;
+  DirectX::XMFLOAT2 texcoord;
+  // FLOAT X, Y, Z;      // position
+  // DirectX::XMFLOAT4 Color;    // color
+};
+}  // namespace av1xboxplayer
\ No newline at end of file

diff --git a/testing/uwp_dx12_player/dxengine.cpp b/testing/uwp_dx12_player/dxengine.cpp
new file mode 100644
index 0000000..3238f15
--- /dev/null
+++ b/testing/uwp_dx12_player/dxengine.cpp

@@ -0,0 +1,597 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "pch.h"
+#include "DXEngine.h"
+#include "SamplePixelShader.h"
+#include "SampleVertexShader.h"
+#include "ps_cbuf.h"
+
+using namespace Windows::Graphics::Display;
+
+DXEngine::DXEngine(Windows::UI::Core::CoreWindow ^ window) : m_frameIndex(0), m_rtvDescriptorSize(0) {
+  DisplayInformation ^ currentDisplayInformation = DisplayInformation::GetForCurrentView();
+  m_window = window;
+  m_logicalSize = Windows::Foundation::Size(m_window->Bounds.Width, m_window->Bounds.Height);
+  m_dpi = currentDisplayInformation->LogicalDpi;
+  UpdateRenderTargetSize();
+  m_viewport = CD3DX12_VIEWPORT(0.0f, 0.0f, static_cast<float>(m_width), static_cast<float>(m_height));
+  m_scissorRect = CD3DX12_RECT(0, 0, static_cast<LONG>(m_width), static_cast<LONG>(m_height)), strides_[0] = m_width;
+  strides_[1] = strides_[2] = strides_[0] / 2;
+  upload_buffer_size_[0] = upload_buffer_size_[1] = upload_buffer_size_[2] = 0;
+}
+
+// Converts a length in device-independent pixels (DIPs) to a length in physical pixels.
+static inline float ConvertDipsToPixels(float dips, float dpi) {
+  static const float dipsPerInch = 96.0f;
+  return floorf(dips * dpi / dipsPerInch + 0.5f);  // Round to nearest integer.
+}
+
+void DXEngine::UpdateRenderTargetSize() {
+  m_effectiveDpi = m_dpi;
+
+  // Calculate the necessary render target size in pixels.
+  m_width = max(lround(ConvertDipsToPixels(m_logicalSize.Width, m_effectiveDpi)), 1);
+  m_height = max(lround(ConvertDipsToPixels(m_logicalSize.Height, m_effectiveDpi)), 1);
+}
+
+int DXEngine::OnInit() {
+  if (LoadPipeline() || LoadAssets()) return -1;
+  ready_ = 1;
+  return 0;
+}
+
+// Load the rendering pipeline dependencies.
+int DXEngine::LoadPipeline() {
+  UINT dxgiFactoryFlags = 0;
+  D2D1_FACTORY_OPTIONS factory_options = {D2D1_DEBUG_LEVEL_NONE};
+#if defined(_DEBUG)
+  // Enable the debug layer (requires the Graphics Tools "optional feature").
+  // NOTE: Enabling the debug layer after device creation will invalidate the active device.
+  {
+    ComPtr<ID3D12Debug> debugController;
+    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController)))) {
+      debugController->EnableDebugLayer();
+
+      // Enable additional debug layers.
+      dxgiFactoryFlags |= DXGI_CREATE_FACTORY_DEBUG;
+      factory_options.debugLevel = D2D1_DEBUG_LEVEL_INFORMATION;
+    }
+  }
+#endif
+
+  ComPtr<IDXGIFactory4> factory;
+  CHECK_HRESULT(CreateDXGIFactory2(dxgiFactoryFlags, IID_PPV_ARGS(&factory)));
+
+  if (m_useWarpDevice) {
+    ComPtr<IDXGIAdapter> warpAdapter;
+    CHECK_HRESULT(factory->EnumWarpAdapter(IID_PPV_ARGS(&warpAdapter)));
+
+    CHECK_HRESULT(D3D12CreateDevice(warpAdapter.Get(), D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&m_device)));
+  } else {
+    ComPtr<IDXGIAdapter1> hardwareAdapter;
+    GetHardwareAdapter(factory.Get(), &hardwareAdapter);
+
+    CHECK_HRESULT(D3D12CreateDevice(hardwareAdapter.Get(), D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&m_device)));
+  }
+
+  // Describe and create the command queue.
+  D3D12_COMMAND_QUEUE_DESC queueDesc = {};
+  queueDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
+  queueDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
+
+  CHECK_HRESULT(m_device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&m_commandQueue)));
+
+  ComPtr<ID3D11Device> d3d11Device;
+  CHECK_HRESULT(D3D11On12CreateDevice(m_device.Get(), D3D11_CREATE_DEVICE_BGRA_SUPPORT, nullptr, 0,
+                                      reinterpret_cast<IUnknown**>(m_commandQueue.GetAddressOf()), 1, 0, &d3d11Device,
+                                      &d11on12device_context_, nullptr));
+
+  // Query the 11On12 device from the 11 device.
+  CHECK_HRESULT(d3d11Device.As(&d11on12device_));
+
+  // Describe and create the swap chain.
+  DXGI_SWAP_CHAIN_DESC1 swapChainDesc = {};
+  swapChainDesc.BufferCount = FrameCount;
+  swapChainDesc.Width = m_width;
+  swapChainDesc.Height = m_height;
+  swapChainDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
+  swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
+  swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
+  swapChainDesc.SampleDesc.Count = 1;
+  swapChainDesc.Scaling = DXGI_SCALING_NONE;
+
+  ComPtr<IDXGISwapChain1> swapChain;
+  CHECK_HRESULT(factory->CreateSwapChainForCoreWindow(
+      m_commandQueue.Get(),  // Swap chain needs the queue so that it can force a flush on it.
+      reinterpret_cast<IUnknown*>(m_window.Get()), &swapChainDesc, nullptr, &swapChain));
+
+  D2D1_DEVICE_CONTEXT_OPTIONS deviceOptions = D2D1_DEVICE_CONTEXT_OPTIONS_NONE;
+  CHECK_HRESULT(
+      D2D1CreateFactory(D2D1_FACTORY_TYPE_SINGLE_THREADED, __uuidof(ID2D1Factory3), &factory_options, &d2d1factory_));
+  ComPtr<IDXGIDevice> dxgiDevice;
+  CHECK_HRESULT(d11on12device_.As(&dxgiDevice));
+  CHECK_HRESULT(d2d1factory_->CreateDevice(dxgiDevice.Get(), &d2d1device2_));
+  CHECK_HRESULT(d2d1device2_->CreateDeviceContext(deviceOptions, &d2d1deviceContext_));
+  CHECK_HRESULT(DWriteCreateFactory(DWRITE_FACTORY_TYPE_SHARED, __uuidof(IDWriteFactory), &d2writeFactory_));
+
+  float dpiX;
+  float dpiY;
+  d2d1factory_->GetDesktopDpi(&dpiX, &dpiY);
+  D2D1_BITMAP_PROPERTIES1 bitmapProperties =
+      D2D1::BitmapProperties1(D2D1_BITMAP_OPTIONS_TARGET | D2D1_BITMAP_OPTIONS_CANNOT_DRAW,
+                              D2D1::PixelFormat(DXGI_FORMAT_UNKNOWN, D2D1_ALPHA_MODE_PREMULTIPLIED), dpiX, dpiY);
+
+  CHECK_HRESULT(swapChain.As(&m_swapChain));
+  m_frameIndex = m_swapChain->GetCurrentBackBufferIndex();
+
+  // Create descriptor heaps.
+  {
+    // Describe and create a render target view (RTV) descriptor heap.
+    D3D12_DESCRIPTOR_HEAP_DESC rtvHeapDesc = {};
+    rtvHeapDesc.NumDescriptors = FrameCount;
+    rtvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV;
+    rtvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
+    CHECK_HRESULT(m_device->CreateDescriptorHeap(&rtvHeapDesc, IID_PPV_ARGS(&m_rtvHeap)));
+
+    // Describe and create a shader resource view (SRV) heap for the texture.
+    D3D12_DESCRIPTOR_HEAP_DESC srvHeapDesc = {};
+    srvHeapDesc.NumDescriptors = 4;
+    srvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
+    srvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
+    CHECK_HRESULT(m_device->CreateDescriptorHeap(&srvHeapDesc, IID_PPV_ARGS(&m_srvHeap)));
+
+    m_rtvDescriptorSize = m_device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV);
+    m_srvDescriptorSize = m_device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
+  }
+
+  // Create frame resources.
+  {
+    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(m_rtvHeap->GetCPUDescriptorHandleForHeapStart());
+
+    // Create a RTV for each frame.
+    for (UINT n = 0; n < FrameCount; n++) {
+      CHECK_HRESULT(m_swapChain->GetBuffer(n, IID_PPV_ARGS(&m_renderTargets[n])));
+      m_device->CreateRenderTargetView(m_renderTargets[n].Get(), nullptr, rtvHandle);
+      // Create a wrapped 11On12 resource of this back buffer. Since we are
+      // rendering all D3D12 content first and then all D2D content, we specify
+      // the In resource state as RENDER_TARGET - because D3D12 will have last
+      // used it in this state - and the Out resource state as PRESENT. When
+      // ReleaseWrappedResources() is called on the 11On12 device, the resource
+      // will be transitioned to the PRESENT state.
+      D3D11_RESOURCE_FLAGS d3d11Flags = {D3D11_BIND_RENDER_TARGET};
+      CHECK_HRESULT(d11on12device_->CreateWrappedResource(
+          m_renderTargets[n].Get(), &d3d11Flags, D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT,
+          IID_PPV_ARGS(&wrappedBackBuffers_[n])));
+
+      // Create a render target for D2D to draw directly to this back buffer.
+      ComPtr<IDXGISurface> surface;
+      CHECK_HRESULT(wrappedBackBuffers_[n].As(&surface));
+      CHECK_HRESULT(
+          d2d1deviceContext_->CreateBitmapFromDxgiSurface(surface.Get(), &bitmapProperties, &d2d1renderTargets_[n]));
+
+      rtvHandle.Offset(1, m_rtvDescriptorSize);
+    }
+  }
+
+  CHECK_HRESULT(m_device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&m_commandAllocator)));
+
+  CHECK_HRESULT(d2d1deviceContext_->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::LightGreen), &textBrush_));
+  CHECK_HRESULT(d2writeFactory_->CreateTextFormat(L"Verdana", NULL, DWRITE_FONT_WEIGHT_NORMAL, DWRITE_FONT_STYLE_NORMAL,
+                                                  DWRITE_FONT_STRETCH_NORMAL, 24, L"en-us", &textFormat_));
+  CHECK_HRESULT(textFormat_->SetTextAlignment(DWRITE_TEXT_ALIGNMENT_LEADING));
+  CHECK_HRESULT(textFormat_->SetParagraphAlignment(DWRITE_PARAGRAPH_ALIGNMENT_NEAR));
+
+  return S_OK;
+}
+
+#define TEXT_TOP 20
+#define TEXT_LEFT 20
+void DXEngine::RenderUI(std::shared_ptr<OneRenderTask> task) {
+  D2D1_SIZE_F rtSize = d2d1renderTargets_[m_frameIndex]->GetSize();
+  D2D1_RECT_F textRect = D2D1::RectF(TEXT_LEFT, TEXT_TOP, rtSize.width - TEXT_LEFT, 100);
+  //    static const WCHAR text[] = L"11On12";
+
+  // Acquire our wrapped render target resource for the current back buffer.
+  d11on12device_->AcquireWrappedResources(wrappedBackBuffers_[m_frameIndex].GetAddressOf(), 1);
+
+  // Render text directly to the back buffer.
+  d2d1deviceContext_->SetTarget(d2d1renderTargets_[m_frameIndex].Get());
+  d2d1deviceContext_->BeginDraw();
+  d2d1deviceContext_->SetTransform(D2D1::Matrix3x2F::Identity());
+  d2d1deviceContext_->DrawText(task->screen_msg.c_str(), (UINT32)task->screen_msg.length(), textFormat_.Get(),
+                               &textRect, textBrush_.Get());
+  CHECK_HRESULT_0(d2d1deviceContext_->EndDraw());
+
+  // Release our wrapped render target resource. Releasing
+  // transitions the back buffer resource to the state specified
+  // as the OutState when the wrapped resource was created.
+  d11on12device_->ReleaseWrappedResources(wrappedBackBuffers_[m_frameIndex].GetAddressOf(), 1);
+
+  // Flush to submit the 11 command list to the shared command queue.
+  d11on12device_context_->Flush();
+}
+
+// Load the sample assets.
+int DXEngine::LoadAssets() {
+  // Create the root signature.
+  {
+    HRESULT hr;
+    D3D12_FEATURE_DATA_ROOT_SIGNATURE featureData = {};
+
+    // This is the highest version the sample supports. If CheckFeatureSupport succeeds, the HighestVersion returned
+    // will not be greater than this.
+    featureData.HighestVersion = D3D_ROOT_SIGNATURE_VERSION_1_1;
+
+    if (FAILED(m_device->CheckFeatureSupport(D3D12_FEATURE_ROOT_SIGNATURE, &featureData, sizeof(featureData)))) {
+      featureData.HighestVersion = D3D_ROOT_SIGNATURE_VERSION_1_0;
+    }
+
+    CD3DX12_DESCRIPTOR_RANGE1 ranges[2];
+    ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 0, 0,
+                   D3D12_DESCRIPTOR_RANGE_FLAG_NONE /*D3D12_DESCRIPTOR_RANGE_FLAG_DATA_STATIC*/);
+    ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 3, 0, 0,
+                   D3D12_DESCRIPTOR_RANGE_FLAG_NONE /*D3D12_DESCRIPTOR_RANGE_FLAG_DATA_STATIC*/);
+
+    CD3DX12_ROOT_PARAMETER1 rootParameters[1];
+    rootParameters[0].InitAsDescriptorTable(2, &ranges[0], D3D12_SHADER_VISIBILITY_PIXEL);
+
+    D3D12_STATIC_SAMPLER_DESC sampler = {};
+    sampler.Filter = D3D12_FILTER_MIN_MAG_MIP_POINT;
+    sampler.AddressU = D3D12_TEXTURE_ADDRESS_MODE_BORDER;
+    sampler.AddressV = D3D12_TEXTURE_ADDRESS_MODE_BORDER;
+    sampler.AddressW = D3D12_TEXTURE_ADDRESS_MODE_BORDER;
+    sampler.MipLODBias = 0;
+    sampler.MaxAnisotropy = 0;
+    sampler.ComparisonFunc = D3D12_COMPARISON_FUNC_NEVER;
+    sampler.BorderColor = D3D12_STATIC_BORDER_COLOR_TRANSPARENT_BLACK;
+    sampler.MinLOD = 0.0f;
+    sampler.MaxLOD = D3D12_FLOAT32_MAX;
+    sampler.ShaderRegister = 0;
+    sampler.RegisterSpace = 0;
+    sampler.ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;
+    D3D12_STATIC_SAMPLER_DESC sampler1 = sampler;
+    sampler1.ShaderRegister = 1;
+
+    D3D12_STATIC_SAMPLER_DESC samplers[2] = {sampler, sampler1};
+
+    CD3DX12_VERSIONED_ROOT_SIGNATURE_DESC rootSignatureDesc;
+    rootSignatureDesc.Init_1_1(_countof(rootParameters), rootParameters, 2, samplers,
+                               D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);
+
+    ComPtr<ID3DBlob> signature;
+    ComPtr<ID3DBlob> error;
+    hr = D3DX12SerializeVersionedRootSignature(&rootSignatureDesc, featureData.HighestVersion, &signature, &error);
+    if (FAILED(hr)) {
+      void* ss = error->GetBufferPointer();
+      wprintf(L"%s\n", (wchar_t*)ss);
+    }
+    CHECK_HRESULT(m_device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(),
+                                                IID_PPV_ARGS(&m_rootSignature)));
+
+    // create CbPixShData buffer
+    const D3D12_HEAP_TYPE heapType = D3D12_HEAP_TYPE_UPLOAD;
+    const D3D12_RESOURCE_STATES state = D3D12_RESOURCE_STATE_GENERIC_READ;
+    D3D12_HEAP_PROPERTIES heapProp;
+    heapProp.Type = heapType;
+    heapProp.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;  //
+    heapProp.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
+    heapProp.CreationNodeMask = 0;
+    heapProp.VisibleNodeMask = 0;
+
+    const D3D12_RESOURCE_FLAGS flags = D3D12_RESOURCE_FLAG_NONE;
+    hr = m_device->CreateCommittedResource(&heapProp, D3D12_HEAP_FLAG_NONE, &CD3DX12_RESOURCE_DESC::Buffer(256, flags),
+                                           state, NULL, IID_PPV_ARGS(&hdr_buffer_));
+    CHECK_HRESULT(hr);
+    void* ptr = 0;
+    hr = hdr_buffer_->Map(0, NULL, &ptr);
+    if (FAILED(hr)) return 0;
+    CbPixShData buffer = {};
+    buffer.scale_factor = 1.0;
+    buffer.hdr_10_10_10_2 = 0;
+    memcpy(ptr, &buffer, sizeof(CbPixShData));
+    hdr_buffer_->Unmap(0, 0);
+  }
+
+  // Create the pipeline state, which includes compiling and loading shaders.
+  {
+    // Define the vertex input layout.
+    D3D12_INPUT_ELEMENT_DESC inputElementDescs[] = {
+        {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0},
+        {"TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0}};
+
+    // Describe and create the graphics pipeline state object (PSO).
+    D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {};
+    psoDesc.InputLayout = {inputElementDescs, _countof(inputElementDescs)};
+    psoDesc.pRootSignature = m_rootSignature.Get();
+    psoDesc.VS = CD3DX12_SHADER_BYTECODE(SampleVertexShader, sizeof(SampleVertexShader));
+    psoDesc.PS = CD3DX12_SHADER_BYTECODE(SamplePixelShader, sizeof(SamplePixelShader));
+    psoDesc.RasterizerState = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT);
+    psoDesc.BlendState = CD3DX12_BLEND_DESC(D3D12_DEFAULT);
+    psoDesc.DepthStencilState.DepthEnable = FALSE;
+    psoDesc.DepthStencilState.StencilEnable = FALSE;
+    psoDesc.SampleMask = UINT_MAX;
+    psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
+    psoDesc.NumRenderTargets = 1;
+    psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
+    psoDesc.SampleDesc.Count = 1;
+    HRESULT hr = m_device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&m_pipelineState));
+    CHECK_HRESULT(hr);
+  }
+
+  // Create the command list.
+  CHECK_HRESULT(m_device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, m_commandAllocator.Get(),
+                                            m_pipelineState.Get(), IID_PPV_ARGS(&m_commandList)));
+  SET_D3D12_NAME(m_commandList);
+  setVertexes();
+
+  // Close the command list and execute it to begin the initial GPU setup.
+  CHECK_HRESULT(m_commandList->Close());
+  ID3D12CommandList* ppCommandLists[] = {m_commandList.Get()};
+  m_commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
+
+  // Create synchronization objects and wait until assets have been uploaded to the GPU.
+  {
+    CHECK_HRESULT(m_device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&m_fence)));
+    m_fenceValue = 1;
+
+    // Create an event handle to use for frame synchronization.
+    m_fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
+    if (m_fenceEvent == nullptr) {
+      CHECK_HRESULT(HRESULT_FROM_WIN32(GetLastError()));
+    }
+
+    // Wait for the command list to execute; we are reusing the same command
+    // list in our main loop but for now, we just want to wait for setup to
+    // complete before continuing.
+    WaitForPreviousFrame();
+  }
+  return S_OK;
+}
+
+void DXEngine::setVertexes() {
+  fRect startRect = {-1.0f, 1.0f, 1.0f, -1.0f};
+  float texture_right = 1.0;
+  Vertex triangleVertices[] = {{{startRect.left, startRect.top, 0.0f}, {0.0f, 0.0f}},
+                               {{startRect.right, startRect.top, 0.0f}, {texture_right, 0.0f}},
+                               {{startRect.left, startRect.bottom, 0.0f}, {0.0f, 1.0f}},
+                               {{startRect.right, startRect.bottom, 0.0f}, {texture_right, 1.0f}}};
+  const UINT vertexesNumber = sizeof(triangleVertices) / sizeof(Vertex);
+  setVertexes(triangleVertices, vertexesNumber);
+}
+
+void DXEngine::OnResize() {
+  std::unique_lock<std::mutex> lck(cs_render_);
+  if (hasTexture_) {
+    CHECK_HRESULT_0(m_commandAllocator->Reset());
+    CHECK_HRESULT_0(m_commandList->Reset(m_commandAllocator.Get(), m_pipelineState.Get()));
+    setVertexes();
+    InternalRender(FALSE, 0);
+  }
+}
+
+// Update frame-based values.
+void DXEngine::OnUpdate() {}
+
+// Render the scene.
+void DXEngine::OnRender(std::shared_ptr<OneRenderTask> task) {
+  if (ready_ && task && !task->shown) {
+    InternalRender(TRUE, task);
+  }
+}
+
+void DXEngine::InternalRender(BOOL reset, std::shared_ptr<OneRenderTask> task) {
+  // Record all the commands we need to render the scene into the command list.
+  PopulateCommandList(reset, task);
+
+  // Execute the command list.
+  ID3D12CommandList* ppCommandLists[] = {m_commandList.Get()};
+  m_commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
+
+  RenderUI(task);
+
+  // Present the frame.
+  CHECK_HRESULT_0(m_swapChain->Present(1, 0));
+
+  WaitForPreviousFrame();
+}
+
+void DXEngine::OnDestroy() {
+  // Ensure that the GPU is no longer referencing resources that are about to be
+  // cleaned up by the destructor.
+  WaitForPreviousFrame();
+  ready_ = 0;
+  CloseHandle(m_fenceEvent);
+}
+
+void DXEngine::PopulateCommandList(BOOL reset, std::shared_ptr<OneRenderTask> task) {
+  // Command list allocators can only be reset when the associated
+  // command lists have finished execution on the GPU; apps should use
+  // fences to determine GPU execution progress.
+  if (reset) {
+    CHECK_HRESULT_0(m_commandAllocator->Reset());
+
+    // However, when ExecuteCommandList() is called on a particular command
+    // list, that command list can then be reset at any time and must be before
+    // re-recording.
+    CHECK_HRESULT_0(m_commandList->Reset(m_commandAllocator.Get(), m_pipelineState.Get()));
+  }
+
+  CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(m_rtvHeap->GetCPUDescriptorHandleForHeapStart(), m_frameIndex,
+                                          m_rtvDescriptorSize);
+  m_commandList->OMSetRenderTargets(1, &rtvHandle, FALSE, nullptr);
+  m_commandList->ResourceBarrier(
+      1, &CD3DX12_RESOURCE_BARRIER::Transition(m_renderTargets[m_frameIndex].Get(), D3D12_RESOURCE_STATE_PRESENT,
+                                               D3D12_RESOURCE_STATE_RENDER_TARGET));
+  const float clearColor[] = {0.0f, 0.2f, 0.4f, 1.0f};
+  m_commandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);
+
+  if (task) {
+    if (task->type != renderType_) {
+      void* ptr = 0;
+      HRESULT hr = hdr_buffer_->Map(0, NULL, &ptr);
+      CHECK_HRESULT_0(hr);
+      renderType_ = task->type;
+      CbPixShData* cbptr = (CbPixShData*)ptr;
+      cbptr->scale_factor = renderType_ == ft10bit ? 64.0f : 1.0f;
+      cbptr->hdr_10_10_10_2 = renderType_ == ft10x3 ? 1 : 0;
+      if (cbptr->hdr_10_10_10_2) {
+        D3D12_RESOURCE_DESC desc = task->textures[0]->GetDesc();
+        cbptr->disp_w = lround(m_viewport.Width);
+        cbptr->disp_h = lround(m_viewport.Height);
+        cbptr->frame_w = task->render_width;
+        cbptr->frame_h = task->render_height;
+        cbptr->ptex_w = (int)desc.Width;
+        cbptr->ptex_h = desc.Height;
+      }
+      hdr_buffer_->Unmap(0, 0);
+    }
+    const D3D12_RESOURCE_BARRIER barriers[3] = {
+        CD3DX12_RESOURCE_BARRIER::Transition(task->textures[0].Get(), D3D12_RESOURCE_STATE_COPY_DEST,
+                                             D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE),
+        CD3DX12_RESOURCE_BARRIER::Transition(task->textures[1].Get(), D3D12_RESOURCE_STATE_COPY_DEST,
+                                             D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE),
+        CD3DX12_RESOURCE_BARRIER::Transition(task->textures[2].Get(), D3D12_RESOURCE_STATE_COPY_DEST,
+                                             D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE)};
+    m_commandList->ResourceBarrier(3, barriers);
+    CD3DX12_CPU_DESCRIPTOR_HANDLE srvHandle(m_srvHeap->GetCPUDescriptorHandleForHeapStart());
+
+    m_commandList->SetGraphicsRootSignature(m_rootSignature.Get());
+
+    D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc{};
+    D3D12_RESOURCE_DESC cbdesk = hdr_buffer_->GetDesc();
+    cbvDesc.BufferLocation = hdr_buffer_->GetGPUVirtualAddress();
+    cbvDesc.SizeInBytes = (UINT)cbdesk.Width;
+    m_device->CreateConstantBufferView(&cbvDesc, srvHandle);
+    srvHandle.Offset(1, m_srvDescriptorSize);
+
+    for (int i = 0; i < 3; i++) {
+      D3D12_RESOURCE_DESC textureDesc = task->textures[i]->GetDesc();
+      D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
+      srvDesc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
+      srvDesc.Format = textureDesc.Format;
+      srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
+      srvDesc.Texture2D.MipLevels = 1;
+      m_device->CreateShaderResourceView(task->textures[i].Get(), &srvDesc, srvHandle);
+      srvHandle.Offset(1, m_srvDescriptorSize);
+    }
+    // Set necessary state.
+
+    ID3D12DescriptorHeap* ppHeaps[] = {m_srvHeap.Get()};
+    m_commandList->SetDescriptorHeaps(_countof(ppHeaps), ppHeaps);
+
+    m_commandList->SetGraphicsRootDescriptorTable(0, m_srvHeap->GetGPUDescriptorHandleForHeapStart());
+    m_commandList->RSSetViewports(1, &m_viewport);
+    m_commandList->RSSetScissorRects(1, &m_scissorRect);
+
+    // Record commands.
+    m_commandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP /*D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST*/);
+    m_commandList->IASetVertexBuffers(0, 1, &m_vertexBufferView);
+    m_commandList->DrawInstanced(4, 1, 0, 0);
+    {
+      const D3D12_RESOURCE_BARRIER barriers[3] = {
+          CD3DX12_RESOURCE_BARRIER::Transition(task->textures[0].Get(), D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE,
+                                               D3D12_RESOURCE_STATE_COPY_DEST),
+          CD3DX12_RESOURCE_BARRIER::Transition(task->textures[1].Get(), D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE,
+                                               D3D12_RESOURCE_STATE_COPY_DEST),
+          CD3DX12_RESOURCE_BARRIER::Transition(task->textures[2].Get(), D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE,
+                                               D3D12_RESOURCE_STATE_COPY_DEST)};
+      m_commandList->ResourceBarrier(3, barriers);
+    }
+  }
+
+  CHECK_HRESULT_0(m_commandList->Close());
+}
+
+void DXEngine::WaitForPreviousFrame() {
+  // WAITING FOR THE FRAME TO COMPLETE BEFORE CONTINUING IS NOT BEST PRACTICE.
+  // This is code implemented as such for simplicity. The D3D12HelloFrameBuffering
+  // sample illustrates how to use fences for efficient resource usage and to
+  // maximize GPU utilization.
+
+  // Signal and increment the fence value.
+  const UINT64 fence = m_fenceValue;
+  CHECK_HRESULT_0(m_commandQueue->Signal(m_fence.Get(), fence));
+  m_fenceValue++;
+
+  // Wait until the previous frame is finished.
+  if (m_fence->GetCompletedValue() < fence) {
+    CHECK_HRESULT_0(m_fence->SetEventOnCompletion(fence, m_fenceEvent));
+    WaitForSingleObject(m_fenceEvent, INFINITE);
+  }
+
+  m_frameIndex = m_swapChain->GetCurrentBackBufferIndex();
+}
+
+// Helper function for acquiring the first available hardware adapter that supports Direct3D 12.
+// If no such adapter can be found, *ppAdapter will be set to nullptr.
+_Use_decl_annotations_ void DXEngine::GetHardwareAdapter(IDXGIFactory2* pFactory, IDXGIAdapter1** ppAdapter) {
+  ComPtr<IDXGIAdapter1> adapter;
+  *ppAdapter = nullptr;
+
+  for (UINT adapterIndex = 0; DXGI_ERROR_NOT_FOUND != pFactory->EnumAdapters1(adapterIndex, &adapter); ++adapterIndex) {
+    DXGI_ADAPTER_DESC1 desc;
+    adapter->GetDesc1(&desc);
+
+    if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) {
+      // Don't select the Basic Render Driver adapter.
+      // If you want a software adapter, pass in "/warp" on the command line.
+      continue;
+    }
+
+    // Check to see if the adapter supports Direct3D 12, but don't create the
+    // actual device yet.
+    if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0, _uuidof(ID3D12Device), nullptr))) {
+      break;
+    }
+  }
+
+  *ppAdapter = adapter.Detach();
+}
+
+void DXEngine::setVertexes(Vertex* vertexes, UINT size) {
+  UINT vertexBufferSize = sizeof(Vertex) * size;
+  CHECK_HRESULT_0(m_device->CreateCommittedResource(
+      &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), D3D12_HEAP_FLAG_NONE,
+      &CD3DX12_RESOURCE_DESC::Buffer(vertexBufferSize), D3D12_RESOURCE_STATE_GENERIC_READ, nullptr,
+      IID_PPV_ARGS(&vertexUploadBuffer_)));
+  SET_D3D12_NAME(vertexUploadBuffer_);
+
+  // Copy the triangle data to the vertex buffer.
+  UINT8* pVertexDataBegin;
+  CD3DX12_RANGE readRange(0, 0);  // We do not intend to read from this resource on the CPU.
+  CHECK_HRESULT_0(vertexUploadBuffer_->Map(0, &readRange, reinterpret_cast<void**>(&pVertexDataBegin)));
+  memcpy(pVertexDataBegin, vertexes, vertexBufferSize);
+  vertexUploadBuffer_->Unmap(0, nullptr);
+
+  CHECK_HRESULT_0(
+      m_device->CreateCommittedResource(&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), D3D12_HEAP_FLAG_NONE,
+                                        &CD3DX12_RESOURCE_DESC::Buffer(vertexBufferSize),
+                                        D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&m_vertexBuffer)));
+  SET_D3D12_NAME(m_vertexBuffer);
+  if (m_commandList) {
+    m_commandList->CopyBufferRegion(m_vertexBuffer.Get(), 0, vertexUploadBuffer_.Get(), 0, vertexBufferSize);
+    const D3D12_RESOURCE_BARRIER barriers[1] = {CD3DX12_RESOURCE_BARRIER::Transition(
+        m_vertexBuffer.Get(), D3D12_RESOURCE_STATE_COPY_DEST,
+        D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER /*D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE*/)};
+    m_commandList->ResourceBarrier(1, barriers);
+  }
+
+  // Initialize the vertex buffer view.
+  m_vertexBufferView.BufferLocation = m_vertexBuffer->GetGPUVirtualAddress();
+  m_vertexBufferView.StrideInBytes = sizeof(Vertex);
+  m_vertexBufferView.SizeInBytes = vertexBufferSize;
+}

diff --git a/testing/uwp_dx12_player/dxengine.h b/testing/uwp_dx12_player/dxengine.h
new file mode 100644
index 0000000..f8dad75
--- /dev/null
+++ b/testing/uwp_dx12_player/dxengine.h

@@ -0,0 +1,187 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+#include "pch.h"
+
+#include <mutex>
+#include "common\log.h"
+#include <memory>
+
+#include <d3d11on12.h>
+#include <d2d1_3.h>
+#include <dwrite.h>
+
+using namespace DirectX;
+
+// Note that while ComPtr is used to manage the lifetime of resources on the CPU,
+// it has no understanding of the lifetime of resources on the GPU. Apps must account
+// for the GPU lifetime of resources to avoid destroying objects that may still be
+// referenced by the GPU.
+// An example of this can be found in the class method: OnDestroy().
+using Microsoft::WRL::ComPtr;
+
+#define CHECK_HRESULT(hr)                  \
+  {                                        \
+    HRESULT _hr_ = hr;                     \
+    if (FAILED(_hr_)) {                    \
+      XB_LOGE << "DirectX error "          \
+              << "0x" << std::hex << _hr_; \
+      return _hr_;                         \
+    }                                      \
+  }
+
+#define CHECK_HRESULT_0(hr)                \
+  {                                        \
+    HRESULT _hr_ = hr;                     \
+    if (FAILED(_hr_)) {                    \
+      XB_LOGE << "DirectX error "          \
+              << "0x" << std::hex << _hr_; \
+    }                                      \
+  }
+
+#define SET_D3D12_NAME(x) x->SetName(L#x);
+#define SET_D3D12_NAME_ARR(x, i)           \
+  {                                        \
+    wchar_t ss[64];                        \
+    swprintf_s(ss, 64, L"%s[%d]", L#x, i); \
+    x[i]->SetName(ss);                     \
+  }
+
+typedef enum { ftNone = -1, ft8bit = 0, ft10bit, ft10x3 } frame_type;
+
+struct OneRenderTask {
+  ComPtr<ID3D12Resource> textures[3];
+  uint32_t render_width;
+  uint32_t render_height;
+  frame_type type;
+  int shown;
+  std::wstring screen_msg;
+};
+
+class DXEngine
+{
+ public:
+  struct fRect {
+    float left;
+    float top;
+    float right;
+    float bottom;
+  };
+  DXEngine(Windows::UI::Core::CoreWindow ^ window);
+
+  int OnInit();
+  void OnUpdate();
+  void OnRender(std::shared_ptr<OneRenderTask> task);
+  void OnDestroy();
+  void OnResize();
+  int Ready() { return ready_; }
+
+  ID3D12Device* Device() { return m_device.Get(); }
+
+ private:
+  void GetHardwareAdapter(IDXGIFactory2* pFactory, IDXGIAdapter1** ppAdapter);
+  void RenderUI(std::shared_ptr<OneRenderTask> task);
+  void UpdateRenderTargetSize();
+
+ private:
+  static const UINT FrameCount = 4;
+  static const UINT TextureWidth = 256;
+  static const UINT TextureHeight = 256;
+  static const UINT TexturePixelSize = 4;  // The number of bytes used to represent a pixel in the texture.
+
+  BOOL first_time_ = 1;
+
+  struct Vertex {
+    XMFLOAT3 position;
+    XMFLOAT2 uv;
+  };
+
+  // Pipeline objects.
+  CD3DX12_VIEWPORT m_viewport;
+  CD3DX12_RECT m_scissorRect;
+  ComPtr<IDXGISwapChain3> m_swapChain;
+  ComPtr<ID3D12Device> m_device;
+  ComPtr<ID3D12Resource> m_renderTargets[FrameCount];
+  ComPtr<ID3D12CommandAllocator> m_commandAllocator;
+  ComPtr<ID3D12CommandQueue> m_commandQueue;
+  ComPtr<ID3D12RootSignature> m_rootSignature;
+  ComPtr<ID3D12DescriptorHeap> m_rtvHeap;
+  ComPtr<ID3D12DescriptorHeap> m_srvHeap;
+  ComPtr<ID3D12PipelineState> m_pipelineState;
+  ComPtr<ID3D12GraphicsCommandList> m_commandList;
+  UINT m_rtvDescriptorSize;
+  UINT m_srvDescriptorSize;
+
+  // App resources.
+  ComPtr<ID3D12Resource> m_vertexBuffer;
+  D3D12_VERTEX_BUFFER_VIEW m_vertexBufferView;
+
+  ComPtr<ID3D12Resource> textureUploadHeap_[3];
+  ComPtr<ID3D12Resource> textureDefaultHeap_[3];
+
+  ComPtr<ID3D12Resource> textures_[3];
+
+  ComPtr<ID3D12Resource> vertexUploadBuffer_;
+  ComPtr<ID3D12Resource> downloadBuffer_;
+  ComPtr<ID3D12Resource> hdr_buffer_;
+
+  ComPtr<ID3D11On12Device> d11on12device_;
+  ComPtr<ID3D11DeviceContext> d11on12device_context_;
+  ComPtr<ID2D1Factory3> d2d1factory_;
+  ComPtr<IDWriteFactory> d2writeFactory_;
+  ComPtr<ID2D1Device2> d2d1device2_;
+  ComPtr<ID2D1DeviceContext2> d2d1deviceContext_;
+  ComPtr<ID3D11Resource> wrappedBackBuffers_[FrameCount];
+  ComPtr<ID2D1Bitmap1> d2d1renderTargets_[FrameCount];
+
+  ComPtr<ID2D1SolidColorBrush> textBrush_;
+  ComPtr<IDWriteTextFormat> textFormat_;
+
+  // Synchronization objects.
+  UINT m_frameIndex;
+  HANDLE m_fenceEvent;
+  ComPtr<ID3D12Fence> m_fence;
+  UINT64 m_fenceValue;
+
+  int LoadPipeline();
+  int LoadAssets();
+  void InternalRender(BOOL reset, std::shared_ptr<OneRenderTask> task);
+  void setVertexes(Vertex* vertexes, UINT size);
+  void setVertexes();
+  void PopulateCommandList(BOOL reset, std::shared_ptr<OneRenderTask> task);
+  void WaitForPreviousFrame();
+
+  BOOL hasTexture_ = FALSE;
+  std::mutex cs_render_;
+  unsigned char color_offset_ = 0;
+  UINT renderWidth_ = 0;
+  UINT renderHeight_ = 0;
+  frame_type renderType_ = ftNone;
+
+  UINT strides_[3];
+  int upload_buffer_size_[3];
+  UINT download_buffer_size_ = 0;
+  UINT m_width;
+  UINT m_height;
+  bool m_useWarpDevice = false;
+  int ready_ = 0;
+  Platform::Agile<Windows::UI::Core::CoreWindow> m_window;
+  Windows::Foundation::Size m_logicalSize;
+  float m_dpi;
+  float m_effectiveDpi;
+  std::mutex cs_;
+};

diff --git a/testing/uwp_dx12_player/hlsl_cb.h b/testing/uwp_dx12_player/hlsl_cb.h
new file mode 100644
index 0000000..0e85170
--- /dev/null
+++ b/testing/uwp_dx12_player/hlsl_cb.h

@@ -0,0 +1,21 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifdef __cplusplus
+#define CONSTANT_BUFFER_DEF(name, reg) struct name
+#else
+#define CONSTANT_BUFFER_DEF(name, reg) cbuffer name : register(reg)
+#endif

diff --git a/testing/uwp_dx12_player/pch.cpp b/testing/uwp_dx12_player/pch.cpp
new file mode 100644
index 0000000..1d9f38c
--- /dev/null
+++ b/testing/uwp_dx12_player/pch.cpp

@@ -0,0 +1 @@
+#include "pch.h"

diff --git a/testing/uwp_dx12_player/pch.h b/testing/uwp_dx12_player/pch.h
new file mode 100644
index 0000000..3a2906a
--- /dev/null
+++ b/testing/uwp_dx12_player/pch.h

@@ -0,0 +1,18 @@
+#pragma once
+
+#include <wrl.h>
+#include <wrl/client.h>
+#include <dxgi1_4.h>
+#include <d3d12.h>
+#include "common\d3dx12.h"
+#include <pix.h>
+#include <DirectXColors.h>
+#include <DirectXMath.h>
+#include <memory>
+#include <vector>
+#include <agile.h>
+#include <concrt.h>
+
+#if defined(_DEBUG)
+#include <dxgidebug.h>
+#endif

diff --git a/testing/uwp_dx12_player/playerwnd.cpp b/testing/uwp_dx12_player/playerwnd.cpp
new file mode 100644
index 0000000..d6b0af2
--- /dev/null
+++ b/testing/uwp_dx12_player/playerwnd.cpp

@@ -0,0 +1,104 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "pch.h"
+#include "playerwnd.h"
+
+static std::string get_log_path(std::string& root_path) {
+  char ret[512];
+  std::string log_folder(root_path + "\\logs");
+  std::wstring log_folder_w(log_folder.begin(), log_folder.end());
+  _wmkdir(log_folder_w.c_str());
+  struct tm* cur_time;
+  __time64_t long_time;
+  _time64(&long_time);
+  cur_time = _localtime64(&long_time);
+  sprintf_s(ret, 512, "%s\\%4d-%2d-%2d_%2d-%2d-%d.txt", log_folder.c_str(), cur_time->tm_year + 1900,
+            cur_time->tm_mon + 1, cur_time->tm_mday, cur_time->tm_hour, cur_time->tm_min, cur_time->tm_sec);
+  return ret;
+}
+
+void PlayerWindow::InitializeLog(logging::LogPriority level) {
+  std::string wpath;
+  std::string rpath;
+  std::string rootpath(_ROOT_FOLDER_);
+  Decoders::GetAvailablePaths(rpath, wpath, rootpath);
+  logging::set_log_priority(level);
+  if (WRITE_LOG_TO_FILE) logging::set_log_file(get_log_path(wpath).c_str());
+}
+
+void PlayerWindow::AddSource(AV1Decoder* source) {
+  std::unique_lock<std::mutex> lck(cs_);
+  decoder_ = source;
+}
+void PlayerWindow::RemoveSource(AV1Decoder* source) {
+  std::unique_lock<std::mutex> lck(cs_);
+  decoder_ = 0;
+}
+void PlayerWindow::AllJobDoneNotify() {
+  all_jobs_done_ = 1;
+  cv_finish_.notify_one();
+}
+
+void PlayerWindow::StopDecode(decodeTask stop_task) {
+  if (decoders_) decoders_->Stop(stop_task);
+}
+
+int PlayerWindow::Run() {
+  pDx12Engine_.reset(new DXEngine(window_.Get()));
+  if (pDx12Engine_->OnInit()) return -1;
+  d3d_resources dx_resources(pDx12Engine_->Device(), 0);
+  decoders_ =
+      std::shared_ptr<Decoders>(new Decoders(1, dx_resources, static_cast<DisplayCallbacks*>(this), _ROOT_FOLDER_));
+  decoders_->Start();
+  return 1;
+}
+
+int PlayerWindow::ShowFrame() {
+  if (pDx12Engine_) {
+    std::unique_lock<std::mutex> lck(cs_);
+    pDx12Engine_->OnRender(GetFrame());
+  }
+  return 0;
+}
+
+std::shared_ptr<OneRenderTask> PlayerWindow::GetFrame() {
+  if (decoder_) {
+    StreamStats stats;
+    std::shared_ptr<OutBufferWrapper> frame = decoder_->GetFrame(&stats);
+    if (frame) {
+      OutBuffer* buffer = frame->Buffer();
+      std::shared_ptr<OneRenderTask> task(new OneRenderTask);
+      task->render_width = buffer->renderWidth;
+      task->render_height = buffer->renderHeight;
+      task->textures[0] = buffer->d3d12textures[0];
+      task->textures[1] = buffer->d3d12textures[1];
+      task->textures[2] = buffer->d3d12textures[2];
+      task->shown = buffer->shown;
+      buffer->shown = 1;
+      task->type = (frame_type)buffer->fb_type;
+      wchar_t screen_msg[1024];
+      swprintf_s(screen_msg, 1024, L"%s\n%dx%d %dbit%s: %d (%.1f/%d)",
+                 std::wstring(stats.fname.begin(), stats.fname.end()).c_str(), buffer->renderWidth,
+                 buffer->renderHeight, (buffer->fb_type == fbt8bit) ? 8 : 10,
+                 (buffer->fb_type == fbt10x3) ? L"(10x3)" : L"", buffer->frame_no + buffer->frame_no_offset,
+                 stats.frate, stats.dropped);
+      task->screen_msg = screen_msg;
+      return task;
+    }
+  }
+  return 0;
+}
\ No newline at end of file

diff --git a/testing/uwp_dx12_player/playerwnd.h b/testing/uwp_dx12_player/playerwnd.h
new file mode 100644
index 0000000..413cf49
--- /dev/null
+++ b/testing/uwp_dx12_player/playerwnd.h

@@ -0,0 +1,62 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#pragma once
+
+#include "DXEngine.h"
+#include <string>
+#include <thread>
+#include <mutex>
+#include "player_common/decoder_creator.h"
+#include "player_common/decoder.h"
+
+#define _ROOT_FOLDER_ "G:\\test_data_av1\\av1"
+#define WRITE_LOG_TO_FILE 1
+
+class PlayerWindow : public DisplayCallbacks {
+ public:
+  PlayerWindow(Windows::UI::Core::CoreWindow ^ window) : window_(window) {}
+  static void InitializeLog(logging::LogPriority level);
+  int ShowFrame();
+  int Run();
+  void StopDecode(decodeTask stop_task);
+  int isJobsDone() { return all_jobs_done_; }
+
+  void WaitForFinishAll() {
+    std::unique_lock<std::mutex> lock(cs_finish_);
+    cv_finish_.wait(lock, [this] { return all_jobs_done_; });
+  }
+
+  // DisplayCallbacks
+  virtual void AddSource(AV1Decoder* source) final;
+  virtual void RemoveSource(AV1Decoder* source) final;
+  virtual void AllJobDoneNotify() final;
+
+ private:
+  std::shared_ptr<OneRenderTask> GetFrame();
+
+  BOOL running_ = FALSE;
+
+  std::mutex cs_;
+  std::unique_ptr<DXEngine> pDx12Engine_;
+  std::shared_ptr<Decoders> decoders_;
+  AV1Decoder* decoder_;
+  std::mutex cs_finish_;
+  std::condition_variable cv_finish_;
+  int all_jobs_done_ = 0;
+
+  Platform::Agile<Windows::UI::Core::CoreWindow> window_;
+};
\ No newline at end of file

diff --git a/testing/uwp_dx12_player/ps_cbuf.h b/testing/uwp_dx12_player/ps_cbuf.h
new file mode 100644
index 0000000..e231f57
--- /dev/null
+++ b/testing/uwp_dx12_player/ps_cbuf.h

@@ -0,0 +1,59 @@
+/*
+ * Copyright 2020 Google LLC
+ *
+ */
+
+/*
+ * Copyright (c) 2020, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef _PS_CBUF_H_
+#define _PS_CBUF_H_
+#include "hlsl_cb.h"
+
+// YUV2RGB_MATRIX
+#define MATRIX_BT709 0
+#define MATRIX_BT2020 1
+
+// enum TRANSFER_FUNC
+#define TFUNC_SRGB 0
+#define TFUNC_BT709 1
+#define TFUNC_2084 2  // SMPTE ST 2084
+#define TFUNC_HGL 3   // ARIB STD-B67
+
+// enum PRIMERY_COLORS
+#define PRIM_709 0
+#define PRIM_2020 1
+
+// enum YUV_RANGE
+#define RANGE_LIMITED 0
+#define RANGE_FULL 1
+
+CONSTANT_BUFFER_DEF(CbPixShData, b0) {
+  float scale_factor;
+  int src_matrix;
+  int src_range;
+  int src_func;
+  int src_prim;
+  int targ_func;
+  int targ_prim;
+  float lum_max;
+  float lum_min;
+  int isMonochrome;
+  int hdr_10_10_10_2;
+  int disp_w;
+  int disp_h;
+  int frame_w;
+  int frame_h;
+  int ptex_w;
+  int ptex_h;
+};
+
+#endif  //_PS_CBUF_H_

diff --git a/testing/uwp_dx12_player/uwp_dx12_player.vcxproj b/testing/uwp_dx12_player/uwp_dx12_player.vcxproj
new file mode 100644
index 0000000..c5962d6
--- /dev/null
+++ b/testing/uwp_dx12_player/uwp_dx12_player.vcxproj

@@ -0,0 +1,308 @@
+<?xml version="1.0" encoding="utf-8"?>

+<Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

+  <ItemGroup Label="ProjectConfigurations">

+    <ProjectConfiguration Include="Debug|Win32">

+      <Configuration>Debug</Configuration>

+      <Platform>Win32</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Release|Win32">

+      <Configuration>Release</Configuration>

+      <Platform>Win32</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Debug|x64">

+      <Configuration>Debug</Configuration>

+      <Platform>x64</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Release|x64">

+      <Configuration>Release</Configuration>

+      <Platform>x64</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Debug|ARM">

+      <Configuration>Debug</Configuration>

+      <Platform>ARM</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Release|ARM">

+      <Configuration>Release</Configuration>

+      <Platform>ARM</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Debug|ARM64">

+      <Configuration>Debug</Configuration>

+      <Platform>ARM64</Platform>

+    </ProjectConfiguration>

+    <ProjectConfiguration Include="Release|ARM64">

+      <Configuration>Release</Configuration>

+      <Platform>ARM64</Platform>

+    </ProjectConfiguration>

+  </ItemGroup>

+  <PropertyGroup Label="Globals">

+    <ProjectGuid>{634f5b60-2642-427e-902e-ff6a90965205}</ProjectGuid>

+    <Keyword>DirectXApp</Keyword>

+    <RootNamespace>uwp_dx12_player</RootNamespace>

+    <DefaultLanguage>en-US</DefaultLanguage>

+    <MinimumVisualStudioVersion>14.0</MinimumVisualStudioVersion>

+    <AppContainerApplication>true</AppContainerApplication>

+    <ApplicationType>Windows Store</ApplicationType>

+    <WindowsTargetPlatformVersion>10.0.17763.0</WindowsTargetPlatformVersion>

+    <WindowsTargetPlatformMinVersion>10.0.16299.0</WindowsTargetPlatformMinVersion>

+    <ApplicationTypeRevision>10.0</ApplicationTypeRevision>

+  </PropertyGroup>

+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">

+    <ConfigurationType>Application</ConfigurationType>

+    <UseDebugLibraries>true</UseDebugLibraries>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'" Label="Configuration">

+    <ConfigurationType>Application</ConfigurationType>

+    <UseDebugLibraries>true</UseDebugLibraries>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|ARM64'" Label="Configuration">

+    <ConfigurationType>Application</ConfigurationType>

+    <UseDebugLibraries>true</UseDebugLibraries>

+    <PlatformToolset>v141</PlatformToolset>

+    <UseDotNetNativeToolchain>true</UseDotNetNativeToolchain>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">

+    <ConfigurationType>Application</ConfigurationType>

+    <UseDebugLibraries>true</UseDebugLibraries>

+    <PlatformToolset>v141</PlatformToolset>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">

+    <ConfigurationType>Application</ConfigurationType>

+    <UseDebugLibraries>false</UseDebugLibraries>

+    <WholeProgramOptimization>true</WholeProgramOptimization>

+    <PlatformToolset>v141</PlatformToolset>

+    <UseDotNetNativeToolchain>true</UseDotNetNativeToolchain>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|ARM'" Label="Configuration">

+    <ConfigurationType>Application</ConfigurationType>

+    <UseDebugLibraries>false</UseDebugLibraries>

+    <WholeProgramOptimization>true</WholeProgramOptimization>

+    <PlatformToolset>v141</PlatformToolset>

+    <UseDotNetNativeToolchain>true</UseDotNetNativeToolchain>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|ARM64'" Label="Configuration">

+    <ConfigurationType>Application</ConfigurationType>

+    <UseDebugLibraries>false</UseDebugLibraries>

+    <WholeProgramOptimization>true</WholeProgramOptimization>

+    <PlatformToolset>v141</PlatformToolset>

+    <UseDotNetNativeToolchain>true</UseDotNetNativeToolchain>

+  </PropertyGroup>

+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">

+    <ConfigurationType>Application</ConfigurationType>

+    <UseDebugLibraries>false</UseDebugLibraries>

+    <WholeProgramOptimization>true</WholeProgramOptimization>

+    <PlatformToolset>v141</PlatformToolset>

+    <UseDotNetNativeToolchain>true</UseDotNetNativeToolchain>

+  </PropertyGroup>

+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />

+  <ImportGroup Label="ExtensionSettings">

+    <Import Project="$(VSINSTALLDIR)\Common7\IDE\Extensions\Microsoft\VsGraphics\ImageContentTask.props" />

+    <Import Project="$(VSINSTALLDIR)\Common7\IDE\Extensions\Microsoft\VsGraphics\MeshContentTask.props" />

+    <Import Project="$(VSINSTALLDIR)\Common7\IDE\Extensions\Microsoft\VsGraphics\ShaderGraphContentTask.props" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|ARM'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|ARM64'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|ARM64'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />

+  </ImportGroup>

+  <PropertyGroup Label="UserMacros" />

+  <PropertyGroup>

+    <PackageCertificateKeyFile>uwp_dx12_player_TemporaryKey.pfx</PackageCertificateKeyFile>

+  </PropertyGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'">

+    <Link>

+      <AdditionalDependencies>d3d12.lib;dxgi.lib;dxguid.lib;%(AdditionalDependencies)</AdditionalDependencies>

+      <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories); $(VCInstallDir)\lib\store\arm; $(VCInstallDir)\lib\arm</AdditionalLibraryDirectories>

+    </Link>

+    <ClCompile>

+      <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>

+      <PrecompiledHeaderOutputFile>$(IntDir)pch.pch</PrecompiledHeaderOutputFile>

+      <AdditionalIncludeDirectories>$(ProjectDir);$(IntermediateOutputPath);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <AdditionalOptions>/bigobj %(AdditionalOptions)</AdditionalOptions>

+      <PreprocessorDefinitions>_DEBUG;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|ARM'">

+    <Link>

+      <AdditionalDependencies>d3d12.lib;dxgi.lib;dxguid.lib;%(AdditionalDependencies)</AdditionalDependencies>

+      <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories); $(VCInstallDir)\lib\store\arm; $(VCInstallDir)\lib\arm</AdditionalLibraryDirectories>

+    </Link>

+    <ClCompile>

+      <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>

+      <PrecompiledHeaderOutputFile>$(IntDir)pch.pch</PrecompiledHeaderOutputFile>

+      <AdditionalIncludeDirectories>$(ProjectDir);$(IntermediateOutputPath);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <AdditionalOptions>/bigobj %(AdditionalOptions)</AdditionalOptions>

+      <PreprocessorDefinitions>NDEBUG;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|ARM64'">

+    <Link>

+      <AdditionalDependencies>d3d12.lib;dxgi.lib;dxguid.lib;%(AdditionalDependencies)</AdditionalDependencies>

+      <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories); $(VCInstallDir)\lib\store\arm64; $(VCInstallDir)\lib\arm64</AdditionalLibraryDirectories>

+    </Link>

+    <ClCompile>

+      <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>

+      <PrecompiledHeaderOutputFile>$(IntDir)pch.pch</PrecompiledHeaderOutputFile>

+      <AdditionalIncludeDirectories>$(ProjectDir);$(IntermediateOutputPath);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <AdditionalOptions>/bigobj %(AdditionalOptions)</AdditionalOptions>

+      <PreprocessorDefinitions>_DEBUG;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|ARM64'">

+    <Link>

+      <AdditionalDependencies>d3d12.lib;dxgi.lib;dxguid.lib;%(AdditionalDependencies)</AdditionalDependencies>

+      <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories); $(VCInstallDir)\lib\store\arm64; $(VCInstallDir)\lib\arm64</AdditionalLibraryDirectories>

+    </Link>

+    <ClCompile>

+      <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>

+      <PrecompiledHeaderOutputFile>$(IntDir)pch.pch</PrecompiledHeaderOutputFile>

+      <AdditionalIncludeDirectories>$(ProjectDir);$(IntermediateOutputPath);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <AdditionalOptions>/bigobj %(AdditionalOptions)</AdditionalOptions>

+      <PreprocessorDefinitions>NDEBUG;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">

+    <Link>

+      <AdditionalDependencies>d3d12.lib;dxgi.lib;dxguid.lib;%(AdditionalDependencies)</AdditionalDependencies>

+      <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories); $(VCInstallDir)\lib\store; $(VCInstallDir)\lib</AdditionalLibraryDirectories>

+    </Link>

+    <ClCompile>

+      <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>

+      <PrecompiledHeaderOutputFile>$(IntDir)pch.pch</PrecompiledHeaderOutputFile>

+      <AdditionalIncludeDirectories>$(ProjectDir);$(IntermediateOutputPath);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <AdditionalOptions>/bigobj %(AdditionalOptions)</AdditionalOptions>

+      <PreprocessorDefinitions>_DEBUG;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">

+    <Link>

+      <AdditionalDependencies>d3d12.lib;dxgi.lib;dxguid.lib;%(AdditionalDependencies)</AdditionalDependencies>

+      <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories); $(VCInstallDir)\lib\store; $(VCInstallDir)\lib</AdditionalLibraryDirectories>

+    </Link>

+    <ClCompile>

+      <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>

+      <PrecompiledHeaderOutputFile>$(IntDir)pch.pch</PrecompiledHeaderOutputFile>

+      <AdditionalIncludeDirectories>$(ProjectDir);$(IntermediateOutputPath);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <AdditionalOptions>/bigobj %(AdditionalOptions)</AdditionalOptions>

+      <PreprocessorDefinitions>NDEBUG;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">

+    <Link>

+      <AdditionalDependencies>d3d12.lib;dxgi.lib;dxguid.lib;%(AdditionalDependencies)</AdditionalDependencies>

+      <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories); $(VCInstallDir)\lib\store\amd64; $(VCInstallDir)\lib\amd64</AdditionalLibraryDirectories>

+    </Link>

+    <ClCompile>

+      <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>

+      <PrecompiledHeaderOutputFile>$(IntDir)pch.pch</PrecompiledHeaderOutputFile>

+      <AdditionalIncludeDirectories>$(ProjectDir);$(SolutionDir)libav1\build;$(SolutionDir)libav1;$(SolutionDir)third_party\libwebm;$(SolutionDir)testing;$(IntermediateOutputPath);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <AdditionalOptions>/bigobj %(AdditionalOptions)</AdditionalOptions>

+      <PreprocessorDefinitions>_DEBUG;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+  </ItemDefinitionGroup>

+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">

+    <Link>

+      <AdditionalDependencies>d3d12.lib;dxgi.lib;dxguid.lib;%(AdditionalDependencies)</AdditionalDependencies>

+      <AdditionalLibraryDirectories>%(AdditionalLibraryDirectories); $(VCInstallDir)\lib\store\amd64; $(VCInstallDir)\lib\amd64</AdditionalLibraryDirectories>

+    </Link>

+    <ClCompile>

+      <PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>

+      <PrecompiledHeaderOutputFile>$(IntDir)pch.pch</PrecompiledHeaderOutputFile>

+      <AdditionalIncludeDirectories>$(ProjectDir);$(SolutionDir)libav1\build;$(SolutionDir)libav1;$(SolutionDir)third_party\libwebm;$(SolutionDir)testing;$(IntermediateOutputPath);%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>

+      <AdditionalOptions>/bigobj %(AdditionalOptions)</AdditionalOptions>

+      <PreprocessorDefinitions>NDEBUG;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>

+    </ClCompile>

+  </ItemDefinitionGroup>

+  <ItemGroup>

+    <Image Include="Assets\LockScreenLogo.scale-200.png" />

+    <Image Include="Assets\SplashScreen.scale-200.png" />

+    <Image Include="Assets\Square150x150Logo.scale-200.png" />

+    <Image Include="Assets\Square44x44Logo.scale-200.png" />

+    <Image Include="Assets\Square44x44Logo.targetsize-24_altform-unplated.png" />

+    <Image Include="Assets\StoreLogo.png" />

+    <Image Include="Assets\Wide310x150Logo.scale-200.png" />

+  </ItemGroup>

+  <ItemGroup>

+    <ClInclude Include="App.h" />

+    <ClInclude Include="dxengine.h" />

+    <ClInclude Include="hlsl_cb.h" />

+    <ClInclude Include="pch.h" />

+    <ClInclude Include="playerwnd.h" />

+    <ClInclude Include="ps_cbuf.h" />

+  </ItemGroup>

+  <ItemGroup>

+    <ClCompile Include="App.cpp" />

+    <ClCompile Include="dxengine.cpp" />

+    <ClCompile Include="pch.cpp">

+      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Create</PrecompiledHeader>

+      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Create</PrecompiledHeader>

+      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|ARM'">Create</PrecompiledHeader>

+      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|ARM'">Create</PrecompiledHeader>

+      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|ARM64'">Create</PrecompiledHeader>

+      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|ARM64'">Create</PrecompiledHeader>

+      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Create</PrecompiledHeader>

+      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Create</PrecompiledHeader>

+    </ClCompile>

+    <ClCompile Include="playerwnd.cpp" />

+  </ItemGroup>

+  <ItemGroup>

+    <AppxManifest Include="Package.appxmanifest">

+      <SubType>Designer</SubType>

+    </AppxManifest>

+    <None Include="uwp_dx12_player_TemporaryKey.pfx" />

+  </ItemGroup>

+  <ItemGroup>

+    <FxCompile Include="SamplePixelShader.hlsl">

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Pixel</ShaderType>

+      <ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">5.0</ShaderModel>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Pixel</ShaderType>

+      <ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|x64'">5.0</ShaderModel>

+      <VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">%(Filename)</VariableName>

+      <VariableName Condition="'$(Configuration)|$(Platform)'=='Release|x64'">%(Filename)</VariableName>

+      <HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(ProjectDir)%(Filename).h</HeaderFileOutput>

+      <HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(ProjectDir)%(Filename).h</HeaderFileOutput>

+    </FxCompile>

+    <FxCompile Include="SampleVertexShader.hlsl">

+      <VariableName Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">%(Filename)</VariableName>

+      <HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(ProjectDir)%(Filename).h</HeaderFileOutput>

+      <VariableName Condition="'$(Configuration)|$(Platform)'=='Release|x64'">%(Filename)</VariableName>

+      <HeaderFileOutput Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(ProjectDir)%(Filename).h</HeaderFileOutput>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Vertex</ShaderType>

+      <ShaderModel Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">5.0</ShaderModel>

+      <ShaderType Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Vertex</ShaderType>

+      <ShaderModel Condition="'$(Configuration)|$(Platform)'=='Release|x64'">5.0</ShaderModel>

+    </FxCompile>

+  </ItemGroup>

+  <ItemGroup>

+    <ProjectReference Include="..\uwp_common\uwp_common.vcxproj">

+      <Project>{c7b73550-4ca4-45c1-8436-936afd7f0ca5}</Project>

+    </ProjectReference>

+  </ItemGroup>

+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />

+  <ImportGroup Label="ExtensionTargets">

+    <Import Project="$(VSINSTALLDIR)\Common7\IDE\Extensions\Microsoft\VsGraphics\ImageContentTask.targets" />

+    <Import Project="$(VSINSTALLDIR)\Common7\IDE\Extensions\Microsoft\VsGraphics\MeshContentTask.targets" />

+    <Import Project="$(VSINSTALLDIR)\Common7\IDE\Extensions\Microsoft\VsGraphics\ShaderGraphContentTask.targets" />

+  </ImportGroup>

+</Project>
\ No newline at end of file

diff --git a/testing/uwp_dx12_player/uwp_dx12_player_TemporaryKey.pfx b/testing/uwp_dx12_player/uwp_dx12_player_TemporaryKey.pfx
new file mode 100644
index 0000000..3553d8a
--- /dev/null
+++ b/testing/uwp_dx12_player/uwp_dx12_player_TemporaryKey.pfx
Binary files differ

diff --git a/third_party/libwebm/AUTHORS.TXT b/third_party/libwebm/AUTHORS.TXT
new file mode 100644
index 0000000..9686ac1
--- /dev/null
+++ b/third_party/libwebm/AUTHORS.TXT

@@ -0,0 +1,4 @@
+# Names should be added to this file like so:
+# Name or Organization <email address>
+
+Google Inc.

diff --git a/third_party/libwebm/LICENSE.TXT b/third_party/libwebm/LICENSE.TXT
new file mode 100644
index 0000000..7a6f995
--- /dev/null
+++ b/third_party/libwebm/LICENSE.TXT

@@ -0,0 +1,30 @@
+Copyright (c) 2010, Google Inc. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+  * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+
+  * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+
+  * Neither the name of Google nor the names of its contributors may
+    be used to endorse or promote products derived from this software
+    without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+

diff --git a/third_party/libwebm/PATENTS.TXT b/third_party/libwebm/PATENTS.TXT
new file mode 100644
index 0000000..caedf60
--- /dev/null
+++ b/third_party/libwebm/PATENTS.TXT

@@ -0,0 +1,23 @@
+Additional IP Rights Grant (Patents)
+------------------------------------
+
+"These implementations" means the copyrightable works that implement the WebM
+codecs distributed by Google as part of the WebM Project.
+
+Google hereby grants to you a perpetual, worldwide, non-exclusive, no-charge,
+royalty-free, irrevocable (except as stated in this section) patent license to
+make, have made, use, offer to sell, sell, import, transfer, and otherwise
+run, modify and propagate the contents of these implementations of WebM, where
+such license applies only to those patent claims, both currently owned by
+Google and acquired in the future, licensable by Google that are necessarily
+infringed by these implementations of WebM. This grant does not include claims
+that would be infringed only as a consequence of further modification of these
+implementations. If you or your agent or exclusive licensee institute or order
+or agree to the institution of patent litigation or any other patent
+enforcement activity against any entity (including a cross-claim or
+counterclaim in a lawsuit) alleging that any of these implementations of WebM
+or any code incorporated within any of these implementations of WebM
+constitute direct or contributory patent infringement, or inducement of
+patent infringement, then any patent rights granted to you under this License
+for these implementations of WebM shall terminate as of the date such
+litigation is filed.

diff --git a/third_party/libwebm/README.libaom b/third_party/libwebm/README.libaom
new file mode 100644
index 0000000..1e87afd
--- /dev/null
+++ b/third_party/libwebm/README.libaom

@@ -0,0 +1,20 @@
+URL: https://chromium.googlesource.com/webm/libwebm
+Version: 37d9b860ebbf40cb0f6dcb7a6fef452d798062da
+License: BSD
+License File: LICENSE.txt
+
+Description:
+libwebm is used to handle WebM container I/O.
+
+Local Changes:
+Only keep:
+ - Android.mk
+ - AUTHORS.TXT
+ - common/
+    file_util.cc/h
+    hdr_util.cc/h
+    webmids.h
+ - LICENSE.TXT
+ - mkvmuxer/
+ - mkvparser/
+ - PATENTS.TXT

diff --git a/third_party/libwebm/common/webmids.h b/third_party/libwebm/common/webmids.h
new file mode 100644
index 0000000..fc0c208
--- /dev/null
+++ b/third_party/libwebm/common/webmids.h

@@ -0,0 +1,193 @@
+// Copyright (c) 2012 The WebM project authors. All Rights Reserved.
+//
+// Use of this source code is governed by a BSD-style license
+// that can be found in the LICENSE file in the root of the source
+// tree. An additional intellectual property rights grant can be found
+// in the file PATENTS.  All contributing project authors may
+// be found in the AUTHORS file in the root of the source tree.
+
+#ifndef COMMON_WEBMIDS_H_
+#define COMMON_WEBMIDS_H_
+
+namespace libwebm {
+
+enum MkvId {
+  kMkvEBML = 0x1A45DFA3,
+  kMkvEBMLVersion = 0x4286,
+  kMkvEBMLReadVersion = 0x42F7,
+  kMkvEBMLMaxIDLength = 0x42F2,
+  kMkvEBMLMaxSizeLength = 0x42F3,
+  kMkvDocType = 0x4282,
+  kMkvDocTypeVersion = 0x4287,
+  kMkvDocTypeReadVersion = 0x4285,
+  kMkvVoid = 0xEC,
+  kMkvSignatureSlot = 0x1B538667,
+  kMkvSignatureAlgo = 0x7E8A,
+  kMkvSignatureHash = 0x7E9A,
+  kMkvSignaturePublicKey = 0x7EA5,
+  kMkvSignature = 0x7EB5,
+  kMkvSignatureElements = 0x7E5B,
+  kMkvSignatureElementList = 0x7E7B,
+  kMkvSignedElement = 0x6532,
+  // segment
+  kMkvSegment = 0x18538067,
+  // Meta Seek Information
+  kMkvSeekHead = 0x114D9B74,
+  kMkvSeek = 0x4DBB,
+  kMkvSeekID = 0x53AB,
+  kMkvSeekPosition = 0x53AC,
+  // Segment Information
+  kMkvInfo = 0x1549A966,
+  kMkvTimecodeScale = 0x2AD7B1,
+  kMkvDuration = 0x4489,
+  kMkvDateUTC = 0x4461,
+  kMkvTitle = 0x7BA9,
+  kMkvMuxingApp = 0x4D80,
+  kMkvWritingApp = 0x5741,
+  // Cluster
+  kMkvCluster = 0x1F43B675,
+  kMkvTimecode = 0xE7,
+  kMkvPrevSize = 0xAB,
+  kMkvBlockGroup = 0xA0,
+  kMkvBlock = 0xA1,
+  kMkvBlockDuration = 0x9B,
+  kMkvReferenceBlock = 0xFB,
+  kMkvLaceNumber = 0xCC,
+  kMkvSimpleBlock = 0xA3,
+  kMkvBlockAdditions = 0x75A1,
+  kMkvBlockMore = 0xA6,
+  kMkvBlockAddID = 0xEE,
+  kMkvBlockAdditional = 0xA5,
+  kMkvDiscardPadding = 0x75A2,
+  // Track
+  kMkvTracks = 0x1654AE6B,
+  kMkvTrackEntry = 0xAE,
+  kMkvTrackNumber = 0xD7,
+  kMkvTrackUID = 0x73C5,
+  kMkvTrackType = 0x83,
+  kMkvFlagEnabled = 0xB9,
+  kMkvFlagDefault = 0x88,
+  kMkvFlagForced = 0x55AA,
+  kMkvFlagLacing = 0x9C,
+  kMkvDefaultDuration = 0x23E383,
+  kMkvMaxBlockAdditionID = 0x55EE,
+  kMkvName = 0x536E,
+  kMkvLanguage = 0x22B59C,
+  kMkvCodecID = 0x86,
+  kMkvCodecPrivate = 0x63A2,
+  kMkvCodecName = 0x258688,
+  kMkvCodecDelay = 0x56AA,
+  kMkvSeekPreRoll = 0x56BB,
+  // video
+  kMkvVideo = 0xE0,
+  kMkvFlagInterlaced = 0x9A,
+  kMkvStereoMode = 0x53B8,
+  kMkvAlphaMode = 0x53C0,
+  kMkvPixelWidth = 0xB0,
+  kMkvPixelHeight = 0xBA,
+  kMkvPixelCropBottom = 0x54AA,
+  kMkvPixelCropTop = 0x54BB,
+  kMkvPixelCropLeft = 0x54CC,
+  kMkvPixelCropRight = 0x54DD,
+  kMkvDisplayWidth = 0x54B0,
+  kMkvDisplayHeight = 0x54BA,
+  kMkvDisplayUnit = 0x54B2,
+  kMkvAspectRatioType = 0x54B3,
+  kMkvColourSpace = 0x2EB524,
+  kMkvFrameRate = 0x2383E3,
+  // end video
+  // colour
+  kMkvColour = 0x55B0,
+  kMkvMatrixCoefficients = 0x55B1,
+  kMkvBitsPerChannel = 0x55B2,
+  kMkvChromaSubsamplingHorz = 0x55B3,
+  kMkvChromaSubsamplingVert = 0x55B4,
+  kMkvCbSubsamplingHorz = 0x55B5,
+  kMkvCbSubsamplingVert = 0x55B6,
+  kMkvChromaSitingHorz = 0x55B7,
+  kMkvChromaSitingVert = 0x55B8,
+  kMkvRange = 0x55B9,
+  kMkvTransferCharacteristics = 0x55BA,
+  kMkvPrimaries = 0x55BB,
+  kMkvMaxCLL = 0x55BC,
+  kMkvMaxFALL = 0x55BD,
+  // mastering metadata
+  kMkvMasteringMetadata = 0x55D0,
+  kMkvPrimaryRChromaticityX = 0x55D1,
+  kMkvPrimaryRChromaticityY = 0x55D2,
+  kMkvPrimaryGChromaticityX = 0x55D3,
+  kMkvPrimaryGChromaticityY = 0x55D4,
+  kMkvPrimaryBChromaticityX = 0x55D5,
+  kMkvPrimaryBChromaticityY = 0x55D6,
+  kMkvWhitePointChromaticityX = 0x55D7,
+  kMkvWhitePointChromaticityY = 0x55D8,
+  kMkvLuminanceMax = 0x55D9,
+  kMkvLuminanceMin = 0x55DA,
+  // end mastering metadata
+  // end colour
+  // projection
+  kMkvProjection = 0x7670,
+  kMkvProjectionType = 0x7671,
+  kMkvProjectionPrivate = 0x7672,
+  kMkvProjectionPoseYaw = 0x7673,
+  kMkvProjectionPosePitch = 0x7674,
+  kMkvProjectionPoseRoll = 0x7675,
+  // end projection
+  // audio
+  kMkvAudio = 0xE1,
+  kMkvSamplingFrequency = 0xB5,
+  kMkvOutputSamplingFrequency = 0x78B5,
+  kMkvChannels = 0x9F,
+  kMkvBitDepth = 0x6264,
+  // end audio
+  // ContentEncodings
+  kMkvContentEncodings = 0x6D80,
+  kMkvContentEncoding = 0x6240,
+  kMkvContentEncodingOrder = 0x5031,
+  kMkvContentEncodingScope = 0x5032,
+  kMkvContentEncodingType = 0x5033,
+  kMkvContentCompression = 0x5034,
+  kMkvContentCompAlgo = 0x4254,
+  kMkvContentCompSettings = 0x4255,
+  kMkvContentEncryption = 0x5035,
+  kMkvContentEncAlgo = 0x47E1,
+  kMkvContentEncKeyID = 0x47E2,
+  kMkvContentSignature = 0x47E3,
+  kMkvContentSigKeyID = 0x47E4,
+  kMkvContentSigAlgo = 0x47E5,
+  kMkvContentSigHashAlgo = 0x47E6,
+  kMkvContentEncAESSettings = 0x47E7,
+  kMkvAESSettingsCipherMode = 0x47E8,
+  kMkvAESSettingsCipherInitData = 0x47E9,
+  // end ContentEncodings
+  // Cueing Data
+  kMkvCues = 0x1C53BB6B,
+  kMkvCuePoint = 0xBB,
+  kMkvCueTime = 0xB3,
+  kMkvCueTrackPositions = 0xB7,
+  kMkvCueTrack = 0xF7,
+  kMkvCueClusterPosition = 0xF1,
+  kMkvCueBlockNumber = 0x5378,
+  // Chapters
+  kMkvChapters = 0x1043A770,
+  kMkvEditionEntry = 0x45B9,
+  kMkvChapterAtom = 0xB6,
+  kMkvChapterUID = 0x73C4,
+  kMkvChapterStringUID = 0x5654,
+  kMkvChapterTimeStart = 0x91,
+  kMkvChapterTimeEnd = 0x92,
+  kMkvChapterDisplay = 0x80,
+  kMkvChapString = 0x85,
+  kMkvChapLanguage = 0x437C,
+  kMkvChapCountry = 0x437E,
+  // Tags
+  kMkvTags = 0x1254C367,
+  kMkvTag = 0x7373,
+  kMkvSimpleTag = 0x67C8,
+  kMkvTagName = 0x45A3,
+  kMkvTagString = 0x4487
+};
+
+}  // namespace libwebm
+
+#endif  // COMMON_WEBMIDS_H_

diff --git a/third_party/libwebm/mkvparser/mkvparser.cc b/third_party/libwebm/mkvparser/mkvparser.cc
new file mode 100644
index 0000000..397ebc2
--- /dev/null
+++ b/third_party/libwebm/mkvparser/mkvparser.cc

@@ -0,0 +1,7237 @@
+// Copyright (c) 2012 The WebM project authors. All Rights Reserved.
+//
+// Use of this source code is governed by a BSD-style license
+// that can be found in the LICENSE file in the root of the source
+// tree. An additional intellectual property rights grant can be found
+// in the file PATENTS.  All contributing project authors may
+// be found in the AUTHORS file in the root of the source tree.
+#include "mkvparser/mkvparser.h"
+
+#if defined(_MSC_VER) && _MSC_VER < 1800
+#include <float.h>  // _isnan() / _finite()
+#define MSC_COMPAT
+#endif
+
+#include <cassert>
+#include <cfloat>
+#include <climits>
+#include <cmath>
+#include <cstring>
+#include <memory>
+#include <new>
+
+#include "common/webmids.h"
+
+namespace mkvparser {
+const long long kStringElementSizeLimit = 20 * 1000 * 1000;
+const float MasteringMetadata::kValueNotPresent = FLT_MAX;
+const long long Colour::kValueNotPresent = LLONG_MAX;
+const float Projection::kValueNotPresent = FLT_MAX;
+
+#ifdef MSC_COMPAT
+inline bool isnan(double val) { return !!_isnan(val); }
+inline bool isinf(double val) { return !_finite(val); }
+#else
+inline bool isnan(double val) { return std::isnan(val); }
+inline bool isinf(double val) { return std::isinf(val); }
+#endif  // MSC_COMPAT
+
+template <typename Type>
+Type* SafeArrayAlloc(unsigned long long num_elements, unsigned long long element_size) {
+  if (num_elements == 0 || element_size == 0) return NULL;
+
+  const size_t kMaxAllocSize = 0x80000000;  // 2GiB
+  const unsigned long long num_bytes = num_elements * element_size;
+  if (element_size > (kMaxAllocSize / num_elements)) return NULL;
+  if (num_bytes != static_cast<size_t>(num_bytes)) return NULL;
+
+  return new (std::nothrow) Type[static_cast<size_t>(num_bytes)];
+}
+
+void GetVersion(int& major, int& minor, int& build, int& revision) {
+  major = 1;
+  minor = 0;
+  build = 0;
+  revision = 30;
+}
+
+long long ReadUInt(IMkvReader* pReader, long long pos, long& len) {
+  if (!pReader || pos < 0) return E_FILE_FORMAT_INVALID;
+
+  len = 1;
+  unsigned char b;
+  int status = pReader->Read(pos, 1, &b);
+
+  if (status < 0)  // error or underflow
+    return status;
+
+  if (status > 0)  // interpreted as "underflow"
+    return E_BUFFER_NOT_FULL;
+
+  if (b == 0)  // we can't handle u-int values larger than 8 bytes
+    return E_FILE_FORMAT_INVALID;
+
+  unsigned char m = 0x80;
+
+  while (!(b & m)) {
+    m >>= 1;
+    ++len;
+  }
+
+  long long result = b & (~m);
+  ++pos;
+
+  for (int i = 1; i < len; ++i) {
+    status = pReader->Read(pos, 1, &b);
+
+    if (status < 0) {
+      len = 1;
+      return status;
+    }
+
+    if (status > 0) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result <<= 8;
+    result |= b;
+
+    ++pos;
+  }
+
+  return result;
+}
+
+// Reads an EBML ID and returns it.
+// An ID must at least 1 byte long, cannot exceed 4, and its value must be
+// greater than 0.
+// See known EBML values and EBMLMaxIDLength:
+// http://www.matroska.org/technical/specs/index.html
+// Returns the ID, or a value less than 0 to report an error while reading the
+// ID.
+long long ReadID(IMkvReader* pReader, long long pos, long& len) {
+  if (pReader == NULL || pos < 0) return E_FILE_FORMAT_INVALID;
+
+  // Read the first byte. The length in bytes of the ID is determined by
+  // finding the first set bit in the first byte of the ID.
+  unsigned char temp_byte = 0;
+  int read_status = pReader->Read(pos, 1, &temp_byte);
+
+  if (read_status < 0)
+    return E_FILE_FORMAT_INVALID;
+  else if (read_status > 0)  // No data to read.
+    return E_BUFFER_NOT_FULL;
+
+  if (temp_byte == 0)  // ID length > 8 bytes; invalid file.
+    return E_FILE_FORMAT_INVALID;
+
+  int bit_pos = 0;
+  const int kMaxIdLengthInBytes = 4;
+  const int kCheckByte = 0x80;
+
+  // Find the first bit that's set.
+  bool found_bit = false;
+  for (; bit_pos < kMaxIdLengthInBytes; ++bit_pos) {
+    if ((kCheckByte >> bit_pos) & temp_byte) {
+      found_bit = true;
+      break;
+    }
+  }
+
+  if (!found_bit) {
+    // The value is too large to be a valid ID.
+    return E_FILE_FORMAT_INVALID;
+  }
+
+  // Read the remaining bytes of the ID (if any).
+  const int id_length = bit_pos + 1;
+  long long ebml_id = temp_byte;
+  for (int i = 1; i < id_length; ++i) {
+    ebml_id <<= 8;
+    read_status = pReader->Read(pos + i, 1, &temp_byte);
+
+    if (read_status < 0)
+      return E_FILE_FORMAT_INVALID;
+    else if (read_status > 0)
+      return E_BUFFER_NOT_FULL;
+
+    ebml_id |= temp_byte;
+  }
+
+  len = id_length;
+  return ebml_id;
+}
+
+long long GetUIntLength(IMkvReader* pReader, long long pos, long& len) {
+  if (!pReader || pos < 0) return E_FILE_FORMAT_INVALID;
+
+  long long total, available;
+
+  int status = pReader->Length(&total, &available);
+  if (status < 0 || (total >= 0 && available > total)) return E_FILE_FORMAT_INVALID;
+
+  len = 1;
+
+  if (pos >= available) return pos;  // too few bytes available
+
+  unsigned char b;
+
+  status = pReader->Read(pos, 1, &b);
+
+  if (status != 0) return status;
+
+  if (b == 0)  // we can't handle u-int values larger than 8 bytes
+    return E_FILE_FORMAT_INVALID;
+
+  unsigned char m = 0x80;
+
+  while (!(b & m)) {
+    m >>= 1;
+    ++len;
+  }
+
+  return 0;  // success
+}
+
+// TODO(vigneshv): This function assumes that unsigned values never have their
+// high bit set.
+long long UnserializeUInt(IMkvReader* pReader, long long pos, long long size) {
+  if (!pReader || pos < 0 || (size <= 0) || (size > 8)) return E_FILE_FORMAT_INVALID;
+
+  long long result = 0;
+
+  for (long long i = 0; i < size; ++i) {
+    unsigned char b;
+
+    const long status = pReader->Read(pos, 1, &b);
+
+    if (status < 0) return status;
+
+    result <<= 8;
+    result |= b;
+
+    ++pos;
+  }
+
+  return result;
+}
+
+long UnserializeFloat(IMkvReader* pReader, long long pos, long long size_, double& result) {
+  if (!pReader || pos < 0 || ((size_ != 4) && (size_ != 8))) return E_FILE_FORMAT_INVALID;
+
+  const long size = static_cast<long>(size_);
+
+  unsigned char buf[8];
+
+  const int status = pReader->Read(pos, size, buf);
+
+  if (status < 0)  // error
+    return status;
+
+  if (size == 4) {
+    union {
+      float f;
+      unsigned long ff;
+    };
+
+    ff = 0;
+
+    for (int i = 0;;) {
+      ff |= buf[i];
+
+      if (++i >= 4) break;
+
+      ff <<= 8;
+    }
+
+    result = f;
+  } else {
+    union {
+      double d;
+      unsigned long long dd;
+    };
+
+    dd = 0;
+
+    for (int i = 0;;) {
+      dd |= buf[i];
+
+      if (++i >= 8) break;
+
+      dd <<= 8;
+    }
+
+    result = d;
+  }
+
+  if (mkvparser::isinf(result) || mkvparser::isnan(result)) return E_FILE_FORMAT_INVALID;
+
+  return 0;
+}
+
+long UnserializeInt(IMkvReader* pReader, long long pos, long long size, long long& result_ref) {
+  if (!pReader || pos < 0 || size < 1 || size > 8) return E_FILE_FORMAT_INVALID;
+
+  signed char first_byte = 0;
+  const long status = pReader->Read(pos, 1, (unsigned char*)&first_byte);
+
+  if (status < 0) return status;
+
+  unsigned long long result = first_byte;
+  ++pos;
+
+  for (long i = 1; i < size; ++i) {
+    unsigned char b;
+
+    const long status = pReader->Read(pos, 1, &b);
+
+    if (status < 0) return status;
+
+    result <<= 8;
+    result |= b;
+
+    ++pos;
+  }
+
+  result_ref = static_cast<long long>(result);
+  return 0;
+}
+
+long UnserializeString(IMkvReader* pReader, long long pos, long long size, char*& str) {
+  delete[] str;
+  str = NULL;
+
+  if (size >= LONG_MAX || size < 0 || size > kStringElementSizeLimit) return E_FILE_FORMAT_INVALID;
+
+  // +1 for '\0' terminator
+  const long required_size = static_cast<long>(size) + 1;
+
+  str = SafeArrayAlloc<char>(1, required_size);
+  if (str == NULL) return E_FILE_FORMAT_INVALID;
+
+  unsigned char* const buf = reinterpret_cast<unsigned char*>(str);
+
+  const long status = pReader->Read(pos, static_cast<long>(size), buf);
+
+  if (status) {
+    delete[] str;
+    str = NULL;
+
+    return status;
+  }
+
+  str[required_size - 1] = '\0';
+  return 0;
+}
+
+long ParseElementHeader(IMkvReader* pReader, long long& pos, long long stop, long long& id, long long& size) {
+  if (stop >= 0 && pos >= stop) return E_FILE_FORMAT_INVALID;
+
+  long len;
+
+  id = ReadID(pReader, pos, len);
+
+  if (id < 0) return E_FILE_FORMAT_INVALID;
+
+  pos += len;  // consume id
+
+  if (stop >= 0 && pos >= stop) return E_FILE_FORMAT_INVALID;
+
+  size = ReadUInt(pReader, pos, len);
+
+  if (size < 0 || len < 1 || len > 8) {
+    // Invalid: Negative payload size, negative or 0 length integer, or integer
+    // larger than 64 bits (libwebm cannot handle them).
+    return E_FILE_FORMAT_INVALID;
+  }
+
+  // Avoid rolling over pos when very close to LLONG_MAX.
+  const unsigned long long rollover_check = static_cast<unsigned long long>(pos) + len;
+  if (rollover_check > LLONG_MAX) return E_FILE_FORMAT_INVALID;
+
+  pos += len;  // consume length of size
+
+  // pos now designates payload
+
+  if (stop >= 0 && pos > stop) return E_FILE_FORMAT_INVALID;
+
+  return 0;  // success
+}
+
+bool Match(IMkvReader* pReader, long long& pos, unsigned long expected_id, long long& val) {
+  if (!pReader || pos < 0) return false;
+
+  long long total = 0;
+  long long available = 0;
+
+  const long status = pReader->Length(&total, &available);
+  if (status < 0 || (total >= 0 && available > total)) return false;
+
+  long len = 0;
+
+  const long long id = ReadID(pReader, pos, len);
+  if (id < 0 || (available - pos) > len) return false;
+
+  if (static_cast<unsigned long>(id) != expected_id) return false;
+
+  pos += len;  // consume id
+
+  const long long size = ReadUInt(pReader, pos, len);
+  if (size < 0 || size > 8 || len < 1 || len > 8 || (available - pos) > len) return false;
+
+  pos += len;  // consume length of size of payload
+
+  val = UnserializeUInt(pReader, pos, size);
+  if (val < 0) return false;
+
+  pos += size;  // consume size of payload
+
+  return true;
+}
+
+bool Match(IMkvReader* pReader, long long& pos, unsigned long expected_id, unsigned char*& buf, size_t& buflen) {
+  if (!pReader || pos < 0) return false;
+
+  long long total = 0;
+  long long available = 0;
+
+  long status = pReader->Length(&total, &available);
+  if (status < 0 || (total >= 0 && available > total)) return false;
+
+  long len = 0;
+  const long long id = ReadID(pReader, pos, len);
+  if (id < 0 || (available - pos) > len) return false;
+
+  if (static_cast<unsigned long>(id) != expected_id) return false;
+
+  pos += len;  // consume id
+
+  const long long size = ReadUInt(pReader, pos, len);
+  if (size < 0 || len <= 0 || len > 8 || (available - pos) > len) return false;
+
+  unsigned long long rollover_check = static_cast<unsigned long long>(pos) + len;
+  if (rollover_check > LLONG_MAX) return false;
+
+  pos += len;  // consume length of size of payload
+
+  rollover_check = static_cast<unsigned long long>(pos) + size;
+  if (rollover_check > LLONG_MAX) return false;
+
+  if ((pos + size) > available) return false;
+
+  if (size >= LONG_MAX) return false;
+
+  const long buflen_ = static_cast<long>(size);
+
+  buf = SafeArrayAlloc<unsigned char>(1, buflen_);
+  if (!buf) return false;
+
+  status = pReader->Read(pos, buflen_, buf);
+  if (status != 0) return false;
+
+  buflen = buflen_;
+
+  pos += size;  // consume size of payload
+  return true;
+}
+
+EBMLHeader::EBMLHeader() : m_docType(NULL) { Init(); }
+
+EBMLHeader::~EBMLHeader() { delete[] m_docType; }
+
+void EBMLHeader::Init() {
+  m_version = 1;
+  m_readVersion = 1;
+  m_maxIdLength = 4;
+  m_maxSizeLength = 8;
+
+  if (m_docType) {
+    delete[] m_docType;
+    m_docType = NULL;
+  }
+
+  m_docTypeVersion = 1;
+  m_docTypeReadVersion = 1;
+}
+
+long long EBMLHeader::Parse(IMkvReader* pReader, long long& pos) {
+  if (!pReader) return E_FILE_FORMAT_INVALID;
+
+  long long total, available;
+
+  long status = pReader->Length(&total, &available);
+
+  if (status < 0)  // error
+    return status;
+
+  pos = 0;
+
+  // Scan until we find what looks like the first byte of the EBML header.
+  const long long kMaxScanBytes = (available >= 1024) ? 1024 : available;
+  const unsigned char kEbmlByte0 = 0x1A;
+  unsigned char scan_byte = 0;
+
+  while (pos < kMaxScanBytes) {
+    status = pReader->Read(pos, 1, &scan_byte);
+
+    if (status < 0)  // error
+      return status;
+    else if (status > 0)
+      return E_BUFFER_NOT_FULL;
+
+    if (scan_byte == kEbmlByte0) break;
+
+    ++pos;
+  }
+
+  long len = 0;
+  const long long ebml_id = ReadID(pReader, pos, len);
+
+  if (ebml_id == E_BUFFER_NOT_FULL) return E_BUFFER_NOT_FULL;
+
+  if (len != 4 || ebml_id != libwebm::kMkvEBML) return E_FILE_FORMAT_INVALID;
+
+  // Move read pos forward to the EBML header size field.
+  pos += 4;
+
+  // Read length of size field.
+  long long result = GetUIntLength(pReader, pos, len);
+
+  if (result < 0)  // error
+    return E_FILE_FORMAT_INVALID;
+  else if (result > 0)  // need more data
+    return E_BUFFER_NOT_FULL;
+
+  if (len < 1 || len > 8) return E_FILE_FORMAT_INVALID;
+
+  if ((total >= 0) && ((total - pos) < len)) return E_FILE_FORMAT_INVALID;
+
+  if ((available - pos) < len) return pos + len;  // try again later
+
+  // Read the EBML header size.
+  result = ReadUInt(pReader, pos, len);
+
+  if (result < 0)  // error
+    return result;
+
+  pos += len;  // consume size field
+
+  // pos now designates start of payload
+
+  if ((total >= 0) && ((total - pos) < result)) return E_FILE_FORMAT_INVALID;
+
+  if ((available - pos) < result) return pos + result;
+
+  const long long end = pos + result;
+
+  Init();
+
+  while (pos < end) {
+    long long id, size;
+
+    status = ParseElementHeader(pReader, pos, end, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (size == 0) return E_FILE_FORMAT_INVALID;
+
+    if (id == libwebm::kMkvEBMLVersion) {
+      m_version = UnserializeUInt(pReader, pos, size);
+
+      if (m_version <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvEBMLReadVersion) {
+      m_readVersion = UnserializeUInt(pReader, pos, size);
+
+      if (m_readVersion <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvEBMLMaxIDLength) {
+      m_maxIdLength = UnserializeUInt(pReader, pos, size);
+
+      if (m_maxIdLength <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvEBMLMaxSizeLength) {
+      m_maxSizeLength = UnserializeUInt(pReader, pos, size);
+
+      if (m_maxSizeLength <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvDocType) {
+      if (m_docType) return E_FILE_FORMAT_INVALID;
+
+      status = UnserializeString(pReader, pos, size, m_docType);
+
+      if (status)  // error
+        return status;
+    } else if (id == libwebm::kMkvDocTypeVersion) {
+      m_docTypeVersion = UnserializeUInt(pReader, pos, size);
+
+      if (m_docTypeVersion <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvDocTypeReadVersion) {
+      m_docTypeReadVersion = UnserializeUInt(pReader, pos, size);
+
+      if (m_docTypeReadVersion <= 0) return E_FILE_FORMAT_INVALID;
+    }
+
+    pos += size;
+  }
+
+  if (pos != end) return E_FILE_FORMAT_INVALID;
+
+  // Make sure DocType, DocTypeReadVersion, and DocTypeVersion are valid.
+  if (m_docType == NULL || m_docTypeReadVersion <= 0 || m_docTypeVersion <= 0) return E_FILE_FORMAT_INVALID;
+
+  // Make sure EBMLMaxIDLength and EBMLMaxSizeLength are valid.
+  if (m_maxIdLength <= 0 || m_maxIdLength > 4 || m_maxSizeLength <= 0 || m_maxSizeLength > 8)
+    return E_FILE_FORMAT_INVALID;
+
+  return 0;
+}
+
+Segment::Segment(IMkvReader* pReader, long long elem_start,
+                 // long long elem_size,
+                 long long start, long long size)
+    : m_pReader(pReader),
+      m_element_start(elem_start),
+      // m_element_size(elem_size),
+      m_start(start),
+      m_size(size),
+      m_pos(start),
+      m_pUnknownSize(0),
+      m_pSeekHead(NULL),
+      m_pInfo(NULL),
+      m_pTracks(NULL),
+      m_pCues(NULL),
+      m_pChapters(NULL),
+      m_pTags(NULL),
+      m_clusters(NULL),
+      m_clusterCount(0),
+      m_clusterPreloadCount(0),
+      m_clusterSize(0) {}
+
+Segment::~Segment() {
+  const long count = m_clusterCount + m_clusterPreloadCount;
+
+  Cluster** i = m_clusters;
+  Cluster** j = m_clusters + count;
+
+  while (i != j) {
+    Cluster* const p = *i++;
+    delete p;
+  }
+
+  delete[] m_clusters;
+
+  delete m_pTracks;
+  delete m_pInfo;
+  delete m_pCues;
+  delete m_pChapters;
+  delete m_pTags;
+  delete m_pSeekHead;
+}
+
+long long Segment::CreateInstance(IMkvReader* pReader, long long pos, Segment*& pSegment) {
+  if (pReader == NULL || pos < 0) return E_PARSE_FAILED;
+
+  pSegment = NULL;
+
+  long long total, available;
+
+  const long status = pReader->Length(&total, &available);
+
+  if (status < 0)  // error
+    return status;
+
+  if (available < 0) return -1;
+
+  if ((total >= 0) && (available > total)) return -1;
+
+  // I would assume that in practice this loop would execute
+  // exactly once, but we allow for other elements (e.g. Void)
+  // to immediately follow the EBML header.  This is fine for
+  // the source filter case (since the entire file is available),
+  // but in the splitter case over a network we should probably
+  // just give up early.  We could for example decide only to
+  // execute this loop a maximum of, say, 10 times.
+  // TODO:
+  // There is an implied "give up early" by only parsing up
+  // to the available limit.  We do do that, but only if the
+  // total file size is unknown.  We could decide to always
+  // use what's available as our limit (irrespective of whether
+  // we happen to know the total file length).  This would have
+  // as its sense "parse this much of the file before giving up",
+  // which a slightly different sense from "try to parse up to
+  // 10 EMBL elements before giving up".
+
+  for (;;) {
+    if ((total >= 0) && (pos >= total)) return E_FILE_FORMAT_INVALID;
+
+    // Read ID
+    long len;
+    long long result = GetUIntLength(pReader, pos, len);
+
+    if (result)  // error, or too few available bytes
+      return result;
+
+    if ((total >= 0) && ((pos + len) > total)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > available) return pos + len;
+
+    const long long idpos = pos;
+    const long long id = ReadID(pReader, pos, len);
+
+    if (id < 0) return E_FILE_FORMAT_INVALID;
+
+    pos += len;  // consume ID
+
+    // Read Size
+
+    result = GetUIntLength(pReader, pos, len);
+
+    if (result)  // error, or too few available bytes
+      return result;
+
+    if ((total >= 0) && ((pos + len) > total)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > available) return pos + len;
+
+    long long size = ReadUInt(pReader, pos, len);
+
+    if (size < 0)  // error
+      return size;
+
+    pos += len;  // consume length of size of element
+
+    // Pos now points to start of payload
+
+    // Handle "unknown size" for live streaming of webm files.
+    const long long unknown_size = (1LL << (7 * len)) - 1;
+
+    if (id == libwebm::kMkvSegment) {
+      if (size == unknown_size)
+        size = -1;
+
+      else if (total < 0)
+        size = -1;
+
+      else if ((pos + size) > total)
+        size = -1;
+
+      pSegment = new (std::nothrow) Segment(pReader, idpos, pos, size);
+      if (pSegment == NULL) return E_PARSE_FAILED;
+
+      return 0;  // success
+    }
+
+    if (size == unknown_size) return E_FILE_FORMAT_INVALID;
+
+    if ((total >= 0) && ((pos + size) > total)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + size) > available) return pos + size;
+
+    pos += size;  // consume payload
+  }
+}
+
+long long Segment::ParseHeaders() {
+  // Outermost (level 0) segment object has been constructed,
+  // and pos designates start of payload.  We need to find the
+  // inner (level 1) elements.
+  long long total, available;
+
+  const int status = m_pReader->Length(&total, &available);
+
+  if (status < 0)  // error
+    return status;
+
+  if (total > 0 && available > total) return E_FILE_FORMAT_INVALID;
+
+  const long long segment_stop = (m_size < 0) ? -1 : m_start + m_size;
+
+  if ((segment_stop >= 0 && total >= 0 && segment_stop > total) || (segment_stop >= 0 && m_pos > segment_stop)) {
+    return E_FILE_FORMAT_INVALID;
+  }
+
+  for (;;) {
+    if ((total >= 0) && (m_pos >= total)) break;
+
+    if ((segment_stop >= 0) && (m_pos >= segment_stop)) break;
+
+    long long pos = m_pos;
+    const long long element_start = pos;
+
+    // Avoid rolling over pos when very close to LLONG_MAX.
+    unsigned long long rollover_check = pos + 1ULL;
+    if (rollover_check > LLONG_MAX) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + 1) > available) return (pos + 1);
+
+    long len;
+    long long result = GetUIntLength(m_pReader, pos, len);
+
+    if (result < 0)  // error
+      return result;
+
+    if (result > 0) {
+      // MkvReader doesn't have enough data to satisfy this read attempt.
+      return (pos + 1);
+    }
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > available) return pos + len;
+
+    const long long idpos = pos;
+    const long long id = ReadID(m_pReader, idpos, len);
+
+    if (id < 0) return E_FILE_FORMAT_INVALID;
+
+    if (id == libwebm::kMkvCluster) break;
+
+    pos += len;  // consume ID
+
+    if ((pos + 1) > available) return (pos + 1);
+
+    // Read Size
+    result = GetUIntLength(m_pReader, pos, len);
+
+    if (result < 0)  // error
+      return result;
+
+    if (result > 0) {
+      // MkvReader doesn't have enough data to satisfy this read attempt.
+      return (pos + 1);
+    }
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > available) return pos + len;
+
+    const long long size = ReadUInt(m_pReader, pos, len);
+
+    if (size < 0 || len < 1 || len > 8) {
+      // TODO(tomfinegan): ReadUInt should return an error when len is < 1 or
+      // len > 8 is true instead of checking this _everywhere_.
+      return size;
+    }
+
+    pos += len;  // consume length of size of element
+
+    // Avoid rolling over pos when very close to LLONG_MAX.
+    rollover_check = static_cast<unsigned long long>(pos) + size;
+    if (rollover_check > LLONG_MAX) return E_FILE_FORMAT_INVALID;
+
+    const long long element_size = size + pos - element_start;
+
+    // Pos now points to start of payload
+
+    if ((segment_stop >= 0) && ((pos + size) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    // We read EBML elements either in total or nothing at all.
+
+    if ((pos + size) > available) return pos + size;
+
+    if (id == libwebm::kMkvInfo) {
+      if (m_pInfo) return E_FILE_FORMAT_INVALID;
+
+      m_pInfo = new (std::nothrow) SegmentInfo(this, pos, size, element_start, element_size);
+
+      if (m_pInfo == NULL) return -1;
+
+      const long status = m_pInfo->Parse();
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvTracks) {
+      if (m_pTracks) return E_FILE_FORMAT_INVALID;
+
+      m_pTracks = new (std::nothrow) Tracks(this, pos, size, element_start, element_size);
+
+      if (m_pTracks == NULL) return -1;
+
+      const long status = m_pTracks->Parse();
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvCues) {
+      if (m_pCues == NULL) {
+        m_pCues = new (std::nothrow) Cues(this, pos, size, element_start, element_size);
+
+        if (m_pCues == NULL) return -1;
+      }
+    } else if (id == libwebm::kMkvSeekHead) {
+      if (m_pSeekHead == NULL) {
+        m_pSeekHead = new (std::nothrow) SeekHead(this, pos, size, element_start, element_size);
+
+        if (m_pSeekHead == NULL) return -1;
+
+        const long status = m_pSeekHead->Parse();
+
+        if (status) return status;
+      }
+    } else if (id == libwebm::kMkvChapters) {
+      if (m_pChapters == NULL) {
+        m_pChapters = new (std::nothrow) Chapters(this, pos, size, element_start, element_size);
+
+        if (m_pChapters == NULL) return -1;
+
+        const long status = m_pChapters->Parse();
+
+        if (status) return status;
+      }
+    } else if (id == libwebm::kMkvTags) {
+      if (m_pTags == NULL) {
+        m_pTags = new (std::nothrow) Tags(this, pos, size, element_start, element_size);
+
+        if (m_pTags == NULL) return -1;
+
+        const long status = m_pTags->Parse();
+
+        if (status) return status;
+      }
+    }
+
+    m_pos = pos + size;  // consume payload
+  }
+
+  if (segment_stop >= 0 && m_pos > segment_stop) return E_FILE_FORMAT_INVALID;
+
+  if (m_pInfo == NULL)  // TODO: liberalize this behavior
+    return E_FILE_FORMAT_INVALID;
+
+  if (m_pTracks == NULL) return E_FILE_FORMAT_INVALID;
+
+  return 0;  // success
+}
+
+long Segment::LoadCluster(long long& pos, long& len) {
+  for (;;) {
+    const long result = DoLoadCluster(pos, len);
+
+    if (result <= 1) return result;
+  }
+}
+
+long Segment::DoLoadCluster(long long& pos, long& len) {
+  if (m_pos < 0) return DoLoadClusterUnknownSize(pos, len);
+
+  long long total, avail;
+
+  long status = m_pReader->Length(&total, &avail);
+
+  if (status < 0)  // error
+    return status;
+
+  if (total >= 0 && avail > total) return E_FILE_FORMAT_INVALID;
+
+  const long long segment_stop = (m_size < 0) ? -1 : m_start + m_size;
+
+  long long cluster_off = -1;   // offset relative to start of segment
+  long long cluster_size = -1;  // size of cluster payload
+
+  for (;;) {
+    if ((total >= 0) && (m_pos >= total)) return 1;  // no more clusters
+
+    if ((segment_stop >= 0) && (m_pos >= segment_stop)) return 1;  // no more clusters
+
+    pos = m_pos;
+
+    // Read ID
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    long long result = GetUIntLength(m_pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0) return E_BUFFER_NOT_FULL;
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long idpos = pos;
+    const long long id = ReadID(m_pReader, idpos, len);
+
+    if (id < 0) return E_FILE_FORMAT_INVALID;
+
+    pos += len;  // consume ID
+
+    // Read Size
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result = GetUIntLength(m_pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0) return E_BUFFER_NOT_FULL;
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long size = ReadUInt(m_pReader, pos, len);
+
+    if (size < 0)  // error
+      return static_cast<long>(size);
+
+    pos += len;  // consume length of size of element
+
+    // pos now points to start of payload
+
+    if (size == 0) {
+      // Missing element payload: move on.
+      m_pos = pos;
+      continue;
+    }
+
+    const long long unknown_size = (1LL << (7 * len)) - 1;
+
+    if ((segment_stop >= 0) && (size != unknown_size) && ((pos + size) > segment_stop)) {
+      return E_FILE_FORMAT_INVALID;
+    }
+
+    if (id == libwebm::kMkvCues) {
+      if (size == unknown_size) {
+        // Cues element of unknown size: Not supported.
+        return E_FILE_FORMAT_INVALID;
+      }
+
+      if (m_pCues == NULL) {
+        const long long element_size = (pos - idpos) + size;
+
+        m_pCues = new (std::nothrow) Cues(this, pos, size, idpos, element_size);
+        if (m_pCues == NULL) return -1;
+      }
+
+      m_pos = pos + size;  // consume payload
+      continue;
+    }
+
+    if (id != libwebm::kMkvCluster) {
+      // Besides the Segment, Libwebm allows only cluster elements of unknown
+      // size. Fail the parse upon encountering a non-cluster element reporting
+      // unknown size.
+      if (size == unknown_size) return E_FILE_FORMAT_INVALID;
+
+      m_pos = pos + size;  // consume payload
+      continue;
+    }
+
+    // We have a cluster.
+
+    cluster_off = idpos - m_start;  // relative pos
+
+    if (size != unknown_size) cluster_size = size;
+
+    break;
+  }
+
+  if (cluster_off < 0) {
+    // No cluster, die.
+    return E_FILE_FORMAT_INVALID;
+  }
+
+  long long pos_;
+  long len_;
+
+  status = Cluster::HasBlockEntries(this, cluster_off, pos_, len_);
+
+  if (status < 0) {  // error, or underflow
+    pos = pos_;
+    len = len_;
+
+    return status;
+  }
+
+  // status == 0 means "no block entries found"
+  // status > 0 means "found at least one block entry"
+
+  // TODO:
+  // The issue here is that the segment increments its own
+  // pos ptr past the most recent cluster parsed, and then
+  // starts from there to parse the next cluster.  If we
+  // don't know the size of the current cluster, then we
+  // must either parse its payload (as we do below), looking
+  // for the cluster (or cues) ID to terminate the parse.
+  // This isn't really what we want: rather, we really need
+  // a way to create the curr cluster object immediately.
+  // The pity is that cluster::parse can determine its own
+  // boundary, and we largely duplicate that same logic here.
+  //
+  // Maybe we need to get rid of our look-ahead preloading
+  // in source::parse???
+  //
+  // As we're parsing the blocks in the curr cluster
+  //(in cluster::parse), we should have some way to signal
+  // to the segment that we have determined the boundary,
+  // so it can adjust its own segment::m_pos member.
+  //
+  // The problem is that we're asserting in asyncreadinit,
+  // because we adjust the pos down to the curr seek pos,
+  // and the resulting adjusted len is > 2GB.  I'm suspicious
+  // that this is even correct, but even if it is, we can't
+  // be loading that much data in the cache anyway.
+
+  const long idx = m_clusterCount;
+
+  if (m_clusterPreloadCount > 0) {
+    if (idx >= m_clusterSize) return E_FILE_FORMAT_INVALID;
+
+    Cluster* const pCluster = m_clusters[idx];
+    if (pCluster == NULL || pCluster->m_index >= 0) return E_FILE_FORMAT_INVALID;
+
+    const long long off = pCluster->GetPosition();
+    if (off < 0) return E_FILE_FORMAT_INVALID;
+
+    if (off == cluster_off) {  // preloaded already
+      if (status == 0)         // no entries found
+        return E_FILE_FORMAT_INVALID;
+
+      if (cluster_size >= 0)
+        pos += cluster_size;
+      else {
+        const long long element_size = pCluster->GetElementSize();
+
+        if (element_size <= 0) return E_FILE_FORMAT_INVALID;  // TODO: handle this case
+
+        pos = pCluster->m_element_start + element_size;
+      }
+
+      pCluster->m_index = idx;  // move from preloaded to loaded
+      ++m_clusterCount;
+      --m_clusterPreloadCount;
+
+      m_pos = pos;  // consume payload
+      if (segment_stop >= 0 && m_pos > segment_stop) return E_FILE_FORMAT_INVALID;
+
+      return 0;  // success
+    }
+  }
+
+  if (status == 0) {  // no entries found
+    if (cluster_size >= 0) pos += cluster_size;
+
+    if ((total >= 0) && (pos >= total)) {
+      m_pos = total;
+      return 1;  // no more clusters
+    }
+
+    if ((segment_stop >= 0) && (pos >= segment_stop)) {
+      m_pos = segment_stop;
+      return 1;  // no more clusters
+    }
+
+    m_pos = pos;
+    return 2;  // try again
+  }
+
+  // status > 0 means we have an entry
+
+  Cluster* const pCluster = Cluster::Create(this, idx, cluster_off);
+  if (pCluster == NULL) return -1;
+
+  if (!AppendCluster(pCluster)) {
+    delete pCluster;
+    return -1;
+  }
+
+  if (cluster_size >= 0) {
+    pos += cluster_size;
+
+    m_pos = pos;
+
+    if (segment_stop > 0 && m_pos > segment_stop) return E_FILE_FORMAT_INVALID;
+
+    return 0;
+  }
+
+  m_pUnknownSize = pCluster;
+  m_pos = -pos;
+
+  return 0;  // partial success, since we have a new cluster
+
+  // status == 0 means "no block entries found"
+  // pos designates start of payload
+  // m_pos has NOT been adjusted yet (in case we need to come back here)
+}
+
+long Segment::DoLoadClusterUnknownSize(long long& pos, long& len) {
+  if (m_pos >= 0 || m_pUnknownSize == NULL) return E_PARSE_FAILED;
+
+  const long status = m_pUnknownSize->Parse(pos, len);
+
+  if (status < 0)  // error or underflow
+    return status;
+
+  if (status == 0)  // parsed a block
+    return 2;       // continue parsing
+
+  const long long start = m_pUnknownSize->m_element_start;
+  const long long size = m_pUnknownSize->GetElementSize();
+
+  if (size < 0) return E_FILE_FORMAT_INVALID;
+
+  pos = start + size;
+  m_pos = pos;
+
+  m_pUnknownSize = 0;
+
+  return 2;  // continue parsing
+}
+
+bool Segment::AppendCluster(Cluster* pCluster) {
+  if (pCluster == NULL || pCluster->m_index < 0) return false;
+
+  const long count = m_clusterCount + m_clusterPreloadCount;
+
+  long& size = m_clusterSize;
+  const long idx = pCluster->m_index;
+
+  if (size < count || idx != m_clusterCount) return false;
+
+  if (count >= size) {
+    const long n = (size <= 0) ? 2048 : 2 * size;
+
+    Cluster** const qq = new (std::nothrow) Cluster*[n];
+    if (qq == NULL) return false;
+
+    Cluster** q = qq;
+    Cluster** p = m_clusters;
+    Cluster** const pp = p + count;
+
+    while (p != pp) *q++ = *p++;
+
+    delete[] m_clusters;
+
+    m_clusters = qq;
+    size = n;
+  }
+
+  if (m_clusterPreloadCount > 0) {
+    Cluster** const p = m_clusters + m_clusterCount;
+    if (*p == NULL || (*p)->m_index >= 0) return false;
+
+    Cluster** q = p + m_clusterPreloadCount;
+    if (q >= (m_clusters + size)) return false;
+
+    for (;;) {
+      Cluster** const qq = q - 1;
+      if ((*qq)->m_index >= 0) return false;
+
+      *q = *qq;
+      q = qq;
+
+      if (q == p) break;
+    }
+  }
+
+  m_clusters[idx] = pCluster;
+  ++m_clusterCount;
+  return true;
+}
+
+bool Segment::PreloadCluster(Cluster* pCluster, ptrdiff_t idx) {
+  if (pCluster == NULL || pCluster->m_index >= 0 || idx < m_clusterCount) return false;
+
+  const long count = m_clusterCount + m_clusterPreloadCount;
+
+  long& size = m_clusterSize;
+  if (size < count) return false;
+
+  if (count >= size) {
+    const long n = (size <= 0) ? 2048 : 2 * size;
+
+    Cluster** const qq = new (std::nothrow) Cluster*[n];
+    if (qq == NULL) return false;
+    Cluster** q = qq;
+
+    Cluster** p = m_clusters;
+    Cluster** const pp = p + count;
+
+    while (p != pp) *q++ = *p++;
+
+    delete[] m_clusters;
+
+    m_clusters = qq;
+    size = n;
+  }
+
+  if (m_clusters == NULL) return false;
+
+  Cluster** const p = m_clusters + idx;
+
+  Cluster** q = m_clusters + count;
+  if (q < p || q >= (m_clusters + size)) return false;
+
+  while (q > p) {
+    Cluster** const qq = q - 1;
+
+    if ((*qq)->m_index >= 0) return false;
+
+    *q = *qq;
+    q = qq;
+  }
+
+  m_clusters[idx] = pCluster;
+  ++m_clusterPreloadCount;
+  return true;
+}
+
+long Segment::Load() {
+  if (m_clusters != NULL || m_clusterSize != 0 || m_clusterCount != 0) return E_PARSE_FAILED;
+
+  // Outermost (level 0) segment object has been constructed,
+  // and pos designates start of payload.  We need to find the
+  // inner (level 1) elements.
+
+  const long long header_status = ParseHeaders();
+
+  if (header_status < 0)  // error
+    return static_cast<long>(header_status);
+
+  if (header_status > 0)  // underflow
+    return E_BUFFER_NOT_FULL;
+
+  if (m_pInfo == NULL || m_pTracks == NULL) return E_FILE_FORMAT_INVALID;
+
+  for (;;) {
+    const long status = LoadCluster();
+
+    if (status < 0)  // error
+      return status;
+
+    if (status >= 1)  // no more clusters
+      return 0;
+  }
+}
+
+SeekHead::Entry::Entry() : id(0), pos(0), element_start(0), element_size(0) {}
+
+SeekHead::SeekHead(Segment* pSegment, long long start, long long size_, long long element_start, long long element_size)
+    : m_pSegment(pSegment),
+      m_start(start),
+      m_size(size_),
+      m_element_start(element_start),
+      m_element_size(element_size),
+      m_entries(0),
+      m_entry_count(0),
+      m_void_elements(0),
+      m_void_element_count(0) {}
+
+SeekHead::~SeekHead() {
+  delete[] m_entries;
+  delete[] m_void_elements;
+}
+
+long SeekHead::Parse() {
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  long long pos = m_start;
+  const long long stop = m_start + m_size;
+
+  // first count the seek head entries
+
+  int entry_count = 0;
+  int void_element_count = 0;
+
+  while (pos < stop) {
+    long long id, size;
+
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvSeek)
+      ++entry_count;
+    else if (id == libwebm::kMkvVoid)
+      ++void_element_count;
+
+    pos += size;  // consume payload
+
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  if (entry_count > 0) {
+    m_entries = new (std::nothrow) Entry[entry_count];
+
+    if (m_entries == NULL) return -1;
+  }
+
+  if (void_element_count > 0) {
+    m_void_elements = new (std::nothrow) VoidElement[void_element_count];
+
+    if (m_void_elements == NULL) return -1;
+  }
+
+  // now parse the entries and void elements
+
+  Entry* pEntry = m_entries;
+  VoidElement* pVoidElement = m_void_elements;
+
+  pos = m_start;
+
+  while (pos < stop) {
+    const long long idpos = pos;
+
+    long long id, size;
+
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvSeek && entry_count > 0) {
+      if (ParseEntry(pReader, pos, size, pEntry)) {
+        Entry& e = *pEntry++;
+
+        e.element_start = idpos;
+        e.element_size = (pos + size) - idpos;
+      }
+    } else if (id == libwebm::kMkvVoid && void_element_count > 0) {
+      VoidElement& e = *pVoidElement++;
+
+      e.element_start = idpos;
+      e.element_size = (pos + size) - idpos;
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  ptrdiff_t count_ = ptrdiff_t(pEntry - m_entries);
+  assert(count_ >= 0);
+  assert(count_ <= entry_count);
+
+  m_entry_count = static_cast<int>(count_);
+
+  count_ = ptrdiff_t(pVoidElement - m_void_elements);
+  assert(count_ >= 0);
+  assert(count_ <= void_element_count);
+
+  m_void_element_count = static_cast<int>(count_);
+
+  return 0;
+}
+
+int SeekHead::GetCount() const { return m_entry_count; }
+
+const SeekHead::Entry* SeekHead::GetEntry(int idx) const {
+  if (idx < 0) return 0;
+
+  if (idx >= m_entry_count) return 0;
+
+  return m_entries + idx;
+}
+
+int SeekHead::GetVoidElementCount() const { return m_void_element_count; }
+
+const SeekHead::VoidElement* SeekHead::GetVoidElement(int idx) const {
+  if (idx < 0) return 0;
+
+  if (idx >= m_void_element_count) return 0;
+
+  return m_void_elements + idx;
+}
+
+long Segment::ParseCues(long long off, long long& pos, long& len) {
+  if (m_pCues) return 0;  // success
+
+  if (off < 0) return -1;
+
+  long long total, avail;
+
+  const int status = m_pReader->Length(&total, &avail);
+
+  if (status < 0)  // error
+    return status;
+
+  assert((total < 0) || (avail <= total));
+
+  pos = m_start + off;
+
+  if ((total < 0) || (pos >= total)) return 1;  // don't bother parsing cues
+
+  const long long element_start = pos;
+  const long long segment_stop = (m_size < 0) ? -1 : m_start + m_size;
+
+  if ((pos + 1) > avail) {
+    len = 1;
+    return E_BUFFER_NOT_FULL;
+  }
+
+  long long result = GetUIntLength(m_pReader, pos, len);
+
+  if (result < 0)  // error
+    return static_cast<long>(result);
+
+  if (result > 0)  // underflow (weird)
+  {
+    len = 1;
+    return E_BUFFER_NOT_FULL;
+  }
+
+  if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+  if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+  const long long idpos = pos;
+
+  const long long id = ReadID(m_pReader, idpos, len);
+
+  if (id != libwebm::kMkvCues) return E_FILE_FORMAT_INVALID;
+
+  pos += len;  // consume ID
+  assert((segment_stop < 0) || (pos <= segment_stop));
+
+  // Read Size
+
+  if ((pos + 1) > avail) {
+    len = 1;
+    return E_BUFFER_NOT_FULL;
+  }
+
+  result = GetUIntLength(m_pReader, pos, len);
+
+  if (result < 0)  // error
+    return static_cast<long>(result);
+
+  if (result > 0)  // underflow (weird)
+  {
+    len = 1;
+    return E_BUFFER_NOT_FULL;
+  }
+
+  if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+  if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+  const long long size = ReadUInt(m_pReader, pos, len);
+
+  if (size < 0)  // error
+    return static_cast<long>(size);
+
+  if (size == 0)  // weird, although technically not illegal
+    return 1;     // done
+
+  pos += len;  // consume length of size of element
+  assert((segment_stop < 0) || (pos <= segment_stop));
+
+  // Pos now points to start of payload
+
+  const long long element_stop = pos + size;
+
+  if ((segment_stop >= 0) && (element_stop > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+  if ((total >= 0) && (element_stop > total)) return 1;  // don't bother parsing anymore
+
+  len = static_cast<long>(size);
+
+  if (element_stop > avail) return E_BUFFER_NOT_FULL;
+
+  const long long element_size = element_stop - element_start;
+
+  m_pCues = new (std::nothrow) Cues(this, pos, size, element_start, element_size);
+  if (m_pCues == NULL) return -1;
+
+  return 0;  // success
+}
+
+bool SeekHead::ParseEntry(IMkvReader* pReader, long long start, long long size_, Entry* pEntry) {
+  if (size_ <= 0) return false;
+
+  long long pos = start;
+  const long long stop = start + size_;
+
+  long len;
+
+  // parse the container for the level-1 element ID
+
+  const long long seekIdId = ReadID(pReader, pos, len);
+  if (seekIdId < 0) return false;
+
+  if (seekIdId != libwebm::kMkvSeekID) return false;
+
+  if ((pos + len) > stop) return false;
+
+  pos += len;  // consume SeekID id
+
+  const long long seekIdSize = ReadUInt(pReader, pos, len);
+
+  if (seekIdSize <= 0) return false;
+
+  if ((pos + len) > stop) return false;
+
+  pos += len;  // consume size of field
+
+  if ((pos + seekIdSize) > stop) return false;
+
+  pEntry->id = ReadID(pReader, pos, len);  // payload
+
+  if (pEntry->id <= 0) return false;
+
+  if (len != seekIdSize) return false;
+
+  pos += seekIdSize;  // consume SeekID payload
+
+  const long long seekPosId = ReadID(pReader, pos, len);
+
+  if (seekPosId != libwebm::kMkvSeekPosition) return false;
+
+  if ((pos + len) > stop) return false;
+
+  pos += len;  // consume id
+
+  const long long seekPosSize = ReadUInt(pReader, pos, len);
+
+  if (seekPosSize <= 0) return false;
+
+  if ((pos + len) > stop) return false;
+
+  pos += len;  // consume size
+
+  if ((pos + seekPosSize) > stop) return false;
+
+  pEntry->pos = UnserializeUInt(pReader, pos, seekPosSize);
+
+  if (pEntry->pos < 0) return false;
+
+  pos += seekPosSize;  // consume payload
+
+  if (pos != stop) return false;
+
+  return true;
+}
+
+Cues::Cues(Segment* pSegment, long long start_, long long size_, long long element_start, long long element_size)
+    : m_pSegment(pSegment),
+      m_start(start_),
+      m_size(size_),
+      m_element_start(element_start),
+      m_element_size(element_size),
+      m_cue_points(NULL),
+      m_count(0),
+      m_preload_count(0),
+      m_pos(start_) {}
+
+Cues::~Cues() {
+  const long n = m_count + m_preload_count;
+
+  CuePoint** p = m_cue_points;
+  CuePoint** const q = p + n;
+
+  while (p != q) {
+    CuePoint* const pCP = *p++;
+    assert(pCP);
+
+    delete pCP;
+  }
+
+  delete[] m_cue_points;
+}
+
+long Cues::GetCount() const {
+  if (m_cue_points == NULL) return -1;
+
+  return m_count;  // TODO: really ignore preload count?
+}
+
+bool Cues::DoneParsing() const {
+  const long long stop = m_start + m_size;
+  return (m_pos >= stop);
+}
+
+bool Cues::Init() const {
+  if (m_cue_points) return true;
+
+  if (m_count != 0 || m_preload_count != 0) return false;
+
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  const long long stop = m_start + m_size;
+  long long pos = m_start;
+
+  long cue_points_size = 0;
+
+  while (pos < stop) {
+    const long long idpos = pos;
+
+    long len;
+
+    const long long id = ReadID(pReader, pos, len);
+    if (id < 0 || (pos + len) > stop) {
+      return false;
+    }
+
+    pos += len;  // consume ID
+
+    const long long size = ReadUInt(pReader, pos, len);
+    if (size < 0 || (pos + len > stop)) {
+      return false;
+    }
+
+    pos += len;  // consume Size field
+    if (pos + size > stop) {
+      return false;
+    }
+
+    if (id == libwebm::kMkvCuePoint) {
+      if (!PreloadCuePoint(cue_points_size, idpos)) return false;
+    }
+
+    pos += size;  // skip payload
+  }
+  return true;
+}
+
+bool Cues::PreloadCuePoint(long& cue_points_size, long long pos) const {
+  if (m_count != 0) return false;
+
+  if (m_preload_count >= cue_points_size) {
+    const long n = (cue_points_size <= 0) ? 2048 : 2 * cue_points_size;
+
+    CuePoint** const qq = new (std::nothrow) CuePoint*[n];
+    if (qq == NULL) return false;
+
+    CuePoint** q = qq;  // beginning of target
+
+    CuePoint** p = m_cue_points;                // beginning of source
+    CuePoint** const pp = p + m_preload_count;  // end of source
+
+    while (p != pp) *q++ = *p++;
+
+    delete[] m_cue_points;
+
+    m_cue_points = qq;
+    cue_points_size = n;
+  }
+
+  CuePoint* const pCP = new (std::nothrow) CuePoint(m_preload_count, pos);
+  if (pCP == NULL) return false;
+
+  m_cue_points[m_preload_count++] = pCP;
+  return true;
+}
+
+bool Cues::LoadCuePoint() const {
+  const long long stop = m_start + m_size;
+
+  if (m_pos >= stop) return false;  // nothing else to do
+
+  if (!Init()) {
+    m_pos = stop;
+    return false;
+  }
+
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  while (m_pos < stop) {
+    const long long idpos = m_pos;
+
+    long len;
+
+    const long long id = ReadID(pReader, m_pos, len);
+    if (id < 0 || (m_pos + len) > stop) return false;
+
+    m_pos += len;  // consume ID
+
+    const long long size = ReadUInt(pReader, m_pos, len);
+    if (size < 0 || (m_pos + len) > stop) return false;
+
+    m_pos += len;  // consume Size field
+    if ((m_pos + size) > stop) return false;
+
+    if (id != libwebm::kMkvCuePoint) {
+      m_pos += size;  // consume payload
+      if (m_pos > stop) return false;
+
+      continue;
+    }
+
+    if (m_preload_count < 1) return false;
+
+    CuePoint* const pCP = m_cue_points[m_count];
+    if (!pCP || (pCP->GetTimeCode() < 0 && (-pCP->GetTimeCode() != idpos))) return false;
+
+    if (!pCP->Load(pReader)) {
+      m_pos = stop;
+      return false;
+    }
+    ++m_count;
+    --m_preload_count;
+
+    m_pos += size;  // consume payload
+    if (m_pos > stop) return false;
+
+    return true;  // yes, we loaded a cue point
+  }
+
+  return false;  // no, we did not load a cue point
+}
+
+bool Cues::Find(long long time_ns, const Track* pTrack, const CuePoint*& pCP,
+                const CuePoint::TrackPosition*& pTP) const {
+  if (time_ns < 0 || pTrack == NULL || m_cue_points == NULL || m_count == 0) return false;
+
+  CuePoint** const ii = m_cue_points;
+  CuePoint** i = ii;
+
+  CuePoint** const jj = ii + m_count;
+  CuePoint** j = jj;
+
+  pCP = *i;
+  if (pCP == NULL) return false;
+
+  if (time_ns <= pCP->GetTime(m_pSegment)) {
+    pTP = pCP->Find(pTrack);
+    return (pTP != NULL);
+  }
+
+  while (i < j) {
+    // INVARIANT:
+    //[ii, i) <= time_ns
+    //[i, j)  ?
+    //[j, jj) > time_ns
+
+    CuePoint** const k = i + (j - i) / 2;
+    if (k >= jj) return false;
+
+    CuePoint* const pCP = *k;
+    if (pCP == NULL) return false;
+
+    const long long t = pCP->GetTime(m_pSegment);
+
+    if (t <= time_ns)
+      i = k + 1;
+    else
+      j = k;
+
+    if (i > j) return false;
+  }
+
+  if (i != j || i > jj || i <= ii) return false;
+
+  pCP = *--i;
+
+  if (pCP == NULL || pCP->GetTime(m_pSegment) > time_ns) return false;
+
+  // TODO: here and elsewhere, it's probably not correct to search
+  // for the cue point with this time, and then search for a matching
+  // track.  In principle, the matching track could be on some earlier
+  // cue point, and with our current algorithm, we'd miss it.  To make
+  // this bullet-proof, we'd need to create a secondary structure,
+  // with a list of cue points that apply to a track, and then search
+  // that track-based structure for a matching cue point.
+
+  pTP = pCP->Find(pTrack);
+  return (pTP != NULL);
+}
+
+const CuePoint* Cues::GetFirst() const {
+  if (m_cue_points == NULL || m_count == 0) return NULL;
+
+  CuePoint* const* const pp = m_cue_points;
+  if (pp == NULL) return NULL;
+
+  CuePoint* const pCP = pp[0];
+  if (pCP == NULL || pCP->GetTimeCode() < 0) return NULL;
+
+  return pCP;
+}
+
+const CuePoint* Cues::GetLast() const {
+  if (m_cue_points == NULL || m_count <= 0) return NULL;
+
+  const long index = m_count - 1;
+
+  CuePoint* const* const pp = m_cue_points;
+  if (pp == NULL) return NULL;
+
+  CuePoint* const pCP = pp[index];
+  if (pCP == NULL || pCP->GetTimeCode() < 0) return NULL;
+
+  return pCP;
+}
+
+const CuePoint* Cues::GetNext(const CuePoint* pCurr) const {
+  if (pCurr == NULL || pCurr->GetTimeCode() < 0 || m_cue_points == NULL || m_count < 1) {
+    return NULL;
+  }
+
+  long index = pCurr->m_index;
+  if (index >= m_count) return NULL;
+
+  CuePoint* const* const pp = m_cue_points;
+  if (pp == NULL || pp[index] != pCurr) return NULL;
+
+  ++index;
+
+  if (index >= m_count) return NULL;
+
+  CuePoint* const pNext = pp[index];
+
+  if (pNext == NULL || pNext->GetTimeCode() < 0) return NULL;
+
+  return pNext;
+}
+
+const BlockEntry* Cues::GetBlock(const CuePoint* pCP, const CuePoint::TrackPosition* pTP) const {
+  if (pCP == NULL || pTP == NULL) return NULL;
+
+  return m_pSegment->GetBlock(*pCP, *pTP);
+}
+
+const BlockEntry* Segment::GetBlock(const CuePoint& cp, const CuePoint::TrackPosition& tp) {
+  Cluster** const ii = m_clusters;
+  Cluster** i = ii;
+
+  const long count = m_clusterCount + m_clusterPreloadCount;
+
+  Cluster** const jj = ii + count;
+  Cluster** j = jj;
+
+  while (i < j) {
+    // INVARIANT:
+    //[ii, i) < pTP->m_pos
+    //[i, j) ?
+    //[j, jj)  > pTP->m_pos
+
+    Cluster** const k = i + (j - i) / 2;
+    assert(k < jj);
+
+    Cluster* const pCluster = *k;
+    assert(pCluster);
+
+    // const long long pos_ = pCluster->m_pos;
+    // assert(pos_);
+    // const long long pos = pos_ * ((pos_ < 0) ? -1 : 1);
+
+    const long long pos = pCluster->GetPosition();
+    assert(pos >= 0);
+
+    if (pos < tp.m_pos)
+      i = k + 1;
+    else if (pos > tp.m_pos)
+      j = k;
+    else
+      return pCluster->GetEntry(cp, tp);
+  }
+
+  assert(i == j);
+  // assert(Cluster::HasBlockEntries(this, tp.m_pos));
+
+  Cluster* const pCluster = Cluster::Create(this, -1, tp.m_pos);  //, -1);
+  if (pCluster == NULL) return NULL;
+
+  const ptrdiff_t idx = i - m_clusters;
+
+  if (!PreloadCluster(pCluster, idx)) {
+    delete pCluster;
+    return NULL;
+  }
+  assert(m_clusters);
+  assert(m_clusterPreloadCount > 0);
+  assert(m_clusters[idx] == pCluster);
+
+  return pCluster->GetEntry(cp, tp);
+}
+
+const Cluster* Segment::FindOrPreloadCluster(long long requested_pos) {
+  if (requested_pos < 0) return 0;
+
+  Cluster** const ii = m_clusters;
+  Cluster** i = ii;
+
+  const long count = m_clusterCount + m_clusterPreloadCount;
+
+  Cluster** const jj = ii + count;
+  Cluster** j = jj;
+
+  while (i < j) {
+    // INVARIANT:
+    //[ii, i) < pTP->m_pos
+    //[i, j) ?
+    //[j, jj)  > pTP->m_pos
+
+    Cluster** const k = i + (j - i) / 2;
+    assert(k < jj);
+
+    Cluster* const pCluster = *k;
+    assert(pCluster);
+
+    // const long long pos_ = pCluster->m_pos;
+    // assert(pos_);
+    // const long long pos = pos_ * ((pos_ < 0) ? -1 : 1);
+
+    const long long pos = pCluster->GetPosition();
+    assert(pos >= 0);
+
+    if (pos < requested_pos)
+      i = k + 1;
+    else if (pos > requested_pos)
+      j = k;
+    else
+      return pCluster;
+  }
+
+  assert(i == j);
+  // assert(Cluster::HasBlockEntries(this, tp.m_pos));
+
+  Cluster* const pCluster = Cluster::Create(this, -1, requested_pos);
+  if (pCluster == NULL) return NULL;
+
+  const ptrdiff_t idx = i - m_clusters;
+
+  if (!PreloadCluster(pCluster, idx)) {
+    delete pCluster;
+    return NULL;
+  }
+  assert(m_clusters);
+  assert(m_clusterPreloadCount > 0);
+  assert(m_clusters[idx] == pCluster);
+
+  return pCluster;
+}
+
+CuePoint::CuePoint(long idx, long long pos)
+    : m_element_start(0),
+      m_element_size(0),
+      m_index(idx),
+      m_timecode(-1 * pos),
+      m_track_positions(NULL),
+      m_track_positions_count(0) {
+  assert(pos > 0);
+}
+
+CuePoint::~CuePoint() { delete[] m_track_positions; }
+
+bool CuePoint::Load(IMkvReader* pReader) {
+  // odbgstream os;
+  // os << "CuePoint::Load(begin): timecode=" << m_timecode << endl;
+
+  if (m_timecode >= 0)  // already loaded
+    return true;
+
+  assert(m_track_positions == NULL);
+  assert(m_track_positions_count == 0);
+
+  long long pos_ = -m_timecode;
+  const long long element_start = pos_;
+
+  long long stop;
+
+  {
+    long len;
+
+    const long long id = ReadID(pReader, pos_, len);
+    if (id != libwebm::kMkvCuePoint) return false;
+
+    pos_ += len;  // consume ID
+
+    const long long size = ReadUInt(pReader, pos_, len);
+    assert(size >= 0);
+
+    pos_ += len;  // consume Size field
+    // pos_ now points to start of payload
+
+    stop = pos_ + size;
+  }
+
+  const long long element_size = stop - element_start;
+
+  long long pos = pos_;
+
+  // First count number of track positions
+
+  while (pos < stop) {
+    long len;
+
+    const long long id = ReadID(pReader, pos, len);
+    if ((id < 0) || (pos + len > stop)) {
+      return false;
+    }
+
+    pos += len;  // consume ID
+
+    const long long size = ReadUInt(pReader, pos, len);
+    if ((size < 0) || (pos + len > stop)) {
+      return false;
+    }
+
+    pos += len;  // consume Size field
+    if ((pos + size) > stop) {
+      return false;
+    }
+
+    if (id == libwebm::kMkvCueTime)
+      m_timecode = UnserializeUInt(pReader, pos, size);
+
+    else if (id == libwebm::kMkvCueTrackPositions)
+      ++m_track_positions_count;
+
+    pos += size;  // consume payload
+  }
+
+  if (m_timecode < 0 || m_track_positions_count <= 0) {
+    return false;
+  }
+
+  // os << "CuePoint::Load(cont'd): idpos=" << idpos
+  //   << " timecode=" << m_timecode
+  //   << endl;
+
+  m_track_positions = new (std::nothrow) TrackPosition[m_track_positions_count];
+  if (m_track_positions == NULL) return false;
+
+  // Now parse track positions
+
+  TrackPosition* p = m_track_positions;
+  pos = pos_;
+
+  while (pos < stop) {
+    long len;
+
+    const long long id = ReadID(pReader, pos, len);
+    if (id < 0 || (pos + len) > stop) return false;
+
+    pos += len;  // consume ID
+
+    const long long size = ReadUInt(pReader, pos, len);
+    assert(size >= 0);
+    assert((pos + len) <= stop);
+
+    pos += len;  // consume Size field
+    assert((pos + size) <= stop);
+
+    if (id == libwebm::kMkvCueTrackPositions) {
+      TrackPosition& tp = *p++;
+      if (!tp.Parse(pReader, pos, size)) {
+        return false;
+      }
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return false;
+  }
+
+  assert(size_t(p - m_track_positions) == m_track_positions_count);
+
+  m_element_start = element_start;
+  m_element_size = element_size;
+
+  return true;
+}
+
+bool CuePoint::TrackPosition::Parse(IMkvReader* pReader, long long start_, long long size_) {
+  const long long stop = start_ + size_;
+  long long pos = start_;
+
+  m_track = -1;
+  m_pos = -1;
+  m_block = 1;  // default
+
+  while (pos < stop) {
+    long len;
+
+    const long long id = ReadID(pReader, pos, len);
+    if ((id < 0) || ((pos + len) > stop)) {
+      return false;
+    }
+
+    pos += len;  // consume ID
+
+    const long long size = ReadUInt(pReader, pos, len);
+    if ((size < 0) || ((pos + len) > stop)) {
+      return false;
+    }
+
+    pos += len;  // consume Size field
+    if ((pos + size) > stop) {
+      return false;
+    }
+
+    if (id == libwebm::kMkvCueTrack)
+      m_track = UnserializeUInt(pReader, pos, size);
+    else if (id == libwebm::kMkvCueClusterPosition)
+      m_pos = UnserializeUInt(pReader, pos, size);
+    else if (id == libwebm::kMkvCueBlockNumber)
+      m_block = UnserializeUInt(pReader, pos, size);
+
+    pos += size;  // consume payload
+  }
+
+  if ((m_pos < 0) || (m_track <= 0)) {
+    return false;
+  }
+
+  return true;
+}
+
+const CuePoint::TrackPosition* CuePoint::Find(const Track* pTrack) const {
+  if (pTrack == NULL) {
+    return NULL;
+  }
+
+  const long long n = pTrack->GetNumber();
+
+  const TrackPosition* i = m_track_positions;
+  const TrackPosition* const j = i + m_track_positions_count;
+
+  while (i != j) {
+    const TrackPosition& p = *i++;
+
+    if (p.m_track == n) return &p;
+  }
+
+  return NULL;  // no matching track number found
+}
+
+long long CuePoint::GetTimeCode() const { return m_timecode; }
+
+long long CuePoint::GetTime(const Segment* pSegment) const {
+  assert(pSegment);
+  assert(m_timecode >= 0);
+
+  const SegmentInfo* const pInfo = pSegment->GetInfo();
+  assert(pInfo);
+
+  const long long scale = pInfo->GetTimeCodeScale();
+  assert(scale >= 1);
+
+  const long long time = scale * m_timecode;
+
+  return time;
+}
+
+bool Segment::DoneParsing() const {
+  if (m_size < 0) {
+    long long total, avail;
+
+    const int status = m_pReader->Length(&total, &avail);
+
+    if (status < 0)  // error
+      return true;   // must assume done
+
+    if (total < 0) return false;  // assume live stream
+
+    return (m_pos >= total);
+  }
+
+  const long long stop = m_start + m_size;
+
+  return (m_pos >= stop);
+}
+
+const Cluster* Segment::GetFirst() const {
+  if ((m_clusters == NULL) || (m_clusterCount <= 0)) return &m_eos;
+
+  Cluster* const pCluster = m_clusters[0];
+  assert(pCluster);
+
+  return pCluster;
+}
+
+const Cluster* Segment::GetLast() const {
+  if ((m_clusters == NULL) || (m_clusterCount <= 0)) return &m_eos;
+
+  const long idx = m_clusterCount - 1;
+
+  Cluster* const pCluster = m_clusters[idx];
+  assert(pCluster);
+
+  return pCluster;
+}
+
+unsigned long Segment::GetCount() const { return m_clusterCount; }
+
+const Cluster* Segment::GetNext(const Cluster* pCurr) {
+  assert(pCurr);
+  assert(pCurr != &m_eos);
+  assert(m_clusters);
+
+  long idx = pCurr->m_index;
+
+  if (idx >= 0) {
+    assert(m_clusterCount > 0);
+    assert(idx < m_clusterCount);
+    assert(pCurr == m_clusters[idx]);
+
+    ++idx;
+
+    if (idx >= m_clusterCount) return &m_eos;  // caller will LoadCluster as desired
+
+    Cluster* const pNext = m_clusters[idx];
+    assert(pNext);
+    assert(pNext->m_index >= 0);
+    assert(pNext->m_index == idx);
+
+    return pNext;
+  }
+
+  assert(m_clusterPreloadCount > 0);
+
+  long long pos = pCurr->m_element_start;
+
+  assert(m_size >= 0);                      // TODO
+  const long long stop = m_start + m_size;  // end of segment
+
+  {
+    long len;
+
+    long long result = GetUIntLength(m_pReader, pos, len);
+    assert(result == 0);
+    assert((pos + len) <= stop);  // TODO
+    if (result != 0) return NULL;
+
+    const long long id = ReadID(m_pReader, pos, len);
+    if (id != libwebm::kMkvCluster) return NULL;
+
+    pos += len;  // consume ID
+
+    // Read Size
+    result = GetUIntLength(m_pReader, pos, len);
+    assert(result == 0);          // TODO
+    assert((pos + len) <= stop);  // TODO
+
+    const long long size = ReadUInt(m_pReader, pos, len);
+    assert(size > 0);  // TODO
+    // assert((pCurr->m_size <= 0) || (pCurr->m_size == size));
+
+    pos += len;                    // consume length of size of element
+    assert((pos + size) <= stop);  // TODO
+
+    // Pos now points to start of payload
+
+    pos += size;  // consume payload
+  }
+
+  long long off_next = 0;
+
+  while (pos < stop) {
+    long len;
+
+    long long result = GetUIntLength(m_pReader, pos, len);
+    assert(result == 0);
+    assert((pos + len) <= stop);  // TODO
+    if (result != 0) return NULL;
+
+    const long long idpos = pos;  // pos of next (potential) cluster
+
+    const long long id = ReadID(m_pReader, idpos, len);
+    if (id < 0) return NULL;
+
+    pos += len;  // consume ID
+
+    // Read Size
+    result = GetUIntLength(m_pReader, pos, len);
+    assert(result == 0);          // TODO
+    assert((pos + len) <= stop);  // TODO
+
+    const long long size = ReadUInt(m_pReader, pos, len);
+    assert(size >= 0);  // TODO
+
+    pos += len;                    // consume length of size of element
+    assert((pos + size) <= stop);  // TODO
+
+    // Pos now points to start of payload
+
+    if (size == 0)  // weird
+      continue;
+
+    if (id == libwebm::kMkvCluster) {
+      const long long off_next_ = idpos - m_start;
+
+      long long pos_;
+      long len_;
+
+      const long status = Cluster::HasBlockEntries(this, off_next_, pos_, len_);
+
+      assert(status >= 0);
+
+      if (status > 0) {
+        off_next = off_next_;
+        break;
+      }
+    }
+
+    pos += size;  // consume payload
+  }
+
+  if (off_next <= 0) return 0;
+
+  Cluster** const ii = m_clusters + m_clusterCount;
+  Cluster** i = ii;
+
+  Cluster** const jj = ii + m_clusterPreloadCount;
+  Cluster** j = jj;
+
+  while (i < j) {
+    // INVARIANT:
+    //[0, i) < pos_next
+    //[i, j) ?
+    //[j, jj)  > pos_next
+
+    Cluster** const k = i + (j - i) / 2;
+    assert(k < jj);
+
+    Cluster* const pNext = *k;
+    assert(pNext);
+    assert(pNext->m_index < 0);
+
+    // const long long pos_ = pNext->m_pos;
+    // assert(pos_);
+    // pos = pos_ * ((pos_ < 0) ? -1 : 1);
+
+    pos = pNext->GetPosition();
+
+    if (pos < off_next)
+      i = k + 1;
+    else if (pos > off_next)
+      j = k;
+    else
+      return pNext;
+  }
+
+  assert(i == j);
+
+  Cluster* const pNext = Cluster::Create(this, -1, off_next);
+  if (pNext == NULL) return NULL;
+
+  const ptrdiff_t idx_next = i - m_clusters;  // insertion position
+
+  if (!PreloadCluster(pNext, idx_next)) {
+    delete pNext;
+    return NULL;
+  }
+  assert(m_clusters);
+  assert(idx_next < m_clusterSize);
+  assert(m_clusters[idx_next] == pNext);
+
+  return pNext;
+}
+
+long Segment::ParseNext(const Cluster* pCurr, const Cluster*& pResult, long long& pos, long& len) {
+  assert(pCurr);
+  assert(!pCurr->EOS());
+  assert(m_clusters);
+
+  pResult = 0;
+
+  if (pCurr->m_index >= 0) {  // loaded (not merely preloaded)
+    assert(m_clusters[pCurr->m_index] == pCurr);
+
+    const long next_idx = pCurr->m_index + 1;
+
+    if (next_idx < m_clusterCount) {
+      pResult = m_clusters[next_idx];
+      return 0;  // success
+    }
+
+    // curr cluster is last among loaded
+
+    const long result = LoadCluster(pos, len);
+
+    if (result < 0)  // error or underflow
+      return result;
+
+    if (result > 0)  // no more clusters
+    {
+      // pResult = &m_eos;
+      return 1;
+    }
+
+    pResult = GetLast();
+    return 0;  // success
+  }
+
+  assert(m_pos > 0);
+
+  long long total, avail;
+
+  long status = m_pReader->Length(&total, &avail);
+
+  if (status < 0)  // error
+    return status;
+
+  assert((total < 0) || (avail <= total));
+
+  const long long segment_stop = (m_size < 0) ? -1 : m_start + m_size;
+
+  // interrogate curr cluster
+
+  pos = pCurr->m_element_start;
+
+  if (pCurr->m_element_size >= 0)
+    pos += pCurr->m_element_size;
+  else {
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    long long result = GetUIntLength(m_pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // weird
+      return E_BUFFER_NOT_FULL;
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long id = ReadUInt(m_pReader, pos, len);
+
+    if (id != libwebm::kMkvCluster) return -1;
+
+    pos += len;  // consume ID
+
+    // Read Size
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result = GetUIntLength(m_pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // weird
+      return E_BUFFER_NOT_FULL;
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long size = ReadUInt(m_pReader, pos, len);
+
+    if (size < 0)  // error
+      return static_cast<long>(size);
+
+    pos += len;  // consume size field
+
+    const long long unknown_size = (1LL << (7 * len)) - 1;
+
+    if (size == unknown_size)        // TODO: should never happen
+      return E_FILE_FORMAT_INVALID;  // TODO: resolve this
+
+    // assert((pCurr->m_size <= 0) || (pCurr->m_size == size));
+
+    if ((segment_stop >= 0) && ((pos + size) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    // Pos now points to start of payload
+
+    pos += size;  // consume payload (that is, the current cluster)
+    if (segment_stop >= 0 && pos > segment_stop) return E_FILE_FORMAT_INVALID;
+
+    // By consuming the payload, we are assuming that the curr
+    // cluster isn't interesting.  That is, we don't bother checking
+    // whether the payload of the curr cluster is less than what
+    // happens to be available (obtained via IMkvReader::Length).
+    // Presumably the caller has already dispensed with the current
+    // cluster, and really does want the next cluster.
+  }
+
+  // pos now points to just beyond the last fully-loaded cluster
+
+  for (;;) {
+    const long status = DoParseNext(pResult, pos, len);
+
+    if (status <= 1) return status;
+  }
+}
+
+long Segment::DoParseNext(const Cluster*& pResult, long long& pos, long& len) {
+  long long total, avail;
+
+  long status = m_pReader->Length(&total, &avail);
+
+  if (status < 0)  // error
+    return status;
+
+  assert((total < 0) || (avail <= total));
+
+  const long long segment_stop = (m_size < 0) ? -1 : m_start + m_size;
+
+  // Parse next cluster.  This is strictly a parsing activity.
+  // Creation of a new cluster object happens later, after the
+  // parsing is done.
+
+  long long off_next = 0;
+  long long cluster_size = -1;
+
+  for (;;) {
+    if ((total >= 0) && (pos >= total)) return 1;  // EOF
+
+    if ((segment_stop >= 0) && (pos >= segment_stop)) return 1;  // EOF
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    long long result = GetUIntLength(m_pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // weird
+      return E_BUFFER_NOT_FULL;
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long idpos = pos;            // absolute
+    const long long idoff = pos - m_start;  // relative
+
+    const long long id = ReadID(m_pReader, idpos, len);  // absolute
+
+    if (id < 0)  // error
+      return static_cast<long>(id);
+
+    if (id == 0)  // weird
+      return -1;  // generic error
+
+    pos += len;  // consume ID
+
+    // Read Size
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result = GetUIntLength(m_pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // weird
+      return E_BUFFER_NOT_FULL;
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long size = ReadUInt(m_pReader, pos, len);
+
+    if (size < 0)  // error
+      return static_cast<long>(size);
+
+    pos += len;  // consume length of size of element
+
+    // Pos now points to start of payload
+
+    if (size == 0)  // weird
+      continue;
+
+    const long long unknown_size = (1LL << (7 * len)) - 1;
+
+    if ((segment_stop >= 0) && (size != unknown_size) && ((pos + size) > segment_stop)) {
+      return E_FILE_FORMAT_INVALID;
+    }
+
+    if (id == libwebm::kMkvCues) {
+      if (size == unknown_size) return E_FILE_FORMAT_INVALID;
+
+      const long long element_stop = pos + size;
+
+      if ((segment_stop >= 0) && (element_stop > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+      const long long element_start = idpos;
+      const long long element_size = element_stop - element_start;
+
+      if (m_pCues == NULL) {
+        m_pCues = new (std::nothrow) Cues(this, pos, size, element_start, element_size);
+        if (m_pCues == NULL) return false;
+      }
+
+      pos += size;  // consume payload
+      if (segment_stop >= 0 && pos > segment_stop) return E_FILE_FORMAT_INVALID;
+
+      continue;
+    }
+
+    if (id != libwebm::kMkvCluster) {  // not a Cluster ID
+      if (size == unknown_size) return E_FILE_FORMAT_INVALID;
+
+      pos += size;  // consume payload
+      if (segment_stop >= 0 && pos > segment_stop) return E_FILE_FORMAT_INVALID;
+
+      continue;
+    }
+
+    // We have a cluster.
+    off_next = idoff;
+
+    if (size != unknown_size) cluster_size = size;
+
+    break;
+  }
+
+  assert(off_next > 0);  // have cluster
+
+  // We have parsed the next cluster.
+  // We have not created a cluster object yet.  What we need
+  // to do now is determine whether it has already be preloaded
+  //(in which case, an object for this cluster has already been
+  // created), and if not, create a new cluster object.
+
+  Cluster** const ii = m_clusters + m_clusterCount;
+  Cluster** i = ii;
+
+  Cluster** const jj = ii + m_clusterPreloadCount;
+  Cluster** j = jj;
+
+  while (i < j) {
+    // INVARIANT:
+    //[0, i) < pos_next
+    //[i, j) ?
+    //[j, jj)  > pos_next
+
+    Cluster** const k = i + (j - i) / 2;
+    assert(k < jj);
+
+    const Cluster* const pNext = *k;
+    assert(pNext);
+    assert(pNext->m_index < 0);
+
+    pos = pNext->GetPosition();
+    assert(pos >= 0);
+
+    if (pos < off_next)
+      i = k + 1;
+    else if (pos > off_next)
+      j = k;
+    else {
+      pResult = pNext;
+      return 0;  // success
+    }
+  }
+
+  assert(i == j);
+
+  long long pos_;
+  long len_;
+
+  status = Cluster::HasBlockEntries(this, off_next, pos_, len_);
+
+  if (status < 0) {  // error or underflow
+    pos = pos_;
+    len = len_;
+
+    return status;
+  }
+
+  if (status > 0) {  // means "found at least one block entry"
+    Cluster* const pNext = Cluster::Create(this,
+                                           -1,  // preloaded
+                                           off_next);
+    if (pNext == NULL) return -1;
+
+    const ptrdiff_t idx_next = i - m_clusters;  // insertion position
+
+    if (!PreloadCluster(pNext, idx_next)) {
+      delete pNext;
+      return -1;
+    }
+    assert(m_clusters);
+    assert(idx_next < m_clusterSize);
+    assert(m_clusters[idx_next] == pNext);
+
+    pResult = pNext;
+    return 0;  // success
+  }
+
+  // status == 0 means "no block entries found"
+
+  if (cluster_size < 0) {               // unknown size
+    const long long payload_pos = pos;  // absolute pos of cluster payload
+
+    for (;;) {  // determine cluster size
+      if ((total >= 0) && (pos >= total)) break;
+
+      if ((segment_stop >= 0) && (pos >= segment_stop)) break;  // no more clusters
+
+      // Read ID
+
+      if ((pos + 1) > avail) {
+        len = 1;
+        return E_BUFFER_NOT_FULL;
+      }
+
+      long long result = GetUIntLength(m_pReader, pos, len);
+
+      if (result < 0)  // error
+        return static_cast<long>(result);
+
+      if (result > 0)  // weird
+        return E_BUFFER_NOT_FULL;
+
+      if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+      if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+      const long long idpos = pos;
+      const long long id = ReadID(m_pReader, idpos, len);
+
+      if (id < 0)  // error (or underflow)
+        return static_cast<long>(id);
+
+      // This is the distinguished set of ID's we use to determine
+      // that we have exhausted the sub-element's inside the cluster
+      // whose ID we parsed earlier.
+
+      if (id == libwebm::kMkvCluster || id == libwebm::kMkvCues) break;
+
+      pos += len;  // consume ID (of sub-element)
+
+      // Read Size
+
+      if ((pos + 1) > avail) {
+        len = 1;
+        return E_BUFFER_NOT_FULL;
+      }
+
+      result = GetUIntLength(m_pReader, pos, len);
+
+      if (result < 0)  // error
+        return static_cast<long>(result);
+
+      if (result > 0)  // weird
+        return E_BUFFER_NOT_FULL;
+
+      if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+      if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+      const long long size = ReadUInt(m_pReader, pos, len);
+
+      if (size < 0)  // error
+        return static_cast<long>(size);
+
+      pos += len;  // consume size field of element
+
+      // pos now points to start of sub-element's payload
+
+      if (size == 0)  // weird
+        continue;
+
+      const long long unknown_size = (1LL << (7 * len)) - 1;
+
+      if (size == unknown_size) return E_FILE_FORMAT_INVALID;  // not allowed for sub-elements
+
+      if ((segment_stop >= 0) && ((pos + size) > segment_stop))  // weird
+        return E_FILE_FORMAT_INVALID;
+
+      pos += size;  // consume payload of sub-element
+      if (segment_stop >= 0 && pos > segment_stop) return E_FILE_FORMAT_INVALID;
+    }  // determine cluster size
+
+    cluster_size = pos - payload_pos;
+    assert(cluster_size >= 0);  // TODO: handle cluster_size = 0
+
+    pos = payload_pos;  // reset and re-parse original cluster
+  }
+
+  pos += cluster_size;  // consume payload
+  if (segment_stop >= 0 && pos > segment_stop) return E_FILE_FORMAT_INVALID;
+
+  return 2;  // try to find a cluster that follows next
+}
+
+const Cluster* Segment::FindCluster(long long time_ns) const {
+  if ((m_clusters == NULL) || (m_clusterCount <= 0)) return &m_eos;
+
+  {
+    Cluster* const pCluster = m_clusters[0];
+    assert(pCluster);
+    assert(pCluster->m_index == 0);
+
+    if (time_ns <= pCluster->GetTime()) return pCluster;
+  }
+
+  // Binary search of cluster array
+
+  long i = 0;
+  long j = m_clusterCount;
+
+  while (i < j) {
+    // INVARIANT:
+    //[0, i) <= time_ns
+    //[i, j) ?
+    //[j, m_clusterCount)  > time_ns
+
+    const long k = i + (j - i) / 2;
+    assert(k < m_clusterCount);
+
+    Cluster* const pCluster = m_clusters[k];
+    assert(pCluster);
+    assert(pCluster->m_index == k);
+
+    const long long t = pCluster->GetTime();
+
+    if (t <= time_ns)
+      i = k + 1;
+    else
+      j = k;
+
+    assert(i <= j);
+  }
+
+  assert(i == j);
+  assert(i > 0);
+  assert(i <= m_clusterCount);
+
+  const long k = i - 1;
+
+  Cluster* const pCluster = m_clusters[k];
+  assert(pCluster);
+  assert(pCluster->m_index == k);
+  assert(pCluster->GetTime() <= time_ns);
+
+  return pCluster;
+}
+
+const Tracks* Segment::GetTracks() const { return m_pTracks; }
+const SegmentInfo* Segment::GetInfo() const { return m_pInfo; }
+const Cues* Segment::GetCues() const { return m_pCues; }
+const Chapters* Segment::GetChapters() const { return m_pChapters; }
+const Tags* Segment::GetTags() const { return m_pTags; }
+const SeekHead* Segment::GetSeekHead() const { return m_pSeekHead; }
+
+long long Segment::GetDuration() const {
+  assert(m_pInfo);
+  return m_pInfo->GetDuration();
+}
+
+Chapters::Chapters(Segment* pSegment, long long payload_start, long long payload_size, long long element_start,
+                   long long element_size)
+    : m_pSegment(pSegment),
+      m_start(payload_start),
+      m_size(payload_size),
+      m_element_start(element_start),
+      m_element_size(element_size),
+      m_editions(NULL),
+      m_editions_size(0),
+      m_editions_count(0) {}
+
+Chapters::~Chapters() {
+  while (m_editions_count > 0) {
+    Edition& e = m_editions[--m_editions_count];
+    e.Clear();
+  }
+  delete[] m_editions;
+}
+
+long Chapters::Parse() {
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  long long pos = m_start;              // payload start
+  const long long stop = pos + m_size;  // payload stop
+
+  while (pos < stop) {
+    long long id, size;
+
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (size == 0)  // weird
+      continue;
+
+    if (id == libwebm::kMkvEditionEntry) {
+      status = ParseEdition(pos, size);
+
+      if (status < 0)  // error
+        return status;
+    }
+
+    pos += size;
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+  return 0;
+}
+
+int Chapters::GetEditionCount() const { return m_editions_count; }
+
+const Chapters::Edition* Chapters::GetEdition(int idx) const {
+  if (idx < 0) return NULL;
+
+  if (idx >= m_editions_count) return NULL;
+
+  return m_editions + idx;
+}
+
+bool Chapters::ExpandEditionsArray() {
+  if (m_editions_size > m_editions_count) return true;  // nothing else to do
+
+  const int size = (m_editions_size == 0) ? 1 : 2 * m_editions_size;
+
+  Edition* const editions = new (std::nothrow) Edition[size];
+
+  if (editions == NULL) return false;
+
+  for (int idx = 0; idx < m_editions_count; ++idx) {
+    m_editions[idx].ShallowCopy(editions[idx]);
+  }
+
+  delete[] m_editions;
+  m_editions = editions;
+
+  m_editions_size = size;
+  return true;
+}
+
+long Chapters::ParseEdition(long long pos, long long size) {
+  if (!ExpandEditionsArray()) return -1;
+
+  Edition& e = m_editions[m_editions_count++];
+  e.Init();
+
+  return e.Parse(m_pSegment->m_pReader, pos, size);
+}
+
+Chapters::Edition::Edition() {}
+
+Chapters::Edition::~Edition() {}
+
+int Chapters::Edition::GetAtomCount() const { return m_atoms_count; }
+
+const Chapters::Atom* Chapters::Edition::GetAtom(int index) const {
+  if (index < 0) return NULL;
+
+  if (index >= m_atoms_count) return NULL;
+
+  return m_atoms + index;
+}
+
+void Chapters::Edition::Init() {
+  m_atoms = NULL;
+  m_atoms_size = 0;
+  m_atoms_count = 0;
+}
+
+void Chapters::Edition::ShallowCopy(Edition& rhs) const {
+  rhs.m_atoms = m_atoms;
+  rhs.m_atoms_size = m_atoms_size;
+  rhs.m_atoms_count = m_atoms_count;
+}
+
+void Chapters::Edition::Clear() {
+  while (m_atoms_count > 0) {
+    Atom& a = m_atoms[--m_atoms_count];
+    a.Clear();
+  }
+
+  delete[] m_atoms;
+  m_atoms = NULL;
+
+  m_atoms_size = 0;
+}
+
+long Chapters::Edition::Parse(IMkvReader* pReader, long long pos, long long size) {
+  const long long stop = pos + size;
+
+  while (pos < stop) {
+    long long id, size;
+
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (size == 0) continue;
+
+    if (id == libwebm::kMkvChapterAtom) {
+      status = ParseAtom(pReader, pos, size);
+
+      if (status < 0)  // error
+        return status;
+    }
+
+    pos += size;
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+  return 0;
+}
+
+long Chapters::Edition::ParseAtom(IMkvReader* pReader, long long pos, long long size) {
+  if (!ExpandAtomsArray()) return -1;
+
+  Atom& a = m_atoms[m_atoms_count++];
+  a.Init();
+
+  return a.Parse(pReader, pos, size);
+}
+
+bool Chapters::Edition::ExpandAtomsArray() {
+  if (m_atoms_size > m_atoms_count) return true;  // nothing else to do
+
+  const int size = (m_atoms_size == 0) ? 1 : 2 * m_atoms_size;
+
+  Atom* const atoms = new (std::nothrow) Atom[size];
+
+  if (atoms == NULL) return false;
+
+  for (int idx = 0; idx < m_atoms_count; ++idx) {
+    m_atoms[idx].ShallowCopy(atoms[idx]);
+  }
+
+  delete[] m_atoms;
+  m_atoms = atoms;
+
+  m_atoms_size = size;
+  return true;
+}
+
+Chapters::Atom::Atom() {}
+
+Chapters::Atom::~Atom() {}
+
+unsigned long long Chapters::Atom::GetUID() const { return m_uid; }
+
+const char* Chapters::Atom::GetStringUID() const { return m_string_uid; }
+
+long long Chapters::Atom::GetStartTimecode() const { return m_start_timecode; }
+
+long long Chapters::Atom::GetStopTimecode() const { return m_stop_timecode; }
+
+long long Chapters::Atom::GetStartTime(const Chapters* pChapters) const { return GetTime(pChapters, m_start_timecode); }
+
+long long Chapters::Atom::GetStopTime(const Chapters* pChapters) const { return GetTime(pChapters, m_stop_timecode); }
+
+int Chapters::Atom::GetDisplayCount() const { return m_displays_count; }
+
+const Chapters::Display* Chapters::Atom::GetDisplay(int index) const {
+  if (index < 0) return NULL;
+
+  if (index >= m_displays_count) return NULL;
+
+  return m_displays + index;
+}
+
+void Chapters::Atom::Init() {
+  m_string_uid = NULL;
+  m_uid = 0;
+  m_start_timecode = -1;
+  m_stop_timecode = -1;
+
+  m_displays = NULL;
+  m_displays_size = 0;
+  m_displays_count = 0;
+}
+
+void Chapters::Atom::ShallowCopy(Atom& rhs) const {
+  rhs.m_string_uid = m_string_uid;
+  rhs.m_uid = m_uid;
+  rhs.m_start_timecode = m_start_timecode;
+  rhs.m_stop_timecode = m_stop_timecode;
+
+  rhs.m_displays = m_displays;
+  rhs.m_displays_size = m_displays_size;
+  rhs.m_displays_count = m_displays_count;
+}
+
+void Chapters::Atom::Clear() {
+  delete[] m_string_uid;
+  m_string_uid = NULL;
+
+  while (m_displays_count > 0) {
+    Display& d = m_displays[--m_displays_count];
+    d.Clear();
+  }
+
+  delete[] m_displays;
+  m_displays = NULL;
+
+  m_displays_size = 0;
+}
+
+long Chapters::Atom::Parse(IMkvReader* pReader, long long pos, long long size) {
+  const long long stop = pos + size;
+
+  while (pos < stop) {
+    long long id, size;
+
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (size == 0)  // 0 length payload, skip.
+      continue;
+
+    if (id == libwebm::kMkvChapterDisplay) {
+      status = ParseDisplay(pReader, pos, size);
+
+      if (status < 0)  // error
+        return status;
+    } else if (id == libwebm::kMkvChapterStringUID) {
+      status = UnserializeString(pReader, pos, size, m_string_uid);
+
+      if (status < 0)  // error
+        return status;
+    } else if (id == libwebm::kMkvChapterUID) {
+      long long val;
+      status = UnserializeInt(pReader, pos, size, val);
+
+      if (status < 0)  // error
+        return status;
+
+      m_uid = static_cast<unsigned long long>(val);
+    } else if (id == libwebm::kMkvChapterTimeStart) {
+      const long long val = UnserializeUInt(pReader, pos, size);
+
+      if (val < 0)  // error
+        return static_cast<long>(val);
+
+      m_start_timecode = val;
+    } else if (id == libwebm::kMkvChapterTimeEnd) {
+      const long long val = UnserializeUInt(pReader, pos, size);
+
+      if (val < 0)  // error
+        return static_cast<long>(val);
+
+      m_stop_timecode = val;
+    }
+
+    pos += size;
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+  return 0;
+}
+
+long long Chapters::Atom::GetTime(const Chapters* pChapters, long long timecode) {
+  if (pChapters == NULL) return -1;
+
+  Segment* const pSegment = pChapters->m_pSegment;
+
+  if (pSegment == NULL)  // weird
+    return -1;
+
+  const SegmentInfo* const pInfo = pSegment->GetInfo();
+
+  if (pInfo == NULL) return -1;
+
+  const long long timecode_scale = pInfo->GetTimeCodeScale();
+
+  if (timecode_scale < 1)  // weird
+    return -1;
+
+  if (timecode < 0) return -1;
+
+  const long long result = timecode_scale * timecode;
+
+  return result;
+}
+
+long Chapters::Atom::ParseDisplay(IMkvReader* pReader, long long pos, long long size) {
+  if (!ExpandDisplaysArray()) return -1;
+
+  Display& d = m_displays[m_displays_count++];
+  d.Init();
+
+  return d.Parse(pReader, pos, size);
+}
+
+bool Chapters::Atom::ExpandDisplaysArray() {
+  if (m_displays_size > m_displays_count) return true;  // nothing else to do
+
+  const int size = (m_displays_size == 0) ? 1 : 2 * m_displays_size;
+
+  Display* const displays = new (std::nothrow) Display[size];
+
+  if (displays == NULL) return false;
+
+  for (int idx = 0; idx < m_displays_count; ++idx) {
+    m_displays[idx].ShallowCopy(displays[idx]);
+  }
+
+  delete[] m_displays;
+  m_displays = displays;
+
+  m_displays_size = size;
+  return true;
+}
+
+Chapters::Display::Display() {}
+
+Chapters::Display::~Display() {}
+
+const char* Chapters::Display::GetString() const { return m_string; }
+
+const char* Chapters::Display::GetLanguage() const { return m_language; }
+
+const char* Chapters::Display::GetCountry() const { return m_country; }
+
+void Chapters::Display::Init() {
+  m_string = NULL;
+  m_language = NULL;
+  m_country = NULL;
+}
+
+void Chapters::Display::ShallowCopy(Display& rhs) const {
+  rhs.m_string = m_string;
+  rhs.m_language = m_language;
+  rhs.m_country = m_country;
+}
+
+void Chapters::Display::Clear() {
+  delete[] m_string;
+  m_string = NULL;
+
+  delete[] m_language;
+  m_language = NULL;
+
+  delete[] m_country;
+  m_country = NULL;
+}
+
+long Chapters::Display::Parse(IMkvReader* pReader, long long pos, long long size) {
+  const long long stop = pos + size;
+
+  while (pos < stop) {
+    long long id, size;
+
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (size == 0)  // No payload.
+      continue;
+
+    if (id == libwebm::kMkvChapString) {
+      status = UnserializeString(pReader, pos, size, m_string);
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvChapLanguage) {
+      status = UnserializeString(pReader, pos, size, m_language);
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvChapCountry) {
+      status = UnserializeString(pReader, pos, size, m_country);
+
+      if (status) return status;
+    }
+
+    pos += size;
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+  return 0;
+}
+
+Tags::Tags(Segment* pSegment, long long payload_start, long long payload_size, long long element_start,
+           long long element_size)
+    : m_pSegment(pSegment),
+      m_start(payload_start),
+      m_size(payload_size),
+      m_element_start(element_start),
+      m_element_size(element_size),
+      m_tags(NULL),
+      m_tags_size(0),
+      m_tags_count(0) {}
+
+Tags::~Tags() {
+  while (m_tags_count > 0) {
+    Tag& t = m_tags[--m_tags_count];
+    t.Clear();
+  }
+  delete[] m_tags;
+}
+
+long Tags::Parse() {
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  long long pos = m_start;              // payload start
+  const long long stop = pos + m_size;  // payload stop
+
+  while (pos < stop) {
+    long long id, size;
+
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0) return status;
+
+    if (size == 0)  // 0 length tag, read another
+      continue;
+
+    if (id == libwebm::kMkvTag) {
+      status = ParseTag(pos, size);
+
+      if (status < 0) return status;
+    }
+
+    pos += size;
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  return 0;
+}
+
+int Tags::GetTagCount() const { return m_tags_count; }
+
+const Tags::Tag* Tags::GetTag(int idx) const {
+  if (idx < 0) return NULL;
+
+  if (idx >= m_tags_count) return NULL;
+
+  return m_tags + idx;
+}
+
+bool Tags::ExpandTagsArray() {
+  if (m_tags_size > m_tags_count) return true;  // nothing else to do
+
+  const int size = (m_tags_size == 0) ? 1 : 2 * m_tags_size;
+
+  Tag* const tags = new (std::nothrow) Tag[size];
+
+  if (tags == NULL) return false;
+
+  for (int idx = 0; idx < m_tags_count; ++idx) {
+    m_tags[idx].ShallowCopy(tags[idx]);
+  }
+
+  delete[] m_tags;
+  m_tags = tags;
+
+  m_tags_size = size;
+  return true;
+}
+
+long Tags::ParseTag(long long pos, long long size) {
+  if (!ExpandTagsArray()) return -1;
+
+  Tag& t = m_tags[m_tags_count++];
+  t.Init();
+
+  return t.Parse(m_pSegment->m_pReader, pos, size);
+}
+
+Tags::Tag::Tag() {}
+
+Tags::Tag::~Tag() {}
+
+int Tags::Tag::GetSimpleTagCount() const { return m_simple_tags_count; }
+
+const Tags::SimpleTag* Tags::Tag::GetSimpleTag(int index) const {
+  if (index < 0) return NULL;
+
+  if (index >= m_simple_tags_count) return NULL;
+
+  return m_simple_tags + index;
+}
+
+void Tags::Tag::Init() {
+  m_simple_tags = NULL;
+  m_simple_tags_size = 0;
+  m_simple_tags_count = 0;
+}
+
+void Tags::Tag::ShallowCopy(Tag& rhs) const {
+  rhs.m_simple_tags = m_simple_tags;
+  rhs.m_simple_tags_size = m_simple_tags_size;
+  rhs.m_simple_tags_count = m_simple_tags_count;
+}
+
+void Tags::Tag::Clear() {
+  while (m_simple_tags_count > 0) {
+    SimpleTag& d = m_simple_tags[--m_simple_tags_count];
+    d.Clear();
+  }
+
+  delete[] m_simple_tags;
+  m_simple_tags = NULL;
+
+  m_simple_tags_size = 0;
+}
+
+long Tags::Tag::Parse(IMkvReader* pReader, long long pos, long long size) {
+  const long long stop = pos + size;
+
+  while (pos < stop) {
+    long long id, size;
+
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0) return status;
+
+    if (size == 0)  // 0 length tag, read another
+      continue;
+
+    if (id == libwebm::kMkvSimpleTag) {
+      status = ParseSimpleTag(pReader, pos, size);
+
+      if (status < 0) return status;
+    }
+
+    pos += size;
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+  return 0;
+}
+
+long Tags::Tag::ParseSimpleTag(IMkvReader* pReader, long long pos, long long size) {
+  if (!ExpandSimpleTagsArray()) return -1;
+
+  SimpleTag& st = m_simple_tags[m_simple_tags_count++];
+  st.Init();
+
+  return st.Parse(pReader, pos, size);
+}
+
+bool Tags::Tag::ExpandSimpleTagsArray() {
+  if (m_simple_tags_size > m_simple_tags_count) return true;  // nothing else to do
+
+  const int size = (m_simple_tags_size == 0) ? 1 : 2 * m_simple_tags_size;
+
+  SimpleTag* const displays = new (std::nothrow) SimpleTag[size];
+
+  if (displays == NULL) return false;
+
+  for (int idx = 0; idx < m_simple_tags_count; ++idx) {
+    m_simple_tags[idx].ShallowCopy(displays[idx]);
+  }
+
+  delete[] m_simple_tags;
+  m_simple_tags = displays;
+
+  m_simple_tags_size = size;
+  return true;
+}
+
+Tags::SimpleTag::SimpleTag() {}
+
+Tags::SimpleTag::~SimpleTag() {}
+
+const char* Tags::SimpleTag::GetTagName() const { return m_tag_name; }
+
+const char* Tags::SimpleTag::GetTagString() const { return m_tag_string; }
+
+void Tags::SimpleTag::Init() {
+  m_tag_name = NULL;
+  m_tag_string = NULL;
+}
+
+void Tags::SimpleTag::ShallowCopy(SimpleTag& rhs) const {
+  rhs.m_tag_name = m_tag_name;
+  rhs.m_tag_string = m_tag_string;
+}
+
+void Tags::SimpleTag::Clear() {
+  delete[] m_tag_name;
+  m_tag_name = NULL;
+
+  delete[] m_tag_string;
+  m_tag_string = NULL;
+}
+
+long Tags::SimpleTag::Parse(IMkvReader* pReader, long long pos, long long size) {
+  const long long stop = pos + size;
+
+  while (pos < stop) {
+    long long id, size;
+
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (size == 0)  // weird
+      continue;
+
+    if (id == libwebm::kMkvTagName) {
+      status = UnserializeString(pReader, pos, size, m_tag_name);
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvTagString) {
+      status = UnserializeString(pReader, pos, size, m_tag_string);
+
+      if (status) return status;
+    }
+
+    pos += size;
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+  return 0;
+}
+
+SegmentInfo::SegmentInfo(Segment* pSegment, long long start, long long size_, long long element_start,
+                         long long element_size)
+    : m_pSegment(pSegment),
+      m_start(start),
+      m_size(size_),
+      m_element_start(element_start),
+      m_element_size(element_size),
+      m_pMuxingAppAsUTF8(NULL),
+      m_pWritingAppAsUTF8(NULL),
+      m_pTitleAsUTF8(NULL) {}
+
+SegmentInfo::~SegmentInfo() {
+  delete[] m_pMuxingAppAsUTF8;
+  m_pMuxingAppAsUTF8 = NULL;
+
+  delete[] m_pWritingAppAsUTF8;
+  m_pWritingAppAsUTF8 = NULL;
+
+  delete[] m_pTitleAsUTF8;
+  m_pTitleAsUTF8 = NULL;
+}
+
+long SegmentInfo::Parse() {
+  assert(m_pMuxingAppAsUTF8 == NULL);
+  assert(m_pWritingAppAsUTF8 == NULL);
+  assert(m_pTitleAsUTF8 == NULL);
+
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  long long pos = m_start;
+  const long long stop = m_start + m_size;
+
+  m_timecodeScale = 1000000;
+  m_duration = -1;
+
+  while (pos < stop) {
+    long long id, size;
+
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvTimecodeScale) {
+      m_timecodeScale = UnserializeUInt(pReader, pos, size);
+
+      if (m_timecodeScale <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvDuration) {
+      const long status = UnserializeFloat(pReader, pos, size, m_duration);
+
+      if (status < 0) return status;
+
+      if (m_duration < 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvMuxingApp) {
+      const long status = UnserializeString(pReader, pos, size, m_pMuxingAppAsUTF8);
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvWritingApp) {
+      const long status = UnserializeString(pReader, pos, size, m_pWritingAppAsUTF8);
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvTitle) {
+      const long status = UnserializeString(pReader, pos, size, m_pTitleAsUTF8);
+
+      if (status) return status;
+    }
+
+    pos += size;
+
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  const double rollover_check = m_duration * m_timecodeScale;
+  if (rollover_check > static_cast<double>(LLONG_MAX)) return E_FILE_FORMAT_INVALID;
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  return 0;
+}
+
+long long SegmentInfo::GetTimeCodeScale() const { return m_timecodeScale; }
+
+long long SegmentInfo::GetDuration() const {
+  if (m_duration < 0) return -1;
+
+  assert(m_timecodeScale >= 1);
+
+  const double dd = double(m_duration) * double(m_timecodeScale);
+  const long long d = static_cast<long long>(dd);
+
+  return d;
+}
+
+const char* SegmentInfo::GetMuxingAppAsUTF8() const { return m_pMuxingAppAsUTF8; }
+
+const char* SegmentInfo::GetWritingAppAsUTF8() const { return m_pWritingAppAsUTF8; }
+
+const char* SegmentInfo::GetTitleAsUTF8() const { return m_pTitleAsUTF8; }
+
+///////////////////////////////////////////////////////////////
+// ContentEncoding element
+ContentEncoding::ContentCompression::ContentCompression() : algo(0), settings(NULL), settings_len(0) {}
+
+ContentEncoding::ContentCompression::~ContentCompression() { delete[] settings; }
+
+ContentEncoding::ContentEncryption::ContentEncryption()
+    : algo(0),
+      key_id(NULL),
+      key_id_len(0),
+      signature(NULL),
+      signature_len(0),
+      sig_key_id(NULL),
+      sig_key_id_len(0),
+      sig_algo(0),
+      sig_hash_algo(0) {}
+
+ContentEncoding::ContentEncryption::~ContentEncryption() {
+  delete[] key_id;
+  delete[] signature;
+  delete[] sig_key_id;
+}
+
+ContentEncoding::ContentEncoding()
+    : compression_entries_(NULL),
+      compression_entries_end_(NULL),
+      encryption_entries_(NULL),
+      encryption_entries_end_(NULL),
+      encoding_order_(0),
+      encoding_scope_(1),
+      encoding_type_(0) {}
+
+ContentEncoding::~ContentEncoding() {
+  ContentCompression** comp_i = compression_entries_;
+  ContentCompression** const comp_j = compression_entries_end_;
+
+  while (comp_i != comp_j) {
+    ContentCompression* const comp = *comp_i++;
+    delete comp;
+  }
+
+  delete[] compression_entries_;
+
+  ContentEncryption** enc_i = encryption_entries_;
+  ContentEncryption** const enc_j = encryption_entries_end_;
+
+  while (enc_i != enc_j) {
+    ContentEncryption* const enc = *enc_i++;
+    delete enc;
+  }
+
+  delete[] encryption_entries_;
+}
+
+const ContentEncoding::ContentCompression* ContentEncoding::GetCompressionByIndex(unsigned long idx) const {
+  const ptrdiff_t count = compression_entries_end_ - compression_entries_;
+  assert(count >= 0);
+
+  if (idx >= static_cast<unsigned long>(count)) return NULL;
+
+  return compression_entries_[idx];
+}
+
+unsigned long ContentEncoding::GetCompressionCount() const {
+  const ptrdiff_t count = compression_entries_end_ - compression_entries_;
+  assert(count >= 0);
+
+  return static_cast<unsigned long>(count);
+}
+
+const ContentEncoding::ContentEncryption* ContentEncoding::GetEncryptionByIndex(unsigned long idx) const {
+  const ptrdiff_t count = encryption_entries_end_ - encryption_entries_;
+  assert(count >= 0);
+
+  if (idx >= static_cast<unsigned long>(count)) return NULL;
+
+  return encryption_entries_[idx];
+}
+
+unsigned long ContentEncoding::GetEncryptionCount() const {
+  const ptrdiff_t count = encryption_entries_end_ - encryption_entries_;
+  assert(count >= 0);
+
+  return static_cast<unsigned long>(count);
+}
+
+long ContentEncoding::ParseContentEncAESSettingsEntry(long long start, long long size, IMkvReader* pReader,
+                                                      ContentEncAESSettings* aes) {
+  assert(pReader);
+  assert(aes);
+
+  long long pos = start;
+  const long long stop = start + size;
+
+  while (pos < stop) {
+    long long id, size;
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvAESSettingsCipherMode) {
+      aes->cipher_mode = UnserializeUInt(pReader, pos, size);
+      if (aes->cipher_mode != 1) return E_FILE_FORMAT_INVALID;
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  return 0;
+}
+
+long ContentEncoding::ParseContentEncodingEntry(long long start, long long size, IMkvReader* pReader) {
+  assert(pReader);
+
+  long long pos = start;
+  const long long stop = start + size;
+
+  // Count ContentCompression and ContentEncryption elements.
+  int compression_count = 0;
+  int encryption_count = 0;
+
+  while (pos < stop) {
+    long long id, size;
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvContentCompression) ++compression_count;
+
+    if (id == libwebm::kMkvContentEncryption) ++encryption_count;
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (compression_count <= 0 && encryption_count <= 0) return -1;
+
+  if (compression_count > 0) {
+    compression_entries_ = new (std::nothrow) ContentCompression*[compression_count];
+    if (!compression_entries_) return -1;
+    compression_entries_end_ = compression_entries_;
+  }
+
+  if (encryption_count > 0) {
+    encryption_entries_ = new (std::nothrow) ContentEncryption*[encryption_count];
+    if (!encryption_entries_) {
+      delete[] compression_entries_;
+      return -1;
+    }
+    encryption_entries_end_ = encryption_entries_;
+  }
+
+  pos = start;
+  while (pos < stop) {
+    long long id, size;
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvContentEncodingOrder) {
+      encoding_order_ = UnserializeUInt(pReader, pos, size);
+    } else if (id == libwebm::kMkvContentEncodingScope) {
+      encoding_scope_ = UnserializeUInt(pReader, pos, size);
+      if (encoding_scope_ < 1) return -1;
+    } else if (id == libwebm::kMkvContentEncodingType) {
+      encoding_type_ = UnserializeUInt(pReader, pos, size);
+    } else if (id == libwebm::kMkvContentCompression) {
+      ContentCompression* const compression = new (std::nothrow) ContentCompression();
+      if (!compression) return -1;
+
+      status = ParseCompressionEntry(pos, size, pReader, compression);
+      if (status) {
+        delete compression;
+        return status;
+      }
+      *compression_entries_end_++ = compression;
+    } else if (id == libwebm::kMkvContentEncryption) {
+      ContentEncryption* const encryption = new (std::nothrow) ContentEncryption();
+      if (!encryption) return -1;
+
+      status = ParseEncryptionEntry(pos, size, pReader, encryption);
+      if (status) {
+        delete encryption;
+        return status;
+      }
+      *encryption_entries_end_++ = encryption;
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+  return 0;
+}
+
+long ContentEncoding::ParseCompressionEntry(long long start, long long size, IMkvReader* pReader,
+                                            ContentCompression* compression) {
+  assert(pReader);
+  assert(compression);
+
+  long long pos = start;
+  const long long stop = start + size;
+
+  bool valid = false;
+
+  while (pos < stop) {
+    long long id, size;
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvContentCompAlgo) {
+      long long algo = UnserializeUInt(pReader, pos, size);
+      if (algo < 0) return E_FILE_FORMAT_INVALID;
+      compression->algo = algo;
+      valid = true;
+    } else if (id == libwebm::kMkvContentCompSettings) {
+      if (size <= 0) return E_FILE_FORMAT_INVALID;
+
+      const size_t buflen = static_cast<size_t>(size);
+      unsigned char* buf = SafeArrayAlloc<unsigned char>(1, buflen);
+      if (buf == NULL) return -1;
+
+      const int read_status = pReader->Read(pos, static_cast<long>(buflen), buf);
+      if (read_status) {
+        delete[] buf;
+        return status;
+      }
+
+      compression->settings = buf;
+      compression->settings_len = buflen;
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  // ContentCompAlgo is mandatory
+  if (!valid) return E_FILE_FORMAT_INVALID;
+
+  return 0;
+}
+
+long ContentEncoding::ParseEncryptionEntry(long long start, long long size, IMkvReader* pReader,
+                                           ContentEncryption* encryption) {
+  assert(pReader);
+  assert(encryption);
+
+  long long pos = start;
+  const long long stop = start + size;
+
+  while (pos < stop) {
+    long long id, size;
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvContentEncAlgo) {
+      encryption->algo = UnserializeUInt(pReader, pos, size);
+      if (encryption->algo != 5) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvContentEncKeyID) {
+      delete[] encryption->key_id;
+      encryption->key_id = NULL;
+      encryption->key_id_len = 0;
+
+      if (size <= 0) return E_FILE_FORMAT_INVALID;
+
+      const size_t buflen = static_cast<size_t>(size);
+      unsigned char* buf = SafeArrayAlloc<unsigned char>(1, buflen);
+      if (buf == NULL) return -1;
+
+      const int read_status = pReader->Read(pos, static_cast<long>(buflen), buf);
+      if (read_status) {
+        delete[] buf;
+        return status;
+      }
+
+      encryption->key_id = buf;
+      encryption->key_id_len = buflen;
+    } else if (id == libwebm::kMkvContentSignature) {
+      delete[] encryption->signature;
+      encryption->signature = NULL;
+      encryption->signature_len = 0;
+
+      if (size <= 0) return E_FILE_FORMAT_INVALID;
+
+      const size_t buflen = static_cast<size_t>(size);
+      unsigned char* buf = SafeArrayAlloc<unsigned char>(1, buflen);
+      if (buf == NULL) return -1;
+
+      const int read_status = pReader->Read(pos, static_cast<long>(buflen), buf);
+      if (read_status) {
+        delete[] buf;
+        return status;
+      }
+
+      encryption->signature = buf;
+      encryption->signature_len = buflen;
+    } else if (id == libwebm::kMkvContentSigKeyID) {
+      delete[] encryption->sig_key_id;
+      encryption->sig_key_id = NULL;
+      encryption->sig_key_id_len = 0;
+
+      if (size <= 0) return E_FILE_FORMAT_INVALID;
+
+      const size_t buflen = static_cast<size_t>(size);
+      unsigned char* buf = SafeArrayAlloc<unsigned char>(1, buflen);
+      if (buf == NULL) return -1;
+
+      const int read_status = pReader->Read(pos, static_cast<long>(buflen), buf);
+      if (read_status) {
+        delete[] buf;
+        return status;
+      }
+
+      encryption->sig_key_id = buf;
+      encryption->sig_key_id_len = buflen;
+    } else if (id == libwebm::kMkvContentSigAlgo) {
+      encryption->sig_algo = UnserializeUInt(pReader, pos, size);
+    } else if (id == libwebm::kMkvContentSigHashAlgo) {
+      encryption->sig_hash_algo = UnserializeUInt(pReader, pos, size);
+    } else if (id == libwebm::kMkvContentEncAESSettings) {
+      const long status = ParseContentEncAESSettingsEntry(pos, size, pReader, &encryption->aes_settings);
+      if (status) return status;
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  return 0;
+}
+
+Track::Track(Segment* pSegment, long long element_start, long long element_size)
+    : m_pSegment(pSegment),
+      m_element_start(element_start),
+      m_element_size(element_size),
+      content_encoding_entries_(NULL),
+      content_encoding_entries_end_(NULL) {}
+
+Track::~Track() {
+  Info& info = const_cast<Info&>(m_info);
+  info.Clear();
+
+  ContentEncoding** i = content_encoding_entries_;
+  ContentEncoding** const j = content_encoding_entries_end_;
+
+  while (i != j) {
+    ContentEncoding* const encoding = *i++;
+    delete encoding;
+  }
+
+  delete[] content_encoding_entries_;
+}
+
+long Track::Create(Segment* pSegment, const Info& info, long long element_start, long long element_size,
+                   Track*& pResult) {
+  if (pResult) return -1;
+
+  Track* const pTrack = new (std::nothrow) Track(pSegment, element_start, element_size);
+
+  if (pTrack == NULL) return -1;  // generic error
+
+  const int status = info.Copy(pTrack->m_info);
+
+  if (status) {  // error
+    delete pTrack;
+    return status;
+  }
+
+  pResult = pTrack;
+  return 0;  // success
+}
+
+Track::Info::Info()
+    : uid(0),
+      defaultDuration(0),
+      codecDelay(0),
+      seekPreRoll(0),
+      nameAsUTF8(NULL),
+      language(NULL),
+      codecId(NULL),
+      codecNameAsUTF8(NULL),
+      codecPrivate(NULL),
+      codecPrivateSize(0),
+      lacing(false) {}
+
+Track::Info::~Info() { Clear(); }
+
+void Track::Info::Clear() {
+  delete[] nameAsUTF8;
+  nameAsUTF8 = NULL;
+
+  delete[] language;
+  language = NULL;
+
+  delete[] codecId;
+  codecId = NULL;
+
+  delete[] codecPrivate;
+  codecPrivate = NULL;
+  codecPrivateSize = 0;
+
+  delete[] codecNameAsUTF8;
+  codecNameAsUTF8 = NULL;
+}
+
+int Track::Info::CopyStr(char* Info::*str, Info& dst_) const {
+  if (str == static_cast<char * Info::*>(NULL)) return -1;
+
+  char*& dst = dst_.*str;
+
+  if (dst)  // should be NULL already
+    return -1;
+
+  const char* const src = this->*str;
+
+  if (src == NULL) return 0;
+
+  const size_t len = strlen(src);
+
+  dst = SafeArrayAlloc<char>(1, len + 1);
+
+  if (dst == NULL) return -1;
+
+  strcpy(dst, src);
+
+  return 0;
+}
+
+int Track::Info::Copy(Info& dst) const {
+  if (&dst == this) return 0;
+
+  dst.type = type;
+  dst.number = number;
+  dst.defaultDuration = defaultDuration;
+  dst.codecDelay = codecDelay;
+  dst.seekPreRoll = seekPreRoll;
+  dst.uid = uid;
+  dst.lacing = lacing;
+  dst.settings = settings;
+
+  // We now copy the string member variables from src to dst.
+  // This involves memory allocation so in principle the operation
+  // can fail (indeed, that's why we have Info::Copy), so we must
+  // report this to the caller.  An error return from this function
+  // therefore implies that the copy was only partially successful.
+
+  if (int status = CopyStr(&Info::nameAsUTF8, dst)) return status;
+
+  if (int status = CopyStr(&Info::language, dst)) return status;
+
+  if (int status = CopyStr(&Info::codecId, dst)) return status;
+
+  if (int status = CopyStr(&Info::codecNameAsUTF8, dst)) return status;
+
+  if (codecPrivateSize > 0) {
+    if (codecPrivate == NULL) return -1;
+
+    if (dst.codecPrivate) return -1;
+
+    if (dst.codecPrivateSize != 0) return -1;
+
+    dst.codecPrivate = SafeArrayAlloc<unsigned char>(1, codecPrivateSize);
+
+    if (dst.codecPrivate == NULL) return -1;
+
+    memcpy(dst.codecPrivate, codecPrivate, codecPrivateSize);
+    dst.codecPrivateSize = codecPrivateSize;
+  }
+
+  return 0;
+}
+
+const BlockEntry* Track::GetEOS() const { return &m_eos; }
+
+long Track::GetType() const { return m_info.type; }
+
+long Track::GetNumber() const { return m_info.number; }
+
+unsigned long long Track::GetUid() const { return m_info.uid; }
+
+const char* Track::GetNameAsUTF8() const { return m_info.nameAsUTF8; }
+
+const char* Track::GetLanguage() const { return m_info.language; }
+
+const char* Track::GetCodecNameAsUTF8() const { return m_info.codecNameAsUTF8; }
+
+const char* Track::GetCodecId() const { return m_info.codecId; }
+
+const unsigned char* Track::GetCodecPrivate(size_t& size) const {
+  size = m_info.codecPrivateSize;
+  return m_info.codecPrivate;
+}
+
+bool Track::GetLacing() const { return m_info.lacing; }
+
+unsigned long long Track::GetDefaultDuration() const { return m_info.defaultDuration; }
+
+unsigned long long Track::GetCodecDelay() const { return m_info.codecDelay; }
+
+unsigned long long Track::GetSeekPreRoll() const { return m_info.seekPreRoll; }
+
+long Track::GetFirst(const BlockEntry*& pBlockEntry) const {
+  const Cluster* pCluster = m_pSegment->GetFirst();
+
+  for (int i = 0;;) {
+    if (pCluster == NULL) {
+      pBlockEntry = GetEOS();
+      return 1;
+    }
+
+    if (pCluster->EOS()) {
+      if (m_pSegment->DoneParsing()) {
+        pBlockEntry = GetEOS();
+        return 1;
+      }
+
+      pBlockEntry = 0;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    long status = pCluster->GetFirst(pBlockEntry);
+
+    if (status < 0)  // error
+      return status;
+
+    if (pBlockEntry == 0) {  // empty cluster
+      pCluster = m_pSegment->GetNext(pCluster);
+      continue;
+    }
+
+    for (;;) {
+      const Block* const pBlock = pBlockEntry->GetBlock();
+      assert(pBlock);
+
+      const long long tn = pBlock->GetTrackNumber();
+
+      if ((tn == m_info.number) && VetEntry(pBlockEntry)) return 0;
+
+      const BlockEntry* pNextEntry;
+
+      status = pCluster->GetNext(pBlockEntry, pNextEntry);
+
+      if (status < 0)  // error
+        return status;
+
+      if (pNextEntry == 0) break;
+
+      pBlockEntry = pNextEntry;
+    }
+
+    ++i;
+
+    if (i >= 100) break;
+
+    pCluster = m_pSegment->GetNext(pCluster);
+  }
+
+  // NOTE: if we get here, it means that we didn't find a block with
+  // a matching track number.  We interpret that as an error (which
+  // might be too conservative).
+
+  pBlockEntry = GetEOS();  // so we can return a non-NULL value
+  return 1;
+}
+
+long Track::GetNext(const BlockEntry* pCurrEntry, const BlockEntry*& pNextEntry) const {
+  assert(pCurrEntry);
+  assert(!pCurrEntry->EOS());  //?
+
+  const Block* const pCurrBlock = pCurrEntry->GetBlock();
+  assert(pCurrBlock && pCurrBlock->GetTrackNumber() == m_info.number);
+  if (!pCurrBlock || pCurrBlock->GetTrackNumber() != m_info.number) return -1;
+
+  const Cluster* pCluster = pCurrEntry->GetCluster();
+  assert(pCluster);
+  assert(!pCluster->EOS());
+
+  long status = pCluster->GetNext(pCurrEntry, pNextEntry);
+
+  if (status < 0)  // error
+    return status;
+
+  for (int i = 0;;) {
+    while (pNextEntry) {
+      const Block* const pNextBlock = pNextEntry->GetBlock();
+      assert(pNextBlock);
+
+      if (pNextBlock->GetTrackNumber() == m_info.number) return 0;
+
+      pCurrEntry = pNextEntry;
+
+      status = pCluster->GetNext(pCurrEntry, pNextEntry);
+
+      if (status < 0)  // error
+        return status;
+    }
+
+    pCluster = m_pSegment->GetNext(pCluster);
+
+    if (pCluster == NULL) {
+      pNextEntry = GetEOS();
+      return 1;
+    }
+
+    if (pCluster->EOS()) {
+      if (m_pSegment->DoneParsing()) {
+        pNextEntry = GetEOS();
+        return 1;
+      }
+
+      // TODO: there is a potential O(n^2) problem here: we tell the
+      // caller to (pre)load another cluster, which he does, but then he
+      // calls GetNext again, which repeats the same search.  This is
+      // a pathological case, since the only way it can happen is if
+      // there exists a long sequence of clusters none of which contain a
+      // block from this track.  One way around this problem is for the
+      // caller to be smarter when he loads another cluster: don't call
+      // us back until you have a cluster that contains a block from this
+      // track. (Of course, that's not cheap either, since our caller
+      // would have to scan the each cluster as it's loaded, so that
+      // would just push back the problem.)
+
+      pNextEntry = NULL;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    status = pCluster->GetFirst(pNextEntry);
+
+    if (status < 0)  // error
+      return status;
+
+    if (pNextEntry == NULL)  // empty cluster
+      continue;
+
+    ++i;
+
+    if (i >= 100) break;
+  }
+
+  // NOTE: if we get here, it means that we didn't find a block with
+  // a matching track number after lots of searching, so we give
+  // up trying.
+
+  pNextEntry = GetEOS();  // so we can return a non-NULL value
+  return 1;
+}
+
+bool Track::VetEntry(const BlockEntry* pBlockEntry) const {
+  assert(pBlockEntry);
+  const Block* const pBlock = pBlockEntry->GetBlock();
+  assert(pBlock);
+  assert(pBlock->GetTrackNumber() == m_info.number);
+  if (!pBlock || pBlock->GetTrackNumber() != m_info.number) return false;
+
+  // This function is used during a seek to determine whether the
+  // frame is a valid seek target.  This default function simply
+  // returns true, which means all frames are valid seek targets.
+  // It gets overridden by the VideoTrack class, because only video
+  // keyframes can be used as seek target.
+
+  return true;
+}
+
+long Track::Seek(long long time_ns, const BlockEntry*& pResult) const {
+  const long status = GetFirst(pResult);
+
+  if (status < 0)  // buffer underflow, etc
+    return status;
+
+  assert(pResult);
+
+  if (pResult->EOS()) return 0;
+
+  const Cluster* pCluster = pResult->GetCluster();
+  assert(pCluster);
+  assert(pCluster->GetIndex() >= 0);
+
+  if (time_ns <= pResult->GetBlock()->GetTime(pCluster)) return 0;
+
+  Cluster** const clusters = m_pSegment->m_clusters;
+  assert(clusters);
+
+  const long count = m_pSegment->GetCount();  // loaded only, not preloaded
+  assert(count > 0);
+
+  Cluster** const i = clusters + pCluster->GetIndex();
+  assert(i);
+  assert(*i == pCluster);
+  assert(pCluster->GetTime() <= time_ns);
+
+  Cluster** const j = clusters + count;
+
+  Cluster** lo = i;
+  Cluster** hi = j;
+
+  while (lo < hi) {
+    // INVARIANT:
+    //[i, lo) <= time_ns
+    //[lo, hi) ?
+    //[hi, j)  > time_ns
+
+    Cluster** const mid = lo + (hi - lo) / 2;
+    assert(mid < hi);
+
+    pCluster = *mid;
+    assert(pCluster);
+    assert(pCluster->GetIndex() >= 0);
+    assert(pCluster->GetIndex() == long(mid - m_pSegment->m_clusters));
+
+    const long long t = pCluster->GetTime();
+
+    if (t <= time_ns)
+      lo = mid + 1;
+    else
+      hi = mid;
+
+    assert(lo <= hi);
+  }
+
+  assert(lo == hi);
+  assert(lo > i);
+  assert(lo <= j);
+
+  while (lo > i) {
+    pCluster = *--lo;
+    assert(pCluster);
+    assert(pCluster->GetTime() <= time_ns);
+
+    pResult = pCluster->GetEntry(this);
+
+    if ((pResult != 0) && !pResult->EOS()) return 0;
+
+    // landed on empty cluster (no entries)
+  }
+
+  pResult = GetEOS();  // weird
+  return 0;
+}
+
+const ContentEncoding* Track::GetContentEncodingByIndex(unsigned long idx) const {
+  const ptrdiff_t count = content_encoding_entries_end_ - content_encoding_entries_;
+  assert(count >= 0);
+
+  if (idx >= static_cast<unsigned long>(count)) return NULL;
+
+  return content_encoding_entries_[idx];
+}
+
+unsigned long Track::GetContentEncodingCount() const {
+  const ptrdiff_t count = content_encoding_entries_end_ - content_encoding_entries_;
+  assert(count >= 0);
+
+  return static_cast<unsigned long>(count);
+}
+
+long Track::ParseContentEncodingsEntry(long long start, long long size) {
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+  assert(pReader);
+
+  long long pos = start;
+  const long long stop = start + size;
+
+  // Count ContentEncoding elements.
+  int count = 0;
+  while (pos < stop) {
+    long long id, size;
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+    if (status < 0)  // error
+      return status;
+
+    // pos now designates start of element
+    if (id == libwebm::kMkvContentEncoding) ++count;
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (count <= 0) return -1;
+
+  content_encoding_entries_ = new (std::nothrow) ContentEncoding*[count];
+  if (!content_encoding_entries_) return -1;
+
+  content_encoding_entries_end_ = content_encoding_entries_;
+
+  pos = start;
+  while (pos < stop) {
+    long long id, size;
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+    if (status < 0)  // error
+      return status;
+
+    // pos now designates start of element
+    if (id == libwebm::kMkvContentEncoding) {
+      ContentEncoding* const content_encoding = new (std::nothrow) ContentEncoding();
+      if (!content_encoding) return -1;
+
+      status = content_encoding->ParseContentEncodingEntry(pos, size, pReader);
+      if (status) {
+        delete content_encoding;
+        return status;
+      }
+
+      *content_encoding_entries_end_++ = content_encoding;
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  return 0;
+}
+
+Track::EOSBlock::EOSBlock() : BlockEntry(NULL, LONG_MIN) {}
+
+BlockEntry::Kind Track::EOSBlock::GetKind() const { return kBlockEOS; }
+
+const Block* Track::EOSBlock::GetBlock() const { return NULL; }
+
+bool PrimaryChromaticity::Parse(IMkvReader* reader, long long read_pos, long long value_size, bool is_x,
+                                PrimaryChromaticity** chromaticity) {
+  if (!reader) return false;
+
+  if (!*chromaticity) *chromaticity = new PrimaryChromaticity();
+
+  if (!*chromaticity) return false;
+
+  PrimaryChromaticity* pc = *chromaticity;
+  float* value = is_x ? &pc->x : &pc->y;
+
+  double parser_value = 0;
+  const long long parse_status = UnserializeFloat(reader, read_pos, value_size, parser_value);
+
+  // Valid range is [0, 1]. Make sure the double is representable as a float
+  // before casting.
+  if (parse_status < 0 || parser_value < 0.0 || parser_value > 1.0 || (parser_value > 0.0 && parser_value < FLT_MIN))
+    return false;
+
+  *value = static_cast<float>(parser_value);
+
+  return true;
+}
+
+bool MasteringMetadata::Parse(IMkvReader* reader, long long mm_start, long long mm_size, MasteringMetadata** mm) {
+  if (!reader || *mm) return false;
+
+  std::unique_ptr<MasteringMetadata> mm_ptr(new MasteringMetadata());
+  if (!mm_ptr.get()) return false;
+
+  const long long mm_end = mm_start + mm_size;
+  long long read_pos = mm_start;
+
+  while (read_pos < mm_end) {
+    long long child_id = 0;
+    long long child_size = 0;
+
+    const long long status = ParseElementHeader(reader, read_pos, mm_end, child_id, child_size);
+    if (status < 0) return false;
+
+    if (child_id == libwebm::kMkvLuminanceMax) {
+      double value = 0;
+      const long long value_parse_status = UnserializeFloat(reader, read_pos, child_size, value);
+      if (value < -FLT_MAX || value > FLT_MAX || (value > 0.0 && value < FLT_MIN)) {
+        return false;
+      }
+      mm_ptr->luminance_max = static_cast<float>(value);
+      if (value_parse_status < 0 || mm_ptr->luminance_max < 0.0 || mm_ptr->luminance_max > 9999.99) {
+        return false;
+      }
+    } else if (child_id == libwebm::kMkvLuminanceMin) {
+      double value = 0;
+      const long long value_parse_status = UnserializeFloat(reader, read_pos, child_size, value);
+      if (value < -FLT_MAX || value > FLT_MAX || (value > 0.0 && value < FLT_MIN)) {
+        return false;
+      }
+      mm_ptr->luminance_min = static_cast<float>(value);
+      if (value_parse_status < 0 || mm_ptr->luminance_min < 0.0 || mm_ptr->luminance_min > 999.9999) {
+        return false;
+      }
+    } else {
+      bool is_x = false;
+      PrimaryChromaticity** chromaticity;
+      switch (child_id) {
+        case libwebm::kMkvPrimaryRChromaticityX:
+        case libwebm::kMkvPrimaryRChromaticityY:
+          is_x = child_id == libwebm::kMkvPrimaryRChromaticityX;
+          chromaticity = &mm_ptr->r;
+          break;
+        case libwebm::kMkvPrimaryGChromaticityX:
+        case libwebm::kMkvPrimaryGChromaticityY:
+          is_x = child_id == libwebm::kMkvPrimaryGChromaticityX;
+          chromaticity = &mm_ptr->g;
+          break;
+        case libwebm::kMkvPrimaryBChromaticityX:
+        case libwebm::kMkvPrimaryBChromaticityY:
+          is_x = child_id == libwebm::kMkvPrimaryBChromaticityX;
+          chromaticity = &mm_ptr->b;
+          break;
+        case libwebm::kMkvWhitePointChromaticityX:
+        case libwebm::kMkvWhitePointChromaticityY:
+          is_x = child_id == libwebm::kMkvWhitePointChromaticityX;
+          chromaticity = &mm_ptr->white_point;
+          break;
+        default:
+          return false;
+      }
+      const bool value_parse_status = PrimaryChromaticity::Parse(reader, read_pos, child_size, is_x, chromaticity);
+      if (!value_parse_status) return false;
+    }
+
+    read_pos += child_size;
+    if (read_pos > mm_end) return false;
+  }
+
+  *mm = mm_ptr.release();
+  return true;
+}
+
+bool Colour::Parse(IMkvReader* reader, long long colour_start, long long colour_size, Colour** colour) {
+  if (!reader || *colour) return false;
+
+  std::unique_ptr<Colour> colour_ptr(new Colour());
+  if (!colour_ptr.get()) return false;
+
+  const long long colour_end = colour_start + colour_size;
+  long long read_pos = colour_start;
+
+  while (read_pos < colour_end) {
+    long long child_id = 0;
+    long long child_size = 0;
+
+    const long status = ParseElementHeader(reader, read_pos, colour_end, child_id, child_size);
+    if (status < 0) return false;
+
+    if (child_id == libwebm::kMkvMatrixCoefficients) {
+      colour_ptr->matrix_coefficients = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->matrix_coefficients < 0) return false;
+    } else if (child_id == libwebm::kMkvBitsPerChannel) {
+      colour_ptr->bits_per_channel = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->bits_per_channel < 0) return false;
+    } else if (child_id == libwebm::kMkvChromaSubsamplingHorz) {
+      colour_ptr->chroma_subsampling_horz = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->chroma_subsampling_horz < 0) return false;
+    } else if (child_id == libwebm::kMkvChromaSubsamplingVert) {
+      colour_ptr->chroma_subsampling_vert = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->chroma_subsampling_vert < 0) return false;
+    } else if (child_id == libwebm::kMkvCbSubsamplingHorz) {
+      colour_ptr->cb_subsampling_horz = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->cb_subsampling_horz < 0) return false;
+    } else if (child_id == libwebm::kMkvCbSubsamplingVert) {
+      colour_ptr->cb_subsampling_vert = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->cb_subsampling_vert < 0) return false;
+    } else if (child_id == libwebm::kMkvChromaSitingHorz) {
+      colour_ptr->chroma_siting_horz = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->chroma_siting_horz < 0) return false;
+    } else if (child_id == libwebm::kMkvChromaSitingVert) {
+      colour_ptr->chroma_siting_vert = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->chroma_siting_vert < 0) return false;
+    } else if (child_id == libwebm::kMkvRange) {
+      colour_ptr->range = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->range < 0) return false;
+    } else if (child_id == libwebm::kMkvTransferCharacteristics) {
+      colour_ptr->transfer_characteristics = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->transfer_characteristics < 0) return false;
+    } else if (child_id == libwebm::kMkvPrimaries) {
+      colour_ptr->primaries = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->primaries < 0) return false;
+    } else if (child_id == libwebm::kMkvMaxCLL) {
+      colour_ptr->max_cll = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->max_cll < 0) return false;
+    } else if (child_id == libwebm::kMkvMaxFALL) {
+      colour_ptr->max_fall = UnserializeUInt(reader, read_pos, child_size);
+      if (colour_ptr->max_fall < 0) return false;
+    } else if (child_id == libwebm::kMkvMasteringMetadata) {
+      if (!MasteringMetadata::Parse(reader, read_pos, child_size, &colour_ptr->mastering_metadata)) return false;
+    } else {
+      return false;
+    }
+
+    read_pos += child_size;
+    if (read_pos > colour_end) return false;
+  }
+  *colour = colour_ptr.release();
+  return true;
+}
+
+bool Projection::Parse(IMkvReader* reader, long long start, long long size, Projection** projection) {
+  if (!reader || *projection) return false;
+
+  std::unique_ptr<Projection> projection_ptr(new Projection());
+  if (!projection_ptr.get()) return false;
+
+  const long long end = start + size;
+  long long read_pos = start;
+
+  while (read_pos < end) {
+    long long child_id = 0;
+    long long child_size = 0;
+
+    const long long status = ParseElementHeader(reader, read_pos, end, child_id, child_size);
+    if (status < 0) return false;
+
+    if (child_id == libwebm::kMkvProjectionType) {
+      long long projection_type = kTypeNotPresent;
+      projection_type = UnserializeUInt(reader, read_pos, child_size);
+      if (projection_type < 0) return false;
+
+      projection_ptr->type = static_cast<ProjectionType>(projection_type);
+    } else if (child_id == libwebm::kMkvProjectionPrivate) {
+      unsigned char* data = SafeArrayAlloc<unsigned char>(1, child_size);
+
+      if (data == NULL) return false;
+
+      const int status = reader->Read(read_pos, static_cast<long>(child_size), data);
+
+      if (status) {
+        delete[] data;
+        return false;
+      }
+
+      projection_ptr->private_data = data;
+      projection_ptr->private_data_length = static_cast<size_t>(child_size);
+    } else {
+      double value = 0;
+      const long long value_parse_status = UnserializeFloat(reader, read_pos, child_size, value);
+      // Make sure value is representable as a float before casting.
+      if (value_parse_status < 0 || value < -FLT_MAX || value > FLT_MAX || (value > 0.0 && value < FLT_MIN)) {
+        return false;
+      }
+
+      switch (child_id) {
+        case libwebm::kMkvProjectionPoseYaw:
+          projection_ptr->pose_yaw = static_cast<float>(value);
+          break;
+        case libwebm::kMkvProjectionPosePitch:
+          projection_ptr->pose_pitch = static_cast<float>(value);
+          break;
+        case libwebm::kMkvProjectionPoseRoll:
+          projection_ptr->pose_roll = static_cast<float>(value);
+          break;
+        default:
+          return false;
+      }
+    }
+
+    read_pos += child_size;
+    if (read_pos > end) return false;
+  }
+
+  *projection = projection_ptr.release();
+  return true;
+}
+
+VideoTrack::VideoTrack(Segment* pSegment, long long element_start, long long element_size)
+    : Track(pSegment, element_start, element_size), m_colour_space(NULL), m_colour(NULL), m_projection(NULL) {}
+
+VideoTrack::~VideoTrack() {
+  delete m_colour;
+  delete m_projection;
+}
+
+long VideoTrack::Parse(Segment* pSegment, const Info& info, long long element_start, long long element_size,
+                       VideoTrack*& pResult) {
+  if (pResult) return -1;
+
+  if (info.type != Track::kVideo) return -1;
+
+  long long width = 0;
+  long long height = 0;
+  long long display_width = 0;
+  long long display_height = 0;
+  long long display_unit = 0;
+  long long stereo_mode = 0;
+
+  double rate = 0.0;
+  char* colour_space = NULL;
+
+  IMkvReader* const pReader = pSegment->m_pReader;
+
+  const Settings& s = info.settings;
+  assert(s.start >= 0);
+  assert(s.size >= 0);
+
+  long long pos = s.start;
+  assert(pos >= 0);
+
+  const long long stop = pos + s.size;
+
+  Colour* colour = NULL;
+  std::unique_ptr<Projection> projection_ptr;
+
+  while (pos < stop) {
+    long long id, size;
+
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvPixelWidth) {
+      width = UnserializeUInt(pReader, pos, size);
+
+      if (width <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvPixelHeight) {
+      height = UnserializeUInt(pReader, pos, size);
+
+      if (height <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvDisplayWidth) {
+      display_width = UnserializeUInt(pReader, pos, size);
+
+      if (display_width <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvDisplayHeight) {
+      display_height = UnserializeUInt(pReader, pos, size);
+
+      if (display_height <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvDisplayUnit) {
+      display_unit = UnserializeUInt(pReader, pos, size);
+
+      if (display_unit < 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvStereoMode) {
+      stereo_mode = UnserializeUInt(pReader, pos, size);
+
+      if (stereo_mode < 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvFrameRate) {
+      const long status = UnserializeFloat(pReader, pos, size, rate);
+
+      if (status < 0) return status;
+
+      if (rate <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvColour) {
+      if (!Colour::Parse(pReader, pos, size, &colour)) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvProjection) {
+      Projection* projection = NULL;
+      if (!Projection::Parse(pReader, pos, size, &projection)) {
+        return E_FILE_FORMAT_INVALID;
+      } else {
+        projection_ptr.reset(projection);
+      }
+    } else if (id == libwebm::kMkvColourSpace) {
+      const long status = UnserializeString(pReader, pos, size, colour_space);
+      if (status < 0) return status;
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  VideoTrack* const pTrack = new (std::nothrow) VideoTrack(pSegment, element_start, element_size);
+
+  if (pTrack == NULL) return -1;  // generic error
+
+  const int status = info.Copy(pTrack->m_info);
+
+  if (status) {  // error
+    delete pTrack;
+    return status;
+  }
+
+  pTrack->m_width = width;
+  pTrack->m_height = height;
+  pTrack->m_display_width = display_width;
+  pTrack->m_display_height = display_height;
+  pTrack->m_display_unit = display_unit;
+  pTrack->m_stereo_mode = stereo_mode;
+  pTrack->m_rate = rate;
+  pTrack->m_colour = colour;
+  pTrack->m_colour_space = colour_space;
+  pTrack->m_projection = projection_ptr.release();
+
+  pResult = pTrack;
+  return 0;  // success
+}
+
+bool VideoTrack::VetEntry(const BlockEntry* pBlockEntry) const {
+  return Track::VetEntry(pBlockEntry) && pBlockEntry->GetBlock()->IsKey();
+}
+
+long VideoTrack::Seek(long long time_ns, const BlockEntry*& pResult) const {
+  const long status = GetFirst(pResult);
+
+  if (status < 0)  // buffer underflow, etc
+    return status;
+
+  assert(pResult);
+
+  if (pResult->EOS()) return 0;
+
+  const Cluster* pCluster = pResult->GetCluster();
+  assert(pCluster);
+  assert(pCluster->GetIndex() >= 0);
+
+  if (time_ns <= pResult->GetBlock()->GetTime(pCluster)) return 0;
+
+  Cluster** const clusters = m_pSegment->m_clusters;
+  assert(clusters);
+
+  const long count = m_pSegment->GetCount();  // loaded only, not pre-loaded
+  assert(count > 0);
+
+  Cluster** const i = clusters + pCluster->GetIndex();
+  assert(i);
+  assert(*i == pCluster);
+  assert(pCluster->GetTime() <= time_ns);
+
+  Cluster** const j = clusters + count;
+
+  Cluster** lo = i;
+  Cluster** hi = j;
+
+  while (lo < hi) {
+    // INVARIANT:
+    //[i, lo) <= time_ns
+    //[lo, hi) ?
+    //[hi, j)  > time_ns
+
+    Cluster** const mid = lo + (hi - lo) / 2;
+    assert(mid < hi);
+
+    pCluster = *mid;
+    assert(pCluster);
+    assert(pCluster->GetIndex() >= 0);
+    assert(pCluster->GetIndex() == long(mid - m_pSegment->m_clusters));
+
+    const long long t = pCluster->GetTime();
+
+    if (t <= time_ns)
+      lo = mid + 1;
+    else
+      hi = mid;
+
+    assert(lo <= hi);
+  }
+
+  assert(lo == hi);
+  assert(lo > i);
+  assert(lo <= j);
+
+  pCluster = *--lo;
+  assert(pCluster);
+  assert(pCluster->GetTime() <= time_ns);
+
+  pResult = pCluster->GetEntry(this, time_ns);
+
+  if ((pResult != 0) && !pResult->EOS())  // found a keyframe
+    return 0;
+
+  while (lo != i) {
+    pCluster = *--lo;
+    assert(pCluster);
+    assert(pCluster->GetTime() <= time_ns);
+
+    pResult = pCluster->GetEntry(this, time_ns);
+
+    if ((pResult != 0) && !pResult->EOS()) return 0;
+  }
+
+  // weird: we're on the first cluster, but no keyframe found
+  // should never happen but we must return something anyway
+
+  pResult = GetEOS();
+  return 0;
+}
+
+Colour* VideoTrack::GetColour() const { return m_colour; }
+
+Projection* VideoTrack::GetProjection() const { return m_projection; }
+
+long long VideoTrack::GetWidth() const { return m_width; }
+
+long long VideoTrack::GetHeight() const { return m_height; }
+
+long long VideoTrack::GetDisplayWidth() const { return m_display_width > 0 ? m_display_width : GetWidth(); }
+
+long long VideoTrack::GetDisplayHeight() const { return m_display_height > 0 ? m_display_height : GetHeight(); }
+
+long long VideoTrack::GetDisplayUnit() const { return m_display_unit; }
+
+long long VideoTrack::GetStereoMode() const { return m_stereo_mode; }
+
+double VideoTrack::GetFrameRate() const { return m_rate; }
+
+AudioTrack::AudioTrack(Segment* pSegment, long long element_start, long long element_size)
+    : Track(pSegment, element_start, element_size) {}
+
+long AudioTrack::Parse(Segment* pSegment, const Info& info, long long element_start, long long element_size,
+                       AudioTrack*& pResult) {
+  if (pResult) return -1;
+
+  if (info.type != Track::kAudio) return -1;
+
+  IMkvReader* const pReader = pSegment->m_pReader;
+
+  const Settings& s = info.settings;
+  assert(s.start >= 0);
+  assert(s.size >= 0);
+
+  long long pos = s.start;
+  assert(pos >= 0);
+
+  const long long stop = pos + s.size;
+
+  double rate = 8000.0;  // MKV default
+  long long channels = 1;
+  long long bit_depth = 0;
+
+  while (pos < stop) {
+    long long id, size;
+
+    long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (id == libwebm::kMkvSamplingFrequency) {
+      status = UnserializeFloat(pReader, pos, size, rate);
+
+      if (status < 0) return status;
+
+      if (rate <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvChannels) {
+      channels = UnserializeUInt(pReader, pos, size);
+
+      if (channels <= 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvBitDepth) {
+      bit_depth = UnserializeUInt(pReader, pos, size);
+
+      if (bit_depth <= 0) return E_FILE_FORMAT_INVALID;
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  AudioTrack* const pTrack = new (std::nothrow) AudioTrack(pSegment, element_start, element_size);
+
+  if (pTrack == NULL) return -1;  // generic error
+
+  const int status = info.Copy(pTrack->m_info);
+
+  if (status) {
+    delete pTrack;
+    return status;
+  }
+
+  pTrack->m_rate = rate;
+  pTrack->m_channels = channels;
+  pTrack->m_bitDepth = bit_depth;
+
+  pResult = pTrack;
+  return 0;  // success
+}
+
+double AudioTrack::GetSamplingRate() const { return m_rate; }
+
+long long AudioTrack::GetChannels() const { return m_channels; }
+
+long long AudioTrack::GetBitDepth() const { return m_bitDepth; }
+
+Tracks::Tracks(Segment* pSegment, long long start, long long size_, long long element_start, long long element_size)
+    : m_pSegment(pSegment),
+      m_start(start),
+      m_size(size_),
+      m_element_start(element_start),
+      m_element_size(element_size),
+      m_trackEntries(NULL),
+      m_trackEntriesEnd(NULL) {}
+
+long Tracks::Parse() {
+  assert(m_trackEntries == NULL);
+  assert(m_trackEntriesEnd == NULL);
+
+  const long long stop = m_start + m_size;
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  int count = 0;
+  long long pos = m_start;
+
+  while (pos < stop) {
+    long long id, size;
+
+    const long status = ParseElementHeader(pReader, pos, stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (size == 0)  // weird
+      continue;
+
+    if (id == libwebm::kMkvTrackEntry) ++count;
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  if (count <= 0) return 0;  // success
+
+  m_trackEntries = new (std::nothrow) Track*[count];
+
+  if (m_trackEntries == NULL) return -1;
+
+  m_trackEntriesEnd = m_trackEntries;
+
+  pos = m_start;
+
+  while (pos < stop) {
+    const long long element_start = pos;
+
+    long long id, payload_size;
+
+    const long status = ParseElementHeader(pReader, pos, stop, id, payload_size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (payload_size == 0)  // weird
+      continue;
+
+    const long long payload_stop = pos + payload_size;
+    assert(payload_stop <= stop);  // checked in ParseElement
+
+    const long long element_size = payload_stop - element_start;
+
+    if (id == libwebm::kMkvTrackEntry) {
+      Track*& pTrack = *m_trackEntriesEnd;
+      pTrack = NULL;
+
+      const long status = ParseTrackEntry(pos, payload_size, element_start, element_size, pTrack);
+      if (status) return status;
+
+      if (pTrack) ++m_trackEntriesEnd;
+    }
+
+    pos = payload_stop;
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  return 0;  // success
+}
+
+unsigned long Tracks::GetTracksCount() const {
+  const ptrdiff_t result = m_trackEntriesEnd - m_trackEntries;
+  assert(result >= 0);
+
+  return static_cast<unsigned long>(result);
+}
+
+long Tracks::ParseTrackEntry(long long track_start, long long track_size, long long element_start,
+                             long long element_size, Track*& pResult) const {
+  if (pResult) return -1;
+
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  long long pos = track_start;
+  const long long track_stop = track_start + track_size;
+
+  Track::Info info;
+
+  info.type = 0;
+  info.number = 0;
+  info.uid = 0;
+  info.defaultDuration = 0;
+
+  Track::Settings v;
+  v.start = -1;
+  v.size = -1;
+
+  Track::Settings a;
+  a.start = -1;
+  a.size = -1;
+
+  Track::Settings e;  // content_encodings_settings;
+  e.start = -1;
+  e.size = -1;
+
+  long long lacing = 1;  // default is true
+
+  while (pos < track_stop) {
+    long long id, size;
+
+    const long status = ParseElementHeader(pReader, pos, track_stop, id, size);
+
+    if (status < 0)  // error
+      return status;
+
+    if (size < 0) return E_FILE_FORMAT_INVALID;
+
+    const long long start = pos;
+
+    if (id == libwebm::kMkvVideo) {
+      v.start = start;
+      v.size = size;
+    } else if (id == libwebm::kMkvAudio) {
+      a.start = start;
+      a.size = size;
+    } else if (id == libwebm::kMkvContentEncodings) {
+      e.start = start;
+      e.size = size;
+    } else if (id == libwebm::kMkvTrackUID) {
+      if (size > 8) return E_FILE_FORMAT_INVALID;
+
+      info.uid = 0;
+
+      long long pos_ = start;
+      const long long pos_end = start + size;
+
+      while (pos_ != pos_end) {
+        unsigned char b;
+
+        const int status = pReader->Read(pos_, 1, &b);
+
+        if (status) return status;
+
+        info.uid <<= 8;
+        info.uid |= b;
+
+        ++pos_;
+      }
+    } else if (id == libwebm::kMkvTrackNumber) {
+      const long long num = UnserializeUInt(pReader, pos, size);
+
+      if ((num <= 0) || (num > 127)) return E_FILE_FORMAT_INVALID;
+
+      info.number = static_cast<long>(num);
+    } else if (id == libwebm::kMkvTrackType) {
+      const long long type = UnserializeUInt(pReader, pos, size);
+
+      if ((type <= 0) || (type > 254)) return E_FILE_FORMAT_INVALID;
+
+      info.type = static_cast<long>(type);
+    } else if (id == libwebm::kMkvName) {
+      const long status = UnserializeString(pReader, pos, size, info.nameAsUTF8);
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvLanguage) {
+      const long status = UnserializeString(pReader, pos, size, info.language);
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvDefaultDuration) {
+      const long long duration = UnserializeUInt(pReader, pos, size);
+
+      if (duration < 0) return E_FILE_FORMAT_INVALID;
+
+      info.defaultDuration = static_cast<unsigned long long>(duration);
+    } else if (id == libwebm::kMkvCodecID) {
+      const long status = UnserializeString(pReader, pos, size, info.codecId);
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvFlagLacing) {
+      lacing = UnserializeUInt(pReader, pos, size);
+
+      if ((lacing < 0) || (lacing > 1)) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvCodecPrivate) {
+      delete[] info.codecPrivate;
+      info.codecPrivate = NULL;
+      info.codecPrivateSize = 0;
+
+      const size_t buflen = static_cast<size_t>(size);
+
+      if (buflen) {
+        unsigned char* buf = SafeArrayAlloc<unsigned char>(1, buflen);
+
+        if (buf == NULL) return -1;
+
+        const int status = pReader->Read(pos, static_cast<long>(buflen), buf);
+
+        if (status) {
+          delete[] buf;
+          return status;
+        }
+
+        info.codecPrivate = buf;
+        info.codecPrivateSize = buflen;
+      }
+    } else if (id == libwebm::kMkvCodecName) {
+      const long status = UnserializeString(pReader, pos, size, info.codecNameAsUTF8);
+
+      if (status) return status;
+    } else if (id == libwebm::kMkvCodecDelay) {
+      info.codecDelay = UnserializeUInt(pReader, pos, size);
+    } else if (id == libwebm::kMkvSeekPreRoll) {
+      info.seekPreRoll = UnserializeUInt(pReader, pos, size);
+    }
+
+    pos += size;  // consume payload
+    if (pos > track_stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != track_stop) return E_FILE_FORMAT_INVALID;
+
+  if (info.number <= 0)  // not specified
+    return E_FILE_FORMAT_INVALID;
+
+  if (GetTrackByNumber(info.number)) return E_FILE_FORMAT_INVALID;
+
+  if (info.type <= 0)  // not specified
+    return E_FILE_FORMAT_INVALID;
+
+  info.lacing = (lacing > 0) ? true : false;
+
+  if (info.type == Track::kVideo) {
+    if (v.start < 0) return E_FILE_FORMAT_INVALID;
+
+    if (a.start >= 0) return E_FILE_FORMAT_INVALID;
+
+    info.settings = v;
+
+    VideoTrack* pTrack = NULL;
+
+    const long status = VideoTrack::Parse(m_pSegment, info, element_start, element_size, pTrack);
+
+    if (status) return status;
+
+    pResult = pTrack;
+    assert(pResult);
+
+    if (e.start >= 0) pResult->ParseContentEncodingsEntry(e.start, e.size);
+  } else if (info.type == Track::kAudio) {
+    if (a.start < 0) return E_FILE_FORMAT_INVALID;
+
+    if (v.start >= 0) return E_FILE_FORMAT_INVALID;
+
+    info.settings = a;
+
+    AudioTrack* pTrack = NULL;
+
+    const long status = AudioTrack::Parse(m_pSegment, info, element_start, element_size, pTrack);
+
+    if (status) return status;
+
+    pResult = pTrack;
+    assert(pResult);
+
+    if (e.start >= 0) pResult->ParseContentEncodingsEntry(e.start, e.size);
+  } else {
+    // neither video nor audio - probably metadata or subtitles
+
+    if (a.start >= 0) return E_FILE_FORMAT_INVALID;
+
+    if (v.start >= 0) return E_FILE_FORMAT_INVALID;
+
+    if (info.type == Track::kMetadata && e.start >= 0) return E_FILE_FORMAT_INVALID;
+
+    info.settings.start = -1;
+    info.settings.size = 0;
+
+    Track* pTrack = NULL;
+
+    const long status = Track::Create(m_pSegment, info, element_start, element_size, pTrack);
+
+    if (status) return status;
+
+    pResult = pTrack;
+    assert(pResult);
+  }
+
+  return 0;  // success
+}
+
+Tracks::~Tracks() {
+  Track** i = m_trackEntries;
+  Track** const j = m_trackEntriesEnd;
+
+  while (i != j) {
+    Track* const pTrack = *i++;
+    delete pTrack;
+  }
+
+  delete[] m_trackEntries;
+}
+
+const Track* Tracks::GetTrackByNumber(long tn) const {
+  if (tn < 0) return NULL;
+
+  Track** i = m_trackEntries;
+  Track** const j = m_trackEntriesEnd;
+
+  while (i != j) {
+    Track* const pTrack = *i++;
+
+    if (pTrack == NULL) continue;
+
+    if (tn == pTrack->GetNumber()) return pTrack;
+  }
+
+  return NULL;  // not found
+}
+
+const Track* Tracks::GetTrackByIndex(unsigned long idx) const {
+  const ptrdiff_t count = m_trackEntriesEnd - m_trackEntries;
+
+  if (idx >= static_cast<unsigned long>(count)) return NULL;
+
+  return m_trackEntries[idx];
+}
+
+long Cluster::Load(long long& pos, long& len) const {
+  if (m_pSegment == NULL) return E_PARSE_FAILED;
+
+  if (m_timecode >= 0)  // at least partially loaded
+    return 0;
+
+  if (m_pos != m_element_start || m_element_size >= 0) return E_PARSE_FAILED;
+
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+  long long total, avail;
+  const int status = pReader->Length(&total, &avail);
+
+  if (status < 0)  // error
+    return status;
+
+  if (total >= 0 && (avail > total || m_pos > total)) return E_FILE_FORMAT_INVALID;
+
+  pos = m_pos;
+
+  long long cluster_size = -1;
+
+  if ((pos + 1) > avail) {
+    len = 1;
+    return E_BUFFER_NOT_FULL;
+  }
+
+  long long result = GetUIntLength(pReader, pos, len);
+
+  if (result < 0)  // error or underflow
+    return static_cast<long>(result);
+
+  if (result > 0) return E_BUFFER_NOT_FULL;
+
+  if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+  const long long id_ = ReadID(pReader, pos, len);
+
+  if (id_ < 0)  // error
+    return static_cast<long>(id_);
+
+  if (id_ != libwebm::kMkvCluster) return E_FILE_FORMAT_INVALID;
+
+  pos += len;  // consume id
+
+  // read cluster size
+
+  if ((pos + 1) > avail) {
+    len = 1;
+    return E_BUFFER_NOT_FULL;
+  }
+
+  result = GetUIntLength(pReader, pos, len);
+
+  if (result < 0)  // error
+    return static_cast<long>(result);
+
+  if (result > 0) return E_BUFFER_NOT_FULL;
+
+  if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+  const long long size = ReadUInt(pReader, pos, len);
+
+  if (size < 0)  // error
+    return static_cast<long>(cluster_size);
+
+  if (size == 0) return E_FILE_FORMAT_INVALID;
+
+  pos += len;  // consume length of size of element
+
+  const long long unknown_size = (1LL << (7 * len)) - 1;
+
+  if (size != unknown_size) cluster_size = size;
+
+  // pos points to start of payload
+  long long timecode = -1;
+  long long new_pos = -1;
+  bool bBlock = false;
+
+  long long cluster_stop = (cluster_size < 0) ? -1 : pos + cluster_size;
+
+  for (;;) {
+    if ((cluster_stop >= 0) && (pos >= cluster_stop)) break;
+
+    // Parse ID
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    long long result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0) return E_BUFFER_NOT_FULL;
+
+    if ((cluster_stop >= 0) && ((pos + len) > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long id = ReadID(pReader, pos, len);
+
+    if (id < 0)  // error
+      return static_cast<long>(id);
+
+    if (id == 0) return E_FILE_FORMAT_INVALID;
+
+    // This is the distinguished set of ID's we use to determine
+    // that we have exhausted the sub-element's inside the cluster
+    // whose ID we parsed earlier.
+
+    if (id == libwebm::kMkvCluster) break;
+
+    if (id == libwebm::kMkvCues) break;
+
+    pos += len;  // consume ID field
+
+    // Parse Size
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0) return E_BUFFER_NOT_FULL;
+
+    if ((cluster_stop >= 0) && ((pos + len) > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long size = ReadUInt(pReader, pos, len);
+
+    if (size < 0)  // error
+      return static_cast<long>(size);
+
+    const long long unknown_size = (1LL << (7 * len)) - 1;
+
+    if (size == unknown_size) return E_FILE_FORMAT_INVALID;
+
+    pos += len;  // consume size field
+
+    if ((cluster_stop >= 0) && (pos > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    // pos now points to start of payload
+
+    if (size == 0) continue;
+
+    if ((cluster_stop >= 0) && ((pos + size) > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    if (id == libwebm::kMkvTimecode) {
+      len = static_cast<long>(size);
+
+      if ((pos + size) > avail) return E_BUFFER_NOT_FULL;
+
+      timecode = UnserializeUInt(pReader, pos, size);
+
+      if (timecode < 0)  // error (or underflow)
+        return static_cast<long>(timecode);
+
+      new_pos = pos + size;
+
+      if (bBlock) break;
+    } else if (id == libwebm::kMkvBlockGroup) {
+      bBlock = true;
+      break;
+    } else if (id == libwebm::kMkvSimpleBlock) {
+      bBlock = true;
+      break;
+    }
+
+    pos += size;  // consume payload
+    if (cluster_stop >= 0 && pos > cluster_stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (cluster_stop >= 0 && pos > cluster_stop) return E_FILE_FORMAT_INVALID;
+
+  if (timecode < 0)  // no timecode found
+    return E_FILE_FORMAT_INVALID;
+
+  if (!bBlock) return E_FILE_FORMAT_INVALID;
+
+  m_pos = new_pos;        // designates position just beyond timecode payload
+  m_timecode = timecode;  // m_timecode >= 0 means we're partially loaded
+
+  if (cluster_size >= 0) m_element_size = cluster_stop - m_element_start;
+
+  return 0;
+}
+
+long Cluster::Parse(long long& pos, long& len) const {
+  long status = Load(pos, len);
+
+  if (status < 0) return status;
+
+  if (m_pos < m_element_start || m_timecode < 0) return E_PARSE_FAILED;
+
+  const long long cluster_stop = (m_element_size < 0) ? -1 : m_element_start + m_element_size;
+
+  if ((cluster_stop >= 0) && (m_pos >= cluster_stop)) return 1;  // nothing else to do
+
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  long long total, avail;
+
+  status = pReader->Length(&total, &avail);
+
+  if (status < 0)  // error
+    return status;
+
+  if (total >= 0 && avail > total) return E_FILE_FORMAT_INVALID;
+
+  pos = m_pos;
+
+  for (;;) {
+    if ((cluster_stop >= 0) && (pos >= cluster_stop)) break;
+
+    if ((total >= 0) && (pos >= total)) {
+      if (m_element_size < 0) m_element_size = pos - m_element_start;
+
+      break;
+    }
+
+    // Parse ID
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    long long result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0) return E_BUFFER_NOT_FULL;
+
+    if ((cluster_stop >= 0) && ((pos + len) > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long id = ReadID(pReader, pos, len);
+
+    if (id < 0) return E_FILE_FORMAT_INVALID;
+
+    // This is the distinguished set of ID's we use to determine
+    // that we have exhausted the sub-element's inside the cluster
+    // whose ID we parsed earlier.
+
+    if ((id == libwebm::kMkvCluster) || (id == libwebm::kMkvCues)) {
+      if (m_element_size < 0) m_element_size = pos - m_element_start;
+
+      break;
+    }
+
+    pos += len;  // consume ID field
+
+    // Parse Size
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0) return E_BUFFER_NOT_FULL;
+
+    if ((cluster_stop >= 0) && ((pos + len) > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long size = ReadUInt(pReader, pos, len);
+
+    if (size < 0)  // error
+      return static_cast<long>(size);
+
+    const long long unknown_size = (1LL << (7 * len)) - 1;
+
+    if (size == unknown_size) return E_FILE_FORMAT_INVALID;
+
+    pos += len;  // consume size field
+
+    if ((cluster_stop >= 0) && (pos > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    // pos now points to start of payload
+
+    if (size == 0) continue;
+
+    // const long long block_start = pos;
+    const long long block_stop = pos + size;
+
+    if (cluster_stop >= 0) {
+      if (block_stop > cluster_stop) {
+        if (id == libwebm::kMkvBlockGroup || id == libwebm::kMkvSimpleBlock) {
+          return E_FILE_FORMAT_INVALID;
+        }
+
+        pos = cluster_stop;
+        break;
+      }
+    } else if ((total >= 0) && (block_stop > total)) {
+      m_element_size = total - m_element_start;
+      pos = total;
+      break;
+    } else if (block_stop > avail) {
+      len = static_cast<long>(size);
+      return E_BUFFER_NOT_FULL;
+    }
+
+    Cluster* const this_ = const_cast<Cluster*>(this);
+
+    if (id == libwebm::kMkvBlockGroup) return this_->ParseBlockGroup(size, pos, len);
+
+    if (id == libwebm::kMkvSimpleBlock) return this_->ParseSimpleBlock(size, pos, len);
+
+    pos += size;  // consume payload
+    if (cluster_stop >= 0 && pos > cluster_stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (m_element_size < 1) return E_FILE_FORMAT_INVALID;
+
+  m_pos = pos;
+  if (cluster_stop >= 0 && m_pos > cluster_stop) return E_FILE_FORMAT_INVALID;
+
+  if (m_entries_count > 0) {
+    const long idx = m_entries_count - 1;
+
+    const BlockEntry* const pLast = m_entries[idx];
+    if (pLast == NULL) return E_PARSE_FAILED;
+
+    const Block* const pBlock = pLast->GetBlock();
+    if (pBlock == NULL) return E_PARSE_FAILED;
+
+    const long long start = pBlock->m_start;
+
+    if ((total >= 0) && (start > total)) return E_PARSE_FAILED;  // defend against trucated stream
+
+    const long long size = pBlock->m_size;
+
+    const long long stop = start + size;
+    if (cluster_stop >= 0 && stop > cluster_stop) return E_FILE_FORMAT_INVALID;
+
+    if ((total >= 0) && (stop > total)) return E_PARSE_FAILED;  // defend against trucated stream
+  }
+
+  return 1;  // no more entries
+}
+
+long Cluster::ParseSimpleBlock(long long block_size, long long& pos, long& len) {
+  const long long block_start = pos;
+  const long long block_stop = pos + block_size;
+
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  long long total, avail;
+
+  long status = pReader->Length(&total, &avail);
+
+  if (status < 0)  // error
+    return status;
+
+  assert((total < 0) || (avail <= total));
+
+  // parse track number
+
+  if ((pos + 1) > avail) {
+    len = 1;
+    return E_BUFFER_NOT_FULL;
+  }
+
+  long long result = GetUIntLength(pReader, pos, len);
+
+  if (result < 0)  // error
+    return static_cast<long>(result);
+
+  if (result > 0)  // weird
+    return E_BUFFER_NOT_FULL;
+
+  if ((pos + len) > block_stop) return E_FILE_FORMAT_INVALID;
+
+  if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+  const long long track = ReadUInt(pReader, pos, len);
+
+  if (track < 0)  // error
+    return static_cast<long>(track);
+
+  if (track == 0) return E_FILE_FORMAT_INVALID;
+
+  pos += len;  // consume track number
+
+  if ((pos + 2) > block_stop) return E_FILE_FORMAT_INVALID;
+
+  if ((pos + 2) > avail) {
+    len = 2;
+    return E_BUFFER_NOT_FULL;
+  }
+
+  pos += 2;  // consume timecode
+
+  if ((pos + 1) > block_stop) return E_FILE_FORMAT_INVALID;
+
+  if ((pos + 1) > avail) {
+    len = 1;
+    return E_BUFFER_NOT_FULL;
+  }
+
+  unsigned char flags;
+
+  status = pReader->Read(pos, 1, &flags);
+
+  if (status < 0) {  // error or underflow
+    len = 1;
+    return status;
+  }
+
+  ++pos;  // consume flags byte
+  assert(pos <= avail);
+
+  if (pos >= block_stop) return E_FILE_FORMAT_INVALID;
+
+  const int lacing = int(flags & 0x06) >> 1;
+
+  if ((lacing != 0) && (block_stop > avail)) {
+    len = static_cast<long>(block_stop - pos);
+    return E_BUFFER_NOT_FULL;
+  }
+
+  status = CreateBlock(libwebm::kMkvSimpleBlock, block_start, block_size,
+                       0);  // DiscardPadding
+
+  if (status != 0) return status;
+
+  m_pos = block_stop;
+
+  return 0;  // success
+}
+
+long Cluster::ParseBlockGroup(long long payload_size, long long& pos, long& len) {
+  const long long payload_start = pos;
+  const long long payload_stop = pos + payload_size;
+
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  long long total, avail;
+
+  long status = pReader->Length(&total, &avail);
+
+  if (status < 0)  // error
+    return status;
+
+  assert((total < 0) || (avail <= total));
+
+  if ((total >= 0) && (payload_stop > total)) return E_FILE_FORMAT_INVALID;
+
+  if (payload_stop > avail) {
+    len = static_cast<long>(payload_size);
+    return E_BUFFER_NOT_FULL;
+  }
+
+  long long discard_padding = 0;
+
+  while (pos < payload_stop) {
+    // parse sub-block element ID
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    long long result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // weird
+      return E_BUFFER_NOT_FULL;
+
+    if ((pos + len) > payload_stop) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long id = ReadID(pReader, pos, len);
+
+    if (id < 0)  // error
+      return static_cast<long>(id);
+
+    if (id == 0)  // not a valid ID
+      return E_FILE_FORMAT_INVALID;
+
+    pos += len;  // consume ID field
+
+    // Parse Size
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // weird
+      return E_BUFFER_NOT_FULL;
+
+    if ((pos + len) > payload_stop) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long size = ReadUInt(pReader, pos, len);
+
+    if (size < 0)  // error
+      return static_cast<long>(size);
+
+    pos += len;  // consume size field
+
+    // pos now points to start of sub-block group payload
+
+    if (pos > payload_stop) return E_FILE_FORMAT_INVALID;
+
+    if (size == 0)  // weird
+      continue;
+
+    const long long unknown_size = (1LL << (7 * len)) - 1;
+
+    if (size == unknown_size) return E_FILE_FORMAT_INVALID;
+
+    if (id == libwebm::kMkvDiscardPadding) {
+      status = UnserializeInt(pReader, pos, size, discard_padding);
+
+      if (status < 0)  // error
+        return status;
+    }
+
+    if (id != libwebm::kMkvBlock) {
+      pos += size;  // consume sub-part of block group
+
+      if (pos > payload_stop) return E_FILE_FORMAT_INVALID;
+
+      continue;
+    }
+
+    const long long block_stop = pos + size;
+
+    if (block_stop > payload_stop) return E_FILE_FORMAT_INVALID;
+
+    // parse track number
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // weird
+      return E_BUFFER_NOT_FULL;
+
+    if ((pos + len) > block_stop) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long track = ReadUInt(pReader, pos, len);
+
+    if (track < 0)  // error
+      return static_cast<long>(track);
+
+    if (track == 0) return E_FILE_FORMAT_INVALID;
+
+    pos += len;  // consume track number
+
+    if ((pos + 2) > block_stop) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + 2) > avail) {
+      len = 2;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    pos += 2;  // consume timecode
+
+    if ((pos + 1) > block_stop) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    unsigned char flags;
+
+    status = pReader->Read(pos, 1, &flags);
+
+    if (status < 0) {  // error or underflow
+      len = 1;
+      return status;
+    }
+
+    ++pos;  // consume flags byte
+    assert(pos <= avail);
+
+    if (pos >= block_stop) return E_FILE_FORMAT_INVALID;
+
+    const int lacing = int(flags & 0x06) >> 1;
+
+    if ((lacing != 0) && (block_stop > avail)) {
+      len = static_cast<long>(block_stop - pos);
+      return E_BUFFER_NOT_FULL;
+    }
+
+    pos = block_stop;  // consume block-part of block group
+    if (pos > payload_stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  if (pos != payload_stop) return E_FILE_FORMAT_INVALID;
+
+  status = CreateBlock(libwebm::kMkvBlockGroup, payload_start, payload_size, discard_padding);
+  if (status != 0) return status;
+
+  m_pos = payload_stop;
+
+  return 0;  // success
+}
+
+long Cluster::GetEntry(long index, const mkvparser::BlockEntry*& pEntry) const {
+  assert(m_pos >= m_element_start);
+
+  pEntry = NULL;
+
+  if (index < 0) return -1;  // generic error
+
+  if (m_entries_count < 0) return E_BUFFER_NOT_FULL;
+
+  assert(m_entries);
+  assert(m_entries_size > 0);
+  assert(m_entries_count <= m_entries_size);
+
+  if (index < m_entries_count) {
+    pEntry = m_entries[index];
+    assert(pEntry);
+
+    return 1;  // found entry
+  }
+
+  if (m_element_size < 0)      // we don't know cluster end yet
+    return E_BUFFER_NOT_FULL;  // underflow
+
+  const long long element_stop = m_element_start + m_element_size;
+
+  if (m_pos >= element_stop) return 0;  // nothing left to parse
+
+  return E_BUFFER_NOT_FULL;  // underflow, since more remains to be parsed
+}
+
+Cluster* Cluster::Create(Segment* pSegment, long idx, long long off) {
+  if (!pSegment || off < 0) return NULL;
+
+  const long long element_start = pSegment->m_start + off;
+
+  Cluster* const pCluster = new (std::nothrow) Cluster(pSegment, idx, element_start);
+
+  return pCluster;
+}
+
+Cluster::Cluster()
+    : m_pSegment(NULL),
+      m_element_start(0),
+      m_index(0),
+      m_pos(0),
+      m_element_size(0),
+      m_timecode(0),
+      m_entries(NULL),
+      m_entries_size(0),
+      m_entries_count(0)  // means "no entries"
+{}
+
+Cluster::Cluster(Segment* pSegment, long idx, long long element_start
+                 /* long long element_size */)
+    : m_pSegment(pSegment),
+      m_element_start(element_start),
+      m_index(idx),
+      m_pos(element_start),
+      m_element_size(-1 /* element_size */),
+      m_timecode(-1),
+      m_entries(NULL),
+      m_entries_size(0),
+      m_entries_count(-1)  // means "has not been parsed yet"
+{}
+
+Cluster::~Cluster() {
+  if (m_entries_count <= 0) {
+    delete[] m_entries;
+    return;
+  }
+
+  BlockEntry** i = m_entries;
+  BlockEntry** const j = m_entries + m_entries_count;
+
+  while (i != j) {
+    BlockEntry* p = *i++;
+    assert(p);
+
+    delete p;
+  }
+
+  delete[] m_entries;
+}
+
+bool Cluster::EOS() const { return (m_pSegment == NULL); }
+
+long Cluster::GetIndex() const { return m_index; }
+
+long long Cluster::GetPosition() const {
+  const long long pos = m_element_start - m_pSegment->m_start;
+  assert(pos >= 0);
+
+  return pos;
+}
+
+long long Cluster::GetElementSize() const { return m_element_size; }
+
+long Cluster::HasBlockEntries(const Segment* pSegment,
+                              long long off,  // relative to start of segment payload
+                              long long& pos, long& len) {
+  assert(pSegment);
+  assert(off >= 0);  // relative to segment
+
+  IMkvReader* const pReader = pSegment->m_pReader;
+
+  long long total, avail;
+
+  long status = pReader->Length(&total, &avail);
+
+  if (status < 0)  // error
+    return status;
+
+  assert((total < 0) || (avail <= total));
+
+  pos = pSegment->m_start + off;  // absolute
+
+  if ((total >= 0) && (pos >= total)) return 0;  // we don't even have a complete cluster
+
+  const long long segment_stop = (pSegment->m_size < 0) ? -1 : pSegment->m_start + pSegment->m_size;
+
+  long long cluster_stop = -1;  // interpreted later to mean "unknown size"
+
+  {
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    long long result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // need more data
+      return E_BUFFER_NOT_FULL;
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((total >= 0) && ((pos + len) > total)) return 0;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long id = ReadID(pReader, pos, len);
+
+    if (id < 0)  // error
+      return static_cast<long>(id);
+
+    if (id != libwebm::kMkvCluster) return E_PARSE_FAILED;
+
+    pos += len;  // consume Cluster ID field
+
+    // read size field
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // weird
+      return E_BUFFER_NOT_FULL;
+
+    if ((segment_stop >= 0) && ((pos + len) > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((total >= 0) && ((pos + len) > total)) return 0;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long size = ReadUInt(pReader, pos, len);
+
+    if (size < 0)  // error
+      return static_cast<long>(size);
+
+    if (size == 0) return 0;  // cluster does not have entries
+
+    pos += len;  // consume size field
+
+    // pos now points to start of payload
+
+    const long long unknown_size = (1LL << (7 * len)) - 1;
+
+    if (size != unknown_size) {
+      cluster_stop = pos + size;
+      assert(cluster_stop >= 0);
+
+      if ((segment_stop >= 0) && (cluster_stop > segment_stop)) return E_FILE_FORMAT_INVALID;
+
+      if ((total >= 0) && (cluster_stop > total))
+        // return E_FILE_FORMAT_INVALID;  //too conservative
+        return 0;  // cluster does not have any entries
+    }
+  }
+
+  for (;;) {
+    if ((cluster_stop >= 0) && (pos >= cluster_stop)) return 0;  // no entries detected
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    long long result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // need more data
+      return E_BUFFER_NOT_FULL;
+
+    if ((cluster_stop >= 0) && ((pos + len) > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long id = ReadID(pReader, pos, len);
+
+    if (id < 0)  // error
+      return static_cast<long>(id);
+
+    // This is the distinguished set of ID's we use to determine
+    // that we have exhausted the sub-element's inside the cluster
+    // whose ID we parsed earlier.
+
+    if (id == libwebm::kMkvCluster) return 0;  // no entries found
+
+    if (id == libwebm::kMkvCues) return 0;  // no entries found
+
+    pos += len;  // consume id field
+
+    if ((cluster_stop >= 0) && (pos >= cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    // read size field
+
+    if ((pos + 1) > avail) {
+      len = 1;
+      return E_BUFFER_NOT_FULL;
+    }
+
+    result = GetUIntLength(pReader, pos, len);
+
+    if (result < 0)  // error
+      return static_cast<long>(result);
+
+    if (result > 0)  // underflow
+      return E_BUFFER_NOT_FULL;
+
+    if ((cluster_stop >= 0) && ((pos + len) > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > avail) return E_BUFFER_NOT_FULL;
+
+    const long long size = ReadUInt(pReader, pos, len);
+
+    if (size < 0)  // error
+      return static_cast<long>(size);
+
+    pos += len;  // consume size field
+
+    // pos now points to start of payload
+
+    if ((cluster_stop >= 0) && (pos > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    if (size == 0)  // weird
+      continue;
+
+    const long long unknown_size = (1LL << (7 * len)) - 1;
+
+    if (size == unknown_size) return E_FILE_FORMAT_INVALID;  // not supported inside cluster
+
+    if ((cluster_stop >= 0) && ((pos + size) > cluster_stop)) return E_FILE_FORMAT_INVALID;
+
+    if (id == libwebm::kMkvBlockGroup) return 1;  // have at least one entry
+
+    if (id == libwebm::kMkvSimpleBlock) return 1;  // have at least one entry
+
+    pos += size;  // consume payload
+    if (cluster_stop >= 0 && pos > cluster_stop) return E_FILE_FORMAT_INVALID;
+  }
+}
+
+long long Cluster::GetTimeCode() const {
+  long long pos;
+  long len;
+
+  const long status = Load(pos, len);
+
+  if (status < 0)  // error
+    return status;
+
+  return m_timecode;
+}
+
+long long Cluster::GetTime() const {
+  const long long tc = GetTimeCode();
+
+  if (tc < 0) return tc;
+
+  const SegmentInfo* const pInfo = m_pSegment->GetInfo();
+  assert(pInfo);
+
+  const long long scale = pInfo->GetTimeCodeScale();
+  assert(scale >= 1);
+
+  const long long t = m_timecode * scale;
+
+  return t;
+}
+
+long long Cluster::GetFirstTime() const {
+  const BlockEntry* pEntry;
+
+  const long status = GetFirst(pEntry);
+
+  if (status < 0)  // error
+    return status;
+
+  if (pEntry == NULL)  // empty cluster
+    return GetTime();
+
+  const Block* const pBlock = pEntry->GetBlock();
+  assert(pBlock);
+
+  return pBlock->GetTime(this);
+}
+
+long long Cluster::GetLastTime() const {
+  const BlockEntry* pEntry;
+
+  const long status = GetLast(pEntry);
+
+  if (status < 0)  // error
+    return status;
+
+  if (pEntry == NULL)  // empty cluster
+    return GetTime();
+
+  const Block* const pBlock = pEntry->GetBlock();
+  assert(pBlock);
+
+  return pBlock->GetTime(this);
+}
+
+long Cluster::CreateBlock(long long id,
+                          long long pos,  // absolute pos of payload
+                          long long size, long long discard_padding) {
+  if (id != libwebm::kMkvBlockGroup && id != libwebm::kMkvSimpleBlock) return E_PARSE_FAILED;
+
+  if (m_entries_count < 0) {  // haven't parsed anything yet
+    assert(m_entries == NULL);
+    assert(m_entries_size == 0);
+
+    m_entries_size = 1024;
+    m_entries = new (std::nothrow) BlockEntry*[m_entries_size];
+    if (m_entries == NULL) return -1;
+
+    m_entries_count = 0;
+  } else {
+    assert(m_entries);
+    assert(m_entries_size > 0);
+    assert(m_entries_count <= m_entries_size);
+
+    if (m_entries_count >= m_entries_size) {
+      const long entries_size = 2 * m_entries_size;
+
+      BlockEntry** const entries = new (std::nothrow) BlockEntry*[entries_size];
+      if (entries == NULL) return -1;
+
+      BlockEntry** src = m_entries;
+      BlockEntry** const src_end = src + m_entries_count;
+
+      BlockEntry** dst = entries;
+
+      while (src != src_end) *dst++ = *src++;
+
+      delete[] m_entries;
+
+      m_entries = entries;
+      m_entries_size = entries_size;
+    }
+  }
+
+  if (id == libwebm::kMkvBlockGroup)
+    return CreateBlockGroup(pos, size, discard_padding);
+  else
+    return CreateSimpleBlock(pos, size);
+}
+
+long Cluster::CreateBlockGroup(long long start_offset, long long size, long long discard_padding) {
+  assert(m_entries);
+  assert(m_entries_size > 0);
+  assert(m_entries_count >= 0);
+  assert(m_entries_count < m_entries_size);
+
+  IMkvReader* const pReader = m_pSegment->m_pReader;
+
+  long long pos = start_offset;
+  const long long stop = start_offset + size;
+
+  // For WebM files, there is a bias towards previous reference times
+  //(in order to support alt-ref frames, which refer back to the previous
+  // keyframe).  Normally a 0 value is not possible, but here we tenatively
+  // allow 0 as the value of a reference frame, with the interpretation
+  // that this is a "previous" reference time.
+
+  long long prev = 1;       // nonce
+  long long next = 0;       // nonce
+  long long duration = -1;  // really, this is unsigned
+
+  long long bpos = -1;
+  long long bsize = -1;
+
+  while (pos < stop) {
+    long len;
+    const long long id = ReadID(pReader, pos, len);
+    if (id < 0 || (pos + len) > stop) return E_FILE_FORMAT_INVALID;
+
+    pos += len;  // consume ID
+
+    const long long size = ReadUInt(pReader, pos, len);
+    assert(size >= 0);  // TODO
+    assert((pos + len) <= stop);
+
+    pos += len;  // consume size
+
+    if (id == libwebm::kMkvBlock) {
+      if (bpos < 0) {  // Block ID
+        bpos = pos;
+        bsize = size;
+      }
+    } else if (id == libwebm::kMkvBlockDuration) {
+      if (size > 8) return E_FILE_FORMAT_INVALID;
+
+      duration = UnserializeUInt(pReader, pos, size);
+
+      if (duration < 0) return E_FILE_FORMAT_INVALID;
+    } else if (id == libwebm::kMkvReferenceBlock) {
+      if (size > 8 || size <= 0) return E_FILE_FORMAT_INVALID;
+      const long size_ = static_cast<long>(size);
+
+      long long time;
+
+      long status = UnserializeInt(pReader, pos, size_, time);
+      assert(status == 0);
+      if (status != 0) return -1;
+
+      if (time <= 0)  // see note above
+        prev = time;
+      else
+        next = time;
+    }
+
+    pos += size;  // consume payload
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+  }
+  if (bpos < 0) return E_FILE_FORMAT_INVALID;
+
+  if (pos != stop) return E_FILE_FORMAT_INVALID;
+  assert(bsize >= 0);
+
+  const long idx = m_entries_count;
+
+  BlockEntry** const ppEntry = m_entries + idx;
+  BlockEntry*& pEntry = *ppEntry;
+
+  pEntry = new (std::nothrow) BlockGroup(this, idx, bpos, bsize, prev, next, duration, discard_padding);
+
+  if (pEntry == NULL) return -1;  // generic error
+
+  BlockGroup* const p = static_cast<BlockGroup*>(pEntry);
+
+  const long status = p->Parse();
+
+  if (status == 0) {  // success
+    ++m_entries_count;
+    return 0;
+  }
+
+  delete pEntry;
+  pEntry = 0;
+
+  return status;
+}
+
+long Cluster::CreateSimpleBlock(long long st, long long sz) {
+  assert(m_entries);
+  assert(m_entries_size > 0);
+  assert(m_entries_count >= 0);
+  assert(m_entries_count < m_entries_size);
+
+  const long idx = m_entries_count;
+
+  BlockEntry** const ppEntry = m_entries + idx;
+  BlockEntry*& pEntry = *ppEntry;
+
+  pEntry = new (std::nothrow) SimpleBlock(this, idx, st, sz);
+
+  if (pEntry == NULL) return -1;  // generic error
+
+  SimpleBlock* const p = static_cast<SimpleBlock*>(pEntry);
+
+  const long status = p->Parse();
+
+  if (status == 0) {
+    ++m_entries_count;
+    return 0;
+  }
+
+  delete pEntry;
+  pEntry = 0;
+
+  return status;
+}
+
+long Cluster::GetFirst(const BlockEntry*& pFirst) const {
+  if (m_entries_count <= 0) {
+    long long pos;
+    long len;
+
+    const long status = Parse(pos, len);
+
+    if (status < 0) {  // error
+      pFirst = NULL;
+      return status;
+    }
+
+    if (m_entries_count <= 0) {  // empty cluster
+      pFirst = NULL;
+      return 0;
+    }
+  }
+
+  assert(m_entries);
+
+  pFirst = m_entries[0];
+  assert(pFirst);
+
+  return 0;  // success
+}
+
+long Cluster::GetLast(const BlockEntry*& pLast) const {
+  for (;;) {
+    long long pos;
+    long len;
+
+    const long status = Parse(pos, len);
+
+    if (status < 0) {  // error
+      pLast = NULL;
+      return status;
+    }
+
+    if (status > 0)  // no new block
+      break;
+  }
+
+  if (m_entries_count <= 0) {
+    pLast = NULL;
+    return 0;
+  }
+
+  assert(m_entries);
+
+  const long idx = m_entries_count - 1;
+
+  pLast = m_entries[idx];
+  assert(pLast);
+
+  return 0;
+}
+
+long Cluster::GetNext(const BlockEntry* pCurr, const BlockEntry*& pNext) const {
+  assert(pCurr);
+  assert(m_entries);
+  assert(m_entries_count > 0);
+
+  size_t idx = pCurr->GetIndex();
+  assert(idx < size_t(m_entries_count));
+  assert(m_entries[idx] == pCurr);
+
+  ++idx;
+
+  if (idx >= size_t(m_entries_count)) {
+    long long pos;
+    long len;
+
+    const long status = Parse(pos, len);
+
+    if (status < 0) {  // error
+      pNext = NULL;
+      return status;
+    }
+
+    if (status > 0) {
+      pNext = NULL;
+      return 0;
+    }
+
+    assert(m_entries);
+    assert(m_entries_count > 0);
+    assert(idx < size_t(m_entries_count));
+  }
+
+  pNext = m_entries[idx];
+  assert(pNext);
+
+  return 0;
+}
+
+long Cluster::GetEntryCount() const { return m_entries_count; }
+
+const BlockEntry* Cluster::GetEntry(const Track* pTrack, long long time_ns) const {
+  assert(pTrack);
+
+  if (m_pSegment == NULL)  // this is the special EOS cluster
+    return pTrack->GetEOS();
+
+  const BlockEntry* pResult = pTrack->GetEOS();
+
+  long index = 0;
+
+  for (;;) {
+    if (index >= m_entries_count) {
+      long long pos;
+      long len;
+
+      const long status = Parse(pos, len);
+      assert(status >= 0);
+
+      if (status > 0)  // completely parsed, and no more entries
+        return pResult;
+
+      if (status < 0)  // should never happen
+        return 0;
+
+      assert(m_entries);
+      assert(index < m_entries_count);
+    }
+
+    const BlockEntry* const pEntry = m_entries[index];
+    assert(pEntry);
+    assert(!pEntry->EOS());
+
+    const Block* const pBlock = pEntry->GetBlock();
+    assert(pBlock);
+
+    if (pBlock->GetTrackNumber() != pTrack->GetNumber()) {
+      ++index;
+      continue;
+    }
+
+    if (pTrack->VetEntry(pEntry)) {
+      if (time_ns < 0)  // just want first candidate block
+        return pEntry;
+
+      const long long ns = pBlock->GetTime(this);
+
+      if (ns > time_ns) return pResult;
+
+      pResult = pEntry;  // have a candidate
+    } else if (time_ns >= 0) {
+      const long long ns = pBlock->GetTime(this);
+
+      if (ns > time_ns) return pResult;
+    }
+
+    ++index;
+  }
+}
+
+const BlockEntry* Cluster::GetEntry(const CuePoint& cp, const CuePoint::TrackPosition& tp) const {
+  assert(m_pSegment);
+  const long long tc = cp.GetTimeCode();
+
+  if (tp.m_block > 0) {
+    const long block = static_cast<long>(tp.m_block);
+    const long index = block - 1;
+
+    while (index >= m_entries_count) {
+      long long pos;
+      long len;
+
+      const long status = Parse(pos, len);
+
+      if (status < 0)  // TODO: can this happen?
+        return NULL;
+
+      if (status > 0)  // nothing remains to be parsed
+        return NULL;
+    }
+
+    const BlockEntry* const pEntry = m_entries[index];
+    assert(pEntry);
+    assert(!pEntry->EOS());
+
+    const Block* const pBlock = pEntry->GetBlock();
+    assert(pBlock);
+
+    if ((pBlock->GetTrackNumber() == tp.m_track) && (pBlock->GetTimeCode(this) == tc)) {
+      return pEntry;
+    }
+  }
+
+  long index = 0;
+
+  for (;;) {
+    if (index >= m_entries_count) {
+      long long pos;
+      long len;
+
+      const long status = Parse(pos, len);
+
+      if (status < 0)  // TODO: can this happen?
+        return NULL;
+
+      if (status > 0)  // nothing remains to be parsed
+        return NULL;
+
+      assert(m_entries);
+      assert(index < m_entries_count);
+    }
+
+    const BlockEntry* const pEntry = m_entries[index];
+    assert(pEntry);
+    assert(!pEntry->EOS());
+
+    const Block* const pBlock = pEntry->GetBlock();
+    assert(pBlock);
+
+    if (pBlock->GetTrackNumber() != tp.m_track) {
+      ++index;
+      continue;
+    }
+
+    const long long tc_ = pBlock->GetTimeCode(this);
+
+    if (tc_ < tc) {
+      ++index;
+      continue;
+    }
+
+    if (tc_ > tc) return NULL;
+
+    const Tracks* const pTracks = m_pSegment->GetTracks();
+    assert(pTracks);
+
+    const long tn = static_cast<long>(tp.m_track);
+    const Track* const pTrack = pTracks->GetTrackByNumber(tn);
+
+    if (pTrack == NULL) return NULL;
+
+    const long long type = pTrack->GetType();
+
+    if (type == 2)  // audio
+      return pEntry;
+
+    if (type != 1)  // not video
+      return NULL;
+
+    if (!pBlock->IsKey()) return NULL;
+
+    return pEntry;
+  }
+}
+
+BlockEntry::BlockEntry(Cluster* p, long idx) : m_pCluster(p), m_index(idx) {}
+BlockEntry::~BlockEntry() {}
+const Cluster* BlockEntry::GetCluster() const { return m_pCluster; }
+long BlockEntry::GetIndex() const { return m_index; }
+
+SimpleBlock::SimpleBlock(Cluster* pCluster, long idx, long long start, long long size)
+    : BlockEntry(pCluster, idx), m_block(start, size, 0) {}
+
+long SimpleBlock::Parse() { return m_block.Parse(m_pCluster); }
+BlockEntry::Kind SimpleBlock::GetKind() const { return kBlockSimple; }
+const Block* SimpleBlock::GetBlock() const { return &m_block; }
+
+BlockGroup::BlockGroup(Cluster* pCluster, long idx, long long block_start, long long block_size, long long prev,
+                       long long next, long long duration, long long discard_padding)
+    : BlockEntry(pCluster, idx),
+      m_block(block_start, block_size, discard_padding),
+      m_prev(prev),
+      m_next(next),
+      m_duration(duration) {}
+
+long BlockGroup::Parse() {
+  const long status = m_block.Parse(m_pCluster);
+
+  if (status) return status;
+
+  m_block.SetKey((m_prev > 0) && (m_next <= 0));
+
+  return 0;
+}
+
+BlockEntry::Kind BlockGroup::GetKind() const { return kBlockGroup; }
+const Block* BlockGroup::GetBlock() const { return &m_block; }
+long long BlockGroup::GetPrevTimeCode() const { return m_prev; }
+long long BlockGroup::GetNextTimeCode() const { return m_next; }
+long long BlockGroup::GetDurationTimeCode() const { return m_duration; }
+
+Block::Block(long long start, long long size_, long long discard_padding)
+    : m_start(start),
+      m_size(size_),
+      m_track(0),
+      m_timecode(-1),
+      m_flags(0),
+      m_frames(NULL),
+      m_frame_count(-1),
+      m_discard_padding(discard_padding) {}
+
+Block::~Block() { delete[] m_frames; }
+
+long Block::Parse(const Cluster* pCluster) {
+  if (pCluster == NULL) return -1;
+
+  if (pCluster->m_pSegment == NULL) return -1;
+
+  assert(m_start >= 0);
+  assert(m_size >= 0);
+  assert(m_track <= 0);
+  assert(m_frames == NULL);
+  assert(m_frame_count <= 0);
+
+  long long pos = m_start;
+  const long long stop = m_start + m_size;
+
+  long len;
+
+  IMkvReader* const pReader = pCluster->m_pSegment->m_pReader;
+
+  m_track = ReadUInt(pReader, pos, len);
+
+  if (m_track <= 0) return E_FILE_FORMAT_INVALID;
+
+  if ((pos + len) > stop) return E_FILE_FORMAT_INVALID;
+
+  pos += len;  // consume track number
+
+  if ((stop - pos) < 2) return E_FILE_FORMAT_INVALID;
+
+  long status;
+  long long value;
+
+  status = UnserializeInt(pReader, pos, 2, value);
+
+  if (status) return E_FILE_FORMAT_INVALID;
+
+  if (value < SHRT_MIN) return E_FILE_FORMAT_INVALID;
+
+  if (value > SHRT_MAX) return E_FILE_FORMAT_INVALID;
+
+  m_timecode = static_cast<short>(value);
+
+  pos += 2;
+
+  if ((stop - pos) <= 0) return E_FILE_FORMAT_INVALID;
+
+  status = pReader->Read(pos, 1, &m_flags);
+
+  if (status) return E_FILE_FORMAT_INVALID;
+
+  const int lacing = int(m_flags & 0x06) >> 1;
+
+  ++pos;  // consume flags byte
+
+  if (lacing == 0) {  // no lacing
+    if (pos > stop) return E_FILE_FORMAT_INVALID;
+
+    m_frame_count = 1;
+    m_frames = new (std::nothrow) Frame[m_frame_count];
+    if (m_frames == NULL) return -1;
+
+    Frame& f = m_frames[0];
+    f.pos = pos;
+
+    const long long frame_size = stop - pos;
+
+    if (frame_size > LONG_MAX || frame_size <= 0) return E_FILE_FORMAT_INVALID;
+
+    f.len = static_cast<long>(frame_size);
+
+    return 0;  // success
+  }
+
+  if (pos >= stop) return E_FILE_FORMAT_INVALID;
+
+  unsigned char biased_count;
+
+  status = pReader->Read(pos, 1, &biased_count);
+
+  if (status) return E_FILE_FORMAT_INVALID;
+
+  ++pos;  // consume frame count
+  if (pos > stop) return E_FILE_FORMAT_INVALID;
+
+  m_frame_count = int(biased_count) + 1;
+
+  m_frames = new (std::nothrow) Frame[m_frame_count];
+  if (m_frames == NULL) return -1;
+
+  if (!m_frames) return E_FILE_FORMAT_INVALID;
+
+  if (lacing == 1) {  // Xiph
+    Frame* pf = m_frames;
+    Frame* const pf_end = pf + m_frame_count;
+
+    long long size = 0;
+    int frame_count = m_frame_count;
+
+    while (frame_count > 1) {
+      long frame_size = 0;
+
+      for (;;) {
+        unsigned char val;
+
+        if (pos >= stop) return E_FILE_FORMAT_INVALID;
+
+        status = pReader->Read(pos, 1, &val);
+
+        if (status) return E_FILE_FORMAT_INVALID;
+
+        ++pos;  // consume xiph size byte
+
+        frame_size += val;
+
+        if (val < 255) break;
+      }
+
+      Frame& f = *pf++;
+      assert(pf < pf_end);
+      if (pf >= pf_end) return E_FILE_FORMAT_INVALID;
+
+      f.pos = 0;  // patch later
+
+      if (frame_size <= 0) return E_FILE_FORMAT_INVALID;
+
+      f.len = frame_size;
+      size += frame_size;  // contribution of this frame
+
+      --frame_count;
+    }
+
+    if (pf >= pf_end || pos > stop) return E_FILE_FORMAT_INVALID;
+
+    {
+      Frame& f = *pf++;
+
+      if (pf != pf_end) return E_FILE_FORMAT_INVALID;
+
+      f.pos = 0;  // patch later
+
+      const long long total_size = stop - pos;
+
+      if (total_size < size) return E_FILE_FORMAT_INVALID;
+
+      const long long frame_size = total_size - size;
+
+      if (frame_size > LONG_MAX || frame_size <= 0) return E_FILE_FORMAT_INVALID;
+
+      f.len = static_cast<long>(frame_size);
+    }
+
+    pf = m_frames;
+    while (pf != pf_end) {
+      Frame& f = *pf++;
+      assert((pos + f.len) <= stop);
+
+      if ((pos + f.len) > stop) return E_FILE_FORMAT_INVALID;
+
+      f.pos = pos;
+      pos += f.len;
+    }
+
+    assert(pos == stop);
+    if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  } else if (lacing == 2) {  // fixed-size lacing
+    if (pos >= stop) return E_FILE_FORMAT_INVALID;
+
+    const long long total_size = stop - pos;
+
+    if ((total_size % m_frame_count) != 0) return E_FILE_FORMAT_INVALID;
+
+    const long long frame_size = total_size / m_frame_count;
+
+    if (frame_size > LONG_MAX || frame_size <= 0) return E_FILE_FORMAT_INVALID;
+
+    Frame* pf = m_frames;
+    Frame* const pf_end = pf + m_frame_count;
+
+    while (pf != pf_end) {
+      assert((pos + frame_size) <= stop);
+      if ((pos + frame_size) > stop) return E_FILE_FORMAT_INVALID;
+
+      Frame& f = *pf++;
+
+      f.pos = pos;
+      f.len = static_cast<long>(frame_size);
+
+      pos += frame_size;
+    }
+
+    assert(pos == stop);
+    if (pos != stop) return E_FILE_FORMAT_INVALID;
+
+  } else {
+    assert(lacing == 3);  // EBML lacing
+
+    if (pos >= stop) return E_FILE_FORMAT_INVALID;
+
+    long long size = 0;
+    int frame_count = m_frame_count;
+
+    long long frame_size = ReadUInt(pReader, pos, len);
+
+    if (frame_size <= 0) return E_FILE_FORMAT_INVALID;
+
+    if (frame_size > LONG_MAX) return E_FILE_FORMAT_INVALID;
+
+    if ((pos + len) > stop) return E_FILE_FORMAT_INVALID;
+
+    pos += len;  // consume length of size of first frame
+
+    if ((pos + frame_size) > stop) return E_FILE_FORMAT_INVALID;
+
+    Frame* pf = m_frames;
+    Frame* const pf_end = pf + m_frame_count;
+
+    {
+      Frame& curr = *pf;
+
+      curr.pos = 0;  // patch later
+
+      curr.len = static_cast<long>(frame_size);
+      size += curr.len;  // contribution of this frame
+    }
+
+    --frame_count;
+
+    while (frame_count > 1) {
+      if (pos >= stop) return E_FILE_FORMAT_INVALID;
+
+      assert(pf < pf_end);
+      if (pf >= pf_end) return E_FILE_FORMAT_INVALID;
+
+      const Frame& prev = *pf++;
+      assert(prev.len == frame_size);
+      if (prev.len != frame_size) return E_FILE_FORMAT_INVALID;
+
+      assert(pf < pf_end);
+      if (pf >= pf_end) return E_FILE_FORMAT_INVALID;
+
+      Frame& curr = *pf;
+
+      curr.pos = 0;  // patch later
+
+      const long long delta_size_ = ReadUInt(pReader, pos, len);
+
+      if (delta_size_ < 0) return E_FILE_FORMAT_INVALID;
+
+      if ((pos + len) > stop) return E_FILE_FORMAT_INVALID;
+
+      pos += len;  // consume length of (delta) size
+      if (pos > stop) return E_FILE_FORMAT_INVALID;
+
+      const long exp = 7 * len - 1;
+      const long long bias = (1LL << exp) - 1LL;
+      const long long delta_size = delta_size_ - bias;
+
+      frame_size += delta_size;
+
+      if (frame_size <= 0) return E_FILE_FORMAT_INVALID;
+
+      if (frame_size > LONG_MAX) return E_FILE_FORMAT_INVALID;
+
+      curr.len = static_cast<long>(frame_size);
+      // Check if size + curr.len could overflow.
+      if (size > LLONG_MAX - curr.len) {
+        return E_FILE_FORMAT_INVALID;
+      }
+      size += curr.len;  // contribution of this frame
+
+      --frame_count;
+    }
+
+    // parse last frame
+    if (frame_count > 0) {
+      if (pos > stop || pf >= pf_end) return E_FILE_FORMAT_INVALID;
+
+      const Frame& prev = *pf++;
+      assert(prev.len == frame_size);
+      if (prev.len != frame_size) return E_FILE_FORMAT_INVALID;
+
+      if (pf >= pf_end) return E_FILE_FORMAT_INVALID;
+
+      Frame& curr = *pf++;
+      if (pf != pf_end) return E_FILE_FORMAT_INVALID;
+
+      curr.pos = 0;  // patch later
+
+      const long long total_size = stop - pos;
+
+      if (total_size < size) return E_FILE_FORMAT_INVALID;
+
+      frame_size = total_size - size;
+
+      if (frame_size > LONG_MAX || frame_size <= 0) return E_FILE_FORMAT_INVALID;
+
+      curr.len = static_cast<long>(frame_size);
+    }
+
+    pf = m_frames;
+    while (pf != pf_end) {
+      Frame& f = *pf++;
+      if ((pos + f.len) > stop) return E_FILE_FORMAT_INVALID;
+
+      f.pos = pos;
+      pos += f.len;
+    }
+
+    if (pos != stop) return E_FILE_FORMAT_INVALID;
+  }
+
+  return 0;  // success
+}
+
+long long Block::GetTimeCode(const Cluster* pCluster) const {
+  if (pCluster == 0) return m_timecode;
+
+  const long long tc0 = pCluster->GetTimeCode();
+  assert(tc0 >= 0);
+
+  // Check if tc0 + m_timecode would overflow.
+  if (tc0 < 0 || LLONG_MAX - tc0 < m_timecode) {
+    return -1;
+  }
+
+  const long long tc = tc0 + m_timecode;
+
+  return tc;  // unscaled timecode units
+}
+
+long long Block::GetTime(const Cluster* pCluster) const {
+  assert(pCluster);
+
+  const long long tc = GetTimeCode(pCluster);
+
+  const Segment* const pSegment = pCluster->m_pSegment;
+  const SegmentInfo* const pInfo = pSegment->GetInfo();
+  assert(pInfo);
+
+  const long long scale = pInfo->GetTimeCodeScale();
+  assert(scale >= 1);
+
+  // Check if tc * scale could overflow.
+  if (tc != 0 && scale > LLONG_MAX / tc) {
+    return -1;
+  }
+  const long long ns = tc * scale;
+
+  return ns;
+}
+
+long long Block::GetTrackNumber() const { return m_track; }
+
+bool Block::IsKey() const { return ((m_flags & static_cast<unsigned char>(1 << 7)) != 0); }
+
+void Block::SetKey(bool bKey) {
+  if (bKey)
+    m_flags |= static_cast<unsigned char>(1 << 7);
+  else
+    m_flags &= 0x7F;
+}
+
+bool Block::IsInvisible() const { return bool(int(m_flags & 0x08) != 0); }
+
+Block::Lacing Block::GetLacing() const {
+  const int value = int(m_flags & 0x06) >> 1;
+  return static_cast<Lacing>(value);
+}
+
+int Block::GetFrameCount() const { return m_frame_count; }
+
+const Block::Frame& Block::GetFrame(int idx) const {
+  assert(idx >= 0);
+  assert(idx < m_frame_count);
+
+  const Frame& f = m_frames[idx];
+  assert(f.pos > 0);
+  assert(f.len > 0);
+
+  return f;
+}
+
+long Block::Frame::Read(IMkvReader* pReader, unsigned char* buf) const {
+  assert(pReader);
+  assert(buf);
+
+  const long status = pReader->Read(pos, len, buf);
+  return status;
+}
+
+long long Block::GetDiscardPadding() const { return m_discard_padding; }
+
+}  // namespace mkvparser

diff --git a/third_party/libwebm/mkvparser/mkvparser.h b/third_party/libwebm/mkvparser/mkvparser.h
new file mode 100644
index 0000000..1f3d5cf
--- /dev/null
+++ b/third_party/libwebm/mkvparser/mkvparser.h

@@ -0,0 +1,1118 @@
+// Copyright (c) 2012 The WebM project authors. All Rights Reserved.
+//
+// Use of this source code is governed by a BSD-style license
+// that can be found in the LICENSE file in the root of the source
+// tree. An additional intellectual property rights grant can be found
+// in the file PATENTS.  All contributing project authors may
+// be found in the AUTHORS file in the root of the source tree.
+#ifndef MKVPARSER_MKVPARSER_H_
+#define MKVPARSER_MKVPARSER_H_
+
+#include <cstddef>
+
+namespace mkvparser {
+
+const int E_PARSE_FAILED = -1;
+const int E_FILE_FORMAT_INVALID = -2;
+const int E_BUFFER_NOT_FULL = -3;
+
+class IMkvReader {
+ public:
+  virtual int Read(long long pos, long len, unsigned char* buf) = 0;
+  virtual int Length(long long* total, long long* available) = 0;
+
+ protected:
+  virtual ~IMkvReader() {}
+};
+
+template <typename Type>
+Type* SafeArrayAlloc(unsigned long long num_elements, unsigned long long element_size);
+long long GetUIntLength(IMkvReader*, long long, long&);
+long long ReadUInt(IMkvReader*, long long, long&);
+long long ReadID(IMkvReader* pReader, long long pos, long& len);
+long long UnserializeUInt(IMkvReader*, long long pos, long long size);
+
+long UnserializeFloat(IMkvReader*, long long pos, long long size, double&);
+long UnserializeInt(IMkvReader*, long long pos, long long size, long long& result);
+
+long UnserializeString(IMkvReader*, long long pos, long long size, char*& str);
+
+long ParseElementHeader(IMkvReader* pReader,
+                        long long& pos,  // consume id and size fields
+                        long long stop,  // if you know size of element's parent
+                        long long& id, long long& size);
+
+bool Match(IMkvReader*, long long&, unsigned long, long long&);
+bool Match(IMkvReader*, long long&, unsigned long, unsigned char*&, size_t&);
+
+void GetVersion(int& major, int& minor, int& build, int& revision);
+
+struct EBMLHeader {
+  EBMLHeader();
+  ~EBMLHeader();
+  long long m_version;
+  long long m_readVersion;
+  long long m_maxIdLength;
+  long long m_maxSizeLength;
+  char* m_docType;
+  long long m_docTypeVersion;
+  long long m_docTypeReadVersion;
+
+  long long Parse(IMkvReader*, long long&);
+  void Init();
+};
+
+class Segment;
+class Track;
+class Cluster;
+
+class Block {
+  Block(const Block&);
+  Block& operator=(const Block&);
+
+ public:
+  const long long m_start;
+  const long long m_size;
+
+  Block(long long start, long long size, long long discard_padding);
+  ~Block();
+
+  long Parse(const Cluster*);
+
+  long long GetTrackNumber() const;
+  long long GetTimeCode(const Cluster*) const;  // absolute, but not scaled
+  long long GetTime(const Cluster*) const;      // absolute, and scaled (ns)
+  bool IsKey() const;
+  void SetKey(bool);
+  bool IsInvisible() const;
+
+  enum Lacing { kLacingNone, kLacingXiph, kLacingFixed, kLacingEbml };
+  Lacing GetLacing() const;
+
+  int GetFrameCount() const;  // to index frames: [0, count)
+
+  struct Frame {
+    long long pos;  // absolute offset
+    long len;
+
+    long Read(IMkvReader*, unsigned char*) const;
+  };
+
+  const Frame& GetFrame(int frame_index) const;
+
+  long long GetDiscardPadding() const;
+
+ private:
+  long long m_track;  // Track::Number()
+  short m_timecode;   // relative to cluster
+  unsigned char m_flags;
+
+  Frame* m_frames;
+  int m_frame_count;
+
+ protected:
+  const long long m_discard_padding;
+};
+
+class BlockEntry {
+  BlockEntry(const BlockEntry&);
+  BlockEntry& operator=(const BlockEntry&);
+
+ protected:
+  BlockEntry(Cluster*, long index);
+
+ public:
+  virtual ~BlockEntry();
+
+  bool EOS() const { return (GetKind() == kBlockEOS); }
+  const Cluster* GetCluster() const;
+  long GetIndex() const;
+  virtual const Block* GetBlock() const = 0;
+
+  enum Kind { kBlockEOS, kBlockSimple, kBlockGroup };
+  virtual Kind GetKind() const = 0;
+
+ protected:
+  Cluster* const m_pCluster;
+  const long m_index;
+};
+
+class SimpleBlock : public BlockEntry {
+  SimpleBlock(const SimpleBlock&);
+  SimpleBlock& operator=(const SimpleBlock&);
+
+ public:
+  SimpleBlock(Cluster*, long index, long long start, long long size);
+  long Parse();
+
+  Kind GetKind() const;
+  const Block* GetBlock() const;
+
+ protected:
+  Block m_block;
+};
+
+class BlockGroup : public BlockEntry {
+  BlockGroup(const BlockGroup&);
+  BlockGroup& operator=(const BlockGroup&);
+
+ public:
+  BlockGroup(Cluster*, long index,
+             long long block_start,  // absolute pos of block's payload
+             long long block_size,   // size of block's payload
+             long long prev, long long next, long long duration, long long discard_padding);
+
+  long Parse();
+
+  Kind GetKind() const;
+  const Block* GetBlock() const;
+
+  long long GetPrevTimeCode() const;  // relative to block's time
+  long long GetNextTimeCode() const;  // as above
+  long long GetDurationTimeCode() const;
+
+ private:
+  Block m_block;
+  const long long m_prev;
+  const long long m_next;
+  const long long m_duration;
+};
+
+///////////////////////////////////////////////////////////////
+// ContentEncoding element
+// Elements used to describe if the track data has been encrypted or
+// compressed with zlib or header stripping.
+class ContentEncoding {
+ public:
+  enum { kCTR = 1 };
+
+  ContentEncoding();
+  ~ContentEncoding();
+
+  // ContentCompression element names
+  struct ContentCompression {
+    ContentCompression();
+    ~ContentCompression();
+
+    unsigned long long algo;
+    unsigned char* settings;
+    long long settings_len;
+  };
+
+  // ContentEncAESSettings element names
+  struct ContentEncAESSettings {
+    ContentEncAESSettings() : cipher_mode(kCTR) {}
+    ~ContentEncAESSettings() {}
+
+    unsigned long long cipher_mode;
+  };
+
+  // ContentEncryption element names
+  struct ContentEncryption {
+    ContentEncryption();
+    ~ContentEncryption();
+
+    unsigned long long algo;
+    unsigned char* key_id;
+    long long key_id_len;
+    unsigned char* signature;
+    long long signature_len;
+    unsigned char* sig_key_id;
+    long long sig_key_id_len;
+    unsigned long long sig_algo;
+    unsigned long long sig_hash_algo;
+
+    ContentEncAESSettings aes_settings;
+  };
+
+  // Returns ContentCompression represented by |idx|. Returns NULL if |idx|
+  // is out of bounds.
+  const ContentCompression* GetCompressionByIndex(unsigned long idx) const;
+
+  // Returns number of ContentCompression elements in this ContentEncoding
+  // element.
+  unsigned long GetCompressionCount() const;
+
+  // Parses the ContentCompression element from |pReader|. |start| is the
+  // starting offset of the ContentCompression payload. |size| is the size in
+  // bytes of the ContentCompression payload. |compression| is where the parsed
+  // values will be stored.
+  long ParseCompressionEntry(long long start, long long size, IMkvReader* pReader, ContentCompression* compression);
+
+  // Returns ContentEncryption represented by |idx|. Returns NULL if |idx|
+  // is out of bounds.
+  const ContentEncryption* GetEncryptionByIndex(unsigned long idx) const;
+
+  // Returns number of ContentEncryption elements in this ContentEncoding
+  // element.
+  unsigned long GetEncryptionCount() const;
+
+  // Parses the ContentEncAESSettings element from |pReader|. |start| is the
+  // starting offset of the ContentEncAESSettings payload. |size| is the
+  // size in bytes of the ContentEncAESSettings payload. |encryption| is
+  // where the parsed values will be stored.
+  long ParseContentEncAESSettingsEntry(long long start, long long size, IMkvReader* pReader,
+                                       ContentEncAESSettings* aes);
+
+  // Parses the ContentEncoding element from |pReader|. |start| is the
+  // starting offset of the ContentEncoding payload. |size| is the size in
+  // bytes of the ContentEncoding payload. Returns true on success.
+  long ParseContentEncodingEntry(long long start, long long size, IMkvReader* pReader);
+
+  // Parses the ContentEncryption element from |pReader|. |start| is the
+  // starting offset of the ContentEncryption payload. |size| is the size in
+  // bytes of the ContentEncryption payload. |encryption| is where the parsed
+  // values will be stored.
+  long ParseEncryptionEntry(long long start, long long size, IMkvReader* pReader, ContentEncryption* encryption);
+
+  unsigned long long encoding_order() const { return encoding_order_; }
+  unsigned long long encoding_scope() const { return encoding_scope_; }
+  unsigned long long encoding_type() const { return encoding_type_; }
+
+ private:
+  // Member variables for list of ContentCompression elements.
+  ContentCompression** compression_entries_;
+  ContentCompression** compression_entries_end_;
+
+  // Member variables for list of ContentEncryption elements.
+  ContentEncryption** encryption_entries_;
+  ContentEncryption** encryption_entries_end_;
+
+  // ContentEncoding element names
+  unsigned long long encoding_order_;
+  unsigned long long encoding_scope_;
+  unsigned long long encoding_type_;
+
+  // LIBWEBM_DISALLOW_COPY_AND_ASSIGN(ContentEncoding);
+  ContentEncoding(const ContentEncoding&);
+  ContentEncoding& operator=(const ContentEncoding&);
+};
+
+class Track {
+  Track(const Track&);
+  Track& operator=(const Track&);
+
+ public:
+  class Info;
+  static long Create(Segment*, const Info&, long long element_start, long long element_size, Track*&);
+
+  enum Type { kVideo = 1, kAudio = 2, kSubtitle = 0x11, kMetadata = 0x21 };
+
+  Segment* const m_pSegment;
+  const long long m_element_start;
+  const long long m_element_size;
+  virtual ~Track();
+
+  long GetType() const;
+  long GetNumber() const;
+  unsigned long long GetUid() const;
+  const char* GetNameAsUTF8() const;
+  const char* GetLanguage() const;
+  const char* GetCodecNameAsUTF8() const;
+  const char* GetCodecId() const;
+  const unsigned char* GetCodecPrivate(size_t&) const;
+  bool GetLacing() const;
+  unsigned long long GetDefaultDuration() const;
+  unsigned long long GetCodecDelay() const;
+  unsigned long long GetSeekPreRoll() const;
+
+  const BlockEntry* GetEOS() const;
+
+  struct Settings {
+    long long start;
+    long long size;
+  };
+
+  class Info {
+   public:
+    Info();
+    ~Info();
+    int Copy(Info&) const;
+    void Clear();
+    long type;
+    long number;
+    unsigned long long uid;
+    unsigned long long defaultDuration;
+    unsigned long long codecDelay;
+    unsigned long long seekPreRoll;
+    char* nameAsUTF8;
+    char* language;
+    char* codecId;
+    char* codecNameAsUTF8;
+    unsigned char* codecPrivate;
+    size_t codecPrivateSize;
+    bool lacing;
+    Settings settings;
+
+   private:
+    Info(const Info&);
+    Info& operator=(const Info&);
+    int CopyStr(char* Info::*str, Info&) const;
+  };
+
+  long GetFirst(const BlockEntry*&) const;
+  long GetNext(const BlockEntry* pCurr, const BlockEntry*& pNext) const;
+  virtual bool VetEntry(const BlockEntry*) const;
+  virtual long Seek(long long time_ns, const BlockEntry*&) const;
+
+  const ContentEncoding* GetContentEncodingByIndex(unsigned long idx) const;
+  unsigned long GetContentEncodingCount() const;
+
+  long ParseContentEncodingsEntry(long long start, long long size);
+
+ protected:
+  Track(Segment*, long long element_start, long long element_size);
+
+  Info m_info;
+
+  class EOSBlock : public BlockEntry {
+   public:
+    EOSBlock();
+
+    Kind GetKind() const;
+    const Block* GetBlock() const;
+  };
+
+  EOSBlock m_eos;
+
+ private:
+  ContentEncoding** content_encoding_entries_;
+  ContentEncoding** content_encoding_entries_end_;
+};
+
+struct PrimaryChromaticity {
+  PrimaryChromaticity() : x(0), y(0) {}
+  ~PrimaryChromaticity() {}
+  static bool Parse(IMkvReader* reader, long long read_pos, long long value_size, bool is_x,
+                    PrimaryChromaticity** chromaticity);
+  float x;
+  float y;
+};
+
+struct MasteringMetadata {
+  static const float kValueNotPresent;
+
+  MasteringMetadata()
+      : r(NULL),
+        g(NULL),
+        b(NULL),
+        white_point(NULL),
+        luminance_max(kValueNotPresent),
+        luminance_min(kValueNotPresent) {}
+  ~MasteringMetadata() {
+    delete r;
+    delete g;
+    delete b;
+    delete white_point;
+  }
+
+  static bool Parse(IMkvReader* reader, long long element_start, long long element_size,
+                    MasteringMetadata** mastering_metadata);
+
+  PrimaryChromaticity* r;
+  PrimaryChromaticity* g;
+  PrimaryChromaticity* b;
+  PrimaryChromaticity* white_point;
+  float luminance_max;
+  float luminance_min;
+};
+
+struct Colour {
+  static const long long kValueNotPresent;
+
+  // Unless otherwise noted all values assigned upon construction are the
+  // equivalent of unspecified/default.
+  Colour()
+      : matrix_coefficients(kValueNotPresent),
+        bits_per_channel(kValueNotPresent),
+        chroma_subsampling_horz(kValueNotPresent),
+        chroma_subsampling_vert(kValueNotPresent),
+        cb_subsampling_horz(kValueNotPresent),
+        cb_subsampling_vert(kValueNotPresent),
+        chroma_siting_horz(kValueNotPresent),
+        chroma_siting_vert(kValueNotPresent),
+        range(kValueNotPresent),
+        transfer_characteristics(kValueNotPresent),
+        primaries(kValueNotPresent),
+        max_cll(kValueNotPresent),
+        max_fall(kValueNotPresent),
+        mastering_metadata(NULL) {}
+  ~Colour() {
+    delete mastering_metadata;
+    mastering_metadata = NULL;
+  }
+
+  static bool Parse(IMkvReader* reader, long long element_start, long long element_size, Colour** colour);
+
+  long long matrix_coefficients;
+  long long bits_per_channel;
+  long long chroma_subsampling_horz;
+  long long chroma_subsampling_vert;
+  long long cb_subsampling_horz;
+  long long cb_subsampling_vert;
+  long long chroma_siting_horz;
+  long long chroma_siting_vert;
+  long long range;
+  long long transfer_characteristics;
+  long long primaries;
+  long long max_cll;
+  long long max_fall;
+
+  MasteringMetadata* mastering_metadata;
+};
+
+struct Projection {
+  enum ProjectionType {
+    kTypeNotPresent = -1,
+    kRectangular = 0,
+    kEquirectangular = 1,
+    kCubeMap = 2,
+    kMesh = 3,
+  };
+  static const float kValueNotPresent;
+  Projection()
+      : type(kTypeNotPresent),
+        private_data(NULL),
+        private_data_length(0),
+        pose_yaw(kValueNotPresent),
+        pose_pitch(kValueNotPresent),
+        pose_roll(kValueNotPresent) {}
+  ~Projection() { delete[] private_data; }
+  static bool Parse(IMkvReader* reader, long long element_start, long long element_size, Projection** projection);
+
+  ProjectionType type;
+  unsigned char* private_data;
+  size_t private_data_length;
+  float pose_yaw;
+  float pose_pitch;
+  float pose_roll;
+};
+
+class VideoTrack : public Track {
+  VideoTrack(const VideoTrack&);
+  VideoTrack& operator=(const VideoTrack&);
+
+  VideoTrack(Segment*, long long element_start, long long element_size);
+
+ public:
+  virtual ~VideoTrack();
+  static long Parse(Segment*, const Info&, long long element_start, long long element_size, VideoTrack*&);
+
+  long long GetWidth() const;
+  long long GetHeight() const;
+  long long GetDisplayWidth() const;
+  long long GetDisplayHeight() const;
+  long long GetDisplayUnit() const;
+  long long GetStereoMode() const;
+  double GetFrameRate() const;
+
+  bool VetEntry(const BlockEntry*) const;
+  long Seek(long long time_ns, const BlockEntry*&) const;
+
+  Colour* GetColour() const;
+
+  Projection* GetProjection() const;
+
+  const char* GetColourSpace() const { return m_colour_space; }
+
+ private:
+  long long m_width;
+  long long m_height;
+  long long m_display_width;
+  long long m_display_height;
+  long long m_display_unit;
+  long long m_stereo_mode;
+  char* m_colour_space;
+  double m_rate;
+
+  Colour* m_colour;
+  Projection* m_projection;
+};
+
+class AudioTrack : public Track {
+  AudioTrack(const AudioTrack&);
+  AudioTrack& operator=(const AudioTrack&);
+
+  AudioTrack(Segment*, long long element_start, long long element_size);
+
+ public:
+  static long Parse(Segment*, const Info&, long long element_start, long long element_size, AudioTrack*&);
+
+  double GetSamplingRate() const;
+  long long GetChannels() const;
+  long long GetBitDepth() const;
+
+ private:
+  double m_rate;
+  long long m_channels;
+  long long m_bitDepth;
+};
+
+class Tracks {
+  Tracks(const Tracks&);
+  Tracks& operator=(const Tracks&);
+
+ public:
+  Segment* const m_pSegment;
+  const long long m_start;
+  const long long m_size;
+  const long long m_element_start;
+  const long long m_element_size;
+
+  Tracks(Segment*, long long start, long long size, long long element_start, long long element_size);
+
+  ~Tracks();
+
+  long Parse();
+
+  unsigned long GetTracksCount() const;
+
+  const Track* GetTrackByNumber(long tn) const;
+  const Track* GetTrackByIndex(unsigned long idx) const;
+
+ private:
+  Track** m_trackEntries;
+  Track** m_trackEntriesEnd;
+
+  long ParseTrackEntry(long long payload_start, long long payload_size, long long element_start, long long element_size,
+                       Track*&) const;
+};
+
+class Chapters {
+  Chapters(const Chapters&);
+  Chapters& operator=(const Chapters&);
+
+ public:
+  Segment* const m_pSegment;
+  const long long m_start;
+  const long long m_size;
+  const long long m_element_start;
+  const long long m_element_size;
+
+  Chapters(Segment*, long long payload_start, long long payload_size, long long element_start, long long element_size);
+
+  ~Chapters();
+
+  long Parse();
+
+  class Atom;
+  class Edition;
+
+  class Display {
+    friend class Atom;
+    Display();
+    Display(const Display&);
+    ~Display();
+    Display& operator=(const Display&);
+
+   public:
+    const char* GetString() const;
+    const char* GetLanguage() const;
+    const char* GetCountry() const;
+
+   private:
+    void Init();
+    void ShallowCopy(Display&) const;
+    void Clear();
+    long Parse(IMkvReader*, long long pos, long long size);
+
+    char* m_string;
+    char* m_language;
+    char* m_country;
+  };
+
+  class Atom {
+    friend class Edition;
+    Atom();
+    Atom(const Atom&);
+    ~Atom();
+    Atom& operator=(const Atom&);
+
+   public:
+    unsigned long long GetUID() const;
+    const char* GetStringUID() const;
+
+    long long GetStartTimecode() const;
+    long long GetStopTimecode() const;
+
+    long long GetStartTime(const Chapters*) const;
+    long long GetStopTime(const Chapters*) const;
+
+    int GetDisplayCount() const;
+    const Display* GetDisplay(int index) const;
+
+   private:
+    void Init();
+    void ShallowCopy(Atom&) const;
+    void Clear();
+    long Parse(IMkvReader*, long long pos, long long size);
+    static long long GetTime(const Chapters*, long long timecode);
+
+    long ParseDisplay(IMkvReader*, long long pos, long long size);
+    bool ExpandDisplaysArray();
+
+    char* m_string_uid;
+    unsigned long long m_uid;
+    long long m_start_timecode;
+    long long m_stop_timecode;
+
+    Display* m_displays;
+    int m_displays_size;
+    int m_displays_count;
+  };
+
+  class Edition {
+    friend class Chapters;
+    Edition();
+    Edition(const Edition&);
+    ~Edition();
+    Edition& operator=(const Edition&);
+
+   public:
+    int GetAtomCount() const;
+    const Atom* GetAtom(int index) const;
+
+   private:
+    void Init();
+    void ShallowCopy(Edition&) const;
+    void Clear();
+    long Parse(IMkvReader*, long long pos, long long size);
+
+    long ParseAtom(IMkvReader*, long long pos, long long size);
+    bool ExpandAtomsArray();
+
+    Atom* m_atoms;
+    int m_atoms_size;
+    int m_atoms_count;
+  };
+
+  int GetEditionCount() const;
+  const Edition* GetEdition(int index) const;
+
+ private:
+  long ParseEdition(long long pos, long long size);
+  bool ExpandEditionsArray();
+
+  Edition* m_editions;
+  int m_editions_size;
+  int m_editions_count;
+};
+
+class Tags {
+  Tags(const Tags&);
+  Tags& operator=(const Tags&);
+
+ public:
+  Segment* const m_pSegment;
+  const long long m_start;
+  const long long m_size;
+  const long long m_element_start;
+  const long long m_element_size;
+
+  Tags(Segment*, long long payload_start, long long payload_size, long long element_start, long long element_size);
+
+  ~Tags();
+
+  long Parse();
+
+  class Tag;
+  class SimpleTag;
+
+  class SimpleTag {
+    friend class Tag;
+    SimpleTag();
+    SimpleTag(const SimpleTag&);
+    ~SimpleTag();
+    SimpleTag& operator=(const SimpleTag&);
+
+   public:
+    const char* GetTagName() const;
+    const char* GetTagString() const;
+
+   private:
+    void Init();
+    void ShallowCopy(SimpleTag&) const;
+    void Clear();
+    long Parse(IMkvReader*, long long pos, long long size);
+
+    char* m_tag_name;
+    char* m_tag_string;
+  };
+
+  class Tag {
+    friend class Tags;
+    Tag();
+    Tag(const Tag&);
+    ~Tag();
+    Tag& operator=(const Tag&);
+
+   public:
+    int GetSimpleTagCount() const;
+    const SimpleTag* GetSimpleTag(int index) const;
+
+   private:
+    void Init();
+    void ShallowCopy(Tag&) const;
+    void Clear();
+    long Parse(IMkvReader*, long long pos, long long size);
+
+    long ParseSimpleTag(IMkvReader*, long long pos, long long size);
+    bool ExpandSimpleTagsArray();
+
+    SimpleTag* m_simple_tags;
+    int m_simple_tags_size;
+    int m_simple_tags_count;
+  };
+
+  int GetTagCount() const;
+  const Tag* GetTag(int index) const;
+
+ private:
+  long ParseTag(long long pos, long long size);
+  bool ExpandTagsArray();
+
+  Tag* m_tags;
+  int m_tags_size;
+  int m_tags_count;
+};
+
+class SegmentInfo {
+  SegmentInfo(const SegmentInfo&);
+  SegmentInfo& operator=(const SegmentInfo&);
+
+ public:
+  Segment* const m_pSegment;
+  const long long m_start;
+  const long long m_size;
+  const long long m_element_start;
+  const long long m_element_size;
+
+  SegmentInfo(Segment*, long long start, long long size, long long element_start, long long element_size);
+
+  ~SegmentInfo();
+
+  long Parse();
+
+  long long GetTimeCodeScale() const;
+  long long GetDuration() const;  // scaled
+  const char* GetMuxingAppAsUTF8() const;
+  const char* GetWritingAppAsUTF8() const;
+  const char* GetTitleAsUTF8() const;
+
+ private:
+  long long m_timecodeScale;
+  double m_duration;
+  char* m_pMuxingAppAsUTF8;
+  char* m_pWritingAppAsUTF8;
+  char* m_pTitleAsUTF8;
+};
+
+class SeekHead {
+  SeekHead(const SeekHead&);
+  SeekHead& operator=(const SeekHead&);
+
+ public:
+  Segment* const m_pSegment;
+  const long long m_start;
+  const long long m_size;
+  const long long m_element_start;
+  const long long m_element_size;
+
+  SeekHead(Segment*, long long start, long long size, long long element_start, long long element_size);
+
+  ~SeekHead();
+
+  long Parse();
+
+  struct Entry {
+    Entry();
+
+    // the SeekHead entry payload
+    long long id;
+    long long pos;
+
+    // absolute pos of SeekEntry ID
+    long long element_start;
+
+    // SeekEntry ID size + size size + payload
+    long long element_size;
+  };
+
+  int GetCount() const;
+  const Entry* GetEntry(int idx) const;
+
+  struct VoidElement {
+    // absolute pos of Void ID
+    long long element_start;
+
+    // ID size + size size + payload size
+    long long element_size;
+  };
+
+  int GetVoidElementCount() const;
+  const VoidElement* GetVoidElement(int idx) const;
+
+ private:
+  Entry* m_entries;
+  int m_entry_count;
+
+  VoidElement* m_void_elements;
+  int m_void_element_count;
+
+  static bool ParseEntry(IMkvReader*,
+                         long long pos,  // payload
+                         long long size, Entry*);
+};
+
+class Cues;
+class CuePoint {
+  friend class Cues;
+
+  CuePoint(long, long long);
+  ~CuePoint();
+
+  CuePoint(const CuePoint&);
+  CuePoint& operator=(const CuePoint&);
+
+ public:
+  long long m_element_start;
+  long long m_element_size;
+
+  bool Load(IMkvReader*);
+
+  long long GetTimeCode() const;            // absolute but unscaled
+  long long GetTime(const Segment*) const;  // absolute and scaled (ns units)
+
+  struct TrackPosition {
+    long long m_track;
+    long long m_pos;  // of cluster
+    long long m_block;
+    // codec_state  //defaults to 0
+    // reference = clusters containing req'd referenced blocks
+    //  reftime = timecode of the referenced block
+
+    bool Parse(IMkvReader*, long long, long long);
+  };
+
+  const TrackPosition* Find(const Track*) const;
+
+ private:
+  const long m_index;
+  long long m_timecode;
+  TrackPosition* m_track_positions;
+  size_t m_track_positions_count;
+};
+
+class Cues {
+  friend class Segment;
+
+  Cues(Segment*, long long start, long long size, long long element_start, long long element_size);
+  ~Cues();
+
+  Cues(const Cues&);
+  Cues& operator=(const Cues&);
+
+ public:
+  Segment* const m_pSegment;
+  const long long m_start;
+  const long long m_size;
+  const long long m_element_start;
+  const long long m_element_size;
+
+  bool Find(  // lower bound of time_ns
+      long long time_ns, const Track*, const CuePoint*&, const CuePoint::TrackPosition*&) const;
+
+  const CuePoint* GetFirst() const;
+  const CuePoint* GetLast() const;
+  const CuePoint* GetNext(const CuePoint*) const;
+
+  const BlockEntry* GetBlock(const CuePoint*, const CuePoint::TrackPosition*) const;
+
+  bool LoadCuePoint() const;
+  long GetCount() const;  // loaded only
+  // long GetTotal() const;  //loaded + preloaded
+  bool DoneParsing() const;
+
+ private:
+  bool Init() const;
+  bool PreloadCuePoint(long&, long long) const;
+
+  mutable CuePoint** m_cue_points;
+  mutable long m_count;
+  mutable long m_preload_count;
+  mutable long long m_pos;
+};
+
+class Cluster {
+  friend class Segment;
+
+  Cluster(const Cluster&);
+  Cluster& operator=(const Cluster&);
+
+ public:
+  Segment* const m_pSegment;
+
+ public:
+  static Cluster* Create(Segment*,
+                         long index,      // index in segment
+                         long long off);  // offset relative to segment
+  // long long element_size);
+
+  Cluster();  // EndOfStream
+  ~Cluster();
+
+  bool EOS() const;
+
+  long long GetTimeCode() const;   // absolute, but not scaled
+  long long GetTime() const;       // absolute, and scaled (nanosecond units)
+  long long GetFirstTime() const;  // time (ns) of first (earliest) block
+  long long GetLastTime() const;   // time (ns) of last (latest) block
+
+  long GetFirst(const BlockEntry*&) const;
+  long GetLast(const BlockEntry*&) const;
+  long GetNext(const BlockEntry* curr, const BlockEntry*& next) const;
+
+  const BlockEntry* GetEntry(const Track*, long long ns = -1) const;
+  const BlockEntry* GetEntry(const CuePoint&, const CuePoint::TrackPosition&) const;
+  // const BlockEntry* GetMaxKey(const VideoTrack*) const;
+
+  //    static bool HasBlockEntries(const Segment*, long long);
+
+  static long HasBlockEntries(const Segment*, long long idoff, long long& pos, long& size);
+
+  long GetEntryCount() const;
+
+  long Load(long long& pos, long& size) const;
+
+  long Parse(long long& pos, long& size) const;
+  long GetEntry(long index, const mkvparser::BlockEntry*&) const;
+
+ protected:
+  Cluster(Segment*, long index, long long element_start);
+  // long long element_size);
+
+ public:
+  const long long m_element_start;
+  long long GetPosition() const;  // offset relative to segment
+
+  long GetIndex() const;
+  long long GetElementSize() const;
+  // long long GetPayloadSize() const;
+
+  // long long Unparsed() const;
+
+ private:
+  long m_index;
+  mutable long long m_pos;
+  // mutable long long m_size;
+  mutable long long m_element_size;
+  mutable long long m_timecode;
+  mutable BlockEntry** m_entries;
+  mutable long m_entries_size;
+  mutable long m_entries_count;
+
+  long ParseSimpleBlock(long long, long long&, long&);
+  long ParseBlockGroup(long long, long long&, long&);
+
+  long CreateBlock(long long id, long long pos, long long size, long long discard_padding);
+  long CreateBlockGroup(long long start_offset, long long size, long long discard_padding);
+  long CreateSimpleBlock(long long, long long);
+};
+
+class Segment {
+  friend class Cues;
+  friend class Track;
+  friend class VideoTrack;
+
+  Segment(const Segment&);
+  Segment& operator=(const Segment&);
+
+ private:
+  Segment(IMkvReader*, long long elem_start,
+          // long long elem_size,
+          long long pos, long long size);
+
+ public:
+  IMkvReader* const m_pReader;
+  const long long m_element_start;
+  // const long long m_element_size;
+  const long long m_start;  // posn of segment payload
+  const long long m_size;   // size of segment payload
+  Cluster m_eos;            // TODO: make private?
+
+  static long long CreateInstance(IMkvReader*, long long, Segment*&);
+  ~Segment();
+
+  long Load();  // loads headers and all clusters
+
+  // for incremental loading
+  // long long Unparsed() const;
+  bool DoneParsing() const;
+  long long ParseHeaders();  // stops when first cluster is found
+  // long FindNextCluster(long long& pos, long& size) const;
+  long LoadCluster(long long& pos, long& size);  // load one cluster
+  long LoadCluster();
+
+  long ParseNext(const Cluster* pCurr, const Cluster*& pNext, long long& pos, long& size);
+
+  const SeekHead* GetSeekHead() const;
+  const Tracks* GetTracks() const;
+  const SegmentInfo* GetInfo() const;
+  const Cues* GetCues() const;
+  const Chapters* GetChapters() const;
+  const Tags* GetTags() const;
+
+  long long GetDuration() const;
+
+  unsigned long GetCount() const;
+  const Cluster* GetFirst() const;
+  const Cluster* GetLast() const;
+  const Cluster* GetNext(const Cluster*);
+
+  const Cluster* FindCluster(long long time_nanoseconds) const;
+  // const BlockEntry* Seek(long long time_nanoseconds, const Track*) const;
+
+  const Cluster* FindOrPreloadCluster(long long pos);
+
+  long ParseCues(long long cues_off,  // offset relative to start of segment
+                 long long& parse_pos, long& parse_len);
+
+ private:
+  long long m_pos;  // absolute file posn; what has been consumed so far
+  Cluster* m_pUnknownSize;
+
+  SeekHead* m_pSeekHead;
+  SegmentInfo* m_pInfo;
+  Tracks* m_pTracks;
+  Cues* m_pCues;
+  Chapters* m_pChapters;
+  Tags* m_pTags;
+  Cluster** m_clusters;
+  long m_clusterCount;         // number of entries for which m_index >= 0
+  long m_clusterPreloadCount;  // number of entries for which m_index < 0
+  long m_clusterSize;          // array size
+
+  long DoLoadCluster(long long&, long&);
+  long DoLoadClusterUnknownSize(long long&, long&);
+  long DoParseNext(const Cluster*&, long long&, long&);
+
+  bool AppendCluster(Cluster*);
+  bool PreloadCluster(Cluster*, ptrdiff_t);
+
+  // void ParseSeekHead(long long pos, long long size);
+  // void ParseSeekEntry(long long pos, long long size);
+  // void ParseCues(long long);
+
+  const BlockEntry* GetBlock(const CuePoint&, const CuePoint::TrackPosition&);
+};
+
+}  // namespace mkvparser
+
+inline long mkvparser::Segment::LoadCluster() {
+  long long pos;
+  long size;
+
+  return LoadCluster(pos, size);
+}
+
+#endif  // MKVPARSER_MKVPARSER_H_

diff --git a/third_party/libwebm/mkvparser/mkvreader.cc b/third_party/libwebm/mkvparser/mkvreader.cc
new file mode 100644
index 0000000..e7675e2
--- /dev/null
+++ b/third_party/libwebm/mkvparser/mkvreader.cc

@@ -0,0 +1,114 @@
+// Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+//
+// Use of this source code is governed by a BSD-style license
+// that can be found in the LICENSE file in the root of the source
+// tree. An additional intellectual property rights grant can be found
+// in the file PATENTS.  All contributing project authors may
+// be found in the AUTHORS file in the root of the source tree.
+#include "mkvparser/mkvreader.h"
+
+#include <sys/types.h>
+
+#include <cassert>
+
+namespace mkvparser {
+
+MkvReader::MkvReader() : m_file(NULL), reader_owns_file_(true) {}
+
+MkvReader::MkvReader(FILE* fp) : m_file(fp), reader_owns_file_(false) { GetFileSize(); }
+
+MkvReader::~MkvReader() {
+  if (reader_owns_file_) Close();
+  m_file = NULL;
+}
+
+int MkvReader::Open(const char* fileName) {
+  if (fileName == NULL) return -1;
+
+  if (m_file) return -1;
+
+#ifdef _MSC_VER
+  const errno_t e = fopen_s(&m_file, fileName, "rb");
+
+  if (e) return -1;  // error
+#else
+  m_file = fopen(fileName, "rb");
+
+  if (m_file == NULL) return -1;
+#endif
+  return !GetFileSize();
+}
+
+bool MkvReader::GetFileSize() {
+  if (m_file == NULL) return false;
+#ifdef _MSC_VER
+  int status = _fseeki64(m_file, 0L, SEEK_END);
+
+  if (status) return false;  // error
+
+  m_length = _ftelli64(m_file);
+#else
+  fseek(m_file, 0L, SEEK_END);
+  m_length = ftell(m_file);
+#endif
+  assert(m_length >= 0);
+
+  if (m_length < 0) return false;
+
+#ifdef _MSC_VER
+  status = _fseeki64(m_file, 0L, SEEK_SET);
+
+  if (status) return false;  // error
+#else
+  fseek(m_file, 0L, SEEK_SET);
+#endif
+
+  return true;
+}
+
+void MkvReader::Close() {
+  if (m_file != NULL) {
+    fclose(m_file);
+    m_file = NULL;
+  }
+}
+
+int MkvReader::Length(long long* total, long long* available) {
+  if (m_file == NULL) return -1;
+
+  if (total) *total = m_length;
+
+  if (available) *available = m_length;
+
+  return 0;
+}
+
+int MkvReader::Read(long long offset, long len, unsigned char* buffer) {
+  if (m_file == NULL) return -1;
+
+  if (offset < 0) return -1;
+
+  if (len < 0) return -1;
+
+  if (len == 0) return 0;
+
+  if (offset >= m_length) return -1;
+
+#ifdef _MSC_VER
+  const int status = _fseeki64(m_file, offset, SEEK_SET);
+
+  if (status) return -1;  // error
+#elif defined(_WIN32)
+  fseeko64(m_file, static_cast<off_t>(offset), SEEK_SET);
+#else
+  fseek(m_file, static_cast<off_t>(offset), SEEK_SET);
+#endif
+
+  const size_t size = fread(buffer, 1, len, m_file);
+
+  if (size < size_t(len)) return -1;  // error
+
+  return 0;  // success
+}
+
+}  // namespace mkvparser

diff --git a/third_party/libwebm/mkvparser/mkvreader.h b/third_party/libwebm/mkvparser/mkvreader.h
new file mode 100644
index 0000000..9831ecf
--- /dev/null
+++ b/third_party/libwebm/mkvparser/mkvreader.h

@@ -0,0 +1,45 @@
+// Copyright (c) 2010 The WebM project authors. All Rights Reserved.
+//
+// Use of this source code is governed by a BSD-style license
+// that can be found in the LICENSE file in the root of the source
+// tree. An additional intellectual property rights grant can be found
+// in the file PATENTS.  All contributing project authors may
+// be found in the AUTHORS file in the root of the source tree.
+#ifndef MKVPARSER_MKVREADER_H_
+#define MKVPARSER_MKVREADER_H_
+
+#include <cstdio>
+
+#include "mkvparser/mkvparser.h"
+
+namespace mkvparser {
+
+class MkvReader : public IMkvReader {
+ public:
+  MkvReader();
+  explicit MkvReader(FILE* fp);
+  virtual ~MkvReader();
+
+  int Open(const char*);
+  void Close();
+
+  virtual int Read(long long position, long length, unsigned char* buffer);
+  virtual int Length(long long* total, long long* available);
+
+ private:
+  MkvReader(const MkvReader&);
+  MkvReader& operator=(const MkvReader&);
+
+  // Determines the size of the file. This is called either by the constructor
+  // or by the Open function depending on file ownership. Returns true on
+  // success.
+  bool GetFileSize();
+
+  long long m_length;
+  FILE* m_file;
+  bool reader_owns_file_;
+};
+
+}  // namespace mkvparser
+
+#endif  // MKVPARSER_MKVREADER_H_
commit	eb966e45d9fd4561743909aeef007fc368d2e228	[log] [tgz]
author	James Zern <jzern@google.com>	Fri Nov 13 16:44:17 2020 -0800
committer	James Zern <jzern@google.com>	Fri Nov 13 17:00:01 2020 -0800
tree	3225afd842febdd0d01cfd37d824ab24a33a7d33
parent	f1bcb610c1c34b10a9b7844dd5924b067bf2d151 [diff]