Merge changes from libvpx/master by cherry-pick
This commit brings all up-to-date changes from master that are
applicable to nextgenv2. Due to the removal of VP10 code in master,
we had to cherry-pick the following commits to get those changes:
Add default flags for arm64/armv8 builds
Allows building simple targets with sane default flags.
For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
--platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
~/libvpx/configure --target=arm64-linux-gcc --disable-multithread
BUG=webm:1143
vpx_lpf_horizontal_4_sse2: Remove dead load.
Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e
Fail early when android target does not include --sdk-path
Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c
configure: clean up var style and set_all usage
Use quotes whenever possible and {} always for variables.
Replace multiple set_all calls with *able_feature().
Conflicts:
build/make/configure.sh
vp9-svc: Remove some unneeded code/comment.
datarate_test,DatarateTestLarge: normalize bits type
quiets a msvc warning:
conversion from 'const int64_t' to 'size_t', possible loss of data
mips added p6600 cpu support
Removed -funroll-loops
psnr.c: use int64_t for sum of differences
Since the values can be negative.
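(A minimal sketch of the issue, with a hypothetical helper name, not
the psnr.c code: per-pixel differences can be negative, so an unsigned
accumulator such as size_t silently wraps.)

  #include <stdint.h>

  /* Hypothetical helper: the accumulator must be signed 64-bit
   * because a[i] - b[i] can be negative. */
  static int64_t sum_of_diffs(const uint8_t *a, const uint8_t *b, int n) {
    int64_t sum = 0;  /* a size_t here would wrap on a negative total */
    int i;
    for (i = 0; i < n; ++i) sum += a[i] - b[i];
    return sum;
  }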
*.asm: normalize label format
add a trailing ':'; though it's optional with the tools we support, it's
more common to use it to mark a label. This also quiets the
orphan-labels warning with nasm/yasm.
BUG=b/29583530
Prevent negative variance
Due to rounding, hbd variance may become negative. This commit adds a
check and clamps negative values to 0.
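(The guard is roughly the following -- a sketch built on the usual
variance identity, not the exact libvpx code:)

  #include <stdint.h>

  /* Sketch: var = sse - sum^2 / n. With high bit depth inputs the
   * rounding applied to sse and sum can push the result below zero,
   * which would wrap to a huge value in the unsigned return type,
   * so clamp to 0 first. */
  static uint32_t variance_clamped(int64_t sse, int64_t sum, int n) {
    const int64_t var = sse - (sum * sum) / n;
    return var < 0 ? 0 : (uint32_t)var;
  }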
configure: remove old visual studio support (<2010)
BUG=b/29583530
Conflicts:
configure
configure: restore vs_version variable
inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)
this prevents an empty CONFIG_VS_VERSION and avoids a make failure
Require x86inc.asm
Force enable x86inc.asm when building for x86. Previously there were
compatibility issues so a flag was added to simplify disabling this
code.
The known issues have been resolved and x86inc.asm is the preferred
abstraction layer (over x86_abi_support.asm).
BUG=b:29583530
convolve_test: fix byte offsets in hbd build
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. Offsets should occur
after converting the pointer to the desired type.
+ factorized some common expressions
Conflicts:
test/convolve_test.cc
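(To illustrate the CONVERT_TO_BYTEPTR note above -- a simplified take
on the macro; treat these definitions as assumptions, not the exact
libvpx headers:)

  #include <stdint.h>

  /* Simplified model of the high bit depth pointer convention. */
  #define BAD_CONVERT_TO_BYTEPTR(x) ((uint8_t *)((uintptr_t)x >> 1))
  #define CONVERT_TO_BYTEPTR(x) ((uint8_t *)(((uintptr_t)(x)) >> 1))

  /* BAD_CONVERT_TO_BYTEPTR(p + i) expands so the cast binds to p alone:
   * (uint8_t *)(((uintptr_t)p + i) >> 1) -- i is added in raw bytes,
   * not uint16_t elements. Parenthesizing (x) fixes the expansion, and
   * applying the offset after the conversion avoids the hazard: */
  static uint8_t *pixel_ptr(uint16_t *p, int i) {
    return CONVERT_TO_BYTEPTR(p) + i;  /* offset after conversion */
  }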
vpx_dsp: remove x86inc.asm distinction
BUG=b:29583530
Conflicts:
vpx_dsp/vpx_dsp.mk
vpx_dsp/vpx_dsp_rtcd_defs.pl
vpx_dsp/x86/highbd_variance_sse2.c
vpx_dsp/x86/variance_sse2.c
test: remove x86inc.asm distinction
BUG=b:29583530
Conflicts:
test/vp9_subtract_test.cc
configure: remove x86inc.asm distinction
BUG=b:29583530
Change-Id: I59a1192142e89a6a36b906f65a491a734e603617
Update vpx subpixel 1d filter ssse3 asm
Speed tests show the new vertical filters have a degradation on Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical filter code is activated. For now, simply activate the
code that shows no degradation on Celeron. Later there should be two
sets of vertical filter ssse3 functions, with a jump table choosing
between them based on CPU type.
improve vpx_filter_block1d* by replacing paddsw+psrlw with pmulhrsw
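(The arithmetic behind that swap, as an illustrative sketch rather than
the actual filter kernels: pmulhrsw computes (a * b + (1 << 14)) >> 15,
so multiplying a 7-bit-precision sum by 256 performs the rounding add
and the shift in one instruction.)

  #include <tmmintrin.h>  /* SSSE3 */

  /* (sum + 64) >> 7, two ways; both assume the filter sums are
   * non-negative, as they are after the rounding add. */
  static __m128i round_shift_old(__m128i sum) {
    return _mm_srli_epi16(_mm_adds_epi16(sum, _mm_set1_epi16(64)), 7);
  }

  static __m128i round_shift_new(__m128i sum) {
    /* (sum * 256 + (1 << 14)) >> 15 == (sum + 64) >> 7 */
    return _mm_mulhrs_epi16(sum, _mm_set1_epi16(1 << 8));
  }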
Make set_reference control API work in VP9
Moved the API patch from NextGenv2 and included an example. To try it,
run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
Conflicts:
examples.mk
examples/vpx_cx_set_ref.c
test/cx_set_ref.sh
vp9/decoder/vp9_decoder.c
deblock filter: moved from vp8 code branch
The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.
vpx_thread.[hc]: update webp source reference
+ drop the blob hash; the updated reference will be noted in the
commit message
BUG=b/29583578
vpx_thread: use native windows cond var if available
BUG=b/29583578
original webp change:
commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 19:49:58 2015 -0800
thread: use native windows cond var if available
Vista / Server 2008 and up. No speed difference observed.
100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vpx_thread: use InitializeCriticalSectionEx if available
BUG=b/29583578
original webp change:
commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:38:46 2015 -0800
thread: use InitializeCriticalSectionEx if available
Windows Vista / Server 2008 and up
100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vpx_thread: use WaitForSingleObjectEx if available
BUG=b/29583578
original webp change:
commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:40:26 2015 -0800
thread: use WaitForSingleObjectEx if available
Windows XP and up
100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vpx_thread: use CreateThread for windows phone
BUG=b/29583578
original webp change:
commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:41:26 2015 -0800
thread: use CreateThread for windows phone
_beginthreadex is unavailable for winrt/uwp
Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d
100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vp9_postproc.c missing extern.
BUG=webm:1256
deblock: missing const on extern const.
postproc - move filling of noise buffer to vpx_dsp.
Fix encoder crashes for odd size input
clean-up vp9_intrapred_test
remove tuple and the overkill VP9IntraPredBase class.
postproc: noise style fixes.
gtest-all.cc: quiet an unused variable warning
under windows / mingw builds
vp9_intrapred_test: follow-up cleanup
address a few comments from ce050afaf3e288895c3bee4160336e2d2133b6ea
Change-Id: I3eece7efa9335f4210303993ef6c1857ad5c29c8
diff --git a/build/make/configure.sh b/build/make/configure.sh
index ee887ab..86afb88 100644
--- a/build/make/configure.sh
+++ b/build/make/configure.sh
@@ -186,24 +186,6 @@
# Boolean Manipulation Functions
#
-enable_codec(){
- enabled $1 || echo " enabling $1"
- set_all yes $1
-
- is_in $1 vp8 vp9 vp10 && \
- set_all yes $1_encoder && \
- set_all yes $1_decoder
-}
-
-disable_codec(){
- disabled $1 || echo " disabling $1"
- set_all no $1
-
- is_in $1 vp8 vp9 vp10 && \
- set_all no $1_encoder && \
- set_all no $1_decoder
-}
-
enable_feature(){
set_all yes $*
}
@@ -220,6 +202,20 @@
eval test "x\$$1" = "xno"
}
+enable_codec(){
+ enabled "${1}" || echo " enabling ${1}"
+ enable_feature "${1}"
+
+ is_in "${1}" vp8 vp9 vp10 && enable_feature "${1}_encoder" "${1}_decoder"
+}
+
+disable_codec(){
+ disabled "${1}" || echo " disabling ${1}"
+ disable_feature "${1}"
+
+ is_in "${1}" vp8 vp9 vp10 && disable_feature "${1}_encoder" "${1}_decoder"
+}
+
# Iterates through positional parameters, checks to confirm the parameter has
# not been explicitly (force) disabled, and enables the setting controlled by
# the parameter when the setting is not disabled.
@@ -945,6 +941,9 @@
check_add_cflags -mfpu=neon #-ftree-vectorize
check_add_asflags -mfpu=neon
fi
+ elif [ ${tgt_isa} = "arm64" ] || [ ${tgt_isa} = "armv8" ]; then
+ check_add_cflags -march=armv8-a
+ check_add_asflags -march=armv8-a
else
check_add_cflags -march=${tgt_isa}
check_add_asflags -march=${tgt_isa}
@@ -1012,6 +1011,10 @@
;;
android*)
+ if [ -z "${sdk_path}" ]; then
+ die "Must specify --sdk-path for Android builds."
+ fi
+
SDK_PATH=${sdk_path}
COMPILER_LOCATION=`find "${SDK_PATH}" \
-name "arm-linux-androideabi-gcc*" -print -quit`
@@ -1150,13 +1153,13 @@
if [ -n "${tune_cpu}" ]; then
case ${tune_cpu} in
p5600)
- check_add_cflags -mips32r5 -funroll-loops -mload-store-pairs
+ check_add_cflags -mips32r5 -mload-store-pairs
check_add_cflags -msched-weight -mhard-float -mfp64
check_add_asflags -mips32r5 -mhard-float -mfp64
check_add_ldflags -mfp64
;;
- i6400)
- check_add_cflags -mips64r6 -mabi=64 -funroll-loops -msched-weight
+ i6400|p6600)
+ check_add_cflags -mips64r6 -mabi=64 -msched-weight
check_add_cflags -mload-store-pairs -mhard-float -mfp64
check_add_asflags -mips64r6 -mabi=64 -mhard-float -mfp64
check_add_ldflags -mips64r6 -mabi=64 -mfp64
@@ -1393,10 +1396,6 @@
fi
fi
- if [ "${tgt_isa}" = "x86_64" ] || [ "${tgt_isa}" = "x86" ]; then
- soft_enable use_x86inc
- fi
-
# Position Independent Code (PIC) support, for building relocatable
# shared objects
enabled gcc && enabled pic && check_add_cflags -fPIC
diff --git a/configure b/configure
index ae9bb5d..cf6a7c3 100755
--- a/configure
+++ b/configure
@@ -98,11 +98,11 @@
# all_platforms is a list of all supported target platforms. Maintain
# alphabetically by architecture, generic-gnu last.
+all_platforms="${all_platforms} arm64-darwin-gcc"
+all_platforms="${all_platforms} arm64-linux-gcc"
all_platforms="${all_platforms} armv6-linux-rvct"
all_platforms="${all_platforms} armv6-linux-gcc"
all_platforms="${all_platforms} armv6-none-rvct"
-all_platforms="${all_platforms} arm64-darwin-gcc"
-all_platforms="${all_platforms} arm64-linux-gcc"
all_platforms="${all_platforms} armv7-android-gcc" #neon Cortex-A8
all_platforms="${all_platforms} armv7-darwin-gcc" #neon Cortex-A8
all_platforms="${all_platforms} armv7-linux-rvct" #neon Cortex-A8
@@ -112,6 +112,7 @@
all_platforms="${all_platforms} armv7-win32-vs12"
all_platforms="${all_platforms} armv7-win32-vs14"
all_platforms="${all_platforms} armv7s-darwin-gcc"
+all_platforms="${all_platforms} armv8-linux-gcc"
all_platforms="${all_platforms} mips32-linux-gcc"
all_platforms="${all_platforms} mips64-linux-gcc"
all_platforms="${all_platforms} sparc-solaris-gcc"
@@ -293,7 +294,6 @@
install_bins
install_libs
install_srcs
- use_x86inc
debug
gprof
gcov
@@ -355,7 +355,6 @@
gprof
gcov
pic
- use_x86inc
optimizations
ccache
runtime_cpu_detect
diff --git a/test/add_noise_test.cc b/test/add_noise_test.cc
index e9945c4..35aaadf 100644
--- a/test/add_noise_test.cc
+++ b/test/add_noise_test.cc
@@ -13,6 +13,7 @@
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "./vpx_dsp_rtcd.h"
#include "vpx/vpx_integer.h"
+#include "vpx_dsp/postproc.h"
#include "vpx_mem/vpx_mem.h"
namespace {
@@ -40,50 +41,6 @@
return sqrt(v);
}
-// TODO(jimbankoski): The following 2 functions are duplicated in each codec.
-// For now the vp9 one has been copied into the test as is. We should normalize
-// these in vpx_dsp and not have 3 copies of these unless there is different
-// noise we add for each codec.
-
-double gaussian(double sigma, double mu, double x) {
- return 1 / (sigma * sqrt(2.0 * 3.14159265)) *
- (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma)));
-}
-
-int setup_noise(int size_noise, char *noise) {
- char char_dist[300];
- const int ai = 4;
- const int qi = 24;
- const double sigma = ai + .5 + .6 * (63 - qi) / 63.0;
-
- /* set up a lookup table of 256 entries that matches
- * a gaussian distribution with sigma determined by q.
- */
- int next = 0;
-
- for (int i = -32; i < 32; i++) {
- int a_i = (int) (0.5 + 256 * gaussian(sigma, 0, i));
-
- if (a_i) {
- for (int j = 0; j < a_i; j++) {
- char_dist[next + j] = (char)(i);
- }
-
- next = next + a_i;
- }
- }
-
- for (; next < 256; next++)
- char_dist[next] = 0;
-
- for (int i = 0; i < size_noise; i++) {
- noise[i] = char_dist[rand() & 0xff]; // NOLINT
- }
-
- // Returns the most negative value in distribution.
- return char_dist[0];
-}
-
TEST_P(AddNoiseTest, CheckNoiseAdded) {
DECLARE_ALIGNED(16, char, blackclamp[16]);
DECLARE_ALIGNED(16, char, whiteclamp[16]);
@@ -92,12 +49,12 @@
const int height = 64;
const int image_size = width * height;
char noise[3072];
+ const int clamp = vpx_setup_noise(4.4, sizeof(noise), noise);
- const int clamp = setup_noise(3072, noise);
for (int i = 0; i < 16; i++) {
- blackclamp[i] = -clamp;
- whiteclamp[i] = -clamp;
- bothclamp[i] = -2 * clamp;
+ blackclamp[i] = clamp;
+ whiteclamp[i] = clamp;
+ bothclamp[i] = 2 * clamp;
}
uint8_t *const s = reinterpret_cast<uint8_t *>(vpx_calloc(image_size, 1));
@@ -127,7 +84,7 @@
// Check to make sure don't roll over.
for (int i = 0; i < image_size; ++i) {
- EXPECT_GT((int)s[i], 10) << "i = " << i;
+ EXPECT_GT(static_cast<int>(s[i]), clamp) << "i = " << i;
}
// Initialize pixels in the image to 0 and check for roll under.
@@ -138,7 +95,7 @@
// Check to make sure don't roll under.
for (int i = 0; i < image_size; ++i) {
- EXPECT_LT((int)s[i], 245) << "i = " << i;
+ EXPECT_LT(static_cast<int>(s[i]), 255 - clamp) << "i = " << i;
}
vpx_free(s);
@@ -153,11 +110,12 @@
const int image_size = width * height;
char noise[3072];
- const int clamp = setup_noise(3072, noise);
+ const int clamp = vpx_setup_noise(4.4, sizeof(noise), noise);
+
for (int i = 0; i < 16; i++) {
- blackclamp[i] = -clamp;
- whiteclamp[i] = -clamp;
- bothclamp[i] = -2 * clamp;
+ blackclamp[i] = clamp;
+ whiteclamp[i] = clamp;
+ bothclamp[i] = 2 * clamp;
}
uint8_t *const s = reinterpret_cast<uint8_t *>(vpx_calloc(image_size, 1));
@@ -175,7 +133,7 @@
width, height, width));
for (int i = 0; i < image_size; ++i) {
- EXPECT_EQ((int)s[i], (int)d[i]) << "i = " << i;
+ EXPECT_EQ(static_cast<int>(s[i]), static_cast<int>(d[i])) << "i = " << i;
}
vpx_free(d);
diff --git a/test/convolve_test.cc b/test/convolve_test.cc
index 21f185a..70802ec 100644
--- a/test/convolve_test.cc
+++ b/test/convolve_test.cc
@@ -453,7 +453,7 @@
memcpy(output_ref_, output_, kOutputBufferSize);
#if CONFIG_VP9_HIGHBITDEPTH
memcpy(output16_ref_, output16_,
- kOutputBufferSize * sizeof(*output16_ref_));
+ kOutputBufferSize * sizeof(output16_ref_[0]));
#endif
}
@@ -465,41 +465,41 @@
}
uint8_t *input() const {
- const int index = BorderTop() * kOuterBlockSize + BorderLeft();
+ const int offset = BorderTop() * kOuterBlockSize + BorderLeft();
#if CONFIG_VP9_HIGHBITDEPTH
if (UUT_->use_highbd_ == 0) {
- return input_ + index;
+ return input_ + offset;
} else {
- return CONVERT_TO_BYTEPTR(input16_) + index;
+ return CONVERT_TO_BYTEPTR(input16_) + offset;
}
#else
- return input_ + index;
+ return input_ + offset;
#endif
}
uint8_t *output() const {
- const int index = BorderTop() * kOuterBlockSize + BorderLeft();
+ const int offset = BorderTop() * kOuterBlockSize + BorderLeft();
#if CONFIG_VP9_HIGHBITDEPTH
if (UUT_->use_highbd_ == 0) {
- return output_ + index;
+ return output_ + offset;
} else {
- return CONVERT_TO_BYTEPTR(output16_ + index);
+ return CONVERT_TO_BYTEPTR(output16_) + offset;
}
#else
- return output_ + index;
+ return output_ + offset;
#endif
}
uint8_t *output_ref() const {
- const int index = BorderTop() * kOuterBlockSize + BorderLeft();
+ const int offset = BorderTop() * kOuterBlockSize + BorderLeft();
#if CONFIG_VP9_HIGHBITDEPTH
if (UUT_->use_highbd_ == 0) {
- return output_ref_ + index;
+ return output_ref_ + offset;
} else {
- return CONVERT_TO_BYTEPTR(output16_ref_ + index);
+ return CONVERT_TO_BYTEPTR(output16_ref_) + offset;
}
#else
- return output_ref_ + index;
+ return output_ref_ + offset;
#endif
}
@@ -1011,14 +1011,12 @@
w, h, bd); \
}
#if HAVE_SSE2 && ARCH_X86_64
-#if CONFIG_USE_X86INC
WRAP(convolve_copy_sse2, 8)
WRAP(convolve_avg_sse2, 8)
WRAP(convolve_copy_sse2, 10)
WRAP(convolve_avg_sse2, 10)
WRAP(convolve_copy_sse2, 12)
WRAP(convolve_avg_sse2, 12)
-#endif // CONFIG_USE_X86INC
WRAP(convolve8_horiz_sse2, 8)
WRAP(convolve8_avg_horiz_sse2, 8)
WRAP(convolve8_vert_sse2, 8)
@@ -1112,11 +1110,7 @@
#if HAVE_SSE2 && ARCH_X86_64
#if CONFIG_VP9_HIGHBITDEPTH
const ConvolveFunctions convolve8_sse2(
-#if CONFIG_USE_X86INC
wrap_convolve_copy_sse2_8, wrap_convolve_avg_sse2_8,
-#else
- wrap_convolve_copy_c_8, wrap_convolve_avg_c_8,
-#endif // CONFIG_USE_X86INC
wrap_convolve8_horiz_sse2_8, wrap_convolve8_avg_horiz_sse2_8,
wrap_convolve8_vert_sse2_8, wrap_convolve8_avg_vert_sse2_8,
wrap_convolve8_sse2_8, wrap_convolve8_avg_sse2_8,
@@ -1124,11 +1118,7 @@
wrap_convolve8_vert_sse2_8, wrap_convolve8_avg_vert_sse2_8,
wrap_convolve8_sse2_8, wrap_convolve8_avg_sse2_8, 8);
const ConvolveFunctions convolve10_sse2(
-#if CONFIG_USE_X86INC
wrap_convolve_copy_sse2_10, wrap_convolve_avg_sse2_10,
-#else
- wrap_convolve_copy_c_10, wrap_convolve_avg_c_10,
-#endif // CONFIG_USE_X86INC
wrap_convolve8_horiz_sse2_10, wrap_convolve8_avg_horiz_sse2_10,
wrap_convolve8_vert_sse2_10, wrap_convolve8_avg_vert_sse2_10,
wrap_convolve8_sse2_10, wrap_convolve8_avg_sse2_10,
@@ -1136,11 +1126,7 @@
wrap_convolve8_vert_sse2_10, wrap_convolve8_avg_vert_sse2_10,
wrap_convolve8_sse2_10, wrap_convolve8_avg_sse2_10, 10);
const ConvolveFunctions convolve12_sse2(
-#if CONFIG_USE_X86INC
wrap_convolve_copy_sse2_12, wrap_convolve_avg_sse2_12,
-#else
- wrap_convolve_copy_c_12, wrap_convolve_avg_c_12,
-#endif // CONFIG_USE_X86INC
wrap_convolve8_horiz_sse2_12, wrap_convolve8_avg_horiz_sse2_12,
wrap_convolve8_vert_sse2_12, wrap_convolve8_avg_vert_sse2_12,
wrap_convolve8_sse2_12, wrap_convolve8_avg_sse2_12,
@@ -1154,11 +1140,7 @@
};
#else
const ConvolveFunctions convolve8_sse2(
-#if CONFIG_USE_X86INC
vpx_convolve_copy_sse2, vpx_convolve_avg_sse2,
-#else
- vpx_convolve_copy_c, vpx_convolve_avg_c,
-#endif // CONFIG_USE_X86INC
vpx_convolve8_horiz_sse2, vpx_convolve8_avg_horiz_sse2,
vpx_convolve8_vert_sse2, vpx_convolve8_avg_vert_sse2,
vpx_convolve8_sse2, vpx_convolve8_avg_sse2,
diff --git a/test/cx_set_ref.sh b/test/cx_set_ref.sh
index c21894e..b319bbf 100755
--- a/test/cx_set_ref.sh
+++ b/test/cx_set_ref.sh
@@ -1,6 +1,6 @@
#!/bin/sh
##
-## Copyright (c) 2014 The WebM project authors. All Rights Reserved.
+## Copyright (c) 2016 The WebM project authors. All Rights Reserved.
##
## Use of this source code is governed by a BSD-style license
## that can be found in the LICENSE file in the root of the source
diff --git a/test/datarate_test.cc b/test/datarate_test.cc
index 2f1db9c..220cbf3 100644
--- a/test/datarate_test.cc
+++ b/test/datarate_test.cc
@@ -135,7 +135,7 @@
double duration_;
double file_datarate_;
double effective_datarate_;
- size_t bits_in_last_frame_;
+ int64_t bits_in_last_frame_;
int denoiser_on_;
int denoiser_offon_test_;
int denoiser_offon_period_;
diff --git a/test/fdct4x4_test.cc b/test/fdct4x4_test.cc
index f6b6567..236f75e 100644
--- a/test/fdct4x4_test.cc
+++ b/test/fdct4x4_test.cc
@@ -302,7 +302,7 @@
make_tuple(&vp9_fht4x4_c, &vp9_iht4x4_16_add_neon, 3, VPX_BITS_8, 16)));
#endif // HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
-#if CONFIG_USE_X86INC && HAVE_SSE2 && !CONFIG_EMULATE_HARDWARE
+#if HAVE_SSE2 && !CONFIG_EMULATE_HARDWARE
INSTANTIATE_TEST_CASE_P(
SSE2, Trans4x4WHT,
::testing::Values(
diff --git a/test/fdct8x8_test.cc b/test/fdct8x8_test.cc
index 29f2158..083ee66 100644
--- a/test/fdct8x8_test.cc
+++ b/test/fdct8x8_test.cc
@@ -766,7 +766,7 @@
&idct8x8_64_add_12_sse2, 6225, VPX_BITS_12)));
#endif // HAVE_SSE2 && CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
-#if HAVE_SSSE3 && CONFIG_USE_X86INC && ARCH_X86_64 && \
+#if HAVE_SSSE3 && ARCH_X86_64 && \
!CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
INSTANTIATE_TEST_CASE_P(
SSSE3, FwdTrans8x8DCT,
diff --git a/test/hadamard_test.cc b/test/hadamard_test.cc
index 7a5bd5b..b8eec52 100644
--- a/test/hadamard_test.cc
+++ b/test/hadamard_test.cc
@@ -152,10 +152,10 @@
::testing::Values(&vpx_hadamard_8x8_sse2));
#endif // HAVE_SSE2
-#if HAVE_SSSE3 && CONFIG_USE_X86INC && ARCH_X86_64
+#if HAVE_SSSE3 && ARCH_X86_64
INSTANTIATE_TEST_CASE_P(SSSE3, Hadamard8x8Test,
::testing::Values(&vpx_hadamard_8x8_ssse3));
-#endif // HAVE_SSSE3 && CONFIG_USE_X86INC && ARCH_X86_64
+#endif // HAVE_SSSE3 && ARCH_X86_64
#if HAVE_NEON
INSTANTIATE_TEST_CASE_P(NEON, Hadamard8x8Test,
diff --git a/test/partial_idct_test.cc b/test/partial_idct_test.cc
index 6c82412..1efb1a4 100644
--- a/test/partial_idct_test.cc
+++ b/test/partial_idct_test.cc
@@ -295,7 +295,7 @@
TX_4X4, 1)));
#endif
-#if HAVE_SSSE3 && CONFIG_USE_X86INC && ARCH_X86_64 && \
+#if HAVE_SSSE3 && ARCH_X86_64 && \
!CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE
INSTANTIATE_TEST_CASE_P(
SSSE3_64, PartialIDctTest,
diff --git a/test/pp_filter_test.cc b/test/pp_filter_test.cc
index e4688dd..89349e4 100644
--- a/test/pp_filter_test.cc
+++ b/test/pp_filter_test.cc
@@ -11,7 +11,7 @@
#include "test/register_state_check.h"
#include "third_party/googletest/src/include/gtest/gtest.h"
#include "./vpx_config.h"
-#include "./vp8_rtcd.h"
+#include "./vpx_dsp_rtcd.h"
#include "vpx/vpx_integer.h"
#include "vpx_mem/vpx_mem.h"
@@ -25,7 +25,7 @@
namespace {
-class VP8PostProcessingFilterTest
+class VPxPostProcessingFilterTest
: public ::testing::TestWithParam<PostProcFunc> {
public:
virtual void TearDown() {
@@ -33,10 +33,10 @@
}
};
-// Test routine for the VP8 post-processing function
-// vp8_post_proc_down_and_across_mb_row_c.
+// Test routine for the VPx post-processing function
+// vpx_post_proc_down_and_across_mb_row_c.
-TEST_P(VP8PostProcessingFilterTest, FilterOutputCheck) {
+TEST_P(VPxPostProcessingFilterTest, FilterOutputCheck) {
// Size of the underlying data block that will be filtered.
const int block_width = 16;
const int block_height = 16;
@@ -92,7 +92,7 @@
for (int i = 0; i < block_height; ++i) {
for (int j = 0; j < block_width; ++j) {
EXPECT_EQ(expected_data[i], pixel_ptr[j])
- << "VP8PostProcessingFilterTest failed with invalid filter output";
+ << "VPxPostProcessingFilterTest failed with invalid filter output";
}
pixel_ptr += output_stride;
}
@@ -102,17 +102,17 @@
vpx_free(flimits);
};
-INSTANTIATE_TEST_CASE_P(C, VP8PostProcessingFilterTest,
- ::testing::Values(vp8_post_proc_down_and_across_mb_row_c));
+INSTANTIATE_TEST_CASE_P(C, VPxPostProcessingFilterTest,
+ ::testing::Values(vpx_post_proc_down_and_across_mb_row_c));
#if HAVE_SSE2
-INSTANTIATE_TEST_CASE_P(SSE2, VP8PostProcessingFilterTest,
- ::testing::Values(vp8_post_proc_down_and_across_mb_row_sse2));
+INSTANTIATE_TEST_CASE_P(SSE2, VPxPostProcessingFilterTest,
+ ::testing::Values(vpx_post_proc_down_and_across_mb_row_sse2));
#endif
#if HAVE_MSA
-INSTANTIATE_TEST_CASE_P(MSA, VP8PostProcessingFilterTest,
- ::testing::Values(vp8_post_proc_down_and_across_mb_row_msa));
+INSTANTIATE_TEST_CASE_P(MSA, VPxPostProcessingFilterTest,
+ ::testing::Values(vpx_post_proc_down_and_across_mb_row_msa));
#endif
} // namespace
diff --git a/test/sad_test.cc b/test/sad_test.cc
index f277294..36f777d 100644
--- a/test/sad_test.cc
+++ b/test/sad_test.cc
@@ -750,7 +750,6 @@
//------------------------------------------------------------------------------
// x86 functions
#if HAVE_SSE2
-#if CONFIG_USE_X86INC
const SadMxNParam sse2_tests[] = {
#if CONFIG_VP10 && CONFIG_EXT_PARTITION
make_tuple(128, 128, &vpx_sad128x128_sse2, -1),
@@ -927,7 +926,6 @@
#endif // CONFIG_VP9_HIGHBITDEPTH
};
INSTANTIATE_TEST_CASE_P(SSE2, SADx4Test, ::testing::ValuesIn(x4d_sse2_tests));
-#endif // CONFIG_USE_X86INC
#endif // HAVE_SSE2
#if HAVE_SSE3
diff --git a/test/test_intra_pred_speed.cc b/test/test_intra_pred_speed.cc
index 2acf744..8928bf8 100644
--- a/test/test_intra_pred_speed.cc
+++ b/test/test_intra_pred_speed.cc
@@ -187,21 +187,21 @@
vpx_d153_predictor_4x4_c, vpx_d207_predictor_4x4_c,
vpx_d63_predictor_4x4_c, vpx_tm_predictor_4x4_c)
-#if HAVE_SSE2 && CONFIG_USE_X86INC
+#if HAVE_SSE2
INTRA_PRED_TEST(SSE2, TestIntraPred4, vpx_dc_predictor_4x4_sse2,
vpx_dc_left_predictor_4x4_sse2, vpx_dc_top_predictor_4x4_sse2,
vpx_dc_128_predictor_4x4_sse2, vpx_v_predictor_4x4_sse2,
vpx_h_predictor_4x4_sse2, vpx_d45_predictor_4x4_sse2, NULL,
NULL, NULL, vpx_d207_predictor_4x4_sse2, NULL,
vpx_tm_predictor_4x4_sse2)
-#endif // HAVE_SSE2 && CONFIG_USE_X86INC
+#endif // HAVE_SSE2
-#if HAVE_SSSE3 && CONFIG_USE_X86INC
+#if HAVE_SSSE3
INTRA_PRED_TEST(SSSE3, TestIntraPred4, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
vpx_d153_predictor_4x4_ssse3, NULL,
vpx_d63_predictor_4x4_ssse3, NULL)
-#endif // HAVE_SSSE3 && CONFIG_USE_X86INC
+#endif // HAVE_SSSE3
#if HAVE_DSPR2
INTRA_PRED_TEST(DSPR2, TestIntraPred4, vpx_dc_predictor_4x4_dspr2, NULL, NULL,
@@ -237,20 +237,20 @@
vpx_d153_predictor_8x8_c, vpx_d207_predictor_8x8_c,
vpx_d63_predictor_8x8_c, vpx_tm_predictor_8x8_c)
-#if HAVE_SSE2 && CONFIG_USE_X86INC
+#if HAVE_SSE2
INTRA_PRED_TEST(SSE2, TestIntraPred8, vpx_dc_predictor_8x8_sse2,
vpx_dc_left_predictor_8x8_sse2, vpx_dc_top_predictor_8x8_sse2,
vpx_dc_128_predictor_8x8_sse2, vpx_v_predictor_8x8_sse2,
vpx_h_predictor_8x8_sse2, vpx_d45_predictor_8x8_sse2, NULL,
NULL, NULL, NULL, NULL, vpx_tm_predictor_8x8_sse2)
-#endif // HAVE_SSE2 && CONFIG_USE_X86INC
+#endif // HAVE_SSE2
-#if HAVE_SSSE3 && CONFIG_USE_X86INC
+#if HAVE_SSSE3
INTRA_PRED_TEST(SSSE3, TestIntraPred8, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
vpx_d153_predictor_8x8_ssse3, vpx_d207_predictor_8x8_ssse3,
vpx_d63_predictor_8x8_ssse3, NULL)
-#endif // HAVE_SSSE3 && CONFIG_USE_X86INC
+#endif // HAVE_SSSE3
#if HAVE_DSPR2
INTRA_PRED_TEST(DSPR2, TestIntraPred8, vpx_dc_predictor_8x8_dspr2, NULL, NULL,
@@ -286,22 +286,22 @@
vpx_d153_predictor_16x16_c, vpx_d207_predictor_16x16_c,
vpx_d63_predictor_16x16_c, vpx_tm_predictor_16x16_c)
-#if HAVE_SSE2 && CONFIG_USE_X86INC
+#if HAVE_SSE2
INTRA_PRED_TEST(SSE2, TestIntraPred16, vpx_dc_predictor_16x16_sse2,
vpx_dc_left_predictor_16x16_sse2,
vpx_dc_top_predictor_16x16_sse2,
vpx_dc_128_predictor_16x16_sse2, vpx_v_predictor_16x16_sse2,
vpx_h_predictor_16x16_sse2, NULL, NULL, NULL, NULL, NULL, NULL,
vpx_tm_predictor_16x16_sse2)
-#endif // HAVE_SSE2 && CONFIG_USE_X86INC
+#endif // HAVE_SSE2
-#if HAVE_SSSE3 && CONFIG_USE_X86INC
+#if HAVE_SSSE3
INTRA_PRED_TEST(SSSE3, TestIntraPred16, NULL, NULL, NULL, NULL, NULL,
NULL, vpx_d45_predictor_16x16_ssse3,
NULL, NULL, vpx_d153_predictor_16x16_ssse3,
vpx_d207_predictor_16x16_ssse3, vpx_d63_predictor_16x16_ssse3,
NULL)
-#endif // HAVE_SSSE3 && CONFIG_USE_X86INC
+#endif // HAVE_SSSE3
#if HAVE_DSPR2
INTRA_PRED_TEST(DSPR2, TestIntraPred16, vpx_dc_predictor_16x16_dspr2, NULL,
@@ -337,21 +337,21 @@
vpx_d153_predictor_32x32_c, vpx_d207_predictor_32x32_c,
vpx_d63_predictor_32x32_c, vpx_tm_predictor_32x32_c)
-#if HAVE_SSE2 && CONFIG_USE_X86INC
+#if HAVE_SSE2
INTRA_PRED_TEST(SSE2, TestIntraPred32, vpx_dc_predictor_32x32_sse2,
vpx_dc_left_predictor_32x32_sse2,
vpx_dc_top_predictor_32x32_sse2,
vpx_dc_128_predictor_32x32_sse2, vpx_v_predictor_32x32_sse2,
vpx_h_predictor_32x32_sse2, NULL, NULL, NULL, NULL, NULL,
NULL, vpx_tm_predictor_32x32_sse2)
-#endif // HAVE_SSE2 && CONFIG_USE_X86INC
+#endif // HAVE_SSE2
-#if HAVE_SSSE3 && CONFIG_USE_X86INC
+#if HAVE_SSSE3
INTRA_PRED_TEST(SSSE3, TestIntraPred32, NULL, NULL, NULL, NULL, NULL,
NULL, vpx_d45_predictor_32x32_ssse3, NULL, NULL,
vpx_d153_predictor_32x32_ssse3, vpx_d207_predictor_32x32_ssse3,
vpx_d63_predictor_32x32_ssse3, NULL)
-#endif // HAVE_SSSE3 && CONFIG_USE_X86INC
+#endif // HAVE_SSSE3
#if HAVE_NEON
INTRA_PRED_TEST(NEON, TestIntraPred32, vpx_dc_predictor_32x32_neon,
diff --git a/test/variance_test.cc b/test/variance_test.cc
index 7eaed27..6574749 100644
--- a/test/variance_test.cc
+++ b/test/variance_test.cc
@@ -217,7 +217,6 @@
: public ::testing::TestWithParam<tuple<int, int,
VarianceFunctionType, int> > {
public:
- typedef tuple<int, int, VarianceFunctionType, int> ParamType;
virtual void SetUp() {
const tuple<int, int, VarianceFunctionType, int>& params = this->GetParam();
log2width_ = get<0>(params);
@@ -766,77 +765,53 @@
make_tuple(3, 4, &vpx_mse8x16_c),
make_tuple(3, 3, &vpx_mse8x8_c)));
-const VpxVarianceTest::ParamType kArrayVariance_c[] = {
-#if CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(7, 7, &vpx_variance128x128_c, 0),
- make_tuple(7, 6, &vpx_variance128x64_c, 0),
- make_tuple(6, 7, &vpx_variance64x128_c, 0),
-#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(6, 6, &vpx_variance64x64_c, 0),
- make_tuple(6, 5, &vpx_variance64x32_c, 0),
- make_tuple(5, 6, &vpx_variance32x64_c, 0),
- make_tuple(5, 5, &vpx_variance32x32_c, 0),
- make_tuple(5, 4, &vpx_variance32x16_c, 0),
- make_tuple(4, 5, &vpx_variance16x32_c, 0),
- make_tuple(4, 4, &vpx_variance16x16_c, 0),
- make_tuple(4, 3, &vpx_variance16x8_c, 0),
- make_tuple(3, 4, &vpx_variance8x16_c, 0),
- make_tuple(3, 3, &vpx_variance8x8_c, 0),
- make_tuple(3, 2, &vpx_variance8x4_c, 0),
- make_tuple(2, 3, &vpx_variance4x8_c, 0),
- make_tuple(2, 2, &vpx_variance4x4_c, 0)
-};
INSTANTIATE_TEST_CASE_P(
C, VpxVarianceTest,
- ::testing::ValuesIn(kArrayVariance_c));
+ ::testing::Values(make_tuple(6, 6, &vpx_variance64x64_c, 0),
+ make_tuple(6, 5, &vpx_variance64x32_c, 0),
+ make_tuple(5, 6, &vpx_variance32x64_c, 0),
+ make_tuple(5, 5, &vpx_variance32x32_c, 0),
+ make_tuple(5, 4, &vpx_variance32x16_c, 0),
+ make_tuple(4, 5, &vpx_variance16x32_c, 0),
+ make_tuple(4, 4, &vpx_variance16x16_c, 0),
+ make_tuple(4, 3, &vpx_variance16x8_c, 0),
+ make_tuple(3, 4, &vpx_variance8x16_c, 0),
+ make_tuple(3, 3, &vpx_variance8x8_c, 0),
+ make_tuple(3, 2, &vpx_variance8x4_c, 0),
+ make_tuple(2, 3, &vpx_variance4x8_c, 0),
+ make_tuple(2, 2, &vpx_variance4x4_c, 0)));
-const VpxSubpelVarianceTest::ParamType kArraySubpelVariance_c[] = {
-#if CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(7, 7, &vpx_sub_pixel_variance128x128_c, 0),
- make_tuple(7, 6, &vpx_sub_pixel_variance128x64_c, 0),
- make_tuple(6, 7, &vpx_sub_pixel_variance64x128_c, 0),
-#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(6, 6, &vpx_sub_pixel_variance64x64_c, 0),
- make_tuple(6, 5, &vpx_sub_pixel_variance64x32_c, 0),
- make_tuple(5, 6, &vpx_sub_pixel_variance32x64_c, 0),
- make_tuple(5, 5, &vpx_sub_pixel_variance32x32_c, 0),
- make_tuple(5, 4, &vpx_sub_pixel_variance32x16_c, 0),
- make_tuple(4, 5, &vpx_sub_pixel_variance16x32_c, 0),
- make_tuple(4, 4, &vpx_sub_pixel_variance16x16_c, 0),
- make_tuple(4, 3, &vpx_sub_pixel_variance16x8_c, 0),
- make_tuple(3, 4, &vpx_sub_pixel_variance8x16_c, 0),
- make_tuple(3, 3, &vpx_sub_pixel_variance8x8_c, 0),
- make_tuple(3, 2, &vpx_sub_pixel_variance8x4_c, 0),
- make_tuple(2, 3, &vpx_sub_pixel_variance4x8_c, 0),
- make_tuple(2, 2, &vpx_sub_pixel_variance4x4_c, 0)
-};
INSTANTIATE_TEST_CASE_P(
C, VpxSubpelVarianceTest,
- ::testing::ValuesIn(kArraySubpelVariance_c));
+ ::testing::Values(make_tuple(6, 6, &vpx_sub_pixel_variance64x64_c, 0),
+ make_tuple(6, 5, &vpx_sub_pixel_variance64x32_c, 0),
+ make_tuple(5, 6, &vpx_sub_pixel_variance32x64_c, 0),
+ make_tuple(5, 5, &vpx_sub_pixel_variance32x32_c, 0),
+ make_tuple(5, 4, &vpx_sub_pixel_variance32x16_c, 0),
+ make_tuple(4, 5, &vpx_sub_pixel_variance16x32_c, 0),
+ make_tuple(4, 4, &vpx_sub_pixel_variance16x16_c, 0),
+ make_tuple(4, 3, &vpx_sub_pixel_variance16x8_c, 0),
+ make_tuple(3, 4, &vpx_sub_pixel_variance8x16_c, 0),
+ make_tuple(3, 3, &vpx_sub_pixel_variance8x8_c, 0),
+ make_tuple(3, 2, &vpx_sub_pixel_variance8x4_c, 0),
+ make_tuple(2, 3, &vpx_sub_pixel_variance4x8_c, 0),
+ make_tuple(2, 2, &vpx_sub_pixel_variance4x4_c, 0)));
-const VpxSubpelAvgVarianceTest::ParamType kArraySubpelAvgVariance_c[] = {
-#if CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(7, 7, &vpx_sub_pixel_avg_variance128x128_c, 0),
- make_tuple(7, 6, &vpx_sub_pixel_avg_variance128x64_c, 0),
- make_tuple(6, 7, &vpx_sub_pixel_avg_variance64x128_c, 0),
-#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(6, 6, &vpx_sub_pixel_avg_variance64x64_c, 0),
- make_tuple(6, 5, &vpx_sub_pixel_avg_variance64x32_c, 0),
- make_tuple(5, 6, &vpx_sub_pixel_avg_variance32x64_c, 0),
- make_tuple(5, 5, &vpx_sub_pixel_avg_variance32x32_c, 0),
- make_tuple(5, 4, &vpx_sub_pixel_avg_variance32x16_c, 0),
- make_tuple(4, 5, &vpx_sub_pixel_avg_variance16x32_c, 0),
- make_tuple(4, 4, &vpx_sub_pixel_avg_variance16x16_c, 0),
- make_tuple(4, 3, &vpx_sub_pixel_avg_variance16x8_c, 0),
- make_tuple(3, 4, &vpx_sub_pixel_avg_variance8x16_c, 0),
- make_tuple(3, 3, &vpx_sub_pixel_avg_variance8x8_c, 0),
- make_tuple(3, 2, &vpx_sub_pixel_avg_variance8x4_c, 0),
- make_tuple(2, 3, &vpx_sub_pixel_avg_variance4x8_c, 0),
- make_tuple(2, 2, &vpx_sub_pixel_avg_variance4x4_c, 0)
-};
INSTANTIATE_TEST_CASE_P(
C, VpxSubpelAvgVarianceTest,
- ::testing::ValuesIn(kArraySubpelAvgVariance_c));
+ ::testing::Values(make_tuple(6, 6, &vpx_sub_pixel_avg_variance64x64_c, 0),
+ make_tuple(6, 5, &vpx_sub_pixel_avg_variance64x32_c, 0),
+ make_tuple(5, 6, &vpx_sub_pixel_avg_variance32x64_c, 0),
+ make_tuple(5, 5, &vpx_sub_pixel_avg_variance32x32_c, 0),
+ make_tuple(5, 4, &vpx_sub_pixel_avg_variance32x16_c, 0),
+ make_tuple(4, 5, &vpx_sub_pixel_avg_variance16x32_c, 0),
+ make_tuple(4, 4, &vpx_sub_pixel_avg_variance16x16_c, 0),
+ make_tuple(4, 3, &vpx_sub_pixel_avg_variance16x8_c, 0),
+ make_tuple(3, 4, &vpx_sub_pixel_avg_variance8x16_c, 0),
+ make_tuple(3, 3, &vpx_sub_pixel_avg_variance8x8_c, 0),
+ make_tuple(3, 2, &vpx_sub_pixel_avg_variance8x4_c, 0),
+ make_tuple(2, 3, &vpx_sub_pixel_avg_variance4x8_c, 0),
+ make_tuple(2, 2, &vpx_sub_pixel_avg_variance4x4_c, 0)));
#if CONFIG_VP9_HIGHBITDEPTH
typedef MseTest<VarianceMxNFunc> VpxHBDMseTest;
@@ -872,73 +847,70 @@
make_tuple(4, 4, &vpx_highbd_8_mse8x8_c)));
*/
-const VpxHBDVarianceTest::ParamType kArrayHBDVariance_c[] = {
-#if CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(7, 7, &vpx_highbd_12_variance128x128_c, 12),
- make_tuple(7, 6, &vpx_highbd_12_variance128x64_c, 12),
- make_tuple(6, 7, &vpx_highbd_12_variance64x128_c, 12),
-#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(6, 6, &vpx_highbd_12_variance64x64_c, 12),
- make_tuple(6, 5, &vpx_highbd_12_variance64x32_c, 12),
- make_tuple(5, 6, &vpx_highbd_12_variance32x64_c, 12),
- make_tuple(5, 5, &vpx_highbd_12_variance32x32_c, 12),
- make_tuple(5, 4, &vpx_highbd_12_variance32x16_c, 12),
- make_tuple(4, 5, &vpx_highbd_12_variance16x32_c, 12),
- make_tuple(4, 4, &vpx_highbd_12_variance16x16_c, 12),
- make_tuple(4, 3, &vpx_highbd_12_variance16x8_c, 12),
- make_tuple(3, 4, &vpx_highbd_12_variance8x16_c, 12),
- make_tuple(3, 3, &vpx_highbd_12_variance8x8_c, 12),
- make_tuple(3, 2, &vpx_highbd_12_variance8x4_c, 12),
- make_tuple(2, 3, &vpx_highbd_12_variance4x8_c, 12),
- make_tuple(2, 2, &vpx_highbd_12_variance4x4_c, 12),
-#if CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(7, 7, &vpx_highbd_10_variance128x128_c, 10),
- make_tuple(7, 6, &vpx_highbd_10_variance128x64_c, 10),
- make_tuple(6, 7, &vpx_highbd_10_variance64x128_c, 10),
-#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(6, 6, &vpx_highbd_10_variance64x64_c, 10),
- make_tuple(6, 5, &vpx_highbd_10_variance64x32_c, 10),
- make_tuple(5, 6, &vpx_highbd_10_variance32x64_c, 10),
- make_tuple(5, 5, &vpx_highbd_10_variance32x32_c, 10),
- make_tuple(5, 4, &vpx_highbd_10_variance32x16_c, 10),
- make_tuple(4, 5, &vpx_highbd_10_variance16x32_c, 10),
- make_tuple(4, 4, &vpx_highbd_10_variance16x16_c, 10),
- make_tuple(4, 3, &vpx_highbd_10_variance16x8_c, 10),
- make_tuple(3, 4, &vpx_highbd_10_variance8x16_c, 10),
- make_tuple(3, 3, &vpx_highbd_10_variance8x8_c, 10),
- make_tuple(3, 2, &vpx_highbd_10_variance8x4_c, 10),
- make_tuple(2, 3, &vpx_highbd_10_variance4x8_c, 10),
- make_tuple(2, 2, &vpx_highbd_10_variance4x4_c, 10),
-#if CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(7, 7, &vpx_highbd_8_variance128x128_c, 8),
- make_tuple(7, 6, &vpx_highbd_8_variance128x64_c, 8),
- make_tuple(6, 7, &vpx_highbd_8_variance64x128_c, 8),
-#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION
- make_tuple(6, 6, &vpx_highbd_8_variance64x64_c, 8),
- make_tuple(6, 5, &vpx_highbd_8_variance64x32_c, 8),
- make_tuple(5, 6, &vpx_highbd_8_variance32x64_c, 8),
- make_tuple(5, 5, &vpx_highbd_8_variance32x32_c, 8),
- make_tuple(5, 4, &vpx_highbd_8_variance32x16_c, 8),
- make_tuple(4, 5, &vpx_highbd_8_variance16x32_c, 8),
- make_tuple(4, 4, &vpx_highbd_8_variance16x16_c, 8),
- make_tuple(4, 3, &vpx_highbd_8_variance16x8_c, 8),
- make_tuple(3, 4, &vpx_highbd_8_variance8x16_c, 8),
- make_tuple(3, 3, &vpx_highbd_8_variance8x8_c, 8),
- make_tuple(3, 2, &vpx_highbd_8_variance8x4_c, 8),
- make_tuple(2, 3, &vpx_highbd_8_variance4x8_c, 8),
- make_tuple(2, 2, &vpx_highbd_8_variance4x4_c, 8)
-};
INSTANTIATE_TEST_CASE_P(
C, VpxHBDVarianceTest,
- ::testing::ValuesIn(kArrayHBDVariance_c));
+ ::testing::Values(
+#if CONFIG_VP10 && CONFIG_EXT_PARTITION
+ make_tuple(7, 7, &vpx_highbd_12_variance128x128_c, 12),
+ make_tuple(7, 6, &vpx_highbd_12_variance128x64_c, 12),
+ make_tuple(6, 7, &vpx_highbd_12_variance64x128_c, 12),
+#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION
+ make_tuple(6, 6, &vpx_highbd_12_variance64x64_c, 12),
+ make_tuple(6, 5, &vpx_highbd_12_variance64x32_c, 12),
+ make_tuple(5, 6, &vpx_highbd_12_variance32x64_c, 12),
+ make_tuple(5, 5, &vpx_highbd_12_variance32x32_c, 12),
+ make_tuple(5, 4, &vpx_highbd_12_variance32x16_c, 12),
+ make_tuple(4, 5, &vpx_highbd_12_variance16x32_c, 12),
+ make_tuple(4, 4, &vpx_highbd_12_variance16x16_c, 12),
+ make_tuple(4, 3, &vpx_highbd_12_variance16x8_c, 12),
+ make_tuple(3, 4, &vpx_highbd_12_variance8x16_c, 12),
+ make_tuple(3, 3, &vpx_highbd_12_variance8x8_c, 12),
+ make_tuple(3, 2, &vpx_highbd_12_variance8x4_c, 12),
+ make_tuple(2, 3, &vpx_highbd_12_variance4x8_c, 12),
+ make_tuple(2, 2, &vpx_highbd_12_variance4x4_c, 12),
+#if CONFIG_VP10 && CONFIG_EXT_PARTITION
+ make_tuple(7, 7, &vpx_highbd_10_variance128x128_c, 10),
+ make_tuple(7, 6, &vpx_highbd_10_variance128x64_c, 10),
+ make_tuple(6, 7, &vpx_highbd_10_variance64x128_c, 10),
+#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION
+ make_tuple(6, 6, &vpx_highbd_10_variance64x64_c, 10),
+ make_tuple(6, 5, &vpx_highbd_10_variance64x32_c, 10),
+ make_tuple(5, 6, &vpx_highbd_10_variance32x64_c, 10),
+ make_tuple(5, 5, &vpx_highbd_10_variance32x32_c, 10),
+ make_tuple(5, 4, &vpx_highbd_10_variance32x16_c, 10),
+ make_tuple(4, 5, &vpx_highbd_10_variance16x32_c, 10),
+ make_tuple(4, 4, &vpx_highbd_10_variance16x16_c, 10),
+ make_tuple(4, 3, &vpx_highbd_10_variance16x8_c, 10),
+ make_tuple(3, 4, &vpx_highbd_10_variance8x16_c, 10),
+ make_tuple(3, 3, &vpx_highbd_10_variance8x8_c, 10),
+ make_tuple(3, 2, &vpx_highbd_10_variance8x4_c, 10),
+ make_tuple(2, 3, &vpx_highbd_10_variance4x8_c, 10),
+ make_tuple(2, 2, &vpx_highbd_10_variance4x4_c, 10),
+#if CONFIG_VP10 && CONFIG_EXT_PARTITION
+ make_tuple(7, 7, &vpx_highbd_8_variance128x128_c, 8),
+ make_tuple(7, 6, &vpx_highbd_8_variance128x64_c, 8),
+ make_tuple(6, 7, &vpx_highbd_8_variance64x128_c, 8),
+#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION
+ make_tuple(6, 6, &vpx_highbd_8_variance64x64_c, 8),
+ make_tuple(6, 5, &vpx_highbd_8_variance64x32_c, 8),
+ make_tuple(5, 6, &vpx_highbd_8_variance32x64_c, 8),
+ make_tuple(5, 5, &vpx_highbd_8_variance32x32_c, 8),
+ make_tuple(5, 4, &vpx_highbd_8_variance32x16_c, 8),
+ make_tuple(4, 5, &vpx_highbd_8_variance16x32_c, 8),
+ make_tuple(4, 4, &vpx_highbd_8_variance16x16_c, 8),
+ make_tuple(4, 3, &vpx_highbd_8_variance16x8_c, 8),
+ make_tuple(3, 4, &vpx_highbd_8_variance8x16_c, 8),
+ make_tuple(3, 3, &vpx_highbd_8_variance8x8_c, 8),
+ make_tuple(3, 2, &vpx_highbd_8_variance8x4_c, 8),
+ make_tuple(2, 3, &vpx_highbd_8_variance4x8_c, 8),
+ make_tuple(2, 2, &vpx_highbd_8_variance4x4_c, 8)));
#if HAVE_SSE4_1 && CONFIG_VP9_HIGHBITDEPTH
INSTANTIATE_TEST_CASE_P(
SSE4_1, VpxHBDVarianceTest,
- ::testing::Values(
- make_tuple(2, 2, &vpx_highbd_8_variance4x4_sse4_1, 8),
- make_tuple(2, 2, &vpx_highbd_10_variance4x4_sse4_1, 10),
- make_tuple(2, 2, &vpx_highbd_12_variance4x4_sse4_1, 12)));
+ ::testing::Values(make_tuple(2, 2, &vpx_highbd_8_variance4x4_sse4_1, 8),
+ make_tuple(2, 2, &vpx_highbd_10_variance4x4_sse4_1, 10),
+ make_tuple(2, 2, &vpx_highbd_12_variance4x4_sse4_1, 12)));
#endif // HAVE_SSE4_1 && CONFIG_VP9_HIGHBITDEPTH
const VpxHBDSubpelVarianceTest::ParamType kArrayHBDSubpelVariance_c[] = {
@@ -995,7 +967,7 @@
make_tuple(3, 3, &vpx_highbd_12_sub_pixel_variance8x8_c, 12),
make_tuple(3, 2, &vpx_highbd_12_sub_pixel_variance8x4_c, 12),
make_tuple(2, 3, &vpx_highbd_12_sub_pixel_variance4x8_c, 12),
- make_tuple(2, 2, &vpx_highbd_12_sub_pixel_variance4x4_c, 12)
+ make_tuple(2, 2, &vpx_highbd_12_sub_pixel_variance4x4_c, 12),
};
INSTANTIATE_TEST_CASE_P(
C, VpxHBDSubpelVarianceTest,
@@ -1088,7 +1060,6 @@
make_tuple(2, 3, &vpx_variance4x8_sse2, 0),
make_tuple(2, 2, &vpx_variance4x4_sse2, 0)));
-#if CONFIG_USE_X86INC
INSTANTIATE_TEST_CASE_P(
SSE2, VpxSubpelVarianceTest,
::testing::Values(make_tuple(6, 6, &vpx_sub_pixel_variance64x64_sse2, 0),
@@ -1121,7 +1092,6 @@
make_tuple(3, 2, &vpx_sub_pixel_avg_variance8x4_sse2, 0),
make_tuple(2, 3, &vpx_sub_pixel_avg_variance4x8_sse2, 0),
make_tuple(2, 2, &vpx_sub_pixel_avg_variance4x4_sse2, 0)));
-#endif // CONFIG_USE_X86INC
#if HAVE_SSE4_1 && CONFIG_VP9_HIGHBITDEPTH
INSTANTIATE_TEST_CASE_P(
@@ -1190,7 +1160,6 @@
make_tuple(3, 4, &vpx_highbd_8_variance8x16_sse2, 8),
make_tuple(3, 3, &vpx_highbd_8_variance8x8_sse2, 8)));
-#if CONFIG_USE_X86INC
INSTANTIATE_TEST_CASE_P(
SSE2, VpxHBDSubpelVarianceTest,
::testing::Values(
@@ -1264,12 +1233,10 @@
make_tuple(3, 4, &vpx_highbd_8_sub_pixel_avg_variance8x16_sse2, 8),
make_tuple(3, 3, &vpx_highbd_8_sub_pixel_avg_variance8x8_sse2, 8),
make_tuple(3, 2, &vpx_highbd_8_sub_pixel_avg_variance8x4_sse2, 8)));
-#endif // CONFIG_USE_X86INC
#endif // CONFIG_VP9_HIGHBITDEPTH
#endif // HAVE_SSE2
#if HAVE_SSSE3
-#if CONFIG_USE_X86INC
INSTANTIATE_TEST_CASE_P(
SSSE3, VpxSubpelVarianceTest,
::testing::Values(make_tuple(6, 6, &vpx_sub_pixel_variance64x64_ssse3, 0),
@@ -1302,7 +1269,6 @@
make_tuple(3, 2, &vpx_sub_pixel_avg_variance8x4_ssse3, 0),
make_tuple(2, 3, &vpx_sub_pixel_avg_variance4x8_ssse3, 0),
make_tuple(2, 2, &vpx_sub_pixel_avg_variance4x4_ssse3, 0)));
-#endif // CONFIG_USE_X86INC
#endif // HAVE_SSSE3
#if HAVE_AVX2
diff --git a/test/vp9_error_block_test.cc b/test/vp9_error_block_test.cc
index 23a249e..341cc19 100644
--- a/test/vp9_error_block_test.cc
+++ b/test/vp9_error_block_test.cc
@@ -157,9 +157,9 @@
<< "First failed at test case " << first_failure;
}
+#if HAVE_SSE2 || HAVE_AVX
using std::tr1::make_tuple;
-#if CONFIG_USE_X86INC
int64_t wrap_vp9_highbd_block_error_8bit_c(const tran_low_t *coeff,
const tran_low_t *dqcoeff,
intptr_t block_size,
@@ -167,6 +167,7 @@
EXPECT_EQ(8, bps);
return vp9_highbd_block_error_8bit_c(coeff, dqcoeff, block_size, ssz);
}
+#endif // HAVE_SSE2 || HAVE_AVX
#if HAVE_SSE2
int64_t wrap_vp9_highbd_block_error_8bit_sse2(const tran_low_t *coeff,
@@ -206,6 +207,5 @@
&wrap_vp9_highbd_block_error_8bit_c, VPX_BITS_8)));
#endif // HAVE_AVX
-#endif // CONFIG_USE_X86INC
#endif // CONFIG_VP9_HIGHBITDEPTH
} // namespace
diff --git a/test/vp9_intrapred_test.cc b/test/vp9_intrapred_test.cc
index 416f3c3..cd87b77 100644
--- a/test/vp9_intrapred_test.cc
+++ b/test/vp9_intrapred_test.cc
@@ -28,45 +28,43 @@
const int count_test_block = 100000;
-// Base class for VP9 intra prediction tests.
-class VP9IntraPredBase {
+typedef void (*IntraPred)(uint16_t* dst, ptrdiff_t stride,
+ const uint16_t* above, const uint16_t* left,
+ int bps);
+
+struct IntraPredFunc {
+ IntraPredFunc(IntraPred pred = NULL, IntraPred ref = NULL,
+ int block_size_value = 0, int bit_depth_value = 0)
+ : pred_fn(pred), ref_fn(ref),
+ block_size(block_size_value), bit_depth(bit_depth_value) {}
+
+ IntraPred pred_fn;
+ IntraPred ref_fn;
+ int block_size;
+ int bit_depth;
+};
+
+class VP9IntraPredTest : public ::testing::TestWithParam<IntraPredFunc> {
public:
- virtual ~VP9IntraPredBase() { libvpx_test::ClearSystemState(); }
-
- protected:
- virtual void Predict() = 0;
-
- void CheckPrediction(int test_case_number, int *error_count) const {
- // For each pixel ensure that the calculated value is the same as reference.
- for (int y = 0; y < block_size_; y++) {
- for (int x = 0; x < block_size_; x++) {
- *error_count += ref_dst_[x + y * stride_] != dst_[x + y * stride_];
- if (*error_count == 1) {
- ASSERT_EQ(ref_dst_[x + y * stride_], dst_[x + y * stride_])
- << " Failed on Test Case Number "<< test_case_number;
- }
- }
- }
- }
-
void RunTest(uint16_t* left_col, uint16_t* above_data,
uint16_t* dst, uint16_t* ref_dst) {
ACMRandom rnd(ACMRandom::DeterministicSeed());
+ const int block_size = params_.block_size;
+ above_row_ = above_data + 16;
left_col_ = left_col;
dst_ = dst;
ref_dst_ = ref_dst;
- above_row_ = above_data + 16;
int error_count = 0;
for (int i = 0; i < count_test_block; ++i) {
// Fill edges with random data, try first with saturated values.
- for (int x = -1; x <= block_size_*2; x++) {
+ for (int x = -1; x <= block_size * 2; x++) {
if (i == 0) {
above_row_[x] = mask_;
} else {
above_row_[x] = rnd.Rand16() & mask_;
}
}
- for (int y = 0; y < block_size_; y++) {
+ for (int y = 0; y < block_size; y++) {
if (i == 0) {
left_col_[y] = mask_;
} else {
@@ -79,43 +77,42 @@
ASSERT_EQ(0, error_count);
}
- int block_size_;
+ protected:
+ virtual void SetUp() {
+ params_ = GetParam();
+ stride_ = params_.block_size * 3;
+ mask_ = (1 << params_.bit_depth) - 1;
+ }
+
+ void Predict() {
+ const int bit_depth = params_.bit_depth;
+ params_.ref_fn(ref_dst_, stride_, above_row_, left_col_, bit_depth);
+ ASM_REGISTER_STATE_CHECK(params_.pred_fn(dst_, stride_,
+ above_row_, left_col_, bit_depth));
+ }
+
+ void CheckPrediction(int test_case_number, int *error_count) const {
+ // For each pixel ensure that the calculated value is the same as reference.
+ const int block_size = params_.block_size;
+ for (int y = 0; y < block_size; y++) {
+ for (int x = 0; x < block_size; x++) {
+ *error_count += ref_dst_[x + y * stride_] != dst_[x + y * stride_];
+ if (*error_count == 1) {
+ ASSERT_EQ(ref_dst_[x + y * stride_], dst_[x + y * stride_])
+ << " Failed on Test Case Number "<< test_case_number;
+ }
+ }
+ }
+ }
+
uint16_t *above_row_;
uint16_t *left_col_;
uint16_t *dst_;
uint16_t *ref_dst_;
ptrdiff_t stride_;
int mask_;
-};
-typedef void (*intra_pred_fn_t)(
- uint16_t *dst, ptrdiff_t stride, const uint16_t *above,
- const uint16_t *left, int bps);
-typedef std::tr1::tuple<intra_pred_fn_t,
- intra_pred_fn_t, int, int> intra_pred_params_t;
-class VP9IntraPredTest
- : public VP9IntraPredBase,
- public ::testing::TestWithParam<intra_pred_params_t> {
-
- virtual void SetUp() {
- pred_fn_ = GET_PARAM(0);
- ref_fn_ = GET_PARAM(1);
- block_size_ = GET_PARAM(2);
- bit_depth_ = GET_PARAM(3);
- stride_ = block_size_ * 3;
- mask_ = (1 << bit_depth_) - 1;
- }
-
- virtual void Predict() {
- const uint16_t *const_above_row = above_row_;
- const uint16_t *const_left_col = left_col_;
- ref_fn_(ref_dst_, stride_, const_above_row, const_left_col, bit_depth_);
- ASM_REGISTER_STATE_CHECK(pred_fn_(dst_, stride_, const_above_row,
- const_left_col, bit_depth_));
- }
- intra_pred_fn_t pred_fn_;
- intra_pred_fn_t ref_fn_;
- int bit_depth_;
+ IntraPredFunc params_;
};
TEST_P(VP9IntraPredTest, IntraPredTests) {
@@ -127,105 +124,89 @@
RunTest(left_col, above_data, dst, ref_dst);
}
-using std::tr1::make_tuple;
-
#if HAVE_SSE2
#if CONFIG_VP9_HIGHBITDEPTH
-#if CONFIG_USE_X86INC
INSTANTIATE_TEST_CASE_P(SSE2_TO_C_8, VP9IntraPredTest,
- ::testing::Values(
- make_tuple(&vpx_highbd_dc_predictor_32x32_sse2,
- &vpx_highbd_dc_predictor_32x32_c, 32, 8),
- make_tuple(&vpx_highbd_tm_predictor_16x16_sse2,
- &vpx_highbd_tm_predictor_16x16_c, 16, 8),
- make_tuple(&vpx_highbd_tm_predictor_32x32_sse2,
- &vpx_highbd_tm_predictor_32x32_c, 32, 8),
- make_tuple(&vpx_highbd_dc_predictor_4x4_sse2,
- &vpx_highbd_dc_predictor_4x4_c, 4, 8),
- make_tuple(&vpx_highbd_dc_predictor_8x8_sse2,
- &vpx_highbd_dc_predictor_8x8_c, 8, 8),
- make_tuple(&vpx_highbd_dc_predictor_16x16_sse2,
- &vpx_highbd_dc_predictor_16x16_c, 16, 8),
- make_tuple(&vpx_highbd_v_predictor_4x4_sse2,
- &vpx_highbd_v_predictor_4x4_c, 4, 8),
- make_tuple(&vpx_highbd_v_predictor_8x8_sse2,
- &vpx_highbd_v_predictor_8x8_c, 8, 8),
- make_tuple(&vpx_highbd_v_predictor_16x16_sse2,
- &vpx_highbd_v_predictor_16x16_c, 16, 8),
- make_tuple(&vpx_highbd_v_predictor_32x32_sse2,
- &vpx_highbd_v_predictor_32x32_c, 32, 8),
- make_tuple(&vpx_highbd_tm_predictor_4x4_sse2,
- &vpx_highbd_tm_predictor_4x4_c, 4, 8),
- make_tuple(&vpx_highbd_tm_predictor_8x8_sse2,
- &vpx_highbd_tm_predictor_8x8_c, 8, 8)));
+ ::testing::Values(
+ IntraPredFunc(&vpx_highbd_dc_predictor_32x32_sse2,
+ &vpx_highbd_dc_predictor_32x32_c, 32, 8),
+ IntraPredFunc(&vpx_highbd_tm_predictor_16x16_sse2,
+ &vpx_highbd_tm_predictor_16x16_c, 16, 8),
+ IntraPredFunc(&vpx_highbd_tm_predictor_32x32_sse2,
+ &vpx_highbd_tm_predictor_32x32_c, 32, 8),
+ IntraPredFunc(&vpx_highbd_dc_predictor_4x4_sse2,
+ &vpx_highbd_dc_predictor_4x4_c, 4, 8),
+ IntraPredFunc(&vpx_highbd_dc_predictor_8x8_sse2,
+ &vpx_highbd_dc_predictor_8x8_c, 8, 8),
+ IntraPredFunc(&vpx_highbd_dc_predictor_16x16_sse2,
+ &vpx_highbd_dc_predictor_16x16_c, 16, 8),
+ IntraPredFunc(&vpx_highbd_v_predictor_4x4_sse2,
+ &vpx_highbd_v_predictor_4x4_c, 4, 8),
+ IntraPredFunc(&vpx_highbd_v_predictor_8x8_sse2,
+ &vpx_highbd_v_predictor_8x8_c, 8, 8),
+ IntraPredFunc(&vpx_highbd_v_predictor_16x16_sse2,
+ &vpx_highbd_v_predictor_16x16_c, 16, 8),
+ IntraPredFunc(&vpx_highbd_v_predictor_32x32_sse2,
+ &vpx_highbd_v_predictor_32x32_c, 32, 8),
+ IntraPredFunc(&vpx_highbd_tm_predictor_4x4_sse2,
+ &vpx_highbd_tm_predictor_4x4_c, 4, 8),
+ IntraPredFunc(&vpx_highbd_tm_predictor_8x8_sse2,
+ &vpx_highbd_tm_predictor_8x8_c, 8, 8)));
INSTANTIATE_TEST_CASE_P(SSE2_TO_C_10, VP9IntraPredTest,
- ::testing::Values(
- make_tuple(&vpx_highbd_dc_predictor_32x32_sse2,
- &vpx_highbd_dc_predictor_32x32_c, 32,
- 10),
- make_tuple(&vpx_highbd_tm_predictor_16x16_sse2,
- &vpx_highbd_tm_predictor_16x16_c, 16,
- 10),
- make_tuple(&vpx_highbd_tm_predictor_32x32_sse2,
- &vpx_highbd_tm_predictor_32x32_c, 32,
- 10),
- make_tuple(&vpx_highbd_dc_predictor_4x4_sse2,
- &vpx_highbd_dc_predictor_4x4_c, 4, 10),
- make_tuple(&vpx_highbd_dc_predictor_8x8_sse2,
- &vpx_highbd_dc_predictor_8x8_c, 8, 10),
- make_tuple(&vpx_highbd_dc_predictor_16x16_sse2,
- &vpx_highbd_dc_predictor_16x16_c, 16,
- 10),
- make_tuple(&vpx_highbd_v_predictor_4x4_sse2,
- &vpx_highbd_v_predictor_4x4_c, 4, 10),
- make_tuple(&vpx_highbd_v_predictor_8x8_sse2,
- &vpx_highbd_v_predictor_8x8_c, 8, 10),
- make_tuple(&vpx_highbd_v_predictor_16x16_sse2,
- &vpx_highbd_v_predictor_16x16_c, 16,
- 10),
- make_tuple(&vpx_highbd_v_predictor_32x32_sse2,
- &vpx_highbd_v_predictor_32x32_c, 32,
- 10),
- make_tuple(&vpx_highbd_tm_predictor_4x4_sse2,
- &vpx_highbd_tm_predictor_4x4_c, 4, 10),
- make_tuple(&vpx_highbd_tm_predictor_8x8_sse2,
- &vpx_highbd_tm_predictor_8x8_c, 8, 10)));
+ ::testing::Values(
+ IntraPredFunc(&vpx_highbd_dc_predictor_32x32_sse2,
+ &vpx_highbd_dc_predictor_32x32_c, 32, 10),
+ IntraPredFunc(&vpx_highbd_tm_predictor_16x16_sse2,
+ &vpx_highbd_tm_predictor_16x16_c, 16, 10),
+ IntraPredFunc(&vpx_highbd_tm_predictor_32x32_sse2,
+ &vpx_highbd_tm_predictor_32x32_c, 32, 10),
+ IntraPredFunc(&vpx_highbd_dc_predictor_4x4_sse2,
+ &vpx_highbd_dc_predictor_4x4_c, 4, 10),
+ IntraPredFunc(&vpx_highbd_dc_predictor_8x8_sse2,
+ &vpx_highbd_dc_predictor_8x8_c, 8, 10),
+ IntraPredFunc(&vpx_highbd_dc_predictor_16x16_sse2,
+ &vpx_highbd_dc_predictor_16x16_c, 16, 10),
+ IntraPredFunc(&vpx_highbd_v_predictor_4x4_sse2,
+ &vpx_highbd_v_predictor_4x4_c, 4, 10),
+ IntraPredFunc(&vpx_highbd_v_predictor_8x8_sse2,
+ &vpx_highbd_v_predictor_8x8_c, 8, 10),
+ IntraPredFunc(&vpx_highbd_v_predictor_16x16_sse2,
+ &vpx_highbd_v_predictor_16x16_c, 16, 10),
+ IntraPredFunc(&vpx_highbd_v_predictor_32x32_sse2,
+ &vpx_highbd_v_predictor_32x32_c, 32, 10),
+ IntraPredFunc(&vpx_highbd_tm_predictor_4x4_sse2,
+ &vpx_highbd_tm_predictor_4x4_c, 4, 10),
+ IntraPredFunc(&vpx_highbd_tm_predictor_8x8_sse2,
+ &vpx_highbd_tm_predictor_8x8_c, 8, 10)));
INSTANTIATE_TEST_CASE_P(SSE2_TO_C_12, VP9IntraPredTest,
- ::testing::Values(
- make_tuple(&vpx_highbd_dc_predictor_32x32_sse2,
- &vpx_highbd_dc_predictor_32x32_c, 32,
- 12),
- make_tuple(&vpx_highbd_tm_predictor_16x16_sse2,
- &vpx_highbd_tm_predictor_16x16_c, 16,
- 12),
- make_tuple(&vpx_highbd_tm_predictor_32x32_sse2,
- &vpx_highbd_tm_predictor_32x32_c, 32,
- 12),
- make_tuple(&vpx_highbd_dc_predictor_4x4_sse2,
- &vpx_highbd_dc_predictor_4x4_c, 4, 12),
- make_tuple(&vpx_highbd_dc_predictor_8x8_sse2,
- &vpx_highbd_dc_predictor_8x8_c, 8, 12),
- make_tuple(&vpx_highbd_dc_predictor_16x16_sse2,
- &vpx_highbd_dc_predictor_16x16_c, 16,
- 12),
- make_tuple(&vpx_highbd_v_predictor_4x4_sse2,
- &vpx_highbd_v_predictor_4x4_c, 4, 12),
- make_tuple(&vpx_highbd_v_predictor_8x8_sse2,
- &vpx_highbd_v_predictor_8x8_c, 8, 12),
- make_tuple(&vpx_highbd_v_predictor_16x16_sse2,
- &vpx_highbd_v_predictor_16x16_c, 16,
- 12),
- make_tuple(&vpx_highbd_v_predictor_32x32_sse2,
- &vpx_highbd_v_predictor_32x32_c, 32,
- 12),
- make_tuple(&vpx_highbd_tm_predictor_4x4_sse2,
- &vpx_highbd_tm_predictor_4x4_c, 4, 12),
- make_tuple(&vpx_highbd_tm_predictor_8x8_sse2,
- &vpx_highbd_tm_predictor_8x8_c, 8, 12)));
+ ::testing::Values(
+ IntraPredFunc(&vpx_highbd_dc_predictor_32x32_sse2,
+ &vpx_highbd_dc_predictor_32x32_c, 32, 12),
+ IntraPredFunc(&vpx_highbd_tm_predictor_16x16_sse2,
+ &vpx_highbd_tm_predictor_16x16_c, 16, 12),
+ IntraPredFunc(&vpx_highbd_tm_predictor_32x32_sse2,
+ &vpx_highbd_tm_predictor_32x32_c, 32, 12),
+ IntraPredFunc(&vpx_highbd_dc_predictor_4x4_sse2,
+ &vpx_highbd_dc_predictor_4x4_c, 4, 12),
+ IntraPredFunc(&vpx_highbd_dc_predictor_8x8_sse2,
+ &vpx_highbd_dc_predictor_8x8_c, 8, 12),
+ IntraPredFunc(&vpx_highbd_dc_predictor_16x16_sse2,
+ &vpx_highbd_dc_predictor_16x16_c, 16, 12),
+ IntraPredFunc(&vpx_highbd_v_predictor_4x4_sse2,
+ &vpx_highbd_v_predictor_4x4_c, 4, 12),
+ IntraPredFunc(&vpx_highbd_v_predictor_8x8_sse2,
+ &vpx_highbd_v_predictor_8x8_c, 8, 12),
+ IntraPredFunc(&vpx_highbd_v_predictor_16x16_sse2,
+ &vpx_highbd_v_predictor_16x16_c, 16, 12),
+ IntraPredFunc(&vpx_highbd_v_predictor_32x32_sse2,
+ &vpx_highbd_v_predictor_32x32_c, 32, 12),
+ IntraPredFunc(&vpx_highbd_tm_predictor_4x4_sse2,
+ &vpx_highbd_tm_predictor_4x4_c, 4, 12),
+ IntraPredFunc(&vpx_highbd_tm_predictor_8x8_sse2,
+ &vpx_highbd_tm_predictor_8x8_c, 8, 12)));
-#endif // CONFIG_USE_X86INC
#endif // CONFIG_VP9_HIGHBITDEPTH
#endif // HAVE_SSE2
} // namespace
diff --git a/third_party/googletest/README.libvpx b/third_party/googletest/README.libvpx
index 1eca78d..0e3b8b9 100644
--- a/third_party/googletest/README.libvpx
+++ b/third_party/googletest/README.libvpx
@@ -16,4 +16,6 @@
free build.
- Added GTEST_ATTRIBUTE_UNUSED_ to test registering dummies in TEST_P
and INSTANTIATE_TEST_CASE_P to remove warnings about unused variables
- under GCC 5.
\ No newline at end of file
+ under GCC 5.
+- Only define g_in_fast_death_test_child for non-Windows builds; quiets an
+ unused variable warning.
diff --git a/third_party/googletest/src/src/gtest-all.cc b/third_party/googletest/src/src/gtest-all.cc
index 8d90627..9128681 100644
--- a/third_party/googletest/src/src/gtest-all.cc
+++ b/third_party/googletest/src/src/gtest-all.cc
@@ -6612,9 +6612,11 @@
namespace internal {
+# if !GTEST_OS_WINDOWS
// Valid only for fast death tests. Indicates the code is running in the
// child process of a fast style death test.
static bool g_in_fast_death_test_child = false;
+# endif // !GTEST_OS_WINDOWS
// Returns a Boolean value indicating whether the caller is currently
// executing in the context of the death test child process. Tools such as
diff --git a/vp10/common/idct.c b/vp10/common/idct.c
index 179b903..1a573bd 100644
--- a/vp10/common/idct.c
+++ b/vp10/common/idct.c
@@ -77,7 +77,8 @@
int bd) {
int i;
for (i = 0; i < 4; ++i)
- output[i] = (tran_low_t)highbd_dct_const_round_shift(input[i] * Sqrt2, bd);
+ output[i] = HIGHBD_WRAPLOW(
+ highbd_dct_const_round_shift(input[i] * Sqrt2), bd);
}
static void highbd_iidtx8_c(const tran_low_t *input, tran_low_t *output,
@@ -92,8 +93,8 @@
int bd) {
int i;
for (i = 0; i < 16; ++i)
- output[i] = (tran_low_t)highbd_dct_const_round_shift(
- input[i] * 2 * Sqrt2, bd);
+ output[i] = HIGHBD_WRAPLOW(
+ highbd_dct_const_round_shift(input[i] * 2 * Sqrt2), bd);
}
static void highbd_iidtx32_c(const tran_low_t *input, tran_low_t *output,
@@ -113,8 +114,8 @@
}
// Multiply input by sqrt(2)
for (i = 0; i < 16; ++i) {
- inputhalf[i] = (tran_low_t)highbd_dct_const_round_shift(
- input[i] * Sqrt2, bd);
+ inputhalf[i] = HIGHBD_WRAPLOW(
+ highbd_dct_const_round_shift(input[i] * Sqrt2), bd);
}
vpx_highbd_idct16_c(inputhalf, output + 16, bd);
// Note overall scaling factor is 4 times orthogonal
@@ -190,18 +191,18 @@
// stage 1
temp1 = (input[3] + input[1]) * cospi_16_64;
temp2 = (input[3] - input[1]) * cospi_16_64;
- step[0] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step[1] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step[0] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step[1] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = input[2] * cospi_24_64 - input[0] * cospi_8_64;
temp2 = input[2] * cospi_8_64 + input[0] * cospi_24_64;
- step[2] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step[3] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step[2] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step[3] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
// stage 2
- output[0] = WRAPLOW(step[0] + step[3], bd);
- output[1] = WRAPLOW(-step[1] - step[2], bd);
- output[2] = WRAPLOW(step[1] - step[2], bd);
- output[3] = WRAPLOW(step[3] - step[0], bd);
+ output[0] = HIGHBD_WRAPLOW(step[0] + step[3], bd);
+ output[1] = HIGHBD_WRAPLOW(-step[1] - step[2], bd);
+ output[2] = HIGHBD_WRAPLOW(step[1] - step[2], bd);
+ output[3] = HIGHBD_WRAPLOW(step[3] - step[0], bd);
}
void highbd_idst8_c(const tran_low_t *input, tran_low_t *output, int bd) {
@@ -215,48 +216,48 @@
step1[3] = input[1];
temp1 = input[6] * cospi_28_64 - input[0] * cospi_4_64;
temp2 = input[6] * cospi_4_64 + input[0] * cospi_28_64;
- step1[4] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step1[7] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step1[4] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step1[7] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = input[2] * cospi_12_64 - input[4] * cospi_20_64;
temp2 = input[2] * cospi_20_64 + input[4] * cospi_12_64;
- step1[5] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step1[6] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step1[5] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step1[6] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
// stage 2
temp1 = (step1[0] + step1[2]) * cospi_16_64;
temp2 = (step1[0] - step1[2]) * cospi_16_64;
- step2[0] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[1] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[0] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[1] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = step1[1] * cospi_24_64 - step1[3] * cospi_8_64;
temp2 = step1[1] * cospi_8_64 + step1[3] * cospi_24_64;
- step2[2] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[3] = WRAPLOW(dct_const_round_shift(temp2), bd);
- step2[4] = WRAPLOW(step1[4] + step1[5], bd);
- step2[5] = WRAPLOW(step1[4] - step1[5], bd);
- step2[6] = WRAPLOW(-step1[6] + step1[7], bd);
- step2[7] = WRAPLOW(step1[6] + step1[7], bd);
+ step2[2] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[3] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[4] = HIGHBD_WRAPLOW(step1[4] + step1[5], bd);
+ step2[5] = HIGHBD_WRAPLOW(step1[4] - step1[5], bd);
+ step2[6] = HIGHBD_WRAPLOW(-step1[6] + step1[7], bd);
+ step2[7] = HIGHBD_WRAPLOW(step1[6] + step1[7], bd);
// stage 3
- step1[0] = WRAPLOW(step2[0] + step2[3], bd);
- step1[1] = WRAPLOW(step2[1] + step2[2], bd);
- step1[2] = WRAPLOW(step2[1] - step2[2], bd);
- step1[3] = WRAPLOW(step2[0] - step2[3], bd);
+ step1[0] = HIGHBD_WRAPLOW(step2[0] + step2[3], bd);
+ step1[1] = HIGHBD_WRAPLOW(step2[1] + step2[2], bd);
+ step1[2] = HIGHBD_WRAPLOW(step2[1] - step2[2], bd);
+ step1[3] = HIGHBD_WRAPLOW(step2[0] - step2[3], bd);
step1[4] = step2[4];
temp1 = (step2[6] - step2[5]) * cospi_16_64;
temp2 = (step2[5] + step2[6]) * cospi_16_64;
- step1[5] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step1[6] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step1[5] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step1[6] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
step1[7] = step2[7];
// stage 4
- output[0] = WRAPLOW(step1[0] + step1[7], bd);
- output[1] = WRAPLOW(-step1[1] - step1[6], bd);
- output[2] = WRAPLOW(step1[2] + step1[5], bd);
- output[3] = WRAPLOW(-step1[3] - step1[4], bd);
- output[4] = WRAPLOW(step1[3] - step1[4], bd);
- output[5] = WRAPLOW(-step1[2] + step1[5], bd);
- output[6] = WRAPLOW(step1[1] - step1[6], bd);
- output[7] = WRAPLOW(-step1[0] + step1[7], bd);
+ output[0] = HIGHBD_WRAPLOW(step1[0] + step1[7], bd);
+ output[1] = HIGHBD_WRAPLOW(-step1[1] - step1[6], bd);
+ output[2] = HIGHBD_WRAPLOW(step1[2] + step1[5], bd);
+ output[3] = HIGHBD_WRAPLOW(-step1[3] - step1[4], bd);
+ output[4] = HIGHBD_WRAPLOW(step1[3] - step1[4], bd);
+ output[5] = HIGHBD_WRAPLOW(-step1[2] + step1[5], bd);
+ output[6] = HIGHBD_WRAPLOW(step1[1] - step1[6], bd);
+ output[7] = HIGHBD_WRAPLOW(-step1[0] + step1[7], bd);
}
void highbd_idst16_c(const tran_low_t *input, tran_low_t *output, int bd) {
@@ -295,23 +296,23 @@
temp1 = step1[8] * cospi_30_64 - step1[15] * cospi_2_64;
temp2 = step1[8] * cospi_2_64 + step1[15] * cospi_30_64;
- step2[8] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[15] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[8] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[15] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = step1[9] * cospi_14_64 - step1[14] * cospi_18_64;
temp2 = step1[9] * cospi_18_64 + step1[14] * cospi_14_64;
- step2[9] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[14] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[9] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[14] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = step1[10] * cospi_22_64 - step1[13] * cospi_10_64;
temp2 = step1[10] * cospi_10_64 + step1[13] * cospi_22_64;
- step2[10] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[13] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[10] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[13] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = step1[11] * cospi_6_64 - step1[12] * cospi_26_64;
temp2 = step1[11] * cospi_26_64 + step1[12] * cospi_6_64;
- step2[11] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[12] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[11] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[12] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
// stage 3
step1[0] = step2[0];
@@ -321,109 +322,109 @@
temp1 = step2[4] * cospi_28_64 - step2[7] * cospi_4_64;
temp2 = step2[4] * cospi_4_64 + step2[7] * cospi_28_64;
- step1[4] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step1[7] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step1[4] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step1[7] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = step2[5] * cospi_12_64 - step2[6] * cospi_20_64;
temp2 = step2[5] * cospi_20_64 + step2[6] * cospi_12_64;
- step1[5] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step1[6] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step1[5] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step1[6] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
- step1[8] = WRAPLOW(step2[8] + step2[9], bd);
- step1[9] = WRAPLOW(step2[8] - step2[9], bd);
- step1[10] = WRAPLOW(-step2[10] + step2[11], bd);
- step1[11] = WRAPLOW(step2[10] + step2[11], bd);
- step1[12] = WRAPLOW(step2[12] + step2[13], bd);
- step1[13] = WRAPLOW(step2[12] - step2[13], bd);
- step1[14] = WRAPLOW(-step2[14] + step2[15], bd);
- step1[15] = WRAPLOW(step2[14] + step2[15], bd);
+ step1[8] = HIGHBD_WRAPLOW(step2[8] + step2[9], bd);
+ step1[9] = HIGHBD_WRAPLOW(step2[8] - step2[9], bd);
+ step1[10] = HIGHBD_WRAPLOW(-step2[10] + step2[11], bd);
+ step1[11] = HIGHBD_WRAPLOW(step2[10] + step2[11], bd);
+ step1[12] = HIGHBD_WRAPLOW(step2[12] + step2[13], bd);
+ step1[13] = HIGHBD_WRAPLOW(step2[12] - step2[13], bd);
+ step1[14] = HIGHBD_WRAPLOW(-step2[14] + step2[15], bd);
+ step1[15] = HIGHBD_WRAPLOW(step2[14] + step2[15], bd);
// stage 4
temp1 = (step1[0] + step1[1]) * cospi_16_64;
temp2 = (step1[0] - step1[1]) * cospi_16_64;
- step2[0] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[1] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[0] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[1] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = step1[2] * cospi_24_64 - step1[3] * cospi_8_64;
temp2 = step1[2] * cospi_8_64 + step1[3] * cospi_24_64;
- step2[2] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[3] = WRAPLOW(dct_const_round_shift(temp2), bd);
- step2[4] = WRAPLOW(step1[4] + step1[5], bd);
- step2[5] = WRAPLOW(step1[4] - step1[5], bd);
- step2[6] = WRAPLOW(-step1[6] + step1[7], bd);
- step2[7] = WRAPLOW(step1[6] + step1[7], bd);
+ step2[2] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[3] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[4] = HIGHBD_WRAPLOW(step1[4] + step1[5], bd);
+ step2[5] = HIGHBD_WRAPLOW(step1[4] - step1[5], bd);
+ step2[6] = HIGHBD_WRAPLOW(-step1[6] + step1[7], bd);
+ step2[7] = HIGHBD_WRAPLOW(step1[6] + step1[7], bd);
step2[8] = step1[8];
step2[15] = step1[15];
temp1 = -step1[9] * cospi_8_64 + step1[14] * cospi_24_64;
temp2 = step1[9] * cospi_24_64 + step1[14] * cospi_8_64;
- step2[9] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[14] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[9] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[14] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = -step1[10] * cospi_24_64 - step1[13] * cospi_8_64;
temp2 = -step1[10] * cospi_8_64 + step1[13] * cospi_24_64;
- step2[10] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[13] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[10] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[13] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
step2[11] = step1[11];
step2[12] = step1[12];
// stage 5
- step1[0] = WRAPLOW(step2[0] + step2[3], bd);
- step1[1] = WRAPLOW(step2[1] + step2[2], bd);
- step1[2] = WRAPLOW(step2[1] - step2[2], bd);
- step1[3] = WRAPLOW(step2[0] - step2[3], bd);
+ step1[0] = HIGHBD_WRAPLOW(step2[0] + step2[3], bd);
+ step1[1] = HIGHBD_WRAPLOW(step2[1] + step2[2], bd);
+ step1[2] = HIGHBD_WRAPLOW(step2[1] - step2[2], bd);
+ step1[3] = HIGHBD_WRAPLOW(step2[0] - step2[3], bd);
step1[4] = step2[4];
temp1 = (step2[6] - step2[5]) * cospi_16_64;
temp2 = (step2[5] + step2[6]) * cospi_16_64;
- step1[5] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step1[6] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step1[5] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step1[6] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
step1[7] = step2[7];
- step1[8] = WRAPLOW(step2[8] + step2[11], bd);
- step1[9] = WRAPLOW(step2[9] + step2[10], bd);
- step1[10] = WRAPLOW(step2[9] - step2[10], bd);
- step1[11] = WRAPLOW(step2[8] - step2[11], bd);
- step1[12] = WRAPLOW(-step2[12] + step2[15], bd);
- step1[13] = WRAPLOW(-step2[13] + step2[14], bd);
- step1[14] = WRAPLOW(step2[13] + step2[14], bd);
- step1[15] = WRAPLOW(step2[12] + step2[15], bd);
+ step1[8] = HIGHBD_WRAPLOW(step2[8] + step2[11], bd);
+ step1[9] = HIGHBD_WRAPLOW(step2[9] + step2[10], bd);
+ step1[10] = HIGHBD_WRAPLOW(step2[9] - step2[10], bd);
+ step1[11] = HIGHBD_WRAPLOW(step2[8] - step2[11], bd);
+ step1[12] = HIGHBD_WRAPLOW(-step2[12] + step2[15], bd);
+ step1[13] = HIGHBD_WRAPLOW(-step2[13] + step2[14], bd);
+ step1[14] = HIGHBD_WRAPLOW(step2[13] + step2[14], bd);
+ step1[15] = HIGHBD_WRAPLOW(step2[12] + step2[15], bd);
// stage 6
- step2[0] = WRAPLOW(step1[0] + step1[7], bd);
- step2[1] = WRAPLOW(step1[1] + step1[6], bd);
- step2[2] = WRAPLOW(step1[2] + step1[5], bd);
- step2[3] = WRAPLOW(step1[3] + step1[4], bd);
- step2[4] = WRAPLOW(step1[3] - step1[4], bd);
- step2[5] = WRAPLOW(step1[2] - step1[5], bd);
- step2[6] = WRAPLOW(step1[1] - step1[6], bd);
- step2[7] = WRAPLOW(step1[0] - step1[7], bd);
+ step2[0] = HIGHBD_WRAPLOW(step1[0] + step1[7], bd);
+ step2[1] = HIGHBD_WRAPLOW(step1[1] + step1[6], bd);
+ step2[2] = HIGHBD_WRAPLOW(step1[2] + step1[5], bd);
+ step2[3] = HIGHBD_WRAPLOW(step1[3] + step1[4], bd);
+ step2[4] = HIGHBD_WRAPLOW(step1[3] - step1[4], bd);
+ step2[5] = HIGHBD_WRAPLOW(step1[2] - step1[5], bd);
+ step2[6] = HIGHBD_WRAPLOW(step1[1] - step1[6], bd);
+ step2[7] = HIGHBD_WRAPLOW(step1[0] - step1[7], bd);
step2[8] = step1[8];
step2[9] = step1[9];
temp1 = (-step1[10] + step1[13]) * cospi_16_64;
temp2 = (step1[10] + step1[13]) * cospi_16_64;
- step2[10] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[13] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[10] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[13] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
temp1 = (-step1[11] + step1[12]) * cospi_16_64;
temp2 = (step1[11] + step1[12]) * cospi_16_64;
- step2[11] = WRAPLOW(dct_const_round_shift(temp1), bd);
- step2[12] = WRAPLOW(dct_const_round_shift(temp2), bd);
+ step2[11] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd);
+ step2[12] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd);
step2[14] = step1[14];
step2[15] = step1[15];
// stage 7
- output[0] = WRAPLOW(step2[0] + step2[15], bd);
- output[1] = WRAPLOW(-step2[1] - step2[14], bd);
- output[2] = WRAPLOW(step2[2] + step2[13], bd);
- output[3] = WRAPLOW(-step2[3] - step2[12], bd);
- output[4] = WRAPLOW(step2[4] + step2[11], bd);
- output[5] = WRAPLOW(-step2[5] - step2[10], bd);
- output[6] = WRAPLOW(step2[6] + step2[9], bd);
- output[7] = WRAPLOW(-step2[7] - step2[8], bd);
- output[8] = WRAPLOW(step2[7] - step2[8], bd);
- output[9] = WRAPLOW(-step2[6] + step2[9], bd);
- output[10] = WRAPLOW(step2[5] - step2[10], bd);
- output[11] = WRAPLOW(-step2[4] + step2[11], bd);
- output[12] = WRAPLOW(step2[3] - step2[12], bd);
- output[13] = WRAPLOW(-step2[2] + step2[13], bd);
- output[14] = WRAPLOW(step2[1] - step2[14], bd);
- output[15] = WRAPLOW(-step2[0] + step2[15], bd);
+ output[0] = HIGHBD_WRAPLOW(step2[0] + step2[15], bd);
+ output[1] = HIGHBD_WRAPLOW(-step2[1] - step2[14], bd);
+ output[2] = HIGHBD_WRAPLOW(step2[2] + step2[13], bd);
+ output[3] = HIGHBD_WRAPLOW(-step2[3] - step2[12], bd);
+ output[4] = HIGHBD_WRAPLOW(step2[4] + step2[11], bd);
+ output[5] = HIGHBD_WRAPLOW(-step2[5] - step2[10], bd);
+ output[6] = HIGHBD_WRAPLOW(step2[6] + step2[9], bd);
+ output[7] = HIGHBD_WRAPLOW(-step2[7] - step2[8], bd);
+ output[8] = HIGHBD_WRAPLOW(step2[7] - step2[8], bd);
+ output[9] = HIGHBD_WRAPLOW(-step2[6] + step2[9], bd);
+ output[10] = HIGHBD_WRAPLOW(step2[5] - step2[10], bd);
+ output[11] = HIGHBD_WRAPLOW(-step2[4] + step2[11], bd);
+ output[12] = HIGHBD_WRAPLOW(step2[3] - step2[12], bd);
+ output[13] = HIGHBD_WRAPLOW(-step2[2] + step2[13], bd);
+ output[14] = HIGHBD_WRAPLOW(step2[1] - step2[14], bd);
+ output[15] = HIGHBD_WRAPLOW(-step2[0] + step2[15], bd);
}
static void highbd_inv_idtx_add_c(const tran_low_t *input, uint8_t *dest8,
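
Note on the vp10/common/idct.c hunks above: the high-bitdepth idst/iidtx
helpers switch from the two-argument WRAPLOW(x, bd) to HIGHBD_WRAPLOW(x, bd),
matching the vpx_dsp convention where plain WRAPLOW covers only the 8-bit
path and high-bitdepth wrapping is a separate macro keyed on the bit depth.
A minimal sketch of the wrapping idea follows, assuming the common
shift-based form; it is illustrative only, not the vpx_dsp/inv_txfm.h
definition (whose exact form also varies with CONFIG_EMULATE_HARDWARE):

    #include <stdint.h>
    #include <stdio.h>

    /* Wrap an intermediate transform value to (bd + 8) signed bits by
     * shifting the kept sign bit up to bit 31 and arithmetic-shifting
     * back down. */
    static int32_t highbd_wraplow_sketch(int32_t x, int bd) {
      const int shift = 24 - bd; /* bd=8 keeps 16 bits, bd=12 keeps 20 */
      /* Left-shift via uint32_t to avoid signed-overflow UB; the
       * arithmetic right shift then sign-extends from the kept bit. */
      return (int32_t)((uint32_t)x << shift) >> shift;
    }

    int main(void) {
      printf("%d\n", highbd_wraplow_sketch(40000, 8));  /* wraps: -25536 */
      printf("%d\n", highbd_wraplow_sketch(40000, 12)); /* fits: 40000 */
      return 0;
    }

This also suggests why the (tran_low_t) casts disappear in the hunks above:
the wrapped value is presumably already in range for the given depth.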
diff --git a/vp10/common/vp10_rtcd_defs.pl b/vp10/common/vp10_rtcd_defs.pl
index 51b674b..8f87b02 100644
--- a/vp10/common/vp10_rtcd_defs.pl
+++ b/vp10/common/vp10_rtcd_defs.pl
@@ -24,29 +24,6 @@
}
forward_decls qw/vp10_common_forward_decls/;
-# x86inc.asm had specific constraints. break it out so it's easy to disable.
-# zero all the variables to avoid tricky else conditions.
-$mmx_x86inc = $sse_x86inc = $sse2_x86inc = $ssse3_x86inc = $avx_x86inc =
- $avx2_x86inc = '';
-$mmx_x86_64_x86inc = $sse_x86_64_x86inc = $sse2_x86_64_x86inc =
- $ssse3_x86_64_x86inc = $avx_x86_64_x86inc = $avx2_x86_64_x86inc = '';
-if (vpx_config("CONFIG_USE_X86INC") eq "yes") {
- $mmx_x86inc = 'mmx';
- $sse_x86inc = 'sse';
- $sse2_x86inc = 'sse2';
- $ssse3_x86inc = 'ssse3';
- $avx_x86inc = 'avx';
- $avx2_x86inc = 'avx2';
- if ($opts{arch} eq "x86_64") {
- $mmx_x86_64_x86inc = 'mmx';
- $sse_x86_64_x86inc = 'sse';
- $sse2_x86_64_x86inc = 'sse2';
- $ssse3_x86_64_x86inc = 'ssse3';
- $avx_x86_64_x86inc = 'avx';
- $avx2_x86_64_x86inc = 'avx2';
- }
-}
-
# functions that are 64 bit only.
$mmx_x86_64 = $sse2_x86_64 = $ssse3_x86_64 = $avx_x86_64 = $avx2_x86_64 = '';
if ($opts{arch} eq "x86_64") {
@@ -409,16 +386,16 @@
specialize qw/vp10_fdct8x8_quant/;
} else {
add_proto qw/int64_t vp10_block_error/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz";
- specialize qw/vp10_block_error avx2 msa/, "$sse2_x86inc";
+ specialize qw/vp10_block_error avx2 msa/;
add_proto qw/int64_t vp10_block_error_fp/, "const int16_t *coeff, const int16_t *dqcoeff, int block_size";
- specialize qw/vp10_block_error_fp neon/, "$sse2_x86inc";
+ specialize qw/vp10_block_error_fp neon sse2/;
add_proto qw/void vp10_quantize_fp/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
- specialize qw/vp10_quantize_fp neon sse2/, "$ssse3_x86_64_x86inc";
+ specialize qw/vp10_quantize_fp neon sse2/;
add_proto qw/void vp10_quantize_fp_32x32/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
- specialize qw/vp10_quantize_fp_32x32/, "$ssse3_x86_64_x86inc";
+ specialize qw/vp10_quantize_fp_32x32/;
add_proto qw/void vp10_fdct8x8_quant/, "const int16_t *input, int stride, tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
specialize qw/vp10_fdct8x8_quant sse2 ssse3 neon/;
@@ -440,7 +417,7 @@
specialize qw/vp10_fht32x32/;
add_proto qw/void vp10_fwht4x4/, "const int16_t *input, tran_low_t *output, int stride";
- specialize qw/vp10_fwht4x4/, "$sse2_x86inc";
+ specialize qw/vp10_fwht4x4/;
} else {
add_proto qw/void vp10_fht4x4/, "const int16_t *input, tran_low_t *output, int stride, int tx_type";
specialize qw/vp10_fht4x4 sse2/;
@@ -461,7 +438,7 @@
specialize qw/vp10_fht32x32/;
add_proto qw/void vp10_fwht4x4/, "const int16_t *input, tran_low_t *output, int stride";
- specialize qw/vp10_fwht4x4 msa/, "$sse2_x86inc";
+ specialize qw/vp10_fwht4x4/;
}
add_proto qw/void vp10_fwd_idtx/, "const int16_t *src_diff, tran_low_t *coeff, int stride, int bs, int tx_type";
diff --git a/vp10/vp10cx.mk b/vp10/vp10cx.mk
index 5d5c88a..5524322 100644
--- a/vp10/vp10cx.mk
+++ b/vp10/vp10cx.mk
@@ -103,10 +103,8 @@
VP10_CX_SRCS-$(HAVE_SSE2) += encoder/x86/highbd_block_error_intrin_sse2.c
endif
-ifeq ($(CONFIG_USE_X86INC),yes)
VP10_CX_SRCS-$(HAVE_SSE2) += encoder/x86/dct_sse2.asm
VP10_CX_SRCS-$(HAVE_SSE2) += encoder/x86/error_sse2.asm
-endif
ifeq ($(ARCH_X86_64),yes)
ifeq ($(CONFIG_USE_X86INC),yes)
diff --git a/vp8/common/mips/msa/postproc_msa.c b/vp8/common/mips/msa/postproc_msa.c
deleted file mode 100644
index 23dcde2..0000000
--- a/vp8/common/mips/msa/postproc_msa.c
+++ /dev/null
@@ -1,801 +0,0 @@
-/*
- * Copyright (c) 2015 The WebM project authors. All Rights Reserved.
- *
- * Use of this source code is governed by a BSD-style license
- * that can be found in the LICENSE file in the root of the source
- * tree. An additional intellectual property rights grant can be found
- * in the file PATENTS. All contributing project authors may
- * be found in the AUTHORS file in the root of the source tree.
- */
-
-#include <stdlib.h>
-#include "./vp8_rtcd.h"
-#include "./vpx_dsp_rtcd.h"
-#include "vp8/common/mips/msa/vp8_macros_msa.h"
-
-static const int16_t vp8_rv_msa[] =
-{
- 8, 5, 2, 2, 8, 12, 4, 9, 8, 3,
- 0, 3, 9, 0, 0, 0, 8, 3, 14, 4,
- 10, 1, 11, 14, 1, 14, 9, 6, 12, 11,
- 8, 6, 10, 0, 0, 8, 9, 0, 3, 14,
- 8, 11, 13, 4, 2, 9, 0, 3, 9, 6,
- 1, 2, 3, 14, 13, 1, 8, 2, 9, 7,
- 3, 3, 1, 13, 13, 6, 6, 5, 2, 7,
- 11, 9, 11, 8, 7, 3, 2, 0, 13, 13,
- 14, 4, 12, 5, 12, 10, 8, 10, 13, 10,
- 4, 14, 4, 10, 0, 8, 11, 1, 13, 7,
- 7, 14, 6, 14, 13, 2, 13, 5, 4, 4,
- 0, 10, 0, 5, 13, 2, 12, 7, 11, 13,
- 8, 0, 4, 10, 7, 2, 7, 2, 2, 5,
- 3, 4, 7, 3, 3, 14, 14, 5, 9, 13,
- 3, 14, 3, 6, 3, 0, 11, 8, 13, 1,
- 13, 1, 12, 0, 10, 9, 7, 6, 2, 8,
- 5, 2, 13, 7, 1, 13, 14, 7, 6, 7,
- 9, 6, 10, 11, 7, 8, 7, 5, 14, 8,
- 4, 4, 0, 8, 7, 10, 0, 8, 14, 11,
- 3, 12, 5, 7, 14, 3, 14, 5, 2, 6,
- 11, 12, 12, 8, 0, 11, 13, 1, 2, 0,
- 5, 10, 14, 7, 8, 0, 4, 11, 0, 8,
- 0, 3, 10, 5, 8, 0, 11, 6, 7, 8,
- 10, 7, 13, 9, 2, 5, 1, 5, 10, 2,
- 4, 3, 5, 6, 10, 8, 9, 4, 11, 14,
- 0, 10, 0, 5, 13, 2, 12, 7, 11, 13,
- 8, 0, 4, 10, 7, 2, 7, 2, 2, 5,
- 3, 4, 7, 3, 3, 14, 14, 5, 9, 13,
- 3, 14, 3, 6, 3, 0, 11, 8, 13, 1,
- 13, 1, 12, 0, 10, 9, 7, 6, 2, 8,
- 5, 2, 13, 7, 1, 13, 14, 7, 6, 7,
- 9, 6, 10, 11, 7, 8, 7, 5, 14, 8,
- 4, 4, 0, 8, 7, 10, 0, 8, 14, 11,
- 3, 12, 5, 7, 14, 3, 14, 5, 2, 6,
- 11, 12, 12, 8, 0, 11, 13, 1, 2, 0,
- 5, 10, 14, 7, 8, 0, 4, 11, 0, 8,
- 0, 3, 10, 5, 8, 0, 11, 6, 7, 8,
- 10, 7, 13, 9, 2, 5, 1, 5, 10, 2,
- 4, 3, 5, 6, 10, 8, 9, 4, 11, 14,
- 3, 8, 3, 7, 8, 5, 11, 4, 12, 3,
- 11, 9, 14, 8, 14, 13, 4, 3, 1, 2,
- 14, 6, 5, 4, 4, 11, 4, 6, 2, 1,
- 5, 8, 8, 12, 13, 5, 14, 10, 12, 13,
- 0, 9, 5, 5, 11, 10, 13, 9, 10, 13,
-};
-
-#define VP8_TRANSPOSE8x16_UB_UB(in0, in1, in2, in3, in4, in5, in6, in7, \
- out0, out1, out2, out3, \
- out4, out5, out6, out7, \
- out8, out9, out10, out11, \
- out12, out13, out14, out15) \
-{ \
- v8i16 temp0, temp1, temp2, temp3, temp4; \
- v8i16 temp5, temp6, temp7, temp8, temp9; \
- \
- ILVR_B4_SH(in1, in0, in3, in2, in5, in4, in7, in6, \
- temp0, temp1, temp2, temp3); \
- ILVR_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \
- ILVRL_W2_SH(temp5, temp4, temp6, temp7); \
- ILVL_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \
- ILVRL_W2_SH(temp5, temp4, temp8, temp9); \
- ILVL_B4_SH(in1, in0, in3, in2, in5, in4, in7, in6, \
- temp0, temp1, temp2, temp3); \
- ILVR_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \
- ILVRL_W2_UB(temp5, temp4, out8, out10); \
- ILVL_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \
- ILVRL_W2_UB(temp5, temp4, out12, out14); \
- out0 = (v16u8)temp6; \
- out2 = (v16u8)temp7; \
- out4 = (v16u8)temp8; \
- out6 = (v16u8)temp9; \
- out9 = (v16u8)__msa_ilvl_d((v2i64)out8, (v2i64)out8); \
- out11 = (v16u8)__msa_ilvl_d((v2i64)out10, (v2i64)out10); \
- out13 = (v16u8)__msa_ilvl_d((v2i64)out12, (v2i64)out12); \
- out15 = (v16u8)__msa_ilvl_d((v2i64)out14, (v2i64)out14); \
- out1 = (v16u8)__msa_ilvl_d((v2i64)out0, (v2i64)out0); \
- out3 = (v16u8)__msa_ilvl_d((v2i64)out2, (v2i64)out2); \
- out5 = (v16u8)__msa_ilvl_d((v2i64)out4, (v2i64)out4); \
- out7 = (v16u8)__msa_ilvl_d((v2i64)out6, (v2i64)out6); \
-}
-
-#define VP8_AVER_IF_RETAIN(above2_in, above1_in, src_in, \
- below1_in, below2_in, ref, out) \
-{ \
- v16u8 temp0, temp1; \
- \
- temp1 = __msa_aver_u_b(above2_in, above1_in); \
- temp0 = __msa_aver_u_b(below2_in, below1_in); \
- temp1 = __msa_aver_u_b(temp1, temp0); \
- out = __msa_aver_u_b(src_in, temp1); \
- temp0 = __msa_asub_u_b(src_in, above2_in); \
- temp1 = __msa_asub_u_b(src_in, above1_in); \
- temp0 = (temp0 < ref); \
- temp1 = (temp1 < ref); \
- temp0 = temp0 & temp1; \
- temp1 = __msa_asub_u_b(src_in, below1_in); \
- temp1 = (temp1 < ref); \
- temp0 = temp0 & temp1; \
- temp1 = __msa_asub_u_b(src_in, below2_in); \
- temp1 = (temp1 < ref); \
- temp0 = temp0 & temp1; \
- out = __msa_bmz_v(out, src_in, temp0); \
-}
-
-#define TRANSPOSE12x16_B(in0, in1, in2, in3, in4, in5, in6, in7, \
- in8, in9, in10, in11, in12, in13, in14, in15) \
-{ \
- v8i16 temp0, temp1, temp2, temp3, temp4; \
- v8i16 temp5, temp6, temp7, temp8, temp9; \
- \
- ILVR_B2_SH(in1, in0, in3, in2, temp0, temp1); \
- ILVRL_H2_SH(temp1, temp0, temp2, temp3); \
- ILVR_B2_SH(in5, in4, in7, in6, temp0, temp1); \
- ILVRL_H2_SH(temp1, temp0, temp4, temp5); \
- ILVRL_W2_SH(temp4, temp2, temp0, temp1); \
- ILVRL_W2_SH(temp5, temp3, temp2, temp3); \
- ILVR_B2_SH(in9, in8, in11, in10, temp4, temp5); \
- ILVR_B2_SH(in9, in8, in11, in10, temp4, temp5); \
- ILVRL_H2_SH(temp5, temp4, temp6, temp7); \
- ILVR_B2_SH(in13, in12, in15, in14, temp4, temp5); \
- ILVRL_H2_SH(temp5, temp4, temp8, temp9); \
- ILVRL_W2_SH(temp8, temp6, temp4, temp5); \
- ILVRL_W2_SH(temp9, temp7, temp6, temp7); \
- ILVL_B2_SH(in1, in0, in3, in2, temp8, temp9); \
- ILVR_D2_UB(temp4, temp0, temp5, temp1, in0, in2); \
- in1 = (v16u8)__msa_ilvl_d((v2i64)temp4, (v2i64)temp0); \
- in3 = (v16u8)__msa_ilvl_d((v2i64)temp5, (v2i64)temp1); \
- ILVL_B2_SH(in5, in4, in7, in6, temp0, temp1); \
- ILVR_D2_UB(temp6, temp2, temp7, temp3, in4, in6); \
- in5 = (v16u8)__msa_ilvl_d((v2i64)temp6, (v2i64)temp2); \
- in7 = (v16u8)__msa_ilvl_d((v2i64)temp7, (v2i64)temp3); \
- ILVL_B4_SH(in9, in8, in11, in10, in13, in12, in15, in14, \
- temp2, temp3, temp4, temp5); \
- ILVR_H4_SH(temp9, temp8, temp1, temp0, temp3, temp2, temp5, temp4, \
- temp6, temp7, temp8, temp9); \
- ILVR_W2_SH(temp7, temp6, temp9, temp8, temp0, temp1); \
- in8 = (v16u8)__msa_ilvr_d((v2i64)temp1, (v2i64)temp0); \
- in9 = (v16u8)__msa_ilvl_d((v2i64)temp1, (v2i64)temp0); \
- ILVL_W2_SH(temp7, temp6, temp9, temp8, temp2, temp3); \
- in10 = (v16u8)__msa_ilvr_d((v2i64)temp3, (v2i64)temp2); \
- in11 = (v16u8)__msa_ilvl_d((v2i64)temp3, (v2i64)temp2); \
-}
-
-#define VP8_TRANSPOSE12x8_UB_UB(in0, in1, in2, in3, in4, in5, \
- in6, in7, in8, in9, in10, in11) \
-{ \
- v8i16 temp0, temp1, temp2, temp3; \
- v8i16 temp4, temp5, temp6, temp7; \
- \
- ILVR_B2_SH(in1, in0, in3, in2, temp0, temp1); \
- ILVRL_H2_SH(temp1, temp0, temp2, temp3); \
- ILVR_B2_SH(in5, in4, in7, in6, temp0, temp1); \
- ILVRL_H2_SH(temp1, temp0, temp4, temp5); \
- ILVRL_W2_SH(temp4, temp2, temp0, temp1); \
- ILVRL_W2_SH(temp5, temp3, temp2, temp3); \
- ILVL_B2_SH(in1, in0, in3, in2, temp4, temp5); \
- temp4 = __msa_ilvr_h(temp5, temp4); \
- ILVL_B2_SH(in5, in4, in7, in6, temp6, temp7); \
- temp5 = __msa_ilvr_h(temp7, temp6); \
- ILVRL_W2_SH(temp5, temp4, temp6, temp7); \
- in0 = (v16u8)temp0; \
- in2 = (v16u8)temp1; \
- in4 = (v16u8)temp2; \
- in6 = (v16u8)temp3; \
- in8 = (v16u8)temp6; \
- in10 = (v16u8)temp7; \
- in1 = (v16u8)__msa_ilvl_d((v2i64)temp0, (v2i64)temp0); \
- in3 = (v16u8)__msa_ilvl_d((v2i64)temp1, (v2i64)temp1); \
- in5 = (v16u8)__msa_ilvl_d((v2i64)temp2, (v2i64)temp2); \
- in7 = (v16u8)__msa_ilvl_d((v2i64)temp3, (v2i64)temp3); \
- in9 = (v16u8)__msa_ilvl_d((v2i64)temp6, (v2i64)temp6); \
- in11 = (v16u8)__msa_ilvl_d((v2i64)temp7, (v2i64)temp7); \
-}
-
-static void postproc_down_across_chroma_msa(uint8_t *src_ptr, uint8_t *dst_ptr,
- int32_t src_stride,
- int32_t dst_stride,
- int32_t cols, uint8_t *f)
-{
- uint8_t *p_src = src_ptr;
- uint8_t *p_dst = dst_ptr;
- uint8_t *f_orig = f;
- uint8_t *p_dst_st = dst_ptr;
- uint16_t col;
- uint64_t out0, out1, out2, out3;
- v16u8 above2, above1, below2, below1, src, ref, ref_temp;
- v16u8 inter0, inter1, inter2, inter3, inter4, inter5;
- v16u8 inter6, inter7, inter8, inter9, inter10, inter11;
-
- for (col = (cols / 16); col--;)
- {
- ref = LD_UB(f);
- LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1);
- src = LD_UB(p_src);
- LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0);
- above2 = LD_UB(p_src + 3 * src_stride);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1);
- above1 = LD_UB(p_src + 4 * src_stride);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2);
- src = LD_UB(p_src + 5 * src_stride);
- VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3);
- below1 = LD_UB(p_src + 6 * src_stride);
- VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4);
- below2 = LD_UB(p_src + 7 * src_stride);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5);
- above2 = LD_UB(p_src + 8 * src_stride);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6);
- above1 = LD_UB(p_src + 9 * src_stride);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7);
- ST_UB8(inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7,
- p_dst, dst_stride);
-
- p_dst += 16;
- p_src += 16;
- f += 16;
- }
-
- if (0 != (cols / 16))
- {
- ref = LD_UB(f);
- LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1);
- src = LD_UB(p_src);
- LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0);
- above2 = LD_UB(p_src + 3 * src_stride);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1);
- above1 = LD_UB(p_src + 4 * src_stride);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2);
- src = LD_UB(p_src + 5 * src_stride);
- VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3);
- below1 = LD_UB(p_src + 6 * src_stride);
- VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4);
- below2 = LD_UB(p_src + 7 * src_stride);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5);
- above2 = LD_UB(p_src + 8 * src_stride);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6);
- above1 = LD_UB(p_src + 9 * src_stride);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7);
- out0 = __msa_copy_u_d((v2i64)inter0, 0);
- out1 = __msa_copy_u_d((v2i64)inter1, 0);
- out2 = __msa_copy_u_d((v2i64)inter2, 0);
- out3 = __msa_copy_u_d((v2i64)inter3, 0);
- SD4(out0, out1, out2, out3, p_dst, dst_stride);
-
- out0 = __msa_copy_u_d((v2i64)inter4, 0);
- out1 = __msa_copy_u_d((v2i64)inter5, 0);
- out2 = __msa_copy_u_d((v2i64)inter6, 0);
- out3 = __msa_copy_u_d((v2i64)inter7, 0);
- SD4(out0, out1, out2, out3, p_dst + 4 * dst_stride, dst_stride);
- }
-
- f = f_orig;
- p_dst = dst_ptr - 2;
- LD_UB8(p_dst, dst_stride,
- inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7);
-
- for (col = 0; col < (cols / 8); ++col)
- {
- ref = LD_UB(f);
- f += 8;
- VP8_TRANSPOSE12x8_UB_UB(inter0, inter1, inter2, inter3,
- inter4, inter5, inter6, inter7,
- inter8, inter9, inter10, inter11);
- if (0 == col)
- {
- above2 = inter2;
- above1 = inter2;
- }
- else
- {
- above2 = inter0;
- above1 = inter1;
- }
- src = inter2;
- below1 = inter3;
- below2 = inter4;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 0);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2,
- ref_temp, inter2);
- above2 = inter5;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 1);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2,
- ref_temp, inter3);
- above1 = inter6;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 2);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1,
- ref_temp, inter4);
- src = inter7;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 3);
- VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src,
- ref_temp, inter5);
- below1 = inter8;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 4);
- VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1,
- ref_temp, inter6);
- below2 = inter9;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 5);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2,
- ref_temp, inter7);
- if (col == (cols / 8 - 1))
- {
- above2 = inter9;
- }
- else
- {
- above2 = inter10;
- }
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 6);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2,
- ref_temp, inter8);
- if (col == (cols / 8 - 1))
- {
- above1 = inter9;
- }
- else
- {
- above1 = inter11;
- }
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 7);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1,
- ref_temp, inter9);
- TRANSPOSE8x8_UB_UB(inter2, inter3, inter4, inter5, inter6, inter7,
- inter8, inter9, inter2, inter3, inter4, inter5,
- inter6, inter7, inter8, inter9);
- p_dst += 8;
- LD_UB2(p_dst, dst_stride, inter0, inter1);
- ST8x1_UB(inter2, p_dst_st);
- ST8x1_UB(inter3, (p_dst_st + 1 * dst_stride));
- LD_UB2(p_dst + 2 * dst_stride, dst_stride, inter2, inter3);
- ST8x1_UB(inter4, (p_dst_st + 2 * dst_stride));
- ST8x1_UB(inter5, (p_dst_st + 3 * dst_stride));
- LD_UB2(p_dst + 4 * dst_stride, dst_stride, inter4, inter5);
- ST8x1_UB(inter6, (p_dst_st + 4 * dst_stride));
- ST8x1_UB(inter7, (p_dst_st + 5 * dst_stride));
- LD_UB2(p_dst + 6 * dst_stride, dst_stride, inter6, inter7);
- ST8x1_UB(inter8, (p_dst_st + 6 * dst_stride));
- ST8x1_UB(inter9, (p_dst_st + 7 * dst_stride));
- p_dst_st += 8;
- }
-}
-
-static void postproc_down_across_luma_msa(uint8_t *src_ptr, uint8_t *dst_ptr,
- int32_t src_stride,
- int32_t dst_stride,
- int32_t cols, uint8_t *f)
-{
- uint8_t *p_src = src_ptr;
- uint8_t *p_dst = dst_ptr;
- uint8_t *p_dst_st = dst_ptr;
- uint8_t *f_orig = f;
- uint16_t col;
- v16u8 above2, above1, below2, below1;
- v16u8 src, ref, ref_temp;
- v16u8 inter0, inter1, inter2, inter3, inter4, inter5, inter6;
- v16u8 inter7, inter8, inter9, inter10, inter11;
- v16u8 inter12, inter13, inter14, inter15;
-
- for (col = (cols / 16); col--;)
- {
- ref = LD_UB(f);
- LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1);
- src = LD_UB(p_src);
- LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0);
- above2 = LD_UB(p_src + 3 * src_stride);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1);
- above1 = LD_UB(p_src + 4 * src_stride);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2);
- src = LD_UB(p_src + 5 * src_stride);
- VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3);
- below1 = LD_UB(p_src + 6 * src_stride);
- VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4);
- below2 = LD_UB(p_src + 7 * src_stride);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5);
- above2 = LD_UB(p_src + 8 * src_stride);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6);
- above1 = LD_UB(p_src + 9 * src_stride);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7);
- src = LD_UB(p_src + 10 * src_stride);
- VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter8);
- below1 = LD_UB(p_src + 11 * src_stride);
- VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter9);
- below2 = LD_UB(p_src + 12 * src_stride);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter10);
- above2 = LD_UB(p_src + 13 * src_stride);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter11);
- above1 = LD_UB(p_src + 14 * src_stride);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter12);
- src = LD_UB(p_src + 15 * src_stride);
- VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter13);
- below1 = LD_UB(p_src + 16 * src_stride);
- VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter14);
- below2 = LD_UB(p_src + 17 * src_stride);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter15);
- ST_UB8(inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7,
- p_dst, dst_stride);
- ST_UB8(inter8, inter9, inter10, inter11, inter12, inter13,
- inter14, inter15, p_dst + 8 * dst_stride, dst_stride);
- p_src += 16;
- p_dst += 16;
- f += 16;
- }
-
- f = f_orig;
- p_dst = dst_ptr - 2;
- LD_UB8(p_dst, dst_stride,
- inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7);
- LD_UB8(p_dst + 8 * dst_stride, dst_stride,
- inter8, inter9, inter10, inter11, inter12, inter13,
- inter14, inter15);
-
- for (col = 0; col < cols / 8; ++col)
- {
- ref = LD_UB(f);
- f += 8;
- TRANSPOSE12x16_B(inter0, inter1, inter2, inter3, inter4, inter5,
- inter6, inter7, inter8, inter9, inter10, inter11,
- inter12, inter13, inter14, inter15);
- if (0 == col)
- {
- above2 = inter2;
- above1 = inter2;
- }
- else
- {
- above2 = inter0;
- above1 = inter1;
- }
-
- src = inter2;
- below1 = inter3;
- below2 = inter4;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 0);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2,
- ref_temp, inter2);
- above2 = inter5;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 1);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2,
- ref_temp, inter3);
- above1 = inter6;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 2);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1,
- ref_temp, inter4);
- src = inter7;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 3);
- VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src,
- ref_temp, inter5);
- below1 = inter8;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 4);
- VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1,
- ref_temp, inter6);
- below2 = inter9;
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 5);
- VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2,
- ref_temp, inter7);
- if (col == (cols / 8 - 1))
- {
- above2 = inter9;
- }
- else
- {
- above2 = inter10;
- }
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 6);
- VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2,
- ref_temp, inter8);
- if (col == (cols / 8 - 1))
- {
- above1 = inter9;
- }
- else
- {
- above1 = inter11;
- }
- ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 7);
- VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1,
- ref_temp, inter9);
- VP8_TRANSPOSE8x16_UB_UB(inter2, inter3, inter4, inter5,
- inter6, inter7, inter8, inter9,
- inter2, inter3, inter4, inter5,
- inter6, inter7, inter8, inter9,
- inter10, inter11, inter12, inter13,
- inter14, inter15, above2, above1);
-
- p_dst += 8;
- LD_UB2(p_dst, dst_stride, inter0, inter1);
- ST8x1_UB(inter2, p_dst_st);
- ST8x1_UB(inter3, (p_dst_st + 1 * dst_stride));
- LD_UB2(p_dst + 2 * dst_stride, dst_stride, inter2, inter3);
- ST8x1_UB(inter4, (p_dst_st + 2 * dst_stride));
- ST8x1_UB(inter5, (p_dst_st + 3 * dst_stride));
- LD_UB2(p_dst + 4 * dst_stride, dst_stride, inter4, inter5);
- ST8x1_UB(inter6, (p_dst_st + 4 * dst_stride));
- ST8x1_UB(inter7, (p_dst_st + 5 * dst_stride));
- LD_UB2(p_dst + 6 * dst_stride, dst_stride, inter6, inter7);
- ST8x1_UB(inter8, (p_dst_st + 6 * dst_stride));
- ST8x1_UB(inter9, (p_dst_st + 7 * dst_stride));
- LD_UB2(p_dst + 8 * dst_stride, dst_stride, inter8, inter9);
- ST8x1_UB(inter10, (p_dst_st + 8 * dst_stride));
- ST8x1_UB(inter11, (p_dst_st + 9 * dst_stride));
- LD_UB2(p_dst + 10 * dst_stride, dst_stride, inter10, inter11);
- ST8x1_UB(inter12, (p_dst_st + 10 * dst_stride));
- ST8x1_UB(inter13, (p_dst_st + 11 * dst_stride));
- LD_UB2(p_dst + 12 * dst_stride, dst_stride, inter12, inter13);
- ST8x1_UB(inter14, (p_dst_st + 12 * dst_stride));
- ST8x1_UB(inter15, (p_dst_st + 13 * dst_stride));
- LD_UB2(p_dst + 14 * dst_stride, dst_stride, inter14, inter15);
- ST8x1_UB(above2, (p_dst_st + 14 * dst_stride));
- ST8x1_UB(above1, (p_dst_st + 15 * dst_stride));
- p_dst_st += 8;
- }
-}
-
-void vp8_post_proc_down_and_across_mb_row_msa(uint8_t *src, uint8_t *dst,
- int32_t src_stride,
- int32_t dst_stride,
- int32_t cols, uint8_t *f,
- int32_t size)
-{
- if (8 == size)
- {
- postproc_down_across_chroma_msa(src, dst, src_stride, dst_stride,
- cols, f);
- }
- else if (16 == size)
- {
- postproc_down_across_luma_msa(src, dst, src_stride, dst_stride,
- cols, f);
- }
-}
-
-void vp8_mbpost_proc_across_ip_msa(uint8_t *src_ptr, int32_t pitch,
- int32_t rows, int32_t cols, int32_t flimit)
-{
- int32_t row, col, cnt;
- uint8_t *src_dup = src_ptr;
- v16u8 src0, src, tmp_orig;
- v16u8 tmp = { 0 };
- v16i8 zero = { 0 };
- v8u16 sum_h, src_r_h, src_l_h;
- v4u32 src_r_w, src_l_w;
- v4i32 flimit_vec;
-
- flimit_vec = __msa_fill_w(flimit);
- for (row = rows; row--;)
- {
- int32_t sum_sq = 0;
- int32_t sum = 0;
- src0 = (v16u8)__msa_fill_b(src_dup[0]);
- ST8x1_UB(src0, (src_dup - 8));
-
- src0 = (v16u8)__msa_fill_b(src_dup[cols - 1]);
- ST_UB(src0, src_dup + cols);
- src_dup[cols + 16] = src_dup[cols - 1];
- tmp_orig = (v16u8)__msa_ldi_b(0);
- tmp_orig[15] = tmp[15];
- src = LD_UB(src_dup - 8);
- src[15] = 0;
- ILVRL_B2_UH(zero, src, src_r_h, src_l_h);
- src_r_w = __msa_dotp_u_w(src_r_h, src_r_h);
- src_l_w = __msa_dotp_u_w(src_l_h, src_l_h);
- sum_sq = HADD_SW_S32(src_r_w);
- sum_sq += HADD_SW_S32(src_l_w);
- sum_h = __msa_hadd_u_h(src, src);
- sum = HADD_UH_U32(sum_h);
- {
- v16u8 src7, src8, src_r, src_l;
- v16i8 mask;
- v8u16 add_r, add_l;
- v8i16 sub_r, sub_l, sum_r, sum_l, mask0, mask1;
- v4i32 sum_sq0, sum_sq1, sum_sq2, sum_sq3;
- v4i32 sub0, sub1, sub2, sub3;
- v4i32 sum0_w, sum1_w, sum2_w, sum3_w;
- v4i32 mul0, mul1, mul2, mul3;
- v4i32 total0, total1, total2, total3;
- v8i16 const8 = __msa_fill_h(8);
-
- src7 = LD_UB(src_dup + 7);
- src8 = LD_UB(src_dup - 8);
- for (col = 0; col < (cols >> 4); ++col)
- {
- ILVRL_B2_UB(src7, src8, src_r, src_l);
- HSUB_UB2_SH(src_r, src_l, sub_r, sub_l);
-
- sum_r[0] = sum + sub_r[0];
- for (cnt = 0; cnt < 7; ++cnt)
- {
- sum_r[cnt + 1] = sum_r[cnt] + sub_r[cnt + 1];
- }
- sum_l[0] = sum_r[7] + sub_l[0];
- for (cnt = 0; cnt < 7; ++cnt)
- {
- sum_l[cnt + 1] = sum_l[cnt] + sub_l[cnt + 1];
- }
- sum = sum_l[7];
- src = LD_UB(src_dup + 16 * col);
- ILVRL_B2_UH(zero, src, src_r_h, src_l_h);
- src7 = (v16u8)((const8 + sum_r + (v8i16)src_r_h) >> 4);
- src8 = (v16u8)((const8 + sum_l + (v8i16)src_l_h) >> 4);
- tmp = (v16u8)__msa_pckev_b((v16i8)src8, (v16i8)src7);
-
- HADD_UB2_UH(src_r, src_l, add_r, add_l);
- UNPCK_SH_SW(sub_r, sub0, sub1);
- UNPCK_SH_SW(sub_l, sub2, sub3);
- ILVR_H2_SW(zero, add_r, zero, add_l, sum0_w, sum2_w);
- ILVL_H2_SW(zero, add_r, zero, add_l, sum1_w, sum3_w);
- MUL4(sum0_w, sub0, sum1_w, sub1, sum2_w, sub2, sum3_w, sub3,
- mul0, mul1, mul2, mul3);
- sum_sq0[0] = sum_sq + mul0[0];
- for (cnt = 0; cnt < 3; ++cnt)
- {
- sum_sq0[cnt + 1] = sum_sq0[cnt] + mul0[cnt + 1];
- }
- sum_sq1[0] = sum_sq0[3] + mul1[0];
- for (cnt = 0; cnt < 3; ++cnt)
- {
- sum_sq1[cnt + 1] = sum_sq1[cnt] + mul1[cnt + 1];
- }
- sum_sq2[0] = sum_sq1[3] + mul2[0];
- for (cnt = 0; cnt < 3; ++cnt)
- {
- sum_sq2[cnt + 1] = sum_sq2[cnt] + mul2[cnt + 1];
- }
- sum_sq3[0] = sum_sq2[3] + mul3[0];
- for (cnt = 0; cnt < 3; ++cnt)
- {
- sum_sq3[cnt + 1] = sum_sq3[cnt] + mul3[cnt + 1];
- }
- sum_sq = sum_sq3[3];
-
- UNPCK_SH_SW(sum_r, sum0_w, sum1_w);
- UNPCK_SH_SW(sum_l, sum2_w, sum3_w);
- total0 = sum_sq0 * __msa_ldi_w(15);
- total0 -= sum0_w * sum0_w;
- total1 = sum_sq1 * __msa_ldi_w(15);
- total1 -= sum1_w * sum1_w;
- total2 = sum_sq2 * __msa_ldi_w(15);
- total2 -= sum2_w * sum2_w;
- total3 = sum_sq3 * __msa_ldi_w(15);
- total3 -= sum3_w * sum3_w;
- total0 = (total0 < flimit_vec);
- total1 = (total1 < flimit_vec);
- total2 = (total2 < flimit_vec);
- total3 = (total3 < flimit_vec);
- PCKEV_H2_SH(total1, total0, total3, total2, mask0, mask1);
- mask = __msa_pckev_b((v16i8)mask1, (v16i8)mask0);
- tmp = __msa_bmz_v(tmp, src, (v16u8)mask);
-
- if (col == 0)
- {
- uint64_t src_d;
-
- src_d = __msa_copy_u_d((v2i64)tmp_orig, 1);
- SD(src_d, (src_dup - 8));
- }
-
- src7 = LD_UB(src_dup + 16 * (col + 1) + 7);
- src8 = LD_UB(src_dup + 16 * (col + 1) - 8);
- ST_UB(tmp, (src_dup + (16 * col)));
- }
-
- src_dup += pitch;
- }
- }
-}
-
-void vp8_mbpost_proc_down_msa(uint8_t *dst_ptr, int32_t pitch, int32_t rows,
- int32_t cols, int32_t flimit)
-{
- int32_t row, col, cnt, i;
- const int16_t *rv3 = &vp8_rv_msa[63 & rand()];
- v4i32 flimit_vec;
- v16u8 dst7, dst8, dst_r_b, dst_l_b;
- v16i8 mask;
- v8u16 add_r, add_l;
- v8i16 dst_r_h, dst_l_h, sub_r, sub_l, mask0, mask1;
- v4i32 sub0, sub1, sub2, sub3, total0, total1, total2, total3;
-
- flimit_vec = __msa_fill_w(flimit);
-
- for (col = 0; col < (cols >> 4); ++col)
- {
- uint8_t *dst_tmp = &dst_ptr[col << 4];
- v16u8 dst;
- v16i8 zero = { 0 };
- v16u8 tmp[16];
- v8i16 mult0, mult1, rv2_0, rv2_1;
- v8i16 sum0_h = { 0 };
- v8i16 sum1_h = { 0 };
- v4i32 mul0 = { 0 };
- v4i32 mul1 = { 0 };
- v4i32 mul2 = { 0 };
- v4i32 mul3 = { 0 };
- v4i32 sum0_w, sum1_w, sum2_w, sum3_w;
- v4i32 add0, add1, add2, add3;
- const int16_t *rv2[16];
-
- dst = LD_UB(dst_tmp);
- for (cnt = (col << 4), i = 0; i < 16; ++cnt)
- {
- rv2[i] = rv3 + ((cnt * 17) & 127);
- ++i;
- }
- for (cnt = -8; cnt < 0; ++cnt)
- {
- ST_UB(dst, dst_tmp + cnt * pitch);
- }
-
- dst = LD_UB((dst_tmp + (rows - 1) * pitch));
- for (cnt = rows; cnt < rows + 17; ++cnt)
- {
- ST_UB(dst, dst_tmp + cnt * pitch);
- }
- for (cnt = -8; cnt <= 6; ++cnt)
- {
- dst = LD_UB(dst_tmp + (cnt * pitch));
- UNPCK_UB_SH(dst, dst_r_h, dst_l_h);
- MUL2(dst_r_h, dst_r_h, dst_l_h, dst_l_h, mult0, mult1);
- mul0 += (v4i32)__msa_ilvr_h((v8i16)zero, (v8i16)mult0);
- mul1 += (v4i32)__msa_ilvl_h((v8i16)zero, (v8i16)mult0);
- mul2 += (v4i32)__msa_ilvr_h((v8i16)zero, (v8i16)mult1);
- mul3 += (v4i32)__msa_ilvl_h((v8i16)zero, (v8i16)mult1);
- ADD2(sum0_h, dst_r_h, sum1_h, dst_l_h, sum0_h, sum1_h);
- }
-
- for (row = 0; row < (rows + 8); ++row)
- {
- for (i = 0; i < 8; ++i)
- {
- rv2_0[i] = *(rv2[i] + (row & 127));
- rv2_1[i] = *(rv2[i + 8] + (row & 127));
- }
- dst7 = LD_UB(dst_tmp + (7 * pitch));
- dst8 = LD_UB(dst_tmp - (8 * pitch));
- ILVRL_B2_UB(dst7, dst8, dst_r_b, dst_l_b);
-
- HSUB_UB2_SH(dst_r_b, dst_l_b, sub_r, sub_l);
- UNPCK_SH_SW(sub_r, sub0, sub1);
- UNPCK_SH_SW(sub_l, sub2, sub3);
- sum0_h += sub_r;
- sum1_h += sub_l;
-
- HADD_UB2_UH(dst_r_b, dst_l_b, add_r, add_l);
-
- ILVRL_H2_SW(zero, add_r, add0, add1);
- ILVRL_H2_SW(zero, add_l, add2, add3);
- mul0 += add0 * sub0;
- mul1 += add1 * sub1;
- mul2 += add2 * sub2;
- mul3 += add3 * sub3;
- dst = LD_UB(dst_tmp);
- ILVRL_B2_SH(zero, dst, dst_r_h, dst_l_h);
- dst7 = (v16u8)((rv2_0 + sum0_h + dst_r_h) >> 4);
- dst8 = (v16u8)((rv2_1 + sum1_h + dst_l_h) >> 4);
- tmp[row & 15] = (v16u8)__msa_pckev_b((v16i8)dst8, (v16i8)dst7);
-
- UNPCK_SH_SW(sum0_h, sum0_w, sum1_w);
- UNPCK_SH_SW(sum1_h, sum2_w, sum3_w);
- total0 = mul0 * __msa_ldi_w(15);
- total0 -= sum0_w * sum0_w;
- total1 = mul1 * __msa_ldi_w(15);
- total1 -= sum1_w * sum1_w;
- total2 = mul2 * __msa_ldi_w(15);
- total2 -= sum2_w * sum2_w;
- total3 = mul3 * __msa_ldi_w(15);
- total3 -= sum3_w * sum3_w;
- total0 = (total0 < flimit_vec);
- total1 = (total1 < flimit_vec);
- total2 = (total2 < flimit_vec);
- total3 = (total3 < flimit_vec);
- PCKEV_H2_SH(total1, total0, total3, total2, mask0, mask1);
- mask = __msa_pckev_b((v16i8)mask1, (v16i8)mask0);
- tmp[row & 15] = __msa_bmz_v(tmp[row & 15], dst, (v16u8)mask);
-
- if (row >= 8)
- {
- ST_UB(tmp[(row - 8) & 15], (dst_tmp - 8 * pitch));
- }
-
- dst_tmp += pitch;
- }
- }
-}
diff --git a/vp8/common/postproc.c b/vp8/common/postproc.c
index 6baf00f..05e2bfc 100644
--- a/vp8/common/postproc.c
+++ b/vp8/common/postproc.c
@@ -12,6 +12,7 @@
#include "vpx_config.h"
#include "vpx_dsp_rtcd.h"
#include "vp8_rtcd.h"
+#include "vpx_dsp/postproc.h"
#include "vpx_scale_rtcd.h"
#include "vpx_scale/yv12config.h"
#include "postproc.h"
@@ -72,142 +73,11 @@
};
#endif
-const short vp8_rv[] =
-{
- 8, 5, 2, 2, 8, 12, 4, 9, 8, 3,
- 0, 3, 9, 0, 0, 0, 8, 3, 14, 4,
- 10, 1, 11, 14, 1, 14, 9, 6, 12, 11,
- 8, 6, 10, 0, 0, 8, 9, 0, 3, 14,
- 8, 11, 13, 4, 2, 9, 0, 3, 9, 6,
- 1, 2, 3, 14, 13, 1, 8, 2, 9, 7,
- 3, 3, 1, 13, 13, 6, 6, 5, 2, 7,
- 11, 9, 11, 8, 7, 3, 2, 0, 13, 13,
- 14, 4, 12, 5, 12, 10, 8, 10, 13, 10,
- 4, 14, 4, 10, 0, 8, 11, 1, 13, 7,
- 7, 14, 6, 14, 13, 2, 13, 5, 4, 4,
- 0, 10, 0, 5, 13, 2, 12, 7, 11, 13,
- 8, 0, 4, 10, 7, 2, 7, 2, 2, 5,
- 3, 4, 7, 3, 3, 14, 14, 5, 9, 13,
- 3, 14, 3, 6, 3, 0, 11, 8, 13, 1,
- 13, 1, 12, 0, 10, 9, 7, 6, 2, 8,
- 5, 2, 13, 7, 1, 13, 14, 7, 6, 7,
- 9, 6, 10, 11, 7, 8, 7, 5, 14, 8,
- 4, 4, 0, 8, 7, 10, 0, 8, 14, 11,
- 3, 12, 5, 7, 14, 3, 14, 5, 2, 6,
- 11, 12, 12, 8, 0, 11, 13, 1, 2, 0,
- 5, 10, 14, 7, 8, 0, 4, 11, 0, 8,
- 0, 3, 10, 5, 8, 0, 11, 6, 7, 8,
- 10, 7, 13, 9, 2, 5, 1, 5, 10, 2,
- 4, 3, 5, 6, 10, 8, 9, 4, 11, 14,
- 0, 10, 0, 5, 13, 2, 12, 7, 11, 13,
- 8, 0, 4, 10, 7, 2, 7, 2, 2, 5,
- 3, 4, 7, 3, 3, 14, 14, 5, 9, 13,
- 3, 14, 3, 6, 3, 0, 11, 8, 13, 1,
- 13, 1, 12, 0, 10, 9, 7, 6, 2, 8,
- 5, 2, 13, 7, 1, 13, 14, 7, 6, 7,
- 9, 6, 10, 11, 7, 8, 7, 5, 14, 8,
- 4, 4, 0, 8, 7, 10, 0, 8, 14, 11,
- 3, 12, 5, 7, 14, 3, 14, 5, 2, 6,
- 11, 12, 12, 8, 0, 11, 13, 1, 2, 0,
- 5, 10, 14, 7, 8, 0, 4, 11, 0, 8,
- 0, 3, 10, 5, 8, 0, 11, 6, 7, 8,
- 10, 7, 13, 9, 2, 5, 1, 5, 10, 2,
- 4, 3, 5, 6, 10, 8, 9, 4, 11, 14,
- 3, 8, 3, 7, 8, 5, 11, 4, 12, 3,
- 11, 9, 14, 8, 14, 13, 4, 3, 1, 2,
- 14, 6, 5, 4, 4, 11, 4, 6, 2, 1,
- 5, 8, 8, 12, 13, 5, 14, 10, 12, 13,
- 0, 9, 5, 5, 11, 10, 13, 9, 10, 13,
-};
extern void vp8_blit_text(const char *msg, unsigned char *address, const int pitch);
extern void vp8_blit_line(int x0, int x1, int y0, int y1, unsigned char *image, const int pitch);
/***********************************************************************************************************
*/
-void vp8_post_proc_down_and_across_mb_row_c
-(
- unsigned char *src_ptr,
- unsigned char *dst_ptr,
- int src_pixels_per_line,
- int dst_pixels_per_line,
- int cols,
- unsigned char *f,
- int size
-)
-{
- unsigned char *p_src, *p_dst;
- int row;
- int col;
- unsigned char v;
- unsigned char d[4];
-
- for (row = 0; row < size; row++)
- {
- /* post_proc_down for one row */
- p_src = src_ptr;
- p_dst = dst_ptr;
-
- for (col = 0; col < cols; col++)
- {
- unsigned char p_above2 = p_src[col - 2 * src_pixels_per_line];
- unsigned char p_above1 = p_src[col - src_pixels_per_line];
- unsigned char p_below1 = p_src[col + src_pixels_per_line];
- unsigned char p_below2 = p_src[col + 2 * src_pixels_per_line];
-
- v = p_src[col];
-
- if ((abs(v - p_above2) < f[col]) && (abs(v - p_above1) < f[col])
- && (abs(v - p_below1) < f[col]) && (abs(v - p_below2) < f[col]))
- {
- unsigned char k1, k2, k3;
- k1 = (p_above2 + p_above1 + 1) >> 1;
- k2 = (p_below2 + p_below1 + 1) >> 1;
- k3 = (k1 + k2 + 1) >> 1;
- v = (k3 + v + 1) >> 1;
- }
-
- p_dst[col] = v;
- }
-
- /* now post_proc_across */
- p_src = dst_ptr;
- p_dst = dst_ptr;
-
- p_src[-2] = p_src[-1] = p_src[0];
- p_src[cols] = p_src[cols + 1] = p_src[cols - 1];
-
- for (col = 0; col < cols; col++)
- {
- v = p_src[col];
-
- if ((abs(v - p_src[col - 2]) < f[col])
- && (abs(v - p_src[col - 1]) < f[col])
- && (abs(v - p_src[col + 1]) < f[col])
- && (abs(v - p_src[col + 2]) < f[col]))
- {
- unsigned char k1, k2, k3;
- k1 = (p_src[col - 2] + p_src[col - 1] + 1) >> 1;
- k2 = (p_src[col + 2] + p_src[col + 1] + 1) >> 1;
- k3 = (k1 + k2 + 1) >> 1;
- v = (k3 + v + 1) >> 1;
- }
-
- d[col & 3] = v;
-
- if (col >= 2)
- p_dst[col - 2] = d[(col - 2) & 3];
- }
-
- /* handle the last two pixels */
- p_dst[col - 2] = d[(col - 2) & 3];
- p_dst[col - 1] = d[(col - 1) & 3];
-
- /* next row */
- src_ptr += src_pixels_per_line;
- dst_ptr += dst_pixels_per_line;
- }
-}
-
static int q2mbl(int x)
{
if (x < 20) x = 20;
@@ -216,108 +86,13 @@
return x * x / 3;
}
-void vp8_mbpost_proc_across_ip_c(unsigned char *src, int pitch, int rows, int cols, int flimit)
-{
- int r, c, i;
-
- unsigned char *s = src;
- unsigned char d[16];
-
- for (r = 0; r < rows; r++)
- {
- int sumsq = 0;
- int sum = 0;
-
- for (i = -8; i < 0; i++)
- s[i]=s[0];
-
- /* 17 avoids valgrind warning - we buffer values in c in d
- * and only write them when we've read 8 ahead...
- */
- for (i = 0; i < 17; i++)
- s[i+cols]=s[cols-1];
-
- for (i = -8; i <= 6; i++)
- {
- sumsq += s[i] * s[i];
- sum += s[i];
- d[i+8] = 0;
- }
-
- for (c = 0; c < cols + 8; c++)
- {
- int x = s[c+7] - s[c-8];
- int y = s[c+7] + s[c-8];
-
- sum += x;
- sumsq += x * y;
-
- d[c&15] = s[c];
-
- if (sumsq * 15 - sum * sum < flimit)
- {
- d[c&15] = (8 + sum + s[c]) >> 4;
- }
-
- s[c-8] = d[(c-8)&15];
- }
-
- s += pitch;
- }
-}
-
-void vp8_mbpost_proc_down_c(unsigned char *dst, int pitch, int rows, int cols, int flimit)
-{
- int r, c, i;
- const short *rv3 = &vp8_rv[63&rand()];
-
- for (c = 0; c < cols; c++ )
- {
- unsigned char *s = &dst[c];
- int sumsq = 0;
- int sum = 0;
- unsigned char d[16];
- const short *rv2 = rv3 + ((c * 17) & 127);
-
- for (i = -8; i < 0; i++)
- s[i*pitch]=s[0];
-
- /* 17 avoids valgrind warning - we buffer values in c in d
- * and only write them when we've read 8 ahead...
- */
- for (i = 0; i < 17; i++)
- s[(i+rows)*pitch]=s[(rows-1)*pitch];
-
- for (i = -8; i <= 6; i++)
- {
- sumsq += s[i*pitch] * s[i*pitch];
- sum += s[i*pitch];
- }
-
- for (r = 0; r < rows + 8; r++)
- {
- sumsq += s[7*pitch] * s[ 7*pitch] - s[-8*pitch] * s[-8*pitch];
- sum += s[7*pitch] - s[-8*pitch];
- d[r&15] = s[0];
-
- if (sumsq * 15 - sum * sum < flimit)
- {
- d[r&15] = (rv2[r&127] + sum + s[0]) >> 4;
- }
- if (r >= 8)
- s[-8*pitch] = d[(r-8)&15];
- s += pitch;
- }
- }
-}
-
#if CONFIG_POSTPROC
static void vp8_de_mblock(YV12_BUFFER_CONFIG *post,
int q)
{
- vp8_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height,
+ vpx_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height,
post->y_width, q2mbl(q));
- vp8_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height,
+ vpx_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height,
post->y_width, q2mbl(q));
}
@@ -365,16 +140,16 @@
}
mode_info_context++;
- vp8_post_proc_down_and_across_mb_row(
+ vpx_post_proc_down_and_across_mb_row(
source->y_buffer + 16 * mbr * source->y_stride,
post->y_buffer + 16 * mbr * post->y_stride, source->y_stride,
post->y_stride, source->y_width, ylimits, 16);
- vp8_post_proc_down_and_across_mb_row(
+ vpx_post_proc_down_and_across_mb_row(
source->u_buffer + 8 * mbr * source->uv_stride,
post->u_buffer + 8 * mbr * post->uv_stride, source->uv_stride,
post->uv_stride, source->uv_width, uvlimits, 8);
- vp8_post_proc_down_and_across_mb_row(
+ vpx_post_proc_down_and_across_mb_row(
source->v_buffer + 8 * mbr * source->uv_stride,
post->v_buffer + 8 * mbr * post->uv_stride, source->uv_stride,
post->uv_stride, source->uv_width, uvlimits, 8);
@@ -409,17 +184,17 @@
/* TODO: The original code doesn't filter the 2 outer rows and columns. */
for (mbr = 0; mbr < mb_rows; mbr++)
{
- vp8_post_proc_down_and_across_mb_row(
+ vpx_post_proc_down_and_across_mb_row(
source->y_buffer + 16 * mbr * source->y_stride,
source->y_buffer + 16 * mbr * source->y_stride,
source->y_stride, source->y_stride, source->y_width, limits, 16);
if (uvfilter == 1) {
- vp8_post_proc_down_and_across_mb_row(
+ vpx_post_proc_down_and_across_mb_row(
source->u_buffer + 8 * mbr * source->uv_stride,
source->u_buffer + 8 * mbr * source->uv_stride,
source->uv_stride, source->uv_stride, source->uv_width, limits,
8);
- vp8_post_proc_down_and_across_mb_row(
+ vpx_post_proc_down_and_across_mb_row(
source->v_buffer + 8 * mbr * source->uv_stride,
source->v_buffer + 8 * mbr * source->uv_stride,
source->uv_stride, source->uv_stride, source->uv_width, limits,
@@ -428,69 +203,6 @@
}
}
-static double gaussian(double sigma, double mu, double x)
-{
- return 1 / (sigma * sqrt(2.0 * 3.14159265)) *
- (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma)));
-}
-
-static void fillrd(struct postproc_state *state, int q, int a)
-{
- char char_dist[300];
-
- double sigma;
- int i;
-
- vp8_clear_system_state();
-
-
- sigma = a + .5 + .6 * (63 - q) / 63.0;
-
- /* set up a lookup table of 256 entries that matches
- * a gaussian distribution with sigma determined by q.
- */
- {
- int next, j;
-
- next = 0;
-
- for (i = -32; i < 32; i++)
- {
- const int v = (int)(.5 + 256 * gaussian(sigma, 0, i));
-
- if (v)
- {
- for (j = 0; j < v; j++)
- {
- char_dist[next+j] = (char) i;
- }
-
- next = next + j;
- }
-
- }
-
- for (; next < 256; next++)
- char_dist[next] = 0;
-
- }
-
- for (i = 0; i < 3072; i++)
- {
- state->noise[i] = char_dist[rand() & 0xff];
- }
-
- for (i = 0; i < 16; i++)
- {
- state->blackclamp[i] = -char_dist[0];
- state->whiteclamp[i] = -char_dist[0];
- state->bothclamp[i] = -2 * char_dist[0];
- }
-
- state->last_q = q;
- state->last_noise = a;
-}
-
/* Blend the macro block with a solid colored square. Leave the
* edges unblended to give distinction to macro blocks in areas
* filled with the same color block.
@@ -778,7 +490,22 @@
if (oci->postproc_state.last_q != q
|| oci->postproc_state.last_noise != noise_level)
{
- fillrd(&oci->postproc_state, 63 - q, noise_level);
+ double sigma;
+ int clamp, i;
+ struct postproc_state *ppstate = &oci->postproc_state;
+ vp8_clear_system_state();
+ sigma = noise_level + .5 + .6 * q / 63.0;
+ clamp = vpx_setup_noise(sigma, sizeof(ppstate->noise),
+ ppstate->noise);
+ for (i = 0; i < 16; i++)
+ {
+ ppstate->blackclamp[i] = clamp;
+ ppstate->whiteclamp[i] = clamp;
+ ppstate->bothclamp[i] = 2 * clamp;
+ }
+
+ ppstate->last_q = q;
+ ppstate->last_noise = noise_level;
}
vpx_plane_add_noise
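
Note on the postproc.c hunk above: fillrd() and its Gaussian lookup are gone
from vp8; the postproc state is now primed via vpx_setup_noise() from
vpx_dsp/postproc.h, with the caller spreading the returned clamp into
blackclamp/whiteclamp/bothclamp. A hedged sketch of what such a helper does,
inferred from the deleted fillrd() above rather than from the vpx_dsp source:

    #include <math.h>
    #include <stdlib.h>

    static double gaussian(double sigma, double mu, double x) {
      return 1 / (sigma * sqrt(2.0 * 3.14159265)) *
             exp(-(x - mu) * (x - mu) / (2 * sigma * sigma));
    }

    /* Fill `noise` from a 256-entry lookup matching a Gaussian with the
     * given sigma; return the clamp magnitude used on both sides. */
    int setup_noise_sketch(double sigma, int size, char *noise) {
      char char_dist[300]; /* headroom for rounding, as in fillrd() */
      int next = 0, i, j;

      for (i = -32; i < 32; ++i) {
        const int v = (int)(0.5 + 256 * gaussian(sigma, 0, i));
        for (j = 0; j < v; ++j) char_dist[next + j] = (char)i;
        next += v;
      }
      for (; next < 256; ++next) char_dist[next] = 0;

      for (i = 0; i < size; ++i) noise[i] = char_dist[rand() & 0xff];

      /* char_dist[0] is the most negative sample generated, so its
       * negation is the clamp the caller applies on both sides. */
      return -char_dist[0];
    }

Only the call shape (sigma, size, noise) and the int return are taken from
the new call site above; the sketch name and body are illustrative.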
diff --git a/vp8/common/rtcd_defs.pl b/vp8/common/rtcd_defs.pl
index 856ede1..a440352 100644
--- a/vp8/common/rtcd_defs.pl
+++ b/vp8/common/rtcd_defs.pl
@@ -156,16 +156,6 @@
# Postproc
#
if (vpx_config("CONFIG_POSTPROC") eq "yes") {
- add_proto qw/void vp8_mbpost_proc_down/, "unsigned char *dst, int pitch, int rows, int cols,int flimit";
- specialize qw/vp8_mbpost_proc_down mmx sse2 msa/;
- $vp8_mbpost_proc_down_sse2=vp8_mbpost_proc_down_xmm;
-
- add_proto qw/void vp8_mbpost_proc_across_ip/, "unsigned char *dst, int pitch, int rows, int cols,int flimit";
- specialize qw/vp8_mbpost_proc_across_ip sse2 msa/;
- $vp8_mbpost_proc_across_ip_sse2=vp8_mbpost_proc_across_ip_xmm;
-
- add_proto qw/void vp8_post_proc_down_and_across_mb_row/, "unsigned char *src, unsigned char *dst, int src_pitch, int dst_pitch, int cols, unsigned char *flimits, int size";
- specialize qw/vp8_post_proc_down_and_across_mb_row sse2 msa/;
add_proto qw/void vp8_blend_mb_inner/, "unsigned char *y, unsigned char *u, unsigned char *v, int y1, int u1, int v1, int alpha, int stride";
# no asm yet
diff --git a/vp8/common/x86/mfqe_sse2.asm b/vp8/common/x86/mfqe_sse2.asm
index a8a7f56..8177b79 100644
--- a/vp8/common/x86/mfqe_sse2.asm
+++ b/vp8/common/x86/mfqe_sse2.asm
@@ -45,7 +45,7 @@
mov rcx, 16 ; loop count
pxor xmm6, xmm6
-.combine
+.combine:
movdqa xmm2, [rax]
movdqa xmm4, [rdx]
add rax, rsi
@@ -122,7 +122,7 @@
mov rcx, 8 ; loop count
pxor xmm4, xmm4
-.combine
+.combine:
movq xmm2, [rax]
movq xmm3, [rdx]
add rax, rsi
@@ -189,7 +189,7 @@
; Because we're working with the actual output frames
; we can't depend on any kind of data alignment.
-.accumulate
+.accumulate:
movdqa xmm0, [rax] ; src1
movdqa xmm1, [rdx] ; src2
add rax, rcx ; src1 + stride1
diff --git a/vp8/common/x86/postproc_mmx.asm b/vp8/common/x86/postproc_mmx.asm
deleted file mode 100644
index 1a89e7e..0000000
--- a/vp8/common/x86/postproc_mmx.asm
+++ /dev/null
@@ -1,253 +0,0 @@
-;
-; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
-;
-; Use of this source code is governed by a BSD-style license
-; that can be found in the LICENSE file in the root of the source
-; tree. An additional intellectual property rights grant can be found
-; in the file PATENTS. All contributing project authors may
-; be found in the AUTHORS file in the root of the source tree.
-;
-
-
-%include "vpx_ports/x86_abi_support.asm"
-
-%define VP8_FILTER_WEIGHT 128
-%define VP8_FILTER_SHIFT 7
-
-;void vp8_mbpost_proc_down_mmx(unsigned char *dst,
-; int pitch, int rows, int cols,int flimit)
-extern sym(vp8_rv)
-global sym(vp8_mbpost_proc_down_mmx) PRIVATE
-sym(vp8_mbpost_proc_down_mmx):
- push rbp
- mov rbp, rsp
- SHADOW_ARGS_TO_STACK 5
- GET_GOT rbx
- push rsi
- push rdi
- ; end prolog
-
- ALIGN_STACK 16, rax
- sub rsp, 136
-
- ; unsigned char d[16][8] at [rsp]
- ; create flimit2 at [rsp+128]
- mov eax, dword ptr arg(4) ;flimit
- mov [rsp+128], eax
- mov [rsp+128+4], eax
-%define flimit2 [rsp+128]
-
-%if ABI_IS_32BIT=0
- lea r8, [GLOBAL(sym(vp8_rv))]
-%endif
-
- ;rows +=8;
- add dword ptr arg(2), 8
-
- ;for(c=0; c<cols; c+=4)
-.loop_col:
- mov rsi, arg(0) ;s
- pxor mm0, mm0 ;
-
- movsxd rax, dword ptr arg(1) ;pitch ;
-
- ; this copies the last row down into the border 8 rows
- mov rdi, rsi
- mov rdx, arg(2)
- sub rdx, 9
- imul rdx, rax
- lea rdi, [rdi+rdx]
- movq mm1, QWORD ptr[rdi] ; first row
- mov rcx, 8
-.init_borderd ; initialize borders
- lea rdi, [rdi + rax]
- movq [rdi], mm1
-
- dec rcx
- jne .init_borderd
-
- neg rax ; rax = -pitch
-
- ; this copies the first row up into the border 8 rows
- mov rdi, rsi
- movq mm1, QWORD ptr[rdi] ; first row
- mov rcx, 8
-.init_border ; initialize borders
- lea rdi, [rdi + rax]
- movq [rdi], mm1
-
- dec rcx
- jne .init_border
-
-
- lea rsi, [rsi + rax*8]; ; rdi = s[-pitch*8]
- neg rax
-
-
- pxor mm5, mm5
- pxor mm6, mm6 ;
-
- pxor mm7, mm7 ;
- mov rdi, rsi
-
- mov rcx, 15 ;
-
-.loop_initvar:
- movd mm1, DWORD PTR [rdi];
- punpcklbw mm1, mm0 ;
-
- paddw mm5, mm1 ;
- pmullw mm1, mm1 ;
-
- movq mm2, mm1 ;
- punpcklwd mm1, mm0 ;
-
- punpckhwd mm2, mm0 ;
- paddd mm6, mm1 ;
-
- paddd mm7, mm2 ;
- lea rdi, [rdi+rax] ;
-
- dec rcx
- jne .loop_initvar
- ;save the var and sum
- xor rdx, rdx
-.loop_row:
- movd mm1, DWORD PTR [rsi] ; [s-pitch*8]
- movd mm2, DWORD PTR [rdi] ; [s+pitch*7]
-
- punpcklbw mm1, mm0
- punpcklbw mm2, mm0
-
- paddw mm5, mm2
- psubw mm5, mm1
-
- pmullw mm2, mm2
- movq mm4, mm2
-
- punpcklwd mm2, mm0
- punpckhwd mm4, mm0
-
- paddd mm6, mm2
- paddd mm7, mm4
-
- pmullw mm1, mm1
- movq mm2, mm1
-
- punpcklwd mm1, mm0
- psubd mm6, mm1
-
- punpckhwd mm2, mm0
- psubd mm7, mm2
-
-
- movq mm3, mm6
- pslld mm3, 4
-
- psubd mm3, mm6
- movq mm1, mm5
-
- movq mm4, mm5
- pmullw mm1, mm1
-
- pmulhw mm4, mm4
- movq mm2, mm1
-
- punpcklwd mm1, mm4
- punpckhwd mm2, mm4
-
- movq mm4, mm7
- pslld mm4, 4
-
- psubd mm4, mm7
-
- psubd mm3, mm1
- psubd mm4, mm2
-
- psubd mm3, flimit2
- psubd mm4, flimit2
-
- psrad mm3, 31
- psrad mm4, 31
-
- packssdw mm3, mm4
- packsswb mm3, mm0
-
- movd mm1, DWORD PTR [rsi+rax*8]
-
- movq mm2, mm1
- punpcklbw mm1, mm0
-
- paddw mm1, mm5
- mov rcx, rdx
-
- and rcx, 127
-%if ABI_IS_32BIT=1 && CONFIG_PIC=1
- push rax
- lea rax, [GLOBAL(sym(vp8_rv))]
- movq mm4, [rax + rcx*2] ;vp8_rv[rcx*2]
- pop rax
-%elif ABI_IS_32BIT=0
- movq mm4, [r8 + rcx*2] ;vp8_rv[rcx*2]
-%else
- movq mm4, [sym(vp8_rv) + rcx*2]
-%endif
- paddw mm1, mm4
- psraw mm1, 4
-
- packuswb mm1, mm0
- pand mm1, mm3
-
- pandn mm3, mm2
- por mm1, mm3
-
- and rcx, 15
- movd DWORD PTR [rsp+rcx*4], mm1 ;d[rcx*4]
-
- cmp edx, 8
- jl .skip_assignment
-
- mov rcx, rdx
- sub rcx, 8
- and rcx, 15
- movd mm1, DWORD PTR [rsp+rcx*4] ;d[rcx*4]
- movd [rsi], mm1
-
-.skip_assignment
- lea rsi, [rsi+rax]
-
- lea rdi, [rdi+rax]
- add rdx, 1
-
- cmp edx, dword arg(2) ;rows
- jl .loop_row
-
-
- add dword arg(0), 4 ; s += 4
- sub dword arg(3), 4 ; cols -= 4
- cmp dword arg(3), 0
- jg .loop_col
-
- add rsp, 136
- pop rsp
-
- ; begin epilog
- pop rdi
- pop rsi
- RESTORE_GOT
- UNSHADOW_ARGS
- pop rbp
- ret
-%undef flimit2
-
-
-SECTION_RODATA
-align 16
-Blur:
- times 16 dw 16
- times 8 dw 64
- times 16 dw 16
- times 8 dw 0
-
-rd:
- times 4 dw 0x40
diff --git a/vp8/vp8_common.mk b/vp8/vp8_common.mk
index 4c4e856..ec39d2e 100644
--- a/vp8/vp8_common.mk
+++ b/vp8/vp8_common.mk
@@ -96,9 +96,7 @@
VP8_COMMON_SRCS-$(HAVE_SSSE3) += common/x86/subpixel_ssse3.asm
ifeq ($(CONFIG_POSTPROC),yes)
-VP8_COMMON_SRCS-$(HAVE_MMX) += common/x86/postproc_mmx.asm
VP8_COMMON_SRCS-$(HAVE_SSE2) += common/x86/mfqe_sse2.asm
-VP8_COMMON_SRCS-$(HAVE_SSE2) += common/x86/postproc_sse2.asm
endif
ifeq ($(ARCH_X86_64),yes)
@@ -123,7 +121,6 @@
ifeq ($(CONFIG_POSTPROC),yes)
VP8_COMMON_SRCS-$(HAVE_MSA) += common/mips/msa/mfqe_msa.c
-VP8_COMMON_SRCS-$(HAVE_MSA) += common/mips/msa/postproc_msa.c
endif
# common (c)
diff --git a/vp9/common/vp9_alloccommon.c b/vp9/common/vp9_alloccommon.c
index 7dd1005..b4b120b 100644
--- a/vp9/common/vp9_alloccommon.c
+++ b/vp9/common/vp9_alloccommon.c
@@ -103,6 +103,8 @@
#if CONFIG_VP9_POSTPROC
vpx_free_frame_buffer(&cm->post_proc_buffer);
vpx_free_frame_buffer(&cm->post_proc_buffer_int);
+ vpx_free(cm->postproc_state.limits);
+ cm->postproc_state.limits = 0;
#else
(void)cm;
#endif
diff --git a/vp9/common/vp9_postproc.c b/vp9/common/vp9_postproc.c
index c04cc8f..4651f67 100644
--- a/vp9/common/vp9_postproc.c
+++ b/vp9/common/vp9_postproc.c
@@ -18,6 +18,7 @@
#include "./vp9_rtcd.h"
#include "vpx_dsp/vpx_dsp_common.h"
+#include "vpx_dsp/postproc.h"
#include "vpx_ports/mem.h"
#include "vpx_ports/system_state.h"
#include "vpx_scale/vpx_scale.h"
@@ -32,128 +33,9 @@
1, 1, 4, 1, 1
};
-const int16_t vp9_rv[] = {
- 8, 5, 2, 2, 8, 12, 4, 9, 8, 3,
- 0, 3, 9, 0, 0, 0, 8, 3, 14, 4,
- 10, 1, 11, 14, 1, 14, 9, 6, 12, 11,
- 8, 6, 10, 0, 0, 8, 9, 0, 3, 14,
- 8, 11, 13, 4, 2, 9, 0, 3, 9, 6,
- 1, 2, 3, 14, 13, 1, 8, 2, 9, 7,
- 3, 3, 1, 13, 13, 6, 6, 5, 2, 7,
- 11, 9, 11, 8, 7, 3, 2, 0, 13, 13,
- 14, 4, 12, 5, 12, 10, 8, 10, 13, 10,
- 4, 14, 4, 10, 0, 8, 11, 1, 13, 7,
- 7, 14, 6, 14, 13, 2, 13, 5, 4, 4,
- 0, 10, 0, 5, 13, 2, 12, 7, 11, 13,
- 8, 0, 4, 10, 7, 2, 7, 2, 2, 5,
- 3, 4, 7, 3, 3, 14, 14, 5, 9, 13,
- 3, 14, 3, 6, 3, 0, 11, 8, 13, 1,
- 13, 1, 12, 0, 10, 9, 7, 6, 2, 8,
- 5, 2, 13, 7, 1, 13, 14, 7, 6, 7,
- 9, 6, 10, 11, 7, 8, 7, 5, 14, 8,
- 4, 4, 0, 8, 7, 10, 0, 8, 14, 11,
- 3, 12, 5, 7, 14, 3, 14, 5, 2, 6,
- 11, 12, 12, 8, 0, 11, 13, 1, 2, 0,
- 5, 10, 14, 7, 8, 0, 4, 11, 0, 8,
- 0, 3, 10, 5, 8, 0, 11, 6, 7, 8,
- 10, 7, 13, 9, 2, 5, 1, 5, 10, 2,
- 4, 3, 5, 6, 10, 8, 9, 4, 11, 14,
- 0, 10, 0, 5, 13, 2, 12, 7, 11, 13,
- 8, 0, 4, 10, 7, 2, 7, 2, 2, 5,
- 3, 4, 7, 3, 3, 14, 14, 5, 9, 13,
- 3, 14, 3, 6, 3, 0, 11, 8, 13, 1,
- 13, 1, 12, 0, 10, 9, 7, 6, 2, 8,
- 5, 2, 13, 7, 1, 13, 14, 7, 6, 7,
- 9, 6, 10, 11, 7, 8, 7, 5, 14, 8,
- 4, 4, 0, 8, 7, 10, 0, 8, 14, 11,
- 3, 12, 5, 7, 14, 3, 14, 5, 2, 6,
- 11, 12, 12, 8, 0, 11, 13, 1, 2, 0,
- 5, 10, 14, 7, 8, 0, 4, 11, 0, 8,
- 0, 3, 10, 5, 8, 0, 11, 6, 7, 8,
- 10, 7, 13, 9, 2, 5, 1, 5, 10, 2,
- 4, 3, 5, 6, 10, 8, 9, 4, 11, 14,
- 3, 8, 3, 7, 8, 5, 11, 4, 12, 3,
- 11, 9, 14, 8, 14, 13, 4, 3, 1, 2,
- 14, 6, 5, 4, 4, 11, 4, 6, 2, 1,
- 5, 8, 8, 12, 13, 5, 14, 10, 12, 13,
- 0, 9, 5, 5, 11, 10, 13, 9, 10, 13,
-};
-
static const uint8_t q_diff_thresh = 20;
static const uint8_t last_q_thresh = 170;
-
-void vp9_post_proc_down_and_across_c(const uint8_t *src_ptr,
- uint8_t *dst_ptr,
- int src_pixels_per_line,
- int dst_pixels_per_line,
- int rows,
- int cols,
- int flimit) {
- uint8_t const *p_src;
- uint8_t *p_dst;
- int row, col, i, v, kernel;
- int pitch = src_pixels_per_line;
- uint8_t d[8];
- (void)dst_pixels_per_line;
-
- for (row = 0; row < rows; row++) {
- /* post_proc_down for one row */
- p_src = src_ptr;
- p_dst = dst_ptr;
-
- for (col = 0; col < cols; col++) {
- kernel = 4;
- v = p_src[col];
-
- for (i = -2; i <= 2; i++) {
- if (abs(v - p_src[col + i * pitch]) > flimit)
- goto down_skip_convolve;
-
- kernel += kernel5[2 + i] * p_src[col + i * pitch];
- }
-
- v = (kernel >> 3);
- down_skip_convolve:
- p_dst[col] = v;
- }
-
- /* now post_proc_across */
- p_src = dst_ptr;
- p_dst = dst_ptr;
-
- for (i = 0; i < 8; i++)
- d[i] = p_src[i];
-
- for (col = 0; col < cols; col++) {
- kernel = 4;
- v = p_src[col];
-
- d[col & 7] = v;
-
- for (i = -2; i <= 2; i++) {
- if (abs(v - p_src[col + i]) > flimit)
- goto across_skip_convolve;
-
- kernel += kernel5[2 + i] * p_src[col + i];
- }
-
- d[col & 7] = (kernel >> 3);
- across_skip_convolve:
-
- if (col >= 2)
- p_dst[col - 2] = d[(col - 2) & 7];
- }
-
- /* handle the last two pixels */
- p_dst[col - 2] = d[(col - 2) & 7];
- p_dst[col - 1] = d[(col - 1) & 7];
-
-
- /* next row */
- src_ptr += pitch;
- dst_ptr += pitch;
- }
-}
+extern const int16_t vpx_rv[];
#if CONFIG_VP9_HIGHBITDEPTH
void vp9_highbd_post_proc_down_and_across_c(const uint16_t *src_ptr,
@@ -237,41 +119,6 @@
return x * x / 3;
}
-void vp9_mbpost_proc_across_ip_c(uint8_t *src, int pitch,
- int rows, int cols, int flimit) {
- int r, c, i;
- uint8_t *s = src;
- uint8_t d[16];
-
- for (r = 0; r < rows; r++) {
- int sumsq = 0;
- int sum = 0;
-
- for (i = -8; i <= 6; i++) {
- sumsq += s[i] * s[i];
- sum += s[i];
- d[i + 8] = 0;
- }
-
- for (c = 0; c < cols + 8; c++) {
- int x = s[c + 7] - s[c - 8];
- int y = s[c + 7] + s[c - 8];
-
- sum += x;
- sumsq += x * y;
-
- d[c & 15] = s[c];
-
- if (sumsq * 15 - sum * sum < flimit) {
- d[c & 15] = (8 + sum + s[c]) >> 4;
- }
-
- s[c - 8] = d[(c - 8) & 15];
- }
- s += pitch;
- }
-}
-
#if CONFIG_VP9_HIGHBITDEPTH
void vp9_highbd_mbpost_proc_across_ip_c(uint16_t *src, int pitch,
int rows, int cols, int flimit) {
@@ -312,43 +159,12 @@
}
#endif // CONFIG_VP9_HIGHBITDEPTH
-void vp9_mbpost_proc_down_c(uint8_t *dst, int pitch,
- int rows, int cols, int flimit) {
- int r, c, i;
- const short *rv3 = &vp9_rv[63 & rand()]; // NOLINT
-
- for (c = 0; c < cols; c++) {
- uint8_t *s = &dst[c];
- int sumsq = 0;
- int sum = 0;
- uint8_t d[16];
- const int16_t *rv2 = rv3 + ((c * 17) & 127);
-
- for (i = -8; i <= 6; i++) {
- sumsq += s[i * pitch] * s[i * pitch];
- sum += s[i * pitch];
- }
-
- for (r = 0; r < rows + 8; r++) {
- sumsq += s[7 * pitch] * s[ 7 * pitch] - s[-8 * pitch] * s[-8 * pitch];
- sum += s[7 * pitch] - s[-8 * pitch];
- d[r & 15] = s[0];
-
- if (sumsq * 15 - sum * sum < flimit) {
- d[r & 15] = (rv2[r & 127] + sum + s[0]) >> 4;
- }
-
- s[-8 * pitch] = d[(r - 8) & 15];
- s += pitch;
- }
- }
-}
#if CONFIG_VP9_HIGHBITDEPTH
void vp9_highbd_mbpost_proc_down_c(uint16_t *dst, int pitch,
int rows, int cols, int flimit) {
int r, c, i;
- const int16_t *rv3 = &vp9_rv[63 & rand()]; // NOLINT
+ const int16_t *rv3 = &vpx_rv[63 & rand()]; // NOLINT
for (c = 0; c < cols; c++) {
uint16_t *s = &dst[c];
@@ -382,14 +198,14 @@
YV12_BUFFER_CONFIG *post,
int q,
int low_var_thresh,
- int flag) {
- double level = 6.0e-05 * q * q * q - .0067 * q * q + .306 * q + .0065;
- int ppl = (int)(level + .5);
+ int flag,
+ uint8_t *limits) {
(void) low_var_thresh;
(void) flag;
-
#if CONFIG_VP9_HIGHBITDEPTH
if (source->flags & YV12_FLAG_HIGHBITDEPTH) {
+ double level = 6.0e-05 * q * q * q - .0067 * q * q + .306 * q + .0065;
+ int ppl = (int)(level + .5);
vp9_highbd_post_proc_down_and_across(CONVERT_TO_SHORTPTR(source->y_buffer),
CONVERT_TO_SHORTPTR(post->y_buffer),
source->y_stride, post->y_stride,
@@ -415,177 +231,68 @@
source->uv_height, source->uv_width,
ppl);
} else {
- vp9_post_proc_down_and_across(source->y_buffer, post->y_buffer,
- source->y_stride, post->y_stride,
- source->y_height, source->y_width, ppl);
-
- vp9_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height,
+#endif // CONFIG_VP9_HIGHBITDEPTH
+ vp9_deblock(source, post, q, limits);
+ vpx_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height,
post->y_width, q2mbl(q));
-
- vp9_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height,
+ vpx_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height,
post->y_width, q2mbl(q));
-
- vp9_post_proc_down_and_across(source->u_buffer, post->u_buffer,
- source->uv_stride, post->uv_stride,
- source->uv_height, source->uv_width, ppl);
- vp9_post_proc_down_and_across(source->v_buffer, post->v_buffer,
- source->uv_stride, post->uv_stride,
- source->uv_height, source->uv_width, ppl);
+#if CONFIG_VP9_HIGHBITDEPTH
}
-#else
- vp9_post_proc_down_and_across(source->y_buffer, post->y_buffer,
- source->y_stride, post->y_stride,
- source->y_height, source->y_width, ppl);
-
- vp9_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height,
- post->y_width, q2mbl(q));
-
- vp9_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height,
- post->y_width, q2mbl(q));
-
- vp9_post_proc_down_and_across(source->u_buffer, post->u_buffer,
- source->uv_stride, post->uv_stride,
- source->uv_height, source->uv_width, ppl);
- vp9_post_proc_down_and_across(source->v_buffer, post->v_buffer,
- source->uv_stride, post->uv_stride,
- source->uv_height, source->uv_width, ppl);
#endif // CONFIG_VP9_HIGHBITDEPTH
}
void vp9_deblock(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst,
- int q) {
+ int q, uint8_t *limits) {
const int ppl = (int)(6.0e-05 * q * q * q - 0.0067 * q * q + 0.306 * q
+ 0.0065 + 0.5);
- int i;
-
- const uint8_t *const srcs[3] = {src->y_buffer, src->u_buffer, src->v_buffer};
- const int src_strides[3] = {src->y_stride, src->uv_stride, src->uv_stride};
- const int src_widths[3] = {src->y_width, src->uv_width, src->uv_width};
- const int src_heights[3] = {src->y_height, src->uv_height, src->uv_height};
-
- uint8_t *const dsts[3] = {dst->y_buffer, dst->u_buffer, dst->v_buffer};
- const int dst_strides[3] = {dst->y_stride, dst->uv_stride, dst->uv_stride};
-
- for (i = 0; i < MAX_MB_PLANE; ++i) {
#if CONFIG_VP9_HIGHBITDEPTH
- assert((src->flags & YV12_FLAG_HIGHBITDEPTH) ==
- (dst->flags & YV12_FLAG_HIGHBITDEPTH));
- if (src->flags & YV12_FLAG_HIGHBITDEPTH) {
+ if (src->flags & YV12_FLAG_HIGHBITDEPTH) {
+ int i;
+ const uint8_t * const srcs[3] =
+ {src->y_buffer, src->u_buffer, src->v_buffer};
+ const int src_strides[3] = {src->y_stride, src->uv_stride, src->uv_stride};
+ const int src_widths[3] = {src->y_width, src->uv_width, src->uv_width};
+ const int src_heights[3] = {src->y_height, src->uv_height, src->uv_height};
+
+ uint8_t * const dsts[3] = {dst->y_buffer, dst->u_buffer, dst->v_buffer};
+ const int dst_strides[3] = {dst->y_stride, dst->uv_stride, dst->uv_stride};
+ for (i = 0; i < MAX_MB_PLANE; ++i) {
vp9_highbd_post_proc_down_and_across(CONVERT_TO_SHORTPTR(srcs[i]),
CONVERT_TO_SHORTPTR(dsts[i]),
src_strides[i], dst_strides[i],
src_heights[i], src_widths[i], ppl);
- } else {
- vp9_post_proc_down_and_across(srcs[i], dsts[i],
- src_strides[i], dst_strides[i],
- src_heights[i], src_widths[i], ppl);
}
-#else
- vp9_post_proc_down_and_across(srcs[i], dsts[i],
- src_strides[i], dst_strides[i],
- src_heights[i], src_widths[i], ppl);
+ } else {
#endif // CONFIG_VP9_HIGHBITDEPTH
+ int mbr;
+ const int mb_rows = src->y_height / 16;
+ const int mb_cols = src->y_width / 16;
+
+ memset(limits, (unsigned char) ppl, 16 * mb_cols);
+
+ for (mbr = 0; mbr < mb_rows; mbr++) {
+ vpx_post_proc_down_and_across_mb_row(
+ src->y_buffer + 16 * mbr * src->y_stride,
+ dst->y_buffer + 16 * mbr * dst->y_stride, src->y_stride,
+ dst->y_stride, src->y_width, limits, 16);
+ vpx_post_proc_down_and_across_mb_row(
+ src->u_buffer + 8 * mbr * src->uv_stride,
+ dst->u_buffer + 8 * mbr * dst->uv_stride, src->uv_stride,
+ dst->uv_stride, src->uv_width, limits, 8);
+ vpx_post_proc_down_and_across_mb_row(
+ src->v_buffer + 8 * mbr * src->uv_stride,
+ dst->v_buffer + 8 * mbr * dst->uv_stride, src->uv_stride,
+ dst->uv_stride, src->uv_width, limits, 8);
+ }
+#if CONFIG_VP9_HIGHBITDEPTH
}
+#endif // CONFIG_VP9_HIGHBITDEPTH
}
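
A note on the hunk above: the ppl formula is retained verbatim; what
changes is its delivery. Instead of passing ppl as a scalar flimit to
per-plane filters, vp9_deblock() now broadcasts it into the per-column
limits[] array consumed by the shared mb-row filter. Worked example for
q = 32:

  level = 6.0e-05 * 32^3 - 0.0067 * 32^2 + 0.306 * 32 + 0.0065
        = 1.96608 - 6.8608 + 9.792 + 0.0065
        = 4.90388
  ppl   = (int)(level + 0.5) = 5

memset() then fills 16 * mb_cols bytes with that value, i.e. the frame
width rounded down to whole macroblocks, which never exceeds the
cm->width bytes the callers allocate for limits.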
void vp9_denoise(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst,
- int q) {
- const int ppl = (int)(6.0e-05 * q * q * q - 0.0067 * q * q + 0.306 * q
- + 0.0065 + 0.5);
- int i;
-
- const uint8_t *const srcs[3] = {src->y_buffer, src->u_buffer, src->v_buffer};
- const int src_strides[3] = {src->y_stride, src->uv_stride, src->uv_stride};
- const int src_widths[3] = {src->y_width, src->uv_width, src->uv_width};
- const int src_heights[3] = {src->y_height, src->uv_height, src->uv_height};
-
- uint8_t *const dsts[3] = {dst->y_buffer, dst->u_buffer, dst->v_buffer};
- const int dst_strides[3] = {dst->y_stride, dst->uv_stride, dst->uv_stride};
-
- for (i = 0; i < MAX_MB_PLANE; ++i) {
- const int src_stride = src_strides[i];
- const int src_width = src_widths[i] - 4;
- const int src_height = src_heights[i] - 4;
- const int dst_stride = dst_strides[i];
-
-#if CONFIG_VP9_HIGHBITDEPTH
- assert((src->flags & YV12_FLAG_HIGHBITDEPTH) ==
- (dst->flags & YV12_FLAG_HIGHBITDEPTH));
- if (src->flags & YV12_FLAG_HIGHBITDEPTH) {
- const uint16_t *const src_plane = CONVERT_TO_SHORTPTR(
- srcs[i] + 2 * src_stride + 2);
- uint16_t *const dst_plane = CONVERT_TO_SHORTPTR(
- dsts[i] + 2 * dst_stride + 2);
- vp9_highbd_post_proc_down_and_across(src_plane, dst_plane, src_stride,
- dst_stride, src_height, src_width,
- ppl);
- } else {
- const uint8_t *const src_plane = srcs[i] + 2 * src_stride + 2;
- uint8_t *const dst_plane = dsts[i] + 2 * dst_stride + 2;
-
- vp9_post_proc_down_and_across(src_plane, dst_plane, src_stride,
- dst_stride, src_height, src_width, ppl);
- }
-#else
- const uint8_t *const src_plane = srcs[i] + 2 * src_stride + 2;
- uint8_t *const dst_plane = dsts[i] + 2 * dst_stride + 2;
- vp9_post_proc_down_and_across(src_plane, dst_plane, src_stride, dst_stride,
- src_height, src_width, ppl);
-#endif
- }
-}
-
-static double gaussian(double sigma, double mu, double x) {
- return 1 / (sigma * sqrt(2.0 * 3.14159265)) *
- (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma)));
-}
-
-static void fillrd(struct postproc_state *state, int q, int a) {
- char char_dist[300];
-
- double sigma;
- int ai = a, qi = q, i;
-
- vpx_clear_system_state();
-
- sigma = ai + .5 + .6 * (63 - qi) / 63.0;
-
- /* set up a lookup table of 256 entries that matches
- * a gaussian distribution with sigma determined by q.
- */
- {
- int next, j;
-
- next = 0;
-
- for (i = -32; i < 32; i++) {
- int a_i = (int)(0.5 + 256 * gaussian(sigma, 0, i));
-
- if (a_i) {
- for (j = 0; j < a_i; j++) {
- char_dist[next + j] = (char) i;
- }
-
- next = next + j;
- }
- }
-
- for (; next < 256; next++)
- char_dist[next] = 0;
- }
-
- for (i = 0; i < 3072; i++) {
- state->noise[i] = char_dist[rand() & 0xff]; // NOLINT
- }
-
- for (i = 0; i < 16; i++) {
- state->blackclamp[i] = -char_dist[0];
- state->whiteclamp[i] = -char_dist[0];
- state->bothclamp[i] = -2 * char_dist[0];
- }
-
- state->last_q = q;
- state->last_noise = a;
+ int q, uint8_t *limits) {
+ vp9_deblock(src, dst, q, limits);
}
static void swap_mi_and_prev_mi(VP9_COMMON *cm) {
@@ -663,6 +370,14 @@
vpx_internal_error(&cm->error, VPX_CODEC_MEM_ERROR,
"Failed to allocate post-processing buffer");
+ if (flags & (VP9D_DEMACROBLOCK | VP9D_DEBLOCK)) {
+ if (!cm->postproc_state.limits) {
+ cm->postproc_state.limits = vpx_calloc(
+ cm->width, sizeof(*cm->postproc_state.limits));
+ }
+ }
+
if ((flags & VP9D_MFQE) && cm->current_video_frame >= 2 &&
cm->postproc_state.last_frame_valid && cm->bit_depth == 8 &&
cm->postproc_state.last_base_qindex <= last_q_thresh &&
@@ -677,17 +392,19 @@
if ((flags & VP9D_DEMACROBLOCK) && cm->post_proc_buffer_int.buffer_alloc) {
deblock_and_de_macro_block(&cm->post_proc_buffer_int, ppbuf,
q + (ppflags->deblocking_level - 5) * 10,
- 1, 0);
+ 1, 0, cm->postproc_state.limits);
} else if (flags & VP9D_DEBLOCK) {
- vp9_deblock(&cm->post_proc_buffer_int, ppbuf, q);
+ vp9_deblock(&cm->post_proc_buffer_int, ppbuf, q,
+ cm->postproc_state.limits);
} else {
vp8_yv12_copy_frame(&cm->post_proc_buffer_int, ppbuf);
}
} else if (flags & VP9D_DEMACROBLOCK) {
deblock_and_de_macro_block(cm->frame_to_show, ppbuf,
- q + (ppflags->deblocking_level - 5) * 10, 1, 0);
+ q + (ppflags->deblocking_level - 5) * 10, 1, 0,
+ cm->postproc_state.limits);
} else if (flags & VP9D_DEBLOCK) {
- vp9_deblock(cm->frame_to_show, ppbuf, q);
+ vp9_deblock(cm->frame_to_show, ppbuf, q, cm->postproc_state.limits);
} else {
vp8_yv12_copy_frame(cm->frame_to_show, ppbuf);
}
@@ -699,7 +416,20 @@
const int noise_level = ppflags->noise_level;
if (ppstate->last_q != q ||
ppstate->last_noise != noise_level) {
- fillrd(ppstate, 63 - q, noise_level);
+ double sigma;
+ int clamp, i;
+ vpx_clear_system_state();
+ sigma = noise_level + .5 + .6 * q / 63.0;
+ clamp = vpx_setup_noise(sigma, sizeof(ppstate->noise),
+ ppstate->noise);
+
+ for (i = 0; i < 16; i++) {
+ ppstate->blackclamp[i] = clamp;
+ ppstate->whiteclamp[i] = clamp;
+ ppstate->bothclamp[i] = 2 * clamp;
+ }
+ ppstate->last_q = q;
+ ppstate->last_noise = noise_level;
}
vpx_plane_add_noise(ppbuf->y_buffer, ppstate->noise, ppstate->blackclamp,
ppstate->whiteclamp, ppstate->bothclamp,
diff --git a/vp9/common/vp9_postproc.h b/vp9/common/vp9_postproc.h
index 035c9cd..60e6f52 100644
--- a/vp9/common/vp9_postproc.h
+++ b/vp9/common/vp9_postproc.h
@@ -33,6 +33,7 @@
DECLARE_ALIGNED(16, char, blackclamp[16]);
DECLARE_ALIGNED(16, char, whiteclamp[16]);
DECLARE_ALIGNED(16, char, bothclamp[16]);
+ uint8_t *limits;
};
struct VP9Common;
@@ -42,9 +43,11 @@
int vp9_post_proc_frame(struct VP9Common *cm,
YV12_BUFFER_CONFIG *dest, vp9_ppflags_t *flags);
-void vp9_denoise(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, int q);
+void vp9_denoise(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, int q,
+ uint8_t *limits);
-void vp9_deblock(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, int q);
+void vp9_deblock(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, int q,
+ uint8_t *limits);
#ifdef __cplusplus
} // extern "C"
diff --git a/vp9/common/vp9_rtcd_defs.pl b/vp9/common/vp9_rtcd_defs.pl
index 8461336..f315a3b 100644
--- a/vp9/common/vp9_rtcd_defs.pl
+++ b/vp9/common/vp9_rtcd_defs.pl
@@ -21,29 +21,6 @@
}
forward_decls qw/vp9_common_forward_decls/;
-# x86inc.asm had specific constraints. break it out so it's easy to disable.
-# zero all the variables to avoid tricky else conditions.
-$mmx_x86inc = $sse_x86inc = $sse2_x86inc = $ssse3_x86inc = $avx_x86inc =
- $avx2_x86inc = '';
-$mmx_x86_64_x86inc = $sse_x86_64_x86inc = $sse2_x86_64_x86inc =
- $ssse3_x86_64_x86inc = $avx_x86_64_x86inc = $avx2_x86_64_x86inc = '';
-if (vpx_config("CONFIG_USE_X86INC") eq "yes") {
- $mmx_x86inc = 'mmx';
- $sse_x86inc = 'sse';
- $sse2_x86inc = 'sse2';
- $ssse3_x86inc = 'ssse3';
- $avx_x86inc = 'avx';
- $avx2_x86inc = 'avx2';
- if ($opts{arch} eq "x86_64") {
- $mmx_x86_64_x86inc = 'mmx';
- $sse_x86_64_x86inc = 'sse';
- $sse2_x86_64_x86inc = 'sse2';
- $ssse3_x86_64_x86inc = 'ssse3';
- $avx_x86_64_x86inc = 'avx';
- $avx2_x86_64_x86inc = 'avx2';
- }
-}
-
# functions that are 64 bit only.
$mmx_x86_64 = $sse2_x86_64 = $ssse3_x86_64 = $avx_x86_64 = $avx2_x86_64 = '';
if ($opts{arch} eq "x86_64") {
@@ -58,18 +35,6 @@
# post proc
#
if (vpx_config("CONFIG_VP9_POSTPROC") eq "yes") {
-add_proto qw/void vp9_mbpost_proc_down/, "uint8_t *dst, int pitch, int rows, int cols, int flimit";
-specialize qw/vp9_mbpost_proc_down sse2/;
-$vp9_mbpost_proc_down_sse2=vp9_mbpost_proc_down_xmm;
-
-add_proto qw/void vp9_mbpost_proc_across_ip/, "uint8_t *src, int pitch, int rows, int cols, int flimit";
-specialize qw/vp9_mbpost_proc_across_ip sse2/;
-$vp9_mbpost_proc_across_ip_sse2=vp9_mbpost_proc_across_ip_xmm;
-
-add_proto qw/void vp9_post_proc_down_and_across/, "const uint8_t *src_ptr, uint8_t *dst_ptr, int src_pixels_per_line, int dst_pixels_per_line, int rows, int cols, int flimit";
-specialize qw/vp9_post_proc_down_and_across sse2/;
-$vp9_post_proc_down_and_across_sse2=vp9_post_proc_down_and_across_xmm;
-
add_proto qw/void vp9_filter_by_weight16x16/, "const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int src_weight";
specialize qw/vp9_filter_by_weight16x16 sse2 msa/;
@@ -202,10 +167,10 @@
specialize qw/vp9_block_error/;
add_proto qw/int64_t vp9_highbd_block_error/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz, int bd";
- specialize qw/vp9_highbd_block_error/, "$sse2_x86inc";
+ specialize qw/vp9_highbd_block_error sse2/;
add_proto qw/int64_t vp9_highbd_block_error_8bit/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz";
- specialize qw/vp9_highbd_block_error_8bit/, "$sse2_x86inc", "$avx_x86inc";
+ specialize qw/vp9_highbd_block_error_8bit sse2 avx/;
add_proto qw/void vp9_quantize_fp/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
specialize qw/vp9_quantize_fp/;
@@ -217,16 +182,16 @@
specialize qw/vp9_fdct8x8_quant/;
} else {
add_proto qw/int64_t vp9_block_error/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz";
- specialize qw/vp9_block_error avx2 msa/, "$sse2_x86inc";
+ specialize qw/vp9_block_error avx2 msa sse2/;
add_proto qw/int64_t vp9_block_error_fp/, "const int16_t *coeff, const int16_t *dqcoeff, int block_size";
- specialize qw/vp9_block_error_fp neon/, "$sse2_x86inc";
+ specialize qw/vp9_block_error_fp neon sse2/;
add_proto qw/void vp9_quantize_fp/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
- specialize qw/vp9_quantize_fp neon sse2/, "$ssse3_x86_64_x86inc";
+ specialize qw/vp9_quantize_fp neon sse2/, "$ssse3_x86_64";
add_proto qw/void vp9_quantize_fp_32x32/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
- specialize qw/vp9_quantize_fp_32x32/, "$ssse3_x86_64_x86inc";
+ specialize qw/vp9_quantize_fp_32x32/, "$ssse3_x86_64";
add_proto qw/void vp9_fdct8x8_quant/, "const int16_t *input, int stride, tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
specialize qw/vp9_fdct8x8_quant sse2 ssse3 neon/;
@@ -245,7 +210,7 @@
specialize qw/vp9_fht16x16 sse2/;
add_proto qw/void vp9_fwht4x4/, "const int16_t *input, tran_low_t *output, int stride";
- specialize qw/vp9_fwht4x4/, "$sse2_x86inc";
+ specialize qw/vp9_fwht4x4 sse2/;
} else {
add_proto qw/void vp9_fht4x4/, "const int16_t *input, tran_low_t *output, int stride, int tx_type";
specialize qw/vp9_fht4x4 sse2 msa/;
@@ -257,7 +222,7 @@
specialize qw/vp9_fht16x16 sse2 msa/;
add_proto qw/void vp9_fwht4x4/, "const int16_t *input, tran_low_t *output, int stride";
- specialize qw/vp9_fwht4x4 msa/, "$sse2_x86inc";
+ specialize qw/vp9_fwht4x4 msa sse2/;
}
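
To make the mechanics of the specialize changes concrete: under the old
scheme a line such as

  specialize qw/vp9_fwht4x4 msa/, "$sse2_x86inc";

expanded to an empty extra argument whenever CONFIG_USE_X86INC was "no",
silently dropping the SSE2 variant from the generated dispatch table.
With x86inc.asm now required, the indirection collapses to a plain token:

  specialize qw/vp9_fwht4x4 msa sse2/;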
#
diff --git a/vp9/common/x86/vp9_mfqe_sse2.asm b/vp9/common/x86/vp9_mfqe_sse2.asm
index 6029420..3085204 100644
--- a/vp9/common/x86/vp9_mfqe_sse2.asm
+++ b/vp9/common/x86/vp9_mfqe_sse2.asm
@@ -46,7 +46,7 @@
mov rcx, 16 ; loop count
pxor xmm6, xmm6
-.combine
+.combine:
movdqa xmm2, [rax]
movdqa xmm4, [rdx]
add rax, rsi
@@ -123,7 +123,7 @@
mov rcx, 8 ; loop count
pxor xmm4, xmm4
-.combine
+.combine:
movq xmm2, [rax]
movq xmm3, [rdx]
add rax, rsi
@@ -190,7 +190,7 @@
; Because we're working with the actual output frames
; we can't depend on any kind of data alignment.
-.accumulate
+.accumulate:
movdqa xmm0, [rax] ; src1
movdqa xmm1, [rdx] ; src2
add rax, rcx ; src1 + stride1
diff --git a/vp9/common/x86/vp9_postproc_sse2.asm b/vp9/common/x86/vp9_postproc_sse2.asm
deleted file mode 100644
index 4307628..0000000
--- a/vp9/common/x86/vp9_postproc_sse2.asm
+++ /dev/null
@@ -1,632 +0,0 @@
-;
-; Copyright (c) 2010 The WebM project authors. All Rights Reserved.
-;
-; Use of this source code is governed by a BSD-style license
-; that can be found in the LICENSE file in the root of the source
-; tree. An additional intellectual property rights grant can be found
-; in the file PATENTS. All contributing project authors may
-; be found in the AUTHORS file in the root of the source tree.
-;
-
-
-%include "vpx_ports/x86_abi_support.asm"
-
-;void vp9_post_proc_down_and_across_xmm
-;(
-; unsigned char *src_ptr,
-; unsigned char *dst_ptr,
-; int src_pixels_per_line,
-; int dst_pixels_per_line,
-; int rows,
-; int cols,
-; int flimit
-;)
-global sym(vp9_post_proc_down_and_across_xmm) PRIVATE
-sym(vp9_post_proc_down_and_across_xmm):
- push rbp
- mov rbp, rsp
- SHADOW_ARGS_TO_STACK 7
- SAVE_XMM 7
- GET_GOT rbx
- push rsi
- push rdi
- ; end prolog
-
-%if ABI_IS_32BIT=1 && CONFIG_PIC=1
- ALIGN_STACK 16, rax
- ; move the global rd onto the stack, since we don't have enough registers
- ; to do PIC addressing
- movdqa xmm0, [GLOBAL(rd42)]
- sub rsp, 16
- movdqa [rsp], xmm0
-%define RD42 [rsp]
-%else
-%define RD42 [GLOBAL(rd42)]
-%endif
-
-
- movd xmm2, dword ptr arg(6) ;flimit
- punpcklwd xmm2, xmm2
- punpckldq xmm2, xmm2
- punpcklqdq xmm2, xmm2
-
- mov rsi, arg(0) ;src_ptr
- mov rdi, arg(1) ;dst_ptr
-
- movsxd rcx, DWORD PTR arg(4) ;rows
- movsxd rax, DWORD PTR arg(2) ;src_pixels_per_line ; destination pitch?
- pxor xmm0, xmm0 ; mm0 = 00000000
-
-.nextrow:
-
- xor rdx, rdx ; clear out rdx for use as loop counter
-.nextcol:
- movq xmm3, QWORD PTR [rsi] ; mm4 = r0 p0..p7
- punpcklbw xmm3, xmm0 ; mm3 = p0..p3
- movdqa xmm1, xmm3 ; mm1 = p0..p3
- psllw xmm3, 2 ;
-
- movq xmm5, QWORD PTR [rsi + rax] ; mm4 = r1 p0..p7
- punpcklbw xmm5, xmm0 ; mm5 = r1 p0..p3
- paddusw xmm3, xmm5 ; mm3 += mm6
-
- ; thresholding
- movdqa xmm7, xmm1 ; mm7 = r0 p0..p3
- psubusw xmm7, xmm5 ; mm7 = r0 p0..p3 - r1 p0..p3
- psubusw xmm5, xmm1 ; mm5 = r1 p0..p3 - r0 p0..p3
- paddusw xmm7, xmm5 ; mm7 = abs(r0 p0..p3 - r1 p0..p3)
- pcmpgtw xmm7, xmm2
-
- movq xmm5, QWORD PTR [rsi + 2*rax] ; mm4 = r2 p0..p7
- punpcklbw xmm5, xmm0 ; mm5 = r2 p0..p3
- paddusw xmm3, xmm5 ; mm3 += mm5
-
- ; thresholding
- movdqa xmm6, xmm1 ; mm6 = r0 p0..p3
- psubusw xmm6, xmm5 ; mm6 = r0 p0..p3 - r2 p0..p3
- psubusw xmm5, xmm1 ; mm5 = r2 p0..p3 - r2 p0..p3
- paddusw xmm6, xmm5 ; mm6 = abs(r0 p0..p3 - r2 p0..p3)
- pcmpgtw xmm6, xmm2
- por xmm7, xmm6 ; accumulate thresholds
-
-
- neg rax
- movq xmm5, QWORD PTR [rsi+2*rax] ; mm4 = r-2 p0..p7
- punpcklbw xmm5, xmm0 ; mm5 = r-2 p0..p3
- paddusw xmm3, xmm5 ; mm3 += mm5
-
- ; thresholding
- movdqa xmm6, xmm1 ; mm6 = r0 p0..p3
- psubusw xmm6, xmm5 ; mm6 = p0..p3 - r-2 p0..p3
- psubusw xmm5, xmm1 ; mm5 = r-2 p0..p3 - p0..p3
- paddusw xmm6, xmm5 ; mm6 = abs(r0 p0..p3 - r-2 p0..p3)
- pcmpgtw xmm6, xmm2
- por xmm7, xmm6 ; accumulate thresholds
-
- movq xmm4, QWORD PTR [rsi+rax] ; mm4 = r-1 p0..p7
- punpcklbw xmm4, xmm0 ; mm4 = r-1 p0..p3
- paddusw xmm3, xmm4 ; mm3 += mm5
-
- ; thresholding
- movdqa xmm6, xmm1 ; mm6 = r0 p0..p3
- psubusw xmm6, xmm4 ; mm6 = p0..p3 - r-2 p0..p3
- psubusw xmm4, xmm1 ; mm5 = r-1 p0..p3 - p0..p3
- paddusw xmm6, xmm4 ; mm6 = abs(r0 p0..p3 - r-1 p0..p3)
- pcmpgtw xmm6, xmm2
- por xmm7, xmm6 ; accumulate thresholds
-
-
- paddusw xmm3, RD42 ; mm3 += round value
- psraw xmm3, 3 ; mm3 /= 8
-
- pand xmm1, xmm7 ; mm1 select vals > thresh from source
- pandn xmm7, xmm3 ; mm7 select vals < thresh from blurred result
- paddusw xmm1, xmm7 ; combination
-
- packuswb xmm1, xmm0 ; pack to bytes
- movq QWORD PTR [rdi], xmm1 ;
-
- neg rax ; pitch is positive
- add rsi, 8
- add rdi, 8
-
- add rdx, 8
- cmp edx, dword arg(5) ;cols
-
- jl .nextcol
-
- ; done with the all cols, start the across filtering in place
- sub rsi, rdx
- sub rdi, rdx
-
- xor rdx, rdx
- movq mm0, QWORD PTR [rdi-8];
-
-.acrossnextcol:
- movq xmm7, QWORD PTR [rdi +rdx -2]
- movd xmm4, DWORD PTR [rdi +rdx +6]
-
- pslldq xmm4, 8
- por xmm4, xmm7
-
- movdqa xmm3, xmm4
- psrldq xmm3, 2
- punpcklbw xmm3, xmm0 ; mm3 = p0..p3
- movdqa xmm1, xmm3 ; mm1 = p0..p3
- psllw xmm3, 2
-
-
- movdqa xmm5, xmm4
- psrldq xmm5, 3
- punpcklbw xmm5, xmm0 ; mm5 = p1..p4
- paddusw xmm3, xmm5 ; mm3 += mm6
-
- ; thresholding
- movdqa xmm7, xmm1 ; mm7 = p0..p3
- psubusw xmm7, xmm5 ; mm7 = p0..p3 - p1..p4
- psubusw xmm5, xmm1 ; mm5 = p1..p4 - p0..p3
- paddusw xmm7, xmm5 ; mm7 = abs(p0..p3 - p1..p4)
- pcmpgtw xmm7, xmm2
-
- movdqa xmm5, xmm4
- psrldq xmm5, 4
- punpcklbw xmm5, xmm0 ; mm5 = p2..p5
- paddusw xmm3, xmm5 ; mm3 += mm5
-
- ; thresholding
- movdqa xmm6, xmm1 ; mm6 = p0..p3
- psubusw xmm6, xmm5 ; mm6 = p0..p3 - p1..p4
- psubusw xmm5, xmm1 ; mm5 = p1..p4 - p0..p3
- paddusw xmm6, xmm5 ; mm6 = abs(p0..p3 - p1..p4)
- pcmpgtw xmm6, xmm2
- por xmm7, xmm6 ; accumulate thresholds
-
-
- movdqa xmm5, xmm4 ; mm5 = p-2..p5
- punpcklbw xmm5, xmm0 ; mm5 = p-2..p1
- paddusw xmm3, xmm5 ; mm3 += mm5
-
- ; thresholding
- movdqa xmm6, xmm1 ; mm6 = p0..p3
- psubusw xmm6, xmm5 ; mm6 = p0..p3 - p1..p4
- psubusw xmm5, xmm1 ; mm5 = p1..p4 - p0..p3
- paddusw xmm6, xmm5 ; mm6 = abs(p0..p3 - p1..p4)
- pcmpgtw xmm6, xmm2
- por xmm7, xmm6 ; accumulate thresholds
-
- psrldq xmm4, 1 ; mm4 = p-1..p5
- punpcklbw xmm4, xmm0 ; mm4 = p-1..p2
- paddusw xmm3, xmm4 ; mm3 += mm5
-
- ; thresholding
- movdqa xmm6, xmm1 ; mm6 = p0..p3
- psubusw xmm6, xmm4 ; mm6 = p0..p3 - p1..p4
- psubusw xmm4, xmm1 ; mm5 = p1..p4 - p0..p3
- paddusw xmm6, xmm4 ; mm6 = abs(p0..p3 - p1..p4)
- pcmpgtw xmm6, xmm2
- por xmm7, xmm6 ; accumulate thresholds
-
- paddusw xmm3, RD42 ; mm3 += round value
- psraw xmm3, 3 ; mm3 /= 8
-
- pand xmm1, xmm7 ; mm1 select vals > thresh from source
- pandn xmm7, xmm3 ; mm7 select vals < thresh from blurred result
- paddusw xmm1, xmm7 ; combination
-
- packuswb xmm1, xmm0 ; pack to bytes
- movq QWORD PTR [rdi+rdx-8], mm0 ; store previous four bytes
- movdq2q mm0, xmm1
-
- add rdx, 8
- cmp edx, dword arg(5) ;cols
- jl .acrossnextcol;
-
- ; last 8 pixels
- movq QWORD PTR [rdi+rdx-8], mm0
-
- ; done with this rwo
- add rsi,rax ; next line
- mov eax, dword arg(3) ;dst_pixels_per_line ; destination pitch?
- add rdi,rax ; next destination
- mov eax, dword arg(2) ;src_pixels_per_line ; destination pitch?
-
- dec rcx ; decrement count
- jnz .nextrow ; next row
-
-%if ABI_IS_32BIT=1 && CONFIG_PIC=1
- add rsp,16
- pop rsp
-%endif
- ; begin epilog
- pop rdi
- pop rsi
- RESTORE_GOT
- RESTORE_XMM
- UNSHADOW_ARGS
- pop rbp
- ret
-%undef RD42
-
-
-;void vp9_mbpost_proc_down_xmm(unsigned char *dst,
-; int pitch, int rows, int cols,int flimit)
-extern sym(vp9_rv)
-global sym(vp9_mbpost_proc_down_xmm) PRIVATE
-sym(vp9_mbpost_proc_down_xmm):
- push rbp
- mov rbp, rsp
- SHADOW_ARGS_TO_STACK 5
- SAVE_XMM 7
- GET_GOT rbx
- push rsi
- push rdi
- ; end prolog
-
- ALIGN_STACK 16, rax
- sub rsp, 128+16
-
- ; unsigned char d[16][8] at [rsp]
- ; create flimit2 at [rsp+128]
- mov eax, dword ptr arg(4) ;flimit
- mov [rsp+128], eax
- mov [rsp+128+4], eax
- mov [rsp+128+8], eax
- mov [rsp+128+12], eax
-%define flimit4 [rsp+128]
-
-%if ABI_IS_32BIT=0
- lea r8, [GLOBAL(sym(vp9_rv))]
-%endif
-
- ;rows +=8;
- add dword arg(2), 8
-
- ;for(c=0; c<cols; c+=8)
-.loop_col:
- mov rsi, arg(0) ; s
- pxor xmm0, xmm0 ;
-
- movsxd rax, dword ptr arg(1) ;pitch ;
- neg rax ; rax = -pitch
-
- lea rsi, [rsi + rax*8]; ; rdi = s[-pitch*8]
- neg rax
-
-
- pxor xmm5, xmm5
- pxor xmm6, xmm6 ;
-
- pxor xmm7, xmm7 ;
- mov rdi, rsi
-
- mov rcx, 15 ;
-
-.loop_initvar:
- movq xmm1, QWORD PTR [rdi];
- punpcklbw xmm1, xmm0 ;
-
- paddw xmm5, xmm1 ;
- pmullw xmm1, xmm1 ;
-
- movdqa xmm2, xmm1 ;
- punpcklwd xmm1, xmm0 ;
-
- punpckhwd xmm2, xmm0 ;
- paddd xmm6, xmm1 ;
-
- paddd xmm7, xmm2 ;
- lea rdi, [rdi+rax] ;
-
- dec rcx
- jne .loop_initvar
- ;save the var and sum
- xor rdx, rdx
-.loop_row:
- movq xmm1, QWORD PTR [rsi] ; [s-pitch*8]
- movq xmm2, QWORD PTR [rdi] ; [s+pitch*7]
-
- punpcklbw xmm1, xmm0
- punpcklbw xmm2, xmm0
-
- paddw xmm5, xmm2
- psubw xmm5, xmm1
-
- pmullw xmm2, xmm2
- movdqa xmm4, xmm2
-
- punpcklwd xmm2, xmm0
- punpckhwd xmm4, xmm0
-
- paddd xmm6, xmm2
- paddd xmm7, xmm4
-
- pmullw xmm1, xmm1
- movdqa xmm2, xmm1
-
- punpcklwd xmm1, xmm0
- psubd xmm6, xmm1
-
- punpckhwd xmm2, xmm0
- psubd xmm7, xmm2
-
-
- movdqa xmm3, xmm6
- pslld xmm3, 4
-
- psubd xmm3, xmm6
- movdqa xmm1, xmm5
-
- movdqa xmm4, xmm5
- pmullw xmm1, xmm1
-
- pmulhw xmm4, xmm4
- movdqa xmm2, xmm1
-
- punpcklwd xmm1, xmm4
- punpckhwd xmm2, xmm4
-
- movdqa xmm4, xmm7
- pslld xmm4, 4
-
- psubd xmm4, xmm7
-
- psubd xmm3, xmm1
- psubd xmm4, xmm2
-
- psubd xmm3, flimit4
- psubd xmm4, flimit4
-
- psrad xmm3, 31
- psrad xmm4, 31
-
- packssdw xmm3, xmm4
- packsswb xmm3, xmm0
-
- movq xmm1, QWORD PTR [rsi+rax*8]
-
- movq xmm2, xmm1
- punpcklbw xmm1, xmm0
-
- paddw xmm1, xmm5
- mov rcx, rdx
-
- and rcx, 127
-%if ABI_IS_32BIT=1 && CONFIG_PIC=1
- push rax
- lea rax, [GLOBAL(sym(vp9_rv))]
- movdqu xmm4, [rax + rcx*2] ;vp9_rv[rcx*2]
- pop rax
-%elif ABI_IS_32BIT=0
- movdqu xmm4, [r8 + rcx*2] ;vp9_rv[rcx*2]
-%else
- movdqu xmm4, [sym(vp9_rv) + rcx*2]
-%endif
-
- paddw xmm1, xmm4
- ;paddw xmm1, eight8s
- psraw xmm1, 4
-
- packuswb xmm1, xmm0
- pand xmm1, xmm3
-
- pandn xmm3, xmm2
- por xmm1, xmm3
-
- and rcx, 15
- movq QWORD PTR [rsp + rcx*8], xmm1 ;d[rcx*8]
-
- mov rcx, rdx
- sub rcx, 8
-
- and rcx, 15
- movq mm0, [rsp + rcx*8] ;d[rcx*8]
-
- movq [rsi], mm0
- lea rsi, [rsi+rax]
-
- lea rdi, [rdi+rax]
- add rdx, 1
-
- cmp edx, dword arg(2) ;rows
- jl .loop_row
-
- add dword arg(0), 8 ; s += 8
- sub dword arg(3), 8 ; cols -= 8
- cmp dword arg(3), 0
- jg .loop_col
-
- add rsp, 128+16
- pop rsp
-
- ; begin epilog
- pop rdi
- pop rsi
- RESTORE_GOT
- RESTORE_XMM
- UNSHADOW_ARGS
- pop rbp
- ret
-%undef flimit4
-
-
-;void vp9_mbpost_proc_across_ip_xmm(unsigned char *src,
-; int pitch, int rows, int cols,int flimit)
-global sym(vp9_mbpost_proc_across_ip_xmm) PRIVATE
-sym(vp9_mbpost_proc_across_ip_xmm):
- push rbp
- mov rbp, rsp
- SHADOW_ARGS_TO_STACK 5
- SAVE_XMM 7
- GET_GOT rbx
- push rsi
- push rdi
- ; end prolog
-
- ALIGN_STACK 16, rax
- sub rsp, 16
-
- ; create flimit4 at [rsp]
- mov eax, dword ptr arg(4) ;flimit
- mov [rsp], eax
- mov [rsp+4], eax
- mov [rsp+8], eax
- mov [rsp+12], eax
-%define flimit4 [rsp]
-
-
- ;for(r=0;r<rows;r++)
-.ip_row_loop:
-
- xor rdx, rdx ;sumsq=0;
- xor rcx, rcx ;sum=0;
- mov rsi, arg(0); s
- mov rdi, -8
-.ip_var_loop:
- ;for(i=-8;i<=6;i++)
- ;{
- ; sumsq += s[i]*s[i];
- ; sum += s[i];
- ;}
- movzx eax, byte [rsi+rdi]
- add ecx, eax
- mul al
- add edx, eax
- add rdi, 1
- cmp rdi, 6
- jle .ip_var_loop
-
-
- ;mov rax, sumsq
- ;movd xmm7, rax
- movd xmm7, edx
-
- ;mov rax, sum
- ;movd xmm6, rax
- movd xmm6, ecx
-
- mov rsi, arg(0) ;s
- xor rcx, rcx
-
- movsxd rdx, dword arg(3) ;cols
- add rdx, 8
- pxor mm0, mm0
- pxor mm1, mm1
-
- pxor xmm0, xmm0
-.nextcol4:
-
- movd xmm1, DWORD PTR [rsi+rcx-8] ; -8 -7 -6 -5
- movd xmm2, DWORD PTR [rsi+rcx+7] ; +7 +8 +9 +10
-
- punpcklbw xmm1, xmm0 ; expanding
- punpcklbw xmm2, xmm0 ; expanding
-
- punpcklwd xmm1, xmm0 ; expanding to dwords
- punpcklwd xmm2, xmm0 ; expanding to dwords
-
- psubd xmm2, xmm1 ; 7--8 8--7 9--6 10--5
- paddd xmm1, xmm1 ; -8*2 -7*2 -6*2 -5*2
-
- paddd xmm1, xmm2 ; 7+-8 8+-7 9+-6 10+-5
- pmaddwd xmm1, xmm2 ; squared of 7+-8 8+-7 9+-6 10+-5
-
- paddd xmm6, xmm2
- paddd xmm7, xmm1
-
- pshufd xmm6, xmm6, 0 ; duplicate the last ones
- pshufd xmm7, xmm7, 0 ; duplicate the last ones
-
- psrldq xmm1, 4 ; 8--7 9--6 10--5 0000
- psrldq xmm2, 4 ; 8--7 9--6 10--5 0000
-
- pshufd xmm3, xmm1, 3 ; 0000 8--7 8--7 8--7 squared
- pshufd xmm4, xmm2, 3 ; 0000 8--7 8--7 8--7 squared
-
- paddd xmm6, xmm4
- paddd xmm7, xmm3
-
- pshufd xmm3, xmm1, 01011111b ; 0000 0000 9--6 9--6 squared
- pshufd xmm4, xmm2, 01011111b ; 0000 0000 9--6 9--6 squared
-
- paddd xmm7, xmm3
- paddd xmm6, xmm4
-
- pshufd xmm3, xmm1, 10111111b ; 0000 0000 8--7 8--7 squared
- pshufd xmm4, xmm2, 10111111b ; 0000 0000 8--7 8--7 squared
-
- paddd xmm7, xmm3
- paddd xmm6, xmm4
-
- movdqa xmm3, xmm6
- pmaddwd xmm3, xmm3
-
- movdqa xmm5, xmm7
- pslld xmm5, 4
-
- psubd xmm5, xmm7
- psubd xmm5, xmm3
-
- psubd xmm5, flimit4
- psrad xmm5, 31
-
- packssdw xmm5, xmm0
- packsswb xmm5, xmm0
-
- movd xmm1, DWORD PTR [rsi+rcx]
- movq xmm2, xmm1
-
- punpcklbw xmm1, xmm0
- punpcklwd xmm1, xmm0
-
- paddd xmm1, xmm6
- paddd xmm1, [GLOBAL(four8s)]
-
- psrad xmm1, 4
- packssdw xmm1, xmm0
-
- packuswb xmm1, xmm0
- pand xmm1, xmm5
-
- pandn xmm5, xmm2
- por xmm5, xmm1
-
- movd [rsi+rcx-8], mm0
- movq mm0, mm1
-
- movdq2q mm1, xmm5
- psrldq xmm7, 12
-
- psrldq xmm6, 12
- add rcx, 4
-
- cmp rcx, rdx
- jl .nextcol4
-
- ;s+=pitch;
- movsxd rax, dword arg(1)
- add arg(0), rax
-
- sub dword arg(2), 1 ;rows-=1
- cmp dword arg(2), 0
- jg .ip_row_loop
-
- add rsp, 16
- pop rsp
-
- ; begin epilog
- pop rdi
- pop rsi
- RESTORE_GOT
- RESTORE_XMM
- UNSHADOW_ARGS
- pop rbp
- ret
-%undef flimit4
-
-
-SECTION_RODATA
-align 16
-rd42:
- times 8 dw 0x04
-four8s:
- times 4 dd 8
diff --git a/vp9/encoder/vp9_encoder.c b/vp9/encoder/vp9_encoder.c
index fde1cb9..0ce996a 100644
--- a/vp9/encoder/vp9_encoder.c
+++ b/vp9/encoder/vp9_encoder.c
@@ -3060,7 +3060,11 @@
l = 150;
break;
}
- vp9_denoise(cpi->Source, cpi->Source, l);
+ if (!cpi->common.postproc_state.limits) {
+ cpi->common.postproc_state.limits = vpx_calloc(
+ cpi->common.width, sizeof(*cpi->common.postproc_state.limits));
+ }
+ vp9_denoise(cpi->Source, cpi->Source, l, cpi->common.postproc_state.limits);
}
#endif // CONFIG_VP9_POSTPROC
}
@@ -4649,7 +4653,7 @@
}
vp9_deblock(cm->frame_to_show, pp,
- cm->lf.filter_level * 10 / 6);
+ cm->lf.filter_level * 10 / 6, cm->postproc_state.limits);
#endif
vpx_clear_system_state();
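
The lazy vpx_calloc() above mirrors the decoder-side allocation added to
vp9_postproc.c, and both pair with the vpx_free() added to
vp9_alloccommon.c, so the limits buffer has a single lifetime rule no
matter which component touches it first. A minimal sketch of that
pattern, using only names from this patch:

  /* First use: one limit byte per pixel column. */
  if (!cm->postproc_state.limits) {
    cm->postproc_state.limits =
        vpx_calloc(cm->width, sizeof(*cm->postproc_state.limits));
  }

  /* Teardown, alongside the post-proc frame buffers. */
  vpx_free(cm->postproc_state.limits);
  cm->postproc_state.limits = 0;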
diff --git a/vp9/vp9_common.mk b/vp9/vp9_common.mk
index d0135c6..2dbf0f6 100644
--- a/vp9/vp9_common.mk
+++ b/vp9/vp9_common.mk
@@ -67,7 +67,6 @@
VP9_COMMON_SRCS-$(CONFIG_VP9_POSTPROC) += common/vp9_mfqe.c
ifeq ($(CONFIG_VP9_POSTPROC),yes)
VP9_COMMON_SRCS-$(HAVE_SSE2) += common/x86/vp9_mfqe_sse2.asm
-VP9_COMMON_SRCS-$(HAVE_SSE2) += common/x86/vp9_postproc_sse2.asm
endif
ifneq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
diff --git a/vp9/vp9_cx_iface.c b/vp9/vp9_cx_iface.c
index 51c6fbb..f4e989f 100644
--- a/vp9/vp9_cx_iface.c
+++ b/vp9/vp9_cx_iface.c
@@ -992,6 +992,7 @@
return flags;
}
+const size_t kMinCompressedSize = 8192;
static vpx_codec_err_t encoder_encode(vpx_codec_alg_priv_t *ctx,
const vpx_image_t *img,
vpx_codec_pts_t pts,
@@ -1013,8 +1014,8 @@
// instance for its status to determine the compressed data size.
data_sz = ctx->cfg.g_w * ctx->cfg.g_h * get_image_bps(img) / 8 *
(cpi->multi_arf_allowed ? 8 : 2);
- if (data_sz < 4096)
- data_sz = 4096;
+ if (data_sz < kMinCompressedSize)
+ data_sz = kMinCompressedSize;
if (ctx->cx_data == NULL || ctx->cx_data_sz < data_sz) {
ctx->cx_data_sz = data_sz;
free(ctx->cx_data);
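
Worked example for the new floor: a 32x32 I420 input has
get_image_bps() == 12 (assuming the usual 12 bits per pixel for 4:2:0),
so without multiple ARFs

  data_sz = 32 * 32 * 12 / 8 * 2 = 3072

which the old code rounded up to 4096 and which now becomes
kMinCompressedSize = 8192. Typical resolutions are unaffected; 320x240
already yields 230400 bytes and never hits the floor.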
diff --git a/vp9/vp9cx.mk b/vp9/vp9cx.mk
index 5f3de8f..b8342b9 100644
--- a/vp9/vp9cx.mk
+++ b/vp9/vp9cx.mk
@@ -101,7 +101,6 @@
VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_highbd_block_error_intrin_sse2.c
endif
-ifeq ($(CONFIG_USE_X86INC),yes)
VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_dct_sse2.asm
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_highbd_error_sse2.asm
@@ -109,13 +108,10 @@
else
VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_error_sse2.asm
endif
-endif
ifeq ($(ARCH_X86_64),yes)
-ifeq ($(CONFIG_USE_X86INC),yes)
VP9_CX_SRCS-$(HAVE_SSSE3) += encoder/x86/vp9_quantize_ssse3_x86_64.asm
endif
-endif
VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_dct_intrin_sse2.c
VP9_CX_SRCS-$(HAVE_SSSE3) += encoder/x86/vp9_dct_ssse3.c
diff --git a/vpx/src/svc_encodeframe.c b/vpx/src/svc_encodeframe.c
index 8028608..ef9b352 100644
--- a/vpx/src/svc_encodeframe.c
+++ b/vpx/src/svc_encodeframe.c
@@ -397,13 +397,6 @@
si->width = enc_cfg->g_w;
si->height = enc_cfg->g_h;
-// wonkap: why is this necessary?
- /*if (enc_cfg->kf_max_dist < 2) {
- svc_log(svc_ctx, SVC_LOG_ERROR, "key frame distance too small: %d\n",
- enc_cfg->kf_max_dist);
- return VPX_CODEC_INVALID_PARAM;
- }*/
-
si->kf_dist = enc_cfg->kf_max_dist;
if (svc_ctx->spatial_layers == 0)
diff --git a/vpx_dsp/add_noise.c b/vpx_dsp/add_noise.c
index 682b444..4ae67a8 100644
--- a/vpx_dsp/add_noise.c
+++ b/vpx_dsp/add_noise.c
@@ -8,6 +8,7 @@
* be found in the AUTHORS file in the root of the source tree.
*/
+#include <math.h>
#include <stdlib.h>
#include "./vpx_config.h"
@@ -23,11 +24,11 @@
unsigned int width, unsigned int height, int pitch) {
unsigned int i, j;
- for (i = 0; i < height; i++) {
+ for (i = 0; i < height; ++i) {
uint8_t *pos = start + i * pitch;
char *ref = (char *)(noise + (rand() & 0xff)); // NOLINT
- for (j = 0; j < width; j++) {
+ for (j = 0; j < width; ++j) {
int v = pos[j];
v = clamp(v - blackclamp[0], 0, 255);
@@ -38,3 +39,36 @@
}
}
}
+
+static double gaussian(double sigma, double mu, double x) {
+ return 1 / (sigma * sqrt(2.0 * 3.14159265)) *
+ (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma)));
+}
+
+int vpx_setup_noise(double sigma, int size, char *noise) {
+ char char_dist[256];
+ int next = 0, i, j;
+
+  // Set up a 256-entry lookup table that matches a Gaussian distribution.
+ for (i = -32; i < 32; ++i) {
+ const int a_i = (int) (0.5 + 256 * gaussian(sigma, 0, i));
+ if (a_i) {
+ for (j = 0; j < a_i; ++j) {
+ char_dist[next + j] = (char)i;
+ }
+ next = next + j;
+ }
+ }
+
+  // Rounding error may leave fewer than 256 entries filled; zero the rest.
+ for (; next < 256; ++next) {
+ char_dist[next] = 0;
+ }
+
+ for (i = 0; i < size; ++i) {
+ noise[i] = char_dist[rand() & 0xff]; // NOLINT
+ }
+
+  // Return the magnitude of the most negative value used in the distribution.
+ return -char_dist[0];
+}
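
A minimal usage sketch for the new helper; the 3072-byte size matches the
noise[] array in the vp8/vp9 postproc_state that the converted callers
pass in:

  char noise[3072];
  const double sigma = 1.8;  /* example value; callers derive it from q */
  const int clamp = vpx_setup_noise(sigma, sizeof(noise), noise);

The return value is -char_dist[0], the magnitude of the most negative
sample in the table; the blackclamp/whiteclamp arrays in the callers use
it to pull pixels away from 0 and 255 by that margin before noise is
added, so the addition can never wrap.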
diff --git a/vpx_dsp/deblock.c b/vpx_dsp/deblock.c
new file mode 100644
index 0000000..2b1b7e3
--- /dev/null
+++ b/vpx_dsp/deblock.c
@@ -0,0 +1,203 @@
+/*
+ * Copyright (c) 2016 The WebM project authors. All Rights Reserved.
+ *
+ * Use of this source code is governed by a BSD-style license
+ * that can be found in the LICENSE file in the root of the source
+ * tree. An additional intellectual property rights grant can be found
+ * in the file PATENTS. All contributing project authors may
+ * be found in the AUTHORS file in the root of the source tree.
+ */
+#include <stdlib.h>
+#include "vpx/vpx_integer.h"
+
+const int16_t vpx_rv[] = {8, 5, 2, 2, 8, 12, 4, 9, 8, 3, 0, 3, 9, 0, 0, 0, 8, 3,
+ 14, 4, 10, 1, 11, 14, 1, 14, 9, 6, 12, 11, 8, 6, 10, 0, 0, 8, 9, 0, 3, 14,
+ 8, 11, 13, 4, 2, 9, 0, 3, 9, 6, 1, 2, 3, 14, 13, 1, 8, 2, 9, 7, 3, 3, 1, 13,
+ 13, 6, 6, 5, 2, 7, 11, 9, 11, 8, 7, 3, 2, 0, 13, 13, 14, 4, 12, 5, 12, 10,
+ 8, 10, 13, 10, 4, 14, 4, 10, 0, 8, 11, 1, 13, 7, 7, 14, 6, 14, 13, 2, 13, 5,
+ 4, 4, 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, 8, 0, 4, 10, 7, 2, 7, 2, 2, 5, 3,
+ 4, 7, 3, 3, 14, 14, 5, 9, 13, 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, 13, 1, 12, 0,
+ 10, 9, 7, 6, 2, 8, 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, 9, 6, 10, 11, 7, 8, 7,
+ 5, 14, 8, 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, 3, 12, 5, 7, 14, 3, 14, 5, 2, 6,
+ 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, 5, 10, 14, 7, 8, 0, 4, 11, 0, 8, 0, 3,
+ 10, 5, 8, 0, 11, 6, 7, 8, 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, 4, 3, 5, 6, 10,
+ 8, 9, 4, 11, 14, 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, 8, 0, 4, 10, 7, 2, 7, 2,
+ 2, 5, 3, 4, 7, 3, 3, 14, 14, 5, 9, 13, 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, 13,
+ 1, 12, 0, 10, 9, 7, 6, 2, 8, 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, 9, 6, 10, 11,
+ 7, 8, 7, 5, 14, 8, 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, 3, 12, 5, 7, 14, 3, 14,
+ 5, 2, 6, 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, 5, 10, 14, 7, 8, 0, 4, 11, 0, 8,
+ 0, 3, 10, 5, 8, 0, 11, 6, 7, 8, 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, 4, 3, 5, 6,
+ 10, 8, 9, 4, 11, 14, 3, 8, 3, 7, 8, 5, 11, 4, 12, 3, 11, 9, 14, 8, 14, 13,
+ 4, 3, 1, 2, 14, 6, 5, 4, 4, 11, 4, 6, 2, 1, 5, 8, 8, 12, 13, 5, 14, 10, 12,
+ 13, 0, 9, 5, 5, 11, 10, 13, 9, 10, 13, };
+
+void vpx_post_proc_down_and_across_mb_row_c(unsigned char *src_ptr,
+ unsigned char *dst_ptr,
+ int src_pixels_per_line,
+ int dst_pixels_per_line, int cols,
+ unsigned char *f, int size) {
+ unsigned char *p_src, *p_dst;
+ int row;
+ int col;
+ unsigned char v;
+ unsigned char d[4];
+
+ for (row = 0; row < size; row++) {
+ /* post_proc_down for one row */
+ p_src = src_ptr;
+ p_dst = dst_ptr;
+
+ for (col = 0; col < cols; col++) {
+ unsigned char p_above2 = p_src[col - 2 * src_pixels_per_line];
+ unsigned char p_above1 = p_src[col - src_pixels_per_line];
+ unsigned char p_below1 = p_src[col + src_pixels_per_line];
+ unsigned char p_below2 = p_src[col + 2 * src_pixels_per_line];
+
+ v = p_src[col];
+
+ if ((abs(v - p_above2) < f[col]) && (abs(v - p_above1) < f[col])
+ && (abs(v - p_below1) < f[col]) && (abs(v - p_below2) < f[col])) {
+ unsigned char k1, k2, k3;
+ k1 = (p_above2 + p_above1 + 1) >> 1;
+ k2 = (p_below2 + p_below1 + 1) >> 1;
+ k3 = (k1 + k2 + 1) >> 1;
+ v = (k3 + v + 1) >> 1;
+ }
+
+ p_dst[col] = v;
+ }
+
+ /* now post_proc_across */
+ p_src = dst_ptr;
+ p_dst = dst_ptr;
+
+ p_src[-2] = p_src[-1] = p_src[0];
+ p_src[cols] = p_src[cols + 1] = p_src[cols - 1];
+
+ for (col = 0; col < cols; col++) {
+ v = p_src[col];
+
+ if ((abs(v - p_src[col - 2]) < f[col])
+ && (abs(v - p_src[col - 1]) < f[col])
+ && (abs(v - p_src[col + 1]) < f[col])
+ && (abs(v - p_src[col + 2]) < f[col])) {
+ unsigned char k1, k2, k3;
+ k1 = (p_src[col - 2] + p_src[col - 1] + 1) >> 1;
+ k2 = (p_src[col + 2] + p_src[col + 1] + 1) >> 1;
+ k3 = (k1 + k2 + 1) >> 1;
+ v = (k3 + v + 1) >> 1;
+ }
+
+ d[col & 3] = v;
+
+ if (col >= 2)
+ p_dst[col - 2] = d[(col - 2) & 3];
+ }
+
+ /* handle the last two pixels */
+ p_dst[col - 2] = d[(col - 2) & 3];
+ p_dst[col - 1] = d[(col - 1) & 3];
+
+ /* next row */
+ src_ptr += src_pixels_per_line;
+ dst_ptr += dst_pixels_per_line;
+ }
+}
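
Although written as a chain of rounded averages, the filter above is the
same [1 1 4 1 1]/8 kernel that the removed vp9_post_proc_down_and_across_c
expressed through its kernel5 table. Ignoring the +1 rounding terms:

  k1 ~ (p_above2 + p_above1) / 2
  k2 ~ (p_below2 + p_below1) / 2
  k3 ~ (k1 + k2) / 2 = (p_above2 + p_above1 + p_below2 + p_below1) / 4
  v' ~ (k3 + v) / 2
     = (p_above2 + p_above1 + 4 * v + p_below1 + p_below2) / 8

i.e. taps of 1, 1, 4, 1, 1 over a divisor of 8, applied only when all
four neighbors are within f[col] of the center pixel.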
+
+void vpx_mbpost_proc_across_ip_c(unsigned char *src, int pitch, int rows,
+ int cols, int flimit) {
+ int r, c, i;
+
+ unsigned char *s = src;
+ unsigned char d[16];
+
+ for (r = 0; r < rows; r++) {
+ int sumsq = 0;
+ int sum = 0;
+
+ for (i = -8; i < 0; i++)
+ s[i] = s[0];
+
+    /* Pad 17 columns past the end: values are buffered in d (indexed by
+     * c) and written back only after reading 8 columns ahead, so the
+     * s[c + 7] reads would otherwise run past the row and trip valgrind.
+     */
+ for (i = 0; i < 17; i++)
+ s[i + cols] = s[cols - 1];
+
+ for (i = -8; i <= 6; i++) {
+ sumsq += s[i] * s[i];
+ sum += s[i];
+ d[i + 8] = 0;
+ }
+
+ for (c = 0; c < cols + 8; c++) {
+ int x = s[c + 7] - s[c - 8];
+ int y = s[c + 7] + s[c - 8];
+
+ sum += x;
+ sumsq += x * y;
+
+ d[c & 15] = s[c];
+
+ if (sumsq * 15 - sum * sum < flimit) {
+ d[c & 15] = (8 + sum + s[c]) >> 4;
+ }
+
+ s[c - 8] = d[(c - 8) & 15];
+ }
+
+ s += pitch;
+ }
+}
+
+void vpx_mbpost_proc_down_c(unsigned char *dst, int pitch, int rows, int cols,
+ int flimit) {
+ int r, c, i;
+ const int16_t *rv3 = &vpx_rv[63 & rand()];
+
+ for (c = 0; c < cols; c++) {
+ unsigned char *s = &dst[c];
+ int sumsq = 0;
+ int sum = 0;
+ unsigned char d[16];
+ const int16_t *rv2 = rv3 + ((c * 17) & 127);
+
+ for (i = -8; i < 0; i++)
+ s[i * pitch] = s[0];
+
+    /* Pad 17 rows past the end: values are buffered in d (indexed by r)
+     * and written back only after reading 8 rows ahead, so the
+     * s[7 * pitch] reads would otherwise run past the column and trip
+     * valgrind.
+     */
+ for (i = 0; i < 17; i++)
+ s[(i + rows) * pitch] = s[(rows - 1) * pitch];
+
+ for (i = -8; i <= 6; i++) {
+ sumsq += s[i * pitch] * s[i * pitch];
+ sum += s[i * pitch];
+ }
+
+ for (r = 0; r < rows + 8; r++) {
+ sumsq += s[7 * pitch] * s[7 * pitch] - s[-8 * pitch] * s[-8 * pitch];
+ sum += s[7 * pitch] - s[-8 * pitch];
+ d[r & 15] = s[0];
+
+ if (sumsq * 15 - sum * sum < flimit) {
+ d[r & 15] = (rv2[r & 127] + sum + s[0]) >> 4;
+ }
+ if (r >= 8)
+ s[-8 * pitch] = d[(r - 8) & 15];
+ s += pitch;
+ }
+ }
+}
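
The guard sumsq * 15 - sum * sum < flimit in both mbpost functions is a
scaled variance test over the 15-tap sliding window s[-8 .. +6]: for
n = 15 samples,

  n * sum(x^2) - (sum(x))^2 = n^2 * variance

so the blur stored in d[] is applied exactly when the local variance is
below flimit / 225. flimit itself arrives from the callers via q2mbl(q),
a quantizer-to-limit mapping assumed to be defined in the existing
postproc headers.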
+
+#if CONFIG_POSTPROC
+static void vpx_de_mblock(YV12_BUFFER_CONFIG *post,
+ int q) {
+ vpx_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height,
+ post->y_width, q2mbl(q));
+ vpx_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height,
+ post->y_width, q2mbl(q));
+}
+
+#endif
diff --git a/vpx_dsp/mips/deblock_msa.c b/vpx_dsp/mips/deblock_msa.c
new file mode 100644
index 0000000..e98a039
--- /dev/null
+++ b/vpx_dsp/mips/deblock_msa.c
@@ -0,0 +1,682 @@
+/*
+ * Copyright (c) 2016 The WebM project authors. All Rights Reserved.
+ *
+ * Use of this source code is governed by a BSD-style license
+ * that can be found in the LICENSE file in the root of the source
+ * tree. An additional intellectual property rights grant can be found
+ * in the file PATENTS. All contributing project authors may
+ * be found in the AUTHORS file in the root of the source tree.
+ */
+
+#include <stdlib.h>
+#include "./macros_msa.h"
+
+extern const int16_t vpx_rv[];
+
+#define VPX_TRANSPOSE8x16_UB_UB(in0, in1, in2, in3, in4, in5, in6, in7, \
+ out0, out1, out2, out3, \
+ out4, out5, out6, out7, \
+ out8, out9, out10, out11, \
+ out12, out13, out14, out15) \
+{ \
+ v8i16 temp0, temp1, temp2, temp3, temp4; \
+ v8i16 temp5, temp6, temp7, temp8, temp9; \
+ \
+ ILVR_B4_SH(in1, in0, in3, in2, in5, in4, in7, in6, \
+ temp0, temp1, temp2, temp3); \
+ ILVR_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \
+ ILVRL_W2_SH(temp5, temp4, temp6, temp7); \
+ ILVL_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \
+ ILVRL_W2_SH(temp5, temp4, temp8, temp9); \
+ ILVL_B4_SH(in1, in0, in3, in2, in5, in4, in7, in6, \
+ temp0, temp1, temp2, temp3); \
+ ILVR_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \
+ ILVRL_W2_UB(temp5, temp4, out8, out10); \
+ ILVL_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \
+ ILVRL_W2_UB(temp5, temp4, out12, out14); \
+ out0 = (v16u8)temp6; \
+ out2 = (v16u8)temp7; \
+ out4 = (v16u8)temp8; \
+ out6 = (v16u8)temp9; \
+ out9 = (v16u8)__msa_ilvl_d((v2i64)out8, (v2i64)out8); \
+ out11 = (v16u8)__msa_ilvl_d((v2i64)out10, (v2i64)out10); \
+ out13 = (v16u8)__msa_ilvl_d((v2i64)out12, (v2i64)out12); \
+ out15 = (v16u8)__msa_ilvl_d((v2i64)out14, (v2i64)out14); \
+ out1 = (v16u8)__msa_ilvl_d((v2i64)out0, (v2i64)out0); \
+ out3 = (v16u8)__msa_ilvl_d((v2i64)out2, (v2i64)out2); \
+ out5 = (v16u8)__msa_ilvl_d((v2i64)out4, (v2i64)out4); \
+ out7 = (v16u8)__msa_ilvl_d((v2i64)out6, (v2i64)out6); \
+}
+
+#define VPX_AVER_IF_RETAIN(above2_in, above1_in, src_in, \
+ below1_in, below2_in, ref, out) \
+{ \
+ v16u8 temp0, temp1; \
+ \
+ temp1 = __msa_aver_u_b(above2_in, above1_in); \
+ temp0 = __msa_aver_u_b(below2_in, below1_in); \
+ temp1 = __msa_aver_u_b(temp1, temp0); \
+ out = __msa_aver_u_b(src_in, temp1); \
+ temp0 = __msa_asub_u_b(src_in, above2_in); \
+ temp1 = __msa_asub_u_b(src_in, above1_in); \
+ temp0 = (temp0 < ref); \
+ temp1 = (temp1 < ref); \
+ temp0 = temp0 & temp1; \
+ temp1 = __msa_asub_u_b(src_in, below1_in); \
+ temp1 = (temp1 < ref); \
+ temp0 = temp0 & temp1; \
+ temp1 = __msa_asub_u_b(src_in, below2_in); \
+ temp1 = (temp1 < ref); \
+ temp0 = temp0 & temp1; \
+ out = __msa_bmz_v(out, src_in, temp0); \
+}
+
+#define TRANSPOSE12x16_B(in0, in1, in2, in3, in4, in5, in6, in7, \
+ in8, in9, in10, in11, in12, in13, in14, in15) \
+{ \
+ v8i16 temp0, temp1, temp2, temp3, temp4; \
+ v8i16 temp5, temp6, temp7, temp8, temp9; \
+ \
+ ILVR_B2_SH(in1, in0, in3, in2, temp0, temp1); \
+ ILVRL_H2_SH(temp1, temp0, temp2, temp3); \
+ ILVR_B2_SH(in5, in4, in7, in6, temp0, temp1); \
+ ILVRL_H2_SH(temp1, temp0, temp4, temp5); \
+ ILVRL_W2_SH(temp4, temp2, temp0, temp1); \
+ ILVRL_W2_SH(temp5, temp3, temp2, temp3); \
+ ILVR_B2_SH(in9, in8, in11, in10, temp4, temp5); \
+ ILVR_B2_SH(in9, in8, in11, in10, temp4, temp5); \
+ ILVRL_H2_SH(temp5, temp4, temp6, temp7); \
+ ILVR_B2_SH(in13, in12, in15, in14, temp4, temp5); \
+ ILVRL_H2_SH(temp5, temp4, temp8, temp9); \
+ ILVRL_W2_SH(temp8, temp6, temp4, temp5); \
+ ILVRL_W2_SH(temp9, temp7, temp6, temp7); \
+ ILVL_B2_SH(in1, in0, in3, in2, temp8, temp9); \
+ ILVR_D2_UB(temp4, temp0, temp5, temp1, in0, in2); \
+ in1 = (v16u8)__msa_ilvl_d((v2i64)temp4, (v2i64)temp0); \
+ in3 = (v16u8)__msa_ilvl_d((v2i64)temp5, (v2i64)temp1); \
+ ILVL_B2_SH(in5, in4, in7, in6, temp0, temp1); \
+ ILVR_D2_UB(temp6, temp2, temp7, temp3, in4, in6); \
+ in5 = (v16u8)__msa_ilvl_d((v2i64)temp6, (v2i64)temp2); \
+ in7 = (v16u8)__msa_ilvl_d((v2i64)temp7, (v2i64)temp3); \
+ ILVL_B4_SH(in9, in8, in11, in10, in13, in12, in15, in14, \
+ temp2, temp3, temp4, temp5); \
+ ILVR_H4_SH(temp9, temp8, temp1, temp0, temp3, temp2, temp5, temp4, \
+ temp6, temp7, temp8, temp9); \
+ ILVR_W2_SH(temp7, temp6, temp9, temp8, temp0, temp1); \
+ in8 = (v16u8)__msa_ilvr_d((v2i64)temp1, (v2i64)temp0); \
+ in9 = (v16u8)__msa_ilvl_d((v2i64)temp1, (v2i64)temp0); \
+ ILVL_W2_SH(temp7, temp6, temp9, temp8, temp2, temp3); \
+ in10 = (v16u8)__msa_ilvr_d((v2i64)temp3, (v2i64)temp2); \
+ in11 = (v16u8)__msa_ilvl_d((v2i64)temp3, (v2i64)temp2); \
+}
+
+#define VPX_TRANSPOSE12x8_UB_UB(in0, in1, in2, in3, in4, in5, \
+ in6, in7, in8, in9, in10, in11) \
+{ \
+ v8i16 temp0, temp1, temp2, temp3; \
+ v8i16 temp4, temp5, temp6, temp7; \
+ \
+ ILVR_B2_SH(in1, in0, in3, in2, temp0, temp1); \
+ ILVRL_H2_SH(temp1, temp0, temp2, temp3); \
+ ILVR_B2_SH(in5, in4, in7, in6, temp0, temp1); \
+ ILVRL_H2_SH(temp1, temp0, temp4, temp5); \
+ ILVRL_W2_SH(temp4, temp2, temp0, temp1); \
+ ILVRL_W2_SH(temp5, temp3, temp2, temp3); \
+ ILVL_B2_SH(in1, in0, in3, in2, temp4, temp5); \
+ temp4 = __msa_ilvr_h(temp5, temp4); \
+ ILVL_B2_SH(in5, in4, in7, in6, temp6, temp7); \
+ temp5 = __msa_ilvr_h(temp7, temp6); \
+ ILVRL_W2_SH(temp5, temp4, temp6, temp7); \
+ in0 = (v16u8)temp0; \
+ in2 = (v16u8)temp1; \
+ in4 = (v16u8)temp2; \
+ in6 = (v16u8)temp3; \
+ in8 = (v16u8)temp6; \
+ in10 = (v16u8)temp7; \
+ in1 = (v16u8)__msa_ilvl_d((v2i64)temp0, (v2i64)temp0); \
+ in3 = (v16u8)__msa_ilvl_d((v2i64)temp1, (v2i64)temp1); \
+ in5 = (v16u8)__msa_ilvl_d((v2i64)temp2, (v2i64)temp2); \
+ in7 = (v16u8)__msa_ilvl_d((v2i64)temp3, (v2i64)temp3); \
+ in9 = (v16u8)__msa_ilvl_d((v2i64)temp6, (v2i64)temp6); \
+ in11 = (v16u8)__msa_ilvl_d((v2i64)temp7, (v2i64)temp7); \
+}
+
+static void postproc_down_across_chroma_msa(uint8_t *src_ptr, uint8_t *dst_ptr,
+ int32_t src_stride,
+ int32_t dst_stride, int32_t cols,
+ uint8_t *f) {
+ uint8_t *p_src = src_ptr;
+ uint8_t *p_dst = dst_ptr;
+ uint8_t *f_orig = f;
+ uint8_t *p_dst_st = dst_ptr;
+ uint16_t col;
+ uint64_t out0, out1, out2, out3;
+ v16u8 above2, above1, below2, below1, src, ref, ref_temp;
+ v16u8 inter0, inter1, inter2, inter3, inter4, inter5;
+ v16u8 inter6, inter7, inter8, inter9, inter10, inter11;
+
+ for (col = (cols / 16); col--;) {
+ ref = LD_UB(f);
+ LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1);
+ src = LD_UB(p_src);
+ LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0);
+ above2 = LD_UB(p_src + 3 * src_stride);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1);
+ above1 = LD_UB(p_src + 4 * src_stride);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2);
+ src = LD_UB(p_src + 5 * src_stride);
+ VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3);
+ below1 = LD_UB(p_src + 6 * src_stride);
+ VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4);
+ below2 = LD_UB(p_src + 7 * src_stride);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5);
+ above2 = LD_UB(p_src + 8 * src_stride);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6);
+ above1 = LD_UB(p_src + 9 * src_stride);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7);
+ ST_UB8(inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7,
+ p_dst, dst_stride);
+
+ p_dst += 16;
+ p_src += 16;
+ f += 16;
+ }
+
+  if (0 != (cols % 16)) {
+ ref = LD_UB(f);
+ LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1);
+ src = LD_UB(p_src);
+ LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0);
+ above2 = LD_UB(p_src + 3 * src_stride);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1);
+ above1 = LD_UB(p_src + 4 * src_stride);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2);
+ src = LD_UB(p_src + 5 * src_stride);
+ VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3);
+ below1 = LD_UB(p_src + 6 * src_stride);
+ VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4);
+ below2 = LD_UB(p_src + 7 * src_stride);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5);
+ above2 = LD_UB(p_src + 8 * src_stride);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6);
+ above1 = LD_UB(p_src + 9 * src_stride);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7);
+ out0 = __msa_copy_u_d((v2i64) inter0, 0);
+ out1 = __msa_copy_u_d((v2i64) inter1, 0);
+ out2 = __msa_copy_u_d((v2i64) inter2, 0);
+ out3 = __msa_copy_u_d((v2i64) inter3, 0);
+ SD4(out0, out1, out2, out3, p_dst, dst_stride);
+
+ out0 = __msa_copy_u_d((v2i64) inter4, 0);
+ out1 = __msa_copy_u_d((v2i64) inter5, 0);
+ out2 = __msa_copy_u_d((v2i64) inter6, 0);
+ out3 = __msa_copy_u_d((v2i64) inter7, 0);
+ SD4(out0, out1, out2, out3, p_dst + 4 * dst_stride, dst_stride);
+ }
+
+ f = f_orig;
+ p_dst = dst_ptr - 2;
+ LD_UB8(p_dst, dst_stride, inter0, inter1, inter2, inter3, inter4, inter5,
+ inter6, inter7);
+
+ for (col = 0; col < (cols / 8); ++col) {
+ ref = LD_UB(f);
+ f += 8;
+ VPX_TRANSPOSE12x8_UB_UB(inter0, inter1, inter2, inter3, inter4, inter5,
+ inter6, inter7, inter8, inter9, inter10, inter11);
+ if (0 == col) {
+ above2 = inter2;
+ above1 = inter2;
+ } else {
+ above2 = inter0;
+ above1 = inter1;
+ }
+ src = inter2;
+ below1 = inter3;
+ below2 = inter4;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 0);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref_temp, inter2);
+ above2 = inter5;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 1);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref_temp, inter3);
+ above1 = inter6;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 2);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref_temp, inter4);
+ src = inter7;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 3);
+ VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref_temp, inter5);
+ below1 = inter8;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 4);
+ VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref_temp, inter6);
+ below2 = inter9;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 5);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref_temp, inter7);
+ if (col == (cols / 8 - 1)) {
+ above2 = inter9;
+ } else {
+ above2 = inter10;
+ }
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 6);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref_temp, inter8);
+ if (col == (cols / 8 - 1)) {
+ above1 = inter9;
+ } else {
+ above1 = inter11;
+ }
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 7);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref_temp, inter9);
+ TRANSPOSE8x8_UB_UB(inter2, inter3, inter4, inter5, inter6, inter7, inter8,
+ inter9, inter2, inter3, inter4, inter5, inter6, inter7,
+ inter8, inter9);
+ p_dst += 8;
+ LD_UB2(p_dst, dst_stride, inter0, inter1);
+ ST8x1_UB(inter2, p_dst_st);
+ ST8x1_UB(inter3, (p_dst_st + 1 * dst_stride));
+ LD_UB2(p_dst + 2 * dst_stride, dst_stride, inter2, inter3);
+ ST8x1_UB(inter4, (p_dst_st + 2 * dst_stride));
+ ST8x1_UB(inter5, (p_dst_st + 3 * dst_stride));
+ LD_UB2(p_dst + 4 * dst_stride, dst_stride, inter4, inter5);
+ ST8x1_UB(inter6, (p_dst_st + 4 * dst_stride));
+ ST8x1_UB(inter7, (p_dst_st + 5 * dst_stride));
+ LD_UB2(p_dst + 6 * dst_stride, dst_stride, inter6, inter7);
+ ST8x1_UB(inter8, (p_dst_st + 6 * dst_stride));
+ ST8x1_UB(inter9, (p_dst_st + 7 * dst_stride));
+ p_dst_st += 8;
+ }
+}
+
+static void postproc_down_across_luma_msa(uint8_t *src_ptr, uint8_t *dst_ptr,
+ int32_t src_stride,
+ int32_t dst_stride, int32_t cols,
+ uint8_t *f) {
+ uint8_t *p_src = src_ptr;
+ uint8_t *p_dst = dst_ptr;
+ uint8_t *p_dst_st = dst_ptr;
+ uint8_t *f_orig = f;
+ uint16_t col;
+ v16u8 above2, above1, below2, below1;
+ v16u8 src, ref, ref_temp;
+ v16u8 inter0, inter1, inter2, inter3, inter4, inter5, inter6;
+ v16u8 inter7, inter8, inter9, inter10, inter11;
+ v16u8 inter12, inter13, inter14, inter15;
+
+ for (col = (cols / 16); col--;) {
+ ref = LD_UB(f);
+ LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1);
+ src = LD_UB(p_src);
+ LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0);
+ above2 = LD_UB(p_src + 3 * src_stride);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1);
+ above1 = LD_UB(p_src + 4 * src_stride);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2);
+ src = LD_UB(p_src + 5 * src_stride);
+ VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3);
+ below1 = LD_UB(p_src + 6 * src_stride);
+ VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4);
+ below2 = LD_UB(p_src + 7 * src_stride);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5);
+ above2 = LD_UB(p_src + 8 * src_stride);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6);
+ above1 = LD_UB(p_src + 9 * src_stride);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7);
+ src = LD_UB(p_src + 10 * src_stride);
+ VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter8);
+ below1 = LD_UB(p_src + 11 * src_stride);
+ VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter9);
+ below2 = LD_UB(p_src + 12 * src_stride);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter10);
+ above2 = LD_UB(p_src + 13 * src_stride);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter11);
+ above1 = LD_UB(p_src + 14 * src_stride);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter12);
+ src = LD_UB(p_src + 15 * src_stride);
+ VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter13);
+ below1 = LD_UB(p_src + 16 * src_stride);
+ VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter14);
+ below2 = LD_UB(p_src + 17 * src_stride);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter15);
+ ST_UB8(inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7,
+ p_dst, dst_stride);
+ ST_UB8(inter8, inter9, inter10, inter11, inter12, inter13, inter14, inter15,
+ p_dst + 8 * dst_stride, dst_stride);
+ p_src += 16;
+ p_dst += 16;
+ f += 16;
+ }
+
+ f = f_orig;
+ p_dst = dst_ptr - 2;
+ LD_UB8(p_dst, dst_stride, inter0, inter1, inter2, inter3, inter4, inter5,
+ inter6, inter7);
+ LD_UB8(p_dst + 8 * dst_stride, dst_stride, inter8, inter9, inter10, inter11,
+ inter12, inter13, inter14, inter15);
+
+ for (col = 0; col < cols / 8; ++col) {
+ ref = LD_UB(f);
+ f += 8;
+ TRANSPOSE12x16_B(inter0, inter1, inter2, inter3, inter4, inter5, inter6,
+ inter7, inter8, inter9, inter10, inter11, inter12, inter13,
+ inter14, inter15);
+ if (0 == col) {
+ above2 = inter2;
+ above1 = inter2;
+ } else {
+ above2 = inter0;
+ above1 = inter1;
+ }
+
+ src = inter2;
+ below1 = inter3;
+ below2 = inter4;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 0);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref_temp, inter2);
+ above2 = inter5;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 1);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref_temp, inter3);
+ above1 = inter6;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 2);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref_temp, inter4);
+ src = inter7;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 3);
+ VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref_temp, inter5);
+ below1 = inter8;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 4);
+ VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref_temp, inter6);
+ below2 = inter9;
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 5);
+ VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref_temp, inter7);
+ if (col == (cols / 8 - 1)) {
+ above2 = inter9;
+ } else {
+ above2 = inter10;
+ }
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 6);
+ VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref_temp, inter8);
+ if (col == (cols / 8 - 1)) {
+ above1 = inter9;
+ } else {
+ above1 = inter11;
+ }
+ ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 7);
+ VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref_temp, inter9);
+ VPX_TRANSPOSE8x16_UB_UB(inter2, inter3, inter4, inter5, inter6, inter7,
+ inter8, inter9, inter2, inter3, inter4, inter5,
+ inter6, inter7, inter8, inter9, inter10, inter11,
+ inter12, inter13, inter14, inter15, above2, above1);
+
+ p_dst += 8;
+ LD_UB2(p_dst, dst_stride, inter0, inter1);
+ ST8x1_UB(inter2, p_dst_st);
+ ST8x1_UB(inter3, (p_dst_st + 1 * dst_stride));
+ LD_UB2(p_dst + 2 * dst_stride, dst_stride, inter2, inter3);
+ ST8x1_UB(inter4, (p_dst_st + 2 * dst_stride));
+ ST8x1_UB(inter5, (p_dst_st + 3 * dst_stride));
+ LD_UB2(p_dst + 4 * dst_stride, dst_stride, inter4, inter5);
+ ST8x1_UB(inter6, (p_dst_st + 4 * dst_stride));
+ ST8x1_UB(inter7, (p_dst_st + 5 * dst_stride));
+ LD_UB2(p_dst + 6 * dst_stride, dst_stride, inter6, inter7);
+ ST8x1_UB(inter8, (p_dst_st + 6 * dst_stride));
+ ST8x1_UB(inter9, (p_dst_st + 7 * dst_stride));
+ LD_UB2(p_dst + 8 * dst_stride, dst_stride, inter8, inter9);
+ ST8x1_UB(inter10, (p_dst_st + 8 * dst_stride));
+ ST8x1_UB(inter11, (p_dst_st + 9 * dst_stride));
+ LD_UB2(p_dst + 10 * dst_stride, dst_stride, inter10, inter11);
+ ST8x1_UB(inter12, (p_dst_st + 10 * dst_stride));
+ ST8x1_UB(inter13, (p_dst_st + 11 * dst_stride));
+ LD_UB2(p_dst + 12 * dst_stride, dst_stride, inter12, inter13);
+ ST8x1_UB(inter14, (p_dst_st + 12 * dst_stride));
+ ST8x1_UB(inter15, (p_dst_st + 13 * dst_stride));
+ LD_UB2(p_dst + 14 * dst_stride, dst_stride, inter14, inter15);
+ ST8x1_UB(above2, (p_dst_st + 14 * dst_stride));
+ ST8x1_UB(above1, (p_dst_st + 15 * dst_stride));
+ p_dst_st += 8;
+ }
+}
+
+void vpx_post_proc_down_and_across_mb_row_msa(uint8_t *src, uint8_t *dst,
+ int32_t src_stride,
+ int32_t dst_stride, int32_t cols,
+ uint8_t *f, int32_t size) {
+ if (8 == size) {
+ postproc_down_across_chroma_msa(src, dst, src_stride, dst_stride, cols, f);
+ } else if (16 == size) {
+ postproc_down_across_luma_msa(src, dst, src_stride, dst_stride, cols, f);
+ }
+}
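A usage sketch for the dispatcher (the caller below is illustrative, not part
of this patch): it runs once per macroblock row, size 16 selecting the luma
path and 8 the chroma path, with f pointing at a per-column filter-limit
array.

    #include <stdint.h>
    #include <string.h>

    /* Illustrative caller (assumed): filter one 16-row luma strip with a
     * uniform strength; real callers derive the limits per column. */
    static void filter_luma_mb_row(uint8_t *src, uint8_t *dst, int stride,
                                   int cols) {
      uint8_t flimits[1920];             /* assumes cols <= 1920 */
      memset(flimits, 14, (size_t)cols); /* example strength only */
      vpx_post_proc_down_and_across_mb_row_msa(src, dst, stride, stride, cols,
                                               flimits, 16 /* 8 for chroma */);
    }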
+
+void vpx_mbpost_proc_across_ip_msa(uint8_t *src_ptr, int32_t pitch,
+ int32_t rows, int32_t cols, int32_t flimit) {
+ int32_t row, col, cnt;
+ uint8_t *src_dup = src_ptr;
+ v16u8 src0, src, tmp_orig;
+ v16u8 tmp = {0};
+ v16i8 zero = {0};
+ v8u16 sum_h, src_r_h, src_l_h;
+ v4u32 src_r_w, src_l_w;
+ v4i32 flimit_vec;
+
+ flimit_vec = __msa_fill_w(flimit);
+ for (row = rows; row--;) {
+ int32_t sum_sq = 0;
+ int32_t sum = 0;
+ src0 = (v16u8) __msa_fill_b(src_dup[0]);
+ ST8x1_UB(src0, (src_dup - 8));
+
+ src0 = (v16u8) __msa_fill_b(src_dup[cols - 1]);
+ ST_UB(src0, src_dup + cols);
+ src_dup[cols + 16] = src_dup[cols - 1];
+ tmp_orig = (v16u8) __msa_ldi_b(0);
+ tmp_orig[15] = tmp[15];
+ src = LD_UB(src_dup - 8);
+ src[15] = 0;
+ ILVRL_B2_UH(zero, src, src_r_h, src_l_h);
+ src_r_w = __msa_dotp_u_w(src_r_h, src_r_h);
+ src_l_w = __msa_dotp_u_w(src_l_h, src_l_h);
+ sum_sq = HADD_SW_S32(src_r_w);
+ sum_sq += HADD_SW_S32(src_l_w);
+ sum_h = __msa_hadd_u_h(src, src);
+ sum = HADD_UH_U32(sum_h);
+ {
+ v16u8 src7, src8, src_r, src_l;
+ v16i8 mask;
+ v8u16 add_r, add_l;
+ v8i16 sub_r, sub_l, sum_r, sum_l, mask0, mask1;
+ v4i32 sum_sq0, sum_sq1, sum_sq2, sum_sq3;
+ v4i32 sub0, sub1, sub2, sub3;
+ v4i32 sum0_w, sum1_w, sum2_w, sum3_w;
+ v4i32 mul0, mul1, mul2, mul3;
+ v4i32 total0, total1, total2, total3;
+ v8i16 const8 = __msa_fill_h(8);
+
+ src7 = LD_UB(src_dup + 7);
+ src8 = LD_UB(src_dup - 8);
+ for (col = 0; col < (cols >> 4); ++col) {
+ ILVRL_B2_UB(src7, src8, src_r, src_l);
+ HSUB_UB2_SH(src_r, src_l, sub_r, sub_l);
+
+ sum_r[0] = sum + sub_r[0];
+ for (cnt = 0; cnt < 7; ++cnt) {
+ sum_r[cnt + 1] = sum_r[cnt] + sub_r[cnt + 1];
+ }
+ sum_l[0] = sum_r[7] + sub_l[0];
+ for (cnt = 0; cnt < 7; ++cnt) {
+ sum_l[cnt + 1] = sum_l[cnt] + sub_l[cnt + 1];
+ }
+ sum = sum_l[7];
+ src = LD_UB(src_dup + 16 * col);
+ ILVRL_B2_UH(zero, src, src_r_h, src_l_h);
+ src7 = (v16u8)((const8 + sum_r + (v8i16) src_r_h) >> 4);
+ src8 = (v16u8)((const8 + sum_l + (v8i16) src_l_h) >> 4);
+ tmp = (v16u8) __msa_pckev_b((v16i8) src8, (v16i8) src7);
+
+ HADD_UB2_UH(src_r, src_l, add_r, add_l);
+ UNPCK_SH_SW(sub_r, sub0, sub1);
+ UNPCK_SH_SW(sub_l, sub2, sub3);
+ ILVR_H2_SW(zero, add_r, zero, add_l, sum0_w, sum2_w);
+ ILVL_H2_SW(zero, add_r, zero, add_l, sum1_w, sum3_w);
+ MUL4(sum0_w, sub0, sum1_w, sub1, sum2_w, sub2, sum3_w, sub3, mul0, mul1,
+ mul2, mul3);
+ sum_sq0[0] = sum_sq + mul0[0];
+ for (cnt = 0; cnt < 3; ++cnt) {
+ sum_sq0[cnt + 1] = sum_sq0[cnt] + mul0[cnt + 1];
+ }
+ sum_sq1[0] = sum_sq0[3] + mul1[0];
+ for (cnt = 0; cnt < 3; ++cnt) {
+ sum_sq1[cnt + 1] = sum_sq1[cnt] + mul1[cnt + 1];
+ }
+ sum_sq2[0] = sum_sq1[3] + mul2[0];
+ for (cnt = 0; cnt < 3; ++cnt) {
+ sum_sq2[cnt + 1] = sum_sq2[cnt] + mul2[cnt + 1];
+ }
+ sum_sq3[0] = sum_sq2[3] + mul3[0];
+ for (cnt = 0; cnt < 3; ++cnt) {
+ sum_sq3[cnt + 1] = sum_sq3[cnt] + mul3[cnt + 1];
+ }
+ sum_sq = sum_sq3[3];
+
+ UNPCK_SH_SW(sum_r, sum0_w, sum1_w);
+ UNPCK_SH_SW(sum_l, sum2_w, sum3_w);
+ total0 = sum_sq0 * __msa_ldi_w(15);
+ total0 -= sum0_w * sum0_w;
+ total1 = sum_sq1 * __msa_ldi_w(15);
+ total1 -= sum1_w * sum1_w;
+ total2 = sum_sq2 * __msa_ldi_w(15);
+ total2 -= sum2_w * sum2_w;
+ total3 = sum_sq3 * __msa_ldi_w(15);
+ total3 -= sum3_w * sum3_w;
+ total0 = (total0 < flimit_vec);
+ total1 = (total1 < flimit_vec);
+ total2 = (total2 < flimit_vec);
+ total3 = (total3 < flimit_vec);
+ PCKEV_H2_SH(total1, total0, total3, total2, mask0, mask1);
+ mask = __msa_pckev_b((v16i8) mask1, (v16i8) mask0);
+ tmp = __msa_bmz_v(tmp, src, (v16u8) mask);
+
+ if (col == 0) {
+ uint64_t src_d;
+
+ src_d = __msa_copy_u_d((v2i64) tmp_orig, 1);
+ SD(src_d, (src_dup - 8));
+ }
+
+ src7 = LD_UB(src_dup + 16 * (col + 1) + 7);
+ src8 = LD_UB(src_dup + 16 * (col + 1) - 8);
+ ST_UB(tmp, (src_dup + (16 * col)));
+ }
+
+ src_dup += pitch;
+ }
+ }
+}
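In scalar terms, this routine keeps a 15-tap sliding sum and sum of squares
per row and smooths a pixel only while the windowed variance
15 * sum_sq - sum * sum stays under flimit. A sketch of that recurrence
(assumed names; a small delay line defers writes so the window always reads
original pixels):

    #include <stdint.h>

    /* Scalar model of one row of the across filter. The caller has already
     * replicated the row edges, so s[-8..cols + 14] are readable. */
    static void mbpost_across_row(uint8_t *s, int cols, int flimit) {
      uint8_t d[16] = { 0 }; /* delay line: write back 8 pixels late */
      int sum = 0, sum_sq = 0, i;
      for (i = -8; i <= 6; ++i) { /* prime the 15-tap window */
        sum += s[i];
        sum_sq += s[i] * s[i];
      }
      for (i = 0; i < cols + 8; ++i) {
        sum += s[i + 7] - s[i - 8]; /* slide the window one pixel right */
        sum_sq += s[i + 7] * s[i + 7] - s[i - 8] * s[i - 8];
        s[i - 8] = d[(i - 8) & 15]; /* retire a delayed output */
        d[i & 15] = (15 * sum_sq - sum * sum < flimit)
                        ? (uint8_t)((8 + sum + s[i]) >> 4)
                        : s[i];
      }
    }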
+
+void vpx_mbpost_proc_down_msa(uint8_t *dst_ptr, int32_t pitch, int32_t rows,
+ int32_t cols, int32_t flimit) {
+ int32_t row, col, cnt, i;
+ const int16_t *rv3 = &vpx_rv[63 & rand()];
+ v4i32 flimit_vec;
+ v16u8 dst7, dst8, dst_r_b, dst_l_b;
+ v16i8 mask;
+ v8u16 add_r, add_l;
+ v8i16 dst_r_h, dst_l_h, sub_r, sub_l, mask0, mask1;
+ v4i32 sub0, sub1, sub2, sub3, total0, total1, total2, total3;
+
+ flimit_vec = __msa_fill_w(flimit);
+
+ for (col = 0; col < (cols >> 4); ++col) {
+ uint8_t *dst_tmp = &dst_ptr[col << 4];
+ v16u8 dst;
+ v16i8 zero = {0};
+ v16u8 tmp[16];
+ v8i16 mult0, mult1, rv2_0, rv2_1;
+ v8i16 sum0_h = {0};
+ v8i16 sum1_h = {0};
+ v4i32 mul0 = {0};
+ v4i32 mul1 = {0};
+ v4i32 mul2 = {0};
+ v4i32 mul3 = {0};
+ v4i32 sum0_w, sum1_w, sum2_w, sum3_w;
+ v4i32 add0, add1, add2, add3;
+ const int16_t *rv2[16];
+
+ dst = LD_UB(dst_tmp);
+ for (cnt = (col << 4), i = 0; i < 16; ++cnt) {
+ rv2[i] = rv3 + ((cnt * 17) & 127);
+ ++i;
+ }
+ for (cnt = -8; cnt < 0; ++cnt) {
+ ST_UB(dst, dst_tmp + cnt * pitch);
+ }
+
+ dst = LD_UB((dst_tmp + (rows - 1) * pitch));
+ for (cnt = rows; cnt < rows + 17; ++cnt) {
+ ST_UB(dst, dst_tmp + cnt * pitch);
+ }
+ for (cnt = -8; cnt <= 6; ++cnt) {
+ dst = LD_UB(dst_tmp + (cnt * pitch));
+ UNPCK_UB_SH(dst, dst_r_h, dst_l_h);
+ MUL2(dst_r_h, dst_r_h, dst_l_h, dst_l_h, mult0, mult1);
+ mul0 += (v4i32) __msa_ilvr_h((v8i16) zero, (v8i16) mult0);
+ mul1 += (v4i32) __msa_ilvl_h((v8i16) zero, (v8i16) mult0);
+ mul2 += (v4i32) __msa_ilvr_h((v8i16) zero, (v8i16) mult1);
+ mul3 += (v4i32) __msa_ilvl_h((v8i16) zero, (v8i16) mult1);
+ ADD2(sum0_h, dst_r_h, sum1_h, dst_l_h, sum0_h, sum1_h);
+ }
+
+ for (row = 0; row < (rows + 8); ++row) {
+ for (i = 0; i < 8; ++i) {
+ rv2_0[i] = *(rv2[i] + (row & 127));
+ rv2_1[i] = *(rv2[i + 8] + (row & 127));
+ }
+ dst7 = LD_UB(dst_tmp + (7 * pitch));
+ dst8 = LD_UB(dst_tmp - (8 * pitch));
+ ILVRL_B2_UB(dst7, dst8, dst_r_b, dst_l_b);
+
+ HSUB_UB2_SH(dst_r_b, dst_l_b, sub_r, sub_l);
+ UNPCK_SH_SW(sub_r, sub0, sub1);
+ UNPCK_SH_SW(sub_l, sub2, sub3);
+ sum0_h += sub_r;
+ sum1_h += sub_l;
+
+ HADD_UB2_UH(dst_r_b, dst_l_b, add_r, add_l);
+
+ ILVRL_H2_SW(zero, add_r, add0, add1);
+ ILVRL_H2_SW(zero, add_l, add2, add3);
+ mul0 += add0 * sub0;
+ mul1 += add1 * sub1;
+ mul2 += add2 * sub2;
+ mul3 += add3 * sub3;
+ dst = LD_UB(dst_tmp);
+ ILVRL_B2_SH(zero, dst, dst_r_h, dst_l_h);
+ dst7 = (v16u8)((rv2_0 + sum0_h + dst_r_h) >> 4);
+ dst8 = (v16u8)((rv2_1 + sum1_h + dst_l_h) >> 4);
+ tmp[row & 15] = (v16u8) __msa_pckev_b((v16i8) dst8, (v16i8) dst7);
+
+ UNPCK_SH_SW(sum0_h, sum0_w, sum1_w);
+ UNPCK_SH_SW(sum1_h, sum2_w, sum3_w);
+ total0 = mul0 * __msa_ldi_w(15);
+ total0 -= sum0_w * sum0_w;
+ total1 = mul1 * __msa_ldi_w(15);
+ total1 -= sum1_w * sum1_w;
+ total2 = mul2 * __msa_ldi_w(15);
+ total2 -= sum2_w * sum2_w;
+ total3 = mul3 * __msa_ldi_w(15);
+ total3 -= sum3_w * sum3_w;
+ total0 = (total0 < flimit_vec);
+ total1 = (total1 < flimit_vec);
+ total2 = (total2 < flimit_vec);
+ total3 = (total3 < flimit_vec);
+ PCKEV_H2_SH(total1, total0, total3, total2, mask0, mask1);
+ mask = __msa_pckev_b((v16i8) mask1, (v16i8) mask0);
+ tmp[row & 15] = __msa_bmz_v(tmp[row & 15], dst, (v16u8) mask);
+
+ if (row >= 8) {
+ ST_UB(tmp[(row - 8) & 15], (dst_tmp - 8 * pitch));
+ }
+
+ dst_tmp += pitch;
+ }
+ }
+}
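vpx_mbpost_proc_down_msa applies the same 15-tap windowed-variance gate
vertically down each column; the one difference from the across filter is
that the constant rounding term 8 is replaced by an entry of the vpx_rv
dither table, indexed per column and row, to avoid banding. The per-pixel
write-back reduces to roughly this (names illustrative):

    #include <stdint.h>

    /* One pixel of the down filter: gate on windowed variance, round with
     * a dither value rv drawn from vpx_rv instead of the constant 8. */
    static uint8_t mbpost_down_pixel(int sum, int sum_sq, uint8_t src,
                                     int16_t rv, int flimit) {
      if (15 * sum_sq - sum * sum < flimit)
        return (uint8_t)((rv + sum + src) >> 4);
      return src;
    }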
diff --git a/vpx_dsp/mips/macros_msa.h b/vpx_dsp/mips/macros_msa.h
index 91e3615..ea59eaf 100644
--- a/vpx_dsp/mips/macros_msa.h
+++ b/vpx_dsp/mips/macros_msa.h
@@ -1060,6 +1060,7 @@
ILVL_B2(RTYPE, in4, in5, in6, in7, out2, out3); \
}
#define ILVL_B4_SB(...) ILVL_B4(v16i8, __VA_ARGS__)
+#define ILVL_B4_SH(...) ILVL_B4(v8i16, __VA_ARGS__)
#define ILVL_B4_UH(...) ILVL_B4(v8u16, __VA_ARGS__)
/* Description : Interleave left half of halfword elements from vectors
@@ -1074,6 +1075,7 @@
out1 = (RTYPE)__msa_ilvl_h((v8i16)in2, (v8i16)in3); \
}
#define ILVL_H2_SH(...) ILVL_H2(v8i16, __VA_ARGS__)
+#define ILVL_H2_SW(...) ILVL_H2(v4i32, __VA_ARGS__)
/* Description : Interleave left half of word elements from vectors
Arguments : Inputs - in0, in1, in2, in3
@@ -1137,6 +1139,7 @@
out1 = (RTYPE)__msa_ilvr_h((v8i16)in2, (v8i16)in3); \
}
#define ILVR_H2_SH(...) ILVR_H2(v8i16, __VA_ARGS__)
+#define ILVR_H2_SW(...) ILVR_H2(v4i32, __VA_ARGS__)
#define ILVR_H4(RTYPE, in0, in1, in2, in3, in4, in5, in6, in7, \
out0, out1, out2, out3) { \
@@ -1215,6 +1218,7 @@
out0 = (RTYPE)__msa_ilvr_w((v4i32)in0, (v4i32)in1); \
out1 = (RTYPE)__msa_ilvl_w((v4i32)in0, (v4i32)in1); \
}
+#define ILVRL_W2_UB(...) ILVRL_W2(v16u8, __VA_ARGS__)
#define ILVRL_W2_SH(...) ILVRL_W2(v8i16, __VA_ARGS__)
#define ILVRL_W2_SW(...) ILVRL_W2(v4i32, __VA_ARGS__)
diff --git a/vpx_dsp/postproc.h b/vpx_dsp/postproc.h
new file mode 100644
index 0000000..78d11b1
--- /dev/null
+++ b/vpx_dsp/postproc.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016 The WebM project authors. All Rights Reserved.
+ *
+ * Use of this source code is governed by a BSD-style license
+ * that can be found in the LICENSE file in the root of the source
+ * tree. An additional intellectual property rights grant can be found
+ * in the file PATENTS. All contributing project authors may
+ * be found in the AUTHORS file in the root of the source tree.
+ */
+
+#ifndef VPX_DSP_POSTPROC_H_
+#define VPX_DSP_POSTPROC_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Fills a noise buffer with Gaussian noise at a strength determined by sigma.
+int vpx_setup_noise(double sigma, int size, char *noise);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // VPX_DSP_POSTPROC_H_
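A hedged usage sketch for the new header (the wrapper and buffer handling
below are assumptions; the meaning of the return value is not documented in
this header, so it is ignored):

    #include "vpx_dsp/postproc.h"

    /* Illustrative setup (assumed): fill a noise line once at the requested
     * strength; the postproc stage later blends slices of it into frames. */
    static void init_noise(char *noise, int size, double sigma) {
      (void)vpx_setup_noise(sigma, size, noise);
    }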
diff --git a/vpx_dsp/psnr.c b/vpx_dsp/psnr.c
index 1655f11..5bf7862 100644
--- a/vpx_dsp/psnr.c
+++ b/vpx_dsp/psnr.c
@@ -51,7 +51,7 @@
static void encoder_highbd_variance64(const uint8_t *a8, int a_stride,
const uint8_t *b8, int b_stride,
int w, int h, uint64_t *sse,
- uint64_t *sum) {
+ int64_t *sum) {
int i, j;
uint16_t *a = CONVERT_TO_SHORTPTR(a8);
@@ -75,7 +75,7 @@
int w, int h,
unsigned int *sse, int *sum) {
uint64_t sse_long = 0;
- uint64_t sum_long = 0;
+ int64_t sum_long = 0;
encoder_highbd_variance64(a8, a_stride, b8, b_stride, w, h,
&sse_long, &sum_long);
*sse = (unsigned int)sse_long;
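The switch to int64_t matters because the accumulator sums raw signed
differences, not absolute values; with an unsigned type, a frame that is on
average darker than its reference wraps around instead of going negative.
A minimal illustration:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
      uint64_t u = 0;
      int64_t s = 0;
      u += 10 - 12; /* wraps to 18446744073709551614 */
      s += 10 - 12; /* stays -2, as intended */
      printf("%llu %lld\n", (unsigned long long)u, (long long)s);
      return 0;
    }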
diff --git a/vpx_dsp/psnrhvs.c b/vpx_dsp/psnrhvs.c
index 095ba5d..3708cc3 100644
--- a/vpx_dsp/psnrhvs.c
+++ b/vpx_dsp/psnrhvs.c
@@ -245,6 +245,8 @@
}
}
}
+  if (pixels <= 0)
+ return 0;
ret /= pixels;
return ret;
}
diff --git a/vpx_dsp/vpx_dsp.mk b/vpx_dsp/vpx_dsp.mk
index 06b46d3..43a78a8 100644
--- a/vpx_dsp/vpx_dsp.mk
+++ b/vpx_dsp/vpx_dsp.mk
@@ -42,24 +42,24 @@
# intra predictions
DSP_SRCS-yes += intrapred.c
-ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE) += x86/intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/intrapred_ssse3.asm
DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_8t_ssse3.asm
-endif # CONFIG_USE_X86INC
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
-ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE) += x86/highbd_intrapred_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_intrapred_sse2.asm
-endif # CONFIG_USE_X86INC
endif # CONFIG_VP9_HIGHBITDEPTH
ifneq ($(filter yes,$(CONFIG_POSTPROC) $(CONFIG_VP9_POSTPROC)),)
DSP_SRCS-yes += add_noise.c
+DSP_SRCS-yes += deblock.c
+DSP_SRCS-yes += postproc.h
DSP_SRCS-$(HAVE_MSA) += mips/add_noise_msa.c
+DSP_SRCS-$(HAVE_MSA) += mips/deblock_msa.c
DSP_SRCS-$(HAVE_SSE2) += x86/add_noise_sse2.asm
+DSP_SRCS-$(HAVE_SSE2) += x86/deblock_sse2.asm
endif # CONFIG_POSTPROC
DSP_SRCS-$(HAVE_NEON_ASM) += arm/intrapred_neon_asm$(ASM)
@@ -102,9 +102,8 @@
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_high_subpixel_8t_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_high_subpixel_bilinear_sse2.asm
endif
-ifeq ($(CONFIG_USE_X86INC),yes)
+
DSP_SRCS-$(HAVE_SSE2) += x86/vpx_convolve_copy_sse2.asm
-endif
ifeq ($(HAVE_NEON_ASM),yes)
DSP_SRCS-yes += arm/vpx_convolve_copy_neon_asm$(ASM)
@@ -194,10 +193,8 @@
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_impl_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32x32_impl_sse2.h
ifeq ($(ARCH_X86_64),yes)
-ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/fwd_txfm_ssse3_x86_64.asm
endif
-endif
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.c
DSP_SRCS-$(HAVE_AVX2) += x86/fwd_dct32x32_impl_avx2.h
DSP_SRCS-$(HAVE_NEON) += arm/fwd_txfm_neon.c
@@ -212,12 +209,10 @@
DSP_SRCS-yes += inv_txfm.c
DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.h
DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.c
-ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/inv_wht_sse2.asm
ifeq ($(ARCH_X86_64),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/inv_txfm_ssse3_x86_64.asm
endif # ARCH_X86_64
-endif # CONFIG_USE_X86INC
ifeq ($(HAVE_NEON_ASM),yes)
DSP_SRCS-yes += arm/save_reg_neon$(ASM)
@@ -269,11 +264,9 @@
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_quantize_intrin_sse2.c
endif
ifeq ($(ARCH_X86_64),yes)
-ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/quantize_ssse3_x86_64.asm
DSP_SRCS-$(HAVE_AVX) += x86/quantize_avx_x86_64.asm
endif
-endif
# avg
DSP_SRCS-yes += avg.c
@@ -282,10 +275,8 @@
DSP_SRCS-$(HAVE_MSA) += mips/avg_msa.c
DSP_SRCS-$(HAVE_NEON) += arm/hadamard_neon.c
ifeq ($(ARCH_X86_64),yes)
-ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSSE3) += x86/avg_ssse3_x86_64.asm
endif
-endif
# high bit depth subtract
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
@@ -329,7 +320,6 @@
endif #CONFIG_OBMC
endif #CONFIG_VP10_ENCODER
-ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE) += x86/sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE) += x86/sad_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/sad4d_sse2.asm
@@ -340,7 +330,7 @@
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad_sse2.asm
endif # CONFIG_VP9_HIGHBITDEPTH
-endif # CONFIG_USE_X86INC
+
endif # CONFIG_ENCODERS
ifneq ($(filter yes,$(CONFIG_ENCODERS) $(CONFIG_POSTPROC) $(CONFIG_VP9_POSTPROC)),)
@@ -370,18 +360,14 @@
DSP_SRCS-$(HAVE_SSE2) += x86/ssim_opt_x86_64.asm
endif # ARCH_X86_64
-ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE) += x86/subpel_variance_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/subpel_variance_sse2.asm # Contains SSE2 and SSSE3
-endif # CONFIG_USE_X86INC
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_sse2.c
DSP_SRCS-$(HAVE_SSE4_1) += x86/highbd_variance_sse4.c
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_impl_sse2.asm
-ifeq ($(CONFIG_USE_X86INC),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subpel_variance_impl_sse2.asm
-endif # CONFIG_USE_X86INC
endif # CONFIG_VP9_HIGHBITDEPTH
endif # CONFIG_ENCODERS || CONFIG_POSTPROC || CONFIG_VP9_POSTPROC
diff --git a/vpx_dsp/vpx_dsp_rtcd_defs.pl b/vpx_dsp/vpx_dsp_rtcd_defs.pl
index a04a684..a210b79 100644
--- a/vpx_dsp/vpx_dsp_rtcd_defs.pl
+++ b/vpx_dsp/vpx_dsp_rtcd_defs.pl
@@ -11,29 +11,6 @@
}
forward_decls qw/vpx_dsp_forward_decls/;
-# x86inc.asm had specific constraints. break it out so it's easy to disable.
-# zero all the variables to avoid tricky else conditions.
-$mmx_x86inc = $sse_x86inc = $sse2_x86inc = $ssse3_x86inc = $avx_x86inc =
- $avx2_x86inc = '';
-$mmx_x86_64_x86inc = $sse_x86_64_x86inc = $sse2_x86_64_x86inc =
- $ssse3_x86_64_x86inc = $avx_x86_64_x86inc = $avx2_x86_64_x86inc = '';
-if (vpx_config("CONFIG_USE_X86INC") eq "yes") {
- $mmx_x86inc = 'mmx';
- $sse_x86inc = 'sse';
- $sse2_x86inc = 'sse2';
- $ssse3_x86inc = 'ssse3';
- $avx_x86inc = 'avx';
- $avx2_x86inc = 'avx2';
- if ($opts{arch} eq "x86_64") {
- $mmx_x86_64_x86inc = 'mmx';
- $sse_x86_64_x86inc = 'sse';
- $sse2_x86_64_x86inc = 'sse2';
- $ssse3_x86_64_x86inc = 'ssse3';
- $avx_x86_64_x86inc = 'avx';
- $avx2_x86_64_x86inc = 'avx2';
- }
-}
-
# optimizations which depend on multiple features
$avx2_ssse3 = '';
if ((vpx_config("HAVE_AVX2") eq "yes") && (vpx_config("HAVE_SSSE3") eq "yes")) {
@@ -68,19 +45,19 @@
#
add_proto qw/void vpx_d207_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d207_predictor_4x4/, "$sse2_x86inc";
+specialize qw/vpx_d207_predictor_4x4 sse2/;
add_proto qw/void vpx_d207e_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d207e_predictor_4x4/;
add_proto qw/void vpx_d45_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d45_predictor_4x4 neon/, "$sse2_x86inc";
+specialize qw/vpx_d45_predictor_4x4 neon sse2/;
add_proto qw/void vpx_d45e_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d45e_predictor_4x4/;
add_proto qw/void vpx_d63_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d63_predictor_4x4/, "$ssse3_x86inc";
+specialize qw/vpx_d63_predictor_4x4 ssse3/;
add_proto qw/void vpx_d63e_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d63e_predictor_4x4/;
@@ -89,7 +66,7 @@
specialize qw/vpx_d63f_predictor_4x4/;
add_proto qw/void vpx_h_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_h_predictor_4x4 neon dspr2 msa/, "$sse2_x86inc";
+specialize qw/vpx_h_predictor_4x4 neon dspr2 msa sse2/;
add_proto qw/void vpx_he_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_he_predictor_4x4/;
@@ -101,49 +78,49 @@
specialize qw/vpx_d135_predictor_4x4 neon/;
add_proto qw/void vpx_d153_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d153_predictor_4x4/, "$ssse3_x86inc";
+specialize qw/vpx_d153_predictor_4x4 ssse3/;
add_proto qw/void vpx_v_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_v_predictor_4x4 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_v_predictor_4x4 neon msa sse2/;
add_proto qw/void vpx_ve_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_ve_predictor_4x4/;
add_proto qw/void vpx_tm_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_tm_predictor_4x4 neon dspr2 msa/, "$sse2_x86inc";
+specialize qw/vpx_tm_predictor_4x4 neon dspr2 msa sse2/;
add_proto qw/void vpx_dc_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_predictor_4x4 dspr2 msa neon/, "$sse2_x86inc";
+specialize qw/vpx_dc_predictor_4x4 dspr2 msa neon sse2/;
add_proto qw/void vpx_dc_top_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_top_predictor_4x4 msa neon/, "$sse2_x86inc";
+specialize qw/vpx_dc_top_predictor_4x4 msa neon sse2/;
add_proto qw/void vpx_dc_left_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_left_predictor_4x4 msa neon/, "$sse2_x86inc";
+specialize qw/vpx_dc_left_predictor_4x4 msa neon sse2/;
add_proto qw/void vpx_dc_128_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_128_predictor_4x4 msa neon/, "$sse2_x86inc";
+specialize qw/vpx_dc_128_predictor_4x4 msa neon sse2/;
add_proto qw/void vpx_d207_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d207_predictor_8x8/, "$ssse3_x86inc";
+specialize qw/vpx_d207_predictor_8x8 ssse3/;
add_proto qw/void vpx_d207e_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d207e_predictor_8x8/;
add_proto qw/void vpx_d45_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d45_predictor_8x8 neon/, "$sse2_x86inc";
+specialize qw/vpx_d45_predictor_8x8 neon sse2/;
add_proto qw/void vpx_d45e_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d45e_predictor_8x8/;
add_proto qw/void vpx_d63_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d63_predictor_8x8/, "$ssse3_x86inc";
+specialize qw/vpx_d63_predictor_8x8 ssse3/;
add_proto qw/void vpx_d63e_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d63e_predictor_8x8/;
add_proto qw/void vpx_h_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_h_predictor_8x8 neon dspr2 msa/, "$sse2_x86inc";
+specialize qw/vpx_h_predictor_8x8 neon dspr2 msa sse2/;
add_proto qw/void vpx_d117_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d117_predictor_8x8/;
@@ -152,46 +129,46 @@
specialize qw/vpx_d135_predictor_8x8/;
add_proto qw/void vpx_d153_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d153_predictor_8x8/, "$ssse3_x86inc";
+specialize qw/vpx_d153_predictor_8x8 ssse3/;
add_proto qw/void vpx_v_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_v_predictor_8x8 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_v_predictor_8x8 neon msa sse2/;
add_proto qw/void vpx_tm_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_tm_predictor_8x8 neon dspr2 msa/, "$sse2_x86inc";
+specialize qw/vpx_tm_predictor_8x8 neon dspr2 msa sse2/;
add_proto qw/void vpx_dc_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_predictor_8x8 dspr2 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_dc_predictor_8x8 dspr2 neon msa sse2/;
add_proto qw/void vpx_dc_top_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_top_predictor_8x8 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_dc_top_predictor_8x8 neon msa sse2/;
add_proto qw/void vpx_dc_left_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_left_predictor_8x8 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_dc_left_predictor_8x8 neon msa sse2/;
add_proto qw/void vpx_dc_128_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_128_predictor_8x8 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_dc_128_predictor_8x8 neon msa sse2/;
add_proto qw/void vpx_d207_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d207_predictor_16x16/, "$ssse3_x86inc";
+specialize qw/vpx_d207_predictor_16x16 ssse3/;
add_proto qw/void vpx_d207e_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d207e_predictor_16x16/;
add_proto qw/void vpx_d45_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d45_predictor_16x16 neon/, "$ssse3_x86inc";
+specialize qw/vpx_d45_predictor_16x16 neon ssse3/;
add_proto qw/void vpx_d45e_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d45e_predictor_16x16/;
add_proto qw/void vpx_d63_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d63_predictor_16x16/, "$ssse3_x86inc";
+specialize qw/vpx_d63_predictor_16x16 ssse3/;
add_proto qw/void vpx_d63e_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d63e_predictor_16x16/;
add_proto qw/void vpx_h_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_h_predictor_16x16 neon dspr2 msa/, "$sse2_x86inc";
+specialize qw/vpx_h_predictor_16x16 neon dspr2 msa sse2/;
add_proto qw/void vpx_d117_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d117_predictor_16x16/;
@@ -200,46 +177,46 @@
specialize qw/vpx_d135_predictor_16x16/;
add_proto qw/void vpx_d153_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d153_predictor_16x16/, "$ssse3_x86inc";
+specialize qw/vpx_d153_predictor_16x16 ssse3/;
add_proto qw/void vpx_v_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_v_predictor_16x16 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_v_predictor_16x16 neon msa sse2/;
add_proto qw/void vpx_tm_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_tm_predictor_16x16 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_tm_predictor_16x16 neon msa sse2/;
add_proto qw/void vpx_dc_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_predictor_16x16 dspr2 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_dc_predictor_16x16 dspr2 neon msa sse2/;
add_proto qw/void vpx_dc_top_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_top_predictor_16x16 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_dc_top_predictor_16x16 neon msa sse2/;
add_proto qw/void vpx_dc_left_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_left_predictor_16x16 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_dc_left_predictor_16x16 neon msa sse2/;
add_proto qw/void vpx_dc_128_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_128_predictor_16x16 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_dc_128_predictor_16x16 neon msa sse2/;
add_proto qw/void vpx_d207_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d207_predictor_32x32/, "$ssse3_x86inc";
+specialize qw/vpx_d207_predictor_32x32 ssse3/;
add_proto qw/void vpx_d207e_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d207e_predictor_32x32/;
add_proto qw/void vpx_d45_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d45_predictor_32x32/, "$ssse3_x86inc";
+specialize qw/vpx_d45_predictor_32x32 ssse3/;
add_proto qw/void vpx_d45e_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d45e_predictor_32x32/;
add_proto qw/void vpx_d63_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d63_predictor_32x32/, "$ssse3_x86inc";
+specialize qw/vpx_d63_predictor_32x32 ssse3/;
add_proto qw/void vpx_d63e_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d63e_predictor_32x32/;
add_proto qw/void vpx_h_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_h_predictor_32x32 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_h_predictor_32x32 neon msa sse2/;
add_proto qw/void vpx_d117_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
specialize qw/vpx_d117_predictor_32x32/;
@@ -248,25 +225,25 @@
specialize qw/vpx_d135_predictor_32x32/;
add_proto qw/void vpx_d153_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_d153_predictor_32x32/, "$ssse3_x86inc";
+specialize qw/vpx_d153_predictor_32x32 ssse3/;
add_proto qw/void vpx_v_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_v_predictor_32x32 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_v_predictor_32x32 neon msa sse2/;
add_proto qw/void vpx_tm_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_tm_predictor_32x32 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_tm_predictor_32x32 neon msa sse2/;
add_proto qw/void vpx_dc_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_predictor_32x32 msa neon/, "$sse2_x86inc";
+specialize qw/vpx_dc_predictor_32x32 msa neon sse2/;
add_proto qw/void vpx_dc_top_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_top_predictor_32x32 msa neon/, "$sse2_x86inc";
+specialize qw/vpx_dc_top_predictor_32x32 msa neon sse2/;
add_proto qw/void vpx_dc_left_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_left_predictor_32x32 msa neon/, "$sse2_x86inc";
+specialize qw/vpx_dc_left_predictor_32x32 msa neon sse2/;
add_proto qw/void vpx_dc_128_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left";
-specialize qw/vpx_dc_128_predictor_32x32 msa neon/, "$sse2_x86inc";
+specialize qw/vpx_dc_128_predictor_32x32 msa neon sse2/;
# High bitdepth functions
if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
@@ -301,13 +278,13 @@
specialize qw/vpx_highbd_d153_predictor_4x4/;
add_proto qw/void vpx_highbd_v_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_v_predictor_4x4/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_v_predictor_4x4 sse2/;
add_proto qw/void vpx_highbd_tm_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_tm_predictor_4x4/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_tm_predictor_4x4 sse2/;
add_proto qw/void vpx_highbd_dc_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_dc_predictor_4x4/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_dc_predictor_4x4 sse2/;
add_proto qw/void vpx_highbd_dc_top_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vpx_highbd_dc_top_predictor_4x4/;
@@ -349,13 +326,13 @@
specialize qw/vpx_highbd_d153_predictor_8x8/;
add_proto qw/void vpx_highbd_v_predictor_8x8/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_v_predictor_8x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_v_predictor_8x8 sse2/;
add_proto qw/void vpx_highbd_tm_predictor_8x8/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_tm_predictor_8x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_tm_predictor_8x8 sse2/;
add_proto qw/void vpx_highbd_dc_predictor_8x8/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_dc_predictor_8x8/, "$sse2_x86inc";;
+  specialize qw/vpx_highbd_dc_predictor_8x8 sse2/;
add_proto qw/void vpx_highbd_dc_top_predictor_8x8/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vpx_highbd_dc_top_predictor_8x8/;
@@ -397,13 +374,13 @@
specialize qw/vpx_highbd_d153_predictor_16x16/;
add_proto qw/void vpx_highbd_v_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_v_predictor_16x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_v_predictor_16x16 sse2/;
add_proto qw/void vpx_highbd_tm_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_tm_predictor_16x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_tm_predictor_16x16 sse2/;
add_proto qw/void vpx_highbd_dc_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_dc_predictor_16x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_dc_predictor_16x16 sse2/;
add_proto qw/void vpx_highbd_dc_top_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vpx_highbd_dc_top_predictor_16x16/;
@@ -445,13 +422,13 @@
specialize qw/vpx_highbd_d153_predictor_32x32/;
add_proto qw/void vpx_highbd_v_predictor_32x32/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_v_predictor_32x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_v_predictor_32x32 sse2/;
add_proto qw/void vpx_highbd_tm_predictor_32x32/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_tm_predictor_32x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_tm_predictor_32x32 sse2/;
add_proto qw/void vpx_highbd_dc_predictor_32x32/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
- specialize qw/vpx_highbd_dc_predictor_32x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_dc_predictor_32x32 sse2/;
add_proto qw/void vpx_highbd_dc_top_predictor_32x32/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd";
specialize qw/vpx_highbd_dc_top_predictor_32x32/;
@@ -481,8 +458,8 @@
add_proto qw/void vpx_scaled_avg_horiz/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h";
add_proto qw/void vpx_scaled_avg_vert/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h";
-specialize qw/vpx_convolve_copy /, "$sse2_x86inc";
-specialize qw/vpx_convolve_avg /, "$sse2_x86inc";
+specialize qw/vpx_convolve_copy sse2/;
+specialize qw/vpx_convolve_avg sse2/;
specialize qw/vpx_convolve8 sse2 ssse3/, "$avx2_ssse3";
specialize qw/vpx_convolve8_horiz sse2 ssse3/, "$avx2_ssse3";
specialize qw/vpx_convolve8_vert sse2 ssse3/, "$avx2_ssse3";
@@ -505,10 +482,10 @@
if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
add_proto qw/void vpx_highbd_convolve_copy/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps";
- specialize qw/vpx_highbd_convolve_copy/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_convolve_copy sse2/;
add_proto qw/void vpx_highbd_convolve_avg/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps";
- specialize qw/vpx_highbd_convolve_avg/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_convolve_avg sse2/;
add_proto qw/void vpx_highbd_convolve8/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps";
specialize qw/vpx_highbd_convolve8/, "$sse2_x86_64";
@@ -679,7 +656,7 @@
specialize qw/vpx_fdct4x4_1 sse2/;
add_proto qw/void vpx_fdct8x8/, "const int16_t *input, tran_low_t *output, int stride";
- specialize qw/vpx_fdct8x8 sse2 neon msa/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_fdct8x8 sse2 neon msa/, "$ssse3_x86_64";
add_proto qw/void vpx_fdct8x8_1/, "const int16_t *input, tran_low_t *output, int stride";
specialize qw/vpx_fdct8x8_1 sse2 neon msa/;
@@ -711,7 +688,7 @@
specialize qw/vpx_iwht4x4_1_add/;
add_proto qw/void vpx_iwht4x4_16_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_iwht4x4_16_add/, "$sse2_x86inc";
+ specialize qw/vpx_iwht4x4_16_add sse2/;
add_proto qw/void vpx_highbd_idct4x4_1_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride, int bd";
specialize qw/vpx_highbd_idct4x4_1_add/;
@@ -797,10 +774,10 @@
specialize qw/vpx_idct4x4_1_add sse2/;
add_proto qw/void vpx_idct8x8_64_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct8x8_64_add sse2/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct8x8_64_add sse2/, "$ssse3_x86_64";
add_proto qw/void vpx_idct8x8_12_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct8x8_12_add sse2/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct8x8_12_add sse2/, "$ssse3_x86_64";
add_proto qw/void vpx_idct8x8_1_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
specialize qw/vpx_idct8x8_1_add sse2/;
@@ -815,15 +792,15 @@
specialize qw/vpx_idct16x16_1_add sse2/;
add_proto qw/void vpx_idct32x32_1024_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct32x32_1024_add sse2/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct32x32_1024_add sse2/, "$ssse3_x86_64";
add_proto qw/void vpx_idct32x32_135_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct32x32_135_add sse2/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct32x32_135_add sse2/, "$ssse3_x86_64";
# Need to add 135 eob idct32x32 implementations.
$vpx_idct32x32_135_add_sse2=vpx_idct32x32_1024_add_sse2;
add_proto qw/void vpx_idct32x32_34_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct32x32_34_add sse2/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct32x32_34_add sse2/, "$ssse3_x86_64";
add_proto qw/void vpx_idct32x32_1_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
specialize qw/vpx_idct32x32_1_add sse2/;
@@ -898,10 +875,10 @@
specialize qw/vpx_idct8x8_1_add sse2 neon dspr2 msa/;
add_proto qw/void vpx_idct8x8_64_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct8x8_64_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct8x8_64_add sse2 neon dspr2 msa/, "$ssse3_x86_64";
add_proto qw/void vpx_idct8x8_12_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct8x8_12_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct8x8_12_add sse2 neon dspr2 msa/, "$ssse3_x86_64";
add_proto qw/void vpx_idct16x16_1_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
specialize qw/vpx_idct16x16_1_add sse2 neon dspr2 msa/;
@@ -913,10 +890,10 @@
specialize qw/vpx_idct16x16_10_add sse2 neon dspr2 msa/;
add_proto qw/void vpx_idct32x32_1024_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct32x32_1024_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct32x32_1024_add sse2 neon dspr2 msa/, "$ssse3_x86_64";
add_proto qw/void vpx_idct32x32_135_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct32x32_135_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct32x32_135_add sse2 neon dspr2 msa/, "$ssse3_x86_64";
# Need to add 135 eob idct32x32 implementations.
$vpx_idct32x32_135_add_sse2=vpx_idct32x32_1024_add_sse2;
$vpx_idct32x32_135_add_neon=vpx_idct32x32_1024_add_neon;
@@ -924,7 +901,7 @@
$vpx_idct32x32_135_add_msa=vpx_idct32x32_1024_add_msa;
add_proto qw/void vpx_idct32x32_34_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_idct32x32_34_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_idct32x32_34_add sse2 neon dspr2 msa/, "$ssse3_x86_64";
# Need to add 34 eob idct32x32 neon implementation.
$vpx_idct32x32_34_add_neon=vpx_idct32x32_1024_add_neon;
@@ -935,7 +912,7 @@
specialize qw/vpx_iwht4x4_1_add msa/;
add_proto qw/void vpx_iwht4x4_16_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride";
- specialize qw/vpx_iwht4x4_16_add msa/, "$sse2_x86inc";
+ specialize qw/vpx_iwht4x4_16_add msa sse2/;
} # CONFIG_EMULATE_HARDWARE
} # CONFIG_VP9_HIGHBITDEPTH
} # CONFIG_VP9 || CONFIG_VP10
@@ -945,10 +922,10 @@
#
if ((vpx_config("CONFIG_VP9_ENCODER") eq "yes") || (vpx_config("CONFIG_VP10_ENCODER") eq "yes")) {
add_proto qw/void vpx_quantize_b/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
- specialize qw/vpx_quantize_b sse2/, "$ssse3_x86_64_x86inc", "$avx_x86_64_x86inc";
+ specialize qw/vpx_quantize_b sse2/, "$ssse3_x86_64", "$avx_x86_64";
add_proto qw/void vpx_quantize_b_32x32/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
- specialize qw/vpx_quantize_b_32x32/, "$ssse3_x86_64_x86inc", "$avx_x86_64_x86inc";
+ specialize qw/vpx_quantize_b_32x32/, "$ssse3_x86_64", "$avx_x86_64";
if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
add_proto qw/void vpx_highbd_quantize_b/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan";
@@ -985,51 +962,60 @@
# Block subtraction
#
add_proto qw/void vpx_subtract_block/, "int rows, int cols, int16_t *diff_ptr, ptrdiff_t diff_stride, const uint8_t *src_ptr, ptrdiff_t src_stride, const uint8_t *pred_ptr, ptrdiff_t pred_stride";
-specialize qw/vpx_subtract_block neon msa/, "$sse2_x86inc";
+specialize qw/vpx_subtract_block neon msa sse2/;
if (vpx_config("CONFIG_VP10_ENCODER") eq "yes") {
- #
- # Sum of Squares
- #
- add_proto qw/uint64_t vpx_sum_squares_2d_i16/, "const int16_t *src, int stride, int size";
- specialize qw/vpx_sum_squares_2d_i16 sse2/;
+#
+# Sum of Squares
+#
+add_proto qw/uint64_t vpx_sum_squares_2d_i16/, "const int16_t *src, int stride, int size";
+specialize qw/vpx_sum_squares_2d_i16 sse2/;
- add_proto qw/uint64_t vpx_sum_squares_i16/, "const int16_t *src, uint32_t N";
- specialize qw/vpx_sum_squares_i16 sse2/;
+add_proto qw/uint64_t vpx_sum_squares_i16/, "const int16_t *src, uint32_t N";
+specialize qw/vpx_sum_squares_i16 sse2/;
}
+
+#
+# Single block SAD
+#
+add_proto qw/unsigned int vpx_sad64x64/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
+specialize qw/vpx_sad64x64 avx2 neon msa sse2/;
+
+add_proto qw/unsigned int vpx_sad64x32/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
+specialize qw/vpx_sad64x32 avx2 msa sse2/;
+
add_proto qw/unsigned int vpx_sad32x64/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad32x64 avx2 msa/, "$sse2_x86inc";
+specialize qw/vpx_sad32x64 avx2 msa sse2/;
add_proto qw/unsigned int vpx_sad32x32/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad32x32 avx2 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_sad32x32 avx2 neon msa sse2/;
add_proto qw/unsigned int vpx_sad32x16/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad32x16 avx2 msa/, "$sse2_x86inc";
+specialize qw/vpx_sad32x16 avx2 msa sse2/;
add_proto qw/unsigned int vpx_sad16x32/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad16x32 msa/, "$sse2_x86inc";
+specialize qw/vpx_sad16x32 msa sse2/;
add_proto qw/unsigned int vpx_sad16x16/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad16x16 media neon msa/, "$sse2_x86inc";
+specialize qw/vpx_sad16x16 media neon msa sse2/;
add_proto qw/unsigned int vpx_sad16x8/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad16x8 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_sad16x8 neon msa sse2/;
add_proto qw/unsigned int vpx_sad8x16/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad8x16 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_sad8x16 neon msa sse2/;
add_proto qw/unsigned int vpx_sad8x8/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad8x8 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_sad8x8 neon msa sse2/;
add_proto qw/unsigned int vpx_sad8x4/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad8x4 msa/, "$sse2_x86inc";
+specialize qw/vpx_sad8x4 msa sse2/;
add_proto qw/unsigned int vpx_sad4x8/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad4x8 msa/, "$sse2_x86inc";
+specialize qw/vpx_sad4x8 msa sse2/;
add_proto qw/unsigned int vpx_sad4x4/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
-specialize qw/vpx_sad4x4 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_sad4x4 neon msa sse2/;
#
# Avg
@@ -1062,7 +1048,7 @@
}
add_proto qw/void vpx_hadamard_8x8/, "const int16_t *src_diff, int src_stride, int16_t *coeff";
- specialize qw/vpx_hadamard_8x8 sse2 neon/, "$ssse3_x86_64_x86inc";
+ specialize qw/vpx_hadamard_8x8 sse2 neon/, "$ssse3_x86_64";
add_proto qw/void vpx_hadamard_16x16/, "const int16_t *src_diff, int src_stride, int16_t *coeff";
specialize qw/vpx_hadamard_16x16 sse2 neon/;
@@ -1089,39 +1075,39 @@
add_proto qw/unsigned int/, "vpx_sad${w}x${h}_avg", "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred";
}
-specialize qw/vpx_sad128x128 /, "$sse2_x86inc";
-specialize qw/vpx_sad128x64 /, "$sse2_x86inc";
-specialize qw/vpx_sad64x128 /, "$sse2_x86inc";
-specialize qw/vpx_sad64x64 avx2 neon msa/, "$sse2_x86inc";
-specialize qw/vpx_sad64x32 avx2 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad32x64 avx2 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad32x32 avx2 neon msa/, "$sse2_x86inc";
-specialize qw/vpx_sad32x16 avx2 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad16x32 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad16x16 media neon msa/, "$sse2_x86inc";
-specialize qw/vpx_sad16x8 neon msa/, "$sse2_x86inc";
-specialize qw/vpx_sad8x16 neon msa/, "$sse2_x86inc";
-specialize qw/vpx_sad8x8 neon msa/, "$sse2_x86inc";
-specialize qw/vpx_sad8x4 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad4x8 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad4x4 neon msa/, "$sse2_x86inc";
+specialize qw/vpx_sad128x128 sse2/;
+specialize qw/vpx_sad128x64 sse2/;
+specialize qw/vpx_sad64x128 sse2/;
+specialize qw/vpx_sad64x64 avx2 msa sse2/;
+specialize qw/vpx_sad64x32 avx2 msa sse2/;
+specialize qw/vpx_sad32x64 avx2 msa sse2/;
+specialize qw/vpx_sad32x32 avx2 neon msa sse2/;
+specialize qw/vpx_sad32x16 avx2 msa sse2/;
+specialize qw/vpx_sad16x32 msa sse2/;
+specialize qw/vpx_sad16x16 media neon msa sse2/;
+specialize qw/vpx_sad16x8 neon msa sse2/;
+specialize qw/vpx_sad8x16 neon msa sse2/;
+specialize qw/vpx_sad8x8 neon msa sse2/;
+specialize qw/vpx_sad8x4 msa sse2/;
+specialize qw/vpx_sad4x8 msa sse2/;
+specialize qw/vpx_sad4x4 neon msa sse2/;
-specialize qw/vpx_sad128x128_avg /, "$sse2_x86inc";
-specialize qw/vpx_sad128x64_avg /, "$sse2_x86inc";
-specialize qw/vpx_sad64x128_avg /, "$sse2_x86inc";
-specialize qw/vpx_sad64x64_avg avx2 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad64x32_avg avx2 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad32x64_avg avx2 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad32x32_avg avx2 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad32x16_avg avx2 msa/, "$sse2_x86inc";
-specialize qw/vpx_sad16x32_avg msa/, "$sse2_x86inc";
-specialize qw/vpx_sad16x16_avg msa/, "$sse2_x86inc";
-specialize qw/vpx_sad16x8_avg msa/, "$sse2_x86inc";
-specialize qw/vpx_sad8x16_avg msa/, "$sse2_x86inc";
-specialize qw/vpx_sad8x8_avg msa/, "$sse2_x86inc";
-specialize qw/vpx_sad8x4_avg msa/, "$sse2_x86inc";
-specialize qw/vpx_sad4x8_avg msa/, "$sse2_x86inc";
-specialize qw/vpx_sad4x4_avg msa/, "$sse2_x86inc";
+specialize qw/vpx_sad128x128_avg sse2/;
+specialize qw/vpx_sad128x64_avg sse2/;
+specialize qw/vpx_sad64x128_avg sse2/;
+specialize qw/vpx_sad64x64_avg avx2 msa sse2/;
+specialize qw/vpx_sad64x32_avg avx2 msa sse2/;
+specialize qw/vpx_sad32x64_avg avx2 msa sse2/;
+specialize qw/vpx_sad32x32_avg avx2 msa sse2/;
+specialize qw/vpx_sad32x16_avg avx2 msa sse2/;
+specialize qw/vpx_sad16x32_avg msa sse2/;
+specialize qw/vpx_sad16x16_avg msa sse2/;
+specialize qw/vpx_sad16x8_avg msa sse2/;
+specialize qw/vpx_sad8x16_avg msa sse2/;
+specialize qw/vpx_sad8x8_avg msa sse2/;
+specialize qw/vpx_sad8x4_avg msa sse2/;
+specialize qw/vpx_sad4x8_avg msa sse2/;
+specialize qw/vpx_sad4x4_avg msa sse2/;
if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
foreach (@block_sizes) {
@@ -1129,8 +1115,8 @@
add_proto qw/unsigned int/, "vpx_highbd_sad${w}x${h}", "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride";
add_proto qw/unsigned int/, "vpx_highbd_sad${w}x${h}_avg", "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred";
if ($w != 128 && $h != 128 && $w != 4) {
- specialize "vpx_highbd_sad${w}x${h}", "$sse2_x86inc";
- specialize "vpx_highbd_sad${w}x${h}_avg", "$sse2_x86inc";
+ specialize "vpx_highbd_sad${w}x${h}", qw/sse2/;
+ specialize "vpx_highbd_sad${w}x${h}_avg", qw/sse2/;
}
}
}
@@ -1235,22 +1221,22 @@
add_proto qw/void/, "vpx_sad${w}x${h}x4d", "const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array";
}
-specialize qw/vpx_sad128x128x4d /, "$sse2_x86inc";
-specialize qw/vpx_sad128x64x4d /, "$sse2_x86inc";
-specialize qw/vpx_sad64x128x4d /, "$sse2_x86inc";
-specialize qw/vpx_sad64x64x4d avx2 neon msa/, "$sse2_x86inc";
-specialize qw/vpx_sad64x32x4d msa/, "$sse2_x86inc";
-specialize qw/vpx_sad32x64x4d msa/, "$sse2_x86inc";
-specialize qw/vpx_sad32x32x4d avx2 neon msa/, "$sse2_x86inc";
-specialize qw/vpx_sad32x16x4d msa/, "$sse2_x86inc";
-specialize qw/vpx_sad16x32x4d msa/, "$sse2_x86inc";
-specialize qw/vpx_sad16x16x4d neon msa/, "$sse2_x86inc";
-specialize qw/vpx_sad16x8x4d msa/, "$sse2_x86inc";
-specialize qw/vpx_sad8x16x4d msa/, "$sse2_x86inc";
-specialize qw/vpx_sad8x8x4d msa/, "$sse2_x86inc";
-specialize qw/vpx_sad8x4x4d msa/, "$sse2_x86inc";
-specialize qw/vpx_sad4x8x4d msa/, "$sse2_x86inc";
-specialize qw/vpx_sad4x4x4d msa/, "$sse2_x86inc";
+specialize qw/vpx_sad128x128x4d sse2/;
+specialize qw/vpx_sad128x64x4d sse2/;
+specialize qw/vpx_sad64x128x4d sse2/;
+specialize qw/vpx_sad64x64x4d avx2 neon msa sse2/;
+specialize qw/vpx_sad64x32x4d msa sse2/;
+specialize qw/vpx_sad32x64x4d msa sse2/;
+specialize qw/vpx_sad32x32x4d avx2 neon msa sse2/;
+specialize qw/vpx_sad32x16x4d msa sse2/;
+specialize qw/vpx_sad16x32x4d msa sse2/;
+specialize qw/vpx_sad16x16x4d neon msa sse2/;
+specialize qw/vpx_sad16x8x4d msa sse2/;
+specialize qw/vpx_sad8x16x4d msa sse2/;
+specialize qw/vpx_sad8x8x4d msa sse2/;
+specialize qw/vpx_sad8x4x4d msa sse2/;
+specialize qw/vpx_sad4x8x4d msa sse2/;
+specialize qw/vpx_sad4x4x4d msa sse2/;
if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
#
@@ -1260,7 +1246,7 @@
($w, $h) = @$_;
add_proto qw/void/, "vpx_highbd_sad${w}x${h}x4d", "const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array";
if ($w != 128 && $h != 128) {
- specialize "vpx_highbd_sad${w}x${h}x4d", "$sse2_x86inc";
+ specialize "vpx_highbd_sad${w}x${h}x4d", qw/sse2/;
}
}
}
@@ -1409,34 +1395,33 @@
specialize qw/vpx_variance4x8 sse2 msa/;
specialize qw/vpx_variance4x4 sse2 msa/;
-specialize qw/vpx_sub_pixel_variance64x64 avx2 neon msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance64x32 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance32x64 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance32x32 avx2 neon msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance32x16 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance16x32 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance16x16 media neon msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance16x8 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance8x16 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance8x8 media neon msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance8x4 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance4x8 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_variance4x4 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+specialize qw/vpx_sub_pixel_variance64x64 avx2 neon msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance64x32 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance32x64 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance32x32 avx2 neon msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance32x16 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance16x32 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance16x16 media neon msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance16x8 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance8x16 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance8x8 media neon msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance8x4 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance4x8 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_variance4x4 msa sse2 ssse3/;
-specialize qw/vpx_sub_pixel_avg_variance64x64 avx2 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance64x32 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance32x64 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance32x32 avx2 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance32x16 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance16x32 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance16x16 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance16x8 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance8x16 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance8x8 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance8x4 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance4x8 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-specialize qw/vpx_sub_pixel_avg_variance4x4 msa/, "$sse2_x86inc", "$ssse3_x86inc";
-
+specialize qw/vpx_sub_pixel_avg_variance64x64 avx2 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance64x32 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance32x64 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance32x32 avx2 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance32x16 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance16x32 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance16x16 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance16x8 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance8x16 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance8x8 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance8x4 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance4x8 msa sse2 ssse3/;
+specialize qw/vpx_sub_pixel_avg_variance4x4 msa sse2 ssse3/;
if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") {
foreach $bd (8, 10, 12) {
foreach (@block_sizes) {
@@ -1451,8 +1436,8 @@
specialize "vpx_highbd_${bd}_variance${w}x${h}", "sse4_1";
}
if ($w != 128 && $h != 128 && $w != 4) {
- specialize "vpx_highbd_${bd}_sub_pixel_variance${w}x${h}", $sse2_x86inc;
- specialize "vpx_highbd_${bd}_sub_pixel_avg_variance${w}x${h}", $sse2_x86inc;
+ specialize "vpx_highbd_${bd}_sub_pixel_variance${w}x${h}", qw/sse2/;
+ specialize "vpx_highbd_${bd}_sub_pixel_avg_variance${w}x${h}", qw/sse2/;
}
if ($w == 4 && $h == 4) {
specialize "vpx_highbd_${bd}_sub_pixel_variance${w}x${h}", "sse4_1";
@@ -1513,43 +1498,43 @@
}
add_proto qw/uint32_t vpx_sub_pixel_avg_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance64x64 avx2 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance64x64 avx2 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance64x32 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance64x32 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance32x64 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance32x64 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance32x32 avx2 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance32x32 avx2 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance32x16 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance32x16 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance16x32 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance16x32 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance16x16 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance16x16 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance16x8 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance16x8 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance8x16 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance8x16 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance8x8 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance8x8 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance8x4 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance8x4 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance4x8 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance4x8 msa sse2 ssse3/;
add_proto qw/uint32_t vpx_sub_pixel_avg_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_sub_pixel_avg_variance4x4 msa/, "$sse2_x86inc", "$ssse3_x86inc";
+ specialize qw/vpx_sub_pixel_avg_variance4x4 msa sse2 ssse3/;
#
# Specialty Subpixel
@@ -1709,217 +1694,217 @@
# Subpixel Variance
#
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance64x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance64x64 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance64x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance64x32 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance32x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance32x64 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance32x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance32x32 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance32x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance32x16 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance16x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance16x32 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance16x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance16x16 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance16x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance16x8 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance8x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance8x16 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance8x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance8x8 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_12_sub_pixel_variance8x4/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_variance8x4 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance64x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance64x64 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance64x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance64x32 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance32x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance32x64 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance32x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance32x32 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance32x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance32x16 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance16x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance16x32 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance16x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance16x16 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance16x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance16x8 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance8x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance8x16 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance8x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance8x8 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_10_sub_pixel_variance8x4/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_variance8x4 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance64x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance64x64 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance64x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance64x32 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance32x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance32x64 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance32x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance32x32 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance32x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance32x16 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance16x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance16x32 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance16x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance16x16 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance16x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance16x8 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance8x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance8x16 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance8x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance8x8 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
- specialize qw/vpx_highbd_8_sub_pixel_variance8x4/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_variance8x4 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse";
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance64x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance64x64 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance64x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance64x32 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x64 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x32 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x16 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x32 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x16 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x8 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x16 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x8 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x4/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x4 sse2/;
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance64x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance64x64 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance64x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance64x32 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x64 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x32 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x16 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x32 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x16 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x8 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x16 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x8 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x4/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x4 sse2/;
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance64x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance64x64 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance64x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance64x32 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x64/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x64 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x32 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x16 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x32/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x32 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x16 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x8 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x16/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x16 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x8/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x8 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
- specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x4/, "$sse2_x86inc";
+ specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x4 sse2/;
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred";
@@ -1932,6 +1917,18 @@
if (vpx_config("CONFIG_POSTPROC") eq "yes" || vpx_config("CONFIG_VP9_POSTPROC") eq "yes") {
add_proto qw/void vpx_plane_add_noise/, "uint8_t *Start, char *noise, char blackclamp[16], char whiteclamp[16], char bothclamp[16], unsigned int Width, unsigned int Height, int Pitch";
specialize qw/vpx_plane_add_noise sse2 msa/;
+
+ add_proto qw/void vpx_mbpost_proc_down/, "unsigned char *dst, int pitch, int rows, int cols, int flimit";
+ specialize qw/vpx_mbpost_proc_down sse2 msa/;
+ $vpx_mbpost_proc_down_sse2=vpx_mbpost_proc_down_xmm;
+
+ add_proto qw/void vpx_mbpost_proc_across_ip/, "unsigned char *dst, int pitch, int rows, int cols, int flimit";
+ specialize qw/vpx_mbpost_proc_across_ip sse2 msa/;
+ $vpx_mbpost_proc_across_ip_sse2=vpx_mbpost_proc_across_ip_xmm;
+
+ add_proto qw/void vpx_post_proc_down_and_across_mb_row/, "unsigned char *src, unsigned char *dst, int src_pitch, int dst_pitch, int cols, unsigned char *flimits, int size";
+ specialize qw/vpx_post_proc_down_and_across_mb_row sse2 msa/;
+
}
} # CONFIG_ENCODERS || CONFIG_POSTPROC || CONFIG_VP9_POSTPROC
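Context for the specialize edits above: these definitions feed libvpx's run-time CPU dispatch (RTCD) generator. A hedged C sketch of the dispatch shape it produces follows; the HAS_SSE2 flag value and the setup function name are illustrative, not the generated header verbatim:

    #include <stdint.h>

    #define HAS_SSE2 0x01 /* illustrative flag bit, not the real value */

    /* Extern kernels; "specialize qw/vpx_sad8x8 ... sse2/" implies both exist. */
    unsigned int vpx_sad8x8_c(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride);
    unsigned int vpx_sad8x8_sse2(const uint8_t *src, int src_stride,
                                 const uint8_t *ref, int ref_stride);

    /* One pointer per kernel, defaulted to C and upgraded at init time. */
    unsigned int (*vpx_sad8x8)(const uint8_t *, int, const uint8_t *, int) =
        vpx_sad8x8_c;

    static void setup_rtcd_sketch(int cpu_flags) {
      vpx_sad8x8 = vpx_sad8x8_c;
      if (cpu_flags & HAS_SSE2) vpx_sad8x8 = vpx_sad8x8_sse2;
    }

The $vpx_mbpost_proc_down_sse2=vpx_mbpost_proc_down_xmm style assignments above serve the same table: they point a kernel's sse2 slot at an assembly symbol with a different suffix, so the renamed _xmm routines need no further source changes.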
diff --git a/vp8/common/x86/postproc_sse2.asm b/vpx_dsp/x86/deblock_sse2.asm
similarity index 94%
rename from vp8/common/x86/postproc_sse2.asm
rename to vpx_dsp/x86/deblock_sse2.asm
index de17afa..6df360d 100644
--- a/vp8/common/x86/postproc_sse2.asm
+++ b/vpx_dsp/x86/deblock_sse2.asm
@@ -83,7 +83,7 @@
add rbx, 16
%endmacro
-;void vp8_post_proc_down_and_across_mb_row_sse2
+;void vpx_post_proc_down_and_across_mb_row_sse2
;(
; unsigned char *src_ptr,
; unsigned char *dst_ptr,
@@ -93,8 +93,8 @@
; int *flimits,
; int size
;)
-global sym(vp8_post_proc_down_and_across_mb_row_sse2) PRIVATE
-sym(vp8_post_proc_down_and_across_mb_row_sse2):
+global sym(vpx_post_proc_down_and_across_mb_row_sse2) PRIVATE
+sym(vpx_post_proc_down_and_across_mb_row_sse2):
push rbp
mov rbp, rsp
SHADOW_ARGS_TO_STACK 7
@@ -198,7 +198,7 @@
UPDATE_FLIMIT
jmp .acrossnextcol
-.acrossdone
+.acrossdone:
; last 16 pixels
movq QWORD PTR [rdi+rdx-16], mm0
@@ -230,11 +230,11 @@
ret
%undef flimit
-;void vp8_mbpost_proc_down_xmm(unsigned char *dst,
+;void vpx_mbpost_proc_down_xmm(unsigned char *dst,
; int pitch, int rows, int cols,int flimit)
-extern sym(vp8_rv)
-global sym(vp8_mbpost_proc_down_xmm) PRIVATE
-sym(vp8_mbpost_proc_down_xmm):
+extern sym(vpx_rv)
+global sym(vpx_mbpost_proc_down_xmm) PRIVATE
+sym(vpx_mbpost_proc_down_xmm):
push rbp
mov rbp, rsp
SHADOW_ARGS_TO_STACK 5
@@ -257,7 +257,7 @@
%define flimit4 [rsp+128]
%if ABI_IS_32BIT=0
- lea r8, [GLOBAL(sym(vp8_rv))]
+ lea r8, [GLOBAL(sym(vpx_rv))]
%endif
;rows +=8;
@@ -278,7 +278,7 @@
lea rdi, [rdi+rdx]
movq xmm1, QWORD ptr[rdi] ; first row
mov rcx, 8
-.init_borderd ; initialize borders
+.init_borderd: ; initialize borders
lea rdi, [rdi + rax]
movq [rdi], xmm1
@@ -291,7 +291,7 @@
mov rdi, rsi
movq xmm1, QWORD ptr[rdi] ; first row
mov rcx, 8
-.init_border ; initialize borders
+.init_border: ; initialize borders
lea rdi, [rdi + rax]
movq [rdi], xmm1
@@ -403,13 +403,13 @@
and rcx, 127
%if ABI_IS_32BIT=1 && CONFIG_PIC=1
push rax
- lea rax, [GLOBAL(sym(vp8_rv))]
- movdqu xmm4, [rax + rcx*2] ;vp8_rv[rcx*2]
+ lea rax, [GLOBAL(sym(vpx_rv))]
+ movdqu xmm4, [rax + rcx*2] ;vpx_rv[rcx*2]
pop rax
%elif ABI_IS_32BIT=0
- movdqu xmm4, [r8 + rcx*2] ;vp8_rv[rcx*2]
+ movdqu xmm4, [r8 + rcx*2] ;vpx_rv[rcx*2]
%else
- movdqu xmm4, [sym(vp8_rv) + rcx*2]
+ movdqu xmm4, [sym(vpx_rv) + rcx*2]
%endif
paddw xmm1, xmm4
@@ -434,7 +434,7 @@
movq mm0, [rsp + rcx*8] ;d[rcx*8]
movq [rsi], mm0
-.skip_assignment
+.skip_assignment:
lea rsi, [rsi+rax]
lea rdi, [rdi+rax]
@@ -462,10 +462,10 @@
%undef flimit4
-;void vp8_mbpost_proc_across_ip_xmm(unsigned char *src,
+;void vpx_mbpost_proc_across_ip_xmm(unsigned char *src,
; int pitch, int rows, int cols,int flimit)
-global sym(vp8_mbpost_proc_across_ip_xmm) PRIVATE
-sym(vp8_mbpost_proc_across_ip_xmm):
+global sym(vpx_mbpost_proc_across_ip_xmm) PRIVATE
+sym(vpx_mbpost_proc_across_ip_xmm):
push rbp
mov rbp, rsp
SHADOW_ARGS_TO_STACK 5
diff --git a/vpx_dsp/x86/highbd_variance_sse2.c b/vpx_dsp/x86/highbd_variance_sse2.c
index 7bfa383..3643915 100644
--- a/vpx_dsp/x86/highbd_variance_sse2.c
+++ b/vpx_dsp/x86/highbd_variance_sse2.c
@@ -147,24 +147,28 @@
const uint8_t *src8, int src_stride, \
const uint8_t *ref8, int ref_stride, uint32_t *sse) { \
int sum; \
+ int64_t var; \
uint16_t *src = CONVERT_TO_SHORTPTR(src8); \
uint16_t *ref = CONVERT_TO_SHORTPTR(ref8); \
highbd_10_variance_sse2( \
src, src_stride, ref, ref_stride, w, h, sse, &sum, \
vpx_highbd_calc##block_size##x##block_size##var_sse2, block_size); \
- return *sse - (((int64_t)sum * sum) >> shift); \
+ var = (int64_t)(*sse) - (((int64_t)sum * sum) >> shift); \
+ return (var >= 0) ? (uint32_t)var : 0; \
} \
\
uint32_t vpx_highbd_12_variance##w##x##h##_sse2( \
const uint8_t *src8, int src_stride, \
const uint8_t *ref8, int ref_stride, uint32_t *sse) { \
int sum; \
+ int64_t var; \
uint16_t *src = CONVERT_TO_SHORTPTR(src8); \
uint16_t *ref = CONVERT_TO_SHORTPTR(ref8); \
highbd_12_variance_sse2( \
src, src_stride, ref, ref_stride, w, h, sse, &sum, \
vpx_highbd_calc##block_size##x##block_size##var_sse2, block_size); \
- return *sse - (((int64_t)sum * sum) >> shift); \
+ var = (int64_t)(*sse) - (((int64_t)sum * sum) >> shift); \
+ return (var >= 0) ? (uint32_t)var : 0; \
}
VAR_FN(64, 64, 16, 12);
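A hedged C distillation of the clamp added in this hunk: *sse and sum arrive here already rounded, so the raw subtraction can undershoot zero even though an exact variance cannot; clamping keeps the unsigned return type well defined. The helper name is illustrative:

    #include <stdint.h>

    /* Distilled from the macro body above. */
    static uint32_t clamped_variance(uint32_t sse, int sum, int shift) {
      const int64_t var = (int64_t)sse - (((int64_t)sum * sum) >> shift);
      return (var >= 0) ? (uint32_t)var : 0;
    }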
@@ -246,7 +250,6 @@
return *sse;
}
-#if CONFIG_USE_X86INC
// The 2 unused parameters are place holders for PIC enabled build.
// These definitions are for functions defined in
// highbd_subpel_variance_impl_sse2.asm
@@ -593,7 +596,6 @@
#undef FNS
#undef FN
-#endif // CONFIG_USE_X86INC
void vpx_highbd_upsampled_pred_sse2(uint16_t *comp_pred,
int width, int height,
diff --git a/vpx_dsp/x86/intrapred_sse2.asm b/vpx_dsp/x86/intrapred_sse2.asm
index cd6a6ae..c18095c 100644
--- a/vpx_dsp/x86/intrapred_sse2.asm
+++ b/vpx_dsp/x86/intrapred_sse2.asm
@@ -756,7 +756,7 @@
psubw m0, m2 ; t1-tl t2-tl ... t8-tl [word]
movq m2, [leftq]
punpcklbw m2, m1 ; l1 l2 l3 l4 l5 l6 l7 l8 [word]
-.loop
+.loop:
pshuflw m4, m2, 0x0 ; [63:0] l1 l1 l1 l1 [word]
pshuflw m3, m2, 0x55 ; [63:0] l2 l2 l2 l2 [word]
punpcklqdq m4, m4 ; l1 l1 l1 l1 l1 l1 l1 l1 [word]
diff --git a/vpx_dsp/x86/loopfilter_sse2.c b/vpx_dsp/x86/loopfilter_sse2.c
index 39a6ae3..739adf3 100644
--- a/vpx_dsp/x86/loopfilter_sse2.c
+++ b/vpx_dsp/x86/loopfilter_sse2.c
@@ -111,7 +111,6 @@
__m128i q1p1, q0p0, p3p2, p2p1, p1p0, q3q2, q2q1, q1q0, ps1ps0, qs1qs0;
__m128i mask, hev;
- p3p2 = _mm_loadl_epi64((__m128i *)(s - 3 * p));
p3p2 = _mm_unpacklo_epi64(_mm_loadl_epi64((__m128i *)(s - 3 * p)),
_mm_loadl_epi64((__m128i *)(s - 4 * p)));
q1p1 = _mm_unpacklo_epi64(_mm_loadl_epi64((__m128i *)(s - 2 * p)),
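The deleted line in this hunk was a dead load: p3p2 was assigned and then immediately overwritten by the _mm_unpacklo_epi64 expression, so the first load had no observable effect. A minimal C illustration of the removed shape, using a hypothetical helper:

    #include <emmintrin.h>
    #include <stdint.h>

    static __m128i load_two_rows(const uint8_t *row_a, const uint8_t *row_b) {
      __m128i v = _mm_loadl_epi64((const __m128i *)row_a); /* dead: never read */
      v = _mm_unpacklo_epi64(_mm_loadl_epi64((const __m128i *)row_a),
                             _mm_loadl_epi64((const __m128i *)row_b));
      return v;
    }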
diff --git a/vpx_dsp/x86/variance_sse2.c b/vpx_dsp/x86/variance_sse2.c
index c2b55a3..e76c1a2 100644
--- a/vpx_dsp/x86/variance_sse2.c
+++ b/vpx_dsp/x86/variance_sse2.c
@@ -308,7 +308,6 @@
return *sse;
}
-#if CONFIG_USE_X86INC
// The 2 unused parameters are place holders for PIC enabled build.
// These definitions are for functions defined in subpel_variance.asm
#define DECL(w, opt) \
@@ -474,7 +473,6 @@
#undef FNS
#undef FN
-#endif // CONFIG_USE_X86INC
void vpx_upsampled_pred_sse2(uint8_t *comp_pred,
int width, int height,
diff --git a/vpx_dsp/x86/vpx_convolve_copy_sse2.asm b/vpx_dsp/x86/vpx_convolve_copy_sse2.asm
index 6d43fc1..964ee14 100644
--- a/vpx_dsp/x86/vpx_convolve_copy_sse2.asm
+++ b/vpx_dsp/x86/vpx_convolve_copy_sse2.asm
@@ -201,7 +201,7 @@
%endif
%endif ; CONFIG_VP10 && CONFIG_EXT_PARTITION
-.w64
+.w64:
mov r4d, dword hm
.loop64:
movu m0, [srcq]
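Before the next file: the SSSE3 subpixel rewrite below reshuffles register use and processes rows in pairs, but the per-pixel arithmetic is unchanged. A hedged C reference of that arithmetic, an 8-tap filter rounded by the pw_64/krd constant and shifted by 7; the asm's saturating 16-bit intermediate adds are not modeled here:

    #include <stdint.h>

    /* One output pixel of a filter_block1d*_h8-style horizontal filter.
     * src points at the pixel being filtered; taps holds 8 coefficients. */
    static uint8_t subpel_filter_pixel(const uint8_t *src,
                                       const int16_t *taps) {
      int sum = 64; /* krd = pw_64 rounding constant */
      for (int t = 0; t < 8; ++t) sum += src[t - 3] * taps[t];
      sum >>= 7;
      if (sum < 0) sum = 0;
      if (sum > 255) sum = 255;
      return (uint8_t)sum;
    }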
diff --git a/vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm b/vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm
index d2cb8ea..c1a6f23 100644
--- a/vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm
+++ b/vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm
@@ -23,11 +23,7 @@
; z = signed SAT(x + y)
SECTION .text
-%if ARCH_X86_64
- %define LOCAL_VARS_SIZE 16*4
-%else
- %define LOCAL_VARS_SIZE 16*6
-%endif
+%define LOCAL_VARS_SIZE 16*6
%macro SETUP_LOCAL_VARS 0
; TODO(slavarnway): using xmm registers for these on ARCH_X86_64 +
@@ -54,11 +50,11 @@
mova k6k7, m3
%if ARCH_X86_64
%define krd m12
- %define tmp m13
+ %define tmp0 [rsp + 16*4]
+ %define tmp1 [rsp + 16*5]
mova krd, [GLOBAL(pw_64)]
%else
- %define tmp [rsp + 16*4]
- %define krd [rsp + 16*5]
+ %define krd [rsp + 16*4]
%if CONFIG_PIC=0
mova m6, [GLOBAL(pw_64)]
%else
@@ -71,50 +67,31 @@
%endif
%endm
-%macro HORIZx4_ROW 2
- mova %2, %1
- punpcklbw %1, %1
- punpckhbw %2, %2
-
- mova m3, %2
- palignr %2, %1, 1
- palignr m3, %1, 5
-
- pmaddubsw %2, k0k1k4k5
- pmaddubsw m3, k2k3k6k7
- mova m4, %2 ;k0k1
- mova m5, m3 ;k2k3
- psrldq %2, 8 ;k4k5
- psrldq m3, 8 ;k6k7
- paddsw %2, m4
- paddsw m5, m3
- paddsw %2, m5
- paddsw %2, krd
- psraw %2, 7
- packuswb %2, %2
-%endm
-
;-------------------------------------------------------------------------------
+%if ARCH_X86_64
+ %define LOCAL_VARS_SIZE_H4 0
+%else
+ %define LOCAL_VARS_SIZE_H4 16*4
+%endif
+
%macro SUBPIX_HFILTER4 1
-cglobal filter_block1d4_%1, 6, 6+(ARCH_X86_64*2), 11, LOCAL_VARS_SIZE, \
+cglobal filter_block1d4_%1, 6, 6, 11, LOCAL_VARS_SIZE_H4, \
src, sstride, dst, dstride, height, filter
mova m4, [filterq]
packsswb m4, m4
%if ARCH_X86_64
- %define k0k1k4k5 m8
- %define k2k3k6k7 m9
- %define krd m10
- %define orig_height r7d
+ %define k0k1k4k5 m8
+ %define k2k3k6k7 m9
+ %define krd m10
mova krd, [GLOBAL(pw_64)]
pshuflw k0k1k4k5, m4, 0b ;k0_k1
pshufhw k0k1k4k5, k0k1k4k5, 10101010b ;k0_k1_k4_k5
pshuflw k2k3k6k7, m4, 01010101b ;k2_k3
pshufhw k2k3k6k7, k2k3k6k7, 11111111b ;k2_k3_k6_k7
%else
- %define k0k1k4k5 [rsp + 16*0]
- %define k2k3k6k7 [rsp + 16*1]
- %define krd [rsp + 16*2]
- %define orig_height [rsp + 16*3]
+ %define k0k1k4k5 [rsp + 16*0]
+ %define k2k3k6k7 [rsp + 16*1]
+ %define krd [rsp + 16*2]
pshuflw m6, m4, 0b ;k0_k1
pshufhw m6, m6, 10101010b ;k0_k1_k4_k5
pshuflw m7, m4, 01010101b ;k2_k3
@@ -131,61 +108,46 @@
mova k2k3k6k7, m7
mova krd, m1
%endif
- mov orig_height, heightd
- shr heightd, 1
+ dec heightd
+
.loop:
;Do two rows at once
- movh m0, [srcq - 3]
- movh m1, [srcq + 5]
- punpcklqdq m0, m1
- mova m1, m0
- movh m2, [srcq + sstrideq - 3]
- movh m3, [srcq + sstrideq + 5]
- punpcklqdq m2, m3
- mova m3, m2
- punpcklbw m0, m0
- punpckhbw m1, m1
- punpcklbw m2, m2
- punpckhbw m3, m3
- mova m4, m1
- palignr m4, m0, 1
- pmaddubsw m4, k0k1k4k5
- palignr m1, m0, 5
+ movu m4, [srcq - 3]
+ movu m5, [srcq + sstrideq - 3]
+ punpckhbw m1, m4, m4
+ punpcklbw m4, m4
+ punpckhbw m3, m5, m5
+ punpcklbw m5, m5
+ palignr m0, m1, m4, 1
+ pmaddubsw m0, k0k1k4k5
+ palignr m1, m4, 5
pmaddubsw m1, k2k3k6k7
- mova m7, m3
- palignr m7, m2, 1
- pmaddubsw m7, k0k1k4k5
- palignr m3, m2, 5
+ palignr m2, m3, m5, 1
+ pmaddubsw m2, k0k1k4k5
+ palignr m3, m5, 5
pmaddubsw m3, k2k3k6k7
- mova m0, m4 ;k0k1
- mova m5, m1 ;k2k3
- mova m2, m7 ;k0k1 upper
- psrldq m4, 8 ;k4k5
- psrldq m1, 8 ;k6k7
- paddsw m4, m0
- paddsw m5, m1
- mova m1, m3 ;k2k3 upper
- psrldq m7, 8 ;k4k5 upper
- psrldq m3, 8 ;k6k7 upper
- paddsw m7, m2
- paddsw m4, m5
- paddsw m1, m3
- paddsw m7, m1
- paddsw m4, krd
- psraw m4, 7
- packuswb m4, m4
- paddsw m7, krd
- psraw m7, 7
- packuswb m7, m7
+ punpckhqdq m4, m0, m2
+ punpcklqdq m0, m2
+ punpckhqdq m5, m1, m3
+ punpcklqdq m1, m3
+ paddsw m0, m4
+ paddsw m1, m5
+%ifidn %1, h8_avg
+ movd m4, [dstq]
+ movd m5, [dstq + dstrideq]
+%endif
+ paddsw m0, m1
+ paddsw m0, krd
+ psraw m0, 7
+ packuswb m0, m0
+ psrldq m1, m0, 4
%ifidn %1, h8_avg
- movd m0, [dstq]
- pavgb m4, m0
- movd m2, [dstq + dstrideq]
- pavgb m7, m2
+ pavgb m0, m4
+ pavgb m1, m5
%endif
- movd [dstq], m4
- movd [dstq + dstrideq], m7
+ movd [dstq], m0
+ movd [dstq + dstrideq], m1
lea srcq, [srcq + sstrideq ]
prefetcht0 [srcq + 4 * sstrideq - 3]
@@ -193,205 +155,156 @@
lea dstq, [dstq + 2 * dstrideq ]
prefetcht0 [srcq + 2 * sstrideq - 3]
- dec heightd
- jnz .loop
+ sub heightd, 2
+ jg .loop
; Do last row if output_height is odd
- mov heightd, orig_height
- and heightd, 1
- je .done
+ jne .done
- movh m0, [srcq - 3] ; load src
- movh m1, [srcq + 5]
- punpcklqdq m0, m1
-
- HORIZx4_ROW m0, m1
+ movu m4, [srcq - 3]
+ punpckhbw m1, m4, m4
+ punpcklbw m4, m4
+ palignr m0, m1, m4, 1
+ palignr m1, m4, 5
+ pmaddubsw m0, k0k1k4k5
+ pmaddubsw m1, k2k3k6k7
+ psrldq m2, m0, 8
+ psrldq m3, m1, 8
+ paddsw m0, m2
+ paddsw m1, m3
+ paddsw m0, m1
+ paddsw m0, krd
+ psraw m0, 7
+ packuswb m0, m0
%ifidn %1, h8_avg
- movd m0, [dstq]
- pavgb m1, m0
+ movd m4, [dstq]
+ pavgb m0, m4
%endif
- movd [dstq], m1
-.done
- RET
-%endm
-
-%macro HORIZx8_ROW 5
- mova %2, %1
- punpcklbw %1, %1
- punpckhbw %2, %2
-
- mova %3, %2
- mova %4, %2
- mova %5, %2
-
- palignr %2, %1, 1
- palignr %3, %1, 5
- palignr %4, %1, 9
- palignr %5, %1, 13
-
- pmaddubsw %2, k0k1
- pmaddubsw %3, k2k3
- pmaddubsw %4, k4k5
- pmaddubsw %5, k6k7
- paddsw %2, %4
- paddsw %5, %3
- paddsw %2, %5
- paddsw %2, krd
- psraw %2, 7
- packuswb %2, %2
- SWAP %1, %2
+ movd [dstq], m0
+.done:
+ REP_RET
%endm
;-------------------------------------------------------------------------------
%macro SUBPIX_HFILTER8 1
-cglobal filter_block1d8_%1, 6, 6+(ARCH_X86_64*1), 14, LOCAL_VARS_SIZE, \
+cglobal filter_block1d8_%1, 6, 6, 14, LOCAL_VARS_SIZE, \
src, sstride, dst, dstride, height, filter
mova m4, [filterq]
SETUP_LOCAL_VARS
-%if ARCH_X86_64
- %define orig_height r7d
-%else
- %define orig_height heightmp
-%endif
- mov orig_height, heightd
- shr heightd, 1
+ dec heightd
.loop:
- movh m0, [srcq - 3]
- movh m3, [srcq + 5]
- movh m4, [srcq + sstrideq - 3]
- movh m7, [srcq + sstrideq + 5]
- punpcklqdq m0, m3
- mova m1, m0
+ ;Do two rows at once
+ movu m0, [srcq - 3]
+ movu m4, [srcq + sstrideq - 3]
+ punpckhbw m1, m0, m0
punpcklbw m0, m0
- punpckhbw m1, m1
- mova m5, m1
- palignr m5, m0, 13
+ palignr m5, m1, m0, 13
pmaddubsw m5, k6k7
- mova m2, m1
- mova m3, m1
+ palignr m2, m1, m0, 5
+ palignr m3, m1, m0, 9
palignr m1, m0, 1
pmaddubsw m1, k0k1
- punpcklqdq m4, m7
- mova m6, m4
+ punpckhbw m6, m4, m4
punpcklbw m4, m4
- palignr m2, m0, 5
- punpckhbw m6, m6
- palignr m3, m0, 9
- mova m7, m6
pmaddubsw m2, k2k3
pmaddubsw m3, k4k5
- palignr m7, m4, 13
- mova m0, m6
- palignr m0, m4, 5
+ palignr m7, m6, m4, 13
+ palignr m0, m6, m4, 5
pmaddubsw m7, k6k7
paddsw m1, m3
paddsw m2, m5
paddsw m1, m2
- mova m5, m6
+%ifidn %1, h8_avg
+ movh m2, [dstq]
+ movhps m2, [dstq + dstrideq]
+%endif
+ palignr m5, m6, m4, 9
palignr m6, m4, 1
pmaddubsw m0, k2k3
pmaddubsw m6, k0k1
- palignr m5, m4, 9
paddsw m1, krd
pmaddubsw m5, k4k5
psraw m1, 7
paddsw m0, m7
-%ifidn %1, h8_avg
- movh m7, [dstq]
- movh m2, [dstq + dstrideq]
-%endif
- packuswb m1, m1
paddsw m6, m5
paddsw m6, m0
paddsw m6, krd
psraw m6, 7
- packuswb m6, m6
+ packuswb m1, m6
%ifidn %1, h8_avg
- pavgb m1, m7
- pavgb m6, m2
+ pavgb m1, m2
%endif
- movh [dstq], m1
- movh [dstq + dstrideq], m6
+ movh [dstq], m1
+ movhps [dstq + dstrideq], m1
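+    ; packuswb m1, m6 left row 0 in the low qword and row 1 in the high
+    ; qword, so one register feeds the movh/movhps store pair (and a
+    ; single pavgb in the h8_avg variant).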
lea srcq, [srcq + sstrideq ]
prefetcht0 [srcq + 4 * sstrideq - 3]
lea srcq, [srcq + sstrideq ]
lea dstq, [dstq + 2 * dstrideq ]
prefetcht0 [srcq + 2 * sstrideq - 3]
- dec heightd
- jnz .loop
+ sub heightd, 2
+ jg .loop
- ;Do last row if output_height is odd
- mov heightd, orig_height
- and heightd, 1
- je .done
+ ; Do last row if output_height is odd
+ jne .done
- movh m0, [srcq - 3]
- movh m3, [srcq + 5]
- punpcklqdq m0, m3
-
- HORIZx8_ROW m0, m1, m2, m3, m4
-
+ movu m0, [srcq - 3]
+ punpckhbw m3, m0, m0
+ punpcklbw m0, m0
+ palignr m1, m3, m0, 1
+ palignr m2, m3, m0, 5
+ palignr m4, m3, m0, 13
+ palignr m3, m0, 9
+ pmaddubsw m1, k0k1
+ pmaddubsw m2, k2k3
+ pmaddubsw m3, k4k5
+ pmaddubsw m4, k6k7
+ paddsw m1, m3
+ paddsw m4, m2
+ paddsw m1, m4
+ paddsw m1, krd
+ psraw m1, 7
+ packuswb m1, m1
%ifidn %1, h8_avg
- movh m1, [dstq]
- pavgb m0, m1
+ movh m0, [dstq]
+ pavgb m1, m0
%endif
- movh [dstq], m0
+ movh [dstq], m1
.done:
- RET
+ REP_RET
%endm
;-------------------------------------------------------------------------------
%macro SUBPIX_HFILTER16 1
-cglobal filter_block1d16_%1, 6, 6+(ARCH_X86_64*0), 14, LOCAL_VARS_SIZE, \
+cglobal filter_block1d16_%1, 6, 6, 14, LOCAL_VARS_SIZE, \
src, sstride, dst, dstride, height, filter
mova m4, [filterq]
SETUP_LOCAL_VARS
+
.loop:
prefetcht0 [srcq + 2 * sstrideq -3]
- movh m0, [srcq - 3]
- movh m4, [srcq + 5]
- movh m6, [srcq + 13]
- punpcklqdq m0, m4
- mova m7, m0
- punpckhbw m0, m0
- mova m1, m0
- punpcklqdq m4, m6
- mova m3, m0
- punpcklbw m7, m7
-
- palignr m3, m7, 13
- mova m2, m0
- pmaddubsw m3, k6k7
- palignr m0, m7, 1
+ movu m0, [srcq - 3]
+ movu m4, [srcq - 2]
pmaddubsw m0, k0k1
- palignr m1, m7, 5
- pmaddubsw m1, k2k3
- palignr m2, m7, 9
- pmaddubsw m2, k4k5
- paddsw m1, m3
- mova m3, m4
- punpckhbw m4, m4
- mova m5, m4
- punpcklbw m3, m3
- mova m7, m4
- palignr m5, m3, 5
- mova m6, m4
- palignr m4, m3, 1
pmaddubsw m4, k0k1
+ movu m1, [srcq - 1]
+ movu m5, [srcq + 0]
+ pmaddubsw m1, k2k3
pmaddubsw m5, k2k3
- palignr m6, m3, 9
+ movu m2, [srcq + 1]
+ movu m6, [srcq + 2]
+ pmaddubsw m2, k4k5
pmaddubsw m6, k4k5
- palignr m7, m3, 13
+ movu m3, [srcq + 3]
+ movu m7, [srcq + 4]
+ pmaddubsw m3, k6k7
pmaddubsw m7, k6k7
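+    ; pmaddubsw sums adjacent byte pairs, so the [srcq - 3] load produces
+    ; the even output pixels and the one-byte-shifted [srcq - 2] load the
+    ; odd ones; m0 and m4 accumulate the two phases and are re-interleaved
+    ; after packing.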
paddsw m0, m2
+ paddsw m1, m3
paddsw m0, m1
-%ifidn %1, h8_avg
- mova m1, [dstq]
-%endif
paddsw m4, m6
paddsw m5, m7
paddsw m4, m5
@@ -399,16 +312,18 @@
paddsw m4, krd
psraw m0, 7
psraw m4, 7
- packuswb m0, m4
+ packuswb m0, m0
+ packuswb m4, m4
+ punpcklbw m0, m4
%ifidn %1, h8_avg
- pavgb m0, m1
+ pavgb m0, [dstq]
%endif
lea srcq, [srcq + sstrideq]
mova [dstq], m0
lea dstq, [dstq + dstrideq]
dec heightd
jnz .loop
- RET
+ REP_RET
%endm
INIT_XMM ssse3
@@ -420,204 +335,463 @@
SUBPIX_HFILTER4 h8_avg
;-------------------------------------------------------------------------------
+
+; TODO(Linfeng): Detect CPU type and choose the code path with better performance.
+%define X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON 1
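+; When set to 1, x86_64 builds also take the src1q/sstride6q loops below
+; (hence the 9 general registers); setting it to 0 switches x86_64 to the
+; wider xmm pipelines in the %else branches, which need only 6.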
+
+%if ARCH_X86_64 && X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON
+ %define NUM_GENERAL_REG_USED 9
+%else
+ %define NUM_GENERAL_REG_USED 6
+%endif
+
%macro SUBPIX_VFILTER 2
-cglobal filter_block1d%2_%1, 6, 6+(ARCH_X86_64*3), 14, LOCAL_VARS_SIZE, \
+cglobal filter_block1d%2_%1, 6, NUM_GENERAL_REG_USED, 15, LOCAL_VARS_SIZE, \
src, sstride, dst, dstride, height, filter
mova m4, [filterq]
SETUP_LOCAL_VARS
-%if ARCH_X86_64
- %define src1q r7
- %define sstride6q r8
- %define dst_stride dstrideq
-%else
- %define src1q filterq
- %define sstride6q dstrideq
- %define dst_stride dstridemp
-%endif
- mov src1q, srcq
- add src1q, sstrideq
- lea sstride6q, [sstrideq + sstrideq * 4]
- add sstride6q, sstrideq ;pitch * 6
%ifidn %2, 8
- %define movx movh
+ %define movx movh
%else
- %define movx movd
+ %define movx movd
%endif
+
+ dec heightd
+
+%if ARCH_X86 || X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON
+
+%if ARCH_X86_64
+ %define src1q r7
+ %define sstride6q r8
+ %define dst_stride dstrideq
+%else
+ %define src1q filterq
+ %define sstride6q dstrideq
+ %define dst_stride dstridemp
+%endif
+ mov src1q, srcq
+ add src1q, sstrideq
+ lea sstride6q, [sstrideq + sstrideq * 4]
+ add sstride6q, sstrideq ;pitch * 6
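+  ; the 8-tap window spans rows A..H: H is read at src1q + 6 * pitch,
+  ; i.e. 7 rows below A, so pitch * 6 is precomputed once per call.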
+
.loop:
- movx m0, [srcq ] ;A
- movx m1, [srcq + sstrideq ] ;B
- punpcklbw m0, m1 ;A B
- movx m2, [srcq + sstrideq * 2 ] ;C
- pmaddubsw m0, k0k1
- mova m6, m2
- movx m3, [src1q + sstrideq * 2] ;D
- punpcklbw m2, m3 ;C D
- pmaddubsw m2, k2k3
- movx m4, [srcq + sstrideq * 4 ] ;E
- mova m7, m4
- movx m5, [src1q + sstrideq * 4] ;F
- punpcklbw m4, m5 ;E F
- pmaddubsw m4, k4k5
- punpcklbw m1, m6 ;A B next iter
- movx m6, [srcq + sstride6q ] ;G
- punpcklbw m5, m6 ;E F next iter
- punpcklbw m3, m7 ;C D next iter
- pmaddubsw m5, k4k5
- movx m7, [src1q + sstride6q ] ;H
- punpcklbw m6, m7 ;G H
- pmaddubsw m6, k6k7
- pmaddubsw m3, k2k3
- pmaddubsw m1, k0k1
- paddsw m0, m4
- paddsw m2, m6
- movx m6, [srcq + sstrideq * 8 ] ;H next iter
- punpcklbw m7, m6
- pmaddubsw m7, k6k7
- paddsw m0, m2
- paddsw m0, krd
- psraw m0, 7
- paddsw m1, m5
- packuswb m0, m0
+ ;Do two rows at once
+ movx m0, [srcq ] ;A
+ movx m1, [src1q ] ;B
+ punpcklbw m0, m1 ;A B
+ movx m2, [srcq + sstrideq * 2 ] ;C
+ pmaddubsw m0, k0k1
+ mova m6, m2
+ movx m3, [src1q + sstrideq * 2] ;D
+ punpcklbw m2, m3 ;C D
+ pmaddubsw m2, k2k3
+ movx m4, [srcq + sstrideq * 4 ] ;E
+ mova m7, m4
+ movx m5, [src1q + sstrideq * 4] ;F
+ punpcklbw m4, m5 ;E F
+ pmaddubsw m4, k4k5
+ punpcklbw m1, m6 ;A B next iter
+ movx m6, [srcq + sstride6q ] ;G
+ punpcklbw m5, m6 ;E F next iter
+ punpcklbw m3, m7 ;C D next iter
+ pmaddubsw m5, k4k5
+ movx m7, [src1q + sstride6q ] ;H
+ punpcklbw m6, m7 ;G H
+ pmaddubsw m6, k6k7
+ pmaddubsw m3, k2k3
+ pmaddubsw m1, k0k1
+ paddsw m0, m4
+ paddsw m2, m6
+ movx m6, [srcq + sstrideq * 8 ] ;H next iter
+ punpcklbw m7, m6
+ pmaddubsw m7, k6k7
+ paddsw m0, m2
+ paddsw m0, krd
+ psraw m0, 7
+ paddsw m1, m5
+ packuswb m0, m0
- paddsw m3, m7
- paddsw m1, m3
- paddsw m1, krd
- psraw m1, 7
- lea srcq, [srcq + sstrideq * 2 ]
- lea src1q, [src1q + sstrideq * 2]
- packuswb m1, m1
+ paddsw m3, m7
+ paddsw m1, m3
+ paddsw m1, krd
+ psraw m1, 7
+ lea srcq, [srcq + sstrideq * 2 ]
+ lea src1q, [src1q + sstrideq * 2]
+ packuswb m1, m1
%ifidn %1, v8_avg
- movx m2, [dstq]
- pavgb m0, m2
+ movx m2, [dstq]
+ pavgb m0, m2
%endif
- movx [dstq], m0
- add dstq, dst_stride
+ movx [dstq], m0
+ add dstq, dst_stride
%ifidn %1, v8_avg
- movx m3, [dstq]
- pavgb m1, m3
+ movx m3, [dstq]
+ pavgb m1, m3
%endif
- movx [dstq], m1
- add dstq, dst_stride
- sub heightd, 2
- cmp heightd, 1
- jg .loop
+ movx [dstq], m1
+ add dstq, dst_stride
+ sub heightd, 2
+ jg .loop
- cmp heightd, 0
- je .done
+ ; Do last row if output_height is odd
+ jne .done
- movx m0, [srcq ] ;A
- movx m1, [srcq + sstrideq ] ;B
- movx m6, [srcq + sstride6q ] ;G
- punpcklbw m0, m1 ;A B
- movx m7, [src1q + sstride6q ] ;H
- pmaddubsw m0, k0k1
- movx m2, [srcq + sstrideq * 2 ] ;C
- punpcklbw m6, m7 ;G H
- movx m3, [src1q + sstrideq * 2] ;D
- pmaddubsw m6, k6k7
- movx m4, [srcq + sstrideq * 4 ] ;E
- punpcklbw m2, m3 ;C D
- movx m5, [src1q + sstrideq * 4] ;F
- punpcklbw m4, m5 ;E F
- pmaddubsw m2, k2k3
- pmaddubsw m4, k4k5
- paddsw m2, m6
- paddsw m0, m4
- paddsw m0, m2
- paddsw m0, krd
- psraw m0, 7
- packuswb m0, m0
+ movx m0, [srcq ] ;A
+ movx m1, [srcq + sstrideq ] ;B
+ movx m6, [srcq + sstride6q ] ;G
+ punpcklbw m0, m1 ;A B
+ movx m7, [src1q + sstride6q ] ;H
+ pmaddubsw m0, k0k1
+ movx m2, [srcq + sstrideq * 2 ] ;C
+ punpcklbw m6, m7 ;G H
+ movx m3, [src1q + sstrideq * 2] ;D
+ pmaddubsw m6, k6k7
+ movx m4, [srcq + sstrideq * 4 ] ;E
+ punpcklbw m2, m3 ;C D
+ movx m5, [src1q + sstrideq * 4] ;F
+ punpcklbw m4, m5 ;E F
+ pmaddubsw m2, k2k3
+ pmaddubsw m4, k4k5
+ paddsw m2, m6
+ paddsw m0, m4
+ paddsw m0, m2
+ paddsw m0, krd
+ psraw m0, 7
+ packuswb m0, m0
%ifidn %1, v8_avg
- movx m1, [dstq]
- pavgb m0, m1
+ movx m1, [dstq]
+ pavgb m0, m1
%endif
- movx [dstq], m0
+ movx [dstq], m0
+
+%else
+ ; ARCH_X86_64
+
+ movx m0, [srcq ] ;A
+ movx m1, [srcq + sstrideq ] ;B
+ lea srcq, [srcq + sstrideq * 2 ]
+ movx m2, [srcq] ;C
+ movx m3, [srcq + sstrideq] ;D
+ lea srcq, [srcq + sstrideq * 2 ]
+ movx m4, [srcq] ;E
+ movx m5, [srcq + sstrideq] ;F
+ lea srcq, [srcq + sstrideq * 2 ]
+ movx m6, [srcq] ;G
+ punpcklbw m0, m1 ;A B
+ punpcklbw m1, m2 ;A B next iter
+ punpcklbw m2, m3 ;C D
+ punpcklbw m3, m4 ;C D next iter
+ punpcklbw m4, m5 ;E F
+ punpcklbw m5, m6 ;E F next iter
+
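+  ; Software-pipelined: rows A..G and their row-pair interleavings were
+  ; preloaded above, so each iteration loads only the two new H rows,
+  ; rotates the register window, and emits two output rows.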
+.loop:
+ ;Do two rows at once
+ movx m7, [srcq + sstrideq] ;H
+ lea srcq, [srcq + sstrideq * 2 ]
+ movx m14, [srcq] ;H next iter
+ punpcklbw m6, m7 ;G H
+ punpcklbw m7, m14 ;G H next iter
+ pmaddubsw m8, m0, k0k1
+ pmaddubsw m9, m1, k0k1
+ mova m0, m2
+ mova m1, m3
+ pmaddubsw m10, m2, k2k3
+ pmaddubsw m11, m3, k2k3
+ mova m2, m4
+ mova m3, m5
+ pmaddubsw m4, k4k5
+ pmaddubsw m5, k4k5
+ paddsw m8, m4
+ paddsw m9, m5
+ mova m4, m6
+ mova m5, m7
+ pmaddubsw m6, k6k7
+ pmaddubsw m7, k6k7
+ paddsw m10, m6
+ paddsw m11, m7
+ paddsw m8, m10
+ paddsw m9, m11
+ mova m6, m14
+ paddsw m8, krd
+ paddsw m9, krd
+ psraw m8, 7
+ psraw m9, 7
+%ifidn %2, 4
+ packuswb m8, m8
+ packuswb m9, m9
+%else
+ packuswb m8, m9
+%endif
+
+%ifidn %1, v8_avg
+ movx m7, [dstq]
+%ifidn %2, 4
+ movx m10, [dstq + dstrideq]
+ pavgb m9, m10
+%else
+ movhpd m7, [dstq + dstrideq]
+%endif
+ pavgb m8, m7
+%endif
+ movx [dstq], m8
+%ifidn %2, 4
+ movx [dstq + dstrideq], m9
+%else
+ movhpd [dstq + dstrideq], m8
+%endif
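+  ; for %2 == 8 both rows were packed into m8 (low/high qword), so the
+  ; second row is stored with movhpd; %2 == 4 keeps the rows split across
+  ; m8/m9 and stores a dword from each.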
+
+ lea dstq, [dstq + dstrideq * 2 ]
+ sub heightd, 2
+ jg .loop
+
+ ; Do last row if output_height is odd
+ jne .done
+
+ movx m7, [srcq + sstrideq] ;H
+ punpcklbw m6, m7 ;G H
+ pmaddubsw m0, k0k1
+ pmaddubsw m2, k2k3
+ pmaddubsw m4, k4k5
+ pmaddubsw m6, k6k7
+ paddsw m0, m4
+ paddsw m2, m6
+ paddsw m0, m2
+ paddsw m0, krd
+ psraw m0, 7
+ packuswb m0, m0
+%ifidn %1, v8_avg
+ movx m1, [dstq]
+ pavgb m0, m1
+%endif
+ movx [dstq], m0
+
+%endif ; ARCH_X86_64
+
.done:
- RET
+ REP_RET
+
%endm
;-------------------------------------------------------------------------------
%macro SUBPIX_VFILTER16 1
-cglobal filter_block1d16_%1, 6, 6+(ARCH_X86_64*3), 14, LOCAL_VARS_SIZE, \
+cglobal filter_block1d16_%1, 6, NUM_GENERAL_REG_USED, 16, LOCAL_VARS_SIZE, \
src, sstride, dst, dstride, height, filter
- mova m4, [filterq]
+ mova m4, [filterq]
SETUP_LOCAL_VARS
+
+%if ARCH_X86 || X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON
+
%if ARCH_X86_64
- %define src1q r7
- %define sstride6q r8
- %define dst_stride dstrideq
+ %define src1q r7
+ %define sstride6q r8
+ %define dst_stride dstrideq
%else
- %define src1q filterq
- %define sstride6q dstrideq
- %define dst_stride dstridemp
+ %define src1q filterq
+ %define sstride6q dstrideq
+ %define dst_stride dstridemp
%endif
- mov src1q, srcq
- add src1q, sstrideq
- lea sstride6q, [sstrideq + sstrideq * 4]
- add sstride6q, sstrideq ;pitch * 6
+ lea src1q, [srcq + sstrideq]
+ lea sstride6q, [sstrideq + sstrideq * 4]
+ add sstride6q, sstrideq ;pitch * 6
.loop:
- movh m0, [srcq ] ;A
- movh m1, [srcq + sstrideq ] ;B
- movh m2, [srcq + sstrideq * 2 ] ;C
- movh m3, [src1q + sstrideq * 2] ;D
- movh m4, [srcq + sstrideq * 4 ] ;E
- movh m5, [src1q + sstrideq * 4] ;F
+ movh m0, [srcq ] ;A
+ movh m1, [src1q ] ;B
+ movh m2, [srcq + sstrideq * 2 ] ;C
+ movh m3, [src1q + sstrideq * 2] ;D
+ movh m4, [srcq + sstrideq * 4 ] ;E
+ movh m5, [src1q + sstrideq * 4] ;F
- punpcklbw m0, m1 ;A B
- movh m6, [srcq + sstride6q] ;G
- punpcklbw m2, m3 ;C D
- movh m7, [src1q + sstride6q] ;H
- punpcklbw m4, m5 ;E F
- pmaddubsw m0, k0k1
- movh m3, [srcq + 8] ;A
- pmaddubsw m2, k2k3
- punpcklbw m6, m7 ;G H
- movh m5, [srcq + sstrideq + 8] ;B
- pmaddubsw m4, k4k5
- punpcklbw m3, m5 ;A B
- movh m7, [srcq + sstrideq * 2 + 8] ;C
- pmaddubsw m6, k6k7
- movh m5, [src1q + sstrideq * 2 + 8] ;D
- punpcklbw m7, m5 ;C D
- paddsw m2, m6
- pmaddubsw m3, k0k1
- movh m1, [srcq + sstrideq * 4 + 8] ;E
- paddsw m0, m4
- pmaddubsw m7, k2k3
- movh m6, [src1q + sstrideq * 4 + 8] ;F
- punpcklbw m1, m6 ;E F
- paddsw m0, m2
- paddsw m0, krd
- movh m2, [srcq + sstride6q + 8] ;G
- pmaddubsw m1, k4k5
- movh m5, [src1q + sstride6q + 8] ;H
- psraw m0, 7
- punpcklbw m2, m5 ;G H
- pmaddubsw m2, k6k7
-%ifidn %1, v8_avg
- mova m4, [dstq]
-%endif
- movh [dstq], m0
- paddsw m7, m2
- paddsw m3, m1
- paddsw m3, m7
- paddsw m3, krd
- psraw m3, 7
- packuswb m0, m3
+ punpcklbw m0, m1 ;A B
+ movh m6, [srcq + sstride6q] ;G
+ punpcklbw m2, m3 ;C D
+ movh m7, [src1q + sstride6q] ;H
+ punpcklbw m4, m5 ;E F
+ pmaddubsw m0, k0k1
+ movh m3, [srcq + 8] ;A
+ pmaddubsw m2, k2k3
+ punpcklbw m6, m7 ;G H
+ movh m5, [srcq + sstrideq + 8] ;B
+ pmaddubsw m4, k4k5
+ punpcklbw m3, m5 ;A B
+ movh m7, [srcq + sstrideq * 2 + 8] ;C
+ pmaddubsw m6, k6k7
+ movh m5, [src1q + sstrideq * 2 + 8] ;D
+ punpcklbw m7, m5 ;C D
+ paddsw m2, m6
+ pmaddubsw m3, k0k1
+ movh m1, [srcq + sstrideq * 4 + 8] ;E
+ paddsw m0, m4
+ pmaddubsw m7, k2k3
+ movh m6, [src1q + sstrideq * 4 + 8] ;F
+ punpcklbw m1, m6 ;E F
+ paddsw m0, m2
+ paddsw m0, krd
+ movh m2, [srcq + sstride6q + 8] ;G
+ pmaddubsw m1, k4k5
+ movh m5, [src1q + sstride6q + 8] ;H
+ psraw m0, 7
+ punpcklbw m2, m5 ;G H
+ pmaddubsw m2, k6k7
+ paddsw m7, m2
+ paddsw m3, m1
+ paddsw m3, m7
+ paddsw m3, krd
+ psraw m3, 7
+ packuswb m0, m3
- add srcq, sstrideq
- add src1q, sstrideq
+ add srcq, sstrideq
+ add src1q, sstrideq
%ifidn %1, v8_avg
- pavgb m0, m4
+ pavgb m0, [dstq]
%endif
- mova [dstq], m0
- add dstq, dst_stride
- dec heightd
- jnz .loop
- RET
+ mova [dstq], m0
+ add dstq, dst_stride
+ dec heightd
+ jnz .loop
+ REP_RET
+
+%else
+ ; ARCH_X86_64
+ dec heightd
+
+ movu m1, [srcq ] ;A
+ movu m3, [srcq + sstrideq ] ;B
+ lea srcq, [srcq + sstrideq * 2]
+ punpcklbw m0, m1, m3 ;A B
+ punpckhbw m1, m3 ;A B
+ movu m5, [srcq] ;C
+ punpcklbw m2, m3, m5 ;A B next iter
+ punpckhbw m3, m5 ;A B next iter
+ mova tmp0, m2 ;store to stack
+ mova tmp1, m3 ;store to stack
+ movu m7, [srcq + sstrideq] ;D
+ lea srcq, [srcq + sstrideq * 2]
+ punpcklbw m4, m5, m7 ;C D
+ punpckhbw m5, m7 ;C D
+ movu m9, [srcq] ;E
+ punpcklbw m6, m7, m9 ;C D next iter
+ punpckhbw m7, m9 ;C D next iter
+ movu m11, [srcq + sstrideq] ;F
+ lea srcq, [srcq + sstrideq * 2]
+ punpcklbw m8, m9, m11 ;E F
+ punpckhbw m9, m11 ;E F
+ movu m2, [srcq] ;G
+ punpcklbw m10, m11, m2 ;E F next iter
+ punpckhbw m11, m2 ;E F next iter
+
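+  ; 16 pixels wide needs low and high byte halves for every row pair; with
+  ; m0..m11 holding the A..F interleavings, the two "A B next iter" halves
+  ; are spilled to the stack (tmp0/tmp1) to stay within 16 xmm registers.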
+.loop:
+ ;Do two rows at once
+ pmaddubsw m13, m0, k0k1
+ mova m0, m4
+ pmaddubsw m14, m8, k4k5
+ pmaddubsw m15, m4, k2k3
+ mova m4, m8
+ paddsw m13, m14
+ movu m3, [srcq + sstrideq] ;H
+ lea srcq, [srcq + sstrideq * 2]
+ punpcklbw m14, m2, m3 ;G H
+ mova m8, m14
+ pmaddubsw m14, k6k7
+ paddsw m15, m14
+ paddsw m13, m15
+ paddsw m13, krd
+ psraw m13, 7
+
+ pmaddubsw m14, m1, k0k1
+ pmaddubsw m1, m9, k4k5
+ pmaddubsw m15, m5, k2k3
+ paddsw m14, m1
+ mova m1, m5
+ mova m5, m9
+ punpckhbw m2, m3 ;G H
+ mova m9, m2
+ pmaddubsw m2, k6k7
+ paddsw m15, m2
+ paddsw m14, m15
+ paddsw m14, krd
+ psraw m14, 7
+ packuswb m13, m14
+%ifidn %1, v8_avg
+ pavgb m13, [dstq]
+%endif
+ mova [dstq], m13
+
+ ; next iter
+ pmaddubsw m15, tmp0, k0k1
+ pmaddubsw m14, m10, k4k5
+ pmaddubsw m13, m6, k2k3
+ paddsw m15, m14
+ mova tmp0, m6
+ mova m6, m10
+ movu m2, [srcq] ;G next iter
+ punpcklbw m14, m3, m2 ;G H next iter
+ mova m10, m14
+ pmaddubsw m14, k6k7
+ paddsw m13, m14
+ paddsw m15, m13
+ paddsw m15, krd
+ psraw m15, 7
+
+ pmaddubsw m14, tmp1, k0k1
+ mova tmp1, m7
+ pmaddubsw m13, m7, k2k3
+ mova m7, m11
+ pmaddubsw m11, k4k5
+ paddsw m14, m11
+ punpckhbw m3, m2 ;G H next iter
+ mova m11, m3
+ pmaddubsw m3, k6k7
+ paddsw m13, m3
+ paddsw m14, m13
+ paddsw m14, krd
+ psraw m14, 7
+ packuswb m15, m14
+%ifidn %1, v8_avg
+ pavgb m15, [dstq + dstrideq]
+%endif
+ mova [dstq + dstrideq], m15
+ lea dstq, [dstq + dstrideq * 2]
+ sub heightd, 2
+ jg .loop
+
+ ; Do last row if output_height is odd
+ jne .done
+
+ movu m3, [srcq + sstrideq] ;H
+ punpcklbw m6, m2, m3 ;G H
+ punpckhbw m2, m3 ;G H
+ pmaddubsw m0, k0k1
+ pmaddubsw m1, k0k1
+ pmaddubsw m4, k2k3
+ pmaddubsw m5, k2k3
+ pmaddubsw m8, k4k5
+ pmaddubsw m9, k4k5
+ pmaddubsw m6, k6k7
+ pmaddubsw m2, k6k7
+ paddsw m0, m8
+ paddsw m1, m9
+ paddsw m4, m6
+ paddsw m5, m2
+ paddsw m0, m4
+ paddsw m1, m5
+ paddsw m0, krd
+ paddsw m1, krd
+ psraw m0, 7
+ psraw m1, 7
+ packuswb m0, m1
+%ifidn %1, v8_avg
+ pavgb m0, [dstq]
+%endif
+ mova [dstq], m0
+
+.done:
+ REP_RET
+
+%endif ; ARCH_X86_64
+
%endm
INIT_XMM ssse3
diff --git a/vpx_dsp/x86/vpx_subpixel_bilinear_ssse3.asm b/vpx_dsp/x86/vpx_subpixel_bilinear_ssse3.asm
index 3c8cfd2..538b212 100644
--- a/vpx_dsp/x86/vpx_subpixel_bilinear_ssse3.asm
+++ b/vpx_dsp/x86/vpx_subpixel_bilinear_ssse3.asm
@@ -14,14 +14,14 @@
mov rdx, arg(5) ;filter ptr
mov rsi, arg(0) ;src_ptr
mov rdi, arg(2) ;output_ptr
- mov rcx, 0x0400040
+ mov ecx, 0x01000100
movdqa xmm3, [rdx] ;load filters
psrldq xmm3, 6
packsswb xmm3, xmm3
pshuflw xmm3, xmm3, 0b ;k3_k4
- movq xmm2, rcx ;rounding
+ movd xmm2, ecx ;rounding_shift
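+    ; ecx packs two copies of the word 0x0100 (256); once broadcast,
+    ; pmulhrsw by 256 computes (x * 256 + 0x4000) >> 15 == (x + 64) >> 7,
+    ; i.e. the old paddsw-64/psraw-7 rounding pair in one instruction.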
pshufd xmm2, xmm2, 0
movsxd rax, DWORD PTR arg(1) ;pixels_per_line
@@ -33,8 +33,7 @@
punpcklbw xmm0, xmm1
pmaddubsw xmm0, xmm3
- paddsw xmm0, xmm2 ;rounding
- psraw xmm0, 7 ;shift
+ pmulhrsw xmm0, xmm2 ;rounding(+64)+shift(>>7)
packuswb xmm0, xmm0 ;pack to byte
%if %1
@@ -51,7 +50,7 @@
mov rdx, arg(5) ;filter ptr
mov rsi, arg(0) ;src_ptr
mov rdi, arg(2) ;output_ptr
- mov rcx, 0x0400040
+ mov ecx, 0x01000100
movdqa xmm7, [rdx] ;load filters
psrldq xmm7, 6
@@ -59,7 +58,7 @@
pshuflw xmm7, xmm7, 0b ;k3_k4
punpcklwd xmm7, xmm7
- movq xmm6, rcx ;rounding
+ movd xmm6, ecx ;rounding_shift
pshufd xmm6, xmm6, 0
movsxd rax, DWORD PTR arg(1) ;pixels_per_line
@@ -71,8 +70,7 @@
punpcklbw xmm0, xmm1
pmaddubsw xmm0, xmm7
- paddsw xmm0, xmm6 ;rounding
- psraw xmm0, 7 ;shift
+ pmulhrsw xmm0, xmm6 ;rounding(+64)+shift(>>7)
packuswb xmm0, xmm0 ;pack back to byte
%if %1
@@ -92,10 +90,8 @@
pmaddubsw xmm0, xmm7
pmaddubsw xmm2, xmm7
- paddsw xmm0, xmm6 ;rounding
- paddsw xmm2, xmm6
- psraw xmm0, 7 ;shift
- psraw xmm2, 7
+ pmulhrsw xmm0, xmm6 ;rounding(+64)+shift(>>7)
+ pmulhrsw xmm2, xmm6
packuswb xmm0, xmm2 ;pack back to byte
%if %1
diff --git a/vpx_util/vpx_thread.c b/vpx_util/vpx_thread.c
index 0bb0125..0132ce6 100644
--- a/vpx_util/vpx_thread.c
+++ b/vpx_util/vpx_thread.c
@@ -10,8 +10,7 @@
// Multi-threaded worker
//
// Original source:
-// http://git.chromium.org/webm/libwebp.git
-// 100644 blob 264210ba2807e4da47eb5d18c04cf869d89b9784 src/utils/thread.c
+// https://chromium.googlesource.com/webm/libwebp
#include <assert.h>
#include <string.h> // for memset()
diff --git a/vpx_util/vpx_thread.h b/vpx_util/vpx_thread.h
index 2062abd..f554bbc 100644
--- a/vpx_util/vpx_thread.h
+++ b/vpx_util/vpx_thread.h
@@ -10,8 +10,7 @@
// Multi-threaded worker
//
// Original source:
-// http://git.chromium.org/webm/libwebp.git
-// 100644 blob 7bd451b124ae3b81596abfbcc823e3cb129d3a38 src/utils/thread.h
+// https://chromium.googlesource.com/webm/libwebp
#ifndef VPX_THREAD_H_
#define VPX_THREAD_H_
@@ -34,11 +33,26 @@
#include <windows.h> // NOLINT
typedef HANDLE pthread_t;
typedef CRITICAL_SECTION pthread_mutex_t;
+
+#if _WIN32_WINNT >= 0x0600 // Windows Vista / Server 2008 or greater
+#define USE_WINDOWS_CONDITION_VARIABLE
+typedef CONDITION_VARIABLE pthread_cond_t;
+#else
typedef struct {
HANDLE waiting_sem_;
HANDLE received_sem_;
HANDLE signal_event_;
} pthread_cond_t;
+#endif // _WIN32_WINNT >= 0x0600
+
+#ifndef WINAPI_FAMILY_PARTITION
+#define WINAPI_PARTITION_DESKTOP 1
+#define WINAPI_FAMILY_PARTITION(x) x
+#endif
+
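+// SDKs without the family-partition macros are treated as desktop builds by
+// the fallback defines above; non-desktop partitions (e.g. Windows Store)
+// are assumed to lack the CRT's _beginthreadex, hence CreateThread below.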
+#if !WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
+#define USE_CREATE_THREAD
+#endif
//------------------------------------------------------------------------------
// simplistic pthread emulation layer
@@ -47,16 +61,30 @@
#define THREADFN unsigned int __stdcall
#define THREAD_RETURN(val) (unsigned int)((DWORD_PTR)val)
+#if _WIN32_WINNT >= 0x0501 // Windows XP or greater
+#define WaitForSingleObject(obj, timeout) \
+ WaitForSingleObjectEx(obj, timeout, FALSE /*bAlertable*/)
+#endif
+
static INLINE int pthread_create(pthread_t* const thread, const void* attr,
unsigned int (__stdcall *start)(void*),
void* arg) {
(void)attr;
+#ifdef USE_CREATE_THREAD
+ *thread = CreateThread(NULL, /* lpThreadAttributes */
+ 0, /* dwStackSize */
+ start,
+ arg,
+                                 0, /* dwCreationFlags */
+ NULL); /* lpThreadId */
+#else
*thread = (pthread_t)_beginthreadex(NULL, /* void *security */
0, /* unsigned stack_size */
start,
arg,
0, /* unsigned initflag */
NULL); /* unsigned *thrdaddr */
+#endif
if (*thread == NULL) return 1;
SetThreadPriority(*thread, THREAD_PRIORITY_ABOVE_NORMAL);
return 0;
@@ -72,7 +100,11 @@
static INLINE int pthread_mutex_init(pthread_mutex_t *const mutex,
void* mutexattr) {
(void)mutexattr;
+#if _WIN32_WINNT >= 0x0600 // Windows Vista / Server 2008 or greater
+ InitializeCriticalSectionEx(mutex, 0 /*dwSpinCount*/, 0 /*Flags*/);
+#else
InitializeCriticalSection(mutex);
+#endif
return 0;
}
@@ -98,15 +130,22 @@
// Condition
static INLINE int pthread_cond_destroy(pthread_cond_t *const condition) {
int ok = 1;
+#ifdef USE_WINDOWS_CONDITION_VARIABLE
+ (void)condition;
+#else
ok &= (CloseHandle(condition->waiting_sem_) != 0);
ok &= (CloseHandle(condition->received_sem_) != 0);
ok &= (CloseHandle(condition->signal_event_) != 0);
+#endif
return !ok;
}
static INLINE int pthread_cond_init(pthread_cond_t *const condition,
void* cond_attr) {
(void)cond_attr;
+#ifdef USE_WINDOWS_CONDITION_VARIABLE
+ InitializeConditionVariable(condition);
+#else
condition->waiting_sem_ = CreateSemaphore(NULL, 0, MAX_DECODE_THREADS, NULL);
condition->received_sem_ = CreateSemaphore(NULL, 0, MAX_DECODE_THREADS, NULL);
condition->signal_event_ = CreateEvent(NULL, FALSE, FALSE, NULL);
@@ -116,11 +155,15 @@
pthread_cond_destroy(condition);
return 1;
}
+#endif
return 0;
}
static INLINE int pthread_cond_signal(pthread_cond_t *const condition) {
int ok = 1;
+#ifdef USE_WINDOWS_CONDITION_VARIABLE
+ WakeConditionVariable(condition);
+#else
if (WaitForSingleObject(condition->waiting_sem_, 0) == WAIT_OBJECT_0) {
// a thread is waiting in pthread_cond_wait: allow it to be notified
ok = SetEvent(condition->signal_event_);
@@ -129,12 +172,16 @@
ok &= (WaitForSingleObject(condition->received_sem_, INFINITE) !=
WAIT_OBJECT_0);
}
+#endif
return !ok;
}
static INLINE int pthread_cond_wait(pthread_cond_t *const condition,
pthread_mutex_t *const mutex) {
int ok;
+#ifdef USE_WINDOWS_CONDITION_VARIABLE
+ ok = SleepConditionVariableCS(condition, mutex, INFINITE);
+#else
// note that there is a consumer available so the signal isn't dropped in
// pthread_cond_signal
if (!ReleaseSemaphore(condition->waiting_sem_, 1, NULL))
@@ -145,6 +192,7 @@
WAIT_OBJECT_0);
ok &= ReleaseSemaphore(condition->received_sem_, 1, NULL);
pthread_mutex_lock(mutex);
+#endif
return !ok;
}
#elif defined(__OS2__)