commit | 37912de97881b6267c484d99d1340bc0a0b3f27e |
---|---|
author | Luc Trudeau <luc@trud.ca> | Fri Mar 30 18:16:39 2018 -0400 |
committer | Luc Trudeau <luc@trud.ca> | Fri Apr 27 19:02:38 2018 +0000 |
tree | ec8eb718a895eb3405475f54e0581ec8c09d27a3 |
parent | 069473be9706dcfbf8cad7be8aee47d81f551071 |
[CFL] AVX2 Version of luma_subsampling_420_hbd

Based on the observation that for smaller widths (4xN and 8xN), the AVX2 code tends to be slower than its SSSE3 counterpart.

Because of the laning in _mm256_hadd_epi64, an extra _mm256_permute4x64_epi64 operation is required for AVX2 when compared to SSSE3. In light of this, blocks of width 16, are slower in AVX2 than in SSSE3. The AVX2 code now calls the SSSE3 functions when width < 32.

We include unit tests for conformance and speed.

AVX2/CFLSubsampleHBDSpeedTest (i7-7820X)
4x4: C time = 96 us, SIMD time = 56 us (~1.7x)
8x8: C time = 319 us, SIMD time = 96 us (~3.3x)
16x16: C time = 1632 us, SIMD time = 243 us (~6.7x)
32x32: C time = 7973 us, SIMD time = 769 us (~10x)

Change-Id: I3234668060908018b18f37f045011d6f9751ff81
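The speedup figures quoted in the commit message are the ratio of C time to SIMD time. As a quick sanity check, the following sketch recomputes them with awk from the timings listed above; the `speedups` variable name is illustrative only.

```shell
# Recompute speedup ratios from the timings in the commit message.
# Each input record is: <block size> <C time in us> <SIMD time in us>
speedups=$(printf '%s\n' \
  '4x4 96 56' \
  '8x8 319 96' \
  '16x16 1632 243' \
  '32x32 7973 769' |
  awk '{ printf "%s: %.1fx\n", $1, $2 / $3 }')
echo "$speedups"
```

The ratios grow with block size, which is consistent with the message's point that the SIMD paths pay off most on wider blocks.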
The AV1 library source code is stored in the Alliance for Open Media Git repository:
$ git clone https://aomedia.googlesource.com/aom
# By default, the above command stores the source in the aom directory:
$ cd aom
CMake replaces the configure step typical of many projects. Running CMake will produce configuration and build files for the currently selected CMake generator. For most systems the default generator is Unix Makefiles. The basic form of a makefile build is the following:
$ cmake path/to/aom
$ make
The above will generate a makefile build that produces the AV1 library and applications for the current host system after the make step completes successfully. The compiler chosen varies by host platform, but a general rule applies: on systems where cc and c++ are present in $PATH at the time CMake is run, the generated build will use cc and c++ by default.
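To see which compilers a default configuration is likely to pick up, you can check what cc and c++ resolve to before running CMake. The following is a minimal sketch; the `report_compiler` helper is hypothetical and not part of the AV1 build.

```shell
# Hypothetical helper (not part of the AV1 repository): print where a
# compiler name resolves in $PATH, or note that it is missing.
report_compiler() {
  if resolved=$(command -v "$1" 2>/dev/null); then
    echo "$1 -> $resolved"
  else
    echo "$1 -> not found in PATH"
  fi
}

report_compiler cc
report_compiler c++
```

CMake also honors the CC and CXX environment variables when a build directory is first configured, so a command such as `CC=clang CXX=clang++ cmake path/to/aom` is one way to select a different compiler.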
The AV1 codec library has a great many configuration options. These come in two varieties:
Build system configuration options, which have the form ENABLE_FEATURE.
AV1 codec configuration options, which have the form CONFIG_FEATURE.
Both types of options are set at the time CMake is run. The following example enables ccache and disables the AV1 encoder:
$ cmake path/to/aom -DENABLE_CCACHE=1 -DCONFIG_AV1_ENCODER=0
$ make
The available configuration options are too numerous to list here. Build system configuration options can be found at the top of the CMakeLists.txt file found in the root of the AV1 repository, and AV1 codec configuration options can currently be found in the file build/cmake/aom_config_defaults.cmake.
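Because the options live in plain CMake files, a quick way to browse them is to search those files. The sketch below assumes only that option names begin with ENABLE_ or CONFIG_, as described above; it makes no assumption about the exact declaration syntax in those files and simply extracts matching identifiers. The `list_aom_options` helper is hypothetical.

```shell
# Hypothetical helper: list the unique ENABLE_*/CONFIG_* identifiers
# that appear in the given file.
list_aom_options() {
  grep -oE '(ENABLE|CONFIG)_[A-Z0-9_]+' "$1" | sort -u
}

# Usage, from the root of the AV1 repository:
#   list_aom_options CMakeLists.txt
#   list_aom_options build/cmake/aom_config_defaults.cmake
```

Any name printed this way can then be passed to CMake with -D, as in the ccache example above.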
A dylib (shared object) build of the AV1 codec library can be enabled via the CMake built-in variable BUILD_SHARED_LIBS:
$ cmake path/to/aom -DBUILD_SHARED_LIBS=1
$ make
This is currently only supported on non-Windows targets.
Depending on the generator used, there are multiple ways of going about debugging AV1 components. For single configuration generators like the Unix Makefiles generator, setting CMAKE_BUILD_TYPE to Debug is sufficient:
$ cmake path/to/aom -DCMAKE_BUILD_TYPE=Debug
For Xcode, CMAKE_CONFIGURATION_TYPES should be set to Debug, mainly because the configuration controls for Xcode builds are buried two configuration windows deep and must be set individually for each subproject within the Xcode IDE:
$ cmake path/to/aom -G Xcode -DCMAKE_CONFIGURATION_TYPES=Debug
For Visual Studio the in-IDE configuration controls should be used. Simply set the IDE project configuration to Debug to allow for stepping through the code.
In addition to the above, it can sometimes be useful to debug only C and C++ code. To disable all assembly code and intrinsics, set AOM_TARGET_CPU to generic at generation time:
$ cmake path/to/aom -DAOM_TARGET_CPU=generic
For the purposes of building the AV1 codec and applications, and relative to the scope of this guide, all builds for architectures differing from the native host architecture will be considered cross compiles. The AV1 CMake build handles cross compiling via the use of toolchain files included in the AV1 repository. The toolchain files available at the time of this writing can be found in the build/cmake/toolchains directory of the repository.
The following example demonstrates use of the x86-macos.cmake toolchain file on an x86_64 macOS host:
$ cmake path/to/aom \
  -DCMAKE_TOOLCHAIN_FILE=path/to/aom/build/cmake/toolchains/x86-macos.cmake
$ make
To build for an unlisted target, creating a new toolchain file is the best solution. The existing toolchain files can be used as a starting point for a new toolchain file, since each one exposes the basic requirements for toolchain files as used in the AV1 codec build.
As a temporary workaround, an unoptimized AV1 configuration that builds only C and C++ sources can be produced using the following commands:
$ cmake path/to/aom -DAOM_TARGET_CPU=generic
$ make
In addition to the above it's important to note that the toolchain files suffixed with gcc behave differently than the others. These toolchain files attempt to obey the $CROSS environment variable.
Sanitizer integration is built into the CMake build system. To enable a sanitizer, add -DSANITIZE=<type> to the CMake command line. For example, to enable address sanitizer:
$ cmake path/to/aom -DSANITIZE=address
$ make
Sanitizers available vary by platform, target, and compiler. Consult your compiler documentation to determine which, if any, are available.
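One way to find out whether your compiler accepts a given sanitizer is to try building a trivial program with the corresponding flag. This sketch assumes a GCC- or Clang-style -fsanitize=<type> flag and a compiler reachable as cc; the `supports_sanitizer` helper is hypothetical and is not how the AV1 build itself probes for support.

```shell
# Hypothetical helper: succeed if `cc` can compile and link a trivial
# program with -fsanitize=<type>. Assumes a GCC/Clang-style driver.
supports_sanitizer() {
  command -v cc >/dev/null 2>&1 || return 2   # no compiler to ask
  tmpdir=$(mktemp -d) || return 2
  printf 'int main(void) { return 0; }\n' > "$tmpdir/probe.c"
  cc -fsanitize="$1" "$tmpdir/probe.c" -o "$tmpdir/probe" >/dev/null 2>&1
  rc=$?
  rm -rf "$tmpdir"
  return "$rc"
}

if supports_sanitizer address; then
  echo "address sanitizer appears to be available"
else
  echo "address sanitizer not available (or no cc in PATH)"
fi
```

A successful probe only shows that the flag is accepted on the host; cross-compile targets may still differ.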
Building the AV1 codec library in Microsoft Visual Studio is supported. The following example demonstrates generating projects and a solution for the Microsoft IDE:
# This does not require a bash shell; cmd.exe is fine.
$ cmake path/to/aom -G "Visual Studio 15 2017"
Building the AV1 codec library in Xcode is supported. The following example demonstrates generating an Xcode project:
$ cmake path/to/aom -G Xcode
Building the AV1 codec library with Emscripten is supported. Typically this is used to hook into the AOMAnalyzer GUI application. These instructions focus on using the inspector with AOMAnalyzer, but all tools can be built with Emscripten.
It is assumed here that you have already downloaded and installed the EMSDK, installed and activated at least one toolchain, and set up your environment appropriately using the emsdk_env script.
Download AOMAnalyzer.
Configure the build:
$ cmake path/to/aom \
  -DENABLE_CCACHE=1 \
  -DAOM_TARGET_CPU=generic \
  -DENABLE_DOCS=0 \
  -DCONFIG_ACCOUNTING=1 \
  -DCONFIG_INSPECTION=1 \
  -DCONFIG_MULTITHREAD=0 \
  -DCONFIG_RUNTIME_CPU_DETECT=0 \
  -DCONFIG_UNIT_TESTS=0 \
  -DCONFIG_WEBM_IO=0 \
  -DCMAKE_TOOLCHAIN_FILE=path/to/emsdk-portable/.../Emscripten.cmake
$ make inspect
# inspect.js is in the examples sub directory of the directory in which you
# executed cmake.
$ path/to/AOMAnalyzer path/to/examples/inspect.js path/to/av1/input/file
Three variables allow for passing of additional flags to the build system.
The build system attempts to ensure the flags passed through the above variables are passed to tools last in order to allow for override of default behavior. These flags can be used, for example, to enable asserts in a release build:
$ cmake path/to/aom \
  -DCMAKE_BUILD_TYPE=Release \
  -DAOM_EXTRA_C_FLAGS=-UNDEBUG \
  -DAOM_EXTRA_CXX_FLAGS=-UNDEBUG
There are several methods of testing the AV1 codec. All of these methods require the presence of the AV1 source code and a working build of the AV1 library and applications.
The unit tests can be run at build time:
# Before running the make command the LIBAOM_TEST_DATA_PATH environment
# variable should be set to avoid downloading the test files to the
# cmake build configuration directory.
$ cmake path/to/aom
# Note: The AV1 CMake build creates many test targets. Running make
# with multiple jobs will speed up the test run significantly.
$ make runtests
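As the comment above notes, LIBAOM_TEST_DATA_PATH should be set before running make. A minimal sketch follows; the directory location is an arbitrary choice for illustration, and any writable directory outside the build tree should do.

```shell
# Keep test data outside the CMake build configuration directory.
# The location below is an arbitrary example, not a project requirement.
LIBAOM_TEST_DATA_PATH="${HOME:-/tmp}/aom-test-data"
export LIBAOM_TEST_DATA_PATH
mkdir -p "$LIBAOM_TEST_DATA_PATH"
echo "Test data will be stored in: $LIBAOM_TEST_DATA_PATH"
```

With the variable exported in the same shell session, the cmake and make commands shown above should pick it up.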
The example tests require a bash shell and can be run in the following manner:
# See the note above about LIBAOM_TEST_DATA_PATH.
$ cmake path/to/aom
$ make
# It's best to build the testdata target using many make jobs.
# Running it like this will verify and download (if necessary)
# one at a time, which takes a while.
$ make testdata
$ path/to/aom/test/examples.sh --bin-path examples
When making a change to the encoder, run the encoder tests to confirm that your change has a positive or negligible impact on encode quality. When running these tests, the build configuration should be changed to enable internal encoder statistics:
$ cmake path/to/aom -DCONFIG_INTERNAL_STATS=1
$ make
The repository contains scripts intended to make running these tests as simple as possible. The following example demonstrates creating a set of baseline clips for comparison to results produced after making your change to libaom:
# This will encode all Y4M files in the current directory using the
# settings specified to create the encoder baseline statistical data:
$ cd path/to/test/inputs
# This command line assumes that run_encodes.sh, its helper script
# best_encode.sh, and the aomenc you intend to test are all within a
# directory in your PATH.
$ run_encodes.sh 200 500 50 baseline
After making your change and creating the baseline clips, you'll need to run encodes that include your change(s) to confirm that things are working as intended:
# This will encode all Y4M files in the current directory using the
# settings specified to create the statistical data for your change:
$ cd path/to/test/inputs
# This command line assumes that run_encodes.sh, its helper script
# best_encode.sh, and the aomenc you intend to test are all within a
# directory in your PATH.
$ run_encodes.sh 200 500 50 mytweak
After creating both data sets you can use test/visual_metrics.py to generate a report that can be viewed in a web browser:
$ visual_metrics.py metrics_template.html "*stt" baseline mytweak \
  > mytweak.html
You can view the report by opening mytweak.html in a web browser.
By default, the project files generated by CMake will not include the runtests and testdata rules when generating for IDEs like Microsoft Visual Studio and Xcode. This is done to avoid intolerably long build cycles in the IDEs: IDE behavior is to build all targets when selecting the build project options in MSVS and Xcode. To enable the test rules in IDEs, the ENABLE_IDE_TEST_HOSTING variable must be enabled at CMake generation time:
# This example uses Xcode. To get a list of the generators
# available, run cmake with the -G argument missing its
# value.
$ cmake path/to/aom -DENABLE_IDE_TEST_HOSTING=1 -G Xcode
The fastest and easiest way to obtain the test data is to use CMake to generate a build using the Unix Makefiles generator, and then to build only the testdata rule:
$ cmake path/to/aom -G "Unix Makefiles"
# 28 is used because there are 28 test files as of this writing.
$ make -j28 testdata
The above make command will only download and verify the test data.
The test data mentioned above is strictly intended for unit testing.
Additional input data for testing the encoder can be obtained from: https://media.xiph.org/video/derf/
The AV1 codec library unit tests are built upon gtest which supports sharding of test jobs. Sharded test runs can be achieved in a couple of ways.
# Set the environment variable GTEST_TOTAL_SHARDS to control the number of
# shards.
$ export GTEST_TOTAL_SHARDS=10
# (GTEST shard indexing is 0 based).
$ seq 0 $(( $GTEST_TOTAL_SHARDS - 1 )) \
  | xargs -n 1 -P 0 -I{} env GTEST_SHARD_INDEX={} ./test_libaom
To create a test shard for each CPU core available on the current system set GTEST_TOTAL_SHARDS to the number of CPU cores on your system minus one.
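The core-count rule above can be sketched as follows. This assumes `getconf _NPROCESSORS_ONLN` is available for counting cores, which holds on most Linux and macOS systems but is not guaranteed everywhere; the fallback value of 2 is an arbitrary choice for this sketch.

```shell
# Count online CPU cores. getconf is an assumption about the host; fall
# back to 2 if it is missing or prints something that is not a number.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=2
case "$cores" in
  ''|*[!0-9]*) cores=2 ;;
esac

# One shard per core minus one, but never fewer than one shard.
GTEST_TOTAL_SHARDS=$(( cores > 1 ? cores - 1 : 1 ))
export GTEST_TOTAL_SHARDS
echo "GTEST_TOTAL_SHARDS=$GTEST_TOTAL_SHARDS"
```

The exported value can then be used with the seq and xargs command shown earlier.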
# For IDE based builds, ENABLE_IDE_TEST_HOSTING must be enabled. See
# the IDE hosted tests section above for more information. If the IDE
# supports building targets concurrently tests will be sharded by default.
# For make and ninja builds the -j parameter controls the number of shards
# at test run time. This example will run the tests using 10 shards via
# make.
$ make -j10 runtests
The maximum number of test targets that can run concurrently is determined by the number of CPUs on the system where the build is configured, as detected by CMake. A system with 24 cores can run 24 test shards by passing a value of 24 to the -j parameter. When CMake is unable to detect the number of cores, 10 shards is the default maximum value.
We are using the Google C Coding Style defined by the Google C++ Style Guide.
The coding style used by this project is enforced with clang-format using the configuration contained in the .clang-format file in the root of the repository.
You can download clang-format using your system's package manager, or directly from llvm.org. You can also view the documentation on llvm.org. Output from clang-format varies by clang-format version; for best results, your version should match the one used on Jenkins. You can find the clang-format version by reading the comment in the .clang-format file linked above.
Before pushing changes for review you can format your code with:
# Apply clang-format to modified .c, .h and .cc files
$ clang-format -i --style=file \
  $(git diff --name-only --diff-filter=ACMR '*.[hc]' '*.cc')
Check the .clang-format file for the version used to generate it if there is any difference between your local formatting and the review system.
Some Git installations have clang-format integration. Here are some examples:
# Apply clang-format to all staged changes:
$ git clang-format
# Clang format all staged and unstaged changes:
$ git clang-format -f
# Clang format all staged and unstaged changes interactively:
$ git clang-format -f -p
We manage the submission of patches using the Gerrit code review tool. This tool implements a workflow on top of the Git version control system to ensure that all changes get peer reviewed and tested prior to their distribution.
Browse to the AOMedia Git index and log in with your account (Gmail credentials, for example). Next, follow the Generate Password link at the top of the page. You’ll be given instructions for creating a cookie to use with our Git repos.
You will be required to execute a contributor agreement to ensure that the AOMedia Project has the right to distribute your changes.
The testing basics are covered in the testing section above.
In addition to the local tests, many more (e.g. asan, tsan, valgrind) will run through Jenkins instances upon upload to gerrit.
Gerrit requires that each submission include a unique Change-Id. You can assign one manually using git commit --amend, but it’s easier to automate it with the commit-msg hook provided by Gerrit.
Copy commit-msg to the .git/hooks directory of your local repo. Here's an example:
$ curl -Lo aom/.git/hooks/commit-msg https://chromium-review.googlesource.com/tools/hooks/commit-msg
# Next, ensure that the downloaded commit-msg script is executable:
$ chmod u+x aom/.git/hooks/commit-msg
See the Gerrit documentation for more information.
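Before uploading, it can be worth confirming that the hook actually added a Change-Id to your latest commit. The sketch below checks a commit message for a trailer of the same shape as the one in the commit shown at the top of this page (the letter I followed by 40 hexadecimal digits); the `has_change_id` helper is hypothetical and not part of Gerrit or the AV1 repository.

```shell
# Hypothetical helper: succeed if the commit message read from stdin
# contains a line of the form "Change-Id: I<40 hex digits>".
has_change_id() {
  grep -Eq '^Change-Id: I[0-9a-f]{40}$'
}

# Usage inside a Git checkout:
#   git log -1 --format=%B | has_change_id || echo "missing Change-Id"
```

If the check fails, amending the commit after installing the hook gives the hook a chance to add the missing line.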
The command line to upload your patch looks like this:
$ git push https://aomedia-review.googlesource.com/aom HEAD:refs/for/master
If you previously uploaded a change to Gerrit and the Approver has asked for changes, follow these steps:
$ git commit -a --amend
In general, you should not rebase your changes when doing updates in response to review. Doing so can make it harder to follow the evolution of your change in the diff view.
Once your change has been Approved and Verified, you can “submit” it through the Gerrit UI. This will usually automatically rebase your change onto the branch specified.
Sometimes this can’t be done automatically. If you run into this problem, you must rebase your changes manually:
$ git fetch
$ git rebase origin/branchname
If there are any conflicts, resolve them as you normally would with Git. When you’re done, reupload your change.
To check the status of a change that you uploaded, open Gerrit, sign in, and click My > Changes.
This library is an open source project supported by its community. Please email aomediacodec@jointdevelopment.kavi.com for help.
Bug reports can be filed in the Alliance for Open Media issue tracker.