Merge tag 'v3.5.0' into main

2022-08-31 v3.5.0
  This release is ABI compatible with the last one, including speedup and memory
  optimizations, and new APIs and features.

  - New Features
    * Support for frame parallel encode for larger number of threads. --fp-mt
      flag is available for all build configurations.
    * New codec control AV1E_GET_NUM_OPERATING_POINTS

  - Speedup and Memory Optimizations
    * Speed-up multithreaded encoding for good quality mode for larger number of
      threads through frame parallel encoding:
      o 30-34% encode time reduction for 1080p, 16 threads, 1x1 tile
        configuration (tile_rows x tile_columns)
      o 18-28% encode time reduction for 1080p, 16 threads, 2x4 tile
        configuration
      o 18-20% encode time reduction for 2160p, 32 threads, 2x4 tile
        configuration
    * 16-20% speed-up for speed=6 to 8 in still-picture encoding mode
    * 5-6% heap memory reduction for speed=6 to 10 in real-time encoding mode
    * Improvements to the speed for speed=7, 8 in real-time encoding mode
    * Improvements to the speed for speed=9, 10 in real-time screen encoding
      mode
    * Optimizations to improve multi-thread efficiency in real-time encoding
      mode
    * 10-15% speed up for SVC with temporal layers
    * SIMD optimizations:
      o Improve av1_quantize_fp_32x32_neon() 1.05x to 1.24x faster
      o Add aom_highbd_quantize_b{,_32x32,_64x64}_adaptive_neon() 3.15x to 5.6x
        faster than "C"
      o Improve av1_quantize_fp_64x64_neon() 1.17x to 1.66x faster
      o Add aom_quantize_b_avx2() 1.4x to 1.7x faster than aom_quantize_b_avx()
      o Add aom_quantize_b_32x32_avx2() 1.4x to 2.3x faster than
        aom_quantize_b_32x32_avx()
      o Add aom_quantize_b_64x64_avx2() 2.0x to 2.4x faster than
        aom_quantize_b_64x64_ssse3()
      o Add aom_highbd_quantize_b_32x32_avx2() 9.0x to 10.5x faster than
        aom_highbd_quantize_b_32x32_c()
      o Add aom_highbd_quantize_b_64x64_avx2() 7.3x to 9.7x faster than
        aom_highbd_quantize_b_64x64_c()
      o Improve aom_highbd_quantize_b_avx2() 1.07x to 1.20x faster
      o Improve av1_quantize_fp_avx2() 1.13x to 1.49x faster
      o Improve av1_quantize_fp_32x32_avx2() 1.07x to 1.54x faster
      o Improve av1_quantize_fp_64x64_avx2() 1.03x to 1.25x faster
      o Improve av1_quantize_lp_avx2() 1.07x to 1.16x faster

  - Bug fixes including but not limited to
    * aomedia:3206 Assert that skip_width > 0 for deconvolve function
    * aomedia:3278 row_mt enc: Delay top-right sync when intraBC is enabled
    * aomedia:3282 blend_a64_*_neon: fix bus error in armv7
    * aomedia:3283 FRAME_PARALLEL: Propagate border size to all cpis
    * aomedia:3283 RESIZE_MODE: Fix incorrect strides being used for motion
      search
    * aomedia:3286 rtc-svc: Fix to dynamic_enable spatial layers
    * aomedia:3289 rtc-screen: Fix to skipping inter-mode test in nonrd
    * aomedia:3289 rtc-screen: Fix for skip newmv on flat blocks
    * aomedia:3299 Fix build failure with CONFIG_TUNE_VMAF=1
    * aomedia:3296 Fix the conflict --enable-tx-size-search=0 with nonrd mode
      --enable-tx-size-search will be ignored in non-rd pick mode
    * aomedia:3304 Fix off-by-one error of max w/h in validate_config
    * aomedia:3306 Do not use pthread_setname_np on GNU/Hurd
    * aomedia:3325 row-multithreading produces invalid bitstream in some cases
    * chromium:1346938, chromium:1338114
    * compiler_flags.cmake: fix flag detection w/cmake 3.17-3.18.2
    * tools/*.py: update to python3
    * aom_configure.cmake: detect PIE and set CONFIG_PIC
    * test/simd_cmp_impl: use explicit types w/CompareSimd*
    * rtc: Fix to disable segm for aq-mode=3
    * rtc: Fix to color_sensitivity in variance partition
    * rtc-screen: Fix bsize in model rd computation for intra chroma
    * Fixes to ensure the correct behavior of the encoder algorithms (like
      segmentation, computation of statistics, etc.)

Bug: aomedia:3313
Change-Id: I8c9bc4c709f3bf0157ec29c5af52f397ac33ec38
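
The release notes quote some gains as "encode time reduction" percentages and others as "Nx faster" factors. The sketch below puts the two on one scale. It assumes that a P% time reduction means the new encode time is (1 - P/100) of the old one; the notes do not define the term, so this reading is an assumption. The helper name speedup_from_time_reduction is hypothetical and not part of libaom.

```python
def speedup_from_time_reduction(percent: float) -> float:
    """Convert a percentage reduction in run time into a speedup factor.

    Assumption: new_time = old_time * (1 - percent / 100), so the
    speedup factor is old_time / new_time = 1 / (1 - percent / 100).
    """
    if not 0 <= percent < 100:
        raise ValueError("percent must be in the range [0, 100)")
    return 1.0 / (1.0 - percent / 100.0)


# Frame parallel encoding figures quoted in the release notes (low, high).
quoted_ranges = {
    "1080p, 16 threads, 1x1 tiles": (30, 34),
    "1080p, 16 threads, 2x4 tiles": (18, 28),
    "2160p, 32 threads, 2x4 tiles": (18, 20),
}

for config, (low, high) in quoted_ranges.items():
    low_factor = speedup_from_time_reduction(low)
    high_factor = speedup_from_time_reduction(high)
    print(f"{config}: {low_factor:.2f}x to {high_factor:.2f}x")
```

Under that assumption, a 20% time reduction corresponds to a 1.25x speedup, so the frame parallel gains are of the same order as the smaller SIMD improvements listed above, although they are measured on whole encodes rather than on single functions.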

commit: 022ff56565a47c2c44f64f4d0c851601b14139c4
author: Jerome Jiang <jianj@google.com> Wed Sep 21 12:43:19 2022 -0400
committer: Jerome Jiang <jianj@google.com> Wed Sep 21 13:56:54 2022 -0400
tree: 0ec3416625cd34ce0fec499f7cf836d9cde64d7a
parent: a7f472b0eabf3dfd800578a9a1c76d6cb044b555
parent: bcfe6fbfed315f83ee8a95465c654ee8078dbff9
diff --git a/.mailmap b/.mailmap
index 1f21868..61adddb 100644
--- a/.mailmap
+++ b/.mailmap

@@ -40,6 +40,8 @@
 Johann Koenig <johannkoenig@google.com> <johannkoenig@chromium.org>
 John Koleszar <jkoleszar@google.com>
 Joshua Litt <joshualitt@google.com> <joshualitt@chromium.org>
+Kyle Siefring <siekyleb@amazon.com>
+Kyle Siefring <siekyleb@amazon.com> <kylesiefring@gmail.com>
 Lokeshwar Reddy B <lokeshwar.reddy@ittiam.com>
 Logan Goldberg <logangw@google.com>
 Luc Trudeau <luc@trud.ca>

diff --git a/AUTHORS b/AUTHORS
index 84ef6fb..0c2da8f 100644
--- a/AUTHORS
+++ b/AUTHORS

@@ -134,7 +134,7 @@
 Kavi Ramamurthy <kavii@google.com>
 KO Myung-Hun <komh@chollian.net>
 Krishna Malladi <kmalladi@google.com>
-Kyle Siefring <kylesiefring@gmail.com>
+Kyle Siefring <siekyleb@amazon.com>
 Larisa Markeeva <lmarkeeva@google.com>
 Lauren Partin <lpartin@google.com>
 Lawrence Velázquez <larryv@macports.org>
@@ -222,6 +222,7 @@
 Sai Deng <sdeng@google.com>
 Sami Boukortt <sboukortt@google.com>
 Sami Pietilä <samipietila@google.com>
+Samuel Thibault <samuel.thibault@ens-lyon.org>
 Sarah Parker <sarahparker@google.com>
 Sasi Inguva <isasi@google.com>
 Satish Kumar Suman <satish.suman@ittiam.com>

diff --git a/CHANGELOG b/CHANGELOG
index 81f92b8..0df6a68 100644
--- a/CHANGELOG
+++ b/CHANGELOG

@@ -1,3 +1,76 @@
+2022-08-31 v3.5.0
+  This release is ABI compatible with the last one, including speedup and memory
+  optimizations, and new APIs and features.
+
+  - New Features
+    * Support for frame parallel encode for larger number of threads. --fp-mt
+      flag is available for all build configurations.
+    * New codec control AV1E_GET_NUM_OPERATING_POINTS
+
+  - Speedup and Memory Optimizations
+    * Speed-up multithreaded encoding for good quality mode for larger number of
+      threads through frame parallel encoding:
+      o 30-34% encode time reduction for 1080p, 16 threads, 1x1 tile
+        configuration (tile_rows x tile_columns)
+      o 18-28% encode time reduction for 1080p, 16 threads, 2x4 tile
+        configuration
+      o 18-20% encode time reduction for 2160p, 32 threads, 2x4 tile
+        configuration
+    * 16-20% speed-up for speed=6 to 8 in still-picture encoding mode
+    * 5-6% heap memory reduction for speed=6 to 10 in real-time encoding mode
+    * Improvements to the speed for speed=7, 8 in real-time encoding mode
+    * Improvements to the speed for speed=9, 10 in real-time screen encoding
+      mode
+    * Optimizations to improve multi-thread efficiency in real-time encoding
+      mode
+    * 10-15% speed up for SVC with temporal layers
+    * SIMD optimizations:
+      o Improve av1_quantize_fp_32x32_neon() 1.05x to 1.24x faster
+      o Add aom_highbd_quantize_b{,_32x32,_64x64}_adaptive_neon() 3.15x to 5.6x
+        faster than "C"
+      o Improve av1_quantize_fp_64x64_neon() 1.17x to 1.66x faster
+      o Add aom_quantize_b_avx2() 1.4x to 1.7x faster than aom_quantize_b_avx()
+      o Add aom_quantize_b_32x32_avx2() 1.4x to 2.3x faster than
+        aom_quantize_b_32x32_avx()
+      o Add aom_quantize_b_64x64_avx2() 2.0x to 2.4x faster than
+        aom_quantize_b_64x64_ssse3()
+      o Add aom_highbd_quantize_b_32x32_avx2() 9.0x to 10.5x faster than
+        aom_highbd_quantize_b_32x32_c()
+      o Add aom_highbd_quantize_b_64x64_avx2() 7.3x to 9.7x faster than
+        aom_highbd_quantize_b_64x64_c()
+      o Improve aom_highbd_quantize_b_avx2() 1.07x to 1.20x faster
+      o Improve av1_quantize_fp_avx2() 1.13x to 1.49x faster
+      o Improve av1_quantize_fp_32x32_avx2() 1.07x to 1.54x faster
+      o Improve av1_quantize_fp_64x64_avx2()  1.03x to 1.25x faster
+      o Improve av1_quantize_lp_avx2() 1.07x to 1.16x faster
+
+  - Bug fixes including but not limited to
+    * aomedia:3206 Assert that skip_width > 0 for deconvolve function
+    * aomedia:3278 row_mt enc: Delay top-right sync when intraBC is enabled
+    * aomedia:3282 blend_a64_*_neon: fix bus error in armv7
+    * aomedia:3283 FRAME_PARALLEL: Propagate border size to all cpis
+    * aomedia:3283 RESIZE_MODE: Fix incorrect strides being used for motion
+      search
+    * aomedia:3286 rtc-svc: Fix to dynamic_enable spatial layers
+    * aomedia:3289 rtc-screen: Fix to skipping inter-mode test in nonrd
+    * aomedia:3289 rtc-screen: Fix for skip newmv on flat blocks
+    * aomedia:3299 Fix build failure with CONFIG_TUNE_VMAF=1
+    * aomedia:3296 Fix the conflict --enable-tx-size-search=0 with nonrd mode
+      --enable-tx-size-search will be ignored in non-rd pick mode
+    * aomedia:3304 Fix off-by-one error of max w/h in validate_config
+    * aomedia:3306 Do not use pthread_setname_np on GNU/Hurd
+    * aomedia:3325 row-multithreading produces invalid bitstream in some cases
+    * chromium:1346938, chromium:1338114
+    * compiler_flags.cmake: fix flag detection w/cmake 3.17-3.18.2
+    * tools/*.py: update to python3
+    * aom_configure.cmake: detect PIE and set CONFIG_PIC
+    * test/simd_cmp_impl: use explicit types w/CompareSimd*
+    * rtc: Fix to disable segm for aq-mode=3
+    * rtc: Fix to color_sensitivity in variance partition
+    * rtc-screen: Fix bsize in model rd computation for intra chroma
+    * Fixes to ensure the correct behavior of the encoder algorithms (like
+      segmentation, computation of statistics, etc.)
+
 2022-06-17 v3.4.0
   This release includes compression efficiency and perceptual quality
   improvements, speedup and memory optimizations, and some new features.

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 5620129..a0fa444 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt

@@ -51,9 +51,9 @@
 # passed to libtool.
 #
 # We set SO_FILE_VERSION = [c-a].a.r
-set(LT_CURRENT 7)
+set(LT_CURRENT 8)
 set(LT_REVISION 0)
-set(LT_AGE 4)
+set(LT_AGE 5)
 math(EXPR SO_VERSION "${LT_CURRENT} - ${LT_AGE}")
 set(SO_FILE_VERSION "${SO_VERSION}.${LT_AGE}.${LT_REVISION}")
 unset(LT_CURRENT)
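
The CMakeLists.txt hunk above bumps LT_CURRENT and LT_AGE together. A small sketch of the arithmetic from the comment "SO_FILE_VERSION = [c-a].a.r" shows the effect. The function name so_file_version is hypothetical; the formula and the input values come from the diff.

```python
def so_file_version(lt_current: int, lt_revision: int, lt_age: int) -> str:
    """Reproduce the CMake rule SO_FILE_VERSION = [c-a].a.r."""
    so_version = lt_current - lt_age  # math(EXPR SO_VERSION "${LT_CURRENT} - ${LT_AGE}")
    return f"{so_version}.{lt_age}.{lt_revision}"


before = so_file_version(lt_current=7, lt_revision=0, lt_age=4)  # lines removed by the diff
after = so_file_version(lt_current=8, lt_revision=0, lt_age=5)   # lines added by the diff

print(before, "->", after)
```

Because LT_CURRENT and LT_AGE both increase by one, SO_VERSION stays at 3 while the file version moves from 3.4.0 to 3.5.0, which matches the release note that v3.5.0 is ABI compatible with the previous release.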