/*!\page encoder_guide AV1 ENCODER GUIDE

\tableofcontents

\section architecture_introduction Introduction

This document provides an architectural overview of the libaom AV1 encoder.

It is intended as a high level starting point for anyone wishing to contribute
to the project, and should help them to more quickly understand the structure
of the encoder and find their way around the codebase.

It stands above, and will where necessary link to, more detailed function
level documents.

\subsection architecture_gencodecs Generic Block Transform Based Codecs

Most modern video encoders including VP8, H.264, VP9, HEVC and AV1
(in increasing order of complexity) share a common basic paradigm. This
comprises separating a stream of raw video frames into a series of discrete
blocks (of one or more sizes), then computing a prediction signal and a
quantized, transform coded, residual error signal. The prediction and residual
error signal, along with any side information needed by the decoder, are then
entropy coded and packed to form the encoded bitstream. See Figure 1 below,
where the blue blocks are, to all intents and purposes, the lossless parts of
the encoder and the red block is the lossy part.

This is of course a gross oversimplification, even in regard to the simplest
of the above codecs. For example, all of them allow for block based
prediction at multiple different scales (i.e. different block sizes) and may
use previously coded pixels in the current frame for prediction or pixels from
one or more previously encoded frames. Further, they may support multiple
different transforms and transform sizes and quality optimization tools like
loop filtering.

\image html genericcodecflow.png "" width=70%

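As a rough illustration of the paradigm described above, the following
self-contained C sketch codes a run of samples: it forms a residual against a
prediction, quantizes it (the lossy step) and reconstructs what a decoder
would see. The transform and entropy coding stages are omitted for brevity.
All names here (`toy_quantize`, `toy_dequantize`, `toy_code_block`) and the
uniform quantizer are assumptions made for illustration and do not correspond
to libaom functions.

```c
#include <assert.h>
#include <stdlib.h>

// Quantize a residual sample to an integer level (the lossy step).
// Rounds to the nearest multiple of `step`, symmetric around zero.
static int toy_quantize(int residual, int step) {
  assert(step > 0);
  const int sign = residual < 0 ? -1 : 1;
  return sign * ((abs(residual) + step / 2) / step);
}

// Reverse the quantization: scale the level back to a residual estimate.
static int toy_dequantize(int level, int step) { return level * step; }

// Codes `n` samples: residual = source - prediction, quantize, reconstruct.
// `levels` receives the values that would be entropy coded and `recon`
// receives the decoder-side reconstruction. Returns the sum of squared
// reconstruction errors (the distortion).
static long toy_code_block(const int *src, const int *pred, int n, int step,
                           int *levels, int *recon) {
  long sse = 0;
  for (int i = 0; i < n; ++i) {
    const int residual = src[i] - pred[i];
    levels[i] = toy_quantize(residual, step);
    recon[i] = pred[i] + toy_dequantize(levels[i], step);
    const long err = src[i] - recon[i];
    sse += err * err;
  }
  return sse;
}
```

With a step size of 1 the reconstruction is exact. Larger step sizes reduce
the number of distinct levels that need to be coded, at the cost of higher
distortion.
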
\subsection architecture_av1_structure AV1 Structure and Complexity

As previously stated, AV1 adopts the same underlying paradigm as other block
transform based codecs. However, it is much more complicated than previous
generation codecs and supports many more block partitioning, prediction and
transform options.

AV1 supports block partitions of various sizes from 128x128 pixels down to 4x4
pixels using a multi-layer recursive tree structure as illustrated in Figure 2
below.

\image html av1partitions.png "" width=70%

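The recursive nature of the partition tree can be sketched as follows. This
is a deliberately simplified illustration that only models a four-way square
split; the rectangular partition shapes shown in the figure are not modelled,
and the callback type and function names are hypothetical rather than taken
from libaom.

```c
#include <assert.h>

// Hypothetical decision callback: returns non-zero if the square block
// with top-left corner (x, y) and side length `size` should be split.
typedef int (*toy_split_fn)(int x, int y, int size);

// Recursively visits a square block and returns the number of leaf blocks.
static int toy_count_leaves(int x, int y, int size, int min_size,
                            toy_split_fn should_split) {
  assert(min_size > 0 && size >= min_size);
  if (size == min_size || !should_split(x, y, size)) return 1;
  const int half = size / 2;
  return toy_count_leaves(x, y, half, min_size, should_split) +
         toy_count_leaves(x + half, y, half, min_size, should_split) +
         toy_count_leaves(x, y + half, half, min_size, should_split) +
         toy_count_leaves(x + half, y + half, half, min_size, should_split);
}

// Example policy: always split, so every leaf ends up at the minimum size.
static int toy_always_split(int x, int y, int size) {
  (void)x;
  (void)y;
  (void)size;
  return 1;
}

// Example policy: only keep splitting the block at the top-left corner.
static int toy_split_top_left(int x, int y, int size) {
  (void)size;
  return x == 0 && y == 0;
}
```

For example, `toy_count_leaves(0, 0, 128, 4, toy_always_split)` visits a
128x128 block all the way down to 4x4 leaves. A real encoder would replace
the policy callback with a search that compares the cost of splitting against
the cost of not splitting.
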
AV1 also provides 71 basic intra prediction modes, 56 single frame inter prediction
modes (7 reference frames x 4 modes x 2 for OBMC (overlapped block motion
compensation)), 12768 compound inter prediction modes (that combine inter
predictors from two reference frames) and 36708 compound inter / intra
prediction modes. Furthermore, in addition to simple inter motion estimation,
AV1 also supports warped motion prediction using affine transforms.

In terms of transform coding, it has 16 separable 2-D transform kernels
\f$(DCT, ADST, fADST, IDTX)^2\f$ that can be applied at up to 19 different
scales from 64x64 down to 4x4 pixels.

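The figure of 16 separable kernels follows from pairing each of the four 1-D
kernel families with each of the others, once for the vertical and once for
the horizontal direction. The short sketch below simply enumerates those
pairs. The naming scheme it produces and the identifiers it uses are
illustrative assumptions, not the identifiers used in the libaom source.

```c
#include <assert.h>
#include <stdio.h>

// The four 1-D kernel families named in the text above.
static const char *const kToy1dKernels[] = { "DCT", "ADST", "fADST", "IDTX" };
enum { TOY_NUM_1D_KERNELS = 4 };

// Writes a "<vertical>_<horizontal>" name for each 2-D combination into
// `names` and returns how many were written (at most `max_names`).
static int toy_list_2d_kernels(char names[][16], int max_names) {
  assert(names != NULL && max_names >= 0);
  int count = 0;
  for (int v = 0; v < TOY_NUM_1D_KERNELS; ++v) {
    for (int h = 0; h < TOY_NUM_1D_KERNELS; ++h) {
      if (count == max_names) return count;
      snprintf(names[count], 16, "%s_%s", kToy1dKernels[v], kToy1dKernels[h]);
      ++count;
    }
  }
  return count;
}
```

The same style of counting gives the 56 single frame inter modes quoted
above: 7 reference frames x 4 modes x 2 (with or without OBMC).
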
When combined together, this means that for any one 8x8 pixel block in a
source frame, there are approximately 45,000,000 different ways that it can
be encoded.

Consequently, AV1 requires complex control processes. While not necessarily
a normative part of the bitstream, these are the algorithms that turn a set
of compression tools and a bitstream format specification into a coherent
and useful codec implementation. These may include, but are not limited to,
things such as:

- Rate distortion optimization (the process of trying to choose the most
  efficient combination of block size, prediction mode, transform type
  etc.)
- Rate control (regulation of the output bitrate)
- Encoder speed vs quality trade offs.
- Features such as two pass encoding or optimization for low delay
  encoding.

For a more detailed overview of AV1's encoding tools and a discussion of some
of the design considerations and hardware constraints that had to be
accommodated, please refer to <a href="https://arxiv.org/abs/2008.06091">
A Technical Overview of AV1</a>.

Figure 3 provides a slightly expanded but still simplistic view of the
AV1 encoder architecture with blocks that relate to some of the subsequent
sections of this document. In this diagram, the raw uncompressed frame buffers
are shown in dark green and the reconstructed frame buffers used for
prediction in light green. Red indicates those parts of the codec that are
(or may be) lossy, where fidelity can be traded off against compression
efficiency, whilst light blue shows algorithms or coding tools that are
lossless. The yellow blocks represent non-bitstream normative configuration
and control algorithms.

\image html av1encoderflow.png "" width=70%

\section architecture_command_line The Libaom Command Line Interface

Add details or links here: TODO ? elliotk@

\section architecture_enc_data_structures Main Encoder Data Structures

The following are the main high level data structures used by the libaom AV1
encoder and referenced elsewhere in this overview document:

- \ref AV1_PRIMARY
    - \ref AV1_PRIMARY.gf_group (\ref GF_GROUP)
    - \ref AV1_PRIMARY.lap_enabled
    - \ref AV1_PRIMARY.twopass (\ref TWO_PASS)
    - \ref AV1_PRIMARY.p_rc (\ref PRIMARY_RATE_CONTROL)
    - \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)

- \ref AV1_COMP
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
    - \ref AV1_COMP.rc (\ref RATE_CONTROL)
    - \ref AV1_COMP.speed
    - \ref AV1_COMP.sf (\ref SPEED_FEATURES)

- \ref AV1EncoderConfig (Encoder configuration parameters)
    - \ref AV1EncoderConfig.pass
    - \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
    - \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)
    - \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)

- \ref AlgoCfg (Algorithm related configuration parameters)
    - \ref AlgoCfg.arnr_max_frames
    - \ref AlgoCfg.arnr_strength

- \ref KeyFrameCfg (Keyframe coding configuration parameters)
    - \ref KeyFrameCfg.enable_keyframe_filtering

- \ref RateControlCfg (Rate control configuration)
    - \ref RateControlCfg.mode
    - \ref RateControlCfg.target_bandwidth
    - \ref RateControlCfg.best_allowed_q
    - \ref RateControlCfg.worst_allowed_q
    - \ref RateControlCfg.cq_level
    - \ref RateControlCfg.under_shoot_pct
    - \ref RateControlCfg.over_shoot_pct
    - \ref RateControlCfg.maximum_buffer_size_ms
    - \ref RateControlCfg.starting_buffer_level_ms
    - \ref RateControlCfg.optimal_buffer_level_ms
    - \ref RateControlCfg.vbrbias
    - \ref RateControlCfg.vbrmin_section
    - \ref RateControlCfg.vbrmax_section

- \ref PRIMARY_RATE_CONTROL (Primary Rate control status)
    - \ref PRIMARY_RATE_CONTROL.gf_intervals[]
    - \ref PRIMARY_RATE_CONTROL.cur_gf_index

- \ref RATE_CONTROL (Rate control status)
    - \ref RATE_CONTROL.intervals_till_gf_calculate_due
    - \ref RATE_CONTROL.frames_till_gf_update_due
    - \ref RATE_CONTROL.frames_to_key

- \ref TWO_PASS (Two pass status and control data)

- \ref GF_GROUP (Data related to the current GF/ARF group)

- \ref FIRSTPASS_STATS (Defines entries in the first pass stats buffer)
    - \ref FIRSTPASS_STATS.coded_error

- \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
    - \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)

- \ref HIGH_LEVEL_SPEED_FEATURES
    - \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop
    - \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance

- \ref TplParams

\section architecture_enc_use_cases Encoder Use Cases

The libaom AV1 encoder is configurable to support a number of different use
cases and rate control strategies.

The principal use cases for which it is optimised are as follows:

 - <b>Video on Demand / Streaming</b>
 - <b>Low Delay or Live Streaming</b>
 - <b>Video Conferencing / Real Time Coding (RTC)</b>
 - <b>Fixed Quality / Testing</b>

Other examples of use cases for which the encoder could be configured but for
which there is less by way of specific optimizations include:

 - <b>Download and Play</b>
 - <b>Disk Playback</b>
 - <b>Storage</b>
 - <b>Editing</b>
 - <b>Broadcast video</b>

Specific use cases may have particular requirements or constraints. For
example:

<b>Video Conferencing:</b> In a video conference we need to encode the video
in real time and to avoid any coding tools that could increase latency, such
as frame look ahead.

<b>Live Streams:</b> In cases such as live streaming of games or events, it
may be possible to allow some limited buffering of the video and use of
lookahead coding tools to improve encoding quality. However, whilst a lag of
a second or two may be fine given the one way nature of this type of video,
it is clearly not possible to use tools such as two pass coding.

<b>Broadcast:</b> Broadcast video (e.g. digital TV over satellite) may have
specific requirements such as frequent and regular key frames (e.g. once per
second or more) as these are important as entry points to users when switching
channels. There may also be strict upper limits on bandwidth over a short
window of time.

<b>Download and Play:</b> Download and play applications may have less strict
requirements in terms of local frame by frame rate control but there may be a
requirement to accurately hit a file size target for the video clip as a
whole. Similar considerations may apply to playback from mass storage devices
such as DVD or disk drives.

<b>Editing:</b> In certain special use cases such as offline editing, it may
be desirable to have very high quality and data rate but also very frequent
key frames or indeed to encode the video exclusively as key frames. Lossless
video encoding may also be required in this use case.

<b>VOD / Streaming:</b> One of the most important and common use cases for AV1
is video on demand or streaming, for services such as YouTube and Netflix. In
this use case it is possible to do two or even multi-pass encoding to improve
compression efficiency. Streaming services will often store many encoded
copies of a video at different resolutions and data rates to support users
with different types of playback device and bandwidth limitations.
Furthermore, these services support dynamic switching between multiple
streams, so that they can respond to changing network conditions.

Exact rate control when encoding for a specific format (e.g. 360P or 1080P on
YouTube) may not be critical, provided that the video bandwidth remains within
allowed limits. Whilst a format may have a nominal target data rate, this can
be considered more as the desired average egress rate over the video corpus
rather than a strict requirement for any individual clip. Indeed, in order
to maintain optimal quality of experience for the end user, it may be
desirable to encode some easier videos or sections of video at a lower data
rate and harder videos or sections at a higher rate.

VOD / streaming does not usually require very frequent key frames (as in the
broadcast case) but key frames are important in trick play (scanning back and
forth to different points in a video) and for adaptive stream switching. As
such, in a use case like YouTube, there is normally an upper limit on the
maximum time between key frames of a few seconds, but within certain limits
the encoder can try to align key frames with real scene cuts.

Whilst encoder speed may not seem to be as critical in this use case, for
services such as YouTube, where millions of new videos have to be encoded
every day, encoder speed is still important, so libaom allows command line
control of the encode speed vs quality trade off.

<b>Fixed Quality / Testing Mode:</b> Libaom also has a fixed quality encoder
pathway designed for testing under highly constrained conditions.

\section architecture_enc_speed_quality Speed vs Quality Trade Off

In any modern video encoder there are trade offs that can be made in regard to
the amount of time spent encoding a video or video frame vs the quality of the
final encode.

These trade offs typically limit the scope of the search for an optimal
prediction / transform combination with faster encode modes doing fewer
partition, reference frame, prediction mode and transform searches at the cost
of some reduction in coding efficiency.

The pruning of the size of the search tree is typically based on assumptions
about the likelihood of different search modes being selected based on what
has gone before and features such as the dimensions of the video frames and
the Q value selected for encoding the frame. For example, certain intra modes
are less likely to be chosen at high Q but may be more likely if similar
modes were used for the previously coded blocks above and to the left of the
current block.

The speed settings depend both on the use case (e.g. Real Time encoding) and
an explicit speed control passed in on the command line as <b>--cpu-used</b>
and stored in the \ref AV1_COMP.speed field of the main compressor instance
data structure (<b>cpi</b>).

The control flags for the speed trade off are stored in the \ref AV1_COMP.sf
field of the compressor instance and are set in the following functions:

- \ref av1_set_speed_features_framesize_independent()
- \ref av1_set_speed_features_framesize_dependent()
- \ref av1_set_speed_features_qindex_dependent()

A second factor impacting the speed of encode is rate distortion optimisation
(<b>rd vs non-rd</b> encoding).

When rate distortion optimization is enabled each candidate combination of
a prediction mode and transform coding strategy is fully encoded and the
resulting error (or distortion) as compared to the original source and the
number of bits used, are passed to a rate distortion function. This function
converts the distortion and cost in bits to a single <b>RD</b> value (where
lower is better). This <b>RD</b> value is used to decide between different
encoding strategies for the current block where, for example, one may
result in a lower distortion but a larger number of bits.

The calculation of this <b>RD</b> value is broadly speaking as follows:

\f[
  RD = (\lambda * Rate) + Distortion
\f]

This assumes a linear relationship between the number of bits used and
distortion (represented by the rate multiplier value <b>λ</b>) which is
not actually valid across a broad range of rate and distortion values.
Typically, where distortion is high, expending a small number of extra bits
will result in a large change in distortion. However, at lower values of
distortion the cost in bits of each incremental improvement is large.

To deal with this we scale the value of <b>λ</b> based on the quantizer
value chosen for the frame. This is assumed to be a proxy for our approximate
position on the true rate distortion curve and it is further assumed that over
a limited range of distortion values, a linear relationship between distortion
and rate is a valid approximation.

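The way the <b>RD</b> value arbitrates between candidates can be sketched in
a few lines of C. This is a minimal illustration of the formula above using
floating point for readability; the structure and function names are
hypothetical, and the encoder's own implementation differs in detail.

```c
#include <assert.h>

// One candidate way of coding a block: its cost in bits and the
// resulting distortion (for example, a sum of squared errors).
typedef struct {
  double rate_bits;
  double distortion;
} ToyCandidate;

// RD = (lambda * Rate) + Distortion, where lower is better.
static double toy_rd_cost(const ToyCandidate *c, double lambda) {
  return lambda * c->rate_bits + c->distortion;
}

// Returns the index of the candidate with the lowest RD value.
static int toy_pick_best(const ToyCandidate *cands, int n, double lambda) {
  assert(cands != 0 && n > 0);
  int best = 0;
  for (int i = 1; i < n; ++i) {
    if (toy_rd_cost(&cands[i], lambda) < toy_rd_cost(&cands[best], lambda)) {
      best = i;
    }
  }
  return best;
}
```

Given one candidate costing 100 bits with a distortion of 400 and another
costing 40 bits with a distortion of 700, a small λ of 2 favours the first
(600 vs 780) whereas a larger λ of 8 favours the second (1200 vs 1020). This
is why scaling λ with the quantizer shifts the encoder's preference towards
cheaper modes at lower target qualities.
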
Doing a rate distortion test on each candidate prediction / transform
combination is expensive in terms of cpu cycles. Hence, for cases where encode
speed is critical, libaom implements a non-rd pathway where the <b>RD</b>
value is estimated based on the prediction error and quantizer setting.

\section architecture_enc_src_proc Source Frame Processing

\subsection architecture_enc_frame_proc_data Main Data Structures

The following are the main data structures referenced in this section
(see also \ref architecture_enc_data_structures):

- \ref AV1_PRIMARY ppi (the primary compressor instance data structure)
    - \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)

- \ref AV1_COMP cpi (the main compressor instance data structure)
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)

- \ref AV1EncoderConfig (Encoder configuration parameters)
    - \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
    - \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)

- \ref AlgoCfg (Algorithm related configuration parameters)
    - \ref AlgoCfg.arnr_max_frames
    - \ref AlgoCfg.arnr_strength

- \ref KeyFrameCfg (Keyframe coding configuration parameters)
    - \ref KeyFrameCfg.enable_keyframe_filtering

\subsection architecture_enc_frame_proc_ingest Frame Ingest / Coding Pipeline

To encode a frame, first call \ref av1_receive_raw_frame() to obtain the raw
frame data. Then call \ref av1_get_compressed_data() to encode raw frame data
into compressed frame data. The main body of \ref av1_get_compressed_data()
is \ref av1_encode_strategy(), which determines high-level encode strategy
(frame type, frame placement, etc.) and then encodes the frame by calling
\ref av1_encode(). In \ref av1_encode(), \ref av1_first_pass() will execute
the first pass of two-pass encoding, while \ref encode_frame_to_data_rate()
will perform the final pass for either one-pass or two-pass encoding.

The main body of \ref encode_frame_to_data_rate() is
\ref encode_with_recode_loop_and_filter(), which handles encoding before
in-loop filters (with recode loops \ref encode_with_recode_loop(), or
without any recode loop \ref encode_without_recode()), followed by in-loop
filters (deblocking filters \ref loopfilter_frame(), CDEF filters and
restoration filters \ref cdef_restoration_frame()).

Except for rate/quality control, both \ref encode_with_recode_loop() and
\ref encode_without_recode() call \ref av1_encode_frame() to manage the
reference frame buffers and \ref encode_frame_internal() to perform the
rest of encoding that does not require access to external frames.
\ref encode_frame_internal() is the starting point for the partition search
(see \ref architecture_enc_partitions).

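The order of the stages described above can be summarised with a toy trace.
The functions below are stand-ins that only record the order in which the
stages run. They take none of the real arguments, their `toy_` names and
parameters are invented for this sketch, and they should not be read as the
signatures of the libaom functions they loosely mirror.

```c
#include <assert.h>
#include <string.h>

// Toy trace buffer: each stand-in stage appends its name.
typedef struct {
  char text[256];
} ToyTrace;

static void toy_log(ToyTrace *t, const char *stage) {
  assert(strlen(t->text) + strlen(stage) + 2 < sizeof(t->text));
  if (t->text[0] != '\0') strcat(t->text, ">");
  strcat(t->text, stage);
}

// Stand-in for the final pass: encode (with or without a recode loop),
// then run the in-loop filters.
static void toy_encode_frame_to_data_rate(ToyTrace *t, int use_recode_loop) {
  toy_log(t, use_recode_loop ? "recode_loop" : "no_recode");
  toy_log(t, "encode_frame");      // the partition search starts below here
  toy_log(t, "loopfilter");        // deblocking filters
  toy_log(t, "cdef_restoration");  // CDEF and restoration filters
}

// Stand-in for the top-level flow: choose a strategy, then run either
// the first pass or the final pass.
static void toy_get_compressed_data(ToyTrace *t, int is_first_pass,
                                    int use_recode_loop) {
  toy_log(t, "encode_strategy");
  if (is_first_pass) {
    toy_log(t, "first_pass");
  } else {
    toy_encode_frame_to_data_rate(t, use_recode_loop);
  }
}
```

Starting from an empty trace, a final-pass call with the recode loop enabled
records the strategy stage, the recode loop, the frame encode and then the two
in-loop filter stages, in that order.
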
\subsection architecture_enc_frame_proc_tf Temporal Filtering

\subsubsection architecture_enc_frame_proc_tf_overview Overview

Video codecs exploit the spatial and temporal correlations in video signals to
achieve compression efficiency. The noise factor in the source signal
attenuates such correlation and impedes the codec performance. Denoising the
video signal is potentially a promising solution.

One strategy for denoising a source is motion compensated temporal filtering.
Unlike image denoising, where only the spatial information is available,
video denoising can leverage a combination of the spatial and temporal
information. Specifically, in the temporal domain, similar pixels can often be
tracked along the motion trajectory of moving objects. Motion estimation is
applied to neighboring frames to find similar patches or blocks of pixels that
can be combined to create a temporally filtered output.

AV1, in common with VP8 and VP9, uses an in-loop motion compensated temporal
filter to generate what are referred to as alternate reference frames (or ARF
frames). These can be encoded in the bitstream and stored as frame buffers for
use in the prediction of subsequent frames, but are not usually directly
displayed (hence they are sometimes referred to as non-display frames).

The following command line parameters set the strength of the filter, the
number of frames used and determine whether filtering is allowed for key
frames.

- <b>--arnr-strength</b> (\ref AlgoCfg.arnr_strength)
- <b>--arnr-maxframes</b> (\ref AlgoCfg.arnr_max_frames)
- <b>--enable-keyframe-filtering</b>
  (\ref KeyFrameCfg.enable_keyframe_filtering)

Note that in AV1, the temporal filtering scheme is designed around the
hierarchical ARF based pyramid coding structure. We typically apply denoising
only on key frames and ARF frames at the highest (and sometimes the second
highest) layer in the hierarchical coding structure.

\subsubsection architecture_enc_frame_proc_tf_algo Temporal Filtering Algorithm

Our method divides the current frame into "MxM" blocks. For each block, a
motion search is applied on frames before and after the current frame. Only
the best matching patch with the smallest mean square error (MSE) is kept as a
candidate patch for a neighbour frame. The current block is also a candidate
patch. A total of N candidate patches are combined to generate the filtered
output.

Let f(i) represent the filtered sample value and \f$p_{j}(i)\f$ the sample
value of the j-th patch. The filtering process is:

\f[
  f(i) = \frac{p_{0}(i) + \sum_{j=1}^{N} \omega_{j}(i) \cdot p_{j}(i)}
              {1 + \sum_{j=1}^{N} \omega_{j}(i)}
\f]

where \f$ \omega_{j}(i) \f$ is the weight of the j-th patch from a total of
N patches. The weight is determined by the patch difference as:

\f[
  \omega_{j}(i) = \exp\left(-\frac{D_{j}(i)}{h^{2}}\right)
\f]

where \f$ D_{j}(i) \f$ is the sum of squared differences between the current
block and the j-th candidate patch:

\f[
  D_{j}(i) = \sum_{k \in \Omega_{i}} \|p_{0}(k) - p_{j}(k)\|_{2}^{2}
\f]

where:
- \f$p_{0}\f$ refers to the current frame.
- \f$\Omega_{i}\f$ is the patch window, an "LxL" pixel square.
- h is a critical parameter that controls the decay of the weights measured by
  the Euclidean distance. It is derived from an estimate of noise amplitude in
  the source. This allows the filter coefficients to adapt for videos with
  different noise characteristics.
- Usually, M = 32, N = 7, and L = 5, but they can be adjusted.

It is recommended that the reader refers to the code for more details.

Paul Wilkins | 196995d | 2020-07-14 16:49:38 +0100 | [diff] [blame] | 451 | \subsubsection architecture_enc_frame_proc_tf_funcs Temporal Filter Functions |
Paul Wilkins | 3ceb7c7 | 2020-07-14 14:02:52 +0100 | [diff] [blame] | 452 | |
Paul Wilkins | c84e8e2 | 2020-07-21 19:09:33 +0100 | [diff] [blame] | 453 | The main entry point for temporal filtering is \ref av1_temporal_filter(). |
| 454 | This function returns 1 if temporal filtering is successful, otherwise 0. |
| 455 | When temporal filtering is applied, the filtered frame will be held in |
Angie Chiang | 29aaace | 2021-11-15 16:23:42 -0800 | [diff] [blame] | 456 | the output_frame, which is the frame to be |
Paul Wilkins | c84e8e2 | 2020-07-21 19:09:33 +0100 | [diff] [blame] | 457 | encoded in the following encoding process. |
Paul Wilkins | 3ceb7c7 | 2020-07-14 14:02:52 +0100 | [diff] [blame] | 458 | |
| 459 | Almost all temporal filter related code is in av1/encoder/temporal_filter.c |
| 460 | and av1/encoder/temporal_filter.h. |
| 461 | |
Paul Wilkins | c84e8e2 | 2020-07-21 19:09:33 +0100 | [diff] [blame] | 462 | Inside \ref av1_temporal_filter(), the reader's attention is directed to |
| 463 | \ref tf_setup_filtering_buffer() and \ref tf_do_filtering(). |
Paul Wilkins | 3ceb7c7 | 2020-07-14 14:02:52 +0100 | [diff] [blame] | 464 | |
Paul Wilkins | c84e8e2 | 2020-07-21 19:09:33 +0100 | [diff] [blame] | 465 | - \ref tf_setup_filtering_buffer(): sets up the frame buffer for |
Paul Wilkins | 3ceb7c7 | 2020-07-14 14:02:52 +0100 | [diff] [blame] | 466 | temporal filtering, determines the number of frames to be used, and |
| 467 | calculates the noise level of each frame. |
| 468 | |
Paul Wilkins | c84e8e2 | 2020-07-21 19:09:33 +0100 | [diff] [blame] | 469 | - \ref tf_do_filtering(): the main function for the temporal |
Paul Wilkins | 591f047 | 2020-07-15 15:30:56 +0100 | [diff] [blame] | 470 | filtering algorithm. It breaks each frame into "MxM" blocks. For each |
Paul Wilkins | c84e8e2 | 2020-07-21 19:09:33 +0100 | [diff] [blame] | 471 | block a motion search \ref tf_motion_search() is applied to find |
| 472 | the motion vector from one neighboring frame. tf_build_predictor() is then |
| 473 | called to build the matching patch and \ref av1_apply_temporal_filter_c() (see |
| 474 | also optimised SIMD versions) to apply temporal filtering. The weighted |
| 475 | average over each pixel is accumulated and finally normalized in |
| 476 | \ref tf_normalize_filtered_frame() to generate the final filtered frame. |
Paul Wilkins | 3ceb7c7 | 2020-07-14 14:02:52 +0100 | [diff] [blame] | 477 | |
Paul Wilkins | c84e8e2 | 2020-07-21 19:09:33 +0100 | [diff] [blame] | 478 | - \ref av1_apply_temporal_filter_c(): the core function of our temporal |
| 479 | filtering algorithm (see also optimised SIMD versions). |
Paul Wilkins | 3ceb7c7 | 2020-07-14 14:02:52 +0100 | [diff] [blame] | 480 | |
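The accumulate and normalize steps described above can be illustrated with a minimal sketch. This is not the libaom implementation: the helper names accumulate_patch() and normalize_block() are hypothetical, and a single weight per patch is assumed, whereas a real filter would typically derive per-pixel weights from the prediction error and the estimated noise level.

```c
#include <stdint.h>

// Hypothetical helper: add one motion compensated patch to the accumulators.
// A single weight per patch is assumed here for simplicity.
static void accumulate_patch(const uint8_t *pred, int num_pixels, int weight,
                             uint32_t *accum, uint16_t *count) {
  for (int i = 0; i < num_pixels; ++i) {
    accum[i] += (uint32_t)weight * pred[i];
    count[i] = (uint16_t)(count[i] + weight);
  }
}

// Hypothetical helper: divide each accumulated sum by its total weight,
// rounding to nearest, to produce the filtered pixels.
static void normalize_block(const uint32_t *accum, const uint16_t *count,
                            int num_pixels, uint8_t *dst) {
  for (int i = 0; i < num_pixels; ++i) {
    if (count[i] == 0) continue;  // nothing was accumulated for this pixel
    dst[i] = (uint8_t)((accum[i] + count[i] / 2) / count[i]);
  }
}
```

In this sketch the accumulators must be zeroed before the first patch is added, and normalization is run once after all neighboring frames have contributed.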
| 481 | \subsection architecture_enc_frame_proc_film Film Grain Modelling |
| 482 | |
| 483 | Add details here. |
| 484 | |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 485 | \section architecture_enc_rate_ctrl Rate Control |
| 486 | |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 487 | \subsection architecture_enc_rate_ctrl_data Main Data Structures |
| 488 | |
| 489 | The following are the main data structures referenced in this section |
| 490 | (see also \ref architecture_enc_data_structures): |
| 491 | |
Mufaddal Chakera | 358cf21 | 2021-02-25 14:41:56 +0530 | [diff] [blame] | 492 | - \ref AV1_PRIMARY ppi (the primary compressor instance data structure) |
| 493 |     - \ref AV1_PRIMARY.twopass (\ref TWO_PASS) |
| 494 | |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 495 | - \ref AV1_COMP cpi (the main compressor instance data structure) |
| 496 |     - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig) |
| 497 |     - \ref AV1_COMP.rc (\ref RATE_CONTROL) |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 498 |     - \ref AV1_COMP.sf (\ref SPEED_FEATURES) |
| 499 | |
| 500 | - \ref AV1EncoderConfig (Encoder configuration parameters) |
| 501 |     - \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg) |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 502 | |
| 503 | - \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first |
| 504 |   pass stats) |
| 505 | |
| 506 | - \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters) |
| 507 |     - \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES) |
| 508 | |
| 509 | \subsection architecture_enc_rate_ctrl_options Supported Rate Control Options |
| 510 | |
Paul Wilkins | 7173920 | 2020-07-23 15:09:07 +0100 | [diff] [blame] | 511 | Different use cases (\ref architecture_enc_use_cases) may have different |
| 512 | requirements in terms of data rate control. |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 513 | |
| 514 | The broad rate control strategy is selected using the <b>--end-usage</b> |
| 515 | parameter on the command line, which maps onto the field |
| 516 | \ref aom_codec_enc_cfg_t.rc_end_usage in \ref aom_encoder.h. |
| 517 | |
| 518 | The four supported options are:- |
| 519 | |
| 520 | - <b>VBR</b> (Variable Bitrate) |
| 521 | - <b>CBR</b> (Constant Bitrate) |
| 522 | - <b>CQ</b> (Constrained Quality mode; a constrained variant of VBR) |
Paul Wilkins | e8c76eb | 2020-06-30 17:24:11 +0100 | [diff] [blame] | 523 | - <b>Fixed Q</b> (Constant quality or Q mode) |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 524 | |
| 525 | The value of \ref aom_codec_enc_cfg_t.rc_end_usage is in turn copied over |
| 526 | into the encoder rate control configuration data structure as |
Paul Wilkins | 1dd7a7e | 2020-07-09 17:07:35 +0100 | [diff] [blame] | 527 | \ref RateControlCfg.mode. |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 528 | |
| 529 | With regard to the most important use cases above, video on demand uses either |
| 530 | VBR or CQ mode. CBR is the preferred rate control model for RTC and live |
| 531 | streaming, and Fixed Q is only used in testing. |
| 532 | |
| 533 | The behaviour of each of these modes is regulated by a series of secondary |
| 534 | command line rate control options but also depends somewhat on the selected |
| 535 | use case, whether 2-pass coding is enabled and the selected encode speed vs |
| 536 | quality trade offs (\ref AV1_COMP.speed and \ref AV1_COMP.sf). |
| 537 | |
| 538 | The list below gives the names of the main rate control command line |
| 539 | options together with the names of the corresponding fields in the rate |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 540 | control configuration data structures. |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 541 | |
Paul Wilkins | 1dd7a7e | 2020-07-09 17:07:35 +0100 | [diff] [blame] | 542 | - <b>--target-bitrate</b> (\ref RateControlCfg.target_bandwidth) |
| 543 | - <b>--min-q</b> (\ref RateControlCfg.best_allowed_q) |
| 544 | - <b>--max-q</b> (\ref RateControlCfg.worst_allowed_q) |
| 545 | - <b>--cq-level</b> (\ref RateControlCfg.cq_level) |
| 546 | - <b>--undershoot-pct</b> (\ref RateControlCfg.under_shoot_pct) |
| 547 | - <b>--overshoot-pct</b> (\ref RateControlCfg.over_shoot_pct) |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 548 | |
Debargha Mukherjee | c6a8120 | 2020-07-22 16:35:20 -0700 | [diff] [blame] | 549 | The following control aspects of VBR encoding: |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 550 | |
Debargha Mukherjee | c6a8120 | 2020-07-22 16:35:20 -0700 | [diff] [blame] | 551 | - <b>--bias-pct</b> (\ref RateControlCfg.vbrbias) |
| 552 | - <b>--minsection-pct</b> (\ref RateControlCfg.vbrmin_section) |
| 553 | - <b>--maxsection-pct</b> (\ref RateControlCfg.vbrmax_section) |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 554 | |
| 555 | The following relate to buffer and delay management in one pass low delay and |
| 556 | real time coding: |
| 557 | |
Paul Wilkins | 1dd7a7e | 2020-07-09 17:07:35 +0100 | [diff] [blame] | 558 | - <b>--buf-sz</b> (\ref RateControlCfg.maximum_buffer_size_ms) |
| 559 | - <b>--buf-initial-sz</b> (\ref RateControlCfg.starting_buffer_level_ms) |
| 560 | - <b>--buf-optimal-sz</b> (\ref RateControlCfg.optimal_buffer_level_ms) |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 561 | |
| 562 | \subsection architecture_enc_vbr Variable Bitrate (VBR) Encoding |
| 563 | |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 564 | For streamed VOD content the most common rate control strategy is Variable |
| 565 | Bitrate (VBR) encoding. The CQ mode mentioned above is a variant of this |
| 566 | where additional quantizer and quality constraints are applied. VBR |
| 567 | encoding may in theory be used in conjunction with either 1-pass or 2-pass |
| 568 | encoding. |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 569 | |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 570 | VBR encoding varies the number of bits given to each frame or group of frames |
| 571 | according to the difficulty of that frame or group of frames, such that easier |
| 572 | frames are allocated fewer bits and harder frames are allocated more bits. The |
| 573 | intent here is to even out the quality between frames. This contrasts with |
| 574 | Constant Bitrate (CBR) encoding where each frame is allocated the same number |
| 575 | of bits. |
| 576 | |
| 577 | Whilst for any given frame or group of frames the data rate may vary, the VBR |
| 578 | algorithm attempts to deliver a given average bitrate over a wider time |
| 579 | interval. In standard VBR encoding, the time interval over which the data rate |
| 580 | is averaged is usually the duration of the video clip. An alternative |
| 581 | approach is to target an average VBR bitrate over the entire video corpus for |
| 582 | a particular video format (corpus VBR). |
| 583 | |
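The difference between VBR and CBR allocation can be illustrated with a minimal sketch that divides a bit budget between frames in proportion to a per frame complexity score. This is an illustration only and not the libaom allocation code: the function name allocate_vbr_bits() and the idea of a single scalar complexity value per frame are assumptions made for this example.

```c
#include <stdint.h>

// Hypothetical illustration: split total_bits across frames in proportion
// to a per frame complexity score (for example an error estimate gathered
// before the final encode).
static void allocate_vbr_bits(const double *complexity, int num_frames,
                              int64_t total_bits, int64_t *frame_bits) {
  double total_complexity = 0.0;
  for (int i = 0; i < num_frames; ++i) total_complexity += complexity[i];
  for (int i = 0; i < num_frames; ++i) {
    // Fall back to an even (CBR like) split if no complexity was measured.
    const double share = total_complexity > 0.0
                             ? complexity[i] / total_complexity
                             : 1.0 / num_frames;
    frame_bits[i] = (int64_t)(share * (double)total_bits);
  }
}
```

With this scheme the average rate over the whole set of frames matches the budget, while harder frames receive more bits than easier ones.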
| 584 | \subsubsection architecture_enc_1pass_vbr 1 Pass VBR Encoding |
| 585 | |
| 586 | The command line for libaom does allow 1 Pass VBR, but this has not been |
Paul Wilkins | c4cfb44 | 2020-07-01 16:15:53 +0100 | [diff] [blame] | 587 | properly optimised and behaves much like 1 pass CBR in most regards, with bits |
| 588 | allocated to frames by the following functions: |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 589 | |
| 590 | - \ref av1_calc_iframe_target_size_one_pass_vbr() |
| 591 | - \ref av1_calc_pframe_target_size_one_pass_vbr() |
| 592 | |
| 593 | \subsubsection architecture_enc_2pass_vbr 2 Pass VBR Encoding |
| 594 | |
| 595 | The main focus here will be on 2-pass VBR encoding (and the related CQ mode) |
| 596 | as these are the modes most commonly used for VOD content. |
| 597 | |
| 598 | 2-pass encoding is selected on the command line by setting --passes=2 |
| 599 | (or -p 2). |
| 600 | |
| 601 | Generally speaking, in 2-pass encoding, an encoder will first encode a video |
| 602 | using a default set of parameters and assumptions. Depending on the outcome |
| 603 | of that first encode, the baseline assumptions and parameters will be adjusted |
| 604 | to optimize the output during the second pass. In essence the first pass is a |
| 605 | fact finding mission to establish the complexity and variability of the video, |
| 606 | in order to allow a better allocation of bits in the second pass. |
| 607 | |
| 608 | The libaom 2-pass algorithm is unusual in that the first pass is not a full |
| 609 | encode of the video. Rather it uses a limited set of prediction and transform |
| 610 | options and a fixed quantizer, to generate statistics about each frame. No |
| 611 | output bitstream is created and the per frame first pass statistics are stored |
| 612 | entirely in volatile memory. This has some disadvantages when compared to a |
| 613 | full first pass encode, but avoids the need for file I/O and improves speed. |
| 614 | |
Paul Wilkins | c4cfb44 | 2020-07-01 16:15:53 +0100 | [diff] [blame] | 615 | For two pass encoding, the function \ref av1_encode() will first be called |
| 616 | for each frame in the video with the value \ref AV1EncoderConfig.pass = 1. |
| 617 | This will result in calls to \ref av1_first_pass(). |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 618 | |
Paul Wilkins | e8c76eb | 2020-06-30 17:24:11 +0100 | [diff] [blame] | 619 | Statistics for each frame are stored in \ref FIRSTPASS_STATS frame_stats_buf. |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 620 | |
| 621 | After completion of the first pass, \ref av1_encode() will be called again for |
Paul Wilkins | e8c76eb | 2020-06-30 17:24:11 +0100 | [diff] [blame] | 622 | each frame with \ref AV1EncoderConfig.pass = 2. The frames are then encoded in |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 623 | accordance with the statistics gathered during the first pass by calls to |
Paul Wilkins | a0816fc | 2020-07-23 13:33:29 +0100 | [diff] [blame] | 624 | \ref encode_frame_to_data_rate() which in turn calls |
| 625 | \ref av1_get_second_pass_params(). |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 626 | |
| 627 | In summary the second pass code :- |
| 628 | |
| 629 | - Searches for scene cuts (if auto key frame detection is enabled). |
| 630 | - Defines the length of and hierarchical structure to be used in each |
| 631 | ARF/GF group. |
| 632 | - Allocates bits based on the relative complexity of each frame, the quality |
| 633 | of frame to frame prediction and the type of frame (e.g. key frame, ARF |
| 634 | frame, golden frame or normal leaf frame). |
| 635 | - Suggests a maximum Q (quantizer value) for each ARF/GF group, based on |
| 636 | estimated complexity and recent rate control compliance |
Paul Wilkins | e8c76eb | 2020-06-30 17:24:11 +0100 | [diff] [blame] | 637 | (\ref RATE_CONTROL.active_worst_quality). |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 638 | - Tracks adherence to the overall rate control objectives and adjusts |
| 639 | heuristics. |
| 640 | |
Paul Wilkins | 591f047 | 2020-07-15 15:30:56 +0100 | [diff] [blame] | 641 | The main two pass functions in regard to the above include:- |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 642 | |
Paul Wilkins | be20bc2 | 2020-07-16 14:46:57 +0100 | [diff] [blame] | 643 | - \ref find_next_key_frame() |
Paul Wilkins | e8af152 | 2020-07-09 15:05:01 +0100 | [diff] [blame] | 644 | - \ref define_gf_group() |
Paul Wilkins | be20bc2 | 2020-07-16 14:46:57 +0100 | [diff] [blame] | 645 | - \ref calculate_total_gf_group_bits() |
| 646 | - \ref get_twopass_worst_quality() |
| 647 | - \ref av1_gop_setup_structure() |
| 648 | - \ref av1_gop_bit_allocation() |
| 649 | - \ref av1_twopass_postencode_update() |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 650 | |
| 651 | For each frame, the two pass algorithm defines a target number of bits |
Paul Wilkins | e8c76eb | 2020-06-30 17:24:11 +0100 | [diff] [blame] | 652 | \ref RATE_CONTROL.base_frame_target, which is then adjusted if necessary to |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 653 | reflect any undershoot or overshoot on previous frames to give |
Paul Wilkins | e8c76eb | 2020-06-30 17:24:11 +0100 | [diff] [blame] | 654 | \ref RATE_CONTROL.this_frame_target. |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 655 | |
Paul Wilkins | e8c76eb | 2020-06-30 17:24:11 +0100 | [diff] [blame] | 656 | As well as \ref RATE_CONTROL.active_worst_quality, the two pass code also |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 657 | maintains a record of the actual Q value used to encode previous frames |
| 658 | at each level in the current pyramid hierarchy |
Aasaipriya | c6f0a0b | 2021-08-12 11:27:03 +0530 | [diff] [blame] | 659 | (\ref PRIMARY_RATE_CONTROL.active_best_quality). The function |
Paul Wilkins | c4cfb44 | 2020-07-01 16:15:53 +0100 | [diff] [blame] | 660 | \ref rc_pick_q_and_bounds() uses these values to set a permitted Q range |
| 661 | for each frame. |
Paul Wilkins | 83cfad4 | 2020-06-26 12:38:07 +0100 | [diff] [blame] | 662 | |
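The adjustment of a base frame target to account for earlier undershoot or overshoot can be sketched as follows. This is a simplified illustration under stated assumptions, not the libaom code: the function adjust_frame_target(), its parameters, and the rule that limits the correction to half of the base target are all hypothetical.

```c
#include <stdint.h>

// Hypothetical sketch: bits_off_target is positive when earlier frames used
// fewer bits than planned and negative when they used more.
static int64_t adjust_frame_target(int64_t base_target,
                                   int64_t bits_off_target, int frames_left) {
  if (frames_left < 1) frames_left = 1;
  // Spread the outstanding error over the remaining frames.
  int64_t delta = bits_off_target / frames_left;
  // Limit the correction so that no single frame changes by more than half
  // of its base target (an arbitrary limit chosen for this sketch).
  const int64_t max_delta = base_target / 2;
  if (delta > max_delta) delta = max_delta;
  if (delta < -max_delta) delta = -max_delta;
  return base_target + delta;
}
```

Here the return value plays the role of the per frame target after correction, and the first argument the role of the uncorrected target.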
| 663 | \subsubsection architecture_enc_1pass_lagged 1 Pass Lagged VBR Encoding |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 664 | |
Paul Wilkins | e8c76eb | 2020-06-30 17:24:11 +0100 | [diff] [blame] | 665 | 1 pass lagged encode falls between simple 1 pass encoding and full two pass |
| 666 | encoding and is used for cases where it is not possible to do a full first |
| 667 | pass through the entire video clip, but where some delay is permissible, for |
| 668 | example near live streaming where there is a delay of up to a few seconds. In |
| 669 | this case the first pass and second pass are in effect combined such that the |
| 670 | first pass starts encoding the clip and the second pass lags behind it by a |
| 671 | few frames. When using this method, full sequence level statistics are not |
| 672 | available, but it is possible to collect and use frame or group of frame level |
| 673 | data to help in the allocation of bits and in defining ARF/GF coding |
Tarundeep Singh | 5e5305a | 2021-03-16 13:04:04 +0530 | [diff] [blame] | 674 | hierarchies. The reader is referred to the \ref AV1_PRIMARY.lap_enabled field |
Paul Wilkins | 7173920 | 2020-07-23 15:09:07 +0100 | [diff] [blame] | 675 | in the main compressor instance (where <b>lap</b> stands for |
Paul Wilkins | e8c76eb | 2020-06-30 17:24:11 +0100 | [diff] [blame] | 676 | <b>look ahead processing</b>). This encoding mode for the most part uses the |
| 677 | same rate control pathways as two pass VBR encoding. |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 678 | |
| 679 | \subsection architecture_enc_rc_loop The Main Rate Control Loop |
| 680 | |
Paul Wilkins | c4cfb44 | 2020-07-01 16:15:53 +0100 | [diff] [blame] | 681 | Having established a target rate for a given frame and an allowed range of Q |
| 682 | values, the encoder then tries to encode the frame at a rate that is as close |
| 683 | as possible to the target value, given the Q range constraints. |
| 684 | |
| 685 | There are two main mechanisms by which this is achieved. |
| 686 | |
| 687 | The first selects a frame level Q, using an adaptive estimate of the number of |
| 688 | bits that will be generated when the frame is encoded at any given Q. |
| 689 | Fundamentally this mechanism is common to VBR, CBR and to use cases such as |
| 690 | RTC with small adjustments. |
| 691 | |
| 692 | As the Q value mainly adjusts the precision of the residual signal, it is not |
| 693 | actually a reliable basis for accurately predicting the number of bits that |
| 694 | will be generated across all clips. A well predicted clip, for example, may |
| 695 | have a much smaller error residual after prediction. The algorithm copes with |
| 696 | this by adapting its predictions on the fly using a feedback loop based on how |
| 697 | well it did the previous time around. |
| 698 | |
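The feedback loop described above can be sketched as a bits model scaled by a correction factor that is updated after each encode. This is a minimal illustration, not the libaom model: the functions estimate_frame_bits() and update_correction_factor(), the inverse relationship between size and quantizer step, and all constants are assumptions made for this example.

```c
// Hypothetical bits model: the predicted frame size shrinks as the
// quantizer step size grows, scaled by an adaptive correction factor.
static double estimate_frame_bits(double correction_factor, double q_step) {
  const double model_scale = 1000000.0;  // arbitrary illustrative constant
  return correction_factor * model_scale / q_step;
}

// Hypothetical update: move the factor part of the way towards the value
// that would have predicted the actual frame size, within fixed limits.
static double update_correction_factor(double factor, double projected_bits,
                                       double actual_bits) {
  const double damping = 0.5;
  const double min_factor = 0.005;
  const double max_factor = 50.0;
  if (projected_bits <= 0.0) return factor;
  const double ideal = factor * actual_bits / projected_bits;
  double updated = factor + damping * (ideal - factor);
  if (updated < min_factor) updated = min_factor;
  if (updated > max_factor) updated = max_factor;
  return updated;
}
```

The damping term keeps a single unusual frame from swinging the estimate too far, so the prediction converges over several frames rather than in one step.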
| 699 | The main functions responsible for the prediction of Q and the adaptation over |
| 700 | time, for the two pass encoding pipeline are: |
| 701 | |
| 702 | - \ref rc_pick_q_and_bounds() |
Paul Wilkins | 5ce9d50 | 2020-07-16 17:58:40 +0100 | [diff] [blame] | 703 | - \ref get_q() |
| 704 | - \ref av1_rc_regulate_q() |
| 705 | - \ref get_rate_correction_factor() |
| 706 | - \ref set_rate_correction_factor() |
| 707 | - \ref find_closest_qindex_by_rate() |
Paul Wilkins | be20bc2 | 2020-07-16 14:46:57 +0100 | [diff] [blame] | 708 | - \ref av1_twopass_postencode_update() |
Paul Wilkins | 5ce9d50 | 2020-07-16 17:58:40 +0100 | [diff] [blame] | 709 | - \ref av1_rc_update_rate_correction_factors() |
Paul Wilkins | c4cfb44 | 2020-07-01 16:15:53 +0100 | [diff] [blame] | 710 | |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 711 | A second mechanism for control comes into play if there is a large rate miss |
Paul Wilkins | c4cfb44 | 2020-07-01 16:15:53 +0100 | [diff] [blame] | 712 | for the current frame (much too big or too small). This is a recode mechanism |
| 713 | which allows the current frame to be re-encoded one or more times with a |
| 714 | revised Q value. This obviously has significant implications for encode speed |
| 715 | and, in the case of RTC, latency (hence it is not used for the RTC pathway). |
| 716 | |
| 717 | Whether or not a recode is allowed for a given frame depends on the selected |
| 718 | encode speed vs quality trade off. This is set on the command line using the |
| 719 | --cpu-used parameter which maps onto the \ref AV1_COMP.speed field in the main |
| 720 | compressor instance data structure. |
| 721 | |
| 722 | The value of \ref AV1_COMP.speed, combined with the use case, is used to |
| 723 | populate the speed features data structure \ref AV1_COMP.sf. In particular |
| 724 | \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop determines the types of frames that |
| 725 | may be recoded and \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance is a rate |
| 726 | error trigger threshold. |
| 727 | |
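The role of a rate error trigger threshold can be illustrated with a minimal sketch of a recode test. This is not the libaom logic: the function should_recode(), its parameters and the use of a simple percentage tolerance are assumptions made for this example.

```c
#include <stdint.h>

// Hypothetical check: returns 1 if the frame size misses the target by more
// than tolerance_pct percent and the permitted Q range still leaves room to
// move in the required direction, otherwise returns 0.
static int should_recode(int frame_bits, int target_bits, int tolerance_pct,
                         int q, int q_min, int q_max) {
  const int64_t margin = (int64_t)target_bits * tolerance_pct / 100;
  if (frame_bits > target_bits + margin) return q < q_max;  // too big
  if (frame_bits < target_bits - margin) return q > q_min;  // too small
  return 0;
}
```

A frame that lands inside the tolerance band, or one that is already at the relevant end of its Q range, is accepted as it is in this sketch.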
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 728 | For more information the reader is directed to the following functions: |
Paul Wilkins | c4cfb44 | 2020-07-01 16:15:53 +0100 | [diff] [blame] | 729 | |
Paul Wilkins | 591f047 | 2020-07-15 15:30:56 +0100 | [diff] [blame] | 730 | - \ref encode_with_recode_loop() |
Paul Wilkins | c8d3f11 | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 731 | - \ref encode_without_recode() |
Paul Wilkins | 591f047 | 2020-07-15 15:30:56 +0100 | [diff] [blame] | 732 | - \ref recode_loop_update_q() |
| 733 | - \ref recode_loop_test() |
Paul Wilkins | 7173920 | 2020-07-23 15:09:07 +0100 | [diff] [blame] | 734 | - \ref av1_set_speed_features_framesize_independent() |
| 735 | - \ref av1_set_speed_features_framesize_dependent() |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 736 | |
| 737 | \subsection architecture_enc_fixed_q Fixed Q Mode |
| 738 | |
Paul Wilkins | ea2876f | 2020-07-13 18:36:09 +0100 | [diff] [blame] | 739 | There are two main fixed Q cases: |
| 740 | -# Fixed Q with adaptive qp offsets: same qp offset for each pyramid level |
| 741 | in a given video, but these offsets are adaptive based on video content. |
| 742 | -# Fixed Q with fixed qp offsets: content-independent fixed qp offsets for |
Jingning Han | 4eed226 | 2021-09-08 15:48:50 -0700 | [diff] [blame] | 743 | each pyramid level. |
Paul Wilkins | ea2876f | 2020-07-13 18:36:09 +0100 | [diff] [blame] | 744 | |
| 745 | The reader is also referred to the following functions: |
| 746 | - \ref av1_rc_pick_q_and_bounds() |
| 747 | - \ref rc_pick_q_and_bounds_no_stats_cbr() |
| 748 | - \ref rc_pick_q_and_bounds_no_stats() |
| 749 | - \ref rc_pick_q_and_bounds() |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 750 | |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 751 | \section architecture_enc_frame_groups GF / ARF Frame Groups & Hierarchical Coding |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 752 | |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 753 | \subsection architecture_enc_frame_groups_data Main Data Structures |
| 754 | |
| 755 | The following are the main data structures referenced in this section |
| 756 | (see also \ref architecture_enc_data_structures): |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 757 | |
| 758 | - \ref AV1_COMP cpi (the main compressor instance data structure) |
| 759 |     - \ref AV1_COMP.rc (\ref RATE_CONTROL) |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 760 | |
| 761 | - \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first pass |
| 762 | stats) |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 763 | |
| 764 | \subsection architecture_enc_frame_groups_groups Frame Groups |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 765 | |
| 766 | To process a sequence/stream of video frames, the encoder divides the frames |
| 767 | into groups and encodes them sequentially (possibly dependent on previous |
| 768 | groups). In AV1 such a group is usually referred to as a golden frame group |
| 769 | (GF group) or sometimes an Alt-Ref (ARF) group or a group of pictures (GOP). |
| 770 | A GF group determines and stores the coding structure of the frames (for |
| 771 | example, frame type, usage of the hierarchical structure, usage of overlay |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 772 | frames, etc.) and can be considered as the base unit to process the frames, |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 773 | therefore playing an important role in the encoder. |
| 774 | |
| 775 | The length of a specific GF group is arguably the most important aspect when |
| 776 | determining a GF group. This is because most GF group level decisions are |
| 777 | based on the frame characteristics, if not on the length itself directly. |
| 778 | Note that the GF group is always a group of consecutive frames, which means |
| 779 | the start and end of the group (so again, the length of it) determines which |
| 780 | frames are included in it and hence determines the characteristics of the GF |
| 781 | group. Therefore, in this document we will first discuss the GF group length |
| 782 | decision in libaom, followed by frame structure decisions when defining a GF |
| 783 | group with a certain length. |
| 784 | |
| 785 | \subsection architecture_enc_gf_length GF / ARF Group Length Determination |
| 786 | |
| 787 | The basic intuition behind determining the GF group length is that it is usually |
| 788 | desirable to group together frames that are similar. Hence, we may choose |
| 789 | longer groups when consecutive frames are very alike and shorter ones when |
| 790 | they are very different. |
| 791 | |
bohanli | d165b19 | 2020-06-10 21:46:29 -0700 | [diff] [blame] | 792 | The determination of the GF group length is done in function \ref |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 793 | calculate_gf_length(). The following encoder use cases are supported: |
| 794 | |
| 795 | <ul> |
Paul Wilkins | ff98f3e | 2020-07-27 16:01:05 +0100 | [diff] [blame] | 796 | <li><b>Single pass with look-ahead disabled (\ref has_no_stats_stage()): |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 797 | </b> in this case there is no information available on the following stream |
| 798 | of frames, therefore the function will set the GF group length for the |
| 799 | current and the following GF groups (a total number of MAX_NUM_GF_INTERVALS |
| 800 | groups) to be the maximum value allowed.</li> |
| 801 | |
Tarundeep Singh | 5e5305a | 2021-03-16 13:04:04 +0530 | [diff] [blame] | 802 | <li><b>Single pass with look-ahead enabled (\ref AV1_PRIMARY.lap_enabled):</b> |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 803 | look-ahead processing is enabled for single pass, therefore there is a |
| 804 | limited amount of information available regarding future frames. In this |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 805 | case the function will determine the length based on \ref FIRSTPASS_STATS |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 806 | (which is generated when processing the look-ahead buffer) for only the |
| 807 | current GF group.</li> |
| 808 | |
| 809 | <li><b>Two pass:</b> the first pass in two-pass encoding collects the stats |
| 810 | and will not call the function. In the second pass, the function tries to |
| 811 | determine the GF group length of the current and the following GF groups (a |
| 812 | total number of MAX_NUM_GF_INTERVALS groups) based on the first-pass |
| 813 | statistics. Note that, as discussed below, such decisions may not be |
| 814 | accurate and can be changed later.</li> |
| 815 | </ul> |
| 816 | |
| 817 | Except for the first trivial case where there is no prior knowledge of the |
Bohan Li | cb3b65b | 2020-11-04 13:50:00 -0800 | [diff] [blame] | 818 | following frames, the function \ref calculate_gf_length() tries to determine the |
| 819 | GF group length based on the first pass statistics. The determination is divided |
| 820 | into two parts: |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 821 | |
| 822 | <ol> |
| 823 | <li>Baseline decision based on accumulated statistics: this part of the function |
| 824 | iterates through the first-pass statistics of the following frames and |
| 825 | accumulates the statistics with function accumulate_next_frame_stats(). |
| 826 | The accumulated statistics are then used to determine whether the |
| 827 | correlation in the GF group has dropped too much in function detect_gf_cut(). |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 828 | If detect_gf_cut() returns non-zero, or if we've reached the end of |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 829 | first-pass statistics, the baseline decision is set at the current point.</li> |
| 830 | |
| 831 | <li>If we are not at the end of the first-pass statistics, the next part will |
Bohan Li | cb3b65b | 2020-11-04 13:50:00 -0800 | [diff] [blame] | 832 | try to refine the baseline decision. This algorithm is based on the analysis |
| 833 | of first-pass stats. It tries to cut the groups in stable regions or |
| 834 | relatively stable points. Also it tries to avoid cutting in a blending |
| 835 | region.</li> |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 836 | </ol> |
| 837 | |
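The baseline decision described in the first step can be sketched as a loop that accumulates a per frame statistic until the group looks too weakly correlated or the statistics run out. This is an illustration only: the function baseline_gf_length(), the use of a single multiplicative correlation value per frame and the threshold parameter are assumptions for this example and do not reflect the actual statistics or tests used by the encoder.

```c
// Hypothetical sketch: inter_frame_corr[i] is an estimate in the range 0..1
// of how well frame i is predicted from the previous frame. The group is
// extended until the accumulated correlation falls below min_group_corr,
// the maximum length is reached, or the available statistics are used up.
static int baseline_gf_length(const double *inter_frame_corr,
                              int frames_available, int min_len, int max_len,
                              double min_group_corr) {
  double accumulated_corr = 1.0;
  int len = 0;
  while (len < frames_available && len < max_len) {
    accumulated_corr *= inter_frame_corr[len];
    ++len;
    // Only allow a cut once the minimum group length has been reached.
    if (len >= min_len && accumulated_corr < min_group_corr) break;
  }
  return len;
}
```

The refinement step in the second list item would then start from the length returned here and look for a more stable cut point nearby.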
bohanli | d165b19 | 2020-06-10 21:46:29 -0700 | [diff] [blame] | 838 | As mentioned, for two-pass encoding, the function \ref |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 839 | calculate_gf_length() tries to determine the length of as many as |
| 840 | MAX_NUM_GF_INTERVALS groups. The decisions are stored in |
Mufaddal Chakera | 94ee9bf | 2021-04-12 01:02:22 +0530 | [diff] [blame] | 841 | \ref PRIMARY_RATE_CONTROL.gf_intervals[]. The variables |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 842 | \ref RATE_CONTROL.intervals_till_gf_calculate_due and |
Mufaddal Chakera | 94ee9bf | 2021-04-12 01:02:22 +0530 | [diff] [blame] | 843 | \ref PRIMARY_RATE_CONTROL.cur_gf_index help with managing and updating the stored |
bohanli | d165b19 | 2020-06-10 21:46:29 -0700 | [diff] [blame] | 844 | decisions. In the function \ref define_gf_group(), the corresponding |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 845 | stored length decision will be used to define the current GF group. |
| 846 | |
| 847 | When the maximum GF group length is greater than or equal to 32, the encoder will |
| 848 | enforce an extra layer to determine whether to use maximum GF length of 32 |
bohanli | d165b19 | 2020-06-10 21:46:29 -0700 | [diff] [blame] | 849 | or 16 for every GF group. In such a case, \ref calculate_gf_length() is |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 850 | first called with the original maximum length (>=32). Afterwards, |
Paul Wilkins | ff98f3e | 2020-07-27 16:01:05 +0100 | [diff] [blame] | 851 | \ref av1_tpl_setup_stats() is called to analyze the determined GF group |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 852 | and compare the reference to the last frame and the middle frame. If it is |
| 853 | decided that we should use a maximum GF length of 16, the function |
bohanli | d165b19 | 2020-06-10 21:46:29 -0700 | [diff] [blame] | 854 | \ref calculate_gf_length() is called again with the updated maximum |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 855 | length, and it only sets the length for a single GF group |
| 856 | (\ref RATE_CONTROL.intervals_till_gf_calculate_due is set to 1). This process |
Bohan Li | cb3b65b | 2020-11-04 13:50:00 -0800 | [diff] [blame] | 857 | is shown below. |
| 858 | |
| 859 | \image html tplgfgroupdiagram.png "" width=40% |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 860 | |
| 861 | Before encoding each frame, the encoder checks |
| 862 | \ref RATE_CONTROL.frames_till_gf_update_due. If it is zero, indicating |
| 863 | processing of the current GF group is done, the encoder will check whether |
| 864 | \ref RATE_CONTROL.intervals_till_gf_calculate_due is zero. If it is, as |
bohanli | d165b19 | 2020-06-10 21:46:29 -0700 | [diff] [blame] | 865 | discussed above, \ref calculate_gf_length() is called with original |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 866 | maximum length. If it is not zero, then the GF group length value stored |
Mufaddal Chakera | 94ee9bf | 2021-04-12 01:02:22 +0530 | [diff] [blame] | 867 | in \ref PRIMARY_RATE_CONTROL.gf_intervals[\ref PRIMARY_RATE_CONTROL.cur_gf_index] is used |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 868 | (subject to change as discussed above). |
| 869 | |
Paul Wilkins | e8af152 | 2020-07-09 15:05:01 +0100 | [diff] [blame] | 870 | \subsection architecture_enc_gf_structure Defining a GF Group's Structure |
| 871 | |
| 872 | The function \ref define_gf_group() defines the frame structure as well |
| 873 | as other GF group level parameters (e.g. bit allocation) once the length of |
| 874 | the current GF group is determined. |
| 875 | |
Bohan Li | cb3b65b | 2020-11-04 13:50:00 -0800 | [diff] [blame] | 876 | The function first iterates through the first pass statistics in the GF group to |
| 877 | accumulate various stats, using accumulate_this_frame_stats() and |
| 878 | accumulate_next_frame_stats(). The accumulated statistics are then used to |
| 879 | determine the use of an ALTREF frame along with other properties of the |
Mufaddal Chakera | 94ee9bf | 2021-04-12 01:02:22 +0530 | [diff] [blame] | 880 | GF group. The values of \ref PRIMARY_RATE_CONTROL.cur_gf_index, \ref |
Bohan Li | cb3b65b | 2020-11-04 13:50:00 -0800 | [diff] [blame] | 881 | RATE_CONTROL.intervals_till_gf_calculate_due and \ref |
| 882 | RATE_CONTROL.frames_till_gf_update_due are also updated accordingly. |
Paul Wilkins | e8af152 | 2020-07-09 15:05:01 +0100 | [diff] [blame] | 883 | |
Bohan Li | cb3b65b | 2020-11-04 13:50:00 -0800 | [diff] [blame] | 884 | The function \ref av1_gop_setup_structure() is called at the end to determine |
| 885 | the frame layers and reference maps in the GF group, where the |
| 886 | construct_multi_layer_gf_structure() function sets the frame update types for |
| 887 | each frame and the group structure. |
Paul Wilkins | e8af152 | 2020-07-09 15:05:01 +0100 | [diff] [blame] | 888 | |
- If ALTREF frames are allowed for the GF group: the first frame is set to
  KF_UPDATE, GF_UPDATE or ARF_UPDATE. The last frame of the GF group is set to
  OVERLAY_UPDATE. Then in set_multi_layer_params(), frame update
  types are determined recursively in a binary tree fashion, and assigned to
  give the final IBBB structure for the group.
  - If the current branch has more than 2 frames and we have not reached
    maximum layer depth, then the middle frame is set as INTNL_ARF_UPDATE,
    and the left and right branches are processed recursively.
  - If the current branch has fewer than 3 frames, or we have reached
    maximum layer depth, then every frame in the branch is set to LF_UPDATE.
Paul Wilkins | e8af152 | 2020-07-09 15:05:01 +0100 | [diff] [blame] | 899 | |
- If ALTREF frames are not allowed for the GF group: all frames are set
  as LF_UPDATE. This basically forms an IPPP GF group structure.
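
The recursive assignment described above can be sketched as follows. This is a
simplified illustration and not the libaom implementation: the enum, the
function `assign_layers()` and its parameters are hypothetical names, and the
real set_multi_layer_params() also tracks coding order, layer depth and
reference information.

```c
#include <assert.h>

// Hypothetical, simplified update types used only in this sketch.
typedef enum { LF, INTNL_ARF } UpdateType;

// Assign update types to the frames at display positions [start, end).
// While the branch has more than 2 frames and the depth limit has not been
// reached, the middle frame becomes an internal ARF and both halves are
// processed recursively. Otherwise every frame in the branch is a leaf.
static void assign_layers(UpdateType *types, int start, int end, int depth,
                          int max_depth) {
  const int num_frames = end - start;
  assert(num_frames >= 0);
  if (num_frames < 3 || depth >= max_depth) {
    for (int i = start; i < end; ++i) types[i] = LF;
    return;
  }
  const int mid = (start + end - 1) / 2;
  types[mid] = INTNL_ARF;
  assign_layers(types, start, mid, depth + 1, max_depth);
  assign_layers(types, mid + 1, end, depth + 1, max_depth);
}
```

For 7 frames and a depth limit of 2, this sketch marks positions 1, 3 and 5 as
internal ARFs and the remaining positions as leaf frames.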
| 902 | |
As mentioned, the encoder may use temporal dependency modelling (TPL - see \ref
architecture_enc_tpl) to determine whether we should use a maximum length of 32
or 16 for the current GF group. This requires calls to \ref define_gf_group()
but should not change other settings (since it is in essence a trial). This
special case is indicated by setting the parameter <b>is_final_pass</b> to
zero.
Paul Wilkins | e8af152 | 2020-07-09 15:05:01 +0100 | [diff] [blame] | 909 | |
| 910 | For single pass encodes where look-ahead processing is disabled |
Tarundeep Singh | 5e5305a | 2021-03-16 13:04:04 +0530 | [diff] [blame] | 911 | (\ref AV1_PRIMARY.lap_enabled = 0), \ref define_gf_group_pass0() is used |
Paul Wilkins | e8af152 | 2020-07-09 15:05:01 +0100 | [diff] [blame] | 912 | instead of \ref define_gf_group(). |
| 913 | |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 914 | \subsection architecture_enc_kf_groups Key Frame Groups |
| 915 | |
| 916 | A special constraint for GF group length is the location of the next keyframe |
| 917 | (KF). The frames between two KFs are referred to as a KF group. Each KF group |
| 918 | can be encoded and decoded independently. Because of this, a GF group cannot |
| 919 | span beyond a KF and the location of the next KF is set as a hard boundary |
| 920 | for GF group length. |
| 921 | |
| 922 | <ul> |
| 923 | <li>For two-pass encoding \ref RATE_CONTROL.frames_to_key controls when to |
| 924 | encode a key frame. When it is zero, the current frame is a keyframe and |
bohanli | d165b19 | 2020-06-10 21:46:29 -0700 | [diff] [blame] | 925 | the function \ref find_next_key_frame() is called. This in turn calls |
| 926 | \ref define_kf_interval() to work out where the next key frame should |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 927 | be placed.</li> |
| 928 | |
bohanli | d165b19 | 2020-06-10 21:46:29 -0700 | [diff] [blame] | 929 | <li>For single-pass with look-ahead enabled, \ref define_kf_interval() |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 930 | is called whenever a GF group update is needed (when |
| 931 | \ref RATE_CONTROL.frames_till_gf_update_due is zero). This is because |
| 932 | generally KFs are more widely spaced and the look-ahead buffer is usually |
| 933 | not long enough.</li> |
| 934 | |
| 935 | <li>For single-pass with look-ahead disabled, the KFs are placed according |
| 936 | to the command line parameter <b>--kf-max-dist</b> (The above two cases are |
| 937 | also subject to this constraint).</li> |
| 938 | </ul> |
| 939 | |
bohanli | d165b19 | 2020-06-10 21:46:29 -0700 | [diff] [blame] | 940 | The function \ref define_kf_interval() tries to detect a scenecut. |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 941 | If a scenecut within kf-max-dist is detected, then it is set as the next |
| 942 | keyframe. Otherwise the given maximum value is used. |
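
The placement rule can be illustrated with a minimal sketch. The function
name and the per-frame `is_scenecut` flags are assumptions made for this
example. In libaom the scenecut decision is derived from first pass
statistics, and the sketch ignores the end of the sequence.

```c
#include <assert.h>

// Return the distance (in frames) from the current key frame to the next
// one. is_scenecut[i] is a hypothetical flag for the frame i positions
// after the current key frame. Index 0 (the key frame itself) is ignored.
static int next_kf_distance(const unsigned char *is_scenecut, int num_frames,
                            int kf_max_dist) {
  assert(kf_max_dist > 0);
  for (int i = 1; i < num_frames && i < kf_max_dist; ++i) {
    if (is_scenecut[i]) return i;  // first scenecut inside the limit wins
  }
  return kf_max_dist;  // no scenecut found: fall back to the maximum
}
```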
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 943 | |
| 944 | \section architecture_enc_tpl Temporal Dependency Modelling |
Paul Wilkins | 1fb0172 | 2020-07-07 17:45:46 +0100 | [diff] [blame] | 945 | |
Paul Wilkins | f209ec5 | 2020-07-06 16:03:52 +0100 | [diff] [blame] | 946 | The temporal dependency model runs at the beginning of each GOP. It builds the |
| 947 | motion trajectory within the GOP in units of 16x16 blocks. The temporal |
| 948 | dependency of a 16x16 block is evaluated as the predictive coding gains it |
| 949 | contributes to its trailing motion trajectory. This temporal dependency model |
| 950 | reflects how important a coding block is for the coding efficiency of the |
| 951 | overall GOP. It is hence used to scale the Lagrangian multiplier used in the |
| 952 | rate-distortion optimization framework. |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 953 | |
Paul Wilkins | f209ec5 | 2020-07-06 16:03:52 +0100 | [diff] [blame] | 954 | \subsection architecture_enc_tpl_config Configurations |
| 955 | |
| 956 | The temporal dependency model and its applications are by default turned on in |
the libaom encoder for the VoD use case. To disable it, use --tpl-model=0 in the
| 958 | aomenc configuration. |
| 959 | |
Paul Wilkins | f209ec5 | 2020-07-06 16:03:52 +0100 | [diff] [blame] | 960 | \subsection architecture_enc_tpl_algoritms Algorithms |
| 961 | |
| 962 | The scheme works in the reverse frame processing order over the source frames, |
| 963 | propagating information from future frames back to the current frame. For each |
frame, a propagation step is run for each MB. It operates as follows:
| 965 | |
| 966 | <ul> |
<li> Estimate the intra prediction cost in terms of the sum of absolute
Hadamard transformed differences (SATD), denoted intra_cost. The step also
loads the motion information available from the first-pass encode and
estimates the inter prediction cost as inter_cost. Due to the use of hybrid
inter/intra prediction modes, the inter_cost value is further upper bounded
by intra_cost. A propagation cost variable is used to collect all the
information that flows back from future processing frames. It is initialized
as 0 for all the blocks in the last processing frame in a group of pictures
(GOP).</li>
| 976 | |
| 977 | <li> The fraction of information from a current block to be propagated towards |
| 978 | its reference block is estimated as: |
| 979 | \f[ |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 980 | propagation\_fraction = (1 - inter\_cost/intra\_cost) |
Paul Wilkins | f209ec5 | 2020-07-06 16:03:52 +0100 | [diff] [blame] | 981 | \f] |
It reflects the fraction by which the motion compensated reference reduces
the prediction error.</li>
| 984 | |
| 985 | <li> The total amount of information the current block contributes to the GOP |
| 986 | is estimated as intra_cost + propagation_cost. The information that it |
| 987 | propagates towards its reference block is captured by: |
| 988 | |
| 989 | \f[ |
| 990 | propagation\_amount = |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 991 | (intra\_cost + propagation\_cost) * propagation\_fraction |
Paul Wilkins | f209ec5 | 2020-07-06 16:03:52 +0100 | [diff] [blame] | 992 | \f]</li> |
| 993 | |
| 994 | <li> Note that the reference block may not necessarily sit on the grid of |
| 995 | 16x16 blocks. The propagation amount is hence dispensed to all the blocks |
| 996 | that overlap with the reference block. The corresponding block in the |
| 997 | reference frame accumulates its own propagation cost as it receives back |
| 998 | propagation. |
| 999 | |
| 1000 | \f[ |
| 1001 | propagation\_cost = propagation\_cost + |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 1002 | (\frac{overlap\_area}{(16*16)} * propagation\_amount) |
Paul Wilkins | f209ec5 | 2020-07-06 16:03:52 +0100 | [diff] [blame] | 1003 | \f]</li> |
| 1004 | |
| 1005 | <li> In the final encoding stage, the distortion propagation factor of a block |
| 1006 | is evaluated as \f$(1 + \frac{propagation\_cost}{intra\_cost})\f$, where the second term |
| 1007 | captures its impact on later frames in a GOP.</li> |
| 1008 | |
| 1009 | <li> The Lagrangian multiplier is adapted at the 64x64 block level. For every |
| 1010 | 64x64 block in a frame, we have a distortion propagation factor: |
| 1011 | |
| 1012 | \f[ |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 1013 | dist\_prop[i] = 1 + \frac{propagation\_cost[i]}{intra\_cost[i]} |
Paul Wilkins | f209ec5 | 2020-07-06 16:03:52 +0100 | [diff] [blame] | 1014 | \f] |
| 1015 | |
| 1016 | where i denotes the block index in the frame. We also have the frame level |
| 1017 | distortion propagation factor: |
| 1018 | |
| 1019 | \f[ |
| 1020 | dist\_prop = 1 + |
Paul Wilkins | b2194de | 2020-07-08 17:58:14 +0100 | [diff] [blame] | 1021 | \frac{\sum_{i}propagation\_cost[i]}{\sum_{i}intra\_cost[i]} |
Paul Wilkins | f209ec5 | 2020-07-06 16:03:52 +0100 | [diff] [blame] | 1022 | \f] |
| 1023 | |
| 1024 | which is used to normalize the propagation factor at the 64x64 block level. The |
| 1025 | Lagrangian multiplier is hence adapted as: |
| 1026 | |
\f[
\lambda[i] = \lambda_0 * \frac{dist\_prop}{dist\_prop[i]}
\f]

where \f$\lambda_0\f$ is the multiplier associated with the frame level QP. The
64x64 block level QP is scaled according to the Lagrangian multiplier.</li>
| 1033 | </ul> |
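
The equations above can be worked through with a small numeric sketch. The
function names are hypothetical and floating point is used for readability.
This is an illustration of the formulas and not the libaom code, which
operates on fixed-point statistics.

```c
#include <assert.h>

// Fraction of a block's information attributed to its reference block.
// inter_cost is upper bounded by intra_cost, as described in the text.
static double propagation_fraction(double intra_cost, double inter_cost) {
  assert(intra_cost > 0.0);
  if (inter_cost > intra_cost) inter_cost = intra_cost;
  return 1.0 - inter_cost / intra_cost;
}

// Amount propagated from the current block towards its reference block.
static double propagation_amount(double intra_cost, double propagation_cost,
                                 double fraction) {
  return (intra_cost + propagation_cost) * fraction;
}

// Share of that amount received by one 16x16 grid block that overlaps the
// reference block by overlap_area pixels.
static double received_cost(double amount, int overlap_area) {
  return amount * overlap_area / (16.0 * 16.0);
}

// Block level Lagrangian multiplier derived from the frame level one.
static double scaled_lambda(double lambda_0, double frame_dist_prop,
                            double block_dist_prop) {
  return lambda_0 * frame_dist_prop / block_dist_prop;
}
```

For example, with intra_cost = 200, inter_cost = 50 and an accumulated
propagation_cost of 100, the fraction is 0.75 and the propagated amount is
225. A grid block overlapping the reference block by 64 pixels receives
56.25 of it. A block whose distortion propagation factor is above the frame
level factor ends up with a smaller multiplier than \f$\lambda_0\f$.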
| 1034 | |
Paul Wilkins | ff98f3e | 2020-07-27 16:01:05 +0100 | [diff] [blame] | 1035 | \subsection architecture_enc_tpl_keyfun Key Functions and data structures |
Paul Wilkins | f209ec5 | 2020-07-06 16:03:52 +0100 | [diff] [blame] | 1036 | |
The reader is also referred to the following functions and data structures:
| 1038 | |
| 1039 | - \ref TplParams |
| 1040 | - \ref av1_tpl_setup_stats() builds the TPL model. |
- \ref setup_delta_q() assigns different quantization parameters to each
  superblock based on its TPL weight.
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1043 | |
| 1044 | \section architecture_enc_partitions Block Partition Search |
| 1045 | |
Paul Wilkins | 196995d | 2020-07-14 16:49:38 +0100 | [diff] [blame] | 1046 | A frame is first split into tiles in \ref encode_tiles(), with each tile |
| 1047 | compressed by av1_encode_tile(). Then a tile is processed in superblock rows |
| 1048 | via \ref av1_encode_sb_row() and then \ref encode_sb_row(). |
| 1049 | |
| 1050 | The partition search processes superblocks sequentially in \ref |
| 1051 | encode_sb_row(). Two search modes are supported, depending upon the encoding |
configuration: \ref encode_nonrd_sb() is for 1-pass and real-time modes,
| 1053 | while \ref encode_rd_sb() performs more exhaustive rate distortion based |
| 1054 | searches. |
| 1055 | |
| 1056 | Partition search over the recursive quad-tree space is implemented by |
| 1057 | recursive calls to \ref av1_nonrd_use_partition(), |
| 1058 | \ref av1_rd_use_partition(), or av1_rd_pick_partition() and returning best |
| 1059 | options for sub-trees to their parent partitions. |
| 1060 | |
In libaom, the partition search is layered on top of the mode search (predictor,
transform, etc.), instead of being a separate module. The interface of mode
search is \ref pick_sb_modes(), which connects the partition search with
| 1064 | \ref architecture_enc_inter_modes and \ref architecture_enc_intra_modes. To |
| 1065 | make good decisions, reconstruction is also required in order to build |
| 1066 | references and contexts. This is implemented by \ref encode_sb() at the |
| 1067 | sub-tree level and \ref encode_b() at coding block level. |
Paul Wilkins | 196995d | 2020-07-14 16:49:38 +0100 | [diff] [blame] | 1068 | |
| 1069 | See also \ref partition_search |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1070 | |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1071 | \section architecture_enc_intra_modes Intra Mode Search |
| 1072 | |
Paul Wilkins | 4ac8bf4 | 2020-07-30 16:44:27 +0100 | [diff] [blame] | 1073 | AV1 also provides 71 different intra prediction modes, i.e. modes that predict |
| 1074 | only based upon information in the current frame with no dependency on |
| 1075 | previous or future frames. For key frames, where this independence from any |
| 1076 | other frame is a defining requirement and for other cases where intra only |
frames are required, the encoder need only consider these modes in the rate
| 1078 | distortion loop. |
| 1079 | |
| 1080 | Even so, in most use cases, searching all possible intra prediction modes for |
| 1081 | every block and partition size is not practical and some pruning of the search |
| 1082 | tree is necessary. |
| 1083 | |
| 1084 | For the Rate distortion optimized case, the main top level function |
| 1085 | responsible for selecting the intra prediction mode for a given block is |
\ref av1_rd_pick_intra_mode_sb(). The reader's attention is also drawn to the
| 1087 | functions \ref hybrid_intra_mode_search() and \ref av1_nonrd_pick_intra_mode() |
| 1088 | which may be used where encode speed is critical. The choice between the |
| 1089 | rd path and the non rd or hybrid paths depends on the encoder use case and the |
| 1090 | \ref AV1_COMP.speed parameter. Further fine control of the speed vs quality |
| 1091 | trade off is provided by means of fields in \ref AV1_COMP.sf (which has type |
| 1092 | \ref SPEED_FEATURES). |
| 1093 | |
| 1094 | Note that some intra modes are only considered for specific use cases or |
| 1095 | types of video. For example the palette based prediction modes are often |
valuable for graphics or screen share content but not for natural video.
| 1097 | (See \ref av1_search_palette_mode()) |
| 1098 | |
Paul Wilkins | 3a13f64 | 2020-07-29 17:35:33 +0100 | [diff] [blame] | 1099 | See also \ref intra_mode_search for more details. |
| 1100 | |
| 1101 | \section architecture_enc_inter_modes Inter Prediction Mode Search |
| 1102 | |
Paul Wilkins | da6a80b | 2020-07-30 17:27:56 +0100 | [diff] [blame] | 1103 | For inter frames, where we also allow prediction using one or more previously |
| 1104 | coded frames (which may chronologically speaking be past or future frames or |
| 1105 | non-display reference buffers such as ARF frames), the size of the search tree |
that needs to be traversed, to select a prediction mode, is considerably
larger.
| 1108 | |
| 1109 | In addition to the 71 possible intra modes we also need to consider 56 single |
| 1110 | frame inter prediction modes (7 reference frames x 4 modes x 2 for OBMC |
| 1111 | (overlapped block motion compensation)), 12768 compound inter prediction modes |
| 1112 | (these are modes that combine inter predictors from two reference frames) and |
| 1113 | 36708 compound inter / intra prediction modes. |
| 1114 | |
| 1115 | As with the intra mode search, libaom supports an RD based pathway and a non |
| 1116 | rd pathway for speed critical use cases. The entry points for these two cases |
Jingning Han | e9eb8c0 | 2020-11-11 14:47:53 -0800 | [diff] [blame] | 1117 | are \ref av1_rd_pick_inter_mode() and \ref av1_nonrd_pick_inter_mode_sb() |
Paul Wilkins | da6a80b | 2020-07-30 17:27:56 +0100 | [diff] [blame] | 1118 | respectively. |
| 1119 | |
| 1120 | Various heuristics and predictive strategies are used to prune the search tree |
| 1121 | with fine control provided through the speed features parameter in the main |
| 1122 | compressor instance data structure \ref AV1_COMP.sf. |
| 1123 | |
It is worth noting that some prediction modes incur a much larger rate cost
than others (ignoring for now the cost of coding the error residual). For
example, a compound mode that requires the encoder to specify two reference
frames and two new motion vectors will almost inevitably have a higher rate
cost than a simple inter prediction mode that uses a predicted or 0,0 motion
vector. As such, if we have already found a mode for the current block that
has a low RD cost, we can skip a large number of the possible modes on the
basis that even if the error residual is 0 the inherent rate cost of the
mode itself will guarantee that it is not chosen.
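
The pruning argument above can be sketched as follows. The function names
are hypothetical and the RD cost is modelled as a plain integer sum. libaom
uses a fixed-point form of this cost, together with many other pruning
heuristics.

```c
#include <assert.h>
#include <stdint.h>

// Simplified RD cost: distortion plus lambda-weighted rate.
static int64_t rd_cost(int64_t lambda, int64_t rate, int64_t dist) {
  return dist + lambda * rate;
}

// Returns 1 if a mode can be skipped without evaluating its residual.
// Even with zero distortion and zero residual rate, the rate needed to
// signal the mode itself already costs at least as much as the best RD
// cost found so far, so the mode cannot be chosen.
static int can_skip_mode(int64_t lambda, int64_t mode_rate,
                         int64_t best_rd_so_far) {
  assert(lambda >= 0 && mode_rate >= 0);
  return rd_cost(lambda, mode_rate, 0) >= best_rd_so_far;
}
```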
| 1133 | |
Paul Wilkins | 3a13f64 | 2020-07-29 17:35:33 +0100 | [diff] [blame] | 1134 | See also \ref inter_mode_search for more details. |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1135 | |
| 1136 | \section architecture_enc_tx_search Transform Search |
| 1137 | |
AV1 implements the transform stage using 4 separable 1-d transforms (DCT,
| 1139 | ADST, FLIPADST and IDTX, where FLIPADST is the reversed version of ADST |
| 1140 | and IDTX is the identity transform) which can be combined to give 16 2-d |
| 1141 | combinations. |
Paul Wilkins | 3a13f64 | 2020-07-29 17:35:33 +0100 | [diff] [blame] | 1142 | |
| 1143 | These combinations can be applied at 19 different scales from 64x64 pixels |
| 1144 | down to 4x4 pixels. |
| 1145 | |
| 1146 | This gives rise to a large number of possible candidate transform options |
| 1147 | for coding the residual error after prediction. An exhaustive rate-distortion |
| 1148 | based evaluation of all candidates would not be practical from a speed |
perspective in a production encoder implementation. Hence libaom adopts a
| 1150 | number of strategies to prune the selection of both the transform size and |
| 1151 | transform type. |
| 1152 | |
There are a number of strategies that have been tested and implemented in
| 1154 | libaom including: |
| 1155 | |
| 1156 | - A statistics based approach that looks at the frequency with which certain |
| 1157 | combinations are used in a given context and prunes out very unlikely |
| 1158 | candidates. It is worth noting here that some size candidates can be pruned |
| 1159 | out immediately based on the size of the prediction partition. For example it |
| 1160 | does not make sense to use a transform size that is larger than the |
| 1161 | prediction partition size but also a very large prediction partition size is |
  unlikely to be optimally paired with small transforms.

- A machine learning based model
| 1165 | |
| 1166 | - A method that initially tests candidates using a fast algorithm that skips |
| 1167 | entropy encoding and uses an estimated cost model to choose a reduced subset |
| 1168 | for full RD analysis. This subject is covered more fully in a paper authored |
| 1169 | by Bohan Li, Jingning Han, and Yaowu Xu titled: <b>Fast Transform Type |
| 1170 | Selection Using Conditional Laplace Distribution Based Rate Estimation</b> |
| 1171 | |
| 1172 | <b>TODO Add link to paper when available</b> |
| 1173 | |
| 1174 | See also \ref transform_search for more details. |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1175 | |
Paul Wilkins | d7a9f0e | 2020-07-30 18:12:40 +0100 | [diff] [blame] | 1176 | \section architecture_post_enc_filt Post Encode Loop Filtering |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1177 | |
Paul Wilkins | d7a9f0e | 2020-07-30 18:12:40 +0100 | [diff] [blame] | 1178 | AV1 supports three types of post encode <b>in loop</b> filtering to improve |
| 1179 | the quality of the reconstructed video. |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1180 | |
- <b>Deblocking Filter</b> The first of these is a fairly traditional boundary
| 1182 | deblocking filter that attempts to smooth discontinuities that may occur at |
| 1183 | the boundaries between blocks. See also \ref in_loop_filter. |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1184 | |
Paul Wilkins | d7a9f0e | 2020-07-30 18:12:40 +0100 | [diff] [blame] | 1185 | - <b>CDEF Filter</b> The constrained directional enhancement filter (CDEF) |
| 1186 | allows the codec to apply a non-linear deringing filter along certain |
| 1187 | (potentially oblique) directions. A primary filter is applied along the |
Paul Wilkins | 10e9944 | 2020-08-05 15:35:44 +0100 | [diff] [blame] | 1188 | selected direction, whilst a secondary filter is applied at 45 degrees to |
  the primary direction. (See also \ref in_loop_cdef and
  <a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1191 | |
Paul Wilkins | d7a9f0e | 2020-07-30 18:12:40 +0100 | [diff] [blame] | 1192 | - <b>Loop Restoration Filter</b> The loop restoration filter is applied after |
Paul Wilkins | 10e9944 | 2020-08-05 15:35:44 +0100 | [diff] [blame] | 1193 | any prior post filtering stages. It acts on units of either 64 x 64, |
  128 x 128, or 256 x 256 pixel blocks, referred to as loop restoration units.
  Each unit can independently select either to bypass filtering, use a Wiener
  filter, or use a self-guided filter. (See also \ref in_loop_restoration and
  <a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1198 | |
| 1199 | \section architecture_entropy Entropy Coding |
| 1200 | |
Paul Wilkins | ef79fe4 | 2020-08-04 19:32:11 +0100 | [diff] [blame] | 1201 | \subsection architecture_entropy_aritmetic Arithmetic Coder |
| 1202 | |
VP9 used a binary arithmetic coder to encode symbols, where the probability
of a 1 or 0 at each decision node was based on a context model that took
into account recently coded values (for example previously coded coefficients
in the current block). A mechanism existed to update the context model each
frame, either explicitly in the bitstream, or implicitly at both the encoder
and decoder based on the observed frequency of different outcomes in the
previous frame. VP9 also supported separate context models for different types
of frame (e.g. inter coded frames and key frames).
| 1211 | |
| 1212 | In contrast, AV1 uses an M-ary symbol arithmetic coder to compress the syntax |
| 1213 | elements, where integer \f$M\in[2, 14]\f$. This approach is based upon the entropy |
| 1214 | coding strategy used in the Daala video codec and allows for some bit-level |
| 1215 | parallelism in its implementation. AV1 also has an extended context model and |
| 1216 | allows for updates to the probabilities on a per symbol basis as opposed to |
| 1217 | the per frame strategy in VP9. |
| 1218 | |
| 1219 | To improve the performance / throughput of the arithmetic encoder, especially |
| 1220 | in hardware implementations, the probability model is updated and maintained |
| 1221 | at 15-bit precision, but the arithmetic encoder only uses the most significant |
| 1222 | 9 bits when encoding a symbol. A more detailed discussion of the algorithm |
Paul Wilkins | f88a151 | 2020-10-20 13:18:40 +0100 | [diff] [blame] | 1223 | and design constraints can be found in |
| 1224 | <a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>. |
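
To illustrate per symbol adaptation and the reduced precision used by the
coder, here is a simplified sketch. The names, the fixed `rate` parameter
and the storage layout are assumptions made for this example. The update
rule actually used by libaom differs in its details, so this should be read
as an illustration of the idea only.

```c
#include <assert.h>
#include <stdint.h>

#define PROB_BITS 15
#define PROB_TOP (1 << PROB_BITS)  // 32768

// cdf[i] holds P(symbol <= i) scaled to 15 bits, for i in [0, n - 2].
// The final entry, P(symbol <= n - 1) == PROB_TOP, is implicit.
// After coding `symbol`, every entry is moved towards the observed outcome,
// which raises the probability of `symbol` and lowers the others.
static void adapt_cdf(uint16_t *cdf, int n, int symbol, int rate) {
  assert(n >= 2 && symbol >= 0 && symbol < n && rate > 0);
  for (int i = 0; i < n - 1; ++i) {
    if (i >= symbol) {
      cdf[i] += (uint16_t)((PROB_TOP - cdf[i]) >> rate);
    } else {
      cdf[i] -= (uint16_t)(cdf[i] >> rate);
    }
  }
}

// Keep only the 9 most significant bits of a 15-bit probability.
static int coarse_prob(uint16_t prob15) { return prob15 >> (PROB_BITS - 9); }
```

Starting from a uniform 4-symbol model and coding symbol 1 with `rate` 4,
the probability mass assigned to symbol 1 grows from 8192 to 9728 (out of
32768) in this sketch.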
Paul Wilkins | ef79fe4 | 2020-08-04 19:32:11 +0100 | [diff] [blame] | 1225 | |
| 1226 | TODO add references to key functions / files. |
| 1227 | |
As with VP9, a mechanism exists in AV1 to encode some elements into the
bitstream as uncompressed bits or literal values, without using the arithmetic
coder. This is used, for example, for some frame and sequence header values,
where it is beneficial to be able to read the values directly.
| 1232 | |
| 1233 | TODO add references to key functions / files. |
Paul Wilkins | 386cb69 | 2020-08-04 18:11:17 +0100 | [diff] [blame] | 1234 | |
angiebird | 9101c0e | 2020-08-17 11:16:23 -0700 | [diff] [blame] | 1235 | \subsection architecture_entropy_coef Transform Coefficient Coding and Optimization |
| 1236 | \image html coeff_coding.png "" width=70% |
Paul Wilkins | 386cb69 | 2020-08-04 18:11:17 +0100 | [diff] [blame] | 1237 | |
angiebird | 9101c0e | 2020-08-17 11:16:23 -0700 | [diff] [blame] | 1238 | \subsubsection architecture_entropy_coef_what Transform coefficient coding |
| 1239 | Transform coefficient coding is where the encoder compresses a quantized version |
| 1240 | of prediction residue into the bitstream. |
| 1241 | |
| 1242 | \paragraph architecture_entropy_coef_prepare Preparation - transform and quantize |
Before the entropy coding stage, the encoder removes the pixel-to-pixel
correlation of the prediction residue by transforming the residue from the
| 1245 | spatial domain to the frequency domain. Then the encoder quantizes the transform |
| 1246 | coefficients to make the coefficients ready for entropy coding. |
| 1247 | |
| 1248 | \paragraph architecture_entropy_coef_coding The coding process |
| 1249 | The encoder uses \ref av1_write_coeffs_txb() to write the coefficients of |
| 1250 | a transform block into the bitstream. |
The coding process has three stages.
1. First, the encoder codes the transform block skip flag (txb_skip). If the
skip flag is off, the encoder then codes the end of block position (eob),
which is the scan index of the last non-zero coefficient plus one.
2. Second, the encoder codes the lower magnitude levels of each coefficient in
reverse scan order.
3. Finally, the encoder codes the sign and higher magnitude levels for each
coefficient if they are available.
| 1259 | |
| 1260 | Related functions: |
| 1261 | - \ref av1_write_coeffs_txb() |
| 1262 | - write_inter_txb_coeff() |
| 1263 | - \ref av1_write_intra_coeffs_mb() |
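
The definition of the end of block position used in the first stage can be
sketched as below. The helper names are hypothetical and the input is
assumed to be already arranged in scan order. This is not the code used by
\ref av1_write_coeffs_txb().

```c
#include <assert.h>

// Return the end of block position: the scan index of the last non-zero
// coefficient plus one, or 0 if every coefficient is zero.
// qcoeff_scan holds quantized coefficients arranged in scan order.
static int find_eob(const int *qcoeff_scan, int num_coeffs) {
  assert(num_coeffs >= 0);
  for (int i = num_coeffs - 1; i >= 0; --i) {
    if (qcoeff_scan[i] != 0) return i + 1;
  }
  return 0;
}

// A block whose eob is 0 has nothing to code beyond the skip flag.
static int is_txb_skip(const int *qcoeff_scan, int num_coeffs) {
  return find_eob(qcoeff_scan, num_coeffs) == 0;
}
```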
| 1264 | |
| 1265 | \paragraph architecture_entropy_coef_context Context information |
| 1266 | To improve the compression efficiency, the encoder uses several context models |
| 1267 | tailored for transform coefficients to capture the correlations between coding |
| 1268 | symbols. Most of the context models are built to capture the correlations |
| 1269 | between the coefficients within the same transform block. However, transform |
| 1270 | block skip flag (txb_skip) and the sign of dc coefficient (dc_sign) require |
| 1271 | context info from neighboring transform blocks. |
| 1272 | |
Here is how context info propagates between transform blocks. Before coding a
transform block, the encoder will use get_txb_ctx() to collect the context
information from neighboring transform blocks. Then the context information
will be used for coding the transform block skip flag (txb_skip) and the sign
of the dc coefficient (dc_sign). After the transform block is coded, the
encoder will extract the context info from the current block using
\ref av1_get_txb_entropy_context(). Then the encoder will store the context
info into a byte (uint8_t) using av1_set_entropy_contexts(). The encoder will
use the context info to code other transform blocks.
| 1282 | |
| 1283 | Related functions: |
| 1284 | - \ref av1_get_txb_entropy_context() |
| 1285 | - av1_set_entropy_contexts() |
| 1286 | - get_txb_ctx() |
| 1287 | - \ref av1_update_intra_mb_txb_context() |
| 1288 | |
| 1289 | \subsubsection architecture_entropy_coef_rd RD optimization |
Besides the actual entropy coding, the encoder uses several utility functions
| 1291 | to make optimal RD decisions. |
| 1292 | |
| 1293 | \paragraph architecture_entropy_coef_cost Entropy cost |
| 1294 | The encoder uses \ref av1_cost_coeffs_txb() or \ref av1_cost_coeffs_txb_laplacian() |
| 1295 | to estimate the entropy cost of a transform block. Note that |
\ref av1_cost_coeffs_txb() is slower but more accurate, whereas
\ref av1_cost_coeffs_txb_laplacian() is faster but less accurate.
| 1298 | |
| 1299 | Related functions: |
| 1300 | - \ref av1_cost_coeffs_txb() |
| 1301 | - \ref av1_cost_coeffs_txb_laplacian() |
| 1302 | - \ref av1_cost_coeffs_txb_estimate() |
| 1303 | |
| 1304 | \paragraph architecture_entropy_coef_opt Quantized level optimization |
Besides computing entropy cost, the encoder also uses \ref av1_optimize_txb()
to adjust the coefficients' quantized levels to achieve optimal RD trade-off.
Vishesh | a45092c | 2021-01-25 00:28:11 +0530 | [diff] [blame] | 1307 | In \ref av1_optimize_txb(), the encoder goes through each quantized |
angiebird | 9101c0e | 2020-08-17 11:16:23 -0700 | [diff] [blame] | 1308 | coefficient and lowers the quantized coefficient level by one if the action |
| 1309 | yields a better RD score. |
| 1310 | |
| 1311 | Related functions: |
Vishesh | a45092c | 2021-01-25 00:28:11 +0530 | [diff] [blame] | 1312 | - \ref av1_optimize_txb() |
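
The idea of lowering a level by one when that improves the RD score can be
sketched with a toy cost model. Everything here is an assumption made for
illustration: the function names, the rate model and the uniform quantizer
step. \ref av1_optimize_txb() uses the real entropy cost tables and
additional logic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Toy rate model (an assumption for this sketch): larger magnitudes cost
// more bits. libaom derives the rate from its entropy cost tables.
static int64_t toy_rate(int level) { return 2 * (int64_t)abs(level); }

// RD score of representing `coeff` by `level` with quantizer step `step`.
static int64_t toy_rd(int coeff, int level, int step, int64_t lambda) {
  const int64_t err = (int64_t)coeff - (int64_t)level * step;
  return err * err + lambda * toy_rate(level);
}

// For each coefficient, try reducing the magnitude of its quantized level
// by one and keep the change if the RD score improves.
// Returns the number of levels that were lowered.
static int lower_levels(const int *coeffs, int *levels, int n, int step,
                        int64_t lambda) {
  assert(step > 0);
  int changed = 0;
  for (int i = 0; i < n; ++i) {
    if (levels[i] == 0) continue;
    const int lowered = levels[i] > 0 ? levels[i] - 1 : levels[i] + 1;
    if (toy_rd(coeffs[i], lowered, step, lambda) <
        toy_rd(coeffs[i], levels[i], step, lambda)) {
      levels[i] = lowered;
      ++changed;
    }
  }
  return changed;
}
```

With a step of 10 and lambda of 20, a coefficient of 16 quantized to level 2
is lowered to level 1 in this model, because the rate saving outweighs the
added distortion, while a coefficient of 52 at level 5 is left unchanged.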
angiebird | 9101c0e | 2020-08-17 11:16:23 -0700 | [diff] [blame] | 1313 | |
| 1314 | All the related functions are listed in \ref coefficient_coding. |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1315 | |
Rachel Barker | 5758629 | 2024-02-20 20:56:16 +0000 | [diff] [blame] | 1316 | \section architecture_simd SIMD usage |
| 1317 | |
| 1318 | In order to efficiently encode video on modern platforms, it is necessary to |
| 1319 | implement optimized versions of many core encoding and decoding functions using |
| 1320 | architecture-specific SIMD instructions. |
| 1321 | |
| 1322 | Functions which have optimized implementations will have multiple variants |
| 1323 | in the code, each suffixed with the name of the appropriate instruction set. |
There will additionally be a `_c` version, which acts as a reference
implementation against which the SIMD variants can be tested.
| 1326 | |
| 1327 | As different machines with the same nominal architecture may support different |
| 1328 | subsets of SIMD instructions, we have dynamic CPU detection logic which chooses |
| 1329 | the appropriate functions to use at run time. This process is handled by |
| 1330 | `build/cmake/rtcd.pl`, with function definitions in the files |
| 1331 | `*_rtcd_defs.pl` elsewhere in the codebase. |
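
The general pattern of run time dispatch can be sketched as follows. All
names here (`sad_c`, `sad_simd`, `HAS_SIMD`, `setup_dispatch()`) are
hypothetical and chosen for this example. The code generated by
`build/cmake/rtcd.pl` differs in naming and detail.

```c
#include <assert.h>
#include <stdint.h>

// Reference implementation: sum of absolute differences of two arrays.
static unsigned int sad_c(const uint8_t *a, const uint8_t *b, int n) {
  unsigned int sum = 0;
  for (int i = 0; i < n; ++i) sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
  return sum;
}

// Stand-in for a SIMD variant. A real one would use intrinsics. Here it
// simply forwards to the C code so that the sketch stays portable.
static unsigned int sad_simd(const uint8_t *a, const uint8_t *b, int n) {
  return sad_c(a, b, n);
}

// Function pointer through which callers reach the selected variant.
static unsigned int (*sad)(const uint8_t *, const uint8_t *, int) = sad_c;

#define HAS_SIMD 0x1  // hypothetical CPU capability flag

// Run once at start-up: pick the best variant that the CPU supports.
static void setup_dispatch(int cpu_flags) {
  sad = sad_c;
  if (cpu_flags & HAS_SIMD) sad = sad_simd;
}
```

Because every variant shares one signature, a unit test can call the `_c`
version and a SIMD version on the same input and compare the results.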
| 1332 | |
| 1333 | Currently SIMD is supported on the following platforms: |
| 1334 | |
| 1335 | - x86: Requires SSE4.1 or above |
| 1336 | |
| 1337 | - Arm: Requires Neon (Armv7-A and above) |
| 1338 | |
| 1339 | We aim to provide implementations of all performance-critical functions which |
| 1340 | are compatible with the instruction sets listed above. Additional SIMD |
| 1341 | extensions (e.g. AVX on x86, SVE on Arm) are also used to provide even |
| 1342 | greater performance where available. |
| 1343 | |
Paul Wilkins | b534a78 | 2020-06-25 18:02:17 +0100 | [diff] [blame] | 1344 | */ |
Yunqing Wang | 65cd010 | 2020-05-06 12:57:04 -0700 | [diff] [blame] | 1345 | |
| 1346 | /*!\defgroup encoder_algo Encoder Algorithm |
| 1347 | * |
| 1348 | * The encoder algorithm describes how a sequence is encoded, including the |
| 1349 | * high-level decisions as well as the algorithms used at every encoding stage. |
| 1350 | */ |
| 1351 | |
| 1352 | /*!\defgroup high_level_algo High-level Algorithm |
| 1353 | * \ingroup encoder_algo |
| 1354 | * This module describes the sequence-level and frame-level algorithms in AV1. |
| 1355 | * More details will be added. |
| 1356 | * @{ |
| 1357 | */ |
Elliott Karpilovsky | 2ea1836 | 2020-06-02 18:32:27 -0700 | [diff] [blame] | 1358 | |
Paul Wilkins | 7173920 | 2020-07-23 15:09:07 +0100 | [diff] [blame] | 1359 | /*!\defgroup speed_features Speed vs Quality Trade Off |
| 1360 | * \ingroup high_level_algo |
| 1361 | * This module describes the encode speed vs quality trade-off. |
| 1362 | * @{ |
| 1363 | */ |
| 1364 | /*! @} - end defgroup speed_features */ |
| 1365 | |
| 1366 | /*!\defgroup src_frame_proc Source Frame Processing |
| 1367 | * \ingroup high_level_algo |
| 1368 | * This module describes algorithms in AV1 associated with the |
| 1369 | * pre-processing of source frames. See also \ref architecture_enc_src_proc |
| 1370 | * |
| 1371 | * @{ |
| 1372 | */ |
| 1373 | /*! @} - end defgroup src_frame_proc */ |
| 1374 | |
| 1375 | /*!\defgroup rate_control Rate Control |
| 1376 | * \ingroup high_level_algo |
| 1377 | * This module describes the rate control algorithm in AV1. |
| 1378 | * See also \ref architecture_enc_rate_ctrl |
| 1379 | * @{ |
| 1380 | */ |
| 1381 | /*! @} - end defgroup rate_control */ |
| 1382 | |
Paul Wilkins | ff98f3e | 2020-07-27 16:01:05 +0100 | [diff] [blame] | 1383 | /*!\defgroup tpl_modelling Temporal Dependency Modelling |
| 1384 | * \ingroup high_level_algo |
| 1385 | * This module includes algorithms to implement temporal dependency modelling. |
| 1386 | * See also \ref architecture_enc_tpl |
| 1387 | * @{ |
| 1388 | */ |
| 1389 | /*! @} - end defgroup tpl_modelling */ |
| 1390 | |
Paul Wilkins | 7173920 | 2020-07-23 15:09:07 +0100 | [diff] [blame] | 1391 | /*!\defgroup two_pass_algo Two Pass Mode |
| 1392 | \ingroup high_level_algo |
Elliott Karpilovsky | 2ea1836 | 2020-06-02 18:32:27 -0700 | [diff] [blame] | 1393 | |
| 1394 | In two pass mode, the input file is passed into the encoder for a quick |
| 1395 | first pass, where statistics are gathered. These statistics and the input |
| 1396 | file are then passed back into the encoder for a second pass. The statistics |
| 1397 | help the encoder reach the desired bitrate without as much overshooting or |
| 1398 | undershooting. |
| 1399 | |
| 1400 | During the first pass, the codec will return "stats" packets that contain |
| 1401 | information useful for the second pass. The caller should concatenate these |
| 1402 | packets as they are received. In the second pass, the concatenated packets |
| 1403 | are passed in, along with the frames to encode. During the second pass, |
| 1404 | "frame" packets are returned that represent the compressed video. |
| 1405 | |
| 1406 | A complete example can be found in `examples/twopass_encoder.c`. Pseudocode |
| 1407 | is provided below to illustrate the core parts. |
| 1408 | |
| 1409 | During the first pass, the uncompressed frames are passed in and stats |
| 1410 | information is appended to a byte array. |
| 1411 | |
| 1412 | ~~~~~~~~~~~~~~~{.c} |
| 1413 | // For simplicity, assume that there is enough memory in the stats buffer. |
| 1414 | // Actual code will want to use a resizable array. stats_len represents |
| 1415 | // the length of data already present in the buffer. |
| 1416 | void get_stats_data(aom_codec_ctx_t *encoder, char *stats, |
Elliott Karpilovsky | bbc7d9c | 2020-06-10 20:36:45 -0700 | [diff] [blame] | 1417 | size_t *stats_len, bool *got_data) { |
Elliott Karpilovsky | 2ea1836 | 2020-06-02 18:32:27 -0700 | [diff] [blame] | 1418 | const aom_codec_cx_pkt_t *pkt; |
| 1419 | aom_codec_iter_t iter = NULL; |
| 1420 | while ((pkt = aom_codec_get_cx_data(encoder, &iter))) { |
Elliott Karpilovsky | bbc7d9c | 2020-06-10 20:36:45 -0700 | [diff] [blame] | 1421 | *got_data = true; |
Elliott Karpilovsky | 2ea1836 | 2020-06-02 18:32:27 -0700 | [diff] [blame] | 1422 | if (pkt->kind != AOM_CODEC_STATS_PKT) continue; |
| 1423 | memcpy(stats + *stats_len, pkt->data.twopass_stats.buf, |
| 1424 | pkt->data.twopass_stats.sz); |
| 1425 | *stats_len += pkt->data.twopass_stats.sz; |
| 1426 | } |
| 1427 | } |
| 1428 | |
| 1429 | void first_pass(char *stats, size_t *stats_len) { |
| 1430 | struct aom_codec_enc_cfg first_pass_cfg; |
| 1431 | ... // Initialize the config as needed. |
| 1432 | first_pass_cfg.g_pass = AOM_RC_FIRST_PASS; |
| 1433 | aom_codec_ctx_t first_pass_encoder; |
| 1434 | ... // Initialize the encoder. |
| 1435 | |
| 1436 | while (frame_available) { |
| 1437 | aom_image_t *img = ...; // Read the next frame; update frame_available. |
| 1438 | bool got_data = false; // Set by get_stats_data(); not needed here. |
| 1439 | aom_codec_encode(&first_pass_encoder, img, pts, duration, flags); |
| 1440 | get_stats_data(&first_pass_encoder, stats, stats_len, &got_data); |
| 1441 | } |
| 1442 | // After all frames have been processed, call aom_codec_encode with |
Elliott Karpilovsky | bbc7d9c | 2020-06-10 20:36:45 -0700 | [diff] [blame] | 1443 | // a NULL ptr repeatedly, until no more data is returned. The NULL |
| 1444 | // ptr tells the encoder that no more frames are available. |
| 1445 | bool got_data; |
| 1446 | do { |
| 1447 | got_data = false; |
| 1448 | aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags); |
| 1449 | get_stats_data(&first_pass_encoder, stats, stats_len, &got_data); |
| 1450 | } while (got_data); |
Elliott Karpilovsky | 2ea1836 | 2020-06-02 18:32:27 -0700 | [diff] [blame] | 1451 | |
| 1452 | aom_codec_destroy(&first_pass_encoder); |
| 1453 | } |
| 1454 | ~~~~~~~~~~~~~~~ |
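The first-pass example above assumes the stats buffer is already large enough.
One possible way to lift that assumption is a small growable byte buffer, as
sketched below. The `stats_buffer` type and its helper functions are
hypothetical names introduced for illustration; they are not part of the
libaom API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper type: a byte buffer that grows on demand.
typedef struct {
  char *data;
  size_t len;       // Bytes currently in use.
  size_t capacity;  // Bytes currently allocated.
} stats_buffer;

// Appends sz bytes from src, doubling the allocation when it is too small.
// Returns false and leaves the buffer unchanged if memory cannot be obtained.
static bool stats_buffer_append(stats_buffer *sb, const void *src,
                                size_t sz) {
  if (sz > SIZE_MAX - sb->len) return false;  // Length would overflow.
  const size_t needed = sb->len + sz;
  if (needed > sb->capacity) {
    size_t new_cap = sb->capacity ? sb->capacity : 4096;
    while (new_cap < needed) {
      if (new_cap > SIZE_MAX / 2) {
        new_cap = needed;  // Doubling would overflow; request the exact size.
        break;
      }
      new_cap *= 2;
    }
    char *grown = realloc(sb->data, new_cap);
    if (grown == NULL) return false;
    sb->data = grown;
    sb->capacity = new_cap;
  }
  if (sz > 0) memcpy(sb->data + sb->len, src, sz);
  sb->len += sz;
  return true;
}

// Releases the storage and resets the buffer to its empty state.
static void stats_buffer_free(stats_buffer *sb) {
  free(sb->data);
  sb->data = NULL;
  sb->len = 0;
  sb->capacity = 0;
}
```

With such a helper, the `memcpy` and length update in `get_stats_data()` could
be replaced by a single append of `pkt->data.twopass_stats.buf` with size
`pkt->data.twopass_stats.sz`, and the `data` and `len` fields would then be
handed to the second pass.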
| 1455 | |
| 1456 | During the second pass, the uncompressed frames and the stats are |
| 1457 | passed into the encoder. |
| 1458 | |
| 1459 | ~~~~~~~~~~~~~~~{.c} |
| 1460 | // Write out each encoded frame to the file. |
Elliott Karpilovsky | bbc7d9c | 2020-06-10 20:36:45 -0700 | [diff] [blame] | 1461 | void get_cx_data(aom_codec_ctx_t *encoder, FILE *file, |
| 1462 | bool *got_data) { |
Elliott Karpilovsky | 2ea1836 | 2020-06-02 18:32:27 -0700 | [diff] [blame] | 1463 | const aom_codec_cx_pkt_t *pkt; |
| 1464 | aom_codec_iter_t iter = NULL; |
| 1465 | while ((pkt = aom_codec_get_cx_data(encoder, &iter))) { |
Elliott Karpilovsky | bbc7d9c | 2020-06-10 20:36:45 -0700 | [diff] [blame] | 1466 | *got_data = true; |
Elliott Karpilovsky | 2ea1836 | 2020-06-02 18:32:27 -0700 | [diff] [blame] | 1467 | if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue; |
| 1468 | fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file); |
| 1469 | } |
| 1470 | } |
| 1471 | |
| 1472 | void second_pass(char *stats, size_t stats_len) { |
| 1473 | struct aom_codec_enc_cfg second_pass_cfg; |
| 1474 | ... // Initialize the config as needed. |
| 1475 | second_pass_cfg.g_pass = AOM_RC_LAST_PASS; |
| 1476 | second_pass_cfg.rc_twopass_stats_in.buf = stats; |
| 1477 | second_pass_cfg.rc_twopass_stats_in.sz = stats_len; |
| 1478 | aom_codec_ctx_t second_pass_encoder; |
| 1479 | ... // Initialize the encoder from the config. |
| 1480 | |
| 1481 | FILE *output = fopen("output.obu", "wb"); |
| 1482 | while (frame_available) { |
| 1483 | aom_image_t *img = ...; // Read the next frame; update frame_available. |
| 1484 | bool got_data = false; // Set by get_cx_data(); not needed here. |
| 1485 | aom_codec_encode(&second_pass_encoder, img, pts, duration, flags); |
| 1486 | get_cx_data(&second_pass_encoder, output, &got_data); |
| 1487 | } |
| 1488 | // Pass in NULL to flush the encoder. |
Elliott Karpilovsky | bbc7d9c | 2020-06-10 20:36:45 -0700 | [diff] [blame] | 1489 | bool got_data; |
| 1490 | do { |
| 1491 | got_data = false; |
| 1492 | aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags); |
| 1493 | get_cx_data(&second_pass_encoder, output, &got_data); |
| 1494 | } while (got_data); |
Elliott Karpilovsky | 2ea1836 | 2020-06-02 18:32:27 -0700 | [diff] [blame] | 1495 | fclose(output); |
| 1496 | aom_codec_destroy(&second_pass_encoder); |
| 1497 | } |
| 1498 | ~~~~~~~~~~~~~~~ |
| 1499 | */ |
| 1500 | |
Elliott Karpilovsky | b6bd2bc | 2020-06-16 03:23:17 -0700 | [diff] [blame] | 1501 | /*!\defgroup look_ahead_buffer The Look-Ahead Buffer |
| 1502 | \ingroup high_level_algo |
| 1503 | |
| 1504 | A program should call \ref aom_codec_encode() for each frame that needs |
| 1505 | processing. These frames are internally copied and stored in a fixed-size |
| 1506 | circular buffer, known as the look-ahead buffer. Other parts of the code |
| 1507 | will use future frame information to inform current frame decisions; |
| 1508 | examples include the first-pass algorithm, TPL model, and temporal filter. |
| 1509 | Note that this buffer also keeps a reference to the last source frame. |
| 1510 | |
| 1511 | The look-ahead buffer is defined in \ref av1/encoder/lookahead.h. It acts as an |
| 1512 | opaque structure, with an interface to create and free memory associated with |
| 1513 | it. It supports pushing and popping frames onto the structure in a FIFO |
| 1514 | fashion. It also allows look-ahead when using the \ref av1_lookahead_peek() |
| 1515 | function with a non-negative number, and look-behind when -1 is passed in (for |
Elliott Karpilovsky | 9999059 | 2020-06-19 12:22:54 -0700 | [diff] [blame] | 1516 | the last source frame; e.g., firstpass will use this for motion estimation). |
| 1517 | The \ref av1_lookahead_depth() function returns the current number of frames |
| 1518 | stored in it. Note that \ref av1_lookahead_pop() is a bit of a misnomer - it |
| 1519 | only pops if either the "flush" variable is set, or the buffer is at maximum |
| 1520 | capacity. |
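The push, pop and peek behaviour described above can be modelled with a small
ring buffer. The sketch below is a toy model only, not the implementation in
`av1/encoder/lookahead.h`: every `toy_` name is hypothetical, integer frame ids
stand in for image buffers, and the last source frame is kept in a separate
field for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FRAMES 4  // Hypothetical capacity chosen for the example.

typedef struct {
  int frames[TOY_MAX_FRAMES];  // Frame ids stand in for image buffers.
  int read_idx;                // Position of the oldest queued frame.
  int depth;                   // Number of frames currently queued.
  int last_source;             // Most recently popped frame.
  bool has_last_source;        // False until the first successful pop.
} toy_lookahead;

static void toy_init(toy_lookahead *q) {
  q->read_idx = 0;
  q->depth = 0;
  q->last_source = 0;
  q->has_last_source = false;
}

// Queues a frame in FIFO order. Returns false if the buffer is full.
static bool toy_push(toy_lookahead *q, int frame) {
  if (q->depth == TOY_MAX_FRAMES) return false;
  q->frames[(q->read_idx + q->depth) % TOY_MAX_FRAMES] = frame;
  ++q->depth;
  return true;
}

// Removes the oldest frame, but only when flushing or when the buffer is at
// maximum capacity. Otherwise returns NULL and leaves the buffer untouched.
static const int *toy_pop(toy_lookahead *q, bool flush) {
  if (q->depth == 0) return NULL;
  if (!flush && q->depth < TOY_MAX_FRAMES) return NULL;
  q->last_source = q->frames[q->read_idx];
  q->has_last_source = true;
  q->read_idx = (q->read_idx + 1) % TOY_MAX_FRAMES;
  --q->depth;
  return &q->last_source;
}

// index >= 0 looks ahead, counting from the oldest queued frame.
// index == -1 looks behind, returning the last source frame if there is one.
static const int *toy_peek(const toy_lookahead *q, int index) {
  if (index == -1) return q->has_last_source ? &q->last_source : NULL;
  if (index < 0 || index >= q->depth) return NULL;
  return &q->frames[(q->read_idx + index) % TOY_MAX_FRAMES];
}
```

In this model a pop without the flush flag returns nothing until four frames
are queued, which mirrors why frames stay in the look-ahead buffer for as long
as possible.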
Elliott Karpilovsky | b6bd2bc | 2020-06-16 03:23:17 -0700 | [diff] [blame] | 1521 | |
Mufaddal Chakera | a65d2ce | 2021-02-15 12:20:48 +0530 | [diff] [blame] | 1522 | The buffer is stored in the \ref AV1_PRIMARY::lookahead field. |
Elliott Karpilovsky | b6bd2bc | 2020-06-16 03:23:17 -0700 | [diff] [blame] | 1523 | It is initialized in the first call to \ref aom_codec_encode(), in the |
| 1524 | \ref av1_receive_raw_frame() sub-routine. The buffer size is defined by |
| 1525 | the g_lag_in_frames field of the encoder configuration struct |
| 1526 | (\ref aom_codec_enc_cfg_t::g_lag_in_frames). |
| 1527 | This can be modified manually but should only be set once. On the command |
| 1528 | line, the flag "--lag-in-frames" controls it. The default size is 19 for |
Elliott Karpilovsky | 9999059 | 2020-06-19 12:22:54 -0700 | [diff] [blame] | 1529 | non-realtime usage and 1 for realtime. Note that a maximum value of 35 is |
Elliott Karpilovsky | b6bd2bc | 2020-06-16 03:23:17 -0700 | [diff] [blame] | 1530 | enforced. |
| 1531 | |
| 1532 | A frame will stay in the buffer as long as possible. As mentioned above, |
| 1533 | the \ref av1_lookahead_pop() only removes a frame when either flush is set, |
| 1534 | or the buffer is full. Note that each call to \ref aom_codec_encode() inserts |
| 1535 | another frame into the buffer, and pop is called by the sub-function |
| 1536 | \ref av1_encode_strategy(). The buffer is told to flush when |
| 1537 | \ref aom_codec_encode() is passed a NULL image pointer. Note that the caller |
| 1538 | must repeatedly call \ref aom_codec_encode() with a NULL image pointer, until |
| 1539 | no more packets are available, in order to fully flush the buffer. |
| 1540 | |
| 1541 | */ |
| 1542 | |
Yunqing Wang | 65cd010 | 2020-05-06 12:57:04 -0700 | [diff] [blame] | 1543 | /*! @} - end defgroup high_level_algo */ |
| 1544 | |
| 1545 | /*!\defgroup partition_search Partition Search |
| 1546 | * \ingroup encoder_algo |
Paul Wilkins | c84e8e2 | 2020-07-21 19:09:33 +0100 | [diff] [blame] | 1547 | * For an overview of the partition search see \ref architecture_enc_partitions |
Yunqing Wang | 65cd010 | 2020-05-06 12:57:04 -0700 | [diff] [blame] | 1548 | * @{ |
| 1549 | */ |
Paul Wilkins | 7173920 | 2020-07-23 15:09:07 +0100 | [diff] [blame] | 1550 | |
Yunqing Wang | 65cd010 | 2020-05-06 12:57:04 -0700 | [diff] [blame] | 1551 | /*! @} - end defgroup partition_search */ |
| 1552 | |
| 1553 | /*!\defgroup intra_mode_search Intra Mode Search |
| 1554 | * \ingroup encoder_algo |
| 1555 | * This module describes the intra mode search algorithm in AV1. |
| 1556 | * More details will be added. |
| 1557 | * @{ |
| 1558 | */ |
| 1559 | /*! @} - end defgroup intra_mode_search */ |
| 1560 | |
| 1561 | /*!\defgroup inter_mode_search Inter Mode Search |
| 1562 | * \ingroup encoder_algo |
| 1563 | * This module describes the inter mode search algorithm in AV1. |
| 1564 | * More details will be added. |
| 1565 | * @{ |
| 1566 | */ |
| 1567 | /*! @} - end defgroup inter_mode_search */ |
| 1568 | |
chiyotsai | 7cc167e | 2020-06-12 17:50:53 -0700 | [diff] [blame] | 1569 | /*!\defgroup palette_mode_search Palette Mode Search |
| 1570 | * \ingroup intra_mode_search |
| 1571 | * This module describes the palette mode search algorithm in AV1. |
| 1572 | * More details will be added. |
| 1573 | * @{ |
| 1574 | */ |
| 1575 | /*! @} - end defgroup palette_mode_search */ |
| 1576 | |
Yunqing Wang | 65cd010 | 2020-05-06 12:57:04 -0700 | [diff] [blame] | 1577 | /*!\defgroup transform_search Transform Search |
| 1578 | * \ingroup encoder_algo |
| 1579 | * This module describes the transform search algorithm in AV1. |
Yunqing Wang | 65cd010 | 2020-05-06 12:57:04 -0700 | [diff] [blame] | 1580 | * @{ |
| 1581 | */ |
| 1582 | /*! @} - end defgroup transform_search */ |
| 1583 | |
angiebird | 96bdb2a | 2020-06-28 17:24:24 -0700 | [diff] [blame] | 1584 | /*!\defgroup coefficient_coding Transform Coefficient Coding and Optimization |
| 1585 | * \ingroup encoder_algo |
| 1586 | * This module describes the algorithms of transform coefficient coding and optimization in AV1. |
| 1587 | * More details will be added. |
| 1588 | * @{ |
| 1589 | */ |
| 1590 | /*! @} - end defgroup coefficient_coding */ |
| 1591 | |
Yunqing Wang | 65cd010 | 2020-05-06 12:57:04 -0700 | [diff] [blame] | 1592 | /*!\defgroup in_loop_filter In-loop Filter |
| 1593 | * \ingroup encoder_algo |
| 1594 | * This module describes the in-loop filter algorithm in AV1. |
| 1595 | * More details will be added. |
| 1596 | * @{ |
| 1597 | */ |
| 1598 | /*! @} - end defgroup in_loop_filter */ |
| 1599 | |
Debargha Mukherjee | 7f1580e | 2020-06-19 06:37:28 -0700 | [diff] [blame] | 1600 | /*!\defgroup in_loop_cdef CDEF |
Debargha Mukherjee | 82b2438 | 2020-06-16 23:30:39 -0700 | [diff] [blame] | 1601 | * \ingroup encoder_algo |
| 1602 | * This module describes the CDEF parameter search algorithm |
| 1603 | * in AV1. More details will be added. |
| 1604 | * @{ |
| 1605 | */ |
| 1606 | /*! @} - end defgroup in_loop_cdef */ |
| 1607 | |
Debargha Mukherjee | 7f1580e | 2020-06-19 06:37:28 -0700 | [diff] [blame] | 1608 | /*!\defgroup in_loop_restoration Loop Restoration |
Debargha Mukherjee | 82b2438 | 2020-06-16 23:30:39 -0700 | [diff] [blame] | 1609 | * \ingroup encoder_algo |
| 1610 | * This module describes the loop restoration search |
| 1611 | * and estimation algorithm in AV1. |
| 1612 | * More details will be added. |
| 1613 | * @{ |
| 1614 | */ |
| 1615 | /*! @} - end defgroup in_loop_restoration */ |
| 1616 | |
Marco Paniconi | 5b2faba | 2020-07-09 11:39:22 -0700 | [diff] [blame] | 1617 | /*!\defgroup cyclic_refresh Cyclic Refresh |
| 1618 | * \ingroup encoder_algo |
| 1619 | * This module describes the cyclic refresh (aq-mode=3) in AV1. |
| 1620 | * More details will be added. |
| 1621 | * @{ |
| 1622 | */ |
| 1623 | /*! @} - end defgroup cyclic_refresh */ |
Jerome Jiang | 66e7624 | 2020-07-09 11:38:19 -0700 | [diff] [blame] | 1624 | |
| 1625 | /*!\defgroup SVC Scalable Video Coding |
| 1626 | * \ingroup encoder_algo |
| 1627 | * This module describes the scalable video coding algorithm in AV1. |
| 1628 | * More details will be added. |
| 1629 | * @{ |
| 1630 | */ |
| 1631 | /*! @} - end defgroup SVC */ |
Marco Paniconi | 08f71f2 | 2020-07-14 10:41:47 -0700 | [diff] [blame] | 1632 | /*!\defgroup variance_partition Variance Partition |
| 1633 | * \ingroup encoder_algo |
| 1634 | * This module describes the variance partition algorithm in AV1. |
| 1635 | * More details will be added. |
| 1636 | * @{ |
| 1637 | */ |
| 1638 | /*! @} - end defgroup variance_partition */ |
Fyodor Kyslov | 2a3768e | 2020-07-20 14:38:05 -0700 | [diff] [blame] | 1639 | /*!\defgroup nonrd_mode_search NonRD Optimized Mode Search |
| 1640 | * \ingroup encoder_algo |
| 1641 | * This module describes NonRD Optimized Mode Search used in Real-Time mode. |
| 1642 | * More details will be added. |
| 1643 | * @{ |
| 1644 | */ |
| 1645 | /*! @} - end defgroup nonrd_mode_search */ |