/*!\page encoder_guide AV1 ENCODER GUIDE

\tableofcontents

\section architecture_introduction Introduction

This document provides an architectural overview of the libaom AV1 encoder.

It is intended as a high level starting point for anyone wishing to contribute
to the project, helping them to understand the structure of the encoder more
quickly and to find their way around the codebase.

It sits above, and where necessary links to, more detailed function level
documents.

\subsection architecture_gencodecs Generic Block Transform Based Codecs

Most modern video encoders including VP8, H.264, VP9, HEVC and AV1
(in increasing order of complexity) share a common basic paradigm. This
comprises separating a stream of raw video frames into a series of discrete
blocks (of one or more sizes), then computing a prediction signal and a
quantized, transform coded, residual error signal. The prediction and residual
error signal, along with any side information needed by the decoder, are then
entropy coded and packed to form the encoded bitstream. See Figure 1 below,
where the blue blocks are, to all intents and purposes, the lossless parts of
the encoder and the red block is the lossy part.

This is of course a gross oversimplification, even in regard to the simplest
of the above codecs. For example, all of them allow for block based
prediction at multiple different scales (i.e. different block sizes) and may
use previously coded pixels in the current frame for prediction or pixels from
one or more previously encoded frames. Further, they may support multiple
different transforms and transform sizes and quality optimization tools like
loop filtering.

\image html genericcodecflow.png "" width=70%

\subsection architecture_av1_structure AV1 Structure and Complexity

As previously stated, AV1 adopts the same underlying paradigm as other block
transform based codecs. However, it is much more complicated than previous
generation codecs and supports many more block partitioning, prediction and
transform options.

AV1 supports block partitions of various sizes from 128x128 pixels down to 4x4
pixels using a multi-layer recursive tree structure as illustrated in Figure 2
below.

\image html av1partitions.png "" width=70%

AV1 also provides 71 basic intra prediction modes, 56 single frame inter prediction
modes (7 reference frames x 4 modes x 2 for OBMC (overlapped block motion
compensation)), 12768 compound inter prediction modes (that combine inter
predictors from two reference frames) and 36708 compound inter / intra
prediction modes. Furthermore, in addition to simple inter motion estimation,
AV1 also supports warped motion prediction using affine transforms.

In terms of transform coding, it has 16 separable 2-D transform kernels
\f$(DCT, ADST, fADST, IDTX)^2\f$ that can be applied at up to 19 different
scales from 64x64 down to 4x4 pixels.

When combined together, this means that for any one 8x8 pixel block in a
source frame, there are approximately 45,000,000 different ways that it can
be encoded.

Consequently, AV1 requires complex control processes. While not necessarily
a normative part of the bitstream, these are the algorithms that turn a set
of compression tools and a bitstream format specification into a coherent
and useful codec implementation. These may include but are not limited to
the following:

- Rate distortion optimization (the process of trying to choose the most
  efficient combination of block size, prediction mode, transform type
  etc.)
- Rate control (regulation of the output bitrate)
- Encoder speed vs quality trade offs
- Features such as two pass encoding or optimization for low delay
  encoding

For a more detailed overview of AV1's encoding tools and a discussion of some
of the design considerations and hardware constraints that had to be
accommodated, please refer to <a href="https://arxiv.org/abs/2008.06091">
A Technical Overview of AV1</a>.

Figure 3 provides a slightly expanded but still simplistic view of the
AV1 encoder architecture with blocks that relate to some of the subsequent
sections of this document. In this diagram, the raw uncompressed frame buffers
are shown in dark green and the reconstructed frame buffers used for
prediction in light green. Red indicates those parts of the codec that are
(or may be) lossy, where fidelity can be traded off against compression
efficiency, whilst light blue shows algorithms or coding tools that are
lossless. The yellow blocks represent non-bitstream normative configuration
and control algorithms.

\image html av1encoderflow.png "" width=70%

\section architecture_command_line The Libaom Command Line Interface

Add details or links here: TODO ? elliotk@

\section architecture_enc_data_structures Main Encoder Data Structures

The following are the main high level data structures used by the libaom AV1
encoder and referenced elsewhere in this overview document:

- \ref AV1_PRIMARY
    - \ref AV1_PRIMARY.gf_group (\ref GF_GROUP)
    - \ref AV1_PRIMARY.lap_enabled
    - \ref AV1_PRIMARY.twopass (\ref TWO_PASS)
    - \ref AV1_PRIMARY.p_rc (\ref PRIMARY_RATE_CONTROL)
    - \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)

- \ref AV1_COMP
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
    - \ref AV1_COMP.rc (\ref RATE_CONTROL)
    - \ref AV1_COMP.speed
    - \ref AV1_COMP.sf (\ref SPEED_FEATURES)

- \ref AV1EncoderConfig (Encoder configuration parameters)
    - \ref AV1EncoderConfig.pass
    - \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
    - \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)
    - \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)

- \ref AlgoCfg (Algorithm related configuration parameters)
    - \ref AlgoCfg.arnr_max_frames
    - \ref AlgoCfg.arnr_strength

- \ref KeyFrameCfg (Keyframe coding configuration parameters)
    - \ref KeyFrameCfg.enable_keyframe_filtering

- \ref RateControlCfg (Rate control configuration)
    - \ref RateControlCfg.mode
    - \ref RateControlCfg.target_bandwidth
    - \ref RateControlCfg.best_allowed_q
    - \ref RateControlCfg.worst_allowed_q
    - \ref RateControlCfg.cq_level
    - \ref RateControlCfg.under_shoot_pct
    - \ref RateControlCfg.over_shoot_pct
    - \ref RateControlCfg.maximum_buffer_size_ms
    - \ref RateControlCfg.starting_buffer_level_ms
    - \ref RateControlCfg.optimal_buffer_level_ms
    - \ref RateControlCfg.vbrbias
    - \ref RateControlCfg.vbrmin_section
    - \ref RateControlCfg.vbrmax_section

- \ref PRIMARY_RATE_CONTROL (Primary Rate control status)
    - \ref PRIMARY_RATE_CONTROL.gf_intervals[]
    - \ref PRIMARY_RATE_CONTROL.cur_gf_index

- \ref RATE_CONTROL (Rate control status)
    - \ref RATE_CONTROL.intervals_till_gf_calculate_due
    - \ref RATE_CONTROL.frames_till_gf_update_due
    - \ref RATE_CONTROL.frames_to_key

- \ref TWO_PASS (Two pass status and control data)

- \ref GF_GROUP (Data related to the current GF/ARF group)

- \ref FIRSTPASS_STATS (Defines entries in the first pass stats buffer)
    - \ref FIRSTPASS_STATS.coded_error

- \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
    - \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)

- \ref HIGH_LEVEL_SPEED_FEATURES
    - \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop
    - \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance

- \ref TplParams

\section architecture_enc_use_cases Encoder Use Cases

The libaom AV1 encoder is configurable to support a number of different use
cases and rate control strategies.

The principal use cases for which it is optimised are as follows:

- <b>Video on Demand / Streaming</b>
- <b>Low Delay or Live Streaming</b>
- <b>Video Conferencing / Real Time Coding (RTC)</b>
- <b>Fixed Quality / Testing</b>

Other examples of use cases for which the encoder could be configured but for
which there is less by way of specific optimizations include:

- <b>Download and Play</b>
- <b>Disk Playback</b>
- <b>Storage</b>
- <b>Editing</b>
- <b>Broadcast video</b>

Specific use cases may have particular requirements or constraints. For
example:

<b>Video Conferencing:</b> In a video conference we need to encode the video
in real time and to avoid any coding tools that could increase latency, such
as frame look ahead.

<b>Live Streams:</b> In cases such as live streaming of games or events, it
may be possible to allow some limited buffering of the video and use of
lookahead coding tools to improve encoding quality. However, whilst a lag of
a second or two may be fine given the one way nature of this type of video,
it is clearly not possible to use tools such as two pass coding.

<b>Broadcast:</b> Broadcast video (e.g. digital TV over satellite) may have
specific requirements such as frequent and regular key frames (e.g. once per
second or more) as these are important as entry points to users when switching
channels. There may also be strict upper limits on bandwidth over a short
window of time.

<b>Download and Play:</b> Download and play applications may have less strict
requirements in terms of local frame by frame rate control but there may be a
requirement to accurately hit a file size target for the video clip as a
whole. Similar considerations may apply to playback from mass storage devices
such as DVD or disk drives.

<b>Editing:</b> In certain special use cases such as offline editing, it may
be desirable to have very high quality and data rate but also very frequent
key frames or indeed to encode the video exclusively as key frames. Lossless
video encoding may also be required in this use case.

<b>VOD / Streaming:</b> One of the most important and common use cases for AV1
is video on demand or streaming, for services such as YouTube and Netflix. In
this use case it is possible to do two or even multi-pass encoding to improve
compression efficiency. Streaming services will often store many encoded
copies of a video at different resolutions and data rates to support users
with different types of playback device and bandwidth limitations.
Furthermore, these services support dynamic switching between multiple
streams, so that they can respond to changing network conditions.

Exact rate control when encoding for a specific format (e.g. 360P or 1080P on
YouTube) may not be critical, provided that the video bandwidth remains within
allowed limits. Whilst a format may have a nominal target data rate, this can
be considered more as the desired average egress rate over the video corpus
rather than a strict requirement for any individual clip. Indeed, in order
to maintain optimal quality of experience for the end user, it may be
desirable to encode some easier videos or sections of video at a lower data
rate and harder videos or sections at a higher rate.

VOD / streaming does not usually require very frequent key frames (as in the
broadcast case) but key frames are important in trick play (scanning back and
forth to different points in a video) and for adaptive stream switching. As
such, in a use case like YouTube, there is normally an upper limit on the
maximum time between key frames of a few seconds, but within certain limits
the encoder can try to align key frames with real scene cuts.

Whilst encoder speed may not seem to be as critical in this use case, for
services such as YouTube, where millions of new videos have to be encoded
every day, encoder speed is still important, so libaom allows command line
control of the encode speed vs quality trade off.

<b>Fixed Quality / Testing Mode:</b> Libaom also has a fixed quality encoder
pathway designed for testing under highly constrained conditions.

\section architecture_enc_speed_quality Speed vs Quality Trade Off

In any modern video encoder there are trade offs that can be made in regard to
the amount of time spent encoding a video or video frame vs the quality of the
final encode.

These trade offs typically limit the scope of the search for an optimal
prediction / transform combination with faster encode modes doing fewer
partition, reference frame, prediction mode and transform searches at the cost
of some reduction in coding efficiency.

The pruning of the size of the search tree is typically based on assumptions
about the likelihood of different search modes being selected based on what
has gone before and features such as the dimensions of the video frames and
the Q value selected for encoding the frame. For example certain intra modes
are less likely to be chosen at high Q but may be more likely if similar
modes were used for the previously coded blocks above and to the left of the
current block.

The speed settings depend both on the use case (e.g. Real Time encoding) and
an explicit speed control passed in on the command line as <b>--cpu-used</b>
and stored in the \ref AV1_COMP.speed field of the main compressor instance
data structure (<b>cpi</b>).

The control flags for the speed trade off are stored in the \ref AV1_COMP.sf
field of the compressor instance and are set in the following functions:

- \ref av1_set_speed_features_framesize_independent()
- \ref av1_set_speed_features_framesize_dependent()
- \ref av1_set_speed_features_qindex_dependent()

A second factor impacting the speed of encode is rate distortion optimisation
(<b>rd vs non-rd</b> encoding).

When rate distortion optimization is enabled each candidate combination of
a prediction mode and transform coding strategy is fully encoded and the
resulting error (or distortion) as compared to the original source and the
number of bits used, are passed to a rate distortion function. This function
converts the distortion and cost in bits to a single <b>RD</b> value (where
lower is better). This <b>RD</b> value is used to decide between different
encoding strategies for the current block where, for example, one may
result in a lower distortion but a larger number of bits.

The calculation of this <b>RD</b> value is broadly speaking as follows:

\f[
  RD = (\lambda \cdot Rate) + Distortion
\f]

This assumes a linear relationship between the number of bits used and
distortion (represented by the rate multiplier value <b>&lambda;</b>) which is
not actually valid across a broad range of rate and distortion values.
Typically, where distortion is high, expending a small number of extra bits
will result in a large change in distortion. However, at lower values of
distortion the cost in bits of each incremental improvement is large.

To deal with this we scale the value of <b>&lambda;</b> based on the quantizer
value chosen for the frame. This is assumed to be a proxy for our approximate
position on the true rate distortion curve and it is further assumed that over
a limited range of distortion values, a linear relationship between distortion
and rate is a valid approximation.

Doing a rate distortion test on each candidate prediction / transform
combination is expensive in terms of cpu cycles. Hence, for cases where encode
speed is critical, libaom implements a non-rd pathway where the <b>RD</b>
value is estimated based on the prediction error and quantizer setting.
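
The RD comparison described in this section can be sketched as follows. This
is a minimal illustration only: the RdCandidate type and both helper names
are hypothetical and are not part of libaom, whose own implementation differs
in detail. It simply evaluates RD = (&lambda; * Rate) + Distortion for each
candidate and keeps the lowest value.

```c
#include <stddef.h>

// Hypothetical description of one candidate encoding (not a libaom type).
typedef struct {
  double rate_bits;   // estimated cost of the candidate in bits
  double distortion;  // error versus the source, e.g. sum of squared error
} RdCandidate;

// RD = (lambda * Rate) + Distortion; lower is better.
static double rd_cost(double lambda, const RdCandidate *cand) {
  return lambda * cand->rate_bits + cand->distortion;
}

// Returns the index of the candidate with the lowest RD value,
// or -1 if there are no candidates.
static int pick_best_candidate(double lambda, const RdCandidate *cands,
                               int num_cands) {
  int best = -1;
  double best_rd = 0.0;
  for (int i = 0; i < num_cands; ++i) {
    const double rd = rd_cost(lambda, &cands[i]);
    if (best < 0 || rd < best_rd) {
      best = i;
      best_rd = rd;
    }
  }
  return best;
}
```

With a small &lambda; the candidate with lower distortion tends to win even if
it costs more bits, whereas a larger &lambda; favours the cheaper candidate.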

\section architecture_enc_src_proc Source Frame Processing

\subsection architecture_enc_frame_proc_data Main Data Structures

The following are the main data structures referenced in this section
(see also \ref architecture_enc_data_structures):

- \ref AV1_PRIMARY ppi (the primary compressor instance data structure)
    - \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)

- \ref AV1_COMP cpi (the main compressor instance data structure)
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)

- \ref AV1EncoderConfig (Encoder configuration parameters)
    - \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
    - \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)

- \ref AlgoCfg (Algorithm related configuration parameters)
    - \ref AlgoCfg.arnr_max_frames
    - \ref AlgoCfg.arnr_strength

- \ref KeyFrameCfg (Keyframe coding configuration parameters)
    - \ref KeyFrameCfg.enable_keyframe_filtering

\subsection architecture_enc_frame_proc_ingest Frame Ingest / Coding Pipeline

To encode a frame, first call \ref av1_receive_raw_frame() to obtain the raw
frame data. Then call \ref av1_get_compressed_data() to encode raw frame data
into compressed frame data. The main body of \ref av1_get_compressed_data()
is \ref av1_encode_strategy(), which determines high-level encode strategy
(frame type, frame placement, etc.) and then encodes the frame by calling
\ref av1_encode(). In \ref av1_encode(), \ref av1_first_pass() will execute
the first_pass of two-pass encoding, while \ref encode_frame_to_data_rate()
will perform the final pass for either one-pass or two-pass encoding.

The main body of \ref encode_frame_to_data_rate() is
\ref encode_with_recode_loop_and_filter(), which handles encoding before
in-loop filters (with recode loops \ref encode_with_recode_loop(), or
without any recode loop \ref encode_without_recode()), followed by in-loop
filters (deblocking filters \ref loopfilter_frame(), CDEF filters and
restoration filters \ref cdef_restoration_frame()).

Except for rate/quality control, both \ref encode_with_recode_loop() and
\ref encode_without_recode() call \ref av1_encode_frame() to manage the
reference frame buffers and \ref encode_frame_internal() to perform the
rest of encoding that does not require access to external frames.
\ref encode_frame_internal() is the starting point for the partition search
(see \ref architecture_enc_partitions).

\subsection architecture_enc_frame_proc_tf Temporal Filtering

\subsubsection architecture_enc_frame_proc_tf_overview Overview

Video codecs exploit the spatial and temporal correlations in video signals to
achieve compression efficiency. Noise in the source signal weakens these
correlations and impedes codec performance. Denoising the video signal is
therefore a potentially promising solution.

One strategy for denoising a source is motion compensated temporal filtering.
Unlike image denoising, where only the spatial information is available,
video denoising can leverage a combination of the spatial and temporal
information. Specifically, in the temporal domain, similar pixels can often be
tracked along the motion trajectory of moving objects. Motion estimation is
applied to neighboring frames to find similar patches or blocks of pixels that
can be combined to create a temporally filtered output.

AV1, in common with VP8 and VP9, uses an in-loop motion compensated temporal
filter to generate what are referred to as alternate reference frames (or ARF
frames). These can be encoded in the bitstream and stored as frame buffers for
use in the prediction of subsequent frames, but are not usually directly
displayed (hence they are sometimes referred to as non-display frames).

The following command line parameters set the strength of the filter, the
number of frames used and determine whether filtering is allowed for key
frames.

- <b>--arnr-strength</b> (\ref AlgoCfg.arnr_strength)
- <b>--arnr-maxframes</b> (\ref AlgoCfg.arnr_max_frames)
- <b>--enable-keyframe-filtering</b>
  (\ref KeyFrameCfg.enable_keyframe_filtering)

Note that in AV1, the temporal filtering scheme is designed around the
hierarchical ARF based pyramid coding structure. We typically apply denoising
only on key frames and ARF frames at the highest (and sometimes the second
highest) layer in the hierarchical coding structure.

\subsubsection architecture_enc_frame_proc_tf_algo Temporal Filtering Algorithm

Our method divides the current frame into "MxM" blocks. For each block, a
motion search is applied on frames before and after the current frame. Only
the best matching patch with the smallest mean square error (MSE) is kept as a
candidate patch for a neighbour frame. The current block is also a candidate
patch. A total of N candidate patches are combined to generate the filtered
output.

Let f(i) represent the filtered sample value and \f$p_{j}(i)\f$ the sample
value of the j-th patch. The filtering process is:

\f[
  f(i) = \frac{p_{0}(i) + \sum_{j=1}^{N} \omega_{j}(i) \cdot p_{j}(i)}
              {1 + \sum_{j=1}^{N} \omega_{j}(i)}
\f]

where \f$ \omega_{j}(i) \f$ is the weight of the j-th patch from a total of
N patches. The weight is determined by the patch difference as:

\f[
  \omega_{j}(i) = \exp\left(-\frac{D_{j}(i)}{h^2}\right)
\f]

where \f$ D_{j}(i) \f$ is the sum of squared difference between the current
block and the j-th candidate patch:

\f[
  D_{j}(i) = \sum_{k\in\Omega_{i}}||p_{0}(k) - p_{j}(k)||_{2}
\f]

where:
- \f$p_{0}\f$ refers to the current frame.
- \f$\Omega_{i}\f$ is the patch window, an "LxL" pixel square.
- h is a critical parameter that controls the decay of the weights measured by
  the Euclidean distance. It is derived from an estimate of noise amplitude in
  the source. This allows the filter coefficients to adapt for videos with
  different noise characteristics.
- Usually, M = 32, N = 7, and L = 5, but they can be adjusted.

Readers should refer to the code for more details.

\subsubsection architecture_enc_frame_proc_tf_funcs Temporal Filter Functions

The main entry point for temporal filtering is \ref av1_temporal_filter().
This function returns 1 if temporal filtering is successful, otherwise 0.
When temporal filtering is applied, the filtered frame will be held in
the output_frame, which is the frame to be
encoded in the following encoding process.

Almost all temporal filter related code is in av1/encoder/temporal_filter.c
and av1/encoder/temporal_filter.h.

Inside \ref av1_temporal_filter(), the reader's attention is directed to
\ref tf_setup_filtering_buffer() and \ref tf_do_filtering().

- \ref tf_setup_filtering_buffer(): sets up the frame buffer for
  temporal filtering, determines the number of frames to be used, and
  calculates the noise level of each frame.

- \ref tf_do_filtering(): the main function for the temporal
  filtering algorithm. It breaks each frame into "MxM" blocks. For each
  block a motion search \ref tf_motion_search() is applied to find
  the motion vector from one neighboring frame. tf_build_predictor() is then
  called to build the matching patch and \ref av1_apply_temporal_filter_c() (see
  also optimised SIMD versions) to apply temporal filtering. The weighted
  average over each pixel is accumulated and finally normalized in
  \ref tf_normalize_filtered_frame() to generate the final filtered frame.

- \ref av1_apply_temporal_filter_c(): the core function of our temporal
  filtering algorithm (see also optimised SIMD versions).

\subsection architecture_enc_frame_proc_film Film Grain Modelling

Add details here.

\section architecture_enc_rate_ctrl Rate Control

\subsection architecture_enc_rate_ctrl_data Main Data Structures

The following are the main data structures referenced in this section
(see also \ref architecture_enc_data_structures):

- \ref AV1_PRIMARY ppi (the primary compressor instance data structure)
    - \ref AV1_PRIMARY.twopass (\ref TWO_PASS)

- \ref AV1_COMP cpi (the main compressor instance data structure)
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
    - \ref AV1_COMP.rc (\ref RATE_CONTROL)
    - \ref AV1_COMP.sf (\ref SPEED_FEATURES)

- \ref AV1EncoderConfig (Encoder configuration parameters)
    - \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)

- \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first
  pass stats)

- \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
    - \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)

\subsection architecture_enc_rate_ctrl_options Supported Rate Control Options

Different use cases (\ref architecture_enc_use_cases) may have different
requirements in terms of data rate control.

The broad rate control strategy is selected using the <b>--end-usage</b>
parameter on the command line, which maps onto the field
\ref aom_codec_enc_cfg_t.rc_end_usage in \ref aom_encoder.h.

The four supported options are:

- <b>VBR</b> (Variable Bitrate)
- <b>CBR</b> (Constant Bitrate)
- <b>CQ</b> (Constrained Quality mode; a constrained variant of VBR)
- <b>Fixed Q</b> (Constant quality or Q mode)

The value of \ref aom_codec_enc_cfg_t.rc_end_usage is in turn copied over
into the encoder rate control configuration data structure as
\ref RateControlCfg.mode.

With regard to the most important use cases above, video on demand uses either
VBR or CQ mode. CBR is the preferred rate control model for RTC and live
streaming, and Fixed Q is only used in testing.

The behaviour of each of these modes is regulated by a series of secondary
command line rate control options but also depends somewhat on the selected
use case, whether 2-pass coding is enabled and the selected encode speed vs
quality trade offs (\ref AV1_COMP.speed and \ref AV1_COMP.sf).

The list below gives the names of the main rate control command line
options together with the names of the corresponding fields in the rate
control configuration data structures.

- <b>--target-bitrate</b> (\ref RateControlCfg.target_bandwidth)
- <b>--min-q</b> (\ref RateControlCfg.best_allowed_q)
- <b>--max-q</b> (\ref RateControlCfg.worst_allowed_q)
- <b>--cq-level</b> (\ref RateControlCfg.cq_level)
- <b>--undershoot-pct</b> (\ref RateControlCfg.under_shoot_pct)
- <b>--overshoot-pct</b> (\ref RateControlCfg.over_shoot_pct)

The following options control aspects of VBR encoding:

- <b>--bias-pct</b> (\ref RateControlCfg.vbrbias)
- <b>--minsection-pct</b> (\ref RateControlCfg.vbrmin_section)
- <b>--maxsection-pct</b> (\ref RateControlCfg.vbrmax_section)

The following relate to buffer and delay management in one pass low delay and
real time coding:

- <b>--buf-sz</b> (\ref RateControlCfg.maximum_buffer_size_ms)
- <b>--buf-initial-sz</b> (\ref RateControlCfg.starting_buffer_level_ms)
- <b>--buf-optimal-sz</b> (\ref RateControlCfg.optimal_buffer_level_ms)

\subsection architecture_enc_vbr Variable Bitrate (VBR) Encoding

For streamed VOD content the most common rate control strategy is Variable
Bitrate (VBR) encoding. The CQ mode mentioned above is a variant of this
where additional quantizer and quality constraints are applied. VBR
encoding may in theory be used in conjunction with either 1-pass or 2-pass
encoding.

VBR encoding varies the number of bits given to each frame or group of frames
according to the difficulty of that frame or group of frames, such that easier
frames are allocated fewer bits and harder frames are allocated more bits. The
intent here is to even out the quality between frames. This contrasts with
Constant Bitrate (CBR) encoding where each frame is allocated the same number
of bits.

Whilst for any given frame or group of frames the data rate may vary, the VBR
algorithm attempts to deliver a given average bitrate over a wider time
interval. In standard VBR encoding, the time interval over which the data rate
is averaged is usually the duration of the video clip. An alternative
approach is to target an average VBR bitrate over the entire video corpus for
a particular video format (corpus VBR).
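
As a rough illustration of the contrast between CBR and VBR allocation, the
sketch below divides a clip level bit budget across frames, either evenly or
in proportion to a per frame complexity score. The function names and the
complexity measure are hypothetical and are not taken from libaom, whose
actual allocation is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: CBR gives every frame the same share of the budget.
static int64_t cbr_frame_bits(int64_t total_bits, int num_frames) {
  assert(num_frames > 0);
  return total_bits / num_frames;
}

// Hypothetical helper: VBR gives each frame a share proportional to its
// complexity score, so harder frames receive more bits while the clip
// average stays the same.
static int64_t vbr_frame_bits(int64_t total_bits, const int *complexity,
                              int num_frames, int frame_idx) {
  int64_t complexity_sum = 0;
  assert(num_frames > 0 && frame_idx >= 0 && frame_idx < num_frames);
  for (int i = 0; i < num_frames; ++i) complexity_sum += complexity[i];
  // Fall back to an even split if no complexity information is available.
  if (complexity_sum == 0) return cbr_frame_bits(total_bits, num_frames);
  return total_bits * complexity[frame_idx] / complexity_sum;
}
```

With a budget of 8000 bits and complexity scores of 1, 1, 2 and 4, the CBR
helper gives each frame 2000 bits, whereas the VBR helper gives the first
frame 1000 bits and the last frame 4000 bits.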

\subsubsection architecture_enc_1pass_vbr 1 Pass VBR Encoding

The command line for libaom does allow 1 Pass VBR, but this has not been
properly optimised and behaves much like 1 pass CBR in most regards, with bits
allocated to frames by the following functions:

- \ref av1_calc_iframe_target_size_one_pass_vbr(
  const struct AV1_COMP *const cpi)
  "av1_calc_iframe_target_size_one_pass_vbr()"
- \ref av1_calc_pframe_target_size_one_pass_vbr(
  const struct AV1_COMP *const cpi,
  FRAME_UPDATE_TYPE frame_update_type)
  "av1_calc_pframe_target_size_one_pass_vbr()"

\subsubsection architecture_enc_2pass_vbr 2 Pass VBR Encoding

The main focus here will be on 2-pass VBR encoding (and the related CQ mode)
as these are the modes most commonly used for VOD content.

2-pass encoding is selected on the command line by setting --passes=2
(or -p 2).

Generally speaking, in 2-pass encoding, an encoder will first encode a video
using a default set of parameters and assumptions. Depending on the outcome
of that first encode, the baseline assumptions and parameters will be adjusted
to optimize the output during the second pass. In essence the first pass is a
fact finding mission to establish the complexity and variability of the video,
in order to allow a better allocation of bits in the second pass.

The libaom 2-pass algorithm is unusual in that the first pass is not a full
encode of the video. Rather it uses a limited set of prediction and transform
options and a fixed quantizer, to generate statistics about each frame. No
output bitstream is created and the per frame first pass statistics are stored
entirely in volatile memory. This has some disadvantages when compared to a
full first pass encode, but avoids the need for file I/O and improves speed.

For two pass encoding, the function \ref av1_encode() will first be called
for each frame in the video with the value \ref AV1EncoderConfig.pass = 1.
This will result in calls to \ref av1_first_pass().

Statistics for each frame are stored in \ref FIRSTPASS_STATS frame_stats_buf.

After completion of the first pass, \ref av1_encode() will be called again for
each frame with \ref AV1EncoderConfig.pass = 2. The frames are then encoded in
accordance with the statistics gathered during the first pass by calls to
\ref encode_frame_to_data_rate() which in turn calls
\ref av1_get_second_pass_params().

In summary, the second pass code:

- Searches for scene cuts (if auto key frame detection is enabled).
- Defines the length of and hierarchical structure to be used in each
  ARF/GF group.
- Allocates bits based on the relative complexity of each frame, the quality
  of frame to frame prediction and the type of frame (e.g. key frame, ARF
  frame, golden frame or normal leaf frame).
- Suggests a maximum Q (quantizer value) for each ARF/GF group, based on
  estimated complexity and recent rate control compliance
  (\ref RATE_CONTROL.active_worst_quality).
- Tracks adherence to the overall rate control objectives and adjusts
  heuristics.

The main two pass functions in regard to the above include:

- \ref find_next_key_frame()
- \ref define_gf_group()
- \ref calculate_total_gf_group_bits()
- \ref get_twopass_worst_quality()
- \ref av1_gop_setup_structure()
- \ref av1_gop_bit_allocation()
- \ref av1_twopass_postencode_update()

For each frame, the two pass algorithm defines a target number of bits
\ref RATE_CONTROL.base_frame_target, which is then adjusted if necessary to
reflect any undershoot or overshoot on previous frames to give
\ref RATE_CONTROL.this_frame_target.
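
The adjustment from a base target to a per frame target can be pictured with
the simplified sketch below. It is not the libaom implementation: the function
name, the idea of spreading the accumulated error over a fixed number of
frames and the percentage clamp are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical sketch: nudge the base target by a share of the accumulated
// rate error. A positive error means earlier frames undershot their targets
// (spare bits are available); a negative error means they overshot.
static int adjust_frame_target(int base_target, int64_t accumulated_error_bits,
                               int frames_to_spread_over, int max_adjust_pct) {
  assert(frames_to_spread_over > 0 && max_adjust_pct >= 0);
  int64_t adjustment = accumulated_error_bits / frames_to_spread_over;
  // Limit the correction so a single frame never absorbs too much error.
  const int64_t max_adjustment = (int64_t)base_target * max_adjust_pct / 100;
  if (adjustment > max_adjustment) adjustment = max_adjustment;
  if (adjustment < -max_adjustment) adjustment = -max_adjustment;
  return (int)(base_target + adjustment);
}
```

For example, a base target of 10000 bits with 4000 spare bits spread over 8
frames gives a target of 10500 bits, while a large overshoot of 80000 bits is
clamped by a 25 percent limit to give 7500 bits.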

As well as \ref RATE_CONTROL.active_worst_quality, the two pass code also
maintains a record of the actual Q value used to encode previous frames
at each level in the current pyramid hierarchy
(\ref PRIMARY_RATE_CONTROL.active_best_quality). The function
\ref rc_pick_q_and_bounds() uses these values to set a permitted Q range
for each frame.

\subsubsection architecture_enc_1pass_lagged 1 Pass Lagged VBR Encoding

1 pass lagged encode falls between simple 1 pass encoding and full two pass
encoding and is used for cases where it is not possible to do a full first
pass through the entire video clip, but where some delay is permissible, for
example near live streaming where there is a delay of up to a few seconds. In
this case the first pass and second pass are in effect combined such that the
first pass starts encoding the clip and the second pass lags behind it by a
few frames. When using this method, full sequence level statistics are not
available, but it is possible to collect and use frame or group of frame level
data to help in the allocation of bits and in defining ARF/GF coding
hierarchies. The reader is referred to the \ref AV1_PRIMARY.lap_enabled field
in the main compressor instance (where <b>lap</b> stands for
<b>look ahead processing</b>). This encoding mode for the most part uses the
same rate control pathways as two pass VBR encoding.

\subsection architecture_enc_rc_loop The Main Rate Control Loop

Having established a target rate for a given frame and an allowed range of Q
values, the encoder then tries to encode the frame at a rate that is as close
as possible to the target value, given the Q range constraints.

There are two main mechanisms by which this is achieved.

The first selects a frame level Q, using an adaptive estimate of the number of
bits that will be generated when the frame is encoded at any given Q.
Fundamentally this mechanism is common to VBR, CBR and to use cases such as
RTC with small adjustments.

As the Q value mainly adjusts the precision of the residual signal, it is not
actually a reliable basis for accurately predicting the number of bits that
will be generated across all clips. A well predicted clip, for example, may
have a much smaller error residual after prediction. The algorithm copes with
this by adapting its predictions on the fly using a feedback loop based on how
well it did the previous time around.
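
The feedback idea can be sketched as follows. This is a toy model written for
this guide, not the libaom code: the inverse relationship between bits and Q,
the structure name and the damping parameter are all assumptions. It shows a
multiplicative correction factor that is pulled towards the ratio of actual to
predicted bits after each encode.

```c
#include <assert.h>

// Hypothetical model state: a multiplicative correction applied to a fixed
// bits-versus-Q estimate.
typedef struct {
  double correction_factor;
} RateModel;

// Toy estimate in which bits fall inversely with q. Not the libaom model.
static double estimate_bits(const RateModel *m, double complexity, int q) {
  assert(q > 0);
  return m->correction_factor * complexity / q;
}

// Pick the lowest q in [best_q, worst_q] whose estimate fits the target.
static int pick_q(const RateModel *m, double complexity, double target_bits,
                  int best_q, int worst_q) {
  for (int q = best_q; q < worst_q; ++q) {
    if (estimate_bits(m, complexity, q) <= target_bits) return q;
  }
  return worst_q;
}

// Feedback step: move the correction factor part of the way towards the
// value that would have made the last prediction exact.
static void update_model(RateModel *m, double predicted_bits,
                         double actual_bits, double damping) {
  assert(predicted_bits > 0.0 && damping > 0.0 && damping <= 1.0);
  const double ratio = actual_bits / predicted_bits;
  m->correction_factor *= 1.0 + damping * (ratio - 1.0);
}
```

If a frame predicted at 5000 bits actually produces 10000 bits, a damping of
0.5 raises the correction factor from 1.0 to 1.5, so the next frame of similar
complexity is given a higher Q for the same target.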

The main functions responsible for the prediction of Q and the adaptation over
time, for the two pass encoding pipeline are:

- \ref rc_pick_q_and_bounds()
  - \ref get_q()
    - \ref av1_rc_regulate_q(
      const struct AV1_COMP *cpi, int target_bits_per_frame,
      int active_best_quality, int active_worst_quality,
      int width, int height) "av1_rc_regulate_q()"
    - \ref get_rate_correction_factor()
    - \ref set_rate_correction_factor()
    - \ref find_closest_qindex_by_rate()
- \ref av1_twopass_postencode_update()
  - \ref av1_rc_update_rate_correction_factors()

A second mechanism for control comes into play if there is a large rate miss
for the current frame (much too big or too small). This is a recode mechanism
which allows the current frame to be re-encoded one or more times with a
revised Q value. This obviously has significant implications for encode speed
and, in the case of RTC, for latency (hence it is not used for the RTC
pathway).

Whether or not a recode is allowed for a given frame depends on the selected
encode speed vs quality trade off. This is set on the command line using the
--cpu-used parameter which maps onto the \ref AV1_COMP.speed field in the main
compressor instance data structure.

The value of \ref AV1_COMP.speed, combined with the use case, is used to
populate the speed features data structure AV1_COMP.sf. In particular
\ref HIGH_LEVEL_SPEED_FEATURES.recode_loop determines the types of frames that
may be recoded and \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance is a rate
error trigger threshold.
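
A minimal sketch of a recode loop is given below. It is illustrative only: the
callback type, the toy encoder, the bisection of the Q range and the loop
limit are assumptions and do not reproduce the logic of the libaom recode
functions. It does show the two ingredients described above, namely a
tolerance test on the rate miss and a re-encode with a revised Q.

```c
#include <assert.h>

// Hypothetical callback standing in for a full frame encode; returns the
// number of bits produced at quantizer index q.
typedef int (*EncodeFn)(int q);

// Toy encoder used only to exercise the sketch: size falls inversely with q.
static int toy_encode(int q) { return 120000 / q; }

// Returns 1 when the produced size misses the target by more than
// tolerance_pct percent in either direction.
static int needs_recode(int bits, int target_bits, int tolerance_pct) {
  const int margin = target_bits * tolerance_pct / 100;
  return bits > target_bits + margin || bits < target_bits - margin;
}

// Re-encode with a bisected q until the size is within tolerance, the q
// range is exhausted or max_loops encodes have been made. Returns the last
// q that was encoded.
static int encode_with_recode_sketch(EncodeFn encode, int target_bits,
                                     int tolerance_pct, int q_low, int q_high,
                                     int max_loops, int *loops_used) {
  assert(q_low <= q_high && max_loops > 0);
  int q = (q_low + q_high) / 2;
  int loops = 0;
  for (;;) {
    const int bits = encode(q);
    ++loops;
    if (!needs_recode(bits, target_bits, tolerance_pct)) break;
    if (loops >= max_loops) break;
    if (bits > target_bits) {
      q_low = q + 1;  // frame too big: move to a coarser quantizer
    } else {
      q_high = q - 1;  // frame too small: move to a finer quantizer
    }
    if (q_low > q_high) break;
    q = (q_low + q_high) / 2;
  }
  *loops_used = loops;
  return q;
}
```

With the toy encoder, a target of 3000 bits, a 10 percent tolerance and a Q
range of 1 to 255, the loop settles on q = 40 after five encodes. Limiting
max_loops to 1 corresponds to disallowing recodes.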

For more information the reader is directed to the following functions:

- \ref encode_with_recode_loop()
- \ref encode_without_recode()
- \ref recode_loop_update_q()
- \ref recode_loop_test()
- \ref av1_set_speed_features_framesize_independent()
- \ref av1_set_speed_features_framesize_dependent()

\subsection architecture_enc_fixed_q Fixed Q Mode

There are two main fixed Q cases:
-# Fixed Q with adaptive qp offsets: same qp offset for each pyramid level
   in a given video, but these offsets are adaptive based on video content.
-# Fixed Q with fixed qp offsets: content-independent fixed qp offsets for
   each pyramid level.
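
The second case can be pictured as a simple table lookup, as in the sketch
below. The offset values, the number of levels and the function name are
invented for illustration and are not the values used by libaom. The sketch
assumes that level 0 is the base of the pyramid and receives the largest
offset (the lowest Q, hence the highest quality).

```c
#include <assert.h>

#define SKETCH_MAX_PYRAMID_LEVELS 6  // hypothetical limit for this sketch

// Hypothetical content-independent offsets, indexed by pyramid level.
static const int kFixedQpOffsets[SKETCH_MAX_PYRAMID_LEVELS] = { 40, 24, 16,
                                                                8,  4,  0 };

// Returns the q to use for a frame at the given pyramid level, clamped to
// the permitted [min_q, max_q] range.
static int fixed_q_for_level(int base_q, int level, int min_q, int max_q) {
  assert(level >= 0 && min_q <= max_q);
  if (level >= SKETCH_MAX_PYRAMID_LEVELS) level = SKETCH_MAX_PYRAMID_LEVELS - 1;
  int q = base_q - kFixedQpOffsets[level];
  if (q < min_q) q = min_q;
  if (q > max_q) q = max_q;
  return q;
}
```

In the adaptive case the table would instead be derived from the content of
the video being encoded, while still being shared by all frames at the same
pyramid level.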

The reader is also referred to the following functions:
- \ref av1_rc_pick_q_and_bounds()
- \ref rc_pick_q_and_bounds_no_stats_cbr()
- \ref rc_pick_q_and_bounds_no_stats()
- \ref rc_pick_q_and_bounds()

\section architecture_enc_frame_groups GF / ARF Frame Groups & Hierarchical Coding

\subsection architecture_enc_frame_groups_data Main Data Structures

The following are the main data structures referenced in this section
(see also \ref architecture_enc_data_structures):

- \ref AV1_COMP cpi (the main compressor instance data structure)
  - \ref AV1_COMP.rc (\ref RATE_CONTROL)

- \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first pass
stats)

\subsection architecture_enc_frame_groups_groups Frame Groups

To process a sequence/stream of video frames, the encoder divides the frames
into groups and encodes them sequentially (possibly dependent on previous
groups). In AV1 such a group is usually referred to as a golden frame group
(GF group) or sometimes an Alt-Ref (ARF) group or a group of pictures (GOP).
A GF group determines and stores the coding structure of the frames (for
example, frame type, usage of the hierarchical structure, usage of overlay
frames, etc.) and can be considered as the base unit to process the frames,
therefore playing an important role in the encoder.

The length of a specific GF group is arguably the most important aspect when
determining a GF group. This is because most GF group level decisions are
based on the frame characteristics, if not on the length itself directly.
Note that the GF group is always a group of consecutive frames, which means
the start and end of the group (so again, the length of it) determines which
frames are included in it and hence determines the characteristics of the GF
group. Therefore, in this document we will first discuss the GF group length
decision in libaom, followed by frame structure decisions when defining a GF
group with a certain length.

\subsection architecture_enc_gf_length GF / ARF Group Length Determination

The basic intuition of determining the GF group length is that it is usually
desirable to group together frames that are similar. Hence, we may choose
longer groups when consecutive frames are very alike and shorter ones when
they are very different.

The determination of the GF group length is done in function \ref
calculate_gf_length(). The following encoder use cases are supported:

<ul>
  <li><b>Single pass with look-ahead disabled (\ref has_no_stats_stage()):
  </b> in this case there is no information available on the following stream
  of frames, therefore the function will set the GF group length for the
  current and the following GF groups (a total number of MAX_NUM_GF_INTERVALS
  groups) to be the maximum value allowed.</li>

  <li><b>Single pass with look-ahead enabled (\ref AV1_PRIMARY.lap_enabled):</b>
  look-ahead processing is enabled for single pass, therefore there is a
  limited amount of information available regarding future frames. In this
  case the function will determine the length based on \ref FIRSTPASS_STATS
  (which is generated when processing the look-ahead buffer) for only the
  current GF group.</li>

  <li><b>Two pass:</b> the first pass in two-pass encoding collects the stats
  and will not call the function. In the second pass, the function tries to
  determine the GF group length of the current and the following GF groups (a
  total number of MAX_NUM_GF_INTERVALS groups) based on the first-pass
  statistics. Note that as we will be discussing later, such decisions may not
  be accurate and can be changed later.</li>
</ul>

Except for the first trivial case where there is no prior knowledge of the
following frames, the function \ref calculate_gf_length() tries to determine the
GF group length based on the first pass statistics. The determination is divided
into two parts:

<ol>
  <li>Baseline decision based on accumulated statistics: this part of the function
  iterates through the first-pass statistics of the following frames and
  accumulates the statistics with function accumulate_next_frame_stats.
  The accumulated statistics are then used to determine whether the
  correlation in the GF group has dropped too much in function detect_gf_cut.
  If detect_gf_cut returns non-zero, or if we've reached the end of
  first-pass statistics, the baseline decision is set at the current point.</li>

  <li>If we are not at the end of the first-pass statistics, the next part will
  try to refine the baseline decision. This algorithm is based on the analysis
  of first-pass stats. It tries to cut the groups in stable regions or
  relatively stable points. Also it tries to avoid cutting in a blending
  region.</li>
</ol>
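
The baseline decision in the first step can be sketched roughly as below. This
is a simplified stand-in rather than the libaom logic: a single per frame
correlation value takes the place of the first-pass statistics, a running
product plays the role of the accumulated statistics, and a fixed threshold
stands in for the test made by detect_gf_cut. All names and parameters are
hypothetical.

```c
#include <assert.h>

// Returns a baseline GF group length. frame_corr[i] is a hypothetical
// measure, in the range 0 to 1, of how well frame i is predicted from the
// frame before it.
static int baseline_gf_length(const double *frame_corr, int frames_available,
                              int min_len, int max_len,
                              double min_accumulated_corr) {
  assert(min_len >= 1 && min_len <= max_len);
  double accumulated = 1.0;
  int len = 0;
  // Stop at the end of the available statistics or at the maximum length.
  while (len < frames_available && len < max_len) {
    accumulated *= frame_corr[len];
    ++len;
    // Stand-in for the cut test: stop once correlation across the group
    // has decayed too far, but never before the minimum length.
    if (len >= min_len && accumulated < min_accumulated_corr) break;
  }
  return len;
}
```

With correlations of 0.9 for every frame except a value of 0.5 at the fifth
frame, and a threshold of 0.4, the group is cut after five frames. If only
three frames of statistics are available the length is three.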

As mentioned, for two-pass encoding, the function \ref
calculate_gf_length() tries to determine the length of as many as
MAX_NUM_GF_INTERVALS groups. The decisions are stored in
\ref PRIMARY_RATE_CONTROL.gf_intervals[]. The variables
\ref RATE_CONTROL.intervals_till_gf_calculate_due and
\ref PRIMARY_RATE_CONTROL.gf_intervals[] help with managing and updating the stored
decisions. In the function \ref define_gf_group(), the corresponding
stored length decision will be used to define the current GF group.

When the maximum GF group length is greater than or equal to 32, the encoder
will enforce an extra layer to determine whether to use maximum GF length of 32
or 16 for every GF group. In such a case, \ref calculate_gf_length() is
first called with the original maximum length (>=32). Afterwards,
\ref av1_tpl_setup_stats() is called to analyze the determined GF group
and compare the reference to the last frame and the middle frame. If it is
decided that we should use a maximum GF length of 16, the function
\ref calculate_gf_length() is called again with the updated maximum
length, and it only sets the length for a single GF group
(\ref RATE_CONTROL.intervals_till_gf_calculate_due is set to 1). This process
is shown below.

\image html tplgfgroupdiagram.png "" width=40%

Before encoding each frame, the encoder checks
\ref RATE_CONTROL.frames_till_gf_update_due. If it is zero, indicating
processing of the current GF group is done, the encoder will check whether
\ref RATE_CONTROL.intervals_till_gf_calculate_due is zero. If it is, as
discussed above, \ref calculate_gf_length() is called with original
maximum length. If it is not zero, then the GF group length value stored
in \ref PRIMARY_RATE_CONTROL.gf_intervals[\ref PRIMARY_RATE_CONTROL.cur_gf_index] is used
(subject to change as discussed above).
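
The per frame check described above can be sketched as a small state machine.
The structure below reuses the field names from the text, but the structure
itself, the helper functions and the exact order in which the counters are
updated are assumptions made for illustration, and the stand-in for
calculate_gf_length() simply fills every slot with one fixed length.

```c
#include <assert.h>

#define SKETCH_MAX_GF_INTERVALS 4  // stands in for MAX_NUM_GF_INTERVALS

// Hypothetical container for the counters named in the text.
typedef struct {
  int frames_till_gf_update_due;
  int intervals_till_gf_calculate_due;
  int cur_gf_index;
  int gf_intervals[SKETCH_MAX_GF_INTERVALS];
} GfStateSketch;

// Stand-in for calculate_gf_length(): stores the same length in every slot.
static void recalculate_lengths(GfStateSketch *s, int length) {
  assert(length > 0);
  for (int i = 0; i < SKETCH_MAX_GF_INTERVALS; ++i) s->gf_intervals[i] = length;
  s->intervals_till_gf_calculate_due = SKETCH_MAX_GF_INTERVALS;
  s->cur_gf_index = 0;
}

// Called before each frame. Returns 1 if a new GF group was started.
static int before_frame(GfStateSketch *s, int fresh_length) {
  int started = 0;
  if (s->frames_till_gf_update_due == 0) {
    // The current group is finished. Recompute the stored lengths only
    // when all previously stored decisions have been used up.
    if (s->intervals_till_gf_calculate_due == 0) {
      recalculate_lengths(s, fresh_length);
    }
    s->frames_till_gf_update_due = s->gf_intervals[s->cur_gf_index];
    ++s->cur_gf_index;
    --s->intervals_till_gf_calculate_due;
    started = 1;
  }
  --s->frames_till_gf_update_due;
  return started;
}
```

Starting from a zeroed state with a fixed group length of 2, nine calls to
before_frame() start five groups. The stored lengths are computed on the first
call and again on the ninth, once all four stored decisions have been
consumed.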

\subsection architecture_enc_gf_structure Defining a GF Group's Structure

The function \ref define_gf_group() defines the frame structure as well
as other GF group level parameters (e.g. bit allocation) once the length of
the current GF group is determined.

The function first iterates through the first pass statistics in the GF group to
accumulate various stats, using accumulate_this_frame_stats() and
accumulate_next_frame_stats(). The accumulated statistics are then used to
determine the use of an ALTREF frame along with other properties of the
GF group. The values of \ref PRIMARY_RATE_CONTROL.cur_gf_index, \ref
RATE_CONTROL.intervals_till_gf_calculate_due and \ref
RATE_CONTROL.frames_till_gf_update_due are also updated accordingly.

The function \ref av1_gop_setup_structure() is called at the end to determine
the frame layers and reference maps in the GF group, where the
construct_multi_layer_gf_structure() function sets the frame update types for
each frame and the group structure.

- If ALTREF frames are allowed for the GF group: the first frame is set to
  KF_UPDATE, GF_UPDATE or ARF_UPDATE. The last frame of the GF group is set to
  OVERLAY_UPDATE. Then in set_multi_layer_params(), frame update
  types are determined recursively in a binary tree fashion, and assigned to
  give the final IBBB structure for the group.
  - If the current branch has more than 2 frames and we have not reached
    maximum layer depth, then the middle frame is set as INTNL_ARF_UPDATE,
    and the left and right branches are processed recursively.
  - If the current branch has fewer than 3 frames, or we have reached maximum
    layer depth, then every frame in the branch is set to LF_UPDATE.

- If ALTREF frame is not allowed for the GF group: the frames are set
  as LF_UPDATE. This basically forms an IPPP GF group structure.
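
The recursive assignment for the ALTREF case can be sketched as below. This is
an illustration of the binary tree recursion only: the enum, the function name
and the half-open frame range are assumptions, and the real
set_multi_layer_params() also records other per frame information.

```c
#include <assert.h>

// Hypothetical subset of the frame update types named in the text.
typedef enum { SK_LF_UPDATE, SK_INTNL_ARF_UPDATE } SketchUpdateType;

// Assigns update types to the frames in the half-open range [start, end),
// which holds the frames between the first frame and the final ALTREF.
static void set_layers_sketch(SketchUpdateType *types, int start, int end,
                              int depth, int max_depth) {
  const int num_frames = end - start;
  assert(num_frames >= 0);
  if (num_frames < 3 || depth > max_depth) {
    // Small branch or maximum depth reached: all frames are leaf frames.
    for (int i = start; i < end; ++i) types[i] = SK_LF_UPDATE;
    return;
  }
  // The middle frame becomes an internal ARF and the two halves on either
  // side of it are processed recursively one layer deeper.
  const int mid = (start + end - 1) / 2;
  types[mid] = SK_INTNL_ARF_UPDATE;
  set_layers_sketch(types, start, mid, depth + 1, max_depth);
  set_layers_sketch(types, mid + 1, end, depth + 1, max_depth);
}
```

For seven frames, a starting depth of 1 and a maximum depth of 3, frames 1, 3
and 5 become internal ARFs and the remaining frames are leaf frames. With a
maximum depth of 1 only frame 3 becomes an internal ARF.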

As mentioned, the encoder may use temporal dependency modelling (TPL - see \ref
architecture_enc_tpl) to determine whether we should use a maximum length of 32
or 16 for the current GF group. This requires calls to \ref define_gf_group()
but should not change other settings (since it is in essence a trial). This
special case is indicated by setting the parameter <b>is_final_pass</b> to
zero.

For single pass encodes where look-ahead processing is disabled
(\ref AV1_PRIMARY.lap_enabled = 0), \ref define_gf_group_pass0() is used
instead of \ref define_gf_group().

\subsection architecture_enc_kf_groups Key Frame Groups

A special constraint for GF group length is the location of the next keyframe
(KF). The frames between two KFs are referred to as a KF group. Each KF group
can be encoded and decoded independently. Because of this, a GF group cannot
span beyond a KF and the location of the next KF is set as a hard boundary
for GF group length.

<ul>
  <li>For two-pass encoding \ref RATE_CONTROL.frames_to_key controls when to
  encode a key frame. When it is zero, the current frame is a keyframe and
  the function \ref find_next_key_frame() is called. This in turn calls
  \ref define_kf_interval() to work out where the next key frame should
  be placed.</li>

  <li>For single-pass with look-ahead enabled, \ref define_kf_interval()
  is called whenever a GF group update is needed (when
  \ref RATE_CONTROL.frames_till_gf_update_due is zero). This is because
  generally KFs are more widely spaced and the look-ahead buffer is usually
  not long enough.</li>

  <li>For single-pass with look-ahead disabled, the KFs are placed according
  to the command line parameter <b>--kf-max-dist</b> (The above two cases are
  also subject to this constraint).</li>
</ul>

The function \ref define_kf_interval() tries to detect a scenecut.
If a scenecut within kf-max-dist is detected, then it is set as the next
keyframe. Otherwise the given maximum value is used.
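
The placement rule in the last paragraph can be sketched as follows. The
function name and the array of per frame scene cut flags are hypothetical; in
the encoder the scene cut decision is derived from first-pass statistics
rather than supplied directly.

```c
#include <assert.h>

// Returns the distance, in frames, from the current key frame to the next.
// is_scenecut[i] is a hypothetical flag for the frame i frames after the
// current key frame; entry 0 is the current key frame and is skipped.
static int next_key_frame_distance(const int *is_scenecut,
                                   int frames_available, int kf_max_dist) {
  assert(kf_max_dist > 0);
  for (int i = 1; i < frames_available && i < kf_max_dist; ++i) {
    // A scene cut inside the window becomes the next key frame.
    if (is_scenecut[i]) return i;
  }
  // No scene cut was found in the window, so use the maximum distance.
  return kf_max_dist;
}
```

With a scene cut flagged four frames ahead, a maximum distance of 10 places
the next key frame at that cut, whereas a maximum distance of 3 places it
three frames ahead.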

\section architecture_enc_tpl Temporal Dependency Modelling

The temporal dependency model runs at the beginning of each GOP. It builds the
motion trajectory within the GOP in units of 16x16 blocks. The temporal
dependency of a 16x16 block is evaluated as the predictive coding gains it
contributes to its trailing motion trajectory. This temporal dependency model
reflects how important a coding block is for the coding efficiency of the
overall GOP. It is hence used to scale the Lagrangian multiplier used in the
rate-distortion optimization framework.

\subsection architecture_enc_tpl_config Configurations

The temporal dependency model and its applications are by default turned on in
the libaom encoder for the VoD use case. To disable it, use --tpl-model=0 in
the aomenc configuration.

\subsection architecture_enc_tpl_algoritms Algorithms

The scheme works in the reverse frame processing order over the source frames,
propagating information from future frames back to the current frame. For each
frame, a propagation step is run for each 16x16 block (MB). It operates as
follows:

<ul>
  <li> Estimate the intra prediction cost in terms of sum of absolute Hadamard
  transform difference (SATD) noted as intra_cost. It also loads the motion
  information available from the first-pass encode and estimates the inter
  prediction cost as inter_cost. Due to the use of hybrid inter/intra
  prediction mode, the inter_cost value is further upper bounded by
  intra_cost. A propagation cost variable is used to collect all the
  information flowed back from future processing frames. It is initialized as
  0 for all the blocks in the last processing frame in a group of pictures
  (GOP).</li>

  <li> The fraction of information from a current block to be propagated towards
  its reference block is estimated as:
\f[
   propagation\_fraction = (1 - inter\_cost/intra\_cost)
\f]
  It reflects the fraction by which the motion compensated reference would
  reduce the prediction error.</li>

  <li> The total amount of information the current block contributes to the GOP
  is estimated as intra_cost + propagation_cost. The information that it
  propagates towards its reference block is captured by:

\f[
   propagation\_amount =
   (intra\_cost + propagation\_cost) * propagation\_fraction
\f]</li>

  <li> Note that the reference block may not necessarily sit on the grid of
  16x16 blocks. The propagation amount is hence dispensed to all the blocks
  that overlap with the reference block. The corresponding block in the
  reference frame accumulates its own propagation cost as it receives back
  propagation.

\f[
   propagation\_cost = propagation\_cost +
   (\frac{overlap\_area}{(16*16)} * propagation\_amount)
\f]</li>

  <li> In the final encoding stage, the distortion propagation factor of a block
  is evaluated as \f$(1 + \frac{propagation\_cost}{intra\_cost})\f$, where the second term
  captures its impact on later frames in a GOP.</li>

  <li> The Lagrangian multiplier is adapted at the 64x64 block level. For every
  64x64 block in a frame, we have a distortion propagation factor:

\f[
  dist\_prop[i] = 1 + \frac{propagation\_cost[i]}{intra\_cost[i]}
\f]

  where i denotes the block index in the frame. We also have the frame level
  distortion propagation factor:

\f[
  dist\_prop = 1 +
  \frac{\sum_{i}propagation\_cost[i]}{\sum_{i}intra\_cost[i]}
\f]

  which is used to normalize the propagation factor at the 64x64 block level. The
  Lagrangian multiplier is hence adapted as:

\f[
  \lambda[i] = \lambda_0 * \frac{dist\_prop}{dist\_prop[i]}
\f]

  where \f$\lambda_0\f$ is the multiplier associated with the frame level QP.
  The 64x64 block level QP is scaled according to the Lagrangian multiplier.</li>
</ul>
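
The formulas above translate fairly directly into code. The sketch below is a
floating point illustration written for this guide and is not the libaom
implementation: the structure and function names are hypothetical, and the
real code works on its own statistics buffers and number formats.

```c
#include <assert.h>

// Hypothetical per block statistics for a 16x16 block.
typedef struct {
  double intra_cost;
  double inter_cost;
  double propagation_cost;
} TplBlockSketch;

// Amount of information a block propagates towards its reference block:
// (intra_cost + propagation_cost) * (1 - inter_cost / intra_cost).
static double propagation_amount(const TplBlockSketch *b) {
  assert(b->intra_cost > 0.0);
  // inter_cost is upper bounded by intra_cost.
  const double inter =
      b->inter_cost < b->intra_cost ? b->inter_cost : b->intra_cost;
  const double fraction = 1.0 - inter / b->intra_cost;
  return (b->intra_cost + b->propagation_cost) * fraction;
}

// A grid block in the reference frame receives a share of the amount in
// proportion to its overlap, in pixels, with the reference block.
static void propagate_to_reference(TplBlockSketch *ref, double amount,
                                   int overlap_area) {
  assert(overlap_area >= 0 && overlap_area <= 16 * 16);
  ref->propagation_cost += (double)overlap_area / (16 * 16) * amount;
}

// Distortion propagation factor: 1 + propagation_cost / intra_cost.
static double dist_prop(double propagation_cost, double intra_cost) {
  assert(intra_cost > 0.0);
  return 1.0 + propagation_cost / intra_cost;
}

// Block level multiplier: the frame level multiplier scaled by the ratio of
// the frame level factor to the block level factor.
static double block_lambda(double frame_lambda, double frame_dist_prop,
                           double block_dist_prop) {
  return frame_lambda * frame_dist_prop / block_dist_prop;
}
```

For a block with intra_cost 100, inter_cost 25 and no propagation cost of its
own, the propagation amount is 75. A reference grid block that overlaps the
reference block by 64 pixels receives 18.75 of it. A block whose distortion
propagation factor is twice the frame level factor has its multiplier halved,
so blocks that matter more to later frames are coded with a smaller
multiplier.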

\subsection architecture_enc_tpl_keyfun Key Functions and Data Structures

The reader is also referred to the following functions and data structures:

- \ref TplParams
- \ref av1_tpl_setup_stats() builds the TPL model.
- \ref setup_delta_q() assigns different quantization parameters to each
  superblock based on its TPL weight.

\section architecture_enc_partitions Block Partition Search

 A frame is first split into tiles in \ref encode_tiles(), with each tile
 compressed by av1_encode_tile(). Then a tile is processed in superblock rows
 via \ref av1_encode_sb_row() and then \ref encode_sb_row().

 The partition search processes superblocks sequentially in \ref
 encode_sb_row(). Two search modes are supported, depending upon the encoding
 configuration: \ref encode_nonrd_sb() is for 1-pass and real-time modes,
 while \ref encode_rd_sb() performs more exhaustive rate distortion based
 searches.

 Partition search over the recursive quad-tree space is implemented by
 recursive calls to \ref av1_nonrd_use_partition(),
 \ref av1_rd_use_partition(), or av1_rd_pick_partition() and returning best
 options for sub-trees to their parent partitions.
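
 The recursive structure can be sketched as below. This is a deliberately
 reduced illustration: only the choice between coding a block whole and
 splitting it into four squares is considered, the cost callback and toy cost
 function are hypothetical, and none of the pruning or additional partition
 shapes used by the real search are modelled.

```c
#include <assert.h>

// Hypothetical callback returning the cost of coding the square block at
// (x, y) with the given size without splitting it further.
typedef int (*BlockCostFn)(int x, int y, int size);

// Toy cost used only to exercise the sketch: the block at the origin is
// hard to code as a whole, every other block is cheap.
static int toy_block_cost(int x, int y, int size) {
  const int contains_origin = (x == 0 && y == 0);
  return contains_origin ? size * size : 10;
}

// Returns the best cost for the block, comparing the unsplit option with
// the sum of the best costs of its four quadrants.
static int best_partition_cost(BlockCostFn cost, int x, int y, int size,
                               int min_size) {
  assert(size >= min_size && min_size > 0);
  const int none_cost = cost(x, y, size);
  if (size <= min_size) return none_cost;
  const int half = size / 2;
  // Each sub-tree returns its best option to the parent partition.
  const int split_cost =
      best_partition_cost(cost, x, y, half, min_size) +
      best_partition_cost(cost, x + half, y, half, min_size) +
      best_partition_cost(cost, x, y + half, half, min_size) +
      best_partition_cost(cost, x + half, y + half, half, min_size);
  return split_cost < none_cost ? split_cost : none_cost;
}
```

 With the toy cost, a 64x64 block at the origin and a minimum size of 8, the
 search splits repeatedly towards the origin and returns a cost of 154 rather
 than the unsplit cost of 4096. A 32x32 block away from the origin is left
 unsplit at a cost of 10.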

 In libaom, the partition search sits on top of the mode search (predictor,
 transform, etc.), instead of being a separate module. The interface of mode
 search is \ref pick_sb_modes(), which connects the partition_search with
 \ref architecture_enc_inter_modes and \ref architecture_enc_intra_modes. To
 make good decisions, reconstruction is also required in order to build
 references and contexts. This is implemented by \ref encode_sb() at the
 sub-tree level and \ref encode_b() at coding block level.

 See also \ref partition_search

\section architecture_enc_intra_modes Intra Mode Search

AV1 also provides 71 different intra prediction modes, i.e. modes that predict
only based upon information in the current frame with no dependency on
previous or future frames. For key frames, where this independence from any
other frame is a defining requirement, and for other cases where intra only
frames are required, the encoder need only consider these modes in the rate
distortion loop.

Even so, in most use cases, searching all possible intra prediction modes for
every block and partition size is not practical and some pruning of the search
tree is necessary.

For the rate distortion optimized case, the main top level function
responsible for selecting the intra prediction mode for a given block is
\ref av1_rd_pick_intra_mode_sb(). The reader's attention is also drawn to the
functions \ref hybrid_intra_mode_search() and \ref av1_nonrd_pick_intra_mode()
which may be used where encode speed is critical. The choice between the
rd path and the non rd or hybrid paths depends on the encoder use case and the
\ref AV1_COMP.speed parameter. Further fine control of the speed vs quality
trade off is provided by means of fields in \ref AV1_COMP.sf (which has type
\ref SPEED_FEATURES).

Note that some intra modes are only considered for specific use cases or
types of video. For example the palette based prediction modes are often
valuable for graphics or screen share content but not for natural video.
(See \ref av1_search_palette_mode())

See also \ref intra_mode_search for more details.

\section architecture_enc_inter_modes Inter Prediction Mode Search

For inter frames, where we also allow prediction using one or more previously
coded frames (which may chronologically speaking be past or future frames or
non-display reference buffers such as ARF frames), the size of the search tree
that needs to be traversed, to select a prediction mode, is considerably
larger.

In addition to the 71 possible intra modes we also need to consider 56 single
frame inter prediction modes (7 reference frames x 4 modes x 2 for OBMC
(overlapped block motion compensation)), 12768 compound inter prediction modes
(these are modes that combine inter predictors from two reference frames) and
36708 compound inter / intra prediction modes.

As with the intra mode search, libaom supports an RD based pathway and a non
rd pathway for speed critical use cases. The entry points for these two cases
are \ref av1_rd_pick_inter_mode() and \ref av1_nonrd_pick_inter_mode_sb()
respectively.

Various heuristics and predictive strategies are used to prune the search tree
with fine control provided through the speed features parameter in the main
compressor instance data structure \ref AV1_COMP.sf.

It is worth noting that some prediction modes incur a much larger rate cost
than others (ignoring for now the cost of coding the error residual). For
example, a compound mode that requires the encoder to specify two reference
frames and two new motion vectors will almost inevitably have a higher rate
cost than a simple inter prediction mode that uses a predicted or 0,0 motion
vector. As such, if we have already found a mode for the current block that
has a low RD cost, we can skip a large number of the possible modes on the
basis that even if the error residual is 0 the inherent rate cost of the
mode itself will guarantee that it is not chosen.
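The early-out described above can be sketched as follows. The cost formula is
a simplified assumption (distortion plus lambda-weighted rate) and the function
names are hypothetical; libaom uses its own fixed-point cost macros and many
additional pruning conditions.

```c
#include <stdbool.h>
#include <stdint.h>

// Simplified RD cost: distortion plus rate weighted by lambda.
int64_t toy_rd_cost(int64_t rate, int64_t dist, int64_t lambda) {
  return dist + lambda * rate;
}

// Returns true if a mode can be skipped without evaluating its residual:
// even with zero distortion and zero residual rate, the rate needed to
// signal the mode itself already costs at least as much as the best mode
// found so far.
bool can_skip_mode(int64_t mode_rate, int64_t lambda, int64_t best_rd) {
  return toy_rd_cost(mode_rate, 0, lambda) >= best_rd;
}
```

With lambda = 10 and a best cost so far of 400, a mode needing 50 units of
signalling rate is skipped (500 >= 400), while one needing 5 units (50 < 400)
still has to be evaluated.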

See also \ref inter_mode_search for more details.

\section architecture_enc_tx_search Transform Search

AV1 implements the transform stage using 4 separable 1-d transforms (DCT,
ADST, FLIPADST and IDTX, where FLIPADST is the reversed version of ADST
and IDTX is the identity transform) which can be combined to give 16 2-d
combinations.

These combinations can be applied at 19 different scales from 64x64 pixels
down to 4x4 pixels.
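The two counts quoted above can be reproduced with a short enumeration. The
names below are illustrative only, and the rule that transform sizes are the
width x height pairs from {4, 8, 16, 32, 64} with an aspect ratio of at most
4:1 is stated here as an assumption about how the figure of 19 arises.

```c
#include <stdio.h>

enum { NUM_1D_TX = 4, NUM_TX_DIMS = 5 };

// 1-d transform names as listed in the text; the arrays are illustrative.
static const char *const kTx1dNames[NUM_1D_TX] = { "DCT", "ADST", "FLIPADST",
                                                   "IDTX" };
static const int kTxDims[NUM_TX_DIMS] = { 4, 8, 16, 32, 64 };

// Counts every (vertical, horizontal) pairing of the 1-d transforms and
// optionally prints each pairing.
int count_2d_tx_types(int print) {
  int count = 0;
  for (int v = 0; v < NUM_1D_TX; ++v) {
    for (int h = 0; h < NUM_1D_TX; ++h) {
      if (print) printf("%s x %s\n", kTx1dNames[v], kTx1dNames[h]);
      ++count;
    }
  }
  return count;
}

// Counts width x height pairs whose aspect ratio is at most 4:1.
int count_tx_sizes(void) {
  int count = 0;
  for (int w = 0; w < NUM_TX_DIMS; ++w) {
    for (int h = 0; h < NUM_TX_DIMS; ++h) {
      const int hi = kTxDims[w] > kTxDims[h] ? kTxDims[w] : kTxDims[h];
      const int lo = kTxDims[w] + kTxDims[h] - hi;
      if (hi <= 4 * lo) ++count;
    }
  }
  return count;
}
```

Treating every type as available at every size overstates the real search
space, since AV1 restricts the transform types allowed for the largest sizes.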

This gives rise to a large number of possible candidate transform options
for coding the residual error after prediction. An exhaustive rate-distortion
based evaluation of all candidates would not be practical from a speed
perspective in a production encoder implementation. Hence libaom adopts a
number of strategies to prune the selection of both the transform size and
transform type.

There are a number of strategies that have been tested and implemented in
libaom including:

- A statistics based approach that looks at the frequency with which certain
  combinations are used in a given context and prunes out very unlikely
  candidates. It is worth noting here that some size candidates can be pruned
  out immediately based on the size of the prediction partition. For example it
  does not make sense to use a transform size that is larger than the
  prediction partition size but also a very large prediction partition size is
  unlikely to be optimally paired with small transforms.

- A machine learning based model

- A method that initially tests candidates using a fast algorithm that skips
  entropy encoding and uses an estimated cost model to choose a reduced subset
  for full RD analysis. This subject is covered more fully in a paper authored
  by Bohan Li, Jingning Han, and Yaowu Xu titled: <b>Fast Transform Type
  Selection Using Conditional Laplace Distribution Based Rate Estimation</b>

<b>TODO Add link to paper when available</b>

See also \ref transform_search for more details.

\section architecture_post_enc_filt Post Encode Loop Filtering

AV1 supports three types of post encode <b>in loop</b> filtering to improve
the quality of the reconstructed video.

- <b>Deblocking Filter</b> The first of these is a fairly traditional boundary
  deblocking filter that attempts to smooth discontinuities that may occur at
  the boundaries between blocks. See also \ref in_loop_filter.

- <b>CDEF Filter</b> The constrained directional enhancement filter (CDEF)
  allows the codec to apply a non-linear deringing filter along certain
  (potentially oblique) directions. A primary filter is applied along the
  selected direction, whilst a secondary filter is applied at 45 degrees to
  the primary direction. (See also \ref in_loop_cdef and
  <a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)

- <b>Loop Restoration Filter</b> The loop restoration filter is applied after
  any prior post filtering stages. It acts on units of either 64 x 64,
  128 x 128, or 256 x 256 pixel blocks, referred to as loop restoration units.
  Each unit can independently select either to bypass filtering, use a Wiener
  filter, or use a self-guided filter. (See also \ref in_loop_restoration and
  <a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)

\section architecture_entropy Entropy Coding

\subsection architecture_entropy_aritmetic Arithmetic Coder

VP9 used a binary arithmetic coder to encode symbols, where the probability
of a 1 or 0 at each decision node was based on a context model that took
into account recently coded values (for example previously coded coefficients
in the current block). A mechanism existed to update the context model each
frame, either explicitly in the bitstream, or implicitly at both the encoder
and decoder based on the observed frequency of different outcomes in the
previous frame. VP9 also supported separate context models for different types
of frame (e.g. inter coded frames and key frames).

In contrast, AV1 uses an M-ary symbol arithmetic coder to compress the syntax
elements, where integer \f$M\in[2, 14]\f$. This approach is based upon the entropy
coding strategy used in the Daala video codec and allows for some bit-level
parallelism in its implementation. AV1 also has an extended context model and
allows for updates to the probabilities on a per symbol basis as opposed to
the per frame strategy in VP9.

To improve the performance / throughput of the arithmetic encoder, especially
in hardware implementations, the probability model is updated and maintained
at 15-bit precision, but the arithmetic encoder only uses the most significant
9 bits when encoding a symbol. A more detailed discussion of the algorithm
and design constraints can be found in
<a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.
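To make the idea of per symbol adaptation at 15-bit precision more concrete,
here is a simplified sketch of a cumulative distribution (CDF) update. It is
not libaom's actual update rule: the fixed adaptation rate, the CDF layout and
all names are assumptions chosen for illustration.

```c
#include <stdint.h>

enum { PROB_BITS = 15, PROB_TOP = 1 << PROB_BITS, CODER_BITS = 9 };

// cdf[i] holds P(symbol <= i) scaled to 15 bits, so cdf[num_symbols - 1] is
// always PROB_TOP. After coding `symbol`, the distribution is nudged towards
// it. `rate` controls the adaptation speed (a hypothetical fixed value here).
void adapt_cdf(uint16_t *cdf, int num_symbols, int symbol, int rate) {
  for (int i = 0; i < num_symbols - 1; ++i) {
    if (i >= symbol) {
      cdf[i] = (uint16_t)(cdf[i] + ((PROB_TOP - cdf[i]) >> rate));
    } else {
      cdf[i] = (uint16_t)(cdf[i] - (cdf[i] >> rate));
    }
  }
}

// Keeps only the most significant 9 of the 15 probability bits, mirroring
// the reduced precision used when a symbol is actually coded.
int coder_precision(uint16_t prob15) {
  return prob15 >> (PROB_BITS - CODER_BITS);
}
```

Starting from a uniform 4-symbol distribution and coding symbol 1 with rate 4,
the probability mass of symbol 1 grows from 8192 to 9728 (out of 32768) while
the others shrink.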

TODO add references to key functions / files.

As with VP9, a mechanism exists in AV1 to encode some elements into the
bitstream as uncompressed bits or literal values, without using the arithmetic
coder. Examples include some frame and sequence header values, where it is
beneficial to be able to read the values directly.

TODO add references to key functions / files.

\subsection architecture_entropy_coef Transform Coefficient Coding and Optimization
\image html coeff_coding.png "" width=70%

\subsubsection architecture_entropy_coef_what Transform coefficient coding
Transform coefficient coding is where the encoder compresses a quantized version
of prediction residue into the bitstream.

\paragraph architecture_entropy_coef_prepare Preparation - transform and quantize
Before the entropy coding stage, the encoder decouples the pixel-to-pixel
correlation of the prediction residue by transforming the residue from the
spatial domain to the frequency domain. Then the encoder quantizes the transform
coefficients to make the coefficients ready for entropy coding.

\paragraph architecture_entropy_coef_coding The coding process
The encoder uses \ref av1_write_coeffs_txb() to write the coefficients of
a transform block into the bitstream.
The coding process has three stages.
1. The encoder will code transform block skip flag (txb_skip). If the skip flag is
off, then the encoder will code the end of block position (eob) which is the scan
index of the last non-zero coefficient plus one.
2. Second, the encoder will code lower magnitude levels of each coefficient in
reverse scan order.
3. Finally, the encoder will code the sign and higher magnitude levels for each
coefficient if they are available.
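The definition of the end of block position in stage 1 can be illustrated with
a small helper. The function name and the assumption that the coefficients are
already arranged in scan order are illustrative and not taken from libaom.

```c
#include <stdint.h>

// Returns the end of block position: the scan index of the last non-zero
// coefficient plus one, or 0 when every coefficient is zero (the case in
// which the skip flag would be set). qcoeff_scan is assumed to hold the
// quantized coefficients already arranged in scan order.
int find_eob(const int32_t *qcoeff_scan, int num_coeffs) {
  for (int i = num_coeffs - 1; i >= 0; --i) {
    if (qcoeff_scan[i] != 0) return i + 1;
  }
  return 0;
}
```

For the scan-ordered coefficients {7, 0, -2, 0, 1, 0, 0, 0} the last non-zero
value sits at index 4, so the eob is 5 and stages 2 and 3 only need to visit
the first five positions.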

Related functions:
- \ref av1_write_coeffs_txb()
- write_inter_txb_coeff()
- \ref av1_write_intra_coeffs_mb()

\paragraph architecture_entropy_coef_context Context information
To improve the compression efficiency, the encoder uses several context models
tailored for transform coefficients to capture the correlations between coding
symbols. Most of the context models are built to capture the correlations
between the coefficients within the same transform block. However, transform
block skip flag (txb_skip) and the sign of dc coefficient (dc_sign) require
context info from neighboring transform blocks.

Here is how context info spreads between transform blocks. Before coding a
transform block, the encoder will use get_txb_ctx() to collect the context
information from neighboring transform blocks. Then the context information
will be used for coding transform block skip flag (txb_skip) and the sign of
dc coefficient (dc_sign). After the transform block is coded, the encoder will
extract the context info from the current block using
\ref av1_get_txb_entropy_context(). Then the encoder will store the context info
into a byte (uint8_t) using av1_set_entropy_contexts(). The encoder will use
the context info to code other transform blocks.

Related functions:
- \ref av1_get_txb_entropy_context()
- av1_set_entropy_contexts()
- get_txb_ctx()
- \ref av1_update_intra_mb_txb_context()

\subsubsection architecture_entropy_coef_rd RD optimization
Besides the actual entropy coding, the encoder uses several utility functions
to make optimal RD decisions.

\paragraph architecture_entropy_coef_cost Entropy cost
The encoder uses \ref av1_cost_coeffs_txb() or \ref av1_cost_coeffs_txb_laplacian()
to estimate the entropy cost of a transform block. Note that
\ref av1_cost_coeffs_txb() is slower but accurate whereas
\ref av1_cost_coeffs_txb_laplacian() is faster but less accurate.

Related functions:
- \ref av1_cost_coeffs_txb()
- \ref av1_cost_coeffs_txb_laplacian()
- \ref av1_cost_coeffs_txb_estimate()

\paragraph architecture_entropy_coef_opt Quantized level optimization
Besides computing entropy cost, the encoder also uses \ref av1_optimize_txb()
to adjust the coefficient's quantized levels to achieve optimal RD trade-off.
In \ref av1_optimize_txb(), the encoder goes through each quantized
coefficient and lowers the quantized coefficient level by one if the action
yields a better RD score.
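The "lower the level by one if it helps" idea can be sketched with a toy
model. The rate model, the uniform quantizer step and all names below are
assumptions made for illustration; they do not reflect the cost tables or the
control flow of the real function.

```c
#include <stdint.h>
#include <stdlib.h>

// Toy rate model (an assumption, not libaom's cost tables): larger
// magnitudes cost more bits and a zero level costs nothing.
static int64_t toy_rate(int level) {
  return level == 0 ? 0 : 2 * (int64_t)abs(level) + 1;
}

// Toy RD score: squared reconstruction error plus lambda-weighted rate.
static int64_t toy_rd(int32_t coeff, int level, int qstep, int64_t lambda) {
  const int64_t err = (int64_t)coeff - (int64_t)level * qstep;
  return err * err + lambda * toy_rate(level);
}

// Visits each quantized level and moves it one step towards zero when that
// yields a strictly better RD score. Returns the number of levels changed.
int optimize_levels(const int32_t *coeff, int *level, int n, int qstep,
                    int64_t lambda) {
  int changed = 0;
  for (int i = 0; i < n; ++i) {
    if (level[i] == 0) continue;
    const int lower = level[i] > 0 ? level[i] - 1 : level[i] + 1;
    if (toy_rd(coeff[i], lower, qstep, lambda) <
        toy_rd(coeff[i], level[i], qstep, lambda)) {
      level[i] = lower;
      ++changed;
    }
  }
  return changed;
}
```

With a step of 10 and lambda of 20, a coefficient of 26 quantized to level 3
is lowered to 2 (score 136 versus 156), whereas a coefficient of 100 at level
10 is left alone because lowering it would add more distortion than it saves
in rate.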

Related functions:
- \ref av1_optimize_txb()

All the related functions are listed in \ref coefficient_coding.

\section architecture_simd SIMD usage

In order to efficiently encode video on modern platforms, it is necessary to
implement optimized versions of many core encoding and decoding functions using
architecture-specific SIMD instructions.

Functions which have optimized implementations will have multiple variants
in the code, each suffixed with the name of the appropriate instruction set.
There will additionally be an `_c` version, which acts as a reference
implementation which the SIMD variants can be tested against.

As different machines with the same nominal architecture may support different
subsets of SIMD instructions, we have dynamic CPU detection logic which chooses
the appropriate functions to use at run time. This process is handled by
`build/cmake/rtcd.pl`, with function definitions in the files
`*_rtcd_defs.pl` elsewhere in the codebase.
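The general pattern of run-time dispatch can be sketched in a few lines. This
is not the code generated by `rtcd.pl`: the function names, the capability
flag and the selection function are hypothetical, and the "optimized" variant
is written in portable C (unrolled by four) so that the sketch compiles on any
platform, where a real variant would use intrinsics.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t (*sad_fn)(const uint8_t *a, const uint8_t *b, int n);

// Reference implementation; the "_c" suffix mirrors the naming convention.
uint32_t toy_sad_c(const uint8_t *a, const uint8_t *b, int n) {
  uint32_t sum = 0;
  for (int i = 0; i < n; ++i) sum += (uint32_t)abs(a[i] - b[i]);
  return sum;
}

// Stand-in for an optimized variant of the same function.
uint32_t toy_sad_unrolled(const uint8_t *a, const uint8_t *b, int n) {
  uint32_t sum = 0;
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    sum += (uint32_t)abs(a[i] - b[i]) + (uint32_t)abs(a[i + 1] - b[i + 1]) +
           (uint32_t)abs(a[i + 2] - b[i + 2]) +
           (uint32_t)abs(a[i + 3] - b[i + 3]);
  }
  for (; i < n; ++i) sum += (uint32_t)abs(a[i] - b[i]);
  return sum;
}

enum { TOY_HAS_FAST_PATH = 1 };  // hypothetical CPU capability flag

// Chooses an implementation once, at run time, from the detected flags.
sad_fn toy_select_sad(int cpu_flags) {
  return (cpu_flags & TOY_HAS_FAST_PATH) ? toy_sad_unrolled : toy_sad_c;
}
```

Because both variants must return identical results, the reference version
doubles as the oracle in unit tests for the optimized one.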

Currently SIMD is supported on the following platforms:

- x86: Requires SSE4.1 or above

- Arm: Requires Neon (Armv7-A and above)

We aim to provide implementations of all performance-critical functions which
are compatible with the instruction sets listed above. Additional SIMD
extensions (e.g. AVX on x86, SVE on Arm) are also used to provide even
greater performance where available.

*/

/*!\defgroup encoder_algo Encoder Algorithm
 *
 * The encoder algorithm describes how a sequence is encoded, including high
 * level decisions as well as the algorithms used at every encoding stage.
 */

/*!\defgroup high_level_algo High-level Algorithm
 * \ingroup encoder_algo
 * This module describes sequence level/frame level algorithm in AV1.
 * More details will be added.
 * @{
 */

/*!\defgroup speed_features Speed vs Quality Trade Off
 * \ingroup high_level_algo
 * This module describes the encode speed vs quality tradeoff
 * @{
 */
/*! @} - end defgroup speed_features */

/*!\defgroup src_frame_proc Source Frame Processing
 * \ingroup high_level_algo
 * This module describes algorithms in AV1 associated with the
 * pre-processing of source frames. See also \ref architecture_enc_src_proc
 *
 * @{
 */
/*! @} - end defgroup src_frame_proc */

/*!\defgroup rate_control Rate Control
 * \ingroup high_level_algo
 * This module describes rate control algorithm in AV1.
 * See also \ref architecture_enc_rate_ctrl
 * @{
 */
/*! @} - end defgroup rate_control */

/*!\defgroup tpl_modelling Temporal Dependency Modelling
 * \ingroup high_level_algo
 * This module includes algorithms to implement temporal dependency modelling.
 * See also \ref architecture_enc_tpl
 * @{
 */
/*! @} - end defgroup tpl_modelling */

/*!\defgroup two_pass_algo Two Pass Mode
  \ingroup high_level_algo

  In two pass mode, the input file is passed into the encoder for a quick
  first pass, where statistics are gathered. These statistics and the input
  file are then passed back into the encoder for a second pass. The statistics
  help the encoder reach the desired bitrate without as much overshooting or
  undershooting.

  During the first pass, the codec will return "stats" packets that contain
  information useful for the second pass. The caller should concatenate these
  packets as they are received. In the second pass, the concatenated packets
  are passed in, along with the frames to encode. During the second pass,
  "frame" packets are returned that represent the compressed video.

  A complete example can be found in `examples/twopass_encoder.c`. Pseudocode
  is provided below to illustrate the core parts.

  During the first pass, the uncompressed frames are passed in and stats
  information is appended to a byte array.

~~~~~~~~~~~~~~~{.c}
// Needs <stdbool.h>, <string.h> and "aom/aom_encoder.h".
// For simplicity, assume that there is enough memory in the stats buffer.
// Actual code will want to use a resizable array. stats_len represents
// the length of data already present in the buffer.
void get_stats_data(aom_codec_ctx_t *encoder, char *stats,
                    size_t *stats_len, bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_STATS_PKT) continue;
    memcpy(stats + *stats_len, pkt->data.twopass_stats.buf,
           pkt->data.twopass_stats.sz);
    *stats_len += pkt->data.twopass_stats.sz;
  }
}

void first_pass(char *stats, size_t *stats_len) {
  struct aom_codec_enc_cfg first_pass_cfg;
  ... // Initialize the config as needed.
  first_pass_cfg.g_pass = AOM_RC_FIRST_PASS;
  aom_codec_ctx_t first_pass_encoder;
  ... // Initialize the encoder.

  bool got_data = false;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&first_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  }
  // After all frames have been processed, call aom_codec_encode with
  // a NULL ptr repeatedly, until no more data is returned. The NULL
  // ptr tells the encoder that no more frames are available.
  do {
    got_data = false;
    aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  } while (got_data);

  aom_codec_destroy(&first_pass_encoder);
}
~~~~~~~~~~~~~~~

  During the second pass, the uncompressed frames and the stats are
  passed into the encoder.

~~~~~~~~~~~~~~~{.c}
// Needs <stdbool.h>, <stdio.h> and "aom/aom_encoder.h".
// Write out each encoded frame to the file.
void get_cx_data(aom_codec_ctx_t *encoder, FILE *file,
                 bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue;
    fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file);
  }
}

void second_pass(char *stats, size_t stats_len) {
  struct aom_codec_enc_cfg second_pass_cfg;
  ... // Initialize the config as needed.
  second_pass_cfg.g_pass = AOM_RC_LAST_PASS;
  second_pass_cfg.rc_twopass_stats_in.buf = stats;
  second_pass_cfg.rc_twopass_stats_in.sz = stats_len;
  aom_codec_ctx_t second_pass_encoder;
  ... // Initialize the encoder from the config.

  FILE *output = fopen("output.obu", "wb");
  bool got_data = false;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&second_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  }
  // Pass in NULL to flush the encoder.
  do {
    got_data = false;
    aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  } while (got_data);

  aom_codec_destroy(&second_pass_encoder);
  fclose(output);
}
~~~~~~~~~~~~~~~
 */

/*!\defgroup look_ahead_buffer The Look-Ahead Buffer
  \ingroup high_level_algo

  A program should call \ref aom_codec_encode() for each frame that needs
  processing. These frames are internally copied and stored in a fixed-size
  circular buffer, known as the look-ahead buffer. Other parts of the code
  will use future frame information to inform current frame decisions;
  examples include the first-pass algorithm, TPL model, and temporal filter.
  Note that this buffer also keeps a reference to the last source frame.

  The look-ahead buffer is defined in \ref av1/encoder/lookahead.h. It acts as an
  opaque structure, with an interface to create and free memory associated with
  it. It supports pushing and popping frames onto the structure in a FIFO
  fashion. It also allows look-ahead when using the \ref av1_lookahead_peek()
  function with a non-negative number, and look-behind when -1 is passed in (for
  the last source frame; e.g., firstpass will use this for motion estimation).
  The \ref av1_lookahead_depth() function returns the current number of frames
  stored in it. Note that \ref av1_lookahead_pop() is a bit of a misnomer - it
  only pops if either the "flush" variable is set, or the buffer is at maximum
  capacity.
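  The pop behaviour described above can be illustrated with a toy circular
  buffer. This is a sketch under simplifying assumptions, not the structure
  defined in lookahead.h: every name is hypothetical, frames are represented
  by integer ids, the capacity is fixed at four, and look-behind is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_DEPTH = 4 };

typedef struct {
  int frames[TOY_MAX_DEPTH];  // frame ids stand in for real frame buffers
  int read_idx;               // position of the oldest stored frame
  int depth;                  // number of frames currently stored
} ToyLookahead;

// Appends a frame; fails when the buffer is already full.
bool toy_push(ToyLookahead *q, int frame_id) {
  if (q->depth == TOY_MAX_DEPTH) return false;
  q->frames[(q->read_idx + q->depth) % TOY_MAX_DEPTH] = frame_id;
  ++q->depth;
  return true;
}

// Returns the entry `index` frames ahead of the oldest one, or NULL.
const int *toy_peek(const ToyLookahead *q, int index) {
  if (index < 0 || index >= q->depth) return NULL;
  return &q->frames[(q->read_idx + index) % TOY_MAX_DEPTH];
}

// Pops the oldest frame only when flushing or when the buffer is full.
bool toy_pop(ToyLookahead *q, bool flush, int *frame_id) {
  if (q->depth == 0) return false;
  if (!flush && q->depth < TOY_MAX_DEPTH) return false;
  *frame_id = q->frames[q->read_idx];
  q->read_idx = (q->read_idx + 1) % TOY_MAX_DEPTH;
  --q->depth;
  return true;
}
```

  Refusing to pop until the buffer is full is what guarantees that later
  stages always have as many future frames available as the configuration
  allows.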

  The buffer is stored in the \ref AV1_PRIMARY::lookahead field.
  It is initialized in the first call to \ref aom_codec_encode(), in the
  \ref av1_receive_raw_frame() sub-routine. The buffer size is defined by
  the \ref aom_codec_enc_cfg_t::g_lag_in_frames field of the encoder
  configuration.
  This can be modified manually but should only be set once. On the command
  line, the flag "--lag-in-frames" controls it. The default size is 19 for
  non-realtime usage and 1 for realtime. Note that a maximum value of 35 is
  enforced.

  A frame will stay in the buffer as long as possible. As mentioned above,
  \ref av1_lookahead_pop() only removes a frame when either flush is set,
  or the buffer is full. Note that each call to \ref aom_codec_encode() inserts
  another frame into the buffer, and pop is called by the sub-function
  \ref av1_encode_strategy(). The buffer is told to flush when
  \ref aom_codec_encode() is passed a NULL image pointer. Note that the caller
  must repeatedly call \ref aom_codec_encode() with a NULL image pointer, until
  no more packets are available, in order to fully flush the buffer.

 */

/*! @} - end defgroup high_level_algo */

/*!\defgroup partition_search Partition Search
 * \ingroup encoder_algo
 * For an overview of the partition search see \ref architecture_enc_partitions
 * @{
 */

/*! @} - end defgroup partition_search */

/*!\defgroup intra_mode_search Intra Mode Search
 * \ingroup encoder_algo
 * This module describes intra mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup intra_mode_search */

/*!\defgroup inter_mode_search Inter Mode Search
 * \ingroup encoder_algo
 * This module describes inter mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup inter_mode_search */

/*!\defgroup palette_mode_search Palette Mode Search
 * \ingroup intra_mode_search
 * This module describes palette mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup palette_mode_search */

/*!\defgroup transform_search Transform Search
 * \ingroup encoder_algo
 * This module describes transform search algorithm in AV1.
 * @{
 */
/*! @} - end defgroup transform_search */

/*!\defgroup coefficient_coding Transform Coefficient Coding and Optimization
 * \ingroup encoder_algo
 * This module describes the algorithms of transform coefficient coding and optimization in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup coefficient_coding */

/*!\defgroup in_loop_filter In-loop Filter
 * \ingroup encoder_algo
 * This module describes in-loop filter algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_filter */

/*!\defgroup in_loop_cdef CDEF
 * \ingroup encoder_algo
 * This module describes the CDEF parameter search algorithm
 * in AV1. More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_cdef */

/*!\defgroup in_loop_restoration Loop Restoration
 * \ingroup encoder_algo
 * This module describes the loop restoration search
 * and estimation algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_restoration */

/*!\defgroup cyclic_refresh Cyclic Refresh
 * \ingroup encoder_algo
 * This module describes the cyclic refresh (aq-mode=3) in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup cyclic_refresh */

/*!\defgroup SVC Scalable Video Coding
 * \ingroup encoder_algo
 * This module describes scalable video coding algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup SVC */

/*!\defgroup variance_partition Variance Partition
 * \ingroup encoder_algo
 * This module describes variance partition algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup variance_partition */

/*!\defgroup nonrd_mode_search NonRD Optimized Mode Search
 * \ingroup encoder_algo
 * This module describes NonRD Optimized Mode Search used in Real-Time mode.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup nonrd_mode_search */