| /*!\page encoder_guide AV1 ENCODER GUIDE |
| |
| \tableofcontents |
| |
| \section architecture_introduction Introduction |
| |
| This document provides an architectural overview of the libaom AV1 encoder. |
| |
| It is intended as a high level starting point for anyone wishing to contribute |
| to the project, helping them to understand the structure of the encoder more |
| quickly and to find their way around the codebase. |
| |
| It sits above, and where necessary links to, more detailed function level |
| documents. |
| |
| \section architecture_gencodecs Generic Block Transform Based Codecs |
| |
| Most modern video encoders including VP8, H.264, VP9, HEVC and AV1 |
| (in increasing order of complexity) share a common basic paradigm. This |
| comprises separating a stream of raw video frames into a series of discrete |
| blocks (of one or more sizes), then computing a prediction signal and a |
| quantized, transform coded, residual error signal. The prediction and residual |
| error signal, along with any side information needed by the decoder, are then |
| entropy coded and packed to form the encoded bitstream. See Figure 1 below, |
| where the blue blocks are, to all intents and purposes, the lossless parts of |
| the encoder and the red block is the lossy part. |
| |
| This is of course a gross oversimplification, even in regard to the simplest |
| of the above codecs. For example, all of them allow for block based |
| prediction at multiple different scales (i.e. different block sizes) and may |
| use previously coded pixels in the current frame for prediction or pixels from |
| one or more previously encoded frames. Further, they may support multiple |
| different transforms and transform sizes and quality optimization tools like |
| loop filtering. |
| |
| \image html genericcodecflow.png "" width=70% |
| |
| \section architecture_av1_structure AV1 Structure and Complexity |
| |
| As previously stated, AV1 adopts the same underlying paradigm as other block |
| transform based codecs. However, it is much more complicated than previous |
| generation codecs and supports many more block partitioning, prediction and |
| transform options. |
| |
| AV1 supports block partitions of various sizes from 128x128 pixels down to 4x4 |
| pixels using a multi-layer recursive tree structure as illustrated in Figure 2 |
| below. |
| |
| \image html av1partitions.png "" width=70% |
| |
| AV1 also provides 71 basic intra prediction modes, 56 single frame inter prediction |
| modes (7 reference frames x 4 modes x 2 for OBMC (overlapped block motion |
| compensation)), 12768 compound inter prediction modes (that combine inter |
| predictors from two reference frames) and 36708 compound inter / intra |
| prediction modes. Furthermore, in addition to simple inter motion estimation, |
| AV1 also supports warped motion prediction using affine transforms. |
| |
| In terms of transform coding, it has 16 separable 2-D transform kernels |
| { DCT, ADST, fADST, IDTX }<sup>2</sup> that can be applied at up to 19 different scales |
| from 64x64 down to 4x4 pixels. |
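| |
| The two counts above whose derivation is stated in the text can be checked |
| directly: the single frame inter mode count (reference frames x modes x OBMC |
| on or off) and the number of separable 2-D kernels (four 1-D kernels chosen |
| independently in each direction). This is only a worked restatement of those |
| two figures; the larger compound mode counts are not derived here. |
| |
| \f[ |
| 7 \times 4 \times 2 = 56, \qquad 4^2 = 16 |
| \f] |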
| |
| When combined together, this means that for any one 8x8 pixel block in a |
| source frame, there are approximately 45,000,000 different ways that it can |
| be encoded. |
| |
| Consequently, AV1 requires complex control processes. While not necessarily |
| a normative part of the bitstream, these are the algorithms that turn a set |
| of compression tools and a bitstream format specification, into a coherent |
| and useful codec implementation. These may include, but are not limited to: |
| |
| - Rate distortion optimization (The process of trying to choose the most |
| efficient combination of block size, prediction mode, transform type |
| etc.) |
| - Rate control (regulation of the output bitrate) |
| - Encoder speed vs quality trade offs. |
| - Features such as two pass encoding or optimization for low delay |
| encoding. |
| |
| For a more detailed overview of AV1's encoding tools and a discussion of some |
| of the design considerations and hardware constraints that had to be |
| accommodated, please refer to *** TODO link to Jingning's AV1 overview paper. |
| |
| Figure 3 provides a slightly expanded but still simplistic view of the |
| AV1 encoder architecture with blocks that relate to some of the subsequent |
| sections of this document. In this diagram, the raw uncompressed frame buffers |
| are shown in dark green and the reconstructed frame buffers used for |
| prediction in light green. Red indicates those parts of the codec that are |
| (or may be) lossy, where fidelity can be traded off against compression |
| efficiency, whilst light blue shows algorithms or coding tools that are |
| lossless. The yellow blocks represent non-bitstream normative configuration |
| and control algorithms. |
| |
| \image html av1encoderflow.png "" width=70% |
| |
| \section architecture_command_line The Libaom Command Line Interface |
| |
| Add details or links here: TODO ? elliotk@ |
| |
| \section architecture_enc_data_structures Main Encoder Data Structures |
| |
| The following are the main high level data structures used by the libaom AV1 |
| encoder and referenced elsewhere in this overview document: |
| |
| - \ref AV1_COMP |
| - \ref AV1_COMP.rc (\ref RATE_CONTROL) |
| - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig) |
| - \ref AV1_COMP.twopass (\ref TWO_PASS) |
| - \ref AV1_COMP.gf_group (\ref GF_GROUP) |
| - \ref AV1_COMP.speed |
| - \ref AV1_COMP.sf (\ref SPEED_FEATURES) |
| |
| - \ref AV1EncoderConfig (Encoder configuration parameters) |
| - \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg) |
| |
| - \ref RATE_CONTROL (Rate control status) |
| - \ref RateControlCfg (Rate control configuration) |
| - \ref TWO_PASS (Two pass status and control data) |
| - \ref GF_GROUP (Data relating to the current GF/ARF group) |
| - \ref FIRSTPASS_STATS (Defines entries in the first pass stats buffer) |
| - \ref SPEED_FEATURES (Encode speed / quality tradeoff parameters) |
| |
| \section architecture_enc_use_cases Encoder Use Cases |
| |
| The libaom AV1 encoder is configurable to support a number of different use |
| cases and rate control strategies. |
| |
| The principal use cases for which it is optimised are as follows: |
| |
| - <b>Video on Demand / Streaming</b> |
| - <b>Low Delay or Live Streaming</b> |
| - <b>Video Conferencing / Real Time Coding (RTC)</b> |
| - <b>Fixed Quality / Testing</b> |
| |
| Other examples of use cases for which the encoder could be configured but for |
| which there is less by way of specific optimizations include: |
| |
| - <b>Download and Play</b> |
| - <b>Disk Playback</b> |
| - <b>Storage</b> |
| - <b>Editing</b> |
| - <b>Broadcast video</b> |
| |
| Specific use cases may have particular requirements or constraints. For |
| example: |
| |
| <b>Video Conferencing:</b> In a video conference we need to encode the video |
| in real time and to avoid any coding tools that could increase latency, such |
| as frame look ahead. |
| |
| <b>Live Streams:</b> In cases such as live streaming of games or events, it |
| may be possible to allow some limited buffering of the video and use of |
| lookahead coding tools to improve encoding quality. However, whilst a lag of |
| a second or two may be fine given the one way nature of this type of video, |
| it is clearly not possible to use tools such as two pass coding. |
| |
| <b>Broadcast:</b> Broadcast video (e.g. digital TV over satellite) may have |
| specific requirements such as frequent and regular key frames (e.g. once per |
| second or more) as these are important as entry points to users when switching |
| channels. There may also be strict upper limits on bandwidth over a short |
| window of time. |
| |
| <b>Download and Play:</b> Download and play applications may have less strict |
| requirements in terms of local frame by frame rate control but there may be a |
| requirement to accurately hit a file size target for the video clip as a |
| whole. Similar considerations may apply to playback from mass storage devices |
| such as DVD or disk drives. |
| |
| <b>Editing:</b> In certain special use cases such as offline editing, it may |
| be desirable to have very high quality and data rate but also very frequent |
| key frames or indeed to encode the video exclusively as key frames. Lossless |
| video encoding may also be required in this use case. |
| |
| <b>VOD / Streaming:</b> One of the most important and common use cases for AV1 |
| is video on demand or streaming, for services such as YouTube and Netflix. In |
| this use case it is possible to do two or even multi-pass encoding to improve |
| compression efficiency. Streaming services will often store many encoded |
| copies of a video at different resolutions and data rates to support users |
| with different types of playback device and bandwidth limitations. |
| Furthermore, these services support dynamic switching between multiple |
| streams, so that they can respond to changing network conditions. |
| |
| Exact rate control when encoding for a specific format (e.g. 360P or 1080P on |
| YouTube) may not be critical, provided that the video bandwidth remains within |
| allowed limits. Whilst a format may have a nominal target data rate, this can |
| be considered more as the desired average egress rate over the video corpus |
| rather than a strict requirement for any individual clip. Indeed, in order |
| to maintain optimal quality of experience for the end user, it may be |
| desirable to encode some easier videos or sections of video at a lower data |
| rate and harder videos or sections at a higher rate. |
| |
| VOD / streaming does not usually require very frequent key frames (as in the |
| broadcast case) but key frames are important in trick play (scanning back and |
| forth to different points in a video) and for adaptive stream switching. As |
| such, in a use case like YouTube, there is normally an upper limit on the |
| maximum time between key frames of a few seconds, but within certain limits |
| the encoder can try to align key frames with real scene cuts. |
| |
| Whilst encoder speed may not seem to be as critical in this use case, for |
| services such as YouTube, where millions of new videos have to be encoded |
| every day, encoder speed is still important, so libaom allows command line |
| control of the encode speed vs quality trade off. |
| |
| <b>Fixed Quality / Testing Mode:</b> Libaom also has a fixed quality encoder |
| pathway designed for testing under highly constrained conditions. |
| |
| |
| \section architecture_enc_rate_ctrl Rate Control |
| |
| Different use cases may have different requirements in terms of data rate |
| control. |
| |
| The broad rate control strategy is selected using the <b>--end-usage</b> |
| parameter on the command line, which maps onto the field |
| \ref aom_codec_enc_cfg_t.rc_end_usage in \ref aom_encoder.h. |
| |
| The four supported options are:- |
| |
| - <b>VBR</b> (Variable Bitrate) |
| - <b>CBR</b> (Constant Bitrate) |
| - <b>CQ</b> (Constrained Quality mode; a constrained variant of VBR) |
| - <b>Fixed Q</b> (Constant quality or Q mode) |
| |
| The value of \ref aom_codec_enc_cfg_t.rc_end_usage is in turn copied over |
| into the encoder rate control configuration data structure as |
| (TODO REF) RateControlCfg.rc_mode (see \ref encoder.h). |
| |
| With regard to the most important use cases above, video on demand uses either |
| VBR or CQ mode. CBR is the preferred rate control model for RTC and live |
| streaming, and Fixed Q is only used in testing. |
| |
| The behaviour of each of these modes is regulated by a series of secondary |
| command line rate control options but also depends somewhat on the selected |
| use case, whether 2-pass coding is enabled and the selected encode speed vs |
| quality trade offs (\ref AV1_COMP.speed and \ref AV1_COMP.sf). |
| |
| The list below gives the names of the main rate control command line |
| options together with the names of the corresponding fields in the rate |
| control configuration data structure. |
| |
| - <b>--target-bitrate</b> ((TODO REF)RateControlCfg.target_bandwidth) |
| - <b>--min-q</b> ((TODO REF)RateControlCfg.best_allowed_q) |
| - <b>--max-q</b> ((TODO REF)RateControlCfg.worst_allowed_q) |
| - <b>--cq-level</b> ((TODO REF)RateControlCfg.cq_level) |
| - <b>--undershoot-pct</b> ((TODO REF)RateControlCfg.under_shoot_pct) |
| - <b>--overshoot-pct</b> ((TODO REF)RateControlCfg.over_shoot_pct) |
| |
| The following options control aspects of 2 pass VBR encoding: |
| |
| - <b>--bias-pct</b> (RateControlCfg::two_pass_vbrbias) |
| - <b>--minsection-pct</b> ((TODO REF)RateControlCfg.two_pass_vbrmin_section) |
| - <b>--maxsection-pct</b> ((TODO REF)RateControlCfg.two_pass_vbrmax_section) |
| |
| The following relate to buffer and delay management in one pass low delay and |
| real time coding: |
| |
| - <b>--buf-sz</b> ((TODO REF) RateControlCfg::maximum_buffer_size_ms) |
| - <b>--buf-initial-sz</b> ((TODO REF)RateControlCfg.starting_buffer_level_ms) |
| - <b>--buf-optimal-sz</b> ((TODO REF)RateControlCfg.optimal_buffer_level_ms) |
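| |
| As a rough illustration of how buffer options expressed in milliseconds can |
| relate to a bit budget, the sketch below models a simple buffer that gains one |
| frame's share of the channel rate per frame and loses the bits actually spent. |
| The type BufferModel and the functions ms_to_bits(), buffer_init() and |
| buffer_update() are hypothetical names used only for this example. They are |
| not libaom APIs and this is not a description of the libaom implementation. |

```c
#include <stdint.h>

// Hypothetical buffer model; names are illustrative, not libaom's.
typedef struct {
  int64_t max_bits;      // capacity, derived from --buf-sz
  int64_t optimal_bits;  // preferred level, derived from --buf-optimal-sz
  int64_t level_bits;    // current level, starts at --buf-initial-sz
} BufferModel;

// Convert a buffer duration in milliseconds into bits at a given bitrate.
int64_t ms_to_bits(int64_t ms, int64_t bits_per_second) {
  return ms * bits_per_second / 1000;
}

// Build the model from a bitrate and the three millisecond settings.
BufferModel buffer_init(int64_t bits_per_second, int64_t size_ms,
                        int64_t initial_ms, int64_t optimal_ms) {
  BufferModel b;
  b.max_bits = ms_to_bits(size_ms, bits_per_second);
  b.optimal_bits = ms_to_bits(optimal_ms, bits_per_second);
  b.level_bits = ms_to_bits(initial_ms, bits_per_second);
  return b;
}

// After each frame the buffer gains one frame's share of the channel rate
// and loses the bits the frame actually used. The level is capped at the
// buffer size; a negative level means the buffer has underrun.
void buffer_update(BufferModel *b, int64_t per_frame_budget_bits,
                   int64_t encoded_frame_bits) {
  b->level_bits += per_frame_budget_bits - encoded_frame_bits;
  if (b->level_bits > b->max_bits) b->level_bits = b->max_bits;
}
```

| In this sketch, a rate controller would aim to keep level_bits near |
| optimal_bits, spending less on upcoming frames when the level falls and more |
| when it rises. |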
| |
| The rate control configuration data structure can be found in: |
| |
| - \ref AV1_COMP.oxcf |
| - \ref AV1EncoderConfig.rc_cfg |
| |
| \subsection architecture_enc_vbr Variable Bitrate (VBR) Encoding |
| |
| For streamed VOD content the most common rate control strategy is Variable |
| Bitrate (VBR) encoding. The CQ mode mentioned above is a variant of this |
| where additional quantizer and quality constraints are applied. VBR |
| encoding may in theory be used in conjunction with either 1-pass or 2-pass |
| encoding. |
| |
| VBR encoding varies the number of bits given to each frame or group of frames |
| according to the difficulty of that frame or group of frames, such that easier |
| frames are allocated fewer bits and harder frames are allocated more bits. The |
| intent here is to even out the quality between frames. This contrasts with |
| Constant Bitrate (CBR) encoding where each frame is allocated the same number |
| of bits. |
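| |
| The contrast between the two strategies can be sketched as follows. This is a |
| minimal illustration only: the functions allocate_bits_vbr() and |
| allocate_bits_cbr() and the notion of a single per frame complexity score are |
| assumptions made for the example and do not correspond to libaom functions. |

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not a libaom function: split total_bits across n
// frames in proportion to a per-frame complexity score (VBR-like), so that
// harder frames receive more bits. If the total complexity is zero the
// split falls back to an even share per frame.
void allocate_bits_vbr(const double *complexity, size_t n, int64_t total_bits,
                       int64_t *frame_bits) {
  double sum = 0.0;
  for (size_t i = 0; i < n; ++i) sum += complexity[i];
  for (size_t i = 0; i < n; ++i) {
    frame_bits[i] = (sum > 0.0)
                        ? (int64_t)(total_bits * (complexity[i] / sum))
                        : total_bits / (int64_t)n;
  }
}

// CBR-like reference: every frame gets the same share of the budget.
void allocate_bits_cbr(size_t n, int64_t total_bits, int64_t *frame_bits) {
  for (size_t i = 0; i < n; ++i) frame_bits[i] = total_bits / (int64_t)n;
}
```

| For two frames with complexity scores 1 and 3 and a budget of 4000 bits, the |
| VBR-like split gives 1000 and 3000 bits, whereas the CBR-like split gives 2000 |
| bits to each frame. |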
| |
| Whilst for any given frame or group of frames the data rate may vary, the VBR |
| algorithm attempts to deliver a given average bitrate over a wider time |
| interval. In standard VBR encoding, the time interval over which the data rate |
| is averaged is usually the duration of the video clip. An alternative |
| approach is to target an average VBR bitrate over the entire video corpus for |
| a particular video format (corpus VBR). |
| |
| \subsubsection architecture_enc_1pass_vbr 1 Pass VBR Encoding |
| |
| The command line for libaom does allow 1 Pass VBR, but this has not been |
| properly optimised and behaves much like 1 pass CBR in most regards, with bits |
| allocated to frames by the following functions: |
| |
| - \ref av1_calc_iframe_target_size_one_pass_vbr() |
| - \ref av1_calc_pframe_target_size_one_pass_vbr() |
| |
| \subsubsection architecture_enc_2pass_vbr 2 Pass VBR Encoding |
| |
| The main focus here will be on 2-pass VBR encoding (and the related CQ mode) |
| as these are the modes most commonly used for VOD content. |
| |
| 2-pass encoding is selected on the command line by setting --passes=2 |
| (or -p 2). |
| |
| Generally speaking, in 2-pass encoding, an encoder will first encode a video |
| using a default set of parameters and assumptions. Depending on the outcome |
| of that first encode, the baseline assumptions and parameters will be adjusted |
| to optimize the output during the second pass. In essence the first pass is a |
| fact finding mission to establish the complexity and variability of the video, |
| in order to allow a better allocation of bits in the second pass. |
| |
| The libaom 2-pass algorithm is unusual in that the first pass is not a full |
| encode of the video. Rather it uses a limited set of prediction and transform |
| options and a fixed quantizer, to generate statistics about each frame. No |
| output bitstream is created and the per frame first pass statistics are stored |
| entirely in volatile memory. This has some disadvantages when compared to a |
| full first pass encode, but avoids the need for file I/O and improves speed. |
| |
| This section refers to the following key data structures |
| (see also \ref architecture_enc_data_structures): |
| |
| - \ref AV1_COMP cpi (the main compressor instance data structure) |
| - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig) |
| - \ref AV1_COMP.rc (\ref RATE_CONTROL) |
| - \ref AV1_COMP.twopass (\ref TWO_PASS) |
| |
| - \ref AV1EncoderConfig (Encoder configuration parameters) |
| - \ref AV1EncoderConfig.pass |
| |
| - \ref RATE_CONTROL (Rate control status) |
| - \ref TWO_PASS (Two pass status and control data) |
| |
| - \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first |
| pass stats) |
| |
| For two pass encoding, the function \ref av1_encode() will first be called |
| for each frame in the video with the value \ref AV1EncoderConfig.pass = 1. |
| This will result in calls to \ref av1_first_pass(). |
| |
| Statistics for each frame are stored in \ref FIRSTPASS_STATS frame_stats_buf. |
| |
| After completion of the first pass, \ref av1_encode() will be called again for |
| each frame with \ref AV1EncoderConfig.pass = 2. The frames are then encoded in |
| accordance with the statistics gathered during the first pass by calls to |
| \ref encode_frame_to_data_rate(). |
| |
| \ref encode_frame_to_data_rate() in turn calls (TODO REF) |
| av1_get_second_pass_params(). |
| |
| In summary the second pass code :- |
| |
| - Searches for scene cuts (if auto key frame detection is enabled). |
| - Defines the length of and hierarchical structure to be used in each |
| ARF/GF group. |
| - Allocates bits based on the relative complexity of each frame, the quality |
| of frame to frame prediction and the type of frame (e.g. key frame, ARF |
| frame, golden frame or normal leaf frame). |
| - Suggests a maximum Q (quantizer value) for each ARF/GF group, based on |
| estimated complexity and recent rate control compliance |
| (\ref RATE_CONTROL.active_worst_quality) |
| - Tracks adherence to the overall rate control objectives and adjusts |
| heuristics. |
| |
| The main second pass functions in regard to the above include:- |
| |
| - (TODO REF) find_next_key_frame() |
| - (TODO REF) define_gf_group() |
| - (TODO REF) calculate_total_gf_group_bits() |
| - (TODO REF) get_twopass_worst_quality() |
| - (TODO REF) av1_gop_setup_structure() |
| - (TODO REF) av1_gop_bit_allocation() |
| - (TODO REF) av1_twopass_postencode_update() |
| |
| For each frame, the two pass algorithm defines a target number of bits |
| \ref RATE_CONTROL.base_frame_target, which is then adjusted if necessary to |
| reflect any undershoot or overshoot on previous frames to give |
| \ref RATE_CONTROL.this_frame_target. |
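| |
| The idea of adjusting a base target to reflect earlier undershoot or overshoot |
| can be sketched as below. The function adjust_frame_target() and its |
| parameters (a correction window and a percentage limit) are assumptions made |
| for illustration. The actual correction logic in libaom is not reproduced |
| here. |

```c
#include <stdint.h>

// Hypothetical sketch: nudge a frame's base target towards repaying an
// accumulated rate error. bits_off_target > 0 means earlier frames
// undershot (spare bits are available); < 0 means they overshot.
int64_t adjust_frame_target(int64_t base_target, int64_t bits_off_target,
                            int frames_in_window, int max_adjust_pct) {
  if (frames_in_window < 1) frames_in_window = 1;
  // Spread the correction over a window of frames rather than one frame.
  int64_t delta = bits_off_target / frames_in_window;
  // Limit the change to a percentage of the base target.
  const int64_t limit = base_target * max_adjust_pct / 100;
  if (delta > limit) delta = limit;
  if (delta < -limit) delta = -limit;
  return base_target + delta;
}
```

| With a base target of 10000 bits, 8000 spare bits, a window of 4 frames and a |
| 50% limit, the adjusted target is 12000 bits. A large overshoot is clamped by |
| the limit, so the same frame would never be given fewer than 5000 bits. |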
| |
| As well as \ref RATE_CONTROL.active_worst_quality, the two pass code also |
| maintains a record of the actual Q value used to encode previous frames |
| at each level in the current pyramid hierarchy |
| (\ref RATE_CONTROL.active_best_quality). The function |
| \ref rc_pick_q_and_bounds(), uses these values to set a permitted Q range |
| for each frame. |
| |
| \subsubsection architecture_enc_1pass_lagged 1 Pass Lagged VBR Encoding |
| |
| 1 pass lagged encode falls between simple 1 pass encoding and full two pass |
| encoding and is used for cases where it is not possible to do a full first |
| pass through the entire video clip, but where some delay is permissible, for |
| example near live streaming where there is a delay of up to a few seconds. In |
| this case the first pass and second pass are in effect combined such that the |
| first pass starts encoding the clip and the second pass lags behind it by a |
| few frames. When using this method, full sequence level statistics are not |
| available, but it is possible to collect and use frame or group of frame level |
| data to help in the allocation of bits and in defining ARF/GF coding |
| hierarchies. The reader is referred to the data value |
| (TODO REF) cpi->lap_enabled (where <b>lap</b> stands for |
| <b>look ahead processing</b>). This encoding mode for the most part uses the |
| same rate control pathways as two pass VBR encoding. |
| |
| \subsection architecture_enc_rc_loop The Main Rate Control Loop |
| |
| Having established a target rate for a given frame and an allowed range of Q |
| values, the encoder then tries to encode the frame at a rate that is as close |
| as possible to the target value, given the Q range constraints. |
| |
| There are two main mechanisms by which this is achieved. |
| |
| The first selects a frame level Q, using an adaptive estimate of the number of |
| bits that will be generated when the frame is encoded at any given Q. |
| Fundamentally this mechanism is common to VBR, CBR and to use cases such as |
| RTC with small adjustments. |
| |
| As the Q value mainly adjusts the precision of the residual signal, it is not |
| actually a reliable basis for accurately predicting the number of bits that |
| will be generated across all clips. A well predicted clip, for example, may |
| have a much smaller error residual after prediction. The algorithm copes with |
| this by adapting its predictions on the fly using a feedback loop based on how |
| well it did the previous time around. |
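| |
| The feedback loop described above can be sketched as follows. The rate model |
| used here (bits inversely proportional to the quantizer index) and the names |
| predict_bits(), pick_q_for_target() and update_correction_factor() are |
| assumptions chosen to keep the example short. They are not the model or the |
| functions used by libaom. |

```c
#include <stdint.h>

// Toy rate model (an assumption for illustration only): predicted frame
// size is inversely proportional to the quantizer index q (q >= 1) and is
// scaled by an adaptive correction factor.
static double predict_bits(int q, double base_bits, double factor) {
  return factor * base_bits / (double)q;
}

// Pick the lowest q in [q_min, q_max] whose predicted size fits the target.
// Returns q_max if no value in the range is predicted to fit.
int pick_q_for_target(double target_bits, double base_bits, double factor,
                      int q_min, int q_max) {
  for (int q = q_min; q < q_max; ++q) {
    if (predict_bits(q, base_bits, factor) <= target_bits) return q;
  }
  return q_max;
}

// Feedback step: move the correction factor part of the way towards the
// value that would have made the last prediction exact. damping in (0, 1]
// controls how quickly the model adapts.
double update_correction_factor(double factor, double predicted_bits,
                                double actual_bits, double damping) {
  if (predicted_bits <= 0.0) return factor;
  const double ratio = actual_bits / predicted_bits;
  return factor * ((1.0 - damping) + damping * ratio);
}
```

| If a frame predicted at 1000 bits actually produces 2000 bits, a damping of |
| 0.5 raises the factor from 1.0 to 1.5, so the next frame with the same target |
| is assigned a higher Q. |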
| |
| The main functions responsible for the prediction of Q and the adaptation over |
| time, for the two pass encoding pipeline are: |
| |
| - \ref rc_pick_q_and_bounds() |
| - (TODO REF) get_q() |
| - (TODO REF) av1_rc_regulate_q() |
| - (TODO REF) get_rate_correction_factor() |
| - (TODO REF) find_closest_qindex_by_rate() |
| |
| - (TODO REF) av1_twopass_postencode_update() |
| - (TODO REF) av1_rc_update_rate_correction_factors() |
| |
| The second mechanism for control comes into play if there is a large rate miss |
| for the current frame (much too big or too small). This is a recode mechanism |
| which allows the current frame to be re-encoded one or more times with a |
| revised Q value. This obviously has significant implications for encode speed |
| and, in the case of RTC, for latency (hence it is not used for the RTC pathway). |
| |
| Whether or not a recode is allowed for a given frame depends on the selected |
| encode speed vs quality trade off. This is set on the command line using the |
| --cpu-used parameter which maps onto the \ref AV1_COMP.speed field in the main |
| compressor instance data structure. |
| |
| The value of \ref AV1_COMP.speed, combined with the use case, is used to |
| populate the speed features data structure \ref AV1_COMP.sf. In particular |
| \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop determines the types of frames that |
| may be recoded and \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance is a rate |
| error trigger threshold. |
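| |
| A recode mechanism of this general kind can be sketched as below. The sketch |
| bisects the permitted Q range until the frame size is within a tolerance of |
| the target or the recode budget is used up. The functions toy_encode_frame() |
| and encode_with_recode_sketch(), the bisection strategy and the toy size model |
| are all assumptions for illustration and are not the libaom implementation. |

```c
#include <stdint.h>

// Stand-in for a real frame encode (an assumption for illustration): the
// frame size in bits simply falls as the quantizer index q (q >= 1) rises.
int64_t toy_encode_frame(int q) { return 100000 / q; }

// Encode at q; if the size misses target_bits by more than tolerance_pct,
// bisect the permitted range [q_low, q_high] and encode again, at most
// max_recodes extra times. Returns the final q and stores the final size.
int encode_with_recode_sketch(int64_t target_bits, int tolerance_pct, int q,
                              int q_low, int q_high, int max_recodes,
                              int64_t *bits_out) {
  const int64_t slack = target_bits * tolerance_pct / 100;
  for (int attempt = 0;; ++attempt) {
    *bits_out = toy_encode_frame(q);
    if (attempt >= max_recodes) return q;  // recode budget used up
    if (*bits_out > target_bits + slack) {
      q_low = q + 1;  // too many bits: only coarser quantizers remain
    } else if (*bits_out < target_bits - slack) {
      q_high = q - 1;  // too few bits: only finer quantizers remain
    } else {
      return q;  // within tolerance, accept this encode
    }
    if (q_low > q_high) return q;  // permitted range exhausted
    q = q_low + (q_high - q_low) / 2;
  }
}
```

| Setting max_recodes to 0 corresponds to a pathway with no recode, such as RTC, |
| where the first encode is always accepted. |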
| |
| For more information the reader is directed to the following data structures: |
| |
| - \ref AV1_COMP cpi (the main compressor instance data structure) |
| - \ref AV1_COMP.speed |
| - \ref AV1_COMP.sf (\ref SPEED_FEATURES) |
| |
| - \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters) |
| - \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES) |
| |
| - \ref HIGH_LEVEL_SPEED_FEATURES |
| - \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop |
| - \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance |
| |
| and functions: |
| |
| - (TODO REF) encode_with_recode_loop() |
| - (TODO REF) recode_loop_update_q() |
| - (TODO REF) av1_set_speed_features_framesize_independent() |
| - (TODO REF) av1_set_speed_features_framesize_dependent() |
| |
| \subsection architecture_enc_fixed_q Fixed Q Mode |
| |
| Add details here. |
| |
| \section architecture_enc_src_proc Source Frame Processing |
| |
| Add details here. |
| |
| \section architecture_enc_hierachical Hierarchical Coding |
| |
| Add details here. |
| |
| \section architecture_enc_tpl Temporal Dependency Modelling |
| The temporal dependency model runs at the beginning of each GOP. It builds the |
| motion trajectory within the GOP in units of 16x16 blocks. The temporal |
| dependency of a 16x16 block is evaluated as the predictive coding gains it |
| contributes to its trailing motion trajectory. This temporal dependency model |
| reflects how important a coding block is for the coding efficiency of the |
| overall GOP. It is hence used to scale the Lagrangian multiplier used in the |
| rate-distortion optimization framework. |
| |
| \subsection architecture_enc_tpl_config Configurations |
| |
| The temporal dependency model and its applications are by default turned on in |
| the libaom encoder for the VoD use case. To disable it, use --tpl-model=0 in the |
| aomenc configuration. |
| |
| |
| \subsection architecture_enc_tpl_algoritms Algorithms |
| |
| The scheme works in the reverse frame processing order over the source frames, |
| propagating information from future frames back to the current frame. For each |
| frame, a propagation step is run for each MB. It operates as follows: |
| |
| <ul> |
| <li> Estimate the intra prediction cost in terms of the sum of absolute Hadamard |
| transform differences (SATD), denoted intra_cost. It also loads the motion |
| information available from the first-pass encode and estimates the inter |
| prediction cost as inter_cost. Due to the use of hybrid inter/intra |
| prediction mode, the inter_cost value is further upper bounded by |
| intra_cost. A propagation cost variable is used to collect all the |
| information flowed back from future processing frames. It is initialized as |
| 0 for all the blocks in the last processing frame in a group of pictures |
| (GOP).</li> |
| |
| <li> The fraction of information from a current block to be propagated towards |
| its reference block is estimated as: |
| \f[ |
| propagation\_fraction = (1 - inter\_cost/intra\_cost) |
| \f] |
| It reflects how much the motion compensated reference would reduce the |
| prediction error in percentage.</li> |
| |
| <li> The total amount of information the current block contributes to the GOP |
| is estimated as intra_cost + propagation_cost. The information that it |
| propagates towards its reference block is captured by: |
| |
| \f[ |
| propagation\_amount = |
| (intra\_cost + propagation\_cost) * propagation\_fraction |
| \f]</li> |
| |
| <li> Note that the reference block may not necessarily sit on the grid of |
| 16x16 blocks. The propagation amount is hence dispensed to all the blocks |
| that overlap with the reference block. The corresponding block in the |
| reference frame accumulates its own propagation cost as it receives back |
| propagation. |
| |
| \f[ |
| propagation\_cost = propagation\_cost + |
| (\frac{overlap\_area}{(16*16)} * propagation\_amount) |
| \f]</li> |
| |
| <li> In the final encoding stage, the distortion propagation factor of a block |
| is evaluated as \f$(1 + \frac{propagation\_cost}{intra\_cost})\f$, where the second term |
| captures its impact on later frames in a GOP.</li> |
| |
| <li> The Lagrangian multiplier is adapted at the 64x64 block level. For every |
| 64x64 block in a frame, we have a distortion propagation factor: |
| |
| \f[ |
| dist\_prop[i] = 1 + \frac{propagation\_cost[i]}{intra\_cost[i]} |
| \f] |
| |
| where i denotes the block index in the frame. We also have the frame level |
| distortion propagation factor: |
| |
| \f[ |
| dist\_prop = 1 + |
| \frac{\sum_{i}propagation\_cost[i]}{\sum_{i}intra\_cost[i]} |
| \f] |
| |
| which is used to normalize the propagation factor at the 64x64 block level. The |
| Lagrangian multiplier is hence adapted as: |
| |
| \f[ |
| \lambda[i] = \lambda[0] * \frac{dist\_prop}{dist\_prop[i]} |
| \f] |
| |
| where \f$\lambda[0]\f$ is the multiplier associated with the frame level QP. The |
| 64x64 block level QP is scaled according to the Lagrangian multiplier.</li> |
| </ul> |
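| |
| The per block arithmetic in the steps above can be sketched as follows. The |
| function names (tpl_propagation_fraction() and so on), the use of double |
| precision values and the handling of zero costs are assumptions made for this |
| example. The sketch follows the formulas in this section rather than the code |
| in the libaom source. |

```c
#include <stddef.h>

// Fraction of a block's information propagated to its reference block.
// inter_cost is upper bounded by intra_cost, as described in the text.
double tpl_propagation_fraction(double intra_cost, double inter_cost) {
  if (intra_cost <= 0.0) return 0.0;
  if (inter_cost > intra_cost) inter_cost = intra_cost;
  return 1.0 - inter_cost / intra_cost;
}

// Amount of information the current block propagates to its reference.
double tpl_propagation_amount(double intra_cost, double inter_cost,
                              double propagation_cost) {
  return (intra_cost + propagation_cost) *
         tpl_propagation_fraction(intra_cost, inter_cost);
}

// Accumulate the share received by one 16x16 grid block that overlaps the
// reference block by overlap_area pixels.
double tpl_accumulate(double propagation_cost, double overlap_area,
                      double propagation_amount) {
  return propagation_cost + overlap_area / (16.0 * 16.0) * propagation_amount;
}

// Scale the frame level multiplier for each of n blocks using the block
// and frame level distortion propagation factors.
void tpl_scale_lambda(const double *intra_cost, const double *propagation_cost,
                      size_t n, double frame_lambda, double *lambda_out) {
  double sum_intra = 0.0, sum_prop = 0.0;
  for (size_t i = 0; i < n; ++i) {
    sum_intra += intra_cost[i];
    sum_prop += propagation_cost[i];
  }
  const double frame_factor =
      (sum_intra > 0.0) ? 1.0 + sum_prop / sum_intra : 1.0;
  for (size_t i = 0; i < n; ++i) {
    const double block_factor =
        (intra_cost[i] > 0.0) ? 1.0 + propagation_cost[i] / intra_cost[i] : 1.0;
    lambda_out[i] = frame_lambda * frame_factor / block_factor;
  }
}
```

| Under these formulas, a block with a large propagation cost relative to its |
| intra cost receives a smaller multiplier than the frame level value, which |
| biases the rate-distortion decision for that block towards lower distortion. |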
| |
| \subsection architecture_enc_tpl_keyfun Key Functions |
| |
| - The TPL model is built in (TODO REF) av1_tpl_setup_stats(). |
| - Its application to the QP offset is triggered in (TODO REF) setup_delta_q(). |
| |
| \section architecture_enc_partitions Block Partition Search |
| |
| Add details here. |
| |
| \section architecture_enc_inter_modes Inter Prediction Mode Search |
| |
| Add details here. |
| |
| \section architecture_enc_intra_modes Intra Mode Search |
| |
| Add details here. |
| |
| \section architecture_enc_tx_search Transform Search |
| |
| Add details here. |
| |
| \section architecture_loop_filt Loop Filtering |
| |
| Add details here. |
| |
| \section architecture_loop_rest Loop Restoration Filtering |
| |
| Add details here. |
| |
| \section architecture_cdef CDEF |
| |
| Add details here. |
| |
| \section architecture_entropy Entropy Coding |
| |
| Add details here. |
| |
| */ |
| |
| /*!\defgroup encoder_algo Encoder Algorithm |
| * |
| * The encoder algorithm describes how a sequence is encoded, including the high |
| * level decisions as well as the algorithms used at every encoding stage. |
| */ |
| |
| /*!\defgroup high_level_algo High-level Algorithm |
| * \ingroup encoder_algo |
| * This module describes the sequence level and frame level algorithms in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| |
| /*!\defgroup frame_coding_pipeline Frame Coding Pipeline |
| \ingroup high_level_algo |
| |
| To encode a frame, first call \ref av1_receive_raw_frame() to obtain the raw |
| frame data. Then call \ref av1_get_compressed_data() to encode raw frame data |
| into compressed frame data. The main body of \ref av1_get_compressed_data() |
| is \ref av1_encode_strategy(), which determines high-level encode strategy |
| (frame type, frame placement, etc.) and then encodes the frame by calling |
| \ref av1_encode(). In \ref av1_encode(), \ref av1_first_pass() will execute |
| the first pass of two-pass encoding, while \ref encode_frame_to_data_rate() |
| will perform the final pass for either one-pass or two-pass encoding. |
| |
| The main body of \ref encode_frame_to_data_rate() is |
| \ref encode_with_recode_loop_and_filter(), which handles encoding before |
| in-loop filters (with recode loops encode_with_recode_loop(), or without |
| any recode loop \ref encode_without_recode()), followed by in-loop filters |
| (deblocking filters \ref loopfilter_frame(), CDEF filters and restoration |
| filters \ref cdef_restoration_frame()). |
| |
| Apart from rate/quality control, both encode_with_recode_loop() and |
| \ref encode_without_recode() call \ref av1_encode_frame() to manage reference |
| frame buffers. The rest of the encoding that does not require operating on |
| external frames is performed by \ref encode_frame_internal(), which is the |
| starting point of \ref partition_search. |
| */ |
| |
| /*!\defgroup two_pass_algo Two Pass Mode |
| \ingroup high_level_algo |
| |
| In two pass mode, the input file is passed into the encoder for a quick |
| first pass, where statistics are gathered. These statistics and the input |
| file are then passed back into the encoder for a second pass. The statistics |
| help the encoder reach the desired bitrate without as much overshooting or |
| undershooting. |
| |
| During the first pass, the codec will return "stats" packets that contain |
| information useful for the second pass. The caller should concatenate these |
| packets as they are received. In the second pass, the concatenated packets |
| are passed in, along with the frames to encode. During the second pass, |
| "frame" packets are returned that represent the compressed video. |
| |
| A complete example can be found in `examples/twopass_encoder.c`. Pseudocode |
| is provided below to illustrate the core parts. |
| |
| During the first pass, the uncompressed frames are passed in and stats |
| information is appended to a byte array. |
| |
| ~~~~~~~~~~~~~~~{.c} |
| // For simplicity, assume that there is enough memory in the stats buffer. |
| // Actual code will want to use a resizable array. stats_len represents |
| // the length of data already present in the buffer. |
| void get_stats_data(aom_codec_ctx_t *encoder, char *stats,
|                     size_t *stats_len, bool *got_data) {
|   const aom_codec_cx_pkt_t *pkt;
|   aom_codec_iter_t iter = NULL;
|   while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
|     *got_data = true;
|     if (pkt->kind != AOM_CODEC_STATS_PKT) continue;
|     memcpy(stats + *stats_len, pkt->data.twopass_stats.buf,
|            pkt->data.twopass_stats.sz);
|     *stats_len += pkt->data.twopass_stats.sz;
|   }
| }
| |
| void first_pass(char *stats, size_t *stats_len) {
|   struct aom_codec_enc_cfg first_pass_cfg;
|   ... // Initialize the config as needed.
|   first_pass_cfg.g_pass = AOM_RC_FIRST_PASS;
|   aom_codec_ctx_t first_pass_encoder;
|   ... // Initialize the encoder.
| 
|   bool got_data;
|   while (frame_available) {
|     // Read in the uncompressed frame, update frame_available
|     aom_image_t *frame_to_encode = ...;
|     aom_codec_encode(&first_pass_encoder, frame_to_encode, pts, duration,
|                      flags);
|     get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
|   }
|   // After all frames have been processed, call aom_codec_encode with
|   // a NULL ptr repeatedly, until no more data is returned. The NULL
|   // ptr tells the encoder that no more frames are available.
|   do {
|     got_data = false;
|     aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags);
|     get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
|   } while (got_data);
| 
|   aom_codec_destroy(&first_pass_encoder);
| }
| ~~~~~~~~~~~~~~~ |
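| 
| The pseudocode above assumes the stats buffer is already large enough. As a
| minimal sketch of the resizable array it mentions, the helper below grows a
| heap buffer geometrically as packets are appended. The names `StatsBuffer`,
| `stats_buffer_append()` and `stats_buffer_free()` are hypothetical and are
| not part of the libaom API.
| 
```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable byte buffer for accumulating first-pass stats.
typedef struct {
  char *data;       // heap storage, NULL until the first append
  size_t len;       // bytes currently stored
  size_t capacity;  // bytes currently allocated
} StatsBuffer;

// Appends sz bytes from src, doubling the allocation when needed.
// Returns 0 on success, or -1 on size overflow or allocation failure,
// in which case the buffer is left unchanged.
static int stats_buffer_append(StatsBuffer *sb, const void *src, size_t sz) {
  if (sz > SIZE_MAX - sb->len) return -1;
  const size_t needed = sb->len + sz;
  if (needed > sb->capacity) {
    size_t new_cap = sb->capacity ? sb->capacity : 4096;
    while (new_cap < needed) {
      if (new_cap > SIZE_MAX / 2) {
        new_cap = needed;  // cannot double again without overflow
        break;
      }
      new_cap *= 2;
    }
    char *new_data = (char *)realloc(sb->data, new_cap);
    if (!new_data) return -1;
    sb->data = new_data;
    sb->capacity = new_cap;
  }
  if (sz > 0) memcpy(sb->data + sb->len, src, sz);
  sb->len = needed;
  return 0;
}

// Releases the storage and resets the buffer to its empty state.
static void stats_buffer_free(StatsBuffer *sb) {
  free(sb->data);
  sb->data = NULL;
  sb->len = 0;
  sb->capacity = 0;
}
```
| 
| With such a helper, the memcpy in get_stats_data() would become a call to
| `stats_buffer_append()` with `pkt->data.twopass_stats.buf` and
| `pkt->data.twopass_stats.sz`, and the resulting `data` and `len` values would
| be handed to the second pass as the stats input.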
| |
| During the second pass, the uncompressed frames and the stats are |
| passed into the encoder. |
| |
| ~~~~~~~~~~~~~~~{.c} |
| // Write out each encoded frame to the file. |
| void get_cx_data(aom_codec_ctx_t *encoder, FILE *file,
|                  bool *got_data) {
|   const aom_codec_cx_pkt_t *pkt;
|   aom_codec_iter_t iter = NULL;
|   while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
|     *got_data = true;
|     if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue;
|     fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file);
|   }
| }
| |
| void second_pass(char *stats, size_t stats_len) {
|   struct aom_codec_enc_cfg second_pass_cfg;
|   ... // Initialize the config as needed.
|   second_pass_cfg.g_pass = AOM_RC_LAST_PASS;
|   second_pass_cfg.rc_twopass_stats_in.buf = stats;
|   second_pass_cfg.rc_twopass_stats_in.sz = stats_len;
|   aom_codec_ctx_t second_pass_encoder;
|   ... // Initialize the encoder from the config.
| 
|   FILE *output = fopen("output.obu", "wb");
|   bool got_data;
|   while (frame_available) {
|     // Read in the uncompressed frame, update frame_available
|     aom_image_t *frame_to_encode = ...;
|     aom_codec_encode(&second_pass_encoder, frame_to_encode, pts, duration,
|                      flags);
|     get_cx_data(&second_pass_encoder, output, &got_data);
|   }
|   // Pass in NULL to flush the encoder.
|   do {
|     got_data = false;
|     aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags);
|     get_cx_data(&second_pass_encoder, output, &got_data);
|   } while (got_data);
| 
|   fclose(output);
|   aom_codec_destroy(&second_pass_encoder);
| }
| ~~~~~~~~~~~~~~~ |
| */ |
| |
| /*!\defgroup look_ahead_buffer The Look-Ahead Buffer |
| \ingroup high_level_algo |
| |
| A program should call \ref aom_codec_encode() for each frame that needs |
| processing. These frames are internally copied and stored in a fixed-size |
| circular buffer, known as the look-ahead buffer. Other parts of the code |
| will use future frame information to inform current frame decisions; |
| examples include the first-pass algorithm, TPL model, and temporal filter. |
| Note that this buffer also keeps a reference to the last source frame. |
| |
| The look-ahead buffer is defined in \ref av1/encoder/lookahead.h. It acts as an |
| opaque structure, with an interface to create and free memory associated with |
| it. It supports pushing frames onto and popping frames off the structure in
| FIFO order. It also allows look-ahead when using the \ref av1_lookahead_peek()
| function with a non-negative number, and look-behind when -1 is passed in (for |
| the last source frame; e.g., firstpass will use this for motion estimation). |
| The \ref av1_lookahead_depth() function returns the current number of frames |
| stored in it. Note that \ref av1_lookahead_pop() is a bit of a misnomer - it |
| only pops if either the "flush" variable is set, or the buffer is at maximum |
| capacity. |
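| 
| The push, peek and conditional pop behaviour described above can be
| illustrated with a toy ring buffer. This is only a sketch under simplifying
| assumptions, not the libaom implementation: frames are plain integer ids, the
| capacity `TOY_MAX_DEPTH` is arbitrary, the last popped frame is tracked in a
| separate field, and all `toy_*` names are hypothetical.
| 
```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_DEPTH 4  // hypothetical capacity, not a libaom limit

// Toy model of a look-ahead queue; frames are represented by integer ids.
typedef struct {
  int frames[TOY_MAX_DEPTH];  // circular storage
  int read_idx;               // slot holding the oldest queued frame
  int count;                  // number of frames currently queued
  bool has_last;              // true once at least one frame has been popped
  int last;                   // most recently popped frame ("last source")
} ToyLookahead;

static void toy_init(ToyLookahead *q) { memset(q, 0, sizeof(*q)); }

// Queues a frame at the tail. Returns false if the buffer is full.
static bool toy_push(ToyLookahead *q, int frame) {
  if (q->count == TOY_MAX_DEPTH) return false;
  q->frames[(q->read_idx + q->count) % TOY_MAX_DEPTH] = frame;
  q->count++;
  return true;
}

// Removes the oldest frame, but only when flushing or when the buffer is
// full. Otherwise the frame stays queued and false is returned.
static bool toy_pop(ToyLookahead *q, bool flush, int *out) {
  if (q->count == 0) return false;
  if (!flush && q->count < TOY_MAX_DEPTH) return false;
  *out = q->frames[q->read_idx];
  q->last = *out;
  q->has_last = true;
  q->read_idx = (q->read_idx + 1) % TOY_MAX_DEPTH;
  q->count--;
  return true;
}

// index >= 0 looks ahead from the oldest queued frame; index == -1 looks
// behind to the most recently popped frame. Returns false if unavailable.
static bool toy_peek(const ToyLookahead *q, int index, int *out) {
  if (index == -1) {
    if (!q->has_last) return false;
    *out = q->last;
    return true;
  }
  if (index < 0 || index >= q->count) return false;
  *out = q->frames[(q->read_idx + index) % TOY_MAX_DEPTH];
  return true;
}
```
| 
| In this model a pop requested while the queue is only partly filled returns
| nothing unless the flush argument is set, which mirrors why frames stay
| buffered for as long as possible.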
| |
| The buffer is stored in the \ref AV1_COMP::lookahead field. |
| It is initialized in the first call to \ref aom_codec_encode(), in the |
| \ref av1_receive_raw_frame() sub-routine. The buffer size is defined by
| the \ref aom_codec_enc_cfg_t::g_lag_in_frames field of the encoder
| configuration.
| This can be modified manually but should only be set once. On the command |
| line, the flag "--lag-in-frames" controls it. The default size is 19 for |
| non-realtime usage and 1 for realtime. Note that a maximum value of 35 is |
| enforced. |
| |
| A frame will stay in the buffer as long as possible. As mentioned above,
| \ref av1_lookahead_pop() only removes a frame when either flush is set,
| or the buffer is full. Note that each call to \ref aom_codec_encode() inserts |
| another frame into the buffer, and pop is called by the sub-function |
| \ref av1_encode_strategy(). The buffer is told to flush when |
| \ref aom_codec_encode() is passed a NULL image pointer. Note that the caller |
| must repeatedly call \ref aom_codec_encode() with a NULL image pointer, until |
| no more packets are available, in order to fully flush the buffer. |
| |
| */ |
| |
| /*! @} - end defgroup high_level_algo */ |
| |
| /*!\defgroup partition_search Partition Search |
| * \ingroup encoder_algo |
| A frame is first split into tiles in \ref encode_tiles(), with each tile |
| compressed by av1_encode_tile(). Then a tile is processed in superblock rows |
| via \ref av1_encode_sb_row() and then \ref encode_sb_row(). |
| |
| Partition search starts from superblocks, which are processed sequentially in
| \ref encode_sb_row(). For each superblock, two search modes are supported,
| corresponding to the encoding configuration: \ref encode_nonrd_sb() is for
| 1-pass and real-time modes, while \ref encode_rd_sb() performs more
| exhaustive searches.
| |
| Partition search over the recursive quad-tree space is implemented by |
| recursively calling \ref av1_nonrd_use_partition(), \ref av1_rd_use_partition(), or |
| av1_rd_pick_partition() and returning the best options for sub-trees to their
| parent partitions. |
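| 
| The recursion described above, in which each sub-tree reports its best option
| back to its parent, can be sketched as follows. This toy version is an
| assumption-laden illustration rather than the libaom search: it only compares
| coding a square block whole against splitting it into four quadrants, and the
| cost callback `BlockCostFn`, the example `toy_cost()` and
| `toy_pick_partition()` are all hypothetical.
| 
```c
// Hypothetical cost callback: cost of coding the size x size block whose
// top-left corner is at (row, col) without splitting it further.
typedef double (*BlockCostFn)(int row, int col, int size);

// Returns the minimum cost of coding the block, either whole or as the sum
// of the best costs of its four quadrants. *num_leaves receives the number
// of coding blocks in the chosen partitioning.
static double toy_pick_partition(int row, int col, int size, int min_size,
                                 BlockCostFn cost_fn, int *num_leaves) {
  const double none_cost = cost_fn(row, col, size);
  if (size <= min_size) {
    *num_leaves = 1;  // smallest block size: no further split is allowed
    return none_cost;
  }
  const int half = size / 2;
  double split_cost = 0.0;
  int split_leaves = 0;
  for (int i = 0; i < 4; ++i) {
    int leaves = 0;
    // Each quadrant returns the best cost found for its own sub-tree.
    split_cost += toy_pick_partition(row + (i >> 1) * half,
                                     col + (i & 1) * half, half, min_size,
                                     cost_fn, &leaves);
    split_leaves += leaves;
  }
  if (split_cost < none_cost) {
    *num_leaves = split_leaves;
    return split_cost;
  }
  *num_leaves = 1;
  return none_cost;
}

// Example cost: blocks anchored at the origin are treated as "detailed" and
// are expensive unless they are small; every other block costs 1.
static double toy_cost(int row, int col, int size) {
  const int has_detail = (row == 0 && col == 0);
  return (has_detail && size > 8) ? (double)(size * size) : 1.0;
}
```
| 
| Calling `toy_pick_partition(0, 0, 64, 8, toy_cost, &leaves)` splits only along
| the path towards the detailed corner and keeps the remaining area in large
| blocks. An actual encoder replaces the toy cost with a rate-distortion cost
| obtained from mode search.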
| |
| In libaom, partition search sits on top of mode search (predictor, transform,
| etc.) instead of being a separate module. The interface of mode search is
| \ref pick_sb_modes(), which connects \ref partition_search with
| \ref inter_mode_search and \ref intra_mode_search. To make good decisions,
| reconstruction is also required in order to build references and contexts. It
| is implemented by \ref encode_sb() at sub-tree level and \ref encode_b() at
| coding block level.
| * @{ |
| */ |
| /*! @} - end defgroup partition_search */ |
| |
| /*!\defgroup intra_mode_search Intra Mode Search |
| * \ingroup encoder_algo |
| * This module describes the intra mode search algorithm in AV1.
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup intra_mode_search */ |
| |
| /*!\defgroup inter_mode_search Inter Mode Search |
| * \ingroup encoder_algo |
| * This module describes the inter mode search algorithm in AV1.
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup inter_mode_search */ |
| |
| /*!\defgroup palette_mode_search Palette Mode Search |
| * \ingroup intra_mode_search |
| * This module describes the palette mode search algorithm in AV1.
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup palette_mode_search */ |
| |
| /*!\defgroup transform_search Transform Search |
| * \ingroup encoder_algo |
| * This module describes the transform search algorithm in AV1.
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup transform_search */ |
| |
| /*!\defgroup coefficient_coding Transform Coefficient Coding and Optimization |
| * \ingroup encoder_algo |
| * This module describes the algorithms of transform coefficient coding and optimization in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup coefficient_coding */ |
| |
| /*!\defgroup in_loop_filter In-loop Filter |
| * \ingroup encoder_algo |
| * This module describes the in-loop filter algorithm in AV1.
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup in_loop_filter */ |
| |
| /*!\defgroup in_loop_cdef CDEF |
| * \ingroup encoder_algo |
| * This module describes the CDEF parameter search algorithm |
| * in AV1. More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup in_loop_cdef */
| |
| /*!\defgroup in_loop_restoration Loop Restoration |
| * \ingroup encoder_algo |
| * This module describes the loop restoration search |
| * and estimation algorithm in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup in_loop_restoration */ |
| |
| /*!\defgroup rate_control Rate Control |
| * \ingroup encoder_algo |
| * This module describes the rate control algorithm in AV1.
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup rate_control */ |