
 /*!\page encoder_guide AV1 ENCODER GUIDE

 \tableofcontents

 \section architecture_introduction Introduction

 This document provides an architectural overview of the libaom AV1 encoder.

 It is intended as a high-level starting point for anyone wishing to
 contribute to the project, to help them understand the structure of the
 encoder more quickly and find their way around the codebase.

 It sits above, and where necessary links to, more detailed function-level
 documentation.

 \section  architecture_gencodecs Generic Block Transform Based Codecs

 Most modern video encoders including VP8, H.264, VP9, HEVC and AV1
 (in increasing order of complexity) share a common basic paradigm. This
 comprises separating a stream of raw video frames into a series of discrete
 blocks (of one or more sizes), then computing a prediction signal and a
 quantized, transform coded, residual error signal. The prediction and residual
 error signal, along with any side information needed by the decoder, are then
 entropy coded and packed to form the encoded bitstream. See Figure 1 below,
 where the blue blocks are, to all intents and purposes, the lossless parts of
 the encoder and the red block is the lossy part.

 This is of course a gross oversimplification, even in regard to the simplest
 of the above codecs.  For example, all of them allow for block based
 prediction at multiple different scales (i.e. different block sizes) and may
 use previously coded pixels in the current frame for prediction or pixels from
 one or more previously encoded frames. Further, they may support multiple
 different transforms and transform sizes and quality optimization tools like
 loop filtering.

 \image html genericcodecflow.png "" width=70%
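 The paradigm in Figure 1 can be illustrated with a minimal sketch (not libaom
 code) that codes a one-dimensional block of samples: it forms a residual
 against a given prediction, quantizes it with a uniform step `q`, and
 reconstructs the block exactly as a decoder would. The function names are
 hypothetical, and the transform and entropy coding stages are omitted; only
 the quantization step loses information.

```c
#include <stdlib.h>

// Quantizes a residual value with a uniform step, rounding to nearest.
// This is the only lossy operation in the sketch.
static int quantize(int residual, int q) {
  int mag = (abs(residual) + q / 2) / q;
  return residual < 0 ? -mag : mag;
}

// Codes n source samples against a prediction (hypothetical helper).
// coeffs receives the quantized residuals that would be entropy coded and
// recon receives the decoder-side reconstruction. Returns the sum of absolute
// reconstruction errors (the distortion).
static int code_block(const int *src, const int *pred, int n, int q,
                      int *coeffs, int *recon) {
  int distortion = 0;
  for (int i = 0; i < n; ++i) {
    int residual = src[i] - pred[i];    // prediction error
    coeffs[i] = quantize(residual, q);  // lossy step
    recon[i] = pred[i] + coeffs[i] * q; // what the decoder rebuilds
    distortion += abs(src[i] - recon[i]);
  }
  return distortion;
}
```

 With `q = 1` the sketch is lossless; larger steps trade distortion for fewer
 distinct coefficient values to code.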

 \section architecture_av1_structure AV1 Structure and Complexity

 As previously stated, AV1 adopts the same underlying paradigm as other block
 transform based codecs. However, it is much more complicated than previous
 generation codecs and supports many more block partitioning, prediction and
 transform options.

 AV1 supports block partitions of various sizes from 128x128 pixels down to 4x4
 pixels using a multi-layer recursive tree structure as illustrated in Figure 2
 below.

 \image html av1partitions.png "" width=70%
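 The recursive structure can be illustrated with a deliberately simplified
 sketch that only chooses between coding a square block whole and splitting
 it into four quadrants, whereas AV1 also has horizontal, vertical and mixed
 partition types. The cost callback and the `toy_cost` model are hypothetical
 stand-ins for the encoder's rate-distortion cost of coding a block whole.

```c
// Hypothetical cost of coding the block at (x, y) of the given size whole.
typedef double (*block_cost_fn)(int x, int y, int size, void *ctx);

// Toy cost model (hypothetical): a fixed signalling overhead per block, passed
// through ctx, plus a distortion term that grows faster than the block area.
static double toy_cost(int x, int y, int size, void *ctx) {
  (void)x;
  (void)y;
  double overhead = *(const double *)ctx;
  return overhead + (double)size * size * size / 64.0;
}

// Returns the cheapest cost of coding the square block at (x, y), choosing
// recursively between "no split" and "split into four", down to min_size.
// *leaves is incremented by the number of blocks in the chosen partitioning.
static double search_partition(int x, int y, int size, int min_size,
                               block_cost_fn cost, void *ctx, int *leaves) {
  double none_cost = cost(x, y, size, ctx);
  if (size <= min_size) {
    *leaves += 1;
    return none_cost;
  }
  int half = size / 2, split_leaves = 0;
  double split_cost = 0.0;
  for (int i = 0; i < 4; ++i) {
    split_cost += search_partition(x + (i % 2) * half, y + (i / 2) * half,
                                   half, min_size, cost, ctx, &split_leaves);
  }
  if (split_cost < none_cost) {
    *leaves += split_leaves;
    return split_cost;
  }
  *leaves += 1;
  return none_cost;
}
```

 For a 128x128 block with `min_size = 4`, a zero overhead makes splitting
 always cheaper (1024 leaves of 4x4), while an overhead of 1000 stops the
 recursion at four 64x64 blocks.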

 AV1 also provides 71 basic intra prediction modes, 56 single frame inter prediction
 modes (7 reference frames x 4 modes x 2 for OBMC (overlapped block motion
 compensation)), 12768 compound inter prediction modes (that combine inter
 predictors from two reference frames) and 36708 compound inter / intra
 prediction modes. Furthermore, in addition to simple inter motion estimation,
 AV1 also supports warped motion prediction using affine transforms.

 In terms of transform coding, it has 16 separable 2-D transform kernels
 \f$\{DCT, ADST, fADST, IDTX\}^2\f$ that can be applied at up to 19 different
 scales from 64x64 down to 4x4 pixels.
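 The count of 16 follows from independently choosing one of the four 1-D
 kernels for the vertical pass and one for the horizontal pass. A small
 enumeration sketch; the names are illustrative and do not follow the naming
 or ordering of libaom's transform type enum.

```c
#include <stdio.h>

// The four 1-D kernels; FLIPADST is the flipped ADST, written fADST above.
static const char *const kKernels1D[4] = { "DCT", "ADST", "FLIPADST", "IDTX" };

// Fills names[] with every "vertical_horizontal" kernel pair and returns how
// many separable 2-D transforms that gives (4 x 4 = 16).
static int enumerate_2d_kernels(char names[][24]) {
  int count = 0;
  for (int v = 0; v < 4; ++v) {
    for (int h = 0; h < 4; ++h) {
      snprintf(names[count], 24, "%s_%s", kKernels1D[v], kKernels1D[h]);
      ++count;
    }
  }
  return count;
}
```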

 When combined together, this means that for any one 8x8 pixel block in a
 source frame, there are approximately 45,000,000 different ways that it can
 be encoded.

 Consequently, AV1 requires complex control processes. While not necessarily
 a normative part of the bitstream, these are the algorithms that turn a set
 of compression tools and a bitstream format specification, into a coherent
 and useful codec implementation. These may include but are not limited to
 things like :-

 - Rate distortion optimization (the process of trying to choose the most
   efficient combination of block size, prediction mode, transform type
   etc.)
 - Rate control (regulation of the output bitrate)
 - Encoder speed vs quality trade-offs.
 - Features such as two pass encoding or optimization for low delay
   encoding.
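 Rate distortion optimization is commonly expressed as choosing the candidate
 that minimizes a Lagrangian cost J = D + lambda * R. A minimal sketch, assuming
 each candidate's distortion and rate (in bits) have already been measured; the
 type and function names are hypothetical, and libaom's own fixed-point cost
 scaling is not reproduced here.

```c
#include <stddef.h>

// One candidate coding choice (e.g. a prediction mode or transform type).
typedef struct {
  double distortion; // e.g. sum of squared error after reconstruction
  double rate;       // estimated bits needed to signal this choice
} rd_candidate;

// Returns the index of the candidate with the lowest Lagrangian cost
// J = D + lambda * R, or -1 if there are no candidates.
static int pick_best_candidate(const rd_candidate *cands, size_t n,
                               double lambda) {
  int best = -1;
  double best_cost = 0.0;
  for (size_t i = 0; i < n; ++i) {
    double cost = cands[i].distortion + lambda * cands[i].rate;
    if (best < 0 || cost < best_cost) {
      best = (int)i;
      best_cost = cost;
    }
  }
  return best;
}
```

 Raising lambda favours cheaper, lower-rate choices at the expense of
 distortion; lowering it favours fidelity.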

 For a more detailed overview of AV1's encoding tools and a discussion of some
 of the design considerations and hardware constraints that had to be
 accommodated, please refer to *** TODO link to Jingning's AV1 overview paper.

 Figure 3 provides a slightly expanded but still simplistic view of the
 AV1 encoder architecture with blocks that relate to some of the subsequent
 sections of this document. In this diagram, the raw uncompressed frame buffers
 are shown in dark green and the reconstructed frame buffers used for
 prediction in light green. Red indicates those parts of the codec that are
 (or may be) lossy, where fidelity can be traded off against compression
 efficiency, whilst light blue shows algorithms or coding tools that are
 lossless. The yellow blocks represent non-bitstream normative configuration
 and control algorithms.

 \image html av1encoderflow.png "" width=70%

 \section architecture_command_line The Libaom Command Line Interface

  Add details or links here: TODO ? elliotk@

 \section architecture_enc_data_structures Main Encoder Data Structures

 The following are the main high level data structures used by the libaom AV1
 encoder and referenced elsewhere in this overview document:

  - \ref AV1_COMP
    - \ref AV1_COMP.rc (\ref RATE_CONTROL)
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
    - \ref AV1_COMP.twopass (\ref TWO_PASS)
    - \ref AV1_COMP.gf_group (\ref GF_GROUP)
    - \ref AV1_COMP.speed
    - \ref AV1_COMP.sf (\ref SPEED_FEATURES)

  - \ref AV1EncoderConfig (Encoder configuration parameters)
    - \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)

  - \ref RATE_CONTROL (Rate control status)
  - \ref RateControlCfg (Rate control configuration)
  - \ref TWO_PASS (Two pass status and control data)
  - \ref GF_GROUP (Data relating to the current GF/ARF group)
  - \ref FIRSTPASS_STATS (Defines entries in the first pass stats buffer)
  - \ref SPEED_FEATURES (Encode speed / quality tradeoff parameters)

 \section architecture_enc_use_cases Encoder Use Cases

 The libaom AV1 encoder is configurable to support a number of different use
 cases and rate control strategies.

 The principal use cases for which it is optimised are as follows:

  - <b>Video on Demand / Streaming</b>
  - <b>Low Delay or Live Streaming</b>
  - <b>Video Conferencing / Real Time Coding (RTC)</b>
  - <b>Fixed Quality / Testing</b>

 Other examples of use cases for which the encoder could be configured but for
 which there is less by way of specific optimizations include:

  - <b>Download and Play</b>
  - <b>Disk Playback</b>
  - <b>Storage</b>
  - <b>Editing</b>
  - <b>Broadcast video</b>

 Specific use cases may have particular requirements or constraints. For
 example:

 <b>Video Conferencing:</b>  In a video conference we need to encode the video
 in real time and to avoid any coding tools that could increase latency, such
 as frame look ahead.

 <b>Live Streams:</b> In cases such as live streaming of games or events, it
 may be possible to allow some limited buffering of the video and use of
 lookahead coding tools to improve encoding quality. However, whilst a lag of
 a second or two may be fine given the one way nature of this type of video,
 it is clearly not possible to use tools such as two pass coding.

 <b>Broadcast:</b> Broadcast video (e.g. digital TV over satellite) may have
 specific requirements such as frequent and regular key frames (e.g. once per
 second or more) as these are important as entry points to users when switching
 channels. There may also be strict upper limits on bandwidth over a short
 window of time.

 <b>Download and Play:</b> Download and play applications may have less strict
 requirements in terms of local frame by frame rate control but there may be a
 requirement to accurately hit a file size target for the video clip as a
 whole. Similar considerations may apply to playback from mass storage devices
 such as DVD or disk drives.

 <b>Editing:</b> In certain special use cases such as offline editing, it may
 be desirable to have very high quality and data rate but also very frequent
 key frames or indeed to encode the video exclusively as key frames. Lossless
 video encoding may also be required in this use case.

 <b>VOD / Streaming:</b> One of the most important and common use cases for AV1
 is video on demand or streaming, for services such as YouTube and Netflix. In
 this use case it is possible to do two or even multi-pass encoding to improve
 compression efficiency. Streaming services will often store many encoded
 copies of a video at different resolutions and data rates to support users
 with different types of playback device and bandwidth limitations.
 Furthermore, these services support dynamic switching between multiple
 streams, so that they can respond to changing network conditions.

 Exact rate control when encoding for a specific format (e.g. 360P or 1080P on
 YouTube) may not be critical, provided that the video bandwidth remains within
 allowed limits. Whilst a format may have a nominal target data rate, this can
 be considered more as the desired average egress rate over the video corpus
 rather than a strict requirement for any individual clip. Indeed, in order
 to maintain optimal quality of experience for the end user, it may be
 desirable to encode some easier videos or sections of video at a lower data
 rate and harder videos or sections at a higher rate.

 VOD / streaming does not usually require very frequent key frames (as in the
 broadcast case) but key frames are important in trick play (scanning back and
 forth to different points in a video) and for adaptive stream switching. As
 such, in a use case like YouTube, there is normally an upper limit on the
 maximum time between key frames of a few seconds, but within certain limits
 the encoder can try to align key frames with real scene cuts.
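 The interplay between a maximum key frame interval and scene cut alignment
 can be sketched as follows. This is a simplified illustration, not libaom's
 key frame detection logic: `is_scene_cut` is a hypothetical per-frame flag
 array, `min_interval` is an assumed guard against key frames that are too
 close together, and a key frame is forced whenever `max_interval` frames pass
 without one.

```c
#include <stdbool.h>

// Marks key frames in kf[0..n-1]. Frame 0 is always a key frame. A scene cut
// starts a new key frame once at least min_interval frames have passed since
// the last one; otherwise a key frame is forced after max_interval frames.
// Returns the number of key frames placed.
static int place_key_frames(const bool *is_scene_cut, int n, int min_interval,
                            int max_interval, bool *kf) {
  int count = 0, last_kf = 0;
  for (int i = 0; i < n; ++i) {
    int since_last = i - last_kf;
    kf[i] = (i == 0) || since_last >= max_interval ||
            (is_scene_cut[i] && since_last >= min_interval);
    if (kf[i]) {
      last_kf = i;
      ++count;
    }
  }
  return count;
}
```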

 Whilst encoder speed may not seem to be as critical in this use case, for
 services such as YouTube, where millions of new videos have to be encoded
 every day, encoder speed is still important, so libaom allows command line
 control of the encode speed vs quality trade off.

 <b>Fixed Quality / Testing Mode:</b> Libaom also has a fixed quality encoder
 pathway designed for testing under highly constrained conditions.


 \section architecture_enc_rate_ctrl Rate Control

 Different use cases may have different requirements in terms of data rate
 control.

 The broad rate control strategy is selected using the <b>--end-usage</b>
 parameter on the command line, which maps onto the field
 \ref aom_codec_enc_cfg_t.rc_end_usage in \ref aom_encoder.h.

 The four supported options are:-

 - <b>VBR</b> (Variable Bitrate)
 - <b>CBR</b> (Constant Bitrate)
 - <b>CQ</b> (Constrained Quality mode; a constrained variant of VBR)
 - <b>Fixed Q</b> (Constant Quality or Q mode)

 The value of \ref aom_codec_enc_cfg_t.rc_end_usage is in turn copied over
 into the encoder rate control configuration data structure as
 (TODO REF) RateControlCfg.rc_mode (see \ref encoder.h).

 With regard to the most important use cases above, video on demand uses either
 VBR or CQ mode. CBR is the preferred rate control mode for RTC and live
 streaming, and Fixed Q is only used in testing.

 The behaviour of each of these modes is regulated by a series of secondary
 command line rate control options but also depends somewhat on the selected
 use case, whether 2-pass coding is enabled and the selected encode speed vs
 quality trade offs (\ref AV1_COMP.speed and \ref AV1_COMP.sf).

 The list below gives the names of the main rate control command line
 options together with the names of the corresponding fields in the rate
 control configuration data structure.

 - <b>--target-bitrate</b> ((TODO REF)RateControlCfg.target_bandwidth)
 - <b>--min-q</b> ((TODO REF)RateControlCfg.best_allowed_q)
 - <b>--max-q</b> ((TODO REF)RateControlCfg.worst_allowed_q)
 - <b>--cq-level</b> ((TODO REF)RateControlCfg.cq_level)
 - <b>--undershoot-pct</b> ((TODO REF)RateControlCfg.under_shoot_pct)
 - <b>--overshoot-pct</b> ((TODO REF)RateControlCfg.over_shoot_pct)

 The following options control aspects of 2-pass VBR encoding:

 - <b>--bias-pct</b> (RateControlCfg::two_pass_vbrbias)
 - <b>--minsection-pct</b> ((TODO REF)RateControlCfg.two_pass_vbrmin_section)
 - <b>--maxsection-pct</b> ((TODO REF)RateControlCfg.two_pass_vbrmax_section)

 The following relate to buffer and delay management in one pass low delay and
 real time coding:

 - <b>--buf-sz</b> ((TODO REF) RateControlCfg::maximum_buffer_size_ms)
 - <b>--buf-initial-sz</b> ((TODO REF)RateControlCfg.starting_buffer_level_ms)
 - <b>--buf-optimal-sz</b> ((TODO REF)RateControlCfg.optimal_buffer_level_ms)
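 These options describe a leaky-bucket style buffer model, in which the
 channel fills the buffer at the target bitrate and each encoded frame drains
 it. The sketch below is a simplified illustration under that assumption; the
 struct, its fields and the helper functions are hypothetical, not libaom's
 internal representation.

```c
#include <stdint.h>

// Hypothetical buffer model for one pass CBR coding. All levels are in bits.
typedef struct {
  int64_t level;   // current buffer fullness
  int64_t optimal; // level the rate control tries to stay near
  int64_t maximum; // level is capped here
} buffer_model;

// Converts a buffer size given in milliseconds (as on the command line) into
// bits at the given target bitrate in bits per second.
static int64_t ms_to_bits(int64_t ms, int64_t bitrate_bps) {
  return ms * bitrate_bps / 1000;
}

static buffer_model buffer_init(int64_t initial_ms, int64_t optimal_ms,
                                int64_t max_ms, int64_t bitrate_bps) {
  buffer_model b = { ms_to_bits(initial_ms, bitrate_bps),
                     ms_to_bits(optimal_ms, bitrate_bps),
                     ms_to_bits(max_ms, bitrate_bps) };
  return b;
}

// Updates the buffer after encoding one frame: the channel adds one frame's
// worth of bits and the encoded frame removes frame_bits. A negative level
// means the buffer has underrun.
static void buffer_update(buffer_model *b, int64_t bitrate_bps, double fps,
                          int64_t frame_bits) {
  b->level += (int64_t)(bitrate_bps / fps) - frame_bits;
  if (b->level > b->maximum) b->level = b->maximum;
}
```

 A rate controller built on such a model would typically lower its frame
 targets when the level falls below the optimal level and raise them when it
 sits above it.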

 The rate control configuration data structure can be found in:

 - \ref AV1_COMP.oxcf
   - \ref AV1EncoderConfig.rc_cfg

 \subsection architecture_enc_vbr Variable Bitrate (VBR) Encoding

 For streamed VOD content the most common rate control strategy is Variable
 Bitrate (VBR) encoding. The CQ mode mentioned above is a variant of this
 where additional quantizer and quality constraints are applied.  VBR
 encoding may in theory be used in conjunction with either 1-pass or 2-pass
 encoding.

 VBR encoding varies the number of bits given to each frame or group of frames
 according to the difficulty of that frame or group of frames, such that easier
 frames are allocated fewer bits and harder frames are allocated more bits. The
 intent here is to even out the quality between frames. This contrasts with
 Constant Bitrate (CBR) encoding where each frame is allocated the same number
 of bits.
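 A toy comparison of the two strategies, assuming a hypothetical per-frame
 complexity score (for example one derived from first pass statistics): VBR
 shares the total budget in proportion to complexity, while CBR gives every
 frame the same share. libaom's actual two pass allocation is considerably
 more elaborate; the function below is illustrative only.

```c
#include <stdint.h>

// Splits total_bits across n frames. With vbr set, each frame's share is
// proportional to its (hypothetical) complexity score; otherwise every frame
// gets an equal share, as in CBR. Rounding remainders are not redistributed.
static void allocate_bits(const double *complexity, int n, int64_t total_bits,
                          int vbr, int64_t *frame_bits) {
  double total_complexity = 0.0;
  for (int i = 0; i < n; ++i) total_complexity += complexity[i];
  for (int i = 0; i < n; ++i) {
    if (vbr && total_complexity > 0.0) {
      frame_bits[i] =
          (int64_t)(total_bits * (complexity[i] / total_complexity));
    } else {
      frame_bits[i] = total_bits / n;
    }
  }
}
```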

 Whilst for any given frame or group of frames the data rate may vary, the VBR
 algorithm attempts to deliver a given average bitrate over a wider time
 interval. In standard VBR encoding, the time interval over which the data rate
 is averaged is usually the duration of the video clip.  An alternative
 approach is to target an average VBR bitrate over the entire video corpus for
 a particular video format (corpus VBR).

 \subsubsection architecture_enc_1pass_vbr 1 Pass VBR Encoding

 The command line for libaom does allow 1 Pass VBR, but this has not been
 properly optimised and behaves much like 1 pass CBR in most regards, with bits
 allocated to frames by the following functions:

 - \ref av1_calc_iframe_target_size_one_pass_vbr()
 - \ref av1_calc_pframe_target_size_one_pass_vbr()

 \subsubsection architecture_enc_2pass_vbr 2 Pass VBR Encoding

 The main focus here will be on 2-pass VBR encoding (and the related CQ mode)
 as these are the modes most commonly used for VOD content.

 2-pass encoding is selected on the command line by setting --passes=2
 (or -p 2).

 Generally speaking, in 2-pass encoding, an encoder will first encode a video
 using a default set of parameters and assumptions. Depending on the outcome
 of that first encode, the baseline assumptions and parameters will be adjusted
 to optimize the output during the second pass.  In essence the first pass is a
 fact finding mission to establish the complexity and variability of the video,
 in order to allow a better allocation of bits in the second pass.

 The libaom 2-pass algorithm is unusual in that the first pass is not a full
 encode of the video. Rather it uses a limited set of prediction and transform
 options and a fixed quantizer to generate statistics about each frame. No
 output bitstream is created and the per frame first pass statistics are stored
 entirely in volatile memory. This has some disadvantages when compared to a
 full first pass encode, but avoids the need for file I/O and improves speed.

 In this section I will refer to the following key data structures.
 (see also \ref architecture_enc_data_structures)

 - \ref AV1_COMP cpi (the main compressor instance data structure)
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
    - \ref AV1_COMP.rc (\ref RATE_CONTROL)
    - \ref AV1_COMP.twopass (\ref TWO_PASS)

 - \ref AV1EncoderConfig  (Encoder configuration parameters)
    - \ref AV1EncoderConfig.pass

 - \ref RATE_CONTROL (Rate control status)
 - \ref TWO_PASS (Two pass status and control data)

 - \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first
   pass stats)

 For two pass encoding, the function \ref av1_encode() will first be called
 for each frame in the video with the value \ref AV1EncoderConfig.pass = 1.
 This will result in calls to \ref av1_first_pass().

 Statistics for each frame are stored in \ref FIRSTPASS_STATS frame_stats_buf.

 After completion of the first pass, \ref av1_encode() will be called again for
 each frame with \ref AV1EncoderConfig.pass = 2.  The frames are then encoded in
 accordance with the statistics gathered during the first pass by calls to
 \ref encode_frame_to_data_rate().

 \ref encode_frame_to_data_rate() in turn calls (TODO REF)
 av1_get_second_pass_params().

 In summary the second pass code :-

 - Searches for scene cuts (if auto key frame detection is enabled).
 - Defines the length of and hierarchical structure to be used in each
   ARF/GF group.
 - Allocates bits based on the relative complexity of each frame, the quality
   of frame to frame prediction and the type of frame (e.g. key frame, ARF
   frame, golden frame or normal leaf frame).
 - Suggests a maximum Q (quantizer value) for each ARF/GF group, based on
   estimated complexity and recent rate control compliance
   (\ref RATE_CONTROL.active_worst_quality).
 - Tracks adherence to the overall rate control objectives and adjusts
   heuristics.

 The main second pass functions relating to the above include:-

 - (TODO REF) find_next_key_frame()
 - (TODO REF) define_gf_group()
 - (TODO REF) calculate_total_gf_group_bits()
 - (TODO REF) get_twopass_worst_quality()
 - (TODO REF) av1_gop_setup_structure()
 - (TODO REF) av1_gop_bit_allocation()
 - (TODO REF) av1_twopass_postencode_update()

 For each frame, the two pass algorithm defines a target number of bits
 \ref RATE_CONTROL.base_frame_target, which is then adjusted if necessary to
 reflect any undershoot or overshoot on previous frames to give
 \ref RATE_CONTROL.this_frame_target.
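 One simple way to picture the step from a base target to an adjusted target
 is to spread the accumulated rate error over the coming frames and clamp the
 correction. The sketch below is hypothetical: the function name, the
 spreading rule and the 25% clamp are illustrative choices, not the values or
 logic libaom uses.

```c
#include <stdint.h>

// Adjusts a frame's base target for previous undershoot (bits_off_target > 0,
// bits saved) or overshoot (bits_off_target < 0, bits overspent). The
// accumulated error is spread over frames_left frames and the correction is
// clamped to +/-25% of the base target (an illustrative limit).
static int64_t adjust_frame_target(int64_t base_frame_target,
                                   int64_t bits_off_target, int frames_left) {
  if (frames_left < 1) frames_left = 1;
  int64_t correction = bits_off_target / frames_left;
  int64_t max_correction = base_frame_target / 4;
  if (correction > max_correction) correction = max_correction;
  if (correction < -max_correction) correction = -max_correction;
  return base_frame_target + correction;
}
```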

 As well as \ref RATE_CONTROL.active_worst_quality, the two pass code also
 maintains a record of the actual Q value used to encode previous frames
 at each level in the current pyramid hierarchy
 (\ref RATE_CONTROL.active_best_quality). The function
 \ref rc_pick_q_and_bounds() uses these values to set a permitted Q range
 for each frame.

 \subsubsection architecture_enc_1pass_lagged 1 Pass Lagged VBR Encoding

 1 pass lagged encoding falls between simple 1 pass encoding and full two pass
 encoding and is used for cases where it is not possible to do a full first
 pass through the entire video clip, but where some delay is permissible, for
 example near-live streaming with a delay of up to a few seconds. In this
 case the first pass and second pass are in effect combined, such that the
 first pass starts processing the clip and the second pass lags behind it by a
 few frames. When using this method, full sequence level statistics are not
 available, but it is possible to collect and use frame or group of frame level
 data to help in the allocation of bits and in defining ARF/GF coding
 hierarchies. The reader is referred to the data value
 (TODO REF) cpi->lap_enabled (where <b>lap</b> stands for
 <b>look ahead processing</b>). This encoding mode for the most part uses the
 same rate control pathways as two pass VBR encoding.

 \subsection architecture_enc_rc_loop The Main Rate Control Loop

 Having established a target rate for a given frame and an allowed range of Q
 values, the encoder then tries to encode the frame at a rate that is as close
 as possible to the target value, given the Q range constraints.

 There are two main mechanisms by which this is achieved.

 The first selects a frame level Q, using an adaptive estimate of the number of
 bits that will be generated when the frame is encoded at any given Q.
 Fundamentally this mechanism is common to VBR, CBR and to use cases such as
 RTC with small adjustments.

 As the Q value mainly adjusts the precision of the residual signal, it is not
 actually a reliable basis for accurately predicting the number of bits that
 will be generated across all clips. A well predicted clip, for example, may
 have a much smaller error residual after prediction.  The algorithm copes with
 this by adapting its predictions on the fly using a feedback loop based on how
 well it did the previous time around.
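 The feedback loop can be sketched as follows. The bits-per-block model
 `bits_per_mb_at_q()` is a hypothetical stand-in (libaom uses its own
 empirically derived model), the damping applied to the correction factor is
 illustrative, and all names are hypothetical. The structure follows the
 description above: predict the frame size at each Q, pick the lowest Q in the
 permitted range whose prediction fits the target, then rescale the correction
 factor using actual versus predicted bits.

```c
// Hypothetical model: estimated bits per 16x16 block at quantizer index q
// (0..255), falling as q rises. Only its shape matters for this sketch.
static double bits_per_mb_at_q(int q) { return 20000.0 / (q + 10); }

typedef struct {
  double correction_factor; // learned ratio of actual to predicted bits
} rate_model;

static double predict_frame_bits(const rate_model *m, int q, int num_mbs) {
  return m->correction_factor * bits_per_mb_at_q(q) * num_mbs;
}

// Returns the lowest q in [best_q, worst_q] whose predicted frame size does
// not exceed target_bits, or worst_q if none does.
static int pick_q(const rate_model *m, double target_bits, int num_mbs,
                  int best_q, int worst_q) {
  for (int q = best_q; q <= worst_q; ++q) {
    if (predict_frame_bits(m, q, num_mbs) <= target_bits) return q;
  }
  return worst_q;
}

// After encoding, moves the correction factor part of the way towards
// actual / predicted. Damping the update avoids oscillation.
static void update_correction_factor(rate_model *m, int q, int num_mbs,
                                     double actual_bits) {
  double predicted = predict_frame_bits(m, q, num_mbs);
  if (predicted <= 0.0) return;
  double ratio = actual_bits / predicted;
  m->correction_factor *= 1.0 + 0.5 * (ratio - 1.0);
}
```

 After a frame that came out 50% larger than predicted, the factor rises from
 1.0 to 1.25 and the next frame with the same target is given a higher Q.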

 The main functions responsible for the prediction of Q and the adaptation over
 time, for the two pass encoding pipeline are:

 - \ref rc_pick_q_and_bounds()
 	- (TODO REF) get_q()
 		- (TODO REF) av1_rc_regulate_q()
 			- (TODO REF) get_rate_correction_factor()
 			- (TODO REF) find_closest_qindex_by_rate()

 - (TODO REF) av1_twopass_postencode_update()
 	- (TODO REF) av1_rc_update_rate_correction_factors()

 The second mechanism for control comes into play if there is a large rate miss
 for the current frame (much too big or too small). This is a recode mechanism
 which allows the current frame to be re-encoded one or more times with a
 revised Q value. This obviously has significant implications for encode speed
 and, in the case of RTC, for latency (hence it is not used in the RTC pathway).

 Whether or not a recode is allowed for a given frame depends on the selected
 encode speed vs quality trade off. This is set on the command line using the
 --cpu-used parameter which maps onto the \ref AV1_COMP.speed field in the main
 compressor instance data structure.

 The value of \ref AV1_COMP.speed, combined with the use case, is used to
 populate the speed features data structure AV1_COMP.sf. In particular
 \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop determines the types of frames that
 may be recoded and \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance is a rate
 error trigger threshold.
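 A simplified picture of the recode trigger: if the frame type is eligible for
 recoding and the encoded size misses the target by more than a tolerance
 percentage, Q is moved in the direction that corrects the miss and the frame
 is encoded again. The function below is a hypothetical illustration; its
 fixed Q step and single eligibility flag do not correspond to libaom's actual
 `recode_loop` and `recode_tolerance` handling.

```c
#include <stdbool.h>
#include <stdint.h>

// Decides whether a frame should be re-encoded. tolerance_pct is the allowed
// rate miss as a percentage of the target. On a recode, *q is stepped towards
// correcting the miss and clamped to [best_q, worst_q]. Returns false if no
// recode is needed or Q cannot move any further.
static bool should_recode(bool frame_may_recode, int64_t target_bits,
                          int64_t actual_bits, int tolerance_pct, int *q,
                          int best_q, int worst_q) {
  if (!frame_may_recode || target_bits <= 0) return false;
  int64_t tolerance = target_bits * tolerance_pct / 100;
  int new_q;
  if (actual_bits > target_bits + tolerance) {
    new_q = *q + 4; // overshoot: coarser quantization
  } else if (actual_bits < target_bits - tolerance) {
    new_q = *q - 4; // undershoot: finer quantization
  } else {
    return false; // within tolerance
  }
  if (new_q < best_q) new_q = best_q;
  if (new_q > worst_q) new_q = worst_q;
  if (new_q == *q) return false;
  *q = new_q;
  return true;
}
```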

 For more information the reader is directed to the following data structures:

 - \ref AV1_COMP cpi (the main compressor instance data structure)
    - \ref AV1_COMP.speed
    - \ref AV1_COMP.sf (\ref SPEED_FEATURES)

 - \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
 	- \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)

 - \ref HIGH_LEVEL_SPEED_FEATURES
 	- \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop
 	- \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance

 and functions:

 - (TODO REF) encode_with_recode_loop()
 - (TODO REF) recode_loop_update_q()
 - (TODO REF) av1_set_speed_features_framesize_independent()
 - (TODO REF) av1_set_speed_features_framesize_dependent()

 \subsection architecture_enc_fixed_q Fixed Q Mode

  Add details here.

 \section architecture_enc_src_proc Source Frame Processing

  Add details here.

 \section architecture_enc_hierachical Hierarchical Coding

  Add details here.

 \section architecture_enc_tpl Temporal Dependency Modelling
 The temporal dependency model runs at the beginning of each GOP. It builds the
 motion trajectory within the GOP in units of 16x16 blocks. The temporal
 dependency of a 16x16 block is evaluated as the predictive coding gains it
 contributes to its trailing motion trajectory. This temporal dependency model
 reflects how important a coding block is for the coding efficiency of the
 overall GOP. It is hence used to scale the Lagrangian multiplier used in the
 rate-distortion optimization framework.

 \subsection architecture_enc_tpl_config Configurations

 The temporal dependency model and its applications are turned on by default in
 the libaom encoder for the VoD use case. To disable it, use --tpl-model=0 in the
 aomenc configuration.


 \subsection architecture_enc_tpl_algoritms Algorithms

 The scheme works in the reverse frame processing order over the source frames,
 propagating information from future frames back to the current frame. For each
 frame, a propagation step is run for each 16x16 block (MB). It operates as follows:

 <ul>
    <li> Estimate the intra prediction cost in terms of sum of absolute Hadamard
    transform difference (SATD) noted as intra_cost. It also loads the motion
    information available from the first-pass encode and estimates the inter
    prediction cost as inter_cost. Due to the use of hybrid inter/intra
    prediction mode, the inter_cost value is further upper bounded by
    intra_cost. A propagation cost variable is used to collect all the
    information propagated back from future frames. It is initialized to
    0 for all the blocks in the last processing frame in a group of pictures
    (GOP).</li>

    <li> The fraction of information from a current block to be propagated towards
    its reference block is estimated as:
 \f[
    propagation\_fraction = 1 - \frac{inter\_cost}{intra\_cost}
 \f]
    It reflects the fraction by which the motion compensated reference reduces
    the prediction error.</li>

    <li> The total amount of information the current block contributes to the GOP
    is estimated as intra_cost + propagation_cost. The information that it
    propagates towards its reference block is captured by:

 \f[
    propagation\_amount =
    (intra\_cost + propagation\_cost) \cdot propagation\_fraction
 \f]</li>

    <li> Note that the reference block may not necessarily sit on the grid of
    16x16 blocks. The propagation amount is hence distributed to all the blocks
    that overlap with the reference block, in proportion to the overlap. Each
    such block in the reference frame accumulates its own propagation cost as
    it receives back-propagated information:

 \f[
    propagation\_cost = propagation\_cost +
                        \frac{overlap\_area}{16 \times 16} \cdot propagation\_amount
 \f]</li>

    <li> In the final encoding stage, the distortion propagation factor of a block
    is evaluated as \f$(1 + \frac{propagation\_cost}{intra\_cost})\f$, where the second term
    captures its impact on later frames in a GOP.</li>

    <li> The Lagrangian multiplier is adapted at the 64x64 block level. For every
    64x64 block in a frame, we have a distortion propagation factor:

 \f[
   dist\_prop[i] = 1 + \frac{propagation\_cost[i]}{intra\_cost[i]}
 \f]

    where i denotes the block index in the frame. We also have the frame level
    distortion propagation factor:

 \f[
   dist\_prop = 1 +
   \frac{\sum_{i}propagation\_cost[i]}{\sum_{i}intra\_cost[i]}
 \f]

    which is used to normalize the propagation factor at the 64x64 block level. The
    Lagrangian multiplier is hence adapted as:

 \f[
   \lambda[i] = \lambda_0 \cdot \frac{dist\_prop}{dist\_prop[i]}
 \f]

    where \f$\lambda_0\f$ is the multiplier associated with the frame level QP. The
    64x64 block level QP is scaled according to the Lagrangian multiplier.
 </ul>
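 The propagation and lambda scaling steps above can be sketched in C. To stay
 short, the sketch makes simplifying assumptions that are not libaom's: blocks
 lie on a single one-dimensional row of 16-pixel blocks (so the overlap area
 becomes an overlap length divided by 16), each block has a single reference in
 the previous frame, motion is a horizontal pixel offset, costs are supplied
 rather than computed from SATD, and lambda is scaled per 16x16 block rather
 than per 64x64 block. All type and function names are hypothetical.

```c
#define BLK 16       // block size in pixels (1-D simplification)
#define NUM_BLOCKS 4 // blocks per frame row

typedef struct {
  double intra_cost[NUM_BLOCKS];
  double inter_cost[NUM_BLOCKS];
  int mv[NUM_BLOCKS]; // pixel offset of the reference block in frame f - 1
  double propagation_cost[NUM_BLOCKS];
} tpl_frame;

// Back-propagates over frames[0..n-1] in reverse order. Each block in frame f
// references frame f - 1; frame 0 has no reference.
static void tpl_propagate(tpl_frame *frames, int n) {
  for (int f = 0; f < n; ++f)
    for (int b = 0; b < NUM_BLOCKS; ++b) frames[f].propagation_cost[b] = 0.0;

  for (int f = n - 1; f >= 1; --f) {
    tpl_frame *cur = &frames[f], *ref = &frames[f - 1];
    for (int b = 0; b < NUM_BLOCKS; ++b) {
      double intra = cur->intra_cost[b];
      if (intra <= 0.0) continue;
      // inter_cost is upper bounded by intra_cost.
      double inter = cur->inter_cost[b] < intra ? cur->inter_cost[b] : intra;
      double fraction = 1.0 - inter / intra;
      double amount = (intra + cur->propagation_cost[b]) * fraction;
      // Distribute to every grid block overlapping the reference block.
      int ref_start = b * BLK + cur->mv[b];
      for (int rb = 0; rb < NUM_BLOCKS; ++rb) {
        int lo = ref_start > rb * BLK ? ref_start : rb * BLK;
        int hi = ref_start + BLK < (rb + 1) * BLK ? ref_start + BLK
                                                  : (rb + 1) * BLK;
        if (hi > lo)
          ref->propagation_cost[rb] += (double)(hi - lo) / BLK * amount;
      }
    }
  }
}

// Returns lambda for block b of a frame: lambda0 * dist_prop / dist_prop[b].
static double tpl_block_lambda(const tpl_frame *fr, int b, double lambda0) {
  double sum_prop = 0.0, sum_intra = 0.0;
  for (int i = 0; i < NUM_BLOCKS; ++i) {
    sum_prop += fr->propagation_cost[i];
    sum_intra += fr->intra_cost[i];
  }
  double frame_dist_prop = 1.0 + sum_prop / sum_intra;
  double block_dist_prop = 1.0 + fr->propagation_cost[b] / fr->intra_cost[b];
  return lambda0 * frame_dist_prop / block_dist_prop;
}
```

 Blocks that many later blocks predict from accumulate a large propagation
 cost and therefore receive a smaller lambda (more bits, higher quality),
 while blocks in the last frame of the group keep the frame level lambda.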

 \subsection architecture_enc_tpl_keyfun Key Functions

 - The TPL model is built in (TODO REF) av1_tpl_setup_stats().
 - Its application to the QP offset is triggered in (TODO REF) setup_delta_q().

 \section architecture_enc_partitions Block Partition Search

  Add details here.

 \section architecture_enc_inter_modes Inter Prediction Mode Search

  Add details here.

 \section architecture_enc_intra_modes Intra Mode Search

  Add details here.

 \section architecture_enc_tx_search Transform Search

  Add details here.

 \section architecture_loop_filt Loop Filtering

  Add details here.

 \section architecture_loop_rest Loop Restoration Filtering

  Add details here.

 \section architecture_cdef CDEF

  Add details here.

 \section architecture_entropy Entropy Coding

  Add details here.

 */

 /*!\defgroup encoder_algo Encoder Algorithm
  *
  * The encoder algorithm describes how a sequence is encoded, including
  * high-level decisions as well as the algorithms used at every encoding stage.
  */

 /*!\defgroup high_level_algo High-level Algorithm
  * \ingroup encoder_algo
  * This module describes sequence-level and frame-level algorithms in AV1.
  * More details will be added.
  * @{
  */

  /*!\defgroup frame_coding_pipeline Frame Coding Pipeline
     \ingroup high_level_algo

  To encode a frame, first call \ref av1_receive_raw_frame() to pass the raw
  frame data to the encoder. Then call \ref av1_get_compressed_data() to encode
  the raw frame data into compressed frame data. The main body of
  \ref av1_get_compressed_data() is \ref av1_encode_strategy(), which
  determines the high-level encode strategy (frame type, frame placement, etc.)
  and then encodes the frame by calling \ref av1_encode(). In
  \ref av1_encode(), \ref av1_first_pass() executes the first pass of two-pass
  encoding, while \ref encode_frame_to_data_rate() performs the final pass for
  either one-pass or two-pass encoding.

  The main body of \ref encode_frame_to_data_rate() is
  \ref encode_with_recode_loop_and_filter(), which handles encoding before the
  in-loop filters (either with a recode loop, encode_with_recode_loop(), or
  without one, \ref encode_without_recode()), followed by the in-loop filters
  themselves (deblocking filters \ref loopfilter_frame(), and CDEF and
  restoration filters \ref cdef_restoration_frame()).

  Apart from rate/quality control, both encode_with_recode_loop() and
  \ref encode_without_recode() call \ref av1_encode_frame(), which manages the
  reference frame buffers and performs the rest of the encoding that does not
  operate on external frames via \ref encode_frame_internal(), the starting
  point of \ref partition_search.
  */

  /*!\defgroup two_pass_algo Two Pass Mode
     \ingroup high_level_algo

  In two pass mode, the input file is passed into the encoder for a quick
  first pass, where statistics are gathered. These statistics and the input
  file are then passed back into the encoder for a second pass. The statistics
  help the encoder reach the desired bitrate without as much overshooting or
  undershooting.

  During the first pass, the codec will return "stats" packets that contain
  information useful for the second pass. The caller should concatenate these
  packets as they are received. In the second pass, the concatenated packets
  are passed in, along with the frames to encode. During the second pass,
  "frame" packets are returned that represent the compressed video.

  A complete example can be found in `examples/twopass_encoder.c`. Pseudocode
  is provided below to illustrate the core parts.

  During the first pass, the uncompressed frames are passed in and stats
  information is appended to a byte array.

 ~~~~~~~~~~~~~~~{.c}
 #include <stdbool.h>
 #include <string.h>

 #include "aom/aom_encoder.h"
 #include "aom/aomcx.h"

 // For simplicity, assume that there is enough memory in the stats buffer.
 // Actual code will want to use a resizable array. stats_len represents
 // the length of data already present in the buffer. Returns true if the
 // encoder returned any packet at all.
 bool get_stats_data(aom_codec_ctx_t *encoder, char *stats,
                     size_t *stats_len) {
   bool got_data = false;
   const aom_codec_cx_pkt_t *pkt;
   aom_codec_iter_t iter = NULL;
   while ((pkt = aom_codec_get_cx_data(encoder, &iter)) != NULL) {
     got_data = true;
     if (pkt->kind != AOM_CODEC_STATS_PKT) continue;
     memcpy(stats + *stats_len, pkt->data.twopass_stats.buf,
            pkt->data.twopass_stats.sz);
     *stats_len += pkt->data.twopass_stats.sz;
   }
   return got_data;
 }

 void first_pass(char *stats, size_t *stats_len) {
   aom_codec_enc_cfg_t first_pass_cfg;
   aom_codec_enc_config_default(aom_codec_av1_cx(), &first_pass_cfg,
                                AOM_USAGE_GOOD_QUALITY);
   ... // Set the frame size, time base, bitrate, etc. as needed.
   first_pass_cfg.g_pass = AOM_RC_FIRST_PASS;
   aom_codec_ctx_t first_pass_encoder;
   aom_codec_enc_init(&first_pass_encoder, aom_codec_av1_cx(),
                      &first_pass_cfg, 0);

   while (frame_available) {
     // Read in the uncompressed frame, update frame_available
     aom_image_t *frame_to_encode = ...;
     aom_codec_encode(&first_pass_encoder, frame_to_encode, pts, duration,
                      flags);
     get_stats_data(&first_pass_encoder, stats, stats_len);
   }
   // After all frames have been processed, call aom_codec_encode() with
   // a NULL image repeatedly, until no more data is returned. The NULL
   // image tells the encoder that no more frames are available.
   bool got_data;
   do {
     aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags);
     got_data = get_stats_data(&first_pass_encoder, stats, stats_len);
   } while (got_data);

   aom_codec_destroy(&first_pass_encoder);
 }
 ~~~~~~~~~~~~~~~

  During the second pass, the uncompressed frames and the stats are
  passed into the encoder.

 ~~~~~~~~~~~~~~~{.c}
 // Write out each encoded frame to the file.
 void get_cx_data(aom_codec_ctx_t *encoder, FILE *file,
                  bool *got_data) {
   const aom_codec_cx_pkt_t *pkt;
   aom_codec_iter_t iter = NULL;
   while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
     *got_data = true;
     if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue;
     fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file);
   }
 }

 void second_pass(char *stats, size_t stats_len) {
   struct aom_codec_enc_cfg second_pass_cfg;
   ... // Initialize the config as needed.
   second_pass_cfg.g_pass = AOM_RC_LAST_PASS;
   second_pass_cfg.rc_twopass_stats_in.buf = stats;
   second_pass_cfg.rc_twopass_stats_in.sz = stats_len;
   aom_codec_ctx_t second_pass_encoder;
   ... // Initialize the encoder from the config.

   FILE *output = fopen("output.obu", "wb");
   bool got_data = false;
   while (frame_available) {
     // Read in the uncompressed frame, update frame_available
     aom_image_t *frame_to_encode = ...;
     aom_codec_encode(&second_pass_encoder, frame_to_encode, pts, duration,
                      flags);
     get_cx_data(&second_pass_encoder, output, &got_data);
   }
   // Pass in NULL repeatedly to flush the encoder.
   do {
     got_data = false;
     aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags);
     get_cx_data(&second_pass_encoder, output, &got_data);
   } while (got_data);

   fclose(output);
   aom_codec_destroy(&second_pass_encoder);
 }
 ~~~~~~~~~~~~~~~
  */

  /*!\defgroup look_ahead_buffer The Look-Ahead Buffer
     \ingroup high_level_algo

  A program should call \ref aom_codec_encode() for each frame that needs
  processing. These frames are internally copied and stored in a fixed-size
  circular buffer, known as the look-ahead buffer. Other parts of the code
  will use future frame information to inform current frame decisions;
  examples include the first-pass algorithm, TPL model, and temporal filter.
  Note that this buffer also keeps a reference to the last source frame.

  The look-ahead buffer is defined in \ref av1/encoder/lookahead.h. It acts as an
  opaque structure, with an interface to create and free the memory associated
  with it. Frames are pushed onto and popped from the structure in FIFO order.
  It also allows look-ahead when \ref av1_lookahead_peek() is called with a
  non-negative index, and look-behind when -1 is passed in (to get the last
  source frame; e.g., the first pass uses this for motion estimation).
  The \ref av1_lookahead_depth() function returns the current number of frames
  stored in it. Note that \ref av1_lookahead_pop() is a bit of a misnomer: it
  only pops a frame if either the "flush" variable is set or the buffer is at
  maximum capacity.
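  To make the push, pop and peek rules above concrete, the following is a
  minimal sketch of a fixed-size circular FIFO that follows them. It stores
  integer frame indices instead of images, and the Lookahead type and the
  la_init(), la_push(), la_pop() and la_peek() functions are hypothetical; they
  are not the layout or API of the real buffer in av1/encoder/lookahead.h.

 ~~~~~~~~~~~~~~~{.c}
 #include <assert.h>
 #include <stdbool.h>

 #define LA_MAX_FRAMES 35  // Sketch-only bound on the buffer capacity.

 // Hypothetical FIFO ring buffer of frame indices. It only mirrors the
 // behaviour described above; it is not libaom's internal layout.
 // Frame indices are assumed non-negative so that -1 can mean "none".
 typedef struct {
   int frames[LA_MAX_FRAMES];
   int capacity;  // Usable slots, at most LA_MAX_FRAMES.
   int read_idx;  // Slot of the oldest stored frame.
   int depth;     // Frames currently stored (cf. av1_lookahead_depth()).
   int last;      // Most recently popped frame, or -1 if none yet.
 } Lookahead;

 static void la_init(Lookahead *la, int capacity) {
   assert(capacity > 0 && capacity <= LA_MAX_FRAMES);
   la->capacity = capacity;
   la->read_idx = 0;
   la->depth = 0;
   la->last = -1;
 }

 // Appends a frame at the tail. Returns false if the buffer is full.
 static bool la_push(Lookahead *la, int frame) {
   if (la->depth == la->capacity) return false;
   la->frames[(la->read_idx + la->depth) % la->capacity] = frame;
   la->depth++;
   return true;
 }

 // Removes the oldest frame, but only if flush is set or the buffer is full.
 // Returns the popped frame, or -1 if nothing was popped.
 static int la_pop(Lookahead *la, bool flush) {
   if (la->depth == 0 || (!flush && la->depth < la->capacity)) return -1;
   const int frame = la->frames[la->read_idx];
   la->read_idx = (la->read_idx + 1) % la->capacity;
   la->depth--;
   la->last = frame;  // Kept for look-behind.
   return frame;
 }

 // index >= 0 looks ahead from the oldest stored frame; index == -1 looks
 // behind at the last popped source frame. Returns -1 if unavailable.
 static int la_peek(const Lookahead *la, int index) {
   if (index == -1) return la->last;
   if (index < 0 || index >= la->depth) return -1;
   return la->frames[(la->read_idx + index) % la->capacity];
 }
 ~~~~~~~~~~~~~~~

  Refusing to pop while the buffer is neither full nor flushing is what lets a
  frame linger, so later stages can see as many future frames as the capacity
  allows.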

  The buffer is stored in the \ref AV1_COMP::lookahead field.
  It is initialized in the first call to \ref aom_codec_encode(), in the
  \ref av1_receive_raw_frame() sub-routine. The buffer size is set by the
  \ref aom_codec_enc_cfg_t::g_lag_in_frames field of the encoder configuration.
  This can be modified manually but should only be set once. On the command
  line, the flag "--lag-in-frames" controls it. The default size is 19 for
  non-realtime usage and 1 for realtime. Note that a maximum value of 35 is
  enforced.

  A frame stays in the buffer for as long as possible. As mentioned above,
  \ref av1_lookahead_pop() only removes a frame when either flush is set or the
  buffer is full. Each call to \ref aom_codec_encode() inserts another frame
  into the buffer, and pop is called by the sub-function
  \ref av1_encode_strategy(). The buffer is told to flush when
  \ref aom_codec_encode() is passed a NULL image pointer. To fully flush the
  buffer, the caller must call \ref aom_codec_encode() with a NULL image pointer
  repeatedly, until no more packets are available.
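  Why a single NULL call may not be enough can be seen in a toy model. The
  sketch below is not libaom code: ToyEncoder, toy_encode() and
  toy_encode_all() are hypothetical, TOY_LAG stands in for g_lag_in_frames,
  packets are returned directly instead of through aom_codec_get_cx_data(),
  and it assumes each call releases at most one frame.

 ~~~~~~~~~~~~~~~{.c}
 #include <assert.h>
 #include <stdbool.h>
 #include <stddef.h>

 #define TOY_LAG 4  // Hypothetical lag, standing in for g_lag_in_frames.

 // Hypothetical toy encoder holding up to TOY_LAG queued frame indices.
 typedef struct {
   int queue[TOY_LAG];
   int head;   // Slot of the oldest queued frame.
   int count;  // Number of queued frames.
 } ToyEncoder;

 // Loosely mimics aom_codec_encode(): a NULL frame means "no more input".
 // Each call releases at most one frame, and only when the queue is full or
 // when flushing. Returns true and sets *out if a packet was produced.
 static bool toy_encode(ToyEncoder *enc, const int *frame, int *out) {
   if (frame != NULL) {
     assert(enc->count < TOY_LAG);  // The queue never stays full between calls.
     enc->queue[(enc->head + enc->count) % TOY_LAG] = *frame;
     enc->count++;
   }
   const bool flush = (frame == NULL);
   if (enc->count == 0 || (!flush && enc->count < TOY_LAG)) return false;
   *out = enc->queue[enc->head];
   enc->head = (enc->head + 1) % TOY_LAG;
   enc->count--;
   return true;
 }

 // Encodes n frames, then passes NULL repeatedly until no packet comes out.
 // Returns the number of packets received.
 static int toy_encode_all(const int *frames, int n) {
   ToyEncoder enc = {{0}, 0, 0};
   int packets = 0;
   int pkt;
   for (int i = 0; i < n; i++) {
     if (toy_encode(&enc, &frames[i], &pkt)) packets++;
   }
   while (toy_encode(&enc, NULL, &pkt)) packets++;  // Flush fully.
   return packets;
 }
 ~~~~~~~~~~~~~~~

  In this model, the last TOY_LAG - 1 frames only come out during the flush,
  one per NULL call, so stopping after the first NULL call would lose frames.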

  */

 /*! @} - end defgroup high_level_algo */

 /*!\defgroup partition_search Partition Search
  * \ingroup encoder_algo
  A frame is first split into tiles in \ref encode_tiles(), with each tile
  compressed by av1_encode_tile(). Then a tile is processed in superblock rows
  via \ref av1_encode_sb_row() and then \ref encode_sb_row().

  Partition search starts from superblocks, which are processed sequentially in
  \ref encode_sb_row(). Two search modes are supported for a superblock,
  depending on the encoding configuration: \ref encode_nonrd_sb() is used for
  1-pass and real-time modes, while \ref encode_rd_sb() performs a more
  exhaustive search.
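  The traversal order described above can be sketched as follows for a single
  tile encoded on one thread. This is an illustration only: sb_count(),
  encode_frame_sbs() and the two stub searches are hypothetical stand-ins for
  encode_sb_row(), encode_rd_sb() and encode_nonrd_sb(), and the choice between
  the two modes is reduced to a single use_nonrd flag.

 ~~~~~~~~~~~~~~~{.c}
 #include <assert.h>
 #include <stdbool.h>

 // Hypothetical helper: superblocks needed to cover `pixels` with
 // superblocks of `sb_size` pixels. AV1 superblocks are 64x64 or 128x128;
 // the last row or column may be partial, hence the round-up division.
 static int sb_count(int pixels, int sb_size) {
   assert(pixels > 0 && (sb_size == 64 || sb_size == 128));
   return (pixels + sb_size - 1) / sb_size;
 }

 // Call counters for the two stub searches below.
 static int rd_calls, nonrd_calls;

 // Stubs standing in for encode_rd_sb() and encode_nonrd_sb().
 static void rd_sb_stub(int sb_row, int sb_col) {
   (void)sb_row;
   (void)sb_col;
   rd_calls++;
 }
 static void nonrd_sb_stub(int sb_row, int sb_col) {
   (void)sb_row;
   (void)sb_col;
   nonrd_calls++;
 }

 // Visits every superblock of a single-tile frame, one SB row at a time,
 // dispatching each to one search mode. Returns the number visited.
 static int encode_frame_sbs(int width, int height, int sb_size,
                             bool use_nonrd) {
   const int rows = sb_count(height, sb_size);
   const int cols = sb_count(width, sb_size);
   for (int r = 0; r < rows; r++) {    // cf. encode_sb_row()
     for (int c = 0; c < cols; c++) {  // Superblocks left to right.
       if (use_nonrd)
         nonrd_sb_stub(r, c);
       else
         rd_sb_stub(r, c);
     }
   }
   return rows * cols;
 }
 ~~~~~~~~~~~~~~~

  For a 1920x1080 frame this gives 15 x 9 = 135 superblocks at 128x128 and
  30 x 17 = 510 at 64x64; the bottom row is partial in both cases.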

  Partition search over the recursive quad-tree space is implemented by
  recursively calling \ref av1_nonrd_use_partition(), \ref av1_rd_use_partition(), or
  av1_rd_pick_partition() and returning the best option for each sub-tree to
  its parent partition.
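  The recursion can be illustrated with a toy quad-tree search. This sketch
  assumes only two choices per block, coding it whole or splitting it into four
  quadrants (loosely corresponding to PARTITION_NONE and PARTITION_SPLIT); the
  real search also evaluates rectangular and other partition shapes and prunes
  the search. best_partition_cost(), toy_cost(), LeafCostFn and MIN_BLOCK are
  hypothetical.

 ~~~~~~~~~~~~~~~{.c}
 #include <assert.h>
 #include <stdint.h>

 #define MIN_BLOCK 8  // Hypothetical smallest square block size.

 // Cost of coding the bs x bs block at (x, y) as a single unit. In the encoder
 // this number would come from the mode search; here it is a plug-in.
 typedef int64_t (*LeafCostFn)(int x, int y, int bs);

 // Toy recursive quad-tree search: compares coding the block whole against
 // splitting it into four quadrants, recurses into each quadrant, and returns
 // the lower cost to the caller (the parent partition).
 // bs is assumed to be a power of two no smaller than MIN_BLOCK.
 static int64_t best_partition_cost(int x, int y, int bs, LeafCostFn cost) {
   assert(bs >= MIN_BLOCK);
   const int64_t none_cost = cost(x, y, bs);
   if (bs == MIN_BLOCK) return none_cost;  // Cannot split further.
   const int half = bs / 2;
   const int64_t split_cost = best_partition_cost(x, y, half, cost) +
                              best_partition_cost(x + half, y, half, cost) +
                              best_partition_cost(x, y + half, half, cost) +
                              best_partition_cost(x + half, y + half, half, cost);
   return split_cost < none_cost ? split_cost : none_cost;
 }

 // Example cost model: a fixed signalling overhead per block, plus an error
 // term that grows with area for blocks covering the top-left corner.
 static int64_t toy_cost(int x, int y, int bs) {
   const int64_t overhead = 10;
   const int64_t error = (x == 0 && y == 0) ? (int64_t)bs * bs : 0;
   return overhead + error;
 }
 ~~~~~~~~~~~~~~~

  With toy_cost(), the search splits the top-left corner down to 8x8 and keeps
  every other block whole, so best_partition_cost(0, 0, 64, toy_cost) evaluates
  to 164: one 8x8 corner block at 74 plus nine other blocks at 10 each.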

  In libaom, partition search sits on top of mode search (predictor, transform,
  etc.) instead of being a separate module. The interface to mode search is
  \ref pick_sb_modes(), which connects \ref partition_search with
  \ref inter_mode_search and \ref intra_mode_search. To make good decisions,
  reconstruction is also required in order to build references and contexts;
  it is implemented by \ref encode_sb() at the sub-tree level and
  \ref encode_b() at the coding block level.
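  As a simplified illustration of the kind of decision a mode search makes, the
  sketch below picks, among several already-evaluated candidate modes, the one
  with the lowest Lagrangian rate-distortion cost J = D + lambda * R. ModeTrial
  and pick_best_mode() are hypothetical, and libaom's actual cost computation
  (its rate units, lambda scaling and fixed-point arithmetic) is not reproduced
  here.

 ~~~~~~~~~~~~~~~{.c}
 #include <assert.h>
 #include <stddef.h>
 #include <stdint.h>

 // Hypothetical record of one candidate mode evaluated on a block.
 typedef struct {
   int mode;            // Hypothetical mode identifier.
   int64_t distortion;  // D, e.g. sum of squared error of the reconstruction.
   int64_t rate;        // R, estimated cost of signalling mode and residual.
 } ModeTrial;

 // Returns the index of the trial with the lowest J = D + lambda * R,
 // or -1 if there are no trials. Ties keep the earlier trial.
 static int pick_best_mode(const ModeTrial *trials, size_t n, int64_t lambda) {
   assert(trials != NULL || n == 0);
   int best = -1;
   int64_t best_cost = INT64_MAX;
   for (size_t i = 0; i < n; i++) {
     const int64_t cost = trials[i].distortion + lambda * trials[i].rate;
     if (cost < best_cost) {
       best_cost = cost;
       best = (int)i;
     }
   }
   return best;
 }
 ~~~~~~~~~~~~~~~

  Raising lambda shifts the choice toward modes that cost fewer bits, at the
  expense of higher distortion.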
  * @{
  */
 /*! @} - end defgroup partition_search */

 /*!\defgroup intra_mode_search Intra Mode Search
  * \ingroup encoder_algo
  * This module describes the intra mode search algorithm in AV1.
  * More details will be added.
  * @{
  */
 /*! @} - end defgroup intra_mode_search */

 /*!\defgroup inter_mode_search Inter Mode Search
  * \ingroup encoder_algo
  * This module describes the inter mode search algorithm in AV1.
  * More details will be added.
  * @{
  */
 /*! @} - end defgroup inter_mode_search */

 /*!\defgroup palette_mode_search Palette Mode Search
  * \ingroup intra_mode_search
  * This module describes the palette mode search algorithm in AV1.
  * More details will be added.
  * @{
  */
 /*! @} - end defgroup palette_mode_search */

 /*!\defgroup transform_search Transform Search
  * \ingroup encoder_algo
  * This module describes the transform search algorithm in AV1.
  * More details will be added.
  * @{
  */
 /*! @} - end defgroup transform_search */

 /*!\defgroup coefficient_coding Transform Coefficient Coding and Optimization
  * \ingroup encoder_algo
  * This module describes the algorithms of transform coefficient coding and optimization in AV1.
  * More details will be added.
  * @{
  */
 /*! @} - end defgroup coefficient_coding */

 /*!\defgroup in_loop_filter In-loop Filter
  * \ingroup encoder_algo
  * This module describes the in-loop filter algorithm in AV1.
  * More details will be added.
  * @{
  */
 /*! @} - end defgroup in_loop_filter */

 /*!\defgroup in_loop_cdef CDEF
  * \ingroup encoder_algo
  * This module describes the CDEF parameter search algorithm
  * in AV1. More details will be added.
  * @{
  */
 /*! @} - end defgroup in_loop_cdef */

 /*!\defgroup in_loop_restoration Loop Restoration
  * \ingroup encoder_algo
  * This module describes the loop restoration search
  * and estimation algorithm in AV1.
  * More details will be added.
  * @{
  */
 /*! @} - end defgroup in_loop_restoration */

 /*!\defgroup rate_control Rate Control
  * \ingroup encoder_algo
  * This module describes the rate control algorithm in AV1.
  * More details will be added.
  * @{
  */
 /*! @} - end defgroup rate_control */