| /*!\page encoder_guide AV1 ENCODING TECHNIQUES |
| |
| The AV1 encoding algorithm consists of the following modules: |
| - \ref high_level_algo |
| - To encode a frame, first call \ref av1_receive_raw_frame() to obtain the |
| raw frame data. Then call \ref av1_get_compressed_data() to encode raw |
| frame data into compressed frame data. |
| - \ref partition_search |
| - \ref intra_mode_search |
| - \ref inter_mode_search |
| - \ref transform_search |
| - \ref in_loop_filter |
| - \ref rate_control |
| |
| <b>[SAMPLE CONTEXT ONLY - copied from AV1 overview paper]:</b> |
| |
| In this paper, we present the core coding tools in AV1 that contribute to |
| the majority of the 30% reduction in average bitrate compared with the most |
| performant libvpx VP9 encoder at the same quality. |
| |
| \section partition Coding block partition |
| VP9 uses a four-way partition tree starting from the 64x64 level down to the |
| 4x4 level, with additional restrictions for blocks below 8x8: within an |
| 8x8 block, all the sub-blocks should have the same reference frame, as shown in |
| the top half of Fig. 1, so that the chroma component can be |
| processed in units of at least 4x4. Partitions designated as |
| “R” are recursive: the same partition tree is repeated at a |
| lower scale until the lowest 4x4 level is reached. |
| |
| \image html partition.png "Fig. 1. Partition tree in VP9 and AV1" |
| |
| AV1 increases the largest coding block unit to 128x128 and expands the |
| partition tree to support 10 possible outcomes, adding 4:1/1:4 |
| rectangular coding block sizes. As in VP9, only square blocks can be |
| subdivided further. In addition, AV1 adds more flexibility to |
| sub-8x8 coding blocks by allowing each unit to have its own inter/intra mode |
| and reference frame choice. To support this flexibility, it allows 2x2 |
| inter prediction for the chroma component, while retaining a minimum transform |
| size of 4x4. |
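
The ten partition outcomes can be made concrete with a small sketch. The
enum and function names below are illustrative assumptions, not the
identifiers used by the encoder; the sketch only lists the sub-block sizes
that each outcome produces for a square block.

```c
#include <assert.h>

// Illustrative names for the ten partition outcomes described above.
typedef enum {
  PART_NONE,    // keep the block whole
  PART_HORZ,    // two wide halves, one above the other
  PART_VERT,    // two tall halves, side by side
  PART_SPLIT,   // four square quarters (the only recursive outcome)
  PART_HORZ_A,  // two squares on top, one wide rectangle below
  PART_HORZ_B,  // one wide rectangle on top, two squares below
  PART_VERT_A,  // two squares on the left, one tall rectangle on the right
  PART_VERT_B,  // one tall rectangle on the left, two squares on the right
  PART_HORZ_4,  // four 4:1 horizontal strips
  PART_VERT_4,  // four 1:4 vertical strips
  PART_TYPES
} PartType;

typedef struct {
  int w, h;
} BlockDim;

// Fills out[] with the sub-block sizes of a size x size block and returns
// the number of sub-blocks (0 for an unknown type).
static int partition_dims(PartType type, int size, BlockDim out[4]) {
  const int half = size / 2, quarter = size / 4;
  const BlockDim sq = { half, half };
  const BlockDim wide = { size, half };
  const BlockDim tall = { half, size };
  assert(size >= 4 && size % 4 == 0);
  switch (type) {
    case PART_NONE: out[0] = (BlockDim){ size, size }; return 1;
    case PART_HORZ: out[0] = out[1] = wide; return 2;
    case PART_VERT: out[0] = out[1] = tall; return 2;
    case PART_SPLIT: out[0] = out[1] = out[2] = out[3] = sq; return 4;
    case PART_HORZ_A: out[0] = out[1] = sq; out[2] = wide; return 3;
    case PART_HORZ_B: out[0] = wide; out[1] = out[2] = sq; return 3;
    case PART_VERT_A: out[0] = out[1] = sq; out[2] = tall; return 3;
    case PART_VERT_B: out[0] = tall; out[1] = out[2] = sq; return 3;
    case PART_HORZ_4:
      for (int i = 0; i < 4; ++i) out[i] = (BlockDim){ size, quarter };
      return 4;
    case PART_VERT_4:
      for (int i = 0; i < 4; ++i) out[i] = (BlockDim){ quarter, size };
      return 4;
    default: return 0;
  }
}
```

Calling `partition_dims(PART_HORZ_4, 64, dims)` yields four 64x16 strips.
Whatever the outcome, the sub-block areas sum to the area of the parent
block.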
| |
| \section intra_prediction Intra prediction |
| VP9 supports 10 intra prediction modes, including eight directional modes |
| corresponding to angles from 45 to 207 degrees, and two non-directional |
| predictors: DC and true motion (TM) mode. In AV1, the potential of an intra |
| coder is further explored in various ways: the granularity of directional |
| extrapolation is upgraded, non-directional predictors are enriched by taking |
| into account gradients and evolving correlations, coherence of luma and |
| chroma signals is exploited, and tools are developed particularly for |
| artificial content. |
| |
| -# Enhanced directional intra prediction\n |
| To exploit more varieties of spatial redundancy in directional textures, |
| AV1 extends the directional intra modes to an angle set with finer |
| granularity for blocks larger than 8x8. The original eight angles become |
| nominal angles, on which fine angle variations with a step size of 3 |
| degrees are based: the prediction angle is given by a nominal |
| intra angle plus an angle delta, which is the step size multiplied by an |
| integer from -3 to 3. To implement the directional prediction modes in a |
| generic way, the 48 |
| extension modes are realized by a unified directional predictor that links |
| each pixel to a reference sub-pixel location on the edge and interpolates |
| the reference pixel with a 2-tap bilinear filter. In total, AV1 supports 56 |
| directional intra modes. |
| |
| Another enhancement for directional intra prediction in AV1 is that a |
| low-pass filter is applied to the reference pixel values before they are used |
| to predict the target block. The filter strength is predefined based on |
| the prediction angle and block size. |
| |
| -# New non-directional smooth intra predictors\n |
| VP9 has two non-directional intra prediction modes: DC_PRED and TM_PRED. |
| AV1 expands on this by adding three new prediction modes: SMOOTH_PRED, |
| SMOOTH_V_PRED, and SMOOTH_H_PRED. A fourth new prediction mode, |
| PAETH_PRED, replaces the existing mode TM_PRED. The new modes work as |
| follows: |
| |
| - <b>SMOOTH_PRED</b>: Useful for predicting blocks that have a smooth |
| gradient. |
| |
| - <b>SMOOTH_V_PRED</b>: Similar to SMOOTH_PRED, but uses quadratic |
| interpolation only in the vertical direction. |
| |
| - <b>SMOOTH_H_PRED</b>: Similar to SMOOTH_PRED, but uses quadratic |
| interpolation only in the horizontal direction. |
| |
| - <b>PAETH_PRED</b>: Calculate \f$base=left + top -top\_left\f$. Then |
| predict the pixel as the left, top, or top-left pixel, whichever |
| is closest to “base”. |
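
The angle arithmetic and the PAETH_PRED rule described above can be sketched
in a few lines. This is a minimal illustration, not the encoder's code. The
function names, the eight nominal angle values, and the tie-break order in
the Paeth predictor are assumptions; the text above only fixes the number of
nominal angles, the step size, and the delta range.

```c
#include <assert.h>
#include <stdlib.h>

// Assumed nominal angles in degrees (the text only says there are eight).
static const int kNominalAngles[8] = { 45, 67, 90, 113, 135, 157, 180, 203 };
enum { ANGLE_STEP = 3, MAX_ANGLE_DELTA = 3 };

// Prediction angle = nominal angle + delta * step, with delta in [-3, 3].
static int prediction_angle(int nominal_idx, int delta) {
  assert(nominal_idx >= 0 && nominal_idx < 8);
  assert(delta >= -MAX_ANGLE_DELTA && delta <= MAX_ANGLE_DELTA);
  return kNominalAngles[nominal_idx] + delta * ANGLE_STEP;
}

// 8 nominal angles x 7 deltas = 56 directional modes, of which the
// 48 with a non-zero delta are the extension modes.
static int num_directional_modes(void) {
  return 8 * (2 * MAX_ANGLE_DELTA + 1);
}

// Paeth predictor for a single pixel: pick the neighbor closest to
// base = left + top - top_left.
static int paeth_predict(int left, int top, int top_left) {
  const int base = left + top - top_left;
  const int d_left = abs(base - left);
  const int d_top = abs(base - top);
  const int d_top_left = abs(base - top_left);
  // Assumed tie-break order: left, then top, then top-left.
  if (d_left <= d_top && d_left <= d_top_left) return left;
  if (d_top <= d_top_left) return top;
  return top_left;
}
```

For example, with `left = 10`, `top = 20`, and `top_left = 10`, the base
value is 20, so the predictor returns the top pixel.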
| |
| \section inter_prediction Inter prediction |
| Motion compensation is an essential module in video coding. AV1 has a more |
| powerful inter coder, which largely extends the pool of reference frames and |
| motion vectors, breaks the limitation of block-based translational prediction, |
| and enhances compound prediction by using highly adaptable weighting |
| algorithms as well as sources. |
| |
| -# Extended reference frames\n |
| AV1 extends the number of references for each frame from 3 to 7. Figure 4 |
| demonstrates the multi-layer structure of a golden-frame group, in which an |
| adaptive number of frames share the same GOLDEN and ALTREF frames. BWDREF |
| is a look-ahead frame coded directly without temporal filtering, |
| which makes it more suitable as a backward reference at a relatively short |
| distance. ALTREF2 serves as an intermediate filtered future reference |
| between GOLDEN and ALTREF. |
| |
| \image html gf_group.png "Fig. 4. Example of multi-layer structure of a golden-frame group" |
| |
| -# Advanced compound prediction\n |
| A collection of new compound prediction tools is developed for AV1 to make |
| its inter coder more versatile. In this section, any compound prediction |
| operation can be generalized for a pixel \f$(i,j)\f$ as: |
| \f$p_f(i,j)=m(i,j)p_1(i,j)+(1-m(i,j))p_2(i,j)\f$, where \f$p_1\f$ and |
| \f$p_2\f$ are two predictors, and \f$p_f\f$ is the final joint prediction, |
| with the weighting coefficients \f$m(i,j)\f$ in \f$[0,1]\f$ that are |
| designed for different use cases and can be easily generated from predefined |
| tables. |
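
The blending formula above maps naturally onto fixed-point arithmetic. The
sketch below is illustrative only: the function name and the 6-bit weight
precision (a mask value of 64 standing for \f$m=1\f$) are assumptions made
for this example, and the mask is supplied by the caller rather than
generated from a predefined table.

```c
#include <assert.h>
#include <stdint.h>

// Assumed fixed-point precision: mask value 64 represents m = 1.0.
enum { MASK_BITS = 6, MASK_MAX = 1 << MASK_BITS };

// Computes p_f = m * p1 + (1 - m) * p2 per pixel, with m = mask / 64.
// All buffers are width * height samples in row-major order.
static void blend_predictors(const uint8_t *p1, const uint8_t *p2,
                             const uint8_t *mask, uint8_t *dst, int width,
                             int height) {
  for (int i = 0; i < height; ++i) {
    for (int j = 0; j < width; ++j) {
      const int idx = i * width + j;
      const int m = mask[idx];
      assert(m <= MASK_MAX);
      const int sum = m * p1[idx] + (MASK_MAX - m) * p2[idx];
      // Round to nearest before dropping the fractional bits.
      dst[idx] = (uint8_t)((sum + (MASK_MAX >> 1)) >> MASK_BITS);
    }
  }
}
```

A mask of all 32s gives a plain average of the two predictors, while a mask
that varies smoothly across the block shifts the weight from one predictor
to the other.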
| */ |
| |
| /*!\defgroup encoder_algo Encoder Algorithm |
| * |
| * The encoder algorithm describes how a sequence is encoded, including |
| * high-level decisions as well as the algorithms used at every encoding stage. |
| */ |
| |
| /*!\defgroup high_level_algo High-level Algorithm |
| * \ingroup encoder_algo |
| * This module describes the sequence-level and frame-level algorithms in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| |
| /*!\defgroup two_pass_algo Two Pass Mode |
| \ingroup high_level_algo |
| |
| In two pass mode, the input file is passed into the encoder for a quick |
| first pass, where statistics are gathered. These statistics and the input |
| file are then passed back into the encoder for a second pass. The statistics |
| help the encoder reach the desired bitrate without as much overshooting or |
| undershooting. |
| |
| During the first pass, the codec will return "stats" packets that contain |
| information useful for the second pass. The caller should concatenate these |
| packets as they are received. In the second pass, the concatenated packets |
| are passed in, along with the frames to encode. During the second pass, |
| "frame" packets are returned that represent the compressed video. |
| |
| A complete example can be found in `examples/twopass_encoder.c`. Pseudocode |
| is provided below to illustrate the core parts. |
| |
| During the first pass, the uncompressed frames are passed in and stats |
| information is appended to a byte array. |
| |
| ~~~~~~~~~~~~~~~{.c} |
| // For simplicity, assume that there is enough memory in the stats buffer. |
| // Actual code will want to use a resizable array. stats_len represents |
| // the length of data already present in the buffer. |
| void get_stats_data(aom_codec_ctx_t *encoder, char *stats, |
|                     size_t *stats_len) { |
|   const aom_codec_cx_pkt_t *pkt; |
|   aom_codec_iter_t iter = NULL; |
|   while ((pkt = aom_codec_get_cx_data(encoder, &iter))) { |
|     // Only stats packets are of interest in the first pass. |
|     if (pkt->kind != AOM_CODEC_STATS_PKT) continue; |
|     memcpy(stats + *stats_len, pkt->data.twopass_stats.buf, |
|            pkt->data.twopass_stats.sz); |
|     *stats_len += pkt->data.twopass_stats.sz; |
|   } |
| } |
| |
| void first_pass(char *stats, size_t *stats_len) { |
|   struct aom_codec_enc_cfg first_pass_cfg; |
|   ... // Initialize the config as needed. |
|   first_pass_cfg.g_pass = AOM_RC_FIRST_PASS; |
|   aom_codec_ctx_t first_pass_encoder; |
|   ... // Initialize the encoder. |
| |
|   while (frame_available) { |
|     // Read in the uncompressed frame, update frame_available |
|     aom_image_t *frame_to_encode = ...; |
|     aom_codec_encode(&first_pass_encoder, frame_to_encode, pts, duration, |
|                      flags); |
|     get_stats_data(&first_pass_encoder, stats, stats_len); |
|   } |
|   // After all frames have been processed, call aom_codec_encode with |
|   // a NULL ptr. This tells the encoder to flush all data. |
|   aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags); |
|   get_stats_data(&first_pass_encoder, stats, stats_len); |
| |
|   aom_codec_destroy(&first_pass_encoder); |
| } |
| ~~~~~~~~~~~~~~~ |
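
The pseudocode above assumes the stats buffer is large enough. One possible
way to lift that assumption is a growable byte buffer, sketched below. The
`StatsBuffer` type and the `stats_append()` and `stats_free()` helpers are
hypothetical names introduced for this example; they are not part of the
libaom API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable byte buffer for first-pass stats.
typedef struct {
  char *data;  // heap storage, NULL while empty
  size_t len;  // bytes in use
  size_t cap;  // bytes allocated
} StatsBuffer;

// Appends sz bytes from src, doubling the capacity as needed.
// Returns 0 on success, or -1 on failure (the buffer is left unchanged).
static int stats_append(StatsBuffer *sb, const void *src, size_t sz) {
  assert(sb->len <= sb->cap);
  if (sz == 0) return 0;
  if (sz > sb->cap - sb->len) {
    size_t new_cap = sb->cap ? sb->cap : 4096;
    while (new_cap - sb->len < sz) {
      if (new_cap > SIZE_MAX / 2) return -1;  // would overflow
      new_cap *= 2;
    }
    char *grown = realloc(sb->data, new_cap);
    if (!grown) return -1;
    sb->data = grown;
    sb->cap = new_cap;
  }
  memcpy(sb->data + sb->len, src, sz);
  sb->len += sz;
  return 0;
}

// Releases the storage and resets the buffer to the empty state.
static void stats_free(StatsBuffer *sb) {
  free(sb->data);
  sb->data = NULL;
  sb->len = sb->cap = 0;
}
```

With such a helper, the `memcpy` in `get_stats_data()` could become a call
like `stats_append(&sb, pkt->data.twopass_stats.buf,
pkt->data.twopass_stats.sz)`, with the return value checked by the caller.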
| |
| During the second pass, the uncompressed frames and the stats are |
| passed into the encoder. |
| |
| ~~~~~~~~~~~~~~~{.c} |
| // Write out each encoded frame to the file. |
| void get_cx_data(aom_codec_ctx_t *encoder, FILE *file) { |
|   const aom_codec_cx_pkt_t *pkt; |
|   aom_codec_iter_t iter = NULL; |
|   while ((pkt = aom_codec_get_cx_data(encoder, &iter))) { |
|     // Only compressed frame packets are written to the output. |
|     if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue; |
|     fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file); |
|   } |
| } |
| |
| void second_pass(char *stats, size_t stats_len) { |
|   struct aom_codec_enc_cfg second_pass_cfg; |
|   ... // Initialize the config as needed. |
|   second_pass_cfg.g_pass = AOM_RC_LAST_PASS; |
|   second_pass_cfg.rc_twopass_stats_in.buf = stats; |
|   second_pass_cfg.rc_twopass_stats_in.sz = stats_len; |
|   aom_codec_ctx_t second_pass_encoder; |
|   ... // Initialize the encoder from the config. |
| |
|   FILE *output = fopen("output.obu", "wb"); |
|   while (frame_available) { |
|     // Read in the uncompressed frame, update frame_available |
|     aom_image_t *frame_to_encode = ...; |
|     aom_codec_encode(&second_pass_encoder, frame_to_encode, pts, duration, |
|                      flags); |
|     get_cx_data(&second_pass_encoder, output); |
|   } |
|   // Pass in NULL to flush the encoder. |
|   aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags); |
|   get_cx_data(&second_pass_encoder, output); |
| |
|   fclose(output); |
|   aom_codec_destroy(&second_pass_encoder); |
| } |
| ~~~~~~~~~~~~~~~ |
| */ |
| |
| /*! @} - end defgroup high_level_algo */ |
| |
| /*!\defgroup partition_search Partition Search |
| * \ingroup encoder_algo |
| * This module describes the partition search algorithm in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup partition_search */ |
| |
| /*!\defgroup intra_mode_search Intra Mode Search |
| * \ingroup encoder_algo |
| * This module describes the intra mode search algorithm in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup intra_mode_search */ |
| |
| /*!\defgroup inter_mode_search Inter Mode Search |
| * \ingroup encoder_algo |
| * This module describes the inter mode search algorithm in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup inter_mode_search */ |
| |
| /*!\defgroup transform_search Transform Search |
| * \ingroup encoder_algo |
| * This module describes the transform search algorithm in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup transform_search */ |
| |
| /*!\defgroup in_loop_filter In-loop Filter |
| * \ingroup encoder_algo |
| * This module describes the in-loop filter algorithm in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup in_loop_filter */ |
| |
| /*!\defgroup rate_control Rate Control |
| * \ingroup encoder_algo |
| * This module describes the rate control algorithm in AV1. |
| * More details will be added. |
| * @{ |
| */ |
| /*! @} - end defgroup rate_control */ |