/*!\page encoder_guide AV1 ENCODER GUIDE

\tableofcontents

\section architecture_introduction Introduction

This document provides an architectural overview of the libaom AV1 encoder.

It is intended as a high-level starting point for anyone wishing to contribute
to the project, helping them to understand the structure of the encoder more
quickly and to find their way around the codebase.

It sits above, and will where necessary link to, more detailed function-level
documents.

\section architecture_gencodecs Generic Block Transform Based Codecs

Most modern video encoders, including VP8, H.264, VP9, HEVC and AV1
(in increasing order of complexity), share a common basic paradigm. This
comprises separating a stream of raw video frames into a series of discrete
blocks (of one or more sizes), then computing a prediction signal and a
quantized, transform coded, residual error signal. The prediction and residual
error signal, along with any side information needed by the decoder, are then
entropy coded and packed to form the encoded bitstream. See Figure 1 below,
where the blue blocks are, to all intents and purposes, the lossless parts of
the encoder and the red block is the lossy part.

This is of course a gross oversimplification, even in regard to the simplest
of the above codecs. For example, all of them allow for block based
prediction at multiple different scales (i.e. different block sizes) and may
use previously coded pixels in the current frame for prediction or pixels from
one or more previously encoded frames. Further, they may support multiple
different transforms and transform sizes and quality optimization tools like
loop filtering.

\image html genericcodecflow.png "" width=70%

\section architecture_av1_structure AV1 Structure and Complexity

As previously stated, AV1 adopts the same underlying paradigm as other block
transform based codecs. However, it is much more complicated than previous
generation codecs and supports many more block partitioning, prediction and
transform options.

AV1 supports block partitions of various sizes from 128x128 pixels down to 4x4
pixels using a multi-layer recursive tree structure as illustrated in Figure 2
below.

\image html av1partitions.png "" width=70%

AV1 also provides 71 basic intra prediction modes, 56 single frame inter
prediction modes (7 reference frames x 4 modes x 2 for OBMC (overlapped block
motion compensation)), 12768 compound inter prediction modes (that combine
inter predictors from two reference frames) and 36708 compound inter / intra
prediction modes. Furthermore, in addition to simple inter motion estimation,
AV1 also supports warped motion prediction using affine transforms.

In terms of transform coding, it has 16 separable 2-D transform kernels
{ DCT, ADST, fADST, IDTX }<sup>2</sup> (four 1-D kernels, chosen independently
for the vertical and horizontal directions) that can be applied at up to 19
different scales from 64x64 down to 4x4 pixels.
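
As a quick check of the kernel count, the sketch below enumerates every
vertical/horizontal pairing of the four 1-D kernels. It is illustrative only:
`kernel_names`, `NUM_1D_KERNELS` and `count_2d_kernels()` are hypothetical
names, and the printed labels are not libaom's transform type identifiers.

```c
#include <assert.h>
#include <stdio.h>

// Illustrative names only; these are not libaom identifiers.
static const char *const kernel_names[] = { "DCT", "ADST", "fADST", "IDTX" };
enum { NUM_1D_KERNELS = sizeof(kernel_names) / sizeof(kernel_names[0]) };

// Prints each vertical/horizontal pairing and returns how many there are.
int count_2d_kernels(void) {
  int count = 0;
  for (int v = 0; v < NUM_1D_KERNELS; ++v) {
    for (int h = 0; h < NUM_1D_KERNELS; ++h) {
      printf("%s_%s\n", kernel_names[v], kernel_names[h]);
      ++count;
    }
  }
  // Every 1-D kernel is paired with every 1-D kernel, including itself.
  assert(count == NUM_1D_KERNELS * NUM_1D_KERNELS);
  return count;
}
```

Calling `count_2d_kernels()` lists the pairings and returns their number,
which matches the 16 kernels quoted in the text.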

When combined together, this means that for any one 8x8 pixel block in a
source frame, there are approximately 45,000,000 different ways that it can
be encoded.

Consequently, AV1 requires complex control processes. While not necessarily
a normative part of the bitstream, these are the algorithms that turn a set
of compression tools and a bitstream format specification into a coherent
and useful codec implementation. These may include, but are not limited to,
things like:

- Rate distortion optimization (the process of trying to choose the most
  efficient combination of block size, prediction mode, transform type
  etc.)
- Rate control (regulation of the output bitrate)
- Encoder speed vs. quality trade-offs
- Features such as two pass encoding or optimization for low delay
  encoding
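
Rate distortion optimization is commonly formulated as minimizing a Lagrangian
cost J = D + lambda * R over the available choices. The sketch below assumes
that generic formulation; the `Candidate` type, `pick_best_candidate()` and any
numbers fed to it are hypothetical, and libaom's own fixed-point cost
computation differs in detail.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical candidate description; not a libaom type.
typedef struct {
  double distortion;  // e.g. sum of squared errors against the source
  double rate;        // estimated bits needed to code this choice
} Candidate;

// Generic Lagrangian cost: J = D + lambda * R.
static double rd_cost(const Candidate *c, double lambda) {
  return c->distortion + lambda * c->rate;
}

// Returns the index of the candidate with the lowest cost. Ties are resolved
// in favour of the earliest candidate.
size_t pick_best_candidate(const Candidate *cands, size_t n, double lambda) {
  assert(cands != NULL && n > 0);
  size_t best = 0;
  for (size_t i = 1; i < n; ++i) {
    if (rd_cost(&cands[i], lambda) < rd_cost(&cands[best], lambda)) best = i;
  }
  return best;
}
```

A small lambda favours candidates with low distortion, while a large lambda
favours candidates that cost few bits, which is how a single cost function can
serve different quality targets.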

For a more detailed overview of AV1’s encoding tools and a discussion of some
of the design considerations and hardware constraints that had to be
accommodated, please refer to *** TODO link to Jingning’s AV1 overview paper.

Figure 3 provides a slightly expanded but still simplistic view of the
AV1 encoder architecture with blocks that relate to some of the subsequent
sections of this document. In this diagram, the raw uncompressed frame buffers
are shown in dark green and the reconstructed frame buffers used for
prediction in light green. Red indicates those parts of the codec that are
(or may be) “lossy”, where fidelity can be traded off against compression
efficiency, whilst light blue shows algorithms or coding tools that are
lossless. The yellow blocks represent non-bitstream normative configuration
and control algorithms.

\image html av1encoderflow.png "" width=70%

\section architecture_command_line The Libaom Command Line Interface

  Add details or links here: TODO ? elliotk@

\section architecture_enc_data_structures Main Encoder Data Structures

  The following are the main high level data structures used by the libaom AV1
  encoder:

  - \ref AV1_COMP
  - Add details, references or links here: TODO ? urvang@


\section architecture_enc_use_cases Encoder Use Cases

  Add details here.

\section architecture_enc_rate_ctrl Rate Control

  Add details here.

\subsection architecture_enc_vbr Variable Bitrate (VBR) Encoding

  Add details here.

\subsection architecture_enc_1pass_lagged 1 Pass Lagged VBR Encoding

  Add details here.

\subsection architecture_enc_rc_loop The Main Rate Control Loop

  Add details here.

\subsection architecture_enc_fixed_q Fixed Q Mode

  Add details here.

\section architecture_enc_src_proc Source Frame Processing

  Add details here.

\section architecture_enc_hierachical Hierarchical Coding

  Add details here.

\section architecture_enc_tpl Temporal Dependency Modelling

  Add details here.

\section architecture_enc_partitions Block Partition Search

  Add details here.

\section architecture_enc_inter_modes Inter Prediction Mode Search

  Add details here.

\section architecture_enc_intra_modes Intra Mode Search

  Add details here.

\section architecture_enc_tx_search Transform Search

  Add details here.

\section architecture_loop_filt Loop Filtering

  Add details here.

\section architecture_loop_rest Loop Restoration Filtering

  Add details here.

\section architecture_cdef CDEF

  Add details here.

\section architecture_entropy Entropy Coding

  Add details here.

*/

/*!\defgroup encoder_algo Encoder Algorithm
 *
 * The encoder algorithm describes how a sequence is encoded, including
 * high-level decisions as well as the algorithms used at every encoding stage.
 */

/*!\defgroup high_level_algo High-level Algorithm
 * \ingroup encoder_algo
 * This module describes the sequence-level and frame-level algorithms in AV1.
 * More details will be added.
 * @{
 */

  /*!\defgroup frame_coding_pipeline Frame Coding Pipeline
  \ingroup high_level_algo

  To encode a frame, first call \ref av1_receive_raw_frame() to obtain the raw
  frame data. Then call \ref av1_get_compressed_data() to encode raw frame data
  into compressed frame data. The main body of \ref av1_get_compressed_data()
  is \ref av1_encode_strategy(), which determines high-level encode strategy
  (frame type, frame placement, etc.) and then encodes the frame by calling
  \ref av1_encode(). In \ref av1_encode(), \ref av1_first_pass() will execute
  the first pass of two-pass encoding, while \ref encode_frame_to_data_rate()
  will perform the final pass for either one-pass or two-pass encoding.

  The main body of \ref encode_frame_to_data_rate() is
  \ref encode_with_recode_loop_and_filter(), which handles encoding before
  in-loop filters (with recode loops encode_with_recode_loop(), or without
  any recode loop \ref encode_without_recode()), followed by in-loop filters
  (deblocking filters \ref loopfilter_frame(), CDEF filters and restoration
  filters \ref cdef_restoration_frame()).

  Apart from rate/quality control, both encode_with_recode_loop() and
  \ref encode_without_recode() call \ref av1_encode_frame() to manage reference
  frame buffers and to perform the rest of the encoding that does not require
  operating on external frames, via \ref encode_frame_internal(), which is the
  starting point of \ref partition_search.
  */

  /*!\defgroup two_pass_algo Two Pass Mode
  \ingroup high_level_algo

  In two pass mode, the input file is passed into the encoder for a quick
  first pass, where statistics are gathered. These statistics and the input
  file are then passed back into the encoder for a second pass. The statistics
  help the encoder reach the desired bitrate without as much overshooting or
  undershooting.

  During the first pass, the codec will return "stats" packets that contain
  information useful for the second pass. The caller should concatenate these
  packets as they are received. In the second pass, the concatenated packets
  are passed in, along with the frames to encode. During the second pass,
  "frame" packets are returned that represent the compressed video.

  A complete example can be found in `examples/twopass_encoder.c`. Pseudocode
  is provided below to illustrate the core parts.

  During the first pass, the uncompressed frames are passed in and stats
  information is appended to a byte array.

~~~~~~~~~~~~~~~{.c}
// For simplicity, assume that there is enough memory in the stats buffer.
// Actual code will want to use a resizable array. stats_len represents
// the length of data already present in the buffer.
void get_stats_data(aom_codec_ctx_t *encoder, char *stats,
                    size_t *stats_len, bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_STATS_PKT) continue;
    memcpy(stats + *stats_len, pkt->data.twopass_stats.buf,
           pkt->data.twopass_stats.sz);
    *stats_len += pkt->data.twopass_stats.sz;
  }
}

void first_pass(char *stats, size_t *stats_len) {
  struct aom_codec_enc_cfg first_pass_cfg;
  ... // Initialize the config as needed.
  first_pass_cfg.g_pass = AOM_RC_FIRST_PASS;
  aom_codec_ctx_t first_pass_encoder;
  ... // Initialize the encoder.

  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&first_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  }
  // After all frames have been processed, call aom_codec_encode with
  // a NULL ptr repeatedly, until no more data is returned. The NULL
  // ptr tells the encoder that no more frames are available.
  do {
    got_data = false;
    aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  } while (got_data);

  aom_codec_destroy(&first_pass_encoder);
}
~~~~~~~~~~~~~~~
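
  The assumption that the stats buffer is large enough can be dropped by using
  a small growable byte array. The following is a minimal sketch using only the
  C standard library; `StatsBuffer`, `stats_buffer_append()` and
  `stats_buffer_free()` are hypothetical helper names and are not part of the
  libaom API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable byte buffer; not part of libaom.
// Start from a zero-initialized struct: { NULL, 0, 0 }.
typedef struct {
  char *data;
  size_t len;       // bytes currently stored
  size_t capacity;  // bytes allocated
} StatsBuffer;

// Appends sz bytes, doubling the capacity as needed. Returns false if the
// allocation fails, in which case the buffer is left unchanged.
bool stats_buffer_append(StatsBuffer *sb, const void *src, size_t sz) {
  assert(sb != NULL);
  if (sz == 0) return true;
  if (sz > sb->capacity - sb->len) {
    size_t new_cap = sb->capacity ? sb->capacity : 4096;
    while (new_cap - sb->len < sz) {
      if (new_cap > SIZE_MAX / 2) return false;  // doubling would overflow
      new_cap *= 2;
    }
    char *p = realloc(sb->data, new_cap);
    if (p == NULL) return false;
    sb->data = p;
    sb->capacity = new_cap;
  }
  memcpy(sb->data + sb->len, src, sz);
  sb->len += sz;
  return true;
}

// Releases the storage and resets the buffer to its empty state.
void stats_buffer_free(StatsBuffer *sb) {
  free(sb->data);
  sb->data = NULL;
  sb->len = 0;
  sb->capacity = 0;
}
```

  With such a helper, the `memcpy` in the pseudocode would become a call like
  `stats_buffer_append(&sb, pkt->data.twopass_stats.buf,
  pkt->data.twopass_stats.sz)`, and `sb.data` and `sb.len` would be handed to
  the second pass.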

  During the second pass, the uncompressed frames and the stats are
  passed into the encoder.

~~~~~~~~~~~~~~~{.c}
// Write out each encoded frame to the file.
void get_cx_data(aom_codec_ctx_t *encoder, FILE *file,
                 bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue;
    fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file);
  }
}

void second_pass(char *stats, size_t stats_len) {
  struct aom_codec_enc_cfg second_pass_cfg;
  ... // Initialize the config as needed.
  second_pass_cfg.g_pass = AOM_RC_LAST_PASS;
  second_pass_cfg.rc_twopass_stats_in.buf = stats;
  second_pass_cfg.rc_twopass_stats_in.sz = stats_len;
  aom_codec_ctx_t second_pass_encoder;
  ... // Initialize the encoder from the config.

  FILE *output = fopen("output.obu", "wb");
  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&second_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  }
  // Pass in NULL to flush the encoder.
  do {
    got_data = false;
    aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  } while (got_data);

  aom_codec_destroy(&second_pass_encoder);
  fclose(output);
}
~~~~~~~~~~~~~~~
  */

  /*!\defgroup look_ahead_buffer The Look-Ahead Buffer
  \ingroup high_level_algo

  A program should call \ref aom_codec_encode() for each frame that needs
  processing. These frames are internally copied and stored in a fixed-size
  circular buffer, known as the look-ahead buffer. Other parts of the code
  will use future frame information to inform current frame decisions;
  examples include the first-pass algorithm, TPL model, and temporal filter.
  Note that this buffer also keeps a reference to the last source frame.

  The look-ahead buffer is defined in \ref av1/encoder/lookahead.h. It acts as an
  opaque structure, with an interface to create and free memory associated with
  it. It supports pushing and popping frames onto the structure in a FIFO
  fashion. It also allows look-ahead when using the \ref av1_lookahead_peek()
  function with a non-negative number, and look-behind when -1 is passed in (for
  the last source frame; e.g., firstpass will use this for motion estimation).
  The \ref av1_lookahead_depth() function returns the current number of frames
  stored in it. Note that \ref av1_lookahead_pop() is a bit of a misnomer - it
  only pops if either the "flush" variable is set, or the buffer is at maximum
  capacity.
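
  The push, pop and peek behaviour described above can be modelled with a small
  ring buffer. This is a toy model under simplifying assumptions, not the
  libaom implementation: frames are plain non-negative integers, the capacity
  is a compile-time constant, and `ToyLookahead` and the `toy_*` functions are
  hypothetical names that only mirror the semantics described in the text.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_DEPTH 4  // stand-in for the configured lag in frames

// Toy model only; frames are represented by non-negative integer ids.
typedef struct {
  int frames[TOY_MAX_DEPTH];
  int read_idx;     // position of the oldest stored frame
  int depth;        // number of frames currently stored
  int last_source;  // most recently popped frame, or -1 if none
} ToyLookahead;

void toy_init(ToyLookahead *la) {
  la->read_idx = 0;
  la->depth = 0;
  la->last_source = -1;
}

// Appends a frame at the tail; fails if the buffer is already full.
bool toy_push(ToyLookahead *la, int frame) {
  assert(frame >= 0);
  if (la->depth == TOY_MAX_DEPTH) return false;
  la->frames[(la->read_idx + la->depth) % TOY_MAX_DEPTH] = frame;
  la->depth++;
  return true;
}

// Removes the oldest frame only when flushing or when the buffer is full.
// Returns the frame id, or -1 if nothing was popped.
int toy_pop(ToyLookahead *la, bool flush) {
  if (la->depth == 0) return -1;
  if (!flush && la->depth < TOY_MAX_DEPTH) return -1;
  const int frame = la->frames[la->read_idx];
  la->read_idx = (la->read_idx + 1) % TOY_MAX_DEPTH;
  la->depth--;
  la->last_source = frame;
  return frame;
}

// index >= 0 looks ahead from the oldest stored frame; index == -1 returns
// the last source frame. Returns -1 when the requested frame is unavailable.
int toy_peek(const ToyLookahead *la, int index) {
  if (index == -1) return la->last_source;
  if (index < 0 || index >= la->depth) return -1;
  return la->frames[(la->read_idx + index) % TOY_MAX_DEPTH];
}
```

  In this model a non-flushing pop returns nothing until the buffer has filled
  up, which is why frames linger in the buffer and why a flush is needed to
  drain the final frames.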

  The buffer is stored in the \ref AV1_COMP::lookahead field.
  It is initialized in the first call to \ref aom_codec_encode(), in the
  \ref av1_receive_raw_frame() sub-routine. The buffer size is defined by
  the \ref aom_codec_enc_cfg_t::g_lag_in_frames field.
  This can be modified manually but should only be set once. On the command
  line, the flag "--lag-in-frames" controls it. The default size is 19 for
  non-realtime usage and 1 for realtime. Note that a maximum value of 35 is
  enforced.

  A frame will stay in the buffer as long as possible. As mentioned above,
  \ref av1_lookahead_pop() only removes a frame when either flush is set,
  or the buffer is full. Note that each call to \ref aom_codec_encode() inserts
  another frame into the buffer, and pop is called by the sub-function
  \ref av1_encode_strategy(). The buffer is told to flush when
  \ref aom_codec_encode() is passed a NULL image pointer. Note that the caller
  must repeatedly call \ref aom_codec_encode() with a NULL image pointer, until
  no more packets are available, in order to fully flush the buffer.

  */

/*! @} - end defgroup high_level_algo */

/*!\defgroup partition_search Partition Search
 * \ingroup encoder_algo
  A frame is first split into tiles in \ref encode_tiles(), with each tile
  compressed by av1_encode_tile(). Then a tile is processed in superblock rows
  via \ref av1_encode_sb_row() and then \ref encode_sb_row().

  Partition search starts from superblocks, which are processed sequentially
  in \ref encode_sb_row(). Two search modes are supported for a superblock,
  depending on the encoding configuration: \ref encode_nonrd_sb() is used for
  1-pass and real-time modes, while \ref encode_rd_sb() performs more
  exhaustive searches.

  Partition search over the recursive quad-tree space is implemented by
  recursively calling \ref nonrd_use_partition(), \ref rd_use_partition(), or
  rd_pick_partition() and returning best options for sub-trees to their
  parent partitions.
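
  The recursion described above can be sketched as follows. This is a heavily
  simplified model that only chooses between keeping a square block whole and
  splitting it into four quadrants. The cost callback, the `SPLIT_OVERHEAD`
  constant, `toy_cost()` and `search_partition()` are all hypothetical; the
  real search also evaluates other partition shapes and uses actual
  rate-distortion costs.

```c
#include <assert.h>

#define MIN_BLOCK_SIZE 4
#define SPLIT_OVERHEAD 1.0  // hypothetical cost of signalling a split

// Hypothetical callback returning the cost of coding the square block at
// (row, col) with the given size as a single unit.
typedef double (*BlockCostFn)(int row, int col, int size);

// Returns the lowest cost found for the block, comparing "no split" with the
// sum of the best costs of its four quadrants.
double search_partition(int row, int col, int size, BlockCostFn cost_fn) {
  assert(size >= MIN_BLOCK_SIZE);
  const double none_cost = cost_fn(row, col, size);
  if (size == MIN_BLOCK_SIZE) return none_cost;

  const int half = size / 2;
  double split_cost = SPLIT_OVERHEAD;
  for (int i = 0; i < 4; ++i) {
    const int r = row + (i / 2) * half;
    const int c = col + (i % 2) * half;
    // Each quadrant reports its own best cost back to the parent.
    split_cost += search_partition(r, c, half, cost_fn);
  }
  return split_cost < none_cost ? split_cost : none_cost;
}

// Example cost: blocks anchored at the top-left corner are expensive to code
// whole, everything else is cheap.
double toy_cost(int row, int col, int size) {
  return (row == 0 && col == 0) ? (double)(size * size) : 2.0;
}
```

  With `toy_cost()`, a block anchored at the origin is cheaper when split
  repeatedly towards the expensive corner, whereas a block elsewhere is kept
  whole because splitting only adds overhead.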

  In libaom, partition search sits on top of mode search (predictor, transform,
  etc.) instead of being a separate module. The interface of mode search is
  \ref pick_sb_modes(), which connects \ref partition_search with
  \ref inter_mode_search and \ref intra_mode_search. To make good decisions,
  reconstruction is also required in order to build references and contexts.
  It is implemented by \ref encode_sb() at the sub-tree level and
  \ref encode_b() at the coding block level.
 * @{
 */
/*! @} - end defgroup partition_search */

/*!\defgroup intra_mode_search Intra Mode Search
 * \ingroup encoder_algo
 * This module describes the intra mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup intra_mode_search */

/*!\defgroup inter_mode_search Inter Mode Search
 * \ingroup encoder_algo
 * This module describes the inter mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup inter_mode_search */

/*!\defgroup palette_mode_search Palette Mode Search
 * \ingroup intra_mode_search
 * This module describes the palette mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup palette_mode_search */

/*!\defgroup transform_search Transform Search
 * \ingroup encoder_algo
 * This module describes the transform search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup transform_search */

/*!\defgroup in_loop_filter In-loop Filter
 * \ingroup encoder_algo
 * This module describes the in-loop filter algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_filter */

/*!\defgroup in_loop_cdef CDEF
 * \ingroup encoder_algo
 * This module describes the CDEF parameter search algorithm
 * in AV1. More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_cdef */

/*!\defgroup in_loop_restoration Loop Restoration
 * \ingroup encoder_algo
 * This module describes the loop restoration search
 * and estimation algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_restoration */

/*!\defgroup rate_control Rate Control
 * \ingroup encoder_algo
 * This module describes the rate control algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup rate_control */
Paul Wilkinsb534a782020-06-25 18:02:17 +0100462/*! @} - end defgroup rate_control */