/*!\page encoder_guide AV1 ENCODER GUIDE

\tableofcontents
				4
				5	\section architecture_introduction Introduction
				6
				7	This document provides an architectural overview of the libaom AV1 encoder.
				8
It is intended as a high-level starting point for anyone wishing to contribute
to the project, helping them understand the structure of the encoder and find
their way around the codebase more quickly.

It sits above, and where necessary links to, more detailed function-level
documents.
				15
				16	\section architecture_gencodecs Generic Block Transform Based Codecs
				17
Most modern video encoders including VP8, H.264, VP9, HEVC and AV1
(in increasing order of complexity) share a common basic paradigm. This
comprises separating a stream of raw video frames into a series of discrete
blocks (of one or more sizes), then computing a prediction signal and a
quantized, transform coded, residual error signal. The prediction and residual
error signal, along with any side information needed by the decoder, are then
entropy coded and packed to form the encoded bitstream. See Figure 1 below,
where the blue blocks are, to all intents and purposes, the lossless parts of
the encoder and the red block is the lossy part.
				27
				28	This is of course a gross oversimplification, even in regard to the simplest
				29	of the above codecs. For example, all of them allow for block based
				30	prediction at multiple different scales (i.e. different block sizes) and may
				31	use previously coded pixels in the current frame for prediction or pixels from
				32	one or more previously encoded frames. Further, they may support multiple
				33	different transforms and transform sizes and quality optimization tools like
				34	loop filtering.
				35
				36	\image html genericcodecflow.png "" width=70%
				37
				38	\section architecture_av1_structure AV1 Structure and Complexity
				39
				40	As previously stated, AV1 adopts the same underlying paradigm as other block
				41	transform based codecs. However, it is much more complicated than previous
				42	generation codecs and supports many more block partitioning, prediction and
				43	transform options.
				44
AV1 supports block partitions of various sizes from 128x128 pixels down to 4x4
pixels using a multi-layer recursive tree structure as illustrated in Figure 2
below.
				48
				49	\image html av1partitions.png "" width=70%
				50
				51	AV1 also provides 71 basic intra prediction modes, 56 single frame inter prediction
				52	modes (7 reference frames x 4 modes x 2 for OBMC (overlapped block motion
				53	compensation)), 12768 compound inter prediction modes (that combine inter
				54	predictors from two reference frames) and 36708 compound inter / intra
				55	prediction modes. Furthermore, in addition to simple inter motion estimation,
				56	AV1 also supports warped motion prediction using affine transforms.
				57
In terms of transform coding, it has 16 separable 2-D transform kernels
{ DCT, ADST, fADST, IDTX }<sup>2</sup> that can be applied at up to 19
different scales from 64x64 down to 4x4 pixels.
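
Two of the counts quoted above follow directly from factors stated in the
text, as the small sketch below shows. The helper names are illustrative only
and are not part of libaom. The larger compound-mode totals are not derived
here, because the text does not give their factors.

```c
#include <assert.h>

// Illustrative helpers only; these names do not exist in libaom.

// Single-reference inter modes: reference frames x modes x OBMC on/off.
int sketch_single_ref_inter_modes(int ref_frames, int modes) {
  assert(ref_frames > 0 && modes > 0);
  const int obmc_choices = 2;  // with or without OBMC
  return ref_frames * modes * obmc_choices;
}

// Separable 2-D kernels: one 1-D transform type is chosen per direction,
// so the number of kernels is the square of the number of 1-D types.
int sketch_separable_2d_kernels(int one_d_types) {
  assert(one_d_types > 0);
  return one_d_types * one_d_types;
}
```

With 7 reference frames and 4 modes the first helper gives the 56 single
frame inter modes, and with the 4 listed 1-D transform types the second gives
the 16 separable kernels.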
				61
				62	When combined together, this means that for any one 8x8 pixel block in a
				63	source frame, there are approximately 45,000,000 different ways that it can
				64	be encoded.
				65
Consequently, AV1 requires complex control processes. While not necessarily
a normative part of the bitstream, these are the algorithms that turn a set
of compression tools and a bitstream format specification into a coherent
and useful codec implementation. These may include, but are not limited to:

- Rate distortion optimization (the process of trying to choose the most
  efficient combination of block size, prediction mode, transform type,
  etc.)
- Rate control (regulation of the output bitrate)
- Encoder speed vs quality trade-offs
- Features such as two-pass encoding or optimization for low-delay
  encoding
				79
For a more detailed overview of AV1's encoding tools and a discussion of some
of the design considerations and hardware constraints that had to be
accommodated, please refer to *** TODO link to Jingning's AV1 overview paper.
				83
				84	Figure 3 provides a slightly expanded but still simplistic view of the
				85	AV1 encoder architecture with blocks that relate to some of the subsequent
				86	sections of this document. In this diagram, the raw uncompressed frame buffers
				87	are shown in dark green and the reconstructed frame buffers used for
				88	prediction in light green. Red indicates those parts of the codec that are
				89	(or may be) lossy, where fidelity can be traded off against compression
				90	efficiency, whilst light blue shows algorithms or coding tools that are
				91	lossless. The yellow blocks represent non-bitstream normative configuration
				92	and control algorithms.
				93
				94	\image html av1encoderflow.png "" width=70%
				95
				96	\section architecture_command_line The Libaom Command Line Interface
				97
				98	Add details or links here: TODO ? elliotk@
				99
				100	\section architecture_enc_data_structures Main Encoder Data Structures
				101
				102	The following are the main high level data structures used by the libaom AV1 encoder:
				103
				104	- \ref AV1_COMP
				105	- Add details, references or links here: TODO ? urvang@
				106
				107
				108	\section architecture_enc_use_cases Encoder Use Cases
				109
				110	Add details here.
				111
				112	\section architecture_enc_rate_ctrl Rate Control
				113
				114	Add details here.
				115
				116	\subsection architecture_enc_vbr Variable Bitrate (VBR) Encoding
				117
				118	Add details here.
				119
				120	\subsection architecture_enc_1pass_lagged 1 Pass Lagged VBR Encoding
				121
				122	Add details here.
				123
				124	\subsection architecture_enc_rc_loop The Main Rate Control Loop
				125
				126	Add details here.
				127
				128	\subsection architecture_enc_fixed_q Fixed Q Mode
				129
				130	Add details here.
				131
				132	\section architecture_enc_src_proc Source Frame Processing
				133
				134	Add details here.
				135
				136	\section architecture_enc_hierachical Hierarchical Coding
				137
				138	Add details here.
				139
				140	\section architecture_enc_tpl Temporal Dependency Modelling
				141
				142	Add details here.
				143
				144	\section architecture_enc_partitions Block Partition Search
				145
				146	Add details here.
				147
				148	\section architecture_enc_inter_modes Inter Prediction Mode Search
				149
				150	Add details here.
				151
				152	\section architecture_enc_intra_modes Intra Mode Search
				153
				154	Add details here.
				155
				156	\section architecture_enc_tx_search Transform Search
				157
				158	Add details here.
				159
				160	\section architecture_loop_filt Loop Filtering
				161
				162	Add details here.
				163
				164	\section architecture_loop_rest Loop Restoration Filtering
				165
				166	Add details here.
				167
				168	\section architecture_cdef CDEF
				169
				170	Add details here.
				171
				172	\section architecture_entropy Entropy Coding
				173
				174	Add details here.
				175
				176	*/
Yunqing Wang	65cd010	2020-05-06 12:57:04 -0700	[diff] [blame]	177
				178	/*!\defgroup encoder_algo Encoder Algorithm
				179	*
* The encoder algorithm describes how a sequence is encoded, including
* high-level decisions as well as the algorithms used at every encoding stage.
				182	*/
				183
				184	/*!\defgroup high_level_algo High-level Algorithm
				185	* \ingroup encoder_algo
				186	* This module describes sequence level/frame level algorithm in AV1.
				187	* More details will be added.
				188	* @{
				189	*/
Elliott Karpilovsky	2ea1836	2020-06-02 18:32:27 -0700	[diff] [blame]	190
Yue Chen	0690b01	2020-06-18 00:52:11 -0700	[diff] [blame]	191	/*!\defgroup frame_coding_pipeline Frame Coding Pipeline
				192	\ingroup high_level_algo
				193
To encode a frame, first call \ref av1_receive_raw_frame() to obtain the raw
frame data. Then call \ref av1_get_compressed_data() to encode the raw frame
data into compressed frame data. The main body of
\ref av1_get_compressed_data() is \ref av1_encode_strategy(), which determines
the high-level encode strategy (frame type, frame placement, etc.) and then
encodes the frame by calling \ref av1_encode(). In \ref av1_encode(),
\ref av1_first_pass() executes the first pass of two-pass encoding, while
\ref encode_frame_to_data_rate() performs the final pass for either one-pass
or two-pass encoding.

The main body of \ref encode_frame_to_data_rate() is
\ref encode_with_recode_loop_and_filter(), which handles encoding before the
in-loop filters (either with recode loops in encode_with_recode_loop(), or
without any recode loop in \ref encode_without_recode()), followed by the
in-loop filters (deblocking filters in \ref loopfilter_frame(), CDEF filters
and restoration filters in \ref cdef_restoration_frame()).

Apart from rate/quality control, both encode_with_recode_loop() and
\ref encode_without_recode() call \ref av1_encode_frame(), which manages the
reference frame buffers and delegates the rest of the encoding, which does not
require operating on external frames, to \ref encode_frame_internal(), the
starting point of \ref partition_search.
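
The call chain described above can be summarised with a small trace sketch.
Everything here is a hypothetical stand-in: the `sketch_` functions only
record stage names in the order the text gives them, the two boolean
parameters are assumptions made for illustration, and the real libaom
functions take encoder state and have different signatures. A real recode
loop may also call av1_encode_frame() more than once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical trace of the stages visited; not a libaom type.
typedef struct {
  char log[256];  // space-separated stage names, must start out empty
} SketchTrace;

void sketch_visit(SketchTrace *t, const char *stage) {
  assert(t != NULL && stage != NULL);
  const size_t used = strlen(t->log);
  snprintf(t->log + used, sizeof(t->log) - used, "%s%s", used ? " " : "",
           stage);
}

// Models encode_with_recode_loop_and_filter(): encode, then in-loop filters.
void sketch_encode_and_filter(SketchTrace *t, bool use_recode_loop) {
  sketch_visit(t, use_recode_loop ? "encode_with_recode_loop"
                                  : "encode_without_recode");
  sketch_visit(t, "av1_encode_frame");
  sketch_visit(t, "encode_frame_internal");
  sketch_visit(t, "loopfilter_frame");
  sketch_visit(t, "cdef_restoration_frame");
}

// Models av1_encode(): either the first pass or the final pass.
void sketch_encode(SketchTrace *t, bool is_first_pass, bool use_recode_loop) {
  sketch_visit(t, "av1_encode");
  if (is_first_pass) {
    sketch_visit(t, "av1_first_pass");
    return;
  }
  sketch_visit(t, "encode_frame_to_data_rate");
  sketch_encode_and_filter(t, use_recode_loop);
}

// Models av1_get_compressed_data(): pick a strategy, then encode.
void sketch_get_compressed_data(SketchTrace *t, bool is_first_pass,
                                bool use_recode_loop) {
  sketch_visit(t, "av1_get_compressed_data");
  sketch_visit(t, "av1_encode_strategy");
  sketch_encode(t, is_first_pass, use_recode_loop);
}
```

Starting from an empty `SketchTrace` and calling
`sketch_get_compressed_data()` leaves the visited stages in `log`, which makes
it easy to compare the first-pass path with the final-pass path.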
				215	*/
				216
Elliott Karpilovsky	2ea1836	2020-06-02 18:32:27 -0700	[diff] [blame]	217	/*!\defgroup two_pass_algo Two Pass Mode
				218	\ingroup high_level_algo
				219
				220	In two pass mode, the input file is passed into the encoder for a quick
				221	first pass, where statistics are gathered. These statistics and the input
				222	file are then passed back into the encoder for a second pass. The statistics
				223	help the encoder reach the desired bitrate without as much overshooting or
				224	undershooting.
				225
				226	During the first pass, the codec will return "stats" packets that contain
				227	information useful for the second pass. The caller should concatenate these
				228	packets as they are received. In the second pass, the concatenated packets
				229	are passed in, along with the frames to encode. During the second pass,
				230	"frame" packets are returned that represent the compressed video.
				231
				232	A complete example can be found in `examples/twopass_encoder.c`. Pseudocode
				233	is provided below to illustrate the core parts.
				234
				235	During the first pass, the uncompressed frames are passed in and stats
				236	information is appended to a byte array.
				237
~~~~~~~~~~~~~~~{.c}
// For simplicity, assume that there is enough memory in the stats buffer.
// Actual code will want to use a resizable array. stats_len represents
// the length of data already present in the buffer.
void get_stats_data(aom_codec_ctx_t *encoder, char *stats,
                    size_t *stats_len, bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_STATS_PKT) continue;
    memcpy(stats + *stats_len, pkt->data.twopass_stats.buf,
           pkt->data.twopass_stats.sz);
    *stats_len += pkt->data.twopass_stats.sz;
  }
}

void first_pass(char *stats, size_t *stats_len) {
  struct aom_codec_enc_cfg first_pass_cfg;
  ... // Initialize the config as needed.
  first_pass_cfg.g_pass = AOM_RC_FIRST_PASS;
  aom_codec_ctx_t first_pass_encoder;
  ... // Initialize the encoder.

  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&first_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  }
  // After all frames have been processed, call aom_codec_encode with
  // a NULL ptr repeatedly, until no more data is returned. The NULL
  // ptr tells the encoder that no more frames are available.
  do {
    got_data = false;
    aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  } while (got_data);

  aom_codec_destroy(&first_pass_encoder);
}
~~~~~~~~~~~~~~~
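
The pseudocode above assumes the stats buffer is large enough. One possible
way to concatenate stats packets without that assumption is a growable byte
buffer, sketched below. `SketchStatsBuf` and its functions are hypothetical
helpers written for this guide and are not part of the libaom API. The
doubling growth policy and the initial size of 64 bytes are arbitrary choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable byte buffer for first-pass stats; not a libaom type.
typedef struct {
  char *data;       // heap storage, NULL while nothing is allocated
  size_t len;       // bytes currently stored
  size_t capacity;  // bytes allocated
} SketchStatsBuf;

// Appends sz bytes from src, growing the storage by doubling when needed.
// Returns false and leaves the buffer unchanged if memory cannot be obtained.
bool sketch_stats_append(SketchStatsBuf *buf, const void *src, size_t sz) {
  assert(buf != NULL && (src != NULL || sz == 0));
  if (sz > SIZE_MAX - buf->len) return false;  // total length would overflow
  const size_t needed = buf->len + sz;
  if (needed > buf->capacity) {
    size_t new_cap = buf->capacity ? buf->capacity : 64;
    while (new_cap < needed) {
      if (new_cap > SIZE_MAX / 2) {
        new_cap = needed;  // doubling would overflow, so take the exact size
        break;
      }
      new_cap *= 2;
    }
    char *grown = realloc(buf->data, new_cap);
    if (grown == NULL) return false;
    buf->data = grown;
    buf->capacity = new_cap;
  }
  if (sz > 0) memcpy(buf->data + buf->len, src, sz);
  buf->len = needed;
  return true;
}

// Releases the storage and resets the buffer to its empty state.
void sketch_stats_free(SketchStatsBuf *buf) {
  assert(buf != NULL);
  free(buf->data);
  buf->data = NULL;
  buf->len = 0;
  buf->capacity = 0;
}
```

With such a helper, the `memcpy` in `get_stats_data()` would become a call
like `sketch_stats_append(&buf, pkt->data.twopass_stats.buf,
pkt->data.twopass_stats.sz)`, and `buf.data` and `buf.len` would later be
assigned to `rc_twopass_stats_in.buf` and `rc_twopass_stats_in.sz`.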
				281
				282	During the second pass, the uncompressed frames and the stats are
				283	passed into the encoder.
				284
~~~~~~~~~~~~~~~{.c}
// Write out each encoded frame to the file.
void get_cx_data(aom_codec_ctx_t *encoder, FILE *file,
                 bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue;
    fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file);
  }
}

void second_pass(char *stats, size_t stats_len) {
  struct aom_codec_enc_cfg second_pass_cfg;
  ... // Initialize the config as needed.
  second_pass_cfg.g_pass = AOM_RC_LAST_PASS;
  second_pass_cfg.rc_twopass_stats_in.buf = stats;
  second_pass_cfg.rc_twopass_stats_in.sz = stats_len;
  aom_codec_ctx_t second_pass_encoder;
  ... // Initialize the encoder from the config.

  FILE *output = fopen("output.obu", "wb");
  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&second_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  }
  // Pass in NULL to flush the encoder.
  do {
    got_data = false;
    aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  } while (got_data);

  fclose(output);
  aom_codec_destroy(&second_pass_encoder);
}
~~~~~~~~~~~~~~~
				325	*/
				326
Elliott Karpilovsky	b6bd2bc	2020-06-16 03:23:17 -0700	[diff] [blame]	327	/*!\defgroup look_ahead_buffer The Look-Ahead Buffer
				328	\ingroup high_level_algo
				329
				330	A program should call \ref aom_codec_encode() for each frame that needs
				331	processing. These frames are internally copied and stored in a fixed-size
				332	circular buffer, known as the look-ahead buffer. Other parts of the code
				333	will use future frame information to inform current frame decisions;
				334	examples include the first-pass algorithm, TPL model, and temporal filter.
				335	Note that this buffer also keeps a reference to the last source frame.
				336
				337	The look-ahead buffer is defined in \ref av1/encoder/lookahead.h. It acts as an
				338	opaque structure, with an interface to create and free memory associated with
				339	it. It supports pushing and popping frames onto the structure in a FIFO
				340	fashion. It also allows look-ahead when using the \ref av1_lookahead_peek()
				341	function with a non-negative number, and look-behind when -1 is passed in (for
Elliott Karpilovsky	9999059	2020-06-19 12:22:54 -0700	[diff] [blame]	342	the last source frame; e.g., firstpass will use this for motion estimation).
				343	The \ref av1_lookahead_depth() function returns the current number of frames
				344	stored in it. Note that \ref av1_lookahead_pop() is a bit of a misnomer - it
				345	only pops if either the "flush" variable is set, or the buffer is at maximum
				346	capacity.
Elliott Karpilovsky	b6bd2bc	2020-06-16 03:23:17 -0700	[diff] [blame]	347
The buffer is stored in the \ref AV1_COMP::lookahead field.
It is initialized in the first call to \ref aom_codec_encode(), in the
\ref av1_receive_raw_frame() sub-routine. The buffer size is defined by
the \ref aom_codec_enc_cfg_t::g_lag_in_frames field of the encoder
configuration. This can be modified manually but should only be set once. On
the command line, the flag "--lag-in-frames" controls it. The default size is
19 for non-realtime usage and 1 for realtime. Note that a maximum value of 35
is enforced.
				357
A frame will stay in the buffer as long as possible. As mentioned above,
\ref av1_lookahead_pop() only removes a frame when either flush is set
or the buffer is full. Note that each call to \ref aom_codec_encode() inserts
another frame into the buffer, and pop is called by the sub-function
\ref av1_encode_strategy(). The buffer is told to flush when
\ref aom_codec_encode() is passed a NULL image pointer. Note that the caller
must repeatedly call \ref aom_codec_encode() with a NULL image pointer, until
no more packets are available, in order to fully flush the buffer.
				366
				367	*/
				368
/*! @} - end defgroup high_level_algo */
				370
				371	/*!\defgroup partition_search Partition Search
				372	* \ingroup encoder_algo
A frame is first split into tiles in \ref encode_tiles(), with each tile
compressed by av1_encode_tile(). Each tile is then processed in superblock
rows via \ref av1_encode_sb_row() and then \ref encode_sb_row().
				376
Partition search starts from superblocks, which are processed sequentially in
\ref encode_sb_row(). For each superblock, two search modes are supported,
depending on the encoding configuration: \ref encode_nonrd_sb() is used for
1-pass and real-time modes, while \ref encode_rd_sb() performs more
exhaustive searches.
				382
				383	Partition search over the recursive quad-tree space is implemented by
				384	recursively calling \ref nonrd_use_partition(), \ref rd_use_partition(), or
				385	rd_pick_partition() and returning best options for sub-trees to their
				386	parent partitions.
				387
In libaom, partition search is layered on top of mode search (predictor,
transform, etc.) rather than being a separate module. The interface to mode
search is \ref pick_sb_modes(), which connects \ref partition_search with
\ref inter_mode_search and \ref intra_mode_search. To make good decisions,
reconstruction is also required in order to build references and contexts; it
is implemented by \ref encode_sb() at sub-tree level and \ref encode_b() at
coding block level.
Yunqing Wang	65cd010	2020-05-06 12:57:04 -0700	[diff] [blame]	395	* @{
				396	*/
/*! @} - end defgroup partition_search */
				398
				399	/*!\defgroup intra_mode_search Intra Mode Search
				400	* \ingroup encoder_algo
				401	* This module describes intra mode search algorithm in AV1.
				402	* More details will be added.
				403	* @{
				404	*/
/*! @} - end defgroup intra_mode_search */
				406
				407	/*!\defgroup inter_mode_search Inter Mode Search
				408	* \ingroup encoder_algo
				409	* This module describes inter mode search algorithm in AV1.
				410	* More details will be added.
				411	* @{
				412	*/
/*! @} - end defgroup inter_mode_search */
				414
chiyotsai	7cc167e	2020-06-12 17:50:53 -0700	[diff] [blame]	415	/*!\defgroup palette_mode_search Palette Mode Search
				416	* \ingroup intra_mode_search
				417	* This module describes palette mode search algorithm in AV1.
				418	* More details will be added.
				419	* @{
				420	*/
/*! @} - end defgroup palette_mode_search */
				422
Yunqing Wang	65cd010	2020-05-06 12:57:04 -0700	[diff] [blame]	423	/*!\defgroup transform_search Transform Search
				424	* \ingroup encoder_algo
				425	* This module describes transform search algorithm in AV1.
				426	* More details will be added.
				427	* @{
				428	*/
/*! @} - end defgroup transform_search */
				430
				431	/*!\defgroup in_loop_filter In-loop Filter
				432	* \ingroup encoder_algo
				433	* This module describes in-loop filter algorithm in AV1.
				434	* More details will be added.
				435	* @{
				436	*/
/*! @} - end defgroup in_loop_filter */
				438
Debargha Mukherjee	7f1580e	2020-06-19 06:37:28 -0700	[diff] [blame]	439	/*!\defgroup in_loop_cdef CDEF
Debargha Mukherjee	82b2438	2020-06-16 23:30:39 -0700	[diff] [blame]	440	* \ingroup encoder_algo
				441	* This module describes the CDEF parameter search algorithm
				442	* in AV1. More details will be added.
				443	* @{
				444	*/
/*! @} - end defgroup in_loop_cdef */
				446
Debargha Mukherjee	7f1580e	2020-06-19 06:37:28 -0700	[diff] [blame]	447	/*!\defgroup in_loop_restoration Loop Restoration
Debargha Mukherjee	82b2438	2020-06-16 23:30:39 -0700	[diff] [blame]	448	* \ingroup encoder_algo
				449	* This module describes the loop restoration search
				450	* and estimation algorithm in AV1.
				451	* More details will be added.
				452	* @{
				453	*/
/*! @} - end defgroup in_loop_restoration */
				455
Yunqing Wang	65cd010	2020-05-06 12:57:04 -0700	[diff] [blame]	456	/*!\defgroup rate_control Rate Control
				457	* \ingroup encoder_algo
				458	* This module describes rate control algorithm in AV1.
				459	* More details will be added.
				460	* @{
				461	*/
/*! @} - end defgroup rate_control */