Blame - doc/dev_guide/av1_encoder.dox - aom

Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	1	/*!\page encoder_guide AV1 ENCODER GUIDE
Yunqing Wang	c8f7a3b	2020-05-04 15:23:48 -0700	[diff] [blame]	2
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	3	\tableofcontents
				4
				5	\section architecture_introduction Introduction
				6
				7	This document provides an architectural overview of the libaom AV1 encoder.
				8
It is intended as a high-level starting point for anyone wishing to contribute
to the project, helping them to understand the structure of the encoder more
quickly and to find their way around the codebase.
				12
It sits above, and where necessary links to, more detailed function-level
documents.
				15
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	16	\subsection architecture_gencodecs Generic Block Transform Based Codecs
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	17
				18	Most modern video encoders including VP8, H.264, VP9, HEVC and AV1
				19	(in increasing order of complexity) share a common basic paradigm. This
				20	comprises separating a stream of raw video frames into a series of discrete
				21	blocks (of one or more sizes), then computing a prediction signal and a
				22	quantized, transform coded, residual error signal. The prediction and residual
				23	error signal, along with any side information needed by the decoder, are then
entropy coded and packed to form the encoded bitstream. See Figure 1 below,
				25	where the blue blocks are, to all intents and purposes, the lossless parts of
				26	the encoder and the red block is the lossy part.
				27
				28	This is of course a gross oversimplification, even in regard to the simplest
				29	of the above codecs. For example, all of them allow for block based
				30	prediction at multiple different scales (i.e. different block sizes) and may
				31	use previously coded pixels in the current frame for prediction or pixels from
				32	one or more previously encoded frames. Further, they may support multiple
				33	different transforms and transform sizes and quality optimization tools like
				34	loop filtering.
				35
				36	\image html genericcodecflow.png "" width=70%
				37
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	38	\subsection architecture_av1_structure AV1 Structure and Complexity
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	39
				40	As previously stated, AV1 adopts the same underlying paradigm as other block
				41	transform based codecs. However, it is much more complicated than previous
				42	generation codecs and supports many more block partitioning, prediction and
				43	transform options.
				44
				45	AV1 supports block partitions of various sizes from 128x128 pixels down to 4x4
pixels using a multi-layer recursive tree structure as illustrated in Figure 2
				47	below.
				48
				49	\image html av1partitions.png "" width=70%
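
As a rough illustration of how deep the recursion goes, the sketch below
models only the square four-way split of the partition tree. This is a
simplifying assumption: AV1's non-square partition shapes are ignored, and the
function names are hypothetical rather than taken from libaom.

```c
#include <assert.h>

// Number of times a square block must be split in half (in each dimension)
// to get from size x size down to min_size x min_size.
static int toy_split_depth(int size, int min_size) {
  assert(min_size > 0 && size >= min_size);
  int depth = 0;
  while (size > min_size) {
    size /= 2;
    ++depth;
  }
  return depth;
}

// Number of leaf blocks produced if every block is recursively split into
// four quadrants until min_size is reached.
static long toy_count_leaf_blocks(int size, int min_size) {
  assert(min_size > 0 && size >= min_size);
  if (size == min_size) return 1;
  return 4 * toy_count_leaf_blocks(size / 2, min_size);
}
```

Under these assumptions, a 128x128 block sits five split levels above its 4x4
leaves, and a full split yields 1024 leaf blocks.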
				50
				51	AV1 also provides 71 basic intra prediction modes, 56 single frame inter prediction
				52	modes (7 reference frames x 4 modes x 2 for OBMC (overlapped block motion
				53	compensation)), 12768 compound inter prediction modes (that combine inter
				54	predictors from two reference frames) and 36708 compound inter / intra
				55	prediction modes. Furthermore, in addition to simple inter motion estimation,
				56	AV1 also supports warped motion prediction using affine transforms.
				57
				58	In terms of transform coding, it has 16 separable 2-D transform kernels
Paul Wilkins	8ed85dd	2020-08-04 17:48:22 +0100	[diff] [blame]	59	\f$(DCT, ADST, fADST, IDTX)^2\f$ that can be applied at up to 19 different
				60	scales from 64x64 down to 4x4 pixels.
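
One way to arrive at these two counts is sketched below. It assumes that a 2-D
kernel is any pairing of the four 1-D kernels named above, and that the
transform sizes are all width x height pairs drawn from {4, 8, 16, 32, 64}
with an aspect ratio of at most 4:1. The second assumption is not stated in
this document, and the function names are hypothetical.

```c
#include <assert.h>

// The four 1-D kernels named in the text: DCT, ADST, fADST, IDTX.
enum { TOY_NUM_1D_KERNELS = 4 };

// A separable 2-D kernel pairs one 1-D kernel for the columns with one for
// the rows, so every (vertical, horizontal) combination is counted.
static int toy_count_2d_kernels(void) {
  int count = 0;
  for (int v = 0; v < TOY_NUM_1D_KERNELS; ++v)
    for (int h = 0; h < TOY_NUM_1D_KERNELS; ++h) ++count;
  return count;
}

// Count width x height pairs from {4, 8, 16, 32, 64} whose longer side is at
// most max_ratio times the shorter side.
static int toy_count_transform_sizes(int max_ratio) {
  static const int dims[] = { 4, 8, 16, 32, 64 };
  const int n = (int)(sizeof(dims) / sizeof(dims[0]));
  assert(max_ratio >= 1);
  int count = 0;
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      const int hi = dims[i] > dims[j] ? dims[i] : dims[j];
      const int lo = dims[i] > dims[j] ? dims[j] : dims[i];
      if (hi <= max_ratio * lo) ++count;
    }
  }
  return count;
}
```

With a maximum ratio of 4 the size count is 5 square, 8 two-to-one and 6
four-to-one shapes.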
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	61
				62	When combined together, this means that for any one 8x8 pixel block in a
				63	source frame, there are approximately 45,000,000 different ways that it can
				64	be encoded.
				65
				66	Consequently, AV1 requires complex control processes. While not necessarily
				67	a normative part of the bitstream, these are the algorithms that turn a set
				68	of compression tools and a bitstream format specification, into a coherent
and useful codec implementation. These include, but are not limited to:
				71
- Rate distortion optimization (the process of trying to choose the most
  efficient combination of block size, prediction mode, transform type,
  etc.)
- Rate control (regulation of the output bitrate)
- Encoder speed vs quality trade offs
- Features such as two pass encoding or optimization for low delay
  encoding
				79
Paul Wilkins	4a9201b	2020-06-26 10:46:22 +0100	[diff] [blame]	80	For a more detailed overview of AV1's encoding tools and a discussion of some
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	81	of the design considerations and hardware constraints that had to be
Paul Wilkins	f88a151	2020-10-20 13:18:40 +0100	[diff] [blame]	82	accommodated, please refer to <a href="https://arxiv.org/abs/2008.06091">
				83	A Technical Overview of AV1</a>.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	84
				85	Figure 3 provides a slightly expanded but still simplistic view of the
				86	AV1 encoder architecture with blocks that relate to some of the subsequent
				87	sections of this document. In this diagram, the raw uncompressed frame buffers
				88	are shown in dark green and the reconstructed frame buffers used for
				89	prediction in light green. Red indicates those parts of the codec that are
Paul Wilkins	4a9201b	2020-06-26 10:46:22 +0100	[diff] [blame]	90	(or may be) lossy, where fidelity can be traded off against compression
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	91	efficiency, whilst light blue shows algorithms or coding tools that are
				92	lossless. The yellow blocks represent non-bitstream normative configuration
				93	and control algorithms.
				94
				95	\image html av1encoderflow.png "" width=70%
				96
				97	\section architecture_command_line The Libaom Command Line Interface
				98
				99	Add details or links here: TODO ? elliotk@
				100
				101	\section architecture_enc_data_structures Main Encoder Data Structures
				102
Paul Wilkins	4a9201b	2020-06-26 10:46:22 +0100	[diff] [blame]	103	The following are the main high level data structures used by the libaom AV1
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	104	encoder and referenced elsewhere in this overview document:
				105
Mufaddal Chakera	8ee04fa	2021-03-17 13:33:18 +0530	[diff] [blame]	106	- \ref AV1_PRIMARY
				107	- \ref AV1_PRIMARY.gf_group (\ref GF_GROUP)
Tarundeep Singh	5e5305a	2021-03-16 13:04:04 +0530	[diff] [blame]	108	- \ref AV1_PRIMARY.lap_enabled
Mufaddal Chakera	358cf21	2021-02-25 14:41:56 +0530	[diff] [blame]	109	- \ref AV1_PRIMARY.twopass (\ref TWO_PASS)
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	110	- \ref AV1_PRIMARY.p_rc (\ref PRIMARY_RATE_CONTROL)
Angie Chiang	29aaace	2021-11-15 16:23:42 -0800	[diff] [blame]	111	- \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)
Mufaddal Chakera	8ee04fa	2021-03-17 13:33:18 +0530	[diff] [blame]	112
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	113	- \ref AV1_COMP
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	114	- \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	115	- \ref AV1_COMP.rc (\ref RATE_CONTROL)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	116	- \ref AV1_COMP.speed
				117	- \ref AV1_COMP.sf (\ref SPEED_FEATURES)
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	118
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	119	- \ref AV1EncoderConfig (Encoder configuration parameters)
				120	- \ref AV1EncoderConfig.pass
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	121	- \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	122	- \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	123	- \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	124
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	125	- \ref AlgoCfg (Algorithm related configuration parameters)
				126	- \ref AlgoCfg.arnr_max_frames
				127	- \ref AlgoCfg.arnr_strength
				128
				129	- \ref KeyFrameCfg (Keyframe coding configuration parameters)
				130	- \ref KeyFrameCfg.enable_keyframe_filtering
				131
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	132	- \ref RateControlCfg (Rate control configuration)
Paul Wilkins	1dd7a7e	2020-07-09 17:07:35 +0100	[diff] [blame]	133	- \ref RateControlCfg.mode
				134	- \ref RateControlCfg.target_bandwidth
				135	- \ref RateControlCfg.best_allowed_q
				136	- \ref RateControlCfg.worst_allowed_q
				137	- \ref RateControlCfg.cq_level
				138	- \ref RateControlCfg.under_shoot_pct
				139	- \ref RateControlCfg.over_shoot_pct
				140	- \ref RateControlCfg.maximum_buffer_size_ms
				141	- \ref RateControlCfg.starting_buffer_level_ms
				142	- \ref RateControlCfg.optimal_buffer_level_ms
Debargha Mukherjee	c6a8120	2020-07-22 16:35:20 -0700	[diff] [blame]	143	- \ref RateControlCfg.vbrbias
				144	- \ref RateControlCfg.vbrmin_section
				145	- \ref RateControlCfg.vbrmax_section
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	146
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	147	- \ref PRIMARY_RATE_CONTROL (Primary Rate control status)
				148	- \ref PRIMARY_RATE_CONTROL.gf_intervals[]
				149	- \ref PRIMARY_RATE_CONTROL.cur_gf_index
				150
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	151	- \ref RATE_CONTROL (Rate control status)
				152	- \ref RATE_CONTROL.intervals_till_gf_calculate_due
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	153	- \ref RATE_CONTROL.frames_till_gf_update_due
				154	- \ref RATE_CONTROL.frames_to_key
				155
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	156	- \ref TWO_PASS (Two pass status and control data)
				157
Wan-Teh Chang	247dd54	2020-10-08 12:37:47 -0700	[diff] [blame]	158	- \ref GF_GROUP (Data related to the current GF/ARF group)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	159
				160	- \ref FIRSTPASS_STATS (Defines entries in the first pass stats buffer)
				161	- \ref FIRSTPASS_STATS.coded_error
				162
				163	- \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
				164	- \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)
				165
				166	- \ref HIGH_LEVEL_SPEED_FEATURES
				167	- \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop
				168	- \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	169
Paul Wilkins	4ac8bf4	2020-07-30 16:44:27 +0100	[diff] [blame]	170	- \ref TplParams
				171
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	172	\section architecture_enc_use_cases Encoder Use Cases
				173
				174	The libaom AV1 encoder is configurable to support a number of different use
				175	cases and rate control strategies.
				176
The principal use cases for which it is optimised are as follows:
				178
				179	- <b>Video on Demand / Streaming</b>
				180	- <b>Low Delay or Live Streaming</b>
				181	- <b>Video Conferencing / Real Time Coding (RTC)</b>
				182	- <b>Fixed Quality / Testing</b>
				183
				184	Other examples of use cases for which the encoder could be configured but for
				185	which there is less by way of specific optimizations include:
				186
				187	- <b>Download and Play</b>
- <b>Disk Playback</b>
				189	- <b>Storage</b>
				190	- <b>Editing</b>
				191	- <b>Broadcast video</b>
				192
				193	Specific use cases may have particular requirements or constraints. For
				194	example:
				195
				196	<b>Video Conferencing:</b> In a video conference we need to encode the video
				197	in real time and to avoid any coding tools that could increase latency, such
				198	as frame look ahead.
				199
				200	<b>Live Streams:</b> In cases such as live streaming of games or events, it
				201	may be possible to allow some limited buffering of the video and use of
				202	lookahead coding tools to improve encoding quality. However, whilst a lag of
				203	a second or two may be fine given the one way nature of this type of video,
				204	it is clearly not possible to use tools such as two pass coding.
				205
				206	<b>Broadcast:</b> Broadcast video (e.g. digital TV over satellite) may have
				207	specific requirements such as frequent and regular key frames (e.g. once per
				208	second or more) as these are important as entry points to users when switching
				209	channels. There may also be strict upper limits on bandwidth over a short
				210	window of time.
				211
				212	<b>Download and Play:</b> Download and play applications may have less strict
				213	requirements in terms of local frame by frame rate control but there may be a
				214	requirement to accurately hit a file size target for the video clip as a
				215	whole. Similar considerations may apply to playback from mass storage devices
				216	such as DVD or disk drives.
				217
				218	<b>Editing:</b> In certain special use cases such as offline editing, it may
				219	be desirable to have very high quality and data rate but also very frequent
				220	key frames or indeed to encode the video exclusively as key frames. Lossless
				221	video encoding may also be required in this use case.
				222
				223	<b>VOD / Streaming:</b> One of the most important and common use cases for AV1
				224	is video on demand or streaming, for services such as YouTube and Netflix. In
				225	this use case it is possible to do two or even multi-pass encoding to improve
				226	compression efficiency. Streaming services will often store many encoded
				227	copies of a video at different resolutions and data rates to support users
				228	with different types of playback device and bandwidth limitations.
				229	Furthermore, these services support dynamic switching between multiple
				230	streams, so that they can respond to changing network conditions.
				231
Exact rate control when encoding for a specific format (e.g. 360P or 1080P on
				233	YouTube) may not be critical, provided that the video bandwidth remains within
				234	allowed limits. Whilst a format may have a nominal target data rate, this can
				235	be considered more as the desired average egress rate over the video corpus
				236	rather than a strict requirement for any individual clip. Indeed, in order
				237	to maintain optimal quality of experience for the end user, it may be
				238	desirable to encode some easier videos or sections of video at a lower data
				239	rate and harder videos or sections at a higher rate.
				240
				241	VOD / streaming does not usually require very frequent key frames (as in the
				242	broadcast case) but key frames are important in trick play (scanning back and
				243	forth to different points in a video) and for adaptive stream switching. As
				244	such, in a use case like YouTube, there is normally an upper limit on the
				245	maximum time between key frames of a few seconds, but within certain limits
				246	the encoder can try to align key frames with real scene cuts.
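
The key frame placement rule described above can be sketched as follows. This
is a minimal illustration, not the libaom logic: the function name, the
minimum interval and the way a scene cut is signalled are all hypothetical.

```c
#include <assert.h>

// Decide whether the next frame should be coded as a key frame.
//   frames_since_key: frames coded since the last key frame
//   is_scene_cut:     nonzero if a (hypothetical) detector flagged a cut
//   min_interval:     shortest allowed distance between key frames
//   max_interval:     longest allowed distance between key frames
static int toy_is_key_frame(int frames_since_key, int is_scene_cut,
                            int min_interval, int max_interval) {
  assert(min_interval > 0 && min_interval <= max_interval);
  // The upper limit always wins, so seeking and stream switching stay
  // responsive.
  if (frames_since_key >= max_interval) return 1;
  // Within the limits, prefer to align key frames with real scene cuts.
  if (is_scene_cut && frames_since_key >= min_interval) return 1;
  return 0;
}
```

For example, with limits of 12 and 150 frames, a scene cut 60 frames after the
last key frame triggers a new key frame, whereas a cut after only 5 frames
does not.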
				247
				248	Whilst encoder speed may not seem to be as critical in this use case, for
				249	services such as YouTube, where millions of new videos have to be encoded
				250	every day, encoder speed is still important, so libaom allows command line
				251	control of the encode speed vs quality trade off.
				252
				253	<b>Fixed Quality / Testing Mode:</b> Libaom also has a fixed quality encoder
				254	pathway designed for testing under highly constrained conditions.
				255
				256	\section architecture_enc_speed_quality Speed vs Quality Trade Off
				257
				258	In any modern video encoder there are trade offs that can be made in regard to
				259	the amount of time spent encoding a video or video frame vs the quality of the
				260	final encode.
				261
				262	These trade offs typically limit the scope of the search for an optimal
				263	prediction / transform combination with faster encode modes doing fewer
				264	partition, reference frame, prediction mode and transform searches at the cost
				265	of some reduction in coding efficiency.
				266
				267	The pruning of the size of the search tree is typically based on assumptions
				268	about the likelihood of different search modes being selected based on what
				269	has gone before and features such as the dimensions of the video frames and
				270	the Q value selected for encoding the frame. For example certain intra modes
				271	are less likely to be chosen at high Q but may be more likely if similar
				272	modes were used for the previously coded blocks above and to the left of the
				273	current block.
				274
				275	The speed settings depend both on the use case (e.g. Real Time encoding) and
				276	an explicit speed control passed in on the command line as <b>--cpu-used</b>
				277	and stored in the \ref AV1_COMP.speed field of the main compressor instance
				278	data structure (<b>cpi</b>).
				279
The control flags for the speed trade off are stored in the \ref AV1_COMP.sf
field of the compressor instance and are set in the following functions:
				282
				283	- \ref av1_set_speed_features_framesize_independent()
				284	- \ref av1_set_speed_features_framesize_dependent()
				285	- \ref av1_set_speed_features_qindex_dependent()
				286
				287	A second factor impacting the speed of encode is rate distortion optimisation
				288	(<b>rd vs non-rd</b> encoding).
				289
				290	When rate distortion optimization is enabled each candidate combination of
				291	a prediction mode and transform coding strategy is fully encoded and the
				292	resulting error (or distortion) as compared to the original source and the
				293	number of bits used, are passed to a rate distortion function. This function
				294	converts the distortion and cost in bits to a single <b>RD</b> value (where
				295	lower is better). This <b>RD</b> value is used to decide between different
encoding strategies for the current block where, for example, one may
result in a lower distortion but a larger number of bits.
				298
				299	The calculation of this <b>RD</b> value is broadly speaking as follows:
				300
				301	\f[
RD = (\lambda \times Rate) + Distortion
				303	\f]
				304
				305	This assumes a linear relationship between the number of bits used and
				306	distortion (represented by the rate multiplier value <b>λ</b>) which is
				307	not actually valid across a broad range of rate and distortion values.
				308	Typically, where distortion is high, expending a small number of extra bits
				309	will result in a large change in distortion. However, at lower values of
				310	distortion the cost in bits of each incremental improvement is large.
				311
				312	To deal with this we scale the value of <b>λ</b> based on the quantizer
				313	value chosen for the frame. This is assumed to be a proxy for our approximate
				314	position on the true rate distortion curve and it is further assumed that over
				315	a limited range of distortion values, a linear relationship between distortion
				316	and rate is a valid approximation.
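
The effect of the rate multiplier on the decision can be seen in a small
sketch. The type and function names are hypothetical, the rate and distortion
units are arbitrary, and the way libaom actually derives the multiplier from
the quantizer is not modelled here.

```c
#include <assert.h>

// One candidate encoding of a block (hypothetical type).
typedef struct {
  double rate;        // cost in bits
  double distortion;  // error against the source, e.g. squared error
} ToyCandidate;

// RD = (lambda * Rate) + Distortion, where lower is better.
static double toy_rd_cost(double lambda, double rate, double distortion) {
  assert(lambda >= 0.0);
  return lambda * rate + distortion;
}

// Return the index of the candidate with the lowest RD value, or -1 if the
// list is empty.
static int toy_pick_best(const ToyCandidate *cands, int n, double lambda) {
  int best = -1;
  double best_rd = 0.0;
  for (int i = 0; i < n; ++i) {
    const double rd = toy_rd_cost(lambda, cands[i].rate, cands[i].distortion);
    if (best < 0 || rd < best_rd) {
      best = i;
      best_rd = rd;
    }
  }
  return best;
}
```

Given one candidate costing 100 bits with a distortion of 500 and another
costing 40 bits with a distortion of 900, a multiplier of 1 selects the first
(600 vs 940) while a multiplier of 10 selects the second (1500 vs 1300). A
larger multiplier therefore favours cheaper, more distorted encodings.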
				317
				318	Doing a rate distortion test on each candidate prediction / transform
				319	combination is expensive in terms of cpu cycles. Hence, for cases where encode
				320	speed is critical, libaom implements a non-rd pathway where the <b>RD</b>
				321	value is estimated based on the prediction error and quantizer setting.
				322
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	323	\section architecture_enc_src_proc Source Frame Processing
				324
				325	\subsection architecture_enc_frame_proc_data Main Data Structures
				326
				327	The following are the main data structures referenced in this section
				328	(see also \ref architecture_enc_data_structures):
				329
Tarundeep Singh	4593fcf	2021-03-31 00:53:31 +0530	[diff] [blame]	330	- \ref AV1_PRIMARY ppi (the primary compressor instance data structure)
Angie Chiang	29aaace	2021-11-15 16:23:42 -0800	[diff] [blame]	331	- \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)
Tarundeep Singh	4593fcf	2021-03-31 00:53:31 +0530	[diff] [blame]	332
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	333	- \ref AV1_COMP cpi (the main compressor instance data structure)
				334	- \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	335
				336	- \ref AV1EncoderConfig (Encoder configuration parameters)
				337	- \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
				338	- \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)
				339
				340	- \ref AlgoCfg (Algorithm related configuration parameters)
				341	- \ref AlgoCfg.arnr_max_frames
				342	- \ref AlgoCfg.arnr_strength
				343
				344	- \ref KeyFrameCfg (Keyframe coding configuration parameters)
				345	- \ref KeyFrameCfg.enable_keyframe_filtering
				346
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	347	\subsection architecture_enc_frame_proc_ingest Frame Ingest / Coding Pipeline
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	348
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	349	To encode a frame, first call \ref av1_receive_raw_frame() to obtain the raw
				350	frame data. Then call \ref av1_get_compressed_data() to encode raw frame data
				351	into compressed frame data. The main body of \ref av1_get_compressed_data()
				352	is \ref av1_encode_strategy(), which determines high-level encode strategy
				353	(frame type, frame placement, etc.) and then encodes the frame by calling
\ref av1_encode(). In \ref av1_encode(), \ref av1_first_pass() will execute
the first pass of two-pass encoding, while \ref encode_frame_to_data_rate()
				356	will perform the final pass for either one-pass or two-pass encoding.
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	357
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	358	The main body of \ref encode_frame_to_data_rate() is
				359	\ref encode_with_recode_loop_and_filter(), which handles encoding before
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	360	in-loop filters (with recode loops \ref encode_with_recode_loop(), or
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	361	without any recode loop \ref encode_without_recode()), followed by in-loop
				362	filters (deblocking filters \ref loopfilter_frame(), CDEF filters and
				363	restoration filters \ref cdef_restoration_frame()).
				364
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	365	Except for rate/quality control, both \ref encode_with_recode_loop() and
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	366	\ref encode_without_recode() call \ref av1_encode_frame() to manage the
				367	reference frame buffers and \ref encode_frame_internal() to perform the
				368	rest of encoding that does not require access to external frames.
				369	\ref encode_frame_internal() is the starting point for the partition search
				370	(see \ref architecture_enc_partitions).
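
The order in which these stages are reached can be summarised with a toy call
trace. Every function in the sketch is a stub that only records a name: the
stub signatures are invented for illustration, the exact nesting of calls in
libaom is not modelled, and only the ordering described above is reproduced.

```c
#include <assert.h>
#include <string.h>

// Toy call trace: stage names are appended in the order they are reached.
#define TOY_TRACE_MAX 512
static char toy_trace[TOY_TRACE_MAX];

static void toy_note(const char *name) {
  assert(strlen(toy_trace) + strlen(name) + 2 < TOY_TRACE_MAX);
  if (toy_trace[0] != 0) strcat(toy_trace, ">");
  strcat(toy_trace, name);
}

static void toy_encode_frame(void) {
  toy_note("av1_encode_frame");       // manages reference frame buffers
  toy_note("encode_frame_internal");  // start of the partition search
}

static void toy_encode_with_recode_loop_and_filter(int use_recode_loop) {
  toy_note("encode_with_recode_loop_and_filter");
  toy_note(use_recode_loop ? "encode_with_recode_loop"
                           : "encode_without_recode");
  toy_encode_frame();
  toy_note("loopfilter_frame");        // deblocking
  toy_note("cdef_restoration_frame");  // CDEF and restoration
}

static void toy_encode(int is_first_pass, int use_recode_loop) {
  toy_note("av1_encode");
  if (is_first_pass) {
    toy_note("av1_first_pass");
  } else {
    toy_note("encode_frame_to_data_rate");
    toy_encode_with_recode_loop_and_filter(use_recode_loop);
  }
}

// Walk one frame through the pipeline and return the recorded trace.
static const char *toy_process_frame(int is_first_pass, int use_recode_loop) {
  toy_trace[0] = 0;
  toy_note("av1_receive_raw_frame");
  toy_note("av1_get_compressed_data");
  toy_note("av1_encode_strategy");
  toy_encode(is_first_pass, use_recode_loop);
  return toy_trace;
}
```

Calling `toy_process_frame(1, 0)` gives the short first pass trace, while
`toy_process_frame(0, 1)` gives the final pass trace with a recode loop,
ending in the two in-loop filter stages.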
				371
				372	\subsection architecture_enc_frame_proc_tf Temporal Filtering
				373
				374	\subsubsection architecture_enc_frame_proc_tf_overview Overview
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	375
				376	Video codecs exploit the spatial and temporal correlations in video signals to
				377	achieve compression efficiency. The noise factor in the source signal
				378	attenuates such correlation and impedes the codec performance. Denoising the
				379	video signal is potentially a promising solution.
				380
				381	One strategy for denoising a source is motion compensated temporal filtering.
				382	Unlike image denoising, where only the spatial information is available,
				383	video denoising can leverage a combination of the spatial and temporal
				384	information. Specifically, in the temporal domain, similar pixels can often be
				385	tracked along the motion trajectory of moving objects. Motion estimation is
				386	applied to neighboring frames to find similar patches or blocks of pixels that
				387	can be combined to create a temporally filtered output.
				388
				389	AV1, in common with VP8 and VP9, uses an in-loop motion compensated temporal
				390	filter to generate what are referred to as alternate reference frames (or ARF
				391	frames). These can be encoded in the bitstream and stored as frame buffers for
				392	use in the prediction of subsequent frames, but are not usually directly
				393	displayed (hence they are sometimes referred to as non-display frames).
				394
				395	The following command line parameters set the strength of the filter, the
				396	number of frames used and determine whether filtering is allowed for key
				397	frames.
				398
				399	- <b>--arnr-strength</b> (\ref AlgoCfg.arnr_strength)
				400	- <b>--arnr-maxframes</b> (\ref AlgoCfg.arnr_max_frames)
				401	- <b>--enable-keyframe-filtering</b>
				402	(\ref KeyFrameCfg.enable_keyframe_filtering)
				403
				404	Note that in AV1, the temporal filtering scheme is designed around the
				405	hierarchical ARF based pyramid coding structure. We typically apply denoising
				406	only on key frame and ARF frames at the highest (and sometimes the second
				407	highest) layer in the hierarchical coding structure.
				408
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	409	\subsubsection architecture_enc_frame_proc_tf_algo Temporal Filtering Algorithm
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	410
				411	Our method divides the current frame into "MxM" blocks. For each block, a
				412	motion search is applied on frames before and after the current frame. Only
				413	the best matching patch with the smallest mean square error (MSE) is kept as a
				414	candidate patch for a neighbour frame. The current block is also a candidate
				415	patch. A total of N candidate patches are combined to generate the filtered
				416	output.
				417
				418	Let f(i) represent the filtered sample value and \f$p_{j}(i)\f$ the sample
				419	value of the j-th patch. The filtering process is:
				420
\f[
  f(i) = \frac{p_{0}(i) + \sum_{j=1}^{N} \omega_{j}(i) \cdot p_{j}(i)}
              {1 + \sum_{j=1}^{N} \omega_{j}(i)}
\f]

where \f$ \omega_{j}(i) \f$ is the weight of the j-th patch from a total of
N patches. The weight is determined by the patch difference as:

\f[
  \omega_{j}(i) = \exp\left(-\frac{D_{j}(i)}{h^2}\right)
\f]
				432
where \f$ D_{j}(i) \f$ is the sum of squared differences between the current
block and the j-th candidate patch:

\f[
  D_{j}(i) = \sum_{k \in \Omega_{i}} \|p_{0}(k) - p_{j}(k)\|_{2}^{2}
\f]
				439
				440	where:
				441	- \f$p_{0}\f$ refers to the current frame.
- \f$\Omega_{i}\f$ is the patch window, an "LxL" pixel square.
				443	- h is a critical parameter that controls the decay of the weights measured by
				444	the Euclidean distance. It is derived from an estimate of noise amplitude in
				445	the source. This allows the filter coefficients to adapt for videos with
				446	different noise characteristics.
				447	- Usually, M = 32, N = 7, and L = 5, but they can be adjusted.
				448
				449	It is recommended that the reader refers to the code for more details.
				450
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	451	\subsubsection architecture_enc_frame_proc_tf_funcs Temporal Filter Functions
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	452
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	453	The main entry point for temporal filtering is \ref av1_temporal_filter().
				454	This function returns 1 if temporal filtering is successful, otherwise 0.
				455	When temporal filtering is applied, the filtered frame will be held in
Angie Chiang	29aaace	2021-11-15 16:23:42 -0800	[diff] [blame]	456	the output_frame, which is the frame to be
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	457	encoded in the following encoding process.
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	458
				459	Almost all temporal filter related code is in av1/encoder/temporal_filter.c
				460	and av1/encoder/temporal_filter.h.
				461
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	462	Inside \ref av1_temporal_filter(), the reader's attention is directed to
				463	\ref tf_setup_filtering_buffer() and \ref tf_do_filtering().
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	464
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	465	- \ref tf_setup_filtering_buffer(): sets up the frame buffer for
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	466	temporal filtering, determines the number of frames to be used, and
				467	calculates the noise level of each frame.
				468
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	469	- \ref tf_do_filtering(): the main function for the temporal
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	470	filtering algorithm. It breaks each frame into "MxM" blocks. For each
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	471	block a motion search \ref tf_motion_search() is applied to find
				472	the motion vector from one neighboring frame. tf_build_predictor() is then
				473	called to build the matching patch and \ref av1_apply_temporal_filter_c() (see
				474	also optimised SIMD versions) to apply temporal filtering. The weighted
				475	average over each pixel is accumulated and finally normalized in
				476	\ref tf_normalize_filtered_frame() to generate the final filtered frame.
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	477
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	478	- \ref av1_apply_temporal_filter_c(): the core function of our temporal
				479	filtering algorithm (see also optimised SIMD versions).
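
The accumulate then normalize pattern described above can be sketched with
integer buffers as follows. The function names, buffer types and integer
weights are assumptions made for illustration and do not reflect the actual
signatures in av1/encoder/temporal_filter.c.

```c
#include <assert.h>
#include <stdint.h>

// Add one candidate patch into the running totals. accum holds the weighted
// sum of sample values and count holds the sum of weights, per pixel.
static void toy_accumulate(const uint8_t *patch, const uint32_t *weight,
                           int num_pixels, uint32_t *accum, uint32_t *count) {
  for (int i = 0; i < num_pixels; ++i) {
    accum[i] += weight[i] * patch[i];
    count[i] += weight[i];
  }
}

// Divide each weighted sum by its total weight, rounding to nearest, to
// produce the filtered samples.
static void toy_normalize(const uint32_t *accum, const uint32_t *count,
                          int num_pixels, uint8_t *out) {
  for (int i = 0; i < num_pixels; ++i) {
    assert(count[i] > 0);
    out[i] = (uint8_t)((accum[i] + count[i] / 2) / count[i]);
  }
}
```

Keeping the sum and the weight total separate means that patches can be added
one neighbouring frame at a time, with a single division per pixel at the end.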
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	480
				481	\subsection architecture_enc_frame_proc_film Film Grain Modelling
				482
				483	Add details here.
				484
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	485	\section architecture_enc_rate_ctrl Rate Control
				486
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	487	\subsection architecture_enc_rate_ctrl_data Main Data Structures
				488
				489	The following are the main data structures referenced in this section
				490	(see also \ref architecture_enc_data_structures):
				491
Mufaddal Chakera	358cf21	2021-02-25 14:41:56 +0530	[diff] [blame]	492	- \ref AV1_PRIMARY ppi (the primary compressor instance data structure)
				493	- \ref AV1_PRIMARY.twopass (\ref TWO_PASS)
				494
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	495	- \ref AV1_COMP cpi (the main compressor instance data structure)
				496	- \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
				497	- \ref AV1_COMP.rc (\ref RATE_CONTROL)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	498	- \ref AV1_COMP.sf (\ref SPEED_FEATURES)
				499
				500	- \ref AV1EncoderConfig (Encoder configuration parameters)
				501	- \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	502
				503	- \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first
				504	pass stats)
				505
				506	- \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
				507	- \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)
				508
				509	\subsection architecture_enc_rate_ctrl_options Supported Rate Control Options
				510
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	511	Different use cases (\ref architecture_enc_use_cases) may have different
				512	requirements in terms of data rate control.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	513
				514	The broad rate control strategy is selected using the <b>--end-usage</b>
				515	parameter on the command line, which maps onto the field
				516	\ref aom_codec_enc_cfg_t.rc_end_usage in \ref aom_encoder.h.
				517
				518	The four supported options are:-
				519
				520	- <b>VBR</b> (Variable Bitrate)
				521	- <b>CBR</b> (Constant Bitrate)
				522	- <b>CQ</b> (Constrained Quality mode; a constrained variant of VBR)
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	523	- <b>Fixed Q</b> (Constant quality or Q mode)
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	524
				525	The value of \ref aom_codec_enc_cfg_t.rc_end_usage is in turn copied over
				526	into the encoder rate control configuration data structure as
Paul Wilkins	1dd7a7e	2020-07-09 17:07:35 +0100	[diff] [blame]	527	\ref RateControlCfg.mode.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	528
				529	With regard to the most important use cases above, video on demand uses either
				530	VBR or CQ mode. CBR is the preferred rate control model for RTC and live
				531	streaming, and Fixed Q is only used in testing.
				532
				533	The behaviour of each of these modes is regulated by a series of secondary
				534	command line rate control options but also depends somewhat on the selected
				535	use case, whether 2-pass coding is enabled and the selected encode speed vs
				536	quality trade offs (\ref AV1_COMP.speed and \ref AV1_COMP.sf).
				537
				538	The list below gives the names of the main rate control command line
				539	options together with the names of the corresponding fields in the rate
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	540	control configuration data structures.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	541
Paul Wilkins	1dd7a7e	2020-07-09 17:07:35 +0100	[diff] [blame]	542	- <b>--target-bitrate</b> (\ref RateControlCfg.target_bandwidth)
				543	- <b>--min-q</b> (\ref RateControlCfg.best_allowed_q)
				544	- <b>--max-q</b> (\ref RateControlCfg.worst_allowed_q)
				545	- <b>--cq-level</b> (\ref RateControlCfg.cq_level)
				546	- <b>--undershoot-pct</b> (\ref RateControlCfg.under_shoot_pct)
				547	- <b>--overshoot-pct</b> (\ref RateControlCfg.over_shoot_pct)
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	548
Debargha Mukherjee	c6a8120	2020-07-22 16:35:20 -0700	[diff] [blame]	549	The following control aspects of VBR encoding:
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	550
Debargha Mukherjee	c6a8120	2020-07-22 16:35:20 -0700	[diff] [blame]	551	- <b>--bias-pct</b> (\ref RateControlCfg.vbrbias)
				552	- <b>--minsection-pct</b> (\ref RateControlCfg.vbrmin_section)
				553	- <b>--maxsection-pct</b> (\ref RateControlCfg.vbrmax_section)
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	554
				555	The following relate to buffer and delay management in one pass low delay and
				556	real time coding:
				557
Paul Wilkins	1dd7a7e	2020-07-09 17:07:35 +0100	[diff] [blame]	558	- <b>--buf-sz</b> (\ref RateControlCfg.maximum_buffer_size_ms)
				559	- <b>--buf-initial-sz</b> (\ref RateControlCfg.starting_buffer_level_ms)
				560	- <b>--buf-optimal-sz</b> (\ref RateControlCfg.optimal_buffer_level_ms)
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	561
				562	\subsection architecture_enc_vbr Variable Bitrate (VBR) Encoding
				563
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	564	For streamed VOD content the most common rate control strategy is Variable
				565	Bitrate (VBR) encoding. The CQ mode mentioned above is a variant of this
				566	where additional quantizer and quality constraints are applied. VBR
				567	encoding may in theory be used in conjunction with either 1-pass or 2-pass
				568	encoding.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	569
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	570	VBR encoding varies the number of bits given to each frame or group of frames
				571	according to the difficulty of that frame or group of frames, such that easier
				572	frames are allocated fewer bits and harder frames are allocated more bits. The
				573	intent here is to even out the quality between frames. This contrasts with
				574	Constant Bitrate (CBR) encoding where each frame is allocated the same number
				575	of bits.
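
The contrast between VBR and CBR allocation described above can be illustrated
with a minimal sketch. This is not libaom code: the function name
toy_allocate_vbr_bits() and the idea of a single per-frame complexity score are
assumptions made purely for illustration. The real allocation uses first pass
statistics and many additional heuristics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libaom function): split total_bits across n
 * frames in proportion to a per-frame complexity score. */
static void toy_allocate_vbr_bits(const double *complexity, size_t n,
                                  int64_t total_bits, int64_t *bits_out) {
  double sum = 0.0;
  assert(n > 0);
  for (size_t i = 0; i < n; ++i) sum += complexity[i];
  assert(sum > 0.0);
  for (size_t i = 0; i < n; ++i) {
    /* Harder frames (larger score) receive a larger share of the budget. */
    bits_out[i] = (int64_t)((double)total_bits * complexity[i] / sum);
  }
}
```

With equal complexity scores every frame receives the same number of bits,
which is the CBR-like behaviour described above.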
				576
				577	Whilst for any given frame or group of frames the data rate may vary, the VBR
				578	algorithm attempts to deliver a given average bitrate over a wider time
				579	interval. In standard VBR encoding, the time interval over which the data rate
				580	is averaged is usually the duration of the video clip. An alternative
				581	approach is to target an average VBR bitrate over the entire video corpus for
				582	a particular video format (corpus VBR).
				583
				584	\subsubsection architecture_enc_1pass_vbr 1 Pass VBR Encoding
				585
				586	The command line for libaom does allow 1 Pass VBR, but this has not been
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	587	properly optimised and behaves much like 1 pass CBR in most regards, with bits
				588	allocated to frames by the following functions:
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	589
				590	- \ref av1_calc_iframe_target_size_one_pass_vbr()
				591	- \ref av1_calc_pframe_target_size_one_pass_vbr()
				592
				593	\subsubsection architecture_enc_2pass_vbr 2 Pass VBR Encoding
				594
				595	The main focus here will be on 2-pass VBR encoding (and the related CQ mode)
				596	as these are the modes most commonly used for VOD content.
				597
				598	2-pass encoding is selected on the command line by setting --passes=2
				599	(or -p 2).
				600
				601	Generally speaking, in 2-pass encoding, an encoder will first encode a video
				602	using a default set of parameters and assumptions. Depending on the outcome
				603	of that first encode, the baseline assumptions and parameters will be adjusted
				604	to optimize the output during the second pass. In essence the first pass is a
				605	fact finding mission to establish the complexity and variability of the video,
				606	in order to allow a better allocation of bits in the second pass.
				607
				608	The libaom 2-pass algorithm is unusual in that the first pass is not a full
				609	encode of the video. Rather it uses a limited set of prediction and transform
				610	options and a fixed quantizer, to generate statistics about each frame. No
				611	output bitstream is created and the per frame first pass statistics are stored
				612	entirely in volatile memory. This has some disadvantages when compared to a
				613	full first pass encode, but avoids the need for file I/O and improves speed.
				614
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	615	For two pass encoding, the function \ref av1_encode() will first be called
				616	for each frame in the video with the value \ref AV1EncoderConfig.pass = 1.
				617	This will result in calls to \ref av1_first_pass().
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	618
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	619	Statistics for each frame are stored in \ref FIRSTPASS_STATS frame_stats_buf.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	620
				621	After completion of the first pass, \ref av1_encode() will be called again for
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	622	each frame with \ref AV1EncoderConfig.pass = 2. The frames are then encoded in
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	623	accordance with the statistics gathered during the first pass by calls to
Paul Wilkins	a0816fc	2020-07-23 13:33:29 +0100	[diff] [blame]	624	\ref encode_frame_to_data_rate() which in turn calls
				625	\ref av1_get_second_pass_params().
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	626
				627	In summary the second pass code :-
				628
				629	- Searches for scene cuts (if auto key frame detection is enabled).
				630	- Defines the length of and hierarchical structure to be used in each
				631	ARF/GF group.
				632	- Allocates bits based on the relative complexity of each frame, the quality
				633	of frame to frame prediction and the type of frame (e.g. key frame, ARF
				634	frame, golden frame or normal leaf frame).
				635	- Suggests a maximum Q (quantizer value) for each ARF/GF group, based on
				636	estimated complexity and recent rate control compliance
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	637	(\ref RATE_CONTROL.active_worst_quality).
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	638	- Tracks adherence to the overall rate control objectives and adjusts
				639	heuristics.
				640
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	641	The main two pass functions in regard to the above include:-
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	642
Paul Wilkins	be20bc2	2020-07-16 14:46:57 +0100	[diff] [blame]	643	- \ref find_next_key_frame()
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	644	- \ref define_gf_group()
Paul Wilkins	be20bc2	2020-07-16 14:46:57 +0100	[diff] [blame]	645	- \ref calculate_total_gf_group_bits()
				646	- \ref get_twopass_worst_quality()
				647	- \ref av1_gop_setup_structure()
				648	- \ref av1_gop_bit_allocation()
				649	- \ref av1_twopass_postencode_update()
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	650
				651	For each frame, the two pass algorithm defines a target number of bits
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	652	\ref RATE_CONTROL.base_frame_target, which is then adjusted if necessary to
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	653	reflect any undershoot or overshoot on previous frames to give
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	654	\ref RATE_CONTROL.this_frame_target.
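
As a rough illustration of the adjustment from a base target to a per-frame
target, the sketch below spreads an accumulated rate error over the remaining
frames and limits the correction to a percentage of the base target. The
function name, the parameters and the clamping rule are all assumptions made
for this example and do not reproduce the actual libaom logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libaom function).
 * bits_off_target > 0 means earlier frames undershot (bits are in credit);
 * bits_off_target < 0 means earlier frames overshot. */
static int toy_adjust_frame_target(int base_frame_target,
                                   int64_t bits_off_target, int frames_left,
                                   int max_adjust_pct) {
  assert(base_frame_target >= 0 && frames_left > 0 && max_adjust_pct >= 0);
  /* Spread the accumulated error evenly over the remaining frames. */
  int64_t delta = bits_off_target / frames_left;
  /* Limit the correction to a percentage of the base target. */
  const int64_t max_delta = (int64_t)base_frame_target * max_adjust_pct / 100;
  if (delta > max_delta) delta = max_delta;
  if (delta < -max_delta) delta = -max_delta;
  return (int)(base_frame_target + delta);
}
```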
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	655
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	656	As well as \ref RATE_CONTROL.active_worst_quality, the two pass code also
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	657	maintains a record of the actual Q value used to encode previous frames
				658	at each level in the current pyramid hierarchy
Aasaipriya	c6f0a0b	2021-08-12 11:27:03 +0530	[diff] [blame]	659	(\ref PRIMARY_RATE_CONTROL.active_best_quality). The function
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	660	\ref rc_pick_q_and_bounds() uses these values to set a permitted Q range
				661	for each frame.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	662
				663	\subsubsection architecture_enc_1pass_lagged 1 Pass Lagged VBR Encoding
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	664
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	665	1 pass lagged encode falls between simple 1 pass encoding and full two pass
				666	encoding and is used for cases where it is not possible to do a full first
				667	pass through the entire video clip, but where some delay is permissible. For
				668	example near live streaming where there is a delay of up to a few seconds. In
				669	this case the first pass and second pass are in effect combined such that the
				670	first pass starts encoding the clip and the second pass lags behind it by a
				671	few frames. When using this method, full sequence level statistics are not
				672	available, but it is possible to collect and use frame or group of frame level
				673	data to help in the allocation of bits and in defining ARF/GF coding
Tarundeep Singh	5e5305a	2021-03-16 13:04:04 +0530	[diff] [blame]	674	hierarchies. The reader is referred to the \ref AV1_PRIMARY.lap_enabled field
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	675	in the main compressor instance (where <b>lap</b> stands for
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	676	<b>look ahead processing</b>). This encoding mode for the most part uses the
				677	same rate control pathways as two pass VBR encoding.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	678
				679	\subsection architecture_enc_rc_loop The Main Rate Control Loop
				680
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	681	Having established a target rate for a given frame and an allowed range of Q
				682	values, the encoder then tries to encode the frame at a rate that is as close
				683	as possible to the target value, given the Q range constraints.
				684
				685	There are two main mechanisms by which this is achieved.
				686
				687	The first selects a frame level Q, using an adaptive estimate of the number of
				688	bits that will be generated when the frame is encoded at any given Q.
				689	Fundamentally this mechanism is common to VBR, CBR and to use cases such as
				690	RTC with small adjustments.
				691
				692	As the Q value mainly adjusts the precision of the residual signal, it is not
				693	actually a reliable basis for accurately predicting the number of bits that
				694	will be generated across all clips. A well predicted clip, for example, may
				695	have a much smaller error residual after prediction. The algorithm copes with
				696	this by adapting its predictions on the fly using a feedback loop based on how
				697	well it did the previous time around.
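
The feedback loop described above can be sketched as follows. This is a toy
model, not the libaom implementation: the type ToyRateModel, the constant
TOY_BITS_AT_Q1, the assumption that bits are inversely proportional to Q and
the damping factor of 0.5 are all hypothetical choices made for illustration.

```c
#include <assert.h>

/* Hypothetical constant: bits the toy model predicts at q = 1. */
#define TOY_BITS_AT_Q1 100000.0

typedef struct {
  double correction_factor; /* adapted after every encoded frame */
} ToyRateModel;

/* Toy prediction: bits are assumed to be inversely proportional to q. */
static double toy_predict_bits(const ToyRateModel *m, double q) {
  assert(q > 0.0);
  return m->correction_factor * TOY_BITS_AT_Q1 / q;
}

/* Move the correction factor half way towards the observed ratio between
 * actual and predicted bits, then clamp it to a sane range. */
static void toy_update_correction(ToyRateModel *m, double predicted_bits,
                                  double actual_bits) {
  assert(predicted_bits > 0.0 && actual_bits >= 0.0);
  const double ratio = actual_bits / predicted_bits;
  m->correction_factor *= 1.0 + 0.5 * (ratio - 1.0);
  if (m->correction_factor < 0.01) m->correction_factor = 0.01;
  if (m->correction_factor > 100.0) m->correction_factor = 100.0;
}
```

If a well predicted clip produces fewer bits than expected, the factor shrinks
and the next prediction at the same Q is lower, so a lower Q can be chosen for
the same target rate.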
				698
				699	The main functions responsible for the prediction of Q and the adaptation over
				700	time, for the two pass encoding pipeline are:
				701
				702	- \ref rc_pick_q_and_bounds()
Paul Wilkins	5ce9d50	2020-07-16 17:58:40 +0100	[diff] [blame]	703	- \ref get_q()
				704	- \ref av1_rc_regulate_q()
				705	- \ref get_rate_correction_factor()
				706	- \ref set_rate_correction_factor()
				707	- \ref find_closest_qindex_by_rate()
Paul Wilkins	be20bc2	2020-07-16 14:46:57 +0100	[diff] [blame]	708	- \ref av1_twopass_postencode_update()
Paul Wilkins	5ce9d50	2020-07-16 17:58:40 +0100	[diff] [blame]	709	- \ref av1_rc_update_rate_correction_factors()
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	710
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	711	A second mechanism for control comes into play if there is a large rate miss
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	712	for the current frame (much too big or too small). This is a recode mechanism
				713	which allows the current frame to be re-encoded one or more times with a
				714	revised Q value. This obviously has significant implications for encode speed
				715	and, in the case of RTC, latency (hence it is not used for the RTC pathway).
				716
				717	Whether or not a recode is allowed for a given frame depends on the selected
				718	encode speed vs quality trade off. This is set on the command line using the
				719	--cpu-used parameter which maps onto the \ref AV1_COMP.speed field in the main
				720	compressor instance data structure.
				721
				722	The value of \ref AV1_COMP.speed, combined with the use case, is used to
				723	populate the speed features data structure \ref AV1_COMP.sf. In particular
				724	\ref HIGH_LEVEL_SPEED_FEATURES.recode_loop determines the types of frames that
				725	may be recoded and \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance is a rate
				726	error trigger threshold.
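
A minimal sketch of a rate error trigger of this kind is given below. It is
an assumption-laden illustration rather than the logic of
recode_loop_test(): the function name, the symmetric percentage tolerance
and the Q range checks are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libaom function): returns 1 if the projected
 * frame size misses the target by more than tolerance_pct and there is still
 * room to move q in the direction that would help. */
static int toy_recode_needed(int64_t projected_bits, int64_t target_bits,
                             int tolerance_pct, int q, int min_q, int max_q) {
  assert(target_bits > 0 && tolerance_pct >= 0 && min_q <= max_q);
  const int64_t slack = target_bits * tolerance_pct / 100;
  /* Frame too big: only worth recoding if q can still be raised. */
  if (projected_bits > target_bits + slack && q < max_q) return 1;
  /* Frame too small: only worth recoding if q can still be lowered. */
  if (projected_bits < target_bits - slack && q > min_q) return 1;
  return 0;
}
```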
				727
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	728	For more information the reader is directed to the following functions:
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	729
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	730	- \ref encode_with_recode_loop()
Paul Wilkins	c8d3f11	2020-07-08 17:58:14 +0100	[diff] [blame]	731	- \ref encode_without_recode()
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	732	- \ref recode_loop_update_q()
				733	- \ref recode_loop_test()
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	734	- \ref av1_set_speed_features_framesize_independent()
				735	- \ref av1_set_speed_features_framesize_dependent()
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	736
				737	\subsection architecture_enc_fixed_q Fixed Q Mode
				738
Paul Wilkins	ea2876f	2020-07-13 18:36:09 +0100	[diff] [blame]	739	There are two main fixed Q cases:
				740	-# Fixed Q with adaptive qp offsets: same qp offset for each pyramid level
				741	in a given video, but these offsets are adaptive based on video content.
				742	-# Fixed Q with fixed qp offsets: content-independent fixed qp offsets for
Jingning Han	4eed226	2021-09-08 15:48:50 -0700	[diff] [blame]	743	each pyramid level.
Paul Wilkins	ea2876f	2020-07-13 18:36:09 +0100	[diff] [blame]	744
				745	The reader is also referred to the following functions:
				746	- \ref av1_rc_pick_q_and_bounds()
				747	- \ref rc_pick_q_and_bounds_no_stats_cbr()
				748	- \ref rc_pick_q_and_bounds_no_stats()
				749	- \ref rc_pick_q_and_bounds()
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	750
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	751	\section architecture_enc_frame_groups GF / ARF Frame Groups & Hierarchical Coding
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	752
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	753	\subsection architecture_enc_frame_groups_data Main Data Structures
				754
				755	The following are the main data structures referenced in this section
				756	(see also \ref architecture_enc_data_structures):
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	757
				758	- \ref AV1_COMP cpi (the main compressor instance data structure)
				759	- \ref AV1_COMP.rc (\ref RATE_CONTROL)
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	760
				761	- \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first pass
				762	stats)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	763
				764	\subsection architecture_enc_frame_groups_groups Frame Groups
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	765
				766	To process a sequence/stream of video frames, the encoder divides the frames
				767	into groups and encodes them sequentially (possibly dependent on previous
				768	groups). In AV1 such a group is usually referred to as a golden frame group
				769	(GF group) or sometimes an Alt-Ref (ARF) group or a group of pictures (GOP).
				770	A GF group determines and stores the coding structure of the frames (for
				771	example, frame type, usage of the hierarchical structure, usage of overlay
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	772	frames, etc.) and can be considered as the base unit to process the frames,
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	773	therefore playing an important role in the encoder.
				774
				775	The length of a specific GF group is arguably the most important aspect when
				776	determining a GF group. This is because most GF group level decisions are
				777	based on the frame characteristics, if not on the length itself directly.
				778	Note that the GF group is always a group of consecutive frames, which means
				779	the start and end of the group (so again, the length of it) determines which
				780	frames are included in it and hence determines the characteristics of the GF
				781	group. Therefore, in this document we will first discuss the GF group length
				782	decision in libaom, followed by frame structure decisions when defining a GF
				783	group with a certain length.
				784
				785	\subsection architecture_enc_gf_length GF / ARF Group Length Determination
				786
				787	The basic intuition of determining the GF group length is that it is usually
				788	desirable to group together frames that are similar. Hence, we may choose
				789	longer groups when consecutive frames are very alike and shorter ones when
				790	they are very different.
				791
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	792	The determination of the GF group length is done in function \ref
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	793	calculate_gf_length(). The following encoder use cases are supported:
				794
				795	<ul>
Paul Wilkins	ff98f3e	2020-07-27 16:01:05 +0100	[diff] [blame]	796	<li><b>Single pass with look-ahead disabled (\ref has_no_stats_stage()):
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	797	</b> in this case there is no information available on the following stream
				798	of frames, therefore the function will set the GF group length for the
				799	current and the following GF groups (a total number of MAX_NUM_GF_INTERVALS
				800	groups) to be the maximum value allowed.</li>
				801
Tarundeep Singh	5e5305a	2021-03-16 13:04:04 +0530	[diff] [blame]	802	<li><b>Single pass with look-ahead enabled (\ref AV1_PRIMARY.lap_enabled):</b>
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	803	look-ahead processing is enabled for single pass, therefore there is a
				804	limited amount of information available regarding future frames. In this
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	805	case the function will determine the length based on \ref FIRSTPASS_STATS
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	806	(which is generated when processing the look-ahead buffer) for only the
				807	current GF group.</li>
				808
				809	<li><b>Two pass:</b> the first pass in two-pass encoding collects the stats
				810	and will not call the function. In the second pass, the function tries to
				811	determine the GF group length of the current and the following GF groups (a
				812	total number of MAX_NUM_GF_INTERVALS groups) based on the first-pass
				813	statistics. Note that, as discussed later, such decisions may not
				814	be accurate and can be changed.</li>
				815	</ul>
				816
				817	Except for the first trivial case where there is no prior knowledge of the
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	818	following frames, the function \ref calculate_gf_length() tries to determine the
				819	GF group length based on the first pass statistics. The determination is divided
				820	into two parts:
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	821
				822	<ol>
				823	<li>Baseline decision based on accumulated statistics: this part of the function
				824	iterates through the first-pass statistics of the following frames and
				825	accumulates the statistics with the function accumulate_next_frame_stats().
				826	The accumulated statistics are then used to determine whether the
				827	correlation in the GF group has dropped too much in detect_gf_cut().
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	828	If detect_gf_cut() returns non-zero, or if we have reached the end of the
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	829	first-pass statistics, the baseline decision is set at the current point.</li>
				830
				831	<li>If we are not at the end of the first-pass statistics, the next part will
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	832	try to refine the baseline decision. This algorithm is based on the analysis
				833	of first-pass statistics. It tries to cut the groups in stable regions or
				834	relatively stable points. Also it tries to avoid cutting in a blending
				835	region.</li>
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	836	</ol>
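
The baseline decision in the first step can be sketched roughly as follows.
This is a simplified illustration under stated assumptions, not the code of
calculate_gf_length(): the function name, the use of a single per-frame
correlation value and the multiplicative accumulation are all hypothetical
stand-ins for the richer first pass statistics used by the encoder.

```c
#include <assert.h>

/* Hypothetical helper (not a libaom function).
 * frame_corr[i] is an assumed correlation in (0, 1] between frame i and the
 * frame before it. The group is extended until the accumulated correlation
 * drops below min_accumulated_corr, the maximum length is reached, or the
 * available statistics run out. */
static int toy_baseline_gf_length(const double *frame_corr, int num_frames,
                                  int min_len, int max_len,
                                  double min_accumulated_corr) {
  assert(num_frames >= 0 && min_len >= 0 && max_len >= min_len);
  double accumulated = 1.0;
  int len = 0;
  while (len < num_frames && len < max_len) {
    accumulated *= frame_corr[len];
    /* Cut here if correlation with the start of the group is too low. */
    if (len >= min_len && accumulated < min_accumulated_corr) break;
    ++len;
  }
  return len;
}
```

The second, refinement step is not modelled here.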
				837
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	838	As mentioned, for two-pass encoding, the function \ref
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	839	calculate_gf_length() tries to determine the length of as many as
				840	MAX_NUM_GF_INTERVALS groups. The decisions are stored in
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	841	\ref PRIMARY_RATE_CONTROL.gf_intervals[]. The variables
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	842	\ref RATE_CONTROL.intervals_till_gf_calculate_due and
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	843	\ref PRIMARY_RATE_CONTROL.cur_gf_index help with managing and updating the stored
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	844	decisions. In the function \ref define_gf_group(), the corresponding
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	845	stored length decision will be used to define the current GF group.
				846
				847	When the maximum GF group length is greater than or equal to 32, the encoder
				848	adds an extra step to determine whether to use a maximum GF length of 32
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	849	or 16 for every GF group. In such a case, \ref calculate_gf_length() is
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	850	first called with the original maximum length (>=32). Afterwards,
Paul Wilkins	ff98f3e	2020-07-27 16:01:05 +0100	[diff] [blame]	851	\ref av1_tpl_setup_stats() is called to analyze the determined GF group
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	852	and compare the reference to the last frame and the middle frame. If it is
				853	decided that we should use a maximum GF length of 16, the function
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	854	\ref calculate_gf_length() is called again with the updated maximum
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	855	length, and it only sets the length for a single GF group
				856	(\ref RATE_CONTROL.intervals_till_gf_calculate_due is set to 1). This process
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	857	is shown below.
				858
				859	\image html tplgfgroupdiagram.png "" width=40%
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	860
				861	Before encoding each frame, the encoder checks
				862	\ref RATE_CONTROL.frames_till_gf_update_due. If it is zero, indicating
				863	processing of the current GF group is done, the encoder will check whether
				864	\ref RATE_CONTROL.intervals_till_gf_calculate_due is zero. If it is, as
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	865	discussed above, \ref calculate_gf_length() is called with original
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	866	maximum length. If it is not zero, then the GF group length value stored
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	867	in \ref PRIMARY_RATE_CONTROL.gf_intervals[\ref PRIMARY_RATE_CONTROL.cur_gf_index] is used
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	868	(subject to change as discussed above).
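
The per-frame bookkeeping described above can be sketched as a small state
machine. The struct ToyRc only mirrors the field names mentioned in the text.
The helper functions, the constant TOY_MAX_NUM_GF_INTERVALS and the fixed
group length passed in are assumptions for illustration, and the 32 versus 16
re-evaluation is deliberately left out.

```c
#include <assert.h>

enum { TOY_MAX_NUM_GF_INTERVALS = 4 }; /* hypothetical value */

typedef struct {
  int frames_till_gf_update_due;
  int intervals_till_gf_calculate_due;
  int cur_gf_index;
  int gf_intervals[TOY_MAX_NUM_GF_INTERVALS];
} ToyRc;

/* Stand-in for calculate_gf_length(): stores the same length for every
 * upcoming group instead of analysing first pass statistics. */
static void toy_calculate_gf_length(ToyRc *rc, int length) {
  assert(length > 0);
  for (int i = 0; i < TOY_MAX_NUM_GF_INTERVALS; ++i)
    rc->gf_intervals[i] = length;
  rc->cur_gf_index = 0;
  rc->intervals_till_gf_calculate_due = TOY_MAX_NUM_GF_INTERVALS;
}

/* Called before each frame. Returns 1 if a new GF group starts here. */
static int toy_before_frame(ToyRc *rc, int fresh_length) {
  int started = 0;
  if (rc->frames_till_gf_update_due == 0) {
    /* Current group is finished: recalculate lengths only if none are stored. */
    if (rc->intervals_till_gf_calculate_due == 0)
      toy_calculate_gf_length(rc, fresh_length);
    rc->frames_till_gf_update_due = rc->gf_intervals[rc->cur_gf_index];
    rc->cur_gf_index++;
    rc->intervals_till_gf_calculate_due--;
    started = 1;
  }
  rc->frames_till_gf_update_due--;
  return started;
}
```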
				869
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	870	\subsection architecture_enc_gf_structure Defining a GF Group's Structure
				871
				872	The function \ref define_gf_group() defines the frame structure as well
				873	as other GF group level parameters (e.g. bit allocation) once the length of
				874	the current GF group is determined.
				875
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	876	The function first iterates through the first pass statistics in the GF group to
				877	accumulate various stats, using accumulate_this_frame_stats() and
				878	accumulate_next_frame_stats(). The accumulated statistics are then used to
				879	determine the use of an ALTREF frame along with other properties of the
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	880	GF group. The values of \ref PRIMARY_RATE_CONTROL.cur_gf_index, \ref
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	881	RATE_CONTROL.intervals_till_gf_calculate_due and \ref
				882	RATE_CONTROL.frames_till_gf_update_due are also updated accordingly.
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	883
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	884	The function \ref av1_gop_setup_structure() is called at the end to determine
				885	the frame layers and reference maps in the GF group, where the
				886	construct_multi_layer_gf_structure() function sets the frame update types for
				887	each frame and the group structure.
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	888
				889	- If ALTREF frames are allowed for the GF group: the first frame is set to
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	890	KF_UPDATE, GF_UPDATE or ARF_UPDATE. The last frame of the GF group is set to
				891	OVERLAY_UPDATE. Then in set_multi_layer_params(), frame update types are
				892	determined recursively in a binary tree fashion, and assigned to give the
				893	final IBBB structure for the group.
				894	  - If the current branch has more than 2 frames and we have not reached
				895	    maximum layer depth, then the middle frame is set as INTNL_ARF_UPDATE,
				896	    and the left and right branches are processed recursively.
				897	  - If the current branch has fewer than 3 frames, or we have reached
				898	    maximum layer depth, then every frame in the branch is set to LF_UPDATE.
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	899
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	900	- If ALTREF frames are not allowed for the GF group: the frames are set
				901	as LF_UPDATE. This basically forms an IPPP GF group structure.
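
The recursive, binary tree style assignment for the ALTREF case can be
sketched as below. This is an illustrative reduction, not the code of
set_multi_layer_params(): the enum, the function name and the choice of
middle frame are assumptions, and the sketch labels frames in display order
only, ignoring coding order, overlays and reference maps.

```c
#include <assert.h>

/* Hypothetical stand-ins for the frame update types named in the text. */
typedef enum { TOY_LF_UPDATE, TOY_INTNL_ARF_UPDATE } ToyUpdateType;

/* Assign update types to the frames in the half-open range [start, end). */
static void toy_set_multi_layer(ToyUpdateType *types, int start, int end,
                                int depth, int max_depth) {
  const int num_frames = end - start;
  assert(depth >= 0 && max_depth >= 0);
  if (num_frames <= 0) return;
  if (num_frames < 3 || depth >= max_depth) {
    /* Small branch or maximum depth reached: all frames are leaf frames. */
    for (int i = start; i < end; ++i) types[i] = TOY_LF_UPDATE;
    return;
  }
  /* The middle frame becomes an internal ARF; recurse on both halves. */
  const int mid = start + num_frames / 2;
  types[mid] = TOY_INTNL_ARF_UPDATE;
  toy_set_multi_layer(types, start, mid, depth + 1, max_depth);
  toy_set_multi_layer(types, mid + 1, end, depth + 1, max_depth);
}
```

For a branch of 7 frames with a generous depth limit this gives alternating
leaf and internal ARF frames. With a depth limit of 1 only the single middle
frame is promoted.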
				902
				903	As mentioned, the encoder may use temporal dependency modelling (TPL - see \ref
				904	architecture_enc_tpl) to determine whether we should use a maximum length of 32
				905	or 16 for the current GF group. This requires calls to \ref define_gf_group()
				906	but should not change other settings (since it is in essence a trial). This
				907	special case is indicated by setting the parameter <b>is_final_pass</b> to
				908	zero.
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	909
				910	For single pass encodes where look-ahead processing is disabled
Tarundeep Singh	5e5305a	2021-03-16 13:04:04 +0530	[diff] [blame]	911	(\ref AV1_PRIMARY.lap_enabled = 0), \ref define_gf_group_pass0() is used
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	912	instead of \ref define_gf_group().
				913
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	914	\subsection architecture_enc_kf_groups Key Frame Groups
				915
				916	A special constraint for GF group length is the location of the next keyframe
				917	(KF). The frames between two KFs are referred to as a KF group. Each KF group
				918	can be encoded and decoded independently. Because of this, a GF group cannot
				919	span beyond a KF and the location of the next KF is set as a hard boundary
				920	for GF group length.
				921
				922	<ul>
				923	<li>For two-pass encoding \ref RATE_CONTROL.frames_to_key controls when to
				924	encode a key frame. When it is zero, the current frame is a keyframe and
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	925	the function \ref find_next_key_frame() is called. This in turn calls
				926	\ref define_kf_interval() to work out where the next key frame should
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	927	be placed.</li>
				928
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	929	<li>For single-pass with look-ahead enabled, \ref define_kf_interval()
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	930	is called whenever a GF group update is needed (when
				931	\ref RATE_CONTROL.frames_till_gf_update_due is zero). This is because
				932	generally KFs are more widely spaced and the look-ahead buffer is usually
				933	not long enough.</li>
				934
				935	<li>For single-pass with look-ahead disabled, the KFs are placed according
				936	to the command line parameter <b>--kf-max-dist</b> (The above two cases are
				937	also subject to this constraint).</li>
				938	</ul>
				939
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	940	The function \ref define_kf_interval() tries to detect a scenecut.
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	941	If a scenecut within kf-max-dist is detected, then it is set as the next
				942	keyframe. Otherwise the given maximum value is used.
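
The selection rule above can be sketched as follows. This is only an
illustration under simplifying assumptions: the helper name
`pick_next_kf_offset` and the per-frame scenecut flag array are hypothetical,
and the real \ref define_kf_interval() considers considerably more state.

```c
#include <stddef.h>

// Hypothetical helper: returns the distance (in frames) from the current
// key frame to the next one. is_scenecut[i] is non-zero if the frame that is
// i + 1 frames after the current key frame was flagged as a scenecut.
static int pick_next_kf_offset(const int *is_scenecut, size_t num_frames,
                               int kf_max_dist) {
  for (size_t i = 0; i < num_frames && (int)(i + 1) <= kf_max_dist; ++i) {
    if (is_scenecut[i]) return (int)(i + 1);  // Scenecut within the limit.
  }
  return kf_max_dist;  // No scenecut found: fall back to the maximum.
}
```

With flags `{0, 0, 1, 0, 1}` and a maximum distance of 10, the sketch returns
3; with a maximum distance of 2 it returns 2.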
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	943
				944	\section architecture_enc_tpl Temporal Dependency Modelling
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	945
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	946	The temporal dependency model runs at the beginning of each GOP. It builds the
				947	motion trajectory within the GOP in units of 16x16 blocks. The temporal
				948	dependency of a 16x16 block is evaluated as the predictive coding gains it
				949	contributes to its trailing motion trajectory. This temporal dependency model
				950	reflects how important a coding block is for the coding efficiency of the
				951	overall GOP. It is hence used to scale the Lagrangian multiplier used in the
				952	rate-distortion optimization framework.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	953
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	954	\subsection architecture_enc_tpl_config Configurations
				955
The temporal dependency model and its applications are turned on by default in
the libaom encoder for the VoD use case. To disable it, use --tpl-model=0 in
the aomenc configuration.
				959
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	960	\subsection architecture_enc_tpl_algoritms Algorithms
				961
				962	The scheme works in the reverse frame processing order over the source frames,
				963	propagating information from future frames back to the current frame. For each
frame, a propagation step is run for each MB. It operates as follows:
				965
				966	<ul>
				967	<li> Estimate the intra prediction cost in terms of sum of absolute Hadamard
				968	transform difference (SATD) noted as intra_cost. It also loads the motion
				969	information available from the first-pass encode and estimates the inter
				970	prediction cost as inter_cost. Due to the use of hybrid inter/intra
				971	prediction mode, the inter_cost value is further upper bounded by
				972	intra_cost. A propagation cost variable is used to collect all the
				973	information flowed back from future processing frames. It is initialized as
				974	0 for all the blocks in the last processing frame in a group of pictures
				975	(GOP).</li>
				976
				977	<li> The fraction of information from a current block to be propagated towards
				978	its reference block is estimated as:
				979	\f[
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	980	propagation\_fraction = (1 - inter\_cost/intra\_cost)
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	981	\f]
				982	It reflects how much the motion compensated reference would reduce the
				983	prediction error in percentage.</li>
				984
				985	<li> The total amount of information the current block contributes to the GOP
				986	is estimated as intra_cost + propagation_cost. The information that it
				987	propagates towards its reference block is captured by:
				988
				989	\f[
				990	propagation\_amount =
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	991	(intra\_cost + propagation\_cost) * propagation\_fraction
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	992	\f]</li>
				993
				994	<li> Note that the reference block may not necessarily sit on the grid of
				995	16x16 blocks. The propagation amount is hence dispensed to all the blocks
				996	that overlap with the reference block. The corresponding block in the
				997	reference frame accumulates its own propagation cost as it receives back
				998	propagation.
				999
				1000	\f[
				1001	propagation\_cost = propagation\_cost +
(\frac{overlap\_area}{(16*16)} * propagation\_amount)
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	1003	\f]</li>
				1004
				1005	<li> In the final encoding stage, the distortion propagation factor of a block
				1006	is evaluated as \f$(1 + \frac{propagation\_cost}{intra\_cost})\f$, where the second term
				1007	captures its impact on later frames in a GOP.</li>
				1008
				1009	<li> The Lagrangian multiplier is adapted at the 64x64 block level. For every
				1010	64x64 block in a frame, we have a distortion propagation factor:
				1011
				1012	\f[
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	1013	dist\_prop[i] = 1 + \frac{propagation\_cost[i]}{intra\_cost[i]}
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	1014	\f]
				1015
				1016	where i denotes the block index in the frame. We also have the frame level
				1017	distortion propagation factor:
				1018
				1019	\f[
				1020	dist\_prop = 1 +
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	1021	\frac{\sum_{i}propagation\_cost[i]}{\sum_{i}intra\_cost[i]}
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	1022	\f]
				1023
				1024	which is used to normalize the propagation factor at the 64x64 block level. The
				1025	Lagrangian multiplier is hence adapted as:
				1026
				1027	\f[
\lambda[i] = \lambda_0 * \frac{dist\_prop}{dist\_prop[i]}
\f]

where \f$\lambda_0\f$ is the multiplier associated with the frame level QP. The
64x64 block level QP is scaled according to the Lagrangian multiplier.</li>
				1033	</ul>
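
The propagation and scaling steps listed above can be sketched in C as
follows. This is a minimal illustration, not the libaom implementation: the
struct and function names are hypothetical, costs are held as doubles for
readability, and intra_cost is assumed to be greater than zero.

```c
#include <stddef.h>

#define TPL_BLOCK_AREA (16 * 16)  // The model works on 16x16 blocks.

// Hypothetical per-block statistics; not the libaom data layout.
typedef struct {
  double intra_cost;        // SATD cost of the intra predictor.
  double inter_cost;        // SATD cost of the inter predictor.
  double propagation_cost;  // Information flowed back from future frames.
} TplBlockSketch;

// Amount of information a block propagates towards its reference block.
static double tpl_propagation_amount(const TplBlockSketch *blk) {
  // inter_cost is upper bounded by intra_cost.
  const double inter =
      blk->inter_cost < blk->intra_cost ? blk->inter_cost : blk->intra_cost;
  const double fraction = 1.0 - inter / blk->intra_cost;
  return (blk->intra_cost + blk->propagation_cost) * fraction;
}

// Credits one overlapped block in the reference frame with its share.
static void tpl_accumulate(TplBlockSketch *ref_blk, int overlap_area,
                           double propagation_amount) {
  ref_blk->propagation_cost +=
      ((double)overlap_area / TPL_BLOCK_AREA) * propagation_amount;
}

// Scales the frame level multiplier lambda0 for block i, using the ratio of
// the frame level to the block level distortion propagation factor.
static double tpl_block_lambda(double lambda0, const double *propagation_cost,
                               const double *intra_cost, size_t num_blocks,
                               size_t i) {
  double sum_prop = 0.0;
  double sum_intra = 0.0;
  for (size_t k = 0; k < num_blocks; ++k) {
    sum_prop += propagation_cost[k];
    sum_intra += intra_cost[k];
  }
  const double frame_dist_prop = 1.0 + sum_prop / sum_intra;
  const double block_dist_prop = 1.0 + propagation_cost[i] / intra_cost[i];
  return lambda0 * frame_dist_prop / block_dist_prop;
}
```

A block that many later frames depend on has a large block level factor and
so receives a smaller multiplier, which biases the rate-distortion decision
towards lower distortion for that block.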
				1034
Paul Wilkins	ff98f3e	2020-07-27 16:01:05 +0100	[diff] [blame]	1035	\subsection architecture_enc_tpl_keyfun Key Functions and data structures
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	1036
The reader is also referred to the following functions and data structures:
				1038
				1039	- \ref TplParams
				1040	- \ref av1_tpl_setup_stats() builds the TPL model.
- \ref setup_delta_q() assigns different quantization parameters to each
superblock based on its TPL weight.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1043
				1044	\section architecture_enc_partitions Block Partition Search
				1045
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	1046	A frame is first split into tiles in \ref encode_tiles(), with each tile
				1047	compressed by av1_encode_tile(). Then a tile is processed in superblock rows
				1048	via \ref av1_encode_sb_row() and then \ref encode_sb_row().
				1049
				1050	The partition search processes superblocks sequentially in \ref
				1051	encode_sb_row(). Two search modes are supported, depending upon the encoding
configuration: \ref encode_nonrd_sb() is for 1-pass and real-time modes,
				1053	while \ref encode_rd_sb() performs more exhaustive rate distortion based
				1054	searches.
				1055
				1056	Partition search over the recursive quad-tree space is implemented by
				1057	recursive calls to \ref av1_nonrd_use_partition(),
				1058	\ref av1_rd_use_partition(), or av1_rd_pick_partition() and returning best
				1059	options for sub-trees to their parent partitions.
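
The recursive structure described above can be illustrated with a reduced
sketch that only compares "no split" against a four-way split. It ignores
the rectangular partition types that AV1 also supports, and every name in it
(`BlockCostFn`, `search_partition`, `toy_block_cost`) is hypothetical rather
than taken from libaom.

```c
// Hypothetical callback: RD cost of coding the square block at (row, col)
// with the given size as a single, unsplit partition.
typedef double (*BlockCostFn)(int row, int col, int size);

// Returns the best RD cost for a block by comparing "no split" against a
// recursive four-way split. Each sub-tree reports its best cost to its parent.
static double search_partition(int row, int col, int size, int min_size,
                               BlockCostFn cost_fn) {
  const double none_cost = cost_fn(row, col, size);
  if (size <= min_size) return none_cost;

  const int half = size / 2;
  double split_cost = 0.0;
  for (int i = 0; i < 4; ++i) {
    split_cost += search_partition(row + (i >> 1) * half,
                                   col + (i & 1) * half, half, min_size,
                                   cost_fn);
  }
  return split_cost < none_cost ? split_cost : none_cost;
}

// Toy cost model for experimentation: every block pays a fixed overhead, and
// blocks larger than 8x8 that cover the busy pixel at (0, 0) pay a penalty.
static double toy_block_cost(int row, int col, int size) {
  const int covers_origin = (row == 0 && col == 0);
  return 10.0 + ((covers_origin && size > 8) ? 100.0 : 0.0);
}
```

With the toy model, a 32x32 block searched down to 8x8 splits only around the
busy corner, so the search cost stays below that of the unsplit block.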
				1060
In libaom, the partition search sits on top of the mode search (predictor,
				1062	transform, etc.), instead of being a separate module. The interface of mode
				1063	search is \ref pick_sb_modes(), which connects the partition_search with
				1064	\ref architecture_enc_inter_modes and \ref architecture_enc_intra_modes. To
				1065	make good decisions, reconstruction is also required in order to build
				1066	references and contexts. This is implemented by \ref encode_sb() at the
				1067	sub-tree level and \ref encode_b() at coding block level.
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	1068
				1069	See also \ref partition_search
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1070
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1071	\section architecture_enc_intra_modes Intra Mode Search
				1072
AV1 provides 71 different intra prediction modes, i.e. modes that predict
based only upon information in the current frame, with no dependency on
previous or future frames. For key frames, where this independence from any
other frame is a defining requirement, and for other cases where intra only
frames are required, the encoder need only consider these modes in the rate
distortion loop.
				1079
				1080	Even so, in most use cases, searching all possible intra prediction modes for
				1081	every block and partition size is not practical and some pruning of the search
				1082	tree is necessary.
				1083
For the rate distortion optimized case, the main top level function
responsible for selecting the intra prediction mode for a given block is
\ref av1_rd_pick_intra_mode_sb(). The reader's attention is also drawn to the
				1087	functions \ref hybrid_intra_mode_search() and \ref av1_nonrd_pick_intra_mode()
				1088	which may be used where encode speed is critical. The choice between the
				1089	rd path and the non rd or hybrid paths depends on the encoder use case and the
				1090	\ref AV1_COMP.speed parameter. Further fine control of the speed vs quality
				1091	trade off is provided by means of fields in \ref AV1_COMP.sf (which has type
				1092	\ref SPEED_FEATURES).
				1093
				1094	Note that some intra modes are only considered for specific use cases or
types of video. For example, the palette based prediction modes are often
valuable for graphics or screen share content but not for natural video.
(See \ref av1_search_palette_mode().)
				1098
Paul Wilkins	3a13f64	2020-07-29 17:35:33 +0100	[diff] [blame]	1099	See also \ref intra_mode_search for more details.
				1100
				1101	\section architecture_enc_inter_modes Inter Prediction Mode Search
				1102
Paul Wilkins	da6a80b	2020-07-30 17:27:56 +0100	[diff] [blame]	1103	For inter frames, where we also allow prediction using one or more previously
				1104	coded frames (which may chronologically speaking be past or future frames or
				1105	non-display reference buffers such as ARF frames), the size of the search tree
that needs to be traversed to select a prediction mode is considerably
larger.
				1108
				1109	In addition to the 71 possible intra modes we also need to consider 56 single
				1110	frame inter prediction modes (7 reference frames x 4 modes x 2 for OBMC
				1111	(overlapped block motion compensation)), 12768 compound inter prediction modes
				1112	(these are modes that combine inter predictors from two reference frames) and
				1113	36708 compound inter / intra prediction modes.
				1114
				1115	As with the intra mode search, libaom supports an RD based pathway and a non
				1116	rd pathway for speed critical use cases. The entry points for these two cases
Jingning Han	e9eb8c0	2020-11-11 14:47:53 -0800	[diff] [blame]	1117	are \ref av1_rd_pick_inter_mode() and \ref av1_nonrd_pick_inter_mode_sb()
Paul Wilkins	da6a80b	2020-07-30 17:27:56 +0100	[diff] [blame]	1118	respectively.
				1119
				1120	Various heuristics and predictive strategies are used to prune the search tree
				1121	with fine control provided through the speed features parameter in the main
				1122	compressor instance data structure \ref AV1_COMP.sf.
				1123
It is worth noting that some prediction modes incur a much larger rate cost
than others (ignoring for now the cost of coding the error residual). For
example, a compound mode that requires the encoder to specify two reference
frames and two new motion vectors will almost inevitably have a higher rate
cost than a simple inter prediction mode that uses a predicted or 0,0 motion
vector. As such, if we have already found a mode for the current block that
has a low RD cost, we can skip a large number of the possible modes on the
basis that even if the error residual is 0 the inherent rate cost of the
mode itself will guarantee that it is not chosen.
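
The early termination argument above can be sketched as a lower bound test.
The sketch assumes the RD cost is modelled as distortion plus lambda times
rate, in plain integer units; the function name `can_skip_mode` is
hypothetical and libaom's own cost scaling is not reproduced here.

```c
#include <stdint.h>

// Returns non-zero if a mode can be skipped without evaluating it. With a
// zero error residual the distortion term and residual rate vanish, so
// lambda * mode_rate is a lower bound on the RD cost of the mode.
static int can_skip_mode(int64_t mode_rate, int64_t lambda, int64_t best_rd) {
  const int64_t min_rd = lambda * mode_rate;
  return min_rd >= best_rd;
}
```

For example, with lambda 4 and a best RD cost so far of 150, a mode whose
signalling alone costs 50 rate units is skipped, while one that costs 10 rate
units still has to be evaluated.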
				1133
Paul Wilkins	3a13f64	2020-07-29 17:35:33 +0100	[diff] [blame]	1134	See also \ref inter_mode_search for more details.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1135
				1136	\section architecture_enc_tx_search Transform Search
				1137
AV1 implements the transform stage using 4 separable 1-d transforms (DCT,
				1139	ADST, FLIPADST and IDTX, where FLIPADST is the reversed version of ADST
				1140	and IDTX is the identity transform) which can be combined to give 16 2-d
				1141	combinations.
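
As a small illustration of where the 16 combinations come from, the sketch
below pairs each vertical 1-d transform with each horizontal one. The enum and
struct names are hypothetical and are not the identifiers used in libaom.

```c
// Hypothetical identifiers for the four 1-d transforms.
typedef enum {
  TX1D_DCT,
  TX1D_ADST,
  TX1D_FLIPADST,
  TX1D_IDTX,
  TX1D_COUNT
} Tx1dSketch;

// A 2-d transform is a (vertical, horizontal) pair of 1-d transforms.
typedef struct {
  Tx1dSketch vertical;
  Tx1dSketch horizontal;
} Tx2dSketch;

// Fills out[] with every pairing and returns the number of pairings.
static int enumerate_2d_transforms(Tx2dSketch out[TX1D_COUNT * TX1D_COUNT]) {
  int n = 0;
  for (int v = 0; v < TX1D_COUNT; ++v) {
    for (int h = 0; h < TX1D_COUNT; ++h) {
      out[n].vertical = (Tx1dSketch)v;
      out[n].horizontal = (Tx1dSketch)h;
      ++n;
    }
  }
  return n;  // 4 x 4 = 16
}
```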
Paul Wilkins	3a13f64	2020-07-29 17:35:33 +0100	[diff] [blame]	1142
				1143	These combinations can be applied at 19 different scales from 64x64 pixels
				1144	down to 4x4 pixels.
				1145
				1146	This gives rise to a large number of possible candidate transform options
				1147	for coding the residual error after prediction. An exhaustive rate-distortion
				1148	based evaluation of all candidates would not be practical from a speed
perspective in a production encoder implementation. Hence libaom adopts a
number of strategies to prune the selection of both the transform size and
transform type.

There are a number of strategies that have been tested and implemented in
				1154	libaom including:
				1155
				1156	- A statistics based approach that looks at the frequency with which certain
				1157	combinations are used in a given context and prunes out very unlikely
				1158	candidates. It is worth noting here that some size candidates can be pruned
				1159	out immediately based on the size of the prediction partition. For example it
				1160	does not make sense to use a transform size that is larger than the
				1161	prediction partition size but also a very large prediction partition size is
unlikely to be optimally paired with small transforms.
				1163
- A machine learning based model.
				1165
				1166	- A method that initially tests candidates using a fast algorithm that skips
				1167	entropy encoding and uses an estimated cost model to choose a reduced subset
				1168	for full RD analysis. This subject is covered more fully in a paper authored
				1169	by Bohan Li, Jingning Han, and Yaowu Xu titled: <b>Fast Transform Type
				1170	Selection Using Conditional Laplace Distribution Based Rate Estimation</b>
				1171
				1172	<b>TODO Add link to paper when available</b>
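
The last strategy in the list above amounts to a two stage search: a cheap
estimate ranks all candidates, and only the best few go on to full RD
analysis. The pruning stage might look roughly like the sketch below. The
function name `prune_by_estimate` and the array of estimated costs are
hypothetical, and the estimation model itself is not shown.

```c
// Writes the indices of the `keep` candidates with the lowest estimated cost
// into selected[] (cheapest first) and returns how many were written. The
// caller would then run the full RD evaluation on those candidates only.
static int prune_by_estimate(const double *est_cost, int num_candidates,
                             int keep, int *selected) {
  int count = 0;
  while (count < keep && count < num_candidates) {
    int pick = -1;
    for (int c = 0; c < num_candidates; ++c) {
      int taken = 0;
      for (int s = 0; s < count; ++s) taken |= (selected[s] == c);
      if (taken) continue;
      if (pick < 0 || est_cost[c] < est_cost[pick]) pick = c;
    }
    selected[count++] = pick;
  }
  return count;
}
```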
				1173
				1174	See also \ref transform_search for more details.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1175
Paul Wilkins	d7a9f0e	2020-07-30 18:12:40 +0100	[diff] [blame]	1176	\section architecture_post_enc_filt Post Encode Loop Filtering
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1177
Paul Wilkins	d7a9f0e	2020-07-30 18:12:40 +0100	[diff] [blame]	1178	AV1 supports three types of post encode <b>in loop</b> filtering to improve
				1179	the quality of the reconstructed video.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1180
- <b>Deblocking Filter</b> The first of these is a fairly traditional boundary
				1182	deblocking filter that attempts to smooth discontinuities that may occur at
				1183	the boundaries between blocks. See also \ref in_loop_filter.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1184
Paul Wilkins	d7a9f0e	2020-07-30 18:12:40 +0100	[diff] [blame]	1185	- <b>CDEF Filter</b> The constrained directional enhancement filter (CDEF)
				1186	allows the codec to apply a non-linear deringing filter along certain
				1187	(potentially oblique) directions. A primary filter is applied along the
Paul Wilkins	10e9944	2020-08-05 15:35:44 +0100	[diff] [blame]	1188	selected direction, whilst a secondary filter is applied at 45 degrees to
the primary direction. (See also \ref in_loop_cdef and
<a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1191
Paul Wilkins	d7a9f0e	2020-07-30 18:12:40 +0100	[diff] [blame]	1192	- <b>Loop Restoration Filter</b> The loop restoration filter is applied after
Paul Wilkins	10e9944	2020-08-05 15:35:44 +0100	[diff] [blame]	1193	any prior post filtering stages. It acts on units of either 64 x 64,
128 x 128, or 256 x 256 pixel blocks, referred to as loop restoration units.
Each unit can independently select either to bypass filtering, use a Wiener
filter, or use a self-guided filter. (See also \ref in_loop_restoration and
<a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1198
				1199	\section architecture_entropy Entropy Coding
				1200
Paul Wilkins	ef79fe4	2020-08-04 19:32:11 +0100	[diff] [blame]	1201	\subsection architecture_entropy_aritmetic Arithmetic Coder
				1202
VP9 used a binary arithmetic coder to encode symbols, where the probability
of a 1 or 0 at each decision node was based on a context model that took
into account recently coded values (for example previously coded coefficients
in the current block). A mechanism existed to update the context model each
frame, either explicitly in the bitstream, or implicitly at both the encoder
and decoder based on the observed frequency of different outcomes in the
previous frame. VP9 also supported separate context models for different types
of frame (e.g. inter coded frames and key frames).
				1211
				1212	In contrast, AV1 uses an M-ary symbol arithmetic coder to compress the syntax
				1213	elements, where integer \f$M\in[2, 14]\f$. This approach is based upon the entropy
				1214	coding strategy used in the Daala video codec and allows for some bit-level
				1215	parallelism in its implementation. AV1 also has an extended context model and
				1216	allows for updates to the probabilities on a per symbol basis as opposed to
				1217	the per frame strategy in VP9.
				1218
				1219	To improve the performance / throughput of the arithmetic encoder, especially
				1220	in hardware implementations, the probability model is updated and maintained
				1221	at 15-bit precision, but the arithmetic encoder only uses the most significant
				1222	9 bits when encoding a symbol. A more detailed discussion of the algorithm
Paul Wilkins	f88a151	2020-10-20 13:18:40 +0100	[diff] [blame]	1223	and design constraints can be found in
				1224	<a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.
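
To make the per symbol update and the two precisions more concrete, here is a
simplified sketch of an adaptive cumulative distribution function (CDF) for an
M-ary symbol. It is an illustration only: the function names are hypothetical,
the adaptation rate is passed in as a fixed parameter, and the way libaom
stores its CDFs and chooses the rate is not reproduced.

```c
#include <stdint.h>

#define CDF_PROB_BITS 15
#define CDF_PROB_TOP (1 << CDF_PROB_BITS)  // 32768

// cdf[i] holds P(symbol <= i) scaled to 15 bits, for i in
// [0, num_symbols - 2]. The final entry is implicitly CDF_PROB_TOP.
// After coding `symbol`, probability mass is moved towards it.
static void adapt_cdf(uint16_t *cdf, int num_symbols, int symbol, int rate) {
  for (int i = 0; i < num_symbols - 1; ++i) {
    if (i >= symbol) {
      cdf[i] += (uint16_t)((CDF_PROB_TOP - cdf[i]) >> rate);
    } else {
      cdf[i] -= (uint16_t)(cdf[i] >> rate);
    }
  }
}

// Probability of `symbol` as seen by a coder that only uses the most
// significant 9 bits of each 15-bit CDF value.
static unsigned symbol_prob_9bit(const uint16_t *cdf, int num_symbols,
                                 int symbol) {
  const unsigned hi = (symbol == num_symbols - 1) ? CDF_PROB_TOP : cdf[symbol];
  const unsigned lo = (symbol == 0) ? 0 : cdf[symbol - 1];
  return (hi >> 6) - (lo >> 6);
}
```

Starting from a uniform four symbol distribution, coding symbol 2 once with
rate 5 raises its 9-bit probability from 128 to 140 out of 512.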
Paul Wilkins	ef79fe4	2020-08-04 19:32:11 +0100	[diff] [blame]	1225
				1226	TODO add references to key functions / files.
				1227
As with VP9, a mechanism exists in AV1 to encode some elements into the
bitstream as uncompressed bits or literal values, without using the arithmetic
coder. This is used, for example, for some frame and sequence header values,
where it is beneficial to be able to read the values directly.
				1232
				1233	TODO add references to key functions / files.
Paul Wilkins	386cb69	2020-08-04 18:11:17 +0100	[diff] [blame]	1234
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1235	\subsection architecture_entropy_coef Transform Coefficient Coding and Optimization
				1236	\image html coeff_coding.png "" width=70%
Paul Wilkins	386cb69	2020-08-04 18:11:17 +0100	[diff] [blame]	1237
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1238	\subsubsection architecture_entropy_coef_what Transform coefficient coding
				1239	Transform coefficient coding is where the encoder compresses a quantized version
				1240	of prediction residue into the bitstream.
				1241
				1242	\paragraph architecture_entropy_coef_prepare Preparation - transform and quantize
Before the entropy coding stage, the encoder decouples the pixel-to-pixel
				1244	correlation of the prediction residue by transforming the residue from the
				1245	spatial domain to the frequency domain. Then the encoder quantizes the transform
				1246	coefficients to make the coefficients ready for entropy coding.
				1247
				1248	\paragraph architecture_entropy_coef_coding The coding process
				1249	The encoder uses \ref av1_write_coeffs_txb() to write the coefficients of
				1250	a transform block into the bitstream.
				1251	The coding process has three stages.
1. The encoder will code the transform block skip flag (txb_skip). If the skip flag is
				1253	off, then the encoder will code the end of block position (eob) which is the scan
				1254	index of the last non-zero coefficient plus one.
				1255	2. Second, the encoder will code lower magnitude levels of each coefficient in
				1256	reverse scan order.
				1257	3. Finally, the encoder will code the sign and higher magnitude levels for each
				1258	coefficient if they are available.
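
The two quantities coded in the first stage can be derived from the quantized
coefficients as sketched below. The helper names are hypothetical, and the
input is assumed to be already arranged in scan order.

```c
#include <stdint.h>

// Returns the end of block position (eob): the scan index of the last
// non-zero coefficient plus one, or 0 if every coefficient is zero.
static int compute_eob(const int32_t *qcoeff_in_scan_order, int num_coeffs) {
  for (int i = num_coeffs - 1; i >= 0; --i) {
    if (qcoeff_in_scan_order[i] != 0) return i + 1;
  }
  return 0;
}

// The skip flag is set when the block has no non-zero coefficients.
static int compute_txb_skip(const int32_t *qcoeff_in_scan_order,
                            int num_coeffs) {
  return compute_eob(qcoeff_in_scan_order, num_coeffs) == 0;
}
```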
				1259
				1260	Related functions:
				1261	- \ref av1_write_coeffs_txb()
				1262	- write_inter_txb_coeff()
				1263	- \ref av1_write_intra_coeffs_mb()
				1264
				1265	\paragraph architecture_entropy_coef_context Context information
				1266	To improve the compression efficiency, the encoder uses several context models
				1267	tailored for transform coefficients to capture the correlations between coding
				1268	symbols. Most of the context models are built to capture the correlations
between the coefficients within the same transform block. However, the
transform block skip flag (txb_skip) and the sign of the dc coefficient
(dc_sign) require context info from neighboring transform blocks.

Here is how context info spreads between transform blocks. Before coding a
transform block, the encoder will use get_txb_ctx() to collect the context
information from neighboring transform blocks. Then the context information
will be used for coding the transform block skip flag (txb_skip) and the sign
of the dc coefficient (dc_sign). After the transform block is coded, the
encoder will extract the context info from the current block using
\ref av1_get_txb_entropy_context(). Then the encoder will store the context
info into a byte (uint8_t) using av1_set_entropy_contexts(). The encoder will
use the context info to code other transform blocks.
				1282
				1283	Related functions:
				1284	- \ref av1_get_txb_entropy_context()
				1285	- av1_set_entropy_contexts()
				1286	- get_txb_ctx()
				1287	- \ref av1_update_intra_mb_txb_context()
				1288
				1289	\subsubsection architecture_entropy_coef_rd RD optimization
Besides the actual entropy coding, the encoder uses several utility functions
				1291	to make optimal RD decisions.
				1292
				1293	\paragraph architecture_entropy_coef_cost Entropy cost
				1294	The encoder uses \ref av1_cost_coeffs_txb() or \ref av1_cost_coeffs_txb_laplacian()
				1295	to estimate the entropy cost of a transform block. Note that
				1296	\ref av1_cost_coeffs_txb() is slower but accurate whereas
				1297	\ref av1_cost_coeffs_txb_laplacian() is faster but less accurate.
				1298
				1299	Related functions:
				1300	- \ref av1_cost_coeffs_txb()
				1301	- \ref av1_cost_coeffs_txb_laplacian()
				1302	- \ref av1_cost_coeffs_txb_estimate()
				1303
				1304	\paragraph architecture_entropy_coef_opt Quantized level optimization
Besides computing entropy cost, the encoder also uses \ref av1_optimize_txb()
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1306	to adjust the coefficient’s quantized levels to achieve optimal RD trade-off.
Vishesh	a45092c	2021-01-25 00:28:11 +0530	[diff] [blame]	1307	In \ref av1_optimize_txb(), the encoder goes through each quantized
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1308	coefficient and lowers the quantized coefficient level by one if the action
				1309	yields a better RD score.
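
The idea of lowering a level by one when that improves the RD score can be
sketched as follows. The cost model here is a deliberately simple assumption
(squared reconstruction error plus lambda times a rate that grows linearly
with the level magnitude), and the function names are hypothetical; the real
\ref av1_optimize_txb() uses the encoder's actual entropy cost model.

```c
#include <stdint.h>
#include <stdlib.h>

// Toy RD cost (an assumption for illustration): squared reconstruction error
// plus lambda times a rate of 8 units per level magnitude step.
static int64_t toy_rd_cost(int32_t coeff, int32_t level, int32_t step,
                           int64_t lambda) {
  const int64_t err = (int64_t)coeff - (int64_t)level * step;
  const int64_t rate = 8 * (int64_t)abs(level);
  return err * err + lambda * rate;
}

// Moves each non-zero quantized level one step towards zero when that gives
// a lower RD cost. Returns the number of levels that were changed.
static int optimize_levels(const int32_t *coeff, int32_t *level, int n,
                           int32_t step, int64_t lambda) {
  int changed = 0;
  for (int i = 0; i < n; ++i) {
    if (level[i] == 0) continue;
    const int32_t lower = level[i] > 0 ? level[i] - 1 : level[i] + 1;
    if (toy_rd_cost(coeff[i], lower, step, lambda) <
        toy_rd_cost(coeff[i], level[i], step, lambda)) {
      level[i] = lower;
      ++changed;
    }
  }
  return changed;
}
```

With a quantizer step of 10 and lambda 4, the coefficients `{52, 6, -95}`
quantized to `{5, 1, -10}` become `{5, 0, -9}`: the first level is kept
because lowering it would add too much distortion.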
				1310
				1311	Related functions:
Vishesh	a45092c	2021-01-25 00:28:11 +0530	[diff] [blame]	1312	- \ref av1_optimize_txb()
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1313
				1314	All the related functions are listed in \ref coefficient_coding.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1315
Rachel Barker	5758629	2024-02-20 20:56:16 +0000	[diff] [blame]	1316	\section architecture_simd SIMD usage
				1317
				1318	In order to efficiently encode video on modern platforms, it is necessary to
				1319	implement optimized versions of many core encoding and decoding functions using
				1320	architecture-specific SIMD instructions.
				1321
				1322	Functions which have optimized implementations will have multiple variants
				1323	in the code, each suffixed with the name of the appropriate instruction set.
				1324	There will additionally be an `_c` version, which acts as a reference
				1325	implementation which the SIMD variants can be tested against.
				1326
				1327	As different machines with the same nominal architecture may support different
				1328	subsets of SIMD instructions, we have dynamic CPU detection logic which chooses
				1329	the appropriate functions to use at run time. This process is handled by
				1330	`build/cmake/rtcd.pl`, with function definitions in the files
				1331	`*_rtcd_defs.pl` elsewhere in the codebase.
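
The run time selection can be pictured as a function pointer that is set once
from the detected CPU capabilities. The sketch below is hand written for
illustration and is not what `build/cmake/rtcd.pl` generates: the function
names, the capability flag, and the "optimized" variant (plain C standing in
for SIMD code) are all hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int (*SumAbsDiffFn)(const uint8_t *a, const uint8_t *b, int n);

// Reference implementation, following the `_c` naming convention.
static int sum_abs_diff_c(const uint8_t *a, const uint8_t *b, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) sum += abs(a[i] - b[i]);
  return sum;
}

// Stand-in for an optimized variant. A real one would use SIMD intrinsics;
// this one just handles two elements per iteration to stay portable.
static int sum_abs_diff_unrolled(const uint8_t *a, const uint8_t *b, int n) {
  int sum = 0;
  int i = 0;
  for (; i + 1 < n; i += 2) {
    sum += abs(a[i] - b[i]) + abs(a[i + 1] - b[i + 1]);
  }
  if (i < n) sum += abs(a[i] - b[i]);
  return sum;
}

// Hypothetical capability bit, standing in for real CPU feature detection.
#define SKETCH_CPU_HAS_UNROLLED 0x1

// Chooses the implementation once, based on the detected capabilities.
static SumAbsDiffFn select_sum_abs_diff(int cpu_flags) {
  return (cpu_flags & SKETCH_CPU_HAS_UNROLLED) ? sum_abs_diff_unrolled
                                               : sum_abs_diff_c;
}
```

Because both variants must return identical results, the reference version
doubles as the oracle when testing an optimized one.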
				1332
				1333	Currently SIMD is supported on the following platforms:
				1334
				1335	- x86: Requires SSE4.1 or above
				1336
				1337	- Arm: Requires Neon (Armv7-A and above)
				1338
				1339	We aim to provide implementations of all performance-critical functions which
				1340	are compatible with the instruction sets listed above. Additional SIMD
				1341	extensions (e.g. AVX on x86, SVE on Arm) are also used to provide even
				1342	greater performance where available.
				1343
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1344	*/
Yunqing Wang	65cd010	2020-05-06 12:57:04 -0700	[diff] [blame]	1345
				1346	/*!\defgroup encoder_algo Encoder Algorithm
				1347	*
				1348	* The encoder algorithm describes how a sequence is encoded, including high
* level decisions as well as the algorithms used at every encoding stage.
				1350	*/
				1351
				1352	/*!\defgroup high_level_algo High-level Algorithm
				1353	* \ingroup encoder_algo
* This module describes sequence level/frame level algorithms in AV1.
				1355	* More details will be added.
				1356	* @{
				1357	*/
Elliott Karpilovsky	2ea1836	2020-06-02 18:32:27 -0700	[diff] [blame]	1358
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	1359	/*!\defgroup speed_features Speed vs Quality Trade Off
				1360	* \ingroup high_level_algo
				1361	* This module describes the encode speed vs quality tradeoff
				1362	* @{
				1363	*/
/*! @} - end defgroup speed_features */
				1365
				1366	/*!\defgroup src_frame_proc Source Frame Processing
				1367	* \ingroup high_level_algo
* This module describes algorithms in AV1 associated with the
				1369	* pre-processing of source frames. See also \ref architecture_enc_src_proc
				1370	*
				1371	* @{
				1372	*/
/*! @} - end defgroup src_frame_proc */
				1374
				1375	/*!\defgroup rate_control Rate Control
				1376	* \ingroup high_level_algo
* This module describes the rate control algorithm in AV1.
				1378	* See also \ref architecture_enc_rate_ctrl
				1379	* @{
				1380	*/
/*! @} - end defgroup rate_control */
				1382
Paul Wilkins	ff98f3e	2020-07-27 16:01:05 +0100	[diff] [blame]	1383	/*!\defgroup tpl_modelling Temporal Dependency Modelling
				1384	* \ingroup high_level_algo
				1385	* This module includes algorithms to implement temporal dependency modelling.
				1386	* See also \ref architecture_enc_tpl
				1387	* @{
				1388	*/
/*! @} - end defgroup tpl_modelling */
				1390
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	1391	/*!\defgroup two_pass_algo Two Pass Mode
				1392	\ingroup high_level_algo
Elliott Karpilovsky	2ea1836	2020-06-02 18:32:27 -0700	[diff] [blame]	1393
				1394	In two pass mode, the input file is passed into the encoder for a quick
				1395	first pass, where statistics are gathered. These statistics and the input
				1396	file are then passed back into the encoder for a second pass. The statistics
				1397	help the encoder reach the desired bitrate without as much overshooting or
				1398	undershooting.
				1399
				1400	During the first pass, the codec will return "stats" packets that contain
				1401	information useful for the second pass. The caller should concatenate these
				1402	packets as they are received. In the second pass, the concatenated packets
				1403	are passed in, along with the frames to encode. During the second pass,
				1404	"frame" packets are returned that represent the compressed video.
				1405
				1406	A complete example can be found in `examples/twopass_encoder.c`. Pseudocode
				1407	is provided below to illustrate the core parts.
				1408
				1409	During the first pass, the uncompressed frames are passed in and stats
information is appended to a byte array.

~~~~~~~~~~~~~~~{.c}
// For simplicity, assume that there is enough memory in the stats buffer.
// Actual code will want to use a resizable array. stats_len represents
// the length of data already present in the buffer.
void get_stats_data(aom_codec_ctx_t *encoder, char *stats,
                    size_t *stats_len, bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_STATS_PKT) continue;
    memcpy(stats + *stats_len, pkt->data.twopass_stats.buf,
           pkt->data.twopass_stats.sz);
    *stats_len += pkt->data.twopass_stats.sz;
  }
}

void first_pass(char *stats, size_t *stats_len) {
  struct aom_codec_enc_cfg first_pass_cfg;
  ... // Initialize the config as needed.
  first_pass_cfg.g_pass = AOM_RC_FIRST_PASS;
  aom_codec_ctx_t first_pass_encoder;
  ... // Initialize the encoder.

  // Set by get_stats_data() whenever the encoder returns a packet.
  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&first_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  }
  // After all frames have been processed, call aom_codec_encode with
  // a NULL ptr repeatedly, until no more data is returned. The NULL
  // ptr tells the encoder that no more frames are available.
  do {
    got_data = false;
    aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  } while (got_data);

  aom_codec_destroy(&first_pass_encoder);
}
~~~~~~~~~~~~~~~
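
The comment in the example above notes that real code will want a resizable
array. A minimal sketch of one way to do that follows. StatsBuffer,
stats_buffer_append() and stats_buffer_free() are hypothetical helpers written
for this illustration; they are not part of libaom, and the initial capacity
of 4096 bytes is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable byte buffer for accumulating first-pass stats.
typedef struct {
  char *data;       // Heap storage, NULL while nothing has been allocated.
  size_t len;       // Number of bytes currently stored.
  size_t capacity;  // Number of bytes allocated.
} StatsBuffer;

// Appends sz bytes from src, doubling the allocation until the data fits.
// Returns false and leaves the buffer unchanged if the request cannot be met.
static bool stats_buffer_append(StatsBuffer *sb, const void *src, size_t sz) {
  if (sz > SIZE_MAX - sb->len) return false;  // The new length would overflow.
  const size_t needed = sb->len + sz;
  if (needed > sb->capacity) {
    size_t new_capacity = sb->capacity ? sb->capacity : 4096;
    while (new_capacity < needed) {
      if (new_capacity > SIZE_MAX / 2) {
        new_capacity = needed;  // Doubling would overflow, so fit exactly.
        break;
      }
      new_capacity *= 2;
    }
    char *new_data = (char *)realloc(sb->data, new_capacity);
    if (!new_data) return false;
    sb->data = new_data;
    sb->capacity = new_capacity;
  }
  if (sz > 0) memcpy(sb->data + sb->len, src, sz);
  sb->len = needed;
  return true;
}

// Releases the storage and resets the buffer to its empty state.
static void stats_buffer_free(StatsBuffer *sb) {
  free(sb->data);
  sb->data = NULL;
  sb->len = 0;
  sb->capacity = 0;
}
```

With such a helper, the memcpy in get_stats_data() would become a call to
stats_buffer_append() with pkt->data.twopass_stats.buf and
pkt->data.twopass_stats.sz, and the second pass would read the accumulated
bytes from the data and len fields.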

During the second pass, the uncompressed frames and the stats are
passed into the encoder.

~~~~~~~~~~~~~~~{.c}
// Write out each encoded frame to the file.
void get_cx_data(aom_codec_ctx_t *encoder, FILE *file,
                 bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue;
    fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file);
  }
}

void second_pass(char *stats, size_t stats_len) {
  struct aom_codec_enc_cfg second_pass_cfg;
  ... // Initialize the config file as needed.
  second_pass_cfg.g_pass = AOM_RC_LAST_PASS;
  second_pass_cfg.rc_twopass_stats_in.buf = stats;
  second_pass_cfg.rc_twopass_stats_in.sz = stats_len;
  aom_codec_ctx_t second_pass_encoder;
  ... // Initialize the encoder from the config.

  FILE *output = fopen("output.obu", "wb");
  // Set by get_cx_data() whenever the encoder returns a packet.
  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&second_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  }
  // Pass in NULL to flush the encoder.
  do {
    got_data = false;
    aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  } while (got_data);

  aom_codec_destroy(&second_pass_encoder);
  fclose(output);
}
~~~~~~~~~~~~~~~
*/

/*!\defgroup look_ahead_buffer The Look-Ahead Buffer
\ingroup high_level_algo

A program should call \ref aom_codec_encode() for each frame that needs
processing. These frames are internally copied and stored in a fixed-size
circular buffer, known as the look-ahead buffer. Other parts of the code
use future frame information to inform current frame decisions;
examples include the first-pass algorithm, TPL model, and temporal filter.
Note that this buffer also keeps a reference to the last source frame.

The look-ahead buffer is defined in \ref av1/encoder/lookahead.h. It acts as an
opaque structure, with an interface to create and free the memory associated
with it. It supports pushing and popping frames in a FIFO fashion. It also
allows look-ahead when \ref av1_lookahead_peek() is called with a non-negative
index, and look-behind when -1 is passed in (for the last source frame; e.g.,
firstpass will use this for motion estimation).
The \ref av1_lookahead_depth() function returns the number of frames currently
stored in the buffer. Note that \ref av1_lookahead_pop() is a bit of a
misnomer: it only pops if either the "flush" variable is set, or the buffer is
at maximum capacity.
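
The push, pop, peek and depth behaviour described above can be illustrated
with a toy model. This is a sketch under stated assumptions, not the libaom
implementation: ToyLookahead and the toy_ functions are hypothetical names,
integer frame ids stand in for copied source frames, and the last source frame
is tracked in a separate field purely for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical upper bound on the toy buffer size.
#define TOY_MAX_FRAMES 8

typedef struct {
  int frames[TOY_MAX_FRAMES];  // Non-negative frame ids, stored as a ring.
  int capacity;                // Maximum number of queued frames.
  int read_idx;                // Slot holding the oldest queued frame.
  int depth;                   // Number of frames currently queued.
  int last_source;             // Most recently popped frame, -1 if none.
} ToyLookahead;

// Prepares an empty buffer. Returns false if the capacity is out of range.
static bool toy_init(ToyLookahead *la, int capacity) {
  if (capacity < 1 || capacity > TOY_MAX_FRAMES) return false;
  la->capacity = capacity;
  la->read_idx = 0;
  la->depth = 0;
  la->last_source = -1;
  return true;
}

// Appends a frame at the tail. Returns false if the buffer is already full.
static bool toy_push(ToyLookahead *la, int frame_id) {
  if (la->depth == la->capacity) return false;
  la->frames[(la->read_idx + la->depth) % la->capacity] = frame_id;
  la->depth++;
  return true;
}

// Removes the oldest frame only when flushing or when the buffer is full.
// Returns the frame id, or -1 if nothing was popped.
static int toy_pop(ToyLookahead *la, bool flush) {
  if (la->depth == 0) return -1;
  if (!flush && la->depth < la->capacity) return -1;
  const int frame_id = la->frames[la->read_idx];
  la->read_idx = (la->read_idx + 1) % la->capacity;
  la->depth--;
  la->last_source = frame_id;
  return frame_id;
}

// A non-negative index looks ahead from the oldest queued frame, and -1
// returns the last popped source frame. Returns -1 if no such frame exists.
static int toy_peek(const ToyLookahead *la, int index) {
  if (index == -1) return la->last_source;
  if (index < 0 || index >= la->depth) return -1;
  return la->frames[(la->read_idx + index) % la->capacity];
}

// Returns the number of frames currently queued.
static int toy_depth(const ToyLookahead *la) { return la->depth; }
```

In this model, a buffer of capacity 3 returns nothing from toy_pop() until
three frames have been pushed, unless flush is true. This mirrors why encoded
output lags the input and why the encoder must be flushed at the end of the
stream.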

The buffer is stored in the \ref AV1_PRIMARY::lookahead field.
It is initialized in the first call to \ref aom_codec_encode(), in the
\ref av1_receive_raw_frame() sub-routine. The buffer size is defined by
the \ref aom_codec_enc_cfg_t::g_lag_in_frames field of the encoder
configuration.
This can be modified manually but should only be set once. On the command
line, the flag "--lag-in-frames" controls it. The default size is 19 for
non-realtime usage and 1 for realtime. Note that a maximum value of 35 is
enforced.

A frame will stay in the buffer as long as possible. As mentioned above,
\ref av1_lookahead_pop() only removes a frame when either flush is set,
or the buffer is full. Note that each call to \ref aom_codec_encode() inserts
another frame into the buffer, and pop is called by the sub-function
\ref av1_encode_strategy(). The buffer is told to flush when
\ref aom_codec_encode() is passed a NULL image pointer. Note that the caller
must repeatedly call \ref aom_codec_encode() with a NULL image pointer, until
no more packets are available, in order to fully flush the buffer.

*/

/*! @} - end defgroup high_level_algo */

/*!\defgroup partition_search Partition Search
 * \ingroup encoder_algo
 * For an overview of the partition search see \ref architecture_enc_partitions
 * @{
 */

/*! @} - end defgroup partition_search */

/*!\defgroup intra_mode_search Intra Mode Search
 * \ingroup encoder_algo
 * This module describes the intra mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup intra_mode_search */

/*!\defgroup inter_mode_search Inter Mode Search
 * \ingroup encoder_algo
 * This module describes the inter mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup inter_mode_search */

/*!\defgroup palette_mode_search Palette Mode Search
 * \ingroup intra_mode_search
 * This module describes the palette mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup palette_mode_search */

/*!\defgroup transform_search Transform Search
 * \ingroup encoder_algo
 * This module describes the transform search algorithm in AV1.
 * @{
 */
/*! @} - end defgroup transform_search */

/*!\defgroup coefficient_coding Transform Coefficient Coding and Optimization
 * \ingroup encoder_algo
 * This module describes the algorithms for transform coefficient coding and
 * optimization in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup coefficient_coding */

/*!\defgroup in_loop_filter In-loop Filter
 * \ingroup encoder_algo
 * This module describes the in-loop filter algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_filter */

/*!\defgroup in_loop_cdef CDEF
 * \ingroup encoder_algo
 * This module describes the CDEF parameter search algorithm
 * in AV1. More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_cdef */

/*!\defgroup in_loop_restoration Loop Restoration
 * \ingroup encoder_algo
 * This module describes the loop restoration search
 * and estimation algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_restoration */

/*!\defgroup cyclic_refresh Cyclic Refresh
 * \ingroup encoder_algo
 * This module describes the cyclic refresh (aq-mode=3) in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup cyclic_refresh */

/*!\defgroup SVC Scalable Video Coding
 * \ingroup encoder_algo
 * This module describes the scalable video coding algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup SVC */

/*!\defgroup variance_partition Variance Partition
 * \ingroup encoder_algo
 * This module describes the variance partition algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup variance_partition */

/*!\defgroup nonrd_mode_search NonRD Optimized Mode Search
 * \ingroup encoder_algo
 * This module describes the NonRD Optimized Mode Search used in Real-Time mode.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup nonrd_mode_search */