
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	1	/*!\page encoder_guide AV1 ENCODER GUIDE
Yunqing Wang	c8f7a3b	2020-05-04 15:23:48 -0700	[diff] [blame]	2
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	3	\tableofcontents
				4
				5	\section architecture_introduction Introduction
				6
				7	This document provides an architectural overview of the libaom AV1 encoder.
				8
It is intended as a high-level starting point for anyone wishing to contribute
to the project, helping them to understand the structure of the encoder more
quickly and to find their way around the codebase.

It sits above, and where necessary links to, more detailed function-level
documents.
				15
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	16	\subsection architecture_gencodecs Generic Block Transform Based Codecs
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	17
				18	Most modern video encoders including VP8, H.264, VP9, HEVC and AV1
				19	(in increasing order of complexity) share a common basic paradigm. This
				20	comprises separating a stream of raw video frames into a series of discrete
				21	blocks (of one or more sizes), then computing a prediction signal and a
				22	quantized, transform coded, residual error signal. The prediction and residual
				23	error signal, along with any side information needed by the decoder, are then
entropy coded and packed to form the encoded bitstream. See Figure 1 below,
				25	where the blue blocks are, to all intents and purposes, the lossless parts of
				26	the encoder and the red block is the lossy part.
				27
				28	This is of course a gross oversimplification, even in regard to the simplest
				29	of the above codecs. For example, all of them allow for block based
				30	prediction at multiple different scales (i.e. different block sizes) and may
				31	use previously coded pixels in the current frame for prediction or pixels from
				32	one or more previously encoded frames. Further, they may support multiple
				33	different transforms and transform sizes and quality optimization tools like
				34	loop filtering.
				35
				36	\image html genericcodecflow.png "" width=70%
				37
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	38	\subsection architecture_av1_structure AV1 Structure and Complexity
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	39
				40	As previously stated, AV1 adopts the same underlying paradigm as other block
				41	transform based codecs. However, it is much more complicated than previous
				42	generation codecs and supports many more block partitioning, prediction and
				43	transform options.
				44
AV1 supports block partitions of various sizes from 128x128 pixels down to 4x4
pixels using a multi-layer recursive tree structure, as illustrated in Figure 2
below.
				48
				49	\image html av1partitions.png "" width=70%
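
The recursive structure can be illustrated with a minimal sketch. It models
only the four-way square split, from 128x128 down to 4x4, and ignores any
non-square partition shapes shown in the figure. The function names are
hypothetical and are not part of libaom.

```c
// Counts the distinct square block sizes reached by repeatedly halving the
// block side, e.g. 128, 64, 32, 16, 8, 4.
static int sketch_count_square_sizes(int max_size, int min_size) {
  int levels = 0;
  for (int size = max_size; size >= min_size; size /= 2) ++levels;
  return levels;
}

// Counts the leaf blocks produced if every block is split into four
// quadrants until min_size is reached.
static long sketch_count_leaf_blocks(int size, int min_size) {
  if (size <= min_size) return 1;
  return 4 * sketch_count_leaf_blocks(size / 2, min_size);
}
```

Under these assumptions a fully split 128x128 block contains 1024 blocks of
4x4 pixels, which gives a feel for how quickly the partition search space
grows with tree depth.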
				50
				51	AV1 also provides 71 basic intra prediction modes, 56 single frame inter prediction
				52	modes (7 reference frames x 4 modes x 2 for OBMC (overlapped block motion
				53	compensation)), 12768 compound inter prediction modes (that combine inter
				54	predictors from two reference frames) and 36708 compound inter / intra
				55	prediction modes. Furthermore, in addition to simple inter motion estimation,
				56	AV1 also supports warped motion prediction using affine transforms.
				57
				58	In terms of transform coding, it has 16 separable 2-D transform kernels
Paul Wilkins	8ed85dd	2020-08-04 17:48:22 +0100	[diff] [blame]	59	\f$(DCT, ADST, fADST, IDTX)^2\f$ that can be applied at up to 19 different
				60	scales from 64x64 down to 4x4 pixels.
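
The two counts quoted above can be reproduced with a short sketch. It assumes
that each 2-D kernel is one 1-D kernel applied to the columns paired with one
applied to the rows, and that the transform sizes are all WxH combinations
with sides in {4, 8, 16, 32, 64} and an aspect ratio no greater than 4:1. The
function names are hypothetical and are not part of libaom.

```c
#include <stddef.h>

// The 1-D kernels named in the text.
static const char *const kSketchKernels1D[] = { "DCT", "ADST", "fADST",
                                                "IDTX" };

// Number of (column kernel, row kernel) pairs.
static int sketch_count_2d_kernels(void) {
  const int n = (int)(sizeof(kSketchKernels1D) / sizeof(kSketchKernels1D[0]));
  return n * n;
}

// Number of WxH transform sizes with power-of-two sides from 4 to 64 and an
// aspect ratio of at most 4:1 (an assumption of this sketch).
static int sketch_count_transform_sizes(void) {
  int count = 0;
  for (int w = 4; w <= 64; w *= 2) {
    for (int h = 4; h <= 64; h *= 2) {
      const int longer = w > h ? w : h;
      const int shorter = w > h ? h : w;
      if (longer / shorter <= 4) ++count;
    }
  }
  return count;
}
```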
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	61
				62	When combined together, this means that for any one 8x8 pixel block in a
				63	source frame, there are approximately 45,000,000 different ways that it can
				64	be encoded.
				65
Consequently, AV1 requires complex control processes. While not necessarily
a normative part of the bitstream, these are the algorithms that turn a set
of compression tools and a bitstream format specification into a coherent
and useful codec implementation. These may include, but are not limited to:

- Rate distortion optimization (the process of trying to choose the most
efficient combination of block size, prediction mode, transform type,
etc.)
- Rate control (regulation of the output bitrate)
- Encoder speed vs quality trade offs
- Features such as two pass encoding or optimization for low delay
encoding
				79
Paul Wilkins	4a9201b	2020-06-26 10:46:22 +0100	[diff] [blame]	80	For a more detailed overview of AV1's encoding tools and a discussion of some
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	81	of the design considerations and hardware constraints that had to be
Paul Wilkins	f88a151	2020-10-20 13:18:40 +0100	[diff] [blame]	82	accommodated, please refer to <a href="https://arxiv.org/abs/2008.06091">
				83	A Technical Overview of AV1</a>.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	84
				85	Figure 3 provides a slightly expanded but still simplistic view of the
				86	AV1 encoder architecture with blocks that relate to some of the subsequent
				87	sections of this document. In this diagram, the raw uncompressed frame buffers
				88	are shown in dark green and the reconstructed frame buffers used for
				89	prediction in light green. Red indicates those parts of the codec that are
Paul Wilkins	4a9201b	2020-06-26 10:46:22 +0100	[diff] [blame]	90	(or may be) lossy, where fidelity can be traded off against compression
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	91	efficiency, whilst light blue shows algorithms or coding tools that are
				92	lossless. The yellow blocks represent non-bitstream normative configuration
				93	and control algorithms.
				94
				95	\image html av1encoderflow.png "" width=70%
				96
				97	\section architecture_command_line The Libaom Command Line Interface
				98
				99	Add details or links here: TODO ? elliotk@
				100
				101	\section architecture_enc_data_structures Main Encoder Data Structures
				102
Paul Wilkins	4a9201b	2020-06-26 10:46:22 +0100	[diff] [blame]	103	The following are the main high level data structures used by the libaom AV1
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	104	encoder and referenced elsewhere in this overview document:
				105
Mufaddal Chakera	8ee04fa	2021-03-17 13:33:18 +0530	[diff] [blame]	106	- \ref AV1_PRIMARY
				107	- \ref AV1_PRIMARY.gf_group (\ref GF_GROUP)
Tarundeep Singh	5e5305a	2021-03-16 13:04:04 +0530	[diff] [blame]	108	- \ref AV1_PRIMARY.lap_enabled
Mufaddal Chakera	358cf21	2021-02-25 14:41:56 +0530	[diff] [blame]	109	- \ref AV1_PRIMARY.twopass (\ref TWO_PASS)
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	110	- \ref AV1_PRIMARY.p_rc (\ref PRIMARY_RATE_CONTROL)
Angie Chiang	29aaace	2021-11-15 16:23:42 -0800	[diff] [blame]	111	- \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)
Mufaddal Chakera	8ee04fa	2021-03-17 13:33:18 +0530	[diff] [blame]	112
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	113	- \ref AV1_COMP
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	114	- \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	115	- \ref AV1_COMP.rc (\ref RATE_CONTROL)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	116	- \ref AV1_COMP.speed
				117	- \ref AV1_COMP.sf (\ref SPEED_FEATURES)
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	118
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	119	- \ref AV1EncoderConfig (Encoder configuration parameters)
				120	- \ref AV1EncoderConfig.pass
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	121	- \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	122	- \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	123	- \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	124
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	125	- \ref AlgoCfg (Algorithm related configuration parameters)
				126	- \ref AlgoCfg.arnr_max_frames
				127	- \ref AlgoCfg.arnr_strength
				128
				129	- \ref KeyFrameCfg (Keyframe coding configuration parameters)
				130	- \ref KeyFrameCfg.enable_keyframe_filtering
				131
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	132	- \ref RateControlCfg (Rate control configuration)
Paul Wilkins	1dd7a7e	2020-07-09 17:07:35 +0100	[diff] [blame]	133	- \ref RateControlCfg.mode
				134	- \ref RateControlCfg.target_bandwidth
				135	- \ref RateControlCfg.best_allowed_q
				136	- \ref RateControlCfg.worst_allowed_q
				137	- \ref RateControlCfg.cq_level
				138	- \ref RateControlCfg.under_shoot_pct
				139	- \ref RateControlCfg.over_shoot_pct
				140	- \ref RateControlCfg.maximum_buffer_size_ms
				141	- \ref RateControlCfg.starting_buffer_level_ms
				142	- \ref RateControlCfg.optimal_buffer_level_ms
Debargha Mukherjee	c6a8120	2020-07-22 16:35:20 -0700	[diff] [blame]	143	- \ref RateControlCfg.vbrbias
				144	- \ref RateControlCfg.vbrmin_section
				145	- \ref RateControlCfg.vbrmax_section
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	146
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	147	- \ref PRIMARY_RATE_CONTROL (Primary Rate control status)
				148	- \ref PRIMARY_RATE_CONTROL.gf_intervals[]
				149	- \ref PRIMARY_RATE_CONTROL.cur_gf_index
				150
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	151	- \ref RATE_CONTROL (Rate control status)
				152	- \ref RATE_CONTROL.intervals_till_gf_calculate_due
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	153	- \ref RATE_CONTROL.frames_till_gf_update_due
				154	- \ref RATE_CONTROL.frames_to_key
				155
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	156	- \ref TWO_PASS (Two pass status and control data)
				157
Wan-Teh Chang	247dd54	2020-10-08 12:37:47 -0700	[diff] [blame]	158	- \ref GF_GROUP (Data related to the current GF/ARF group)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	159
				160	- \ref FIRSTPASS_STATS (Defines entries in the first pass stats buffer)
				161	- \ref FIRSTPASS_STATS.coded_error
				162
				163	- \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
				164	- \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)
				165
				166	- \ref HIGH_LEVEL_SPEED_FEATURES
				167	- \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop
				168	- \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	169
Paul Wilkins	4ac8bf4	2020-07-30 16:44:27 +0100	[diff] [blame]	170	- \ref TplParams
				171
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	172	\section architecture_enc_use_cases Encoder Use Cases
				173
				174	The libaom AV1 encoder is configurable to support a number of different use
				175	cases and rate control strategies.
				176
The principal use cases for which it is optimised are as follows:
				178
				179	- <b>Video on Demand / Streaming</b>
				180	- <b>Low Delay or Live Streaming</b>
				181	- <b>Video Conferencing / Real Time Coding (RTC)</b>
				182	- <b>Fixed Quality / Testing</b>
				183
				184	Other examples of use cases for which the encoder could be configured but for
				185	which there is less by way of specific optimizations include:
				186
				187	- <b>Download and Play</b>
- <b>Disk Playback</b>
				189	- <b>Storage</b>
				190	- <b>Editing</b>
				191	- <b>Broadcast video</b>
				192
				193	Specific use cases may have particular requirements or constraints. For
				194	example:
				195
				196	<b>Video Conferencing:</b> In a video conference we need to encode the video
				197	in real time and to avoid any coding tools that could increase latency, such
				198	as frame look ahead.
				199
				200	<b>Live Streams:</b> In cases such as live streaming of games or events, it
				201	may be possible to allow some limited buffering of the video and use of
				202	lookahead coding tools to improve encoding quality. However, whilst a lag of
				203	a second or two may be fine given the one way nature of this type of video,
				204	it is clearly not possible to use tools such as two pass coding.
				205
				206	<b>Broadcast:</b> Broadcast video (e.g. digital TV over satellite) may have
				207	specific requirements such as frequent and regular key frames (e.g. once per
				208	second or more) as these are important as entry points to users when switching
				209	channels. There may also be strict upper limits on bandwidth over a short
				210	window of time.
				211
				212	<b>Download and Play:</b> Download and play applications may have less strict
				213	requirements in terms of local frame by frame rate control but there may be a
				214	requirement to accurately hit a file size target for the video clip as a
				215	whole. Similar considerations may apply to playback from mass storage devices
				216	such as DVD or disk drives.
				217
				218	<b>Editing:</b> In certain special use cases such as offline editing, it may
				219	be desirable to have very high quality and data rate but also very frequent
				220	key frames or indeed to encode the video exclusively as key frames. Lossless
				221	video encoding may also be required in this use case.
				222
				223	<b>VOD / Streaming:</b> One of the most important and common use cases for AV1
				224	is video on demand or streaming, for services such as YouTube and Netflix. In
				225	this use case it is possible to do two or even multi-pass encoding to improve
				226	compression efficiency. Streaming services will often store many encoded
				227	copies of a video at different resolutions and data rates to support users
				228	with different types of playback device and bandwidth limitations.
				229	Furthermore, these services support dynamic switching between multiple
				230	streams, so that they can respond to changing network conditions.
				231
Exact rate control when encoding for a specific format (e.g. 360P or 1080P on
				233	YouTube) may not be critical, provided that the video bandwidth remains within
				234	allowed limits. Whilst a format may have a nominal target data rate, this can
				235	be considered more as the desired average egress rate over the video corpus
				236	rather than a strict requirement for any individual clip. Indeed, in order
				237	to maintain optimal quality of experience for the end user, it may be
				238	desirable to encode some easier videos or sections of video at a lower data
				239	rate and harder videos or sections at a higher rate.
				240
				241	VOD / streaming does not usually require very frequent key frames (as in the
				242	broadcast case) but key frames are important in trick play (scanning back and
				243	forth to different points in a video) and for adaptive stream switching. As
				244	such, in a use case like YouTube, there is normally an upper limit on the
				245	maximum time between key frames of a few seconds, but within certain limits
				246	the encoder can try to align key frames with real scene cuts.
				247
				248	Whilst encoder speed may not seem to be as critical in this use case, for
				249	services such as YouTube, where millions of new videos have to be encoded
				250	every day, encoder speed is still important, so libaom allows command line
				251	control of the encode speed vs quality trade off.
				252
				253	<b>Fixed Quality / Testing Mode:</b> Libaom also has a fixed quality encoder
				254	pathway designed for testing under highly constrained conditions.
				255
				256	\section architecture_enc_speed_quality Speed vs Quality Trade Off
				257
				258	In any modern video encoder there are trade offs that can be made in regard to
				259	the amount of time spent encoding a video or video frame vs the quality of the
				260	final encode.
				261
				262	These trade offs typically limit the scope of the search for an optimal
				263	prediction / transform combination with faster encode modes doing fewer
				264	partition, reference frame, prediction mode and transform searches at the cost
				265	of some reduction in coding efficiency.
				266
				267	The pruning of the size of the search tree is typically based on assumptions
				268	about the likelihood of different search modes being selected based on what
				269	has gone before and features such as the dimensions of the video frames and
				270	the Q value selected for encoding the frame. For example certain intra modes
				271	are less likely to be chosen at high Q but may be more likely if similar
				272	modes were used for the previously coded blocks above and to the left of the
				273	current block.
				274
				275	The speed settings depend both on the use case (e.g. Real Time encoding) and
				276	an explicit speed control passed in on the command line as <b>--cpu-used</b>
				277	and stored in the \ref AV1_COMP.speed field of the main compressor instance
				278	data structure (<b>cpi</b>).
				279
The control flags for the speed trade off are stored in the \ref AV1_COMP.sf
field of the compressor instance and are set in the following functions:
				282
				283	- \ref av1_set_speed_features_framesize_independent()
				284	- \ref av1_set_speed_features_framesize_dependent()
				285	- \ref av1_set_speed_features_qindex_dependent()
				286
				287	A second factor impacting the speed of encode is rate distortion optimisation
				288	(<b>rd vs non-rd</b> encoding).
				289
				290	When rate distortion optimization is enabled each candidate combination of
				291	a prediction mode and transform coding strategy is fully encoded and the
				292	resulting error (or distortion) as compared to the original source and the
				293	number of bits used, are passed to a rate distortion function. This function
				294	converts the distortion and cost in bits to a single <b>RD</b> value (where
lower is better). This <b>RD</b> value is used to decide between different
encoding strategies for the current block where, for example, one strategy may
result in lower distortion but use a larger number of bits.
				298
				299	The calculation of this <b>RD</b> value is broadly speaking as follows:
				300
\f[
RD = (\lambda \cdot Rate) + Distortion
\f]
				304
				305	This assumes a linear relationship between the number of bits used and
				306	distortion (represented by the rate multiplier value <b>λ</b>) which is
				307	not actually valid across a broad range of rate and distortion values.
				308	Typically, where distortion is high, expending a small number of extra bits
				309	will result in a large change in distortion. However, at lower values of
				310	distortion the cost in bits of each incremental improvement is large.
				311
				312	To deal with this we scale the value of <b>λ</b> based on the quantizer
				313	value chosen for the frame. This is assumed to be a proxy for our approximate
				314	position on the true rate distortion curve and it is further assumed that over
				315	a limited range of distortion values, a linear relationship between distortion
				316	and rate is a valid approximation.
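
As a minimal sketch of how such a cost can drive a decision, the code below
evaluates the formula above for a set of candidates and picks the cheapest.
It uses floating point for clarity and does not reproduce libaom's own RD
computation. The type and function names are hypothetical.

```c
#include <stddef.h>

// One candidate coding choice for a block (hypothetical type).
typedef struct {
  double rate;        // cost in bits
  double distortion;  // error versus the source, e.g. a sum of squared errors
} SketchCandidate;

// RD = (lambda * Rate) + Distortion, lower is better.
static double sketch_rd_cost(double lambda, const SketchCandidate *cand) {
  return lambda * cand->rate + cand->distortion;
}

// Returns the index of the candidate with the lowest RD cost.
// Assumes num_cands >= 1.
static size_t sketch_pick_best(const SketchCandidate *cands, size_t num_cands,
                               double lambda) {
  size_t best = 0;
  for (size_t i = 1; i < num_cands; ++i) {
    if (sketch_rd_cost(lambda, &cands[i]) <
        sketch_rd_cost(lambda, &cands[best])) {
      best = i;
    }
  }
  return best;
}
```

With candidates of (100 bits, distortion 500) and (200 bits, distortion 300),
a small multiplier favours the second, lower distortion candidate, while a
larger multiplier makes its extra bits too expensive and the first one wins.
This is why the choice of multiplier for a given quantizer matters.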
				317
				318	Doing a rate distortion test on each candidate prediction / transform
				319	combination is expensive in terms of cpu cycles. Hence, for cases where encode
				320	speed is critical, libaom implements a non-rd pathway where the <b>RD</b>
				321	value is estimated based on the prediction error and quantizer setting.
				322
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	323	\section architecture_enc_src_proc Source Frame Processing
				324
				325	\subsection architecture_enc_frame_proc_data Main Data Structures
				326
				327	The following are the main data structures referenced in this section
				328	(see also \ref architecture_enc_data_structures):
				329
Tarundeep Singh	4593fcf	2021-03-31 00:53:31 +0530	[diff] [blame]	330	- \ref AV1_PRIMARY ppi (the primary compressor instance data structure)
Angie Chiang	29aaace	2021-11-15 16:23:42 -0800	[diff] [blame]	331	- \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)
Tarundeep Singh	4593fcf	2021-03-31 00:53:31 +0530	[diff] [blame]	332
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	333	- \ref AV1_COMP cpi (the main compressor instance data structure)
				334	- \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	335
				336	- \ref AV1EncoderConfig (Encoder configuration parameters)
				337	- \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
				338	- \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)
				339
				340	- \ref AlgoCfg (Algorithm related configuration parameters)
				341	- \ref AlgoCfg.arnr_max_frames
				342	- \ref AlgoCfg.arnr_strength
				343
				344	- \ref KeyFrameCfg (Keyframe coding configuration parameters)
				345	- \ref KeyFrameCfg.enable_keyframe_filtering
				346
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	347	\subsection architecture_enc_frame_proc_ingest Frame Ingest / Coding Pipeline
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	348
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	349	To encode a frame, first call \ref av1_receive_raw_frame() to obtain the raw
				350	frame data. Then call \ref av1_get_compressed_data() to encode raw frame data
				351	into compressed frame data. The main body of \ref av1_get_compressed_data()
				352	is \ref av1_encode_strategy(), which determines high-level encode strategy
				353	(frame type, frame placement, etc.) and then encodes the frame by calling
				354	\ref av1_encode(). In \ref av1_encode(), \ref av1_first_pass() will execute
the first pass of two-pass encoding, while \ref encode_frame_to_data_rate()
				356	will perform the final pass for either one-pass or two-pass encoding.
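
The call order described above can be summarised with a small stand-alone
sketch. The <b>sketch_</b> functions are hypothetical stand-ins that only
record the name of the libaom function they represent; they do not reflect
the real signatures or behaviour.

```c
#include <string.h>

// Which pass the sketch is running (hypothetical type, not a libaom one).
typedef enum { SKETCH_FIRST_PASS, SKETCH_FINAL_PASS } SketchPass;

// Records the order in which the stand-in functions are entered.
typedef struct {
  char text[256];
} SketchTrace;

static void sketch_note(SketchTrace *trace, const char *name) {
  if (trace->text[0] != 0) {
    strncat(trace->text, " -> ",
            sizeof(trace->text) - strlen(trace->text) - 1);
  }
  strncat(trace->text, name, sizeof(trace->text) - strlen(trace->text) - 1);
}

static void sketch_encode(SketchTrace *trace, SketchPass pass) {
  sketch_note(trace, "av1_encode");
  // The first pass of two-pass encoding takes one branch; the final pass of
  // either one-pass or two-pass encoding takes the other.
  sketch_note(trace, pass == SKETCH_FIRST_PASS ? "av1_first_pass"
                                               : "encode_frame_to_data_rate");
}

static void sketch_encode_strategy(SketchTrace *trace, SketchPass pass) {
  sketch_note(trace, "av1_encode_strategy");
  sketch_encode(trace, pass);
}

static void sketch_get_compressed_data(SketchTrace *trace, SketchPass pass) {
  sketch_note(trace, "av1_get_compressed_data");
  sketch_encode_strategy(trace, pass);
}

// Models the handling of a single frame: ingest first, then encode.
static void sketch_encode_one_frame(SketchTrace *trace, SketchPass pass) {
  trace->text[0] = 0;
  sketch_note(trace, "av1_receive_raw_frame");
  sketch_get_compressed_data(trace, pass);
}
```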
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	357
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	358	The main body of \ref encode_frame_to_data_rate() is
				359	\ref encode_with_recode_loop_and_filter(), which handles encoding before
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	360	in-loop filters (with recode loops \ref encode_with_recode_loop(), or
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	361	without any recode loop \ref encode_without_recode()), followed by in-loop
				362	filters (deblocking filters \ref loopfilter_frame(), CDEF filters and
				363	restoration filters \ref cdef_restoration_frame()).
				364
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	365	Except for rate/quality control, both \ref encode_with_recode_loop() and
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	366	\ref encode_without_recode() call \ref av1_encode_frame() to manage the
				367	reference frame buffers and \ref encode_frame_internal() to perform the
				368	rest of encoding that does not require access to external frames.
				369	\ref encode_frame_internal() is the starting point for the partition search
				370	(see \ref architecture_enc_partitions).
				371
				372	\subsection architecture_enc_frame_proc_tf Temporal Filtering
				373
				374	\subsubsection architecture_enc_frame_proc_tf_overview Overview
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	375
				376	Video codecs exploit the spatial and temporal correlations in video signals to
				377	achieve compression efficiency. The noise factor in the source signal
				378	attenuates such correlation and impedes the codec performance. Denoising the
				379	video signal is potentially a promising solution.
				380
				381	One strategy for denoising a source is motion compensated temporal filtering.
				382	Unlike image denoising, where only the spatial information is available,
				383	video denoising can leverage a combination of the spatial and temporal
				384	information. Specifically, in the temporal domain, similar pixels can often be
				385	tracked along the motion trajectory of moving objects. Motion estimation is
				386	applied to neighboring frames to find similar patches or blocks of pixels that
				387	can be combined to create a temporally filtered output.
				388
				389	AV1, in common with VP8 and VP9, uses an in-loop motion compensated temporal
				390	filter to generate what are referred to as alternate reference frames (or ARF
				391	frames). These can be encoded in the bitstream and stored as frame buffers for
				392	use in the prediction of subsequent frames, but are not usually directly
				393	displayed (hence they are sometimes referred to as non-display frames).
				394
The following command line parameters set the strength of the filter and the
number of frames used, and determine whether filtering is allowed for key
frames.
				398
				399	- <b>--arnr-strength</b> (\ref AlgoCfg.arnr_strength)
				400	- <b>--arnr-maxframes</b> (\ref AlgoCfg.arnr_max_frames)
				401	- <b>--enable-keyframe-filtering</b>
				402	(\ref KeyFrameCfg.enable_keyframe_filtering)
				403
				404	Note that in AV1, the temporal filtering scheme is designed around the
				405	hierarchical ARF based pyramid coding structure. We typically apply denoising
only on key frames and ARF frames at the highest (and sometimes the second
				407	highest) layer in the hierarchical coding structure.
				408
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	409	\subsubsection architecture_enc_frame_proc_tf_algo Temporal Filtering Algorithm
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	410
				411	Our method divides the current frame into "MxM" blocks. For each block, a
				412	motion search is applied on frames before and after the current frame. Only
				413	the best matching patch with the smallest mean square error (MSE) is kept as a
				414	candidate patch for a neighbour frame. The current block is also a candidate
				415	patch. A total of N candidate patches are combined to generate the filtered
				416	output.
				417
				418	Let f(i) represent the filtered sample value and \f$p_{j}(i)\f$ the sample
				419	value of the j-th patch. The filtering process is:
				420
\f[
f(i) = \frac{p_{0}(i) + \sum_{j=1}^{N} \omega_{j}(i) \cdot p_{j}(i)}
{1 + \sum_{j=1}^{N} \omega_{j}(i)}
\f]
				425
where \f$ \omega_{j}(i) \f$ is the weight of the j-th patch from a total of
N patches. The weight is determined by the patch difference as:

\f[
\omega_{j}(i) = \exp\left(-\frac{D_{j}(i)}{h^2}\right)
\f]
				432
where \f$ D_{j}(i) \f$ is the sum of squared differences between the current
block and the j-th candidate patch:

\f[
D_{j}(i) = \sum_{k \in \Omega_{i}} \| p_{0}(k) - p_{j}(k) \|_{2}^{2}
\f]
				439
				440	where:
				441	- \f$p_{0}\f$ refers to the current frame.
- \f$\Omega_{i}\f$ is the patch window, an "LxL" pixel square.
				443	- h is a critical parameter that controls the decay of the weights measured by
				444	the Euclidean distance. It is derived from an estimate of noise amplitude in
				445	the source. This allows the filter coefficients to adapt for videos with
				446	different noise characteristics.
				447	- Usually, M = 32, N = 7, and L = 5, but they can be adjusted.
				448
It is recommended that the reader refer to the code for more details.
				450
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	451	\subsubsection architecture_enc_frame_proc_tf_funcs Temporal Filter Functions
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	452
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	453	The main entry point for temporal filtering is \ref av1_temporal_filter().
				454	This function returns 1 if temporal filtering is successful, otherwise 0.
				455	When temporal filtering is applied, the filtered frame will be held in
Angie Chiang	29aaace	2021-11-15 16:23:42 -0800	[diff] [blame]	456	the output_frame, which is the frame to be
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	457	encoded in the following encoding process.
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	458
				459	Almost all temporal filter related code is in av1/encoder/temporal_filter.c
				460	and av1/encoder/temporal_filter.h.
				461
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	462	Inside \ref av1_temporal_filter(), the reader's attention is directed to
				463	\ref tf_setup_filtering_buffer() and \ref tf_do_filtering().
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	464
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	465	- \ref tf_setup_filtering_buffer(): sets up the frame buffer for
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	466	temporal filtering, determines the number of frames to be used, and
				467	calculates the noise level of each frame.
				468
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	469	- \ref tf_do_filtering(): the main function for the temporal
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	470	filtering algorithm. It breaks each frame into "MxM" blocks. For each
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	471	block a motion search \ref tf_motion_search() is applied to find
				472	the motion vector from one neighboring frame. tf_build_predictor() is then
				473	called to build the matching patch and \ref av1_apply_temporal_filter_c() (see
				474	also optimised SIMD versions) to apply temporal filtering. The weighted
				475	average over each pixel is accumulated and finally normalized in
				476	\ref tf_normalize_filtered_frame() to generate the final filtered frame.
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	477
Paul Wilkins	c84e8e2	2020-07-21 19:09:33 +0100	[diff] [blame]	478	- \ref av1_apply_temporal_filter_c(): the core function of our temporal
				479	filtering algorithm (see also optimised SIMD versions).
Paul Wilkins	3ceb7c7	2020-07-14 14:02:52 +0100	[diff] [blame]	480
				481	\subsection architecture_enc_frame_proc_film Film Grain Modelling
				482
				483	Add details here.
				484
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	485	\section architecture_enc_rate_ctrl Rate Control
				486
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	487	\subsection architecture_enc_rate_ctrl_data Main Data Structures
				488
				489	The following are the main data structures referenced in this section
				490	(see also \ref architecture_enc_data_structures):
				491
Mufaddal Chakera	358cf21	2021-02-25 14:41:56 +0530	[diff] [blame]	492	- \ref AV1_PRIMARY ppi (the primary compressor instance data structure)
				493	- \ref AV1_PRIMARY.twopass (\ref TWO_PASS)
				494
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	495	- \ref AV1_COMP cpi (the main compressor instance data structure)
				496	- \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
				497	- \ref AV1_COMP.rc (\ref RATE_CONTROL)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	498	- \ref AV1_COMP.sf (\ref SPEED_FEATURES)
				499
				500	- \ref AV1EncoderConfig (Encoder configuration parameters)
				501	- \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	502
				503	- \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first
				504	pass stats)
				505
				506	- \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
				507	- \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)
				508
				509	\subsection architecture_enc_rate_ctrl_options Supported Rate Control Options
				510
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	511	Different use cases (\ref architecture_enc_use_cases) may have different
				512	requirements in terms of data rate control.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	513
				514	The broad rate control strategy is selected using the <b>--end-usage</b>
				515	parameter on the command line, which maps onto the field
				516	\ref aom_codec_enc_cfg_t.rc_end_usage in \ref aom_encoder.h.
				517
				518	The four supported options are:-
				519
				520	- <b>VBR</b> (Variable Bitrate)
				521	- <b>CBR</b> (Constant Bitrate)
				522	- <b>CQ</b> (Constrained Quality mode; a constrained variant of VBR)
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	523	- <b>Fixed Q</b> (Constant quality or Q mode)
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	524
				525	The value of \ref aom_codec_enc_cfg_t.rc_end_usage is in turn copied over
				526	into the encoder rate control configuration data structure as
Paul Wilkins	1dd7a7e	2020-07-09 17:07:35 +0100	[diff] [blame]	527	\ref RateControlCfg.mode.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	528
				529	With regard to the most important use cases above, video on demand uses
				530	either VBR or CQ mode. CBR is the preferred rate control model for RTC and
				531	live streaming, and Fixed Q is only used in testing.
				532
				533	The behaviour of each of these modes is regulated by a series of secondary
				534	command line rate control options but also depends somewhat on the selected
				535	use case, whether 2-pass coding is enabled and the selected encode speed vs
				536	quality trade offs (\ref AV1_COMP.speed and \ref AV1_COMP.sf).
				537
				538	The list below gives the names of the main rate control command line
				539	options together with the names of the corresponding fields in the rate
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	540	control configuration data structures.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	541
Paul Wilkins	1dd7a7e	2020-07-09 17:07:35 +0100	[diff] [blame]	542	- <b>--target-bitrate</b> (\ref RateControlCfg.target_bandwidth)
				543	- <b>--min-q</b> (\ref RateControlCfg.best_allowed_q)
				544	- <b>--max-q</b> (\ref RateControlCfg.worst_allowed_q)
				545	- <b>--cq-level</b> (\ref RateControlCfg.cq_level)
				546	- <b>--undershoot-pct</b> (\ref RateControlCfg.under_shoot_pct)
				547	- <b>--overshoot-pct</b> (\ref RateControlCfg.over_shoot_pct)
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	548
Debargha Mukherjee	c6a8120	2020-07-22 16:35:20 -0700	[diff] [blame]	549	The following control aspects of VBR encoding:
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	550
Debargha Mukherjee	c6a8120	2020-07-22 16:35:20 -0700	[diff] [blame]	551	- <b>--bias-pct</b> (\ref RateControlCfg.vbrbias)
				552	- <b>--minsection-pct</b> (\ref RateControlCfg.vbrmin_section)
				553	- <b>--maxsection-pct</b> (\ref RateControlCfg.vbrmax_section)
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	554
				555	The following relate to buffer and delay management in one pass low delay and
				556	real time coding:
				557
Paul Wilkins	1dd7a7e	2020-07-09 17:07:35 +0100	[diff] [blame]	558	- <b>--buf-sz</b> (\ref RateControlCfg.maximum_buffer_size_ms)
				559	- <b>--buf-initial-sz</b> (\ref RateControlCfg.starting_buffer_level_ms)
				560	- <b>--buf-optimal-sz</b> (\ref RateControlCfg.optimal_buffer_level_ms)
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	561
				562	\subsection architecture_enc_vbr Variable Bitrate (VBR) Encoding
				563
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	564	For streamed VOD content the most common rate control strategy is Variable
				565	Bitrate (VBR) encoding. The CQ mode mentioned above is a variant of this
				566	where additional quantizer and quality constraints are applied. VBR
				567	encoding may in theory be used in conjunction with either 1-pass or 2-pass
				568	encoding.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	569
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	570	VBR encoding varies the number of bits given to each frame or group of frames
				571	according to the difficulty of that frame or group of frames, such that easier
				572	frames are allocated fewer bits and harder frames are allocated more bits. The
				573	intent here is to even out the quality between frames. This contrasts with
				574	Constant Bitrate (CBR) encoding where each frame is allocated the same number
				575	of bits.
				576
				577	Whilst for any given frame or group of frames the data rate may vary, the VBR
				578	algorithm attempts to deliver a given average bitrate over a wider time
				579	interval. In standard VBR encoding, the time interval over which the data rate
				580	is averaged is usually the duration of the video clip. An alternative
				581	approach is to target an average VBR bitrate over the entire video corpus for
				582	a particular video format (corpus VBR).
				583
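As a minimal sketch of the idea (not libaom's actual allocation code), the
function below splits a fixed bit budget across frames in proportion to a
per-frame complexity score, so that harder frames receive more bits while the
total over the clip stays on target. The name allocate_vbr_bits() and the
complexity scores are hypothetical and used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical illustration only: share total_bits between frames in
// proportion to each frame's complexity score.
void allocate_vbr_bits(const double *complexity, size_t num_frames,
                       int64_t total_bits, int64_t *frame_bits) {
  assert(num_frames > 0);
  double total_complexity = 0.0;
  for (size_t i = 0; i < num_frames; ++i) total_complexity += complexity[i];
  for (size_t i = 0; i < num_frames; ++i) {
    // Fall back to an even split if no complexity was measured.
    const double share = total_complexity > 0.0
                             ? complexity[i] / total_complexity
                             : 1.0 / (double)num_frames;
    frame_bits[i] = (int64_t)(share * (double)total_bits);
  }
}
```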
				584	\subsubsection architecture_enc_1pass_vbr 1 Pass VBR Encoding
				585
				586	The command line for libaom does allow 1 Pass VBR, but this has not been
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	587	properly optimised and behaves much like 1 pass CBR in most regards, with bits
				588	allocated to frames by the following functions:
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	589
James Zern	e881616	2024-10-18 19:16:58 -0700	[diff] [blame^]	590	- \ref av1_calc_iframe_target_size_one_pass_vbr(
				591	const struct AV1_COMP *const cpi)
				592	"av1_calc_iframe_target_size_one_pass_vbr()"
				593	- \ref av1_calc_pframe_target_size_one_pass_vbr(
				594	const struct AV1_COMP *const cpi,
				595	FRAME_UPDATE_TYPE frame_update_type)
				596	"av1_calc_pframe_target_size_one_pass_vbr()"
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	597
				598	\subsubsection architecture_enc_2pass_vbr 2 Pass VBR Encoding
				599
				600	The main focus here will be on 2-pass VBR encoding (and the related CQ mode)
				601	as these are the modes most commonly used for VOD content.
				602
				603	2-pass encoding is selected on the command line by setting --passes=2
				604	(or -p 2).
				605
				606	Generally speaking, in 2-pass encoding, an encoder will first encode a video
				607	using a default set of parameters and assumptions. Depending on the outcome
				608	of that first encode, the baseline assumptions and parameters will be adjusted
				609	to optimize the output during the second pass. In essence the first pass is a
				610	fact finding mission to establish the complexity and variability of the video,
				611	in order to allow a better allocation of bits in the second pass.
				612
				613	The libaom 2-pass algorithm is unusual in that the first pass is not a full
				614	encode of the video. Rather it uses a limited set of prediction and transform
				615	options and a fixed quantizer, to generate statistics about each frame. No
				616	output bitstream is created and the per frame first pass statistics are stored
				617	entirely in volatile memory. This has some disadvantages when compared to a
				618	full first pass encode, but avoids the need for file I/O and improves speed.
				619
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	620	For two pass encoding, the function \ref av1_encode() will first be called
				621	for each frame in the video with the value \ref AV1EncoderConfig.pass = 1.
				622	This will result in calls to \ref av1_first_pass().
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	623
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	624	Statistics for each frame are stored in \ref FIRSTPASS_STATS frame_stats_buf.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	625
				626	After completion of the first pass, \ref av1_encode() will be called again for
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	627	each frame with \ref AV1EncoderConfig.pass = 2. The frames are then encoded in
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	628	accordance with the statistics gathered during the first pass by calls to
Paul Wilkins	a0816fc	2020-07-23 13:33:29 +0100	[diff] [blame]	629	\ref encode_frame_to_data_rate() which in turn calls
				630	\ref av1_get_second_pass_params().
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	631
				632	In summary the second pass code :-
				633
				634	- Searches for scene cuts (if auto key frame detection is enabled).
				635	- Defines the length of and hierarchical structure to be used in each
				636	ARF/GF group.
				637	- Allocates bits based on the relative complexity of each frame, the quality
				638	of frame to frame prediction and the type of frame (e.g. key frame, ARF
				639	frame, golden frame or normal leaf frame).
				640	- Suggests a maximum Q (quantizer value) for each ARF/GF group, based on
				641	estimated complexity and recent rate control compliance
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	642	(\ref RATE_CONTROL.active_worst_quality)
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	643	- Tracks adherence to the overall rate control objectives and adjusts
				644	heuristics.
				645
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	646	The main two pass functions in regard to the above include:-
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	647
Paul Wilkins	be20bc2	2020-07-16 14:46:57 +0100	[diff] [blame]	648	- \ref find_next_key_frame()
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	649	- \ref define_gf_group()
Paul Wilkins	be20bc2	2020-07-16 14:46:57 +0100	[diff] [blame]	650	- \ref calculate_total_gf_group_bits()
				651	- \ref get_twopass_worst_quality()
				652	- \ref av1_gop_setup_structure()
				653	- \ref av1_gop_bit_allocation()
				654	- \ref av1_twopass_postencode_update()
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	655
				656	For each frame, the two pass algorithm defines a target number of bits
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	657	\ref RATE_CONTROL.base_frame_target, which is then adjusted if necessary to
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	658	reflect any undershoot or overshoot on previous frames to give
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	659	\ref RATE_CONTROL.this_frame_target.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	660
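The adjustment from a base target to a per-frame target can be pictured with
the following simplified sketch. It is an assumption made for illustration, not
the logic used by libaom: the function name adjust_frame_target(), the idea of
spreading the accumulated error over a fixed number of frames, and the
percentage clamp are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical illustration: bits_off_target is positive when earlier
// frames used fewer bits than planned (undershoot) and negative after
// overshoot.
int64_t adjust_frame_target(int64_t base_frame_target,
                            int64_t bits_off_target, int correction_frames,
                            int max_adjust_pct) {
  assert(correction_frames > 0 && max_adjust_pct >= 0);
  // Spread the accumulated error over several frames.
  int64_t delta = bits_off_target / correction_frames;
  // Limit the adjustment to a percentage of the base target.
  const int64_t max_delta = base_frame_target * max_adjust_pct / 100;
  if (delta > max_delta) delta = max_delta;
  if (delta < -max_delta) delta = -max_delta;
  return base_frame_target + delta;
}
```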
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	661	As well as \ref RATE_CONTROL.active_worst_quality, the two pass code also
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	662	maintains a record of the actual Q value used to encode previous frames
				663	at each level in the current pyramid hierarchy
Aasaipriya	c6f0a0b	2021-08-12 11:27:03 +0530	[diff] [blame]	664	(\ref PRIMARY_RATE_CONTROL.active_best_quality). The function
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	665	\ref rc_pick_q_and_bounds(), uses these values to set a permitted Q range
				666	for each frame.
Paul Wilkins	83cfad4	2020-06-26 12:38:07 +0100	[diff] [blame]	667
				668	\subsubsection architecture_enc_1pass_lagged 1 Pass Lagged VBR Encoding
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	669
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	670	1 pass lagged encode falls between simple 1 pass encoding and full two pass
				671	encoding and is used for cases where it is not possible to do a full first
				672	pass through the entire video clip, but where some delay is permissible. For
				673	example near live streaming where there is a delay of up to a few seconds. In
				674	this case the first pass and second pass are in effect combined such that the
				675	first pass starts encoding the clip and the second pass lags behind it by a
				676	few frames. When using this method, full sequence level statistics are not
				677	available, but it is possible to collect and use frame or group of frame level
				678	data to help in the allocation of bits and in defining ARF/GF coding
Tarundeep Singh	5e5305a	2021-03-16 13:04:04 +0530	[diff] [blame]	679	hierarchies. The reader is referred to the \ref AV1_PRIMARY.lap_enabled field
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	680	in the main compressor instance (where <b>lap</b> stands for
Paul Wilkins	e8c76eb	2020-06-30 17:24:11 +0100	[diff] [blame]	681	<b>look ahead processing</b>). This encoding mode for the most part uses the
				682	same rate control pathways as two pass VBR encoding.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	683
				684	\subsection architecture_enc_rc_loop The Main Rate Control Loop
				685
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	686	Having established a target rate for a given frame and an allowed range of Q
				687	values, the encoder then tries to encode the frame at a rate that is as close
				688	as possible to the target value, given the Q range constraints.
				689
				690	There are two main mechanisms by which this is achieved.
				691
				692	The first selects a frame level Q, using an adaptive estimate of the number of
				693	bits that will be generated when the frame is encoded at any given Q.
				694	Fundamentally this mechanism is common to VBR, CBR and to use cases such as
				695	RTC with small adjustments.
				696
				697	As the Q value mainly adjusts the precision of the residual signal, it is not
				698	actually a reliable basis for accurately predicting the number of bits that
				699	will be generated across all clips. A well predicted clip, for example, may
				700	have a much smaller error residual after prediction. The algorithm copes with
				701	this by adapting its predictions on the fly using a feedback loop based on how
				702	well it did the previous time around.
				703
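The feedback idea can be sketched as follows. This is a simplified,
hypothetical model rather than the code behind the functions listed in this
section: the inverse relationship between bits and Q, the damping weight and
the clamp limits are all assumptions chosen for illustration.

```c
#include <assert.h>

// Hypothetical model: predicted bits fall in inverse proportion to Q,
// scaled by an adaptive correction factor.
double estimate_frame_bits(double correction_factor, double q) {
  assert(q > 0.0);
  const double base_bits_at_q1 = 1000000.0;  // Illustrative constant.
  return correction_factor * base_bits_at_q1 / q;
}

// After encoding, move the correction factor part of the way towards the
// value that would have made the prediction exact.
double update_correction_factor(double correction_factor,
                                double predicted_bits, double actual_bits) {
  assert(predicted_bits > 0.0);
  const double damping = 0.5;  // Illustrative: 0 = none, 1 = full.
  const double ideal = correction_factor * actual_bits / predicted_bits;
  double updated = correction_factor + damping * (ideal - correction_factor);
  // Clamp to an illustrative range to keep the model stable.
  if (updated < 0.01) updated = 0.01;
  if (updated > 50.0) updated = 50.0;
  return updated;
}
```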
				704	The main functions responsible for the prediction of Q and the adaptation over
				705	time, for the two pass encoding pipeline are:
				706
				707	- \ref rc_pick_q_and_bounds()
Paul Wilkins	5ce9d50	2020-07-16 17:58:40 +0100	[diff] [blame]	708	- \ref get_q()
James Zern	e881616	2024-10-18 19:16:58 -0700	[diff] [blame^]	709	- \ref av1_rc_regulate_q(
				710	const struct AV1_COMP *cpi, int target_bits_per_frame,
				711	int active_best_quality, int active_worst_quality,
				712	int width, int height) "av1_rc_regulate_q()"
Paul Wilkins	5ce9d50	2020-07-16 17:58:40 +0100	[diff] [blame]	713	- \ref get_rate_correction_factor()
				714	- \ref set_rate_correction_factor()
				715	- \ref find_closest_qindex_by_rate()
Paul Wilkins	be20bc2	2020-07-16 14:46:57 +0100	[diff] [blame]	716	- \ref av1_twopass_postencode_update()
Paul Wilkins	5ce9d50	2020-07-16 17:58:40 +0100	[diff] [blame]	717	- \ref av1_rc_update_rate_correction_factors()
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	718
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	719	A second mechanism for control comes into play if there is a large rate miss
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	720	for the current frame (much too big or too small). This is a recode mechanism
				721	which allows the current frame to be re-encoded one or more times with a
				722	revised Q value. This obviously has significant implications for encode speed
				723	and, in the case of RTC, latency (hence it is not used for the RTC pathway).
				724
				725	Whether or not a recode is allowed for a given frame depends on the selected
				726	encode speed vs quality trade off. This is set on the command line using the
				727	--cpu-used parameter which maps onto the \ref AV1_COMP.speed field in the main
				728	compressor instance data structure.
				729
				730	The value of \ref AV1_COMP.speed, combined with the use case, is used to
				731	populate the speed features data structure AV1_COMP.sf. In particular
				732	\ref HIGH_LEVEL_SPEED_FEATURES.recode_loop determines the types of frames that
				733	may be recoded and \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance is a rate
				734	error trigger threshold.
				735
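A rate error trigger of this kind can be sketched as below. The function
should_recode() and its parameters are hypothetical; the real decision is made
by the functions listed next and takes more state into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical illustration: decide whether a frame missed its target by
// more than tolerance_pct percent and so should be re-encoded.
bool should_recode(int64_t target_bits, int64_t actual_bits,
                   int tolerance_pct, bool recode_allowed) {
  assert(target_bits > 0 && tolerance_pct >= 0);
  // Recoding may be disabled for this frame type or speed setting.
  if (!recode_allowed) return false;
  const int64_t margin = target_bits * tolerance_pct / 100;
  return actual_bits > target_bits + margin ||
         actual_bits < target_bits - margin;
}
```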
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	736	For more information the reader is directed to the following functions:
Paul Wilkins	c4cfb44	2020-07-01 16:15:53 +0100	[diff] [blame]	737
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	738	- \ref encode_with_recode_loop()
Paul Wilkins	c8d3f11	2020-07-08 17:58:14 +0100	[diff] [blame]	739	- \ref encode_without_recode()
Paul Wilkins	591f047	2020-07-15 15:30:56 +0100	[diff] [blame]	740	- \ref recode_loop_update_q()
				741	- \ref recode_loop_test()
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	742	- \ref av1_set_speed_features_framesize_independent()
				743	- \ref av1_set_speed_features_framesize_dependent()
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	744
				745	\subsection architecture_enc_fixed_q Fixed Q Mode
				746
Paul Wilkins	ea2876f	2020-07-13 18:36:09 +0100	[diff] [blame]	747	There are two main fixed Q cases:
				748	-# Fixed Q with adaptive qp offsets: same qp offset for each pyramid level
				749	in a given video, but these offsets are adaptive based on video content.
				750	-# Fixed Q with fixed qp offsets: content-independent fixed qp offsets for
Jingning Han	4eed226	2021-09-08 15:48:50 -0700	[diff] [blame]	751	each pyramid level.
Paul Wilkins	ea2876f	2020-07-13 18:36:09 +0100	[diff] [blame]	752
				753	The reader is also referred to the following functions:
				754	- \ref av1_rc_pick_q_and_bounds()
				755	- \ref rc_pick_q_and_bounds_no_stats_cbr()
				756	- \ref rc_pick_q_and_bounds_no_stats()
				757	- \ref rc_pick_q_and_bounds()
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	758
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	759	\section architecture_enc_frame_groups GF / ARF Frame Groups & Hierarchical Coding
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	760
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	761	\subsection architecture_enc_frame_groups_data Main Data Structures
				762
				763	The following are the main data structures referenced in this section
				764	(see also \ref architecture_enc_data_structures):
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	765
				766	- \ref AV1_COMP cpi (the main compressor instance data structure)
				767	- \ref AV1_COMP.rc (\ref RATE_CONTROL)
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	768
				769	- \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first pass
				770	stats)
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	771
				772	\subsection architecture_enc_frame_groups_groups Frame Groups
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	773
				774	To process a sequence/stream of video frames, the encoder divides the frames
				775	into groups and encodes them sequentially (possibly dependent on previous
				776	groups). In AV1 such a group is usually referred to as a golden frame group
				777	(GF group) or sometimes an Alt-Ref (ARF) group or a group of pictures (GOP).
				778	A GF group determines and stores the coding structure of the frames (for
				779	example, frame type, usage of the hierarchical structure, usage of overlay
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	780	frames, etc.) and can be considered as the base unit to process the frames,
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	781	therefore playing an important role in the encoder.
				782
				783	The length of a specific GF group is arguably the most important aspect when
				784	determining a GF group. This is because most GF group level decisions are
				785	based on the frame characteristics, if not on the length itself directly.
				786	Note that the GF group is always a group of consecutive frames, which means
				787	the start and end of the group (so again, the length of it) determines which
				788	frames are included in it and hence determines the characteristics of the GF
				789	group. Therefore, in this document we will first discuss the GF group length
				790	decision in Libaom, followed by frame structure decisions when defining a GF
				791	group with a certain length.
				792
				793	\subsection architecture_enc_gf_length GF / ARF Group Length Determination
				794
				795	The basic intuition of determining the GF group length is that it is usually
				796	desirable to group together frames that are similar. Hence, we may choose
				797	longer groups when consecutive frames are very alike and shorter ones when
				798	they are very different.
				799
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	800	The determination of the GF group length is done in function \ref
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	801	calculate_gf_length(). The following encoder use cases are supported:
				802
				803	<ul>
Paul Wilkins	ff98f3e	2020-07-27 16:01:05 +0100	[diff] [blame]	804	<li><b>Single pass with look-ahead disabled (\ref has_no_stats_stage()):
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	805	</b> in this case there is no information available on the following stream
				806	of frames, therefore the function will set the GF group length for the
				807	current and the following GF groups (a total number of MAX_NUM_GF_INTERVALS
				808	groups) to be the maximum value allowed.</li>
				809
Tarundeep Singh	5e5305a	2021-03-16 13:04:04 +0530	[diff] [blame]	810	<li><b>Single pass with look-ahead enabled (\ref AV1_PRIMARY.lap_enabled):</b>
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	811	look-ahead processing is enabled for single pass, therefore there is a
				812	limited amount of information available regarding future frames. In this
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	813	case the function will determine the length based on \ref FIRSTPASS_STATS
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	814	(which is generated when processing the look-ahead buffer) for only the
				815	current GF group.</li>
				816
				817	<li><b>Two pass:</b> the first pass in two-pass encoding collects the stats
				818	and will not call the function. In the second pass, the function tries to
				819	determine the GF group length of the current and the following GF groups (a
				820	total number of MAX_NUM_GF_INTERVALS groups) based on the first-pass
				821	statistics. Note that as we will be discussing later, such decisions may not
				822	be accurate and can be changed later.</li>
				823	</ul>
				824
				825	Except for the first trivial case where there is no prior knowledge of the
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	826	following frames, the function \ref calculate_gf_length() tries to determine the
				827	GF group length based on the first pass statistics. The determination is divided
				828	into two parts:
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	829
				830	<ol>
				831	<li>Baseline decision based on accumulated statistics: this part of the function
				832	iterates through the first-pass statistics of the following frames and
				833	accumulates the statistics with function accumulate_next_frame_stats.
				834	The accumulated statistics are then used to determine whether the
				835	correlation in the GF group has dropped too much in function detect_gf_cut.
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	836	If detect_gf_cut returns non-zero, or if we've reached the end of
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	837	first-pass statistics, the baseline decision is set at the current point.</li>
				838
				839	<li>If we are not at the end of the first-pass statistics, the next part will
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	840	try to refine the baseline decision. This algorithm is based on the analysis
				841	of first-pass stats. It tries to cut the groups in stable regions or
				842	relatively stable points. Also it tries to avoid cutting in a blending
				843	region.</li>
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	844	</ol>
				845
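The baseline part of this process (accumulate statistics, then test for a cut)
can be sketched as follows. This is a hypothetical simplification: the single
per-frame correlation score, the multiplicative accumulation and the threshold
test stand in for the richer statistics that the real code accumulates, and
baseline_gf_length() is not a libaom function.

```c
#include <assert.h>

// Hypothetical illustration of the baseline decision. frame_correlation[i]
// is an assumed per-frame similarity score in [0, 1] relative to the
// previous frame. Returns the number of frames in the group.
int baseline_gf_length(const double *frame_correlation, int frames_available,
                       int max_gf_length,
                       double min_accumulated_correlation) {
  assert(frames_available > 0 && max_gf_length > 0);
  double accumulated = 1.0;
  int length = 0;
  while (length < frames_available && length < max_gf_length) {
    // Accumulate the statistic for the next candidate frame.
    accumulated *= frame_correlation[length];
    // Cut the group before a frame that drags correlation too low, but
    // always keep at least one frame in the group.
    if (length > 0 && accumulated < min_accumulated_correlation) break;
    ++length;
  }
  return length;
}
```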
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	846	As mentioned, for two-pass encoding, the function \ref
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	847	calculate_gf_length() tries to determine the length of as many as
				848	MAX_NUM_GF_INTERVALS groups. The decisions are stored in
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	849	\ref PRIMARY_RATE_CONTROL.gf_intervals[]. The variables
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	850	\ref RATE_CONTROL.intervals_till_gf_calculate_due and
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	851	\ref PRIMARY_RATE_CONTROL.gf_intervals[] help with managing and updating the stored
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	852	decisions. In the function \ref define_gf_group(), the corresponding
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	853	stored length decision will be used to define the current GF group.
				854
				855	When the maximum GF group length is greater than or equal to 32, the encoder will
				856	enforce an extra layer to determine whether to use maximum GF length of 32
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	857	or 16 for every GF group. In such a case, \ref calculate_gf_length() is
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	858	first called with the original maximum length (>=32). Afterwards,
Paul Wilkins	ff98f3e	2020-07-27 16:01:05 +0100	[diff] [blame]	859	\ref av1_tpl_setup_stats() is called to analyze the determined GF group
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	860	and compare the reference to the last frame and the middle frame. If it is
				861	decided that we should use a maximum GF length of 16, the function
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	862	\ref calculate_gf_length() is called again with the updated maximum
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	863	length, and it only sets the length for a single GF group
				864	(\ref RATE_CONTROL.intervals_till_gf_calculate_due is set to 1). This process
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	865	is shown below.
				866
				867	\image html tplgfgroupdiagram.png "" width=40%
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	868
				869	Before encoding each frame, the encoder checks
				870	\ref RATE_CONTROL.frames_till_gf_update_due. If it is zero, indicating
				871	processing of the current GF group is done, the encoder will check whether
				872	\ref RATE_CONTROL.intervals_till_gf_calculate_due is zero. If it is, as
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	873	discussed above, \ref calculate_gf_length() is called with original
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	874	maximum length. If it is not zero, then the GF group length value stored
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	875	in \ref PRIMARY_RATE_CONTROL.gf_intervals[\ref PRIMARY_RATE_CONTROL.cur_gf_index] is used
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	876	(subject to change as discussed above).
				877
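The per-frame check described above can be summarised with the sketch below.
The struct and enum are hypothetical, simplified mirrors of the two counters
named in the text; they are not the libaom data structures.

```c
#include <assert.h>

// Hypothetical, simplified mirror of the counters described above.
typedef struct {
  int frames_till_gf_update_due;
  int intervals_till_gf_calculate_due;
} GfCounters;

typedef enum {
  GF_CONTINUE_CURRENT_GROUP,  // Current GF group still has frames left.
  GF_RECALCULATE_LENGTHS,     // Stored decisions used up: recompute.
  GF_USE_STORED_LENGTH        // Start a new group from a stored length.
} GfAction;

GfAction next_gf_action(const GfCounters *counters) {
  assert(counters->frames_till_gf_update_due >= 0);
  if (counters->frames_till_gf_update_due > 0)
    return GF_CONTINUE_CURRENT_GROUP;
  if (counters->intervals_till_gf_calculate_due == 0)
    return GF_RECALCULATE_LENGTHS;
  return GF_USE_STORED_LENGTH;
}
```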
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	878	\subsection architecture_enc_gf_structure Defining a GF Group's Structure
				879
				880	The function \ref define_gf_group() defines the frame structure as well
				881	as other GF group level parameters (e.g. bit allocation) once the length of
				882	the current GF group is determined.
				883
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	884	The function first iterates through the first pass statistics in the GF group to
				885	accumulate various stats, using accumulate_this_frame_stats() and
				886	accumulate_next_frame_stats(). The accumulated statistics are then used to
				887	determine the use of an ALTREF frame along with other properties of the
Mufaddal Chakera	94ee9bf	2021-04-12 01:02:22 +0530	[diff] [blame]	888	GF group. The values of \ref PRIMARY_RATE_CONTROL.cur_gf_index, \ref
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	889	RATE_CONTROL.intervals_till_gf_calculate_due and \ref
				890	RATE_CONTROL.frames_till_gf_update_due are also updated accordingly.
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	891
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	892	The function \ref av1_gop_setup_structure() is called at the end to determine
				893	the frame layers and reference maps in the GF group, where the
				894	construct_multi_layer_gf_structure() function sets the frame update types for
				895	each frame and the group structure.
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	896
				897	- If ALTREF frames are allowed for the GF group: the first frame is set to
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	898	KF_UPDATE, GF_UPDATE or ARF_UPDATE. The last frame of the GF group is set to
				899	OVERLAY_UPDATE. Then in set_multi_layer_params(), frame update
				900	types are determined recursively in a binary tree fashion, and assigned to
				901	give the final IBBB structure for the group.
				902	  - If the current branch has more than 2 frames and we have not reached
				903	    maximum layer depth, then the middle frame is set as INTNL_ARF_UPDATE, and
				904	    the left and right branches are processed recursively.
				905	  - If the current branch has fewer than 3 frames, or we have reached maximum
				906	    layer depth, then every frame in the branch is set to LF_UPDATE.
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	907
Bohan Li	cb3b65b	2020-11-04 13:50:00 -0800	[diff] [blame]	908	- If ALTREF frames are not allowed for the GF group: the frames are set
				909	as LF_UPDATE. This basically forms an IPPP GF group structure.
				910
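The recursive, binary tree style assignment for the ALTREF case can be
illustrated with the sketch below. It is not the libaom implementation: the
enum values are hypothetical stand-ins for LF_UPDATE and INTNL_ARF_UPDATE, the
frames are indexed in display order, and only the frames lying between the
first frame and the final ARF / overlay frame are covered.

```c
#include <assert.h>

// Hypothetical stand-ins for two of the frame update types named above.
typedef enum { SKETCH_LF_UPDATE, SKETCH_INTNL_ARF_UPDATE } SketchUpdateType;

// Assign update types to the frames in the half-open range [start, end),
// which lies between two frames whose types are already decided.
void assign_update_types(SketchUpdateType *types, int start, int end,
                         int depth, int max_depth) {
  assert(start <= end);
  const int num_frames = end - start;
  if (num_frames < 3 || depth >= max_depth) {
    // Small branch or maximum depth reached: all frames are leaf frames.
    for (int i = start; i < end; ++i) types[i] = SKETCH_LF_UPDATE;
    return;
  }
  // Otherwise the middle frame becomes an internal ARF and each half is
  // processed recursively one layer deeper.
  const int mid = start + num_frames / 2;
  types[mid] = SKETCH_INTNL_ARF_UPDATE;
  assign_update_types(types, start, mid, depth + 1, max_depth);
  assign_update_types(types, mid + 1, end, depth + 1, max_depth);
}
```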
				911	As mentioned, the encoder may use temporal dependency modelling (TPL - see \ref
				912	architecture_enc_tpl) to determine whether we should use a maximum length of 32
				913	or 16 for the current GF group. This requires calls to \ref define_gf_group()
				914	but should not change other settings (since it is in essence a trial). This
				915	special case is indicated by setting the parameter <b>is_final_pass</b> to
				916	zero.
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	917
				918	For single pass encodes where look-ahead processing is disabled
Tarundeep Singh	5e5305a	2021-03-16 13:04:04 +0530	[diff] [blame]	919	(\ref AV1_PRIMARY.lap_enabled = 0), \ref define_gf_group_pass0() is used
Paul Wilkins	e8af152	2020-07-09 15:05:01 +0100	[diff] [blame]	920	instead of \ref define_gf_group().
				921
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	922	\subsection architecture_enc_kf_groups Key Frame Groups
				923
				924	A special constraint for GF group length is the location of the next keyframe
				925	(KF). The frames between two KFs are referred to as a KF group. Each KF group
				926	can be encoded and decoded independently. Because of this, a GF group cannot
				927	span beyond a KF and the location of the next KF is set as a hard boundary
				928	for GF group length.
				929
				930	<ul>
				931	<li>For two-pass encoding \ref RATE_CONTROL.frames_to_key controls when to
				932	encode a key frame. When it is zero, the current frame is a keyframe and
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	933	the function \ref find_next_key_frame() is called. This in turn calls
				934	\ref define_kf_interval() to work out where the next key frame should
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	935	be placed.</li>
				936
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	937	<li>For single-pass with look-ahead enabled, \ref define_kf_interval()
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	938	is called whenever a GF group update is needed (when
				939	\ref RATE_CONTROL.frames_till_gf_update_due is zero). This is because
				940	generally KFs are more widely spaced and the look-ahead buffer is usually
				941	not long enough.</li>
				942
				943	<li>For single-pass with look-ahead disabled, the KFs are placed according
				944	to the command line parameter <b>--kf-max-dist</b> (The above two cases are
				945	also subject to this constraint).</li>
				946	</ul>
				947
bohanli	d165b19	2020-06-10 21:46:29 -0700	[diff] [blame]	948	The function \ref define_kf_interval() tries to detect a scenecut.
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	949	If a scenecut within kf-max-dist is detected, then it is set as the next
				950	keyframe. Otherwise the given maximum value is used.
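
The decision rule above can be sketched as follows. This is a minimal
illustration only: the helper name next_kf_distance() and the convention that a
non-positive offset means "no scenecut detected" are assumptions made for this
example and are not taken from define_kf_interval().

~~~~~~~~~~~~~~~{.c}
#include <assert.h>

// Hypothetical helper: returns the distance (in frames) to the next key frame.
// scenecut_offset is the distance to the first detected scenecut, or a
// non-positive value if none was found (an assumed convention).
static int next_kf_distance(int scenecut_offset, int kf_max_dist) {
  assert(kf_max_dist > 0);
  if (scenecut_offset > 0 && scenecut_offset <= kf_max_dist) {
    return scenecut_offset;  // A scenecut inside the window becomes the KF.
  }
  return kf_max_dist;  // Otherwise fall back to the configured maximum.
}
~~~~~~~~~~~~~~~

For example, with a maximum distance of 128 frames, a scenecut detected 40
frames ahead yields a KF group of 40 frames, whereas a scenecut 200 frames
ahead is ignored and the group is capped at 128 frames.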
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	951
				952	\section architecture_enc_tpl Temporal Dependency Modelling
Paul Wilkins	1fb0172	2020-07-07 17:45:46 +0100	[diff] [blame]	953
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	954	The temporal dependency model runs at the beginning of each GOP. It builds the
				955	motion trajectory within the GOP in units of 16x16 blocks. The temporal
				956	dependency of a 16x16 block is evaluated as the predictive coding gains it
				957	contributes to its trailing motion trajectory. This temporal dependency model
				958	reflects how important a coding block is for the coding efficiency of the
				959	overall GOP. It is hence used to scale the Lagrangian multiplier used in the
				960	rate-distortion optimization framework.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	961
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	962	\subsection architecture_enc_tpl_config Configurations
				963
				964	The temporal dependency model and its applications are by default turned on in
				965	libaom encoder for the VoD use case. To disable it, use --tpl-model=0 in the
				966	aomenc configuration.
				967
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	968	\subsection architecture_enc_tpl_algoritms Algorithms
				969
				970	The scheme works in the reverse frame processing order over the source frames,
				971	propagating information from future frames back to the current frame. For each
frame, a propagation step is run for each MB. It operates as follows:
				973
				974	<ul>
				975	<li> Estimate the intra prediction cost in terms of sum of absolute Hadamard
				976	transform difference (SATD) noted as intra_cost. It also loads the motion
				977	information available from the first-pass encode and estimates the inter
				978	prediction cost as inter_cost. Due to the use of hybrid inter/intra
				979	prediction mode, the inter_cost value is further upper bounded by
				980	intra_cost. A propagation cost variable is used to collect all the
				981	information flowed back from future processing frames. It is initialized as
				982	0 for all the blocks in the last processing frame in a group of pictures
				983	(GOP).</li>
				984
				985	<li> The fraction of information from a current block to be propagated towards
				986	its reference block is estimated as:
				987	\f[
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	988	propagation\_fraction = (1 - inter\_cost/intra\_cost)
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	989	\f]
				990	It reflects how much the motion compensated reference would reduce the
				991	prediction error in percentage.</li>
				992
				993	<li> The total amount of information the current block contributes to the GOP
				994	is estimated as intra_cost + propagation_cost. The information that it
				995	propagates towards its reference block is captured by:
				996
				997	\f[
				998	propagation\_amount =
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	999	(intra\_cost + propagation\_cost) * propagation\_fraction
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	1000	\f]</li>
				1001
				1002	<li> Note that the reference block may not necessarily sit on the grid of
				1003	16x16 blocks. The propagation amount is hence dispensed to all the blocks
				1004	that overlap with the reference block. The corresponding block in the
				1005	reference frame accumulates its own propagation cost as it receives back
				1006	propagation.
				1007
				1008	\f[
propagation\_cost = propagation\_cost +
(\frac{overlap\_area}{(16*16)} * propagation\_amount)
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	1011	\f]</li>
				1012
				1013	<li> In the final encoding stage, the distortion propagation factor of a block
				1014	is evaluated as \f$(1 + \frac{propagation\_cost}{intra\_cost})\f$, where the second term
				1015	captures its impact on later frames in a GOP.</li>
				1016
				1017	<li> The Lagrangian multiplier is adapted at the 64x64 block level. For every
				1018	64x64 block in a frame, we have a distortion propagation factor:
				1019
				1020	\f[
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	1021	dist\_prop[i] = 1 + \frac{propagation\_cost[i]}{intra\_cost[i]}
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	1022	\f]
				1023
				1024	where i denotes the block index in the frame. We also have the frame level
				1025	distortion propagation factor:
				1026
				1027	\f[
				1028	dist\_prop = 1 +
Paul Wilkins	b2194de	2020-07-08 17:58:14 +0100	[diff] [blame]	1029	\frac{\sum_{i}propagation\_cost[i]}{\sum_{i}intra\_cost[i]}
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	1030	\f]
				1031
				1032	which is used to normalize the propagation factor at the 64x64 block level. The
				1033	Lagrangian multiplier is hence adapted as:
				1034
				1035	\f[
				1036	λ[i] = λ[0] * \frac{dist\_prop}{dist\_prop[i]}
				1037	\f]
				1038
where λ[0] is the multiplier associated with the frame level QP. The
64x64 block level QP is scaled according to the Lagrangian multiplier.</li>
				1041	</ul>
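
The arithmetic in the steps above can be sketched in a few small functions.
This is an illustrative sketch using floating point for readability. The
function names are assumptions made for this example and do not correspond to
the libaom implementation (see av1_tpl_setup_stats() for the real code).

~~~~~~~~~~~~~~~{.c}
#include <assert.h>

// Fraction of a block's information that flows to its reference block.
// inter_cost is assumed to be already upper bounded by intra_cost.
static double propagation_fraction(double intra_cost, double inter_cost) {
  assert(intra_cost > 0.0 && inter_cost >= 0.0 && inter_cost <= intra_cost);
  return 1.0 - inter_cost / intra_cost;
}

// Amount of information propagated towards the reference block.
static double propagation_amount(double intra_cost, double propagation_cost,
                                 double fraction) {
  return (intra_cost + propagation_cost) * fraction;
}

// Share received by one 16x16 grid block that overlaps the (possibly
// unaligned) reference block by overlap_area pixels.
static double accumulate_propagation(double propagation_cost, int overlap_area,
                                     double amount) {
  assert(overlap_area >= 0 && overlap_area <= 16 * 16);
  return propagation_cost + (overlap_area / (16.0 * 16.0)) * amount;
}

// Distortion propagation factor. Pass per-block costs for dist_prop[i], or
// the sums over all blocks for the frame level factor.
static double dist_prop(double propagation_cost, double intra_cost) {
  assert(intra_cost > 0.0);
  return 1.0 + propagation_cost / intra_cost;
}

// Block level Lagrangian multiplier derived from the frame level one.
static double scale_lambda(double lambda0, double frame_dist_prop,
                           double block_dist_prop) {
  assert(block_dist_prop > 0.0);
  return lambda0 * frame_dist_prop / block_dist_prop;
}
~~~~~~~~~~~~~~~

A block whose distortion propagation factor is larger than the frame level
factor receives a smaller multiplier, so more bits are spent on blocks that
later frames depend upon heavily.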
				1042
Paul Wilkins	ff98f3e	2020-07-27 16:01:05 +0100	[diff] [blame]	1043	\subsection architecture_enc_tpl_keyfun Key Functions and data structures
Paul Wilkins	f209ec5	2020-07-06 16:03:52 +0100	[diff] [blame]	1044
The reader is also referred to the following functions and data structures:
				1046
				1047	- \ref TplParams
				1048	- \ref av1_tpl_setup_stats() builds the TPL model.
- \ref setup_delta_q() assigns different quantization parameters to each
superblock based on its TPL weight.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1051
				1052	\section architecture_enc_partitions Block Partition Search
				1053
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	1054	A frame is first split into tiles in \ref encode_tiles(), with each tile
				1055	compressed by av1_encode_tile(). Then a tile is processed in superblock rows
				1056	via \ref av1_encode_sb_row() and then \ref encode_sb_row().
				1057
				1058	The partition search processes superblocks sequentially in \ref
encode_sb_row(). Two search modes are supported, depending upon the encoding
configuration: \ref encode_nonrd_sb() is for 1-pass and real-time modes,
				1061	while \ref encode_rd_sb() performs more exhaustive rate distortion based
				1062	searches.
				1063
				1064	Partition search over the recursive quad-tree space is implemented by
				1065	recursive calls to \ref av1_nonrd_use_partition(),
				1066	\ref av1_rd_use_partition(), or av1_rd_pick_partition() and returning best
				1067	options for sub-trees to their parent partitions.
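
The recursive structure can be sketched as below. This is a simplified
illustration that only chooses between coding a square block whole and
splitting it into four quadrants. The callback type, the function names and the
toy cost models are assumptions made for this example, and the real search
considers more partition types and many pruning conditions.

~~~~~~~~~~~~~~~{.c}
#include <assert.h>

// Hypothetical callback: RD cost of coding the square block at (x, y) of the
// given size without splitting it further.
typedef long (*leaf_cost_fn)(int x, int y, int size);

// Returns the best cost for the block: either coded whole, or split into
// four quadrants whose own best costs are found recursively and returned
// to the parent.
static long best_partition_cost(int x, int y, int size, int min_size,
                                leaf_cost_fn leaf_cost) {
  assert(min_size > 0 && size >= min_size);
  const long none_cost = leaf_cost(x, y, size);
  if (size == min_size) return none_cost;  // Cannot split any further.

  const int half = size / 2;
  long split_cost = 0;
  for (int i = 0; i < 4; ++i) {
    split_cost += best_partition_cost(x + (i & 1) * half, y + (i >> 1) * half,
                                      half, min_size, leaf_cost);
  }
  return split_cost < none_cost ? split_cost : none_cost;
}

// Two toy cost models used only to exercise the recursion.
static long flat_cost(int x, int y, int size) {
  (void)x;
  (void)y;
  return size;  // Large blocks are cheap: splitting never helps.
}

static long detail_cost(int x, int y, int size) {
  (void)x;
  (void)y;
  return (long)size * size * size;  // Large blocks are costly: always split.
}
~~~~~~~~~~~~~~~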
				1068
In libaom, the partition search sits on top of the mode search (predictor,
				1070	transform, etc.), instead of being a separate module. The interface of mode
				1071	search is \ref pick_sb_modes(), which connects the partition_search with
				1072	\ref architecture_enc_inter_modes and \ref architecture_enc_intra_modes. To
				1073	make good decisions, reconstruction is also required in order to build
				1074	references and contexts. This is implemented by \ref encode_sb() at the
				1075	sub-tree level and \ref encode_b() at coding block level.
Paul Wilkins	196995d	2020-07-14 16:49:38 +0100	[diff] [blame]	1076
				1077	See also \ref partition_search
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1078
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1079	\section architecture_enc_intra_modes Intra Mode Search
				1080
AV1 provides 71 different intra prediction modes, i.e. modes that predict
only based upon information in the current frame with no dependency on
previous or future frames. For key frames, where this independence from any
other frame is a defining requirement, and for other cases where intra only
frames are required, the encoder need only consider these modes in the rate
distortion loop.
				1087
				1088	Even so, in most use cases, searching all possible intra prediction modes for
				1089	every block and partition size is not practical and some pruning of the search
				1090	tree is necessary.
				1091
				1092	For the Rate distortion optimized case, the main top level function
				1093	responsible for selecting the intra prediction mode for a given block is
\ref av1_rd_pick_intra_mode_sb(). The reader's attention is also drawn to the
				1095	functions \ref hybrid_intra_mode_search() and \ref av1_nonrd_pick_intra_mode()
				1096	which may be used where encode speed is critical. The choice between the
				1097	rd path and the non rd or hybrid paths depends on the encoder use case and the
				1098	\ref AV1_COMP.speed parameter. Further fine control of the speed vs quality
				1099	trade off is provided by means of fields in \ref AV1_COMP.sf (which has type
				1100	\ref SPEED_FEATURES).
				1101
				1102	Note that some intra modes are only considered for specific use cases or
				1103	types of video. For example the palette based prediction modes are often
valuable for graphics or screen share content but not for natural video.
				1105	(See \ref av1_search_palette_mode())
				1106
Paul Wilkins	3a13f64	2020-07-29 17:35:33 +0100	[diff] [blame]	1107	See also \ref intra_mode_search for more details.
				1108
				1109	\section architecture_enc_inter_modes Inter Prediction Mode Search
				1110
Paul Wilkins	da6a80b	2020-07-30 17:27:56 +0100	[diff] [blame]	1111	For inter frames, where we also allow prediction using one or more previously
				1112	coded frames (which may chronologically speaking be past or future frames or
				1113	non-display reference buffers such as ARF frames), the size of the search tree
				1114	that needs to be traversed, to select a prediction mode, is considerably more
				1115	massive.
				1116
				1117	In addition to the 71 possible intra modes we also need to consider 56 single
				1118	frame inter prediction modes (7 reference frames x 4 modes x 2 for OBMC
				1119	(overlapped block motion compensation)), 12768 compound inter prediction modes
				1120	(these are modes that combine inter predictors from two reference frames) and
				1121	36708 compound inter / intra prediction modes.
				1122
				1123	As with the intra mode search, libaom supports an RD based pathway and a non
				1124	rd pathway for speed critical use cases. The entry points for these two cases
Jingning Han	e9eb8c0	2020-11-11 14:47:53 -0800	[diff] [blame]	1125	are \ref av1_rd_pick_inter_mode() and \ref av1_nonrd_pick_inter_mode_sb()
Paul Wilkins	da6a80b	2020-07-30 17:27:56 +0100	[diff] [blame]	1126	respectively.
				1127
				1128	Various heuristics and predictive strategies are used to prune the search tree
				1129	with fine control provided through the speed features parameter in the main
				1130	compressor instance data structure \ref AV1_COMP.sf.
				1131
It is worth noting that some prediction modes incur a much larger rate cost
than others (ignoring for now the cost of coding the error residual). For
example, a compound mode that requires the encoder to specify two reference
frames and two new motion vectors will almost inevitably have a higher rate
cost than a simple inter prediction mode that uses a predicted or 0,0 motion
vector. As such, if we have already found a mode for the current block that
has a low RD cost, we can skip a large number of the possible modes on the
basis that even if the error residual is 0 the inherent rate cost of the
mode itself will guarantee that it is not chosen.
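
This early-out test can be sketched as follows. The sketch models the RD cost
as distortion + lambda * rate with no fixed-point scaling, which is a
simplification, and the function name can_skip_mode() is an assumption made for
this example rather than a libaom function.

~~~~~~~~~~~~~~~{.c}
#include <assert.h>
#include <stdint.h>

// Returns 1 if a mode can be skipped without evaluating its residual.
// With a zero residual the distortion term is 0, so the RD cost of the mode
// can never be lower than lambda * mode_rate.
static int can_skip_mode(int64_t mode_rate, int64_t lambda, int64_t best_rd) {
  assert(mode_rate >= 0 && lambda >= 0);
  const int64_t rd_lower_bound = lambda * mode_rate;
  return rd_lower_bound >= best_rd;
}
~~~~~~~~~~~~~~~

For example, with lambda = 10 and a best RD cost so far of 800, a mode whose
signalling alone costs 100 units of rate has a lower bound of 1000 and can be
skipped, whereas a mode costing 20 units still has to be evaluated.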
				1141
Paul Wilkins	3a13f64	2020-07-29 17:35:33 +0100	[diff] [blame]	1142	See also \ref inter_mode_search for more details.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1143
				1144	\section architecture_enc_tx_search Transform Search
				1145
AV1 implements the transform stage using 4 separable 1-d transforms (DCT,
				1147	ADST, FLIPADST and IDTX, where FLIPADST is the reversed version of ADST
				1148	and IDTX is the identity transform) which can be combined to give 16 2-d
				1149	combinations.
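
As a quick check of the count, the short sketch below enumerates every pairing
of a vertical and a horizontal 1-d transform. The names printed here are
illustrative labels chosen for this example and are not the identifiers used in
the libaom source.

~~~~~~~~~~~~~~~{.c}
#include <assert.h>
#include <stdio.h>

enum { NUM_TX_1D = 4 };

static const char *const kTx1dNames[NUM_TX_1D] = { "DCT", "ADST", "FLIPADST",
                                                   "IDTX" };

// Prints each vertical/horizontal pairing and returns the number of pairings.
static int list_tx_2d(FILE *out) {
  assert(out != NULL);
  int count = 0;
  for (int v = 0; v < NUM_TX_1D; ++v) {
    for (int h = 0; h < NUM_TX_1D; ++h) {
      fprintf(out, "%s (vertical) x %s (horizontal)\n", kTx1dNames[v],
              kTx1dNames[h]);
      ++count;
    }
  }
  return count;
}
~~~~~~~~~~~~~~~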
Paul Wilkins	3a13f64	2020-07-29 17:35:33 +0100	[diff] [blame]	1150
				1151	These combinations can be applied at 19 different scales from 64x64 pixels
				1152	down to 4x4 pixels.
				1153
				1154	This gives rise to a large number of possible candidate transform options
				1155	for coding the residual error after prediction. An exhaustive rate-distortion
				1156	based evaluation of all candidates would not be practical from a speed
perspective in a production encoder implementation. Hence libaom adopts a
				1158	number of strategies to prune the selection of both the transform size and
				1159	transform type.
				1160
There are a number of strategies that have been tested and implemented in
				1162	libaom including:
				1163
				1164	- A statistics based approach that looks at the frequency with which certain
				1165	combinations are used in a given context and prunes out very unlikely
				1166	candidates. It is worth noting here that some size candidates can be pruned
				1167	out immediately based on the size of the prediction partition. For example it
				1168	does not make sense to use a transform size that is larger than the
				1169	prediction partition size but also a very large prediction partition size is
unlikely to be optimally paired with small transforms.
				1171
				1172	- A Machine learning based model
				1173
				1174	- A method that initially tests candidates using a fast algorithm that skips
				1175	entropy encoding and uses an estimated cost model to choose a reduced subset
				1176	for full RD analysis. This subject is covered more fully in a paper authored
				1177	by Bohan Li, Jingning Han, and Yaowu Xu titled: <b>Fast Transform Type
				1178	Selection Using Conditional Laplace Distribution Based Rate Estimation</b>
				1179
				1180	<b>TODO Add link to paper when available</b>
				1181
				1182	See also \ref transform_search for more details.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1183
Paul Wilkins	d7a9f0e	2020-07-30 18:12:40 +0100	[diff] [blame]	1184	\section architecture_post_enc_filt Post Encode Loop Filtering
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1185
Paul Wilkins	d7a9f0e	2020-07-30 18:12:40 +0100	[diff] [blame]	1186	AV1 supports three types of post encode <b>in loop</b> filtering to improve
				1187	the quality of the reconstructed video.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1188
- <b>Deblocking Filter</b> The first of these is a fairly traditional boundary
				1190	deblocking filter that attempts to smooth discontinuities that may occur at
				1191	the boundaries between blocks. See also \ref in_loop_filter.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1192
Paul Wilkins	d7a9f0e	2020-07-30 18:12:40 +0100	[diff] [blame]	1193	- <b>CDEF Filter</b> The constrained directional enhancement filter (CDEF)
				1194	allows the codec to apply a non-linear deringing filter along certain
				1195	(potentially oblique) directions. A primary filter is applied along the
Paul Wilkins	10e9944	2020-08-05 15:35:44 +0100	[diff] [blame]	1196	selected direction, whilst a secondary filter is applied at 45 degrees to
the primary direction. (See also \ref in_loop_cdef and
<a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1199
Paul Wilkins	d7a9f0e	2020-07-30 18:12:40 +0100	[diff] [blame]	1200	- <b>Loop Restoration Filter</b> The loop restoration filter is applied after
Paul Wilkins	10e9944	2020-08-05 15:35:44 +0100	[diff] [blame]	1201	any prior post filtering stages. It acts on units of either 64 x 64,
128 x 128, or 256 x 256 pixel blocks, referred to as loop restoration units.
Paul Wilkins	d7a9f0e	2020-07-30 18:12:40 +0100	[diff] [blame]	1203	Each unit can independently select either to bypass filtering, use a Wiener
filter, or use a self-guided filter. (See also \ref in_loop_restoration and
<a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1206
				1207	\section architecture_entropy Entropy Coding
				1208
Paul Wilkins	ef79fe4	2020-08-04 19:32:11 +0100	[diff] [blame]	1209	\subsection architecture_entropy_aritmetic Arithmetic Coder
				1210
VP9 used a binary arithmetic coder to encode symbols, where the probability
of a 1 or 0 at each decision node was based on a context model that took
into account recently coded values (for example previously coded coefficients
in the current block). A mechanism existed to update the context model each
frame, either explicitly in the bitstream, or implicitly at both the encoder
and decoder based on the observed frequency of different outcomes in the
previous frame. VP9 also supported separate context models for different types
of frame (e.g. inter coded frames and key frames).
				1219
				1220	In contrast, AV1 uses an M-ary symbol arithmetic coder to compress the syntax
				1221	elements, where integer \f$M\in[2, 14]\f$. This approach is based upon the entropy
				1222	coding strategy used in the Daala video codec and allows for some bit-level
				1223	parallelism in its implementation. AV1 also has an extended context model and
				1224	allows for updates to the probabilities on a per symbol basis as opposed to
				1225	the per frame strategy in VP9.
				1226
				1227	To improve the performance / throughput of the arithmetic encoder, especially
				1228	in hardware implementations, the probability model is updated and maintained
				1229	at 15-bit precision, but the arithmetic encoder only uses the most significant
				1230	9 bits when encoding a symbol. A more detailed discussion of the algorithm
Paul Wilkins	f88a151	2020-10-20 13:18:40 +0100	[diff] [blame]	1231	and design constraints can be found in
				1232	<a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.
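
The precision reduction described above amounts to dropping the low order bits
of the stored probability. The sketch below illustrates only that step. The
constant and function names are assumptions made for this example and the code
is not taken from the libaom entropy coder.

~~~~~~~~~~~~~~~{.c}
#include <assert.h>
#include <stdint.h>

enum { PROB_BITS = 15, CODER_BITS = 9 };

// Keeps only the most significant CODER_BITS bits of a PROB_BITS-bit
// probability value. The model itself would still be stored and adapted at
// full precision.
static uint16_t coder_probability(uint16_t prob15) {
  assert(prob15 < (1u << PROB_BITS));
  return (uint16_t)(prob15 >> (PROB_BITS - CODER_BITS));
}
~~~~~~~~~~~~~~~

For example, a stored value of 16384 (one half at 15-bit precision) becomes 256
(one half at 9-bit precision).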
Paul Wilkins	ef79fe4	2020-08-04 19:32:11 +0100	[diff] [blame]	1233
				1234	TODO add references to key functions / files.
				1235
As with VP9, a mechanism exists in AV1 to encode some elements into the
bitstream as uncompressed bits or literal values, without using the arithmetic
coder. This is used, for example, for some frame and sequence header values,
where it is beneficial to be able to read the values directly.
				1240
				1241	TODO add references to key functions / files.
Paul Wilkins	386cb69	2020-08-04 18:11:17 +0100	[diff] [blame]	1242
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1243	\subsection architecture_entropy_coef Transform Coefficient Coding and Optimization
				1244	\image html coeff_coding.png "" width=70%
Paul Wilkins	386cb69	2020-08-04 18:11:17 +0100	[diff] [blame]	1245
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1246	\subsubsection architecture_entropy_coef_what Transform coefficient coding
				1247	Transform coefficient coding is where the encoder compresses a quantized version
				1248	of prediction residue into the bitstream.
				1249
				1250	\paragraph architecture_entropy_coef_prepare Preparation - transform and quantize
Before the entropy coding stage, the encoder decouples the pixel-to-pixel
				1252	correlation of the prediction residue by transforming the residue from the
				1253	spatial domain to the frequency domain. Then the encoder quantizes the transform
				1254	coefficients to make the coefficients ready for entropy coding.
				1255
				1256	\paragraph architecture_entropy_coef_coding The coding process
				1257	The encoder uses \ref av1_write_coeffs_txb() to write the coefficients of
				1258	a transform block into the bitstream.
				1259	The coding process has three stages.
1. First, the encoder will code the transform block skip flag (txb_skip). If the skip flag is
				1261	off, then the encoder will code the end of block position (eob) which is the scan
				1262	index of the last non-zero coefficient plus one.
				1263	2. Second, the encoder will code lower magnitude levels of each coefficient in
				1264	reverse scan order.
				1265	3. Finally, the encoder will code the sign and higher magnitude levels for each
				1266	coefficient if they are available.
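
The definition of the end of block position used in the first stage can be
illustrated with a short sketch. The function name find_eob() and its
arguments are assumptions made for this example. The input is assumed to hold
the quantized coefficients already arranged in scan order.

~~~~~~~~~~~~~~~{.c}
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns the end of block position: the scan index of the last non-zero
// coefficient plus one, or 0 when every coefficient is zero.
static int find_eob(const int32_t *qcoeff_scan, int num_coeffs) {
  assert(qcoeff_scan != NULL && num_coeffs >= 0);
  for (int i = num_coeffs - 1; i >= 0; --i) {
    if (qcoeff_scan[i] != 0) return i + 1;
  }
  return 0;
}
~~~~~~~~~~~~~~~

For the scan-ordered coefficients { 5, 0, -2, 0, 0, 1, 0, 0 } the last non-zero
value sits at scan index 5, so the eob is 6 and the two trailing zeros need not
be coded.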
				1267
				1268	Related functions:
				1269	- \ref av1_write_coeffs_txb()
				1270	- write_inter_txb_coeff()
				1271	- \ref av1_write_intra_coeffs_mb()
				1272
				1273	\paragraph architecture_entropy_coef_context Context information
				1274	To improve the compression efficiency, the encoder uses several context models
				1275	tailored for transform coefficients to capture the correlations between coding
				1276	symbols. Most of the context models are built to capture the correlations
				1277	between the coefficients within the same transform block. However, transform
				1278	block skip flag (txb_skip) and the sign of dc coefficient (dc_sign) require
				1279	context info from neighboring transform blocks.
				1280
Here is how context info is propagated between transform blocks. Before coding a
				1282	transform block, the encoder will use get_txb_ctx() to collect the context
				1283	information from neighboring transform blocks. Then the context information
				1284	will be used for coding transform block skip flag (txb_skip) and the sign of
				1285	dc coefficient (dc_sign). After the transform block is coded, the encoder will
				1286	extract the context info from the current block using
\ref av1_get_txb_entropy_context(). Then the encoder will store the context info
				1288	into a byte (uint8_t) using av1_set_entropy_contexts(). The encoder will use
				1289	the context info to code other transform blocks.
				1290
				1291	Related functions:
				1292	- \ref av1_get_txb_entropy_context()
				1293	- av1_set_entropy_contexts()
				1294	- get_txb_ctx()
				1295	- \ref av1_update_intra_mb_txb_context()
				1296
				1297	\subsubsection architecture_entropy_coef_rd RD optimization
Besides the actual entropy coding, the encoder uses several utility functions
				1299	to make optimal RD decisions.
				1300
				1301	\paragraph architecture_entropy_coef_cost Entropy cost
				1302	The encoder uses \ref av1_cost_coeffs_txb() or \ref av1_cost_coeffs_txb_laplacian()
				1303	to estimate the entropy cost of a transform block. Note that
				1304	\ref av1_cost_coeffs_txb() is slower but accurate whereas
				1305	\ref av1_cost_coeffs_txb_laplacian() is faster but less accurate.
				1306
				1307	Related functions:
				1308	- \ref av1_cost_coeffs_txb()
				1309	- \ref av1_cost_coeffs_txb_laplacian()
				1310	- \ref av1_cost_coeffs_txb_estimate()
				1311
				1312	\paragraph architecture_entropy_coef_opt Quantized level optimization
Besides computing entropy cost, the encoder also uses \ref av1_optimize_txb()
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1314	to adjust the coefficient’s quantized levels to achieve optimal RD trade-off.
Vishesh	a45092c	2021-01-25 00:28:11 +0530	[diff] [blame]	1315	In \ref av1_optimize_txb(), the encoder goes through each quantized
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1316	coefficient and lowers the quantized coefficient level by one if the action
				1317	yields a better RD score.
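
The idea can be sketched for a single coefficient as below. This is a minimal
illustration under stated assumptions: a uniform quantizer with step size step,
a made-up rate model level_rate(), and an RD score of distortion + lambda *
rate. None of these names or models are taken from av1_optimize_txb(), which
uses the real entropy cost tables.

~~~~~~~~~~~~~~~{.c}
#include <assert.h>
#include <stdint.h>

// Hypothetical rate model: zero is free, larger magnitudes cost more.
static int64_t level_rate(int32_t level) {
  const int64_t mag = level < 0 ? -(int64_t)level : (int64_t)level;
  return mag == 0 ? 0 : 2 * mag + 1;
}

// Squared reconstruction error of one coefficient for a given level.
static int64_t level_dist(int32_t coeff, int32_t level, int32_t step) {
  const int64_t err = (int64_t)coeff - (int64_t)level * step;
  return err * err;
}

// Tries moving the quantized level one step towards zero and returns the
// level with the better RD score.
static int32_t optimize_level(int32_t coeff, int32_t level, int32_t step,
                              int64_t lambda) {
  assert(step > 0 && lambda >= 0);
  if (level == 0) return 0;  // Nothing to lower.
  const int32_t lower = level > 0 ? level - 1 : level + 1;
  const int64_t rd_keep =
      level_dist(coeff, level, step) + lambda * level_rate(level);
  const int64_t rd_lower =
      level_dist(coeff, lower, step) + lambda * level_rate(lower);
  return rd_lower < rd_keep ? lower : level;
}
~~~~~~~~~~~~~~~

With step = 8 and lambda = 20, a coefficient of 5 quantized to level 1 is
lowered to 0 because the rate saving outweighs the extra distortion, whereas a
coefficient of 9 keeps level 1.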
				1318
				1319	Related functions:
Vishesh	a45092c	2021-01-25 00:28:11 +0530	[diff] [blame]	1320	- \ref av1_optimize_txb()
angiebird	9101c0e	2020-08-17 11:16:23 -0700	[diff] [blame]	1321
				1322	All the related functions are listed in \ref coefficient_coding.
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1323
Rachel Barker	5758629	2024-02-20 20:56:16 +0000	[diff] [blame]	1324	\section architecture_simd SIMD usage
				1325
				1326	In order to efficiently encode video on modern platforms, it is necessary to
				1327	implement optimized versions of many core encoding and decoding functions using
				1328	architecture-specific SIMD instructions.
				1329
				1330	Functions which have optimized implementations will have multiple variants
				1331	in the code, each suffixed with the name of the appropriate instruction set.
				1332	There will additionally be an `_c` version, which acts as a reference
				1333	implementation which the SIMD variants can be tested against.
				1334
				1335	As different machines with the same nominal architecture may support different
				1336	subsets of SIMD instructions, we have dynamic CPU detection logic which chooses
				1337	the appropriate functions to use at run time. This process is handled by
				1338	`build/cmake/rtcd.pl`, with function definitions in the files
				1339	`*_rtcd_defs.pl` elsewhere in the codebase.
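
The general pattern can be sketched with a function pointer that is set once at
start-up. This is a hand-written illustration of the idea only. The function
names, the has_sse4_1 flag and setup_rtcd() are assumptions made for this
example, and the "optimized" variant simply forwards to the reference code so
that the sketch stays portable.

~~~~~~~~~~~~~~~{.c}
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Reference implementation, following the `_c` suffix convention.
static int sum_abs_diff_c(const uint8_t *a, const uint8_t *b, size_t n) {
  int sum = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
  }
  return sum;
}

// Stand-in for an optimized variant. A real one would use SIMD intrinsics
// and must produce the same results as the reference version.
static int sum_abs_diff_sse4_1(const uint8_t *a, const uint8_t *b, size_t n) {
  return sum_abs_diff_c(a, b, n);
}

// Callers always go through this pointer.
static int (*sum_abs_diff)(const uint8_t *, const uint8_t *,
                           size_t) = sum_abs_diff_c;

// Hypothetical one-time setup. has_sse4_1 would come from CPU detection.
static void setup_rtcd(int has_sse4_1) {
  sum_abs_diff = sum_abs_diff_c;
  if (has_sse4_1) sum_abs_diff = sum_abs_diff_sse4_1;
}
~~~~~~~~~~~~~~~

Because every variant shares one signature, a unit test can call the `_c`
version and each SIMD version on the same inputs and compare the outputs.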
				1340
				1341	Currently SIMD is supported on the following platforms:
				1342
				1343	- x86: Requires SSE4.1 or above
				1344
				1345	- Arm: Requires Neon (Armv7-A and above)
				1346
				1347	We aim to provide implementations of all performance-critical functions which
				1348	are compatible with the instruction sets listed above. Additional SIMD
				1349	extensions (e.g. AVX on x86, SVE on Arm) are also used to provide even
				1350	greater performance where available.
				1351
Paul Wilkins	b534a78	2020-06-25 18:02:17 +0100	[diff] [blame]	1352	*/
Yunqing Wang	65cd010	2020-05-06 12:57:04 -0700	[diff] [blame]	1353
				1354	/*!\defgroup encoder_algo Encoder Algorithm
				1355	*
* The encoder algorithm describes how a sequence is encoded, including high
* level decisions as well as the algorithms used at every encoding stage.
				1358	*/
				1359
				1360	/*!\defgroup high_level_algo High-level Algorithm
				1361	* \ingroup encoder_algo
* This module describes sequence level/frame level algorithms in AV1.
				1363	* More details will be added.
				1364	* @{
				1365	*/
Elliott Karpilovsky	2ea1836	2020-06-02 18:32:27 -0700	[diff] [blame]	1366
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	1367	/*!\defgroup speed_features Speed vs Quality Trade Off
				1368	* \ingroup high_level_algo
				1369	* This module describes the encode speed vs quality tradeoff
				1370	* @{
				1371	*/
/*! @} - end defgroup speed_features */
				1373
				1374	/*!\defgroup src_frame_proc Source Frame Processing
				1375	* \ingroup high_level_algo
* This module describes algorithms in AV1 associated with the
				1377	* pre-processing of source frames. See also \ref architecture_enc_src_proc
				1378	*
				1379	* @{
				1380	*/
/*! @} - end defgroup src_frame_proc */
				1382
				1383	/*!\defgroup rate_control Rate Control
				1384	* \ingroup high_level_algo
* This module describes the rate control algorithm in AV1.
				1386	* See also \ref architecture_enc_rate_ctrl
				1387	* @{
				1388	*/
/*! @} - end defgroup rate_control */
				1390
Paul Wilkins	ff98f3e	2020-07-27 16:01:05 +0100	[diff] [blame]	1391	/*!\defgroup tpl_modelling Temporal Dependency Modelling
				1392	* \ingroup high_level_algo
				1393	* This module includes algorithms to implement temporal dependency modelling.
				1394	* See also \ref architecture_enc_tpl
				1395	* @{
				1396	*/
/*! @} - end defgroup tpl_modelling */
				1398
Paul Wilkins	7173920	2020-07-23 15:09:07 +0100	[diff] [blame]	1399	/*!\defgroup two_pass_algo Two Pass Mode
				1400	\ingroup high_level_algo
Elliott Karpilovsky	2ea1836	2020-06-02 18:32:27 -0700	[diff] [blame]	1401
				1402	In two pass mode, the input file is passed into the encoder for a quick
				1403	first pass, where statistics are gathered. These statistics and the input
				1404	file are then passed back into the encoder for a second pass. The statistics
				1405	help the encoder reach the desired bitrate without as much overshooting or
				1406	undershooting.
				1407
				1408	During the first pass, the codec will return "stats" packets that contain
				1409	information useful for the second pass. The caller should concatenate these
				1410	packets as they are received. In the second pass, the concatenated packets
				1411	are passed in, along with the frames to encode. During the second pass,
				1412	"frame" packets are returned that represent the compressed video.
				1413
				1414	A complete example can be found in `examples/twopass_encoder.c`. Pseudocode
				1415	is provided below to illustrate the core parts.
				1416
				1417	During the first pass, the uncompressed frames are passed in and stats
				1418	information is appended to a byte array.
				1419
~~~~~~~~~~~~~~~{.c}
// For simplicity, assume that there is enough memory in the stats buffer.
// Actual code will want to use a resizable array. stats_len represents
// the length of data already present in the buffer.
void get_stats_data(aom_codec_ctx_t *encoder, char *stats,
                    size_t *stats_len, bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_STATS_PKT) continue;
    memcpy(stats + *stats_len, pkt->data.twopass_stats.buf,
           pkt->data.twopass_stats.sz);
    *stats_len += pkt->data.twopass_stats.sz;
  }
}

void first_pass(char *stats, size_t *stats_len) {
  struct aom_codec_enc_cfg first_pass_cfg;
  ... // Initialize the config as needed.
  first_pass_cfg.g_pass = AOM_RC_FIRST_PASS;
  aom_codec_ctx_t first_pass_encoder;
  ... // Initialize the encoder.

  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&first_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  }
  // After all frames have been processed, call aom_codec_encode with
  // a NULL ptr repeatedly, until no more data is returned. The NULL
  // ptr tells the encoder that no more frames are available.
  do {
    got_data = false;
    aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  } while (got_data);

  aom_codec_destroy(&first_pass_encoder);
}
~~~~~~~~~~~~~~~
				1463
During the second pass, the uncompressed frames and the stats are
passed into the encoder.

~~~~~~~~~~~~~~~{.c}
// Write out each encoded frame to the file.
void get_cx_data(aom_codec_ctx_t *encoder, FILE *file,
                 bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    // Only compressed frame packets are written out.
    if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue;
    fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file);
  }
}

void second_pass(char *stats, size_t stats_len) {
  struct aom_codec_enc_cfg second_pass_cfg;
  ... // Initialize the config as needed.
  second_pass_cfg.g_pass = AOM_RC_LAST_PASS;
  second_pass_cfg.rc_twopass_stats_in.buf = stats;
  second_pass_cfg.rc_twopass_stats_in.sz = stats_len;
  aom_codec_ctx_t second_pass_encoder;
  ... // Initialize the encoder from the config.

  FILE *output = fopen("output.obu", "wb");
  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&second_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  }
  // Pass in NULL to flush the encoder.
  do {
    got_data = false;
    aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  } while (got_data);

  fclose(output);
  aom_codec_destroy(&second_pass_encoder);
}
~~~~~~~~~~~~~~~
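
The first-pass pseudocode assumes that the stats buffer is already large
enough. A minimal sketch of a growable byte buffer that could take its place
is given here. The `stats_buf_t` type and the `stats_buf_append()` helper are
hypothetical names chosen for illustration; they are not part of the libaom
API, and the initial capacity of 4096 bytes is an arbitrary assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable byte buffer for accumulating first-pass stats.
typedef struct {
  char *data;  // Heap storage, NULL while empty.
  size_t len;  // Bytes currently in use.
  size_t cap;  // Bytes currently allocated.
} stats_buf_t;

// Appends sz bytes from src, doubling the capacity as needed.
// Returns 0 on success, or -1 on size overflow or allocation failure,
// in which case the existing contents are left untouched.
static int stats_buf_append(stats_buf_t *b, const void *src, size_t sz) {
  assert(b != NULL);
  if (sz == 0) return 0;
  assert(src != NULL);
  if (sz > (size_t)-1 - b->len) return -1;  // len + sz would overflow.
  const size_t needed = b->len + sz;
  if (needed > b->cap) {
    size_t new_cap = b->cap ? b->cap : 4096;  // Arbitrary initial size.
    while (new_cap < needed) {
      if (new_cap > (size_t)-1 / 2) {
        new_cap = needed;  // Doubling would overflow; use the exact size.
        break;
      }
      new_cap *= 2;
    }
    char *p = realloc(b->data, new_cap);
    if (p == NULL) return -1;
    b->data = p;
    b->cap = new_cap;
  }
  memcpy(b->data + b->len, src, sz);
  b->len += sz;
  return 0;
}
```

With such a helper, the `memcpy` in `get_stats_data()` would become a call
like `stats_buf_append(&buf, pkt->data.twopass_stats.buf,
pkt->data.twopass_stats.sz)`, and `buf.data` and `buf.len` would be passed
to the second pass through `rc_twopass_stats_in`.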
*/

/*!\defgroup look_ahead_buffer The Look-Ahead Buffer
\ingroup high_level_algo

A program should call \ref aom_codec_encode() for each frame that needs
processing. These frames are internally copied and stored in a fixed-size
circular buffer, known as the look-ahead buffer. Other parts of the code
will use future frame information to inform current frame decisions;
examples include the first-pass algorithm, TPL model, and temporal filter.
Note that this buffer also keeps a reference to the last source frame.

The look-ahead buffer is defined in \ref av1/encoder/lookahead.h. It acts as an
opaque structure, with an interface to create and free memory associated with
it. It supports pushing and popping frames onto the structure in a FIFO
fashion. It also allows look-ahead when using the \ref av1_lookahead_peek()
function with a non-negative number, and look-behind when -1 is passed in (for
the last source frame; e.g., firstpass will use this for motion estimation).
The \ref av1_lookahead_depth() function returns the current number of frames
stored in it. Note that \ref av1_lookahead_pop() is a bit of a misnomer - it
only pops if either the "flush" variable is set, or the buffer is at maximum
capacity.

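The conditional pop behaviour can be illustrated with a small self-contained
model. This is a sketch only and not libaom's implementation:
`toy_lookahead_t`, `toy_push()`, `toy_peek()` and `toy_pop()` are hypothetical
names, frames are reduced to integer IDs, the capacity of 4 is arbitrary, and
the look-behind slot for the last source frame is omitted. The point of
interest is `toy_pop()`, which refuses to remove a frame unless the caller is
flushing or the buffer is full, so that as many future frames as possible
remain visible to `toy_peek()`.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPTH 4  // Arbitrary capacity standing in for the real lag.

// Hypothetical fixed-size circular buffer of frame IDs.
typedef struct {
  int frames[TOY_MAX_DEPTH];
  size_t read_idx;  // Index of the oldest stored frame.
  size_t depth;     // Number of frames currently stored.
} toy_lookahead_t;

// Stores a frame ID. Returns 0 on success, -1 if the buffer is full.
static int toy_push(toy_lookahead_t *q, int frame_id) {
  assert(q != NULL);
  if (q->depth == TOY_MAX_DEPTH) return -1;
  q->frames[(q->read_idx + q->depth) % TOY_MAX_DEPTH] = frame_id;
  q->depth++;
  return 0;
}

// Returns the frame `ahead` positions after the oldest one without
// removing it, or NULL if fewer frames are stored.
static const int *toy_peek(const toy_lookahead_t *q, size_t ahead) {
  assert(q != NULL);
  if (ahead >= q->depth) return NULL;
  return &q->frames[(q->read_idx + ahead) % TOY_MAX_DEPTH];
}

// Removes the oldest frame only when flushing or when the buffer is full.
// Returns 1 and writes the ID on success; returns 0 and leaves the buffer
// unchanged otherwise.
static int toy_pop(toy_lookahead_t *q, int flush, int *frame_id) {
  assert(q != NULL && frame_id != NULL);
  if (q->depth == 0) return 0;
  if (!flush && q->depth < TOY_MAX_DEPTH) return 0;
  *frame_id = q->frames[q->read_idx];
  q->read_idx = (q->read_idx + 1) % TOY_MAX_DEPTH;
  q->depth--;
  return 1;
}
```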
The buffer is stored in the \ref AV1_PRIMARY::lookahead field.
It is initialized in the first call to \ref aom_codec_encode(), in the
\ref av1_receive_raw_frame() sub-routine. The buffer size is defined by
the \ref aom_codec_enc_cfg_t::g_lag_in_frames field of the encoder
configuration. This can be modified manually but should only be set once.
On the command line, the flag "--lag-in-frames" controls it. The default
size is 19 for non-realtime usage and 1 for realtime. Note that a maximum
value of 35 is enforced.

A frame will stay in the buffer as long as possible. As mentioned above,
\ref av1_lookahead_pop() only removes a frame when either flush is set,
or the buffer is full. Note that each call to \ref aom_codec_encode() inserts
another frame into the buffer, and pop is called by the sub-function
\ref av1_encode_strategy(). The buffer is told to flush when
\ref aom_codec_encode() is passed a NULL image pointer. Note that the caller
must repeatedly call \ref aom_codec_encode() with a NULL image pointer, until
no more packets are available, in order to fully flush the buffer.

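The reason a single NULL call is not enough can be seen in a toy model of the
calling convention. Everything here is an assumption made for illustration:
`toy_encoder_t`, `toy_encode()` and `toy_encode_all()` are hypothetical, the
lag of 3 is arbitrary, and the model emits at most one packet per call, which
the real encoder is not required to do.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LAG 3  // Arbitrary look-ahead depth for the model.

// Hypothetical stand-in for an encoder that buffers up to TOY_LAG frames.
typedef struct {
  int queued;  // Frames currently waiting in the look-ahead buffer.
} toy_encoder_t;

// has_frame != 0 submits a frame; has_frame == 0 requests a flush.
// Returns the number of packets produced by this call (0 or 1).
static int toy_encode(toy_encoder_t *enc, int has_frame) {
  assert(enc != NULL);
  if (has_frame) {
    // A full buffer releases its oldest frame to make room for the new one.
    if (enc->queued == TOY_LAG) return 1;
    enc->queued++;
    return 0;
  }
  // Flushing releases one buffered frame per call until none remain.
  if (enc->queued == 0) return 0;
  enc->queued--;
  return 1;
}

// Submits num_frames frames, then flushes until a call returns no packet.
// Returns the total number of packets produced.
static int toy_encode_all(toy_encoder_t *enc, int num_frames) {
  int packets = 0;
  for (int i = 0; i < num_frames; ++i) packets += toy_encode(enc, 1);
  int got;
  do {
    got = toy_encode(enc, 0);
    packets += got;
  } while (got);
  return packets;
}
```

In this model the frames still queued when the input ends are only recovered
by the flush loop, one per call, so stopping after the first NULL call would
drop the remainder.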
*/

/*! @} - end defgroup high_level_algo */

/*!\defgroup partition_search Partition Search
 * \ingroup encoder_algo
 * For an overview of the partition search see \ref architecture_enc_partitions
 * @{
 */

/*! @} - end defgroup partition_search */

/*!\defgroup intra_mode_search Intra Mode Search
 * \ingroup encoder_algo
 * This module describes the intra mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup intra_mode_search */

/*!\defgroup inter_mode_search Inter Mode Search
 * \ingroup encoder_algo
 * This module describes the inter mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup inter_mode_search */

/*!\defgroup palette_mode_search Palette Mode Search
 * \ingroup intra_mode_search
 * This module describes the palette mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup palette_mode_search */

/*!\defgroup transform_search Transform Search
 * \ingroup encoder_algo
 * This module describes the transform search algorithm in AV1.
 * @{
 */
/*! @} - end defgroup transform_search */

/*!\defgroup coefficient_coding Transform Coefficient Coding and Optimization
 * \ingroup encoder_algo
 * This module describes the algorithms of transform coefficient coding and
 * optimization in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup coefficient_coding */

/*!\defgroup in_loop_filter In-loop Filter
 * \ingroup encoder_algo
 * This module describes the in-loop filter algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_filter */

/*!\defgroup in_loop_cdef CDEF
 * \ingroup encoder_algo
 * This module describes the CDEF parameter search algorithm
 * in AV1. More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_cdef */

/*!\defgroup in_loop_restoration Loop Restoration
 * \ingroup encoder_algo
 * This module describes the loop restoration search
 * and estimation algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_restoration */

/*!\defgroup cyclic_refresh Cyclic Refresh
 * \ingroup encoder_algo
 * This module describes the cyclic refresh (aq-mode=3) in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup cyclic_refresh */

/*!\defgroup SVC Scalable Video Coding
 * \ingroup encoder_algo
 * This module describes the scalable video coding algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup SVC */

/*!\defgroup variance_partition Variance Partition
 * \ingroup encoder_algo
 * This module describes the variance partition algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup variance_partition */

/*!\defgroup nonrd_mode_search NonRD Optimized Mode Search
 * \ingroup encoder_algo
 * This module describes the NonRD Optimized Mode Search used in Real-Time mode.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup nonrd_mode_search */