Overview of the High Efficiency Video Coding (HEVC) Standard, Part 2

II. HEVC Coding Design and Feature Highlights

The HEVC standard is designed to achieve multiple goals, including coding efficiency,

ease of transport system integration and data loss resilience, as well as

implementability using parallel processing architectures. The following subsections

briefly describe the key elements of the design by which these goals are achieved,

and the typical encoder operation that would generate a valid bitstream. More details

about the associated syntax and the decoding process of the different elements are

provided in Sections III and IV.





A. Video Coding Layer

The video coding layer of HEVC employs the same hybrid approach (inter-/intrapicture

prediction and 2-D transform coding) used in all video compression standards since H.261.

Fig. 1 depicts the block diagram of a hybrid video encoder, which could create

a bitstream conforming to the HEVC standard.



Fig. 1. Typical HEVC video encoder (with decoder modeling elements shaded in light gray).

An encoding algorithm producing an HEVC compliant bitstream would typically proceed

as follows. Each picture is split into block-shaped regions, with the exact block

partitioning being conveyed to the decoder. The first picture of a video sequence

(and the first picture at each clean random access point into a video sequence) is

coded using only intrapicture prediction (which predicts data spatially from

region to region within the same picture, but has no dependence on other pictures).

For all remaining pictures of a sequence or between random access points, interpicture

temporally predictive coding modes are typically used for most blocks. The encoding

process for interpicture prediction consists of choosing motion data comprising the

selected reference picture and motion vector (MV) to be applied for predicting the

samples of each block. The encoder and decoder generate identical interpicture

prediction signals by applying motion compensation (MC) using the MV and mode decision

data, which are transmitted as side information.








The residual signal of the intra- or interpicture prediction, which is the difference

between the original block and its prediction, is transformed by a linear spatial

transform. The transform coefficients are then scaled, quantized, entropy coded,

and transmitted together with the prediction information.
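
As a rough illustration of the residual path described above, the following sketch computes a residual block, quantizes it with a uniform step, and adds the dequantized residual back to the prediction. The transform step is omitted for brevity, and the function names, sample values, and step size are hypothetical, chosen only for this example.

```python
def residual(original, prediction):
    """Sample-wise difference between an original block and its prediction."""
    return [[o - p for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(original, prediction)]


def quantize(block, step):
    """Map each value to the nearest integer level for a uniform step size."""
    def level(v):
        sign = -1 if v < 0 else 1
        return sign * ((abs(v) + step // 2) // step)
    return [[level(v) for v in row] for row in block]


def dequantize(levels, step):
    """Inverse scaling: recover an approximation of the quantized values."""
    return [[lv * step for lv in row] for row in levels]


def reconstruct(prediction, approx_residual):
    """Add the approximated residual back to the prediction."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, approx_residual)]


original = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
STEP = 4  # hypothetical quantizer step size

res = residual(original, prediction)
levels = quantize(res, STEP)
decoded = reconstruct(prediction, dequantize(levels, STEP))
print(res, levels, decoded)
```

Only the levels (and the prediction information) would need to be transmitted; the reconstruction is lossy because small residual values are rounded to zero.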



The encoder duplicates the decoder processing loop (see gray-shaded boxes in Fig. 1)

such that both will generate identical predictions for subsequent data. Therefore,

the quantized transform coefficients are constructed by inverse scaling and are then

inverse transformed to duplicate the decoded approximation of the residual signal.

The residual is then added to the prediction, and the result of that addition may

then be fed into one or two loop filters to smooth out artifacts induced by block-wise

processing and quantization. The final picture representation (that is a duplicate of

the output of the decoder) is stored in a decoded picture buffer to be used for

the prediction of subsequent pictures. In general, the order of encoding or decoding

processing of pictures often differs from the order in which they arrive from the source,

necessitating a distinction between the decoding order (i.e., bitstream order)

and the output order (i.e., display order) for a decoder.
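
The distinction between decoding order and output order can be sketched as follows. The picture names and the `output_index` field are hypothetical; the sketch only assumes that each decoded picture carries some counter telling the decoder where it belongs in display order.

```python
from typing import NamedTuple


class Picture(NamedTuple):
    name: str
    output_index: int  # hypothetical display-order counter


# Pictures as they appear in the bitstream (decoding order).
decoding_order = [
    Picture("I0", 0),
    Picture("P4", 4),
    Picture("B2", 2),
    Picture("B1", 1),
    Picture("B3", 3),
]

# A decoder would buffer pictures and emit them by display position.
output_order = sorted(decoding_order, key=lambda p: p.output_index)
print([p.name for p in output_order])
```

Here the picture decoded second is displayed last, which is why decoded pictures have to be held in a buffer before output.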






Video material to be encoded by HEVC is generally expected to be input as progressive

scan imagery (either due to the source video originating in that format or resulting from

deinterlacing prior to encoding). No explicit coding features are present in the HEVC

design to support the use of interlaced scanning, as interlaced scanning is no longer

used for displays and is becoming substantially less common for distribution.

However, a metadata syntax has been provided in HEVC to allow an encoder to indicate

that interlace-scanned video has been sent by coding each field (i.e., the even or

odd numbered lines of each video frame) of interlaced video as a separate

picture or that it has been sent by coding each interlaced frame as an HEVC coded picture.

This provides an efficient method of coding interlaced video without burdening decoders

with a need to support a special decoding process for it.







The various features involved in hybrid video coding using HEVC

are highlighted as follows.


1) Coding tree units and coding tree block (CTB) structure:

The core of the coding layer in previous standards was the macroblock, containing

a 16×16 block of luma samples and, in the usual case of 4:2:0 color sampling, two

corresponding 8×8 blocks of chroma samples; whereas the analogous structure in HEVC

is the coding tree unit (CTU), which has a size selected by the encoder and can be

larger than a traditional macroblock. The CTU consists of a luma CTB and the

corresponding chroma CTBs and syntax elements. The size L×L of a luma CTB can be

chosen as L = 16, 32, or 64 samples, with the larger sizes typically enabling better

compression. HEVC then supports a partitioning of the CTBs into smaller blocks using

a tree structure and quadtree-like signaling [8].
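
To give a feel for the effect of the CTB size, the sketch below counts how many CTUs cover a picture for each allowed value of L. The function name and the 1920×1080 picture size are assumptions of this example; partial CTUs at the right and bottom edges are counted as whole CTUs.

```python
def ctu_grid(width, height, ctb_size):
    """Return (columns, rows) of CTUs needed to cover a picture."""
    if ctb_size not in (16, 32, 64):
        raise ValueError("luma CTB size must be 16, 32, or 64")
    cols = -(-width // ctb_size)   # ceiling division
    rows = -(-height // ctb_size)
    return cols, rows


for size in (16, 32, 64):
    cols, rows = ctu_grid(1920, 1080, size)
    print(size, cols, rows, cols * rows)
```

For this picture size, L = 64 gives a 30 × 17 grid (510 CTUs), while L = 16 gives 120 × 68 (8160 CTUs).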







2) Coding units (CUs) and coding blocks (CBs):

The quadtree syntax of the CTU specifies the size and positions of its luma and chroma CBs.

The root of the quadtree is associated with the CTU. Hence, the size of the luma CTB is the

largest supported size for a luma CB. The splitting of a CTU into luma and chroma CBs

is signaled jointly. One luma CB and ordinarily two chroma CBs, together with associated

syntax, form a coding unit (CU). A CTB may contain only one CU or may be split to form

multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a

tree of transform units (TUs).
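
The quadtree partitioning of a CTB into CBs can be sketched as a short recursion. This is a minimal illustration, not the normative syntax: the `should_split` callback stands in for the encoder's decision (signaled in the bitstream as split flags), and the minimum block size of 8 is an assumption of the example.

```python
def split_quadtree(x, y, size, min_size, should_split):
    """Return the leaf blocks (x, y, size) of a quadtree rooted at (x, y)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_quadtree(x + dx, y + dy, half, min_size, should_split))
        return leaves
    return [(x, y, size)]


def example_decision(x, y, size):
    """Hypothetical decision: split the 64x64 root, then only its top-left 32x32."""
    return size == 64 or (size == 32 and x == 0 and y == 0)


coding_blocks = split_quadtree(0, 0, 64, 8, example_decision)
print(coding_blocks)
```

The result is four 16×16 blocks in the top-left quadrant and three 32×32 blocks elsewhere, which together tile the 64×64 CTB exactly.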







3) Prediction units (PUs) and prediction blocks (PBs):

The decision whether to code a picture area using interpicture or intrapicture prediction

is made at the CU level. A PU partitioning structure has its root at the CU level.

Depending on the basic prediction-type decision, the luma and chroma CBs can then be

further split in size and predicted from luma and chroma prediction blocks (PBs).

HEVC supports variable PB sizes from 64×64 down to 4×4 samples.






4) TUs and transform blocks:

The prediction residual is coded using block transforms. A TU tree structure has

its root at the CU level. The luma CB residual may be identical to the luma transform block

(TB) or may be further split into smaller luma TBs. The same applies to the chroma TBs.

Integer basis functions similar to those of a discrete cosine transform (DCT) are defined

for the square TB sizes 4×4, 8×8, 16×16, and 32×32. For the 4×4 transform of luma

intrapicture prediction residuals, an integer transform derived from a form of discrete

sine transform (DST) is alternatively specified.
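
One way to picture "integer basis functions similar to those of a DCT" is to scale an orthonormal DCT-II basis and round it to integers, as in the sketch below. This is only an approximation for illustration: the scale factor of 128 is an assumption, and the matrices actually specified in the standard may differ slightly from plain rounding.

```python
import math


def integer_dct_basis(n, scale):
    """Scale and round an orthonormal n-point DCT-II basis to integers."""
    rows = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([
            round(scale * c * math.cos((2 * i + 1) * k * math.pi / (2 * n)))
            for i in range(n)
        ])
    return rows


def transform_1d(basis, samples):
    """Apply the basis to a vector of samples (one stage of a separable 2-D transform)."""
    return [sum(b * s for b, s in zip(row, samples)) for row in basis]


basis = integer_dct_basis(4, 128)
for row in basis:
    print(row)

# A flat residual puts all its energy in the first (DC) coefficient.
print(transform_1d(basis, [3, 3, 3, 3]))
```

Because the rounded rows remain mutually orthogonal in this 4-point case, a flat input produces a single nonzero coefficient.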








5) Motion vector signaling:

Advanced motion vector prediction (AMVP) is used, including derivation of several

most probable candidates based on data from adjacent PBs and the reference picture.

A merge mode for MV coding can also be used, allowing the inheritance of MVs from

temporally or spatially neighboring PBs. Moreover, compared to H.264/MPEG-4 AVC,

improved skipped and direct motion inference are also specified.
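
The idea of predictive MV signaling can be sketched as follows. The candidate list is taken as given (its derivation from neighboring PBs is not shown), and the cost measure, function names, and values are assumptions of this example rather than the normative process.

```python
def encode_mv(mv, candidates):
    """Pick the closest candidate and return (candidate index, MV difference)."""
    def cost(i):
        cx, cy = candidates[i]
        return abs(mv[0] - cx) + abs(mv[1] - cy)

    idx = min(range(len(candidates)), key=cost)
    cx, cy = candidates[idx]
    return idx, (mv[0] - cx, mv[1] - cy)


def decode_mv(idx, mvd, candidates):
    """Rebuild the MV from the signaled index and difference."""
    cx, cy = candidates[idx]
    return (cx + mvd[0], cy + mvd[1])


candidates = [(4, -2), (12, 0)]   # hypothetical predictors from neighboring blocks
mv = (13, 1)

idx, mvd = encode_mv(mv, candidates)
print(idx, mvd)
print(decode_mv(idx, mvd, candidates))
```

In these terms, a merge mode corresponds to sending only the index and inheriting the candidate's motion data unchanged, with no difference transmitted.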





6) Motion compensation:

Quarter-sample precision is used for the MVs, and 7-tap or 8-tap filters are used for

interpolation of fractional-sample positions (compared to six-tap filtering of half-sample

positions followed by linear interpolation for quarter-sample positions in H.264/MPEG-4 AVC).

Similar to H.264/MPEG-4 AVC, multiple reference pictures are used. For each PB, either

one or two motion vectors can be transmitted, resulting either in unipredictive or

bipredictive coding, respectively. As in H.264/MPEG-4 AVC, a scaling and offset

operation may be applied to the prediction signal(s) in a manner known as weighted prediction.
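
Two of the operations above lend themselves to a short sketch: splitting a quarter-sample MV component into its integer and fractional parts, and combining two prediction signals with a scale and offset. The interpolation filters themselves are not shown, and the parameter names and the simplified weighting formula are assumptions of this example.

```python
def split_quarter_sample(mv_component):
    """Split an MV component in quarter-sample units into (integer offset, phase 0..3)."""
    return mv_component >> 2, mv_component & 3


def weighted_bipred(pred0, pred1, w0=1, w1=1, offset=0, shift=1):
    """Combine two prediction rows: ((w0*a + w1*b + rounding) >> shift) + offset."""
    rounding = 1 << (shift - 1)
    return [((w0 * a + w1 * b + rounding) >> shift) + offset
            for a, b in zip(pred0, pred1)]


print(split_quarter_sample(9))    # 2 full samples plus a quarter-sample phase of 1
print(split_quarter_sample(-5))   # -2 full samples plus a phase of 3, i.e. -1.25

# With default weights this is a rounded average of the two predictions.
print(weighted_bipred([100, 102], [104, 103]))
```

A nonzero phase is what selects the fractional-sample interpolation filter; a phase of zero means the prediction can be copied from integer sample positions.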




7) Intrapicture prediction:

The decoded boundary samples of adjacent blocks are used as reference data for spatial

prediction in regions where interpicture prediction is not performed. Intrapicture

prediction supports 33 directional modes (compared to eight such modes in H.264/MPEG-4 AVC),

plus planar (surface fitting) and DC (flat) prediction modes. The selected intrapicture

prediction modes are encoded by deriving most probable modes (e.g., prediction directions)

based on those of previously decoded neighboring PBs.
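
As a small illustration of prediction from decoded boundary samples, the sketch below implements a plain DC (flat) prediction for a square block. It is a simplified sketch: any additional edge smoothing a real codec might apply is left out, and the function name and sample values are assumptions.

```python
NUM_DIRECTIONAL_MODES = 33
NUM_INTRA_MODES = NUM_DIRECTIONAL_MODES + 2  # plus planar and DC


def dc_prediction(top, left):
    """Fill an n x n block with the rounded mean of the top and left boundary samples."""
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left boundaries must have the same length")
    total = sum(top) + sum(left)
    dc = (total + n) // (2 * n)  # integer mean of 2n samples with rounding
    return [[dc] * n for _ in range(n)]


top = [100, 102, 104, 106]    # decoded samples of the row above the block
left = [98, 99, 101, 102]     # decoded samples of the column to the left

block = dc_prediction(top, left)
print(NUM_INTRA_MODES, block[0])
```

The directional modes follow the same principle but propagate the boundary samples along a chosen angle instead of averaging them.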




8) Quantization control:

As in H.264/MPEG-4 AVC, uniform reconstruction quantization (URQ) is used in HEVC,

with quantization scaling matrices supported for the various transform block sizes.
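
The combination of a uniform quantizer with a scaling matrix can be sketched as below. The convention that a matrix entry of 16 leaves the step size unchanged, along with all names and values, is an assumption of this example; the sketch only shows how a per-frequency entry can coarsen or refine the effective step.

```python
FLAT = 16  # assumed neutral matrix entry: effective step equals the base step


def effective_step(step, entry):
    """Scale the base step size by a scaling-matrix entry."""
    return step * entry / FLAT


def quantize(coeffs, step, matrix):
    """Uniform quantization with a per-coefficient effective step."""
    levels = []
    for c_row, m_row in zip(coeffs, matrix):
        row = []
        for c, m in zip(c_row, m_row):
            sign = -1 if c < 0 else 1
            row.append(sign * int(abs(c) / effective_step(step, m) + 0.5))
        levels.append(row)
    return levels


def reconstruct(levels, step, matrix):
    """Uniform reconstruction: each level maps back to level * effective step."""
    return [[lv * effective_step(step, m) for lv, m in zip(l_row, m_row)]
            for l_row, m_row in zip(levels, matrix)]


coeffs = [[80, -30], [12, 5]]
matrix = [[16, 16], [16, 32]]   # coarser quantization for the highest frequency

levels = quantize(coeffs, 8, matrix)
print(levels)
print(reconstruct(levels, 8, matrix))
```

The reconstruction values are evenly spaced multiples of the effective step, which is the sense in which the quantizer has uniform reconstruction.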



9) Entropy coding:

Context adaptive binary arithmetic coding (CABAC) is used for entropy coding. This is

similar to the CABAC scheme in H.264/MPEG-4 AVC, but has undergone several improvements

to improve its throughput speed (especially for parallel-processing architectures) and

its compression performance, and to reduce its context memory requirements.




10) In-loop deblocking filtering:

A deblocking filter similar to the one used in H.264/MPEG-4 AVC is operated within

the interpicture prediction loop. However, the design is simplified in regard to its

decision-making and filtering processes, and is made more friendly to parallel processing.



11) Sample adaptive offset (SAO):

A nonlinear amplitude mapping is introduced within the interpicture prediction

loop after the deblocking filter. Its goal is to better reconstruct the original

signal amplitudes by using a look-up table that is described by a few additional

parameters that can be determined by histogram analysis at the encoder side.
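
The look-up table idea can be illustrated with a band-based offset: the sample range is divided into equal bands, and a few bands receive a small correction. The split into 32 bands, the dictionary used to carry the offsets, and the offset values are assumptions of this sketch, not a description of the normative SAO process.

```python
def build_offset_lut(band_offsets, bit_depth=8):
    """Build a table mapping each sample value to its corrected value.

    band_offsets: dict mapping a band index (0..31) to a signed offset.
    """
    max_val = (1 << bit_depth) - 1
    band_shift = bit_depth - 5          # 2**5 = 32 equal-width bands
    lut = []
    for value in range(max_val + 1):
        band = value >> band_shift
        corrected = value + band_offsets.get(band, 0)
        lut.append(min(max_val, max(0, corrected)))  # clip to the valid range
    return lut


def apply_lut(samples, lut):
    """Map every reconstructed sample through the table."""
    return [lut[s] for s in samples]


# Hypothetical parameters: bands 12 and 13 were found to be biased.
lut = build_offset_lut({12: 2, 13: -1})
print(apply_lut([50, 100, 104, 255], lut))
```

Only the handful of offsets needs to be transmitted; the decoder can rebuild the same table and apply it after deblocking.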




B. High-Level Syntax Architecture

A number of design aspects new to the HEVC standard improve flexibility for operation

over a variety of applications and network environments and improve robustness to data

losses. However, the high-level syntax architecture used in the H.264/MPEG-4 AVC standard

has generally been retained, including the following features.




1) Parameter set structure:

Parameter sets contain information that can be shared for the decoding of several regions

of the decoded video. The parameter set structure provides a robust mechanism for conveying

data that are essential to the decoding process. The concepts of sequence and picture

parameter sets from H.264/MPEG-4 AVC are augmented by a new video parameter set (VPS) structure.

2) NAL unit syntax structure:

Each syntax structure is placed into a logical data packet called a network abstraction

layer (NAL) unit. Using the content of a two-byte NAL unit header, it is possible to

readily identify the purpose of the associated payload data.
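
A minimal sketch of reading the two-byte header is shown below. The field names and bit widths (1 + 6 + 6 + 3 bits) follow the HEVC specification rather than this overview, which does not list them; the function name and the example bytes are assumptions.

```python
def parse_nal_header(data: bytes) -> dict:
    """Unpack the four fields of a two-byte NAL unit header."""
    if len(data) < 2:
        raise ValueError("a NAL unit header needs two bytes")
    b0, b1 = data[0], data[1]
    return {
        "forbidden_zero_bit": b0 >> 7,                     # 1 bit
        "nal_unit_type": (b0 >> 1) & 0x3F,                 # 6 bits
        "nuh_layer_id": ((b0 & 0x01) << 5) | (b1 >> 3),    # 6 bits
        "nuh_temporal_id_plus1": b1 & 0x07,                # 3 bits
    }


header = parse_nal_header(bytes([0x40, 0x01]))
print(header)
```

For the example bytes, the type field evaluates to 32, which the specification's type table assigns to a video parameter set; a system layer can make this kind of decision without parsing the payload.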



3) Slices:


A slice is a data structure that can be decoded independently from other slices of the

same picture, in terms of entropy coding, signal prediction, and residual signal

reconstruction. A slice can either be an entire picture or a region of a picture.

One of the main purposes of slices is resynchronization in the event of data losses.

In the case of packetized transmission, the maximum number of payload bits within a slice

is typically restricted, and the number of CTUs in the slice is often varied to minimize

the packetization overhead while keeping the size of each packet within this bound.
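
The packet-size-driven slice structuring described above can be sketched as a greedy grouping of CTUs. The per-CTU bit counts are made up, and the sketch assumes they are known in advance, whereas a real encoder would learn them as it codes each CTU; the function name is hypothetical.

```python
def pack_ctus_into_slices(ctu_bits, max_slice_bits):
    """Group consecutive CTUs so that each slice stays within the bit budget."""
    slices, current, used = [], [], 0
    for index, bits in enumerate(ctu_bits):
        # Close the current slice if adding this CTU would exceed the bound.
        if current and used + bits > max_slice_bits:
            slices.append(current)
            current, used = [], 0
        current.append(index)
        used += bits
    if current:
        slices.append(current)
    return slices


ctu_bits = [400, 300, 500, 200, 900, 100]   # hypothetical coded size of each CTU
slices = pack_ctus_into_slices(ctu_bits, max_slice_bits=1000)
print(slices)
```

Note that a single CTU larger than the bound still ends up in a slice of its own in this sketch; handling that case is left out.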






4) Supplemental enhancement information (SEI) and video usability information (VUI) metadata:

The syntax includes support for various types of metadata known as SEI and VUI.

Such data provide information about the timing of the video pictures, the proper

interpretation of the color space used in the video signal, 3-D stereoscopic frame

packing information, other display hint information, and so on.


C. Parallel Decoding Syntax and Modified Slice Structuring

Finally, three new features are introduced in the HEVC standard to enhance the parallel

processing capability or modify the structuring of slice data for packetization purposes.

Each of them may have benefits in particular application contexts, and it is generally

up to the implementer of an encoder or decoder to determine whether and how to take

advantage of these features.




1) Tiles:


The option to partition a picture into rectangular regions called tiles has been specified.

The main purpose of tiles is to increase the capability for parallel processing rather

than provide error resilience. Tiles are independently decodable regions of a picture

that are encoded with some shared header information. Tiles can additionally be used for

the purpose of spatial random access to local regions of video pictures. A typical

tile configuration of a picture consists of segmenting the picture into rectangular

regions with approximately equal numbers of CTUs in each tile. Tiles provide parallelism

at a more coarse level of granularity (picture/subpicture), and no sophisticated

synchronization of threads is necessary for their use.
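
A tile configuration with approximately equal numbers of CTUs per tile can be sketched by spreading the CTU columns and rows as evenly as integer arithmetic allows. The function name and the 30 × 17 CTU grid are assumptions of this example, and the formula is offered as one reasonable way to do the split rather than as the normative derivation.

```python
def uniform_tile_sizes(num_ctus, num_tiles):
    """Split num_ctus columns (or rows) into num_tiles nearly equal parts."""
    return [((i + 1) * num_ctus) // num_tiles - (i * num_ctus) // num_tiles
            for i in range(num_tiles)]


# Hypothetical picture of 30 x 17 CTUs split into a 4 x 2 tile grid.
column_widths = uniform_tile_sizes(30, 4)
row_heights = uniform_tile_sizes(17, 2)
print(column_widths, row_heights)

# CTUs per tile, listed row of tiles by row of tiles.
print([w * h for h in row_heights for w in column_widths])
```

Each tile can then be handed to a separate worker, since tiles are independently decodable and no fine-grained synchronization is needed between them.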







2) Wavefront parallel processing:

When wavefront parallel processing (WPP) is enabled, a slice is divided into rows of CTUs.

The first row is processed in an ordinary way, the second row can begin to be processed

after only two CTUs have been processed in the first row, the third row can begin to be

processed after only two CTUs have been processed in the second row, and so on. The

context models of the entropy coder in each row are inferred from those in the preceding

row with a two-CTU processing lag. WPP provides a form of processing parallelism at a rather

fine level of granularity, i.e., within a slice. WPP may often provide better compression

performance than tiles (and avoid some visual artifacts that may be induced by using tiles).
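
The two-CTU lag between rows can be made concrete with a small scheduling sketch. It assumes one worker per CTU row and one time slot per CTU, which are simplifications of this example; a CTU may start once its left neighbor and the CTU above and to the right are finished.

```python
def wpp_start_slots(rows, cols):
    """Earliest start slot of every CTU under the wavefront dependencies."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                # The left neighbor in the same row must be finished.
                ready = max(ready, start[r][c - 1] + 1)
            if r > 0:
                # So must the above-right CTU (clamped at the end of the row).
                above = min(c + 1, cols - 1)
                ready = max(ready, start[r - 1][above] + 1)
            start[r][c] = ready
    return start


schedule = wpp_start_slots(3, 4)
for row in schedule:
    print(row)

total_slots = schedule[-1][-1] + 1
print(total_slots, "slots instead of", 3 * 4)
```

Each row starts two slots after the row above it, so the 12 CTUs of this toy slice finish in 8 slots rather than 12.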










3) Dependent slice segments:

A structure called a dependent slice segment allows data associated with a particular

wavefront entry point or tile to be carried in a separate NAL unit, and thus potentially

makes that data available to a system for fragmented packetization with lower latency than

if it were all coded together in one slice. A dependent slice segment for a wavefront

entry point can only be decoded after at least part of the decoding process of another

slice segment has been performed. Dependent slice segments are mainly useful in low-delay

encoding, where other parallel tools might penalize compression performance.





In the following two sections, a more detailed description of the key features is given.

