Russian-language materials
|
| Authors |
Article title |
Description |
Rating |
| Semenyuk V. |
Modern Methods and Standards for Efficient Coding of Video Information |
An introductory overview of the MPEG-4, H.264, and MVC standards
Saint Petersburg, 2002.
PDF 212 KB
|
5
|
English-language materials
|
| Authors |
Article title |
Description |
Rating |
| S. Olivieri, L. Albani, G. de Haan |
A LOW-COMPLEXITY RECURSIVE MOTION ESTIMATION ALGORITHM FOR H.263 VIDEO CODING |
Abstract - The key element in realizing low-cost real-time software
implementations of an H.263 videoconferencing system is a fast motion
estimation algorithm that only slightly decreases coding efficiency. We
propose a spatio-temporal recursive estimator that combines excellent
coding efficiency with high computational efficiency. Experimentally,
the new algorithm proves comparable to full-search block
matching when encoding typical videoconferencing sequences in the
presence of additive noise, even though the computational burden is
greatly reduced.
RAR 484 KB |
?
|
| Stefano Olivieri, Gerard de Haan, and Luigi Albani |
Noise-Robust Recursive Motion Estimation for
H.263-Based Videoconferencing Systems |
The key element in realizing low-cost real-time software implementations of an H.263
videoconferencing system is a fast motion estimation algorithm that only slightly decreases coding
efficiency. We propose a spatio-temporal recursive estimator that combines excellent coding
efficiency with high computational efficiency. Experimentally, the new algorithm proves
comparable to full-search block matching when encoding typical videoconferencing sequences in
the presence of additive noise, even though the computational burden is greatly reduced.
RAR 138 KB |
?
|
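Both entries above describe a spatio-temporal recursive estimator only at abstract level. The sketch below illustrates the general idea under stated assumptions: each 8×8 block evaluates a small candidate set (zero motion, the already estimated left and upper neighbours, the co-located and lower vectors of the previous field, and one spatial predictor plus a small random update) and keeps the candidate with the lowest sum of absolute differences (SAD). The candidate set, update vectors, block size, and SAD criterion are assumptions in the style of 3-D recursive search, not the authors' exact algorithm; `recursive_me` and `block_sad` are hypothetical names.

```python
import numpy as np

# Illustrative update vectors (dy, dx) added to a spatial predictor (assumption).
UPDATES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 2), (0, -2), (2, 0), (-2, 0)]

def block_sad(cur, ref, by, bx, mv, bs):
    """SAD between the current block and the reference block displaced by mv.
    Returns infinity when the displaced block falls outside the reference frame."""
    y, x = by + mv[0], bx + mv[1]
    if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
        return float("inf")
    a = cur[by:by + bs, bx:bx + bs].astype(np.int64)
    b = ref[y:y + bs, x:x + bs].astype(np.int64)
    return float(np.abs(a - b).sum())

def recursive_me(cur, ref, prev_field, bs=8, seed=0):
    """One pass of a candidate-based recursive block motion estimator.

    prev_field holds the (dy, dx) vectors estimated for the previous field,
    shape (rows, cols, 2). Only a handful of candidates are tested per block.
    """
    rng = np.random.default_rng(seed)
    rows, cols = cur.shape[0] // bs, cur.shape[1] // bs
    field = np.zeros((rows, cols, 2), dtype=int)
    for r in range(rows):
        for c in range(cols):
            spatial = []
            if c > 0:
                spatial.append(tuple(field[r, c - 1]))  # left neighbour, this pass
            if r > 0:
                spatial.append(tuple(field[r - 1, c]))  # upper neighbour, this pass
            cands = [(0, 0)] + spatial
            cands.append(tuple(prev_field[r, c]))  # temporal: co-located
            if r + 1 < rows:
                cands.append(tuple(prev_field[r + 1, c]))  # temporal: block below
            base = spatial[0] if spatial else (0, 0)
            u = UPDATES[rng.integers(len(UPDATES))]
            cands.append((base[0] + u[0], base[1] + u[1]))  # update candidate
            # min() keeps the first candidate among equal SADs.
            field[r, c] = min(cands, key=lambda mv: block_sad(cur, ref, r * bs, c * bs, mv, bs))
    return field
```

Because only about six candidates are evaluated per block instead of every displacement in a search window, the cost per block stays constant; good vectors are expected to spread through the spatial and temporal predictors over successive passes.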
| Xudong Song, Tihao Chiang, Xiaobing Lee, and Ya-Qin Zhang, Fellow |
New Fast Binary Pyramid Motion Estimation for MPEG2 and HDTV Encoding |
Abstract—A novel Fast Binary Pyramid Motion Estimation
(FBPME) algorithm is presented in this paper. The proposed
FBPME scheme is based on binary multiresolution layers, exclusive-or
(XOR) Boolean block matching, and a -scale tiling search
scheme. Each video frame is converted into a pyramid structure of
1 binary layers with resolution decimation, plus one integer
layer at the lowest resolution. At the lowest resolution layer, the
-scale tiling search is performed to select initial motion vector
candidates. Motion vector fields are gradually refined with the
XOR Boolean block-matching criterion and the -scale tiling
search schemes in higher binary layers.
FBPME performs several thousand times faster than the conventional
full-search block-matching scheme at the same PSNR
performance and visual quality. It also dramatically reduces the
bus bandwidth and on-chip memory requirement. Moreover, hardware
complexity is low due to its binary nature.
Fully functional software MPEG-2 MP@ML encoders and Advanced
Television Standard Committee High Definition Television
encoders based on the FBPME algorithm have been implemented.
FBPME hardware architecture has been developed and is being
incorporated into single-chip MPEG encoders. A wide range of
video sequences at various resolutions has been tested. The proposed
algorithm is also applicable to other digital video compression
standards such as H.261, H.263, and MPEG4.
RAR 563 KB
|
?
|
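The XOR Boolean block-matching criterion in the FBPME entry above can be sketched as a Hamming-distance search on a binary layer. This is a minimal single-layer illustration, not the full pyramid: the binarization rule (comparing each pixel with the frame mean), the block size, and the search radius are assumptions, and `binarize` and `xor_block_search` are hypothetical names.

```python
import numpy as np

def binarize(frame):
    """Hypothetical binarization rule: 1 where a pixel exceeds the frame mean."""
    return (frame > frame.mean()).astype(np.uint8)

def xor_block_search(cur_bits, ref_bits, by, bx, bs=8, radius=4):
    """Full search on one binary layer. The cost of a displacement is the number
    of differing bits, i.e. popcount(current block XOR reference block)."""
    h, w = ref_bits.shape
    block = cur_bits[by:by + bs, bx:bx + bs]
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > h or x + bs > w:
                continue  # candidate block would fall outside the reference layer
            cost = int(np.count_nonzero(block ^ ref_bits[y:y + bs, x:x + bs]))
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost
```

Counting set bits of an XOR replaces the subtractions and absolute values of a SAD search, which is consistent with the abstract's point about low hardware complexity; a pyramid version would repeat this search on each layer, refining the vector passed down from the coarser one.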
| Barry G. Haskell, Fellow, Paul G. Howard, Yann A. LeCun, Atul Puri, Jörn Ostermann, M. Reha Civanlar, Lawrence Rabiner, Fellow, Leon Bottou, and Patrick Haffner |
Image and Video Coding—Emerging Standards and Beyond |
Abstract— In this paper, we make a short foray through
coding standards for still images and motion video. We first
briefly discuss standards already in use, including: Group 3 and
Group 4 for bilevel fax images; JPEG for still color images;
and H.261, H.263, MPEG-1, and MPEG-2 for motion video. We
then cover newly emerging standards such as JBIG1 and JBIG2
for bilevel fax images, JPEG-2000 for still color images, and
H.263+ and MPEG-4 for motion video. Finally, we describe some
directions currently beyond the standards such as hybrid coding
of graphics/photo images, MPEG-7 for multimedia metadata, and
possible new technologies.
RAR 921 KB
|
?
|
| Peter Cherriman, Choong Hin Wong, and Lajos Hanzo |
Turbo- and BCH-Coded Wide-Band Burst-by-Burst Adaptive H.263-Assisted Wireless Video Telephony |
Abstract—The video performance benefits of burst-by-burst
adaptive modulation are studied, employing a higher-order
modulation scheme when the channel is favorable, in order to
increase the system’s bits per symbol capacity and conversely,
invoking a more robust lower order modulation scheme when
the channel exhibits inferior channel quality. It is shown that
due to the proposed adaptive modem mode switching regime, a
seamless video-quality versus channel quality relationship can be
established, resulting in error-free video quality right across the
operating channel signal-to-noise ratio (SNR) range. The main
advantage of the proposed burst-by-burst adaptive transceiver
technique is that irrespective of the prevailing channel conditions,
the transceiver always achieves the best possible source-signal representation
quality—such as video, speech, or audio quality—by
automatically adjusting the achievable bitrate and the associated
multimedia source-signal representation quality in order to match
the channel quality experienced. This is achieved on a near-instantaneous
basis under given propagation conditions in order to cater
for the effects of path loss, fast-fading, slow-fading, dispersion,
co-channel interference, etc. Furthermore, when the mobile is
roaming in a hostile outdoor, or even hilly-terrain, propagation
environment, typically low-order low-rate modem modes are
invoked, while in benign indoor environments, predominantly the
high-rate high source-signal representation quality modes are
employed.
RAR 188 KB
|
?
|
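The mode-switching idea in the entry above amounts to choosing, burst by burst, the highest-order modulation that the estimated channel quality can support. The sketch assumes three modes (BPSK, 4QAM, 16QAM) and switching thresholds in dB that are purely illustrative; the paper's actual mode set and thresholds are not reproduced, and `select_mode` and `burst_payload_bits` are hypothetical names.

```python
# Hypothetical modem modes: (name, bits per symbol, minimum estimated SNR in dB).
# The thresholds are illustrative placeholders, not values from the paper.
MODES = [("BPSK", 1, float("-inf")), ("4QAM", 2, 10.0), ("16QAM", 4, 18.0)]

def select_mode(snr_db):
    """Return (name, bits_per_symbol) of the highest-order mode whose threshold
    the estimated SNR meets; BPSK is the robust fallback for poor channels."""
    name, bits = MODES[0][0], MODES[0][1]
    for mode_name, mode_bits, threshold in MODES:
        if snr_db >= threshold:
            name, bits = mode_name, mode_bits
    return name, bits

def burst_payload_bits(snr_db, symbols_per_burst):
    """Bits available to the video codec in one burst under the selected mode."""
    return select_mode(snr_db)[1] * symbols_per_burst
```

The video encoder's target bitrate would then follow the payload available per burst, so picture quality degrades gradually with channel quality instead of collapsing.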
| Marek Domański, Adam Łuczak, and Sławomir Maćkowiak |
Spatio-Temporal Scalability for MPEG Video Coding |
Abstract—The existing and standardized solutions for spatial
scalability are not satisfactory, therefore new approaches are very
actively explored recently. The goal of this paper is to improve spatial
scalability of MPEG-2 for progressive video. In order to avoid
problems with too large bitstreams of the base layer produced by
some of the hitherto proposed spatially scalable coders, spatio-temporal
scalability is proposed for video compression systems. It is assumed
that a coder produces two bitstreams, where the base-layer
bitstream corresponds to pictures with both reduced spatial and reduced
temporal resolution while the enhancement-layer bitstream is used
to transmit the information needed to retrieve images with full spatial
and temporal resolution. In the base layer, temporal resolution
reduction is obtained by B-frame data partitioning, i.e., by placing
every second frame (B-frame) in the enhancement layer. Subband
(wavelet) analysis is used to provide spatial decomposition of the
signal. Full compatibility with the MPEG-2 standard is ensured in
the base layer. As compared to single-layer MPEG-2 encoding at
bit rates below 6 Mbits/s, the bitrate overhead for scalability is less
than 15% in most cases.
RAR 131 KB
|
?
|
| Hung-Ju Lee, Tihao Chiang, and Ya-Qin Zhang, Fellow |
Scalable Rate Control for MPEG-4 Video |
Abstract—This paper presents a scalable rate control (SRC)
scheme based on a more accurate second-order rate-distortion
model. A sliding-window method for data selection is used to mitigate
the impact of a scene change. The data points for updating a
model are adaptively selected such that the statistical behavior is
improved. For video object (VO) shape coding, we use an adaptive
threshold method to remove shape-coding artifacts for MPEG-4
applications. A dynamic bit allocation among VOs is implemented
according to the coding complexities for each VO.
SRC achieves more accurate bit allocation with low latency and
limited buffer size. In a single framework, SRC offers multiple
layers of controls for objects, frames, and macroblocks (MBs). At
MB level, SRC provides finer bit rate and buffer control. At the multiple-VO level,
SRC offers superior VO presentation for multimedia
applications. The proposed SRC scheme has been adopted as part
of the International Standard of the emerging ISO MPEG-4 standard
[1], [2].
RAR 568 KB
|
?
|
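The SRC entry above refers to a second-order rate-distortion model without giving its form. A common quadratic rate model in MPEG-4 rate control writes the bits for a frame as R = X1·MAD/Q + X2·MAD/Q², where MAD is the mean absolute difference of the prediction residual and Q the quantizer; whether SRC uses exactly this form is an assumption here. The sketch fits X1 and X2 by least squares over a window of recent frames, then solves the quadratic for Q given a target bit count. The function names and the clipping range 1 to 31 are illustrative choices.

```python
import math
import numpy as np

def fit_quadratic_model(qs, rates, mads):
    """Least-squares fit of R = x1*MAD/Q + x2*MAD/Q**2 over past frames.

    qs, rates, mads: quantizer, coded bits and residual MAD of each frame in the
    (hypothetical) sliding window chosen by the caller.
    """
    q = np.asarray(qs, dtype=float)
    mad = np.asarray(mads, dtype=float)
    design = np.column_stack([mad / q, mad / q**2])
    (x1, x2), *_ = np.linalg.lstsq(design, np.asarray(rates, dtype=float), rcond=None)
    return float(x1), float(x2)

def quantizer_for_target(target_bits, mad, x1, x2, q_min=1, q_max=31):
    """Solve target = x1*mad/Q + x2*mad/Q**2 for Q > 0, then clip to [q_min, q_max]."""
    # Rearranged as a quadratic in Q: target*Q**2 - x1*mad*Q - x2*mad = 0
    a, b, c = target_bits, -x1 * mad, -x2 * mad
    disc = b * b - 4 * a * c
    if disc < 0:
        q = x1 * mad / target_bits  # fall back to the first-order model
    else:
        q = (-b + math.sqrt(disc)) / (2 * a)  # positive root
    return min(max(q, q_min), q_max)
```

For example, with X1 = 1000, X2 = 5000 and MAD = 10, a target of 1500 bits gives Q = 10, since 1000·10/10 + 5000·10/100 = 1500.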
| Pao-Chi Chang and Tien-Hsu Lee |
Precise and Fast Error Tracking for Error-Resilient Transmission of H.263 Video |
Abstract—In this letter, a precise error-tracking scheme for robust
transmission of real-time H.263 video is presented. By utilizing
a feedback channel, the decoder reports the addresses of corrupted
blocks induced by transmission errors back to the encoder.
With these negative acknowledgments, the encoder can precisely
calculate and track the propagated errors by examining the backward
motion dependency for each pixel in the current encoding
frame. With this precise tracking, the error-propagation effects
can be terminated completely by INTRA refreshing the affected
macroblocks. In addition, by utilizing the four-corner tracking approximation
and the linear motion model, a fast algorithm is also
developed to further reduce the computation and memory requirements.
The simulations show that both schemes yield significant
video quality improvements in error-prone environments. The low
memory requirement and low computational complexity make the schemes
particularly suitable for real-time implementation.
RAR 381 KB
|
?
|
| Kwong-Keung Leung, Nelson H. C. Yung, and Paul Y. S. Cheung |
Parallelization Methodology for Video Coding—An Implementation on the TMS320C80 |
Abstract—This paper presents a parallelization methodology for
video coding based on the philosophy of hiding as much communication
behind computation as possible. It models the task/data size,
processor cache capacity, and communication contention through
a systematic decomposition and scheduling approach. With the aid
of Petri nets and task graphs for representation and analysis, it employs
a triple-buffering scheme to enable the functions of frame
capture, management, and coding to be performed in parallel. The
theoretical speedup analysis indicates that this method offers excellent
communication hiding, resulting in system efficiency well
above 90%. To prove its practicality, an H.261 video encoder has
been implemented on a TMS320C80 system using the method. Its
performance was measured, from which the speedup and efficiency
figures were calculated. The only difference detected between the
theoretical and measured data is the program control overhead
that has not been accounted for in the theoretical model. Even with
this, the measured speedup of the H.261 encoder is 3.67 and 3.76 on four
parallel processors (PPs) for QCIF and 352 × 240 video, respectively,
which correspond to frame rates of 30.7 and 9.25 frames per
second, and system efficiency of 91.8% and 94%, respectively. This
method is particularly efficient for platforms with a small number of
parallel processors.
RAR 250 KB
|
?
|
| Lorenzo Favalli, Alessandro Mecocci, and Fulvio Moschetti |
Object Tracking for Retrieval Applications in MPEG-2 |
RAR 604 KB
|
?
|
| Sofia Tsekeridou and Ioannis Pitas |
MPEG-2 Error Concealment Based on Block-Matching Principles |
Abstract—The MPEG-2 compression algorithm is very sensitive
to channel disturbances due to the use of variable-length coding.
A single bit error during transmission leads to noticeable degradation
of the decoded sequence quality, in that part of a slice or an entire
slice is lost until the next resynchronization point is
reached. Error concealment (EC) methods, implemented at the decoder
side, present one way of dealing with this problem. An error-concealment
scheme that is based on block-matching principles
and spatio-temporal video redundancy is presented in this paper.
Spatial information (for the first frame of the sequence or the next
scene) or temporal information (for the other frames) is used to reconstruct
the corrupted regions. The concealment strategy is embedded
in the MPEG-2 decoder model in such a way that error concealment
is applied after entire frame decoding. Its performance
proves to be satisfactory for packet error rates (PER) ranging from
1% to 10% and for video sequences with different content and motion
and surpasses that of other EC methods under study.
RAR 1439 KB
|
?
|
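Temporal concealment by block matching, as in the entry above, can be illustrated with a boundary-matching search: the lost block is replaced by the block in the previous frame whose surrounding one-pixel ring best matches the correctly decoded ring around the loss. This is a generic sketch, not necessarily the authors' matching region or criterion; the name `conceal_block`, the 8×8 block size, and the ±4-pixel search range are assumptions, and the lost block must not touch the frame border.

```python
import numpy as np

def conceal_block(cur, prev, by, bx, bs=8, radius=4):
    """Replace the lost bs x bs block at (by, bx) in cur with the block from prev
    whose one-pixel outer ring best matches (SAD) the received ring in cur.
    Returns (concealed frame, chosen displacement)."""
    h, w = cur.shape

    def ring(img, y, x):
        # Pixels just above, below, left of and right of the block at (y, x).
        top = img[y - 1, x:x + bs]
        bottom = img[y + bs, x:x + bs]
        left = img[y:y + bs, x - 1]
        right = img[y:y + bs, x + bs]
        return np.concatenate([top, bottom, left, right]).astype(np.int64)

    target = ring(cur, by, bx)
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 1 or x < 1 or y + bs >= h or x + bs >= w:
                continue  # the candidate's ring would leave the previous frame
            cost = int(np.abs(ring(prev, y, x) - target).sum())
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    out = cur.copy()
    y, x = by + best[0], bx + best[1]
    out[by:by + bs, bx:bx + bs] = prev[y:y + bs, x:x + bs]
    return out, best
```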
| Eckhart Baum, Volker Harr, and Joachim Speidel |
Improvement of H.263 Encoding by Adaptive Arithmetic Coding |
Abstract—Arithmetic coding in H.263 is based on models that
assign a fixed probability to each possible value of some syntax element.
In this paper, the effect of adapting the models according to
the dynamically changing statistics is analyzed. Simulation results
show improvements in all studied cases.
RAR 52 KB
|
?
|
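The gain from adapting arithmetic-coding models, studied in the entry above, can be estimated without a full coder by summing the ideal code length −log2 p of each coded symbol. The sketch compares a fixed model with a count-based adaptive model whose counts start at 1 and are incremented after each symbol. The count-based update rule is an assumption for illustration, not the update rule of the paper, and both function names are hypothetical.

```python
import math

def ideal_bits_fixed(symbols, probs):
    """Ideal arithmetic-code length in bits with a fixed probability per value."""
    return sum(-math.log2(probs[s]) for s in symbols)

def ideal_bits_adaptive(symbols, alphabet, init_count=1):
    """Ideal code length when the model counts are updated after each symbol,
    so the probabilities track the statistics actually observed so far."""
    counts = {s: init_count for s in alphabet}
    total = init_count * len(alphabet)
    bits = 0.0
    for s in symbols:
        bits += -math.log2(counts[s] / total)  # cost under the current model
        counts[s] += 1                         # then adapt the model
        total += 1
    return bits
```

For a skewed sequence such as 90 occurrences of one value and 10 of another, a fixed 50/50 model costs exactly 100 bits, while the adaptive model costs noticeably less.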
| Daniel F. Zucker, Ruby B. Lee, and Michael J. Flynn |
Hardware and Software Cache Prefetching Techniques for MPEG Benchmarks |
Abstract—With the popularity of multimedia acceleration instructions
such as MMX, MPEG decompression is increasingly executed
on general purpose processors instead of dedicated MPEG
hardware. The gap between processor speed and memory access
means that a significant amount of time is spent in the memory
system. As processors get faster—both in terms of higher clock
speeds and increased instruction level parallelism—the time spent
in the memory system becomes even more significant.
Data prefetching is a well-known technique for improving
cache performance. While several studies have examined prefetch
strategies for scientific and commercial applications, this paper
focuses on video applications. Data is presented for three types
of hardware-prefetching schemes: the stream buffer, the stride
prediction table (SPT), and the stream cache, as well as a new
software-directed prefetching technique based on emulation of
the hardware SPT. Up to 90% of the misses that would otherwise
occur with no prefetching are eliminated. The stream cache can
cut execution time by more than half with the addition of a relatively
small amount of additional hardware. Software prefetching
achieves nearly equal performance with minimal additional hardware.
RAR 285 KB
|
?
|
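A stride prediction table, one of the hardware schemes compared in the entry above, can be sketched as a table indexed by the address of the load instruction. Each entry stores the last data address and the last stride; when the same non-zero stride is observed twice in a row, the next address is prefetched. This minimal version assumes an unbounded table and a simple repeat-the-stride confirmation rule; real designs use a finite table, and the class name is hypothetical.

```python
class StridePredictionTable:
    """Minimal stride prediction table keyed by load instruction address (PC)."""

    def __init__(self):
        self.entries = {}  # pc -> (last data address, last observed stride)

    def access(self, pc, addr):
        """Record a load at data address addr; return an address to prefetch, or None."""
        prefetch = None
        if pc in self.entries:
            last_addr, stride = self.entries[pc]
            new_stride = addr - last_addr
            if new_stride == stride and stride != 0:
                prefetch = addr + stride  # stride confirmed: fetch the next element
            self.entries[pc] = (addr, new_stride)
        else:
            self.entries[pc] = (addr, 0)  # first sighting of this load
        return prefetch
```

A load walking a macroblock row with a 64-byte stride triggers its first prefetch on the third access, after the stride has been seen twice.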
| Nikolaos D. Doulamis, Anastasios D. Doulamis, George E. Konstantoulakis, and George I. Stassinopoulos |
Efficient Modeling of VBR MPEG-1 Coded Video Sources |
Abstract—Performance evaluation of broadband networks
requires statistical analysis and modeling of the actual network
traffic. Since multimedia services, and especially variable bit rate
(VBR) MPEG-coded video streams are expected to be a major
traffic component carried by these networks, modeling of such services
and accurate estimation of network resources are crucial for
proper network design and congestion-control mechanisms that
can guarantee the negotiated quality of service at a minimum cost.
The layer modeling of MPEG-1 coded video streams and statistical
analysis of their traffic characteristics at each layer is proposed
in this paper, along with traffic models capable of estimating the
network resources over asynchronous transfer mode (ATM) links.
First, based on the properties of the entire MPEG-1 sequence
(frame layer signal), a model (Model A) is presented by correlating
three stochastic processes in discrete time (autoregressive models),
each of which corresponds to one of the three types of frames of the
MPEG encoder (I, P, and B frames). To simplify the traffic
Model A and to reduce the required number of parameters, we
study the MPEG stream at a higher layer by considering a signal,
which expresses the average properties of I, P, and B frames
over a group of pictures (GOP) period. However, models on this
layer cannot accurately estimate the network resources, especially
in multiplexing schemes. For this reason, an intermediate layer is
introduced, which exploits and efficiently combines information
of both the aforementioned layers, producing a model (Model B),
which requires a much smaller number of parameters than Model
A and simultaneously provides satisfactory results as far as the
network resources are concerned. Evaluation of the validity of the
proposed models is performed through experimental studies and
computer simulations, using several long-duration VBR MPEG-1
coded sequences, different from those used in modeling. The results
indicate that both Models A and B are good estimators of video
traffic behavior over ATM links at a wide range of utilization.
RAR 418 KB
|
?
|
| Krit Panusopone, Xuemin Chen, Robert Eifrig, and Ajay Luthra |
Coding Tools in MPEG-4 for Interlaced Video |
Abstract—Recent developments in digital video compression,
transmission, and displays have made object-based video viable
for many applications, e.g., coding chroma-keyed video for digital
TV and manipulating video objects on interactive multimedia terminals,
etc. To facilitate these applications, there is a demand for
international standards for coding methods and transmission formats
for object-based natural and synthetic video. For the past few
years, the Moving Picture Experts Group (MPEG) of the International
Standards Organization (ISO), which successfully created
the MPEG-1/2 standards, has been working to establish a new standard,
called MPEG-4. MPEG-4 will provide standardized technological
elements enabling the integration of the production, distribution,
and content-access paradigms in four fields: wireless communication,
digital TV, interactive graphics, and the World Wide
Web. To meet the needs of interlaced video applications, MPEG-4
video adopted interlaced coding tools similar to those in MPEG-2
and features schemes to code multiple video objects. This paper
provides an overview of MPEG-4 interlaced coding tools, and focuses
in detail on the new shape and texture-coding algorithms for
interlaced video.
RAR 322 KB
|
?
|
| Bo Tao, Bradley W. Dickinson, Fellow, and Heidi A. Peterson |
Adaptive Model-Driven Bit Allocation for MPEG Video Coding |
Abstract—We present an adaptive model-driven bit-allocation
algorithm for video sequence coding. The algorithm is based on
a parametric rate-distortion model, and facilitates both picture- and
macroblock-level bit allocation. A region classification scheme
is incorporated into the algorithm, which exploits characteristics
of human visual perception to efficiently allocate bits according to
a region's visual importance. The application of this algorithm to
MPEG video coding is discussed in detail. We show that the proposed
algorithm is computationally efficient and has many advantages
over the MPEG2 TM5 bit-allocation algorithm.
RAR 257 KB
|
?
|
| Judy Y. Liao and John Villasenor |
Adaptive Intra Block Update for Robust Transmission of H.263 |
Abstract—An adaptive block-based intra refresh algorithm for
increasing error robustness in an interframe coding system is described.
The goal of this algorithm is to allow the intra update rates
for different image regions to vary according to various channel
conditions and image characteristics. The update scheme is based
on an “error-sensitivity metric,” accumulated at the encoder, representing
the vulnerability of each coded block to channel errors.
As each new frame is encoded, the accumulated metric for each
block is examined, and those blocks deemed to have an unacceptably
high metric are sent using intra coding as opposed to inter
coding. This approach requires no feedback channel and is fully
compatible with H.263. It involves a negligible increase in encoder
complexity and no change in the decoder complexity. Simulations
performed using an H.263 bitstream corrupted by channel errors
demonstrate a significant improvement in terms of error recovery
time over nonadaptive intra update strategies.
RAR 90 KB
|
?
|
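The update rule in the entry above can be sketched per frame: each block's error-sensitivity metric is accumulated, blocks whose metric exceeds a threshold are coded in intra mode, and their metric is reset. The abstract does not say how the per-frame sensitivity is computed, so here it is simply an input array; the reset-to-zero rule and the name `select_intra_blocks` are assumptions.

```python
import numpy as np

def select_intra_blocks(metric, sensitivity, threshold):
    """One frame of adaptive intra update.

    metric: accumulated error-sensitivity per block (2-D array).
    sensitivity: this frame's per-block contribution (hypothetical input).
    Returns (boolean mask of blocks to intra code, updated metric).
    """
    metric = metric + sensitivity          # accumulate per-block vulnerability
    intra = metric > threshold             # blocks refreshed with intra coding
    metric = np.where(intra, 0.0, metric)  # intra refresh clears accumulated risk
    return intra, metric
```

No feedback channel is involved: the encoder alone decides the refresh pattern, which matches the abstract's claim of full H.263 compatibility.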
| Jeff McVeigh, George K. Chen, Judi Goldstein, Atul Gupta, Mike Keith, and Steve Wood |
A Software-Based Real-Time MPEG-2 Video Encoder |
Abstract—Dedicated hardware previously has been required
to perform real-time MPEG-2 video encoding. However, with
increases in clock frequency and the introduction of video-specific
instruction sets, general-purpose processors can now approximate
the function and performance of single-function hardware. In this
paper, we describe a software-only MPEG-2 (MP@ML) video
encoder implemented on a personal computer using an Intel™
Pentium® III processor. This encoder is capable of real-time operation
while consuming less than 70% of the processor. The main
contribution of this work is a set of algorithmic simplifications
that significantly reduces the computational load of the encoding
process while only slightly degrading the subjective video quality
compared to encoders that are more exhaustive.
RAR 174 KB
|
?
|
| Han Seung Jung, Rin-Chul Kim, and Sang-Uk Lee |
A Hierarchical Synchronization Technique Based on the EREC for Robust Transmission of H.263 Bit Stream |
Abstract—In this letter, we propose an error-resilient transmission
technique for the H.263 compatible video data stream,
based on the data-partitioning technique. The proposed algorithm
employs the bit rearrangement technique of the error-resilience
entropy coding in each layer, providing unequal error protection
against the channel errors, without requiring additional side
information. In addition, we propose a recovery algorithm for
the lost or erroneous motion vectors. The proposed algorithm
is implemented, based on the H.263 standard, and evaluated
through intensive computer simulation. The experimental results
demonstrate that the proposed algorithm provides acceptable
performance both subjectively and objectively at various bit error
rates and burst lengths.
RAR 190 KB
|
?
|
| Jordi Ribas-Corbera and Shaw-Min Lei |
A Frame-Layer Bit Allocation for H.263+ |
Abstract—In typical block-based video coding, the rate-control
scheme allocates a target number of bits to each frame of a video
sequence and selects the block quantization parameters to meet
the frame targets. In this work, we present a new technique for
assigning such targets. This method has been adopted in the
recent test model TMN10 of H.263+, but it is applicable to any
video coder and is particularly useful for those that use frames.
Our approach selects the frame targets using formulas that result
from combining an analytical rate-distortion optimization and
a heuristic technique that compensates for the distortion dependency
among frames. The method does not require pre-analyses,
and encodes each frame only once; hence, it is geared toward
low-complexity real-time video coding. We compare this new
frame-layer bit allocation in TMN10 to that in MPEG2’s TM5 for
a variety of bit rates and video sequences.
RAR 120 KB
|
?
|
| Jian Zhang, John F. Arnold, and Michael R. Frater |
A Cell-Loss Concealment Technique for MPEG-2 Coded Video |
Abstract—Audio-visual and other multimedia services are seen
as important sources of traffic for future telecommunication networks,
including wireless networks. A major drawback with some
wireless networks is that they introduce a significant number of
transmission errors into the digital bitstream. For video, such errors
can have the effect of degrading the quality of service to the
point where it is unusable. In this paper, we introduce a technique
that allows for the concealment of the impact of these errors. Our
work is based on MPEG-2 encoded video transmitted over a wireless
network whose data structures are similar to those of asynchronous
transfer mode (ATM) networks. Our simulations include
the impact of the MPEG-2 systems layer and cover cell-loss rates
up to 5%. This is substantially higher than those that have been
discussed in the literature up to this time. We demonstrate that
our new approach can significantly increase received video quality,
but at the cost of a considerable computational overhead. We then
extend our technique to allow for higher computational efficiency
and demonstrate that a significant quality improvement is still possible.
RAR 286 KB
|
?
|
| Fabio Lavagetto and Roberto Pockaj |
The Facial Animation Engine: Toward a High-Level Interface for the Design of MPEG-4 Compliant Animated Faces |
Abstract—In this paper, we propose a method for implementing
a high-level interface for the synthesis and animation of animated
virtual faces that is in full compliance with MPEG-4 specifications.
This method allows us to implement the simple facial object
profile and part of the calibration facial object profile.
In fact, starting from a facial wireframe and from a set of
configuration files, the developed system is capable of automatically
generating the animation rules suited for model animation driven
by a stream of facial animation parameters. If the calibration
parameters (feature points and texture) are available, the system
is able to exploit this information for suitably modifying the
geometry of the wireframe and for performing its animation
by means of calibrated rules computed ex novo on the adapted
somatics of the model.
RAR 1230 KB
|
?
|
| Austin Y. Lan, Anthony G. Nguyen, and Jenq-Neng Hwang |
Scene-Context-Dependent Reference-Frame Placement for MPEG Video Coding |
Abstract—The MPEG video-compression standard effectively
exploits spatial, temporal, and coding redundancies in the algorithm.
In its generic form, however, only a minimal amount of
scene adaptation is performed. Video can be further compressed
by taking advantage of scenes where the temporal statistics allow
larger interreference-frame distances. This paper proposes the
use of motion analysis (MA) to adapt to scene content. The
actual picture type [intracoded (I), predicted (P), or bidirectionally
coded (B)] decision is made by examining the accumulation
of motion measurements since the last reference frame
(either I or P) was labeled. The proposed MA-based adaptive
reference-frame placement scheme outperforms the standard
fixed-reference frame-placement and adaptive schemes based on
histogram of difference. When compared with the standard fixed
scheme, depending on the video contents, this proposed algorithm
can achieve from 2 to 13.9% savings in bits while maintaining
similar quality.
RAR 628 KB
|
?
|
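The picture-type decision in the entry above can be sketched as follows: starting from an I frame, a motion measure is accumulated frame by frame, and once it reaches a threshold the current frame becomes a P reference and the accumulator restarts; frames in between become B frames. The per-frame motion measure, the threshold, and the absence of a GOP-length limit are assumptions of this sketch, and `assign_picture_types` is a hypothetical name.

```python
def assign_picture_types(motion, threshold):
    """Label frames I/P/B from per-frame motion measurements.

    motion[i] is a (hypothetical) motion measure between frame i-1 and frame i.
    The first frame is I; a frame becomes a P reference once the motion
    accumulated since the last reference frame reaches the threshold.
    """
    types, acc = [], 0.0
    for i, m in enumerate(motion):
        if i == 0:
            types.append("I")
            continue
        acc += m
        if acc >= threshold:
            types.append("P")
            acc = 0.0  # restart accumulation at the new reference frame
        else:
            types.append("B")
    return types
```

Static scenes therefore yield long runs of B frames between references, while fast motion shortens the reference distance.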
| José I. Ronda, Martina Eckert, Fernando Jaureguizar, and Narciso García |
Rate Control and Bit Allocation for MPEG-4 |
Abstract—In recent years, an interest has developed in the
coded representations of video signals allowing independent manipulation
of semantically independent elements (objects). Along
these lines, the ISO standard MPEG-4 enhances the traditional
concept of video sequence to convert it into a synchronized
set of visual objects organized in a flexible way. The real-time
generation of a bitstream according to this new paradigm, and
suitable for its transmission through either fixed- or variable-rate
channels, results in a challenging new bit-allocation and
rate-control problem, which has to satisfy complex application
requirements.
This paper formalizes this new issue by focusing on the design
of rate-control systems for real-time applications. The proposed
approach relies on modeling the source and optimizing a cost
criterion based on signal quality parameters.
Different cost criteria are provided, corresponding to a set of
relevant definitions of the object priority concept. Algorithms are
introduced to minimize the average distortion of the objects, to
guarantee desired qualities to the most relevant ones, and to keep
constant ratios among the object qualities.
RAR 722 KB
|
?
|
| Peter Cherriman, Thomas Keller, and Lajos Hanzo |
Orthogonal Frequency-Division Multiplex Transmission of H.263 Encoded Video over Highly Frequency-Selective Wireless Networks |
Abstract— The video performance of a 155-Mbps wireless
asynchronous transfer mode (WATM) proposal and that of a
2-Mbps Universal Mobile Telecommunications System (UMTS)
concept is evaluated for a range of low- to high-quality video
application scenarios, various propagation conditions, and video
bit rates using the H.263 video codec, assisted by a novel
packetization and packet acknowledgment scheme. Orthogonal
frequency-division multiplexing is invoked over the highly
dispersive channels for conveying high-rate video signals.
Various binary Bose–Chaudhuri–Hochquenghem and turbo
codes are investigated comparatively, with the conclusion that
due to the high error resilience of the video packetization and
acknowledgment scheme, the increased power of the higher
complexity turbo codec does not translate to substantially
improved overall system robustness, although the bit error rate
and acknowledgment flag error rate are significantly reduced.
The whole range of video resolutions and system parameters
is summarized for reasons of space economy in Tables II–IV.
The required channel signal-to-noise ratio for near-unimpaired
video quality is about 16 dB for the inherently lower quality,
lower resolution video frame formats, but slightly higher, about
18 dB, for the high-definition formats, where the error-induced
subjective video degradations become more objectionable over
the highly dispersive worst case channels used.
RAR 334 KB
|
?
|
| Gauthier Lafruit, Lode Nachtergaele, Jan Bormans, Marc Engels, and Ivo Bolsens |
Optimal Memory Organization for Scalable Texture Codecs in MPEG-4 |
Abstract— This paper addresses the problem of minimizing
memory size and memory accesses in multiresolution texture
coding architectures for discrete cosine transform (DCT) and
wavelet-based schemes used, for example, in virtual-world walkthroughs
or facial animation scenes of an MPEG-4 system. The
problem of minimizing the memory cost is important since memory
accesses, memory bandwidth limitations, and in general the
correct handling of the data flows have become the true critical
issues in designing high-speed and low-power video-processing
architectures and in efficiently using multimedia processors. For
instance, the straightforward implementation of a multiresolution
texture codec typically needs an extra memory buffer of the same
size as the image to be encoded/decoded. We propose a new
calculation schedule that reduces this buffer memory size by
up to two orders of magnitude, while still ensuring a number
of external (off-chip) memory accesses that is very close to the
theoretical minimum. The analysis is generic and is therefore
useful for both wavelet and multiresolution DCT codecs.
RAR 816 KB
|
?
|
| André Kaup |
Object-Based Texture Coding of Moving Video in MPEG-4 |
Abstract—This paper describes some of the most promising
segment-based coding techniques which have been investigated
in the course of the MPEG-4 standardization process. Padding
methods aim at extending arbitrarily shaped image segments
to a regular block grid such that common hybrid block-based
coding techniques can be applied. A simple and efficient padding
technique employing low-pass extrapolation is outlined which
yields a signal extension with high energy concentration in the
low-frequency area. Simulations indicate that this method is well
suited for block-based video coding, and clearly outperforms
other low-complexity extrapolation methods with respect to coding
efficiency. In contrast to padding techniques, shape-adaptive
methods take advantage of the shape information available at
the decoder side. A well-known representative of this class is
the SA–DCT. However, because it was primarily designed for
intraframe coding, the transform is shown to be suboptimal
when applied to interframe coding. Using a suitable covariance
model, it is demonstrated that a rescaled, orthonormalized
transform approximates the optimal shape-adaptive
eigentransform of motion-compensated frame difference images much more closely.
Rate distortion curves verify that orthonormalization improves
coding efficiency in interframe coding by up to 2 dB while not
adding to complexity. In a comparison, it is finally shown that
extrapolation and SA–DCT perform very closely in the case of low
data rates, while there is a clear advantage for the shape-adaptive
transform in the case of high-quality video coding.
RAR 312 KB
|
?
|
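The low-pass extrapolation idea can be sketched in a few lines. This is an illustration under stated assumptions, not the normative MPEG-4 procedure or necessarily the author's exact one: pixels outside the segment are first set to the mean of the pixels inside it, then a single raster-scan pass replaces each outside pixel by the average of its four in-block neighbours, reusing values already updated in that pass. The function name `lpe_pad` is hypothetical.

```python
def lpe_pad(block, mask):
    """Pad the pixels outside a shape mask by low-pass extrapolation (sketch).

    block: list of rows of pixel values; mask: same shape, True = inside segment.
    """
    h, w = len(block), len(block[0])
    inside = [block[y][x] for y in range(h) for x in range(w) if mask[y][x]]
    if not inside:
        raise ValueError("segment has no pixels in this block")
    mean = sum(inside) / len(inside)
    # Step 1: every outside pixel starts at the mean of the inside pixels.
    out = [[block[y][x] if mask[y][x] else mean for x in range(w)] for y in range(h)]
    # Step 2: one raster-scan pass; each outside pixel becomes the average of its
    # 4-neighbours that lie inside the block (already-updated values are reused).
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                continue
            nbrs = [out[y + dy][x + dx]
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = sum(nbrs) / len(nbrs)
    return out
```

For example, a 2×2 block whose top row (10, 20) lies inside the segment gets its bottom row padded to (12.5, 16.25): the mean 15 seeds the outside pixels, and the averaging pass pulls them toward their inside neighbours, giving a smooth, low-frequency extension.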
| Noel Brady |
MPEG-4 Standardized Methods for the Compression of Arbitrarily Shaped Video Objects |
Abstract—MPEG-4 is the most recent standard for audio-visual
representation to be published by the International Organization
for Standardization. One of the many new features of MPEG-4 is
its ability to represent two-dimensional video objects of arbitrary
shape. For this purpose, MPEG-4 uses the conventional motion-compensated
discrete cosine transform syntax for color/texture
coding and augments this with an explicit compressed representation
of the video object’s shape. This paper is intended as a
tutorial on the means of encoding and decoding arbitrarily shaped
video objects as specified by MPEG-4. The major emphasis of the
paper is on explaining the compression technology associated with
the normative shape representation, i.e., block-based context-based
arithmetic encoding, but some new aspects associated
with arbitrarily shaped texture coding are also highlighted. The
MPEG-4 specifications are presented in an informal way, and the
motivations underlying the algorithm are clarified. In addition,
effective methods are suggested for performing many of the
nonnormative encoding tasks, and several encoding performance
tradeoffs are illustrated.
RAR 580 KB
|
?
|
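Context-based arithmetic encoding codes each pixel of a binary alpha block with a probability selected by a context number built from already-coded neighbouring pixels. The sketch below forms a 10-bit intra context from a template of three pixels two rows above, five pixels one row above and two pixels to the left. This template and its bit order reflect our reading of the intra mode and should be checked against the standard text. As a simplification (an assumption), pixels outside the block are treated as transparent (0), whereas MPEG-4 takes them from neighbouring blocks. The function name `intra_context` is hypothetical.

```python
def intra_context(bab, y, x):
    """10-bit context number for pixel (y, x) of a binary alpha block (0/1 values).

    Pixels outside the array are treated as 0 (transparent) in this sketch.
    """
    def px(r, c):
        if 0 <= r < len(bab) and 0 <= c < len(bab[0]):
            return bab[r][c]
        return 0

    # Template, most significant bit first: 3 pixels two rows up,
    # 5 pixels one row up, 2 pixels to the left on the current row.
    template = [(-2, -1), (-2, 0), (-2, 1),
                (-1, -2), (-1, -1), (-1, 0), (-1, 1), (-1, 2),
                (0, -2), (0, -1)]
    ctx = 0
    for dy, dx in template:
        ctx = (ctx << 1) | px(y + dy, x + dx)
    return ctx
```

The resulting number (0 to 1023) would index a table of probabilities that drives a binary arithmetic coder; that table is not reproduced here.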
| Anthony Vetro, Huifang Sun, and Yao Wang |
MPEG-4 Rate Control for Multiple Video Objects |
Abstract—This paper describes an algorithm which can achieve
a constant bit rate when coding multiple video objects. The implementation
is a nontrivial extension of the MPEG-4 rate control
algorithm for single video objects, which employs a quadratic rate-quantizer
model. The algorithm is organized into two stages: a
pre-encoding and a post-encoding stage. In the pre-encoding stage, an initial
target estimate is made for each object. Based on the buffer fullness,
the total target is adjusted and then distributed in proportion
to the relative size, motion, and variance of each object. Based on
the new individual targets and rate-quantizer relation for texture,
appropriate quantization parameters are calculated. After each
object is encoded, the model parameters for each object are
updated, and if necessary, frames are skipped to ensure that the
buffer does not overflow. A preframeskip control is exercised to
avoid buffer overflow when the motion and shape information
occupies a significant portion of the bit budget. The rate control
algorithm switches between two operation modes so that the
coder can reduce the spatial coding accuracy for an improved
temporal resolution. A shape-coding control mechanism is also
proposed, which provides a tradeoff between texture and shape
coding accuracy. Overall, the algorithm is able to successfully
achieve the target bit rate, effectively code arbitrarily shaped
objects, and maintain a stable buffer level. These techniques have
been adopted by the MPEG committee in July 1997 as part of
the video Verification Model (VM8).
RAR 484 KB
|
?
|
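The quadratic rate-quantizer model mentioned above is commonly written as R = X1·S/Q + X2·S/Q², where R is the texture bit target, S a complexity measure such as the mean absolute difference, and Q the quantization parameter; treat this exact form as an assumption here. Given a target, Q follows from the positive root of a quadratic. The split of a frame target among objects is sketched with hypothetical weights for size, motion and variance; the paper's actual weighting is not reproduced.

```python
import math


def quantizer_for_target(target_bits, mad, x1, x2, q_min=1, q_max=31):
    """Solve target_bits = x1*mad/q + x2*mad/q**2 for q and clip to [q_min, q_max]."""
    if target_bits <= 0:
        return q_max                          # no bits left: coarsest quantizer
    # target*q^2 - x1*mad*q - x2*mad = 0; take the positive root.
    b, c = x1 * mad, x2 * mad
    disc = b * b + 4 * target_bits * c
    if disc >= 0:
        q = (b + math.sqrt(disc)) / (2 * target_bits)
    else:
        q = b / target_bits                   # fall back to the first-order model
    return max(q_min, min(q_max, round(q)))


def split_target(total_bits, objects, w_size=0.4, w_motion=0.3, w_var=0.3):
    """Distribute a frame target over objects by relative size, motion and variance.

    objects: list of dicts with keys 'size', 'motion', 'variance'.
    The weights are illustrative assumptions and should sum to 1.
    """
    def shares(key):
        total = sum(o[key] for o in objects)
        if total == 0:
            return [1 / len(objects)] * len(objects)
        return [o[key] / total for o in objects]

    s, m, v = shares("size"), shares("motion"), shares("variance")
    return [total_bits * (w_size * si + w_motion * mi + w_var * vi)
            for si, mi, vi in zip(s, m, v)]
```

With X1 = 1000, X2 = 5000, S = 4 and a 600-bit target, the positive root is exactly Q = 10; after each object is coded, X1 and X2 would be re-estimated from the actual bits spent.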
| Gabriel Antunes Abrantes and Fernando Pereira |
MPEG-4 Facial Animation Technology: Survey, Implementation, and Results |
Abstract—The emerging MPEG-4 standard specifies an object-based
audiovisual representation framework, integrating both
natural and synthetic content. Tools supporting three-dimensional
facial animation will be standardized for the first time. To support
facial animation decoders with different degrees of complexity,
MPEG-4 uses a profiling strategy, which foresees the specification
of object types, profiles, and levels adequate to the various
relevant application classes. This paper first gives an overview
of the MPEG-4 facial animation technology. Subsequently, the
paper describes the Instituto Superior Técnico implementation
of an MPEG-4 facial animation system, then briefly evaluates the
performance of the various tools standardized, using the MPEG-4
test material.
RAR 1153 KB
|
?
|
| Junehwa Song and Boon-Lock Yeo |
Fast Extraction of Spatially Reduced Image Sequences from MPEG-2 Compressed Video |
Abstract—MPEG-2 video standards are targeted at high-quality
video broadcast and distribution and are optimized for
efficient storage and transmission. However, it is difficult to
process MPEG-2 for video browsing and database applications
without first decompressing the video. Yeo and Liu [1] have
proposed fast algorithms for the direct extraction of spatially
reduced images from MPEG-1 video. Reduced images have been
demonstrated to be effective for shot detection, shot browsing and
editing, and temporal processing of video for video presentation
and content annotation. In this paper, we develop new tools to
handle the extra complexity in MPEG-2 video for extracting
spatially reduced images. In particular, we propose new classes of
discrete cosine transform (DCT) domain and DCT inverse motion
compensation operations for handling the interlaced modes in the
different frame types of MPEG-2, and we design new and efficient
algorithms for generating spatially reduced images of an MPEG-2
video. The algorithms proposed in this paper are fundamental
for efficient and effective processing of MPEG-2 video.
RAR 767 KB
|
?
|
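The basis of DC-image extraction is that, with an orthonormal 8×8 DCT, the DC coefficient of each block equals eight times the block mean, so dividing it by eight yields one pixel of a 1/8-scale image without an inverse transform. The sketch below computes the DC term from pixels only to make that relation checkable; in an intra-coded picture the DC values would instead be read from the bit stream, while predicted pictures need the DCT-domain motion-compensation approximations the paper develops, which are not shown. The function names are hypothetical.

```python
import math


def dct2_coeff(block, u, v):
    """One coefficient F(u, v) of the orthonormal 2-D DCT-II of a square block."""
    n = len(block)
    cu = math.sqrt((1 if u == 0 else 2) / n)
    cv = math.sqrt((1 if v == 0 else 2) / n)
    total = 0.0
    for x in range(n):
        for y in range(n):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
    return cu * cv * total


def dc_image(frame, n=8):
    """Spatially reduced image: one pixel per n x n block, taken from the DC term."""
    out = []
    for by in range(len(frame) // n):
        row = []
        for bx in range(len(frame[0]) // n):
            block = [r[bx * n:(bx + 1) * n] for r in frame[by * n:(by + 1) * n]]
            row.append(dct2_coeff(block, 0, 0) / n)   # F(0,0)/n equals the block mean
        out.append(row)
    return out
```

A 16×16 frame thus reduces to a 2×2 image of block averages, which is already enough for tasks such as shot-change detection on a coarse scale.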
| Carsten Herpel |
Elementary Stream Management in MPEG-4 |
Abstract—The forthcoming MPEG-4 standard specifies in its
systems part an audiovisual scene description and functionality
for elementary stream management. The elementary stream management
functionality is introduced here. It consists of a
media object description framework that describes the streaming
resources that form part of an MPEG-4 presentation and of a
synchronization syntax incorporated in a flexible sync layer with
an underlying systems decoder model. The final section outlines
the transport and session setup for MPEG-4 presentations on
relevant transport media, namely, in Internet and digital
broadcast scenarios.
RAR 338 KB
|
?
|
| Han-Chiang Shyu and Jin-Jang Leou |
Detection and Concealment of Transmission Errors in MPEG-2 Images—A Genetic Algorithm Approach |
Abstract—In this paper, the detection and concealment approach
to transmission errors in MPEG-2 images using genetic
algorithms (GA’s) is proposed. For entropy-coded MPEG-2 images,
a transmission error in a codeword will not only affect the
underlying codeword but also may affect subsequent codewords,
resulting in a great degradation of the received images. Here,
a transmission error may be a single-bit error or a burst error.
The objective of the proposed approach is to recover high-quality
MPEG-2 images from the corresponding corrupted MPEG-2
images without increasing the transmission bit rate.
In the proposed error-detection approach, by using the constraints
imposed on compressed image data, all the slices within
an MPEG-2 picture can be correctly located. After a slice is
located, similar to Chu and Leou [22], transmission errors within
the slice are detected by two successive procedures: 1) whether
the slice is corrupted or not is determined by checking a set
of error-detection conditions under decoding and 2) the precise
location (block-based) of the first transmission error (i.e., the
first corrupted block) within the corrupted slice is located by
a block-based backtracking procedure. For a corrupted block,
the proposed GA approach to error concealment is employed to
conceal the corrupted block by iteratively performing reproduction/
crossover/mutation operations and evaluating the proposed
fitness function until the stopping criterion is satisfied. Based
on the simulation results obtained in this study, the proposed
approach can recover high-quality MPEG-2 images from the
corresponding corrupted images up to a bit error rate of 0.5%.
RAR 1324 KB
|
?
|
| Hai Tao, Homer H. Chen, Wei Wu, and Thomas S. Huang, Fellow |
Compression of MPEG-4 Facial Animation Parameters for Transmission of Talking Heads |
Abstract—The emerging MPEG-4 standard supports the transmission
and composition of facial animation with natural video.
The new standard will include a facial animation parameter
(FAP) set that is defined based on the study of minimal facial
actions and is closely related to muscle actions. The FAP
set enables model-based representation of natural or synthetic
talking-head sequences and allows intelligible visual reproduction
of facial expressions, emotions, and speech pronunciations at
the receiver. This paper addresses the data-compression issue of
talking heads and presents three methods for bit-rate reduction
of FAP’s. Compression efficiency is achieved by way of transform
coding, principal component analysis, and FAP interpolation.
These methods are independent of each other in nature and thus
can be applied in combination to lower the bit-rate demand of
FAP’s, making possible the transmission of multiple talking heads
over band-limited channels. The basic methods described here
have been adopted into the MPEG-4 Visual Committee Draft
[1] and are readily applicable to other articulation data such
as body animation parameters. The efficacy of the methods is
demonstrated by both subjective and objective results.
RAR 652 KB
|
?
|
| Jui-Hua Li and Nam Ling |
Architecture and Bus-Arbitration Schemes for MPEG-2 Video Decoder |
Abstract—An efficient MPEG-2 video decoder architecture together
with several effective bus-arbitration schemes designed to
meet the main profile at main level (MP@ML) real-time decoding
requirement is presented in this paper. The overall architecture,
as well as the design of major function-specific processing blocks
(variable-length decoder, inverse two-dimensional discrete cosine
transform unit, and motion-compensation unit), is discussed.
A hierarchical and distributed controller approach is used, a
bus-monitoring model for different bus-arbitration schemes to
control external DRAM accesses is developed, and the system is
simulated. Practical issues and buffer sizes are addressed and
evaluated. With a 27-MHz clock, our architecture uses far
fewer than the 667 cycles, the upper bound for the MP@ML
decoding requirement, to decode each macroblock with a single
external bus and DRAM.
RAR 549 KB
|
?
|
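The 667-cycle figure is consistent with simple arithmetic, assuming the MP@ML maximum formats of 720×576 pixels at 25 frames/s or 720×480 pixels at 30 frames/s: both give 40 500 macroblocks per second, and 27 000 000 / 40 500 ≈ 666.7 clock cycles per macroblock. The helper below reproduces that calculation as a worked check; it is not part of the paper.

```python
def cycles_per_macroblock(clock_hz, width, height, fps):
    """Clock cycles available per 16x16 macroblock for real-time decoding."""
    macroblocks_per_second = (width // 16) * (height // 16) * fps
    return clock_hz / macroblocks_per_second


pal = cycles_per_macroblock(27_000_000, 720, 576, 25)    # 1620 MBs x 25 frames/s
ntsc = cycles_per_macroblock(27_000_000, 720, 480, 30)   # 1350 MBs x 30 frames/s
print(round(pal, 2), round(ntsc, 2))  # prints 666.67 666.67
```

Any per-macroblock pipeline, including all external DRAM accesses over the single bus, must therefore finish within roughly 666 cycles on average.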
| Paul G. Howard, Faouzi Kossentini, Bo Martins, Søren Forchhammer, and William J. Rucklidge |
The Emerging JBIG2 Standard |
Abstract—The Joint Bi-Level Image Experts Group (JBIG), an
international study group affiliated with ISO/IEC and ITU-T, is
in the process of drafting a new standard for lossy and lossless
compression of bilevel images. The new standard, informally
referred to as JBIG2, will support model-based coding for text
and halftones to permit compression ratios up to three times
those of existing standards for lossless compression. JBIG2 will
also permit lossy preprocessing without specifying how it is to
be done. In this case, compression ratios up to eight times those
of existing standards may be obtained with imperceptible loss of
quality. It is expected that JBIG2 will become an international
standard by 2000.
RAR 685 KB
|
?
|
| Peter Cherriman and Lajos Hanzo |
Programmable H.263-Based Wireless Video Transceivers for Interference-Limited Environments |
Abstract— In order to exploit the nonuniformly distributed
channel capacity over the cell area, the intelligent 7.3-kB programmable
videophone transceiver of Table I is proposed, which
is capable of exploiting the higher channel capacity of uninterfered,
high-channel-quality cell areas, while supporting more
robust, but lower bit-rate operation in more interfered areas.
The system employed an enhanced H.263-compatible video codec.
Since most existing wireless systems exhibit a constant bit-rate,
the video codec’s bit-rate fluctuation was smoothed by a novel
adaptive packetization algorithm, which is capable of supporting
automatic repeat request (ARQ)-assisted operation in wireless
distributive video transmissions, although in the proposed low-latency
interactive videophone transceiver, we refrained from
using ARQ. Instead, corrupted packets are dropped by both the
local and remote decoders in order to prevent error propagation.
The minimum required channel signal-to-interference-plus-noise
ratio (SINR) was in the range of 8–28 dB for the various
transmission scenarios of Table I, while the corresponding video
peak-signal-to-noise ratio (PSNR) was in the range of 32–39 dB.
The main system features are summarized in Table I.
RAR 288 KB
|
?
|
| Guy Côté, Berna Erol, Michael Gallant, and Faouzi Kossentini |
H.263+: Video Coding at Low Bit Rates |
Abstract—In this tutorial paper, we discuss the ITU-T H.263+
(or H.263 Version 2) low-bit-rate video coding standard. We
first describe, briefly, the H.263 standard including its optional
modes. We then address the 12 new negotiable modes of
H.263+. Next, we present experimental results for these modes,
based on our public-domain implementation (see our Web
site at http://spmg.ece.ubc.ca). Tradeoffs among compression
performance, complexity, and memory requirements for the
H.263+ optional modes are discussed. Finally, results for mode
combinations are presented.
RAR 667 KB
|
?
|
| Yao Wang and Jörn Ostermann |
Evaluation of Mesh-Based Motion Estimation in H.263-Like Coders |
Abstract—In this paper, we present two mesh-based motion
estimation algorithms, and evaluate their performance when
incorporated in an H.263-like block-based video coder. Both
algorithms compute nodal motions in a hierarchical manner.
Within each hierarchy level, the first algorithm (HMMA) minimizes
the prediction error in the four elements surrounding
each node, where the prediction is accomplished by a bilinear
mapping. The optimal solution is obtained by a full search
within a range defined by the topology of the mesh. The second
algorithm (HBMA) minimizes the error in a block surrounding
each node, assuming the motion in the block is constant. In
both cases, bilinear mapping is used for motion-compensated
prediction based on nodal displacements. The two algorithms are
compared with an exhaustive block-matching algorithm (EBMA)
by evaluating their performances in temporal prediction and in
an H.263/TMN4 coder. For prediction only, the HMMA and
HBMA algorithms yield visually more satisfactory results, even
though the PSNR’s of predicted images are on average lower.
The coded images also have lower PSNR’s at similar bit rates.
The coding artifacts are different: while the block-based method
leads to more severe block distortions, the mesh-based method
experiences some warping artifacts. The HMMA algorithm outperforms
HBMA slightly for certain sequences at the expense of
higher computational complexity.
RAR 233 KB
|
?
|
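Bilinear mapping interpolates the motion of every pixel inside a mesh element from the displacements of its four corner nodes. The sketch below assumes square elements on a regular grid, corner vectors ordered top-left, top-right, bottom-left, bottom-right, and bilinear sampling of the reference frame at sub-pixel positions. The element layout and function names are assumptions for illustration, and no node-motion estimation (HMMA or HBMA) is attempted.

```python
import math


def bilinear_weights(u, v):
    """Weights of the four corners (TL, TR, BL, BR) at normalized position (u, v)."""
    return ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)


def element_displacement(node_mvs, u, v):
    """Displacement (dx, dy) inside an element, interpolated from its corner nodes."""
    w = bilinear_weights(u, v)
    dx = sum(wi * mv[0] for wi, mv in zip(w, node_mvs))
    dy = sum(wi * mv[1] for wi, mv in zip(w, node_mvs))
    return dx, dy


def sample(ref, x, y):
    """Bilinearly interpolated reference pixel at real coordinates, clamped to the frame."""
    h, w = len(ref), len(ref[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * ref[y0][x0] + fx * ref[y0][x1]
    bottom = (1 - fx) * ref[y1][x0] + fx * ref[y1][x1]
    return (1 - fy) * top + fy * bottom


def predict_element(ref, x0, y0, size, node_mvs):
    """Motion-compensated prediction of one size x size element at (x0, y0)."""
    pred = []
    for j in range(size):
        row = []
        for i in range(size):
            dx, dy = element_displacement(node_mvs, i / size, j / size)
            row.append(sample(ref, x0 + i + dx, y0 + j + dy))
        pred.append(row)
    return pred
```

When all four nodes carry the same vector, the mapping reduces to ordinary block translation, which is a convenient sanity check; differing node vectors make the prediction warp smoothly across the element instead of producing block edges, which matches the warping (rather than blocking) artifacts described above.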
| Stephan Wenger, Gerd Knorr, Jörg Ott, and Faouzi Kossentini |
Error Resilience Support in H.263+ |
Abstract—Version 2 of ITU Recommendation H.263, better
known as H.263+, includes a number of new mechanisms to
improve coding efficiency and support various types of networks
more efficiently. This paper provides an overview of the error
resilience optional modes of H.263+ and describes the use of such
modes in various network scenarios.
RAR 202 KB
|
?
|
| Wenwu Zhu, Yiwei Thomas Hou, Yao Wang, and Ya-Qin Zhang |
End-to-End Modeling and Simulation of MPEG-2 Transport Streams over ATM Networks with Jitter |
Abstract—In this paper, the operation of MPEG-2 systems
is modeled and simulated when an MPEG-2 transport stream
is delivered through an ATM network with jitter. End-to-end
packet-based analysis is performed for delivery of MPEG-2
transport streams over ATM networks. A novel approach to
analyzing the decoder buffer behavior in the presence of network
jitter is presented. The probability density function of
the interarrival time of the ATM adaptation layer 5 (AAL5)
protocol data unit (PDU) is derived from an MPEG-2 video
source model and an ATM network jitter model. Based on
a real-time decoding requirement of the MPEG-2 transport
stream (TS) system target decoder (T-STD), the decoder buffer
behavior is simulated. In this simulation, the packets’ arrivals
follow the derived probability density function of the AAL5 PDU
interarrival time. The modeling and simulation results show the
interactions among packet loss ratio, decoder buffer size, and
network jitter level. We found that jitter affects decoder buffer
size and packet loss ratio in a significant way.
RAR 70 KB
|
?
|
| Wen-Jeng Chu and Jin-Jang Leou |
Detection and Concealment of Transmission Errors in H.261 Images |
Abstract—In this study, the detection and concealment approach
to transmission errors in H.261 images is proposed. For
entropy-coded H.261 images, a transmission error in a codeword
will not only affect the underlying codeword, but also may affect
subsequent codewords, resulting in a great degradation of the
received images. Here a transmission error may be a single-bit
error or a burst error containing N successive error bits. The
objective of the proposed approach is to recover high-quality
H.261 images from the corresponding corrupted H.261 images,
without increasing the transmission bit rate.
In the proposed approach, using the constraints imposed on
compressed image data, all the groups of blocks (GOB’s) within
an H.261 picture can be correctly located. After a GOB is
located, transmission errors within the GOB are detected by
two successive procedures: 1) whether the GOB is corrupted or
not is determined by checking a set of error-checking conditions
under decoding and 2) the precise location (block-based) of the
first transmission error (i.e., the first corrupted block) within the
GOB is located by a block-based backtracking procedure. For
a corrupted block, a set of concealed block candidates, SC, is
generated, and a proposed fitness function for error concealment
is used to select the “best” concealed block candidate among SC
as the concealed block of the corrupted block.
Based on the simulation results obtained in this study, the
proposed approach can indeed recover high-quality H.261 images
from their corresponding corrupted H.261 images. This shows the
feasibility of the proposed approach.
RAR 295 KB
|
?
|
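Selecting the "best" concealed block from a candidate set SC requires a fitness function. A common choice, used here as a stand-in because the abstract does not give the paper's own function, is the side-match distortion: the sum of absolute differences between the candidate's boundary pixels and the adjacent pixels of correctly decoded neighbouring blocks. How the candidates are generated (for example, motion-compensated blocks from the previous picture) is left out, and the function names are hypothetical.

```python
def side_match_cost(candidate, top=None, bottom=None, left=None, right=None):
    """Sum of absolute differences across the borders of a square candidate block.

    top/bottom: the pixel row just above/below the block, or None if unavailable.
    left/right: the pixel column just left/right of the block, or None.
    """
    n = len(candidate)
    cost = 0
    if top is not None:
        cost += sum(abs(candidate[0][i] - top[i]) for i in range(n))
    if bottom is not None:
        cost += sum(abs(candidate[n - 1][i] - bottom[i]) for i in range(n))
    if left is not None:
        cost += sum(abs(candidate[i][0] - left[i]) for i in range(n))
    if right is not None:
        cost += sum(abs(candidate[i][n - 1] - right[i]) for i in range(n))
    return cost


def conceal(candidates, **neighbours):
    """Return the candidate block with the lowest side-match cost."""
    return min(candidates, key=lambda c: side_match_cost(c, **neighbours))


flat10 = [[10, 10], [10, 10]]
flat50 = [[50, 50], [50, 50]]
print(conceal([flat50, flat10], top=[12, 12], left=[11, 11]))  # prints [[10, 10], [10, 10]]
```

Only neighbours that were decoded correctly should be passed in; borders next to other corrupted blocks are simply omitted from the cost.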
| Yong He, Ishfaq Ahmad, and Ming L. Liou, Fellow |
A Software-Based MPEG-4 Video Encoder Using Parallel Processing |
Abstract—In this paper, we describe a software-based MPEG-4
video encoder that is implemented using parallel processing
on a cluster of workstations collectively working as a virtual
machine. The contributions of our work are as follows. First, a
hierarchical Petri-nets-based modeling methodology is proposed
to capture the spatiotemporal relationships among multiple objects
at different levels of an MPEG-4 video sequence. Second, a
scheduling algorithm is proposed to assign video objects to workstations
for encoding in parallel. The algorithm determines the
execution order of video objects, ensures that the synchronization
requirements among them are enforced and that presentation
deadlines are met. Third, a dynamic partitioning scheme is
proposed which divides an object among multiple workstations to
extract additional parallelism. The scheme achieves load balancing
among the workstations with a low overhead. The striking
feature of our encoder is that it adjusts the allocation and
partitioning of objects automatically according to the dynamic
variations in the video object behavior. We have made various
additional software optimizations to further speed up the computation.
The performance of the encoder can scale according
to the number of workstations used. With 20 workstations, the
encoder yields an encoding rate higher than real time, allowing
the encoding of multiple sequences simultaneously.
RAR 534 KB
|
?
|
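The paper's scheduler is built on a Petri-net model whose details the abstract does not give. As a much simpler stand-in for the load-balancing aspect only, the sketch below uses the classic longest-processing-time-first heuristic: sort objects by estimated encoding cost and repeatedly give the next object to the least-loaded workstation. The cost figures and names are hypothetical, and synchronization constraints and deadline checks are not modeled.

```python
import heapq


def assign_objects(costs, n_workstations):
    """Longest-processing-time-first assignment of video objects to workstations.

    costs: dict mapping object id -> estimated encoding time for one frame.
    Returns a dict mapping workstation index -> list of object ids.
    """
    heap = [(0.0, w) for w in range(n_workstations)]   # (current load, workstation)
    heapq.heapify(heap)
    plan = {w: [] for w in range(n_workstations)}
    for obj, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)                    # least-loaded workstation
        plan[w].append(obj)
        heapq.heappush(heap, (load + cost, w))
    return plan


print(assign_objects({"bg": 5, "person": 4, "car": 3, "logo": 3, "text": 3}, 2))
# prints {0: ['bg', 'logo'], 1: ['person', 'car', 'text']}
```

Re-running the assignment as per-object cost estimates change from frame to frame gives a crude form of the dynamic reallocation the abstract describes; splitting a single large object across workstations is not covered.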
| Pedro A. A. Assunção and Mohammed Ghanbari |
A Frequency-Domain Video Transcoder for Dynamic Bit-Rate Reduction of MPEG-2 Bit Streams |
Abstract—Many of the forthcoming video services and multimedia
applications are expected to use preencoded video for storage
and transmission. Video transcoding is intended to provide
transmission flexibility to preencoded bit streams by dynamically
adjusting the bit rate of these bit streams according to new bandwidth
constraints that were unknown at the time of encoding. In
this paper, we propose a drift-free MPEG-2 video transcoder,
working entirely in the frequency domain. The various modes of
motion compensation (MC) defined in MPEG-2 are implemented
in the discrete cosine transform (DCT) domain at reduced computational
complexity. By using approximate matrices to compute
the MC–DCT blocks, we show that computational complexity can
be reduced by 81% compared with the pixel domain approach.
Moreover, by using a Lagrangian rate-distortion optimization for
bit reallocation, we show that optimal transcoding of high-quality
bit streams can produce better picture quality than that obtained
by directly encoding the uncompressed video at the same bit rates
using a nonoptimized Test Model 5 (TM5) encoder.
RAR 464 KB
|
?
|
| Thomas Sikora |
The MPEG-4 Video Standard Verification Model |
Abstract—The MPEG-4 standardization phase has the mandate
to develop algorithms for audio-visual coding allowing for
interactivity, high compression, and/or universal accessibility and
portability of audio and video content. In addition to the conventional
“frame”-based functionalities of the MPEG-1 and MPEG-2
standards, the MPEG-4 video coding algorithm will also support
access and manipulation of “objects” within video scenes.
The January 1996 MPEG Video group meeting witnessed the
definition of the first version of the MPEG-4 Video Verification
Model—a milestone in the development of the MPEG-4 standard.
The primary intent of the Video Verification Model is to provide
a fully defined core video coding algorithm platform for the
development of the standard. As such, the structure of the
MPEG-4 Video Verification Model already gives some indication
about the tools and algorithms that will be provided by the final
MPEG-4 standard. The purpose of this paper is to describe the
scope of the MPEG-4 Video standard and to outline the structure
of the MPEG-4 Video Verification Model under development.
Index Terms— Coding efficiency, compression, error robustness,
flexible coding, functional coding, manipulation, MPEG,
MPEG-4, multimedia, natural video, object-based coding, SNHC,
standardization, synthetic video, universal accessibility, verification
model, video coding.
RAR 1042 KB
|
?
|
| Jungwoo Lee and Bradley W. Dickinson, Fellow |
Rate-Distortion Optimized Frame Type Selection for MPEG Encoding |
Abstract—In this paper, we present an algorithm for joint
optimization of anchor frame separation and bit allocation for
motion-compensated video coders. The anchor frame separation
is optimized in the sense that the distortion is minimized under
a bit budget constraint. At the same time, the quantization
for each frame in a group of pictures is also optimized in an
operational rate distortion sense. The optimal anchor frame
separation does depend on the quantization of each frame so
that the two optimization problems cannot be separated. A
Lagrange multiplier approach can be used to obtain the optimal
solution if we assume that the rate-distortion curve is convex.
Heuristic algorithms based on simulated annealing and greedy
trellis selection are also presented to reduce the computational
complexity.
RAR 289 KB
|
?
|
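For the quantization half of the problem, the Lagrange multiplier approach can be sketched as follows, assuming each frame has a small table of (quantizer, rate, distortion) operating points and that total distortion is the sum of frame distortions. For a fixed λ every frame independently picks the point minimizing D + λR; bisection on λ then finds the lowest-distortion allocation that fits the bit budget. Because this search only reaches points on the convex hull of the rate-distortion curve, it matches the convexity assumption stated above. The outer search over anchor frame separations, and the dependence of a frame's operating points on its reference frames, are deliberately left out; the names and numbers are illustrative.

```python
def allocate(frames, lam):
    """For each frame choose the (q, rate, dist) point minimizing dist + lam * rate."""
    return [min(points, key=lambda p: p[2] + lam * p[1]) for points in frames]


def allocate_under_budget(frames, budget, lam_max=1e6, iterations=60):
    """Bisection on lambda for the smallest multiplier whose allocation fits budget."""
    best = allocate(frames, lam_max)
    if sum(p[1] for p in best) > budget:
        raise ValueError("budget cannot be met even at the coarsest quantizers")
    lo, hi = 0.0, lam_max
    for _ in range(iterations):
        mid = (lo + hi) / 2
        trial = allocate(frames, mid)
        if sum(p[1] for p in trial) <= budget:
            best, hi = trial, mid      # fits: try a smaller multiplier (less distortion)
        else:
            lo = mid                   # too many bits: penalize rate more
    return best


frames = [
    [(2, 100, 1.0), (8, 40, 4.0), (16, 20, 9.0)],   # frame 1: (q, bits, distortion)
    [(2, 80, 2.0), (8, 30, 5.0), (16, 15, 10.0)],   # frame 2
]
print([p[0] for p in allocate_under_budget(frames, 130)])  # prints [8, 2]
```

With a 130-bit budget, the search settles just above λ = 0.05, where frame 1 switches to the coarser quantizer first because its distortion-per-bit trade-off is cheaper, giving 120 bits and a total distortion of 6.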
| Fernando Pereira and Thierry Alpert |
MPEG-4 Video Subjective Test Procedures and Results |
Abstract—In recent years, technical developments in the
area of audio-visual communications, notably in video coding,
encouraged the emergence of new services which are already
changing our everyday life. The convergence of the telecommunications,
computer, and TV/film technologies is leading to the
intermixture of elements formerly characteristic of each one of
these fields, creating new needs and new requirements. Among
the most important trends is the need to increase the interaction
capabilities between the user and the audio-visual information,
notably by considering the scene as a composition of objects—the
content—according to a script that describes their spatial and
temporal behavior and not just a set of pixels.
MPEG-4 is a new audio-visual standard aiming to establish a
universal, efficient coding of different forms of audio-visual data,
called audio-visual objects. To reach this target, MPEG-4 has
called for proposals on techniques that may be instrumental to
efficiently represent visual information, allowing simultaneously
high degrees of content-based interactivity and error resilience.
This paper addresses the conditions under which the proposals
to the MPEG-4 first round of video subjective tests have been
evaluated. Moreover, the most significant results of these tests
are also presented.
RAR 607 KB
|
?
|
| Huifang Sun, Wilson Kwok, Max Chien, and C. H. John Ju |
MPEG Coding Performance Improvement by Jointly Optimizing Coding Mode Decisions and Rate Control |
Abstract—This paper presents a new algorithm for determining
the optimal MPEG [1] coding strategy in terms of the selection of
macroblock coding modes and quantizer scales. In the algorithm
proposed in the Test Model [2] the rate control operates independently
from the coding mode selection for each macroblock.
The coding mode is decided based only upon the energy of
prediction residuals. In fact, the two processes of coding mode
decision and rate control are intimately related to each other
and should be determined jointly in order to achieve optimal
coding performance. We formulate the constrained optimization
problem and present solutions based upon rate-distortion
characteristics, or R(D) curves, for all the macroblocks that
compose the picture being coded. Distortion for the entire picture
is assumed to be decomposable and expressible as a function of
individual macroblock distortions, with this being the objective
function to minimize. The determination of the optimal solution is
complicated by the MPEG differential encoding of motion vectors
and dc coefficients, which introduce dependencies that carry over
from macroblock to macroblock for a duration equal to the slice
length. As an approximation, a near-optimum greedy algorithm is
proposed. Once the upper bound in performance is calculated, it
can be used to assess how well practical suboptimum methods
perform. Finally, such a practical suboptimum algorithm is
proposed and evaluated.
RAR 291 KB
|
?
|
| Leonardo Chiariglione |
MPEG and Multimedia Communications |
Abstract—Digital television is a reality today, but multimedia
communications, after years of hype, is still a catchword. Lack
of suitable multi-industry standards supporting it is one reason
for the unfulfilled promise.
The MPEG committee which originated the MPEG-1 and
MPEG-2 standards that made digital television possible is currently
developing MPEG-4 with wide industry participation. This
paper describes how the MPEG-4 standard, with its network-independent
nature and application-level features, is poised to
become the enabling technology for multimedia communications
and will therefore help solve the problems that are
hindering multimedia communications.
RAR 189 KB
|
?
|
| C. Michael Sharon, Ioannis Lambadaris, Michael Devetsikiotis, and A. Roger Kaye |
Modeling and Control of VBR H.261 Video Transmission over Frame Relay Networks |
Abstract—This paper examines the transmission of variable bitrate
(VBR) H.261 video traffic over a mixed traffic (video/inter-
LAN data) integrated services frame relay (FR) network. We
introduce a modified H.261 codec that produces VBR output and
show that parsing of the video bit stream at group of blocks
(GOB) boundaries produces variable length FR packets which are
well suited to network characteristics. We demonstrate that GOB-level
video traffic requires a more sophisticated statistical model
of the resulting data stream than the frame-level models that are
frequently used. The transform expand sample (TES) technique
is used to obtain an accurate model of the autocorrelation and the
marginal probability distribution of the bit-rate variations at the
GOB level. A simple and effective methodology is introduced for
capturing the periodic components that are present in the GOB-level
autocorrelation. The methodology is extended to permit
simulations of VBR codecs in which the codec quantization step
size is adjusted in response to prevailing network conditions.
Furthermore, we show that the quality of service requirements
of VBR video can be met in the presence of inter-LAN data
traffic by making use of the FR backward explicit congestion
notification (BECN) facility in conjunction with a modified H.261
codec whose rate is controlled by the congestion notification.
We also show that the performance of the control mechanism
is significantly influenced by a subset of network threshold and
codec control parameters, which are identified using 2^k factorial
analysis techniques. Further, we obtain optimal ranges of values
for these parameters using mean field annealing. Finally, we show
that variable quantization rate control may be more effective for
this purpose than variable frame rate control, and that the resulting
improvement in performance over that of an uncontrolled
network can be significant.
RAR 234 KB
|
?
|
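A TES process generates a correlated sequence in two steps: a background sequence U_k = frac(U_{k-1} + V_k), driven by innovations V_k, which stays uniformly distributed on [0, 1), and a foreground X_k = F⁻¹(S_ξ(U_k)) that imposes the desired marginal distribution through a stitching function S_ξ and an inverse CDF F⁻¹. In the sketch below (a TES+ variant), the innovations are uniform on [−α, α] and the marginal is exponential. Both choices are assumptions: the paper fits the empirical GOB-level bit-rate histogram and adds periodic components, neither of which is reproduced here. Smaller α gives stronger autocorrelation.

```python
import math
import random


def tes_plus(n, alpha, xi, mean, seed=None):
    """n samples of a TES+ sequence with an exponential marginal of the given mean.

    alpha: half-width of the uniform innovation (0 < alpha <= 0.5).
    xi: stitching parameter, 0 < xi < 1.
    """
    rng = random.Random(seed)
    u = rng.random()                                   # uniform start keeps U stationary
    samples = []
    for _ in range(n):
        u = (u + rng.uniform(-alpha, alpha)) % 1.0     # modulo-1 addition
        s = u / xi if u < xi else (1.0 - u) / (1.0 - xi)  # stitching function S_xi
        s = min(s, 1.0 - 1e-12)                        # avoid log(0) at the seam
        samples.append(-mean * math.log(1.0 - s))      # exponential inverse CDF
    return samples
```

With α = 0.5 the innovation spans the whole unit circle and the samples become independent, which makes the marginal easy to check; with α around 0.02 consecutive samples change slowly, mimicking the strong short-term correlation of GOB-level bit rates.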