The Caruso Lab
Producing 3D video in the Caruso Lab
Research and teaching
The Caruso Lab at the Dept of Telematics, NTNU, shall support teaching, research and development within the area of Networked Multimedia Systems.
The research focus is on ‘Adaptive
Scene and Traffic Control in Networked Multimedia Systems’, as outlined below:
'The Distributed Multimedia Plays
Systems Architecture (DMP) provides autostereoscopic multiview video and 3D
sound collaboration between performers over packet networks. To guarantee an
end-to-end time delay of less than 10-20 milliseconds, and to obtain high network
resource utilization, the perceived quality of audio-visual content is allowed
to vary with the traffic in the network. Parameters that are included in the
quality concept, and that can be controlled adaptively, are the end-to-end
delay, the number of 3D scene sub-objects and their temporal and spatial
resolution, adaptive and scalable compression of sub-scenes, and the number of
views. To approach the natural level of human perception, the quality has to be
increased to levels that temporarily require data rates of tens of
gigabits per second between two users. An integrated resolution and traffic
control algorithm, denoted the RL-C algorithm, uses traffic classes,
measurement and forecasting of traffic, feedback control and traffic shaping
(packet drop and source scaling) to provide the necessary control. Current DMP
applications in the arts are jazz sessions, song lessons, and distributed
opera. Other applications are in coming generations of TV (MHP extended with
DMP), games, education, and virtual meetings with a near-natural feel.'
For an introduction to the research area, see the two papers below.
Equipment for stereoscopic shooting, editing and projection
Ercim News No 63, October 2005
Adaptive Scene and Traffic Control in Networked Multimedia Systems
by Leif Arne Rønningen
leifarne@item.ntnu.no
The Distributed Multimedia Plays
(DMP) Systems Architecture provides three-dimensional multiview video and sound
collaboration between performers over packet networks. To guarantee an
end-to-end time delay under twenty milliseconds, and to obtain high network
resource utilization, the perceived quality of audio-visual content is allowed
to vary with the traffic in the network. Typical applications are in the next
generation of televisions (Multimedia Home Platform (MHP) extended with DMP),
games, education, and virtual meetings that have a realistic feel. The
architecture is also suitable for use in creating virtual collaborations for
jazz sessions, music lessons and distributed opera.
Research at the
A DMP terminal system is shown in
Figure 1. The two upper layers must always be present, while the two lower
layers represent wireless and fibre alternatives. Network nodes include
resolution and traffic control in addition to normal routing functions.
Scene Characteristics
To give the virtual collaboration a natural feel, objects should appear at
close to their natural size, form and position, the displayed picture should be
flicker-free, and individual pixels should not be visible at a viewing distance
of fifty centimetres. The normal comfortable distance between people generally
varies between fifty centimetres and several metres, but may also be several
hundred metres. The natural viewing area of human eyes is used as the ‘frame’
of the scene.
Sub-objects and 2D Interlace
In the RL algorithm, an integrated resolution and traffic control algorithm,
each object is divided into two, four or more sub-objects, which are sent in
separate streams. This is called the 2D interlace.
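One way to picture the 2D interlace is as a pixel-interleaved split, where each sub-object carries every second line and every second pixel of the object. The text does not say how the sub-objects are actually formed, so the layout below is an assumption, and the function names `split_2d_interlace` and `merge_2d_interlace` are hypothetical. The sketch shows why separate streams are useful: if one stream is lost, the receiver still has an evenly thinned version of the whole object.

```python
def split_2d_interlace(frame, rows=2, cols=2):
    """Split a 2D list of pixels into rows*cols sub-objects by pixel interleaving.

    Sub-object (r, c) takes every `rows`-th line starting at line r and every
    `cols`-th pixel starting at pixel c (an assumed layout, not from the paper).
    """
    return {
        (r, c): [line[c::cols] for line in frame[r::rows]]
        for r in range(rows)
        for c in range(cols)
    }


def merge_2d_interlace(subs, height, width, rows=2, cols=2, missing=None):
    """Rebuild a frame from whichever sub-objects arrived; gaps become `missing`."""
    frame = [[missing] * width for _ in range(height)]
    for (r, c), sub in subs.items():
        for i, line in enumerate(sub):
            for j, pixel in enumerate(line):
                frame[r + i * rows][c + j * cols] = pixel
    return frame


# A 4x4 test object whose pixel value encodes its position (10*y + x).
frame = [[10 * y + x for x in range(4)] for y in range(4)]
subs = split_2d_interlace(frame)          # four streams of 2x2 pixels each
full = merge_2d_interlace(subs, 4, 4)     # all four streams arrived
del subs[(1, 1)]                          # pretend one stream was dropped
partial = merge_2d_interlace(subs, 4, 4)  # a quarter of the pixels is missing
```

With all four streams present the merge reproduces the original object exactly. With one stream removed, the gaps are spread evenly over the object, so a receiver could fill them by interpolation rather than losing a whole region.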
Video Segmentation and Multiview Shooting
Object recognition and tracking are carried out by analysing video sequences
shot by multiview cameras. Among the most important factors for DMP are face
and eye recognition and tracking. Note that the motion estimation of
object-oriented scene objects is moved away from the coder (compressor) to the
object- and eye-tracking systems.
Compression and Coding
Compression algorithms can make use of the Discrete Cosine Transform together
with Huffman coding, or wavelet transforms that represent data in the
time-frequency space. The wavelet transform makes level scalability easy.
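The scalability property can be illustrated with the simplest wavelet, the Haar transform. This is a minimal sketch and not the coder used in DMP: the choice of Haar, the unnormalised averaging form and the function names are all assumptions made for illustration. One level of the transform splits a row of pixels into a half-resolution approximation and a set of details; sending only the approximation gives a lower quality level, and adding the details restores the original.

```python
def haar_forward(signal):
    """One level of the unnormalised Haar transform on an even-length list.

    Returns (approx, detail): pairwise averages and pairwise half-differences.
    """
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward: each (average, half-difference) pair gives two samples."""
    out = []
    for s, d in zip(approx, detail):
        out.extend([s + d, s - d])
    return out


row = [8, 6, 2, 4, 5, 5, 9, 1]                        # made-up pixel values
approx, detail = haar_forward(row)
base_only = haar_inverse(approx, [0] * len(detail))   # base layer only
exact = haar_inverse(approx, detail)                  # base plus enhancement
```

Here `approx` acts as a base layer and `detail` as an enhancement layer. A sender or network node under load could forward only the base layer, which halves the data for this level while keeping a usable picture.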
The RL Resolution and Traffic Control Algorithm
A detailed description of the RL-C algorithm can be found in the reference.
Simulations show that the end-to-end delay can be guaranteed. However, formal
tests must still be conducted to show when and to what extent the time-varying
resolution of audio-visual content is acceptable.
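The following toy loop is not the RL-C algorithm, whose details are only given in the reference. It is a rough sketch of how the ingredients named on this page (measurement, forecasting, feedback control and source scaling) could fit together. The exponential smoothing forecast, the thresholds, the four-stream maximum and the delay samples are all assumptions chosen for illustration; only the twenty-millisecond budget comes from the text.

```python
def forecast_delay(previous_forecast, measured_delay_ms, alpha=0.5):
    """Exponentially smoothed delay forecast (alpha is an assumed tuning value)."""
    return alpha * measured_delay_ms + (1 - alpha) * previous_forecast


def next_stream_count(active, forecast_ms, budget_ms=20.0, max_streams=4):
    """Decide how many sub-object streams the source should send next.

    Scale down by one stream when the forecast exceeds the delay budget,
    and back up by one when the forecast is below half the budget.
    """
    if forecast_ms > budget_ms and active > 1:
        return active - 1
    if forecast_ms < 0.5 * budget_ms and active < max_streams:
        return active + 1
    return active


active, forecast = 4, 5.0
history = []
for measured in [5, 30, 40, 40, 10, 2, 2]:   # made-up delay samples in ms
    forecast = forecast_delay(forecast, measured)
    active = next_stream_count(active, forecast)
    history.append(active)
```

In this run the source sheds streams while the forecast stays above the budget and starts restoring them once the forecast falls below half of it. The gap between the two thresholds is a design choice that keeps the stream count from oscillating on every sample.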
Service and Scene Setup, and Address Allocation
SMIL (Synchronized Multimedia Integration Language) is used for scene
composition. The SIP/TCP/IPv6 protocols are used for set-up, and the SIP URL
identifies services, scenes and subscenes. Parameters in the combined
RTP/UDP/IPv6 protocol header identify the same entities during content
transfer. IPv6 addresses will be allocated privately, and the SIP and RTP
protocols are adapted for this application.
Variable-Resolution Display
A viewer focuses on one scene object at a time, and most viewers focus on the
same object at any given time. An eye-tracking system identifies these objects,
and instructs the encoder on the shooting side to shoot and represent these
objects with high resolution.
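The idea can be sketched as a small decision rule: take the object that most viewers are currently looking at and give it full resolution, while the other objects are sent at a reduced scale. The function name `resolution_plan`, the object names and the scale factors 1.0 and 0.25 are hypothetical and chosen only for illustration.

```python
from collections import Counter


def resolution_plan(gaze_targets, objects, high=1.0, low=0.25):
    """Map each scene object to a resolution scale.

    gaze_targets: the object each viewer is looking at, as reported by
    an eye-tracking system. The most common target gets `high`, the rest `low`.
    """
    focus, _ = Counter(gaze_targets).most_common(1)[0]
    return {name: (high if name == focus else low) for name in objects}


plan = resolution_plan(
    gaze_targets=["face_A", "face_A", "glass", "face_A"],
    objects=["face_A", "glass", "table"],
)
```

In this example three of four viewers look at `face_A`, so it is the only object kept at full scale.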
A Virtual Dinner Scenario
Researcher A in
Researcher B stands up and walks
across the room. After a few minutes they need to talk to researcher C in
PhD funding has been provided from the Research Council of Norway (NFR).
IADAT tcn2005 Conference:
Adaptive scene & traffic control in DMP
Leif Arne Rønningen leifarne@item.ntnu.no
Dept. of Telematics, Norwegian
This paper reports on an
ongoing project at NTNU that aims at a near-natural feel of content in
network-based multimedia collaborations called DMP (Distributed Multimedia
Plays). DMP includes virtual collaborations in, e.g., games, business,
education, concerts, opera, theatre and future television, and this developing
technology could be in practical use before 2015.
To obtain the near-natural
feel, DMP includes novel adaptive scene & traffic control architectures.
The scenes consist of adaptive, distributed, auto-stereoscopic, multi-view object-oriented
sub-scenes, and the spatial and temporal resolution of video and sound is
balanced against the delay (traffic) in the network. The data rate from only
one scene typically varies from tens of Mbps to tens of
Gbps, while the delay through the network is limited largely by the propagation
delay.
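To see why propagation delay sets the floor, a back-of-the-envelope calculation helps. Light in optical fibre travels at roughly two thirds of its vacuum speed, or about 200 km per millisecond. The route lengths below are assumed round numbers and do not come from the paper, and queueing, processing and coding delays are ignored.

```python
SPEED_IN_FIBRE_KM_PER_MS = 200.0  # about 2/3 of the speed of light in vacuum


def propagation_delay_ms(distance_km):
    """One-way propagation delay over a fibre route of the given length."""
    return distance_km / SPEED_IN_FIBRE_KM_PER_MS


for km in (500, 2000, 4000):  # assumed route lengths in kilometres
    print(f"{km} km: {propagation_delay_ms(km)} ms one way")
```

For these distances the one-way delay is 2.5, 10 and 20 milliseconds. No traffic control can reduce that part of the delay, which is why the control effort goes into keeping the other contributions small.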
The DMP architecture applies
the RL-C resolution & traffic control algorithm, an elaboration of the RL
algorithm described in (1). The RL-C
algorithm guarantees the maximum end-to-end delay through a packet network, and
has to be implemented in the application layer in network nodes (routers) and
user terminal equipment. The user terminal application layer implements a
two-way multimedia collaboration architecture consisting of the following main
cooperating modules
Assuming a European-wide
network, a DMP network topology has been proposed, and the behavior of the
collaboration and network architectures has been modeled and simulated. Three
different scene traffic sources are modeled: ‘The virtual dinner’, ‘A virtual
song lesson’, and a future interactive TV movie. The simulations show that the
DMP architecture can keep the trans-Europe packet delay below about 60 ms. In
order to obtain a near-natural feel of the content, an important
object (face, etc.) must meet the following requirements
If the size of the important
object is 600 x 900 mm, the compressed data rate from the object is about 25
Gbps. However, the scene resolution is adaptive and may be scaled down so that
the important object generates less than 5 Gbps for periods of several hundred
milliseconds without an annoying reduction in perceived quality.
The presentation of this
paper includes videos from the DMP collaborations used, behavior and adaptive
traffic description from the simulations, and structural block diagrams.