The Caruso Lab
Producing 3D video in the Caruso Lab
Research and teaching
The Caruso Lab at the Dept. of Telematics, NTNU, supports teaching, research and development within the area of Networked Multimedia Systems.
The research focus is on ‘Adaptive Scene and Traffic Control in Networked Multimedia Systems’, as outlined below:
'The Distributed Multimedia Plays Systems Architecture (DMP) provides autostereoscopic multiview video and 3D sound collaboration between performers over packet networks. To guarantee an end-to-end time delay of less than 10-20 milliseconds, and to obtain high network resource utilization, the perceived quality of audio-visual content is allowed to vary with the traffic in the network. The parameters that are included in the quality concept, and that can be controlled adaptively, are the end-to-end delay, the number of 3D scene sub-objects and their temporal and spatial resolution, adaptive and scalable compression of sub-scenes, and the number of views. To approach the natural level of human perception, the quality has to be increased to levels that temporarily require data rates of tens of gigabits per second between two users. An integrated resolution and traffic control algorithm, denoted the RL-C algorithm, provides the necessary control using traffic classes, measurement and forecasting of traffic, feedback control and traffic shaping (packet drop and source scaling). Current DMP applications in the arts are jazz sessions, song lessons and distributed opera. Other applications are in coming generations of TV (MHP extended with DMP), games, education, and near-natural virtual meetings.'
For an introduction to the research area, see the two papers below.
Equipment for stereoscopic shooting, editing and projection
Ercim News No 63, October 2005
Adaptive Scene and Traffic Control in Networked Multimedia Systems
The Distributed Multimedia Plays (DMP) Systems Architecture provides three-dimensional multiview video and sound collaboration between performers over packet networks. To guarantee an end-to-end time delay under twenty milliseconds, and to obtain high network resource utilization, the perceived quality of audio-visual content is allowed to vary with the traffic in the network. Typical applications are in the next generation of televisions (Multimedia Home Platform (MHP) extended with DMP), games, education, and virtual meetings that have a realistic feel. The architecture is also suitable for use in creating virtual collaborations for jazz sessions, music lessons and distributed opera.
Research at the
A DMP terminal system is shown in Figure 1. The two upper layers must always be present, while the two lower layers represent wireless and fibre alternatives. Network nodes include resolution and traffic control in addition to normal routing functions.
To give the virtual collaboration a natural feel, the size, form and position of objects should be near natural size, the displayed picture should be flicker-free, and individual pixels should not be visible at a viewing distance of fifty centimetres. The normal comfortable distance between people generally varies between fifty centimetres and several metres, but may also be several hundred metres. The natural viewing area of human eyes is used as the ‘frame’ of the scene.
Sub-objects and 2D Interlace
In the RL algorithm, an integrated resolution and traffic control algorithm, each object is divided into two, four or more sub-objects, which are sent in separate streams. This is called the 2D interlace.
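As a rough illustration of the idea, the sketch below splits a small pixel grid into four sub-objects by taking every second pixel in each direction, and then rebuilds the grid from whichever sub-objects are available. The sampling pattern, the function names `split_2d_interlace` and `merge_2d_interlace`, and the use of `None` for missing pixels are assumptions made for illustration; the source does not specify how the RL algorithm forms its sub-objects.

```python
def split_2d_interlace(image, step=2):
    """Split a 2D grid into step*step sub-objects (hypothetical scheme).

    Sub-object (r, c) holds the pixels whose row index is r modulo step
    and whose column index is c modulo step.
    """
    return {
        (r, c): [row[c::step] for row in image[r::step]]
        for r in range(step)
        for c in range(step)
    }


def merge_2d_interlace(subs, height, width, step=2):
    """Rebuild the grid from whichever sub-objects arrived.

    Pixels that belong to a missing sub-object are left as None.
    """
    image = [[None] * width for _ in range(height)]
    for (r, c), sub in subs.items():
        for i, sub_row in enumerate(sub):
            for j, pixel in enumerate(sub_row):
                image[r + i * step][c + j * step] = pixel
    return image


# A 4x4 test grid whose values encode their own position (10*row + column).
image = [[10 * y + x for x in range(4)] for y in range(4)]
subs = split_2d_interlace(image)

# All four streams arrive: the grid is restored exactly.
full = merge_2d_interlace(subs, 4, 4)

# Only one stream arrives: a quarter of the pixels are known.
partial = merge_2d_interlace({(0, 0): subs[(0, 0)]}, 4, 4)
```

With this kind of split, losing or deliberately dropping a stream lowers the spatial resolution of the object evenly instead of removing a contiguous region of it.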
Video Segmentation and Multiview
Object recognition and tracking are carried out by analysing video sequences shot by multiview cameras. Among the most important factors for DMP are face and eye recognition and tracking. Note that the motion estimation of object-oriented scene objects is moved away from the coder (compressor) to the object- and eye-tracking systems.
Compression and Coding
Compression algorithms can make use of the Discrete Cosine Transform together with Huffman coding, or wavelet transforms that represent data in the time-frequency domain. The wavelet transform makes level scalability easy.
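To show why a wavelet representation lends itself to level scalability, here is a minimal sketch of one level of a Haar transform on a one-dimensional signal. The choice of the Haar wavelet, the unnormalised averaging form, and the function names are assumptions for illustration; the source does not say which wavelet DMP uses.

```python
def haar_forward(signal):
    """One level of an unnormalised Haar transform on an even-length list.

    Returns (approx, detail): pairwise averages and pairwise half-differences.
    """
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward: each (average, half-difference) pair gives two samples."""
    signal = []
    for s, d in zip(approx, detail):
        signal.extend([s + d, s - d])
    return signal


samples = [8, 6, 2, 4, 5, 5, 9, 1]
approx, detail = haar_forward(samples)

# Sending only `approx` gives a half-resolution version of the signal.
low_res = approx

# Sending `detail` as well restores the full-resolution signal.
restored = haar_inverse(approx, detail)
```

The approximation coefficients alone form a usable lower-resolution signal, and each additional detail level refines it, so a sender can scale resolution by choosing how many levels to transmit.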
The RL Resolution and Traffic
A detailed description of the RL-C algorithm can be found in the reference. Simulations show that the end-to-end delay can be guaranteed. However, formal tests must still be conducted to show when and to what extent the time-varying resolution of audio-visual content is acceptable.
Service and Scene Setup, and Address
SMIL (Synchronized Multimedia Integration Language) is used for scene composition. The SIP/TCP/IPv6 protocols are used for set-up, and the SIP URL identifies services, scenes and subscenes. Parameters in the combined RTP/UDP/IPv6 protocol header identify the same for content transfer. IPv6 addresses will be allocated in a private way, and the SIP and RTP protocols are adapted for this application.
A viewer focuses on one scene object at a time, and most viewers focus on the same object at any given time. An eye-tracking system identifies these objects and instructs the encoder on the shooting side to capture and represent them at high resolution.
A Virtual Dinner Scenario
Researcher A in
Researcher B stands up and walks across the room. After a few minutes they need to talk to researcher C in
PhD funding has been provided by the Research Council of Norway (NFR).
IADAT tcn2005 Conference:
Adaptive scene & traffic control in DMP
Dept. of telematics,
This paper reports on an ongoing project at NTNU, called DMP (Distributed Multimedia Plays), that aims at a near-natural feel of content in network-based multimedia collaborations. DMP includes virtual collaborations in, e.g., games, business, education, concerts, opera, theatre and future television, and this developing technology could be in practical use before 2015.
To obtain the near-natural feel, DMP includes novel adaptive scene & traffic control architectures. The scenes consist of adaptive, distributed, auto-stereoscopic, multi-view object-oriented sub-scenes, and the spatial and temporal resolution of video and sound is balanced against the delay (traffic) in the network. The data rate from a single scene typically varies from tens of Mbps to tens of Gbps, while the delay through the network is limited largely by the propagation delay.
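To give a feel for the size of the propagation delay, the sketch below estimates the one-way delay over an optical fibre path. The signal speed of about 2 x 10^8 m/s (roughly two thirds of the speed of light in vacuum) is a common approximation for fibre; the path lengths and the function name are illustrative assumptions, not figures from the paper.

```python
# Approximate signal speed in optical fibre (assumption, about 2/3 of c).
FIBRE_SPEED_M_PER_S = 2.0e8


def propagation_delay_ms(distance_km, speed_m_per_s=FIBRE_SPEED_M_PER_S):
    """One-way propagation delay in milliseconds over a fibre path."""
    seconds = distance_km * 1000.0 / speed_m_per_s
    return seconds * 1000.0


# Illustrative path lengths in kilometres (not taken from the paper).
for distance_km in (100, 1000, 4000):
    delay = propagation_delay_ms(distance_km)
    print(f"{distance_km:>5} km -> {delay:.1f} ms")
```

Under these assumptions a 4000 km fibre path contributes about 20 ms one way, so queueing and processing delays in the nodes must be kept small for a tight end-to-end bound to hold.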
The DMP architecture applies the RL-C resolution & traffic control algorithm, an elaboration of the RL algorithm described in (1). The RL-C algorithm guarantees the maximum end-to-end delay through a packet network, and has to be implemented in the application layer in network nodes (routers) and user terminal equipment. The user terminal application layer implements a two-way multimedia collaboration architecture consisting of the following main cooperating modules
Assuming a Europe-wide network, a DMP network topology has been proposed, and the behavior of the collaboration and network architectures has been modeled and simulated. Three different scene traffic sources are modeled: ‘The virtual dinner’, ‘A virtual song lesson’, and a future interactive TV movie. The simulations show that the DMP architecture can guarantee a trans-Europe packet delay lower than about 60 ms. In order to obtain a near-natural feel of the content, an important object (face, etc.) must meet the following requirements
If the size of the important object is 600 x 900 mm, the compressed data rate from the object is about 25 Gbps. However, the scene resolution is adaptive and may be scaled down so that the important object generates less than 5 Gbps for periods of several hundred milliseconds without an annoying reduction in perceived quality.
The presentation of this paper includes videos of the DMP collaborations used, descriptions of behavior and adaptive traffic from the simulations, and structural block diagrams.
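The scale-down from 25 Gbps to under 5 Gbps can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that the object is split into equal-rate sub-object streams and that source scaling works by dropping whole streams; the stream counts, the equal-rate split and the function name `streams_to_keep` are hypothetical and do not come from the paper.

```python
import math


def streams_to_keep(full_rate_gbps, target_gbps, n_streams):
    """Largest number of equal-rate streams whose total stays strictly below target."""
    per_stream = full_rate_gbps / n_streams
    keep = math.ceil(target_gbps / per_stream) - 1
    return max(0, min(n_streams, keep))


full_rate, target = 25.0, 5.0  # Gbps, the figures quoted in the text
for n in (4, 16, 64):
    keep = streams_to_keep(full_rate, target, n)
    rate = keep * full_rate / n
    print(f"{n:>2} streams: keep {keep}, rate {rate:.2f} Gbps")
```

Under these assumptions a coarse split into four streams cannot meet the 5 Gbps target without dropping everything, while a finer split leaves a usable fraction of the object in place, which is one reason a finer division into sub-objects gives the control algorithm more room to adapt.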