Bauckhage, C.; Hanheide, M.; Käster, T.; Pfeifer, M.; Sagerer, G. & Wrede, S.
Vision Systems with the Human in the Loop
EURASIP Journal on Applied Signal Processing, to appear, 2005.
Abstract: The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge; they can adapt to their environment and thus exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval; the others are cognitive vision systems that constitute prototypes of visual active memories, which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After discussing adaptive content-based image retrieval as well as object and action recognition in an office environment, we raise the issue of assessing cognitive systems. Experiences from psychologically evaluated human-machine interactions are reported, and the promising potential of psychologically based usability experiments is stressed.
Bekel, H.; Heidemann, G. & Ritter, H.
Interactive Image Data Labeling Using Self-Organizing Maps in an Augmented Reality Scenario
Neural Networks, 2005.
Abstract: We present an approach for the convenient labeling of image patches gathered from an unrestricted environment. The system is employed for a mobile Augmented Reality (AR) gear: while the user walks around with the head-mounted AR gear, context-free modules for focus-of-attention permanently sample the most "interesting" image patches. After this acquisition phase, a Self-Organizing Map (SOM) is trained on the complete set of patches, using combinations of MPEG-7 features as a data representation. The SOM allows visualization of the sampled patches and an easy manual sorting into categories. With very little effort, the user can compose a training set for a classifier; thus, unknown objects can be made known to the system. We evaluate the system on COIL imagery and demonstrate that a user can reach a satisfying categorization within a few steps, even for image data sampled while walking in an office environment.
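The map-based grouping that makes this labeling convenient can be sketched in a few lines. The following is a toy self-organizing map in plain NumPy, not the paper's MPEG-7 pipeline; the feature vectors, grid size, and learning schedule are illustrative assumptions. Similar patch features end up on nearby map nodes, so a user can label whole map regions instead of individual patches.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "patch features": two clusters standing in for two object categories.
feats = np.vstack([rng.normal(0.2, 0.05, size=(50, 3)),
                   rng.normal(0.8, 0.05, size=(50, 3))])

grid = np.array([(i, j) for i in range(4) for j in range(4)], float)  # 4x4 map
weights = rng.random((16, 3))                                          # node prototypes

for t in range(2000):
    x = feats[rng.integers(len(feats))]
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
    lr = 0.5 * (1 - t / 2000)                           # decaying learning rate
    sigma = 2.0 * (1 - t / 2000) + 0.5                  # shrinking neighbourhood radius
    h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)          # pull neighbourhood toward x

# After training, samples from the two clusters map to different nodes.
bmu_a = np.argmin(((weights - feats[0]) ** 2).sum(axis=1))
bmu_b = np.argmin(((weights - feats[-1]) ** 2).sum(axis=1))
print(bmu_a != bmu_b)
```

In the paper's setting, the prototypes would be MPEG-7 feature combinations and the map would be large enough to be browsed visually.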
Heidemann, G.; Bekel, H.; Bax, I. & Ritter, H.
Interactive Online Learning
Pattern Recognition and Image Analysis 15:55-58, 2005.
Abstract: We present a computer vision system for object recognition, which is integrated in an augmented reality setup. The system can be trained online to recognize objects in an intuitive way. The augmented reality gear allows interaction using hand gestures for the control of displayed virtual menus. The underlying neural recognition system combines feature extraction and classification. Its three-stage architecture facilitates fast adaptation: in a fast training (FT) mode, only the last stage is adapted, whereas complete training (CT) rebuilds the system from scratch. Using FT, online acquired views can be added at once to the classifier, the system being operational after a delay of less than a second, though still with reduced classification performance. In parallel, a new classifier is trained (CT) and loaded to the system when ready.
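The fast-training idea above, keeping the feature-extraction stages fixed and adapting only the last stage, can be illustrated with a minimal sketch. This is not the authors' neural architecture; the random-projection features and nearest-class-mean last stage are stand-in assumptions that show why FT is near-instant.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image):
    # Stand-in for the fixed earlier stages: a deterministic random projection.
    proj = np.random.default_rng(42).normal(size=(16, image.size))
    return proj @ image.ravel()

class FastTrainableClassifier:
    """Last-stage-only learner: per-class mean vectors over fixed features."""
    def __init__(self):
        self.means = {}

    def fast_train(self, label, views):
        # FT mode: only this last stage changes, so adding a class is near-instant.
        feats = np.stack([extract_features(v) for v in views])
        self.means[label] = feats.mean(axis=0)

    def classify(self, image):
        f = extract_features(image)
        return min(self.means, key=lambda c: np.linalg.norm(f - self.means[c]))

clf = FastTrainableClassifier()
clf.fast_train("cup", [rng.normal(loc=3.0, size=(8, 8)) for _ in range(5)])
clf.fast_train("pen", [rng.normal(loc=-3.0, size=(8, 8)) for _ in range(5)])

pred = clf.classify(rng.normal(loc=3.0, size=(8, 8)))
print(pred)
```

Complete training (CT), by contrast, would rebuild the feature stages as well, which is why the paper runs it in the background and swaps the new classifier in when ready.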
Kostin, A.; Kittler, J. & Christmas, W.
Object recognition by symmetrised graph matching using relaxation labelling with an inhibitory mechanism
Pattern Recognition Letters 26:381-393, 2005.
Abstract: Object recognition using graph-matching techniques can be viewed as a two-stage process: extracting suitable object primitives from an image and corresponding models, and matching graphs constructed from these two sets of object primitives. In this paper we concentrate mainly on the latter issue of graph matching, for which we derive a technique based on probabilistic relaxation graph labelling. The new method was evaluated on two standard data sets, SOIL47 and COIL100, in both of which objects must be recognised from a variety of different views. The results indicated that our method is comparable with the best of other current object recognition techniques. The potential of the method was also demonstrated on challenging examples of object recognition in cluttered scenes.
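The core of probabilistic relaxation labelling can be shown on a toy example. This is the generic textbook update, not the paper's symmetrised scheme with an inhibitory mechanism; the two-node graph and compatibility values are made up for illustration. Each scene node iteratively reweights its label probabilities by the support received from its neighbour's current beliefs.

```python
import numpy as np

# 2 scene nodes, 2 candidate model labels.
p = np.array([[0.55, 0.45],    # node 0: mild preference for label 0
              [0.50, 0.50]])   # node 1: no preference

# compat[a, b] = compatibility of (node0 = label a, node1 = label b).
compat = np.array([[1.0, 0.2],
                   [0.2, 1.0]])  # matching labels are mutually supportive

for _ in range(20):
    # Support each node's labels receive from the other node's beliefs.
    s0 = compat @ p[1]
    s1 = compat.T @ p[0]
    new = np.vstack([p[0] * s0, p[1] * s1])
    p = new / new.sum(axis=1, keepdims=True)  # renormalise per node

print(np.round(p, 3))  # both nodes converge to a consistent labelling
```

The weak initial preference of node 0 propagates to node 1 and is amplified until both nodes agree on label 0; in graph matching the labels are model primitives and the compatibilities encode relational consistency.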
Messer, K.; Christmas, W.; Jaser, E.; Kittler, J.; Levienaise-Obadia, B. & Koubaroulis, D.
A unified approach to the generation of semantic cues for sports video annotation
Signal Processing 83:357-383, 2005.
Abstract: The use of video and audio features for automated annotation of audio-visual data is becoming widespread. A major limitation of many of the current methods is that the stored indexing features are too low-level: they relate directly to properties of the data. In this work we apply a further stage of processing that associates the feature measurements with real-world objects or events. The outputs, which we call "cues", are combined to enable us to compute directly the probability of the object being present in the scene. An additional advantage of this approach is that the cues from different types of features are presented in a homogeneous way.
Wachsmuth, S.; Wrede, S.; Hanheide, M. & Bauckhage, C.
An Active Memory Model for Cognitive Computer Vision Systems
KI-Journal, Special Issue on Cognitive Systems 19:25-31, 2005.
Abstract: Computer vision is becoming more and more an integral part of human-machine interfaces. Recent research aims at establishing a seamless and natural way of interaction between a user and an application system. Gesture recognition, context awareness, and grounding concepts in the commonly perceived environment as well as in the interaction history are key abilities of such systems. In parallel, over the last years, computer vision research has indicated that integrated systems which are embedded in the world and actively interact with their environment seem to be a necessary precondition for solving more general computer vision tasks. In this context, cognitive computer vision systems have emerged which aim at the generation of knowledge on the basis of perception, reasoning, learning, and prior models. In both cases, integration, interaction, and the organization of memory become key issues in system design and practical research. In this article we present a computational framework for integrated vision systems that is centered around an active memory component. It supports fast integration and substitution of system components and various means of interaction patterns, and it enables a system to reason about its own memory content. It is exemplified by a cognitive human-machine interface in an Augmented Reality scenario. The system is able to acquire new concepts from an interaction history and provides context-aware scene augmentation for the user.
Gräßl, C.; Deinzer, F.; Mattern, F. & Niemann, H.
Improving Statistical Object Recognition Approaches by a Parameterization of Normal Distributions
Zhuravlev, Y. (ed.) Pattern Recognition and Image Analysis, IAPC Nauka/Interperiodica, Moscow 14:222-230, 2004.
Abstract: As statistical approaches play an important role in object recognition, we present a novel approach which is based on object models consisting of normal distributions for each training image. We show how to parameterize the mean vector and covariance matrix independently of the interpolation technique, and how to formulate classification and localization as a continuous optimization problem. This enables the computation of object poses that have never been seen during training. For interpolation, we present four different techniques, which are compared in an experiment with real images. The results show the benefit of our method in both classification rate and pose estimation accuracy.
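The idea of recovering poses never seen in training can be illustrated with a simplified sketch. This is not the paper's method: the toy 2-D features, shared covariance, linear interpolation of the mean, and the dense grid search standing in for a continuous optimiser are all illustrative assumptions.

```python
import numpy as np

# One normal distribution per training view; the mean is parameterized as a
# continuous function of the pose angle by interpolating between views.
train_angles = np.array([0.0, 90.0, 180.0, 270.0])            # degrees
train_means = np.array([[1.0, 0.0], [0.0, 1.0],
                        [-1.0, 0.0], [0.0, -1.0]])            # toy 2-D features
cov = 0.05 * np.eye(2)                                        # shared covariance

def mean_at(theta):
    """Interpolated mean vector for an arbitrary pose angle (wraps at 360)."""
    return np.array([np.interp(theta % 360.0,
                               np.append(train_angles, 360.0),
                               np.append(train_means[:, k], train_means[0, k]))
                     for k in range(2)])

def log_density(x, theta):
    d = x - mean_at(theta)
    return -0.5 * d @ np.linalg.solve(cov, d)  # log N(x; mean_at(theta), cov), up to a constant

observed = np.array([0.5, 0.5])               # lies "between" the 0- and 90-degree views
thetas = np.linspace(0.0, 360.0, 3601)
best = thetas[np.argmax([log_density(observed, t) for t in thetas])]
print(best)
```

The recovered angle lies near 45 degrees, a pose absent from the training set; the paper compares four such interpolation schemes and uses genuine continuous optimisation rather than a grid.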
Heidemann, G.; Rae, R.; Bekel, H.; Bax, I. & Ritter, H.
Integrating Context Free and Context-Dependent Attentional Mechanisms for Gestural Object Reference
Machine Vision and Applications 16:64-73, 2004.
Abstract: We present a vision system for human-machine interaction based on a small wearable camera mounted on glasses. The camera views the area in front of the user, especially the hands. To evaluate hand movements for pointing gestures and to recognise object references, an approach to integrating bottom-up generated feature maps and top-down propagated recognition results is introduced. Modules for context-free focus of attention work in parallel with the hand gesture recognition. In contrast to other approaches, the fusion of the two branches takes place on the sub-symbolic level. This method facilitates both the integration of different modalities and the generation of auditory feedback.
Ahmadyfard, A. & Kittler, J.
Using relaxation technique for region-based object recognition
Image and Vision Computing 20:769-781, 2002.
Abstract: We address the problem of object recognition in computer vision. We represent each model and the scene in the form of an attributed relational graph (ARG). A multiple-region representation is provided at each node of the scene ARG to increase the representation reliability. The process of matching the scene ARG against the stored models is facilitated by a novel method for identifying the most probable representation from among the multiple candidates. The scene and model graph matching is accomplished using probabilistic relaxation, which has been modified to minimise the label clutter. The experimental results obtained on real data demonstrate promising performance of the proposed recognition system.
Fitch, A.; Kadyrov, A.; Christmas, W. & Kittler, J.
Fast Exhaustive Robust Matching
IEEE Transactions on Image Processing 3:903-906, 2002.
Abstract: A new fast, statistically robust, exhaustive, translational image matching technique is presented: fast robust correlation. Existing methods are either slow, or non-robust, or rely on optimisation. Fast robust correlation works by expressing a robust matching surface as a series of correlations. Speed is obtained by computing correlations in the frequency domain. Computational cost is analysed and the method is shown to be fast. Speed is comparable to conventional correlation and, for large images, thousands of times faster than direct robust matching. Three experiments demonstrate the advantage of the technique over standard correlation.
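The building block that makes each term of the method cheap can be demonstrated directly. The paper's contribution is expanding a robust matching surface as a series of correlations; the sketch below shows only the underlying trick, cross-correlation of all translations at once via the FFT, on a synthetic circularly shifted image (the image size and shift are arbitrary test values).

```python
import numpy as np

rng = np.random.default_rng(3)
image = rng.random((64, 64))
template = np.roll(image, shift=(5, 9), axis=(0, 1))  # image circularly shifted by (5, 9)

# Convolution theorem: correlation over every translation in one pass.
corr = np.fft.ifft2(np.fft.fft2(template) * np.conj(np.fft.fft2(image))).real

dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
print(dy, dx)  # recovers the (5, 9) shift
```

Each FFT costs O(N log N) instead of the O(N^2) of direct exhaustive matching, which is where the "thousands of times faster" figure for large images comes from; the robust variant simply sums several such correlation terms.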
Self-Localization with Intelligent Sensors - Fusion of Stereo Vision and Inertial Sensors
TUG Forschungsjournal WS 2002/2003 1:21-23, 2002.
Ribo, M.; Ganster, H.; Brandner, M.; Lang, P.; Stock, C. & Pinz, A.
Hybrid Tracking for Outdoor AR Applications
IEEE Computer Graphics and Applications Magazine 22:54-63, 2002.
Abstract: Tracking in fully mobile configurations, especially outdoors, is still a very challenging problem. Augmented Reality (AR) applications demand a perfect alignment of the real scene and its virtual augmentation, thus posing the most stringent requirements. Only vision-based tracking is known to deliver sufficient accuracy, but it is too slow and too sensitive to outliers to be used standalone. We present a new hybrid tracking system for fully mobile outdoor AR applications which fuses vision-based tracking with an inertial tracking system. Several issues, such as fusion algorithms, evaluation and selection of visual landmarks, real-time tracking of landmarks, and handling the complexity of visual interpretation, are discussed in detail. The overall tracking accuracy and speed of the system are sufficient to track the user's head pose in 6 degrees of freedom in real time. Experimental results for an urban outdoor scene are given which demonstrate the capabilities of the system. This opens up a field of new outdoor AR applications such as 3D city guides, architectural AR presentations, and multiuser mobile outdoor AR.
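The division of labour between the two sensors can be illustrated with a 1-D sketch. This is not the paper's fusion algorithm: the constant-velocity motion, the bias value, the update rates, and the simple complementary-filter gain are all made-up assumptions. A fast but drifting inertial estimate is periodically corrected by slower, accurate vision measurements.

```python
import numpy as np

rng = np.random.default_rng(7)
dt = 0.01                                        # 100 Hz inertial rate
true_pos = np.cumsum(np.full(1000, 0.1 * dt))    # constant-velocity ground truth

est = 0.0
vel_bias = 0.05                                  # inertial drift source
alpha = 0.5                                      # vision correction gain
for k in range(1000):
    est += (0.1 + vel_bias) * dt                 # inertial prediction (drifts)
    if k % 10 == 0:                              # vision update at 10 Hz
        vision = true_pos[k] + rng.normal(0, 0.002)
        est += alpha * (vision - est)            # pull estimate toward vision

print(abs(est - true_pos[-1]) < 0.05)            # drift stays bounded
```

Without the vision corrections the bias alone would accumulate to an error of about 0.5 over the run; with them the error stays bounded, while the inertial branch keeps the estimate available at the full 100 Hz rate between vision frames.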