Integration of Structural and Color Cues for Robust Hand Detection in Video Images.

J. Richarz, T. Plötz and G. A. Fink
Proc. 7th Open German/Russian Workshop on Pattern Recognition and Image Understanding (OGRW 2007), 2007.

Ettlingen, Germany


Abstract

Image structure (i.e. texture, edges or gradient features) and color are the two cues most frequently used in image interpretation and computer vision tasks. However, each has weaknesses when applied on its own. Fusing the information from different cues makes it possible to compensate for the deficits of one by exploiting the strengths of the others, leading to more robust results. Here we propose an approach to hand detection in video images of realistic scenes that combines structural and color features. To describe image structure, we focus on the gradient orientation histograms extracted by the Scale Invariant Feature Transform (SIFT), one variant of scale- and rotation-invariant salient region descriptors. Color information is evaluated by a skin color detector based on Gaussian Mixture Models (GMMs). We investigate different schemes for fusing these two types of information and present evaluation results for each of them using a large database of images recorded inside a smart house.
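
As a rough illustration of the two ingredients named in the abstract, the following sketch evaluates a GMM log-likelihood for color vectors and combines a color score with a structure score. It is not the authors' implementation: the abstract does not state the color space, the mixture parameters, or the fusion schemes, so the function names `gmm_log_likelihood` and `fuse_scores`, the weighted-sum rule, and the parameter `alpha` are all assumptions made for this example.

```python
import numpy as np


def gmm_log_likelihood(pixels, weights, means, covs):
    """Log-likelihood of each color vector under a Gaussian mixture model.

    pixels: (N, D) color vectors; weights: (K,); means: (K, D); covs: (K, D, D).
    All parameters are hypothetical; in practice they would be estimated
    from labeled skin-color samples.
    """
    pixels = np.atleast_2d(pixels).astype(float)
    n, d = pixels.shape
    log_terms = np.empty((len(weights), n))
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        diff = pixels - np.asarray(mu, dtype=float)
        inv = np.linalg.inv(cov)
        # Squared Mahalanobis distance of every pixel to component k.
        maha = np.einsum("ni,ij,nj->n", diff, inv, diff)
        _, logdet = np.linalg.slogdet(cov)
        log_terms[k] = np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    # Log-sum-exp over the mixture components for numerical stability.
    m = log_terms.max(axis=0)
    return m + np.log(np.exp(log_terms - m).sum(axis=0))


def fuse_scores(structure_score, color_score, alpha=0.5):
    """Weighted-sum fusion of two scores assumed to lie in [0, 1].

    This is one simple, assumed fusion rule; the paper evaluates several
    schemes that the abstract does not spell out.
    """
    return alpha * structure_score + (1.0 - alpha) * color_score
```

With `alpha` close to 1 the fused score follows the structural cue, and with `alpha` close to 0 it follows the color cue; how the two cues should actually be weighted or combined is precisely the question the paper investigates.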