Johann Strassburg, Rene Grzeszick, Leonard Rothacker and Gernot A. Fink
Proc. International Conference on Computer Vision Theory and Applications (VISAPP), 2015.
Image parsing describes a very fine-grained analysis of natural scene images, in which each pixel is assigned a label describing the object or part of the scene it belongs to. This analysis is a keystone for a wide range of applications that could benefit from detailed scene understanding, such as keyword-based image search, sentence-based image or video descriptions, and even autonomous cars or robots. State-of-the-art approaches in image parsing are data-driven and allow for recognizing arbitrary categories based on knowledge transfer from similar images. As transferring labels at the pixel level is tedious and noisy, more recent approaches build on the idea of segmenting a scene and transferring the information based on regions. To create these regions, the most popular approaches rely on over-segmenting the scene into superpixels. In this paper, the influence of different superpixel methods is evaluated within the well-known Superparsing framework. Furthermore, a new method is presented that computes a superpixel-like over-segmentation of an image based on edge-avoiding wavelets. The evaluation on the SIFT Flow and Barcelona datasets shows that the choice of the superpixel method is crucial for the performance of image parsing.
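The motivation for region-based transfer can be illustrated with a toy example. The sketch below is not the Superparsing pipeline and not the wavelet-based method of the paper; it assumes a regular grid as a stand-in for a real superpixel algorithm and uses a simple majority vote per region. The function names `grid_superpixels` and `region_majority_labels` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def grid_superpixels(height, width, cell):
    """Assign each pixel a region id from a regular grid.

    Assumption: the grid only stands in for a real superpixel method.
    """
    cols = (width + cell - 1) // cell  # number of grid cells per row
    return [[(y // cell) * cols + (x // cell) for x in range(width)]
            for y in range(height)]


def region_majority_labels(regions, pixel_labels):
    """Replace each pixel label by the most frequent label in its region."""
    votes = {}
    for region_row, label_row in zip(regions, pixel_labels):
        for region_id, label in zip(region_row, label_row):
            votes.setdefault(region_id, Counter())[label] += 1
    winner = {rid: counts.most_common(1)[0][0] for rid, counts in votes.items()}
    return [[winner[rid] for rid in row] for row in regions]


# Noisy per-pixel labels on a 4x4 image (made-up data for illustration).
noisy = [
    ["sky",  "sky",  "sky",  "tree"],
    ["sky",  "road", "sky",  "sky"],
    ["road", "road", "tree", "tree"],
    ["road", "sky",  "tree", "tree"],
]
regions = grid_superpixels(4, 4, 2)
smoothed = region_majority_labels(regions, noisy)
for row in smoothed:
    print(row)
```

Each 2x2 region ends up with a single label, so isolated mislabeled pixels are absorbed by their region. How well this works depends on whether region boundaries follow object boundaries, which is why the choice of the over-segmentation method matters.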