In this Google Tech Talk, Dr. Rob Fergus, Assistant Professor of Computer Science at the Courant Institute of Mathematical Sciences, New York University, describes how, with the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, he explores this world with the aid of a large dataset of 79,302,017 images collected from the Web. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32x32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database, so the image database gives comprehensive coverage of all object categories and scenes. The semantic information from WordNet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels, minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, he demonstrates recognition performance comparable to that of class-specific Viola-Jones-style detectors.
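The core recipe described above — downsample every image to 32x32, then classify a query by the labels of its nearest neighbors — can be sketched as follows. This is an illustrative toy only: the dataset, labels, and distance choice (sum of squared differences on mean-subtracted, normalized pixel vectors) are simplifying assumptions, not the talk's exact pipeline, and the real system runs over tens of millions of images with WordNet-aware label voting.

```python
import numpy as np

def to_tiny(image):
    """Flatten a 32x32x3 image into a mean-subtracted, unit-norm vector."""
    v = image.astype(np.float64).ravel()
    v -= v.mean()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def nearest_label(query_vec, dataset, labels, k=5):
    """Majority label among the k nearest tiny images under
    sum-of-squared-differences distance."""
    dists = np.sum((dataset - query_vec) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy "database": one mostly-red and one mostly-blue tiny image.
red = np.zeros((32, 32, 3), dtype=np.uint8); red[..., 0] = 200
blue = np.zeros((32, 32, 3), dtype=np.uint8); blue[..., 2] = 200
dataset = np.vstack([to_tiny(red), to_tiny(blue)])
labels = ["red", "blue"]
```

With a slightly perturbed red image as the query, the nearest neighbor is the red exemplar, so the voted label is "red". At the scale of the actual dataset, brute-force distances like this become the bottleneck, which motivates the compact codes discussed in the second part of the talk.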
In the second part of the talk, he presents efficient image search and scene matching techniques that are not only fast but also require very little memory, enabling their use on standard hardware or even on handheld devices. The approach uses the Semantic Hashing idea of Salakhutdinov and Hinton, based on Restricted Boltzmann Machines, to convert the Gist descriptor (a real-valued vector describing orientation energies at different scales and orientations within an image) into a compact binary code of a few hundred bits per image. Using this scheme, it is possible to perform real-time searches on the Internet image database with a single large PC and obtain recognition results comparable to those of the full descriptor. Applying the codes to high-quality labeled images from the LabelMe database yields surprisingly strong recognition results with simple nearest-neighbor techniques.
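The search side of this scheme can be sketched in a few lines: descriptors are mapped to short binary codes, and retrieval becomes a Hamming-distance ranking over those codes. Note one deliberate substitution: the talk learns the codes with a Restricted Boltzmann Machine, whereas this sketch binarizes a Gist-like descriptor with random hyperplanes (an LSH-style stand-in), since training an RBM is beyond a short example. The descriptor dimensions and code length here are arbitrary choices for illustration.

```python
import numpy as np

def binarize(descriptors, projections):
    """Map real-valued descriptors (n x d) to binary codes (n x b):
    one bit per random hyperplane, set when the projection is positive."""
    return (descriptors @ projections > 0).astype(np.uint8)

def hamming_search(query_code, codes, k=3):
    """Indices of the k database codes closest to the query in
    Hamming distance (number of differing bits)."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")[:k]

# Toy database of fake "Gist" descriptors, hashed to 32-bit codes.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((10, 64))   # 10 descriptors, 64-d
projections = rng.standard_normal((64, 32))   # 32 random hyperplanes
codes = binarize(descriptors, projections)
```

Because each code is just a few hundred bits in the real system, the entire database fits in the memory of a single machine, and the distance computation reduces to XOR-and-popcount, which is what makes real-time search over Internet-scale collections feasible.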