Gesture Recognition

Faces and Gestures

The aforementioned work on representation and learning has contributed to two types of human-computer interfaces that we have developed. First, learning and classification techniques, including conventional statistical classifiers, neural networks, support vector machines, and artificial intelligence approaches, have been used to develop new methods for human face detection and hand gesture recognition.

The GIST (Gesture Interpretation using Spatio-Temporal analysis) project aims to recognize and interpret American Sign Language gestures in a video sequence using an integrated method that combines motion segmentation with shape, size, and color cues. A multi-scale motion segmentation based on Ahuja’s New Transform is applied to the video sequence to obtain motion regions and their correspondences across frames. Regions of interest, such as the fingertip, palm, and elbow, are extracted from the motion-segmented images by formulating and solving a constraint satisfaction problem. Pixel trajectories are then extracted from these regions, and a spatio-temporal analysis based on a time-delay neural network is applied to classify the resulting motion patterns. The ultimate goals of GIST are to support content-based video retrieval based on video clips and to gain a better understanding of motion segmentation.
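The last step of this pipeline, classifying a trajectory with a time-delay network, can be illustrated with a minimal sketch. It is not the GIST implementation. It assumes a single time-delay layer, frame-to-frame displacements as input features, summation over time as the temporal integration, and hand-set weights in place of trained ones. All function names, labels, and weight values are hypothetical.

```python
import math


def displacements(points):
    """Turn a trajectory of (x, y) pixel positions into frame-to-frame (dx, dy) features."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]


def time_delay_layer(frames, weights, biases):
    """Apply one time-delay layer to a sequence of feature vectors.

    weights[k][d][j] is the weight of output unit k for delay d and input feature j.
    Each output unit sees a sliding window of len(weights[0]) consecutive frames,
    and the same weights are reused at every window position.
    """
    n_delays = len(weights[0])
    outputs = []
    for t in range(len(frames) - n_delays + 1):
        window = frames[t:t + n_delays]
        row = []
        for unit_weights, bias in zip(weights, biases):
            total = bias
            for d in range(n_delays):
                for j, value in enumerate(window[d]):
                    total += unit_weights[d][j] * value
            row.append(math.tanh(total))
        outputs.append(row)
    return outputs


def classify_trajectory(points, weights, biases, labels):
    """Sum each unit's activations over time and return the best-scoring label."""
    activations = time_delay_layer(displacements(points), weights, biases)
    scores = [sum(column) for column in zip(*activations)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Hypothetical hand-set weights (a real network would learn these from data):
# unit 0 responds to increasing x, unit 1 to increasing y, over a 2-frame window.
WEIGHTS = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
BIASES = [0.0, 0.0]
LABELS = ["rightward", "downward"]  # image coordinates: y grows downward

if __name__ == "__main__":
    fingertip = [(0, 0), (1, 0), (2, 0), (3, 0)]
    print(classify_trajectory(fingertip, WEIGHTS, BIASES, LABELS))
```

Sharing weights across window positions is what makes the layer tolerant to when a motion occurs within the sequence. A practical system would stack several such layers and train the weights on labeled gesture trajectories.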

E. Altman, N. Ahuja, and F. Kishino, Hand Trajectory Recognition Using Dynamical Systems, First Asian Conference on Computer Vision, November 23-25, 1993, Osaka, Japan, 321-324.
M.-H. Yang and N. Ahuja, Extracting Gestural Motion Trajectories, 1998 IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, April 1998, 10-15.
M.-H. Yang and N. Ahuja, Extraction and Classification of Visual Motion Patterns for Hand Gesture Recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Santa Barbara, CA, June 1998, 892-897.
M.-H. Yang and N. Ahuja, Gaussian Mixture Modeling of Human Skin Color and Its Applications in Image and Video Databases, Proc. of the SPIE: Storage and Retrieval for Image and Video Databases VI, Vol. 3656, San Jose, CA, Jan. 1999.