Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Existing multi-target tracking methods often use low-level features which are not sufficiently discriminative for identifying faces with such large appearance variations.
Low-Level Multiscale Image Segmentation and a Benchmark for its Evaluation
We present a segmentation algorithm to detect low-level structure present in images. The algorithm is designed to partition a given image into regions, corresponding to image structures, regardless of their shapes, sizes, and levels of interior homogeneity. We model a region as a connected set of pixels that is surrounded by ramp edge discontinuities where the magnitude of these discontinuities is large compared to the variation inside the region.
Sound2Sight: Generating Visual Dynamics from Sound and Context
Learning associations across modalities is critical for robust multimodal reasoning, especially when a modality may be missing during inference. In this paper, we study this problem in the context of audio-conditioned visual synthesis – a task that is important, for example, in occlusion reasoning. Specifically, our goal is to generate future video frames and their motion dynamics conditioned on audio and a few past frames.
Remove to Improve
The workhorses of CNNs are their filters, located at different layers and tuned to different features. Their responses are combined using weights obtained via network training. Training is aimed at optimal results for the entire training data, e.g., highest average classification accuracy. In this paper, we are interested in extending the current understanding of the roles played by the filters, their mutual interactions, and their relationship to classification accuracy.
Unsupervised 3D Pose Estimation for Hierarchical Dance Video
Dance experts often view dance as a hierarchy of information, spanning low-level (raw images, image sequences), mid-level (human poses and bodypart movements), and high-level (dance genre). We propose a Hierarchical Dance Video Recognition framework (HDVR). HDVR estimates 2D pose sequences, tracks dancers, and then simultaneously estimates corresponding 3D poses and 3D-to-2D imaging parameters, without requiring ground truth for 3D poses.
Visual Scene Graphs for Audio Source Separation
A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction
Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena. Prior approaches to this task typically estimate a latent prior characterizing this stochasticity, but do not account for the predictive uncertainty of the (deep learning) model. Such approaches often derive the training signal from the mean-squared error (MSE) between the generated frame and the ground truth, which can lead to sub-optimal training, especially when the predictive uncertainty is high.
Transform Domain Methods for Single Image Super-Resolution
Super-resolution of a single image is a highly ill-posed problem since the number of high resolution pixels to be estimated far exceeds the number of low resolution pixels available. Therefore, appropriate regularization or priors play an important role in the quality of results. In this line of work, we propose a family of methods for learning transform domain priors for the single-image super-resolution problem.
Simultaneous Noise Removal and Super-Resolution of Natural Images
Our goal is to obtain a noise-free, high resolution (HR) image, from an observed, noisy, low resolution (LR) image. The conventional approach of preprocessing the image with a denoising algorithm, followed by applying a super-resolution (SR) algorithm, has an important limitation: Along with noise, some high frequency content of the image (particularly textural detail) is invariably lost during the denoising step.
Non-Frontal Camera Calibration
In this work, we propose an analytical solution to non-frontal camera calibration in a generalized pupil-centric imaging framework. The decentering distortion is explicitly modelled as a sensor rotation with respect to the lens plane. The rotation parameters are then computed analytically along with other calibration parameters. The centre of radial distortion is then computationally obtained given the analytical solution.
Compressive Sampling
Compressive sampling (CS) aims to acquire a signal or image from data which is deemed insufficient by the Nyquist/Shannon sampling theorem. Its main idea is to recover a signal from limited measurements by exploiting the prior knowledge that the signal is sparse or compressible in some domain. In this paper, we propose a CS approach using a new total-variation measure, TVL1, which enforces the sparsity and the directional continuity in the gradient domain.
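To make the idea of sparsity in the gradient domain concrete, the following minimal sketch computes the standard anisotropic total variation of a small image, i.e. the L1 norm of its horizontal and vertical differences. This is the textbook measure, not the new TVL1 measure proposed in the paper; the function name and the sample images are hypothetical.

```python
def anisotropic_tv(image):
    """Sum of absolute horizontal and vertical neighbor differences."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal difference
                total += abs(image[r][c + 1] - image[r][c])
            if r + 1 < rows:  # vertical difference
                total += abs(image[r + 1][c] - image[r][c])
    return total


# Hypothetical test images: constant, one clean edge, and noise-like.
flat = [[5, 5, 5], [5, 5, 5]]
step = [[0, 0, 9], [0, 0, 9]]
noisy = [[0, 3, 1], [2, 0, 3]]

tv_flat = anisotropic_tv(flat)
tv_step = anisotropic_tv(step)
tv_noisy = anisotropic_tv(noisy)
```

A piecewise-constant image has few nonzero gradient entries, which is the kind of prior a total-variation based CS reconstruction relies on.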
Pupil-Centric Imaging Model
In developing the new opto-geometric configurations, we have found that certain classical models and approaches cease to be adequate. For example, the long-established Gaussian model of image formation fails to adequately predict the acquired images, and the optical and geometric phenomena ignored in the traditional characterization of the most focused scene point make the traditional methods of focus analysis unacceptable.
Intermodal Loading Efficiency Analysis
Intermodal (IM) trains are typically the fastest freight trains operated in North America. The aerodynamic characteristics of many of these trains are often relatively poor resulting in high fuel consumption. However, considerable variation in fuel efficiency is possible depending on how the loads are placed on railcars in the train. Consequently, substantial potential fuel savings are possible if more attention is paid to the loading configuration of trains.
Fusion of Median and Bilateral Filtering for Range Image Upsampling
We present a new upsampling method to enhance the spatial resolution of depth images. Given a low-resolution depth image from an active depth sensor and a potentially high-resolution color image from a passive RGB camera, we formulate it as an adaptive cost aggregation problem and solve it using the bilateral filter.
Omnifocus Imaging
We discuss how to generate omnifocus images from a sequence of different focal setting images. We first show that the existing focus measures would encounter difficulty when detecting which frame is most focused for pixels in the regions between intensity edges and uniform areas. Then we propose a new focus measure that could be used to handle this problem.
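As a point of reference, the baseline pipeline that such work starts from can be sketched as follows: for every pixel, evaluate a focus measure in each frame and copy the pixel from the frame where the measure is largest. The sketch uses the absolute value of a 4-neighbor Laplacian as the focus measure, which is one of the existing measures rather than the new measure proposed here; all names and the toy frames are hypothetical.

```python
def laplacian_energy(frame, r, c):
    """Absolute 4-neighbor Laplacian at (r, c), with border replication."""
    rows, cols = len(frame), len(frame[0])

    def px(rr, cc):
        return frame[min(max(rr, 0), rows - 1)][min(max(cc, 0), cols - 1)]

    return abs(4 * px(r, c) - px(r - 1, c) - px(r + 1, c)
               - px(r, c - 1) - px(r, c + 1))


def omnifocus(frames):
    """Per pixel, keep the value from the frame with the largest focus measure."""
    rows, cols = len(frames[0]), len(frames[0][0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best = max(frames, key=lambda f: laplacian_energy(f, r, c))
            result[r][c] = best[r][c]
    return result


# Hypothetical focal stack: one defocused (flat) frame and one sharp frame.
blurry = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
sharp = [[0, 10, 0], [10, 0, 10], [0, 10, 0]]
fused = omnifocus([blurry, sharp])
```

In a uniform area next to an intensity edge, a measure of this kind responds to blur spilling over from the edge, which is the difficulty the abstract describes.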
Structure Based Optical Flow
Classical optical flow objective functions consist of a data term that enforces brightness constancy, and a spatial smoothing term that encourages smooth flow fields. Structural information from images has conventionally been used to design more robust regularizers that avoid oversmoothing motion discontinuities. In this line of work, we are looking at exploiting image structure in a more detailed manner, as compared to conventionally used gradient filters.
Learning Human Preferences For Image Sharpening
We propose an image sharpening method that automatically optimizes the perceived sharpness of an image. Image sharpness is defined in terms of the one-dimensional contrast across region boundaries. Regions are automatically extracted for all natural scales present that are themselves identified automatically. Human judgments are collected and used to learn a function that determines the best sharpening parameter values at an image location as a function of certain local image properties.
Shadow Removal Using Bilateral Filtering
In this paper, we propose a simple but effective shadow removal method using a single input image. We first derive a 2-D intrinsic image from a single RGB camera image based solely on colors, particularly chromaticity. We next present a method to recover a 3-D intrinsic image based on bilateral filtering and the 2-D intrinsic image.
Stereo Matching Using Epipolar Distance Transform
In this paper, we propose a simple but effective image transform, called the epipolar distance transform, for matching low-texture regions. It converts image intensity values to a relative location inside a planar segment along the epipolar line, such that pixels in the low-texture regions become distinguishable. We theoretically prove that the transform is affine invariant, thus the transformed images can be directly used for stereo matching.
Surface Reflectance and Normal Estimation from Photometric Stereo
In this paper, we propose a new photometric stereo method for estimating diffuse reflection and surface normal from color images. Using dichromatic reflection model, we introduce surface chromaticity as a matching invariant for photometric stereo, which serves as the foundation of the theory of this paper. An extremely simple and robust reflection components separation method is proposed based on the invariant.
Low-level multiscale video segmentation
Unsupervised video segmentation is a challenging problem because it involves a large amount of data, and image segments undergo noisy variations in color, texture and motion with time. However, there are significant redundancies that can help disambiguate the effects of noise. To exploit these redundancies and obtain the most spatio-temporally consistent video segmentation, we formulate the problem as a consistent labeling problem by exploiting higher order image structure.
Accessible Aperture for Computational Imaging
Many computational imaging applications involve manipulating the incoming light beam in the aperture and image planes. However, accessing the aperture, which conventionally stands inside the imaging lens, is still challenging. In this paper, we present an approach that allows access to the aperture plane and enables dynamic control of its transmissivity, position, and orientation.
Track Condition Inspection
North American railroads and the United States Department of Transportation (US DOT) Federal Railroad Administration (FRA) require periodic inspection of railway infrastructure to ensure safe railway operation. The primary focus of this research is the inspection of North American Class I railroad mainline and sidings, as these generally experience the highest traffic densities.
Real-time Specular Highlight Removal Using Bilateral Filtering
In this paper, we propose a simple but effective specular highlight removal method using a single input image. Our method is based on a key observation – the maximum fraction of the diffuse color component (so called maximum diffuse chromaticity in the literature) in local patches in color images changes smoothly.
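The quantity named in this observation can be illustrated with a short sketch. Maximum chromaticity is the largest color channel divided by the sum of the channels. Assuming white (or white-balanced) illumination, a specular highlight adds the same amount to every channel and therefore pulls the observed maximum chromaticity toward 1/3. The pixel values below are hypothetical, and the sketch does not implement the bilateral-filtering step of the method.

```python
def max_chromaticity(r, g, b):
    """Largest channel as a fraction of the channel sum."""
    total = r + g + b
    if total == 0:
        return 0.0
    return max(r, g, b) / total


# Hypothetical pixel: a diffuse color, then the same color plus a white highlight.
diffuse_only = max_chromaticity(120, 60, 20)
with_highlight = max_chromaticity(120 + 50, 60 + 50, 20 + 50)
```

Here the highlighted pixel has a lower maximum chromaticity than its diffuse neighbor, even though both share the same underlying diffuse color.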
Isotropy Based Clustering and Application to Image Segmentation
We present a novel scale adaptive, non-parametric approach to clustering point patterns. Clusters are detected by moving all points to their cluster cores using shift vectors. First, we propose a novel scale selection criterion based on local density isotropy which determines the neighborhoods over which the shift vectors are computed. We then construct a directed graph induced by these shift vectors.
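The idea of moving points to their cluster cores with shift vectors can be sketched with a flat-kernel mean shift in one dimension. Note the substitution: the sketch uses a single fixed bandwidth, whereas the approach above selects neighborhoods with an isotropy-based scale criterion and then builds a directed graph. The function name, bandwidth, and data are hypothetical.

```python
def shift_to_modes(points, bandwidth, iterations=50):
    """Repeatedly move each point to the mean of the data within `bandwidth`."""
    current = list(points)
    for _ in range(iterations):
        updated = []
        for p in current:
            # The shift vector points from p to the local mean of the data.
            neighbors = [q for q in points if abs(q - p) <= bandwidth]
            updated.append(sum(neighbors) / len(neighbors))
        current = updated
    return current


# Hypothetical 1D point pattern with two well-separated groups.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
modes = shift_to_modes(data, bandwidth=1.0)
rounded_modes = [round(m, 6) for m in modes]
```

Points that end at the same location are assigned to the same cluster.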
Freight Car Underbody Structural Inspection
To ensure the safe and efficient operation of the approximately 1.6 million freight cars (wagons) in the North American railroad network, the United States Department of Transportation (USDOT), Federal Railroad Administration (FRA) requires periodic inspection of railcars to detect structural damage and defects. Railcar structural underframe components, including the centre sill, sidesills, and crossbearers, are subject to fatigue cracking due to periodic and/or cyclic loading during service and other forms of damage.
Scene Classification
We use features of segmentation for semantic classification of real images. We model the image in terms of a probability density function, a Gaussian mixture model (GMM) to be specific, of its region features. This GMM is fit to the image by adapting a universal GMM which is estimated so it fits all images.
Simultaneous Estimation of Illumination Chromaticity, Correspondence and Specular Reflection
Based on a new correspondence matching invariant called Illumination Chromaticity Constancy, we present a new solution for illumination chromaticity estimation, correspondence searching and specularity removal. Using as few as two images, the core of our method is the computation of a vote distribution for a number of illumination chromaticity hypotheses via correspondence matching.
A Constant-Space Belief Propagation Algorithm for Stereo Matching
In this paper, we consider the problem of stereo matching using loopy belief propagation. Unlike previous methods which focus on the original spatial resolution, we hierarchically reduce the disparity search range. By fixing the number of disparity levels on the original resolution, our method solves the message updating problem in a time linear in the number of pixels contained in the image and requires only constant memory space.
SVM for Edge-Preserving Filtering
In this paper, we propose a new method to construct an edge-preserving filter which has very similar response to the bilateral filter. The bilateral filter is a normalized convolution in which the weighting for each pixel is determined by the spatial distance from the center pixel and its relative difference in intensity range.
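The bilateral filter described here can be written down directly. The sketch below is a brute-force 1D version with Gaussian spatial and range kernels; it illustrates the filter being approximated, not the SVM-based construction proposed in the paper. Parameter names and the test signal are hypothetical.

```python
import math


def bilateral_filter_1d(signal, radius, sigma_space, sigma_range):
    """Normalized weighted average; weights fall off with distance and intensity gap."""
    n = len(signal)
    out = []
    for i in range(n):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_space = math.exp(-((i - j) ** 2) / (2.0 * sigma_space ** 2))
            w_range = math.exp(-((signal[i] - signal[j]) ** 2)
                               / (2.0 * sigma_range ** 2))
            weight = w_space * w_range
            weighted_sum += weight * signal[j]
            weight_total += weight
        out.append(weighted_sum / weight_total)  # normalization step
    return out


# Hypothetical signal: small fluctuations on either side of a large step edge.
edge = [10.0, 11.0, 9.0, 10.0, 100.0, 101.0, 99.0, 100.0]
smoothed = bilateral_filter_1d(edge, radius=2, sigma_space=1.5, sigma_range=5.0)
```

Samples across the step receive a negligible range weight, so the fluctuations are averaged while the edge itself stays sharp.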
Hemispherical Imaging Camera
We have developed a camera which is capable of acquiring very large field of view (FOV) images at high and uniform resolution, from a single viewpoint, at video rates. The FOV can range from being nearly hemispherical, to being nearly omni-directional, barring some small scene parts being obstructed by image sensors themselves.
Texture Recognition
Given an arbitrary image, our goal is to segment all distinct texture subimages. This is done by discovering distinct, cohesive groups of spatially repeating patterns, called texels, in the image, where each group defines the corresponding texture. Texels occupy image regions, whose photometric, geometric, structural, and spatial-layout properties are samples from an unknown pdf.
Low-level multiscale image segmentation
This research theme is concerned with the problem of low level image segmentation, or partitioning an image into regions, that represent low level image structure. A region is characterized as possessing a certain degree of interior homogeneity and a contrast with the surround which is large compared to the interior variation.
Real-time O(1) Bilateral Filtering
We propose a new bilateral filtering algorithm with computational complexity invariant to filter kernel size, so-called O(1) or constant time in the literature. By showing that a bilateral filter can be decomposed into a number of constant time spatial filters, our method yields a new class of constant time bilateral filters that can have arbitrary spatial and arbitrary range kernels.
Region Based Image Matching
We propose novel approaches to region-based hierarchical image matching, where, given two images, the goal is to identify the largest part in image 1 and its match in image 2 having the maximum similarity measure defined in terms of geometric and photometric properties of regions (e.g., area, boundary shape, and color), as well as region topology (e.g.,
Segmentation of periodically moving objects
We present a new approach for the identification and segmentation of objects undergoing periodic motion. Our method uses a combination of maximum likelihood estimation of the period, and segments moving objects using correlation of image segments over an estimated period of interest. Correlation provides the best locations of the moving objects in each frame.
Connected Segmentation Tree For Object Modeling
We propose a new object representation, called connected segmentation tree (CST), which captures canonical characteristics of the object in terms of the photometric, geometric, and spatial adjacency and containment properties of its constituent image regions. CST is obtained by augmenting the object’s segmentation tree (ST) with inter-region neighbor links, in addition to their recursive embedding structure already present in ST.
Object Category Recognition
Low level segmentation based image features are used for the problem of object categorization. In general, object categorization comprises two main research areas: (1) classification or clustering of images containing objects belonging to an object category, and (2) detection, localization, and segmentation of individual object-category instances in images. The first thrust of research is typically concerned with exemplar based methods, where the main focus is to develop an efficient distance measure between two images.
Multi-Spectral Passenger Car Undercarriage Inspection
Locomotive and rolling stock condition is an important element of railway safety, reliability, and service quality. Traditionally, railroads have monitored equipment condition by conducting regular inspections. Over the past several decades, certain inspection tasks have been automated using technologies that have reduced the cost and increased the effectiveness of the inspection.
Segmentation Based Object Discovery
Given a set of images, possibly containing objects from an unknown category, determine if a category is present. If a category is present, learn a spatial and photometric model of the category. Given an unseen image, segment all occurrences of the category.
- S. Todorovic and N. Ahuja, Extracting Subimages of an Unknown Category from a Set of Images, Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, NY, Vol.
Non-Lambertian Surface Reconstruction and Reflectance Modelling
Non-Lambertian surfaces cause difficulties for many stereo systems. We describe methods to recover both 3D surface shape and reflectance models of an object from multiple views. We use an iterative method, based on multi-view shape from shading, to estimate shape and reflectance models. The estimated models can be used to generate objects in new views and under new lighting conditions using computer graphics techniques.
Safety Appliance Inspection
Before North American trains depart a terminal or rail yard, many aspects of the cars and locomotives undergo inspection, including their safety appliances. Safety appliances are handholds, ladders and other objects that serve as the interface between humans and railcars during transportation. The current inspection process is primarily visual and is labor intensive, redundant, and generally lacks “memory” of the inspection results.
An Omni-Directional Stereo Vision System Using Single Camera
We describe a new omnidirectional stereo imaging system that uses a concave lens and a convex mirror to produce a stereo pair of images on the sensor of a conventional camera. The light incident from a scene point is split and directed to the camera in two parts. One part reaches the camera directly after reflection from the convex mirror and forms a single-viewpoint omnidirectional image.
Extraction and Analysis of Multiple Periodic Motions in Video Sequences
The analysis of periodic or repetitive motions is useful in many applications, both in the natural and the man-made world. An important example is the recognition of human and animal activities. Existing methods for the analysis of periodic motions first extract motion trajectories, e.g. via correlation, or feature point matching. We present a new approach, which takes advantage of both the frequency and spatial information of the video.
Single Lens Depth Camera
A visual depth sensor composed of a single camera and a transparent plate rotating about the optical axis in front of the camera. Depth is estimated from the disparities of scene points observed in multiple images acquired through the rotating plate.
We propose a novel depth sensing imaging system composed of a single camera along with a parallel planar plate rotating about the optical axis of the camera.
Image Ensembles/ Video analysis Using Image-As-Matrix Representation
Tensor Manipulation
We explore new algorithms for computer vision based on multilinear algebra. Firstly, we learn the expression subspace and person subspace from a corpus of images based on Higher-Order Singular Value Decomposition (HOSVD), and investigate their applications in facial expression synthesis, face recognition and facial expression recognition. Secondly, we explore new algorithms for image ensembles/video representation and recognition using tensor rank-one decomposition and tensor rank-R approximation.
3D Object Modeling
Given multiple calibrated pictures of a real world object captured from different viewpoints, reconstruct a three-dimensional model of the object.
- T. Yu, N. Xu and N. Ahuja, Reconstructing a Dynamic Surface from Video Sequences Using Graph Cuts in 4D Space-Time, IEEE International Conference on Pattern Recognition, Cambridge, UK, August 2004, 245-248.
Dense Stereo Mapping Using Kernel Maximum Likelihood Estimation
A robust stereo matching algorithm using kernel representation of the probability density functions (pdf’s) of the sources that generate the stereoscopic images. Matching is done using either a Maximum Likelihood framework or using correlation in the pdf domain and an MRF prior to model the disparity function.
- A. Jagmohan, M. Singh and N.
Split Aperture Imaging
Standard imaging sensors have limited dynamic range and hence are sensitive to only a part of the illumination range present in a natural scene. The dynamic range can be improved by acquiring multiple images of the same scene under different exposure settings and then combining them. We have developed a multi-sensor camera design, called Split-Aperture Camera, to acquire registered, multiple images of a scene, at different exposure, from a single viewpoint, and at video-rate.
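The step of combining differently exposed images can be sketched for a single pixel. The sketch assumes a linear sensor response, so that each unsaturated reading divided by its exposure time estimates the same scene radiance; the thresholds, names, and sample readings are hypothetical, and the Split-Aperture Camera's actual combination procedure is not described in the text above.

```python
def merge_exposures(pixel_values, exposure_times, saturation=255, floor=0):
    """Average the radiance estimates from readings that are neither dark nor saturated."""
    estimates = [
        value / time
        for value, time in zip(pixel_values, exposure_times)
        if floor < value < saturation
    ]
    if not estimates:
        return None  # no usable reading at this pixel
    return sum(estimates) / len(estimates)


# Hypothetical readings of one scene point at three exposures (relative times).
readings = [255, 200, 50]   # the longest exposure is saturated
times = [4.0, 1.0, 0.25]
radiance = merge_exposures(readings, times)
```

Because the images are registered and share a viewpoint, this per-pixel merge can be applied without any alignment step.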
Railcar Truck Component Inspection
One machine vision system researched by the University of Illinois Urbana-Champaign (UIUC), under sponsorship of the AAR’s Technology Scanning Strategic Research Initiative, demonstrates that machine vision can be used for inspection of railcars. The UIUC prototype system inspects wheel, truck, and brake system components using automated machine vision. Machine vision-based wheel and brake shoe inspection systems are already or will soon become commercially available.
High-Resolution Double Pyramid Panoramic Cameras
Pyramid Cameras
To acquire panoramic video sequences, we have developed two types of Double-Mirror-Pyramid cameras that capture up to 360-degree fields of view at high-resolution. The first one, A Single View Double-Mirror-Pyramid Panoramic Camera, acquires a single sequence from one viewpoint, whereas the second, A Multiview Double-Mirror-Pyramid Panoramic Camera, provides multiple video sequences each taken from a different viewpoint, e.g.
Multi-View Double Mirror Pyramid Panoramic Cameras
Estimation and Segmentation of Images Using Parametric Image Models
Statistical models
Statistical models of pixel value variations have been developed and analyzed. Some of the work focuses on kernel density estimators to develop such models. Consequently, statistical theory of density estimators can be used for various tasks including segmentation of locally/globally parametric image signals; scale estimation and object registration. The main projects of this sub-theme are “Bandwidth Selection for Kernel Density Estimators” and “Estimation and Segmentation of Images Using Parametric Image Models” detailed below.
Video Encoding using Coset Codes
Video Compression using Wyner-Ziv Codes
Predictive coding is posed as a variant of the Wyner-Ziv coding, and problems in source and channel coding of video are addressed in this framework.
This project deals with scalable coding and robust Internet streaming of predictively encoded media. We frame the problem of predictive coding as a variant of the Wyner-Ziv problem in Information theory.
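The coset idea behind this framing can be shown with a scalar toy example. The encoder sends only the coset index of a sample (its value modulo M), and the decoder picks the coset member closest to its own prediction, the side information. Decoding is correct whenever the prediction is within M/2 of the true value. This is a textbook illustration under that assumption, not the project's codec; the names and numbers are hypothetical.

```python
def coset_encode(value, modulus):
    """Send only which coset the value belongs to."""
    return value % modulus


def coset_decode(coset_index, side_info, modulus):
    """Pick the member of the coset that is closest to the side information."""
    # Largest coset member that does not exceed the side information.
    below = side_info - ((side_info - coset_index) % modulus)
    above = below + modulus
    return min((below, above), key=lambda v: abs(v - side_info))


M = 16
true_value = 130
index = coset_encode(true_value, M)          # 4 bits instead of 8

good = coset_decode(index, side_info=127, modulus=M)  # prediction off by 3
bad = coset_decode(index, side_info=140, modulus=M)   # prediction off by 10
```

The encoder never needs to see the prediction, which is why a single encoded stream can serve decoders whose predictions differ.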
Compression of Image-based Rendering Data
The design of compression techniques for streaming of image-based rendering data to remote viewers. A compression algorithm based on the use of Wyner-Ziv codes is proposed, which satisfies the key constraints for IBR streaming, namely those of random access for interactivity, and pre-compression.
Facial Expression Decomposition
Bandwidth Selection for Kernel Density Estimators
Predictive Multiple Description Coding using Wyner-Ziv Codes
Two-channel predictive multiple description coding is posed as a variant of the Wyner-Ziv coding problem.
Face Detection
Faces and Gestures
The aforementioned work on representation and learning has contributed to two types of human computer interfaces we have developed. First, learning and classification techniques, including usual statistical classifiers, neural networks, support vector machines and artificial intelligence approaches, have been used to develop new methods for human face detection and hand gesture recognition.
Human Computer Interaction
The second type of human-computer interface is a free-hand-sketch based interface for image editing (e.g., moving, size-scaling, color-transforming parts of an image). The sketches drawn by the user on top of the image serve as a natural way of specifying an image part and the editing operation (e.g., move, deletion) to be performed.
Block-based motion estimation for missing video frame interpolation, and spatially scalable (multi-resolution) video coding
Video frames are often dropped during compression at very low bit rates. At the decoder, a missing frame interpolation method synthesizes the missed frames. We propose a two-step motion estimation method for the interpolation. More specifically, the coarse motion vector field is refined at the decoder using mesh-based motion estimation instead of using computationally intensive dense motion estimation.
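The coarse, block-based step can be sketched as an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). The sketch covers only this first step; the mesh-based refinement at the decoder is not shown. Function names, frame sizes, and the search range are hypothetical.

```python
def sad(prev, curr, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a shifted previous block."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(curr[top + r][left + c] - prev[top + r + dy][left + c + dx])
    return total


def best_motion_vector(prev, curr, top, left, size, search):
    """Return (dy, dx) pointing from the current block to its best match in prev."""
    rows, cols = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= rows
                      and 0 <= left + dx and left + dx + size <= cols)
            if not inside:
                continue  # candidate block would leave the frame
            cost = sad(prev, curr, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


def make_frame(top, left):
    """Hypothetical 6x6 frame containing one textured 2x2 patch."""
    frame = [[0] * 6 for _ in range(6)]
    patch = [[9, 8], [7, 6]]
    for r in range(2):
        for c in range(2):
            frame[top + r][left + c] = patch[r][c]
    return frame


prev_frame = make_frame(1, 1)
curr_frame = make_frame(2, 3)
motion = best_motion_vector(prev_frame, curr_frame, top=2, left=3, size=2, search=2)
```

For a frame dropped midway between the two, a simple interpolator would place the block at half of this displacement.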
Face Recognition
Panoramic Imaging with Infinite Dynamic Range
Most imaging sensors have a limited dynamic range and hence can satisfactorily respond to only a part of illumination levels present in a scene. This is particularly disadvantageous for omnidirectional and panoramic cameras since larger fields of view have larger brightness ranges. We propose a simple modification to existing high resolution omnidirectional/panoramic cameras in which the process of increasing the dynamic range is coupled with the process of increasing the field of view.
Transform Domain Magnification or Super-resolution
In order to apply a multi-dimensional linear transform, over an arbitrarily shaped support, the usual practice is to fill out the support to a hypercube by zero padding. This does not however yield a satisfactory definition for transforms in two or more dimensions. The problem that we tackle is: how do we redefine the transform over an arbitrary shaped region suited to a given application?
Linear Transforms over arbitrary supports
In order to apply a multidimensional linear transform over an arbitrarily shaped support, the usual practice is to fill out the support to a hypercube by zero padding. The problem that we tackle is: how do we redefine the transform over an arbitrary shaped region suited to a given application? We present a novel iterative approach to define any multidimensional linear transform over an arbitrary shape given that we know its definition over a hypercube.
Transform-Domain Watermarking
A new method for digital image watermarking which does not require the original image for watermark detection is presented. Assuming that we are using a transform domain spread spectrum watermarking scheme, it is important to add the watermark in select coefficients with significant image energy in the transform domain in order to ensure non-erasability of the watermark.
Omnifocus Nonfrontal Imaging Camera
The concept of omnifocus nonfrontal imaging camera, OMNICAM or NICAM, initiated a new chapter in imaging and digital cameras. NICAM has introduced hitherto non-existent imaging capabilities, in addition to overcoming some problems with previous methods. NICAM is capable of acquiring seamless panoramic images and range estimates of wide scenes with all objects in focus, regardless of their locations.
Learning to Recognize 3D Objects
3D Object Recognition
Recognition is achieved either by explicitly coding the recognition criteria in terms of low level structure, or through learning from examples. Learning algorithms incorporate subspace projections of higher dimensional data symbolically or using neural approaches.
A learning account for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability.
Learning for Object Recognition
A learning algorithm accounting for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability.
Structure Based Image Denoising
Multiscale structure based image representation using a set of regions
The application fields are (i) of appropriate granularity for best image compression, (ii) of appropriately rescaled size for image magnification or superresolution, and (iii) for smoothing for image quality restoration through structure-preserving denoising.
Structure Based Image Denoising
This work addresses the problem of denoising of images corrupted by AWGN.
Efficient spatio-temporal filtering for video denoising
Video Denoising
This work proposes a computationally fast scheme for denoising a video sequence. Temporal processing is done separately from spatial processing and the two are then combined to get the denoised frame. The temporal redundancy is exploited using a scalar state 1D Kalman filter. A novel way is proposed to estimate the variance of the state noise from the noisy frames.
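A minimal sketch of the temporal step, assuming a random-walk model for each pixel's intensity and known noise variances. The source estimates the state-noise variance from the noisy frames; that estimator is not described here, so `q` and `r` are simply passed in as parameters. The function names are hypothetical and the spatial processing is omitted.

```python
def kalman_denoise_pixel(samples, q, r):
    """Filter one pixel's intensity over time with a scalar Kalman filter.

    Assumed model (not taken from the source):
      state:       x[t] = x[t-1] + w,  Var(w) = q
      observation: y[t] = x[t] + v,    Var(v) = r
    """
    estimate = samples[0]  # initialise with the first noisy observation
    variance = r           # its uncertainty is the measurement noise
    filtered = [estimate]
    for y in samples[1:]:
        # Predict: the state carries over, uncertainty grows by q.
        variance += q
        # Update: blend prediction and observation using the Kalman gain.
        gain = variance / (variance + r)
        estimate += gain * (y - estimate)
        variance *= 1.0 - gain
        filtered.append(estimate)
    return filtered


def kalman_denoise_video(frames, q, r):
    """Apply the scalar filter independently at every pixel position.

    `frames` is a list of equally sized 2D lists (rows of intensities).
    """
    height, width = len(frames[0]), len(frames[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in frames]
    for i in range(height):
        for j in range(width):
            track = [frame[i][j] for frame in frames]
            for t, value in enumerate(kalman_denoise_pixel(track, q, r)):
                out[t][i][j] = value
    return out
```

Because the state is a scalar, each update costs only a few arithmetic operations per pixel per frame, which is consistent with the goal of a computationally fast scheme. A larger `q` makes the filter follow temporal changes more quickly at the cost of less smoothing.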
Multiscale structure based video compression, by estimating and coding region motion instead of pixel motion
Segmentation Based Video Coding
We develop a very low bit rate video compression algorithm using multiscale image segmentation based hierarchical motion compensation and residual coding. The proposed algorithm outperforms the H.261-like coder by 3 dB and the H.263 version 2 by 1 dB. Such gains come from the use of image segmentation and reversed motion prediction.
Learning of Low-level Spatiotemporal Structural Patterns
3D Object Recognition
Given an image or a video sequence, a prespecified set of low level, spatial and/or temporal descriptors of the image/video structure, and a higher level interpretation of the structure, use computational learning methods to derive a succinct relationship between the interpretation and the low level structural description.
Gesture Recognition
Faces and Gestures
The aforementioned work on representation and learning has contributed to two types of human computer interfaces we have developed. First, learning and classification techniques, including usual statistical classifiers, neural networks, support vector machines and artificial intelligence approaches, have been used to develop new methods for human face detection and hand gesture recognition.
Detection of photometric distribution discontinuities in video to locate shot changes
Video Shot Detection
We present a novel improvement to existing schemes for abrupt shot change detection. Existing schemes declare a shot change whenever the frame to frame histogram difference (FFD) value is above a particular threshold. In such an approach, a high value for the threshold results in a small number of false alarms and a large number of missed detections while a low value for the threshold decreases the number of missed detections at the expense of increasing the false alarms.
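The baseline thresholding scheme described above can be sketched as follows. This illustrates only the existing approach, not the proposed improvement. The bin count, the use of a sum of absolute bin differences as the FFD, and the function names are assumptions made for the example; frames are assumed to hold integer intensities in the range 0 to 255.

```python
def histogram(frame, bins=16, max_value=256):
    """Count the pixel intensities of a 2D frame into equal-width bins."""
    counts = [0] * bins
    for row in frame:
        for pixel in row:
            counts[pixel * bins // max_value] += 1
    return counts


def frame_to_frame_difference(frame_a, frame_b, bins=16):
    """Sum of absolute bin-wise histogram differences (one common FFD choice)."""
    hist_a = histogram(frame_a, bins)
    hist_b = histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def detect_shot_changes(frames, threshold, bins=16):
    """Return indices t where a cut is declared between frames t-1 and t."""
    return [
        t
        for t in range(1, len(frames))
        if frame_to_frame_difference(frames[t - 1], frames[t], bins) > threshold
    ]


# Two dark frames followed by two bright frames: one abrupt change at index 2.
dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
frames = [dark, dark, bright, bright]
low_threshold_cuts = detect_shot_changes(frames, threshold=4)
high_threshold_cuts = detect_shot_changes(frames, threshold=8)
```

In this toy sequence the FFD at the cut is 8, so the lower threshold reports the change while the higher threshold misses it, which mirrors the trade-off between false alarms and missed detections.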
3D Surfaces and Illumination from Stereo and Shading
- D. Hougen and N. Ahuja, Integration of Stereo and Shape from Shading using Color, Proc. Second International Conf. on Automation, Robotics and Computer Vision, Vol 1, Singapore, September 15-18 1992, pp. CV-6.6.1 – CV-6.6.5.
- D. Hougen and N. Ahuja, Estimation of the Light Source Distribution and its Use in Shape Recovery from Stereo and Shading, 4th Int.
Structure Based Image Magnification or Super-resolution
Multiscale structure based image representation using a set of regions
Resolution enhancement involves the problem of magnifying a small image to several times its size while avoiding blurring, ringing and other artifacts.
Structure Based Image Compression
Multiscale structure based image representation using a set of regions
Our novel reversible image compression method employs multiscale segmentation within a computationally efficient optimization framework to obtain consistently good performance over a wide variety of images.
3D Surface Orientation from Texture Gradient
3D Surface Orientation from Texture Gradient computed in a single image of a homogeneously textured surface.
In an image containing texture elements at a range of scales, detect all elements, their relative locations and mutual containment relationships.
OBJECTIVE
Given a slanted view of a planar, homogeneously textured surface, estimate the surface slant from the image texture gradient.
Hexapod Robot Project
Surfaces from Binocular Spatial Stereo
Given multiple images of a scene, taken from multiple cameras and different viewpoints, find the 3D depth map and surfaces.
- W. Hoff and N. Ahuja, Surfaces from Stereo, Proc. DARPA Image Understanding Workshop, Miami, December 9-10, 1985, 98-106.
- W. Hoff and N. Ahuja, Surfaces from Stereo, 8th International Conference on Pattern Recognition, Paris, France, October 28-31, 1986, 516-518.