Tracking Persons-of-Interest via Unsupervised Representation Adaptation

Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Existing multi-target tracking methods often use low-level features which are not sufficiently discriminative for identifying faces with such large appearance variations. Read More “Tracking Persons-of-Interest via Unsupervised Representation Adaptation”

Low-Level Multiscale Image Segmentation and a Benchmark for its Evaluation

We present a segmentation algorithm to detect low-level structure present in images. The algorithm is designed to partition a given image into regions, corresponding to image structures, regardless of their shapes, sizes, and levels of interior homogeneity. We model a region as a connected set of pixels that is surrounded by ramp edge discontinuities where the magnitude of these discontinuities is large compared to the variation inside the region. Read More “Low-Level Multiscale Image Segmentation and a Benchmark for its Evaluation”

Sound2Sight: Generating Visual Dynamics from Sound and Context

Learning associations across modalities is critical for robust multimodal reasoning, especially when a modality may be missing during inference. In this paper, we study this problem in the context of audio-conditioned visual synthesis – a task that is important, for example, in occlusion reasoning. Specifically, our goal is to generate future video frames and their motion dynamics conditioned on audio and a few past frames. Read More “Sound2Sight: Generating Visual Dynamics from Sound and Context”

Remove to Improve

The workhorses of CNNs are their filters, located at different layers and tuned to different features. Their responses are combined using weights obtained via network training. Training is aimed at optimal results for the entire training data, e.g., highest average classification accuracy. In this paper, we are interested in extending the current understanding of the roles played by the filters, their mutual interactions, and their relationship to classification accuracy. Read More “Remove to Improve”

Unsupervised 3D Pose Estimation for Hierarchical Dance Video

Dance experts often view dance as a hierarchy of information, spanning low-level (raw images, image sequences), mid-level (human poses and body part movements), and high-level (dance genre). We propose a Hierarchical Dance Video Recognition framework (HDVR). HDVR estimates 2D pose sequences, tracks dancers, and then simultaneously estimates corresponding 3D poses and 3D-to-2D imaging parameters, without requiring ground truth for 3D poses. Read More “Unsupervised 3D Pose Estimation for Hierarchical Dance Video”

Visual Scene Graphs for Audio Source Separation

State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments. These approaches often ignore the visual context of these sound sources or avoid modeling object interactions that may be useful to characterize the sources better, especially when the same object class may produce varied sounds from distinct interactions. Read More “Visual Scene Graphs for Audio Source Separation”

A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction

Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena. Prior approaches to this task typically estimate a latent prior characterizing this stochasticity, but do not account for the predictive uncertainty of the (deep learning) model. Such approaches often derive the training signal from the mean-squared error (MSE) between the generated frame and the ground truth, which can lead to sub-optimal training, especially when the predictive uncertainty is high. Read More “A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction”

Transform Domain Methods for Single Image Super-Resolution

Super-resolution of a single image is a highly ill-posed problem since the number of high resolution pixels to be estimated far exceeds the number of low resolution pixels available. Therefore, appropriate regularization or priors play an important role in the quality of results. In this line of work, we propose a family of methods for learning transform domain priors for the single-image super-resolution problem. Read More “Transform Domain Methods for Single Image Super-Resolution”

Simultaneous Noise Removal and Super-Resolution of Natural Images

Our goal is to obtain a noise-free, high resolution (HR) image, from an observed, noisy, low resolution (LR) image. The conventional approach of preprocessing the image with a denoising algorithm, followed by applying a super-resolution (SR) algorithm, has an important limitation: Along with noise, some high frequency content of the image (particularly textural detail) is invariably lost during the denoising step. Read More “Simultaneous Noise Removal and Super-Resolution of Natural Images”

Non-Frontal Camera Calibration

In this work, we propose an analytical solution to non-frontal camera calibration in a generalized pupil-centric imaging framework. The decentering distortion is explicitly modelled as a sensor rotation with respect to the lens plane. The rotation parameters are then computed analytically along with other calibration parameters. The centre of radial distortion is then computationally obtained given the analytical solution. Read More “Non-Frontal Camera Calibration”

Compressive Sampling

Compressive sampling (CS) is aimed at acquiring a signal or image from data which is deemed insufficient by the Nyquist/Shannon sampling theorem. Its main idea is to recover a signal from limited measurements by exploiting the prior knowledge that the signal is sparse or compressible in some domain. In this paper, we propose a CS approach using a new total-variation measure, TVL1, which enforces sparsity and directional continuity in the gradient domain. Read More “Compressive Sampling”
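To make the idea of sparsity in the gradient domain concrete, here is a minimal sketch of the standard anisotropic total variation of a small image, i.e. the L1 norm of its forward differences. This is the textbook measure, not the TVL1 measure proposed in the paper, whose definition is not given in this excerpt; the function name and the sample images are illustrative assumptions.

```python
def total_variation_l1(image):
    """Anisotropic total variation: sum of absolute forward differences.

    `image` is a list of equal-length rows of numbers. This is the
    standard textbook definition, used only as an illustration.
    """
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal difference
                tv += abs(image[r][c + 1] - image[r][c])
            if r + 1 < rows:  # vertical difference
                tv += abs(image[r + 1][c] - image[r][c])
    return tv


# Hypothetical test images: flat, one clean edge, and unstructured noise.
flat = [[1, 1, 1], [1, 1, 1]]
step = [[0, 0, 5], [0, 0, 5]]
noisy = [[0, 3, 1], [4, 0, 5]]

for name, img in (("flat", flat), ("step", step), ("noisy", noisy)):
    print(name, total_variation_l1(img))
```

A piecewise-constant image with a single edge has few non-zero gradient entries and hence a lower total variation than an unstructured one of similar range, which is the kind of prior a TV-regularized recovery relies on.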

Pupil-Centric Imaging Model

In developing the new opto-geometric configurations, we have found that certain classical models and approaches cease to be adequate. For example, the long-established Gaussian model of image formation fails to adequately predict the acquired images, and the optical and geometric phenomena ignored in the traditional characterization of the most focused scene point make the traditional methods of focus analysis unacceptable. Read More “Pupil-Centric Imaging Model”

Intermodal Loading Efficiency Analysis

Intermodal (IM) trains are typically the fastest freight trains operated in North America. The aerodynamic characteristics of many of these trains are often relatively poor resulting in high fuel consumption. However, considerable variation in fuel efficiency is possible depending on how the loads are placed on railcars in the train. Consequently, substantial potential fuel savings are possible if more attention is paid to the loading configuration of trains. Read More “Intermodal Loading Efficiency Analysis”

Fusion of Median and Bilateral Filtering for Range Image Upsampling

We present a new upsampling method to enhance the spatial resolution of depth images. Given a low-resolution depth image from an active depth sensor and a potentially high-resolution color image from a passive RGB camera, we formulate it as an adaptive cost aggregation problem and solve it using the bilateral filter. Read More “Fusion of Median and Bilateral Filtering for Range Image Upsampling”

Structure Based Optical Flow

Classical optical flow objective functions consist of a data term that enforces brightness constancy, and a spatial smoothing term that encourages smooth flow fields. Structural information from images has conventionally been used for designing more robust regularizers, to prevent oversmoothing motion discontinuities. In this line of work, we are looking at exploiting image structure in a more detailed manner, as compared to conventionally used gradient filters. Read More “Structure Based Optical Flow”
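For reference, the classical objective described above is often written in the Horn-Schunck form below. The symbols are assumptions introduced here, not notation from the project: (u, v) is the flow field, I_x, I_y, I_t are the spatial and temporal image derivatives, Omega is the image domain, and lambda weights the smoothness term. The first integral is the linearized brightness-constancy data term and the second is the spatial smoothing term.

```latex
E(u, v) = \int_{\Omega} \left( I_x u + I_y v + I_t \right)^2 \, dx \, dy
        + \lambda \int_{\Omega} \left( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \right) dx \, dy
```

The quadratic smoothness penalty is what tends to blur motion discontinuities, which is why more robust, structure-aware regularizers are of interest.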

Learning Human Preferences For Image Sharpening

We propose an image sharpening method that automatically optimizes the perceived sharpness of an image. Image sharpness is defined in terms of the one-dimensional contrast across region boundaries. Regions are automatically extracted for all natural scales present that are themselves identified automatically. Human judgments are collected and used to learn a function that determines the best sharpening parameter values at an image location as a function of certain local image properties. Read More “Learning Human Preferences For Image Sharpening”

Shadow Removal Using Bilateral Filtering

In this paper, we propose a simple but effective shadow removal method using a single input image. We first derive a 2-D intrinsic image from a single RGB camera image based solely on colors, particularly chromaticity. We next present a method to recover a 3-D intrinsic image based on bilateral filtering and the 2-D intrinsic image. Read More “Shadow Removal Using Bilateral Filtering”

Stereo Matching Using Epipolar Distance Transform

In this paper, we propose a simple but effective image transform, called the epipolar distance transform, for matching low-texture regions. It converts image intensity values to a relative location inside a planar segment along the epipolar line, such that pixels in the low-texture regions become distinguishable. We theoretically prove that the transform is affine invariant, thus the transformed images can be directly used for stereo matching. Read More “Stereo Matching Using Epipolar Distance Transform”

Surface Reflectance and Normal Estimation from Photometric Stereo

In this paper, we propose a new photometric stereo method for estimating diffuse reflection and surface normal from color images. Using the dichromatic reflection model, we introduce surface chromaticity as a matching invariant for photometric stereo, which serves as the foundation of the theory of this paper. An extremely simple and robust reflection components separation method is proposed based on the invariant. Read More “Surface Reflectance and Normal Estimation from Photometric Stereo”

Low-level multiscale video segmentation

Unsupervised video segmentation is a challenging problem because it involves a large amount of data, and image segments undergo noisy variations in color, texture and motion with time. However, there are significant redundancies that can help disambiguate the effects of noise. To exploit these redundancies and obtain the most spatio-temporally consistent video segmentation, we formulate the problem as a consistent labeling problem by exploiting higher order image structure. Read More “Low-level multiscale video segmentation”

Accessible Aperture for Computational Imaging

Many computational imaging applications involve manipulating the incoming light beam in the aperture and image planes. However, accessing the aperture, which conventionally stands inside the imaging lens, is still challenging. In this paper, we present an approach that allows access to the aperture plane and enables dynamic control of its transmissivity, position, and orientation. Read More “Accessible Aperture for Computational Imaging”

Track Condition Inspection

North American railroads and the United States Department of Transportation (US DOT) Federal Railroad Administration (FRA) require periodic inspection of railway infrastructure to ensure safe railway operation. The primary focus of this research is the inspection of North American Class I railroad mainline and sidings, as these generally experience the highest traffic densities. Read More “Track Condition Inspection”

Real-time Specular Highlight Removal Using Bilateral Filtering

In this paper, we propose a simple but effective specular highlight removal method using a single input image. Our method is based on a key observation – the maximum fraction of the diffuse color component (so-called maximum diffuse chromaticity in the literature) in local patches in color images changes smoothly. Read More “Real-time Specular Highlight Removal Using Bilateral Filtering”

Isotropy Based Clustering and Application to Image Segmentation

We present a novel scale adaptive, non-parametric approach to clustering point patterns. Clusters are detected by moving all points to their cluster cores using shift vectors. First, we propose a novel scale selection criterion based on local density isotropy which determines the neighborhoods over which the shift vectors are computed. We then construct a directed graph induced by these shift vectors. Read More “Isotropy Based Clustering and Application to Image Segmentation”

Freight Car Underbody Structural Inspection

To ensure the safe and efficient operation of the approximately 1.6 million freight cars (wagons) in the North American railroad network, the United States Department of Transportation (USDOT), Federal Railroad Administration (FRA) requires periodic inspection of railcars to detect structural damage and defects. Railcar structural underframe components, including the centre sill, sidesills, and crossbearers, are subject to fatigue cracking due to periodic and/or cyclic loading during service and other forms of damage. Read More “Freight Car Underbody Structural Inspection”

Simultaneous Estimation of Illumination Chromaticity, Correspondence and Specular Reflection

Based on a new correspondence matching invariant called Illumination Chromaticity Constancy, we present a new solution for illumination chromaticity estimation, correspondence searching and specularity removal. Using as few as two images, the core of our method is the computation of a vote distribution for a number of illumination chromaticity hypotheses via correspondence matching. Read More “Simultaneous Estimation of Illumination Chromaticity, Correspondence and Specular Reflection”

A Constant-Space Belief Propagation Algorithm for Stereo Matching

In this paper, we consider the problem of stereo matching using loopy belief propagation. Unlike previous methods which focus on the original spatial resolution, we hierarchically reduce the disparity search range. By fixing the number of disparity levels on the original resolution, our method solves the message updating problem in a time linear in the number of pixels contained in the image and requires only constant memory space. Read More “A Constant-Space Belief Propagation Algorithm for Stereo Matching”

Texture Recognition

Given an arbitrary image, our goal is to segment all distinct texture subimages. This is done by discovering distinct, cohesive groups of spatially repeating patterns, called texels, in the image, where each group defines the corresponding texture. Texels occupy image regions, whose photometric, geometric, structural, and spatial-layout properties are samples from an unknown pdf. Read More “Texture Recognition”

Low-level multiscale image segmentation

This research theme is concerned with the problem of low level image segmentation, or partitioning an image into regions, that represent low level image structure. A region is characterized as possessing a certain degree of interior homogeneity and a contrast with the surround which is large compared to the interior variation. Read More “Low-level multiscale image segmentation”

Real-time O(1) Bilateral Filtering

We propose a new bilateral filtering algorithm with computational complexity invariant to filter kernel size, so-called O(1) or constant time in the literature. By showing that a bilateral filter can be decomposed into a number of constant time spatial filters, our method yields a new class of constant time bilateral filters that can have arbitrary spatial and arbitrary range kernels. Read More “Real-time O(1) Bilateral Filtering”
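As a point of comparison, the sketch below implements the brute-force definition of a bilateral filter on a 1D signal, with Gaussian spatial and range kernels. It is not the constant-time algorithm described above: its cost per sample grows with the kernel radius, which is exactly the dependence the O(1) method removes. The function name, parameters, and sample signal are illustrative assumptions.

```python
import math


def bilateral_filter_1d(signal, radius, sigma_s, sigma_r):
    """Brute-force bilateral filter on a 1D list of numbers.

    Each output sample is a weighted mean of its neighbours, where the
    weight is the product of a spatial Gaussian (distance in index) and
    a range Gaussian (difference in value). Cost is O(radius) per sample.
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_s = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            w_r = math.exp(-((signal[i] - signal[j]) ** 2) / (2.0 * sigma_r ** 2))
            num += w_s * w_r * signal[j]
            den += w_s * w_r
        out.append(num / den)  # den >= 1 because the j == i term has weight 1
    return out


# Hypothetical step edge: the range kernel keeps the two sides from mixing.
edge = [0.0, 0.0, 0.0, 100.0, 100.0, 100.0]
smoothed = bilateral_filter_1d(edge, radius=2, sigma_s=2.0, sigma_r=10.0)
print(smoothed)
```

Because samples across the edge differ by far more than sigma_r, their range weights are negligible and the edge survives the smoothing.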

Segmentation of periodically moving objects

We present a new approach for the identification and segmentation of objects undergoing periodic motion. Our method uses a combination of maximum likelihood estimation of the period, and segments moving objects using correlation of image segments over an estimated period of interest. Correlation provides the best locations of the moving objects in each frame. Read More “Segmentation of periodically moving objects”

Connected Segmentation Tree For Object Modeling

We propose a new object representation, called connected segmentation tree (CST), which captures canonical characteristics of the object in terms of the photometric, geometric, and spatial adjacency and containment properties of its constituent image regions. CST is obtained by augmenting the object’s segmentation tree (ST) with inter-region neighbor links, in addition to their recursive embedding structure already present in ST. Read More “Connected Segmentation Tree For Object Modeling”

Object Category Recognition

Low level segmentation based image features are used for the problem of object categorization. In general, object categorization comprises two main research areas: (1) classification or clustering of images containing objects belonging to an object category, and (2) detection, localization, and segmentation of individual object-category instances in images. The first thrust of research is typically concerned with exemplar based methods, where the main focus is to develop an efficient distance measure between two images. Read More “Object Category Recognition”

Multi-Spectral Passenger Car Undercarriage Inspection

Locomotive and rolling stock condition is an important element of railway safety, reliability, and service quality. Traditionally, railroads have monitored equipment condition by conducting regular inspections. Over the past several decades, certain inspection tasks have been automated using technologies that have reduced the cost and increased the effectiveness of the inspection. Read More “Multi-Spectral Passenger Car Undercarriage Inspection”

Segmentation Based Object Discovery

Given a set of images, possibly containing objects from an unknown category, determine if a category is present. If a category is present, learn a spatial and photometric model of the category. Given an unseen image, segment all occurrences of the category.

Read More “Segmentation Based Object Discovery”

Non-Lambertian Surface Reconstruction and Reflectance Modelling

Non-Lambertian surfaces cause difficulties for many stereo systems. We describe methods to recover both 3D surface shape and reflectance models of an object from multiple views. We use an iterative method, based on multi-view shape from shading, to estimate shape and reflectance models. The estimated models can be used to generate objects in new views and under new lighting conditions using computer graphics techniques. Read More “Non-Lambertian Surface Reconstruction and Reflectance Modelling”

Safety Appliance Inspection

Before North American trains depart a terminal or rail yard, many aspects of the cars and locomotives undergo inspection, including their safety appliances. Safety appliances are handholds, ladders and other objects that serve as the interface between humans and railcars during transportation. The current inspection process is primarily visual and is labor intensive, redundant, and generally lacks “memory” of the inspection results. Read More “Safety Appliance Inspection”

An Omni-Directional Stereo Vision System Using Single Camera

We describe a new omnidirectional stereo imaging system that uses a concave lens and a convex mirror to produce a stereo pair of images on the sensor of a conventional camera. The light incident from a scene point is split and directed to the camera in two parts. One part reaches the camera directly after reflection from the convex mirror and forms a single-viewpoint omnidirectional image. Read More “An Omni-Directional Stereo Vision System Using Single Camera”

Extraction and Analysis of Multiple Periodic Motions in Video Sequences

The analysis of periodic or repetitive motions is useful in many applications, both in the natural and the man-made world. An important example is the recognition of human and animal activities. Existing methods for the analysis of periodic motions first extract motion trajectories, e.g. via correlation, or feature point matching. We present a new approach, which takes advantage of both the frequency and spatial information of the video. Read More “Extraction and Analysis of Multiple Periodic Motions in Video Sequences”

Single Lens Depth Camera

A visual depth sensor composed of a single camera and a transparent plate rotating about the optical axis in front of the camera. Depth is estimated from the disparities of scene points observed in multiple images acquired viewing through the rotating plate.

We propose a novel depth sensing imaging system composed of a single camera along with a parallel planar plate rotating about the optical axis of the camera. Read More “Single Lens Depth Camera”

Image Ensembles/ Video analysis Using Image-As-Matrix Representation

Tensor Manipulation

We explore new algorithms for computer vision based on multilinear algebra. Firstly, we learn the expression subspace and person subspace from a corpus of images based on Higher-Order Singular Value Decomposition (HOSVD), and investigate their applications in facial expression synthesis, face recognition and facial expression recognition. Secondly, we explore new algorithms for image ensembles/video representation and recognition using tensor rank-one decomposition and tensor rank-R approximation. Read More “Image Ensembles/ Video analysis Using Image-As-Matrix Representation”

3D Object Modeling

Given multiple calibrated pictures of a real world object captured from different viewpoints, reconstruct a three-dimensional model of the object.

Read More “3D Object Modeling”

Dense Stereo Mapping Using Kernel Maximum Likelihood Estimation

A robust stereo matching algorithm using kernel representation of the probability density functions (pdf’s) of the sources that generate the stereoscopic images. Matching is done using either a Maximum Likelihood framework or using correlation in the pdf domain and an MRF prior to model the disparity function.

  • A. Jagmohan, M. Singh and N.
Read More “Dense Stereo Mapping Using Kernel Maximum Likelihood Estimation”

Split Aperture Imaging

Standard imaging sensors have limited dynamic range and hence are sensitive to only a part of the illumination range present in a natural scene. The dynamic range can be improved by acquiring multiple images of the same scene under different exposure settings and then combining them. We have developed a multi-sensor camera design, called Split-Aperture Camera, to acquire registered, multiple images of a scene, at different exposure, from a single viewpoint, and at video-rate. Read More “Split Aperture Imaging”
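The combining step mentioned above can be sketched for a single pixel as follows. This is a generic illustration of merging differently exposed measurements, assuming a linear sensor response and 8-bit values; it is not the Split-Aperture Camera pipeline, and the function name, thresholds, and sample values are hypothetical.

```python
def merge_exposures(pixel_values, exposure_times, low=5, high=250):
    """Estimate relative scene radiance at one pixel from several exposures.

    Assumes a linear sensor, so radiance is proportional to value / time.
    Measurements outside [low, high] are treated as under- or
    over-exposed and ignored. Returns None if no measurement is usable.
    """
    total = 0.0
    count = 0
    for value, time in zip(pixel_values, exposure_times):
        if low <= value <= high:
            total += value / time
            count += 1
    if count == 0:
        return None
    return total / count


# Hypothetical pixel seen at three exposures; the longest one saturates.
radiance = merge_exposures([255, 200, 50], [4.0, 1.0, 0.25])
print(radiance)
```

The saturated reading at the longest exposure is discarded, while the two shorter exposures agree on the same radiance estimate, so the merged value covers a range that no single exposure could.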

Railcar Truck Component Inspection

One machine vision system researched by the University of Illinois Urbana-Champaign (UIUC), under sponsorship of the AAR’s Technology Scanning Strategic Research Initiative, demonstrates that machine vision can be used for inspection of railcars. The UIUC prototype system inspects wheel, truck, and brake system components using automated, machine vision-based methods. Machine vision-based wheel and brake shoe inspection systems are already, or will soon become, commercially available. Read More “Railcar Truck Component Inspection”

High-Resolution Double Pyramid Panoramic Cameras

Pyramid Cameras

To acquire panoramic video sequences, we have developed two types of Double-Mirror-Pyramid cameras that capture up to 360-degree fields of view at high-resolution. The first one, A Single View Double-Mirror-Pyramid Panoramic Camera, acquires a single sequence from one viewpoint, whereas the second, A Multiview Double-Mirror-Pyramid Panoramic Camera, provides multiple video sequences each taken from a different viewpoint, e.g. Read More “High-Resolution Double Pyramid Panoramic Cameras”

Multi-View Double Mirror Pyramid Panoramic Cameras

Pyramid Cameras

To acquire panoramic video sequences, we have developed two types of Double-Mirror-Pyramid cameras that capture up to 360-degree fields of view at high-resolution. The first one, A Single View Double-Mirror-Pyramid Panoramic Camera, acquires a single sequence from one viewpoint, whereas the second, A Multiview Double-Mirror-Pyramid Panoramic Camera, provides multiple video sequences each taken from a different viewpoint, e.g. Read More “Multi-View Double Mirror Pyramid Panoramic Cameras”

Estimation and Segmentation of Images Using Parametric Image Models

Statistical models

Statistical models of pixel value variations have been developed and analyzed. Some of the work focuses on kernel density estimators to develop such models. Consequently, statistical theory of density estimators can be used for various tasks including segmentation of locally/globally parametric image signals; scale estimation and object registration. The main projects of this sub-theme are “Bandwidth Selection for Kernel Density Estimators” and “Estimation and Segmentation of Images Using Parametric Image Models” detailed below. Read More “Estimation and Segmentation of Images Using Parametric Image Models”
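To illustrate why bandwidth selection matters for kernel density estimators, here is a minimal sketch of a 1D Gaussian kernel density estimate evaluated with a narrow and a wide bandwidth. It shows only the textbook estimator; the bandwidth selection method developed in the project is not described in this excerpt, and the sample pixel values are made up for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated density, using Gaussian kernels.

    Standard estimator: the average of Gaussians of width `bandwidth`
    centred on each sample.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical pixel values drawn from a dark region and a bright region.
pixels = [10, 12, 11, 200, 202, 201]
narrow = gaussian_kde(pixels, bandwidth=2.0)
wide = gaussian_kde(pixels, bandwidth=100.0)

print(narrow(11), narrow(105))  # two separate modes, near-zero in between
print(wide(11), wide(105))      # modes merged into one broad bump
```

With the narrow bandwidth the estimate keeps the two intensity populations apart, which is what a segmentation would need; with the wide bandwidth the density between them exceeds the density at either population, so the distinction is lost.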

Video Encoding using Coset Codes

Video Compression using Wyner-Ziv Codes

Predictive coding is posed as a variant of the Wyner-Ziv coding, and problems in source and channel coding of video are addressed in this framework.

Video Encoding using Coset Codes

This project deals with scalable coding and robust Internet streaming of predictively encoded media. We frame the problem of predictive coding as a variant of the Wyner-Ziv problem in Information theory. Read More “Video Encoding using Coset Codes”

Compression of Image-based Rendering Data

Video Compression using Wyner-Ziv Codes

Predictive coding is posed as a variant of the Wyner-Ziv coding, and problems in source and channel coding of video are addressed in this framework.

Compression of Image-based Rendering Data

The design of compression techniques for streaming of image-based rendering data to remote viewers. A compression algorithm based on the use of Wyner-Ziv codes is proposed, which satisfies the key constraints for IBR streaming, namely those of random access for interactivity, and pre-compression. Read More “Compression of Image-based Rendering Data”

Facial Expression Decomposition

Tensor Manipulation

We explore new algorithms for computer vision based on multilinear algebra. Firstly, we learn the expression subspace and person subspace from a corpus of images based on Higher-Order Singular Value Decomposition (HOSVD), and investigate their applications in facial expression synthesis, face recognition and facial expression recognition. Secondly, we explore new algorithms for image ensembles/video representation and recognition using tensor rank-one decomposition and tensor rank-R approximation. Read More “Facial Expression Decomposition”

Bandwidth Selection for Kernel Density Estimators

Statistical models

Statistical models of pixel value variations have been developed and analyzed. Some of the work focuses on kernel density estimators to develop such models. Consequently, statistical theory of density estimators can be used for various tasks including segmentation of locally/globally parametric image signals; scale estimation and object registration. The main projects of this sub-theme are “Bandwidth Selection for Kernel Density Estimators” and “Estimation and Segmentation of Images Using Parametric Image Models” detailed below. Read More “Bandwidth Selection for Kernel Density Estimators”

Predictive Multiple Description Coding using Wyner-Ziv Codes

Video Compression using Wyner-Ziv Codes

Predictive coding is posed as a variant of the Wyner-Ziv coding, and problems in source and channel coding of video are addressed in this framework.

Predictive Multiple Description Coding using Wyner-Ziv Codes

Two-channel predictive multiple description coding is posed as a variant of the Wyner-Ziv coding problem. Read More “Predictive Multiple Description Coding using Wyner-Ziv Codes”

Face Detection

Faces and Gestures

The aforementioned work on representation and learning has contributed to two types of human computer interfaces we have developed. First, learning and classification techniques, including usual statistical classifiers, neural networks, support vector machines and artificial intelligence approaches, have been used to develop new methods for human face detection and hand gesture recognition. Read More “Face Detection”

Block-based motion estimation for missing video frame interpolation, and spatially scalable (multi-resolution) video coding

Video frames are often dropped during compression at very low bit rates. At the decoder, a missing frame interpolation method synthesizes the missed frames. We propose a two-step motion estimation method for the interpolation. More specifically, the coarse motion vector field is refined at the decoder using mesh-based motion estimation instead of using computationally intensive dense motion estimation. Read More “Block-based motion estimation for missing video frame interpolation, and spatially scalable (multi-resolution) video coding”

Face Recognition

Faces and Gestures

The aforementioned work on representation and learning has contributed to two types of human computer interfaces we have developed. First, learning and classification techniques, including usual statistical classifiers, neural networks, support vector machines and artificial intelligence approaches, have been used to develop new methods for human face detection and hand gesture recognition. Read More “Face Recognition”

Panoramic Imaging with Infinite Dynamic Range

Most imaging sensors have a limited dynamic range and hence can satisfactorily respond to only a part of the illumination levels present in a scene. This is particularly disadvantageous for omnidirectional and panoramic cameras since larger fields of view have larger brightness ranges. We propose a simple modification to existing high resolution omnidirectional/panoramic cameras in which the process of increasing the dynamic range is coupled with the process of increasing the field of view. Read More “Panoramic Imaging with Infinite Dynamic Range”

Transform Domain Magnification or Super-resolution

In order to apply a multi-dimensional linear transform, over an arbitrarily shaped support, the usual practice is to fill out the support to a hypercube by zero padding. This does not however yield a satisfactory definition for transforms in two or more dimensions. The problem that we tackle is: how do we redefine the transform over an arbitrary shaped region suited to a given application? Read More “Transform Domain Magnification or Super-resolution”

Linear Transforms over arbitrary supports

In order to apply a multidimensional linear transform over an arbitrarily shaped support, the usual practice is to fill out the support to a hypercube by zero padding. The problem that we tackle is: how do we redefine the transform over an arbitrary shaped region suited to a given application? We present a novel iterative approach to define any multidimensional linear transform over an arbitrary shape given that we know its definition over a hypercube. Read More “Linear Transforms over arbitrary supports”

Transform-Domain Watermarking

A new method for digital image watermarking which does not require the original image for watermark detection is presented. Assuming that we are using a transform domain spread spectrum watermarking scheme, it is important to add the watermark in select coefficients with significant image energy in the transform domain in order to ensure non-erasability of the watermark. Read More “Transform-Domain Watermarking”

Omnifocus Nonfrontal Imaging Camera

The concept of the omnifocus nonfrontal imaging camera, OMNICAM or NICAM, initiated a new chapter in imaging and digital cameras. NICAM has introduced hitherto non-existent imaging capabilities, in addition to overcoming some problems with previous methods. NICAM is capable of acquiring seamless panoramic images and range estimates of wide scenes with all objects in focus, regardless of their locations. Read More “Omnifocus Nonfrontal Imaging Camera”

Learning to Recognize 3D Objects

3D Object Recognition

Recognition is achieved either by explicitly coding the recognition criteria in terms of low level structure, or through learning from examples. Learning algorithms incorporate subspace projections of higher dimensional data symbolically or using neural approaches.

Learning to Recognize 3D Objects

A learning account for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability. Read More “Learning to Recognize 3D Objects”

Learning for Object Recognition

3D Object Recognition

Learning for Object Recognition

A learning algorithm accounting for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability. Read More “Learning for Object Recognition”

Structure Based Image Denoising

Multiscale structure based image representation using a set of regions

The application fields are (i) of appropriate granularity for best image compression, (ii) of appropriately rescaled size for image magnification or superresolution, and (iii) for smoothing for image quality restoration through structure-preserving denoising.

Structure Based Image Denoising

This work addresses the problem of denoising of images corrupted by AWGN. Read More “Structure Based Image Denoising”

Efficient spatio-temporal filtering for video denoising

Video Denoising

This work proposes a computationally fast scheme for denoising a video sequence. Temporal processing is done separately from spatial processing and the two are then combined to get the denoised frame. The temporal redundancy is exploited using a scalar state 1D Kalman filter. A novel way is proposed to estimate the variance of the state noise from the noisy frames. Read More “Efficient spatio-temporal filtering for video denoising”
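
The temporal step can be illustrated with a per-pixel scalar Kalman filter. This is a minimal sketch under stated assumptions: each pixel is modelled as a static value observed in noise, frames are flat lists of intensities, and the state-noise variance `state_var` and measurement-noise variance `meas_var` are simply passed in. The estimation of the state-noise variance from the noisy frames, the spatial processing, and the combination of the two are not reproduced here.

```python
def kalman_temporal_denoise(frames, state_var, meas_var):
    """Filter every pixel independently along time with a scalar Kalman filter.

    frames:    list of frames, each a flat list of pixel intensities.
    state_var: variance of the state noise (assumed known in this sketch).
    meas_var:  variance of the observation noise (assumed known).
    Returns one filtered frame per input frame.
    """
    estimate = [float(v) for v in frames[0]]   # start from the first noisy frame
    error_var = [meas_var] * len(estimate)
    output = [list(estimate)]
    for frame in frames[1:]:
        for i, z in enumerate(frame):
            predicted_var = error_var[i] + state_var         # predict step
            gain = predicted_var / (predicted_var + meas_var)
            estimate[i] += gain * (z - estimate[i])           # update step
            error_var[i] = (1.0 - gain) * predicted_var
        output.append(list(estimate))
    return output

# One pixel whose true value is 10, observed with alternating +/-1 noise.
noisy = [[11.0], [9.0], [11.0], [9.0]]
filtered = kalman_temporal_denoise(noisy, state_var=0.0, meas_var=1.0)
```

With `state_var` set to zero the filter reduces to a running average; a larger value lets the estimate follow genuine temporal changes more quickly at the cost of less smoothing.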

Multiscale structure based video compression, by estimating and coding region motion instead of pixel motion

Segmentation Based Video Coding

We develop a very low bit rate video compression algorithm using hierarchical motion compensation and residual coding based on multiscale image segmentation. The proposed algorithm outperforms the H.261-like coder by 3 dB and the H.263 version 2 by 1 dB. Such gains come from the use of image segmentation and reversed motion prediction. Read More “Multiscale structure based video compression, by estimating and coding region motion instead of pixel motion”

Learning of Low-level Spatiotemporal Structural Patterns

3D Object Recognition

Learning of Low-level Spatiotemporal Structural Patterns

Given an image or a video sequence, a prespecified set of low level, spatial and/or temporal descriptors of the image/video structure, and a higher level interpretation of the structure, use computational learning methods to derive a succinct relationship between the interpretation and the low level structural description. Read More “Learning of Low-level Spatiotemporal Structural Patterns”

Gesture Recognition

Faces and Gestures

The aforementioned work on representation and learning has contributed to two types of human computer interfaces we have developed. First, learning and classification techniques, including usual statistical classifiers, neural networks, support vector machines and artificial intelligence approaches, have been used to develop new methods for human face detection and hand gesture recognition. Read More “Gesture Recognition”

Detection of photometric distribution discontinuities in video to locate shot changes

Video Shot Detection

We present a novel improvement to existing schemes for abrupt shot change detection. Existing schemes declare a shot change whenever the frame to frame histogram difference (FFD) value is above a particular threshold. In such an approach, a high value for the threshold results in a small number of false alarms and a large number of missed detections while a low value for the threshold decreases the number of missed detections at the expense of increasing the false alarms. Read More “Detection of photometric distribution discontinuities in video to locate shot changes”
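
The baseline thresholding scheme described above (not the proposed improvement) can be sketched as follows. The sketch assumes greyscale frames given as flat lists of integer intensities in 0..255; the bin count, the normalisation to [0, 1] and the function names are choices made for this example.

```python
def histogram(frame, bins=16, max_value=256):
    """Count the pixel intensities of a flat greyscale frame in equal-width bins."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // max_value, bins - 1)] += 1
    return counts

def frame_to_frame_difference(prev_frame, next_frame, bins=16):
    """Sum of absolute bin-wise histogram differences, normalised to [0, 1]."""
    h1, h2 = histogram(prev_frame, bins), histogram(next_frame, bins)
    total = sum(abs(a - b) for a, b in zip(h1, h2))
    return total / (len(prev_frame) + len(next_frame))

def detect_shot_changes(frames, threshold):
    """Return every index i where the FFD between frames i-1 and i exceeds threshold."""
    return [i for i in range(1, len(frames))
            if frame_to_frame_difference(frames[i - 1], frames[i]) > threshold]

dark, bright = [10] * 64, [200] * 64
cuts = detect_shot_changes([dark, dark, bright, bright], threshold=0.5)
```

Raising `threshold` can only shrink the list of declared shot changes and lowering it can only grow it, which is the trade-off between missed detections and false alarms that a single fixed threshold cannot avoid.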

3D Surfaces and Illumination from Stereo and Shading

Read More “3D Surfaces and Illumination from Stereo and Shading”

Structure Based Image Magnification or Super-resolution

Multiscale structure based image representation using a set of regions

Structure Based Image Magnification or Super-resolution

Resolution enhancement involves the problem of magnifying a small image to several times its size while avoiding blurring, ringing and other artifacts. Read More “Structure Based Image Magnification or Super-resolution”

Structure Based Image Compression

Multiscale structure based image representation using a set of regions

Structure Based Image Compression

Our novel reversible image compression method employs multiscale segmentation within a computationally efficient optimization framework to obtain consistently good performance over a wide variety of images. Read More “Structure Based Image Compression”

3D Surface Orientation from Texture Gradient

3D Surface Orientation from Texture Gradient computed in a single image of a homogeneously textured surface.

In an image containing texture elements at a range of scales, detect all elements, their relative locations and mutual containment relationships.

OBJECTIVE
Given a slanted view of a planar, homogeneously textured surface, estimate the surface slant from the image texture gradient. Read More “3D Surface Orientation from Texture Gradient”

Surfaces from Binocular Spatial Stereo

Given multiple images of a scene, taken by multiple cameras from different viewpoints, find the 3D depth map and surfaces.

  • W. Hoff and N. Ahuja, Surfaces from Stereo, Proc. DARPA Image Understanding Workshop, Miami, December 9-10, 1985, 98-106.
  • W. Hoff and N. Ahuja, Surfaces from Stereo, 8th International Conference on Pattern Recognition, Paris, France, October 28-31, 1986, 516-518.
Read More “Surfaces from Binocular Spatial Stereo”