Multiscale structure based video compression, by estimating and coding region motion instead of pixel motion

Segmentation Based Video Coding

We develop a very low bit-rate video compression algorithm that uses hierarchical motion compensation based on multiscale image segmentation, together with residual coding. The proposed algorithm outperforms an H.261-like coder by 3 dB and H.263 version 2 by 1 dB. These gains come from the use of image segmentation and reversed motion prediction. The proposed region-based reversed motion compensation strategy regulates the size and number of regions used by pruning a multiscale segmentation of the video frames. Since the regions used for motion compensation are obtained by segmenting the previously decoded frame, the shapes of the regions need not be transmitted to the decoder. Furthermore, the hierarchical motion compensation strategy involves two stages: it first estimates a coarse, region-level motion field and then refines it into a dense motion field that provides pixel-level motion vectors. The refinement procedure does not require any additional information to be transmitted. We also develop a residual coding technique for coding the displaced frame difference after segmentation-based motion compensation. It exploits the fact that the energy of the residual resulting from motion compensation is concentrated in positions that can be predicted a priori. This residual coding technique can also be extended to improve the performance of coders that use a block-based motion compensation strategy.
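As a rough illustration of the region-level stage described above, the following Python sketch estimates one motion vector per region of the previously decoded frame and then expands those vectors into an initial per-pixel field. The exhaustive sum-of-absolute-differences (SAD) search, the border clamping, the function names, and the `search` parameter are all assumptions made for illustration; the segmentation pruning and the refinement procedure of the actual coder are not reproduced here.

```python
def region_motion_vectors(prev, curr, labels, search=2):
    """Estimate one (dy, dx) vector per region by exhaustive SAD search.

    prev, curr: 2-D lists of luminance values with identical dimensions.
    labels:     2-D list of region ids, assumed to come from segmenting
                `prev` (the previously decoded frame), so a decoder could
                recompute it without receiving any shape information.
    search:     hypothetical search range in pixels (illustrative only).
    """
    height, width = len(prev), len(prev[0])

    # Group the pixel coordinates of the previous frame by region id.
    regions = {}
    for y in range(height):
        for x in range(width):
            regions.setdefault(labels[y][x], []).append((y, x))

    vectors = {}
    for region_id, pixels in regions.items():
        best = None
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                sad = 0
                for y, x in pixels:
                    # Clamp displaced coordinates to the frame borders
                    # (an assumption; other border policies are possible).
                    ty = min(max(y + dy, 0), height - 1)
                    tx = min(max(x + dx, 0), width - 1)
                    sad += abs(curr[ty][tx] - prev[y][x])
                # Prefer the lowest SAD, then the shorter vector on ties.
                key = (sad, abs(dy) + abs(dx))
                if best is None or key < best[0]:
                    best = (key, (dy, dx))
        vectors[region_id] = best[1]
    return vectors


def initial_dense_field(labels, vectors):
    """Give every pixel the vector of its region.

    This is only a starting point for a pixel-level refinement step,
    which is not implemented in this sketch.
    """
    return [[vectors[label] for label in row] for row in labels]
```

Because both functions depend only on the previous frame, its label map, and the transmitted region vectors, a decoder holding the same previously decoded frame could rebuild the same dense field.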

Results

We compare our coder with a generic block-based coder of the kind used in the H.261 and H.263 standards. All performance comparisons are made on the luminance (Y) component of the video frames. To make the comparison objective, we used the same quantization strategy for the DCT coefficients in both coders. The Huffman codes for motion vectors and DCT coefficients were also the same for both coders. The number of bits per frame was held (approximately) fixed at 1280 for both coders. This corresponds to a bit rate of 9.6 kbps if every fourth frame is coded and 38.4 kbps if all frames are coded.
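The two bit rates quoted above follow from the per-frame budget by simple arithmetic. The short Python sketch below reproduces them; the 30 frames per second source rate is not stated in the text but is implied by the quoted figures (38.4 kbps / 1280 bits), and the function and parameter names are illustrative.

```python
BITS_PER_FRAME = 1280   # per-frame bit budget quoted in the text
SOURCE_FPS = 30         # assumption: implied by 38.4 kbps / 1280 bits


def bit_rate_kbps(bits_per_frame, source_fps, frame_skip):
    """Return the bit rate in kbps when every `frame_skip`-th frame is coded."""
    coded_fps = source_fps / frame_skip
    return bits_per_frame * coded_fps / 1000


every_fourth = bit_rate_kbps(BITS_PER_FRAME, SOURCE_FPS, 4)  # 7.5 coded frames/s
all_frames = bit_rate_kbps(BITS_PER_FRAME, SOURCE_FPS, 1)    # 30 coded frames/s
```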

We also present results comparing our residual coding scheme with the usual block DCT based coding scheme. An overhead of 1 bit per coded block is transmitted. Such a coder always performs better than the baseline block DCT scheme. The following figure shows the improvement (in dB PSNR) over the generic coder when the quantization step size for the AC coefficients is 16 and 32.
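For readers unfamiliar with the metric, the sketch below shows how a PSNR value for a luminance frame is commonly computed. It uses the standard definition with a peak value of 255 for 8-bit samples; the function name and the list-of-rows frame representation are assumptions for illustration and are not taken from the coder itself.

```python
import math


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    reference, decoded: 2-D lists of luminance samples.
    peak: maximum sample value (255 for 8-bit video).
    """
    total = 0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for r, d in zip(ref_row, dec_row):
            total += (r - d) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)
```

An improvement in dB, as plotted in a comparison of this kind, would then be the difference between the PSNR of one coder's output and that of the other, both measured against the same original frame.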

Related Publications:

Seung Chul Yoon, Krishna Ratakonda and N. Ahuja, Low Bit-Rate Video Coding with Implicit Multiscale Segmentation, IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, no. 7, pp. 1115-1129, October 1999.
Seung Chul Yoon, Krishna Ratakonda and N. Ahuja, Region based Video Coding using a Multiscale Image Segmentation, Proc. IEEE Int. Conf. on Image Proc. (ICIP’97), vol. 2, pp. 510-513, Santa Barbara, 1997.
Krishna Ratakonda, Seung Chul Yoon and N. Ahuja, Coding the Displaced Frame Difference for Video Compression, Proc. IEEE Int. Conf. on Image Proc. (ICIP’97), vol. 1, pp. 353-356, Santa Barbara, 1997.