FAST VIDEO SEGMENTATION USING ENCODING COST DATA
Xerox Corp, Webster, NY, USA, queiroz@wrc.xerox.com
Gozde Bozdagi, Taha Sencar
Dept of EEE, Baskent University, Ankara, Turkey, {bozdagi,taha}@baskent.edu.tr
Why video segmentation?
- Easy editing and processing
- Indexing
- Compression
Techniques
- Uncompressed domain
  - Pixel difference
  - Histogram difference
- Compressed domain (MJPEG, MPEG-1, MPEG-2, MPEG-4)
  - DC coefficients
  - AC coefficients
  - Motion vectors
MPEG-1/2 Coding
- Composed of GOP units
- GOP starts with I-frame
- I-frames are DCT coded
- P- and B-frames: motion-compensated (MC) prediction errors are DCT coded
- Frames are divided into macroblocks (MBs)
Cut Characteristics
- I-frames
  - Sharp cuts reflect on P frames
  - Difference of I-frames
- P- and B-frames
  - Affected by sharp or slow cuts
Motivation
- MPEG prediction does not work well across scene cuts
- MC was designed to be a good predictor, unless there is a scene cut
- Use the MPEG motion analysis, i.e. borrow the computation already done by the MPEG encoder
- At a scene cut, MC prediction fails and the encoder spends more bits on the frame
Proposed algorithm
- Compute the encoding cost data for each frame obtaining vector V(n), where n is the frame number.
- Separate vector V(n) into the corresponding components for frames I, P and B (Vi(m), Vp(m), Vb(m)), where m is the natural numbering for each individual vector.
- Make the I-frame information differential (difference between consecutive I-frame costs).
- For each of Vi(m), Vp(m), Vb(m), obtain hints of where scene cuts occur.
- Combine all detected scene-cut hint frame numbers, respecting original frame numbering.
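
The bookkeeping in the steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`split_by_type`, `differential`, `merge_hints`) are hypothetical, the per-frame type list is assumed to be available (for example from the picture headers), and setting the first differential value to 0 is an assumption.

```python
def split_by_type(costs, types):
    """Split V(n) into Vi(m), Vp(m), Vb(m).

    costs[n] is the encoding cost of frame n, types[n] is "I", "P" or "B".
    Also returns, per type, the original frame number of each entry.
    """
    vectors = {"I": [], "P": [], "B": []}
    index = {"I": [], "P": [], "B": []}
    for n, (cost, frame_type) in enumerate(zip(costs, types)):
        vectors[frame_type].append(cost)
        index[frame_type].append(n)
    return vectors, index


def differential(v):
    """Difference between consecutive entries; first entry set to 0 (assumption)."""
    return [v[m] - v[m - 1] if m else 0 for m in range(len(v))]


def merge_hints(hints_by_type, index):
    """Map per-type hint positions m back to original frame numbers n."""
    frames = set()
    for frame_type, hints in hints_by_type.items():
        for m in hints:
            frames.add(index[frame_type][m])
    return sorted(frames)
```

The `index` lists are what make the last step possible: each hint found at position m in Vi, Vp or Vb is translated back to the original frame numbering before the hints are combined.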
Obtaining cut hints
- Select a low-pass filter. We used the normalized version of the Gaussian filter [1, 3, 6, 7, 6, 3, 1] (taps divided by their sum, 27).
- Low-pass filter the input vector Vx(m) to obtain vector Lx(m), where x = i, p or b.
- Obtain high-pass vector Hx(m) = Vx(m) - Lx(m) .
- Compute maximum of Lx(m) as ML.
- Compute maximum absolute value of Hx(m) as MH.
- Detect a scene cut at moment m as the frame for which |Hx(m)| > MH (C - Lx(m) / ML), where C is a tunable constant
- Higher C -> fewer false cuts, more misses
- Lower C -> more false cuts, fewer misses
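
A minimal sketch of the cut-hint rule, assuming plain Python lists as input. The names `low_pass` and `cut_hints` are hypothetical, the handling of the vector ends (repeating the first and last sample) is an assumption, and no value of C is given in the slides, so it is left as a required argument.

```python
KERNEL = [1, 3, 6, 7, 6, 3, 1]  # filter taps from the slide, normalized below


def low_pass(v, kernel=KERNEL):
    """Return Lx(m): v filtered with the normalized kernel.

    Edges are handled by repeating the first/last sample (an assumption).
    """
    total = sum(kernel)
    half = len(kernel) // 2
    n = len(v)
    out = []
    for m in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = min(max(m + k - half, 0), n - 1)  # clamp index at the ends
            acc += weight * v[idx]
        out.append(acc / total)
    return out


def cut_hints(v, c):
    """Return positions m where |Hx(m)| > MH * (C - Lx(m) / ML)."""
    if not v:
        return []
    low = low_pass(v)
    high = [a - b for a, b in zip(v, low)]   # Hx(m) = Vx(m) - Lx(m)
    ml = max(low)                            # ML
    mh = max(abs(h) for h in high)           # MH
    if ml <= 0 or mh == 0:
        return []  # flat or degenerate input: nothing to detect
    return [m for m in range(len(v))
            if abs(high[m]) > mh * (c - low[m] / ml)]
```

For example, a flat cost vector of twenty 10s with a single spike of 100 at position 10 gives `cut_hints(v, c=1.5) == [10]`: only the spike exceeds the adaptive threshold.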
Minimal complexity
- Measure the encoding cost of each frame: skip and count bytes in the MPEG bitstream until reaching the picture start code 0x00000100. Once this 4-byte sequence is found, the accumulated byte count is the number of bytes actually used to encode the frame (its encoding cost).
- Filtering and maximum calculation are performed on a small vector only (one value per frame)
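
The byte-counting step might look like the sketch below. It assumes the whole MPEG-1/2 video elementary stream is already in memory as `bytes` and takes the frame marker to be the picture start code, bytes 00 00 01 00. The name `frame_costs` is hypothetical. As a simplification, any GOP or sequence headers that sit between two pictures are counted with the preceding frame, and the last frame absorbs whatever trails it.

```python
PICTURE_START_CODE = b"\x00\x00\x01\x00"  # MPEG-1/2 picture start code


def frame_costs(stream):
    """Return the byte count between consecutive picture start codes.

    Each entry approximates the encoding cost of one frame, in bitstream order.
    """
    starts = []
    pos = stream.find(PICTURE_START_CODE)
    while pos != -1:
        starts.append(pos)
        pos = stream.find(PICTURE_START_CODE, pos + len(PICTURE_START_CODE))
    ends = starts[1:] + [len(stream)]
    return [end - start for start, end in zip(starts, ends)]
```

No decoding is involved: the stream is scanned once for a fixed 4-byte pattern, which is where the low complexity comes from.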
Conclusions
- Extremely low complexity
- Yet relatively efficient
- Can be used as a pre-processor to guide more complex segmentors
- Possible application: storyboarding