CS6825: Computer Vision

JPEG Compression

(taken from CMPT 365 homepage at SFU)

Motivations:

  1. Uncompressed video and audio data are huge. In HDTV, the bit rate easily exceeds 1 Gbps, which creates big problems for storage and network communications.

  2. The compression ratio of lossless methods (e.g., Huffman, Arithmetic, LZW) is not high enough for image and video compression, especially when the distribution of pixel values is relatively flat.

JPEG was created for the compression of single images. Motion JPEG is the application of JPEG to the individual frames of a video. For video, it compares to other compression techniques as follows:
  • Spatial Redundancy Removal -- Intraframe coding (JPEG)

  • Spatial and temporal Redundancy Removal -- Intraframe and Interframe coding (H.261, MPEG)
  


1. What is JPEG?

  • "Joint Photographic Experts Group". Voted as an international standard in 1992.

  • Works with color and grayscale images, e.g., satellite, medical, ...

2. JPEG overview

  • Encoding

  • Decoding -- Reverse the order

3. Major Steps

  • DCT (Discrete Cosine Transformation)

  • Quantization

  • Zigzag Scan

  • DPCM on DC component

  • RLE on AC Components

  • Entropy Coding (e.g., Huffman Coding)

3a. Discrete Cosine Transform (DCT)

  • Overview:

  • Definition (8 point DCT):

    Question: What is F[0,0]? -- define DC and AC components.

  • The 64 (8 x 8) DCT basis functions

  • Why DCT not FFT?

    The DCT is like the FFT, but it can approximate lines well with few coefficients.

  • Computing the DCT

    • Factoring reduces problem to a series of 1D DCTs:

    • Most software implementations use fixed point arithmetic. Some fast implementations approximate coefficients so all multiplies are shifts and adds.

    • World record for an 8-point 1-D DCT is 11 multiplies and 29 adds. (C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT Algorithms with 11 Multiplications", Proc. Int'l. Conf. on Acoustics, Speech, and Signal Processing 1989 (ICASSP '89), pp. 988-991)
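As a rough illustration of the factoring described above, the sketch below computes the 8 x 8 DCT as eight 1-D DCTs over the rows followed by eight over the columns, and compares the result with a direct double sum. It assumes the usual JPEG normalization, F[u,v] = 1/4 C(u) C(v) sum_x sum_y f(x,y) cos((2x+1)u pi/16) cos((2y+1)v pi/16), with C(0) = 1/sqrt(2) and C(k) = 1 otherwise. The function names are illustrative only, and no attempt is made at a fast algorithm.

```python
import math

N = 8  # JPEG works on 8 x 8 blocks


def c(k):
    """Normalization factor: 1/sqrt(2) for the zero frequency, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_1d(vec):
    """8-point 1-D DCT of a list of 8 samples."""
    return [
        0.5 * c(u) * sum(vec[x] * math.cos((2 * x + 1) * u * math.pi / 16)
                         for x in range(N))
        for u in range(N)
    ]


def dct_2d(block):
    """2-D DCT by factoring: 1-D DCT on every row, then on every column.

    block[y][x] holds the pixel in row y, column x; the result is F[v][u].
    """
    rows = [dct_1d(row) for row in block]                      # rows[y][u]
    cols = [dct_1d([rows[y][u] for y in range(N)])             # cols[u][v]
            for u in range(N)]
    return [[cols[u][v] for u in range(N)] for v in range(N)]


def dct_2d_direct(block):
    """Direct evaluation of the double sum, for comparison only."""
    return [
        [
            0.25 * c(u) * c(v) * sum(
                block[y][x]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for y in range(N) for x in range(N)
            )
            for u in range(N)
        ]
        for v in range(N)
    ]
```

For a flat block in which every pixel is 100, F[0][0] comes out as 800 (eight times the block mean) and every other coefficient is zero up to rounding error. F[0,0] is the DC component; the other 63 coefficients are the AC components.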

3b. Quantization

  • Why? -- To throw out bits

  • Example: 101101 = 45 (6 bits).
    Truncate to 4 bits: 1011 = 11.
    Truncate to 3 bits: 101 = 5.

  • Quantization error is the main source of loss in lossy compression.

Uniform quantization

  • Divide by a constant N and round the result (the truncations above correspond to N = 4 and N = 8, rounding down).

  • Non-powers-of-two give finer control (e.g., N = 6 loses log2(6), about 2.6, bits)
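The arithmetic above can be checked with a small sketch. The helper names are hypothetical, and rounding to nearest (with halves rounded up) is an assumption made for illustration; truncation, as in the 6-bit example, is shown with bit shifts.

```python
import math


def quantize(value, n):
    """Uniform quantization: divide by n and round to the nearest integer."""
    return math.floor(value / n + 0.5)


def dequantize(q, n):
    """Approximate reconstruction: multiply the quantized value back by n."""
    return q * n


def bits_lost(n):
    """Number of bits of precision discarded when dividing by n."""
    return math.log2(n)


# Truncation example from the text: 101101 (45) cut to 4 bits and to 3 bits.
truncated_4_bits = 45 >> 2   # 1011 = 11
truncated_3_bits = 45 >> 3   # 101  = 5
```

Note that rounding and truncation can differ: 45 / 8 = 5.625 truncates to 5 but rounds to 6.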

Quantization Tables

  • In JPEG, each F[u,v] is divided by a constant q(u,v).

  • Table of q(u,v) is called quantization table.
    ----------------------------------
    16  11  10  16  24   40   51   61
    12  12  14  19  26   58   60   55
    14  13  16  24  40   57   69   56
    14  17  22  29  51   87   80   62
    18  22  37  56  68   109  103  77
    24  35  55  64  81   104  113  92
    49  64  78  87  103  121  120  101
    72  92  95  98  112  100  103  99
    ----------------------------------

  • Eye is most sensitive to low frequencies (upper left corner), less sensitive to high frequencies (lower right corner)

  • The standard supplies 2 example quantization tables that are widely used as defaults, one for luminance (above), one for chrominance.

  • Q: How would changing the numbers affect the picture (e.g., if I doubled them all)?

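    Larger values quantize more coarsely: more coefficients round to zero, so the file gets smaller and the picture quality drops. The quality factor in most implementations is a scaling factor for the default quantization tables.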

  • Custom quantization tables can be put in image/scan header.
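A minimal sketch of table-based quantization follows, using the luminance table listed above. The `scale` parameter is a hypothetical plain multiplier standing in for a quality setting; real encoders map their quality factor to a scaling in their own ways, so this is an illustration rather than any particular implementation.

```python
import math

# Luminance quantization table from the notes, indexed as LUMA_Q[v][u].
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize_block(F, table=LUMA_Q, scale=1.0):
    """Divide each DCT coefficient F[v][u] by scale * q(u,v) and round."""
    return [
        [math.floor(F[v][u] / (table[v][u] * scale) + 0.5) for u in range(8)]
        for v in range(8)
    ]


def dequantize_block(Fq, table=LUMA_Q, scale=1.0):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [
        [Fq[v][u] * table[v][u] * scale for u in range(8)]
        for v in range(8)
    ]
```

With `scale=2.0` every divisor doubles, so a coefficient of 800 at position (0,0) quantizes to 25 instead of 50, and small high-frequency coefficients are more likely to become zero.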

3c. Zig-zag Scan

  • Why? -- To group the low frequency coefficients at the top of the vector.

  • Maps 8 x 8 to a 1 x 64 vector
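One way to generate the zig-zag order is to walk the anti-diagonals of the block and alternate direction on each one. The sketch below does this by sorting the (row, column) positions; the function names are illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order.

    Positions are grouped by anti-diagonal (row + col). Odd diagonals are
    walked top to bottom, even diagonals bottom to top.
    """
    positions = [(r, c) for r in range(n) for c in range(n)]
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def zigzag_scan(block):
    """Map an 8 x 8 block (list of rows) to a 1 x 64 vector."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

The scan starts (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), so the DC term comes first and the lowest frequencies follow it.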

3d. Differential Pulse Code Modulation (DPCM) on DC component

  • DC component is large and varied, but often close to the previous block's value (like lossless JPEG).

  • Encode the difference from the DC value of the previous 8x8 block -- DPCM. Only send the DC value of the first block and then the subsequent differences.
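A minimal sketch of DPCM on a sequence of DC values, assuming the predictor starts at 0 so that the first block's DC value is sent unchanged. The function names are illustrative.

```python
def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    diffs = []
    previous = 0  # assumed starting predictor, so the first value is sent as-is
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dpcm_decode(diffs):
    """Rebuild the DC values by accumulating the differences."""
    dc_values = []
    previous = 0
    for d in diffs:
        previous += d
        dc_values.append(previous)
    return dc_values
```

For DC values 150, 155, 149, 152 the encoder emits 150, 5, -6, 3: after the first block, the numbers to be coded are small.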

3e. Run Length Encode (RLE) on AC components

  • 1x64 vector has lots of zeros in it

  • Encode as (skip, value) pairs, where skip is the number of zeros and value is the next non-zero component.

  • Send (0,0) as end-of-block sentinel value.
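The (skip, value) scheme can be sketched as follows. This is a simplification for illustration: it always appends the (0,0) sentinel and does not model the limit on run length that the real bitstream imposes. The function names are illustrative.

```python
def rle_encode(ac):
    """Encode the 63 AC coefficients as (skip, value) pairs ending in (0, 0)."""
    pairs = []
    skip = 0
    for value in ac:
        if value == 0:
            skip += 1            # count zeros preceding the next non-zero value
        else:
            pairs.append((skip, value))
            skip = 0
    pairs.append((0, 0))         # end-of-block sentinel; trailing zeros are implied
    return pairs


def rle_decode(pairs, length=63):
    """Expand (skip, value) pairs back to a vector of the given length."""
    ac = []
    for skip, value in pairs:
        if (skip, value) == (0, 0):
            break
        ac.extend([0] * skip)
        ac.append(value)
    ac.extend([0] * (length - len(ac)))
    return ac
```

A vector that begins 5, 0, 0, 3, 0, -1 and is zero afterwards encodes to (0,5), (2,3), (1,-1), (0,0).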

3f. Entropy Coding

  • Categorize DC values into SSS (number of bits needed to represent) and actual bits.
        --------------------
           Value       SSS
             0          0
            -1,1        1
         -3,-2,2,3      2
        -7..-4,4..7     3
        --------------------

  • Example: if DC value is 4, 3 bits are needed.

    Send off SSS as Huffman symbol, followed by actual 3 bits.

  • For AC components (skip, value), encode the composite symbol (skip, SSS) using Huffman coding, followed by the actual bits of the value.

  • Huffman Tables can be custom (sent in header) or default.
  • About Huffman Coding
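The categorization above can be sketched in a few lines. The SSS category is the number of bits in the magnitude of the value. For the "actual bits", the sketch assumes the usual JPEG convention that a negative value is sent as value + 2^SSS - 1, so that -4 becomes 011; the Huffman coding of the symbols themselves is not shown, and the function names are illustrative.

```python
def sss(value):
    """Size category: number of bits needed to represent |value|."""
    return abs(value).bit_length()


def extra_bits(value):
    """The SSS bits that follow the Huffman symbol, as a bit string."""
    size = sss(value)
    if size == 0:
        return ""                      # a zero needs no extra bits
    if value < 0:
        value += (1 << size) - 1       # assumed convention for negative values
    return format(value, "0{}b".format(size))


def ac_symbol(skip, value):
    """Composite (skip, SSS) symbol that is Huffman coded for an AC pair."""
    return (skip, sss(value))
```

For a DC value of 4, `sss(4)` is 3 and `extra_bits(4)` is "100", matching the example: the Huffman symbol for category 3 is followed by 3 bits.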

4. Overview of the JPEG bitstream

  • A "Frame" is a picture, a "scan" is a pass through the pixels (e.g., the red component), a "segment" is a group of blocks, a "block" is an 8x8 group of pixels.

  • Frame header:
    sample precision
    (width, height) of image
    number of components
    unique ID (for each component)
    horizontal/vertical sampling factors (for each component)
    quantization table to use (for each component)

  • Scan header
    Number of components in scan
    component ID (for each component)
    Huffman table to use (for each component)

  • Misc. (can occur between headers)
    Quantization tables
    Huffman Tables
    Arithmetic Coding Tables
    Comments
    Application Data
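To make the frame header fields concrete, the sketch below unpacks them from the payload of a baseline frame header segment (the bytes after the marker and the two-byte length). The byte layout is assumed to be: precision (1 byte), height (2), width (2), number of components (1), then three bytes per component (ID, packed horizontal/vertical sampling factors, quantization table number). Note that in this assumed layout height precedes width. The function name and dictionary keys are illustrative.

```python
import struct


def parse_frame_header(payload):
    """Unpack the frame header fields from a baseline frame header payload."""
    precision, height, width, ncomp = struct.unpack(">BHHB", payload[:6])
    components = []
    for k in range(ncomp):
        start = 6 + 3 * k
        comp_id, sampling, qtable = struct.unpack(">BBB", payload[start:start + 3])
        components.append({
            "id": comp_id,
            "h": sampling >> 4,       # horizontal sampling factor (high nibble)
            "v": sampling & 0x0F,     # vertical sampling factor (low nibble)
            "qtable": qtable,         # which quantization table to use
        })
    return {
        "precision": precision,
        "width": width,
        "height": height,
        "components": components,
    }
```

A usage example with a hand-built payload for a 640x480, 3-component image:

```python
payload = bytes([8, 0x01, 0xE0, 0x02, 0x80, 3,
                 1, 0x22, 0,
                 2, 0x11, 1,
                 3, 0x11, 1])
header = parse_frame_header(payload)
```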

5. Various JPEG Modes

  • Baseline/Sequential -- the one that we described in detail

  • Lossless

  • Progressive

  • Hierarchical

  • "Motion JPEG" -- Baseline JPEG applied to each image in a video.
  1. Lossless Mode

    • A special case of JPEG in which there is truly no loss.

    • Encode each pixel as its difference from a "predictor" built from previous pixels (not from previous blocks, as for the DC component in the Baseline mode).

      The predictor uses a linear combination of previously encoded neighbors.
      It can be one of seven different predictors, each based on the pixel's neighbors.

    • Since it uses only previously encoded neighbors, the first row always predicts from the left neighbor (P1 in the standard's numbering) and the first column always predicts from the neighbor above (P2).

    • Effect of Predictor (test with 20 images)

      Note: "2D" predictors (4-7) always do better than "1D" predictors.
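The sketch below illustrates predictive lossless coding. It assumes the standard's numbering of the seven predictors, with A the left neighbor, B the neighbor above and C the upper-left neighbor: P1 = A, P2 = B, P3 = C, P4 = A + B - C, P5 = A + (B - C)/2, P6 = B + (A - C)/2, P7 = (A + B)/2. It also assumes that the first row predicts from the left, the first column from above, and the very first pixel from the mid-range value 2^(precision-1). Entropy coding of the residuals is not shown, and the function names are illustrative.

```python
# Seven predictors as functions of the left (a), above (b), upper-left (c) pixels.
PREDICTORS = {
    1: lambda a, b, c: a,
    2: lambda a, b, c: b,
    3: lambda a, b, c: c,
    4: lambda a, b, c: a + b - c,
    5: lambda a, b, c: a + ((b - c) >> 1),
    6: lambda a, b, c: b + ((a - c) >> 1),
    7: lambda a, b, c: (a + b) >> 1,
}


def predict(img, y, x, k, precision=8):
    """Prediction for pixel (y, x) from already-known neighbors."""
    if y == 0 and x == 0:
        return 1 << (precision - 1)        # no neighbors: assumed mid-range value
    if y == 0:
        return img[0][x - 1]               # first row: only the left neighbor exists
    if x == 0:
        return img[y - 1][0]               # first column: only the pixel above exists
    return PREDICTORS[k](img[y][x - 1], img[y - 1][x], img[y - 1][x - 1])


def residuals(img, k):
    """Differences between each pixel and its prediction (what gets coded)."""
    return [[img[y][x] - predict(img, y, x, k) for x in range(len(img[0]))]
            for y in range(len(img))]


def reconstruct(res, k):
    """Decoder: rebuild the image exactly from the residuals."""
    img = [[0] * len(res[0]) for _ in res]
    for y in range(len(res)):
        for x in range(len(res[0])):
            img[y][x] = res[y][x] + predict(img, y, x, k)
    return img
```

On a smooth ramp such as `img[y][x] = 100 + 10*y + x`, predictor 4 gives zero residuals everywhere away from the first row and column, and `reconstruct(residuals(img, k), k)` returns the original image for every k, which is what makes the mode lossless.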

    Comparison with Other Lossless Compression Programs (compression ratio):

    -----------------------------------------------------------------
         Compression Program              Compression Ratio
                                    Lena  football    F-18   flowers
    -----------------------------------------------------------------
            lossless JPEG           1.45     1.54     2.29     1.26
        optimal lossless JPEG       1.49     1.67     2.71     1.33
           compress (LZW)           0.86     1.24     2.21     0.87
          gzip (Lempel-Ziv)         1.08     1.36     3.10     1.05
    gzip -9 (optimal Lempel-Ziv)    1.08     1.36     3.13     1.05
        pack (Huffman coding)       1.02     1.12     1.19     1.00
    -----------------------------------------------------------------

  2. Progressive Mode

    • Goal: display low quality image and successively improve.

    • Two ways to successively improve image:

      1. Spectral selection: Send DC component, then first few AC, some more AC, etc.

      2. Successive approximation: send DCT coefficients MSB (most significant bit) to LSB (least significant bit).
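Both ideas can be illustrated on a single 64-entry zig-zag vector. This is a conceptual sketch only: the band boundaries and bit shifts are hypothetical choices, and each successive-approximation pass here returns a complete coarser version of the coefficients, whereas an actual progressive stream would send only the additional bits in each later scan.

```python
def spectral_selection(zz, bands=((0, 0), (1, 5), (6, 63))):
    """Split a zig-zag vector into scans: DC first, then groups of AC terms."""
    return [zz[first:last + 1] for first, last in bands]


def keep_high_bits(coefficient, shift):
    """Zero the lowest `shift` bits of the magnitude, keeping the sign."""
    sign = -1 if coefficient < 0 else 1
    return sign * ((abs(coefficient) >> shift) << shift)


def successive_approximation(zz, shifts=(3, 1, 0)):
    """Coarse-to-fine versions of the vector: most significant bits first."""
    return [[keep_high_bits(c, shift) for c in zz] for shift in shifts]
```

With coefficients 45, -13 and 6, the three passes give 40, -8, 0, then 44, -12, 6, and finally the exact values 45, -13, 6.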

  3. Hierarchical Mode

    A Three-level Hierarchical JPEG Encoder

    (From V. Bhaskaran and K. Konstantinides, "Image and Video Compression Standards: Algorithms and Architectures", Kluwer Academic Publishers, 1995.)

    • Down-sample by factors of 2 in each direction.

      Example: map 640x480 to 320x240

    • Code smaller image using another method (Progressive, Baseline, or Lossless).

    • Decode and up-sample encoded image

    • Encode difference between the up-sampled and the original using Progressive, Baseline, or Lossless.

    • Can be repeated multiple times.

    • Good for viewing high resolution image on low resolution display.
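The steps above can be sketched for one level of the hierarchy. This sketch assumes 2x2 averaging for down-sampling, pixel replication for up-sampling, and a losslessly coded base image (so the decoded base equals the encoded one); those choices and the function names are illustrative assumptions, and the coding of the base and difference images themselves is not shown.

```python
def downsample(img):
    """Halve each dimension by averaging 2x2 neighborhoods (even sizes assumed)."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
         for x in range(0, len(img[0]), 2)]
        for y in range(0, len(img), 2)
    ]


def upsample(img):
    """Double each dimension by repeating every pixel twice in each direction."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def hierarchical_encode(img):
    """Return the small base image and the difference image at full size."""
    base = downsample(img)
    predicted = upsample(base)
    difference = [[o - p for o, p in zip(orow, prow)]
                  for orow, prow in zip(img, predicted)]
    return base, difference


def hierarchical_decode(base, difference):
    """Up-sample the base and add the difference to recover the original."""
    predicted = upsample(base)
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(predicted, difference)]
```

A low resolution display can stop after decoding `base`; a high resolution display also decodes the difference image. Applying `hierarchical_encode` again to `base` gives a further level.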

  4. JPEG-2

    • Big change was to use adaptive quantization
© Lynne Grewe