CS6825

CS6825: Computer Vision
Instructor
Outline
Syllabus
Projects
Links
Fractal Compression

(Taken from Compression FAQ and written by John Kominek .)
Seven things you should know about Fractal Image Compression (assuming that 

you want to know about it).



   1. It is a promising new technology, arguably superior to JPEG -- 

      but only with an argument.

   2. It is a lossy compression method.

   3. The fractals in Fractal Image Compression are Iterated Function

      Systems.

   4. It is a form of Vector Quantization, one that employs a virtual

      codebook.

   5. Resolution enhancement is a powerful feature but is not some

      magical way of achieving 1000:1 compression.

   6. Compression is slow, decompression is fast.

   7. The technology is patented.



That's the scoop in condensed form. Now to elaborate, beginning with a little 

background.





 A Brief History of Fractal Image Compression

 --------------------------------------------



The birth of fractal geometry (or rebirth, rather) is usually traced to IBM 

mathematician Benoit B. Mandelbrot and the 1977 publication of his seminal 

book The Fractal Geometry of Nature. The book put forth a powerful thesis: 

traditional geometry with its straight lines and smooth surfaces does not 

resemble the geometry of trees and clouds and mountains. Fractal geometry, 

with its convoluted coastlines and detail ad infinitum, does.



This insight opened vast possibilities. Computer scientists, for one, found a 

mathematics capable of generating artificial and yet realistic looking land-

scapes, and the trees that sprout from the soil. And mathematicians had at 

their disposal a new world of geometric entities.



It was not long before mathematicians asked if there was a unity among this 

diversity. There is, as John Hutchinson demonstrated in 1981, it is the branch 

of mathematics now known as Iterated Function Theory. Later in the decade 

Michael Barnsley, a leading researcher from Georgia Tech, wrote the popular 

book Fractals Everywhere. The book presents the mathematics of Iterated Func-

tions Systems (IFS), and proves a result known as the Collage Theorem. The 

Collage Theorem states what an Iterated Function System must be like in order 

to represent an image.



This presented an intriguing possibility. If, in the forward direction, frac-

tal mathematics is good for generating natural looking images, then, in the 

reverse direction, could it not serve to compress images? Going from a given 

image to an Iterated Function System that can generate the original (or at 

least closely resemble it), is known as the inverse problem. This problem 

remains unsolved.



Barnsley, however, armed with his Collage Theorem, thought he had it solved. 

He applied for and was granted a software patent and left academia to found 

Iterated Systems Incorporated (US patent 4,941,193. Alan Sloan is the co-

grantee of the patent and co-founder of Iterated Systems.)  Barnsley announced 

his success to the world in the January 1988 issue of BYTE magazine. This 

article did not address the inverse problem but it did exhibit several images 

purportedly compressed in excess of 10,000:1. Alas, it was not a breakthrough. 

The images were given suggestive names such as "Black Forest" and "Monterey 

Coast" and "Bolivian Girl" but they were all manually constructed. Barnsley's 

patent has come to be derisively referred to as the "graduate student algo-

rithm."



Graduate Student Algorithm

   o Acquire a graduate student.

   o Give the student a picture.

   o And a room with a graphics workstation.

   o Lock the door.

   o Wait until the student has reverse engineered the picture.

   o Open the door.



Attempts to automate this process have met little success. As Barnsley admit-

ted in 1988: "Complex color images require about 100 hours each to encode and 

30 minutes to decode on the Masscomp [dual processor workstation]." That's 100 

hours with a _person_ guiding the process.



Ironically, it was one of Barnsley's PhD students that made the graduate 

student algorithm obsolete. In March 1988, according to Barnsley, he arrived 

at a modified scheme for representing images called Partitioned Iterated 

Function Systems (PIFS). Barnsley applied for and was granted a second patent 

on an algorithm that can automatically convert an image into a Partitioned 

Iterated Function System, compressing the image in the process. (US patent 

5,065,447. Granted on Nov. 12 1991.) For his PhD thesis, Arnaud Jacquin imple-

mented the algorithm in software, a description of which appears in his land-

mark paper "Image Coding Based on a Fractal Theory of Iterated Contractive 

Image Transformations." The algorithm was not sophisticated, and not speedy, 

but it was fully automatic. This came at price: gone was the promise of 

10,000:1 compression. A 24-bit color image could typically be compressed from 

8:1 to 50:1 while still looking "pretty good." Nonetheless, all contemporary 

fractal image compression programs are based upon Jacquin's paper.



That is not to say there are many fractal compression programs available. 

There are not. Iterated Systems sell the only commercial compressor/decompres-

sor, an MS-Windows program called "Images Incorporated." There are also an 

increasing number of academic programs being made freely available. Unfor-

tunately, these programs are -- how should I put it? -- of merely academic 

quality. 



This scarcity has much to do with Iterated Systems' tight lipped policy about 

their compression technology. They do, however, sell a Windows DLL for pro-

grammers. In conjunction with independent development by researchers else-

where, therefore, fractal compression will gradually become more pervasive. 

Whether it becomes all-pervasive remains to be seen.



Historical Highlights:

   1977 -- Benoit Mandelbrot finishes the first edition of The Fractal 

           Geometry of Nature.

   1981 -- John Hutchinson publishes "Fractals and Self-Similarity."

   1983 -- Revised edition of The Fractal Geometry of Nature is 

           published.

   1985 -- Michael Barnsley and Stephen Demko introduce Iterated

           Function Theory in "Iterated Function Systems and the Global

           Construction of Fractals."

   1987 -- Iterated Systems Incorporated is founded.

   1988 -- Barnsley publishes the book Fractals Everywhere.

   1990 -- Barnsley's first patent is granted.

   1991 -- Barnsley's second patent is granted.

   1992 -- Arnaud Jacquin publishes an article that describes the first

           practical fractal image compression method.

   1993 -- The book Fractal Image Compression by Michael Barnsley and Lyman 

           Hurd is published.

        -- The Iterated Systems' product line matures. 

   1994 -- Put your name here.





 On the Inside

 -------------



The fractals that lurk within fractal image compression are not those of the 

complex plane (Mandelbrot Set, Julia sets), but of Iterated Function Theory. 

When lecturing to lay audiences, the mathematician Heinz-Otto Peitgen intro-

duces the notion of Iterated Function Systems with the alluring metaphor of a 

Multiple Reduction Copying Machine. A MRCM is imagined to be a regular copying 

machine except that:



  1. There are multiple lens arrangements to create multiple overlapping

     copies of the original.

  2. Each lens arrangement reduces the size of the original.

  3. The copier operates in a feedback loop, with the output of one

     stage the input to the next. The initial input may be anything.



The first point is what makes an IFS a system. The third is what makes it 

iterative. As for the second, it is implicitly understood that the functions 

of an Iterated Function Systems are contractive. 



An IFS, then, is a set of contractive transformations that map from a defined 

rectangle of the real plane to smaller portions of that rectangle. Almost 

invariably, affine transformations are used. Affine transformations act to 

translate, scale, shear, and rotate points in the plane. Here is a simple 

example:





     |---------------|              |-----|

     |x              |              |1    |

     |               |              |     |

     |               |         |---------------|

     |               |         |2      |3      |

     |               |         |       |       |

     |---------------|         |---------------|  



         Before                      After



     Figure 1. IFS for generating Sierpinski's Triangle.



This IFS contains three component transformations (three separate lens ar-

rangements in the MRCM metaphor). Each one shrinks the original by a factor of 

2, and then translates the result to a new location. It may optionally scale 

and shift the luminance values of the rectangle, in a manner similar to the 

contrast and brightness knobs on a TV.



The amazing property of an IFS is that when the set is evaluated by iteration, 

(i.e. when the copy machine is run), a unique image emerges. This latent image 

is called the fixed point or attractor of the IFS. As guaranteed by a result 

known as the Contraction Theorem, it is completely independent of the initial 

image. Two famous examples are Sierpinski's Triangle and Barnsley's Fern. 

Because these IFSs are contractive, self-similar detail is created at every 

resolution down to the infinitesimal. That is why the images are fractal.



The promise of using fractals for image encoding rests on two suppositions: 1. 

many natural scenes possess this detail within detail structure (e.g. clouds), 

and 2. an IFS can be found that generates a close approximation of a scene 

using only a few transformations. Barnsley's fern, for example, needs but 

four. Because only a few numbers are required to describe each transformation, 

an image can be represented very compactly. Given an image to encode, finding 

the optimal IFS from all those possible is known as the inverse problem.



The inverse problem -- as mentioned above -- remains unsolved. Even if it 

were, it may be to no avail. Everyday scenes are very diverse in subject 

matter; on whole, they do not obey fractal geometry. Real ferns do not branch 

down to infinity. They are distorted, discolored, perforated and torn. And the 

ground on which they grow looks very much different.



To capture the diversity of real images, then, Partitioned IFSs are employed. 

In a PIFS, the transformations do not map from the whole image to the parts, 

but from larger parts to smaller parts. An image may vary qualitatively from 

one area to the next (e.g. clouds then sky then clouds again). A PIFS relates 

those areas of the original image that are similar in appearance. Using Jac-

quin's notation, the big areas are called domain blocks and the small areas 

are called range blocks. It is necessary that every pixel of the original 

image belong to (at least) one range block. The pattern of range blocks is 

called the partitioning of an image.



Because this system of mappings is still contractive, when iterated it will 

quickly converge to its latent fixed point image. Constructing a PIFS amounts 

to pairing each range block to the domain block that it most closely resembles 

under some to-be-determined affine transformation. Done properly, the PIFS 

encoding of an image will be much smaller than the original, while still 

resembling it closely.



Therefore, a fractal compressed image is an encoding that describes:

   1. The grid partitioning (the range blocks).

   2. The affine transforms (one per range block).



The decompression process begins with a flat gray background. Then the set of 

transformations is repeatedly applied. After about four iterations the attrac-

tor stabilizes. The result will not (usually) be an exact replica of the 

original, but reasonably close.





 Scalelessnes and Resolution Enhancement

 ---------------------------------------



When an image is captured by an acquisition device, such as a camera or scan-

ner, it acquires a scale determined by the sampling resolution of that device. 

If software is used to zoom in on the image, beyond a certain point you don't 

see additional detail, just bigger pixels. 



A fractal image is different. Because the affine transformations are spatially 

contractive, detail is created at finer and finer resolutions with each itera-

tion. In the limit, self-similar detail is created at all levels of resolu-

tion, down the infinitesimal. Because there is no level that 'bottoms out' 

fractal images are considered to be scaleless.



What this means in practice is that as you zoom in on a fractal image, it will 

still look 'as it should' without the staircase effect of pixel replication. 

The significance of this is cause of some misconception, so here is the right 

spot for a public service announcement.



                        /--- READER BEWARE ---\



Iterated Systems is fond of the following argument. Take a portrait that is, 

let us say, a grayscale image 250x250 pixels in size, 1 byte per pixel. You 

run it through their software and get a 2500 byte file (compression ratio = 

25:1). Now zoom in on the person's hair at 4x magnification. What do you see? 

A texture that still looks like hair. Well then, it's as if you had an image 

1000x1000 pixels in size. So your _effective_ compression ratio is 25x16=400.



But there is a catch. Detail has not been retained, but generated. With a 

little luck it will look as it should, but don't count on it. Zooming in on a 

person's face will not reveal the pores. 



Objectively, what fractal image compression offers is an advanced form of 

interpolation. This is a useful and attractive property. Useful to graphic 

artists, for example, or for printing on a high resolution device. But it does 

not bestow fantastically high compression ratios.



                        \--- READER BEWARE ---/



That said, what is resolution enhancement? It is the process of compressing an 

image, expanding it to a higher resolution, saving it, then discarding the 

iterated function system. In other words, the compressed fractal image is the 

means to an end, not the end itself.





 The Speed Problem

 -----------------



The essence of the compression process is the pairing of each range block to a 

domain block such that the difference between the two, under an affine trans-

formation, is minimal. This involves a lot of searching. 



In fact, there is nothing that says the blocks have to be squares or even 

rectangles. That is just an imposition made to keep the problem tractable. 



More generally, the method of finding a good PIFS for any given image involves 

five main issues: 

   1. Partitioning the image into range blocks. 

   2. Forming the set of domain blocks. 

   3. Choosing type of transformations that will be considered. 

   4. Selecting a distance metric between blocks. 

   5. Specifying a method for pairing range blocks to domain blocks. 



Many possibilities exist for each of these. The choices that Jacquin offered 

in his paper are: 

   1. A two-level regular square grid with 8x8 pixels for the large

      range blocks and 4x4 for the small ones. 

   2. Domain blocks are 16x16 and 8x8 pixels in size with a subsampling 

      step size of four. The 8 isometric symmetries (four rotations, 

      four mirror flips) expand the domain pool to a virtual domain 

      pool eight times larger. 

   3. The choices in the last point imply a shrinkage by two in each 

      direction, with a possible rotation or flip, and then a trans-

      lation in the image plane. 

   4. Mean squared error is used. 

   5. The blocks are categorized as of type smooth, midrange, simple

      edge, and complex edge. For a given range block the respective

      category is searched for the best match.



The importance of categorization can be seen by calculating the size of the 

total domain pool. Suppose the image is partitioned into 4x4 range blocks. A 

256x256 image contains a total of (256-8+1)^2  = 62,001 different 8x8 domain 

blocks. Including the 8 isometric symmetries increases this total to 496,008. 

There are (256-4+1)^2 = 64,009 4x4 range blocks, which makes for a maximum of 

31,748,976,072 possible pairings to test. Even on a fast workstation an ex-

haustive search is prohibitively slow. You can start the program before de-

parting work Friday afternoon; Monday morning, it will still be churning away. 



Increasing the search speed is the main challenge facing fractal image com-

pression.





 Similarity to Vector Quantization

 ---------------------------------



To the VQ community, a "vector" is a small rectangular block of pixels. The 

premise of vector quantization is that some patterns occur much more frequent-

ly than others. So the clever idea is to store only a few of these common 

patterns in a separate file called the codebook. Some codebook vectors are 

flat, some are sloping, some contain tight texture, some sharp edges, and so 

on -- there is a whole corpus on how to construct a codebook. Each codebook 

entry (each domain block) is assigned an index number. A given image, then, is 

partitioned into a regular grid array. Each grid element (each range block) is 

represented by an index into the codebook. Decompressing a VQ file involves 

assembling an image out of the codebook entries. Brick by brick, so to speak.



The similarity to fractal image compression is apparent, with some notable 

differences. 

   1. In VQ the range blocks and domain blocks are the same size; in an

      IFS the domain blocks are always larger. 

   2. In VQ the domain blocks are copied straight; in an IFS each domain 

      block undergoes a luminance scaling and offset.

   3. In VQ the codebook is stored apart from the image being coded; in 

      an IFS the codebook is not explicitly stored. It is comprised of

      portions of the attractor as it emerges during iteration. For that

      reason it is called a "virtual codebook." It has no existence

      independent of the affine transformations that define an IFS.

   4. In VQ the codebook is shared among many images; in an IFS the 

      virtual codebook is specific to each image.



There is a more refined version of VQ called gain-shape vector quantization in 

which a luminance scaling and offset is also allowed. This makes the similari-

ty to fractal image compression as close as can be.





 Compression Ratios

 ------------------



Exaggerated claims not withstanding, compression ratios typically range from 

4:1 to 100:1. All other things equal, color images can be compressed to a 

greater extent than grayscale images.



The size of a fractal image file is largely determined by the number of trans-

formations of the PIFS. For the sake of simplicity, and for the sake of com-

parison to JPEG, assume that a 256x256x8 image is partitioned into a regular 

partitioning of 8x8 blocks. There are 1024 range blocks and thus 1024 trans-

formations to store. How many bits are required for each?



In most implementations the domain blocks are twice the size of the range 

blocks. So the spatial contraction is constant and can be hard coded into the 

decompression program. What needs to be stored are:



   x position of domain block        8     6

   y position of domain block        8     6

   luminance scaling                 8     5

   luminance offset                  8     6

   symmetry indicator                3     3

                                    --    --

                                    35    26 bits



In the first scheme, a byte is allocated to each number except for the symme-

try indicator. The upper bound on the compression ratio is thus (8x8x8)/35 = 

14.63. In the second scheme, domain blocks are restricted to coordinates 

modulo 4. Plus, experiments have revealed that 5 bits per scale factor and 6 

bits per offset still give good visual results. So the compression ratio limit 

is now 19.69. Respectable but not outstanding.



There are other, more complicated, schemes to reduce the bit rate further. The 

most common is to use a three or four level quadtree structure for the range 

partitioning. That way, smooth areas can be represented with large range 

blocks (high compression), while smaller blocks are used as necessary to 

capture the details. In addition, entropy coding can be applied as a back-end 

step to gain an extra 20% or so.





 Quality: Fractal vs. JPEG

 -------------------------



The greatest irony of the coding community is that great pains are taken to 

precisely measure and quantify the error present in a compressed image, and 

great effort is expended toward minimizing an error measure that most often is 

-- let us be gentle -- of dubious value. These measure include signal-to-noise 

ratio, root mean square error, and mean absolute error. A simple example is 

systematic shift: add a value of 10 to every pixel. Standard error measures 

indicate a large distortion, but the image has merely been brightened.



With respect to those dubious error measures, and at the risk of over-sim-

plification, the results of tests reveal the following: for low compression 

ratios JPEG is better, for high compression ratios fractal encoding is better. 

The crossover point varies but is often around 40:1. This figure bodes well 

for JPEG since beyond the crossover point images are so severely distorted 

that they are seldom worth using. 



Proponents of fractal compression counter that signal-to-noise is not a good 

error measure and that the distortions present are much more 'natural looking' 

than the blockiness of JPEG, at both low and high bit rates. This is a valid 

point but is by no means universally accepted. 



What the coding community desperately needs is an easy to compute error meas-

ure that accurately captures subjective impression of human viewers. Until 

then, your eyes are the best judge.