Image Processing and Analysis
© Wm. C. Brown Communications, Inc.
These three aspects of computer graphics are illustrated
graphically in Figure 15.1. In terms of representing the domain of computer
graphics, the vertical direction corresponds to abstraction. The abstract,
symbolic command, l = "Draw Robot Arm"
maps into the concrete image in the lower left-hand corner. Sending "Invert
yourself" and "Trace edges" messages to this image transforms
it into the other two images. The edge outline shown in the lower right-hand
corner is a particularly useful representation of the image for the purpose
of template matching against projected image outlines from a library of
known objects. A good match from this library allows an image analysis
system to report, "l' = Robot Arm with
Figure 15.1  The computer graphics domain of graphics synthesis, image processing, and graphical analysis. Synthesis involves creating an image from an abstract, symbolic description, l. Image processing involves transforming one image into a new, more useful one. Analysis involves extracting a symbolic description, l', of the objects composing the image.
It should be emphasized that image processing and image analysis deal with a much broader range of images than the artificial (synthetic) class described above. In fact, by far the largest class of input images for image processing systems can be classified as "natural" rather than artificial. That is, they are electronic photographs of the surface of one of the moons of Saturn, or a weather pattern as seen from a space satellite, or the image from a computer-aided tomography (CAT) scan. These are examples of images representing raw data objects produced by scientific research and medical diagnosis. The use of image processing for clarifying and interpreting such data is at the heart of the rapidly emerging field of scientific visualization.
Finally, the assumption throughout this chapter is that
image processing and analysis apply to digital images such as those represented
by PICT, TIFF, GIF, and EPSF formats. Chemical processing of photographic
images and electronic processing of television images have long and distinguished
histories but differ fundamentally from digital image processing in two
important aspects. First, photographic and video images are analog in nature,
and the signals are inevitably degraded at each stage of processing. Digital
images are immune to such noise and degradation. Second, it is much more
difficult for photographic and video image processing to achieve the freedom
and versatility of digital image processing. Digital image processing allows
manipulation of selected portions of the image down to the pixel level.
This is very difficult or impossible to achieve with any other medium.
As indicated above, image objects, which act as the "raw
data" for image processing applications, arise from a number of sources.
These may be conveniently categorized as:
1. Artificial images, created as human artifacts,
2. Theoretical images, computed from simulations and models, and
3. Natural images, captured from the physical world.
Let's examine each of these categories in a little more
detail and show examples of each category in Figure 15.2.
By artificial, we mean created as human artifacts. Someone sat down at a workstation and created the image with one of the application programs discussed in the previous chapters or some equivalent program. While these may be masterpieces of creative design and elegant rendering algorithms, they are in a fundamental sense the least interesting from an image processing and image analysis point of view. The reason for this is that such images are totally deterministic and completely specified by a log of the drawing and rendering commands issued by the user.
If the intermediate image processing commands are used
to augment the symbolic representation, l, image
analysis of any artificial synthetic image will yield the trivial result:
l' = l.
The second category of synthetic images is classified
as "theoretical." This class encompasses the whole range of simulation
experiment results. They are artificial in the sense that they are the
results of theoretical models constructed by humans. But they are indeterminate
since the complexity of the theoretical model prevents the user from knowing
what to expect, at least in detail, from the calculated results. Simulations
in finite element analysis, Mathematica, and Interactive Physics
all fall into this category, as do the complex supercomputer simulations
in the fields of chemistry, astrophysics, fluid dynamics, and meteorology.
The default image parameters may produce an acceptable graphical result,
but often the image can be improved by applying one or more of the image
processing transformations described in the following section.
Natural images constitute a large segment of the raw input
data for image processing programs. In addition to a vast array of scientific,
engineering, and medical image data, natural images include the whole range
of video image processing as well as conventional photographs scanned into
digital format. Specifically, natural image candidates for image processing
emerge from the following sources.
Figure 15.2  Images from various sources. (a) Supercomputer simulation of an extragalactic jet passing through an intergalactic shock wave; (b) MRI scan of a human brain; (c) video image of a personal workstation; (d) scanned image of key punches.
Figure 15.3  Geometric Image Processing. Images (a)–(c) were processed using Canvas™. Images (d)–(f) were processed using Pixel Paint Professional™.
Examples of images from these various sources are shown
in Figure 15.2. Next, let us examine some of the image processing transformations
available for these images.
What are the motivations for image processing? They range
from the most serious applications in scientific visualization through
commercial applications in advertising and art to pure creative exploration
for personal satisfaction (i.e., play). A useful categorization of these operations appears in the sections that follow.
Figure 15.4  Photographic Processes. (a) Setting brightness to 90%. (b) Setting brightness to 10%. (c) Setting contrast to 100%. (d) Using Invert Map to produce a photographic negative. All images processed using Digital Darkroom™.
A great variety of transformations are available for redesigning
the geometry of images. Figure 15.3 illustrates several of these. Such
geometric transformations provide valuable tools for artists and commercial designers.
Figure 15.5  Edge detection (spatial derivative) functions. (a) Original scanned image, (b) sharpened image, (c) blurred image, (d) edge detection.
Another useful class of image processing tools simulates
processes available in the photographic darkroom. These include lightening,
darkening, negative, and contrast adjustment. Figure 15.4 demonstrates
some of the possibilities.
A very useful category of image processing tools uses
changes in intensity as the basis for the transformation function.
These tools detect the spatial derivative of the intensity and map this
derivative in various ways. By narrowing the spatial extent of the derivative,
the image is sharpened. By broadening it, the image is blurred.
By applying a threshold to the spatial derivative and mapping small values
to white and large values to black, the transformation produces an edge
detector. Figure 15.5 shows an initial scanned image and the effects
of sharpening, blurring, and edge detection.
The metamorphosis of one image into another is a particularly effective (and sometimes terrifying) image processing technique used increasingly in the movie industry. The original movie effects were created by a series of makeup changes and cross-dissolves from one image to another with the actors remaining motionless during the process. This metamorphosis or morphing is now readily accomplished electronically which accounts for its expanded role in advertising, science fiction movies, and horror films.
A variety of morphing techniques have been developed, ranging from geometric distortions similar to those in Figure 15.3 to actual polygon modeling of the object to be morphed and then application of spatial transformations to the model and filming the resulting changes. A crude, but interesting, first approximation to morphing may be accomplished by the simple expedient of photographic double exposure. Commercial programs are now available for morphing on personal workstations. Gryphon Software Corporation markets an excellent example of such a program called Morphª.
Morph™ uses a 2D spatially-warped crossfade
resembling the double exposure technique mentioned previously. Starting
with two images of identical size, the user identifies key points on one
image and selects corresponding key points on the second image. Morph™
can use any color or B/W images for input and produces output in the form
of single morphs, QuickTime movies, or video output in PAL or NTSC
format. An example of Morph™ output in single morph format
is shown in Figure 15.6.
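Morph's key-point warping is proprietary, but the underlying crossfade, the electronic analog of the double exposure approximation mentioned above, can be sketched in a few lines of Python with NumPy. The image arrays here are hypothetical stand-ins:

```python
import numpy as np

def crossfade(img_a, img_b, t):
    """Blend two same-size grayscale images; t = 0 gives img_a, t = 1
    gives img_b.  This is the un-warped 'double exposure' approximation
    of morphing; a true morph would also warp pixel positions between
    the user's key points."""
    a = img_a.astype(float)
    b = img_b.astype(float)
    return ((1.0 - t) * a + t * b).round().astype(np.uint8)

# A 50% morph frame between two tiny uniform test images
frame = crossfade(np.full((2, 2), 0, np.uint8),
                  np.full((2, 2), 200, np.uint8), 0.5)
```

Generating a sequence of frames with t running from 0 to 1 yields the morph animation.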
Figure 15.6  An experiment in morphing. The upper left image morphs into the lower right image with the percentages of each shown. Output is from Morph™ by Gryphon Software Corporation.
Commercial art designers and the video industry frequently
need special effects for enhancing photographic images or modifying portions
of their contents. One example familiar to television viewers is the use
of the mosaic effect to camouflage the faces of participants in
trials. Another popular special effect, called posterizing, involves
reducing a normal gray scale image to a small number of shades, like one,
two, three or four. Programs for art festivals and concerts often employ
posterizing. The mosaic special effect is demonstrated in part (a) and
posterizing to three shades of gray in part (b) of Figure 15.7.
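Posterizing amounts to quantizing the gray scale into a few evenly spaced shades. A minimal NumPy sketch (the bin-snapping scheme is one of several reasonable choices):

```python
import numpy as np

def posterize(gray, levels):
    """Reduce an 8-bit gray-scale image to `levels` shades by snapping
    each pixel to one of `levels` evenly spaced output values."""
    step = 256 // levels                       # width of each quantization bin
    bins = np.minimum(gray // step, levels - 1)
    shades = np.linspace(0, 255, levels).round().astype(np.uint8)
    return shades[bins]

img = np.array([[0, 60, 130, 250]], dtype=np.uint8)
three_shades = posterize(img, 3)     # only the values 0, 128, 255 remain
```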
Figure 15.7  Special effects. (a) The mosaic effect has been applied to the model's face to conceal her identity; (b) the original photograph has been posterized to three shades.
As indicated above, the ultimate goal of image analysis
is the identification of a scene and all the objects in the image. This
is an enormously complex and difficult task, usually classified as computer
vision in the field of AI. The subject involves the integration of
image processing techniques from computer graphics with pattern recognition
techniques from AI and is covered in considerable detail by Schalkoff.
Many image processing programs provide functions such
as sharpen, blur, invert, and mosaic as standard menu options for processing
whole images or selected portions of images. However, for completely general
image processing it is necessary to represent the image in a convenient format
that gives the analyst access to individual pixel values. With such access
the analyst can easily duplicate all of the standard menu functions, tweak
them to optimize the function for particular images, and investigate more
complex transformations for difficult or novel cases. The mathematical
formulations for several image processing transformations are presented
in Rosenfeld and Kak.
Several standard mathematical operations are useful in
transforming graphical images. The image is defined as an intensity function,
f(x,y), at pixel location (x,y). For gray scale images,
f(x,y) may be represented by an 8-bit value in which 0 corresponds
to black and 255 corresponds to white. For 8-bit color, f(x,y)
represents 256 addresses to a color look up table (CLUT) representing the
color palette. For 24-bit ("true") color, f(x,y)
corresponds to a 24-bit code in which the red, green, and blue intensities
are each encoded as 8-bit segments. Such encoding is equivalent to three
overlapping 8-bit intensity functions, fR(x,y), fG(x,y), and fB(x,y).
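A short sketch of this unpacking in Python, assuming the common packing with red in the high byte (the actual byte order depends on the file format and hardware):

```python
def unpack_rgb(pixel):
    """Split a 24-bit 'true color' pixel value into its three 8-bit
    intensity functions fR, fG, fB, assuming red occupies the high byte."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return r, g, b

r, g, b = unpack_rgb(0xFF8040)    # r = 255, g = 128, b = 64
```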
Edge detection is typically the first step in image segmentation,
the preprocessing phase of object identification. The most useful function
for edge detection is the Laplacian operator, defined as:

∇²f = ∂²f/∂x² + ∂²f/∂y²   [15.1]
It is easy to show that the five-point Laplacian,
computed by taking differences from the four nearest neighbor pixels and
assuming Δx = Δy = 1, reduces to:

∇²f(x,y) = f(x+1,y) + f(x - 1,y) + f(x,y+1) + f(x,y - 1) - 4 f(x,y)   [15.2]
That is, the five-point Laplacian is computed by taking
( - 4) times the value of the current pixel and adding to it the
values of the pixels immediately above, below, to the right, and to the left
of it. This can be summarized as applying the following 3 × 3 window
operator to each pixel:

[ 0   1   0 ]
[ 1  -4   1 ]     Five-point Laplacian operator   [15.3]
[ 0   1   0 ]
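Equation 15.2 translates directly into array code. A minimal NumPy sketch that applies the five-point window of Equation 15.3 to the interior pixels of an image:

```python
import numpy as np

# The 3 x 3 window of Equation 15.3
LAPLACIAN_WINDOW = np.array([[0,  1, 0],
                             [1, -4, 1],
                             [0,  1, 0]])

def laplacian(f):
    """Apply the five-point Laplacian of Equation 15.2 to every interior
    pixel; border pixels are left at zero."""
    g = np.zeros_like(f, dtype=float)
    g[1:-1, 1:-1] = (f[2:, 1:-1] + f[:-2, 1:-1] +
                     f[1:-1, 2:] + f[1:-1, :-2] - 4.0 * f[1:-1, 1:-1])
    return g

# A single bright pixel: the response is -4 at the pixel, +1 at each neighbor
img = np.zeros((5, 5)); img[2, 2] = 1.0
lap_img = laplacian(img)
```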
One source of image blurring in photography is the diffusion
of dyes across sharp boundaries of the image. The time-dependent diffusion
equation is given by:

∂g(x,y,t)/∂t = k ∇²g(x,y,t)   [15.4]

where
g(x,y,t) is the time-dependent (degraded) image, and
g(x,y,0) = f(x,y) = original, unblurred image.
By expanding g(x,y,t) about the later time, t = τ,
and keeping only the first-order term,
the original image, f(x,y), can be restored by:

f(x,y) = g(x,y,τ) - kτ ∇²g(x,y,τ)   [15.5]
This is another useful application of the Laplacian operator
and may be represented in terms of discrete pixel coordinates (taking
kτ = 1) by the five-point restoration operator:

[  0  -1   0 ]
[ -1   5  -1 ]     Five-point restoration operator   [15.6]
[  0  -1   0 ]
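The restoration window of Equation 15.6 can be applied the same way. A NumPy sketch (border pixels are simply left unchanged):

```python
import numpy as np

def sharpen(g):
    """Apply the five-point restoration window of Equation 15.6,
    equivalent to g - lap(g); border pixels are left unchanged."""
    f = g.astype(float).copy()
    f[1:-1, 1:-1] = (5.0 * g[1:-1, 1:-1] - g[2:, 1:-1] - g[:-2, 1:-1]
                     - g[1:-1, 2:] - g[1:-1, :-2])
    return f

# A soft ramp from 0 to 4: sharpening steepens and overshoots the edge
soft = np.array([[0., 0., 2., 4., 4.]] * 5)
sharp = sharpen(soft)
```

Along the middle row the ramp 0, 0, 2, 4, 4 becomes 0, -2, 2, 6, 4: the transition is steeper, which the eye reads as a sharper edge.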
Marr and Hildreth proposed a zero crossing edge
detector based on smoothing the edges with a Gaussian function before applying
the Laplacian operator. The circularly symmetric Gaussian function is given by:

G(x,y) = e^( - (x² + y²)/2σ²)   [15.7]

where σ = standard deviation or width of the Gaussian.
The transformed, edge-detected image, f′(x,y),
is then given in terms of the original image, f(x,y), and G(x,y) by:

f′(x,y) = [∇²G(x,y)] ∗ f(x,y)   [15.8]

where ∗ denotes convolution. The ∇²G function, also known as the Laplacian of Gaussian or LoG function, may be computed initially and stored for later convolution with various f(x,y) images.
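SciPy's ndimage module provides the LoG convolution directly as gaussian_laplace, so a Marr-Hildreth style detector reduces to finding where the result changes sign. A sketch, with the zero-crossing test simplified to comparisons against right and lower neighbors:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_edges(f, sigma=2.0):
    """Marr-Hildreth style detection: convolve with the Laplacian of a
    Gaussian of width sigma, then mark sign changes (zero crossings).
    The crossing test here only checks right and lower neighbors."""
    log = gaussian_laplace(f.astype(float), sigma)
    signs = np.sign(log)
    edges = np.zeros(f.shape, dtype=bool)
    edges[:, :-1] |= signs[:, :-1] != signs[:, 1:]
    edges[:-1, :] |= signs[:-1, :] != signs[1:, :]
    return edges

# A vertical step between columns 9 and 10 produces a zero crossing there
step = np.zeros((20, 20)); step[:, 10:] = 255.0
edges = log_edges(step, sigma=2.0)
```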
Note that all three of the above transformations involve
specific instances of the general convolution defined as the 2D sum:

g(x,y) = Σx′ Σy′ h(x - x′, y - y′) f(x′,y′)   [15.9]

That is, the intensity of the transformed image, g(x,y),
at pixel (x,y) is a function of the intensities of other pixels
at points (x′,y′). In all three transformations, the
convolution is local; that is, a given pixel intensity depends only
on nearby pixel values. This means that the h(Δx,Δy)
kernel function dies off rapidly as Δx
and Δy exceed a few pixels. In fact,
the first two transformations involve only Δx
and Δy = ±1.
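The general convolution sum can be evaluated directly, if slowly, with two nested loops. A NumPy sketch for a small, odd-sized kernel h, leaving a border of zeros:

```python
import numpy as np

def convolve2d(f, h):
    """Direct evaluation of the 2D convolution sum
    g(x,y) = sum over (x',y') of h(x - x', y - y') f(x',y')
    for a small (2k+1) x (2k+1) kernel h; edge pixels stay zero."""
    k = h.shape[0] // 2
    g = np.zeros(f.shape, dtype=float)
    h_flip = h[::-1, ::-1]       # h(x - x', y - y') indexes h backwards
    for x in range(k, f.shape[0] - k):
        for y in range(k, f.shape[1] - k):
            g[x, y] = np.sum(h_flip * f[x - k:x + k + 1, y - k:y + k + 1])
    return g

# With the five-point Laplacian kernel this reproduces the earlier operator
h = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
f = np.zeros((5, 5)); f[2, 2] = 1.0
g = convolve2d(f, h)
```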
The 2D Fourier transform, F(u,v), of image f(x,y)
is given by:

F(u,v) = ∫∫ f(x,y) e^( - 2πi(ux + vy)) dx dy
This function acts like a detector of spatial frequency
components of the intensity variations along the x-axis (u
component) and y-axis (v component). The Fourier transform
contributes to image analysis in several areas.
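For sampled images, NumPy's fft2 computes this transform. The sketch below builds an image containing a single horizontal cosine and confirms that the spectrum peaks at the corresponding (u,v) pair and its mirror frequency:

```python
import numpy as np

# An image whose intensity is a pure cosine along x: 4 cycles in 32 pixels
N, u0 = 32, 4
x = np.arange(N)
f = np.cos(2 * np.pi * u0 * x / N)[:, None] * np.ones((1, N))

F = np.fft.fft2(f)           # 2D discrete Fourier transform
power = np.abs(F)

# Spatial frequency detection: the only strong components are at
# (u, v) = (u0, 0) and the mirror frequency (N - u0, 0)
peaks = np.argwhere(power > power.max() / 2)
```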
Suppose you were designing an image analysis tool and wanted to optimize its flexibility and capability. What features would you want to incorporate?
As a starting point, you would probably specify the following.
As the reader has probably guessed, image processing systems
providing all these features and more already exist. We next discuss and
show output from two outstanding examples: the NCSA Image and DataScope
public domain programs and the commercial Spyglass, Inc. programs called
Transform, View, and Dicer.
Simulation studies constitute an area of intensive activity at the National Center for Supercomputer Applications (NCSA) at the University of Illinois. Scientists using NCSA facilities recognized early on the critical need for tools for visualizing the results of their supercomputer analyses. They developed a sophisticated network of Macintosh and Sun workstations to present the results of the supercomputer simulations and to share their analyses with each other. Of even more importance to the broader scientific community, they produced a sophisticated suite of programs, including Image and DataScope, for analysis of their scientific data. These programs are in the public domain and may be obtained without charge from the NCSA.
To facilitate the sharing of data among people, projects,
and machines on the network, NCSA created the Hierarchical Data Format
(HDF). This TIFF-like file structure provides an object-oriented, tag-field
format with the following features:
Figure 15.8  The Hierarchical Data Format (HDF) communication function.
The key communication role played by HDF files is illustrated in Figure 15.8.
Building on the base of the NCSA image processing tools,
Brand Fortner has refined and extended them in the impressive Spyglass
series. These tools, Transform and Dicer, provide all of
the features of our ideal image analysis system and allow the researcher
to view her data through color enhancement, contour mapping, 3D representations,
and slices through 3D data.
The program Transform, from the Spyglass, Inc. series, approximates very closely the features specified for the ideal image analysis system. Figure 15.9 shows two images imported into Transform. The first is an image scanned from a photograph, and the second is an image of a spreadsheet calculation of sine and cosine waves. Also shown in this figure is the mapping of a portion of the image to its HDF spreadsheet representation.
Transform provides a variety of display mode options for helping the user visualize the detailed structure of the data under investigation. Figure 15.10 illustrates several of the options available as menu selections for probing the structure of Figure 15.9(c) in more detail.
As useful as the display modes of Figure 15.10 are, the
real strength of Transform lies in the generality of its image representation
scheme and the ease with which this spreadsheet representation can be transformed.
Figure 15.9  Images imported into Spyglass Transform. (a) Image scanned from a photograph. (b) Small portion of HDF spreadsheet corresponding to the highlighted rectangle at the left-hand side of (a). (c) Image of spreadsheet calculation of the sine × cosine function.
Figure 15.10  Various display modes of Transform.
(a) Surface plot; (b) contour map; (c) color map.
Example 1: Lightening
The transformation of lightening an image is very standard
among image processing tools and readily accomplished with Transform.
1. An HDF file called Steve was opened by the Paste New command with the image in the clipboard. The scanned image had been opened, cropped, and copied to the clipboard by Digital Darkroom, an image processing program.
Figure 15.11  The Color Bar window after shifting the pixel intensity values downward by 100 units. The effect in this inverted palette is to lighten the image shown in Figure 15.12.
Figure 15.12  The Lightening Transformation using Transform.
2. The See Notebook menu item was selected and
the following transformation command typed in:
Steve2='Steve' - 100 [15.10]
This creates a new HDF file, Steve2, in which all of the
intensities are shifted downward by 100 units (in this application, white
= 0, black = 255).
3. One of the dark pixels, now with a value of 143, was
edited to a value of 250 to reestablish the intensity range from about
- 55 to 250, and the Set command of the Color Bar option
clicked to redefine the intensity range as shown in Figure 15.11.
The very same lightening effect could have been achieved
by simply opening the Color Bar window for the original image and
resetting the maximum limit to 350, thereby mapping the 0 - 255 range of
the original image to a lighter portion of the displayed palette. However,
the above procedure indicates the ease with which each pixel of an image
may be transformed. The original image and the lightened image are shown
in Figure 15.12.
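The same pixel-shifting transformation is easy to reproduce outside Transform. A NumPy sketch, clipping to the 0–255 range instead of re-mapping with the Color Bar as the steps above describe:

```python
import numpy as np

def lighten(img, amount=100):
    """Shift every pixel intensity down by `amount`, mimicking the
    Notebook command Steve2 = 'Steve' - 100.  In this example's
    inverted palette (white = 0, black = 255) a downward shift
    lightens the image.  Out-of-range values are clipped here rather
    than re-mapped with the Color Bar as described in the text."""
    shifted = img.astype(int) - amount
    return np.clip(shifted, 0, 255).astype(np.uint8)

# A dark pixel of 243 becomes 143, matching the value quoted in step 3
lighter = lighten(np.array([[30, 130, 243]], dtype=np.uint8))
```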
Example 2: Laplacian Operations - Sharpening and Edge Detection
The implementation of the Laplacian operation and other transformations based on the Laplacian are essentially trivial with the tools provided by Transform. Figure 15.13 illustrates the total extent of the programming required to compute the Laplacian transformation using the five-point operator (Equation 15.3), the sharpening operation (Equation 15.6), and three edge detection strategies based on the Laplacian operation. The original image is indicated as Robot.
To create a new HDF file, Robot2, corresponding to application of the Laplacian operator on the Robot file, the command "Robot2=lap('Robot')" was entered in the Notebook window of the Robot file and the Calculate from Notes menu option selected. The resulting image is shown as Robot2. Since the Laplacian operator produces large positive and negative numbers of approximately equal magnitudes at the intensity discontinuities, the resulting image is predominantly gray, corresponding to an HDF pixel value of zero.
To implement the sharpening or image restoration transformation,
the command "Robot3='Robot' - lap('Robot')" was issued. The
resulting image, Robot3, does, indeed, show sharpened outlines.
Figure 15.13  Laplacian Operations. Robot3 → Sharpening; Robot6 → Edge Detection.
To eliminate the negative intensity values of the Robot2 file, the command "Robot4=abs(lap('Robot'))" was entered in the Robot Notebook, and the image Robot4 was generated. This image suggests the outline of the edges, but its gray scale nature makes them indistinct.
To convert the "analog" Robot4 image to a binary image, the threshold command "Robot5=LTmask(abs(lap('Robot')),50)" was issued, and the image Robot5 appeared. This masking operation can be interpreted as: if a pixel value is less than 50, color it black; else, color it white.
Finally, a very effective edge detector is achieved using the "Greater Than" threshold command and setting the threshold to an intensity value of 20. Robot6 is generated by applying the "Robot6=GEmask(abs(lap('Robot')),20)" transformation to the original Robot image file.
This example illustrates a tiny sample of the functions and transformations available in Transform. Other functions available include ten trigonometric functions, ten mathematical functions, thirty-one functions for manipulating rows, columns, or an array as a whole, seven complex arithmetic functions including the FFT, seven specific kernel convolution functions, and a generic (user-defined) convolution function.
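The Robot6 recipe, thresholding the absolute value of the Laplacian, can be reproduced with a few NumPy functions. The lap and GEmask semantics below are this sketch's interpretation of the Transform commands, not their documented definitions, and the image is a hypothetical stand-in:

```python
import numpy as np

def lap(f):
    """Five-point Laplacian, analogous to Transform's lap() function;
    border pixels are left at zero."""
    g = np.zeros_like(f, dtype=float)
    g[1:-1, 1:-1] = (f[2:, 1:-1] + f[:-2, 1:-1] +
                     f[1:-1, 2:] + f[1:-1, :-2] - 4.0 * f[1:-1, 1:-1])
    return g

def ge_mask(f, threshold):
    """One reading of GEmask: values at or above the threshold become
    white (255), all others black (0)."""
    return np.where(f >= threshold, 255, 0)

# The Robot6 recipe applied to a stand-in image: a bright square on black
robot = np.zeros((6, 6)); robot[2:4, 2:4] = 100.0
robot6 = ge_mask(np.abs(lap(robot)), 20)   # the square's edges turn white
```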
Transform also illustrates the power of abstraction.
By considering images as named objects (in an OOP sense) and supplying
a library of abstract, high-level messages that the user can send to the
objects, this system provides tools for carrying out exceedingly complex
image processing operations with a minimum of commands. This is really
what abstraction is all about.
Assume your task is to design an optimum system for viewing
3D data arrays. Such data may involve a series of CAT scan images taken
every two millimeters through the brain, or a set of neutron flux measurements
at a lattice of (x,y,z) points in a fission reactor core, or a supercomputer
simulation of the interaction of an extragalactic jet passing through an
intergalactic shock wave. Much of the experimental data collected by seismologists
and meteorologists is intrinsically 3D in nature, as are the theoretical
calculations in astrophysics and fluid dynamics. What visualization techniques
would help investigators in these areas interpret their 3D data?
An optimum 3D visualization system should provide, among
others, the following features for viewing 3D data.
An outstanding system supporting all of these features
and many more is marketed by Spyglass, Inc., under the name Dicer.
The visualization environment that Dicer presents
to the user is illustrated in Figure 15.14. The data set is a snapshot
of a 3D, time-dependent simulation of a supersonic jet of material interacting with an ambient medium moving at right angles to the jet. The figure shows the tools menu along the top, the color palette along the bottom, and three perpendicular slices in the viewing workspace. The menu tools fall into two general classes:
Figure 15.14  The Dicer Visualization Environment.
The three classes of projection objects and manipulations are:
Projection surfaces through the data volume, parallel to the three Cartesian planes.
Projection blocks and cutouts. Cutouts are invisible, except for their intersection with blocks.
This tool is used to select and drag projection objects.
Figure 15.15  Configurations of data slices in Dicer.
Intelligent color manipulation offers a powerful technique
for 3D visualization. Dicer provides four color tools for manipulation
of the color palette and the active or current color. These are
The color palette is displayed, with current color on the left.
Clicking on any color region converts that color to the current color.
Changes the current color to the color sampled by clicking this icon.
Causes selected colors to become invisible. Allows viewing through objects.
Inverse operation to the paint tool. Clicking on painted color returns it to palette color.
Example 1: Use of Data Slices
Depending on the nature of the 3D data, various arrangements
of slicing planes or combinations of planes may prove most effective for
visualization purposes. Figure 15.15 shows some of the possible configurations.
Example 2: Use of Data Cubes
Data cubes support the union and subtraction operations
of constructive solid geometry (CSG). Figure 15.16 shows four arrangements
of data cubes and cutouts.
Figure 15.16  Data cubes for visualization. (a) and (b) are combinations of simple data cubes; (c) and (d) are single data cubes with cutouts.
Example 3: Use of Color Tools
Judicious use of color can assist the researcher in highlighting certain features and discovering new patterns in the data under investigation. Two Dicer features offer particularly powerful tools for exploring the structure of the user's data. With the Paint option, the user can use a bright color to pour into the model to highlight contours of a given volume element (or voxel) intensity. The selected color replaces the color under the cursor and serves to trace the location of the original color contour throughout the model. The use of the Paint feature is shown in Figure 15.17(b).
The second feature involves a combination of the Solid
Fill and Transparency options. First, the Solid Fill
option is selected to compute the color at each 3D voxel (volume element)
of the model. Then the Transparency tool is used to turn some range
of the color palette invisible. Figure 15.17(c) shows the effect of the
Transparency tool alone in dissolving away the invisible colors,
leaving only the selected ones projected on the walls of the data cube.
In Figure 15.17(d) the Solid Fill option has been used to dissolve
away invisible colors and leave the 3D configuration of the remaining color
contours. This tool offers an exceptionally useful technique for studying
complex structures in 3D space.
Figure 15.17  Color enhancements. (a) New color table; (b) effect of painting the dark shade with the current color; (c) effect of transparency in turning off darker shades; (d) Solid Fill effect showing the 3D contour for this shade (darker shades are transparent).
One of the most valuable and practical areas of image
processing is that of image compression. An excellent theoretical background
and introduction to this topic is presented by Jain et al. We have
already introduced one promising image compression algorithm, based on
the Collage Theorem, in the chapter on fractals. Two more techniques
are discussed here.
The first technique is based on redundancy reduction and pattern encoding. A prime example of this technique is the program Disk Doubler. By encoding repetitive character patterns numerically, Disk Doubler is able to reduce database files to nearly twenty-five percent of their original size and image files to between thirty and fifty percent of their original size.
Disk Doubler requires just a few seconds to compress
most files. Compressed files retain their names and are automatically decompressed
as they are opened by application programs.
Far more impressive compression ratios are achieved by the JPEG (Joint Photographic Experts Group) algorithm. The JPEG specification has been approved by the International Organization for Standardization (ISO) and emerged from experiments in communications services such as the French Minitel videotex system. Using averaging techniques, JPEG routinely achieves up to 20:1 compression ratios with very little image degradation, and even at 50:1, a quilting effect is barely discernible.
The JPEG procedure operates on image files represented by video signals. Standard PICT, TIFF, and EPS image files must, therefore, first be converted to the video signal values of chrominance and luminance. JPEG then divides the image into 8 × 8 pixel blocks, to which a variation of the Fourier transform called the discrete cosine transform (DCT) is applied. The DCT measures the variation in the chrominance and luminance within each block. Blocks with little variation can be represented compactly as average values.
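The heart of this step can be sketched with SciPy's dctn. The example below transforms a nearly uniform 8 × 8 block and shows that its energy collapses into the single DC (average) coefficient, which is why such blocks compress so well:

```python
import numpy as np
from scipy.fft import dctn

# An 8 x 8 block of nearly constant luminance, like a patch of clear sky
rng = np.random.default_rng(0)
block = np.full((8, 8), 128.0) + rng.normal(0.0, 0.5, (8, 8))

coeffs = dctn(block, norm='ortho')     # 2D discrete cosine transform

# With the orthonormal DCT the (0,0) entry is 8 times the block's mean
dc = coeffs[0, 0]
ac_energy = np.sum(coeffs ** 2) - dc ** 2   # energy in the other 63 terms
# Nearly all of the energy sits in dc, so the AC terms can be coarsely
# quantized or dropped with little visible degradation
```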
Several commercial JPEG image compression systems, using
both hardware and software techniques, are now available on personal workstations.
Apple Computer has adopted the JPEG standard as the image compression technique
for its QuickTime multimedia protocol. Kodak offers a package called
Colorsqueeze based on the JPEG standard. Adriann Ligtenberg, one
of the authors of the JPEG specifications draft, is a founder of Storm
Technology which has pioneered image compression techniques. This company
offers a combination hardware/software package called PicturePress
which uses an extension called JPEG++. PicturePress allows the user
to apply different compression ratios to different regions of an image.
Figure 15.18 shows a portion of an image compressed at a 10:1 ratio by PicturePress.
Figure 15.18  Image decompressed by PicturePress and reduced by a factor of two. The original 770K image was compressed to a 77K JPEG++ file.
In this chapter we have attempted to integrate the areas of computer graphics, image processing and image analysis. Classical computer graphics is concerned with the synthesis of images based on messages, often symbolic, from the designer. Image processing involves the transformation of images, generally at the pixel level, to produce new images of greater clarity or more value to the user. Image analysis is described as the set of techniques required to extract symbolic information from the image data. Many of the techniques of image processing, such as edge detection, are valuable at the early stages of image analysis. While many of the pattern recognition algorithms of image analysis fall more in the domain of artificial intelligence than computer graphics, image processing effectively links the two areas.
The first half of the chapter illustrated many of the image processing tools available in standard graphics packages. These tools are frequently more than adequate for standard applications such as sharpening, blurring, and tracing outlines. More complex applications require tools with more flexibility and generality. A scientific visualization program was demonstrated for importing images from various file formats and processing them with a wide range of mathematical transformations. Sophisticated techniques for visualizing and analyzing complex 3D data were presented next, and the chapter concluded with a discussion and demonstration of the JPEG image compression technique.