
4. Long Write Up

4.1 One-dimensional wavelet transform

4.1.1 Continuous WT

Formally, the wavelet transform of a function $ f(x)\in {L^2(\mathbb{R})}$ is a projection of $ f$ onto the basic wavelet $ \psi$ dilated by a factor $ a$ and shifted by $ b$:

$\displaystyle W_{\psi}(a,b)[f]= \int\limits_{-\infty}^{\infty} \frac{1}{\sqrt{\vert a\vert}} \overline{\psi}\left(\frac{x-b}{a}\right) f(x)dx.$ (1)

The reconstruction, or inverse wavelet transform, is

$\displaystyle f(x) = C_\psi^{-1}\int\int \psi\left( \frac{x-b}{a} \right) W_\psi(a,b)\frac{da\,db}{a^2},$ (2)

where $ C_\psi$ is the normalization constant for a given basic wavelet $ \psi$. It can be evaluated easily by substituting ([*]) into ([*]) and applying the Fourier transform

$\displaystyle \phi(x) = \int_{-\infty}^{\infty} \exp(\imath\omega x) \hat\phi(\omega) \frac{d\omega}{2\pi}$

of each term of ([*]), if the function $ \psi$ is chosen so that the integrals are finite:

$\displaystyle C_{\psi}= \int\limits_{-\infty}^{\infty} \frac{\vert\hat\psi(\omega)\vert^2}{\vert\omega\vert} d\omega <\infty ,$ (3)

where the Fourier transform of $ \phi(x)$ is denoted as $ \hat\phi(\omega)$.

The admissibility condition ([*]), which determines whether a function $ \psi$ can serve as a basic wavelet, is rather loose. That is why there exist many different basic wavelets, invented by Daubechies [6], Meyer [7], Mallat and others [8].
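For illustration, the convolution ([*]) can be evaluated by direct quadrature. The following NumPy sketch (our own illustrative code, not the WASP implementation; all function names are ours) computes $ W_\psi(a,b)$ on a grid of scales and shifts for the real-valued $ g_2$ wavelet, so the complex conjugation is trivial:

```python
import numpy as np

def mexican_hat(t):
    """g_2 wavelet (Mexican hat), unnormalized: (1 - t^2) exp(-t^2/2)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt_direct(f, x, scales, shifts, psi=mexican_hat):
    """Direct quadrature of W(a,b) = (1/sqrt|a|) int psi((x-b)/a) f(x) dx
    for a real-valued wavelet psi (conjugation omitted)."""
    dx = x[1] - x[0]
    W = np.empty((len(scales), len(shifts)))
    for i, a in enumerate(scales):
        for j, b in enumerate(shifts):
            W[i, j] = np.sum(psi((x - b) / a) * f) * dx / np.sqrt(abs(a))
    return W

# example: two superposed harmonics, similar to the examples below
x = np.linspace(0, 10, 2048)
f = np.sin(2 * np.pi * 0.5 * x) + 0.2 * np.sin(2 * np.pi * 4.0 * x)
W = cwt_direct(f, x, scales=[0.25, 0.5, 1.0, 2.0], shifts=x[::64])
```

The double loop is deliberately naive; the sections below discuss how WASP avoids this cost.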

4.1.1.1 Gaussian wavelets

Amongst all differentiable functions used as wavelets, the derivatives of the Gaussian

$\displaystyle g_n(x)=(-1)^{n+1}\frac{d^n}{dx^n}{\rm e}^{-x^2/2},\;\; n > 0,$ (4)

also called the vanishing-moments wavelets for their property $ \int g_n(x)x^m dx =0$, $ m<n$, are of high importance. Roughly speaking, the $ g_n(x)$ wavelet, when used for the analysis of a function $ f(x)$, is most sensitive to the singularities of the $ n$-th derivative $ f^{(n)}(x)$. The normalizing coefficients for the $ g_n$ family are $ C_{g_n}=2\pi(n-1)!$. The Gaussian itself ($ g_0$) is not an admissible wavelet, since it does not satisfy the condition ([*]).

It follows from the definition ([*]) that

$\displaystyle \frac{d\,g_n(x)}{dx}=-g_{n+1}(x).$ (5)

Other recurrence relations, valid for the Hermite polynomials, also hold for $ g_n$.
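The family ([*]) and the recurrence ([*]) are easy to verify numerically. A small sketch (illustrative only; it relies on the identity $ g_n(x) = -He_n(x)\,e^{-x^2/2}$, where $ He_n$ are the probabilists' Hermite polynomials):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def g(n, x):
    """g_n(x) = (-1)^{n+1} d^n/dx^n exp(-x^2/2) = -He_n(x) exp(-x^2/2)."""
    c = np.zeros(n + 1)
    c[n] = 1.0                                  # coefficient vector selecting He_n
    return -hermeval(x, c) * np.exp(-x**2 / 2.0)

x = np.linspace(-12.0, 12.0, 200001)
dx = x[1] - x[0]

# vanishing moments: int g_n(x) x^m dx = 0 for m < n (simple Riemann sums)
for n in (1, 2, 3):
    for m in range(n):
        assert abs(np.sum(g(n, x) * x**m) * dx) < 1e-8

# recurrence (5): d g_n / dx = -g_{n+1}, checked by finite differences
assert np.max(np.abs(np.gradient(g(2, x), x) + g(3, x))) < 1e-3
```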

The localization properties of the $ g_n$ family can be evaluated explicitly. In the general case the continuous wavelet transform with basic wavelet $ \psi$ is centered at $ t^*$ and has the window width $ 2\Delta_\psi$:

$\displaystyle t^* = \frac{1}{\Vert\psi\Vert^2_2} \int_{-\infty}^\infty x \vert\psi(x)\vert^2 dx,$ (6)
$\displaystyle \Delta_\psi = \frac{1}{\Vert\psi\Vert _2} {\left\{ \int_{-\infty}^\infty (x-t^*)^2 \vert\psi(x)\vert^2 dx \right\} }^{1/2} .$ (7)

So, the wavelet coefficients $ W_\psi(a,b)[f]$ are localized within the time window

$\displaystyle [ b + at^* - a\Delta_\psi, b + at^* + a\Delta_\psi ].$ (8)

At the same time its Fourier spectrum is localized within the frequency window

$\displaystyle \left[ \frac{\omega^*}{a} - \frac{1}{a}\Delta_{\tilde\psi}, \frac{\omega^*}{a} + \frac{1}{a}\Delta_{\tilde\psi} \right] \,\, .$

Therefore, the wavelet transform unfolds the signal locally in both time and frequency. It is worth noting that the ratio of the central frequency $ \omega^*$ to the window width is scale-independent:

$\displaystyle \frac{\omega^*/a}{2\Delta_{\tilde\psi}/a}=\frac{\omega^*}{2\Delta_{\tilde\psi}}.$ (9)

A detailed study of the localization properties of the wavelet transform can be found in [6]. As an example, let us present the window widths for the first two wavelets from the vanishing-moments family ([*]). Using the definition ([*]):

$\displaystyle \Delta_{g_n} = \frac{1}{\Vert{g_n}\Vert _2} {\left\{ \int\limits_{-\infty}^\infty x^2 \vert g_n(x)\vert^2 dx \right\}}^{1/2}, \qquad \Vert{g_n}\Vert _2 \equiv \sqrt{\int_{-\infty}^\infty \vert g_n(x)\vert^2 dx}$

and Gaussian integrals

$\displaystyle \int\limits_{-\infty}^\infty x^{2n} \exp(-\alpha x^2) dx = (-1)^n \left( \frac{d}{d\alpha}\right)^n\sqrt{\frac{\pi}{\alpha}}$

for the first two wavelets we get

Wavelet   $ L_2$-norm                                   Window width
$ g_1$    $ \Vert g_1\Vert _2=\pi^{1/4}/\sqrt2$         $ \Delta_{g_1} = \sqrt{3/2}$
$ g_2$    $ \Vert g_2\Vert _2=\sqrt{3/4}\,\pi^{1/4}$    $ \Delta_{g_2}=\sqrt{7/6}$
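These values can be cross-checked by quadrature, using $ t^*=0$ (both wavelets have even $ \vert g_n(x)\vert^2$). A short sketch (illustrative code, not part of WASP; the check confirms $ \Delta_{g_1}=\sqrt{3/2}$):

```python
import numpy as np

x = np.linspace(-15.0, 15.0, 400001)
dx = x[1] - x[0]

g1 = -x * np.exp(-x**2 / 2)                     # g_1 (up to an overall sign)
g2 = (1 - x**2) * np.exp(-x**2 / 2)             # g_2, the Mexican hat

for gn, norm_ref, width_ref in (
        (g1, np.pi**0.25 / np.sqrt(2), np.sqrt(3 / 2)),
        (g2, np.sqrt(3 / 4) * np.pi**0.25, np.sqrt(7 / 6))):
    norm = np.sqrt(np.sum(gn**2) * dx)                   # L2-norm by quadrature
    width = np.sqrt(np.sum(x**2 * gn**2) * dx) / norm    # Delta with t* = 0
    assert abs(norm - norm_ref) < 1e-6
    assert abs(width - width_ref) < 1e-6
```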

4.1.1.2 Examples

As the first example of how wavelets work, let us take a harmonic signal constructed by superposing a low-frequency component with a small fraction of a high-frequency one and then contaminating it with uniformly distributed random noise, see Fig. [*].

Fig.: Sample of a signal comprised of two harmonic components: a) without noise, b) contaminated by noise
a) \includegraphics[width=6cm]{fig/e10-2001-205/CWT/signal3.eps}
b) \includegraphics[width=6cm]{fig/e10-2001-205/CWT/signal4.eps}

Let us perform the $ g_2$-wavelet transform (widely known as the "Mexican hat") of the contaminated signal. Fig. [*] (left) presents the wavelet spectrum. The shade-plot provides a powerful tool for displaying the structure of the signal. The set of wavelet coefficients can be presented as a projection of the 3-dimensional surface $ W=W(a,b)$ onto the $ a$-$ b$ plane. Coefficients with higher values are shown in light colors, the lower ones in dark. Although the wavelet spectrum is very informative, it often carries much visually redundant information. To increase the contrast, the gray-scale image is often transformed into the so-called wavelet skeleton. Lines of the skeleton correspond to local maxima of the wavelet coefficients.

Fig.: Wavelet spectrum (left) and its skeleton (right) calculated for the signal shown in Fig. [*]b. Horizontal axis corresponds to the time parameter ($ b$), vertical one -- to the scale parameter ($ a$).
\includegraphics[width=6cm]{fig/e10-2001-205/CWT/spec25c.eps}
\includegraphics[width=6cm]{fig/e10-2001-205/CWT/skel16.eps}

In the next figure the results of the final $ g_2$-filtering of the same signal are shown. First, we apply the $ g_2$-filter at four selected scales only: 32, 64, 128 and 256. The result of the inversion is the signal in Fig. [*]a. When the signal is processed at scales 1, 2, 4, 8 and 16, the inversion gives the result in Fig. [*]b.

Fig.: Extracting data with $ g_2$ filter from the signal of Fig. [*]a: a) low frequency component, b) high frequency component.
a) \includegraphics[width=6cm]{fig/e10-2001-205/CWT/result1.eps}
b) \includegraphics[width=6cm]{fig/e10-2001-205/CWT/result2.eps}

It is seen that the filter makes it possible to extract both components of the original signal. Thus, by properly selecting the scales of the wavelet transform, one can highlight the components of the desired scales. It should be pointed out that a simple one-step Fourier filter would fail on the same signal. The extraction could, in principle, be done with a Fourier filter as well, but only as a two-step procedure: to select the high-frequency short-lived sine wave one should first extract the low-frequency wave and then subtract it from the signal. The wavelet filter allows a direct extraction of the desired component.

Fig.: Skeleton of the signal after denoising by the $ g_2$ wavelet. Comparison with Fig. [*] (right) shows the disappearance of all short-lived noise features at scales smaller than $ a=2$.
\includegraphics[width=6cm]{fig/e10-2001-205/CWT/skel05.eps}

Another important application of wavelets is signal denoising. The denoising procedure consists in nullifying the wavelet coefficients with small amplitudes before the inverse transform. The $ g_2$ skeleton of the signal after such denoising is shown in Fig. [*]. As one can see, the typical high-frequency part produced by the noise is efficiently suppressed. The inverse transform then restores the corrupted signal nearly to its original view in Fig. [*]a.

4.1.1.3 Implementation in WASP

The direct evaluation of the convolution in the direct ([*]) and inverse ([*]) wavelet transforms may be performed numerically, but it is expensive in memory and CPU time. At the same time, the self-similarity property of the WT suggests methods for constructing fast and effective algorithms. The straightforward way of implementing the WT is to restrict the calculation to a discrete sub-lattice

$\displaystyle a = a_0^m, \quad b = n b_0 a_0^m,\quad \hbox{where}\ m,n \in \mathbb{Z},\quad b_0\ne0,$

with the dilation step greater than unity, $ a_0>1$. The bi-parametric family of wavelets $ U(b,a)\psi(x)$ then becomes a discrete set

$\displaystyle \psi_{mn}(x) = a_0^{-m/2}\psi(a_0^{-m}x - nb_0),$ (10)

labeled by two integers $ m,n \in \mathbb{Z}$.

An inverse discrete transform

$\displaystyle f(x) = \frac{2}{A+B}\sum_{m,n} \langle \psi_{mn},f \rangle \psi_{mn}(x) + \hbox{error terms},$ (11)

turns out to be numerically stable if for some $ A>0$, $ B<\infty$

$\displaystyle A \Vert f\Vert^2 \le \sum_{mn} \vert\langle \psi_{mn},f \rangle \vert^2 \le B \Vert f\Vert^2.$

If this is the case, the basis $ \psi_{mn}$ is called a frame. In general, a frame is not orthogonal: only the case $ A=B=1$ gives an orthogonal decomposition in ([*]).
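The frame condition can be probed numerically for a given lattice. The sketch below (a crude illustration with arbitrary parameters, not a proof of frame bounds) builds the discrete family ([*]) on the dyadic lattice $ a_0=2$, $ b_0=1$ and evaluates the ratio $ \sum_{mn}\vert\langle \psi_{mn},f \rangle\vert^2 / \Vert f\Vert^2$ for a test signal; for a frame this ratio must lie between $ A$ and $ B$:

```python
import numpy as np

def psi_mn(psi, m, n, x, a0=2.0, b0=1.0):
    """Discrete wavelet family: a0^{-m/2} psi(a0^{-m} x - n b0)."""
    return a0**(-m / 2.0) * psi(a0**(-m) * x - n * b0)

mex = lambda t: (1 - t**2) * np.exp(-t**2 / 2)   # g_2 as the basic wavelet

x = np.linspace(-40.0, 40.0, 16001)
dx = x[1] - x[0]
f = np.exp(-(x - 1.3)**2)                        # arbitrary test signal

coeffs = [np.sum(psi_mn(mex, m, n, x) * f) * dx  # <psi_mn, f> by quadrature
          for m in range(-3, 4) for n in range(-40, 41)]
ratio = np.sum(np.abs(coeffs)**2) / (np.sum(f**2) * dx)
assert 0.0 < ratio < np.inf    # finite and nonzero, consistent with a frame
```

The truncation of the $ m,n$ ranges and the quadrature grid are, of course, additional approximations.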

However, if the analyzed signal consists of a discrete set of measured counts $ \{h_k,\;k=\overline{1,\,N}\}$, we can realize two discretization schemes that use explicit integration of the Gaussian wavelets over bins to speed up the computations.

For a discretized signal

$\displaystyle f(x)= \frac{1}{N} \sum\limits_{k=1}^{N} \delta(x-x_k) \,\, ,$ (12)

([*]) is written as
$\displaystyle W_{\psi}(a,b)f= \frac{1}{N} \frac{1}{\sqrt{\vert a\vert}} \sum\limits_{k=1}^{N} \psi\left(\frac{x_k-b}{a}\right) \,\, ,$ (13)

and it looks like the Parzen-Rosenblatt estimate of an unknown probability density over a sample (see [9]). ([*]) is nothing but the ``averaged sum'' of the wavelets $ \psi\left[(x_k-b)/a\right]$ compressed to the size $ a$ and ``placed'' at the points $ x_k$.

The formula ([*]) contains no integral, and this fact allows one to speed up the wavelet transform algorithm for discretized signals like ([*]). Thus, ([*]) is just what is used in WASP for the wavelet analysis.
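The sum over counts translates directly into code. A minimal sketch (illustrative, not the WASP routine): for a cluster of counts, the response is maximal at the matching shift $ b$.

```python
import numpy as np

def cwt_counts(xs, a, b, psi):
    """CWT of a sample {x_k} viewed as a sum of delta functions:
    W(a,b) = (1/N) (1/sqrt|a|) sum_k psi((x_k - b)/a)."""
    xs = np.asarray(xs, dtype=float)
    return np.sum(psi((xs - b) / a)) / (len(xs) * np.sqrt(abs(a)))

mex = lambda t: (1 - t**2) * np.exp(-t**2 / 2)

# a cluster of counts around x = 5: the response peaks near b = 5
xs = 5.0 + 0.3 * np.random.default_rng(0).standard_normal(500)
bs = np.linspace(0.0, 10.0, 101)
W = np.array([cwt_counts(xs, 1.0, b, mex) for b in bs])
assert abs(bs[np.argmax(W)] - 5.0) < 0.5
```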

However, especially for Gaussian wavelets (GW), there is one more way to improve this algorithm [5]. Having a discretized signal $ \{h_k,\;k=\overline{1,\,N}\}$, one can define

$\displaystyle W_{g_n}(a,\,b)\,h=\frac{1}{\vert a\vert}\sum\limits_{k=1}^N h_k \int \limits_{x_{k-1}}^{x_k} g_n \left(\frac{x-b}{a}\right)\,dx,$

where $ x_k=x_0+k\,\Delta x$.

Due to ([*]) we can replace the integral by the difference

$\displaystyle W_{g_n}(a,\,b)\,h=\frac{a}{\sqrt{\vert a\vert}} \sum\limits_{k=1}^{N} h_k \left[ g_{n-1}\left(\frac{x_{k-1}-b}{a}\right)- g_{n-1}\left(\frac{x_k- b}{a}\right)\right].$ (14)

To speed up the calculations it is better to rearrange the summands:


$\displaystyle W_{g_n}(a,\,b)\,h=\frac{a}{\sqrt{\vert a\vert}} \left[h_1\, g_{n-1}\left(\frac{x_0-b}{a}\right) + \sum\limits_{k=1}^{N-1}(h_{k+1}-h_k)\, g_{n-1}\left(\frac{x_k-b}{a}\right) - h_N\,g_{n-1}\left(\frac{x_N-b}{a}\right)\right].$ (15)

That gives about a twofold gain in speed in comparison with ([*]).
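Both forms can be compared directly. In the sketch below (illustrative code; $ h_k$ is stored 0-based and $ x_0,\dots,x_N$ are the bin edges) `w14` follows the difference formula with $ 2N$ evaluations of $ g_{n-1}$ per coefficient, while `w15` uses the rearranged sum with only $ N+1$ evaluations; the two results agree to rounding error.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def g(n, t):
    """g_n via probabilists' Hermite polynomials: g_n = -He_n(t) exp(-t^2/2)."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return -hermeval(t, c) * np.exp(-np.asarray(t, float)**2 / 2)

def w14(h, x, a, b, n=2):
    """Difference form: 2N evaluations of g_{n-1}."""
    s = sum(h[k] * (g(n - 1, (x[k] - b) / a) - g(n - 1, (x[k + 1] - b) / a))
            for k in range(len(h)))
    return a / np.sqrt(abs(a)) * s

def w15(h, x, a, b, n=2):
    """Rearranged form: only N+1 evaluations of g_{n-1}."""
    N = len(h)
    s = h[0] * g(n - 1, (x[0] - b) / a)
    s += sum((h[k + 1] - h[k]) * g(n - 1, (x[k + 1] - b) / a)
             for k in range(N - 1))
    s -= h[-1] * g(n - 1, (x[N] - b) / a)
    return a / np.sqrt(abs(a)) * s

h = np.random.default_rng(1).standard_normal(64)   # counts h_1..h_N
x = np.linspace(0.0, 1.0, 65)                      # bin edges x_0..x_N
assert abs(w14(h, x, 0.1, 0.5) - w15(h, x, 0.1, 0.5)) < 1e-10
```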

However, this expression is still rather slow to evaluate and can be used only for obtaining a few coefficients. For example, applying ([*]) to calculate $ W_{g_n}(a,\,b_j)\,h$ with $ a$ fixed and $ b$ running over $ M$ values $ b_j=b_0+j\,\Delta b$, $ j=\overline{0,\,M-1}$, one needs to calculate $ M(N+1)$ values of the wavelet $ g_{n-1}$ at the points $ \frac{x_k-b_j}{a}$. Nevertheless, if we restrict the choice of the shift step to

$\displaystyle \Delta b=\frac{\Delta x}{n},\;\;n\in {\rm\bf N}$ (16)

or

$\displaystyle \Delta b=n\Delta x, \;\;n\in {\rm\bf N}$ (17)

it can speed up the calculation process. For instance, in the first case ([*]), to calculate $ W_{g_n}(a,\,b)\,h$ one needs to obtain $ g_{n-1}$ at the points $ \frac{x_0-b_0}{a}+\frac{\Delta b}{a}(kn-j)$. The GW argument is uniquely determined by the integer index $ (kn-j)$. All possible combinations of $ k$ and $ j$ give $ nN+M+1$ different values. This number grows much more slowly with $ N$ and $ M$ than in the case of calculation by the formula ([*]). Thus we consider the values of $ g_{n-1}$ as a vector with $ nN+M+1$ components

$\displaystyle g^*=\left\{g_{n-1}\left(\frac{x_0-b_0}{a}+\frac{\Delta b}{a}(kn-j)\right)\right\}. $

The input discrete data can also be considered as a vector with $ N+1$ components

$\displaystyle h^* =\{h_1,\,h_2-h_1,\,h_3-h_2,\,\ldots ,\,h_N-h_{N-1}, -h_N\}. $

Then the row of wavelet coefficients can be obtained as the scalar product of the vectors $ h^*$ and $ g^{**}$, where the components of $ g^{**}$ are obtained from $ g^*$ simply by selecting the components with indexes $ kn-j$.

In the second case ([*]) one has the vector $ W_{g_n}(a,\,b_j)\,h$ with $ nN+M+1$ components and should replace $ \Delta b$ and $ kn-j$ everywhere by $ \Delta x$ and $ k-jn$.
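The first scheme can be sketched as follows (illustrative code with our own names; `nsub` plays the role of $ n$ in $ \Delta b=\Delta x/n$). The table $ g^*$ is computed once on the integer grid, and every coefficient becomes a scalar product of $ h^*$ with a gathered slice of the table:

```python
import numpy as np

def cwt_table(h, x0, dx, b0, M, a, nsub, g):
    """Table-lookup scheme for the shift step Db = dx/nsub.
    g^* is evaluated once at (x0-b0)/a + (Db/a)*i for every integer
    index i = k*nsub - j; each W(a, b_j) is a dot product with h^*."""
    N = len(h)
    db = dx / nsub
    idx = np.arange(-(M - 1), nsub * N + 1)           # all possible k*nsub - j
    gstar = g((x0 - b0) / a + (db / a) * idx)         # one pass over the table
    hstar = np.concatenate(([h[0]], np.diff(h), [-h[-1]]))   # N+1 components
    k = np.arange(N + 1)
    W = np.empty(M)
    for j in range(M):
        W[j] = hstar @ gstar[k * nsub - j + (M - 1)]  # shift to a 0-based index
    return a / np.sqrt(abs(a)) * W

g1 = lambda t: -t * np.exp(-t**2 / 2)                 # g_{n-1} for n = 2
h = np.sin(np.linspace(0.0, 3.0, 128))
W = cwt_table(h, x0=0.0, dx=1.0, b0=10.0, M=16, a=4.0, nsub=4, g=g1)

# cross-check one coefficient against the rearranged sum evaluated directly
b3, xs = 10.0 + 3 * 0.25, np.arange(129.0)
direct = (4 / np.sqrt(4.0)) * (h[0] * g1((xs[0] - b3) / 4)
          + np.sum(np.diff(h) * g1((xs[1:-1] - b3) / 4))
          - h[-1] * g1((xs[-1] - b3) / 4))
assert abs(W[3] - direct) < 1e-10
```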

The next substantial resource for increasing the speed of the calculations is the practically compact support of the GW in space.

4.1.2 Discrete WT

The main idea of the discrete wavelet transform is to represent given data as a decomposition over the basis functions $ {\psi_{j,k}={2^{-j/2}\psi(2^{-j}t-k)}}$, $ {\phi_{j,k}={2^{-j/2}\phi(2^{-j}t-k)}}$, constructed as scaled and shifted copies of two basic functions: the wavelet $ \psi$ and the scaling function $ \phi$. Both of these functions should be localized in the time/space and frequency domains. The purpose is to get the following decomposition of the source data:
$\displaystyle f(x_i)= \sum_{k=1}^{N}{s_k\phi_{Lk}(x_i)}+ \sum_{j=1}^{L}{\sum_{k=1}^{N}{d_{jk}\psi_{jk}(x_i)}} \,\, ,$

where $ N$ is the number of samples in the data sequence and $ L$ is the decomposition level. As the result, we can describe our signal in terms of the coefficients $ {s_k}$ and $ {{d_{j,k}}}$, where $ s_k$ are called approximation coefficients and $ d_{j,k}$ are the details at the $ j$-th level of decomposition. The way to obtain this representation of the signal is shown in Fig. [*].

Fig.: Wavelet decomposition scheme
\includegraphics[width=140mm]{fig/e10-2001-205/DWT/wdecomposition.eps}

Fig.: Scheme of one decomposition step
\includegraphics[width=140mm]{fig/e10-2001-205/DWT/filters.eps}

Each decomposition step is realized by using two filters: $ h_k$ -- a low-pass filter, and $ g_k$ -- a high-pass filter. The filters $ h_k$ and $ g_k$ depend on the functions $ \phi$ and $ \psi$ as follows:

$\displaystyle \phi(x)=2^{1/2}\sum_{k=0}^{l-1}{h_k\phi(2x-k)} \,\, , \quad \psi(x)=2^{1/2}\sum_{k=0}^{l-1}{g_k\phi(2x-k)} \,\, .$

So, $ h_k$ and $ g_k$ are solutions of these scaling equations, where $ l$ is the length of the filter. Fig. [*] shows one decomposition step.
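One decomposition step can be sketched with the shortest filter pair, the Haar filters ($ l=2$). This is illustrative code, not the WASP routine; periodic boundary handling is used for simplicity:

```python
import numpy as np

def dwt_step(signal, h, g):
    """One decomposition step: filter with h (low-pass) and g (high-pass),
    then downsample by 2. Periodic boundary handling."""
    s = np.asarray(signal, float)
    n = len(s)
    approx = np.array([sum(h[i] * s[(2 * k + i) % n] for i in range(len(h)))
                       for k in range(n // 2)])
    detail = np.array([sum(g[i] * s[(2 * k + i) % n] for i in range(len(g)))
                       for k in range(n // 2)])
    return approx, detail

# Haar filters as an illustration (l = 2)
h = np.array([1.0, 1.0]) / np.sqrt(2)
g = np.array([1.0, -1.0]) / np.sqrt(2)

s = np.array([2.0, 4.0, 6.0, 8.0])
a, d = dwt_step(s, h, g)
assert np.allclose(a, [6 / np.sqrt(2), 14 / np.sqrt(2)])   # pairwise sums / sqrt(2)
assert np.allclose(d, [-2 / np.sqrt(2), -2 / np.sqrt(2)])  # pairwise diffs / sqrt(2)
```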

4.1.3 Data filtering

The filtering is carried out in three steps:

  1. calculate the set of wavelet coefficients;
  2. cut off the spectrum (e.g. set to zero all coefficients whose values are less than a prescribed threshold);
  3. perform the inverse wavelet transform of the obtained spectrum.

Fig.: Wavelet thresholding rules
\includegraphics[width=120mm]{fig/e10-2001-205/DWT/thresh.eps}

One of the useful properties of the WT is its ability to denoise data by thresholding. There are two basic rules yielding a good result: "hard" and "soft" thresholding. Both rules are represented in Fig. [*].
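The two thresholding rules can be written in a few lines (an illustrative sketch):

```python
import numpy as np

def hard_threshold(w, t):
    """Hard rule: zero coefficients below the threshold, keep the rest as is."""
    return np.where(np.abs(w) > t, w, 0.0)

def soft_threshold(w, t):
    """Soft rule: zero coefficients below the threshold, shrink the rest by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([-3.0, -0.5, 0.2, 1.0, 4.0])
assert np.allclose(hard_threshold(w, 1.0), [-3.0, 0.0, 0.0, 0.0, 4.0])
assert np.allclose(soft_threshold(w, 1.0), [-2.0, 0.0, 0.0, 0.0, 3.0])
```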

4.1.4 Lifting scheme

The lifting scheme is a fundamentally different approach to building wavelet decompositions, proposed by W. Sweldens [10,11]. The wavelet transform is said to be a way to decorrelate data. The wavelets generated by translations and dilations of one function are called first generation wavelets. This idea is extended by the lifting scheme, which is a new tool for constructing biorthogonal wavelets. The lifting scheme inherits from the FWT the idea of applying low-pass and high-pass filters sequentially to the data array in place. The lifting scheme does this in an optimal way, sometimes increasing the efficiency of the calculations by a factor of 2; the inverse transform becomes just the reversal of the order of operations with + substituted by -, and vice versa. Since the lifting scheme does not rely on the Fourier transform (FT), it can be used even in cases where translations and dilations can't be used.

Fig.: One step of the lifting scheme.
\includegraphics[width=12cm]{fig/e10-2001-205/DWT/lifting.eps}

Following W. Sweldens [11], let us assume that we have a signal and need to decompose it into non-correlated parts. First we separate the signal into two equal parts by putting all even points into one array and all odd ones into another. If the data in the source signal are correlated, then the first array contains some information about the second one. Let us denote these two arrays by $ \lambda$ and $ \gamma$, respectively. Thus the first stage of the lifting scheme is to separate the data into two classes. The second stage is to find a data-independent operator $ P$ such that $ \gamma=P(\lambda)$; let us call it prediction. If the second array is functionally dependent on the first, the prediction $ P$ is exact; if not, we can substitute the array $ \gamma$, in place, by the differences $ \gamma-P(\lambda)$, called wavelets. The coarse array is then corrected as $ \tilde\lambda = \lambda + U(\gamma)$; let us call this the update rule. The procedure is then recursed, storing the lost details in place of the removed data. At each stage of the reconstruction the lost details are added to the result of the prediction.

As shown by I. Daubechies and W. Sweldens, a realization of the lifting scheme generates orthogonal wavelets of the Daubechies family.
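A minimal realization of one lifting step, in the Haar flavor (prediction by the even neighbor, update by half the detail; our own illustrative code, not WASP's):

```python
import numpy as np

def lift_forward(s):
    """One lifting step: split, predict, update, all in place."""
    lam, gam = s[0::2].copy(), s[1::2].copy()   # split into even / odd samples
    gam -= lam                                   # predict: detail = odd - P(even)
    lam += gam / 2                               # update: coarse = pairwise average
    return lam, gam

def lift_inverse(lam, gam):
    """Inverse: the same operations in reverse order with + and - interchanged."""
    lam = lam - gam / 2
    gam = gam + lam
    s = np.empty(2 * len(lam))
    s[0::2], s[1::2] = lam, gam                  # merge even / odd samples back
    return s

s = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
lam, gam = lift_forward(s)
assert np.allclose(lift_inverse(lam, gam), s)    # perfect reconstruction
```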

4.2 Two-dimensional wavelet transform

4.2.1 WT via fast Fourier transform (FFT)

The wavelet image of a function $ f(\mathbf{x}), \mathbf{x}\in {\cal R}^2$ is its scalar product with a representation of the affine group acting on $ {\cal R}^2$:

$\displaystyle \mathbf{x}' = a R^\theta\mathbf{x}+ \mathbf{b},\quad U(a,\mathbf{b},\theta)\,\psi(\mathbf{x}) = \frac{1}{\vert a\vert} \psi\left( R^{-\theta} \frac{\mathbf{x}-\mathbf{b}}{a} \right),$

where $ R^\theta\mathbf{x}= (x \cos\theta - y\sin\theta, x\sin\theta+y\cos\theta)$. So, the definition of wavelet coefficient taken with the basic wavelet $ \psi(\mathbf{x})$ is:

$\displaystyle W(a,\mathbf{b},\theta)[f] = \int \frac{1}{\vert a\vert} \overline{\psi}\left( R^{-\theta}\frac{\mathbf{x}-\mathbf{b}}{a} \right) f(\mathbf{x})\, d^2\mathbf{x}.$

For practical purposes it is faster to evaluate the convolution ([*]) in Fourier space using the FFT algorithm:

$\displaystyle W(a,\mathbf{b},\theta)[f] = a \int \exp(\imath \mathbf{b}\mathbf{k}) \overline{\hat\psi}\left( a R^{-\theta}\mathbf{k}\right) \hat f(\mathbf{k})\, d^2\mathbf{k}.$

To simplify the calculations, it is possible to use rotationally symmetric basic wavelets $ \psi = \psi(\vert\mathbf{x}\vert)$. In this case there is no dependence on the angle variable $ \theta$.

One of the most commonly used wavelets is the Mexican hat wavelet

$\displaystyle \psi(\mathbf{x}) = \bigl[2- \mathbf{x}A \mathbf{x}\bigr]\, e^{-\frac{1}{2}\mathbf{x}A\mathbf{x}}, \quad \hat\psi(\mathbf{k}) = \frac{2\pi}{\sqrt{\vert\det A\vert}}\, (\mathbf{k}A^{-1} \mathbf{k})\, e^{-\frac{1}{2} \mathbf{k}A^{-1} \mathbf{k}}$

For the rotationally symmetric case ($ A = a \hat I$), the Mexican hat wavelet ([*]) is simplified to

$\displaystyle \psi(\mathbf{x}) = (2-\mathbf{x}^2) \exp(-\mathbf{x}^2/2), \quad \hat\psi(\mathbf{k}) = \mathbf{k}^2 \exp(-\mathbf{k}^2/2)$ (18)

The scheme of the practical implementation of the evaluation of

$\displaystyle W(a,\mathbf{b})[f] = a \int \exp(\imath \mathbf{b}\mathbf{k}) \overline{\hat\psi}\left( a \mathbf{k}\right) \hat f(\mathbf{k}) d^2\mathbf{k}$

was as follows:
  1. Evaluate the Fourier transform of the original image $ f(\mathbf{x}) \to \hat f(\mathbf{k})$ by means of the FFT algorithm (the Numerical Recipes library was used).
  2. Multiply $ \hat f(\mathbf{k})$ at each point by $ (a\mathbf{k})^2 \exp(-(a\mathbf{k})^2/2)$.
  3. Take the inverse FFT of the result.
The normalization factor $ a$ of ([*]) was dropped in the numerical code.
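The three steps can be sketched as follows (illustrative NumPy code, not the Numerical Recipes based implementation; the factor $ a$ is dropped here as well):

```python
import numpy as np

def wt2d_fft(img, a):
    """2D CWT with the rotationally symmetric Mexican hat, via FFT:
    multiply the image spectrum by (a k)^2 exp(-(a k)^2 / 2) and invert."""
    ky = np.fft.fftfreq(img.shape[0]) * 2 * np.pi
    kx = np.fft.fftfreq(img.shape[1]) * 2 * np.pi
    k2 = ky[:, None]**2 + kx[None, :]**2          # |k|^2 on the FFT grid
    filt = (a**2 * k2) * np.exp(-a**2 * k2 / 2)   # hat{psi}(a k), Eq. (18)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * filt))

img = np.random.default_rng(0).standard_normal((64, 64))
W = wt2d_fft(img, a=2.0)
assert W.shape == (64, 64)
assert abs(W.mean()) < 1e-12      # hat{psi}(0) = 0, so the mean is removed
```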

4.2.2 Discrete WT

Fig.: Two dimensional wavelet decomposition.
\includegraphics[width=100mm]{fig/e10-2001-205/DWT/dec2d.eps}

One step of the wavelet transform of a signal with dimension $ n$ higher than one is performed by transforming each dimension of the signal independently. Afterwards, the $ n$-dimensional subband that contains the low-pass part in all dimensions is transformed further. The 2-dimensional case is presented in Fig. [*]; the areas in the figure are denoted by letters.
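One 2-dimensional step can be sketched with the Haar filters (illustrative code; the subband names LL, LH, HL and HH stand for the low/high-pass combinations along the two axes):

```python
import numpy as np

def haar_pairs(v, axis):
    """Pairwise Haar low/high outputs along the given axis (even length)."""
    even = np.take(v, np.arange(0, v.shape[axis], 2), axis=axis)
    odd = np.take(v, np.arange(1, v.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def dwt2d_step(img):
    """One 2-D step: transform rows, then columns -> LL, LH, HL, HH subbands."""
    L, H = haar_pairs(img, axis=1)       # along rows
    LL, LH = haar_pairs(L, axis=0)       # along columns of the low-pass part
    HL, HH = haar_pairs(H, axis=0)       # along columns of the high-pass part
    return LL, LH, HL, HH

img = np.arange(16.0).reshape(4, 4)
LL, LH, HL, HH = dwt2d_step(img)
assert LL.shape == (2, 2)
assert np.allclose(HH, 0)                # a linear ramp has no diagonal detail
```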

