
Chapter 2
Compressive Sensing and Dictionary Learning
2.1 Compressive Sensing
Most signal and image acquisition and reconstruction applications follow Shannon's celebrated sampling theorem, which states that the signal's sampling rate must be at least twice its highest frequency; this minimum rate is known as the signal's Nyquist rate:

$$f_s \geq 2 f_m$$

where $f_s$ and $f_m$ denote the signal's sampling rate and highest frequency, respectively. In a similar manner, the fundamental theorem of linear algebra suggests that to reconstruct a discrete finite-length signal, the number of measurements (or collected samples) must be at least equal to its length. In other words, both approaches indicate that the accuracy of the reconstructed signal is proportional to the number of samples used. Consequently, both approaches require a large number of samples, which often leads to large storage requirements, significant sensing time and high power consumption.
Today's technologies and systems acquire vast amounts of data, most of which can be discarded without any perceptual loss. Thus, it is natural to wonder whether it is necessary to collect all the data, or whether we should collect only the part that will not be thrown away.
Compressive Sampling, also known as Compressive or Compressed Sensing (CS), seems to be the answer to this question. CS is a novel sampling/sensing paradigm which goes beyond the conventional approaches. The CS approach asserts that far fewer samples or measurements are required than the number needed by the traditional approaches to reconstruct or recover the data. The CS theory assumes that every signal has a concise representation in certain transform domains, where only a few coefficients are significant while the rest are zero or deemed unimportant. Moreover, CS does not require any a priori knowledge of the signal other than that it is strictly or approximately sparse, and it does not try to analyse the signal in order to decide which active or adaptive sensing strategy to use. Therefore, CS is a simple and efficient signal acquisition and reconstruction protocol which can sample a signal at a rate far below the Nyquist rate and reconstruct most signals from a seemingly incomplete set of samples [1-4].

Figure 2.1: A comparison between sampling techniques: a. The Shannon/Nyquist sampling approach, b. The Compressive Sensing scheme
2.1.1 Signal Acquisition and Reconstruction

The CS scheme, initially introduced in 2006 by Donoho et al., is a combined signal acquisition and reconstruction approach which acquires signals at a low sampling rate and later reconstructs them using only a few random samples. The CS mechanism consists of three main stages, described in Figure 2.2: sparse representation, compression and reconstruction.

Figure 2.2: The Compressive Sensing Scheme
2.1.1.a The Sparse Representation Stage
The sparse representation stage is the first stage of the CS scheme. In this stage, the acquired signal is expressed over a basis matrix in which it admits an accurate sparse representation.
Let us consider a high-dimensional, finite-length, discrete-valued signal consisting of $N$ samples, $s \in \mathbb{R}^N$, where $N$ is a very large number, and the $N \times N$ sparse basis matrix, known as the projection or dictionary matrix,

$$\Psi = [\psi_1, \psi_2, \ldots, \psi_N]$$

where $\{\psi_i\}_{i=1}^{N}$ are the similarly sampled sparse basis vectors.
The sparse representation stage indicates that $s$ can be represented by a set of projections onto the matrix $\Psi$. Therefore, $s$ can be expressed in terms of $\Psi$ as

$$s = \sum_{i=1}^{N} x_i \psi_i = \Psi x \qquad (2.1)$$

where $x \in \mathbb{R}^N$ is the projection of $s$ onto the sparsifying matrix, hinting that every acquired non-sparse signal $s$ can be represented by a sparse signal $x$ through some sparse transform. The sparse signal is equivalent to a vector of coefficients that selects the linear combination of the basis vectors.
A signal $x$ is called $k$-sparse if it contains at most $k$ ($k \ll N$) non-zero components or, mathematically speaking,

$$\Sigma_k = \{ x : \|x\|_0 \leq k \} \qquad (2.2)$$

where $\Sigma_k$ and $\|\cdot\|_0$ denote the set of all $k$-sparse signals and the $\ell_0$-norm, respectively. In practice, most acquired signals are not sparse themselves but are sparsely expressed in some basis. Therefore, $s \in \mathbb{R}^N$ will, from now on, be called $k$-sparse if it can be represented by a $k$-sparse signal $x \in \mathbb{R}^N$ in $\Psi \in \mathbb{R}^{N \times N}$, and the expression in (2.2) can be reformulated as

$$\Sigma_k = \{\, s = \Psi x : \|x\|_0 \leq k \,\} \qquad (2.3)$$
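To make the sparse representation stage concrete, the following is a minimal numerical sketch of equation (2.1). The choice of a DCT basis, the signal length and the sparsity level are illustrative assumptions, not part of the text above.

```python
# Minimal sketch of the sparse-representation stage (Eq. 2.1): s = Psi @ x,
# assuming an orthonormal DCT sparsifying basis (an illustrative choice).
import numpy as np
from scipy.fft import dct, idct

N, k = 256, 10                                  # signal length and sparsity level (assumed)
rng = np.random.default_rng(0)

# Build a k-sparse coefficient vector x ...
x = np.zeros(N)
x[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)

# ... and synthesize s = Psi @ x, where the columns of Psi are the basis vectors psi_i.
Psi = idct(np.eye(N), norm="ortho", axis=0)
s = Psi @ x

# Analysis step: projecting s back onto the basis recovers the sparse coefficients.
x_rec = dct(s, norm="ortho")
print("non-zero coefficients:", np.count_nonzero(np.abs(x_rec) > 1e-10))   # prints 10
```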
2.1.1.b The Compression Stage
The compression stage is the second of the three stages of the CS scheme. At this stage, the sparse signal that was previously calculated is compressed and sampled by using a measurement matrix.
Let us recall the signal $s \in \mathbb{R}^N$ that was acquired during the first stage of the CS process, and let us consider the measurement matrix $\Phi \in \mathbb{R}^{M \times N}$, where $M \ll N$. In CS terminology, $\Phi$ is known as the measurement or sensing matrix. During this stage, $s$ is multiplied by the sensing matrix $\Phi$, as described in equation (2.4) and in Fig. 2.3.a, resulting in the signal $y \in \mathbb{R}^M$, known as the measurement signal, which consists of $M$ random, linear and non-adaptive measurements of $s$ such that

$$y = \Phi s \qquad (2.4)$$

Therefore, by revisiting equation (2.1), $y$ can be rewritten as

$$y = \Phi \Psi x \qquad (2.5)$$

Let $\Theta$ be an $M \times N$ matrix such that

$$\Theta = \Phi \Psi \qquad (2.6)$$

From now on, $\Theta$ will be referred to as the recovery or reconstruction matrix. Consequently, the equation described in (2.5) can be rewritten as

$$y = \Theta x \qquad (2.7)$$

Both equations described in (2.5) and (2.7) are illustrated in Fig. 2.4.
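As a rough illustration of the compression stage, the short sketch below forms a random Gaussian sensing matrix and computes the measurement signal of equations (2.4)-(2.7); the matrix sizes, the Gaussian choice and the DCT basis are assumptions made purely for illustration.

```python
# Minimal sketch of the compression stage: y = Phi @ s and Theta = Phi @ Psi,
# assuming a random Gaussian sensing matrix (an illustrative, not prescribed, choice).
import numpy as np
from scipy.fft import idct

N, M = 256, 64                                   # signal length and number of measurements (assumed)
rng = np.random.default_rng(1)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # sensing matrix, M << N
Psi = idct(np.eye(N), norm="ortho", axis=0)      # sparsifying basis (assumed DCT)
Theta = Phi @ Psi                                # recovery matrix, Eq. (2.6)

s = rng.standard_normal(N)                       # stand-in for the acquired signal
y = Phi @ s                                      # Eq. (2.4): M linear, non-adaptive measurements
print(y.shape)                                   # (64,)
```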
2.1.1.c The Signal Reconstruction Stage
The reconstruction stage is the last stage of the CS paradigm. In this stage, the signal is recovered, typically by means of a reconstruction algorithm.

Let $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$ be the $k$-sparse signal and the measurement signal obtained in the first two stages of the CS scheme, respectively. Then the relation expressed in (2.7) can be revisited as follows:

$$y = \Theta x$$
Figure 2.3: The Compressive Sensing approach: a. Signal Acquisition and Compression, b. Signal Reconstruction

where $\Theta \in \mathbb{R}^{M \times N}$ is the reconstruction matrix. The CS approach tries to recover the sparse signal by solving an underdetermined system of equations, with more unknowns than equations, which in general has infinitely many solutions. However, $x$ is $k$-sparse, which implies that a signal of length $N$ can be estimated using only a few measurements. When solving the equation described in (2.7), the CS process therefore seeks the sparsest possible solution, which leads to the following problem
$$\min_{x} \|x\|_0 \quad \text{subject to} \quad y = \Theta x \qquad (2.8)$$

where $\|\cdot\|_0$ denotes the $\ell_0$-norm, also known as the quasi-norm. The problem stated above is known as the $\ell_0$-minimization problem. The problem stated in (2.8) can be rephrased as

$$\hat{x} = \underset{x}{\arg\min} \; \|x\|_0 \quad \text{subject to} \quad y = \Theta x \qquad (2.9)$$

where $\hat{x}$ is an approximation of the $k$-sparse signal $x$.

The $\ell_0$-minimization problem requires an exhaustive search over all $\binom{N}{k}$ possible sparse supports, which is computationally intractable. Therefore, the problem described in both (2.8) and (2.9) is NP-hard.
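To get a feel for why this exhaustive search is hopeless, the snippet below simply counts the candidate supports for moderately sized (and purely illustrative) values of $N$ and $k$.

```python
# Counting the number of candidate k-sparse supports, N choose k (values assumed).
from math import comb

print(comb(1000, 10))   # about 2.6e23 candidate supports for N = 1000, k = 10
```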

Figure 2.4: The compression process in the CS paradigm: a. with the sensing matrix $\Phi \in \mathbb{R}^{M \times N}$ and the sparsifying basis matrix $\Psi \in \mathbb{R}^{N \times N}$, b. with the recovery matrix $\Theta \in \mathbb{R}^{M \times N}$ where $\Theta = \Phi \Psi$

Nonetheless, through the years, several alternative algorithms and approaches have been proposed which find comparable solutions to the aforementioned problem in near polynomial time. One of these approaches, suggested by Donoho and Tanner, relies on convex optimization: the problem is relaxed to the convex $\ell_1$-norm minimization problem, which leads to a unique solution and can be solved in polynomial time. Hence, the problem described in (2.9) can be rephrased as
$$\hat{x} = \underset{x}{\arg\min} \; \|x\|_1 \quad \text{subject to} \quad y = \Theta x \qquad (2.10)$$

where $\|\cdot\|_1$ denotes the $\ell_1$-norm, i.e. the sum of the absolute values of the signal's entries. The $\ell_1$-norm minimization problem seeks the unique solution with the smallest $\ell_1$-norm, providing a coefficient for each of the basis vectors in $\Psi$.
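Problem (2.10) (often called basis pursuit) can be recast as a linear program by splitting $x$ into its positive and negative parts. The sketch below does this with SciPy's generic LP solver; the random recovery matrix, the problem sizes and the solver choice are illustrative assumptions rather than the method used in this thesis.

```python
# Minimal sketch of l1 minimization (Eq. 2.10) as a linear program:
# write x = u - v with u, v >= 0, so that ||x||_1 = sum(u + v).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
N, M, k = 120, 50, 5
Theta = rng.standard_normal((M, N)) / np.sqrt(M)     # recovery matrix (assumed random)
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = Theta @ x_true

c = np.ones(2 * N)                                   # objective: sum(u) + sum(v)
A_eq = np.hstack([Theta, -Theta])                    # constraint: Theta @ (u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_hat - x_true))   # should be close to zero
```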

During the last decade, it’s been asked whether the l2-norm minimization problem which is stated as:
x=argminy=?xx2 subject to y=?x (2.11)where 2 denotes the l2-norm can be used as an approach to find sparse solutions. It’s been constantly concluded that since l2-norm represents the signal’s energy, the problem expressed (2.11) hardly ever finds a sparse solution. Moreover, if the solution of the l2-norm minimization problem is a sparse one, it will not be the sparsest solution. Therefore, this problem is avoided when approximating sparse signals. Figures 2.5.a and 2.5.b illustrate the sparse solution of the l0-norm minimization problem and how the l1-norm is the most preferable option when calculating sparse approximations respectively, whereas Figure 2.5.c helps us to understand why the solution of the l2-norm minimization problem will never be sparse.

In general, the lp-norm of a signal can be calculated using the following equation
$$\ell_p: \quad \|x\|_p = \left( \sum_{i=1}^{N} |x_i|^p \right)^{1/p} \qquad (2.12)$$
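A direct transcription of (2.12), cross-checked against NumPy's built-in norm; the test vector and the values of $p$ are arbitrary assumptions.

```python
# lp-norm of a vector as in Eq. (2.12), compared with numpy.linalg.norm.
import numpy as np

def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 0.0, 1.0])
for p in (1, 2, 4):
    print(p, lp_norm(x, p), np.linalg.norm(x, ord=p))   # the two values agree
```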
Another popular alternative to the $\ell_0$-norm minimization problem is the family of greedy algorithms. The output of the reconstruction algorithm is an estimate $\hat{x} \in \mathbb{R}^N$ of the sparse signal $x$. Afterwards, an estimate $\hat{s} \in \mathbb{R}^N$ of the input signal $s \in \mathbb{R}^N$ can be calculated by applying the inverse sparsifying transform to $\hat{x}$, i.e. $\hat{s} = \Psi \hat{x}$. Figure 2.3.b illustrates the reconstruction stage of the CS paradigm. A minimal sketch of one such greedy algorithm is given below.
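The following is a minimal sketch of Orthogonal Matching Pursuit (OMP), one well-known member of the greedy family mentioned above; the text does not prescribe a specific greedy algorithm, so this particular choice, and the random test matrix, are assumptions.

```python
# Minimal Orthogonal Matching Pursuit (OMP) sketch: greedily pick k columns of
# Theta that best explain y, then fit the coefficients by least squares.
import numpy as np

def omp(Theta, y, k):
    N = Theta.shape[1]
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(Theta.T @ residual)))   # column most correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(Theta[:, support], y, rcond=None)
        residual = y - Theta[:, support] @ coef
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat

# Example usage with a random Theta and a synthetic k-sparse signal:
rng = np.random.default_rng(7)
M, N, k = 50, 120, 5
Theta = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
print(np.linalg.norm(omp(Theta, Theta @ x_true, k) - x_true))   # ~0 when recovery succeeds
```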
2.1.1.c.i Reconstructing Signals with Noise
In practice, noise is present in most signals. Let $s \in \mathbb{R}^N$, $y \in \mathbb{R}^M$ and $\Theta \in \mathbb{R}^{M \times N}$ be the acquired signal which has been corrupted by noise, the corresponding noise-corrupted measurement signal and the recovery matrix, respectively. By using the equation described in (2.7), $y$ can be expressed as

$$y = \Theta x + e \qquad (2.13)$$

where $x \in \mathbb{R}^N$ is a sparse vector and $e \in \mathbb{R}^M$ is the noise which has corrupted the measurement vector $y$.

During the recovery stage of the CS paradigm, the noise $e$ must be estimated and taken into consideration when searching for the sparse solution. As previously stated, CS seeks the sparsest possible solution, which can be accomplished by solving the $\ell_1$-norm minimization problem. Therefore, the problem described in (2.10) is reformulated as
$$\hat{x} = \underset{x}{\arg\min} \; \|x\|_1 \quad \text{subject to} \quad y = \Theta x + e \qquad (2.14)$$

where $\hat{x} \in \mathbb{R}^N$ denotes the approximate sparse solution [2, 4-11].
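In practice, problem (2.14) is often handled through a noise-aware variant of $\ell_1$ minimization. The sketch below uses the closely related (Lagrangian) LASSO formulation from scikit-learn rather than the constrained form above; the matrix, noise level and regularization weight are all illustrative assumptions.

```python
# Minimal sketch of sparse recovery from noisy measurements (cf. Eq. 2.13), solved
# here with the Lagrangian LASSO form: min 0.5/M * ||y - Theta x||^2 + alpha * ||x||_1.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
N, M, k = 120, 50, 5
Theta = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
e = 0.01 * rng.standard_normal(M)                  # measurement noise
y = Theta @ x_true + e                             # Eq. (2.13)

lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=10000)
lasso.fit(Theta, y)
x_hat = lasso.coef_
print("recovery error:", np.linalg.norm(x_hat - x_true))   # small, but not exactly zero
```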

Figure 2.5: (a)-(c) illustrate the geometry of the solutions of the l0, l1 and l2-norm minimization problem respectively in the R2 subspace.
2.1.2 Sparsity, Incoherence and the Sensing Matrix

The CS paradigm is an innovative approach which recovers signals by using their sparse approximations. To perfectly reconstruct a signal, the design and the incoherence property of the sensing matrix, as well as the signal's sparsity or compressibility, must be carefully taken into consideration.
2.1.2.a Sparsity and Compressibility

The core idea of the CS scheme is that most signals can be sparsely or concisely expressed in a suitable basis matrix. In other words, sparsity refers to a signal's ability to be represented by a few non-zero coefficients, regardless of its length and without any perceptual loss. For example, let us consider the image shown in Figure 2.6.a, which can be sparsely expressed in the multi-scale wavelet matrix $\Psi$. Most natural images are described by large smooth or textured regions and a few edges. Therefore, the sparse representation of Fig. 2.6.a in $\Psi$ will consist of only a few significant coefficients, meaning that the image will be nearly sparse, i.e. its sparse representation will contain many near-to-zero coefficients. The sparse representation of an image in $\Psi$ is calculated by recursively dividing the image into its low- and high-frequency components, which in our example results in the image shown in Figure 2.6.b, where the low- and high-frequency values provide us with several coarse-scaled approximations and with specific details of the original image, respectively. The original image can be well approximated by solving the problem described in (2.10), after the near-to-zero coefficients of the image's sparse representation are discarded, resulting in the image shown in Fig. 2.6.c. The recovery stage of the CS scheme requires knowing the values and the locations of the coefficients of a signal's sparse representation before discarding the least significant ones, i.e. the coefficients with the lowest magnitudes. Fig. 2.6.b shows that the most significant coefficients of the image's sparse representation are gathered around its edges. Nonetheless, sparsity, which is often used as a tool in most signal processing tasks (e.g. image processing or denoising) or as a method to avoid overfitting in statistics and machine learning, can also be used to determine how efficiently a signal can be acquired non-adaptively.

Figure 2.6: Signal compression and reconstruction using a multi-scale wavelet basis: a. the original image, b. the sparse representation of the image in the wavelet domain, where the low- and high-valued coefficients are represented by dark and light pixels respectively, c. an approximation of the original image obtained by keeping only 10% of the sparse representation's most significant coefficients [12].
In practice, most signals are not strictly sparse but approximately sparse, i.e. the sparse representations of these signals contain many near-to-zero values which are considered unimportant. Such signals are known as compressible rather than sparse signals. In the recovery stage of the CS paradigm, a compressible signal can be well approximated by solving the problem described in (2.10).
Let us consider $s \in \mathbb{R}^N$ and $x \in \mathbb{R}^N$ to be an acquired signal and its sparse representation in some basis, respectively. A signal's compressibility is defined as

$$\sigma_k(s)_p = \min_{\hat{s} \in \Sigma_k} \|s - \hat{s}\|_p \qquad (2.15)$$

where $p$, $k$, $\hat{s}$, $\Sigma_k$ and $\|\cdot\|_p$ denote the type of norm, the number of sparse coefficients, an approximation of $s$, the set of all $k$-sparse signals and the $\ell_p$-norm, respectively. If $s \in \Sigma_k$, i.e. $s$ is a $k$-sparse signal, then $\sigma_k(s)_p = 0$ for any value of $p$.
Moreover, a signal is known to be compressible if the absolute values of the coefficients of its sparse representation decay rapidly when arranged in descending order, i.e.

$$|x_1| \geq |x_2| \geq \ldots \geq |x_{N-1}| \geq |x_N| \qquad (2.16)$$

where $x_i$, $i = 1, \ldots, N$, is the $i$-th coefficient of $x$. If we consider constants $C_1, C_2, q, r > 0$, where the constants $C_2$ and $r$ depend on the values of $C_1$ and $q$ respectively, then the coefficients $x_i$ follow a power-law decay such that

$$|x_i| \leq C_1 \, i^{-q} \qquad (2.17)$$

and

$$\sigma_k(x)_2 \leq C_2 \, k^{-r} \qquad (2.18)$$

In this case, a signal's compressibility is proportional to the value of $q$. In other words, a larger $q$ leads to a faster decay of the sparse signal's coefficients, which results in a more compressible signal [4, 12].
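The short sketch below illustrates (2.15)-(2.18) numerically: it builds a synthetic coefficient vector with a power-law magnitude profile (an assumption made purely for illustration) and measures the best $k$-term approximation error $\sigma_k(x)_2$ for a few values of $k$.

```python
# Compressibility in practice: coefficients with a power-law magnitude profile
# |x_i| ~ C1 * i**(-q) and the best k-term approximation error sigma_k (Eq. 2.15).
import numpy as np

rng = np.random.default_rng(4)
N, q = 1000, 1.5
x = np.sign(rng.standard_normal(N)) * np.arange(1, N + 1) ** (-q)
rng.shuffle(x)                                     # the support ordering is irrelevant

def sigma_k(x, k):
    """l2 error of the best k-sparse approximation (keep the k largest magnitudes)."""
    idx = np.argsort(np.abs(x))[::-1]
    x_k = np.zeros_like(x)
    x_k[idx[:k]] = x[idx[:k]]
    return np.linalg.norm(x - x_k)

for k in (10, 50, 200):
    print(k, sigma_k(x, k))                        # decays roughly like k**(-(q - 1/2))
```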
2.1.2.b The Sensing Matrix

The sensing matrix, also known as the sampling or measurement matrix, plays a significant role in the CS paradigm. The sensing matrix is used to sample sparse signals at a rate which is much lower than the Nyquist rate. In the CS scheme, only a few measurements of the acquired signal are used, meaning that the sensing matrix reduces the signal's dimensionality as much as possible without losing any of the important information that will later be used to recover the signal. Mathematically, the sensing matrix maps the acquired signal into a measurement signal through linear, weighted summations of its entries.
Designing a sensing matrix which allows most signals to be perfectly recovered is a demanding task. A suitable sensing matrix must be non-adaptive, meaning that it is fixed and independent of the acquired signal; stable, i.e. the important information of the signal is neither lost nor damaged during the compression stage; incoherent with respect to the sparsifying basis matrix; and it must satisfy several conditions such as the Null-Space Property and the Restricted Isometry Property. In addition, deciding which sensing matrix to use can be based on secondary criteria such as the recovery speed and complexity, the number of measurements required, and universality, which means that the sensing matrix can be combined with signals that are sparsely expressed in any basis or domain.
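One common way to quantify the incoherence requirement mentioned above is the mutual coherence between the sensing matrix $\Phi$ and the sparsifying basis $\Psi$. The sketch below uses the definition $\mu(\Phi, \Psi) = \sqrt{N}\,\max_{i,j} |\langle \phi_i, \psi_j \rangle|$ over unit-norm rows of $\Phi$ and columns of $\Psi$; this particular definition, as well as the random $\Phi$ and DCT $\Psi$, are assumptions and may differ from the formulation given later in the chapter.

```python
# Minimal mutual-coherence sketch between a random sensing matrix and a DCT basis.
import numpy as np
from scipy.fft import idct

rng = np.random.default_rng(6)
M, N = 64, 256
Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)   # unit-norm rows phi_i
Psi = idct(np.eye(N), norm="ortho", axis=0)          # orthonormal DCT basis, columns psi_j

mu = np.sqrt(N) * np.max(np.abs(Phi @ Psi))          # small values indicate incoherence
print("mutual coherence:", mu)
```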
2.1.2.i The Null-Space Property
The Null Space Property (NSP) is a fundamental condition of the sensing matrix which when satisfied guarantees the uniqueness of the recovered signal.
Let us consider $s \in \mathbb{R}^N$, $y \in \mathbb{R}^M$, $\Theta \in \mathbb{R}^{M \times N}$ and $x \in \mathbb{R}^N$ to be an acquired signal, the measurement signal, the recovery matrix and a $k$-sparse signal, respectively, such that

$$y = \Theta x$$

The null space is the set of signals $x$ such that

$$\Theta x = 0$$

and is defined as

$$\mathcal{N}(\Theta) = \{ x : \Theta x = 0 \}$$

where $\mathcal{N}(\Theta)$ denotes the null space of $\Theta$.
Suppose we wish to recover every sparse signal $x \in \mathbb{R}^N$ from its measurement signal $y \in \mathbb{R}^M$. Consider $y_1, y_2 \in \mathbb{R}^M$ to be two measurement signals and $x_1, x_2 \in \mathbb{R}^N$ to be two $k$-sparse signals that correspond, through the CS approach, to $y_1$ and $y_2$ respectively, such that

$$x_1 \neq x_2 \qquad (2.19)$$

For unique recovery, the inequality described in (2.19) must imply that $y_1 \neq y_2$; otherwise it would be impossible to distinguish $x_1$ from $x_2$ from the same measurement signal $y \in \mathbb{R}^M$. In other words, if $y_1 = y_2$ then $\Theta x_1 = \Theta x_2$, which consequently leads to

$$\Theta (x_1 - x_2) = 0$$

with $x_1 - x_2 \in \Sigma_{2k}$, where $\Sigma_{2k}$ denotes the set of all $2k$-sparse signals, meaning that $\Theta$ uniquely represents every $k$-sparse signal $x$ if the null space of $\Theta$ does not contain any $2k$-sparse signals. This property, when defined in terms of a matrix's spark, leads to Theorem 2.1.
Definition 2.1: The spark of a matrix $\Theta \in \mathbb{R}^{M \times N}$, with $M, N > 1$, is the smallest number of columns of $\Theta$ that are linearly dependent.

Theorem 2.1: For any signal $y \in \mathbb{R}^M$ there exists at most one signal $x \in \Sigma_k$ such that $y = \Theta x$ if and only if $\mathrm{spark}(\Theta) > 2k$.
Theorem 2.1 suggests that if $\mathrm{spark}(\Theta) > 2k$, then every set of up to $2k$ columns is linearly independent, which consequently results in $x \in \Sigma_k$ being uniquely determined. In addition, note that $\mathrm{spark}(\Theta) \in [2, M+1]$, which, after taking Theorem 2.1 into account, results in

$$M \geq 2k \qquad (2.20)$$
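Since Definition 2.1 is purely combinatorial, it can be checked by brute force for very small matrices. The sketch below does exactly that; the example matrix is an illustrative assumption, and the search is exponential in the number of columns, so it is only meant to make the definition concrete.

```python
# Brute-force spark (Definition 2.1): the size of the smallest linearly
# dependent subset of columns. Feasible only for tiny matrices.
import numpy as np
from itertools import combinations

def spark(Theta):
    M, N = Theta.shape
    for size in range(1, N + 1):
        for cols in combinations(range(N), size):
            if np.linalg.matrix_rank(Theta[:, list(cols)]) < size:   # columns are dependent
                return size
    return N + 1                                                      # all columns independent

Theta = np.array([[1., 0., 1., 2.],
                  [0., 1., 1., 1.]])
print(spark(Theta))   # 3, e.g. column 3 = column 1 + column 2
```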
If $x$ is a strictly sparse signal, then $\mathrm{spark}(\Theta)$ provides us with the information on whether the signal can be recovered, whereas if $x$ is only approximately sparse, then stronger conditions, e.g. the Null-Space Property, must be taken into consideration.
Definition 2.2: The matrix $\Theta \in \mathbb{R}^{M \times N}$ with $M, N > 1$ satisfies the Null-Space Property (NSP) of order $k$ if there exists a constant $\alpha \in (0,1)$ such that

$$\|x_T\|_1 \leq \alpha \|x_{T^c}\|_1 \qquad (2.21)$$

holds for all $x \in \mathcal{N}(\Theta)$ and for all index sets $T \subset \{1, 2, \ldots, N\}$ with $|T| \leq k$, where $T^c$ denotes the complement of $T$ and $x_T$ denotes the signal of length $N$ obtained by setting the entries of $x$ indexed by $T^c$ to zero.

If $\Theta \in \mathbb{R}^{M \times N}$ satisfies the NSP of order $k$, then every $k$-sparse signal can be recovered by solving the $\ell_1$-norm minimization problem; in this sense the NSP characterizes exact recovery via $\ell_1$ minimization, which consequently leads to the following theorems.
Theorem 2.2: If the matrix $\Theta \in \mathbb{R}^{M \times N}$ satisfies the NSP of order $k$ with constant $\alpha \in (0,1)$, and if there exist $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$ such that $y = \Theta x$ and $\hat{x}$ is a solution of the $\ell_1$-norm minimization problem, then

$$\|\hat{x} - x\|_1 \leq \frac{2(1+\alpha)}{1-\alpha} \, \sigma_k(x)_1 \qquad (2.22)$$

and if $x$ is $k$-sparse, then $\hat{x} = x$.

Theorem 2.3: Let $\Theta: \mathbb{R}^N \to \mathbb{R}^M$ be a sensing matrix and $\Delta: \mathbb{R}^M \to \mathbb{R}^N$ be a recovery algorithm. If the pair $(\Theta, \Delta)$ satisfies the following inequality

$$\|\Delta(\Theta x) - x\|_2 \leq C \, \sigma_k(x)_1 \qquad (2.23)$$

where $C \in (0,1)$ is a constant, then the matrix $\Theta$ satisfies the NSP of order $2k$.

Note that the inequality described in (2.23) guarantees that every k-sparse signal can be recovered.

2.1.2.ii The Restricted Isometry Property
The Null Space Property guarantees that most signals can be uniquely recovered. However, in practice most signals have been corrupted by noise meaning that the property which was previously mentioned does not suffice and that a stronger condition known as the Restricted Isometry Property (RIP) must be taken into consideration.

Definition 2.3: A matrix $\Theta \in \mathbb{R}^{M \times N}$ with $M, N > 1$ satisfies the restricted isometry property (RIP) of order $k$ if there exists a constant $\delta_k \in (0,1)$, known as the restricted isometry constant (RIC), such that

$$(1 - \delta_k) \|x\|_2^2 \leq \|\Theta x\|_2^2 \leq (1 + \delta_k) \|x\|_2^2 \qquad (2.24)$$

where $x \in \Sigma_k$.
The inequality described in (2.24) implies that every subset of at most $k$ columns of $\Theta$ is nearly orthogonal and well-conditioned. In addition, the inequality described in (2.24) suggests that $\Theta$ approximately preserves the distances between $k$-sparse signals under the linear transformation, which implies that the null space of $\Theta$ does not contain any $k$-sparse signals. A low $\delta_k$ suggests that $\Theta$ comfortably satisfies the RIP of order $k$.
Let us consider $x_1, x_2 \in \Sigma_k$, the matrix $\Theta \in \mathbb{R}^{M \times N}$ and a constant $\delta_{2k} \in (0,1)$ with a near-to-zero value. Then $\Theta$ satisfies the RIP of order $2k$, which consequently leads to the following inequality

$$(1 - \delta_{2k}) \|x_1 - x_2\|_2^2 \leq \|\Theta (x_1 - x_2)\|_2^2 \leq (1 + \delta_{2k}) \|x_1 - x_2\|_2^2 \qquad (2.25)$$

The inequality described in (2.25) indicates that the distance between any pair of $k$-sparse signals must be well preserved in the measurement subspace. Note that if $\Theta$ satisfies the RIP of order $k$, then it also satisfies the RIP of any smaller order $k' \leq k$.
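Because Definition 2.3 quantifies over all $k$-sparse signals, computing $\delta_k$ exactly is combinatorial. For a tiny, purely illustrative example, the sketch below enumerates all column subsets of size $k$ of a random matrix and records the worst-case deviation of the Gram-matrix eigenvalues from one, which equals $\delta_k$ for that matrix; all sizes are assumed values.

```python
# Exhaustive computation of the RIP constant delta_k of Eq. (2.24) for a small
# random Gaussian matrix; only feasible because M, N and k are tiny.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
M, N, k = 30, 60, 3
Theta = rng.standard_normal((M, N)) / np.sqrt(M)

delta_k = 0.0
for cols in combinations(range(N), k):
    sub = Theta[:, list(cols)]
    eigs = np.linalg.eigvalsh(sub.T @ sub)               # eigenvalues of the k x k Gram matrix
    delta_k = max(delta_k, abs(eigs[0] - 1.0), abs(eigs[-1] - 1.0))

print("delta_k:", delta_k)    # the smaller delta_k is, the better conditioned the column subsets
```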
Lemma 2.1: Let $\Theta \in \mathbb{R}^{M \times N}$ with $M, N > 1$ be a matrix that satisfies the RIP of order $k$ with constant $\delta_k$, and let $\gamma$ be a positive integer. Then $\Theta$ satisfies the RIP of order $k' = \gamma \lfloor k/2 \rfloor$ with constant $\delta_{k'} < \gamma \, \delta_k$, where $\lfloor \cdot \rfloor$ denotes the floor operator.
Note that, for $\gamma \geq 3$ and $k \geq 4$, this lemma allows us to extend from the RIP of order $k$ to higher orders. It is also worth mentioning that if $\Theta$ satisfies the RIP, then it also satisfies the NSP, which means that any $k$-sparse signal expressed through $\Theta$ can be uniquely recovered as a solution of the $\ell_1$-norm minimization problem; this leads to the following lemma and theorem.
Lemma 2.2: Let $\Theta \in \mathbb{R}^{M \times N}$ with $M, N > 1$ be a matrix that satisfies the RIP of order $k = k' + h$ with constant $\delta_k$. Then $\Theta$ satisfies the NSP of order $k'$ with constant

$$\alpha = \frac{k'}{h} \cdot \frac{1 + \delta_k}{1 - \delta_k}$$

Theorem 2.3: If $\Theta \in \mathbb{R}^{M \times N}$ with $M, N > 1$ is a matrix that satisfies the RIP of order $3k$ with $\delta_{3k} < 1/3$, and if there exist $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$ such that $y = \Theta x$ and $\hat{x}$ is a solution to the $\ell_1$-norm minimization problem, then

$$\|\hat{x} - x\|_2 \leq C \, \frac{\sigma_k(x)_1}{\sqrt{k}} \qquad (2.26)$$

with $C$ a constant that depends only on $\alpha = \dfrac{1 + \delta_{3k}}{2\,(1 - \delta_{3k})}$, which is smaller than one precisely when $\delta_{3k} < 1/3$.

As mentioned earlier in this chapter, the RIP ensures that any measurement signal which has been corrupted by noise can still be uniquely recovered, meaning that the RIP is robust to noise, which leads to the following theorems:
Theorem 2.4:
2.1.2.iii Incoherence