### Present Remotely

Send the link below via email or IM

• Invited audience members will follow you as you navigate and present
• People invited to a presentation do not need a Prezi account
• This link expires 10 minutes after you close the presentation

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.

# Sparse Sampling, An Introduction

An introduction to Compressed Sensing, based upon Terence Tao's excellent article on the topic that can be found in http://terrytao.wordpress.com/2007/04/13/compressed-sensing-and-single-pixel-cameras/
by

## Gabriel Mañana

on 19 October 2013

Report abuse

#### Transcript of Sparse Sampling, An Introduction

Sparse Sampling
An Introduction
1. Image Compression
2. Sparse Sampling
3. Feasible Solutions
4. Single-pixel Camera
Agenda
Image Acquisition
Image Compression
Digital camera, e.g.
1024 x 2048 (2M pixels)
Grayscale images (8 bits@pixel)
Image file will be 2 MB in disk
A naive approach: locate a square (say 100x100) where all pixels have exactly the same color. Without compression, this square would take 10000 bytes to store.
Image Compression
Now, the above algorithm is not all that effective in practice, as it does not cope well with sharp transitions from one color to another. It turns out to be better to work not with average colors in squares, but rather with average color imbalances in squares – the extent to which the intensity on (say) the right half of the square is higher on average than the intensity on the left.
Image Compression
The thing is that while the
space of all images
has 2 MB worth of “degrees of freedom” or “entropy”, the space of all interesting images is much smaller, and can be stored using much less space, especially if one is willing to throw away some of the quality of the image.
Lossless compression
Lossy compression
Image Compression
To summarize, the original 1024 x 2048 image may have 2 million degrees of freedom, and in particular if one wants to express this image in terms of wavelets then one would thus need two million different wavelets in order to reconstruct all images perfectly.
Image Compression
Now, if we (or the camera) knew in advance which hundred thousand of the 2 million wavelet coefficients are going to be the important ones, then the camera could just measure those coefficients.
Now in practice, we don’t get such an impressive gain in compression, because even apparently featureless regions have some small color variation between them. So, given a featureless square, what one can do is record the average color of that square, and then subtract that average off from the image, leaving a small residual error.
Sparse Sampling
Sparse Sampling
Solution (1)
One can formalize this by using the (two-dimensional) Haar wavelet system.

It then turns out that one can work with “smoother” wavelet systems which are less susceptible to artifacts, but this is a technicality which we will not discuss here. But all of these systems lead to similar schemes: one represents the original image as a linear superposition of various “wavelets” (the analogues of the colored squares in the preceding slide), stores all the significant (large magnitude) wavelet coefficients, and throws away (or “thresholds”) all the rest.
However, the typical interesting image is very sparse or compressible in the wavelet basis: perhaps only a hundred thousand of the wavelets already capture all the notable features of the image, with the remaining 1.9 million wavelets only contributing a very small amount of “random noise”, which will be largely invisible to most observers.
However, the camera does not know which of the coefficients are going to be the key ones, so it must instead measure all 2 million pixels, convert the image to a wavelet basis, locate the hundred thousand dominant wavelet coefficients to keep, and throw away the rest.
But, as said before, the camera does not know in advance which hundred thousand of the two million wavelet coefficients are the important ones that one needs to save.

What if the camera selects a completely different set of 100,000 (or 300,000) wavelets, and thus loses all the interesting information in the image?
Sparse Sampling
Sparse Sampling
Sparse Sampling
Matching Pursuit
Sparse Sampling
Sparse Sampling
Matching Pursuit
Basis Pursuit
Solution (2)
minimization
Matching Pursuit
A type of numerical technique which involves finding the "best matching" projections of multidimensional data onto an over-complete dictionary D.
1. Compression
2. Sparse Sampling
3. Feasible Solutions
4. Single-pixel Camera
10 min
15 min
10 min
It is quite typical for an image to have a large featureless component – for instance, in a landscape, up to half of the picture might be taken up by a monochromatic sky background.
1/40
2/40
3/40
4/40
5/40
6/40
However, there are non-consumer imaging applications in which this type of data collection paradigm is infeasible, most notably in sensor networks.

If one wants to collect data using thousands of sensors, which each need to stay in situ for long periods of time such as months, then it becomes necessary to make the sensors as cheap and as low-power as possible
– which in particular rules out the use of devices which require heavy computer processing power at the sensor end.
Image Compression
7/40
The above algorithm, in which one collects an enormous amount of data but only saves a fraction of it, works just fine for consumer photography.
The images described can take up a lot of disk space on the camera, and also take a non-trivial amount of time/energy to transfer from one medium to another. So, it is common practice to get the camera to compress the image, from an initial large size (e.g. 2 MB) to a much smaller size (e.g. 10% of the size, ~200 KB).
8/40
9/40
10/40
11/40
13/40
14/40
15/40
18/40
20/40
19/40
17/40
Locate a wavelet whose signature seems to correlate with the data collected; remove all traces of that signature from the data; and repeat until we have totally “explained” the data collected in terms of wavelet signatures.
21/40
22/40
23/40
24/40
25/40
26/40
27/40
1
G. Mañana Guichón
http://en.wikipedia.org/wiki/Image_compression
Run-length encoding (PCX, BMP, TGA, TIFF)
Entropy encoding
Adaptive dictionary algorithms LZW (GIFF, TIFF)
Color space reduction
Chroma sub-sampling
Transform coding (DCT, Wavelet transform)
Fractal compression
Instead, one can simply record the position, dimension and color with which to fill the square: this will require 4 or 5 bytes in all to record, leading to a massive space saving.
http://en.wikipedia.org/wiki/Haar_wavelet
This section based upon Terence Tao's article on Compressed Sensing:

http://terrytao.wordpress.com/2007/04/13/compressed-sensing-and-single-pixel-cameras/
It is possible to measure a single coefficient by applying a suitable “filter” or “mask” to the image, and making a single intensity measurement to what comes out.
http://en.wikipedia.org/wiki/Wavelet_transform
The main philosophy is this: if one only needs a 100,000 components to recover most of the image, why not just take a 100,000 measurements instead of 2 million?
This is where sparse sampling comes in.
The solution to this problem is both simple and unintuitive.
It is to make 300,000 measurements which are totally unrelated to the wavelet basis.
In practice, we would allow a safety margin, e.g. taking 300,000 measurements, to allow for all sorts of issues, ranging from noise to aliasing to breakdown of the recovery algorithm.
In fact, the best types of measurements to make are (pseudo-)random measurements – generating 300,000 random “mask” images and measuring the extent to which the actual image resembles each of the masks.
Now, these measurements (or “correlations”) between the image and the masks are likely to be all very small, and very random.
But –
and this is the key point
– each one of the 2 million possible wavelets which comprise the image will generate their own distinctive “signature” inside these random measurements, as they will correlate positively against some of the masks, negatively against others, and be uncorrelated with yet more masks.
Sparse Sampling
12/40
But (with overwhelming probability) each of the 2 million signatures will be distinct; furthermore, it turns out that arbitrary linear combinations of up to a hundred thousand of these signatures will still be distinct from each other.
From a linear algebra perspective, this is because two randomly chosen 100,000-dimensional subspaces of a 300,000 dimensional ambient space will be almost certainly disjoint from each other.
Because of this, it is possible in principle to recover the image (or at least the 100,000 most important components of the image) from these 300,000 random measurements.
In short, we are constructing a linear algebra analogue of a hash function.
There are however two technical problems with this approach. Firstly, there is the issue of noise: an image is not perfectly the sum of 100,000 wavelet coefficients, but also has small contributions from the other 1.9 million coefficients also. These small contributions could conceivably disguise the contribution of the 100,000 wavelet signatures as coming from a completely unrelated set of 100,000 wavelet signatures; this is a type of “aliasing” problem.
The second problem is how to use the 300,000 measurements obtained to recover the image.
If we knew which 100,000 of the 2 million wavelets involved were, then we could use standard linear algebra methods (Gaussian elimination, least squares, etc.) to recover the signal.
A naive least-squares approach gives horrible results which involve all 2 million coefficients and thus lead to very noisy and grainy images.
Sparse Sampling
16/40
One could perform a brute-force search instead, applying linear algebra once for each of the possible set of 100,000 key coefficients, but this turns out to take an insanely impractical amount of time (there are roughly 10^170,000 combinations).
NP-complete in general.
5 min
http://en.wikipedia.org/wiki/Matching_pursuit
The basic idea is to represent a signal f from Hilbert space H as a weighted sum of functions (called atoms) taken from D:
The concept of matching pursuit in signal processing is related to statistical projection pursuit, in which "interesting" projections were found; ones that deviate more from a normal distribution are considered to be more interesting.
Solutions
Matching Pursuit
Basis Pursuit or minimization
There are now rigorous results which show that these approaches can reconstruct the original signals perfectly or almost-perfectly with very high probability of success, given various compressibility or sparsity hypotheses on the original image.

The matching pursuit algorithm tends to be somewhat faster, but the basis pursuit algorithm seems to be more robust with respect to noise.
http://arxiv.org/abs/math.CA/0410542
http://www.math.lsa.umich.edu/~annacg/papers/TG05-signal-recovery-rev-v2.pdf
http://www-stat.stanford.edu/~donoho/Reports/2004/l1l0EquivCorrected.pdf
http://www.ams.org/mathscinet-getitem?mr=2230846
http://arxiv.org/abs/math/0602559
http://en.wikipedia.org/wiki/Basis_pursuit
Out of all the possible combinations of wavelets which would fit the data collected, find the one which is “sparsest” in the sense that the total sum of the magnitudes of all the coefficients is as small as possible.
It turns out that this particular minimization tends to force most of the coefficients to vanish.
This type of minimization can be computed in reasonable time via convex optimization methods such as the simplex method.
Basis Pursuit
28/40
where x is a N x 1 solution vector,
y is a M x 1 vector of observations,
A is a M x N transform matrix,
and M < N.
A mathematical optimization problem of the form:
It is usually applied in cases where there is an underdetermined system of linear equations y = Ax that must be satisfied exactly, and the sparsest solution in the L1 sense is desired.
Basis pursuit can be thought of as a least squares problem with an L1 regularizer.

When it is desirable to trade off exact congruence of Ax and y in exchange for a sparser x, basis pursuit denoising is preferred.
Basis Pursuit Denoising
where (Lagrangian multiplier) controls the trade-off between sparsity and reconstruction fidelity,
x is a N x 1 solution vector,
y is a M x 1 vector of observations,
A is a M x N transform matrix,
and M < N.
A mathematical optimization problem of the form:
Basis Pursuit Denoising
Basis pursuit denoising solves a regularization problem with a trade-off between having a small residual (making y close to Ax in the L2 sense) and making x simple in the L1 sense.
Exact solutions to basis pursuit denoising are often the best computationally tractable approximation of an underdetermined system of equations. Basis pursuit denoising thus has potential applications in statistics (c.f. the LASSO method of regularization), image compression and Compressed Sensing.
In mathematics and statistics, particularly in the fields of machine learning and inverse problems, regularization involves introducing additional information in order to solve an ill-posed problem or to prevent overfitting. This information is usually of the form of a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm.
http://en.wikipedia.org/wiki/Regularization_%28mathematics%29
http://en.wikipedia.org/wiki/Total_variation
http://en.wikipedia.org/wiki/Total_variation_regularization
Basis Pursuit Denoising
As , this problem becomes basis pursuit.
Several popular methods for solving basis pursuit denoising include homotopy continuation, fixed-point continuation and spectral projected gradient for L1 minimization (which actually solves LASSO, a related problem).
Single-pixel Camera (Rice)
http://dsp.rice.edu/cscamera
Single-pixel Camera (Rice)
29/40
Single-pixel Camera (Rice)
30/40
Single-pixel Camera (Rice)
31/40
Results
All images reconstructed using Total Variation Minimization (TVM)
"Compressive Sensing is an emerging field based on the revelation that a small group of non-adaptive linear projections of a compressible signal or image contains enough information for reconstruction and processing.

Our new digital image/video camera directly acquires random projections of a scene without first collecting (all) the pixels/voxels. The camera architecture employs a digital micro-mirror array to optically calculate linear projections of the scene onto pseudo-random binary patterns. Its key hallmark is its ability to obtain an image or video with a single detection element (the "single pixel") while measuring the scene fewer times than the (total) number of pixels/voxels.
http://dsp.rice.edu/cscamera
Since the camera relies on a single photon detector, it can also be adapted to image at wavelengths where conventional CCD and CMOS imagers are blind."
Total Variation
32/40
For a real-valued continuous function f, defined on an interval [a, b], its total variation on the interval of definition is a measure of the one-dimensional arclength of the curve with parametric equation x → f(x), for x in [a,b].
http://en.wikipedia.org/wiki/Total_variation
Since the camera relies on a single photon detector, it can also be adapted to image at wavelengths where conventional CCD and CMOS imagers are blind."
Total Variation Denoising
33/40
In signal processing, Total Variation denoising, also known as total variation regularization, is a process based upon the principle that signals with excessive, and possibly spurious detail, have high total variation, that is, the integral of the absolute gradient of the signal is high.
http://en.wikipedia.org/wiki/Total_variation_denoising
1D digital signals

For a digital signal y , we can, f.i., define the total variation as:
n
Given an input signal x , the goal of total variation denoising is to find an approximation, call it y , that has smaller total variation than x but is "close" to x . One measure of closeness is the sum of square errors:
n
n
n
n
Total Variation Denoising
34/40
So the total variation denoising problem amounts to minimizing the following discrete functional over the signal y :
n
By differentiating this functional with respect to y , we can derive a corresponding Euler-Lagrange equation, that can be numerically integrated with the original signal x as initial condition.

Alternatively, since this is a convex functional, techniques from convex optimization can be used to minimize it and find the solution y .
n
n
n
Total Variation Denoising
35/40
2D digital signals (images)
The total variation norm is given by:
Isotropic: not differentiable
An-isotropic: easier to minimize
http://csi.uan.edu.co/news/A.Chambolle-TVM-2004.pdf
Homotopy
The homotopy methods exploit the fact that the objective function F(x) undergoes a homotopy from the l2 constraint to the l1 objective in (1) as λ decreases.
(1)
http://en.wikipedia.org/wiki/Homotopy
As the green ball travels (up) on the graph of the given function, the length of the path traveled by that ball's projection on the y-axis, shown as a red ball, is the total variation of the function.
According to this principle, reducing the total variation of the signal subject to it being a close match to the original signal, removes unwanted detail whilst preserving important details such as edges.
Full transcript