Sakupljač feed-ova

Course announcement: 254B, Higher order Fourier analysis

Terrence Tao - Prije 13 sati 20 minuta

Starting on Monday, March 29, I will begin my graduate class for the winter quarter, entitled “Higher order Fourier analysis“.  While classical Fourier analysis is concerned with correlations with linear phases such as (where ), quadratic and higher order Fourier analysis is concerned with quadratic and higher order phases such as , , etc.

In recent years, it has become clear that certain problems in additive combinatorics are naturally associated with a certain order of Fourier analysis.  For instance, problems involving arithmetic progressions of length three are connected with classical Fourier analysis; problems involving progressions of length four are connected with quadratic Fourier analysis; problems involving progressions of length five are connected with cubic Fourier analysis; and so forth.  The reasons for this will be discussed later in the course, but we will just give one indication of the connection here: linear phases and arithmetic progressions of length three are connected by the identity

while quadratic phases and arithmetic progressions of length four are connected by the identity

and so forth.

It turns out that in order to get a complete theory of higher order Fourier analysis, the simple polynomial phases of the type given above do not suffice.  One must also consider more exotic objects such as locally polynomial phases, bracket polynomial phases (such as , and/or nilsequences (sequences arising from an orbit in a nilmanifold ).  These (closely related) families of objects will be introduced later in the course.

Classical Fourier analysis revolves around the Fourier transform and the inversion formula.  Unfortunately, we have not yet been able to locate similar identities in the higher order setting, but one can weaker results, such as higher order structure theorems and arithmetic regularity lemmas, which are sufficient for many purposes, such as proving Szemeredi’s theorem on arithmetic progressions, or my theorem with Ben Green that the primes contain arbitrarily long arithmetic progressions.  These results are powered by the inverse conjecture for the Gowers norms, which is now extremely close to being fully resolved.

Our focus here will primarily be on the finitary approach to the subject, but there is also an important infinitary aspect to the theory also, originally coming from ergodic theory but more recently also from nonstandard analysis (or more precisely, ultralimit analysis), and we will touch upon this in the course also.  If time permits, we will also present the number-theoretic applications of this machinery to counting arithmetic progressions and other linear patterns in the primes.


Filed under: 254B - Higher order Fourier analysis, admin
Kategorije: Matematički blogovi

Color: from XYZ to spectrum

Rip’s Applied Mathematics Blog - Uto, 2010-03-09 03:02
Introduction.

In this post, I will go from XYZ coordinates to a spectrum.

I have to ask: what would somebody do with this? Especially, what would they do that couldn’t have been done using the XYZ tristimulus values directly? I don’t know – but let’s just solve the problem. I do not yet always have a satisfactory solution, and I will illustrate both satisfactory and unsatisfactory solutions.

Yes, we have done this before – but not as our primary purpose, but rather as part of another computation. It is worthwhile to tackle this specific problem, because there is one subtlety.

In general terms, the solution is simple: the XYZ tri-stimulus values are proportional to the components (with respect to the dual basis) of a (fundamental) spectrum. Find the dual basis, then get the linear combination defined by those components.

I am going to use the CIE 1931 color matching functions. I will show the calculations explicitly for 20 nm intervals, but I will show graphs for both the 20 nm intervals and for 5 nm intervals.

A nice example. (A satisfactory solution.)

Here is a swatch of color.

The digital color meter on my computer says that this swatch has…

{X, Y, Z} = {67.422, 82.956, 35.834}

and RGB = {66.3, 100.0, 54.1} . (I chose it off the color wheel, and Mathematica translated my click into an RGB specification, namely RGBColor[0.663767, 1, 0.541878]. So the RGB reported by the digital color meter matches what I put in – which was selected off the HSB color wheel by sight (not by value).)

Incidentally, on my Mac, the calculator in the ColorSync utility confirms these two triples – just input XYZ as “perceptual”, and read off the RGB. Oh, remember to set the device to “LCD”, because the utility opens with “generic RGB”. You may have counted that I am using two pieces of software, the DigitalColor Meter and ColorSync.

So let’s go get a spectrum for that swatch.

We have learned that the XYZ tri-stimulus values are proportional to the components of a fundamental spectrum with respect to the basis E dual to the color matching functions in the matrix A.

That proportionality constant has always been computed by applying the second row of A’ to the spectrum of the illuminant. It appears, therefore, that we will need to choose an illuminant.

We do not need to, and we probably shouldn’t – but we can, and I will choose one the first time I do this calculation.

Okay, we need a matrix A and the dual basis E. For graphing, it will be convenient to have a list of wavelengths.

What range of wavelengths? Well, the color matching functions I have (at 5 nm intervals) are defined over 380 – 780 nm. That is smaller than the range over which illuminants are defined, so I might as well choose the entire range of the color matching functions. It is the smaller, hence the limiting, range.

Here are the wavelengths…

{380, 400, 420, 440, 460, 480, 500, 520, 540, 560,
580, 600, 620, 640, 660, 680, 700, 720, 740, 760, 780}

Let me choose a common standard illuminant, D65. Here it is…

{50., 82.8, 93.4, 104.9, 117.8, 115.9, 109.4, 104.8,
104.4, 100., 95.8, 90., 87.7, 83.7, 80.2, 78.3, 71.6,
61.6, 75.1, 46.4, 63.4}

Here is my A matrix, from the CIE 1931 XYZ (xyz bar) tables at 20 nm intervals.

Now we can compute the scaling factor: apply the transpose A’ to d65, and take the middle number as the scaling – or, instead, just apply the middle row of A’ to d65…

s1 = 529.642

then we multiply our XYZ by that scale, to get components wrt the dual basis E…

c1 = s1 {X, X, Z} = {35709.5, 43937., 18979.2}.

Now get the dual basis. As usual, we define the dual basis by its action on the columns of A…

E’A = I,

where I is a 3×3 identity matrix. Then we write the utterly impossible

,

because it reminds us of the true formula which uses the pseudo-inverse:

.

Thus I compute E’, but display the transpose (because it fits on the screen)…

From the components and the basis vectors (rows of E’), I compute the fundamental as a linear combination…

f1 = c1.ET = {13.7722, 147.654, 1404.24, 3909.35, 4082.99, 3075.15,
3862.37, 7568.38, 10233.7, 10959.3, 10011.9, 7694.61, 4877.98,
2300.89, 811.6, 227.105, 54.8956, 13.5196, 3.87036, 1.25375, 0.}

It is a perfectly nice spectrum, always non-negative.

Let’s look at it:

Here’s a picture from my (unpublished) computations at 5 nm intervals.

Let’s check the computation. From that spectrum, we recover the components wrt E by applying A’…

{35709.5, 43937., 18979.2}

then divide by the scale…

{67.422, 82.956, 35.834}

which is exactly the given XYZ…

{X, Y, Z} = {67.422, 82.956, 35.834}

Now hold up a minute.

How did we check that answer? We got the components of that spectrum with respect to the E basis – but that just used A’. Oh, then we divided by the scale factor, which came from D65.

Not really. I didn’t recompute the scale factor… I just used the same one as before. And that’s the point. If we want to check the answer, by reversing the calculations, we have to know what scale factor was used to get from XYZ to the components.

It doesn’t matter what that scale factor is or where it came from, it just has to be the same one when we check the answer as when we combined the basis vectors.

We could have used 1 or 100, or pi, in order to get from XYZ to the components. If someone aks me for the illuminant, I’m going to answer the question they should have asked in this case, and tell them the scale factor (which may, but need not, have come from an illuminant).

In fact, now that I look at the scale on the graph, that spectrum is pretty intense in any customary units. Maybe I should have used 1 instead of about 500.

If you’re uncomfortable with using an arbitrary scale factor, and want to specify an illuminant, but D65 seems too special, consider using an equal-energy illuminant instead of D65. It doesn’t matter, as far as I can see.

A not-so-nice example. (An unsatisfactory solution.)

Here is another set of XYZ…

{X, Y, Z} = {30.64, 19.812, 8.203}

The ColorSync utility tells me that on my monitor, those are RGB…

(and DigitalColor Meter then confirms that this RGB spec has the given XYZ).

That is, in this case I had a set of XYZ tri-stimulus values at my fingertips. The ColorSync utility lets me find the RGB which my monitor believes is equivalent, so I know what to feed Mathematica for that disk. This achieves one outstanding goal: given XYZ, I can find out what color it represents on my monitor. (That’s a nonlinear calculation, and so far, I’m letting the computer do it implicitly.)

In the first case, I had picked a color, gotten the RGB first, and then asked for the tri-stimulus values from the image. When I go the other way, I first have to let the computer calculate RGB from XYZ – and then I can confirm the two sets of numbers from the image.

Anyway….

For this case, let’s just set the scale factor equal to 1. This means that our XYZ values are themselves the components with respect to the three E basis vectors.

We compute the fundamental as a linear combination of the basis vectors, and get…

f4 = {X, Y, Z}.ET = {0.0111122, 0.111361, 1.04104, 2.61278, 1.92543,
0.116112, -1.05328, -1.39481, 0.440321, 3.67812, 7.57807, 9.86993, 8.30289,
4.42371, 1.63849, 0.465915, 0.11361, 0.0290242, 0.00683746, 0.00191253, 0.}

Some negative numbers. Let’s look at it.

Again, here’s the corresponding picture from my (unpublished) computations at 5 nm intervals.

I assure you, this spectrum being negative around 500 nm has nothing to do with my choice of scaling factor. The scaling factor only affects the vertical scale, not the origin; it doesn’t translate the curve.

Are these negative values OK?

I don’t know. Why did we want this spectrum? What are we going to do with it? And, as I asked before, could we have done it all with just the XYZ values?

As it happens, however, I know a way to fix this, in this case…. We can eliminate the negative values. I know a residual which I can add to it to this spectrum, so that the sum will be everywhere positive – and since I’m adding a residual, it has no effect on the XYZ values.

Truth to tell, I know two or three residuals which will make the negatives become positive. One of them is the residual from an equal-energy illuminant.

Let’s see that.

Now the quickest way to get a residual is to subtract a fundamental from a spectrum… and the quickest way to get a fundamental, in general, is to apply a projection operator to a spectrum.

Been there, done that. We recall that we can construct a projection operator onto the 3D color-fundamental subspace as follows. Let u,w,v be the Singular Value Decomposition of A’:

A’ = u w v’.

Let v1 be the first 3 columns of v. Then the 3 columns of v1 are an orthonormal basis for the pre-image of the range of A’ (the color-fundamental subspace) , and the matrix

P = v1 v1′

is a projection operator onto the span of v1 (again, the color-fundamental subspace).

I ask for the SVD:

{u,w,v}=SingularValueDecomposition[AT];

I get v1 as the first 3 columns of v:

Then I define

P = v1 v1′.

Now, p is 21×21, so I’m not going to display it… but for checking purposes, here are its first row…

second-to-last row (the last row is identically zero)…

and its first column…

Next, here’s an equal-energy spectrum…

ee=ConstantArray[100,21]
= {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100, 100}

Here’s its fundamental… P.ee =

{0.470329, 4.93178, 46.8236, 126.976, 122.475, 67.1406, 46.2149,
73.0368, 104.313, 123.805, 129.975, 116.133, 81.8326, 40.6347,
14.6472, 4.12848, 1.00191, 0.250948, 0.0659255, 0.0201401, 0.}

Here is its residual… re = ee-P.ee =

{99.5297, 95.0682, 53.1764, -26.9757, -22.4752, 32.8594, 53.7851,
26.9632, -4.31253, -23.8048, -29.9753, -16.133, 18.1674, 59.3653,
85.3528, 95.8715, 98.9981, 99.7491, 99.9341, 99.9799, 100.}

And here’s what the residual looks like. Note that the scale is much smaller than for our fundamental spectrum.

Again, here’s the corresponding picture from my (unpublished) computations at 5 nm intervals.

Now let’s see if I can add some of that residual to the fundamental, and end up with something nice.

Playing around, I discover that this works: f4 + 8re/100 =

{7.97349, 7.71682, 5.29515, 0.454721, 0.127414, 2.74487, 3.24952,
0.762244, 0.0953185, 1.77374, 5.18005, 8.57929, 9.75627, 9.17293,
8.46672, 8.13564, 8.03346, 8.00895, 8.00156, 8.0003, 8.}

And the result looks like this:

Again, here’s the corresponding picture from my (unpublished) computations at 5 nm intervals.

(Blue is the original, the fundamental, sometimes negative. Black is the result of adding that particular residual.)

So, I grabbed the residual of an equal-energy illumninant, and added enough of it to make the sum positive everywhere. It had a pretty significant impact on the spectrum – but the XYZ values are unchanged, because I added a residual.

The residual of D65 would also work.

But the residual of illuminant C does not work. (I can get to a point where two values are negative, but changing things a little moves them in opposite directions. That won’t work.)

So it appears that I just lucked out. A multiple of the residual from either the D65 or the equal-energy illuminant can be added to this spectrum to give something which is everywhere positive. But not every residual of an illuminant will work.

Well, I didn’t really think it would. I just thought illuminants were a convenient source of real residuals.

(No, I don’t know how to tell ahead of time if a given {X, Y, Z} will have a fundamental which goes negative. Nor do I mean to assert that “correcting” that negativity is the only reason we might want to add a residual.)

I don’t yet know how to solve this problem of designing residuals to order, but I have an idea or two. I’ll keep you posted.


Kategorije: Matematički blogovi

Jacob Palis 70th birthday conference

Disquisitiones Mathematicae - Pon, 2010-03-08 01:19

During the last 10 days, I was attending J. Palis’ 70th birthday conference held at Atlantico Hotel, Buzios, Brazil. As one could expect from a conference of this size (~220 participants), it was a nice opportunity to talk to coauthors/friends and to learn several interesting tools. Also, since Buzios has plenty of superb beaches (very close to the hotel) and we had 2 hours of lunch time break, the conference atmosphere was an interesting mixture of intense and relaxing moments. Furthermore, I guess that the fact that the young and senior participants were together in the same place provided a great opportunity to the young generations to meet (and talk directly) with the senior generations, so that they could start new collaborations. In particular, I strongly believe that Jacob Palis and the organizers are very proud of this beautiful conference (which was extremely successful in many ways).

Concerning the (intense) conference program (with 5 plenary talks and 2 parallel sessions per day), you can find the details here, but I should advance that I’m planning to write some posts related to some talks (e.g., J.C. Yoccoz, A. Zorich, A. Avila, etc.).

In any case, this post is intended to accomplish two goals: firstly, I’m making the slides of Enrique Pujals’ talk (on Palis’ work) available here (with his permission, of course), and secondly, I’m fulfilling my (public) promise of putting the slides of my talk here. Remark: Please notice that these pdf files contain photos, so that their sizes are relatively big (~46MB and ~34MB resp.)

About Enrique Pujals’ talk, let me mention that it was a very touching moment: during the preparation of his slides, Enrique had the nice idea of inviting some of Jacob’s mathematical sons and nephews to draw some pictures in order to show how Palis influence appears in several contexts; in particular, each slide of Enrique’s talk was a small tribute of some son/nephew of Jacob (whose signature always appears in the corresponding slide). Also, Enrique putted a lot of effort in making the exposition as funny as possible (e.g., when he showed a picture of Gugu’s son illustrating Newhouse phenomena with his hands, Enrique said that Jacob is still recruiting young people to Dynamical Systems). Finally, the talk ended with Enrique embracing Jacob and saying to him that student isn’t a word meant to refer to a past event, but to a present one.

About my talk, I have only to acknowledge the organizers for the imense honor to give the final talk of the conference: in fact, I was extremely touched (and nervous) with this invitation, specially because 10 years ago (during Palis 60th birthday conference), I was a 1st year graduate student trying to choose my primary research area (and of course my choice was particularly influenced by J. Palis). During the talk, I started by telling the audience about my first meeting with Palis. I guess that the first time Palis heard of me was during a visit of Jean-Pierre Serre. In fact, I was so excited to learn that J.P. Serre was at IMPA that I interrupted him in the middle of a calculation (in his office) to ask in Portuguese to take a photo with him. Of course, Serre didn’t understand my request and told to Jacob (IMPA’s director in 1997) about a strange boy who went into his office like a crazy. After this initial bad impression, during Palis 60th birthday conference (in 2000), I talked to him asking for some advice concerning my research career. Instead of giving me a long speech, Palis saw that I was planning to read his joint book with Floris Takens and he told me to take read the book to decide if I liked the subject or not. I acknowledge his advice and asked for an autograph in this book. Again, instead of writing a long dedicatory, he simply put one decisive word of encouragement: Sucess (this appears as the second slide of my talk). After that, I talked about two mathematical results (joint with J.C. Yoccoz, and G. Forni and A. Zorich) and I closed the exposition with a selection of key (serious and funny) moments of the conference. I hope you will enjoy it!


Kategorije: Matematički blogovi

The computation of Salié sums

E. Kowalski's blog - Ned, 2010-03-07 18:14

I’m continuing preparing my notes for the course on exponential sums over finite fields, and after the fourth moment of Kloosterman sums, I’ve just typed the final example of “explicit” computation, that of the Salié sums, which for the prime field Z/pZ are given by

(the (x/p) being the Legendre symbol, and e(z)=exp(2iπ z) as usual in analytic number theory). These look suspiciously close to Kloosterman sums, of course, but a surprising fact is that they turn out to be rather simpler!

More precisely, we have the following formula if a and b are coprime with p (this goes back, I think to Salié, though I haven’t checked precisely):

the inner sum running over the solutions of the quadratic equation in Z/pZ. The first term is simply a quadratic Gauss sum, which is well-known to be of modulus

and since the inner sum contains either 0 or 2 terms only, depending on the quadratic character of a modulo p, we get trivially the analogue

of the Weil bound for Kloosterman sums.

The standard proof of this formula is due to P. Sarnak. It is the one which is reproduced in my earlier course on analytic number theory and in my book with H. Iwaniec. However, since it involves a somewhat clever trick, I tried a bit to find a more motivated argument (motivated does not mean motivic, though one can certainly do it this way…).

I wasn’t quite successful, but still found a different proof (which, of course, is very possibly not original; I wouldn’t be surprised, say, if Sarnak had found it before the shorter one in his book). The argument uses a similar trick of seeing the sum as value at 1 of a function which is expanded using some discrete Fourier transform, but maybe the function is less clever: it is roughly

instead of something like

(and it uses multiplicative characters in the expansion, instead of additive characters). It’s a bit longer, also less elementary, because one needs to use the beautiful Hasse-Davenport product formula: denoting

the Gauss sums associated with multiplicative characters, we have

which is the analogue for Gauss sums of the duplication formula

for the gamma function. Since this formula is most quickly proved using Jacobi sums (the analogues of the Beta function…), which I had also included in my explicit computations of exponential sums, using this argument is a nice way to make the text feel nicely interlinked and connected. And it’s always a good feeling to use a proof which is not just the same as what can be found already in three of four places (at least when you don’t know those places; for all I know, this may have been published twenty times already).

Now, you may wonder what Salié sums are good for. To my mind, their time of glory was when Iwaniec used them to prove the first non-trivial upper bound for Fourier coefficients of half-integral weight modular forms (this is the application Sarnak included in his book), which then turns out to lead quite easily (through some additional work of W. Duke) to results about the representations of integers by ternary quadratic forms. Another corollary of Iwaniec’s bound, through the Waldspurger formula and Shimura’s correspondance, was a strong subconvexity bound for twisted L-functions of the type

where f is a fixed holomorphic form and χ is a real Dirichlet character, the main parameter being the conductor of the latter.

The point of Iwaniec’s argument was that the Weil-type bound, when applied to the Fourier coefficients of Poincaré series, which can be expressed as a series of Salié sums

(with fixed m and n) in the half-integral weight case, just misses giving a result. So one must exploit cancellation in the sum over those Salié sums, i.e., as functions of the modulus c. This is hopeless, at the moment, for Kloosterman sums, but the semi-explicit expression for the Salié sums in terms of roots of quadratic congruences turns out to be sufficient to squeeze out some saving…

(Nowadays, there is a wealth of techniques to directly prove subconvexity bounds for the twisted L-functions — e.g., in this paper of Blomer, Harcos and Michel –, and one can run the argument backwards, getting better estimates for Fourier coefficients from those; as is well-known, one finds this way that the “optimal” bound for the Fourier coefficients is equivalent with a form of the Lindelöf Hypothesis for the special values…)

Kategorije: Matematički blogovi

EDP11 — the search continues

W.T. Gowers - Ned, 2010-03-07 12:27

I do not have too much to say about the new programme to use SDP to solve EDP, other than that it is still being actively pursued. One remark I would make is that I have rather forgotten recently about keeping the wiki updated, and SDP is a major omission from it. It would be good to have a theoretical account of the method (which I suppose I could paste quite easily from one of my existing posts), and an experimental page devoted particularly to data connected with this approach, with links to all the matrices, sequences and plots that have been produced.

So far, the data has had some expected features (such as the sequence values tending to be larger when is smooth) and some puzzling ones (such as the fact that is much much larger than ). An initial hope for the method was that the experimental data would give rise to a very precise conjecture, but so far that has not happened. There are, however, various avenues that have not been fully explored, and I still have some hope that we will suddenly stumble on some data that we can fully understand.

One way of doing this is to introduce a smoothing. One idea I had turned out to go too far: in the search for a pretty formula I ended up with a version of EDP that was false. For the record, here is that version. One can think of the discrepancy of a sequence as its largest inner product with the characteristic function of any HAP. Those characteristic functions have sharp cutoffs (in that they go up to for some and then suddenly stop), so one might hope to make the problem nicer by smoothing the cutoffs. One natural way of doing this is to look at functions such as , which take the value at and 0 at non-multiples of . For this to be a sensible idea, we need a “smoothed EDP” to be true. That is, we need it to be the case that for every sequence there exist and such that has absolute value at least . However, this is false for the character-like function . The multiplicativity of implies that it will be enough to show this when , so let us fix some and calculate the sum .

We do this in the usual way. First, let us look at non-multiples of 3. That gives us the sum

which equals . More generally, the sum over multiples of that are not multiples of works out as . One can check that the function is increasing in the interval , so if we sum this over all then the alternating series test tells us that the total is at most 1/3, whatever the value of .

A similar argument works if we use a weight of instead, but the calculations are easier. By the alternating series test, we know that is positive and at most 1. Call this value . Then , which is at most .

At one point I observed that if you have a sequence of bounded discrepancy then the Dirichlet series is uniformly bounded for all positive real , and wondered if we could deduce anything useful from this. The fact that has this property shows that we certainly cannot derive a contradiction, though it leaves open the possibility of deducing that the sequence is forced to have character-like behaviour.

In the light of these observations, what hope is there for obtaining nicer formulae by smoothing? The answer is that even if we cannot smooth the HAPs, we can at least smooth the interval of integers we are thinking about as a whole. That is, instead of thinking of discrepancy as a function of (the length of the sequence ), we could take an infinite sequence , associate with a weight , and think of the discrepancy as a function of the decay rate (or equivalently the half-life, which is proportional to ). That is, we would like to prove that for each there must exist such that has absolute value at least , where as .

If we try to prove this using Moses’s SDP approach, then, as Moses points out, there is a nice formulation of the problem. We would like to find non-negative coefficients that sum to 1, and coefficients such that , such that

for every sequence . This is similar to what we were looking at before, but now we have attached some slowly decaying weights to the .

Note that we could think of the previous version of the problem as doing exactly the same, but with weights that equal 1 up to and thereafter. I am hoping that with these smoother weights, we will get nicer numbers coming out.

I would like to end this post by drawing attention to Gil’s polynomial method. This is a completely different approach to EDP, where one constructs a polynomial over a finite field that is identically zero if and only if EDP is true. Rather than explain the idea in detail, let me link to two comments in which Gil himself gives a nice explanation. He introduced the idea in this comment and an interesting variant of the idea in this one. It would be good to add this method too to the list of possible proof techniques on the wiki.


Kategorije: Matematički blogovi

Happenings — Mar 6

Rip’s Applied Mathematics Blog - Sub, 2010-03-06 21:08

“Live as if you were to die tomorrow. Learn as if you were to live forever.” Mohandas (“Mahatma”) Gandhi.

(That’s one of the inspirational quotations on the wall of my study.)

Regardless of what else I might choose to talk about this week, the dominant subject was, in fact, particle physics.

I went to the annual “Oppenheimer lecture” at Cal last Monday in the late afternoon. Frank Wilczek, a co-winner of the Nobel Prize in Physics 2004,”for the discovery of asymptotic freedom in the theory of the strong interaction”, was the speaker.

“Cal”, of course (!), is the University of California — at Berkeley. I’m pretty sure that it is usually referred to as “Berkeley” elsewhere, but in this neck of the woods, I have learned, I had better call it “Cal”. Some people associated with the University have bridled at the implication that it is on a par with other campuses of the University of California — instead of being the original, the real one, the one that need not be tied to a town or city.

Well, I had occasion to look up “Bishop Berkeley” a few weeks ago… and I discovered that the town of Berkeley — excuse me, the city of Berkeley — excuse me, the People’s Republic of Berkeley — was named for the Bishop.

When some people assembled to found the University in the mid-1800s, they knew perfectly well that a town would grow up around it — so they planned, and they named, the town at the same time.

I find it amusing. The original University of California is an inseparable part of Berkeley and conversely. They were created together. One might argue that it has a better right than most to be referred to by its location.

Oh, well.

Why was I looking up Bishop Berkeley? You may remember him having something to do with the foundations of the calculus. He really, really objected to “infinitesimals”; I would like to think that he would be satisfied by the modern treatment of limits and derivatives. He lived from 1685 to 1753, so he was about 20 years older than Euler. In particular, he predates the beginning of rigor in analysis. (That’s a rather sloppy statement, but let me just leave it there.)

What I was looking for was a quotation something to this effect: the only prerequisite for accepting some particular piece of mathematics, is that you be insane. (I’m still looking for that quotation. Can you help me out?)

I thought it might have been something the Bishop said… but I can’t find it.

Why was I looking for that quotation? Over breakfast with a physicist friend, I mentioned that there was a surface — mathematical, of course — with finite volume but infinite surface area. I thought that quotation was associated with this surface.

This is a rather interesting surface. You can fill it with paint, specifically with a finite amount of paint — but you cannot paint it.

So?

Let me put that another way. There is paint in contact with every point of the surface — it’s literally full of paint — so how can we be unable to cover it with paint?

There is no reason for me to say very much about this surface: Wikipedia has a fine article.

(It is a surface of revolution. Take a rectangular hyperbola, say y = 1/x. Rotate that curve about the x axis. In fact, restrict the curve and the surface to x >= 1. It is still an infinite surface, but only in one direction instead of two. It is called Gabriel’s Horn and / or Torricelli’s trumpet.)

Okay, let’s get back to Monday afternoon in Berkeley at the University of California.

As one would imagine from his Nobel prize and current events in physics, Wilczek’s lecture was about the particle physics which the Large Hadron Collider will be investigating. It’s not just the Higgs boson which they hope to find, but also some of the particles required by supersymmetry.

The Higgs boson is predicted by “the standard model”; supersymmetry is a theoretical extension of the standard model.

The standard model is the description, in theory and in practice, of fermions and bosons (matter and field particles), i.e. quarks, electrons and neutrinos, the photon, the Ws and Z, and gluons. (Everything but gravity and the graviton.)

It was a pleasant lecture, interesting without being challenging or intimidating.

Nevertheless, it reminded me that amidst my pile of quantum mechanics books, including quantum field theory and gauge theory, I have only one book that addresses “the standard model” per se.

Just before I went to bed Monday night — that is to say, when I was tired and my judgement was fading — I searched Amazon for books on the standard model.

I bought three. (My judgement actually wasn’t bad; they look pretty good. I spent yesterday evening looking thru them, and others.)

The problem is that my pile of quantum mechanics books is on a par, numerically, with my books on time series, or control threory; it is smaller than my differential geometry collection — even with Lie Groups and Lie Algebras separated out of differential geometry — but it is, in fact, larger than my dynamical systems collection.

I really, really need to stop buying books on quantum mechanics — or start reading books on quantum mechanics. In fact, of course, I should try to do both.

Hence the quotation that begins this post. I am sure that I own more mathematics books than I can read before I die. Heck, whether I own the books or not, there exists more mathematics then I can understand before I die.

I might as well study whatever I feel like.

Hmm. My brain has finite volume but I act as though my bookcases have infinite area.

So, “the big four” has become “the big five”. All this means is that one more book has been added to the small desk: Griffiths’ “Introduction to Elementary Particles” (bibliography).

I have removed “the Mathematica Navigator” from the small desk. The total remains at 18. (For what it’s worth, just for fun, I’ll probably publish the current list once a month.

Oops, how did I get to 18? In addition to adding a particle physics book, I added a statistics book.

Not because I wanted to study it, but because I remembered that I had read some of it while eating out, and enjoyed what it had to say. It is on the small desk specifically as interesting reading rather than actual work. FYI, it’s Hair et al., “Multivariate Data Analysis”, sixth edition, Pearson Prentice Hall. (Not all of the books on the small desk are suitable for dinnertime reading.)

Okay, my kid has already played this morning… now I’m going to try to spend an hour with one of the books on the small desk… then I will find some mathematics to do… and, finally, I will try to draft a post for this weekend.

Mathematics? Probably color, logic, or orbits. The blog? Most likely color or logic.


Kategorije: Matematički blogovi

254A, Notes 7: The least singular value

Terrence Tao - Pet, 2010-03-05 22:46

Now we turn attention to another important spectral statistic, the least singular value of an matrix (or, more generally, the least non-trivial singular value of a matrix with ). This quantity controls the invertibility of . Indeed, is invertible precisely when is non-zero, and the operator norm of is given by . This quantity is also related to the condition number of , which is of importance in numerical linear algebra. As we shall see in the next set of notes, the least singular value of (and more generally, of the shifts for complex ) will be of importance in rigorously establishing the circular law for iid random matrices , as it plays a key role in computing the Stieltjes transform of such matrices, which as we have already seen is a powerful tool in understanding the spectra of random matrices.

The least singular value

which sits at the “hard edge” of the spectrum, bears a superficial similarity to the operator norm

at the “soft edge” of the spectrum, that was discussed back in Notes 3, so one may at first think that the methods that were effective in controlling the latter, namely the epsilon-net argument and the moment method, would also work to control the former. The epsilon-net method does indeed have some effectiveness when dealing with rectangular matrices (in which the spectrum stays well away from zero), but the situation becomes more delicate for square matrices; it can control some “low entropy” portions of the infimum that arise from “structured” or “compressible” choices of , but are not able to control the “generic” or “incompressible” choices of , for which new arguments will be needed. As for the moment method, this can give the coarse order of magnitude (for instance, for rectangular matrices with for , it gives an upper bound of for the singular value with high probability, thanks to the Marchenko-Pastur law), but again this method begins to break down for square matrices, although one can make some partial headway by considering negative moments such as , though these are more difficult to compute than positive moments .

So one needs to supplement these existing methods with additional tools. It turns out that the key issue is to understand the distance between one of the rows of the matrix , and the hyperplane spanned by the other rows. The reason for this is as follows. First suppose that , so that is non-invertible, and there is a linear dependence between the rows . Thus, one of the will lie in the hyperplane spanned by the other rows, and so one of the distances mentioned above will vanish; in fact, one expects many of the distances to vanish. Conversely, whenever one of these distances vanishes, one has a linear dependence, and so .

More generally, if the least singular value is small, one generically expects many of these distances to be small also, and conversely. Thus, control of the least singular value is morally equivalent to control of the distance between a row and the hyperplane spanned by the other rows. This latter quantity is basically the dot product of with a unit normal of this hyperplane.

When working with random matrices with jointly independent coefficients, we have the crucial property that the unit normal (which depends on all the rows other than ) is independent of , so even after conditioning to be fixed, the entries of remain independent. As such, the dot product is a familiar scalar random walk, and can be controlled by a number of tools, most notably Littlewood-Offord theorems and the Berry-Esséen central limit theorem. As it turns out, this type of control works well except in some rare cases in which the normal is “compressible” or otherwise highly structured; but epsilon-net arguments can be used to dispose of these cases. (This general strategy was first developed for the technically simpler singularity problem by Komlós, and then extended to the least singular value problem by Rudelson.)

These methods rely quite strongly on the joint independence on all the entries; it remains a challenge to extend them to more general settings. Even for Wigner matrices, the methods run into difficulty because of the non-independence of some of the entries (although it turns out one can understand the least singular value in such cases by rather different methods).

To simplify the exposition, we shall focus primarily on just one specific ensemble of random matrices, the Bernoulli ensemble of random sign matrices, where are independent Bernoulli signs. However, the results can extend to more general classes of random matrices, with the main requirement being that the coefficients are jointly independent.

— 1. The epsilon-net argument —

We begin by using the epsilon net argument to establish a lower bound in the rectangular case, due to Litvak, Pajor, Rudelson, and Tomczak-Jaegermann:

Theorem 1 (Lower bound) Let be an Bernoulli matrix, where for some (independent of ). Then with exponentially high probability (i.e. for some ), one has , where depends only on .

This should be compared with the upper bound

for some absolute constant that holds with overwhelming probability from Notes 3 (indeed, one can take any here).

We use the epsilon-net method introduced in Notes 3, but with a smaller value of than used for the largest singular value. We write

Taking to be a maximal -net of the unit sphere in , with to be chosen later, we have that

and thus by (1), we have with overwhelming probability that

and so it suffices to show that

is exponentially small in . From the union bound, we can upper bound this by

From the volume packing argument we have

So we need to upper bound, for each , the probability

If we let be the rows of , we can write this as

By Markov’s inequality, the only way that this event can hold is if we have

for at least values of . We do not know in advance what the set of is for which this event holds. But the number of possible values of such sets of is at most . Applying the union bound (and paying the entropy cost of ) and using symmetry, we may thus bound the above probability by

(Let’s say that is even for sake of notation, although it makes little essential difference.) Now observe that the random variables are independent, and so we can bound this expression by

where is a random vector of iid Bernoulli signs.

We write , so that is a random walk

To understand this walk, we apply (a slight variant) of the Berry-Esséen theorem from Notes 2:

Exercise 1 Show that

for any and any non-zero . (Actually, for the purposes of this set of notes, it would suffice to establish a weaker form of the Berry-Esséen theorem with replaced by for any fixed ). (Hint: first normalise , then adapt the proof of the Berry-Esséen theorem.)

Conclude in particular that if

(say) then

(Hint: condition out all the with .)

Let us temporarily call incompressible if

and compressible otherwise. If we only look at the incompressible elements of , we can now bound

and comparing this against the entropy cost (2) we obtain an acceptable contribution for small enough (here we are crucially using the rectangular condition ).

It remains to deal with the compressible vectors. Observe that such vectors lie within of a sparse unit vector which is only supported in at most positions. The -entropy of these sparse vectors (i.e. the number of balls of radius needed to cover this space) can easily be computed to be of polynomial size in . Meanwhile, we have the following crude bound:

Exercise 2 For any unit vector , show that

for small enough. (Hint: Use the Paley-Zygmund inequality. Bounds on higher moments on can be obtained for instance using Hoeffding’s inequality, or by direct computation.) Use this to show that

for all such and sufficiently small, with independent of and .

Thus the compressible vectors give a net contribution of , which is acceptable. This concludes the proof of Theorem 1.

— 2. Singularity probability —

Now we turn to square Bernoulli matrices . Before we investigate the size of the least singular value, we first tackle the easier problem of bounding the singularity probability

i.e. the probability that is not invertible. The problem of computing this probability exactly is still not completely settled. Since is singular whenever the first two rows (say) are identical, we obtain a lower bound

and it is conjectured that this bound is essentially tight in the sense that

but this remains open; the best bound currently is due to Bourgain, Vu, and Wood, and gives

We will not prove this bound here, but content ourselves with a weaker bound, essentially due to Komlós:

Proposition 2 We have .

To show this, we need the following combinatorial fact, due to Erdös:

Proposition 3 (Erdös Littlewood-Offord theorem) Let be a vector with at least nonzero entries, and let be a random vector of iid Bernoulli signs. Then .

Proof: By taking real and imaginary parts we may assume that is real. By eliminating zero coefficients of we may assume that ; reflecting we may then assume that all the are positive. Observe that the set of with forms an antichain in . The claim now easily follows from Sperner’s theorem and Stirling’s formula.

Note that we also have the obvious bound

for any non-zero .

Now we prove the theorem. In analogy with the arugments of the previou ssection, we write

(actually we can take since is real). We divide into compressible and incompressible vectors as before, but our definition of compressibility and incompressibility is slightly different now. Also, one has to do a certain amount of technical maneuvering in order to preserve the crucial independence between rows and columns.

Namely, we pick an and call compressible if it is supported on at most coordinates, and incompressible otherwise.

Let us first consider the contribution of the event that for some nonzero compressible . Pick an with this property which is as sparse as possible, say sparse for some . Let us temporarily fix . By paying an entropy cost of , we may assume that it is the first entries that are non-zero for some . This implies that the first columns of have a linear dependence given by ; by minimality, are linearly independent. Thus, is uniquely determined (up to scalar multiples) by . Furthermore, as the matrix formed by has rank , there is some minor which already determines up to constants; by paying another entropy cost of , we may assume that it is the top left minor which does this. In particular, we can now use the first rows to determine up to constants. But the remaining rows are independent of and still need to be orthogonal to ; by Proposition 3, this happens with probability at most , giving a total cost of

which by Stirling’s formula is acceptable (in fact this gives an exponentially small contribution).

The same argument gives that the event that for some nonzero compressible also has exponentially small probability. The only remaining event to control is the event that for some incompressible , but that and for all nonzero compressible . Call this event .

Since for some incompressible , we see that for at least values of , the row lies in the vector space spanned by the remaining rows of . Let denote the event that holds, and that lies in ; then we see from double counting that

By symmetry, we thus have

To compute , we freeze consider a normal vector to ; note that we can select depending only on . We may assume that an incompressible normal vector exists, since otherwise the event would be empty. We make the crucial observation that is still independent of . By Proposition 3, we thus see that the conditional probability that , for fixed , is . We thus see that , and the claim follows.

Remark 1 Further progress has been made on this problem by a finer analysis of the concentration probability , and in particular in classifying those for which this concentration probability is large (this is known as the inverse Littlewood-Offord problem). Important breakthroughs in this direction were made by Halász (introducing Fourier-analytic tools) and by Kahn, Komlós, and Szemerédi (introducing an efficient “swapping” argument). Van Vu and I introduced tools from additive combinatorics (such as Freiman’s theorem) to obtain further improvements, leading eventually to the results of Bourgain, Vu, and Wood mentioned earlier.

— 3. Lower bound for the least singular value —

Now we return to the least singular value of an iid Bernoulli matrix, and establish a lower bound. Given that there are singular values between and , which is typically of size , one expects the least singular value to be of size about on the average. Another argument supporting this heuristic scomes from the following identity:

Exercise 3 (Negative second moment identity) Let be an invertible matrix, let be the rows of , and let be the columns of . For each , let be the hyperplane spanned by all the rows other than . Show that and .

From Talagrand’s inequality (Notes 1), we expect each to be of size on the average, which suggests that ; this is consistent with the heuristic that the eigenvalues should be roughly evenly spaced in the interval (so that should be about ).

Now we give a rigorous lower bound:

Theorem 4 (Lower tail estimate for the least singular value) For any , one has

where goes to zero as uniformly in , and goes to zero as for each fixed .

This is a weaker form of a result of Rudelson and Vershynin (which obtains a bound of the form for some ), which builds upon earlier work of Rudelson and of Vu and myself, which obtained variants of the above result.

The scale that we are working at here is too fine to use epsilon net arguments (unless one has a lot of control on the entropy, which can be obtained in some cases thanks to powerful inverse Littlewood-Offord theorems, but is difficult to obtain in general.) We can prove this theorem along similar lines to the arguments in the previous section; we sketch the method as follows. We can take to be small. We write the probability to be estimated as

We can assume that for some absolute constant , as the event that this fails has exponentially small probability.

We pick an (not depending on ) to be chosen later. We call a unit vector compressible if lies within a distance of a -sparse vector. Let us first dispose of the case in which for some compressible . By paying an entropy cost of , we may assume that is within of a vector supported in the first coordinates. Using the operator norm bound on and the triangle inequality, we conclude that

Since has norm comparable to , this implies that the least singular value of the first columns of is . But by Theorem 1, this occurs with probability (if are small enough). So the total probability of the compressible event is at most , which is acceptable if is small enough.

Thus we may assume now that for all compressible unit vectors ; we may similarly assume that for all compressible unit vectors . Indeed, we may also assume that for every , where is with the column removed.

The remaining case is if for some incompressible . Let us call this event . Write , and let be the column of , thus

Letting be the subspace spanned by all the except for , we conclude upon projecting to the orthogonal complement of that

for all (compare with Exercise 3). On the other hand, since is incompressible, we see that for at least values of , and thus

for at least values of . If we let be the event that and (4) both hold, we thus have from double-counting that

and thus by symmetry

(say). However, if holds, then setting to be a unit normal vector to (which is necessarily incompressible, by the hypothesis on ), we have

Again, the crucial point is that and are independent. The incompressibility of , combined with a Berry-Esséen type theorem, then gives

Exercise 4 Show that

(say) if is sufficiently small depending on , and is sufficiently large depending on .

This gives a bound of for if is small enough depending on , and is large enough; this gives the claim.

Remark 2 A variant of these arguments, based on inverse Littlewood-Offord theorems rather than the Berry-Esséen theorem, gives the variant estimate

with high probability for some , and any of polynomial size in . There are several results of this type, with overlapping ranges of generality (and various values of ) by Götze-Tikhimirov, Pan-Zhou, and Vu and myself; the exponent is known to degrade if one has too few moment assumptions on the underlying random matrix . This type of result (with an unspecified ) is important for the circular law, discussed in the next set of lectures.

— 4. Upper bound for the least singular value —

One can complement the lower tail estimate with an upper tail estimate:

Theorem 5 (Upper tail estimate for the least singular value) For any , one has

We prove this using an argument of Rudelson and Vershynin. Suppose that , then

for all .

Next, let be the rows of , and let be the columns of , thus is a dual basis for . From (7) we have

We apply this with equal to , where is the orthogonal projection to the space spanned by . On the one hand, we have

and on the other hand we have for any that

and so

If (8) holds, then for at least half of the , so the probability in (6) can be bounded by

which by symmetry can be bounded by

Let be a small quantity to be chosen later. From Talagrand’s inequality (Notes 1) we know that with probability , so we obtain a bound of

Now a key point is that the vectors depend only on and not on ; indeed, they are the dual basis for in . Thus, after conditioning and thus to be fixed, is still a Bernoulli random vector. Applying a Berry-Esséen inequality, we obtain a bound of for the conditional probability that for sufficiently small depending on , unless is compressible (in the sense that, say, it is within of an -sparse vector). But this latter possibility can be controlled (with exponentially small probability) by the same type of arguments as before; we omit the details.

— 5. Asymptotic for the least singular value —

The distribution of singular values of a gaussian random matrix can be computed explicitly by techniques similar to those employed in Notes 6. In particular, if is a real gaussian matrix (with all entries iid with distribution ), it was shown by Edelman that converges in distribution to the distribution as . It turns out that this result can be extended to other ensembles with the same mean and variance. In particular, we have the following result of Van Vu and myself:

Theorem 6 If is an iid Bernoulli matrix, then also converges in distribution to as . (In fact there is a polynomial rate of convergence.)

This should be compared with Theorems 4, 5, which show that have a tight sequence of distributions in . The arguments of Van and myself thus provide an alternate proof of these two theorems.

We do not establish the limit directly, but instead use Edelman’s result as a black box, focusing instead on establishing the universality of the limiting distribution of , and in particular that this limiting distribution is the same whether one has a Bernoulli ensemble or a gaussian ensemble.

The arguments are somewhat technical and we will not present them in full here, but instead give a sketch of the key ideas.

In previous sections we have already seen the close relationship between the least singular value , and the distances between a row of and the hyperplane spanned by the other rows. It is not hard to use the above machinery to show that as , converges in distribution to the absolute value of a Gaussian regardless of the underlying distribution of the coefficients of (i.e. it is asymptotically universal). The basic point is that one can write as where is a unit normal of (we will assume here that is non-singular, which by previous arguments is true asymptotically almost surely). The previous machinery lets us show that is incompressible with high probability, and then claim then follows from the Berry-Esséen theorem.

Unfortunately, despite the presence of suggestive relationships such as Exercise 3, the asymptotic universality of the distances does not directly imply asymptotic universality of the least singular value. However, it turns out that one can obtain a higher-dimensional version of the universality of the scalar quantities , as follows. For any small (say, for some small ) and any distinct , a modification of the above argument shows that the covariance matrix

of the orthogonal projections of the rows to the complement of the space spanned by the other rows of , is also universal, converging in distribution to the covariance matrix of iid gaussians (note that the convergence of to is the case of this claim). (These covariance matrix distributions are also known as Wishart distributions.) The key point is that one can show that the complement is usually “incompressible” in a certain technical sense, which implies that the projections behave like iid gaussians on that projection thanks to a multidimensional Berry-Esséen theorem.

On the other hand, the covariance matrix (9) is closely related to the inverse matrix :

Exercise 5 Show that (9) is also equal to , where is the matrix formed from the columns of .

In particular, this shows that the singular values of randomly selected columns of have a universal distribution.

Recall our goal is to show that has an asymptotically universal distribution, which is equivalent to asking that has an asymptotically universal distribution. The goal is then to extract the operator norm of from looking at a random minor of this matrix. This comes from the following application of the second moment method:

Exercise 6 Let be an matrix with columns , and let be the matrix formed by taking of the columns at random. Show that

where is the Frobenius norm.

Recall from Exercise 3 that , so we expect each to have magnitude about . This, together with the Hoeffman-Wielandt inequality (Notes 3b) means that we expect to differ by from . In principle, this gives us asymptotic universality on from the already established universality of .

There is one technical obstacle remaining, however: while we know that each is distributed like a Gaussian, so that each individual is going to be of size with reasonably good probability, in order for the above exercise to be useful, one needs to bound all of the simultaneously with high probability. A naive application of the union bound leads to terrible results here. Fortunately, there is a strong correlation between the : they tend to be large together or small together, or equivalently that the distances tend to be small together or large together. Here is one indication of this:

Lemma 7 For any , one has

where is the orthogonal projection onto the space spanned by .

Proof: We may relabel so that ; then projecting everything by we may assume that . Our goal is now to show that

Recall that is a dual basis to . This implies in particular that

for any vector ; applying this to we obtain

and hence by the triangle inequality

Using the fact that , the claim follows.

In practice, once gets moderately large (e.g. for some small ), one can control the expressions appearing here by Talagrand’s inequality (Notes 1), and so this inequality tells us that once is bounded away from zero for , it is bounded away from zero for all other also. This turns out to be enough to get enough uniform control on the to make Exercise 6 useful, and ultimately to complete the proof of Theorem 6.


Filed under: 254A - random matrices, math.CO, math.PR, math.SP Tagged: epsilon net argument, least singular value, Littlewood-Offord theorems
Kategorije: Matematički blogovi

Books Added – Logic

Rip’s Applied Mathematics Blog - Čet, 2010-03-04 03:21

I think my kid picked logic up again late in January. In particular, he wanted to look again at two recent books intended to help students make the transition to abstract mathematics — i.e. to having to prove things.

Those two books are Exner and Hummel. They were highly recommended for that, out on the sci.math newsgroup, and, therefore, I immediately bought them.

In addition to those two books, I ended up looking at Aristotle (indirectly), Lewis Carroll — yes, for logic! — Paul Halmos on logic as algebra, and a few books on symbolic logic.

Speaking first of the two textbooks about how to prove things….

Exner is the less formal of the two. For example, his second chapter talks about informal proofs, while his third chapter shows the formalities. In addition to dealing with the mechanics of proving things, this book is also trying to teach good habits for reading a mathematical text.

Hummel, by contrast, is a little more formal from the very start: page 19, for example, has a list — a very nice list! — of tautologies. For example, that

is equivalent to

.

(That is, “P implies Q” is equivalent to “not P, or Q”.)

Both of these books filled a need for me. Every time I had ever looked at a book on mathematical logic, I had bailed when we got to bound and free variables.

Both of these books showed me why we distinquish bound and free variables. (Yes, I will explain, but not here and now.)

Hummel offers applications starting on page 80: sets, functions and relations, number systems, transfinite cardinal numbers, the axiom of choice & ordinal numbers. His introductory material on logic is pretty much just logic.

Exner, by contrast, uses mathematical applications all the way through. The end of the book is some more concentrated applications to sets and functions.

When I first worked through these books, I decided that I wanted to look at Aristotle’s syllogisms using my modern logic tools. My reference for Aristotle is Joseph. It is not an easy read, and I can’t really say I recommend it. It’s what I have, so far. (I have just ordered two modern introductions to logic, and we’ll see if they have anything about syllogistic reasoning.)

It turns out that Joseph may be an unusual reference for classical logic. Aristotle used what are called figures, and he defined only three of them. Medieval logicians used a fourth figure, which can be converted to the first. (See the subsequent discussion of Lewis Carroll.)

On the other hand, I have tried to read Aristotle in English — and it is hard going, so we want to read somebody else about him. Hence Joseph.

(On the third hand, I have tried to read Aristotle in Greek — and he is the hardest Greek author I have ever encountered. Admittedly, I haven’t tried very many Greek authors — but he is head and shoulders harder than his own teacher, Plato.)

I also happened to see “Symbolic Logic” by Lewis Carroll, about the time I had picked up logic a few years ago. Well, on this pass through Lewis Carroll, I noticed that he had a list of 19 syllogisms at the back of the book.

Not the same list as Joseph — but the difference is understandable: Lewis Carroll is using four figures to Joseph’s three figures. He is using the medieval formulation.

Anyway, to return to the main thread, once that I understood why we cared about free and bound variables, I could make more progress in my modern mathematical logic books.

I have three textbooks, but only two seem worthwhile. One is Copi and the other is Suppes. Both were readable — now! — and even pleasant. In contrast to Exner and Hummel, these are books about the choices of axioms for logic, and working out the consequences of the axioms.

If you will, Copi or Suppes provides the foundations for Exner and Hummel.

There are two other books to be added. One is an ancient Chelsea reprint by Paul Halmos, and the other is a nice introduction to it.

I am pretty sure that when I bought that Chelsea reprint of Halmos’s papers, I could not even read the introductory paper. (If I had any notion then what an ideal was, it was not a very substantial notion.) And I am not sure I will ever care enough about mathematical logic to read the following papers in the book, but maybe you will. Then I discovered that there was a related book, by Halmos & Givant, and it came in about two weeks ago. It gives a very nice introduction to the Chelsea reprint, and to Boolean algebra applied to logic.

Interestingly, it has yet a third list of Aristotle’s syllogisms, and I cannot reconcile it with the other two. That’s okay: I’ll use their methods on one of the other lists! (I was afraid they had completely disposed of Aristotle, leaving me nothing left to do. Not so. I still get to work syllogisms all out for myself.)

Oh, of course there is one more book. Boole’s “Investigation into the Laws of Thought”. I probably bought that even before the Chelsea reprint.

Boole is interesting because I am sure we have found alternatives to some of the calculations he laid out. In particular, I strongly doubt that we do divisions by zero! Actually, I’m pretty sure we don’t use division at all. Someday I’ll figure out how to get Boole’s answers using modern methods.

So. If you are trying to learn how to prove things, I recommend Hummel and probably Exner, too. if you are trying to learn how to read math books, I particularly recommend Exner. And in general, Exner looks to be the more introductory of the two. If you want an introduction to modern logic, I would suggest you look at Copi or Suppes.

Oh, bear in mind that I am speaking of Copi’s “Symbolic Logic” rather than his “Introduction to Logic”.

If you want fascinating logic puzzles, then Lewis Carroll. And if you want to understand just what George Boole accomplished — what he had to work with! — of course, get and read him.

the books

Boole, George. An Investigation of The laws of Thought…. Dover publication of the 1854 edition.
[logic; 3 Mar 2010]
I think it’s fair to say that he started it all, and Boolean algebra is named in his honor. Of course, he predates set theory, but published almost simultaneously with De Morgan. Reading Boole’s examples made me understand how so many theology students could become mathematicians — talk about convoluted! But that’s part of what he was trying to do — extract the key results out of convoluted theological arguments. Plenty of examples; computational algorithms, of course — but they are not what we would do today.

Carroll, Lewis. (Dodgson, C.L.) Lewis Carroll’s Symbolic Logic. Clarkson N. Potter Inc., 1986 (2nd paperback ed.)
ISBN 0 517 53363 4.
[logic; 3 Mar 2010]
One good reason for publishing this book under the pseudonym rather than under the real name is that Lewis Carroll brought his considerable narrative skills to the puzzles which illustrate his points. The book includes a short discussion of the upheaval in logic as it transitioned from classical, on through the work particularly of Boole and De Morgan, and into modern symbolic logic. He presents graphical techniques for solving puzzles. Lots of answers.

Copi, Irving M. Symbolic Logic. Macmillan, 1966 (2nd ed).
[logic; 3 Mar 2010]
I enjoyed this enough to order the most recent edition of his “Introduction to Logic” — I liked the language, the organization, and the examples in this book. I understand that this book is considered out of date, but I have no idea what should replace it. Selected answers.

Exner, George R. An Accompaniement to Higher Mathematics. Springer, 1996. (corrected 3rd printing, 1999).
ISBN 0 387 94617 9.
[logic, proofs; 3 Mar 2010]
“Undergraduate Texts in Mathematics”. This is a marvelous introduction to how to read a mathematics book, and how to approach the construction of proofs. I think it is an excellent introduction to or companion to Hummel. Selected answers.

Halmos, Paul R. and Givant, Steven. Logic as Algebra. Mathematical Association of America, 1998.
ISBN 0 88385 327 2.
[logic; 3 Mar 2010]
“Dolciani Mathematical Exposition”. A short introduction to the use of abstract algebra applied to logic. I particularly enjoyed their discussion of Aristotle’s syllogisms — even though I can’t reconcile their list of the syllogisms with either of the two lists I trust! This seems a comparatively minor point considering the book as a whole.

Halmos, Paul R. Algebraic Logic. Chelsea, 1962.
[logic, abstract algebra; 3 Mar 2010]
A collection of Halmos’ papers; at this stage, only the first is accessible to me — and it requires being comfortable with modern algebra, in particular with rings. I’ll call it advanced undergraduate, more likely graduate level.

Hummel, Kenneth E. Introductory Concepts for Abstract Mathematics. Chapman and Hall / CRC, 2000.
ISBN 1 58488 134 8.
[logic, proofs; 3 Mar 2010]
This is a marvelous introduction to formal logic and to the construction of proofs. I think it is an excellent companion to or follow-up to Exner. Selected answers.

Joseph, H.W.B. An Introduction to Logic. Oxford, 1916 (2nd ed. reprinted 1966).
[logic; 3 Mar 2010]
This is an old-school logic book. First published in 1906, it may well represent the pinnacle of traditional logic.

Suppes, Patrick. Introduction to Logic. Van Nostrand, 1957. Available as a Dover reprint, but which edition?
[logic; 3 Mar 2010]
“University Series in Undergraduate Mathematics.” Clearly written, and I enjoyed it. Rigorous symbolic logic will almost certainly never be on my list of things to do, but this was fun to read.


Kategorije: Matematički blogovi

ERT9: Weak Mixing implies Weak Mixing of all orders

Disquisitiones Mathematicae - Uto, 2010-03-02 20:55

Remember one of the characterizations we got in ERT8 of weak mixing: a mps is weak mixing if and only if, for any ,

that is, if and only if the sequence of functions converges in the -norm to .

For our interests, the notion of weak mixing will be useful only if the above property extends to multiple functions, because, as we saw in ERT4, the convergence of these sequence corresponds to multiple recurrence. This actually holds, as we will see below, and is called weak mixing implies weak mixing of all orders. Such property will follow from two main ingredients:

  1. The product characterization of weak mixing can be extended to multiple products.
  2. The van der Corput trick allows an inductive argument.

It is worth mentioning that this is the second time van der Corput trick appears in these lectures (the first one was in ERT2). The reader not used to it might think it is just a technical step in the proof, but it is actually an ingredient present in many situations of Ergodic Ramsey Theory when dealing with random components of mps. It has many versions, each of them to the purpose of particular notions of multiple recurrence. We first discuss the one we need.

1. The van der Corput trick

Theorem 1 (van der Corput trick) If is a bounded sequence in a Hilbert space and if

then .

Proof: Take such that . Notice that, for a fixed ,

goes to zero as and so the assertion is equivalent to

By the triangle inequality,

which goes to zero as .

The significance of this inequality is that it replaces the task of bounding a sum of coefficients by that of bounding a sum of “differentiated” coefficients . This trick is thus useful in “polynomial” type situations when the differentiated coefficients are often simpler (have smaller order) than the original coefficients.

The above theorem is written in a modern fashion. The original van der Corput trick is actually known as van der Corput difference theorem and comes from the theory of uniform distributions.

Definition 2 We say that a sequence is uniformly distributed if, for every interval ,

Exercise 1 Given a sequence , prove that the following assertions are equivalent.

  1. is uniformly distributed.
  2. For every continuous , where is the Lebesgue measure.
  3. (Weyl criterion) For any ,

Theorem 3 (van der Corput difference theorem) A sequence is uniformly distributed if is uniformly distributed for every .

Proof: By Weyl criterion and the assumption, . Fix , and define . Then

converges to zero as . By van der Corput trick,

converges to zero which, again by Weyl criterion, proves that is uniformly distributed.

The previous result implies Weyl equidistribution theorem, which is a generalization of Kronecker theorem on the uniform distribution of , for irrational .

Theorem 4 (Weyl) If is a polynomial with at least one of its coefficients irrational, then , , is uniformly distributed.

Proof: Follows from Kronecker theorem by sucessive applications of Theorem 3.

2. Weak mixing implies weak mixing of all orders

Remember that is weak mixing if and only if is ergodic whenever is ergodic. For each , let .

Proposition 5 If is weak mixing, then is ergodic, for every integers . In fact, it is weak mixing.

Proof: Note that is weak mixing, for any . To prove this, consider and of zero density such that

Let . This set has zero density, as

and the last fraction converges to zero.

We proceed by induction on . The case was proved in the last paragraph. Suppose that, for every integers , is ergodic. If ,

is the product of an ergodic and a weak mixing system, which proves our assertion.

The main result of this section is

Theorem 6 Let be weak mixing. Then, for any ,

in the -norm.

In other words, this means that, for any ,

The proof will consist in an inductive argument, with the use of van der Corput trick to reduce the case to and so on. By linearity of the expression, we assume that . In this setting, we want to prove that

where represents the norm of .

Proof: The case follows from the ergodicity of :

Suppose the result is true for and take . Define

Given , let . Then

By hypothesis,

Rewrite the above expression as

where and is given by

As is ergodic,

By the van der Corput trick,

which concludes the proof.

Corollary 7 (Multiple Poincaré Recurrence for weak mixing mps) If is weak mixing and , then

for every .

Proof: Just consider in (1).

Previous posts: ERT0, ERT1, ERT2, ERT3, ERT4, ERT5, ERT6, ERT7, ERT8.


Kategorije: Matematički blogovi

EDP10 — a new and very promising approach

W.T. Gowers - Uto, 2010-03-02 18:40

From time to time, there has been an input into this project that has given rise to a burst of optimism (on my part anyway). Perhaps the first was the rapid discovery of very long sequences with low discrepancy, and especially the fact that these sequences had interesting structure to them. (The length led me to hope that the conjecture might be false, or at the very least that it might be possible to construct sequences with extremely slow-growing discrepancy, and the structure led me to hope the opposite.) I’m probably forgetting a few things, but the next one I remember is Terence Tao’s amazing observation that we could restrict attention to multiplicative functions if we were prepared to change the problem very slightly. We then discovered (though we sort of knew it anyway) that multiplicative functions are not easy objects to understand …

Since I posted EDP9, there has been a development that has radically changed my perception of the problem, and I imagine that of anyone else who is following closely what is going on. It began with this comment of Moses Charikar.

Moses’s idea, which I shall partially explain in a moment (for about the fifth time) is based on the theory of semi-definite programming. The reason I find it so promising is that it offers a way round the difficulty that the sequence 1, -1, 0, 1, -1, 0, 1, -1, 0, … has bounded discrepancy. Recall that this fact, though extremely obvious, is also a serious problem when one is trying to prove EDP, since it rules out any approach that is just based on the size of the sequence (as measured by, say, the average of the squares of its terms). It seemed to be forcing us to classify sequences into ones that had some kind of periodicity and ones that did not, and treat the two cases differently. I do not rule out that such an approach might exist, but it looks likely to be hard.

Moses proposes (if you’ll excuse the accidental rhyme) the following method of proving that every sequence has unbounded discrepancy. I’ll state it in infinitary terms, but one can give finitary versions too. Suppose you can find non-negative coefficients (one for each pair of natural numbers ) that sum to 1, and non-negative coefficients summing to infinity, such that the quadratic form

is positive semi-definite. Then you are done. Why? Because if were a sequence of s with discrepancy at most , then the first term in the above sum would be at most , while the second would be , which contradicts the positivity in a rather big way.

Why does this deal with the troublesome sequences? Because it is perfectly possible (and necessary, if this method is to work) for the sum of the over all that are not multiples of 3 to be finite. So this method, unlike many previous proof ideas, would not accidentally be trying to prove something false.

Note that to prove that the quadratic form is positive semi-definite, it is sufficient to write it as a sum of squares. So EDP is reduced to an existence problem: can we find appropriate coefficients and a way of writing the resulting form as a sum of squares?

Now this idea, though very nice, would not be much use if there were absolutely no hope of finding such an identity. But there is a very clear programme for finding one, which Moses and Huy Nguyen have started. The idea is to begin by using semidefinite programming to find the optimal set of coefficients for large (that is, for a finite truncation of the infinite problem), which can be done on a computer, and which they have already done (see the comments following the one I linked to above for more details). Next, one stares very hard at the data and tries to guess a pattern. It is not necessary to use the very best possible set of coefficients, so at this point there may be a trade-off between how good the coefficients are and how easy they are to analyse. (This flexibility is another very nice aspect of the idea.) However, looking at very good sets of coefficients is likely to give one some idea about which choices have a chance of working and which don’t. Having made a choice, one then tries to prove the positive semidefiniteness.

As Moses points out, if such coefficients can be found, then they automatically solve the vector-valued problem as well, since we can look at the expression

instead, and the positivity will carry over. As he also points out, if you modify our low-discrepancy multiplicative examples such as by multiplying by the unit vector , where is the largest power of 3 that divides , then you get a sequence of discrepancy that grows like , which shows that this method cannot hope to do better than a bound. But I’d settle for that!

Finally, I want to draw attention to another comment of Moses, in which he introduces a further idea for getting a handle on the problem. I won’t explain in detail what the idea is because I haven’t fully digested it myself. However, it gives rise to some Taylor coefficients that take values that are all of the form for some integer . It is clear that they have a great deal of structure, but we have not yet got to the bottom of what that structure is. If we do, then it may lead to a concrete proposal for a matrix of coefficients that should be a good one.

My optimism may fade in due course, but at the time of writing it feels as though these new ideas have changed the problem from one that felt very hard into one that feels approachable.


Kategorije: Matematički blogovi

Color: more on color-primary transformations

Rip’s Applied Mathematics Blog - Uto, 2010-03-02 03:21
introduction

(Notation: It has become my custom to use, for example, N’ to denote the transpose of N.)

This post has two purposes.

  1. To show you Giorgianni & Madden’s (see bibliography) version of the calculation I first found in Glassner (see bibliography; and there is a link below).
  2. To show you what happens when we change the white point — only the white point.

I want to show you Giorgianni & Madden’s version of the calculation because they do it slightly differently — they compute a transition matrix. As always, the surest way to avoid misunderstanding is to show you a calculation. You may not know why I did something, but there should be no doubt about what I did.

And I want to show you what happens when we change the white point. I talked about this at the end of the “color primary” post (okay, what I still think of as the “Glassner” post); the link is below.

I can spare you some suspense, but I’m going to do the calculation anyway.

One disadvantage of having Mathematica at my fingertips is that I can compute before I think. (There are advantages to walking around the block every so often, and they’re not just physical; it gives me a chance to think, stuck in circumstances where I cannot compute.)

Let us first review the calculation as Glassner presented it.

review Glassner

Let’s recall his discussion of his recipe. (The original is in here.)

Let the matrix K contain the x, y, z coordinates of the three phosphors. To be specific, let the first row — there it is, K is an attitude matrix — contain the xyz coordinates for the red phosphor (so K is an attitude matrix with xyz as the new basis), and so on. Let the vector W contain the X, Y, Z components of the white point. Let G be a diagonal matrix constructed from the vector V — whatever it is. Let F = (1, 1, 1).

The key is that we assume each row of N is proportional to a row of K:

N = G K.

That is, we are planning to change the lengths of the basis vectors but not their directions. This is crucial.

It is also significant that F = (1, 1, 1), because it gives us a simple relationship between V and G:

V = FG.

We have that N maps the white point F to W (but it’s an attitude matrix, hence post-multiply):

W = FN.

Then

W = FN = F(GK) = (FG)K = VK

and finally

.

Having gotten there, we reverse ourselves: from W and K we compute V, then G, then N.

To be explicit, from K and W we find N as follows:

.

G = DiagonalMatrix[V]

N = G K.

thinking about it

Okay, having reviewed that calculation, let us ask: what will happen if we change the white point?

We get a new basis. Big deal. We could figure out the relationship between two new bases having different white points.

Perfectly straightforward, and that’s exactly what I did a week or so ago. And when I saw the answer, my first response was “Fascinating” — and my second response was, “Duh, of course.”

Any idea why, “Duh”?

Go back to that relationship

N = G K,

where G was a diagonal matrix. We understood that to mean we are not changing the directions of the basis vectors, just their lengths.

When we change the white point, we will have two new bases, of pairs of vectors having the same direction but different lengths.

Just as G is a diagonal matrix, so will be the transition matrix between our two new bases.

As I said, I’ll compute it anyway.

But before that, let’s look at…

Giorgianni & Madden

Their example (on p. 384) uses different phosphors. More importantly, they use different notation. It is valuable, therefore, to work out their example so that I can clearly identify and point out the changes in notation. (One of them is non-trivial.)

The phosphors are specified by “ITU-R BT.709″.

As usual, they provide xy, we need xyz.

Here’s the red phosphor…

xy = {0.64, 0.33}

r = xyz = {0.64, 0.33, 0.03}.

The green…

xy = {0.3, 0.6}

g = xyz = {0.3, 0.6, 0.1}.

The blue…

xy = {0.15, 0.06}

b = xyz = {0.15, 0.06, 0.79}.

And the white point (which they say is, and which I recognize as, D65)…

xy = {0.3127, 0.329}

w = xyz = {0.3127, 0.329, 0.3583}.

In contrast to Glassner, they set the scale so that Y = 100 instead of 1. No big deal. Multiply by 100 and divide by the y value…

W = {95.0456, 100., 108.906}.

What we have called K is their “primary chromaticity matrix C” (I know this from seeing their C matrix):

What we have called , they call p, but we agree on the computation:

V = {64.4361, 119.195, 120.321}.

They do construct a diagonal matrix, but they don’t explicitly name it or show it. No big deal. As usual, let

Finally they compute “a phosphor matrix M” as

M = C’ G = K’ G.

On the other hand, our recipe called for

N = G K.

Transpose:

N’ = K’ G’ = K’ G = C’ G = M.

So if we compute N = G K by Glassner’s recipe, we should find ourselves looking at the transpose of their answer M = C’ G = K’ G.

Their M is not Glassner’s M. Do not get confused about that. Frankly, it’s very convenient that Glassner computed N first before his own M. (I don’t have to muddy the waters by actually computing Glassner’s M.) Here then is Glassner’s N:

Yes. Glassner’s N is the transpose of Giorgianni & Madden’s M.

(Oh, maybe now is good time to remind you that Glassner’s N is the inverse of Glassner’s M.)

Since Glassner’s N and M are inverses, and their M is his N’, then their M is the inverse transpose of Glassner’s M. That should be no big deal; you should just write out the algebra in a way that suits you, and then be clear about what’s what. It’s the algebra that’s important, of course, not the names.

Of course, we always need to remember whether we have an attitude matrix or a transition matrix, and which way it goes (“new” and “old”).

We can recall that Glassner’s set up is that N is an attitude matrix mapping RGB to XYZ; therefore XYZ is the new basis. Therefore N’ is a transition matrix with XYZ as old components of vectors. (If you need to, hit the tag cloud for “attitude or transition matrices”.)

Confused? Check it! Apply N’ to the (“new”) components of the RGB basis vector namely (1, 0, 0). We get XYZ…

XYZ = {41.2391, 21.2639, 1.93308}

(Yes, the first column of N’; equivalently, the first row of N; equivalently, the first column of G&M’s M.)

But what we know is xyz, so… normalize that XYZ to a sum of 1…

xyz = {0.64, 0.33, 0.03}

and compare that to the xyz cooordinates of the red phosphor…

r = {0.64, 0.33, 0.03}.

They match. Good. N’ does what I expected.

Incidentally, that means G&M computed the matrix I am most interested in, whether we call it N’ or M.

Let me save N’ (as T1), the transition matrix with XYZ as old components of vectors. And let’s save G, too (as G1).

Glassner’s recipe with white point A

(Can I call one Glassner’s recipe and the other G&M’s recipe?)

Let’s just change the white point to Standard Illumninant A. From Hunt I can read xy as…

xy = {0.4476, 0.4074}

and compute xyz:

w = xyz = {0.4476, 0.4074, 0.145}.

(We know how to compute the white point of an illuminant, but that requires my initializing the color-matching functions, and setting the A and A’ matrices…. Besides, I’ve done that, and Hunt and I agree on every white point he shows! I can safely copy them from him.)

We set the scale to 100:

W = {109.867, 100., 35.5916}.

We still have the same K, but changes…

V = {118.944, 98.4399, 28.075}.

And G changes…

I am used to writing N = G K, but I want N’, and I might as well also name T2 = N’…

Now I have two transition matrices, one mapping RGB d65 to XYZ and the other mapping RGB A to XYZ. What’s the transition matrix from d65 to A?

XYZ = T2 A
XYZ = T1 d65
;

so we compute

Diagonal. (I told you so.)

I quote myself from my original (unpublished) calculation a week or two ago:

“fascinating.

of course. the basis vectors are scaled, not rotated, so a transition between white points is diagonal — just scaling.

damn.”

Now confirm that I could have computed the analagous , instead of :

(The same answer.)

closing remarks

Gorgianni & Madden refer to this calculation as a “color-primary conversion”, whence my titles.

I could say, “Move along. There’s nothing to see here.” — but that’s not true. For one thing, it confirms our original belief that N = G K is changing only the lengths of the basis vectors. (It may have been an elementary observation, but reinforcement is a good thing.)

For another, perhaps more important, it tells me that whatever the von Kries and Bradford transforms are — since about all I know is that they are not diagonal, they cannot be the result of simply changing the white point, as we just did. There’s more to them than that.

No, I’ve never really talked about them. They have to do with “chromatic adaptation”, which is the human eye coping with changes in illuminant. We now know they are more than a change-of-basis.


Kategorije: Matematički blogovi

The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi

Terrence Tao - Pon, 2010-03-01 22:35

Van Vu and I have just uploaded to the arXiv our joint paper “The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi“. In this short paper we give a different proof of a high-dimensional Littlewood-Offord result of Frankl and Füredi, and in the process also affirmatively answer one of their open problems.

Let be vectors in , which we normalise to all have length at least . For any given radius , we consider the small ball probability

where are iid Bernoulli signs (i.e. they take values or independently with a probability of of each), and ranges over all (closed) balls of radius . The Littlewood-Offord problem is to compute the quantity

where range over all vectors in of length at least one. Informally, this number measures the extent to which a random walk of length (with all steps of size at least one) can concentrate into a ball of radius .

The one-dimensional case of this problem was answered by Erdös. First, one observes that one can normalise all the to be at least (as opposed to being at most ). In the model case when , he made the following simple observation: if a random sum fell into a ball of radius (which in the one-dimensional case, is an interval of length less than ), and one then changed one or more of the signs from to , then the new sum must necessarily lie outside of the ball. In other words, for any ball of radius , the set of signs for which forms an antichain. Applying Sperner’s theorem, the maximal size of this antichain is , and this soon leads to the exact value

when (the bound is attained in the extreme case ).

A similar argument works for higher values of , using Dilworth’s theorem instead of Sperner’s theorem, and gives the exact value

whenever and for some natural number , where are the largest binomial coefficients of .

Now consider the higher-dimensional problem. One has the obvious bound

but it is not obvious whether this inequality is strict. In other words, is there some way to exploit the additional freedom given by higher dimensions to make random walks concentrate more than in the one-dimensional case?

For some values of , it turns out that the answer is no, as was first observed by Kleitman (and discussed further by Frankl and Füredi). Suppose for instance that

for some . Then one can consider the example in which is one unit vector, and is another unit vector orthogonal to . The small ball probability in this case can be computed to equal rather than , which is slightly larger.

In the positive direction, Frankl and Füredi established the asymptotic

as (holding and fixed). Furthermore, if was close to an integer, and more precisely if

(so that the above counterexample can be avoided) they showed that for sufficiently large (depending on ).

The factor was an artefact of their method, and they conjectured in fact that one should have for sufficiently large whenever

thus matching the counterexample exactly. This conjecture was verified for by Kleitman and for by Frankl and Füredi.

In this paper we verify the conjecture of Frankl and Füredi (and give a new proof of their asymptotic (1)). Our main tool is the following high-dimensional Littlewood-Offord inequality:

Theorem 1 Suppose that which is genuinely -dimensional in the sense that for any hyperplane going through the origin, one has for at least values of . Then one has

Theorem 1 can be viewed as a high-dimensional variant of Erdös’s inequality (but without the sharp upper bound). It is proven by the Fourier-analytic method of Halász. (This theorem was announced in my book with Van Vu several years ago, but we did not get around to publishing it until now.)

Using Theorem 1, one can verify the conjecture of Frankl and Füredi fairly quickly (the deduction takes a little over a page). The main point is that if there is excessive concentration, then Theorem 1 quickly places almost all of the vectors to lie very close to a line. If all the vectors are close to a line, then we can project onto this line and rescale, which causes to worsen a little bit in this reduction to the one-dimensional case, but it turns out that the bounds (2) allow us to tolerate this degradation of once (so it is fortunate that the cases were already done for us!). If instead we have a vector far from the line (as is the case in the key counterexample), then we manually eliminate that vector using the parallelogram law, which effectively drops below (half of the time, at least) if was initially less than , which gives enough of a saving to conclude the argument.

One moral that one can draw from this argument is that one can use a quasi-sharp estimate (such as Theorem 1), which ostensibly loses constant factors, to then deduce a sharp estimate (such as the Frankl-Furëdi conjecture) that loses no constant factors, as long as one is in an asymptotic regime (in this case, and large depending on ). The key is to exploit the fine structure in the main term (in this case, the piecewise constant nature of when passes over integers) to extract gains that can absorb the losses coming from the quasi-sharp estimate).


Filed under: math.CO, paper Tagged: Littlewood-Offord problem, small ball probability, Van Vu
Kategorije: Matematički blogovi

Happenings Feb 27

Rip’s Applied Mathematics Blog - Sub, 2010-02-27 21:32

Okay, it’s Saturday morning.

I’ve put in my hour of stream of consciousness, mostly about mathematics. My kid has had a shot at whatever he wanted to do. Late today I plan to work on the next color post; it should be a detailed look at the fundamentals and the residuals of four spectra. Before I work on the post, and after I finish this, I will do some mathematics.

Even I find it a little hard to believe, but I’m not actually sure what mathematics I am going to work on this afternoon.

My kid isn’t the only one who does mathematics for fun. The bottom line is that my grown-up does too. The difference is that my kid gets to have a little fun every day. That is, on the days I don’t do real work — although he’s been known to get in some at the end of a real work day.

My grown-up gets to just have fun when he runs out of steam on a particular subject –which is another way of saying, when that particular subject stops being satisfying. (it’s a little too much to expect everything I ought to do to be truly fun.)

It’s true that my grown-up has an agenda — but it is not carved in stone. I have no intention of putting myself in a straitjacket and saying, “Here is the mathematics you will do for the next month, or two, or six.”

I can understand that some of you wish I would pick up wavelets again. Some of you may be sorry to see me put down colors when I finish with them. While my grown-up wants to finish what he starts, it is for small values of “finish”. (Really. 52 posts on PCA/FA made me comfortable, a good start but could hardly be said to have finished the subject.)

I am not going to do color until I am an expert in the field. In fact, there are only two things left that I want to do.

  • Get from XYZ to a spectrum;
  • get from XYZ to RGB on my monitor.

Now, by the time I get through these two, I may have found something else I just have to do with color… but real soon now I will put down color and pick up something else.

As I did with wavelets. As I did with PCA/FA (principal component analysis/factor analysis).

I already have some mathematics done that I need to convert to posts:

  • Logic;
  • Stepwise regression;
  • isolating multi-collinearity.

That’s good, because it will let me do some other mathematics without having to turn it into a set of blog posts right away.

Oh, I think I’m going to pick up orbits again. No, I haven’t done them yet on the blog — although I did talk about the orbit of Mars as an example of transition matrices.

Several years ago I did work out the orbits to get Mariner IV to Mars, and Voyagers 1 and 2 to Jupiter. What I want to do now is work out the gravitational assist for one of the Voyagers at Jupiter.

As for the past week, I have already altered the small pile of books on my smallest desk.

I like having it. It is a more potent reminder of the mathematics I want to touch than a written list would be. It also serves as a handy supply of quick reading material.

I grew up reading at the dinner table. I rarely sit down at a table to eat, at home; but if I go out to dinner alone, the first question is: what am I going to read?

“What am I going to eat?” is the second question. What kind of restaurant should I drive to?

Now I don’t have to walk into my library to find a book for dinner — my smallest desk has a perfect selection. Ah, I now have a reading menu.

I have, alas, added three books to it; on the other hand, I have removed one; and I have altered one. (And, of course, I just checked: there are 17 books there.)

I have removed Mermin’s “Quantum Computer Science”. Somehow, amidst the other books, it no longer looked as exciting as when it was sitting with the rest of my quantum mechanics books.

I have swapped out Skogestad & Postlewaite’s “Multivariable Feedback Control” and replaced it with

Blakelock’s “Automatic Control of Aircraft and Missiles” (biblio)

– because I think I can now derive what I consider to be its fundamental equation. I do want to return specifically to control of aircraft, and that to book in particular because it uses classical techniques. (I think of that as learning to shape wood using hand tools instead of power tools — power tools being analogous to state space techniques.)

And I have added

  • Guckenheimer & Holmes, “Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields”.
  • Vetterli & Kovačevic’, “Wavelets and Subband Coding.”

  • Armstrong, “Groups and Symmetry”.

Along the way, I decided to turn “the big three “into “the big four”. I have promoted “dynamical systems” from a subset of “control theory” (?!?) to a major group in its own right. In retrospect, I can’t imagine why it wasn’t always “the big four”.

So, I have selected four books for my smallest desk for each of the big four (dynamical systems, control theory, differential geometry, time series). What looks like a second book on differential geometry (Bloch, biblio) is, instead, a reminder that I really, really need to summarize his chapter 3.

There is nothing on that desk about color or logic; they are what I am actively working on, what I wake up dreaming about.

As for the blog itself, some of you may have noticed that I have made a cosmetic change to the bibliography. I wanted the authors names to pop out — because I think of most books by an author’s name rather than by title. I still wanted the titles to pop out, and I decided that I might as well use color. So now we have titles in red and authors’ last names in bold black.

I still need to go back and clean up the earliest entries on the bibliographies page. Some of them are just plain crappy looking. They key is that I have a better tool available than when I started.

I find that MacSpeech Dictate works very, very well — and although I have to specify the punctuation, it generally takes care of spelling and capitalization. I spend more than enough time hunched over the keyboard typing mathematics, latex, and operating system commands — at least I don’t have to type text.

You could hardly have missed that I have also made a cosmetic change to many of the posts. I have used the command “more” to display only an introduction rather than the entire post.

I don’t know about you, but I like that change. I like seeing 10 posts in about three screens now. I like seeing the context and history — and it makes it a lot easier for me to go back and find a particular post. (I do use my own blog for reference!)

I have not changed all of the posts — quite possibly less than half of them so far — but I’ll keep working at it. Oh, it also means that older pages load more quickly — unfortunately, I haven’t split all of the older pages yet.

Incidentally, the blog set a new record last Monday, 258 hits. Thank you.

Of course, my kid got to play before I drafted this. What did he want to do? I wanted to believe that he wanted to do orbits — but he knows perfectly well that my grown-up wants to do orbits, so he doesn’t have to. In fact, I think my kid is waiting to play with them after the grown-up writes a collection of functions to make it easy.

In the end, my kid actually looked at a EE book, to see how they presented Boolean algebra and what they wanted from it.

Okay, I need to turn this draft into a post….


Kategorije: Matematički blogovi

ERT8: Weak Mixing Systems

Disquisitiones Mathematicae - Sub, 2010-02-27 14:34

1. The dichotomy between structure and randomness

The main tool used in ERT1 and ERT2 was: given a mps , we decomposed the space into two pieces: one structured, formed by the fixed (or periodic, depending on the case) functions, and other random, formed by the functions for which the Cesàro averages converge to zero. This represents an example of the dichotomy that surrounds Ergodic Ramsey Theory: structure vs. randomness. This is actually briefly discussed in the end of ERT2 and in a broader way in this paper of Terence Tao.

This idea is also used in other branches of Mathematics, specially in combinatorics, harmonic analysis, number theory, etc. We can cite many situations:

  1. Theorems about the existence of ergodic averages.
  2. Szemerédi regularity lemma.
  3. Roth theorem on the existence of arithmetic progressions of length three in sets of positive density.
  4. Gowers norms.
  5. All proofs (up to my knowledge) of Szemerédi theorem.
  6. Green-Tao theorem on the existence of arithmetic progressions in the primes.
  7. The Julia set of holomorphic functions is either connected or a Cantor set.
  8. Mané-Bochi ‘02: a -generic conservative diffeomorphim in surfaces either has all Lyapunov exponents zero almost everywhere or is Anosov.
  9. Avila-Moreira ‘03: For almost every , the quadratic function is either regular (has a periodic attractor) or stochastic (has an invariant absolutely continuous probability with positive Lyapunov exponent).
  10. Avila-Forni ‘07: almost every interval exchange transformation is either an irrational rotation or weak mixing.
  11. Avila ‘10: for Schrodinger operators with a one-frequency and typical real analytic potential, the spectrum is either subcritical or supercritical.

Each situation has a notion of structure/randomness. The one we are interested is multiple recurrence of mps. Let us see this from the spectral theory point of view.

Given a mps , denote also by the Koopman-von Neumann operator, defined by

When necessary, we use the notation to denote this operator. Many of its spectral properties are related to ergodic properties of . We investigate the eigenvalues/eigenfunctions of . If the eigenfunctions form a basis of , then is determined. In fact, let be the multiset (repeated with multiplicity) of eigenvalues of and, for each , the eigenfunction associated to . If

then

In this case, we say that has pure point spectrum and is a compact system. This constitutes the structured notion we were looking for.

In contrast, when has no eigenvalues other than and it is simple, we say that has continuous spectrum and is a weak mixing system. It forms the random part.

As pure point/continuous spectrum are opposite notions, there is a hope that every mps can be decomposed into two parts: one compact and other weak mixing. This is not true at all. Instead, can be decomposed in several parts in such a way that every part is an extension of the previous one and it is a compact or weak mixing extension of the smaller one. In other words, the dynamics of is broken into many parts in which every braking is obtained from the previous one by adding one of the two dynamical prototypes we discussed above.

Formally speaking, given two mps and , we say that is an extension of if there is a surjective measurable map such that

We denote this by and is called a factor of .

Theorem 1 (Furstenberg structural theorem) Given a mps , there exists an ordinal and a family of factors of , for every , such that:

  • is a single point.
  • is a compact extension of for every successor ordinal .
  • is the inverse limit of , for every limit ordinal .
  • is a weak mixing extension of .

Above, inverse limit is in the sense that . This result will be discussed in Lebesgue-full detail in the last post. In order to understand it, we need to study four concepts:

  1. Weak mixing systems.
  2. Compact systems.
  3. Weak mixing extensions.
  4. Compact extensions.

These will be the topics of this and the next 2 or 3 lectures.

2. Weak mixing systems

The definition used below is different from the one we assumed above, but don’t worry: they will be shown to coincide. Actually, we will obtain various equivalent definitions of weak mixing.

Definition 2 A mps is weak mixing if, for every ,

Exercise 1 Consider a bounded sequence of nonnegative real numbers. Prove:

  1. If , then Conclude that strong mixing implies weak mixing.
  2. If , then Conclude that weak mixing implies ergodicity.

Exercise 2 is weak mixing if and only if

for every .

Proposition 3 is weak mixing if and only if

for every such that .

We leave the proof to the reader, which may be found in the book Topics in ergodic theory of W. Parry. The notion of weak mixing means that, in some sense, almost all the system behaves in a strong mixing way. This is what says the following lemma.

Lemma 4 Consider a bounded sequence of nonnegative real numbers. Then

if and only if there exists a set of zero density such that

Proof: () Define, for each , the set

is a ascending chain of subsets of . They are the sets that may give problems in the convergence of to zero. Each of them has zero density, because

and then

In this way, take an increasing sequence of integers such that . Define and

By definition,

It remains to prove that has zero density. Consider an integer , let us say, with . As ,

and then

which, by (2), implies that

Then, has zero density.

() Let such that , for every . Given , we want to prove that

for every large enough. By hypothesis, there is for which

and

Then, for ,

Taking such that

we get

which concludes the proof.

Taking and , Lemma 4 implies that is weak mixing if and only if, for every , there exists of zero density such that

By approximation, (4) is equivalent to the existence, for every , of a set of zero density such that

Another characterization comes from Proposition 3: is weak mixing if and only if

for every such that . In fact, by Lemma 4, converges a Cesàro to zero if and only if the same happens to .

Weak mixing, at first impression, seems an artificial notion, obtained by the relaxation of strong mixing. This is not the case: first because it also has the natural spectral characterization discussed in section 1 and second, as was already discussed above, weak mixing is an important part of every mps. In other contexts, it is abundant. For example, Avila and Forni proved that almost every interval exchange transformation is either an irrational rotation or weak mixing, in contrast to an older result of Katok which proved that these are never strong mixing.

3. Product characterization of weak mixing

Consider two mps and .

Definition 5 The product mps of and is the quadruple

where is the -algebra generated by and is the probability measure on defined by

Theorem 6 Given a mps , the following are equivalent.

  1. is weak mixing.
  2. is weak mixing.
  3. is ergodic, for every ergodic mps .
  4. is ergodic.

Proof: (i) (ii). It is enough to check (4) for a generating algebra of . Let . By assumption, there exist of zero density such that

and

The set has zero density and satisfies

proving that is weak mixing.

(ii) (i). Given , there exists of zero density such that

that is,

(i) (iii). Follows from the exercise below.

Exercise 3 Consider two bounded sequences and of real numbers. If and converge a Cesàro to and , respectively, then converges a Cesàro to .

(iii) (iv). If is the trivial mps with consisting of a single point, we conclude that is ergodic. Then, takin , it follows that is ergodic.

(iv) (i). First, note that is ergodic. Given ,

converges a Cesàro to

proving the assertion. Then

which converges to

4. Spectral characterization of weak mixing

We now characterize weak mixing in terms of spectral properties. At this point, it is interesting to introduce the

Theorem 7 If is an unitary operator on the Hilbert space and , then there is a unique finite Borel measure on the circle such that

When has continuous spectrum, is a continuous measure (it has no atoms), for every such that .In this case, Fubini theorem guarantees that gives zero measure to the diagonal . This in turn implies the

Theorem 8 is weak mixing if and only if has continuous spectrum.

Proof: () Suppose is an eigenfunction associated to . The function defined by is an eigenfunction of associated to . By Theorem 6, is constant and the same happens to .

() Let us check (6). Take such that . Using Theorem 7,

Decompose , where is the diagonal. For , the summand

converges to zero as uniformly in . Since assigns zero measure to , we’re done.

5. Conditions for weak mixing

In this section we resume all conditions obtained above for a a mps be weak mixing.

  1. For any ,
  2. For any ,
  3. For any such that ,
  4. For any such that ,
  5. For any , there exists of zero density such that
  6. For any , there exists of zero density such that
  7. is ergodic.
  8. is ergodic, for every ergodic.
  9. is weak-mixing.
  10. has continuous spectrum.

Previous posts: ERT0, ERT1, ERT2, ERT3, ERT4, ERT5, ERT6, ERT7.


Kategorije: Matematički blogovi

The fourth moment of Kloosterman sums

E. Kowalski's blog - Pet, 2010-02-26 21:02

One of my favorite computations is that of the fourth moment of Kloosterman sums:

where

This was almost first performed with spectacular consequences by H.D. Kloosterman himself in 1927, as a crucial step to proving an upper bound for his sums, which was sufficiently good for his application to representations of integers by integral positive definite quadratic forms in four variables (I recommend reading at least the introduction to this paper: it is strikingly modern).

I say almost, because after just checking his paper, I realized he just got the right order of magnitude, and not the exact formula, for M4.

The standard reference I had been using (including for the exam of a graduate course I taught a while ago…) was in Iwaniec’s delightful book on classical modular forms. (Kloosterman sums appear there because, as in fact already noticed by Poincaré in 1912, they occur in the formulas for Fourier coefficients of Poincaré series…) But while typing the result for my lecture notes of my new course on sums over finite fields, I worked out a different argument than the one in Iwaniec’s book (different but, it turns out, rather closer to Kloosterman’s own…).

Roughly, one first quickly reduces (using orthogonality of additive characters) to computing the number of solutions of the equations

This can be computed quite directly, as Iwaniec does, but one can also observe that there are obvious solutions

and wonder what others can exist? It is then fairly natural to try to see whether knowing

is enough to recover the pair (x1,x2), up to permutation. This will be the case, if we can compute the value of the product x1 x2 from the two symmetric quantities above (this is the theory of symmetric functions, in a rather trivial case). Now, observe the identity

which gives what we want, provided (of course) the denominator is non-zero. And indeed, this may of course vanish, and does so precisely for the extra solutions

of the original equations… The argument therefore proves there are no other than these three families, and after figuring out their intersections, the formula for the fourth moment follows. (Details are in my ongoing notes already mentioned above…)

This computation may seem desperately low-brow; however, as I discuss briefly in Section 6 of my most recent survey on applications of the Riemann Hypothesis over finite fields (I tend to like writing about this, I must confess…), this can be interpreted, via the “Larsen alternative” as the crucial step in proving the vertical (or average) Sato-Tate Law for Kloosterman sums: if we write

then the collection of angles

becomes equidistributed with respect to the Sato-Tate measure

as p goes to infinity…

[Update (27.2.2010): thanks to Ke Gong for sending some useful typographical corrections to the notes.]

Kategorije: Matematički blogovi

News from Theorem of the Day

Theorem of the Day - Pet, 2010-02-26 04:01
A small but hopefully growing list of acronyms has been attached to the Glossary page.
Kategorije: Matematički blogovi

New Theorem

Theorem of the Day - Pet, 2010-02-26 04:01
Theorem of the Day has a 'new acquisition': The Lindemann-Weierstrass Theorem
Kategorije: Matematički blogovi

New Theorem

Theorem of the Day - Pet, 2010-02-26 04:01
Theorem of the Day has a 'new acquisition': Haken;s Unknot Theorem
Kategorije: Matematički blogovi

New Theorem

Theorem of the Day - Pet, 2010-02-26 04:01
Theorem of the Day has a 'new acquisition': Lin McMullin's Theorem
Kategorije: Matematički blogovi

New Theorem

Theorem of the Day - Pet, 2010-02-26 04:01
Theorem of the Day has two 'new acquisitions': The Friendship Theorem and the Diaconis-Holmes-Montgomery Coin Tossing Theorem
Kategorije: Matematički blogovi
Udruženi sadržaj