John Allen Paulos explains how the bell curve works. The bell curve, you will recall, is given by the equation

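$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)},$$

where $\mu$ is the average value of the variable (the peak of the bell curve) and $\sigma^2$ is the variance. Paulos’s point is, apparently, that small differences in the values of $\mu$ and $\sigma$ can lead to extreme imbalances at the far ends of the curve (the ratio $\int_R^\infty f_1(x)\,dx \big/ \int_R^\infty f_2(x)\,dx$ of two groups’ tail areas becomes very large or very small for large values of $R$). Here is how this might manifest itself in practice: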

The corporation’s personnel officer notes the relatively small differences between the groups’ means and observes with satisfaction that the many mid-level positions are occupied by both Mexicans and Koreans. She is puzzled, however, by the preponderance of Koreans assigned to the relatively few top jobs, those requiring an exceedingly high score on the qualifying test. The personnel officer does further research and discovers that most holders of the comparably few bottom jobs, assigned to applicants because of their very low scores on the qualifying test, are Mexican.

She may suspect bias, but the result might just as well be an unforeseen consequence of the way the normal distribution works.

Yes, really. Of course, Paulos chose the direction of the imbalance at random. He says so right in the article.
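For concreteness, here is a small Python sketch of what the formula literally says when taken at face value: two bell curves and the ratio of their tail areas beyond a cutoff $R$. It assumes that both groups’ scores are exactly normal all the way out, and the function names and parameter values are illustrative choices, not anything taken from Paulos’s article.

```python
import math


def normal_tail(r: float, mu: float, sigma: float) -> float:
    """Area under the bell curve to the right of r, i.e. P(X > r) for X ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((r - mu) / (sigma * math.sqrt(2.0)))


def tail_ratio(r: float, mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Ratio of the two groups' tail areas beyond the cutoff r."""
    return normal_tail(r, mu1, sigma1) / normal_tail(r, mu2, sigma2)


if __name__ == "__main__":
    # Hypothetical scenario A: same spread, means a tenth of a standard deviation apart.
    # Hypothetical scenario B: same mean, standard deviations differ by 10%.
    for r in (0, 1, 2, 3, 4, 5, 6):
        a = tail_ratio(r, 0.1, 1.0, 0.0, 1.0)
        b = tail_ratio(r, 0.0, 1.1, 0.0, 1.0)
        print(f"R = {r}:  A ratio = {a:.2f},  B ratio = {b:.2f}")
```

Within the model, both ratios do grow without bound as $R$ increases, though not necessarily quickly: in scenario A the imbalance at $R = 6$ standard deviations is still less than a factor of 2. Whether real test scores follow a normal curve that far out is a separate question.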

There’s a way of misusing mathematics that goes like this: start with a mathematical model, often a probability distribution or a differential equation, that looks reasonable enough in typical circumstances. Then assume a very specific set of circumstances, for example making one of the parameters abnormally large, plug this into the general-purpose equation, manipulate it for a bit, and draw conclusions. QED, or something.

What’s missing from it is a level of mathematical maturity. In my experience of teaching undergraduate mathematics, manipulating exact formulas is the easy part for most students. (Relatively speaking, of course, but whatever.) The hard part is the inequalities, approximations, error estimates. You no longer have an exact equation that can be rearranged every which way and still remains equivalent to the original one. If you move around and rescale the terms in an approximate formula, the error might still be acceptable, or it might not be, and you can’t always tell which is which by just backtracking through an automated series of algebraic manipulations. You actually have to understand what’s going on.
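Here is a small numerical illustration of that point, as a Python sketch. It compares the standard bell curve with a hypothetical “true” distribution in which 1% of the population comes from a bell curve 1.5 times as wide; the 1% weight and the 1.5 width are arbitrary illustrative assumptions. Judged by the densities, the bell curve is an excellent approximation: the two never differ by more than 0.002 anywhere. Judged by tail areas beyond $R = 5$, it is off by more than a factor of ten.

```python
import math

# Hypothetical perturbation: 1% of the population drawn from a wider bell curve.
EPS, WIDE_SIGMA = 0.01, 1.5


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Bell curve density f(x) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_tail(r: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X > r) for X ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((r - mu) / (sigma * math.sqrt(2.0)))


def mixture_pdf(x: float) -> float:
    """Density of the perturbed (mixture) distribution."""
    return (1 - EPS) * normal_pdf(x) + EPS * normal_pdf(x, sigma=WIDE_SIGMA)


def mixture_tail(r: float) -> float:
    """Tail area of the perturbed distribution beyond r."""
    return (1 - EPS) * normal_tail(r) + EPS * normal_tail(r, sigma=WIDE_SIGMA)


def max_density_gap(lo: float = -10.0, hi: float = 10.0, steps: int = 20001) -> float:
    """Largest absolute difference between the two densities on an even grid."""
    h = (hi - lo) / (steps - 1)
    return max(abs(mixture_pdf(lo + i * h) - normal_pdf(lo + i * h)) for i in range(steps))


if __name__ == "__main__":
    print(f"max |density difference| = {max_density_gap():.4f}")
    for r in (1, 2, 3, 4, 5):
        print(f"R = {r}: tail ratio = {mixture_tail(r) / normal_tail(r):.2f}")
```

A bound on the absolute error in the density, however small, says nothing about the relative error far out in the tail, and the far tail is exactly where the personnel-office comparison takes place.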
