# Getting into norms

This blog is named “Out of the Norm”, and the logo is the list of axioms that a norm satisfies (with some extra graffiti), but there’s a fair chance that it’s just mathematical gobbledegook to you. So what is a norm? At its heart, a norm is essentially a measure of distance, though not necessarily a spatial distance. Here are a couple of examples to give you some informal sense of what a norm is, and how norms might be used.

## Distance in space

If I asked you to go find some buried treasure 3 units of distance East, and 4 units North, how far would you have to travel? That is, from the origin (0, 0) to the point (3, 4) on the graph or ‘map’ below?

It’s not a trick question: the distance as the crow flies, by Pythagoras’ theorem, is simply $\sqrt{3^2 + 4^2}=5$.

But what if it were a trick question? Let’s say you were actually a taxi driver in some Manhattan-like city with square blocks (each a single unit in size), and were taking a passenger three blocks East and four blocks North (perhaps to the buried treasure, if you insist). Then the shortest route would be seven units long (7=3+4; there are many different shortest routes).

Two of the shortest routes

So now you’ve seen two different notions of distance in 2-dimensional space: the first very familiar; the second probably less so. Just to reinforce the point, you can do the same thing in 3-dimensional space. The distance as the crow flies directly to a point 5 units above the unnecessary treasure, that is, the point (3, 4, 5), is approximately seven units: $\sqrt{3^2+4^2+5^2}=\sqrt{50}\approx 7.07$. However, our taxi driver has to drive around the buildings and then take some sort of car-lift, and travels $3+4+5=12$ units.

Diagram not at all to scale.

You can extend this to as many spatial dimensions as you want, though as far as I know it’s impossible for humans to imagine a diagram with the point (3, 4, 5, 2) hovering somewhere. But that doesn’t mean the notion is useless.
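Even if we can’t draw the picture, the arithmetic is no harder in four dimensions than in two. Here is a minimal Python sketch of the two distances from the origin; the function names `euclidean` and `taxicab` are just labels I’ve picked for illustration.

```python
import math

def euclidean(point):
    """Straight-line ('as the crow flies') distance from the origin."""
    return math.sqrt(sum(x * x for x in point))

def taxicab(point):
    """Distance from the origin when travelling only parallel to the axes."""
    return sum(abs(x) for x in point)

# The treasure, the point above it, and the four-dimensional point.
for p in [(3, 4), (3, 4, 5), (3, 4, 5, 2)]:
    print(p, euclidean(p), taxicab(p))
```

For (3, 4) this gives 5 and 7, and for (3, 4, 5) it gives $\sqrt{50}$ and 12, matching the figures above.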

## Guess the age of the lecturers

The next example has been used by one of my former colleagues to introduce norms to his students. Firstly, he gets the students to each write down a guess of the ages of all the lecturers in the maths department, and writes the guesses up on the board. Then he reveals the true ages of the lecturers, and asks the students how they should decide who has won. Here are some completely made-up numbers:

Let’s say there are four lecturers: a PhD student, a lecturer, a senior lecturer, and a professor, who are 25, 35, 45 and 60 years old respectively. We’ll write this as the vector (25, 35, 45, 60). Now the three students make their guesses: Adam guesses in order 27, 36, 55 and 66: again, write this as a vector (27, 36, 55, 66). Beth guesses (26, 35, 53, 51), completely misjudging the age order of the older pair. And, because there’s always one in every class, Charlie guesses (27.2, 34, 56.4, 62.4), which reminds us that we needn’t work only with whole numbers.

The students decide to judge the results by looking at the differences between the actual ages and the guesses, and decide to judge overestimates and underestimates equally, and guesses for each lecturer equally. The errors are as follows:

Adam: (2, 1, 10, 6), which in descending order is (10, 6, 2, 1).

Beth: (1, 0, 8, 9), re-ordered as (9, 8, 1, 0).

Charlie: (2.2, 1, 11.4, 2.4), re-ordered as (11.4, 2.4, 2.2, 1).

The re-ordering here is allowed because they’ve decided being accurate about, say, the senior lecturer, who perhaps doesn’t look his age, is not more valuable than being accurate about the PhD student. It also makes it easier to compare the guesses by eye.
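If you’d rather not do the subtraction by hand, here is a small Python sketch that computes each student’s errors in the order the lecturers were listed, and then sorts them in descending order. The names `actual`, `guesses` and `absolute_errors` are my own, and rounding to one decimal place is an assumption I’ve added purely to hide floating-point noise in Charlie’s decimal guesses.

```python
actual = (25, 35, 45, 60)
guesses = {
    "Adam": (27, 36, 55, 66),
    "Beth": (26, 35, 53, 51),
    "Charlie": (27.2, 34, 56.4, 62.4),
}

def absolute_errors(guess, truth):
    # Over- and underestimates count equally, so take absolute values.
    # Rounding to 1 decimal place (an assumption) hides floating-point noise.
    return [round(abs(g - t), 1) for g, t in zip(guess, truth)]

for name, guess in guesses.items():
    errs = absolute_errors(guess, actual)
    print(name, errs, sorted(errs, reverse=True))
```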

Obviously it’s good to have a low overall score. Adam notes that none of the scores is obviously better than the others: that is, no-one has guessed closer in all the categories than anyone else, even after the re-ordering.

Beth immediately declares that the winner is the one whose maximum error is the smallest. This happens to be her. Charlie, who is last under this metric, snorts in derision. He has guessed very accurately, except in one case.

Charlie therefore suggests that they add up all the errors, and whoever has the lowest total wins. Having a natural flair for addition, he announces that the scores are now: Adam, 19; Beth, 18; and Charlie 17. He feigns surprise upon realising that this system makes him the victor.

Adam dismisses both these evaluations. Surely, as a compromise, they need a system that both punishes outrageously large errors, like Beth’s method, and takes into account all the guesses, like Charlie’s. He suggests that they take the squares of each error and then add them all up. Under this measure, large errors contribute much much more to the total than small ones, so as in Beth’s, it is better to have two middling guesses rather than one large and one small error. However, as in Charlie’s, one can still recover from a large error by doing much better than everyone else on other lecturers’ ages.

Beth and Charlie both think this sounds like a sensible compromise on the face of it, until they realise that Adam has been furiously tapping numbers into his calculator. The scores are still close: the winner is Adam, with 141, which is $10^2 + 6^2 + 2^2 + 1^2=100+36+4+1$; Charlie is in second place by an eyelash with 141.56; and Beth is now in third with 146. Beth makes some scurrilous attacks on Adam’s honour, and the cycle starts again.
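To see all three squabbles side by side, here is a rough Python sketch that scores the re-ordered error lists under each proposal and ranks the students, lowest score first. The dictionary keys and the helper `ranking` are labels I’ve invented for illustration.

```python
errors = {
    "Adam": (10, 6, 2, 1),
    "Beth": (9, 8, 1, 0),
    "Charlie": (11.4, 2.4, 2.2, 1),
}

# Each proposal turns a list of errors into a single score (lower is better).
methods = {
    "largest error (Beth's proposal)": max,
    "sum of errors (Charlie's proposal)": sum,
    "sum of squared errors (Adam's proposal)": lambda e: sum(x * x for x in e),
}

def ranking(score):
    """Return the students ordered from best (lowest score) to worst."""
    return sorted(errors, key=lambda name: score(errors[name]))

for label, score in methods.items():
    print(label, ranking(score))
```

Each proposal puts its own proposer first, and no two proposals agree on any position.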

There is no correct answer to this problem, unless the students agreed in advance or until someone concedes. To make the situation as bad as possible, I’ve omnipotently guided their guesses so that the students would each come first, second and third once in these three measurement systems. You could avoid revealing the true ages until after the students have decided on a method, so that they debate fairly behind the veil of ignorance, but it’s more fun if they argue with vested interests. Having said that, I didn’t enjoy when a similar experiment happened on a national scale earlier this year, with the UK deciding it didn’t want a slightly more complicated but better voting system, with both sides of the debate generally arguing “vote for our method or X will win”.

If you’ve not spotted the link between the two examples, let me spell it out. Charlie’s method is the same as the Manhattan taxi-cab distance, and Adam’s is in effect the same as the ‘as the crow flies’ distance (which is more usually called the Euclidean distance), in the following sense.

If there were only two lecturers, you could plot the students’ guesses in the following two-dimensional graph:

Then, to judge their scores for just these two lecturers, using Adam’s method, we just take the crow-flying distance between a student’s guess on the graph and the point representing the actual ages. Actually, Adam suggested that we take this distance squared, but taking square-roots and squaring does not change the relative order of each student’s guesses.

Viewed in this way, Charlie’s method is now exactly the Manhattan taxi-cab distance: just take the distance of the shortest route when only travelling parallel to the axes. I can’t think of a good reason to use Beth’s measurement for actual spatial distances.

In each case, the winner is the person with the shortest distance. So you can actually imagine the “Guess the age of the lecturers” game as trying to pick a point as close as possible to the point of the actual ages in four-dimensional space. If you take $n$ lecturers, you would effectively be aiming to guess a point in $n$-dimensional space. Anyone for a game of eleven-dimensional blindfolded darts?
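As a sanity check on the geometric picture, here is a short Python sketch that treats each set of guesses as a point in four-dimensional space and measures how far it lies from the point of actual ages. It assumes Python 3.8 or later for `math.dist`; the helper names `taxicab_distance` and `squared_distance` are made up for this example.

```python
import math

actual = (25, 35, 45, 60)
guesses = {
    "Adam": (27, 36, 55, 66),
    "Beth": (26, 35, 53, 51),
    "Charlie": (27.2, 34, 56.4, 62.4),
}

def taxicab_distance(p, q):
    """Sum of the coordinate-wise gaps: the sum-of-errors score."""
    return sum(abs(a - b) for a, b in zip(p, q))

def squared_distance(p, q):
    """Sum of the squared gaps: the sum-of-squares score."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def rank(distance):
    """Order the students from closest to furthest under a given distance."""
    return sorted(guesses, key=lambda name: distance(guesses[name], actual))

by_squares = rank(squared_distance)
by_euclid = rank(math.dist)  # math.dist is the ordinary Euclidean distance
by_taxicab = rank(taxicab_distance)

print(by_squares, by_euclid, by_taxicab)
```

Ranking by the squared distance and by the Euclidean distance itself gives the same order, because taking a square root never swaps two non-negative numbers.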