
Theory - [T10]

Question
Explain all possible derivations of the arithmetic mean and, more generally, of the other common types of averages.

Arithmetic mean

The arithmetic mean is the best-known and most widely used measure of central tendency. Intuitively, it can be seen as the single value that represents a set of numbers. For example, take the grades of a course: { 26, 27, 27, 28, 30, 24 }. If we calculate the mean we can see that its value is 27, and indeed if all the grades were 27 the mean wouldn’t change.
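As a quick check of the example above, here is a minimal Python sketch. The variable names (`grades`, `avg`, `all_equal`) are only illustrative.

```python
# Grades from the example above.
grades = [26, 27, 27, 28, 30, 24]

# Arithmetic mean: sum of the values divided by how many there are.
avg = sum(grades) / len(grades)

# If every grade were equal to the mean, the mean would stay the same.
all_equal = [avg] * len(grades)
avg_of_equal = sum(all_equal) / len(all_equal)

print(avg, avg_of_equal)
```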

Derivations of the arithmetic mean

Let’s now see how the formula for the arithmetic mean can be derived with different approaches.

Classic formula

Suppose one has a finite, discrete set of $N$ data points $x_{i}$ for $i=1,\dots,N$. Assuming this set has a central value $\bar{x}$ (i.e. a mean value), then by definition the sum of the positive deviations from this central value should be equal to the sum of the negative deviations (by positive deviation we mean that a data point $x_i$ lies above the central value $\bar{x}$ by an amount $x_i-\bar{x}$, and by negative deviation that it lies below the central value by an amount $\bar{x}-x_i$). Now, let us rearrange the set of data points into two subsets: one containing all points whose value is less than the central value, i.e. $x_1,x_2,\dots,x_i<\bar{x}$, and the other containing all points whose value is greater than or equal to the central value, i.e. $x_{i+1},x_{i+2},\dots,x_N\ge\bar{x}$ (here $i$ is the index of the last point below $\bar{x}$). It follows that

$$ (x_N-\bar{x})+(x_{N-1}-\bar{x})+\cdots+(x_{i+1}-\bar{x})-(\bar{x}-x_i)-(\bar{x}-x_{i-1})-\cdots-(\bar{x}-x_1)=0 $$

Which, upon rearranging terms, gives

$$ x_1+x_2+\cdots+x_{i-1}+x_i+x_{i+1}+\cdots+x_{N-1}+x_N-N\bar{x} = \\ = \sum_{i=1}^N x_i-N\bar{x}=0 \\ \Rightarrow \bar{x}=\frac{1}{N}\sum_{i=1}^N x_i $$
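The balance between positive and negative deviations can be checked numerically. The following is a small sketch on the grades used earlier; the helper name `split_deviations` is hypothetical and not part of the lesson.

```python
import math


def split_deviations(data):
    """Return (mean, sum of positive deviations, sum of negative deviations)."""
    center = sum(data) / len(data)
    # Points at or above the mean contribute x - mean.
    positive = sum(x - center for x in data if x >= center)
    # Points below the mean contribute mean - x.
    negative = sum(center - x for x in data if x < center)
    return center, positive, negative


center, positive, negative = split_deviations([26, 27, 27, 28, 30, 24])
print(center, positive, negative)
```

The two sums agree only when the centre is the arithmetic mean, which is exactly the condition the derivation starts from.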

The Knuth formula

The online algorithm described in the lesson allows us to calculate the arithmetic mean without needing to keep track of all the values collected. This is particularly useful when the data arrive as a stream or are too many to store, because each update needs only the previous mean and the number of values seen so far. Below is a brief derivation of the formula.

Proof

First of all, let’s define some notation. We will denote with $x_{i}$ the $i$-th element of the sum and with $\bar x_{i}$ the arithmetic mean at step $i$.

Here is the naive formula to calculate the arithmetic mean:

$$ \bar x_{n}={\frac {1}{n}}\sum_ {i=1}^{n}x_{i}={\frac {x_{1}+x_{2}+\cdots +x_{n}}{n}} $$

Given this general formula we can rewrite it putting emphasis on the last and penultimate elements:

$$ \bar x_{n}={\frac {x_{1}+\cdots+x_{n-1}+x_{n}}{n}} \\ $$

$$ \bar x_{n}={\frac {x_{1}+\cdots+x_{n-1}}{n}}+{\frac {x_{n}}{n}} $$

Now we can multiply the first term of the sum by $ \frac {n-1}{n-1} $, which equals 1 for $n > 1$:

$$ \bar x_{n}=({\frac {n-1}{n-1}})\cdot {\frac {x_{1}+\cdots+x_{n-1}}{n}}+{\frac {x_{n}}{n}} $$

$$ \bar x_{n}=({\frac {n-1}{n}})\cdot \fcolorbox{red}{#292a2d}{$\frac {x_{1}+\cdots+x_{n-1}}{n-1}$}+{\frac {x_{n}}{n}} $$

Note how the highlighted term is simply $ \bar x_{n-1} $. We can therefore rewrite the formula as

$$ \bar x_{n}=({\frac {n-1}{n}})\cdot {\bar x_{n-1}}+{\frac {x_{n}}{n}} $$

$$ \bar x_{n}={\frac {(\bar x_{n-1}\cdot n)-\bar x_{n-1}}{n}}+{\frac {x_{n}}{n}} $$

$$ \bar x_{n}={\frac {\bar x_{n-1}\cdot \cancel n}{\cancel n}}-{\frac {\bar x_{n-1}}{n}}+{\frac {x_{n}}{n}} $$

From which we can obtain the final formula

$$ \colorbox{#208020}{$ \Large{\bar x_{n}=\bar x_{n-1}+{\frac {x_{n}-\bar x_{n-1}}{n}} } $} $$

We recognise this as the same formula seen during the lesson.
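The recurrence above translates almost directly into code. Here is a minimal sketch in Python; the function name `running_mean` is an illustrative choice, and the loop keeps only the current mean and the count, never the full list of values.

```python
def running_mean(values):
    """Arithmetic mean computed one value at a time (online update)."""
    mean = 0.0
    for n, x in enumerate(values, start=1):
        # mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
        mean += (x - mean) / n
    return mean


grades = [26, 27, 27, 28, 30, 24]
online = running_mean(grades)
naive = sum(grades) / len(grades)
print(online, naive)
```

Because the values are consumed one at a time, the same function also works on a generator or any other stream that cannot be replayed. The two results may differ by floating-point rounding, so they are best compared with a tolerance.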

An alternative solution

While trying to obtain the online formula for the arithmetic mean, I actually used a slightly different approach, which gave me a different-looking formula that I consider to be correct. I think the formula I obtained can be rewritten as the one shown above and during the lesson, but it is still interesting to follow my initial reasoning, as it showed me once again that reasoning is more important than memorizing.
The approach I used is actually very similar to the “original” one.

Let’s start once again from the general formula:

$$ \bar x_{n}={\frac {x_{1}+\cdots+x_{n}}{n}} $$

This can be used to calculate $ \bar x_{n} $. Let’s now use it to calculate $ \bar x_{n+1} $:

$$ \bar x_{n+1}={\frac {x_{1}+\cdots+x_{n}+x_{n+1}}{n+1}} $$

Let’s once again separate the fraction

$$ \bar x_{n+1}={\frac {x_{1}+\cdots+x_{n}}{n+1}}+{\frac {x_{n+1}}{n+1}} $$

Multiply the first term by $ \frac{n}{n} $:

$$ \bar x_{n+1}=({\frac {n}{n}})\cdot{\frac {x_{1}+\cdots+x_{n}}{n+1}}+{\frac {x_{n+1}}{n+1}} $$

$$ \bar x_{n+1}=({\frac {n}{n+1}})\cdot \fcolorbox{red}{#292a2d}{$\frac {x_{1}+\cdots+x_{n}}{n}$} + {\frac {x_{n+1}}{n+1}} $$

$$ \bar x_{n+1}=({\frac {n}{n+1}})\cdot{\bar x_{n}}+{\frac {x_{n+1}}{n+1}} $$

With the final step we obtain the following formula:

$$ \colorbox{#208020}{$ \Large{\bar x_{n+1} = {\frac{(\bar x_{n}\cdot n) + x_{n+1}}{n+1}} } $} $$
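As a short check, the formula above can be rearranged by writing $n\,\bar x_{n}$ as $(n+1)\,\bar x_{n} - \bar x_{n}$:

$$ \bar x_{n+1} = \frac{(n+1)\,\bar x_{n} - \bar x_{n} + x_{n+1}}{n+1} = \bar x_{n} + \frac{x_{n+1} - \bar x_{n}}{n+1} $$

This is the formula obtained in the previous proof with $n$ replaced by $n+1$, so the two results describe the same update.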

Other types of means

We have seen in great detail what the arithmetic mean is and how to derive it. Now we will briefly introduce other types of averages.

Geometric mean

The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product (as is the case with rates of growth) rather than their sum (as is the case with the arithmetic mean):

$$ \bar{x} = \Biggl( \prod_{i=1}^{n} x_i \Biggr)^{\frac{1}{n}} $$
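To make the formula concrete, here is a small Python sketch. The function name `geometric_mean` and the growth factors are made up for illustration. Averaging the logarithms is mathematically equivalent to taking the $n$-th root of the product, and is used here only as one reasonable way to avoid very large intermediate products.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty set of positive numbers")
    # exp(mean of logs) equals (x_1 * ... * x_n) ** (1 / n)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth = [1.10, 1.20, 0.90]
g = geometric_mean(growth)

# Applying the average factor g three times gives the same total growth.
print(g, g ** 3, math.prod(growth))
```

Python 3.8 and later also ship `statistics.geometric_mean`, which can be used instead of a hand-written helper.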

Visual explanation

A very simple visual representation of the formula can be shown through the use of Geometry, as can be seen from the following YouTube video:

Another very simple representation can be given in terms of areas and volumes. Suppose we have a rectangle with sides 20 and 5. The total area of the rectangle is therefore 100. If we calculate the geometric mean of these two values we obtain $\sqrt{20\cdot5} = \sqrt{100} = 10$. To complete this visualization, we can now imagine a square whose sides are 10: its total area is once again 100.

../images/rectangle_square_area.webp

This same logic can also be applied to parallelepipeds using the volume of the 3D object.

../images/rectangular_box.webp
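As a worked example of the three-dimensional case, take a box with sides 2, 4 and 8 (values chosen only for illustration):

$$ V = 2\cdot 4\cdot 8 = 64, \qquad \sqrt[3]{2\cdot 4\cdot 8} = \sqrt[3]{64} = 4, \qquad 4^3 = 64 $$

A cube whose side is the geometric mean of the three sides therefore has the same volume as the original box.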

Harmonic mean

The harmonic mean is a measure of central tendency best suited to rates and ratios (e.g. the average traveling speed over several journeys of equal distance). A practical example can be found in finance, where the weighted harmonic mean can be used to average the Price-to-Earnings ratio (P/E), since we can assign a specific weight to each data point.

The formula for the harmonic mean is the following:

$$ H(x_1,x_2,\dots,x_n) = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} $$
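The speed example can be sketched in Python as follows. The helper name `harmonic` and the two speeds are assumptions made for illustration; `statistics.harmonic_mean` is the standard-library function and is shown only as a cross-check.

```python
import math
from statistics import harmonic_mean


def harmonic(values):
    """n divided by the sum of the reciprocals of n positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty set of positive numbers")
    return len(values) / sum(1.0 / v for v in values)


# Two legs of equal distance, driven at 60 km/h and 40 km/h.
speeds = [60.0, 40.0]
avg_speed = harmonic(speeds)

# Cross-check: total distance divided by total time, with 120 km per leg.
leg = 120.0
total_time = leg / 60.0 + leg / 40.0
print(avg_speed, harmonic_mean(speeds), 2 * leg / total_time)
```

Note that the arithmetic mean of the two speeds would be 50 km/h, which overstates the real average because more time is spent on the slower leg.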


Sources