Calculate the mean in statistics

The mean is the most widely used measure of central tendency, but it comes in several forms. This tutorial covers the arithmetic, truncated, weighted and geometric means: what each one measures, when to use it, and when to avoid it.

Arithmetic mean

The arithmetic mean is what most people mean when they say “average.” It is defined as the sum of all values divided by the number of values.

For a set of \(n\) values \((x_1, x_2, \dots, x_n)\), the mean \(\bar{x}\) is:

\[\bar{x} = \frac{\sum_{i = 1}^n x_i}{n},\]

where \(x_i\) is the \(i\)-th observation of \(x\).

Properties

The arithmetic mean has four useful properties, three of which describe how it behaves when the data is transformed:

  • Zero deviation sum: the sum of deviations from the mean is always zero, \(\sum_{i=1}^n (x_i - \bar{x}) = 0\).
  • Translation: adding a constant \(c\) to every value shifts the mean by the same amount. If \(Y = X + c\), then \(\bar{y} = \bar{x} + c\).
  • Scale: multiplying every value by a constant \(c\) multiplies the mean by that constant. If \(Y = cX\), then \(\bar{y} = c\bar{x}\).
  • Linear transformation: combining both, if \(Y = aX + b\), then \(\bar{y} = a\bar{x} + b\).
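
These properties follow directly from the definition. As a short sketch, using the same symbols as above, the zero deviation sum and the linear transformation rule can be derived as:

\[\sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0,\]

\[\bar{y} = \frac{\sum_{i=1}^n (a x_i + b)}{n} = \frac{a\sum_{i=1}^n x_i + nb}{n} = a\bar{x} + b.\]

Setting \(a = 1\) gives the translation property, and setting \(b = 0\) gives the scale property.
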

Example: calculating the mean and its properties

The selling prices of four cars are: 25,000, 32,000, 15,000 and 72,000 USD.

Mean price:

\[\bar{x} = \frac{25000 + 32000 + 15000 + 72000}{4} = 36{,}000 \text{ USD.}\]

If every price increases by 5,000 USD, what is the new mean?

By the translation property: \(\bar{y} = 36{,}000 + 5{,}000 = 41{,}000\) USD. No need to recalculate from scratch.

If every price increases by 10%, what is the new mean?

By the scale property: \(\bar{y} = 36{,}000 \times 1.1 = 39{,}600\) USD.
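
The same calculations can be checked numerically. The following is a minimal Python sketch using only the standard library; the variable names (`prices`, `base`, `shifted`, `scaled`) are illustrative choices, not part of the tutorial.

```python
from statistics import mean

# Selling prices of the four cars from the example above (USD)
prices = [25_000, 32_000, 15_000, 72_000]

base = mean(prices)                          # arithmetic mean
deviation_sum = sum(p - base for p in prices)  # zero deviation sum property

shifted = mean([p + 5_000 for p in prices])  # translation: every price + 5,000
scaled = mean([p * 1.1 for p in prices])     # scale: every price + 10%

print(base)            # 36000
print(deviation_sum)   # 0
print(shifted)         # 41000, the same as base + 5000
print(round(scaled))   # 39600, the same as base * 1.1 up to floating-point rounding
```

Recomputing from the transformed data gives the same result as applying the property to the original mean, which is the point of the shortcut.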

The outlier problem

The arithmetic mean is sensitive to extreme values. A single outlier can pull the mean far from where most of the data sits.

Figure 1: One outlier shifts the mean significantly, while the median stays stable
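
To make the effect concrete, here is a small Python sketch comparing the mean and the median before and after adding one extreme value. The salary figures are hypothetical, invented purely for illustration.

```python
from statistics import mean, median

# Hypothetical monthly salaries (USD), invented for illustration only
salaries = [2_800, 3_000, 3_100, 3_200, 3_400]
with_outlier = salaries + [50_000]  # add a single extreme value

print(mean(salaries), median(salaries))   # 3100 3100
print(round(mean(with_outlier), 2))       # the mean more than triples
print(median(with_outlier))               # 3150.0, the median barely moves
```

One added value is enough to move the mean well above every ordinary salary in the sample, while the median shifts by only 50.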

⚠️ When not to use the arithmetic mean

Avoid the arithmetic mean when:

  • The data has heavy outliers: one CEO’s salary in a sample of workers will make the mean useless as a “typical” value.
  • The distribution is strongly skewed: house prices, income, and city populations are classic examples. The median is a better choice.
  • The variable is ordinal: averaging satisfaction scores (1 = poor, 5 = excellent) assumes equal distances between categories, which is not guaranteed.

Truncated mean

The truncated mean removes a fixed percentage of the lowest and highest values before calculating the mean. It is a simple way to reduce the influence of outliers without switching to a completely different measure.

To compute the truncated mean at \(p\)%: sort the data, remove \(p\)% of the values from each end, and calculate the arithmetic mean of what remains.

Figure 2: Truncated mean at 10%: the two extreme values (red) are removed before averaging

Example: truncated mean at 10%

Consider the following 10 values: (1, 17, 19, 20, 22, 23, 27, 29, 32, 210).

The arithmetic mean is: \[\bar{x} = \frac{1 + 17 + \cdots + 210}{10} = 40.\]

At 10% truncation, we remove 1 value from each end (10% of 10). The data is already sorted, so we drop the smallest value (1) and the largest (210), leaving:

17, 19, 20, 22, 23, 27, 29, 32

The truncated mean is: \[\bar{x}_t = \frac{17 + 19 + 20 + 22 + 23 + 27 + 29 + 32}{8} = 23.625.\]

The two extreme values (1 and 210) were distorting the arithmetic mean: 210 alone pulled it up to 40, above every other value in the data set.
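
The procedure can be sketched in a few lines of Python. The function name `truncated_mean` and its `percent` parameter are hypothetical, and rounding the number of removed values down when \(p\)% of \(n\) is not an integer is an assumption of this sketch; other conventions exist.

```python
import math

def truncated_mean(values, percent):
    """Arithmetic mean after dropping `percent`% of the values from each end."""
    if not 0 <= percent < 50:
        raise ValueError("percent must be in the range [0, 50)")
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    # Number of values to drop from each end (rounded down: an assumption)
    k = math.floor(n * percent / 100)
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

data = [1, 17, 19, 20, 22, 23, 27, 29, 32, 210]
print(truncated_mean(data, 0))    # 40.0, the plain arithmetic mean
print(truncated_mean(data, 10))   # 23.625
```

If SciPy is available, `scipy.stats.trim_mean` offers a library implementation; it takes the proportion to cut as a fraction (0.1 rather than 10).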

💡 Real-world use of the truncated mean

The truncated mean, also called the trimmed mean, is used in official contexts more than most people realize. It is used to calculate scores in Olympic gymnastics and figure skating, where the highest and lowest judge scores are dropped before averaging. Some central banks also use it to measure core inflation, discarding the items with the most extreme price changes in each period.

Weighted mean

The arithmetic mean assumes every observation has the same importance. The weighted mean allows different values to contribute differently, according to a weight \(w_i\) assigned to each observation \(x_i\):

\[\bar{x}_w = \frac{\sum_{i=1}^k x_i \cdot w_i}{\sum_{i=1}^k w_i}.\]

Example: weighted mean in a grading system

A course has three assessments with different weights: a midterm (20%), a project (20%) and a final exam (60%). A student scores 5, 7 and 8, respectively.

\[\bar{x}_w = \frac{5 \cdot 0.2 + 7 \cdot 0.2 + 8 \cdot 0.6}{0.2 + 0.2 + 0.6} = \frac{1 + 1.4 + 4.8}{1} = 7.2.\]

With a simple arithmetic mean the score would be 6.67. The weighted mean reflects the fact that the final exam matters more.
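
As a minimal sketch, the formula translates directly into Python. The function name `weighted_mean` is an illustrative choice, not something defined in this tutorial.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight

scores = [5, 7, 8]  # midterm, project, final exam

print(round(weighted_mean(scores, [0.2, 0.2, 0.6]), 2))  # 7.2
print(round(weighted_mean(scores, [20, 20, 60]), 2))     # 7.2, same result with percentages
print(round(weighted_mean(scores, [1, 1, 1]), 2))        # 6.67, equal weights give the arithmetic mean
```

Because the function divides by the sum of the weights, only their relative sizes matter: 0.2/0.2/0.6 and 20/20/60 produce the same answer.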

⚠️ The weights must sum to a meaningful total

The formula divides by \(\sum w_i\), so the weights do not need to sum to 1 or 100. However, make sure the weights reflect actual relative importance. A common mistake is using counts as weights when what you want is proportional weights, or vice versa.

Geometric mean

The geometric mean is the appropriate average when working with values that are multiplied together, such as growth rates, ratios, or indices. It is defined as the \(k\)-th root of the product of \(k\) values:

\[\bar{x}_g = \left(\prod_{i=1}^k x_i\right)^{\frac{1}{k}}.\]

An equivalent and often more practical formula uses logarithms:

\[\bar{x}_g = \exp\left(\frac{1}{k} \sum_{i=1}^k \ln(x_i)\right).\]

Example: average investment growth rate

An investment grows by 5%, 10% and 20% in three consecutive years. What is the average annual growth rate?

Convert to multipliers: \(x = (1.05,\ 1.10,\ 1.20)\).

\[\bar{x}_g = (1.05 \times 1.10 \times 1.20)^{1/3} = 1.386^{1/3} \approx 1.1149.\]

The average annual growth rate is approximately 11.49%.

To verify: \(1.05 \times 1.10 \times 1.20 = 1.386\), and \(1.1149^3 \approx 1.386\). Correct.

Using the arithmetic mean instead would give \((1.05 + 1.10 + 1.20)/3 = 1.1167\), or 11.67%. The arithmetic mean slightly overestimates compound growth.
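
Both formulas for the geometric mean can be checked with a short Python sketch. It uses only the standard library (`math.prod` and `statistics.geometric_mean` require Python 3.8 or later); the variable names are illustrative.

```python
import math
from statistics import geometric_mean

# Yearly growth of 5%, 10% and 20%, expressed as multipliers
multipliers = [1.05, 1.10, 1.20]
k = len(multipliers)

g_root = math.prod(multipliers) ** (1 / k)                  # k-th root of the product
g_log = math.exp(sum(math.log(x) for x in multipliers) / k)  # logarithm form
g_std = geometric_mean(multipliers)                          # standard library helper
arithmetic = sum(multipliers) / k

print(round(g_root, 4), round(g_log, 4), round(g_std, 4))  # all three agree
print(round(g_root ** k, 3))   # 1.386, compounding the average recovers the total growth
print(round(arithmetic, 4))    # 1.1167, slightly above the geometric mean
```

The logarithm form is usually preferable for long series, since the raw product of many values can overflow or underflow floating-point range.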

⚠️ Geometric mean requires all positive values

The geometric mean is only defined when all values are positive. If any value is zero or negative, the formula breaks down. For growth rates, make sure you are working with multipliers (e.g. 1.05 for 5% growth), not raw percentages.