Unit 8 Section 2 : Statistical Measures

In this section we recap the statistical measures mean, median, mode and range. The mean, median and mode give an indication of the 'average' value of a set of data, i.e. some idea of a typical value. The range, however, provides information on how spread out the data is, i.e. how varied it is.

Definition
Example
Mean = sum of all values ÷ number of values

For 1, 2, 2, 3, 4

Mean = (1 + 2 + 2 + 3 + 4) ÷ 5
= 12 ÷ 5
= 2.4
Mode = most common value

For 1, 2, 2, 3, 4
Mode = 2
For 1, 2, 2, 3, 4, 4, 5
Mode = 2 and 4

Median = middle value when data is arranged in order

For 1, 2, 2, 3, 4

Median = 2

For 1, 2, 2, 3, 4, 4

Median = (2 + 3) ÷ 2
= 2.5
Range = largest value – smallest value

For 1, 2, 2, 3, 4

Range = 4 – 1
= 3
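
The four measures recapped above can be checked with a short Python sketch. It relies on the standard library `statistics` module; the helper name `summarise` is only an illustrative choice, and the mode is returned as a list so that data with two modes is handled.

```python
from statistics import mean, median, multimode


def summarise(values):
    """Return the mean, mode(s), median and range of a list of numbers."""
    return {
        "mean": mean(values),
        "mode": multimode(values),   # list of every most common value
        "median": median(values),    # averages the two middle values if needed
        "range": max(values) - min(values),
    }


print(summarise([1, 2, 2, 3, 4]))
print(summarise([1, 2, 2, 3, 4, 4, 5])["mode"])   # two modes: 2 and 4
print(summarise([1, 2, 2, 3, 4, 4])["median"])    # average of 2 and 3
```

For 1, 2, 2, 3, 4 this gives mean 2.4, mode 2, median 2 and range 3, matching the examples above.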

In this section, we extend these basic ideas to grouped data.

Example 1

The shoe sizes for a class are summarised in the table shown.

Calculate:

Shoe Size Frequency
4 2
5 4
6 7
7 5
8 6
9 3
10 3
(a)

the mode,

6 (i.e. the size with highest frequency)
(b)

the median

There are 30 values altogether. Since 30 is even, there will be two central values. These will be the 15th and 16th values. From the frequency table, these are both 7. (You could list them all in order, but it is easy to see from the table that there are 13 values before the five '7' values are reached.)
So the median = (7 + 7) ÷ 2 = 7
(c)

the mean

The mean is the sum of all the data values divided by the total number of values, and is better calculated from the table by adding an extra 'frequency × size' column, as shown in the following table:

Size (x)   Frequency (f)   Frequency × Size (f x)
4          2               2 × 4 = 8
5          4               4 × 5 = 20
6          7               7 × 6 = 42
7          5               5 × 7 = 35
8          6               6 × 8 = 48
9          3               3 × 9 = 27
10         3               3 × 10 = 30
Total      30              210

The mean = total of (f x) ÷ total of f = 210 ÷ 30 = 7
(d)

the range

The range = highest value – lowest value
= 10 – 4
= 6
for this data
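
The four calculations in Example 1 can be reproduced from the frequency table alone. The following is a minimal Python sketch, assuming the table is stored as a dictionary that maps each shoe size to its frequency; the function names are hypothetical and chosen only for this illustration.

```python
# Shoe sizes and their frequencies, copied from the table in Example 1.
shoe_sizes = {4: 2, 5: 4, 6: 7, 7: 5, 8: 6, 9: 3, 10: 3}


def table_mode(table):
    """Value(s) with the highest frequency."""
    highest = max(table.values())
    return [x for x, f in table.items() if f == highest]


def table_median(table):
    """Median found by listing every value in order."""
    ordered = [x for x in sorted(table) for _ in range(table[x])]
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: average the two central values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def table_mean(table):
    """Total of (frequency x value) divided by total frequency."""
    total_fx = sum(x * f for x, f in table.items())
    total_f = sum(table.values())
    return total_fx / total_f


def table_range(table):
    """Largest value minus smallest value, ignoring values that never occur."""
    present = [x for x, f in table.items() if f > 0]
    return max(present) - min(present)


print(table_mode(shoe_sizes), table_median(shoe_sizes),
      table_mean(shoe_sizes), table_range(shoe_sizes))
```

The sketch gives mode 6, median 7, mean 7 and range 6, in agreement with parts (a) to (d).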

Note

If a data set contains n values then the median can be obtained as the ((n + 1) ÷ 2)th value.
If n is odd, this formula will pick out the value that you need. For example, if there are 157 data values then the median will be the ((157 + 1) ÷ 2)th value, i.e. the 79th value.
If n, the number of data values, is even, then the formula will pick out the two values that you need to average to obtain the median. In Example 1, we had n = 30 data values, so the median is the ((30 + 1) ÷ 2)th value, i.e. the 15.5th value. The '.5' tells us we need to average the 15th and 16th values, which is what we did to get the median 7.
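
The position rule in this note can be written as a small Python sketch. The function name `median_positions` is hypothetical; it returns the position (counting from 1) of the single middle value, or the two positions whose values must be averaged.

```python
def median_positions(n):
    """Positions, counted from 1, of the ordered value(s) that give the median."""
    position = (n + 1) / 2
    if position.is_integer():
        # n is odd: a single middle value.
        return [int(position)]
    # n is even: the '.5' means average the values either side.
    return [int(position - 0.5), int(position + 0.5)]


print(median_positions(157))  # [79]
print(median_positions(30))   # [15, 16]
```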

Example 2

The table shows the Morse code for 26 letters and how long it takes to send each letter.

If a letter is frequent we want to be able to send it quickly. The following table shows the 6 most frequent letters in 4 languages:

(a)

Complete the following table of the mean, median and modal sending times for the 6 most frequent letters in each language.

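Values to be entered in the table: English mean 4.7 and median 4; French mean 4.3 and median 5; Italian mode 5; Spanish mode 5.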
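English : mean time = (1 + 3 + 3 + 5 + 5 + 11) ÷ 6
= 28 ÷ 6
= 4.666...
= 4.7 (to 1 decimal place)
median time :
in order 1, 3, 3, 5, 5, 11
median = (3 + 5) ÷ 2
= 4
French : mean time = (1 + 3 + 5 + 5 + 5 + 7) ÷ 6
= 26 ÷ 6
= 4.333...
= 4.3 (to 1 decimal place)
median time :
in order 1, 3, 5, 5, 5, 7
median = (5 + 5) ÷ 2
= 5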
Italian : times are 1, 11, 5, 3, 5, 7, so modal time is 5.
Spanish : times are 1, 5, 11, 5, 7, 3, so modal time is 5.
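
As a check on part (a), the sending times quoted in the working above can be summarised in Python. This is only a sketch: the dictionary `times` and the function `summary` are illustrative names, and the Italian and Spanish means and medians are computed here from the listed times rather than quoted from the text.

```python
from statistics import mean, median, multimode

# Sending times for the 6 most frequent letters, as listed in the working.
times = {
    "English": [1, 3, 3, 5, 5, 11],
    "French": [1, 3, 5, 5, 5, 7],
    "Italian": [1, 11, 5, 3, 5, 7],
    "Spanish": [1, 5, 11, 5, 7, 3],
}


def summary(values):
    """Mean (to 1 decimal place), median and list of modes."""
    return round(mean(values), 1), median(values), multimode(values)


for language, values in times.items():
    print(language, summary(values))
```

On these figures the Italian and Spanish means both come out at 5.3 (to 1 decimal place), which supports the comparison made in part (b).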
(b)

Use your table in part (a) to decide which two languages are likely to send the quickest messages in Morse. Explain how you decided.

English and French, since their mean values are significantly lower than the mean values for Italian and Spanish.
(c)

Samuel Morse invented the code. Messages in his own language are quick to send. Look at the table of the 6 most frequent letters in each language.
Which one of these letters has a code which suggests that Samuel Morse's own language was English? Explain how you decided.

The letter T, which is the 2nd most frequently used letter in English, has a very short sending time, but is not in the top 6 for French, Italian or Spanish.

Note

Often data is provided in summary form, so that estimates have to be made to find the mean value.

Example 3

Data on the number of minutes that a particular train service was late have been summarised in the table. (Times are given to the nearest minute.)

Minutes Late Frequency
on time 19
1-5 12
6-10 9
11-20 4
21-40 4
41-60 2
over 60 0
(a)

How many journeys have been included?

Total number of journeys = 19 + 12 + 9 + 4 + 4 + 2 = 50
(b)

What is the modal group?

'On time'
(c)

Estimate the mean number of minutes the train is late for these journeys.

It is more convenient to use a table for this calculation; for each 'group', the midpoint is used for the calculation (this is why it is an estimate and not an exact value).
Minutes Late   Midpoint (x)   Frequency (f)   (f x)
On time        0              19              0
1-5            3              12              36
6-10           8              9               72
11-20          15.5           4               62
21-40          30.5           4               122
41-60          50.5           2               101
Total                         50              393
(Note that, because the times in the table are given to the nearest minute, the class described as '11-20' actually means 10.5 ≤ T < 20.5. This class has width 10 minutes, so half way will be 5 minutes after the start point 10.5, so the midpoint = 10.5 + 5 = 15.5.)
Mean value = 393 ÷ 50 = 7.86 minutes
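
The midpoint method for part (c) can be sketched in Python as follows. Each group is stored as a hypothetical (lowest, highest, frequency) tuple using the whole-minute labels from the table; the midpoint of the labels, e.g. (11 + 20) ÷ 2 = 15.5, is the same as the midpoint of the true class 10.5 ≤ T < 20.5. The empty 'over 60' group is left out because it has no upper limit and contributes nothing.

```python
# (lowest, highest, frequency) for each group; 'on time' is treated as 0 minutes.
groups = [
    (0, 0, 19),
    (1, 5, 12),
    (6, 10, 9),
    (11, 20, 4),
    (21, 40, 4),
    (41, 60, 2),
]


def estimate_mean(groups):
    """Estimate the mean by assuming every value sits at its group midpoint."""
    total_fx = sum((low + high) / 2 * f for low, high, f in groups)
    total_f = sum(f for _, _, f in groups)
    return total_fx / total_f


print(round(estimate_mean(groups), 2))  # 7.86
```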
(d)

Which of the two averages, mode and mean, would the train company like to use in advertising its service? Why does this give a false impression of the likelihood of being late?

Clearly 'on time', the modal average, would give a better impression, but it would be misleading, as 31 of the 50 trains (62%) were in fact late!
(e)

Estimate the probability of a train being more than 20 minutes late on this service.

Estimate = 6 ÷ 50 = 0.12 = 12% (4 + 2 = 6 of the 50 journeys were more than 20 minutes late)
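
The same estimate can be obtained as a relative frequency in Python. This is a minimal sketch; the names `frequencies`, `late_groups` and `probability` are chosen only for this illustration.

```python
from fractions import Fraction

# Frequencies from the table in Example 3.
frequencies = {
    "on time": 19, "1-5": 12, "6-10": 9, "11-20": 4,
    "21-40": 4, "41-60": 2, "over 60": 0,
}

# Groups in which the train is more than 20 minutes late.
late_groups = ["21-40", "41-60", "over 60"]

probability = Fraction(
    sum(frequencies[g] for g in late_groups),
    sum(frequencies.values()),
)

print(probability, float(probability))  # 3/25 0.12
```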

Exercises

Question 1

The number of days absence for each pupil in a class is summarised in the table.

Calculate:

No. of Days Absent Frequency
0 10
1 11
2 5
3 0
4 2
5 1
6 1
more than 6 0
(a)

the mode,

day(s)
(b)

the median

day(s)
(c)

the mean

day(s)
(d)

the range

day(s)
for the data
Question 2

A new minibus service to the nearest town is provided for an isolated village. The number of people using the service during the first month of operation is summarised in the table.

No. of People Using the Service Frequency
0 5
1 4
2 3
3 4
4 6
5 2
6 2
7 3
8 1
9 1
10 0
(a)

Calculate:

(i)

the mode,

(ii)

the median,

(iii)

the mean.

(b)

Which of these average values gives the best justification for continuing the service?

It could be criticised as not giving a fair representation of the use of the minibus service because it takes no account of daily variations.
Question 3

A machine in a youth club sells snacks as listed in the following table.

Len writes down the amounts of money which different people spend one evening during each hour that the club is open:

(a)

Is Len correct when he says that the mode of the amounts of money spent is 40p?

40p is the amount spent most often.
(b)

Fill in the column for 7 p.m. to 8 p.m. in the chart below. Then fill in the column for the total number of people who spent each amount.

(c)

Len says: "Now 50p to 99p is the mode."
Is Len right?

'50p to 99p' is the modal class for the way he has grouped the data.
(d)

Look at where the tally marks are on the chart. What do you notice about the amounts of money people spent at different times in the evening? Give a reason which could explain the difference you notice.

E.g. Maybe older people with more money to spend attend later in the evening.
Question 4

This graph shows the range of temperature in Miami each month. For example, in January the temperature ranges from 17 °C to 24 °C.

(a)

In which month does Miami have the smallest range?

(b)

In July, the range in the temperature in Miami is 5 °C. There are five other months in which the range in the temperature is 5 °C. Which five months are they?

(c)

This graph shows the range in the temperature in Orlando each month. In which three months is the maximum temperature in Miami greater than the maximum temperature in Orlando?

Question 5

The pupils in five classes did a quiz. The graphs show the scores in each class. Each class had a mean score of 7. In three of the classes, 80% of the pupils got more than the mean score.

(a)

In which three classes did 80% of the pupils score more than 7 ?

Classes
(b)

Look at the graphs which show that 80% of the pupils scored more than 7. Some of the statements below are true when 80% of the pupils scored more than 7.
Write down the letter for each of the statements below which is true.

A : All of the pupils scored at least 2.
B : Most of the pupils scored at least 8.
C : Most of the pupils scored at least 10.
D : Some of the pupils scored less than 6.

(c)

In another quiz the mean score was 6. Complete the following graph to show a mean score of 6.

Question 6

A school has 5 Year groups. 80 pupils from the school took part in a sponsored swim. Lara and Jack drew these graphs.

(a)

Look at Lara's graph. Did Year 10 have fewer pupils taking part in the swim than Year 7 ?

Graph shows only the total number of lengths swum by each year group, not how many pupils swam them.
(b)

Use Lara's graph to work out the mean number of lengths swum by each of the 80 pupils.

Mean = total number of lengths ÷ number of pupils = 830 ÷ 80 = 10.375 lengths/pupil
(c)

Use Jack's graph to work out the mean number of lengths swum by each of the 80 pupils.

Mean = lengths

Note that the means calculated from Lara's graph and Jack's graph are different.
Jack's graph uses middle values from grouped data (only approximate) whereas Lara's graph is accurate because it uses actual totals of lengths swum by year groups.

Question 7

A customer at a supermarket complains to the manager about the waiting times at the check-outs. The manager records the waiting times of 100 customers at check-out 1.

(a)

Use the graph to estimate the probability that a customer chosen at random will wait for 2 minutes or longer.

(b)

Use the graph to estimate the probability that a customer chosen at random will wait for 2.5 minutes or longer.

(c)

Calculate an estimate of the mean waiting time per customer. Show your working. Complete the table below to help you with the calculation.

Mean waiting time = minutes.
Question 8

A company makes breakfast cereal containing nuts and raisins. They counted the number of nuts and raisins in 100 small packets.

(a)

Calculate an estimate of the mean number of nuts in a packet. Complete the table below to help you with the calculation.

Mean number of nuts in a packet:
(b)

Calculate an estimate of the number of packets that contain 24 or more raisins.

packets
(c)

Which of the two charts shows the greater range? Explain your answer.

Ranges cannot be worked out exactly as the original raw data has been grouped. However, the range for nuts could only be as high as 18 – 4 = 14, whilst the range for raisins could be as high as 30 – 6 = 24 but only as low as 26 – 10 = 16. This shows that the chart for raisins (chart B) exhibits the greater range.
(d)

A packet is chosen at random. Calculate the probability that it contains 9 nuts or fewer.

(e)

The number of raisins in a packet is independent of the number of nuts. A packet is chosen at random. Calculate the probability that it contains 16 to 18 nuts and 6 to 10 raisins. Show your working.