Who's Yambakam - a deep dive into my external self | Numerical Summary Part-Ⅰ: Understanding Measures of Central Tendency

“If your boss asks you a report on this quarter’s sales numbers and is rushing to a meeting and has time to listen to one piece of information about that data, that piece of information you give her should probably be a measure of central tendency,” says Adriene Hill of Crash Course Statistics.

    A measure of central tendency is a single value that describes the way in which a group of data clusters around a central value. In other words, it is a way to describe the center of a data set.
        - It condenses the data set down to one representative value.
        - It also allows you to compare one data set with another, or a single data point with the entire data set.

There are three measures of central tendency: the mean, the median and the mode.

Next, we will look at each of them in more detail.

Working with samples

A sample is a set of data points drawn from a given population that represents the population as a whole.

Say you want to study advertisements for IT jobs in the Netherlands, and you collect the top 50 search results for IT job ads in the Netherlands on May 1, 2020. This is not the whole population, but a small sample that represents it.

Mean: The average value

The mean, commonly known as the average, is simply the sum of all data points divided by the number of data points in the sample.

For example, a cricketer’s scores in five ODI matches are 12, 34, 45, 50 and 24. The arithmetic mean of the data is given by

$\textbf{Mean},\;\bar{x} = \frac{sum\;of\;all\;observations}{total\;no.\;of\;observations} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \frac{x_{1}+x_{2}+\cdots+x_{n}}{n}$

In this case,

$\bar{x} = \frac{12+34+45+50+24}{5} = \frac{165}{5} = 33$

So, on average, the cricketer scores 33 runs per match.
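The same calculation can be checked with a few lines of Python. This is a minimal sketch added for illustration, not code from the original post: it computes the mean once with the sum-over-count formula and once with the standard library’s `statistics.mean`.

```python
from statistics import mean

# Cricketer's scores in five ODI matches, from the example above
scores = [12, 34, 45, 50, 24]

# Arithmetic mean: sum of all observations divided by their number
manual_mean = sum(scores) / len(scores)

print(manual_mean)   # 33.0
print(mean(scores))  # 33
```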

Wait a minute…

Did you notice that I said arithmetic mean? That means there are other kinds of mean.

Yes, there are others, which we won’t use as often. But let’s take a quick look at them.

Geometric Mean:

The geometric mean is a type of average in which we multiply the numbers together and then take the square root (for two numbers), the cube root (for three numbers), and in general the nth root for n numbers. It is given by

$\bar{x} = \left(\prod_{i=1}^{n}x_{i}\right)^{\frac{1}{n}} = \sqrt[n]{x_1 x_2 \cdots x_n}$

The geometric mean is useful when values combine multiplicatively, such as growth rates or returns over several periods, and when we want to compare quantities measured on very different scales.

Consider a stock that grows by 10% in year one, declines by 20% in year two, and then grows by 30% in year three. The geometric mean of the yearly growth factors is calculated as follows:

$\bar{x} = \sqrt[3]{(1+0.1)(1-0.2)(1+0.3)} = \sqrt[3]{1.144} \approx 1.0459$

This implies an average growth rate of about 4.6% per year.
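A short Python sketch (an illustration, not from the original post) reproduces this result. It converts each yearly change into a growth factor, takes the cube root of their product, and cross-checks against `statistics.geometric_mean`, which is available from Python 3.8 onwards.

```python
import math
from statistics import geometric_mean

# Yearly changes written as multiplicative growth factors:
# +10% -> 1.10, -20% -> 0.80, +30% -> 1.30
factors = [1 + 0.10, 1 - 0.20, 1 + 0.30]

# Geometric mean: n-th root of the product of the n values
gm_manual = math.prod(factors) ** (1 / len(factors))

# The standard library computes the same measure directly
gm_lib = geometric_mean(factors)

print(round(gm_manual, 4))              # 1.0459
print(f"{(gm_manual - 1) * 100:.1f}%")  # 4.6% average growth per year
```

For comparison, the arithmetic mean of the three rates, (10% - 20% + 30%)/3 ≈ 6.7%, would overstate the compound growth: 1.0667³ ≈ 1.214, not the actual 1.144.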

Harmonic Mean:

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals.

It is used to average data that combine two units of measurement, such as speeds in kilometers per hour.

In particular cases involving rates and ratios, such as the average speed over equal distances, the harmonic mean gives the correct value where the arithmetic mean does not. It is given by

$HM = \frac{n}{\sum_{i=1}^{n}\frac{1}{x_{i}}} = \frac{n}{ \frac{1}{x_{1}}+\frac{1}{x_{2}}+\cdots +\frac{1}{x_{n}} }$

Say I travel 10 km at 60 km/h, then another 10 km at 20 km/h. What is my average speed?

$HM = \frac{2}{\frac{1}{60}+\frac{1}{20}} = \frac{2}{\frac{1}{15}} = 2 \cdot 15 = 30\;km/h$

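Yes! You got it right: the average speed is 30 kilometers per hour. As a check, the trip covers 20 km in 10/60 + 10/20 = 2/3 of an hour, and 20 ÷ (2/3) = 30 km/h. The arithmetic mean, (60 + 20)/2 = 40 km/h, would overstate it.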
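The harmonic mean is also built into Python’s standard library. The sketch below is illustrative only (the variable names are my own): it computes the mean with `statistics.harmonic_mean` and confirms it against total distance divided by total time.

```python
from statistics import harmonic_mean, mean

speeds = [60, 20]   # km/h on each leg
leg_distance = 10   # km, the same for both legs

hm = harmonic_mean(speeds)

# Cross-check: average speed = total distance / total time
total_time = sum(leg_distance / v for v in speeds)  # in hours
true_average = leg_distance * len(speeds) / total_time

print(round(hm, 6), round(true_average, 6))  # 30.0 30.0
print(mean(speeds))  # 40, the arithmetic mean overstates the speed
```

The cross-check only agrees with the harmonic mean because both legs cover the same distance; with unequal distances, a weighted harmonic mean would be needed.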

Outliers and their effect:

Some electric utility companies allow homeowners to pay their electric bills through budget billing. The budget billing amount is derived from the average electric bill over a certain period, usually the prior 12 months.

An electric company such as PPL Electric takes the twelve monthly bills (Jan-Dec) of the prior year, adds them up and divides the total by 12. For example, the bills for the prior 12 months are shown below:

Month	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
Amount	$135	$145	$112	$101	$98	$87	$116	$121	$113	$107	$126	$131

What is the average of this homeowner’s electric bill for this given year?

$Average\;Bill = \frac{135+145+112+101+98+87+116+121+113+107+126+131}{12}$

$= \frac{1392}{12} = 116$

The average electric bill for this house is $116.

What would happen if a heat wave hit in August and the family’s electrical bill for the month increased to $449? By how much would this affect their budgeted amount?

Now the calculation would be

$Average\;Bill = \frac{135+145+112+101+98+87+116+\textbf{449}+113+107+126+131}{12}$

$= \frac{1720}{12} \approx 143.33$

With the heat wave, the average electric bill for this house rises to about $143, roughly $27 more per month because of a single unusual bill.

Here, the abnormal August bill is called an outlier.

In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error.

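The mean is not robust to outliers: a single extreme value can pull it far away from where most of the data lie.

Hence we use another measure, called the median, which is much less sensitive to outliers.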
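To see the difference on the electric-bill example, here is a small Python sketch (illustrative, using the standard `statistics` module). The single $449 bill raises the mean by about $27, while the median stays at $114.50 because the two middle bills, $113 and $116, do not change.

```python
from statistics import mean, median

# Monthly bills (Jan-Dec) in dollars, from the table above
bills = [135, 145, 112, 101, 98, 87, 116, 121, 113, 107, 126, 131]

# Same year with August (index 7) replaced by the heat-wave bill
heat_wave_bills = bills.copy()
heat_wave_bills[7] = 449

for label, data in [("normal year", bills), ("heat wave", heat_wave_bills)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data):.2f}")
# normal year: mean = 116.00, median = 114.50
# heat wave: mean = 143.33, median = 114.50
```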

Median: The middle value of the data

In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample.

So the median value is given by

$Median,\;M = \begin{cases} X\left(\frac{n+1}{2}\right) & \text{if } n \text{ is odd} \\ \frac{X\left(\frac{n}{2}\right)+X\left(\frac{n}{2}+1\right)}{2} & \text{if } n \text{ is even} \end{cases}$

where n is the number of observations and X(k) is the k-th value after the data are sorted in ascending order.

Say the numbers of pizza slices eaten by each person at Thorton’s birthday party were 4, 5, 3, 2, 13, 1, 3.

What is the middle value, and what does it tell us?

To calculate the median, we first sort the numbers: 1, 2, 3, 3, 4, 5, 13

$Median = X\left(\frac{7+1}{2}\right) = X(4) = 3$

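So half of the people ate 3 slices or fewer, and the other half ate 3 or more. Notice that the 13 does not affect the median at all, whereas the mean, 31/7 ≈ 4.43 slices, is pulled upward by it.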

That’s it! Wait a minute…

Some time later, Thorton was checking his birthday stats and was surprised: 4, 5, 3, 2, 13, 1, 3.

Who ate 13 slices?

Then we found that two people with the same name had been recorded as a single person; they ate 6 and 7 slices respectively.

Now, arranging the data points again gives 1, 2, 3, 3, 4, 5, 6, 7. Since n = 8 is even, the median is given by

$Median = \frac{X\left(\frac{8}{2}\right)+X\left(\frac{8}{2}+1\right)}{2} = \frac{X(4)+X(5)}{2} = \frac{3+4}{2} = 3.5$

So the central value is now 3.5 (although no one ate 3.5 slices).
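The piecewise formula translates directly into code. Below is a minimal Python sketch; `median_from_formula` is a hypothetical helper written for this illustration, not a library function. Because X(k) in the formula counts from 1 while Python lists count from 0, X(k) becomes `xs[k - 1]`.

```python
from statistics import median


def median_from_formula(values):
    """Median via the piecewise formula: X(k) is the k-th smallest value."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]           # X((n+1)/2)
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # (X(n/2) + X(n/2+1)) / 2


original = [4, 5, 3, 2, 13, 1, 3]     # 13 slices recorded for "one" person
corrected = [4, 5, 3, 2, 6, 7, 1, 3]  # the 13 split into 6 and 7

print(median_from_formula(original))   # 3
print(median_from_formula(corrected))  # 3.5

# The standard library agrees
print(median(original), median(corrected))  # 3 3.5
```

Sorting first is essential: applying the index formula to the unsorted list would simply return whatever value happens to sit in the middle position.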

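Because a median can be a value nobody actually recorded, this is where the mode comes into the picture: it always reports a value that actually occurs in the data.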

Mode: The most frequent value

In statistics, the mode is the value that occurs with the highest frequency in a given set of values, that is, the value that appears the most times.

Below are the favorite ice cream flavors of 200 children, chosen from vanilla, chocolate and strawberry and encoded as 1, 2 and 3 respectively.

1, 3, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 1, 1, 3, 1, 1, 2, 1, 3, 1, 2, 1, 2, 1, 2, 1, 3, 2, 3, 1, 2, 3, 3, 1, 1, 2, 3, 1, 1, 3, 3, 1, 3, 1, 2, 1, 3, 2, 3, 3, 1, 1, 2, 2, 1, 1, 2, 2, 1, 3, 2, 1, 3, 1, 2, 1, 3, 3, 1, 2, 1, 1, 2, 1, 3, 2, 2, 2, 1, 1, 3, 2, 1, 1, 3, 2, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 1, 1, 1, 1, 1, 3, 3, 1, 1, 2, 2, 3, 1, 3, 3, 2, 2, 1, 3, 1, 1, 1, 2, 1, 3, 2, 2, 3, 3, 2, 3, 3, 1, 1, 1, 1, 3, 2, 1, 2, 2, 1, 2, 3, 1, 1, 3, 1, 1, 3, 1, 3, 3, 1, 1, 3, 1, 1, 3, 3, 3, 3, 1, 3, 3, 2, 1, 2, 2, 1, 3, 3, 1, 2, 3, 1, 2, 2, 1, 2, 1, 2, 3, 3, 1, 2, 1, 1, 1, 2

What is the most liked flavor?

Yes! You were right. But how do we calculate the mode?

Unlike the mean and the median, the mode has no formula; we find it by counting how often each value occurs.

So let’s create a simple frequency table:

Flavor	Vanilla	Chocolate	Strawberry
Frequency	95	47	58

Hence vanilla is the mode: more children chose it than any other flavor, though at 95 out of 200 it is not an outright majority.
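Counting 200 values by hand is tedious and error-prone, so here is a minimal Python sketch (an illustration, not from the original post) that builds the frequency table with `collections.Counter` and reads off the mode with `Counter.most_common`. The `names` dictionary just restates the encoding given earlier.

```python
from collections import Counter

# Favorite flavor of 200 children: 1 = vanilla, 2 = chocolate, 3 = strawberry
responses = [
    1, 3, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 1, 1,
    3, 1, 1, 2, 1, 3, 1, 2, 1, 2, 1, 2, 1, 3, 2, 3, 1, 2, 3, 3,
    1, 1, 2, 3, 1, 1, 3, 3, 1, 3, 1, 2, 1, 3, 2, 3, 3, 1, 1, 2,
    2, 1, 1, 2, 2, 1, 3, 2, 1, 3, 1, 2, 1, 3, 3, 1, 2, 1, 1, 2,
    1, 3, 2, 2, 2, 1, 1, 3, 2, 1, 1, 3, 2, 1, 1, 1, 1, 1, 3, 3,
    1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 1, 1, 1, 1, 1, 3, 3, 1, 1, 2,
    2, 3, 1, 3, 3, 2, 2, 1, 3, 1, 1, 1, 2, 1, 3, 2, 2, 3, 3, 2,
    3, 3, 1, 1, 1, 1, 3, 2, 1, 2, 2, 1, 2, 3, 1, 1, 3, 1, 1, 3,
    1, 3, 3, 1, 1, 3, 1, 1, 3, 3, 3, 3, 1, 3, 3, 2, 1, 2, 2, 1,
    3, 3, 1, 2, 3, 1, 2, 2, 1, 2, 1, 2, 3, 3, 1, 2, 1, 1, 1, 2,
]

names = {1: "Vanilla", 2: "Chocolate", 3: "Strawberry"}

# Frequency table: how many times each code occurs
counts = Counter(responses)
for code in sorted(counts):
    print(f"{names[code]}: {counts[code]}")

# The mode is the code with the highest count
mode_code, mode_count = counts.most_common(1)[0]
print(f"Mode: {names[mode_code]} ({mode_count} children)")  # Mode: Vanilla (95 children)
```

If two flavors tied for the highest count, the data would have more than one mode; `most_common(1)` would then report only one of them, so checking the full table is worthwhile.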

The mode is most often used for categorical data like these flavors, where averaging the codes 1, 2 and 3 would be meaningless.

This brings us to the end of the topic. In the next part, we will look at variability in a sample.