Measures of position

Measures of position provide information about the data series we are analyzing

An important element in describing a dataset is where the data lie within the range of values they could take

Once the basic concepts of the frequency distribution of a variable have been defined, we study the different ways of summarizing these distributions with position (or central tendency) measures, bearing in mind that the error made in each summary is quantified by the corresponding dispersion measures

The aim is to find measures that synthesize frequency distributions. Instead of handling all the data of the variables, which can be cumbersome, we can characterize their frequency distribution by a few numerical values, choosing as a summary of the data a value around which the values of the variable are distributed

Measures of central position

Measures of central position, or averages, are values around which the values of the variable cluster and which summarize the location of the distribution on the horizontal axis. They also help us condense the information provided by the values of the variable

Of the measures of central position, the most commonly used are the arithmetic mean, the median and the mode. In some specific cases, the harmonic mean or the geometric mean is used

Arithmetic mean

The arithmetic mean, \overline{x}, is defined as the sum of all observed values divided by the total number of observations:

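That is, if the variable takes the values x_1, \ldots, x_k with absolute frequencies n_1, \ldots, n_k, and N=\sum\limits_{i=1}^{k} n_i is the total number of observations: \overline{x}=\frac{x_1\cdot n_1+\cdots+x_k\cdot n_k}{N}=\frac{\sum\limits_{i=1}^{k} (x_i\cdot n_i)}{N}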

This is the most commonly used average in practice because of the following advantages:

  • It takes into account all the observed values
  • It is easy to calculate and has a clear statistical interpretation
  • It is unique

However, its disadvantage is that it is strongly influenced by the extreme values of the distribution

The trimmed mean is obtained by calculating the mean of the observed values after a certain percentage of the extreme values (the same percentage at each end) has been removed

It is often used to calculate the mean of a variable that we know, or suspect, contains extreme values, since these can "pull" the mean towards them
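A minimal Python sketch of the trimmed mean follows. The function name `trimmed_mean` and the sample data are hypothetical, and it assumes that the number of observations removed from each end is rounded down when the percentage does not give a whole number (other rounding conventions exist).

```python
import math

def trimmed_mean(values, proportion):
    """Mean of `values` after removing `proportion` of the observations
    from EACH end of the sorted data (e.g. 0.1 drops the lowest 10% and
    the highest 10%). Assumption: the count removed per side is rounded down."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    cut = math.floor(len(data) * proportion)   # observations dropped per side
    kept = data[cut:len(data) - cut]
    return sum(kept) / len(kept)

# Hypothetical data with one extreme value (40)
sample = [2, 3, 3, 4, 4, 5, 5, 40]
print(sum(sample) / len(sample))       # 8.25, pulled up by the 40
print(trimmed_mean(sample, 0.125))     # 4.0, one value dropped from each end
```

Removing the same share from both ends keeps the trimming symmetric, so a single extreme value on one side no longer drags the result.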

Properties of the arithmetic mean

  1. The sum of the deviations (signed differences) of the values of the variable from their arithmetic mean, each weighted by its frequency, is equal to zero:

    \sum\limits_{i=1}^{k} (x_i-\overline{x})\cdot n_i=\sum\limits_{i=1}^{k} (x_i\cdot n_i)-\overline{x}\cdot \sum\limits_{i=1}^{k} n_i=N\cdot\overline{x}-N\cdot\overline{x}=0

  2. The mean is affected by changes of origin and scale. If u_i=a+b\cdot x_i, for any values a and b with b nonzero (which is equivalent to a change of origin and scale), the arithmetic mean of u can be expressed as: \overline{u}=a+b\cdot\overline{x}

    The proof is straightforward:

    \overline{u}=\frac{\sum\limits_{i=1}^{k} (u_i\cdot n_i)}{N}=\frac{\sum\limits_{i=1}^{k} (a+b\cdot x_i)\cdot n_i}{N}=\frac{a}{N}\cdot \sum\limits_{i=1}^{k} n_i+\frac{b}{N}\cdot \sum\limits_{i=1}^{k} (x_i\cdot n_i)=\frac{a\cdot N}{N}+\frac{b}{N}\cdot \sum\limits_{i=1}^{k} (x_i\cdot n_i)=a+b\cdot\overline{x}

    With a convenient choice of a and b, this property is often very useful for simplifying the calculation of the arithmetic mean
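Both properties can be checked numerically. The sketch below uses a hypothetical frequency table and Python's `fractions.Fraction` so the sums are exact. `weighted_mean` is an illustrative helper, not part of any library. It also shows how the change of origin and scale u_i=(x_i-1002)/3 (that is, a=-1002/3 and b=1/3) turns awkward values into 0, 1, 2, and how the original mean is recovered from the simpler one.

```python
from fractions import Fraction

def weighted_mean(values, freqs):
    """Arithmetic mean of a frequency table: sum(x_i * n_i) / N, kept exact."""
    n_total = sum(freqs)
    return Fraction(sum(x * n for x, n in zip(values, freqs)), n_total)

# Hypothetical frequency table, for illustration only
xs = [1002, 1005, 1008]
ns = [3, 5, 2]
mean_x = weighted_mean(xs, ns)

# Property 1: the frequency-weighted deviations from the mean add up to zero
assert sum((x - mean_x) * n for x, n in zip(xs, ns)) == 0

# Property 2: change of origin and scale u_i = a + b*x_i with a = -1002/3, b = 1/3,
# i.e. u_i = (x_i - 1002) / 3, which turns the values into 0, 1, 2
a, b = Fraction(-1002, 3), Fraction(1, 3)
us = [a + b * x for x in xs]
mean_u = weighted_mean(us, ns)
assert mean_u == a + b * mean_x

# Undo the transformation to recover the original mean from the easy one
recovered = (mean_u - a) / b
print(float(mean_u), float(recovered))   # 0.9 1004.7
```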

Example of arithmetic mean

In a vaccination campaign, the number of people vaccinated per hour over the course of 50 hours was:

0, 3, 2, 2, 1, 4, 5, 2, 3, 2, 1, 0, 4, 3, 5, 3, 1, 4, 6, 1, 2, 3, 0, 4, 4, 5, 3, 1, 4, 2, 3, 1, 0, 6, 3, 2, 5, 3, 2, 3, 6, 2, 2, 5, 7, 4, 2, 7, 4, 2

We want to calculate the average number of people vaccinated per hour over those 50 hours

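Before calculating the mean, we group the results into a frequency table, where x_i is the number of people vaccinated in an hour, n_i its absolute frequency (number of hours), f_i=\frac{n_i}{N} its relative frequency, and N_i and F_i the cumulative absolute and relative frequencies: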

x_i   n_i   f_i    N_i   F_i
0     4     0.08   4     0.08
1     6     0.12   10    0.20
2     12    0.24   22    0.44
3     10    0.20   32    0.64
4     8     0.16   40    0.80
5     5     0.10   45    0.90
6     3     0.06   48    0.96
7     2     0.04   50    1.00
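The grouping step can be sketched in Python with `collections.Counter`. The helper `frequency_table` is hypothetical; it returns one tuple (x_i, n_i, f_i, N_i, F_i) per distinct value, reproducing the table from the 50 raw observations.

```python
from collections import Counter

# Raw data from the vaccination example: people vaccinated in each of 50 hours
hours = [0, 3, 2, 2, 1, 4, 5, 2, 3, 2, 1, 0, 4, 3, 5, 3, 1, 4, 6, 1,
         2, 3, 0, 4, 4, 5, 3, 1, 4, 2, 3, 1, 0, 6, 3, 2, 5, 3, 2, 3,
         6, 2, 2, 5, 7, 4, 2, 7, 4, 2]

def frequency_table(data):
    """Return rows (x_i, n_i, f_i, N_i, F_i) sorted by x_i."""
    counts = Counter(data)
    total = len(data)
    rows, cumulative = [], 0
    for x in sorted(counts):
        n = counts[x]
        cumulative += n                      # running N_i
        rows.append((x, n, n / total, cumulative, cumulative / total))
    return rows

table = frequency_table(hours)
for x, n, f, cum_n, cum_f in table:
    print(f"{x}  {n:>2}  {f:.2f}  {cum_n:>2}  {cum_f:.2f}")
```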

We calculate the arithmetic mean:

\overline{x}=\frac{\sum\limits_{i=1}^{k} (x_i\cdot n_i)}{N}=\frac{0 \cdot 4 + 1 \cdot 6 + 2 \cdot 12 + 3 \cdot 10 + 4 \cdot 8 + 5 \cdot 5 + 6 \cdot 3 + 7 \cdot 2}{50}=\frac{149}{50}=2.98\simeq 3

Therefore, the average number of people vaccinated per hour in that 50-hour interval was 2.98, or approximately 3 after rounding
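As a sketch, the same calculation can be done from the frequency table, written as a Python dictionary that maps each x_i to its n_i. The name `mean_from_table` is illustrative.

```python
# Frequency table from the vaccination example: x_i -> n_i
freq = {0: 4, 1: 6, 2: 12, 3: 10, 4: 8, 5: 5, 6: 3, 7: 2}

def mean_from_table(freq):
    """Arithmetic mean sum(x_i * n_i) / N of a frequency table."""
    n_total = sum(freq.values())
    return sum(x * n for x, n in freq.items()) / n_total

mean = mean_from_table(freq)
print(mean, round(mean))   # 2.98 3
```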

Median

The median is the value of the variable that, once the observations are sorted from lowest to highest, divides the distribution into two parts with the same number of observations

Its advantage over the mean is that it is less sensitive to extreme values

Example of median

Continuing with the vaccination campaign example, we now want to calculate the median

The frequency table shows N = 50 observations. Once they are sorted, the median occupies position \frac{N+1}{2}:

\frac{50+1}{2}=25.5

Since N is even, this position is not a whole number, so the median is the average of the two central observations, those in positions 25 and 26. If N were odd, \frac{N+1}{2} would be a whole number giving the position of the single central observation

In the cumulative absolute frequency column (N_i), positions 23 to 32 correspond to x_i = 3 (N_i goes from 22 to 32), so the observations in positions 25 and 26 are both 3

Now we calculate the median value: Me=\frac{3+3}{2}=3

Therefore, in half of those 50 hours 3 or fewer people were vaccinated, and in the other half 3 or more
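The position-based procedure can be sketched as follows. The hypothetical `median_from_table` walks the cumulative absolute frequencies N_i to find the value at each central position, and averages the two central values when N is even.

```python
def median_from_table(freq):
    """Median of a frequency table {x_i: n_i}, averaging the two central
    observations when N is even."""
    n_total = sum(freq.values())
    # 1-based positions of the central observation(s)
    if n_total % 2 == 1:
        positions = [(n_total + 1) // 2]
    else:
        positions = [n_total // 2, n_total // 2 + 1]

    def value_at(position):
        cumulative = 0
        for x in sorted(freq):
            cumulative += freq[x]          # N_i, the cumulative absolute frequency
            if cumulative >= position:     # first x_i whose N_i reaches the position
                return x
        raise ValueError("position beyond N")

    central = [value_at(p) for p in positions]
    return sum(central) / len(central)

# Frequency table from the vaccination example: x_i -> n_i
freq = {0: 4, 1: 6, 2: 12, 3: 10, 4: 8, 5: 5, 6: 3, 7: 2}
print(median_from_table(freq))   # 3.0
```

For raw (ungrouped) data, Python's standard `statistics.median` gives the same result.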

Mode

The mode is the value of the variable whose frequency is not exceeded by that of any other value, that is, the most frequent value

The maximum frequency may correspond to two or more values of the variable; in that case, the distribution is said to be bimodal or multimodal

Example of mode

Continuing with the vaccination campaign example, we now want to calculate the mode

Looking at the absolute frequency column (n_i), the largest value is 12, which corresponds to x_i = 2

Therefore, the most frequent number of people vaccinated per hour in that 50-hour interval was 2
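A small sketch of the mode that also covers the bimodal and multimodal case: it returns every value that reaches the maximum frequency. The helper `modes_from_table` and the second table are hypothetical. For raw data, the standard library's `statistics.multimode` serves a similar purpose.

```python
def modes_from_table(freq):
    """Return every value whose frequency is not exceeded by any other.

    One element means a unimodal distribution; two or more means it is
    bimodal or multimodal.
    """
    top = max(freq.values())
    return sorted(x for x, n in freq.items() if n == top)

# Frequency table from the vaccination example: x_i -> n_i
freq = {0: 4, 1: 6, 2: 12, 3: 10, 4: 8, 5: 5, 6: 3, 7: 2}
print(modes_from_table(freq))                 # [2]
print(modes_from_table({1: 3, 2: 1, 4: 3}))   # [1, 4], a bimodal table
```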

Harmonic mean

The harmonic mean is defined as: Ma(X)=\frac{N}{\frac{n_1}{x_1}+\cdots+\frac{n_k}{x_k}}=\frac{N}{\sum\limits_{i=1}^{k} \frac{n_i}{x_i}}

The advantages of this average are:

  • It is unique
  • Uses all the observed values of the variable

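Its disadvantage is that it is strongly influenced by values of the variable close to zero, and it cannot be calculated if any value is zero

This average is used for variables that measure speeds, yields and, in general, for variables that are the ratio of two magnitudes when the magnitude in the numerator is the same for every observation (for example, speeds over equal distances)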

Example of harmonic mean

A cyclist completes a training session consisting of 12 series of 1 km, each ridden at a constant speed. The data from the session are shown in the following table:

Series Speed (km/h)
1 54
2 47
3 46
4 50
5 52
6 47
7 51
8 52
9 49
10 51
11 47
12 50

We want to calculate the cyclist's average speed during the training session

The arithmetic mean cannot be applied because the variable is the ratio of two magnitudes (V=\frac{e}{t}) and every series covers the same distance; in this case, the harmonic mean must be applied

Taking each series as one observation (n_i=1, so N=12):

Ma(X)=\frac{N}{\sum\limits_{i=1}^{k} \frac{n_i}{x_i}}=\frac{12}{\frac{1}{54}+\frac{1}{47}+\frac{1}{46}+\frac{1}{50}+\frac{1}{52}+\frac{1}{47}+\frac{1}{51}+\frac{1}{52}+\frac{1}{49}+\frac{1}{51}+\frac{1}{47}+\frac{1}{50}}\simeq 49.55139

Therefore, the cyclist's average speed over the 12 series was approximately 49.55139 km/h
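A short Python sketch of the calculation, treating each series as one observation with frequency 1 (equivalent to grouping repeated speeds with their n_i). It also checks the physical interpretation: 12 km divided by the total time spent, \sum 1/v_i hours, gives the same value. The helper `harmonic_mean` is hypothetical; the standard library's `statistics.harmonic_mean` is used only as a cross-check.

```python
import statistics

# Speeds (km/h) of the 12 one-kilometre series from the example
speeds = [54, 47, 46, 50, 52, 47, 51, 52, 49, 51, 47, 50]

def harmonic_mean(values):
    """N divided by the sum of the reciprocals of the values (all must be non-zero)."""
    return len(values) / sum(1 / v for v in values)

h = harmonic_mean(speeds)

# Physical check: total distance (12 km) divided by total time (sum of 1 km / v_i hours)
total_time_h = sum(1 / v for v in speeds)
overall_speed = 12 / total_time_h

print(round(h, 5))                          # 49.55139
print(round(statistics.mean(speeds), 5))    # arithmetic mean, 49.66667
```

The arithmetic mean slightly overstates the average speed, because the cyclist spends more time on the slower series.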

Geometric mean

The geometric mean is defined as: Mg(X)=\sqrt[N]{x_1^{n_1}\cdot x_2^{n_2}\cdots x_k^{n_k}}=\sqrt[N]{\prod\limits_{i=1}^{k} x_i^{n_i}}

Its advantage is that all the observed values of the variable are used in its calculation

Its disadvantage is the influence exerted by values close to zero (a single zero value makes it zero) and by negative values: if the product is negative and N is even, the root is not a real number

This average is used for variables that measure percentages, rates or index numbers

For any set of positive observations, it is always true that: Ma(X)\leq Mg(X)\leq\overline{x}, with equality only when all the values are equal
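The inequality can be illustrated on the cyclist's speeds with Python's `statistics` module (`geometric_mean` requires Python 3.8 or later). This is a check on one dataset, not a proof.

```python
import math
import statistics

# Cyclist speeds (km/h) from the harmonic-mean example; all values are positive
speeds = [54, 47, 46, 50, 52, 47, 51, 52, 49, 51, 47, 50]

h = statistics.harmonic_mean(speeds)
g = statistics.geometric_mean(speeds)   # available from Python 3.8
a = statistics.mean(speeds)
print(h <= g <= a)   # True

# When every value is equal, the three means coincide (up to floating-point rounding)
equal = [5.0, 5.0, 5.0]
print(math.isclose(statistics.harmonic_mean(equal), statistics.geometric_mean(equal))
      and math.isclose(statistics.geometric_mean(equal), statistics.mean(equal)))   # True
```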

Example of geometric mean

The price of a certain product has risen by 10%, 20% and 30% over the last 3 years

We want to know the average annual rise

That is, we want to know by what percentage the price would have had to rise each year (the same annual percentage) to reach the same price after three years

Since the rises are percentages that accumulate year after year, we cannot use the arithmetic mean; we must use the geometric mean of the growth factors 1.1, 1.2 and 1.3

Mg(X)=\sqrt[N]{\prod\limits_{i=1}^{k} x_i^{n_i}}=\sqrt[3]{(1+\frac{10}{100})\cdot(1+\frac{20}{100})\cdot(1+\frac{30}{100})}=\sqrt[3]{1.1\cdot 1.20\cdot 1.3}=1.19721577

Finally, we subtract 1 from the average growth factor and express the result as a percentage: (1.19721577-1)\cdot 100=19.721577\%

Thus, the average annual increase over the past 3 years has been 19.721577%
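The calculation can be sketched in Python, assuming Python 3.8 or later for `math.prod`. The helper `geometric_mean` is illustrative. The mean is taken over the growth factors 1 + p/100, not over the percentages themselves, and the final assertion checks that three equal rises of that size reproduce the total rise.

```python
import math

# Annual price rises from the example, as growth factors: +10%, +20%, +30%
factors = [1 + p / 100 for p in (10, 20, 30)]

def geometric_mean(values):
    """N-th root of the product of N positive values."""
    return math.prod(values) ** (1 / len(values))

g = geometric_mean(factors)
average_rise_pct = (g - 1) * 100
print(round(g, 8), round(average_rise_pct, 6))   # 1.19721577 19.721577

# Check: rising by the average percentage every year reproduces the total rise
assert math.isclose(g ** 3, math.prod(factors))
```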