Measures of position
Measures of position provide information about the data series we are analyzing
An important element in describing a dataset is where the data are located within the range of possible values
Once the basic concepts of the frequency distribution of a variable have been defined, we will study the different ways of summarizing these distributions using measures of position (or central tendency), bearing in mind that the error made in the summary is assessed through the corresponding measures of dispersion
The aim is to find measures that synthesize frequency distributions. Instead of handling all the data about a variable, a task that can be cumbersome, we can characterize its frequency distribution by a few numerical values, choosing as a summary of the data a value around which the values of the variable are distributed
Measures of central position
Measures of central position, or averages, are values around which the values of the variable are grouped and that summarize the position of the distribution on the horizontal axis. They also help us synthesize the information provided by the values of the variable
Of the measures of central position, the most commonly used are the arithmetic mean, the median and the mode. In some specific cases, the harmonic mean or the geometric mean is used
Arithmetic mean
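The arithmetic mean, \overline{x}, is defined as the sum of all observed values divided by the total number of observations
That is, if the variable takes the distinct values x_1,\ldots,x_k with absolute frequencies n_1,\ldots,n_k, and N=n_1+\cdots+n_k is the total number of observations: \overline{x}=\frac{x_1\cdot n_1+\cdots+x_k\cdot n_k}{N}=\frac{\sum\limits_{i=1}^{k} (x_i\cdot n_i)}{N}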
This is the most commonly used average in practice, because of the following advantages:
- It takes into account all the observed values
- It is easy to calculate and has a clear statistical significance
- It is unique
However, it has the disadvantage of being strongly influenced by the extreme values of the distribution
The trimmed mean is obtained by calculating the mean of the observed values after a certain percentage of the extreme values (the same percentage on both sides) has been removed
It is often used to calculate the mean of a variable in which we know, or suspect, that there are extreme values, as these can distort the mean
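The trimming step can be sketched in a few lines of Python. This is only an illustration: the function name `trimmed_mean`, the sample values and the 10% trimming proportion are assumptions made for the example, not part of the original text.

```python
def trimmed_mean(values, proportion):
    """Mean of `values` after dropping `proportion` of them from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)  # observations removed on each side
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


# Hypothetical sample with one extreme value (100).
sample = [2, 3, 3, 4, 4, 5, 5, 6, 6, 100]

plain_mean = sum(sample) / len(sample)
robust_mean = trimmed_mean(sample, 0.10)  # drops the smallest and the largest value

print(plain_mean, robust_mean)
```

With this sample the ordinary mean is 13.8, pulled upward by the single value 100, while the 10% trimmed mean is 4.5, much closer to where most of the observations lie.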
Properties of the arithmetic mean
- The sum of the deviations (differences with the corresponding sign) of the variable values, relative to their arithmetic mean, is equal to zero
\sum\limits_{i=1}^{k} (x_i-\overline{x})\cdot n_i=\sum\limits_{i=1}^{k} (x_i\cdot n_i)-\overline{x}\cdot \sum\limits_{i=1}^{k} n_i=N\cdot\overline{x}-N\cdot\overline{x}=0
- The mean is affected by changes of origin and scale. If we define u_i=a+b\cdot x_i, where a and b are any values with b nonzero (which is equivalent to making a change of origin and scale), the arithmetic mean of the new variable can be expressed as follows: \overline{u}=a+b\cdot\overline{x}
The proof is very simple:
\overline{u}=\frac{\sum\limits_{i=1}^{k} (u_i\cdot n_i)}{N}=\frac{\sum\limits_{i=1}^{k} (a+b\cdot x_i)\cdot n_i}{N}=\frac{a}{N}\cdot \sum\limits_{i=1}^{k} n_i+\frac{b}{N}\cdot \sum\limits_{i=1}^{k} (x_i\cdot n_i)=\frac{a\cdot N}{N}+\frac{b}{N}\cdot \sum\limits_{i=1}^{k} (x_i\cdot n_i)=a+b\cdot\overline{x}
By choosing the values of a and b conveniently, this property is very useful in many cases for simplifying the calculation of the arithmetic mean
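Both properties can be checked numerically. The sketch below uses a small hypothetical frequency table and the hypothetical choice a=-100, b=1/10, which turns large values into small ones that are easier to average by hand. The helper `frequency_mean` is an illustrative name, and exact fractions are used so that the equalities hold without rounding error.

```python
from fractions import Fraction


def frequency_mean(values, freqs):
    """Arithmetic mean of a frequency table, as an exact fraction."""
    total = sum(freqs)
    return Fraction(sum(x * n for x, n in zip(values, freqs))) / total


# Hypothetical frequency table: values x_i and absolute frequencies n_i.
x = [1010, 1020, 1030, 1050]
n = [2, 3, 4, 1]

# Change of origin and scale u_i = a + b * x_i (a and b chosen for convenience).
a, b = -100, Fraction(1, 10)
u = [a + b * xi for xi in x]  # 1, 2, 3, 5

mean_x = frequency_mean(x, n)
mean_u = frequency_mean(u, n)

# Property 1: the deviations from the mean, weighted by frequency, add up to zero.
deviation_sum = sum((xi - mean_x) * ni for xi, ni in zip(x, n))

# Property 2: the mean of x can be recovered from the simpler mean of u.
recovered_mean_x = (mean_u - a) / b

print(mean_x, mean_u, deviation_sum, recovered_mean_x)
```

Here the mean of the transformed values is 5/2, and undoing the transformation gives back 1025, the mean of the original values.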
Example of arithmetic mean
In a vaccination campaign, the number of people vaccinated per hour over the course of 50 hours was:
0, 3, 2, 2, 1, 4, 5, 2, 3, 2, 1, 0, 4, 3, 5, 3, 1, 4, 6, 1, 2, 3, 0, 4, 4, 5, 3, 1, 4, 2, 3, 1, 0, 6, 3, 2, 5, 3, 2, 3, 6, 2, 2, 5, 7, 4, 2, 7, 4, 2
We want to calculate the average number of people vaccinated in those 50 hours
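Before we start calculating the mean, we group the results into a frequency table, where x_i is each observed value, n_i its absolute frequency, f_i its relative frequency, and N_i and F_i the cumulative absolute and relative frequencies: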
x_i | n_i | f_i | N_i | F_i |
0 | 4 | 0.08 | 4 | 0.08 |
1 | 6 | 0.12 | 10 | 0.2 |
2 | 12 | 0.24 | 22 | 0.44 |
3 | 10 | 0.2 | 32 | 0.64 |
4 | 8 | 0.16 | 40 | 0.8 |
5 | 5 | 0.1 | 45 | 0.9 |
6 | 3 | 0.06 | 48 | 0.96 |
7 | 2 | 0.04 | 50 | 1 |
We calculate the arithmetic mean:
\overline{x}=\frac{\sum\limits_{i=1}^{k} (x_i\cdot n_i)}{N}=\frac{0 \cdot 4 + 1 \cdot 6 + 2 \cdot 12 + 3 \cdot 10 + 4 \cdot 8 + 5 \cdot 5 + 6 \cdot 3 + 7 \cdot 2}{50}=\frac{149}{50}=2.98\simeq 3
Therefore, the average number of people vaccinated per hour in that 50-hour interval was 2.98, which is approximately 3 when rounded to a whole number
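The same calculation can be reproduced in Python as a minimal sketch: the raw observations are grouped into absolute frequencies with `collections.Counter`, and the mean is then computed from the grouped data. The variable names are illustrative only.

```python
from collections import Counter

# Number of people vaccinated in each of the 50 hours (data from the example).
data = [0, 3, 2, 2, 1, 4, 5, 2, 3, 2, 1, 0, 4, 3, 5, 3, 1, 4, 6, 1,
        2, 3, 0, 4, 4, 5, 3, 1, 4, 2, 3, 1, 0, 6, 3, 2, 5, 3, 2, 3,
        6, 2, 2, 5, 7, 4, 2, 7, 4, 2]

# Absolute frequencies n_i for each distinct value x_i.
table = Counter(data)
N = sum(table.values())

# Arithmetic mean from the frequency table: sum(x_i * n_i) / N.
weighted_sum = sum(x * n for x, n in table.items())
mean = weighted_sum / N

# Print the table with relative and cumulative frequencies.
cumulative = 0
for x in sorted(table):
    cumulative += table[x]
    print(x, table[x], table[x] / N, cumulative, cumulative / N)

print(weighted_sum, N, mean)
```

Computing the mean directly from the raw list, `sum(data) / len(data)`, gives the same value; the grouped form simply mirrors the formula used above.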
Median
The median is defined as that value of the variable that divides the distribution into two parts with the same number of observations, when they are sorted from lowest to highest
This measure has the advantage, over the mean, that it is less sensitive to extreme values
Example of median
Following the example of the vaccination campaign, we now want to calculate its median
We check the previous frequency table and see that we have N=50 observations. The central position in the sorted data is found with \frac{N+1}{2}. If N is odd, this is a whole number and the median is the value in that position. If N is even, as it is here, the result falls halfway between two positions and the median is the mean of the two central values
\frac{50+1}{2}=25.5
Since 25.5 lies between 25 and 26, we take the 2 central positions: 25 and 26
We look in the column of cumulative absolute frequencies N_i for positions 25 and 26. Positions 23 to 32 correspond to the value 3, so the observations in positions 25 and 26 are both 3
Now we calculate the median value: Me=\frac{3+3}{2}=3
Therefore, in half of the hours of that 50-hour interval 3 people or fewer were vaccinated, and in the other half 3 or more
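The lookup in the cumulative frequency column can be sketched as follows. The helper `value_at_position` is a hypothetical name introduced for this example: it walks through the cumulative absolute frequencies until it reaches the requested position in the sorted data.

```python
# Frequency table from the vaccination example.
values = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 6, 12, 10, 8, 5, 3, 2]
N = sum(freqs)


def value_at_position(position):
    """Value occupying the given 1-based position in the sorted data."""
    cumulative = 0
    for x, n in zip(values, freqs):
        cumulative += n  # cumulative absolute frequency N_i
        if position <= cumulative:
            return x
    raise ValueError("position is larger than the number of observations")


if N % 2 == 0:
    # Even N: mean of the two central values (positions N/2 and N/2 + 1).
    lower = value_at_position(N // 2)
    upper = value_at_position(N // 2 + 1)
    median = (lower + upper) / 2
else:
    # Odd N: the single central value, at position (N + 1) / 2.
    median = value_at_position((N + 1) // 2)

print(median)
```

For this table, positions 25 and 26 both fall in the row of the value 3, so the median is 3.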
Mode
The mode is defined as the value of the variable whose frequency is not exceeded by that of any other value
It may be the case that the maximum frequency corresponds to 2 or more values of the variable; in that case, the distribution is said to be bimodal or multimodal
Example of mode
Following the example of the vaccination campaign, we now want to calculate its mode
We look in the column of absolute frequencies and see that the largest is 12, which corresponds to the value 2
Therefore, the most frequent number of people vaccinated per hour in that 50-hour interval was 2
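As a small sketch, the search for the largest absolute frequency can be written so that it also covers the case in which several values share the maximum frequency. The second table below is hypothetical and is included only to show that case.

```python
def most_frequent_values(freq_table):
    """All values whose absolute frequency equals the maximum frequency."""
    max_freq = max(freq_table.values())
    return [x for x, n in freq_table.items() if n == max_freq]


# Frequency table from the vaccination example: value -> absolute frequency.
vaccinations = {0: 4, 1: 6, 2: 12, 3: 10, 4: 8, 5: 5, 6: 3, 7: 2}

# Hypothetical table in which two values share the largest frequency.
two_peaks = {1: 5, 2: 9, 3: 4, 4: 9, 5: 2}

print(most_frequent_values(vaccinations))
print(most_frequent_values(two_peaks))
```

The first call returns a single value, 2; the second returns two values, 2 and 4.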
Harmonic mean
The harmonic mean is defined as the number of observations divided by the sum of the reciprocals of the observed values: Ma(X)=\frac{N}{\frac{n_1}{x_1}+\cdots+\frac{n_k}{x_k}}=\frac{N}{\sum\limits_{i=1}^{k} \frac{n_i}{x_i}}
The advantages of this average are:
- It is unique
- Uses all the observed values of the variable
It has the disadvantage that it is strongly influenced by the values of the variable close to zero, and it cannot be calculated if any value is zero
This average is used for variables that measure speeds, yields, and, in general, for variables that are the ratio of two magnitudes
Example of harmonic mean
A cyclist completes a training session consisting of 12 series of 1 km, each at constant speed. The data collected from the training session are shown in the following table:
Series | Speed (km/h) |
1 | 54 |
2 | 47 |
3 | 46 |
4 | 50 |
5 | 52 |
6 | 47 |
7 | 51 |
8 | 52 |
9 | 49 |
10 | 51 |
11 | 47 |
12 | 50 |
We want to calculate the average speed of the cyclist during the training session
The arithmetic mean is not appropriate because the variable is the ratio of two magnitudes, distance and time (v=\frac{d}{t}), and every series covers the same distance. In this case the harmonic mean must be applied, adding the reciprocal of the speed of each of the 12 series
Ma(X)=\frac{N}{\sum\limits_{i=1}^{k} \frac{n_i}{x_i}}=\frac{12}{\frac{1}{54}+\frac{1}{47}+\frac{1}{46}+\frac{1}{50}+\frac{1}{52}+\frac{1}{47}+\frac{1}{51}+\frac{1}{52}+\frac{1}{49}+\frac{1}{51}+\frac{1}{47}+\frac{1}{50}}=49.55139
Therefore, the cyclist's average speed over the 12 series was 49.55139 km/h
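A short Python sketch makes the reason for the harmonic mean visible: because every series covers 1 km, the average speed is the total distance divided by the total time, and that quotient is exactly the harmonic mean of the 12 speeds. The variable names are illustrative only.

```python
# Speed in km/h of each of the 12 series of 1 km (data from the table).
speeds = [54, 47, 46, 50, 52, 47, 51, 52, 49, 51, 47, 50]

# Harmonic mean: number of observations over the sum of reciprocals.
harmonic = len(speeds) / sum(1 / v for v in speeds)

# Physical check: total distance divided by total time.
distance_km = 1 * len(speeds)
time_h = sum(1 / v for v in speeds)  # each series takes (1 km) / (v km/h) hours
average_speed = distance_km / time_h

# The arithmetic mean, shown only for comparison.
arithmetic = sum(speeds) / len(speeds)

print(round(harmonic, 5), round(average_speed, 5), round(arithmetic, 5))
```

The arithmetic mean of the speeds is slightly larger than the harmonic mean, so it would overstate the speed at which the 12 km were actually covered.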
Geometric mean
The geometric mean is defined as: Mg(X)=\sqrt[N]{x_1^{n_1}\cdots x_k^{n_k}}=\sqrt[N]{\prod\limits_{i=1}^{k} x_i^{n_i}}
It has the advantage that all observed values of the variable are used in its calculation
It has the disadvantage of being strongly influenced by values close to zero, and it cannot be calculated as a real number when the product of the values is negative and N is even
This average is used for variables that measure percentages, rates, or index numbers
For any set of positive observations, for which the three averages can be calculated, it is always true that: Ma(X)\leq Mg(X)\leq\overline{x}, with equality only when all the observed values are equal
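The ordering of the three averages can be checked on any set of positive values. The sketch below uses the functions of Python's standard `statistics` module and assumes Python 3.8 or later, since `geometric_mean` was added in that version; the speeds are the ones from the cyclist example.

```python
import statistics

# Positive observations: the 12 speeds (km/h) from the cyclist example.
speeds = [54, 47, 46, 50, 52, 47, 51, 52, 49, 51, 47, 50]

harmonic = statistics.harmonic_mean(speeds)
geometric = statistics.geometric_mean(speeds)  # requires Python 3.8+
arithmetic = statistics.mean(speeds)

print(harmonic, geometric, arithmetic)
print(harmonic <= geometric <= arithmetic)

# When all the values are equal, the three averages coincide (up to rounding).
constant = [50, 50, 50]
print(statistics.harmonic_mean(constant),
      statistics.geometric_mean(constant),
      statistics.mean(constant))
```

The three values are close because the speeds vary little; the more spread out the observations are, the larger the gaps between the three averages become.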
Example of geometric mean
We have the price of a certain product and we know that over the last 3 years its price has risen by 10%, 20% and 30%
We want to know the average annual increase
That is, we want to know by what percentage the price would have had to rise each year (the same percentage every year) to reach the same price after three years
Since the yearly increases are applied multiplicatively, one after another, we cannot use the arithmetic mean; we must use the geometric mean of the yearly growth factors
Mg(X)=\sqrt[N]{\prod\limits_{i=1}^{k} x_i^{n_i}}=\sqrt[3]{(1+\frac{10}{100})\cdot(1+\frac{20}{100})\cdot(1+\frac{30}{100})}=\sqrt[3]{1.1\cdot 1.20\cdot 1.3}=1.19721577
Now we convert the result to a percentage, subtracting 1 to keep only the increase: (1.19721577-1)\cdot 100=19.721577\%
Thus, the average annual increase over the past 3 years has been 19.721577%
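The calculation can be reproduced with a short Python sketch (it assumes Python 3.8 or later for `math.prod`). Each percentage rise is turned into a growth factor, the geometric mean of the factors is taken, and 1 is subtracted from the mean factor before it is expressed as a percentage. The last lines check that three equal rises of that size lead to the same final price.

```python
import math

# Yearly price rises in percent (data from the example).
increases = [10, 20, 30]

# Growth factor of each year: 1.1, 1.2 and 1.3.
factors = [1 + p / 100 for p in increases]

# Geometric mean of the factors: N-th root of their product.
mean_factor = math.prod(factors) ** (1 / len(factors))

# Average annual increase, in percent.
average_increase = (mean_factor - 1) * 100

# Check: applying the mean factor three times gives the same total growth.
total_growth = math.prod(factors)
same_growth = mean_factor ** len(factors)

print(round(mean_factor, 8), round(average_increase, 6))
print(round(total_growth, 6), round(same_growth, 6))
```

For comparison, the arithmetic mean of the three rises is 20%, and three rises of 20% would give a factor of 1.728, slightly more than the actual total growth of 1.716.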