Category Archives: Statistics

Statistics is the formal science that studies, uses and analyzes data from a representative sample; it seeks to explain the correlations and dependencies of a physical phenomenon or natural occurrence, whether random or conditional

Statistics


Statistics (from the German Statistik, itself derived from the Italian statista, "statesman") is a formal science and a tool that studies, uses and analyzes data from a representative sample; it seeks to explain the correlations and dependencies of a physical or natural phenomenon, occurring randomly or conditionally

It cuts across a wide variety of disciplines, from physics to the social sciences, from the health sciences to quality control

It is used for decision-making in business and in government institutions

Statistics are divided into two main areas:

  • Descriptive statistics
  • Inferential statistics

There is also a discipline called mathematical statistics, which deals with the theoretical basis of the subject

The word "statistics" also refers to the result of applying a statistical algorithm to a set of data, such as economic statistics, criminal statistics, among others

Today, statistics is a science that is responsible for studying a particular population through the collection, analysis and interpretation of data. Similarly, it is considered a special technique suitable for the quantitative study of mass or collective phenomena

Descriptive statistics

It is dedicated to the description, visualization and summarization of data originating from the phenomena under study. The data can be summarized numerically or graphically. Basic examples of statistical parameters are the mean and the standard deviation. Some graphical examples are the histogram, the population pyramid and the pie chart, among others

Inferential statistics

It is dedicated to the generation of models, inferences and predictions associated with the phenomena in question taking into account the randomness of the observations. It is used to model patterns in the data and extract inferences about the population under study. These inferences can take the form of answers to yes/no questions (hypothesis test), estimates of numerical characteristics (estimation), forecasts of future observations, association descriptions (correlation) or modeling of relationships between variables (regression analysis). Other modeling techniques include variance analysis, time series, and data mining

Both branches (descriptive and inferential) make up applied statistics. Inferential statistics, in turn, is divided into parametric statistics and non-parametric statistics

Probability


We will call a probability on a sample space \Omega any application that satisfies:

\begin{cases} \Omega \rightarrow \mathbb{R} \\ \omega \rightarrow p(\omega) \in \left[0, 1\right] \end{cases}


where the value between 0 and 1 quantifies the possibility of that event occurring. It is also often expressed as a percentage, so a probability of 1 corresponds to 100% and a probability of 0 to 0%. A probability must satisfy the following axioms:

  1. P(A) \ge 0, \forall A \text{ event}
  2. P( \Omega) = 1
  3. P(A \cup B) = P(A) + P(B)\text{ if }A \cap B = \emptyset

Properties

  1. P(A) \le 1
  2. P(\emptyset) = 0
  3. P(A^c) = 1 - P(A)
  4. If B \subset A \Rightarrow P(A - B) = P(A) - P(B)
  5. P(A - B) = P(A) - P(A \cap B)
  6. P(A \cup B) = P(A) + P(B) - P(A \cap B)
  7. P(A_1 \cup \cdots \cup A_n) = P(A_1) + \cdots + P(A_n)\text{ if }A_i \cap A_j = \emptyset, \forall i \not= j
  8. P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C)- P(B \cap C) + P(A \cap B \cap C)
  9. P(A \cup B \cup C \cup D) = P(A) + P(B) + P(C) + P(D) - P(A \cap B) - P(A \cap C) - P(A \cap D) - P(B \cap C) - P(B \cap D) - P(C \cap D) + P(A \cap B \cap C) + P(A \cap B \cap D) + P(A \cap C \cap D) + P(B \cap C \cap D) - P(A \cap B \cap C \cap D)

Rule of addition

The addition rule or sum rule states that the probability that either of two events occurs is equal to the sum of their individual probabilities when the events are mutually exclusive, i.e. when they cannot occur at the same time

P(A \cup B) = P(A) + P(B) if A and B are mutually exclusive

P(A\cup B) = P(A) + P(B) - P(A\cap B) if A and B are not mutually exclusive

Being:

\scriptsize\begin{cases}\text{P(A) = probability of occurrence of the event A}\\ \text{P(B) = probability of occurrence of the event B}\\ P(A \cap B)\text{ = probability of simultaneous occurrence of events A and B}\end{cases}

Rule of multiplication

The multiplication rule states that the probability of joint occurrence of two or more statistically independent events is equal to the product of their individual probabilities

P(A \cap B) = P(A) \cdot P(B) if A and B are independent

P(A \cap B) = P(A)\cdot P(B|A) if A and B are dependent

where P(B|A) is the probability that B occurs given that event A has already occurred

Rule of Laplace

Let \Omega be a sample space where all sample points are equally likely and let A be an event; then:

P(A) = \frac{n^{\underline{0}}\text{ of favorable cases}}{n^{\underline{0}}\text{ of possible cases}}

Frequency Probability (Von Mises)

Let \Omega be the sample space associated with a random phenomenon and let A be an event. The frequency probability of A is the limit of the relative frequency of the number of times A occurs when we repeat the random phenomenon infinitely many times:

P(A) = \lim\limits_{n\to\infty} \frac{n^{\underline{0}}\text{ of times A happens}}{n}
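
As an illustration of this frequentist idea, the following Python sketch (an illustrative addition, assuming a fair coin as the random phenomenon) estimates a probability by its relative frequency over an increasing number of repetitions:

import random

def relative_frequency(trials):
    # Repeat the random experiment `trials` times and count how often
    # the event "heads" occurs; a fair coin is modeled with random.random()
    hits = sum(1 for _ in range(trials) if random.random() < 0.5)
    return hits / trials

# The relative frequency approaches the theoretical probability 0.5
# as the number of repetitions grows
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))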

Event


An event is any of the possible results (or sets of results) of a random experiment

Random experiment

A random experiment is one that, under similar conditions, can give us different results

Examples of random experiments

  • Toss a coin and count the number of heads or tails
  • Extract a card from a deck
  • Calculate the lifetime of a light bulb
  • Measure the temperature of a processor after an hour of work
  • Calculate the number of calls sent or received by a phone line after an hour

Sample space

It is the set composed of all the possible outcomes associated with the random experiment

It is the total set

It is represented with \Omega

Example of sample space

In the experiment of tossing a coin 3 times and counting the number of heads

The sample space will be \Omega=\{0,1,2,3\} for the number of heads obtained

Sample point

A single outcome of the sample space

It is represented with \omega

Being A a set

The set of all events is defined as p(\Omega)=\{A|A\subseteq\Omega\}

Example of point sampling

In the experiment of tossing a coin 3 times and counting the number of heads

If after tossing the coin 3 times we have counted 2 heads, then the sample point is p(3)=2

Random event

It is a set of sample points

It is represented with A

It is denoted with capital letters; a (finite or infinite) family of events is written (A_i)_{i\in I}

And it is defined that A_i \in p(\Omega) for every i \in I

Example of a random event

In the experiment of tossing a coin 3 times and counting the number of heads

Let's repeat the experiment 5 times to obtain a random event; if after tossing the coin 3 times we have counted:

  • 2 heads, then sample point 1 is p(3_1)=2
  • 0 heads, then sample point 2 is p(3_2)=0
  • 3 heads, then sample point 3 is p(3_3)=3
  • 2 heads, then sample point 4 is p(3_4)=2
  • 1 head, then sample point 5 is p(3_5)=1

The random event is A=\{2,0,3,2,1\}

Occurrence of an event

We say that an event A has occurred if, in a particular realization of the random experiment, we obtain a sample point that belongs to A

Example of the occurrence of an event

In the experiment of tossing a coin 3 times and counting the number of heads

We are going to repeat the experiment 5 times to obtain a random event; if after tossing the coin 3 times we have obtained:

  • 2 heads, then the occurrence of the event is P(3_1)=2
  • 0 heads, then the occurrence of the event is P(3_2)=0
  • 2 heads, then the occurrence of the event is P(3_3)=2
  • 2 heads, then the occurrence of the event is P(3_4)=2
  • 1 head, then the occurrence of the event is P(3_5)=1

Sure event

It is the event that always occurs

It is represented with \Omega

Being A a set

It is denoted
p(\omega)=\{A \mid A\subseteq\Omega\}=\Omega
\Omega=\{x \mid x\in\Omega\}\not =\{\{x\} \mid x\in\Omega\}\subseteq p(\Omega)
P(\Omega)=1

Example of a sure event

In the experiment of tossing a coin 3 times and counting the number of heads

Getting some number of heads (including 0) is a sure event, because we can always count the number of heads (even if none comes up, since we have included 0)

The sure event is then
\omega=\{"get a number of heads"\}
p(\omega)=\Omega

Impossible event

It is the event that never occurs

It is represented with \emptyset

Being A a set

It is denoted
p(\omega)=\{A \mid A\subseteq\Omega\}=\emptyset
P(\emptyset)=0

Example of an impossible event

In the experiment of tossing a coin 3 times and counting the number of heads

Getting the color red is an impossible event, because in the experiment we are only counting the number of heads obtained; we are not taking into account the color of the coin

The impossible event is then
\omega=\{"get the color red"\}
p(\omega)=\emptyset

Complementary event

We will call the complementary event of A the event that occurs when A does not occur

It is represented with A^c

It is denoted A^c=\Omega\backslash A

Example of a complementary event

In the experiment of tossing a coin and counting the number of heads

Getting tails instead of heads is the complementary event, because we are counting the number of heads, not tails

If A=\{"number of heads obtained"\} then the complementary event is A^c=\{"number of tails obtained"\}

Union of events

We will call the union of the events A and B the event that occurs when A occurs, B occurs, or both occur

It is represented with A\cup B

Being A a set

It is denoted \underset{i\in I}{\bigcup} A_i\in p(\Omega)

Example of a union of events

In the experiment of tossing a coin and counting the number of heads or tails

Being
A=\{"number of heads obtained"\}=\{3,4\}
B=\{"number of tails obtained"\}=\{2,4,6\}
A\cup B=\{"number of heads or tails obtained"\}=\{2,3,4,6\}

Intersection of events

We will call the intersection of the events A and B the event that occurs when both A and B occur

It is represented with A\cap B

Being A a set

It is denoted \underset{i\in I}{\bigcap} A_i\in p(\Omega)

Example of intersection of events

In the experiment of tossing a coin and counting the number of heads or tails

Being
A=\{"number of heads obtained"\}=\{3,4\}
B=\{"number of tails obtained"\}=\{2,4,6\}
A\cap B=\{"even number of heads and tails obtained"\}=\{4\}

Difference event

We will call the difference of the events A and B the event that occurs when A occurs but B does not

It is represented with A \backslash B = A - B

It is denoted A - B = A - A \cap B = A \cap B^c

Example of difference of events

In the experiment of tossing a coin and counting the number of heads or tails

Being
A=\{"number of heads obtained"\}=\{3,4\}
B=\{"number of tails obtained"\}=\{2,4,6\}
A-B=\{"number of heads obtained but not tails"\}=A - A\cap B=\{3,4\}-\{4\}=\{3\}

Symmetric difference of events

We will call the symmetric difference of the events A and B the event that occurs when A\cup B occurs but A\cap B does not

It is represented with A \triangle B

It is denoted A \triangle B = (A \cup B) - (A \cap B)

Example of symmetric difference of events

In the experiment of tossing a coin and counting the number of heads or tails

Being
A=\{"number of heads obtained"\}=\{3,4\}
B=\{"number of tails obtained"\}=\{2,4,6\}
A\triangle B=\{"number of heads or tails obtained but not both"\}=(A \cup B) - (A \cap B)=\{2,3,4,6\}-\{4\}=\{2,3,6\}

De Morgan's laws

Laws proposed by Augustus De Morgan (1806-1871), a British mathematician and logician born in India, which set out the following fundamental principles of the algebra of logic:

  • The negation of the conjunction is equivalent to the disjunction of negations

  • The negation of the disjunction is equivalent to the conjunction of the negations

The following formulations of De Morgan's laws (together with the distributive laws) can be used within statistics:

Being A, B and C sets

  1. \left(A\cup B\right)^c = A^c\cap B^c
    whose generalized form is
    \left(\underset{i\in I}{\bigcup} A_i\right)^c = \underset{i\in I}{\bigcap} \left(A_i\right)^c
  2. \left(A\cap B\right)^c = A^c\cup B^c
    whose generalized form is
    \left(\underset{i\in I}{\bigcap} A_i\right)^c = \underset{i\in I}{\bigcup} \left(A_i\right)^c
  3. A\cap\left(B\cup C\right) = \left(A\cap B\right)\cup\left(A\cap C\right)
    whose generalized form is
    A\cap\left(\underset{i\in I}{\bigcup} B_i\right) = \underset{i\in I}{\bigcup}\left(A\cap B_i\right)
  4. A\cup\left(B\cap C\right) = \left(A\cup B\right)\cap\left(A\cup C\right)
    whose generalized form is
    A\cup\left(\underset{i\in I}{\bigcap} B_i\right) = \underset{i\in I}{\bigcap}\left(A\cup B_i\right)

Proof 1

We want to show that \left(A\cup B\right)^c = A^c\cap B^c

\omega\in\left(A\cup B\right)^c \Leftrightarrow \omega \not \in A\cup B \Leftrightarrow \begin{cases} \omega \not \in A \\ \omega \not \in B \end{cases} \Leftrightarrow \begin{cases} \omega \in A^c \\ \omega \in B^c \end{cases} \Leftrightarrow \omega \in A^c\cap B^c

Since every step is an equivalence, both inclusions hold and the equality is proved

Proof 2

We want to show that \left(A\cap B\right)^c = A^c\cup B^c

\omega\in\left(A\cap B\right)^c \Leftrightarrow \omega \not \in A\cap B \Leftrightarrow \omega \not \in A \text{ or } \omega \not \in B \Leftrightarrow \omega \in A^c \text{ or } \omega \in B^c \Leftrightarrow \omega \in A^c\cup B^c

Since every step is an equivalence, both inclusions hold and the equality is proved

Incompatible events

We will say that A and B are incompatible events if they can never occur at the same time

It is denoted
A \cap B = \emptyset
A \cap A^c = \emptyset

A family \left(A_i\right)_{i\in I} is pairwise disjoint (or mutually exclusive) if A_i\cap A_j = \emptyset when i\not = j

If a family \left(A_i\right)_{i\in I} is mutually exclusive, we will denote it \underset{i\in I}{\sqcup}A_i := \underset{i\in I}{\cup}A_i

We say that a family \left(A_i\right)_{i\in I} is exhaustive if \underset{i\in I}{\cup}A_i = \Omega

Denumerable set

A set is said to be denumerable if it is in bijection with \mathbb{N}

Countable set

A set is said to be countable if it is denumerable or finite

Combinatorics

Combinatorics is a branch of mathematics, belonging to the area of discrete mathematics, that studies the enumeration, construction and existence of configurations satisfying certain established conditions

In addition, it studies the orderings or groupings of a certain number of elements. It is used in statistics to perform probabilistic calculations

Variations

Suppose we want to count the total number of injective applications that can be built from a set X of k elements into another set Y of n elements (so necessarily k \le n)

An injective application f| X \rightarrow Y is completely determined if we know the image of each of the k elements of X

If we regard the application f as a word of k letters over the alphabet Y, it will have no repeated letters. The application f will be f(x_1)f(x_2)\cdots f(x_k), so:

f(x_1) \in Y
f(x_2) \in Y \backslash \{f(x_1)\} = \{y \in Y | y \not= f(x_1)\}
f(x_3) \in Y \backslash \{f(x_1), f(x_2)\} = \{y \in Y | y \not= f(x_1), y \not= f(x_2)\}
\vdots
f(x_k) \in Y \backslash \{f(x_1), \cdots, f(x_{k - 1})\} = \{y \in Y | y \not= f(x_1), \cdots, y \not= f(x_{k - 1})\}

If we denote by V(n, k) the total number of injective applications of X in Y and call them variations of n elements taken k at a time, then by the product principle we have that:

V(n, k) = n \cdot (n - 1) \cdot (n - 2)\cdots (n - k + 1) = \frac{n!}{(n-k)!}

Where n! = n \cdot (n - 1) \cdot (n - 2) \cdot \cdots \cdot 2 \cdot 1 is the product of all natural numbers from 1 to n (this quantity is called the factorial of n)

Example of variations

What is the probability that in a group of n people there will be 2 who celebrate the birthday on the same day?

Calculating this probability directly is very tedious, since we would have to consider all the ways in which the birthday coincidences can occur

That is why it is best to calculate the probability of the complementary event, that is, the probability that the n people celebrate their birthdays on different days; this is the same as giving an ordered list of n distinct days chosen among the 365 days of the year. Therefore we have:

\text{Favorable cases = }V(365, n) = \frac{365!}{(365 - n)!}

The possible cases are all ordered lists of n days, where repetitions are allowed (they are variations with repetition). Therefore we have:

\text{Possible cases = }VR(365, n) = 365^n

So the solution to our problem will be given by:

p = 1 - \frac{\text{favorable cases}}{\text{possible cases}} = 1 - \frac{V(365, n)}{VR(365, n)} = 1 - \frac{365!}{365^n \cdot (365 - n)!}

The following table shows the p probability that in a group of n people there are at least two who celebrate their birthday on the same day:

n p n p
5 0.027136 35 0.814383
10 0.116948 40 0.891223
15 0.252901 45 0.940976
20 0.411438 50 0.970374
21 0.443688 55 0.986262
22 0.475695 60 0.994123
23 0.507297 65 0.997683
24 0.538344 70 0.999160
25 0.568700 75 0.999720
26 0.598241 80 0.999914
27 0.626859 85 0.999976
28 0.654461 90 0.999994
29 0.680969 95 0.99999856
30 0.706316 100 0.99999969
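
The values in this table can be reproduced directly from the formula p = 1 - V(365, n)/VR(365, n); the Python sketch below (illustrative only) evaluates it for a couple of group sizes:

from math import perm

def birthday_probability(n):
    # favorable cases: ordered lists of n distinct days -> V(365, n) = perm(365, n)
    # possible cases:  all ordered lists of n days      -> VR(365, n) = 365**n
    return 1 - perm(365, n) / 365 ** n

for n in (23, 50):
    print(n, round(birthday_probability(n), 6))   # 0.507297 and 0.970374, as in the table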

Variations with repetition

Suppose we want to count the total number of possible applications that can be built from a set X of k elements into another set Y of n elements

An application f| X \rightarrow Y is completely determined if we know the image of each of the k elements of X

That is, we need to know f(x_i) with 1 \le i \le k. This is equivalent to giving a k-tuple (f(x_1), f(x_2), \cdots, f(x_k)) of the set Y^k = \overbrace{Y \times \cdots \times Y}^{k\;\rm times}

It is also equivalent to giving a word of k letters over the alphabet Y, (f(x_1), f(x_2), \cdots, f(x_k)), or an ordered selection of k elements among those of Y (the elements of Y can be repeated, i.e. it may happen that f(x_i) = f(x_j)\text{ with }i \not= j)

The only condition is that f(x_i) \in Y. Therefore, the total number of applications of X in Y, or the total number of variations with repetition of n elements taken k at a time, is equal to the cardinal of Y^k which, by the product principle, is n^k. If we denote this number by VR(n, k), then:

VR(n, k) = n^k

Example of variations with repetition

What is the probability of getting all fifteen results right in a quiniela (Spanish football pool)?

Filling a quiniela is equivalent to giving a list of 15 symbols by choosing between 1, X and 2, that is, a word of length 15 constructed with alphabet 1, X and 2

So we have that the number of possible quinielas will be:

VR(3, 15) = 3^{15}

However, this is not the solution to our problem, which will be given by:

p = \frac{n^{\underline{0}}\text{ of favorable cases}}{n^{\underline{0}}\text{ of possible cases}} = \frac{1}{3^{15}} = 6.9691719376256323913730850719152 \cdot 10^{-8}

Permutations

We will call permutations of m elements the number of variations without repetition that can be formed using all m elements, that is, the number of ways to order them

P_m = m!

Example of Permutations

We have a bookshelf that fits three books and we want to order them without repeating any. Each book has a cover of a different color: red, blue and green. To distinguish them we will use the set L of books, whose elements are the first letter of the Spanish name of the color of each cover:

L=\{R, A, V\}
Arrangement Permutation number
L=\{R, A, V\} 1
L=\{R, V, A\} 2
L=\{V, R, A\} 3
L=\{V, A, R\} 4
L=\{A, V, R\} 5
L=\{A, R, V\} 6

To calculate the number of permutations, note how the arrangements are built until all the possible variations are obtained: in the first step, any of the 3 elements can be used; in the second, 1 has already been placed and only 2 can be used; in the third and last, another has been placed and only the remaining element can be used

Therefore, to calculate the permutations we have to multiply the number of available elements in each of the 3 steps:

3\cdot 2 \cdot 1 = 6

Or, equivalently:

P_3 = 3! = 3\cdot 2 \cdot 1 = 6

Permutations with repetition

We will call permutations with repetition of m elements the number of different orderings that can be formed with the m elements when some of them are repeated a given number of times

PR_{m}^{n_{1}, n_{2}, \cdots, n_k} = \frac{m!}{n_{1}! \times n_{2}! \times \cdots \times n_k!}

Example of permutations with repetition

The result of a football match was 5-4

How many different ways could this result be achieved?

We denote any goal scored by the home team by L and any goal scored by the visiting team by V

The complete list of L's and V's has length 5 + 4 = 9, so we look for the ordered lists containing 5 L's and 4 V's in any order, representing the possible order of the goals in the match

So the number of different ways to reach that result is:

PR_{9}^{5, 4} = \frac{9!}{5! \cdot 4!} = 126
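
The same count can be obtained mechanically from the formula for permutations with repetition; the Python sketch below (an illustrative addition) computes PR_9^{5,4}:

from math import factorial

def permutations_with_repetition(m, *repeats):
    # PR_m^{n1,...,nk} = m! / (n1! * n2! * ... * nk!)
    result = factorial(m)
    for n in repeats:
        result //= factorial(n)
    return result

print(permutations_with_repetition(9, 5, 4))   # 126 possible goal sequences for a 5-4 result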

Combinations

We call combinations of m elements taken n at a time the number of subsets that can be formed with n of these m elements without repeating any

C_{m, n} = {m \choose n} = \frac{m!}{n! \cdot (m - n)!}

Example of combinations

What is the probability of winning the Primitiva (the Spanish 6-out-of-49 lottery)?

In the Primitiva lottery there are 49 possible numbers to play and 6 of them are chosen, regardless of the order in which they appear

To know how many possible combinations there are in this game, just calculate the number of subsets that can be formed with 6 of those 49 elements

So the number of possible lottery tickets will be:

C_{49, 6} = {49 \choose 6} = \frac{49!}{6! \cdot (49 - 6)!}=\frac{49!}{6! \cdot 43!}

However, this is not the solution to our problem, which will be given by:

p = \frac{n^{\underline{0}}\text{ of favorable cases}}{n^{\underline{0}}\text{ of possible cases}} = \frac{1}{\frac{49!}{6! \cdot 43!}} = \frac{6! \cdot 43!}{49!} = 7.1511238420185162619416617 \cdot 10^{-8}
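
For a quick check, the same numbers can be obtained with math.comb (an illustrative sketch, not part of the original solution):

from math import comb

tickets = comb(49, 6)   # C(49, 6) possible tickets
print(tickets)          # 13983816
print(1 / tickets)      # about 7.15e-08, the probability of a single winning ticket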

Combinations with repetition

We will call combinations with repetition of m elements taken n at a time the number of selections of n elements that can be formed from these m elements, where elements may be repeated

CR_{m, n} = {m + n - 1\choose n} = \frac{(m + n - 1)!}{n! \cdot (m - 1)!}

Example of combinations with repetition

How many tiles does the domino have?

A domino tile is a rectangle divided into two equal parts, each of which contains a number of points chosen from the set \{0, 1, 2, 3, 4, 5, 6\}, where 0 is represented by the absence of points

The total number of dominoes matches the number of unordered selections of two elements, repeated or not, chosen from the set \{0, 1, 2, 3, 4, 5, 6\}

So the total number of domino tiles will be:

CR_{7, 2} = {7 + 2 - 1\choose 2} = \frac{(7 + 2 - 1)!}{2! \cdot (7 + 2 - 1 - 2)!} = \frac{8!}{2! \cdot 6!} =28

Conditional probability

The conditional probability of A given B, where \Omega is the sample space and A and B are events with P(B)\not=0, is the probability that A occurs knowing that event B has occurred:

P(A | B) = \frac{P(A \cap B)}{P(B)}

Properties

  1. P(\emptyset | A) = 0
  2. P(\Omega | A) = 1
  3. 0 \leq P(B | A) \leq 1
  4. P(B^c | A) = 1 - P(B | A)
  5. P(A \cup B | C) = P(A | C) + P(B | C) - P(A \cap B | C)
  6. P(A_1 \cap A_2) = P(A_1) \cdot P(A_2 | A_1)
  7. P(A_1 \cap A_2 \cap A_3) = P(A_1) \cdot P(A_2 | A_1) \cdot P(A_3 | A_1 \cap A_2)
  8. P(A_1 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2 | A_1) \cdots P(A_n | A_1 \cap A_2 \cap \cdots \cap A_{n-1})

Independent events

Let \Omega be the sample space of the events A and B; we will say that they are independent if any of the following equivalent properties holds:

  • P(A | B) = P(A)
  • P(B | A) = P(B)
  • P(A \cap B) = P(A) \cdot P(B)

More generally, let \Omega be the sample space of the events A_1, \cdots, A_n; we will say that they are independent if and only if:

\text{1) }P(A_i \cap A_j) = P(A_i) P(A_j), \forall i \not= j
\text{2) }P(A_i \cap A_j \cap A_k) = P(A_i) P(A_j) P(A_k), \forall i \not= j, i \not= k, j \not= k
\vdots
\text{n-1) }P(A_1 \cap \cdots \cap A_n) = P(A_1) \cdots P(A_n)

Dependent events

We will say that they are dependent if they are not independent:

  • P(A | B) \not= P(A)
  • P(A | B) > P(A)
  • P(A | B) < P(A)

Dependency and incompatibility

If A and B have nonzero probabilities and are incompatible, then they are dependent

Incompatible: A \cap B = \emptyset \Rightarrow P(A \cap B) = 0

Independent: P(A \cap B) = P(A) \cdot P(B)

Complete system of events

Let \Omega be the sample space of the events A_1, \cdots, A_n; we will say that they form a complete system of events (CSE) if and only if they fulfill:

  1. A_i \not= \emptyset, \forall i
  2. A_i \cap A_j = \emptyset, \forall i \not= j
  3. A_1 \cup \cdots \cup A_n = \Omega

Theorem of total probability

Let \Omega be a sample space, A_1, \cdots, A_n a complete system of events, and B another event; then:

P(B) = P(B | A_1) \cdot P(A_1) + \cdots + P(B | A_n) \cdot P(A_n)

Proof

Since A_1, \cdots, A_n is a complete system of events, B = (B \cap A_1) \cup \cdots \cup (B \cap A_n) with the pieces pairwise disjoint, so P(B) = P(B \cap A_1) + \cdots + P(B \cap A_n). By the definition of conditional probability, P(B | A_i) = \frac{P(B \cap A_i)}{P(A_i)}, hence P(B \cap A_i) = P(B | A_i) \cdot P(A_i), and therefore P(B) = P(B | A_1) \cdot P(A_1) + \cdots + P(B | A_n) \cdot P(A_n)

Bayes Theorem

Let \Omega be a sample space, A_1, \cdots, A_n a complete system of events, and B another event; then:

P(A_i | B) = \frac{P(B | A_i) P(A_i)}{P(B)}, \forall i \in \{1, \cdots, n\}

Example of Bayes Theorem

All the production of a company is carried out by 3 machines independently. The first does half the work, the second a fifth and the third the rest. So far these machines have produced 2%, 4% and 3% defective units, respectively. We want to calculate:

  1. The percentage of defective parts that the company produces
  2. If we pick a part at random and it turns out to be defective, what is the most likely machine to produce it?

Before making any calculations, we will sort the information that the problem gives us

Probability that a part is produced on a given machine:

Probability of the machine Result
P(M_1) \frac{1}{2} = 0.5
P(M_2) \frac{1}{5} = 0.2
P(M_3) 1 - \frac{1}{2} - \frac{1}{5} = \frac{10-5-2}{10}=\frac{3}{10}=0.3

Probability that a part is defective, depending on whether it is produced on a specific machine:

Probability of being defective given the machine Result
P(D | M_1) 2\cdot \frac{1}{100} = 0.02
P(D | M_2) 4\cdot \frac{1}{100} = 0.04
P(D | M_3) 3\cdot \frac{1}{100} = 0.03

Now we move on to solve the questions

  1. We apply the theorem of total probability

    P(D) = P(D | M_1) \cdot P(M_1) + P(D | M_2) \cdot P(M_2) + P(D | M_3) \cdot P(M_3)
    = 0.02 \cdot 0.5 + 0.04 \cdot 0.2 + 0.03 \cdot 0.3 = 0.027

    Therefore, the company produces a 0.027 \cdot 100 = 2.7\% of defective parts
  2. Before we can answer the question we need to calculate the posterior probability of each machine and then choose the largest one. To do this, we will use Bayes' theorem

    P(M_1 | D) = \frac{P(D | M_1) \cdot P(M_1)}{P(D)} = \frac{0.02 \cdot 0.5}{0.027} = 0.3704

    P(M_2 | D) = \frac{P(D | M_2) \cdot P(M_2)}{P(D)} = \frac{0.04 \cdot 0.2}{0.027} = 0.2963

    P(M_3 | D) = \frac{P(D | M_3) \cdot P(M_3)}{P(D)} = \frac{0.03 \cdot 0.3}{0.027} = 0.3333

    Therefore, the most likely machine to produce the defective part is M_1
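
The two answers can be reproduced with a few lines of Python (a sketch of the same calculation, added for illustration):

# prior probability of each machine and defect rate given the machine
prior = {"M1": 0.5, "M2": 0.2, "M3": 0.3}
defect = {"M1": 0.02, "M2": 0.04, "M3": 0.03}

# theorem of total probability: probability that a part is defective
p_d = sum(prior[m] * defect[m] for m in prior)
print(p_d)   # 0.027 -> 2.7 % defective parts

# Bayes' theorem: posterior probability of each machine given a defective part
posterior = {m: prior[m] * defect[m] / p_d for m in prior}
print(posterior)   # M1 has the largest posterior, about 0.37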

Random Variable

A random variable is a function that associates to each elementary event a perfectly defined number:

\xi | \Omega \rightarrow \mathbb{R}

One-dimensional random variable

Let \Omega be a sample space and P its probability; we will call a one-dimensional random variable (v.a.) an application:

\begin{cases} \xi | \Omega \rightarrow \mathbb{R} \\ \omega \rightarrow \xi(\omega) \in \mathbb{R} \end{cases}

Example of random variable

\Omega \equiv \text{"all 3-bit words"}
\xi \equiv \text{"}n^{\underline{0}}\text{ of ones in those words"}
\Omega \equiv \{000, 001, 010, 011, 100, 101, 110, 111\}

\xi | \Omega \rightarrow \mathbb{R}
000 \rightarrow 0
001 \rightarrow 1
010 \rightarrow 1
011 \rightarrow 2
100 \rightarrow 1
101 \rightarrow 2
110 \rightarrow 2
111 \rightarrow 3

P_\xi(0) = P\{\xi = 0\} = P\{000\} = \frac{1}{8} = 0.125
P_\xi(1) = P\{\xi = 1\} = P\{001, 010, 100\} = \frac{3}{8} = 0.375
P_\xi(2) = P\{\xi = 2\} = P\{011, 101, 110\} = \frac{3}{8} = 0.375
P_\xi(3) = P\{\xi = 3\} = P\{111\} = \frac{1}{8} = 0.125
P_\xi(-1) = P\{\xi = -1\} = P\{\emptyset\} = 0
P_\xi(0.75) = P\{\xi = 0.75\} = P\{\emptyset\} = 0

Distribution function

Let \xi be a v.a. (random variable); we will call the distribution function of \xi the function:

\begin{cases}F|\mathbb{R} \rightarrow [0, 1] \\ x \rightarrow F(x) \\ F(x) = P(-\infty, x] = P(\xi \leq x) \text{ with }x \in \mathbb{R} \end{cases}

Example of distribution function

F(0) = P\{\xi \leq 0\} = \frac{1}{8} = 0.125
F(1) = P\{\xi \leq 1\} = \frac{1}{8} + \frac{3}{8} = \frac{1}{2} = 0.5
F(2) = P\{\xi \leq 2\} = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} = \frac{7}{8} = 0.875
F(3) = P\{\xi \leq 3\} = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1
F(-1) = P\{\xi \leq -1\} = P(\emptyset) = 0
F(0.75) = P\{\xi \leq 0.75\} = F(1) = \frac{1}{2} = 0.5
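
The probability function and the distribution function of this example can be enumerated directly; the following Python sketch (illustrative only) lists the 8 equally likely 3-bit words and tabulates P_\xi and F:

from itertools import product
from fractions import Fraction

words = ["".join(bits) for bits in product("01", repeat=3)]   # the 8 equally likely words
pmf = {}
for w in words:
    k = w.count("1")                        # xi(w) = number of ones in the word
    pmf[k] = pmf.get(k, 0) + Fraction(1, 8)

cumulative = Fraction(0)
for k in sorted(pmf):
    cumulative += pmf[k]
    print(k, pmf[k], cumulative)            # P{xi = k} and F(k) = P{xi <= k}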

V.a. of discrete type

Let \xi be a one-dimensional v.a.; we will say it is discrete if the set D_\xi = \{x \in \mathbb{R} | P\{\xi = x\} > 0\} is a countable set (finite or countably infinite)

Being:

\begin{cases} D_\xi \equiv \text{"support of the v.a. }\xi\text{"} \\ x \in D_\xi \equiv \text{"mass points of the v.a. }\xi\text{"} \\ P\{\xi = x\}\text{ with }x \in D_\xi \equiv \text{"probability function of }\xi\text{"} \\ P_i = P\{\xi = x_i\}\text{ with }P_i > 0 \text{ and }\sum\limits_{i=1}^{n} P_i = 1 \end{cases}

V.a. of continuous type

We will say that a v.a. \xi is of continuous type if the set of values it can take is uncountable

It is defined if \exists f|\mathbb{R}\rightarrow\mathbb{R}^+ such that F(x)=\int^{x}_{-\infty} f(t) \cdot dt

The probability of taking any particular value is zero (P(\xi = x) = 0), and accordingly:

p(x_1 < \xi \leqslant x_2) = p(x_1 < \xi < x_2) = F(x_2) - F(x_1)

Density function

We call density function a function f(x) from which we can calculate probabilities as the area enclosed between the function and the horizontal axis

Being:

\begin{cases} f(x) \geq 0, \forall x \in \mathbb{R},\text{ }f(x) \text{ integrable} \\ \int^{+\infty}_{-\infty} f(x) \cdot dx = 1 \end{cases}

Central position measure: the mean \mu\text{ or }E[\xi]

Let \xi be a v.a.; we will call its expectation (or mean) the value denoted E[\xi]=\mu, which in the case of discrete variables is:

E[\xi] = \mu = \sum\limits_{i=1}^{n} x_i \cdot P_i

And in the continuous ones:

E[\xi] = \mu = \int^{+\infty}_{-\infty} x \cdot f(x) \cdot dx

Properties of the mean

  1. E[k] = k\text{; if k is constant}
  2. E[\xi + a] = E[\xi] + a\text{; if a is constant (change of origin)}
  3. E[b\cdot\xi] = b\cdot E[\xi]\text{; if b is constant (change of scale)}
  4. E[a + b\cdot\xi] = a + b\cdot E[\xi]\text{ if a and b are constants (linear transformation)}
  5. E[\xi_1 + \cdots + \xi_n] = E[\xi_1] + \cdots + E[\xi_n]
  6. k_1 \leq \xi \leq k_2 \Rightarrow k_1 \leq E[\xi] \leq k_2
  7. \xi_1 \leq \xi_2 \Rightarrow E[\xi_1] \leq E[\xi_2]

Absolute dispersion measurement: Variance \sigma^2 \text{ or } Var[\xi]

Let \xi be a v.a.; we will call its variance:

\sigma^2 = Var(\xi) = E[(\xi - \mu)^2]\text{ where }\mu = E[\xi]

For discrete variables, the following is calculated:

\sigma^2 = Var(\xi) = \sum\limits_{i=1}^{n} (\xi_i - \mu)^2 p_i

For continuous variables, the following is calculated:

\sigma^2 = Var(\xi) = \int^{+\infty}_{-\infty} (x - E(x))^2 \cdot f(x) \cdot dx
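
For the discrete case both sums are easy to evaluate; the sketch below (illustrative, reusing the probability function of the 3-bit-word variable from the earlier example) computes \mu and \sigma^2:

pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}        # P{xi = x_i} for the number of ones

mu = sum(x * p for x, p in pmf.items())                # E[xi] = sum of x_i * P_i
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # Var(xi) = sum of (x_i - mu)^2 * P_i

print(mu, var)   # 1.5 0.75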

Properties of the variance

  1. \sigma^2 = Var(\xi) = E[\xi^2] - E^2[\xi]\text{ in general}
    \sigma^2 = \sum\limits_{i=1}^{n} x^2_i \cdot p_i - \left(\sum\limits_{i=1}^{n} x_i \cdot p_i\right)^2\text{ in the discrete variables}
    \sigma^2 = \int^{+\infty}_{-\infty} x^2 \cdot f(x) \cdot dx - \left(\int^{+\infty}_{-\infty} x \cdot f(x) \cdot dx\right)^2\text{ in the continuous variables}
  2. Var(\xi) \geq 0
  3. Var(\xi) = 0\text{ if }\xi\text{ is constant}
  4. Var(\xi + a) = Var(\xi)\text{ if a is constant}
  5. Var(b\cdot\xi) = b^2\cdot Var(\xi)\text{ if b is constant}
  6. Var(a + b\cdot\xi) = b^2\cdot Var(\xi)\text{ if a and b are constants}

Standard deviation \sigma

Let \xi be a v.a.; we will call its standard deviation:

\sigma = dt(\xi) = +\sqrt{Var(\xi)}

It is the positive square root of the variance

Inequality of Tchebycheff

If a v.a. \xi has mean \mu and standard deviation \sigma, then for any k > 0 it holds that:

P\{|\xi - \mu| \leq k\cdot\sigma\} \geq 1 - \frac{1}{k^2}

Or, equivalently:

P\{|\xi - \mu| > k\cdot\sigma\} \leq \frac{1}{k^2}

Two-dimensional random variable

A discrete two-dimensional v.a. is an application:

\begin{cases} \Omega \rightarrow \mathbb{R}^2 \\ \omega \rightarrow (x, y) \in \mathbb{R}^2 \end{cases}

where the set of points with probability > 0 is countable; these points are denoted (x_i, y_j)

We will call mass points the points with probability \not= 0 of a discrete two-dimensional v.a., which we will denote (\xi_1, \xi_2), where \xi_1 and \xi_2 are one-dimensional v.a.

We will call the probability function the probabilities of the mass points, that is, the values:

\begin{cases} P_{i j} = P\{(\xi_1, \xi_2) = (x_i, y_j)\} = P\{\xi_1 = x_i, \xi_2 = y_j\} \\ \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{m} P_{i j} = 1\end{cases}

The sum of the probability functions should always be 1

This allows us to build the following probability matrix:

\begin{pmatrix} \xi_1, \xi_2& y_1& \cdots& y_m& p_{i *} \\ x_1& p_{1 1}& \cdots& p_{1 m}& p_{1 *} \\ \cdots& \cdots& \cdots& \cdots& \cdots \\ x_n& p_{n 1}& \cdots& p_{n m}& p_{n *} \\ p_{* j}& p_{* 1}& \cdots& p_{* m}& 1 \end{pmatrix}

For a discrete two-dimensional random variable (\xi_1, \xi_2), the marginal distributions are the distributions of the one-dimensional v.a. \xi_1\text{ and }\xi_2. In the case of discrete type v.a., the marginal probability functions are:

\begin{cases} \xi_1 | p_{i .} = p\{\xi_1 = x_i\} = \sum\limits_{j=1}^{m} p_{i, j} = \sum\limits_{j=1}^{m} p\{\xi_1 = x_i, \xi_2 = y_j\} \\ \xi_2 | p_{. j} = p\{\xi_2 = y_j\} = \sum\limits_{i=1}^{n} p_{i, j} = \sum\limits_{i=1}^{n} p\{\xi_1 = x_i, \xi_2 = y_j\} \end{cases}

For a discrete two-dimensional random variable (\xi_1, \xi_2), the conditional distributions are the distributions of one of the components of the two-dimensional v.a. (\xi_1\text{ or }\xi_2) given a value of the other component (\xi_2\text{ or }\xi_1 respectively). In the case of discrete type v.a., the conditional probability functions are:

\begin{cases} \xi_1 \text{ given } \xi_2 | p(\xi_1 = x_i | \xi_2 = y_j) = \frac{p(\xi_1 = x_i, \xi_2 = y_j)}{p(\xi_2 = y_j)} = \frac{p_{i, j}}{p_{., j}} \\ \xi_2 \text{ given } \xi_1 | p(\xi_2 = y_j | \xi_1 = x_i) = \frac{p(\xi_1 = x_i, \xi_2 = y_j)}{p(\xi_1 = x_i)} = \frac{p_{i, j}}{p_{i, .}} \end{cases}

To obtain the mean, a column vector of means is used:

\begin{pmatrix} E[\xi_1] \\ E[\xi_2] \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}

Covariance

Let (\xi_1, \xi_2) be a two-dimensional v.a.; we will call the covariance between \xi_1 and \xi_2:

\sigma_{1, 2} = Cov(\xi_1, \xi_2) = E[(\xi_1 - \mu_1) \cdot (\xi_2 - \mu_2)] \text{ with }\mu_1 = E(\xi_1) \text{ and }\mu_2 = E(\xi_2)

For discrete variables, the following is calculated:

\sigma_{1, 2} = Cov(\xi_1, \xi_2) = \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \left((x_i - \mu_1) \cdot (y_j - \mu_2)\right) p_{i, j}

The covariance measures the linear relationship or covariation between two variables

It is useful to use the covariance matrix:

\Sigma = \begin{pmatrix} Var(\xi_1) & Cov(\xi_1, \xi_2) \\ Cov(\xi_1, \xi_2) & Var(\xi_2) \end{pmatrix} = \begin{pmatrix} \sigma^2_1 & \sigma_{1, 2} \\ \sigma_{1, 2} & \sigma^2_2 \end{pmatrix}
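
As an illustration, the covariance matrix can be computed from a joint probability table; the values below are made up for the example and are not taken from the text:

# joint probability function p_ij of a discrete two-dimensional v.a. (illustrative values)
xs = [0, 1]
ys = [0, 1, 2]
p = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
     (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20}   # the six probabilities sum to 1

mu1 = sum(x * p[(x, y)] for x in xs for y in ys)
mu2 = sum(y * p[(x, y)] for x in xs for y in ys)
var1 = sum((x - mu1) ** 2 * p[(x, y)] for x in xs for y in ys)
var2 = sum((y - mu2) ** 2 * p[(x, y)] for x in xs for y in ys)
cov = sum((x - mu1) * (y - mu2) * p[(x, y)] for x in xs for y in ys)

print([[var1, cov], [cov, var2]])   # the covariance matrix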

Properties of the covariance

  1. Cov(\xi_1, \xi_2) = E[\xi_1 \xi_2] - E[\xi_1] E[\xi_2]\text{ where }E[\xi_1 \xi_2] = \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} \left((x_i y_j) \cdot (p_{i, j})\right)
  2. Cov(\xi_1 + a, \xi_2 + b) = Cov(\xi_1, \xi_2)\text{ with a and b constants}
  3. Cov(a \cdot \xi_1, b \cdot \xi_2) = a \cdot b \cdot Cov(\xi_1, \xi_2)\text{ with a and b constants}
  4. Cov(\xi_1 + \xi_2, \xi_3) = Cov(\xi_1, \xi_3) + Cov(\xi_2, \xi_3)
  5. Cov(\xi_1 + \xi_2, \xi_3 + \xi_4) = Cov(\xi_1, \xi_3) + Cov(\xi_1, \xi_4) + Cov(\xi_2, \xi_3) + Cov(\xi_2, \xi_4)
  6. Var(\xi_1 + \xi_2) = Var(\xi_1) + Var(\xi_2) + 2 \cdot Cov(\xi_1, \xi_2)
  7. Var(\xi_1 - \xi_2) = Var(\xi_1) + Var(\xi_2) - 2 \cdot Cov(\xi_1, \xi_2)
  8. Var(\xi_1 + \xi_2) = Var(\xi_1) + Var(\xi_2)\text{ if }\xi_1\text{ and }\xi_2\text{ they are uncorrelated}
  9. Var(\xi_1 - \xi_2) = Var(\xi_1) + Var(\xi_2)\text{ if }\xi_1\text{ and }\xi_2\text{ they are uncorrelated}

Coefficient of linear correlation

We will call the coefficient of linear correlation between \xi_1\text{ and }\xi_2:

p_{1 2} = Corr(\xi_1, \xi_2) = \frac{Cov(\xi_1, \xi_2)}{\sqrt{Var(\xi_1) \cdot Var(\xi_2)}} = \frac{\sigma_{1 2}}{\sigma_1 \cdot \sigma_2}

The linear correlation coefficient measures the degree of linear relationship between two variables

Uncorrelated

Let \xi_1\text{ and }\xi_2 be v.a.; we will say that they are uncorrelated if they have no linear relationship, i.e.:

Cov(\xi_1, \xi_2) = 0

Correlated

Let \xi_1\text{ and }\xi_2 be v.a.; we will say that they are correlated if they have a linear relationship, i.e.:

Cov(\xi_1, \xi_2) \neq 0

Pearson's correlation coefficient

p_{1 2} = Corr(\xi_1, \xi_2) = \frac{Cov(\xi_1, \xi_2)}{dt(\xi_1) \cdot dt(\xi_2)} = \frac{\sigma_{1 2}}{\sigma_1 \cdot \sigma_2}

Note:

\tiny\begin{cases} p_{1 2} = 0 \Leftrightarrow \sigma_{1 2} = 0 \Leftrightarrow \text{ without linear relationship, they are uncorrelated} \\ p_{1 2} \neq 0 \Leftrightarrow \sigma_{1 2} \neq 0 \Leftrightarrow \text{ with linear relationship, they are correlated} \\ p_{1 2} > 0 \Leftrightarrow \sigma_{1 2} > 0 \Leftrightarrow \text{ with increasing linear relationship} \\ p_{1 2} < 0 \Leftrightarrow \sigma_{1 2} < 0 \Leftrightarrow \text{ with decreasing linear relationship} \end{cases}

\text{Given }-1 \leq p_{1 2} \leq 1:

\tiny\begin{cases} p_{1 2} = 1 \Leftrightarrow \text{ perfect increasing linear relationship} \\ p_{1 2} = -1 \Leftrightarrow \text{ perfect decreasing linear relationship} \\ p_{1 2} \approx 0 \Leftrightarrow \text{ weak linear relationship} \\ p_{1 2} \approx \pm 1 \Leftrightarrow \text{ strong linear relationship} \end{cases}

Independent

Let \xi_1\text{ and }\xi_2 be v.a.; we will say that they are independent if they do not have any kind of relationship, that is, if they satisfy any of the following equivalent conditions:

  1. p(\xi_1 = x_i|\xi_2 = y_j) = p(\xi_1 = x_i); \forall(x_i, y_j)
  2. p(\xi_2 = y_j|\xi_1 = x_i) = p(\xi_2 = y_j); \forall(x_i, y_j)
  3. p(\xi_1 = x_i, \xi_2 = y_j) = p(\xi_1 = x_i)\, p(\xi_2 = y_j); \forall(x_i, y_j)

Dependent

Let \xi_1\text{ and }\xi_2 be v.a.; we will say that they are dependent if they have some kind of relationship

Descriptive Statistics


We will use descriptive statistics to describe the behavior of a characteristic from the mass of data provided by observing it in the population. To do so we will carry out a series of operations such as:

  • The reduction of the mass of data, by means of the construction of frequency tables and the realization of some graphs
  • In the case of quantitative variables, we can also take some measures that allow us to characterize the behavior of the variable. To do this we need to calculate some statistics of position, dispersion and shape

With all this, we can perfectly describe the behavior of our variable

Description and organization of data

When using computer programs it is common to give variables descriptive names so that there is no confusion about their content, but we should not forget that the usual convention in statistics, especially when giving general results, is to name statistical variables with capital letters, preferably the last ones of the alphabet: X, Y, Z, \cdots, and to name the different values taken by a variable with the same letter in lowercase: x_1, x_2, x_3, \cdots

We will use this notation, to give the following definitions:

Absolute frequency of a certain value x_i of the variable (which we will represent by n_i): it is the number of times that value x_i appears

Relative frequency of a certain value x_i of the variable (which we will represent by f_i): it is the proportion of times that value appears in the set of observations and is calculated as the ratio of its absolute frequency (n_i) to the total number of data (N)

That is: f_i=\frac{n_i}{N}

Cumulative absolute frequency of a certain value x_i of the variable (which we will represent by N_i): it is the sum of the absolute frequencies of all values of the variable less than or equal to that value x_i

That is: N_i=n_1+\cdots+n_i=\sum\limits_{j=1}^{i} n_j; N_k=N

Cumulative relative frequency of a certain value x_i of the variable (which we will represent by F_i): it is the sum of the relative frequencies of all values of the variable less than or equal to that value x_i

That is: F_i=f_1+\cdots+f_i=\sum\limits_{j=1}^{i} f_j=\frac{N_i}{N}; F_k=1

Cumulative frequencies only make sense if the scale is ordinal or quantitative. When, on a set of observed values of a variable, we perform the operations of sorting and grouping repeated values (determining the frequency of each value), we obtain a statistical frequency distribution table

This set of operations is called tabulation

When a variable has many different values, sometimes (although it is not usually recommended) the observed values are grouped into intervals before the analysis

In these cases, what we do is define the intervals (which may or may not have constant amplitude) and then calculate the frequency of the values of the variable that fall in each of the intervals. That is, the frequencies no longer represent the number of times (or proportion of times) a value appears, but the number of times (or proportion of times) values of the variable have been obtained in each interval

Each interval is perfectly delimited by its limits; for the i-th interval, l_{i-1} is the lower limit and l_i is the upper limit

The amplitude of the interval, a_i, is the distance between the two limits: a_i = l_i - l_{i-1}

To facilitate the mathematical handling of intervals it is necessary to consider a specific value of the variable as the representative of each interval, which is called the class mark and is denoted by x_i. The midpoint of the interval is usually taken as the class mark, although care must be taken, as it is not always the best representative

In the event that the intervals have different amplitudes, a value to consider is the frequency density, which is the number of observations of the variable per unit of length

That is: h_i = \frac{n_i}{a_i}

By analogy with the density function (which will be discussed later), in some cases the relative frequency density is used, which is nothing more than the proportion of observations per unit of length

That is: h'_i = \frac{f_i}{a_i}
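
These frequencies can be tabulated mechanically; the short Python sketch below (an illustrative addition) builds n_i, f_i, N_i and F_i for a list of observations:

from collections import Counter

def frequency_table(data):
    # Returns rows (x_i, n_i, f_i, N_i, F_i) sorted by the value of the variable
    counts = Counter(data)
    total = len(data)
    rows, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value], counts[value] / total, cumulative, cumulative / total))
    return rows

for row in frequency_table([0, 3, 2, 2, 1, 4, 5, 2, 3, 2]):
    print(row)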

Statistical measures

Statistical measures with numerical values tell us the most important traits of frequency distributions and are classified into the following groups based on what they try to measure:

  • Position measures: central and non-central
  • Dispersion measures: absolute and relative
  • Shape measures: asymmetry and kurtosis
  • Concentration measures

Graphics

To summarize the information it is also very common to use charts. Let's look at some of the simplest:

  • Bar diagram: used for variables not grouped into intervals. On a system of coordinate axes, the values of the variable are placed on the abscissa axis and the absolute frequencies on the ordinate axis; then, over each value of the variable a bar is raised whose height equals its absolute frequency

    If instead of the absolute frequencies we use the relative frequencies, the resulting graph is analogous, but scaled down by a factor of N

    It is also typically used to display the observed values of a variable

  • Sector (pie) diagram: it is generally used for variables not grouped into intervals and consists of dividing the area of a circle into sectors proportional to the frequencies (absolute or relative). The degrees covered by each sector are obtained by a simple rule of three, taking into account that the total number of data (N) corresponds to 360^o
  • Frequency histogram: used for variables grouped into intervals. It is constructed by raising, over each interval represented on the abscissa axis, a rectangle whose area is proportional to the frequency (absolute or relative) of that interval. In general, the height of the rectangle of the i-th interval is proportional to the frequency density. In particular, if all intervals have the same amplitude we can take the frequencies as the heights of the rectangles

Measures of position

Measures of position provide us with information about the data series we are analyzing

The description of a dataset includes as an important element the location of data set within a possible value context

Once the basics have been defined in the study of a frequency distribution of a variable, we will study the different ways of summarizing these distributions using position (or centralization) measures, bearing in mind the error made in the summary through the corresponding dispersion measures

The aim is to find measures that synthesize the frequency distributions. Instead of handling all the data on the variables, a task that can be heavy, we can characterize their frequency distribution by means of a few numerical values, choosing as a summary of the data a value around which the values of the variable are distributed

Measures of central position

The central position or average position measurements are values around which the values of the variable are grouped and that summarize the position of the distribution on the horizontal axis. They can also help us synthesize the information provided by the values of the variable

Of the central position measures, the most commonly used are the arithmetic mean, the median and the mode. In some specific cases, the harmonic mean or the geometric mean is used

Arithmetic mean

The arithmetic mean, \overline{x}, is defined as the sum of all observed values divided by the total number of observations:

That is: \overline{x}=\frac{x_1\cdot n_1+\cdots+x_k\cdot n_k}{N}=\frac{\sum\limits_{i=1}^{k} (x_i\cdot n_i)}{N}

This is the most commonly used average in practice, for the following advantages:

  • Takes into account all the observed values
  • It is easy to calculate and has a clear statistical significance
  • It is unique

However, it has the disadvantage of the influence exerted by the extreme values of the distribution on it

The trimmed mean is obtained by calculating the mean of the observed values after a certain percentage of the extreme values (the same percentage on each side) has been removed

It is often used to calculate the mean of a variable in which we know, or suspect, that there are extreme values, since these can distort the mean

Properties of the arithmetic mean

  1. The sum of the deviations (differences with the corresponding sign) of the variable values, relative to their arithmetic mean, is equal to zero

    \sum\limits_{i=1}^{k} (x_i-\overline{x})\cdot n_i=\sum\limits_{i=1}^{k} (x_i\cdot n_i)-\overline{x}\cdot \sum\limits_{i=1}^{k} n_i=N\cdot\overline{x}-N\cdot\overline{x}=0

  2. The mean is affected by changes of origin and scale. If u_i=a+b\cdot x_i, for any values a and b with b nonzero (which is equivalent to making a change of origin and scale), the arithmetic mean can be expressed as follows: \overline{u}=a+b\cdot\overline{x}

    And to prove it is very simple:

    \overline{u}=\frac{\sum\limits_{i=1}^{k} (u_i\cdot n_i)}{N}=\frac{\sum\limits_{i=1}^{k} (a+b\cdot x_i)\cdot n_i}{N}=\frac{a}{N}\cdot \sum\limits_{i=1}^{k} n_i+\frac{b}{N}\cdot \sum\limits_{i=1}^{k} (x_i\cdot n_i)=\frac{a\cdot N}{N}+\frac{b}{N}\cdot \sum\limits_{i=1}^{k} (x_i\cdot n_i)=a+b\cdot\overline{x}

    This property, conveniently choosing the values a and b, is very useful in many cases, to simplify the calculation of the arithmetic mean

Example of arithmetic mean

In a vaccination campaign, the number of people vaccinated per hour over the course of 50 hours has been:

0, 3, 2, 2, 1, 4, 5, 2, 3, 2, 1, 0, 4, 3, 5, 3, 1, 4, 6, 1, 2, 3, 0, 4, 4, 5, 3, 1, 4, 2, 3, 1, 0, 6, 3, 2, 5, 3, 2, 3, 6, 2, 2, 5, 7, 4, 2, 7, 4, 2

We want to calculate the average number of people vaccinated in those 50 hours

Before we start calculating the mean, we group the results into a frequency table:

x_i n_i f_i N_i F_i
0 4 0.08 4 0.08
1 6 0.12 10 0.2
2 12 0.24 22 0.44
3 10 0.2 32 0.64
4 8 0.16 40 0.8
5 5 0.1 45 0.9
6 3 0.06 48 0.96
7 2 0.04 50 1

We calculate the arithmetic mean:

\overline{x}=\frac{\sum\limits_{i=1}^{k} (x_i\cdot n_i)}{N}=\frac{0 \cdot 4 + 1 \cdot 6 + 2 \cdot 12 + 3 \cdot 10 + 4 \cdot 8 + 5 \cdot 5 + 6 \cdot 3 + 7 \cdot 2}{50}=\frac{149}{50}=2.98\simeq 3

Therefore, the average number of people vaccinated per hour in that 50-hour interval has been 2.98, which is approximately 3

Median

The median is defined as that value of the variable that divides the distribution into two parts with the same number of observations, when they are sorted from lowest to highest

This measure has the advantage, over the mean, that it is less sensitive to extreme values

Example of median

Following the example of the vaccination campaign, we now want to calculate its median

We check the previous frequency table and see that we have 50 data. To find the central position we divide by 2; since the number of data is even, the median will be the average of the two central values. If it had been odd, the single central value would already divide the distribution into two parts with the same number of observations

\frac{50}{2}=25 \text{ and } \frac{50}{2}+1=26

So we take the two central positions: 25 and 26

Looking at the column of cumulative absolute frequencies, the observations in positions 25 and 26 both correspond to the value 3

Now we calculate the median value: Me=\frac{3+3}{2}=3

Therefore, in half of the hours of that 50-hour interval 3 or fewer people were vaccinated, and in the other half 3 or more

Mode

The mode is defined as the value of the variable whose frequency is not surpassed by that of any other value

It may happen that the maximum frequency corresponds to 2 or more values of the variable; in that case, the distribution is said to be bimodal or multimodal

Example of the mode

Following the example of the vaccination campaign, we now want to calculate its mode

We look at the column of absolute frequencies and see that the largest is 12, which corresponds to the value 2

Therefore, the most frequent number of people vaccinated per hour in that 50-hour interval has been 2
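
The three measures of this example can be checked directly on the raw data with the Python standard library (an illustrative sketch of the same computation):

import statistics

data = [0, 3, 2, 2, 1, 4, 5, 2, 3, 2, 1, 0, 4, 3, 5, 3, 1, 4, 6, 1, 2, 3, 0, 4, 4,
        5, 3, 1, 4, 2, 3, 1, 0, 6, 3, 2, 5, 3, 2, 3, 6, 2, 2, 5, 7, 4, 2, 7, 4, 2]

print(statistics.mean(data))     # 2.98, the arithmetic mean
print(statistics.median(data))   # 3, the median
print(statistics.mode(data))     # 2, the mode (most frequent value)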

Harmonic mean

The harmonic mean is defined as: Ma(X)=\frac{N}{\frac{n_1}{x_1}+\cdots+\frac{n_k}{x_k}}=\frac{N}{\sum\limits_{i=1}^{k} \frac{n_i}{x_i}}

The advantages of this average are:

  • It is unique
  • Uses all the observed values of the variable

It has the disadvantage that it is strongly influenced by the values of the variable close to zero

This average is used in variables that measure speeds, yields, and, in general, for variables that are the ratio of two magnitudes

Example of the harmonic mean

A cyclist performs a training session consisting of 12 series of 1 km, each at constant speed. The data from this training are collected in the following table:

Series Speed (km/h)
1 54
2 47
3 46
4 50
5 52
6 47
7 51
8 52
9 49
10 51
11 47
12 50

We want to calculate the average speed of the runner during his training

The arithmetic mean cannot be applied because the variable is the ratio of two magnitudes (v=\frac{e}{t}); in this case the harmonic mean must be applied

Ma(X)=\frac{N}{\sum\limits_{i=1}^{k} \frac{n_i}{x_i}}=\frac{12}{\frac{1}{54}+\frac{1}{47}+\frac{1}{46}+\frac{1}{50}+\frac{1}{52}+\frac{1}{47}+\frac{1}{51}+\frac{1}{52}+\frac{1}{49}+\frac{1}{51}+\frac{1}{47}+\frac{1}{50}}=49.55139

Therefore, the rider's average speed over the 12 series has been 49.55139 km/h
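
The same value is obtained with the standard library (an illustrative sketch using the 12 recorded speeds):

import statistics

speeds = [54, 47, 46, 50, 52, 47, 51, 52, 49, 51, 47, 50]   # km/h, one per 1 km series
print(statistics.harmonic_mean(speeds))                     # about 49.55 km/h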

Geometric mean

The geometric mean is defined as: Mg(X)=\sqrt[N]{x_1^{n_1}\cdot\ \cdots\ \cdot x_k^{n_k}}=\sqrt[N]{\prod\limits_{i=1}^{k} x_i^{n_i}}

It has the advantage, that in its calculation all observed values of the variable are used

It has the disadvantage of the influence exerted by values close to zero, and by negative values (if N is even the root may not exist)

This average is used in variables that measure percentages, rates, or index numbers

In any set of observations, if they can be calculated, it is always true that: Ma(X)< Mg(X)<\overline{X}

Example of the geometric mean

We have the price of a certain product and we know that in the last 3 years its price has risen by 10%, 20% and 30%

We want to know the average rise

That is, we want to know what percentage you would have had to have raised each year (the same annual percentage) to get the same price after three years

Since percentages are being calculated we cannot use the arithmetic mean, we must use the geometric mean

Mg(X)=\sqrt[N]{\prod\limits_{i=1}^{k} x_i^{n_i}}=\sqrt[3]{(1+\frac{10}{100})\cdot(1+\frac{20}{100})\cdot(1+\frac{30}{100})}=\sqrt[3]{1.1\cdot 1.20\cdot 1.3}=1.19721577

Now we subtract 1 from the result and convert it to a percentage: (1.19721577 - 1)\cdot 100 = 19.721577\%

Thus, the average annual increase over the past 3 years has been 19.721577%
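
The same result follows from the standard library (an illustrative sketch using the three yearly growth factors):

import statistics

factors = [1.10, 1.20, 1.30]             # growth factors for rises of 10 %, 20 % and 30 %
g = statistics.geometric_mean(factors)   # about 1.1972
print((g - 1) * 100)                     # about 19.72 % average yearly rise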

Bernoulli trial

Bernoulli

We will call a Bernoulli trial a random experiment with two possible outcomes, one called success and the other failure, with probabilities p\text{ and }1 - p = q respectively, that is:

\begin{cases} P\{\text{success}\}=p \\ P\{\text{failure}\} = q \\ p + q = 1 \end{cases}

The discrete v.a. \xi that takes the value 1 when the Bernoulli experiment results in success and the value 0 when it results in failure is said to follow a Bernoulli distribution of parameter p = P\{\text{success}\}, and it is denoted as:

\xi \approx B(1, p)

Its probability function is:

P(\xi = k ) = \begin{cases} p\text{ if }k = 1 \\ q\text{ if }k = 0 \end{cases}

E(\xi) = p

\sigma^2(\xi) = p\cdot q

\sigma(\xi) = +\sqrt{p \cdot q}

Calculation of a Bernoulli
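
A minimal Python sketch of this calculation, assuming a success probability p supplied by the user, could be:

def bernoulli(p):
    # Probability function, mean and variance of a Bernoulli variable B(1, p)
    q = 1 - p
    pmf = {1: p, 0: q}    # P{xi = 1} = p (success), P{xi = 0} = q (failure)
    return pmf, p, p * q  # probability function, E(xi), Var(xi)

print(bernoulli(0.3))     # ({1: 0.3, 0: 0.7}, 0.3, 0.21)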





Binomial Distribution

The binomial distribution measures the number of successes in n equal and independent Bernoulli trials

The discrete v.a. \xi that measures the number of successes in n equal and independent Bernoulli trials is said to follow a binomial distribution of parameters n and p = P\{\text{success}\}, and it is denoted as:

\xi \approx B(n, p)

Its probability function is:

P\{\xi = k \} = \binom{n}{k} \cdot p^k \cdot q^{n - k}, k \in \{0, \cdots, n\}

E(\xi) = n \cdot p

\sigma^2(\xi) = n \cdot p \cdot q

\sigma(\xi) = +\sqrt{n \cdot p \cdot q}

Properties of the Binomial distribution

  1. \xi = \xi_1 + \xi_2 \approx B(n_1 + n_2, p) when \xi_1 \approx B(n_1, p) and \xi_2 \approx B(n_2, p) are independent v.a.
  2. \xi = \xi_1 + \cdots + \xi_r \approx B(n_1 + \cdots + n_r, p) when \xi_1 \approx B(n_1, p), \cdots, \xi_r \approx B(n_r, p) are independent v.a.

Calculation of a Binomial
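
A minimal Python sketch of this calculation, assuming parameters n, p and a number of successes k supplied by the user, could be:

from math import comb

def binomial_pmf(n, p, k):
    # P{xi = k} = C(n, k) * p^k * q^(n - k) for xi ~ B(n, p)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5
print(binomial_pmf(n, p, 3))      # probability of exactly 3 successes in 10 trials
print(n * p, n * p * (1 - p))     # mean and variance of B(10, 0.5)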