Theory - [T5]
Dataset
First, let's define our bivariate distribution.
As a case study we take a population of students. As random variables we take the grade each student obtained in a certain exam and each student's current average.
Student | Grade | Average |
---|---|---|
Kiersten | 28 | 29 |
Reese | 26 | 21 |
Andy | 30 | 27 |
Marc | 26 | 20 |
Contingency table
Now that we have our dataset, let's build the contingency table:
Grade\Average | 20 | 21 | 27 | 29 |
---|---|---|---|---|
26 | 1 | 1 | 0 | 0 |
28 | 0 | 0 | 0 | 1 |
30 | 0 | 0 | 1 | 0 |
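The contingency table can be built mechanically by counting how many students fall into each (grade, average) cell. Below is a minimal sketch using only Python's standard library (`collections.Counter`). The names `students` and `contingency_table` are hypothetical, chosen for illustration, and rows and columns are assumed to be sorted in ascending order as in the table above.

```python
from collections import Counter

# Dataset from the table above: (student, grade, average)
students = [
    ("Kiersten", 28, 29),
    ("Reese", 26, 21),
    ("Andy", 30, 27),
    ("Marc", 26, 20),
]

def contingency_table(records):
    """Count how many records fall into each (grade, average) cell."""
    counts = Counter((grade, average) for _, grade, average in records)
    grades = sorted({grade for _, grade, _ in records})
    averages = sorted({average for _, _, average in records})
    # Rows are grades, columns are averages; Counter returns 0 for empty cells
    return grades, averages, [[counts[(g, a)] for a in averages] for g in grades]

grades, averages, table = contingency_table(students)
print("Grade\\Average", *averages)
for grade, row in zip(grades, table):
    print(grade, *row)
```

Each printed row matches the corresponding row of the contingency table.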
Joint frequency
The joint frequency is the number of observations in which two events occur at the same time. For example, from the previous contingency table: $$freq(Grade=26, Average=21) = 1$$ This value can be read directly from the contingency table, in the cell at row 26 and column 21.
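As a sketch, the joint frequency is simply a lookup in a counter keyed by (grade, average) pairs. This assumes the dataset is reduced to a list of such pairs; `pairs`, `joint`, and `joint_freq` are hypothetical names used only for illustration.

```python
from collections import Counter

# (grade, average) pairs taken from the dataset table
pairs = [(28, 29), (26, 21), (30, 27), (26, 20)]

# Joint frequency: number of students with a given grade AND a given average
joint = Counter(pairs)

def joint_freq(grade, average):
    # Counter returns 0 (without inserting the key) if the combination never occurs
    return joint[(grade, average)]

print(joint_freq(26, 21))  # → 1
print(joint_freq(26, 29))  # → 0
```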
Extended contingency table
Now let's extend the contingency table above with the corresponding marginal frequencies:
Grade\Average | 20 | 21 | 27 | 29 | Marginal Grade |
---|---|---|---|---|---|
26 | 1 | 1 | 0 | 0 | 2 |
28 | 0 | 0 | 0 | 1 | 1 |
30 | 0 | 0 | 1 | 0 | 1 |
Marginal Average | 1 | 1 | 1 | 1 | 4 |
Marginal frequency
From the extended contingency table we can read the marginal frequency of each row and column: each value is the sum of all the joint frequencies in that row or column. For example: $$freq(Grade=26) = 1 + 1 + 0 + 0 = 2$$
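A minimal sketch of the marginal computation, assuming the joint frequencies are stored in a `Counter` keyed by (grade, average) as before; `marginal_grade` and `marginal_average` are hypothetical names. Each joint count is added to its row total and to its column total.

```python
from collections import Counter

pairs = [(28, 29), (26, 21), (30, 27), (26, 20)]
joint = Counter(pairs)

# Marginal frequency of a grade: sum of the joint frequencies in that row
marginal_grade = Counter()
# Marginal frequency of an average: sum of the joint frequencies in that column
marginal_average = Counter()
for (grade, average), count in joint.items():
    marginal_grade[grade] += count
    marginal_average[average] += count

print(dict(sorted(marginal_grade.items())))    # → {26: 2, 28: 1, 30: 1}
print(dict(sorted(marginal_average.items())))  # → {20: 1, 21: 1, 27: 1, 29: 1}
# Both sets of marginals add up to the total number of students
print(sum(marginal_grade.values()), sum(marginal_average.values()))  # → 4 4
```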
Conditional frequency
With the quantities derived so far, we can define the conditional frequency: the frequency of one event computed only among the observations in which another specific event occurs.
It is the joint frequency of the two events divided by the marginal frequency of the conditioning event:
$$ freq(Grade=x \mid Average=y) = \frac{freq(Grade=x, Average=y)}{freq(Average=y)} $$
For example, the conditional frequency of a grade of 30 given an average of 27 is:
$$ freq(Grade=30 \mid Average=27) = \frac{freq(Grade=30, Average=27)}{freq(Average=27)} = \frac{1}{1} = 1 $$
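The following sketch computes conditional frequencies as joint count divided by the marginal count of the conditioning event. It is a minimal illustration, not a definitive implementation: the function names `grade_given_average` and `average_given_grade` are hypothetical, and `fractions.Fraction` is used (a design choice, not something the notes prescribe) so that ratios such as 1/2 stay exact.

```python
from collections import Counter
from fractions import Fraction

# (grade, average) pairs taken from the dataset table
pairs = [(28, 29), (26, 21), (30, 27), (26, 20)]
joint = Counter(pairs)
marginal_grade = Counter(grade for grade, _ in pairs)
marginal_average = Counter(average for _, average in pairs)

def grade_given_average(grade, average):
    """freq(Grade=grade | Average=average): joint count over the column marginal."""
    if marginal_average[average] == 0:
        raise ValueError(f"no student has average {average}")
    return Fraction(joint[(grade, average)], marginal_average[average])

def average_given_grade(average, grade):
    """freq(Average=average | Grade=grade): joint count over the row marginal."""
    if marginal_grade[grade] == 0:
        raise ValueError(f"no student has grade {grade}")
    return Fraction(joint[(grade, average)], marginal_grade[grade])

print(grade_given_average(30, 27))  # → 1
print(average_given_grade(20, 26))  # → 1/2
```

The second call shows that the denominator depends on which event is conditioned on: two students have a grade of 26, so the row marginal is 2 and the result is 1/2.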