Section Five - Understanding correlation
A story about avoiding the ambulance (5 minutes)
What's going on?
In the previous section we talked about correlation as coincidence (apophenia).
Of course, a correlation is often more than a coincidence: something real is going on. But what exactly? You can get a long way toward answering that from the data alone. Below is an example, which is especially cool if you like statistics, cats and tall people.
Watch the video below (4 minutes):
But usually you cannot understand what is going on just by looking at the data. Most of the time you also need domain knowledge.
For example, suppose you find in the data that people often die in the ambulance on their way to the hospital. Then you could conclude – if you do not have any domain knowledge – that it is better, after an accident, not to travel by ambulance.
After all, people die in there!
But with domain knowledge, you would know that ambulances often ‘load’ people who have already died, even though this is officially not allowed. That is why the records show those people as having died in the ambulance.
It also helps to know that a much larger group survives precisely because they travel by ambulance.
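To make the ambulance story concrete, here is a minimal Python sketch. Every number in it is invented purely for illustration and is not a real statistic. The `Group` class, the `dead_at_pickup` flag and the "other" transport group are also assumptions made for this sketch. It shows how the same records can point in opposite directions, depending on whether you apply the domain knowledge about people who had already died before being loaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    transport: str        # "ambulance" or "other"
    dead_at_pickup: bool  # domain knowledge: already dead when loaded
    patients: int         # number of people in this group
    recorded_deaths: int  # deaths the records attribute to the journey


# HYPOTHETICAL counts, made up for this example only.
GROUPS = [
    Group("ambulance", dead_at_pickup=True, patients=40, recorded_deaths=40),
    Group("ambulance", dead_at_pickup=False, patients=960, recorded_deaths=20),
    Group("other", dead_at_pickup=False, patients=1000, recorded_deaths=50),
]


def death_rate(groups):
    """Recorded deaths divided by the number of patients in the given groups."""
    patients = sum(g.patients for g in groups)
    deaths = sum(g.recorded_deaths for g in groups)
    return deaths / patients


ambulance = [g for g in GROUPS if g.transport == "ambulance"]
other = [g for g in GROUPS if g.transport == "other"]
alive_at_pickup = [g for g in ambulance if not g.dead_at_pickup]

# Without domain knowledge: take the records at face value.
naive_rate = death_rate(ambulance)
other_rate = death_rate(other)

# With domain knowledge: leave out people who were already dead when loaded.
adjusted_rate = death_rate(alive_at_pickup)

print(f"Ambulance, records at face value: {naive_rate:.1%}")
print(f"Other transport:                  {other_rate:.1%}")
print(f"Ambulance, alive at pickup only:  {adjusted_rate:.1%}")
```

With these made-up counts, the face-value rate makes the ambulance look more dangerous than other transport, while the adjusted rate makes it look safer. The data did not change. Only the knowledge of how the records came about did.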
Note: there are many more examples like this. In section six we have a fun exercise.
Takeaways from section five:
- Correlations can be deceiving;
- Correlation is not the same as causality;
- Domain knowledge is indispensable.