Section Three - Measuring is influencing
A story about Facebook, university rankings and criminalizing poverty (4 minutes)
A while ago, when Facebook was still (more or less) trustworthy, the company proudly announced that by analyzing our likes, it knew exactly who we were. That was a bit creepy. But it was also a strange claim, because Facebook was also the company that decided what we saw on Facebook. So it also influenced which posts we "liked."
You see this pattern everywhere: we measure things, but we also influence what we measure, directly or indirectly.
Let's look at some examples that combine the problem from the previous section (all data is subjective) with the problem from this section.
Examples
Example (1) Remember when we stated in the first section that one of the benefits of (big) data is that you can predict the future? Applications such as PredPol and CompStat do exactly that. They predict where most crime will be committed and show this as moving squares on a map, so that police can be deployed to the areas where crime is most likely to take place.
Sounds great, right?
Sure, but from section one we know that we have to take a close look at the data that is collected. What data does PredPol use? Then it becomes clear that PredPol can only use data on crimes that happen a lot: drinking in public, noise disturbance, loitering, breaking into cars, drug use, etc. There is not enough data on more serious crimes such as breaking and entering, rape or murder. PredPol and CompStat therefore send the police to petty crime, which mostly takes place in poor areas.
In this way, these systems are criminalizing poverty with data.
Second, because the police are deployed to these areas, more violations are recorded there, and a negative spiral sets in. The predictions appear to come true, but mostly they are just a self-fulfilling prophecy.
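To make this spiral concrete, here is a toy simulation. It is a minimal, hypothetical sketch in Python, not how PredPol or CompStat actually work. It assumes two areas with exactly the same true rate of petty crime, a rule that sends the patrol to whichever area has the most recorded incidents so far, and the fact that incidents only get recorded where the police are. The function name and all numbers are made up for illustration.

```python
import random


def simulate_feedback_loop(days=365, seed=0):
    """Hypothetical toy model of a predictive-policing feedback loop."""
    rng = random.Random(seed)
    # Assumption: both areas have the SAME underlying chance of a petty crime per day.
    true_rate = {"A": 0.3, "B": 0.3}
    # Assumption: area A starts with one extra incident in the historical data.
    recorded = {"A": 1, "B": 0}
    for _ in range(days):
        # The "prediction": patrol the area with the most recorded incidents.
        target = max(recorded, key=recorded.get)
        # Crime happens in both areas, but only the patrolled area is observed.
        if rng.random() < true_rate[target]:
            recorded[target] += 1
    return recorded


print(simulate_feedback_loop())
```

Area B never gets a single recorded incident, not because nothing happens there, but because nobody is looking. Area A's count keeps growing, which seems to confirm the "prediction" that sent the police there in the first place.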
Example (2) In the United States, a ranking of universities was introduced. This ranking could not measure the most important thing (what students have really learned and how they have grown as people), but it could measure all kinds of other indicators (proxies), such as scores, dropout rates, alumni donations, sponsors, job prospects and athletic performance. As a result, universities increasingly focused on the ranking indicators. After all, highly ranked schools attract the most students. The score did not include tuition fees (remember, data is subjective), because the organizers had to make sure universities like Harvard and Yale ended up at the top of the ranking to make it look trustworthy.
The result was an unfair comparison, with universities focusing on ranking indicators and study costs rising sharply.
The worst part is that these types of data models, even when they are transparent, still reinforce themselves. After all, highly ranked schools receive more applications, can be more selective, get better students, attract better professors, and so on.
Again, a self-fulfilling prophecy.
This is why Cathy O'Neil states:
"An algorithm is an opinion embedded in math" - Cathy O'Neil
We will talk about algorithms later, but for now, please watch this video (2 minutes) in which she explains the problems with data from sections two and three:
Takeaways from section three
- When you measure something, you start to influence what you measure;
- This often creates self-fulfilling prophecies and increases inequality.