Section Eleven - Predators & data scientists
A story about data fundamentalism and some final tips for working with data (6 minutes)
Data predators
In the previous sections we have pointed out a number of fundamental problems with regard to (big) data.
Most people, organizations or systems do not intend anything evil. However, we showed that (big) data and algorithms can unintentionally lead to problematic situations. Unfortunately, there are also a lot of people who are less well-meaning, even evil, and they now have this powerful new toolset. Big data does not always work as intended, but it often works well enough to do bad things. With (big) data you can, for example, select people who are vulnerable and target them very specifically.
There are many examples. The real problem is that this is hardly regulated, if at all. We will talk about this some more in the crash course on bad actors.
Data fundamentalism
In the previous sections we talked about all these fundamental problems with (big) data. We named ten! And there are even more. For example, you can easily lie with statistics or visualizations (see examples in section twelve). Or you can find a pattern (this student is probably going to drop out), make the wrong intervention, and thereby make sure that the student drops out.
Or maybe the student will sue you.
Despite all the drawbacks, recent years have seen a high degree of so-called dataism or data fundamentalism. Kate Crawford coined the term data fundamentalism, and it stands for a blind faith in the power of data. The idea that correlation always indicates causality (oh yes, cheese and bed sheets!). That large data sets offer the truth. That predictions and decisions based on data are objective (not so!). That algorithms can solve any problem (yeah, right!).
Everywhere we see organizations advertising that they are data-driven. Data-driven marketing, data-driven e-commerce, data-driven decision making and so on. Data-driven now has 23.5 million hits on Google, and that number is growing fast. Apparently, being data-driven is something to be proud of. Everywhere you see commercials from data consultancy agencies that sound a little bit like this (imagine a typical Hollywood movie trailer voice!):
In a fast-changing world where your competitors are catching up with you. There is only one solution. To adjust. To be data-driven.
Fear sells.
Everywhere you will find policy documents in which serious organizations say they put data at the centre of their organization. That they will be data-driven. In these policy documents they identify the biggest challenges. What about privacy? Can we find enough qualified people? Storage? Computing power? Will we use the data in an ethical way? And so on.
However, they rarely question the data itself. Data is good.
Data fundamentalism.
However, given all the fundamental issues with data we talked about in the previous sections, an organization should perhaps be less proud of the fact that it is guided by data. Data-driven decision making does not, to put it mildly, sound like a wise thing to do. Maybe it is better to deal with an organization that is not guided by data.
Maybe it is better for organizations to be customer driven. Or patient driven. Or student driven. Organizations that are human driven and supported by data. It may sound less sexy, but it is probably better.
Think harder!
This crash course is almost finished. It was all about the promises of (big) data and the fundamental issues of (big) data. The promises are great, but only if you are aware of the pitfalls and limitations.
Using (big) data means you have to think harder.
To help you with that, we have drawn up six pieces of advice for assessing the use of data (and technologies that make use of data).
- Every data set must be viewed skeptically. Be aware that there are errors in the data. Be aware that the data has been collected or labelled by people with an opinion. And especially ask yourself which data has NOT been collected and/or is not part of the dataset. Whatever you do, don’t consider data to be neutral or objective;
- Data is a tool, not a decision maker. You don’t ask your hammer how to build your house. If somebody uses the term data-driven decision making, walk away. Think about the horse guru: data complements the old-school way we understand the world; it does not replace it;
- A data scientist is only valuable if they understand the subject they are collecting data about. Ask yourself if you have enough information to judge the value of the data and the conclusions drawn from it;
- Distrust proxies. If you cannot measure something directly (for example, what you learned at school or whether you are a good employee), all kinds of related data (proxies) are often collected instead. The danger is that the proxies (dropout, yield, number of teachers with a title, employee satisfaction, turnover, etc.) become more important than the thing you cannot measure, or that they create a self-fulfilling prophecy;
- Distrust data that knows WHO you are. In all cases it should be about WHAT you do now, not who you are. Otherwise you will be judged on what you did, not on who you are now or what you want to do. You will be denied a loan because of your past and your income, despite your new ambitions and your new job. Or you will get suggestions based on what you watched on Netflix, not on what you wanted to watch. Also, reality is far more complex than a million data points about you;
- Ask the right questions. Kevin Kelly devotes an entire chapter to it in his book The Inevitable. Good questions will become much more important than good answers. This certainly applies in the world of (big) data. Let data help you ask questions.
In short, organizations should use (big) data as a tool and think hard about it. Maybe in the future machine learning and data analysis will become so good that they are much better at this than we humans are (because yes, we admit it, we are not perfect either). If so, we will update this crash course. For now, we advise you to choose organizations that approach (big) data with the above six rules of thumb in mind.
That are human driven and data supported.
And preferably write big data without capitals.
Takeaways from section eleven:
- Data provides predators with new tools to prey on vulnerable people;
- There are six basic rules you should always keep in mind while working with data;
- Organizations should be human driven and data supported (no data fundamentalism).
Some final words on crash course four
Congratulations! You have completed this crash course and have had a small taste of thinking about technology and data. This was just an appetizer. If you are going to think about, assess, design, program, discuss, use or invent a technology, we would like you to remember that:
- Data, especially big data, holds great promise;
- Data is always subjective;
- You influence what you measure;
- Apophenia, correlation and cause & effect are tricky concepts;
- Killing pirates does not solve global warming;
- Algorithms are often biased;
- Most of the time feedback loops are not closed;
- Using data to predict the future only works if the future looks something like the past;
- Reality is way more complex than millions of data points;
- Using data means thinking harder.
In the previous ten sections, we've covered ten fundamental problems with data. But we could have mentioned even more. We could have devoted a section to the fact that people have trouble understanding numbers (many people are not statistically literate). Another topic we could have discussed is misleading visualizations. We are working on an optional bonus section on this topic. The message remains the same: when you work with data, you have to think harder.
And remember: do not be data-driven, be data supported.
Want more?
This was a crash course. It only took you one hour to complete. If you want more, we have some suggestions:
- First, you can check out section twelve, with all kinds of additional materials. Section twelve is updated regularly;
- Second, you can do one of the other ten crash courses;
- Third, you can start using the Technology Impact Cycle Tool, especially the questions regarding data;
- Fourth, you can check out our example cases, for example on using data and the Corona Contact Tracing App.
Finally, do you have any suggestions or remarks on this course? Let us know at info@tict.io