Section One - The promises of (Big) Data
A definition, some great promises and a mystery (10 minutes)
We truly believe in the power of data.
In small data. In medium data. In Big Data. We believe the power of data is that it can make things visible you could not see before. It can reveal things that have been hidden, for example, because people lie on questionnaires.
In the U.S., if you take people's answers about how often they have sex and how often they use a condom at face value, 1.1 billion condoms would have to be sold every year. In reality, 600 million are sold.
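To get a feeling for the size of that gap, here is a minimal sketch of the arithmetic in Python. The two figures are the ones quoted above; the variable names are our own and only there for illustration.

```python
# Figures quoted in the text (U.S., per year).
implied_by_surveys = 1_100_000_000  # condoms implied by what people report
actually_sold = 600_000_000         # condoms actually sold

# How many reported condoms are not backed by sales?
gap = implied_by_surveys - actually_sold

# Which share of the reported use is covered by real sales?
share_covered = actually_sold / implied_by_surveys

print(f"Condoms reported but never sold: {gap:,}")
print(f"Sales cover about {share_covered:.0%} of what the surveys imply")
```

In other words, roughly half of the condom use that people report never shows up in the sales data.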
But we also believe that the power of data can only be truly leveraged if you are aware of the shortcomings and pitfalls that data entails. That is why we will discuss these in the next eleven sections and show you some examples. It is really important to be aware of these pitfalls and shortcomings because everything is becoming datafied really fast. What used to be just a conversation is now data in WhatsApp.
But first let's look at (Big) Data. What is it? What are the promises of (Big) Data?
(Big) Data
A lot of people are really enthusiastic about data and the datafication of the world. They are really into Big Data. Let's look at a typical enthusiastic video that also explains (a little bit) what Big Data is and what promises Big Data offers.
Quick fun fact: This video (2 minutes) is from 2016 and starts with: 'the time of spreadsheets is over...' Really? Come and have a look at our university.
If people talk about Big Data, they often mention the 'five V's':
- First of all - surprise - a lot of data (Volume);
- This is data that is produced quickly and changes quickly (Velocity);
- It concerns many different types of data, often unstructured (Variety);
- It's about reliability (Veracity);
- The point is that you can get Value from it.
Many people think Big Data should not be defined by the V's. They prefer to think of Big Data as a movement. This movement is about the fact that it is becoming easier (and cheaper!) to collect, store and edit data, and that the statistics (and algorithms) are getting better.
Also, we are collecting more and more data. Think about the Internet of Things, the smartphone that you are always carrying, the sensors and cameras that are everywhere, and so on! We are moving. We are collecting. We are analyzing. We let it determine parts of our life. It is a movement.
So we need to understand this movement.
The promises of (Big) Data
In this section we look at the promises of (Big) Data. We already saw some examples in the video above. Below we have listed some really important ways in which (Big) Data can improve our world. This is not a complete list (it is a crash course, after all), but our intention is to give you a feeling for the possibilities.
Promise One: A new perspective on reality (digital twins)
Almost everything we do leaves a data trail, and this trail only grows. Just think of all those emails, documents, posts, likes, Tweets, sensors, cameras, smart speakers, smart TVs and so on. Today many devices have an internet connection. We call this the Internet of Things.
Let's look at a video (2 minutes):
Quick sidestep: We often call devices that are connected to the internet smart. This is not always true. Having an internet connection doesn't make something automatically smart. In fact, there are very stupid devices with an internet connection, such as the Hapifork.
This fork keeps track of how fast you eat, and if you eat too fast, the fork will vibrate. Do you think that sounds smart? Or does it sound like your mother? There are even smart toys which in turn lead to intriguing phenomena like a smart Barbie.
All these 'smart devices' (our car, toothbrush, scale, energy meter, and so on) collect data all the time. In addition to all those devices, we are also increasingly measuring ourselves (steps, heart rate, sleep, and so on). This is called the Quantified Self movement. You can find more about this in crash course two. And these are just some examples of a world that is getting datafied really fast.
This means there is no longer a clear distinction between online and offline. It also means that we get the opportunity to look at our 'reality' differently. We see a world with our eyes, but we can also look at a digital print of that world.
A digital twin!
This means that we are able to gain a new perspective, and that offers many opportunities to better understand our world. We can understand things better and maybe solve problems that were unsolvable. We can find new markets. Cure diseases. Make things better!
Quick sidestep: Our favorite example of gaining a new perspective comes from Pornhub's data. They have 42 billion visits a year, so that is a lot of data, and they found that the six most searched terms of 2019 included Mom, Stepmom and Milf. With that, some 80 years after his death, Freud's Oedipus complex seems to be proven by the data. A lot of men really want to have sex with their mother.
Quick Question: When do you think the lockdown started in Spain?
So, data offers a new perspective, which offers new possibilities to solve problems. This might also mean that the major problems of our world (climate, health, pollution, food, etc.) could be tackled much better if you have large amounts of data to understand these problems better. Given the scale of these problems, (big) data is perhaps the only way to tackle them.
Data can also give new insights into human nature. We already know that people lie on questionnaires. There is no incentive to be honest. With Google it is exactly the other way around. People are very honest there, because as a reward they get an answer to their question. On a questionnaire they say that they are happy, and they say the same to their friends. But they ask Google “How do I know if I am depressed?” Google's data is therefore becoming more and more a digital twin of human nature.
"Google was invented so that people could learn about the world, not so researchers could learn about people. But it turns out the trails we leave as we seek knowledge on the internet are tremendously revealing." - Seth Stephens-Davidowitz.
Or maybe not. We talk about that in later sections.
Promise Two: Predicting (especially: the future)
If you have a lot of data, and you understand things better, you can also better predict the future.
This can be done on a macro level. For example, you can predict where a lot of crime will take place or where a fire will break out, and in this way deploy the police or fire brigade more smartly. At Fontys University we can (almost) predict the impact of the weather (rain!) on the number of students coming to school. Restaurants can predict much better not only when they will be busy, but also what people order on which days and in which weather, and thus purchase more efficiently. If you know how people move through a store or a shopping street, you can lay it out more efficiently. If you know how a virus spreads, you can fight it better.
It is also possible to predict the future at a micro level. If you have a lot of data about the normal behaviour of a driver, you also know when a driver is about to fall asleep. You can see behaviour in a student that indicates dropout and intervene in time. You can feed a smart car data so it knows when to brake. Or think of the Facebook News Feed, which predicts what you 'want' to read (or maybe: want to click).
The possibilities are limitless.
Promise Three: Enabling objective decision making
By means of (big) data you can - at least in theory - also make more objective decisions.
Instead of being judged by a subjective person who only has a little bit of data and a lot of feelings, why not make decisions based on a lot of data and no (or fewer) feelings? Data can decide if someone is perfect for a job! Data can decide if someone gets insurance or not. Or whether a student should be admitted to a school. You can also take data into account when determining sentences. And so on.
Many people argue that this leads to unfairness. They say: if the system is biased, then the algorithm is also biased. True, but that does not mean algorithms can't still function better than humans. You know, humans are biased too. Algorithms don't have to be perfect, some people say, only better than humans. We talk about this extensively in later sections and courses.
Promise Four: The fuel for artificial intelligence
(Big) Data is the fuel with which machine learning is 'trained.' The idea of machine learning (which is part of artificial intelligence) is that you program certain basic rules and then the computer learns by itself, based on enormous amounts of data. Every time you search for a picture of a cat on Google and click on a cat in Google Images, Google learns what a cat looks like. Machine learning offers enormous possibilities. Think of recognizing cancer cells, self-driving cars, fighting spam, chatbots and so on.
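To make the idea of 'a basic rule plus a lot of examples' a bit more concrete, here is a minimal sketch in Python. It is a toy, not how Google or any real system works: we assume a very simple technique (compare a new example with the average of each group, a so-called nearest-centroid classifier), and the features and numbers are invented by us purely for illustration.

```python
from statistics import mean

# Hypothetical, hand-made examples: (ear_pointiness, snout_length) -> label.
# These numbers are invented; real systems learn from millions of images.
training_data = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.7, 0.1), "cat"),
    ((0.3, 0.8), "dog"),
    ((0.2, 0.9), "dog"),
    ((0.4, 0.7), "dog"),
]


def train(examples):
    """Learn one 'average example' (centroid) per label from the data."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Pick the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train(training_data)
print(predict(model, (0.85, 0.25)))  # an animal with pointy ears, short snout
print(predict(model, (0.25, 0.85)))  # an animal with floppy ears, long snout
```

The only thing we programmed is the basic rule ('pick the closest average'). What a cat or a dog looks like comes entirely from the examples, so more and better examples mean a better model.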
The more data you have, the more you can use machine learning for all kinds of problems. Nowadays, we have machines that can find new patterns completely independently (and opaquely) or even imagine things like paintings. Much more about this in crash course five about transparency.
Data-driven
The promises of (Big) Data explain why there is so much hype around organizations that want to be data-driven. Data-driven marketing. Data-driven education. Data-driven logistics. Data-driven healthcare. And so on.
However, we do not think being data-driven is a good idea. We will explain that in the following sections.
Big Data, a capital mystery
We can do fantastic things with data, especially with Big Data, but that doesn't explain why we almost always write Big Data with capital letters (B&D). We have done some research, but we have not been able to find a real answer. We did find some possible answers:
- It is just something 'techies' do. Artificial Intelligence is also often capitalized;
- It has something to do with marketing. A capitalized Big Data looks impressive;
- There is a conspiracy going on against the public good, just like with Big Oil or Big Tobacco.
Our opinion is that it is an exaggeration to write big data in capital letters. You don't capitalize antibiotics either. In addition, there are a lot of issues with (big) data.
That is the subject of the following sections.
Takeaways from section one
- There is more and more data every day;
- Big Data is often characterized by the five V's or as a movement;
- Big Data holds great promises: a new perspective, predicting the future, objective decision making, fuel for artificial intelligence;
- With (Big) Data we can solve problems and make the world a better place;
- That's (maybe) why we write Big Data with capitals and organizations want to be data-driven.