The True Size of BIG DATA (and a note about Brexit)

Simplifying Big Data and how it’s used in everyday life

Reading Time: 8mins
Big Data in Lego

Firstly, an overview of “normal sized” data

Data comes in many forms that most people recognise: numbers, text, images, video and sound are the most common. In computing, all of these forms are represented by ‘Bits’, binary values that are either 1 or 0. 8 Bits make up 1 Byte, which is enough to store a single letter or a small number (simple data forms). Storing images needs a lot more Bytes, which is where the Kilobyte comes in. Video and sound need even more, which is where Megabytes come in. Still with me?

Imagine a Bit as the smallest Lego piece available. You can’t do much with one piece but with a few pieces, you can build a simple object (or numbers and letters). With more pieces you could construct a building (or words and sentences). And even more means creating a town or city, or in the case of data, storing complex things like images or videos.
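
To make the Bits and Bytes idea a little more concrete, here is a minimal Python sketch. The function name `to_bits` is made up for this illustration, and it assumes the text is stored using the common UTF-8 encoding, where each plain English letter takes exactly 1 Byte (8 Bits).

```python
def to_bits(text: str) -> list[str]:
    """Return each Byte of the UTF-8 encoded text as a string of 8 Bits."""
    return [format(byte, "08b") for byte in text.encode("utf-8")]


# Two letters need two Bytes, i.e. sixteen Bits (sixteen tiny Lego pieces).
print(to_bits("Hi"))  # ['01001000', '01101001']
```

Longer text simply needs more Bytes, which is how a few letters grow into Kilobytes and Megabytes.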

The interactive diagram here gives a better representation of the scale of data. The display below also shows the various sizes of data, along with an indication of the scale of each increment. It might also come in handy if a question about data sizes comes up on a gameshow or in Trivial Pursuit.

Getting Organised

Data isn’t just a collection of numbers, text and images. To gain better insights into data, it’s arranged in different ways, which affects how we interact with it.

Here are some of the different levels of data arrangement:

  • Raw Data – Gathered and stored data with no processing applied. Also referred to as ‘Unstructured Data’. For example, if a supermarket gathered data at the checkout on everything you purchased and stored it as it is, that’s ‘Raw’ data
  • Sorted Data – Gathered data with some basic sorting logic applied. Using the supermarket analogy again, this means gathering the checkout data and then sorting it from highest to lowest price
  • Arranged Data – Gathered and structured data put into relevant, logical groups. For example, the supermarket purchase data is first gathered, then stored by type (Meat, Vegetables, Fruit, Toiletries etc.) and sorted by price. Arranged data is much more informative and allows for greater insights into the data
  • Presented Visually – Gathered and structured data presented in a way that gives clear insights to the viewer. In the final supermarket example, this means gathering data about all purchases over an extended period of time, then using it to show on a calendar chart the days of the week on which the customer spends the most on vegetables
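
The four levels above can be sketched in a few lines of Python. This is only an illustration: the purchase records, field names and prices are invented, and a simple text bar stands in for the calendar chart.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Raw data: hypothetical checkout records, stored in the order they were scanned.
raw = [
    {"item": "Carrots", "category": "Vegetables", "price": 1.50, "day": date(2024, 1, 1)},
    {"item": "Chicken", "category": "Meat", "price": 4.50, "day": date(2024, 1, 1)},
    {"item": "Shampoo", "category": "Toiletries", "price": 3.00, "day": date(2024, 1, 1)},
    {"item": "Apples", "category": "Fruit", "price": 2.00, "day": date(2024, 1, 6)},
    {"item": "Broccoli", "category": "Vegetables", "price": 1.25, "day": date(2024, 1, 6)},
    {"item": "Potatoes", "category": "Vegetables", "price": 2.25, "day": date(2024, 1, 6)},
]

# Sorted data: the same records, ordered from highest to lowest price.
sorted_data = sorted(raw, key=lambda p: p["price"], reverse=True)

# Arranged data: grouped by type, with each group still sorted by price.
arranged = defaultdict(list)
for purchase in sorted_data:
    arranged[purchase["category"]].append(purchase)

# Presented visually: total vegetable spend per day of the week.
veg_by_weekday = defaultdict(float)
for purchase in raw:
    if purchase["category"] == "Vegetables":
        weekday = WEEKDAYS[purchase["day"].weekday()]
        veg_by_weekday[weekday] += purchase["price"]

for weekday, total in veg_by_weekday.items():
    bar = "#" * int(total * 2)  # one '#' per 50p spent
    print(f"{weekday:<10} {bar} £{total:.2f}")
```

Each step uses exactly the same purchases; only the arrangement changes, and with it how easy it is to spot a pattern.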

Here is a visualisation of the different levels using another Lego analogy:

Arranged Data in Lego

So What is Big Data In Simple Terms?

The simplest common understanding of Big Data is a volume of data that is bigger than a Petabyte. That’s a bit too simplistic though: companies have data sets of varying sizes, so the definition of BIG varies. The true definition of Big Data includes an understanding of the 4Vs (which is not a hand gesture, by the way).
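
To get a feel for how far a Petabyte is from a single Byte, here is a rough Python sketch. It assumes decimal units, where each step up is 1,000 times the previous one (some systems count in steps of 1,024 instead), and the 5 Megabyte photo is just a made-up example size.

```python
# Decimal units: each step up is 1,000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_size(num_bytes: int) -> str:
    """Format a Byte count using the largest unit that keeps the number under 1,000."""
    size = float(num_bytes)
    for unit in UNITS[:-1]:
        if size < 1000:
            return f"{size:g} {unit}"
        size /= 1000
    return f"{size:g} {UNITS[-1]}"  # anything larger stays in Petabytes


PETABYTE = 1000 ** 5  # number of Bytes in one Petabyte

print(human_size(1))          # 1 B
print(human_size(2_500_000))  # 2.5 MB
print(human_size(PETABYTE))   # 1 PB

# Hypothetical example: how many 5 MB photos fit in one Petabyte?
photo_bytes = 5 * 1000 ** 2
print(PETABYTE // photo_bytes)  # 200000000
```

In other words, a Petabyte would hold around 200 million photos of that size.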

What are the 4Vs of Big Data?
  • Volume – How BIG the data is. The first and obvious one. For data to be BIG there needs to be enough of it. E.g. Products with 5* ratings from a large number of people are better than products with a 5* rating from 1 person
  • Variety – How varied the data is. This involves the varied nature of the data as well as collecting it from a variety of sources. E.g. Rather than collecting consumer data only from Facebook, collecting it from many other platforms too, such as Instagram, Twitter, Snapchat, Twitch and LinkedIn
  • Veracity – How trustworthy and accurate the data is. This includes where the data comes from, e.g. direct from source or from a third party. Any cleansing or quality control to remove errors during the gathering process is also part of this
  • Velocity – How quickly the data’s gathered and processed. The quicker the gathering process, the more valuable the insights into the data. E.g. When gathering data about the stock market, gathering it more quickly allows for quicker insights. This makes the data more valuable because, given the nature of the market, fresher data is more relevant
  • Value (Bonus V) – The actual value of the exercise. A bonus V that’s often overlooked, which is why it’s explicit in this list. Big Data exercises should aim to achieve something and this value should be clear from the beginning. This could be launching a new product, identifying a new opportunity or reviewing the efficiency of an existing service. Frequently though, it’s looking at better, more direct ways of advertising. Value is key because it stops a Big Data exercise being carried out for the sake of it. However, value is what a lot of companies inexperienced in this area tend to overlook at the beginning.

Put simply, anyone claiming to perform a Big Data exercise has to consider all of the 5Vs (the 4Vs plus the bonus one), with special attention to the VALUE.

So How Does Big Data Affect You?

The concept of Big Data has actually been around for quite a while but the label has only recently become popular. The easiest way to demonstrate its use is with a case study.

In the U.S. back in 2002, after gathering data about its customers on a regular basis, Target, a prominent supermarket, made use of this data in quite an innovative way. The supermarket had already done some market research and discovered that pregnant women were a lucrative customer base for them. Research showed that if the supermarket could get these women to buy products from their store during their pregnancy, they would remain loyal to that store at least until the child’s teenage years.

The trouble was that, to win this customer base, the supermarket would have to identify and target them early enough to be effective. This led to the goal of their Big Data initiative… predict pregnancies!

Spotting Trends in Data

As far-fetched as the initiative, led by the career statistician Andrew Pole, might have sounded, it was actually quite logical. He started by looking at the buying habits of women already known to be pregnant and worked backwards, fitting these habits to customers who were not yet known to be pregnant. These shopping habits included things like:

  • Sudden reduction in alcohol purchases after a usually consistent purchase level
  • Complete absence of contraception/prophylactic purchases
  • Removal of foods which are commonly avoided by pregnant women such as raw fish, unpasteurised cheese etc.
  • Removal of processed foods
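
As a purely illustrative sketch, and not the method Target or Andrew Pole actually used, the first habit in the list (a sudden drop after a consistent level) could be spotted along these lines. The function name, the window sizes and the 25% threshold are all assumptions made for this example.

```python
from statistics import mean


def sudden_drop(weekly_units: list[int],
                baseline_weeks: int = 8,
                recent_weeks: int = 4,
                drop_ratio: float = 0.25) -> bool:
    """Return True if recent purchases fall to drop_ratio of the earlier level or less.

    weekly_units holds the number of units bought each week, oldest first.
    """
    if len(weekly_units) < baseline_weeks + recent_weeks:
        return False  # not enough history to judge what is "usual"
    # The "usual" level: the weeks just before the most recent ones.
    baseline = mean(weekly_units[-(baseline_weeks + recent_weeks):-recent_weeks])
    # The latest level: the most recent few weeks.
    recent = mean(weekly_units[-recent_weeks:])
    return baseline > 0 and recent <= baseline * drop_ratio


# Hypothetical customer: steady purchases for 8 weeks, then almost none.
history = [3, 2, 3, 3, 2, 3, 3, 2, 0, 0, 1, 0]
print(sudden_drop(history))  # True
```

A single signal like this would prove very little on its own; the idea in the case study was to combine several such habits before drawing any conclusion.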

Once he had identified the trends, the company was able to target the women it predicted were pregnant. It then hooked them with offers and discounts on goods such as nappies, baby food etc. before other companies even knew what was happening. When the time came for the women to buy nappies and baby food, the coupons were already cut out and ready to use. This is an old example but there are a few modern ones that affect most of us.

Here are 4 everyday uses of Big Data that you might not be aware of:
  • Online Shopping – As so much shopping is now online, Big Data plays a huge part. It shows us things we’re more likely to buy based on brands we like, suggests similar items, pushes discounts on regularly purchased products and makes the whole shopping experience more efficient
  • Mobile Maps and GPS – A lot has changed since paper maps and even built-in car GPS systems. A huge amount of data goes into generating the best route from A to B, dynamically changing routes if necessary. These apps also use data to provide info on how busy venues are and the best times to visit
  • Movie/Music streaming – Spotify and Netflix, along with other streaming services, use Big Data to optimise what we listen to and watch based on your habits and those of other users. Not only do they give intelligent, bespoke viewing/listening suggestions, they also send promotions about live events
  • Online Dating – Dating apps use Big Data to inform users of the best time to use the app based on activity and matching criteria. They also use it to predict the likelihood of getting a date based on previous matches and overlapping interests

The Chinese government is also using Big Data to analyse its citizens, but there’s more detail on that exercise here

Here’s the Bit About Brexit

At the time of reading this article, the situation with Brexit will be one of the following: 

a) The UK has left the EU

b) ‘Negotiations’ are still ongoing and the UK is no closer to leaving the EU

c) Any number of other scenarios where there’s frustration on all sides and politicians aren’t sure what to do (the likeliest scenario)

So what has Brexit got to do with Big Data? Well, regardless of your stance on the EU, one good thing that has come out of EU regulations is GDPR (General Data Protection Regulation). These are the rules around the protection, processing and control of personal and private data, which became legally enforceable in May 2018. GDPR also gives users rights over their personal data and makes companies more accountable for its safety and relevance.

The relevance here is that GDPR covers all data storage and processing, including any undertaken as part of a Big Data program. Fortunately, despite being an EU-instigated regulation, it was also implemented in UK law in parallel under the Data Protection Act, which means it’s one that we won’t lose as part of Brexit.

So what IS still up for debate?

Even though we’ll still have GDPR, there are a few data-related areas that we might lose access to, which could put the UK at a disadvantage. Some of these include:

  • Data stored by the EU’s law enforcement agency (Europol)
  • Information about passengers travelling in/through EU countries
  • Databases storing Tax and Customs information, which make it easier to do cross-border business

At this point, the hope is that whatever happens we’ll still have access to them, but there’s a possibility that we’ll lose access to these initiatives.

Summary

Companies use Big Data extensively to sell you things or to make their service invaluable to you by making it directly relevant to your circumstances. Not only do they spend a lot of time and effort ensuring they are compliant with the 5Vs to make sure their exercise is useful, they also, in a number of cases, spend money buying this data.

However, to a certain degree it’s down to consumers themselves to ensure that the data is accurate, by providing it correctly and by understanding how their data is used. And although its main use is to get you to buy a company’s product/service, it provides an indirect value to the consumer. It does this by removing noise around products proven not to interest them, which gives the consumer the gift of time.

For the moment, the power is still with consumers, as they are the ones who decide whether their data can be used. In addition, there are companies that can help users understand where their data is being used, as well as help to remove or restrict this if the user wishes. However, this isn’t the case in China…

When it comes to its practical applications, the ones already mentioned are just the tip of the iceberg. Huge advances are being made in financial planning, recruitment and travel to make consumer experiences even more effective and to make more decisions data-driven.

Comparing Data to Diamonds

Most people now agree that ‘data is one of the most valuable commodities on the planet’ so I’ll use this in the final analogy.

Imagine data as the contents of a diamond mine. An efficient Big Data exercise gathers, categorises and processes the diamond ore into polished, clean stones that you can put to practical use. Basically, it uses all the principles and understanding described earlier.

A poor or non-existent Big Data exercise leads to collecting pieces of rock without a clear goal, then attempting to make use of the raw material without a full understanding of what it is, how much you have or, critically, whether the demand for diamonds exists at all…