Every year, DOMO publishes its Data Never Sleeps infographic, which shows how much data people on the internet generate every minute of the day. In 2018, the internet carried 3,138,420 GB of traffic every minute (DOMO, 2018). However, “90% of that data is defined as unstructured data” (Marr, 2019). The volume of unstructured data is expected to grow by 55% to 65% each year (Marr, 2019), and “by 2020, there will be 40x more bytes of data than there are stars in the observable universe”.

Most unstructured data comes from the internet, social media communication, digital photos, services, and the Internet of Things (IoT) (Marr, 2019). According to DOMO, in just one minute of 2018, 12,986,111 texts and 159,362,760 emails were sent, and 176,220 calls were made via Skype. More of DOMO's figures can be found in the figure below. All this raw data can become valuable information for companies once it is structured.

One application of unstructured data is customer analytics. Call center transcripts, online product reviews, chatbot conversations, and social media posts can be mined and analyzed using artificial intelligence to spot patterns in the information from these sources. This gives companies the intelligence they need to make swift decisions that improve customer relationships (Marr, 2019).
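To make the idea of "spotting patterns" a little more concrete, here is a minimal sketch in Python. It is not how any particular company or tool does it: the review snippets, the stop-word list, and the `top_terms` helper are all invented for illustration, and simple word counting stands in for the far more sophisticated AI techniques used in practice.

```python
import re
from collections import Counter

# Hypothetical review snippets, invented for illustration only.
REVIEWS = [
    "Delivery was late and the box arrived damaged.",
    "Great product, but delivery was late again.",
    "Support chat was helpful and fast.",
]

# A tiny, hand-picked stop-word list (an assumption); real pipelines
# rely on much larger lists or on statistical weighting instead.
STOP_WORDS = {"a", "and", "the", "was", "but", "again"}


def top_terms(texts, n=3):
    """Return the n most frequent non-stop-word terms across the texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and split it into simple word tokens.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_terms(REVIEWS, n=2))
```

In this toy data set, "delivery" and "late" each appear in two of the three reviews, so they come out on top. Even a count this crude hints at a recurring complaint that a company could act on.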


Figure 1: Data Never Sleeps 7.0, DOMO

The importance of analyzing raw unstructured data is common knowledge, yet doing so remains a major challenge for companies around the world. Wired Magazine listed a few things that prevent businesses from successfully managing unstructured data:

1.    A lack of tools that easily manage unstructured data. Tools need to provide efficient text parsing and analytics, taxonomy, and metadata management.

2.    Difficulty in integrating unstructured data with existing information systems. Both are often seen as apples and oranges when it comes to analytics and decision making.

3.    Shortage of skills in existing staff.

4.    Missing sense of urgency for managing unstructured data (Taylor, 2018).

To extract useful information from unstructured data, enterprises must enhance their existing structured data management approaches to accommodate semantic text and content-stream analytics (Loshin, 2013). To analyze this data, businesses use tools such as Hadoop to process, mine, integrate, store, track, index, and report business insights from raw unstructured data. Without such tools, it would be impossible for data scientists to manage unstructured data efficiently (Marr, 2019).

What can be observed so far is that the volume of unstructured data will keep growing each year, in part because the number of people with internet access also rises year after year. As of January 2019, the internet had reached 56% of the world’s population, a growth of 9% compared with the previous year. The more people on the internet, the more information will be shared. Analyzing unstructured data is a hard challenge for businesses, but it will be critical in the short term.

Written by Ligia Galvão


