Today, big data drives how companies operate. Software development engineers are in high demand to handle all of it, from big data solutions to software integrations. With the dominance of social media, the rise of the Internet of Things, and the government’s reliance on data collection for defense and public health, few corners of modern life go unrecorded as data points for someone’s analytics.
But this wealth of new data comes in a massive variety of formats, and with so many different types to store and analyze, how do you organize it to get the most out of it? The first step is to recognize whether the data is structured or unstructured. This blog post explains what these terms mean so that you can gather actionable intelligence from your data.
Structured data is typically quantitative. It is highly organized and fits a predefined format, such as a spreadsheet with clearly defined rows and columns. Structured data usually goes through a process called Extract, Transform, Load (ETL), in which it is extracted from its source, transformed so that each data point matches the way the destination expresses information, and then loaded into a database (usually a relational database). The term for this approach is “schema-on-write,” because the structure is applied when the data is written to storage.
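To make the Extract, Transform, Load steps concrete, here is a minimal sketch using only Python's standard library. The sample CSV, the `sales` table, and its column names are assumptions made up for illustration; they are not taken from any real system. Note that the table's schema is fixed before any row is written, which is the "schema-on-write" idea.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an online store (assumed format, illustration only).
RAW_CSV = """order_id,customer,amount,date
1001,Ada Lopez,$19.99,03/14/2024
1002,Sam Reed,$5.00,03/15/2024
"""

def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert each row to the types and formats the destination expects."""
    records = []
    for row in rows:
        month, day, year = row["date"].split("/")
        records.append((
            int(row["order_id"]),
            row["customer"].strip(),
            round(float(row["amount"].lstrip("$")) * 100),  # dollars -> integer cents
            f"{year}-{month}-{day}",                        # MM/DD/YYYY -> ISO date
        ))
    return records

def load(conn, records):
    """Load: the schema is declared first, then rows that fit it are inserted."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sales (
               order_id     INTEGER PRIMARY KEY,
               customer     TEXT    NOT NULL,
               amount_cents INTEGER NOT NULL,
               sale_date    TEXT    NOT NULL
           )"""
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory relational database for the demo
load(conn, transform(extract(RAW_CSV)))

total_cents = conn.execute("SELECT SUM(amount_cents) FROM sales").fetchone()[0]
print(total_cents)
```

Because every row was cleaned and typed before it was stored, the final query can sum the amounts directly without any further parsing.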
Some common examples of structured data include inventory control systems, contact information lists, ATM records, and online sales data. Any electronic action that records the same few simple alphanumeric data points each time it is taken produces structured data.
Unstructured data is typically qualitative. It stays in its native format, such as a .jpg, mostly because it has no intrinsic organization the way a spreadsheet does. It is entered, unprocessed, into a data lake. The extract and load steps still happen up front, but the transform step is deferred to the analytics process (an ordering often called Extract, Load, Transform, or ELT). The term for this approach is “schema-on-read,” because the structure is applied only when the user makes use of the data, rather than when it is stored.
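As a rough illustration of schema-on-read, the sketch below stores raw text files untouched and only imposes a structure when they are read back for analysis. The temporary folder standing in for a data lake, the sample reviews, and the `read_with_schema` helper are all hypothetical and exist only for this example.

```python
import re
import tempfile
from pathlib import Path

# A temporary folder stands in for the data lake (an assumption for this sketch).
lake = Path(tempfile.mkdtemp())

# Ingest: write the raw content exactly as received, with no parsing or validation.
(lake / "review_001.txt").write_text(
    "Loved the blender! 5 stars. Arrived in 2 days.", encoding="utf-8"
)
(lake / "review_002.txt").write_text(
    "Lid cracked after a week. 2 stars.", encoding="utf-8"
)

STAR_PATTERN = re.compile(r"(\d) stars?")

def read_with_schema(path):
    """Apply a structure (file name, star rating, word count) only at read time."""
    text = path.read_text(encoding="utf-8")
    match = STAR_PATTERN.search(text)
    return {
        "file": path.name,
        "stars": int(match.group(1)) if match else None,  # None if no rating found
        "word_count": len(text.split()),
    }

rows = [read_with_schema(p) for p in sorted(lake.glob("*.txt"))]
for row in rows:
    print(row)
```

A different analysis could read the same files with a completely different schema, since nothing about the stored data commits it to one structure.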
Some common examples of unstructured data include emails, social media posts, chats, slide decks, pictures, audio recordings, and Internet of Things sensor data. Analyzing social media posts and product reviews is such a common application of unstructured data that marketing departments sometimes casually refer to it as “opinion mining.”
Surprise! There is a third category of data. Semi-structured data typically lacks a fixed schema and/or doesn’t fit into a database format, but it has some organizational or hierarchical properties.
Most of the time, semi-structured data refers to the metadata attached to a piece of unstructured data. For example, the body of an email is unstructured, but taken together with the sender, recipient, date and time, and subject line, the email can be processed as semi-structured data. Email providers use this kind of metadata, among other signals, to automatically sort emails into spam folders.
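The email example can be sketched with Python's standard `email` package. The message below is invented for illustration (the addresses use the reserved example.com domain), and the `record` layout is just one possible way to arrange the fields.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# A made-up message: structured headers on top, free-form text underneath.
RAW_EMAIL = """\
From: Dana Smith <dana@example.com>
To: support@example.com
Subject: Question about my order
Date: Mon, 04 Mar 2024 09:30:00 +0000

Hi, my package hasn't arrived yet. Can you check on it?
"""

msg = message_from_string(RAW_EMAIL)

record = {
    # Metadata: predictable fields that can be sorted, filtered, and indexed.
    "sender": parseaddr(msg["From"])[1],
    "recipient": parseaddr(msg["To"])[1],
    "subject": msg["Subject"],
    "sent_at": parsedate_to_datetime(msg["Date"]).isoformat(),
    # Body: still unstructured text, kept as-is.
    "body": msg.get_payload().strip(),
}

for key, value in record.items():
    print(f"{key}: {value}")
```

The headers give the message enough organization to query by sender or date, while the body remains free text that would need separate analysis.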
Other examples include tweets organized by hashtag, or videos and photos with camera settings, GPS data, date & time, and file types.
Working with our financial services client, PVM ingests structured data about hundreds of millions of consumer transactions each day to help merchants make decisions about how best to manage their businesses and to help government agencies assess the health of the economy. With such massive volumes of data involved, the structured nature of the data is an indispensable boon to the missions of those end-users.
Getting your data organized properly can be an enormous undertaking, and it’s natural to feel intimidated by the processes involved. Whatever your data organization needs, big data or small data, structured data or unstructured, PVM can help! Contact us today to discuss our offerings.