If data is petrol...

May 18, 2021

Data is the new gold. But most of us don't understand how and why it has been tagged as priceless by big tech companies. Here is a simple oil metaphor explaining how data works and why it is the new black gold.

It flows from its source to our engines

Like petrol, data is buried. Under layers of ground lies the crude oil that humans need to extract. The same goes for raw data, buried under technological layers. These pockets of valuable commodities are called sources. Petrol has many different kinds of sources, and data sources are just as diverse. Data can originate, for instance, from the clicks and taps of website users. These generate data flows.

Both require human effort to be extracted from their sources. Data, like petrol, is invisible until captured. Where petrol is extracted by huge pumps, data is captured by tiny snippets of code.

The crude oil coming out of the source is mostly captured and conveyed by pipelines. These pipelines lead the oil to a refinery, where it is transformed into gasoline, fuels and other by-products. Once the crude oil is refined, the liquid outputs, gasoline and diesel fuels, are brought to storage facilities. The rest of the by-products, all sorts of raw plastics, are sold on the market.

Like crude oil, raw data is brought by pipeline to a transformation place, the Data Warehouse, then stored in a place called a Database. These steps, for data, are known as ETL: Extract-Transform-Load. The raw data is extracted from the source, transformed into clean data and then loaded into a Database. The by-products of data, statistics and aggregations, can also be sold on Data Marketplaces.
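To make the three ETL steps concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample click events, the function names extract, transform, load and run_etl, the clicks table, and the use of an in-memory SQLite database as the storage. A real pipeline would read from a live source and write to a proper database.

```python
import sqlite3

# Hypothetical raw click events, as a website tracker might emit them.
RAW_EVENTS = [
    {"visitor": " Alice ", "page": "/home", "clicks": "3"},
    {"visitor": "bob", "page": "/pricing", "clicks": "1"},
    {"visitor": "", "page": "/home", "clicks": "2"},         # no visitor: dropped
    {"visitor": "Carol", "page": "/home", "clicks": "n/a"},  # bad number: dropped
]


def extract():
    """Extract: read the raw records from the source."""
    return list(RAW_EVENTS)


def transform(rows):
    """Transform: tidy visitor names, parse numbers, drop unusable rows."""
    clean = []
    for row in rows:
        visitor = row["visitor"].strip().lower()
        if not visitor or not row["clicks"].isdigit():
            continue
        clean.append((visitor, row["page"], int(row["clicks"])))
    return clean


def load(rows, conn):
    """Load: store the clean rows in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clicks (visitor TEXT, page TEXT, clicks INTEGER)"
    )
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order: Extract, Transform, Load."""
    load(transform(extract()), conn)


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
run_etl(conn)
clicks_per_page = conn.execute(
    "SELECT page, SUM(clicks) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
print(clicks_per_page)
```

The final query is a small example of a by-product: an aggregation computed from the refined data rather than from the raw events.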

Refined data, as petrol would be. Image by the author.

In these oil storage facilities, analysts collect metrics about the petrol, such as quantity, pressure and composition, using powerful tools, and then compile them into quality reports. These reports are sent to the head of Engineering, who uses them to control and improve the oil flows.

On the data side, this examination of the metrics is called Data Analysis. It is captured in Excel or in Dashboards that are delivered to the management team. Thanks to up-to-date graphs that constantly refresh with hourly data, management can get insights to steer the business.
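As a rough illustration of the hourly figures behind such a graph, the sketch below groups timestamped sales into hourly totals. The sample events and the function name revenue_per_hour are assumptions invented for this example; an actual dashboard tool would run a similar aggregation against the Database.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sales events: (ISO timestamp, amount).
EVENTS = [
    ("2021-05-18T09:05:00", 20.0),
    ("2021-05-18T09:40:00", 35.0),
    ("2021-05-18T10:15:00", 12.5),
]


def revenue_per_hour(events):
    """Sum the amounts of all events that fall within the same hour."""
    totals = defaultdict(float)
    for timestamp, amount in events:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        totals[hour] += amount
    return dict(sorted(totals.items()))


for hour, total in revenue_per_hour(EVENTS).items():
    print(hour.strftime("%Y-%m-%d %H:00"), total)
```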

Back to our petrol: after the storage phase, most of the fuels make their way to petrol stations and then to cars. Cars empower humans to move. Likewise, data goes from a Database via a distribution system to feed our smartphones and apps. Smartphones empower humans to create and discover. Airbnb uses data to let you discover a city map and its remaining rentals. Snapchat uses data to help you create filters.

Additionally, a big portion of the gasoline and plastic goes to different industries. The fuels power industrial machines which, molding plastic into different shapes, create products to serve society. In the data version, data flows into every department of the company owning it. For instance, the operations department uses location and logistics data to deliver purchased products to the right address.

Fuels also go to Research & Development industries. Companies in these industries build powerful machines that can automate human work or make it more efficient. Aeroplanes can move humans faster and further thanks to their complex engines, the result of years of work within the Research & Development industries. On the data side, this is called Data Science. Scientists experiment to find which algorithm will make the product or process more efficient: the machine does the heavy lifting. But just as a plane still needs a pilot, sometimes the algorithm needs a human in the loop to prevent potentially wild and unexpected behavior.

The humans behind the curtain

There are many jobs to keep the petrol flowing through the pipelines. Once a source has been found, a whole team will install the pump to extract the crude oil out of its cave. Regarding data, it is mostly the Software Engineer or IT team that will do the job of placing the tracking on a website or an app.

Then the oil goes off into the pipe! An engineer installs the whole pipeline network. This large-scale work requires thoughtful planning and intense maintenance. This Installation Engineer is, on the data side, known as a Data Engineer. They build the pipeline along the whole trajectory of the data, from the data source to the app.

The refined oil next enters storage and its metrics are examined. In the data realm, that is the job of a Data Analyst. Their role is to explore the data and aggregate it into graphs or reports in order to bring concise insights. Proper visualizations help the human brain (that is, the decision makers) understand the trends and act on them. A manager reading a graph that clearly displays expenditure can reallocate budget more easily than when given a long series of Excel sheets.

If the company reaches a certain size, this analyst role is split in two. One analyst, who likes numbers, measures the petrol pressure; the other, strong in communication skills, translates between the needs of management and the metrics to be looked at. In data terms, these would be the number-loving Data Analyst and the outspoken Business Analyst.

Established oil refineries have all the tools in place, and the head of Oil Engineering knows how to read the thermometer. Still, every day, there is an engineer working on new requests, such as installing more accurate thermometers or documenting how they measure heat. Similarly, in a company that is experienced in working with data, everyone plots their graphs with access to readily analyzable data. The person behind the curtain who maintains the single source of truth for data is the Analytics Engineer.

Then at last comes the Scientist of the R&D department. These guys also need hectoliters of oil to run their machines. Think about the plane. They need years of trials to build and develop new machines. But when they succeed, their work expands the boundaries of society by creating new opportunities. In the data world, these are the Data Scientists. They are the ones playing with data to create algorithms for the purpose of widening human possibilities.

But think about this for a moment: in order to make petrol 95-99% pure, mankind had to work to refine its transformation process. Nowadays, thanks to decades of research, we know the standard process to reach 99% purity without much trial and error. Data does not have these decades of research. Consequently, the field does not have the standardized steps to make an algorithm 99% accurate on the first try. That is why we have Machine Learning Engineers. They are the ones iteratively pushing the accuracy from 80% towards 99%.

Concluding thoughts

Both petrol and data are high-risk, high-potential assets of mankind. Cars, as well as smartphones, are among the inventions that shaped human history. Everyone is better off with them, but with their uses came abuses. Abuses called Global Warming and Cambridge Analytica.

Whereas petrol's journey and the threats of CO2 emissions are well understood, our understanding of the journey of data and the threats coming from it is still nascent. We do not yet know the full extent of the abuses that can stem from data extraction. But we do know that, as much as the fuel in our rockets, data is mankind's newest route to reaching new moons.


Thank you for reading my article. If you wish to receive articles about how data helps business growth, please subscribe below.