Why Marketers Need to Understand Data Engineering

In this post, I am going to discuss the number one skill for marketers in 2019. If you stick around until the end of the post, I promise it will not be a waste of your time. I am also going to provide a bunch of resources to get you started learning this skill. Finally, if you read to the end and absorb the message, you will without a doubt be a better marketer.

Alright, so the #1 skill to learn as a marketer in 2019 is the basics of data engineering. If this sounds like I have lost my mind, please hold on and hear me out. First of all, although I'm writing this in 2019, this skill (and discipline) will likely be very important for marketers well into the future.

The main reason is that marketing, especially growth marketing and product marketing, relies on data engineering. In order to make data-driven decisions, you must be able to analyze the correct data. In order to make sure you have the correct data, you need to understand how the advertising and analytics platforms you use manipulate and summarize that data.

At this point we need to pause the conversation and introduce a new term and acronym.

ETL = Extract Transform Load

Extract, Transform, Load (ETL) is an abbreviation for the process a data engineer uses to move accurate data from one platform to another. The extraction part has to do with how you export raw data from a platform. A great example that most people are familiar with is the export .CSV function most SaaS products have.
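To make the extract step a little more concrete, here is a minimal Python sketch that reads a CSV export into rows. The file contents, the column names, and the `extract_rows` helper are all hypothetical; real exports from ad platforms have their own column layouts.

```python
import csv
import io

# Hypothetical contents of an ad platform's "export .CSV" download.
# Column names are illustrative only; real exports vary by platform.
EXPORT_CSV = """Campaign,Date,Impressions,Clicks,Spend
Spring Sale,2019-03-01,1200,35,42.50
Spring Sale,2019-03-02,980,28,37.10
Retargeting,2019-03-01,450,19,12.00
"""


def extract_rows(csv_text):
    """Extract step: turn a raw CSV export into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)


rows = extract_rows(EXPORT_CSV)
print(len(rows))            # 3
print(rows[0]["Campaign"])  # Spring Sale
```

Notice that every value comes out as text (for example `"35"`, not `35`). That is normal for raw extracts, and it is one reason a transform step is needed afterwards.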

If you have ever used one of the popular ad platforms like Google Ads or Facebook Ads, you have no doubt used the export function. The important part of the extraction process is getting the raw data out of a database. If you are using a database that is not attached to a SaaS product with a user-friendly front end, you will probably have to use a query language like SQL to pull data from it.
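Here is a rough idea of what "using SQL to pull data" looks like, sketched with Python's built-in `sqlite3` module and a throwaway in-memory database. The `orders` table, its columns, and its rows are made up for illustration; your own database will have a different schema.

```python
import sqlite3

# A throwaway in-memory database standing in for "a database that is not
# attached to a SaaS product". Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, source TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "facebook", 50.0), (2, "google", 20.0), (3, "facebook", 30.0)],
)

# Extract step: pull only the raw rows you care about with a SQL query.
query = (
    "SELECT order_id, source, revenue FROM orders "
    "WHERE source = ? ORDER BY order_id"
)
facebook_orders = conn.execute(query, ("facebook",)).fetchall()
print(facebook_orders)  # [(1, 'facebook', 50.0), (3, 'facebook', 30.0)]
conn.close()
```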

Regardless of the exact process you use to get data out of your platform, the extract portion of ETL describes how data is moved out of one platform or database so it can go to another.

The next portion of this process is the transformation of the data. This is where you "clean" or restructure the data. It is a common part of the data engineering process, especially when you must feed your data into a visualization tool in a very specific format.

This part of the process includes activities like cleaning up data in an Excel file or manipulating data with a programming language like Python or R. Regardless of the tools you use, the goal of this stage is the same: prepare the data for the load phase of the ETL process.
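As a minimal sketch of what "cleaning" and "restructuring" can mean in Python: the rows below are hypothetical raw values (inconsistent campaign names, dollar signs, a blank row), and `transform` and `totals_by_campaign` are illustrative helpers, not part of any tool mentioned in this post.

```python
# Raw rows as they might come out of an extract step: every value is text,
# campaign names are inconsistent, and one row is blank. Fields are hypothetical.
raw_rows = [
    {"Campaign": " Spring Sale ", "Clicks": "35", "Spend": "$42.50"},
    {"Campaign": "spring sale", "Clicks": "28", "Spend": "$37.10"},
    {"Campaign": "", "Clicks": "", "Spend": ""},
    {"Campaign": "Retargeting", "Clicks": "19", "Spend": "$12.00"},
]


def transform(rows):
    """Clean: drop blank rows, normalize names, convert text to numbers."""
    cleaned = []
    for row in rows:
        campaign = row["Campaign"].strip()
        if not campaign:
            continue  # skip empty rows entirely
        cleaned.append({
            "campaign": campaign.lower(),              # " Spring Sale " -> "spring sale"
            "clicks": int(row["Clicks"]),              # "35" -> 35
            "spend": float(row["Spend"].lstrip("$")),  # "$42.50" -> 42.5
        })
    return cleaned


def totals_by_campaign(rows):
    """Restructure: sum clicks and spend per campaign."""
    totals = {}
    for row in rows:
        t = totals.setdefault(row["campaign"], {"clicks": 0, "spend": 0.0})
        t["clicks"] += row["clicks"]
        t["spend"] += row["spend"]
    return totals


clean = transform(raw_rows)
totals = totals_by_campaign(clean)
print(totals)
```

Without the name normalization, "Spring Sale" and "spring sale" would be reported as two separate campaigns, which is exactly the kind of quiet error this stage exists to catch.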

The last phase of this process, the load phase, involves uploading the data into another platform. In some cases this could mean loading data into a database for long-term storage, or into another tool like a visualization tool.

If you are using very simple tools like Google Sheets and Supermetrics, you might be loading your data into Google Data Studio. If you are manually setting up your own data pipeline, you might be loading data from a database into a visualization tool like Looker. Either way, you need to get your data into some sort of tool.
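For the "long-term storage" case, a load step can be as small as writing cleaned rows into a database table, sketched here with Python's `sqlite3`. The `campaign_totals` table, the `load` helper, and the values are hypothetical, and an in-memory SQLite database stands in for whatever database a visualization tool would actually read from.

```python
import sqlite3

# Cleaned rows from a transform step (hypothetical values).
clean_rows = [
    ("spring sale", 63, 79.6),
    ("retargeting", 19, 12.0),
]


def load(conn, rows):
    """Load step: write cleaned rows into a destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS campaign_totals "
        "(campaign TEXT PRIMARY KEY, clicks INTEGER, spend REAL)"
    )
    # INSERT OR REPLACE keeps the load safe to re-run: the same campaign
    # overwrites its old row instead of creating a duplicate.
    conn.executemany("INSERT OR REPLACE INTO campaign_totals VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real database
load(conn, clean_rows)
load(conn, clean_rows)  # running the job twice does not double the data
count = conn.execute("SELECT COUNT(*) FROM campaign_totals").fetchone()[0]
print(count)  # 2
```

Making the load repeatable is a deliberate choice here: pipelines get re-run, and a load that appends duplicates would silently double your numbers in whatever dashboard reads the table.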

Why Marketers Need To Understand Data Engineering

Any question you have about your data can be answered with the correct process of data extraction, transformation, and loading. For most high-level SaaS products, this whole process takes place "under the hood." Marketers don't really want or need to know what is going on (or so they think).

The problem with not understanding what is going on under the hood is that these SaaS platforms make many assumptions about the data you are extracting.

This process of extract, transform, and load takes place at many different "levels of abstraction" throughout the lifecycle of your user data. For example, when a tracking platform first encounters a user's behavior on your page, it extracts that action from the browser, interprets the event based on the platform's default settings, and then stores that behavior in the platform's database.

A great example of this process in action is Google Analytics. When a user takes an action on a page, Google Analytics extracts the event from the browser. It then interprets the event (transform) based on the current platform settings. In Google Analytics you can alter these settings, and therefore the way user behavior is interpreted, through Google Analytics Views.
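To see why settings matter, here is a toy model in Python, loosely inspired by view filters such as excluding internal IP addresses or lowercasing page paths. This is not how Google Analytics is actually implemented; the hits, IP addresses, and `process` and `pageviews` helpers are hypothetical. It only shows that the same raw events produce different reports under different settings.

```python
# Raw hits as a tracking script might send them from the browser (hypothetical).
raw_hits = [
    {"ip": "203.0.113.7", "page": "/Pricing"},
    {"ip": "198.51.100.2", "page": "/pricing"},  # pretend this is your office IP
    {"ip": "203.0.113.9", "page": "/Blog/ETL"},
]

# Two differently configured "views" over the same raw hits (illustrative only).
view_unfiltered = {"exclude_ips": set(), "lowercase_paths": False}
view_filtered = {"exclude_ips": {"198.51.100.2"}, "lowercase_paths": True}


def process(hits, settings):
    """Transform: interpret raw hits according to one view's settings."""
    stored = []
    for hit in hits:
        if hit["ip"] in settings["exclude_ips"]:
            continue  # filtered out, so it never reaches the reports
        page = hit["page"].lower() if settings["lowercase_paths"] else hit["page"]
        stored.append(page)
    return stored


def pageviews(stored):
    """Report: count stored pageviews per page."""
    counts = {}
    for page in stored:
        counts[page] = counts.get(page, 0) + 1
    return counts


print(pageviews(process(raw_hits, view_unfiltered)))
# {'/Pricing': 1, '/pricing': 1, '/Blog/ETL': 1}
print(pageviews(process(raw_hits, view_filtered)))
# {'/pricing': 1, '/blog/etl': 1}
```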

This information is then stored for viewing or exported to another database. By exporting the data, you start the ETL process once again, this time at a higher level of abstraction.

Most of the popular analytics platforms, like Google Analytics, Google Analytics 360, Kissmetrics, Amplitude, and even Heap (depending on the user's settings), make assumptions about the data you want to see reported.

If you are not aware of the default settings on these platforms and how they affect the data that is displayed, you will be making decisions with faulty data. Basically, you cannot practice true data-driven marketing without a basic knowledge of how your analytics and tracking platforms record user activity and behavior.

Why Analytics Platforms Do Not Work Out Of The Box

One of the most common issues marketers have with their analytics platforms is the model these platforms use for sales attribution. There is no standard way that platforms attribute sales. A great example of this discrepancy is the difference between the Facebook Ads default attribution model and the Google Ads attribution model.

Facebook Ads is notorious for inflated conversion numbers. This is because Facebook Ads records "view-through" conversions as part of its default settings. A view-through conversion means that Facebook takes 100% credit for any sale made by someone who viewed a Facebook ad prior to purchase, as long as the purchase happens within the view-through conversion window.

On most accounts, the default view-through conversion window is 7 days.

Google, on the other hand, only has a one-day view-through conversion window by default. Therefore, if you assumed that Google Ads conversions were comparable to Facebook Ads conversions, you would be way off the mark. The ETL process that on-page user data goes through before it even reaches these platforms' databases has manipulated the data in a way that could be detrimental to your efforts as a data-driven marketer.
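A small sketch makes the size of this gap easier to see. It applies a simplified view-through rule (count a purchase if it happens within the window after an ad view) to the same hypothetical purchases using a 7-day and a 1-day window. The dates, the `view_through_conversions` helper, and the counting rule are illustrative assumptions, not either platform's actual attribution logic.

```python
from datetime import datetime, timedelta

# Hypothetical purchases, each with the time the buyer last *viewed* an ad
# (no click). All timestamps are made up for illustration.
purchases = [
    {"order": 1, "ad_view": datetime(2019, 5, 1, 9), "purchase": datetime(2019, 5, 1, 18)},
    {"order": 2, "ad_view": datetime(2019, 5, 1, 9), "purchase": datetime(2019, 5, 4, 12)},
    {"order": 3, "ad_view": datetime(2019, 5, 1, 9), "purchase": datetime(2019, 5, 10, 12)},
]


def view_through_conversions(rows, window):
    """Count purchases that happened within `window` after an ad view.

    Under this simplified view-through rule, every counted purchase is
    credited to the ad, whether or not the ad influenced it.
    """
    return sum(
        1 for r in rows
        if timedelta(0) <= r["purchase"] - r["ad_view"] <= window
    )


seven_day = view_through_conversions(purchases, timedelta(days=7))
one_day = view_through_conversions(purchases, timedelta(days=1))
print(seven_day, one_day)  # 2 1
```

Same users, same purchases, same ad views: the 7-day window reports twice as many conversions as the 1-day window, purely because of a default setting.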

In order to get a better handle on your data, you need to understand the data processes taking place within and between all of the platforms you use for data extraction, manipulation, and visualization. At each step of these processes, and at every layer of abstraction, you need to understand what is happening to your data.

Where To Learn About Data Engineering

If you are brand new to the world of data engineering, it makes sense to simply start learning about the different uses for this discipline and some of the breakthroughs that have been made by engineers on the cutting edge of this field.

Many of the breakthroughs made at elite multi-billion-dollar unicorn companies can be used to fuel the growth of your bootstrapped internet startup. If nothing else, it's exciting to learn about the technology that allows small startups to operate like enterprise companies when it comes to data.

A Non-Coders Journey To Becoming a Data Engineer

The Startup Founders Guide to Analytics

Does My Startup Need a Data Engineer

Data Science vs. Data Engineering

Beginners Guide to Data Engineering Part I


In order to get the most out of this post, I suggest you look over the resources above and read through the one that appeals to you most. If you are a marketer who is new to the world of data engineering, most of those articles are going to cover topics you are not familiar with (that's OK).

The point is to get you thinking about what is going on "under the hood" of the SaaS products you use for tracking and analytics. Understanding some of these behind-the-scenes processes will allow you to ask better questions about your data. Those questions will eventually lead you to a more complete understanding of your data and more accurate decisions.
