Attribution Tracking

#04 – Customer Attribution for SEO and Direct traffic: A Complete Guide

If SEO is one of the key revenue-generating channels for your business, you likely have a need to measure its effectiveness. Not only for SEO, but many businesses also have a substantial amount of Direct Traffic and want to accurately identify customers coming from this source.

Definitions:

Organic Traffic (SEO): Traffic that comes to a website through organic keyword searches on search engine result pages.
Direct Traffic: Traffic that comes to a website by directly entering the URL into the browser. Typically, these users know the exact website address they want to visit.

However, measuring Organic Traffic and Direct Traffic is not straightforward. In Paid Ad channels, you can use tracking links to mark where customers are coming from. But in the case of these two channels, you cannot use tracking links. Imagine if a customer comes to your website through an organic keyword search on Google SERPs; you cannot insert a tracking link for them to click.

So, what’s the solution?

That’s where Google Analytics 4 comes in! It’s a measurement and analysis tool from Google. It excels in attributing sources accurately for SEO Traffic and Direct Traffic.

If you use Google Analytics 4, you’ll likely be familiar with the dashboard below. It tells you how many users come from Organic and Direct sources in a very detailed manner. It can even provide user-specific details, as shown in the image below.

The user-specific data mentioned above is crucial, as it helps us identify customers coming from Organic and Direct sources.

In the next part of the article, Nemo Marketing will guide you on how to set up tracking for customers coming from these two sources. Please read this article together with your Marketer, Developer (or Data Analyst).

Part 1: Useful Data You Can Collect from Google Analytics 4

For Marketers and Data Analysts

Google Analytics 4 offers various types of data, but I will introduce the most useful data based on my own experience:

1- Medium

MEDIUM helps distinguish whether customers are from Organic Traffic or Direct Traffic.

Organic Traffic = ‘organic’

Direct Traffic = ‘(none)’

2- Source

SOURCE provides more specific information about the channel. For example, for Organic Traffic, it specifies which Search Platform was used.

Organic Traffic = ‘google’, ‘bing’, ‘duckduckgo’, ‘ecosia.org’, ‘yahoo’, ‘qwant.com’…

Direct Traffic = ‘(direct)’

3- First Visit Page

This data is in URL format: ‘https://……’

With this information, you can identify the first touchpoint of a customer on your website (specifically, which webpage). This provides insights into analyzing customer conversion behavior. Especially for the SEO channel, knowing the 1st touchpoint webpage allows you to optimize keywords for that page to improve rankings and traffic on SERPs.

4- Page Referrer

This data is in URL format: ‘https://……’

With this information, you can determine the last touchpoint (webpage) just before a customer performs a conversion action. Depending on the goals of each business, this data can be used to uncover various insights.

5- Other Device Data Fields

Google Analytics 4 also provides fields related to devices. If you want to analyze by device, this is valuable data for you. Here are some data fields and example values:

device_category = ‘desktop’, ‘mobile’, ‘tablet’

device_mobile_model_name = ‘M7 Power’, ‘Blade A31 Plus’…

device_mobile_brand_name = ‘LG’, ‘Lenovo’, ‘HTC’…

device_browser = ‘Android Webview’, ‘Opera’, ‘Chrome’…

In the next part, I will guide Developers and Data Analysts on how to set up and collect these data fields so that Marketers can utilize them.

Part 2: Technical Configuration

1- Conversion Event Configuration

For Web Developers

In this step, you need to determine what Conversion Event you want to track on your website.

For example: Lead Registration, Account Registration, Item Purchase, Subscribe, etc., depending on your business goals, you may have different events.

For instance, at my company markets.com, we use Account Registration as the primary conversion event. As soon as a user registers their email and password, we immediately trigger this event and collect their Attribution Data.

Once you’ve identified one (or two) critical events, configure them on your website. While configuring the event, include the following two event parameters when the event is triggered:

  1. customer_id: This can be user_id, client_id, or any other identifier used to distinguish customers in your CRM.
  2. purchase_id: This can be subscription_id, order_id, or any other identifier used to distinguish transactions in your CRM.

-- Example: I have an e-commerce business, and the "purchase_success" event is crucial to me.
-- Therefore, I want to configure this event for attribution tracking and event measurement.

<script>
window.dataLayer = window.dataLayer || [];
window.dataLayer.push({
 'event': 'purchase_success',
 'customer_id': '38098384',
 'order_id': '1689909'
 });
</script>

After completing the code on your website, set up an Event Tag in Google Tag Manager to send event data to Google Analytics 4.

Note: I assume that you are already familiar with configuring events on a website and using the Google Tag Manager tool. If you are not familiar with it, you can read some articles here: Google Tag Manager Custom Event Trigger

2- Integrate Google Analytics 4 with BigQuery

Next, you need to integrate Google Analytics 4 with BigQuery.

With this action, you’re setting up a pipeline to automatically load data from Google Analytics 4 into BigQuery (BigQuery is a data warehouse, a Google’s product).

Follow the instructions in the image below:

Please note:

  • In the Export Type, you can choose Stream instead of Daily Batch Load. In my experience, selecting Stream helps prevent the risk of missing data when your daily events exceed 1,000,000 rows. Such as for my business, we receive daily more than 2M of rows, but the cost is not significant.
  • Additionally, if you want to save storage space, Google Analytics 4 allows you to select specific events that you DON’T want to send to BigQuery. Refer to the section “Data streams and events > Configure data streams and events > Events to exclude.”
  • Data will only exist from the moment you successfully integrate. This means that past data will not be available in BigQuery.
  • The pipeline between Google Analytics 4 and BigQuery operates completely automatically, so after integration, you can rest assured and don’t need any maintenance activities.

You’ll need to wait for up to 48 hours for both systems to sync. Once completed, you will see the data schema in BigQuery as shown below (please log in to BigQuery).

3- Extract Data from BigQuery

In this section, Nemo Marketing will guide you on the necessary SQL syntax to use in BigQuery. Then, provide a detailed SQL query to extract all the data fields introduced in Part 1.

A- Basic Queries in BigQuery

BigQuery has a unique syntax compared to other data warehouses in the market. Therefore, I want to provide you with some guidance first to make your work smoother.


-- There are 2 types of tables for event storage, depending on the "Export Type" you configured in Google Analytics 4:

--     events_ : This table stores daily batch-loaded data.
--     events_intraday_ : This table stores stream data, which is real-time data from Google Analytics 4.

-- If you enable both "Export Types," the data will be imported to both tables.
--     This is how it works when you enable both: 
--         Every day, the data from "events_intraday_" will be appended to "events_" and deleted from "events_intraday_"
--         So, basically, the two tables combined together will give you the full event records, with no duplicate data at all.

-- If you want to access all events from both tables and all time, you need to select from "events_*"

    SELECT * 
    FROM `your_project_schema.your_analytics_dataset.events_*`

    -- example:

        SELECT * 
        FROM `product_db.analytics_856671938.events_*`


-- If you want to access to a sepcific date only

-- intraday
       SELECT * FROM `your_project_schema.your_analytics_dataset.events_intraday_{date}`

-- event
       SELECT * FROM `your_project_schema.your_analytics_dataset.events_{date}`

-- example
		
	SELECT * FROM `product_db.analytics_856671938.events_intraday_20231007`

        SELECT * FROM `product_db.analytics_856671938.events_20231007`

B- Detailed BigQuery syntax to extract data fields mentioned in Part 1

In the event configuration, you have already completed something like this:

-- Example: I have an e-commerce business, and the "purchase_success" event is crucial to me.
-- Therefore, I want to configure this event for attribution tracking and event measurement.

<script>
window.dataLayer = window.dataLayer || [];
window.dataLayer.push({
 'event': 'purchase_success',
 'customer_id': '38098384',
 'order_id': '1689909'
 });
</script>

Now, let’s apply this event example to the query below to extract data. You only need to fill in the correct values as instructed in the GUIDE section below:

-------------------------------------------------------------------------------------

--GUIDE

    --Please replace "...1..." with a user identifier parameter like "customer_id" that you configured in the event parameter.
    --Please replace "...2..." with the table schema, for example, product_db.analytics_856671938.events_*.
    --Please replace "...3..." with the event_name that you configured above, for example, 'purchase_success'.

-------------------------------------------------------------------------------------

--SQL LOGIC EXPLANATION

   --  CTE 1: conversion_event: This part filters only the Conversion Event "purchase_success," unnests event_params, and ranks events by unique customer_id in ascending order.

   --  CTE 2: conversion_event_unique: It selects records with rank = 1 in the previous CTE to get unique customer_id records. This is because sometimes there are multiple event records for the same customer_id.

   --  CTE 3: first_visit_page: This section extracts the first_visit_page for those customer_ids. It helps SEO understand the first touchpoint on the web per customer.

   --  final: The final query joins cte 2 and cte 3 using (ga_session_id) to retrieve all the fields mentioned in part 1.

--  This explanation provides a clear understanding of the SQL logic and how it processes data to obtain the desired results.

-------------------------------------------------------------------------------------

WITH conversion_event AS (

   SELECT
     event_name,
     TIMESTAMP_MICROS(event_timestamp) AS event_time,
     (SELECT value.int_value FROM UNNEST(event_params) WHERE key = '...1...') AS "...1...",
     traffic_source.source AS source,
     traffic_source.medium AS medium,
     stream_id,
     (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') AS page_referrer_url,
     (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
     device.category as device_category,
     device.mobile_model_name as device_mobile_model_name,
     device.mobile_brand_name as device_mobile_brand_name,
     device.web_info.browser as device_browser,
     ROW_NUMBER() OVER (PARTITION BY 
        (SELECT value.int_value FROM UNNEST(event_params) WHERE key = '...1...') 
        ORDER BY event_timestamp ASC) AS event_rank


   FROM (SELECT * FROM `...2...` where event_name = '...3...')

)
, conversion_event_unique AS (

   SELECT *
   FROM conversion_event
   WHERE event_rank = 1

)
, first_visit_page AS (


   SELECT
     TIMESTAMP_MICROS(event_timestamp) as event_time,
     (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
     (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS first_visit_page,
     ROW_NUMBER() OVER (PARTITION BY
        (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') 
        ORDER BY event_timestamp ASC) AS event_rank

   FROM `...2...`

   WHERE event_name = 'page_view'

)

  SELECT
     conversion_event_unique.*,
     first_visit_page.first_visit_page

  FROM conversion_event_unique
  LEFT JOIN (SELECT * FROM first_visit_page WHERE event_rank = 1) as first_visit_page USING (ga_session_id)

Finally, you will obtain a data table like this:

Important Note: The data collected in this table will include all customers from various sources, not just SEO or Direct traffic, but also other sources like Paid Ads, Social Media, and Referral traffic. Therefore, in addition to obtaining SEO and Direct data, you can leverage this dataset to explore data for other traffic sources as well.

4- Joining Data

In the previous article, Nemo Marketing guided you on how to configure Web Tracking Parameters. If you followed those instructions, you probably created an attribution table for customers coming from Paid Ads (let’s call it Table 1).

Now, with this article, you’ll have another attribution table from Google Analytics 4 (let’s call it Table 2).

In this section, I’ll guide you on how to efficiently join these two data tables to create the Final Customer Attribution table:

  • First, join the two tables using the customer_id as the key (ensure both tables have unique customer_id) (Table 1 LEFT JOIN Table 2).
  • Merge the UTM_MEDIUM and MEDIUM fields from both tables into one field called MEDIA_SOURCE. This way, when your marketing team runs campaigns, they only need to look at the media_source field to differentiate between various channels.

   CASE WHEN utm_medium is null THEN medium END AS media_source

   -- Note: In this CASE WHEN, I prioritize utm_medium over medium. 
            It's because the accuracy of Tracking Link (utm_medium) is higher compared to Google Analytics 4 (medium)

   -- Thus, we should prioritize the most trustworthy one

Conclusion

Through this article, you have been able to identify users coming from two sources: Organic Traffic and Direct Traffic. When combined with the user identification table through Attribution Tracking Parameters (URL) from the previous article, you have created a Final Customer Attribution table for customers coming to your website.

If your company solely focuses on the website, you can complete this task here or explore more articles on automated marketing reporting and insight analysis.

However, if your business also involves customer acquisition through Mobile Apps, read the next series by Nemo Marketing. This series will guide you on how to identify users on Mobile Apps to complete your business’s attribution data!

Leave a Reply

Your email address will not be published. Required fields are marked *