Snowplow

This article explains how Cortex provides real-time event tracking to collect data on performance and user interactions with your digital products and services across web and mobile apps. This is achieved using the Snowplow data tracking and analysis application. The document explains how data is tracked and collected, provides guidance on bespoke settings, and describes how to prepare the captured data for analysis.

The Snowplow process flow for data capture and analysis (known as the pipeline) has the following components.

The process elements

  • Trackers are placed into web and app content.
  • The trackers capture event data. An event is typically an end-user action, such as loading a web page, clicking a link, putting an item into a basket or viewing a video. Trackers for the most common use cases are embedded into your web pages and apps using templated code snippets, but almost anything that can happen on the website, app or server can be wrapped into an event. Additionally, webhooks allow third-party software to send its own event data to your stream. Event data is sent to a collector, which saves it in its raw form.
  • Schemas define the fields that are recorded with each event and provide validation criteria for each field. Many schemas are predefined for common events; however, schemas are highly flexible, and an organisation can define its own data structures to capture data in a way that works for it. For this reason, schemas are central to the outcomes of data analysis.
  • Once validated against its schema, each event is sent through an enrichment process, which enhances and refines the data and adds the desired context. For example, this phase may pull in complementary data from a separate source, remove cookies or IP addresses, run a custom JavaScript function, or forward data to other applications.
  • Warehousing. Your event-level data is now sent to your chosen destinations. Typically this will include a data warehouse where data analysis and reporting takes place.

For in-depth information on Snowplow, refer to the Snowplow documentation. (Note that this link will take you to an external website.)

This document looks at how to set up the trackers, handle compliance, and track page views and clicks, membership logins and audio visual events. It then addresses organising data into entities and contexts, covering individual sessions, platform and application data, screen activity, and the global environment with both static and dynamic entities. Finally, it takes a look at the available testing and development tools.

How to set up Snowplow trackers

The principal task in working with Snowplow is to set up the trackers. This section provides a list of the values required to set up a tracker, along with code snippets for loading the tracker and managing cookies.


Note: Cortex will provide the values for the Tracker Name, Collector Endpoint and App ID. These values must be in place before tracking is implemented on a live app or website.

| Field or value | Description | Name used in examples |
| --- | --- | --- |
| Tracker Name | The name of the tracker, which is sent alongside every event. | demo_client_js_tracker (set in advance by Cortex) |
| webPage (context) | When the JavaScript tracker loads on a page, it generates a new page view ID (a universally unique identifier, or UUID). If the webPage context is enabled, a context containing this UUID is attached to every page view. | webPage: true |
| Collector Endpoint | All events are sent to the collector endpoint, which is hosted by Cortex. | http://collectorendpoint.co.uk/ (set in advance by Cortex) |
| App ID | Set the application ID using the appId field of the argument map. This ID is attached to every event the tracker fires. | demo_client (set in advance by Cortex) |
| respectDoNotTrack | Most browsers have a Do Not Track option, which allows users to express a preference not to be tracked. You can respect that preference by setting the respectDoNotTrack field of the configuration object to true. This prevents cookies from being set and events from being fired. | respectDoNotTrack: true |

Tracker code - web implementation

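The following code snippet loads the JavaScript tracker asynchronously and creates the snowplow_tracker function used in the examples in this document. The tracker itself is then initialised with a newTracker call, as shown under Compliance.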

<script type="text/javascript" async=1>
;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=
p.GlobalSnowplowNamespace||[];
p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=
p[i].q||[]).push(arguments)
};p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,"script",
"https://sp-js-tracker.incrowdsports.io/sp-latest.js","snowplow_tracker"));
</script>
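The minified loader above uses an asynchronous queue: until the tracker script has downloaded, each call to snowplow_tracker(...) is only recorded in the function's q array, and the tracker processes those queued calls once it has loaded. The sketch below is a simplified, non-minified version of the stub part of the loader, for illustration only. It uses a plain object in place of window so that it runs outside a browser, stores arguments as arrays rather than arguments objects, and omits the script-tag injection; createQueuedStub is a hypothetical name.

```javascript
// Simplified sketch of the queueing stub created by the loader.
// `globalObj` stands in for `window`; `name` is the global function name
// (in this document, "snowplow_tracker").
function createQueuedStub(globalObj, name) {
  if (globalObj[name]) {
    return globalObj[name]; // already defined: leave it untouched, like the loader
  }
  // Register the function name so that the tracker script can find it later.
  globalObj.GlobalSnowplowNamespace = globalObj.GlobalSnowplowNamespace || [];
  globalObj.GlobalSnowplowNamespace.push(name);

  // The stub only records its arguments; nothing is sent yet.
  const stub = function (...args) {
    stub.q.push(args);
  };
  stub.q = [];
  globalObj[name] = stub;
  return stub;
}

// Calls made before the tracker script loads are queued in order.
const fakeWindow = {};
const tracker = createQueuedStub(fakeWindow, "snowplow_tracker");
tracker("newTracker", "demo_client_js_tracker", "collectorEndpoint.co.uk", { appId: "demo_client" });
tracker("trackPageView");
console.log(fakeWindow.snowplow_tracker.q.length); // 2
```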

Compliance

By default, the Snowplow JavaScript and browser trackers make use of cookies and local storage. It is therefore important to ensure that users are given the opportunity to give or withhold consent to tracking. The snippets below show how to initialise the tracker in each case.

Include the following if the tracker is to be initialised with consent given for cookies to be set.

window.snowplow_tracker("newTracker", "demo_client_js_tracker", "collectorEndpoint.co.uk", {
  appId: "demo_client",
  eventMethod: "post",
  stateStorageStrategy: "cookieAndLocalStorage", // cookies and local storage allowed
  respectDoNotTrack: true,
  contexts: {
    webPage: true
  }
});

Include the following if the tracker is to be initialised with consent not given for cookies to be set.

window.snowplow_tracker("newTracker", "demo_client_js_tracker", "collectorEndpoint.co.uk", {
  appId: "demo_client",
  eventMethod: "post",
  stateStorageStrategy: "none", // nothing is stored on the user's device
  respectDoNotTrack: true,
  anonymousTracking: { withServerAnonymisation: true },
  contexts: {
    webPage: true
  }
});
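Because the two configurations differ only in their storage and anonymisation settings, one option is to build the configuration object from the user's consent choice in a single place. The sketch below assumes a hypothetical buildTrackerConfig helper and a boolean consent flag supplied by your consent-management solution; the option names and values are the ones used in the two snippets above.

```javascript
// Hypothetical helper: returns the newTracker configuration object for the
// user's cookie-consent choice, using the option names shown above.
function buildTrackerConfig(cookieConsentGiven) {
  const config = {
    appId: "demo_client",
    eventMethod: "post",
    respectDoNotTrack: true,
    contexts: { webPage: true },
  };
  if (cookieConsentGiven) {
    // Consent given: the tracker may use cookies and local storage.
    config.stateStorageStrategy = "cookieAndLocalStorage";
  } else {
    // No consent: store nothing on the device and track anonymously.
    config.stateStorageStrategy = "none";
    config.anonymousTracking = { withServerAnonymisation: true };
  }
  return config;
}

// In the browser, the result would be passed to the tracker, for example:
// window.snowplow_tracker("newTracker", "demo_client_js_tracker",
//   "collectorEndpoint.co.uk", buildTrackerConfig(userHasConsented));
console.log(buildTrackerConfig(false).stateStorageStrategy); // prints: none
```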

For full details on initialising and loading the trackers, and on the use and behaviour of cookies and local storage, see the following Snowplow documentation:

Loading a tracker with the Snowplow tag

Initialisation options with the Snowplow trackers

Cookies and local storage

Tracking page views and link clicks

Setting up page view tracking means that each time a user views a new page:

  • The tracker function is called with setUserId and the user's SSO/Auth ID. The SSO/Auth ID can be retrieved from the JSON Web Token (JWT); within the JWT payload, it is held in the registered claim sub (subject). Call setUserId before trackPageView so that the ID is attached to the page view.
  • The tracker function is called with enableActivityTracking, X, Y, so that the first page ping is sent after X seconds and subsequent pings every Y seconds for as long as the user continues to actively browse the page. The tracker function is then called with trackPageView.
snowplow_tracker('setUserId', '<SSO/Auth ID>');
snowplow_tracker('enableActivityTracking', 30, 30);
snowplow_tracker('trackPageView');
  • If the Cortex Device ID has been set by the Cortex device tag script, it should also be attached to the page view as a device_tag_context entity, like this:
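snowplow_tracker('setUserId', '<SSO/Auth ID>');
// {{incrowd_device_id}} is a template variable that is undefined when no Cortex Device ID has been set
if (typeof {{incrowd_device_id}} === 'undefined') {
  // No device ID: track a plain page view
  snowplow_tracker('trackPageView');
} else {
  // Attach the device_tag_context entity to the page view
  snowplow_tracker(
    'trackPageView',
    null, // no custom page title
    [{
      schema: 'iglu:com.incrowdsports/device_tag_context/jsonschema/1-0-0',
      data: {
        device_id: '<InCrowd Device ID>',
        last_session: '<Last Session>',
        session_count: parseInt(<Session Count>, 10)
      }
    }]
  );
}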
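The branching above can also be moved into a small function that returns the list of entities to attach, so that trackPageView is always called the same way. In this sketch, buildDeviceContexts is a hypothetical helper, the device ID, last session and session count are passed in as arguments instead of being read from the Cortex device tag script, and it is assumed that the tracker accepts an empty entity list when there is no device ID. The schema URI and field names are the ones shown above.

```javascript
// Hypothetical helper: returns the entities to attach to a page view.
// If no Cortex Device ID is available, an empty list is returned.
function buildDeviceContexts(deviceId, lastSession, sessionCount) {
  if (deviceId === undefined || deviceId === null) {
    return [];
  }
  return [{
    schema: "iglu:com.incrowdsports/device_tag_context/jsonschema/1-0-0",
    data: {
      device_id: deviceId,
      last_session: lastSession,
      // The session count may arrive as a string, so convert it to an integer.
      session_count: parseInt(sessionCount, 10),
    },
  }];
}

// Usage in the browser (sketch):
// snowplow_tracker('setUserId', '<SSO/Auth ID>');
// snowplow_tracker('trackPageView', null, buildDeviceContexts(deviceId, lastSession, sessionCount));
```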

Further documentation

For more on tracking page views and clicks, see the following Snowplow documentation. (These links will take you to the Snowplow website.)

Tracking page view events

Tracking link click events

Tracking member logins

Tracking membership behaviour means that each time a user signs up, logs in (authenticates) or logs out of the SSO/Auth platform, an event is sent with the user's membership ID.

  • Each time a user successfully signs up, a structured event is sent with the sign-up action (see example below)
  • Each time a user successfully logs in, a structured event is sent with the login action (see example below)
  • Each time a user successfully logs out, a structured event is sent with the logout action (see example below).

The membership ID can be retrieved from the JSON Web Token (JWT). Within the JWT payload, the membership ID is placed in the registered claim value: sub (subject).
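The sub claim can be read by base64url-decoding the middle (payload) segment of the token. The following is a minimal sketch, assuming a browser or Node.js 16 or later (for the global atob and TextDecoder); getSubjectFromJwt is a hypothetical name. It does not verify the token's signature, so it should only be used on a token that your application has already validated.

```javascript
// Minimal sketch: read the `sub` (subject) claim from a JWT without
// verifying its signature. Only use on tokens your application already trusts.
function getSubjectFromJwt(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Not a JWT: expected header.payload.signature");
  }
  // Convert base64url to standard base64 and restore any stripped padding.
  let b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  b64 += "=".repeat((4 - (b64.length % 4)) % 4);
  // Decode to bytes, then to a UTF-8 string, then parse the JSON payload.
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  const payload = JSON.parse(new TextDecoder().decode(bytes));
  return payload.sub;
}

// Usage (idToken is a hypothetical variable holding the user's JWT):
// snowplow_tracker('trackStructEvent', 'membership', 'login', getSubjectFromJwt(idToken), '', '0.0');
```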

Code example for membership events

// Arguments: category, action, label (the SSO/Auth ID), property, value
snowplow_tracker('trackStructEvent', 'membership', 'sign-up', '<SSO/Auth ID>', '', '0.0');
snowplow_tracker('trackStructEvent', 'membership', 'login', '<SSO/Auth ID>', '', '0.0');
snowplow_tracker('trackStructEvent', 'membership', 'logout', '<SSO/Auth ID>', '', '0.0');
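To keep the category and action values consistent across an application, the three calls above can be wrapped in one function that only accepts the documented actions. trackMembershipEvent is a hypothetical helper; the tracker function is passed in as a parameter so that the sketch can also run outside the browser with a stand-in that records its calls.

```javascript
// The membership actions documented above.
const MEMBERSHIP_ACTIONS = new Set(["sign-up", "login", "logout"]);

// Hypothetical wrapper around the structured-event calls shown above.
// `trackerFn` is the tracker function (snowplow_tracker in the browser).
function trackMembershipEvent(trackerFn, action, ssoAuthId) {
  if (!MEMBERSHIP_ACTIONS.has(action)) {
    throw new Error(`Unknown membership action: ${action}`);
  }
  if (!ssoAuthId) {
    throw new Error("An SSO/Auth ID is required for membership events");
  }
  // Same argument order as the examples: category, action, label, property, value.
  trackerFn("trackStructEvent", "membership", action, ssoAuthId, "", "0.0");
}

// Usage with a stand-in tracker that records its calls:
const recordedCalls = [];
const fakeTracker = (...args) => recordedCalls.push(args);
trackMembershipEvent(fakeTracker, "login", "user-123");
```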

Further documentation

For more on tracking membership events, see the following Snowplow documentation. (This link will take you to the Snowplow website.)

Tracking membership behaviour

Tracking audio visual (AV) events

Tracking can be set up to send an event when a user watches a video, as described below.

A video event is sent when:

  • The user completes a viewing of the video
  • The user navigates to another page before completing the video (for example, by clicking the browser's back button)
  • The user clicks X to close the video window before the video has finished. In this case, the end time is set to the moment the video is stopped. If the video has already finished when the user clicks X, no event should be sent, because an event was already sent when the video completed.
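The rules above amount to sending at most one event per viewing: either when the video completes, or at the first of navigating away or closing the player if it has not finished. A minimal sketch of that logic follows. VideoViewTracker and its method names are hypothetical, the send callback stands in for the trackSelfDescribingEvent call shown in the tracker call example below, and timestamps and positions are supplied by the caller.

```javascript
// Hypothetical sketch: decides when a video event should be sent.
// `send` receives the event data; in the browser it would call
// snowplow_tracker('trackSelfDescribingEvent', ...).
class VideoViewTracker {
  constructor(send, startTimestamp, contentStartTime) {
    this.send = send;
    this.startTimestamp = startTimestamp;
    this.contentStartTime = contentStartTime;
    this.sent = false; // true once an event has been sent for this viewing
  }

  // Called when the video plays to the end.
  complete(endTimestamp, contentEndTime) {
    this.sendOnce(endTimestamp, contentEndTime);
  }

  // Called when the user leaves the page or clicks X.
  // Does nothing if an event was already sent on completion.
  stop(endTimestamp, contentEndTime) {
    this.sendOnce(endTimestamp, contentEndTime);
  }

  sendOnce(endTimestamp, contentEndTime) {
    if (this.sent) return;
    this.sent = true;
    this.send({
      start_timestamp: this.startTimestamp,
      content_start_time: this.contentStartTime,
      end_timestamp: endTimestamp,
      content_end_time: contentEndTime,
    });
  }
}

const sentEvents = [];
const view = new VideoViewTracker((e) => sentEvents.push(e), "2017-12-09T18:02:00.000Z", 0);
view.complete("2017-12-09T18:07:00.000Z", 300);
view.stop("2017-12-09T18:07:05.000Z", 300); // X clicked after the end: ignored
console.log(sentEvents.length); // 1
```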

Schema for AV events

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema relating to audio/video content usage. Each event will relate to an audio/video listening or viewing",
  "self": {
    "vendor": "com.incrowdsports",
    "name": "av_usage_context",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
      "content_provider": {
        "description": "The provider of the audio/video content e.g. Stream",
        "type": ["string", "null"],
        "maxLength" : 255
      },
      "content_id": {
        "description": "REQUIRED - The ID of the content, this ID is created by the content provider e.g. 0_et04xl6b",
        "type": "string",
        "maxLength" : 455
      },
      "content_name": {
        "description": "The title of the content from the content provider",
        "type": ["string", "null"],
        "maxLength" : 255
      },
      "content_type": {
        "description": "REQUIRED - This value will be equal to Live Video, Video, Live Audio or Audio",
        "type" : "string",
        "maxLength" : 64
      },
      "content_length": {
         "description": "The total duration of the content in seconds. Leave NULL where this is live content",
         "type": ["integer", "null"],
         "minimum": 0,
         "maximum": 360000
      },
      "start_timestamp": {
        "description": "REQUIRED - UTC timestamp of when the user started watching the audio/video content",
        "type": "string",
        "format": "date-time"
      },
      "content_start_time": {
         "description": "The number of seconds into the content that the user started",
         "type": ["integer", "null"],
         "minimum": 0,
         "maximum": 360000
      },
      "end_timestamp": {
        "description": "REQUIRED - UTC timestamp of when the user stopped watching the audio/video content",
        "type": "string",
        "format": "date-time"
      },
      "content_end_time": {
         "description": "The number of seconds into the content that the user stopped",
         "type": ["integer", "null"],
         "minimum": 0,
         "maximum": 360000
      }
  },
  "required": ["content_id", "content_type", "start_timestamp", "end_timestamp"],
  "additionalProperties": false
}
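Events that do not match their schema fail validation in the pipeline, so during development it can help to check a payload before sending it. The sketch below hand-codes a few of the schema's rules (the required fields, "additionalProperties": false, and the maxLength limit on content_id); it is not a full JSON Schema validator, and checkAvUsage is a hypothetical name. The check on content_type values follows the field's description, since the schema itself does not declare an enum.

```javascript
// Hypothetical, partial check of an av_usage_context payload.
// Mirrors a subset of the schema above; not a full JSON Schema validator.
const AV_REQUIRED = ["content_id", "content_type", "start_timestamp", "end_timestamp"];
const AV_ALLOWED = new Set([
  "content_provider", "content_id", "content_name", "content_type",
  "content_length", "start_timestamp", "content_start_time",
  "end_timestamp", "content_end_time",
]);
// Values listed in the content_type description.
const AV_CONTENT_TYPES = new Set(["Live Video", "Video", "Live Audio", "Audio"]);

function checkAvUsage(data) {
  const errors = [];
  for (const field of AV_REQUIRED) {
    if (data[field] === undefined || data[field] === null) {
      errors.push(`missing required field: ${field}`);
    }
  }
  for (const field of Object.keys(data)) {
    if (!AV_ALLOWED.has(field)) {
      errors.push(`unexpected field: ${field}`); // additionalProperties: false
    }
  }
  if (typeof data.content_id === "string" && data.content_id.length > 455) {
    errors.push("content_id is longer than 455 characters");
  }
  if (data.content_type !== undefined && !AV_CONTENT_TYPES.has(data.content_type)) {
    errors.push(`unexpected content_type: ${data.content_type}`);
  }
  return errors; // an empty array means no problems were found
}

console.log(checkAvUsage({
  content_id: "soisdio",
  content_type: "Video",
  start_timestamp: "2017-12-09T18:02:00.000Z",
  end_timestamp: "2017-12-09T18:04:00.000Z",
})); // []
```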

Each time a user completes or stops viewing a video, the tracker function is called with trackSelfDescribingEvent and a payload that conforms to the av_usage_context schema, as shown in the code example below.

Information about playing or pausing a video is not required, because the video event relates only to a user completing or stopping viewing a video, as described above.

The av_usage_context schema above defines the available properties and which of them are required. For general guidance on tracking audio and video events, see the Snowplow documentation (link below).

Tracker call example

snowplow_tracker('trackSelfDescribingEvent', {
  schema: 'iglu:com.incrowdsports/av_usage_context/jsonschema/1-0-0',
  data: {
    content_provider: 'Stream',
    content_id: 'soisdio',
    content_name: 'Highlights',
    content_type: 'Video',
    content_length: 300,
    start_timestamp: '2017-12-09T18:02:00.000Z',
    content_start_time: 0,
    end_timestamp: '2017-12-09T18:04:00.000Z',
    content_end_time: 120
  }
});

Further documentation

For more on tracking audio visual events, see the following Snowplow documentation. (This link will take you to the Snowplow website.)

Tracking audio video events

Mobile tracker implementation

Snowplow Android and iOS trackers can be set up to collect data from Android, iOS, macOS, tvOS and watchOS apps. Setting up tracking for mobile devices is comprehensively covered in the Snowplow documentation here: Android and iOS tracker installation and set-up.

Entities and context

When an event occurs, it generally involves a number of entities, which provide peripheral information that gives it a ‘context’. For example, we may wish to give context to a search event. To build that context, we assign the following entities associated with a search:

  • A user entity who performed the search.
  • A web page entity — the page on which the event occurred.
  • An entity containing a set of products that were returned from the search.

These are combined into a context we can use for every search event. We can create and combine entities in this way to create many contexts that are useful for analysis.

What makes entities interesting is that they are common across multiple event types. For example, the following events for a retailer all involve a product entity:

  • View product
  • Select product
  • Like product
  • Add product to basket
  • Purchase product
  • Review product
  • Recommend product.

The retailer might want to describe a product using a number of fields, such as the Name, Unit price, Category and Tags. Rather than defining all the product-related fields for all the different product-related events, they define a single product entity and attach it to any product-related event.
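As an illustration of this idea, the sketch below builds one product entity and attaches the same entity to two different events. The com.example schema URIs, the field names, and the productEntity and buildEvent helpers are all hypothetical; a real implementation would use schemas registered for your organisation. The events are only built as plain objects here, so the sketch runs without a tracker.

```javascript
// Hypothetical product entity: one definition, reused by every
// product-related event.
function productEntity(product) {
  return {
    schema: "iglu:com.example/product/jsonschema/1-0-0",
    data: {
      name: product.name,
      unit_price: product.unitPrice,
      category: product.category,
      tags: product.tags,
    },
  };
}

// Hypothetical event builder: an event plus the entities that give it context.
function buildEvent(eventSchema, eventData, entities) {
  return { event: { schema: eventSchema, data: eventData }, context: entities };
}

const scarf = productEntity({ name: "Home scarf", unitPrice: 15.0, category: "Merchandise", tags: ["winter"] });

// Two different events share the same product entity definition.
const viewed = buildEvent("iglu:com.example/view_product/jsonschema/1-0-0", {}, [scarf]);
const added = buildEvent("iglu:com.example/add_to_basket/jsonschema/1-0-0", { quantity: 1 }, [scarf]);
```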


Note that because the entity references its schema, the meaning of each field in the entity is always clear to the downstream users and applications, even if your definition of the entity changes over time.

Entity structure

Each entity is composed of two parts:

  • A reference to a schema that describes the name, version and structure of the entity.
  • A set of key-value properties in JSON format; that is, the data associated with the entity.

This structure is an example of what is known as self-describing JSON — a JSON object with a schema and a data field.
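The schema reference in a self-describing JSON is an Iglu URI of the form iglu:vendor/name/format/version, as in the iglu:com.incrowdsports/av_usage_context/jsonschema/1-0-0 example earlier. The sketch below, using the hypothetical helper names parseIgluUri and isSelfDescribing, checks that an object has the two-part shape described above and splits its schema URI into those four components.

```javascript
// Splits an Iglu schema URI into its four parts, for example
// "iglu:com.incrowdsports/av_usage_context/jsonschema/1-0-0".
function parseIgluUri(uri) {
  const match = /^iglu:([^\/]+)\/([^\/]+)\/([^\/]+)\/(\d+-\d+-\d+)$/.exec(uri);
  if (!match) {
    throw new Error(`Not an Iglu schema URI: ${uri}`);
  }
  const [, vendor, name, format, version] = match;
  return { vendor, name, format, version };
}

// True if `obj` has the self-describing shape: a schema reference plus data.
function isSelfDescribing(obj) {
  if (obj === null || typeof obj !== "object") return false;
  if (typeof obj.schema !== "string" || typeof obj.data !== "object" || obj.data === null) {
    return false;
  }
  try {
    parseIgluUri(obj.schema);
    return true;
  } catch {
    return false;
  }
}
```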

For more detail on entities and context, see The structure of Snowplow data in the Snowplow documentation.

Out-of-the-box entities

Snowplow provides a number of predefined entities, some of which are attached to events automatically by the tracking SDKs. For example, with the JavaScript tracker, you can enable the collection of performance timing data, which is then added automatically to any Snowplow event triggered on the page.

window.snowplow("newTracker", "sp", "{{COLLECTOR_URL}}", {
    appId: "cfe23a",
    contexts: {
      webPage: true,
      performanceTiming: true,
      gaCookies: true,
      geolocation: false
    }
  }
);

Session tracking

A session is a period of ongoing user activity. A session ends when no tracking events occur for the amount of time defined in a timeout (30 minutes by default). The timeout check is executed for each tracked event: if the gap between two consecutive events is longer than the timeout, a new session is started. Note that a session can time out in the foreground (while the app is visible) or in the background (when the app has been suspended but not closed).
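The timeout rule can be illustrated with a short sketch: on each event, the gap since the previous event is compared with the foreground or background timeout, and a new session begins when the gap is longer. This is a simplified model written in JavaScript for illustration, not code from the mobile trackers; SessionClock is a hypothetical name, and times are in milliseconds.

```javascript
// Hypothetical, simplified model of session timeouts.
class SessionClock {
  constructor(foregroundTimeoutMs, backgroundTimeoutMs) {
    this.foregroundTimeoutMs = foregroundTimeoutMs;
    this.backgroundTimeoutMs = backgroundTimeoutMs;
    this.sessionIndex = 0;     // number of sessions started so far
    this.lastEventTime = null; // time of the previous event
  }

  // Checks the timeout for an event and returns the session index it belongs to.
  onEvent(timeMs, inBackground = false) {
    const timeout = inBackground ? this.backgroundTimeoutMs : this.foregroundTimeoutMs;
    if (this.lastEventTime === null || timeMs - this.lastEventTime > timeout) {
      this.sessionIndex += 1; // first event, or the gap exceeded the timeout
    }
    this.lastEventTime = timeMs;
    return this.sessionIndex;
  }
}

const THIRTY_MINUTES = 30 * 60 * 1000;
const clock = new SessionClock(THIRTY_MINUTES, THIRTY_MINUTES);
console.log(clock.onEvent(0));                                   // 1
console.log(clock.onEvent(10 * 60 * 1000));                      // 1 (10-minute gap)
console.log(clock.onEvent(10 * 60 * 1000 + THIRTY_MINUTES + 1)); // 2 (gap longer than 30 minutes)
```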

The timeouts for the session are configured in the SessionConfiguration, like this one for iOS:

let sessionConfig = SessionConfiguration(
    foregroundTimeout: Measurement(value: 360, unit: .seconds),
    backgroundTimeout: Measurement(value: 360, unit: .seconds)
)
Snowplow.createTracker(
    namespace: "appTracker",
    network: networkConfig,
    configurations: [trackerConfig, sessionConfig]
)

For more detail, please refer to the Snowplow documentation on session tracking.

Platform and application data tracking

Platform and application data tracking captures information about the device and the app. Both are enabled by default; however, they can be switched on or off through the TrackerConfiguration, like this:

let trackerConfig = TrackerConfiguration()
    .platformContext(true)
    .applicationContext(true)

The application context entity has two properties: the version and the build.

The platform entity has many properties, such as the device manufacturer and model, the operating system type and version (all required) and many other optional properties, covering, for example, the carrier, the network, memory, battery, storage and much more.

For more detail, please refer to the Snowplow documentation on Platform and application data tracking.

Screen view tracking

Screen view tracking monitors which screen is in view at any given time. It is enabled by default and works in two ways:

  • The tracker automatically tracks each screen change using a ScreenView event
  • If the TrackerConfiguration.screenContext property is enabled, the tracker attaches a screen entity to all the events tracked by the tracker, reporting the last (and therefore probably current) screen visible on the device when the event was tracked.
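The second behaviour can be pictured as the tracker remembering the most recent screen view and attaching it to each later event. The sketch below models only that idea in JavaScript; ScreenAwareTracker, its methods and the com.example screen schema are hypothetical and are not the mobile trackers' API.

```javascript
// Hypothetical model: remember the last screen view and attach a screen
// entity to every subsequent event, as the screen context option describes.
class ScreenAwareTracker {
  constructor() {
    this.currentScreen = null;
    this.events = [];
  }

  trackScreenView(name, id) {
    this.currentScreen = { name, id };
    this.track({ type: "screen_view", name });
  }

  track(event) {
    const context = [];
    if (this.currentScreen) {
      context.push({
        schema: "iglu:com.example/screen/jsonschema/1-0-0", // hypothetical schema
        data: { ...this.currentScreen },
      });
    }
    this.events.push({ ...event, context });
  }
}

const screenTracker = new ScreenAwareTracker();
screenTracker.track({ type: "app_open" });          // no screen seen yet: no screen entity
screenTracker.trackScreenView("Fixtures", "scr-1");
screenTracker.track({ type: "button_click" });      // carries the "Fixtures" screen entity
```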

Screen tracking can be implemented in several ways, depending on the web or app composition and the operating system. For comprehensive detail and code snippets, please refer to the Snowplow documentation on screen view tracking.

The global environment

As well as adding context entities to each individual event, context entities can be set up in a declarative way, so that they are applied to all (or a subset of) events within an application.

This is achieved during tracker setup by providing a GlobalContextsConfiguration. The logic for each global context entity is held within a GlobalContext generator. Multiple GlobalContexts can be provided to the GlobalContextsConfiguration, each with an identifying tag.

This example code (Java for Android) shows the simplest kind of GlobalContext configuration. When set up like this, the same (static) self-describing JSON entity will be attached to every event tracked.

// context entity to add to all events
SelfDescribingJson staticContext = new SelfDescribingJson(
    "iglu:com.snowplowanalytics.iglu/anything-a/jsonschema/1-0-0",
    new HashMap<String, String>() {{ put("key", "staticExample"); }}
);
// create a GlobalContext instance with the entity as a static context
GlobalContext staticGlobalContext = new GlobalContext(Arrays.asList(staticContext));

// create a GlobalContextsConfiguration and assign the GlobalContext 
// with a unique tag identifier
GlobalContextsConfiguration globalContextsConfig = new GlobalContextsConfiguration(
    new HashMap<String, GlobalContext>() {{ put("staticExampleTag", 
    staticGlobalContext); }}
);

// pass the configuration when creating a new tracker
Snowplow.createTracker(getApplicationContext(), "namespace", 
    networkConfiguration, globalContextsConfig);

Code snippets for other environments can be found in the Snowplow documentation (link below).

A generator can also be added at runtime using the add method, as in the following example. This is possible even if no GlobalContextsConfiguration was provided when the tracker was created.

tracker.getGlobalContexts().add("staticExampleTag", staticGlobalContext);

Because each context generator is associated with a tag string, the tag can also be used to remove the generator at runtime with the remove method, like this:

tracker.getGlobalContexts().remove("staticExampleTag");

Static and dynamic global entities

An entity can be static (the same fixed entity every time) or dynamic (generated for each event, based on the event being tracked). An entity can also be added either to all events or conditionally to a subset of events, by event type or schema.
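The difference between static and dynamic global entities, and conditional attachment, can be sketched as a small registry keyed by tag, mirroring the add and remove calls shown above. This is an illustrative model written in JavaScript; GlobalEntityRegistry, its methods and the com.example schema are hypothetical and are not the signatures of any Snowplow tracker.

```javascript
// Hypothetical registry of global entities, keyed by tag.
// A static entry is a fixed entity; a dynamic entry is a function that
// builds an entity from the event being tracked. An optional filter
// restricts the entry to a subset of events.
class GlobalEntityRegistry {
  constructor() {
    this.entries = new Map();
  }

  add(tag, entityOrGenerator, filter = () => true) {
    this.entries.set(tag, { entityOrGenerator, filter });
  }

  remove(tag) {
    return this.entries.delete(tag);
  }

  // Returns the entities to attach to `event`, in the order they were added.
  entitiesFor(event) {
    const result = [];
    for (const { entityOrGenerator, filter } of this.entries.values()) {
      if (!filter(event)) continue;
      result.push(
        typeof entityOrGenerator === "function"
          ? entityOrGenerator(event) // dynamic: built per event
          : entityOrGenerator        // static: the same entity every time
      );
    }
    return result;
  }
}

const registry = new GlobalEntityRegistry();
registry.add("staticExampleTag", {
  schema: "iglu:com.snowplowanalytics.iglu/anything-a/jsonschema/1-0-0",
  data: { key: "staticExample" },
});
// Dynamic entity, attached only to structured events (hypothetical schema).
registry.add(
  "dynamicExampleTag",
  (event) => ({ schema: "iglu:com.example/event_time/jsonschema/1-0-0", data: { seen_at: event.timestamp } }),
  (event) => event.type === "struct"
);
```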

For details on configuring the global context logic, and for code snippets for alternative environments, please refer to the Snowplow documentation for declarative entities with global logic.

Testing and development tools

A free Snowplow debugging and analytics browser extension is available from snowcatcloud.com.

There are also two lightweight versions of the Snowplow pipeline, called Snowplow Micro and Snowplow Mini.

  1. Snowplow Micro delivers a complete pipeline and is primarily used for:
    1. Getting familiar with Snowplow
    2. Debugging and testing (including automated testing).

Snowplow Micro is more portable and can easily run on a local computer or in automated tests.

  2. Snowplow Mini is also a ‘lite’ version of Snowplow, providing a single instance that primarily serves as a development sandbox. It gives you a quick way to debug tracker updates and experiment with changes to your schemas and pipeline configuration. You might use Snowplow Mini when:
    1. You've updated a schema in your development environment and wish to send some test events against it before promoting it to production
    2. You want to work with an enrichment in a test environment before enabling it in production.

Snowplow Mini has more features, mainly an OpenSearch Dashboards user interface, and is better integrated with the Snowplow Behavioral Data Platform (BDP).

For more on the testing and development tools and environments available, please refer to the Snowplow documentation on Snowplow Mini and Snowplow Micro.