Simplify base events this run #64

Merged
1 change: 1 addition & 0 deletions CHANGELOG
@@ -17,6 +17,7 @@ In addition this release adds a more robust unique media identifier. This fixes
- Add unique media identifier (close #59)
- Add missing primary key to media_ad_views
- Fix field names in custom session stats model yaml (close #63)
- Fix playback_quality_field macro (close #60)

## Under the hood

4 changes: 3 additions & 1 deletion dbt_project.yml
@@ -100,7 +100,9 @@ vars:

# Completely or partially remove models from the manifest during run start.
on-run-start:
- '{{ snowplow_media_player_delete_from_manifest(var("models_to_remove",[])) }}'
- '{{ snowplow_utils.snowplow_delete_from_manifest(var("models_to_remove",[])) }}'
# Check inconsistencies within the variable setup.
- '{{ snowplow_media_player.config_check() }}'

# Update manifest table with last event consumed per successfully executed node/model
on-run-end:
245 changes: 66 additions & 179 deletions docs/markdown/snowplow_media_player_macro_docs.md
@@ -1,191 +1,155 @@
{% docs macro_field %}
{% docs macro_dtype_to_type %}
{% raw %}
This macro is used to define a path to a column either as a string or using a dictionary definition.

On BigQuery, the `snowplow_utils.get_optional_fields` macro is used.
This macro retrieves the database specific data type from the dtype property.

#### Returns

The query path for the field.

#### Usage

```sql
select
...,
{{ field('a.contexts_com_youtube_youtube_1[0]:playerId') }},
{{ field({ 'field': 'playerId', 'col_prefix': 'a.contexts_com_youtube_youtube_1' }) }},
from {{ var('snowplow__events') }} as a
```
dbt `{{ type_...() }}` macro.

{% endraw %}
{% enddocs %}

{% docs macro_media_ad_break_field %}
{% raw %}
This macro retrieves a property from the media ad break context entity.
{% docs macro_field_alias %}
This macro returns a field alias in snake case with the prefix if set.

#### Returns

The query path for the field.
Field alias.

#### Usage

```sql
select
...,
{{ media_ad_break_field({ 'field': 'name' }) }},
from {{ var('snowplow__events') }} as a
field_alias(field={'field': 'sessionId', 'dtype': 'string'}, prefix='media_session_')
```
Returns `media_session__session_id`.

{% raw %}
{% endraw %}
{% enddocs %}
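
To make the aliasing rule concrete, here is a minimal Python sketch of the behaviour described for `field_alias`. It is not the macro's actual Jinja implementation. The helper `to_snake_case` is a hypothetical stand-in, and the extra underscore between prefix and field name is an assumption inferred from the documented output `media_session__session_id`.

```python
import re


def to_snake_case(name: str) -> str:
    """Insert an underscore before each interior capital, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def field_alias(field: dict, prefix: str = "") -> str:
    """Return the snake-cased field name, prefixed when a prefix is set.

    Assumption: prefix and name are joined with one extra underscore,
    inferred from the documented example output.
    """
    snake = to_snake_case(field["field"])
    return f"{prefix}_{snake}" if prefix else snake


alias = field_alias({"field": "sessionId", "dtype": "string"}, prefix="media_session_")
print(alias)
```

With an empty prefix the sketch returns the bare snake-cased name, which matches the "prefix if set" wording above.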

{% docs macro_media_ad_field %}
{% docs macro_get_context_fields %}
{% raw %}
This macro retrieves a property from the media ad context entity.
This macro returns specified fields from a context column.

#### Returns

The query path for the field.
Fields from context column. If `enabled` is false, it casts the fields as nulls.

#### Usage

```sql
select
...,
{{ media_ad_field({ 'field': 'name' }) }},
from {{ var('snowplow__events') }} as a
{{ get_context_fields(
enabled=var('snowplow__enable_whatwg_video', false),
context='contexts_org_whatwg_video_element_1',
prefix='html5_video_element_',
fields=[
{'field':'videoWidth', 'dtype':'integer'},
{'field':'videoHeight', 'dtype':'integer'},
]) }}
```

{% endraw %}
{% enddocs %}

{% docs macro_media_ad_quartile_event_field %}
{% docs macro_get_enabled_context_fields %}
{% raw %}
This macro retrieves a property from the media ad quartile self-describing event.
This macro is used in the `get_context_fields` macro and returns fields from a context column if enabled. On BigQuery it uses the `snowplow_utils.get_optional_fields` macro; on other warehouses it uses the `snowplow_utils.get_fields` macro. On Postgres/Redshift nothing is returned, as context fields are already extracted in the base macro.

#### Returns

The query path for the field.

#### Usage

```sql
select
...,
{{ media_ad_quartile_event_field({ 'field': 'percent_progress' }) }},
from {{ var('snowplow__events') }} as a
```
Fields from context column.

{% endraw %}
{% enddocs %}

{% docs macro_media_event_type_field %}
{% docs macro_snakeify_case %}
{% raw %}
Retrieves the event type either from the media player event in case of v1 media schemas or the event name in case of v2 media schemas.
This macro takes a string in camel/pascal case and transforms it to snake case.

#### Returns

The query path for the field.
String in snake case.

#### Usage

```sql
select
...,
{{ media_event_type_field(media_player_event_type={ 'dtype': 'string' }, event_name='a.event_name') }} as event_type,
from {{ var('snowplow__events') }} as a
{{ snakeify_case('mediaSessionId') }}
```
Returns `media_session_id`

{% endraw %}
{% enddocs %}
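
The camel/Pascal to snake case transformation can be sketched in Python as follows. This is an illustration of the general technique, not the macro's Jinja source. The function name `snakeify` and the two-pass regex approach are assumptions, and how the macro itself treats runs of capitals (such as `playerID`) is not stated here.

```python
import re


def snakeify(name: str) -> str:
    """Convert camelCase or PascalCase to snake_case.

    Pass 1 splits before a capitalised word ("mediaSession" -> "media_Session").
    Pass 2 splits a lowercase letter or digit followed by a capital
    ("playerID" -> "player_ID"). The result is then lowercased.
    """
    step1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    step2 = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step1)
    return step2.lower()


for example in ("mediaSessionId", "MediaSessionId", "videoWidth"):
    print(example, "->", snakeify(example))
```

A name that is already in snake case passes through unchanged, so applying the function twice is harmless.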

{% docs macro_media_player_field %}

{% docs macro_allow_refresh %}
{% raw %}
This macro retrieves a property from either the v2 or v1 media player context entity.
This macro is used to determine if a full-refresh is allowed (depending on the environment), using the `snowplow__allow_refresh` variable.

#### Returns

The query path for the field.

#### Usage

```sql
select
...,
round({{ media_player_field(
v1={ 'field': 'duration', 'dtype': 'double' },
v2={ 'field': 'duration', 'dtype': 'double' }
) }}) as duration_secs,
from {{ var('snowplow__events') }} as a
```
`snowplow__allow_refresh` if environment is not `dev`, `none` otherwise.

{% endraw %}
{% enddocs %}

{% docs macro_player_id_field %}
{% docs macro_event_name_filter %}
{% raw %}
This macro produces the value for the player_id column in the snowplow_media_player_base_events_this_run table based on the values of the youtube_player_id and media_player_id columns.
This macro is used to add a filter on `event_name` if provided.

#### Returns

The query for the player_id column.
A filter restricting `event_name` to the values in the input `event_names` list if provided, otherwise `lower(event_vendor) = 'com.snowplowanalytics.snowplow.media'`.
{% endraw %}
{% enddocs %}

#### Usage
{% docs macro_get_percentage_boundaries %}
{% raw %}
This macro gets the list of percentage boundaries which are set in the tracker.

```sql
select
...,
{{ player_id_field(
youtube_player_id='a.contexts_com_youtube_youtube_1[0]:playerId',
media_player_id='a.contexts_org_whatwg_media_element_1[0]:htmlId::varchar'
) }} as player_id
from {{ var('snowplow__events') }} as a
```
#### Returns

Percentage boundaries, e.g. `[25,50,75,100]`
{% endraw %}
{% enddocs %}

{% docs macro_media_player_type_field %}
{% docs macro_session_identifiers %}
{% raw %}
This macro produces the value for the media_player_type column in the snowplow_media_player_base_events_this_run table based on the values of the youtube_player_id and media_player_id columns.
This macro is used to set the `session_identifier` used in the base macros.

#### Returns

The query for the media_player_type column.
`snowplow__session_identifiers` if set. Otherwise it defaults to the media session id if enabled, else the page/screen view id from web and mobile.
{% endraw %}
{% enddocs %}

#### Usage
{% docs macro_user_identifiers %}
{% raw %}
This macro is used to set the `user_identifier` used in the base macros.

```sql
select
...,
{{ media_player_type_field(
youtube_player_id='a.contexts_com_youtube_youtube_1[0]:playerId',
media_player_id='a.contexts_org_whatwg_media_element_1[0]:htmlId::varchar'
) }} as media_player_type
from {{ var('snowplow__events') }} as a
```
#### Returns

`snowplow__user_identifiers` if set. Otherwise it defaults to the domain_userid for web or user_id from the client session context for mobile.
{% endraw %}
{% enddocs %}

{% docs macro_media_session_field %}
{% docs macro_media_event_type_field %}
{% raw %}
This macro retrieves a property from the media session context entity.
Retrieves the event type either from the media player event in case of v1 media schemas or the event name in case of v2 media schemas.

#### Returns

The query path for the field.
The query for the event_type column.

#### Usage
{% endraw %}
{% enddocs %}

```sql
select
...,
{{ media_session_field({ 'field': 'time_played' }) }},
from {{ var('snowplow__events') }} as a
```
{% docs macro_media_player_type_field %}
{% raw %}
This macro produces the value for the media_player_type column in the snowplow_media_player_base_events_this_run table based on the values of the youtube_player_id and media_player_id columns.

#### Returns

The query for the media_player_type column.

{% endraw %}
{% enddocs %}
@@ -198,17 +162,6 @@ This macro produces the value media_type column in the snowplow_media_player_bas

The query for the media_type column.

#### Usage

```sql
select
...,
{{ media_type_field(
media_media_type='a.contexts_org_whatwg_media_element_1[0]:mediaType::varchar'
) }} as media_type
from {{ var('snowplow__events') }} as a
```

{% endraw %}
{% enddocs %}

@@ -222,21 +175,6 @@ For v2 media schemas, it is calculated based on the current time, duration and d

The query for the percent_progress field.

#### Usage

```sql
select
...,
{{ percent_progress_field(
v1_percent_progress={ 'field': 'percent_progress', 'dtype': 'string' },
v1_event_type={ 'field': 'type', 'dtype': 'string' },
event_name='a.event_name',
v2_current_time={ 'field': 'current_time', 'dtype': 'double' },
v2_duration={ 'field': 'duration', 'dtype': 'double' }
) }} as percent_progress
from {{ var('snowplow__events') }} as a
```

{% endraw %}
{% enddocs %}

@@ -248,64 +186,13 @@ This macro produces the value for the playback_quality column in the snowplow_me

The query for the playback_quality column.

#### Usage

```sql
select
...,
{{ playback_quality_field(
youtube_quality='a.contexts_com_youtube_youtube_1[0]:playbackQuality::varchar',
video_width='a.contexts_org_whatwg_video_element_1[0]:videoWidth::varchar',
video_height='a.contexts_org_whatwg_video_element_1[0]:videoHeight::varchar'
)}} as playback_quality
from {{ var('snowplow__events') }} as a
```

{% endraw %}
{% enddocs %}

{% docs macro_source_url_field %}
{% docs macro_config_check %}
{% raw %}
This macro produces the value for the source_url column in the snowplow_media_player_base_events_this_run table based on the url column in the YouTube context and the current_src column in the media context.

#### Returns

The query for the source_url column.

#### Usage

```sql
select
...,
{{ source_url_field(
youtube_url='a.contexts_com_youtube_youtube_1[0]:url::varchar',
media_current_src='a.contexts_org_whatwg_media_element_1[0]:currentSrc::varchar'
) }} as source_url
from {{ var('snowplow__events') }} as a
```

{% endraw %}
{% enddocs %}

{% docs macro_web_or_mobile_field %}
{% raw %}
This macro retrieves a property from the given fields based on whether web or mobile or both events are enabled.

#### Returns

The query path for the field.

#### Usage

```sql
select
...,
{{ web_or_mobile_field(
web='a.contexts_com_snowplowanalytics_snowplow_web_page_1_0_0[safe_offset(0)].id',
mobile={'field': 'id', 'col_prefix': 'contexts_com_snowplowanalytics_mobile_screen_1_' }
) }} as page_view_id,
from {{ var('snowplow__events') }} as a
```
A macro that checks, before the run starts, that at least one of the platform-enabling variables is true and that the media player contexts variable configuration is valid. Raises an error to alert users if the variable configuration is not valid.

{% endraw %}
{% enddocs %}
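
As a rough illustration of the fail-fast idea behind `config_check`, the Python sketch below covers only the "at least one platform enabled" rule. The variable names `enable_web_events` and `enable_mobile_events` are hypothetical placeholders, and the validity rule for the media player contexts is left out because it is not spelled out here.

```python
def config_check(variables: dict) -> None:
    """Raise before any work starts if no platform is enabled.

    The flag names below are hypothetical placeholders, not confirmed
    package variables.
    """
    platform_flags = ("enable_web_events", "enable_mobile_events")
    enabled = [flag for flag in platform_flags if variables.get(flag, False)]
    if not enabled:
        raise ValueError(
            "Invalid configuration: enable at least one of "
            + ", ".join(platform_flags)
        )


# A configuration with one platform enabled passes silently.
config_check({"enable_web_events": True, "enable_mobile_events": False})
```

Running the check in an `on-run-start` hook, as this change does, means a bad configuration stops the run before any model is built.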